Monday, February 28, 2011

Three Ways to Deal with HIPAA Dates in De-Identified Data Sets

The HIPAA rules say that any date connected to a patient is Protected Health Information (PHI).  If you are preparing a de-identified dataset, you need a method for handling the date values that are typically collected during research or outcomes tracking.  Here are three ways to handle them, with the pros and cons of each.

Random Date Shifting - One approach is to use an automatic date-shifting function in the extract.  These functions generate an offset value based on a participant key (for example, the Patient Identifier).  REDCap has implemented this: it generates a number from -1 to -364 for each participant, and all of that participant's dates are shifted by that many days.
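
As a rough illustration of the idea, here is a minimal Python sketch of a per-participant shift. It is not REDCap's actual algorithm. The function names, the `secret` salt, and the sample participant key and dates are all hypothetical. The offset is derived from a keyed hash of the participant key, so the same participant always gets the same offset without storing a lookup table.

```python
import hashlib
import hmac
from datetime import date, timedelta


def shift_days(participant_id: str, secret: bytes, max_days: int = 364) -> int:
    """Return a stable offset between -1 and -max_days for one participant."""
    # Keyed hash so the offset cannot be recomputed without the secret (assumption).
    digest = hmac.new(secret, participant_id.encode("utf-8"), hashlib.sha256).digest()
    return -(int.from_bytes(digest[:8], "big") % max_days + 1)


def shift_date(d: date, participant_id: str, secret: bytes) -> date:
    """Shift one date by the participant's offset."""
    return d + timedelta(days=shift_days(participant_id, secret))


secret = b"keep-this-out-of-the-extract"  # hypothetical project-level secret
admit = date(2010, 3, 1)                  # made-up sample dates
discharge = date(2010, 3, 15)

shifted_admit = shift_date(admit, "P001", secret)
shifted_discharge = shift_date(discharge, "P001", secret)

# Both dates move by the same offset, so the length of stay is unchanged.
print((shifted_discharge - shifted_admit).days)  # 14
```

The secret has to stay with the data manager. Anyone who holds both it and the participant key can reverse the shift.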

Pros
  • This is pretty easy to implement if your package supports it.  Just indicate that dates should be shifted at extract time.
  • Since all of a participant's dates are shifted by the same number of days, the relative durations all stay intact.
Cons
  • Since dates are shifted by up to a year, the timing of any changes that occurred during the study (for example, changes to the screening process or adjustments to the protocol) is lost.
  • If there is any benefit to comparing earlier participants to later ones, that can't be done with this approach.
  • The data still contains dates, and individuals who receive it may feel that PHI has been shared.  Documentation about how the dates have been obscured must be included.
For many applications, this is an easy-to-implement process and can work well.


Generalizing Dates - Most Institutional Review Boards (IRBs) will accept that a month and year are not specific enough for a date to be considered PHI, so each date can be reported as just its month and year.


Pros
  • The event period is not lost
  • It is obvious that the information is not PHI
Cons
  • In most cases, this does not provide enough information about when something occurred.  If follow-ups are done 7, 14, and 21 days after injury, they could all fall in the same period.
  • It may require additional variables or programming to convert the raw data into Month/Year periods.
In some situations, this can be worthwhile.  For example, a DOB could be presented as 05/1970 and that would be adequate for most uses.
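
A minimal Python sketch of the conversion, assuming the dates are already available as `date` objects. The function name and the sample dates are made up for illustration. The second half shows the follow-up problem from the Cons list.

```python
from datetime import date, timedelta


def to_month_year(d: date) -> str:
    """Reduce a full date to a MM/YYYY period string."""
    return d.strftime("%m/%Y")


dob = date(1970, 5, 17)  # made-up date of birth
print(to_month_year(dob))  # 05/1970

# Follow-ups at 7, 14 and 21 days after a made-up injury date
# all collapse into the same month/year period.
injury = date(2010, 3, 1)
follow_ups = [injury + timedelta(days=n) for n in (7, 14, 21)]
periods = {to_month_year(f) for f in follow_ups}
print(periods)  # {'03/2010'}
```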

Convert Dates to Durations - Rather than store specific dates, calculate durations and provide those.  For example, rather than providing HospitalDischargeDate, you could provide HospitalStayDays, calculated from the admit and discharge dates.
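
A minimal Python sketch of this calculation at extract time. This is an illustration only, not how any particular data management platform does it. HospitalDischargeDate, HospitalStayDays and TransportDurationHours are field names used in this post. The other field names, the `to_durations` helper, and the sample record are hypothetical.

```python
from datetime import date, datetime


def to_durations(record: dict) -> dict:
    """Build an extract row that carries durations instead of dates."""
    stay = record["HospitalDischargeDate"] - record["HospitalAdmitDate"]
    transport = record["TransportEnd"] - record["TransportStart"]
    return {
        "ParticipantKey": record["ParticipantKey"],
        "HospitalStayDays": stay.days,
        "TransportDurationHours": round(transport.total_seconds() / 3600, 1),
    }


raw = {  # made-up sample record
    "ParticipantKey": "P001",
    "HospitalAdmitDate": date(2010, 3, 1),
    "HospitalDischargeDate": date(2010, 3, 15),
    "TransportStart": datetime(2010, 3, 1, 13, 0),
    "TransportEnd": datetime(2010, 3, 1, 14, 30),
}

row = to_durations(raw)
print(row)
# {'ParticipantKey': 'P001', 'HospitalStayDays': 14, 'TransportDurationHours': 1.5}
```

The extract row contains no date values at all, only the participant key and the calculated durations.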


Pros
  • Clearly not PHI, yet preserves the integrity of the timing data.
  • May make data analysis easier, since most inquiries look at durations rather than specific dates.
  • Could be combined with a Generalized Date approach (e.g., MonthYear of Injury, TransportDurationHours, ICUDurationDays).
Cons
  • Could be a significant amount of work to implement, if the data management application does not provide this automatically.
The QuesGen data management platform supports automatic duration calculations.  Users have requested date shifting functionality and we are considering implementing it.  If you have an opinion about what you would like, please send us an email at support@quesgen.com or leave a comment here.