Sizing Up De-Identification Guidance, Experts Analyze HIPAA Compliance Report (quotes PPR)

To view the full article by Marianne Kolbasuk McGee, please visit: Sizing Up De-Identification Guidance, Experts Analyze HIPAA Compliance Report.

The federal Office of Civil Rights (OCR), charged with protecting the privacy of nation’s health data, released a ‘guidance’ for “de-identifying” health data. Government agencies and corporations want to “de-identify”, release and sell health data for many uses. There are no penalties for not following the ‘guidance’.

Releasing large data bases with “de-identified” health data on thousands or millions of people could enable break-through research to improve health, lower costs, and improve quality of care—-IF “de-identification” actually protected our privacy, so no one knows it’s our personal data—-but it doesn’t.

The ‘guidance’ allows easy ‘re-identification’ of health data. Publically available data bases of other personal information can be quickly compared electronically with ‘de-identified’ health data bases, so can be names re-attached, creating valuable, identifiable health data sets.

The “de-identification” methods OCR proposed are:

  • -The HIPAA “Safe-Harbor” method:  if 18 specific identifiers are removed (such as name, address, age, etc, etc), data can be released without patient consent. But .04% of the data can still be ‘re-identified’
  • -Certification by a statistical  “expert” that the re-identification risk is “small” allows release of data bases without patient consent.

o   There are no requirements to be an “expert”

o   There is no definition of “small risk”

Inadequate “de-identification” of health data makes it a big target for re-identification. Health data is so valuable because it can be used for job and credit discrimination and for targeted product marketing of drugs and expensive treatment. The collection and sale of intimately detailed profiles of every person in the US is a major model for online businesses.

The OCR guidance ignores computer science, which has demonstrated ‘de-identification’ methods can’t prevent re-identification. No single method or approach can work because more and more ‘personally identifiable information’ is becoming publically available, making it easier and easier to re-identify health data.  See: the “Myths and Fallacies of “Personally Identifiable Information” by Narayanan and Shmatikov,  June 2010 at: http://www.cs.utexas.edu/~shmat/shmat_cacm10.pdf Key quotes from the article:

  • -“Powerful re-identification algorithms demonstrate not just a flaw in a specific anonymization technique(s), but the fundamental inadequacy of the entire privacy protection paradigm based on “de-identifying” the data.”
  • -“Any information that distinguishes one person from another can be used for re-identifying data.”
  • -“Privacy protection has to be built and reasoned about on a case-by-case basis.”

OCR should have recommended what Shmatikov and Narayanan proposed:  case-by-case ‘adversarial testing’ by comparing a “de-identified” health data base to multiple publically available data bases to determine which data fields must be removed to prevent re-identification. See PPR’s paper on “adversarial testing” at: http://patientprivacyrights.org/wp-content/uploads/2010/10/ABlumberg-anonymization-memo.pdf

Simplest, cheapest, and best of all would be to use the stimulus billions to build electronic systems so patients can electronically consent to data use for research and other uses they approve of.  Complex, expensive contracts and difficult ‘work-arounds’ (like ‘adversarial testing’) are needed to protect patient privacy because institutions, not patients, control who can use health data. This is not what the public expects and prevents us from exercising our individual rights to decide who can see and use personal health information.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>