Re-Identification. From Netflix to Health Records.

Today’s NY Times story points out the FACT that it is very easy to re-identify supposedly “de-identified” information. Singer starts with how the Netflix “de-identified” database was proven to be re-identifiable and moves on to describe Latanya Sweeney’s famous re-identification of the medical records of Gov. Weld.

See the NY Times Article: When 2+2 Equals a Privacy Question

Netflix is about to commit a privacy Valdez with its customers’ viewing data

CU Boulder’s Paul Ohm writes about Netflix’s insane new plan to release millions of customers’ personal information — ZIP code, gender, year of birth — as a sequel to its Netflix Challenge. Latanya Sweeney’s famous study on de-anonymizing data has shown that date (not just year) of birth, gender and ZIP are sufficient to personally identify 87% of Americans. In other words, Netflix is about to put the behavioral data about viewing choices for millions of Americans into the public domain, despite its legal duty to keep this information private.

“Because of this, if it releases the data, Netflix might be breaking the law. The Video Privacy Protection Act (VPPA), 18 USC 2710 prohibits a “video tape service provider” (a broadly defined term) from revealing “personally identifiable information” about its customers. Aggrieved customers can sue providers under the VPPA and courts can order “not less than $2500” in damages for each violation. If somebody brings a class action lawsuit under this statute, Netflix might face millions of dollars in damages.”

Additionally, the FTC might decide to fine Netflix for violating its privacy policy as an unfair business practice.

De-identified? Yeah, right.
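
To see why full date of birth is so much more revealing than year of birth alone, here is a minimal sketch in Python. The records, ZIP codes, and helper names are all invented for illustration; this is not Sweeney's method or data, and it does not reproduce the 87% figure. It only counts how many people in a toy list are the sole match for a given combination of fields.

```python
from collections import Counter

# Toy records, invented for illustration only (not real people or real data).
RECORDS = [
    {"zip": "80000", "gender": "F", "dob": "1971-03-14"},
    {"zip": "80000", "gender": "F", "dob": "1971-09-02"},
    {"zip": "80000", "gender": "M", "dob": "1971-03-14"},
    {"zip": "80000", "gender": "M", "dob": "1971-11-30"},
    {"zip": "02000", "gender": "M", "dob": "1950-06-21"},
    {"zip": "02000", "gender": "M", "dob": "1950-01-08"},
]


def birth_year(record):
    """Return only the year part of an ISO-formatted date of birth."""
    return record["dob"][:4]


def unique_fraction(records, key_fn):
    """Fraction of records whose key (combination of fields) is shared by no one else."""
    counts = Counter(key_fn(r) for r in records)
    singled_out = sum(1 for r in records if counts[key_fn(r)] == 1)
    return singled_out / len(records)


# Hypothetical comparison: ZIP + gender + birth YEAR vs. ZIP + gender + full birth DATE.
by_year = unique_fraction(RECORDS, lambda r: (r["zip"], r["gender"], birth_year(r)))
by_date = unique_fraction(RECORDS, lambda r: (r["zip"], r["gender"], r["dob"]))

print(f"unique on ZIP + gender + birth year: {by_year:.0%}")
print(f"unique on ZIP + gender + birth date: {by_date:.0%}")
```

In this toy list nobody is singled out by birth year, but everyone is singled out by full birth date. Anyone holding a second list with names and the same fields (a voter roll, for example) could then attach a name to each "de-identified" row.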

See these articles:
Netflix Contest Seen As Posing Privacy Risk
Netflix is about to commit a privacy Valdez with its customers’ viewing data
AOL, Netflix and the end of open access to research data

Once again Netflix plans to violate the privacy of those who rate the movies they rent. Two University of Texas computer scientists demonstrated that the Netflix database of movie ratings from 500,000 customers could be re-identified, revealing sensitive political and sexual preferences of the actual people who rated movies. Netflix did not get the consent of renters to expose their ratings to the public or to researchers.

Yet Netflix is moving ahead to release even MORE personal data for its next million-dollar contest. The major media (the NYT’s Steve Lohr, for example) has NOT reported at all on how Netflix is violating movie renters’ privacy, but instead trumpets the prizes paid to those who develop more accurate ways to predict which movies you will want to watch next.

The problem of re-identification is VERY serious for the healthcare system because health data is so rich in detail that de-identification is almost impossible.

Today, the treasure trove of all Americans’ sensitive health data is being endlessly used and disclosed, without informed consent, to millions of “covered entities” and “business associates” (and their millions of employees), subjecting EVERY American to the theft, sale, and misuse of the most sensitive personal information that exists.

Who will hire you knowing all about your prescriptions, illnesses and genes?