I3P Data Sanitization

This seed project set out to define ways of understanding what data can be sanitized, and how. Traditional techniques for sanitizing or anonymizing data include masking, adding noise, and enforcing regularity, and they typically assume a "closed world." In practice, these techniques often either leave the data unusable for research or operational purposes or fail to sanitize it completely. Our data sanitization work therefore builds on past techniques by adopting an "open world" assumption, in which information from outside the dataset may be combined with it. We ask: what relationships between data fields would an adversary need to establish (e.g., by making associations with external datasets) in order to reveal certain information? Conversely, what associations must be protected in order to conceal that information? Finally, given the policy constraints of the different stakeholders (e.g., the person whom the data describes, operational personnel, and research personnel), can dataset X be sanitized in a way that satisfies all of their policies, or must one or more policies be compromised? If so, which ones, and how?
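
The questions above can be made concrete with a small sketch. It is an illustration only, not the method developed in the project: the `Policy` class, the `inferable_fields` and `check_policies` functions, the field names, and the single association rule are all hypothetical. The sketch assumes that sanitization means suppressing whole fields and that the external associations an adversary can make are known in advance, which an open world does not guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """One stakeholder's requirements over field names (hypothetical model)."""
    stakeholder: str
    must_hide: frozenset = frozenset()  # fields that must not be revealed
    must_keep: frozenset = frozenset()  # fields that must remain usable


def inferable_fields(released, associations):
    """Return every field an adversary could learn from the released fields.

    `associations` maps a frozenset of source fields to one field that can be
    derived from them, e.g. by joining against an external dataset.
    """
    known = set(released)
    changed = True
    while changed:  # repeat until no rule adds anything new
        changed = False
        for sources, derived in associations.items():
            if sources <= known and derived not in known:
                known.add(derived)
                changed = True
    return known


def check_policies(policies, associations):
    """Release only the fields some stakeholder needs, then report leaks.

    Inference only grows as more fields are released, so if this minimal
    release conflicts with a policy, every larger release does too.
    """
    released = set().union(*(p.must_keep for p in policies))
    exposed = inferable_fields(released, associations)
    conflicts = []
    for p in policies:
        leaked = p.must_hide & exposed
        if leaked:
            conflicts.append((p.stakeholder, sorted(leaked)))
    return sorted(released), conflicts


# Assumed example: an external dataset links (zip, birth_date, sex) to a name.
associations = {frozenset({"zip", "birth_date", "sex"}): "name"}

policies = [
    Policy("data subject", must_hide=frozenset({"name"})),
    Policy("operations", must_keep=frozenset({"visit_id"})),
    Policy("research",
           must_keep=frozenset({"zip", "birth_date", "sex", "diagnosis"})),
]

released, conflicts = check_policies(policies, associations)
print("released:", released)
print("conflicts:", conflicts)

# A compromise: research gives up birth_date, which breaks the association.
relaxed = policies[:2] + [
    Policy("research", must_keep=frozenset({"zip", "sex", "diagnosis"}))
]
relaxed_released, relaxed_conflicts = check_policies(relaxed, associations)
print("conflicts after compromise:", relaxed_conflicts)
```

In this toy model, the name is never released directly, yet the data subject's policy is still violated because the fields research requires can be associated with an external dataset. Dropping one of those fields removes the conflict at the cost of a compromise to the research policy.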

Researchers involved:

Faculty:

Students:
  • Justin Cummins (UC Davis, M.S. 2011 → Square)
  • Anhad Singh (UC Davis)

Sponsor: Institute for Information Infrastructure Protection (I3P)

Publications resulting from this project:

"Relationships and Data Sanitization: A Study in Scarlet"
Matt Bishop, Justin Cummins, Sean Peisert, Bhume Bhumiratana, Anhad Singh, Deborah Agarwal, Deborah Frincke, and Michael Hogarth,
Proceedings of the 2010 New Security Paradigms Workshop (NSPW), pp. 151–164,
Concord, MA, September 21–23, 2010.

