Top Banner
Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney
17

Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

Dec 21, 2015

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

Protecting Privacy when Disclosing Information

Pierangela Samarati Latanya Sweeney

Page 2: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

INTRODUCTION

• Today’s society places demands on person-specific data.

• more and more historically public information is also electronically available

• combined, you can identify the personal information

• This paper addresses the problem of releasing person-specific data while preserving the person's anonymity

• k-anonymity: Specific information is ambiguously mapped to k-persons

Page 3: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

EXAMPLE

Page 4: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

RELATED WORK

• several protection techniques in statistical databases

• scrambling, adding noise, swapping values etc..

• suppression and generalization techniques but no formal foundation

• Different from traditional access control - protecting the data vs identity of the data

Page 5: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

OUTLINE

• Formal foundation for anonymity problem and against linking

• quasi-identifiers: attribute that can be exploited for linking

• k-anonymity: degree of protection of data with respect to inference by linking

• preferred generalization: allows user to select among possible minimal generalizations - choose attributes

• Here, they protect the link between the identity and data but not the data itself

Page 6: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

DEFINITIONS & ASSUMPTIONS

• Quasi-identifier: Let T(A1,..,An) be a table. A quasi-identifier is a set of attributes (A1,..,Aj) subset of (A1,..,An) whose release must be controlled.

• Goal: Allow release of information in the table which is related to atleast a given number k of individuals, k is set by data holder

• k-anonymity requirement: Each release of the data must be such that every combination of quasi-identifier can be indistinctly matched to atleast k individuals

• Issue: It is impossible to match the released data to externally available data!!

Page 7: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

DEFINITIONS & ASSUMPTIONS

• Although the data holder knows the external attributes(contributes to quasi-identifiers), the specific values can not be assumed.

• Key: Translate the requirement in terms of the released data• Assumption: All attributes in table PT which are to be released and

which are externally available in combination to a data recipient are defined in a quasi-identifier

• Not a trivial assumption• Sweeney examines this risk and shows that this can not be perfectly

resolved.• k-anonymity for a table: Let T(A1,…,An) be the table and QT be

the set of quasi-identifiers of T. T is said to satisfy k-anonymity iff for each QI belongs to QT, each sequence of values in T[QI] appears at least with k occurences in T[QI].

Page 8: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

GENERALIZING DATA

• first approach is based on the definition and use of generalization relationships between domains and between values that attributes can assume.

• Z0 is the zip code domain and Z1 is the domain where last digit is replaced by 0.

• to achieve k-anonymity, map the attributes in domain Z0 to Z1 where Z1 is more general

• This mapping between domains is stated by means of a generalization relationship which represents a partial order ≤ D on the set Dom of domains– each domain Di has at most one direct generalized domain

– all maximal elements of Dom are singleton(eventually all domains can be generalized to single value)

Page 9: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

DOMAIN & VALUE GENERALIZATION HIERARCHIES

Page 10: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

DOMAIN GENERALIZATION HIERARCHY

• Let Dom be the set of domains, given a tuple DT = (D1, …, Dn) such that Di belongs to Dom for i = 1,…,n, DGHDT = DGHD1x…xDGHDn, assuming the cartesian product is ordered by imposing coordinate wise order.

• Each path from DT to unique maximal element of DGHDT in the graph defines a possible alternative path

• The set of nodes in each such path together with the generalization relationship is called a generalization strategy for DGHDT

Page 11: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

GENERALIZED TABLE

• Tj is a Generalized Table of Ti, written Ti ≤ Tj iff– Ti and Tj have same number of tuples

– Domain of each attribute of Tj (denoted by dom(Az,Tj) )is equal to or generalization of the domain of the attribute in Ti and

– Each tuple ti in Ti has a corresponding tuple tj in Tj (and vice versa) such that the value for each attribute in tj is equal to or generalization of the value of corresponding attribute in ti.

• Not all generalized tables are satisfactory• Don’t need extreme generalized table if more specific table exists

which satisfies k-anonymity• k-minimal generalization

Page 12: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

k-minimal generalization

• Distance vector: Let Ti(A1,…,An) and Tj(A1,…,An) be two tables such that Ti ≤ Tj. The distance vector of Tj from Ti is the vector DVi,j = [d1,…,dn] where dz is the length of unique path between dom(Az,Ti) and dom(Az,Tj) in DGHD

• Given two distance vectors DV = [d1,…,dn] and DV’ = [d1’,…,dn’], DV ≤ DV’ iff di ≤ di’ for all I = 1,…,n; DV < DV’ iff DV ≤ DV’ and DV ≠ DV’.

• k-minimal generalization: Let Ti(A1,…,An) and Tj(A1,…,An) be two tables such that Ti ≤ Tj. Tj is said to be a k-minimal generalization of Ti iff – Tj satisfies k-anonymity

– There is no Tz : Ti ≤ Tz, Tz satisfies k-anonymity and DVi,z < DVi,j

Page 13: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

EXAMPLE

•For k=2, GT[1,0] and GT[0,1] are k-minimal generalizations, but not GT[0,2] and GT[1,1] For k=3, GT[1,0] and GT[0,2] are k-minimal generalizations.

Page 14: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

SUPPRESSING DATA

• Complementary approach to generalization• Used to moderate the generalization process when there are limited

number of tuples(with less than k occurences)• Generalized Table with suppression: Ti(A1,…,An) and Tj(A1,

…,An) be two tables defined on same attributes. Tj is said to be a generalization of Ti – if sizeof(Tj) ≤ sizeof(Ti) – For all z = 1,…,n : dom(Az,Ti) ≤ dom(Az,Ti) – There is an injective mapping between Ti and Tj that associates

tuples ti (in Ti) and tj(in Tj) such that ti[Az] ≤ tj[Az]

• Minimal Required suppression: Let Tj be a generalization of Ti satisfying k-anonymity, Tj is said to enforce minimal required suppression iff there is no Tz such that Ti ≤ Tz, DVi,z = DVi,j, and sizeof(Tj) < sizeof(Tz) and Tz satisfies k-anonymity.

Page 15: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

EXAMPLE

•The tuples written in bold face and marked with double lines in each table are the tuples that must be suppressed to achieve k-anonymity of 2. Suppression of any superset would not satisfy minimal required suppression.

Page 16: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

k-minimal generalization with suppression

• Generalization and suppression are used in conjunction to obtain k-anonymity

• Tradeoff between generalization and suppression• Acceptable suppression threshold MaxSup • Within the threshold, suppression is considered better.• Reason: Generalization affects all the tuples whereas Suppression

affects single tuple.• k-minimal generalization with suppression: Ti(A1,…,An) and

Tj(A1,…,An) be two tables such that Ti ≤ Tj and MaxSup be the specific threshold of acceptance suppression. Tj is k-minimal generalization of Ti iff – Tj satisfies k-anonymity– Sizeof(Ti) - Sizeof(Tj) ≤ MaxSup– There is no Tz: Ti ≤ Tz, Tz satisfies conditions 1 and 2 and DVi,z < DVi,j

Page 17: Protecting Privacy when Disclosing Information Pierangela Samarati Latanya Sweeney.

EXAMPLE