Page 1: The Role of History and Prediction in Data Privacy Kristen LeFevre University of Michigan May 13, 2009.

The Role of History and Prediction in Data Privacy

Kristen LeFevre

University of Michigan

May 13, 2009


Page 2:

Data Privacy

• Personal information collected every day

– Healthcare, insurance information
– Supermarket transaction data
– RFID, GPS Data
– E-mail
– Employment history
– Web search / clickstream

Page 3:

Data Privacy

• Legal, ethical, technical issues surrounding
– Data ownership
– Data collection
– Data dissemination and use

• Considerable recent interest from technical community
– High-profile mishaps and lawsuits
– Compliance with data-sharing mandates

Page 4:

Privacy Protection Technologies for Public Datasets

• Goal: Protect sensitive personal information while preserving data utility

• Privacy Policies and Mechanisms

• Example Policies:
– Protect individual identities
– Protect the values of sensitive attributes
– Differential privacy [Dwork 06]

• Example Mechanisms:
– Generalize (“coarsen”) the data
– Aggregate the data
– Add random noise to the data
– Add random noise to query results

Page 5:

Observations

• Much work has focused on static data
– One-time snapshot publishing
– Disclosure by composing multiple different snapshots of a static database [Xiao 07, Ganta 08]
– Auditing queries on a static database [Chin 81, Kenthapadi 06, …]

• What are the unique challenges when the data evolves over time?

Page 6:

Outline

• Sample Problem: Continuously publishing privacy-sensitive GPS traces
– Motivation & problem setup
– Framework for reasoning about privacy
– Algorithms for continuous publishing
– Experimental results

• Applications to other dynamic data (speculation)

Page 7:

GPS Traces (ongoing work w/ Wen Jin, Jignesh Patel)

• GPS devices attached to phones, cars

• Interest in collecting and distributing location traces in real time
– Real-time traffic reporting
– Adaptive pricing / placement of outdoor ads

• Simultaneous concern for personal privacy

• Challenge: Can we continuously collect and publish location traces without compromising individual privacy?

Page 8:

Problem Setting

[Figure: GPS Users (7 AM) and GPS Users (7:05 AM) report to a Central Trace Repository, which applies a Privacy Policy and sends a “Sanitized” Location Snapshot for each epoch to the Data Recipient.]

Page 9:

Problem Setting

• Finite population of n users with unique identifiers {u1,…,un}

• Assume users’ locations are reported and published in discrete epochs t1,t2,…

• Location snapshot D(tj)
– Associates each user with a location during epoch tj

• Publish sanitized version D*(tj )

Page 10:

Threat Model

• Attacker wants to determine the location of a target user ui during epoch tj

• Auxiliary Information: Attacker knows location information during some other epochs (e.g., Yellow Pages)


Page 11:

Some Naïve Solutions

• Strawman 1: Replace users’ identifiers ({u1,…,un}) with pseudonyms ({p1,…,pn})

– Problem: Once attacker “unmasks” user pi, he can track her location forever

• Strawman 2: New pseudonyms ({p1^j,…,pn^j}) at each epoch tj

– Problem: Users can still be tracked using multi-target tracking tools [Gruteser 05, Krumm 07]
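
A deliberately simple linker shows why Strawman 2 is weak. The sketch below is not one of the multi-target tracking tools cited above; it is a greedy nearest-neighbor matcher, and the function name, pseudonym labels, and coordinates are made up for illustration. It assumes users move only a short distance between consecutive epochs, so the closest new pseudonym is usually the same person.

```python
import math

def link_pseudonyms(prev, curr):
    """Greedily match each pseudonym in `prev` to the nearest unused one in `curr`.

    `prev` and `curr` map pseudonym -> (x, y) for two consecutive epochs.
    Returns a dict old_pseudonym -> new_pseudonym.
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        (math.dist(p_loc, c_loc), p, c)
        for p, p_loc in prev.items()
        for c, c_loc in curr.items()
    )
    links, used = {}, set()
    for _, p, c in pairs:
        if p not in links and c not in used:
            links[p] = c
            used.add(c)
    return links

# Hypothetical data: fresh pseudonyms at the second epoch, small movements.
epoch_j = {"p1_j": (0.0, 0.0), "p2_j": (10.0, 0.0), "p3_j": (0.0, 10.0)}
epoch_j1 = {"q7": (10.5, 0.2), "q3": (0.3, 0.1), "q9": (0.1, 10.4)}
links = link_pseudonyms(epoch_j, epoch_j1)
```

Even though no pseudonym is reused, the matcher recovers which new pseudonym continues which old trace in this toy data.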

Page 12:

Key Problem: Motion Prediction

[Figure: the cluster {Alice, Bob, Charlie} is published with locations 1, 2, 3 in one epoch and locations 4, 5, 6 in the next; one location in each epoch is labeled Alice.]

What if the speed limit is 60 mph?

Page 13:

Threat Model

• Attacker wants to determine the location of a target user ui during epoch tj

• Auxiliary Information: Attacker knows location information during some other epochs (e.g., Yellow Pages)

• Motion prediction: Given one or more locations for ui, attacker can predict (probabilistically) ui’s location during following and preceding epochs

Page 14:

Privacy Principle: Temporal Unlinkability

• Consider an attacker who is able to identify (locate) target user uj during m sequential epochs

• Under reasonable assumptions, he should not be able to locate uj with high confidence during any other epochs*

*Similar in spirit to “mix zones” [Beresford 03], which addressed a related problem in a less-formal way.

Page 15:

Sanitization Mechanism

• Needed to select a sanitization mechanism; chose one for maximum flexibility

• Assign each user ui consistent pseudonym pi

• Divide users into clusters
– Within each cluster, break association between pseudonym, location

• Release candidate for D(tj)
– $D^*(t_j) = \{(C_1(t_j), L_1(t_j)), \ldots, (C_B(t_j), L_B(t_j))\}$
– $\bigcup_{i=1..B} C_i(t_j) = \{p_1, \ldots, p_n\}$
– $C_i(t_j) \cap C_h(t_j) = \emptyset \quad (i \neq h)$
– Each $L_i(t_j)$ contains the locations of users in $C_i(t_j)$
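
The release candidate can be pictured as a small data structure. The following is a minimal sketch, assuming a snapshot is a dictionary from pseudonym to location and that the clusters are supplied by the caller; the function name `release_candidate` and the sample coordinates are illustrative and not taken from the talk.

```python
import random

def release_candidate(snapshot, clusters, rng=None):
    """Build D*(t_j) from a pseudonymized snapshot and a partition of pseudonyms.

    snapshot: dict pseudonym -> location for epoch t_j.
    clusters: list of sets of pseudonyms; must partition snapshot's keys.
    Returns a list of (C_i, L_i) pairs, where L_i holds the locations of the
    members of C_i in shuffled order, so position reveals no pseudonym.
    """
    rng = rng if rng is not None else random.Random()
    seen = set()
    for c in clusters:
        if seen & c:
            raise ValueError("clusters must be pairwise disjoint")
        seen |= c
    if seen != set(snapshot):
        raise ValueError("clusters must cover every pseudonym exactly once")

    released = []
    for c in clusters:
        locs = [snapshot[p] for p in sorted(c)]
        rng.shuffle(locs)  # break the pseudonym-location association
        released.append((frozenset(c), locs))
    return released

# Hypothetical snapshot with four pseudonyms and two clusters.
d_tj = {"p1": (1, 1), "p2": (2, 5), "p3": (7, 3), "p4": (8, 8)}
d_star = release_candidate(d_tj, [{"p1", "p2"}, {"p3", "p4"}], random.Random(0))
```

The two validity checks correspond to the covering and disjointness conditions on the clusters; the shuffle is one simple way to drop the pseudonym-to-location mapping inside a cluster.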

Page 16:

Sanitization Mechanism: Example

• Pseudonyms {p1, p2, p3, p4}

[Figure: published clusters and locations at three epochs]
– t0: clusters {p1,p2} and {p3,p4}; locations 1, 2, 3, 4
– t1: clusters {p1,p2} and {p3,p4}; locations 5, 6, 7, 8
– t2: clusters {p1,p3} and {p2,p4}; locations 9, 10, 11, 12

Page 17:

Reasoning about Privacy

• How can we guarantee temporal unlinkability under the threats of auxiliary information and motion prediction?
– (Using the cluster-based sanitization mechanism)

• Novel framework with two key components
– Motion model describes location correlations between epochs
– Breach probability function describes an attacker’s ability to compromise temporal unlinkability

Page 18:

Motion Models

• Model motion using an h-step Markov chain
– Conditional probability for user’s location, given his location during h prior (future) epochs
– Same motion model used by attacker and publisher

• Forward motion model template
– $\Pr[\mathrm{Loc}(P,T_j) = L_j \mid \mathrm{Loc}(P,T_{j-1}) = L_{j-1}, \ldots, \mathrm{Loc}(P,T_{j-h}) = L_{j-h}]$

• Backward motion model template
– $\Pr[\mathrm{Loc}(P,T_j) = L_j \mid \mathrm{Loc}(P,T_{j+1}) = L_{j+1}, \ldots, \mathrm{Loc}(P,T_{j+h}) = L_{j+h}]$

• Independent and replaceable component
– For this work, used 1-step motion model based on velocity distribution (speed and direction)
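
To make the forward template concrete for h = 1, here is a minimal sketch of a velocity-based model. The talk does not specify the velocity distribution, so the isotropic Gaussian around a mean velocity, the parameter names (`mean_velocity`, `sigma`, `dt`), and the sample values are all assumptions made for illustration.

```python
import math

def forward_model(prev_loc, candidates, mean_velocity, sigma, dt=1.0):
    """Pr[Loc(P, T_j) = L_j | Loc(P, T_{j-1}) = prev_loc] over `candidates`.

    Assumes the velocity (vx, vy) is Gaussian around `mean_velocity` with
    standard deviation `sigma` in each axis. Probabilities are normalized
    over the finite candidate set.
    """
    weights = []
    for (x, y) in candidates:
        # Velocity implied by moving from prev_loc to this candidate in dt.
        vx = (x - prev_loc[0]) / dt
        vy = (y - prev_loc[1]) / dt
        sq_err = (vx - mean_velocity[0]) ** 2 + (vy - mean_velocity[1]) ** 2
        weights.append(math.exp(-sq_err / (2 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no candidate is plausible under this model")
    return [w / total for w in weights]

# Hypothetical example: a user heading east at about 1 unit per epoch.
probs = forward_model((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0), (3.0, 0.0)],
                      mean_velocity=(1.0, 0.0), sigma=0.5)
```

Because the motion model is an independent, replaceable component, any other function with the same inputs and a normalized output could be substituted for this one.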

Page 19:

Motion Models: Example

• Pseudonyms {p1, p2, p3, p4}
• Epochs t0, t1, t2

[Figure: clusters {p1,p2} and {p3,p4}; users p1, p2, p3, p4 are shown at t0, locations a, b, c, d at t1, and users p3, p1, p2, p4 at t2.]

Pr[loc(p1,t1) = a | Loc(p1,t0) = x]

Pr[loc(p1,t1) = b | Loc(p1,t0) = x]

Pr[loc(p1,t1) = a | Loc(p1,t2) = y]

Page 20:

Privacy Breaches

• Forward breach probability
– $\Pr[\mathrm{Loc}(P,T_j) = L_j \mid D(T_{j-1}), \ldots, D(T_{j-h}), D^*(T_j)]$

• Backward breach probability
– $\Pr[\mathrm{Loc}(P,T_j) = L_j \mid D(T_{j+1}), \ldots, D(T_{j+h}), D^*(T_j)]$

• Privacy Breach: Release candidate D*(Tj) causes a breach iff either of the following is true for threshold C
– $\max_{P, L_j} \Pr[\mathrm{Loc}(P,T_j) = L_j \mid D(T_{j-1}), \ldots, D(T_{j-h}), D^*(T_j)] > C$
– $\max_{P, L_j} \Pr[\mathrm{Loc}(P,T_j) = L_j \mid D(T_{j+1}), \ldots, D(T_{j+h}), D^*(T_j)] > C$

Page 21:

Privacy Breaches: Example

[Figure: clusters {p1,p2} and {p3,p4}; users p1, p2, p3, p4 are shown at t0, with p1 at x and p2 at y, and locations a, b, c, d at t1.]

e1 = Pr[loc(p1,t1) = a | Loc(p1,t0) = x]

e2 = Pr[loc(p1,t1) = b | Loc(p1,t0) = x]

e3 = Pr[loc(p2,t1) = a | Loc(p2,t0) = y]

e4 = Pr[loc(p2,t1) = b | Loc(p2,t0) = y]

$\Pr[\mathrm{loc}(p_1,t_1) = a \mid D(T_0), D^*(T_1)] = \dfrac{e_1 e_4}{e_1 e_4 + e_2 e_3}$

…

Goal: Verify that all (forward and backward) breach probabilities < threshold C
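
The ratio on this slide follows from a short counting argument, worked through below. Assume the release reveals only that p1 and p2 together occupy {a, b}, that the two users move independently, and that both assignments are equally likely a priori. There are then two consistent assignments: p1 at a with p2 at b (weight e1·e4), and p1 at b with p2 at a (weight e2·e3). The numeric values e1 = 0.9, e2 = 0.1, e3 = 0.2, e4 = 0.8 are hypothetical and chosen only to show the arithmetic.

```latex
\begin{align*}
\Pr[\mathrm{loc}(p_1,t_1)=a \mid D(T_0), D^*(T_1)]
  &= \frac{e_1 e_4}{e_1 e_4 + e_2 e_3} \\
  &= \frac{0.9 \times 0.8}{0.9 \times 0.8 + 0.1 \times 0.2}
   = \frac{0.72}{0.74} \approx 0.973
\end{align*}
```

With a threshold of, say, C = 0.9 (also hypothetical), this release candidate would count as a breach even though p1 and p2 are published in the same cluster.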

Page 22:

Checking for Breaches

• Does release candidate D*(Tj) cause a breach?

• Brute force algorithm
– Exponential in release candidate cluster size

• Heuristic pruning tools
– Reduce the search space considerably in practice
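
One way to picture the brute-force check is to enumerate every one-to-one assignment of pseudonyms to published locations within a cluster. The sketch below is an assumption about how such a check could look for a single cluster with a 1-step forward model; it is not the talk's algorithm, it omits the pruning heuristics, and the names `breach_probabilities`, `causes_breach`, and `trans` are illustrative.

```python
from itertools import permutations

def breach_probabilities(trans):
    """Posterior Pr[p is at location l] for one cluster, by enumeration.

    trans[p][l] is the motion-model probability that pseudonym p moved to
    published location l. Each one-to-one assignment of pseudonyms to
    locations is weighted by the product of its transition probabilities.
    The loop visits k! assignments for a cluster of k pseudonyms.
    """
    pseudonyms = sorted(trans)
    locations = sorted(trans[pseudonyms[0]])
    posterior = {p: {l: 0.0 for l in locations} for p in pseudonyms}
    total = 0.0
    for assignment in permutations(locations):
        weight = 1.0
        for p, l in zip(pseudonyms, assignment):
            weight *= trans[p][l]
        total += weight
        for p, l in zip(pseudonyms, assignment):
            posterior[p][l] += weight
    if total == 0.0:
        raise ValueError("no assignment is consistent with the motion model")
    return {p: {l: w / total for l, w in row.items()}
            for p, row in posterior.items()}

def causes_breach(trans, threshold):
    """True if any pseudonym-location posterior exceeds the threshold C."""
    post = breach_probabilities(trans)
    return max(max(row.values()) for row in post.values()) > threshold

# Two-user cluster with made-up transition probabilities.
trans = {"p1": {"a": 0.9, "b": 0.1}, "p2": {"a": 0.2, "b": 0.8}}
post = breach_probabilities(trans)
```

For two users the enumeration reduces to the ratio of one assignment's weight to the sum of both, and the rapid growth in the number of assignments is what makes pruning necessary for larger clusters.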

Page 23:

Publishing Algorithms

• How to publish useful data, without causing a privacy breach?

• Cluster-based sanitization mechanism offers two main options
– Increase cluster size (or change composition)
– Reduce publication frequency

Page 24:

Publishing Algorithms

• General Case
– At each epoch Tj, publish the most compact release candidate D*(Tj) that does not cause a breach
– Need to delay publishing until epoch T_{j+h} to check for backward breaches
– NP-hard optimization problem; proposed alternative heuristics

• Special Case
– Durable clusters (same individuals at each epoch)
– Motion model satisfies symmetry property
– No need to delay publishing
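
The general-case loop can be sketched as a search over release candidates. This is a hypothetical outline only: the talk's heuristics are not described here, so the sketch simply assumes the caller supplies candidates ordered from most to least compact and a breach predicate. Both `publish_epoch` and its arguments are illustrative names.

```python
def publish_epoch(candidates, causes_breach):
    """Return the first breach-free release candidate, or None to skip the epoch.

    candidates: iterable of release candidates, ordered from most compact
                (smallest clusters) to least compact.
    causes_breach: predicate standing in for the forward and backward checks.
    """
    for candidate in candidates:
        if not causes_breach(candidate):
            return candidate
    # Nothing is safe: publish nothing this epoch (lower publication frequency).
    return None

# Toy usage: candidates are just cluster sizes, and sizes below 3 are unsafe.
chosen = publish_epoch([1, 2, 3, 4], causes_breach=lambda k: k < 3)
skipped = publish_epoch([1, 2], causes_breach=lambda k: k < 3)
```

The two outcomes mirror the two options the cluster-based mechanism offers: grow or rearrange the clusters until the release is safe, or withhold the release for that epoch.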

Page 25:

Experimental Study

• Used real highway traffic data from UM Transportation Research Institute
– GPS data sampled from cars of 72 volunteers
– Sampling rate (epoch) = 0.01 seconds
– Speed range 0-170 km/hour

• Also synthetic data
– Able to control the generative motion distribution

Page 26:

Experimental Study

• All static “snapshot” anonymization mechanisms vulnerable to motion prediction attacks
– Applied two representative algorithms (r-Gather [Aggarwal 06] and k-Condense [Aggarwal 04])
– Each produces a set of clusters with k users each

[Figure: results for r-Gather]

[Figure: results for k-Condense]

Page 27:

Speculation / Future Work

• GPS example illustrates the importance of reasoning about data dynamics, history, and predictable patterns of change when protecting privacy

• Dynamic private data in other apps.
– E.g., longitudinal social science data
  • Study subjects age predictably
  • Most people don’t move very far
  • Income changes predictably

• Hypothesis: History and prediction are important in these settings, too!