The State of the Art Cynthia Dwork, Microsoft Research

Dec 18, 2015

Page 1:

The State of the Art

Cynthia Dwork, Microsoft Research

Page 2:

Pre-Modern Cryptography

Propose

Break

Page 3:

Modern Cryptography

[Diagram: Propose Definition → Algs (algorithms satisfying definition) → Break Definition → Propose STRONGER Definition → Algs → Break Definition → Propose STRONGER …]

[GoldwasserMicali82, GoldwasserMicaliRivest85]

Page 5:

No Algorithm?

Propose Definition

?

Why?

Page 6:

Provably No Algorithm?

Bad Definition

Propose Definition

?

Propose WEAKER/DIFF Definition

Alg / ?

Page 7:

The Privacy Dream

[Diagram: Original Database → C (?) → Sanitized data set]

Census, financial, medical data; OTC drug purchases; social networks; MOOCs data; call and text records; energy consumption; loan, advertising, and applicant data; ad clicks, product correlations, query logs, …

Page 8:

Fundamental Law of Info Recovery

“Overly accurate” estimates of “too many” statistics are blatantly non-private

DinurNissim03; DworkMcSherryTalwar07; DworkYekhanin08; De12; MuthukrishnanNikolov12; …
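
A toy illustration of the law, in the spirit of the exhaustive attack in DinurNissim03. This is a sketch under assumptions that are not in the slides: the database is n secret bits, the curator answers every subset-count query with error at most E, and the adversary keeps any candidate database consistent with all the answers. Any such candidate can differ from the truth in at most 4E positions. The function names and the parameters n = 10, E = 1 are hypothetical.

```python
import random

def popcount(v):
    """Number of 1 bits in the integer v."""
    return bin(v).count("1")

def noisy_answers(db, n, max_err, rng):
    """Answer every subset-count query with integer noise in [-max_err, max_err]."""
    return [popcount(db & s) + rng.randint(-max_err, max_err) for s in range(1 << n)]

def reconstruct(answers, n, max_err):
    """Return the first candidate database consistent with every noisy answer."""
    for cand in range(1 << n):
        if all(abs(popcount(cand & s) - a) <= max_err for s, a in enumerate(answers)):
            return cand
    raise ValueError("no consistent candidate")  # unreachable: the true db is consistent

n, max_err = 10, 1                      # assumed toy parameters
rng = random.Random(0)
secret = rng.getrandbits(n)             # the private bits, packed into an integer
answers = noisy_answers(secret, n, max_err, rng)
guess = reconstruct(answers, n, max_err)
wrong_bits = popcount(secret ^ guess)   # Hamming distance between truth and guess
print(f"adversary is wrong on {wrong_bits} of {n} bits")
```

The attack is exponential in n and only meant to make the statement concrete; the cited papers give efficient attacks that need far fewer queries.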

Page 12:

Anonymization (aka De-Identification)

Remove “personally identifying information”

Name | sex | DOB | zip | symptoms | previous admissions | medications | family history | …

?

Page 14:

William Weld’s Medical Records

[Venn diagram of two overlapping data sets]
HMO data: ethnicity, visit date, diagnosis, procedure, medication, total charge
Voter registration data: name, address, date reg., party affiliation, last voted
In both: ZIP, birthdate, sex

Can “name” by (zip, birth, sex)

Sweeney97
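
The linkage in the diagram amounts to a join on the shared fields. A minimal sketch, assuming two small in-memory tables; every record, name, and value below is invented for illustration, and `link` is a hypothetical helper.

```python
# Hypothetical records; none of these values come from real data.
voter_rows = [
    {"name": "Alice Example", "zip": "02138", "birthdate": "1950-01-02", "sex": "F"},
    {"name": "Bob Example", "zip": "02138", "birthdate": "1945-07-31", "sex": "M"},
    {"name": "Carol Example", "zip": "02139", "birthdate": "1945-07-31", "sex": "F"},
]
hmo_rows = [
    {"zip": "02138", "birthdate": "1945-07-31", "sex": "M", "diagnosis": "diagnosis A"},
    {"zip": "02139", "birthdate": "1960-03-15", "sex": "F", "diagnosis": "diagnosis B"},
]

QUASI_IDENTIFIERS = ("zip", "birthdate", "sex")

def link(voters, medical, keys=QUASI_IDENTIFIERS):
    """Attach a name to each medical row whose key matches exactly one voter."""
    by_key = {}
    for v in voters:
        by_key.setdefault(tuple(v[k] for k in keys), []).append(v["name"])
    linked = []
    for m in medical:
        names = by_key.get(tuple(m[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the "anonymous" record
            linked.append((names[0], m["diagnosis"]))
    return linked

reidentified = link(voter_rows, hmo_rows)
print(reidentified)
```

Removing the name column from the medical table does nothing here: the three remaining fields already act as a name for anyone who holds the voter list.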

Page 15:

Anonymization (aka De-Identification)

Remove “personally identifying information”

Name | sex | DOB | zip | symptoms | previous admissions | medications | family history | …

??

Page 18:

Can “name” by 3 (title, approx date) pairs

NarayananShmatikov08

Page 19:

Re-ID Not the Only Worry

[Diagram: k-anonymity → Algs (algorithms satisfying definition) → Break Definition → l-diversity → Algs → Break Definition → m-invariance, t-closeness]

SamaratiSweeney98, MachanavajjhalaGehrkeKiferVenkitasubramaniam06, XiaoTao07, LiLiVenkatasubramanian07
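
One well-known way the first definition breaks, not spelled out on the slide, is homogeneity: every record in a group can share the same sensitive value, so membership in the group reveals it. A small sketch under that assumption; the table, column layout, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical generalized table: (zip prefix, age range) are the quasi-identifiers,
# the last column is the sensitive attribute.
rows = [
    ("021**", "40-49", "condition A"),
    ("021**", "40-49", "condition A"),
    ("021**", "50-59", "condition A"),
    ("021**", "50-59", "condition B"),
]

def group_by_quasi_identifier(table):
    """Map each quasi-identifier combination to its list of sensitive values."""
    groups = defaultdict(list)
    for *quasi, sensitive in table:
        groups[tuple(quasi)].append(sensitive)
    return groups

def k_anonymity(table):
    """Size of the smallest group sharing the same quasi-identifier values."""
    return min(len(g) for g in group_by_quasi_identifier(table).values())

def distinct_l_diversity(table):
    """Smallest number of distinct sensitive values within any group."""
    return min(len(set(g)) for g in group_by_quasi_identifier(table).values())

print(k_anonymity(rows), distinct_l_diversity(rows))
```

The table is 2-anonymous, yet anyone known to be in the ("021**", "40-49") group is known to have condition A, which is the gap l-diversity was proposed to close.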

Page 20:

Culprit: Diverse Background Info

[Figure labels: 2014; 2013; 2012; Voter; Weld: M, DOB, zip; IMDb]

Page 21:

Billing for Targeted Advertisements

[Korolova12]

Page 22:

Product Recommendations

X’s preferences influence Y’s experience

Combining evolving similar items lists with a little knowledge (from your blog) of what you bought, an adversary can infer purchases you did not choose to publicize

CalandrinoKilzerNarayananFeltenShmatikov11

[Figure labels: People who bought this also bought…; Blog]

Page 23:

SNP: Single Nucleotide (A,C,G,T) polymorphism

CT

TT

SNP statistics of Case Group

“Can” Test Case Group Membership using target’s DNA and HapMap

GWAS Statistics

Homer+08

Page 24:

Culprit: Diverse Background Info

[Figure labels: HapMap; 2014; 2013; 2012; Voter; Weld: M, DOB, zip; IMDb; People who bought this…; Blog; Billing]

Page 27:

How Should We Approach Privacy?

“Computer science got us into this mess, can computer science get us out of it?” (Sweeney, 2012)

Complexity of this type requires a mathematically rigorous theory of privacy and its loss.
We cannot discuss tradeoffs between privacy and statistical utility without a measure that captures cumulative harm over multiple uses.
Other fields (economics, ethics, policy) cannot be brought to bear without a “currency,” or measure of privacy, with which to work.

Page 28:

Useful Databases that Teach

Database teaches that smoking causes cancer.
Smoker S’s insurance premiums rise.
Premiums rise even if S not in database!

Learning that smoking causes cancer is the whole point.
Smoker S enrolls in a smoking cessation program.

Differential privacy: limit harms to the teachings, not participation

The outcome of any analysis is essentially equally likely, independent of whether any individual joins, or refrains from joining, the database.

Page 29:

Useful Databases that Teach

The likelihood of any possible harm to ME is essentially independent of whether I join, or refrain from joining, the database.

Page 30:

Useful Databases that Teach

High premiums, busted, purchases revealed to co-worker…
Essentially equally likely when I’m in as when I’m out

Page 31:

Differential Privacy [D., McSherry, Nissim, Smith ’06]

$M$ gives $\varepsilon$-differential privacy if for all pairs of data sets $x, y$ differing in one element, and all subsets $S$ of possible outputs

\[ \Pr[M(x) \in S] \le e^{\varepsilon} \, \Pr[M(y) \in S] \]

Randomness introduced by $M$

If a bad event is very unlikely when I’m not in dataset ($x$) then it is still very unlikely when I am ($y$)

Impossible to know the actual probabilities of bad events.
Can still control change in risk due to joining the database.

Page 32:

Differential Privacy [D., McSherry, Nissim, Smith ’06]

$M$ gives $\varepsilon$-differential privacy if for all pairs of data sets $x, y$ differing in one element, and all subsets $S$ of possible outputs

\[ \Pr[M(x) \in S] \le e^{\varepsilon} \, \Pr[M(y) \in S] \]

“Privacy Loss”

If a bad event is very unlikely when I’m not in dataset ($x$) then it is still very unlikely when I am ($y$)

Impossible to know the actual probabilities of bad events.
Can still control change in risk due to joining the database.
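
For a mechanism with a finite output space the definition can be checked exhaustively. The sketch below does so for randomized response on a single bit, a mechanism chosen here for illustration and not named in the slides: it reports the true bit with probability 0.75 and the flipped bit otherwise. The function names and the 0.75 parameter are assumptions.

```python
import math
from itertools import chain, combinations

def randomized_response(bit, p_truth=0.75):
    """Output distribution of a mechanism M that reports `bit` truthfully w.p. p_truth."""
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}

def subsets(outputs):
    """All non-empty subsets S of the output space."""
    return chain.from_iterable(combinations(outputs, r) for r in range(1, len(outputs) + 1))

def smallest_epsilon(dist_x, dist_y):
    """Smallest eps with Pr[M(x) in S] <= e^eps * Pr[M(y) in S] for all S, both directions."""
    worst = 0.0
    for s in subsets(sorted(dist_x)):
        px = sum(dist_x[o] for o in s)
        py = sum(dist_y[o] for o in s)
        worst = max(worst, math.log(px / py), math.log(py / px))
    return worst

# Two data sets differing in one element: the single respondent's bit is 0 or 1.
eps = smallest_epsilon(randomized_response(0), randomized_response(1))
print(f"epsilon = {eps:.4f}, ln 3 = {math.log(3):.4f}")
```

The worst case is a single-output set, where the ratio of probabilities is 0.75 / 0.25 = 3, so this mechanism gives (ln 3)-differential privacy whatever the adversary knows about the respondent.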

Page 33:

Differential Privacy

Nuanced measure of privacy loss
Captures cumulative harm over multiple uses, multiple databases

Adversary’s background knowledge is irrelevant
Immune to re-identification attacks, etc.

“Programmable”
Construct complicated private analyses from simple private building blocks
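
The “cumulative” and “programmable” points both rest on composition. Under basic sequential composition, running analyses with parameters ε1, …, εk on the same data is (ε1 + … + εk)-differentially private, so losses can be tracked like spending. A minimal sketch of such a tracker; the `PrivacyBudget` class is hypothetical and not taken from any library.

```python
class PrivacyBudget:
    """Tracks cumulative privacy loss under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        """Record one eps-DP analysis; refuse it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total_epsilon - self.spent  # remaining budget

budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.25)              # first private query
budget.spend(0.25)              # second private query
remaining = budget.spend(0.5)   # third query uses up the rest
print(f"spent {budget.spent}, remaining {remaining}")
```

Simple addition is the most conservative accounting; tighter composition bounds exist but are outside what the slide states.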

Page 34:

Recall: Fundamental Law

“Overly accurate” estimates of “too many” statistics are blatantly non-private

DinurNissim03; DworkMcSherryTalwar07; DworkYekhanin08; De12; MuthukrishnanNikolov12; …

Page 35:

Answer Only Questions Asked

[Diagram: data analysts send queries q1, q2, q3 to the database curator C, which returns answers a1, a2, a3]

Page 37:

Intuition

Want to compute $f(x)$
Adding one person’s data pulls $f(x)$ to $f(y)$
Add random noise to obscure the difference $f(x)$ vs $f(y)$

Algorithms, geometry, learning theory, complexity theory, cryptography, statistics, machine learning, programming languages, verification, databases, economics, …
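
One standard way to realize this intuition is the Laplace mechanism from the D., McSherry, Nissim, Smith paper cited earlier: add Laplace noise with scale equal to the sensitivity (the most one person can change the answer) divided by ε. The slide does not name a mechanism, so treat this as an assumed instance; the counting query, the toy data, and the function names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Noisy counting query: one person changes the count by at most 1 (sensitivity 1)."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
smokers = [True, False, True, True, False]   # hypothetical database, one bit per person
noisy = private_count(smokers, lambda is_smoker: is_smoker, epsilon=0.5, rng=rng)
print(f"noisy count: {noisy:.2f} (true count is 3)")
```

Smaller ε means a larger noise scale, so the two neighbouring answers become harder to tell apart at the cost of accuracy.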

Page 38:

Not a Panacea

Fundamental Law of Information Recovery still holds

Page 39:

Challenge: The Meaning of Loss

Sometimes the theory gives exactly the right answer
Large loss in differential privacy translates to “obvious” real life privacy breach, under circumstances known to be plausible

Other times?
Do all large losses translate to such realizable privacy breaches, or is the theory too pessimistic?

Page 40:

Policy Recommendation

Publish all Epsilons!
Penalize when $\varepsilon = \infty$

Combines motivation for data breach notification statutes and environmental laws requiring disclosures of toxic releases with an incentive to start using (minimal) differential privacy

DworkMulligan14