Page 1

Privacy-preserving Release of Statistics: Differential Privacy

Giulia Fanti
Slides by Anupam Datta

CMU

Fall 2019

18734: Foundations of Privacy

Page 2

Administrative Stuff

• HW2 due tonight at midnight on Gradescope
  – Upload PDF with everything except AdFisher code and logs to Gradescope
  – Upload AdFisher code and logs to Canvas

• Note on Piazza use


Page 3

Quiz

• On Canvas


Page 4

Privacy-Preserving Statistics: Non-Interactive Setting


Goals:
• Accurate statistics (low noise)
• Preserve individual privacy (what does that mean?)

[Diagram] Individuals x1…xn contribute records to a database D maintained by a trusted curator (e.g., census data, health data, network data). The curator adds noise, samples, generalizes, or suppresses, and releases a sanitized database D’ to the analyst.

Page 5

Some possible approaches

• Anonymize data
  – Re-identification, information amplification

• Summary statistics

– Differencing attack (illustrated in the table and sketch below)


Name    Age
Alice   10
Bob     50
Carol   40

Summary statistic: Mean age = 33.33

Page 6

Privacy-Preserving Statistics: Interactive Setting

Goals:
• Accurate statistics (low noise)
• Preserve individual privacy (what does that mean?)

[Diagram] Individuals x1…xn contribute records to a database D maintained by a trusted curator (e.g., census data, health data, network data). The analyst sends a query f (e.g., “# individuals with salary > $30K”) and receives f(D) + noise.


Page 7

Classical Intuition for Privacy

• “If the release of statistics S makes it possible to determine the value [of private information] more accurately than is possible without access to S, a disclosure has taken place.” [Dalenius 1977]
  – Privacy means that anything that can be learned about a respondent from the statistical database can be learned without access to the database

• Similar to semantic security of encryption


Page 8

Impossibility Result

• “Theorem”: For any reasonable definition of breach, if the sanitized database contains information about the database, then there exists an adversary and an auxiliary information generator that causes a breach with some nontrivial probability.

• Example
  – Terry Gross is two inches shorter than the average Lithuanian woman
  – DB allows computing average height of a Lithuanian woman
  – This DB breaks Terry Gross’s privacy according to this definition… even if her record is not in the database!

Dwork and Naor. On the Difficulties of Disclosure Prevention in Statistical Databases or The Case for Differential Privacy. 2016

Page 9

Takeaway message

• Our privacy definitions must account for auxiliary information.

• Recall: Netflix paper


Page 10

Differential Privacy: Idea

Released statistic is about the same if any individual’s record is removed from the database


[Dwork, McSherry, Nissim, Smith 2006]

Page 11

An Information Flow Idea

Changing input databases in a specific way changes output statistic by a small amount


Page 12

Not Absolute Confidentiality

Does not guarantee that Terry Gross’s height won’t be learned by the adversary


Page 13

Differential Privacy: Definition

Randomized sanitization function κ has ε-differential privacy if for all data sets D1 and D2 differing by at most one element and all subsets S of the range of κ,

Pr[κ(D1) ∈ S] ≤ e^ε · Pr[κ(D2) ∈ S]

The answer to the query “# individuals with salary > $30K” is in the range [100, 110] with approximately the same probability in D1 and D2.
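To see the inequality in action, here is a minimal sketch of randomized response on a single bit (an illustrative mechanism I am adding for intuition; it is not part of these slides). Treating the two neighboring databases as bit = 0 and bit = 1, every output occurs with probabilities within a factor of e^ε of each other, exactly as the definition requires:

```python
import math
import random

def randomized_response(bit, eps):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return bit if random.random() < p_truth else 1 - bit

eps = 0.5
p = math.exp(eps) / (1.0 + math.exp(eps))
# Pr[output = 1 | bit = 1] / Pr[output = 1 | bit = 0] = p / (1 - p) = e^eps,
# and the same ratio holds for output = 0 with the roles of the bits swapped.
print(p / (1.0 - p), math.exp(eps))
```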


Page 14

Check your understanding

Randomized sanitization function κ has ε-differential privacy if for all data sets D1 and D2 differing by at most one element and all subsets S of the range of κ,

Pr[κ(D1) ∈ S] ≤ e^ε · Pr[κ(D2) ∈ S]

• What does differential privacy mean when ε = 0?
• What range of values can ε take?


Page 15

Achieving Differential Privacy: Interactive Setting

How much and what type of noise should be added?

[Diagram] The user asks the curator “Tell me f(D)”; the curator, holding database D = x1…xn, answers with f(D) + noise.

Page 16

Example: Noise Addition


Slide: Adam Smith

Page 17

Global Sensitivity


Slide: Adam Smith
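The body of this slide is a figure in the original PDF. For reference (and because the exercises on the next two pages use it), the standard definition from Dwork, McSherry, Nissim, and Smith (2006) is that the global sensitivity of a query f is the largest change in f between neighboring databases:

$$
GS_f \;=\; \max_{\substack{D_1, D_2 \text{ differing in}\\ \text{at most one element}}} \lVert f(D_1) - f(D_2) \rVert_1
$$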

Page 18

Exercise


• Function f: # individuals with salary > $30K
• Global Sensitivity of f = ?

• Answer: 1

Page 19

Exercise 2

• Function f(x) = (1/n) ∑_{i=1}^{n} x_i, where x_i ∈ S

• Global Sensitivity of f = ?

• Answer: max(S) / n
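A brief justification of the stated answer (my reasoning, under the assumption that S ⊆ [0, max S] and that neighboring databases differ in one record): changing a single x_i moves the sum ∑ x_i by at most max(S), so the mean moves by at most max(S) / n.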


Page 20

Background on Probability


Page 21

Continuous Probability Distributions

• Probability density function (PDF), f_X

• Example distributions
  – Normal (Gaussian), exponential, Laplace


Page 22

Laplace Distribution


A Laplace random variable with mean μ and scale b has PDF

f(x | μ, b) = (1 / 2b) · exp(−|x − μ| / b)

Mean = μ
Variance = 2b²

We use Lap(b) to denote the 0-mean version of this (μ = 0).

Source: Wikipedia
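A quick numerical check of the stated mean and variance (my own sketch, not from the slides), using NumPy's built-in Laplace sampler:

```python
import numpy as np

b = 2.0  # an arbitrary scale parameter for the demonstration
samples = np.random.laplace(loc=0.0, scale=b, size=100_000)  # draws from Lap(b)

print(samples.mean())             # close to 0, the mean mu = 0
print(samples.var(), 2 * b ** 2)  # empirical variance close to 2b^2
```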

Page 23

Achieving Differential Privacy


Page 24

Laplace Mechanism


Slide: Adam Smith
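The body of this slide is a figure in the original PDF. The standard construction (Dwork, McSherry, Nissim, Smith 2006) releases f(D) + Lap(GS_f / ε), i.e., the true answer plus Laplace noise scaled to the query's global sensitivity; this satisfies ε-differential privacy. A minimal sketch (the salary values are made up for illustration):

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, eps):
    """Release true_answer + Lap(sensitivity / eps).

    With sensitivity = GS_f this is the standard Laplace mechanism,
    which satisfies eps-differential privacy.
    """
    return true_answer + np.random.laplace(loc=0.0, scale=sensitivity / eps)

# Counting query from the earlier exercise: global sensitivity 1.
salaries = [25_000, 42_000, 31_000, 58_000]
true_count = sum(s > 30_000 for s in salaries)  # 3
print(laplace_mechanism(true_count, sensitivity=1.0, eps=0.5))
```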

Page 25

Laplace Mechanism: Proof Idea


Work with your neighbors to prove the Theorem.

Hint: Compute the ratio Pr[κ(D1) = t] / Pr[κ(D2) = t] for neighboring databases D1 and D2.
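One standard way to complete the hint for a real-valued query f (a sketch under the usual assumptions, not the slide's own worked solution): let κ(D) = f(D) + Lap(b) with b = GS_f / ε, and compare the output densities at any point t:

$$
\frac{p_{\kappa(D_1)}(t)}{p_{\kappa(D_2)}(t)}
= \frac{\exp\big(-|t - f(D_1)|/b\big)}{\exp\big(-|t - f(D_2)|/b\big)}
= \exp\!\left(\frac{|t - f(D_2)| - |t - f(D_1)|}{b}\right)
\le \exp\!\left(\frac{|f(D_1) - f(D_2)|}{b}\right)
\le e^{\varepsilon}.
$$

The first inequality is the triangle inequality; the second uses |f(D_1) − f(D_2)| ≤ GS_f and the choice b = GS_f / ε. Integrating the density bound over any set S recovers exactly the definition from Page 13.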