Differential Privacy without Sensitivity
Kentaro Minami (The University of Tokyo, Information Science and Technology, D1)
2017/1/19 @ NIPS 2016 reading group
Overview

Differential privacy (DP)
• Degrees of privacy protection [Dwork+06]

Gibbs posterior
• A generalization of the Bayesian posterior

Contribution
• We proved (ε, δ)-DP of the Gibbs posterior without boundedness of the loss
Outline

1. Differential privacy
2. Differentially private learning
   1. Background
   2. Main result: Differential privacy of Gibbs posterior [Minami+16]
3. Applications
   1. Logistic regression
   2. Posterior approximation method
Privacy constraint in ML & statistics

[Figure: users' data X_1, X_2, …, X_n are sent to a curator, who releases a statistic]

In many applications of ML & statistics, the data contain users' personal information.

Problem: Calculate a statistic of interest privately (TBD.)
Adversarial formulation of privacy

Example: Mean of a binary-valued query (Yes: 1, No: 0)

[Figure: the curator releases a noisy mean of X_1, X_2, …, X_n; an adversary with auxiliary information compares it against the adjacent dataset X_1', X_2, …, X_n]

• Small noise: adding noise need not deteriorate the accuracy
• Large noise: privacy preservation
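The noise-addition idea can be sketched in a few lines of Python (the function name and the choice of dataset are illustrative; the scale 1/(nε) comes from the fact that changing one respondent moves the mean by at most 1/n):

```python
import numpy as np

def private_mean(bits, epsilon, rng):
    # Changing one respondent's bit moves the mean by at most 1/n
    # (the sensitivity), so Laplace noise with scale 1/(n * epsilon)
    # yields an epsilon-DP release of the mean [Dwork+06].
    n = len(bits)
    return float(np.mean(bits)) + rng.laplace(scale=1.0 / (n * epsilon))
```

A larger ε (weaker privacy) shrinks the noise scale, recovering the accuracy/privacy trade-off on this slide.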
Differential privacy

Idea:
1. Generate a random output from a data-dependent distribution
2. Two "adjacent" datasets differing in a single individual should be statistically indistinguishable

[Figure: the output distributions on (X_1, X_2, …, X_n) and (X_1', X_2, …, X_n) are close in the sense of a "statistical distance"]
Differential privacy

Def: Differential Privacy [Dwork+06]
• Privacy parameters: ε ≥ 0, δ ∈ [0, 1]
• A randomized algorithm A satisfies (ε, δ)-differential privacy if
  1. for any adjacent datasets D and D', and
  2. for any set S of outputs,
  the following inequality holds:

      P(A(D) ∈ S) ≤ e^ε · P(A(D') ∈ S) + δ
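The inequality can be verified exactly for a toy mechanism. A randomized-response sketch (the flip probability p = 0.75 is an illustrative choice, not from the talk):

```python
import math
import random

def randomized_response(bit, p=0.75):
    # Report the true bit with probability p, the flipped bit otherwise.
    return bit if random.random() < p else 1 - bit

# Here P(output = 1 | X = 1) = p and P(output = 1 | X = 0) = 1 - p, so the
# mechanism satisfies (epsilon, 0)-DP with epsilon = ln(p / (1 - p)).
p = 0.75
epsilon = math.log(p / (1 - p))
# The DP inequality holds (tightly) for both output sets {0} and {1}:
assert p <= math.exp(epsilon) * (1 - p) + 1e-12
assert (1 - p) <= math.exp(epsilon) * p
```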
Interpretation of DP

• DP prevents identification with statistical significance
• e.g. an adversary cannot construct a high-power test for the presence of a single individual at the 5% significance level
• See also:
DP and statistical learning

Example: Linear classification
• Find a differentially private distribution of hyperplanes that minimizes the expected classification error
Differentially private learning

Question: What kind of random estimators should we use?

1. Noise addition to a deterministic estimator
   • e.g. maximum likelihood estimator + noise
2. Modification of the Bayesian posterior (this work)
Gibbs posterior

• Bayesian posterior: p(θ | x_1, …, x_n) ∝ exp( Σ_{i=1}^n log p(x_i | θ) ) π(θ)
• Introduce a "scale parameter" β > 0

A natural data-dependent distribution in statistics & ML:

    G_β(θ | D) ∝ exp( −β Σ_{i=1}^n ℓ(θ, x_i) ) π(θ)

    ℓ: loss function    π: prior distribution    β: inverse temperature

• Contains the Bayesian posterior (β = 1, ℓ = negative log-likelihood)
• Important in PAC-Bayes theory [Catoni07][Zhang06]
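The definition translates directly into code (a minimal sketch; the function and argument names are mine):

```python
def gibbs_log_density(theta, data, loss, log_prior, beta):
    """Unnormalized log-density of the Gibbs posterior
       G_beta(theta | D) ∝ exp(-beta * sum_i loss(theta, x_i)) * prior(theta).

    beta = 1 with a negative log-likelihood loss recovers the Bayesian
    posterior; beta -> 0 flattens the data term, so the density
    approaches the prior."""
    return -beta * sum(loss(theta, x) for x in data) + log_prior(theta)
```

For example, with a squared loss and a standard Gaussian log-prior, setting beta = 0 returns exactly the log-prior.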
Gibbs posterior

Problem
• If β → 0, the Gibbs posterior G_β is flattened and gets close to the prior
• Is DP satisfied if we choose β sufficiently small?

Answer
Yes, if…
• ℓ is bounded (previously known)
• ∇ℓ is bounded, i.e. ℓ is Lipschitz (this work)
The exponential mechanism

Theorem [MT07]
An algorithm that draws θ from a distribution

    p(θ | D) ∝ exp( −(ε / (2Δ_L)) · L(θ, D) ),   L(θ, D) = Σ_{i=1}^n ℓ(θ, x_i)

satisfies ε-DP.

• This is the Gibbs posterior (with a flat prior) with β = ε / (2Δ_L)
• β has to satisfy β ≤ ε / (2Δ_L)
• Δ_L: sensitivity (TBD.)
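Over a finite candidate set, the exponential mechanism is just a weighted random draw (a sketch; the candidate grid, loss, and sensitivity value in the test are illustrative):

```python
import numpy as np

def exponential_mechanism(candidates, data, loss, sensitivity, epsilon, rng):
    # Draw theta with probability ∝ exp(-epsilon * L(theta, D) / (2 * sensitivity)),
    # where L(theta, D) = sum_i loss(theta, x_i).  This is epsilon-DP [MT07],
    # i.e. a Gibbs posterior with flat prior and beta = epsilon / (2 * sensitivity).
    scores = np.array([-epsilon * sum(loss(t, x) for x in data) / (2.0 * sensitivity)
                       for t in candidates])
    probs = np.exp(scores - scores.max())   # subtract max for numerical stability
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]
```

Note the mechanism only makes sense when `sensitivity` is finite, which is exactly the condition the next slides examine.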
Sensitivity

Definition: Sensitivity of L

    Δ_L := sup_{D ≈ D'} ‖ L(·, D) − L(·, D') ‖_∞

    (the supremum is taken over adjacent datasets D ≈ D')

• The exponential mechanism works if Δ_L < ∞ !
Sensitivity

Theorem [Wang+15]

[Figure: cases (A) and (B), illustrating the sensitivity over the parameter set A]
Loss function that breaks the sensitivity condition

• Logistic loss: ℓ(θ, (x, y)) = log(1 + exp(−y θᵀx)),  y ∈ {+1, −1}
• The max difference of losses, |ℓ(θ, (z, +1)) − ℓ(θ, (z, −1))|, grows toward +∞ as the radius M of the parameter space grows, so Δ_L = ∞ on an unbounded parameter space

[Figure: ℓ(θ, (z, +1)) and ℓ(θ, (z, −1)) plotted over θ ∈ [−M, M]; the gap between them diverges to +∞]
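The unbounded gap is easy to verify numerically for the 1-D logistic loss: log(1 + e^{−θz}) − log(1 + e^{θz}) = −θz, which diverges with θ (a small sketch):

```python
import math

def logistic_loss(theta, x, y):
    # log(1 + exp(-y * theta * x)), written stably for large |arguments|
    t = y * theta * x
    return math.log1p(math.exp(-abs(t))) + max(-t, 0.0)

# The gap between the two labels' losses at the same point z = 1 is exactly
# -theta, so it is unbounded in theta: no finite sensitivity exists.
for theta in [1.0, 10.0, 100.0]:
    gap = logistic_loss(theta, 1.0, +1) - logistic_loss(theta, 1.0, -1)
    assert abs(gap - (-theta)) < 1e-9
```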
We need differential privacy
without sensitivity!
From bounded to Lipschitz

• In the example of the logistic loss, the 1st derivative ∇_θ ℓ is bounded
• The Lipschitz constant is not influenced by the size of the parameter space
Main theorem

Theorem [Minami+16]

Assumptions:
1. For all x, the loss ℓ(·, x) is L-Lipschitz and convex
2. The prior π is log-strongly-concave, i.e. −log π is m-strongly convex

Then the Gibbs posterior G_β(θ | D) satisfies (ε, δ)-DP if β is chosen small enough — a bound (Eq. (1) of the paper) depending only on ε, δ, the Lipschitz constant L, and the strong-convexity parameter m.

Independent of the sensitivity!
Example: Logistic loss

• Logistic loss: ℓ(θ, (x, y)) = log(1 + exp(−y θᵀx))
• Gaussian prior: π(θ) ∝ exp( −(λ/2) ‖θ‖² )
• The Gibbs posterior is given by:

    G_β(θ | D) ∝ exp( −β Σ_{i=1}^n log(1 + exp(−y_i θᵀx_i)) − (λ/2) ‖θ‖² )

• G_β satisfies (ε, δ)-DP if β is chosen as in (1): the logistic loss is Lipschitz and convex, and the Gaussian prior is log-strongly-concave
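The sampling methods on the next slides only need the gradient of the Gibbs potential U(θ) = β Σ_i log(1 + exp(−y_i ⟨x_i, θ⟩)) + (λ/2)‖θ‖², so that G_β ∝ exp(−U). A sketch (function and variable names are mine):

```python
import numpy as np

def grad_potential(theta, X, y, beta, lam):
    # Gradient of U(theta) = beta * sum_i log(1 + exp(-y_i <x_i, theta>))
    #                        + (lam / 2) * ||theta||^2.
    # Each data term contributes -y_i * x_i * sigmoid(-y_i <x_i, theta>).
    margins = y * (X @ theta)
    sig = 1.0 / (1.0 + np.exp(margins))     # = sigmoid(-margins)
    return -beta * (X.T @ (y * sig)) + lam * theta
```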
Langevin Monte Carlo method

• In practice, sampling from the Gibbs posterior can be a computationally hard problem
• Some approximate sampling methods are used (e.g. MCMC, VB)
Langevin Monte Carlo method

• Langevin Monte Carlo (LMC) = gradient descent + Gaussian noise, targeting the density ∝ exp(−U(θ)):

    GD:   θ_{k+1} = θ_k − h ∇U(θ_k)
    LMC:  θ_{k+1} = θ_k − h ∇U(θ_k) + √(2h) ξ_k,   ξ_k ~ N(0, I)
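The LMC update is one line per iteration (a minimal sketch targeting ∝ exp(−U); the step size and iteration count in the example are illustrative, and this unadjusted variant only samples the target approximately):

```python
import numpy as np

def lmc_sample(grad_U, theta0, step, n_iter, rng):
    """Unadjusted Langevin algorithm targeting the density ∝ exp(-U(theta)):
    a gradient-descent step plus Gaussian noise of variance 2*step."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        noise = rng.standard_normal(theta.shape)
        theta = theta - step * grad_U(theta) + np.sqrt(2.0 * step) * noise
    return theta
```

With U(θ) = θ²/2 (standard Gaussian target, ∇U(θ) = θ), many independent chains produce samples with mean ≈ 0 and variance ≈ 1.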
Langevin Monte Carlo method

• "Mixing-time" results have been derived for log-concave distributions [Dalalyan14][Durmus & Moulines15]
• LMC attains an approximation of the target within a prescribed accuracy after finitely many iterations
• The number of iterations is polynomial in the dimension and the inverse accuracy
• I have a Privacy Preservation guarantee
• I have an Approximate Posterior
• (Ah…)
Privacy Preserving Approximate Posterior (PPAP)

• We can prove (ε, δ)-DP of the LMC-approximated Gibbs posterior

Proposition [Minami+16]
• Assume that ℓ and π satisfy the assumptions of the Main Theorem.
• Assume additionally that ℓ(·, x) is smooth (Lipschitz gradient) for every x.
• Then, after finitely many iterations, the output of the LMC satisfies (ε, δ)-DP.
Summary

1. Differentially private learning = Differential privacy + Statistical learning
2. We developed a new method to prove (ε, δ)-DP for Gibbs posteriors without "sensitivity"
   • Applicable to Lipschitz & convex losses
   • (+) Guarantee for an approximate sampling method (LMC)

Thank you!