Differential Privacy without Sensitivity
Kentaro Minami (The University of Tokyo, Information Science and Technology, D1)
2017/1/19 @ NIPS 2016 reading group
Overview

Differential privacy (DP)
• Degrees of privacy protection [Dwork+06]

Gibbs posterior
• A generalization of the Bayesian posterior

Contribution
• We proved (ε, δ)-DP of the Gibbs posterior without boundedness of the loss
Outline

1. Differential privacy
2. Differentially private learning
   1. Background
   2. Main result: Differential privacy of Gibbs posterior [Minami+16]
3. Applications
   1. Logistic regression
   2. Posterior approximation method
Privacy constraint in ML & statistics

[Figure: users' data X_1, X_2, …, X_n are sent to a curator, who releases a statistic]

In many applications of ML & statistics, the data contain users' personal information.

Problem: Calculate a statistic of interest privately (TBD.)
Adversarial formulation of privacy

Example: Mean of a binary-valued query (Yes: 1, No: 0)

[Figure: the curator releases a noisy mean of X_1, X_2, …, X_n; an adversary with auxiliary information compares it against the adjacent dataset X_1', X_2, …, X_n]

• Small noise: adding noise need not deteriorate the accuracy
• Large noise: privacy preservation
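The noise-addition idea can be sketched in a few lines of Python (the function name and the choice of dataset are illustrative; the scale 1/(nε) comes from the fact that changing one respondent moves the mean by at most 1/n):

```python
import numpy as np

def private_mean(bits, epsilon, rng):
    # Changing one respondent's bit moves the mean by at most 1/n
    # (the sensitivity), so Laplace noise with scale 1/(n * epsilon)
    # yields an epsilon-DP release of the mean [Dwork+06].
    n = len(bits)
    return float(np.mean(bits)) + rng.laplace(scale=1.0 / (n * epsilon))
```

A larger ε (weaker privacy) shrinks the noise scale, recovering the accuracy/privacy trade-off on this slide.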
Differential privacy

Idea:
1. Generate a random output from a data-dependent distribution
2. Two "adjacent" datasets differing in a single individual should be statistically indistinguishable

[Figure: the output distributions on (X_1, X_2, …, X_n) and (X_1', X_2, …, X_n) are close in the sense of a "statistical distance"]
Differential privacy

Def: Differential Privacy [Dwork+06]
• Privacy parameters: ε ≥ 0, δ ∈ [0, 1]
• A randomized algorithm A satisfies (ε, δ)-differential privacy if
  1. for any adjacent datasets D and D', and
  2. for any set S of outputs,
  the following inequality holds:

      P(A(D) ∈ S) ≤ e^ε · P(A(D') ∈ S) + δ
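The inequality can be verified exactly for a toy mechanism. A randomized-response sketch (the flip probability p = 0.75 is an illustrative choice, not from the talk):

```python
import math
import random

def randomized_response(bit, p=0.75):
    # Report the true bit with probability p, the flipped bit otherwise.
    return bit if random.random() < p else 1 - bit

# Here P(output = 1 | X = 1) = p and P(output = 1 | X = 0) = 1 - p, so the
# mechanism satisfies (epsilon, 0)-DP with epsilon = ln(p / (1 - p)).
p = 0.75
epsilon = math.log(p / (1 - p))
# The DP inequality holds (tightly) for both output sets {0} and {1}:
assert p <= math.exp(epsilon) * (1 - p) + 1e-12
assert (1 - p) <= math.exp(epsilon) * p
```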
Interpretation of DP

• DP prevents identification with statistical significance
• e.g. an adversary cannot construct a high-power test for the presence of a single individual at the 5% significance level
• See also:
DP and statistical learning

Example: Linear classification
• Find a differentially private distribution of hyperplanes that minimizes the expected classification error
Differentially private learning

Question: What kind of random estimators should we use?

1. Noise addition to a deterministic estimator
   • e.g. maximum likelihood estimator + noise
2. Modification of the Bayesian posterior (this work)
Gibbs posterior

• Bayesian posterior: p(θ | x_1, …, x_n) ∝ exp( Σ_{i=1}^n log p(x_i | θ) ) π(θ)
• Introduce a "scale parameter" β > 0

A natural data-dependent distribution in statistics & ML:

    G_β(θ | D) ∝ exp( −β Σ_{i=1}^n ℓ(θ, x_i) ) π(θ)

    ℓ: loss function    π: prior distribution    β: inverse temperature

• Contains the Bayesian posterior (β = 1, ℓ = negative log-likelihood)
• Important in PAC-Bayes theory [Catoni07][Zhang06]
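The definition translates directly into code (a minimal sketch; the function and argument names are mine):

```python
def gibbs_log_density(theta, data, loss, log_prior, beta):
    """Unnormalized log-density of the Gibbs posterior
       G_beta(theta | D) ∝ exp(-beta * sum_i loss(theta, x_i)) * prior(theta).

    beta = 1 with a negative log-likelihood loss recovers the Bayesian
    posterior; beta -> 0 flattens the data term, so the density
    approaches the prior."""
    return -beta * sum(loss(theta, x) for x in data) + log_prior(theta)
```

For example, with a squared loss and a standard Gaussian log-prior, setting beta = 0 returns exactly the log-prior.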
Gibbs posterior

Problem
• If β → 0, the Gibbs posterior G_β is flattened and gets close to the prior
• Is DP satisfied if we choose β sufficiently small?

Answer
Yes, if…
• ℓ is bounded (previously known)
• ∇ℓ is bounded, i.e. ℓ is Lipschitz (this work)
The exponential mechanism

Theorem [MT07]
An algorithm that draws θ from a distribution

    p(θ | D) ∝ exp( −(ε / (2Δ_L)) · L(θ, D) ),   L(θ, D) = Σ_{i=1}^n ℓ(θ, x_i)

satisfies ε-DP.

• This is the Gibbs posterior (with a flat prior) with β = ε / (2Δ_L)
• β has to satisfy β ≤ ε / (2Δ_L)
• Δ_L: sensitivity (TBD.)
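Over a finite candidate set, the exponential mechanism is just a weighted random draw (a sketch; the candidate grid, loss, and sensitivity value in the test are illustrative):

```python
import numpy as np

def exponential_mechanism(candidates, data, loss, sensitivity, epsilon, rng):
    # Draw theta with probability ∝ exp(-epsilon * L(theta, D) / (2 * sensitivity)),
    # where L(theta, D) = sum_i loss(theta, x_i).  This is epsilon-DP [MT07],
    # i.e. a Gibbs posterior with flat prior and beta = epsilon / (2 * sensitivity).
    scores = np.array([-epsilon * sum(loss(t, x) for x in data) / (2.0 * sensitivity)
                       for t in candidates])
    probs = np.exp(scores - scores.max())   # subtract max for numerical stability
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]
```

Note the mechanism only makes sense when `sensitivity` is finite, which is exactly the condition the next slides examine.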
Sensitivity

Definition: Sensitivity of L

    Δ_L := sup_{D ≈ D'} ‖ L(·, D) − L(·, D') ‖_∞

    (the supremum is taken over adjacent datasets D ≈ D')

• The exponential mechanism works if Δ_L < ∞ !
Sensitivity

Theorem [Wang+15]

[Figure: cases (A) and (B), illustrating the sensitivity over the parameter set A]
Loss function that breaks the sensitivity condition

• Logistic loss: ℓ(θ, (x, y)) = log(1 + exp(−y θᵀx)),  y ∈ {+1, −1}
• The max difference of losses, |ℓ(θ, (z, +1)) − ℓ(θ, (z, −1))|, grows toward +∞ as the radius M of the parameter space grows, so Δ_L = ∞ on an unbounded parameter space

[Figure: ℓ(θ, (z, +1)) and ℓ(θ, (z, −1)) plotted over θ ∈ [−M, M]; the gap between them diverges to +∞]
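The unbounded gap is easy to verify numerically for the 1-D logistic loss: log(1 + e^{−θz}) − log(1 + e^{θz}) = −θz, which diverges with θ (a small sketch):

```python
import math

def logistic_loss(theta, x, y):
    # log(1 + exp(-y * theta * x)), written stably for large |arguments|
    t = y * theta * x
    return math.log1p(math.exp(-abs(t))) + max(-t, 0.0)

# The gap between the two labels' losses at the same point z = 1 is exactly
# -theta, so it is unbounded in theta: no finite sensitivity exists.
for theta in [1.0, 10.0, 100.0]:
    gap = logistic_loss(theta, 1.0, +1) - logistic_loss(theta, 1.0, -1)
    assert abs(gap - (-theta)) < 1e-9
```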
We need differential privacy
without sensitivity!
From bounded to Lipschitz

• In the example of the logistic loss, the 1st derivative ∇_θ ℓ is bounded
• The Lipschitz constant is not influenced by the size of the parameter space
Main theorem

Theorem [Minami+16]

Assumptions:
1. For all x, the loss ℓ(·, x) is L-Lipschitz and convex
2. The prior π is log-strongly-concave, i.e. −log π is m-strongly convex

Then the Gibbs posterior G_β(θ | D) satisfies (ε, δ)-DP if β is chosen small enough — a bound (Eq. (1) of the paper) depending only on ε, δ, the Lipschitz constant L, and the strong-convexity parameter m.

Independent of the sensitivity!
Example: Logistic loss

• Logistic loss: ℓ(θ, (x, y)) = log(1 + exp(−y θᵀx))
• Gaussian prior: π(θ) ∝ exp( −(λ/2) ‖θ‖² )
• The Gibbs posterior is given by:

    G_β(θ | D) ∝ exp( −β Σ_{i=1}^n log(1 + exp(−y_i θᵀx_i)) − (λ/2) ‖θ‖² )

• G_β satisfies (ε, δ)-DP if β is chosen as in (1): the logistic loss is Lipschitz and convex, and the Gaussian prior is log-strongly-concave
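The sampling methods on the next slides only need the gradient of the Gibbs potential U(θ) = β Σ_i log(1 + exp(−y_i ⟨x_i, θ⟩)) + (λ/2)‖θ‖², so that G_β ∝ exp(−U). A sketch (function and variable names are mine):

```python
import numpy as np

def grad_potential(theta, X, y, beta, lam):
    # Gradient of U(theta) = beta * sum_i log(1 + exp(-y_i <x_i, theta>))
    #                        + (lam / 2) * ||theta||^2.
    # Each data term contributes -y_i * x_i * sigmoid(-y_i <x_i, theta>).
    margins = y * (X @ theta)
    sig = 1.0 / (1.0 + np.exp(margins))     # = sigmoid(-margins)
    return -beta * (X.T @ (y * sig)) + lam * theta
```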
Langevin Monte Carlo method

• In practice, sampling from the Gibbs posterior can be a computationally hard problem
• Some approximate sampling methods are used (e.g. MCMC, VB)
Langevin Monte Carlo method

• Langevin Monte Carlo (LMC) = gradient descent + Gaussian noise, targeting the density ∝ exp(−U(θ)):

    GD:   θ_{k+1} = θ_k − h ∇U(θ_k)
    LMC:  θ_{k+1} = θ_k − h ∇U(θ_k) + √(2h) ξ_k,   ξ_k ~ N(0, I)
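The LMC update is one line per iteration (a minimal sketch targeting ∝ exp(−U); the step size and iteration count in the example are illustrative, and this unadjusted variant only samples the target approximately):

```python
import numpy as np

def lmc_sample(grad_U, theta0, step, n_iter, rng):
    """Unadjusted Langevin algorithm targeting the density ∝ exp(-U(theta)):
    a gradient-descent step plus Gaussian noise of variance 2*step."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        noise = rng.standard_normal(theta.shape)
        theta = theta - step * grad_U(theta) + np.sqrt(2.0 * step) * noise
    return theta
```

With U(θ) = θ²/2 (standard Gaussian target, ∇U(θ) = θ), many independent chains produce samples with mean ≈ 0 and variance ≈ 1.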
Langevin Monte Carlo method

• "Mixing-time" results have been derived for log-concave distributions [Dalalyan14][Durmus & Moulines15]
• LMC attains an approximation of the target within a prescribed accuracy after finitely many iterations
• The number of iterations is polynomial in the dimension and the inverse accuracy
• I have a Privacy Preservation guarantee
• I have an Approximate Posterior
• (Ah…)
Privacy Preserving Approximate Posterior (PPAP)

• We can prove (ε, δ)-DP of the LMC-approximated Gibbs posterior

Proposition [Minami+16]
• Assume that ℓ and π satisfy the assumptions of the Main Theorem.
• Assume additionally that ℓ(·, x) is smooth (Lipschitz gradient) for every x.
• Then, after finitely many iterations, the output of the LMC satisfies (ε, δ)-DP.
Summary

1. Differentially private learning = Differential privacy + Statistical learning
2. We developed a new method to prove (ε, δ)-DP for Gibbs posteriors without "sensitivity"
   • Applicable to Lipschitz & convex losses
   • (+) Guarantee for an approximate sampling method (LMC)

Thank you!