Gaussian Differential Privacy with Applications to Deep Learning
Weijie Su
Wharton School, University of Pennsylvania
Big Brother is watching you! [1984, George Orwell]
Can anonymization preserve privacy?
The Netflix competition
• In 2006, Narayanan and Shmatikov demonstrated that
Netflix ratings + IMDb = De-anonymization
• The second Netflix competition was canceled
3 / 56 Weijie on GDP
Releasing summary statistics?
Genomic research often releases minor allele frequencies (MAFs), i.e., sample means
In 2008, Homer et al. shocked the genetics community by showing that MAFs are not private
What do we lose if we give up privacy? [WSJ ’13]
Peggy Noonan: A loss of privacy is a loss of something personal and intimate
Nat Hentoff: Privacy is an American constitutional liberty right
Is this our future?
From hypothesis testing to privacy
In 2006, Dwork, McSherry, Nissim, and Smith related privacy to hypothesis testing
Interpreting privacy via hypothesis testing
Two neighboring datasets:
S = {Anne, Jane, Ed, Bob} and S′ = {Eva, Jane, Ed, Bob}
Based on the output of an algorithm, perform the hypothesis test
H0 : true dataset is S vs H1 : true dataset is S′ (equivalently, H0 : Anne is in the dataset vs H1 : Eva is in the dataset)
• The privacy of Anne and Eva is preserved if this hypothesis test is difficult
• This is the essence of differential privacy (DP)
The impact of differential privacy
Google (Chrome), Apple (iOS 10+), Microsoft, U.S. Census Bureau [Dwork and Roth ’14; Erlingsson et al. ’14; Apple DP team ’17; Ding et al. ’17; Abowd ’16]
Test of time: the 2017 Gödel Prize
What’s new in this talk?
Collaborators
Jinshuo Dong (Penn CS) Aaron Roth (Penn CS)
A new privacy notion and the old one
f-differential privacy: this talk
• Interpreting privacy via hypothesis testing
• Privacy measure: trade-off between type I and type II errors
• Privacy functional parameter: f : [0, 1] → [0, 1]
• How to achieve: adding Gaussian noise
(ε, δ)-differential privacy: Dwork et al.
• Interpreting privacy via hypothesis testing
• Privacy measure: worst-case likelihood ratio
• Privacy parameters: ε > 0, 0 ≤ δ < 1
• How to achieve: adding Laplace noise
A very early preview of f-DP
The zoo of differential privacy
Informativeness Composition Subsampling
ε-DP
(ε, δ)-DP
Divergence based DPs
f-DP
Informativeness
Informative representation of privacy guarantees of algorithms?
• The two parameters in (ε, δ)-DP are not enough
• Rényi divergence is lossy (concentrated DP [Dwork, Rothblum ’16], zero-concentrated DP [Bun, Steinke ’16], truncated concentrated DP [Bun, Dwork, Rothblum, Steinke ’18], Rényi DP [Mironov ’17])
Composition
How does the overall privacy degrade under a sequence of private algorithms?
• The advanced composition theorem in (ε, δ)-DP is not accurate
Subsampling
How is privacy amplified with subsampling?
• Privacy amplification by subsampling is either inaccurate or complicated in divergence-based DPs
Outline
1. Introduction of f-DP
2. Informative representation of privacy
3. Composition and central limit theorems
4. Amplifying privacy via subsampling
5. Application to deep learning
Trade-off functions
H0 : true dataset is S vs H1 : true dataset is S′, abstracted as
H0 : P vs H1 : Q

For a rejection rule φ ∈ [0, 1], denote by αφ = EP[φ] the type I error and βφ = 1 − EQ[φ] the type II error

Definition
For two probability distributions P and Q, define the trade-off function T(P, Q) : [0, 1] → [0, 1] as

T(P, Q)(α) = infφ { βφ : αφ ≤ α }

• The infimum is achieved by the Neyman–Pearson lemma
• A function f is a trade-off function if and only if f is convex, continuous, non-increasing, and f(α) ≤ 1 − α for all α ∈ [0, 1]
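To make the definition concrete (an illustration of mine, not from the slides): for testing N(0, 1) against N(µ, 1), the Neyman–Pearson lemma says the optimal level-α test rejects when the observation exceeds t = Φ⁻¹(1 − α), so the type II error is Φ(t − µ). A minimal sketch in Python, assuming numpy and scipy are available:

```python
import numpy as np
from scipy.stats import norm

def gaussian_tradeoff(alpha, mu):
    """Type II error of the most powerful level-alpha test of
    N(0,1) vs N(mu,1) (Neyman-Pearson): reject when x > Phi^{-1}(1 - alpha)."""
    t = norm.ppf(1 - alpha)
    return norm.cdf(t - mu)

alpha = np.linspace(0.01, 0.99, 99)
beta = gaussian_tradeoff(alpha, mu=1.0)

# Sanity checks: a trade-off curve is non-increasing and lies below 1 - alpha
assert np.all(np.diff(beta) <= 0)
assert np.all(beta <= 1 - alpha + 1e-12)
```

Any pair (P, Q) works the same way once the likelihood-ratio test is written down; the Gaussian pair simply has this clean closed form.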
Definition of f-DP
Definition
A (randomized) algorithm M is said to be f-differentially private if
T(M(S), M(S′)) ≥ f
for all neighboring datasets S and S′

• The randomness of M(S), M(S′) comes from the algorithm M
• Distinguishing between Anne and Eva is no easier than telling apart P and Q if f = T(P, Q)
• Related to the hypothesis testing region [Kairouz et al ’17]
When is it f-DP?
[Figure: curves in the type I error vs type II error square; curves lying above f are f-DP, curves dipping below f are not f-DP]
Symmetrization
Type I and type II errors in privacy are symmetric. How about f?

Proposition
Let M be f-DP. Then M is f^symm-DP with f^symm(α) = max{f(α), f⁻¹(α)}

• The inverse is f⁻¹(α) := inf{t ∈ [0, 1] : f(t) ≤ α}
• f^symm is a trade-off function
Connection with (ε, δ)-DP
(ε, δ)-DP requires that for any (measurable) event E
P(M(S) ∈ E) ≤ e^ε P(M(S′) ∈ E) + δ

Proposition (Wasserman and Zhou ’10)
Denote fε,δ(α) = max{0, 1 − δ − e^ε α, e^(−ε)(1 − δ − α)}. An algorithm M is (ε, δ)-DP if and only if it is fε,δ-DP
[Figure: fε,δ is piecewise linear with slope −e^ε and intercept 1 − δ; the region above it is indistinguishable]
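The piecewise-linear fε,δ from the proposition is straightforward to evaluate; a small sketch of mine, assuming numpy:

```python
import numpy as np

def f_eps_delta(alpha, eps, delta):
    """Trade-off function of (eps, delta)-DP [Wasserman-Zhou]:
    max{0, 1 - delta - e^eps * alpha, e^{-eps} * (1 - delta - alpha)}."""
    alpha = np.asarray(alpha, dtype=float)
    return np.maximum.reduce([
        np.zeros_like(alpha),
        1 - delta - np.exp(eps) * alpha,
        np.exp(-eps) * (1 - delta - alpha),
    ])

alpha = np.linspace(0, 1, 101)
curve = f_eps_delta(alpha, eps=1.0, delta=0.05)
# At alpha = 0 the best achievable type II error is 1 - delta,
# and the curve never exceeds the perfect-privacy line 1 - alpha
assert abs(curve[0] - 0.95) < 1e-12
assert np.all(curve <= 1 - alpha + 1e-12)
```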
Happy with (ε, δ)-DP?
• Four linear segments: a bit ad hoc?
• With probability δ, very bad events can happen
A primal-dual perspective
From dual to primal
[Figure: taking the supremum of a family of fεi,δi curves in the type I vs type II error square builds up a trade-off function f]
Proposition
Let I be an index set. An algorithm satisfies (εi, δi)-DP for all i ∈ I if and only if it is f-DP with f(α) = sup_{i∈I} fεi,δi(α)
From primal to dual
[Figure: supporting lines of slope −e^ε tangent to the trade-off curve f recover the (ε, δ(ε)) guarantees]
Proposition
An algorithm is f-DP if and only if it is (ε, δ(ε))-DP for all ε ≥ 0 with
δ(ε) = 1 + f∗(−e^ε)

• Convex conjugate f∗(y) = sup_{x∈[0,1]} {yx − f(x)}
• f-DP is equivalent to an infinite collection of (ε, δ)-DP guarantees
• Which domain is more convenient? A powerful tool
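As a sanity check of the proposition (my own sketch, assuming numpy/scipy): for the Gaussian trade-off function Gµ(α) = Φ(Φ⁻¹(1 − α) − µ) appearing later in the talk, δ(ε) computed numerically from the conjugate matches the closed form δ(ε) = Φ(−ε/µ + µ/2) − e^ε Φ(−ε/µ − µ/2) stated in the Gaussian Differential Privacy paper:

```python
import numpy as np
from scipy.stats import norm

def G(alpha, mu):
    # Gaussian trade-off function G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu)
    return norm.cdf(norm.ppf(1 - alpha) - mu)

def delta_numeric(eps, mu, grid=100_001):
    # delta(eps) = 1 + f*(-e^eps), with f*(y) = sup_x { y*x - f(x) }
    x = np.linspace(1e-6, 1 - 1e-6, grid)
    return 1 + np.max(-np.exp(eps) * x - G(x, mu))

def delta_closed(eps, mu):
    # Closed-form dual of mu-GDP (Dong, Roth, and Su)
    return norm.cdf(-eps / mu + mu / 2) - np.exp(eps) * norm.cdf(-eps / mu - mu / 2)

for eps in (0.0, 1.0, 2.0):
    assert abs(delta_numeric(eps, 1.0) - delta_closed(eps, 1.0)) < 1e-4
```

The grid-based conjugate works for any trade-off function, which is what makes the dual view a practical conversion tool.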
Is f too general? Let’s focus!
Gaussian differential privacy (GDP)
Consider the Gaussian trade-off function
Gµ := T(N(0, 1), N(µ, 1))
for µ > 0. Explicitly, Gµ(α) = Φ(Φ⁻¹(1 − α) − µ)

Definition
An algorithm M is said to be µ-GDP if
T(M(S), M(S′)) ≥ Gµ
for all neighboring datasets S and S′

• Focal to f-DP (a central limit theorem phenomenon)
How to interpret µ in GDP?

[Figure: the piecewise-linear fε,δ next to the trade-off curves G0.5, G1, G3, G6]

• Privacy amounts to distinguishing between N(0, 1) and N(µ, 1)
• µ ≤ 0.5: reasonably private; µ ≥ 6: blatantly non-private
Mask individual characteristics via adding noise
Sensitivity ∆θ := max_{S∼S′} |θ(S) − θ(S′)|

Theorem
Consider the Gaussian mechanism M(S) = θ(S) + ξ, where ξ ∼ N(0, σ²) for σ > 0. Then M is µ-GDP for µ = ∆θ/σ

• The Gaussian mechanism is to GDP as the Laplace mechanism is to ε-DP
• Lighter tail than Laplace noise
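A minimal sketch of the theorem (my own example, assuming numpy): releasing the mean of n values in [0, 1], whose sensitivity is ∆θ = 1/n, with σ = ∆θ/µ for a target µ-GDP:

```python
import numpy as np

def private_mean(data, mu, rng):
    """Gaussian mechanism for the mean of values in [0, 1].
    The mean has sensitivity 1/n, so sigma = (1/n)/mu gives mu-GDP."""
    data = np.clip(np.asarray(data, dtype=float), 0.0, 1.0)
    sigma = (1.0 / len(data)) / mu
    return data.mean() + rng.normal(0.0, sigma)

rng = np.random.default_rng(0)
data = rng.uniform(size=1000)
releases = np.array([private_mean(data, mu=1.0, rng=rng) for _ in range(2000)])
# The noise is centered, so averaged releases concentrate around the true mean
# (here sigma = 1e-3, and 2000 releases shrink the standard error by sqrt(2000))
assert abs(releases.mean() - data.mean()) < 6 * (1e-3 / np.sqrt(2000))
```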
Outline
1. Introduction of f-DP
2. Informative representation of privacy
3. Composition and central limit theorems
4. Amplifying privacy via subsampling
5. Application to deep learning
Blackwell’s theorem
We say (P′, Q′) is Blackwell harder to distinguish than (P, Q) if there is a post-processing Proc such that P′ = Proc(P) and Q′ = Proc(Q)

Theorem (Blackwell ’50)
The following two statements are equivalent:
(a) T(P′, Q′) ≥ T(P, Q)
(b) (P′, Q′) is Blackwell harder to distinguish than (P, Q)

• Trade-off functions induce the same ordering as post-processing
Rényi divergence is not as informative
Rényi divergence of order γ
Rγ(P‖Q) := (1/(γ − 1)) log EQ (dP/dQ)^γ

• Concentrated DP [Dwork, Rothblum ’16], zero-concentrated DP [Bun, Steinke ’16], truncated concentrated DP [Bun, Dwork, Rothblum, Steinke ’18], and Rényi DP [Mironov ’17] are all defined via Rényi divergence

Proposition
Let Pε = Bern(e^ε/(1 + e^ε)) and Qε = Bern(1/(1 + e^ε)). For 0 < ε < 4, the following are true:
(a) For all γ > 1, Rγ(Pε‖Qε) < Rγ(N(0, 1)‖N(ε, 1))
(b) Using total variation, dTV(Pε, Qε) > dTV(N(0, 1), N(ε, 1))

• No such phenomenon for trade-off functions
• Similar examples exist for (ε, δ)-DP
Properties of f-DP
• Informative representation of privacy
Outline
1. Introduction of f-DP
2. Informative representation of privacy
3. Composition and central limit theorems
4. Amplifying privacy via subsampling
5. Application to deep learning
What is composition?
Composition surely leads to a privacy compromise. But how fast?
Definition of composition
Let M1 : X → Y1 and M2 : X × Y1 → Y2 be private algorithms. Define their composition M : X → Y1 × Y2 as
M(S) = (M1(S), M2(S, M1(S)))

Given a sequence of algorithms Mi : X × Y1 × · · · × Yi−1 → Yi for i ≤ k, recursively define the composition
M : X → Y1 × · · · × Yk
Tensor product of trade-off functions
Definition
The tensor product of two trade-off functions f = T(P, Q) and g = T(P′, Q′) is defined as
f ⊗ g := T(P × P′, Q × Q′)
• Well-defined
• The operator ⊗ is commutative and associative
• For GDP, Gµ1 ⊗ Gµ2 ⊗ · · · ⊗ Gµk = Gµ, where µ = √(µ1² + · · · + µk²)
Composition is an algebra
Theorem
Suppose Mi(·, y1, · · · , yi−1) is fi-DP for all y1 ∈ Y1, . . . , yi−1 ∈ Yi−1. Then the composed algorithm M : X → Y1 × · · · × Yk is f1 ⊗ · · · ⊗ fk-DP
• Cannot be improved in general
• Composition in f-DP is reduced to algebra
• k-step composition of µ-GDP algorithms is √k µ-GDP
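The GDP composition rule in the last bullet is a one-line computation; a sketch of mine:

```python
import math

def compose_gdp(mus):
    """Composition of mu_i-GDP mechanisms is mu-GDP with
    mu = sqrt(mu_1^2 + ... + mu_k^2)."""
    return math.sqrt(sum(mu ** 2 for mu in mus))

# k-step composition of mu-GDP algorithms is sqrt(k)*mu-GDP:
# 16 steps at mu = 0.5 compose to sqrt(16) * 0.5 = 2.0
assert compose_gdp([0.5] * 16) == 2.0
```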
Central limit theorem for f-DP
Theorem (informal)
Let {fki : 1 ≤ i ≤ k, k = 1, 2, . . .} be a triangular array of trade-off functions, each being O(1/√k) close to perfect privacy. Then
lim_{k→∞} fk1 ⊗ fk2 ⊗ · · · ⊗ fkk = Gµ

• The convergence is uniform on [0, 1]
• µ can be computed from {fki}
• If Mki is fki-DP, their composition is approximately µ-GDP
• An effective approximation tool
• GDP is to f-DP as Gaussian random variables are to general random variables
• The proof follows from Le Cam’s third lemma
Central limit theorem for ε-DP
Theorem
Fix µ > 0 and assume ε = µ/√k. Then
Gµ(α + c/k) − c/k ≤ fε,0⊗k(α) ≤ Gµ(α − c/k) + c/k

• Computing the exact composition is #P-complete [Murtagh and Vadhan ’16]
• Sharper than the O(1/√k) bound in Berry–Esseen

The privacy CLT beats Berry–Esseen for ε-DP! Why?
• Due to randomization of rejection rules, which makes trade-off functions continuous
In CLT we trust
10-fold composition of (1/√10, 0)-DP; δ = 0.001 in the green curve
Properties of f-DP
• Informative representation of privacy
• Algebraically convenient and tight composition operations
Outline
1. Introduction of f-DP
2. Informative representation of privacy
3. Composition and central limit theorems
4. Amplifying privacy via subsampling
5. Application to deep learning
What is subsampling for privacy?
Given a dataset S, apply the algorithm M to a subsampled dataset sub(S), resulting in a new algorithm M ∘ sub

• Subsampling provides stronger privacy guarantees than running the algorithm on the whole dataset
• A frequently used tool for amplifying privacy
Subsampling theorem for f-DP
subm uniformly picks an m-sized subset from S. Let p := m/n

The p-sampling operator Cp acts on trade-off functions:
Cp(f) := Conv(min{fp, fp⁻¹}) = min{fp, fp⁻¹}∗∗

• fp = pf + (1 − p)Id, with Id(α) = 1 − α
• min{fp, fp⁻¹}∗∗ is the double (convex) conjugate of min{fp, fp⁻¹}, i.e., its greatest convex lower bound

Theorem
If M is f-DP, then M ∘ subm is Cp(f)-DP, and this bound is tight

• The subsampling theorem for Rényi DP is quite complicated [Wang, Balle, and Kasiviswanathan ’18]
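Cp can be evaluated on a grid: mix f with Id, symmetrize via the functional inverse, then take the greatest convex minorant. A numerical sketch of mine, assuming numpy/scipy:

```python
import numpy as np
from scipy.stats import norm

def C_p(f_vals, grid, p):
    """Subsampling operator C_p on a trade-off function sampled on `grid`:
    C_p(f) = greatest convex minorant of min{f_p, f_p^{-1}},
    where f_p = p*f + (1 - p)*Id and Id(alpha) = 1 - alpha."""
    fp = p * f_vals + (1 - p) * (1 - grid)
    # functional inverse: f_p^{-1}(a) = inf{t : f_p(t) <= a}
    fp_inv = np.interp(grid, fp[::-1], grid[::-1])
    g = np.minimum(fp, fp_inv)
    # greatest convex minorant via a lower convex hull (monotone chain)
    hull = []
    for xi, yi in zip(grid, g):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (yi - y1) <= (y2 - y1) * (xi - x1):
                hull.pop()  # middle point lies on/above the chord: not convex
            else:
                break
        hull.append((xi, yi))
    hx, hy = zip(*hull)
    return np.interp(grid, np.array(hx), np.array(hy))

grid = np.linspace(0, 1, 4001)
G1 = norm.cdf(norm.ppf(1 - grid) - 1.0)   # Gaussian trade-off function G_1
# p = 0 gives the perfect-privacy line Id; p = 1 returns G_1 (it is symmetric)
assert np.allclose(C_p(G1, grid, 0.0), 1 - grid, atol=1e-6)
assert np.allclose(C_p(G1, grid, 1.0), G1, atol=5e-3)
```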
Numerical examples
[Figure, left panel: f, fp, fp⁻¹, and Cp(f) for f = G1.8, p = 0.35. Right panel: fε,δ, fε′,δ′, and Cp(fε,δ) for ε = 3, δ = 0.1, p = 0.2]
Our gain
Properties of f-DP
• Informative representation of privacy
• Algebraically convenient and tight composition operations
• Sharp privacy amplification via subsampling
Outline
1. Introduction of f-DP
2. Informative representation of privacy
3. Composition and central limit theorems
4. Amplifying privacy via subsampling
5. Application to deep learning
Collaborators
Zhiqi Bu (Wharton) Jinshuo Dong (Penn CS) Qi Long (Penn Biostatistics)
Privacy concerns in deep learning

[Figure: Dataset → Server → Model, illustrating privacy issues of training data]

• Influential paper "Deep Learning with Differential Privacy" by Google Brain [Abadi et al ’16]
Private deep learning
Reproducing results of Google Brain
[Figure: private vs non-private test accuracy over 70 epochs at medium noise σ = 0.7]

Experiments on MNIST [Abadi et al ’16]
Can the f-DP framework improve privacy analysis and prediction accuracy?
Composition and subsampling via f-DP
SGD update: θt+1 = SGD ∘ sub(S; θt)

Lemma
SGD with noise sampled from N(0, 4C²/(B²µ²) · Id) is µ-GDP

Thus, we get

Theorem
The SGD mechanism M(S) = (θ1, θ2, . . . , θm) is
CB/n(Gµ) ⊗ CB/n(Gµ) ⊗ · · · ⊗ CB/n(Gµ)  (m copies) -DP,
which is, asymptotically, µ̃-GDP with
µ̃ = (B/n) √( 2m ( e^(µ²) Φ(3µ/2) + 3Φ(−µ/2) − 2 ) )
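The µ̃ above is cheap to evaluate numerically; a sketch of mine, assuming scipy, with B, n, m, µ as in the theorem. For small per-step µ it recovers the plain (B/n)·µ·√m composition heuristic, which the assertion checks:

```python
import numpy as np
from scipy.stats import norm

def mu_tilde(B, n, m, mu):
    """Asymptotic GDP parameter of m noisy-SGD steps with batch size B,
    dataset size n, and per-step parameter mu (subsampling + privacy CLT)."""
    p = B / n
    inner = np.exp(mu ** 2) * norm.cdf(1.5 * mu) + 3 * norm.cdf(-0.5 * mu) - 2
    return p * np.sqrt(2 * m * inner)

# For small mu, mu_tilde ~ (B/n) * mu * sqrt(m), i.e. plain GDP composition
approx = (256 / 60000) * 0.01 * np.sqrt(1000)
assert abs(mu_tilde(256, 60000, 1000, 0.01) / approx - 1) < 0.01
```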
Tighter privacy analysis via f-DP
[Figure: three trade-off-curve panels. Left: 95.0% accuracy in 15 epochs, σ = 1.3: 0.285-GDP by CLT vs (1.19, 1.0e-5)-DP by MA. Middle: 96.6% accuracy in 60 epochs, σ = 1.1: 0.737-GDP by CLT vs (3.01, 1.0e-5)-DP by MA. Right: 97.0% accuracy in 45 epochs, σ = 0.7: 1.554-GDP by CLT vs (7.1, 1.0e-5)-DP by MA]

Solid red: our subsampling theorem and CLT. Dashed blue: moments accountant (MA) developed in [Abadi et al ’16]
Holography for privacy?
Privacy guarantees in (ε, δ)-DP
[Figure: ε as a function of epochs at medium noise σ = 0.7, comparing the GDP analysis with the (ε, δ)-DP analysis]

Experiments on MNIST, with δ = 10⁻⁷
Same ε and δ, but less noise
[Figure, left panel: noise required over epochs at σ = 0.7 by the (ε, δ)-DP analysis vs the smaller noise deemed necessary by GDP. Right panel: test accuracy with only the noise necessary by GDP, compared with (ε, δ)-DP private accuracy and non-private accuracy]
More privacy and higher accuracy

[Figure: Dataset → Server → Model, now with privacy and better accuracy]
Concluding remarks
Summary
Informativeness Composition Subsampling
ε-DP
(ε, δ)-DP
Divergence based DPs
f-DP
In the f-DP framework
• Trade-off functions are informative for privacy loss
• Composition is equivalent to tensor product
• Subsampling amounts to averaging (with Id) and then convexifying
Take-home messages
• Gaussian Differential Privacy, with Jinshuo Dong and Aaron Roth. arXiv:1905.02383
• Deep Learning with Gaussian Differential Privacy, with Zhiqi Bu, Jinshuo Dong, and Qi Long. arXiv:1911.11607

NSF CAREER DMS-1847415, CCF-1763314, CCF-1934876, and Wharton Dean’s Fund
The Return of the King