Logarithmic Sobolev inequalities in discrete product spaces: proof by a transportation cost distance
Katalin Marton
Alfréd Rényi Institute of Mathematics of the Hungarian Academy of Sciences
Relative entropy
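Definition
$\mathcal{Z}$: measurable space; $\mu$, $\nu$: probability measures on $\mathcal{Z}$. $Z$, $U$: random variables with $\mathcal{L}(Z) = \mu$, $\mathcal{L}(U) = \nu$.
Relative entropy:
\[ D(\mu\|\nu) = D(Z\|U) = \int_{\mathcal{Z}} \log\frac{d\mu}{d\nu}\, d\mu. \tag{1} \]
If $\mathcal{Z}$ is finite:
\[ D(\mu\|\nu) = D(Z\|U) = \sum_{z\in\mathcal{Z}} \mu(z)\log\frac{\mu(z)}{\nu(z)}. \tag{2} \]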
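Formula (2) can be evaluated directly. The following is a minimal sketch, assuming distributions are stored as dictionaries from points to probabilities; the function name `relative_entropy` and the example values are illustrative and not part of the talk.

```python
import math

def relative_entropy(mu, nu):
    """D(mu||nu) as in (2), for finite distributions given as dicts z -> probability."""
    total = 0.0
    for z, m in mu.items():
        if m == 0.0:
            continue  # convention: 0 * log 0 = 0
        n = nu.get(z, 0.0)
        if n == 0.0:
            return math.inf  # mu is not absolutely continuous w.r.t. nu
        total += m * math.log(m / n)
    return total

# Illustrative two-point example (values chosen arbitrarily).
mu = {"a": 0.5, "b": 0.5}
nu = {"a": 0.25, "b": 0.75}
d = relative_entropy(mu, nu)
```

The natural logarithm is used here; another base would only rescale all entropies by a constant.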
Entropy contraction of Markov kernels
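Definition
$(\mathcal{Z}, \nu)$: probability space, $\mathcal{P}(\mathcal{Z})$: probability measures on $\mathcal{Z}$, $\Gamma$: Markov kernel on $\mathcal{Z}$ with invariant measure $\nu$.
Entropy contraction for $(\mathcal{Z}, \nu, \Gamma)$ with rate $1-c$, $0 < c \le 1$:
\[ D(\mu\Gamma\|\nu) \le (1-c)\cdot D(\mu\|\nu). \tag{3} \]
Equivalently:
\[ c\cdot D(\mu\|\nu) \le D(\mu\|\nu) - D(\mu\Gamma\|\nu) \quad\text{for all } \mu\in\mathcal{P}(\mathcal{Z}). \tag{4} \]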
Gibbs sampler governed by local specifications of qn
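Definition
$\Gamma_i$: Markov kernel $\mathcal{X}^n \mapsto \mathcal{X}^n$,
\[ \Gamma_i(z^n|y^n) = \delta(\bar y_i, \bar z_i)\cdot q_i(z_i|\bar y_i), \]
where $\bar y_i$ denotes $y^n$ with the $i$-th coordinate removed and $q_i(\cdot|\bar y_i)$ is the conditional distribution of the $i$-th coordinate under $q^n$.
$\Gamma$: Markov kernel $\mathcal{X}^n \mapsto \mathcal{X}^n$:
\[ \Gamma = \frac{1}{n}\cdot\sum_{i=1}^n \Gamma_i. \]
Definition
$q^n$ has the entropy contraction property if its Gibbs sampler $\Gamma$ has it.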
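As an illustration of the definition, the Gibbs sampler can be written out as an explicit transition matrix on a small space. This is a sketch under assumptions not in the talk: the alphabet is {0, ..., k-1}, the measure is strictly positive and stored as a dictionary, and the names `gibbs_kernel`, `q` and `q_vec` are hypothetical.

```python
import itertools
import numpy as np

def gibbs_kernel(q, n, k):
    """Return (states, gamma) for the Gibbs sampler of q on {0..k-1}^n.

    q maps each tuple in {0..k-1}^n to its probability (assumed > 0 everywhere).
    gamma[a, b] is the probability of moving from states[a] to states[b].
    """
    states = list(itertools.product(range(k), repeat=n))
    index = {s: a for a, s in enumerate(states)}
    gamma = np.zeros((len(states), len(states)))
    for y in states:
        for i in range(n):
            # All states that agree with y outside coordinate i.
            fibre = [y[:i] + (v,) + y[i + 1:] for v in range(k)]
            mass = sum(q[z] for z in fibre)
            for z in fibre:
                # Gamma_i(z|y) = q_i(z_i | ybar_i); Gamma averages over i with weight 1/n.
                gamma[index[y], index[z]] += q[z] / (mass * n)
    return states, gamma

# Arbitrary strictly positive measure on {0,1}^3, for illustration only.
rng = np.random.default_rng(0)
n, k = 3, 2
weights = rng.random(k ** n) + 0.1
weights /= weights.sum()
q = dict(zip(itertools.product(range(k), repeat=n), weights))
states, gamma = gibbs_kernel(q, n, k)
q_vec = np.array([q[s] for s in states])
```

With this construction one can check numerically that the rows of `gamma` sum to 1 and that `q_vec @ gamma` reproduces `q_vec`, i.e. that the measure is invariant for its Gibbs sampler.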
Entropy contraction in product space
$(\mathcal{X}^n, q^n)$: probability space
Question
Which measures $q^n$ have the entropy contraction property with a reasonable constant $c$?
$c$ cannot be larger than $O(1/n)$. Changing notation: WHEN
\[ \frac{c}{n}\cdot D(p^n\|q^n) \le D(p^n\|q^n) - D(p^n\Gamma\|q^n) \quad\text{for all } p^n\in\mathcal{P}(\mathcal{X}^n)\,? \tag{5} \]
Equivalently: WHEN
\[ D(p^n\Gamma\|q^n) \le \Bigl(1-\frac{c}{n}\Bigr)\cdot D(p^n\|q^n)\,? \tag{6} \]
Conditional relative entropy
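Definition
$\mathcal{Z}$: measurable space,
$\mathcal{V}$: another measurable space, $\pi$: probability measure on $\mathcal{V}$, $V$: random variable, $\mathcal{L}(V) = \pi$.
For $v\in\mathcal{V}$, probability measures on $\mathcal{Z}$: $\mu(\cdot|v) = \mathcal{L}(Z|V=v)$, $\nu(\cdot|v) = \mathcal{L}(U|V=v)$.
Conditional relative entropy:
\[ D\bigl(\mu(\cdot|V)\,\|\,\nu(\cdot|V)\bigr) = D\bigl(Z|V\,\|\,U|V\bigr) \triangleq \int_{\mathcal{V}} D\bigl(\mu(\cdot|v)\,\|\,\nu(\cdot|v)\bigr)\, d\pi(v). \tag{7} \]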
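Sufficient condition for entropy contraction in product space
$(\mathcal{X}^n, q^n, \Gamma)$: probability space with Gibbs sampler, $q^n = \mathcal{L}(X^n)$; $p^n = \mathcal{L}(Y^n)$: another distribution on $\mathcal{X}^n$.
Recall: $\Gamma = \frac{1}{n}\sum_{i=1}^n \Gamma_i$, $\bar Y_i = (Y_1, \dots, Y_{i-1}, Y_{i+1}, \dots, Y_n)$.
Proposition
\[ c\cdot D(p^n\|q^n) \le \sum_{i=1}^n D\bigl(p_i(\cdot|\bar Y_i)\,\|\,q_i(\cdot|\bar Y_i)\bigr) \tag{$*$} \]
$\Longrightarrow$
\[ D(p^n\Gamma\|q^n) \le \Bigl(1-\frac{c}{n}\Bigr) D(p^n\|q^n). \tag{9} \]
If $q^n$ is a product measure then $(*)$ holds with $c = 1$.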
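Notation
For $x^n = (x_1, x_2, \dots, x_n)\in\mathcal{X}^n$:
\[ \bar x_i \triangleq (x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n), \qquad \bar p_i \triangleq \mathcal{L}(\bar Y_i), \quad \bar q_i \triangleq \mathcal{L}(\bar X_i). \]
Expansion formula
\[ D(p^n\|q^n) = D(\bar p_i\|\bar q_i) + D\bigl(p_i(\cdot|\bar Y_i)\,\|\,q_i(\cdot|\bar Y_i)\bigr) \]
$\Longrightarrow$
\[ D(p^n\|q^n) = \frac{1}{n}\cdot\sum_{i=1}^n D(\bar p_i\|\bar q_i) + \frac{1}{n}\cdot\sum_{i=1}^n D\bigl(p_i(\cdot|\bar Y_i)\,\|\,q_i(\cdot|\bar Y_i)\bigr). \]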
Proof of Proposition
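\[
\begin{aligned}
D(p^n\|q^n) - D(p^n\Gamma\|q^n)
&\ge D(p^n\|q^n) - \frac{1}{n}\cdot\sum_{i=1}^n D(\bar p_i\|\bar q_i) && \text{(by convexity of entropy, since } D(p^n\Gamma_i\|q^n) = D(\bar p_i\|\bar q_i)\text{)}\\
&= \frac{1}{n}\sum_{i=1}^n D\bigl(p_i(\cdot|\bar Y_i)\,\|\,q_i(\cdot|\bar Y_i)\bigr) && \text{(by the expansion formula)}\\
&\ge \frac{c}{n}\, D(p^n\|q^n) && \text{(by assumption)}.
\end{aligned}
\]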
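The expansion formula and the chain of inequalities in the proof can be checked numerically for n = 2. The sketch below assumes a 3-letter alphabet and randomly generated strictly positive distributions; all names (`kl`, `p_gamma`, `cond1`, `entropy_drop`, ...) are illustrative and not from the talk.

```python
import numpy as np

def kl(p, q):
    """Relative entropy D(p||q) for arrays of equal shape (q assumed > 0)."""
    p, q = np.ravel(p), np.ravel(q)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(1)
p = rng.random((3, 3)) + 0.05   # p[a, b] = Pr{Y_1 = a, Y_2 = b}
p /= p.sum()
q = rng.random((3, 3)) + 0.05   # q[a, b] = Pr{X_1 = a, X_2 = b}
q /= q.sum()

# One-coordinate marginals; for n = 2, pbar_1 = L(Y_2) and pbar_2 = L(Y_1).
p1, p2 = p.sum(axis=1), p.sum(axis=0)
q1, q2 = q.sum(axis=1), q.sum(axis=0)

# p Gamma_1 keeps Y_2 and resamples coordinate 1 from q_1(.|y_2); Gamma_2 is symmetric.
p_gamma1 = q / q2[None, :] * p2[None, :]
p_gamma2 = q / q1[:, None] * p1[:, None]
p_gamma = 0.5 * (p_gamma1 + p_gamma2)

# Conditional relative entropies, computed directly from definition (7).
cond1 = sum(p2[b] * kl(p[:, b] / p2[b], q[:, b] / q2[b]) for b in range(3))
cond2 = sum(p1[a] * kl(p[a, :] / p1[a], q[a, :] / q1[a]) for a in range(3))

entropy_drop = kl(p, q) - kl(p_gamma, q)   # left-hand side in the proof
lower_bound = 0.5 * (cond1 + cond2)        # (1/n) * sum of conditional entropies
```

Here `kl(p, q)` should agree with `kl(p2, q2) + cond1` (expansion formula with i = 1), and `entropy_drop` should be at least `lower_bound`, which is the inequality the proof establishes before the assumption (∗) is applied.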
Entropy contraction in DISCRETE product spaces
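$\mathcal{X}$ finite
$(\mathcal{X}^n, q^n, \Gamma)$
Wanted: inequality
\[ D(p^n\|q^n) \le C\cdot\sum_{i=1}^n D\bigl(p_i(\cdot|\bar Y_i)\,\|\,q_i(\cdot|\bar Y_i)\bigr) \quad\text{for all } p^n\in\mathcal{P}(\mathcal{X}^n). \tag{$*$} \]
To get $(*)$: use a Wasserstein-like distance.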
A reverse Pinsker’s inequality
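$\mu$, $\nu$: probability measures on $\mathcal{X}$, $\mathcal{X}$ finite
Notation: Variational distance
\[ |\mu-\nu| = \frac{1}{2}\cdot\sum_{x\in\mathcal{X}} |\mu(x)-\nu(x)| \]
Lemma
Set
\[ \mathcal{X}_+ \triangleq \{x\in\mathcal{X} : \nu(x) > 0\}, \qquad \alpha \triangleq \min\{\nu(x) : x\in\mathcal{X}_+\}. \]
Then
\[ D(\mu\|\nu) \le \frac{4}{\alpha}\cdot|\mu-\nu|^2. \]
Follows from the inequality
\[ D(\mu\|\nu) \le \sum_{x\in\mathcal{X}} \frac{|\mu(x)-\nu(x)|^2}{\nu(x)}. \]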
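The Lemma can be probed numerically on random examples. This is only a sanity check under assumptions of my own (a 5-point space, Dirichlet-distributed samples, strictly positive ν so that the whole space is the support), not a proof; `check_reverse_pinsker` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(2)

def check_reverse_pinsker(size=5, trials=1000):
    """Return the largest observed ratio D(mu||nu) / ((4/alpha) * |mu - nu|^2)."""
    worst = 0.0
    for _ in range(trials):
        mu = rng.dirichlet(np.ones(size))
        nu = rng.dirichlet(np.ones(size))    # strictly positive with probability 1
        alpha = nu.min()                     # alpha = min of nu over its support
        tv = 0.5 * np.abs(mu - nu).sum()     # variational distance |mu - nu|
        mask = mu > 0
        d = float(np.sum(mu[mask] * np.log(mu[mask] / nu[mask])))
        worst = max(worst, d / (4.0 / alpha * tv ** 2))
    return worst

worst_ratio = check_reverse_pinsker()
```

If the Lemma holds, `worst_ratio` never exceeds 1. The bound degrades as α shrinks, which is why α appears in the constants of Theorem 1.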
A distance between measures on product spaces
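$\mathcal{X}^n$: product space
$\mu^n = \mathcal{L}(Z^n)$, $\nu^n = \mathcal{L}(U^n)$: probability measures on $\mathcal{X}^n$
Definition (P. Massart)
The square of the $W_2$-distance of $\mu^n$ and $\nu^n$:
\[ W_2^2(\mu^n, \nu^n) \triangleq \min \sum_{i=1}^n \Pr{}^2\{Z_i \ne U_i\}, \]
where the minimum (infimum) is taken over couplings of $\mu^n$ and $\nu^n$.
Notation
For $I\subset[1,n]$, $p^n = \mathcal{L}(Y^n)$ and $y^n\in\mathcal{X}^n$:
\[ y_I \triangleq (y_i : i\in I), \qquad \bar y_I \triangleq (y_i : i\notin I), \]
\[ p_I \triangleq \mathcal{L}(Y_I), \qquad p_I(\cdot|\bar y_I) \triangleq \mathcal{L}(Y_I \mid \bar Y_I = \bar y_I). \]
For Theorem 1 we need the inequality
\[ W_2^2(p^n, q^n) \le C\cdot\mathbb{E}\sum_{i=1}^n \bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2 \]
in a MORE GENERAL FORM:
We require a bound for
\[ W_2^2\bigl(p_I(\cdot|\bar y_I),\, q_I(\cdot|\bar y_I)\bigr) \]
for ALL subsets $I\subset[1,n]$ (not just $I = [1,n]$), and all $\bar y_I$.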
Main Theorem
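$(\mathcal{X}^n, q^n)$, $\mathcal{X}$ finite!
Theorem 1
Set
\[ \alpha = \min\{q_i(x_i|\bar x_i) : q^n(x^n) > 0,\ 1\le i\le n\}. \tag{10} \]
Fix a $p^n = \mathcal{L}(Y^n)\in\mathcal{P}(\mathcal{X}^n)$; assume
\[ q^n(x^n) = 0 \implies p^n(x^n) = 0. \tag{11} \]
Main assumption:
\[ W_2^2\bigl(p_I(\cdot|\bar y_I),\, q_I(\cdot|\bar y_I)\bigr) \le C\cdot\mathbb{E}\Bigl\{\sum_{i\in I} \bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2 \Bigm| \bar Y_I = \bar y_I\Bigr\}, \tag{12} \]
for all $I\subset[1,n]$, $\bar y_I\in\mathcal{X}^{[1,n]\setminus I}$.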
Main Theorem continued
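Assume all the inequalities
\[ W_2^2\bigl(p_I(\cdot|\bar y_I),\, q_I(\cdot|\bar y_I)\bigr) \le C\cdot\mathbb{E}\Bigl\{\sum_{i\in I} \bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2 \Bigm| \bar Y_I = \bar y_I\Bigr\}, \tag{13} \]
where $I\subset[1,n]$ and $\bar y_I\in\mathcal{X}^{[1,n]\setminus I}$ is fixed.
Then
\[ D(p^n\|q^n) \le \frac{4C}{\alpha}\cdot\sum_{i=1}^n \mathbb{E}\bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2 \le \frac{2C}{\alpha}\cdot\sum_{i=1}^n D\bigl(p_i(\cdot|\bar Y_i)\,\|\,q_i(\cdot|\bar Y_i)\bigr). \tag{14} \]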
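The slides do not spell out the second inequality in (14). It is consistent with the classical Pinsker inequality applied conditionally; the following is a sketch of that step, assuming the variational distance carries the factor 1/2 as defined earlier in the talk.

```latex
\[
|\mu-\nu|^2 \le \tfrac{1}{2}\, D(\mu\|\nu)
\quad\Longrightarrow\quad
\mathbb{E}\bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2
\le \tfrac{1}{2}\, D\bigl(p_i(\cdot|\bar Y_i)\,\|\,q_i(\cdot|\bar Y_i)\bigr),
\]
\[
\text{hence}\quad
\frac{4C}{\alpha}\sum_{i=1}^n \mathbb{E}\bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2
\le \frac{2C}{\alpha}\sum_{i=1}^n D\bigl(p_i(\cdot|\bar Y_i)\,\|\,q_i(\cdot|\bar Y_i)\bigr).
\]
```

The expectation of the pointwise relative entropies is the conditional relative entropy by definition (7), which is why no expectation sign appears on the right-hand side.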
An analogous result for densities in Rn
Theorem
$f(x^n) = \exp(-V(x^n))$: density on $\mathbb{R}^n$, $q^n$: probability measure with density $f$.
Assume: the conditional densities $f(x_i|\bar x_i)$ satisfy a logarithmic Sobolev inequality with constant $\rho$, for all $i$, $\bar x_i$.
Under some conditions on
\[ \frac{1}{\rho}\cdot\operatorname{Hess} V(x^n) \]
(expressing that $V$ is not too far from being uniformly convex):
\[ D(p^n\|q^n) \le C\cdot\sum_{i=1}^n D\bigl(p_i(\cdot|\bar Y_i)\,\|\,q_i(\cdot|\bar Y_i)\bigr) \quad\text{for all } p^n \]
($C = C(q^n)$).
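Proof of Theorem 1
By induction on $n$. Assume Theorem 1 for $n-1$.
Notation
\[ \bar p_i(\cdot|y_i) \triangleq \mathcal{L}(\bar Y_i \mid Y_i = y_i) \]
For every $i\in[1,n]$
\[ D(p^n\|q^n) = D(Y^n\|X^n) = D(Y_i\|X_i) + D\bigl(\bar p_i(\cdot|Y_i)\,\|\,\bar q_i(\cdot|Y_i)\bigr) \]
$\Longrightarrow$
\[ D(p^n\|q^n) = \frac{1}{n}\sum_{i=1}^n D(Y_i\|X_i) + \frac{1}{n}\sum_{i=1}^n D\bigl(\bar p_i(\cdot|Y_i)\,\|\,\bar q_i(\cdot|Y_i)\bigr). \tag{15} \]
By the induction hypothesis, applied to the $(n-1)$-dimensional conditional distributions, the second term is
\[ \le \Bigl(1-\frac{1}{n}\Bigr)\cdot\frac{4C}{\alpha}\sum_{i=1}^n \mathbb{E}\bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2. \]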
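Proof of Theorem 1 Cont'd
Second term:
\[ \le \Bigl(1-\frac{1}{n}\Bigr)\cdot\frac{4C}{\alpha}\sum_{i=1}^n \mathbb{E}\bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2 \]
First term:
\[
\begin{aligned}
\frac{1}{n}\sum_{i=1}^n D(Y_i\|X_i)
&\le \frac{1}{n}\cdot\frac{4}{\alpha}\cdot\sum_{i=1}^n \bigl|\mathcal{L}(Y_i) - \mathcal{L}(X_i)\bigr|^2 && \text{(by the Lemma)}\\
&\le \frac{1}{n}\cdot\frac{4}{\alpha}\cdot\sum_{i=1}^n \Pr{}^2\{Y_i \ne X_i\} && \text{(in any coupling of } p^n, q^n\text{)}\\
&= \frac{1}{n}\cdot\frac{4}{\alpha}\cdot W_2^2(p^n, q^n) && \text{(for the best coupling)}\\
&\le \frac{1}{n}\cdot\frac{4C}{\alpha}\cdot\sum_{i=1}^n \mathbb{E}\bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2 && \text{(by the assumption of Theorem 1)}.
\end{aligned}
\tag{16}
\]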
Entropy contraction
$(\mathcal{X}^n, q^n, \Gamma)$, $\mathcal{X}$ finite, $\Gamma$: Gibbs sampler
Corollary 1
If $q^n$ satisfies the conditions of Theorem 1 then
\[ D(p^n\Gamma\|q^n) \le \Bigl(1 - \frac{\alpha}{2nC}\Bigr)\cdot D(p^n\|q^n). \tag{17} \]
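The constant in (17) can be traced back to the earlier statements. The following sketch only combines (14) with the Proposition; reading off $c = \alpha/(2C)$ is my rearrangement, not a line from the slides.

```latex
\[
\begin{aligned}
&\text{(14) rearranged:}
&& \frac{\alpha}{2C}\, D(p^n\|q^n) \le \sum_{i=1}^n D\bigl(p_i(\cdot|\bar Y_i)\,\|\,q_i(\cdot|\bar Y_i)\bigr),
\quad\text{i.e. } (*) \text{ with } c = \frac{\alpha}{2C};\\
&\text{Proposition, (9):}
&& D(p^n\Gamma\|q^n) \le \Bigl(1 - \frac{1}{n}\cdot\frac{\alpha}{2C}\Bigr) D(p^n\|q^n)
= \Bigl(1 - \frac{\alpha}{2nC}\Bigr) D(p^n\|q^n).
\end{aligned}
\]
```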
Logarithmic Sobolev inequality
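Notation
$\mathcal{E}$: the Dirichlet form associated with $\Gamma$ is the quadratic form
\[ \mathcal{E}(f, g) = \bigl\langle (\mathrm{Id} - \Gamma)f,\, g\bigr\rangle_{q^n}. \]
Definition
$q^n$ satisfies a logarithmic Sobolev inequality with constant $c > 0$ if:
\[ c\cdot D(p^n\|q^n) \le \mathcal{E}\Bigl(\sqrt{\frac{p^n}{q^n}},\, \sqrt{\frac{p^n}{q^n}}\Bigr) \quad\text{for all } p^n\in\mathcal{P}(\mathcal{X}^n). \]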
Logarithmic Sobolev inequality for Gibbs sampler
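$(\mathcal{X}^n, q^n, \Gamma)$, $\mathcal{X}$ finite
Corollary 2
Under the conditions of Theorem 1 the logarithmic Sobolev inequality holds true:
\[ \frac{1}{n}\cdot D(p^n\|q^n) \le \frac{4C}{\alpha}\cdot\mathcal{E}_\Gamma\Bigl(\sqrt{\frac{p^n}{q^n}},\, \sqrt{\frac{p^n}{q^n}}\Bigr) = \frac{4C}{\alpha n}\cdot\sum_{i=1}^n \mathbb{E}\Bigl(1 - \Bigl(\sum_{y_i\in\mathcal{X}} \sqrt{p_i(y_i|\bar Y_i)\cdot q_i(y_i|\bar Y_i)}\Bigr)^2\Bigr). \tag{18} \]
$\Longrightarrow$ hypercontractivity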
Application: Gibbs measures with Dobrushin's uniqueness condition
$(\mathcal{X}^n, q^n)$, $\mathcal{X}$ finite
Definition
$q^n$ satisfies (an $L_2$-version of) Dobrushin's uniqueness condition with coupling matrix
\[ A = (a_{k,i})_{k,i=1}^n, \qquad a_{i,i} = 0, \]
if:
(i)
\[ \max \bigl|q_i(\cdot|\bar z_i) - q_i(\cdot|\bar s_i)\bigr| \le a_{k,i}, \quad k\ne i, \tag{19} \]
where the max is over all $\bar z_i$, $\bar s_i$ differing only in the $k$-th coordinate,
and (ii) $\|A\|_2 < 1$.
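For a small model the coupling matrix can be computed by brute force. The sketch below takes the smallest admissible entries (the maxima in (i) themselves) and uses, as an assumed example not from the talk, a 3-site Ising chain with coupling `beta = 0.3`; `dobrushin_matrix` and the other names are hypothetical.

```python
import itertools
import numpy as np

def dobrushin_matrix(q, n, k):
    """A[k_, i] = max variational distance between q_i(.|zbar) and q_i(.|sbar),
    over pairs zbar, sbar that differ only in coordinate k_ (k_ != i).
    q maps tuples in {0..k-1}^n to strictly positive probabilities."""
    def conditional(i, y):
        # Conditional law of coordinate i given the other coordinates of y.
        w = np.array([q[y[:i] + (v,) + y[i + 1:]] for v in range(k)])
        return w / w.sum()

    A = np.zeros((n, n))
    for i in range(n):
        for k_ in range(n):
            if k_ == i:
                continue  # diagonal entries stay 0
            for y in itertools.product(range(k), repeat=n):
                for v in range(k):
                    s = y[:k_] + (v,) + y[k_ + 1:]   # differs from y at k_ only
                    tv = 0.5 * np.abs(conditional(i, y) - conditional(i, s)).sum()
                    A[k_, i] = max(A[k_, i], tv)
    return A

# Assumed example: Ising chain on 3 sites, spins -1/+1 encoded as 0/1.
beta, n = 0.3, 3
states = list(itertools.product(range(2), repeat=n))
energy = {s: sum((2 * s[j] - 1) * (2 * s[j + 1] - 1) for j in range(n - 1)) for s in states}
total = sum(np.exp(beta * e) for e in energy.values())
q = {s: np.exp(beta * energy[s]) / total for s in states}

A = dobrushin_matrix(q, n, 2)
norm_A = np.linalg.norm(A, 2)   # spectral norm ||A||_2
```

In this example the entry for an end site and its neighbour works out to tanh(beta), non-neighbouring sites do not influence each other, and `norm_A` stays below 1, so condition (ii) is satisfied for this weak coupling.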
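$(\mathcal{X}^n, q^n)$, $\mathcal{X}$ finite
Theorem 2
Assume Dobrushin's uniqueness condition with coupling matrix $A$, $\|A\|_2 < 1$.
Then the conditions of Theorem 1 hold with
\[ C = \frac{1}{(1-\|A\|_2)^2}. \]
Thus
\[ D(p^n\|q^n) \le \frac{4}{\alpha}\cdot\frac{1}{(1-\|A\|_2)^2}\cdot\sum_{i=1}^n \mathbb{E}\bigl|p_i(\cdot|\bar Y_i) - q_i(\cdot|\bar Y_i)\bigr|^2 \le \frac{2}{\alpha}\cdot\frac{1}{(1-\|A\|_2)^2}\cdot\sum_{i=1}^n D\bigl(p_i(\cdot|\bar Y_i)\,\|\,q_i(\cdot|\bar Y_i)\bigr). \tag{20} \]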
Cont'd
Proof
Dobrushin's uniqueness condition implies that $\Gamma$ contracts the $W_2$-distance with rate
\[ 1 - \frac{1}{n}\cdot\bigl(1 - \|A\|_2\bigr). \]
Application: Gibbs measures on Zd
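Notation
$\mathbb{Z}^d$: $d$-dimensional lattice, $i\in\mathbb{Z}^d$: site
$\rho(k, i) = \max_{1\le\nu\le d} |k_\nu - i_\nu|$: distance on $\mathbb{Z}^d$
$\Lambda\subset\subset\mathbb{Z}^d$: finite set of sites
$\mathcal{X}$ finite: spin space
$x^{\mathbb{Z}^d} = (x_i : i\in\mathbb{Z}^d)\in\mathcal{X}^{\mathbb{Z}^d}$: spin configuration
$\mathcal{X}^{\mathbb{Z}^d}$: configuration space
For $x^{\mathbb{Z}^d}$ and $\Lambda\subset\mathbb{Z}^d$:
\[ x_\Lambda = (x_i : i\in\Lambda), \qquad \bar x_\Lambda = (x_i : i\notin\Lambda), \]
$\bar x_\Lambda$ is called an outside configuration for $\Lambda$.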
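Local specifications
Definition
$q_\Lambda(\cdot|\bar x_\Lambda)$, $\Lambda\subset\subset\mathbb{Z}^d$: conditional distributions on $\mathcal{X}^\Lambda$. Assume compatibility conditions.
There exists at least one probability measure $q$ on the space of configurations:
\[ q = \mathcal{L}(X)\in\mathcal{P}(\mathcal{X}^{\mathbb{Z}^d}) \]
satisfying
\[ \mathcal{L}(X_\Lambda \mid \bar X_\Lambda = \bar x_\Lambda) = q_\Lambda(\cdot|\bar x_\Lambda), \quad\text{all } \Lambda\subset\subset\mathbb{Z}^d. \]
$q_\Lambda(\cdot|\bar x_\Lambda)$: local specifications of $q$.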
Finite range interactions
Definition
The local specifications have finite range of interactions if there is an $R > 0$ such that
$q_\Lambda(\cdot|\bar x_\Lambda)$ only depends on coordinates $k\notin\Lambda$ with $\rho(k, \Lambda)\le R$.
$q$ may not be uniquely defined by the local specifications, even for finite range interactions.
Dobrushin-Shlosman’s strong mixing condition
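Given local specifications $q_\Lambda(\cdot|\bar x_\Lambda)$.
Assumption
There exists a function $\varphi(\rho)$ of the distance such that:
(i)
\[ \sum_{i\in\mathbb{Z}^d} \varphi\bigl(\rho(k, i)\bigr) < \infty, \]
and (ii) for every $\Lambda\subset\subset\mathbb{Z}^d$, $M\subset\Lambda$, $k\notin\Lambda$ and every $\bar y_\Lambda$, $\bar z_\Lambda$ differing only at $k$:
\[ \bigl|q_M(\cdot|\bar y_\Lambda) - q_M(\cdot|\bar z_\Lambda)\bigr| \le \varphi\bigl(\rho(k, M)\bigr). \]
In case of finite range interactions:
If Dobrushin-Shlosman's strong mixing condition holds then one can take
\[ \varphi(\rho) = C\cdot\exp(-\gamma\cdot\rho). \]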
Dobrushin-Shlosman’s strong mixing condition
Cont'd
Meaning: The influence of the spin at $k\notin\Lambda$ on the spins in $M\subset\Lambda$ that are far away from $k$ is small.
Essential:
The spins outside $\Lambda$ (the outside configuration $\bar y_\Lambda$) are fixed in two different ways.
The spins over $\Lambda\setminus M$ are not fixed.
Logarithmic Sobolev inequality for strongly mixing measures
Earlier results for the case of finite range interactions: D. Stroock, B. Zegarlinski 1992; F. Cesi 2001; F. Martinelli, E. Olivieri
Theorem 3
$(\mathcal{X}^\Lambda, q_\Lambda(\cdot|\bar y_\Lambda))$ for fixed $\Lambda$ and $\bar y_\Lambda$
\[ \alpha = \min\{q_i(x_i|\bar x_i) : q_i(x_i|\bar x_i) > 0\}. \]
(Finite range is not assumed.)
\{Dobrushin-Shlosman's strong mixing condition\} + \{$\alpha > 0$\}
$\Longrightarrow$ conditions of Theorem 1 for $q_\Lambda(\cdot|\bar y_\Lambda)$, with uniform constant
$\Longrightarrow$ logarithmic Sobolev inequality for $q_\Lambda(\cdot|\bar y_\Lambda)$ with uniform constant
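Logarithmic Sobolev inequality for strongly mixing measures
Cont'd
The proof uses a Gibbs sampler updating cubes of size $m$, with $m$ depending on the dimension and on the function $\varphi(\rho)$.
We get
\[
\begin{aligned}
W_2^2\bigl(p_\Lambda,\, q_\Lambda(\cdot|\bar y_\Lambda)\bigr)
&\le C_m\cdot\sum_{I:\ m\text{-sided cube}} \mathbb{E}\bigl|p_{I\cap\Lambda}(\cdot|\bar Y_{I\cap\Lambda}) - q_{I\cap\Lambda}(\cdot|\bar Y_{I\cap\Lambda})\bigr|^2\\
&\le C_{m,\alpha}\cdot\sum_{i\in\Lambda} \mathbb{E}\bigl|p_i(\cdot|Y_{\Lambda\setminus i}) - q_i(\cdot|Y_{\Lambda\setminus i}, \bar y_\Lambda)\bigr|^2
\end{aligned}
\]
for an appropriate $m$ that is good enough for any $\Lambda$ and $\bar y_\Lambda$.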