Markov and Gibbs Random Fields
Bruno Galerne, [email protected]
Université Paris Descartes
UE Champs aléatoires, modélisation de textures (M2 TI)
Outline
The Ising Model
Markov Random Fields and Gibbs Fields
Sampling Gibbs Fields
Reference

The main reference for this course is Chapter 4 of the book
Pattern Theory: The Stochastic Analysis of Real-World Signals by D. Mumford and A. Desolneux [Mumford and Desolneux 2010].
Outline
The Ising Model
Markov Random Fields and Gibbs Fields
Sampling Gibbs Fields
The Ising Model

- The Ising model is the simplest and most famous Gibbs model.
- It has its origin in statistical mechanics but can be presented in the context of image segmentation.

Framework:
- A real-valued image I ∈ R^{M×N} of size M × N is given (with gray levels mostly in [−1, 1] after some contrast change).
- One wants to associate with I a binary image J ∈ {−1, 1}^{M×N} (−1 for black and 1 for white) supposed to model the predominantly black/white areas of I.
- There is only a finite number of values for J, namely 2^{MN}.
[Figure: input image I and a realization J of the associated Ising model (with T = 1.5).]
The Ising Model
Notation:
- Ω_{M,N} = {0, …, M−1} × {0, …, N−1} is the set of pixel indexes.
- To keep the notation short we denote pixel coordinates (k, l), (m, n) by α or β, so that I(α) = I(k, l) for some (k, l).
- We write α ∼ β to mean that pixels α and β are neighbors for the 4-connectivity (each pixel has 4 neighbors, except at the border).
Energy: To each couple of images (I, J), I ∈ R^{M×N}, J ∈ {−1, 1}^{M×N}, one associates the energy E(I, J) defined by

E(I, J) = c Σ_{α∈Ω_{M,N}} (I(α) − J(α))² + Σ_{α∼β} (J(α) − J(β))²

where c > 0 is a positive constant and the sum Σ_{α∼β} means that each pair of connected pixels is summed once (and not twice).

- The first term of the energy measures the similarity between I and J.
- The second term measures the similarity between neighboring pixels of J.
The Ising Model

Energy:

E(I, J) = c Σ_{α∈Ω_{M,N}} (I(α) − J(α))² + Σ_{α∼β} (J(α) − J(β))²

The Ising model associated with I: For any fixed image I, this energy enables one to define a discrete probability distribution on {−1, 1}^{M×N} by

p_T(J) = (1/Z_T) e^{−E(I,J)/T},

where T > 0 is a constant called the temperature and Z_T is the normalizing constant

Z_T = Σ_{J∈{−1,1}^{M×N}} e^{−E(I,J)/T}.

The probability distribution p_T is the Ising model associated with I (and constant c and temperature T).
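As an illustration, the energy E(I, J) and the unnormalized weight e^{−E(I,J)/T} can be computed directly with NumPy. This is a minimal sketch (the function names are ours); each 4-connectivity pair is summed once via horizontal and vertical differences:

```python
import numpy as np

def ising_energy(I, J, c=1.0):
    """E(I, J) = c * sum_alpha (I(a) - J(a))^2 + sum_{a~b} (J(a) - J(b))^2,
    where each horizontal/vertical neighbor pair is counted once."""
    data_term = c * np.sum((I - J) ** 2)
    smooth_term = (np.sum((J[:, :-1] - J[:, 1:]) ** 2)
                   + np.sum((J[:-1, :] - J[1:, :]) ** 2))
    return data_term + smooth_term

def unnormalized_prob(I, J, T=1.5, c=1.0):
    """exp(-E(I, J) / T); the normalizing constant Z_T is intractable
    for realistic image sizes."""
    return np.exp(-ising_energy(I, J, c) / T)
```

For instance, with I = 0 everywhere and J = 1 everywhere on a 2 × 2 grid, the data term contributes 4 and the smoothness term 0.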
The Ising Model

Energy:

E(I, J) = c Σ_{α∈Ω_{M,N}} (I(α) − J(α))² + Σ_{α∼β} (J(α) − J(β))²

Probability distribution:

p_T(J) = (1/Z_T) e^{−E(I,J)/T}

- Minimizing the energy E(I, J) with respect to J is equivalent to finding the most probable state of the discrete distribution p_T.
- Note that this most probable state (also called the mode of the distribution) is the same for all temperatures T.
- As T tends to 0, the distribution p_T tends to be concentrated at this mode.
- As T tends to +∞, p_T tends to be uniform over all the possible image configurations.
- Hence the temperature parameter T controls the amount of allowed randomness around the most probable state.
The Ising Model

The main questions regarding the Ising model are the following:

1. How can we sample efficiently from the distribution p_T? Although the distribution is a discrete probability on the finite set of binary images {−1, 1}^{M×N}, one cannot compute the table of the probability distribution even for 10 × 10 images! Indeed, the memory size of the table would be

2^{10×10} (number of binary images) × 4 (bytes for a float) > 10^30 bytes = 10^18 terabytes.

2. How can we compute the common mode(s) of the distributions p_T, that is, (one of) the most probable state(s) of the distributions p_T?

3. What is a good order of magnitude for the temperature value T?
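The back-of-the-envelope count in question 1 can be checked directly:

```python
# Memory needed to tabulate p_T over all 10x10 binary images,
# at 4 bytes per float.
n_states = 2 ** (10 * 10)    # number of binary images: 2^100
table_bytes = 4 * n_states   # 4 bytes for a float
assert table_bytes > 10 ** 30             # more than 10^30 bytes
assert table_bytes > 10 ** 18 * 10 ** 12  # i.e. more than 10^18 terabytes
```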
Outline
The Ising Model
Markov Random Fields and Gibbs Fields
Sampling Gibbs Fields
A General Framework

- The Ising model is a very special case of a large class of stochastic models called Gibbs fields.

General framework:
- We consider an arbitrary finite graph (V, E) with vertices V and undirected edges E ⊂ V × V.
- Each vertex has a random label in a phase space Λ (also supposed to be finite, e.g. discrete gray levels).
- The set Ω = Λ^V of all label configurations is called the state space.
- The elements of Ω = Λ^V are denoted by x, y, …, that is, x = (x_α)_{α∈V}, x_α ∈ Λ.

Example of the Ising model:
- The vertices V are the pixels of Ω_{M,N}.
- The edges are the ones of the 4-connectivity.
- The phase space is Λ = {−1, 1}.
- The state space is the set of binary images Ω = {−1, 1}^{M×N}.
Markov Random Fields

Definition (Markov Random Field)
A random variable X = (X_α)_{α∈V} with values in the state space Ω = Λ^V is a Markov random field (MRF) if for any partition V = V1 ∪ V2 ∪ V3 of the set of vertices V such that there is no edge between vertices of V1 and V3, and denoting by X_i = X_{|V_i} the restriction of X to V_i, i = 1, 2, 3, one has

P(X1 = x1 | X2 = x2, X3 = x3) = P(X1 = x1 | X2 = x2)

whenever both conditional probabilities are well defined, that is, whenever P(X2 = x2, X3 = x3) > 0.

- The above property is called the Markov property.
- It is equivalent to saying that X1 and X3 are conditionally independent given X2, that is,

P(X1 = x1, X3 = x3 | X2 = x2) = P(X1 = x1 | X2 = x2) P(X3 = x3 | X2 = x2)

whenever the conditional probabilities are well defined, that is, whenever P(X2 = x2) > 0.
Conditional Independence

Proof: Markov property ⇒ conditional independence.

Assuming P(X2 = x2, X3 = x3) > 0 (since otherwise there is nothing to show):

P(X1 = x1 | X2 = x2) P(X3 = x3 | X2 = x2)
  = P(X1 = x1 | X2 = x2, X3 = x3) P(X3 = x3 | X2 = x2)   (Markov property)
  = [P(X1 = x1, X2 = x2, X3 = x3) / P(X2 = x2, X3 = x3)] P(X3 = x3 | X2 = x2)
  = P(X1 = x1, X3 = x3 | X2 = x2) [P(X2 = x2) / P(X2 = x2, X3 = x3)] P(X3 = x3 | X2 = x2)
  = P(X1 = x1, X3 = x3 | X2 = x2).
Gibbs Fields

Definition (Cliques)
Given a graph (V, E), a subset C ⊂ V is a clique if any two distinct vertices of C are linked with an edge of E:

∀ α, β ∈ C, α ≠ β ⇒ (α, β) ∈ E.

The set of cliques of a graph is denoted by C.

Example
1. The singletons {α}, α ∈ V, are always cliques.
2. For the graph of the 4-connectivity, the cliques are
   - the singletons,
   - the pairs of horizontally adjacent pixels,
   - the pairs of vertically adjacent pixels.
Gibbs Fields

Definition (Families of potentials)
A family of functions U_C : Ω → R, C ∈ C, is said to be a family of potentials if each function U_C only depends on the restriction of x to the clique C:

∀ x, y ∈ Ω, x_{|C} = y_{|C} ⇒ U_C(x) = U_C(y).

Definition (Gibbs distribution and Gibbs field)
A Gibbs distribution on the state space Ω = Λ^V is a probability distribution P that comes from an energy function deriving from a family of potentials on cliques (U_C)_{C∈C}:

P(x) = (1/Z) e^{−E(x)}, where E(x) = Σ_{C∈C} U_C(x),

and Z is the normalizing constant Z = Σ_{x∈Ω} e^{−E(x)}. A Gibbs field is a random variable X = (X_α)_{α∈V} ∈ Ω such that its law is a Gibbs distribution.
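On a state space small enough to enumerate, a Gibbs distribution can be built explicitly from its potentials. The following sketch uses an assumed toy example of our choosing: a 3-vertex path graph with phase space {−1, 1}, singleton potentials 0.5·x_v, and edge potentials (x_u − x_v)²/2.

```python
import itertools
import math

# Toy graph (assumption): a 3-vertex path a - b - c, phase space {-1, 1}.
V = ['a', 'b', 'c']
edges = [('a', 'b'), ('b', 'c')]
Lambda = [-1, 1]

def energy(x):
    """E(x) = sum of clique potentials; x maps each vertex to a label."""
    e = sum(0.5 * x[v] for v in V)                       # singleton cliques
    e += sum((x[u] - x[v]) ** 2 / 2 for (u, v) in edges)  # edge cliques
    return e

# Enumerate the state space Omega = Lambda^V and normalize by Z.
states = [dict(zip(V, vals)) for vals in itertools.product(Lambda, repeat=len(V))]
Z = sum(math.exp(-energy(x)) for x in states)
P = {tuple(x[v] for v in V): math.exp(-energy(x)) / Z for x in states}

assert abs(sum(P.values()) - 1.0) < 1e-12  # P is a probability distribution
```

With these potentials the mode is the all-(−1) configuration, which minimizes the energy.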
Gibbs Fields

Gibbs distribution:

P(x) = (1/Z) e^{−E(x)}, where E(x) = Σ_{C∈C} U_C(x).

Example
The distribution of the Ising model is given by

p_T(J) = (1/Z_T) e^{−E(I,J)/T}, where E(I, J) = c Σ_{α∈Ω_{M,N}} (I(α) − J(α))² + Σ_{α∼β} (J(α) − J(β))².

Hence it is a Gibbs distribution that derives from the family of potentials

U_{{α}}(J) = (c/T) (I(α) − J(α))², α ∈ V,

and

U_{{α,β}}(J) = (1/T) (J(α) − J(β))², (α, β) ∈ E.
Gibbs Fields

Gibbs distribution:

P(x) = (1/Z) e^{−E(x)}, where E(x) = Σ_{C∈C} U_C(x).

Remark: One can always introduce a temperature parameter T: for all T > 0,

P_T(x) = (1/Z_T) e^{−E(x)/T}

is also a Gibbs distribution.
The Hammersley-Clifford Theorem

Theorem (Hammersley-Clifford)
Gibbs fields and MRFs are equivalent in the following sense:

1. If X is a Gibbs field then it satisfies the Markov property.

2. If P is the distribution of an MRF such that P(x) > 0 for all x ∈ Ω, then there exists an energy function E deriving from a family of potentials (U_C)_{C∈C} such that

P(x) = (1/Z) e^{−E(x)}.

Take-away message:
Gibbs fields and Markov random fields are essentially the same thing.
Proof That Gibbs Fields Are MRFs

- Let X be a Gibbs field with distribution P(x) = (1/Z) e^{−E(x)}, where E(x) = Σ_{C∈C} U_C(x).
- Let V = V1 ∪ V2 ∪ V3 be any partition of the set of vertices V such that there is no edge between vertices of V1 and V3.
- For each i = 1, 2, 3, one denotes by x_i = x_{|V_i} the restriction of a state x to V_i.
- Remark that a clique C ∈ C cannot intersect both V1 and V3, that is, either C ⊂ V1 ∪ V2 or C ⊂ V2 ∪ V3.
- Since U_C(x) only depends on the restriction of x to C, U_C(x) depends either on (x1, x2) or on (x2, x3).

One can thus write

E(x) = E1(x1, x2) + E2(x2, x3), and thus P(x) = F1(x1, x2) F2(x2, x3).

Hence

P(x1 | x2, x3) = P(x1, x2, x3) / P(x2, x3)
  = P(x1, x2, x3) / Σ_{y∈Λ^{V1}} P(y, x2, x3)
  = F1(x1, x2) F2(x2, x3) / Σ_{y∈Λ^{V1}} F1(y, x2) F2(x2, x3)
  = F1(x1, x2) / Σ_{y∈Λ^{V1}} F1(y, x2)
  = P(x1, x2) / P(x2) = P(x1 | x2).
The Hammersley-Clifford Theorem

- Hence a Gibbs field satisfies the Markov property: it is an MRF.
- The difficult part of the Hammersley-Clifford theorem is the converse implication!
- It relies on the Möbius inversion formula.

Proposition (Möbius inversion formula)
Let V be a finite set and f and g be two functions defined on the set P(V) of subsets of V. Then,

∀ A ⊂ V, f(A) = Σ_{B⊂A} (−1)^{|A\B|} g(B)  ⟺  ∀ A ⊂ V, g(A) = Σ_{B⊂A} f(B).
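Before proving the formula, it can be checked numerically on a small set: start from an arbitrary g, build f by the alternating sum, and verify that summing f over subsets recovers g. This is a sketch with random values (the helper names are ours):

```python
import itertools
import random

V = frozenset(range(4))  # a small ground set

def subsets(A):
    """All subsets of A, as frozensets."""
    A = list(A)
    for r in range(len(A) + 1):
        for s in itertools.combinations(A, r):
            yield frozenset(s)

random.seed(0)
g = {A: random.random() for A in subsets(V)}

# f(A) = sum_{B subset of A} (-1)^{|A \ B|} g(B) ...
f = {A: sum((-1) ** len(A - B) * g[B] for B in subsets(A))
     for A in subsets(V)}

# ... then g(A) = sum_{B subset of A} f(B) must hold for every A.
for A in subsets(V):
    assert abs(g[A] - sum(f[B] for B in subsets(A))) < 1e-9
```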
Proof of Möbius Inversion Formula

Simple fact: Let n be an integer; then

Σ_{k=0}^{n} C(n, k) (−1)^k = 1 if n = 0, and 0 if n > 0

(since it is equal to (1 + (−1))^n!).

Proof of [∀ A ⊂ V, f(A) = Σ_{B⊂A} (−1)^{|A\B|} g(B)] ⇒ [∀ A ⊂ V, g(A) = Σ_{B⊂A} f(B)]:

Let A ⊂ V. Then,

Σ_{B⊂A} f(B) = Σ_{B⊂A} Σ_{D⊂B} (−1)^{|B\D|} g(D)
  = Σ_{D⊂A} Σ_{D⊂B⊂A} (−1)^{|B\D|} g(D)
  = Σ_{D⊂A} Σ_{E⊂A\D} (−1)^{|E|} g(D)
  = Σ_{D⊂A} g(D) Σ_{E⊂A\D} (−1)^{|E|}
  = Σ_{D⊂A} g(D) Σ_{k=0}^{|A\D|} C(|A\D|, k) (−1)^k
  = g(A),

since the inner sum is 0 if |A\D| > 0 and 1 if D = A.
Proof of Möbius Inversion Formula

Proof of [∀ A ⊂ V, g(A) = Σ_{B⊂A} f(B)] ⇒ [∀ A ⊂ V, f(A) = Σ_{B⊂A} (−1)^{|A\B|} g(B)]:

Let A ⊂ V. Then,

Σ_{B⊂A} (−1)^{|A\B|} g(B) = Σ_{B⊂A} (−1)^{|A\B|} Σ_{D⊂B} f(D)
  = Σ_{D⊂A} Σ_{D⊂B⊂A} (−1)^{|A\B|} f(D)
  = Σ_{D⊂A} Σ_{E⊂A\D} (−1)^{|A\(E∪D)|} f(D)
  = Σ_{D⊂A} f(D) Σ_{E⊂A\D} (−1)^{|A|−|E|−|D|}
  = Σ_{D⊂A} f(D) (−1)^{|A|−|D|} Σ_{E⊂A\D} (−1)^{|E|}
  = Σ_{D⊂A} f(D) (−1)^{|A|−|D|} Σ_{k=0}^{|A\D|} C(|A\D|, k) (−1)^k
  = f(A),

since the inner sum is 0 if |A\D| > 0 and 1 if D = A.
Proof That MRFs with Positive Probability Are Gibbs Fields

Notation:
- Let λ0 ∈ Λ denote a fixed value in the phase space Λ.
- For a subset A ⊂ V and a configuration x ∈ Ω = Λ^V, we denote by x^A the configuration that is the same as x at each site of A and is λ0 elsewhere, that is,

(x^A)_α = x_α if α ∈ A, and λ0 otherwise.

- In particular, we have x^V = x and we denote by x^0 = x^∅ the configuration with the value λ0 at all the sites.

Candidate energy:
- Remark that if P is a Gibbs distribution with energy E, then

P(x) = (1/Z) e^{−E(x)} ⇒ E(x) = log(P(x^0)/P(x)) + E(x^0).

Besides, the constant E(x^0) has no influence since it can be included in Z.

- Hence, given our Markov distribution P, we now define a function E : Ω → R by

E(x) = log(P(x^0)/P(x))   (P(x) > 0 by hypothesis).
Proof That MRFs with Positive Probability Are Gibbs Fields

Candidate energy: E(x) = log(P(x^0)/P(x)), so that P(x) = P(x^0) e^{−E(x)}.

Goal: Show that this energy E derives from a family of potentials on cliques (U_C)_{C∈C}: for all x ∈ Ω,

E(x) = Σ_{C∈C} U_C(x).

- Let x be a given configuration.
- For every subset A ⊂ V, we define

U_A(x) = Σ_{B⊂A} (−1)^{|A\B|} E(x^B).

- By the Möbius inversion formula, we have, for all A ⊂ V,

E(x^A) = Σ_{B⊂A} U_B(x).

- In particular, taking A = V, we get E(x) = Σ_{A⊂V} U_A(x) and thus

P(x) = P(x^0) e^{−E(x)} = P(x^0) exp(−Σ_{A⊂V} U_A(x)).
Proof That MRFs with Positive Probability Are Gibbs Fields

P(x) = P(x^0) e^{−E(x)} = P(x^0) exp(−Σ_{A⊂V} U_A(x)).

- P nearly has the targeted form!
- U_A(x) = Σ_{B⊂A} (−1)^{|A\B|} E(x^B) only depends on the values of x at the sites of A.
- It only remains to show that U_A(x) = 0 if A is not a clique.
- Indeed, in this case the energy

E(x) = Σ_{A⊂V} U_A(x) = Σ_{C∈C} U_C(x)

derives from a family of potentials, that is, P is a Gibbs field.
- The proof that U_A(x) = 0 if A is not a clique will use the Markov property satisfied by P (not used yet).
Proof That MRFs with Positive Probability Are Gibbs Fields

Proof of U_A(x) = 0 if A is not a clique:

- Let A ⊂ V be such that A is not a clique, that is, A ∉ C.
- Then A contains two vertices α and β that are not neighbors in the graph (V, E).

Splitting the sum over B ⊂ A according to whether B contains α and/or β:

U_A(x) = Σ_{B⊂A, α∈B, β∈B} (−1)^{|A\B|} E(x^B) + Σ_{B⊂A, α∉B, β∉B} (−1)^{|A\B|} E(x^B)
       + Σ_{B⊂A, α∈B, β∉B} (−1)^{|A\B|} E(x^B) + Σ_{B⊂A, α∉B, β∈B} (−1)^{|A\B|} E(x^B)

  = Σ_{B⊂A\{α,β}} (−1)^{|A\B|} [ E(x^{B∪{α,β}}) + E(x^B) − E(x^{B∪{α}}) − E(x^{B∪{β}}) ].

Let us show that the bracketed term is equal to 0.
Proof That MRFs with Positive Probability Are Gibbs Fields

Proof of U_A(x) = 0 if A is not a clique.
Goal: E(x^{B∪{α,β}}) + E(x^B) − E(x^{B∪{α}}) − E(x^{B∪{β}}) = 0.

Since E(x) = log(P(x^0)/P(x)),

E(x^{B∪{α,β}}) + E(x^B) − E(x^{B∪{α}}) − E(x^{B∪{β}}) = log [ P(x^{B∪{α}}) P(x^{B∪{β}}) / ( P(x^{B∪{α,β}}) P(x^B) ) ].

- Let us use the conditional independence with the partition V1 = {α}, V2 = V \ {α, β}, and V3 = {β}.
- Remark that the four states x^{B∪{α}}, x^{B∪{β}}, x^{B∪{α,β}}, and x^B have the same restriction (denoted x2) to V2 = V \ {α, β}.

P(x^{B∪{α,β}}) = P(X_α = x_α, X_β = x_β, X2 = x2)
  = P(X_α = x_α, X_β = x_β | X2 = x2) P(X2 = x2)
  = P(X_α = x_α | X2 = x2) P(X_β = x_β | X2 = x2) P(X2 = x2).

Similarly,

P(x^B) = P(X_α = λ0 | X2 = x2) P(X_β = λ0 | X2 = x2) P(X2 = x2),
P(x^{B∪{α}}) = P(X_α = x_α | X2 = x2) P(X_β = λ0 | X2 = x2) P(X2 = x2),
P(x^{B∪{β}}) = P(X_α = λ0 | X2 = x2) P(X_β = x_β | X2 = x2) P(X2 = x2).
Proof That MRFs with Positive Probability Are Gibbs Fields

Proof of U_A(x) = 0 if A is not a clique.
Goal: E(x^{B∪{α,β}}) + E(x^B) − E(x^{B∪{α}}) − E(x^{B∪{β}}) = 0.

Hence,

P(x^{B∪{α}}) P(x^{B∪{β}}) / ( P(x^{B∪{α,β}}) P(x^B) ) = 1,

which shows that

E(x^{B∪{α,β}}) + E(x^B) − E(x^{B∪{α}}) − E(x^{B∪{β}}) = log [ P(x^{B∪{α}}) P(x^{B∪{β}}) / ( P(x^{B∪{α,β}}) P(x^B) ) ] = 0,

and thus U_A(x) = 0.
Outline
The Ising Model
Markov Random Fields and Gibbs Fields
Sampling Gibbs Fields
Considered Problems

Gibbs fields pose several difficult computational problems:

1. Compute samples from the distribution.

2. Compute the mode, i.e. the most probable state x = argmax P(x) (if it is unique…).

3. Compute the marginal distribution of each component X_α for some fixed α ∈ V.

- The simplest method to sample from a Gibbs field is to use a type of Markov chain Monte Carlo (MCMC) method known as the Metropolis algorithm.
- The main idea of the Metropolis algorithm is to let a Markov chain taking values in the space of configurations Ω = Λ^V evolve.
- Running a sampler for a reasonable length of time enables one to estimate the marginals of each component X_α or the mean of other random functions of the state.
- Simulated annealing: Samples at very low temperatures will cluster around the mode of the field. But the Metropolis algorithm takes a very long time to give good samples if the temperature is too low. Simulated annealing consists in starting the Metropolis algorithm at sufficiently high temperatures and gradually lowering the temperature.
Recalls on Markov Chains

We refer to the textbook [Brémaud 1998] for a complete reference on Markov chains. The main reference for this section is [Mumford and Desolneux 2010].

Definition (Markov Chain)
Let Ω be a finite set of states. A sequence (X_n)_{n∈N} of random variables taking values in Ω is said to be a Markov chain if it satisfies the Markov condition: for all n ≥ 1 and x_0, x_1, …, x_n ∈ Ω,

P(X_n = x_n | X_0 = x_0, X_1 = x_1, …, X_{n−1} = x_{n−1}) = P(X_n = x_n | X_{n−1} = x_{n−1}),

whenever both conditional probabilities are well defined, that is, whenever P(X_0 = x_0, X_1 = x_1, …, X_{n−1} = x_{n−1}) > 0.
Moreover the Markov chain (X_n)_{n∈N} is said to be homogeneous if the transition probability does not depend on n, that is, for all n ≥ 1 and x, y ∈ Ω,

P(X_n = y | X_{n−1} = x) = P(X_1 = y | X_0 = x).

In what follows Markov chains will always be assumed homogeneous. In this case, the transition matrix Q of the chain is defined as the |Ω| × |Ω| matrix of all transition probabilities

Q(x, y) = q_{x→y} = P(X_1 = y | X_0 = x) [ = P(X_n = y | X_{n−1} = x) for all n ].
Recalls on Markov Chains

Proposition
For all n ≥ 1, the matrix Q^n = Q × ⋯ × Q gives the law of X_n given the initial state X_0, that is,

P(X_n = y | X_0 = x) = Q^n(x, y).

Proof.
By induction. This is true for n = 1. Suppose it is true for some n ≥ 1. Then,

P(X_{n+1} = y | X_0 = x) = Σ_{z∈Ω} P(X_{n+1} = y, X_n = z | X_0 = x)
  = Σ_{z∈Ω} P(X_{n+1} = y, X_n = z, X_0 = x) / P(X_0 = x)
  = Σ_{z∈Ω} [ P(X_{n+1} = y, X_n = z, X_0 = x) / P(X_n = z, X_0 = x) ] · [ P(X_n = z, X_0 = x) / P(X_0 = x) ]
  = Σ_{z∈Ω} P(X_{n+1} = y | X_n = z, X_0 = x) P(X_n = z | X_0 = x)
  = Σ_{z∈Ω} P(X_{n+1} = y | X_n = z) Q^n(x, z)
  = Σ_{z∈Ω} Q^n(x, z) Q(z, y) = Q^{n+1}(x, y).
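The proposition can be checked numerically: the n-step transition probabilities are exactly the entries of the matrix power Q^n. A sketch with an arbitrary 3-state transition matrix of our choosing:

```python
import numpy as np

# An arbitrary transition matrix Q on 3 states (each row is a distribution).
Q = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# P(X_n = y | X_0 = x) = Q^n(x, y): the matrix power of Q.
Qn = np.linalg.matrix_power(Q, 4)

# Each row of Q^n is still a probability distribution on the states.
assert np.allclose(Qn.sum(axis=1), 1.0)
# Two-step case by hand: Q^2 = Q @ Q.
assert np.allclose(np.linalg.matrix_power(Q, 2), Q @ Q)
```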
Recalls on Markov Chains

Definition
We say that the Markov chain is irreducible if for all x, y ∈ Ω, there exists n ≥ 0 such that Q^n(x, y) > 0.

- If a Markov chain is irreducible then every pair of states can be "connected" by the chain.
- Some Markov chains can have a periodic behavior, visiting all the states of Ω in a prescribed order. This is generally a behavior that is best avoided.

Definition
We say that a Markov chain is aperiodic if for all x ∈ Ω the greatest common divisor of all the n ≥ 1 such that Q^n(x, x) > 0 is 1.
Equilibrium Probability Distribution

Definition
A probability distribution Π on Ω is an equilibrium probability distribution of Q if

∀ y ∈ Ω, Σ_{x∈Ω} Π(x) Q(x, y) = Π(y).

Interpretation: If the initial state X_0 follows the distribution Π, then all the random variables X_n also follow the distribution Π:

P(X_1 = y) = Σ_{x∈Ω} P(X_1 = y | X_0 = x) P(X_0 = x) = Σ_{x∈Ω} Q(x, y) Π(x) = Π(y).

Theorem
If Q is irreducible, then there exists a unique equilibrium probability distribution Π for Q. If moreover Q is aperiodic, then

∀ x, y ∈ Ω, lim_{n→+∞} Q^n(x, y) = Π(y).
Consequence: If X_0 follows any distribution P_0, then, for all y ∈ Ω,

P(X_n = y) = Σ_{x∈Ω} P(X_n = y | X_0 = x) P(X_0 = x)
  = Σ_{x∈Ω} Q^n(x, y) P_0(x) → Σ_{x∈Ω} Π(y) P_0(x) = Π(y) as n → +∞.
Equilibrium Probability Distribution

Consequence: P(X_n = y) → Π(y) as n → +∞.

- Whatever the distribution of the initial state X_0, as n tends to +∞, the sequence (X_n)_{n∈N} converges in distribution to Π.

Approximate simulation of the distribution Π:

1. Draw randomly an initial state x_0 (with some distribution P_0).

2. For n = 1 to N (N "large"), compute x_n by simulating the discrete distribution

P(X_n | X_{n−1} = x_{n−1}) = Q(x_{n−1}, ·).

3. Return the random variate x_N.

This procedure is called a Markov chain Monte Carlo (MCMC) method and is at the heart of the Metropolis algorithm to simulate Gibbs fields.
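The three-step procedure above can be sketched on a small toy chain (the 3-state matrix is our assumption): Π is obtained as the left eigenvector of Q for eigenvalue 1, and the rows of Q^N converge to it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy irreducible, aperiodic transition matrix on 3 states.
Q = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Equilibrium distribution Pi: left eigenvector of Q for eigenvalue 1.
w, vl = np.linalg.eig(Q.T)
pi = np.real(vl[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

# MCMC: draw an initial state, then repeatedly sample X_n ~ Q(x_{n-1}, .).
x = 0
N = 200
for _ in range(N):
    x = rng.choice(3, p=Q[x])
assert x in (0, 1, 2)  # x_N is (approximately) a sample from Pi

# Whatever the starting state, the rows of Q^N are close to Pi.
assert np.allclose(np.linalg.matrix_power(Q, N)[0], pi, atol=1e-8)
```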
Metropolis algorithm

Framework:
- Ω is a finite state space (Ω = Λ^V in our case).
- Given any energy function E : Ω → R, we define a probability distribution on Ω at all temperatures T via the formula:

P_T(x) = (1/Z_T) e^{−E(x)/T}, where Z_T = Σ_{x∈Ω} e^{−E(x)/T}.

- Idea: Simulate a Markov chain (X_n)_{n≥0}, X_n ∈ Ω, whose equilibrium probability distribution is P_T, so that, as n tends to +∞, the distribution of X_n will tend to P_T.
- We start with some simple Markov chain, given by a transition matrix Q(x, y) = (q_{x→y})_{(x,y)∈Ω²}, q_{x→y} being the probability of the transition from a state x ∈ Ω to a state y ∈ Ω.
- Hypotheses on Q:
  - Q is symmetric (q_{x→y} = q_{y→x}),
  - Q is irreducible,
  - Q is aperiodic.
Metropolis algorithm

Definition
The Metropolis transition matrix associated with P_T and Q is the matrix P defined by

p_{x→y} = q_{x→y} if x ≠ y and P_T(y) ≥ P_T(x),
p_{x→y} = (P_T(y)/P_T(x)) q_{x→y} if x ≠ y and P_T(y) < P_T(x),
p_{x→x} = 1 − Σ_{z≠x} p_{x→z}.

Theorem
The Metropolis transition matrix P = (p_{x→y}) is irreducible and aperiodic if Q = (q_{x→y}) is. In addition, the equilibrium probability distribution of P is P_T, that is,

∀ x, y ∈ Ω, lim_{n→+∞} P^n(x, y) = P_T(y).
Metropolis algorithm

Proof: Since P_T(y) ≠ 0 for all y ∈ Ω,

Q(x, y) ≠ 0 ⇒ P(x, y) ≠ 0,

and by induction,

Q^n(x, y) ≠ 0 ⇒ P^n(x, y) ≠ 0.

- Like Q, P is irreducible and aperiodic.
- In consequence, P has a unique equilibrium probability distribution.

We now prove that the "detailed balance" equations hold, which will imply that P_T is the equilibrium probability distribution of P:

∀ x, y ∈ Ω, P_T(x) p_{x→y} = P_T(y) p_{y→x}.

Let x, y ∈ Ω. If x = y, this is obvious. If x ≠ y, we can suppose that P_T(x) ≥ P_T(y). Then,

P_T(x) p_{x→y} = P_T(x) (P_T(y)/P_T(x)) q_{x→y} = P_T(y) q_{x→y} = P_T(y) q_{y→x} = P_T(y) p_{y→x}.

Summing this equality over all x, one gets, for all y ∈ Ω,

Σ_{x∈Ω} P_T(x) p_{x→y} = P_T(y) Σ_{x∈Ω} p_{y→x} = P_T(y),

i.e. P_T is the equilibrium probability distribution of the transition matrix P.
Metropolis algorithm

Practical aspect: choice of the transition matrix Q.

- The uniform change satisfies the hypotheses:

∀ x, y ∈ Ω, q_{x→y} = 1/|Ω|.

However, this choice is not practical, since it amounts to attempting to draw a structured image at random from a white noise image.

- A more practical transition matrix is the one corresponding to the following procedure: one selects a pixel/vertex α ∈ V at random, and one draws uniformly a new value λ ∈ Λ \ {x_α} for this vertex. Hence,

q_{x→y} = 1/(|V| (|Λ| − 1)) if x and y differ at exactly one vertex/pixel, and 0 otherwise.

- One of the main problems of the Metropolis algorithm is that many proposed moves x → y will not be accepted.
- The Gibbs sampler attempts to reduce this problem.
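The single-site Metropolis procedure with the flip proposal can be sketched for the Ising model. This is a sketch (the function name and parameters are ours); since the proposal flips one pixel, only the local energy difference needs to be recomputed, and the move is accepted with probability min(1, e^{−ΔE/T}):

```python
import numpy as np

def metropolis_ising(I, T=1.5, c=1.0, n_iter=100_000, seed=0):
    """Single-site Metropolis sampler for the Ising model associated with I.
    Proposal: pick a uniform pixel and flip its sign (the only other value
    in Lambda = {-1, 1}); accept with probability min(1, exp(-dE / T))."""
    rng = np.random.default_rng(seed)
    M, N = I.shape
    J = rng.choice([-1, 1], size=(M, N))
    for _ in range(n_iter):
        k, l = rng.integers(M), rng.integers(N)
        new = -J[k, l]
        # Energy difference: only the data term at (k, l) and the edges
        # touching (k, l) change when J[k, l] is flipped.
        dE = c * ((I[k, l] - new) ** 2 - (I[k, l] - J[k, l]) ** 2)
        for dk, dl in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = k + dk, l + dl
            if 0 <= a < M and 0 <= b < N:
                dE += (new - J[a, b]) ** 2 - (J[k, l] - J[a, b]) ** 2
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            J[k, l] = new
    return J

J = metropolis_ising(np.ones((16, 16)), T=0.5, n_iter=20_000)
assert set(np.unique(J)) <= {-1, 1}
```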
Gibbs Sampler Algorithm

- Let Ω = Λ^V be the Gibbs state space.
- The Gibbs sampler selects one site α ∈ V at random and then picks a new value λ for x_α with probability distribution given by P_T conditioned on the values of the state at all other vertices:

p_{x→y} = (1/|V|) P_T(X_α = y_α | X_{V\{α}} = x_{V\{α}}) if y differs from x only at α,
p_{x→x} = Σ_{α∈V} (1/|V|) P_T(X_α = x_α | X_{V\{α}} = x_{V\{α}}),
p_{x→y} = 0 otherwise.

- The transition matrix P satisfies the same properties as the one of the Metropolis algorithm (exercise).
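A sketch of the Gibbs sampler for the Ising model: each visited pixel is resampled from its exact conditional distribution given its 4 neighbors, which only involves the cliques containing that pixel. Here sites are swept in scan order rather than selected uniformly at random, a common variant of the procedure above; the names are ours.

```python
import numpy as np

def gibbs_sample_ising(I, T=1.5, c=1.0, n_sweeps=50, seed=0):
    """Gibbs sampler: resample each J[k, l] from its conditional law
    P_T(J[k, l] = lam | rest) proportional to exp(-local_energy(lam) / T)."""
    rng = np.random.default_rng(seed)
    M, N = I.shape
    J = rng.choice([-1, 1], size=(M, N))
    for _ in range(n_sweeps):
        for k in range(M):
            for l in range(N):
                # Local energy of each candidate value lam in {-1, 1}:
                # data term at (k, l) plus the edges touching (k, l).
                e = {}
                for lam in (-1, 1):
                    e_loc = c * (I[k, l] - lam) ** 2
                    for dk, dl in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        a, b = k + dk, l + dl
                        if 0 <= a < M and 0 <= b < N:
                            e_loc += (lam - J[a, b]) ** 2
                    e[lam] = e_loc
                # P(J[k, l] = 1 | rest) = 1 / (1 + exp((e[1] - e[-1]) / T)).
                p_plus = 1.0 / (1.0 + np.exp((e[1] - e[-1]) / T))
                J[k, l] = 1 if rng.random() < p_plus else -1
    return J

J = gibbs_sample_ising(np.zeros((8, 8)), n_sweeps=10)
assert set(np.unique(J)) <= {-1, 1}
```

Unlike Metropolis, every step produces a move drawn from the exact conditional distribution, so no proposal is ever "rejected".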
Bibliographic references
P. Brémaud, Markov Chains: Gibbs Fields, Monte Carlo Simulation, andQueues, Springer, 1998
D. Mumford and A. Desolneux, Pattern Theory: The Stochastic Analysisof Real-World Signals, Ak Peters Series, 2010