Markov Chain Basic
2014.07.11
Sanghyuk Chun
Network Intelligence and Analysis Lab
Previous Chapters
• Exact Counting
• #P-Completeness
• Sampling and Counting
Remaining Chapters
• Markov Chain Basics (Today!)
• An ergodic MC has a unique stationary distribution
• Some basic concepts (coupling, mixing time)
• Coupling from the past
• Coupling in detail
• Ising model
• Bounding mixing time via coupling
• Random spanning trees
• Path coupling framework
• MC for k-colorings of a graph
In this chapter…
• Introduce Markov chains
• Show a potential algorithmic use of Markov chains: sampling from complex distributions
• Prove that an ergodic Markov chain always converges to a unique stationary distribution
• Introduce coupling techniques and mixing time
Markov Chain
• For a finite state space Ω, we say a sequence of random variables (X_t) on Ω is a Markov chain if the sequence is Markovian in the following sense:
• For all t and all x_0, …, x_t, y ∈ Ω, we require
  Pr(X_{t+1} = y | X_0 = x_0, …, X_t = x_t) = Pr(X_{t+1} = y | X_t = x_t)
• The Markov property: "memoryless"
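The definition above can be sketched in code: a time-homogeneous Markov chain is fully specified by a transition matrix, and each step looks only at the current state. A minimal sketch, assuming an illustrative two-state "weather" chain (the state names and probabilities are not from the slides):

```python
import random

# Illustrative transition matrix: rows are current states, entries are
# probabilities of the next state. Each row must sum to 1.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def step(state, rng):
    """Sample X_{t+1} given only X_t -- the Markov (memoryless) property."""
    r = rng.random()
    acc = 0.0
    for nxt, p in P[state].items():
        acc += p
        if r < acc:
            return nxt
    return nxt  # guard against floating-point round-off

def run_chain(x0, t, rng):
    xs = [x0]
    for _ in range(t):
        xs.append(step(xs[-1], rng))  # depends on xs[-1] only
    return xs

rng = random.Random(0)
trajectory = run_chain("sunny", 10, rng)
```

Note that `step` never inspects the earlier history `xs[:-1]`; that is exactly the conditional-independence requirement in the definition.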
Example of a Markov Chain (Card Shuffling)
• For a finite state space Ω, a sequence of random variables (X_t) on Ω is a Markov chain if the sequence is Markovian
• Let Ω denote the set of card orderings (e.g. X_1 = (1, 2, 3, …, 52))
• The next shuffle state depends only on the previous shuffle state; that is, X_{t+1} depends only on X_t
• Question 1: How can we shuffle the cards uniformly?
• Question 2: Can we get a fast uniform shuffling algorithm?
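One concrete shuffling chain, sketched under assumptions not in the slides: the "random transposition" shuffle swaps two uniformly chosen positions each step (allowing i = j, which makes the chain lazy and hence aperiodic). A 3-card deck keeps the state space at 3! = 6 orderings:

```python
import random
from itertools import permutations

def transposition_step(deck, rng):
    """One Markov chain step: swap two uniformly random positions.
    i == j is allowed, so the chain may stay put (laziness => aperiodicity)."""
    i, j = rng.randrange(len(deck)), rng.randrange(len(deck))
    nxt = list(deck)
    nxt[i], nxt[j] = nxt[j], nxt[i]
    return tuple(nxt)

rng = random.Random(42)
state = (1, 2, 3)
visited = set()
for _ in range(5000):
    state = transposition_step(state, rng)
    visited.add(state)
```

Running the chain long enough visits every ordering, illustrating irreducibility; how fast it approaches the uniform distribution is exactly the mixing-time question raised above.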
Transition Matrix
• P(x, y) = Pr(X_{t+1} = y | X_t = x)
• Transitions are independent of the time (time-homogeneous)
• The t-step distribution is defined in the natural way:
  P^t(x, y) = P(x, y) if t = 1
  P^t(x, y) = Σ_{z∈Ω} P(x, z) P^{t−1}(z, y) if t > 1
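The t-step recursion above translates directly into code. A small sketch, assuming an illustrative two-state matrix (the states and probabilities are not from the slides):

```python
# Illustrative transition matrix over a two-element state space.
P = {
    "a": {"a": 0.5, "b": 0.5},
    "b": {"a": 0.25, "b": 0.75},
}
STATES = list(P)

def p_t(x, y, t):
    """P^t(x, y) via the recursion: condition on the first step's state z."""
    if t == 1:
        return P[x][y]
    return sum(P[x][z] * p_t(z, y, t - 1) for z in STATES)
```

Each row of P^t is again a probability distribution, so summing `p_t(x, y, t)` over y should give 1 for any fixed x and t.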
Stationary Distribution
• A distribution π is a stationary distribution if it is invariant with respect to the transition matrix:
  for all y ∈ Ω, π(y) = Σ_{x∈Ω} π(x) P(x, y)
• Theorem 1
• For a finite ergodic Markov chain, there exists a unique stationary distribution π
• Proof?
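The invariance condition is easy to check mechanically. A sketch, assuming an illustrative two-state matrix whose stationary distribution π = (1/3, 2/3) was solved by hand from π = πP:

```python
# Illustrative chain; solving pi = pi P by hand gives pi = (1/3, 2/3).
P = {
    "a": {"a": 0.5, "b": 0.5},
    "b": {"a": 0.25, "b": 0.75},
}
pi = {"a": 1 / 3, "b": 2 / 3}

def is_stationary(pi, P, tol=1e-9):
    """Check pi(y) = sum_x pi(x) P(x, y) for every state y."""
    return all(
        abs(pi[y] - sum(pi[x] * P[x][y] for x in P)) < tol
        for y in P
    )
```

For comparison, the uniform distribution (1/2, 1/2) is not invariant for this matrix, so the check rejects it.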
Ergodic Markov Chain
• A Markov chain is ergodic if there exists t such that for all x, y ∈ Ω, P^t(x, y) > 0
• It is possible to go from every state to every state (not necessarily in one move)
• For a finite MC, the following conditions together are equivalent to ergodicity:
• Irreducible: for all x, y ∈ Ω, there exists t = t(x, y) s.t. P^t(x, y) > 0
• Aperiodic: for all x ∈ Ω, gcd{t : P^t(x, x) > 0} = 1
• Since ergodic MCs eventually reach a unique stationary distribution regardless of their initial state, they are useful algorithmic tools
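The definition suggests a direct test: look for a t at which every entry of P^t is positive. A sketch that tracks only *which* entries are positive (a boolean reachability matrix), so the exact probabilities never matter; the bound t ≤ n² is an assumption that suffices for small examples:

```python
def is_ergodic(P, states, max_t=None):
    """Return True iff some power P^t has all entries positive."""
    n = len(states)
    max_t = max_t or n * n  # assumed cap; enough for these tiny examples
    # reach[(x, y)] == True  <=>  P^k(x, y) > 0 for the current power k
    reach = {(x, y): P[x][y] > 0 for x in states for y in states}
    for _ in range(max_t):
        if all(reach.values()):
            return True
        # advance one power: P^{k+1}(x, y) > 0 iff some z has
        # P(x, z) > 0 and P^k(z, y) > 0
        reach = {
            (x, y): any(P[x][z] > 0 and reach[(z, y)] for z in states)
            for x in states for y in states
        }
    return all(reach.values())

P_ergodic = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.25, "b": 0.75}}
P_periodic = {"a": {"a": 0.0, "b": 1.0}, "b": {"a": 1.0, "b": 0.0}}
```

The two-state deterministic swap `P_periodic` is irreducible but has period 2, so no power is entirely positive; the check correctly rejects it.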
Algorithmic Usage of Ergodic Markov Chains
• Goal: we have a probability distribution we'd like to generate random samples from
• Solution via MC: if we can design an ergodic MC whose unique stationary distribution is the desired distribution, we can run the chain to (approximately) sample from that distribution
• Example: sampling matchings
Sampling Matchings
• For a graph G = (V, E), let Ω denote the set of matchings of G
• We define a MC on Ω whose transitions are as follows:
• Choose an edge e uniformly at random from E
• Let X′ = X_t ∪ {e} if e ∉ X_t, and X′ = X_t \ {e} if e ∈ X_t
• If X′ ∈ Ω, then set X_{t+1} = X′ with probability ½; otherwise set X_{t+1} = X_t
• The MC is aperiodic (P(M, M) ≥ 1/2 for all M ∈ Ω)
• The MC is irreducible (every matching connects to the empty matching) with symmetric transition probabilities
• A symmetric transition matrix has the uniform stationary distribution
• Thus, the unique stationary distribution is uniform over all matchings of G
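The transition rule above can be sketched directly. The path graph on 3 vertices (edges (0,1) and (1,2)) is an illustrative assumption; its matchings are ∅, {(0,1)}, and {(1,2)}, so a long run should visit exactly those three states:

```python
import random

EDGES = [(0, 1), (1, 2)]  # assumed example graph: a path on 3 vertices

def is_matching(edge_set):
    """A set of edges is a matching iff no two edges share an endpoint."""
    endpoints = [v for e in edge_set for v in e]
    return len(endpoints) == len(set(endpoints))

def matching_step(X, rng):
    e = rng.choice(EDGES)                     # edge uniformly at random
    X_prime = X - {e} if e in X else X | {e}  # flip e in or out of X_t
    if is_matching(X_prime) and rng.random() < 0.5:
        return X_prime                        # accept with probability 1/2
    return X                                  # otherwise stay put (lazy)

rng = random.Random(1)
X = frozenset()
visited = set()
for _ in range(2000):
    X = matching_step(X, rng)
    visited.add(X)
```

Staying put with probability at least ½ is what makes the chain aperiodic, and flipping single edges connects every matching to the empty matching, matching the irreducibility argument on the slide.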
Proof of Theorem (Introduction)
• We will prove the theorem using the coupling technique and the Coupling Lemma
Coupling Technique
• For distributions μ, ν on a finite set Ω, a distribution ω on Ω × Ω is a coupling if
  Σ_{y∈Ω} ω(x, y) = μ(x) for all x ∈ Ω, and Σ_{x∈Ω} ω(x, y) = ν(y) for all y ∈ Ω
• In other words, ω is a joint distribution whose marginal distributions are the appropriate distributions
• The variation distance between μ and ν is defined as
  d_TV(μ, ν) = ½ Σ_{z∈Ω} |μ(z) − ν(z)|
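Both definitions are one-liners in code. A sketch with illustrative two-point distributions; the independent product μ × ν is always a valid coupling, which the marginal check confirms:

```python
def d_tv(mu, nu):
    """Total variation distance: half the L1 distance between the dicts."""
    states = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(z, 0.0) - nu.get(z, 0.0)) for z in states)

def is_coupling(omega, mu, nu, tol=1e-9):
    """omega maps pairs (x, y) to probabilities; check both marginals."""
    ok_mu = all(
        abs(sum(p for (x, _), p in omega.items() if x == z) - mu.get(z, 0.0)) < tol
        for z in mu
    )
    ok_nu = all(
        abs(sum(p for (_, y), p in omega.items() if y == z) - nu.get(z, 0.0)) < tol
        for z in nu
    )
    return ok_mu and ok_nu

mu = {"a": 0.5, "b": 0.5}
nu = {"a": 0.75, "b": 0.25}
independent = {(x, y): mu[x] * nu[y] for x in mu for y in nu}
```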
Coupling Lemma
• Let μ, ν be distributions on a finite set Ω, and let (X, Y) ∼ ω be a coupling of μ and ν. Then:
• (a) d_TV(μ, ν) ≤ Pr(X ≠ Y)
• (b) There exists a coupling ω for which d_TV(μ, ν) = Pr(X ≠ Y)
Proof of Lemma (a)
• For any A ⊆ Ω:
  μ(A) − ν(A) = Pr(X ∈ A) − Pr(Y ∈ A) ≤ Pr(X ∈ A, Y ∉ A) ≤ Pr(X ≠ Y)
• Taking the maximum over A and using d_TV(μ, ν) = max_{A⊆Ω} (μ(A) − ν(A)) gives d_TV(μ, ν) ≤ Pr(X ≠ Y)
Proof of Lemma (b)
• For all z ∈ Ω, let
• ω(z, z) = min{μ(z), ν(z)}
• Then Pr(X = Y) = Σ_z min{μ(z), ν(z)} = 1 − d_TV(μ, ν), so d_TV(μ, ν) = Pr(X ≠ Y)
• We need to complete the construction of ω in a valid way
• For y, z ∈ Ω, y ≠ z, let
  ω(y, z) = (μ(y) − ω(y, y)) (ν(z) − ω(z, z)) / d_TV(μ, ν)
• It is straightforward to verify that ω is a valid coupling
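The construction in the proof is short enough to implement and verify numerically. A sketch with illustrative three-point distributions; the off-diagonal formula matches the one above, spreading the leftover mass of μ and ν proportionally:

```python
def d_tv(mu, nu):
    return 0.5 * sum(abs(mu[z] - nu[z]) for z in mu)

def optimal_coupling(mu, nu):
    """Diagonal gets min(mu(z), nu(z)); leftovers spread off-diagonal."""
    d = d_tv(mu, nu)
    omega = {(z, z): min(mu[z], nu[z]) for z in mu}
    if d > 0:
        for y in mu:
            for z in mu:
                if y != z:
                    # leftover mass of mu at y, times leftover of nu at z
                    omega[(y, z)] = (
                        (mu[y] - min(mu[y], nu[y]))
                        * (nu[z] - min(nu[z], mu[z])) / d
                    )
    return omega

mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}
omega = optimal_coupling(mu, nu)
pr_not_equal = sum(p for (x, y), p in omega.items() if x != y)
```

Checking `pr_not_equal` against `d_tv(mu, nu)` confirms the equality case of the Coupling Lemma, and summing rows and columns of ω recovers μ and ν.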
Couplings for Markov Chains
• Consider a pair of Markov chains X_t, Y_t on Ω with transition matrices P_X, P_Y respectively
• Typically, the MCs are identical in applications (P_X = P_Y)
• The Markov chain (X′_t, Y′_t) on Ω × Ω is a Markovian coupling if
  Pr(X′_{t+1} = x′ | X′_t = x, Y′_t = y) = P_X(x, x′) and Pr(Y′_{t+1} = y′ | X′_t = x, Y′_t = y) = P_Y(y, y′)
• For such a Markovian coupling, the variation distance satisfies
  d_TV(P_X^t(x, ·), P_Y^t(y, ·)) ≤ Pr(X′_t ≠ Y′_t | X′_0 = x, Y′_0 = y)
• If we choose Y_0 from the stationary distribution π, then we have
  d_TV(P_X^t(x, ·), π) ≤ Pr(X′_t ≠ Y′_t)
• This shows how we can use a coupling to bound the distance from stationarity
Proof of Theorem (1/4)
• Create MCs X_t, Y_t, where the initial states X_0, Y_0 are arbitrary states in Ω
• Create a coupling for these chains in the following way:
• From (X_t, Y_t), choose X_{t+1} according to the transition matrix P
• If Y_t = X_t, set Y_{t+1} = X_{t+1}; otherwise choose Y_{t+1} according to P, independently of the choice for X_{t+1}
• By ergodicity, there exists t* s.t. for all x, y ∈ Ω, P^{t*}(x, y) ≥ ε > 0
• Therefore, for all X_0, Y_0 ∈ Ω,
  Pr(X_{t*} ≠ Y_{t*}) ≤ 1 − ε
• The same argument applies to each subsequent block of t* steps (from t* to 2t*, and so on)
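The coupling in the proof is easy to simulate: run two copies of the same chain independently until they meet, then move them together. A sketch, assuming an illustrative two-state matrix (not from the slides):

```python
import random

P = {
    "a": {"a": 0.5, "b": 0.5},
    "b": {"a": 0.25, "b": 0.75},
}

def step(x, rng):
    """One step of the two-state chain P."""
    return "a" if rng.random() < P[x]["a"] else "b"

def coupled_run(x0, y0, t, rng):
    xs, ys = [x0], [y0]
    for _ in range(t):
        x_next = step(xs[-1], rng)
        # once the chains have met, move them together forever after;
        # before that, Y moves independently of X's choice
        y_next = x_next if ys[-1] == xs[-1] else step(ys[-1], rng)
        xs.append(x_next)
        ys.append(y_next)
    return xs, ys

rng = random.Random(7)
xs, ys = coupled_run("a", "b", 200, rng)
# first time the two chains coalesce (virtually certain within 200 steps)
meet = next((t for t in range(len(xs)) if xs[t] == ys[t]), None)
```

By construction, the chains agree at every step after `meet`, which is the "once X_s = Y_s, they stay together" observation used in the next part of the proof.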
Proof of Theorem (2/4)
• Recall the coupling: from (X_t, Y_t), choose X_{t+1} according to the transition matrix P; if Y_t = X_t, set Y_{t+1} = X_{t+1}, otherwise choose Y_{t+1} according to P, independently of the choice for X_{t+1}
• Once X_s = Y_s, we have X_{s′} = Y_{s′} for all s′ ≥ s
• From the earlier observation, Pr(X_{t*} ≠ Y_{t*}) ≤ 1 − ε for any initial states
Proof of Theorem (3/4)
• For integer k > 0,
  Pr(X_{kt*} ≠ Y_{kt*}) ≤ (1 − ε)^k
• Since X_t = Y_t implies X_{t′} = Y_{t′} for all t′ ≥ t, Pr(X_t ≠ Y_t) is non-increasing in t; therefore
  Pr(X_t ≠ Y_t) ≤ (1 − ε)^{⌊t/t*⌋}
• Note that the coupling of MCs we defined induces a coupling of X_t and Y_t; hence, by the Coupling Lemma, for all x, y ∈ Ω,
  d_TV(P^t(x, ·), P^t(y, ·)) ≤ Pr(X_t ≠ Y_t | X_0 = x, Y_0 = y) ≤ (1 − ε)^{⌊t/t*⌋} → 0
• This proves that from any initial state we reach the same limiting distribution
Proof of Theorem (4/4)
• From the previous result, we proved there is a limiting distribution σ
• Question: is σ a stationary distribution? That is, does it satisfy
  for all y ∈ Ω, σ(y) = Σ_{x∈Ω} σ(x) P(x, y)
• Yes: since σ(y) = lim_{t→∞} P^t(x, y), taking t → ∞ in P^{t+1}(x, y) = Σ_{z∈Ω} P^t(x, z) P(z, y) gives σ(y) = Σ_{z∈Ω} σ(z) P(z, y)
Markov Chains for Algorithmic Purposes: Mixing Time
• Convergence itself is guaranteed if the MC is ergodic
• However, this gives no indication of the convergence rate
• We define the mixing time τ_mix(ε) as the time until the chain is within variation distance ε of π from the worst initial state:
  τ_mix(ε) = max_{X_0∈Ω} min{t : d_TV(P^t(X_0, ·), π) ≤ ε}
• To get efficient sampling algorithms (e.g. the matching chain), we hope that the mixing time is polynomial in the input size
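For a chain small enough to enumerate, τ_mix(ε) can be computed exactly by powering P and measuring the worst-case variation distance to π. A sketch, assuming the same illustrative two-state matrix used earlier, with stationary distribution π = (1/3, 2/3) solved by hand:

```python
P = {
    "a": {"a": 0.5, "b": 0.5},
    "b": {"a": 0.25, "b": 0.75},
}
pi = {"a": 1 / 3, "b": 2 / 3}  # solves pi = pi P for this matrix

def step_dist(dist):
    """One step: push a distribution over states through P."""
    return {y: sum(dist[x] * P[x][y] for x in P) for y in P}

def d_tv(mu, nu):
    return 0.5 * sum(abs(mu[z] - nu[z]) for z in mu)

def tau_mix(eps, max_t=1000):
    """Smallest t with max_{X_0} d_TV(P^t(X_0, .), pi) <= eps."""
    # one point-mass starting distribution per possible X_0
    dists = {x0: {y: 1.0 if y == x0 else 0.0 for y in P} for x0 in P}
    for t in range(1, max_t + 1):
        dists = {x0: step_dist(d) for x0, d in dists.items()}
        if max(d_tv(d, pi) for d in dists.values()) <= eps:
            return t
    return None
```

For this matrix the distance shrinks by a factor of 1/4 per step (the second eigenvalue), so τ_mix(ε) grows only logarithmically in 1/ε, the behavior we hope for in efficient sampling algorithms.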