Page 1: Markov Chain Basic

Network Intelligence and Analysis Lab

Markov Chain Basic

2014.07.11

Sanghyuk Chun

Page 2: Markov Chain Basic

Previous Chapters

• Exact Counting

• #P-Complete

• Sampling and Counting

Page 3: Markov Chain Basic

Remaining Chapters

• Markov Chain Basic (Today!)

• An ergodic MC has a unique stationary distribution

• Some basic concepts (Coupling, Mixing time)

• Coupling from the past

• Coupling in detail

• Ising Model

• Bounding Mixing time via Coupling

• Random spanning tree

• Path coupling framework

• MC for k-colorings of a graph

Page 4: Markov Chain Basic

In this chapter…

• Introduce Markov Chains

• Show a potential algorithmic use of Markov Chains for sampling from complex distributions

• Prove that an ergodic Markov Chain always converges to a unique stationary distribution

• Introduce coupling techniques and mixing time

Page 5: Markov Chain Basic

Markov Chain

• For a finite state space Ω, we say a sequence of random variables (X_t) on Ω is a Markov chain if the sequence is Markovian in the following sense

• For all t and all x_0, …, x_t, y ∈ Ω, we require

Pr(X_{t+1} = y | X_0 = x_0, …, X_t = x_t) = Pr(X_{t+1} = y | X_t = x_t)

• The Markov property: “Memoryless”
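
As a concrete illustration of the memoryless property, here is a minimal simulation sketch (not taken from the slides; the two-state chain and its probabilities are made up). The only point is that the next state is sampled from a distribution that depends on the current state alone.

```python
import random

# Hypothetical two-state chain on {0, 1}; the probabilities are illustrative only.
P = {0: {0: 0.9, 1: 0.1},
     1: {0: 0.5, 1: 0.5}}

def step(x, P):
    """Sample X_{t+1} given X_t = x; the history before time t is never consulted."""
    r, acc = random.random(), 0.0
    for y, p in P[x].items():
        acc += p
        if r < acc:
            return y
    return y  # guard against floating-point round-off

def run(x0, t, P):
    """Simulate t steps of the chain starting from X_0 = x0."""
    xs = [x0]
    for _ in range(t):
        xs.append(step(xs[-1], P))
    return xs

print(run(0, 20, P))
```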

Page 6: Markov Chain Basic

Example of Markov Chain (Card Shuffling)

• For a finite state space Ω, we say a sequence of random variables (X_t) on Ω is a Markov chain if the sequence is Markovian

• Let Ω denote the set of card orderings (e.g., X_1 = (1, 2, 3, …, 52))

• The next shuffled state depends only on the previous one; that is, X_{t+1} depends only on X_t

• Question 1: How can we shuffle the cards uniformly?

• Question 2: Can we get a fast uniform shuffling algorithm?
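
The slides do not fix a particular shuffling rule; as one standard example (an assumption made here for illustration), the random-transposition chain swaps two uniformly chosen positions per step, and its stationary distribution is uniform over all orderings.

```python
import random

def transposition_step(deck):
    """One step of the random-transposition chain: swap two positions chosen
    uniformly at random (they may coincide, leaving the deck unchanged)."""
    i, j = random.randrange(len(deck)), random.randrange(len(deck))
    deck = list(deck)
    deck[i], deck[j] = deck[j], deck[i]
    return deck

# Start from the sorted order X_1 = (1, 2, ..., 52) and run the chain.
state = list(range(1, 53))
for _ in range(1000):
    state = transposition_step(state)
print(state)
```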

Page 7: Markov Chain Basic

Transition Matrix

• Transition Matrix

• P(x, y) = Pr(X_{t+1} = y | X_t = x)

• Transitions are independent of the time (time-homogeneous)

Page 8: Markov Chain Basic

Transition Matrix

• Transition Matrix

• P(x, y) = Pr(X_{t+1} = y | X_t = x)

• Transitions are independent of the time (time-homogeneous)

• The t-step distribution is defined in the natural way

P^t(x, y) = P(x, y) if t = 1, and P^t(x, y) = Σ_{z∈Ω} P(x, z) P^{t-1}(z, y) if t > 1
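
A small numeric check of the recursion (a sketch with a made-up 3-state transition matrix, not from the slides): iterating P^t = P · P^{t−1} agrees with the matrix power.

```python
import numpy as np

# Illustrative 3-state transition matrix; rows sum to 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

def t_step(P, t):
    """Compute P^t(x, y) via the recursion P^t = P @ P^{t-1}."""
    Pt = P.copy()
    for _ in range(t - 1):
        Pt = P @ Pt
    return Pt

# The recursion agrees with the direct matrix power.
assert np.allclose(t_step(P, 4), np.linalg.matrix_power(P, 4))
print(t_step(P, 4))
```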

Page 9: Markov Chain Basic

Stationary Distribution

• A distribution π is a stationary distribution if it is invariant with respect to the transition matrix

for all y ∈ Ω, π(y) = Σ_{x∈Ω} π(x) P(x, y)

• Theorem 1

• For a finite ergodic Markov Chain, there exists a unique stationary distribution π

• Proof?
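
For intuition, a sketch using the same made-up 3-state matrix as above (not part of the slides): push any starting distribution through P until it stops changing, then check the invariance condition π(y) = Σ_x π(x) P(x, y).

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Power iteration: repeatedly apply the transition matrix to a distribution.
pi = np.full(3, 1.0 / 3)
for _ in range(10_000):
    pi = pi @ P

# Invariance check: pi(y) = sum_x pi(x) P(x, y).
assert np.allclose(pi, pi @ P)
print(pi)
```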

Page 10: Markov Chain Basic

Ergodic Markov Chain

• A Markov Chain is ergodic if there exists t such that for all x, y ∈ Ω, P^t(x, y) > 0

• It is possible to go from every state to every state (not necessarily in one move)

• For a finite MC, the following conditions together are equivalent to ergodicity

• Irreducible:

• For all x, y ∈ Ω, there exists t = t(x, y) s.t. P^t(x, y) > 0

• Aperiodic:

• For all x ∈ Ω, gcd{ t : P^t(x, x) > 0 } = 1

• Since ergodic MCs eventually reach a unique stationary distribution regardless of their initial state, they are useful algorithmic tools
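
For small chains the definition suggests a direct brute-force check, sketched below under the assumption that inspecting a bounded number of powers is enough; this is illustration only, not an efficient test.

```python
import numpy as np

def is_ergodic(P, t_max=100):
    """Return True if some power P^t (t <= t_max) has all entries strictly positive."""
    Pt = np.eye(len(P))
    for _ in range(t_max):
        Pt = Pt @ P
        if np.all(Pt > 0):
            return True
    return False

# Same illustrative 3-state matrix as before.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
print(is_ergodic(P))  # True: every state reaches every state with positive probability
```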

Page 11: Markov Chain Basic

Algorithmic usage of Ergodic Markov Chains

• Goal: we have a probability distribution we would like to generate random samples from

• Solution via MC: if we can design an ergodic MC whose unique stationary distribution is the desired distribution, we can run the chain and sample (approximately) from that distribution

• Example: sampling matchings

Page 12: Markov Chain Basic

Sampling Matchings

• For a graph G = (V, E), let Ω denote the set of matchings of G

• We define a MC on Ω whose transitions are as follows (see the sketch after this slide):

• Choose an edge e uniformly at random from E

• Let X′ = X_t ∪ {e} if e ∉ X_t, and X′ = X_t \ {e} if e ∈ X_t

• If X′ ∈ Ω, then set X_{t+1} = X′ with probability ½; otherwise set X_{t+1} = X_t

• The MC is aperiodic (P(M, M) ≥ 1/2 for all M ∈ Ω)

• The MC is irreducible (via the empty matching) with symmetric transition probabilities

• A symmetric transition matrix has the uniform stationary distribution

• Thus, the unique stationary distribution is uniform over all matchings of G
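
A minimal sketch of one transition of this matching chain (the graph below is a made-up example; matchings are represented as sets of edges):

```python
import random

def matching_chain_step(matching, edges):
    """One transition: pick a uniform edge e, toggle it if the result is still a
    matching, and accept the proposed move with probability 1/2."""
    e = random.choice(edges)
    u, v = e
    if e in matching:
        proposal = matching - {e}                      # X' = X_t \ {e}
    else:
        touched = {x for (a, b) in matching for x in (a, b)}
        if u in touched or v in touched:
            return matching                            # X' would not be a matching
        proposal = matching | {e}                      # X' = X_t ∪ {e}
    return proposal if random.random() < 0.5 else matching

# Illustrative graph: a path on 4 vertices.
edges = [(1, 2), (2, 3), (3, 4)]
M = set()
for _ in range(100):
    M = matching_chain_step(M, edges)
print(M)
```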

Page 13: Markov Chain Basic

Proof of Theorem (introduction)

• We will prove the theorem using the coupling technique and the Coupling Lemma

Page 14: Markov Chain Basic

Coupling Technique

• For distributions μ, ν on a finite set Ω, a distribution ω on Ω × Ω is a coupling if its marginals are μ and ν:

Σ_{y∈Ω} ω(x, y) = μ(x) for all x ∈ Ω, and Σ_{x∈Ω} ω(x, y) = ν(y) for all y ∈ Ω

• In other words, ω is a joint distribution whose marginal distributions are the appropriate distributions

• Variation distance (total variation distance) between μ, ν is defined as

d_TV(μ, ν) = ½ Σ_{z∈Ω} |μ(z) − ν(z)|
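
A small helper for the total variation distance as defined above (a sketch; the two example distributions are made up):

```python
def total_variation(mu, nu):
    """d_TV(mu, nu) = 1/2 * sum_z |mu(z) - nu(z)| over a finite set."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(z, 0.0) - nu.get(z, 0.0)) for z in support)

mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.4, "b": 0.4, "c": 0.2}
print(total_variation(mu, nu))  # 0.1
```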

Page 15: Markov Chain Basic

Coupling Lemma
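
The lemma itself is not captured in the transcript; the standard statement, which matches how parts (a) and (b) are used on the next two slides, is:

```latex
% Coupling Lemma (standard statement; the slide's exact wording was not captured)
\begin{align*}
  \text{(a)}\ & d_{TV}(\mu,\nu) \le \Pr(X \ne Y) \quad \text{for every coupling } (X, Y) \text{ of } \mu, \nu, \\
  \text{(b)}\ & d_{TV}(\mu,\nu) = \Pr(X \ne Y) \quad \text{for some coupling } (X, Y) \text{ of } \mu, \nu.
\end{align*}
```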

Page 16: Markov Chain Basic

Proof of Lemma (a)
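
The body of this slide is not captured in the transcript; the standard one-line argument for part (a), using d_TV(μ, ν) = max_{A⊆Ω} (μ(A) − ν(A)), runs as follows:

```latex
% Standard argument (sketch; not taken verbatim from the slide).
% For any coupling (X, Y) of \mu, \nu and any event A \subseteq \Omega:
\begin{align*}
  \mu(A) - \nu(A) &= \Pr(X \in A) - \Pr(Y \in A) \\
                  &\le \Pr(X \in A,\ Y \notin A) \\
                  &\le \Pr(X \ne Y),
\end{align*}
% and taking the maximum over A gives d_{TV}(\mu,\nu) \le \Pr(X \ne Y).
```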

Page 17: Markov Chain Basic

Proof of Lemma (b)

• For all z ∈ Ω, let

• ω(z, z) = min{μ(z), ν(z)}

• With this choice, Pr(X = Y) = Σ_{z∈Ω} min{μ(z), ν(z)} = 1 − d_TV(μ, ν), so d_TV(μ, ν) = Pr(X ≠ Y)

• We need to complete the construction of ω in a valid way

• For y, z ∈ Ω, y ≠ z, let ω(y, z) be as in the sketch below

• It is straightforward to verify that ω is a valid coupling
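
The slide's formula for the off-diagonal entries is not captured; one standard choice (assuming d_TV(μ, ν) > 0) spreads the leftover mass proportionally:

```latex
% Standard (maximal) coupling, off-diagonal entries for y \ne z:
\[
  \omega(y, z) \;=\; \frac{\bigl(\mu(y) - \omega(y, y)\bigr)\,\bigl(\nu(z) - \omega(z, z)\bigr)}{d_{TV}(\mu, \nu)}
\]
% Both marginals then come out to \mu and \nu, and \Pr(X \ne Y) = d_{TV}(\mu, \nu).
```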

Page 18: Markov Chain Basic

Couplings for Markov Chains

• Consider a pair of Markov chains X_t, Y_t on Ω with transition matrices P_X, P_Y respectively

• Typically, the MCs are identical in applications (P_X = P_Y)

• The Markov chain (X′_t, Y′_t) on Ω × Ω is a Markovian coupling if its transitions respect both chains:

Pr(X′_{t+1} = x′ | X′_t = x, Y′_t = y) = P_X(x, x′) and Pr(Y′_{t+1} = y′ | X′_t = x, Y′_t = y) = P_Y(y, y′)

• For such a Markovian coupling, the Coupling Lemma bounds the variation distance:

d_TV(P_X^t(x, ·), P_Y^t(y, ·)) ≤ Pr(X′_t ≠ Y′_t | X′_0 = x, Y′_0 = y)

• If we choose Y_0 according to the stationary distribution π, then we have

d_TV(P^t(X_0, ·), π) ≤ Pr(X_t ≠ Y_t)

• This shows how we can use coupling to bound the distance from stationarity

Page 19: Markov Chain Basic

Proof of Theorem (1/4)

• Create MCs X_t, Y_t, where the initial states X_0, Y_0 are arbitrary states in Ω

• Create a coupling for these chains in the following way

• From (X_t, Y_t), choose X_{t+1} according to the transition matrix P

• If Y_t = X_t, set Y_{t+1} = X_{t+1}; otherwise choose Y_{t+1} according to P, independently of the choice for X_{t+1}

• By ergodicity, there exists t* s.t. for all x, y ∈ Ω, P^{t*}(x, y) ≥ ε > 0

• Therefore, for all X_0, Y_0 ∈ Ω, Pr(X_{t*} ≠ Y_{t*}) ≤ 1 − ε

• The same argument applies to the steps t* → 2t*, and so on

Page 20: Markov Chain Basic

Proof of Theorem (2/4)

• Create a coupling for these chains in the following way

• From (X_t, Y_t), choose X_{t+1} according to the transition matrix P

• If Y_t = X_t, set Y_{t+1} = X_{t+1}; otherwise choose Y_{t+1} according to P, independently of the choice for X_{t+1}

• Once X_s = Y_s, we have X_{s′} = Y_{s′} for all s′ ≥ s

• From the earlier observation, Pr(X_{2t*} ≠ Y_{2t*}) ≤ (1 − ε)²

Page 21: Markov Chain Basic

Proof of Theorem (3/4)

• For integer k > 0, Pr(X_{kt*} ≠ Y_{kt*}) ≤ (1 − ε)^k

• Therefore, Pr(X_t ≠ Y_t) → 0 as t → ∞

• Since X_t = Y_t implies X_{t′} = Y_{t′} for all t′ ≥ t, we have Pr(X_t ≠ Y_t) ≤ (1 − ε)^⌊t/t*⌋ for every t

• Note that the coupling of MCs we defined induces a coupling of X_t, Y_t. Hence, by the Coupling Lemma, for all x, y ∈ Ω

d_TV(P^t(x, ·), P^t(y, ·)) ≤ Pr(X_t ≠ Y_t) ≤ (1 − ε)^⌊t/t*⌋ → 0

• This proves that from any initial point we reach the same limiting distribution

Page 22: Markov Chain Basic

Proof of Theorem (4/4)

• From the previous result, we have proved that there is a limiting distribution σ

• Question: is σ a stationary distribution? That is, does it satisfy

for all y ∈ Ω, σ(y) = Σ_{x∈Ω} σ(x) P(x, y)
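
A standard way to answer the question affirmatively (a sketch, not taken from the slides): write σ(y) = lim_t P^t(x_0, y) for an arbitrary start x_0 and pass the limit through the finite sum.

```latex
% Sketch: the limiting distribution satisfies the stationarity condition.
\begin{align*}
  \sigma(y) \;=\; \lim_{t\to\infty} P^{t+1}(x_0, y)
            \;=\; \lim_{t\to\infty} \sum_{x\in\Omega} P^{t}(x_0, x)\, P(x, y)
            \;=\; \sum_{x\in\Omega} \sigma(x)\, P(x, y),
\end{align*}
% where exchanging the limit and the sum is valid because \Omega is finite.
```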

Page 23: Markov Chain Basic

Markov Chains for Algorithmic Purpose: Mixing Time

• Convergence itself is guaranteed if the MC is ergodic

• However, this gives no indication of the convergence rate

• We define the mixing time τ_mix(ε) as the time until the chain is within variation distance ε of π, starting from the worst initial state

• τ_mix(ε) = max_{X_0∈Ω} min{ t : d_TV(P^t(X_0, ·), π) ≤ ε }

• To get efficient sampling algorithms (e.g., the matching chain), we hope that the mixing time is polynomial in the input size
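
For a toy chain the definition can be evaluated directly (a sketch with the same made-up 3-state matrix as earlier; not an efficient method for large Ω):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

def tv(p, q):
    """Total variation distance between two distribution vectors."""
    return 0.5 * np.abs(p - q).sum()

def mixing_time(P, eps=0.01, t_max=10_000):
    """Smallest t with max_{X0} d_TV(P^t(X0, .), pi) <= eps, by brute force."""
    pi = np.full(len(P), 1.0 / len(P))
    for _ in range(100_000):          # power iteration for the stationary distribution
        pi = pi @ P
    Pt = np.eye(len(P))
    for t in range(1, t_max + 1):
        Pt = Pt @ P
        if max(tv(Pt[x], pi) for x in range(len(P))) <= eps:
            return t
    return None

print(mixing_time(P))
```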