Coupling of Markov Chains
Andreas Klappenecker
Texas A&M University
© 2018 by Andreas Klappenecker. All rights reserved.
Shuffling Cards

Card Shuffling
Let us consider the following simple procedure to shuffle n cards. Select a card uniformly at random and put it on the top of the deck. Repeat this step.

Observations
This shuffling process is a Markov chain. Any of the n! permutations can be reached from any permutation, so the chain is irreducible. Since the state remains the same with probability 1/n, each state is aperiodic, so the Markov chain is aperiodic. Hence the chain has a unique stationary distribution.
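The procedure above is easy to simulate. Here is a minimal Python sketch of the move-to-top shuffle; the function name `shuffle_step` and the deck size are my own choices for illustration, not part of the original description:

```python
import random

def shuffle_step(deck):
    """One step of the chain: select a card uniformly at random
    and put it on top of the deck."""
    j = random.randrange(len(deck))  # position chosen uniformly at random
    card = deck.pop(j)
    deck.insert(0, card)

random.seed(1)
deck = list(range(10))   # a small 10-card deck, labeled 0..9
for _ in range(100):
    shuffle_step(deck)
print(deck)              # some permutation of 0..9
```

Each step moves exactly one card, so every state reached is a permutation of the original deck, matching the state space described above.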
Shuffling Cards

Question
What is the stationary distribution of the shuffling Markov chain?

Answer
The uniform distribution is the stationary distribution of the Markov chain. Indeed, the stationary distribution π satisfies πP = π. More explicitly, if x is a state of the chain and N(x) is the set of states that can reach x in the next step, then

    n = |N(x)|,

since the top card in x could have been in n different positions. Thus, we have

    π_x = (1/n) Σ_{y ∈ N(x)} π_y.

Since the uniform distribution satisfies these equations, it must coincide with π.
Key Question

Question
We know that the stationary distribution is the limiting distribution of the Markov chain. So eventually the states will be uniformly distributed. But we would like to shuffle the cards just a finite number of times.

How many times should we shuffle until the distribution is close to uniform?
Total Variation Distance

Definition
If p = (p_0, p_1, ..., p_{n-1}) and q = (q_0, q_1, ..., q_{n-1}) are probability distributions on a finite state space, then

    d_TV(p, q) = (1/2) Σ_{k=0}^{n-1} |p_k - q_k|

is called the total variation distance between p and q.

In general, 0 ≤ d_TV(p, q) ≤ 1. If p = q, then d_TV(p, q) = 0.
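The definition translates directly into code. A minimal sketch, assuming both distributions are given as equal-length probability vectors (the example values are made up):

```python
def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(pk - qk) for pk, qk in zip(p, q))

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.5, 0.25, 0.125, 0.125]
print(total_variation(uniform, skewed))   # 0.25
print(total_variation(uniform, uniform))  # 0.0
```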
Total Variation Distance

Proposition
Let p_1 and p_2 be discrete probability distributions on a set S. For any subset A of S, we define

    p_i(A) = Σ_{x ∈ A} p_i(x).

Then

    d_TV(p_1, p_2) = max_{A ⊆ S} |p_1(A) - p_2(A)|.
Proof.
Partition the states into the sets

    S+ = { x ∈ S : p_1(x) ≥ p_2(x) },
    S- = { x ∈ S : p_1(x) < p_2(x) }.

Then

    max_{A ⊆ S} (p_1(A) - p_2(A)) = p_1(S+) - p_2(S+),
    max_{A ⊆ S} (p_2(A) - p_1(A)) = p_2(S-) - p_1(S-).
Proof. (Continued)
Since p_1(S) = p_2(S) = 1, we have

    p_1(S+) + p_1(S-) = p_2(S+) + p_2(S-),

hence

    p_1(S+) - p_2(S+) = p_2(S-) - p_1(S-).

Therefore,

    max_{A ⊆ S} |p_1(A) - p_2(A)| = |p_1(S+) - p_2(S+)| = |p_1(S-) - p_2(S-)|.
Proof. (Continued)
Since

    |p_1(S+) - p_2(S+)| + |p_1(S-) - p_2(S-)| = Σ_{x ∈ S} |p_1(x) - p_2(x)| = 2 d_TV(p_1, p_2),

we can conclude that

    max_{A ⊆ S} |p_1(A) - p_2(A)| = d_TV(p_1, p_2).
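For a small state space, the proposition can be checked by brute force: maximizing |p_1(A) - p_2(A)| over all 2^n subsets A recovers exactly the half-L1 formula. A sketch with made-up distribution values:

```python
from itertools import combinations

def total_variation(p, q):
    """Half the L1 distance between the probability vectors p and q."""
    return 0.5 * sum(abs(pk - qk) for pk, qk in zip(p, q))

def max_over_subsets(p, q):
    """Brute-force maximum of |p(A) - q(A)| over all subsets A."""
    n = len(p)
    best = 0.0
    for r in range(n + 1):
        for A in combinations(range(n), r):
            pA = sum(p[i] for i in A)
            qA = sum(q[i] for i in A)
            best = max(best, abs(pA - qA))
    return best

p1 = [0.5, 0.25, 0.125, 0.125]
p2 = [0.25, 0.25, 0.25, 0.25]
print(total_variation(p1, p2), max_over_subsets(p1, p2))  # both are 0.25
```

The maximizing subset found by the brute force is exactly S+ = {0}, the set of states where p1 exceeds p2, as in the proof.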
Card Shuffling

Suppose that we run our shuffling Markov chain until the variation distance between the distribution of the chain and the uniform distribution is less than ε.

This is a strong notion of "close to uniform", because every permutation of the cards must have probability at most 1/n! + ε.

The bound on the variation distance gives an even stronger statement: for any subset A of S, the probability that the final permutation is from the set A is at most π(A) + ε.
Card Shuffling

Example
Suppose someone is trying to make the top card in the deck an ace. If the total variation distance from the distribution p_1 to the uniform distribution p_2 is less than ε, then the probability that an ace is the first card of the deck is at most ε greater than if we had a perfect shuffle.
Card Shuffling

Example
As another example, suppose we take a standard 52-card deck and shuffle all the cards, but leave the ace of spades on top. In this case, the variation distance between the resulting distribution p_1 and the uniform distribution p_2 can be bounded from below by considering the set B of states where the ace of spades is on top of the deck:

    d_TV(p_1, p_2) = max_{A ⊆ S} |p_1(A) - p_2(A)| ≥ |p_1(B) - p_2(B)| = 1 - 1/52 = 51/52.

See how easy it is to obtain a lower bound on the total variation distance?
Markov Chains

Notation
Let π be the stationary distribution of a Markov chain with state space S. Let p_x^t denote the distribution of the state of the chain starting at state x after t steps. We define

    Δ_x(t) = d_TV(p_x^t, π).

The maximum over all starting states is denoted by

    Δ(t) = max_{x ∈ S} d_TV(p_x^t, π).
Mixing Time of Markov Chains

Definition
The mixing time τ_x(ε) of the Markov chain starting in state x is given by

    τ_x(ε) = min { t : Δ_x(t) ≤ ε }.

The mixing time τ(ε) is given by

    τ(ε) = max_{x ∈ S} τ_x(ε).

A chain is called rapidly mixing if and only if τ(ε) is polynomial in log(1/ε) and the size of the problem.
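For a very small deck, Δ_x(t) can be computed exactly by evolving the full distribution over all n! permutations, and τ_x(ε) can then be read off. A sketch for n = 4 (the deck size and ε are my own choices; the coupling bound derived later guarantees the loop stops within n ln(n/ε) ≈ 18 steps for ε = 0.05):

```python
from itertools import permutations

n = 4
states = list(permutations(range(n)))
uniform = 1.0 / len(states)   # stationary probability of each permutation

def step(dist):
    """One exact step of the move-to-top chain applied to a distribution."""
    new = {s: 0.0 for s in states}
    for s, mass in dist.items():
        for j in range(n):  # move the card at position j to the top
            moved = (s[j],) + s[:j] + s[j + 1:]
            new[moved] += mass / n
    return new

def delta(dist):
    """Total variation distance to the uniform distribution."""
    return 0.5 * sum(abs(m - uniform) for m in dist.values())

dist = {s: 0.0 for s in states}
dist[tuple(range(n))] = 1.0   # start from one fixed ordering

eps, t = 0.05, 0
while delta(dist) > eps:
    dist = step(dist)
    t += 1
print(t)   # empirical mixing time tau_x(0.05) for n = 4 from this start
```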
Coupling

Motivation
Coupling of Markov chains is a general technique for bounding the mixing time of a Markov chain.
Coupling

Definition
A coupling of a Markov chain M_t with state space S is a Markov chain Z_t = (X_t, Y_t) on the state space S × S such that

    Pr[X_{t+1} = x′ | Z_t = (x, y)] = Pr[M_{t+1} = x′ | M_t = x],
    Pr[Y_{t+1} = y′ | Z_t = (x, y)] = Pr[M_{t+1} = y′ | M_t = y].

In other words, a coupling consists of two copies of the Markov chain M running simultaneously. These two copies are not literal copies; the two chains are not necessarily in the same state, nor do they necessarily make the same move. Instead, each copy behaves exactly like the original Markov chain in terms of its transition probabilities.
Goal

We are interested in couplings that

1. bring the two copies of the chain to the same state, and then
2. keep them in the same state by having the two chains make identical moves once they are in the same state.

When the two copies of the chain reach the same state, they are said to have coupled.
Coupling Lemma

Lemma
Let Z_t = (X_t, Y_t) be a coupling for a Markov chain M on a state space S. Suppose that there exists a T such that for every x, y in S,

    Pr[X_T ≠ Y_T | X_0 = x, Y_0 = y] ≤ ε.

Then the total variation distance after T steps is at most ε, so

    τ(ε) ≤ T.

In other words, the total variation distance between the distribution of the chain after T steps and the stationary distribution is at most ε.
Proof.
Let X_0 be an arbitrarily chosen state and let Y_0 be chosen according to the stationary distribution. For the given T and ε and for any subset A of the set of states S, we have

    Pr[X_T ∈ A] ≥ Pr[(X_T = Y_T) ∧ (Y_T ∈ A)]
                = 1 - Pr[(X_T ≠ Y_T) ∨ (Y_T ∉ A)]
                ≥ 1 - Pr[X_T ≠ Y_T] - Pr[Y_T ∉ A]
                ≥ Pr[Y_T ∈ A] - ε
                = π(A) - ε.

The same argument for the set S - A shows that

    Pr[X_T ∉ A] ≥ π(S - A) - ε,

whence

    Pr[X_T ∈ A] ≤ π(A) + ε.
Proof. (Continued)
It follows that

    max_{x, A} |p_x^T(A) - π(A)| ≤ ε.

By the previous proposition, the total variation distance from the stationary distribution is bounded by ε. So

    τ(ε) ≤ T.
Card Shuffling

Let us analyze how quickly the card shuffling procedure converges to a perfect shuffle.

Recall that in each step, we choose one card uniformly at random and place it on top.
Card Shuffle Coupling

Definition
We will now define a coupling. Choose a position j uniformly at random from 1 to n and then obtain X_{t+1} from X_t by moving the j-th card to the top. Denote the value of this card by C.

To obtain Y_{t+1} from Y_t, move the card with value C to the top.

The coupling is valid, because in both chains the probability that a specific card is moved to the top at each step is 1/n.
Card Shuffle Coupling

Observation
Once a card C is moved to the top, it is always in the same position in both copies of the chain.

Hence, the two copies are sure to become coupled once every card has been moved to the top at least once.
Card Shuffle Coupling

We can bound the number of steps until the chains couple by bounding how many times cards must be chosen uniformly at random before every card is chosen at least once.
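The coupling is easy to simulate: a position is chosen in the first deck, and the card found there is moved to the top of both decks. A minimal sketch (the deck size and starting orders are my own choices):

```python
import random

def coupled_step(x, y):
    """Move the card at a uniformly random position of x to the top of x,
    and the card with the same value to the top of y."""
    j = random.randrange(len(x))
    c = x.pop(j)
    x.insert(0, c)
    y.remove(c)        # cards are distinct, so this removes exactly card c
    y.insert(0, c)

random.seed(2)
n = 8
x = list(range(n))          # one copy of the chain
y = list(range(n))[::-1]    # the other copy, starting in reversed order

steps = 0
while x != y:
    coupled_step(x, y)
    steps += 1
print(steps)   # number of steps until the two copies have coupled
```

As the observation above predicts, the loop is guaranteed to end once every card value has been moved to the top at least once.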
Card Shuffling: Bounding the Number of Steps

If the Markov chain runs for at least n ln n + cn steps, then the probability that a specific card has not been moved to the top at least once is at most

    (1 - 1/n)^{n ln n + cn} ≤ e^{-(ln n + c)} = e^{-c}/n.

By the union bound, the probability that some card has not been moved to the top at least once is at most e^{-c}. Hence, taking c = ln(1/ε), after only

    n ln n + n ln(1/ε) = n ln(n/ε)

steps, the probability that the chains have not coupled is at most ε.
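The coupon-collector bound can be checked empirically: run T = n ln n + cn draws many times and count how often some card is never chosen. The values of n, c, and the trial count below are arbitrary choices for illustration:

```python
import math
import random

random.seed(3)
n, c, trials = 20, 2.0, 2000
T = math.ceil(n * math.log(n) + c * n)   # n ln n + cn steps

misses = 0
for _ in range(trials):
    seen = set()
    for _ in range(T):
        seen.add(random.randrange(n))    # one card chosen uniformly
    if len(seen) < n:                    # some card was never chosen
        misses += 1

print(misses / trials, math.exp(-c))     # empirical rate vs. the e^{-c} bound
```

The empirical failure rate should come out below e^{-2} ≈ 0.135, consistent with the union bound.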
Card Shuffle: Conclusion

The coupling lemma allows us to conclude that the variation distance between the uniform distribution and the distribution of the state of the chain after n ln(n/ε) steps is bounded above by ε.
Random Walk on the Hypercube
Hypercube

Definition
The hypercube has 2^n vertices that are labeled by bit strings of length n.

Two vertices u and v are connected by an edge if and only if their labels differ in exactly one bit.
Markov Chain on the Hypercube

Markov Chain
At each step, choose a coordinate i uniformly at random from {0, ..., n-1}. The new state x′ is obtained from the current state x by keeping all coordinates of x the same, except possibly for x_i. The coordinate x_i is set to 0 with probability 1/2 and to 1 with probability 1/2.

Remark
This Markov chain is exactly the random walk on the hypercube, except that with probability 1/2 the chain stays at the same vertex instead of moving to a new one, so the chain is aperiodic. Evidently, the chain is also irreducible.
Hypercube: Stationary Distribution

Proposition
The stationary distribution of the Markov chain is the uniform distribution.

Indeed, the uniform distribution is reversible for this chain. Since this is an aperiodic irreducible finite Markov chain, the uniform distribution is the unique stationary distribution.
Hypercube: Coupling

Coupling
We bound the mixing time τ(ε) of this Markov chain by using the obvious coupling between two copies X_t and Y_t of the Markov chain: at each step, we have both chains make the same move.

With this coupling, the two copies of the chain will surely agree on the i-th coordinate once the i-th coordinate has been chosen for a move of the Markov chain. Hence the chains will have coupled after all n coordinates have each been chosen at least once.
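A quick simulation of this coupling, starting the two walks at opposite corners of the hypercube (the dimension is my own choice):

```python
import random

def coupled_step(x, y):
    """Both copies pick the same coordinate i and the same new bit,
    so they agree on coordinate i from then on."""
    i = random.randrange(len(x))
    b = random.randrange(2)   # new value of the chosen coordinate
    x[i] = b
    y[i] = b

random.seed(4)
n = 16
x = [0] * n   # one copy starts at the all-zeros vertex
y = [1] * n   # the other copy starts at the all-ones vertex

steps = 0
while x != y:
    coupled_step(x, y)
    steps += 1
print(steps)   # steps until the two walks have coupled
```

Since the walks start out disagreeing in every coordinate, they couple exactly when each of the n coordinates has been chosen at least once, so the loop always runs for at least n steps.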
Hypercube: Mixing Time

Mixing Time
The mixing time can therefore be bounded by bounding the number of steps until each coordinate has been chosen at least once by the Markov chain. As in the card shuffling analysis, the probability is less than ε that the chains have not coupled after n ln(n/ε) steps. By the coupling lemma, the mixing time satisfies

    τ(ε) ≤ n ln(n/ε).

This is a rapidly mixing Markov chain.
Convergence to the Stationary Distribution

Proposition
Any finite irreducible aperiodic Markov chain converges to a unique stationary distribution in the limit.
Second Coupling Lemma

Lemma
For any discrete random variables X and Y, we have

    d_TV(X, Y) ≤ Pr[X ≠ Y].
Proof.
Let A be an event for which Pr[X ∈ A] and Pr[Y ∈ A] are defined. Then

    Pr[X ∈ A] = Pr[X ∈ A ∧ Y ∈ A] + Pr[X ∈ A ∧ Y ∉ A],
    Pr[Y ∈ A] = Pr[X ∈ A ∧ Y ∈ A] + Pr[X ∉ A ∧ Y ∈ A].

Therefore,

    Pr[X ∈ A] - Pr[Y ∈ A] = Pr[X ∈ A ∧ Y ∉ A] - Pr[X ∉ A ∧ Y ∈ A]
                          ≤ Pr[X ∈ A ∧ Y ∉ A]
                          ≤ Pr[X ≠ Y].

By symmetry, the same bound holds for Pr[Y ∈ A] - Pr[X ∈ A]. Thus, we get

    d_TV(X, Y) = max_A |Pr[X ∈ A] - Pr[Y ∈ A]| ≤ Pr[X ≠ Y].
Proof.
Consider two copies of the chain {X_t} and {Y_t}, where X_0 starts at an arbitrary state x and Y_0 starts in the stationary distribution π. Define a coupling between {X_t} and {Y_t} by the following rule:

1. if X_t ≠ Y_t, then Pr[X_{t+1} = j ∧ Y_{t+1} = j′ | X_t = i ∧ Y_t = i′] = p_{ij} p_{i′j′};
2. if X_t = Y_t, then Pr[X_{t+1} = Y_{t+1} = j | X_t = Y_t = i] = p_{ij}.

Intuitively, we let both chains run independently until they collide, after which we run them together.

Since each chain individually moves from state i to state j with probability p_{ij} in either case, X_t evolves normally and Y_t remains in the stationary distribution.
Proof. (Continued)
By the second coupling lemma,

    d_TV(p_x^t, π) = max_A |p_x^t(A) - π(A)| ≤ Pr[X_t ≠ Y_t],

so it suffices to show that

    lim_{t→∞} Pr[X_t ≠ Y_t] = 0.
Proof. (Continued)
Consider a state i. The first passage time from i to j is the minimum time t such that p_{ij}^t ≠ 0. Let r be the maximum of all first passage times. Let s be a time such that p_{ii}^t ≠ 0 for all t ≥ s. Suppose that at time ℓ(r + s), we have

    X_{ℓ(r+s)} = j ≠ j′ = Y_{ℓ(r+s)}.

Then there are times ℓ(r+s) + u and ℓ(r+s) + u′, where u, u′ ≤ r, such that X reaches i at time ℓ(r+s) + u and Y reaches i at time ℓ(r+s) + u′ with nonzero probability.
Proof. (Continued)
Since r + s - u ≥ s and r + s - u′ ≥ s, after having reached i at these times, X and Y both return to i at time ℓ(r+s) + (r+s) = (ℓ+1)(r+s) with nonzero probability. Let ε > 0 be the product of these nonzero probabilities. Then

    Pr[X_{(ℓ+1)(r+s)} ≠ Y_{(ℓ+1)(r+s)}] ≤ (1 - ε) Pr[X_{ℓ(r+s)} ≠ Y_{ℓ(r+s)}].

In general, we have

    Pr[X_t ≠ Y_t] ≤ (1 - ε)^{⌊t/(r+s)⌋},

whence

    lim_{t→∞} Pr[X_t ≠ Y_t] = 0.