Source: faculty.cs.tamu.edu/klappi/csce658-s19/coupling.pdf

Jul 25, 2020

Coupling of Markov Chains

Andreas Klappenecker

Texas A&M University

© 2018 by Andreas Klappenecker. All rights reserved.

Shuffling Cards

Card Shuffling

Let us consider the following simple procedure to shuffle $n$ cards. Select a card uniformly at random and put it on the top of the deck. Repeat this step.

Observations

This shuffling process is a Markov chain. Any of the $n!$ permutations can be reached from any permutation, so the chain is irreducible. Since with probability $1/n$ the state remains the same, each state is aperiodic, so the Markov chain is aperiodic. Hence the chain has a unique stationary distribution.

Shuffling Cards

Question

What is the stationary distribution of the shuffling Markov chain?

Answer

The uniform distribution is the stationary distribution of the Markov chain. Indeed, the stationary distribution $\pi$ satisfies $\pi P = \pi$. More explicitly, if $x$ is a state of the chain and $N(x)$ the set of states that can reach $x$ in the next step, then

$$|N(x)| = n,$$

since the top card in $x$ could have been in $n$ different positions. Each of these transitions has probability $1/n$. Thus, we have

$$\pi_x = \frac{1}{n} \sum_{y \in N(x)} \pi_y.$$

Since the uniform distribution satisfies these equations and the stationary distribution is unique, it must coincide with $\pi$.
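
For a small deck, the claim can be checked by brute force. The following is only a sketch: the helper names (`move_to_top`, `transition_matrix`, `is_stationary`) are made up for this illustration, decks are tuples with the top card first, and exact fractions are used so that equality can be tested without rounding.

```python
from fractions import Fraction
from itertools import permutations


def move_to_top(deck, j):
    """Return the deck (tuple, top card first) after moving the card at position j to the top."""
    return (deck[j],) + deck[:j] + deck[j + 1:]


def transition_matrix(n):
    """Build the transition matrix of the move-to-top shuffle on all n! permutations."""
    states = list(permutations(range(n)))
    index = {s: k for k, s in enumerate(states)}
    P = [[Fraction(0)] * len(states) for _ in states]
    for s in states:
        for j in range(n):  # each position is chosen with probability 1/n
            P[index[s]][index[move_to_top(s, j)]] += Fraction(1, n)
    return states, P


def is_stationary(pi, P):
    """Check the balance equations pi P = pi exactly."""
    m = len(P)
    return all(sum(pi[y] * P[y][x] for y in range(m)) == pi[x] for x in range(m))


states, P = transition_matrix(4)
uniform = [Fraction(1, len(states))] * len(states)
print(is_stationary(uniform, P))  # True
```

The state space grows as $n!$, so this check is only practical for very small $n$.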

Key Question

Question

We know that the stationary distribution is the limiting distribution of the Markov chain. So eventually the states will be uniformly distributed. But we would like to shuffle the cards just a finite number of times.

How many times should we shuffle until the distribution is close to uniform?

Total Variation Distance

Definition

If $p = (p_0, p_1, \ldots, p_{n-1})$ and $q = (q_0, q_1, \ldots, q_{n-1})$ are probability distributions on a finite state space, then

$$d_{TV}(p, q) = \frac{1}{2} \sum_{k=0}^{n-1} |p_k - q_k|$$

is called the total variation distance between $p$ and $q$.

In general, $0 \le d_{TV}(p, q) \le 1$. If $p = q$, then $d_{TV}(p, q) = 0$.
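
The definition translates directly into a few lines of code. This is a minimal sketch, assuming distributions are given as equal-length lists of probabilities; the function name `tv_distance` is chosen here for illustration.

```python
def tv_distance(p, q):
    """Total variation distance: half the L1 distance between two distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must be defined on the same state space")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print(tv_distance(p, q))                    # 0.25
print(tv_distance(p, p))                    # 0.0
print(tv_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

The last call shows that the upper bound 1 is attained by distributions with disjoint support.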

Total Variation Distance

Proposition

Let $p_1$ and $p_2$ be discrete probability distributions on a set $S$. For any subset $A$ of $S$, we define

$$p_i(A) = \sum_{x \in A} p_i(x).$$

Then

$$d_{TV}(p_1, p_2) = \max_{A \in \mathcal{P}(S)} |p_1(A) - p_2(A)|.$$

Proof.

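Let $S^{+}$ and $S^{-}$ be the sets of states given by

$$S^{+} = \{x \in S \mid p_1(x) \ge p_2(x)\},$$

$$S^{-} = \{x \in S \mid p_1(x) < p_2(x)\}.$$

Then

$$\max_{A \in \mathcal{P}(S)} \bigl(p_1(A) - p_2(A)\bigr) = p_1(S^{+}) - p_2(S^{+}),$$

$$\max_{A \in \mathcal{P}(S)} \bigl(p_2(A) - p_1(A)\bigr) = p_2(S^{-}) - p_1(S^{-}).$$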

Proof. (Continued)

Since $p_1(S) = p_2(S) = 1$, we have

$$p_1(S^{+}) + p_1(S^{-}) = p_2(S^{+}) + p_2(S^{-}),$$

hence

$$p_1(S^{+}) - p_2(S^{+}) = p_2(S^{-}) - p_1(S^{-}).$$

Therefore,

$$\max_{A \in \mathcal{P}(S)} |p_1(A) - p_2(A)| = |p_1(S^{+}) - p_2(S^{+})| = |p_1(S^{-}) - p_2(S^{-})|.$$

Proof. (Continued)

Since

$$|p_1(S^{+}) - p_2(S^{+})| + |p_1(S^{-}) - p_2(S^{-})| = \sum_{x \in S} |p_1(x) - p_2(x)| = 2\, d_{TV}(p_1, p_2),$$

we can conclude that

$$\max_{A \in \mathcal{P}(S)} |p_1(A) - p_2(A)| = d_{TV}(p_1, p_2).$$
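
The proposition can be checked numerically on a small example by enumerating every subset $A$ of $S$. The sketch below assumes distributions are stored as dictionaries from state to probability; `max_event_gap` is a name introduced here, and the two example distributions are arbitrary.

```python
from itertools import combinations


def tv_distance(p, q):
    """Half the L1 distance between two distributions given as dicts on the same states."""
    return 0.5 * sum(abs(p[x] - q[x]) for x in p)


def max_event_gap(p, q):
    """Maximize |p(A) - q(A)| over all subsets A of the state space (exponential time)."""
    states = list(p)
    best = 0.0
    for r in range(len(states) + 1):
        for A in combinations(states, r):
            gap = abs(sum(p[x] for x in A) - sum(q[x] for x in A))
            best = max(best, gap)
    return best


p1 = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
p2 = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
print(tv_distance(p1, p2), max_event_gap(p1, p2))  # both are 0.25
```

In this example the maximum is attained at $A = S^{+} = \{a, b\}$, as in the proof.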

Card Shuffling

Suppose that we run our shuffling Markov chain until the variation distance between the distribution of the chain and the uniform distribution is less than $\varepsilon$.

This is a strong notion of close to uniform, because every permutation of the cards must have probability at most $1/n! + \varepsilon$.

The bound on the variation distance gives an even stronger statement: For any subset $A$ of $S$, the probability that the final permutation is from the set $A$ is at most $\pi(A) + \varepsilon$.

Card Shuffling

Example

Suppose someone is trying to make the top card in the deck an ace. If the total variation distance from the distribution $p_1$ to the uniform distribution $p_2$ is less than $\varepsilon$, then the probability that an ace is the first card of the deck is at most $\varepsilon$ greater than if we had a perfect shuffle.

Card Shuffling

Example

As another example, suppose we take a standard 52 card deck and shuffle all the cards, but leave the ace of spades on top. In this case, the variation distance between the resulting distribution $p_1$ and the uniform distribution $p_2$ can be bounded from below by considering the set $B$ of states where the ace of spades is on the top of the deck. Since $p_1(B) = 1$ and $p_2(B) = 1/52$, we get

$$d_{TV}(p_1, p_2) = \max_{A \in \mathcal{P}(S)} |p_1(A) - p_2(A)| \ge |p_1(B) - p_2(B)| = 1 - \frac{1}{52} = \frac{51}{52}.$$

See how easy it is now to obtain a lower bound on the total variation distance?
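
A 52-card deck has far too many permutations to enumerate, but the same computation can be carried out exactly for a tiny deck. The sketch below is an illustration under that assumption: card `0` plays the role of the fixed top card, and `fixed_top_tv` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def fixed_top_tv(n):
    """Exact TV distance between 'card 0 stays on top, rest uniform' and the uniform shuffle."""
    p2 = Fraction(1, factorial(n))           # uniform over all n! decks
    p1_on_top = Fraction(1, factorial(n - 1))  # uniform over decks with card 0 on top
    total = Fraction(0)
    for deck in permutations(range(n)):
        p1 = p1_on_top if deck[0] == 0 else Fraction(0)
        total += abs(p1 - p2)
    return total / 2


print(fixed_top_tv(4))  # 3/4
```

For these small decks the distance equals $1 - 1/n$, so the event $B$ already attains the maximum in the proposition.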

Markov Chains

Notation

Let $\pi$ be the stationary distribution of a Markov chain with state space $S$. Let $p_x^t$ denote the distribution of the state of the chain starting at state $x$ after $t$ steps. We define

$$\Delta_x(t) = d_{TV}(p_x^t, \pi).$$

The maximum over all starting states is denoted by

$$\Delta(t) = \max_{x \in S} d_{TV}(p_x^t, \pi).$$

Mixing Time of Markov Chains

Definition

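The mixing time $\tau_x(\varepsilon)$ of the Markov chain starting in state $x$ is given by

$$\tau_x(\varepsilon) = \min\{t : \Delta_x(t) \le \varepsilon\}.$$

The mixing time $\tau(\varepsilon)$ is given by

$$\tau(\varepsilon) = \max_{x \in S} \tau_x(\varepsilon).$$

A chain is called rapidly mixing if and only if $\tau(\varepsilon)$ is polynomial in $\log(1/\varepsilon)$ and the size of the problem.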
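
For a chain small enough to store its transition matrix, the mixing time can be computed by iterating the distribution directly. The following sketch assumes a row-stochastic matrix `P` given as nested lists and a known stationary distribution `pi`; the function names and the `max_steps` safeguard are choices made for this illustration.

```python
def tv_distance(p, q):
    """Half the L1 distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def step(dist, P):
    """One step of the chain: the row vector dist multiplied by the matrix P."""
    m = len(P)
    return [sum(dist[y] * P[y][x] for y in range(m)) for x in range(m)]


def mixing_time(P, pi, eps, max_steps=10_000):
    """Largest tau_x(eps) over all start states x, found by direct iteration."""
    worst = 0
    for x in range(len(P)):
        dist = [1.0 if k == x else 0.0 for k in range(len(P))]  # start in state x
        t = 0
        while tv_distance(dist, pi) > eps:
            if t >= max_steps:
                raise RuntimeError("no convergence within max_steps")
            dist = step(dist, P)
            t += 1
        worst = max(worst, t)
    return worst


# A lazy two-state chain whose stationary distribution is uniform.
P = [[0.75, 0.25], [0.25, 0.75]]
pi = [0.5, 0.5]
print(mixing_time(P, pi, 0.1))  # 3
```

Here $\Delta(t)$ halves at every step (0.5, 0.25, 0.125, 0.0625), so the first time it drops to at most 0.1 is $t = 3$.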

Coupling

Motivation

Coupling of Markov chains is a general technique for bounding the mixing time of a Markov chain.

Coupling

Definition

A coupling of a Markov chain $M_t$ with state space $S$ is a Markov chain $Z_t = (X_t, Y_t)$ on the state space $S \times S$ such that

$$\Pr[X_{t+1} = x' \mid Z_t = (x, y)] = \Pr[M_{t+1} = x' \mid M_t = x],$$

$$\Pr[Y_{t+1} = y' \mid Z_t = (x, y)] = \Pr[M_{t+1} = y' \mid M_t = y].$$

In other words, a coupling consists of two copies of the Markov chain $M$ running simultaneously. These two copies are not literal copies; the two chains are not necessarily in the same state, nor do they necessarily make the same move. Instead, we mean that each copy behaves exactly like the original Markov chain in terms of its transition probabilities.

Goal

We are interested in couplings that

1. bring the two copies of the chain to the same state and then

2. keep them in the same state by having the two chains make identical moves once they are in the same state.

When the two copies of the chain reach the same state, they are said to have coupled.

Coupling Lemma

Lemma

Let $Z_t = (X_t, Y_t)$ be a coupling for a Markov chain $M$ on a state space $S$. Suppose that there exists a $T$ such that for every $x, y$ in $S$

$$\Pr[X_T \ne Y_T \mid X_0 = x, Y_0 = y] \le \varepsilon.$$

Then the mixing time satisfies

$$\tau(\varepsilon) \le T.$$

In other words, the total variation distance between the distribution of the chain after $T$ steps and the stationary distribution is at most $\varepsilon$.

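Proof.

Let $X_0$ be an arbitrarily chosen state and let $Y_0$ be chosen according to the stationary distribution, so that $Y_T$ is also distributed according to $\pi$. For the given $T$ and $\varepsilon$ and for any subset $A$ of the set of states $S$, we have

$$\begin{aligned}
\Pr[X_T \in A] &\ge \Pr[(X_T = Y_T) \wedge (Y_T \in A)] \\
&= 1 - \Pr[(X_T \ne Y_T) \vee (Y_T \notin A)] \\
&\ge 1 - \Pr[X_T \ne Y_T] - \Pr[Y_T \notin A] \\
&\ge \Pr[Y_T \in A] - \varepsilon \\
&= \pi(A) - \varepsilon.
\end{aligned}$$

The same argument for the set $S - A$ shows that

$$\Pr[X_T \notin A] \ge \pi(S - A) - \varepsilon,$$

whence

$$\Pr[X_T \in A] \le \pi(A) + \varepsilon.$$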

Proof. (Continued)

It follows that

$$\max_{x, A} |p_x^T(A) - \pi(A)| \le \varepsilon.$$

By the previous proposition, the total variation distance from the stationary distribution is bounded by $\varepsilon$. So

$$\tau(\varepsilon) \le T.$$

Card Shuffling

Card Shuffling

Let us analyze how quickly the card shuffling procedure converges to a perfect shuffle.

Recall that in each step, we choose one card uniformly at random and place it on top.

Card Shuffle Coupling

Definition

We will now define a coupling. Choose a position $j$ uniformly at random from $1$ to $n$ and then obtain $X_{t+1}$ from $X_t$ by moving the $j$-th card to the top. Denote the value of this card by $C$.

To obtain $Y_{t+1}$ from $Y_t$, move the card with value $C$ to the top.

The coupling is valid, because in both chains the probability that a specific card is moved to the top at each step is $1/n$.

Card Shuffle Coupling

Observation

Once a card $C$ is moved to the top, it is always in the same position in both copies of the chain.

Hence, the two copies are sure to become coupled once every card has been moved to the top at least once.
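
The coupled shuffle is easy to simulate. The sketch below is one possible rendering: decks are Python lists with the top card at index 0 (positions therefore run from 0 to $n-1$ rather than 1 to $n$), and the names `coupled_step` and `coupling_time` as well as the two starting decks are assumptions made for this illustration.

```python
import random


def coupled_step(x, y, rng):
    """One coupled move: pick a position in x, move that card to the top in both decks."""
    j = rng.randrange(len(x))  # position chosen uniformly at random in deck x
    card = x.pop(j)
    x.insert(0, card)
    y.remove(card)             # the same card value is moved in deck y
    y.insert(0, card)
    return card


def coupling_time(n, rng):
    """Number of coupled steps until two differently ordered decks agree."""
    x = list(range(n))
    y = list(reversed(x))
    t = 0
    while x != y:
        coupled_step(x, y, rng)
        t += 1
    return t


rng = random.Random(0)
times = [coupling_time(10, rng) for _ in range(1000)]
print(sum(times) / len(times))  # empirical mean coupling time for n = 10
```

Because the decks are updated in place, the same `coupled_step` can also be used to confirm that two decks which agree continue to agree after further moves.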

Card Shuffle Coupling

We can bound the number of steps until the chains couple by bounding how many times cards must be chosen uniformly at random before every card is chosen at least once.

Card Shuffling: Bounding the Number of Steps

If the Markov chain runs for at least $n \ln n + cn$ steps, then the probability that a specific card has not been moved to the top at least once is at most

$$\left(1 - \frac{1}{n}\right)^{n \ln n + cn} \le e^{-(\ln n + c)} = \frac{e^{-c}}{n}.$$

By the union bound, the probability that some card has not been moved to the top at least once is at most $e^{-c}$. Choosing $c = \ln(1/\varepsilon)$ gives $e^{-c} = \varepsilon$. Hence, after only

$$n \ln n + n \ln(1/\varepsilon) = n \ln(n/\varepsilon)$$

steps, the probability that the chains have not coupled is at most $\varepsilon$.
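
To get a feeling for the numbers, the bound can be compared with the exact probability that some card has never been chosen, which follows from inclusion-exclusion over the set of missed cards. This is a sketch only; the function names and the choice $n = 52$, $\varepsilon = 0.01$ are illustrative assumptions.

```python
from math import ceil, comb, log


def steps_bound(n, eps):
    """Smallest integer that is at least n * ln(n / eps)."""
    return ceil(n * log(n / eps))


def prob_some_card_missed(n, t):
    """Exact probability that at least one of n cards is never chosen in t uniform draws."""
    return sum((-1) ** (k + 1) * comb(n, k) * (1 - k / n) ** t for k in range(1, n + 1))


n, eps = 52, 0.01
T = steps_bound(n, eps)
print(T, prob_some_card_missed(n, T))  # the probability should not exceed eps
```

The union bound is fairly tight here because the higher-order inclusion-exclusion terms are very small once $t$ is of order $n \ln n$.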

Card Shuffle: Conclusion

The coupling lemma allows us to conclude that the variation distance between the uniform distribution and the distribution of the state of the chain after $n \ln(n/\varepsilon)$ steps is bounded above by $\varepsilon$.

Random Walk on the Hypercube

Hypercube

Definition

The hypercube has $2^n$ vertices that are labeled by bit strings of length $n$.

Two vertices $u$ and $v$ are connected by an edge if and only if their labels differ in exactly one bit.

Markov Chain on the Hypercube

Markov Chain

At each step, choose a coordinate $i$ uniformly at random from $\{0, \ldots, n-1\}$. The new state $x'$ is obtained from the current state $x$ by keeping all coordinates of $x$ the same, except possibly for $x_i$. The coordinate $x_i$ is set to 0 with probability $1/2$ and to 1 with probability $1/2$.

Remark

This Markov chain is exactly the random walk on the hypercube, except that with probability $1/2$ the chain stays at the same vertex instead of moving to a new one, so the chain is aperiodic. Evidently, the chain is also irreducible.

Hypercube: Stationary Distribution

Proposition

The stationary distribution of the Markov chain is the uniformdistribution.

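Indeed, the uniform distribution is reversible for this chain, because the transition probabilities are symmetric: $P(x, y) = P(y, x)$ for all states $x$ and $y$. Since this is an aperiodic irreducible finite Markov chain, the uniform distribution is the unique stationary distribution.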

Hypercube: Coupling

Coupling

We bound the mixing time $\tau(\varepsilon)$ of this Markov chain by using the obvious coupling between two copies $X_t$ and $Y_t$ of the Markov chain: at each step, we have both chains make the same move, that is, both choose the same coordinate $i$ and set it to the same bit value.

With this coupling, the two copies of the chain will surely agree on the $i$-th coordinate once the $i$-th coordinate has been chosen for a move of the Markov chain. Hence the chains will have coupled after all $n$ coordinates have each been chosen at least once.
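
A short simulation makes the coupling concrete. In this sketch, states are lists of bits, the two copies start at antipodal vertices (an assumption chosen so that they disagree in every coordinate), and the names `coupled_hypercube_step` and `hypercube_coupling_time` are introduced only for this illustration.

```python
import random


def coupled_hypercube_step(x, y, rng):
    """Both chains refresh the same coordinate with the same random bit."""
    i = rng.randrange(len(x))  # coordinate chosen uniformly at random
    b = rng.randrange(2)       # new bit value, 0 or 1 with probability 1/2 each
    x[i] = b
    y[i] = b
    return i


def hypercube_coupling_time(n, rng):
    """Steps until two copies started at antipodal vertices agree."""
    x = [0] * n
    y = [1] * n
    t = 0
    while x != y:
        coupled_hypercube_step(x, y, rng)
        t += 1
    return t


rng = random.Random(0)
times = [hypercube_coupling_time(10, rng) for _ in range(1000)]
print(sum(times) / len(times))  # empirical mean coupling time for n = 10
```

With an antipodal start the two copies differ in coordinate $i$ until $i$ is chosen, so the coupling time is exactly the time needed to choose every coordinate at least once.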

Hypercube: Mixing Time

Mixing Time

The mixing time can therefore be bounded by bounding the number of steps until each coordinate has been chosen at least once by the Markov chain. As in the card shuffling, the probability that the chains have not coupled after $n \ln(n/\varepsilon)$ steps is at most $\varepsilon$. By the coupling lemma, the mixing time satisfies

$$\tau(\varepsilon) \le n \ln(n/\varepsilon).$$

This is a rapidly mixing Markov chain.

Convergence to the Stationary Distribution

Proposition

Any finite irreducible aperiodic Markov chain converges to a unique stationary distribution in the limit.

Second Coupling Lemma

Lemma

For any discrete random variables $X$ and $Y$, we have

$$d_{TV}(X, Y) \le \Pr[X \ne Y].$$

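Proof.

Let $A$ be an event for which $\Pr[X \in A]$ and $\Pr[Y \in A]$ are defined. Then

$$\Pr[X \in A] = \Pr[X \in A \wedge Y \in A] + \Pr[X \in A \wedge Y \notin A],$$

$$\Pr[Y \in A] = \Pr[X \in A \wedge Y \in A] + \Pr[X \notin A \wedge Y \in A].$$

Therefore,

$$\begin{aligned}
\Pr[X \in A] - \Pr[Y \in A] &= \Pr[X \in A \wedge Y \notin A] - \Pr[X \notin A \wedge Y \in A] \\
&\le \Pr[X \in A \wedge Y \notin A] \\
&\le \Pr[X \ne Y].
\end{aligned}$$

By symmetry, the same bound holds for $\Pr[Y \in A] - \Pr[X \in A]$. Thus, we get

$$d_{TV}(X, Y) = \max_A |\Pr[X \in A] - \Pr[Y \in A]| \le \Pr[X \ne Y].$$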
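
The inequality can be illustrated on a small joint distribution. In the sketch below, the joint probability table is an arbitrary example made up for this purpose, and the helper names are likewise hypothetical; the marginals of the table play the roles of the distributions of $X$ and $Y$.

```python
def marginals(joint):
    """Marginal distributions of X and Y from a joint table {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), pr in joint.items():
        px[x] = px.get(x, 0.0) + pr
        py[y] = py.get(y, 0.0) + pr
    return px, py


def tv_distance(p, q):
    """Half the L1 distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)


def prob_differ(joint):
    """Probability that X and Y take different values under the joint table."""
    return sum(pr for (x, y), pr in joint.items() if x != y)


joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125, (2, 2): 0.25}
px, py = marginals(joint)
print(tv_distance(px, py), prob_differ(joint))  # 0.125 0.375
```

The gap between the two numbers shows that this particular joint table is not the best possible coupling of its marginals.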

Proof.

Consider two copies of the chain $\{X_t\}$ and $\{Y_t\}$, where $X_0$ starts in some arbitrary distribution $x$ and $Y_0$ starts in a stationary distribution $\pi$. Define a coupling between $\{X_t\}$ and $\{Y_t\}$ by the following rule:

1. If $X_t \ne Y_t$, then $\Pr[X_{t+1} = j \wedge Y_{t+1} = j' \mid X_t = i \wedge Y_t = i'] = p_{ij}\, p_{i'j'}$.

2. If $X_t = Y_t$, then $\Pr[X_{t+1} = Y_{t+1} = j \mid X_t = Y_t = i] = p_{ij}$.

Intuitively, we let both chains run independently until they collide, after which we run them together.

Since each chain individually moves from state $i$ to state $j$ with probability $p_{ij}$ in either case, we have that $X_t$ evolves normally and $Y_t$ remains in the stationary distribution.
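
The rule "run independently until the chains collide, then run together" can be sketched as follows. The three-state transition matrix `P`, the starting states, and the function names are assumptions made purely for illustration.

```python
import random


def sample_next(P, i, rng):
    """Sample the next state from row i of the transition matrix P."""
    return rng.choices(range(len(P)), weights=P[i])[0]


def coupled_step(P, x, y, rng):
    """Move independently while the copies differ; move together once they agree."""
    if x == y:
        j = sample_next(P, x, rng)
        return j, j
    return sample_next(P, x, rng), sample_next(P, y, rng)


def meeting_time(P, x, y, rng):
    """Number of coupled steps until the two copies are in the same state."""
    t = 0
    while x != y:
        x, y = coupled_step(P, x, y, rng)
        t += 1
    return t


# A small irreducible aperiodic chain (lazy walk on a path with three states).
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
rng = random.Random(0)
print(meeting_time(P, 0, 2, rng))
```

Each copy, viewed on its own, always samples its next state from its own row of `P`, which is what makes this a valid coupling.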

Proof. (Continued)

By the second coupling lemma,

$$d_{TV}(p_x^t, \pi) = \max_A |p_x^t(A) - \pi(A)| \le \Pr[X_t \ne Y_t],$$

so it suffices to show that

$$\lim_{t \to \infty} \Pr[X_t \ne Y_t] = 0.$$

Proof. (Continued)

Consider a state $i$. The first passage time from $i$ to $j$ is the minimum time $t$ such that $p_{ij}^t \ne 0$. Let $r$ be the maximum of all first passage times. Let $s$ be a time such that $p_{ii}^t \ne 0$ for all $t \ge s$. Suppose that at time $\ell(r+s)$, we have

$$X_{\ell(r+s)} = j \ne j' = Y_{\ell(r+s)}.$$

Then there are times $\ell(r+s) + u$ and $\ell(r+s) + u'$, where $u, u' \le r$, such that $X$ reaches $i$ at time $\ell(r+s) + u$ and $Y$ reaches $i$ at time $\ell(r+s) + u'$ with nonzero probability.

Proof. (Continued)

Since $(r + s - u) \ge s$ and $(r + s - u') \ge s$, after having reached $i$ at these times, $X$ and $Y$ both return to $i$ at time $\ell(r+s) + (r+s) = (\ell+1)(r+s)$ with nonzero probability. Let $\varepsilon > 0$ be the product of these nonzero probabilities. Then

$$\Pr[X_{(\ell+1)(r+s)} \ne Y_{(\ell+1)(r+s)}] \le (1 - \varepsilon) \Pr[X_{\ell(r+s)} \ne Y_{\ell(r+s)}].$$

In general, we have

$$\Pr[X_t \ne Y_t] \le (1 - \varepsilon)^{\lfloor t/(r+s) \rfloor},$$

whence

$$\lim_{t \to \infty} \Pr[X_t \ne Y_t] = 0.$$
