Lectures on Probability and Statistical Models
Phil Pollett, Professor of Mathematics, The University of Queensland
© These materials can be used for any educational purpose provided they are not altered.
13 Markov chains

Imprecise (intuitive) definition. A Markov process is a random process that "forgets its past", in the following sense: given the present state of the process, its past and its future are independent. If the set of states S is discrete, then the process is called a Markov chain.
Remark. At first sight this definition might appear to cover only trivial examples, but note that the current state could be complicated and could include a record of the recent past.
Andrei Andreyevich Markov (Born: 14/06/1856, Ryazan, Russia; Died: 20/07/1922, St Petersburg, Russia)

Markov is famous for his pioneering work on Markov chains, which launched the theory of stochastic processes. His early work was in number theory, analysis, continued fractions, limits of integrals, approximation theory and convergence of series.
Example. There are two rooms, labelled A and B. There is a spider, initially in Room A, hunting a fly that is initially in Room B. They move from room to room independently: every minute each changes rooms (with probability p for the spider and q for the fly) or stays put, with the complementary probabilities. Once in the same room, the spider eats the fly and the hunt ceases.
The hunt can be represented as a Markov chain with three states: (0) the spider and the fly are in the same room (the hunt has ended), (1) the spider is in Room A and the fly is in Room B, and (2) the spider is in Room B and the fly is in Room A.
Eventually we will be able to answer questions like "What is the probability that the hunt lasts more than two minutes?"
Let Xn be the state of the process at time n (that is, after n minutes). Then Xn ∈ S = {0, 1, 2}. The set S is called the state space. The initial state is X0 = 1. State 0 is called an absorbing state, because the process remains there once it is reached.
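The hunt can also be simulated directly. A minimal sketch (the function name is mine, and the parameter values p = 1/4 and q = 1/2 are the ones used later in these notes) estimates the probability that the hunt lasts more than two minutes by Monte Carlo:

```python
import random

def simulate_hunt(p, q, rng):
    """Simulate one hunt; return the number of minutes until absorption.

    State 1: spider in A, fly in B; state 2: spider in B, fly in A;
    state 0: both in the same room (absorbing)."""
    state, minutes = 1, 0
    while state != 0:
        spider_moves = rng.random() < p
        fly_moves = rng.random() < q
        if spider_moves != fly_moves:      # exactly one moves: same room, hunt ends
            state = 0
        elif spider_moves and fly_moves:   # both move: the rooms swap
            state = 2 if state == 1 else 1
        minutes += 1                       # neither moves: state unchanged
    return minutes

rng = random.Random(42)
n_trials = 100_000
long_hunts = sum(simulate_hunt(0.25, 0.5, rng) > 2 for _ in range(n_trials))
print(long_hunts / n_trials)   # Monte Carlo estimate of Pr(hunt > 2 minutes)
```

With p = 1/4 and q = 1/2, the hunt ends at each step with probability 1/2 from either transient state, so the estimate should be close to (1/2)² = 1/4.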
Definition. A sequence {Xn, n = 0, 1, . . . } of random variables is called a discrete-time stochastic process; Xn usually represents the state of the process at time n. If {Xn} takes values in a discrete state space S, then it is called a Markov chain if

Pr(Xm+1 = j | Xm = i, Xm−1 = im−1, . . . , X0 = i0)
    = Pr(Xm+1 = j | Xm = i),    (1)

for all time points m and all states i0, . . . , im−1, i, j ∈ S. If the right-hand side of (1) is the same for all m, then the Markov chain is said to be time homogeneous.
Remarks. (1) Matrices like this (with non-negative entries and all row sums equal to 1) are called stochastic matrices. Writing 1 = (1, 1, . . . )T (where T denotes transpose), we see that P1 = 1. Hence P (and indeed any stochastic matrix) has an eigenvector 1 corresponding to an eigenvalue λ = 1.
(2) We may usefully set P(0) = I, where, as usual, I denotes the identity matrix.
Example. Returning to the hunt, the three states were: (0) the spider and the fly are in the same room, (1) the spider is in Room A and the fly is in Room B, and (2) the spider is in Room B and the fly is in Room A. Since the spider changes rooms with probability p and the fly changes rooms with probability q (independently), the hunt ends in one step precisely when exactly one of them moves, and the rooms swap when both move, so

        1                      0                   0
P =     p(1 − q) + q(1 − p)    (1 − p)(1 − q)      pq
        p(1 − q) + q(1 − p)    pq                  (1 − p)(1 − q)

where rows and columns are indexed by the states 0, 1, 2.
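The one-step probabilities out of state 1 follow from the room dynamics: the hunt ends if exactly one creature moves, the rooms swap if both move, and nothing changes if neither moves; state 2 is symmetric. A sketch of the resulting matrix (the function name is mine):

```python
def hunt_matrix(p, q):
    """Transition matrix for the hunt, states 0 (absorbed), 1, 2."""
    end = p * (1 - q) + q * (1 - p)   # exactly one creature moves: same room
    stay = (1 - p) * (1 - q)          # neither moves
    swap = p * q                      # both move: rooms swap
    return [
        [1.0, 0.0, 0.0],              # state 0 is absorbing
        [end, stay, swap],
        [end, swap, stay],
    ]

P = hunt_matrix(0.25, 0.5)
# Each row sums to 1, as it must for a stochastic matrix:
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)
```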
or, equivalently, in terms of transition matrices, P(n+m) = P(n)P(m). Thus, in particular, we have P(n) = P(n−1)P (remembering that P := P(1)). Therefore,

P(n) = P^n, n ≥ 1.

Note that since P(0) = I = P^0, this expression is valid for all n ≥ 0.
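The relation P(n) = P^n makes the earlier question about the hunt exact: starting in state 1, Pr(hunt lasts more than two minutes) = Pr(X2 ≠ 0) = 1 − (P²)₁₀. A pure-Python sketch (helper names are mine; the hunt matrix below uses p = 1/4, q = 1/2, with entries worked out from the room dynamics):

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(P, n):
    """n-step transition matrix P(n) = P**n (n >= 0)."""
    result = [[float(i == j) for j in range(len(P))] for i in range(len(P))]  # P(0) = I
    for _ in range(n):
        result = matmul(result, P)
    return result

# Hunt with p = 1/4, q = 1/2:
P = [[1.0, 0.0,   0.0],
     [0.5, 0.375, 0.125],
     [0.5, 0.125, 0.375]]
P2 = matpow(P, 2)
print(1 - P2[1][0])   # Pr(hunt lasts more than two minutes) from state 1 → 0.25
```

This agrees with the direct argument that, from either transient state, the hunt ends at each step with probability 1/2.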
Example. Returning to the hunt with p = 1/4 and q = 1/2, suppose that, at the beginning of the hunt, each creature is equally likely to be in either room, so that the initial distribution is π(0) = (1/2, 1/4, 1/4) (they start in the same room with probability 1/2, and each of states 1 and 2 has probability 1/4).
Notice also that |r| < 1, since p, q ∈ (0, 1). Therefore, π is also a limiting distribution, because

lim_{n→∞} Pr(Xn = 0) = q/(p + q),    lim_{n→∞} Pr(Xn = 1) = p/(p + q).
Remark. If, for a general Markov chain, a limiting distribution π exists, then it is a stationary distribution, that is, πP = π (π is a left eigenvector corresponding to the eigenvalue 1).
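For the two-state chain with Pr(0 → 1) = p and Pr(1 → 0) = q, both claims can be checked numerically: iterating the distribution converges to (q/(p+q), p/(p+q)), and that limit is fixed by the chain. A sketch (function name and parameter values are mine):

```python
def two_state_limit(p, q, n=200):
    """Iterate the two-state chain's distribution for n steps, starting at (1, 0)."""
    pi0, pi1 = 1.0, 0.0
    for _ in range(n):
        # One step of pi -> pi P for the 2x2 transition matrix [[1-p, p], [q, 1-q]].
        pi0, pi1 = pi0 * (1 - p) + pi1 * q, pi0 * p + pi1 * (1 - q)
    return pi0, pi1

p, q = 0.25, 0.5
pi = two_state_limit(p, q)
# The limit agrees with (q/(p+q), p/(p+q)) and is stationary: pi P = pi.
assert abs(pi[0] - q / (p + q)) < 1e-12
assert abs(pi[0] * (1 - p) + pi[1] * q - pi[0]) < 1e-12
```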
For details (and the converse), you will need a moreadvanced course on Stochastic Processes.
Example. Max (a dog) is subjected to a series of trials, in each of which he is given a choice of going to a dish to his left, containing tasty food, or a dish to his right, containing food with an unpleasant taste.
Suppose that if, on any given occasion, Max goes to the left, then he will return there on the next occasion with probability 0.99, while if he goes to the right, he will do so on the next occasion with probability 0.1 (Max is smart, but he is not infallible).
Let Xn be 0 or 1 according as Max chooses the dish to the left or the dish to the right on trial n. Then {Xn} is a two-state Markov chain with p = 0.01 and q = 0.9, and hence r = 0.09. Therefore, if the first dish is chosen at random (at time n = 1), then Max chooses the tasty food on the n-th trial with probability

Pr(Xn = 0) = q/(p + q) + (1/2 − q/(p + q)) r^(n−1) = 90/91 − (89/182)(0.09)^(n−1).
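This probability can also be computed by iterating the chain directly, without the closed form. A minimal sketch (function name is mine):

```python
def max_prob_left(n, p=0.01, q=0.9):
    """Pr(Max chooses the tasty left dish on trial n), with the first dish
    chosen at random on trial 1.  Iterates the two-state chain directly."""
    prob_left = 0.5                  # trial 1: dish chosen at random
    for _ in range(n - 1):
        # pi_0 -> pi_0 (1-p) + pi_1 q, with pi_1 = 1 - pi_0.
        prob_left = prob_left * (1 - p) + (1 - prob_left) * q
    return prob_left

# Approaches the limit q/(p+q) = 90/91 ≈ 0.989 at geometric rate r = 0.09:
print(max_prob_left(1), max_prob_left(5), max_prob_left(50))
```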
Birth-death chains. Their state space S is either the integers, the non-negative integers, or {0, 1, . . . , N}, and jumps of size greater than 1 are not permitted; their transition probabilities are therefore of the form pi,i+1 = ai, pi,i−1 = bi and pii = 1 − ai − bi, with pij = 0 otherwise.

The birth probabilities (ai) and the death probabilities (bi) are strictly positive and satisfy ai + bi ≤ 1, except perhaps at the boundaries of S, where they could be 0. If ai = a and bi = b, the chain is called a random walk.
Gambler's ruin. A gambler successively wagers a single unit in an even-money game. Xn is his capital after n bets and S = {0, 1, . . . , N}. If his capital reaches N he stops and leaves happy, while state 0 corresponds to "bust". Here ai = bi = 1/2, except at the boundaries (0 and N are absorbing states). It is easy to show that the player goes bust with probability 1 − i/N if his initial capital is i.
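The ruin probability 1 − i/N can be checked by simulation; a sketch (function name and parameters are mine):

```python
import random

def ruin_probability(i, N, n_trials=20_000, rng=None):
    """Estimate by simulation the probability that a fair gambler
    starting with capital i goes bust before reaching N."""
    rng = rng or random.Random(0)
    busts = 0
    for _ in range(n_trials):
        capital = i
        while 0 < capital < N:                       # play until absorption
            capital += 1 if rng.random() < 0.5 else -1
        busts += capital == 0
    return busts / n_trials

# Theory: ruin probability is 1 - i/N for a fair game.
print(ruin_probability(3, 10))   # should be close to 1 - 3/10 = 0.7
```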
The Ehrenfest diffusion model. N particles are allowed to pass through a small aperture between two chambers A and B. We assume that at each time epoch n, a single particle, chosen uniformly at random from the N, passes through the aperture.

Let Xn be the number in chamber A at time n. Then S = {0, 1, . . . , N} and, for i ∈ S, ai = 1 − i/N and bi = i/N. In this model, 0 and N are reflecting barriers. It is easy to show that the stationary distribution is binomial B(N, 1/2).
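The binomial stationary distribution can be verified directly from πP = π, since in a birth-death chain probability flows into state j only from j − 1 and j + 1 (here ai + bi = 1, so there is no holding probability). A sketch (function name is mine):

```python
from math import comb

def ehrenfest_check(N):
    """Verify that Binomial(N, 1/2) is stationary for the Ehrenfest chain."""
    pi = [comb(N, i) / 2**N for i in range(N + 1)]
    for j in range(N + 1):
        # (pi P)_j = pi_{j-1} a_{j-1} + pi_{j+1} b_{j+1}
        flow = 0.0
        if j >= 1:
            flow += pi[j - 1] * (1 - (j - 1) / N)   # a_{j-1}: a particle enters A
        if j <= N - 1:
            flow += pi[j + 1] * (j + 1) / N         # b_{j+1}: a particle leaves A
        assert abs(flow - pi[j]) < 1e-12
    return True

print(ehrenfest_check(20))   # → True
```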
Population models. Here Xn is the size of the population at time n (for example, at the end of the n-th breeding cycle, or at the time of the n-th census). S = {0, 1, . . . }, or S = {0, 1, . . . , N} when there is an upper limit N on the population size (frequently interpreted as the carrying capacity). Usually 0 is an absorbing state, corresponding to population extinction, and N is reflecting.
Example. Take S = {0, 1, . . . } with a0 = 0 and, for i ≥ 1, ai = a > 0 and bi = b > 0, where a + b = 1. It can be shown that extinction occurs with probability 1 when a ≤ b, and with probability (b/a)^i when a > b, where i is the initial population size. This is a good simple model for a population of cells: a = λ/(λ + µ) and b = µ/(λ + µ), where µ and λ are, respectively, the death and the cell division rates.
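The stated extinction probability can be checked against the first-step equations: conditioning on the first jump gives u_i = a u_{i+1} + b u_{i−1} for i ≥ 1, with u_0 = 1, and (b/a)^i satisfies these exactly since a + b = 1. A sketch with illustrative values of my choosing:

```python
def extinction_prob(i, a, b):
    """Claimed extinction probability from initial population size i (case a > b)."""
    return (b / a) ** i

a, b = 0.6, 0.4   # illustrative division/death probabilities, a + b = 1, a > b
u = [extinction_prob(i, a, b) for i in range(12)]
assert u[0] == 1.0
for i in range(1, 11):
    # First-step analysis: u_i = a*u_{i+1} + b*u_{i-1}
    assert abs(u[i] - (a * u[i + 1] + b * u[i - 1])) < 1e-12
```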
The logistic model. This has S = {0, . . . , N}, with 0 absorbing and N reflecting, and, for i = 1, . . . , N − 1,

ai = λ(1 − i/N) / (µ + λ(1 − i/N)),    bi = µ / (µ + λ(1 − i/N)).
Here λ and µ are birth and death rates. Notice that the birth and the death probabilities depend on i only through i/N, a quantity which is proportional to the population density: i/N = (i/Area)/(N/Area). Models with this property are called density dependent.
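The logistic transition probabilities are easy to tabulate; a sketch (function name and the illustrative rates are mine):

```python
def logistic_probs(i, N, lam, mu):
    """Birth/death probabilities (a_i, b_i) of the logistic chain,
    for interior states i = 1, ..., N - 1."""
    birth_rate = lam * (1 - i / N)    # density-dependent birth rate
    denom = mu + birth_rate
    return birth_rate / denom, mu / denom

# Illustrative rates: births outpace deaths at low density.
N, lam, mu = 100, 2.0, 1.0
a, b = logistic_probs(10, N, lam, mu)
# For interior states the chain jumps at every step: a_i + b_i = 1.
assert abs(a + b - 1.0) < 1e-12
```

Note that as i approaches N the birth probability ai falls toward 0, which is exactly the density-dependent crowding effect described above.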
Telecommunications. (1) A communications link in a telephone network has N circuits. One circuit is held by each call for its duration. Calls arrive at rate λ > 0 and are completed at rate µ > 0. Let Xn be the number of calls in progress at the n-th time epoch (when an arrival or a departure occurs). Then S = {0, . . . , N}, with 0 and N both reflecting barriers, and, for i = 1, . . . , N − 1,
(2) At a node in a packet-switching network, data packets are stored in a buffer of size N. They arrive at rate λ > 0 and are transmitted one at a time (in the order in which they arrive) at rate µ > 0. Let Xn be the number of packets yet to be transmitted just after the n-th time epoch (an arrival or a departure). Then S = {0, . . . , N}, with 0 and N both reflecting barriers, and, for i = 1, . . . , N − 1,
Genetic models. The simplest of these is the Wright-Fisher model. There are N individuals, each of two genetic types, A-type and a-type. Mutation (if any) occurs at birth. We assume that A-types are selectively superior in that the relative survival rate of A-type over a-type individuals in successive generations is γ > 1. Let Xn be the number of A-type individuals, so that N − Xn is the number of a-type.
Wright and Fisher postulated that the composition of the next generation is determined by N Bernoulli trials, where the probability pi of producing an A-type offspring is given by

pi = γ[i(1 − α) + (N − i)β] / ( γ[i(1 − α) + (N − i)β] + [iα + (N − i)(1 − β)] ),
where α and β are the respective mutation probabilities. We have S = {0, . . . , N} and, since the next generation is formed by N Bernoulli trials with success probability pi,

pij = Pr(Xn+1 = j | Xn = i) = (N choose j) pi^j (1 − pi)^(N−j), j ∈ S.
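Since each generation is just N Bernoulli trials with success probability pi, the transition probabilities are binomial. A sketch (function names and the illustrative parameter values are mine):

```python
from math import comb

def wf_p(i, N, gamma, alpha, beta):
    """Probability that one offspring is A-type, given i A-type parents."""
    fit_A = gamma * (i * (1 - alpha) + (N - i) * beta)   # weighted A-type output
    fit_a = i * alpha + (N - i) * (1 - beta)             # a-type output
    return fit_A / (fit_A + fit_a)

def wf_transition(i, j, N, gamma, alpha, beta):
    """Pr(X_{n+1} = j | X_n = i): N Bernoulli trials with success prob p_i."""
    p = wf_p(i, N, gamma, alpha, beta)
    return comb(N, j) * p**j * (1 - p)**(N - j)

# Illustrative parameters: slight selective advantage, small mutation rates.
N, gamma, alpha, beta = 50, 1.1, 0.01, 0.01
row_sum = sum(wf_transition(5, j, N, gamma, alpha, beta) for j in range(N + 1))
assert abs(row_sum - 1.0) < 1e-12   # each row of the transition matrix sums to 1
```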