Computational methods for continuous time Markov chains with applications to biological processes

David F. Anderson∗

[email protected]

Department of Mathematics

University of Wisconsin - Madison

Penn. State

January 13th, 2012

Stochastic Models of Biochemical Reaction Systems

- The most common stochastic models of biochemical reaction systems are continuous time Markov chains.

- Often called chemical master equation type models in the biosciences.

Common examples include:

1. Gene regulatory networks.

2. Models of viral infection.

3. General population models (epidemic, predator-prey, etc.).

Path-wise simulation methods include:

  Language in Biology        Language in Math
  Gillespie's algorithm      Simulate the embedded DTMC
  Next reaction method       Simulate the random time change representation of Tom Kurtz
  First reaction method      Simulate using exponential "alarm clocks"

Stochastic Models of Biochemical Reaction Systems

Path-wise methods can approximate values such as

  E f(X(t)).

For example,

1. Means: f(x) = x_i.

2. Moments/variances: f(x) = x_i^2.

3. Probabilities: f(x) = 1_{\{x ∈ A\}}.

They can also compute sensitivities,

  \frac{d}{dκ} E f(X_κ(t)).

Problem: computing these quantities with path-wise simulation can be computationally expensive.

First problem: joint with Des Higham

Our first problem: approximate E f(X(T)) to some desired tolerance, ε > 0.

Easy!

- Simulate the CTMC exactly,

- generate independent paths X_{[i]}(t), and use the unbiased estimator

  μ_n = \frac{1}{n} \sum_{i=1}^n f(X_{[i]}(t)).

- Stop when the desired confidence interval is ± ε (a sketch of this loop follows below).
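
A minimal sketch of this crude Monte Carlo loop, assuming a hypothetical sampler `sample_f(rng)` that returns one independent exact realization of f(X(T)) (for instance from a Gillespie path; the function names here are illustrative, not from the talk):

```python
import numpy as np

def crude_monte_carlo(sample_f, eps, batch=10_000, z=1.96, rng=None):
    """Estimate E f(X(T)) by averaging exact paths, stopping once the
    95% confidence half-width is below the target tolerance eps."""
    rng = rng or np.random.default_rng()
    vals = []
    while True:
        vals.extend(sample_f(rng) for _ in range(batch))    # generate more exact paths
        a = np.asarray(vals, dtype=float)
        half_width = z * a.std(ddof=1) / np.sqrt(a.size)    # CI half-width ~ sigma / sqrt(n)
        if half_width <= eps:
            return a.mean(), half_width, a.size             # mu_n, +/- width, n
```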

What is the computational cost?

Recall,

  μ_n = \frac{1}{n} \sum_{i=1}^n f(X_{[i]}(t)).

Thus,

  Var(μ_n) = O(1/n).

So, if we want

  σ_n = O(ε),

we need

  \frac{1}{\sqrt{n}} = O(ε)  ⟹  n = O(ε^{-2}).

If N gives the average cost (steps) of a path using the exact algorithm:

  Total computational complexity = (cost per path) × (# paths) = O(N ε^{-2}).

Can be bad if (i) N is large, or (ii) ε is small.
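
For a rough sense of scale (hypothetical numbers, not taken from the talk): with a tolerance ε = 10^{-1} and exact paths that each require N ≈ 10^6 steps,

  n = O(ε^{-2}) ≈ 10^4 paths,   and   N ε^{-2} ≈ 10^{10} total updates.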

Benefits/drawbacks

Benefits:

1. Easy to implement.

2. The estimator

  μ_n = \frac{1}{n} \sum_{i=1}^n f(X_{[i]}(t))

  is unbiased.

Drawbacks:

1. The cost of O(N ε^{-2}) could be prohibitively large.

2. For our models, we often have that N is very large.

We need to develop the model further to find better ideas....

Build up model: Random time change representation of Tom Kurtz

Consider the simple system

  A + B → C,

where one molecule each of A and B is converted to one molecule of C.

Simple book-keeping: if X(t) = (X_A(t), X_B(t), X_C(t))^T gives the state at time t,

  X(t) = X(0) + R(t) (-1, -1, 1)^T,

where

- R(t) is the number of times the reaction has occurred by time t, and

- X(0) is the initial condition.

Build up model: Random time change representation of Tom Kurtz

Assuming the intensity (or propensity) of the reaction is

  κ X_A(s) X_B(s),

we can model

  R(t) = Y( \int_0^t κ X_A(s) X_B(s) ds ),

where Y is a unit-rate Poisson process.

Hence

  (X_A(t), X_B(t), X_C(t))^T ≡ X(t) = X(0) + Y( \int_0^t κ X_A(s) X_B(s) ds ) (-1, -1, 1)^T.

Build up model: Random time change representation of Tom Kurtz

Now consider a network of reactions involving d chemical species, S_1, ..., S_d:

  \sum_{i=1}^d ν_{ik} S_i ⟶ \sum_{i=1}^d ν'_{ik} S_i.

Denote the reaction vector by

  ζ_k = ν'_k − ν_k.

The intensity (or propensity) of the k-th reaction is λ_k : \mathbb{Z}^d_{≥0} → \mathbb{R}.

By analogy with before,

  X(t) = X(0) + \sum_k Y_k( \int_0^t λ_k(X(s)) ds ) ζ_k,

where the Y_k are independent, unit-rate Poisson processes.
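
This representation also dictates how to simulate exact paths. A minimal Gillespie-style sketch for a network specified by its reaction vectors ζ_k and propensities λ_k (the array and function names are illustrative assumptions, not the talk's code):

```python
import numpy as np

def gillespie_path(x0, zeta, propensity, T, rng):
    """Exact simulation of X(t) = X(0) + sum_k Y_k(int_0^t lambda_k(X(s)) ds) zeta_k,
    returning the state at time T.  `zeta` is a (K, d) array of reaction vectors and
    `propensity(x)` returns the K nonnegative rates lambda_k(x)."""
    x = np.array(x0, dtype=float)
    t = 0.0
    while True:
        lam = propensity(x)
        total = lam.sum()
        if total <= 0.0:                          # no reaction can fire: state is frozen
            return x
        t += rng.exponential(1.0 / total)         # exponential holding time
        if t > T:
            return x
        k = rng.choice(len(lam), p=lam / total)   # index of the reaction that fires
        x += zeta[k]
```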

Example

Consider a model of gene transcription and translation:

  G \xrightarrow{25} G + M,        (Transcription)
  M \xrightarrow{1000} M + P,      (Translation)
  P + P \xrightarrow{0.001} D,     (Dimerization)
  M \xrightarrow{0.1} ∅,           (Degradation of mRNA)
  P \xrightarrow{1} ∅.             (Degradation of Protein)

Then, if X = [X_M, X_P, X_D]^T,

  X(t) = X(0) + Y_1(25 t) (1, 0, 0)^T
       + Y_2( 1000 \int_0^t X_M(s) ds ) (0, 1, 0)^T
       + Y_3( 0.001 \int_0^t X_P(s)(X_P(s) − 1) ds ) (0, −2, 1)^T
       + Y_4( 0.1 \int_0^t X_M(s) ds ) (−1, 0, 0)^T
       + Y_5( 1 \int_0^t X_P(s) ds ) (0, −1, 0)^T.
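
For concreteness, one possible encoding of this network as data for sketches like the one above (species ordering X = (X_M, X_P, X_D); the single gene G = 1 is folded into the constant transcription rate 25; an illustrative sketch, not the talk's code):

```python
import numpy as np

# Reaction vectors zeta_k, one row per reaction, columns (M, P, D).
zeta = np.array([[ 1,  0, 0],    # G -> G + M     (transcription)
                 [ 0,  1, 0],    # M -> M + P     (translation)
                 [ 0, -2, 1],    # P + P -> D     (dimerization)
                 [-1,  0, 0],    # M -> 0         (mRNA degradation)
                 [ 0, -1, 0]])   # P -> 0         (protein degradation)

def propensity(x):
    """Mass-action intensities lambda_k(x) for x = (X_M, X_P, X_D)."""
    xM, xP, xD = x
    return np.array([25.0, 1000.0 * xM, 0.001 * xP * (xP - 1.0), 0.1 * xM, 1.0 * xP])

# Usage with the earlier sketch:
# rng = np.random.default_rng()
# x_T = gillespie_path([0, 0, 0], zeta, propensity, 1.0, rng)
```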

Back to our problem

Recall:

Benefits:

1. Easy to implement.

2. The estimator

  μ_n = \frac{1}{n} \sum_{i=1}^n f(X_{[i]}(t))

  is unbiased.

Drawbacks:

1. The cost of O(N ε^{-2}) could be prohibitively large.

2. For our models, we often have that N is very large.

Let's try an approximate scheme.

Tau-leaping: Euler's method

Explicit tau-leaping [1], or Euler's method, was first formulated by Dan Gillespie in this setting.

Tau-leaping is essentially an Euler approximation of \int_0^t λ_k(X(s)) ds:

  Z(h) = Z(0) + \sum_k Y_k( \int_0^h λ_k(Z(s)) ds ) ζ_k

       ≈ Z(0) + \sum_k Y_k( λ_k(Z(0)) h ) ζ_k

       \stackrel{d}{=} Z(0) + \sum_k Poisson( λ_k(Z(0)) h ) ζ_k.

[1] D. T. Gillespie, J. Chem. Phys., 115, 1716–1733.
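
A minimal tau-leaping sketch in the same setting (it replaces each exponential clock by one Poisson draw per reaction per step; propensities are clipped at zero since the approximate process can leave the positive orthant — helper names are illustrative):

```python
import numpy as np

def tau_leap_path(x0, zeta, propensity, T, h, rng):
    """Explicit tau-leaping (Euler): on each step of length h the number of firings
    of reaction k is Poisson(lambda_k(Z(t_n)) * h), with intensities frozen at t_n."""
    z = np.array(x0, dtype=float)
    for _ in range(int(round(T / h))):
        lam = np.maximum(propensity(z), 0.0)   # frozen over [t_n, t_n + h); clip if negative
        counts = rng.poisson(lam * h)          # one Poisson draw per reaction channel
        z += zeta.T @ counts                   # apply all firings at once
    return z
```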

Euler's method

The path-wise representation for Z(t) generated by Euler's method is

  Z(t) = X(0) + \sum_k Y_k( \int_0^t λ_k(Z ∘ η(s)) ds ) ζ_k,

where

  η(s) = t_n   if   t_n ≤ s < t_{n+1} = t_n + h

is the step function giving the left endpoints of the time discretization.

Return to approximating E f(X(T))

Let Z_L denote an approximate process generated with a time discretization step of h_L. Let

  μ_n = \frac{1}{n} \sum_{i=1}^n f(Z_{L,[i]}(t)).

We note

  E f(X(t)) − μ_n = [ E f(X(t)) − E f(Z_L(t)) ] + E f(Z_L(t)) − μ_n.

Suppose we have an order one method:

  E f(X(t)) − E f(Z_L(t)) = O(h_L).

We need:

1. h_L = O(ε).

2. n = ε^{-2}.

Suppose a path costs O(ε^{-1}) steps. Then

  Total computational complexity = (# paths) × (cost per path) = O(ε^{-3}).

Benefits/drawbacks

Benefits:

1. Can drastically lower the computational complexity of a problem if ε^{-1} ≪ N:

   CC of exact approach = N ε^{-2},
   CC of approximate approach = ε^{-1} ε^{-2}.

Drawbacks:

1. Convergence results usually give only an order of convergence; they cannot give a precise h_L. Bias is a problem.

2. Tau-leaping has problems: what happens if you go negative?

3. We have gone away from an unbiased estimator.

Multi-level Monte Carlo and control variates

- Suppose I want

  E X ≈ \frac{1}{n} \sum_{i=1}^n X_{[i]},

  but realizations of X are expensive.

- Suppose X ≈ Z_L, and Z_L is cheap.

- Suppose X, Z_L can be generated simultaneously so that

  Var(X − Z_L)

  is small.

- Then use (a sketch follows after this list)

  E X = E[X − Z_L] + E Z_L ≈ \frac{1}{n_1} \sum_{i=1}^{n_1} (X_{[i]} − Z_{L,[i]}) + \frac{1}{n_2} \sum_{i=1}^{n_2} Z_{L,[i]}.

- Multi-level Monte Carlo (Mike Giles, Stefan Heinrich) = keep going:

  E X = E(X − Z_L) + E Z_L = E(X − Z_L) + E(Z_L − Z_{L−1}) + E Z_{L−1} = ⋯
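
A sketch of the two-level identity E X = E[X − Z_L] + E Z_L as an estimator, assuming hypothetical samplers `sample_pair(rng)` (one coupled draw of (X, Z_L), expensive, used n_1 times) and `sample_cheap(rng)` (one draw of Z_L alone, cheap, used n_2 times):

```python
import numpy as np

def two_level_estimate(sample_pair, sample_cheap, n1, n2, rng):
    """Control-variate estimator of E X = E[X - Z_L] + E Z_L: a few coupled
    samples handle the (small-variance) correction term, many cheap samples handle E Z_L."""
    diffs = np.array([x - z for x, z in (sample_pair(rng) for _ in range(n1))])
    cheap = np.array([sample_cheap(rng) for _ in range(n2)])
    return diffs.mean() + cheap.mean()
```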

Multi-level Monte Carlo: an unbiased estimator

In our setting:

  E f(X(t)) = E[f(X(t)) − f(Z_L(t))] + \sum_{ℓ=ℓ_0+1}^{L} E[f(Z_ℓ(t)) − f(Z_{ℓ−1}(t))] + E f(Z_{ℓ_0}(t)).

For appropriate choices of n_0, n_ℓ, and n_E, we define the estimators for the three terms above via

  Q_E \stackrel{def}{=} \frac{1}{n_E} \sum_{i=1}^{n_E} ( f(X_{[i]}(T)) − f(Z_{L,[i]}(T)) ),

  Q_ℓ \stackrel{def}{=} \frac{1}{n_ℓ} \sum_{i=1}^{n_ℓ} ( f(Z_{ℓ,[i]}(T)) − f(Z_{ℓ−1,[i]}(T)) ),   for ℓ ∈ {ℓ_0 + 1, ..., L},

  Q_0 \stackrel{def}{=} \frac{1}{n_0} \sum_{i=1}^{n_0} f(Z_{ℓ_0,[i]}(T)),

and note that

  Q \stackrel{def}{=} Q_E + \sum_{ℓ=ℓ_0+1}^{L} Q_ℓ + Q_0

is an unbiased estimator for E f(X(T)).

So what is the coupling and the variance of the estimator?
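
Assembled as code, the unbiased estimator Q is just a sum of independent sample means, one per level; here `sample_exact_diff`, `sample_level_diff`, and `sample_coarsest` are hypothetical samplers returning f(X(T)) − f(Z_L(T)) for a coupled (exact, finest) pair, f(Z_ℓ(T)) − f(Z_{ℓ−1}(T)) for a coupled pair at level ℓ, and f(Z_{ℓ_0}(T)), respectively (a sketch of the structure only):

```python
import numpy as np

def mlmc_estimate(sample_exact_diff, sample_level_diff, sample_coarsest,
                  nE, n_level, n0, l0, L, rng):
    """Q = Q_E + sum_{l=l0+1}^{L} Q_l + Q_0 from the slide; n_level[l] gives n_l."""
    Q_E = np.mean([sample_exact_diff(rng) for _ in range(nE)])
    Q_mid = sum(np.mean([sample_level_diff(l, rng) for _ in range(n_level[l])])
                for l in range(l0 + 1, L + 1))
    Q_0 = np.mean([sample_coarsest(rng) for _ in range(n0)])
    return Q_E + Q_mid + Q_0
```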

How do we generate processes simultaneously?

Suppose I want to generate:

- A Poisson process with intensity 13.1.
- A Poisson process with intensity 13.

We could let Y_1 and Y_2 be independent, unit-rate Poisson processes and set

  Z_{13.1}(t) = Y_1(13.1 t),
  Z_{13}(t) = Y_2(13 t).

Using this representation, the processes are independent and, hence, not coupled.

The variance of the difference is large:

  Var(Z_{13.1}(t) − Z_{13}(t)) = Var(Y_1(13.1 t)) + Var(Y_2(13 t)) = 26.1 t.

How do we generate processes simultaneously?

Suppose I want to generate:

- A Poisson process with intensity 13.1.
- A Poisson process with intensity 13.

We could instead let Y_1 and Y_2 be independent, unit-rate Poisson processes and set

  Z_{13.1}(t) = Y_1(13 t) + Y_2(0.1 t),
  Z_{13}(t) = Y_1(13 t).

The variance of the difference is much smaller:

  Var(Z_{13.1}(t) − Z_{13}(t)) = Var(Y_2(0.1 t)) = 0.1 t.
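
A quick numerical check of the two constructions at a fixed time t (at a fixed time the counts are just Poisson random variables, so this only verifies the variance numbers on the slide; the script is an illustration, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
t, n = 1.0, 200_000

# Independent construction: Var(Z_13.1(t) - Z_13(t)) = 13.1 t + 13 t = 26.1 t.
d_indep = rng.poisson(13.1 * t, n) - rng.poisson(13.0 * t, n)

# Coupled construction: Z_13.1(t) - Z_13(t) = Y_2(0.1 t), so the variance is 0.1 t.
d_coupled = rng.poisson(0.1 * t, n)

print(d_indep.var(), d_coupled.var())   # approximately 26.1 and 0.1 for t = 1
```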

How do we generate processes simultaneously?

More generally, suppose we want

1. a non-homogeneous Poisson process with intensity f(t), and

2. a non-homogeneous Poisson process with intensity g(t).

We can let Y_1, Y_2, and Y_3 be independent, unit-rate Poisson processes and define

  Z_f(t) = Y_1( \int_0^t f(s) ∧ g(s) ds ) + Y_2( \int_0^t f(s) − (f(s) ∧ g(s)) ds ),

  Z_g(t) = Y_1( \int_0^t f(s) ∧ g(s) ds ) + Y_3( \int_0^t g(s) − (f(s) ∧ g(s)) ds ),

where we are using that, for example,

  Y_1( \int_0^t f(s) ∧ g(s) ds ) + Y_2( \int_0^t f(s) − (f(s) ∧ g(s)) ds ) = Y( \int_0^t f(s) ds ),

where Y is a unit-rate Poisson process.

Back to our processes

  X(t) = X(0) + \sum_k Y_k( \int_0^t λ_k(X(s)) ds ) ζ_k,

  Z(t) = X(0) + \sum_k Y_k( \int_0^t λ_k(Z ∘ η(s)) ds ) ζ_k.

Now couple them:

  X(t) = X(0) + \sum_k Y_{k,1}( \int_0^t λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s)) ds ) ζ_k
       + \sum_k Y_{k,2}( \int_0^t λ_k(X(s)) − λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s)) ds ) ζ_k,

  Z_ℓ(t) = Z_ℓ(0) + \sum_k Y_{k,1}( \int_0^t λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s)) ds ) ζ_k
       + \sum_k Y_{k,3}( \int_0^t λ_k(Z_ℓ ∘ η_ℓ(s)) − λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s)) ds ) ζ_k.

The algorithm for simulating this pair is equivalent to the next reaction method or Gillespie's algorithm.
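
One way to realize this coupling path-wise (a sketch, under the assumption that within each leaping window the tau-leap intensities stay frozen at the window's left endpoint while the exact process's intensities refresh at each of its own jumps; within a window the pair is then simulated as a joint Markov chain with the three channels min, exact-minus-min, and leap-minus-min per reaction — helper names are illustrative, not the talk's code):

```python
import numpy as np

def coupled_exact_tau_leap(x0, zeta, propensity, T, h, rng):
    """One coupled sample of (X, Z): X exact, Z explicit tau-leaping with step h,
    driven by shared randomness through the three-channel splitting on the slide."""
    X = np.array(x0, dtype=float)
    Z = np.array(x0, dtype=float)
    t = 0.0
    while t < T:
        t_next = min(t + h, T)
        lam_Z = np.maximum(propensity(Z), 0.0)   # tau-leap intensities, frozen on [t, t_next)
        s = t
        while True:
            lam_X = propensity(X)
            m = np.minimum(lam_X, lam_Z)
            rates = np.concatenate([m, lam_X - m, lam_Z - m])   # shared, X-only, Z-only
            total = rates.sum()
            if total <= 0.0:
                break
            s += rng.exponential(1.0 / total)
            if s >= t_next:
                break
            j = rng.choice(rates.size, p=rates / total)
            k = j % len(lam_X)
            if j < len(lam_X):          # shared channel: both processes jump together
                X += zeta[k]; Z += zeta[k]
            elif j < 2 * len(lam_X):    # exact-only channel
                X += zeta[k]
            else:                       # tau-leap-only channel
                Z += zeta[k]
        t = t_next
    return X, Z
```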

For approximate processes

  Z_ℓ(t) = Z_ℓ(0) + \sum_k Y_{k,1}( \int_0^t λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) ds ) ζ_k
       + \sum_k Y_{k,2}( \int_0^t λ_k(Z_ℓ ∘ η_ℓ(s)) − λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) ds ) ζ_k,

  Z_{ℓ−1}(t) = Z_{ℓ−1}(0) + \sum_k Y_{k,1}( \int_0^t λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) ds ) ζ_k
       + \sum_k Y_{k,3}( \int_0^t λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) − λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) ds ) ζ_k.

The algorithm for simulating this pair is equivalent to τ-leaping.
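
The same three-channel idea for a pair of tau-leap processes at neighboring levels is even simpler, since on each fine step all intensities are frozen and the three channels are just independent Poisson draws (a sketch, assuming the coarse step is M fine steps and clipping negative propensities; names are illustrative):

```python
import numpy as np

def coupled_tau_leap_pair(x0, zeta, propensity, T, h_fine, M, rng):
    """One coupled sample of (Z_l, Z_{l-1}) with fine step h_fine and coarse step M * h_fine."""
    Zf = np.array(x0, dtype=float)                   # level-l (fine) process
    Zc = np.array(x0, dtype=float)                   # level-(l-1) (coarse) process
    lam_c = np.maximum(propensity(Zc), 0.0)          # coarse intensities, frozen per coarse step
    for n in range(int(round(T / h_fine))):
        if n % M == 0:
            lam_c = np.maximum(propensity(Zc), 0.0)  # refresh on the coarse grid
        lam_f = np.maximum(propensity(Zf), 0.0)      # fine intensities, frozen per fine step
        m = np.minimum(lam_f, lam_c)
        P_shared = rng.poisson(m * h_fine)            # drives both processes
        P_fine = rng.poisson((lam_f - m) * h_fine)    # drives the fine process only
        P_coarse = rng.poisson((lam_c - m) * h_fine)  # drives the coarse process only
        Zf += zeta.T @ (P_shared + P_fine)
        Zc += zeta.T @ (P_shared + P_coarse)
    return Zf, Zc
```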

Multi-level Monte Carlo: chemical kinetic setting

Can prove: [1]

Theorem (Anderson, Higham 2011). Suppose (X, Z_ℓ) satisfy the coupling. Then,

  \sup_{t ≤ T} E|X(t) − Z_ℓ(t)|^2 ≤ C_1(T) N^{-ρ} h_ℓ + C_2(T) h_ℓ^2.

Theorem (Anderson, Higham 2011). Suppose (Z_ℓ, Z_{ℓ−1}) satisfy the coupling. Then,

  \sup_{t ≤ T} E|Z_ℓ(t) − Z_{ℓ−1}(t)|^2 ≤ C_1(T) N^{-ρ} h_ℓ + C_2(T) h_ℓ^2.

[1] David F. Anderson and Desmond J. Higham, Multi-level Monte Carlo for stochastically modeled chemical kinetic systems. To appear in SIAM: Modeling and Simulation. Available at arXiv:1107.2181 and at www.math.wisc.edu/~anderson.

Multi-level Monte Carlo: an unbiased estimator

For well-chosen n_0, n_ℓ, and n_E, we have

  Var(Q) = Var( Q_E + \sum_{ℓ=ℓ_0+1}^{L} Q_ℓ + Q_0 ) = O(ε^2),

with

  Comp. cost = [ ε^{-2}(N^{-ρ} h_L + h_L^2) ] N + ε^{-2}( h_{ℓ_0}^{-1} + \ln(ε)^2 N^{-ρ} + \ln(ε^{-1}) \frac{1}{M−1} h_{ℓ_0} ).

Multi-level Monte Carlo: an unbiased estimator

Some observations:

1. The weak error plays no role in the analysis: we are free to choose h_L.

2. Common problems associated with tau-leaping, such as negativity of species numbers, do not matter. Just define the process in a sensible way.

3. The method is unbiased.

Example

Consider a model of gene transcription and translation:

  G \xrightarrow{25} G + M,
  M \xrightarrow{1000} M + P,
  P + P \xrightarrow{0.001} D,
  M \xrightarrow{0.1} ∅,
  P \xrightarrow{1} ∅.

Suppose:

1. we initialize with G = 1, M = 0, P = 0, D = 0,

2. we want to estimate the expected number of dimers at time T = 1,

3. to an accuracy of ± 1.0 with 95% confidence.

Example

Method: exact algorithm with crude Monte Carlo.

  Approximation      # paths      CPU time                       # updates
  3,714.2 ± 1.0      4,740,000    149,000 CPU s (41 hours!)      8.27 × 10^10

Method: Euler tau-leaping with crude Monte Carlo.

  Step-size    Approximation    # paths      CPU time      # updates
  h = 3^-7     3,712.3 ± 1.0    4,750,000    13,374.6 s    6.2 × 10^10
  h = 3^-6     3,707.5 ± 1.0    4,750,000    6,207.9 s     2.1 × 10^10
  h = 3^-5     3,693.4 ± 1.0    4,700,000    2,803.9 s     6.9 × 10^9
  h = 3^-4     3,654.6 ± 1.0    4,650,000    1,219.0 s     2.6 × 10^9

Method: unbiased MLMC with ℓ_0 = 2, and M and L detailed below.

  Step-size parameters    Approximation    CPU time     # updates
  M = 3, L = 6            3,713.9 ± 1.0    1,063.3 s    1.1 × 10^9
  M = 3, L = 5            3,714.7 ± 1.0    1,114.9 s    9.4 × 10^8
  M = 3, L = 4            3,714.2 ± 1.0    1,656.6 s    1.0 × 10^9
  M = 4, L = 4            3,714.2 ± 1.0    1,334.8 s    1.1 × 10^9
  M = 4, L = 5            3,713.8 ± 1.0    1,014.9 s    1.1 × 10^9

- The exact algorithm with crude Monte Carlo demanded 140 times more CPU time than our unbiased MLMC estimator!

Example

Method: exact algorithm with crude Monte Carlo.

  Approximation      # paths      CPU time                       # updates
  3,714.2 ± 1.0      4,740,000    149,000 CPU s (41 hours!)      8.27 × 10^10

Unbiased multi-level Monte Carlo with M = 3, L = 5, and ℓ_0 = 2.

  Level                     # paths      CPU time     Var. estimator    # updates
  (X, Z_{3^-5})             3,900        279.6 s      0.0658            6.8 × 10^7
  (Z_{3^-5}, Z_{3^-4})      30,000       49.0 s       0.0217            8.8 × 10^7
  (Z_{3^-4}, Z_{3^-3})      150,000      71.7 s       0.0179            1.5 × 10^8
  (Z_{3^-3}, Z_{3^-2})      510,000      112.3 s      0.0319            1.7 × 10^8
  Tau-leap with h = 3^-2    8,630,000    518.4 s      0.1192            4.7 × 10^8
  Totals                    N.A.         1,031.0 s    0.2565            9.5 × 10^8

Some conclusions about this method

1. Gillespie's algorithm is by far the most common way to compute expectations:

   1.1 Means.
   1.2 Variances.
   1.3 Probabilities.

2. The new method (MLMC) also performs this task with no bias (exact).

3. Will be at worst the same speed as Gillespie (exact algorithm + crude Monte Carlo).

4. Will commonly be many orders of magnitude faster.

5. Applicable to essentially all continuous time Markov chains:

  X(t) = X(0) + \sum_k Y_k( \int_0^t λ_k(X(s)) ds ) ζ_k.

6. Con: it is substantially harder to implement; good software is needed.

7. Makes no use of any specific structure or scaling in the problem.

Another example: Viral infection

Let

1. T = viral template.

2. G = viral genome.

3. S = viral structure.

4. V = virus.

Reactions:

  R1)  T + stuff \xrightarrow{κ_1} T + G,    κ_1 = 1
  R2)  G \xrightarrow{κ_2} T,                κ_2 = 0.025
  R3)  T + stuff \xrightarrow{κ_3} T + S,    κ_3 = 1000
  R4)  T \xrightarrow{κ_4} ∅,                κ_4 = 0.25
  R5)  S \xrightarrow{κ_5} ∅,                κ_5 = 2
  R6)  G + S \xrightarrow{κ_6} V,            κ_6 = 7.5 × 10^{-6}

- R. Srivastava, L. You, J. Summers, and J. Yin, J. Theoret. Biol., 2002.
- E. Haseltine and J. Rawlings, J. Chem. Phys., 2002.
- K. Ball, T. Kurtz, L. Popovic, and G. Rempala, Annals of Applied Probability, 2006.
- W. E, D. Liu, and E. Vanden-Eijnden, J. Comput. Phys., 2006.

Another example: Viral infection

The stochastic equations for X = (X_G, X_S, X_T, X_V) are

  X_1(t) = X_1(0) + Y_1( \int_0^t X_3(s) ds ) − Y_2( 0.025 \int_0^t X_1(s) ds ) − Y_6( 7.5 × 10^{-6} \int_0^t X_1(s) X_2(s) ds ),

  X_2(t) = X_2(0) + Y_3( 1000 \int_0^t X_3(s) ds ) − Y_5( 2 \int_0^t X_2(s) ds ) − Y_6( 7.5 × 10^{-6} \int_0^t X_1(s) X_2(s) ds ),

  X_3(t) = X_3(0) + Y_2( 0.025 \int_0^t X_1(s) ds ) − Y_4( 0.25 \int_0^t X_3(s) ds ),

  X_4(t) = X_4(0) + Y_6( 7.5 × 10^{-6} \int_0^t X_1(s) X_2(s) ds ).

Another example: Viral infection

Reactions:

  R1)  T + stuff \xrightarrow{κ_1} T + G,    κ_1 = 1
  R2)  G \xrightarrow{κ_2} T,                κ_2 = 0.025
  R3)  T + stuff \xrightarrow{κ_3} T + S,    κ_3 = 1000
  R4)  T \xrightarrow{κ_4} ∅,                κ_4 = 0.25
  R5)  S \xrightarrow{κ_5} ∅,                κ_5 = 2
  R6)  G + S \xrightarrow{κ_6} V,            κ_6 = 7.5 × 10^{-6}

If T > 0,

- reactions 3 and 5 are much faster than the others, and
- it looks like S is approximately Poisson(500 × T).

We can average out to get an approximate process Z(t).

Another example: Viral infection

The approximate process satisfies

  Z_1(t) = X_1(0) + Y_1( \int_0^t Z_3(s) ds ) − Y_2( 0.025 \int_0^t Z_1(s) ds ) − Y_6( 3.75 × 10^{-3} \int_0^t Z_1(s) Z_3(s) ds ),

  Z_3(t) = X_3(0) + Y_2( 0.025 \int_0^t Z_1(s) ds ) − Y_4( 0.25 \int_0^t Z_3(s) ds ),

  Z_4(t) = X_4(0) + Y_6( 3.75 × 10^{-3} \int_0^t Z_1(s) Z_3(s) ds ).        (1)

Now use

  E f(X(t)) = E[f(X(t)) − f(Z(t))] + E f(Z(t)).

Another example: Viral infection

  X(t) = X(0)
       + Y_{1,1}( \int_0^t min{X_3(s), Z_3(s)} ds ) ζ_1 + Y_{1,2}( \int_0^t X_3(s) − min{X_3(s), Z_3(s)} ds ) ζ_1
       + Y_{2,1}( 0.025 \int_0^t min{X_1(s), Z_1(s)} ds ) ζ_2 + Y_{2,2}( 0.025 \int_0^t X_1(s) − min{X_1(s), Z_1(s)} ds ) ζ_2
       + Y_3( 1000 \int_0^t X_3(s) ds ) ζ_3
       + Y_{4,1}( 0.25 \int_0^t min{X_3(s), Z_3(s)} ds ) ζ_4 + Y_{4,2}( 0.25 \int_0^t X_3(s) − min{X_3(s), Z_3(s)} ds ) ζ_4
       + Y_5( 2 \int_0^t X_2(s) ds ) ζ_5
       + Y_{6,1}( \int_0^t min{λ_6(X(s)), Λ_6(Z(s))} ds ) ζ_6 + Y_{6,2}( \int_0^t λ_6(X(s)) − min{λ_6(X(s)), Λ_6(Z(s))} ds ) ζ_6,

  Z(t) = Z(0)
       + Y_{1,1}( \int_0^t min{X_3(s), Z_3(s)} ds ) ζ_1 + Y_{1,3}( \int_0^t Z_3(s) − min{X_3(s), Z_3(s)} ds ) ζ_1
       + Y_{2,1}( 0.025 \int_0^t min{X_1(s), Z_1(s)} ds ) ζ_2 + Y_{2,3}( 0.025 \int_0^t Z_1(s) − min{X_1(s), Z_1(s)} ds ) ζ_2
       + Y_{4,1}( 0.25 \int_0^t min{X_3(s), Z_3(s)} ds ) ζ_4 + Y_{4,3}( 0.25 \int_0^t Z_3(s) − min{X_3(s), Z_3(s)} ds ) ζ_4
       + Y_{6,1}( \int_0^t min{λ_6(X(s)), Λ_6(Z(s))} ds ) ζ_6 + Y_{6,3}( \int_0^t Λ_6(Z(s)) − min{λ_6(X(s)), Λ_6(Z(s))} ds ) ζ_6.

Another example: Viral infection

Suppose we want

  E X_{virus}(20),

given T(0) = 10 and all other species zero.

Method: exact algorithm with crude Monte Carlo.

  Approximation    # paths    CPU time        # updates
  13.85 ± 0.07     75,000     24,800 CPU s    1.45 × 10^10

Method: E f(X(t)) = E[f(X(t)) − f(Z(t))] + E f(Z(t)).

  Approximation    CPU time         # updates
  13.91 ± 0.07     1,118.5 CPU s    2.41 × 10^8

Exact + crude Monte Carlo used:

1. 60 times more total steps.

2. 22 times more CPU time.

Mathematical Analysis

We had

  X(t) = X(0) + \sum_k Y_k( \int_0^t λ'_k(X(s)) ds ) ζ_k.

We assumed

  \sum_k λ'_k(X(·)) ≈ N ≫ 1.

There are therefore two extreme parameters floating around our models:

1. Some parameter N ≫ 1 (inherent to the model).

2. h, the step size (inherent to the approximation).

To quantify errors, we need to account for both.

Mathematical Analysis: Scaling in the style of Thomas Kurtz

For each species i, define the normalized abundance

  X^N_i(t) = N^{-α_i} X_i(t),

where α_i ≥ 0 should be selected so that X^N_i = O(1).

Rate constants, κ'_k, may also vary over several orders of magnitude. We write

  κ'_k = κ_k N^{β_k},

where the β_k are selected so that κ_k = O(1).

This eventually leads to the scaled model

  X^N(t) = X^N(0) + \sum_k Y_k( N^{γ} \int_0^t N^{β_k + α·ν_k − γ} λ_k(X^N(s)) ds ) ζ^N_k.

Results

  X^N(t) = X^N(0) + \sum_k Y_k( N^{γ} \int_0^t N^{c_k} λ_k(X^N(s)) ds ) ζ^N_k.

Let ρ_k ≥ 0 satisfy

  |ζ^N_k| ≈ N^{-ρ_k},

and set ρ = min_k {ρ_k}.

Theorem (A., Higham 2011). Suppose (X^N, Z^N_ℓ) satisfy the coupling with X^N(0) = Z^N_ℓ(0). Then,

  \sup_{t ≤ T} E|X^N(t) − Z^N_ℓ(t)|^2 ≤ C_1(T, N, γ) N^{-ρ} h_ℓ + C_2(T, N, γ) h_ℓ^2.

Theorem (A., Higham 2011). Suppose (Z^N_ℓ, Z^N_{ℓ−1}) satisfy the coupling with Z^N_ℓ(0) = Z^N_{ℓ−1}(0). Then,

  \sup_{t ≤ T} E|Z^N_ℓ(t) − Z^N_{ℓ−1}(t)|^2 ≤ C_1(T, N, γ) N^{-ρ} h_ℓ + C_2(T, N, γ) h_ℓ^2.

Flavor of Proof

Theorem (A., Higham 2011). Suppose (X^N, Z^N_ℓ) satisfy the coupling with X^N(0) = Z^N_ℓ(0). Then,

  \sup_{t ≤ T} E|X^N(t) − Z^N_ℓ(t)|^2 ≤ C_1(T, N, γ) N^{-ρ} h_ℓ + C_2(T, N, γ) h_ℓ^2.

The coupled pair is

  X^N(t) = X^N(0) + \sum_k Y_{k,1}( N^{γ} N^{c_k} \int_0^t λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ζ^N_k
         + \sum_k Y_{k,2}( N^{γ} N^{c_k} \int_0^t λ_k(X^N(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ζ^N_k,

  Z^N_ℓ(t) = Z^N_ℓ(0) + \sum_k Y_{k,1}( N^{γ} N^{c_k} \int_0^t λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ζ^N_k
         + \sum_k Y_{k,3}( N^{γ} N^{c_k} \int_0^t λ_k(Z^N_ℓ ∘ η_ℓ(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ζ^N_k.

Flavor of Proof

So,

  X^N(t) − Z^N_ℓ(t) = \sum_k [ Y_{k,2}( N^{γ} N^{c_k} \int_0^t λ_k(X^N(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds )
                             − Y_{k,3}( N^{γ} N^{c_k} \int_0^t λ_k(Z^N_ℓ ∘ η_ℓ(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ] ζ^N_k.

Hence,

  X^N(t) − Z^N_ℓ(t) = M^N(t) + \sum_k N^{γ} ζ^N_k N^{c_k} \int_0^t ( λ_k(X^N(s)) − λ_k(Z^N_ℓ ∘ η_ℓ(s)) ) ds.

Now work.

Next problem: parameter sensitivities

Motivated by Jim Rawlings.

We have

  X^θ(t) = X^θ(0) + \sum_k Y_k( \int_0^t λ^θ_k(X^θ(s)) ds ) ζ_k,

and we define

  J(θ) = E f(X^θ(t)).

We want

  J'(θ) = \frac{d}{dθ} E f(X^θ(t)).

There are multiple methods. We consider finite differences:

  J'(θ) = \frac{J(θ + ε) − J(θ)}{ε} + O(ε).

Next problem: parameter sensitivities

Noting that

  J'(θ) = \frac{d}{dθ} E f(X^θ(t)) = \frac{E f(X^{θ+ε}(t)) − E f(X^θ(t))}{ε} + O(ε),

the usual finite difference estimator is

  D_R(ε) = ε^{-1} ( \frac{1}{R} \sum_{i=1}^R f(X^{θ+ε}_{[i]}(t)) − \frac{1}{R} \sum_{j=1}^R f(X^θ_{[j]}(t)) ).

If the paths are generated independently, then

  Var(D_R(ε)) = O(R^{-1} ε^{-2}).
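
As a baseline, the independent-paths version is immediate, assuming a hypothetical exact-path sampler `sample_f(theta, rng)` that returns f(X^θ(T)) (an illustrative sketch only):

```python
import numpy as np

def naive_fd_sensitivity(sample_f, theta, eps, R, rng):
    """Finite-difference estimate of d/dtheta E f(X^theta(T)) from independent
    paths at theta + eps and theta; its variance scales like O(R^{-1} eps^{-2})."""
    plus = np.array([sample_f(theta + eps, rng) for _ in range(R)])
    base = np.array([sample_f(theta, rng) for _ in range(R)])
    return (plus.mean() - base.mean()) / eps
```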

Next problem: parameter sensitivities

Couple the processes:

  X^{θ+ε}(t) = X^{θ+ε}(0) + \sum_k Y_{k,1}( \int_0^t λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds ) ζ_k
       + \sum_k Y_{k,2}( \int_0^t λ^{θ+ε}_k(X^{θ+ε}(s)) − λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds ) ζ_k,

  X^θ(t) = X^θ(0) + \sum_k Y_{k,1}( \int_0^t λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds ) ζ_k
       + \sum_k Y_{k,3}( \int_0^t λ^θ_k(X^θ(s)) − λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds ) ζ_k.

Use:

  D_R(ε) = ε^{-1} \frac{1}{R} \sum_{i=1}^R [ f(X^{θ+ε}_{[i]}(t)) − f(X^θ_{[i]}(t)) ].
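
With the coupling, the estimator averages path-wise differences directly; here `sample_coupled_diff(theta, eps, rng)` is a hypothetical routine that simulates the pair (X^{θ+ε}, X^θ) jointly through the three-channel splitting above and returns f(X^{θ+ε}(T)) − f(X^θ(T)) (a sketch of how the estimator is used, not of the coupled simulation itself):

```python
import numpy as np

def coupled_fd_sensitivity(sample_coupled_diff, theta, eps, R, rng):
    """Coupled finite-difference estimator D_R(eps); because the coupled paths stay
    close, the variance drops to O(R^{-1} eps^{-1}) (see the theorem that follows)."""
    diffs = np.array([sample_coupled_diff(theta, eps, rng) for _ in range(R)])
    return diffs.mean() / eps
```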

Next problem: parameter sensitivities

Theorem (Anderson, 2011). [1] Suppose (X^{θ+ε}, X^θ) satisfy the coupling. Then, for any T > 0 there is a C_{T,f} > 0 for which

  E[ \sup_{t ≤ T} ( f(X^{θ+ε}(t)) − f(X^θ(t)) )^2 ] ≤ C_{T,f} ε.

This lowers the variance of the estimator from

  O(R^{-1} ε^{-2})

to

  O(R^{-1} ε^{-1}),

i.e. by an order of magnitude (in ε).

[1] David F. Anderson, An efficient Finite Difference Method for Parameter Sensitivities of Continuous Time Markov Chains. Submitted. Available at arXiv:1109.2890 and at www.math.wisc.edu/~anderson.

Parameter Sensitivities

  G \xrightarrow{2} G + M,
  M \xrightarrow{10} M + P,
  M \xrightarrow{k} ∅,
  P \xrightarrow{1} ∅.

We want

  \frac{∂}{∂k} E[X^k_{protein}(30)],   k ≈ 1/4.

  Method              # paths    Approximation    # updates     CPU time
  Likelihood ratio    689,600    -312.1 ± 6.0     2.9 × 10^9    3,506.6 s
  Exact/Naive FD      246,200    -318.8 ± 6.0     2.1 × 10^9    3,282.1 s
  CRP                 26,320     -320.7 ± 6.0     2.2 × 10^8    410.0 s
  Coupled             4,780      -321.2 ± 6.0     2.1 × 10^7    35.3 s

Analysis

Theorem. Suppose (X^{θ+ε}, X^θ) satisfy the coupling. Then, for any T > 0 there is a C_{T,f} > 0 for which

  E \sup_{t ≤ T} ( f(X^{θ+ε}(t)) − f(X^θ(t)) )^2 ≤ C_{T,f} ε.

Proof: the key observation is that

  X^{θ+ε}(t) − X^θ(t) = M^{θ,ε}(t) + \int_0^t F^{θ+ε}(X^{θ+ε}(s)) − F^θ(X^θ(s)) ds.

Now work on the martingale part and the absolutely continuous part.

Thanks!

References:

1. David F. Anderson and Desmond J. Higham, Multi-level Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics, to appear in SIAM: Multiscale Modeling and Simulation. Available at arXiv:1107.2181 and at www.math.wisc.edu/~anderson.

2. David F. Anderson, Efficient Finite Difference Method for Parameter Sensitivities of Continuous time Markov Chains, submitted. Available at arXiv:1109.2890 and at www.math.wisc.edu/~anderson.

Funding: NSF-DMS-1009275.