Computational methods for continuous time Markov chains with applications to biological processes

David F. Anderson∗

[email protected]

Department of Mathematics

University of Wisconsin - Madison

Penn. State

January 13th, 2012

Stochastic Models of Biochemical Reaction Systems

- The most common stochastic models of biochemical reaction systems are continuous time Markov chains.

- Often called chemical master equation type models in the biosciences.

Common examples include:

1. Gene regulatory networks.

2. Models of viral infection.

3. General population models (epidemic, predator-prey, etc.).

Path-wise simulation methods include:

  Language in Biology        Language in Math
  Gillespie's algorithm      Simulate the embedded DTMC
  Next reaction method       Simulate the random time change representation of Tom Kurtz
  First reaction method      Simulate using exponential "alarm clocks"

Stochastic Models of Biochemical Reaction Systems

Path-wise methods can approximate values such as

  E f(X(t)).

For example,

1. Means: f(x) = x_i.

2. Moments/variances: f(x) = x_i^2.

3. Probabilities: f(x) = 1_{\{x ∈ A\}}.

They can also compute sensitivities,

  \frac{d}{dκ} E f(X_κ(t)).

Problem: computing these quantities with path-wise simulation can be computationally expensive.

First problem: joint with Des Higham

Our first problem: approximate E f(X(T)) to some desired tolerance, ε > 0.

Easy!

- Simulate the CTMC exactly,

- generate independent paths X_{[i]}(t), and use the unbiased estimator

  μ_n = \frac{1}{n} \sum_{i=1}^n f(X_{[i]}(t)).

- Stop when the desired confidence interval is ± ε (a sketch of this loop follows below).
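
A minimal sketch of this crude Monte Carlo loop, assuming a hypothetical sampler `sample_f(rng)` that returns one independent exact realization of f(X(T)) (for instance from a Gillespie path; the function names here are illustrative, not from the talk):

```python
import numpy as np

def crude_monte_carlo(sample_f, eps, batch=10_000, z=1.96, rng=None):
    """Estimate E f(X(T)) by averaging exact paths, stopping once the
    95% confidence half-width is below the target tolerance eps."""
    rng = rng or np.random.default_rng()
    vals = []
    while True:
        vals.extend(sample_f(rng) for _ in range(batch))    # generate more exact paths
        a = np.asarray(vals, dtype=float)
        half_width = z * a.std(ddof=1) / np.sqrt(a.size)    # CI half-width ~ sigma / sqrt(n)
        if half_width <= eps:
            return a.mean(), half_width, a.size             # mu_n, +/- width, n
```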

What is the computational cost?

Recall,

  μ_n = \frac{1}{n} \sum_{i=1}^n f(X_{[i]}(t)).

Thus,

  Var(μ_n) = O(1/n).

So, if we want

  σ_n = O(ε),

we need

  \frac{1}{\sqrt{n}} = O(ε)  ⟹  n = O(ε^{-2}).

If N gives the average cost (steps) of a path using the exact algorithm:

  Total computational complexity = (cost per path) × (# paths) = O(N ε^{-2}).

Can be bad if (i) N is large, or (ii) ε is small.
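
For a rough sense of scale (hypothetical numbers, not taken from the talk): with a tolerance ε = 10^{-1} and exact paths that each require N ≈ 10^6 steps,

  n = O(ε^{-2}) ≈ 10^4 paths,   and   N ε^{-2} ≈ 10^{10} total updates.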

Benefits/drawbacks

Benefits:

1. Easy to implement.

2. The estimator

  μ_n = \frac{1}{n} \sum_{i=1}^n f(X_{[i]}(t))

  is unbiased.

Drawbacks:

1. The cost of O(N ε^{-2}) could be prohibitively large.

2. For our models, we often have that N is very large.

We need to develop the model further to find better ideas....

Build up model: Random time change representation of Tom Kurtz

Consider the simple system

  A + B → C,

where one molecule each of A and B is converted to one molecule of C.

Simple book-keeping: if X(t) = (X_A(t), X_B(t), X_C(t))^T gives the state at time t,

  X(t) = X(0) + R(t) (-1, -1, 1)^T,

where

- R(t) is the number of times the reaction has occurred by time t, and

- X(0) is the initial condition.

Build up model: Random time change representation of Tom Kurtz

Assuming the intensity (or propensity) of the reaction is

  κ X_A(s) X_B(s),

we can model

  R(t) = Y( \int_0^t κ X_A(s) X_B(s) ds ),

where Y is a unit-rate Poisson process.

Hence

  (X_A(t), X_B(t), X_C(t))^T ≡ X(t) = X(0) + Y( \int_0^t κ X_A(s) X_B(s) ds ) (-1, -1, 1)^T.

Build up model: Random time change representation of Tom Kurtz

Now consider a network of reactions involving d chemical species, S_1, ..., S_d:

  \sum_{i=1}^d ν_{ik} S_i ⟶ \sum_{i=1}^d ν'_{ik} S_i.

Denote the reaction vector by

  ζ_k = ν'_k − ν_k.

The intensity (or propensity) of the k-th reaction is λ_k : \mathbb{Z}^d_{≥0} → \mathbb{R}.

By analogy with before,

  X(t) = X(0) + \sum_k Y_k( \int_0^t λ_k(X(s)) ds ) ζ_k,

where the Y_k are independent, unit-rate Poisson processes.
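
This representation also dictates how to simulate exact paths. A minimal Gillespie-style sketch for a network specified by its reaction vectors ζ_k and propensities λ_k (the array and function names are illustrative assumptions, not the talk's code):

```python
import numpy as np

def gillespie_path(x0, zeta, propensity, T, rng):
    """Exact simulation of X(t) = X(0) + sum_k Y_k(int_0^t lambda_k(X(s)) ds) zeta_k,
    returning the state at time T.  `zeta` is a (K, d) array of reaction vectors and
    `propensity(x)` returns the K nonnegative rates lambda_k(x)."""
    x = np.array(x0, dtype=float)
    t = 0.0
    while True:
        lam = propensity(x)
        total = lam.sum()
        if total <= 0.0:                          # no reaction can fire: state is frozen
            return x
        t += rng.exponential(1.0 / total)         # exponential holding time
        if t > T:
            return x
        k = rng.choice(len(lam), p=lam / total)   # index of the reaction that fires
        x += zeta[k]
```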

Example

Consider a model of gene transcription and translation:

  G \xrightarrow{25} G + M,        (Transcription)
  M \xrightarrow{1000} M + P,      (Translation)
  P + P \xrightarrow{0.001} D,     (Dimerization)
  M \xrightarrow{0.1} ∅,           (Degradation of mRNA)
  P \xrightarrow{1} ∅.             (Degradation of Protein)

Then, if X = [X_M, X_P, X_D]^T,

  X(t) = X(0) + Y_1(25 t) (1, 0, 0)^T
       + Y_2( 1000 \int_0^t X_M(s) ds ) (0, 1, 0)^T
       + Y_3( 0.001 \int_0^t X_P(s)(X_P(s) − 1) ds ) (0, −2, 1)^T
       + Y_4( 0.1 \int_0^t X_M(s) ds ) (−1, 0, 0)^T
       + Y_5( 1 \int_0^t X_P(s) ds ) (0, −1, 0)^T.
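
For concreteness, one possible encoding of this network as data for sketches like the one above (species ordering X = (X_M, X_P, X_D); the single gene G = 1 is folded into the constant transcription rate 25; an illustrative sketch, not the talk's code):

```python
import numpy as np

# Reaction vectors zeta_k, one row per reaction, columns (M, P, D).
zeta = np.array([[ 1,  0, 0],    # G -> G + M     (transcription)
                 [ 0,  1, 0],    # M -> M + P     (translation)
                 [ 0, -2, 1],    # P + P -> D     (dimerization)
                 [-1,  0, 0],    # M -> 0         (mRNA degradation)
                 [ 0, -1, 0]])   # P -> 0         (protein degradation)

def propensity(x):
    """Mass-action intensities lambda_k(x) for x = (X_M, X_P, X_D)."""
    xM, xP, xD = x
    return np.array([25.0, 1000.0 * xM, 0.001 * xP * (xP - 1.0), 0.1 * xM, 1.0 * xP])

# Usage with the earlier sketch:
# rng = np.random.default_rng()
# x_T = gillespie_path([0, 0, 0], zeta, propensity, 1.0, rng)
```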

Back to our problem

Recall:

Benefits:

1. Easy to implement.

2. The estimator

  μ_n = \frac{1}{n} \sum_{i=1}^n f(X_{[i]}(t))

  is unbiased.

Drawbacks:

1. The cost of O(N ε^{-2}) could be prohibitively large.

2. For our models, we often have that N is very large.

Let's try an approximate scheme.

Tau-leaping: Euler's method

Explicit tau-leaping [1], or Euler's method, was first formulated by Dan Gillespie in this setting.

Tau-leaping is essentially an Euler approximation of \int_0^t λ_k(X(s)) ds:

  Z(h) = Z(0) + \sum_k Y_k( \int_0^h λ_k(Z(s)) ds ) ζ_k

       ≈ Z(0) + \sum_k Y_k( λ_k(Z(0)) h ) ζ_k

       \stackrel{d}{=} Z(0) + \sum_k Poisson( λ_k(Z(0)) h ) ζ_k.

[1] D. T. Gillespie, J. Chem. Phys., 115, 1716–1733.
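
A minimal tau-leaping sketch in the same setting (it replaces each exponential clock by one Poisson draw per reaction per step; propensities are clipped at zero since the approximate process can leave the positive orthant — helper names are illustrative):

```python
import numpy as np

def tau_leap_path(x0, zeta, propensity, T, h, rng):
    """Explicit tau-leaping (Euler): on each step of length h the number of firings
    of reaction k is Poisson(lambda_k(Z(t_n)) * h), with intensities frozen at t_n."""
    z = np.array(x0, dtype=float)
    for _ in range(int(round(T / h))):
        lam = np.maximum(propensity(z), 0.0)   # frozen over [t_n, t_n + h); clip if negative
        counts = rng.poisson(lam * h)          # one Poisson draw per reaction channel
        z += zeta.T @ counts                   # apply all firings at once
    return z
```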

Euler's method

The path-wise representation for Z(t) generated by Euler's method is

  Z(t) = X(0) + \sum_k Y_k( \int_0^t λ_k(Z ∘ η(s)) ds ) ζ_k,

where

  η(s) = t_n   if   t_n ≤ s < t_{n+1} = t_n + h

is the step function giving the left endpoints of the time discretization.

Return to approximating E f(X(T))

Let Z_L denote an approximate process generated with a time discretization step of h_L. Let

  μ_n = \frac{1}{n} \sum_{i=1}^n f(Z_{L,[i]}(t)).

We note

  E f(X(t)) − μ_n = [ E f(X(t)) − E f(Z_L(t)) ] + E f(Z_L(t)) − μ_n.

Suppose we have an order one method:

  E f(X(t)) − E f(Z_L(t)) = O(h_L).

We need:

1. h_L = O(ε).

2. n = ε^{-2}.

Suppose a path costs O(ε^{-1}) steps. Then

  Total computational complexity = (# paths) × (cost per path) = O(ε^{-3}).

Benefits/drawbacks

Benefits:

1. Can drastically lower the computational complexity of a problem if ε^{-1} ≪ N:

   CC of exact approach = N ε^{-2},
   CC of approximate approach = ε^{-1} ε^{-2}.

Drawbacks:

1. Convergence results usually give only an order of convergence; they cannot give a precise h_L. Bias is a problem.

2. Tau-leaping has problems: what happens if you go negative?

3. We have gone away from an unbiased estimator.

Multi-level Monte Carlo and control variates

- Suppose I want

  E X ≈ \frac{1}{n} \sum_{i=1}^n X_{[i]},

  but realizations of X are expensive.

- Suppose X ≈ Z_L, and Z_L is cheap.

- Suppose X, Z_L can be generated simultaneously so that

  Var(X − Z_L)

  is small.

- Then use (a sketch follows after this list)

  E X = E[X − Z_L] + E Z_L ≈ \frac{1}{n_1} \sum_{i=1}^{n_1} (X_{[i]} − Z_{L,[i]}) + \frac{1}{n_2} \sum_{i=1}^{n_2} Z_{L,[i]}.

- Multi-level Monte Carlo (Mike Giles, Stefan Heinrich) = keep going:

  E X = E(X − Z_L) + E Z_L = E(X − Z_L) + E(Z_L − Z_{L−1}) + E Z_{L−1} = ⋯
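
A sketch of the two-level identity E X = E[X − Z_L] + E Z_L as an estimator, assuming hypothetical samplers `sample_pair(rng)` (one coupled draw of (X, Z_L), expensive, used n_1 times) and `sample_cheap(rng)` (one draw of Z_L alone, cheap, used n_2 times):

```python
import numpy as np

def two_level_estimate(sample_pair, sample_cheap, n1, n2, rng):
    """Control-variate estimator of E X = E[X - Z_L] + E Z_L: a few coupled
    samples handle the (small-variance) correction term, many cheap samples handle E Z_L."""
    diffs = np.array([x - z for x, z in (sample_pair(rng) for _ in range(n1))])
    cheap = np.array([sample_cheap(rng) for _ in range(n2)])
    return diffs.mean() + cheap.mean()
```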

Multi-level Monte Carlo: an unbiased estimator

In our setting:

  E f(X(t)) = E[f(X(t)) − f(Z_L(t))] + \sum_{ℓ=ℓ_0+1}^{L} E[f(Z_ℓ(t)) − f(Z_{ℓ−1}(t))] + E f(Z_{ℓ_0}(t)).

For appropriate choices of n_0, n_ℓ, and n_E, we define the estimators for the three terms above via

  Q_E \stackrel{def}{=} \frac{1}{n_E} \sum_{i=1}^{n_E} ( f(X_{[i]}(T)) − f(Z_{L,[i]}(T)) ),

  Q_ℓ \stackrel{def}{=} \frac{1}{n_ℓ} \sum_{i=1}^{n_ℓ} ( f(Z_{ℓ,[i]}(T)) − f(Z_{ℓ−1,[i]}(T)) ),   for ℓ ∈ {ℓ_0 + 1, ..., L},

  Q_0 \stackrel{def}{=} \frac{1}{n_0} \sum_{i=1}^{n_0} f(Z_{ℓ_0,[i]}(T)),

and note that

  Q \stackrel{def}{=} Q_E + \sum_{ℓ=ℓ_0+1}^{L} Q_ℓ + Q_0

is an unbiased estimator for E f(X(T)).

So what is the coupling and the variance of the estimator?
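
Assembled as code, the unbiased estimator Q is just a sum of independent sample means, one per level; here `sample_exact_diff`, `sample_level_diff`, and `sample_coarsest` are hypothetical samplers returning f(X(T)) − f(Z_L(T)) for a coupled (exact, finest) pair, f(Z_ℓ(T)) − f(Z_{ℓ−1}(T)) for a coupled pair at level ℓ, and f(Z_{ℓ_0}(T)), respectively (a sketch of the structure only):

```python
import numpy as np

def mlmc_estimate(sample_exact_diff, sample_level_diff, sample_coarsest,
                  nE, n_level, n0, l0, L, rng):
    """Q = Q_E + sum_{l=l0+1}^{L} Q_l + Q_0 from the slide; n_level[l] gives n_l."""
    Q_E = np.mean([sample_exact_diff(rng) for _ in range(nE)])
    Q_mid = sum(np.mean([sample_level_diff(l, rng) for _ in range(n_level[l])])
                for l in range(l0 + 1, L + 1))
    Q_0 = np.mean([sample_coarsest(rng) for _ in range(n0)])
    return Q_E + Q_mid + Q_0
```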

How do we generate processes simultaneously?

Suppose I want to generate:

- A Poisson process with intensity 13.1.
- A Poisson process with intensity 13.

We could let Y_1 and Y_2 be independent, unit-rate Poisson processes and set

  Z_{13.1}(t) = Y_1(13.1 t),
  Z_{13}(t) = Y_2(13 t).

Using this representation, the processes are independent and, hence, not coupled.

The variance of the difference is large:

  Var(Z_{13.1}(t) − Z_{13}(t)) = Var(Y_1(13.1 t)) + Var(Y_2(13 t)) = 26.1 t.

How do we generate processes simultaneously?

Suppose I want to generate:

- A Poisson process with intensity 13.1.
- A Poisson process with intensity 13.

We could instead let Y_1 and Y_2 be independent, unit-rate Poisson processes and set

  Z_{13.1}(t) = Y_1(13 t) + Y_2(0.1 t),
  Z_{13}(t) = Y_1(13 t).

The variance of the difference is much smaller:

  Var(Z_{13.1}(t) − Z_{13}(t)) = Var(Y_2(0.1 t)) = 0.1 t.
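
A quick numerical check of the two constructions at a fixed time t (at a fixed time the counts are just Poisson random variables, so this only verifies the variance numbers on the slide; the script is an illustration, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
t, n = 1.0, 200_000

# Independent construction: Var(Z_13.1(t) - Z_13(t)) = 13.1 t + 13 t = 26.1 t.
d_indep = rng.poisson(13.1 * t, n) - rng.poisson(13.0 * t, n)

# Coupled construction: Z_13.1(t) - Z_13(t) = Y_2(0.1 t), so the variance is 0.1 t.
d_coupled = rng.poisson(0.1 * t, n)

print(d_indep.var(), d_coupled.var())   # approximately 26.1 and 0.1 for t = 1
```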

How do we generate processes simultaneously?

More generally, suppose we want

1. a non-homogeneous Poisson process with intensity f(t), and

2. a non-homogeneous Poisson process with intensity g(t).

We can let Y_1, Y_2, and Y_3 be independent, unit-rate Poisson processes and define

  Z_f(t) = Y_1( \int_0^t f(s) ∧ g(s) ds ) + Y_2( \int_0^t f(s) − (f(s) ∧ g(s)) ds ),

  Z_g(t) = Y_1( \int_0^t f(s) ∧ g(s) ds ) + Y_3( \int_0^t g(s) − (f(s) ∧ g(s)) ds ),

where we are using that, for example,

  Y_1( \int_0^t f(s) ∧ g(s) ds ) + Y_2( \int_0^t f(s) − (f(s) ∧ g(s)) ds ) = Y( \int_0^t f(s) ds ),

where Y is a unit-rate Poisson process.

Back to our processes

  X(t) = X(0) + \sum_k Y_k( \int_0^t λ_k(X(s)) ds ) ζ_k,

  Z(t) = X(0) + \sum_k Y_k( \int_0^t λ_k(Z ∘ η(s)) ds ) ζ_k.

Now couple them:

  X(t) = X(0) + \sum_k Y_{k,1}( \int_0^t λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s)) ds ) ζ_k
       + \sum_k Y_{k,2}( \int_0^t λ_k(X(s)) − λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s)) ds ) ζ_k,

  Z_ℓ(t) = Z_ℓ(0) + \sum_k Y_{k,1}( \int_0^t λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s)) ds ) ζ_k
       + \sum_k Y_{k,3}( \int_0^t λ_k(Z_ℓ ∘ η_ℓ(s)) − λ_k(X(s)) ∧ λ_k(Z_ℓ ∘ η_ℓ(s)) ds ) ζ_k.

The algorithm for simulating this pair is equivalent to the next reaction method or Gillespie's algorithm.
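
One way to realize this coupling path-wise (a sketch, under the assumption that within each leaping window the tau-leap intensities stay frozen at the window's left endpoint while the exact process's intensities refresh at each of its own jumps; within a window the pair is then simulated as a joint Markov chain with the three channels min, exact-minus-min, and leap-minus-min per reaction — helper names are illustrative, not the talk's code):

```python
import numpy as np

def coupled_exact_tau_leap(x0, zeta, propensity, T, h, rng):
    """One coupled sample of (X, Z): X exact, Z explicit tau-leaping with step h,
    driven by shared randomness through the three-channel splitting on the slide."""
    X = np.array(x0, dtype=float)
    Z = np.array(x0, dtype=float)
    t = 0.0
    while t < T:
        t_next = min(t + h, T)
        lam_Z = np.maximum(propensity(Z), 0.0)   # tau-leap intensities, frozen on [t, t_next)
        s = t
        while True:
            lam_X = propensity(X)
            m = np.minimum(lam_X, lam_Z)
            rates = np.concatenate([m, lam_X - m, lam_Z - m])   # shared, X-only, Z-only
            total = rates.sum()
            if total <= 0.0:
                break
            s += rng.exponential(1.0 / total)
            if s >= t_next:
                break
            j = rng.choice(rates.size, p=rates / total)
            k = j % len(lam_X)
            if j < len(lam_X):          # shared channel: both processes jump together
                X += zeta[k]; Z += zeta[k]
            elif j < 2 * len(lam_X):    # exact-only channel
                X += zeta[k]
            else:                       # tau-leap-only channel
                Z += zeta[k]
        t = t_next
    return X, Z
```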

For approximate processes

  Z_ℓ(t) = Z_ℓ(0) + \sum_k Y_{k,1}( \int_0^t λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) ds ) ζ_k
       + \sum_k Y_{k,2}( \int_0^t λ_k(Z_ℓ ∘ η_ℓ(s)) − λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) ds ) ζ_k,

  Z_{ℓ−1}(t) = Z_{ℓ−1}(0) + \sum_k Y_{k,1}( \int_0^t λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) ds ) ζ_k
       + \sum_k Y_{k,3}( \int_0^t λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) − λ_k(Z_ℓ ∘ η_ℓ(s)) ∧ λ_k(Z_{ℓ−1} ∘ η_{ℓ−1}(s)) ds ) ζ_k.

The algorithm for simulating this pair is equivalent to τ-leaping.
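
The same three-channel idea for a pair of tau-leap processes at neighboring levels is even simpler, since on each fine step all intensities are frozen and the three channels are just independent Poisson draws (a sketch, assuming the coarse step is M fine steps and clipping negative propensities; names are illustrative):

```python
import numpy as np

def coupled_tau_leap_pair(x0, zeta, propensity, T, h_fine, M, rng):
    """One coupled sample of (Z_l, Z_{l-1}) with fine step h_fine and coarse step M * h_fine."""
    Zf = np.array(x0, dtype=float)                   # level-l (fine) process
    Zc = np.array(x0, dtype=float)                   # level-(l-1) (coarse) process
    lam_c = np.maximum(propensity(Zc), 0.0)          # coarse intensities, frozen per coarse step
    for n in range(int(round(T / h_fine))):
        if n % M == 0:
            lam_c = np.maximum(propensity(Zc), 0.0)  # refresh on the coarse grid
        lam_f = np.maximum(propensity(Zf), 0.0)      # fine intensities, frozen per fine step
        m = np.minimum(lam_f, lam_c)
        P_shared = rng.poisson(m * h_fine)            # drives both processes
        P_fine = rng.poisson((lam_f - m) * h_fine)    # drives the fine process only
        P_coarse = rng.poisson((lam_c - m) * h_fine)  # drives the coarse process only
        Zf += zeta.T @ (P_shared + P_fine)
        Zc += zeta.T @ (P_shared + P_coarse)
    return Zf, Zc
```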

Multi-level Monte Carlo: chemical kinetic setting

Can prove: [1]

Theorem (Anderson, Higham 2011). Suppose (X, Z_ℓ) satisfy the coupling. Then,

  \sup_{t ≤ T} E|X(t) − Z_ℓ(t)|^2 ≤ C_1(T) N^{-ρ} h_ℓ + C_2(T) h_ℓ^2.

Theorem (Anderson, Higham 2011). Suppose (Z_ℓ, Z_{ℓ−1}) satisfy the coupling. Then,

  \sup_{t ≤ T} E|Z_ℓ(t) − Z_{ℓ−1}(t)|^2 ≤ C_1(T) N^{-ρ} h_ℓ + C_2(T) h_ℓ^2.

[1] David F. Anderson and Desmond J. Higham, Multi-level Monte Carlo for stochastically modeled chemical kinetic systems. To appear in SIAM: Modeling and Simulation. Available at arXiv:1107.2181 and at www.math.wisc.edu/~anderson.

Multi-level Monte Carlo: an unbiased estimator

For well-chosen n_0, n_ℓ, and n_E, we have

  Var(Q) = Var( Q_E + \sum_{ℓ=ℓ_0+1}^{L} Q_ℓ + Q_0 ) = O(ε^2),

with

  Comp. cost = [ ε^{-2}(N^{-ρ} h_L + h_L^2) ] N + ε^{-2}( h_{ℓ_0}^{-1} + \ln(ε)^2 N^{-ρ} + \ln(ε^{-1}) \frac{1}{M−1} h_{ℓ_0} ).

Multi-level Monte Carlo: an unbiased estimator

Some observations:

1. The weak error plays no role in the analysis: we are free to choose h_L.

2. Common problems associated with tau-leaping, such as negativity of species numbers, do not matter. Just define the process in a sensible way.

3. The method is unbiased.

Example

Consider a model of gene transcription and translation:

  G \xrightarrow{25} G + M,
  M \xrightarrow{1000} M + P,
  P + P \xrightarrow{0.001} D,
  M \xrightarrow{0.1} ∅,
  P \xrightarrow{1} ∅.

Suppose:

1. we initialize with G = 1, M = 0, P = 0, D = 0,

2. we want to estimate the expected number of dimers at time T = 1,

3. to an accuracy of ± 1.0 with 95% confidence.

Example

Method: exact algorithm with crude Monte Carlo.

  Approximation      # paths      CPU time                       # updates
  3,714.2 ± 1.0      4,740,000    149,000 CPU s (41 hours!)      8.27 × 10^10

Method: Euler tau-leaping with crude Monte Carlo.

  Step-size    Approximation    # paths      CPU time      # updates
  h = 3^-7     3,712.3 ± 1.0    4,750,000    13,374.6 s    6.2 × 10^10
  h = 3^-6     3,707.5 ± 1.0    4,750,000    6,207.9 s     2.1 × 10^10
  h = 3^-5     3,693.4 ± 1.0    4,700,000    2,803.9 s     6.9 × 10^9
  h = 3^-4     3,654.6 ± 1.0    4,650,000    1,219.0 s     2.6 × 10^9

Method: unbiased MLMC with ℓ_0 = 2, and M and L detailed below.

  Step-size parameters    Approximation    CPU time     # updates
  M = 3, L = 6            3,713.9 ± 1.0    1,063.3 s    1.1 × 10^9
  M = 3, L = 5            3,714.7 ± 1.0    1,114.9 s    9.4 × 10^8
  M = 3, L = 4            3,714.2 ± 1.0    1,656.6 s    1.0 × 10^9
  M = 4, L = 4            3,714.2 ± 1.0    1,334.8 s    1.1 × 10^9
  M = 4, L = 5            3,713.8 ± 1.0    1,014.9 s    1.1 × 10^9

- The exact algorithm with crude Monte Carlo demanded 140 times more CPU time than our unbiased MLMC estimator!

Example

Method: exact algorithm with crude Monte Carlo.

  Approximation      # paths      CPU time                       # updates
  3,714.2 ± 1.0      4,740,000    149,000 CPU s (41 hours!)      8.27 × 10^10

Unbiased multi-level Monte Carlo with M = 3, L = 5, and ℓ_0 = 2.

  Level                     # paths      CPU time     Var. estimator    # updates
  (X, Z_{3^-5})             3,900        279.6 s      0.0658            6.8 × 10^7
  (Z_{3^-5}, Z_{3^-4})      30,000       49.0 s       0.0217            8.8 × 10^7
  (Z_{3^-4}, Z_{3^-3})      150,000      71.7 s       0.0179            1.5 × 10^8
  (Z_{3^-3}, Z_{3^-2})      510,000      112.3 s      0.0319            1.7 × 10^8
  Tau-leap with h = 3^-2    8,630,000    518.4 s      0.1192            4.7 × 10^8
  Totals                    N.A.         1,031.0 s    0.2565            9.5 × 10^8

Some conclusions about this method

1. Gillespie's algorithm is by far the most common way to compute expectations:

   1.1 Means.
   1.2 Variances.
   1.3 Probabilities.

2. The new method (MLMC) also performs this task with no bias (exact).

3. Will be at worst the same speed as Gillespie (exact algorithm + crude Monte Carlo).

4. Will commonly be many orders of magnitude faster.

5. Applicable to essentially all continuous time Markov chains:

  X(t) = X(0) + \sum_k Y_k( \int_0^t λ_k(X(s)) ds ) ζ_k.

6. Con: it is substantially harder to implement; good software is needed.

7. Makes no use of any specific structure or scaling in the problem.

Another example: Viral infection

Let

1. T = viral template.

2. G = viral genome.

3. S = viral structure.

4. V = virus.

Reactions:

  R1)  T + stuff \xrightarrow{κ_1} T + G,    κ_1 = 1
  R2)  G \xrightarrow{κ_2} T,                κ_2 = 0.025
  R3)  T + stuff \xrightarrow{κ_3} T + S,    κ_3 = 1000
  R4)  T \xrightarrow{κ_4} ∅,                κ_4 = 0.25
  R5)  S \xrightarrow{κ_5} ∅,                κ_5 = 2
  R6)  G + S \xrightarrow{κ_6} V,            κ_6 = 7.5 × 10^{-6}

- R. Srivastava, L. You, J. Summers, and J. Yin, J. Theoret. Biol., 2002.
- E. Haseltine and J. Rawlings, J. Chem. Phys., 2002.
- K. Ball, T. Kurtz, L. Popovic, and G. Rempala, Annals of Applied Probability, 2006.
- W. E, D. Liu, and E. Vanden-Eijnden, J. Comput. Phys., 2006.

Another example: Viral infection

The stochastic equations for X = (X_G, X_S, X_T, X_V) are

  X_1(t) = X_1(0) + Y_1( \int_0^t X_3(s) ds ) − Y_2( 0.025 \int_0^t X_1(s) ds ) − Y_6( 7.5 × 10^{-6} \int_0^t X_1(s) X_2(s) ds ),

  X_2(t) = X_2(0) + Y_3( 1000 \int_0^t X_3(s) ds ) − Y_5( 2 \int_0^t X_2(s) ds ) − Y_6( 7.5 × 10^{-6} \int_0^t X_1(s) X_2(s) ds ),

  X_3(t) = X_3(0) + Y_2( 0.025 \int_0^t X_1(s) ds ) − Y_4( 0.25 \int_0^t X_3(s) ds ),

  X_4(t) = X_4(0) + Y_6( 7.5 × 10^{-6} \int_0^t X_1(s) X_2(s) ds ).

Another example: Viral infection

Reactions:

  R1)  T + stuff \xrightarrow{κ_1} T + G,    κ_1 = 1
  R2)  G \xrightarrow{κ_2} T,                κ_2 = 0.025
  R3)  T + stuff \xrightarrow{κ_3} T + S,    κ_3 = 1000
  R4)  T \xrightarrow{κ_4} ∅,                κ_4 = 0.25
  R5)  S \xrightarrow{κ_5} ∅,                κ_5 = 2
  R6)  G + S \xrightarrow{κ_6} V,            κ_6 = 7.5 × 10^{-6}

If T > 0,

- reactions 3 and 5 are much faster than the others, and
- it looks like S is approximately Poisson(500 × T).

We can average out to get an approximate process Z(t).

Another example: Viral infection

The approximate process satisfies

  Z_1(t) = X_1(0) + Y_1( \int_0^t Z_3(s) ds ) − Y_2( 0.025 \int_0^t Z_1(s) ds ) − Y_6( 3.75 × 10^{-3} \int_0^t Z_1(s) Z_3(s) ds ),

  Z_3(t) = X_3(0) + Y_2( 0.025 \int_0^t Z_1(s) ds ) − Y_4( 0.25 \int_0^t Z_3(s) ds ),

  Z_4(t) = X_4(0) + Y_6( 3.75 × 10^{-3} \int_0^t Z_1(s) Z_3(s) ds ).        (1)

Now use

  E f(X(t)) = E[f(X(t)) − f(Z(t))] + E f(Z(t)).

Another example: Viral infection

  X(t) = X(0)
       + Y_{1,1}( \int_0^t min{X_3(s), Z_3(s)} ds ) ζ_1 + Y_{1,2}( \int_0^t X_3(s) − min{X_3(s), Z_3(s)} ds ) ζ_1
       + Y_{2,1}( 0.025 \int_0^t min{X_1(s), Z_1(s)} ds ) ζ_2 + Y_{2,2}( 0.025 \int_0^t X_1(s) − min{X_1(s), Z_1(s)} ds ) ζ_2
       + Y_3( 1000 \int_0^t X_3(s) ds ) ζ_3
       + Y_{4,1}( 0.25 \int_0^t min{X_3(s), Z_3(s)} ds ) ζ_4 + Y_{4,2}( 0.25 \int_0^t X_3(s) − min{X_3(s), Z_3(s)} ds ) ζ_4
       + Y_5( 2 \int_0^t X_2(s) ds ) ζ_5
       + Y_{6,1}( \int_0^t min{λ_6(X(s)), Λ_6(Z(s))} ds ) ζ_6 + Y_{6,2}( \int_0^t λ_6(X(s)) − min{λ_6(X(s)), Λ_6(Z(s))} ds ) ζ_6,

  Z(t) = Z(0)
       + Y_{1,1}( \int_0^t min{X_3(s), Z_3(s)} ds ) ζ_1 + Y_{1,3}( \int_0^t Z_3(s) − min{X_3(s), Z_3(s)} ds ) ζ_1
       + Y_{2,1}( 0.025 \int_0^t min{X_1(s), Z_1(s)} ds ) ζ_2 + Y_{2,3}( 0.025 \int_0^t Z_1(s) − min{X_1(s), Z_1(s)} ds ) ζ_2
       + Y_{4,1}( 0.25 \int_0^t min{X_3(s), Z_3(s)} ds ) ζ_4 + Y_{4,3}( 0.25 \int_0^t Z_3(s) − min{X_3(s), Z_3(s)} ds ) ζ_4
       + Y_{6,1}( \int_0^t min{λ_6(X(s)), Λ_6(Z(s))} ds ) ζ_6 + Y_{6,3}( \int_0^t Λ_6(Z(s)) − min{λ_6(X(s)), Λ_6(Z(s))} ds ) ζ_6.

Another example: Viral infection

Suppose we want

  E X_{virus}(20),

given T(0) = 10 and all other species zero.

Method: exact algorithm with crude Monte Carlo.

  Approximation    # paths    CPU time        # updates
  13.85 ± 0.07     75,000     24,800 CPU s    1.45 × 10^10

Method: E f(X(t)) = E[f(X(t)) − f(Z(t))] + E f(Z(t)).

  Approximation    CPU time         # updates
  13.91 ± 0.07     1,118.5 CPU s    2.41 × 10^8

Exact + crude Monte Carlo used:

1. 60 times more total steps.

2. 22 times more CPU time.

Mathematical Analysis

We had

  X(t) = X(0) + \sum_k Y_k( \int_0^t λ'_k(X(s)) ds ) ζ_k.

We assumed

  \sum_k λ'_k(X(·)) ≈ N ≫ 1.

There are therefore two extreme parameters floating around our models:

1. Some parameter N ≫ 1 (inherent to the model).

2. h, the step size (inherent to the approximation).

To quantify errors, we need to account for both.

Mathematical Analysis: Scaling in the style of Thomas Kurtz

For each species i, define the normalized abundance

  X^N_i(t) = N^{-α_i} X_i(t),

where α_i ≥ 0 should be selected so that X^N_i = O(1).

Rate constants, κ'_k, may also vary over several orders of magnitude. We write

  κ'_k = κ_k N^{β_k},

where the β_k are selected so that κ_k = O(1).

This eventually leads to the scaled model

  X^N(t) = X^N(0) + \sum_k Y_k( N^{γ} \int_0^t N^{β_k + α·ν_k − γ} λ_k(X^N(s)) ds ) ζ^N_k.

Results

  X^N(t) = X^N(0) + \sum_k Y_k( N^{γ} \int_0^t N^{c_k} λ_k(X^N(s)) ds ) ζ^N_k.

Let ρ_k ≥ 0 satisfy

  |ζ^N_k| ≈ N^{-ρ_k},

and set ρ = min_k {ρ_k}.

Theorem (A., Higham 2011). Suppose (X^N, Z^N_ℓ) satisfy the coupling with X^N(0) = Z^N_ℓ(0). Then,

  \sup_{t ≤ T} E|X^N(t) − Z^N_ℓ(t)|^2 ≤ C_1(T, N, γ) N^{-ρ} h_ℓ + C_2(T, N, γ) h_ℓ^2.

Theorem (A., Higham 2011). Suppose (Z^N_ℓ, Z^N_{ℓ−1}) satisfy the coupling with Z^N_ℓ(0) = Z^N_{ℓ−1}(0). Then,

  \sup_{t ≤ T} E|Z^N_ℓ(t) − Z^N_{ℓ−1}(t)|^2 ≤ C_1(T, N, γ) N^{-ρ} h_ℓ + C_2(T, N, γ) h_ℓ^2.

Flavor of Proof

Theorem (A., Higham 2011). Suppose (X^N, Z^N_ℓ) satisfy the coupling with X^N(0) = Z^N_ℓ(0). Then,

  \sup_{t ≤ T} E|X^N(t) − Z^N_ℓ(t)|^2 ≤ C_1(T, N, γ) N^{-ρ} h_ℓ + C_2(T, N, γ) h_ℓ^2.

The coupled pair is

  X^N(t) = X^N(0) + \sum_k Y_{k,1}( N^{γ} N^{c_k} \int_0^t λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ζ^N_k
         + \sum_k Y_{k,2}( N^{γ} N^{c_k} \int_0^t λ_k(X^N(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ζ^N_k,

  Z^N_ℓ(t) = Z^N_ℓ(0) + \sum_k Y_{k,1}( N^{γ} N^{c_k} \int_0^t λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ζ^N_k
         + \sum_k Y_{k,3}( N^{γ} N^{c_k} \int_0^t λ_k(Z^N_ℓ ∘ η_ℓ(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ζ^N_k.

Flavor of Proof

So,

  X^N(t) − Z^N_ℓ(t) = \sum_k [ Y_{k,2}( N^{γ} N^{c_k} \int_0^t λ_k(X^N(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds )
                             − Y_{k,3}( N^{γ} N^{c_k} \int_0^t λ_k(Z^N_ℓ ∘ η_ℓ(s)) − λ_k(X^N(s)) ∧ λ_k(Z^N_ℓ ∘ η_ℓ(s)) ds ) ] ζ^N_k.

Hence,

  X^N(t) − Z^N_ℓ(t) = M^N(t) + \sum_k N^{γ} ζ^N_k N^{c_k} \int_0^t ( λ_k(X^N(s)) − λ_k(Z^N_ℓ ∘ η_ℓ(s)) ) ds.

Now work.

Next problem: parameter sensitivities

Motivated by Jim Rawlings.

We have

  X^θ(t) = X^θ(0) + \sum_k Y_k( \int_0^t λ^θ_k(X^θ(s)) ds ) ζ_k,

and we define

  J(θ) = E f(X^θ(t)).

We want

  J'(θ) = \frac{d}{dθ} E f(X^θ(t)).

There are multiple methods. We consider finite differences:

  J'(θ) = \frac{J(θ + ε) − J(θ)}{ε} + O(ε).

Next problem: parameter sensitivities

Noting that

  J'(θ) = \frac{d}{dθ} E f(X^θ(t)) = \frac{E f(X^{θ+ε}(t)) − E f(X^θ(t))}{ε} + O(ε),

the usual finite difference estimator is

  D_R(ε) = ε^{-1} ( \frac{1}{R} \sum_{i=1}^R f(X^{θ+ε}_{[i]}(t)) − \frac{1}{R} \sum_{j=1}^R f(X^θ_{[j]}(t)) ).

If the paths are generated independently, then

  Var(D_R(ε)) = O(R^{-1} ε^{-2}).
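
As a baseline, the independent-paths version is immediate, assuming a hypothetical exact-path sampler `sample_f(theta, rng)` that returns f(X^θ(T)) (an illustrative sketch only):

```python
import numpy as np

def naive_fd_sensitivity(sample_f, theta, eps, R, rng):
    """Finite-difference estimate of d/dtheta E f(X^theta(T)) from independent
    paths at theta + eps and theta; its variance scales like O(R^{-1} eps^{-2})."""
    plus = np.array([sample_f(theta + eps, rng) for _ in range(R)])
    base = np.array([sample_f(theta, rng) for _ in range(R)])
    return (plus.mean() - base.mean()) / eps
```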

Next problem: parameter sensitivities

Couple the processes:

  X^{θ+ε}(t) = X^{θ+ε}(0) + \sum_k Y_{k,1}( \int_0^t λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds ) ζ_k
       + \sum_k Y_{k,2}( \int_0^t λ^{θ+ε}_k(X^{θ+ε}(s)) − λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds ) ζ_k,

  X^θ(t) = X^θ(0) + \sum_k Y_{k,1}( \int_0^t λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds ) ζ_k
       + \sum_k Y_{k,3}( \int_0^t λ^θ_k(X^θ(s)) − λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds ) ζ_k.

Use:

  D_R(ε) = ε^{-1} \frac{1}{R} \sum_{i=1}^R [ f(X^{θ+ε}_{[i]}(t)) − f(X^θ_{[i]}(t)) ].
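
With the coupling, the estimator averages path-wise differences directly; here `sample_coupled_diff(theta, eps, rng)` is a hypothetical routine that simulates the pair (X^{θ+ε}, X^θ) jointly through the three-channel splitting above and returns f(X^{θ+ε}(T)) − f(X^θ(T)) (a sketch of how the estimator is used, not of the coupled simulation itself):

```python
import numpy as np

def coupled_fd_sensitivity(sample_coupled_diff, theta, eps, R, rng):
    """Coupled finite-difference estimator D_R(eps); because the coupled paths stay
    close, the variance drops to O(R^{-1} eps^{-1}) (see the theorem that follows)."""
    diffs = np.array([sample_coupled_diff(theta, eps, rng) for _ in range(R)])
    return diffs.mean() / eps
```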

Next problem: parameter sensitivities

Theorem (Anderson, 2011). [1] Suppose (X^{θ+ε}, X^θ) satisfy the coupling. Then, for any T > 0 there is a C_{T,f} > 0 for which

  E[ \sup_{t ≤ T} ( f(X^{θ+ε}(t)) − f(X^θ(t)) )^2 ] ≤ C_{T,f} ε.

This lowers the variance of the estimator from

  O(R^{-1} ε^{-2})

to

  O(R^{-1} ε^{-1}),

i.e. by an order of magnitude (in ε).

[1] David F. Anderson, An efficient Finite Difference Method for Parameter Sensitivities of Continuous Time Markov Chains. Submitted. Available at arXiv:1109.2890 and at www.math.wisc.edu/~anderson.

Parameter Sensitivities

  G \xrightarrow{2} G + M,
  M \xrightarrow{10} M + P,
  M \xrightarrow{k} ∅,
  P \xrightarrow{1} ∅.

We want

  \frac{∂}{∂k} E[X^k_{protein}(30)],   k ≈ 1/4.

  Method              # paths    Approximation    # updates     CPU time
  Likelihood ratio    689,600    -312.1 ± 6.0     2.9 × 10^9    3,506.6 s
  Exact/Naive FD      246,200    -318.8 ± 6.0     2.1 × 10^9    3,282.1 s
  CRP                 26,320     -320.7 ± 6.0     2.2 × 10^8    410.0 s
  Coupled             4,780      -321.2 ± 6.0     2.1 × 10^7    35.3 s

Analysis

Theorem. Suppose (X^{θ+ε}, X^θ) satisfy the coupling. Then, for any T > 0 there is a C_{T,f} > 0 for which

  E \sup_{t ≤ T} ( f(X^{θ+ε}(t)) − f(X^θ(t)) )^2 ≤ C_{T,f} ε.

Proof: the key observation is that

  X^{θ+ε}(t) − X^θ(t) = M^{θ,ε}(t) + \int_0^t F^{θ+ε}(X^{θ+ε}(s)) − F^θ(X^θ(s)) ds.

Now work on the martingale part and the absolutely continuous part.

Thanks!

References:

1. David F. Anderson and Desmond J. Higham, Multi-level Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics, to appear in SIAM: Multiscale Modeling and Simulation. Available at arXiv:1107.2181 and at www.math.wisc.edu/~anderson.

2. David F. Anderson, Efficient Finite Difference Method for Parameter Sensitivities of Continuous time Markov Chains, submitted. Available at arXiv:1109.2890 and at www.math.wisc.edu/~anderson.

Funding: NSF-DMS-1009275.