Monte Carlo Simulations, Sampling and Markov Chain Monte Carlo

Xin-She Yang

© 2010 Xin-She Yang
This is the invited talk given at the Basque Center for Applied Mathematics (BCAM) in Spain in 2010.
How to estimate π using only a ruler and some match sticks?
Buffon’s Needle Problem
Buffon's needle problem (1733): the probability that a randomly dropped needle crosses a line is

$$p = \frac{2L}{\pi d},$$

where $L$ is the length of the needle and $d$ is the spacing between the lines (assuming $L \le d$).
Probability of Crossing a Line
Since $p \approx n/N \approx 2L/(\pi d)$, we have

$$\pi \approx \frac{2NL}{nd}.$$

Lazzarini (1901): with $L = 5d/6$, $N = 3408$ throws and $n = 1808$ crossings,

$$\pi \approx \frac{2 \times 3408}{1808} \times \frac{5}{6} \approx 3.1415929.$$

Too accurate?! Is this right? What happens when $n = 1809$?

Errors $\sim 1/\sqrt{N} \approx 2\%$.
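A minimal simulation sketch (not from the talk; the needle length $L = d/2$ and the name `buffon_pi` are illustrative choices):

```python
import numpy as np

def buffon_pi(N=1_000_000, L=0.5, d=1.0, seed=42):
    """Estimate pi by Buffon's needle: drop N needles of length L onto
    lines spaced d apart (L <= d) and count the crossings n."""
    rng = np.random.default_rng(seed)
    y = rng.uniform(0, d / 2, N)          # distance from needle centre to nearest line
    theta = rng.uniform(0, np.pi / 2, N)  # acute angle between needle and the lines
    n = np.sum(y <= (L / 2) * np.sin(theta))  # a needle crosses iff y <= (L/2) sin(theta)
    return 2 * L * N / (d * n)            # invert p = 2L/(pi d) with p ~ n/N

print(buffon_pi())  # close to 3.14159; the error shrinks like 1/sqrt(N)
```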
Monte Carlo Methods
Everyone has used Monte Carlo methods in some way ...

Measure temperatures, choose a product, ...

Taste soup, wine ...

In each case, a few random samples tell us about the whole.
Monte Carlo Integration
$$I = \int_\Omega f \, dV = V \left[\frac{1}{N} \sum_{i=1}^{N} f_i\right] + O(\epsilon),$$

$$\epsilon \sim \sqrt{\frac{\frac{1}{N}\sum_{i=1}^{N} f_i^2 - \mu^2}{N}} \sim O(1/\sqrt{N}),$$

where $V$ is the volume of the domain $\Omega$, $f_i = f(x_i)$ at uniformly random sample points $x_i \in \Omega$, and $\mu$ is the mean of the $f_i$.
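A minimal sketch (illustrative, not from the slides): the estimator above over the unit hypercube, where $V = 1$, with the $1/\sqrt{N}$ error estimated from the sample variance.

```python
import numpy as np

def mc_integrate(f, dim, N=100_000, seed=0):
    """Monte Carlo estimate of the integral of f over [0,1]^dim (volume V = 1),
    with standard-error estimate eps ~ sqrt((<f^2> - mu^2)/N)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(N, dim))
    fx = f(x)
    mean = fx.mean()                   # (1/N) sum of f_i
    eps = fx.std(ddof=1) / np.sqrt(N)  # ~ O(1/sqrt(N))
    return mean, eps

# Example: integral of exp(-(u^2 + v^2)) over the unit square
I, eps = mc_integrate(lambda x: np.exp(-np.sum(x**2, axis=1)), dim=2)
print(f"I ~ {I:.4f} +/- {eps:.4f}")
```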
Importance and Quality of the Samples
Higher dimensions – even more challenging!

$$I = \int \cdots \int f(u, v, \ldots, w)\, du\, dv \cdots dw.$$

Errors $\sim 1/\sqrt{N}$, independent of the dimension $d$.

Higher dimensional integrals: how to distribute these sampling points?

Regular grids: $E \sim O(N^{-2/d})$, which is worse than $1/\sqrt{N}$ for $d \ge 4$ dimensions (not enough!)

Strategies: importance sampling, Latin hypercube, ...

Any other ways?
Quasi-Monte Carlo Methods
In essence, the idea is to distribute (consecutive) sampling points as far apart as possible, using quasi-random or low-discrepancy numbers (not pseudo-random) ... Halton, Sobol, van der Corput, ...

For example, the van der Corput sequence expresses an integer $n$ in a prime base $b$,

$$n = \sum_{j=0}^{m} a_j(n)\, b^j, \qquad a_j \in \{0, 1, 2, \ldots, b-1\},$$

and then the digits are reversed (reflected) about the radix point:

$$\phi_b(n) = \sum_{j=0}^{m} \frac{a_j(n)}{b^{j+1}}.$$

For example (base $b = 2$), $n = 0, 1, 2, \ldots, 15 \Longrightarrow 0, \frac{1}{2}, \frac{1}{4}, \frac{3}{4}, \frac{1}{8}, \ldots, \frac{15}{16}$.

Errors $\sim O(1/N)$.
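A minimal sketch (illustrative) of the digit-reversal above:

```python
def van_der_corput(n, base=2):
    """Reverse the base-b digits of n about the radix point: phi_b(n) in [0, 1)."""
    phi, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)  # peel off the lowest digit a_j
        denom *= base
        phi += digit / denom        # place it at b^-(j+1)
    return phi

print([van_der_corput(n) for n in range(8)])
# [0.0, 0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]
```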
Pseudorandom Numbers – by Deterministic Sequences
Uniform distributions via a linear congruential generator:

$$d_i = (a\, d_{i-1} + c) \bmod m.$$

Classic IBM generator: $a = 65539$, $c = 0$, $m = 2^{31}$ (strong correlation!). In fact, the correlation coefficient is 1!

Better choice (old Matlab): $a = 7^5 = 16807$, $c = 0$, $m = 2^{31} - 1 = 2{,}147{,}483{,}647$.

If scaled by $m$, all numbers lie in $[1/m, (m-1)/m]$. New Matlab: $[\epsilon, 1-\epsilon]$, $\epsilon = 2^{-53} \approx 1.1 \times 10^{-16}$.

IEEE: a 64-bit double uses 53 bits for a signed fraction in base 2 and 11 bits for a signed exponent.
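A minimal sketch (illustrative, not from the slides) of both generators; the constants are the ones quoted above:

```python
def lcg(a, c, m, seed=1):
    """Linear congruential generator d_i = (a*d_{i-1} + c) mod m, scaled by m."""
    d = seed
    while True:
        d = (a * d + c) % m
        yield d / m

randu = lcg(65539, 0, 2**31)           # classic IBM generator (strongly correlated!)
matlab_old = lcg(16807, 0, 2**31 - 1)  # old Matlab choice
print([next(matlab_old) for _ in range(3)])
```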
Box-Muller method: from $u_1, u_2 \sim$ uniform distributions on $(0, 1)$,

$$v_1 = \sqrt{-2 \ln u_1}\, \cos(2\pi u_2), \qquad v_2 = \sqrt{-2 \ln u_1}\, \sin(2\pi u_2)$$

gives two independent standard normal variates.
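A minimal sketch of the transform (illustrative):

```python
import numpy as np

def box_muller(N, seed=0):
    """Map 2N uniform variates to 2N independent N(0,1) variates."""
    rng = np.random.default_rng(seed)
    u1 = 1.0 - rng.random(N)   # in (0, 1], so log(u1) is finite
    u2 = rng.random(N)
    r = np.sqrt(-2.0 * np.log(u1))
    return r * np.cos(2 * np.pi * u2), r * np.sin(2 * np.pi * u2)

v1, v2 = box_muller(100_000)
print(v1.mean(), v1.std())  # ~ 0 and ~ 1
```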
Problems

Difficult to calculate the inverse (of the cumulative distribution) in most cases (sometimes even impossible!).

Other methods (e.g., the rejection method) are inefficient.

So – the Markov chain Monte Carlo (MCMC) way!
Random Walk down the Markov Chains
Random walk – a drunkard's walk:

$$u_{t+1} = \mu + u_t + w_t,$$

where $w_t$ is a random variable and $\mu$ is the drift. For example, $w_t \sim N(0, \sigma^2)$ (Gaussian).
[Figure: two sample random-walk paths – a 1D walk over 500 steps (left) and a 2D walk in the plane (right).]
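A minimal sketch reproducing such a path (illustrative; zero drift and unit variance assumed):

```python
import numpy as np

def random_walk(steps=500, mu=0.0, sigma=1.0, seed=0):
    """u_{t+1} = mu + u_t + w_t with Gaussian increments w_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    w = mu + rng.normal(0.0, sigma, steps)
    return np.cumsum(w)   # partial sums give the walk's positions

print(random_walk()[:5])
```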
Markov Chains
Markov chain: the next state depends only on the current state and the transition probability:

$$P(i, j) \equiv P(V_{t+1} = S_j \mid V_0 = S_p, \ldots, V_t = S_i) = P(V_{t+1} = S_j \mid V_t = S_i).$$

Detailed balance,

$$P_{ij}\, \pi_i^* = P_{ji}\, \pi_j^*,$$

determines $\pi^*$, the stationary probability distribution.

Example: Brownian motion,

$$u_{i+1} = \mu + u_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$$
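A minimal worked example (illustrative; the two-state chain is hypothetical): iterating the transition matrix drives any initial distribution to the stationary one.

```python
import numpy as np

# Hypothetical two-state chain: P[i, j] = P(next = S_j | current = S_i)
P = np.array([[0.9, 0.1],    # state 0 -> {0, 1}
              [0.5, 0.5]])   # state 1 -> {0, 1}

pi = np.array([1.0, 0.0])    # start entirely in state 0
for _ in range(100):
    pi = pi @ P              # one transition step
print(pi)                    # stationary distribution, ~ [0.833, 0.167]
```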
Markov Chains
Example: Monopoly (board game) – the sequence of positions on the board forms a Markov chain: the next square depends only on the current square and the roll of the dice.
A Famous $Billion Markov Chain – PageRank
Google PageRank algorithm (by Page et al., 1997).

Billions of web pages: pages = states, link probability $\sim 1/t$, where $t \approx$ the expected number of clicks.
Googling as a Markov Chain
$$\mathrm{Rank}_j^{(t+1)} = \frac{1-\alpha}{N} + \alpha \sum_{p_i \in \Omega(p_j)} \frac{\mathrm{Rank}_i^{(t)}}{B(p_i)},$$

where $N$ = number of pages, $\Omega(p_j)$ is the set of pages linking to $p_j$, $B(p_i)$ is the number of outbound links of page $p_i$, and $\alpha \approx 0.85$ is a ranking factor. Initially $\mathrm{Rank}_i^{(t=0)} = 1/N$.

Let $R = (\mathrm{Rank}_1, \ldots, \mathrm{Rank}_N)^T$, with $L(p_i, p_j) = 0$ if there are no links; then

$$R = \frac{1-\alpha}{N} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} + \alpha \begin{pmatrix} L(p_1, p_1) & \cdots & L(p_1, p_j) & \cdots & L(p_1, p_N) \\ \vdots & & & & \vdots \\ L(p_i, p_1) & \cdots & L(p_i, p_j) & \cdots & L(p_i, p_N) \\ \vdots & & \ddots & & \vdots \\ L(p_N, p_1) & & \cdots & & L(p_N, p_N) \end{pmatrix} R,$$

where $\sum_{i=1}^{N} L(p_i, p_j) = 1$. This is the Google matrix (stochastic, sparse) $\Longrightarrow$ a stationary probability distribution $R$ (updated monthly).
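A minimal power-iteration sketch (illustrative; the four-page web is hypothetical, with $\alpha = 0.85$ as above):

```python
import numpy as np

# Hypothetical link matrix: L[i, j] = 1/B(p_j) if p_j links to p_i, else 0;
# each column sums to 1 (a page splits its vote over its outbound links).
L = np.array([[0.0, 0.5, 0.0, 1.0],
              [0.5, 0.0, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

alpha, N = 0.85, 4
R = np.full(N, 1.0 / N)                    # Rank^(t=0) = 1/N
for _ in range(100):                       # iterate to the fixed point
    R = (1.0 - alpha) / N + alpha * L @ R
print(R, R.sum())                          # stationary ranks; they sum to 1
```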
Markov Chain Monte Carlo
Landmarks: the Monte Carlo method (1930s, 1945, from the 1950s), e.g., the Metropolis algorithm (1953) and Metropolis-Hastings (1970).

Markov chain Monte Carlo (MCMC) methods – a class of methods.

They really took off in the 1990s, and are now applied to a wide range of areas: physics, Bayesian statistics, climate change, machine learning, finance, economics, medicine, biology, materials and engineering ...
Metropolis-Hastings
The Metropolis-Hastings algorithm:

1. Begin with any initial $\theta_0$ at time $t \leftarrow 0$ such that $p(\theta_0) > 0$.
2. Generate a candidate sample $\theta^* \sim q(\theta_t, \cdot)$ from a proposal distribution.
3. Evaluate the acceptance probability $\alpha(\theta_t, \theta^*)$ given by
$$\alpha = \min\left[\frac{p(\theta^*)\, q(\theta^*, \theta_t)}{p(\theta_t)\, q(\theta_t, \theta^*)},\; 1\right].$$
4. Generate a uniformly distributed random number $u \sim \mathrm{Unif}[0, 1]$ and accept $\theta^*$ if $\alpha \ge u$: if $\alpha \ge u$ then $\theta_{t+1} \leftarrow \theta^*$, else $\theta_{t+1} \leftarrow \theta_t$.
5. Increase the counter or time, $t \leftarrow t + 1$, and go to step 2.
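A minimal sketch of these steps (illustrative): sampling a standard normal target with a Gaussian random-walk proposal, for which $q$ is symmetric and cancels in $\alpha$.

```python
import numpy as np

def metropolis_hastings(log_p, theta0=0.0, steps=10_000, prop_sigma=1.0, seed=0):
    """Random-walk Metropolis-Hastings; with a symmetric proposal the
    acceptance ratio reduces to p(theta*)/p(theta_t)."""
    rng = np.random.default_rng(seed)
    theta, samples = theta0, []
    for _ in range(steps):
        theta_star = theta + rng.normal(0.0, prop_sigma)            # step 2: propose
        alpha = min(1.0, np.exp(log_p(theta_star) - log_p(theta)))  # step 3: acceptance prob.
        if rng.uniform() <= alpha:                                  # step 4: accept/reject
            theta = theta_star
        samples.append(theta)                                       # step 5: advance time
    return np.array(samples)

# Target: p(theta) proportional to exp(-theta^2 / 2), i.e. N(0, 1)
chain = metropolis_hastings(lambda th: -0.5 * th**2)
print(chain.mean(), chain.std())  # ~ 0 and ~ 1
```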