Monte Carlo Simulations, Sampling and Markov Chain Monte Carlo

Xin-She Yang

© 2010 Xin-She Yang
This is the invited talk given at the Basque Center for Applied Mathematics (BCAM) in Spain in 2010.
How to estimate π using only a ruler and some match sticks?
Buffon’s Needle Problem
Buffon's needle problem (1733): the probability that a randomly dropped needle crosses a line is

$$p = \frac{2L}{\pi d},$$

where $L$ is the length of the needle and $d$ is the spacing between the lines (assuming $L \le d$).
Probability of Crossing a Line
Since $p \approx n/N \approx 2L/(\pi d)$, we have

$$\pi \approx \frac{2NL}{nd}.$$

Lazzarini (1901): with $L = 5d/6$, $N = 3408$ throws and $n = 1808$ crossings,

$$\pi \approx \frac{2 \times 3408}{1808} \times \frac{5}{6} \approx 3.1415929.$$

Too accurate?! Is this right? What happens when $n = 1809$?

Errors $\sim 1/\sqrt{N} \approx 2\%$.
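A minimal simulation sketch (not from the talk; the needle length $L = d/2$ and the name `buffon_pi` are illustrative choices):

```python
import numpy as np

def buffon_pi(N=1_000_000, L=0.5, d=1.0, seed=42):
    """Estimate pi by Buffon's needle: drop N needles of length L onto
    lines spaced d apart (L <= d) and count the crossings n."""
    rng = np.random.default_rng(seed)
    y = rng.uniform(0, d / 2, N)          # distance from needle centre to nearest line
    theta = rng.uniform(0, np.pi / 2, N)  # acute angle between needle and the lines
    n = np.sum(y <= (L / 2) * np.sin(theta))  # a needle crosses iff y <= (L/2) sin(theta)
    return 2 * L * N / (d * n)            # invert p = 2L/(pi d) with p ~ n/N

print(buffon_pi())  # close to 3.14159; the error shrinks like 1/sqrt(N)
```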
Monte Carlo Methods
Everyone has used Monte Carlo methods in some way ...

Measure temperatures, choose a product, ...

Taste soup, wine ...

In each case, a few random samples tell us about the whole.
Monte Carlo Integration
$$I = \int_\Omega f \, dV = V \left[\frac{1}{N} \sum_{i=1}^{N} f_i\right] + O(\epsilon),$$

$$\epsilon \sim \sqrt{\frac{\frac{1}{N}\sum_{i=1}^{N} f_i^2 - \mu^2}{N}} \sim O(1/\sqrt{N}),$$

where $V$ is the volume of the domain $\Omega$, $f_i = f(x_i)$ at uniformly random sample points $x_i \in \Omega$, and $\mu$ is the mean of the $f_i$.
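A minimal sketch (illustrative, not from the slides): the estimator above over the unit hypercube, where $V = 1$, with the $1/\sqrt{N}$ error estimated from the sample variance.

```python
import numpy as np

def mc_integrate(f, dim, N=100_000, seed=0):
    """Monte Carlo estimate of the integral of f over [0,1]^dim (volume V = 1),
    with standard-error estimate eps ~ sqrt((<f^2> - mu^2)/N)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(N, dim))
    fx = f(x)
    mean = fx.mean()                   # (1/N) sum of f_i
    eps = fx.std(ddof=1) / np.sqrt(N)  # ~ O(1/sqrt(N))
    return mean, eps

# Example: integral of exp(-(u^2 + v^2)) over the unit square
I, eps = mc_integrate(lambda x: np.exp(-np.sum(x**2, axis=1)), dim=2)
print(f"I ~ {I:.4f} +/- {eps:.4f}")
```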
Importance and Quality of the Samples
Higher dimensions – even more challenging!

$$I = \int \cdots \int f(u, v, \ldots, w)\, du\, dv \cdots dw.$$

Errors $\sim 1/\sqrt{N}$, independent of the dimension $d$.

Higher dimensional integrals: how to distribute these sampling points?

Regular grids: $E \sim O(N^{-2/d})$, which is worse than $1/\sqrt{N}$ for $d \ge 4$ dimensions (not enough!)

Strategies: importance sampling, Latin hypercube, ...

Any other ways?
Quasi-Monte Carlo Methods
In essence, the idea is to distribute (consecutive) sampling points as far apart as possible, using quasi-random or low-discrepancy numbers (not pseudo-random) ... Halton, Sobol, van der Corput, ...

For example, the van der Corput sequence expresses an integer $n$ in a prime base $b$,

$$n = \sum_{j=0}^{m} a_j(n)\, b^j, \qquad a_j \in \{0, 1, 2, \ldots, b-1\},$$

and then the digits are reversed (reflected) about the radix point:

$$\phi_b(n) = \sum_{j=0}^{m} \frac{a_j(n)}{b^{j+1}}.$$

For example (base $b = 2$), $n = 0, 1, 2, \ldots, 15 \Longrightarrow 0, \frac{1}{2}, \frac{1}{4}, \frac{3}{4}, \frac{1}{8}, \ldots, \frac{15}{16}$.

Errors $\sim O(1/N)$.
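A minimal sketch (illustrative) of the digit-reversal above:

```python
def van_der_corput(n, base=2):
    """Reverse the base-b digits of n about the radix point: phi_b(n) in [0, 1)."""
    phi, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)  # peel off the lowest digit a_j
        denom *= base
        phi += digit / denom        # place it at b^-(j+1)
    return phi

print([van_der_corput(n) for n in range(8)])
# [0.0, 0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]
```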
Pseudorandom Numbers – by Deterministic Sequences
Uniform distributions via a linear congruential generator:

$$d_i = (a\, d_{i-1} + c) \bmod m.$$

Classic IBM generator: $a = 65539$, $c = 0$, $m = 2^{31}$ (strong correlation!). In fact, the correlation coefficient is 1!

Better choice (old Matlab): $a = 7^5 = 16807$, $c = 0$, $m = 2^{31} - 1 = 2{,}147{,}483{,}647$.

If scaled by $m$, all numbers lie in $[1/m, (m-1)/m]$. New Matlab: $[\epsilon, 1-\epsilon]$, $\epsilon = 2^{-53} \approx 1.1 \times 10^{-16}$.

IEEE: a 64-bit double uses 53 bits for a signed fraction in base 2 and 11 bits for a signed exponent.
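A minimal sketch (illustrative, not from the slides) of both generators; the constants are the ones quoted above:

```python
def lcg(a, c, m, seed=1):
    """Linear congruential generator d_i = (a*d_{i-1} + c) mod m, scaled by m."""
    d = seed
    while True:
        d = (a * d + c) % m
        yield d / m

randu = lcg(65539, 0, 2**31)           # classic IBM generator (strongly correlated!)
matlab_old = lcg(16807, 0, 2**31 - 1)  # old Matlab choice
print([next(matlab_old) for _ in range(3)])
```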
Box-Muller method: from $u_1, u_2 \sim$ uniform distributions on $(0, 1)$,

$$v_1 = \sqrt{-2 \ln u_1}\, \cos(2\pi u_2), \qquad v_2 = \sqrt{-2 \ln u_1}\, \sin(2\pi u_2)$$

gives two independent standard normal variates.
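A minimal sketch of the transform (illustrative):

```python
import numpy as np

def box_muller(N, seed=0):
    """Map 2N uniform variates to 2N independent N(0,1) variates."""
    rng = np.random.default_rng(seed)
    u1 = 1.0 - rng.random(N)   # in (0, 1], so log(u1) is finite
    u2 = rng.random(N)
    r = np.sqrt(-2.0 * np.log(u1))
    return r * np.cos(2 * np.pi * u2), r * np.sin(2 * np.pi * u2)

v1, v2 = box_muller(100_000)
print(v1.mean(), v1.std())  # ~ 0 and ~ 1
```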
Problems

Difficult to calculate the inverse (of the cumulative distribution) in most cases (sometimes even impossible!).

Other methods (e.g., the rejection method) are inefficient.

So – the Markov chain Monte Carlo (MCMC) way!
Random Walk down the Markov Chains
Random walk – a drunkard's walk:

$$u_{t+1} = \mu + u_t + w_t,$$

where $w_t$ is a random variable and $\mu$ is the drift. For example, $w_t \sim N(0, \sigma^2)$ (Gaussian).
[Figure: two sample random-walk paths – a 1D walk over 500 steps (left) and a 2D walk in the plane (right).]
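A minimal sketch reproducing such a path (illustrative; zero drift and unit variance assumed):

```python
import numpy as np

def random_walk(steps=500, mu=0.0, sigma=1.0, seed=0):
    """u_{t+1} = mu + u_t + w_t with Gaussian increments w_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    w = mu + rng.normal(0.0, sigma, steps)
    return np.cumsum(w)   # partial sums give the walk's positions

print(random_walk()[:5])
```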
Markov Chains
Markov chain: the next state depends only on the current state and the transition probability:

$$P(i, j) \equiv P(V_{t+1} = S_j \mid V_0 = S_p, \ldots, V_t = S_i) = P(V_{t+1} = S_j \mid V_t = S_i).$$

Detailed balance,

$$P_{ij}\, \pi_i^* = P_{ji}\, \pi_j^*,$$

determines $\pi^*$, the stationary probability distribution.

Example: Brownian motion,

$$u_{i+1} = \mu + u_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$$
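A minimal worked example (illustrative; the two-state chain is hypothetical): iterating the transition matrix drives any initial distribution to the stationary one.

```python
import numpy as np

# Hypothetical two-state chain: P[i, j] = P(next = S_j | current = S_i)
P = np.array([[0.9, 0.1],    # state 0 -> {0, 1}
              [0.5, 0.5]])   # state 1 -> {0, 1}

pi = np.array([1.0, 0.0])    # start entirely in state 0
for _ in range(100):
    pi = pi @ P              # one transition step
print(pi)                    # stationary distribution, ~ [0.833, 0.167]
```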
Markov Chains
Example: Monopoly (board game) – the sequence of positions on the board forms a Markov chain: the next square depends only on the current square and the roll of the dice.
A Famous $Billion Markov Chain – PageRank
Google PageRank algorithm (by Page et al., 1997).

Billions of web pages: pages = states, link probability $\sim 1/t$, where $t \approx$ the expected number of clicks.
Googling as a Markov Chain
$$\mathrm{Rank}_j^{(t+1)} = \frac{1-\alpha}{N} + \alpha \sum_{p_i \in \Omega(p_j)} \frac{\mathrm{Rank}_i^{(t)}}{B(p_i)},$$

where $N$ = number of pages, $\Omega(p_j)$ is the set of pages linking to $p_j$, $B(p_i)$ is the number of outbound links of page $p_i$, and $\alpha \approx 0.85$ is a ranking factor. Initially $\mathrm{Rank}_i^{(t=0)} = 1/N$.

Let $R = (\mathrm{Rank}_1, \ldots, \mathrm{Rank}_N)^T$, with $L(p_i, p_j) = 0$ if there are no links; then

$$R = \frac{1-\alpha}{N} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} + \alpha \begin{pmatrix} L(p_1, p_1) & \cdots & L(p_1, p_j) & \cdots & L(p_1, p_N) \\ \vdots & & & & \vdots \\ L(p_i, p_1) & \cdots & L(p_i, p_j) & \cdots & L(p_i, p_N) \\ \vdots & & \ddots & & \vdots \\ L(p_N, p_1) & & \cdots & & L(p_N, p_N) \end{pmatrix} R,$$

where $\sum_{i=1}^{N} L(p_i, p_j) = 1$. This is the Google matrix (stochastic, sparse) $\Longrightarrow$ a stationary probability distribution $R$ (updated monthly).
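A minimal power-iteration sketch (illustrative; the four-page web is hypothetical, with $\alpha = 0.85$ as above):

```python
import numpy as np

# Hypothetical link matrix: L[i, j] = 1/B(p_j) if p_j links to p_i, else 0;
# each column sums to 1 (a page splits its vote over its outbound links).
L = np.array([[0.0, 0.5, 0.0, 1.0],
              [0.5, 0.0, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

alpha, N = 0.85, 4
R = np.full(N, 1.0 / N)                    # Rank^(t=0) = 1/N
for _ in range(100):                       # iterate to the fixed point
    R = (1.0 - alpha) / N + alpha * L @ R
print(R, R.sum())                          # stationary ranks; they sum to 1
```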
Markov Chain Monte Carlo
Landmarks: the Monte Carlo method (1930s, 1945, from the 1950s), e.g., the Metropolis algorithm (1953) and Metropolis-Hastings (1970).

Markov chain Monte Carlo (MCMC) methods – a class of methods.

They really took off in the 1990s, and are now applied to a wide range of areas: physics, Bayesian statistics, climate change, machine learning, finance, economics, medicine, biology, materials and engineering ...
Metropolis-Hastings
The Metropolis-Hastings algorithm:

1. Begin with any initial $\theta_0$ at time $t \leftarrow 0$ such that $p(\theta_0) > 0$.
2. Generate a candidate sample $\theta^* \sim q(\theta_t, \cdot)$ from a proposal distribution.
3. Evaluate the acceptance probability $\alpha(\theta_t, \theta^*)$ given by
$$\alpha = \min\left[\frac{p(\theta^*)\, q(\theta^*, \theta_t)}{p(\theta_t)\, q(\theta_t, \theta^*)},\; 1\right].$$
4. Generate a uniformly distributed random number $u \sim \mathrm{Unif}[0, 1]$ and accept $\theta^*$ if $\alpha \ge u$: if $\alpha \ge u$ then $\theta_{t+1} \leftarrow \theta^*$, else $\theta_{t+1} \leftarrow \theta_t$.
5. Increase the counter or time, $t \leftarrow t + 1$, and go to step 2.
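A minimal sketch of these steps (illustrative): sampling a standard normal target with a Gaussian random-walk proposal, for which $q$ is symmetric and cancels in $\alpha$.

```python
import numpy as np

def metropolis_hastings(log_p, theta0=0.0, steps=10_000, prop_sigma=1.0, seed=0):
    """Random-walk Metropolis-Hastings; with a symmetric proposal the
    acceptance ratio reduces to p(theta*)/p(theta_t)."""
    rng = np.random.default_rng(seed)
    theta, samples = theta0, []
    for _ in range(steps):
        theta_star = theta + rng.normal(0.0, prop_sigma)            # step 2: propose
        alpha = min(1.0, np.exp(log_p(theta_star) - log_p(theta)))  # step 3: acceptance prob.
        if rng.uniform() <= alpha:                                  # step 4: accept/reject
            theta = theta_star
        samples.append(theta)                                       # step 5: advance time
    return np.array(samples)

# Target: p(theta) proportional to exp(-theta^2 / 2), i.e. N(0, 1)
chain = metropolis_hastings(lambda th: -0.5 * th**2)
print(chain.mean(), chain.std())  # ~ 0 and ~ 1
```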