Introduction to Monte Carlo Methods: Lecture Notes

Chapter 1: Introduction, Examples, and References

Qing Zhou*

Contents

1 Introduction
  1.1 Calculating Area
  1.2 Computation of Integral
  1.3 Computing Expectations
  1.4 Bayesian Inference
  1.5 Statistical Physics
2 Inverse-Transformation Method
3 Sampling from Finite Discrete Distributions
  3.1 Bernoulli Distribution
  3.2 Any Finite Discrete Distribution
4 The Acceptance-Rejection Method
5 Composition Methods
  5.1 Normal Distribution
    5.1.1 Normal random variables
    5.1.2 Multivariate Normal
  5.2 Mixture Distributions

* UCLA Department of Statistics (email: [email protected]).
This set of lecture notes, consisting of four chapters, is for an undergraduate course on Monte Carlo methods. Two main references are:
1. Jun S. Liu (2001). Monte Carlo Strategies in Scientific Computing (first edition), Springer.
2. Howard M. Taylor and Samuel Karlin (1998). An Introduction to Stochastic Modeling (third edition), Academic Press.
In particular, materials for sequential importance sampling and Markov chain Monte Carlo are mostly adapted from selected topics in Chapters 2, 3, 5, and 6 of Liu (2001), supplemented with some simpler examples. A brief introduction to Markov chains is developed based on Chapters 3 and 4 of Taylor and Karlin (1998).
1. Introduction
Goal of Monte Carlo: use computer simulation to generate random variables from a given distribution $p(x)$.
1.1. Calculating Area
[Figure: a region $D$ contained in the rectangle $[x_1, x_2] \times [y_1, y_2]$ in the $(x, y)$-plane.]

We want to compute the area of $D$ in $\mathbb{R}^2$.
Find a rectangle $A : [x_1, x_2] \times [y_1, y_2] \supset D$. Randomly generate $n$ points in $A$ and suppose $M$ of them fall in $D$. Then
\[ S_n(D) = \frac{M}{n} \cdot S(A) = \frac{M}{n}\,(x_2 - x_1)(y_2 - y_1). \]
$P(\text{a point falls in } D) = \frac{S(D)}{S(A)} =: p$. Let $M$ (a random variable) be the number of points in $D$ when $n$ points are uniformly generated in $A$. Then
\[ M \sim \mathrm{Bin}(n, p), \]
\[ E(M) = np \;\Longrightarrow\; E\left(\frac{M}{n}\right) = p, \]
\[ \mathrm{Var}(M) = np(1-p) \;\Longrightarrow\; \mathrm{Var}\left(\frac{M}{n}\right) = \frac{p(1-p)}{n} \to 0 \ \text{as } n \to \infty. \]
\[ \therefore\; \lim_{n\to\infty} E\left[\frac{M}{n} - p\right]^2 = 0 \;\Longrightarrow\; \lim_{n\to\infty} E\left[S_n(D) - S(D)\right]^2 = 0. \]
Or apply the strong law of large numbers (SLLN): writing $M = \sum_{i=1}^{n} X_i$ with $X_i \overset{iid}{\sim} \mathrm{Bern}(p)$, we get $M/n \to p$ almost surely, and hence $S_n(D) \to S(D)$ almost surely.
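To make this concrete, here is a minimal Python sketch of the hit-or-miss estimator above; taking $D$ to be the unit disk is an assumption for illustration only.

import numpy as np

rng = np.random.default_rng(0)

def mc_area(in_D, x1, x2, y1, y2, n):
    # Hit-or-miss estimator: S_n(D) = (M/n) * S(A), A = [x1,x2] x [y1,y2]
    x = rng.uniform(x1, x2, n)
    y = rng.uniform(y1, y2, n)
    M = np.sum(in_D(x, y))          # number of the n points falling in D
    return (M / n) * (x2 - x1) * (y2 - y1)

# D = unit disk (true area pi); the estimate converges as n grows
print(mc_area(lambda x, y: x**2 + y**2 <= 1, -1, 1, -1, 1, 100_000))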
2. Inverse-Transformation Method

Example: Exponential distribution $\mathrm{Exp}(\lambda)$. Algorithm:
1. Generate $U \sim \mathrm{Unif}(0,1)$; (note that $1 - U \sim \mathrm{Unif}(0,1)$ as well)
2. Let $X = -\frac{1}{\lambda}\log U \sim \mathrm{Exp}(\lambda)$.
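A quick numerical check of this recipe in Python (the rate lam = 2 is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
U = rng.uniform(size=100_000)
X = -np.log(U) / lam    # inverse transform: X ~ Exp(lam)
print(X.mean())         # approximately 1/lam = 0.5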
Example 3. Geometric Distribution Ge(p).
$P(X = k) = (1-p)p^k$, for $k = 0, 1, 2, \dots$. Here $X$ is the number of successes before the first failure in a sequence of independent $\mathrm{Bern}(p)$ experiments.
$F(x) = \sum_{k=0}^{[x]} P(X = k) = 1 - p^{[x]+1}$, where $[x]$ denotes the integer part of $x$.
$X = F^{-1}(U) = \min\{z : F(z) \ge U\}$.
\[ F(z) = 1 - p^{[z]+1} \ge U \;\Longrightarrow\; [z] \ge \frac{\log(1-U)}{\log p} - 1. \]

\[ X = \min\left\{ z : [z] \ge \frac{\log(1-U)}{\log p} - 1 \right\} = \left[ \frac{\log(1-U)}{\log p} \right]. \]

(We can ignore the case that $\log(1-U)/\log p$ is an integer. Why?)

\[ \therefore\; X = F^{-1}(U) = \left[ \frac{\log(1-U)}{\log p} \right]. \]
Algorithm:
1. Generate $U \sim \mathrm{Unif}(0,1)$;
2. Let $X = [\log U / \log p] \sim \mathrm{Ge}(p)$.
(Here $U$ replaces $1-U$ since both are $\mathrm{Unif}(0,1)$.)
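In Python the algorithm is one line (p = 0.3 is an arbitrary choice); the sample mean should be close to $E(X) = p/(1-p)$:

import numpy as np

rng = np.random.default_rng(0)
p = 0.3
U = rng.uniform(size=100_000)
X = np.floor(np.log(U) / np.log(p)).astype(int)   # X ~ Ge(p)
print(X.mean())   # approximately p/(1-p) = 3/7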
3. Sampling from Finite Discrete Distributions

3.1. Bernoulli Distribution

Algorithm to sample $X \sim \mathrm{Bern}(p)$:
1. Generate $U \sim \mathrm{Unif}(0,1)$;
2. If $U \le p$, set $X = 1$; otherwise, set $X = 0$.
Proof. $P(X = 1) = P(U \le p) = p$, $P(X = 0) = P(U > p) = 1 - p$.
3.2. Any Finite Discrete Distribution
$P(X = x_k) = p_k$, $k \in [m]$, as in (2).
Put $F_0 = 0$ and $F_k = \sum_{i=1}^{k} p_i$ for $k \in [m]$. Note $F_k = P(X \le x_k)$ and $F_m = 1$.
Algorithm:
1. Generate $U \sim \mathrm{Unif}(0,1)$;
2. If $F_{k-1} < U \le F_k$, then $X = x_k$.
Proof. $P(X = x_k) = P(U \in (F_{k-1}, F_k]) = F_k - F_{k-1} = p_k$.
[Figure: the inverse cdf for a finite discrete distribution. The step cdf jumps to $F_1, F_2, F_3 = 1$ at $x_1, x_2, x_3$; for $U \in (F_1, F_2]$, $X = F^{-1}(U) = x_2$.]
This is in fact the inverse-cdf method: let $I(x_i \le z < x_{i+1})$ be the indicator function of $\{x_i \le z < x_{i+1}\}$. Then the c.d.f. of $X$ is $F(z) = \sum_i F_i\, I(x_i \le z < x_{i+1})$.
Thus, for $U \in (F_{k-1}, F_k]$, $F(z) = \sum_i F_i\, I(x_i \le z < x_{i+1}) \ge U$ if and only if $z \ge x_k$. By Theorem 1, if $U \in (F_{k-1}, F_k]$,
\[ X = F^{-1}(U) = \min\{z : F(z) \ge U\} = \min\{z : z \ge x_k\} = x_k. \]
Example 4. Suppose the joint distribution of X and Y is given by:
X\Y     0     1
 0     0.2   0.6
 1     0.1   0.1
Then regard $x_1 = (0,0)$, $x_2 = (0,1)$, $x_3 = (1,0)$ and $x_4 = (1,1)$ and apply the same algorithm.
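A Python sketch of the inverse-cdf algorithm for finite discrete distributions, applied to Example 4 (np.searchsorted finds the smallest $k$ with $F_k \ge U$):

import numpy as np

rng = np.random.default_rng(0)

def draw_discrete(xs, ps, n):
    # Inverse-cdf method: X = x_k when F_{k-1} < U <= F_k
    F = np.cumsum(ps)                 # F_1, ..., F_m with F_m = 1
    U = rng.uniform(size=n)
    idx = np.searchsorted(F, U)       # smallest k with F_k >= U
    return [xs[i] for i in idx]

# Example 4: joint distribution of (X, Y)
xs = [(0, 0), (0, 1), (1, 0), (1, 1)]
ps = [0.2, 0.6, 0.1, 0.1]
samples = draw_discrete(xs, ps, 100_000)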
4. The Acceptance-Rejection Method
Consider a pdf $f(x)$ defined on $[a, b]$, and suppose there exists a constant $M$ such that $f(x) \le M$ for all $x \in [a, b]$.
(1) Draw $X \sim \mathrm{Unif}(a, b)$ and compute $r(X) = f(X)/M$;
(2) Draw $U \sim \mathrm{Unif}(0, 1)$. If $U \le r(X)$, accept $X$; otherwise, repeat (1) and (2).
Lemma 2. If $X$ is accepted in the above algorithm, then it follows the distribution with pdf $f(x)$.
Proof. We want to show that the conditional density $p_X(x \mid X \text{ is accepted}) = f(x)$. For a continuous random variable $X$, $p(X = x) = p_X(x)$ is understood as its probability density at $x$. Since $X \sim \mathrm{Unif}(a, b)$ and acceptance occurs with probability $f(X)/M$,
\[ P(\text{Acceptance}) = \int_a^b \frac{f(x)}{M} \cdot \frac{1}{b-a}\,dx = \frac{1}{M(b-a)}, \]
so
\[ p_X(x \mid X \text{ is accepted}) = \frac{\frac{1}{b-a} \cdot \frac{f(x)}{M}}{\frac{1}{M(b-a)}} = f(x). \]
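A direct Python sketch of steps (1)-(2); the target $f$ (a Beta(2,4) pdf on $[0,1]$) and the bound $M = 2.2$ are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(f, a, b, M, n):
    # Acceptance-rejection with a Unif(a, b) proposal and bound M >= f(x)
    out = []
    while len(out) < n:
        X = rng.uniform(a, b)
        if rng.uniform() <= f(X) / M:   # accept X with probability f(X)/M
            out.append(X)
    return np.array(out)

# Illustration: f(x) = 20 x (1-x)^3 on [0, 1]; max f = 2.109 at x = 1/4
samples = rejection_sample(lambda x: 20 * x * (1 - x)**3, 0.0, 1.0, 2.2, 10_000)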
Example: truncated normal. Suppose we want to draw from $N(0,1)$ conditioned on exceeding a constant $c$, i.e. from the pdf $f(x) = \phi(x)/(1-\Phi(c))$ for $x > c$, where $\phi$ and $\Phi$ denote the standard normal pdf and cdf. A naive approach:

(A) If $c < 0$, generate $Z \sim N(0,1)$ and accept $Z$ if $Z > c$; then $P(\text{Acceptance}) = 1 - \Phi(c) \ge 0.5$.
(B) If $c \gg 0$, $P(\text{Acceptance}) = 1 - \Phi(c) \to 0$ as $c$ grows, so the naive approach becomes extremely inefficient.
Acceptance-rejection: use a shifted $\mathrm{Exp}(\lambda)$ as the trial distribution, so
\[ g(x) = \begin{cases} \lambda e^{-\lambda(x-c)}, & x > c; \\ 0, & \text{otherwise.} \end{cases} \]
If X ′ ∼ Exp(λ), then X = X ′ + c ∼ g.
\[ M = \max_{x>c} \frac{f(x)}{g(x)} = \max_{x>c} \frac{\phi(x)}{1-\Phi(c)} \cdot \frac{e^{\lambda(x-c)}}{\lambda} = \frac{\max_{x>c} \exp\left(-\frac{x^2}{2} + \lambda x - \lambda c\right)}{\sqrt{2\pi}\,\lambda\,(1-\Phi(c))}. \]

The exponent is maximized at $x = \lambda$, which lies in $(c, \infty)$ when $\lambda > c$, so plugging in $x = \lambda$ gives
\[ M = \frac{\exp\left(\frac{\lambda^2}{2} - \lambda c\right)}{\sqrt{2\pi}\,\lambda\,(1-\Phi(c))}, \qquad \text{if } \lambda > c. \]
Choose $\lambda$ to minimize $M(\lambda)$ subject to $\lambda > c$. Setting $\frac{d}{d\lambda}\log M(\lambda) = \lambda - c - \frac{1}{\lambda} = 0$ gives
\[ \lambda^* = \frac{c + \sqrt{c^2 + 4}}{2} \;(> c). \]
\[ \therefore\; g(x) = \lambda e^{-\lambda(x-c)} \ \text{for } x > c, \ \text{where } \lambda = \frac{c + \sqrt{c^2 + 4}}{2}. \]
Under this design, P (Acceptance) = 0.76, 0.88, 0.93 for c = 0, 1, 2, respectively.
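Putting the pieces together, a minimal Python sketch of the resulting sampler; after cancellation, the acceptance ratio $f(X)/(M g(X))$ simplifies to $\exp(-(X-\lambda)^2/2)$:

import numpy as np

rng = np.random.default_rng(0)

def truncated_normal(c, n):
    # Rejection sampling for N(0,1) restricted to (c, inf) using a
    # shifted Exp(lambda*) proposal, lambda* = (c + sqrt(c^2 + 4)) / 2
    lam = (c + np.sqrt(c**2 + 4)) / 2
    out = []
    while len(out) < n:
        X = c + rng.exponential(1 / lam)                 # X ~ shifted Exp(lam)
        if rng.uniform() <= np.exp(-(X - lam)**2 / 2):   # f(X) / (M g(X))
            out.append(X)
    return np.array(out)

samples = truncated_normal(c=2.0, n=10_000)   # all samples exceed c = 2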
Remark 1. In fact, we do not need to know the normalizing constant to do rejection sampling. Suppose we want to draw from $p(x) = f(x)/Z$, where $Z = \int f\,dx < \infty$ is the normalizing constant, and $f(x)$ is given but $Z$ is unknown or cannot be computed easily. We can apply the same rejection sampling method with $f(x)$ (unnormalized). Then the distribution of an accepted sample $X$ is $p(x)$. This can be shown by modifying the proof of Theorem 3: the normalizing constant $Z$ cancels when calculating $p_X(x \mid X \text{ is accepted})$.
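As a check, here is a minimal sketch in which only the unnormalized $f$ enters; the target (proportional to $x^2(1-x)^3$ on $[0,1]$) and the bound M are assumptions for illustration, and $Z$ is never computed:

import numpy as np

rng = np.random.default_rng(0)

f = lambda x: x**2 * (1 - x)**3    # unnormalized: Z = int f dx is never used
M = 0.035                          # any bound with M >= max f (max f = 0.0346)

samples = []
while len(samples) < 10_000:
    X = rng.uniform()                  # proposal: Unif(0, 1)
    if rng.uniform() <= f(X) / M:      # uses only the unnormalized f
        samples.append(X)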
5. Composition Methods
5.1. Normal Distribution
5.1.1. Normal random variables
Suppose we want to draw a pair of iid $N(0,1)$ variables:
\[ \begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right), \]
with joint pdf
\[ f_{XY}(x, y) = \frac{1}{2\pi} \exp\left( -\frac{x^2 + y^2}{2} \right). \]
Consider polar coordinates:
\[ \begin{cases} x = r\cos\theta, \\ y = r\sin\theta. \end{cases} \]
The Jacobian of $(r, \theta) \mapsto (x, y)$ is
\[ \det\left[ \frac{\partial(x, y)}{\partial(r, \theta)} \right] = r. \]
Applying the change of variables,
\[ f_{XY}(x, y)\,dx\,dy = \frac{1}{2\pi} \exp\left(-\frac{r^2}{2}\right) r\,dr\,d\theta = \frac{1}{2\pi}\,d\theta \cdot \frac{1}{2} e^{-r^2/2}\,d(r^2) = f_{\Theta,R^2}(\theta, r^2)\,d(r^2)\,d\theta. \]
Therefore, the density of $(\Theta, R^2)$ is
\[ f_{\Theta,R^2}(\theta, r^2) = \left( \frac{1}{2\pi} \right) \cdot \left( \frac{1}{2} e^{-r^2/2} \right), \]
i.e. $\Theta \sim \mathrm{Unif}(0, 2\pi)$ and $R^2 \sim \mathrm{Exp}(1/2)$ are independent.
Now we can generate a bivariate normal vector by the following algorithm (the Box-Muller method):
1. Generate $U_1, U_2 \overset{iid}{\sim} \mathrm{Unif}(0,1)$; let $\Theta = 2\pi U_1 \sim \mathrm{Unif}(0, 2\pi)$ and $R^2 = -2\log U_2 \sim \mathrm{Exp}(1/2)$;
2. Let $X = R\cos\Theta$ and $Y = R\sin\Theta$. Then $X, Y \overset{iid}{\sim} N(0,1)$.
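In Python, a minimal sketch of this algorithm:

import numpy as np

rng = np.random.default_rng(0)

def box_muller(n):
    # Theta ~ Unif(0, 2*pi) and R^2 ~ Exp(1/2) independently;
    # then X = R cos(Theta), Y = R sin(Theta) are iid N(0, 1)
    theta = rng.uniform(0, 2 * np.pi, n)
    r = np.sqrt(-2 * np.log(rng.uniform(size=n)))   # R^2 = -2 log U ~ Exp(1/2)
    return r * np.cos(theta), r * np.sin(theta)

X, Y = box_muller(100_000)
print(X.mean(), X.std())   # approximately 0 and 1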
5.1.2. Multivariate Normal

Lemma 4. If $Z = (z_1, \dots, z_p)^\top \sim N(0, I_p)$, then $Y = \mu + AZ \sim N(\mu, \underbrace{A I_p A^\top}_{=AA^\top})$.
Cholesky decomposition: find $A$ such that $AA^\top = \Sigma$.
Theorem 5 (Cholesky decomposition). If $\Sigma$ is positive definite (and symmetric), there exists a unique lower triangular matrix $T = (t_{ij})$ ($t_{ij} = 0$ for $i < j$) with positive diagonal elements such that $\Sigma = TT^\top$.
Algorithm to sample from $X \sim N(\mu, \Sigma)$ ($p$-variate normal):
1. Generate $Z_1, \dots, Z_p \overset{iid}{\sim} N(0,1)$ and let $Z = (Z_1, \dots, Z_p)^\top$;
2. Compute the Cholesky decomposition $\Sigma = AA^\top$, with $A$ lower triangular;
3. Let $X = \mu + AZ$. Then $X \sim N(\mu, \Sigma)$.
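A Python sketch of this algorithm (the particular $\mu$ and $\Sigma$ are illustrative values, not from the notes):

import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -1.0])             # illustrative mean vector
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])         # illustrative covariance matrix

A = np.linalg.cholesky(Sigma)          # lower triangular, Sigma = A A^T
Z = rng.standard_normal((2, 100_000))  # columns are iid N(0, I_2) vectors
X = mu[:, None] + A @ Z                # each column ~ N(mu, Sigma)
print(np.cov(X))                       # approximately Sigma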
5.2. Mixture Distributions
A mixture distribution has pdf
\[ f(x) = \sum_{i=1}^{K} \theta_i f_i(x), \]
where each $f_i(x)$ is the pdf of a component distribution ($\int f_i\,dx = 1$) and the weights satisfy $\theta_i \ge 0$, $\sum_{i=1}^{K} \theta_i = 1$.
Algorithm to draw from the mixture distribution $f$:
1. Generate $Z \sim \mathrm{Discrete}(\theta_1, \theta_2, \dots, \theta_K)$; i.e., $P(Z = i) = \theta_i$ for $i = 1, \dots, K$;
2. Generate $X \sim f_Z$; i.e., $X \sim f_i$ if $Z = i$.
To verify this algorithm, we need to confirm that the pdf of $X$, $p_X(x)$, is indeed $f$. Note that the algorithm generates a pair of random variables $(Z, X)$, so the marginal distribution of $X$ is
\[ p_X(x) = \sum_i p_{X,Z}(x, i) = \sum_i P(Z = i)\, p_{X \mid Z}(x \mid i) = \sum_i \theta_i f_i(x) = f(x). \]
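As an illustration, a Python sketch of the two-step algorithm for a two-component normal mixture (the weights and component parameters are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)

# Illustrative mixture: f = 0.3 * N(-2, 1) + 0.7 * N(3, 0.5^2)
thetas = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
sds = np.array([1.0, 0.5])

n = 100_000
Z = rng.choice(len(thetas), size=n, p=thetas)   # step 1: Z ~ Discrete(theta)
X = rng.normal(means[Z], sds[Z])                # step 2: X ~ f_Z given Z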
Computation of Cholesky Decomposition
$B = (b_{ij})_{n \times n}$, $B$ symmetric and positive definite ($B > 0$), $B = TT^\top$:
\[ B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nn} \end{pmatrix}. \]
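The decomposition can be computed column by column; below is a minimal Python sketch of the standard recursion for computing $T$ (np.linalg.cholesky performs the same computation):

import numpy as np

def cholesky_lower(B):
    # Compute lower triangular T with positive diagonal s.t. B = T T^T
    n = B.shape[0]
    T = np.zeros_like(B, dtype=float)
    for j in range(n):
        # diagonal: t_jj = sqrt(b_jj - sum_{k<j} t_jk^2)
        T[j, j] = np.sqrt(B[j, j] - np.sum(T[j, :j]**2))
        for i in range(j + 1, n):
            # below diagonal: t_ij = (b_ij - sum_{k<j} t_ik t_jk) / t_jj
            T[i, j] = (B[i, j] - np.sum(T[i, :j] * T[j, :j])) / T[j, j]
    return T

B = np.array([[4.0, 2.0], [2.0, 3.0]])
print(cholesky_lower(B))   # matches np.linalg.cholesky(B)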