Page 1

Markov-Chain Monte-Carlo
Advanced Seminar “Machine Learning”

Sascha Meusel

04.02.2015

Winter Semester 2014/2015

Page 2

Motivation

What is Markov-Chain Monte-Carlo, and what is it used for?

Problems can be difficult to solve analytically, or may have no analytical solution at all.
MCMC is a class of algorithms, based on Monte Carlo sampling, for tackling such problems.

For plain Monte Carlo, the needed distributions can be difficult to sample from (e.g. non-Gaussian / non-uniform), but Markov chains can also provide more complex distributions.
A Markov chain is a kind of state machine whose transitions to other states each have a certain probability.
Starting from an initial state, compute the probability of being in each state after N transitions → a distribution over states.

Sascha Meusel Advanced Seminar “Machine Learning” WS 14/15: Markov-Chain Monte-Carlo 04.02.2015 2 / 22

Page 3

Motivation

Example: calculate the volume of a d-dimensional convex body.
Solution with MCMC: formulate a distribution over x ∈ R^d with

p(x) = { 1   if x is inside the body
       { 0   else

Draw N samples x_i from a d-dimensional bounding box BB in R^d that contains the convex body completely.
The volume of the bounding box is known (side_1 * side_2 * ... * side_d).

(|samples inside body| / N) * volume(BB) ≈ volume(body)

In this simple example no Markov chain usage is visible, but there exist more sophisticated MCMC methods that use Markov chains to solve this problem.
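The box-sampling scheme above can be sketched in Python. The unit 3-ball stands in for "any convex body" here; the membership test, box bounds, and sample count are illustrative choices, not part of the slide:

```python
import random
import math

def estimate_volume(inside, bb_low, bb_high, n_samples, rng):
    """Plain Monte Carlo volume estimate: the fraction of bounding-box
    samples landing inside the body, times the box volume."""
    box_volume = 1.0
    for lo, hi in zip(bb_low, bb_high):
        box_volume *= (hi - lo)
    hits = 0
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in zip(bb_low, bb_high)]
        if inside(x):
            hits += 1
    return hits / n_samples * box_volume

# Example body: the unit ball in d = 3, bounded by the box [-1, 1]^3.
rng = random.Random(0)
estimate = estimate_volume(lambda x: sum(v * v for v in x) <= 1.0,
                           [-1.0] * 3, [1.0] * 3, 100_000, rng)
exact = 4.0 / 3.0 * math.pi  # volume of the unit 3-ball, for comparison
```

With 100,000 samples the estimate typically lands within about 1% of the exact value 4π/3.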


Page 4

Contents

1 Motivation

2 Introduction
  Introduction to Monte-Carlo
  Introduction to Markov-Chains

3 Markov-Chain Monte-Carlo
  Metropolis-Hastings
  Rejection Sampling
  Importance Sampling
  Gibbs sampling
  Hybrid Monte Carlo
  Slice sampling

4 References


Page 5

Introduction to Monte-Carlo

Task: an expectation value is needed:

E_p(x)[f(x)] = ∫ f(x) p(x) dx

Problem: no analytical solution, or only an expensive one.
Solution: sample from p(x):

E_p(x)[f(x)] ≈ f̂ = (1/S) Σ_{s=1}^{S} f(x^(s)),   x^(s) ∼ p(x)
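A minimal sketch of this estimator, assuming the concrete (illustrative) choice p(x) = N(0, 1) and f(x) = x², so the true expectation is the variance, 1:

```python
import random

rng = random.Random(1)
S = 200_000
# Draw S samples x^(s) ~ p(x) (here a standard normal) and average f(x^(s)).
f_hat = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(S)) / S
# E[x^2] under N(0, 1) is exactly 1 (the variance of the distribution).
```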


Page 6

Introduction to Monte-Carlo

Properties:

Unbiased estimator f̂:

E_p({x^(s)})[f̂] = (1/S) Σ_{s=1}^{S} E_p(x)[f(x)] = E_p(x)[f(x)]

Variance shrinks ∝ 1/S:

var_p({x^(s)})[f̂] = (1/S²) Σ_{s=1}^{S} var_p(x)[f(x)] = (1/S) var_p(x)[f(x)]
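The 1/S variance decay can be checked empirically. The target, sample sizes, and number of repeat runs below are arbitrary illustrative choices: quadrupling S should roughly quarter the variance of f̂.

```python
import random

def mc_estimate(S, rng):
    # One Monte Carlo estimate f_hat of E[x^2] under N(0, 1) from S samples.
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(S)) / S

def empirical_var(S, runs, rng):
    # Sample variance of f_hat across independent runs.
    est = [mc_estimate(S, rng) for _ in range(runs)]
    m = sum(est) / runs
    return sum((e - m) ** 2 for e in est) / (runs - 1)

rng = random.Random(2)
v_small = empirical_var(100, 400, rng)   # S = 100
v_large = empirical_var(400, 400, rng)   # S = 400, four times the samples
ratio = v_small / v_large                # should be roughly 4
```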


Page 7

Introduction to Markov-Chains

Markov chain on a finite state space:

stochastic process x^(i) ∈ X = {x_1, ..., x_S} (a sequence of random variables)
p(x^(i) | x^(i−1), ..., x^(1)) = T(x^(i) | x^(i−1))
→ T depends only on the current state x^(i−1)

Homogeneous Markov chain:

T is invariant ∀i, with Σ_{x^(i) ∈ X} T(x^(i) | x^(i−1)) = 1 ∀i
→ a fixed transition matrix T, with p_i(x) = T p_{i−1}(x)
Given irreducibility and aperiodicity, the chain converges to an invariant distribution p(x) after enough steps: p_N(x) = T^N p_0(x)


Page 8

Introduction to Markov-Chains

T = [ 0    0    0.6
      1    0.1  0.4
      0    0.9  0   ],   initial distribution: p_0(x) = (0.5, 0.2, 0.3)^T

T_{i,j}: the probability of moving to state i given state j.
p_N(x) = T^N p_0(x); for large N this gives p_N(x) ≈ (0.2, 0.4, 0.4)^T.
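The iteration p_N = T^N p_0 for this matrix can be checked directly; the step count of 100 is an arbitrary "large N":

```python
def step(T, p):
    # One transition: p_new[i] = sum_j T[i][j] * p[j]; T is column-stochastic,
    # with T[i][j] the probability of moving to state i from state j.
    n = len(p)
    return [sum(T[i][j] * p[j] for j in range(n)) for i in range(n)]

T = [[0.0, 0.0, 0.6],
     [1.0, 0.1, 0.4],
     [0.0, 0.9, 0.0]]
p = [0.5, 0.2, 0.3]       # initial distribution p_0
for _ in range(100):       # p_N = T^N p_0
    p = step(T, p)
# p converges to the invariant distribution, about (0.22, 0.41, 0.37),
# which matches the slide's one-decimal rounding (0.2, 0.4, 0.4).
```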


Page 9

Introduction to Markov-Chains

Markov chain on a continuous state space:

∫ p(x^(i)) K(x^(i+1) | x^(i)) dx^(i) = p(x^(i+1))

Instead of T, an integral kernel K: the conditional density of x^(i+1) given x^(i).
This is a mathematical description of a Markov chain algorithm.


Page 10

Metropolis-Hastings

proposal distribution q(x* | x), with x* a sampling candidate and x the current value
target distribution p(x)
acceptance probability A(x^(i), x*) = min(1, [p(x*) q(x^(i) | x*)] / [p(x^(i)) q(x* | x^(i))])

initialize x^(0)
for i = 0 to N−1:
    sample u ∼ U[0,1]            // U is the uniform distribution
    sample x* ∼ q(x* | x^(i))
    if u < A(x^(i), x*):
        x^(i+1) = x*
    else:
        x^(i+1) = x^(i)
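A sketch of this algorithm for the bimodal target used on the next slide. Since the Gaussian proposal is symmetric, the q terms cancel in the acceptance ratio; the seed and iteration count are illustrative choices:

```python
import math
import random

def p_unnorm(x):
    # Bimodal target: p(x) proportional to 0.3 e^{-0.2 x^2} + 0.7 e^{-0.2 (x-10)^2}
    return 0.3 * math.exp(-0.2 * x * x) + 0.7 * math.exp(-0.2 * (x - 10.0) ** 2)

def metropolis_hastings(n_iter, rng):
    x = 0.0                            # initialize x^(0)
    samples = []
    for _ in range(n_iter):
        x_star = rng.gauss(x, 10.0)    # proposal q = N(x^(i), variance 100)
        u = rng.random()               # u ~ U[0, 1]
        # Symmetric proposal: A = min(1, p(x*) / p(x^(i)))
        if u < min(1.0, p_unnorm(x_star) / p_unnorm(x)):
            x = x_star
        samples.append(x)
    return samples

rng = random.Random(3)
samples = metropolis_hastings(50_000, rng)
mean = sum(samples) / len(samples)   # modes at 0 and 10 with weights 0.3 / 0.7
```

The two modes have equal widths, so the sample mean should settle near the weighted mode average 0.3·0 + 0.7·10 = 7.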


Page 11

Metropolis-Hastings

proposal distribution q(x* | x^(i)) = N(x^(i), 100)
bimodal target distribution p(x) ∝ 0.3 e^{−0.2 x²} + 0.7 e^{−0.2 (x−10)²}


Page 12

Metropolis-Hastings


Page 13

Rejection Sampling

Given: a complex distribution p(x).
Choose a distribution q(x) from which we can sample (e.g. a Gaussian).
Find a factor M such that p(x) ≤ M q(x), with M < ∞.


Page 14

Rejection Sampling

Sampling algorithm:

i := 1
while i ≤ N:
    sample x^(i) ∼ q(x)
    sample u ∼ U(0, M q(x^(i)))
    if u < p(x^(i)):
        accept x^(i) as a sample
        i++
    else:
        reject the sample

To avoid too many rejections, M q(x) should be chosen so that it bounds p(x) as tightly as possible.
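A sketch of the algorithm with an illustrative target and proposal, neither of which comes from the slide: the target is Beta(2, 2), whose maximum 1.5 fixes M, and the proposal is uniform on [0, 1]:

```python
import random

def p(x):
    # Illustrative target density: Beta(2, 2), p(x) = 6 x (1 - x) on [0, 1].
    return 6.0 * x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0

M = 1.5   # p(x) <= M q(x): max of p is 6 * 0.5 * 0.5 = 1.5, and q(x) = 1

def rejection_sample(n, rng):
    samples = []
    while len(samples) < n:
        x = rng.random()                # x ~ q(x) = U[0, 1]
        u = rng.uniform(0.0, M)         # u ~ U(0, M q(x)); q(x) = 1 here
        if u < p(x):                    # accept with probability p(x) / (M q(x))
            samples.append(x)
    return samples

rng = random.Random(4)
samples = rejection_sample(20_000, rng)
mean = sum(samples) / len(samples)      # Beta(2, 2) has mean 1/2
```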


Page 15

Importance Sampling

∫ f(x) p(x) dx = ∫ f(x) [p(x) / q(x)] q(x) dx
              ≈ (1/S) Σ_{s=1}^{S} f(x^(s)) [p(x^(s)) / q(x^(s))],   with x^(s) ∼ q(x)

p(x^(s)) / q(x^(s)) is the importance weight w^(s).

So we can simply sample from q(x) and multiply each sample by its weight w^(s) → no rejections.
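A sketch, assuming the illustrative choices p = N(0, 1) as target, q = N(0, 2²) as sampling distribution, and f(x) = x², so the true value of the integral is 1:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

rng = random.Random(5)
S = 100_000
total = 0.0
for _ in range(S):
    x = rng.gauss(0.0, 2.0)                                 # x^(s) ~ q = N(0, 2^2)
    w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)   # importance weight p/q
    total += (x * x) * w                                    # f(x) = x^2
estimate = total / S                                        # E_p[x^2] = 1
```

Note that q was chosen wider than p; a proposal much narrower than the target would give weights with very high variance.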


Page 16

Gibbs sampling

Let x be n-dimensional, and assume we can compute and sample from the full conditionals

p(x_j | x_1, ..., x_{j−1}, x_{j+1}, ..., x_n) = p(x_j | x_{−j})

Using the proposal distribution

q(x* | x^(i)) = { p(x*_j | x^(i)_{−j})   if x*_{−j} = x^(i)_{−j}
                { 0                       else

the Metropolis-Hastings acceptance probability becomes

A(x^(i), x*) = min(1, [p(x*) q(x^(i) | x*)] / [p(x^(i)) q(x* | x^(i))]) = min(1, p(x*_{−j}) / p(x^(i)_{−j})) = 1

i.e. every proposal is accepted. The sampler:

initialize x^(0)_{1:n}
for i = 0 to N−1:
    for j = 1 to n:
        x^(i+1)_j ∼ p(x_j | x^(i+1)_1, ..., x^(i+1)_{j−1}, x^(i)_{j+1}, ..., x^(i)_n)
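A sketch for a case where the full conditionals are tractable: a zero-mean bivariate Gaussian with unit variances and correlation ρ (an illustrative choice, not from the slide). Each conditional is itself Gaussian, x_1 | x_2 ∼ N(ρ x_2, 1 − ρ²) and symmetrically for x_2:

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iter, rng):
    """Gibbs sampler for a zero-mean bivariate normal with correlation rho."""
    x1, x2 = 0.0, 0.0                   # initialize x^(0)
    sd = math.sqrt(1.0 - rho * rho)     # conditional standard deviation
    samples = []
    for _ in range(n_iter):
        x1 = rng.gauss(rho * x2, sd)    # x1^(i+1) ~ p(x1 | x2^(i))
        x2 = rng.gauss(rho * x1, sd)    # x2^(i+1) ~ p(x2 | x1^(i+1))
        samples.append((x1, x2))
    return samples

rng = random.Random(6)
samples = gibbs_bivariate_normal(0.7, 50_000, rng)
m1 = sum(s[0] for s in samples) / len(samples)
m2 = sum(s[1] for s in samples) / len(samples)
corr = sum(s[0] * s[1] for s in samples) / len(samples)  # ~ rho for unit variances
```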


Page 17

Hybrid Monte Carlo

Also known as Hamiltonian Monte Carlo.
Basic idea: use the gradient of the target distribution.

Simulate a walk through the target distribution as a frictionless sphere rolling over a potential-field surface.
For this, auxiliary variables u ∈ R^{n_x} are needed to store the momentum of the sphere.
The sphere spends more time in areas of lower potential, and those areas correspond to regions of higher density in the target distribution.
Parameters: step size ρ and number of steps per iteration L.


Page 18

Hybrid Monte Carlo

initialize x^(0)
for i = 0 to N−1:
    sample v ∼ U[0,1] and u* ∼ N(0, I_{n_x})
    define x_0 = x^(i) and u_0 = u* + ρ ∆(x_0)/2
    for l = 1 to L:
        x_l = x_{l−1} + ρ u_{l−1}
        u_l = u_{l−1} + ρ_l ∆(x_l),   with ρ_l = { ρ     if l < L
                                                 { ρ/2   if l = L
    (x^(i+1), u^(i+1)) = { (x_L, u_L)    if v < A(x^(i), u*)
                         { (x^(i), u*)   else

with ∆(x) = ∂/∂x log p(x)
and A = min(1, [p(x_L) / p(x^(i))] · exp(−½ (u_L^T u_L − u*^T u*)))
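A sketch of this leapfrog scheme for a 1-D standard normal target, so that ∆(x) = −x; the values of ρ, L, the seed, and the iteration count are illustrative:

```python
import math
import random

def log_p(x):
    return -0.5 * x * x              # log N(0, 1) up to an additive constant

def grad_log_p(x):
    return -x                        # Delta(x) = d/dx log p(x)

def hmc(n_iter, rho, L, rng):
    x = 0.0                          # initialize x^(0)
    samples = []
    for _ in range(n_iter):
        u_star = rng.gauss(0.0, 1.0)                  # momentum u* ~ N(0, 1)
        x_l = x
        u_l = u_star + rho * grad_log_p(x_l) / 2.0    # initial half step
        for l in range(1, L + 1):
            x_l = x_l + rho * u_l
            rho_l = rho if l < L else rho / 2.0       # final half step
            u_l = u_l + rho_l * grad_log_p(x_l)
        # A = min(1, p(x_L)/p(x^(i)) * exp(-1/2 (u_L^2 - u*^2)))
        log_a = log_p(x_l) - log_p(x) - 0.5 * (u_l * u_l - u_star * u_star)
        if rng.random() < math.exp(min(0.0, log_a)):
            x = x_l
        samples.append(x)
    return samples

rng = random.Random(7)
samples = hmc(20_000, rho=0.3, L=10, rng=rng)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

For a quadratic potential the leapfrog integrator is stable for ρ < 2, so ρ = 0.3 gives a high acceptance rate; the sample mean and variance should approach 0 and 1.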


Page 19

Slice sampling

Idea: use an auxiliary variable u ∈ R and the extended target distribution

p*(x, u) = { 1   if 0 ≤ u ≤ p(x)
           { 0   else

with ∫ p*(x, u) du = ∫_0^{p(x)} du = p(x)

So we can sample from p*(x, u) and then simply ignore u. This extends to a target that factorizes as p(x) ∝ Π_{l=1}^{L} f_l(x), with one auxiliary variable per factor, resulting in the following sampler:

for l = 1 to L:
    sample u^(i)_l ∼ U[0, f_l(x^(i−1))]
sample x^(i) ∼ U_{A^(i)}(x),   with A^(i) = {x | f_l(x) ≥ u^(i)_l, l = 1, ..., L}
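A sketch for a 1-D Gaussian target with a single factor (L = 1), chosen illustratively because the slice {x : f(x) ≥ u} is then an interval computable in closed form:

```python
import math
import random

def f(x):
    return math.exp(-0.5 * x * x)    # unnormalized N(0, 1) density, single factor

def slice_sample(n_iter, rng):
    x = 0.0
    samples = []
    for _ in range(n_iter):
        u = rng.uniform(0.0, f(x))   # u^(i) ~ U[0, f(x^(i-1))]
        # Slice A = {x : f(x) >= u} = [-h, h] with h = sqrt(-2 ln u).
        h = math.sqrt(-2.0 * math.log(max(u, 1e-300)))
        x = rng.uniform(-h, h)       # x^(i) ~ U_A
        samples.append(x)
    return samples

rng = random.Random(8)
samples = slice_sample(50_000, rng)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)   # should be near 1
```

For general targets the slice is not available in closed form and must be bracketed numerically (e.g. by stepping out), which this sketch omits.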


Page 20

Slice sampling


Page 21

Thanks for your attention :-)


Page 22

References

Andrieu, Christophe; de Freitas, Nando; Doucet, Arnaud; Jordan, Michael I.: An Introduction to MCMC for Machine Learning. In: Machine Learning 50, Kluwer Academic Publishers, 2003, pp. 5–43.

Murray, Iain: Markov chain Monte Carlo. Tutorial at the Machine Learning Summer School, 2009.
