Page 1

Markov-Chain Monte-Carlo
Advanced Seminar “Machine Learning”

Sascha Meusel

04.02.2015

Winter Semester 2014/2015

Page 2

Motivation

What is Markov-Chain Monte-Carlo, and what is it used for?

Problems can be difficult to solve analytically, or may have no analytical solution at all.
MCMC is a class of algorithms, based on Monte Carlo sampling, for tackling such problems.

For plain Monte Carlo, the needed distributions can be difficult to sample from (e.g. non-Gaussian / non-uniform), but Markov chains can also provide more complex distributions.
A Markov chain is a kind of state machine whose transitions to other states each have a certain probability.
Starting from an initial state, compute the probability of being in each state after N transitions → a distribution over states.

Sascha Meusel Advanced Seminar “Machine Learning” WS 14/15: Markov-Chain Monte-Carlo 04.02.2015 2 / 22

Page 3

Motivation

Example: calculate the volume of a d-dimensional convex body.
Solution with MCMC: formulate a distribution over x ∈ R^d with

p(x) = { 1   if x is inside the body
       { 0   else

Draw N samples x_i from a d-dimensional bounding box BB in R^d that contains the convex body completely.
The volume of the bounding box is known (side_1 * side_2 * ... * side_d).

(|samples inside body| / N) * volume(BB) ≈ volume(body)

In this simple example no Markov chain usage is visible, but there exist more sophisticated MCMC methods that use Markov chains to solve this problem.
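The box-sampling scheme above can be sketched in Python. The unit 3-ball stands in for "any convex body" here; the membership test, box bounds, and sample count are illustrative choices, not part of the slide:

```python
import random
import math

def estimate_volume(inside, bb_low, bb_high, n_samples, rng):
    """Plain Monte Carlo volume estimate: the fraction of bounding-box
    samples landing inside the body, times the box volume."""
    box_volume = 1.0
    for lo, hi in zip(bb_low, bb_high):
        box_volume *= (hi - lo)
    hits = 0
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in zip(bb_low, bb_high)]
        if inside(x):
            hits += 1
    return hits / n_samples * box_volume

# Example body: the unit ball in d = 3, bounded by the box [-1, 1]^3.
rng = random.Random(0)
estimate = estimate_volume(lambda x: sum(v * v for v in x) <= 1.0,
                           [-1.0] * 3, [1.0] * 3, 100_000, rng)
exact = 4.0 / 3.0 * math.pi  # volume of the unit 3-ball, for comparison
```

With 100,000 samples the estimate typically lands within about 1% of the exact value 4π/3.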


Page 4

Contents

1 Motivation

2 Introduction
  Introduction to Monte-Carlo
  Introduction to Markov-Chains

3 Markov-Chain Monte-Carlo
  Metropolis-Hastings
  Rejection Sampling
  Importance Sampling
  Gibbs sampling
  Hybrid Monte Carlo
  Slice sampling

4 References


Page 5

Introduction to Monte-Carlo

Task: an expectation value is needed:

E_p(x)[f(x)] = ∫ f(x) p(x) dx

Problem: no analytical solution, or only an expensive one.
Solution: sample from p(x):

E_p(x)[f(x)] ≈ f̂ = (1/S) Σ_{s=1}^{S} f(x^(s)),   x^(s) ∼ p(x)
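A minimal sketch of this estimator, assuming the concrete (illustrative) choice p(x) = N(0, 1) and f(x) = x², so the true expectation is the variance, 1:

```python
import random

rng = random.Random(1)
S = 200_000
# Draw S samples x^(s) ~ p(x) (here a standard normal) and average f(x^(s)).
f_hat = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(S)) / S
# E[x^2] under N(0, 1) is exactly 1 (the variance of the distribution).
```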


Page 6

Introduction to Monte-Carlo

Properties:

Unbiased estimator f̂:

E_p({x^(s)})[f̂] = (1/S) Σ_{s=1}^{S} E_p(x)[f(x)] = E_p(x)[f(x)]

Variance shrinks ∝ 1/S:

var_p({x^(s)})[f̂] = (1/S²) Σ_{s=1}^{S} var_p(x)[f(x)] = (1/S) var_p(x)[f(x)]
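The 1/S variance decay can be checked empirically. The target, sample sizes, and number of repeat runs below are arbitrary illustrative choices: quadrupling S should roughly quarter the variance of f̂.

```python
import random

def mc_estimate(S, rng):
    # One Monte Carlo estimate f_hat of E[x^2] under N(0, 1) from S samples.
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(S)) / S

def empirical_var(S, runs, rng):
    # Sample variance of f_hat across independent runs.
    est = [mc_estimate(S, rng) for _ in range(runs)]
    m = sum(est) / runs
    return sum((e - m) ** 2 for e in est) / (runs - 1)

rng = random.Random(2)
v_small = empirical_var(100, 400, rng)   # S = 100
v_large = empirical_var(400, 400, rng)   # S = 400, four times the samples
ratio = v_small / v_large                # should be roughly 4
```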


Page 7

Introduction to Markov-Chains

Markov chain on a finite state space:

stochastic process x^(i) ∈ X = {x_1, ..., x_S} (a sequence of random variables)
p(x^(i) | x^(i−1), ..., x^(1)) = T(x^(i) | x^(i−1))
→ T depends only on the current state x^(i−1)

Homogeneous Markov chain:

T is invariant ∀i, with Σ_{x^(i) ∈ X} T(x^(i) | x^(i−1)) = 1 ∀i
→ a fixed transition matrix T, with p_i(x) = T p_{i−1}(x)
Given irreducibility and aperiodicity, the chain converges to an invariant distribution p(x) after enough steps: p_N(x) = T^N p_0(x)


Page 8

Introduction to Markov-Chains

T = [ 0    0    0.6
      1    0.1  0.4
      0    0.9  0   ],   initial distribution: p_0(x) = (0.5, 0.2, 0.3)^T

T_{i,j}: the probability of moving to state i given state j.
p_N(x) = T^N p_0(x); for large N this gives p_N(x) ≈ (0.2, 0.4, 0.4)^T.
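The iteration p_N = T^N p_0 for this matrix can be checked directly; the step count of 100 is an arbitrary "large N":

```python
def step(T, p):
    # One transition: p_new[i] = sum_j T[i][j] * p[j]; T is column-stochastic,
    # with T[i][j] the probability of moving to state i from state j.
    n = len(p)
    return [sum(T[i][j] * p[j] for j in range(n)) for i in range(n)]

T = [[0.0, 0.0, 0.6],
     [1.0, 0.1, 0.4],
     [0.0, 0.9, 0.0]]
p = [0.5, 0.2, 0.3]       # initial distribution p_0
for _ in range(100):       # p_N = T^N p_0
    p = step(T, p)
# p converges to the invariant distribution, about (0.22, 0.41, 0.37),
# which matches the slide's one-decimal rounding (0.2, 0.4, 0.4).
```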


Page 9

Introduction to Markov-Chains

Markov chain on a continuous state space:

∫ p(x^(i)) K(x^(i+1) | x^(i)) dx^(i) = p(x^(i+1))

Instead of T, an integral kernel K: the conditional density of x^(i+1) given x^(i).
This is a mathematical description of a Markov chain algorithm.


Page 10

Metropolis-Hastings

proposal distribution q(x* | x), with x* a sampling candidate and x the current value
target distribution p(x)
acceptance probability A(x^(i), x*) = min(1, [p(x*) q(x^(i) | x*)] / [p(x^(i)) q(x* | x^(i))])

initialize x^(0)
for i = 0 to N−1:
    sample u ∼ U[0,1]            // U is the uniform distribution
    sample x* ∼ q(x* | x^(i))
    if u < A(x^(i), x*):
        x^(i+1) = x*
    else:
        x^(i+1) = x^(i)
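A sketch of this algorithm for the bimodal target used on the next slide. Since the Gaussian proposal is symmetric, the q terms cancel in the acceptance ratio; the seed and iteration count are illustrative choices:

```python
import math
import random

def p_unnorm(x):
    # Bimodal target: p(x) proportional to 0.3 e^{-0.2 x^2} + 0.7 e^{-0.2 (x-10)^2}
    return 0.3 * math.exp(-0.2 * x * x) + 0.7 * math.exp(-0.2 * (x - 10.0) ** 2)

def metropolis_hastings(n_iter, rng):
    x = 0.0                            # initialize x^(0)
    samples = []
    for _ in range(n_iter):
        x_star = rng.gauss(x, 10.0)    # proposal q = N(x^(i), variance 100)
        u = rng.random()               # u ~ U[0, 1]
        # Symmetric proposal: A = min(1, p(x*) / p(x^(i)))
        if u < min(1.0, p_unnorm(x_star) / p_unnorm(x)):
            x = x_star
        samples.append(x)
    return samples

rng = random.Random(3)
samples = metropolis_hastings(50_000, rng)
mean = sum(samples) / len(samples)   # modes at 0 and 10 with weights 0.3 / 0.7
```

The two modes have equal widths, so the sample mean should settle near the weighted mode average 0.3·0 + 0.7·10 = 7.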


Page 11

Metropolis-Hastings

proposal distribution q(x* | x^(i)) = N(x^(i), 100)
bimodal target distribution p(x) ∝ 0.3 e^{−0.2 x²} + 0.7 e^{−0.2 (x−10)²}


Page 12

Metropolis-Hastings


Page 13

Rejection Sampling

Given: a complex distribution p(x).
Choose a distribution q(x) from which we can sample (e.g. a Gaussian).
Find a factor M such that p(x) ≤ M q(x), with M < ∞.


Page 14

Rejection Sampling

Sampling algorithm:

i := 1
while i ≤ N:
    sample x^(i) ∼ q(x)
    sample u ∼ U(0, M q(x^(i)))
    if u < p(x^(i)):
        accept x^(i) as a sample
        i++
    else:
        reject the sample

To avoid too many rejections, M q(x) should be chosen so that it bounds p(x) as tightly as possible.
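A sketch of the algorithm with an illustrative target and proposal, neither of which comes from the slide: the target is Beta(2, 2), whose maximum 1.5 fixes M, and the proposal is uniform on [0, 1]:

```python
import random

def p(x):
    # Illustrative target density: Beta(2, 2), p(x) = 6 x (1 - x) on [0, 1].
    return 6.0 * x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0

M = 1.5   # p(x) <= M q(x): max of p is 6 * 0.5 * 0.5 = 1.5, and q(x) = 1

def rejection_sample(n, rng):
    samples = []
    while len(samples) < n:
        x = rng.random()                # x ~ q(x) = U[0, 1]
        u = rng.uniform(0.0, M)         # u ~ U(0, M q(x)); q(x) = 1 here
        if u < p(x):                    # accept with probability p(x) / (M q(x))
            samples.append(x)
    return samples

rng = random.Random(4)
samples = rejection_sample(20_000, rng)
mean = sum(samples) / len(samples)      # Beta(2, 2) has mean 1/2
```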


Page 15

Importance Sampling

∫ f(x) p(x) dx = ∫ f(x) [p(x) / q(x)] q(x) dx
              ≈ (1/S) Σ_{s=1}^{S} f(x^(s)) [p(x^(s)) / q(x^(s))],   with x^(s) ∼ q(x)

p(x^(s)) / q(x^(s)) is the importance weight w^(s).

So we can simply sample from q(x) and multiply each sample by its weight w^(s) → no rejections.
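A sketch, assuming the illustrative choices p = N(0, 1) as target, q = N(0, 2²) as sampling distribution, and f(x) = x², so the true value of the integral is 1:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

rng = random.Random(5)
S = 100_000
total = 0.0
for _ in range(S):
    x = rng.gauss(0.0, 2.0)                                 # x^(s) ~ q = N(0, 2^2)
    w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)   # importance weight p/q
    total += (x * x) * w                                    # f(x) = x^2
estimate = total / S                                        # E_p[x^2] = 1
```

Note that q was chosen wider than p; a proposal much narrower than the target would give weights with very high variance.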


Page 16

Gibbs sampling

Let x be n-dimensional, and assume we can compute and sample from the full conditionals

p(x_j | x_1, ..., x_{j−1}, x_{j+1}, ..., x_n) = p(x_j | x_{−j})

Using the proposal distribution

q(x* | x^(i)) = { p(x*_j | x^(i)_{−j})   if x*_{−j} = x^(i)_{−j}
                { 0                       else

the Metropolis-Hastings acceptance probability becomes

A(x^(i), x*) = min(1, [p(x*) q(x^(i) | x*)] / [p(x^(i)) q(x* | x^(i))]) = min(1, p(x*_{−j}) / p(x^(i)_{−j})) = 1

i.e. every proposal is accepted. The sampler:

initialize x^(0)_{1:n}
for i = 0 to N−1:
    for j = 1 to n:
        x^(i+1)_j ∼ p(x_j | x^(i+1)_1, ..., x^(i+1)_{j−1}, x^(i)_{j+1}, ..., x^(i)_n)
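A sketch for a case where the full conditionals are tractable: a zero-mean bivariate Gaussian with unit variances and correlation ρ (an illustrative choice, not from the slide). Each conditional is itself Gaussian, x_1 | x_2 ∼ N(ρ x_2, 1 − ρ²) and symmetrically for x_2:

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iter, rng):
    """Gibbs sampler for a zero-mean bivariate normal with correlation rho."""
    x1, x2 = 0.0, 0.0                   # initialize x^(0)
    sd = math.sqrt(1.0 - rho * rho)     # conditional standard deviation
    samples = []
    for _ in range(n_iter):
        x1 = rng.gauss(rho * x2, sd)    # x1^(i+1) ~ p(x1 | x2^(i))
        x2 = rng.gauss(rho * x1, sd)    # x2^(i+1) ~ p(x2 | x1^(i+1))
        samples.append((x1, x2))
    return samples

rng = random.Random(6)
samples = gibbs_bivariate_normal(0.7, 50_000, rng)
m1 = sum(s[0] for s in samples) / len(samples)
m2 = sum(s[1] for s in samples) / len(samples)
corr = sum(s[0] * s[1] for s in samples) / len(samples)  # ~ rho for unit variances
```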


Page 17

Hybrid Monte Carlo

Also known as Hamiltonian Monte Carlo.
Basic idea: use the gradient of the target distribution.

Simulate a walk through the target distribution as a frictionless sphere rolling over a potential-field surface.
For this, auxiliary variables u ∈ R^{n_x} are needed to store the momentum of the sphere.
The sphere spends more time in areas of lower potential, and those areas correspond to regions of higher density in the target distribution.
Parameters: step size ρ and number of steps per iteration L.


Page 18

Hybrid Monte Carlo

initialize x^(0)
for i = 0 to N−1:
    sample v ∼ U[0,1] and u* ∼ N(0, I_{n_x})
    define x_0 = x^(i) and u_0 = u* + ρ ∆(x_0)/2
    for l = 1 to L:
        x_l = x_{l−1} + ρ u_{l−1}
        u_l = u_{l−1} + ρ_l ∆(x_l),   with ρ_l = { ρ     if l < L
                                                 { ρ/2   if l = L
    (x^(i+1), u^(i+1)) = { (x_L, u_L)    if v < A(x^(i), u*)
                         { (x^(i), u*)   else

with ∆(x) = ∂/∂x log p(x)
and A = min(1, [p(x_L) / p(x^(i))] · exp(−½ (u_L^T u_L − u*^T u*)))
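A sketch of this leapfrog scheme for a 1-D standard normal target, so that ∆(x) = −x; the values of ρ, L, the seed, and the iteration count are illustrative:

```python
import math
import random

def log_p(x):
    return -0.5 * x * x              # log N(0, 1) up to an additive constant

def grad_log_p(x):
    return -x                        # Delta(x) = d/dx log p(x)

def hmc(n_iter, rho, L, rng):
    x = 0.0                          # initialize x^(0)
    samples = []
    for _ in range(n_iter):
        u_star = rng.gauss(0.0, 1.0)                  # momentum u* ~ N(0, 1)
        x_l = x
        u_l = u_star + rho * grad_log_p(x_l) / 2.0    # initial half step
        for l in range(1, L + 1):
            x_l = x_l + rho * u_l
            rho_l = rho if l < L else rho / 2.0       # final half step
            u_l = u_l + rho_l * grad_log_p(x_l)
        # A = min(1, p(x_L)/p(x^(i)) * exp(-1/2 (u_L^2 - u*^2)))
        log_a = log_p(x_l) - log_p(x) - 0.5 * (u_l * u_l - u_star * u_star)
        if rng.random() < math.exp(min(0.0, log_a)):
            x = x_l
        samples.append(x)
    return samples

rng = random.Random(7)
samples = hmc(20_000, rho=0.3, L=10, rng=rng)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

For a quadratic potential the leapfrog integrator is stable for ρ < 2, so ρ = 0.3 gives a high acceptance rate; the sample mean and variance should approach 0 and 1.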


Page 19

Slice sampling

Idea: use an auxiliary variable u ∈ R and the extended target distribution

p*(x, u) = { 1   if 0 ≤ u ≤ p(x)
           { 0   else

with ∫ p*(x, u) du = ∫_0^{p(x)} du = p(x)

So we can sample from p*(x, u) and then simply ignore u. This extends to a target that factorizes as p(x) ∝ Π_{l=1}^{L} f_l(x), with one auxiliary variable per factor, resulting in the following sampler:

for l = 1 to L:
    sample u^(i)_l ∼ U[0, f_l(x^(i−1))]
sample x^(i) ∼ U_{A^(i)}(x),   with A^(i) = {x | f_l(x) ≥ u^(i)_l, l = 1, ..., L}
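A sketch for a 1-D Gaussian target with a single factor (L = 1), chosen illustratively because the slice {x : f(x) ≥ u} is then an interval computable in closed form:

```python
import math
import random

def f(x):
    return math.exp(-0.5 * x * x)    # unnormalized N(0, 1) density, single factor

def slice_sample(n_iter, rng):
    x = 0.0
    samples = []
    for _ in range(n_iter):
        u = rng.uniform(0.0, f(x))   # u^(i) ~ U[0, f(x^(i-1))]
        # Slice A = {x : f(x) >= u} = [-h, h] with h = sqrt(-2 ln u).
        h = math.sqrt(-2.0 * math.log(max(u, 1e-300)))
        x = rng.uniform(-h, h)       # x^(i) ~ U_A
        samples.append(x)
    return samples

rng = random.Random(8)
samples = slice_sample(50_000, rng)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)   # should be near 1
```

For general targets the slice is not available in closed form and must be bracketed numerically (e.g. by stepping out), which this sketch omits.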


Page 20

Slice sampling


Page 21

Thanks for your attention :-)


Page 22

References

Andrieu, Christophe; de Freitas, Nando; Doucet, Arnaud; Jordan, Michael I.: An Introduction to MCMC for Machine Learning. In: Machine Learning 50, Kluwer Academic Publishers, 2003, pp. 5–43.

Murray, Iain: Markov chain Monte Carlo. Tutorial at the Machine Learning Summer School, 2009.
