Hybrid Monte Carlo: Geometric Integration and Statistics · Hybrid Monte Carlo: Geometric Integration and Statistics Andrew Stuart1 1Mathematics Institute and Centre for Scientiﬁc


Hybrid Monte Carlo: Geometric Integration and Statistics

Andrew Stuart

Mathematics Institute and Centre for Scientific Computing

University of Warwick

SCMS 2010, Heriot-Watt, September 6th 2010

Funded by EPSRC, ONR


“Stable Periodic Bifurcations of an Explicit Discretization of a Nonlinear Partial Differential Equation in Reaction Diffusion”. D.F. Griffiths and A.R. Mitchell. IMA J Numerical Analysis 8(1988), 435-454.


Outline

1 INVARIANT MEASURES AND DYNAMICAL SYSTEMS

2 MARKOV CHAIN MONTE CARLO

3 EXPLICIT DISCRETIZATIONS

4 IMPLICIT DISCRETIZATIONS

5 CONCLUSIONS


1 INVARIANT MEASURES AND DYNAMICAL SYSTEMS


Goals of Work

To find numerical methods to sample a probability density function (pdf) $\pi : \mathbb{R}^n \to \mathbb{R}^+$.

To analyze and develop methods in the case of high dimensions, $n \gg 1$.

Basic building blocks are $\pi$-invariant dynamical systems and MCMC methodology.


Langevin Stochastic Dynamics

Let $A$ be a positive-definite symmetric matrix. The Langevin SDE is

$$\dot{x} = A\nabla\log\pi(x) + \sqrt{2A}\,\dot{W}.$$

This equation is $\pi$-invariant: if $x(0) \sim \pi$ then $x(t) \sim \pi$ for all $t > 0$.
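One way to check the invariance claim is through the Fokker-Planck equation for the density of $x(t)$. The sketch below assumes enough smoothness and decay for that equation to hold; the symbol $\rho(x,t)$ for the density is introduced here and does not appear in the slides.

```latex
\begin{aligned}
\partial_t \rho &= \nabla\cdot\bigl(-\rho\,A\nabla\log\pi + A\nabla\rho\bigr), \\
\rho = \pi:\qquad -\pi\,A\nabla\log\pi + A\nabla\pi
  &= -A\nabla\pi + A\nabla\pi = 0 .
\end{aligned}
```

The probability flux vanishes identically at $\rho = \pi$, so $\partial_t\rho = 0$ and $\pi$ is stationary for any positive-definite symmetric $A$.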


Hybrid Monte Carlo

(Duane et al 1987)

Let $A$ be a positive-definite symmetric matrix. Define the Hamiltonian

$$H(x,p) = \tfrac{1}{2}\langle p, Ap\rangle - \log\pi(x).$$

Hamilton's equations are

$$\dot{x} = Ap, \qquad \dot{p} = \nabla\bigl(\log\pi(x)\bigr).$$

Assume that $p(0) \sim \mathcal{N}(0, A^{-1})$. This equation is $\pi$-invariant: if $x(0) \sim \pi$, then $x(t) \sim \pi$ for all $t > 0$.
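A short sketch of why the invariance holds, using only the Hamiltonian and the equations of motion stated above (the appeal to Liouville's theorem is a standard fact added here, not taken from the slides):

```latex
\begin{aligned}
e^{-H(x,p)} &= \pi(x)\,\exp\!\bigl(-\tfrac{1}{2}\langle p, Ap\rangle\bigr)
  \;\propto\; \pi(x)\,\mathcal{N}(p;\,0,\,A^{-1}), \\
\frac{d}{dt}H\bigl(x(t),p(t)\bigr)
  &= \langle \nabla_x H, \dot{x}\rangle + \langle \nabla_p H, \dot{p}\rangle
   = \langle -\nabla\log\pi(x),\, Ap\rangle + \langle Ap,\, \nabla\log\pi(x)\rangle = 0 .
\end{aligned}
```

Since the Hamiltonian flow conserves $H$ and preserves volume in $(x,p)$, the joint density $e^{-H}$ is invariant, and its $x$-marginal is $\pi$.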


2 MARKOV CHAIN MONTE CARLO


Metropolis-Hastings Algorithm

Enforcing $\pi$-invariant dynamics via accept-reject:

1. Set $k = 0$ and choose $x^{(0)} \in \mathbb{R}^n$.

2. Propose $y = G(x^{(k)}, \xi^{(k)}, \Delta t)$, $\xi^{(k)} \sim \mathcal{N}(0, I)$.

3. Set $x^{(k+1)} = y$ with probability $\alpha$; else $x^{(k+1)} = x^{(k)}$.

4. Set $k \to k + 1$ and go to 2.

Step 2 is the proposal: how do we choose it?
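The four steps can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's code: the slides do not spell out the acceptance probability, so the sketch assumes the usual Metropolis-Hastings ratio, and the function names, the Gaussian target, and the random-walk proposal are all hypothetical choices made for the example.

```python
import math
import random


def metropolis_hastings(log_pi, propose, log_q, x0, n_steps, rng):
    """Steps 1-4 above for a user-supplied proposal y = G(x, xi, dt).

    log_pi(x)       : log of the target density, up to an additive constant
    propose(x, rng) : draws the proposal y from the current state x
    log_q(x, y)     : log density of proposing y from x, up to a constant
    """
    x = list(x0)                                   # step 1
    chain, n_accepted = [x], 0
    for _ in range(n_steps):
        y = propose(x, rng)                        # step 2
        # Assumption: alpha is the usual Metropolis-Hastings ratio.
        log_alpha = min(0.0, log_pi(y) + log_q(y, x) - log_pi(x) - log_q(x, y))
        if rng.random() < math.exp(log_alpha):     # step 3
            x, n_accepted = y, n_accepted + 1
        chain.append(x)                            # step 4
    return chain, n_accepted / n_steps


# Illustrative target and proposal (both hypothetical choices).
def log_standard_gaussian(x):
    return -0.5 * sum(xi * xi for xi in x)


def random_walk(dt):
    def propose(x, rng):
        return [xi + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0) for xi in x]

    def log_q(x, y):
        return 0.0                                 # symmetric proposal: q cancels in alpha

    return propose, log_q


propose, log_q = random_walk(dt=0.1)
chain, rate = metropolis_hastings(
    log_standard_gaussian, propose, log_q, [0.0] * 5, 1000, random.Random(0)
)
```

Here `rate` is the empirical acceptance rate, the quantity whose behaviour as $n$ grows decides how $\Delta t$ must be chosen.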


Choice of Parameters

We choose $G$ to be a time-discretization of one of the invariant dynamical systems.

We want the largest $\Delta t$ compatible with $O(1)$ average acceptance probability for $n \gg 1$.

Choose $\Delta t = n^{-\gamma}$ (Courant condition), with

$$\gamma_0 = \min_{\gamma \ge 0}\Bigl\{\gamma : \liminf_{n\to\infty} \mathbb{E}\,\alpha > 0\Bigr\}.$$

The number of steps required to adequately sample $\pi$ is then $M(n) = O(\Delta t^{-1}) = O(n^{\gamma_0})$.


Structure of the Target

IID Product in $\mathbb{R}^n$

$$\pi_0(x) = \prod_{i=1}^{n} f(x_i).$$

Change of Measure From Gaussian in $\mathbb{R}^n$

$$\pi(x) = \exp\bigl(-\Phi_n(x)\bigr)\,\pi_0(x), \qquad \pi_0(x) \propto \exp\Bigl(-\tfrac{1}{2}\langle x, C_0^{-1}x\rangle\Bigr).$$

The covariance matrix $C_0$ is assumed to have condition number $O(n^{2k})$. The resulting Gaussian part is assumed to dominate $\Phi_n$, uniformly in $n$.


3 EXPLICIT DISCRETIZATIONS


Langevin 1


Recall that the SDE

$$\dot{x} = \nabla\log\pi_0(x) + \sqrt{2}\,\dot{W}$$

is $\pi_0$-invariant. We use the following discretization as proposal:

Proposal

$$\frac{y - x}{\Delta t} = \beta\,\nabla\log\pi_0(x) + \sqrt{\frac{2}{\Delta t}}\,\xi, \qquad \xi \sim \mathcal{N}(0, I).$$

Theorem 1. (Roberts et al 97, Roberts/Rosenthal 98)

If $\beta = 0$ then $M(n) = O(n)$.

If $\beta = 1$ then $M(n) = O(n^{1/3})$.

Steepest Descents Impacts Cost
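The effect of the drift term can be probed numerically. The sketch below implements the proposal for general $\beta$ and estimates the mean acceptance probability when $\Delta t = n^{-\gamma}$. It assumes the standard Metropolis-Hastings acceptance ratio and takes $f$ to be the standard normal density purely for illustration; the function names and default arguments are hypothetical.

```python
import math
import random


def langevin_proposal(x, grad_log_pi0, dt, beta, rng):
    """y = x + dt * beta * grad log pi0(x) + sqrt(2 dt) * xi, with xi ~ N(0, I)."""
    g = grad_log_pi0(x)
    return [xi + dt * beta * gi + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, g)]


def log_q(x, y, grad_log_pi0, dt, beta):
    """Log density (up to a constant) of proposing y from x."""
    g = grad_log_pi0(x)
    return -sum((yi - xi - dt * beta * gi) ** 2
                for xi, yi, gi in zip(x, y, g)) / (4.0 * dt)


def mean_acceptance(n, gamma, beta, n_steps=500, seed=0):
    """Average acceptance probability for dt = n**(-gamma).

    Assumption: f is the standard normal density, so grad log pi0(x) = -x.
    """
    rng = random.Random(seed)
    dt = n ** (-gamma)
    grad = lambda x: [-xi for xi in x]
    log_pi0 = lambda x: -0.5 * sum(xi * xi for xi in x)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]    # start in stationarity
    total = 0.0
    for _ in range(n_steps):
        y = langevin_proposal(x, grad, dt, beta, rng)
        log_alpha = (log_pi0(y) + log_q(y, x, grad, dt, beta)
                     - log_pi0(x) - log_q(x, y, grad, dt, beta))
        alpha = math.exp(min(0.0, log_alpha))
        total += alpha
        if rng.random() < alpha:
            x = y
    return total / n_steps
```

Comparing `mean_acceptance(n, 1.0, 0.0)` with `mean_acceptance(n, 1/3, 1.0)` and `mean_acceptance(n, 1/3, 0.0)` for increasing `n` gives a rough empirical view of the two scalings in Theorem 1: without the drift, the larger step $\Delta t = n^{-1/3}$ should be rejected almost always.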


Langevin 2

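$$\pi(x) = \exp\bigl(-\Phi_n(x)\bigr)\,\pi_0(x) = \exp\Bigl(-\Phi_n(x) - \tfrac{1}{2}\langle x, C_0^{-1}x\rangle\Bigr).$$

Proposal

$$\frac{y - x}{\Delta t} = A\nabla\log\pi_0(x) + \sqrt{\frac{2A}{\Delta t}}\,\xi, \qquad \xi \sim \mathcal{N}(0, I).$$

Theorem 2. (Beskos, Roberts, Stuart 2009)

If $A = I$ then $M(n) = O(n^{2k+1/3})$.

If $A = C_0$ then $M(n) = O(n^{1/3})$.

Preconditioning Impacts Cost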
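A small sketch of why the choice of $A$ matters. It assumes, for illustration only, that $C_0$ and $A$ are diagonal and that $C_0$ has eigenvalues $i^{-2k}$ (one spectrum with condition number $n^{2k}$); none of these specifics come from the slides. The deterministic part of the proposal multiplies $x_i$ by $1 - \Delta t\,a_i/\lambda_i$, which is huge for $A = I$ unless $\Delta t$ is tiny, and equals $1 - \Delta t$ in every coordinate for $A = C_0$.

```python
import math
import random


def preconditioned_proposal(x, lam, a, dt, rng):
    """Diagonal case: C0 = diag(lam), A = diag(a), grad log pi0(x) = -x / lam."""
    return [xi - dt * ai * xi / li + math.sqrt(2.0 * dt * ai) * rng.gauss(0.0, 1.0)
            for xi, li, ai in zip(x, lam, a)]


def drift_multipliers(lam, a, dt):
    """Factor multiplying x_i in the deterministic part of the proposal."""
    return [1.0 - dt * ai / li for li, ai in zip(lam, a)]


n, k, dt = 100, 1, 0.1
lam = [float(i) ** (-2 * k) for i in range(1, n + 1)]   # assumed spectrum of C0

# A = I: the stiffest coordinate is amplified by roughly dt * n**(2k).
worst_identity = max(abs(m) for m in drift_multipliers(lam, [1.0] * n, dt))

# A = C0: every coordinate is contracted by the same factor 1 - dt.
worst_preconditioned = max(abs(m) for m in drift_multipliers(lam, lam, dt))
```

With $A = I$ the step must shrink like $n^{-2k}$ just to keep the drift stable, which is consistent with the extra factor $n^{2k}$ in Theorem 2.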


Hybrid Monte Carlo 1

The key dynamical system is:

$$\dot{x} = Ap, \qquad \dot{p} = \nabla\bigl(\log\pi(x)\bigr).$$

Volume preserving reversible integration is required to ensure that the acceptance probability $\alpha$ is tractable.

This can be achieved by operator splitting (e.g. Verlet) based on the two dynamical systems

$$\begin{array}{lcl}
\dot{x} = Ap, & \Big\| & \dot{x} = 0, \\
\dot{p} = 0. & \Big\| & \dot{p} = \nabla\bigl(\log\pi(x)\bigr).
\end{array}$$

Theorem 3. (Beskos, Pillai, Roberts, Sanz-Serna and Stuart 2010)

For Verlet integration within Hybrid Monte Carlo we have $M(n) = O(n^{1/4})$.

Hamiltonian Formulation Impacts Cost
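The splitting can be sketched as a velocity Verlet integrator that alternates the exact flows of the two subsystems, wrapped in an accept-reject step. This is a minimal illustration under stated assumptions: $A$ is taken diagonal, the acceptance probability is the standard $\min(1, e^{-\Delta H})$ (the slides only say it is tractable), and all function names are hypothetical.

```python
import math
import random


def verlet(x, p, grad_log_pi, a_diag, dt, n_leap):
    """Velocity Verlet for dx/dt = A p, dp/dt = grad log pi(x), A = diag(a_diag)."""
    x, p = list(x), list(p)
    g = grad_log_pi(x)
    for _ in range(n_leap):
        p = [pj + 0.5 * dt * gj for pj, gj in zip(p, g)]             # half step in p
        x = [xj + dt * aj * pj for xj, aj, pj in zip(x, a_diag, p)]  # full step in x
        g = grad_log_pi(x)
        p = [pj + 0.5 * dt * gj for pj, gj in zip(p, g)]             # half step in p
    return x, p


def hamiltonian(x, p, log_pi, a_diag):
    """H(x, p) = 0.5 * <p, A p> - log pi(x)."""
    return 0.5 * sum(aj * pj * pj for aj, pj in zip(a_diag, p)) - log_pi(x)


def hmc_step(x, log_pi, grad_log_pi, a_diag, dt, n_leap, rng):
    """One update: draw p ~ N(0, A^{-1}), integrate, then accept or reject."""
    p = [rng.gauss(0.0, 1.0) / math.sqrt(aj) for aj in a_diag]
    y, q = verlet(x, p, grad_log_pi, a_diag, dt, n_leap)
    dH = hamiltonian(y, q, log_pi, a_diag) - hamiltonian(x, p, log_pi, a_diag)
    alpha = math.exp(min(0.0, -dH))      # assumed standard HMC acceptance rule
    if rng.random() < alpha:
        return y, alpha
    return list(x), alpha
```

Because each sub-flow is a shear, the composition preserves volume, and the symmetric half-full-half arrangement makes it reversible under $p \mapsto -p$; these are the two properties the slide asks for.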


4 IMPLICIT DISCRETIZATIONS


Langevin

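$$\pi(x) = \exp\Bigl(-\Phi_n(x) - \tfrac{1}{2}\langle x, C_n^{-1}x\rangle\Bigr).$$

$$\frac{y - x}{\Delta t} + A\bigl(\theta\,C_n^{-1}y + (1-\theta)\,C_n^{-1}x\bigr) = \sqrt{\frac{2A}{\Delta t}}\,\xi, \qquad \xi \sim \mathcal{N}(0, I).$$

Theorem 4. (Beskos, Roberts, Stuart 2009)

If $\theta \ne \tfrac{1}{2}$ and $A = C_n$ then $M(n) = O(n^{1/3})$.

If $\theta = \tfrac{1}{2}$ and $A \in \{I, C_n\}$ then $M(n) = O(1)$.

Implicitness Impacts Cost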
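The special role of $\theta = \tfrac{1}{2}$ can be seen in a diagonal toy version of the proposal. The sketch below assumes $C_n = \mathrm{diag}(\lambda_i)$ and $A = \mathrm{diag}(a_i)$, writes $r = \Delta t\,a_i/\lambda_i$, and solves the linear $\theta$-method equation coordinate by coordinate; the function names are hypothetical. If $x_i \sim \mathcal{N}(0, \lambda_i)$, the variance of $y_i$ relative to $\lambda_i$ is $\bigl((1-(1-\theta)r)^2 + 2r\bigr)/(1+\theta r)^2$, which equals 1 exactly when $\theta = \tfrac{1}{2}$.

```python
import math
import random


def theta_proposal(x, lam, a, dt, theta, rng):
    """Solve (y - x)/dt + a*(theta*y + (1 - theta)*x)/lam = sqrt(2a/dt)*xi per coordinate."""
    y = []
    for xi, li, ai in zip(x, lam, a):
        r = dt * ai / li
        rhs = (1.0 - (1.0 - theta) * r) * xi + math.sqrt(2.0 * dt * ai) * rng.gauss(0.0, 1.0)
        y.append(rhs / (1.0 + theta * r))
    return y


def variance_ratio(r, theta):
    """Var(y_i) / lam_i when x_i ~ N(0, lam_i), with r = dt * a_i / lam_i."""
    return ((1.0 - (1.0 - theta) * r) ** 2 + 2.0 * r) / (1.0 + theta * r) ** 2


# theta = 1/2 keeps the Gaussian reference measure exactly, however stiff r is.
ratios_midpoint = [variance_ratio(r, 0.5) for r in (0.01, 1.0, 100.0, 1.0e4)]
# theta = 1 (fully implicit) shrinks the variance; theta = 0 (explicit) inflates it.
ratio_implicit = variance_ratio(1.0, 1.0)
ratio_explicit = variance_ratio(1.0, 0.0)
```

When $\Phi_n = 0$ the $\theta = \tfrac{1}{2}$ proposal therefore leaves the Gaussian target unchanged for any $\Delta t$, which is at least consistent with the dimension-independent cost stated in Theorem 4.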



The key dynamical system is:

$$\dot{x} = Ap, \qquad \dot{p} = -C_n^{-1}x - \nabla\Phi_n(x).$$

Volume preserving reversible integration is required to ensure that the acceptance probability $\alpha$ is tractable.

This can be achieved by operator splitting based on the two dynamical systems

$$\begin{array}{lcl}
\dot{x} = Ap, & \Big\| & \dot{x} = 0, \\
\dot{p} = -C_n^{-1}x. & \Big\| & \dot{p} = -\nabla\Phi_n(x).
\end{array}$$



If we use a second-order (Strang) splitting for this operator split then we obtain:

Theorem 5. (Beskos, Pinski, Sanz-Serna and Stuart 2010) If $A = C_n$ then $M(n) = O(1)$.

Implicitness Impacts Cost
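A sketch of one Strang step for the case $A = C_n$, under assumptions that are not in the slides: $C_n$ is diagonal, and the momentum is replaced by the velocity $v = C_n p$ (notation introduced here). In these variables the Gaussian subsystem becomes $\dot{x} = v$, $\dot{v} = -x$, a rotation that can be solved exactly, and the other subsystem is a kick $\dot{v} = -C_n\nabla\Phi_n(x)$. The half-kick, rotation, half-kick ordering is one of the two possible Strang arrangements.

```python
import math


def strang_step(x, v, lam, grad_phi, h):
    """One Strang step in (x, v), with v = C_n p, C_n = diag(lam) and A = C_n."""
    g = grad_phi(x)
    v = [vi - 0.5 * h * li * gi for vi, li, gi in zip(v, lam, g)]   # half kick
    c, s = math.cos(h), math.sin(h)
    x, v = ([c * xi + s * vi for xi, vi in zip(x, v)],
            [-s * xi + c * vi for xi, vi in zip(x, v)])             # exact rotation
    g = grad_phi(x)
    v = [vi - 0.5 * h * li * gi for vi, li, gi in zip(v, lam, g)]   # half kick
    return x, v


def energy(x, v, lam, phi):
    """H = Phi(x) + 0.5 * <x, C^{-1} x> + 0.5 * <v, C^{-1} v>, since <p, C p> = <v, C^{-1} v>."""
    return phi(x) + 0.5 * sum((xi * xi + vi * vi) / li for xi, vi, li in zip(x, v, lam))
```

With $\Phi_n = 0$ the step is a pure rotation, so the energy is conserved to rounding error for any step size `h` and any conditioning of `lam`. The acceptance probability is then not degraded by the stiff Gaussian part, which is the mechanism one would expect behind a dimension-independent cost.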


5 CONCLUSIONS


What We Have Shown

We have shown that the following ideas from numerical analysis have direct impact on MCMC-based statistical sampling methods in high dimensions:

Steepest descents

Preconditioning

Implicit integration for dissipative systems

Geometric integration


References

For all papers see:

http://www.maths.warwick.ac.uk/~masdr/sample.html

A. Beskos, N. Pillai, G.O. Roberts, J.-M. Sanz-Serna and A.M. Stuart. “Optimal tuning of hybrid Monte Carlo”. Submitted.

A. Beskos, F. Pinski, J.-M. Sanz-Serna and A.M. Stuart. “Hybrid Monte Carlo on Hilbert spaces”. Submitted.

A. Beskos, G.O. Roberts and A.M. Stuart. “Optimal scalings for local Metropolis-Hastings chains on non-product targets in high dimensions”. Ann. Appl. Prob. 19(2009), 863–898.


References (Continued)

A. Beskos and A.M. Stuart. “MCMC Methods for Sampling Function Space”. To appear, proceedings of ICIAM 2007.

M. Hairer, A.M. Stuart and J. Voss. “Sampling the posterior: an approach to non-Gaussian data assimilation”. Physica D 230(2007), 50–64.

S. Duane, A.D. Kennedy, B.J. Pendleton and D. Roweth. “Hybrid Monte Carlo”. Physics Letters B 195(1987), 216–222.


References (Continued)

A. Gelman, W.R. Gilks and G.O. Roberts. “Weak convergence and optimal scaling of random walk Metropolis algorithms”. Ann. Appl. Prob. 7(1997), 110–120.

G.O. Roberts and J. Rosenthal. “Optimal scaling of discrete approximations to Langevin diffusions”. JRSSB 60(1998), 255–268.

M. Bédard. “Weak Convergence of Metropolis Algorithms for Non-iid Target Distributions”. Ann. Appl. Probab. 17(2007), 1222–44.

M. Bédard and J.S. Rosenthal. “Optimal Scaling of Metropolis Algorithms: Heading Towards General Target Distributions”. To appear, Can. J. Stat.
