A Family of MCMC Methods on Implicitly Defined Manifolds

Marcus A. Brubaker*,+, Mathieu Salzmann* and Raquel Urtasun*
* Toyota Technological Institute at Chicago
+ University of Toronto, Canada

Introduction:
• Traditional MCMC methods (e.g., Gauss-Metropolis, HMC) assume the target distribution is over a Euclidean space
• However, many problems are most naturally characterized over a nonlinear manifold
• Sampling from posteriors that arise in such problems has typically required the derivation of posterior-specific sampling schemes

Contributions:
• Here we derive an MCMC scheme based on Hamiltonian dynamics on an implicitly defined manifold $\mathcal{M} = \{q \in \mathbb{R}^n \mid c(q) = 0\}$
• We prove that, subject to suitable conditions, the Markov chain converges to the target posterior $\pi(q)$
• We present constrained variants of several MCMC methods including: Gauss-Metropolis, Hamiltonian (and Langevin) Monte Carlo and Riemann Manifold HMC [6]
• These algorithms are demonstrated on a range of problems including:
  o Sampling from a linearly constrained Gaussian distribution
  o Sampling from the Bingham-von Mises-Fisher distribution over $S^n$
  o Bayesian matrix factorization for collaborative filtering
  o Human pose estimation
• Matlab code available from: http://www.cs.toronto.edu/~mbrubake/

Previous Work:
• Similar methods are commonly used in molecular dynamics to compute the free energy of a constrained system (e.g., [1–3])
• Gibbs samplers have been derived for some distributions (e.g., [4]), but even those specialized methods are outperformed by the methods presented here

Experimental Results:
• Gaussian distribution in a linear subspace

[Figure: three panels of samples, titled CHMC, CLangevin and CMetropolis]

• Bingham-von Mises-Fisher

$$\mathcal{M} = S^n, \qquad \pi(q) \propto \exp\left(d^T q + q^T A q\right)$$

| Method | $E[-\log \pi(q)]$ | ESS % | ESS/second |
|---|---|---|---|
| CHMC (L = 4) | -999.021 | 27.3 | 183.756 |
| CHMC (L = 3) | -998.759 | 25.4 | 217.427 |
| CHMC (L = 2) | -999.121 | 37.9 | 440.898 |
| CLangevin | -998.757 | 33.0 | 619.339 |
| CMetropolis | -998.82 | 3.8 | 90.1513 |
| Gibbs [4] | -998.742 | 50.8 | 160.722 |

[Figure: curves for CHMC (L = 4), CHMC (L = 3), CHMC (L = 2), CLangevin, CMetropolis and Gibbs]

• Collaborative filtering

$$\mathcal{M} = V_r(\mathbb{R}^N) \times V_r(\mathbb{R}^M) \times \mathbb{R}^r, \qquad \pi(U, S, V) \propto \prod_{(i,j) \in E} \exp\left(-\frac{\left(f(U_i S V_j) - Y_{i,j}\right)^2}{2\sigma_p^2}\right)$$

| Method | 1M MovieLens (RMSE), r = 5 | r = 10 | r = 15 | EachMovie (RMSE), r = 5 | r = 10 | r = 15 |
|---|---|---|---|---|---|---|
| HMC | 1.577 ± 0.39 | 2.001 ± 0.66 | 2.306 ± 0.25 | 1.153 ± 0.002 | 1.161 ± 0.002 | 1.204 ± 0.018 |
| HMC-l | 0.909 ± 0.008 | 0.949 ± 0.01 | 0.99 ± 0.01 | 1.155 ± 0.007 | 1.164 ± 0.001 | 1.184 ± 0.004 |
| CHMC | 0.893 ± 0.01 | 0.888 ± 0.01 | 0.889 ± 0.01 | 1.144 ± 0.002 | 1.121 ± 0.001 | 1.116 ± 0.001 |
| CHMC-l | 0.888 ± 0.01 | 0.881 ± 0.01 | 0.881 ± 0.01 | 1.137 ± 0.003 | 1.115 ± 0.002 | 1.11 ± 0.002 |

• Human pose estimation
  o Pose is a set of 3D joint positions
  o Manifold is induced by the limb length constraints of the skeleton
  o Posterior combines noisy 2D joint projections with a PCA-based prior model of pose
  o Compared with direct optimization for different levels of noise

[Figure: mean joint error [mm] versus frame number for Constr opt, Ours MAP and Ours mean]
[Figure: mean joint error [mm] versus noise std for Constr opt, Ours MAP and Ours mean]

References:
1. G. Ciccotti and J. P. Ryckaert. Molecular dynamics simulation of rigid molecules. Computer Physics Reports, 4(6):346–392, 1986
2. C. Hartmann. An ergodic sampling scheme for constrained Hamiltonian systems with applications to molecular dynamics. Journal of Statistical Physics, 130:687–711, 2008
3. T. Lelièvre, M. Rousset, and G. Stoltz. Free Energy Computations: A Mathematical Perspective. Imperial College Press, 2010
4. P. D. Hoff. Simulation of the matrix Bingham-von Mises-Fisher distribution, with applications to multivariate and relational data. Journal of Computational and Graphical Statistics, 18:438–456, 2009
5. E. Hairer, C. Lubich, and G. Wanner. Geometric Numerical Integration. Springer, 2nd edition, 2006
6. M. Girolami and B. Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B, 73:123–214, 2011

Theoretical Result:
• Assume that $\mathcal{M} = \{q \in \mathbb{R}^n \mid c(q) = 0\}$ is connected, smooth and differentiable with $C(q) = \frac{\partial c}{\partial q}$ full-rank everywhere, and that the target posterior $\pi(q)$ is strictly positive on $\mathcal{M}$
• Given:
  o a mass matrix $M(q)$ which is positive definite on $\mathcal{M}$
  o a simulation potential energy function $\hat{U}(q)$ which is $C^2$ continuous
  o a numerical integration method $\Phi_h^{\hat{H}} : T^*\mathcal{M} \to T^*\mathcal{M}$ which is symmetric, locally accessible, consistent with the Simulation Hamiltonian $\hat{H}$, and symplectic on the cotangent bundle
    $$T^*\mathcal{M} = \left\{(p, q) \;\middle|\; c(q) = 0 \text{ and } C(q)\,\frac{\partial \hat{H}}{\partial p}(p, q) = 0\right\}$$
• Theorem: For all $q_0 \in \mathcal{M}$,
  $$\lim_{n \to \infty} \left\| T^n(q_0 \to \cdot) - \pi(\cdot) \right\| = 0$$
  where $T^n(q_0 \to \cdot)$ denotes $n$ steps of the Markov transition kernel of the Constrained Hamiltonian Monte Carlo algorithm

Simulation of constrained Hamiltonian systems:
• Need a symplectic, consistent and symmetric integration method on $\mathcal{M}$
• Generalized RATTLE Algorithm (see [5] for details and other options):
  $$p_{1/2} = p_0 - \frac{h}{2}\left(\frac{\partial \hat{H}(p_{1/2}, q_0)}{\partial q} + C(q_0)^T \lambda\right)$$
  $$q_1 = q_0 + \frac{h}{2}\left(\frac{\partial \hat{H}(p_{1/2}, q_0)}{\partial p} + \frac{\partial \hat{H}(p_{1/2}, q_1)}{\partial p}\right)$$
  $$0 = c(q_1)$$
  $$p_1 = p_{1/2} - \frac{h}{2}\left(\frac{\partial \hat{H}(p_{1/2}, q_1)}{\partial q} + C(q_1)^T \mu\right)$$
  $$0 = C(q_1)\,\frac{\partial \hat{H}(p_1, q_1)}{\partial p}$$
• If $\mathcal{M} = \mathbb{R}^n$ and the mass matrix is constant, RATTLE reduces to Leapfrog

Instances of Constrained HMC:
• Gauss-Metropolis with covariance $\Sigma$ can be expressed as HMC with $\hat{U}(q) = 0$ and $M(q) = \Sigma^{-1}$. Constrained Gauss-Metropolis is thus similarly defined.
• Constrained Langevin Monte Carlo arises with $L = 1$
• Constrained Riemann Manifold HMC [6] arises for suitable choices of $M(q)$

Constrained Hamiltonian Monte Carlo:
• Input: $q_0,\ M(q),\ h,\ L,\ \pi(q),\ \hat{U}(q)$
• Define:
  o Cotangent Projection: $P(q) = I - M(q)^{-T} C(q)^T \left(C(q) M(q)^{-1} M(q)^{-T} C(q)^T\right)^{-1} C(q) M(q)^{-1}$
  o Acceptance Hamiltonian: $H(p, q) = \frac{1}{2} p^T M(q)^{-1} p + \frac{1}{2} \log\left|2\pi P(q)^T M(q) P(q)\right| - \log \pi(q)$
  o Simulation Hamiltonian: $\hat{H}(p, q) = \frac{1}{2} p^T M(q)^{-1} p + \hat{U}(q)$
1. $p_0 \sim N(0, M(q_0))$, $p_0 \leftarrow P(q_0)\, p_0$
2. For $i = 1, \dots, L$, $(p_i, q_i) \leftarrow \Phi_h^{\hat{H}}(p_{i-1}, q_{i-1})$
3. With probability $\min\{1, \exp(H(p_0, q_0) - H(p_L, q_L))\}$
  o Return $q_L$
4. Else
  o Return $q_0$
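The cotangent projection $P(q)$ defined in the Constrained Hamiltonian Monte Carlo algorithm can be checked numerically. The following is a minimal NumPy sketch, not the authors' Matlab code: the function name `cotangent_projection` and the example mass matrix and constraint are illustrative assumptions. It builds $P(q)$ from a mass matrix and a constraint Jacobian and confirms that a projected momentum satisfies $C(q) M(q)^{-1} p = 0$.

```python
import numpy as np

def cotangent_projection(M, C):
    """P = I - M^{-T} C^T (C M^{-1} M^{-T} C^T)^{-1} C M^{-1}.

    M: (n, n) mass matrix at q; C: (k, n) constraint Jacobian at q.
    Both the function name and the argument layout are illustrative.
    """
    B = C @ np.linalg.inv(M)           # B = C M^{-1}, so B^T = M^{-T} C^T
    return np.eye(M.shape[0]) - B.T @ np.linalg.solve(B @ B.T, B)

# Hypothetical example: unit sphere constraint c(q) = q.q - 1, so C(q) = 2 q^T.
q = np.array([0.6, 0.0, 0.8])
M = np.diag([1.0, 2.0, 4.0])           # assumed constant mass matrix
C = 2.0 * q[None, :]
P = cotangent_projection(M, C)
p = P @ np.array([1.0, -2.0, 0.5])     # project an arbitrary momentum
print(C @ np.linalg.inv(M) @ p)        # numerically zero
```

Solving with `B @ B.T` rather than forming its inverse explicitly keeps the sketch closer to standard numerical practice; for a single constraint it is a 1 x 1 system.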
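To make the steps of Constrained Hamiltonian Monte Carlo concrete, here is a hedged Python sketch for one special case: the unit sphere with constraint $c(q) = q^T q - 1$, an identity mass matrix, and $\hat{U}(q) = -\log \pi(q)$ for a Bingham-von Mises-Fisher target. All function names, the step size and the example parameters `d` and `A` are assumptions made for illustration. Under these assumptions the RATTLE multipliers have a closed form, and the log-determinant term of the Acceptance Hamiltonian is taken to be the same for every $q$, so it is omitted from the acceptance ratio.

```python
import numpy as np

def make_bvmf(d, A):
    """U(q) = -log pi(q) up to a constant, and its gradient, for pi ∝ exp(d.q + q.A.q)."""
    U = lambda q: -(d @ q + q @ A @ q)
    grad_U = lambda q: -(d + (A + A.T) @ q)
    return U, grad_U

def project(q, p):
    """Cotangent projection for the unit sphere with M = I: drop the part of p along q."""
    return p - (q @ p) * q

def rattle_step(q0, p0, h, grad_U):
    """One RATTLE step on the unit sphere; returns None if no feasible q1 exists."""
    v = p0 - 0.5 * h * grad_U(q0)      # half-step momentum before the multiplier
    a = q0 + h * v                     # unconstrained position update
    b = a @ q0
    disc = b * b - (a @ a - 1.0)       # discriminant of |a - s*q0|^2 = 1
    if disc < 0.0:
        return None
    s = b - np.sqrt(disc)              # root that tends to 0 as h -> 0
    q1 = a - s * q0                    # satisfies c(q1) = 0
    p_half = (q1 - q0) / h
    p1 = project(q1, p_half - 0.5 * h * grad_U(q1))   # enforces q1.p1 = 0
    return q1, p1

def chmc_step(q0, h, L, U, grad_U, rng):
    """One CHMC transition: sample momentum, simulate L steps, accept or reject."""
    p0 = project(q0, rng.standard_normal(q0.shape))
    q, p = q0, p0
    for _ in range(L):
        out = rattle_step(q, p, h, grad_U)
        if out is None:                # treat integrator failure as a rejection
            return q0, False
        q, p = out
    H0 = 0.5 * p0 @ p0 + U(q0)
    HL = 0.5 * p @ p + U(q)
    if rng.random() < np.exp(min(0.0, H0 - HL)):
        return q, True
    return q0, False

def run_chain(q0, n_samples, h, L, U, grad_U, rng):
    """Run the chain and return the samples with the empirical acceptance rate."""
    samples = np.empty((n_samples, q0.size))
    q, n_acc = q0, 0
    for t in range(n_samples):
        q, acc = chmc_step(q, h, L, U, grad_U, rng)
        samples[t] = q
        n_acc += acc
    return samples, n_acc / n_samples

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    U, grad_U = make_bvmf(np.array([1.0, 0.0, 0.0]), np.diag([0.5, -0.5, 0.0]))
    samples, rate = run_chain(np.array([0.0, 0.0, 1.0]), 1000, 0.1, 3, U, grad_U, rng)
    print("acceptance rate:", rate)
    print("max deviation from the sphere:", np.abs(np.linalg.norm(samples, axis=1) - 1).max())
```

Setting `L = 1` in this sketch corresponds to the constrained Langevin instance described above. A general manifold or a position-dependent mass matrix would need an iterative solve for the multipliers and the full Acceptance Hamiltonian, neither of which is attempted here.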