

INTRODUCTION RESULTS APPLICATIONS CONCLUSIONS

MCMC In High Dimensions

Andrew Stuart

Mathematics Institute and Centre for Scientific Computing, University of Warwick

SIAM, Oxford, 5th January 2007

Collaboration with: Alexandros Beskos

Funded by EPSRC


Outline

1 Introduction

2 Our Results

3 Applications

4 Conclusions


Markov Chain Monte Carlo

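Objective: sample from a distribution $\pi_n : \mathbb{R}^n \to \mathbb{R}^+$.

Method: construct a Markov chain $\{x^{(k)}\}$ with $\pi_n$ invariant.

Ergodicity: the Markov chain samples $\pi_n$ once the mixing time is reached, and

\[
\frac{1}{J} \sum_{k=1}^{J} f(x^{(k)}) \to \int_{\mathbb{R}^n} f(x)\, \pi_n(dx) \quad \text{as } J \to \infty.
\]

Question: How do methods behave as $n \to \infty$?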


Metropolis-Hastings

Propose a move $x \to y$ according to a user-specified kernel

\[
q_n(x, dy) = q_n(x, y)\, dy.
\]

Accept $y$ with probability

\[
a_n(x, y) = 1 \wedge \frac{\pi_n(y)\, q_n(y, x)}{\pi_n(x)\, q_n(x, y)};
\]

otherwise stay at $x$. The new Markov chain has $\pi_n$ as an invariant distribution.
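The accept/reject rule above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function names (`mh_step`, `run_chain`) and the one-dimensional standard normal target with a unit-variance random-walk proposal are assumptions chosen only to make the example runnable.

```python
import math
import random


def mh_step(x, log_pi, propose, log_q, rng):
    """One Metropolis-Hastings step from state x.

    log_pi(x)       -- log target density, up to an additive constant
    propose(x, rng) -- draws y from the proposal q(x, .)
    log_q(x, y)     -- log proposal density of y given x, up to a constant
                       that does not depend on x or y
    Returns (next_state, acceptance_probability).
    """
    y = propose(x, rng)
    # log of pi(y) q(y, x) / (pi(x) q(x, y))
    log_ratio = (log_pi(y) + log_q(y, x)) - (log_pi(x) + log_q(x, y))
    accept_prob = math.exp(min(0.0, log_ratio))  # a(x, y) = min(1, ratio)
    if rng.random() < accept_prob:
        return y, accept_prob
    return x, accept_prob


def run_chain(x0, n_steps, log_pi, propose, log_q, seed=0):
    """Run the chain from x0 and return the list of visited states."""
    rng = random.Random(seed)
    x, states = x0, []
    for _ in range(n_steps):
        x, _ = mh_step(x, log_pi, propose, log_q, rng)
        states.append(x)
    return states


# Illustrative (assumed) target: standard normal in one dimension.
def log_pi_gauss(x):
    return -0.5 * x * x


# Illustrative (assumed) proposal: Gaussian random walk with unit variance.
def rw_propose(x, rng):
    return x + rng.gauss(0.0, 1.0)


def rw_log_q(x, y):
    return -0.5 * (y - x) ** 2


samples = run_chain(0.0, 20000, log_pi_gauss, rw_propose, rw_log_q)
print(sum(samples) / len(samples))  # ergodic average of f(x) = x
```

Working with log densities avoids overflow in the ratio, and only unnormalised densities are needed because the normalising constants cancel.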


Local Metropolis-Hastings Algorithms

Random Walk Metropolis (RWM):

\[
y = x + \sigma_n Z, \qquad Z \sim N(0, I_n).
\]

Metropolis-Adjusted Langevin Algorithm (MALA): the Langevin SDE

\[
dX_t = \tfrac{1}{2} \nabla \log \pi_n(X_t)\, dt + dW_t
\]

has invariant distribution $\pi_n$. This suggests the proposal

\[
y = x + \frac{\sigma_n^2}{2} \nabla \log \pi_n(x) + \sigma_n Z, \qquad Z \sim N(0, I_n).
\]

Question: What is the appropriate $\sigma_n$ for large $n$? The Courant restriction is the computational PDE analogy.
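The two local proposals can be written down directly. The sketch below is illustrative only: the function names are hypothetical, vectors are plain Python lists, and the standard Gaussian target (for which $\nabla \log \pi_n(x) = -x$) is an assumption made so the example runs.

```python
import random


def rwm_proposal(x, sigma, rng):
    """RWM: y = x + sigma * Z with Z ~ N(0, I_n)."""
    return [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]


def mala_proposal(x, grad_log_pi, sigma, rng):
    """MALA: y = x + (sigma^2 / 2) * grad log pi(x) + sigma * Z."""
    g = grad_log_pi(x)
    return [xi + 0.5 * sigma ** 2 * gi + sigma * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, g)]


def mala_log_q(x, y, grad_log_pi, sigma):
    """log q(x, y) for the MALA proposal, up to an additive constant."""
    g = grad_log_pi(x)
    return -sum((yi - xi - 0.5 * sigma ** 2 * gi) ** 2
                for xi, yi, gi in zip(x, y, g)) / (2.0 * sigma ** 2)


# Illustrative (assumed) target: standard Gaussian, so grad log pi(x) = -x.
def grad_log_gauss(x):
    return [-xi for xi in x]


rng = random.Random(1)
x = [2.0, -1.0]
print(rwm_proposal(x, 0.5, rng))
print(mala_proposal(x, grad_log_gauss, 0.5, rng))
```

RWM is symmetric, so its proposal densities cancel in the acceptance ratio; MALA is not, which is why `mala_log_q` is needed when forming $a_n(x, y)$.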


The Context

Existing work concerns product targets:

\[
\pi_n(x) = \prod_{i=1}^{n} f(x_i).
\]

Roberts and coworkers (1997–2001) have shown:

RWM: $\sigma_n^2 = O(n^{-1})$

MALA: $\sigma_n^2 = O(n^{-1/3})$

in the sense that, in stationarity, for these scalings

\[
\lim_{n \to \infty} E[a_n(x, y)] \in (0, 1)
\]

and, for larger time-steps,

\[
\lim_{n \to \infty} E[a_n(x, y)] = 0.
\]

Mixing time $M(n)$ = "number of steps to reach stationarity". RWM: $M(n) = O(n)$; MALA: $M(n) = O(n^{1/3})$.
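The RWM scaling can be checked numerically in the simplest product case. The sketch below assumes a product of $n$ standard normals as the target (an illustrative choice, so that $x$ can be drawn exactly in stationarity) and compares $\sigma_n^2 = \ell^2 / n$ with a fixed $\sigma_n^2 = 1$. The function name and the constant $\ell = 2.38$ are assumptions of this example; for this Gaussian case the known limiting acceptance under the $1/n$ scaling is $2\Phi(-\ell/2)$, roughly 0.234.

```python
import math
import random


def mean_acceptance_rwm(n, sigma2, trials, rng):
    """Monte Carlo estimate of E[a_n(x, y)] for RWM in stationarity.

    Assumed target: product of n standard normals, so x is drawn exactly
    from pi_n and the symmetric proposal cancels in the ratio.
    """
    sigma = math.sqrt(sigma2)
    total = 0.0
    for _trial in range(trials):
        log_ratio = 0.0
        for _j in range(n):
            xi = rng.gauss(0.0, 1.0)
            yi = xi + sigma * rng.gauss(0.0, 1.0)
            log_ratio += 0.5 * (xi * xi - yi * yi)  # log pi(y) - log pi(x)
        total += math.exp(min(0.0, log_ratio))
    return total / trials


rng = random.Random(0)
ell2 = 2.38 ** 2  # constant in sigma_n^2 = ell^2 / n (assumed value)
results = {}
for n in (10, 100, 400):
    scaled = mean_acceptance_rwm(n, ell2 / n, 1000, rng)  # sigma^2 = O(1/n)
    fixed = mean_acceptance_rwm(n, 1.0, 1000, rng)        # sigma^2 = O(1)
    results[n] = (scaled, fixed)
    print(n, round(scaled, 3), round(fixed, 3))
```

With the $1/n$ scaling the estimated acceptance should stay away from 0 and 1 as $n$ grows, while the fixed step should drive it towards zero.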


The Context

Our work:

We investigate non-product targets using a new approach, extending existing results and, in the process, simplifying the proofs.

Furthermore, we exploit ideas from numerical analysis to construct new schemes which, in important applications, give $\sigma_n^2 = O(1)$.

We demonstrate the relevance of our results for infinite-dimensional sampling applications.


The Family of Targets

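We consider changes of measure from product measures:

\[
\tilde{\pi}_n(x) = \prod_{j=1}^{n} \frac{1}{\lambda_j} f\!\left(\frac{x_j}{\lambda_j}\right),
\]

where $\lambda_j$ is the standard deviation of $x_j$. Our target $\pi_n$ is defined by

\[
\frac{d\pi_n}{d\tilde{\pi}_n}(x) = \exp\bigl(-G_n(x)\bigr)
\]

for $G_n : \mathbb{R}^n \to \mathbb{R}$. Motivated by applications, we assume that

\[
\lambda_j = j^{-\kappa}, \qquad j = 1, 2, \ldots, n,
\]

for integer $\kappa \ge 0$.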


Infinite-Dimensional Motivation

We find that if $\tilde{\pi}_n$ is the target then:

RWM: $\sigma_n^2 = O(\lambda_n^2 n^{-1})$

MALA: $\sigma_n^2 = O(\lambda_n^2 n^{-1/3})$

We anticipate similar MALA and RWM behaviour for $\pi_n$ and $\tilde{\pi}_n$ in the presence of absolute continuity in the limit $n = \infty$:

\[
\frac{d\pi_\infty}{d\tilde{\pi}_\infty}(x) = \exp\bigl(-G_\infty(x)\bigr).
\]

Such systems appear in many applications: conditioned diffusions, conditioned Gaussian random fields.

In such applications $\pi_n$ is then a discretization of $\pi_\infty$.


Theorems

Note: For MALA we use the proposal as if the target were $\tilde{\pi}_n$.

Theorem (MALA): Assume

\[
\exists M > 0 : \text{for all } n, \ |G_n| \le M
\]

and conditions on $f$. Then the average acceptance probability of MALA in stationarity satisfies:

\[
\liminf_{n \to \infty} E[a_n(x, y)] > 0 \quad \text{if } \sigma_n^2 \le O(\lambda_n^2 n^{-1/3}),
\]

\[
\lim_{n \to \infty} E[a_n(x, y)] = 0 \quad \text{if } \sigma_n^2 > O(\lambda_n^2 n^{-1/3}).
\]

Theorem (RWM): Similar; replace $O(\lambda_n^2 n^{-1/3})$ by $O(\lambda_n^2 n^{-1})$.


Sketch of Proof

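Average acceptance probability:

\[
\alpha_n = E\, a_n = E\bigl(1 \wedge e^{R_n}\bigr).
\]

$\sigma_n^2 \le O(\lambda_n^2 n^{-1/3})$: we have $\sup_n E|R_n| < \infty$ and, for any $\gamma > 0$,

\[
\alpha_n \ge e^{-\gamma} P(|R_n| \le \gamma) \ge e^{-\gamma} \left(1 - \frac{E|R_n|}{\gamma}\right),
\]

which is strictly positive, uniformly in $n$, once $\gamma > \sup_n E|R_n|$.

$\sigma_n^2 > O(\lambda_n^2 n^{-1/3})$: we get $E R_n = -2c_n \downarrow -\infty$ and

\[
\alpha_n \le e^{-c_n} + P(R_n \ge -c_n) \le e^{-c_n} + \frac{E|R_n - E R_n|}{c_n}.
\]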
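Both bounds in the sketch follow from a pointwise estimate of $1 \wedge e^{R_n}$ on a suitable event, followed by Markov's inequality. The intermediate steps below are a reading aid reconstructed from the two displayed inequalities, not the authors' full argument.

```latex
% Lower bound: on the event {|R_n| <= gamma} we have 1 \wedge e^{R_n} >= e^{-gamma}.
\begin{align*}
\alpha_n = E\bigl(1 \wedge e^{R_n}\bigr)
  &\ge E\bigl[(1 \wedge e^{R_n})\,\mathbf{1}\{|R_n| \le \gamma\}\bigr]
   \ge e^{-\gamma}\, P(|R_n| \le \gamma) \\
  &= e^{-\gamma}\bigl(1 - P(|R_n| > \gamma)\bigr)
   \ge e^{-\gamma}\Bigl(1 - \frac{E|R_n|}{\gamma}\Bigr).
\end{align*}
% Upper bound: split on {R_n < -c_n}, then use E R_n = -2 c_n and Markov's inequality.
\begin{align*}
\alpha_n &\le e^{-c_n}\, P(R_n < -c_n) + P(R_n \ge -c_n)
          \le e^{-c_n} + P(R_n - E R_n \ge c_n) \\
         &\le e^{-c_n} + \frac{E|R_n - E R_n|}{c_n}.
\end{align*}
```

The upper bound tends to zero provided the fluctuation term $E|R_n - E R_n|$ is $o(c_n)$, a condition the sketch leaves implicit.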


Special Case: MALA + Gaussian Reference Measure

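Target: $\pi_n(x) = \exp\bigl(-G_n(x)\bigr)\, \tilde{\pi}_n(x)$ with

\[
\tilde{\pi}_n = \prod_{j=1}^{n} N(0, \lambda_j^2).
\]

Proposal: implicit MALA ($\theta = 0$ recovers the previous, explicit proposal):

\[
y = x - (1 - \theta) \frac{\sigma_n^2}{2} L_n x - \theta \frac{\sigma_n^2}{2} L_n y + \sigma_n Z, \qquad Z \sim N(0, I_n),
\]

with $\theta \in [0, 1]$ and $L_n$ the diagonal $n \times n$ matrix with $j$-th diagonal element $\lambda_j^{-2}$.

Theorem: If $\theta \ne 1/2$ then $\sigma_n^2 = O(\lambda_n^2 n^{-1/3})$. If $\theta = 1/2$ then $\sigma_n^2 = O(1)$.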
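Because $L_n$ is diagonal, the implicit proposal can be solved coordinate by coordinate into the form $y_j = a_j x_j + b_j Z_j$. The sketch below (function names are hypothetical) computes $a_j, b_j$ and the variance of $y_j$ when $x_j \sim N(0, \lambda_j^2)$. One way to read the theorem, offered here as a gloss rather than the authors' proof, is that for $\theta = 1/2$ this variance equals $\lambda_j^2$ for every $\sigma$, so the proposal preserves the Gaussian reference measure; for other $\theta$ it does not.

```python
import math


def theta_coefficients(sigma, lam, theta):
    """Solve the implicit proposal for one coordinate with variance lam^2.

    y = x - (1-theta)*(sigma^2/2)*x/lam^2 - theta*(sigma^2/2)*y/lam^2 + sigma*z
    rearranges to y = a*x + b*z with the (a, b) returned here.
    """
    s = 0.5 * sigma ** 2 / lam ** 2
    denom = 1.0 + theta * s
    a = (1.0 - (1.0 - theta) * s) / denom
    b = sigma / denom
    return a, b


def proposal_variance(sigma, lam, theta):
    """Variance of y when x ~ N(0, lam^2) and z ~ N(0, 1) are independent."""
    a, b = theta_coefficients(sigma, lam, theta)
    return a * a * lam * lam + b * b


sigma = 1.0
for j in (1, 10, 100):
    lam = 1.0 / j  # lambda_j = j^(-kappa) with kappa = 1 (assumed for illustration)
    ratios = [proposal_variance(sigma, lam, th) / lam ** 2
              for th in (0.0, 0.5, 1.0)]
    print(j, ratios)  # Var(y_j) / lambda_j^2 for theta = 0, 1/2, 1
```

The printed ratios show how far each scheme moves the high-frequency coordinates away from their reference variance when $\sigma$ is held fixed.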


Conditioned Diffusions

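Sample $X(t) \in L^2([0, 1], \mathbb{R})$:

\[
\frac{dX}{dt} = f(X) + \frac{dW}{dt},
\]

given $X(0) = X^-$ and $X(1) = X^+$.

Target measure: $\pi_\infty$.

Reference measure: Brownian bridge ($f \equiv 0$): $\tilde{\pi}_\infty$.

From the Girsanov theorem:

\[
\frac{d\pi_\infty}{d\tilde{\pi}_\infty}(x) = \exp\bigl(-G_\infty(x)\bigr)
\]

for $G_\infty : L^2([0, 1], \mathbb{R}) \to \mathbb{R}$.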


Unveil Product Structure

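The Karhunen-Loève representation of $x \in L^2$ drawn from the Gaussian measure $N(0, C)$ is

\[
x(t) = \sum_{j=1}^{\infty} x_j e_j(t).
\]

Here $x_j \sim N(0, \lambda_j^2)$ and $C$ has eigenvalues/eigenvectors $(\lambda_j^2, e_j(t))$.

For the Brownian bridge, with $e_j$ normalized in $L^2([0, 1])$,

\[
\lambda_j^2 = (\pi j)^{-2}, \qquad e_j(t) = \sqrt{2} \sin(j \pi t).
\]

Using the isometry between $L^2$ and $\ell^2$, this (random) Fourier series shows

\[
N(0, C) \leftrightarrow \prod_{j=1}^{\infty} N(0, \lambda_j^2).
\]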
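The expansion gives a direct recipe for drawing (truncated) Brownian bridge paths. The sketch below is illustrative, with hypothetical function names; it uses the $L^2$-orthonormal eigenfunctions $\sqrt{2}\sin(j\pi t)$, the normalization under which the series reproduces the Brownian bridge variance $t(1-t)$, and compares the truncated variance with that value.

```python
import math
import random


def bridge_sample(ts, n_modes, rng):
    """Truncated Karhunen-Loeve draw of a Brownian bridge at the times ts."""
    # x_j ~ N(0, lambda_j^2) with lambda_j = 1 / (pi j)
    coeffs = [rng.gauss(0.0, 1.0) / (math.pi * j) for j in range(1, n_modes + 1)]
    return [sum(c * math.sqrt(2.0) * math.sin(j * math.pi * t)
                for j, c in enumerate(coeffs, start=1))
            for t in ts]


def truncated_variance(t, n_modes):
    """Var x(t) for the truncated series: sum_j lambda_j^2 e_j(t)^2."""
    return sum(2.0 * math.sin(j * math.pi * t) ** 2 / (math.pi * j) ** 2
               for j in range(1, n_modes + 1))


rng = random.Random(0)
path = bridge_sample([k / 10 for k in range(11)], 200, rng)
print([round(v, 3) for v in path])
print(truncated_variance(0.3, 2000), 0.3 * (1 - 0.3))
```

Every basis function vanishes at $t = 0$ and $t = 1$, so each sampled path is pinned at both ends, as a bridge should be.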


Conditioned Diffusions

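Infinite-dimensional diffusion-bridge target $\pi_\infty$:

\[
\pi_\infty(x) = \exp\bigl(-G_\infty(x)\bigr)\, \tilde{\pi}_\infty(x), \qquad \tilde{\pi}_\infty = \prod_{i=1}^{\infty} N(0, \lambda_i^2),
\]

with Brownian bridge eigenvalues $\lambda_i^2 = \pi^{-2} i^{-2}$.

Spectral method $\pi_n$: use truncation of the Fourier expansion:

\[
x = \sum_{j=1}^{\infty} x_j e_j \approx \sum_{j=1}^{n} x_j e_j.
\]

Theory suggests implicit MALA with $\theta = 1/2$, giving $\sigma_n^2 = O(1)$.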
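A small experiment can illustrate this claim in the spectral coordinates. Everything specific in the sketch below is an assumption for illustration: the bounded function $G_n(x) = \tanh(\tfrac12 \sum_j x_j^2)$ stands in for the unspecified $G_n$, $\sigma = 1$ is held fixed as $n$ grows, and $x$ is drawn from the Gaussian reference measure as a proxy for stationarity (for bounded $G_n$ the two expectations differ by at most a constant factor). The proposal is the implicit scheme written as $y_j = a_j x_j + b_j Z_j$, and the full Metropolis-Hastings ratio is evaluated for general $\theta$.

```python
import math
import random


def mean_acceptance(n, sigma, theta, trials, rng):
    """Estimate the mean acceptance of the theta-proposal on n Fourier modes.

    Assumptions: lambda_j = 1/(pi j) (Brownian bridge), a stand-in bounded
    G_n(x) = tanh(0.5 * sum_j x_j^2), and x drawn from the reference measure.
    """
    lam2 = [1.0 / (math.pi * j) ** 2 for j in range(1, n + 1)]
    coef = []
    for l2 in lam2:
        s = 0.5 * sigma ** 2 / l2
        denom = 1.0 + theta * s
        coef.append(((1.0 - (1.0 - theta) * s) / denom, sigma / denom))
    total = 0.0
    for _trial in range(trials):
        log_ratio, sx, sy = 0.0, 0.0, 0.0
        for l2, (a, b) in zip(lam2, coef):
            x = math.sqrt(l2) * rng.gauss(0.0, 1.0)
            y = a * x + b * rng.gauss(0.0, 1.0)
            # log reference-density ratio: log pi~(y) - log pi~(x)
            log_ratio += (x * x - y * y) / (2.0 * l2)
            # log proposal-density ratio: log q(y, x) - log q(x, y)
            log_ratio += ((y - a * x) ** 2 - (x - a * y) ** 2) / (2.0 * b * b)
            sx += x * x
            sy += y * y
        log_ratio += math.tanh(0.5 * sx) - math.tanh(0.5 * sy)  # G(x) - G(y)
        total += math.exp(min(0.0, log_ratio))
    return total / trials


rng = random.Random(0)
sigma = 1.0  # sigma_n^2 = O(1): held fixed as n grows
results = {}
for n in (10, 50, 200):
    results[n] = (mean_acceptance(n, sigma, 0.5, 500, rng),
                  mean_acceptance(n, sigma, 0.4, 500, rng))
    print(n, results[n])  # (theta = 0.5, theta = 0.4)
```

For $\theta = 1/2$ the reference and proposal terms cancel up to rounding, leaving only $G_n(x) - G_n(y)$ in the ratio, so the acceptance should not degrade with $n$; for $\theta = 0.4$ the same fixed step should be rejected almost always once $n$ is moderately large.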


Example - MALA Results

We applied implicit MALA to sample a non-Gaussian bridge.

We verified $\sigma_n^2 = O(\lambda_n^2 n^{-1/3}) = O(n^{-7/3})$ for $\theta \ne 1/2$.

And at $\theta = 1/2$, $\sigma_n^2 = O(1)$.

[Figure: Average Acceptance Probability in Stationarity (vertical axis, 0.0 to 1.0) against $\sigma^2$ (logarithmic horizontal axis), with curves for n = 1000, n = 2000 and n = 4000; two panels, θ = 0.5 and θ = 0.4.]


What We Have Shown

1 We have found the scaling of the step $\sigma_n^2$ in Metropolis-Hastings proposals for non-product targets in high dimensions.

2 We have thus extended existing results in the literature in a manner which makes them much more applicable.

3 When the reference measure is Gaussian, an implicit scheme gives MALA with scaling $\sigma_n^2 = O(1)$.

4 Changes of measure from a Gaussian law appear in many applications.


What Remains Open

1 Relax conditions on $\{G_n\}$ for theorems:

|G∞(x)|β ≤ M|x|γ ∀x.

|G∞(x) − G∞(y)|β ≤ M|x − y|γ ∀x, y.

2 Does the step scaling $O(n^{-\rho})$ imply mixing $O(n^{\rho})$?

For a product measure the MCMC method has an SDE limit for any fixed component $x_j$ (Roberts et al.); this facilitates the proof of the mixing time; we conjecture the existence of a limiting SPDE for the entire vector $x$ in the non-product case.


References

G.O. Roberts, A. Gelman and W.R. Gilks. "Weak convergence and optimal scaling of random walk Metropolis algorithms". Ann. Appl. Prob. 7 (1997), 110–120.

G.O. Roberts and J.S. Rosenthal. "Optimal scaling of discrete approximations to Langevin diffusions". J. Roy. Stat. Soc. 60B (1998), 255–268.

A. Beskos, G.O. Roberts, A.M. Stuart and J. Voss. "An MCMC Method for diffusion bridges." See: http://www.maths.warwick.ac.uk/~stuart/sample.html

A. Beskos and A.M. Stuart. "Scalings for local Metropolis-Hastings chains on non-product targets." In preparation.