
The Metropolis-Hastings Algorithm

Frank Schorfheide, University of Pennsylvania

EABCN Training School

May 10, 2016

Page 2: The Metropolis-Hastings Algorithm 2... · 2016. 5. 11. · The Metropolis-Hastings Algorithm Frank Schorfheide University of Pennsylvania EABCN Training School May 10, 2016

Markov Chain Monte Carlo (MCMC)

• Main idea: create a sequence of serially correlated draws such that the distribution of θ^i converges to the posterior distribution p(θ|Y).


Generic Metropolis-Hastings Algorithm

For i = 1 to N:

1. Draw ϑ from a density q(ϑ | θ^{i−1}).

2. Set θ^i = ϑ with probability

$$\alpha(\vartheta \mid \theta^{i-1}) = \min\left\{1,\; \frac{p(Y\mid\vartheta)\,p(\vartheta)\,/\,q(\vartheta\mid\theta^{i-1})}{p(Y\mid\theta^{i-1})\,p(\theta^{i-1})\,/\,q(\theta^{i-1}\mid\vartheta)}\right\}$$

and θ^i = θ^{i−1} otherwise.

Recall p(θ|Y) ∝ p(Y|θ) p(θ).

We draw θ^i conditional on the previous draw θ^{i−1}: this leads to a Markov transition kernel K(θ | θ̃).
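A minimal sketch of this algorithm in Python/NumPy (my illustration, not from the slides): log_post stands in for ln p(Y|θ) + ln p(θ) of an actual model, here a standard normal so the snippet runs, and the proposal q is a normal centered at the current draw, so the q terms in α cancel by symmetry.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    # Stand-in for ln p(Y|theta) + ln p(theta); here a standard normal target
    return -0.5 * theta**2

def metropolis_hastings(log_post, theta0, n_draws, scale=1.0):
    draws = np.empty(n_draws)
    theta = theta0
    for i in range(n_draws):
        vartheta = rng.normal(theta, scale)      # draw from q(.|theta^{i-1})
        # ln alpha; the q terms cancel because this proposal is symmetric
        log_alpha = min(0.0, log_post(vartheta) - log_post(theta))
        if np.log(rng.uniform()) < log_alpha:    # accept with probability alpha
            theta = vartheta
        draws[i] = theta                         # otherwise theta^i = theta^{i-1}
    return draws

draws = metropolis_hastings(log_post, theta0=3.0, n_draws=10_000)
print(draws.mean(), draws.var())                 # should approach 0 and 1
```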


Invariance Property

• It can be shown that

$$p(\theta \mid Y) = \int K(\theta \mid \tilde\theta)\, p(\tilde\theta \mid Y)\, d\tilde\theta.$$

• Write

$$K(\theta \mid \tilde\theta) = u(\theta \mid \tilde\theta) + r(\tilde\theta)\,\delta_{\tilde\theta}(\theta).$$

• u(θ|θ̃) is the density kernel (note that u(θ|·) does not integrate to one) for accepted draws:

$$u(\theta \mid \tilde\theta) = \alpha(\theta \mid \tilde\theta)\, q(\theta \mid \tilde\theta).$$

• Rejection probability:

$$r(\tilde\theta) = \int \big[1 - \alpha(\theta \mid \tilde\theta)\big]\, q(\theta \mid \tilde\theta)\, d\theta = 1 - \int u(\theta \mid \tilde\theta)\, d\theta.$$


Invariance Property

• Reversibility: conditional on the sampler not rejecting the proposed draw, the density associated with a transition from θ̃ to θ is identical to the density associated with a transition from θ to θ̃:

$$
\begin{aligned}
p(\tilde\theta \mid Y)\, u(\theta \mid \tilde\theta)
&= p(\tilde\theta \mid Y)\, q(\theta \mid \tilde\theta)\, \min\left\{1,\; \frac{p(\theta \mid Y)/q(\theta \mid \tilde\theta)}{p(\tilde\theta \mid Y)/q(\tilde\theta \mid \theta)}\right\} \\
&= \min\left\{ p(\tilde\theta \mid Y)\, q(\theta \mid \tilde\theta),\; p(\theta \mid Y)\, q(\tilde\theta \mid \theta) \right\} \\
&= p(\theta \mid Y)\, q(\tilde\theta \mid \theta)\, \min\left\{ \frac{p(\tilde\theta \mid Y)/q(\tilde\theta \mid \theta)}{p(\theta \mid Y)/q(\theta \mid \tilde\theta)},\; 1 \right\} \\
&= p(\theta \mid Y)\, u(\tilde\theta \mid \theta).
\end{aligned}
$$

• Using the reversibility result, we can now verify the invariance property:

$$
\begin{aligned}
\int K(\theta \mid \tilde\theta)\, p(\tilde\theta \mid Y)\, d\tilde\theta
&= \int u(\theta \mid \tilde\theta)\, p(\tilde\theta \mid Y)\, d\tilde\theta + \int r(\tilde\theta)\, \delta_{\tilde\theta}(\theta)\, p(\tilde\theta \mid Y)\, d\tilde\theta \\
&= \int u(\tilde\theta \mid \theta)\, p(\theta \mid Y)\, d\tilde\theta + r(\theta)\, p(\theta \mid Y) \\
&= p(\theta \mid Y).
\end{aligned}
$$


A Discrete Example

• Suppose the parameter vector θ is scalar and takes only two values:

$$\Theta = \{\tau_1, \tau_2\}.$$

• The posterior distribution p(θ|Y) can be represented by a set of probabilities collected in the vector π, say π = [π1, π2] with π2 > π1.

• Suppose we obtain ϑ based on the transition matrix Q:

$$Q = \begin{bmatrix} q & (1-q) \\ (1-q) & q \end{bmatrix}.$$


Discrete MH Algorithm

• Iteration i: suppose that θ^{i−1} = τ_j. Based on the transition matrix

$$Q = \begin{bmatrix} q & (1-q) \\ (1-q) & q \end{bmatrix},$$

determine a proposed state ϑ = τ_s.

• With probability α(τ_s|τ_j) the proposed state is accepted. Set θ^i = ϑ = τ_s.

• With probability 1 − α(τ_s|τ_j) stay in the old state and set θ^i = θ^{i−1} = τ_j.

• Choose (the Q terms cancel because of symmetry)

$$\alpha(\tau_s \mid \tau_j) = \min\left\{1,\; \frac{\pi_s}{\pi_j}\right\}.$$


Discrete MH Algorithm: Transition Matrix

• The resulting chain's transition matrix is

$$K = \begin{bmatrix} q & (1-q) \\ (1-q)\dfrac{\pi_1}{\pi_2} & q + (1-q)\left(1 - \dfrac{\pi_1}{\pi_2}\right) \end{bmatrix}.$$

• Straightforward calculations reveal that the transition matrix K has eigenvalues

$$\lambda_1(K) = 1, \qquad \lambda_2(K) = q - (1-q)\,\frac{\pi_1}{1-\pi_1}.$$

• The equilibrium distribution is the eigenvector associated with the unit eigenvalue.

• For q ∈ [0, 1) the equilibrium distribution is unique.
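These claims are easy to check numerically; a small sketch in Python/NumPy, using the π1 = 0.2 setting from the illustrations below:

```python
import numpy as np

q, pi1 = 0.5, 0.2
pi2 = 1.0 - pi1

# Transition matrix K of the discrete MH chain on {tau_1, tau_2}
K = np.array([[q, 1.0 - q],
              [(1.0 - q) * pi1 / pi2, q + (1.0 - q) * (1.0 - pi1 / pi2)]])

# Eigenvalues: 1 and q - (1-q)*pi1/(1-pi1)
vals, vecs = np.linalg.eig(K.T)          # left eigenvectors of K
print(np.sort(vals.real))

# Equilibrium distribution: left eigenvector for the unit eigenvalue
stat = vecs[:, np.argmax(np.isclose(vals.real, 1.0))].real
print(stat / stat.sum())                 # recovers [pi1, pi2]
```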


Convergence

• The persistence of the Markov chain depends on the second eigenvalue, which in turn depends on the proposal distribution Q.

• Define the transformed parameter

$$\xi^i = \frac{\theta^i - \tau_1}{\tau_2 - \tau_1}.$$

• We can represent the Markov chain associated with ξ^i as a first-order autoregressive process

$$\xi^i = (1 - k_{22}) + \lambda_2(K)\,\xi^{i-1} + \nu^i.$$

• Conditional on ξ^{i−1} = j, j = 0, 1, the innovation ν^i has support on k_{jj} and (1 − k_{jj}), its conditional mean is equal to zero, and its conditional variance is equal to k_{jj}(1 − k_{jj}).


Convergence

• Autocovariance function of h(θ^i):

$$
COV\big(h(\theta^i), h(\theta^{i-l})\big)
= \big(h(\tau_2) - h(\tau_1)\big)^2\, \pi_1 (1-\pi_1) \left( q - (1-q)\,\frac{\pi_1}{1-\pi_1} \right)^{l}
= \mathbb{V}_\pi[h] \left( q - (1-q)\,\frac{\pi_1}{1-\pi_1} \right)^{l}.
$$

• If q = π1 then the autocovariances are equal to zero and the draws h(θ^i) are serially uncorrelated (in fact, in our simple discrete setting they are also independent).
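A quick simulation check of the serial-correlation claim (my sketch; h is the identity, so V_π[h] = π1(1−π1)):

```python
import numpy as np

rng = np.random.default_rng(0)
q, pi1 = 0.5, 0.2
lam2 = q - (1 - q) * pi1 / (1 - pi1)     # second eigenvalue of K

# Simulate the discrete MH chain, coding tau_1 as 0 and tau_2 as 1
n, state = 500_000, 1
xi = np.empty(n)
for i in range(n):
    prop = state if rng.uniform() < q else 1 - state   # proposal from Q
    # alpha = min{1, pi_s/pi_j}; only the tau_2 -> tau_1 move can be rejected
    ratio = pi1 / (1 - pi1) if (state == 1 and prop == 0) else 1.0
    if rng.uniform() < ratio:
        state = prop
    xi[i] = state

xc = xi - xi.mean()
acf1 = xc[1:] @ xc[:-1] / (xc @ xc)
print(acf1, "vs lambda_2 =", lam2)       # first-order autocorrelation ~ lambda_2
```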


Convergence

• Define the Monte Carlo estimate

$$\bar h_N = \frac{1}{N} \sum_{i=1}^{N} h(\theta^i).$$

• Deduce from the CLT that

$$\sqrt{N}\big(\bar h_N - \mathbb{E}_\pi[h]\big) \Longrightarrow N\big(0, \Omega(h)\big),$$

where Ω(h) is the long-run covariance matrix

$$\Omega(h) = \lim_{L\to\infty} \mathbb{V}_\pi[h] \left( 1 + 2 \sum_{l=1}^{L} \frac{L-l}{L} \left( q - (1-q)\,\frac{\pi_1}{1-\pi_1} \right)^{l} \right).$$

• In turn, the asymptotic inefficiency factor is given by

$$\text{InEff}_\infty = \frac{\Omega(h)}{\mathbb{V}_\pi[h]} = 1 + 2 \lim_{L\to\infty} \sum_{l=1}^{L} \frac{L-l}{L} \left( q - (1-q)\,\frac{\pi_1}{1-\pi_1} \right)^{l}.$$
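Since (L−l)/L → 1, the limit of the weighted sum is the geometric series λ2(K)/(1−λ2(K)), so InEff∞ = (1+λ2)/(1−λ2); that closed form is my simplification of the displayed limit, used in this small tabulation:

```python
pi1 = 0.2
for q in (0.0, 0.2, 0.5, 0.99):
    lam2 = q - (1 - q) * pi1 / (1 - pi1)   # second eigenvalue of K
    ineff = (1 + lam2) / (1 - lam2)        # = 1 + 2*lam2/(1 - lam2)
    print(f"q = {q:4.2f}: lambda_2 = {lam2:+.3f}, InEff_inf = {ineff:.3f}")
```

Note that q < π1 yields λ2 < 0 and InEff∞ < 1: the rejections then produce antithetic-like draws that are more accurate than iid sampling.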


Autocorrelation Function of θ^i, π1 = 0.2

[Figure: autocorrelation functions over lags 0 to 9 for q = 0.00, 0.20, 0.50, and 0.99.]


Asymptotic Inefficiency InEff∞, π1 = 0.2

[Figure: InEff∞ (log scale, 10^−1 to 10^2) as a function of q ∈ [0, 1].]


Small Sample Variance V[h̄_N] across Chains versus HAC Estimates of Ω(h)

[Figure: scatter plot on log-log scales of the small-sample variance across chains against HAC estimates. Solid line is the 45-degree line.]


Posterior Inference

• We discussed how to solve a DSGE model;

• and how to compute the likelihood function p(Y|θ) for a DSGE model.

• According to Bayes' Theorem,

$$p(\theta \mid Y) = \frac{p(Y \mid \theta)\, p(\theta)}{\int p(Y \mid \theta)\, p(\theta)\, d\theta}.$$

• We want to generate draws from the posterior...


Benchmark Random-Walk Metropolis-Hastings (RWMH) Algorithm for DSGE Models

• Initialization:

1. Use a numerical optimization routine to maximize the log posterior, which up to a constant is given by ln p(Y|θ) + ln p(θ). Denote the posterior mode by θ̂.

2. Let Σ̂ be the inverse of the (negative) Hessian computed at the posterior mode θ̂, which can be computed numerically.

3. Draw θ^0 from N(θ̂, c_0² Σ̂) or directly specify a starting value.

• Main Algorithm – For i = 1, ..., N:

1. Draw ϑ from the proposal distribution N(θ^{i−1}, c² Σ̂).

2. Set θ^i = ϑ with probability

$$\alpha(\vartheta \mid \theta^{i-1}) = \min\left\{1,\; \frac{p(Y \mid \vartheta)\, p(\vartheta)}{p(Y \mid \theta^{i-1})\, p(\theta^{i-1})}\right\}$$

and θ^i = θ^{i−1} otherwise.
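A sketch of the initialization and the main loop in Python/SciPy. The quadratic log_posterior is a stand-in for ln p(Y|θ) + ln p(θ) of an actual DSGE model, so the mode, Hessian, and chain are illustrative only:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5], [0.5, 1.0]])

def log_posterior(theta):
    # Stand-in for ln p(Y|theta) + ln p(theta)
    return -0.5 * theta @ A @ theta

# Initialization: maximize the log posterior; invert the negative Hessian
res = minimize(lambda th: -log_posterior(th), x0=np.ones(2), method="BFGS")
theta_hat = res.x                        # posterior mode
Sigma_hat = res.hess_inv                 # BFGS estimate of the inverse Hessian

# Main algorithm: random-walk proposal N(theta^{i-1}, c^2 Sigma_hat)
c, N = 0.4, 50_000
L = np.linalg.cholesky(Sigma_hat)
theta, draws, acc = theta_hat.copy(), np.empty((N, 2)), 0
for i in range(N):
    vartheta = theta + c * (L @ rng.standard_normal(2))
    if np.log(rng.uniform()) < log_posterior(vartheta) - log_posterior(theta):
        theta, acc = vartheta, acc + 1   # accept: theta^i = vartheta
    draws[i] = theta                     # reject: theta^i = theta^{i-1}
print("acceptance rate:", acc / N)
```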


Benchmark RWMH Algorithm for DSGE Models

• Initialization steps can be modified as needed for a particular application.

• If numerical optimization does not work well, one could let Σ̂ be a diagonal matrix with the prior variances on the diagonal.

• Or, Σ̂ could be based on a preliminary run of a posterior sampler.

• It is good practice to run multiple chains based on different starting values.

• For the subsequent illustrations we chose Σ̂ = V_π[θ], where the posterior variance matrix is obtained from a long MCMC run.


Observables for Small-Scale New Keynesian Model

[Figure: time series of Quarterly Output Growth, Quarterly Inflation, and the Federal Funds Rate over the estimation sample (mid-1980s to early 2000s).]

Notes: Output growth per capita is measured in quarter-on-quarter (Q-o-Q) percentages. Inflation is CPI inflation in annualized Q-o-Q percentages. The federal funds rate is the average annualized effective funds rate for each quarter.


Convergence of Monte Carlo Average τ̄_{N|N0}

[Figure: three panels showing the Monte Carlo average as a function of the number of draws N, up to 10^5.]

Notes: The x-axis indicates the number of draws N. N0 is set to 0, 25,000, and 50,000, respectively.


Posterior Estimates of DSGE Model Parameters

Parameter   Mean   [0.05, 0.95]      Parameter   Mean   [0.05, 0.95]
τ           2.83   [1.95, 3.82]      ρ_r         0.77   [0.71, 0.82]
κ           0.78   [0.51, 0.98]      ρ_g         0.98   [0.96, 1.00]
ψ_1         1.80   [1.43, 2.20]      ρ_z         0.88   [0.84, 0.92]
ψ_2         0.63   [0.23, 1.21]      σ_r         0.22   [0.18, 0.26]
r^(A)       0.42   [0.04, 0.95]      σ_g         0.71   [0.61, 0.84]
π^(A)       3.30   [2.78, 3.80]      σ_z         0.31   [0.26, 0.36]
γ^(Q)       0.52   [0.28, 0.74]

Notes: We generated N = 100,000 draws from the posterior and discarded the first 50,000 draws. Based on the remaining draws we approximated the posterior mean and the 5th and 95th percentiles.


Effect of Scaling Constant c

[Figure: three panels plotting the acceptance rate ᾱ, InEff∞ (log scale), and InEffN (log scale) against the scaling constant c ∈ [0, 2].]

Notes: Results are based on Nrun = 50 independent Markov chains. The acceptance rate (average across multiple chains), the HAC-based estimate of InEff∞[τ] (average across multiple chains), and InEffN[τ] are shown as a function of the scaling constant c.


Acceptance Rate α versus Inaccuracy InEffN

[Figure: scatter plot of InEffN[τ] (log scale, 10^1 to 10^5) against the acceptance rate ᾱ ∈ [0, 1].]

Notes: InEffN[τ] versus the acceptance rate ᾱ.


Impulse Responses of Exogenous Processes

[Figure: two panels over horizons t = 0 to 10: the response of g_t to ε_{g,t} and the response of z_t to ε_{z,t}.]

Notes: The figure depicts pointwise posterior means and 90% credible bands. The responses are in percent relative to the initial level.


Parameter Transformations: Impulse Responses

[Figure: 3-by-3 panels of impulse responses over horizons t = 0 to 10; rows: Output, Inflation, Federal Funds Rate; columns: ε_{g,t}, ε_{z,t}, ε_{r,t}.]

Notes: The figure depicts pointwise posterior means and 90% credible bands. The responses of output are in percent relative to the initial level, whereas the responses of inflation and interest rates are in annualized percentages.


Challenges Due to Irregular Posteriors

• A stylized state-space model:

$$y_t = [1 \;\; 1]\, s_t, \qquad s_t = \begin{bmatrix} \phi_1 & 0 \\ \phi_3 & \phi_2 \end{bmatrix} s_{t-1} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} \epsilon_t, \qquad \epsilon_t \sim iid\, N(0, 1),$$

where

• structural parameters θ = [θ1, θ2]′, whose domain is the unit square;

• reduced-form parameters φ = [φ1, φ2, φ3]′ with

$$\phi_1 = \theta_1^2, \qquad \phi_2 = (1 - \theta_1^2), \qquad \phi_3 - \phi_2 = -\theta_1\theta_2.$$


Challenges Due to Irregular Posteriors

• s_{1,t} looks like an exogenous technology process.

• s_{2,t} evolves like an endogenous state variable, e.g., the capital stock.

• θ2 is not identifiable if θ1 = 0, because θ2 enters the model only multiplicatively.

• The law of motion of y_t is a restricted ARMA(2,1) process:

$$\big(1 - \theta_1^2 L\big)\big(1 - (1 - \theta_1^2) L\big)\, y_t = \big(1 - \theta_1\theta_2 L\big)\, \epsilon_t.$$

• Given θ1 and θ2, we obtain an observationally equivalent process by switching the values of the two roots of the autoregressive lag polynomial.

• Choose θ̃1 and θ̃2 such that

$$\tilde\theta_1 = \sqrt{1 - \theta_1^2}, \qquad \tilde\theta_2 = \theta_1\theta_2 / \tilde\theta_1.$$


Posteriors for Stylized State-Space Model

[Figure: two contour plots of the posterior in the (θ1, θ2) plane; left panel: local identification problem; right panel: global identification problem.]

Notes: Intersections of the solid lines indicate the parameter values that were used to generate the data from which the posteriors are constructed. Left panel: θ1 = 0.1 and θ2 = 0.5. Right panel: θ1 = 0.8, θ2 = 0.3.


Improvements to MCMC: Blocking

• In high-dimensional parameter spaces the RWMH algorithm generates highly persistent Markov chains.

• What's bad about persistence?

$$\sqrt{N}\big(\bar h_N - \mathbb{E}[\bar h_N]\big) \Longrightarrow N\left(0,\; \frac{1}{N}\sum_{i=1}^{N} \mathbb{V}[h(\theta^i)] + \frac{1}{N}\sum_{i=1}^{N}\sum_{j\neq i} COV\big[h(\theta^i), h(\theta^j)\big]\right).$$

• Potential remedy:

  • Partition θ = [θ1, ..., θK].

  • Iterate over the conditional posteriors p(θ_k | Y, θ_{−k}), where θ_{−k} collects the remaining blocks.

• To reduce the persistence of the chain, try to find partitions such that parameters are strongly correlated within blocks and weakly correlated across blocks, or use random blocking.


Block MH Algorithm

Draw θ^0 ∈ Θ and then for i = 1 to N:

1. Create a partition B^i of the parameter vector into N_blocks blocks θ_1, ..., θ_{N_blocks} via some rule (perhaps probabilistic), unrelated to the current state of the Markov chain.

2. For b = 1, ..., N_blocks:

  1. Draw ϑ_b ∼ q(· | [θ^i_{<b}, θ^{i−1}_b, θ^{i−1}_{>b}]).

  2. With probability

$$\alpha = \min\left\{ \frac{p\big([\theta^i_{<b}, \vartheta_b, \theta^{i-1}_{>b}] \mid Y\big)\; q\big(\theta^{i-1}_b \mid [\theta^i_{<b}, \vartheta_b, \theta^{i-1}_{>b}]\big)}{p\big([\theta^i_{<b}, \theta^{i-1}_b, \theta^{i-1}_{>b}] \mid Y\big)\; q\big(\vartheta_b \mid [\theta^i_{<b}, \theta^{i-1}_b, \theta^{i-1}_{>b}]\big)},\; 1 \right\},$$

set θ^i_b = ϑ_b, otherwise set θ^i_b = θ^{i−1}_b.
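A sketch of one sweep (iteration i) of this scheme, using a random-walk step within each block so the q terms cancel; block_mh_sweep, log_post, and the blocking are illustrative names and choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    # Stand-in for ln p(theta|Y) up to a constant
    return -0.5 * theta @ theta

def block_mh_sweep(theta, blocks, scale=0.5):
    """One iteration i: update block b given updated blocks <b and old blocks >b."""
    theta = theta.copy()
    for idx in blocks:                             # idx: indices of block b
        vartheta = theta.copy()
        vartheta[idx] += scale * rng.standard_normal(idx.size)
        # Symmetric within-block proposal: alpha reduces to the posterior ratio
        if np.log(rng.uniform()) < log_post(vartheta) - log_post(theta):
            theta = vartheta                       # theta^i_b = vartheta_b
    return theta

theta = np.zeros(6)
blocks = [np.arange(0, 2), np.arange(2, 4), np.arange(4, 6)]
for i in range(1_000):
    theta = block_mh_sweep(theta, blocks)
```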


Random-Block MH Algorithm

1. Generate a sequence of random partitions {B^i}_{i=1}^N of the parameter vector θ into N_blocks equally sized blocks, denoted by θ_b, b = 1, ..., N_blocks, as follows (see the sketch below):

  1. assign an iid U[0, 1] draw to each element of θ;

  2. sort the parameters according to the assigned random numbers;

  3. let the b-th block consist of parameters (b−1) d/N_blocks + 1, ..., b d/N_blocks, where d is the number of parameters.¹

2. Execute the Block MH Algorithm.

¹ If the number of parameters is not divisible by N_blocks, then the size of a subset of the blocks has to be adjusted.
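A sketch of the partitioning step (random_blocks is an illustrative helper name); np.array_split handles the adjustment described in the footnote automatically:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_blocks(n_params, n_blocks):
    u = rng.uniform(size=n_params)          # 1. iid U[0,1] draw per parameter
    order = np.argsort(u)                   # 2. sort parameters by the draws
    return np.array_split(order, n_blocks)  # 3. cut into n_blocks blocks

print(random_blocks(7, 3))                  # e.g. index blocks of sizes 3, 2, 2
```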


Metropolis-Adjusted Langevin Algorithm

• The proposal distribution of the Metropolis-Adjusted Langevin (MAL) algorithm is given by

$$\mu(\theta^{i-1}) = \theta^{i-1} + \frac{c_1}{2}\, M_1\, \frac{\partial}{\partial\theta} \ln p(\theta \mid Y)\Big|_{\theta=\theta^{i-1}}, \qquad \Sigma(\theta^{i-1}) = c_2^2\, M_2;$$

that is, θ^{i−1} is adjusted by a step in the direction of the gradient of the log posterior density function.

• One standard practice is to set M1 = M2 = M, with

$$M = -\left[ \frac{\partial^2}{\partial\theta\,\partial\theta'} \ln p(\theta \mid Y)\Big|_{\theta=\hat\theta} \right]^{-1},$$

where θ̂ is the mode of the posterior distribution obtained using a numerical optimization routine.
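A sketch of one MAL step under these choices (a toy Gaussian posterior so the gradient and M are available in closed form; all names are illustrative). Because the proposal mean depends on the current draw, the q terms no longer cancel in α:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5], [0.5, 1.0]])            # toy posterior precision

def log_post(theta):      return -0.5 * theta @ A @ theta
def grad_log_post(theta): return -A @ theta

M = np.linalg.inv(A)                               # M1 = M2 = M = -(Hessian)^{-1}
c1, c2 = 0.4, 0.75
cov = c2**2 * M
L = np.linalg.cholesky(cov)

def proposal_mean(theta):
    return theta + 0.5 * c1 * (M @ grad_log_post(theta))   # gradient step

def log_q(to, frm):                                # ln q(to|frm), up to a constant
    z = to - proposal_mean(frm)
    return -0.5 * z @ np.linalg.solve(cov, z)

def mal_step(theta):
    vartheta = proposal_mean(theta) + L @ rng.standard_normal(theta.size)
    log_alpha = (log_post(vartheta) + log_q(theta, vartheta)
                 - log_post(theta) - log_q(vartheta, theta))
    return vartheta if np.log(rng.uniform()) < log_alpha else theta
```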


Newton MH Algorithm

• The Newton MH Algorithm replaces the Hessian evaluated at the posterior mode θ̂ by the Hessian evaluated at θ^{i−1}.

• The proposal distribution is given by

$$\mu(\theta^{i-1}) = \theta^{i-1} - s \left[ \frac{\partial^2}{\partial\theta\,\partial\theta'} \ln p(\theta \mid Y)\Big|_{\theta=\theta^{i-1}} \right]^{-1} \frac{\partial}{\partial\theta} \ln p(\theta \mid Y)\Big|_{\theta=\theta^{i-1}},$$

$$\Sigma(\theta^{i-1}) = -c_2^2 \left[ \frac{\partial^2}{\partial\theta\,\partial\theta'} \ln p(\theta \mid Y)\Big|_{\theta=\theta^{i-1}} \right]^{-1}.$$

• It is useful to let s be drawn independently of θ^{i−1}:

$$c_1 = 2s, \qquad s \sim iid\, U[0, \bar s],$$

where s̄ is a tuning parameter.
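A sketch of the proposal construction (grad_log_post and hess_log_post are illustrative; for the toy Gaussian posterior of the previous sketch the Hessian is constant, which keeps the example runnable):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5], [0.5, 1.0]])            # toy posterior precision

def grad_log_post(theta): return -A @ theta
def hess_log_post(theta): return -A               # Hessian of ln p(theta|Y)

def newton_proposal(theta, s_bar=0.7, c2=0.6):
    H_inv = np.linalg.inv(hess_log_post(theta))   # evaluated at theta^{i-1}
    s = rng.uniform(0.0, s_bar)                   # s ~ U[0, s_bar], indep. of theta
    mu = theta - s * (H_inv @ grad_log_post(theta))   # Newton step
    Sigma = -c2**2 * H_inv                        # PD whenever the Hessian is ND
    return rng.multivariate_normal(mu, Sigma)
```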


Run Times and Tuning Constants for MH Algorithms

Algorithm            Run Time [hh:mm:ss]   Acceptance Rate   Tuning Constants
1-Block RWMH-I       00:01:13              0.28              c = 0.015
1-Block RWMH-V       00:01:13              0.37              c = 0.400
3-Block RWMH-I       00:03:38              0.40              c = 0.070
3-Block RWMH-V       00:03:36              0.43              c = 1.200
3-Block MAL          00:54:12              0.43              c1 = 0.400, c2 = 0.750
3-Block Newton MH    03:01:40              0.53              s̄ = 0.700, c2 = 0.600

Notes: In each run we generate N = 100,000 draws. We report the fastest run time and the average acceptance rate across Nrun = 50 independent Markov chains.


Autocorrelation Function of τ^i

[Figure: autocorrelation functions up to lag 40 for 1-Block RWMH-V, 1-Block RWMH-I, 3-Block RWMH-V, 3-Block RWMH-I, 3-Block MAL, and 3-Block Newton MH.]

Notes: The autocorrelation functions are computed based on a single run of each algorithm.


Inefficiency Factor InEffN [τ ]

[Figure: InEffN[τ] (log scale, 10^0 to 10^5) for 3-Block MAL, 3-Block Newton MH, 3-Block RWMH-V, 1-Block RWMH-V, 3-Block RWMH-I, and 1-Block RWMH-I.]

Notes: The small-sample inefficiency factors are computed based on Nrun = 50 independent runs of each algorithm.


IID Equivalent Draws Per Second

$$\text{iid-equivalent draws per second} = \frac{N}{\text{Run Time [seconds]}} \cdot \frac{1}{\text{InEff}_N}.$$

Algorithm            Draws Per Second
1-Block RWMH-V       7.76
3-Block RWMH-V       5.65
3-Block MAL          1.24
3-Block RWMH-I       0.14
3-Block Newton MH    0.13
1-Block RWMH-I       0.04


Performance of Different MH Algorithms

[Figure: four scatter plots on log-log scales (10^−10 to 10^0), one each for RWMH-V (1 Block), RWMH-V (3 Blocks), MAL, and Newton.]

Notes: Each panel contains scatter plots of the small-sample variance V[θ̄] computed across multiple chains (x-axis) versus the HAC[h] estimates of Ω(θ)/N (y-axis).


Recall: Posterior Odds and Marginal Data Densities

• Posterior model probabilities can be computed as follows:

$$\pi_{i,T} = \frac{\pi_{i,0}\, p(Y \mid M_i)}{\sum_j \pi_{j,0}\, p(Y \mid M_j)}, \qquad i = 1, \ldots, 2, \tag{1}$$

• where

$$p(Y \mid M) = \int p(Y \mid \theta, M)\, p(\theta \mid M)\, d\theta. \tag{2}$$

• Note:

$$\ln p(Y_{1:T} \mid M) = \sum_{t=1}^{T} \ln \int p(y_t \mid \theta, Y_{1:t-1}, M)\, p(\theta \mid Y_{1:t-1}, M)\, d\theta.$$

• Posterior odds and the Bayes factor:

$$\frac{\pi_{1,T}}{\pi_{2,T}} = \underbrace{\frac{\pi_{1,0}}{\pi_{2,0}}}_{\text{Prior Odds}} \times \underbrace{\frac{p(Y \mid M_1)}{p(Y \mid M_2)}}_{\text{Bayes Factor}}. \tag{3}$$


Computation of Marginal Data Densities

• Reciprocal importance sampling:

  • Geweke's modified harmonic mean estimator

  • Sims, Waggoner, and Zha's estimator

• Chib and Jeliazkov's estimator

• For a survey, see Ardia, Hoogerheide, and van Dijk (2009).


Modified Harmonic Mean

• Reciprocal importance samplers are based on the following identity:

$$\frac{1}{p(Y)} = \int \frac{f(\theta)}{p(Y \mid \theta)\, p(\theta)}\, p(\theta \mid Y)\, d\theta, \tag{4}$$

where ∫ f(θ) dθ = 1.

• Conditional on the choice of f(θ), an obvious estimator is

$$\hat p_G(Y) = \left[ \frac{1}{N} \sum_{i=1}^{N} \frac{f(\theta^i)}{p(Y \mid \theta^i)\, p(\theta^i)} \right]^{-1}, \tag{5}$$

where θ^i is drawn from the posterior p(θ|Y).

• Geweke (1999):

$$f(\theta) = \tau^{-1} (2\pi)^{-d/2} |V_\theta|^{-1/2} \exp\left[ -0.5\, (\theta - \bar\theta)' V_\theta^{-1} (\theta - \bar\theta) \right] \times \mathbb{I}\left\{ (\theta - \bar\theta)' V_\theta^{-1} (\theta - \bar\theta) \le F_{\chi^2_d}^{-1}(\tau) \right\}, \tag{6}$$

i.e., a normal density with mean θ̄ and covariance V_θ, truncated to the ellipse that has probability τ under that normal.
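A sketch of the estimator (my implementation of the formula above; draws is an N-by-d matrix of posterior draws and log_kernel the corresponding values of ln p(Y|θ^i) + ln p(θ^i)):

```python
import numpy as np
from scipy.stats import chi2

def geweke_mhm(draws, log_kernel, tau=0.9):
    """Modified-harmonic-mean estimate of ln p(Y) from posterior draws."""
    N, d = draws.shape
    theta_bar = draws.mean(axis=0)                 # posterior mean
    V = np.cov(draws, rowvar=False)                # posterior covariance
    dev = draws - theta_bar
    quad = np.einsum("ij,jk,ik->i", dev, np.linalg.inv(V), dev)
    log_f = (-np.log(tau) - 0.5 * d * np.log(2.0 * np.pi)
             - 0.5 * np.linalg.slogdet(V)[1] - 0.5 * quad)
    # f(theta)/[p(Y|theta)p(theta)] on the truncation region, zero outside
    log_ratio = np.where(quad <= chi2.ppf(tau, df=d), log_f - log_kernel, -np.inf)
    m = log_ratio.max()                            # log-sum-exp for stability
    return -(m + np.log(np.mean(np.exp(log_ratio - m))))
```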


Chib and Jeliazkov

• Rewrite Bayes' Theorem:

$$p(Y) = \frac{p(Y \mid \theta)\, p(\theta)}{p(\theta \mid Y)}. \tag{7}$$

• Thus,

$$\hat p_{CS}(Y) = \frac{p(Y \mid \tilde\theta)\, p(\tilde\theta)}{\hat p(\tilde\theta \mid Y)}, \tag{8}$$

where we replaced the generic θ in (7) by the posterior mode θ̃.


Chib and Jeliazkov

• Use the output of the Metropolis-Hastings Algorithm.

• Proposal density for the transition θ ↦ θ̃: q(θ, θ̃|Y).

• Probability of accepting the proposed draw:

$$\alpha(\theta, \tilde\theta \mid Y) = \min\left\{ 1,\; \frac{p(\tilde\theta \mid Y)/q(\theta, \tilde\theta \mid Y)}{p(\theta \mid Y)/q(\tilde\theta, \theta \mid Y)} \right\}.$$

• Note that

$$
\begin{aligned}
\int \alpha(\theta, \tilde\theta \mid Y)\, q(\theta, \tilde\theta \mid Y)\, p(\theta \mid Y)\, d\theta
&= \int \min\left\{ 1,\; \frac{p(\tilde\theta \mid Y)/q(\theta, \tilde\theta \mid Y)}{p(\theta \mid Y)/q(\tilde\theta, \theta \mid Y)} \right\} q(\theta, \tilde\theta \mid Y)\, p(\theta \mid Y)\, d\theta \\
&= p(\tilde\theta \mid Y) \int \min\left\{ \frac{p(\theta \mid Y)/q(\tilde\theta, \theta \mid Y)}{p(\tilde\theta \mid Y)/q(\theta, \tilde\theta \mid Y)},\; 1 \right\} q(\tilde\theta, \theta \mid Y)\, d\theta \\
&= p(\tilde\theta \mid Y) \int \alpha(\tilde\theta, \theta \mid Y)\, q(\tilde\theta, \theta \mid Y)\, d\theta.
\end{aligned}
$$


Chib and Jeliazkov

• The posterior density at the mode can be approximated as follows:

$$\hat p(\tilde\theta \mid Y) = \frac{\frac{1}{N} \sum_{i=1}^{N} \alpha(\theta^i, \tilde\theta \mid Y)\, q(\theta^i, \tilde\theta \mid Y)}{\frac{1}{J} \sum_{j=1}^{J} \alpha(\tilde\theta, \theta^j \mid Y)}, \tag{9}$$

• where θ^i are posterior draws obtained with the M-H Algorithm;

• and θ^j are additional draws from q(θ̃, θ|Y) given the fixed value θ̃.
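A sketch of (9) for a random-walk chain with proposal q(θ, ·|Y) = N(θ, c²Σ̂), so that the q terms inside α cancel; log_post is ln p(θ|Y) up to the unknown constant ln p(Y), which drops out of α (all names are mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def chib_jeliazkov(draws, log_post, theta_tilde, cov, J=10_000, seed=0):
    """Estimate ln p(theta_tilde|Y) from MH output via eq. (9)."""
    rng = np.random.default_rng(seed)
    q = multivariate_normal(mean=theta_tilde, cov=cov)
    lp_tilde = log_post(theta_tilde)

    # Numerator: mean over posterior draws of alpha(theta^i, theta_tilde) * q(...)
    lp_draws = np.array([log_post(t) for t in draws])
    alpha_num = np.exp(np.minimum(0.0, lp_tilde - lp_draws))   # min{1, ratio}
    num = np.mean(alpha_num * q.pdf(draws))

    # Denominator: mean over theta^j ~ q of alpha(theta_tilde, theta^j)
    theta_j = q.rvs(size=J, random_state=rng)
    lp_j = np.array([log_post(t) for t in theta_j])
    den = np.mean(np.exp(np.minimum(0.0, lp_j - lp_tilde)))
    return np.log(num) - np.log(den)
```

Combining this with (8) then gives ln p̂(Y) = ln p(Y|θ̃) + ln p(θ̃) − ln p̂(θ̃|Y).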


MH-Based Marginal Data Density Estimates

Model                 Mean(ln p̂(Y))   Std. Dev.(ln p̂(Y))
Geweke (τ = 0.5)      −346.17          0.03
Geweke (τ = 0.9)      −346.10          0.04
SWZ (q = 0.5)         −346.29          0.03
SWZ (q = 0.9)         −346.31          0.02
Chib and Jeliazkov    −346.20          0.40

Notes: The table shows the mean and standard deviation of the log marginal data density estimators, computed over Nrun = 50 runs of the RWMH-V sampler using N = 100,000 draws and discarding a burn-in sample of N0 = 50,000 draws. The SWZ estimator uses J = 100,000 draws to compute τ, while the CJ estimator uses J = 100,000 draws to compute the denominator of p̂(θ̃|Y).
