Sequential Monte Carlo samplers for Bayesian DSGE models
Drew Creal
Department of Econometrics, Vrije Universiteit Amsterdam, NL-1081 HV Amsterdam
August 2007
Abstract

Bayesian estimation of DSGE models typically uses Markov chain Monte Carlo as importance sampling (IS) algorithms have a difficult time in high-dimensional spaces. I develop improved IS algorithms for DSGE models using recent advances in Monte Carlo methods known as sequential Monte Carlo samplers. Sequential Monte Carlo samplers are a generalization of particle filtering designed for full simulation of all parameters from the posterior. I build two separate algorithms: one for sequential Bayesian estimation and a second which performs batch estimation. The sequential Bayesian algorithm provides a new method for inspecting the time variation and stability of DSGE models. The posterior distribution of the DSGE model considered here changes substantially over time. In addition, the algorithm is a method for implementing Bayesian updating of the posterior.
Keywords: Sequential Monte Carlo; DSGE; Bayesian analysis; particle filter.
1 Introduction
In this paper, I expand the Bayesian toolkit for the analysis and estimation of DSGE models
by building new algorithms based upon a recently developed methodology known as sequential
Monte Carlo (SMC) samplers. I design two algorithms and I compare their performance with
MCMC and IS on both real and simulated data. One of the algorithms performs sequential
Bayesian estimation, which is entirely new to the DSGE literature.
These new methods address two issues in the current literature. Recently, An and Schorfheide
(2007) provided a review of Bayesian methods for estimating and comparing DSGE models,
which included both Markov chain Monte Carlo (MCMC) and importance sampling (IS) algo-
rithms. MCMC algorithms are the preferred tool in the literature, as IS algorithms
do not work effectively in higher dimensional spaces. SMC samplers are a form of IS and can
be viewed as a more robust method that will enable researchers to check their MCMC out-
put. Secondly, there is a recent emphasis on parameter instability in DSGE models; see, e.g.
Fernández-Villaverde and Rubio-Ramírez (2007b). The sequential Bayesian estimation algo-
rithm provides useful information on the adequacy of the model as it demonstrates how the
posterior distribution evolves over time.
SMC samplers are a generalization of particle filtering to full simulation of all unknowns
from a posterior distribution. Particle filters are algorithms originally designed for sequential
state estimation or optimal filtering in nonlinear, non-Gaussian state space models. They were
proposed by Gordon, Salmond, and Smith (1993) and further developments of this methodology
can be found in books by Doucet, de Freitas, and Gordon (2001), Ristic, Arulampalam, and
Gordon (2004), and Cappé, Moulines, and Rydén (2005). Particle filters were introduced into
the econometrics literature by Kim, Shephard, and Chib (1998) to study the latent volatility of
asset prices and into the DSGE literature by Fernández-Villaverde and Rubio-Ramírez (2005,
2007).
A recent contribution by Del Moral, Doucet, and Jasra (2006b) has demonstrated how SMC
algorithms can be applied more widely than originally thought, including the ability to estimate
static parameters. Additional references in this field include Gilks and Berzuini (2001), Chopin
(2002), Liang (2002), and Cappé, Guillin, Marin, and Robert (2004). SMC samplers are an
alternative to MCMC for posterior simulation, although in reality they will often incorporate
Metropolis-Hastings within them. SMC samplers do not rely on the same convergence properties
as MCMC for their validity. They improve on some of the limitations of regular IS (discussed
below) when it is applied to higher dimensional spaces.
I build two different SMC sampling algorithms in this paper. The first is based on the
simulated tempering approach from Del Moral, Doucet, and Jasra (2006b), which estimates
parameters using all of the data collected in a batch. The SMC sampler I design requires
little more coding than an MCMC algorithm. In addition, the algorithm has only a few tuning
parameters. Based on real and simulated data, the SMC sampler using simulated tempering
works well and may be preferable to MCMC in difficult settings. The importance weights at the
end of this sampler are almost perfectly balanced, indicating that the draws are almost exactly
from the posterior. Alternatively, the method can be used to establish the reliability of MCMC
output. I also compare it to an IS algorithm, built using best practices. The IS algorithm does
not work well. Using test statistics and diagnostics developed in Koopman, Shephard, and Creal
(2007), I show that the variance of the importance weights is likely not to exist.
The second algorithm performs sequential Bayesian estimation in the spirit of Chopin (2002).
The algorithm adds an additional observation at each iteration and estimates the evolving pos-
terior distribution through time. Paths of the parameters provide additional information on
time-variation of parameters. Estimates of the posterior from the sequential algorithm are close
to the batch estimates. The sequential algorithm indicates that the posterior distribution of this
DSGE model varies significantly over time. I describe how it can be used to complement the
methodology proposed by Fernández-Villaverde and Rubio-Ramírez (2007b).
It is important to recognize that the use of SMC methods in this paper is distinct from how
other authors use them in the DSGE literature. Recent articles on likelihood-based inference for
DSGE models by An and Schorfheide (2007) have emphasized the importance of computing
higher order nonlinear approximations. Particle filters are then used to approximate the log-
likelihood function. Higher order approximations generally improve the identification of some
parameters. In order to focus on the methodology, I borrow a small New Keynesian model from
Rabanal and Rubio-Ramírez (2005). Although I consider only first-order approximations in this
paper, the methods described here can be used for nonlinear DSGE models as these will most
likely become the standard over time.
2 A Basic New Keynesian Model
The model I consider in this paper is the EHL model from Rabanal and Rubio-Ramírez (2005).1
The EHL model is a standard New Keynesian model based on theoretical work by Erceg, Hen-
derson, and Levin (2000), who combined staggered wage contracts with sticky prices using the
mechanism described in Calvo (1983). As a derivation of the model is described in Rabanal
and Rubio-Ramírez (2005), their appendix, and their references, I highlight only the dynamic
equations that describe equilibrium in order to provide an economic interpretation of the pa-
rameters.2 These equations are the log-linear approximation of the first-order conditions and
exogenous driving variables around the steady state. All variables are in log-deviations from
their steady state values.
The model includes an Euler equation relating output growth to the real interest rate

yt = Etyt+1 − σ (rt − Et∆pt+1) + σ(1 − ρg)gt, (1)

where yt denotes output, rt is the nominal interest rate, gt is a shock to preferences, pt is the
price level, and σ is the elasticity of intertemporal substitution.
The production and real marginal cost of production functions are given by
yt = at + (1 − δ)nt,    mct = wt − pt + nt − yt, (2)
where at is a technology shock, nt are the number of hours worked, mct is real marginal cost,
wt is the nominal wage, and δ is the capital share of output. The marginal rate of substitution
between consumption and hours worked is described by
mrst = (1/σ) yt + γnt − gt, (3)
1 Please note that this section closely follows Section 2 of Rabanal and Rubio-Ramírez (2005).
2 The model is also well detailed in an excellent set of lecture notes (with code) on solving and estimating DSGE models by Fernández-Villaverde and Rubio-Ramírez.
where γ denotes the inverse elasticity of labor supply with respect to real wages.
The monetary authority’s behavior is assumed to follow a Taylor rule with interest rate
smoothing
rt = ρrrt−1 + (1 − ρr) [γπ∆pt + γyyt] + zt. (4)
The parameters γπ and γy measure the monetary authority’s responses to deviations of inflation
and output from their equilibrium values. The degree of interest rate smoothing is given by
ρr. The Taylor rule also includes an exogenous monetary shock zt. Altogether, the exogenous
shocks are given by
at = ρaat−1 + εat , (5)
gt = ρggt−1 + εgt , (6)
zt = εzt , (7)
λt = ελt , (8)
where the εit are assumed to be i.i.d. normally distributed with variances σ2i .
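As a simple illustration, the exogenous processes (5)–(8) can be simulated directly. The sketch below is my own (function name and seed are assumptions); the parameter values are the posterior means reported later in Table 2, with standard deviations in percent.

```python
import numpy as np

def simulate_shocks(T, rho_a, rho_g, sigma, seed=0):
    """Simulate (5)-(8): AR(1) technology and preference shocks,
    plus i.i.d. monetary and price mark-up shocks."""
    rng = np.random.default_rng(seed)
    eps = {k: rng.normal(0.0, s, T) for k, s in sigma.items()}
    a = np.zeros(T)
    g = np.zeros(T)
    for t in range(1, T):
        a[t] = rho_a * a[t - 1] + eps["a"][t]   # (5)
        g[t] = rho_g * g[t - 1] + eps["g"][t]   # (6)
    return a, g, eps["z"], eps["lam"]           # z_t (7) and lambda_t (8) are i.i.d.

# e.g. T = 190 observations with the Table 2 posterior means:
a, g, z, lam = simulate_shocks(190, rho_a=0.76, rho_g=0.83,
                               sigma={"a": 0.0378, "g": 0.0887,
                                      "z": 0.0035, "lam": 0.3377})
```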
The representative agent is assumed to follow Calvo (1983) price-setting which determines
the New Keynesian Phillips curve
∆pt = βEt∆pt+1 + κp (mct + λt) . (9)
This describes how prices are set by firms based upon their real marginal cost mct, expected
future inflation, and the price mark-up shock λt. The parameter β measures the agent’s rate of
time preference. Rabanal and Rubio-Ramírez (2005) show that the parameter κp is equal to
where I define β1 = 1 and βn = 0 for n > 1. At smaller values of ζn, the density πn (x) is
flatter and particles can move around the state space more freely. In the simulated tempering
literature, the particles are “hotter” because of this free movement. As ζn gradually gets larger,
the densities within the sequence get closer to the posterior density and will be equal to the
posterior density when ζp = 1. Particles are no longer able to move around the state space as
easily and hence they are “cooled.”
For the sequence of cooling parameters, I chose a linear schedule starting at ζ1 = 0 with
differentials ζn − ζn−1 = 1/p that end with ζp = 1. I found that this combination of algorithm
parameters resulted in a small number of resampling steps during a run, typically 6 or fewer.
In general, a user may optimize the performance of the algorithm by altering the schedule. It
may be preferable to have the tempering parameters change more slowly at the beginning and then
gradually increase. The easiest way to implement this is with a piecewise linear schedule.
However, it is interesting to view the performance of the algorithm for the linear choice.
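Both schedules can be sketched in a few lines of Python. This is illustrative code, not from the paper; the 50/50 split and the `knot_value` of the piecewise variant are my own assumptions.

```python
import numpy as np

def linear_schedule(p):
    """Equally spaced tempering parameters with zeta_1 = 0 and zeta_p = 1."""
    return np.linspace(0.0, 1.0, p)

def piecewise_schedule(p, knot_value=0.25):
    """Piecewise-linear schedule: the first half of the iterations covers
    only `knot_value` of the distance, so tempering changes slowly early on."""
    n1 = p // 2
    first = np.linspace(0.0, knot_value, n1, endpoint=False)
    second = np.linspace(knot_value, 1.0, p - n1)
    return np.concatenate([first, second])

zeta = linear_schedule(500)   # p = 500 iterations, as in Section 5
```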
The definition of the sequence of distributions, and consequently of the initial importance density
η1 (x), may differ from one application to the next. Del Moral, Doucet, and Jasra (2006b) and Jasra,
Stephens, and Holmes (2007) define a slightly different sequence than (23) which results in
simulating from the prior distribution of their model as the initial target density. In this paper,
I set µ (Θ) to be a normal distribution centered at the mode with a covariance matrix equal to
the curvature at the mode. This strategy is often used for the proposal density in independence
M-H and IS algorithms; see, e.g. An and Schorfheide (2007). Note that the importance weights
are equal to one at the first iteration.
After the initial iteration, choices for the forward and backward Markov kernels are roughly
limited to M-H moves for DSGE models. After implementing many alternative algorithms, I
concluded that simple random-walk Metropolis moves performed well. Two issues to consider
are whether all the components of Θ should be moved at once or in smaller sub-blocks, and
what the best covariance matrices are for the random-walk steps. Moving all the components
of Θ individually or in small blocks would be ideal. This will generally increase the diversity
of particles. However, this substantially increases the number of times the likelihood must be
calculated and the number of tuning parameters. In my experience, moving all the components at
once or in a small number of blocks (2 or 3) provided an efficient use of computing time for the
simulated tempering approach. In Section 5, I moved all components of Θ in one block.
For the covariance matrices on the normally distributed random walk proposals, I use the
particle system to compute the empirical estimates

En[Θ] = ( Σ_{i=1}^N W_n^{(i)} Θ^{(i)} ) / ( Σ_{i=1}^N W_n^{(i)} ),
Vn[Θ] = ( Σ_{i=1}^N W_n^{(i)} (Θ^{(i)} − En[Θ]) (Θ^{(i)} − En[Θ])′ ) / ( Σ_{i=1}^N W_n^{(i)} ), (24)

of the mean En[Θ] and the covariance matrix Vn[Θ] at each iteration. The idea for using this as the covariance matrix
comes from Chopin (2002). The acceptance rates for this type of random walk proposal were
typically in the 30-50% range. To ensure they remain there, I have the algorithm appropriately
adjust a scale parameter on the covariance matrix if successive iterations’ acceptance rates are
too low or high. Consequently, the SMC sampler I have built is able to tune itself. The only
tuning parameters that need to be chosen by the user are the simulated tempering parameters.
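The weighted moments in (24) take only a few lines to compute. The sketch below is illustrative Python (the function name is mine) for N particles stacked in an (N, d) array.

```python
import numpy as np

def weighted_moments(theta, w):
    """Empirical mean E_n[Theta] and covariance V_n[Theta] of the particle
    system, as in (24). theta: (N, d) particles; w: (N,) importance weights."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                       # self-normalize the weights
    mean = w @ theta
    centered = theta - mean
    cov = (w[:, None] * centered).T @ centered
    return mean, cov
```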
Using the above random-walk Metropolis move means that the incremental weights (22) can
be applied, which in this particular case are

ωn (xn−1, xn) = πn (xn−1) / πn−1 (xn−1) = [π (y1:T |Θn−1) π (Θn−1)]^(ζn − ζn−1) . (25)
Implementing this SMC sampler requires only slightly more coding than an MCMC sampler as
one only needs to implement a resampling algorithm and compute the covariance matrices in
(24).
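In log form, the incremental weight (25) and the reweighting step are each one line; a minimal Python sketch (function names are mine, not the paper's):

```python
import numpy as np

def log_incremental_weight(log_lik, log_prior, zeta_n, zeta_prev):
    """Log of (25): the tempered-posterior ratio reduces to
    [likelihood x prior]^(zeta_n - zeta_{n-1}) at the current particle."""
    return (zeta_n - zeta_prev) * (log_lik + log_prior)

def normalize(log_w):
    """Exponentiate and normalize log-weights, subtracting the maximum
    first for numerical stability."""
    w = np.exp(log_w - np.max(log_w))
    return w / w.sum()
```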
4.2 SMC Samplers for Sequential Bayesian Estimation
Algorithms for sequential Bayesian estimation are considerably more difficult to design than the
simulated tempering approach. There are several reasons why. Parameters in DSGE models are
likely to be highly unstable. The values of some of the parameters may change substantially in
a small number of observations. Adding an observation at each iteration is unlikely to change
the value of all the components of a particle. However, it will often impact sub-blocks of it.
These facts have several consequences. Using the particle approximation of the distribution
at iteration n−1 to create an importance density for the next iteration (as in simulated tempering
above) may not work. Two neighboring posterior distributions within the sequence may be quite
different. The algorithm needs to create an importance density to approximate the next density
rather than the posterior at the last iteration. Moving all the parameters in one block will likely
be ineffective as well. Sequential estimators may need to be tailored for each model rather than
the generic approach with simulated tempering above.
For example, I implemented the sequential estimator from Chopin (2002) and unfortunately it did
not perform well for the DSGE model considered here. His algorithm moves all the components
of a particle in one block by a forward Markov kernel that is an independent M-H proposal.
The particles are moved only when the ESS falls below a given threshold, and the mean and
covariance matrix of the M-H move are equal to the mean and the covariance matrix at the
previous iteration; i.e. given by (24). I found that the parameters within the DSGE model are
too unstable for this forward kernel. Chopin’s algorithm did not propose particles far enough
into the state space at each iteration. Empirical estimates of the covariance matrix gradually
got smaller at each iteration, while the acceptance rates gradually increased. Eventually, the
tails of each marginal distribution began to be severely underestimated. I implemented other
algorithms that moved all the components of a particle in one block and each worked poorly.
Consequently, I built an algorithm whose forward Markov kernel moves all the components
individually at each iteration. I found that individual moves led to better estimates of the tails
of the distribution. Each sub-component of the forward kernel is a random walk Metropolis step
with a normal distribution for the proposal. The scales on the proposal of each component are
allowed to adapt over time in order for the acceptance rates to remain in the 30-50% range.
This can be implemented with a simple conditional statement in the code.
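The conditional statement can be as simple as the following sketch; the adjustment factor of 1.25 is an assumed tuning constant, not a value from the paper.

```python
def adapt_scale(scale, acc_rate, low=0.30, high=0.50, factor=1.25):
    """Shrink the random-walk step size when acceptance is too low and
    enlarge it when acceptance is too high, keeping rates in 30-50%."""
    if acc_rate < low:
        return scale / factor   # smaller steps -> more acceptances
    if acc_rate > high:
        return scale * factor   # larger steps -> explore further
    return scale
```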
While this improved the estimates substantially, the posterior means for some of the param-
eters at the final iteration were not in agreement with those from the batch algorithm. The
parameters in the model still change too significantly at occasional points in time. The for-
ward kernel currently proposes moves only locally and does not account for large changes in
parameters. Consequently, I implement a forward Markov kernel using a mixture of normal
distributions. The idea is simply to have one component of the mixture explore locally while
another component proposes large moves. This idea originates in work from Cappé, Guillin,
Marin, and Robert (2004) who use a mixture kernel in a batch estimation setting.
The mixture I use has two components each of which is determined by an indicator func-
tion that gets drawn first for each particle at each iteration. With probability α, I move all
components of a particle in individual blocks using a random-walk Metropolis step as above. In
the second component of the new forward kernel, all components of a particle are moved jointly
using an independent M-H step. This occurs with probability 1 − α. The proposal distribution
for this move is a normal distribution whose mean and covariance matrix are computed at the
previous iteration; i.e. given by (24). I set α = 0.95 for all iterations, although it is possible to
have time-varying random/deterministic probabilities on the components of the mixture. The
purpose of this step is to propose large moves to accommodate large changes in parameters. The
acceptance rates for this component are expected to be quite low.
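A sketch of this two-component forward kernel, in illustrative Python: `log_target`, the componentwise `scales`, and the helper `mvn_logpdf` are my own assumptions for the example, not the paper's code.

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    # log density of N(mean, cov) via a Cholesky factorization
    d = x - mean
    L = np.linalg.cholesky(cov)
    sol = np.linalg.solve(L, d)
    return (-0.5 * sol @ sol - np.log(np.diag(L)).sum()
            - 0.5 * x.size * np.log(2.0 * np.pi))

def mixture_move(theta, mean, cov, scales, log_target, alpha=0.95, rng=None):
    """With probability alpha, move each component of the particle
    individually by random-walk Metropolis; with probability 1 - alpha,
    propose a joint independence M-H step from N(mean, cov) as in (24)."""
    rng = rng or np.random.default_rng()
    if rng.random() < alpha:
        for j in range(theta.size):              # local, componentwise moves
            prop = theta.copy()
            prop[j] += scales[j] * rng.normal()
            if np.log(rng.random()) < log_target(prop) - log_target(theta):
                theta = prop
    else:
        prop = rng.multivariate_normal(mean, cov)   # large joint move
        ratio = (log_target(prop) + mvn_logpdf(theta, mean, cov)
                 - log_target(theta) - mvn_logpdf(prop, mean, cov))
        if np.log(rng.random()) < ratio:
            theta = prop
    return theta
```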
The target density at iteration n is now defined as πn (x) ∝ π (y1:n|Θ)π (Θ). The incremental
weights (19) can be applied, which are
ωn (xn−1, xn) = πn (xn−1) / πn−1 (xn−1) = π (y1:n|Θn−1) / π (y1:n−1|Θn−1) . (26)
In Section 5, I run the algorithm starting with 35 observations. For an initial importance density, I run
the simulated tempering algorithm from Section 4.1 using the first 35 observations for a small
number of iterations p. This leads to a set of particles that accurately represent the target at
observation 35. The incremental weights at the first iteration are equal to 1 as the particles are
draws from the initial target.
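The incremental weight (26) is simply the predictive likelihood of the newly added observation. In log form (Python sketch; `log_lik` is an assumed user-supplied function returning log π(y|Θ)):

```python
import numpy as np

def sequential_log_weight(log_lik, theta, y, n):
    """Log of (26): pi(y_{1:n} | Theta) / pi(y_{1:n-1} | Theta), i.e. the
    predictive likelihood of observation n under the particle's Theta."""
    return log_lik(theta, y[:n]) - log_lik(theta, y[:n - 1])
```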
4.3 Discussion of the Sequential Bayesian Algorithm
As currently implemented, the sequential algorithm is equivalent to running MCMC for each
sample size, only considerably faster. Estimates of parameters are one-sided in the sense
that they only use past data. Unlike a filter, the past observations are equally weighted when
estimating the posterior of the parameters. A researcher would ideally model the evolution of the
parameters over time and past observations would receive less weight. Smoothed estimates may
also be computed. Modeling the evolution of all the parameters simultaneously is unfortunately
impossible in DSGE models. Fernández-Villaverde and Rubio-Ramírez (2007b) model their
evolution one at a time, as this remains feasible. The sequential algorithm in this paper can be
used to complement their approach. It can be used to determine which parameters are the most
unstable and which parameters are poorly identified. Output of the sequential algorithm also
provides information on the interdependencies between unstable parameters.
There also exists an issue about how one interprets the algorithm economically. I believe it
is possible to give the algorithm a learning interpretation in the spirit of Evans and Honkapohja
(2001). The agent is assumed to know the functional form of the model but is uncertain about its
parameters, creating a deviation from rational expectations. Conditional on time t information,
an agent solves the model with N different parameter values (one for each particle). Each model
then has a probability placed on it (the importance weight) and parameters are estimated by the
agent as averages across all the models. The agent then faces the same identification problems
as economists.
In theory, one could also design SMC samplers to account for uncertainty over the functional
form of the model, i.e. probability is extended over the model-space. This would be compu-
tationally challenging and may not perform well unless the models were significantly different
enough for the data to differentiate between them. An unanswered question in this approach
still remains. If the parameters are unstable, at what point does the Bayesian agent recognize
within the model that he can no longer trust models within his model space? In other words,
new models may need to be added to the model space over time.
5 Estimation Results
5.1 Simulated Tempering SMC Sampler on Simulated Data
I simulated 190 observations out of the linearized EHL model with parameters set equal to the
posterior estimates from the actual dataset, which are reported below. I ran the SMC sampler
with simulated tempering (labeled SMC-st) using p = 500 iterations and N = 2000, 4000,
and 6000 particles. The second number of particles results in an algorithm with slightly more
computational time than the MCMC algorithm in Rabanal and Rubio-Ramírez (2005), who used 2 million draws. The
systematic resampling algorithm of Whitley (1994)/Carpenter, Clifford, and Fearnhead (1999)
was used to perform resampling when the ESS fell below 50% of the particle size.
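The ESS criterion and systematic resampling can be sketched as follows (illustrative Python; function names are mine):

```python
import numpy as np

def ess(w):
    """Effective sample size of normalized importance weights."""
    return 1.0 / np.sum(w ** 2)

def systematic_resample(w, rng=None):
    """Systematic resampling: a single uniform draw generates N evenly
    spaced points that are inverted through the weight CDF."""
    rng = rng or np.random.default_rng()
    N = w.size
    positions = (rng.random() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(w), positions)

w = np.ones(2000) / 2000          # N = 2000 equally weighted particles
if ess(w) < 0.5 * w.size:         # the 50% threshold used here
    idx = systematic_resample(w)  # then set particles = particles[idx]
```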
I ran an MCMC algorithm for 4 million draws with an additional burn-in of 200,000 iter-
ations. The MCMC algorithm uses a random-walk Metropolis proposal at each iteration (the
choice of the covariance matrix for the proposals is discussed below). I then used every 2,000th
draw to compute the estimates shown in Table 2. I also report estimates from a regular IS al-
gorithm with 50,000 draws. The IS algorithm followed An and Schorfheide (2007) who suggested
using a Student’s-t distribution as an importance density with the mean set at the posterior
mode and covariance matrix equal to a scaled version of the asymptotic covariance matrix.
The degrees of freedom were set to three while I tried many different scale parameters on the
covariance matrix. Setting it equal to 1.75 performed the best.
Table 2 reports estimates from the SMC-st, MCMC, and regular IS algorithms. It is easy to see that the IS algorithm estimates
the posterior density poorly for several of the parameters (σ, θp, γ, σa, σλ, σg). In particular,
the estimates of the standard deviations are poor, which is typical when applying IS on a higher
dimensional problem. Looking at the parameter estimates alone does not provide a complete
picture of the performance of the algorithm. Table 3 reports the ESS, which indicates that only 8
draws out of 50,000 are contributing to the IS estimator.
Meanwhile, the ESS values are significantly higher for SMC-st. Interestingly, the mean of
the importance weights is almost equal to its theoretical value of W̄ = 1/N. Convergence of
the average weight toward this value is plotted recursively in Figure 1. The importance density at the
final iteration is almost equal to the posterior.6
In addition to the ESS, I computed the Wald and score statistics from Koopman, Shephard,
and Creal (2007) (see their paper for the construction of the tests).7 The tests are designed
6 This can still be misleading. Although the SMC importance density at the final iteration and the target density are close, there is no guarantee that particles exist in all areas of the support as they are not simulated from this distribution.
7 The test statistics of Koopman, Shephard, and Creal (2007) use the fact that importance sampling weights are i.i.d. This does not hold for SMC algorithms. Resampling of the particles causes the importance weights to be correlated. The SMC-st algorithms resample rarely enough that i.i.d. importance weights may not be a bad assumption, while for a particle filter the assumption is unlikely to hold.
Table 2: Posterior estimates on simulated data. Entries are posterior means with posterior standard deviations in parentheses.

            MCMC          IS            SMC-st        SMC-st        SMC-st
            -             N = 50000     N = 2000      N = 4000      N = 6000
σ−1         5.68 (1.84)   5.82 (3.30)   5.86 (1.83)   5.73 (1.88)   5.77 (1.91)
1/(1−θp)    4.65 (0.29)   4.93 (0.17)   4.65 (0.29)   4.65 (0.29)   4.65 (0.29)
1/(1−θw)    2.78 (0.23)   2.93 (0.10)   2.78 (0.23)   2.78 (0.23)   2.78 (0.23)
γ           1.60 (0.28)   1.52 (0.15)   1.62 (0.28)   1.62 (0.28)   1.62 (0.28)
ρr          0.76 (0.03)   0.75 (0.04)   0.76 (0.02)   0.76 (0.02)   0.76 (0.03)
γy          0.30 (0.05)   0.32 (0.03)   0.30 (0.05)   0.30 (0.05)   0.30 (0.05)
γπ          1.16 (0.12)   1.19 (0.11)   1.18 (0.12)   1.18 (0.12)   1.18 (0.12)
ρa          0.76 (0.04)   0.80 (0.04)   0.76 (0.05)   0.76 (0.04)   0.76 (0.04)
ρg          0.83 (0.03)   0.83 (0.02)   0.83 (0.03)   0.83 (0.03)   0.83 (0.03)
σa (%)      3.78 (1.00)   4.02 (0.72)   4.25 (1.09)   4.24 (1.08)   4.23 (1.07)
σz (%)      0.35 (0.02)   0.35 (0.02)   0.34 (0.02)   0.34 (0.02)   0.34 (0.02)
σλ (%)      33.77 (4.22)  34.76 (2.34)  33.69 (4.21)  33.70 (4.16)  33.69 (4.16)
σg (%)      8.87 (2.25)   12.88 (3.49)  9.11 (2.28)   8.94 (2.30)   8.98 (2.35)
to detect if the variance of the importance sampling weights is finite. As noted by Koopman,
Shephard, and Creal (2007), the assumption of a finite variance is almost never checked in either
frequentist or Bayesian applications of importance sampling in economics. These values are given
in the bottom rows of Table 3, where the 1, 5, and 10% levels indicate the percentage of weights
used to calculate the tests. The tests reject the null hypothesis of a finite variance for large
positive values of the statistic relative to a standard normal random variable. The existence of a
finite variance for the IS algorithm is easily rejected by both statistics. As explained in Robert
and Casella (2004), the law of large numbers still holds for the IS estimator but the central
limit theorem does not. The convergence of the estimates will be highly unstable and painfully
slow. In repeated runs of the algorithm on the same dataset, the estimates vary wildly from one
run to another. The importance density for the IS estimator is clearly a poor choice for this problem.
Table 3: Koopman, Shephard, Creal (2007) test statistics on simulated data.
           IS            SMC-st       SMC-st       SMC-st
           N = 50000     N = 2000     N = 4000     N = 6000