Efficient Estimation with Time-Varying Information and the New Keynesian Phillips Curve ∗

Bertille Antoine, Simon Fraser University
Otilia Boldea, Tilburg University

August 8, 2017

Abstract

Decades of empirical evidence suggest that many macroeconometric and financial models are subject to both instability and identification problems. In this paper, we address both issues under the unified framework of time-varying information, which includes changes in instrument strength, changes in the second moment of instruments, and changes in the variance of moment conditions. We develop a new estimation method that exploits these changes to increase the efficiency of the estimates of the (stable) structural parameters. We estimate a New Keynesian Phillips Curve and obtain more precise estimates of the price indexation and output gap parameters than standard methods. An extensive simulation study shows that our method delivers substantial efficiency gains in finite samples.

Keywords: GMM; Weak instruments; Break-point; Change in identification strength.
JEL classification: C13, C22, C26, C36, C51.

∗ An earlier version of this paper was circulated as "Efficient Inference with Time-Varying Identification Strength". We thank Lars Peter Hansen, Stéphane Bonhomme, eight anonymous referees, Jaap Abbring, Jörg Breitung, Bin Chen, Carolina Caetano, Jeff Campbell, Xu Cheng, Valentina Corradi, Frank Diebold, Herman van Dijk, Dennis Fok, Alastair Hall, Lynda Khalaf, Maral Kichian, Frank Kleibergen, Sophocles Mavroeidis, Adam McCloskey, Nour Meddahi, Ulrich Müller, Serena Ng, Eric Renault, Bernard Salanié, Frank Schorfheide, Burak Uras, Bas Werker, seminar participants at Boston College, Brown U, Columbia U, Emory, Erasmus U Rotterdam, Georgia State U, Guelph U, LSE, Rochester U, TI Amsterdam, Tilburg U, TSE, U Ottawa, U Penn, U Washington, Western U, and at the conferences: EC2 (Maastricht, 2012), CESG (Kingston, 2012), ESEM (Malaga, 2012; Göteborg, 2013), NESG (Amsterdam, 2013), CIREQ (Montreal, 2014) for their helpful comments. Otilia Boldea acknowledges NWO VENI grant 415-11-001, and the hospitality of the UPenn Economics Department, where part of this research was conducted.
1 Introduction
Magnusson and Mavroeidis (2014) point out that time variation in the data generating process
can be used to improve inference on stable structural parameters in time series models. They
exploit this time variation in the presence of arbitrarily weak identification. Since no point
estimator is fully robust to weak identification, they focus on hypothesis testing. However,
practitioners are often interested in point estimators as well.
In this paper, we exploit time variation in the data generating process to develop more
efficient point estimators of the stable structural parameters in time series models. Our main
identification assumption is that the information about the structural parameters grows with
the sample size in a manner that allows consistent estimation of the model by GMM methods.[1]
In other words, our estimation results rely on non-weak identification in a subsample of the
data. While such a restriction is not imposed by Magnusson and Mavroeidis (2014), it allows
us to efficiently conduct subvector inference.
We focus on linear models and on time variation in the probability limit of the second
derivative of the GMM minimand (which we call the limiting Hessian), because the limiting
Hessian summarizes the "amount of information about the parameters (of the model) that is
contained in the observations of the sample" (Davidson and MacKinnon (2004)). We exploit
three types of time variation in the limiting Hessian: breaks in the reduced form parameters[2],
breaks in the second moment of the instruments, and breaks in the variance of the moment
conditions.
Breaks in the reduced form parameters are strongly motivated by the work of Hall, Han and
Boldea (2012) and of Magnusson and Mavroeidis (2014). What is new in this paper is that we
link changes in the reduced form parameters to potential changes in instrument strength over
the sample. Allowing for changes in instrument strength means that the instruments need
not be strong over the entire sample, but that they can be weak or even uninformative over
subsamples. As long as the instruments are not weak in two adjacent subsamples so that all
break-points can be consistently estimated, we provide more efficient point estimators than
currently available.
Breaks in the second moment of instruments are motivated by events such as the Great
Moderation, which caused a variance decline in many macroeconomic variables - see Stock and
Watson (2002). Breaks in the variance of the moment conditions are motivated by financial
crises, which cause a surge in the variance of structural shocks - see Rigobon (2003).
We propose a new estimator, Break-GMM (B-GMM henceforth), that uses these three types
of breaks to compute and stack split-sample moments, reweighting these moments differently
over different subsamples according to the time variation in their variance. We show that
B-GMM is more efficient than the standard full sample GMM. We show that this holds even
when a subsample is completely unidentified.

[1] This does not mean that the identification has to be sufficiently strong over the entire sample. The identification is allowed to be weak over part of the sample, as we explain in the following paragraphs.
[2] For lack of better terminology, we refer to the projection of regressors on instruments as the reduced form.
Because we allow for weaker identification patterns in the data along with changes in
identification strength, our paper contributes to the weak-identification literature: see the
surveys by Stock, Wright, and Yogo (2002), Andrews and Stock (2005), and Hansen, Hausman,
and Newey (2008). It also contributes to the break-point literature, since our results generalize
the estimation methods in Bai and Perron (1998) and Hall, Han and Boldea (2012).
In our simulation study, we first consider a model with one reduced form break. We show
that our estimator for the slope parameter has the smallest RMSE irrespective of the location
of the break. Then we consider a model with no reduced form break, and we find that the
standard deviation of B-GMM is still smaller than that of GMM.[3]
We illustrate the use of B-GMM in our empirical analysis, where we estimate a monthly New
Keynesian Phillips curve (NKPC) model with multiple lags which we derive from a structural
new Keynesian model with general inflation indexation. We apply the model to a monthly US
dataset which features the prominent Great Moderation and oil price collapse breaks, and find
that the B-GMM procedure delivers more reliable estimates than standard GMM methods.
We find strong evidence for partial rather than full price indexation. Our estimates also
suggest that prices are re-optimized every three months, close to the price rigidity estimates
at the micro-level - see Klenow and Kryvtsov (2008).
The paper is organized as follows. Section 2 defines the three types of changes and motivates
them. Section 3 contains the main contribution of the paper: it introduces the B-GMM
estimator and the associated efficiency results relative to standard GMM methods. Section
4 analyzes three break-point estimators: the Bai and Perron (1998) univariate estimator, the
Qu and Perron (2007) multivariate estimator, as well as a new multivariate estimator which
is better suited for our framework. We also show that these break-point estimators[4] for all
three types of changes are consistent at a fast enough rate so that the asymptotic distribution
of the B-GMM estimators is unaffected by the break-point estimation error. Section 5 derives
the asymptotic distribution of the break-point estimators in the absence of a break and for
all three types of changes. We also show that the B-GMM estimator is asymptotically equiv-
alent to the GMM estimator in the absence of a break and under a reasonable homogeneity
assumption for the moments of the data. Section 6 contains the simulation results. Section
7 applies our procedure to the NKPC model and section 8 concludes. The Appendix is orga-
nized as follows: Appendix A contains all the tables and graphs, Appendix B the proofs of all
theorems, and Appendix C contains a generalization of our framework to nonlinear models.
The supplemental appendix contains a general characterization of identification strength, and
a derivation of the NKPC model used in the empirical analysis.
[3] They should be asymptotically equivalent, as shown in section 5 of the paper.
[4] We do not consider the multivariate Qu and Perron (2007) estimator for all types of changes, since it is not well-suited for our framework, as explained in section 4.
2 Framework and Examples
Consider the standard linear regression model with $p_1$ exogenous variables $Z_{1t}$, $p_2$ endogenous
variables $X_t$, $p = p_1 + p_2$ parameters of interest $\theta^0$, $q$ valid instruments $Z_t$ ($q \geq p$) that include
the constant and $Z_{1t}$, and a full column rank matrix $\Pi^*$ of size $(q, p_2)$:

$$y_t = W_t'\theta^0 + u_t, \quad \text{with } W_t = [Z_{1t}' \;\; X_t']' \text{ and } \theta^0 = [\theta_{z_1}^{0\prime} \;\; \theta_x^{0\prime}]', \tag{2.1}$$

$$X_t' = Z_t'\Pi^* + v_t'. \tag{2.2}$$
For lack of better terminology, we call (2.1) the structural equation (or structural form)
and (2.2) the reduced form[5] (henceforth RF). In this paper, we focus on efficient estimation
of $\theta^0$. The standard optimal full sample GMM estimator $\hat\theta_{GMM}$ minimizes

$$L_{GMM}(\theta) = g_T'(\theta)\,\hat N_u^{-1}\,g_T(\theta), \quad \text{with } g_T(\theta) = T^{-1}\sum_{t=1}^{T} g_t(\theta), \tag{2.3}$$

$g_t(\theta) = Z_t(y_t - W_t'\theta)$, and $\hat N_u \xrightarrow{P} N_u = \lim_T \big[\mathrm{Var}[\sqrt{T}\,g_T(\theta^0)]\big] \equiv \mathrm{AVar}[\sqrt{T}\,g_T(\theta^0)]$.
The standard optimality and efficiency results for $\hat\theta_{GMM}$ rely on the stationarity of $W_t$ and
$Z_t$ and thus, implicitly, on the assumption that the probability limit of the Hessian of the
GMM minimand $L_{GMM}$ does not change over the sample; see e.g. Hall (2005), Theorem 3.4
and Assumption 3.1. In this paper, we show that when this probability limit changes over the
sample, the associated change-points can be used to construct more moment conditions and
more efficient estimators than $\hat\theta_{GMM}$.
As mentioned in the introduction, we refer to the probability limit of the Hessian of the
GMM minimand as the limiting Hessian; this is, under regularity conditions,

$$\begin{aligned}
\operatorname*{plim}_T \left[\frac{\partial^2 L_{GMM}(\theta^0)}{\partial\theta\,\partial\theta'}\right]
&= 2\left(\operatorname*{plim}_T T^{-1}\sum_{t=1}^{T} W_t Z_t'\right) N_u^{-1} \left(\operatorname*{plim}_T T^{-1}\sum_{t=1}^{T} Z_t W_t'\right) \\
&= 2\begin{bmatrix} E(Z_{1t}Z_t') \\ \Pi^{*\prime} Q \end{bmatrix} N_u^{-1} \begin{bmatrix} E(Z_t Z_{1t}') & Q\Pi^* \end{bmatrix},
\end{aligned} \tag{2.4}$$

where we assumed $E(Z_tZ_t') = Q$. When $\Pi^*$, $Q$, or $N_u$ change over the sample, the limiting
Hessian changes in general, and the associated change-points (or breaks) can be interacted
with the instruments to construct more moment conditions and more efficient GMM estimators[6]:
$\Pi^*$ quantifies the instrument strength, $Q$ is related to the instrument variance, and $N_u$
is related to the variance of the structural error $u_t$. We now motivate each type of break.
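To make the role of the limiting Hessian concrete, the following is a minimal sketch, not the authors' code, of the sample analogue of (2.4) evaluated separately on two subsamples. The design is an assumption made for illustration only: one endogenous regressor, no exogenous regressors, a stable reduced form, conditionally homoskedastic errors with unit variance, and instruments whose standard deviation falls from 2 to 1 halfway through the sample. The function names `sample_hessian` and `subsample_hessian` are hypothetical.

```python
import numpy as np

def sample_hessian(W, Z, N_u):
    """Sample analogue of (2.4): 2 (W'Z/T) N_u^{-1} (Z'W/T)."""
    T = W.shape[0]
    A = W.T @ Z / T                              # (p, q) cross moment
    return 2.0 * A @ np.linalg.solve(N_u, A.T)   # (p, p)

# Hypothetical design: the instrument variance drops at mid-sample,
# while the reduced form coefficients stay fixed.
rng = np.random.default_rng(0)
T, half = 4000, 2000
scale = np.where(np.arange(T) < half, 2.0, 1.0)[:, None]
Z = scale * rng.normal(size=(T, 2))
Pi = np.array([[1.0], [0.5]])
X = Z @ Pi + rng.normal(size=(T, 1))

def subsample_hessian(rows):
    Zs, Xs = Z[rows], X[rows]
    N_u = Zs.T @ Zs / len(Zs)    # assumes Var(u_t | Z_t) = 1, so N_u = Q
    return sample_hessian(Xs, Zs, N_u)

H_before = subsample_hessian(slice(0, half))
H_after = subsample_hessian(slice(half, T))
print(H_before, H_after)
```

Under these assumptions the Hessian is roughly four times larger before the variance drop than after it, which is the kind of time variation in information that the split-sample moments are meant to exploit.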
[5] Both equations can, but do not need to, originate from a structural model. In our empirical analysis, (2.1) originates from a structural model, but the reduced forms (2.2) are the projections of regressors on instruments. The reduced forms should be viewed, throughout the paper, as projections of endogenous regressors on instruments and not be confused with the full reduced form of a structural model.
[6] Changes in the limiting Hessian of the objective function can be used to construct more efficient estimators not only for GMM, but for a wide range of estimation methods, linear or nonlinear, least-squares, maximum likelihood, minimum-distance etc. Analyzing other estimators is beyond the scope of our paper.
• Changes in reduced form (RF) parameters. Suppose that the reduced form (2.2)
has a break-point at $T_{RF}$. We consider the following generalized RF that allows for a change
in the identification strength of the instruments:

$$X_t' = \begin{cases} \dfrac{Z_t'\Pi_1}{T^{\alpha_1}} + v_t', & t \leq T_{RF} \\[2ex] \dfrac{Z_t'\Pi_2}{T^{\alpha_2}} + v_t', & t > T_{RF} \end{cases}, \qquad T_{RF} = [T\lambda_{RF}], \tag{2.5}$$

with $\lambda_{RF}$ the RF break fraction, $0 \leq \alpha_i \leq 1/2$, and $\Pi_i$ two (fixed) matrices of size $(q, p_2)$
for $i = 1, 2$. Over each stable subsample, all the instruments have the same (unknown)
identification strength characterized by the (drifting) sequence[7] $T^{\alpha_i}$: instruments are strong
over subsample $i$ when $\alpha_i = 0$, and weak when $\alpha_i = 1/2$. In other words, the larger $\alpha_i$, the
weaker the associated subsample. In this setting, the break-point $T_{RF}$ may capture two types
of changes: (1) a parameter break with stable identification strength, $\alpha_1 = \alpha_2$ and $\Pi_1 \neq \Pi_2$,
or (2) a change in identification strength, $\alpha_1 \neq \alpha_2$.
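A small simulation may help fix ideas about case (2). The sketch below is illustrative only: it draws data from the generalized RF (2.5) with strong instruments before the break and weak instruments after it, then runs a first-stage OLS on each subsample. The helper names `simulate_rf` and `first_stage_ols`, the i.i.d. standard normal errors and instruments, and all numerical values are assumptions, not part of the paper.

```python
import numpy as np

def simulate_rf(Z, Pi1, Pi2, alpha1, alpha2, lam_rf, rng):
    """Draw X_t' = Z_t' Pi_i / T^{alpha_i} + v_t' with a break at [T * lam_rf]."""
    T = Z.shape[0]
    T_rf = int(np.floor(T * lam_rf))
    X = np.empty((T, Pi1.shape[1]))
    X[:T_rf] = Z[:T_rf] @ (Pi1 / T**alpha1)
    X[T_rf:] = Z[T_rf:] @ (Pi2 / T**alpha2)
    v = rng.normal(size=X.shape)          # assumed i.i.d. N(0, 1) errors
    return X + v, T_rf

def first_stage_ols(Z, X):
    """OLS projection coefficients of X on Z."""
    return np.linalg.lstsq(Z, X, rcond=None)[0]

rng = np.random.default_rng(0)
T = 10_000
Z = rng.normal(size=(T, 2))
Pi = np.ones((2, 1))                      # same Pi in both regimes
X, T_rf = simulate_rf(Z, Pi, Pi, alpha1=0.0, alpha2=0.5, lam_rf=0.5, rng=rng)

coef_strong = first_stage_ols(Z[:T_rf], X[:T_rf])   # estimates Pi / T^0 = Pi
coef_weak = first_stage_ols(Z[T_rf:], X[T_rf:])     # estimates Pi / T^0.5 = 0.01 * Pi
print(coef_strong.ravel(), coef_weak.ravel())
```

In the weak regime the first-stage coefficients are of the same order as their sampling error, so the second subsample carries almost no information about the structural parameters even though $\Pi_1 = \Pi_2$.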
In our empirical analysis, we estimate a NKPC model - see equations (7.1)-(7.3). The
literature suggests there is evidence for a change in identification strength for NKPC, be-
cause, using similar instrument sets, some US NKPC studies find weak instruments for the
sample 1960-2007 (see Mavroeidis (2005), Dufour, Khalaf and Kichian (2006), Nason and
Smith (2008), Kleibergen and Mavroeidis (2009)), while others find strong instruments over
the sample 1969-2005 (see Zhang, Osborn and Kim (2008, 2009)). The results in Kleibergen
and Mavroeidis (2009, Table 4) indicate that this change in identification strength occurred
around the Great Moderation: their confidence sets for the NKPC parameters are much larger
for 1960-1983 than for 1984-2007, suggesting that identification is stronger in the latter period.
• Changes in the second moment of instruments (SMI). For many macroeconomic
variables, including output and inflation, whose lags are used as instruments in our NKPC
application in section 7, there is strong evidence that their variance has declined sharply after
the Great Moderation, see e.g. Stock and Watson (2002). A change in the variance of the
instruments implies in most cases[8] a break in the SMI:

$$E(Z_tZ_t') = Q_1, \;\; t \leq T_{SMI} = [T\lambda_{SMI}], \quad \text{and} \quad E(Z_tZ_t') = Q_2, \;\; t > T_{SMI}. \tag{2.6}$$

As we prove in section 3, the SMI break-point $T_{SMI}$ can also deliver more efficient estimators
of $\theta^0$.
[7] This framework is standard in the weak-identification literature (see e.g. Staiger and Stock (1997)). More general identification patterns allowing the strength of identification to vary across instruments and directions of the parameter space are discussed in the supplemental appendix.
[8] The only case when this does not occur is when the decrease in the variance of instruments is equal to the increase in their expected value squared; in that case, the SMI stays the same. There is no empirical evidence that the Great Moderation break is of this type.
• Changes in the (long-run) variance of moment conditions (VMC). Setting $\lambda \in (0, 1]$, we define changes in the VMC below:

$$\mathrm{AVar}\left[T^{-1/2}\sum_{t=1}^{[T\lambda]} g_t(\theta^0)\right] = \lambda N_{u,1}, \qquad [T\lambda] \leq T_{VMC} = [T\lambda_{VMC}], \tag{2.7}$$

$$\mathrm{AVar}\left[T^{-1/2}\sum_{t=[T\lambda+1]}^{T} g_t(\theta^0)\right] = (1-\lambda) N_{u,2}, \qquad [T\lambda] > [T\lambda_{VMC}]. \tag{2.8}$$

Since the population moments are $g_t(\theta^0) = Z_tu_t$, the VMC break $T_{VMC}$ may occur because
of a SMI break or a break in the conditional variance of $u_t$. In our empirical application in
section 7, the Great Moderation break is treated as both a SMI break and a VMC break.
Alternatively, asset return models often exhibit breaks in the conditional variance of their
structural error because the volatility of the structural shocks increases substantially in financial
crises, see e.g. Rigobon (2003). We show in section 3 that VMC breaks can also deliver
more efficient estimators of the structural parameters.[9]
Therefore, the main contribution of this paper is to propose and analyze the Break-GMM
estimator (B-GMM hereafter), which exploits break-points $T^0$ (with $T^0 \in \{T_{RF}, T_{SMI}, T_{VMC}\}$)
in RF, SMI and VMC to increase the efficiency of the estimators of the structural parameters $\theta^0$,
while maintaining that these parameters do not change over the sample and are sufficiently
strongly identified. As mentioned before, the break $T^0$ can be used to double the moment
conditions for $\theta^0$ in (2.1):

$$E[Z_tu_t\,1(t \leq T^0)] = 0 \quad \text{and} \quad E[Z_tu_t\,1(t > T^0)] = 0.$$

In section 3, we show that these moment conditions are not redundant in the presence of a break,
a result that is closely related to the non-redundancy of split-sample moments discussed in
Proposition 1 in Magnusson and Mavroeidis (2014). More specifically, we analyze the B-GMM
estimators that exploit a RF, a SMI or a VMC break, and we show that the B-GMM
estimators are at least as efficient as the full sample GMM estimators, and strictly more efficient
under some conditions.
3 The B-GMM Estimator
3.1 B-GMM with a RF break
Consider the model with a RF break that combines (2.1) and (2.5):

$$y_t = W_t'\theta^0 + u_t, \quad \text{with } W_t = [Z_{1t}' \;\; X_t']' \text{ and } \theta^0 = [\theta_{z_1}^{0\prime} \;\; \theta_x^{0\prime}]',$$

$$X_t' = \begin{cases} \dfrac{Z_t'\Pi_1}{T^{\alpha_1}} + v_t', & t \leq T_{RF} \\[2ex] \dfrac{Z_t'\Pi_2}{T^{\alpha_2}} + v_t', & t > T_{RF} \end{cases}, \qquad T_{RF} = [T\lambda_{RF}],$$

where $\lambda_{RF}$ is the break fraction, $0 \leq \alpha_i \leq 1/2$, and $\Pi_i$ are two (fixed) matrices of size $(q, p_2)$ for
$i = 1, 2$. The main goal is to deliver efficient estimators of the structural parameters $\theta^0$. We
assume that the break $T_{RF}$ has occurred and is common to all RF equations. Additionally,
we do not know its location, and in this section we impose the high level assumption that it
can be estimated at a fast enough rate, given in Assumption A5 below. We call this estimator
$\hat T_{RF}$, and in section 4 we discuss both existing break-point estimators, such as the ones in Bai
and Perron (1998) and Qu and Perron (2007), and a new multivariate break-point estimator
that all achieve this rate.

[9] Our result holds whether the regressors are endogenous or not. A proof of this statement is available in our companion paper Antoine and Boldea (2015).
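We now introduce three estimators of $\theta^0$.

• The full sample GMM estimator minimizes the GMM criterion $L_{GMM}$ defined in (2.3)
and ignores the RF break:

$$\hat\theta_{GMM} = \left(W'Z\,\hat N_u^{-1} Z'W\right)^{-1}\left(W'Z\,\hat N_u^{-1} Z'y\right).$$

• The B-2SLS estimator uses first-stage predicted regressors $\hat W_t = [Z_{1t}' \;\; \hat X_t']'$ that are obtained
by interacting the instruments with the RF break $\hat T_{RF} = [T\hat\lambda_{RF}]$. It is defined as
in Hall, Han, and Boldea (2012) (HHB henceforth):

$$\hat\theta_{B\text{-}2SLS} = \left(\sum_{t=1}^{T} \hat W_t \hat W_t'\right)^{-1} \sum_{t=1}^{T} \hat W_t y_t, \qquad \hat X_t' = \begin{cases} Z_t'\hat\Pi_{1\hat\lambda_{RF}}, & t \leq \hat T_{RF} \\ Z_t'\hat\Pi_{2\hat\lambda_{RF}}, & t > \hat T_{RF} \end{cases},$$

where, for $i = 1, 2$, $\hat\Pi_{i\hat\lambda_{RF}}$ are the OLS estimators of $\Pi_i/T^{\alpha_i}$ in (2.5), for the subsamples
before and after the estimated break $\hat T_{RF}$.

• Our proposed B-GMM estimator also interacts the instruments with the RF break:

$$\hat\theta_{B\text{-}GMM} = \arg\min_\theta \left[\bar g_T'(\theta)\,(\hat N_u^a)^{-1}\,\bar g_T(\theta)\right],$$

$$\text{with } \bar g_T(\theta) = \begin{bmatrix} \sum_{t=1}^{\hat T_{RF}} Z_t(y_t - W_t'\theta)/\hat T_{RF} \\ \sum_{t=\hat T_{RF}+1}^{T} Z_t(y_t - W_t'\theta)/(T - \hat T_{RF}) \end{bmatrix} \text{ and } \hat N_u^a \xrightarrow{P} \mathrm{AVar}\left(T^{1/2}\,\bar g_T(\theta^0)\right).$$

Thus,

$$\hat\theta_{B\text{-}GMM} = \left(W'\bar Z(\hat N_u^a)^{-1}\bar Z'W\right)^{-1}\left(W'\bar Z(\hat N_u^a)^{-1}\bar Z'y\right), \tag{3.1}$$

with $\bar Z$ the $(T, 2q)$ matrix defined as

$$\bar Z' = \begin{pmatrix} Z_1 & \cdots & Z_{\hat T_{RF}} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & Z_{\hat T_{RF}+1} & \cdots & Z_T \end{pmatrix}.$$

B-GMM is the optimal counterpart of B-2SLS, using the optimal weighting matrix $\hat N_u^a$. But
note that B-2SLS is not a standard 2SLS estimator. In addition, the standard GMM is not a
version of B-GMM with a particular weighting matrix in place of $\hat N_u^a$.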
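The closed form (3.1) can be sketched in a few lines of NumPy. This is a rough illustration under stated assumptions rather than the authors' implementation: the break date is treated as known (in practice it would be replaced by an estimate), the variance of the moments is estimated with a heteroskedasticity-robust formula that ignores serial correlation (the paper's assumptions allow weak dependence, which would call for a HAC estimator), and the first step uses a 2SLS-type weighting with the interacted instruments. All function names and the simulated design are hypothetical. The code works with unnormalized stacked moments; for linear GMM with a correspondingly defined weighting matrix, rescaling each block of moments leaves the estimator unchanged.

```python
import numpy as np

def stack_instruments(Z, T_break):
    """(T, 2q) matrix: Z_t in the first q columns for t <= T_break,
    Z_t in the last q columns for t > T_break, zeros elsewhere."""
    T, q = Z.shape
    Z_stacked = np.zeros((T, 2 * q))
    Z_stacked[:T_break, :q] = Z[:T_break]
    Z_stacked[T_break:, q:] = Z[T_break:]
    return Z_stacked

def linear_gmm(y, W, Z, N):
    """Closed-form linear GMM: (W'Z N^{-1} Z'W)^{-1} W'Z N^{-1} Z'y."""
    ZW, Zy = Z.T @ W, Z.T @ y
    A = ZW.T @ np.linalg.solve(N, ZW)
    b = ZW.T @ np.linalg.solve(N, Zy)
    return np.linalg.solve(A, b)

def moment_variance(Z, u):
    """Robust estimate of Var(T^{-1/2} sum_t Z_t u_t); assumes no serial correlation."""
    Zu = Z * u[:, None]
    return Zu.T @ Zu / Z.shape[0]

def two_step_gmm(y, W, Z):
    theta1 = linear_gmm(y, W, Z, Z.T @ Z / len(y))   # first step: 2SLS weighting
    u_hat = y - W @ theta1
    return linear_gmm(y, W, Z, moment_variance(Z, u_hat))

def b_gmm(y, W, Z, T_break):
    """Two-step GMM with instruments interacted with the break indicator."""
    return two_step_gmm(y, W, stack_instruments(Z, T_break))

# Hypothetical design: one endogenous regressor, strong instruments before
# the break and nearly uninformative instruments after it.
rng = np.random.default_rng(1)
T, T_break, theta0 = 2000, 1000, 1.0
Z = rng.normal(size=(T, 2))
e = rng.normal(size=(T, 2))
u, v = e[:, 0], 0.5 * e[:, 0] + e[:, 1]              # u and v are correlated
Pi = np.where(np.arange(T)[:, None] < T_break, [1.0, 0.5], [0.05, 0.0])
X = (Z * Pi).sum(axis=1) + v
y = theta0 * X + u
W = X[:, None]

theta_gmm = two_step_gmm(y, W, Z)
theta_bgmm = b_gmm(y, W, Z, T_break)
print(theta_gmm, theta_bgmm)
```

Because the two blocks of interacted instruments are never nonzero in the same period, the estimated variance of the stacked moments is block diagonal by construction, so each subsample's moments are weighted by that subsample's own variance.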
To derive the asymptotic properties of all estimators, we impose the following regularity
assumptions, in which $h_t$ is defined as $h_t \equiv (u_t, v_t')' \otimes Z_t$ with $i$th element $h_{t,i}$, and $\sum_{1r} \equiv \sum_{t=1}^{[Tr]}$.
Assumption A1. (Regularity of the break fraction, error terms and reduced form)
(i) $0 < \lambda_{RF} < 1$, and the break-point $T_{RF}$ satisfies $\max(T_{RF}, T - T_{RF}) \geq \max(q, [\epsilon T])$.
(ii) - The eigenvalues of $S = \mathrm{AVar}\big(T^{-1/2}\sum_{t=1}^{T} h_t\big)$ are $O(1)$.
- $E(h_{t,i}) = 0$ and, for some $d > 2$, $\|h_{t,i}\|_d < \infty$ for $t = 1, \cdots, T$ and $i = 1, \cdots, (p_2 + 1)q$.
- $h_{t,i}$ is near epoch dependent with respect to some process $\xi_t$, $\|h_t - E(h_t \mid \mathcal{G}_{t-m}^{t+m})\|_2 \leq \ell_m$
with $\ell_m = O(m^{-1/2})$, where $\mathcal{G}_{t-m}^{t+m}$ is a $\sigma$-algebra based on $(\xi_{t-m}, \cdots, \xi_{t+m})$.
- $\xi_t$ is either $\phi$-mixing of size $m^{-d/[2(d-1)]}$ or $\alpha$-mixing of size $m^{-d/(d-2)}$.
A1 is common for the break-point literature, and is similar to HHB. Part (i) states that
the break-fraction is fixed and that there are enough observations to estimate the parameters
before and after the break-point. Part (ii) allows for general weak dependence in the data.
Assumption A2. (Regularity of the identification strength)
(i) Let $r_{1T} = T^{\alpha_1}$, $r_{2T} = T^{\alpha_2}$, $\alpha = \min(\alpha_1, \alpha_2)$, and $r_T = T^{\alpha}$. We assume that $\alpha < 1/2$, and
that when $\alpha = \alpha_1 = \alpha_2$, then it holds that $\Pi_1 \neq \Pi_2$.
(ii) For case (1), where $\alpha_1 = \alpha_2$ and $\Pi_1 \neq \Pi_2$, we assume that $\Pi_i$ is of full rank $p_2$ for at least
one $i \in \{1, 2\}$. For case (2), where $\alpha_i < \alpha_j$ for $i, j \in \{1, 2\}$, $i \neq j$, we assume that $\Pi_i$ is of full
rank $p_2$.

Note that since the slowest sequence $r_{iT}$ is associated with the subsample with the strongest
identification, the sequence $r_T$ corresponds to the strongest subsample. A2(i) allows for at
most one subsample to be weakly identified. Thus, when there is no change in identification
strength, the identification cannot be weak throughout. A2(ii) ensures that the moment
conditions are not redundant. Note that this assumption allows for $\Pi_j = 0$ (meaning that
$\theta^0$ can also be unidentified by subsample $j$) as long as $\Pi_i$ is of full rank for subsample $i$
($i, j \in \{1, 2\}$, $i \neq j$).
Assumption A3. (Regularity of the instrumental variables)
Let $\hat M_{1r} = T^{-1}\sum_{1r} Z_tZ_t'$. Then $\hat M_{1r} \xrightarrow{P} M_{1r}$, uniformly in $r \in [0, 1]$, where we assume that
$M_{1r}$ is positive definite for all $r \in (0, 1]$, and that $(M_{1r_1} - M_{1r_2})$ is positive definite for any
$r_1, r_2 \in [0, 1]$ such that $r_1 > r_2$.

Assumption A4. (Regularity of the variances)

$$\mathrm{AVar}\left[T^{-1/2}\sum_{1r} h_t\right] = N_{1r} = \begin{pmatrix} N_{u,1r} & N_{uv,1r}' \\ N_{uv,1r} & N_{v,1r} \end{pmatrix},$$

uniformly in $r \in [0, 1]$, with $N_{u,1r}$, $N_{v,1r}$ of size $(q, q)$, respectively $(p_2q, p_2q)$. We assume
that $N_{1r}$ is positive definite for all $r \in (0, 1]$, and that $N_{1r_1} - N_{1r_2}$ is positive definite for any
$r_1, r_2 \in [0, 1]$ such that $r_1 > r_2$.

Assumption A5. (Convergence rate of the RF break-point estimator)
$\|\hat\lambda_{RF} - \lambda_{RF}\| = O_P(T^{2\alpha - 1})$.
A3 and A4 are typical for the break-point literature and are used for proving consistency of
the break-point estimators in section 4. In this section, they are just needed to define limiting
quantities that appear in the asymptotic distributions of GMM and B-GMM. Note that A3
allows for a SMI break that may, or may not, coincide with the RF break, and that A4 allows
for heteroskedasticity in the sample moments of the structural equation and the RF. It also
allows for a VMC break that may, or may not, coincide with the RF break. A5 is the high
level assumption on the break-fraction estimator that we verify in section 4.
The following theorem states the asymptotic properties of the estimators of the structural
parameters. Its proof also shows that A1-A4 are sufficient for consistent estimation of θ0 via
GMM, B-2SLS or B-GMM.
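Theorem 1. (Asymptotic normality of $\hat\theta_{GMM}$, $\hat\theta_{B\text{-}2SLS}$ and $\hat\theta_{B\text{-}GMM}$)
Let $\Lambda_T = \mathrm{diag}(T^{1/2}I_{p_1}, T^{1/2-\alpha}I_{p_2})$. Under A1 to A4 for GMM, and under A1 to A5 for B-2SLS
and B-GMM, $\Lambda_T(\hat\theta_{GMM} - \theta^0)$, $\Lambda_T(\hat\theta_{B\text{-}2SLS} - \theta^0)$, and $\Lambda_T(\hat\theta_{B\text{-}GMM} - \theta^0)$ are asymptotically
normally distributed with mean 0 and asymptotic variances, respectively:

$$V_{GMM} = \left[(\Pi_1^{a\prime}M_1 + \Pi_2^{a\prime}M_2)(N_u)^{-1}(M_1\Pi_1^a + M_2\Pi_2^a)\right]^{-1}$$

$$V_{B\text{-}2SLS} = \left[\Pi_1^{a\prime}M_1\Pi_1^a + \Pi_2^{a\prime}M_2\Pi_2^a\right]^{-1}\left[\Pi_1^{a\prime}N_{u,1}\Pi_1^a + \Pi_2^{a\prime}N_{u,2}\Pi_2^a\right]\left[\Pi_1^{a\prime}M_1\Pi_1^a + \Pi_2^{a\prime}M_2\Pi_2^a\right]^{-1}$$

$$V_{B\text{-}GMM} = \left[\Pi_1^{a\prime}M_1N_{u,1}^{-1}M_1\Pi_1^a + \Pi_2^{a\prime}M_2N_{u,2}^{-1}M_2\Pi_2^a\right]^{-1},$$

with $N_u = N_{u,1} + N_{u,2}$, $N_{u,1} = N_{u,1\lambda_{RF}}$, $N_{u,2} = N_{u,11} - N_{u,1\lambda_{RF}}$, $M_1 = M_{1\lambda_{RF}}$, $M_2 = M_{11} - M_{1\lambda_{RF}}$, and, for $i, j \in \{1, 2\}$ with $i \neq j$,

$$\Pi_i^a = \begin{cases} (\Pi_{z_1} \;\; \Pi_i), & \alpha_i = \alpha_j \text{ or } \alpha_i < \alpha_j \\ (\Pi_{z_1} \;\; O_{(q,p_2)}), & \alpha_j < \alpha_i \end{cases} \quad \text{and} \quad \Pi_{z_1} = \begin{bmatrix} I_{p_1} \\ O_{(q-p_1,p_1)} \end{bmatrix}.$$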
Comments:
(i) The asymptotic normality of $\hat\theta_{B\text{-}2SLS}$ encompasses as a special case the results in HHB,
where $\alpha_1 = \alpha_2 = 0$.
(ii) The rates of convergence of the estimated parameters of the exogenous variables $\theta^0_{z_1}$ (standard
rate $T^{1/2}$) and of the estimated parameters of the endogenous variables $\theta^0_x$ (slower rate $T^{1/2-\alpha}$)
are extensions of the results developed by Antoine and Renault (2009) over stable reduced
forms (see e.g. their Theorem 4.1). The rate $T^{1/2-\alpha}$ comes from the strongest subsample and
holds even when the weakest subsample is genuinely weak, or even unidentified as would be
the case if $\Pi_j = 0$ for some $j$. As discussed in Antoine and Renault (2009), reliable inference
on $\theta^0$ can be obtained using standard GMM-type formulas without having to know or estimate
the matrix $\Lambda_T$.
(iii) To our knowledge, the consistency of both $\hat\theta_{GMM}$ and $\hat\theta_{B\text{-}GMM}$, even when the weakest
subsample is genuinely weak (that is, $\alpha_j = 1/2$), is new. Hence, ignoring the break-point, as
in $\hat\theta_{GMM}$, does not lead to a loss of consistency, because the population moment condition in
A1(ii) still holds. However, using the RF break is crucial for efficiency, as shown below.
In Theorem 2, we show that B-GMM is at least as efficient as GMM, and provide a necessary
and sufficient condition for its strict efficiency.
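Theorem 2. (Efficiency of estimated structural parameters)
(i) Under A1 to A5, $V_{B\text{-}GMM} \leq V_{GMM}$ and $V_{B\text{-}GMM} \leq V_{B\text{-}2SLS}$.[10]
(ii) Under A1 to A5, $V_{B\text{-}GMM} = V_{GMM}$ if and only if $N_{u,1}^{-1}M_1\Pi_1^a = N_{u,2}^{-1}M_2\Pi_2^a$.
(iii) Under A1 to A5, $V_{B\text{-}GMM} < V_{GMM}$ if and only if

$$\mathrm{rank}\left(N_{u,1}^{-1}M_1\Pi_1^a - N_{u,2}^{-1}M_2\Pi_2^a\right) = p. \tag{3.2}$$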
Comments:
(i) Theorem 2 formalizes the result that when there are changes in the limiting Hessian due
to a RF break, using this break leads to more efficient estimators than the standard GMM.
Intuitively, more information can be extracted due to these changes. Also, because B-GMM
uses the optimal weighting matrix, it is more efficient than B-2SLS.
(ii) With a change in identification strength and no exogenous regressors ($p = p_2$), $\hat\theta_{B\text{-}GMM}$
is strictly more efficient than $\hat\theta_{GMM}$. To see this, note that if $r_{iT} = o(r_{jT})$, then $\Pi_j^a = O_{(q,p)}$, so
$\mathrm{rank}(N_{u,1}^{-1}M_1\Pi_1^a - N_{u,2}^{-1}M_2\Pi_2^a) = \mathrm{rank}[(-1)^{i-1}N_{u,i}^{-1}M_i\Pi_i] = p$. Thus, $V_{B\text{-}GMM} < V_{GMM}$. This
strict efficiency result holds even if the subsample $j$ is weakly identified. Intuitively, when
computing B-GMM, the subsample moments are stacked and "multiplied by their strength".
As a result, the influence (variance) of the weak moments disappears in the limit. By contrast,
GMM adds the two moments to obtain the full sample moments. Therefore, the variances of
both the weak and the strong moments show up even asymptotically. Thus, GMM is strictly
less efficient than B-GMM.
(iii) If $\Pi_j = 0$ for one subsample $j \neq i$, Theorem 2 still holds. To understand the intuition
for this result, suppose there are no exogenous regressors. Then, from Theorem 1,
$V_{B\text{-}GMM} = (\Pi_i'M_iN_{u,i}^{-1}M_i\Pi_i)^{-1}$ and $V_{GMM} = (\Pi_i'M_iN_u^{-1}M_i\Pi_i)^{-1}$. Since $N_{u,i} < N_u$, it follows
that $V_{B\text{-}GMM} < V_{GMM}$.
(iv) If in addition to Assumptions A1-A5, we also impose (full sample) conditional homoskedasticity
($\mathrm{Var}(u_t \mid Z_t) = \Phi_u$ and $Q = E(Z_tZ_t')$), then $V_{B\text{-}GMM} = V_{B\text{-}2SLS} < V_{GMM}$, as
shown in the proof of Theorem 2.
(v) Besides A3 and A4, Theorem 2 assumes nothing about the SMI and the VMC. Thus,
the strict-efficiency result in Theorem 2(iii) holds when there is a break in SMI and/or VMC.
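The variance formulas of Theorem 1 and the rank condition (3.2) are easy to evaluate numerically for given limiting quantities. The sketch below does so for a made-up configuration in the spirit of comments (iii) and (iv): no exogenous regressors, one endogenous regressor, two instruments, a break at mid-sample, conditional homoskedasticity, and a second subsample that is completely unidentified ($\Pi_2 = 0$). The function name `asymptotic_variances` and all numerical inputs are illustrative assumptions.

```python
import numpy as np

def asymptotic_variances(Pi1a, Pi2a, M1, M2, Nu1, Nu2):
    """Evaluate V_GMM, V_B-2SLS and V_B-GMM from the formulas of Theorem 1."""
    Nu = Nu1 + Nu2
    G1, G2 = M1 @ Pi1a, M2 @ Pi2a                     # (q, p) blocks
    G = G1 + G2
    V_gmm = np.linalg.inv(G.T @ np.linalg.solve(Nu, G))
    bread = np.linalg.inv(Pi1a.T @ G1 + Pi2a.T @ G2)
    meat = Pi1a.T @ Nu1 @ Pi1a + Pi2a.T @ Nu2 @ Pi2a
    V_b2sls = bread @ meat @ bread
    V_bgmm = np.linalg.inv(G1.T @ np.linalg.solve(Nu1, G1)
                           + G2.T @ np.linalg.solve(Nu2, G2))
    return V_gmm, V_b2sls, V_bgmm

# Hypothetical limits: Q = I, unit error variance, break fraction 0.5.
lam = 0.5
M1, M2 = lam * np.eye(2), (1 - lam) * np.eye(2)
Nu1, Nu2 = lam * np.eye(2), (1 - lam) * np.eye(2)
Pi1a = np.array([[1.0], [0.5]])
Pi2a = np.zeros((2, 1))                               # unidentified subsample

V_gmm, V_b2sls, V_bgmm = asymptotic_variances(Pi1a, Pi2a, M1, M2, Nu1, Nu2)

# Rank condition (3.2) for strict efficiency of B-GMM over GMM.
B = np.linalg.solve(Nu1, M1 @ Pi1a) - np.linalg.solve(Nu2, M2 @ Pi2a)
rank_B = np.linalg.matrix_rank(B)
print(V_gmm, V_b2sls, V_bgmm, rank_B)
```

In this configuration the function returns 3.2 for GMM and 1.6 for both B-2SLS and B-GMM, and the matrix in (3.2) has rank $p = 1$: the full sample estimator pays for the variance of the uninformative moments, while the split-sample estimators do not.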
Below, we compare B-GMM with GMM in the absence of a SMI and a VMC break. The
following assumption facilitates this comparison.
Assumption A6. (Homogeneity of the second moments)
(i) (no SMI break) $M_{1r} = rQ$, with $Q = M_{11} = E(Z_tZ_t')$;
(ii) (no VMC break) $N_{1r} = rN$, and $N_u = N_{u,11} = \lim_{T\to\infty} \mathrm{Var}\big(\sum_{t=1}^{T}(Z_tu_t)/\sqrt{T}\big)$.

[10] For two square matrices $V_A$, $V_B$, we write $V_A \leq V_B$ if $V_A - V_B$ is negative semidefinite, and $V_A < V_B$ if $V_A - V_B$ is negative definite.
A6 imposes no time variation in SMI and VMC. Thus, A6(i) does not allow for SMI breaks
and A6(ii) does not allow for VMC breaks.
Comments:
We define $B \equiv N_{u,1}^{-1}M_1\Pi_1^a - N_{u,2}^{-1}M_2\Pi_2^a$. Under A6, $B = N_u^{-1}Q(\Pi_1^a - \Pi_2^a)$.
(i) Under A6 and in the absence of exogenous regressors, $B = N_u^{-1}Q(\Pi_1 - \Pi_2)$. Therefore,
with pure structural change, e.g. $\mathrm{rank}(\Pi_1 - \Pi_2) = p$, we have that $\mathrm{rank}(B) = p$. In this case,
$\hat\theta_{B\text{-}GMM}$ is always strictly more efficient than $\hat\theta_{GMM}$.
(ii) Under A6 and in the presence of exogenous regressors, $B = [O_{(q,p_1)}, N_u^{-1}Q(\Pi_1 - \Pi_2)]$,
and $\mathrm{rank}(B) < p$. Thus, in general, $(V_{GMM} - V_{B\text{-}GMM})$ is only positive semi-definite: not all
linear combinations of the B-GMM estimators are strictly more efficient than the same linear
combinations of their GMM counterparts. However, with a change in identification strength,
the B-GMM estimates on the endogenous regressors ($\theta^0_x$) are strictly more efficient than their
GMM counterparts, i.e. the $(p_2, p_2)$ lower right block of $V_{GMM} - V_{B\text{-}GMM}$ is positive definite
(see Appendix B, proof of Theorem 2).
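For completeness, the simplification of $B$ under A6 used in these comments follows directly from the definitions in Theorem 1 evaluated at the break fraction $\lambda_{RF}$. Spelled out as a short derivation:

```latex
\begin{aligned}
M_1 &= M_{1\lambda_{RF}} = \lambda_{RF}\,Q, &
M_2 &= M_{11} - M_{1\lambda_{RF}} = (1-\lambda_{RF})\,Q, \\
N_{u,1} &= N_{u,1\lambda_{RF}} = \lambda_{RF}\,N_u, &
N_{u,2} &= N_{u,11} - N_{u,1\lambda_{RF}} = (1-\lambda_{RF})\,N_u, \\
N_{u,1}^{-1}M_1 &= N_u^{-1}Q, &
N_{u,2}^{-1}M_2 &= N_u^{-1}Q, \\
B &= N_{u,1}^{-1}M_1\Pi_1^a - N_{u,2}^{-1}M_2\Pi_2^a
   = N_u^{-1}Q\,(\Pi_1^a - \Pi_2^a).
\end{aligned}
```

The break fraction cancels in each product, so under A6 any difference between the two subsamples' contributions can only come from $\Pi_1^a - \Pi_2^a$.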
To summarize, when there is a RF break, B-GMM is more efficient and, in many cases,
strictly more efficient than GMM.
3.2 B-GMM with a SMI break or a VMC break
In this section, we show that B-GMM is still more efficient than GMM if a SMI or a VMC
break is used to construct B-GMM instead of a RF break. A SMI break is defined in (2.6),
and a VMC break is defined in (2.7)-(2.8). We assume below that the magnitude of these
breaks is fixed.[11]

Assumption A7. (SMI or VMC break)
(i) SMI break: in (2.6), $Q_1 - Q_2 \neq O_{(q,q)}$.
(ii) VMC break: in (2.7) and (2.8), $N_{u,1} - N_{u,2} \neq O_{(q,q)}$.

We assume first that the RF has no break and then discuss the extension to a RF break
at the end of Corollary 2. We also assume below that the SMI or the VMC break-fraction
$\lambda^0 \in \{\lambda_{SMI}, \lambda_{VMC}\}$ can be estimated consistently by some break-fraction estimator $\hat\lambda \in \{\hat\lambda_{SMI}, \hat\lambda_{VMC}\}$, and we discuss such estimators in section 4.

Assumption A8. (SMI and VMC break-point estimators)
$\hat\lambda - \lambda^0 = O_P(T^{-1})$.

[11] If these breaks were to shrink with the sample size instead, then they would not show up as changes in the limiting Hessian of the GMM objective function, and would therefore not be of use in constructing asymptotically more efficient estimators than the full-sample GMM.
Note that the above rate of consistency is faster than that of the estimated RF break
fraction; see also A5. The following corollary gives the asymptotic distribution of the GMM
estimators and of the B-2SLS and B-GMM estimators, constructed with $\hat\lambda$ replacing $\hat\lambda_{RF}$.

Corollary 1.
Let $\Lambda_T = \mathrm{diag}(T^{1/2}I_{p_1}, T^{1/2-\alpha}I_{p_2})$ as in Theorem 1. Let A1 hold with $\lambda_{RF}$ replaced by $\lambda^0$, and
A3, A4 hold with $\Pi_i = \Pi$ of full rank and with $\alpha = \alpha_i < 1/2$ for $i = 1, 2$. Also let A7(i) or
A7(ii) hold, and let A8 hold for B-2SLS and B-GMM. Then $\Lambda_T(\hat\theta_{B\text{-}2SLS} - \theta^0)$, $\Lambda_T(\hat\theta_{GMM} - \theta^0)$,
and $\Lambda_T(\hat\theta_{B\text{-}GMM} - \theta^0)$ are asymptotically normally distributed with mean 0 and asymptotic
variances:

$$V_{GMM} = \left[\Pi^{a\prime}(M_1 + M_2)(N_u)^{-1}(M_1 + M_2)\Pi^a\right]^{-1}$$

$$V_{B\text{-}2SLS} = \left[\Pi^{a\prime}(M_1 + M_2)\Pi^a\right]^{-1}\left[\Pi^{a\prime}N_u\Pi^a\right]\left[\Pi^{a\prime}(M_1 + M_2)\Pi^a\right]^{-1}$$

$$V_{B\text{-}GMM} = \left[\Pi^{a\prime}(M_1N_{u,1}^{-1}M_1 + M_2N_{u,2}^{-1}M_2)\Pi^a\right]^{-1},$$

with $N_u = N_{u,1} + N_{u,2}$, $N_{u,i} = N_{u,i\lambda^0}$, $M_i = M_{i\lambda^0}$, for $i = 1, 2$, $\Pi^a = (\Pi_{z_1}, \Pi)$, and with $M_{ir}$
and $N_{u,ir}$ defined in A3 and A4 respectively.
Comments:
(i) As stated in the introduction, B-GMM optimally reweighs the two subsample moment
conditions by their respective variance in that subsample. This is closely related to Chamber-
lain’s (1987) idea of constructing optimal instruments by reweighing the original instruments
with their respective conditional variances.
(ii) If there is a RF break that coincides with T 0 = [Tλ0], then Theorem 1 holds as stated.
We then recommend using λ instead of λRF in constructing B-GMM because of its faster
convergence rate; see A5 and A8. If the RF break is different from the SMI or VMC break,
then Corollary 1 holds over the subsamples with no RF break.
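As an illustration of comment (i), the following is a minimal sketch of a two-step estimator that stacks the two subsample moment conditions and weights each block by its own estimated variance, for a linear model with a known break date. It is not the authors' code: the function name `b_gmm`, the first-step weights, the absence of a HAC correction, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def b_gmm(y, W, Z, T1):
    """Hypothetical helper: two-step linear GMM with moments stacked over the
    subsamples [0, T1) and [T1, T) and a block-diagonal weighting matrix."""
    T = len(y)
    segments = [slice(0, T1), slice(T1, T)]

    def solve(weights):
        # Closed form of linear GMM when the weighting matrix is block diagonal.
        A = np.zeros((W.shape[1], W.shape[1]))
        c = np.zeros(W.shape[1])
        for seg, Om in zip(segments, weights):
            G = Z[seg].T @ W[seg] / T   # subsample cross moment of instruments and regressors
            b = Z[seg].T @ y[seg] / T   # subsample cross moment of instruments and outcome
            A += G.T @ Om @ G
            c += G.T @ Om @ b
        return np.linalg.solve(A, c)

    # Step 1 (assumption): 2SLS-type weights, one block per subsample.
    step1 = [np.linalg.inv(Z[s].T @ Z[s] / T) for s in segments]
    theta1 = solve(step1)

    # Step 2: each block is reweighted by the inverse variance of its own moments
    # (heteroskedasticity-robust, no HAC correction in this sketch).
    u = y - W @ theta1
    step2 = []
    for s in segments:
        Zu = Z[s] * u[s, None]
        step2.append(np.linalg.inv(Zu.T @ Zu / T))
    return solve(step2)

# Illustrative data (assumed): one RF break at T1 = 160, true intercept and slope equal to 0.
rng = np.random.default_rng(0)
T, T1, rho = 400, 160, 0.5
Zt = rng.standard_normal((T, 3))
eps = rng.standard_normal(T)
v = rho * eps + np.sqrt(1 - rho**2) * rng.standard_normal(T)
d = np.where(np.arange(T) < T1, 0.29, 1.29)
X = 1 + d * Zt.sum(axis=1) + v
y = eps
W = np.column_stack([np.ones(T), X])
Z = np.column_stack([np.ones(T), Zt])
theta_hat = b_gmm(y, W, Z, T1)
```

With an estimated break, `T1` would be replaced by the break-point estimate; the sketch leaves that step out.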
Below, we give conditions under which B-GMM is (strictly) more efficient than GMM. For
the B-2SLS and B-GMM estimators computed with λRF replaced by λ, we have:
Corollary 2.
Let A1 hold with λRF replaced by λ0, and A3, A4 with Π1 = Π2 = Π of full rank and
α1 = α2 = α < 1/2. Also let A7(i) or A7(ii) hold, and let A8 hold. Then:
(i) VB−GMM ≤ VGMM and VB−GMM ≤ VB−2SLS.
(ii) $V_{B\text{-}GMM} = V_{GMM}$ if and only if $N_{u,1}^{-1} M_1 \Pi^a = N_{u,2}^{-1} M_2 \Pi^a$.
(iii) $V_{B\text{-}GMM} < V_{GMM}$ if and only if $\mathrm{rank}[(N_{u,1}^{-1} M_1 - N_{u,2}^{-1} M_2)\Pi^a] = p$.
Comments:
(i) If we use the SMI break-point estimator to construct B-GMM, and A6(ii) holds, there is no VMC break. In that case, $N_{u,1} = \lambda_0 N_u$, $N_{u,2} = (1-\lambda_0) N_u$, $M_1 = \lambda_0 Q_1$ and $M_2 = (1-\lambda_0) Q_2$. Hence, if $B \equiv (N_{u,1}^{-1} M_1 - N_{u,2}^{-1} M_2)\Pi^a$, then $B = N_u^{-1}(Q_1 - Q_2)\Pi^a$. Since $(Q_1 - Q_2)$ is full rank, $\mathrm{rank}(B) = \mathrm{rank}(\Pi^a) = p$, so B-GMM is strictly more efficient than GMM, with or without exogenous regressors. A similar result holds under A6(i) (no SMI break) when we use the VMC break-point estimator to construct B-GMM, in which case $N_{u,1} = \lambda_0 S_{u,1}$, $N_{u,2} = (1-\lambda_0) S_{u,2}$, and we need $(S_{u,1} - S_{u,2})$ to be full rank.
(ii) Note that if there is a RF break equal to T 0, then Theorem 2 holds as stated. If there
is a RF break not equal to T 0, then Corollary 2 holds over the subsamples with no RF break.
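The ordering in Corollary 2 can be checked numerically in the pure SMI-break case of comment (i). The matrices below are arbitrary illustrative choices, not values from the paper. In this special case the difference between the two inverse variances equals $\lambda_0(1-\lambda_0)\,\Pi^{a\prime}(Q_1 - Q_2)N_u^{-1}(Q_1 - Q_2)\Pi^a$, which follows from expanding the two quadratic forms, and the sketch verifies that identity as well.

```python
import numpy as np

# Illustrative (assumed) inputs: q = 3 instruments, p = 2 parameters.
lam0 = 0.4
Pi_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Q1 = np.eye(3)                     # second moment of instruments before the break
Q2 = np.diag([2.0, 3.0, 4.0])      # second moment of instruments after the break
N_u = np.eye(3)                    # variance of moment conditions, no VMC break

M1, M2 = lam0 * Q1, (1 - lam0) * Q2
N1, N2 = lam0 * N_u, (1 - lam0) * N_u

# Inverse asymptotic variances of GMM and B-GMM as in Corollary 1.
info_gmm = Pi_a.T @ (M1 + M2) @ np.linalg.inv(N1 + N2) @ (M1 + M2) @ Pi_a
info_bgmm = Pi_a.T @ (M1 @ np.linalg.inv(N1) @ M1 + M2 @ np.linalg.inv(N2) @ M2) @ Pi_a
V_gmm = np.linalg.inv(info_gmm)
V_bgmm = np.linalg.inv(info_bgmm)

# Rank condition of Corollary 2(iii) and the implied strict ordering.
B = (np.linalg.inv(N1) @ M1 - np.linalg.inv(N2) @ M2) @ Pi_a
rank_B = np.linalg.matrix_rank(B)
min_eig = np.linalg.eigvalsh(V_gmm - V_bgmm).min()

# Closed form of the information gain in the pure SMI-break case.
gain = lam0 * (1 - lam0) * Pi_a.T @ (Q1 - Q2) @ np.linalg.inv(N_u) @ (Q1 - Q2) @ Pi_a
```

Here `rank_B` equals p = 2 and `min_eig` is strictly positive, so the example is an instance of part (iii).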
The results in sections 3.1 and 3.2 suggest that we can use the union of RF, SMI and VMC
breaks to obtain more efficient estimates. In practice, there may also be multiple breaks in
RF, SMI or VMC. Theorems 1-2 and their corollaries straightforwardly extend to multiple
breaks.12
4 Break-point estimators
In this section, we discuss three break-point estimators: the univariate estimator in Bai and
Perron (1998) (BP henceforth), the multivariate estimator in Qu and Perron (2007) (QP
henceforth) and a new multivariate estimator which we propose in this section. To understand
the differences between these estimators, it is useful to start with a break in the RF.
We first introduce some notations that simplify the exposition. Let T1λ = [Tλ], and T2λ =
T − [Tλ]. For any parameter estimator or sum, the subscript 1λ refers to estimation or
summation in the segment $1, \ldots, T_{1\lambda}$ (e.g. $\hat\Pi_{1\lambda}$ or $\sum_{1\lambda}$), and the subscript $2\lambda$ refers to
estimation or summation in the segment $T_{1\lambda} + 1, \ldots, T$ (e.g. $\hat\Pi_{2\lambda}$ or $\sum_{2\lambda}$).
4.1 RF break-point estimators
Consider one RF break as given in (2.5).13 This break is assumed common and can be
estimated from any RF equation using the univariate break-point estimator in BP. Below we
describe this estimator for the first RF equation.
Let Xt,1 be the first element of the (p2, 1) vector Xt and Πiλ,1 be the first column of the
(q, p2) matrix of OLS estimators of the RF parameters, Πiλ for i = 1, 2. The BP estimator
12For the interested reader, a discussion and theoretical results on the detection and estimation of
multiple breaks can be found in the November 2015 version of this paper, which can be found at
http://www.sfu.ca/∼baa7/research/AntoineBoldeaWP2015. It is important to note that if two adjacent sub-
samples are weakly identified, that particular break cannot be consistently estimated. Our analysis in this
paper only covers break-point estimators that are consistently estimable if they occur, and therefore if there
are multiple breaks, we do not allow for two adjacent samples that are both weakly identified.
13Our results are stated for one break but bear a straightforward generalization to multiple breaks.
We consider the framework of section 3.1 with one endogenous regressor X , q valid instruments
(including the intercept), and one break in the reduced form at TRF :
\[ y_t = \alpha + X_t\beta + \sigma_t\epsilon_t, \qquad X_t = \begin{cases} 1 + Z_t'\Pi_1 + v_t, & t \le T_{RF} \\ 1 + Z_t'\Pi_2 + v_t, & t > T_{RF} \end{cases}, \qquad E[\epsilon_t Z_t] = 0, \; E[v_t Z_t] = 0. \tag{6.1} \]
The errors $(\epsilon_t, v_t)$ are i.i.d. jointly normally distributed with mean 0, variances $\sigma_\epsilon^2 = \sigma_v^2 = 1$ and correlation $\rho$; the instruments $Z_t$ are i.i.d. jointly normally distributed with mean zero and variance-covariance matrix equal to the identity matrix, and independent of $(\epsilon_t, v_t)$. Let $\iota_k$ be the $(k, 1)$ vector of ones, and let $R_i^2 = \mathrm{Var}(Z_t'\Pi_i)/\mathrm{Var}(Z_t'\Pi_i + v_t)$ be the theoretical R-square over subsamples $i = 1$ with $t \le T_{RF}$ and $i = 2$ with $t > T_{RF}$. Then the model parameters are chosen such that:
\[ (\alpha \;\; \beta) = (0 \;\; 0), \qquad \Pi_i = d_i\,\iota_{q-1} \;\; (i = 1, 2), \quad \text{with } d_1 = \sqrt{\frac{R_1^2}{(q-1)(1-R_1^2)}}, \quad d_2 = d_1 + b. \]
We consider three versions of the model: homoskedastic (HOM) with $\sigma_t^2 = 1$; heteroskedastic (HET1) with $\sigma_t^2 = (1 \;\; Z_t')(1 \;\; Z_t')'/q$; and heteroskedastic (HET2) of the GARCH(1,1) type, with $\sigma_t^2 = 0.1 + 0.6\,u_{t-1}^2 + 0.3\,\sigma_{t-1}^2$ and $u_t = \sigma_t\epsilon_t$.
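A minimal sketch of how one sample from design (6.1) could be drawn is given below. The function name `simulate_design`, the seed handling, and the GARCH start-up value (the unconditional variance 0.1/(1 − 0.6 − 0.3) = 1) are assumptions made for the example; the paper does not state how the recursion is initialized.

```python
import numpy as np

def simulate_design(T=400, T_rf=160, q=4, R2_1=0.2, b=1.0, rho=0.5,
                    design="HOM", seed=0):
    """Hypothetical helper drawing (y, X, Z, sigma2) from model (6.1) with alpha = beta = 0."""
    rng = np.random.default_rng(seed)
    k = q - 1                                   # instruments other than the intercept
    Z = rng.standard_normal((T, k))
    eps = rng.standard_normal(T)
    # v has unit variance and correlation rho with eps.
    v = rho * eps + np.sqrt(1.0 - rho**2) * rng.standard_normal(T)

    d1 = np.sqrt(R2_1 / (k * (1.0 - R2_1)))
    d = np.where(np.arange(1, T + 1) <= T_rf, d1, d1 + b)   # Pi_i = d_i * iota_{q-1}
    X = 1.0 + d * Z.sum(axis=1) + v

    if design == "HOM":
        sigma2 = np.ones(T)
    elif design == "HET1":
        sigma2 = (1.0 + (Z**2).sum(axis=1)) / q
    elif design == "HET2":
        sigma2 = np.empty(T)
        sigma2[0] = 1.0                         # assumption: start at the unconditional variance
        for t in range(1, T):
            u_prev = np.sqrt(sigma2[t - 1]) * eps[t - 1]
            sigma2[t] = 0.1 + 0.6 * u_prev**2 + 0.3 * sigma2[t - 1]
    else:
        raise ValueError("design must be 'HOM', 'HET1' or 'HET2'")

    y = np.sqrt(sigma2) * eps                   # alpha + X * beta is zero since alpha = beta = 0
    return y, X, Z, sigma2
```

Under HET1 the expected value of $\sigma_t^2$ is $(1 + (q-1))/q = 1$, so the three designs share the same unconditional error variance.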
To assess the identification strength over each subsample, we report the concentration
parameter, which is defined over each subsample i of size Ti as:
\[ \mu_i^2 = T_i R_i^2 / (1 - R_i^2). \]
We are interested in the slope parameter β and we compare the performance of three esti-
mators of β: B-GMM, B-2SLS and GMM. In experiment 1, their performances are evaluated
by computing the Monte-Carlo bias, standard deviations, root-mean squared errors (RMSE),
as well as the length and coverage of corresponding 95% confidence intervals20, for various
configurations of the model. In experiment 2, we investigate these performance measures as a
function of the location of the break. Finally, in experiment 3, we investigate the finite sample
properties of the above estimators when there is no break.
• Experiment 1:
Our benchmark model considers sample size T = 400, endogeneity parameter ρ = 0.5,
one RF break at TRF = 160 and a break size b = 1. We use q = 4 instruments (including
the intercept). The identification strength is strong over each subsample with associated
concentration parameters $\mu_1^2 = 40$ and $\mu_2^2 = 1.2 \times 10^3$: the implied reduced form parameters are $d_1 = 0.29$ and $d_2 = 1.29$, and the theoretical R-squares over each subsample are $R_1^2 = 0.2$ and $R_2^2 = 0.83$.
20The standard errors of each estimator are computed using the formulas in Theorem 1. We use HAC-type estimators under conditional heteroskedasticity.
We explore different configurations of the model. First, we decrease $\mu_1^2$ to display weaker identification in the first subsample, while the second subsample remains strong: $\mu_1^2 = 8.4$ (and $R_1^2 = 0.05$), and $\mu_1^2 = 1.6$ (and $R_1^2 = 0.01$). The break size is still b = 1, but the implied reduced form parameters are now $d_1 = 0.13$ and $d_1 = 0.06$, respectively. Then, we consider a larger sample size, T = 800, more instruments, q = 6, or a larger endogeneity parameter, ρ = 0.75. In all these experiments, the break is assumed to be known, and the results are displayed in Tables 5 to 7 (for the HOM and HET cases). The results for cases where the break location is unknown and estimated are displayed in Tables 8 to 10 (for the HOM and HET cases): three break sizes, 1, 0.5, and 0.2, are considered; $\mu_1^2 = 40$ and $d_1 = 0.29$ throughout, while $d_2 = 1.29$, 0.79, and 0.49. All the results are based on 5,000 replications.
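The calibration values quoted for Experiment 1 follow from the definitions of $d_i$, $R_i^2$ and $\mu_i^2$ together with $\mathrm{Var}(Z_t'\Pi_i) = (q-1)d_i^2$ and $\mathrm{Var}(v_t) = 1$. The short sketch below reproduces them; the function name `calibrate` is hypothetical.

```python
import math

def calibrate(R2_1, b, T, T_rf, q):
    """Hypothetical helper returning (d1, d2, R2_2, mu2_1, mu2_2) for design (6.1)."""
    k = q - 1
    d1 = math.sqrt(R2_1 / (k * (1 - R2_1)))
    d2 = d1 + b
    var_fit_2 = k * d2**2                 # Var(Z'Pi_2) when Z ~ N(0, I)
    R2_2 = var_fit_2 / (var_fit_2 + 1)    # Var(v) = 1
    mu2_1 = T_rf * R2_1 / (1 - R2_1)
    mu2_2 = (T - T_rf) * R2_2 / (1 - R2_2)
    return d1, d2, R2_2, mu2_1, mu2_2

benchmark = calibrate(R2_1=0.2, b=1.0, T=400, T_rf=160, q=4)
weaker = [calibrate(R2_1=r, b=1.0, T=400, T_rf=160, q=4) for r in (0.05, 0.01)]
smaller_breaks = [calibrate(R2_1=0.2, b=b, T=400, T_rf=160, q=4) for b in (0.5, 0.2)]
```

Rounded to the precision used in the text, `benchmark` gives $d_1 = 0.29$, $d_2 = 1.29$, $R_2^2 = 0.83$, $\mu_1^2 = 40$ and $\mu_2^2 \approx 1.2 \times 10^3$.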
When the break is known, the main results do not vary much over the different specifica-
tions. We therefore focus on the benchmark model mentioned at the beginning of Experiment
1. Under homoskedasticity, as expected, B-GMM and B-2SLS are very close in terms of
bias, standard deviation, and RMSE. Their RMSE are significantly smaller than for GMM.
The biases of B-GMM and B-2SLS tend to be larger than that of GMM, but they are well-
compensated by the gains in terms of standard deviation; in addition, when the sample size
increases, these biases decrease as expected. When looking at the 95% confidence intervals
of the slope parameter, B-GMM displays the shortest ones while maintaining good cover-
age properties. Under conditional heteroskedasticity, the standard deviation and RMSE of
B-GMM are much smaller than those of B-2SLS.
When the break-point is treated as unknown, the break size is important for the accuracy
of the estimated break location. With a break size of 1, the estimated break is quite reliable
with an average (over the estimated breaks) very close to the actual break: the average is
161.3 with a true break at 160. When the break size decreases, the quality of the estimator
of the break location deteriorates: for instance, with a true break at 160 and a break size of
0.2, the average is 172.4. Reliable estimation of the location of the break is crucial for the
bias properties of B-GMM and B-2SLS; we can see that when the break is not accurately esti-
mated, their biases increase, and the coverage properties of the confidence intervals worsen.21
This bias should not be too much of a concern, because it only appears when the break size
is small, and, oftentimes, such small breaks cannot be detected; see also experiment 3 below.
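To make the role of the break size concrete, the sketch below estimates a single break date in one reduced-form equation by minimizing the total sum of squared residuals over candidate dates. This is a simplified least-squares search in the spirit of the univariate BP estimator, not the exact procedure used in the paper; the function names, the 15% trimming, and the simulated data are assumptions.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared OLS residuals from regressing y on X."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return resid @ resid

def ls_break_date(x, Z, trim=0.15):
    """Hypothetical helper: number of observations in the first regime that
    minimizes the total SSR of the two segment regressions of x on (1, Z)."""
    T = len(x)
    R = np.column_stack([np.ones(T), Z])
    candidates = range(int(trim * T), int((1 - trim) * T) + 1)
    total = [ssr(x[:k], R[:k]) + ssr(x[k:], R[k:]) for k in candidates]
    return candidates[int(np.argmin(total))]

# Illustrative data (assumed): benchmark RF with a break of size b = 1 at observation 160.
rng = np.random.default_rng(1)
T, T_rf = 400, 160
Z = rng.standard_normal((T, 3))
d = np.where(np.arange(T) < T_rf, 0.29, 1.29)
x = 1 + d * Z.sum(axis=1) + rng.standard_normal(T)
est_break = ls_break_date(x, Z)
```

Rerunning the example with a smaller shift in `d` (for instance from 0.29 to 0.49) should typically give a less precise `est_break`, in line with the deterioration described above.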
• Experiment 2: Performance as a function of the true location of the break.
Given the results of section 3, at least asymptotically, it is always efficient to "split" the sample in order to double the number of moments. In finite samples, however, the size of the subsamples and their identification strength may matter. Therefore, we investigate how the performance of all estimators varies with the true location of the break and the strength of identification in each subsample.
21One remedy consists in discarding the data around the estimated break (e.g. in a confidence interval for the break location). This simple strategy should mitigate the drawback from estimating the break inaccurately. However, it does require the asymptotic distribution of the break, which is beyond the scope of this paper.
We consider two versions of the model in (6.1), both with T = 400, ρ = 0.5, q = 4, $d_1 = 0.1925$, $R_1^2 = 0.1$, and a break location that changes from (0.1 × 400) to (0.9 × 400); accordingly, the identification strength over the first subsample is characterized by $\mu_1^2$ between 4.4 and 40:
• model (i): the break size is b = −0.385 with associated parameter $d_2 = -0.1925$ and $R_2^2 = 0.1$. The identification strength in the second subsample is characterized through $\mu_2^2$ with values between 40 and 4.4.
• model (ii): the break size is b = −0.5 with associated parameter $d_2 = -0.3075$ and $R_2^2 = 0.22$. The identification strength in the second subsample remains somewhat strong and is characterized through $\mu_2^2$ with values between 102.1 and 11.3.
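The ranges of the concentration parameters quoted for the two models can be traced out by moving the break location across the sample. The sketch below uses $R_i^2/(1-R_i^2) = (q-1)d_i^2$, which holds in design (6.1) because the instruments have identity covariance and $\mathrm{Var}(v_t) = 1$; the grid of break fractions and the variable names are assumptions.

```python
import math

T, q = 400, 4
k = q - 1
d1 = math.sqrt(0.1 / (k * (1 - 0.1)))      # implied by R_1^2 = 0.1

def mu2(d, T_i):
    """Concentration parameter T_i * R^2 / (1 - R^2) with R^2 / (1 - R^2) = (q - 1) * d^2."""
    return T_i * k * d**2

# Each row: (T1, mu_1^2, mu_2^2 in model (i), mu_2^2 in model (ii)).
rows = []
for frac in (0.1, 0.3, 0.5, 0.7, 0.9):
    T1 = round(frac * T)
    rows.append((T1, mu2(d1, T1), mu2(d1 - 0.385, T - T1), mu2(d1 - 0.5, T - T1)))
```

The first and last rows give the endpoints reported in the text, up to the rounding of $d_1$ and $d_2$ used there.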
The results under homoskedasticity and conditional heteroskedasticity are presented in
Figures 3 and 4: two measures of performance are considered, the Monte-Carlo RMSE (left),
and the Monte-Carlo standard deviation (right). All results are based on 5,000 replications.
In model (i), both Monte-Carlo RMSE and standard deviations for B-GMM estimators are
stable as the break location changes from (0.1 × 400) to (0.9 × 400). This is quite different
for GMM. Both its RMSE and standard deviation are larger, and they are both increasing as
a function of the location of the break until it is in the middle of the sample, then they are
decreasing to return to their original levels. Results for model (ii) are very similar.
• Experiment 3: Finite sample properties when there is no break.
We now investigate the finite sample properties of our estimators when there is no RF break,
but a break-point is still estimated and used to compute B-GMM and B-2SLS estimators. We
consider the following versions of the model in (6.1): T = 400 or T = 800, ρ = 0.5, q = 4, and
di = 0.29 for i = 1, 2. There is no break in RF because we set Π1 = Π2 = diι4. However, the
econometrician still believes that there is a break in RF and estimates it in order to compute
B-GMM and B-2SLS; the overall GMM estimator is also computed as a benchmark. Finite
sample properties of these three estimators are evaluated by computing the Monte-Carlo bias,
standard deviations, root-mean squared errors (RMSE), as well as the length and coverage of
corresponding 95% confidence intervals. The results under HOM and HET1 are presented in
Table 11; in Table 12, the sample size is doubled.
First, it is interesting to point out that the estimated break-point is near the middle of
the sample on average. Second, and without much surprise, the finite sample performance of
B-GMM (and B-2SLS) is not as good as standard GMM in absence of a break. Nevertheless,
under both homoskedasticity and conditional heteroskedasticity, the standard deviations and
RMSEs of the three estimators are very close to each other, especially for larger sample sizes;
this holds despite the bias of the GMM estimator being much smaller than for B-GMM and
B-2SLS. This supports the (asymptotic) results derived in Theorem 7.
7 The New Keynesian Phillips Curve
7.1 Model
To our knowledge, virtually all papers estimate the New Keynesian Phillips Curve (NKPC)
at the quarterly frequency - see, among others, Sbordone (2002, 2005), Christiano, Eichenbaum and
Evans (2005), Zhang, Osborn and Kim (2008), HHB and Magnusson and Mavroeidis (2014).
This comes with two shortcomings. The first is small samples, which lead to unreliable
estimation in the presence of multiple breaks. The second is that US monetary policy
is set twice per quarter, so quarterly NKPC models may not be as informative as monthly
NKPC models.
In this paper, we estimate the NKPC at the monthly frequency, which mitigates these
shortcomings.22 However, we face two additional challenges. The first is data availability,
which we solve in the data section below. The second is how to write down a reasonable
monthly NKPC, given the empirical evidence that prices are indexed to the previous quarter.
With the exception of Zhang, Osborn and Kim (2008), most NKPC papers assume that
when prices cannot be re-optimized, they are either kept fixed or indexed to last quarter’s
inflation and not to more lags of inflation - see Galí and Gertler (1999), Sbordone (2002,
2005), Smets and Wouters (2003), Christiano, Eichenbaum and Evans (2005) and Magnusson
and Mavroeidis (2014). Many of the above papers find strong evidence that prices are indexed
to the previous quarter, and therefore to at least three lags of monthly inflation.23
More than one lag in the NKPC model can be obtained if one imposes dynamics in the
marginal cost process or in the structural shocks - see Krogh (2015) for a summary. It can
also be obtained from more primitive assumptions on the price indexation mechanism. For
example, Zhang, Osborn and Kim (2008) derive a NKPC model with multiple lags, but they
follow Galí and Gertler (1999) in assuming that backward and forward-looking firms co-exist.
The backward-looking firms never optimize and index to past lags of inflation, while the
forward-looking firms either keep their prices fixed or re-optimize when chosen (according to
an exogenous Calvo (1983) type mechanism). We derive the NKPC model in the supplemental
appendix by assuming that all firms are forward-looking when selected to re-optimize. In-
between optimizations, they do not keep their prices fixed, but index them to past lags of
inflation. This is the framework of Smets and Wouters (2003), Woodford (2003, Ch.3), and
22Because of the aforementioned shortcomings, especially issues related to small samples with multiple breaks, we do not use quarterly data. HHB analyze multiple breaks with quarterly data, but point out in their empirical section that some subsamples are too small to allow for a reliable interpretation of the results.
23In our empirical analysis, we found that more lags are insignificant in the NKPC.
Christiano, Eichenbaum and Evans (2005). In the supplemental appendix, we generalize these
models (with sticky prices but not sticky wages) to capture price indexation to ℓ previous lags
rather than one lag of inflation.24
To illustrate the usefulness of the B-GMM estimation procedure for the monthly NKPC, we
first estimate an unconstrained version of the NKPC derived in the supplemental appendix,
where the parameters are not constrained to satisfy the nonlinear restrictions implied by the
where yt is the output gap (in log difference from steady-state output), πt is inflation measured
as the log difference in prices between two periods, and inflation expectations at time t are
denoted πet .25
7.2 Data
We construct our dataset from monthly observations for the US in the period 1978:1-2010:6.
The inflation series is the annualized difference in log CPI taken from the FRED database,
i.e. $\pi_t = 1200(\log CPI_t - \log CPI_{t-1})$. The output gap is the monthly log real GDP minus the log potential real GDP, calculated by the HP filter with monthly constant 14400: $y_t = 100(\log(\text{real GDP}_t) - \log(\text{potential real GDP}_t))$. A monthly real GDP series is available up to 2010:6 at http://www.princeton.edu/∼mwatson/mgdp gdi. We proxy inflation
expectations with the one year-ahead Michigan inflation expectations πet , the only series of
measured inflation expectations that, to our knowledge, is free and available at monthly fre-
quency.26 The Michigan inflation series starts at 1978:1. After constructing all series and their
lags, our data span is 1978:5-2010:6 (386 observations). Figure 1 plots the data.27
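The two transformations above can be sketched as follows. The HP trend is computed here by solving the standard penalized least-squares problem $(I + \lambda D'D)\tau = x$, with $D$ the second-difference operator and $\lambda = 14400$. Applying the filter to log real GDP, the function names, and the synthetic input series are assumptions; the actual CPI and GDP data are not reproduced.

```python
import numpy as np

def annualized_inflation(cpi):
    """pi_t = 1200 * (log CPI_t - log CPI_{t-1}); the first month is lost."""
    return 1200 * np.diff(np.log(cpi))

def hp_trend(x, lamb=14400.0):
    """HP trend: minimizer of sum (x - tau)^2 + lamb * sum (second difference of tau)^2."""
    T = len(x)
    D = np.zeros((T - 2, T))
    for i in range(T - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return np.linalg.solve(np.eye(T) + lamb * D.T @ D, x)

def output_gap(real_gdp, lamb=14400.0):
    """y_t = 100 * (log real GDP_t - HP trend of log real GDP_t)."""
    log_gdp = np.log(real_gdp)
    return 100 * (log_gdp - hp_trend(log_gdp, lamb))

# Synthetic monthly series (assumed), 1978:1-2010:6 = 390 months.
months = np.arange(390)
cpi = 100 * 1.0025**months            # constant 0.25% monthly price growth
gdp = np.exp(9 + 0.002 * months)      # log GDP on an exact linear trend
infl = annualized_inflation(cpi)
gap = output_gap(gdp)
```

A series that is exactly log-linear has zero second differences, so its HP trend coincides with the series and the implied gap is zero, which gives a simple sanity check of the filter.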
24As summarized by Krogh (2015) and shown in Mavroeidis (2005) and Nason and Smith (2008), second order dynamics in either the marginal cost or the structural errors are useful for identifying the NKPC. Our empirical results - see in particular Table 2 - suggest that besides the lags we use, no further dynamics are needed for identification, since the instruments we use are not weak over the entire sample.
25Galí and Gertler (1999) attribute the usual findings of negative and/or insignificant estimates of αy at quarterly frequency to measurement errors in potential output. They propose using the NKPC model with output gap replaced by marginal cost, which is not observed and is therefore proxied by the average unit labor cost. However, to our knowledge, average unit labor costs are not available monthly. Moreover, there is still a lot of criticism regarding the use of average unit labor cost as a proxy - see e.g. Rudd and Whelan (2005).
26Proxying inflation expectations with this series is common in the literature, and even recommended by Coibion and Gorodnichenko (2015), who argue that this series introduces substantial information about the unobserved firm level inflation expectations defined in the NKPC.
27We find that our results are robust to the inflation outlier around 2008, so we do not remove it.
For all methods, Zt includes an intercept, three lags of inflation, three lags of expected inflation
and three lags of output gap. Since πet and yt are both endogenous, we have two reduced forms,
projecting πet and yt onto the instrument set Zt. Without breaks, they are:
$\pi_t^e = Z_t'\Delta_1 + v_{t1}$, (7.2)
$y_t = Z_t'\Delta_2 + v_{t2}$. (7.3)
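A minimal sketch of how the instrument set and the no-break reduced forms (7.2)-(7.3) could be assembled is given below. The function names and the synthetic series are assumptions. The series length of 389 corresponds to 1978:2-2010:6, the span left after first-differencing log CPI, so dropping three lags leaves the 386 observations reported in section 7.2.

```python
import numpy as np

def lag_matrix(series, n_lags):
    """Columns are lags 1..n_lags of `series`; the first n_lags rows are dropped."""
    T = len(series)
    return np.column_stack([series[n_lags - j:T - j] for j in range(1, n_lags + 1)])

def build_instruments(infl, infl_exp, gap, n_lags=3):
    """Z_t = (1, three lags of inflation, of expected inflation, and of the output gap)."""
    T = len(infl)
    blocks = [np.ones((T - n_lags, 1))]
    blocks += [lag_matrix(s, n_lags) for s in (infl, infl_exp, gap)]
    return np.hstack(blocks)

def reduced_form_ols(target, Z):
    """OLS coefficients of the projection of `target` on Z."""
    return np.linalg.lstsq(Z, target, rcond=None)[0]

# Synthetic placeholder series (assumed), 389 monthly observations each.
rng = np.random.default_rng(0)
infl, infl_exp, gap = (rng.standard_normal(389) for _ in range(3))

Z = build_instruments(infl, infl_exp, gap)
Delta1 = reduced_form_ols(infl_exp[3:], Z)   # counterpart of (7.2)
Delta2 = reduced_form_ols(gap[3:], Z)        # counterpart of (7.3)
```

With breaks, the same projections would be run separately on each subsample delimited by the estimated break dates.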
• SMI break. As discussed in section 2, the variance of the instruments changed after
the Great Moderation, so we have a SMI break. The SMI break can be estimated via our
multivariate FGLS estimator applied to all the unique elements of the ZtZ′t matrix. However,
because Zt contains lags, this is not necessary, and we chose to estimate the SMI break as a
joint break in the multivariate mean of squared inflation, squared expected inflation
and squared output gap.28 For these variables, the BP estimators of a break in mean are
1981:9, 1981:11 and 1983:5. Using any of these estimators as a starting value, the FGLS
estimator of the Great Moderation break is 1981:11.
• RF breaks. As discussed in section 2, the RF also features the Great Moderation break,
which we impose at 1981:11 because this estimator converges faster than the RF break-point
estimator of the Great Moderation - see comment (ii) after Corollary 1. Additionally,
the RF likely features the oil price collapse - see e.g. Bernanke (2004) and Galí and Gambetti
(2008). The BP estimator of the oil price collapse in the sample 1981:12-2010:6 is at 1985:12
for the first RF and at 1985:10 for the second RF. Using any of these estimators as a starting
value, the FGLS estimator of the oil price collapse in the post Great-Moderation sample
1981:12-2010:6 is at 1985:10.29
• VMC breaks. As discussed in section 2, the SMI break is likely a VMC break too and
thus it is already taken into account in the construction of B-GMM.30
7.3.1 Results for the baseline specification
In our baseline specification, the structural parameters of interest in (7.1) are assumed to be
stable over time; robustness checks with potential structural parameter breaks are in section
7.3.2. For the validity of our estimation results with B-GMM in this section, we need suffi-
ciently strong identification as in Assumption A2. Evidence for this is also provided in section
7.3.2.
28Note that lags of these will exhibit the same SMI breaks.
29Even though the oil price collapsed in the first half of 1986, it experienced a sharp decline prior to that, in 1985 - see Gately (1986), Figure 6.
30Further robustness checks regarding the number of break-points and the estimated locations of the breakpoints can be found in the November 2015 version of this paper:
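that $N_2 = O_P(N_1)\,O_P(\epsilon)$. Similarly, $N_4 = O_P(N_3)\,O_P(\epsilon)$. Next, we compare $(\hat\Pi_{2\lambda_0} - \hat\Pi_\Delta)$ and $(\hat\Pi_{1\lambda} - \hat\Pi_\Delta)$. Since $\hat\Pi_{1\lambda}$ and $\hat\Pi_\Delta$ are both subsample estimators of $\Pi_1^0$,
\[ \hat\Pi_{1\lambda} - \hat\Pi_\Delta = (\hat\Pi_{1\lambda} - \Pi_1^0) - (\hat\Pi_\Delta - \Pi_1^0) = O_P(T^{-1/2}) + O_P(T^{-1/2}) = O_P(T^{-1/2}). \]
Since $\hat\Pi_{2\lambda_0}$ is the estimator of $\Pi_2^0$ in subsample $[T^0 + 1, T]$, $\hat\Pi_{2\lambda_0} - \Pi_2^0 = O_P(T^{-1/2})$, so
\[ \hat\Pi_{2\lambda_0} - \hat\Pi_\Delta = (\hat\Pi_{2\lambda_0} - \Pi_2^0) - (\hat\Pi_\Delta - \Pi_1^0) + \Pi_\Delta^0 = O_P(T^{-1/2}) + \Pi_\Delta^0 = O_P(T^{-\alpha}). \]
Thus, $T^\alpha(\hat\Pi_{2\lambda_0} - \hat\Pi_\Delta) = O_P(1)$ and $T^\alpha(\hat\Pi_{1\lambda} - \hat\Pi_\Delta) = o_P(1)$. By A3, $Q_\Delta = O_P(1)$ for large enough $C$. Therefore, $N_1 = T^\alpha(\hat\Pi_{2\lambda_0} - \hat\Pi_\Delta)'\,Q_\Delta\,T^\alpha(\hat\Pi_{2\lambda_0} - \hat\Pi_\Delta) = O_P(1)$ for large $C$, while $N_3 = o_P(1)$. All of these show that the probability limit of $(T^0 - \hat T)^{-1} T^{2\alpha+1}(L_1 - L_3)$ is determined by the probability limit of $N_1$ for small enough $\epsilon$, because $N_2 = O_P(N_1\epsilon)$, $N_4 = O_P(N_3\epsilon)$ and thus dominated by $N_1$ for small $\epsilon$. For large enough $C$, by A3,
\[ N_1 = T^{2\alpha}[O_P(T^{-1/2}) - \Pi_\Delta^0]'\,Q_\Delta\,[O_P(T^{-1/2}) - \Pi_\Delta^0] \ge \|\Pi_\Delta^0\|^2\,\mathrm{mineig}(Q_\Delta) + o_P(1) > C^* + o_P(1), \]
where $C^*$ is a positive constant because $\mathrm{plim}\,Q_\Delta$ is pd by A3. This implies that $(T^0 - \hat T)^{-1} T^{2\alpha+1}(L_1 - L_2) > 0$ with positive probability, which cannot hold. This completes the proof.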
• Part (ii). For the QP estimator, under the assumptions imposed, Lemma 1 in QP can
be verified by following step by step the proof of Lemma A3, which is exactly as in the
supplemental appendix of QP and omitted for simplicity (intuitively, nothing changes because
the magnitude of the RF break in parameters and in the variance is all that matters for proving
consistency of the QP estimator). Therefore, the convergence rate is as stated in Theorem 3.
• Part (iii). Because $\hat\lambda - \lambda_0 = O_P(T^{2\alpha-1})$, it can be shown that the asymptotic distributions of the parameter estimators $\hat\beta_{i\lambda} = \mathrm{vec}(\hat\Pi_{i\lambda})$ are as if the breaks were known. Therefore, we are back in the standard regression model framework, and so $\hat\Sigma_{v,t}$, defined for the FGLS estimator in two ways, is consistent under both definitions: $\hat\Sigma_{v,t} \overset{P}{\to} \Sigma_v$.
Consistency of λ. This proof is similar to part (i). For a given λ, we denote by β1λ, β2λ the
FGLS estimators of the RF parameters obtained from minimizing LFGLS(λ, βi) over β1, β2.
Let δ0β = β02 − β0
1 = T−α[δ0β + o(1)] = O(T−α), where δ0β = (Π2 − Π1) if α1 = α2, δ0β = Π1 if
α1 < α2 and δ0β = −Π2 if α2 < α1. Then by A2, δ0β ≠ 0, and:
0 with positive probability, which cannot happen, because by definition, $(T^0 - \hat T)^{-1} T^{2\alpha+1}(L_1 - L_2) \le 0$. It follows that $T^0 - \hat T \le C T^{2\alpha}$, so $\hat\lambda - \lambda_0 = O_P(T^{2\alpha-1})$.
Proof of Theorem 4.
The proof follows exactly the same steps as the proof for Theorem 3, with α replaced by 0,
and for break-point estimators for equation (4.1) instead of equation (2.5).
Proof of Theorem 5.
• Part (i). Consistency of λ. Unlike the other proofs, here we do not prove consistency by
contradiction; instead we show consistency by deriving the limit of the minimand directly and
applying the continuous mapping theorem. Let at = jtu2t and at = jtu