Efficient Estimation with Time-Varying Information and
the New Keynesian Phillips Curve∗
Bertille Antoine
Simon Fraser University
Bertille [email protected]
Otilia Boldea
Tilburg University
August 8, 2017
Abstract
Decades of empirical evidence suggest that many macroeconometric and financial
models are subject to both instability and identification problems. In this paper, we
address both issues under the unified framework of time-varying information, which in-
cludes changes in instrument strength, changes in the second moment of instruments,
and changes in the variance of moment conditions. We develop a new estimation method
that exploits these changes to increase the efficiency of the estimates of the (stable) struc-
tural parameters. We estimate a New Keynesian Phillips Curve and obtain more precise
estimates of the price indexation and output gap parameters than standard methods. An
extensive simulation study shows that our method delivers substantial efficiency gains
in finite samples.
Keywords: GMM; Weak instruments; Break-point; Change in identification strength.
JEL classification: C13, C22, C26, C36, C51.
∗An earlier version of this paper was circulated as "Efficient Inference with Time-Varying Identification Strength". We thank
Lars Peter Hansen, Stephane Bonhomme, eight anonymous referees, Jaap Abbring, Jorg Breitung, Bin Chen, Carolina Caetano,
Jeff Campbell, Xu Cheng, Valentina Corradi, Frank Diebold, Herman van Dijk, Dennis Fok, Alastair Hall, Lynda Khalaf, Maral
Kichian, Frank Kleibergen, Sophocles Mavroeidis, Adam McCloskey, Nour Meddahi, Ulrich Muller, Serena Ng, Eric Renault,
Bernard Salanie, Frank Schorfheide, Burak Uras, Bas Werker, seminar participants at Boston College, Brown U, Columbia U,
Emory, Erasmus U Rotterdam, Georgia State U, Guelph U, LSE, Rochester U, TI Amsterdam, Tilburg U, TSE, U Ottawa, U
Penn, U Washington, Western U, and at the conferences: EC2 (Maastricht, 2012), CESG (Kingston, 2012), ESEM (Malaga, 2012;
Goteborg, 2013), NESG (Amsterdam, 2013), CIREQ (Montreal, 2014) for their helpful comments. Otilia Boldea acknowledges
NWO VENI grant 415-11-001, and the hospitality of the UPenn Economics Department, where part of this research was conducted.
1 Introduction
Magnusson and Mavroeidis (2014) point out that time variation in the data generating process
can be used to improve inference on stable structural parameters in time series models. They
exploit this time variation in the presence of arbitrarily weak identification. Since no point
estimator is fully robust to weak identification, they focus on hypothesis testing. However,
practitioners are often interested in point estimators as well.
In this paper, we exploit time variation in the data generating process to develop more
efficient point estimators of the stable structural parameters in time series models. Our main
identification assumption is that the information about the structural parameters grows with
the sample size in a manner that allows consistent estimation of the model by GMM methods.1
In other words, our estimation results rely on non-weak identification in a subsample of the
data. While such a restriction is not imposed by Magnusson and Mavroeidis (2014), it allows
us to efficiently conduct subvector inference.
We focus on linear models and on time variation in the probability limit of the second
derivative of the GMM minimand (which we call limiting Hessian), because the limiting
Hessian summarizes the "amount of information about the parameters (of the model) that is
contained in the observations of the sample" (Davidson and MacKinnon (2004)). We exploit
three types of time variation in the limiting Hessian: breaks in the reduced form parameters2,
breaks in the second moment of the instruments, and breaks in the variance of the moment
conditions.
Breaks in the reduced form parameters are strongly motivated by the work of Hall, Han and
Boldea (2012) and of Magnusson and Mavroeidis (2014). What is new in this paper is that we
link changes in the reduced form parameters to potential changes in instrument strength over
the sample. Allowing for changes in instrument strength means that the instruments need
not be strong over the entire sample, but that they can be weak or even uninformative over
subsamples. As long as the instruments are not weak in two adjacent subsamples so that all
break-points can be consistently estimated, we provide more efficient point estimators than
currently available.
Breaks in the second moment of instruments are motivated by events such as the Great
Moderation, which caused a variance decline in many macroeconomic variables - see Stock and
Watson (2002). Breaks in the variance of the moment conditions are motivated by financial
crises, which cause a surge in the variance of structural shocks - see Rigobon (2003).
We propose a new estimator, Break-GMM (B-GMM henceforth), that uses these three types
of breaks to compute and stack split-sample moments, reweighing these moments differently
over different subsamples according to the time variation in their variance. We show that
1 This does not mean that the identification has to be sufficiently strong over the entire sample. The identification is allowed to be weak over part of the sample, as we explain in the following paragraphs.
2 For lack of better terminology, we refer to the projection of regressors on instruments as the reduced form.
B-GMM is more efficient than the standard full sample GMM. We show that this holds even
when a subsample is completely unidentified.
Because we allow for weaker identification patterns in the data along with changes in
identification strength, our paper contributes to the weak-identification literature: see the
surveys by Stock, Wright, and Yogo (2002), Andrews and Stock (2005), and Hansen, Hausman,
and Newey (2008). It also contributes to the break-point literature, since our results generalize
the estimation methods in Bai and Perron (1998) and Hall, Han and Boldea (2012).
In our simulation study, we first consider a model with one reduced form break. We show
that our estimator for the slope parameter has the smallest RMSE irrespective of the location
of the break. Then we consider a model with no reduced form break, and we find that the
standard deviation of B-GMM is still smaller than that of the GMM.3
We illustrate the use of B-GMM in our empirical analysis, where we estimate a monthly New
Keynesian Phillips curve (NKPC) model with multiple lags which we derive from a structural
new Keynesian model with general inflation indexation. We apply the model to a monthly US
dataset which features the prominent Great Moderation and oil price collapse breaks, and find
that the B-GMM procedure delivers more reliable estimates than standard GMM methods.
We find strong evidence for partial rather than full price indexation. Our estimates also
suggest that prices are re-optimized every three months, close to the price rigidity estimates
at the micro-level - see Klenow and Kryvtsov (2008).
The paper is organized as follows. Section 2 defines the three types of changes and motivates
them. Section 3 contains the main contribution of the paper: it introduces the B-GMM
estimator and the associated efficiency results relative to standard GMM methods. Section
4 analyzes three break-point estimators: the Bai and Perron (1998) univariate estimator, the
Qu and Perron (2007) multivariate estimator, as well as a new multivariate estimator which
is better suited for our framework. We also show that these break-point estimators4 for all
three types of changes are consistent at a fast enough rate so that the asymptotic distribution
of the B-GMM estimators is unaffected by the break-point estimation error. Section 5 derives
the asymptotic distribution of the break-point estimators in the absence of a break and for
all three types of changes. We also show that the B-GMM estimator is asymptotically equiv-
alent to the GMM estimator in the absence of a break and under a reasonable homogeneity
assumption for the moments of the data. Section 6 contains the simulation results. Section
7 applies our procedure to the NKPC model and section 8 concludes. The Appendix is orga-
nized as follows: Appendix A contains all the tables and graphs, Appendix B the proofs of all
theorems, and Appendix C contains a generalization of our framework to nonlinear models.
The supplemental appendix contains a general characterization of identification strength, and
a derivation of the NKPC model used in the empirical analysis.
3 They should be asymptotically equivalent, as shown in section 5 of the paper.
4 We do not consider the multivariate Qu and Perron (2007) estimator for all types of changes, since it is not well-suited for our framework, as explained in section 4.
2 Framework and Examples
Consider the standard linear regression model with p1 exogenous variables Z1t, p2 endogenous
variables Xt, p = p1 + p2 parameters of interest θ0, q valid instruments Zt (q ≥ p) that include
the constant and Z1t, and a full column rank matrix Π∗ of size (q, p2):

$$ y_t = W_t'\theta^0 + u_t \,, \quad \text{with } W_t = [Z_{1t}' \; X_t']' \text{ and } \theta^0 = [\theta_{z_1}^{0\prime} \; \theta_x^{0\prime}]', \tag{2.1} $$

$$ X_t' = Z_t'\Pi^* + v_t' \,. \tag{2.2} $$
For lack of better terminology, we call (2.1) the structural equation (or structural form)
and (2.2) the reduced form5 (henceforth RF). In this paper, we focus on efficient estimation
of θ0. The standard optimal full sample GMM estimator $\hat\theta_{GMM}$ minimizes

$$ L_{GMM}(\theta) = g_T'(\theta)\,\hat N_u^{-1}\,g_T(\theta) \quad \text{with} \quad g_T(\theta) = T^{-1}\sum_{t=1}^{T} g_t(\theta), \tag{2.3} $$

$$ g_t(\theta) = Z_t(y_t - W_t'\theta), \quad \text{and} \quad \hat N_u \xrightarrow{P} N_u = \lim_T \left[\mathrm{Var}\!\left(\sqrt{T}\,g_T(\theta^0)\right)\right] \equiv \mathrm{AVar}\!\left(\sqrt{T}\,g_T(\theta^0)\right). $$
The standard optimality and efficiency results for θGMM rely on the stationarity of Wt and
Zt and thus, implicitly, on the assumption that the probability limit of the Hessian of the
GMM minimand LGMM does not change over the sample - see e.g. Hall (2005), Theorem 3.4
and Assumption 3.1. In this paper, we show that when this probability limit changes over the
sample, the associated change-points can be used to construct more moment conditions and
more efficient estimators than θGMM .
As mentioned in the introduction, we refer to the probability limit of the Hessian of the
GMM minimand as the limiting Hessian; this is, under regularity conditions,
$$ \operatorname{Plim}_T \left[ \frac{\partial^2 L_{GMM}(\theta^0)}{\partial\theta\,\partial\theta'} \right] = 2 \left( \operatorname{Plim}_T T^{-1}\sum_{t=1}^{T} W_t Z_t' \right) N_u^{-1} \left( \operatorname{Plim}_T T^{-1}\sum_{t=1}^{T} Z_t W_t' \right) $$

$$ = 2 \begin{bmatrix} E(Z_{1t} Z_t') \\ \Pi^{*\prime} Q \end{bmatrix} N_u^{-1} \begin{bmatrix} E(Z_t Z_{1t}') & Q\,\Pi^* \end{bmatrix}, \tag{2.4} $$

where we assumed E(ZtZt') = Q. When Π∗, Q, or Nu change over the sample, the limiting
Hessian changes in general, and the associated change-points (or breaks) can be interacted
with the instruments to construct more moment conditions and more efficient GMM
estimators6: Π∗ quantifies the instrument strength, Q is related to the instrument variance, and Nu
is related to the variance of the structural error ut. We now motivate each type of break.
5 Both equations can, but do not need to, originate from a structural model. In our empirical analysis, (2.1) originates from a structural model, but the reduced forms (2.2) are the projections of regressors on instruments. The reduced forms should be viewed, throughout the paper, as projections of endogenous regressors on instruments and not be confused with the full reduced form of a structural model.
6 Changes in the limiting Hessian of the objective function can be used to construct more efficient estimators not only for GMM, but for a wide range of estimation methods, linear or nonlinear: least-squares, maximum likelihood, minimum-distance, etc. Analyzing other estimators is beyond the scope of our paper.
• Changes in reduced form (RF) parameters. Suppose that the reduced form (2.2)
has a break-point at TRF . We consider the following generalized RF that allows for a change
in the identification strength of the instruments:
$$ X_t' = \begin{cases} Z_t'\,\dfrac{\Pi_1}{T^{\alpha_1}} + v_t', & t \le T_{RF} \\[6pt] Z_t'\,\dfrac{\Pi_2}{T^{\alpha_2}} + v_t', & t > T_{RF} \end{cases} \qquad T_{RF} = [T\lambda_{RF}], \tag{2.5} $$

with λRF the RF break fraction, 0 ≤ αi ≤ 1/2, and Πi two (fixed) matrices of size (q, p2)
for i = 1, 2. Over each stable subsample, all the instruments have the same (unknown)
identification strength, characterized by the (drifting) sequence7 T^{αi}: instruments are strong
over subsample i when αi = 0, and weak when αi = 1/2. In other words, the larger αi, the
weaker the associated subsample. In this setting, the break-point TRF may capture two types
of changes: (1) a parameter break with stable identification strength, α1 = α2 and Π1 ≠ Π2,
or (2) a change in identification strength, α1 ≠ α2.
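For intuition, the drifting-strength reduced form (2.5) is straightforward to simulate. In the sketch below (an illustrative design of our own), the first subsample is weak (α1 = 1/2), the second is strong (α2 = 0), and the first-stage fit collapses only over the weak subsample:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000
T_rf = T // 2                      # break date [T * lambda_RF] with lambda_RF = 1/2
alpha1, alpha2 = 0.5, 0.0          # weak before the break, strong after
Pi = np.array([1.0, 1.0])

Z = rng.normal(size=(T, 2))
v = rng.normal(size=T)
x = np.empty(T)
x[:T_rf] = Z[:T_rf] @ (Pi / T**alpha1) + v[:T_rf]   # coefficients shrink at rate T^{-1/2}
x[T_rf:] = Z[T_rf:] @ (Pi / T**alpha2) + v[T_rf:]   # fixed, full-strength coefficients

def first_stage_r2(Zs, xs):
    """OLS R^2 of the reduced form on a subsample, a crude strength measure."""
    coef, *_ = np.linalg.lstsq(Zs, xs, rcond=None)
    resid = xs - Zs @ coef
    return 1.0 - resid.var() / xs.var()

r2_weak = first_stage_r2(Z[:T_rf], x[:T_rf])       # near zero
r2_strong = first_stage_r2(Z[T_rf:], x[T_rf:])     # bounded away from zero
```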
In our empirical analysis, we estimate a NKPC model - see equations (7.1)-(7.3). The
literature suggests there is evidence for a change in identification strength for NKPC, be-
cause, using similar instrument sets, some US NKPC studies find weak instruments for the
sample 1960-2007 (see Mavroeidis (2005), Dufour, Khalaf and Kichian (2006), Nason and
Smith (2008), Kleibergen and Mavroeidis (2009)), while others find strong instruments over
the sample 1969-2005 (see Zhang, Osborn and Kim (2008, 2009)). The results in Kleibergen
and Mavroeidis (2009, Table 4) indicate that this change in identification strength occurred
around the Great Moderation: their confidence sets for the NKPC parameters are much larger
for 1960-1983 than for 1984-2007, suggesting that identification is stronger in the latter period.
• Changes in the second moment of instruments (SMI). For many macroeconomic
variables, including output and inflation, whose lags are used as instruments in our NKPC
application in section 7, there is strong evidence that their variance has declined sharply after
the Great Moderation, see e.g. Stock and Watson (2002). A change in the variance of the
instruments implies in most cases8 a break in the SMI:
$$ E(Z_t Z_t') = Q_1, \;\; t \le T_{SMI} = [T\lambda_{SMI}] \qquad \text{and} \qquad E(Z_t Z_t') = Q_2, \;\; t > T_{SMI}. \tag{2.6} $$
As we prove in section 3, the SMI break-point TSMI can also deliver more efficient estima-
tors of θ0.
7 This framework is standard in the weak-identification literature (see e.g. Staiger and Stock (1997)). More general identification patterns, allowing the strength of identification to vary across instruments and directions of the parameter space, are discussed in the supplemental appendix.
8 The only case when this does not occur is when the decrease in the variance of instruments is equal to the increase in their expected value squared; in that case, the SMI stays the same. There is no empirical evidence that the Great Moderation break is of this type.
• Changes in the (long-run) variance of moment conditions (VMC). Setting λ ∈ (0, 1], we define changes in the VMC below:

$$ \mathrm{AVar}\!\left[ T^{-1/2}\sum_{t=1}^{[T\lambda]} g_t(\theta^0) \right] = \lambda N_{u,1}, \qquad [T\lambda] \le T_{VMC} = [T\lambda_{VMC}], \tag{2.7} $$

$$ \mathrm{AVar}\!\left[ T^{-1/2}\sum_{t=[T\lambda]+1}^{T} g_t(\theta^0) \right] = (1-\lambda) N_{u,2}, \qquad [T\lambda] > T_{VMC}. \tag{2.8} $$
Since the population moments are gt(θ0) = Ztut, the VMC break TVMC may occur because
of a SMI break or a break in the conditional variance of ut. In our empirical application in
section 7, the Great Moderation break is treated as both a SMI break and a VMC break.
Alternatively, asset return models often exhibit breaks in the conditional variance of their
structural error because the volatility of the structural shocks increases substantially in finan-
cial crises, see e.g. Rigobon (2003). We show in section 3 that VMC breaks can also deliver
more efficient estimators of the structural parameters.9
Therefore, the main contribution of this paper is to propose and analyze the Break-GMM
estimator (B-GMM hereafter), which exploits break-points T0 (with T0 ∈ {TRF, TSMI, TVMC})
in RF, SMI and VMC to increase the efficiency of the estimators of the structural parameters θ0,
while maintaining that these parameters do not change over the sample and are sufficiently
strongly identified. As mentioned before, the break T0 can be used to double the moment
conditions for θ0 in (2.1):
$$ E[Z_t u_t \mathbf{1}(t \le T^0)] = 0 \quad \text{and} \quad E[Z_t u_t \mathbf{1}(t > T^0)] = 0. $$
In section 3, we show that these moment conditions are not redundant in the presence of a
break, a result that is closely related to the non-redundancy of split-sample moments discussed
in Proposition 1 of Magnusson and Mavroeidis (2014). More specifically, we analyze the B-GMM
estimators that exploit a RF, a SMI or a VMC break, and we show that the B-GMM
estimators are more efficient than the full sample GMM estimators, and strictly more efficient
under some conditions.
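Concretely, doubling the moment conditions amounts to interacting the instruments with the break indicator, which also yields the stacked (T, 2q) instrument matrix used to compute B-GMM in section 3. A sketch with simulated data (the design and names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
T, q = 400, 2
T0 = 150                           # break date (taken as known here)
Z = rng.normal(size=(T, q))
u = rng.normal(size=T)             # structural error satisfying E[Z_t u_t] = 0

# Interacting instruments with the break indicator doubles the moments: Z_bar is (T, 2q)
pre = (np.arange(T) < T0)[:, None]
Z_bar = np.hstack([Z * pre, Z * (~pre)])

# Both blocks of sample moments are (approximately) zero at the true parameter
g_pre = Z[:T0].T @ u[:T0] / T0
g_post = Z[T0:].T @ u[T0:] / (T - T0)
```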
3 The B-GMM Estimator
3.1 B-GMM with a RF break
Consider the model with a RF break that combines (2.1) and (2.5):

$$ y_t = W_t'\theta^0 + u_t \,, \quad \text{with } W_t = [Z_{1t}' \; X_t']' \text{ and } \theta^0 = [\theta_{z_1}^{0\prime} \; \theta_x^{0\prime}]', $$

$$ X_t' = \begin{cases} Z_t'\,\dfrac{\Pi_1}{T^{\alpha_1}} + v_t', & t \le T_{RF} \\[6pt] Z_t'\,\dfrac{\Pi_2}{T^{\alpha_2}} + v_t', & t > T_{RF} \end{cases} \qquad T_{RF} = [T\lambda_{RF}], $$
9Our result holds whether the regressors are endogenous or not. A proof of this statement is available in
our companion paper Antoine and Boldea (2015).
where λRF is the break fraction, 0 ≤ αi ≤ 1/2, and Πi are two (fixed) matrices of size (q, p2) for
i = 1, 2. The main goal is to deliver efficient estimators of the structural parameters θ0. We
assume that the break TRF has occurred and is common to all RF equations. We do not know
its location, and in this section we impose the high-level assumption that it can be estimated
at a fast enough rate, given in Assumption A5 below. We call this estimator $\hat T_{RF}$, and in
section 4 we discuss both existing break-point estimators, such as the ones in Bai and Perron
(1998) and Qu and Perron (2007), and a new multivariate break-point estimator, all of which
achieve this rate.
We now introduce three estimators of θ0.
• The full sample GMM estimator minimizes the GMM criterion L_GMM defined in (2.3)
and ignores the RF break:

$$ \hat\theta_{GMM} = \left( W'Z\,\hat N_u^{-1}\,Z'W \right)^{-1} \left( W'Z\,\hat N_u^{-1}\,Z'y \right). $$
• The B-2SLS estimator uses first-stage predicted regressors $\hat W_t = [Z_{1t}' \; \hat X_t']'$, obtained
by interacting the instruments with the RF break estimate $\hat T_{RF} = [T\hat\lambda_{RF}]$. It is defined as
in Hall, Han, and Boldea (2012) (HHB henceforth):

$$ \hat\theta_{B\text{-}2SLS} = \left( \sum_{t=1}^{T} \hat W_t \hat W_t' \right)^{-1} \sum_{t=1}^{T} \hat W_t y_t, \qquad \hat X_t' = \begin{cases} Z_t'\,\hat\Pi_{1\hat\lambda_{RF}}, & t \le \hat T_{RF} \\[4pt] Z_t'\,\hat\Pi_{2\hat\lambda_{RF}}, & t > \hat T_{RF} \end{cases} $$

where, for i = 1, 2, $\hat\Pi_{i\hat\lambda_{RF}}$ are the OLS estimators of $\Pi_i/T^{\alpha_i}$ in (2.5) over the subsamples
before and after the estimated break $\hat T_{RF}$.
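The B-2SLS construction can be sketched numerically as follows (our own illustrative design: no exogenous regressors, one endogenous regressor, and the break date treated as given):

```python
import numpy as np

rng = np.random.default_rng(3)
T, q = 600, 3
T_rf = 300                          # (estimated) break date, taken as given here
theta0 = 0.5

Z = rng.normal(size=(T, q))
Pi1 = np.array([1.0, 0.0, 0.0])     # reduced form before the break
Pi2 = np.array([0.0, 1.0, 1.0])     # reduced form after the break
e = rng.normal(size=(T, 2))
u = e[:, 0]
v = 0.6 * e[:, 0] + e[:, 1]         # correlated errors induce endogeneity
x = np.where(np.arange(T) < T_rf, Z @ Pi1, Z @ Pi2) + v
y = theta0 * x + u

# First stage estimated separately on each subsample; predictions then stacked
Pi1_hat, *_ = np.linalg.lstsq(Z[:T_rf], x[:T_rf], rcond=None)
Pi2_hat, *_ = np.linalg.lstsq(Z[T_rf:], x[T_rf:], rcond=None)
x_hat = np.concatenate([Z[:T_rf] @ Pi1_hat, Z[T_rf:] @ Pi2_hat])

# Second stage: regress y on the stacked first-stage predictions
theta_b2sls = float((x_hat @ y) / (x_hat @ x_hat))
```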
• Our proposed B-GMM estimator also interacts the instruments with the RF break:

$$ \hat\theta_{B\text{-}GMM} = \arg\min_\theta \left[ g_T'(\theta)\,(\hat N_u^a)^{-1}\,g_T(\theta) \right], $$

$$ \text{with } g_T(\theta) = \begin{bmatrix} \sum_{t=1}^{\hat T_{RF}} Z_t(y_t - W_t'\theta)/\hat T_{RF} \\[4pt] \sum_{t=\hat T_{RF}+1}^{T} Z_t(y_t - W_t'\theta)/(T - \hat T_{RF}) \end{bmatrix} \quad \text{and} \quad \hat N_u^a \xrightarrow{P} \mathrm{AVar}\!\left( T^{1/2} g_T(\theta^0) \right). $$

Thus,

$$ \hat\theta_{B\text{-}GMM} = \left( W'\bar Z\,(\hat N_u^a)^{-1}\,\bar Z'W \right)^{-1} \left( W'\bar Z\,(\hat N_u^a)^{-1}\,\bar Z'y \right), \tag{3.1} $$

with $\bar Z$ the (T, 2q) matrix defined as

$$ \bar Z' = \begin{pmatrix} Z_1 & \cdots & Z_{\hat T_{RF}} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & Z_{\hat T_{RF}+1} & \cdots & Z_T \end{pmatrix}. $$

B-GMM is the optimal counterpart of B-2SLS, as it uses the optimal weighting matrix $\hat N_u^a$. But
note that B-2SLS is not a standard 2SLS estimator. In addition, the standard GMM is not a
version of B-GMM with a particular weighting matrix in place of $\hat N_u^a$.
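A numerical sketch of (3.1) with simulated data (our own illustrative design; the pilot 2SLS step and the heteroskedasticity-robust variance estimate below are one common feasible choice, not necessarily the paper's exact implementation):

```python
import numpy as np

rng = np.random.default_rng(4)
T, q = 800, 2
T_rf = 400                         # break date, treated as known/estimated
theta0 = 0.5

Z = rng.normal(size=(T, q))
e = rng.normal(size=(T, 2))
u = np.where(np.arange(T) < T_rf, 2.0, 0.5) * e[:, 0]   # error variance also breaks
v = 0.6 * e[:, 0] + e[:, 1]
Pi1, Pi2 = np.array([1.0, 0.5]), np.array([0.5, 1.0])   # RF break
x = np.where(np.arange(T) < T_rf, Z @ Pi1, Z @ Pi2) + v
y = theta0 * x + u
W = x[:, None]

# Interacted instrument matrix Z_bar as in (3.1): block structure by subsample
pre = (np.arange(T) < T_rf)[:, None]
Z_bar = np.hstack([Z * pre, Z * (~pre)])

# Pilot 2SLS with Z_bar, then a block-diagonal moment-variance estimate
coef, *_ = np.linalg.lstsq(Z_bar, x, rcond=None)
x_hat = Z_bar @ coef
theta_pilot = (x_hat @ y) / (x_hat @ x_hat)
u_hat = y - theta_pilot * x
N_hat = (Z_bar * (u_hat**2)[:, None]).T @ Z_bar / T     # block-diagonal by construction

# B-GMM closed form (3.1)
Ninv = np.linalg.inv(N_hat)
A = W.T @ Z_bar @ Ninv @ Z_bar.T @ W
b = W.T @ Z_bar @ Ninv @ Z_bar.T @ y
theta_bgmm = float(np.linalg.solve(A, b)[0])
```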
To derive the asymptotic properties of all estimators, we impose the following regularity
assumptions, in which $h_t$ is defined as $h_t \equiv (u_t, v_t')' \otimes Z_t$ with $i$-th element $h_{t,i}$, and $\sum_{1r} \equiv \sum_{t=1}^{[Tr]}$.
Assumption A 1. (Regularity of the break fraction, error terms and reduced form)
(i) 0 < λRF < 1, and the break-point TRF satisfies min(TRF, T − TRF) ≥ max(q, [εT]).
(ii) - The eigenvalues of $S = \mathrm{AVar}\big(T^{-1/2}\sum_{t=1}^{T} h_t\big)$ are O(1).
- $E(h_{t,i}) = 0$ and, for some d > 2, $\|h_{t,i}\|_d < \infty$ for t = 1, ..., T and i = 1, ..., (p2 + 1)q.
- $h_{t,i}$ is near-epoch dependent with respect to some process $\xi_t$: $\|h_t - E(h_t \mid \mathcal{G}_{t-m}^{t+m})\|_2 \le \ell_m$
with $\ell_m = O(m^{-1/2})$, where $\mathcal{G}_{t-m}^{t+m}$ is a σ-algebra based on $(\xi_{t-m}, \ldots, \xi_{t+m})$.
- $\xi_t$ is either φ-mixing of size $m^{-d/[2(d-1)]}$ or α-mixing of size $m^{-d/(d-2)}$.
A1 is common for the break-point literature, and is similar to HHB. Part (i) states that
the break-fraction is fixed and that there are enough observations to estimate the parameters
before and after the break-point. Part (ii) allows for general weak dependence in the data.
Assumption A 2. (Regularity of the identification strength)
(i) Let $r_{1T} = T^{\alpha_1}$, $r_{2T} = T^{\alpha_2}$, $\alpha = \min(\alpha_1, \alpha_2)$, and $r_T = T^{\alpha}$. We assume that α < 1/2, and
that when α = α1 = α2, then it holds that Π1 ≠ Π2.
(ii) For case (1), where α1 = α2 and Π1 ≠ Π2, we assume that Πi is of full rank p2 for at least
one i ∈ {1, 2}. For case (2), where αi < αj for i, j ∈ {1, 2}, i ≠ j, we assume that Πi is of full
rank p2.
Note that since the slowest sequence riT is associated with the subsample with the strongest
identification, the sequence rT corresponds to the strongest subsample. A2(i) allows for at
most one subsample to be weakly identified. Thus, when there is no change in identification
strength, the identification cannot be weak throughout. A2(ii) ensures that the moment
conditions are not redundant. Note that this assumption allows for Πj = 0 (meaning that
θ0 can also be unidentified over subsample j) as long as Πi is of full rank for subsample i
(i, j ∈ {1, 2}, i ≠ j).
Assumption A 3. (Regularity of the instrumental variables)
Let $\hat M_{1r} = T^{-1}\sum_{1r} Z_t Z_t'$. Then $\hat M_{1r} \xrightarrow{P} M_{1r}$, uniformly in r ∈ [0, 1], where we assume that
$M_{1r}$ is positive definite for all r ∈ (0, 1], and that $(M_{1r_1} - M_{1r_2})$ is positive definite for any
$r_1, r_2 \in [0, 1]$ such that $r_1 > r_2$.
Assumption A 4. (Regularity of the variances)

$$ \mathrm{AVar}\!\left[ T^{-1/2} \sum_{1r} h_t \right] = N_{1r} = \begin{pmatrix} N_{u,1r} & N_{uv,1r}' \\ N_{uv,1r} & N_{v,1r} \end{pmatrix}, $$

uniformly in r ∈ [0, 1], with $N_{u,1r}$, $N_{v,1r}$ of size (q, q), respectively (p2q, p2q). We assume
that $N_{1r}$ is positive definite for all r ∈ (0, 1], and that $N_{1r_1} - N_{1r_2}$ is positive definite for any
$r_1, r_2 \in [0, 1]$ such that $r_1 > r_2$.
Assumption A 5. (Convergence rate of the RF break-point estimator)
$\|\hat\lambda_{RF} - \lambda_{RF}\| = O_P(T^{2\alpha - 1})$.
A3 and A4 are typical for the break-point literature and are used for proving consistency of
the break-point estimators in section 4. In this section, they are just needed to define limiting
quantities that appear in the asymptotic distributions of GMM and B-GMM. Note that A3
allows for a SMI break that may, or may not, coincide with the RF break, and that A4 allows
for heteroskedasticity in the sample moments of the structural equation and the RF. It also
allows for a VMC break that may, or may not, coincide with the RF break. A5 is the high
level assumption on the break-fraction estimator that we verify in section 4.
The following theorem states the asymptotic properties of the estimators of the structural
parameters. Its proof also shows that A1-A4 are sufficient for consistent estimation of θ0 via
GMM, B-2SLS or B-GMM.
Theorem 1. (Asymptotic normality of $\hat\theta_{GMM}$, $\hat\theta_{B\text{-}2SLS}$ and $\hat\theta_{B\text{-}GMM}$)
Let $\Lambda_T = \mathrm{diag}(T^{1/2} I_{p_1}, T^{1/2-\alpha} I_{p_2})$. Under A1 to A4 for GMM, and under A1 to A5 for B-2SLS
and B-GMM, $\Lambda_T(\hat\theta_{GMM} - \theta^0)$, $\Lambda_T(\hat\theta_{B\text{-}2SLS} - \theta^0)$, and $\Lambda_T(\hat\theta_{B\text{-}GMM} - \theta^0)$ are asymptotically
normally distributed with mean 0 and asymptotic variances, respectively:

$$ V_{GMM} = \left[ (\Pi_1^{a\prime} M_1 + \Pi_2^{a\prime} M_2)\, N_u^{-1}\, (M_1 \Pi_1^{a} + M_2 \Pi_2^{a}) \right]^{-1} $$

$$ V_{B\text{-}2SLS} = \left[ \Pi_1^{a\prime} M_1 \Pi_1^{a} + \Pi_2^{a\prime} M_2 \Pi_2^{a} \right]^{-1} \left[ \Pi_1^{a\prime} N_{u,1} \Pi_1^{a} + \Pi_2^{a\prime} N_{u,2} \Pi_2^{a} \right] \left[ \Pi_1^{a\prime} M_1 \Pi_1^{a} + \Pi_2^{a\prime} M_2 \Pi_2^{a} \right]^{-1} $$

$$ V_{B\text{-}GMM} = \left[ \Pi_1^{a\prime} M_1 N_{u,1}^{-1} M_1 \Pi_1^{a} + \Pi_2^{a\prime} M_2 N_{u,2}^{-1} M_2 \Pi_2^{a} \right]^{-1}, $$

with $N_u = N_{u,1} + N_{u,2}$, $N_{u,1} = N_{u,1\lambda_{RF}}$, $N_{u,2} = N_{u,11} - N_{u,1\lambda_{RF}}$, $M_1 = M_{1\lambda_{RF}}$, $M_2 = M_{11} - M_{1\lambda_{RF}}$, and, for i, j = 1 or 2 and i ≠ j:

$$ \Pi_i^{a} = \begin{cases} (\Pi_{z_1}\ \Pi_i), & \alpha_i = \alpha_j \text{ or } \alpha_i < \alpha_j \\ (\Pi_{z_1}\ O_{(q,p_2)}), & \alpha_j < \alpha_i \end{cases} \quad \text{and} \quad \Pi_{z_1} = \begin{bmatrix} I_{p_1} \\ O_{(q-p_1,p_1)} \end{bmatrix}. $$
Comments:
(i) The asymptotic normality of θB−2SLS encompasses as a special case the results in HHB,
where α1 = α2 = 0.
(ii) The rates of convergence of the estimated parameters of the exogenous variables θ0_{z1} (standard
rate T^{1/2}) and of the estimated parameters of the endogenous variables θ0_x (slower rate T^{1/2−α})
extend the results developed by Antoine and Renault (2009) for stable reduced
forms (see e.g. their Theorem 4.1). The rate T^{1/2−α} comes from the strongest subsample and
holds even when the weakest subsample is genuinely weak, or even unidentified, as would be
the case if Πj = 0 for some j. As discussed in Antoine and Renault (2009), reliable inference
on θ0 can be obtained using standard GMM-type formulas without having to know or estimate
the matrix ΛT.
(iii) To our knowledge, the consistency of both θGMM and θB−GMM - even when the weakest
subsample is genuinely weak (that is αj = 1/2) - is new. Hence, ignoring the break-point, as
in θGMM , does not lead to a loss of consistency, because the population moment condition in
A1(ii) still holds. However, using the RF break is crucial for efficiency as shown below.
In Theorem 2, we show that B-GMM is at least as efficient as GMM, and provide a necessary
and sufficient condition for its strict efficiency.
Theorem 2. (Efficiency of estimated structural parameters)
(i) Under A1 to A5, $V_{B\text{-}GMM} \le V_{GMM}$ and $V_{B\text{-}GMM} \le V_{B\text{-}2SLS}$.10
(ii) Under A1 to A5, $V_{B\text{-}GMM} = V_{GMM}$ if and only if $N_{u,1}^{-1} M_1 \Pi_1^{a} = N_{u,2}^{-1} M_2 \Pi_2^{a}$.
(iii) Under A1 to A5, $V_{B\text{-}GMM} < V_{GMM}$ if and only if

$$ \mathrm{rank}\!\left( N_{u,1}^{-1} M_1 \Pi_1^{a} - N_{u,2}^{-1} M_2 \Pi_2^{a} \right) = p. \tag{3.2} $$
Comments:
(i) Theorem 2 formalizes the result that when there are changes in the limiting Hessian due
to a RF break, using this break leads to more efficient estimators than the standard GMM.
Intuitively, more information can be extracted due to these changes. Also, because B-GMM
uses the optimal weighting matrix, it is more efficient than B-2SLS.
(ii) With a change in identification strength and no exogenous regressors (p = p2), $\hat\theta_{B\text{-}GMM}$
is strictly more efficient than $\hat\theta_{GMM}$. To see this, note that if $r_{iT} = o(r_{jT})$, then $\Pi_j^{a} = O_{(q,p)}$, so
$\mathrm{rank}(N_{u,1}^{-1} M_1 \Pi_1^{a} - N_{u,2}^{-1} M_2 \Pi_2^{a}) = \mathrm{rank}[(-1)^{i-1} N_{u,i}^{-1} M_i \Pi_i] = p$. Thus, $V_{B\text{-}GMM} < V_{GMM}$. This
strict efficiency result holds even if subsample j is weakly identified. Intuitively, when
computing B-GMM, the subsample moments are stacked and "multiplied by their strength".
As a result, the influence (variance) of the weak moments disappears in the limit. By contrast,
GMM adds the two moments to obtain the full sample moments. Therefore, the variances of
both the weak and the strong moments show up even asymptotically. Thus, GMM is strictly
less efficient than B-GMM.
(iii) If Πj = 0 for one subsample j ≠ i, Theorem 2 still holds. To understand the intuition
for this result, suppose there are no exogenous regressors. Then, from Theorem 1,
$V_{B\text{-}GMM} = (\Pi_i' M_i N_{u,i}^{-1} M_i \Pi_i)^{-1}$ and $V_{GMM} = (\Pi_i' M_i N_u^{-1} M_i \Pi_i)^{-1}$. Since $N_{u,i} < N_u$, it follows
that $V_{B\text{-}GMM} < V_{GMM}$.
(iv) If, in addition to Assumptions A1-A5, we also impose (full sample) conditional
homoskedasticity ($\mathrm{Var}(u_t \mid Z_t) = \Phi_u$ and $Q = E(Z_t Z_t')$), then $V_{B\text{-}GMM} = V_{B\text{-}2SLS} < V_{GMM}$, as
shown in the proof of Theorem 2.
(v) Besides A3 and A4, Theorem 2 assumes nothing about the SMI and the VMC. Thus,
the strict-efficiency result in Theorem 2(iii) holds when there is a break in SMI and/or VMC.
Below, we compare B-GMM with GMM in the absence of a SMI and a VMC break. The
following assumption facilitates this comparison.
Assumption A 6. (Homogeneity of the second moments)
(i) (no SMI break) $M_{1r} = rQ$, with $Q = M_{11} = E(Z_t Z_t')$;
(ii) (no VMC break) $N_{1r} = rN$, and $N_u = N_{u,11} = \lim_{T\to\infty} \mathrm{Var}\big( \sum_{t=1}^{T} Z_t u_t / \sqrt{T} \big)$.
10For two square matrices VA, VB, we write VA ≤ VB if VA − VB is negative semidefinite, and VA < VB if
VA − VB is negative definite.
A6 imposes no time variation in SMI and VMC. Thus, A6(i) does not allow for SMI breaks
and A6(ii) does not allow for VMC breaks.
Comments:
We define $B \equiv N_{u,1}^{-1} M_1 \Pi_1^{a} - N_{u,2}^{-1} M_2 \Pi_2^{a}$. Under A6, $B = N_u^{-1} Q (\Pi_1^{a} - \Pi_2^{a})$.
(i) Under A6 and in the absence of exogenous regressors, $B = N_u^{-1} Q (\Pi_1 - \Pi_2)$. Therefore,
with pure structural change, e.g. $\mathrm{rank}(\Pi_1 - \Pi_2) = p$, we have that $\mathrm{rank}(B) = p$. In this case,
$\hat\theta_{B\text{-}GMM}$ is always strictly more efficient than $\hat\theta_{GMM}$.
(ii) Under A6 and in the presence of exogenous regressors, $B = [O_{(q,p_1)},\ N_u^{-1} Q (\Pi_1 - \Pi_2)]$,
and $\mathrm{rank}(B) < p$. Thus, in general, $(V_{GMM} - V_{B\text{-}GMM})$ is only positive semi-definite: not all
linear combinations of the B-GMM estimators are strictly more efficient than the same linear
combinations of their GMM counterparts. However, with a change in identification strength,
the B-GMM estimates of the endogenous regressors' coefficients (θ0_x) are strictly more efficient than their
GMM counterparts, i.e. the (p2, p2) lower right block of $V_{GMM} - V_{B\text{-}GMM}$ is positive definite
(see Appendix B, proof of Theorem 2).
To summarize, when there is a RF break, B-GMM is more efficient and, in many cases,
strictly more efficient than GMM.
3.2 B-GMM with a SMI break or a VMC break
In this section, we show that B-GMM is still more efficient than GMM if a SMI or a VMC
break is used to construct B-GMM instead of a RF break. A SMI break is defined in (2.6),
and a VMC break is defined in (2.7)-(2.8). We assume below that the magnitude of these
breaks is fixed.11
Assumption A 7. (SMI or VMC break)
(i) SMI break: in (2.6), $Q_1 - Q_2 \ne O_{(q,q)}$.
(ii) VMC break: in (2.7) and (2.8), $N_{u,1} - N_{u,2} \ne O_{(q,q)}$.
We assume first that the RF has no break, and we discuss the extension to a RF break
at the end of Corollary 2. We also assume below that the SMI or the VMC break-fraction
$\lambda^0 \in \{\lambda_{SMI}, \lambda_{VMC}\}$ can be estimated consistently by some break-fraction estimator $\hat\lambda \in \{\hat\lambda_{SMI}, \hat\lambda_{VMC}\}$; we discuss such estimators in section 4.
Assumption A 8. (SMI and VMC break-point estimators)
$\hat\lambda - \lambda^0 = O_P(T^{-1})$.
11If these breaks were to shrink with the sample size instead, then they would not show up as changes
in the limiting Hessian of the GMM objective function, and would therefore not be of use in constructing
asymptotically more efficient estimators than the full-sample GMM.
Note that the above rate of consistency is faster than that of the estimated RF break
fraction; see also A5. The following corollary gives the asymptotic distribution of the GMM
estimators, and of the B-2SLS and B-GMM estimators constructed with $\hat\lambda$ replacing $\hat\lambda_{RF}$.
Corollary 1.
Let $\Lambda_T = \mathrm{diag}(T^{1/2} I_{p_1}, T^{1/2-\alpha} I_{p_2})$ as in Theorem 1. Let A1 hold with $\lambda_{RF}$ replaced by $\lambda^0$, and
A3, A4 hold with $\Pi_i = \Pi$ of full rank and with $\alpha = \alpha_i < 1/2$ for i = 1, 2. Also let A7(i) or
A7(ii) hold, and let A8 hold for B-2SLS and B-GMM. Then $\Lambda_T(\hat\theta_{B\text{-}2SLS} - \theta^0)$, $\Lambda_T(\hat\theta_{GMM} - \theta^0)$,
and $\Lambda_T(\hat\theta_{B\text{-}GMM} - \theta^0)$ are asymptotically normally distributed with mean 0 and asymptotic
variances:

$$ V_{GMM} = \left[ \Pi^{a\prime} (M_1 + M_2)\, N_u^{-1}\, (M_1 + M_2) \Pi^{a} \right]^{-1} $$

$$ V_{B\text{-}2SLS} = \left[ \Pi^{a\prime} (M_1 + M_2) \Pi^{a} \right]^{-1} \left[ \Pi^{a\prime} N_u \Pi^{a} \right] \left[ \Pi^{a\prime} (M_1 + M_2) \Pi^{a} \right]^{-1} $$

$$ V_{B\text{-}GMM} = \left[ \Pi^{a\prime} \left( M_1 N_{u,1}^{-1} M_1 + M_2 N_{u,2}^{-1} M_2 \right) \Pi^{a} \right]^{-1}, $$

with $N_u = N_{u,1} + N_{u,2}$, $N_{u,i} = N_{u,i\lambda^0}$, $M_i = M_{i\lambda^0}$, for i = 1, 2, $\Pi^{a} = (\Pi_{z_1}, \Pi)$, and with $M_{ir}$
and $N_{u,ir}$ defined in A3 and A4 respectively.
Comments:
(i) As stated in the introduction, B-GMM optimally reweighs the two subsample moment
conditions by their respective variance in that subsample. This is closely related to Chamber-
lain’s (1987) idea of constructing optimal instruments by reweighing the original instruments
with their respective conditional variances.
(ii) If there is a RF break that coincides with $T^0 = [T\lambda^0]$, then Theorem 1 holds as stated.
We then recommend using $\hat\lambda$ instead of $\hat\lambda_{RF}$ in constructing B-GMM, because of its faster
convergence rate; see A5 and A8. If the RF break is different from the SMI or VMC break,
then Corollary 1 holds over the subsamples with no RF break.
Below, we give conditions under which B-GMM is (strictly) more efficient than GMM. For
the B-2SLS and B-GMM estimators computed with $\hat\lambda_{RF}$ replaced by $\hat\lambda$, we have:
Corollary 2.
Let A1 hold with $\lambda_{RF}$ replaced by $\lambda^0$, and A3, A4 with $\Pi_1 = \Pi_2 = \Pi$ of full rank and
$\alpha_1 = \alpha_2 = \alpha < 1/2$. Also let A7(i) or A7(ii) hold, and let A8 hold. Then:
(i) $V_{B\text{-}GMM} \le V_{GMM}$ and $V_{B\text{-}GMM} \le V_{B\text{-}2SLS}$.
(ii) $V_{B\text{-}GMM} = V_{GMM}$ if and only if $N_{u,1}^{-1} M_1 \Pi^{a} = N_{u,2}^{-1} M_2 \Pi^{a}$.
(iii) $V_{B\text{-}GMM} < V_{GMM}$ if and only if $\mathrm{rank}\big[ (N_{u,1}^{-1} M_1 - N_{u,2}^{-1} M_2) \Pi^{a} \big] = p$.
Comments:
(i) If we use the SMI break-point estimator to construct B-GMM, and A6(ii) holds, there is
no VMC break. In that case, $N_{u,1} = \lambda^0 N_u$, $N_{u,2} = (1-\lambda^0) N_u$, $M_1 = \lambda^0 Q_1$ and $M_2 = (1-\lambda^0) Q_2$.
Hence, if $B \equiv (N_{u,1}^{-1} M_1 - N_{u,2}^{-1} M_2) \Pi^{a}$, then $B = N_u^{-1} (Q_1 - Q_2) \Pi^{a}$. Since $(Q_1 - Q_2)$ is full rank,
$\mathrm{rank}(B) = \mathrm{rank}(\Pi^{a}) = p$, so B-GMM is strictly more efficient than GMM, with or without
exogenous regressors. A similar result holds under A6(i) (no SMI break) when we use the VMC
break-point estimator to construct B-GMM, in which case $N_{u,1} = \lambda^0 S_{u,1}$, $N_{u,2} = (1-\lambda^0) S_{u,2}$,
and we need $(S_{u,1} - S_{u,2})$ to be full rank.
(ii) Note that if there is a RF break equal to $T^0$, then Theorem 2 holds as stated. If there
is a RF break not equal to $T^0$, then Corollary 2 holds over the subsamples with no RF break.
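The variance-reweighting behind Corollary 2 can be checked in a stylized Monte Carlo (our own design, separate from the paper's simulation study in section 6): a pure VMC break at a known date with, for simplicity, known subsample error variances. Weighting each split-sample moment by its own variance visibly reduces the sampling variance relative to the full-sample estimator:

```python
import numpy as np

rng = np.random.default_rng(5)
T, T0, reps = 400, 200, 300
sig1, sig2 = 2.0, 0.5              # structural-error s.d. before / after the VMC break
theta0 = 0.5

est_gmm, est_bgmm = [], []
for _ in range(reps):
    z = rng.normal(size=T)
    e = rng.normal(size=(T, 2))
    u = np.where(np.arange(T) < T0, sig1, sig2) * e[:, 0]
    x = z + 0.6 * e[:, 0] + e[:, 1]            # endogenous regressor, stable first stage
    y = theta0 * x + u

    # Full-sample IV (exactly identified, so the weighting matrix is irrelevant)
    est_gmm.append((z @ y) / (z @ x))

    # B-GMM analogue: split-sample moments weighted by their (here: known) variances
    num = den = 0.0
    for sl, sig in [(slice(None, T0), sig1), (slice(T0, None), sig2)]:
        zx, zy = z[sl] @ x[sl], z[sl] @ y[sl]
        Ni = sig**2 * (z[sl] @ z[sl])          # variance of the subsample moment
        num += zx * zy / Ni
        den += zx * zx / Ni
    est_bgmm.append(num / den)

var_gmm, var_bgmm = float(np.var(est_gmm)), float(np.var(est_bgmm))
```

In this design the asymptotic variance ratio favors the reweighted estimator by a wide margin, since the noisy pre-break moments are heavily downweighted.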
The results in sections 3.1 and 3.2 suggest that we can use the union of RF, SMI and VMC
breaks to obtain more efficient estimates. In practice, there may also be multiple breaks in
RF, SMI or VMC. Theorems 1-2 and their corollaries straightforwardly extend to multiple
breaks.12
4 Break-point estimators
In this section, we discuss three break-point estimators: the univariate estimator in Bai and
Perron (1998) (BP henceforth), the multivariate estimator in Qu and Perron (2007) (QP
henceforth) and a new multivariate estimator which we propose in this section. To understand
the differences between these estimators, it is useful to start with a break in the RF.
We first introduce some notation that simplifies the exposition. Let $T_{1\lambda} = [T\lambda]$ and $T_{2\lambda} = T - [T\lambda]$. For any parameter estimator or sum, the subscript $1\lambda$ refers to estimation or summation over the segment $1, \ldots, T_{1\lambda}$ (e.g. $\hat\Pi_{1\lambda}$ or $\sum_{1\lambda}$), and the subscript $2\lambda$ refers to estimation or summation over the segment $T_{1\lambda}+1, \ldots, T$ (e.g. $\hat\Pi_{2\lambda}$ or $\sum_{2\lambda}$).
4.1 RF break-point estimators
Consider one RF break as given in (2.5).$^{13}$ This break is assumed common and can be estimated from any RF equation using the univariate break-point estimator in BP. Below we describe this estimator for the first RF equation.
Let $X_{t,1}$ be the first element of the $(p_2, 1)$ vector $X_t$ and $\hat\Pi_{i\lambda,1}$ be the first column of the $(q, p_2)$ matrix of OLS estimators of the RF parameters, $\hat\Pi_{i\lambda}$, for $i = 1, 2$.
$^{12}$For the interested reader, a discussion and theoretical results on the detection and estimation of multiple breaks can be found in the November 2015 version of this paper, available at http://www.sfu.ca/∼baa7/research/AntoineBoldeaWP2015. It is important to note that if two adjacent subsamples are weakly identified, that particular break cannot be consistently estimated. Our analysis in this paper only covers break-points that are consistently estimable if they occur; therefore, if there are multiple breaks, we do not allow for two adjacent subsamples that are both weakly identified.
$^{13}$Our results are stated for one break but generalize straightforwardly to multiple breaks.
The BP estimator
$\hat T_{RF}^{BP} = [T\hat\lambda_{RF}^{BP}]$ is:
$$\hat\lambda_{RF}^{BP} = \arg\min_{\lambda \in \mathcal{I}} \left[ L^{BP}(\lambda, \hat\Pi_{i\lambda,1}) \right], \quad \text{where} \quad L^{BP}(\lambda, \hat\Pi_{i\lambda,1}) = \sum_{i=1}^{2} T^{-1} \sum_{i\lambda} \left(X_{t,1} - Z_t'\hat\Pi_{i\lambda,1}\right)^2,$$
and $\mathcal{I} = [\iota, 1-\iota]$, where $\iota$ is a small positive cut-off, usually set to 0.15 in practice. We show below that this estimator satisfies the high-level assumption A5 and can therefore be used to construct B-GMM. However, this estimator discards the information in the other RF equations, and is therefore not efficient. A more efficient RF break-point estimator is a multivariate estimator that estimates the common break in all RF equations while taking into account the correlation between these equations.
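For intuition, the BP objective above amounts to a grid search over candidate split points, fitting OLS on each side and keeping the split with the smallest pooled sum of squared residuals. A minimal sketch (illustrative Python, not the authors' code; function and variable names are ours):

```python
import numpy as np

def bp_break(y, Z, trim=0.15):
    """Univariate BP break-fraction estimator: minimize the pooled OLS
    SSR of y_t = Z_t' pi_i + error over candidate splits in [trim, 1-trim]."""
    T = len(y)
    lo, hi = int(np.floor(trim * T)), int(np.ceil((1 - trim) * T))
    best_k, best_ssr = lo, np.inf
    for k in range(lo, hi):
        ssr = 0.0
        for Zs, ys in ((Z[:k], y[:k]), (Z[k:], y[k:])):
            pi_hat, *_ = np.linalg.lstsq(Zs, ys, rcond=None)
            ssr += np.sum((ys - Zs @ pi_hat) ** 2)
        if ssr < best_ssr:
            best_ssr, best_k = ssr, k
    return best_k / T  # estimated break fraction
```

On simulated data with a sizeable reduced-form break, the returned fraction is close to the true one.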
One option is to use the QP estimator. To define it, rewrite (2.5) as:
$$X_t = v_t + \underbrace{(I_{p_2} \otimes Z_t')}_{\mathcal{Z}_t'} \times \begin{cases} \beta_1^0 & t \leq T_{RF} \\ \beta_2^0 & t > T_{RF} \end{cases}, \qquad \beta_i^0 = \left(\Pi_{i,1}'/T^{\alpha_i}, \ldots, \Pi_{i,p_2}'/T^{\alpha_i}\right)', \; i = 1, 2,$$
where $\otimes$ denotes the Kronecker product. Let $\mathrm{Var}(v_t)$ equal $\Sigma_{v,1}$ before the break $T_{RF}$, and $\Sigma_{v,2}$ after the break. Then, the QP break-point estimator $\hat\lambda_{RF}^{QP}$ minimizes the following nonlinear quasi-likelihood objective function:
$$\min_{\lambda, \beta_i, \Sigma_{v,i}} L^{QP}(\lambda, \beta_i, \Sigma_{v,i}), \quad \text{where} \quad L^{QP}(\lambda, \beta_i, \Sigma_{v,i}) = \sum_{i=1}^{2} \sum_{i\lambda} \left[ (X_t - \mathcal{Z}_t'\beta_i)' \Sigma_{v,i}^{-1} (X_t - \mathcal{Z}_t'\beta_i) + \log\det(\Sigma_{v,i}) \right].$$
We also show below that $\hat\lambda_{RF}^{QP}$ satisfies the high-level assumption A5.
However, note that this break-point estimator implicitly estimates the breaks in the RF parameters and in the short-run variance of the error terms jointly. As such, it is not particularly suitable for our framework: once these breaks are estimated, it is unclear which break is which. We would need to know the number of breaks in the RF parameters and in the RF variance, impose these restrictions in the QP objective function above, and only then could we identify the RF parameter break-point estimators. Yet for constructing B-GMM, only the RF parameter breaks are of interest: they lead to a more efficient estimator, while RF variance breaks do not. That RF variance breaks are not useful for improving efficiency can be seen from the asymptotic variance of B-GMM in Theorem 1, which does not feature the RF variance.
Therefore, we propose below a feasible generalized least-squares (FGLS) multivariate break-point estimator that focuses exclusively on estimating the RF parameter breaks efficiently. The FGLS estimator for one break$^{14}$ is constructed as follows:

Step 1 Use as initial break-point estimator $\check T = [T\check\lambda] \in \{\hat T_{RF}^{BP}, \hat T_{RF}^{QP}\}$.

Step 2 Calculate the RF residuals: $\hat v_t = X_t - \mathcal{Z}_t'\hat\beta_{1\check\lambda}$ for $t \leq \check T$, and $\hat v_t = X_t - \mathcal{Z}_t'\hat\beta_{2\check\lambda}$ otherwise.

Step 3 Consistently estimate the short-run RF variance matrix using these residuals. We call this $\hat\Sigma_{v,t}$, since we compute it either over the whole sample, $\hat\Sigma_{v,t} = \hat\Sigma_v = T^{-1}\sum_{t=1}^{T} \hat v_t \hat v_t'$, or over the two subsamples obtained in Step 1, $\hat\Sigma_{v,t} = \hat\Sigma_{v,1}$ for $t \leq \check T$ and $\hat\Sigma_{v,t} = \hat\Sigma_{v,2}$ otherwise, with $\hat\Sigma_{v,i} = \frac{1}{T_{i\check\lambda}} \sum_{i\check\lambda} \hat v_t \hat v_t'$ for $i = 1, 2$.

Step 4 Obtain the multivariate break-point estimator $\hat T_{RF} = [T\hat\lambda_{RF}]$ by minimizing the following feasible generalized least-squares (FGLS) objective function over $(\lambda, \beta_1, \beta_2)$:
$$\min_{\lambda, \beta_i} L^{FGLS}(\lambda, \beta_i), \quad \text{where} \quad L^{FGLS}(\lambda, \beta_i) = \sum_{i=1}^{2} T^{-1} \sum_{i\lambda} (X_t - \mathcal{Z}_t'\beta_i)' \hat\Sigma_{v,t}^{-1} (X_t - \mathcal{Z}_t'\beta_i).$$
$^{14}$The generalization to multiple breaks is straightforward and omitted for brevity.
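Steps 1-4 can be sketched as follows (illustrative Python, not the authors' code; it implements the whole-sample-variance variant of Step 3, and the initial fraction `lam0` stands in for the BP or QP pre-estimator):

```python
import numpy as np

def fgls_break(X, Z, lam0, trim=0.15):
    """Sketch of the FGLS multivariate break-point estimator (Steps 1-4).
    X: (T, p2) reduced-form regressands; Z: (T, q) instruments;
    lam0: initial break fraction (e.g. from the univariate BP step)."""
    T = X.shape[0]
    k0 = int(T * lam0)
    # Steps 2-3: residuals from the initial split, whole-sample variance
    V = np.empty_like(X)
    for sl in (slice(0, k0), slice(k0, T)):
        B, *_ = np.linalg.lstsq(Z[sl], X[sl], rcond=None)
        V[sl] = X[sl] - Z[sl] @ B
    Sinv = np.linalg.inv(V.T @ V / T)          # inverse of Sigma_v estimate
    # Step 4: minimize the GLS-weighted SSR over candidate splits
    lo, hi = int(trim * T), int((1 - trim) * T)
    best_k, best_obj = lo, np.inf
    for k in range(lo, hi):
        obj = 0.0
        for sl in (slice(0, k), slice(k, T)):
            B, *_ = np.linalg.lstsq(Z[sl], X[sl], rcond=None)
            R = X[sl] - Z[sl] @ B
            obj += np.einsum('ti,ij,tj->', R, Sinv, R)
        if obj < best_obj:
            best_obj, best_k = obj, k
    return best_k / T
```

Because the same regressors appear in every RF equation, equation-by-equation OLS coincides with the GLS minimizer for $\beta_i$, so the grid search only has to evaluate the weighted SSR at the OLS fits.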
The proposed FGLS estimator $\hat\lambda_{RF}$ uses the fact that the break is common across all RF equations and is therefore more efficient than its univariate counterpart. Unlike the QP estimator, it is based on a linear objective function, and as we will see below, it allows for breaks in the variance of the error terms without having to estimate their number or their locations. This is reflected in the assumption below.
Assumption A9. (RF variance)
$T^{-1}\sum_{t=1}^{[Tr]} E(v_t v_t') \to r\,\Sigma_v$, uniformly in $r$, with $\Sigma_v$ a p.d. matrix of constants.
This assumption concerns the short-run rather than the long-run RF variance, and is similar to QP but slightly more general. It allows for shrinking breaks in the RF variance and for conditional and unconditional heteroskedasticity, as long as the limiting variance $\Sigma_v$ is constant. The assumption is not needed if the breaks in the RF parameters and in the RF variance are common, in which case we can allow for fixed breaks in the short-run variance; imposing it, however, greatly simplifies the proofs.
The theorem below states the assumptions needed for all break-point estimators above to
satisfy the high-level assumption A5.
Theorem 3. (RF break-point estimators that satisfy A5)
Let $\alpha = \min(\alpha_1, \alpha_2)$.
(i) Under A1-A4, $\|\hat\lambda_{RF}^{BP} - \lambda_{RF}\| = O_P(T^{2\alpha-1})$.
(ii) Under A2-A4, A9 and the QP assumptions A1-A9, with their $X_t$ replaced by our $Z_t$ and their $u_t$ replaced by our $v_t$, $\|\hat\lambda_{RF}^{QP} - \lambda_{RF}\| = O_P(T^{2\alpha-1})$.
(iii) Under A1-A4 and A9, $\|\hat\lambda_{RF} - \lambda_{RF}\| = O_P(T^{2\alpha-1})$.
We now discuss the pros and cons of each of these estimators. The BP estimator only uses information from one RF equation, while the QP and FGLS estimators use information from all RF equations; the latter are therefore more efficient. However, the BP estimator imposes the fewest assumptions on the RF variance of the three estimators. In contrast, the QP and FGLS estimators estimate the RF variance, and therefore require stronger assumptions on it, reflected here in A9.
For our framework and RF breaks, FGLS is in general preferred to QP for two reasons: first, it focuses exclusively on estimating RF parameter breaks; second, it requires fewer assumptions. For example, QP A1 imposes that $\frac{1}{T^0}\sum_{t=1}^{T^0} Z_t Z_t' \xrightarrow{a.s.} Q_1^0$, while for the FGLS estimator we only require that $\frac{1}{T^0}\sum_{t=1}^{T^0} Z_t Z_t' \xrightarrow{p} Q_1^0$, for some positive definite matrix $Q_1^0$, as implicit in A3. QP A4 imposes strong mixing conditions on $z_t v_t$, while the FGLS estimator only requires near-epoch dependence as stated in A1. More importantly, QP A6 imposes that the RF parameter breaks are shrinking, while for the FGLS estimator we can allow for fixed RF parameter breaks.
4.2 SMI break-point estimators
Here, we estimate a break in the unconditional mean of the matrix $Z_t Z_t'$. Let $j_t = \mathrm{vech}(Z_t Z_t')$ be the $q(q+1)/2$ vector which stacks all unique elements of $Z_t Z_t'$ in order. Estimating a SMI break implicitly means estimating $T_{SMI} = [T\lambda_{SMI}]$ from the following equation:
$$j_t = \begin{cases} \mu_1^0 + e_t & t \leq T_{SMI} \\ \mu_2^0 + e_t & t > T_{SMI} \end{cases}, \qquad (4.1)$$
where $E(Z_t Z_t') = Q_1\,1[t \leq T_{SMI}] + Q_2\,1[t > T_{SMI}]$, $\mu_i^0 = \mathrm{vech}(Q_i)$, $i = 1, 2$, and $e_t = j_t - E(j_t)$. Therefore, the BP and FGLS break-point estimators can be constructed as for the RF break, and the assumptions we impose simply replace $X_t$ with $j_t$, $Z_t$ with $q(q+1)/2$ intercepts, and $\Pi_i/r_{iT}$ with $\mu_i^0$ for $i = 1, 2$. A3 is not needed since it is automatically satisfied for intercepts. The remaining assumptions replace A1, A2, A4 and A9.
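Concretely, estimating the SMI break reduces to a break-in-mean problem for $j_t$. A simplified sketch (illustrative Python, not the authors' code; it pools the SSRs of all components with equal weights, i.e. the BP-style variant rather than the FGLS one):

```python
import numpy as np

def vech(M):
    """Stack the unique (lower-triangular) elements of a symmetric matrix."""
    i, j = np.tril_indices(M.shape[0])
    return M[i, j]

def smi_break(Z, trim=0.15):
    """Break in the unconditional mean of vech(Z_t Z_t'): minimize the
    pooled SSR around subsample means over candidate split points."""
    J = np.array([vech(np.outer(z, z)) for z in Z])   # j_t, t = 1..T
    T = J.shape[0]
    lo, hi = int(trim * T), int((1 - trim) * T)
    best_k, best_ssr = lo, np.inf
    for k in range(lo, hi):
        ssr = (((J[:k] - J[:k].mean(0)) ** 2).sum()
               + ((J[k:] - J[k:].mean(0)) ** 2).sum())
        if ssr < best_ssr:
            best_ssr, best_k = ssr, k
    return best_k / T
```

For instance, a doubling of the instrument standard deviation part-way through the sample shifts the mean of the diagonal elements of $Z_t Z_t'$ and is picked up by this grid search.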
Assumption A10. (SMI break)
(i) A1 holds with $h_t$ replaced by $e_t = j_t - E(j_t)$, and $\lambda_{RF}$ replaced by $\lambda_{SMI}$.
(ii) A4 and A9 hold with $v_t$ replaced by $e_t$, and $\Sigma_v$ replaced by $\Sigma_e$.
(iii) $\mu_2^0 - \mu_1^0 = \delta_\mu^0 \neq 0$, where $\mu_1^0$ and $\mu_2^0$ are defined in (4.1).
A10(iii) imposes that the size of the SMI break $\delta_\mu^0$ is fixed; this was also stated in A7(i) and is repeated here for clarity. If $\delta_\mu^0$ were shrinking, the asymptotic variance of the B-GMM would be unchanged, as evident from Corollary 2. Because this break is fixed, it is unclear how to show consistency of the QP estimator, as the proofs in both QP and Bai, Lumsdaine and Stock (1998) rely heavily on shrinking breaks. Because the QP estimator is not our preferred estimator, we focus exclusively on the BP and the FGLS estimators. The theorem below states that both are consistent at a rate equal to the sample size.
Theorem 4. (SMI break-point estimators that satisfy A5)
Under Assumption A10,
(i) $\|\hat\lambda_{SMI}^{BP} - \lambda_{SMI}\| = O_P(T^{-1})$.
(ii) If $\hat\lambda_{SMI}$ is obtained with the pre-estimator $\hat\lambda_{SMI}^{BP}$, $\|\hat\lambda_{SMI} - \lambda_{SMI}\| = O_P(T^{-1})$.
4.3 VMC break-point estimators
Break-point estimators of the long-run variance of the moment conditions $N_u$ are, to our knowledge, not available even in the univariate case. Because HAC estimators of the long-run variance are not reliable over small subsamples, we focus exclusively on estimating breaks in the short-run variance of the moment conditions.$^{15}$ If $u_t$ were observed, we could estimate the short-run variance breaks $T_{VMC} = [T\lambda_{VMC}]$ from the following (implicit) equation:
$$j_t u_t^2 = \begin{cases} \mu_1^0 + e_t & t \leq T^0 \\ \mu_2^0 + e_t & t > T^0 \end{cases}, \qquad (4.2)$$
where $e_t = j_t u_t^2 - E(j_t u_t^2)$, so that $E(e_t) = 0$.$^{16}$ Here, the dependent variable $j_t u_t^2$ is not observed, but we can replace it with $j_t \hat u_t^2$, where $\hat u_t = y_t - W_t'\hat\theta_{GMM}$ or $\hat u_t = y_t - W_t'\hat\theta_{B\text{-}GMM}$, with $\hat\theta_{B\text{-}GMM}$ obtained using breaks in SMI and/or in the RF parameters. Because the dependent variable is generated, we need slightly stronger assumptions on the data generating process to ensure that the estimation error does not interfere with consistent estimation of the break-point.
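A sketch of this plug-in approach (illustrative Python, not the authors' code; `uhat` stands for the GMM or B-GMM residuals, and we again use the equal-weight break-in-mean variant for simplicity):

```python
import numpy as np

def vmc_break(Z, uhat, trim=0.15):
    """Break in the short-run variance of the moment conditions: apply a
    break-in-mean grid search to j_t * uhat_t^2, where j_t = vech(Z_t Z_t')."""
    i, j = np.tril_indices(Z.shape[1])
    J = (Z[:, i] * Z[:, j]) * (uhat ** 2)[:, None]   # vech(Z_t Z_t') * u_t^2
    T = J.shape[0]
    lo, hi = int(trim * T), int((1 - trim) * T)
    ssr = [(((J[:k] - J[:k].mean(0)) ** 2).sum()
            + ((J[k:] - J[k:].mean(0)) ** 2).sum()) for k in range(lo, hi)]
    return (lo + int(np.argmin(ssr))) / T
```

A jump in the residual variance shifts the mean of the diagonal components of $j_t u_t^2$, which the grid search detects.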
Assumption A11. (VMC break)
(i) A1 holds for $h_{t,i}$ as well as for $e_t = j_t u_t^2 - E(j_t u_t^2)$, with $E(e_t) = 0$, $\|h_{t,i}\|_4 < \infty$, $\|z_t\|_8 < \infty$, $\|e_{t,i}\|_4 < \infty$, and $\lambda_{RF}$ replaced by $\lambda_{VMC}$.
(ii) A4 and A9 hold with $v_t$ replaced by $e_t$ and $\Sigma_v$ replaced by $\Sigma_e$.
(iii) $\mu_2^0 - \mu_1^0 = \delta_\mu^0 \neq 0$, where $\mu_1^0$ and $\mu_2^0$ are defined in (4.2).
Part (i) states that $e_t$ is near-epoch dependent like $h_t$. We impose slightly stronger moment conditions that ensure that the difference between the estimated and the true squared sample moments, $T^{-1}\sum_{t=1}^{T}(j_t \hat u_t^2 - j_t u_t^2)$, vanishes fast enough to guarantee consistency of the variance estimator $\hat\Sigma_{e,t}$ used in the FGLS break-point estimation ($\hat\Sigma_{e,t}$ is defined as $\hat\Sigma_{v,t}$ but with $\hat e_t$ replacing $\hat v_t$). Part (ii) is as for the SMI break. Part (iii) says that the VMC break is fixed; for the same reason as for the SMI break, we do not further pursue the QP estimator. Below, we state the consistency of the BP and of our proposed FGLS estimator.
Theorem 5. (VMC break-point estimators that satisfy A5)
Under Assumptions A1-A4 and A11,
(i) $\|\hat\lambda_{VMC}^{BP} - \lambda_{VMC}\| = O_P(T^{-1})$.
(ii) If $\hat\lambda_{VMC}$ is obtained with the pre-estimator $\hat\lambda_{VMC}^{BP}$, then $\|\hat\lambda_{VMC} - \lambda_{VMC}\| = O_P(T^{-1})$.
$^{15}$Therefore, we do not consider breaks in the autocorrelation structure of the moment conditions. However, note that autocorrelation breaks could be estimated by estimating breaks in the product of the sample moment conditions and their lags.
$^{16}$For simplicity, we use the same notation $e_t$, $\mu_i^0$ as for the SMI break-point estimator, and $j_t$ is still equal to $\mathrm{vech}(Z_t Z_t')$.
5 B-GMM with no break
In this section, we show under which conditions the B-GMM estimators are equivalent to the
GMM estimators in the absence of a break, so that asymptotically, there is no efficiency loss in
using B-GMM. We first state the limit distribution of all the BP and the FGLS break fraction
estimators presented in the previous section.17
Note that under A1, A4 and A6, by the functional central limit theorem in Wooldridge and White (1988), Theorem 2.11, $T^{-1/2}\sum_{t=1}^{[Tr]} h_t \Rightarrow N^{1/2} B_h(r)$, a $((p_2+1)q, 1)$ vector of independent standard Brownian motions. Similarly, A10(i)-(ii) or A11(i)-(ii) guarantee that $T^{-1/2}\sum_{t=1}^{[Tr]} e_t \Rightarrow \Sigma_e^{1/2} B_e(r)$, a $(q(q+1)/2, 1)$ vector of independent standard Brownian motions. These processes are used to define the limit distributions of the break-fraction estimators. To state these limit distributions, we require the additional assumption given below.
Assumption A12. (Regularity assumption for the VMC break-point estimators)
$h_t$ is either $\phi$-mixing of size $-a/[2(a-1)]$ or $\alpha$-mixing of size $-a/(a-2)$ for some $a > 2$. Also, $\sup_t \|h_{t,i}\|_{2a} < \infty$ and $\sup_t \|z_{t,j}\|_{4a} < \infty$ for all $i = 1, \ldots, pq$ and $j = 1, \ldots, q$. Moreover, $T^{-1}\sum_{t=1}^{[Tr]} E(u_t j_t w_{t,i}) \to \ell_i^{(1)}$ and $T^{-1}\sum_{t=1}^{[Tr]} E(j_t w_{t,i} w_{t,j}) \to \ell_{i,j}^{(2)}$ uniformly in $r \in (0, 1]$ and for all $i, j = 1, \ldots, pq$.
A12 guarantees that the estimation error in the sample moments, $T^{-1/2}\sum_{t=1}^{[Tr]} j_t(\hat u_t^2 - u_t^2)$, does not show up in the limit distribution of the VMC break-fraction estimators. This assumption allows for shrinking RF break magnitudes but excludes fixed RF break magnitudes; if the RF break magnitude were fixed, the estimation error could show up in the limit distribution.$^{18}$ Note that the dependence structure and the moment conditions imposed in A12 are only slightly stronger than those in A11(i).
To simplify the statement of the asymptotic distributions, for $B_s(\lambda)$, $s \in \{h, e\}$, let $B_{s,l_1:l_2}$ refer to elements $l_1$ to $l_2$ of $B_s(\lambda)$, stacked in order, $B_{s,1}(\lambda)$ refer to the first element of $B_s(\lambda)$, and $N_v^* = N_{v,11}$ refer to the first $(q, q)$ upper-left block of the matrix $N_v$. Also, for any conformable matrix $A$, define:
$$D(B_s(\lambda), A) = \arg\sup_{\lambda \in \mathcal{I}} \frac{\left[B_s(\lambda) - \lambda B_s(1)\right]' A \left[B_s(\lambda) - \lambda B_s(1)\right]}{\lambda(1-\lambda)}.$$
Theorem 6. (Break-point estimators in the absence of a break)
Let A1, A3, A4 and A6 hold.
(i) Suppose there is no RF break, so that $\Pi_1 = \Pi_2 = \Pi$ is full rank, and $\alpha_1 = \alpha_2 = \alpha < 1/2$. Under A9,
$$\hat\lambda_{RF}^{BP} \Rightarrow D\big(B_{h,q+1:2q}(\lambda),\, N_v^{*1/2} Q^{-1} N_v^{*1/2}\big) \quad \text{and} \quad \hat\lambda_{RF} \Rightarrow D\big(B_{h,q+1:(p_2+1)q}(\lambda),\, N_v^{1/2}(\Sigma_v^{-1} \otimes Q^{-1}) N_v^{1/2}\big).$$
$^{17}$Here, we do not consider the QP estimator because it is not our preferred estimator, for the reasons discussed in section 4.
$^{18}$If the estimation error did show up in the limit distribution, it would be unclear whether the B-GMM and the GMM estimators are asymptotically equivalent in the absence of a break.
(ii) Suppose there is no SMI break, so that in (4.1), $\mu_1^0 = \mu_2^0$. Under A10(i)-(ii),
$$\hat\lambda_{SMI}^{BP} \Rightarrow D(B_{e,1}(\lambda), 1) \quad \text{and} \quad \hat\lambda_{SMI} \Rightarrow D(B_e(\lambda), I_q).$$
(iii) Suppose there is no VMC break, so that in (4.2), $\mu_1^0 = \mu_2^0$. Under A11(i)-(ii) and A12,
$$\hat\lambda_{VMC}^{BP} \Rightarrow D(B_{e,1}(\lambda), 1) \quad \text{and} \quad \hat\lambda_{VMC} \Rightarrow D(B_e(\lambda), I_q).$$
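The limit laws above can be visualized by simulation. The sketch below (illustrative Python, ours, for the scalar case $D(B(\lambda), 1)$ only) discretizes a standard Brownian motion on $[0,1]$ and records the arg sup of the scaled bridge functional over the trimmed interval:

```python
import numpy as np

def sim_D(n_grid=1000, trim=0.15, n_rep=2000, seed=0):
    """Draw from the scalar limit distribution D(B(lambda), 1) by
    discretizing a standard Brownian motion B on [0, 1]."""
    rng = np.random.default_rng(seed)
    lams = np.arange(1, n_grid) / n_grid
    keep = (lams >= trim) & (lams <= 1 - trim)
    draws = np.empty(n_rep)
    for r in range(n_rep):
        B = np.cumsum(rng.standard_normal(n_grid)) / np.sqrt(n_grid)
        bridge = B[:-1] - lams * B[-1]              # B(lam) - lam * B(1)
        crit = bridge ** 2 / (lams * (1 - lams))    # objective inside D(., 1)
        draws[r] = lams[keep][np.argmax(crit[keep])]
    return draws
```

By the symmetry of the bridge functional under $\lambda \mapsto 1-\lambda$, the simulated arg sup distribution is centered at $1/2$, consistent with spurious break estimates falling near the middle of the sample on average.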
Theorem 6 shows that in all cases, the limit distribution exists. As evident from the
theorem, its form depends on the particular estimator. Despite this, we show below that
the B-GMM estimators are asymptotically equivalent to their full-sample GMM counterparts,
no matter what type of spurious break-point we estimate. To state this result, we need an
additional homogeneity assumption.
Assumption A13. (Homogeneity assumption for covariances)
$T^{-1}\sum_{t=1}^{[Tr]} z_t u_t e_t' \Rightarrow r\,N_{ue}$, uniformly in $r \in (0, 1]$.
This assumption is the counterpart of A6 for SMI and VMC break-point estimators, and
is imposed to guarantee that SMI and VMC break-point estimators are asymptotically in-
dependent of the B-GMM estimators. With this assumption, we can state the asymptotic
equivalence of the GMM and B-GMM estimators.
Theorem 7. (GMM and B-GMM estimators in the absence of a break)
Suppose there is no RF break, no SMI break, and no VMC break, so that $\Pi_1 = \Pi_2 = \Pi$ is of full rank, $\alpha_1 = \alpha_2 = \alpha < 1/2$, and $\mu_1^0 = \mu_2^0$ in (4.1) and in (4.2). Let $V^* = \left(\Pi^{a\prime} Q N_u^{-1} Q \Pi^a\right)^{-1}$, $\Pi^a = (\Pi_{z1}, \Pi)$, and $\Lambda_T$ be as in Theorem 1. Moreover, let A1, A3, A4 and A6 hold. Then:
(i) $\Lambda_T(\hat\theta_{GMM} - \theta^0) \xrightarrow{D} \mathcal{N}(0, V^*)$.
(ii) If B-GMM is constructed with $\hat\lambda_{RF}^{BP}$ or $\hat\lambda_{RF}$, then under A9,
$\Lambda_T(\hat\theta_{B\text{-}GMM} - \theta^0) \xrightarrow{D} \mathcal{N}(0, V^*)$.
(iii) If B-GMM is constructed with $\hat\lambda_{SMI}^{BP}$ or $\hat\lambda_{SMI}$ and A10(i)-(ii) and A13 hold, or if B-GMM is constructed with $\hat\lambda_{VMC}^{BP}$ or $\hat\lambda_{VMC}$ and A10(i)-(ii), A12 and A13 hold, then:
$\Lambda_T(\hat\theta_{B\text{-}GMM} - \theta^0) \xrightarrow{D} \mathcal{N}(0, V^*)$.
Theorem 7 shows that B-GMM is asymptotically equivalent to GMM in the absence of breaks. Therefore, nothing is gained or lost asymptotically by imposing breaks that do not occur.$^{19}$
$^{19}$In the August 2016 version of the paper (http://www.sfu.ca/econ-research/RePEc/sfu/sfudps/dp15-04.pdf), we showed that B-GMM remains consistent when the moments of the data vary over time in a way that violates A9. In that case, however, it is not clear that B-GMM is asymptotically equivalent to GMM, and neither GMM nor B-GMM is the most efficient estimator, since neither optimally exploits the variation in the moments of the data (see Theorems 3-4 on pages 12-13 and their proofs in the previous version).
6 Monte-Carlo simulations
We consider the framework of section 3.1 with one endogenous regressor $X$, $q$ valid instruments (including the intercept), and one break in the reduced form at $T_{RF}$:
$$y_t = \alpha + X_t\beta + \sigma_t\epsilon_t, \quad X_t = \begin{cases} 1 + Z_t'\Pi_1 + v_t & t \leq T_{RF} \\ 1 + Z_t'\Pi_2 + v_t & t > T_{RF} \end{cases}, \quad E[\epsilon_t Z_t] = 0, \; E[v_t Z_t] = 0. \qquad (6.1)$$
The errors $(\epsilon_t, v_t)$ are i.i.d. jointly normally distributed with mean 0, variances $\sigma_\epsilon^2 = \sigma_v^2 = 1$ and correlation $\rho$; the instruments $Z_t$ are i.i.d. jointly normally distributed with mean zero and variance-covariance matrix equal to the identity matrix, and independent of $(\epsilon_t, v_t)$. Let $\iota_k$ be the $(k, 1)$ vector of ones, and let $R_i^2 = \mathrm{Var}(Z_t'\Pi_i)/\mathrm{Var}(Z_t'\Pi_i + v_t)$ be the theoretical R-squared over subsamples $i = 1$ with $t \leq T_{RF}$ and $i = 2$ with $t > T_{RF}$. Then the model parameters are chosen such that:
$$(\alpha \;\; \beta) = (0 \;\; 0), \quad \Pi_i = d_i\,\iota_{q-1} \; (i = 1, 2), \quad \text{with } d_1 = \sqrt{\frac{R_1^2}{(q-1)(1-R_1^2)}}, \quad d_2 = d_1 + b.$$
We consider three versions of the model: homoskedastic (HOM) with $\sigma_t^2 = 1$; heteroskedastic (HET1) with $\sigma_t^2 = (1 \; Z_t')\binom{1}{Z_t}/q$; and heteroskedastic (HET2) of the GARCH(1,1) type, with $\sigma_t^2 = 0.1 + 0.6 u_{t-1}^2 + 0.3\sigma_{t-1}^2$ and $u_t = \sigma_t\epsilon_t$.
To assess the identification strength over each subsample, we report the concentration parameter, defined over each subsample $i$ of size $T_i$ as:
$$\mu_i^2 = T_i R_i^2/(1 - R_i^2).$$
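For concreteness, the HOM design of (6.1), together with the implied $d_i$ and $\mu_i^2$, can be generated as follows (illustrative Python, not the authors' code; the parameter defaults reproduce the benchmark configuration used in Experiment 1 below):

```python
import numpy as np

def simulate_dgp(T=400, q=4, rho=0.5, R2_1=0.2, b=1.0, t_break=160, seed=0):
    """One sample from the homoskedastic (HOM) design in (6.1): one
    endogenous regressor, q instruments (incl. intercept), one RF break."""
    rng = np.random.default_rng(seed)
    d1 = np.sqrt(R2_1 / ((q - 1) * (1 - R2_1)))
    d2 = d1 + b
    Zx = rng.standard_normal((T, q - 1))           # non-constant instruments
    # (eps, v) jointly normal, unit variances, correlation rho
    eps = rng.standard_normal(T)
    v = rho * eps + np.sqrt(1 - rho ** 2) * rng.standard_normal(T)
    pre = np.arange(T) < t_break
    X = 1 + (Zx @ (d1 * np.ones(q - 1))) * pre \
          + (Zx @ (d2 * np.ones(q - 1))) * (~pre) + v
    y = 0 + 0 * X + eps                            # alpha = beta = 0
    # concentration parameters mu_i^2 = T_i R_i^2 / (1 - R_i^2)
    snr = lambda d: (q - 1) * d ** 2               # equals R_i^2 / (1 - R_i^2)
    mu2 = (t_break * snr(d1), (T - t_break) * snr(d2))
    return y, X, np.column_stack([np.ones(T), Zx]), mu2
```

With the defaults, $d_1 \approx 0.29$, $d_2 \approx 1.29$, $\mu_1^2 = 40$ and $\mu_2^2 \approx 1.2 \times 10^3$, matching the benchmark values reported below.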
We are interested in the slope parameter $\beta$ and compare the performance of three estimators of $\beta$: B-GMM, B-2SLS and GMM. In experiment 1, their performance is evaluated by computing the Monte-Carlo bias, standard deviation, root mean squared error (RMSE), and the length and coverage of the corresponding 95% confidence intervals,$^{20}$ for various configurations of the model. In experiment 2, we investigate these performance measures as a function of the location of the break. Finally, in experiment 3, we investigate the finite-sample properties of the above estimators when there is no break.
• Experiment 1:
Our benchmark model sets sample size $T = 400$, endogeneity parameter $\rho = 0.5$, one RF break at $T_{RF} = 160$, and break size $b = 1$. We use $q = 4$ instruments (including the intercept). Identification is strong over each subsample, with associated concentration parameters $\mu_1^2 = 40$ and $\mu_2^2 = 1.2 \times 10^3$: the implied reduced-form parameters are $d_1 = 0.29$ and $d_2 = 1.29$, and the theoretical R-squareds over the subsamples are $R_1^2 = 0.2$ and $R_2^2 = 0.83$.
$^{20}$The standard errors of each estimator are computed using the formulas in Theorem 1. We use HAC-type estimators under conditional heteroskedasticity.
We explore different configurations of the model. First, we decrease $\mu_1^2$ so that identification is weaker in the first subsample while the second subsample remains strong: $\mu_1^2 = 8.4$ (and $R_1^2 = 0.05$), and $\mu_1^2 = 1.6$ (and $R_1^2 = 0.01$). The break size is still $b = 1$, but the implied reduced-form parameters are now $d_1 = 0.13$ and $d_1 = 0.06$, respectively. Then, we consider a larger sample size, $T = 800$, more instruments, $q = 6$, or a larger endogeneity parameter, $\rho = 0.75$. In all these experiments, the break is assumed known, and the results are displayed in Tables 5 to 7 (for the HOM and HET cases). The results for the cases where the break location is unknown and estimated are displayed in Tables 8 to 10 (for the HOM and HET cases): three break sizes, 1, 0.5, and 0.2, are considered; $\mu_1^2 = 40$ and $d_1 = 0.29$ throughout, while $d_2 = 1.29$, 0.79, and 0.49. All results are based on 5,000 replications.
When the break is known, the main results do not vary much across specifications, so we focus on the benchmark model described at the beginning of Experiment 1. Under homoskedasticity, as expected, B-GMM and B-2SLS are very close in terms of bias, standard deviation, and RMSE. Their RMSEs are significantly smaller than that of GMM. The biases of B-GMM and B-2SLS tend to be larger than that of GMM, but they are more than compensated by the gains in standard deviation; moreover, these biases decrease as the sample size increases, as expected. Looking at the 95% confidence intervals for the slope parameter, B-GMM displays the shortest ones while maintaining good coverage properties. Under conditional heteroskedasticity, the standard deviation and RMSE of B-GMM are much smaller than those of B-2SLS.
When the break-point is treated as unknown, the break size matters for the accuracy of the estimated break location. With a break size of 1, the estimated break is quite reliable, with an average (over the estimated breaks) very close to the actual break: the average is 161.3 with a true break at 160. When the break size decreases, the quality of the break-location estimator deteriorates: for instance, with a true break at 160 and a break size of 0.2, the average is 172.4. Reliable estimation of the break location is crucial for the bias properties of B-GMM and B-2SLS: when the break is not accurately estimated, their biases increase and the coverage of the confidence intervals worsens.$^{21}$ This bias should not be too much of a concern, because it only appears when the break size is small, and such small breaks often cannot be detected; see also experiment 3 below.
• Experiment 2: Performance as a function of the true location of the break.
Given the results of section 3, at least asymptotically, it is always efficient to "split" the sample in order to double the number of moments. In finite samples, however, the size of the subsamples and their identification strength may matter. We therefore investigate how the performance of all estimators varies with the true location of the break and the strength of identification in each subsample.
$^{21}$One remedy consists in discarding the data around the estimated break (e.g. within a confidence interval for the break location). This simple strategy should mitigate the drawback of estimating the break inaccurately. However, it requires the asymptotic distribution of the break-point estimator, which is beyond the scope of this paper.
We consider two versions of the model in (6.1), both with $T = 400$, $\rho = 0.5$, $q = 4$, $d_1 = 0.1925$, $R_1^2 = 0.1$, and a break location that varies from $(0.1 \times 400)$ to $(0.9 \times 400)$; accordingly, the identification strength over the first subsample is characterized by $\mu_1^2$ between 4.4 and 40:
• model (i): the break size is $b = -0.385$, with associated parameter $d_2 = -0.1925$ and $R_2^2 = 0.1$. The identification strength in the second subsample is characterized by $\mu_2^2$, with values between 40 and 4.4.
• model (ii): the break size is $b = -0.5$, with associated parameter $d_2 = -0.3075$ and $R_2^2 = 0.22$. The identification strength in the second subsample remains somewhat strong and is characterized by $\mu_2^2$, with values between 102.1 and 11.3.
The results under homoskedasticity and conditional heteroskedasticity are presented in Figures 3 and 4: two measures of performance are considered, the Monte-Carlo RMSE (left) and the Monte-Carlo standard deviation (right). All results are based on 5,000 replications.
In model (i), both the Monte-Carlo RMSE and the standard deviation of the B-GMM estimators are stable as the break location changes from $(0.1 \times 400)$ to $(0.9 \times 400)$. This is quite different for GMM: its RMSE and standard deviation are larger, increase with the location of the break until it reaches the middle of the sample, and then decrease back to their original levels. Results for model (ii) are very similar.
• Experiment 3: Finite-sample properties when there is no break.
We now investigate the finite-sample properties of our estimators when there is no RF break, but a break-point is nevertheless estimated and used to compute the B-GMM and B-2SLS estimators. We consider the following versions of the model in (6.1): $T = 400$ or $T = 800$, $\rho = 0.5$, $q = 4$, and $d_i = 0.29$ for $i = 1, 2$. There is no break in the RF because we set $\Pi_1 = \Pi_2 = d_i\iota_4$. However, the econometrician still believes that there is a break in the RF and estimates it in order to compute B-GMM and B-2SLS; the full-sample GMM estimator is also computed as a benchmark. The finite-sample properties of these three estimators are evaluated by computing the Monte-Carlo bias, standard deviation, root mean squared error (RMSE), and the length and coverage of the corresponding 95% confidence intervals. The results under HOM and HET1 are presented in Table 11; in Table 12, the sample size is doubled.
First, it is interesting to point out that the estimated break-point is, on average, near the middle of the sample. Second, and unsurprisingly, the finite-sample performance of B-GMM (and B-2SLS) is not as good as that of standard GMM in the absence of a break. Nevertheless, under both homoskedasticity and conditional heteroskedasticity, the standard deviations and RMSEs of the three estimators are very close to each other, especially for larger sample sizes; this holds despite the bias of the GMM estimator being much smaller than that of B-GMM and B-2SLS. This supports the (asymptotic) results derived in Theorem 7.
7 The New Keynesian Phillips Curve
7.1 Model
To our knowledge, virtually all papers estimate the New Keynesian Phillips Curve (NKPC) at the quarterly frequency - see a.o. Sbordone (2002, 2005), Christiano, Eichenbaum and Evans (2005), Zhang, Osborn and Kim (2008), HHB, and Magnusson and Mavroeidis (2014). This comes with two shortcomings. The first is small samples, which lead to unreliable estimation in the presence of multiple breaks. The second is that US monetary policy is set twice per quarter, so quarterly NKPC models may not be as informative as monthly NKPC models.
In this paper, we estimate the NKPC at the monthly frequency, which mitigates these
shortcomings.22 However, we face two additional challenges. The first is data availability,
which we solve in the data section below. The second is how to write down a reasonable
monthly NKPC, given the empirical evidence that prices are indexed to the previous quarter.
With the exception of Zhang, Osborn and Kim (2008), most NKPC papers assume that when prices cannot be re-optimized, they are either kept fixed or indexed to last quarter's inflation, and not to more lags of inflation - see Galí and Gertler (1999), Sbordone (2002, 2005), Smets and Wouters (2003), Christiano, Eichenbaum and Evans (2005) and Magnusson and Mavroeidis (2014). Many of the above papers find strong evidence that prices are indexed to the previous quarter, and therefore to at least three lags of monthly inflation.$^{23}$
More than one lag in the NKPC model can be obtained if one imposes dynamics in the
marginal cost process or in the structural shocks - see Krogh (2015) for a summary. It can
also be obtained from more primitive assumptions on the price indexation mechanism. For
example, Zhang, Osborn and Kim (2008) derive a NKPC model with multiple lags, but they
follow Galí and Gertler (1999) in assuming that backward- and forward-looking firms co-exist.
The backward-looking firms never optimize and index to past lags of inflation, while the
forward-looking firms either keep their prices fixed or re-optimize when chosen (according to
an exogenous Calvo (1983) type mechanism). We derive the NKPC model in the supplemental
appendix by assuming that all firms are forward-looking when selected to re-optimize. In-between optimizations, they do not keep their prices fixed, but index them to past lags of
inflation. This is the framework of Smets and Wouters (2003), Woodford (2003, Ch.3), and
$^{22}$Because of the aforementioned shortcomings, especially issues related to small samples with multiple breaks, we do not use quarterly data. HHB analyze multiple breaks with quarterly data, but point out in their empirical section that some subsamples are too small to allow for a reliable interpretation of the results.
$^{23}$In our empirical analysis, we found that additional lags are insignificant in the NKPC.
Christiano, Eichenbaum and Evans (2005). In the supplemental appendix, we generalize these
models (with sticky prices but not sticky wages) to capture price indexation to ℓ previous lags
rather than one lag of inflation.24
To illustrate the usefulness of the B-GMM estimation procedure for the monthly NKPC, we first estimate an unconstrained version of the NKPC derived in the supplemental appendix, where the parameters are not constrained to satisfy the nonlinear restrictions implied by the structural model:
$$\pi_t = \alpha_c + \alpha_{b,1}\,\pi_{t-1} + \alpha_{b,2}\,\pi_{t-2} + \alpha_{b,3}\,\pi_{t-3} + \alpha_f\,\pi_t^e + \alpha_y\,y_t + u_t, \qquad (7.1)$$
where $y_t$ is the output gap (in log difference from steady-state output), $\pi_t$ is inflation measured as the log difference in prices between two periods, and inflation expectations at time $t$ are denoted $\pi_t^e$.$^{25}$
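Once a break date is chosen, the unconstrained NKPC (7.1) is linear in its parameters and can be estimated by 2SLS or GMM on each subsample. A generic 2SLS sketch (illustrative Python, ours; the Theorem 1 standard errors are omitted, and `W` stands for the regressor matrix of (7.1)):

```python
import numpy as np

def tsls(y, W, Z):
    """Two-stage least squares: W holds the regressors of (7.1)
    (intercept, three inflation lags, expected inflation, output gap);
    Z holds the instruments described in section 7.3."""
    PZW = Z @ np.linalg.solve(Z.T @ Z, Z.T @ W)   # first-stage fitted values
    return np.linalg.solve(PZW.T @ W, PZW.T @ y)  # second-stage coefficients
```

Subsample estimation then simply applies `tsls` to the rows before and after the estimated break date.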
7.2 Data
We construct our dataset from monthly observations for the US over the period 1978:1-2010:6. The inflation series is the annualized difference in log CPI taken from the FRED database, i.e. $\pi_t = 1200(\log CPI_t - \log CPI_{t-1})$. The output gap is monthly log real GDP minus log potential real GDP, the latter calculated by the HP filter with monthly smoothing constant 14400: $y_t = 100(\log(\text{real GDP}_t) - \log(\text{potential real GDP}_t))$. A monthly real GDP series is available up to 2010:6 at http://www.princeton.edu/∼mwatson/mgdp gdi. We proxy inflation expectations with the one-year-ahead Michigan inflation expectations $\pi_t^e$, the only series of measured inflation expectations that, to our knowledge, is free and available at monthly frequency.$^{26}$ The Michigan inflation series starts in 1978:1. After constructing all series and their lags, our data span is 1978:5-2010:6 (386 observations). Figure 1 plots the data.$^{27}$
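The series construction can be sketched as follows (illustrative Python, ours; `cpi` and `real_gdp` are hypothetical input arrays standing in for the FRED CPI and the Watson monthly GDP series; the HP trend solves $(I + \lambda D'D)\tau = x$ directly, with $D$ the second-difference operator):

```python
import numpy as np

def hp_trend(x, lamb=14400.0):
    """HP-filter trend: solve (I + lamb * D'D) tau = x, with D the
    second-difference operator and the monthly smoothing constant 14400."""
    T = len(x)
    D = np.zeros((T - 2, T))
    for t in range(T - 2):
        D[t, t:t + 3] = [1.0, -2.0, 1.0]
    return np.linalg.solve(np.eye(T) + lamb * (D.T @ D), x)

def build_series(cpi, real_gdp):
    """Monthly inflation and output-gap series as in section 7.2 (sketch)."""
    pi = 1200 * np.diff(np.log(cpi))          # annualized log-difference
    log_gdp = np.log(real_gdp)
    ygap = 100 * (log_gdp - hp_trend(log_gdp))
    return pi, ygap
```

A useful check on the implementation: the first-order conditions of the HP problem imply that the cycle (and hence the output gap) sums exactly to zero over the sample.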
24 As summarized by Krogh (2015) and shown in Mavroeidis (2005) and Nason and Smith (2008), second-order dynamics in either the marginal cost or the structural errors are useful for identifying the NKPC. Our empirical results - see in particular Table 2 - suggest that besides the lags we use, no further dynamics are needed for identification, since the instruments we use are not weak over the entire sample.
25 Galí and Gertler (1999) attribute the usual findings of negative and/or insignificant estimates of α_y at quarterly frequency to measurement errors in potential output. They propose using the NKPC model with the output gap replaced by marginal cost, which is not observed and is therefore proxied by the average unit labor cost. However, to our knowledge, average unit labor costs are not available monthly. Moreover, there is still much criticism regarding the use of average unit labor cost as a proxy - see e.g. Rudd and Whelan (2005).
26 Proxying inflation expectations with this series is common in the literature, and even recommended by Coibion and Gorodnichenko (2015), who argue that this series introduces substantial information about the unobserved firm-level inflation expectations defined in the NKPC.
27 We find that our results are robust to the inflation outlier around 2008, so we do not remove it.
7.3 Unrestricted Estimates
For all methods, Z_t includes an intercept, three lags of inflation, three lags of expected inflation, and three lags of the output gap. Since π_t^e and y_t are both endogenous, we have two reduced forms, projecting π_t^e and y_t onto the instrument set Z_t. Without breaks, they are:

π_t^e = Z_t' Δ_1 + v_{t1},   (7.2)
y_t = Z_t' Δ_2 + v_{t2}.   (7.3)
• SMI break. As discussed in section 2, the variance of the instruments changed after the Great Moderation, so we have a SMI break. The SMI break can be estimated via our multivariate FGLS estimator applied to all the unique elements of the Z_t Z_t' matrix. However, because Z_t contains lags, this is not necessary, and we chose to estimate the SMI break as a joint break in the multivariate mean of squared inflation, squared expected inflation, and squared output gap (see footnote 28). For these variables, the BP estimators of a break in mean are 1981:9, 1981:11 and 1983:5. Using any of these estimators as a starting value, the FGLS estimator of the Great Moderation break is 1981:11.
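A minimal sketch of the least-squares break-in-mean (BP-type) estimator used above: pick the split date that minimizes the total sum of squared deviations from the two segment means. The series and break date below are simulated, not the paper's data.

```python
import numpy as np

def break_in_mean(x, trim=0.15):
    """Single break in mean: return the split index k minimizing
    SSR(1..k) + SSR(k+1..n), searching over trimmed candidate dates."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ssr = lambda s: ((s - s.mean()) ** 2).sum()
    return min(range(int(trim * n), int((1 - trim) * n)),
               key=lambda k: ssr(x[:k]) + ssr(x[k:]))

# simulated example: the mean shifts from 0 to 3 at observation 150
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(3.0, 1.0, 150)])
k_hat = break_in_mean(x)
```

In the paper this estimator is applied to the squared series, so a break in variance shows up as a break in mean.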
• RF breaks. As discussed in section 2, the RF also features the Great Moderation break, which we impose at 1981:11 because this estimator converges faster than the RF break-point estimator of the Great Moderation - see comment (ii) after Corollary 1. Additionally, the RF likely features the oil price collapse - see e.g. Bernanke (2004) and Galí and Gambetti (2008). The BP estimator of the oil price collapse in the sample 1981:12-2010:6 is at 1985:12 for the first RF and at 1985:10 for the second RF. Using either of these estimators as a starting value, the FGLS estimator of the oil price collapse in the post-Great Moderation sample 1981:12-2010:6 is at 1985:10 (see footnote 29).
• VMC breaks. As discussed in section 2, the SMI break is likely a VMC break too and
thus it is already taken into account in the construction of B-GMM.30
7.3.1 Results for the baseline specification
In our baseline specification, the structural parameters of interest in (7.1) are assumed to be stable over time; robustness checks with potential structural parameter breaks are in section 7.3.2. For the validity of our estimation results with B-GMM in this section, we need sufficiently strong identification, as in Assumption A2. Evidence for this is also provided in section 7.3.2.
28 Note that lags of these will exhibit the same SMI breaks.
29 Even though the oil price collapsed in the first half of 1986, it experienced a sharp decline prior to that, in 1985 - see Gately (1986), Figure 6.
30 Further robustness checks regarding the number of break-points and the estimated locations of the break-points can be found in the November 2015 version of this paper: http://www.sfu.ca/~baa7/research/AntoineBoldeaWP2015.
The unrestricted parameter estimates over the full sample (using B-GMM, B-2SLS, and
standard full sample GMM and 2SLS), all with Newey and West (1994) HAC robust standard
errors, are reported in Panel A of Table 1. This panel shows that all the B-GMM estimates
are more precise than the GMM estimates, and that the output gap parameter αy is not
significant when using GMM but is strongly significant when using B-GMM.
In the first and the third sample, the forward-looking coefficient α_f is always significant and much larger than the backward-looking component α_b, defined as α_{b,1} + α_{b,2} + α_{b,3}. This result is in line with most quarterly findings - see e.g. Galí and Gertler (1999), Sbordone (2005), Zhang, Osborn and Kim (2008) and HHB.
Table 1, Panel A also shows a positive and significant relationship between inflation and the output gap at the monthly level when using B-GMM. The α_y estimates are positive and around 0.16, indicating that a 1% increase in monthly output will increase annual inflation by 0.16%, all else constant. These results reinforce the quarterly NKPC evidence of small but positive output gap coefficients and stand in contrast to most studies, which find a negative and/or insignificant coefficient on the output gap at the quarterly level - see e.g. Galí and Gertler (1999) for a summary of these studies.
All B-GMM parameter estimates have smaller standard errors than their GMM counterparts in Table 1. Therefore, the significance of the output gap coefficient α_y is due to more efficient estimation via B-GMM, not to the use of monthly data.
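The efficiency mechanism can be illustrated with a stylized, homoskedastic simulation (an illustration of the idea, not the paper's B-2SLS implementation; all data below are simulated): when the reduced form has a known break, interacting the instruments with the pre/post-break indicator keeps both subsample first stages informative.

```python
import numpy as np

def tsls(y, X, Z):
    """Two-stage least squares: regress y on the first-stage fitted values of X."""
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(Xhat, y, rcond=None)[0]

def break_augmented_tsls(y, X, Z, k):
    """Interact the instruments with the pre/post-break indicator (break at k)."""
    pre = (np.arange(len(y)) < k).astype(float)[:, None]
    return tsls(y, X, np.hstack([Z * pre, Z * (1.0 - pre)]))

# simulated example: instrument strength triples at k = 2000
rng = np.random.default_rng(1)
n, k, beta = 4000, 2000, 1.0
z = rng.normal(size=(n, 2))
e = rng.normal(size=n)
v = 0.6 * e + rng.normal(size=n)            # endogeneity in the regressor
strength = np.where(np.arange(n) < k, 0.5, 1.5)
x = strength * z[:, 0] + 0.3 * z[:, 1] + v  # reduced form with a break
y = beta * x + e
b_plain = tsls(y, x[:, None], z)[0]
b_aug = break_augmented_tsls(y, x[:, None], z, k)[0]
```

Both estimators are consistent for beta; the augmented one uses the stronger post-break first stage more fully, which is the intuition behind the smaller B-GMM standard errors in Table 1.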
7.3.2 Robustness checks
There are two main concerns regarding the validity of the previous results. First, the coefficients of the structural equation (7.1) may not be stable over the whole sample; see e.g. Kleibergen and Mavroeidis (2009) and Hall, Han, and Boldea (2012). Second, the identification strength may be weak over the whole sample.
• Stability of the structural equation (7.1):
To shed some light on the stability concern, we consider two potential structural parameter
breaks: the Great Moderation break, and the break in the recent crisis.31 The first break is
also a SMI break (break in the variance of our instruments) and is therefore set at 1981:11,
the multivariate SMI break-point estimate. The second break is set for simplicity before the
recent crisis, at 2006:12.
We thus re-estimate the model over two smaller samples: 1981:12 to 2010:6 and 1981:12 to
2006:12. Both these samples feature the oil price collapse as a RF break at 1985:10, so the
31 The Great Moderation break is motivated by the analysis in Kleibergen and Mavroeidis (2009), cited in Magnusson and Mavroeidis (2014), page 1842, and by the findings of Hall, Han and Boldea (2012), page 294. The second structural parameter break is motivated by the fact that, in the recent crisis, inflation has been much lower than previously.
efficiency arguments in section 3.1 still hold; this is confirmed by Panels B and C of Table
1. In addition, most of the other results found over the full sample (and displayed in Panel
A) remain true over the smaller samples, including the positive and significant relationship
between inflation and output gap (see Panels B and C).
• Identification strength of the structural equation (7.1):
For the validity of the B-GMM method, at least one subsample has to be sufficiently strongly identified, as stated in Assumption A2. This assumption is supported in the literature by the findings of Zhang, Osborn and Kim (2008, 2009) and Kleibergen and Mavroeidis (2009, Table 4). All these papers use instrument sets similar to ours. The first two papers find strong instruments over the whole sample; in the third paper, Table 4 shows much tighter confidence sets after the Great Moderation than before, suggesting stronger identification in this latter period.
In our analysis, the null of weak identification is rejected over all subsamples and over the
full sample, as shown by the reduced rank test of Kleibergen and Paap (2006) displayed in
Table 2. This test is robust to heteroskedasticity and multiple endogenous variables.32 These
results suggest no identification issues over the sample and data we consider.
To summarize, the findings in section 7.3.1 remain robust to the aforementioned concerns.
Additionally, the J-test on all stable subsamples cannot reject the validity of our instruments.
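As a simpler, homoskedastic counterpart to the Kleibergen-Paap rank test reported in Table 2 (a basic diagnostic, not the test used in the text), one can compute a first-stage F statistic for each endogenous regressor; everything below is simulated for illustration.

```python
import numpy as np

def first_stage_F(x, Z):
    """Classical F test that all instrument coefficients are zero in the
    regression of the endogenous regressor x on a constant and Z."""
    n, q = Z.shape
    Zc = np.column_stack([np.ones(n), Z])
    beta = np.linalg.lstsq(Zc, x, rcond=None)[0]
    ssr_u = ((x - Zc @ beta) ** 2).sum()          # unrestricted SSR
    ssr_r = ((x - x.mean()) ** 2).sum()           # constant-only SSR
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - q - 1))

rng = np.random.default_rng(2)
Z = rng.normal(size=(500, 3))
x = Z @ np.array([0.5, 0.3, 0.2]) + rng.normal(size=500)  # relevant instruments
F = first_stage_F(x, Z)
```

Unlike the Kleibergen-Paap statistic, this version is not robust to heteroskedasticity and handles only one endogenous regressor at a time, which is precisely why the text relies on the rank test instead.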
7.4 Restricted Estimates
In this section, we estimate the deep parameters of the NKPC model derived in the supplemental appendix:

π_t = ψ_c + [(ρ_1 − ρ_2)/(1 + ρ_1)] π_{t−1} + [(ρ_2 − ρ_3)/(1 + ρ_1)] π_{t−2} + [ρ_3/(1 + ρ_1)] π_{t−3} + [1/(1 + ρ_1)] π_t^e + [(1 − θ)^2/(θ(1 + ρ_1))] ψ_y y_t + ε_t,   (7.4)
where ρ_1, ρ_2, and ρ_3 are the price indexation parameters to the first three lags of inflation, and θ is the price stickiness parameter. Also, ψ_c is an unconstrained constant and ψ_y = Θψ, with Θ = (1 − a)/(1 − a + εa), ψ = σ + (ϕ + a)/(1 − a), 1 − a the labor elasticity of output in the Cobb-Douglas production function, ε the price aggregation parameter such that the steady-state marginal cost is M = ε/(ε − 1), and σ, ϕ the utility function parameters for consumption and labor, respectively. Since ψ_y is not separately identifiable from ρ_1 and θ, we calibrate ψ_y = 3 (see footnote 33).
32 To our knowledge, it is the only test robust to both heteroskedasticity and multiple endogenous variables. The efficient first-stage F test recently proposed by Montiel Olea and Pflueger (2013) can only accommodate one endogenous variable.
33 This value corresponds to the calibrations in Magnusson and Mavroeidis (2014) and Galí (2008), Ch. 3. We set a = 1/3 and ε = 6, thus M = 1.2 and Θ = 0.25, and we impose log utility for consumption and labor (σ = ϕ = 1).
The restricted model (7.4) remains linear in the regressors and instruments, but becomes
nonlinear in the parameters of interest that correspond to the deep parameters of the NKPC
model. Our original (linear) framework of section 2 needs to be extended and the applicability
of our B-GMM procedure justified: such derivations are done in Appendix C. This means that
both breaks we previously considered can be used directly to augment the set of instruments
and obtain more efficient estimates, as for the unrestricted model (7.1). The only difference
is that we now estimate the parameters in the model (7.4) with a nonlinear GMM procedure.
We also impose the following two restrictions in the estimation (see footnote 34): (i) |ρ| ≤ 1, with ρ defined as ρ_1 + ρ_2 + ρ_3; (ii) 0.0001 < θ < 0.9999.
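The nonlinear restrictions in (7.4) amount to a mapping from the deep parameters (ρ_1, ρ_2, ρ_3, θ) to the slope coefficients of the structural equation. A direct transcription of that mapping (the function name is ours, and ψ_y = 3 is the calibrated value from the text):

```python
def nkpc_coefficients(rho1, rho2, rho3, theta, psi_y=3.0):
    """Slope coefficients of equation (7.4) implied by the deep parameters."""
    d = 1.0 + rho1
    return {
        "pi_lag1": (rho1 - rho2) / d,
        "pi_lag2": (rho2 - rho3) / d,
        "pi_lag3": rho3 / d,
        "pi_expected": 1.0 / d,
        "output_gap": (1.0 - theta) ** 2 / (theta * d) * psi_y,
    }
```

With ρ_1 = ρ_2 = 0, this reproduces the restricted coefficients noted in footnote 35: α_f = 1, α_{b,1} = 0, α_{b,2} = −ρ_3, α_{b,3} = ρ_3.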
We estimate (7.4) by nonlinear two-step GMM or two-step B-GMM, with the same instruments as before. In Table 3, the B-GMM estimates of ρ_1, ρ_2, and ρ_3 for the full sample are all significant, while the GMM estimates, though of similar magnitude, are not all significant, highlighting yet again the efficiency gains from using the B-GMM procedure.
Looking at all samples in Table 3, there seems to be some evidence of price indexation to all previous three months. The indexation to the third lag (i.e., to the previous quarter) is the strongest: ρ_3 is the largest estimate and it is significant for B-GMM across all sample periods. The estimates of ρ_1 and ρ_2 are not significant for the last sample period, which excludes the recent recession. Therefore, we also report results with ρ_1 = ρ_2 = 0, in which case prices are only indexed to last quarter's inflation.
The results for ρ_1 = ρ_2 = 0 are reported in Table 4 (see footnote 35). The B-GMM estimates of ρ_3 are around 0.29-0.33, and their confidence intervals, considerably shorter than those from GMM, provide strong evidence against full indexation at the monthly level. The point estimate of ρ_3 is of similar magnitude to that in Sbordone (2005), Table 1. The confidence intervals are tighter than in Sbordone (2005) (Table 1, p. 1194) and Magnusson and Mavroeidis (2014) (Figure 5, p. 1841), indicating that the B-GMM estimates are more informative about price indexation than those in previous studies.
The B-GMM parameter estimates of the price stickiness θ in the third sample (our preferred sample given the recent crisis) are significant and around 0.62, implying 1/(1 − θ) ≈ 3 months between price re-optimizations. This price duration is close to that found in Klenow and Kryvtsov (2008) (4-7 months) and shorter than that in Nakamura and Steinsson (2008) (7-8 months).
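The implied duration follows from the Calvo structure, where θ is the per-period probability that a price is not re-optimized, so the expected spell length is a simple geometric mean:

```python
theta = 0.62                      # B-GMM estimate of price stickiness, preferred sample
duration = 1.0 / (1.0 - theta)    # expected months between price re-optimizations
```

This gives roughly 2.6 months, consistent with the "about 3 months" statement in the text.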
Figure 2 shows the joint confidence regions for the estimates of ρ_3 and θ for our preferred sample 1981:12-2006:12. As can be seen from the figure, the B-GMM confidence region is
34 We assume |ρ| ≤ 1 to prevent firms from overcompensating for past inflation. The lower bound on θ ensures that the moment conditions are differentiable, and the upper bound ensures that some positive fraction of prices is re-optimized each period.
35 Note that ρ_1 = ρ_2 = 0 implies that the restricted parameter estimates for the forward- and backward-looking components are α_f = 1, α_{b,1} = 0, α_{b,2} = −ρ_3, α_{b,3} = ρ_3, and they sum to one. The fact that α_f = 1 does not yield non-stationarity problems in our model because we are using measured inflation expectations.
much tighter than its GMM counterpart. Overall, we conclude that GMM is less informative
than B-GMM, and that relying on B-GMM, one finds strong evidence of price stickiness and
short price durations.
8 Conclusion
In this paper, we focus on changes in the limiting Hessian of the GMM minimand. We decompose them into breaks in the reduced form, in the second moment of the instruments, and/or in the variance of the moment conditions. We show how to exploit these breaks to construct GMM estimators that are more efficient than those currently available, and we call these estimators B-GMM.
Analyzing a newly developed NKPC model with a general inflation indexation scheme, we show that, by exploiting two changes - the Great Moderation and the oil price collapse - the B-GMM estimators have tighter confidence intervals than the full-sample GMM estimators, delivering strong evidence against full inflation indexation and in favor of short price durations.
References
[1] D.W.K. Andrews and J.H. Stock, Inference with weak instruments, Econometric Society
Monograph Series, vol. 3, ch. 8 in Advances in Economics and Econometrics, Theory and
Applications: Ninth World Congress of the Econometric Society, Cambridge University
Press, Cambridge, 2005.
[2] B. Antoine and O. Boldea, Inference in linear models with structural changes and mixed identification strength, Working Paper, Simon Fraser University (2015), http://www.sfu.ca/~baa7/AntoineResearch.html.
[3] B. Antoine and E. Renault, Efficient GMM with nearly-weak instruments, The Econo-
metrics Journal, Tenth Anniversary Issue 12 (2009), 135–171.
[4] J. Bai, Estimation of a change point in multiple regression models, Review of Economics
and Statistics 79 (1997), 551–563.
[5] J. Bai, R.L. Lumsdaine, and J.H. Stock, Testing for and dating common breaks in multivariate time series, Review of Economic Studies 65 (1998), 395–432.
[6] J. Bai and P. Perron, Estimating and testing linear models with multiple structural
changes, Econometrica 66 (1998), 47–78.
[7] B. Bernanke, The Great Moderation, ch. 5 of ’The Taylor rule and the transformation of
monetary policy’, pp. 143–182, Hoover Institution, 2004.
[8] G. Chamberlain, Asymptotic efficiency in estimation with conditional moment restric-
tions, Journal of Econometrics 34 (1987), 305–334.
[9] L.J. Christiano, M. Eichenbaum, and C.L. Evans, Nominal rigidities and the dynamic
effects of a shock to monetary policy, Journal of Political Economy 113 (2005), 1–45.
[10] O. Coibion and Y. Gorodnichenko, Is the Phillips Curve alive and well after all? Inflation
expectations and the missing disinflation, American Economic Journal - Macroeconomics
7 (2015), 197–232.
[11] R. Davidson and J.G. MacKinnon, Econometric Theory and Methods, Oxford University Press, New York, 2004.
[12] J.-M. Dufour, L. Khalaf, and M. Kichian, Inflation dynamics and the New Keynesian
Phillips Curve: an identification robust econometric analysis, Journal of Economic Dy-
namics and Control 30 (2006), 1707–1727.
[13] J. Galí, Monetary policy, inflation, and the business cycle: an introduction to the New Keynesian framework, Princeton University Press, Princeton and Oxford, 2008.
[14] J. Galí and L. Gambetti, On the sources of the Great Moderation, NBER Working Paper No. 14171 (2008).
[15] J. Galí and M. Gertler, Inflation dynamics: a structural econometric analysis, Journal of Monetary Economics 44 (1999), 195–222.
[16] D. Gately, Lessons from the 1986 oil price collapse, Brookings Papers on Economic Activity 17 (1986), 237–284.
[17] A. R. Hall, S. Han, and O. Boldea, Inference regarding multiple structural changes in
linear models with endogenous regressors, Journal of Econometrics 170 (2012), 281–302.
[18] A.R. Hall, Generalized method of moments, Oxford University Press, New York, 2005.
[19] C. Hansen, J. Hausman, and W. Newey, Estimation with many instrumental variables,
Journal of Business and Economic Statistics 26 (2008), 398–422.
[20] F. Kleibergen and S. Mavroeidis, Weak instrument robust tests in GMM and the New
Keynesian Phillips Curve, Journal of Business and Economic Statistics 27 (2009), 293–
311.
[21] F. Kleibergen and R. Paap, Generalized reduced rank tests using the singular value de-
composition, Journal of Econometrics 133 (2006), 97–126.
[22] P.J. Klenow and O. Kryvtsov, State-dependent or time-dependent pricing: does it matter for recent U.S. inflation?, The Quarterly Journal of Economics 123 (2008), 863–904.
[23] T.S. Krogh, Macro frictions and theoretical identification of the New Keynesian Phillips
Curve, Journal of Macroeconomics 43 (2015), 191–204.
[24] L. Magnusson and S. Mavroeidis, Identification using stability restrictions, Econometrica
82 (2014), 1799–1851.
[25] S. Mavroeidis, Identification issues in forward-looking models estimated by GMM, with
an application to the Phillips curve, Journal of Money, Credit and Banking 37 (2005),
421–448.
[26] J.L. Montiel-Olea and C. Pflueger, A robust test for weak instruments, Journal of Business
and Economic Statistics 31 (2013), 358–369.
[27] E. Nakamura and J. Steinsson, Five facts about prices: a reevaluation of menu cost models,
Quarterly Journal of Economics 123 (2008), 1415–1464.
[28] J. Nason and G.W. Smith, Identifying the New Keynesian Phillips Curve, Journal of
Applied Econometrics 23 (2008), 525–551.
[29] W.K. Newey and K.D. West, Automatic lag selection in covariance matrix estimation,
The Review of Economic Studies 61 (1994), 631–653.
[30] Z. Qu and P. Perron, Estimating and testing structural changes in multivariate regressions,
Econometrica 75 (2007), 459–502.
[31] R. Rigobon, Identification through heteroskedasticity, Review of Economics and Statistics
85 (2003), 777–792.
[32] J. Rudd and K. Whelan, New tests of the New-Keynesian Phillips Curve, Journal of
Monetary Economics 52 (2005), 1167–1181.
[33] A. M. Sbordone, Prices and unit labor costs: a new test of price stickiness, Journal of
Monetary Economics 49 (2002), 265–292.
[34] A.M. Sbordone, Do expected future marginal costs drive inflation dynamics?, Journal of
Monetary Economics 52 (2005), 1183–1197.
[35] F. Smets and R. Wouters, An estimated dynamic stochastic general equilibrium model of
the euro area, Journal of the European Economic Association 1 (2003), 1123–1175.
[36] D. Staiger and J. Stock, Instrumental variables regression with weak instruments, Econo-
metrica 65 (1997), 557–586.
[37] J. Stock, J. Wright, and M. Yogo, A survey of weak instruments and weak identification in Generalized Method of Moments, Journal of Business and Economic Statistics 20 (2002), 518–529.
[38] J.H. Stock and M.W. Watson, Has the business cycle changed and why?, ch. in NBER
Macroeconomics Annual, pp. 159–230, edited by M. Gertler and K. Rogoff, 2002.
[39] M. Woodford, Interest and prices: foundations of a theory of monetary policy, Princeton
University Press, Princeton and Oxford, 2003.
[40] J.M. Wooldridge and H. White, Some invariance principles and central limit theorems
for dependent heterogeneous processes, Econometric Theory 4 (1988), 210–230.
[41] C. Zhang, D. R. Osborn, and D. H. Kim, The New Keynesian Phillips Curve: from sticky
inflation to sticky prices, Journal of Money, Credit and Banking 40 (2008), 667–699.
[42] C. Zhang, D.R. Osborn, and D.H. Kim, Observed inflation forecasts and the New Keyne-
sian Phillips Curve, Oxford Bulletin of Economics and Statistics 71 (2009), 375–398.
Appendix
Appendix A contains all the tables and graphs. Appendix B contains the proofs of the theo-
retical results stated in sections 3-5. Finally, in Appendix C we show the applicability of our
B-GMM estimation procedure in the extended framework of section 7.4.
A Tables and Figures
Table 1: Unrestricted Structural Form Estimation
Method αc αb,1 αb,2 αb,3 95% CI αb,3 αf 95% CI αf αy 95% CI αy
Panel A: Full sample 1978:5-2010:6
2SLS −0.37∗∗ −0.00 −0.22∗∗ 0.45∗∗∗ [0.20, 0.70] 0.44∗∗∗ [0.23, 0.65] 0.15 [−0.06, 0.35]
(0.16) (0.08) (0.09) (0.13) (0.11) (0.11)
GMM −0.54∗∗∗ −0.04 −0.20∗∗ 0.35∗∗∗ [0.10, 0.60] 0.55∗∗∗ [0.35, 0.75] 0.13 [−0.06, 0.32]
(0.15) (0.08) (0.08) (0.13) (0.10) (0.10)
B-2SLS −0.40∗∗∗ −0.01 −0.22∗∗∗ 0.44∗∗∗ [0.17, 0.70] 0.46∗∗∗ [0.26, 0.66] 0.17∗∗ [0.02, 0.31]
(0.16) (0.08) (0.09) (0.13) (0.11) (0.11)
B-GMM −0.47∗∗∗ −0.04 −0.21∗∗∗ 0.40∗∗∗ [0.33, 0.60] 0.51∗∗∗ [0.45, 0.57] 0.16∗∗∗ [0.08, 0.25]
(0.08) (0.03) (0.04) (0.03) (0.03) (0.04)
Panel B: Post Great Moderation sample 1981:12-2010:6
2SLS 0.78 0.06 −0.23∗∗∗ 0.51∗∗∗ [0.25, 0.78] 0.02 [−0.49, 0.53] 0.17∗ [−0.02, 0.37]
(0.60) (0.09) (0.09) (0.14) (0.26) (0.10)
GMM 0.20 −0.03 −0.19∗∗∗ 0.38∗∗∗ [0.14, 0.62] 0.29 [−0.17, 0.75] 0.18∗ [−0.00, 0.36]
(0.54) (0.08) (0.08) (0.12) (0.23) (0.09)
B-2SLS 0.59 0.05 −0.23∗∗∗ 0.50∗∗∗ [0.20, 0.81] 0.09 [−0.47, 0.65] 0.16∗∗ [0.01, 0.31]
(0.60) (0.09) (0.09) (0.14) (0.26) (0.10)
B-GMM 0.50∗∗ 0.03 −0.22∗∗∗ 0.48∗∗∗ [0.40, 0.62] 0.13 [−0.05, 0.32] 0.15∗∗∗ [0.08, 0.23]
(0.24) (0.04) (0.04) (0.04) (0.09) (0.04)
Panel C: Post Great Moderation and Pre-2007 Crisis sample 1981:12-2006:12
2SLS 0.02 −0.00 −0.30∗∗∗ 0.35∗∗∗ [0.16, 0.53] 0.41∗∗ [0.02, 0.80] 0.24∗∗ [0.02, 0.45]
(0.48) (0.07) (0.08) (0.09) (0.20) (0.11)
GMM −0.16 0.01 −0.31∗∗∗ 0.35∗∗∗ [0.18, 0.53] 0.46∗∗ [0.08, 0.84] 0.22∗∗ [0.02, 0.41]
(0.47) (0.07) (0.08) (0.09) (0.20) (0.10)
B-2SLS −0.22 −0.03 −0.30∗∗∗ 0.32∗∗∗ [0.17, 0.48] 0.51∗∗∗ [0.20, 0.81] 0.22∗∗ [0.04, 0.41]
(0.48) (0.07) (0.08) (0.09) (0.20) (0.11)
B-GMM −0.32 −0.02 −0.30∗∗∗ 0.32∗∗∗ [0.26, 0.53] 0.54∗∗∗ [0.33, 0.74] 0.16∗∗∗ [0.05, 0.27]
(0.28) (0.05) (0.04) (0.03) (0.11) (0.05)
All GMM and B-GMM estimates are second step estimates with HAC robust standard errors. 2SLS refers to
the standard 2SLS on the full sample. ∗∗∗ indicates statistical significance at 1%, ∗∗ at 5%, and ∗ at 10%.
Table 2: Assessing the identification strength
Reduced rank test of Kleibergen and Paap (2006) for the baseline model (7.2)-(7.3)
Sample Full sample Pre Great Moderation Post Great Moderation
1978:5-2010:6 1978:5-1981:11 1981:12-1985:10 1985:11-2010:6
p-value 0.0000 0.0000 0.0000 0.0000
Figure 1: Inflation, Inflation Expectations and Output Gap
[Time-series plot of π_t, π_t^e, and y_t over 1978:5-2010:6.]
Table 3: Restricted Structural Form Estimation
Method ψc ρ1 ρ2 ρ3 95% CI ρ3 θ 95% CI θ ρ 95% CI ρ
Panel A: Full sample 1978:5-2010:6
GMM −1.36∗∗∗ 0.27 0.23 0.50∗∗ [0.07, 0.93] 0.9970 [−77.99, 79.99] 1.00∗ [−0.15, 2.15]
(0.23) (0.22) (0.17) (0.22) (40.30) (0.59)
B-GMM −1.36∗∗∗ 0.26∗∗ 0.18∗∗ 0.45∗∗∗ [0.29, 0.61] 0.9922 [−10.28, 12.27] 0.90∗∗∗ [0.38, 1.41]
(0.14) (0.12) (0.08) (0.08) (5.75) (0.26)
Panel B: Post Great Moderation sample 1981:12-2010:6
GMM −1.74∗∗∗ −0.01 0.03 0.26 [−0.08, 0.60] 0.8023∗∗∗ [0.31, 1.29] 0.29 [−0.64, 1.21]
(0.31) (0.18) (0.13) (0.17) (0.25) (0.48)
B-GMM −1.55∗∗∗ 0.13 0.16∗ 0.41∗∗∗ [0.18, 0.64] 0.8792∗∗ [0.18, 1.57] 0.70∗∗ [0.07, 1.33]
(0.18) (0.13) (0.09) (0.12) (0.35) (0.32)
Panel C: Post Great Moderation and Pre-2007 Crisis sample 1981:12-2006:12
GMM −1.89∗∗∗ −0.09 −0.07 0.29∗∗∗ [0.07, 0.50] 0.6823∗∗∗ [0.41, 0.95] 0.13 [−0.42, 0.68]
(0.23) (0.11) (0.08) (0.11) (0.14) (0.28)
B-GMM −1.94∗∗∗ −0.11 −0.05 0.24∗∗∗ [0.10, 0.38] 0.6821∗∗∗ [0.51, 0.85] 0.08 [−0.27, 0.44]
(0.17) (0.08) (0.05) (0.07) (0.09) (0.18)
All estimates are second step estimates with HAC robust standard errors. Confidence intervals for ρ =
ρ1+ ρ2+ ρ3 are obtained via the delta method. ∗∗∗ indicates statistical significance at 1%, ∗∗ at 5%, ∗ at 10%.
Table 4: Restricted Structural Form Estimation with ρ1 = ρ2 = 0
Method ψc ρ3 95% CI ρ3 θ 95% CI θ
Panel A: Full sample 1978:5-2010:6
GMM −1.69∗∗∗ 0.29∗∗∗ [0.19, 0.40] 0.7093∗∗∗ [0.30, 1.12]
(0.48) (0.05) (0.21)
B-GMM −1.73∗∗∗ 0.29∗∗∗ [0.22, 0.37] 0.9952 [−20.37, 22.36]
(0.25) (0.04) (10.90)
Panel B: Post Great Moderation sample 1981:12-2010:6
GMM −1.74∗∗∗ 0.29∗∗∗ [0.17, 0.40] 0.8651 [−0.33, 2.06]
(0.47) (0.06) (0.61)
B-GMM −1.76∗∗∗ 0.29∗∗∗ [0.19, 0.38] 0.9078 [−0.54, 2.35]
(0.36) (0.05) (0.74)
Panel C: Post Great Moderation and Pre-2007 Crisis sample 1981:12-2006:12
GMM −1.71∗∗∗ 0.37∗∗∗ [0.24, 0.50] 0.6638∗∗∗ [0.27, 1.06]
(0.58) (0.07) (0.20)
B-GMM −1.71∗∗∗ 0.33∗∗∗ [0.22, 0.43] 0.6288∗∗∗ [0.41, 0.85]
(0.40) (0.05) (0.11)
All estimates are second step estimates with HAC robust standard errors. ∗∗∗ indicates statistical significance
at 1%, ∗∗ at 5%, and ∗ at 10%.
Figure 2: Confidence regions for θ and ρ in the sample 1981:12-2006:12
[Two panels, GMM and B-GMM, each showing 90% and 95% confidence regions, with the indexation parameter ρ_3 on the horizontal axis and the price stickiness θ on the vertical axis.]
Table 5: Experiment 1 with known break and HOM
Benchmark case:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0023 0.0297 0.0298 0.1123 0.9340
B-2SLS 0.0023 0.0293 0.0294 0.1480 0.9860
GMM 0.0005 0.0342 0.0342 0.1305 0.9416
The concentration parameter µ_1^2 decreases from 40 to 8.4:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0031 0.0341 0.0342 0.1294 0.9342
B-2SLS 0.0030 0.0337 0.0338 0.1694 0.9852
GMM 0.0008 0.0416 0.0416 0.1590 0.9432
The concentration parameter µ_1^2 decreases from 40 to 1.6:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0035 0.0366 0.0367 0.1389 0.9356
B-2SLS 0.0034 0.0361 0.0363 0.1815 0.9850
GMM 0.0010 0.0464 0.0464 0.1775 0.9422
Increase the sample size from 400 to 800:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0010 0.0193 0.0193 0.0755 0.9478
B-2SLS 0.0010 0.0192 0.0192 0.0988 0.9882
GMM 0.0003 0.0219 0.0219 0.0862 0.9496
Increase the number of IV from 3 to 6:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0025 0.0209 0.0211 0.0778 0.9322
B-2SLS 0.0026 0.0205 0.0206 0.1032 0.9830
GMM 0.0014 0.0240 0.0241 0.0917 0.9416
Increase the endogeneity coefficient from 0.5 to 0.75:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0034 0.0296 0.0298 0.1122 0.9332
B-2SLS 0.0033 0.0293 0.0294 0.1478 0.9840
GMM 0.0008 0.0342 0.0342 0.1305 0.9400
Table 6: Experiment 1 with known break and HET1
Benchmark case:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0047 0.0447 0.0450 0.1581 0.9110
B-2SLS 0.0022 0.0442 0.0442 0.2178 0.9826
GMM 0.0015 0.0527 0.0527 0.1930 0.9242
The concentration parameter µ_1^2 decreases from 40 to 8.4:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0061 0.0512 0.0516 0.1819 0.9140
B-2SLS 0.0029 0.0506 0.0507 0.2484 0.9802
GMM 0.0022 0.0641 0.0641 0.2352 0.9254
The concentration parameter µ_1^2 decreases from 40 to 1.6:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0070 0.0548 0.0553 0.1952 0.9158
B-2SLS 0.0033 0.0542 0.0543 0.2660 0.9812
GMM 0.0027 0.0715 0.0715 0.2625 0.9270
Increase the sample size from 400 to 800:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0023 0.0303 0.0304 0.1149 0.9396
B-2SLS 0.0010 0.0300 0.0300 0.1540 0.9866
GMM 0.0009 0.0346 0.0346 0.1339 0.9446
Increase the number of IV from 3 to 6:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0043 0.0360 0.0362 0.1227 0.8996
B-2SLS 0.0018 0.0348 0.0348 0.1722 0.9862
GMM 0.0020 0.0411 0.0411 0.1496 0.9270
Increase the endogeneity coefficient from 0.5 to 0.75:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0068 0.0447 0.0452 0.1579 0.9074
B-2SLS 0.0031 0.0441 0.0442 0.2176 0.9804
GMM 0.0022 0.0527 0.0528 0.1931 0.9228
Table 7: Experiment 1 with known break and HET2
Benchmark case:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0015 0.0245 0.0246 0.0876 0.9342
B-2SLS 0.0015 0.0284 0.0284 0.1282 0.9894
GMM 0.0004 0.0305 0.0305 0.1106 0.9490
The concentration parameter µ_1^2 decreases from 40 to 8.4:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0019 0.0285 0.0286 0.1018 0.9344
B-2SLS 0.0020 0.0325 0.0326 0.1459 0.9884
GMM 0.0006 0.0372 0.0372 0.1350 0.9508
The concentration parameter µ_1^2 decreases from 40 to 1.6:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0022 0.0309 0.0309 0.1097 0.9358
B-2SLS 0.0023 0.0349 0.0350 0.1561 0.9874
GMM 0.0007 0.0414 0.0414 0.1508 0.9522
Increase the sample size from 400 to 800:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0006 0.0171 0.0171 0.0621 0.9340
B-2SLS 0.0007 0.0203 0.0203 0.0895 0.9880
GMM -0.0001 0.0201 0.0201 0.0746 0.9502
Increase the number of IV from 3 to 6:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0014 0.0165 0.0166 0.0578 0.9198
B-2SLS 0.0015 0.0211 0.0211 0.0899 0.9874
GMM 0.0005 0.0196 0.0196 0.0728 0.9408
Increase the endogeneity coefficient from 0.5 to 0.75:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0023 0.0245 0.0247 0.0875 0.9314
B-2SLS 0.0023 0.0284 0.0285 0.1281 0.9884
GMM 0.0006 0.0306 0.0306 0.1106 0.9498
Table 8: Experiment 1 with unknown break and HOM
Break size is equal to 1
Monte-Carlo average of the estimated break location is T̂ = 161.3
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0030 0.0290 0.0292 0.1124 0.9380
B-2SLS 0.0029 0.0287 0.0288 0.3827 1.0000
GMM -0.0003 0.0338 0.0338 0.1307 0.9490
Break size is equal to 0.5
Monte-Carlo average of the estimated break location is T̂ = 162.2
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0083 0.0461 0.0468 0.1771 0.9310
B-2SLS 0.0080 0.0454 0.0460 0.2865 0.9970
GMM -0.0000 0.0508 0.0508 0.1964 0.9470
Break size is equal to 0.2
Monte-Carlo average of the estimated break location is T̂ = 172.4
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0229 0.0686 0.0723 0.2619 0.9190
B-2SLS 0.0222 0.0675 0.0710 0.2430 0.9095
GMM 0.0008 0.0729 0.0729 0.2815 0.9475
The true break is T ∗ = 160.
Table 9: Experiment 1 with unknown break and HET1
Break size is equal to 1
Monte-Carlo average of the estimated break location is T̂ = 161.5
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0056 0.0447 0.0450 0.1580 0.9112
B-2SLS 0.0030 0.0441 0.0442 0.2184 0.9828
GMM 0.0015 0.0527 0.0527 0.1930 0.9242
Break size is equal to 0.5
Monte-Carlo average of the estimated break location is T̂ = 162.2
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0150 0.0711 0.0726 0.2480 0.8988
B-2SLS 0.0085 0.0701 0.0706 0.3510 0.9812
GMM 0.0033 0.0791 0.0792 0.2894 0.9228
Break size is equal to 0.2
Monte-Carlo average of the estimated break location is T̂ = 172.4
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0422 0.1059 0.1140 0.3594 0.8652
B-2SLS 0.0239 0.1042 0.1069 0.5387 0.9792
GMM 0.0066 0.1131 0.1133 0.4131 0.9218
The true break is T ∗ = 160.
Table 10: Experiment 1 with unknown break and HET2
Break size is equal to 1
Monte-Carlo average of the estimated break location is T̂ = 161.3
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0021 0.0245 0.0246 0.0876 0.9346
B-2SLS 0.0021 0.0284 0.0285 0.4288 0.9978
GMM 0.0004 0.0305 0.0305 0.1106 0.9490
Break size is equal to 0.5
Monte-Carlo average of the estimated break location is T̂ = 162.2
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0057 0.0382 0.0386 0.1362 0.9290
B-2SLS 0.0062 0.0454 0.0458 0.3189 0.9846
GMM 0.0009 0.0459 0.0459 0.1660 0.9482
Break size is equal to 0.2
Monte-Carlo average of the estimated break location is T̂ = 172.4
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0158 0.0557 0.0579 0.1976 0.9058
B-2SLS 0.0176 0.0696 0.0717 0.2687 0.8996
GMM 0.0018 0.0658 0.0658 0.2375 0.9458
The true break is T ∗ = 160.
Figure 3: Experiment 2 for model (i) and HOM (top) and HET2 (bottom)
[Four-panel plot. Left panels show MC-RMSE and right panels show MC standard deviation for B-GMM (red o), B-2SLS (blue x), and GMM (green +).]
Figure 4: Experiment 2 for model (ii) and HOM (top) and HET2 (bottom)
[Four-panel plot. Left panels show MC-RMSE and right panels show MC standard deviation for B-GMM (red o), B-2SLS (blue x), and GMM (green +).]
Table 11: Experiment 3: no break
HOM Monte-Carlo average of estimated break location is T = 199.9
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0504 0.0968 0.1091 0.3626 0.8770
B-2SLS 0.0489 0.0945 0.1064 0.5365 0.9758
GMM 0.0049 0.1037 0.1038 0.3949 0.9390
HET1 Monte-Carlo average of estimated break location is T = 199.9
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.1244 0.2032 0.2383 0.6751 0.8092
B-2SLS 0.0668 0.2041 0.2147 1.1031 0.9802
GMM 0.0175 0.2240 0.2246 0.8168 0.9200
T = 400, ρ = 0.5, q = 4, F1 = 33.
Table 12: Experiment 3: no break
HOM Monte-Carlo average of estimated break location is T = 396.6
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0249 0.0666 0.0711 0.2602 0.9184
B-2SLS 0.0245 0.0659 0.0703 0.3743 0.9854
GMM 0.0027 0.0687 0.0688 0.2711 0.9482
HET1 Monte-Carlo average of estimated break location is T = 396.6
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0694 0.1470 0.1625 0.5369 0.8826
B-2SLS 0.0351 0.1458 0.1500 0.8084 0.9848
GMM 0.0104 0.1525 0.1528 0.5911 0.9376
T = 800, ρ = 0.5, q = 4, F1 = 66.
B Proofs of Theorems
We assume throughout the proofs that λ < λ0. The proofs for λ ≥ λ0 are similar and omitted
for brevity. They are written for one break-point, but they generalize to multiple break-points
in a straightforward fashion, and this generalization is omitted for brevity.
Notation. For simplicity, we drop the subscripts RF, SMI, VMC and the superscripts BP, QP. Therefore, in all the proofs, we denote any true break-point by $T^0 = T_{1\lambda^0} = [T\lambda^0]$, any candidate break-point by $T_{1\lambda} = [T\lambda]$, and any estimated break-point by $\hat{T} = T_{1\hat{\lambda}} = [T\hat{\lambda}]$ (except in the proofs of Theorems 3-7, where the FGLS estimator is denoted by $\tilde{T} = T_{1\tilde{\lambda}} = [T\tilde{\lambda}]$ because it could otherwise be confused with other break-point estimators). Let $T_{2\lambda} = T - T_{1\lambda}$.
The subscripts $1\lambda$, $2\lambda$ and $\Delta$ on an estimator or a sum refer to estimation or summation over the segments $1,\ldots,T_{1\lambda}$, $T_{1\lambda}+1,\ldots,T$ and $T_{1\lambda}+1,\ldots,T_{1\lambda^0}$, respectively. Let $M_{2\lambda} = M_{11} - M_{1\lambda}$, $M_i = M_{i\lambda^0}$ for $i = 1, 2$, and $M_\Delta = M_{1\lambda^0} - M_{1\lambda}$. Let $\Pi^0_{iT} \stackrel{def}{=} \Pi_i/r_{iT}$ and $\Pi^a_{iT} = \Pi^a_i/r_{iT}$, where $\Pi^a_i$ is defined in Theorem 1. When there is no potential for confusion, we also write $\Pi^0_i = \Pi_i/r_{iT}$. We let $\Psi^{zu}_{i\lambda} = T^{-1/2}\sum_{i\lambda} Z_t u_t$ for $i = 1, 2$ and $\Psi^{zu}_\Delta = T^{-1/2}\sum_\Delta Z_t u_t$. Also, $\Psi^{zv}_{i\lambda} = T^{-1/2}\sum_{i\lambda} Z_t v_{t,1}$ for $i = 1, 2$ and $\Psi^{zv}_\Delta = T^{-1/2}\sum_\Delta Z_t v_{t,1}$. Let $R_T = \mathrm{diag}(I_{p_1}, T^\alpha I_{p_2})$. Also, u.$\lambda$. means uniformly in $\lambda \in (0,1]$; $\mathrm{diag}(A_1, A_2)$ creates a block-diagonal matrix with the matrices $A_1, A_2$ on the main diagonal; $\mathrm{vec}(A)$ stacks the elements of the matrix $A$ into a vector, in order, column by column; and $\mathrm{vech}(A)$ does the same but removes the elements that repeat. Let $\|v\|$ denote the Euclidean norm for vectors $v$, $\|J\|$ the square root of the maximum eigenvalue of $J'J$ for matrices, and $\|\cdot\|_p = [E(\|\cdot\|^p)]^{1/p}$ the $L_p$ norm. If similar quantities appear across proofs, their notation is repeated unless there is potential for confusion.
Proof of Theorem 1.
• Part (i): Asymptotic distribution of GMM. By A1(ii), the full-sample moment conditions are satisfied. Let $\hat{N}_u^{-1}$ be the weighting matrix. Then:

$\hat{\theta}_{GMM} = \left[\frac{W'Z}{T}\hat{N}_u^{-1}\frac{Z'W}{T}\right]^{-1}\frac{W'Z}{T}\hat{N}_u^{-1}\frac{Z'}{T}(W\theta^0 + U)$

$\Leftrightarrow\; T^{1/2}R_T^{-1}(\hat{\theta}_{GMM}-\theta^0) = \left[R_T\frac{W'Z}{T}\hat{N}_u^{-1}\frac{Z'W}{T}R_T\right]^{-1}R_T\frac{W'Z}{T}\hat{N}_u^{-1}\frac{Z'U}{T^{1/2}}$

$R_T T^{-1}W'Z = R_T\Pi^{a\prime}_{1T}M_{1\lambda^0} + R_T\Pi^{a\prime}_{2T}M_{2\lambda^0} + R_T T^{-1/2}\Psi^{zu}_{1\lambda^0} + R_T T^{-1/2}\Psi^{zu}_{2\lambda^0} = R_T\Pi^{a\prime}_{1T}M_1 + R_T\Pi^{a\prime}_{2T}M_2 + o_P(1),$

since $\Psi^{zu}_{i\lambda^0} = T^{-1/2}\sum_{i\lambda^0} z_t u_t = O_P(1)$ by A1, A4 and the functional central limit theorem in Wooldridge and White (1988), Theorem 2.11 (FCLT). Also, by the CLT, $T^{-1/2}Z'U \stackrel{D}{\rightarrow} \mathcal{N}(0, N_u)$.
Case (a): $\alpha = \alpha_1 = \alpha_2$. If we let $\mu_i' = \Pi^{a\prime}_i M_i$, then $R_T T^{-1}W'Z = \Pi^{a\prime}_1 M_1 + \Pi^{a\prime}_2 M_2 + o_P(1) = \mu_1' + \mu_2' + o_P(1)$. Hence, using the optimal GMM estimator with $\hat{N}_u \stackrel{P}{\rightarrow} N_u$,

$T^{1/2}R_T^{-1}(\hat{\theta}_{GMM}-\theta^0) \stackrel{D}{\rightarrow} \mathcal{N}\left(0, [(\mu_1+\mu_2)'N_u^{-1}(\mu_1+\mu_2)]^{-1}\right) = \mathcal{N}(0, V_{GMM}).$

Case (b): $\alpha = \alpha_i < \alpha_j$. Then, as before,

$R_T T^{-1}W'Z = R_T\Pi^{a\prime}_{1T}M_1 + R_T\Pi^{a\prime}_{2T}M_2 + o_P(1) \stackrel{P}{\rightarrow} \Pi^{a\prime}_1 M_1 + \Pi^{a\prime}_2 M_2 = \mu_1' + \mu_2',$

where now $\Pi^a_j = [\Pi_{z1}, O_{(q,p_2)}]$. Since $T^{-1/2}Z'U \stackrel{D}{\rightarrow} \mathcal{N}(0, N_u)$, the optimal GMM estimator is obtained for $\hat{N}_u \stackrel{P}{\rightarrow} N_u$, and so

$T^{1/2}R_T^{-1}(\hat{\theta}_{GMM}-\theta^0) \stackrel{D}{\rightarrow} \mathcal{N}\left(0, [(\mu_1+\mu_2)'N_u^{-1}(\mu_1+\mu_2)]^{-1}\right).$

This also shows consistency: $\hat{\theta}_{GMM}-\theta^0 = R_T O_P(1) T^{-1}Z'u = O_P(T^{-1/2}r_T) = o_P(1)$.
• Part (ii): Asymptotic distribution of B-GMM. Let $Z_A$ be the $(\hat{T}, q)$ matrix with rows $Z_1',\ldots,Z_{\hat{T}}'$ and $Z_B$ the $(T-\hat{T}, q)$ matrix with rows $Z_{\hat{T}+1}',\ldots,Z_T'$. With the weighting matrix $(\hat{N}^a_u)^{-1}$, $\hat{\theta}_{B\text{-}GMM} = [W'\mathcal{Z}(\hat{N}^a_u)^{-1}\mathcal{Z}'W]^{-1}W'\mathcal{Z}(\hat{N}^a_u)^{-1}\mathcal{Z}'(W\theta^0 + U)$, where $\mathcal{Z} = \mathrm{diag}(Z_A/\hat{T},\; Z_B/(T-\hat{T}))$ (since the scalings on $Z_A, Z_B$ cancel out in the B-GMM formula). So,

$T^{1/2}R_T^{-1}[\hat{\theta}_{B\text{-}GMM}-\theta^0] = [R_T W'\mathcal{Z}(\hat{N}^a_u)^{-1}\mathcal{Z}'WR_T]^{-1}R_T W'\mathcal{Z}(\hat{N}^a_u)^{-1}[\mathcal{Z}'U\sqrt{T}]$ (B.1)

Because we assume that $\hat{\lambda}-\lambda^0 = O_P(T^{2\alpha-1})$, it can be shown, by arguments similar to the proof of Theorem 8 in HHB (see their supplemental appendix), that $R_T T^{-1}\sum_{i\hat{\lambda}}W_t Z_t' - R_T\Pi^{a\prime}_{iT}M_i = o_P(1)$ for $i = 1, 2$. It follows that:

$R_T\sum_{1\hat{\lambda}}W_t Z_t'/\hat{T} = R_T\frac{T}{\hat{T}}\Big[T^{-1}\sum_{1\hat{\lambda}}W_t Z_t'\Big] = R_T\Pi^{a\prime}_{1T}M_1/\lambda^0 + o_P(1).$

Similarly, $R_T\sum_{2\hat{\lambda}}W_t Z_t'/(T-\hat{T}) = R_T\Pi^{a\prime}_{2T}M_2/(1-\lambda^0) + o_P(1)$, hence:

$R_T W'\mathcal{Z} = [R_T\Pi^{a\prime}_{1T}M_1/\lambda^0,\; R_T\Pi^{a\prime}_{2T}M_2/(1-\lambda^0)] + o_P(1) = K + o_P(1).$ (B.2)

On the other hand, $\mathcal{Z}'U\sqrt{T} = [\sqrt{T}\sum_{1\hat{\lambda}}u_t Z_t'/\hat{T},\; \sqrt{T}\sum_{2\hat{\lambda}}u_t Z_t'/(T-\hat{T})]'$. Because $\hat{\lambda}-\lambda^0 = O_P(T^{2\alpha-1})$, it can be shown, by arguments similar to the proof of Theorem 8 in HHB, that $T^{-1/2}\sum_{i\hat{\lambda}}Z_t u_t - T^{-1/2}\sum_{i\lambda^0}Z_t u_t = o_P(1)$ for $i = 1, 2$. So:

$\sqrt{T}\sum_{1\hat{\lambda}}u_t Z_t/\hat{T} = T(\hat{T})^{-1}\,T^{-1/2}\sum_{1\hat{\lambda}}u_t Z_t + o_P(1) \stackrel{D}{\rightarrow} \mathcal{N}(0, N_{u,1}/(\lambda^0)^2), \qquad \sqrt{T}\sum_{2\hat{\lambda}}u_t Z_t/(T-\hat{T}) \stackrel{D}{\rightarrow} \mathcal{N}(0, N_{u,2}/(1-\lambda^0)^2).$

Because $\sqrt{T}\sum_{1\hat{\lambda}}u_t Z_t'/\hat{T}$ and $\sqrt{T}\sum_{2\hat{\lambda}}u_t Z_t'/(T-\hat{T})$ are asymptotically independent by A1(ii), $\mathcal{Z}'U\sqrt{T} \stackrel{D}{\rightarrow} \mathcal{N}(0, N^a_u)$, where $N^a_u = \mathrm{diag}[N_{u,1}/(\lambda^0)^2,\; N_{u,2}/(1-\lambda^0)^2]$. From the latter and (B.1)-(B.2), $T^{1/2}R_T^{-1}[\hat{\theta}_{B\text{-}GMM}-\theta^0] = V_{B\text{-}GMM}^{1/2}\,\mathcal{N}(0, I_p) + o_P(1)$, where

$V_{B\text{-}GMM} = [K(\hat{N}^a_u)^{-1}K']^{-1}K(\hat{N}^a_u)^{-1}N^a_u(\hat{N}^a_u)^{-1}K'[K(\hat{N}^a_u)^{-1}K']^{-1}.$

Setting $\hat{N}^a_u \stackrel{P}{\rightarrow} N^a_u$ yields $V_{B\text{-}GMM} = [K(N^a_u)^{-1}K']^{-1} + o_P(1)$. Since

$R_T W'\mathcal{Z} = [R_T\Pi^{a\prime}_{1T}M_1/\lambda^0,\; R_T\Pi^{a\prime}_{2T}M_2/(1-\lambda^0)] + o_P(1) = [\mu_1'/\lambda^0,\; \mu_2'/(1-\lambda^0)] + o_P(1),$

$V_{B\text{-}GMM} = [\mu_1'(N_{u,1})^{-1}\mu_1 + \mu_2'(N_{u,2})^{-1}\mu_2]^{-1}.$

These results also imply that $\hat{\theta}_{B\text{-}GMM}-\theta^0 = R_T O_P(1)(\mathcal{Z}'u) = o_P(1)$.
Since B-2SLS is a special case of B-GMM with the weighting matrix $\mathrm{diag}((\hat{T})^{-1}Z_A'Z_A,\; (T-\hat{T})^{-1}Z_B'Z_B) \stackrel{P}{\rightarrow} \mathrm{diag}(M_1/\lambda^0,\; M_2/(1-\lambda^0))$, the proof follows in a similar fashion.
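As a numerical illustration of the split-sample construction above, the following sketch (our own illustration, not the authors' code) compares full-sample 2SLS with a B-2SLS-style estimator that fits the first stage separately in each regime. The break date, instrument strengths and error correlation below are made-up simulation choices; the paper's B-GMM additionally re-weights the two blocks of moments by their subsample variances.

```python
import numpy as np

rng = np.random.default_rng(0)
T, T0, theta0, q = 400, 160, 1.0, 4
Pi1 = np.full(q, 1.0)          # stronger first stage before the break
Pi2 = np.full(q, 0.2)          # weaker first stage after the break
Z = rng.standard_normal((T, q))
v = rng.standard_normal(T)
u = 0.5 * v + rng.standard_normal(T)   # endogeneity: corr(u, v) > 0
x = np.where(np.arange(T) < T0, Z @ Pi1, Z @ Pi2) + v
y = theta0 * x + u

def tsls(y, x, Z):
    # standard 2SLS with a single endogenous regressor
    xhat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    return float(xhat @ y / (xhat @ x))

# full-sample 2SLS ignores the first-stage break
theta_full = tsls(y, x, Z)

# B-2SLS-style estimator: fit the first stage separately in each regime,
# then use the regime-specific fitted values as instruments
xhat = np.concatenate([
    Z[:T0] @ np.linalg.lstsq(Z[:T0], x[:T0], rcond=None)[0],
    Z[T0:] @ np.linalg.lstsq(Z[T0:], x[T0:], rcond=None)[0],
])
theta_split = float(xhat @ y / (xhat @ x))
print(theta_full, theta_split)
```

Both estimators are consistent here; the efficiency gain of the split-sample version shows up in its (Monte Carlo) variance, as documented in the simulation tables above.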
Proof of Theorem 2. Recall that:

$V_{B\text{-}2SLS} = [\Pi^{a\prime}_1 M_1\Pi^a_1 + \Pi^{a\prime}_2 M_2\Pi^a_2]^{-1}[\Pi^{a\prime}_1 N_{u,1}\Pi^a_1 + \Pi^{a\prime}_2 N_{u,2}\Pi^a_2][\Pi^{a\prime}_1 M_1\Pi^a_1 + \Pi^{a\prime}_2 M_2\Pi^a_2]^{-1}$
$V_{GMM} = [(\Pi^{a\prime}_1 M_1 + \Pi^{a\prime}_2 M_2)(N_{u,1}+N_{u,2})^{-1}(M_1\Pi^a_1 + M_2\Pi^a_2)]^{-1}$
$V_{B\text{-}GMM} = [\Pi^{a\prime}_1 M_1 N_{u,1}^{-1}M_1\Pi^a_1 + \Pi^{a\prime}_2 M_2 N_{u,2}^{-1}M_2\Pi^a_2]^{-1}.$
Since $\hat{\theta}_{B\text{-}GMM}$ is the optimal version of $\hat{\theta}_{B\text{-}2SLS}$, $V_{B\text{-}GMM} \leq V_{B\text{-}2SLS}$. We show below that $V_{B\text{-}GMM} \leq V_{GMM}$. Let $N_u = N_{u,1} + N_{u,2}$, $\mu = \mathrm{vec}[\mu_1\; \mu_2]$, $\vartheta = \mathrm{vec}[\vartheta_1\; \vartheta_2]$ with $\vartheta_i = \mu_i a$ for any $(p,1)$ vector $a$, and $L = N_{u,2}N_{u,1}^{-1}$. Then:

$a'(V_{B\text{-}GMM}^{-1} - V_{GMM}^{-1})a = \vartheta_1'N_{u,1}^{-1}\vartheta_1 + \vartheta_2'N_{u,2}^{-1}\vartheta_2 - (\vartheta_1 + \vartheta_2)'N_u^{-1}(\vartheta_1 + \vartheta_2)$
$= \vartheta'\begin{pmatrix} N_{u,1}^{-1} & O \\ O & N_{u,2}^{-1} \end{pmatrix}\vartheta - \vartheta'\begin{pmatrix} N_u^{-1} & N_u^{-1} \\ N_u^{-1} & N_u^{-1} \end{pmatrix}\vartheta = \vartheta'\begin{pmatrix} N_{u,1}^{-1} - N_u^{-1} & -N_u^{-1} \\ -N_u^{-1} & N_{u,2}^{-1} - N_u^{-1} \end{pmatrix}\vartheta = \vartheta'\begin{pmatrix} N_u^{-1}L & -N_u^{-1} \\ -N_u^{-1} & N_u^{-1}L^{-1} \end{pmatrix}\vartheta \equiv f(\vartheta).$

It is well known that, for any matrix $M^* = \begin{bmatrix} A & B \\ B' & C \end{bmatrix}$, where $A, C$ are square symmetric matrices of the same dimension and $C$ is pd, $M^*$ is psd (positive semi-definite) iff $A - BC^{-1}B'$ (the Schur complement of $C$) is psd. In our case, $M^* = \begin{bmatrix} N_u^{-1}L & -N_u^{-1} \\ -N_u^{-1} & N_u^{-1}L^{-1} \end{bmatrix}$, $N_u^{-1}L^{-1} = N_{u,2}^{-1} - (N_{u,1} + N_{u,2})^{-1}$ is pd by construction, and its Schur complement is $N_u^{-1}L - N_u^{-1}LN_uN_u^{-1} = O$. Thus, $M^*$ is psd, and $f(\vartheta) \geq 0$. This implies that $V_{B\text{-}GMM}^{-1} \geq V_{GMM}^{-1}$, so $V_{B\text{-}GMM} \leq V_{GMM}$. Moreover, because $f(\vartheta) \geq 0$ and is convex, its minimum is attained at 0 only for the $\vartheta$ values that satisfy $\partial f(\vartheta)/\partial\vartheta' = 0$:

$\frac{\partial f(\vartheta)}{\partial\vartheta_1} = 2N_u^{-1}(L\vartheta_1 - \vartheta_2) = 0; \qquad \frac{\partial f(\vartheta)}{\partial\vartheta_2} = 2N_u^{-1}(L^{-1}\vartheta_2 - \vartheta_1) = 0.$

Thus, $V_{GMM} = V_{B\text{-}GMM}$ for $\vartheta \neq 0$ iff $L\vartheta_1 = \vartheta_2$. Equivalently, $V_{GMM} = V_{B\text{-}GMM}$ for $a \neq 0$ iff $(N_{u,1}^{-1}M_1\Pi^a_1 - N_{u,2}^{-1}M_2\Pi^a_2)a = 0$. This holds for all $a \neq 0$ when $N_{u,1}^{-1}M_1\Pi^a_1 = N_{u,2}^{-1}M_2\Pi^a_2$. If $\mathrm{rank}(N_{u,1}^{-1}M_1\Pi^a_1 - N_{u,2}^{-1}M_2\Pi^a_2) = p$ (full rank), it cannot hold for any $a \neq 0$, because for any $a \neq 0$, $\mathrm{rank}[(N_{u,1}^{-1}M_1\Pi^a_1 - N_{u,2}^{-1}M_2\Pi^a_2)a] = 1$, so the minimum of $f(\vartheta)$ is achieved (at 0) only for $a = 0$ and it is unique. So, $f(\vartheta) > 0$ for all $a \neq 0$, implying that, when $\mathrm{rank}(N_{u,1}^{-1}M_1\Pi^a_1 - N_{u,2}^{-1}M_2\Pi^a_2) = p$, $V_{GMM} - V_{B\text{-}GMM}$ is pd (positive definite).
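The inequality $V_{B\text{-}GMM} \leq V_{GMM}$ can be spot-checked numerically: for arbitrary positive-definite $N_{u,1}, N_{u,2}$ and arbitrary Jacobian-type matrices $\mu_1, \mu_2$, the difference $V^{-1}_{B\text{-}GMM} - V^{-1}_{GMM}$ should be positive semi-definite. A minimal sketch (all inputs randomly generated for illustration; not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
q, p = 4, 2

def rand_pd(n):
    # random symmetric positive-definite matrix
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

# random subsample long-run variances and moment Jacobians mu_i = M_i Pi_i^a
Nu1, Nu2 = rand_pd(q), rand_pd(q)
mu1, mu2 = rng.standard_normal((q, p)), rng.standard_normal((q, p))
Nu = Nu1 + Nu2

Vinv_bgmm = mu1.T @ np.linalg.solve(Nu1, mu1) + mu2.T @ np.linalg.solve(Nu2, mu2)
Vinv_gmm = (mu1 + mu2).T @ np.linalg.solve(Nu, mu1 + mu2)

# by the Schur-complement argument, this difference is psd
eigs = np.linalg.eigvalsh(Vinv_bgmm - Vinv_gmm)
print(eigs.min())
```

The minimum eigenvalue is nonnegative (up to numerical noise) for any draw, matching the matrix inequality established in the proof.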
• Derivations for comment (iv) after Theorem 2. Under conditional homoskedasticity, we have the usual result that $V_{B\text{-}GMM} = V_{B\text{-}2SLS}$. Also, $N_{u,1} = \lambda^0\Phi_u Q$, $N_{u,2} = (1-\lambda^0)\Phi_u Q$, $M_1 = \lambda^0 Q$, $M_2 = (1-\lambda^0)Q$, and for any $(p,1)$ vector $a \neq 0$,

$\Phi_u V_{B\text{-}GMM}^{-1} = \lambda^0\Pi^{a\prime}_1 Q\Pi^a_1 + (1-\lambda^0)\Pi^{a\prime}_2 Q\Pi^a_2$
$\Phi_u V_{GMM}^{-1} = [\Pi^{a\prime}_1\lambda^0 + \Pi^{a\prime}_2(1-\lambda^0)]\,Q\,[\Pi^a_1\lambda^0 + \Pi^a_2(1-\lambda^0)]$
$\Phi_u\, a'(V_{GMM}^{-1} - V_{B\text{-}GMM}^{-1})a = -\lambda^0(1-\lambda^0)\,[(\Pi^a_1 - \Pi^a_2)a]'\,Q\,[(\Pi^a_1 - \Pi^a_2)a] < 0,$

so $V_{GMM} - V_{B\text{-}GMM}$ is pd.
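The displayed identity can also be verified numerically. The sketch below (our illustration, with arbitrary $Q$, $\Pi^a_1$, $\Pi^a_2$, $\lambda^0$ and $\Phi_u$) checks that $\Phi_u\,a'(V^{-1}_{GMM}-V^{-1}_{B\text{-}GMM})a$ equals $-\lambda^0(1-\lambda^0)[(\Pi^a_1-\Pi^a_2)a]'Q[(\Pi^a_1-\Pi^a_2)a]$:

```python
import numpy as np

rng = np.random.default_rng(2)
q, p, lam, Phi = 4, 2, 0.4, 1.7
A = rng.standard_normal((q, q))
Q = A @ A.T + q * np.eye(q)        # arbitrary pd second-moment matrix
P1, P2 = rng.standard_normal((q, p)), rng.standard_normal((q, p))
a = rng.standard_normal(p)

# homoskedastic structure: N_{u,i} proportional to M_i = fraction * Q
Nu1, Nu2 = lam * Phi * Q, (1 - lam) * Phi * Q
M1, M2 = lam * Q, (1 - lam) * Q
mu1, mu2 = M1 @ P1, M2 @ P2

Vinv_bgmm = mu1.T @ np.linalg.solve(Nu1, mu1) + mu2.T @ np.linalg.solve(Nu2, mu2)
Vinv_gmm = (mu1 + mu2).T @ np.linalg.solve(Nu1 + Nu2, mu1 + mu2)

lhs = Phi * (a @ (Vinv_gmm - Vinv_bgmm) @ a)
rhs = -lam * (1 - lam) * ((P1 - P2) @ a) @ Q @ ((P1 - P2) @ a)
print(lhs, rhs)
```

The two sides agree to machine precision, and both are strictly negative whenever $(\Pi^a_1-\Pi^a_2)a \neq 0$.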
• Derivations for comment (ii) after A6. Let, wlog, $r_{1T} = o(r_{2T})$. Then $\Pi^a_1 = [\Pi_{z1}, \Pi_1]$ and $\Pi^a_2 = [\Pi_{z1}, O_{(q,p_2)}]$. Then:

$V_{GMM}^{-1} = \begin{bmatrix} A_1 & B_1 \\ B_1' & C_1 \end{bmatrix}, \qquad V_{B\text{-}GMM}^{-1} = \begin{bmatrix} A_2 & B_2 \\ B_2' & C_2 \end{bmatrix},$

where $A_1 = \Pi_{z1}'(MN_u^{-1}M)\Pi_{z1}$, $B_1 = \Pi_{z1}'(MN_u^{-1}M_1)\Pi_1$, $C_1 = \Pi_1'(M_1N_u^{-1}M_1)\Pi_1$, $A_2 = \Pi_{z1}'(M_1N_{u,1}^{-1}M_1 + M_2N_{u,2}^{-1}M_2)\Pi_{z1}$, $B_2 = \Pi_{z1}'(M_1N_{u,1}^{-1}M_1)\Pi_1$, and $C_2 = \Pi_1'(M_1N_{u,1}^{-1}M_1)\Pi_1$. Let $D_i = C_i - B_i'A_i^{-1}B_i$ be the Schur complement of $A_i$, $i = 1, 2$. Using the formula for partitioned inverses,

$V_{GMM} - V_{B\text{-}GMM} = \begin{bmatrix} A_1^{-1} - A_2^{-1} + A_1^{-1}B_1D_1^{-1}B_1'A_1^{-1} - A_2^{-1}B_2D_2^{-1}B_2'A_2^{-1} & -A_1^{-1}B_1D_1^{-1} + A_2^{-1}B_2D_2^{-1} \\ -D_1^{-1}B_1'A_1^{-1} + D_2^{-1}B_2'A_2^{-1} & D_1^{-1} - D_2^{-1} \end{bmatrix}.$

The B-GMM estimates of $\theta^0_x$ are strictly more efficient than GMM, i.e. the lower-right $(p_2, p_2)$ block of $V_{GMM} - V_{B\text{-}GMM}$ is pd, iff $D_1^{-1} - D_2^{-1}$ is pd or, equivalently, $D_2 - D_1 = C_2 - C_1 + B_1'A_1^{-1}B_1 - B_2'A_2^{-1}B_2$ is pd.
Under A6, $M_1 = \lambda^0 M$, $M_2 = (1-\lambda^0)M$, $N_{u,1} = \lambda^0 N_u$, $N_{u,2} = (1-\lambda^0)N_u$. Let $\Omega = MN_u^{-1}M$. Then $C_2 - C_1 = \lambda^0(1-\lambda^0)\Pi_1'\Omega\Pi_1$, which is pd because $\Pi_1$ has full rank by Assumption A2(ii). On the other hand, $B_2 = \lambda^0\Pi_{z1}'\Omega\Pi_1 = B_1$, and $A_2 = \Pi_{z1}'(\lambda^0\Omega + (1-\lambda^0)\Omega)\Pi_{z1} = \Pi_{z1}'\Omega\Pi_{z1} = A_1$, so $D_2 - D_1 = C_2 - C_1$ is pd.
Proof of Theorem 3.
Part (i): Consistency and rate of convergence of $\hat{\lambda}$. Note that the BP estimator is univariate, so in the proof of part (i) we treat $X_t$ as a scalar.
• Consistency of $\hat{\lambda}$. Let $\hat{v}_t = X_t - Z_t'\hat{\Pi}_{1\lambda}$ on the interval $[1, \hat{T}]$, $\hat{v}_t = X_t - Z_t'\hat{\Pi}_{2\lambda}$ on the interval $[\hat{T}+1, T]$, and $d_t = \hat{v}_t - v_t$. By definition of the sum of squared residuals,

$\sum_{t=1}^T \hat{v}_t^2 \leq \sum_{t=1}^T v_t^2 \;\Rightarrow\; 2\sum_{t=1}^T v_t d_t + \sum_{t=1}^T d_t^2 \leq 0.$ (B.3)

Recall that $r_T = T^\alpha$. We show consistency by contradiction in two steps. In step 1, we show:

$T^{2\alpha-1}\sum_{t=1}^T v_t d_t \equiv T^{2\alpha}L_1 = o_P(1) \quad\text{and}\quad T^{2\alpha-1}\sum_{t=1}^T d_t^2 \equiv T^{2\alpha}L_2 = O_P(1).$ (B.4)

Therefore, $T^{2\alpha}L_2$ dominates $T^{2\alpha}L_1$. Substituting (B.4) into (B.3), we have $T^{2\alpha}L_2 + o_P(1) \leq 0$. It follows that $T^{2\alpha}L_2 = o_P(1)$. In step 2, we show that if $\hat{\lambda}$ does not converge in probability to $\lambda^0$, then, with positive constant probability, $T^{2\alpha}L_2 > 0$, contradicting $T^{2\alpha}L_2 = o_P(1)$. Thus, $\hat{\lambda} \stackrel{P}{\rightarrow} \lambda^0$.
- Step 1. Recall that $\Pi^0_i = \Pi_i/r_{iT}$ (ignoring the dependence on $T$ for simplicity), and let $\Psi^{zv}_{i\lambda} = T^{-1/2}\sum_{i\lambda}Z_t v_t$ for $i = 1, 2$ and $\Psi^{zv}_\Delta = T^{-1/2}\sum_\Delta Z_t v_t$. Note that:

$d_t = \hat{v}_t - v_t = \begin{cases} X_t - Z_t'\hat{\Pi}_{1\lambda} - v_t, & t \leq \hat{T} \\ X_t - Z_t'\hat{\Pi}_{2\lambda} - v_t, & t > \hat{T} \end{cases} = \begin{cases} Z_t'(\Pi^0_1 - \hat{\Pi}_{1\lambda}), & t \leq \hat{T} \\ Z_t'(\Pi^0_1 - \hat{\Pi}_{2\lambda}), & \hat{T}+1 \leq t \leq T^0 \\ Z_t'(\Pi^0_2 - \hat{\Pi}_{2\lambda}), & t > T^0 \end{cases}$, so

$\sum_{t=1}^T v_t d_t = (\Pi^0_1 - \hat{\Pi}_{1\lambda})'[T^{1/2}\Psi^{zv}_{1\lambda}] + (\Pi^0_1 - \hat{\Pi}_{2\lambda})'[T^{1/2}\Psi^{zv}_\Delta] + (\Pi^0_2 - \hat{\Pi}_{2\lambda})'[T^{1/2}\Psi^{zv}_{2\lambda^0}].$ (B.5)

By Assumptions A1(i)-(ii), A4 and the FCLT, $\Psi^{zv}_{i\lambda} = O_P(1)$, u.$\lambda$. Thus:

$\Psi^{zv}_{i\lambda} = O_P(1),\quad \Psi^{zv}_{i\lambda^0} = O_P(1) \quad\text{and}\quad \Psi^{zv}_\Delta = O_P(1).$ (B.6)

By A3, $M_{i\lambda} = O_P(1)$ and $M_\Delta = O_P(1)$ u.$\lambda$., hence:

$\hat{\Pi}_{1\lambda} - \Pi^0_1 = M_{1\lambda}^{-1}[T^{-1/2}\Psi^{zv}_{1\lambda}] = O_P(1)\,O_P(T^{-1/2}) = O_P(T^{-1/2}).$ (B.7)

On the other hand, letting $\Pi^0_\Delta = \Pi^0_2 - \Pi^0_1 = O(T^{-\alpha})$,

$\hat{\Pi}_{2\lambda} - \Pi^0_2 = M_{2\lambda}^{-1}[T^{-1/2}\Psi^{zv}_{2\lambda}] - M_{2\lambda}^{-1}M_\Delta\,\Pi^0_\Delta = O_P(T^{-\alpha})$ (B.8)
$\hat{\Pi}_{2\lambda} - \Pi^0_1 = (\hat{\Pi}_{2\lambda} - \Pi^0_2) + \Pi^0_\Delta = O_P(T^{-\alpha}).$ (B.9)

Substituting (B.6)-(B.9) into (B.5) yields $\sum_{t=1}^T v_t d_t = O_P(T^{1/2-\alpha})$, so $T^{2\alpha}L_1 = T^{2\alpha-1}\sum_{t=1}^T v_t d_t = o_P(1)$. Next, note that:

$\sum_{t=1}^T d_t^2 = \sum_{1\lambda}d_t^2 + \sum_\Delta d_t^2 + \sum_{2\lambda^0}d_t^2 = (\Pi^0_1 - \hat{\Pi}_{1\lambda})'\,TM_{1\lambda}\,(\Pi^0_1 - \hat{\Pi}_{1\lambda}) + (\Pi^0_1 - \hat{\Pi}_{2\lambda})'\,TM_\Delta\,(\Pi^0_1 - \hat{\Pi}_{2\lambda}) + (\Pi^0_2 - \hat{\Pi}_{2\lambda})'\,TM_{2\lambda^0}\,(\Pi^0_2 - \hat{\Pi}_{2\lambda})$
$= O_P(1) + O_P(T^{-\alpha})O_P(T)O_P(T^{-\alpha}) + O_P(T^{-\alpha})O_P(T)O_P(T^{-\alpha}) = O_P(T^{1-2\alpha}).$

Therefore, $T^{2\alpha}L_2 = T^{2\alpha-1}\sum_{t=1}^T d_t^2 = O_P(1)$ and dominates $T^{2\alpha}L_1 = o_P(1)$.
- Step 2. If $\hat{\lambda}$ does not converge in probability to $\lambda^0$, then there exists $\eta \in (0,1)$ such that, with positive probability, $T^0 - \hat{T} = [T\lambda^0] - [T\hat{\lambda}] \geq T\eta$. Let $M_\eta = T^{-1}\sum_{t=[T\lambda^0]-T\eta+1}^{[T\lambda^0]} Z_t Z_t' = M_{1\lambda^0} - M_{1(\lambda^0-\eta)}$. Because it is a symmetric pd matrix, $\|M_\eta\| \geq \mathrm{mineig}(M_\eta) > C + o_P(1)$, where $C$ is a constant, because by A3, $\mathrm{plim}\,\mathrm{mineig}(M_\eta) > 0$. Then, with positive constant probability,

$T^{2\alpha}L_2 = T^{2\alpha-1}\sum_{t=1}^T d_t^2 \geq T^{2\alpha-1}\Big(\sum_{t=T^0-T\eta+1}^{T^0} d_t^2\Big) = T^\alpha(\hat{\Pi}_{2\lambda} - \Pi^0_1)'\,M_\eta\,T^\alpha(\hat{\Pi}_{2\lambda} - \Pi^0_1) \geq \|T^\alpha(\hat{\Pi}_{2\lambda} - \Pi^0_2) + T^\alpha\Pi^0_\Delta\|^2\,C.$ (B.10)

Let $\bar{\Pi}^0_\Delta = T^\alpha\Pi^0_\Delta$. Under A2, when $\alpha_1 = \alpha_2 = \alpha$, $\bar{\Pi}^0_\Delta = \Pi_2 - \Pi_1 \neq 0$; when $\alpha_2 < \alpha_1$, $\bar{\Pi}^0_\Delta = \Pi_2 + o(1) \neq 0$; and when $\alpha_2 > \alpha_1$, $\bar{\Pi}^0_\Delta = -\Pi_1 + o(1) \neq 0$. Thus, in all cases, $\bar{\Pi}^0_\Delta \neq 0$. From (B.8), $T^\alpha(\hat{\Pi}_{2\lambda} - \Pi^0_2) = -M_{2\lambda}^{-1}M_\Delta\,\bar{\Pi}^0_\Delta + o_P(1)$, so:

$T^\alpha(\hat{\Pi}_{2\lambda} - \Pi^0_2) + T^\alpha\Pi^0_\Delta = o_P(1) - [M_{2\lambda}^{-1}M_\Delta - I_q]\,\bar{\Pi}^0_\Delta = o_P(1) - M_{2\lambda}^{-1}[M_{1\lambda^0} - M_{1\lambda} - M_{11} + M_{1\lambda}]\,\bar{\Pi}^0_\Delta = M_{2\lambda}^{-1}M_{2\lambda^0}\,\bar{\Pi}^0_\Delta + o_P(1).$ (B.11)

A3 implies that, with positive constant probability, u.$\lambda$., $\bar{\Pi}^{0\prime}_\Delta M_{2\lambda^0}M_{2\lambda}^{-2}M_{2\lambda^0}\bar{\Pi}^0_\Delta\,C + o_P(1) > 0$. Using this and (B.10)-(B.11), $T^{2\alpha-1}\sum_{t=1}^T d_t^2 \geq \bar{\Pi}^{0\prime}_\Delta M_{2\lambda^0}M_{2\lambda}^{-2}M_{2\lambda^0}\bar{\Pi}^0_\Delta\,C + o_P(1) > C^* + o_P(1)$, where $C^*$ is a constant. But this cannot hold, because $T^{2\alpha}L_2 = T^{2\alpha-1}\sum_{t=1}^T d_t^2 \stackrel{P}{\rightarrow} 0$. Therefore, $\hat{\lambda} \stackrel{P}{\rightarrow} \lambda^0$.
• Convergence rate of $\hat{\lambda}$. Since $\hat{\lambda} \stackrel{P}{\rightarrow} \lambda^0$, any break-point estimator $\hat{T} = [T\hat{\lambda}]$ satisfies $T^0 - \hat{T} \leq \epsilon T$ for some chosen $\epsilon > 0$. We find the convergence rate by contradiction. For a chosen $C > 0$, assume that $T^0 - \hat{T} > CT^{2\alpha}$. Define $L_1$, $L_2$ and $L_3$ to be the RF sums of squared residuals (SSR) obtained with the break-points $\hat{T}$, $T^0$ and $(\hat{T}, T^0)$, respectively. Then, by definition of OLS, $(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_1 - L_2) \leq 0$. We show that if $CT^{2\alpha} < T^0 - \hat{T} \leq \epsilon T$ for some large but fixed $C$ and small but fixed $\epsilon$, then $\mathrm{plim}[(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_1 - L_2)] > 0$, contradicting the above. It follows that $T^0 - \hat{T} \leq CT^{2\alpha}$ and, by symmetry of the argument, if $\hat{T} \geq T^0$, $\hat{T} - T^0 \leq CT^{2\alpha}$, establishing the desired convergence rate for the break-fraction estimator.
We now show that $\mathrm{plim}[(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_1 - L_2)] > 0$. By our notation, $(\hat{\Pi}_{1\lambda}, \hat{\Pi}_{2\lambda})$ are the OLS estimators based on one break at $\hat{T}$, $(\hat{\Pi}_{1\lambda}, \hat{\Pi}_\Delta, \hat{\Pi}_{2\lambda^0})$ are the ones based on two breaks at $\hat{T}$ and $T^0$, and $(\hat{\Pi}_{1\lambda^0}, \hat{\Pi}_{2\lambda^0})$ are the ones based on one break at $T^0$. Let $Q_\Delta = \frac{1}{T^0 - \hat{T}}\sum_\Delta Z_t Z_t'$. By straightforward algebra, it can be shown (see BP) that:

$(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_1 - L_3) = T^\alpha(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta)'Q_\Delta\,T^\alpha(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta) - T^\alpha(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta)'[Q_\Delta M_{2\lambda}^{-1}M_\Delta]\,T^\alpha(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta) = N_1 - N_2.$ (B.12)

$(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_2 - L_3) = T^\alpha(\hat{\Pi}_{1\lambda} - \hat{\Pi}_\Delta)'Q_\Delta\,T^\alpha(\hat{\Pi}_{1\lambda} - \hat{\Pi}_\Delta) - T^\alpha(\hat{\Pi}_{1\lambda} - \hat{\Pi}_\Delta)'[Q_\Delta M_{1\lambda^0}^{-1}M_\Delta]\,T^\alpha(\hat{\Pi}_{1\lambda} - \hat{\Pi}_\Delta) = N_3 - N_4.$ (B.13)

We now show that $N_1 = O_P(1)$, $N_2 = O_P(N_1)O_P(\epsilon)$, $N_3 = o_P(1)$, and $N_4 = O_P(N_3)O_P(\epsilon)$. We also show that $N_1 > 0$ in the limit, for large $C$ and small $\epsilon$. Hence, for large $C$ and small $\epsilon$, we would have the desired statement:

$\mathrm{plim}\,(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_1 - L_2) = N_1 - N_2 - N_3 + N_4 > 0.$

Since $T^0 - \hat{T} \leq \epsilon T$, by A3 we have $M_{2\lambda}^{-1}M_\Delta = O_P(1)O_P(\epsilon) = O_P(\epsilon)$, which implies that $N_2 = O_P(N_1)O_P(\epsilon)$. Similarly, $N_4 = O_P(N_3)O_P(\epsilon)$. Next, we compare $(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta)$ and $(\hat{\Pi}_{1\lambda} - \hat{\Pi}_\Delta)$. Since $\hat{\Pi}_{1\lambda}$ and $\hat{\Pi}_\Delta$ are both subsample estimators of $\Pi^0_1$, $\hat{\Pi}_{1\lambda} - \hat{\Pi}_\Delta = (\hat{\Pi}_{1\lambda} - \Pi^0_1) - (\hat{\Pi}_\Delta - \Pi^0_1) = O_P(T^{-1/2}) + O_P(T^{-1/2}) = O_P(T^{-1/2})$. Since $\hat{\Pi}_{2\lambda^0}$ is the estimator of $\Pi^0_2$ over the subsample $[T^0+1, T]$, $\hat{\Pi}_{2\lambda^0} - \Pi^0_2 = O_P(T^{-1/2})$, so

$\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta = (\hat{\Pi}_{2\lambda^0} - \Pi^0_2) - (\hat{\Pi}_\Delta - \Pi^0_1) + \Pi^0_\Delta = O_P(T^{-1/2}) + \Pi^0_\Delta = O_P(T^{-\alpha}).$

Thus, $T^\alpha(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta) = O_P(1)$ and $T^\alpha(\hat{\Pi}_{1\lambda} - \hat{\Pi}_\Delta) = o_P(1)$. By A3, $Q_\Delta = O_P(1)$ for large enough $C$. Therefore, $N_1 = T^\alpha(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta)'Q_\Delta\,T^\alpha(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta) = O_P(1)$ for large $C$, while $N_3 = o_P(1)$. All of this shows that the probability limit of $(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_1 - L_3)$ is determined by the probability limit of $N_1$ for small enough $\epsilon$, because $N_2 = O_P(N_1\epsilon)$ and $N_4 = O_P(N_3\epsilon)$ are dominated by $N_1$ for small $\epsilon$. For large enough $C$, by A3,

$N_1 = T^{2\alpha}[O_P(T^{-1/2}) + \Pi^0_\Delta]'Q_\Delta[O_P(T^{-1/2}) + \Pi^0_\Delta] \geq \|\bar{\Pi}^0_\Delta\|^2\,\mathrm{mineig}(Q_\Delta) + o_P(1) > C^* + o_P(1),$

where $C^*$ is a positive constant because $\mathrm{plim}\,Q_\Delta$ is pd by A3. This implies that $(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_1 - L_2) > 0$ with positive probability, which cannot hold. This completes the proof.
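The least-squares break-point estimator analyzed above can be sketched in a few lines: the break date is estimated by minimizing the sum of the two subsample reduced-form SSRs over a trimmed grid of candidates. The design below (sample size, instrument strengths, trimming fraction) is a made-up illustration, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(3)
T, T0, q = 400, 160, 4
Z = rng.standard_normal((T, q))
Pi1, Pi2 = np.full(q, 1.0), np.full(q, 0.2)   # first-stage break at T0
x = np.where(np.arange(T) < T0, Z @ Pi1, Z @ Pi2) + rng.standard_normal(T)

def ssr(y, X):
    # OLS sum of squared residuals
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return float(r @ r)

trim = int(0.15 * T)
candidates = range(trim, T - trim)
# least-squares break-point estimator: minimize the sum of the two
# subsample SSRs over all admissible candidate break dates
That = min(candidates, key=lambda k: ssr(x[:k], Z[:k]) + ssr(x[k:], Z[k:]))
print(That)
```

With an identification break of this size, the estimated break date lands very close to the true $T^0 = 160$, consistent with the fast convergence rate established in the proof.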
• Part (ii). For the QP estimator, under the assumptions imposed, Lemma 1 in QP can be verified by following, step by step, the proof of Lemma A3, which is exactly as in the supplemental appendix of QP and is omitted for simplicity (intuitively, nothing changes, because the magnitude of the RF break in the parameters and in the variance is all that matters for proving consistency of the QP estimator). Therefore, the convergence rate is as stated in Theorem 3.
• Part (iii). Because $\hat{\lambda} - \lambda^0 = O_P(T^{2\alpha-1})$, it can be shown that the asymptotic distribution of the parameter estimators $\hat{\beta}_{i\lambda} = \mathrm{vec}(\hat{\Pi}_{i\lambda})$ is as if the breaks were known. Therefore, we are back in the standard regression-model framework, and so $\hat{\Sigma}_{v,t}$, defined for the FGLS estimator in two ways, is consistent under both definitions: $\hat{\Sigma}_{v,t} \stackrel{P}{\rightarrow} \Sigma_v$.
• Consistency of $\tilde{\lambda}$. This proof is similar to part (i). For a given $\lambda$, we denote by $\tilde{\beta}_{1\lambda}, \tilde{\beta}_{2\lambda}$ the FGLS estimators of the RF parameters obtained from minimizing $L_{FGLS}(\lambda, \beta_i)$ over $\beta_1, \beta_2$. Let $\delta^0_\beta = \beta^0_2 - \beta^0_1 = T^{-\alpha}[\bar{\delta}^0_\beta + o(1)] = O(T^{-\alpha})$, where $\bar{\delta}^0_\beta = \Pi_2 - \Pi_1$ if $\alpha_1 = \alpha_2$, $\bar{\delta}^0_\beta = \Pi_1$ if $\alpha_1 < \alpha_2$, and $\bar{\delta}^0_\beta = -\Pi_2$ if $\alpha_2 < \alpha_1$. Then, by A2, $\bar{\delta}^0_\beta \neq 0$, and:

$\tilde{\beta}_{i\lambda} = \Big(\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)^{-1}\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}X_t = \Big(T^{-1}\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)^{-1}T^{-1}\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}\big(v_t + Z_t'\beta^0_i - 1[i=2]\,Z_t'\delta^0_\beta\,1[t \leq T^0]\big).$

Note that, by A3,

$T^{-1}\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t' = T^{-1}\sum_{i\lambda} Z_t[\Sigma_v^{-1} + o_P(1)]Z_t' + o_P(1) = \Sigma_v^{-1}\otimes\Big(T^{-1}\sum_{i\lambda} Z_tZ_t'\Big) + o_P(1) = \Sigma_v^{-1}\otimes M_{i\lambda} + o_P(1) = O_P(1).$ (B.14)

Also, by A1, A4 and the FCLT,

$T^{-1}\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}v_t = T^{-1/2}\Big(T^{-1/2}\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}v_t\Big) = T^{-1/2}(\Sigma_v^{-1}\otimes I_q)\Big(T^{-1/2}\sum_{i\lambda} v_t\otimes Z_t\Big) + o_P(T^{-1/2}) = T^{-1/2}(\Sigma_v^{-1}\otimes I_q)\,O_P(1) + o_P(T^{-1/2}) = O_P(T^{-1/2}).$ (B.15)

Therefore,

$\tilde{\beta}_{1\lambda} - \beta^0_1 = \Big(\sum_{1\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)^{-1}\sum_{1\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}v_t = O_P(T^{-1/2})$ (B.16)

$\tilde{\beta}_{2\lambda} - \beta^0_2 = \Big(\sum_{2\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)^{-1}\sum_{2\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}v_t - \Big(\sum_{2\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)^{-1}\sum_\Delta Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\,\delta^0_\beta$ (B.17)
$= O_P(T^{-1/2}) + O_P(T^{-\alpha}) = O_P(T^{-\alpha}).$ (B.18)
Letting $\tilde{\beta}_t = \tilde{\beta}_{1\lambda}1[t \leq T_{1\lambda}] + \tilde{\beta}_{2\lambda}1[t > T_{1\lambda}]$ and $\beta^0_t = \beta^0_1 1[t \leq T^0] + \beta^0_2 1[t > T^0]$, we have:

$L_{FGLS}(\lambda, \tilde{\beta}_{i\lambda}) = T^{-1}\sum_{t=1}^T (X_t - Z_t'\tilde{\beta}_t)'\hat{\Sigma}_{v,t}^{-1}(X_t - Z_t'\tilde{\beta}_t) = T^{-1}\sum_{t=1}^T [v_t - Z_t'(\tilde{\beta}_t - \beta^0_t)]'\hat{\Sigma}_{v,t}^{-1}[v_t - Z_t'(\tilde{\beta}_t - \beta^0_t)]$
$= T^{-1}\sum_{t=1}^T v_t'\hat{\Sigma}_{v,t}^{-1}v_t - 2T^{-1}\sum_{t=1}^T v_t'\hat{\Sigma}_{v,t}^{-1}Z_t'(\tilde{\beta}_t - \beta^0_t) + T^{-1}\sum_{t=1}^T (\tilde{\beta}_t - \beta^0_t)'Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'(\tilde{\beta}_t - \beta^0_t).$
Because $L_{FGLS}(\lambda^0, \beta^0_i) = T^{-1}\sum_{t=1}^T v_t'\hat{\Sigma}_{v,t}^{-1}v_t$, by definition of the minimization problem,

$L_{FGLS}(\lambda, \tilde{\beta}_{i\lambda}) - L_{FGLS}(\lambda^0, \beta^0_i) = -2T^{-1}\sum_{t=1}^T v_t'\hat{\Sigma}_{v,t}^{-1}Z_t'(\tilde{\beta}_t - \beta^0_t) + T^{-1}\sum_{t=1}^T (\tilde{\beta}_t - \beta^0_t)'Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'(\tilde{\beta}_t - \beta^0_t) = -2L_1 + L_2 \leq 0.$

We now show that $T^{2\alpha}L_1$ is dominated in probability limit by $T^{2\alpha}L_2$. Using equations (B.14)-(B.16) and (B.18),

$L_1 = \Big(T^{-1}\sum_{1\lambda} v_t'\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)(\tilde{\beta}_{1\lambda} - \beta^0_1) + \Big(T^{-1}\sum_\Delta v_t'\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)[\tilde{\beta}_{2\lambda} - \beta^0_2 + \delta^0_\beta] + \Big(T^{-1}\sum_{2\lambda} v_t'\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)(\tilde{\beta}_{2\lambda} - \beta^0_2)$
$= O_P(T^{-1/2})O_P(T^{-1/2}) + O_P(T^{-1/2})O_P(T^{-\alpha}) + O_P(T^{-1/2})O_P(T^{-\alpha}) = O_P(T^{-1/2-\alpha})$
$T^{2\alpha}L_1 = O_P(T^{\alpha-1/2}) = o_P(1)$
$L_2 = (\tilde{\beta}_{1\lambda} - \beta^0_1)'\Big(T^{-1}\sum_{1\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)(\tilde{\beta}_{1\lambda} - \beta^0_1) + (\tilde{\beta}_{2\lambda} - \beta^0_2 + \delta^0_\beta)'\Big(T^{-1}\sum_\Delta Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)(\tilde{\beta}_{2\lambda} - \beta^0_2 + \delta^0_\beta) + (\tilde{\beta}_{2\lambda} - \beta^0_2)'\Big(T^{-1}\sum_{2\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)(\tilde{\beta}_{2\lambda} - \beta^0_2)$ (B.19)
$= O_P(T^{-1}) + O_P(T^{-2\alpha}) + O_P(T^{-2\alpha}) = O_P(T^{-2\alpha})$
$T^{2\alpha}L_2 = O_P(1).$
This shows that $T^{2\alpha}L_2$ dominates $T^{2\alpha}L_1$; therefore $T^{2\alpha}L_2 \stackrel{P}{\rightarrow} 0$, because $-2L_1 + L_2 \leq 0$. We now show that $T^{2\alpha}L_2 > 0$ with some positive constant probability when the break fraction $\lambda$ is inconsistent, which is a contradiction. Therefore, it must be that $\tilde{\lambda} \stackrel{P}{\rightarrow} \lambda^0$, and showing that $T^{2\alpha}L_2 > 0$ with positive probability will complete the proof. From (B.17)-(B.18) and $\hat{\Sigma}_{v,t}^{-1} = \Sigma_v^{-1} + o_P(1)$,

$T^\alpha(\tilde{\beta}_{2\lambda} - \beta^0_2) = O_P(T^{\alpha-1/2}) - \Big(\sum_{2\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)^{-1}\sum_\Delta Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\,(T^\alpha\delta^0_\beta)$
$= o_P(1) - \Big(T^{-1}\sum_{2\lambda} Z_t\Sigma_v^{-1}Z_t'\Big)^{-1}T^{-1}\sum_\Delta Z_t\Sigma_v^{-1}Z_t'\,\bar{\delta}^0_\beta$
$= o_P(1) - [\Sigma_v^{-1}\otimes M_{2\lambda}]^{-1}\big\{\Sigma_v^{-1}\otimes[M_{2\lambda} - M_{2\lambda^0}]\big\}\bar{\delta}^0_\beta$
$= -\big\{I_{p_2}\otimes M_{2\lambda}^{-1}[M_{2\lambda} - M_{2\lambda^0}]\big\}\bar{\delta}^0_\beta + o_P(1)$ (B.20)

$T^\alpha(\tilde{\beta}_{2\lambda} - \beta^0_2 + \delta^0_\beta) = -\big\{I_{p_2}\otimes M_{2\lambda}^{-1}[M_{2\lambda} - M_{2\lambda^0}] - I_{p_2}\otimes I_q\big\}\bar{\delta}^0_\beta + o_P(1) = \big\{I_{p_2}\otimes[M_{2\lambda}^{-1}M_{2\lambda^0}]\big\}\bar{\delta}^0_\beta + o_P(1)$ (B.21)

From (B.14) and (B.20)-(B.21),

$T^{2\alpha}L_2 = T^{2\alpha}(\tilde{\beta}_{2\lambda} - \beta^0_2 + \delta^0_\beta)'\Big(T^{-1}\sum_\Delta Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)(\tilde{\beta}_{2\lambda} - \beta^0_2 + \delta^0_\beta) + T^{2\alpha}(\tilde{\beta}_{2\lambda} - \beta^0_2)'\Big(T^{-1}\sum_{2\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)(\tilde{\beta}_{2\lambda} - \beta^0_2) + o_P(1)$
$= \bar{\delta}^{0\prime}_\beta\big\{I_{p_2}\otimes(M_{2\lambda^0}M_{2\lambda}^{-1})\big\}\big\{\Sigma_v^{-1}\otimes(M_{2\lambda} - M_{2\lambda^0})\big\}\big\{I_{p_2}\otimes(M_{2\lambda}^{-1}M_{2\lambda^0})\big\}\bar{\delta}^0_\beta + \bar{\delta}^{0\prime}_\beta\big\{I_{p_2}\otimes[(M_{2\lambda} - M_{2\lambda^0})M_{2\lambda}^{-1}]\big\}\big\{\Sigma_v^{-1}\otimes M_{2\lambda}\big\}\big\{I_{p_2}\otimes[M_{2\lambda}^{-1}(M_{2\lambda} - M_{2\lambda^0})]\big\}\bar{\delta}^0_\beta + o_P(1)$
$= \bar{\delta}^{0\prime}_\beta\big\{\Sigma_v^{-1}\otimes[M_{2\lambda^0}M_{2\lambda}^{-1}(M_{2\lambda} - M_{2\lambda^0})M_{2\lambda}^{-1}M_{2\lambda^0}]\big\}\bar{\delta}^0_\beta + \bar{\delta}^{0\prime}_\beta\big\{\Sigma_v^{-1}\otimes[(M_{2\lambda} - M_{2\lambda^0})M_{2\lambda}^{-1}(M_{2\lambda} - M_{2\lambda^0})]\big\}\bar{\delta}^0_\beta + o_P(1).$

Because we assumed that $\lambda \leq \lambda^0$, if $\tilde{\lambda}$ does not converge in probability to $\lambda^0$, then, with probability approaching 1, $\tilde{\lambda} < \lambda^0$. By A3 it follows that $M_{2\lambda^0}M_{2\lambda}^{-1}(M_{2\lambda} - M_{2\lambda^0})M_{2\lambda}^{-1}M_{2\lambda^0}$ and $(M_{2\lambda} - M_{2\lambda^0})M_{2\lambda}^{-1}(M_{2\lambda} - M_{2\lambda^0})$ are pd, and, because $\bar{\delta}^0_\beta \neq 0$, $T^{2\alpha}L_2 + o_P(1) > 0$. Since this contradicts $T^{2\alpha}L_2 \stackrel{P}{\rightarrow} 0$, it follows that $\tilde{\lambda} \stackrel{P}{\rightarrow} \lambda^0$.
• Rate of convergence of $\tilde{\lambda}$. We now derive the rate of convergence of $\tilde{\lambda}$ in a similar fashion to the rate of convergence of the BP estimator $\hat{\lambda}$. By consistency, for any small $\epsilon > 0$, $T^0 - \tilde{T} \leq \epsilon T$. Assume that $T^0 - \tilde{T} > CT^{2\alpha}$ for some large $C > 0$, and let $L_1$, $L_2$, $L_3$ be the FGLS objective functions at the breaks $\tilde{\lambda}$, $\lambda^0$ and $(\tilde{\lambda}, \lambda^0)$, respectively, evaluated at the corresponding FGLS parameter estimators for these breaks. The FGLS parameter estimators are $(\tilde{\beta}_{1\lambda}, \tilde{\beta}_{2\lambda})$ for the first objective function, $(\tilde{\beta}_{1\lambda^0}, \tilde{\beta}_{2\lambda^0})$ for the second, and $(\tilde{\beta}_{1\lambda}, \tilde{\beta}_\Delta, \tilde{\beta}_{2\lambda^0})$ for the third. By arguments similar to part (i), and for $C$ large enough,

$\tilde{\beta}_{i\lambda^0} - \beta^0_i = O_P(T^{-1/2}) \quad\text{and}\quad \tilde{\beta}_\Delta - \beta^0_1 = O_P(T^{-1/2}).$

Let $\tilde{M}_{i\lambda} = T^{-1}\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'$ for $i = 1, 2$ and $\tilde{M}_\Delta = T^{-1}\sum_\Delta Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'$. Letting $Q_\Delta = \frac{1}{T^0-\tilde{T}}\sum_\Delta Z_t Z_t'$, similarly to part (i),

$(T^0 - \tilde{T})^{-1}T^{2\alpha+1}(L_1 - L_3) = T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{2\lambda^0})'Q_\Delta\,T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{2\lambda^0}) - T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{2\lambda^0})'Q_\Delta\tilde{M}_{2\lambda}^{-1}\tilde{M}_\Delta\,T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{2\lambda^0}) = N_1 - N_2$
$(T^0 - \tilde{T})^{-1}T^{2\alpha+1}(L_2 - L_3) = T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{1\lambda})'Q_\Delta\,T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{1\lambda}) - T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{1\lambda})'Q_\Delta\tilde{M}_{1\lambda^0}^{-1}\tilde{M}_\Delta\,T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{1\lambda}) = N_3 - N_4.$

Similarly to part (i), it can be shown that $N_1$ dominates $N_2$, $N_3$ and $N_4$ for small $\epsilon$. Moreover, $\tilde{\beta}_\Delta - \tilde{\beta}_{2\lambda^0} = \beta^0_1 - \beta^0_2 + O_P(T^{-1/2}) = -\delta^0_\beta + O_P(T^{-1/2})$ and $\tilde{\beta}_\Delta - \tilde{\beta}_{1\lambda} = O_P(T^{-1/2})$; therefore, $(T^0 - \tilde{T})^{-1}T^{2\alpha+1}(L_1 - L_3) = O_P(1)$ and $(T^0 - \tilde{T})^{-1}T^{2\alpha+1}(L_2 - L_3) = o_P(1)$. Hence, $(T^0 - \tilde{T})^{-1}T^{2\alpha+1}(L_1 - L_2) = N_1 + o_P(1)$.
For large enough $C$, $\tilde{M}_\Delta = \Sigma_v^{-1}\otimes[M_{2\lambda} - M_{2\lambda^0}] + o_P(1)$, so, with positive probability, $(T^0 - \tilde{T})^{-1}T^{2\alpha+1}(L_1 - L_2) > 0$. This cannot happen, because, by definition, $(T^0 - \tilde{T})^{-1}T^{2\alpha+1}(L_1 - L_2) \leq 0$. It follows that $T^0 - \tilde{T} \leq CT^{2\alpha}$, so $\tilde{\lambda} - \lambda^0 = O_P(T^{2\alpha-1})$.
Proof of Theorem 4.
The proof follows exactly the same steps as the proof of Theorem 3, with $\alpha$ replaced by 0, and with break-point estimators for equation (4.1) instead of equation (2.5).
Proof of Theorem 5.
• Part (i). Consistency of $\hat{\lambda}$. Unlike the other proofs, here we do not prove consistency by contradiction; instead, we show consistency by deriving the limit of the minimand directly and applying the continuous mapping theorem. Let $a_t = j_t u_t^2$ and $\hat{a}_t = j_t\hat{u}_t^2$. We minimize over $\lambda$:

$L_{BP}(\lambda, \hat{a}_t) = T^{-1}\sum_{t=1}^T(\hat{a}_t - \hat{\mu}_t)^2,$

where $\hat{\mu}_t = \hat{\mu}_{1\lambda}1[t \leq [T\lambda]] + \hat{\mu}_{2\lambda}1[t > [T\lambda]]$, $\hat{\mu}_{i\lambda} = \frac{1}{T_{i\lambda}}\sum_{i\lambda}\hat{a}_t$ for $i = 1, 2$, and $j_t$, $a_t$ and $\hat{a}_t$ are treated as scalars here because we are analyzing the BP break-point estimator.
Note that $\hat{a}_t = a_t + (\hat{a}_t - a_t) = e_t + (\hat{a}_t - a_t) + \mu^0_t$, with $\mu^0_t = \mu^0_1 1[t \leq T^0] + \mu^0_2 1[t > T^0]$. So,

$L_{BP}(\lambda, \hat{a}_t) = T^{-1}\sum_{t=1}^T[e_t + (\hat{a}_t - a_t) + (\mu^0_t - \hat{\mu}_t)]^2 = T^{-1}\sum_{t=1}^T e_t^2 + T^{-1}\sum_{t=1}^T(\hat{a}_t - a_t)^2 + T^{-1}\sum_{t=1}^T(\mu^0_t - \hat{\mu}_t)^2 + 2T^{-1}\sum_{t=1}^T e_t(\hat{a}_t - a_t) + 2T^{-1}\sum_{t=1}^T e_t(\mu^0_t - \hat{\mu}_t) + 2T^{-1}\sum_{t=1}^T(\hat{a}_t - a_t)(\mu^0_t - \hat{\mu}_t) = \sum_{i=1}^6 L_i.$ (B.22)

Note that $L_1$, $L_2$ and $L_4$ do not depend on $\lambda$, and therefore their limiting behavior is irrelevant for consistency, since $\min_\lambda L_{BP}(\lambda, \hat{a}_t) = \min_\lambda[L^*_{BP}(\lambda, \hat{a}_t) = L_{BP}(\lambda, \hat{a}_t) - L_1 - L_2 - L_4]$.
For $L_3$,

$L_3 = T^{-1}\sum_{1\lambda}(\mu^0_1 - \hat{\mu}_{1\lambda})^2 + T^{-1}\sum_\Delta(\mu^0_1 - \hat{\mu}_{2\lambda})^2 + T^{-1}\sum_{2\lambda^0}(\mu^0_2 - \hat{\mu}_{2\lambda})^2 = \lambda(\mu^0_1 - \hat{\mu}_{1\lambda})^2 + (\lambda^0 - \lambda)(\mu^0_1 - \hat{\mu}_{2\lambda})^2 + (1 - \lambda^0)(\mu^0_2 - \hat{\mu}_{2\lambda})^2.$ (B.23)

We have:

$\hat{\mu}_{1\lambda} - \mu^0_1 = T_{1\lambda}^{-1}\sum_{1\lambda}(\hat{a}_t - \mu^0_1) = T_{1\lambda}^{-1}\sum_{1\lambda}e_t + T_{1\lambda}^{-1}\sum_{1\lambda}(\hat{a}_t - a_t) = O_P(T^{-1/2}) + T_{1\lambda}^{-1}\sum_{1\lambda}(\hat{a}_t - a_t).$ (B.24)
Letting $\delta = \theta^0 - \hat{\theta}_{GMM}$ or $\delta = \theta^0 - \hat{\theta}_{B\text{-}GMM}$ (where $\hat{\theta}_{B\text{-}GMM}$ was constructed with a SMI or a VMC break-point estimator), we have:

$\hat{a}_t - a_t = j_t(\hat{u}_t^2 - u_t^2) = j_t(\hat{u}_t - u_t)(\hat{u}_t + u_t) = j_t w_t'\delta(2u_t + w_t'\delta) = 2u_t j_t w_t'\delta + j_t(w_t'\delta)^2.$ (B.25)

Therefore,

$T_{1\lambda}^{-1}\sum_{1\lambda}(\hat{a}_t - a_t) = 2T_{1\lambda}^{-1}\sum_{1\lambda}u_t j_t w_t'\delta + T_{1\lambda}^{-1}\sum_{1\lambda}j_t(w_t'\delta)^2 = 2A_1 + A_2.$

We show below that $T_{1\lambda}^{-1}\sum_{1\lambda}(\hat{a}_t - a_t) = o_P(1)$ u.$\lambda$. Letting, as in the paper, the subscript $i$ denote the $i^{th}$ element of a vector, we have:

$A_1 = T_{1\lambda}^{-1}\sum_{1\lambda}u_t j_t w_t'\delta = \sum_{i=1}^p T_{1\lambda}^{-1}\sum_{1\lambda}u_t j_t w_{t,i}\delta_i = \sum_{i=1}^p T_{1\lambda}^{-1}\sum_{1\lambda}u_t j_t w_{t,i}\,O_P(T^{\alpha-1/2}).$

Note that $j_t = z_{t,k}z_{t,k^*}$ for some $k, k^* \in \{1, \ldots, q\}$. Therefore, $A_1 = o_P(1)$ if we can show that $T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}u_t z_{t,k}z_{t,k^*}w_{t,i} = o_P(1)$. We show this result using Markov's inequality and A11(i). Note that $w_{t,i}$ can be equal to $z_{1t,i}$ or to $x_{t,i_1} = z_t'\Pi^0_{t,i_1} + v_{t,i_1}$ for $i_1 = i - p_1$, where $\Pi^0_{t,i_1}$ is the $i_1^{th}$ column of $\Pi^0_t = \Pi_1/r_{1T}\,1[t \leq T^0] + \Pi_2/r_{2T}\,1[t > T^0]$. Since $z_t$ is already present in $x_{t,i_1}$, we do not consider the case $w_{t,i} = z_{1t,i}$ whenever we deal in this proof with partial sums that involve the terms $(\hat{a}_t - a_t)$, because this case adds no additional insights. Note that $z_t'\Pi^0_{t,i_1} = \sum_{i_2=1}^q z_{t,i_2}\Pi^0_{t,i_1,i_2}$, where $\Pi^0_{t,i_1,i_2}$ is the $(i_1, i_2)$ element of $\Pi^0_t$. Since $\Pi^0_t = O(1)$, showing that $T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}u_t z_{t,k}z_{t,k^*}w_{t,i} = o_P(1)$ u.$\lambda$. is equivalent to showing that $T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}u_t z_{t,k}z_{t,k^*}z_{t,i_2} = T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}h_{t,k}z_{t,k^*}z_{t,i_2} = o_P(1)$ and $T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}u_t z_{t,k}z_{t,k^*}v_{t,i_1} = T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}h_{t,k}h_{t,l} = o_P(1)$ for some indexes $k, k^*, i_2, l$.
Using Markov's inequality, followed by the triangle inequality and Holder's inequality (implicitly applied twice), and, in the last step, using the moment conditions in A11(i), there is a constant $C > 0$ such that

$P\Big(\sup_\lambda\Big|T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}h_{t,k}z_{t,k^*}z_{t,i_2}\Big| > \epsilon\Big) \leq T^{\alpha-1/2}E\Big(\sup_\lambda\Big|T_{1\lambda}^{-1}\sum_{1\lambda}h_{t,k}z_{t,k^*}z_{t,i_2}\Big|\Big)\Big/\epsilon \leq T^{\alpha-1/2}\sup_t E|h_{t,k}z_{t,k^*}z_{t,i_2}|/\epsilon \leq T^{\alpha-1/2}\sup_t\|h_{t,k}\|_2\,\sup_t\|z_{t,k^*}\|_4\,\sup_t\|z_{t,i_2}\|_4/\epsilon < T^{\alpha-1/2}C/\epsilon \rightarrow 0.$

Therefore, $T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}h_{t,k}z_{t,k^*}z_{t,i_2} = o_P(1)$ u.$\lambda$. Similarly, $T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}h_{t,k}h_{t,l} = o_P(1)$. So, $T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}u_t z_{t,k}z_{t,k^*}w_{t,i} = o_P(1)$ u.$\lambda$., implying that $A_1 = o_P(1)$. By very similar arguments, using the moment conditions $\sup_t\|z_{t,i}\|_4 < \infty$ and $\sup_t\|h_{t,i}\|_2 < \infty$ in A11(i), one can show that $A_2 = o_P(1)$. Therefore, $T_{1\lambda}^{-1}\sum_{1\lambda}(\hat{a}_t - a_t) = o_P(1)$. Substituting this into (B.24), we get:

$\hat{\mu}_{1\lambda} - \mu^0_1 = O_P(T^{\alpha-1/2}).$ (B.26)
As for $\hat{\mu}_{2\lambda}$, recalling that $\delta^0_\mu = \mu^0_2 - \mu^0_1$, we have:

$\hat{\mu}_{2\lambda} - \mu^0_2 = T_{2\lambda}^{-1}\sum_{2\lambda}(\hat{a}_t - \mu^0_2) = T_{2\lambda}^{-1}\sum_\Delta(\hat{a}_t - \mu^0_2) + T_{2\lambda}^{-1}\sum_{2\lambda^0}(\hat{a}_t - \mu^0_2)$
$= T_{2\lambda}^{-1}\sum_\Delta(e_t + \mu^0_1 - \mu^0_2 + \hat{a}_t - a_t) + T_{2\lambda}^{-1}\sum_{2\lambda^0}(e_t + \hat{a}_t - a_t)$
$= T_{2\lambda}^{-1}\sum_{2\lambda}e_t - \Big[\frac{\lambda^0-\lambda}{1-\lambda}\,\delta^0_\mu + O_P(T^{-1})\Big] + T_{2\lambda}^{-1}\sum_{2\lambda}(\hat{a}_t - a_t)$
$= O_P(T^{-1/2}) - \frac{\lambda^0-\lambda}{1-\lambda}\,\delta^0_\mu + O_P(T^{\alpha-1/2}) = -\frac{\lambda^0-\lambda}{1-\lambda}\,\delta^0_\mu + O_P(T^{\alpha-1/2}) = O_P(1),$ (B.27)

where the last equality follows from A11(iii), which states that $\delta^0_\mu$ is fixed. Similarly,

$\hat{\mu}_{2\lambda} - \mu^0_1 = \delta^0_\mu - \frac{\lambda^0-\lambda}{1-\lambda}\,\delta^0_\mu + O_P(T^{\alpha-1/2}) = \frac{1-\lambda^0}{1-\lambda}\,\delta^0_\mu + O_P(T^{\alpha-1/2}) = O_P(1).$ (B.28)

Substituting (B.26)-(B.28) into (B.23), we have:

$L_3 = \lambda(\hat{\mu}_{1\lambda} - \mu^0_1)^2 + (\lambda^0 - \lambda)(\mu^0_1 - \hat{\mu}_{2\lambda})^2 + (1 - \lambda^0)(\mu^0_2 - \hat{\mu}_{2\lambda})^2$
$= O_P(T^{2\alpha-1}) + O_P(1) + O_P(1) = O_P(1)$ (B.29)
$= o_P(1) + \Big[\frac{(\lambda^0-\lambda)(1-\lambda^0)^2}{(1-\lambda)^2} + \frac{(1-\lambda^0)(\lambda^0-\lambda)^2}{(1-\lambda)^2}\Big](\delta^0_\mu)^2 = \frac{(\lambda^0-\lambda)(1-\lambda^0)}{1-\lambda}(\delta^0_\mu)^2 + o_P(1).$ (B.30)
As for $L_5$,

$L_5 = 2T^{-1}\sum_{t=1}^T e_t(\mu^0_t - \hat{\mu}_t) = 2T^{-1}\sum_{1\lambda}e_t(\mu^0_1 - \hat{\mu}_{1\lambda}) + 2T^{-1}\sum_\Delta e_t(\mu^0_1 - \hat{\mu}_{2\lambda}) + 2T^{-1}\sum_{2\lambda^0}e_t(\mu^0_2 - \hat{\mu}_{2\lambda})$
$= O_P(T^{-1/2})O_P(T^{\alpha-1/2}) + O_P(T^{-1/2})O_P(1) + O_P(T^{-1/2})O_P(1) = O_P(T^{-1/2}).$ (B.31)

Finally, using (B.26)-(B.28),

$L_6 = 2T^{-1}\sum_{t=1}^T(\hat{a}_t - a_t)(\mu^0_t - \hat{\mu}_t) = 2\Big[T^{-1}\sum_{1\lambda}(\hat{a}_t - a_t)\Big](\mu^0_1 - \hat{\mu}_{1\lambda}) + 2\Big[T^{-1}\sum_\Delta(\hat{a}_t - a_t)\Big](\mu^0_1 - \hat{\mu}_{2\lambda}) + 2\Big[T^{-1}\sum_{2\lambda^0}(\hat{a}_t - a_t)\Big](\mu^0_2 - \hat{\mu}_{2\lambda})$
$= o_P(1)O_P(T^{\alpha-1/2}) + o_P(1)O_P(1) + o_P(1)O_P(1) = o_P(1).$ (B.32)

From (B.29) and (B.31)-(B.32), it is clear that $L_3$ dominates $L_5$ and $L_6$, since it is $O_P(1)$ while they are $o_P(1)$. Because $(\delta^0_\mu)^2 > 0$, from (B.30) it follows that, u.$\lambda$.,

$L^*_{BP}(\lambda, \hat{a}_t) \stackrel{P}{\rightarrow} \frac{(\lambda^0-\lambda)(1-\lambda^0)}{1-\lambda}(\delta^0_\mu)^2,$

which is nonnegative for $\lambda \leq \lambda^0$ and equal to zero only at $\lambda = \lambda^0$. It can be shown that, when $\lambda > \lambda^0$, the limit of $L^*_{BP}(\lambda, \hat{a}_t)$ is also minimized at $\lambda = \lambda^0$, so $\hat{\lambda} \stackrel{P}{\rightarrow} \lambda^0$.
• Part (i). Convergence rate of $\hat{\lambda}$. The proof is similar to the other proofs of rates of convergence of break-point estimators. We let $L_1 = L_{BP}(\hat{\lambda}, \hat{a}_t)$, $L_2 = L_{BP}(\lambda^0, \hat{a}_t)$, and $L_3 = L_{BP}(\hat{\lambda}, \lambda^0, \hat{a}_t)$, where $\hat{\lambda} \leq \lambda^0$. The corresponding sub-sample mean estimators are $\hat{\mu}_{1\lambda}, \hat{\mu}_{2\lambda}$ for $L_1$; $\hat{\mu}_{1\lambda^0}, \hat{\mu}_{2\lambda^0}$ for $L_2$; and $\hat{\mu}_{1\lambda}, \hat{\mu}_\Delta, \hat{\mu}_{2\lambda^0}$ for $L_3$. By consistency, $T^0 - \hat{T} < \epsilon T$, and assume $T^0 - \hat{T} > C$ for some large enough $C > 0$. Then, using arguments similar to the other proofs,

$T(T^0 - \hat{T})^{-1}(L_1 - L_3) = (\hat{\mu}_{2\lambda^0} - \hat{\mu}_\Delta)^2 - (\hat{\mu}_{2\lambda^0} - \hat{\mu}_\Delta)^2\,\frac{\lambda^0-\hat{\lambda}}{1-\hat{\lambda}} = N_1 - N_2$
$T(T^0 - \hat{T})^{-1}(L_2 - L_3) = (\hat{\mu}_{1\lambda} - \hat{\mu}_\Delta)^2 - (\hat{\mu}_{1\lambda} - \hat{\mu}_\Delta)^2\,\frac{\lambda^0-\hat{\lambda}}{\hat{\lambda}} = N_3 - N_4.$

Because $T^0 - \hat{T} < \epsilon T$, for small enough $\epsilon$, we show below that $N_1$ dominates $N_2$ and $N_3$ dominates $N_4$. Previously, we showed that $\hat{\mu}_{1\lambda} - \mu^0_1 = O_P(T^{\alpha-1/2}) = o_P(1)$. Because $\hat{\mu}_\Delta$ is also a sub-sample estimator of $\mu^0_1$, for $C$ large enough, $\hat{\mu}_\Delta - \mu^0_1 = O_P(T^{\alpha-1/2}) = o_P(1)$. Therefore, $N_3 = (\hat{\mu}_{1\lambda} - \hat{\mu}_\Delta)^2 = [\hat{\mu}_{1\lambda} - \mu^0_1 - (\hat{\mu}_\Delta - \mu^0_1)]^2 = O_P(T^{2\alpha-1}) = o_P(1)$. On the other hand, $\hat{\mu}_{2\lambda^0} - \mu^0_2 = o_P(1)$; therefore, $\hat{\mu}_{2\lambda^0} - \hat{\mu}_\Delta = \delta^0_\mu + o_P(1)$. Therefore, $N_1 = (\hat{\mu}_{2\lambda^0} - \hat{\mu}_\Delta)^2 = [\delta^0_\mu + o_P(1)]^2 = (\delta^0_\mu)^2 + o_P(1) = O_P(1)$. Hence, $N_1$ dominates $N_2$, $N_3$ and $N_4$, so, with positive probability for large $C$ and small $\epsilon$, $T(T^0 - \hat{T})^{-1}(L_1 - L_2) > 0$, because:

$T(T^0 - \hat{T})^{-1}(L_1 - L_2) = N_1 + o_P(1) = (\delta^0_\mu)^2 + o_P(1).$

This cannot happen, because, by definition, $L_1 - L_2 \leq 0$, so it must be that $T^0 - \hat{T} \leq C$. Similarly, it can be shown that, for $\hat{T} > T^0$, $\hat{T} - T^0 < C$; therefore, $|\hat{T} - T^0| = O_P(1)$.
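The mean-shift argument above can be illustrated with a toy simulation (ours, not the paper's): a change in the variance of $u_t$ shifts the mean of $a_t = j_t u_t^2$, so the same least-squares break-date estimator, applied to the squared residuals, recovers the variance break. For simplicity, $j_t$ is set to 1 and $u_t$ is treated as observed:

```python
import numpy as np

rng = np.random.default_rng(4)
T, T0 = 400, 160
# variance of the moment conditions doubles in sd at T0,
# so the mean of a_t = u_t^2 jumps from 1 to 4
u = np.concatenate([rng.standard_normal(T0),
                    2.0 * rng.standard_normal(T - T0)])
a = u ** 2

def ssr_mean(s):
    # SSR around the subsample mean
    return float(((s - s.mean()) ** 2).sum())

trim = int(0.15 * T)
# mean-shift break-date estimator applied to a_t
That = min(range(trim, T - trim),
           key=lambda k: ssr_mean(a[:k]) + ssr_mean(a[k:]))
print(That)
```

Because the mean shift $\delta^0_\mu$ is fixed (does not shrink with $T$), the estimated date is within a bounded distance of $T^0$, matching the $|\hat{T}-T^0| = O_P(1)$ result.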
• Part (ii). Consistency of $\hat{\Sigma}_{e,t}$. First, note that $a_t$, $\hat{a}_t$, $e_t$ and $\hat{\mu}_{i\lambda}$ are now treated as vectors rather than scalars, because we are now analyzing a multivariate break-point estimator.
We first show that $\hat{\Sigma}_{e,t} \stackrel{P}{\rightarrow} \Sigma_e$. Under the first definition, $\hat{\Sigma}_{e,t}$ is estimated over the full sample: $\hat{\Sigma}_{e,t} = \hat{\Sigma}_e = T^{-1}\sum_{1\lambda}(\hat{a}_t - \hat{\mu}_{1\lambda})(\hat{a}_t - \hat{\mu}_{1\lambda})' + T^{-1}\sum_{2\lambda}(\hat{a}_t - \hat{\mu}_{2\lambda})(\hat{a}_t - \hat{\mu}_{2\lambda})' = T^{-1}\sum_{t=1}^T(\hat{a}_t - \hat{\mu}_{t\lambda})(\hat{a}_t - \hat{\mu}_{t\lambda})'$, where $\hat{\mu}_{t\lambda} = \hat{\mu}_{1\lambda}1[t \leq \hat{T}] + \hat{\mu}_{2\lambda}1[t > \hat{T}]$. Following the same steps as in the consistency proof for $\hat{\lambda}$, one can show that:

$\hat{\Sigma}_e = T^{-1}\sum_{t=1}^T e_t e_t' + T^{-1}\sum_{t=1}^T(\hat{a}_t - a_t)(\hat{a}_t - a_t)' + T^{-1}\sum_{t=1}^T(\mu^0_t - \hat{\mu}_{t\lambda})(\mu^0_t - \hat{\mu}_{t\lambda})' + 2T^{-1}\sum_{t=1}^T e_t(\hat{a}_t - a_t)' + o_P(1) = L_1 + L_2 + L_3 + 2L_4 + o_P(1),$

where, evaluated at $\hat{\lambda}$ with $\hat{\lambda} - \lambda^0 = O_P(T^{-1})$, we already showed that $L_3 = \frac{(\lambda^0-\hat{\lambda})(1-\lambda^0)}{1-\hat{\lambda}}\,\delta^0_\mu\delta^{0\prime}_\mu + o_P(1) = o_P(1)$. Therefore,

$\hat{\Sigma}_e = L_1 + L_2 + 2L_4 + o_P(1).$ (B.33)

We first analyze $L_1$. By A11(i)-(ii) and the weak law of large numbers (WLLN) for near-epoch dependent processes,

$L_1 = T^{-1}\sum_{t=1}^T e_t e_t' = \Sigma_e + o_P(1) = O_P(1).$

Next, we analyze $L_2$. From (B.25),

$L_2 = T^{-1}\sum_{t=1}^T(\hat{a}_t - a_t)(\hat{a}_t - a_t)' = T^{-1}\sum_{t=1}^T[2u_t j_t(w_t'\delta) + j_t(w_t'\delta)^2][2u_t j_t(w_t'\delta) + j_t(w_t'\delta)^2]'$
$= 4T^{-1}\sum_{t=1}^T u_t^2 j_t j_t'(w_t'\delta)^2 + 4T^{-1}\sum_{t=1}^T u_t j_t j_t'(w_t'\delta)^3 + T^{-1}\sum_{t=1}^T j_t j_t'(w_t'\delta)^4 = 4A_3 + 4A_4 + A_5.$

By Markov's inequality and repeated application of Holder's inequality, very much as in showing that $T_{1\lambda}^{-1}\sum_{1\lambda}(\hat{a}_t - a_t) = o_P(1)$, it can be shown that $A_3 = o_P(1)$, $A_4 = o_P(1)$ and $A_5 = o_P(1)$ (the only difference is that, to do so, we need the moment conditions $\sup_t\|z_{t,i}\|_8 < \infty$ and $\sup_t\|h_{t,i}\|_4 < \infty$). Therefore, $L_2 = o_P(1)$. Similarly, using $\|e_{t,i}\|_4 < \infty$, it can be shown that $L_4 = T^{-1}\sum_{t=1}^T e_t(\hat{a}_t - a_t)' = o_P(1)$.
Since $L_1 = \Sigma_e + o_P(1)$, $L_2 = o_P(1)$, and $L_4 = o_P(1)$, from (B.33) we have:

$\hat{\Sigma}_e = \Sigma_e + o_P(1).$

Similarly, it can be shown that, because $\hat{\lambda} - \lambda^0 = O_P(T^{-1})$, $\hat{\Sigma}_{e,i} = T_{i\lambda}^{-1}\sum_{i\lambda}(\hat{a}_t - \hat{\mu}_{i\lambda})(\hat{a}_t - \hat{\mu}_{i\lambda})' = T_{i\lambda}^{-1}\sum_{i\lambda^0}(\hat{a}_t - \hat{\mu}_{i\lambda})(\hat{a}_t - \hat{\mu}_{i\lambda})' + o_P(1) = \Sigma_e + o_P(1)$; therefore $\hat{\Sigma}_{e,t} = \hat{\Sigma}_{e,1}1[t \leq \hat{T}] + \hat{\Sigma}_{e,2}1[t > \hat{T}] = \Sigma_e + o_P(1)$. This concludes the proof of the consistency of $\hat{\Sigma}_{e,t}$.
• Part (ii). Consistency of $\tilde{\lambda}$. The proof largely follows the same steps as the proof of the consistency of $\hat{\lambda}$. Let $\tilde{\mu}_t = \tilde{\mu}_{1\lambda}1[t \leq T_{1\lambda}] + \tilde{\mu}_{2\lambda}1[t > T_{1\lambda}]$, where $\tilde{\mu}_{i\lambda} = \big(\sum_{i\lambda}\hat{\Sigma}_{e,t}^{-1}\big)^{-1}\sum_{i\lambda}\hat{\Sigma}_{e,t}^{-1}\hat{a}_t$ are the GLS estimators of the mean of $\hat{a}_t$ before and after $[T\lambda] = T_{1\lambda}$. The FGLS objective function evaluated at the FGLS parameter estimators $\tilde{\mu}_{i\lambda}$ is:

$L_{FGLS}(\lambda, \hat{a}_t) = T^{-1}\sum_{t=1}^T[e_t + (\hat{a}_t - a_t) + (\mu^0_t - \tilde{\mu}_t)]'\hat{\Sigma}_{e,t}^{-1}[e_t + (\hat{a}_t - a_t) + (\mu^0_t - \tilde{\mu}_t)]$
$= T^{-1}\sum_{t=1}^T e_t'\hat{\Sigma}_{e,t}^{-1}e_t + T^{-1}\sum_{t=1}^T(\hat{a}_t - a_t)'\hat{\Sigma}_{e,t}^{-1}(\hat{a}_t - a_t) + T^{-1}\sum_{t=1}^T(\mu^0_t - \tilde{\mu}_t)'\hat{\Sigma}_{e,t}^{-1}(\mu^0_t - \tilde{\mu}_t) + 2T^{-1}\sum_{t=1}^T e_t'\hat{\Sigma}_{e,t}^{-1}(\hat{a}_t - a_t) + 2T^{-1}\sum_{t=1}^T e_t'\hat{\Sigma}_{e,t}^{-1}(\mu^0_t - \tilde{\mu}_t) + 2T^{-1}\sum_{t=1}^T(\hat{a}_t - a_t)'\hat{\Sigma}_{e,t}^{-1}(\mu^0_t - \tilde{\mu}_t) = \sum_{i=1}^6 L_i.$

As before, $L_1$, $L_2$ and $L_4$ do not depend on $\lambda$, so

$\min_\lambda L_{FGLS}(\lambda, \hat{a}_t) = \min_\lambda[L^*_{FGLS}(\lambda, \hat{a}_t) = L_3 + L_5 + L_6].$

Because $\hat{\Sigma}_{e,t} = \Sigma_e + o_P(1)$, we can follow the same steps as in the proof of the consistency of $\hat{\lambda}$ to show that $L_3 = O_P(1)$, while $L_5 = o_P(1)$ and $L_6 = o_P(1)$, so they are dominated by $L_3$. We can also follow the same steps as before to show that

$L_3 = \frac{(\lambda^0-\lambda)(1-\lambda^0)}{1-\lambda}\,\delta^{0\prime}_\mu\Sigma_e^{-1}\delta^0_\mu + o_P(1),$

uniformly in $\lambda < \lambda^0$. Since $\Sigma_e$ is pd, $\Sigma_e^{-1}$ is pd, and, because $\delta^0_\mu \neq 0$, $\delta^{0\prime}_\mu\Sigma_e^{-1}\delta^0_\mu > 0$. Therefore, $\frac{(\lambda^0-\lambda)(1-\lambda^0)}{1-\lambda}\,\delta^{0\prime}_\mu\Sigma_e^{-1}\delta^0_\mu$ is uniquely minimized at $\lambda = \lambda^0$. It follows that $\tilde{\lambda} \stackrel{P}{\rightarrow} \lambda^0$.
• Part (ii). Rate of convergence of $\tilde{\lambda}$. Using $\hat{\Sigma}_{e,t} \stackrel{P}{\rightarrow} \Sigma_e$, the proof follows the same steps as the proof of the convergence rate of $\hat{\lambda}$ and is omitted for simplicity.
Proof of Theorem 6.

• Part (i). We start by analyzing $\hat\lambda = \hat\lambda^{BP}_{RF}$, and therefore we treat $X_t$ as a scalar. From the RF with $\Pi = \Pi_i$ and $\alpha = \alpha_i$ for $i = 1,2$, we have:
\[
\hat\lambda = \arg\min_{\lambda\in I}\mathcal{L}_{BP}(\lambda,\hat\Pi_{i\lambda}) = \arg\min_{\lambda\in I} T\big[\mathcal{L}_{BP}(\lambda,\hat\Pi_{i\lambda}) - \mathcal{L}_{BP}(\hat\Pi)\big] + o_P(1),
\]
where $\mathcal{L}_{BP}(\hat\Pi)$ is the full-sample RF SSR evaluated at the full-sample estimator of $\Pi^0 = \Pi/r_T$, which we denote by $\hat\Pi$.
\begin{align*}
T\big[\mathcal{L}_{BP}(\lambda,\hat\Pi_{i\lambda}) - \mathcal{L}_{BP}(\hat\Pi)\big] &= \sum_{i=1}^{2}\sum_{i\lambda}(X_t - Z_t'\hat\Pi_{i\lambda})^2 - \sum_{t=1}^{T}(X_t - Z_t'\hat\Pi)^2\\
&= \sum_{1\lambda}\big[v_t - Z_t'(\hat\Pi_{1\lambda}-\Pi^0)\big]^2 + \sum_{2\lambda}\big[v_t - Z_t'(\hat\Pi_{2\lambda}-\Pi^0)\big]^2 - \sum_{t=1}^{T}\big[v_t - Z_t'(\hat\Pi-\Pi^0)\big]^2\\
&= -2\sum_{1\lambda}v_tZ_t'(\hat\Pi_{1\lambda}-\Pi^0) + (\hat\Pi_{1\lambda}-\Pi^0)'\Big(\sum_{1\lambda}Z_tZ_t'\Big)(\hat\Pi_{1\lambda}-\Pi^0)\\
&\quad - 2\sum_{2\lambda}v_tZ_t'(\hat\Pi_{2\lambda}-\Pi^0) + (\hat\Pi_{2\lambda}-\Pi^0)'\Big(\sum_{2\lambda}Z_tZ_t'\Big)(\hat\Pi_{2\lambda}-\Pi^0)\\
&\quad + 2\sum_{t=1}^{T}v_tZ_t'(\hat\Pi-\Pi^0) - (\hat\Pi-\Pi^0)'\Big(\sum_{t=1}^{T}Z_tZ_t'\Big)(\hat\Pi-\Pi^0)\\
&= -\sum_{1\lambda}v_tZ_t'(\hat\Pi_{1\lambda}-\Pi^0) - \sum_{2\lambda}v_tZ_t'(\hat\Pi_{2\lambda}-\Pi^0) + \sum_{t=1}^{T}v_tZ_t'(\hat\Pi-\Pi^0).
\end{align*}
Substituting the OLS expressions for $\hat\Pi_{i\lambda}-\Pi^0$ and $\hat\Pi-\Pi^0$,
\begin{align}
T\big[\mathcal{L}_{BP}(\lambda,\hat\Pi_{i\lambda}) - \mathcal{L}_{BP}(\hat\Pi)\big] &= -\Big(T^{-1/2}\sum_{1\lambda}v_tZ_t'\Big)\Big(T^{-1}\sum_{1\lambda}Z_tZ_t'\Big)^{-1}\Big(T^{-1/2}\sum_{1\lambda}Z_tv_t\Big)\nonumber\\
&\quad - \Big(T^{-1/2}\sum_{2\lambda}v_tZ_t'\Big)\Big(T^{-1}\sum_{2\lambda}Z_tZ_t'\Big)^{-1}\Big(T^{-1/2}\sum_{2\lambda}Z_tv_t\Big)\nonumber\\
&\quad + \Big(T^{-1/2}\sum_{t=1}^{T}v_tZ_t'\Big)\Big(T^{-1}\sum_{t=1}^{T}Z_tZ_t'\Big)^{-1}\Big(T^{-1/2}\sum_{t=1}^{T}Z_tv_t\Big).\tag{B.34}
\end{align}
By A1, A4-A6 and the FCLT, $T^{-1/2}\sum_{1\lambda}Z_tv_t \Rightarrow N_v^{*1/2}B_{h,q+1:2q}(\lambda)$ and $T^{-1/2}\sum_{2\lambda}Z_tv_t \Rightarrow N_v^{*1/2}\big[B_{h,q+1:2q}(1)-B_{h,q+1:2q}(\lambda)\big]$, where $B_h(\lambda)$ and $N_v^*$ were defined at the beginning of section 5. By A3 and A6, $T^{-1}\sum_{1\lambda}Z_tZ_t' \xrightarrow{P}\lambda Q$ and $T^{-1}\sum_{2\lambda}Z_tZ_t'\xrightarrow{P}(1-\lambda)Q$, uniformly in $\lambda$, and $T^{-1}\sum_{t=1}^{T}Z_tZ_t'\xrightarrow{P}Q$. Substituting these into (B.34),
\begin{align*}
T\big[\mathcal{L}_{BP}(\lambda,\hat\Pi_{i\lambda}) - \mathcal{L}_{BP}(\hat\Pi)\big] &\Rightarrow -\lambda^{-1}B_{h,q+1:2q}(\lambda)'N_v^{*1/2}Q^{-1}N_v^{*1/2}B_{h,q+1:2q}(\lambda)\\
&\quad - (1-\lambda)^{-1}\big[B_{h,q+1:2q}(1)-B_{h,q+1:2q}(\lambda)\big]'N_v^{*1/2}Q^{-1}N_v^{*1/2}\big[B_{h,q+1:2q}(1)-B_{h,q+1:2q}(\lambda)\big]\\
&\quad + B_{h,q+1:2q}(1)'N_v^{*1/2}Q^{-1}N_v^{*1/2}B_{h,q+1:2q}(1)\\
&= -[\lambda(1-\lambda)]^{-1}\big[B_{h,q+1:2q}(\lambda)-\lambda B_{h,q+1:2q}(1)\big]'(N_v^*)^{1/2}Q^{-1}(N_v^*)^{1/2}\big[B_{h,q+1:2q}(\lambda)-\lambda B_{h,q+1:2q}(1)\big].
\end{align*}
Letting $\Phi = (N_v^*)^{1/2}Q^{-1}(N_v^*)^{1/2}$, we have:
\begin{align*}
\hat\lambda^{BP}_{RF} &\Rightarrow \arg\inf_{\lambda\in\Lambda_\epsilon}\ -[\lambda(1-\lambda)]^{-1}\big[B_{h,q+1:2q}(\lambda)-\lambda B_{h,q+1:2q}(1)\big]'\,\Phi\,\big[B_{h,q+1:2q}(\lambda)-\lambda B_{h,q+1:2q}(1)\big]\\
&= \arg\sup_{\lambda\in\Lambda_\epsilon}\ [\lambda(1-\lambda)]^{-1}\big[B_{h,q+1:2q}(\lambda)-\lambda B_{h,q+1:2q}(1)\big]'\,\Phi\,\big[B_{h,q+1:2q}(\lambda)-\lambda B_{h,q+1:2q}(1)\big] = \mathcal{D}\big(B_{h,q+1:2q}(\lambda),\Phi\big).
\end{align*}
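In practice, the break-fraction estimator $\hat\lambda^{BP}_{RF}$ is computed by a grid search over trimmed candidate break dates, minimizing the sum of the two subsample SSRs of the reduced form. A minimal sketch (illustrative only, not the paper's code; `epsilon` denotes the trimming fraction):

```python
import numpy as np

def bp_break_fraction(X, Z, epsilon=0.15):
    """Grid-search the Bai-Perron-type SSR objective: for each candidate
    break date k, fit OLS of X on Z separately on t<=k and t>k and return
    the fraction k/T minimizing the total sum of squared residuals."""
    T = len(X)
    def ssr(Xs, Zs):
        coef, *_ = np.linalg.lstsq(Zs, Xs, rcond=None)
        resid = Xs - Zs @ coef
        return resid @ resid
    best_k, best_val = None, np.inf
    for k in range(int(epsilon * T), int((1 - epsilon) * T) + 1):
        val = ssr(X[:k], Z[:k]) + ssr(X[k:], Z[k:])
        if val < best_val:
            best_k, best_val = k, val
    return best_k / T

# toy reduced form with a coefficient break at t = 200 (T = 400)
rng = np.random.default_rng(1)
T, q = 400, 2
Z = rng.normal(size=(T, q))
t = np.arange(T)
Pi1, Pi2 = np.array([1.0, 0.5]), np.array([-1.0, 1.5])
X = np.where(t < 200, Z @ Pi1, Z @ Pi2) + 0.5 * rng.normal(size=T)
lam_hat = bp_break_fraction(X, Z)
assert abs(lam_hat - 0.5) < 0.05
```

With a sizable coefficient shift, the estimated fraction concentrates tightly around the true break fraction, consistent with the fast rate used in the proof.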
• Part (i). We continue by analyzing $\tilde\lambda_{RF}$. Since there is no RF break, the $\tilde\beta_{i\lambda}$ are subsample estimators of $\beta^0 = \beta_1^0 = \beta_2^0$. Therefore, by standard arguments, it can be shown that $\hat\Sigma_{v,t}\xrightarrow{P}\Sigma_v$ uniformly in $\lambda$, regardless of which candidate break-point $\lambda$ is used to compute $\hat\Sigma_{v,t}$. Therefore, $\hat\Sigma_{v,t} = \Sigma_v + o_P(1)$ also when $\lambda = \hat\lambda^{BP}_{RF}$ is used to construct $\hat\Sigma_{v,t}$. Letting $\tilde\lambda = \tilde\lambda_{RF}$, we have:
\[
\tilde\lambda = \arg\min_{\lambda\in I}\mathcal{L}_{FGLS}(\lambda,\tilde\beta_{i\lambda}) = \arg\min_{\lambda\in I}T\big[\mathcal{L}_{FGLS}(\lambda,\tilde\beta_{i\lambda}) - \mathcal{L}_{FGLS}(\tilde\beta)\big],
\]
where $\mathcal{L}_{FGLS}(\tilde\beta)$ is the full-sample RF FGLS objective function evaluated at the full-sample estimator of $\beta^0 = \mathrm{vec}(\Pi^0)$, which we denote by $\tilde\beta$.
\[
T\big[\mathcal{L}_{FGLS}(\lambda,\tilde\beta_{i\lambda}) - \mathcal{L}_{FGLS}(\tilde\beta)\big] = \sum_{i=1}^{2}\sum_{i\lambda}(X_t - Z_t'\tilde\beta_{i\lambda})'\hat\Sigma_{v,t}^{-1}(X_t - Z_t'\tilde\beta_{i\lambda}) - \sum_{t=1}^{T}(X_t - Z_t'\tilde\beta)'\hat\Sigma_{v,t}^{-1}(X_t - Z_t'\tilde\beta).
\]
Using calculations similar to those for the BP estimator, together with $\hat\Sigma_{v,t} = \Sigma_v + o_P(1)$, one can show that:
\begin{align*}
T\big[\mathcal{L}_{FGLS}(\lambda,\tilde\beta_{i\lambda}) - \mathcal{L}_{FGLS}(\tilde\beta)\big] &= -\sum_{i=1}^{2}\Big(T^{-1/2}\sum_{i\lambda}v_t\otimes Z_t\Big)'\Big[\Sigma_v^{-1}\otimes\Big(T^{-1}\sum_{i\lambda}Z_tZ_t'\Big)^{-1}\Big]\Big(T^{-1/2}\sum_{i\lambda}v_t\otimes Z_t\Big)\\
&\quad + \Big(T^{-1/2}\sum_{t=1}^{T}v_t\otimes Z_t\Big)'\Big[\Sigma_v^{-1}\otimes\Big(T^{-1}\sum_{t=1}^{T}Z_tZ_t'\Big)^{-1}\Big]\Big(T^{-1/2}\sum_{t=1}^{T}v_t\otimes Z_t\Big) + o_P(1)\\
&= -\Big(T^{-1/2}\sum_{1\lambda}v_t\otimes Z_t\Big)'\big[\Sigma_v^{-1}\otimes(\lambda Q)^{-1}\big]\Big(T^{-1/2}\sum_{1\lambda}v_t\otimes Z_t\Big)\\
&\quad - \Big(T^{-1/2}\sum_{2\lambda}v_t\otimes Z_t\Big)'\big\{\Sigma_v^{-1}\otimes[(1-\lambda)Q]^{-1}\big\}\Big(T^{-1/2}\sum_{2\lambda}v_t\otimes Z_t\Big)\\
&\quad + \Big(T^{-1/2}\sum_{t=1}^{T}v_t\otimes Z_t\Big)'\big[\Sigma_v^{-1}\otimes Q^{-1}\big]\Big(T^{-1/2}\sum_{t=1}^{T}v_t\otimes Z_t\Big) + o_P(1)\\
&= -[\lambda(1-\lambda)]^{-1}\,T^{-1/2}\Big(\sum_{1\lambda}v_t\otimes Z_t - \lambda\sum_{t=1}^{T}v_t\otimes Z_t\Big)'\big(\Sigma_v^{-1}\otimes Q^{-1}\big)\\
&\qquad\qquad\times T^{-1/2}\Big(\sum_{1\lambda}v_t\otimes Z_t - \lambda\sum_{t=1}^{T}v_t\otimes Z_t\Big) + o_P(1).
\end{align*}
Under A1, A4 and A6, by the FCLT, $T^{-1/2}\sum_{1\lambda}v_t\otimes Z_t \Rightarrow N_v^{1/2}B_{h,q+1:(p_2+1)q}(\lambda)$. Let $B_{vz}(\lambda) = B_{h,q+1:(p_2+1)q}(\lambda)$. Then
\[
T\big[\mathcal{L}_{FGLS}(\lambda,\tilde\beta_{i\lambda}) - \mathcal{L}_{FGLS}(\tilde\beta)\big] \Rightarrow -[\lambda(1-\lambda)]^{-1}\big[B_{vz}(\lambda)-\lambda B_{vz}(1)\big]'N_v^{1/2}\big(\Sigma_v^{-1}\otimes Q^{-1}\big)N_v^{1/2}\big[B_{vz}(\lambda)-\lambda B_{vz}(1)\big],
\]
so
\[
\tilde\lambda \Rightarrow \arg\sup_{\lambda\in I}\ [\lambda(1-\lambda)]^{-1}\big[B_{vz}(\lambda)-\lambda B_{vz}(1)\big]'N_v^{1/2}\big(\Sigma_v^{-1}\otimes Q^{-1}\big)N_v^{1/2}\big[B_{vz}(\lambda)-\lambda B_{vz}(1)\big] = \mathcal{D}\big(B_{vz}(\lambda),\,N_v^{1/2}(\Sigma_v^{-1}\otimes Q^{-1})N_v^{1/2}\big).
\]
Note that under conditional homoskedasticity, $N_v = \Sigma_v\otimes Q$, so $N_v^{1/2}(\Sigma_v^{-1}\otimes Q^{-1})N_v^{1/2} = I_{p_2 q}$.
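Critical values and quantiles of the limit functionals $\mathcal{D}(\cdot,\cdot)$ above are typically obtained by simulation, replacing the Brownian motion with normalized partial sums of Gaussian draws. A rough sketch of one draw in the identity-weighting (conditionally homoskedastic) case noted above (an illustration under stated assumptions, not the paper's simulation code):

```python
import numpy as np

def draw_argmax_D(dim, n_grid=1000, eps=0.15, rng=None):
    """One draw from the argmax functional D(B(lam), I): simulate B on a
    grid via normalized partial sums, form the bridge B(lam) - lam*B(1),
    and return the argmax of [lam(1-lam)]^{-1} ||bridge||^2 over trimmed
    candidate fractions lam."""
    rng = rng or np.random.default_rng()
    increments = rng.normal(size=(n_grid, dim)) / np.sqrt(n_grid)
    B = np.cumsum(increments, axis=0)            # B(lam) at lam = t/n_grid
    lam = np.arange(1, n_grid + 1) / n_grid
    bridge = B - lam[:, None] * B[-1]
    obj = (bridge ** 2).sum(axis=1) / (lam * (1 - lam) + 1e-12)
    keep = (lam >= eps) & (lam <= 1 - eps)
    return lam[keep][np.argmax(obj[keep])]

draws = [draw_argmax_D(dim=2, rng=np.random.default_rng(s)) for s in range(200)]
assert all(0.15 <= d <= 0.85 for d in draws)
```

Repeating many draws yields the simulated distribution of the break-fraction estimator under the corresponding weighting matrix.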
• Part (ii). This part can be shown by exactly the same steps as in part (i) and is omitted.
• Part (iii). The proof for a VMC break is slightly more complicated because of the estimation error $(\hat a_t - a_t)$. We start with $\hat\lambda = \hat\lambda^{BP}_{VMC}$, therefore treating $a_t = j_tu_t^2$ as a scalar. As before,
\[
\hat\lambda = \arg\min_{\lambda\in[\epsilon,1-\epsilon]} T\big[\mathcal{L}_{BP}(\lambda,\hat\mu_{i\lambda},\hat a_t) - \mathcal{L}_{BP}(\hat\mu,\hat a_t)\big],
\]
where $\mathcal{L}_{BP}(\lambda,\hat\mu_{i\lambda},\hat a_t)$ is the BP objective function from estimating (4.2) using $\hat a_t = j_t\hat u_t^2$ instead of $a_t = j_tu_t^2$, $\mathcal{L}_{BP}(\hat\mu,\hat a_t)$ is the OLS objective function obtained from estimating (4.2) over the full sample, using $\hat a_t$ instead of $a_t$, and $\hat\mu = T^{-1}\sum_{t=1}^{T}\hat a_t$. We have:
\[
T\big[\mathcal{L}_{BP}(\lambda,\hat\mu_{i\lambda},\hat a_t) - \mathcal{L}_{BP}(\hat\mu,\hat a_t)\big] = \sum_{i=1}^{2}\sum_{i\lambda}(\hat a_t - \hat\mu_{i\lambda})^2 - \sum_{t=1}^{T}(\hat a_t - \hat\mu)^2 = \sum_{i=1}^{2}\sum_{i\lambda}(\hat\mu - \hat\mu_{i\lambda})(2\hat a_t - \hat\mu_{i\lambda} - \hat\mu) = \sum_{i=1}^{2}(\hat\mu - \hat\mu_{i\lambda})\,T_{i\lambda}\,(\hat\mu_{i\lambda} - \hat\mu).
\]
Now, $\hat\mu = T^{-1}\sum_{t=1}^{T}\hat a_t = T^{-1}\sum_{i=1}^{2}\sum_{i\lambda}\hat a_t = \lambda\hat\mu_{1\lambda} + (1-\lambda)\hat\mu_{2\lambda}$. Therefore, $\hat\mu - \hat\mu_{1\lambda} = (1-\lambda)(\hat\mu_{2\lambda} - \hat\mu_{1\lambda})$ and $\hat\mu - \hat\mu_{2\lambda} = \lambda(\hat\mu_{1\lambda} - \hat\mu_{2\lambda})$. Therefore,
\begin{align}
T\big[\mathcal{L}_{BP}(\lambda,\hat\mu_{i\lambda},\hat a_t) - \mathcal{L}_{BP}(\hat\mu,\hat a_t)\big] &= -\lambda(1-\lambda)^2T(\hat\mu_{1\lambda}-\hat\mu_{2\lambda})^2 - \lambda^2(1-\lambda)T(\hat\mu_{1\lambda}-\hat\mu_{2\lambda})^2\nonumber\\
&= -\lambda(1-\lambda)T(\hat\mu_{1\lambda}-\hat\mu_{2\lambda})^2.\tag{B.35}
\end{align}
Let $\mu^0 = \mu_1^0 = \mu_2^0$. Because $\hat a_t = (\hat a_t - a_t) + \mu^0 + e_t$,
\begin{align}
T^{1/2}(\hat\mu_{1\lambda}-\hat\mu_{2\lambda}) &= T^{1/2}\Big[T_{1\lambda}^{-1}\sum_{1\lambda}\hat a_t - T_{2\lambda}^{-1}\sum_{2\lambda}\hat a_t\Big]\nonumber\\
&= T^{1/2}\Big[T_{1\lambda}^{-1}\sum_{1\lambda}(\hat a_t - a_t) - T_{2\lambda}^{-1}\sum_{2\lambda}(\hat a_t - a_t)\Big] + T^{1/2}\Big[T_{1\lambda}^{-1}\sum_{1\lambda}e_t - T_{2\lambda}^{-1}\sum_{2\lambda}e_t\Big]\nonumber\\
&\equiv \mathcal{B}_1 + \mathcal{B}_2.\tag{B.36}
\end{align}
We first show that $\mathcal{B}_1 = o_P(1)$ uniformly in $\lambda$. To that end, recall that $j_t = z_{t,k}z_{t,k^*}$ for some $k, k^* \in \{1,\ldots,q\}$. Then:
\begin{align}
T^{-1/2}\sum_{1\lambda}(\hat a_t - a_t) &= T^{-1/2}\sum_{1\lambda}u_tj_tw_t'\hat\delta + T^{-1/2}\sum_{1\lambda}j_t(w_t'\hat\delta)^2\nonumber\\
&= \sum_{i=1}^{p}T^{-1/2}\sum_{1\lambda}h_{t,k}z_{t,k}w_{t,i}\,\hat\delta_i + \sum_{i,j=1}^{p}T^{-1/2}\sum_{1\lambda}h_{t,k}z_{t,k}w_{t,i}w_{t,j}\,\hat\delta_i\hat\delta_j \equiv \sum_{i=1}^{p}\mathcal{L}_i^*\,\hat\delta_i + \sum_{i,j=1}^{p}\mathcal{L}_{i,j}^*\,\hat\delta_i\hat\delta_j.\tag{B.37}
\end{align}
We now analyze $\mathcal{L}_i^*$ and $\mathcal{L}_{i,j}^*$ in turn. By A12,
\begin{align}
\mathcal{L}_i^* = T^{-1/2}\sum_{1\lambda}h_{t,k}z_{t,k}w_{t,i} &= T^{-1/2}\sum_{1\lambda}\big[h_{t,k}z_{t,k}w_{t,i} - E(h_{t,k}z_{t,k}w_{t,i})\big] + T^{-1/2}\sum_{1\lambda}E(h_{t,k}z_{t,k}w_{t,i})\nonumber\\
&= T^{-1/2}\sum_{1\lambda}\big[h_{t,k}z_{t,k}w_{t,i} - E(h_{t,k}z_{t,k}w_{t,i})\big] + \lambda T^{1/2}\ell_i^{(1)} + o(1).\tag{B.38}
\end{align}
By A12, $[h_{t,k}z_{t,k}w_{t,i} - E(h_{t,k}z_{t,k}w_{t,i})]$ is a mean-zero mixing process with the rates defined in A12. If $\sup_t\|h_{t,k}z_{t,k}w_{t,i} - E(h_{t,k}z_{t,k}w_{t,i})\|_a < \infty$ for some $a > 2$, then by the FCLT for mixing processes (a special case of the FCLT in Wooldridge and White (1988), Theorem 2.11), $T^{-1/2}\sum_{1\lambda}[h_{t,k}z_{t,k}w_{t,i} - E(h_{t,k}z_{t,k}w_{t,i})] = O_P(1)$. We only consider the case $w_{t,i} = x_{t,i_1}$, with $i_1 = i - p_1$, because the case $w_{t,i} = z_{1t,i}$ leads to no additional insights. We now verify that $\sup_t\|h_{t,k}z_{t,k}x_{t,i_1} - E(h_{t,k}z_{t,k}x_{t,i_1})\|_a < \infty$ for $a$ defined in A12. By the triangle inequality and Hölder's inequality, for some $s\in\{q+1,\ldots,(p_2+1)q\}$, we have:
\begin{align*}
\sup_t\|h_{t,k}z_{t,k}x_{t,i_1} - E(h_{t,k}z_{t,k}x_{t,i_1})\|_a &\le 2\sup_t\|h_{t,k}z_{t,k}x_{t,i_1}\|_a = 2\sup_t\Big\|\sum_{i_2=1}^{q}h_{t,k}z_{t,k}z_{t,i_2}\Pi^0_{t,i_1,i_2} + h_{t,k}z_{t,k}v_{t,i_1}\Big\|_a\\
&= 2\sup_t\Big\|\sum_{i_2=1}^{q}h_{t,k}z_{t,k}z_{t,i_2}\Pi^0_{t,i_1,i_2} + h_{t,k}h_{t,s}\Big\|_a\\
&\le 2\sum_{i_2=1}^{q}\sup_t\|h_{t,k}z_{t,k}z_{t,i_2}\|_a\,|\Pi^0_{t,i_1,i_2}| + 2\sup_t\|h_{t,k}h_{t,s}\|_a\\
&\le 2\sum_{i_2=1}^{q}\sup_t\|h_{t,k}\|_{2a}\sup_t\|z_{t,k}\|_{4a}\sup_t\|z_{t,i_2}\|_{4a}\,|\Pi^0_{t,i_1,i_2}| + 2\sup_t\|h_{t,k}\|_{2a}\sup_t\|h_{t,s}\|_{2a} < \infty.
\end{align*}
Therefore, $T^{-1/2}\sum_{1\lambda}[h_{t,k}z_{t,k}w_{t,i} - E(h_{t,k}z_{t,k}w_{t,i})] = O_P(1)$. Using this in (B.38), we have:
\[
\mathcal{L}_i^* = O_P(1) + \lambda T^{1/2}\ell_i^{(1)}.
\]
Similarly, using A12 again, it can be shown that:
\[
\mathcal{L}_{i,j}^* = O_P(1) + \lambda T^{1/2}\ell_{i,j}^{(2)}.
\]
Substituting these last two equations into (B.37), we have:
\begin{align*}
T^{-1/2}\sum_{1\lambda}(\hat a_t - a_t) &= \sum_{i=1}^{p}O_P(1)\hat\delta_i + \lambda T^{1/2}\sum_{i=1}^{p}\ell_i^{(1)}\hat\delta_i + \sum_{i,j=1}^{p}O_P(1)\hat\delta_i\hat\delta_j + \lambda T^{1/2}\sum_{i,j=1}^{p}\ell_{i,j}^{(2)}\hat\delta_i\hat\delta_j\\
&= o_P(1) + \lambda T^{1/2}\sum_{i=1}^{p}\ell_i^{(1)}\hat\delta_i + \lambda T^{1/2}\sum_{i,j=1}^{p}\ell_{i,j}^{(2)}\hat\delta_i\hat\delta_j.
\end{align*}
Similarly, it can be shown that:
\[
T^{-1/2}\sum_{2\lambda}(\hat a_t - a_t) = o_P(1) + (1-\lambda)T^{1/2}\sum_{i=1}^{p}\ell_i^{(1)}\hat\delta_i + (1-\lambda)T^{1/2}\sum_{i,j=1}^{p}\ell_{i,j}^{(2)}\hat\delta_i\hat\delta_j.
\]
Therefore,
\begin{align*}
\mathcal{B}_1 &= T^{1/2}\Big[T_{1\lambda}^{-1}\sum_{1\lambda}(\hat a_t - a_t) - T_{2\lambda}^{-1}\sum_{2\lambda}(\hat a_t - a_t)\Big]\\
&= o_P(1) + T^{1/2}\sum_{i=1}^{p}\ell_i^{(1)}\hat\delta_i + T^{1/2}\sum_{i,j=1}^{p}\ell_{i,j}^{(2)}\hat\delta_i\hat\delta_j - o_P(1) - T^{1/2}\sum_{i=1}^{p}\ell_i^{(1)}\hat\delta_i - T^{1/2}\sum_{i,j=1}^{p}\ell_{i,j}^{(2)}\hat\delta_i\hat\delta_j = o_P(1),
\end{align*}
uniformly in $\lambda$. Therefore, the estimation error $\mathcal{B}_1$ plays no role in the asymptotic distribution of the VMC break-point estimator. On the other hand, by A11(i)-(ii) and the FCLT, $T^{-1/2}\sum_{1\lambda}e_t \Rightarrow \Sigma_e^{*1/2}B_{e,1}(\lambda)$, where $\Sigma_e^*$ is the $(1,1)$ element of $\Sigma_e$, a scalar. Therefore,
\begin{align}
\mathcal{B}_2 &= T^{1/2}\Big[T_{1\lambda}^{-1}\sum_{1\lambda}e_t - T_{2\lambda}^{-1}\sum_{2\lambda}e_t\Big] = \lambda^{-1}T^{-1/2}\sum_{1\lambda}e_t - (1-\lambda)^{-1}T^{-1/2}\sum_{2\lambda}e_t + o_P(1)\nonumber\\
&= [\lambda(1-\lambda)]^{-1}\,T^{-1/2}\Big(\sum_{1\lambda}e_t - \lambda\sum_{t=1}^{T}e_t\Big) + o_P(1).\tag{B.39}
\end{align}
Substituting $\mathcal{B}_1 = o_P(1)$ and (B.39) into (B.36), we have $T^{1/2}(\hat\mu_{1\lambda}-\hat\mu_{2\lambda}) = o_P(1) + [\lambda(1-\lambda)]^{-1}\,T^{-1/2}\big(\sum_{1\lambda}e_t - \lambda\sum_{t=1}^{T}e_t\big)$. Using this in (B.35), we have:
\begin{align}
-T\big[\mathcal{L}_{BP}(\lambda,\hat\mu_{i\lambda},\hat a_t) - \mathcal{L}_{BP}(\hat\mu,\hat a_t)\big] &= [\lambda(1-\lambda)]^{-1}\Big[T^{-1/2}\Big(\sum_{1\lambda}e_t - \lambda\sum_{t=1}^{T}e_t\Big)\Big]^2 + o_P(1)\tag{B.40}\\
&\Rightarrow \Sigma_e^*\,[\lambda(1-\lambda)]^{-1}\big[B_{e,1}(\lambda)-\lambda B_{e,1}(1)\big]^2.\nonumber
\end{align}
Because the scalar $\Sigma_e^*$ plays no role in maximizing $-T[\mathcal{L}_{BP}(\lambda,\hat\mu_{i\lambda},\hat a_t) - \mathcal{L}_{BP}(\hat\mu,\hat a_t)]$ over $\lambda$,
\[
\hat\lambda \Rightarrow \arg\max_{\lambda\in[\epsilon,1-\epsilon]}\ [\lambda(1-\lambda)]^{-1}\big[B_{e,1}(\lambda)-\lambda B_{e,1}(1)\big]^2 = \mathcal{D}\big(B_{e,1}(\lambda),1\big).
\]
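By (B.35), minimizing the BP objective over $\lambda$ is equivalent to maximizing $\lambda(1-\lambda)(\hat\mu_{1\lambda}-\hat\mu_{2\lambda})^2$. A minimal mean-shift sketch on a scalar series standing in for $\hat a_t$ (illustrative only, not the paper's code):

```python
import numpy as np

def vmc_break_fraction(a_hat, eps=0.15):
    """Mean-shift break-fraction estimator: maximize
    lam*(1-lam)*(mean_pre - mean_post)^2 over trimmed fractions, i.e.
    the reduction in SSR from splitting the sample (cf. eq. B.35)."""
    T = len(a_hat)
    best_lam, best_val = None, -np.inf
    for k in range(int(eps * T), int((1 - eps) * T) + 1):
        lam = k / T
        gap = a_hat[:k].mean() - a_hat[k:].mean()
        val = lam * (1 - lam) * gap ** 2
        if val > best_val:
            best_lam, best_val = lam, val
    return best_lam

# toy series with a mean shift from 1 to 3 at the midpoint
rng = np.random.default_rng(2)
a_hat = np.concatenate([1.0 + rng.normal(size=150), 3.0 + rng.normal(size=150)])
lam_hat = vmc_break_fraction(a_hat)
assert abs(lam_hat - 0.5) < 0.05
```

A large shift in the mean of $\hat a_t$ produces a break-fraction estimate concentrated tightly at the true fraction, as the proof's fast convergence rate suggests.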
• Part (iii). Similarly to the proof in part (i), it can be shown that $\hat\Sigma_{e,t}\xrightarrow{P}\Sigma_e$. Then, treating $e_t$, $a_t$ and $\hat a_t$ as vectors and following the same steps as for the BP VMC break-point estimator, it can be shown that:
\[
T\big[\mathcal{L}_{FGLS}(\lambda,\tilde\mu_{i\lambda}) - \mathcal{L}_{FGLS}(\tilde\mu)\big] = -[\lambda(1-\lambda)]^{-1}\Big[T^{-1/2}\Big(\sum_{1\lambda}e_t - \lambda\sum_{t=1}^{T}e_t\Big)\Big]'\Sigma_e^{-1}\Big[T^{-1/2}\Big(\sum_{1\lambda}e_t - \lambda\sum_{t=1}^{T}e_t\Big)\Big] + o_P(1),
\]
so
\[
\tilde\lambda_{VMC} \Rightarrow \arg\max_{\lambda\in[\epsilon,1-\epsilon]}\ [\lambda(1-\lambda)]^{-1}\big[B_e(\lambda)-\lambda B_e(1)\big]'\big[B_e(\lambda)-\lambda B_e(1)\big] = \mathcal{D}\big(B_e(\lambda),\,I_{q(q+1)/2}\big).
\]
Proof of Theorem 7.

• Part (i). This result is a special case of the limit distribution derived in Theorem 1. In Theorem 1, imposing A6 and no RF break ($\Pi^a = \Pi_i^a$, $\alpha = \alpha_i$), we have:
\begin{align*}
V_{GMM} &= \big[(\Pi_1^{a\prime}M_1 + \Pi_2^{a\prime}M_2)(N_u)^{-1}(M_1\Pi_1^a + M_2\Pi_2^a)\big]^{-1}\\
&= \big[\big(\Pi^{a\prime}\lambda Q + \Pi^{a\prime}(1-\lambda)Q\big)(N_u)^{-1}\big(\lambda Q\Pi^a + (1-\lambda)Q\Pi^a\big)\big]^{-1}\\
&= \big[\Pi^{a\prime}QN_u^{-1}Q\Pi^a\big]^{-1} = V^*.
\end{align*}
• Part (ii). Under no break, using (B.1) but with $\tilde Z$, defined just after (3.1), replacing $Z$, we have:
\begin{align}
\hat\theta_{B\text{-}GMM} - \theta^0 &= R_T\big[R_TT^{-1}W'\tilde Z(\hat N_u^a)^{-1}\tilde Z'WT^{-1}R_T\big]^{-1}R_TT^{-1}W'\tilde Z(\hat N_u^a)^{-1}\big[T^{-1}\tilde Z'U\big]\tag{B.41}\\
&= \big[R_TT^{-1}W'\tilde Z(\hat N_u^a)^{-1}\tilde Z'WT^{-1}R_T\big]^{-1}R_TT^{-1}W'\tilde Z(\hat N_u^a)^{-1}\big[r_TT^{-1}\tilde Z'U\big] + o_P(1).\nonumber
\end{align}
By the FCLT, $T^{-1/2}\sum_{i\lambda}Z_tu_t = O_P(1)$, so $r_TT^{-1}\sum_{i\lambda}Z_tu_t = o_P(1)$ uniformly in $\lambda$. So, $r_TT^{-1}\tilde Z'U = \big[r_TT^{-1}\sum_{1\lambda}u_tZ_t',\ r_TT^{-1}\sum_{2\lambda}u_tZ_t'\big]' = o_P(1)$. As in the proof of Theorem 1, $R_TT^{-1}W'\tilde Z = O_P(1)$ and $(\hat N_u^a)^{-1} = O_P(1)$, so $\hat\theta_{B\text{-}GMM}\xrightarrow{P}\theta^0$. We first state the limit distribution of B-GMM conditional on the break-fraction estimator, which we call $\hat\tau$ and which could be either $\hat\lambda^{BP}_{RF}$ or $\tilde\lambda_{RF}$. Following the same steps as for Theorem 1(ii), under A1, A3, and A4,
\[
\Lambda_T(\hat\theta_{B\text{-}GMM} - \theta^0)\xrightarrow{D}\mathcal{N}(0, V_{\hat\tau}),\quad\text{where}\quad V_{\hat\tau} = \big[\Pi^{a\prime}\big(M_{1\hat\tau}N_{u,1\hat\tau}^{-1}M_{1\hat\tau} + M_{2\hat\tau}N_{u,2\hat\tau}^{-1}M_{2\hat\tau}\big)\Pi^a\big]^{-1}.
\]
By A6, $M_{1\hat\tau} = \hat\tau Q$, $M_{2\hat\tau} = (1-\hat\tau)Q$, $N_{u,1\hat\tau} = \hat\tau N_u$ and $N_{u,2\hat\tau} = (1-\hat\tau)N_u$. Therefore, $V_{\hat\tau} = [\Pi^{a\prime}QN_u^{-1}Q\Pi^a]^{-1} = V^*$, and $\Lambda_T(\hat\theta_{B\text{-}GMM}-\theta^0)\xrightarrow{D}\mathcal{N}(0,V^*)$. We now show that $\hat\tau$ and $\Lambda_T(\hat\theta_{B\text{-}GMM}-\theta^0)$ are asymptotically independent. We only show this for the BP break-fraction estimator, $\hat\tau = \hat\lambda$; the proof for $\hat\tau = \tilde\lambda$ follows by very similar arguments. Using the facts that $T^{-1}\sum_{1\tau}Z_tZ_t'\xrightarrow{P}\tau Q$ and $T^{-1}\sum_{2\tau}Z_tZ_t'\xrightarrow{P}(1-\tau)Q$ uniformly in $\tau$, and $T^{-1}\sum_{t=1}^{T}Z_tZ_t'\xrightarrow{P}Q$, in (B.34), and rearranging terms, we obtain:
\[
-T\big[\mathcal{L}_{BP}(\tau,\hat\Pi_{i\tau}) - \mathcal{L}_{BP}(\hat\Pi)\big] = [\tau(1-\tau)]^{-1}\,T^{-1/2}\Big(\sum_{1\tau}Z_tv_t - \tau\sum_{t=1}^{T}Z_tv_t\Big)'Q^{-1}\,T^{-1/2}\Big(\sum_{1\tau}Z_tv_t - \tau\sum_{t=1}^{T}Z_tv_t\Big) + o_P(1).
\]
On the other hand, using (B.41) and Assumption A6,
\[
\Lambda_T(\hat\theta_{B\text{-}GMM} - \theta^0) = \big[(V^*)^{-1}\Pi^{a\prime}QN_u^{-1}\big]\,T^{-1/2}\sum_{t=1}^{T}Z_tu_t + o_P(1).
\]
Now, $T^{-1/2}\big(\sum_{1\tau}Z_tv_t - \tau\sum_{t=1}^{T}Z_tv_t\big)$ and $T^{-1/2}\sum_{t=1}^{T}Z_tu_t$ have asymptotic covariance $\tau N_{uv} - \tau N_{uv} = 0$ under A6, and are jointly asymptotically normally distributed by the FCLT for all $\tau$ (applied under A1, A4 and A6). Therefore, they are asymptotically independent, and hence so are $\hat\lambda$ and $\Lambda_T(\hat\theta_{B\text{-}GMM}-\theta^0)$. Therefore, the unconditional limit distribution of B-GMM is the same as the limit distribution conditional on $\hat\tau$: $\Lambda_T(\hat\theta_{B\text{-}GMM}-\theta^0)\xrightarrow{D}\mathcal{N}(0,V^*)$.
• Part (iii). Let $\hat\tau = \hat\lambda^{BP}_{SMI}$ or $\hat\tau = \hat\lambda^{BP}_{VMC}$. Then, following the same steps as in part (ii), $\Lambda_T(\hat\theta_{B\text{-}GMM}-\theta^0)\xrightarrow{D}\mathcal{N}(0,V^*)$ conditional on $\hat\tau$. From (B.40) (or a similar equation for the SMI break-point estimator), we have:
\[
-T\big[\mathcal{L}_{BP}(\tau,\hat\mu_{i\tau}) - \mathcal{L}_{BP}(\hat\mu)\big] = [\tau(1-\tau)]^{-1}\Big[T^{-1/2}\Big(\sum_{1\tau}e_t - \tau\sum_{t=1}^{T}e_t\Big)\Big]^2 + o_P(1).
\]
Note that $T^{-1/2}\sum_{t=1}^{T}Z_tu_t$ and $T^{-1/2}\big(\sum_{1\tau}e_t - \tau\sum_{t=1}^{T}e_t\big)$ have asymptotic covariance $\tau N_{ue} - \tau N_{ue} = 0$ under A13, and are jointly asymptotically normally distributed by the FCLT for all $\tau$, under A1 and either A10(i)-(ii) or A11(i)-(ii). Therefore, they are asymptotically independent, so $\hat\lambda^{BP}_{SMI}$ and $\hat\theta_{B\text{-}GMM}$ are asymptotically independent, and $\hat\lambda^{BP}_{VMC}$ and $\hat\theta_{B\text{-}GMM}$ are asymptotically independent. Therefore, also unconditionally, $\Lambda_T(\hat\theta_{B\text{-}GMM}-\theta^0)\xrightarrow{D}\mathcal{N}(0,V^*)$. A similar argument holds for $\tilde\lambda_{SMI}$ and $\tilde\lambda_{VMC}$.
C Extended framework

In this section, we consider the extended framework of section 7.4, where the moment conditions become nonlinear in the parameters of interest, but remain linear with respect to (nonlinear) transformations of these parameters. Specifically, using the notation introduced in section 2, we consider the following model:
\begin{align*}
y_t &= Z_{1t}'\theta_{z_1}^0 + X_t'\theta_x^0 + u_t = W_t'\theta^0 + u_t,\\
X_t' &= Z_t'\Pi^* + v_t',
\end{align*}
where we are not directly interested in estimating the vector of $p$ parameters $\theta^0$, but rather some underlying (structural) parameters, say $\eta^0$, that are connected to $\theta^0$ by nonlinear transformations. Similarly to the application considered in section 7.4, we consider
\[
\theta^0 = f(\eta^0)\quad\text{with}\quad \begin{pmatrix}\theta_{z_1}^0\\ \theta_x^0\end{pmatrix} = \begin{pmatrix}f_1(\eta^0)\\ f_2(\eta^0)\end{pmatrix},
\]
where $f(\cdot)$ is a nonlinear vectorial function and $\eta^0$ is a vector of $p_\eta$ parameters of interest (with $p_\eta \le p$) such that $\partial f(\eta^0)/\partial\eta'$ is a full column rank matrix. The associated standard optimal GMM estimator minimizes
\[
\mathcal{L}_{GMM}(\eta) = g_T(\eta)'\hat N_u^{-1}g_T(\eta)\quad\text{with}\quad g_T(\eta) = \frac1T\sum_{t=1}^{T}Z_t\big(y_t - W_t'f(\eta)\big),
\]
where $\hat N_u \xrightarrow{P} N_u = \mathrm{AVar}\big[T^{1/2}g_T(\eta^0)\big]$. The moment conditions remain affine with respect to a (nonlinear) transformation of the parameters:
\[
g_T(\eta) = A_T + B_Tf(\eta).
\]
This feature greatly simplifies our analysis, which remains close to the linear model already discussed. For example, the probability limit of the Hessian of the GMM minimand can be written as
\begin{align*}
\operatorname*{Plim}_T\Big[\frac12\frac{\partial^2\mathcal{L}_{GMM}(\eta^0)}{\partial\eta\,\partial\eta'}\Big] &= \frac{\partial f'(\eta^0)}{\partial\eta}\Big(\operatorname*{Plim}_T T^{-1}\sum_{t=1}^{T}W_tZ_t'\Big)N_u^{-1}\Big(\operatorname*{Plim}_T T^{-1}\sum_{t=1}^{T}Z_tW_t'\Big)\frac{\partial f(\eta^0)}{\partial\eta'}\\
&\quad + \Big[\frac{\partial^2}{\partial\eta\,\partial\eta'}f'(\eta^0)\Big]\Big(\operatorname*{Plim}_T T^{-1}\sum_{t=1}^{T}W_tZ_t'\Big)N_u^{-1}g_T(\eta^0).
\end{align*}
Under our maintained assumption of stability of the structural parameter $\eta^0$ over time, the limiting Hessian changes, in general, whenever $\Pi^*$, $Q$, or $N_u$ changes over the sample, which is similar to the linear framework considered in section 2. Hence, similarly to the linear framework, the associated change-points (or breaks) can be interacted with the instruments to construct more moment conditions and more efficient GMM estimators of $\eta^0$, as we show below.
We focus on the model of section 3.1, where the RF has one break-point at $T_{RF}$. The B-GMM estimator of $\eta^0$ is then defined as
\begin{equation}
\hat\eta_{B\text{-}GMM} = \arg\min_\eta\big[\bar g_T'(\eta)(\hat N_u^a)^{-1}\bar g_T(\eta)\big]\tag{C.1}
\end{equation}
with
\[
\bar g_T(\eta) = \begin{pmatrix}\sum_{t=1}^{T_{RF}}Z_t\big(y_t - W_t'f(\eta)\big)/T_{RF}\\[2pt] \sum_{t=T_{RF}+1}^{T}Z_t\big(y_t - W_t'f(\eta)\big)/(T - T_{RF})\end{pmatrix}\quad\text{and}\quad \hat N_u^a \xrightarrow{P} \mathrm{AVar}\big(\sqrt T\,\bar g_T(\eta^0)\big),
\]
while the (standard) full-sample GMM estimator of $\eta^0$ is defined as
\begin{equation}
\hat\eta_{GMM} = \arg\min_\eta\big[g_T'(\eta)\hat N_u^{-1}g_T(\eta)\big]\tag{C.2}
\end{equation}
with
\[
g_T(\eta) = \sum_{t=1}^{T}Z_t\big(y_t - W_t'f(\eta)\big)/T\quad\text{and}\quad \hat N_u \xrightarrow{P} \mathrm{AVar}\big(\sqrt T\,g_T(\eta^0)\big).
\]
Corollary A1 (Extensions of Theorems 1 and 2). Under Assumptions A1 to A5, the B-GMM estimator $\hat\eta_{B\text{-}GMM}$ of $\eta^0$ defined in (C.1) is at least as efficient as the standard (full-sample) GMM estimator $\hat\eta_{GMM}$ defined in (C.2).
Proof of Corollary A1.

(i) Consistency of $\hat\eta_{GMM}$ and $\hat\eta_{B\text{-}GMM}$:

- The consistency of $\hat\eta_{GMM}$ follows almost directly from Theorem 2.1 in Antoine and Renault (2012; AR hereafter). It requires a slight extension of the AR framework. We now define and adapt AR's key quantities to the above framework:
\[
\Psi_T(\eta) \equiv \sqrt T\big(g_T(\eta) - m_T(\eta)\big),
\]
where
\begin{align*}
m_T(\eta) &= E\Big(\frac1T\sum_{t=1}^{T}g_t(\eta)\Big)\quad\text{with}\quad g_t(\eta) = Z_t\big(y_t - W_t'f(\eta)\big)\\
&= \frac1T\sum_{t=1}^{T}E(Z_tZ_{1t}')\big(f_1(\eta^0)-f_1(\eta)\big) + \frac1T\sum_{t=1}^{T}E(Z_tX_t')\big(f_2(\eta^0)-f_2(\eta)\big)\\
&= \frac1T\sum_{t=1}^{T}E(Z_tZ_{1t}')\big(f_1(\eta^0)-f_1(\eta)\big) + E(Z'\tilde Z/T)\begin{pmatrix}\Pi_1/T^{\alpha_1}\\ \Pi_2/T^{\alpha_2}\end{pmatrix}\big(f_2(\eta^0)-f_2(\eta)\big),
\end{align*}
where recall that $Z$ is the $(T,q)$ matrix with rows $Z_t'$ and $\tilde Z$ is the $(T,2q)$ matrix defined as
\[
\tilde Z' = \begin{pmatrix}Z_1 & \cdots & Z_{T_{RF}} & 0 & \cdots & 0\\ 0 & \cdots & 0 & Z_{T_{RF}+1} & \cdots & Z_T\end{pmatrix}.
\]
It is useful to rewrite $m_T(\cdot)$ as follows:
\[
m_T(\eta) = m_1(\eta) + \frac{1}{r_T}m_2(\eta)
\]
with $m_1(\eta) = \frac1T\sum_{t=1}^{T}E(Z_tZ_{1t}')\big(f_1(\eta^0)-f_1(\eta)\big)$, $r_T = T^{\min\{\alpha_1,\alpha_2\}}$, and
\[
m_2(\eta) = \begin{cases}
E(Z'\tilde Z/T)\begin{pmatrix}\Pi_1\\ \Pi_2\end{pmatrix}\big(f_2(\eta^0)-f_2(\eta)\big) & \text{when } \alpha_1 = \alpha_2,\\[6pt]
E(Z'\tilde Z/T)\begin{pmatrix}\Pi_1\\ 0\end{pmatrix}\big(f_2(\eta^0)-f_2(\eta)\big) + o(T^{\alpha_1-\alpha_2}) & \text{when } \alpha_1 < \alpha_2,\\[6pt]
E(Z'\tilde Z/T)\begin{pmatrix}0\\ \Pi_2\end{pmatrix}\big(f_2(\eta^0)-f_2(\eta)\big) + o(T^{\alpha_2-\alpha_1}) & \text{when } \alpha_1 > \alpha_2.
\end{cases}
\]
As a result, we have:
\[
\Psi_T(\eta) = \sqrt T\Big(g_T(\eta) - \frac{\Lambda_T}{\sqrt T}\rho(\eta)\Big)\quad\text{with}\quad \Lambda_T = \Big[\sqrt T\,I_q\ \vdots\ \frac{\sqrt T}{r_T}\,I_q\Big]\quad\text{and}\quad \rho(\eta) = \begin{pmatrix}m_1(\eta)\\ m_2(\eta)\end{pmatrix}.
\]
Strictly speaking, this is a slight extension of the AR framework, which considers a square diagonal matrix $\Lambda_T$. The weak consistency of $\hat\eta_{GMM}$ follows directly from Theorem 2.1 in AR under their Assumptions 1 and 2. Assumption 1 in AR is an identification assumption, while AR's Assumption 2 maintains a functional CLT on $\Psi_T$ and sufficiently strong identification:
• $\rho(\eta) = 0 \Leftrightarrow \eta = \eta^0$;
• $\Psi_T(\eta)$ weakly converges towards a Gaussian stochastic process with mean zero;
• $\Lambda_T$ is deterministic; its minimal coefficient $\underline\lambda_T$ and maximal coefficient $\overline\lambda_T$ are such that $\lim_T \underline\lambda_T = \infty$ and $\lim_T \overline\lambda_T/\sqrt T < \infty$.

The regularity assumptions A1-A4 maintained in the present paper ensure that AR's Assumptions 1 and 2 hold; therefore, the consistency of $\hat\eta_{GMM}$ follows.
- The consistency of $\hat\eta_{B\text{-}GMM}$ follows from Theorem 2.1 in AR and the proof of Theorem 1 in this paper. Recall that the definition of $\hat\eta_{B\text{-}GMM}$ requires replacing $T_{RF}$ (the unknown break-point) by $\hat T_{RF}$ (the estimated break-point). Therefore, we can follow the proof of Theorem 1 to show that this replacement has no effect on the asymptotic properties of $\hat\eta_{B\text{-}GMM}$. It remains to justify that, for given $T_{RF}$ (assumed known), we are back to AR's (slightly extended) framework introduced above. We start by rewriting $\hat\eta_{B\text{-}GMM}$ accordingly:
\[
\hat\eta_{B\text{-}GMM}(T_{RF}) = \arg\min_\eta\big[\bar g_T'(\eta)(\hat N_u^a)^{-1}\bar g_T(\eta)\big]
\]
with
\[
\bar g_T(\eta) = \begin{pmatrix}\sum_{t=1}^{T_{RF}}Z_t\big(y_t - W_t'f(\eta)\big)/T_{RF}\\[2pt] \sum_{t=T_{RF}+1}^{T}Z_t\big(y_t - W_t'f(\eta)\big)/(T-T_{RF})\end{pmatrix}\quad\text{and}\quad \hat N_u^a \xrightarrow{P}\mathrm{AVar}\big(\sqrt T\,\bar g_T(\eta^0)\big).
\]
Following the computations done in the GMM case above, we can now introduce AR's key quantities in the B-GMM case:
\[
\Psi_T(\eta) = \sqrt T\Big(\bar g_T(\eta) - \frac{\Lambda_T}{\sqrt T}\rho(\eta)\Big)\quad\text{with}\quad \Lambda_T = \Big[\sqrt T\,I_{2q}\ \vdots\ \frac{\sqrt T}{r_T}\,I_{2q}\Big],\quad \rho(\eta) = \begin{pmatrix}m_1(\eta)\\ m_2(\eta)\end{pmatrix},
\]
and
\[
m_1(\eta) = \begin{pmatrix}\sum_{1\lambda^0}E(Z_tZ_{1t}')/T_{RF}\\[2pt] \sum_{t=1+T_{RF}}^{T}E(Z_tZ_{1t}')/(T-T_{RF})\end{pmatrix}\big(f_1(\eta)-f_1(\eta^0)\big),
\]
\[
m_2(\eta) = \begin{cases}
\begin{pmatrix}\sum_{1\lambda^0}E(Z_tZ_t')\Pi_1/T_{RF}\\[2pt] \sum_{t=1+T_{RF}}^{T}E(Z_tZ_t')\Pi_2/(T-T_{RF})\end{pmatrix}\big(f_2(\eta)-f_2(\eta^0)\big) & \text{when } \alpha_1 = \alpha_2,\\[10pt]
\begin{pmatrix}\sum_{1\lambda^0}E(Z_tZ_t')\Pi_1/T_{RF}\\ 0\end{pmatrix}\big(f_2(\eta)-f_2(\eta^0)\big) + o(T^{\alpha_1-\alpha_2}) & \text{when } \alpha_1 < \alpha_2,\\[10pt]
\begin{pmatrix}0\\ \frac{1}{T-T_{RF}}\sum_{t=1+T_{RF}}^{T}E(Z_tZ_t')\Pi_2\end{pmatrix}\big(f_2(\eta)-f_2(\eta^0)\big) + o(T^{\alpha_2-\alpha_1}) & \text{when } \alpha_1 > \alpha_2.
\end{cases}
\]
The regularity assumptions A1-A4 maintained in the present paper ensure that AR's Assumptions 1 and 2 hold for the newly defined $\Psi_T(\eta)$ and $\rho(\eta)$, and the consistency of $\hat\eta_{B\text{-}GMM}(T_{RF})$ follows.
(ii) Asymptotic theory of $\hat\eta_{B\text{-}GMM}$ and $\hat\eta_{GMM}$:

We now derive the asymptotic distribution of $\hat\eta_{B\text{-}GMM}$. We start with a mean-value expansion of the moment conditions $\bar g_T$ around $\eta^0$,
\[
\bar g_T(\hat\eta_{B\text{-}GMM}) = \bar g_T(\eta^0) + \frac{\partial\bar g_T(\bar\eta)}{\partial\eta'}(\hat\eta_{B\text{-}GMM} - \eta^0),
\]
with $\bar\eta$ between $\hat\eta_{B\text{-}GMM}$ and $\eta^0$. We substitute it into the first-order conditions,
\[
\frac{\partial\bar g_T'(\hat\eta_{B\text{-}GMM})}{\partial\eta}(\hat N_u^a)^{-1}\bar g_T(\hat\eta_{B\text{-}GMM}) = 0,
\]
to obtain
\begin{align}
&\frac{\partial\bar g_T'(\hat\eta_{B\text{-}GMM})}{\partial\eta}(\hat N_u^a)^{-1}\Big[\bar g_T(\eta^0) + \frac{\partial\bar g_T(\bar\eta)}{\partial\eta'}(\hat\eta_{B\text{-}GMM}-\eta^0)\Big] = 0\nonumber\\
\Rightarrow\ &\Big[\frac{\partial\bar g_T'(\hat\eta_{B\text{-}GMM})}{\partial\eta}(\hat N_u^a)^{-1}\frac{\partial\bar g_T(\bar\eta)}{\partial\eta'}\Big](\hat\eta_{B\text{-}GMM}-\eta^0) = -\frac{\partial\bar g_T'(\hat\eta_{B\text{-}GMM})}{\partial\eta}(\hat N_u^a)^{-1}\bar g_T(\eta^0)\nonumber\\
\Rightarrow\ &\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}\big[W'\tilde Z(\hat N_u^a)^{-1}\tilde Z'W\big]\frac{\partial f(\bar\eta)}{\partial\eta'}(\hat\eta_{B\text{-}GMM}-\eta^0) = -\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}W'\tilde Z(\hat N_u^a)^{-1}\tilde Z'U,\tag{C.3}
\end{align}
since
\[
\frac{\partial\bar g_T(\eta)}{\partial\eta'} = \begin{bmatrix}\sum_{t=1}^{T_{RF}}Z_tW_t'/T_{RF}\\[2pt] \sum_{t=T_{RF}+1}^{T}Z_tW_t'/(T-T_{RF})\end{bmatrix}\frac{\partial f(\eta)}{\partial\eta'} = \tilde Z'W\frac{\partial f(\eta)}{\partial\eta'},
\]
where we use $\tilde Z$ as introduced in the consistency proof (i) above. We use (B.12) and the following,
\begin{equation}
R_TW'\tilde Z = K + o_P(1)\quad\text{with}\quad R_T = \begin{pmatrix}I_{p_1} & 0\\ 0 & r_T I_{p_2}\end{pmatrix}\quad\text{and}\quad \sqrt T\,\tilde Z'U \xrightarrow{D}\mathcal{N}(0, N_u^a),\tag{C.4}
\end{equation}
to rewrite (C.3) as:
\begin{align}
&\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}R_T^{-1}\big[R_TW'\tilde Z(\hat N_u^a)^{-1}\tilde Z'WR_T\big]R_T^{-1}\frac{\partial f(\bar\eta)}{\partial\eta'}\,\sqrt T(\hat\eta_{B\text{-}GMM}-\eta^0)\tag{C.5}\\
&\qquad = -\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}R_T^{-1}R_TW'\tilde Z(\hat N_u^a)^{-1}\sqrt T\,\tilde Z'U.\tag{C.6}
\end{align}
Recall that the matrix $R_T$ is related to the identification strength of the moment conditions, and as such to the rates of convergence of the estimators of $f(\eta^0)$ (that is, of $\theta^0$). In order to conclude, we need to introduce another rescaling matrix, which plays a role similar to $R_T$ but is tied to the identification strength of the estimators of $\eta^0$, as explained in AR. The general result concerning the existence of such a rescaling matrix follows from Antoine and Renault (2012). However, it is not straightforward to obtain such a rescaling matrix explicitly in the general case. In what follows, we work out two special cases of interest to gain some intuition: (a) the special case of strong identification, where such a rescaling matrix is not needed; (b) the mixed identification-strength case that is plausible for our application in section 7.4.
(a) Case of strong identification, where $R_T = I_p$:

Starting from (C.6) after replacing $R_T$ by the identity matrix, we get:
\begin{equation}
\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}\big[W'\tilde Z(\hat N_u^a)^{-1}\tilde Z'W\big]\frac{\partial f(\bar\eta)}{\partial\eta'}\,\sqrt T(\hat\eta_{B\text{-}GMM}-\eta^0) = -\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}W'\tilde Z(\hat N_u^a)^{-1}\sqrt T\,\tilde Z'U.\tag{C.7}
\end{equation}
Using (C.4) and the consistency of $\hat\eta_{B\text{-}GMM}$, we have:
\[
\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}\big[W'\tilde Z(\hat N_u^a)^{-1}\tilde Z'W\big]\frac{\partial f(\bar\eta)}{\partial\eta'} \xrightarrow{P} \mathcal{K}(N_u^a)^{-1}\mathcal{K}'\quad\text{with}\quad \mathcal{K} = \frac{\partial f'(\eta^0)}{\partial\eta}K,
\]
and $\mathcal{K}(N_u^a)^{-1}\mathcal{K}'$ an invertible matrix of size $p_\eta$. Then, it follows that
\[
\sqrt T(\hat\eta_{B\text{-}GMM}-\eta^0)\xrightarrow{D}\mathcal{N}(0, V_{B\text{-}GMM})
\]
with
\[
V_{B\text{-}GMM} = \big[\mathcal{K}(N_u^a)^{-1}\mathcal{K}'\big]^{-1} = \big[\bar\Pi_1^{a\prime}M_1(N_{u,1})^{-1}M_1'\bar\Pi_1^a + \bar\Pi_2^{a\prime}M_2(N_{u,2})^{-1}M_2'\bar\Pi_2^a\big]^{-1},
\]
where $\bar\Pi_i^a = \Pi_i^a\,\partial f(\eta^0)/\partial\eta'$ and the $\Pi_i^a$ ($i=1,2$) are defined in Theorem 1. Similarly, we can show that:
\[
\sqrt T(\hat\eta_{GMM}-\eta^0)\xrightarrow{D}\mathcal{N}(0, V_{GMM})
\]
with
\[
V_{GMM} = \big[(M_1'\bar\Pi_1^a + M_2'\bar\Pi_2^a)'(N_u)^{-1}(M_1'\bar\Pi_1^a + M_2'\bar\Pi_2^a)\big]^{-1}.
\]
(b) Model in section 7.4 with mixed identification strengths, where the coefficients of the endogenous variables may be weakly identified:

Let us first recall the linear model considered in section 7, where the main equation is:
\[
\pi_t = \alpha_c + \alpha_{b,1}\pi_{t-1} + \alpha_{b,2}\pi_{t-2} + \alpha_{b,3}\pi_{t-3} + \alpha_f\pi_t^e + \alpha_y y_t + u_t = Z_{1,t}'\theta_{z_1} + X_t'\theta_x + u_t,
\]
with $\theta_{z_1} = [\alpha_c,\alpha_{b,1},\alpha_{b,2},\alpha_{b,3}]'$, $\theta_x = [\alpha_f,\alpha_y]'$, $Z_{1t} = [1,\pi_{t-1},\pi_{t-2},\pi_{t-3}]'$, $X_t = [\pi_t^e, y_t]'$, $p_1 = 4$, and $p_2 = 2$. In section 7.3, we estimate the slope parameters $\theta_{z_1}$ and $\theta_x$: the model is linear in the parameters. Direct applications of Theorems 1 and 2 deliver their B-GMM estimators with associated convergence rates given by $\sqrt T\,R_T^{-1}$ (that is, $\sqrt T$ and $\sqrt T/r_T$, respectively), where the underlying matrix $R_T$ is
\[
R_T = \begin{pmatrix}I_4 & 0\\ 0 & r_T I_2\end{pmatrix}\quad\text{with some } r_T = O(T^{\alpha}),\ 0 \le \alpha < 1/2.
\]
In section 7.4, we are not interested in estimating the above slope parameters, but rather the deep structural parameters derived from the underlying macroeconomic model. The main equation writes instead
\[
\pi_t = \psi_c + \Big(\frac{\rho_1-\rho_2}{1+\rho_1}\Big)\pi_{t-1} + \Big(\frac{\rho_2-\rho_3}{1+\rho_1}\Big)\pi_{t-2} + \Big(\frac{\rho_3}{1+\rho_1}\Big)\pi_{t-3} + \Big(\frac{1}{1+\rho_1}\Big)\pi_t^e + \frac{3(1-\theta)^2}{\theta(1+\rho_1)}\,y_t + \epsilon_t.
\]
The vector of unknown parameters of interest is
\[
\eta = [\psi_c,\ \rho_1,\ \rho_2,\ \rho_3,\ \theta]',
\]
while the above slope parameters are given by
\[
f(\eta) = \Big[\psi_c,\ \frac{\rho_1-\rho_2}{1+\rho_1},\ \frac{\rho_2-\rho_3}{1+\rho_1},\ \frac{\rho_3}{1+\rho_1},\ \frac{1}{1+\rho_1},\ \frac{3(1-\theta)^2}{\theta(1+\rho_1)}\Big]'.
\]
One can show that $\partial f(\eta^0)/\partial\eta'$ is full column rank:
\[
\frac{\partial f(\eta)}{\partial\eta'} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0\\
0 & \frac{1+\rho_2}{(1+\rho_1)^2} & \frac{-1}{1+\rho_1} & 0 & 0\\
0 & \frac{\rho_3-\rho_2}{(1+\rho_1)^2} & \frac{1}{1+\rho_1} & \frac{-1}{1+\rho_1} & 0\\
0 & \frac{-\rho_3}{(1+\rho_1)^2} & 0 & \frac{1}{1+\rho_1} & 0\\
0 & \frac{-1}{(1+\rho_1)^2} & 0 & 0 & 0\\
0 & \frac{-3(1-\theta)^2}{\theta(1+\rho_1)^2} & 0 & 0 & \frac{3(\theta^2-1)}{\theta^2(1+\rho_1)}
\end{pmatrix}.
\]
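The full-column-rank claim for $\partial f(\eta)/\partial\eta'$ can be spot-checked numerically by finite-differencing the map $f$; a small sketch (the chosen $\eta$ values are arbitrary illustrations of an admissible parameter point):

```python
import numpy as np

def f(eta):
    """Slope map theta = f(eta) from the structural NKPC parameters."""
    psi_c, r1, r2, r3, th = eta
    d = 1.0 + r1
    return np.array([psi_c, (r1 - r2) / d, (r2 - r3) / d,
                     r3 / d, 1.0 / d, 3.0 * (1.0 - th) ** 2 / (th * d)])

def jacobian(func, x, h=1e-6):
    """Central finite-difference Jacobian of func at x."""
    J = np.empty((len(func(x)), len(x)))
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (func(x + e) - func(x - e)) / (2 * h)
    return J

eta0 = np.array([0.1, 0.4, 0.2, 0.1, 0.6])   # arbitrary admissible values
J = jacobian(f, eta0)
assert J.shape == (6, 5)
assert np.linalg.matrix_rank(J) == 5          # full column rank
```

The numerical Jacobian also provides a quick cross-check of the analytical derivative entries above.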
The first four components of $f(\eta)$ are coefficients on the exogenous regressors, are strongly identified, and only depend on four unknown parameters: $\psi_c$, $\rho_1$, $\rho_2$, and $\rho_3$. Necessarily, $\psi_c$, $\rho_1$, $\rho_2$, and $\rho_3$ are strongly identified, while $\theta$ may not be. This suggests the following rescaling matrix:
\[
R_T^\eta = \begin{pmatrix}I_4 & 0\\ 0 & r_T\end{pmatrix}.
\]
We can now rewrite (C.6) as follows:
\begin{align}
&R_T^\eta\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}R_T^{-1}\big[R_TW'\tilde Z(\hat N_u^a)^{-1}\tilde Z'WR_T\big]R_T^{-1}\frac{\partial f(\bar\eta)}{\partial\eta'}R_T^\eta\ \sqrt T\,(R_T^\eta)^{-1}(\hat\eta_{B\text{-}GMM}-\eta^0)\nonumber\\
&\qquad = -R_T^\eta\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}R_T^{-1}R_TW'\tilde Z(\hat N_u^a)^{-1}\sqrt T\,\tilde Z'U.\tag{C.8}
\end{align}
To conclude, we simply need to show that
\[
R_T^{-1}\frac{\partial f(\bar\eta)}{\partial\eta'}R_T^\eta
\]
converges in probability to a full column rank matrix when $T$ goes to infinity. Direct computation, along with the consistency of $\bar\eta$, yields:
\[
\operatorname{Plim}\Big[R_T^{-1}\frac{\partial f(\bar\eta)}{\partial\eta'}R_T^\eta\Big] = \operatorname{Plim}\begin{pmatrix}
1 & 0 & 0 & 0 & 0\\
0 & \frac{1+\rho_2}{(1+\rho_1)^2} & \frac{-1}{1+\rho_1} & 0 & 0\\
0 & \frac{\rho_3-\rho_2}{(1+\rho_1)^2} & \frac{1}{1+\rho_1} & \frac{-1}{1+\rho_1} & 0\\
0 & \frac{-\rho_3}{(1+\rho_1)^2} & 0 & \frac{1}{1+\rho_1} & 0\\
0 & \frac{-1}{r_T(1+\rho_1)^2} & 0 & 0 & 0\\
0 & \frac{-3(1-\theta)^2}{r_T\theta(1+\rho_1)^2} & 0 & 0 & \frac{3(\theta^2-1)}{\theta^2(1+\rho_1)}
\end{pmatrix} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0\\
0 & \frac{1+\rho_2}{(1+\rho_1)^2} & \frac{-1}{1+\rho_1} & 0 & 0\\
0 & \frac{\rho_3-\rho_2}{(1+\rho_1)^2} & \frac{1}{1+\rho_1} & \frac{-1}{1+\rho_1} & 0\\
0 & \frac{-\rho_3}{(1+\rho_1)^2} & 0 & \frac{1}{1+\rho_1} & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & \frac{3(\theta^2-1)}{\theta^2(1+\rho_1)}
\end{pmatrix} \equiv F(\eta^0),
\]
which is full column rank. Similarly, $\operatorname{Plim}\big[R_T^{-1}\big(\partial f(\hat\eta_{B\text{-}GMM})/\partial\eta'\big)R_T^\eta\big] = F(\eta^0)$. The asymptotic distribution then directly follows:
\[
\sqrt T\,(R_T^\eta)^{-1}(\hat\eta_{B\text{-}GMM}-\eta^0)\xrightarrow{D}\mathcal{N}(0, V_{B\text{-}GMM})
\]
with
\[
V_{B\text{-}GMM} = \big[F'(\eta^0)K(N_u^a)^{-1}K'F(\eta^0)\big]^{-1} = \big\{F'(\eta^0)\big[\Pi_1^{a\prime}M_1(N_{u,1})^{-1}M_1'\Pi_1^a + \Pi_2^{a\prime}M_2(N_{u,2})^{-1}M_2'\Pi_2^a\big]F(\eta^0)\big\}^{-1}.
\]
Similarly, we can show that:
\[
\sqrt T\,(R_T^\eta)^{-1}(\hat\eta_{GMM}-\eta^0)\xrightarrow{D}\mathcal{N}(0, V_{GMM})
\]
with
\[
V_{GMM} = \big[F'(\eta^0)(M_1'\Pi_1^a + M_2'\Pi_2^a)'(N_u)^{-1}(M_1'\Pi_1^a + M_2'\Pi_2^a)F(\eta^0)\big]^{-1}.
\]
The asymptotic variance-covariance matrices of B-GMM and GMM are very similar to the
formulas obtained for the linear model. It is then straightforward to show that
VB−GMM ≤ VGMM
by following the proof of Theorem 2.