Efficient Estimation with Time-Varying Information and
the New Keynesian Phillips Curve∗
Bertille Antoine
Simon Fraser University
Bertille [email protected]
Otilia Boldea
Tilburg University
August 8, 2017
Abstract
Decades of empirical evidence suggest that many macroeconometric and financial
models are subject to both instability and identification problems. In this paper, we
address both issues under the unified framework of time-varying information, which in-
cludes changes in instrument strength, changes in the second moment of instruments,
and changes in the variance of moment conditions. We develop a new estimation method
that exploits these changes to increase the efficiency of the estimates of the (stable) struc-
tural parameters. We estimate a New Keynesian Phillips Curve and obtain more precise
estimates of the price indexation and output gap parameters than standard methods. An
extensive simulation study shows that our method delivers substantial efficiency gains
in finite samples.
Keywords: GMM; Weak instruments; Break-point; Change in identification strength.
JEL classification: C13, C22, C26, C36, C51.
∗An earlier version of this paper was circulated as "Efficient Inference with Time-Varying Identification Strength". We thank
Lars Peter Hansen, Stephane Bonhomme, eight anonymous referees, Jaap Abbring, Jorg Breitung, Bin Chen, Carolina Caetano,
Jeff Campbell, Xu Cheng, Valentina Corradi, Frank Diebold, Herman van Dijk, Dennis Fok, Alastair Hall, Lynda Khalaf, Maral
Kichian, Frank Kleibergen, Sophocles Mavroeidis, Adam McCloskey, Nour Meddahi, Ulrich Muller, Serena Ng, Eric Renault,
Bernard Salanie, Frank Schorfheide, Burak Uras, Bas Werker, seminar participants at Boston College, Brown U, Columbia U,
Emory, Erasmus U Rotterdam, Georgia State U, Guelph U, LSE, Rochester U, TI Amsterdam, Tilburg U, TSE, U Ottawa, U
Penn, U Washington, Western U, and at the conferences: EC2 (Maastricht, 2012), CESG (Kingston, 2012), ESEM (Malaga, 2012;
Goteborg, 2013), NESG (Amsterdam, 2013), CIREQ (Montreal, 2014) for their helpful comments. Otilia Boldea acknowledges
NWO VENI grant 415-11-001, and the hospitality of the UPenn Economics Department, where part of this research was conducted.
1 Introduction
Magnusson and Mavroeidis (2014) point out that time variation in the data generating process
can be used to improve inference on stable structural parameters in time series models. They
exploit this time variation in the presence of arbitrarily weak identification. Since no point
estimator is fully robust to weak identification, they focus on hypothesis testing. However,
practitioners are often interested in point estimators as well.
In this paper, we exploit time variation in the data generating process to develop more
efficient point estimators of the stable structural parameters in time series models. Our main
identification assumption is that the information about the structural parameters grows with
the sample size in a manner that allows consistent estimation of the model by GMM methods.1
In other words, our estimation results rely on non-weak identification in a subsample of the
data. While such a restriction is not imposed by Magnusson and Mavroeidis (2014), it allows
us to efficiently conduct subvector inference.
We focus on linear models and on time variation in the probability limit of the second
derivative of the GMM minimand (which we call limiting Hessian), because the limiting
Hessian summarizes the "amount of information about the parameters (of the model) that is
contained in the observations of the sample" (Davidson and MacKinnon (2004)). We exploit
three types of time variation in the limiting Hessian: breaks in the reduced form parameters2,
breaks in the second moment of the instruments, and breaks in the variance of the moment
conditions.
Breaks in the reduced form parameters are strongly motivated by the work of Hall, Han and
Boldea (2012) and of Magnusson and Mavroeidis (2014). What is new in this paper is that we
link changes in the reduced form parameters to potential changes in instrument strength over
the sample. Allowing for changes in instrument strength means that the instruments need
not be strong over the entire sample, but that they can be weak or even uninformative over
subsamples. As long as the instruments are not weak in two adjacent subsamples so that all
break-points can be consistently estimated, we provide more efficient point estimators than
currently available.
Breaks in the second moment of instruments are motivated by events such as the Great
Moderation, which caused a variance decline in many macroeconomic variables - see Stock and
Watson (2002). Breaks in the variance of the moment conditions are motivated by financial
crises, which cause a surge in the variance of structural shocks - see Rigobon (2003).
We propose a new estimator, Break-GMM (B-GMM henceforth), that uses these three types
of breaks to compute and stack split-sample moments, reweighing these moments differently
over different subsamples according to the time variation in their variance. We show that
1 This does not mean that the identification has to be sufficiently strong over the entire sample. The identification is allowed to be weak over part of the sample, as we explain in the following paragraphs.
2 For lack of better terminology, we refer to the projection of regressors on instruments as the reduced form.
B-GMM is more efficient than the standard full sample GMM. We show that this holds even
when a subsample is completely unidentified.
Because we allow for weaker identification patterns in the data along with changes in
identification strength, our paper contributes to the weak-identification literature: see the
surveys by Stock, Wright, and Yogo (2002), Andrews and Stock (2005), and Hansen, Hausman,
and Newey (2008). It also contributes to the break-point literature, since our results generalize
the estimation methods in Bai and Perron (1998) and Hall, Han and Boldea (2012).
In our simulation study, we first consider a model with one reduced form break. We show
that our estimator for the slope parameter has the smallest RMSE irrespective of the location
of the break. Then we consider a model with no reduced form break, and we find that the
standard deviation of B-GMM is still smaller than that of the GMM.3
We illustrate the use of B-GMM in our empirical analysis, where we estimate a monthly New
Keynesian Phillips curve (NKPC) model with multiple lags which we derive from a structural
new Keynesian model with general inflation indexation. We apply the model to a monthly US
dataset which features the prominent Great Moderation and oil price collapse breaks, and find
that the B-GMM procedure delivers more reliable estimates than standard GMM methods.
We find strong evidence for partial rather than full price indexation. Our estimates also
suggest that prices are re-optimized every three months, close to the price rigidity estimates
at the micro-level - see Klenow and Kryvtsov (2008).
The paper is organized as follows. Section 2 defines the three types of changes and motivates
them. Section 3 contains the main contribution of the paper: it introduces the B-GMM
estimator and the associated efficiency results relative to standard GMM methods. Section
4 analyzes three break-point estimators: the Bai and Perron (1998) univariate estimator, the
Qu and Perron (2007) multivariate estimator, as well as a new multivariate estimator which
is better suited for our framework. We also show that these break-point estimators4 for all
three types of changes are consistent at a fast enough rate so that the asymptotic distribution
of the B-GMM estimators is unaffected by the break-point estimation error. Section 5 derives
the asymptotic distribution of the break-point estimators in the absence of a break and for
all three types of changes. We also show that the B-GMM estimator is asymptotically equiv-
alent to the GMM estimator in the absence of a break and under a reasonable homogeneity
assumption for the moments of the data. Section 6 contains the simulation results. Section
7 applies our procedure to the NKPC model and section 8 concludes. The Appendix is orga-
nized as follows: Appendix A contains all the tables and graphs, Appendix B the proofs of all
theorems, and Appendix C contains a generalization of our framework to nonlinear models.
The supplemental appendix contains a general characterization of identification strength, and
a derivation of the NKPC model used in the empirical analysis.
3 They should be asymptotically equivalent, as shown in section 5 of the paper.
4 We do not consider the multivariate Qu and Perron (2007) estimator for all types of changes, since it is not well-suited for our framework, as explained in section 4.
2 Framework and Examples
Consider the standard linear regression model with p1 exogenous variables Z1t, p2 endogenous
variables Xt, p = p1 + p2 parameters of interest θ0, q valid instruments Zt (q ≥ p) that include
the constant and Z1t, and a full column rank matrix Π∗ of size (q, p2):

$$ y_t = W_t'\theta^0 + u_t \,, \quad \text{with } W_t = [Z_{1t}' \; X_t']' \text{ and } \theta^0 = [\theta_{z_1}^{0\prime} \; \theta_x^{0\prime}]', \tag{2.1} $$

$$ X_t' = Z_t'\Pi^* + v_t' \,. \tag{2.2} $$
For lack of better terminology, we call (2.1) the structural equation (or structural form)
and (2.2) the reduced form5 (henceforth RF). In this paper, we focus on efficient estimation
of θ0. The standard optimal full sample GMM estimator $\hat\theta_{GMM}$ minimizes

$$ L_{GMM}(\theta) = g_T'(\theta)\,\hat N_u^{-1}\,g_T(\theta) \quad \text{with} \quad g_T(\theta) = T^{-1}\sum_{t=1}^{T} g_t(\theta), \tag{2.3} $$

$$ g_t(\theta) = Z_t(y_t - W_t'\theta), \quad \text{and} \quad \hat N_u \xrightarrow{P} N_u = \lim_T \left[\mathrm{Var}\!\left(\sqrt{T}\,g_T(\theta^0)\right)\right] \equiv \mathrm{AVar}\!\left(\sqrt{T}\,g_T(\theta^0)\right). $$
The standard optimality and efficiency results for θGMM rely on the stationarity of Wt and
Zt and thus, implicitly, on the assumption that the probability limit of the Hessian of the
GMM minimand LGMM does not change over the sample - see e.g. Hall (2005), Theorem 3.4
and Assumption 3.1. In this paper, we show that when this probability limit changes over the
sample, the associated change-points can be used to construct more moment conditions and
more efficient estimators than θGMM .
As mentioned in the introduction, we refer to the probability limit of the Hessian of the
GMM minimand as the limiting Hessian; this is, under regularity conditions,
$$ \operatorname{Plim}_T \left[ \frac{\partial^2 L_{GMM}(\theta^0)}{\partial\theta\,\partial\theta'} \right] = 2 \left( \operatorname{Plim}_T T^{-1}\sum_{t=1}^{T} W_t Z_t' \right) N_u^{-1} \left( \operatorname{Plim}_T T^{-1}\sum_{t=1}^{T} Z_t W_t' \right) $$

$$ = 2 \begin{bmatrix} E(Z_{1t} Z_t') \\ \Pi^{*\prime} Q \end{bmatrix} N_u^{-1} \begin{bmatrix} E(Z_t Z_{1t}') & Q\,\Pi^* \end{bmatrix}, \tag{2.4} $$

where we assumed E(ZtZt') = Q. When Π∗, Q, or Nu change over the sample, the limiting
Hessian changes in general, and the associated change-points (or breaks) can be interacted
with the instruments to construct more moment conditions and more efficient GMM
estimators6: Π∗ quantifies the instrument strength, Q is related to the instrument variance, and Nu
is related to the variance of the structural error ut. We now motivate each type of break.
5 Both equations can, but do not need to, originate from a structural model. In our empirical analysis, (2.1) originates from a structural model, but the reduced forms (2.2) are the projections of regressors on instruments. The reduced forms should be viewed, throughout the paper, as projections of endogenous regressors on instruments and not be confused with the full reduced form of a structural model.
6 Changes in the limiting Hessian of the objective function can be used to construct more efficient estimators not only for GMM, but for a wide range of estimation methods, linear or nonlinear: least-squares, maximum likelihood, minimum-distance, etc. Analyzing other estimators is beyond the scope of our paper.
• Changes in reduced form (RF) parameters. Suppose that the reduced form (2.2)
has a break-point at TRF . We consider the following generalized RF that allows for a change
in the identification strength of the instruments:
$$ X_t' = \begin{cases} Z_t'\,\dfrac{\Pi_1}{T^{\alpha_1}} + v_t', & t \le T_{RF} \\[6pt] Z_t'\,\dfrac{\Pi_2}{T^{\alpha_2}} + v_t', & t > T_{RF} \end{cases} \qquad T_{RF} = [T\lambda_{RF}], \tag{2.5} $$

with λRF the RF break fraction, 0 ≤ αi ≤ 1/2, and Πi two (fixed) matrices of size (q, p2)
for i = 1, 2. Over each stable subsample, all the instruments have the same (unknown)
identification strength, characterized by the (drifting) sequence7 T^{αi}: instruments are strong
over subsample i when αi = 0, and weak when αi = 1/2. In other words, the larger αi, the
weaker the associated subsample. In this setting, the break-point TRF may capture two types
of changes: (1) a parameter break with stable identification strength, α1 = α2 and Π1 ≠ Π2,
or (2) a change in identification strength, α1 ≠ α2.
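For intuition, the drifting-strength reduced form (2.5) is straightforward to simulate. In the sketch below (an illustrative design of our own), the first subsample is weak (α1 = 1/2), the second is strong (α2 = 0), and the first-stage fit collapses only over the weak subsample:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000
T_rf = T // 2                      # break date [T * lambda_RF] with lambda_RF = 1/2
alpha1, alpha2 = 0.5, 0.0          # weak before the break, strong after
Pi = np.array([1.0, 1.0])

Z = rng.normal(size=(T, 2))
v = rng.normal(size=T)
x = np.empty(T)
x[:T_rf] = Z[:T_rf] @ (Pi / T**alpha1) + v[:T_rf]   # coefficients shrink at rate T^{-1/2}
x[T_rf:] = Z[T_rf:] @ (Pi / T**alpha2) + v[T_rf:]   # fixed, full-strength coefficients

def first_stage_r2(Zs, xs):
    """OLS R^2 of the reduced form on a subsample, a crude strength measure."""
    coef, *_ = np.linalg.lstsq(Zs, xs, rcond=None)
    resid = xs - Zs @ coef
    return 1.0 - resid.var() / xs.var()

r2_weak = first_stage_r2(Z[:T_rf], x[:T_rf])       # near zero
r2_strong = first_stage_r2(Z[T_rf:], x[T_rf:])     # bounded away from zero
```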
In our empirical analysis, we estimate a NKPC model - see equations (7.1)-(7.3). The
literature suggests there is evidence for a change in identification strength for NKPC, be-
cause, using similar instrument sets, some US NKPC studies find weak instruments for the
sample 1960-2007 (see Mavroeidis (2005), Dufour, Khalaf and Kichian (2006), Nason and
Smith (2008), Kleibergen and Mavroeidis (2009)), while others find strong instruments over
the sample 1969-2005 (see Zhang, Osborn and Kim (2008, 2009)). The results in Kleibergen
and Mavroeidis (2009, Table 4) indicate that this change in identification strength occurred
around the Great Moderation: their confidence sets for the NKPC parameters are much larger
for 1960-1983 than for 1984-2007, suggesting that identification is stronger in the latter period.
• Changes in the second moment of instruments (SMI). For many macroeconomic
variables, including output and inflation, whose lags are used as instruments in our NKPC
application in section 7, there is strong evidence that their variance has declined sharply after
the Great Moderation, see e.g. Stock and Watson (2002). A change in the variance of the
instruments implies in most cases8 a break in the SMI:
$$ E(Z_t Z_t') = Q_1, \;\; t \le T_{SMI} = [T\lambda_{SMI}] \qquad \text{and} \qquad E(Z_t Z_t') = Q_2, \;\; t > T_{SMI}. \tag{2.6} $$
As we prove in section 3, the SMI break-point TSMI can also deliver more efficient estima-
tors of θ0.
7 This framework is standard in the weak-identification literature (see e.g. Staiger and Stock (1997)). More general identification patterns, allowing the strength of identification to vary across instruments and directions of the parameter space, are discussed in the supplemental appendix.
8 The only case when this does not occur is when the decrease in the variance of instruments is equal to the increase in their expected value squared; in that case, the SMI stays the same. There is no empirical evidence that the Great Moderation break is of this type.
• Changes in the (long-run) variance of moment conditions (VMC). Setting λ ∈ (0, 1], we define changes in the VMC below:

$$ \mathrm{AVar}\!\left[ T^{-1/2}\sum_{t=1}^{[T\lambda]} g_t(\theta^0) \right] = \lambda N_{u,1}, \qquad [T\lambda] \le T_{VMC} = [T\lambda_{VMC}], \tag{2.7} $$

$$ \mathrm{AVar}\!\left[ T^{-1/2}\sum_{t=[T\lambda]+1}^{T} g_t(\theta^0) \right] = (1-\lambda) N_{u,2}, \qquad [T\lambda] > T_{VMC}. \tag{2.8} $$
Since the population moments are gt(θ0) = Ztut, the VMC break TVMC may occur because
of a SMI break or a break in the conditional variance of ut. In our empirical application in
section 7, the Great Moderation break is treated as both a SMI break and a VMC break.
Alternatively, asset return models often exhibit breaks in the conditional variance of their
structural error because the volatility of the structural shocks increases substantially in finan-
cial crises, see e.g. Rigobon (2003). We show in section 3 that VMC breaks can also deliver
more efficient estimators of the structural parameters.9
Therefore, the main contribution of this paper is to propose and analyze the Break-GMM
estimator (B-GMM hereafter), which exploits break-points T0 (with T0 ∈ {TRF, TSMI, TVMC})
in RF, SMI and VMC to increase the efficiency of the estimators of the structural parameters θ0,
while maintaining that these parameters do not change over the sample and are sufficiently
strongly identified. As mentioned before, the break T0 can be used to double the moment
conditions for θ0 in (2.1):
$$ E[Z_t u_t \mathbf{1}(t \le T^0)] = 0 \quad \text{and} \quad E[Z_t u_t \mathbf{1}(t > T^0)] = 0. $$
In section 3, we show that these moment conditions are not redundant in the presence of a
break, a result that is closely related to the non-redundancy of split-sample moments discussed
in Proposition 1 of Magnusson and Mavroeidis (2014). More specifically, we analyze the B-GMM
estimators that exploit a RF, a SMI or a VMC break, and we show that the B-GMM
estimators are more efficient than the full sample GMM estimators, and strictly more efficient
under some conditions.
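Concretely, doubling the moment conditions amounts to interacting the instruments with the break indicator, which also yields the stacked (T, 2q) instrument matrix used to compute B-GMM in section 3. A sketch with simulated data (the design and names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
T, q = 400, 2
T0 = 150                           # break date (taken as known here)
Z = rng.normal(size=(T, q))
u = rng.normal(size=T)             # structural error satisfying E[Z_t u_t] = 0

# Interacting instruments with the break indicator doubles the moments: Z_bar is (T, 2q)
pre = (np.arange(T) < T0)[:, None]
Z_bar = np.hstack([Z * pre, Z * (~pre)])

# Both blocks of sample moments are (approximately) zero at the true parameter
g_pre = Z[:T0].T @ u[:T0] / T0
g_post = Z[T0:].T @ u[T0:] / (T - T0)
```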
3 The B-GMM Estimator
3.1 B-GMM with a RF break
Consider the model with a RF break that combines (2.1) and (2.5):

$$ y_t = W_t'\theta^0 + u_t \,, \quad \text{with } W_t = [Z_{1t}' \; X_t']' \text{ and } \theta^0 = [\theta_{z_1}^{0\prime} \; \theta_x^{0\prime}]', $$

$$ X_t' = \begin{cases} Z_t'\,\dfrac{\Pi_1}{T^{\alpha_1}} + v_t', & t \le T_{RF} \\[6pt] Z_t'\,\dfrac{\Pi_2}{T^{\alpha_2}} + v_t', & t > T_{RF} \end{cases} \qquad T_{RF} = [T\lambda_{RF}], $$
9Our result holds whether the regressors are endogenous or not. A proof of this statement is available in
our companion paper Antoine and Boldea (2015).
where λRF is the break fraction, 0 ≤ αi ≤ 1/2, and Πi are two (fixed) matrices of size (q, p2) for
i = 1, 2. The main goal is to deliver efficient estimators of the structural parameters θ0. We
assume that the break TRF has occurred and is common to all RF equations. We do not know
its location, and in this section we impose the high-level assumption that it can be estimated
at a fast enough rate, given in Assumption A5 below. We call this estimator $\hat T_{RF}$, and in
section 4 we discuss both existing break-point estimators, such as the ones in Bai and Perron
(1998) and Qu and Perron (2007), and a new multivariate break-point estimator, all of which
achieve this rate.
We now introduce three estimators of θ0.
• The full sample GMM estimator minimizes the GMM criterion L_GMM defined in (2.3)
and ignores the RF break:

$$ \hat\theta_{GMM} = \left( W'Z\,\hat N_u^{-1}\,Z'W \right)^{-1} \left( W'Z\,\hat N_u^{-1}\,Z'y \right). $$
• The B-2SLS estimator uses first-stage predicted regressors $\hat W_t = [Z_{1t}' \; \hat X_t']'$, obtained
by interacting the instruments with the RF break estimate $\hat T_{RF} = [T\hat\lambda_{RF}]$. It is defined as
in Hall, Han, and Boldea (2012) (HHB henceforth):

$$ \hat\theta_{B\text{-}2SLS} = \left( \sum_{t=1}^{T} \hat W_t \hat W_t' \right)^{-1} \sum_{t=1}^{T} \hat W_t y_t, \qquad \hat X_t' = \begin{cases} Z_t'\,\hat\Pi_{1\hat\lambda_{RF}}, & t \le \hat T_{RF} \\[4pt] Z_t'\,\hat\Pi_{2\hat\lambda_{RF}}, & t > \hat T_{RF} \end{cases} $$

where, for i = 1, 2, $\hat\Pi_{i\hat\lambda_{RF}}$ are the OLS estimators of $\Pi_i/T^{\alpha_i}$ in (2.5) over the subsamples
before and after the estimated break $\hat T_{RF}$.
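The B-2SLS construction can be sketched numerically as follows (our own illustrative design: no exogenous regressors, one endogenous regressor, and the break date treated as given):

```python
import numpy as np

rng = np.random.default_rng(3)
T, q = 600, 3
T_rf = 300                          # (estimated) break date, taken as given here
theta0 = 0.5

Z = rng.normal(size=(T, q))
Pi1 = np.array([1.0, 0.0, 0.0])     # reduced form before the break
Pi2 = np.array([0.0, 1.0, 1.0])     # reduced form after the break
e = rng.normal(size=(T, 2))
u = e[:, 0]
v = 0.6 * e[:, 0] + e[:, 1]         # correlated errors induce endogeneity
x = np.where(np.arange(T) < T_rf, Z @ Pi1, Z @ Pi2) + v
y = theta0 * x + u

# First stage estimated separately on each subsample; predictions then stacked
Pi1_hat, *_ = np.linalg.lstsq(Z[:T_rf], x[:T_rf], rcond=None)
Pi2_hat, *_ = np.linalg.lstsq(Z[T_rf:], x[T_rf:], rcond=None)
x_hat = np.concatenate([Z[:T_rf] @ Pi1_hat, Z[T_rf:] @ Pi2_hat])

# Second stage: regress y on the stacked first-stage predictions
theta_b2sls = float((x_hat @ y) / (x_hat @ x_hat))
```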
• Our proposed B-GMM estimator also interacts the instruments with the RF break:

$$ \hat\theta_{B\text{-}GMM} = \arg\min_\theta \left[ g_T'(\theta)\,(\hat N_u^a)^{-1}\,g_T(\theta) \right], $$

$$ \text{with } g_T(\theta) = \begin{bmatrix} \sum_{t=1}^{\hat T_{RF}} Z_t(y_t - W_t'\theta)/\hat T_{RF} \\[4pt] \sum_{t=\hat T_{RF}+1}^{T} Z_t(y_t - W_t'\theta)/(T - \hat T_{RF}) \end{bmatrix} \quad \text{and} \quad \hat N_u^a \xrightarrow{P} \mathrm{AVar}\!\left( T^{1/2} g_T(\theta^0) \right). $$

Thus,

$$ \hat\theta_{B\text{-}GMM} = \left( W'\bar Z\,(\hat N_u^a)^{-1}\,\bar Z'W \right)^{-1} \left( W'\bar Z\,(\hat N_u^a)^{-1}\,\bar Z'y \right), \tag{3.1} $$

with $\bar Z$ the (T, 2q) matrix defined as

$$ \bar Z' = \begin{pmatrix} Z_1 & \cdots & Z_{\hat T_{RF}} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & Z_{\hat T_{RF}+1} & \cdots & Z_T \end{pmatrix}. $$

B-GMM is the optimal counterpart of B-2SLS, as it uses the optimal weighting matrix $\hat N_u^a$. But
note that B-2SLS is not a standard 2SLS estimator. In addition, the standard GMM is not a
version of B-GMM with a particular weighting matrix in place of $\hat N_u^a$.
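A numerical sketch of (3.1) with simulated data (our own illustrative design; the pilot 2SLS step and the heteroskedasticity-robust variance estimate below are one common feasible choice, not necessarily the paper's exact implementation):

```python
import numpy as np

rng = np.random.default_rng(4)
T, q = 800, 2
T_rf = 400                         # break date, treated as known/estimated
theta0 = 0.5

Z = rng.normal(size=(T, q))
e = rng.normal(size=(T, 2))
u = np.where(np.arange(T) < T_rf, 2.0, 0.5) * e[:, 0]   # error variance also breaks
v = 0.6 * e[:, 0] + e[:, 1]
Pi1, Pi2 = np.array([1.0, 0.5]), np.array([0.5, 1.0])   # RF break
x = np.where(np.arange(T) < T_rf, Z @ Pi1, Z @ Pi2) + v
y = theta0 * x + u
W = x[:, None]

# Interacted instrument matrix Z_bar as in (3.1): block structure by subsample
pre = (np.arange(T) < T_rf)[:, None]
Z_bar = np.hstack([Z * pre, Z * (~pre)])

# Pilot 2SLS with Z_bar, then a block-diagonal moment-variance estimate
coef, *_ = np.linalg.lstsq(Z_bar, x, rcond=None)
x_hat = Z_bar @ coef
theta_pilot = (x_hat @ y) / (x_hat @ x_hat)
u_hat = y - theta_pilot * x
N_hat = (Z_bar * (u_hat**2)[:, None]).T @ Z_bar / T     # block-diagonal by construction

# B-GMM closed form (3.1)
Ninv = np.linalg.inv(N_hat)
A = W.T @ Z_bar @ Ninv @ Z_bar.T @ W
b = W.T @ Z_bar @ Ninv @ Z_bar.T @ y
theta_bgmm = float(np.linalg.solve(A, b)[0])
```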
To derive the asymptotic properties of all estimators, we impose the following regularity
assumptions, in which $h_t$ is defined as $h_t \equiv (u_t, v_t')' \otimes Z_t$ with $i$-th element $h_{t,i}$, and $\sum_{1r} \equiv \sum_{t=1}^{[Tr]}$.
Assumption A 1. (Regularity of the break fraction, error terms and reduced form)
(i) 0 < λRF < 1, and the break-point TRF satisfies min(TRF, T − TRF) ≥ max(q, [εT]).
(ii) - The eigenvalues of $S = \mathrm{AVar}\big(T^{-1/2}\sum_{t=1}^{T} h_t\big)$ are O(1).
- $E(h_{t,i}) = 0$ and, for some d > 2, $\|h_{t,i}\|_d < \infty$ for t = 1, ..., T and i = 1, ..., (p2 + 1)q.
- $h_{t,i}$ is near-epoch dependent with respect to some process $\xi_t$: $\|h_t - E(h_t \mid \mathcal{G}_{t-m}^{t+m})\|_2 \le \ell_m$
with $\ell_m = O(m^{-1/2})$, where $\mathcal{G}_{t-m}^{t+m}$ is a σ-algebra based on $(\xi_{t-m}, \ldots, \xi_{t+m})$.
- $\xi_t$ is either φ-mixing of size $m^{-d/[2(d-1)]}$ or α-mixing of size $m^{-d/(d-2)}$.
A1 is common for the break-point literature, and is similar to HHB. Part (i) states that
the break-fraction is fixed and that there are enough observations to estimate the parameters
before and after the break-point. Part (ii) allows for general weak dependence in the data.
Assumption A 2. (Regularity of the identification strength)
(i) Let $r_{1T} = T^{\alpha_1}$, $r_{2T} = T^{\alpha_2}$, $\alpha = \min(\alpha_1, \alpha_2)$, and $r_T = T^{\alpha}$. We assume that α < 1/2, and
that when α = α1 = α2, then it holds that Π1 ≠ Π2.
(ii) For case (1), where α1 = α2 and Π1 ≠ Π2, we assume that Πi is of full rank p2 for at least
one i ∈ {1, 2}. For case (2), where αi < αj for i, j ∈ {1, 2}, i ≠ j, we assume that Πi is of full
rank p2.
Note that since the slowest sequence riT is associated with the subsample with the strongest
identification, the sequence rT corresponds to the strongest subsample. A2(i) allows for at
most one subsample to be weakly identified. Thus, when there is no change in identification
strength, the identification cannot be weak throughout. A2(ii) ensures that the moment
conditions are not redundant. Note that this assumption allows for Πj = 0 (meaning that
θ0 can also be unidentified over subsample j) as long as Πi is of full rank for subsample i
(i, j ∈ {1, 2}, i ≠ j).
Assumption A 3. (Regularity of the instrumental variables)
Let $\hat M_{1r} = T^{-1}\sum_{1r} Z_t Z_t'$. Then $\hat M_{1r} \xrightarrow{P} M_{1r}$, uniformly in r ∈ [0, 1], where we assume that
$M_{1r}$ is positive definite for all r ∈ (0, 1], and that $(M_{1r_1} - M_{1r_2})$ is positive definite for any
$r_1, r_2 \in [0, 1]$ such that $r_1 > r_2$.
Assumption A 4. (Regularity of the variances)

$$ \mathrm{AVar}\!\left[ T^{-1/2} \sum_{1r} h_t \right] = N_{1r} = \begin{pmatrix} N_{u,1r} & N_{uv,1r}' \\ N_{uv,1r} & N_{v,1r} \end{pmatrix}, $$

uniformly in r ∈ [0, 1], with $N_{u,1r}$, $N_{v,1r}$ of size (q, q), respectively (p2q, p2q). We assume
that $N_{1r}$ is positive definite for all r ∈ (0, 1], and that $N_{1r_1} - N_{1r_2}$ is positive definite for any
$r_1, r_2 \in [0, 1]$ such that $r_1 > r_2$.
Assumption A 5. (Convergence rate of the RF break-point estimator)
$\|\hat\lambda_{RF} - \lambda_{RF}\| = O_P(T^{2\alpha - 1})$.
A3 and A4 are typical for the break-point literature and are used for proving consistency of
the break-point estimators in section 4. In this section, they are just needed to define limiting
quantities that appear in the asymptotic distributions of GMM and B-GMM. Note that A3
allows for a SMI break that may, or may not, coincide with the RF break, and that A4 allows
for heteroskedasticity in the sample moments of the structural equation and the RF. It also
allows for a VMC break that may, or may not, coincide with the RF break. A5 is the high
level assumption on the break-fraction estimator that we verify in section 4.
The following theorem states the asymptotic properties of the estimators of the structural
parameters. Its proof also shows that A1-A4 are sufficient for consistent estimation of θ0 via
GMM, B-2SLS or B-GMM.
Theorem 1. (Asymptotic normality of $\hat\theta_{GMM}$, $\hat\theta_{B\text{-}2SLS}$ and $\hat\theta_{B\text{-}GMM}$)
Let $\Lambda_T = \mathrm{diag}(T^{1/2} I_{p_1}, T^{1/2-\alpha} I_{p_2})$. Under A1 to A4 for GMM, and under A1 to A5 for B-2SLS
and B-GMM, $\Lambda_T(\hat\theta_{GMM} - \theta^0)$, $\Lambda_T(\hat\theta_{B\text{-}2SLS} - \theta^0)$, and $\Lambda_T(\hat\theta_{B\text{-}GMM} - \theta^0)$ are asymptotically
normally distributed with mean 0 and asymptotic variances, respectively:

$$ V_{GMM} = \left[ (\Pi_1^{a\prime} M_1 + \Pi_2^{a\prime} M_2)\, N_u^{-1}\, (M_1 \Pi_1^{a} + M_2 \Pi_2^{a}) \right]^{-1} $$

$$ V_{B\text{-}2SLS} = \left[ \Pi_1^{a\prime} M_1 \Pi_1^{a} + \Pi_2^{a\prime} M_2 \Pi_2^{a} \right]^{-1} \left[ \Pi_1^{a\prime} N_{u,1} \Pi_1^{a} + \Pi_2^{a\prime} N_{u,2} \Pi_2^{a} \right] \left[ \Pi_1^{a\prime} M_1 \Pi_1^{a} + \Pi_2^{a\prime} M_2 \Pi_2^{a} \right]^{-1} $$

$$ V_{B\text{-}GMM} = \left[ \Pi_1^{a\prime} M_1 N_{u,1}^{-1} M_1 \Pi_1^{a} + \Pi_2^{a\prime} M_2 N_{u,2}^{-1} M_2 \Pi_2^{a} \right]^{-1}, $$

with $N_u = N_{u,1} + N_{u,2}$, $N_{u,1} = N_{u,1\lambda_{RF}}$, $N_{u,2} = N_{u,11} - N_{u,1\lambda_{RF}}$, $M_1 = M_{1\lambda_{RF}}$, $M_2 = M_{11} - M_{1\lambda_{RF}}$, and, for i, j = 1 or 2 and i ≠ j:

$$ \Pi_i^{a} = \begin{cases} (\Pi_{z_1}\ \Pi_i), & \alpha_i = \alpha_j \text{ or } \alpha_i < \alpha_j \\ (\Pi_{z_1}\ O_{(q,p_2)}), & \alpha_j < \alpha_i \end{cases} \quad \text{and} \quad \Pi_{z_1} = \begin{bmatrix} I_{p_1} \\ O_{(q-p_1,p_1)} \end{bmatrix}. $$
Comments:
(i) The asymptotic normality of θB−2SLS encompasses as a special case the results in HHB,
where α1 = α2 = 0.
(ii) The rates of convergence of the estimated parameters of the exogenous variables θ0_{z1} (standard
rate T^{1/2}) and of the estimated parameters of the endogenous variables θ0_x (slower rate T^{1/2−α})
extend the results developed by Antoine and Renault (2009) for stable reduced
forms (see e.g. their Theorem 4.1). The rate T^{1/2−α} comes from the strongest subsample and
holds even when the weakest subsample is genuinely weak, or even unidentified, as would be
the case if Πj = 0 for some j. As discussed in Antoine and Renault (2009), reliable inference
on θ0 can be obtained using standard GMM-type formulas without having to know or estimate
the matrix ΛT.
(iii) To our knowledge, the consistency of both θGMM and θB−GMM - even when the weakest
subsample is genuinely weak (that is αj = 1/2) - is new. Hence, ignoring the break-point, as
in θGMM , does not lead to a loss of consistency, because the population moment condition in
A1(ii) still holds. However, using the RF break is crucial for efficiency as shown below.
In Theorem 2, we show that B-GMM is at least as efficient as GMM, and provide a necessary
and sufficient condition for its strict efficiency.
Theorem 2. (Efficiency of estimated structural parameters)
(i) Under A1 to A5, $V_{B\text{-}GMM} \le V_{GMM}$ and $V_{B\text{-}GMM} \le V_{B\text{-}2SLS}$.10
(ii) Under A1 to A5, $V_{B\text{-}GMM} = V_{GMM}$ if and only if $N_{u,1}^{-1} M_1 \Pi_1^{a} = N_{u,2}^{-1} M_2 \Pi_2^{a}$.
(iii) Under A1 to A5, $V_{B\text{-}GMM} < V_{GMM}$ if and only if

$$ \mathrm{rank}\!\left( N_{u,1}^{-1} M_1 \Pi_1^{a} - N_{u,2}^{-1} M_2 \Pi_2^{a} \right) = p. \tag{3.2} $$
Comments:
(i) Theorem 2 formalizes the result that when there are changes in the limiting Hessian due
to a RF break, using this break leads to more efficient estimators than the standard GMM.
Intuitively, more information can be extracted due to these changes. Also, because B-GMM
uses the optimal weighting matrix, it is more efficient than B-2SLS.
(ii) With a change in identification strength and no exogenous regressors (p = p2), $\hat\theta_{B\text{-}GMM}$
is strictly more efficient than $\hat\theta_{GMM}$. To see this, note that if $r_{iT} = o(r_{jT})$, then $\Pi_j^{a} = O_{(q,p)}$, so
$\mathrm{rank}(N_{u,1}^{-1} M_1 \Pi_1^{a} - N_{u,2}^{-1} M_2 \Pi_2^{a}) = \mathrm{rank}[(-1)^{i-1} N_{u,i}^{-1} M_i \Pi_i] = p$. Thus, $V_{B\text{-}GMM} < V_{GMM}$. This
strict efficiency result holds even if subsample j is weakly identified. Intuitively, when
computing B-GMM, the subsample moments are stacked and "multiplied by their strength".
As a result, the influence (variance) of the weak moments disappears in the limit. By contrast,
GMM adds the two moments to obtain the full sample moments. Therefore, the variances of
both the weak and the strong moments show up even asymptotically. Thus, GMM is strictly
less efficient than B-GMM.
(iii) If Πj = 0 for one subsample j ≠ i, Theorem 2 still holds. To understand the intuition
for this result, suppose there are no exogenous regressors. Then, from Theorem 1,
$V_{B\text{-}GMM} = (\Pi_i' M_i N_{u,i}^{-1} M_i \Pi_i)^{-1}$ and $V_{GMM} = (\Pi_i' M_i N_u^{-1} M_i \Pi_i)^{-1}$. Since $N_{u,i} < N_u$, it follows
that $V_{B\text{-}GMM} < V_{GMM}$.
(iv) If, in addition to Assumptions A1-A5, we also impose (full sample) conditional
homoskedasticity ($\mathrm{Var}(u_t \mid Z_t) = \Phi_u$ and $Q = E(Z_t Z_t')$), then $V_{B\text{-}GMM} = V_{B\text{-}2SLS} < V_{GMM}$, as
shown in the proof of Theorem 2.
(v) Besides A3 and A4, Theorem 2 assumes nothing about the SMI and the VMC. Thus,
the strict-efficiency result in Theorem 2(iii) holds when there is a break in SMI and/or VMC.
Below, we compare B-GMM with GMM in the absence of a SMI and a VMC break. The
following assumption facilitates this comparison.
Assumption A 6. (Homogeneity of the second moments)
(i) (no SMI break) $M_{1r} = rQ$, with $Q = M_{11} = E(Z_t Z_t')$;
(ii) (no VMC break) $N_{1r} = rN$, and $N_u = N_{u,11} = \lim_{T\to\infty} \mathrm{Var}\big( \sum_{t=1}^{T} Z_t u_t / \sqrt{T} \big)$.
10For two square matrices VA, VB, we write VA ≤ VB if VA − VB is negative semidefinite, and VA < VB if
VA − VB is negative definite.
A6 imposes no time variation in SMI and VMC. Thus, A6(i) does not allow for SMI breaks
and A6(ii) does not allow for VMC breaks.
Comments:
We define $B \equiv N_{u,1}^{-1} M_1 \Pi_1^{a} - N_{u,2}^{-1} M_2 \Pi_2^{a}$. Under A6, $B = N_u^{-1} Q (\Pi_1^{a} - \Pi_2^{a})$.
(i) Under A6 and in the absence of exogenous regressors, $B = N_u^{-1} Q (\Pi_1 - \Pi_2)$. Therefore,
with pure structural change, e.g. $\mathrm{rank}(\Pi_1 - \Pi_2) = p$, we have that $\mathrm{rank}(B) = p$. In this case,
$\hat\theta_{B\text{-}GMM}$ is always strictly more efficient than $\hat\theta_{GMM}$.
(ii) Under A6 and in the presence of exogenous regressors, $B = [O_{(q,p_1)},\ N_u^{-1} Q (\Pi_1 - \Pi_2)]$,
and $\mathrm{rank}(B) < p$. Thus, in general, $(V_{GMM} - V_{B\text{-}GMM})$ is only positive semi-definite: not all
linear combinations of the B-GMM estimators are strictly more efficient than the same linear
combinations of their GMM counterparts. However, with a change in identification strength,
the B-GMM estimates of the endogenous regressors' coefficients (θ0_x) are strictly more efficient than their
GMM counterparts, i.e. the (p2, p2) lower right block of $V_{GMM} - V_{B\text{-}GMM}$ is positive definite
(see Appendix B, proof of Theorem 2).
To summarize, when there is a RF break, B-GMM is more efficient and, in many cases,
strictly more efficient than GMM.
3.2 B-GMM with a SMI break or a VMC break
In this section, we show that B-GMM is still more efficient than GMM if a SMI or a VMC
break is used to construct B-GMM instead of a RF break. A SMI break is defined in (2.6),
and a VMC break is defined in (2.7)-(2.8). We assume below that the magnitude of these
breaks is fixed.11
Assumption A 7. (SMI or VMC break)
(i) SMI break: in (2.6), $Q_1 - Q_2 \ne O_{(q,q)}$.
(ii) VMC break: in (2.7) and (2.8), $N_{u,1} - N_{u,2} \ne O_{(q,q)}$.
We assume first that the RF has no break, and we discuss the extension to a RF break
at the end of Corollary 2. We also assume below that the SMI or the VMC break-fraction
$\lambda^0 \in \{\lambda_{SMI}, \lambda_{VMC}\}$ can be estimated consistently by some break-fraction estimator $\hat\lambda \in \{\hat\lambda_{SMI}, \hat\lambda_{VMC}\}$; we discuss such estimators in section 4.
Assumption A 8. (SMI and VMC break-point estimators)
$\hat\lambda - \lambda^0 = O_P(T^{-1})$.
11If these breaks were to shrink with the sample size instead, then they would not show up as changes
in the limiting Hessian of the GMM objective function, and would therefore not be of use in constructing
asymptotically more efficient estimators than the full-sample GMM.
Note that the above rate of consistency is faster than that of the estimated RF break
fraction; see also A5. The following corollary gives the asymptotic distribution of the GMM
estimators, and of the B-2SLS and B-GMM estimators constructed with $\hat\lambda$ replacing $\hat\lambda_{RF}$.
Corollary 1.
Let $\Lambda_T = \mathrm{diag}(T^{1/2} I_{p_1}, T^{1/2-\alpha} I_{p_2})$ as in Theorem 1. Let A1 hold with $\lambda_{RF}$ replaced by $\lambda^0$, and
A3, A4 hold with $\Pi_i = \Pi$ of full rank and with $\alpha = \alpha_i < 1/2$ for i = 1, 2. Also let A7(i) or
A7(ii) hold, and let A8 hold for B-2SLS and B-GMM. Then $\Lambda_T(\hat\theta_{B\text{-}2SLS} - \theta^0)$, $\Lambda_T(\hat\theta_{GMM} - \theta^0)$,
and $\Lambda_T(\hat\theta_{B\text{-}GMM} - \theta^0)$ are asymptotically normally distributed with mean 0 and asymptotic
variances:

$$ V_{GMM} = \left[ \Pi^{a\prime} (M_1 + M_2)\, N_u^{-1}\, (M_1 + M_2) \Pi^{a} \right]^{-1} $$

$$ V_{B\text{-}2SLS} = \left[ \Pi^{a\prime} (M_1 + M_2) \Pi^{a} \right]^{-1} \left[ \Pi^{a\prime} N_u \Pi^{a} \right] \left[ \Pi^{a\prime} (M_1 + M_2) \Pi^{a} \right]^{-1} $$

$$ V_{B\text{-}GMM} = \left[ \Pi^{a\prime} \left( M_1 N_{u,1}^{-1} M_1 + M_2 N_{u,2}^{-1} M_2 \right) \Pi^{a} \right]^{-1}, $$

with $N_u = N_{u,1} + N_{u,2}$, $N_{u,i} = N_{u,i\lambda^0}$, $M_i = M_{i\lambda^0}$, for i = 1, 2, $\Pi^{a} = (\Pi_{z_1}, \Pi)$, and with $M_{ir}$
and $N_{u,ir}$ defined in A3 and A4 respectively.
Comments:
(i) As stated in the introduction, B-GMM optimally reweighs the two subsample moment
conditions by their respective variance in that subsample. This is closely related to Chamber-
lain’s (1987) idea of constructing optimal instruments by reweighing the original instruments
with their respective conditional variances.
(ii) If there is a RF break that coincides with $T^0 = [T\lambda^0]$, then Theorem 1 holds as stated.
We then recommend using $\hat\lambda$ instead of $\hat\lambda_{RF}$ in constructing B-GMM, because of its faster
convergence rate; see A5 and A8. If the RF break is different from the SMI or VMC break,
then Corollary 1 holds over the subsamples with no RF break.
Below, we give conditions under which B-GMM is (strictly) more efficient than GMM. For
the B-2SLS and B-GMM estimators computed with $\hat\lambda_{RF}$ replaced by $\hat\lambda$, we have:
Corollary 2.
Let A1 hold with $\lambda_{RF}$ replaced by $\lambda^0$, and A3, A4 with $\Pi_1 = \Pi_2 = \Pi$ of full rank and
$\alpha_1 = \alpha_2 = \alpha < 1/2$. Also let A7(i) or A7(ii) hold, and let A8 hold. Then:
(i) $V_{B\text{-}GMM} \le V_{GMM}$ and $V_{B\text{-}GMM} \le V_{B\text{-}2SLS}$.
(ii) $V_{B\text{-}GMM} = V_{GMM}$ if and only if $N_{u,1}^{-1} M_1 \Pi^{a} = N_{u,2}^{-1} M_2 \Pi^{a}$.
(iii) $V_{B\text{-}GMM} < V_{GMM}$ if and only if $\mathrm{rank}\big[ (N_{u,1}^{-1} M_1 - N_{u,2}^{-1} M_2) \Pi^{a} \big] = p$.
Comments:
(i) If we use the SMI break-point estimator to construct B-GMM, and A6(ii) holds, there is
no VMC break. In that case, $N_{u,1} = \lambda^0 N_u$, $N_{u,2} = (1-\lambda^0) N_u$, $M_1 = \lambda^0 Q_1$ and $M_2 = (1-\lambda^0) Q_2$.
Hence, if $B \equiv (N_{u,1}^{-1} M_1 - N_{u,2}^{-1} M_2) \Pi^{a}$, then $B = N_u^{-1} (Q_1 - Q_2) \Pi^{a}$. Since $(Q_1 - Q_2)$ is full rank,
$\mathrm{rank}(B) = \mathrm{rank}(\Pi^{a}) = p$, so B-GMM is strictly more efficient than GMM, with or without
exogenous regressors. A similar result holds under A6(i) (no SMI break) when we use the VMC
break-point estimator to construct B-GMM, in which case $N_{u,1} = \lambda^0 S_{u,1}$, $N_{u,2} = (1-\lambda^0) S_{u,2}$,
and we need $(S_{u,1} - S_{u,2})$ to be full rank.
(ii) Note that if there is a RF break equal to $T^0$, then Theorem 2 holds as stated. If there
is a RF break not equal to $T^0$, then Corollary 2 holds over the subsamples with no RF break.
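The variance-reweighting behind Corollary 2 can be checked in a stylized Monte Carlo (our own design, separate from the paper's simulation study in section 6): a pure VMC break at a known date with, for simplicity, known subsample error variances. Weighting each split-sample moment by its own variance visibly reduces the sampling variance relative to the full-sample estimator:

```python
import numpy as np

rng = np.random.default_rng(5)
T, T0, reps = 400, 200, 300
sig1, sig2 = 2.0, 0.5              # structural-error s.d. before / after the VMC break
theta0 = 0.5

est_gmm, est_bgmm = [], []
for _ in range(reps):
    z = rng.normal(size=T)
    e = rng.normal(size=(T, 2))
    u = np.where(np.arange(T) < T0, sig1, sig2) * e[:, 0]
    x = z + 0.6 * e[:, 0] + e[:, 1]            # endogenous regressor, stable first stage
    y = theta0 * x + u

    # Full-sample IV (exactly identified, so the weighting matrix is irrelevant)
    est_gmm.append((z @ y) / (z @ x))

    # B-GMM analogue: split-sample moments weighted by their (here: known) variances
    num = den = 0.0
    for sl, sig in [(slice(None, T0), sig1), (slice(T0, None), sig2)]:
        zx, zy = z[sl] @ x[sl], z[sl] @ y[sl]
        Ni = sig**2 * (z[sl] @ z[sl])          # variance of the subsample moment
        num += zx * zy / Ni
        den += zx * zx / Ni
    est_bgmm.append(num / den)

var_gmm, var_bgmm = float(np.var(est_gmm)), float(np.var(est_bgmm))
```

In this design the asymptotic variance ratio favors the reweighted estimator by a wide margin, since the noisy pre-break moments are heavily downweighted.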
The results in sections 3.1 and 3.2 suggest that we can use the union of RF, SMI and VMC
breaks to obtain more efficient estimates. In practice, there may also be multiple breaks in
RF, SMI or VMC. Theorems 1-2 and their corollaries straightforwardly extend to multiple
breaks.12
4 Break-point estimators
In this section, we discuss three break-point estimators: the univariate estimator in Bai and
Perron (1998) (BP henceforth), the multivariate estimator in Qu and Perron (2007) (QP
henceforth) and a new multivariate estimator which we propose in this section. To understand
the differences between these estimators, it is useful to start with a break in the RF.
We first introduce some notation that simplifies the exposition. Let $T_{1\lambda} = [T\lambda]$ and $T_{2\lambda} = T - [T\lambda]$. For any parameter estimator or sum, the subscript $1\lambda$ refers to estimation or summation over the segment $1, \ldots, T_{1\lambda}$ (e.g. $\hat\Pi_{1\lambda}$ or $\sum_{1\lambda}$), and the subscript $2\lambda$ refers to estimation or summation over the segment $T_{1\lambda}+1, \ldots, T$ (e.g. $\hat\Pi_{2\lambda}$ or $\sum_{2\lambda}$).
4.1 RF break-point estimators
Consider one RF break as given in (2.5).$^{13}$ This break is assumed common and can be estimated from any RF equation using the univariate break-point estimator in BP. Below we describe this estimator for the first RF equation.
Let $X_{t,1}$ be the first element of the $(p_2, 1)$ vector $X_t$ and $\hat\Pi_{i\lambda,1}$ be the first column of the $(q, p_2)$ matrix of OLS estimators of the RF parameters, $\hat\Pi_{i\lambda}$, for $i = 1, 2$.
$^{12}$For the interested reader, a discussion and theoretical results on the detection and estimation of multiple breaks can be found in the November 2015 version of this paper, available at http://www.sfu.ca/∼baa7/research/AntoineBoldeaWP2015. It is important to note that if two adjacent subsamples are weakly identified, that particular break cannot be consistently estimated. Our analysis in this paper only covers break-points that are consistently estimable if they occur; therefore, if there are multiple breaks, we do not allow for two adjacent subsamples that are both weakly identified.
$^{13}$Our results are stated for one break but generalize straightforwardly to multiple breaks.
The BP estimator
$\hat T_{RF}^{BP} = [T\hat\lambda_{RF}^{BP}]$ is:
$$\hat\lambda_{RF}^{BP} = \arg\min_{\lambda \in \mathcal{I}} \left[ L^{BP}(\lambda, \hat\Pi_{i\lambda,1}) \right], \quad \text{where} \quad L^{BP}(\lambda, \hat\Pi_{i\lambda,1}) = \sum_{i=1}^{2} T^{-1} \sum_{i\lambda} \left(X_{t,1} - Z_t'\hat\Pi_{i\lambda,1}\right)^2,$$
and $\mathcal{I} = [\iota, 1-\iota]$, where $\iota$ is a small positive cut-off, usually set to 0.15 in practice. We show below that this estimator satisfies the high-level assumption A5 and can therefore be used to construct B-GMM. However, this estimator discards the information in the other RF equations, and is therefore not efficient. A more efficient RF break-point estimator is a multivariate estimator that estimates the common break in all RF equations while taking into account the correlation between these equations.
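For intuition, the BP objective above amounts to a grid search over candidate split points, fitting OLS on each side and keeping the split with the smallest pooled sum of squared residuals. A minimal sketch (illustrative Python, not the authors' code; function and variable names are ours):

```python
import numpy as np

def bp_break(y, Z, trim=0.15):
    """Univariate BP break-fraction estimator: minimize the pooled OLS
    SSR of y_t = Z_t' pi_i + error over candidate splits in [trim, 1-trim]."""
    T = len(y)
    lo, hi = int(np.floor(trim * T)), int(np.ceil((1 - trim) * T))
    best_k, best_ssr = lo, np.inf
    for k in range(lo, hi):
        ssr = 0.0
        for Zs, ys in ((Z[:k], y[:k]), (Z[k:], y[k:])):
            pi_hat, *_ = np.linalg.lstsq(Zs, ys, rcond=None)
            ssr += np.sum((ys - Zs @ pi_hat) ** 2)
        if ssr < best_ssr:
            best_ssr, best_k = ssr, k
    return best_k / T  # estimated break fraction
```

On simulated data with a sizeable reduced-form break, the returned fraction is close to the true one.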
One option is to use the QP estimator. To define it, rewrite (2.5) as:
$$X_t = v_t + \underbrace{(I_{p_2} \otimes Z_t')}_{\mathcal{Z}_t'} \times \begin{cases} \beta_1^0 & t \leq T_{RF} \\ \beta_2^0 & t > T_{RF} \end{cases}, \qquad \beta_i^0 = \left(\Pi_{i,1}'/T^{\alpha_i}, \ldots, \Pi_{i,p_2}'/T^{\alpha_i}\right)', \; i = 1, 2,$$
where $\otimes$ denotes the Kronecker product. Let $\mathrm{Var}(v_t)$ equal $\Sigma_{v,1}$ before the break $T_{RF}$, and $\Sigma_{v,2}$ after the break. Then, the QP break-point estimator $\hat\lambda_{RF}^{QP}$ minimizes the following nonlinear quasi-likelihood objective function:
$$\min_{\lambda, \beta_i, \Sigma_{v,i}} L^{QP}(\lambda, \beta_i, \Sigma_{v,i}), \quad \text{where} \quad L^{QP}(\lambda, \beta_i, \Sigma_{v,i}) = \sum_{i=1}^{2} \sum_{i\lambda} \left[ (X_t - \mathcal{Z}_t'\beta_i)' \Sigma_{v,i}^{-1} (X_t - \mathcal{Z}_t'\beta_i) + \log\det(\Sigma_{v,i}) \right].$$
We also show below that $\hat\lambda_{RF}^{QP}$ satisfies the high-level assumption A5.
However, note that this break-point estimator implicitly estimates the breaks in the RF parameters and in the short-run variance of the error terms jointly. As such, it is not particularly suitable for our framework: once these breaks are estimated, it is unclear which break is which. We would need to know the number of breaks in the RF parameters and in the RF variance, impose these restrictions in the QP objective function above, and only then could we identify the RF parameter break-point estimators. Yet for constructing B-GMM, only the RF parameter breaks are of interest: they lead to a more efficient estimator, while RF variance breaks do not. That RF variance breaks are not useful for improving efficiency can be seen from the asymptotic variance of B-GMM in Theorem 1, which does not feature the RF variance.
Therefore, we propose below a feasible generalized least-squares (FGLS) multivariate break-point estimator that focuses exclusively on estimating the RF parameter breaks efficiently. The FGLS estimator for one break$^{14}$ is constructed as follows:

Step 1 Use as initial break-point estimator $\check T = [T\check\lambda] \in \{\hat T_{RF}^{BP}, \hat T_{RF}^{QP}\}$.

Step 2 Calculate the RF residuals: $\hat v_t = X_t - \mathcal{Z}_t'\hat\beta_{1\check\lambda}$ for $t \leq \check T$, and $\hat v_t = X_t - \mathcal{Z}_t'\hat\beta_{2\check\lambda}$ otherwise.

Step 3 Consistently estimate the short-run RF variance matrix using these residuals. We call this $\hat\Sigma_{v,t}$, since we compute it either over the whole sample, $\hat\Sigma_{v,t} = \hat\Sigma_v = T^{-1}\sum_{t=1}^{T} \hat v_t \hat v_t'$, or over the two subsamples obtained in Step 1, $\hat\Sigma_{v,t} = \hat\Sigma_{v,1}$ for $t \leq \check T$ and $\hat\Sigma_{v,t} = \hat\Sigma_{v,2}$ otherwise, with $\hat\Sigma_{v,i} = \frac{1}{T_{i\check\lambda}} \sum_{i\check\lambda} \hat v_t \hat v_t'$ for $i = 1, 2$.

Step 4 Obtain the multivariate break-point estimator $\hat T_{RF} = [T\hat\lambda_{RF}]$ by minimizing the following feasible generalized least-squares (FGLS) objective function over $(\lambda, \beta_1, \beta_2)$:
$$\min_{\lambda, \beta_i} L^{FGLS}(\lambda, \beta_i), \quad \text{where} \quad L^{FGLS}(\lambda, \beta_i) = \sum_{i=1}^{2} T^{-1} \sum_{i\lambda} (X_t - \mathcal{Z}_t'\beta_i)' \hat\Sigma_{v,t}^{-1} (X_t - \mathcal{Z}_t'\beta_i).$$
$^{14}$The generalization to multiple breaks is straightforward and omitted for brevity.
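Steps 1-4 can be sketched as follows (illustrative Python, not the authors' code; it implements the whole-sample-variance variant of Step 3, and the initial fraction `lam0` stands in for the BP or QP pre-estimator):

```python
import numpy as np

def fgls_break(X, Z, lam0, trim=0.15):
    """Sketch of the FGLS multivariate break-point estimator (Steps 1-4).
    X: (T, p2) reduced-form regressands; Z: (T, q) instruments;
    lam0: initial break fraction (e.g. from the univariate BP step)."""
    T = X.shape[0]
    k0 = int(T * lam0)
    # Steps 2-3: residuals from the initial split, whole-sample variance
    V = np.empty_like(X)
    for sl in (slice(0, k0), slice(k0, T)):
        B, *_ = np.linalg.lstsq(Z[sl], X[sl], rcond=None)
        V[sl] = X[sl] - Z[sl] @ B
    Sinv = np.linalg.inv(V.T @ V / T)          # inverse of Sigma_v estimate
    # Step 4: minimize the GLS-weighted SSR over candidate splits
    lo, hi = int(trim * T), int((1 - trim) * T)
    best_k, best_obj = lo, np.inf
    for k in range(lo, hi):
        obj = 0.0
        for sl in (slice(0, k), slice(k, T)):
            B, *_ = np.linalg.lstsq(Z[sl], X[sl], rcond=None)
            R = X[sl] - Z[sl] @ B
            obj += np.einsum('ti,ij,tj->', R, Sinv, R)
        if obj < best_obj:
            best_obj, best_k = obj, k
    return best_k / T
```

Because the same regressors appear in every RF equation, equation-by-equation OLS coincides with the GLS minimizer for $\beta_i$, so the grid search only has to evaluate the weighted SSR at the OLS fits.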
The proposed FGLS estimator $\hat\lambda_{RF}$ uses the fact that the break is common across all RF equations and is therefore more efficient than its univariate counterpart. Unlike the QP estimator, it is based on a linear objective function, and as we will see below, it allows for breaks in the variance of the error terms without having to estimate their number or their locations. This is reflected in the assumption below.
Assumption A9. (RF variance)
$T^{-1}\sum_{t=1}^{[Tr]} E(v_t v_t') \to r\,\Sigma_v$, uniformly in $r$, with $\Sigma_v$ a p.d. matrix of constants.
This assumption concerns the short-run rather than the long-run RF variance, and is similar to QP but slightly more general. It allows for shrinking breaks in the RF variance and for conditional and unconditional heteroskedasticity, as long as the limiting variance $\Sigma_v$ is constant. The assumption is not needed if the breaks in the RF parameters and in the RF variance are common, in which case we can allow for fixed breaks in the short-run variance; imposing it, however, greatly simplifies the proofs.
The theorem below states the assumptions needed for all break-point estimators above to
satisfy the high-level assumption A5.
Theorem 3. (RF break-point estimators that satisfy A5)
Let $\alpha = \min(\alpha_1, \alpha_2)$.
(i) Under A1-A4, $\|\hat\lambda_{RF}^{BP} - \lambda_{RF}\| = O_P(T^{2\alpha-1})$.
(ii) Under A2-A4, A9 and the QP assumptions A1-A9, with their $X_t$ replaced by our $Z_t$ and their $u_t$ replaced by our $v_t$, $\|\hat\lambda_{RF}^{QP} - \lambda_{RF}\| = O_P(T^{2\alpha-1})$.
(iii) Under A1-A4 and A9, $\|\hat\lambda_{RF} - \lambda_{RF}\| = O_P(T^{2\alpha-1})$.
We now discuss the pros and cons of each of these estimators. The BP estimator only uses information from one RF equation, while the QP and FGLS estimators use information from all RF equations; the latter are therefore more efficient. However, the BP estimator imposes the fewest assumptions on the RF variance of the three estimators. In contrast, the QP and FGLS estimators estimate the RF variance, and therefore require stronger assumptions on it, reflected here in A9.
For our framework and RF breaks, FGLS is in general preferred to QP for two reasons: first, it focuses exclusively on estimating RF parameter breaks; second, it requires fewer assumptions. For example, QP A1 imposes that $\frac{1}{T^0}\sum_{t=1}^{T^0} Z_t Z_t' \xrightarrow{a.s.} Q_1^0$, while for the FGLS estimator we only require that $\frac{1}{T^0}\sum_{t=1}^{T^0} Z_t Z_t' \xrightarrow{p} Q_1^0$, for some positive definite matrix $Q_1^0$, as implicit in A3. QP A4 imposes strong mixing conditions on $z_t v_t$, while the FGLS estimator only requires near-epoch dependence as stated in A1. More importantly, QP A6 imposes that the RF parameter breaks are shrinking, while for the FGLS estimator we can allow for fixed RF parameter breaks.
4.2 SMI break-point estimators
Here, we estimate a break in the unconditional mean of the matrix $Z_t Z_t'$. Let $j_t = \mathrm{vech}(Z_t Z_t')$ be the $q(q+1)/2$ vector which stacks all unique elements of $Z_t Z_t'$ in order. Estimating a SMI break implicitly means estimating $T_{SMI} = [T\lambda_{SMI}]$ from the following equation:
$$j_t = \begin{cases} \mu_1^0 + e_t & t \leq T_{SMI} \\ \mu_2^0 + e_t & t > T_{SMI} \end{cases}, \qquad (4.1)$$
where $E(Z_t Z_t') = Q_1\,1[t \leq T_{SMI}] + Q_2\,1[t > T_{SMI}]$, $\mu_i^0 = \mathrm{vech}(Q_i)$, $i = 1, 2$, and $e_t = j_t - E(j_t)$. Therefore, the BP and FGLS break-point estimators can be constructed as for the RF break, and the assumptions we impose simply replace $X_t$ with $j_t$, $Z_t$ with $q(q+1)/2$ intercepts, and $\Pi_i/r_{iT}$ with $\mu_i^0$ for $i = 1, 2$. A3 is not needed since it is automatically satisfied for intercepts. The remaining assumptions replace A1, A2, A4 and A9.
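Concretely, estimating the SMI break reduces to a break-in-mean problem for $j_t$. A simplified sketch (illustrative Python, not the authors' code; it pools the SSRs of all components with equal weights, i.e. the BP-style variant rather than the FGLS one):

```python
import numpy as np

def vech(M):
    """Stack the unique (lower-triangular) elements of a symmetric matrix."""
    i, j = np.tril_indices(M.shape[0])
    return M[i, j]

def smi_break(Z, trim=0.15):
    """Break in the unconditional mean of vech(Z_t Z_t'): minimize the
    pooled SSR around subsample means over candidate split points."""
    J = np.array([vech(np.outer(z, z)) for z in Z])   # j_t, t = 1..T
    T = J.shape[0]
    lo, hi = int(trim * T), int((1 - trim) * T)
    best_k, best_ssr = lo, np.inf
    for k in range(lo, hi):
        ssr = (((J[:k] - J[:k].mean(0)) ** 2).sum()
               + ((J[k:] - J[k:].mean(0)) ** 2).sum())
        if ssr < best_ssr:
            best_ssr, best_k = ssr, k
    return best_k / T
```

For instance, a doubling of the instrument standard deviation part-way through the sample shifts the mean of the diagonal elements of $Z_t Z_t'$ and is picked up by this grid search.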
Assumption A10. (SMI break)
(i) A1 holds with $h_t$ replaced by $e_t = j_t - E(j_t)$, and $\lambda_{RF}$ replaced by $\lambda_{SMI}$.
(ii) A4 and A9 hold with $v_t$ replaced by $e_t$, and $\Sigma_v$ replaced by $\Sigma_e$.
(iii) $\mu_2^0 - \mu_1^0 = \delta_\mu^0 \neq 0$, where $\mu_1^0$ and $\mu_2^0$ are defined in (4.1).
A10(iii) imposes that the size of the SMI break $\delta_\mu^0$ is fixed; this was also stated in A7(i) and is repeated here for clarity. If $\delta_\mu^0$ were shrinking, the asymptotic variance of the B-GMM would be unchanged, as evident from Corollary 2. Because this break is fixed, it is unclear how to show consistency of the QP estimator, as the proofs in both QP and Bai, Lumsdaine and Stock (1998) rely heavily on shrinking breaks. Because the QP estimator is not our preferred estimator, we focus exclusively on the BP and the FGLS estimators. The theorem below states that both are consistent at a rate equal to the sample size.
Theorem 4. (SMI break-point estimators that satisfy A5)
Under Assumption A10,
(i) $\|\hat\lambda_{SMI}^{BP} - \lambda_{SMI}\| = O_P(T^{-1})$.
(ii) If $\hat\lambda_{SMI}$ is obtained with the pre-estimator $\hat\lambda_{SMI}^{BP}$, $\|\hat\lambda_{SMI} - \lambda_{SMI}\| = O_P(T^{-1})$.
4.3 VMC break-point estimators
Break-point estimators of the long-run variance of the moment conditions $N_u$ are, to our knowledge, not available even in the univariate case. Because HAC estimators of the long-run variance are not reliable over small subsamples, we focus exclusively on estimating breaks in the short-run variance of the moment conditions.$^{15}$ If $u_t$ were observed, we could estimate the short-run variance breaks $T_{VMC} = [T\lambda_{VMC}]$ from the following (implicit) equation:
$$j_t u_t^2 = \begin{cases} \mu_1^0 + e_t & t \leq T^0 \\ \mu_2^0 + e_t & t > T^0 \end{cases}, \qquad (4.2)$$
where $e_t = j_t u_t^2 - E(j_t u_t^2)$, so that $E(e_t) = 0$.$^{16}$ Here, the dependent variable $j_t u_t^2$ is not observed, but we can replace it with $j_t \hat u_t^2$, where $\hat u_t = y_t - W_t'\hat\theta_{GMM}$ or $\hat u_t = y_t - W_t'\hat\theta_{B\text{-}GMM}$, with $\hat\theta_{B\text{-}GMM}$ obtained using breaks in SMI and/or in the RF parameters. Because the dependent variable is generated, we need slightly stronger assumptions on the data generating process to ensure that the estimation error does not interfere with consistent estimation of the break-point.
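A sketch of this plug-in approach (illustrative Python, not the authors' code; `uhat` stands for the GMM or B-GMM residuals, and we again use the equal-weight break-in-mean variant for simplicity):

```python
import numpy as np

def vmc_break(Z, uhat, trim=0.15):
    """Break in the short-run variance of the moment conditions: apply a
    break-in-mean grid search to j_t * uhat_t^2, where j_t = vech(Z_t Z_t')."""
    i, j = np.tril_indices(Z.shape[1])
    J = (Z[:, i] * Z[:, j]) * (uhat ** 2)[:, None]   # vech(Z_t Z_t') * u_t^2
    T = J.shape[0]
    lo, hi = int(trim * T), int((1 - trim) * T)
    ssr = [(((J[:k] - J[:k].mean(0)) ** 2).sum()
            + ((J[k:] - J[k:].mean(0)) ** 2).sum()) for k in range(lo, hi)]
    return (lo + int(np.argmin(ssr))) / T
```

A jump in the residual variance shifts the mean of the diagonal components of $j_t u_t^2$, which the grid search detects.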
Assumption A11. (VMC break)
(i) A1 holds for $h_{t,i}$ as well as for $e_t = j_t u_t^2 - E(j_t u_t^2)$, with $E(e_t) = 0$, $\|h_{t,i}\|_4 < \infty$, $\|z_t\|_8 < \infty$, $\|e_{t,i}\|_4 < \infty$, and $\lambda_{RF}$ replaced by $\lambda_{VMC}$.
(ii) A4 and A9 hold with $v_t$ replaced by $e_t$ and $\Sigma_v$ replaced by $\Sigma_e$.
(iii) $\mu_2^0 - \mu_1^0 = \delta_\mu^0 \neq 0$, where $\mu_1^0$ and $\mu_2^0$ are defined in (4.2).
Part (i) states that $e_t$ is near-epoch dependent like $h_t$. We impose slightly stronger moment conditions that ensure that the difference between the estimated and the true squared sample moments, $T^{-1}\sum_{t=1}^{T}(j_t \hat u_t^2 - j_t u_t^2)$, vanishes fast enough to guarantee consistency of the variance estimator $\hat\Sigma_{e,t}$ used in the FGLS break-point estimation ($\hat\Sigma_{e,t}$ is defined as $\hat\Sigma_{v,t}$ but with $\hat e_t$ replacing $\hat v_t$). Part (ii) is as for the SMI break. Part (iii) says that the VMC break is fixed; for the same reason as for the SMI break, we do not further pursue the QP estimator. Below, we state the consistency of the BP and of our proposed FGLS estimator.
Theorem 5. (VMC break-point estimators that satisfy A5)
Under Assumptions A1-A4 and A11,
(i) $\|\hat\lambda_{VMC}^{BP} - \lambda_{VMC}\| = O_P(T^{-1})$.
(ii) If $\hat\lambda_{VMC}$ is obtained with the pre-estimator $\hat\lambda_{VMC}^{BP}$, then $\|\hat\lambda_{VMC} - \lambda_{VMC}\| = O_P(T^{-1})$.
$^{15}$Therefore, we do not consider breaks in the autocorrelation structure of the moment conditions. However, note that autocorrelation breaks could be estimated by estimating breaks in the product of the sample moment conditions and their lags.
$^{16}$For simplicity, we use the same notation $e_t$, $\mu_i^0$ as for the SMI break-point estimator, and $j_t$ is still equal to $\mathrm{vech}(Z_t Z_t')$.
5 B-GMM with no break
In this section, we show under which conditions the B-GMM estimators are equivalent to the
GMM estimators in the absence of a break, so that asymptotically, there is no efficiency loss in
using B-GMM. We first state the limit distribution of all the BP and the FGLS break fraction
estimators presented in the previous section.17
Note that under A1, A4 and A6, by the functional central limit theorem in Wooldridge and White (1988), Theorem 2.11, $T^{-1/2}\sum_{t=1}^{[Tr]} h_t \Rightarrow N^{1/2} B_h(r)$, a $((p_2+1)q, 1)$ vector of independent standard Brownian motions. Similarly, A10(i)-(ii) or A11(i)-(ii) guarantee that $T^{-1/2}\sum_{t=1}^{[Tr]} e_t \Rightarrow \Sigma_e^{1/2} B_e(r)$, a $(q(q+1)/2, 1)$ vector of independent standard Brownian motions. These processes are used to define the limit distributions of the break-fraction estimators. To state these limit distributions, we require the additional assumption given below.
Assumption A12. (Regularity assumption for the VMC break-point estimators)
$h_t$ is either $\phi$-mixing of size $-a/[2(a-1)]$ or $\alpha$-mixing of size $-a/(a-2)$ for some $a > 2$. Also, $\sup_t \|h_{t,i}\|_{2a} < \infty$ and $\sup_t \|z_{t,j}\|_{4a} < \infty$ for all $i = 1, \ldots, pq$ and $j = 1, \ldots, q$. Moreover, $T^{-1}\sum_{t=1}^{[Tr]} E(u_t j_t w_{t,i}) \to \ell_i^{(1)}$ and $T^{-1}\sum_{t=1}^{[Tr]} E(j_t w_{t,i} w_{t,j}) \to \ell_{i,j}^{(2)}$ uniformly in $r \in (0, 1]$ and for all $i, j = 1, \ldots, pq$.
A12 guarantees that the estimation error in the sample moments, $T^{-1/2}\sum_{t=1}^{[Tr]} j_t(\hat u_t^2 - u_t^2)$, does not show up in the limit distribution of the VMC break-fraction estimators. This assumption allows for shrinking RF break magnitudes but excludes fixed RF break magnitudes; if the RF break magnitude were fixed, the estimation error could show up in the limit distribution.$^{18}$ Note that the dependence structure and the moment conditions imposed in A12 are only slightly stronger than those in A11(i).
To simplify the statement of the asymptotic distributions, for $B_s(\lambda)$, $s \in \{h, e\}$, let $B_{s,l_1:l_2}$ refer to elements $l_1$ to $l_2$ of $B_s(\lambda)$, stacked in order, $B_{s,1}(\lambda)$ refer to the first element of $B_s(\lambda)$, and $N_v^* = N_{v,11}$ refer to the first $(q, q)$ upper-left block of the matrix $N_v$. Also, for any conformable matrix $A$, define:
$$D(B_s(\lambda), A) = \arg\sup_{\lambda \in \mathcal{I}} \frac{\left[B_s(\lambda) - \lambda B_s(1)\right]' A \left[B_s(\lambda) - \lambda B_s(1)\right]}{\lambda(1-\lambda)}.$$
Theorem 6. (Break-point estimators in the absence of a break)
Let A1, A3, A4 and A6 hold.
(i) Suppose there is no RF break, so that $\Pi_1 = \Pi_2 = \Pi$ is full rank, and $\alpha_1 = \alpha_2 = \alpha < 1/2$. Under A9,
$$\hat\lambda_{RF}^{BP} \Rightarrow D\big(B_{h,q+1:2q}(\lambda),\, N_v^{*1/2} Q^{-1} N_v^{*1/2}\big) \quad \text{and} \quad \hat\lambda_{RF} \Rightarrow D\big(B_{h,q+1:(p_2+1)q}(\lambda),\, N_v^{1/2}(\Sigma_v^{-1} \otimes Q^{-1}) N_v^{1/2}\big).$$
$^{17}$Here, we do not consider the QP estimator because it is not our preferred estimator, for the reasons discussed in section 4.
$^{18}$If the estimation error did show up in the limit distribution, it would be unclear whether the B-GMM and the GMM estimators are asymptotically equivalent in the absence of a break.
(ii) Suppose there is no SMI break, so that in (4.1), $\mu_1^0 = \mu_2^0$. Under A10(i)-(ii),
$$\hat\lambda_{SMI}^{BP} \Rightarrow D(B_{e,1}(\lambda), 1) \quad \text{and} \quad \hat\lambda_{SMI} \Rightarrow D(B_e(\lambda), I_q).$$
(iii) Suppose there is no VMC break, so that in (4.2), $\mu_1^0 = \mu_2^0$. Under A11(i)-(ii) and A12,
$$\hat\lambda_{VMC}^{BP} \Rightarrow D(B_{e,1}(\lambda), 1) \quad \text{and} \quad \hat\lambda_{VMC} \Rightarrow D(B_e(\lambda), I_q).$$
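The limit laws above can be visualized by simulation. The sketch below (illustrative Python, ours, for the scalar case $D(B(\lambda), 1)$ only) discretizes a standard Brownian motion on $[0,1]$ and records the arg sup of the scaled bridge functional over the trimmed interval:

```python
import numpy as np

def sim_D(n_grid=1000, trim=0.15, n_rep=2000, seed=0):
    """Draw from the scalar limit distribution D(B(lambda), 1) by
    discretizing a standard Brownian motion B on [0, 1]."""
    rng = np.random.default_rng(seed)
    lams = np.arange(1, n_grid) / n_grid
    keep = (lams >= trim) & (lams <= 1 - trim)
    draws = np.empty(n_rep)
    for r in range(n_rep):
        B = np.cumsum(rng.standard_normal(n_grid)) / np.sqrt(n_grid)
        bridge = B[:-1] - lams * B[-1]              # B(lam) - lam * B(1)
        crit = bridge ** 2 / (lams * (1 - lams))    # objective inside D(., 1)
        draws[r] = lams[keep][np.argmax(crit[keep])]
    return draws
```

By the symmetry of the bridge functional under $\lambda \mapsto 1-\lambda$, the simulated arg sup distribution is centered at $1/2$, consistent with spurious break estimates falling near the middle of the sample on average.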
Theorem 6 shows that in all cases, the limit distribution exists. As evident from the
theorem, its form depends on the particular estimator. Despite this, we show below that
the B-GMM estimators are asymptotically equivalent to their full-sample GMM counterparts,
no matter what type of spurious break-point we estimate. To state this result, we need an
additional homogeneity assumption.
Assumption A13. (Homogeneity assumption for covariances)
$T^{-1}\sum_{t=1}^{[Tr]} z_t u_t e_t' \Rightarrow r\,N_{ue}$, uniformly in $r \in (0, 1]$.
This assumption is the counterpart of A6 for SMI and VMC break-point estimators, and
is imposed to guarantee that SMI and VMC break-point estimators are asymptotically in-
dependent of the B-GMM estimators. With this assumption, we can state the asymptotic
equivalence of the GMM and B-GMM estimators.
Theorem 7. (GMM and B-GMM estimators in the absence of a break)
Suppose there is no RF break, no SMI break, and no VMC break, so that $\Pi_1 = \Pi_2 = \Pi$ is of full rank, $\alpha_1 = \alpha_2 = \alpha < 1/2$, and $\mu_1^0 = \mu_2^0$ in (4.1) and in (4.2). Let $V^* = \left(\Pi^{a\prime} Q N_u^{-1} Q \Pi^a\right)^{-1}$, $\Pi^a = (\Pi_{z1}, \Pi)$, and $\Lambda_T$ be as in Theorem 1. Moreover, let A1, A3, A4 and A6 hold. Then:
(i) $\Lambda_T(\hat\theta_{GMM} - \theta^0) \xrightarrow{D} \mathcal{N}(0, V^*)$.
(ii) If B-GMM is constructed with $\hat\lambda_{RF}^{BP}$ or $\hat\lambda_{RF}$, then under A9,
$\Lambda_T(\hat\theta_{B\text{-}GMM} - \theta^0) \xrightarrow{D} \mathcal{N}(0, V^*)$.
(iii) If B-GMM is constructed with $\hat\lambda_{SMI}^{BP}$ or $\hat\lambda_{SMI}$ and A10(i)-(ii) and A13 hold, or if B-GMM is constructed with $\hat\lambda_{VMC}^{BP}$ or $\hat\lambda_{VMC}$ and A10(i)-(ii), A12 and A13 hold, then:
$\Lambda_T(\hat\theta_{B\text{-}GMM} - \theta^0) \xrightarrow{D} \mathcal{N}(0, V^*)$.
Theorem 7 shows that B-GMM is asymptotically equivalent to GMM in the absence of breaks. Therefore, nothing is gained or lost asymptotically by imposing breaks that do not occur.$^{19}$
$^{19}$In the August 2016 version of the paper (http://www.sfu.ca/econ-research/RePEc/sfu/sfudps/dp15-04.pdf), we showed that B-GMM remains consistent when the moments of the data vary over time in a way that violates A9. In that case, however, it is not clear that B-GMM is asymptotically equivalent to GMM, and neither GMM nor B-GMM is the most efficient estimator, since neither optimally exploits the variation in the moments of the data (see Theorems 3-4 on pages 12-13 and their proofs in the previous version).
6 Monte-Carlo simulations
We consider the framework of section 3.1 with one endogenous regressor $X$, $q$ valid instruments (including the intercept), and one break in the reduced form at $T_{RF}$:
$$y_t = \alpha + X_t\beta + \sigma_t\epsilon_t, \quad X_t = \begin{cases} 1 + Z_t'\Pi_1 + v_t & t \leq T_{RF} \\ 1 + Z_t'\Pi_2 + v_t & t > T_{RF} \end{cases}, \quad E[\epsilon_t Z_t] = 0, \; E[v_t Z_t] = 0. \qquad (6.1)$$
The errors $(\epsilon_t, v_t)$ are i.i.d. jointly normally distributed with mean 0, variances $\sigma_\epsilon^2 = \sigma_v^2 = 1$ and correlation $\rho$; the instruments $Z_t$ are i.i.d. jointly normally distributed with mean zero and variance-covariance matrix equal to the identity matrix, and independent of $(\epsilon_t, v_t)$. Let $\iota_k$ be the $(k, 1)$ vector of ones, and let $R_i^2 = \mathrm{Var}(Z_t'\Pi_i)/\mathrm{Var}(Z_t'\Pi_i + v_t)$ be the theoretical R-squared over subsamples $i = 1$ with $t \leq T_{RF}$ and $i = 2$ with $t > T_{RF}$. Then the model parameters are chosen such that:
$$(\alpha \;\; \beta) = (0 \;\; 0), \quad \Pi_i = d_i\,\iota_{q-1} \; (i = 1, 2), \quad \text{with } d_1 = \sqrt{\frac{R_1^2}{(q-1)(1-R_1^2)}}, \quad d_2 = d_1 + b.$$
We consider three versions of the model: homoskedastic (HOM) with $\sigma_t^2 = 1$; heteroskedastic (HET1) with $\sigma_t^2 = (1 \; Z_t')\binom{1}{Z_t}/q$; and heteroskedastic (HET2) of the GARCH(1,1) type, with $\sigma_t^2 = 0.1 + 0.6 u_{t-1}^2 + 0.3\sigma_{t-1}^2$ and $u_t = \sigma_t\epsilon_t$.
To assess the identification strength over each subsample, we report the concentration parameter, defined over each subsample $i$ of size $T_i$ as:
$$\mu_i^2 = T_i R_i^2/(1 - R_i^2).$$
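For concreteness, the HOM design of (6.1), together with the implied $d_i$ and $\mu_i^2$, can be generated as follows (illustrative Python, not the authors' code; the parameter defaults reproduce the benchmark configuration used in Experiment 1 below):

```python
import numpy as np

def simulate_dgp(T=400, q=4, rho=0.5, R2_1=0.2, b=1.0, t_break=160, seed=0):
    """One sample from the homoskedastic (HOM) design in (6.1): one
    endogenous regressor, q instruments (incl. intercept), one RF break."""
    rng = np.random.default_rng(seed)
    d1 = np.sqrt(R2_1 / ((q - 1) * (1 - R2_1)))
    d2 = d1 + b
    Zx = rng.standard_normal((T, q - 1))           # non-constant instruments
    # (eps, v) jointly normal, unit variances, correlation rho
    eps = rng.standard_normal(T)
    v = rho * eps + np.sqrt(1 - rho ** 2) * rng.standard_normal(T)
    pre = np.arange(T) < t_break
    X = 1 + (Zx @ (d1 * np.ones(q - 1))) * pre \
          + (Zx @ (d2 * np.ones(q - 1))) * (~pre) + v
    y = 0 + 0 * X + eps                            # alpha = beta = 0
    # concentration parameters mu_i^2 = T_i R_i^2 / (1 - R_i^2)
    snr = lambda d: (q - 1) * d ** 2               # equals R_i^2 / (1 - R_i^2)
    mu2 = (t_break * snr(d1), (T - t_break) * snr(d2))
    return y, X, np.column_stack([np.ones(T), Zx]), mu2
```

With the defaults, $d_1 \approx 0.29$, $d_2 \approx 1.29$, $\mu_1^2 = 40$ and $\mu_2^2 \approx 1.2 \times 10^3$, matching the benchmark values reported below.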
We are interested in the slope parameter $\beta$ and compare the performance of three estimators of $\beta$: B-GMM, B-2SLS and GMM. In experiment 1, their performance is evaluated by computing the Monte-Carlo bias, standard deviation, root mean squared error (RMSE), and the length and coverage of the corresponding 95% confidence intervals,$^{20}$ for various configurations of the model. In experiment 2, we investigate these performance measures as a function of the location of the break. Finally, in experiment 3, we investigate the finite-sample properties of the above estimators when there is no break.
• Experiment 1:
Our benchmark model sets sample size $T = 400$, endogeneity parameter $\rho = 0.5$, one RF break at $T_{RF} = 160$, and break size $b = 1$. We use $q = 4$ instruments (including the intercept). Identification is strong over each subsample, with associated concentration parameters $\mu_1^2 = 40$ and $\mu_2^2 = 1.2 \times 10^3$: the implied reduced-form parameters are $d_1 = 0.29$ and $d_2 = 1.29$, and the theoretical R-squareds over the subsamples are $R_1^2 = 0.2$ and $R_2^2 = 0.83$.
$^{20}$The standard errors of each estimator are computed using the formulas in Theorem 1. We use HAC-type estimators under conditional heteroskedasticity.
We explore different configurations of the model. First, we decrease $\mu_1^2$ so that identification is weaker in the first subsample while the second subsample remains strong: $\mu_1^2 = 8.4$ (and $R_1^2 = 0.05$), and $\mu_1^2 = 1.6$ (and $R_1^2 = 0.01$). The break size is still $b = 1$, but the implied reduced-form parameters are now $d_1 = 0.13$ and $d_1 = 0.06$, respectively. Then, we consider a larger sample size, $T = 800$, more instruments, $q = 6$, or a larger endogeneity parameter, $\rho = 0.75$. In all these experiments, the break is assumed known, and the results are displayed in Tables 5 to 7 (for the HOM and HET cases). The results for the cases where the break location is unknown and estimated are displayed in Tables 8 to 10 (for the HOM and HET cases): three break sizes, 1, 0.5, and 0.2, are considered; $\mu_1^2 = 40$ and $d_1 = 0.29$ throughout, while $d_2 = 1.29$, 0.79, and 0.49. All results are based on 5,000 replications.
When the break is known, the main results do not vary much across specifications, so we focus on the benchmark model described at the beginning of Experiment 1. Under homoskedasticity, as expected, B-GMM and B-2SLS are very close in terms of bias, standard deviation, and RMSE. Their RMSEs are significantly smaller than that of GMM. The biases of B-GMM and B-2SLS tend to be larger than that of GMM, but they are more than compensated by the gains in standard deviation; moreover, these biases decrease as the sample size increases, as expected. Looking at the 95% confidence intervals for the slope parameter, B-GMM displays the shortest ones while maintaining good coverage properties. Under conditional heteroskedasticity, the standard deviation and RMSE of B-GMM are much smaller than those of B-2SLS.
When the break-point is treated as unknown, the break size matters for the accuracy of the estimated break location. With a break size of 1, the estimated break is quite reliable, with an average (over the estimated breaks) very close to the actual break: the average is 161.3 with a true break at 160. When the break size decreases, the quality of the break-location estimator deteriorates: for instance, with a true break at 160 and a break size of 0.2, the average is 172.4. Reliable estimation of the break location is crucial for the bias properties of B-GMM and B-2SLS: when the break is not accurately estimated, their biases increase and the coverage of the confidence intervals worsens.$^{21}$ This bias should not be too much of a concern, because it only appears when the break size is small, and such small breaks often cannot be detected; see also experiment 3 below.
• Experiment 2: Performance as a function of the true location of the break.
Given the results of section 3, at least asymptotically, it is always efficient to "split" the sample in order to double the number of moments. In finite samples, however, the size of the subsamples and their identification strength may matter. We therefore investigate how the performance of all estimators varies with the true location of the break and the strength of identification in each subsample.
$^{21}$One remedy consists in discarding the data around the estimated break (e.g. within a confidence interval for the break location). This simple strategy should mitigate the drawback of estimating the break inaccurately. However, it requires the asymptotic distribution of the break-point estimator, which is beyond the scope of this paper.
We consider two versions of the model in (6.1), both with $T = 400$, $\rho = 0.5$, $q = 4$, $d_1 = 0.1925$, $R_1^2 = 0.1$, and a break location that varies from $(0.1 \times 400)$ to $(0.9 \times 400)$; accordingly, the identification strength over the first subsample is characterized by $\mu_1^2$ between 4.4 and 40:
• model (i): the break size is $b = -0.385$, with associated parameter $d_2 = -0.1925$ and $R_2^2 = 0.1$. The identification strength in the second subsample is characterized by $\mu_2^2$, with values between 40 and 4.4.
• model (ii): the break size is $b = -0.5$, with associated parameter $d_2 = -0.3075$ and $R_2^2 = 0.22$. The identification strength in the second subsample remains somewhat strong and is characterized by $\mu_2^2$, with values between 102.1 and 11.3.
The results under homoskedasticity and conditional heteroskedasticity are presented in Figures 3 and 4: two measures of performance are considered, the Monte-Carlo RMSE (left) and the Monte-Carlo standard deviation (right). All results are based on 5,000 replications.
In model (i), both the Monte-Carlo RMSE and the standard deviation of the B-GMM estimators are stable as the break location changes from $(0.1 \times 400)$ to $(0.9 \times 400)$. This is quite different for GMM: its RMSE and standard deviation are larger, increase with the location of the break until it reaches the middle of the sample, and then decrease back to their original levels. Results for model (ii) are very similar.
• Experiment 3: Finite-sample properties when there is no break.
We now investigate the finite-sample properties of our estimators when there is no RF break, but a break-point is nevertheless estimated and used to compute the B-GMM and B-2SLS estimators. We consider the following versions of the model in (6.1): $T = 400$ or $T = 800$, $\rho = 0.5$, $q = 4$, and $d_i = 0.29$ for $i = 1, 2$. There is no break in the RF because we set $\Pi_1 = \Pi_2 = d_i\iota_4$. However, the econometrician still believes that there is a break in the RF and estimates it in order to compute B-GMM and B-2SLS; the full-sample GMM estimator is also computed as a benchmark. The finite-sample properties of these three estimators are evaluated by computing the Monte-Carlo bias, standard deviation, root mean squared error (RMSE), and the length and coverage of the corresponding 95% confidence intervals. The results under HOM and HET1 are presented in Table 11; in Table 12, the sample size is doubled.
First, it is interesting to point out that the estimated break-point is, on average, near the middle of the sample. Second, and unsurprisingly, the finite-sample performance of B-GMM (and B-2SLS) is not as good as that of standard GMM in the absence of a break. Nevertheless, under both homoskedasticity and conditional heteroskedasticity, the standard deviations and RMSEs of the three estimators are very close to each other, especially for larger sample sizes; this holds despite the bias of the GMM estimator being much smaller than that of B-GMM and B-2SLS. This supports the (asymptotic) results derived in Theorem 7.
7 The New Keynesian Phillips Curve
7.1 Model
To our knowledge, virtually all papers estimate the New Keynesian Phillips Curve (NKPC) at the quarterly frequency - see a.o. Sbordone (2002, 2005), Christiano, Eichenbaum and Evans (2005), Zhang, Osborn and Kim (2008), HHB, and Magnusson and Mavroeidis (2014). This comes with two shortcomings. The first is small samples, which lead to unreliable estimation in the presence of multiple breaks. The second is that US monetary policy is set twice per quarter, so quarterly NKPC models may not be as informative as monthly NKPC models.
In this paper, we estimate the NKPC at the monthly frequency, which mitigates these
shortcomings.22 However, we face two additional challenges. The first is data availability,
which we solve in the data section below. The second is how to write down a reasonable
monthly NKPC, given the empirical evidence that prices are indexed to the previous quarter.
With the exception of Zhang, Osborn and Kim (2008), most NKPC papers assume that when prices cannot be re-optimized, they are either kept fixed or indexed to last quarter's inflation, and not to more lags of inflation - see Galí and Gertler (1999), Sbordone (2002, 2005), Smets and Wouters (2003), Christiano, Eichenbaum and Evans (2005) and Magnusson and Mavroeidis (2014). Many of the above papers find strong evidence that prices are indexed to the previous quarter, and therefore to at least three lags of monthly inflation.$^{23}$
More than one lag in the NKPC model can be obtained if one imposes dynamics in the
marginal cost process or in the structural shocks - see Krogh (2015) for a summary. It can
also be obtained from more primitive assumptions on the price indexation mechanism. For
example, Zhang, Osborn and Kim (2008) derive a NKPC model with multiple lags, but they
follow Galí and Gertler (1999) in assuming that backward- and forward-looking firms co-exist.
The backward-looking firms never optimize and index to past lags of inflation, while the
forward-looking firms either keep their prices fixed or re-optimize when chosen (according to
an exogenous Calvo (1983) type mechanism). We derive the NKPC model in the supplemental
appendix by assuming that all firms are forward-looking when selected to re-optimize. In-between optimizations, they do not keep their prices fixed, but index them to past lags of
inflation. This is the framework of Smets and Wouters (2003), Woodford (2003, Ch.3), and
$^{22}$Because of the aforementioned shortcomings, especially issues related to small samples with multiple breaks, we do not use quarterly data. HHB analyze multiple breaks with quarterly data, but point out in their empirical section that some subsamples are too small to allow for a reliable interpretation of the results.
$^{23}$In our empirical analysis, we found that additional lags are insignificant in the NKPC.
Christiano, Eichenbaum and Evans (2005). In the supplemental appendix, we generalize these
models (with sticky prices but not sticky wages) to capture price indexation to ℓ previous lags
rather than one lag of inflation.24
To illustrate the usefulness of the B-GMM estimation procedure for the monthly NKPC, we first estimate an unconstrained version of the NKPC derived in the supplemental appendix, where the parameters are not constrained to satisfy the nonlinear restrictions implied by the structural model:
$$\pi_t = \alpha_c + \alpha_{b,1}\,\pi_{t-1} + \alpha_{b,2}\,\pi_{t-2} + \alpha_{b,3}\,\pi_{t-3} + \alpha_f\,\pi_t^e + \alpha_y\,y_t + u_t, \qquad (7.1)$$
where $y_t$ is the output gap (in log difference from steady-state output), $\pi_t$ is inflation measured as the log difference in prices between two periods, and inflation expectations at time $t$ are denoted $\pi_t^e$.$^{25}$
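Once a break date is chosen, the unconstrained NKPC (7.1) is linear in its parameters and can be estimated by 2SLS or GMM on each subsample. A generic 2SLS sketch (illustrative Python, ours; the Theorem 1 standard errors are omitted, and `W` stands for the regressor matrix of (7.1)):

```python
import numpy as np

def tsls(y, W, Z):
    """Two-stage least squares: W holds the regressors of (7.1)
    (intercept, three inflation lags, expected inflation, output gap);
    Z holds the instruments described in section 7.3."""
    PZW = Z @ np.linalg.solve(Z.T @ Z, Z.T @ W)   # first-stage fitted values
    return np.linalg.solve(PZW.T @ W, PZW.T @ y)  # second-stage coefficients
```

Subsample estimation then simply applies `tsls` to the rows before and after the estimated break date.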
7.2 Data
We construct our dataset from monthly observations for the US over the period 1978:1-2010:6. The inflation series is the annualized difference in log CPI taken from the FRED database, i.e. $\pi_t = 1200(\log CPI_t - \log CPI_{t-1})$. The output gap is monthly log real GDP minus log potential real GDP, the latter calculated by the HP filter with monthly smoothing constant 14400: $y_t = 100(\log(\text{real GDP}_t) - \log(\text{potential real GDP}_t))$. A monthly real GDP series is available up to 2010:6 at http://www.princeton.edu/∼mwatson/mgdp gdi. We proxy inflation expectations with the one-year-ahead Michigan inflation expectations $\pi_t^e$, the only series of measured inflation expectations that, to our knowledge, is free and available at monthly frequency.$^{26}$ The Michigan inflation series starts in 1978:1. After constructing all series and their lags, our data span is 1978:5-2010:6 (386 observations). Figure 1 plots the data.$^{27}$
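The series construction can be sketched as follows (illustrative Python, ours; `cpi` and `real_gdp` are hypothetical input arrays standing in for the FRED CPI and the Watson monthly GDP series; the HP trend solves $(I + \lambda D'D)\tau = x$ directly, with $D$ the second-difference operator):

```python
import numpy as np

def hp_trend(x, lamb=14400.0):
    """HP-filter trend: solve (I + lamb * D'D) tau = x, with D the
    second-difference operator and the monthly smoothing constant 14400."""
    T = len(x)
    D = np.zeros((T - 2, T))
    for t in range(T - 2):
        D[t, t:t + 3] = [1.0, -2.0, 1.0]
    return np.linalg.solve(np.eye(T) + lamb * (D.T @ D), x)

def build_series(cpi, real_gdp):
    """Monthly inflation and output-gap series as in section 7.2 (sketch)."""
    pi = 1200 * np.diff(np.log(cpi))          # annualized log-difference
    log_gdp = np.log(real_gdp)
    ygap = 100 * (log_gdp - hp_trend(log_gdp))
    return pi, ygap
```

A useful check on the implementation: the first-order conditions of the HP problem imply that the cycle (and hence the output gap) sums exactly to zero over the sample.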
24 As summarized by Krogh (2015) and shown in Mavroeidis (2005) and Nason and Smith (2008), second-order dynamics in either the marginal cost or the structural errors are useful for identifying the NKPC. Our empirical results - see in particular Table 2 - suggest that besides the lags we use, no further dynamics are needed for identification, since the instruments we use are not weak over the entire sample.
25 Galí and Gertler (1999) attribute the usual findings of negative and/or insignificant estimates of α_y at quarterly frequency to measurement errors in potential output. They propose using the NKPC model with the output gap replaced by marginal cost, which is not observed and is therefore proxied by the average unit labor cost. However, to our knowledge, average unit labor costs are not available monthly. Moreover, there is still much criticism regarding the use of average unit labor cost as a proxy - see e.g. Rudd and Whelan (2005).
26 Proxying inflation expectations with this series is common in the literature, and even recommended by Coibion and Gorodnichenko (2015), who argue that this series introduces substantial information about the unobserved firm-level inflation expectations defined in the NKPC.
27 We find that our results are robust to the inflation outlier around 2008, so we do not remove it.
7.3 Unrestricted Estimates
For all methods, Z_t includes an intercept, three lags of inflation, three lags of expected inflation, and three lags of the output gap. Since π_t^e and y_t are both endogenous, we have two reduced forms, projecting π_t^e and y_t onto the instrument set Z_t. Without breaks, they are:

π_t^e = Z_t' Δ_1 + v_{t1},   (7.2)
y_t = Z_t' Δ_2 + v_{t2}.   (7.3)
• SMI break. As discussed in section 2, the variance of the instruments changed after the Great Moderation, so we have a SMI break. The SMI break can be estimated via our multivariate FGLS estimator applied to all the unique elements of the Z_t Z_t' matrix. However, because Z_t contains lags, this is not necessary, and we chose to estimate the SMI break as a joint break in the multivariate mean of squared inflation, squared expected inflation, and squared output gap (see footnote 28). For these variables, the BP estimators of a break in mean are 1981:9, 1981:11 and 1983:5. Using any of these estimators as a starting value, the FGLS estimator of the Great Moderation break is 1981:11.
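A minimal sketch of the least-squares break-in-mean (BP-type) estimator used above: pick the split date that minimizes the total sum of squared deviations from the two segment means. The series and break date below are simulated, not the paper's data.

```python
import numpy as np

def break_in_mean(x, trim=0.15):
    """Single break in mean: return the split index k minimizing
    SSR(1..k) + SSR(k+1..n), searching over trimmed candidate dates."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ssr = lambda s: ((s - s.mean()) ** 2).sum()
    return min(range(int(trim * n), int((1 - trim) * n)),
               key=lambda k: ssr(x[:k]) + ssr(x[k:]))

# simulated example: the mean shifts from 0 to 3 at observation 150
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(3.0, 1.0, 150)])
k_hat = break_in_mean(x)
```

In the paper this estimator is applied to the squared series, so a break in variance shows up as a break in mean.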
• RF breaks. As discussed in section 2, the RF also features the Great Moderation break, which we impose at 1981:11 because this estimator converges faster than the RF break-point estimator of the Great Moderation - see comment (ii) after Corollary 1. Additionally, the RF likely features the oil price collapse - see e.g. Bernanke (2004) and Galí and Gambetti (2008). The BP estimator of the oil price collapse in the sample 1981:12-2010:6 is at 1985:12 for the first RF and at 1985:10 for the second RF. Using either of these estimators as a starting value, the FGLS estimator of the oil price collapse in the post-Great Moderation sample 1981:12-2010:6 is at 1985:10 (see footnote 29).
• VMC breaks. As discussed in section 2, the SMI break is likely a VMC break too and
thus it is already taken into account in the construction of B-GMM.30
7.3.1 Results for the baseline specification
In our baseline specification, the structural parameters of interest in (7.1) are assumed to be stable over time; robustness checks with potential structural parameter breaks are in section 7.3.2. For the validity of our estimation results with B-GMM in this section, we need sufficiently strong identification, as in Assumption A2. Evidence for this is also provided in section 7.3.2.
28 Note that lags of these will exhibit the same SMI breaks.
29 Even though the oil price collapsed in the first half of 1986, it experienced a sharp decline prior to that, in 1985 - see Gately (1986), Figure 6.
30 Further robustness checks regarding the number of break-points and the estimated locations of the break-points can be found in the November 2015 version of this paper: http://www.sfu.ca/~baa7/research/AntoineBoldeaWP2015.
The unrestricted parameter estimates over the full sample (using B-GMM, B-2SLS, and
standard full sample GMM and 2SLS), all with Newey and West (1994) HAC robust standard
errors, are reported in Panel A of Table 1. This panel shows that all the B-GMM estimates
are more precise than the GMM estimates, and that the output gap parameter αy is not
significant when using GMM but is strongly significant when using B-GMM.
In the first and the third sample, the forward-looking coefficient α_f is always significant and much larger than the backward-looking component α_b, defined as α_{b,1} + α_{b,2} + α_{b,3}. This result is in line with most quarterly findings - see e.g. Galí and Gertler (1999), Sbordone (2005), Zhang, Osborn and Kim (2008) and HHB.
Table 1, Panel A also shows a positive and significant relationship between inflation and the output gap at the monthly level when using B-GMM. The α_y estimates are positive and around 0.16, indicating that a 1% increase in monthly output will increase annual inflation by 0.16%, all else constant. These results reinforce the quarterly NKPC evidence of small but positive output gap coefficients and stand in contrast to most studies, which find a negative and/or insignificant coefficient on the output gap at the quarterly level - see e.g. Galí and Gertler (1999) for a summary of these studies.
All B-GMM parameter estimates have smaller standard errors than their GMM counterparts in Table 1. Therefore, the significance of the output gap coefficient α_y is due to more efficient estimation via B-GMM, not to the use of monthly data.
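The efficiency mechanism can be illustrated with a stylized, homoskedastic simulation (an illustration of the idea, not the paper's B-2SLS implementation; all data below are simulated): when the reduced form has a known break, interacting the instruments with the pre/post-break indicator keeps both subsample first stages informative.

```python
import numpy as np

def tsls(y, X, Z):
    """Two-stage least squares: regress y on the first-stage fitted values of X."""
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(Xhat, y, rcond=None)[0]

def break_augmented_tsls(y, X, Z, k):
    """Interact the instruments with the pre/post-break indicator (break at k)."""
    pre = (np.arange(len(y)) < k).astype(float)[:, None]
    return tsls(y, X, np.hstack([Z * pre, Z * (1.0 - pre)]))

# simulated example: instrument strength triples at k = 2000
rng = np.random.default_rng(1)
n, k, beta = 4000, 2000, 1.0
z = rng.normal(size=(n, 2))
e = rng.normal(size=n)
v = 0.6 * e + rng.normal(size=n)            # endogeneity in the regressor
strength = np.where(np.arange(n) < k, 0.5, 1.5)
x = strength * z[:, 0] + 0.3 * z[:, 1] + v  # reduced form with a break
y = beta * x + e
b_plain = tsls(y, x[:, None], z)[0]
b_aug = break_augmented_tsls(y, x[:, None], z, k)[0]
```

Both estimators are consistent for beta; the augmented one uses the stronger post-break first stage more fully, which is the intuition behind the smaller B-GMM standard errors in Table 1.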
7.3.2 Robustness checks
There are two main concerns regarding the validity of the previous results. First, the coefficients of the structural equation (7.1) may not be stable over the whole sample; see e.g. Kleibergen and Mavroeidis (2009) and Hall, Han, and Boldea (2012). Second, the identification strength may be weak over the whole sample.
• Stability of the structural equation (7.1):
To shed some light on the stability concern, we consider two potential structural parameter
breaks: the Great Moderation break, and the break in the recent crisis.31 The first break is
also a SMI break (break in the variance of our instruments) and is therefore set at 1981:11,
the multivariate SMI break-point estimate. The second break is set for simplicity before the
recent crisis, at 2006:12.
We thus re-estimate the model over two smaller samples: 1981:12 to 2010:6 and 1981:12 to
2006:12. Both these samples feature the oil price collapse as a RF break at 1985:10, so the
31 The Great Moderation break is motivated by the analysis in Kleibergen and Mavroeidis (2009), cited in Magnusson and Mavroeidis (2014), page 1842, and by the findings of Hall, Han and Boldea (2012), page 294. The second structural parameter break is motivated by the fact that, in the recent crisis, inflation has been much lower than previously.
efficiency arguments in section 3.1 still hold; this is confirmed by Panels B and C of Table
1. In addition, most of the other results found over the full sample (and displayed in Panel
A) remain true over the smaller samples, including the positive and significant relationship
between inflation and output gap (see Panels B and C).
• Identification strength of the structural equation (7.1):
For the validity of the B-GMM method, at least one subsample has to be sufficiently strongly identified, as stated in Assumption A2. This assumption is supported in the literature by the findings of Zhang, Osborn and Kim (2008, 2009) and Kleibergen and Mavroeidis (2009, Table 4). All these papers use instrument sets similar to ours. The first two papers find strong instruments over the whole sample; in the third paper, Table 4 shows much tighter confidence sets after the Great Moderation than before, suggesting stronger identification in this latter period.
In our analysis, the null of weak identification is rejected over all subsamples and over the
full sample, as shown by the reduced rank test of Kleibergen and Paap (2006) displayed in
Table 2. This test is robust to heteroskedasticity and multiple endogenous variables.32 These
results suggest no identification issues over the sample and data we consider.
To summarize, the findings in section 7.3.1 remain robust to the aforementioned concerns.
Additionally, the J-test on all stable subsamples cannot reject the validity of our instruments.
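As a simpler, homoskedastic counterpart to the Kleibergen-Paap rank test reported in Table 2 (a basic diagnostic, not the test used in the text), one can compute a first-stage F statistic for each endogenous regressor; everything below is simulated for illustration.

```python
import numpy as np

def first_stage_F(x, Z):
    """Classical F test that all instrument coefficients are zero in the
    regression of the endogenous regressor x on a constant and Z."""
    n, q = Z.shape
    Zc = np.column_stack([np.ones(n), Z])
    beta = np.linalg.lstsq(Zc, x, rcond=None)[0]
    ssr_u = ((x - Zc @ beta) ** 2).sum()          # unrestricted SSR
    ssr_r = ((x - x.mean()) ** 2).sum()           # constant-only SSR
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - q - 1))

rng = np.random.default_rng(2)
Z = rng.normal(size=(500, 3))
x = Z @ np.array([0.5, 0.3, 0.2]) + rng.normal(size=500)  # relevant instruments
F = first_stage_F(x, Z)
```

Unlike the Kleibergen-Paap statistic, this version is not robust to heteroskedasticity and handles only one endogenous regressor at a time, which is precisely why the text relies on the rank test instead.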
7.4 Restricted Estimates
In this section, we estimate the deep parameters of the NKPC model derived in the supplemental appendix:

π_t = ψ_c + [(ρ_1 − ρ_2)/(1 + ρ_1)] π_{t−1} + [(ρ_2 − ρ_3)/(1 + ρ_1)] π_{t−2} + [ρ_3/(1 + ρ_1)] π_{t−3} + [1/(1 + ρ_1)] π_t^e + [(1 − θ)^2/(θ(1 + ρ_1))] ψ_y y_t + ε_t,   (7.4)
where ρ_1, ρ_2, and ρ_3 are the price indexation parameters to the first three lags of inflation, and θ is the price stickiness parameter. Also, ψ_c is an unconstrained constant and ψ_y = Θψ, with Θ = (1 − a)/(1 − a + εa), ψ = σ + (ϕ + a)/(1 − a), 1 − a the labor elasticity of output in the Cobb-Douglas production function, ε the price aggregation parameter such that the steady-state marginal cost is M = ε/(ε − 1), and σ, ϕ the utility function parameters for consumption and labor, respectively. Since ψ_y is not separately identifiable from ρ_1 and θ, we calibrate ψ_y = 3 (see footnote 33).
32 To our knowledge, it is the only test robust to both heteroskedasticity and multiple endogenous variables. The efficient first-stage F test recently proposed by Montiel Olea and Pflueger (2013) can only accommodate one endogenous variable.
33 This value corresponds to the calibrations in Magnusson and Mavroeidis (2014) and Galí (2008), Ch. 3. We set a = 1/3 and ε = 6, thus M = 1.2 and Θ = 0.25, and we impose log utility for consumption and labor (σ = ϕ = 1).
The restricted model (7.4) remains linear in the regressors and instruments, but becomes
nonlinear in the parameters of interest that correspond to the deep parameters of the NKPC
model. Our original (linear) framework of section 2 needs to be extended and the applicability
of our B-GMM procedure justified: such derivations are done in Appendix C. This means that
both breaks we previously considered can be used directly to augment the set of instruments
and obtain more efficient estimates, as for the unrestricted model (7.1). The only difference
is that we now estimate the parameters in the model (7.4) with a nonlinear GMM procedure.
We also impose the following two restrictions in the estimation (see footnote 34): (i) |ρ| ≤ 1, with ρ defined as ρ_1 + ρ_2 + ρ_3; (ii) 0.0001 < θ < 0.9999.
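The nonlinear restrictions in (7.4) amount to a mapping from the deep parameters (ρ_1, ρ_2, ρ_3, θ) to the slope coefficients of the structural equation. A direct transcription of that mapping (the function name is ours, and ψ_y = 3 is the calibrated value from the text):

```python
def nkpc_coefficients(rho1, rho2, rho3, theta, psi_y=3.0):
    """Slope coefficients of equation (7.4) implied by the deep parameters."""
    d = 1.0 + rho1
    return {
        "pi_lag1": (rho1 - rho2) / d,
        "pi_lag2": (rho2 - rho3) / d,
        "pi_lag3": rho3 / d,
        "pi_expected": 1.0 / d,
        "output_gap": (1.0 - theta) ** 2 / (theta * d) * psi_y,
    }
```

With ρ_1 = ρ_2 = 0, this reproduces the restricted coefficients noted in footnote 35: α_f = 1, α_{b,1} = 0, α_{b,2} = −ρ_3, α_{b,3} = ρ_3.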
We estimate (7.4) by nonlinear two-step GMM or two-step B-GMM, with the same instruments as before. In Table 3, the B-GMM estimates of ρ_1, ρ_2, and ρ_3 for the full sample are all significant, while the GMM estimates, though of similar magnitude, are not all significant, highlighting yet again the efficiency gains from using the B-GMM procedure.
Looking at all samples in Table 3, there seems to be some evidence of price indexation to all previous three months. The indexation to the third lag (i.e., to the previous quarter) is the strongest: ρ_3 is the largest estimate and it is significant for B-GMM across all sample periods. The estimates of ρ_1 and ρ_2 are not significant for the last sample period, which excludes the recent recession. Therefore, we also report results with ρ_1 = ρ_2 = 0, in which case prices are only indexed to last quarter's inflation.
The results for ρ_1 = ρ_2 = 0 are reported in Table 4 (see footnote 35). The B-GMM estimates of ρ_3 are around 0.29-0.33, and their confidence intervals, considerably shorter than those from GMM, provide strong evidence against full indexation at the monthly level. The point estimate of ρ_3 is of similar magnitude to that in Sbordone (2005), Table 1. The confidence intervals are tighter than in Sbordone (2005) (Table 1, p. 1194) and Magnusson and Mavroeidis (2014) (Figure 5, p. 1841), indicating that the B-GMM estimates are more informative about price indexation than those in previous studies.
The B-GMM parameter estimates of the price stickiness θ in the third sample (our preferred sample given the recent crisis) are significant and around 0.62, implying 1/(1 − θ) ≈ 3 months between price re-optimizations. This price duration is close to that found in Klenow and Kryvtsov (2008) (4-7 months) and shorter than that in Nakamura and Steinsson (2008) (7-8 months).
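The implied duration follows from the Calvo structure, where θ is the per-period probability that a price is not re-optimized, so the expected spell length is a simple geometric mean:

```python
theta = 0.62                      # B-GMM estimate of price stickiness, preferred sample
duration = 1.0 / (1.0 - theta)    # expected months between price re-optimizations
```

This gives roughly 2.6 months, consistent with the "about 3 months" statement in the text.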
Figure 2 shows the joint confidence regions for the estimates of ρ_3 and θ for our preferred sample 1981:12-2006:12. As can be seen from the figure, the B-GMM confidence region is
34 We assume |ρ| ≤ 1 to prevent firms from overcompensating for past inflation. The lower bound on θ ensures that the moment conditions are differentiable, and the upper bound ensures that some positive fraction of prices is re-optimized each period.
35 Note that ρ_1 = ρ_2 = 0 implies that the restricted parameter estimates for the forward- and backward-looking components are α_f = 1, α_{b,1} = 0, α_{b,2} = −ρ_3, α_{b,3} = ρ_3, and they sum to one. The fact that α_f = 1 does not yield non-stationarity problems in our model because we are using measured inflation expectations.
much tighter than its GMM counterpart. Overall, we conclude that GMM is less informative
than B-GMM, and that relying on B-GMM, one finds strong evidence of price stickiness and
short price durations.
8 Conclusion
In this paper, we focus on changes in the limiting Hessian of the GMM minimand. We decompose them into breaks in the reduced form, in the second moment of the instruments, and/or in the variance of the moment conditions. We show how to exploit these breaks to construct GMM estimators that are more efficient than those currently available, and we call these estimators B-GMM.
Analyzing a newly developed NKPC model with a general inflation indexation scheme, we show that, by exploiting two changes - the Great Moderation and the oil price collapse - the B-GMM estimators have tighter confidence intervals than the full-sample GMM estimators, delivering strong evidence against full inflation indexation and in favor of short price durations.
References
[1] D.W.K. Andrews and J.H. Stock, Inference with weak instruments, Econometric Society
Monograph Series, vol. 3, ch. 8 in Advances in Economics and Econometrics, Theory and
Applications: Ninth World Congress of the Econometric Society, Cambridge University
Press, Cambridge, 2005.
[2] B. Antoine and O. Boldea, Inference in linear models with structural changes and mixed identification strength, Working Paper, Simon Fraser University (2015), http://www.sfu.ca/~baa7/AntoineResearch.html.
[3] B. Antoine and E. Renault, Efficient GMM with nearly-weak instruments, The Econo-
metrics Journal, Tenth Anniversary Issue 12 (2009), 135–171.
[4] J. Bai, Estimation of a change point in multiple regression models, Review of Economics
and Statistics 79 (1997), 551–563.
[5] J. Bai, R.L. Lumsdaine, and J.H. Stock, Testing for and dating common breaks in multivariate time series, Review of Economic Studies 65 (1998), 395–432.
[6] J. Bai and P. Perron, Estimating and testing linear models with multiple structural
changes, Econometrica 66 (1998), 47–78.
[7] B. Bernanke, The Great Moderation, ch. 5 of ’The Taylor rule and the transformation of
monetary policy’, pp. 143–182, Hoover Institution, 2004.
[8] G. Chamberlain, Asymptotic efficiency in estimation with conditional moment restric-
tions, Journal of Econometrics 34 (1987), 305–334.
[9] L.J. Christiano, M. Eichenbaum, and C.L. Evans, Nominal rigidities and the dynamic
effects of a shock to monetary policy, Journal of Political Economy 113 (2005), 1–45.
[10] O. Coibion and Y. Gorodnichenko, Is the Phillips Curve alive and well after all? Inflation
expectations and the missing disinflation, American Economic Journal - Macroeconomics
7 (2015), 197–232.
[11] R. Davidson and J.G. MacKinnon, Econometric Theory and Methods, Oxford University Press, New York, 2004.
[12] J.-M. Dufour, L. Khalaf, and M. Kichian, Inflation dynamics and the New Keynesian
Phillips Curve: an identification robust econometric analysis, Journal of Economic Dy-
namics and Control 30 (2006), 1707–1727.
[13] J. Galí, Monetary policy, inflation, and the business cycle: an introduction to the New Keynesian framework, Princeton University Press, Princeton and Oxford, 2008.
[14] J. Galí and L. Gambetti, On the sources of the Great Moderation, NBER Working Paper No. 14171 (2008).
[15] J. Galí and M. Gertler, Inflation dynamics: a structural econometric analysis, Journal of Monetary Economics 44 (1999), 195–222.
[16] D. Gately, Lessons from the 1986 oil price collapse, Brookings Papers on Economic Activity 17 (1986), 237–284.
[17] A. R. Hall, S. Han, and O. Boldea, Inference regarding multiple structural changes in
linear models with endogenous regressors, Journal of Econometrics 170 (2012), 281–302.
[18] A.R. Hall, Generalized method of moments, Oxford University Press, New York, 2005.
[19] C. Hansen, J. Hausman, and W. Newey, Estimation with many instrumental variables,
Journal of Business and Economic Statistics 26 (2008), 398–422.
[20] F. Kleibergen and S. Mavroeidis, Weak instrument robust tests in GMM and the New
Keynesian Phillips Curve, Journal of Business and Economic Statistics 27 (2009), 293–
311.
[21] F. Kleibergen and R. Paap, Generalized reduced rank tests using the singular value de-
composition, Journal of Econometrics 133 (2006), 97–126.
[22] P.J. Klenow and O. Kryvtsov, State-dependent or time-dependent pricing: does it matter for recent U.S. inflation?, The Quarterly Journal of Economics 123 (2008), 863–904.
[23] T.S. Krogh, Macro frictions and theoretical identification of the New Keynesian Phillips
Curve, Journal of Macroeconomics 43 (2015), 191–204.
[24] L. Magnusson and S. Mavroeidis, Identification using stability restrictions, Econometrica
82 (2014), 1799–1851.
[25] S. Mavroeidis, Identification issues in forward-looking models estimated by GMM, with
an application to the Phillips curve, Journal of Money, Credit and Banking 37 (2005),
421–448.
[26] J.L. Montiel-Olea and C. Pflueger, A robust test for weak instruments, Journal of Business
and Economic Statistics 31 (2013), 358–369.
[27] E. Nakamura and J. Steinsson, Five facts about prices: a reevaluation of menu cost models,
Quarterly Journal of Economics 123 (2008), 1415–1464.
[28] J. Nason and G.W. Smith, Identifying the New Keynesian Phillips Curve, Journal of
Applied Econometrics 23 (2008), 525–551.
[29] W.K. Newey and K.D. West, Automatic lag selection in covariance matrix estimation,
The Review of Economic Studies 61 (1994), 631–653.
[30] Z. Qu and P. Perron, Estimating and testing structural changes in multivariate regressions,
Econometrica 75 (2007), 459–502.
[31] R. Rigobon, Identification through heteroskedasticity, Review of Economics and Statistics
85 (2003), 777–792.
[32] J. Rudd and K. Whelan, New tests of the New-Keynesian Phillips Curve, Journal of
Monetary Economics 52 (2005), 1167–1181.
[33] A. M. Sbordone, Prices and unit labor costs: a new test of price stickiness, Journal of
Monetary Economics 49 (2002), 265–292.
[34] A.M. Sbordone, Do expected future marginal costs drive inflation dynamics?, Journal of
Monetary Economics 52 (2005), 1183–1197.
[35] F. Smets and R. Wouters, An estimated dynamic stochastic general equilibrium model of
the euro area, Journal of the European Economic Association 1 (2003), 1123–1175.
[36] D. Staiger and J. Stock, Instrumental variables regression with weak instruments, Econo-
metrica 65 (1997), 557–586.
[37] J. Stock, J. Wright, and M. Yogo, A survey of weak instruments and weak identification in Generalized Method of Moments, Journal of Business and Economic Statistics 20 (2002), 518–529.
[38] J.H. Stock and M.W. Watson, Has the business cycle changed and why?, ch. in NBER
Macroeconomics Annual, pp. 159–230, edited by M. Gertler and K. Rogoff, 2002.
[39] M. Woodford, Interest and prices: foundations of a theory of monetary policy, Princeton
University Press, Princeton and Oxford, 2003.
[40] J.M. Wooldridge and H. White, Some invariance principles and central limit theorems
for dependent heterogeneous processes, Econometric Theory 4 (1988), 210–230.
[41] C. Zhang, D. R. Osborn, and D. H. Kim, The New Keynesian Phillips Curve: from sticky
inflation to sticky prices, Journal of Money, Credit and Banking 40 (2008), 667–699.
[42] C. Zhang, D.R. Osborn, and D.H. Kim, Observed inflation forecasts and the New Keyne-
sian Phillips Curve, Oxford Bulletin of Economics and Statistics 71 (2009), 375–398.
Appendix
Appendix A contains all the tables and graphs. Appendix B contains the proofs of the theo-
retical results stated in sections 3-5. Finally, in Appendix C we show the applicability of our
B-GMM estimation procedure in the extended framework of section 7.4.
A Tables and Figures
Table 1: Unrestricted Structural Form Estimation
Method αc αb,1 αb,2 αb,3 95% CI αb,3 αf 95% CI αf αy 95% CI αy
Panel A: Full sample 1978:5-2010:6
2SLS −0.37∗∗ −0.00 −0.22∗∗ 0.45∗∗∗ [0.20, 0.70] 0.44∗∗∗ [0.23, 0.65] 0.15 [−0.06, 0.35]
(0.16) (0.08) (0.09) (0.13) (0.11) (0.11)
GMM −0.54∗∗∗ −0.04 −0.20∗∗ 0.35∗∗∗ [0.10, 0.60] 0.55∗∗∗ [0.35, 0.75] 0.13 [−0.06, 0.32]
(0.15) (0.08) (0.08) (0.13) (0.10) (0.10)
B-2SLS −0.40∗∗∗ −0.01 −0.22∗∗∗ 0.44∗∗∗ [0.17, 0.70] 0.46∗∗∗ [0.26, 0.66] 0.17∗∗ [0.02, 0.31]
(0.16) (0.08) (0.09) (0.13) (0.11) (0.11)
B-GMM −0.47∗∗∗ −0.04 −0.21∗∗∗ 0.40∗∗∗ [0.33, 0.60] 0.51∗∗∗ [0.45, 0.57] 0.16∗∗∗ [0.08, 0.25]
(0.08) (0.03) (0.04) (0.03) (0.03) (0.04)
Panel B: Post Great Moderation sample 1981:12-2010:6
2SLS 0.78 0.06 −0.23∗∗∗ 0.51∗∗∗ [0.25, 0.78] 0.02 [−0.49, 0.53] 0.17∗ [−0.02, 0.37]
(0.60) (0.09) (0.09) (0.14) (0.26) (0.10)
GMM 0.20 −0.03 −0.19∗∗∗ 0.38∗∗∗ [0.14, 0.62] 0.29 [−0.17, 0.75] 0.18∗ [−0.00, 0.36]
(0.54) (0.08) (0.08) (0.12) (0.23) (0.09)
B-2SLS 0.59 0.05 −0.23∗∗∗ 0.50∗∗∗ [0.20, 0.81] 0.09 [−0.47, 0.65] 0.16∗∗ [0.01, 0.31]
(0.60) (0.09) (0.09) (0.14) (0.26) (0.10)
B-GMM 0.50∗∗ 0.03 −0.22∗∗∗ 0.48∗∗∗ [0.40, 0.62] 0.13 [−0.05, 0.32] 0.15∗∗∗ [0.08, 0.23]
(0.24) (0.04) (0.04) (0.04) (0.09) (0.04)
Panel C: Post Great Moderation and Pre-2007 Crisis sample 1981:12-2006:12
2SLS 0.02 −0.00 −0.30∗∗∗ 0.35∗∗∗ [0.16, 0.53] 0.41∗∗ [0.02, 0.80] 0.24∗∗ [0.02, 0.45]
(0.48) (0.07) (0.08) (0.09) (0.20) (0.11)
GMM −0.16 0.01 −0.31∗∗∗ 0.35∗∗∗ [0.18, 0.53] 0.46∗∗ [0.08, 0.84] 0.22∗∗ [0.02, 0.41]
(0.47) (0.07) (0.08) (0.09) (0.20) (0.10)
B-2SLS −0.22 −0.03 −0.30∗∗∗ 0.32∗∗∗ [0.17, 0.48] 0.51∗∗∗ [0.20, 0.81] 0.22∗∗ [0.04, 0.41]
(0.48) (0.07) (0.08) (0.09) (0.20) (0.11)
B-GMM −0.32 −0.02 −0.30∗∗∗ 0.32∗∗∗ [0.26, 0.53] 0.54∗∗∗ [0.33, 0.74] 0.16∗∗∗ [0.05, 0.27]
(0.28) (0.05) (0.04) (0.03) (0.11) (0.05)
All GMM and B-GMM estimates are second step estimates with HAC robust standard errors. 2SLS refers to
the standard 2SLS on the full sample. ∗∗∗ indicates statistical significance at 1%, ∗∗ at 5%, and ∗ at 10%.
Table 2: Assessing the identification strength
Reduced rank test of Kleibergen and Paap (2006) for the baseline model (7.2)-(7.3)
Sample Full sample Pre Great Moderation Post Great Moderation
1978:5-2010:6 1978:5-1981:11 1981:12-1985:10 1985:11-2010:6
p-value 0.0000 0.0000 0.0000 0.0000
Figure 1: Inflation, Inflation Expectations and Output Gap
[Time-series plot of π_t, π_t^e, and y_t over 1978:5-2010:6.]
Table 3: Restricted Structural Form Estimation
Method ψc ρ1 ρ2 ρ3 95% CI ρ3 θ 95% CI θ ρ 95% CI ρ
Panel A: Full sample 1978:5-2010:6
GMM −1.36∗∗∗ 0.27 0.23 0.50∗∗ [0.07, 0.93] 0.9970 [−77.99, 79.99] 1.00∗ [−0.15, 2.15]
(0.23) (0.22) (0.17) (0.22) (40.30) (0.59)
B-GMM −1.36∗∗∗ 0.26∗∗ 0.18∗∗ 0.45∗∗∗ [0.29, 0.61] 0.9922 [−10.28, 12.27] 0.90∗∗∗ [0.38, 1.41]
(0.14) (0.12) (0.08) (0.08) (5.75) (0.26)
Panel B: Post Great Moderation sample 1981:12-2010:6
GMM −1.74∗∗∗ −0.01 0.03 0.26 [−0.08, 0.60] 0.8023∗∗∗ [0.31, 1.29] 0.29 [−0.64, 1.21]
(0.31) (0.18) (0.13) (0.17) (0.25) (0.48)
B-GMM −1.55∗∗∗ 0.13 0.16∗ 0.41∗∗∗ [0.18, 0.64] 0.8792∗∗ [0.18, 1.57] 0.70∗∗ [0.07, 1.33]
(0.18) (0.13) (0.09) (0.12) (0.35) (0.32)
Panel C: Post Great Moderation and Pre-2007 Crisis sample 1981:12-2006:12
GMM −1.89∗∗∗ −0.09 −0.07 0.29∗∗∗ [0.07, 0.50] 0.6823∗∗∗ [0.41, 0.95] 0.13 [−0.42, 0.68]
(0.23) (0.11) (0.08) (0.11) (0.14) (0.28)
B-GMM −1.94∗∗∗ −0.11 −0.05 0.24∗∗∗ [0.10, 0.38] 0.6821∗∗∗ [0.51, 0.85] 0.08 [−0.27, 0.44]
(0.17) (0.08) (0.05) (0.07) (0.09) (0.18)
All estimates are second step estimates with HAC robust standard errors. Confidence intervals for ρ =
ρ1+ ρ2+ ρ3 are obtained via the delta method. ∗∗∗ indicates statistical significance at 1%, ∗∗ at 5%, ∗ at 10%.
Table 4: Restricted Structural Form Estimation with ρ1 = ρ2 = 0
Method ψc ρ3 95% CI ρ3 θ 95% CI θ
Panel A: Full sample 1978:5-2010:6
GMM −1.69∗∗∗ 0.29∗∗∗ [0.19, 0.40] 0.7093∗∗∗ [0.30, 1.12]
(0.48) (0.05) (0.21)
B-GMM −1.73∗∗∗ 0.29∗∗∗ [0.22, 0.37] 0.9952 [−20.37, 22.36]
(0.25) (0.04) (10.90)
Panel B: Post Great Moderation sample 1981:12-2010:6
GMM −1.74∗∗∗ 0.29∗∗∗ [0.17, 0.40] 0.8651 [−0.33, 2.06]
(0.47) (0.06) (0.61)
B-GMM −1.76∗∗∗ 0.29∗∗∗ [0.19, 0.38] 0.9078 [−0.54, 2.35]
(0.36) (0.05) (0.74)
Panel C: Post Great Moderation and Pre-2007 Crisis sample 1981:12-2006:12
GMM −1.71∗∗∗ 0.37∗∗∗ [0.24, 0.50] 0.6638∗∗∗ [0.27, 1.06]
(0.58) (0.07) (0.20)
B-GMM −1.71∗∗∗ 0.33∗∗∗ [0.22, 0.43] 0.6288∗∗∗ [0.41, 0.85]
(0.40) (0.05) (0.11)
All estimates are second step estimates with HAC robust standard errors. ∗∗∗ indicates statistical significance
at 1%, ∗∗ at 5%, and ∗ at 10%.
Figure 2: Confidence regions for θ and ρ in the sample 1981:12-2006:12
[Two panels, GMM and B-GMM, each showing 90% and 95% confidence regions, with the indexation parameter ρ_3 on the horizontal axis and the price stickiness θ on the vertical axis.]
Table 5: Experiment 1 with known break and HOM
Benchmark case:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0023 0.0297 0.0298 0.1123 0.9340
B-2SLS 0.0023 0.0293 0.0294 0.1480 0.9860
GMM 0.0005 0.0342 0.0342 0.1305 0.9416
The concentration parameter µ_1^2 decreases from 40 to 8.4:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0031 0.0341 0.0342 0.1294 0.9342
B-2SLS 0.0030 0.0337 0.0338 0.1694 0.9852
GMM 0.0008 0.0416 0.0416 0.1590 0.9432
The concentration parameter µ_1^2 decreases from 40 to 1.6:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0035 0.0366 0.0367 0.1389 0.9356
B-2SLS 0.0034 0.0361 0.0363 0.1815 0.9850
GMM 0.0010 0.0464 0.0464 0.1775 0.9422
Increase the sample size from 400 to 800:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0010 0.0193 0.0193 0.0755 0.9478
B-2SLS 0.0010 0.0192 0.0192 0.0988 0.9882
GMM 0.0003 0.0219 0.0219 0.0862 0.9496
Increase the number of IV from 3 to 6:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0025 0.0209 0.0211 0.0778 0.9322
B-2SLS 0.0026 0.0205 0.0206 0.1032 0.9830
GMM 0.0014 0.0240 0.0241 0.0917 0.9416
Increase the endogeneity coefficient from 0.5 to 0.75:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0034 0.0296 0.0298 0.1122 0.9332
B-2SLS 0.0033 0.0293 0.0294 0.1478 0.9840
GMM 0.0008 0.0342 0.0342 0.1305 0.9400
Table 6: Experiment 1 with known break and HET1
Benchmark case:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0047 0.0447 0.0450 0.1581 0.9110
B-2SLS 0.0022 0.0442 0.0442 0.2178 0.9826
GMM 0.0015 0.0527 0.0527 0.1930 0.9242
The concentration parameter µ_1^2 decreases from 40 to 8.4:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0061 0.0512 0.0516 0.1819 0.9140
B-2SLS 0.0029 0.0506 0.0507 0.2484 0.9802
GMM 0.0022 0.0641 0.0641 0.2352 0.9254
The concentration parameter µ_1^2 decreases from 40 to 1.6:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0070 0.0548 0.0553 0.1952 0.9158
B-2SLS 0.0033 0.0542 0.0543 0.2660 0.9812
GMM 0.0027 0.0715 0.0715 0.2625 0.9270
Increase the sample size from 400 to 800:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0023 0.0303 0.0304 0.1149 0.9396
B-2SLS 0.0010 0.0300 0.0300 0.1540 0.9866
GMM 0.0009 0.0346 0.0346 0.1339 0.9446
Increase the number of IV from 3 to 6:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0043 0.0360 0.0362 0.1227 0.8996
B-2SLS 0.0018 0.0348 0.0348 0.1722 0.9862
GMM 0.0020 0.0411 0.0411 0.1496 0.9270
Increase the endogeneity coefficient from 0.5 to 0.75:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0068 0.0447 0.0452 0.1579 0.9074
B-2SLS 0.0031 0.0441 0.0442 0.2176 0.9804
GMM 0.0022 0.0527 0.0528 0.1931 0.9228
Table 7: Experiment 1 with known break and HET2
Benchmark case:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0015 0.0245 0.0246 0.0876 0.9342
B-2SLS 0.0015 0.0284 0.0284 0.1282 0.9894
GMM 0.0004 0.0305 0.0305 0.1106 0.9490
The concentration parameter µ_1^2 decreases from 40 to 8.4:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0019 0.0285 0.0286 0.1018 0.9344
B-2SLS 0.0020 0.0325 0.0326 0.1459 0.9884
GMM 0.0006 0.0372 0.0372 0.1350 0.9508
The concentration parameter µ_1^2 decreases from 40 to 1.6:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0022 0.0309 0.0309 0.1097 0.9358
B-2SLS 0.0023 0.0349 0.0350 0.1561 0.9874
GMM 0.0007 0.0414 0.0414 0.1508 0.9522
Increase the sample size from 400 to 800:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0006 0.0171 0.0171 0.0621 0.9340
B-2SLS 0.0007 0.0203 0.0203 0.0895 0.9880
GMM -0.0001 0.0201 0.0201 0.0746 0.9502
Increase the number of IV from 3 to 6:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0014 0.0165 0.0166 0.0578 0.9198
B-2SLS 0.0015 0.0211 0.0211 0.0899 0.9874
GMM 0.0005 0.0196 0.0196 0.0728 0.9408
Increase the endogeneity coefficient from 0.5 to 0.75:
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0023 0.0245 0.0247 0.0875 0.9314
B-2SLS 0.0023 0.0284 0.0285 0.1281 0.9884
GMM 0.0006 0.0306 0.0306 0.1106 0.9498
Table 8: Experiment 1 with unknown break and HOM
Break size is equal to 1
Monte-Carlo average of the estimated break location is T̂ = 161.3
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0030 0.0290 0.0292 0.1124 0.9380
B-2SLS 0.0029 0.0287 0.0288 0.3827 1.0000
GMM -0.0003 0.0338 0.0338 0.1307 0.9490
Break size is equal to 0.5
Monte-Carlo average of the estimated break location is T̂ = 162.2
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0083 0.0461 0.0468 0.1771 0.9310
B-2SLS 0.0080 0.0454 0.0460 0.2865 0.9970
GMM -0.0000 0.0508 0.0508 0.1964 0.9470
Break size is equal to 0.2
Monte-Carlo average of the estimated break location is T̂ = 172.4
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0229 0.0686 0.0723 0.2619 0.9190
B-2SLS 0.0222 0.0675 0.0710 0.2430 0.9095
GMM 0.0008 0.0729 0.0729 0.2815 0.9475
The true break is T ∗ = 160.
Table 9: Experiment 1 with unknown break and HET1
Break size is equal to 1
Monte-Carlo average of the estimated break location is T̂ = 161.5
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0056 0.0447 0.0450 0.1580 0.9112
B-2SLS 0.0030 0.0441 0.0442 0.2184 0.9828
GMM 0.0015 0.0527 0.0527 0.1930 0.9242
Break size is equal to 0.5
Monte-Carlo average of the estimated break location is T̂ = 162.2
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0150 0.0711 0.0726 0.2480 0.8988
B-2SLS 0.0085 0.0701 0.0706 0.3510 0.9812
GMM 0.0033 0.0791 0.0792 0.2894 0.9228
Break size is equal to 0.2
Monte-Carlo average of the estimated break location is T̂ = 172.4
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0422 0.1059 0.1140 0.3594 0.8652
B-2SLS 0.0239 0.1042 0.1069 0.5387 0.9792
GMM 0.0066 0.1131 0.1133 0.4131 0.9218
The true break is T ∗ = 160.
Table 10: Experiment 1 with unknown break and HET2
Break size is equal to 1
Monte-Carlo average of the estimated break location is T̂ = 161.3
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0021 0.0245 0.0246 0.0876 0.9346
B-2SLS 0.0021 0.0284 0.0285 0.4288 0.9978
GMM 0.0004 0.0305 0.0305 0.1106 0.9490
Break size is equal to 0.5
Monte-Carlo average of the estimated break location is T̂ = 162.2
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0057 0.0382 0.0386 0.1362 0.9290
B-2SLS 0.0062 0.0454 0.0458 0.3189 0.9846
GMM 0.0009 0.0459 0.0459 0.1660 0.9482
Break size is equal to 0.2
Monte-Carlo average of the estimated break location is T̂ = 172.4
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0158 0.0557 0.0579 0.1976 0.9058
B-2SLS 0.0176 0.0696 0.0717 0.2687 0.8996
GMM 0.0018 0.0658 0.0658 0.2375 0.9458
The true break is T ∗ = 160.
Figure 3: Experiment 2 for model (i) and HOM (top) and HET2 (bottom)
[Four-panel plot. Left panels show MC-RMSE and right panels show MC standard deviation for B-GMM (red o), B-2SLS (blue x), and GMM (green +).]
Figure 4: Experiment 2 for model (ii) and HOM (top) and HET2 (bottom)
[Four-panel plot. Left panels show MC-RMSE and right panels show MC standard deviation for B-GMM (red o), B-2SLS (blue x), and GMM (green +).]
Table 11: Experiment 3: no break
HOM Monte-Carlo average of estimated break location is T = 199.9
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0504 0.0968 0.1091 0.3626 0.8770
B-2SLS 0.0489 0.0945 0.1064 0.5365 0.9758
GMM 0.0049 0.1037 0.1038 0.3949 0.9390
HET1 Monte-Carlo average of estimated break location is T = 199.9
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.1244 0.2032 0.2383 0.6751 0.8092
B-2SLS 0.0668 0.2041 0.2147 1.1031 0.9802
GMM 0.0175 0.2240 0.2246 0.8168 0.9200
T = 400, ρ = 0.5, q = 4, F1 = 33.
Table 12: Experiment 3: no break
HOM Monte-Carlo average of estimated break location is T = 396.6
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0249 0.0666 0.0711 0.2602 0.9184
B-2SLS 0.0245 0.0659 0.0703 0.3743 0.9854
GMM 0.0027 0.0687 0.0688 0.2711 0.9482
HET1 Monte-Carlo average of estimated break location is T = 396.6
Estimator Bias Std dev RMSE Length Coverage
B-GMM 0.0694 0.1470 0.1625 0.5369 0.8826
B-2SLS 0.0351 0.1458 0.1500 0.8084 0.9848
GMM 0.0104 0.1525 0.1528 0.5911 0.9376
T = 800, ρ = 0.5, q = 4, F1 = 66.
B Proofs of Theorems
We assume throughout the proofs that λ < λ0. The proofs for λ ≥ λ0 are similar and omitted
for brevity. They are written for one break-point, but they generalize to multiple break-points
in a straightforward fashion, and this generalization is omitted for brevity.
Notation. For simplicity, we drop the subscripts RF, SMI, VMC and the superscripts BP, QP. Therefore, in all the proofs, we denote any true break-point by $T^0 = T_{1\lambda^0} = [T\lambda^0]$, any candidate break-point by $T_{1\lambda} = [T\lambda]$, and any estimated break-point by $\hat{T} = T_{1\hat{\lambda}} = [T\hat{\lambda}]$ (except in the proofs of Theorems 3-7, where the FGLS estimator is denoted by $\tilde{T} = T_{1\tilde{\lambda}} = [T\tilde{\lambda}]$ because it could otherwise be confused with other break-point estimators). Let $T_{2\lambda} = T - T_{1\lambda}$.
The subscripts $1\lambda$, $2\lambda$ and $\Delta$ on an estimator or a sum refer to estimation or summation over the segments $1,\ldots,T_{1\lambda}$, $T_{1\lambda}+1,\ldots,T$ and $T_{1\lambda}+1,\ldots,T_{1\lambda^0}$, respectively. Let $M_{2\lambda} = M_{11} - M_{1\lambda}$, $M_i = M_{i\lambda^0}$ for $i = 1, 2$, and $M_\Delta = M_{1\lambda^0} - M_{1\lambda}$. Let $\Pi^0_{iT} \stackrel{def}{=} \Pi_i/r_{iT}$ and $\Pi^a_{iT} = \Pi^a_i/r_{iT}$, where $\Pi^a_i$ is defined in Theorem 1. When there is no potential for confusion, we also write $\Pi^0_i = \Pi_i/r_{iT}$. We let $\Psi^{zu}_{i\lambda} = T^{-1/2}\sum_{i\lambda} Z_t u_t$ for $i = 1, 2$ and $\Psi^{zu}_\Delta = T^{-1/2}\sum_\Delta Z_t u_t$. Also, $\Psi^{zv}_{i\lambda} = T^{-1/2}\sum_{i\lambda} Z_t v_{t,1}$ for $i = 1, 2$ and $\Psi^{zv}_\Delta = T^{-1/2}\sum_\Delta Z_t v_{t,1}$. Let $R_T = \mathrm{diag}(I_{p_1}, T^\alpha I_{p_2})$. Also, u.$\lambda$. means uniformly in $\lambda \in (0,1]$; $\mathrm{diag}(A_1, A_2)$ creates a block-diagonal matrix with the matrices $A_1, A_2$ on the main diagonal; $\mathrm{vec}(A)$ stacks the elements of the matrix $A$ into a vector, in order, column by column; and $\mathrm{vech}(A)$ does the same but removes the elements that repeat. Let $\|v\|$ denote the Euclidean norm for vectors $v$, $\|J\|$ the square root of the maximum eigenvalue of $J'J$ for matrices, and $\|\cdot\|_p = [E(\|\cdot\|^p)]^{1/p}$ the $L_p$ norm. If similar quantities appear across proofs, their notation is repeated unless there is potential for confusion.
Proof of Theorem 1.
• Part (i): Asymptotic distribution of GMM. By A1(ii), the full-sample moment conditions are satisfied. Let $\hat{N}_u^{-1}$ be the weighting matrix. Then:

$\hat{\theta}_{GMM} = \left[\frac{W'Z}{T}\hat{N}_u^{-1}\frac{Z'W}{T}\right]^{-1}\frac{W'Z}{T}\hat{N}_u^{-1}\frac{Z'}{T}(W\theta^0 + U)$

$\Leftrightarrow\; T^{1/2}R_T^{-1}(\hat{\theta}_{GMM}-\theta^0) = \left[R_T\frac{W'Z}{T}\hat{N}_u^{-1}\frac{Z'W}{T}R_T\right]^{-1}R_T\frac{W'Z}{T}\hat{N}_u^{-1}\frac{Z'U}{T^{1/2}}$

$R_T T^{-1}W'Z = R_T\Pi^{a\prime}_{1T}M_{1\lambda^0} + R_T\Pi^{a\prime}_{2T}M_{2\lambda^0} + R_T T^{-1/2}\Psi^{zu}_{1\lambda^0} + R_T T^{-1/2}\Psi^{zu}_{2\lambda^0} = R_T\Pi^{a\prime}_{1T}M_1 + R_T\Pi^{a\prime}_{2T}M_2 + o_P(1),$

since $\Psi^{zu}_{i\lambda^0} = T^{-1/2}\sum_{i\lambda^0} z_t u_t = O_P(1)$ by A1, A4 and the functional central limit theorem in Wooldridge and White (1988), Theorem 2.11 (FCLT). Also, by the CLT, $T^{-1/2}Z'U \stackrel{D}{\rightarrow} \mathcal{N}(0, N_u)$.
Case (a): $\alpha = \alpha_1 = \alpha_2$. If we let $\mu_i' = \Pi^{a\prime}_i M_i$, then $R_T T^{-1}W'Z = \Pi^{a\prime}_1 M_1 + \Pi^{a\prime}_2 M_2 + o_P(1) = \mu_1' + \mu_2' + o_P(1)$. Hence, using the optimal GMM estimator with $\hat{N}_u \stackrel{P}{\rightarrow} N_u$,

$T^{1/2}R_T^{-1}(\hat{\theta}_{GMM}-\theta^0) \stackrel{D}{\rightarrow} \mathcal{N}\left(0, [(\mu_1+\mu_2)'N_u^{-1}(\mu_1+\mu_2)]^{-1}\right) = \mathcal{N}(0, V_{GMM}).$

Case (b): $\alpha = \alpha_i < \alpha_j$. Then, as before,

$R_T T^{-1}W'Z = R_T\Pi^{a\prime}_{1T}M_1 + R_T\Pi^{a\prime}_{2T}M_2 + o_P(1) \stackrel{P}{\rightarrow} \Pi^{a\prime}_1 M_1 + \Pi^{a\prime}_2 M_2 = \mu_1' + \mu_2',$

where now $\Pi^a_j = [\Pi_{z1}, O_{(q,p_2)}]$. Since $T^{-1/2}Z'U \stackrel{D}{\rightarrow} \mathcal{N}(0, N_u)$, the optimal GMM estimator is obtained for $\hat{N}_u \stackrel{P}{\rightarrow} N_u$, and so

$T^{1/2}R_T^{-1}(\hat{\theta}_{GMM}-\theta^0) \stackrel{D}{\rightarrow} \mathcal{N}\left(0, [(\mu_1+\mu_2)'N_u^{-1}(\mu_1+\mu_2)]^{-1}\right).$

This also shows consistency: $\hat{\theta}_{GMM}-\theta^0 = R_T O_P(1) T^{-1}Z'u = O_P(T^{-1/2}r_T) = o_P(1)$.
• Part (ii): Asymptotic distribution of B-GMM. Let $Z_A$ be the $(\hat{T}, q)$ matrix with rows $Z_1',\ldots,Z_{\hat{T}}'$ and $Z_B$ the $(T-\hat{T}, q)$ matrix with rows $Z_{\hat{T}+1}',\ldots,Z_T'$. With the weighting matrix $(\hat{N}^a_u)^{-1}$, $\hat{\theta}_{B\text{-}GMM} = [W'\mathcal{Z}(\hat{N}^a_u)^{-1}\mathcal{Z}'W]^{-1}W'\mathcal{Z}(\hat{N}^a_u)^{-1}\mathcal{Z}'(W\theta^0 + U)$, where $\mathcal{Z} = \mathrm{diag}(Z_A/\hat{T},\; Z_B/(T-\hat{T}))$ (since the scalings on $Z_A, Z_B$ cancel out in the B-GMM formula). So,

$T^{1/2}R_T^{-1}[\hat{\theta}_{B\text{-}GMM}-\theta^0] = [R_T W'\mathcal{Z}(\hat{N}^a_u)^{-1}\mathcal{Z}'WR_T]^{-1}R_T W'\mathcal{Z}(\hat{N}^a_u)^{-1}[\mathcal{Z}'U\sqrt{T}]$ (B.1)

Because we assume that $\hat{\lambda}-\lambda^0 = O_P(T^{2\alpha-1})$, it can be shown, by arguments similar to the proof of Theorem 8 in HHB (see their supplemental appendix), that $R_T T^{-1}\sum_{i\hat{\lambda}}W_t Z_t' - R_T\Pi^{a\prime}_{iT}M_i = o_P(1)$ for $i = 1, 2$. It follows that:

$R_T\sum_{1\hat{\lambda}}W_t Z_t'/\hat{T} = R_T\frac{T}{\hat{T}}\Big[T^{-1}\sum_{1\hat{\lambda}}W_t Z_t'\Big] = R_T\Pi^{a\prime}_{1T}M_1/\lambda^0 + o_P(1).$

Similarly, $R_T\sum_{2\hat{\lambda}}W_t Z_t'/(T-\hat{T}) = R_T\Pi^{a\prime}_{2T}M_2/(1-\lambda^0) + o_P(1)$, hence:

$R_T W'\mathcal{Z} = [R_T\Pi^{a\prime}_{1T}M_1/\lambda^0,\; R_T\Pi^{a\prime}_{2T}M_2/(1-\lambda^0)] + o_P(1) = K + o_P(1).$ (B.2)

On the other hand, $\mathcal{Z}'U\sqrt{T} = [\sqrt{T}\sum_{1\hat{\lambda}}u_t Z_t'/\hat{T},\; \sqrt{T}\sum_{2\hat{\lambda}}u_t Z_t'/(T-\hat{T})]'$. Because $\hat{\lambda}-\lambda^0 = O_P(T^{2\alpha-1})$, it can be shown, by arguments similar to the proof of Theorem 8 in HHB, that $T^{-1/2}\sum_{i\hat{\lambda}}Z_t u_t - T^{-1/2}\sum_{i\lambda^0}Z_t u_t = o_P(1)$ for $i = 1, 2$. So:

$\sqrt{T}\sum_{1\hat{\lambda}}u_t Z_t/\hat{T} = T(\hat{T})^{-1}\,T^{-1/2}\sum_{1\hat{\lambda}}u_t Z_t + o_P(1) \stackrel{D}{\rightarrow} \mathcal{N}(0, N_{u,1}/(\lambda^0)^2), \qquad \sqrt{T}\sum_{2\hat{\lambda}}u_t Z_t/(T-\hat{T}) \stackrel{D}{\rightarrow} \mathcal{N}(0, N_{u,2}/(1-\lambda^0)^2).$

Because $\sqrt{T}\sum_{1\hat{\lambda}}u_t Z_t'/\hat{T}$ and $\sqrt{T}\sum_{2\hat{\lambda}}u_t Z_t'/(T-\hat{T})$ are asymptotically independent by A1(ii), $\mathcal{Z}'U\sqrt{T} \stackrel{D}{\rightarrow} \mathcal{N}(0, N^a_u)$, where $N^a_u = \mathrm{diag}[N_{u,1}/(\lambda^0)^2,\; N_{u,2}/(1-\lambda^0)^2]$. From the latter and (B.1)-(B.2), $T^{1/2}R_T^{-1}[\hat{\theta}_{B\text{-}GMM}-\theta^0] = V_{B\text{-}GMM}^{1/2}\,\mathcal{N}(0, I_p) + o_P(1)$, where

$V_{B\text{-}GMM} = [K(\hat{N}^a_u)^{-1}K']^{-1}K(\hat{N}^a_u)^{-1}N^a_u(\hat{N}^a_u)^{-1}K'[K(\hat{N}^a_u)^{-1}K']^{-1}.$

Setting $\hat{N}^a_u \stackrel{P}{\rightarrow} N^a_u$ yields $V_{B\text{-}GMM} = [K(N^a_u)^{-1}K']^{-1} + o_P(1)$. Since

$R_T W'\mathcal{Z} = [R_T\Pi^{a\prime}_{1T}M_1/\lambda^0,\; R_T\Pi^{a\prime}_{2T}M_2/(1-\lambda^0)] + o_P(1) = [\mu_1'/\lambda^0,\; \mu_2'/(1-\lambda^0)] + o_P(1),$

$V_{B\text{-}GMM} = [\mu_1'(N_{u,1})^{-1}\mu_1 + \mu_2'(N_{u,2})^{-1}\mu_2]^{-1}.$

These results also imply that $\hat{\theta}_{B\text{-}GMM}-\theta^0 = R_T O_P(1)(\mathcal{Z}'u) = o_P(1)$.
Since B-2SLS is a special case of B-GMM with the weighting matrix $\mathrm{diag}((\hat{T})^{-1}Z_A'Z_A,\; (T-\hat{T})^{-1}Z_B'Z_B) \stackrel{P}{\rightarrow} \mathrm{diag}(M_1/\lambda^0,\; M_2/(1-\lambda^0))$, the proof follows in a similar fashion.
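As a numerical illustration of the split-sample construction above, the following sketch (our own illustration, not the authors' code) compares full-sample 2SLS with a B-2SLS-style estimator that fits the first stage separately in each regime. The break date, instrument strengths and error correlation below are made-up simulation choices; the paper's B-GMM additionally re-weights the two blocks of moments by their subsample variances.

```python
import numpy as np

rng = np.random.default_rng(0)
T, T0, theta0, q = 400, 160, 1.0, 4
Pi1 = np.full(q, 1.0)          # stronger first stage before the break
Pi2 = np.full(q, 0.2)          # weaker first stage after the break
Z = rng.standard_normal((T, q))
v = rng.standard_normal(T)
u = 0.5 * v + rng.standard_normal(T)   # endogeneity: corr(u, v) > 0
x = np.where(np.arange(T) < T0, Z @ Pi1, Z @ Pi2) + v
y = theta0 * x + u

def tsls(y, x, Z):
    # standard 2SLS with a single endogenous regressor
    xhat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    return float(xhat @ y / (xhat @ x))

# full-sample 2SLS ignores the first-stage break
theta_full = tsls(y, x, Z)

# B-2SLS-style estimator: fit the first stage separately in each regime,
# then use the regime-specific fitted values as instruments
xhat = np.concatenate([
    Z[:T0] @ np.linalg.lstsq(Z[:T0], x[:T0], rcond=None)[0],
    Z[T0:] @ np.linalg.lstsq(Z[T0:], x[T0:], rcond=None)[0],
])
theta_split = float(xhat @ y / (xhat @ x))
print(theta_full, theta_split)
```

Both estimators are consistent here; the efficiency gain of the split-sample version shows up in its (Monte Carlo) variance, as documented in the simulation tables above.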
Proof of Theorem 2. Recall that:

$V_{B\text{-}2SLS} = [\Pi^{a\prime}_1 M_1\Pi^a_1 + \Pi^{a\prime}_2 M_2\Pi^a_2]^{-1}[\Pi^{a\prime}_1 N_{u,1}\Pi^a_1 + \Pi^{a\prime}_2 N_{u,2}\Pi^a_2][\Pi^{a\prime}_1 M_1\Pi^a_1 + \Pi^{a\prime}_2 M_2\Pi^a_2]^{-1}$
$V_{GMM} = [(\Pi^{a\prime}_1 M_1 + \Pi^{a\prime}_2 M_2)(N_{u,1}+N_{u,2})^{-1}(M_1\Pi^a_1 + M_2\Pi^a_2)]^{-1}$
$V_{B\text{-}GMM} = [\Pi^{a\prime}_1 M_1 N_{u,1}^{-1}M_1\Pi^a_1 + \Pi^{a\prime}_2 M_2 N_{u,2}^{-1}M_2\Pi^a_2]^{-1}.$
Since $\hat{\theta}_{B\text{-}GMM}$ is the optimal version of $\hat{\theta}_{B\text{-}2SLS}$, $V_{B\text{-}GMM} \leq V_{B\text{-}2SLS}$. We show below that $V_{B\text{-}GMM} \leq V_{GMM}$. Let $N_u = N_{u,1} + N_{u,2}$, $\mu = \mathrm{vec}[\mu_1\; \mu_2]$, $\vartheta = \mathrm{vec}[\vartheta_1\; \vartheta_2]$ with $\vartheta_i = \mu_i a$ for any $(p,1)$ vector $a$, and $L = N_{u,2}N_{u,1}^{-1}$. Then:

$a'(V_{B\text{-}GMM}^{-1} - V_{GMM}^{-1})a = \vartheta_1'N_{u,1}^{-1}\vartheta_1 + \vartheta_2'N_{u,2}^{-1}\vartheta_2 - (\vartheta_1 + \vartheta_2)'N_u^{-1}(\vartheta_1 + \vartheta_2)$
$= \vartheta'\begin{pmatrix} N_{u,1}^{-1} & O \\ O & N_{u,2}^{-1} \end{pmatrix}\vartheta - \vartheta'\begin{pmatrix} N_u^{-1} & N_u^{-1} \\ N_u^{-1} & N_u^{-1} \end{pmatrix}\vartheta = \vartheta'\begin{pmatrix} N_{u,1}^{-1} - N_u^{-1} & -N_u^{-1} \\ -N_u^{-1} & N_{u,2}^{-1} - N_u^{-1} \end{pmatrix}\vartheta = \vartheta'\begin{pmatrix} N_u^{-1}L & -N_u^{-1} \\ -N_u^{-1} & N_u^{-1}L^{-1} \end{pmatrix}\vartheta \equiv f(\vartheta).$

It is well known that, for any matrix $M^* = \begin{bmatrix} A & B \\ B' & C \end{bmatrix}$, where $A, C$ are square symmetric matrices of the same dimension and $C$ is pd, $M^*$ is psd (positive semi-definite) iff $A - BC^{-1}B'$ (the Schur complement of $C$) is psd. In our case, $M^* = \begin{bmatrix} N_u^{-1}L & -N_u^{-1} \\ -N_u^{-1} & N_u^{-1}L^{-1} \end{bmatrix}$, $N_u^{-1}L^{-1} = N_{u,2}^{-1} - (N_{u,1} + N_{u,2})^{-1}$ is pd by construction, and its Schur complement is $N_u^{-1}L - N_u^{-1}LN_uN_u^{-1} = O$. Thus, $M^*$ is psd, and $f(\vartheta) \geq 0$. This implies that $V_{B\text{-}GMM}^{-1} \geq V_{GMM}^{-1}$, so $V_{B\text{-}GMM} \leq V_{GMM}$. Moreover, because $f(\vartheta) \geq 0$ and is convex, its minimum is attained at 0 only for the $\vartheta$ values that satisfy $\partial f(\vartheta)/\partial\vartheta' = 0$:

$\frac{\partial f(\vartheta)}{\partial\vartheta_1} = 2N_u^{-1}(L\vartheta_1 - \vartheta_2) = 0; \qquad \frac{\partial f(\vartheta)}{\partial\vartheta_2} = 2N_u^{-1}(L^{-1}\vartheta_2 - \vartheta_1) = 0.$

Thus, $V_{GMM} = V_{B\text{-}GMM}$ for $\vartheta \neq 0$ iff $L\vartheta_1 = \vartheta_2$. Equivalently, $V_{GMM} = V_{B\text{-}GMM}$ for $a \neq 0$ iff $(N_{u,1}^{-1}M_1\Pi^a_1 - N_{u,2}^{-1}M_2\Pi^a_2)a = 0$. This holds for all $a \neq 0$ when $N_{u,1}^{-1}M_1\Pi^a_1 = N_{u,2}^{-1}M_2\Pi^a_2$. If $\mathrm{rank}(N_{u,1}^{-1}M_1\Pi^a_1 - N_{u,2}^{-1}M_2\Pi^a_2) = p$ (full rank), it cannot hold for any $a \neq 0$, because for any $a \neq 0$, $\mathrm{rank}[(N_{u,1}^{-1}M_1\Pi^a_1 - N_{u,2}^{-1}M_2\Pi^a_2)a] = 1$, so the minimum of $f(\vartheta)$ is achieved (at 0) only for $a = 0$ and it is unique. So, $f(\vartheta) > 0$ for all $a \neq 0$, implying that, when $\mathrm{rank}(N_{u,1}^{-1}M_1\Pi^a_1 - N_{u,2}^{-1}M_2\Pi^a_2) = p$, $V_{GMM} - V_{B\text{-}GMM}$ is pd (positive definite).
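The inequality $V_{B\text{-}GMM} \leq V_{GMM}$ can be spot-checked numerically: for arbitrary positive-definite $N_{u,1}, N_{u,2}$ and arbitrary Jacobian-type matrices $\mu_1, \mu_2$, the difference $V^{-1}_{B\text{-}GMM} - V^{-1}_{GMM}$ should be positive semi-definite. A minimal sketch (all inputs randomly generated for illustration; not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
q, p = 4, 2

def rand_pd(n):
    # random symmetric positive-definite matrix
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

# random subsample long-run variances and moment Jacobians mu_i = M_i Pi_i^a
Nu1, Nu2 = rand_pd(q), rand_pd(q)
mu1, mu2 = rng.standard_normal((q, p)), rng.standard_normal((q, p))
Nu = Nu1 + Nu2

Vinv_bgmm = mu1.T @ np.linalg.solve(Nu1, mu1) + mu2.T @ np.linalg.solve(Nu2, mu2)
Vinv_gmm = (mu1 + mu2).T @ np.linalg.solve(Nu, mu1 + mu2)

# by the Schur-complement argument, this difference is psd
eigs = np.linalg.eigvalsh(Vinv_bgmm - Vinv_gmm)
print(eigs.min())
```

The minimum eigenvalue is nonnegative (up to numerical noise) for any draw, matching the matrix inequality established in the proof.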
• Derivations for comment (iv) after Theorem 2. Under conditional homoskedasticity, we have the usual result that $V_{B\text{-}GMM} = V_{B\text{-}2SLS}$. Also, $N_{u,1} = \lambda^0\Phi_u Q$, $N_{u,2} = (1-\lambda^0)\Phi_u Q$, $M_1 = \lambda^0 Q$, $M_2 = (1-\lambda^0)Q$, and for any $(p,1)$ vector $a \neq 0$,

$\Phi_u V_{B\text{-}GMM}^{-1} = \lambda^0\Pi^{a\prime}_1 Q\Pi^a_1 + (1-\lambda^0)\Pi^{a\prime}_2 Q\Pi^a_2$
$\Phi_u V_{GMM}^{-1} = [\Pi^{a\prime}_1\lambda^0 + \Pi^{a\prime}_2(1-\lambda^0)]\,Q\,[\Pi^a_1\lambda^0 + \Pi^a_2(1-\lambda^0)]$
$\Phi_u\, a'(V_{GMM}^{-1} - V_{B\text{-}GMM}^{-1})a = -\lambda^0(1-\lambda^0)\,[(\Pi^a_1 - \Pi^a_2)a]'\,Q\,[(\Pi^a_1 - \Pi^a_2)a] < 0,$

so $V_{GMM} - V_{B\text{-}GMM}$ is pd.
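The displayed identity can also be verified numerically. The sketch below (our illustration, with arbitrary $Q$, $\Pi^a_1$, $\Pi^a_2$, $\lambda^0$ and $\Phi_u$) checks that $\Phi_u\,a'(V^{-1}_{GMM}-V^{-1}_{B\text{-}GMM})a$ equals $-\lambda^0(1-\lambda^0)[(\Pi^a_1-\Pi^a_2)a]'Q[(\Pi^a_1-\Pi^a_2)a]$:

```python
import numpy as np

rng = np.random.default_rng(2)
q, p, lam, Phi = 4, 2, 0.4, 1.7
A = rng.standard_normal((q, q))
Q = A @ A.T + q * np.eye(q)        # arbitrary pd second-moment matrix
P1, P2 = rng.standard_normal((q, p)), rng.standard_normal((q, p))
a = rng.standard_normal(p)

# homoskedastic structure: N_{u,i} proportional to M_i = fraction * Q
Nu1, Nu2 = lam * Phi * Q, (1 - lam) * Phi * Q
M1, M2 = lam * Q, (1 - lam) * Q
mu1, mu2 = M1 @ P1, M2 @ P2

Vinv_bgmm = mu1.T @ np.linalg.solve(Nu1, mu1) + mu2.T @ np.linalg.solve(Nu2, mu2)
Vinv_gmm = (mu1 + mu2).T @ np.linalg.solve(Nu1 + Nu2, mu1 + mu2)

lhs = Phi * (a @ (Vinv_gmm - Vinv_bgmm) @ a)
rhs = -lam * (1 - lam) * ((P1 - P2) @ a) @ Q @ ((P1 - P2) @ a)
print(lhs, rhs)
```

The two sides agree to machine precision, and both are strictly negative whenever $(\Pi^a_1-\Pi^a_2)a \neq 0$.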
• Derivations for comment (ii) after A6. Let, wlog, $r_{1T} = o(r_{2T})$. Then $\Pi^a_1 = [\Pi_{z1}, \Pi_1]$ and $\Pi^a_2 = [\Pi_{z1}, O_{(q,p_2)}]$. Then:

$V_{GMM}^{-1} = \begin{bmatrix} A_1 & B_1 \\ B_1' & C_1 \end{bmatrix}, \qquad V_{B\text{-}GMM}^{-1} = \begin{bmatrix} A_2 & B_2 \\ B_2' & C_2 \end{bmatrix},$

where $A_1 = \Pi_{z1}'(MN_u^{-1}M)\Pi_{z1}$, $B_1 = \Pi_{z1}'(MN_u^{-1}M_1)\Pi_1$, $C_1 = \Pi_1'(M_1N_u^{-1}M_1)\Pi_1$, $A_2 = \Pi_{z1}'(M_1N_{u,1}^{-1}M_1 + M_2N_{u,2}^{-1}M_2)\Pi_{z1}$, $B_2 = \Pi_{z1}'(M_1N_{u,1}^{-1}M_1)\Pi_1$, and $C_2 = \Pi_1'(M_1N_{u,1}^{-1}M_1)\Pi_1$. Let $D_i = C_i - B_i'A_i^{-1}B_i$ be the Schur complement of $A_i$, $i = 1, 2$. Using the formula for partitioned inverses,

$V_{GMM} - V_{B\text{-}GMM} = \begin{bmatrix} A_1^{-1} - A_2^{-1} + A_1^{-1}B_1D_1^{-1}B_1'A_1^{-1} - A_2^{-1}B_2D_2^{-1}B_2'A_2^{-1} & -A_1^{-1}B_1D_1^{-1} + A_2^{-1}B_2D_2^{-1} \\ -D_1^{-1}B_1'A_1^{-1} + D_2^{-1}B_2'A_2^{-1} & D_1^{-1} - D_2^{-1} \end{bmatrix}.$

The B-GMM estimates of $\theta^0_x$ are strictly more efficient than GMM, i.e. the lower-right $(p_2, p_2)$ block of $V_{GMM} - V_{B\text{-}GMM}$ is pd, iff $D_1^{-1} - D_2^{-1}$ is pd or, equivalently, $D_2 - D_1 = C_2 - C_1 + B_1'A_1^{-1}B_1 - B_2'A_2^{-1}B_2$ is pd.
Under A6, $M_1 = \lambda^0 M$, $M_2 = (1-\lambda^0)M$, $N_{u,1} = \lambda^0 N_u$, $N_{u,2} = (1-\lambda^0)N_u$. Let $\Omega = MN_u^{-1}M$. Then $C_2 - C_1 = \lambda^0(1-\lambda^0)\Pi_1'\Omega\Pi_1$, which is pd because $\Pi_1$ has full rank by Assumption A2(ii). On the other hand, $B_2 = \lambda^0\Pi_{z1}'\Omega\Pi_1 = B_1$, and $A_2 = \Pi_{z1}'(\lambda^0\Omega + (1-\lambda^0)\Omega)\Pi_{z1} = \Pi_{z1}'\Omega\Pi_{z1} = A_1$, so $D_2 - D_1 = C_2 - C_1$ is pd.
Proof of Theorem 3.
Part (i): Consistency and rate of convergence of $\hat{\lambda}$. Note that the BP estimator is univariate, so in the proof of part (i) we treat $X_t$ as a scalar.
• Consistency of $\hat{\lambda}$. Let $\hat{v}_t = X_t - Z_t'\hat{\Pi}_{1\lambda}$ on the interval $[1, \hat{T}]$, $\hat{v}_t = X_t - Z_t'\hat{\Pi}_{2\lambda}$ on the interval $[\hat{T}+1, T]$, and $d_t = \hat{v}_t - v_t$. By definition of the sum of squared residuals,

$\sum_{t=1}^T \hat{v}_t^2 \leq \sum_{t=1}^T v_t^2 \;\Rightarrow\; 2\sum_{t=1}^T v_t d_t + \sum_{t=1}^T d_t^2 \leq 0.$ (B.3)

Recall that $r_T = T^\alpha$. We show consistency by contradiction in two steps. In step 1, we show:

$T^{2\alpha-1}\sum_{t=1}^T v_t d_t \equiv T^{2\alpha}L_1 = o_P(1) \quad\text{and}\quad T^{2\alpha-1}\sum_{t=1}^T d_t^2 \equiv T^{2\alpha}L_2 = O_P(1).$ (B.4)

Therefore, $T^{2\alpha}L_2$ dominates $T^{2\alpha}L_1$. Substituting (B.4) into (B.3), we have $T^{2\alpha}L_2 + o_P(1) \leq 0$. It follows that $T^{2\alpha}L_2 = o_P(1)$. In step 2, we show that if $\hat{\lambda}$ does not converge in probability to $\lambda^0$, then, with positive constant probability, $T^{2\alpha}L_2 > 0$, contradicting $T^{2\alpha}L_2 = o_P(1)$. Thus, $\hat{\lambda} \stackrel{P}{\rightarrow} \lambda^0$.
- Step 1. Recall that $\Pi^0_i = \Pi_i/r_{iT}$ (ignoring the dependence on $T$ for simplicity), and let $\Psi^{zv}_{i\lambda} = T^{-1/2}\sum_{i\lambda}Z_t v_t$ for $i = 1, 2$ and $\Psi^{zv}_\Delta = T^{-1/2}\sum_\Delta Z_t v_t$. Note that:

$d_t = \hat{v}_t - v_t = \begin{cases} X_t - Z_t'\hat{\Pi}_{1\lambda} - v_t, & t \leq \hat{T} \\ X_t - Z_t'\hat{\Pi}_{2\lambda} - v_t, & t > \hat{T} \end{cases} = \begin{cases} Z_t'(\Pi^0_1 - \hat{\Pi}_{1\lambda}), & t \leq \hat{T} \\ Z_t'(\Pi^0_1 - \hat{\Pi}_{2\lambda}), & \hat{T}+1 \leq t \leq T^0 \\ Z_t'(\Pi^0_2 - \hat{\Pi}_{2\lambda}), & t > T^0 \end{cases}$, so

$\sum_{t=1}^T v_t d_t = (\Pi^0_1 - \hat{\Pi}_{1\lambda})'[T^{1/2}\Psi^{zv}_{1\lambda}] + (\Pi^0_1 - \hat{\Pi}_{2\lambda})'[T^{1/2}\Psi^{zv}_\Delta] + (\Pi^0_2 - \hat{\Pi}_{2\lambda})'[T^{1/2}\Psi^{zv}_{2\lambda^0}].$ (B.5)

By Assumptions A1(i)-(ii), A4 and the FCLT, $\Psi^{zv}_{i\lambda} = O_P(1)$, u.$\lambda$. Thus:

$\Psi^{zv}_{i\lambda} = O_P(1),\quad \Psi^{zv}_{i\lambda^0} = O_P(1) \quad\text{and}\quad \Psi^{zv}_\Delta = O_P(1).$ (B.6)

By A3, $M_{i\lambda} = O_P(1)$ and $M_\Delta = O_P(1)$ u.$\lambda$., hence:

$\hat{\Pi}_{1\lambda} - \Pi^0_1 = M_{1\lambda}^{-1}[T^{-1/2}\Psi^{zv}_{1\lambda}] = O_P(1)\,O_P(T^{-1/2}) = O_P(T^{-1/2}).$ (B.7)

On the other hand, letting $\Pi^0_\Delta = \Pi^0_2 - \Pi^0_1 = O(T^{-\alpha})$,

$\hat{\Pi}_{2\lambda} - \Pi^0_2 = M_{2\lambda}^{-1}[T^{-1/2}\Psi^{zv}_{2\lambda}] - M_{2\lambda}^{-1}M_\Delta\,\Pi^0_\Delta = O_P(T^{-\alpha})$ (B.8)
$\hat{\Pi}_{2\lambda} - \Pi^0_1 = (\hat{\Pi}_{2\lambda} - \Pi^0_2) + \Pi^0_\Delta = O_P(T^{-\alpha}).$ (B.9)

Substituting (B.6)-(B.9) into (B.5) yields $\sum_{t=1}^T v_t d_t = O_P(T^{1/2-\alpha})$, so $T^{2\alpha}L_1 = T^{2\alpha-1}\sum_{t=1}^T v_t d_t = o_P(1)$. Next, note that:

$\sum_{t=1}^T d_t^2 = \sum_{1\lambda}d_t^2 + \sum_\Delta d_t^2 + \sum_{2\lambda^0}d_t^2 = (\Pi^0_1 - \hat{\Pi}_{1\lambda})'\,TM_{1\lambda}\,(\Pi^0_1 - \hat{\Pi}_{1\lambda}) + (\Pi^0_1 - \hat{\Pi}_{2\lambda})'\,TM_\Delta\,(\Pi^0_1 - \hat{\Pi}_{2\lambda}) + (\Pi^0_2 - \hat{\Pi}_{2\lambda})'\,TM_{2\lambda^0}\,(\Pi^0_2 - \hat{\Pi}_{2\lambda})$
$= O_P(1) + O_P(T^{-\alpha})O_P(T)O_P(T^{-\alpha}) + O_P(T^{-\alpha})O_P(T)O_P(T^{-\alpha}) = O_P(T^{1-2\alpha}).$

Therefore, $T^{2\alpha}L_2 = T^{2\alpha-1}\sum_{t=1}^T d_t^2 = O_P(1)$ and dominates $T^{2\alpha}L_1 = o_P(1)$.
- Step 2. If $\hat{\lambda}$ does not converge in probability to $\lambda^0$, then there exists $\eta \in (0,1)$ such that, with positive probability, $T^0 - \hat{T} = [T\lambda^0] - [T\hat{\lambda}] \geq T\eta$. Let $M_\eta = T^{-1}\sum_{t=[T\lambda^0]-T\eta+1}^{[T\lambda^0]} Z_t Z_t' = M_{1\lambda^0} - M_{1(\lambda^0-\eta)}$. Because it is a symmetric pd matrix, $\|M_\eta\| \geq \mathrm{mineig}(M_\eta) > C + o_P(1)$, where $C$ is a constant, because by A3, $\mathrm{plim}\,\mathrm{mineig}(M_\eta) > 0$. Then, with positive constant probability,

$T^{2\alpha}L_2 = T^{2\alpha-1}\sum_{t=1}^T d_t^2 \geq T^{2\alpha-1}\Big(\sum_{t=T^0-T\eta+1}^{T^0} d_t^2\Big) = T^\alpha(\hat{\Pi}_{2\lambda} - \Pi^0_1)'\,M_\eta\,T^\alpha(\hat{\Pi}_{2\lambda} - \Pi^0_1) \geq \|T^\alpha(\hat{\Pi}_{2\lambda} - \Pi^0_2) + T^\alpha\Pi^0_\Delta\|^2\,C.$ (B.10)

Let $\bar{\Pi}^0_\Delta = T^\alpha\Pi^0_\Delta$. Under A2, when $\alpha_1 = \alpha_2 = \alpha$, $\bar{\Pi}^0_\Delta = \Pi_2 - \Pi_1 \neq 0$; when $\alpha_2 < \alpha_1$, $\bar{\Pi}^0_\Delta = \Pi_2 + o(1) \neq 0$; and when $\alpha_2 > \alpha_1$, $\bar{\Pi}^0_\Delta = -\Pi_1 + o(1) \neq 0$. Thus, in all cases, $\bar{\Pi}^0_\Delta \neq 0$. From (B.8), $T^\alpha(\hat{\Pi}_{2\lambda} - \Pi^0_2) = -M_{2\lambda}^{-1}M_\Delta\,\bar{\Pi}^0_\Delta + o_P(1)$, so:

$T^\alpha(\hat{\Pi}_{2\lambda} - \Pi^0_2) + T^\alpha\Pi^0_\Delta = o_P(1) - [M_{2\lambda}^{-1}M_\Delta - I_q]\,\bar{\Pi}^0_\Delta = o_P(1) - M_{2\lambda}^{-1}[M_{1\lambda^0} - M_{1\lambda} - M_{11} + M_{1\lambda}]\,\bar{\Pi}^0_\Delta = M_{2\lambda}^{-1}M_{2\lambda^0}\,\bar{\Pi}^0_\Delta + o_P(1).$ (B.11)

A3 implies that, with positive constant probability, u.$\lambda$., $\bar{\Pi}^{0\prime}_\Delta M_{2\lambda^0}M_{2\lambda}^{-2}M_{2\lambda^0}\bar{\Pi}^0_\Delta\,C + o_P(1) > 0$. Using this and (B.10)-(B.11), $T^{2\alpha-1}\sum_{t=1}^T d_t^2 \geq \bar{\Pi}^{0\prime}_\Delta M_{2\lambda^0}M_{2\lambda}^{-2}M_{2\lambda^0}\bar{\Pi}^0_\Delta\,C + o_P(1) > C^* + o_P(1)$, where $C^*$ is a constant. But this cannot hold, because $T^{2\alpha}L_2 = T^{2\alpha-1}\sum_{t=1}^T d_t^2 \stackrel{P}{\rightarrow} 0$. Therefore, $\hat{\lambda} \stackrel{P}{\rightarrow} \lambda^0$.
• Convergence rate of $\hat{\lambda}$. Since $\hat{\lambda} \stackrel{P}{\rightarrow} \lambda^0$, any break-point estimator $\hat{T} = [T\hat{\lambda}]$ satisfies $T^0 - \hat{T} \leq \epsilon T$ for some chosen $\epsilon > 0$. We find the convergence rate by contradiction. For a chosen $C > 0$, assume that $T^0 - \hat{T} > CT^{2\alpha}$. Define $L_1$, $L_2$ and $L_3$ to be the RF sums of squared residuals (SSR) obtained with the break-points $\hat{T}$, $T^0$ and $(\hat{T}, T^0)$, respectively. Then, by definition of OLS, $(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_1 - L_2) \leq 0$. We show that if $CT^{2\alpha} < T^0 - \hat{T} \leq \epsilon T$ for some large but fixed $C$ and small but fixed $\epsilon$, then $\mathrm{plim}[(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_1 - L_2)] > 0$, contradicting the above. It follows that $T^0 - \hat{T} \leq CT^{2\alpha}$ and, by symmetry of the argument, if $\hat{T} \geq T^0$, $\hat{T} - T^0 \leq CT^{2\alpha}$, establishing the desired convergence rate for the break-fraction estimator.
We now show that $\mathrm{plim}[(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_1 - L_2)] > 0$. By our notation, $(\hat{\Pi}_{1\lambda}, \hat{\Pi}_{2\lambda})$ are the OLS estimators based on one break at $\hat{T}$, $(\hat{\Pi}_{1\lambda}, \hat{\Pi}_\Delta, \hat{\Pi}_{2\lambda^0})$ are the ones based on two breaks at $\hat{T}$ and $T^0$, and $(\hat{\Pi}_{1\lambda^0}, \hat{\Pi}_{2\lambda^0})$ are the ones based on one break at $T^0$. Let $Q_\Delta = \frac{1}{T^0 - \hat{T}}\sum_\Delta Z_t Z_t'$. By straightforward algebra, it can be shown (see BP) that:

$(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_1 - L_3) = T^\alpha(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta)'Q_\Delta\,T^\alpha(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta) - T^\alpha(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta)'[Q_\Delta M_{2\lambda}^{-1}M_\Delta]\,T^\alpha(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta) = N_1 - N_2.$ (B.12)

$(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_2 - L_3) = T^\alpha(\hat{\Pi}_{1\lambda} - \hat{\Pi}_\Delta)'Q_\Delta\,T^\alpha(\hat{\Pi}_{1\lambda} - \hat{\Pi}_\Delta) - T^\alpha(\hat{\Pi}_{1\lambda} - \hat{\Pi}_\Delta)'[Q_\Delta M_{1\lambda^0}^{-1}M_\Delta]\,T^\alpha(\hat{\Pi}_{1\lambda} - \hat{\Pi}_\Delta) = N_3 - N_4.$ (B.13)

We now show that $N_1 = O_P(1)$, $N_2 = O_P(N_1)O_P(\epsilon)$, $N_3 = o_P(1)$, and $N_4 = O_P(N_3)O_P(\epsilon)$. We also show that $N_1 > 0$ in the limit, for large $C$ and small $\epsilon$. Hence, for large $C$ and small $\epsilon$, we would have the desired statement:

$\mathrm{plim}\,(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_1 - L_2) = N_1 - N_2 - N_3 + N_4 > 0.$

Since $T^0 - \hat{T} \leq \epsilon T$, by A3 we have $M_{2\lambda}^{-1}M_\Delta = O_P(1)O_P(\epsilon) = O_P(\epsilon)$, which implies that $N_2 = O_P(N_1)O_P(\epsilon)$. Similarly, $N_4 = O_P(N_3)O_P(\epsilon)$. Next, we compare $(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta)$ and $(\hat{\Pi}_{1\lambda} - \hat{\Pi}_\Delta)$. Since $\hat{\Pi}_{1\lambda}$ and $\hat{\Pi}_\Delta$ are both subsample estimators of $\Pi^0_1$, $\hat{\Pi}_{1\lambda} - \hat{\Pi}_\Delta = (\hat{\Pi}_{1\lambda} - \Pi^0_1) - (\hat{\Pi}_\Delta - \Pi^0_1) = O_P(T^{-1/2}) + O_P(T^{-1/2}) = O_P(T^{-1/2})$. Since $\hat{\Pi}_{2\lambda^0}$ is the estimator of $\Pi^0_2$ over the subsample $[T^0+1, T]$, $\hat{\Pi}_{2\lambda^0} - \Pi^0_2 = O_P(T^{-1/2})$, so

$\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta = (\hat{\Pi}_{2\lambda^0} - \Pi^0_2) - (\hat{\Pi}_\Delta - \Pi^0_1) + \Pi^0_\Delta = O_P(T^{-1/2}) + \Pi^0_\Delta = O_P(T^{-\alpha}).$

Thus, $T^\alpha(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta) = O_P(1)$ and $T^\alpha(\hat{\Pi}_{1\lambda} - \hat{\Pi}_\Delta) = o_P(1)$. By A3, $Q_\Delta = O_P(1)$ for large enough $C$. Therefore, $N_1 = T^\alpha(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta)'Q_\Delta\,T^\alpha(\hat{\Pi}_{2\lambda^0} - \hat{\Pi}_\Delta) = O_P(1)$ for large $C$, while $N_3 = o_P(1)$. All of this shows that the probability limit of $(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_1 - L_3)$ is determined by the probability limit of $N_1$ for small enough $\epsilon$, because $N_2 = O_P(N_1\epsilon)$ and $N_4 = O_P(N_3\epsilon)$ are dominated by $N_1$ for small $\epsilon$. For large enough $C$, by A3,

$N_1 = T^{2\alpha}[O_P(T^{-1/2}) + \Pi^0_\Delta]'Q_\Delta[O_P(T^{-1/2}) + \Pi^0_\Delta] \geq \|\bar{\Pi}^0_\Delta\|^2\,\mathrm{mineig}(Q_\Delta) + o_P(1) > C^* + o_P(1),$

where $C^*$ is a positive constant because $\mathrm{plim}\,Q_\Delta$ is pd by A3. This implies that $(T^0 - \hat{T})^{-1}T^{2\alpha+1}(L_1 - L_2) > 0$ with positive probability, which cannot hold. This completes the proof.
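The least-squares break-point estimator analyzed above can be sketched in a few lines: the break date is estimated by minimizing the sum of the two subsample reduced-form SSRs over a trimmed grid of candidates. The design below (sample size, instrument strengths, trimming fraction) is a made-up illustration, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(3)
T, T0, q = 400, 160, 4
Z = rng.standard_normal((T, q))
Pi1, Pi2 = np.full(q, 1.0), np.full(q, 0.2)   # first-stage break at T0
x = np.where(np.arange(T) < T0, Z @ Pi1, Z @ Pi2) + rng.standard_normal(T)

def ssr(y, X):
    # OLS sum of squared residuals
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return float(r @ r)

trim = int(0.15 * T)
candidates = range(trim, T - trim)
# least-squares break-point estimator: minimize the sum of the two
# subsample SSRs over all admissible candidate break dates
That = min(candidates, key=lambda k: ssr(x[:k], Z[:k]) + ssr(x[k:], Z[k:]))
print(That)
```

With an identification break of this size, the estimated break date lands very close to the true $T^0 = 160$, consistent with the fast convergence rate established in the proof.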
• Part (ii). For the QP estimator, under the assumptions imposed, Lemma 1 in QP can be verified by following, step by step, the proof of Lemma A3, which is exactly as in the supplemental appendix of QP and is omitted for simplicity (intuitively, nothing changes, because the magnitude of the RF break in the parameters and in the variance is all that matters for proving consistency of the QP estimator). Therefore, the convergence rate is as stated in Theorem 3.
• Part (iii). Because $\hat{\lambda} - \lambda^0 = O_P(T^{2\alpha-1})$, it can be shown that the asymptotic distribution of the parameter estimators $\hat{\beta}_{i\lambda} = \mathrm{vec}(\hat{\Pi}_{i\lambda})$ is as if the breaks were known. Therefore, we are back in the standard regression-model framework, and so $\hat{\Sigma}_{v,t}$, defined for the FGLS estimator in two ways, is consistent under both definitions: $\hat{\Sigma}_{v,t} \stackrel{P}{\rightarrow} \Sigma_v$.
• Consistency of $\tilde{\lambda}$. This proof is similar to part (i). For a given $\lambda$, we denote by $\tilde{\beta}_{1\lambda}, \tilde{\beta}_{2\lambda}$ the FGLS estimators of the RF parameters obtained from minimizing $L_{FGLS}(\lambda, \beta_i)$ over $\beta_1, \beta_2$. Let $\delta^0_\beta = \beta^0_2 - \beta^0_1 = T^{-\alpha}[\bar{\delta}^0_\beta + o(1)] = O(T^{-\alpha})$, where $\bar{\delta}^0_\beta = \Pi_2 - \Pi_1$ if $\alpha_1 = \alpha_2$, $\bar{\delta}^0_\beta = \Pi_1$ if $\alpha_1 < \alpha_2$, and $\bar{\delta}^0_\beta = -\Pi_2$ if $\alpha_2 < \alpha_1$. Then, by A2, $\bar{\delta}^0_\beta \neq 0$, and:

$\tilde{\beta}_{i\lambda} = \Big(\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)^{-1}\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}X_t = \Big(T^{-1}\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)^{-1}T^{-1}\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}\big(v_t + Z_t'\beta^0_i - 1[i=2]\,Z_t'\delta^0_\beta\,1[t \leq T^0]\big).$

Note that, by A3,

$T^{-1}\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t' = T^{-1}\sum_{i\lambda} Z_t[\Sigma_v^{-1} + o_P(1)]Z_t' + o_P(1) = \Sigma_v^{-1}\otimes\Big(T^{-1}\sum_{i\lambda} Z_tZ_t'\Big) + o_P(1) = \Sigma_v^{-1}\otimes M_{i\lambda} + o_P(1) = O_P(1).$ (B.14)

Also, by A1, A4 and the FCLT,

$T^{-1}\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}v_t = T^{-1/2}\Big(T^{-1/2}\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}v_t\Big) = T^{-1/2}(\Sigma_v^{-1}\otimes I_q)\Big(T^{-1/2}\sum_{i\lambda} v_t\otimes Z_t\Big) + o_P(T^{-1/2}) = T^{-1/2}(\Sigma_v^{-1}\otimes I_q)\,O_P(1) + o_P(T^{-1/2}) = O_P(T^{-1/2}).$ (B.15)

Therefore,

$\tilde{\beta}_{1\lambda} - \beta^0_1 = \Big(\sum_{1\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)^{-1}\sum_{1\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}v_t = O_P(T^{-1/2})$ (B.16)

$\tilde{\beta}_{2\lambda} - \beta^0_2 = \Big(\sum_{2\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)^{-1}\sum_{2\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}v_t - \Big(\sum_{2\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)^{-1}\sum_\Delta Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\,\delta^0_\beta$ (B.17)
$= O_P(T^{-1/2}) + O_P(T^{-\alpha}) = O_P(T^{-\alpha}).$ (B.18)
Letting $\tilde{\beta}_t = \tilde{\beta}_{1\lambda}1[t \leq T_{1\lambda}] + \tilde{\beta}_{2\lambda}1[t > T_{1\lambda}]$ and $\beta^0_t = \beta^0_1 1[t \leq T^0] + \beta^0_2 1[t > T^0]$, we have:

$L_{FGLS}(\lambda, \tilde{\beta}_{i\lambda}) = T^{-1}\sum_{t=1}^T (X_t - Z_t'\tilde{\beta}_t)'\hat{\Sigma}_{v,t}^{-1}(X_t - Z_t'\tilde{\beta}_t) = T^{-1}\sum_{t=1}^T [v_t - Z_t'(\tilde{\beta}_t - \beta^0_t)]'\hat{\Sigma}_{v,t}^{-1}[v_t - Z_t'(\tilde{\beta}_t - \beta^0_t)]$
$= T^{-1}\sum_{t=1}^T v_t'\hat{\Sigma}_{v,t}^{-1}v_t - 2T^{-1}\sum_{t=1}^T v_t'\hat{\Sigma}_{v,t}^{-1}Z_t'(\tilde{\beta}_t - \beta^0_t) + T^{-1}\sum_{t=1}^T (\tilde{\beta}_t - \beta^0_t)'Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'(\tilde{\beta}_t - \beta^0_t).$
Because $L_{FGLS}(\lambda^0, \beta^0_i) = T^{-1}\sum_{t=1}^T v_t'\hat{\Sigma}_{v,t}^{-1}v_t$, by definition of the minimization problem,

$L_{FGLS}(\lambda, \tilde{\beta}_{i\lambda}) - L_{FGLS}(\lambda^0, \beta^0_i) = -2T^{-1}\sum_{t=1}^T v_t'\hat{\Sigma}_{v,t}^{-1}Z_t'(\tilde{\beta}_t - \beta^0_t) + T^{-1}\sum_{t=1}^T (\tilde{\beta}_t - \beta^0_t)'Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'(\tilde{\beta}_t - \beta^0_t) = -2L_1 + L_2 \leq 0.$

We now show that $T^{2\alpha}L_1$ is dominated in probability limit by $T^{2\alpha}L_2$. Using equations (B.14)-(B.16) and (B.18),

$L_1 = \Big(T^{-1}\sum_{1\lambda} v_t'\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)(\tilde{\beta}_{1\lambda} - \beta^0_1) + \Big(T^{-1}\sum_\Delta v_t'\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)[\tilde{\beta}_{2\lambda} - \beta^0_2 + \delta^0_\beta] + \Big(T^{-1}\sum_{2\lambda} v_t'\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)(\tilde{\beta}_{2\lambda} - \beta^0_2)$
$= O_P(T^{-1/2})O_P(T^{-1/2}) + O_P(T^{-1/2})O_P(T^{-\alpha}) + O_P(T^{-1/2})O_P(T^{-\alpha}) = O_P(T^{-1/2-\alpha})$
$T^{2\alpha}L_1 = O_P(T^{\alpha-1/2}) = o_P(1)$
$L_2 = (\tilde{\beta}_{1\lambda} - \beta^0_1)'\Big(T^{-1}\sum_{1\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)(\tilde{\beta}_{1\lambda} - \beta^0_1) + (\tilde{\beta}_{2\lambda} - \beta^0_2 + \delta^0_\beta)'\Big(T^{-1}\sum_\Delta Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)(\tilde{\beta}_{2\lambda} - \beta^0_2 + \delta^0_\beta) + (\tilde{\beta}_{2\lambda} - \beta^0_2)'\Big(T^{-1}\sum_{2\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)(\tilde{\beta}_{2\lambda} - \beta^0_2)$ (B.19)
$= O_P(T^{-1}) + O_P(T^{-2\alpha}) + O_P(T^{-2\alpha}) = O_P(T^{-2\alpha})$
$T^{2\alpha}L_2 = O_P(1).$
This shows that $T^{2\alpha}L_2$ dominates $T^{2\alpha}L_1$; therefore $T^{2\alpha}L_2 \stackrel{P}{\rightarrow} 0$, because $-2L_1 + L_2 \leq 0$. We now show that $T^{2\alpha}L_2 > 0$ with some positive constant probability when the break fraction $\lambda$ is inconsistent, which is a contradiction. Therefore, it must be that $\tilde{\lambda} \stackrel{P}{\rightarrow} \lambda^0$, and showing that $T^{2\alpha}L_2 > 0$ with positive probability will complete the proof. From (B.17)-(B.18) and $\hat{\Sigma}_{v,t}^{-1} = \Sigma_v^{-1} + o_P(1)$,

$T^\alpha(\tilde{\beta}_{2\lambda} - \beta^0_2) = O_P(T^{\alpha-1/2}) - \Big(\sum_{2\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)^{-1}\sum_\Delta Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\,(T^\alpha\delta^0_\beta)$
$= o_P(1) - \Big(T^{-1}\sum_{2\lambda} Z_t\Sigma_v^{-1}Z_t'\Big)^{-1}T^{-1}\sum_\Delta Z_t\Sigma_v^{-1}Z_t'\,\bar{\delta}^0_\beta$
$= o_P(1) - [\Sigma_v^{-1}\otimes M_{2\lambda}]^{-1}\big\{\Sigma_v^{-1}\otimes[M_{2\lambda} - M_{2\lambda^0}]\big\}\bar{\delta}^0_\beta$
$= -\big\{I_{p_2}\otimes M_{2\lambda}^{-1}[M_{2\lambda} - M_{2\lambda^0}]\big\}\bar{\delta}^0_\beta + o_P(1)$ (B.20)

$T^\alpha(\tilde{\beta}_{2\lambda} - \beta^0_2 + \delta^0_\beta) = -\big\{I_{p_2}\otimes M_{2\lambda}^{-1}[M_{2\lambda} - M_{2\lambda^0}] - I_{p_2}\otimes I_q\big\}\bar{\delta}^0_\beta + o_P(1) = \big\{I_{p_2}\otimes[M_{2\lambda}^{-1}M_{2\lambda^0}]\big\}\bar{\delta}^0_\beta + o_P(1)$ (B.21)

From (B.14) and (B.20)-(B.21),

$T^{2\alpha}L_2 = T^{2\alpha}(\tilde{\beta}_{2\lambda} - \beta^0_2 + \delta^0_\beta)'\Big(T^{-1}\sum_\Delta Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)(\tilde{\beta}_{2\lambda} - \beta^0_2 + \delta^0_\beta) + T^{2\alpha}(\tilde{\beta}_{2\lambda} - \beta^0_2)'\Big(T^{-1}\sum_{2\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'\Big)(\tilde{\beta}_{2\lambda} - \beta^0_2) + o_P(1)$
$= \bar{\delta}^{0\prime}_\beta\big\{I_{p_2}\otimes(M_{2\lambda^0}M_{2\lambda}^{-1})\big\}\big\{\Sigma_v^{-1}\otimes(M_{2\lambda} - M_{2\lambda^0})\big\}\big\{I_{p_2}\otimes(M_{2\lambda}^{-1}M_{2\lambda^0})\big\}\bar{\delta}^0_\beta + \bar{\delta}^{0\prime}_\beta\big\{I_{p_2}\otimes[(M_{2\lambda} - M_{2\lambda^0})M_{2\lambda}^{-1}]\big\}\big\{\Sigma_v^{-1}\otimes M_{2\lambda}\big\}\big\{I_{p_2}\otimes[M_{2\lambda}^{-1}(M_{2\lambda} - M_{2\lambda^0})]\big\}\bar{\delta}^0_\beta + o_P(1)$
$= \bar{\delta}^{0\prime}_\beta\big\{\Sigma_v^{-1}\otimes[M_{2\lambda^0}M_{2\lambda}^{-1}(M_{2\lambda} - M_{2\lambda^0})M_{2\lambda}^{-1}M_{2\lambda^0}]\big\}\bar{\delta}^0_\beta + \bar{\delta}^{0\prime}_\beta\big\{\Sigma_v^{-1}\otimes[(M_{2\lambda} - M_{2\lambda^0})M_{2\lambda}^{-1}(M_{2\lambda} - M_{2\lambda^0})]\big\}\bar{\delta}^0_\beta + o_P(1).$

Because we assumed that $\lambda \leq \lambda^0$, if $\tilde{\lambda}$ does not converge in probability to $\lambda^0$, then, with probability approaching 1, $\tilde{\lambda} < \lambda^0$. By A3 it follows that $M_{2\lambda^0}M_{2\lambda}^{-1}(M_{2\lambda} - M_{2\lambda^0})M_{2\lambda}^{-1}M_{2\lambda^0}$ and $(M_{2\lambda} - M_{2\lambda^0})M_{2\lambda}^{-1}(M_{2\lambda} - M_{2\lambda^0})$ are pd, and, because $\bar{\delta}^0_\beta \neq 0$, $T^{2\alpha}L_2 + o_P(1) > 0$. Since this contradicts $T^{2\alpha}L_2 \stackrel{P}{\rightarrow} 0$, it follows that $\tilde{\lambda} \stackrel{P}{\rightarrow} \lambda^0$.
• Rate of convergence of $\tilde{\lambda}$. We now derive the rate of convergence of $\tilde{\lambda}$ in a similar fashion to the rate of convergence of the BP estimator $\hat{\lambda}$. By consistency, for any small $\epsilon > 0$, $T^0 - \tilde{T} \leq \epsilon T$. Assume that $T^0 - \tilde{T} > CT^{2\alpha}$ for some large $C > 0$, and let $L_1$, $L_2$, $L_3$ be the FGLS objective functions at the breaks $\tilde{\lambda}$, $\lambda^0$ and $(\tilde{\lambda}, \lambda^0)$, respectively, evaluated at the corresponding FGLS parameter estimators for these breaks. The FGLS parameter estimators are $(\tilde{\beta}_{1\lambda}, \tilde{\beta}_{2\lambda})$ for the first objective function, $(\tilde{\beta}_{1\lambda^0}, \tilde{\beta}_{2\lambda^0})$ for the second, and $(\tilde{\beta}_{1\lambda}, \tilde{\beta}_\Delta, \tilde{\beta}_{2\lambda^0})$ for the third. By arguments similar to part (i), and for $C$ large enough,

$\tilde{\beta}_{i\lambda^0} - \beta^0_i = O_P(T^{-1/2}) \quad\text{and}\quad \tilde{\beta}_\Delta - \beta^0_1 = O_P(T^{-1/2}).$

Let $\tilde{M}_{i\lambda} = T^{-1}\sum_{i\lambda} Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'$ for $i = 1, 2$ and $\tilde{M}_\Delta = T^{-1}\sum_\Delta Z_t\hat{\Sigma}_{v,t}^{-1}Z_t'$. Letting $Q_\Delta = \frac{1}{T^0-\tilde{T}}\sum_\Delta Z_t Z_t'$, similarly to part (i),

$(T^0 - \tilde{T})^{-1}T^{2\alpha+1}(L_1 - L_3) = T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{2\lambda^0})'Q_\Delta\,T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{2\lambda^0}) - T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{2\lambda^0})'Q_\Delta\tilde{M}_{2\lambda}^{-1}\tilde{M}_\Delta\,T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{2\lambda^0}) = N_1 - N_2$
$(T^0 - \tilde{T})^{-1}T^{2\alpha+1}(L_2 - L_3) = T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{1\lambda})'Q_\Delta\,T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{1\lambda}) - T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{1\lambda})'Q_\Delta\tilde{M}_{1\lambda^0}^{-1}\tilde{M}_\Delta\,T^\alpha(\tilde{\beta}_\Delta - \tilde{\beta}_{1\lambda}) = N_3 - N_4.$

Similarly to part (i), it can be shown that $N_1$ dominates $N_2$, $N_3$ and $N_4$ for small $\epsilon$. Moreover, $\tilde{\beta}_\Delta - \tilde{\beta}_{2\lambda^0} = \beta^0_1 - \beta^0_2 + O_P(T^{-1/2}) = -\delta^0_\beta + O_P(T^{-1/2})$ and $\tilde{\beta}_\Delta - \tilde{\beta}_{1\lambda} = O_P(T^{-1/2})$; therefore, $(T^0 - \tilde{T})^{-1}T^{2\alpha+1}(L_1 - L_3) = O_P(1)$ and $(T^0 - \tilde{T})^{-1}T^{2\alpha+1}(L_2 - L_3) = o_P(1)$. Hence, $(T^0 - \tilde{T})^{-1}T^{2\alpha+1}(L_1 - L_2) = N_1 + o_P(1)$.
For large enough $C$, $\tilde{M}_\Delta = \Sigma_v^{-1}\otimes[M_{2\lambda} - M_{2\lambda^0}] + o_P(1)$, so, with positive probability, $(T^0 - \tilde{T})^{-1}T^{2\alpha+1}(L_1 - L_2) > 0$. This cannot happen, because, by definition, $(T^0 - \tilde{T})^{-1}T^{2\alpha+1}(L_1 - L_2) \leq 0$. It follows that $T^0 - \tilde{T} \leq CT^{2\alpha}$, so $\tilde{\lambda} - \lambda^0 = O_P(T^{2\alpha-1})$.
Proof of Theorem 4.
The proof follows exactly the same steps as the proof of Theorem 3, with $\alpha$ replaced by 0, and with break-point estimators for equation (4.1) instead of equation (2.5).
Proof of Theorem 5.
• Part (i). Consistency of $\hat{\lambda}$. Unlike the other proofs, here we do not prove consistency by contradiction; instead, we show consistency by deriving the limit of the minimand directly and applying the continuous mapping theorem. Let $a_t = j_t u_t^2$ and $\hat{a}_t = j_t\hat{u}_t^2$. We minimize over $\lambda$:

$L_{BP}(\lambda, \hat{a}_t) = T^{-1}\sum_{t=1}^T(\hat{a}_t - \hat{\mu}_t)^2,$

where $\hat{\mu}_t = \hat{\mu}_{1\lambda}1[t \leq [T\lambda]] + \hat{\mu}_{2\lambda}1[t > [T\lambda]]$, $\hat{\mu}_{i\lambda} = \frac{1}{T_{i\lambda}}\sum_{i\lambda}\hat{a}_t$ for $i = 1, 2$, and $j_t$, $a_t$ and $\hat{a}_t$ are treated as scalars here because we are analyzing the BP break-point estimator.
Note that $\hat{a}_t = a_t + (\hat{a}_t - a_t) = e_t + (\hat{a}_t - a_t) + \mu^0_t$, with $\mu^0_t = \mu^0_1 1[t \leq T^0] + \mu^0_2 1[t > T^0]$. So,

$L_{BP}(\lambda, \hat{a}_t) = T^{-1}\sum_{t=1}^T[e_t + (\hat{a}_t - a_t) + (\mu^0_t - \hat{\mu}_t)]^2 = T^{-1}\sum_{t=1}^T e_t^2 + T^{-1}\sum_{t=1}^T(\hat{a}_t - a_t)^2 + T^{-1}\sum_{t=1}^T(\mu^0_t - \hat{\mu}_t)^2 + 2T^{-1}\sum_{t=1}^T e_t(\hat{a}_t - a_t) + 2T^{-1}\sum_{t=1}^T e_t(\mu^0_t - \hat{\mu}_t) + 2T^{-1}\sum_{t=1}^T(\hat{a}_t - a_t)(\mu^0_t - \hat{\mu}_t) = \sum_{i=1}^6 L_i.$ (B.22)

Note that $L_1$, $L_2$ and $L_4$ do not depend on $\lambda$, and therefore their limiting behavior is irrelevant for consistency, since $\min_\lambda L_{BP}(\lambda, \hat{a}_t) = \min_\lambda[L^*_{BP}(\lambda, \hat{a}_t) = L_{BP}(\lambda, \hat{a}_t) - L_1 - L_2 - L_4]$.
For $L_3$,

$L_3 = T^{-1}\sum_{1\lambda}(\mu^0_1 - \hat{\mu}_{1\lambda})^2 + T^{-1}\sum_\Delta(\mu^0_1 - \hat{\mu}_{2\lambda})^2 + T^{-1}\sum_{2\lambda^0}(\mu^0_2 - \hat{\mu}_{2\lambda})^2 = \lambda(\mu^0_1 - \hat{\mu}_{1\lambda})^2 + (\lambda^0 - \lambda)(\mu^0_1 - \hat{\mu}_{2\lambda})^2 + (1 - \lambda^0)(\mu^0_2 - \hat{\mu}_{2\lambda})^2.$ (B.23)

We have:

$\hat{\mu}_{1\lambda} - \mu^0_1 = T_{1\lambda}^{-1}\sum_{1\lambda}(\hat{a}_t - \mu^0_1) = T_{1\lambda}^{-1}\sum_{1\lambda}e_t + T_{1\lambda}^{-1}\sum_{1\lambda}(\hat{a}_t - a_t) = O_P(T^{-1/2}) + T_{1\lambda}^{-1}\sum_{1\lambda}(\hat{a}_t - a_t).$ (B.24)
Letting $\delta = \theta^0 - \hat{\theta}_{GMM}$ or $\delta = \theta^0 - \hat{\theta}_{B\text{-}GMM}$ (where $\hat{\theta}_{B\text{-}GMM}$ was constructed with a SMI or a VMC break-point estimator), we have:

$\hat{a}_t - a_t = j_t(\hat{u}_t^2 - u_t^2) = j_t(\hat{u}_t - u_t)(\hat{u}_t + u_t) = j_t w_t'\delta(2u_t + w_t'\delta) = 2u_t j_t w_t'\delta + j_t(w_t'\delta)^2.$ (B.25)

Therefore,

$T_{1\lambda}^{-1}\sum_{1\lambda}(\hat{a}_t - a_t) = 2T_{1\lambda}^{-1}\sum_{1\lambda}u_t j_t w_t'\delta + T_{1\lambda}^{-1}\sum_{1\lambda}j_t(w_t'\delta)^2 = 2A_1 + A_2.$

We show below that $T_{1\lambda}^{-1}\sum_{1\lambda}(\hat{a}_t - a_t) = o_P(1)$ u.$\lambda$. Letting, as in the paper, the subscript $i$ denote the $i^{th}$ element of a vector, we have:

$A_1 = T_{1\lambda}^{-1}\sum_{1\lambda}u_t j_t w_t'\delta = \sum_{i=1}^p T_{1\lambda}^{-1}\sum_{1\lambda}u_t j_t w_{t,i}\delta_i = \sum_{i=1}^p T_{1\lambda}^{-1}\sum_{1\lambda}u_t j_t w_{t,i}\,O_P(T^{\alpha-1/2}).$

Note that $j_t = z_{t,k}z_{t,k^*}$ for some $k, k^* \in \{1, \ldots, q\}$. Therefore, $A_1 = o_P(1)$ if we can show that $T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}u_t z_{t,k}z_{t,k^*}w_{t,i} = o_P(1)$. We show this result using Markov's inequality and A11(i). Note that $w_{t,i}$ can be equal to $z_{1t,i}$ or to $x_{t,i_1} = z_t'\Pi^0_{t,i_1} + v_{t,i_1}$ for $i_1 = i - p_1$, where $\Pi^0_{t,i_1}$ is the $i_1^{th}$ column of $\Pi^0_t = \Pi_1/r_{1T}\,1[t \leq T^0] + \Pi_2/r_{2T}\,1[t > T^0]$. Since $z_t$ is already present in $x_{t,i_1}$, we do not consider the case $w_{t,i} = z_{1t,i}$ whenever we deal in this proof with partial sums that involve the terms $(\hat{a}_t - a_t)$, because this case adds no additional insights. Note that $z_t'\Pi^0_{t,i_1} = \sum_{i_2=1}^q z_{t,i_2}\Pi^0_{t,i_1,i_2}$, where $\Pi^0_{t,i_1,i_2}$ is the $(i_1, i_2)$ element of $\Pi^0_t$. Since $\Pi^0_t = O(1)$, showing that $T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}u_t z_{t,k}z_{t,k^*}w_{t,i} = o_P(1)$ u.$\lambda$. is equivalent to showing that $T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}u_t z_{t,k}z_{t,k^*}z_{t,i_2} = T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}h_{t,k}z_{t,k^*}z_{t,i_2} = o_P(1)$ and $T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}u_t z_{t,k}z_{t,k^*}v_{t,i_1} = T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}h_{t,k}h_{t,l} = o_P(1)$ for some indexes $k, k^*, i_2, l$.
Using Markov's inequality, followed by the triangle inequality and Holder's inequality (implicitly applied twice), and, in the last step, using the moment conditions in A11(i), there is a constant $C > 0$ such that

$P\Big(\sup_\lambda\Big|T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}h_{t,k}z_{t,k^*}z_{t,i_2}\Big| > \epsilon\Big) \leq T^{\alpha-1/2}E\Big(\sup_\lambda\Big|T_{1\lambda}^{-1}\sum_{1\lambda}h_{t,k}z_{t,k^*}z_{t,i_2}\Big|\Big)\Big/\epsilon \leq T^{\alpha-1/2}\sup_t E|h_{t,k}z_{t,k^*}z_{t,i_2}|/\epsilon \leq T^{\alpha-1/2}\sup_t\|h_{t,k}\|_2\,\sup_t\|z_{t,k^*}\|_4\,\sup_t\|z_{t,i_2}\|_4/\epsilon < T^{\alpha-1/2}C/\epsilon \rightarrow 0.$

Therefore, $T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}h_{t,k}z_{t,k^*}z_{t,i_2} = o_P(1)$ u.$\lambda$. Similarly, $T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}h_{t,k}h_{t,l} = o_P(1)$. So, $T^{\alpha-1/2}T_{1\lambda}^{-1}\sum_{1\lambda}u_t z_{t,k}z_{t,k^*}w_{t,i} = o_P(1)$ u.$\lambda$., implying that $A_1 = o_P(1)$. By very similar arguments, using the moment conditions $\sup_t\|z_{t,i}\|_4 < \infty$ and $\sup_t\|h_{t,i}\|_2 < \infty$ in A11(i), one can show that $A_2 = o_P(1)$. Therefore, $T_{1\lambda}^{-1}\sum_{1\lambda}(\hat{a}_t - a_t) = o_P(1)$. Substituting this into (B.24), we get:

$\hat{\mu}_{1\lambda} - \mu^0_1 = O_P(T^{\alpha-1/2}).$ (B.26)
As for $\hat{\mu}_{2\lambda}$, recalling that $\delta^0_\mu = \mu^0_2 - \mu^0_1$, we have:

$\hat{\mu}_{2\lambda} - \mu^0_2 = T_{2\lambda}^{-1}\sum_{2\lambda}(\hat{a}_t - \mu^0_2) = T_{2\lambda}^{-1}\sum_\Delta(\hat{a}_t - \mu^0_2) + T_{2\lambda}^{-1}\sum_{2\lambda^0}(\hat{a}_t - \mu^0_2)$
$= T_{2\lambda}^{-1}\sum_\Delta(e_t + \mu^0_1 - \mu^0_2 + \hat{a}_t - a_t) + T_{2\lambda}^{-1}\sum_{2\lambda^0}(e_t + \hat{a}_t - a_t)$
$= T_{2\lambda}^{-1}\sum_{2\lambda}e_t - \Big[\frac{\lambda^0-\lambda}{1-\lambda}\,\delta^0_\mu + O_P(T^{-1})\Big] + T_{2\lambda}^{-1}\sum_{2\lambda}(\hat{a}_t - a_t)$
$= O_P(T^{-1/2}) - \frac{\lambda^0-\lambda}{1-\lambda}\,\delta^0_\mu + O_P(T^{\alpha-1/2}) = -\frac{\lambda^0-\lambda}{1-\lambda}\,\delta^0_\mu + O_P(T^{\alpha-1/2}) = O_P(1),$ (B.27)

where the last equality follows from A11(iii), which states that $\delta^0_\mu$ is fixed. Similarly,

$\hat{\mu}_{2\lambda} - \mu^0_1 = \delta^0_\mu - \frac{\lambda^0-\lambda}{1-\lambda}\,\delta^0_\mu + O_P(T^{\alpha-1/2}) = \frac{1-\lambda^0}{1-\lambda}\,\delta^0_\mu + O_P(T^{\alpha-1/2}) = O_P(1).$ (B.28)

Substituting (B.26)-(B.28) into (B.23), we have:

$L_3 = \lambda(\hat{\mu}_{1\lambda} - \mu^0_1)^2 + (\lambda^0 - \lambda)(\mu^0_1 - \hat{\mu}_{2\lambda})^2 + (1 - \lambda^0)(\mu^0_2 - \hat{\mu}_{2\lambda})^2$
$= O_P(T^{2\alpha-1}) + O_P(1) + O_P(1) = O_P(1)$ (B.29)
$= o_P(1) + \Big[\frac{(\lambda^0-\lambda)(1-\lambda^0)^2}{(1-\lambda)^2} + \frac{(1-\lambda^0)(\lambda^0-\lambda)^2}{(1-\lambda)^2}\Big](\delta^0_\mu)^2 = \frac{(\lambda^0-\lambda)(1-\lambda^0)}{1-\lambda}(\delta^0_\mu)^2 + o_P(1).$ (B.30)
As for $L_5$,

$L_5 = 2T^{-1}\sum_{t=1}^T e_t(\mu^0_t - \hat{\mu}_t) = 2T^{-1}\sum_{1\lambda}e_t(\mu^0_1 - \hat{\mu}_{1\lambda}) + 2T^{-1}\sum_\Delta e_t(\mu^0_1 - \hat{\mu}_{2\lambda}) + 2T^{-1}\sum_{2\lambda^0}e_t(\mu^0_2 - \hat{\mu}_{2\lambda})$
$= O_P(T^{-1/2})O_P(T^{\alpha-1/2}) + O_P(T^{-1/2})O_P(1) + O_P(T^{-1/2})O_P(1) = O_P(T^{-1/2}).$ (B.31)

Finally, using (B.26)-(B.28),

$L_6 = 2T^{-1}\sum_{t=1}^T(\hat{a}_t - a_t)(\mu^0_t - \hat{\mu}_t) = 2\Big[T^{-1}\sum_{1\lambda}(\hat{a}_t - a_t)\Big](\mu^0_1 - \hat{\mu}_{1\lambda}) + 2\Big[T^{-1}\sum_\Delta(\hat{a}_t - a_t)\Big](\mu^0_1 - \hat{\mu}_{2\lambda}) + 2\Big[T^{-1}\sum_{2\lambda^0}(\hat{a}_t - a_t)\Big](\mu^0_2 - \hat{\mu}_{2\lambda})$
$= o_P(1)O_P(T^{\alpha-1/2}) + o_P(1)O_P(1) + o_P(1)O_P(1) = o_P(1).$ (B.32)

From (B.29) and (B.31)-(B.32), it is clear that $L_3$ dominates $L_5$ and $L_6$, since it is $O_P(1)$ while they are $o_P(1)$. Because $(\delta^0_\mu)^2 > 0$, from (B.30) it follows that, u.$\lambda$.,

$L^*_{BP}(\lambda, \hat{a}_t) \stackrel{P}{\rightarrow} \frac{(\lambda^0-\lambda)(1-\lambda^0)}{1-\lambda}(\delta^0_\mu)^2,$

which is nonnegative for $\lambda \leq \lambda^0$ and equal to zero only at $\lambda = \lambda^0$. It can be shown that, when $\lambda > \lambda^0$, the limit of $L^*_{BP}(\lambda, \hat{a}_t)$ is also minimized at $\lambda = \lambda^0$, so $\hat{\lambda} \stackrel{P}{\rightarrow} \lambda^0$.
• Part (i). Convergence rate of $\hat{\lambda}$. The proof is similar to the other proofs of rates of convergence of break-point estimators. We let $L_1 = L_{BP}(\hat{\lambda}, \hat{a}_t)$, $L_2 = L_{BP}(\lambda^0, \hat{a}_t)$, and $L_3 = L_{BP}(\hat{\lambda}, \lambda^0, \hat{a}_t)$, where $\hat{\lambda} \leq \lambda^0$. The corresponding sub-sample mean estimators are $\hat{\mu}_{1\lambda}, \hat{\mu}_{2\lambda}$ for $L_1$; $\hat{\mu}_{1\lambda^0}, \hat{\mu}_{2\lambda^0}$ for $L_2$; and $\hat{\mu}_{1\lambda}, \hat{\mu}_\Delta, \hat{\mu}_{2\lambda^0}$ for $L_3$. By consistency, $T^0 - \hat{T} < \epsilon T$, and assume $T^0 - \hat{T} > C$ for some large enough $C > 0$. Then, using arguments similar to the other proofs,

$T(T^0 - \hat{T})^{-1}(L_1 - L_3) = (\hat{\mu}_{2\lambda^0} - \hat{\mu}_\Delta)^2 - (\hat{\mu}_{2\lambda^0} - \hat{\mu}_\Delta)^2\,\frac{\lambda^0-\hat{\lambda}}{1-\hat{\lambda}} = N_1 - N_2$
$T(T^0 - \hat{T})^{-1}(L_2 - L_3) = (\hat{\mu}_{1\lambda} - \hat{\mu}_\Delta)^2 - (\hat{\mu}_{1\lambda} - \hat{\mu}_\Delta)^2\,\frac{\lambda^0-\hat{\lambda}}{\hat{\lambda}} = N_3 - N_4.$

Because $T^0 - \hat{T} < \epsilon T$, for small enough $\epsilon$, we show below that $N_1$ dominates $N_2$ and $N_3$ dominates $N_4$. Previously, we showed that $\hat{\mu}_{1\lambda} - \mu^0_1 = O_P(T^{\alpha-1/2}) = o_P(1)$. Because $\hat{\mu}_\Delta$ is also a sub-sample estimator of $\mu^0_1$, for $C$ large enough, $\hat{\mu}_\Delta - \mu^0_1 = O_P(T^{\alpha-1/2}) = o_P(1)$. Therefore, $N_3 = (\hat{\mu}_{1\lambda} - \hat{\mu}_\Delta)^2 = [\hat{\mu}_{1\lambda} - \mu^0_1 - (\hat{\mu}_\Delta - \mu^0_1)]^2 = O_P(T^{2\alpha-1}) = o_P(1)$. On the other hand, $\hat{\mu}_{2\lambda^0} - \mu^0_2 = o_P(1)$; therefore, $\hat{\mu}_{2\lambda^0} - \hat{\mu}_\Delta = \delta^0_\mu + o_P(1)$. Therefore, $N_1 = (\hat{\mu}_{2\lambda^0} - \hat{\mu}_\Delta)^2 = [\delta^0_\mu + o_P(1)]^2 = (\delta^0_\mu)^2 + o_P(1) = O_P(1)$. Hence, $N_1$ dominates $N_2$, $N_3$ and $N_4$, so, with positive probability for large $C$ and small $\epsilon$, $T(T^0 - \hat{T})^{-1}(L_1 - L_2) > 0$, because:

$T(T^0 - \hat{T})^{-1}(L_1 - L_2) = N_1 + o_P(1) = (\delta^0_\mu)^2 + o_P(1).$

This cannot happen, because, by definition, $L_1 - L_2 \leq 0$, so it must be that $T^0 - \hat{T} \leq C$. Similarly, it can be shown that, for $\hat{T} > T^0$, $\hat{T} - T^0 < C$; therefore, $|\hat{T} - T^0| = O_P(1)$.
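The mean-shift argument above can be illustrated with a toy simulation (ours, not the paper's): a change in the variance of $u_t$ shifts the mean of $a_t = j_t u_t^2$, so the same least-squares break-date estimator, applied to the squared residuals, recovers the variance break. For simplicity, $j_t$ is set to 1 and $u_t$ is treated as observed:

```python
import numpy as np

rng = np.random.default_rng(4)
T, T0 = 400, 160
# variance of the moment conditions doubles in sd at T0,
# so the mean of a_t = u_t^2 jumps from 1 to 4
u = np.concatenate([rng.standard_normal(T0),
                    2.0 * rng.standard_normal(T - T0)])
a = u ** 2

def ssr_mean(s):
    # SSR around the subsample mean
    return float(((s - s.mean()) ** 2).sum())

trim = int(0.15 * T)
# mean-shift break-date estimator applied to a_t
That = min(range(trim, T - trim),
           key=lambda k: ssr_mean(a[:k]) + ssr_mean(a[k:]))
print(That)
```

Because the mean shift $\delta^0_\mu$ is fixed (does not shrink with $T$), the estimated date is within a bounded distance of $T^0$, matching the $|\hat{T}-T^0| = O_P(1)$ result.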
• Part (ii). Consistency of $\hat{\Sigma}_{e,t}$. First, note that $a_t$, $\hat{a}_t$, $e_t$ and $\hat{\mu}_{i\lambda}$ are now treated as vectors rather than scalars, because we are now analyzing a multivariate break-point estimator.
We first show that $\hat{\Sigma}_{e,t} \stackrel{P}{\rightarrow} \Sigma_e$. Under the first definition, $\hat{\Sigma}_{e,t}$ is estimated over the full sample: $\hat{\Sigma}_{e,t} = \hat{\Sigma}_e = T^{-1}\sum_{1\lambda}(\hat{a}_t - \hat{\mu}_{1\lambda})(\hat{a}_t - \hat{\mu}_{1\lambda})' + T^{-1}\sum_{2\lambda}(\hat{a}_t - \hat{\mu}_{2\lambda})(\hat{a}_t - \hat{\mu}_{2\lambda})' = T^{-1}\sum_{t=1}^T(\hat{a}_t - \hat{\mu}_{t\lambda})(\hat{a}_t - \hat{\mu}_{t\lambda})'$, where $\hat{\mu}_{t\lambda} = \hat{\mu}_{1\lambda}1[t \leq \hat{T}] + \hat{\mu}_{2\lambda}1[t > \hat{T}]$. Following the same steps as in the consistency proof for $\hat{\lambda}$, one can show that:

$\hat{\Sigma}_e = T^{-1}\sum_{t=1}^T e_t e_t' + T^{-1}\sum_{t=1}^T(\hat{a}_t - a_t)(\hat{a}_t - a_t)' + T^{-1}\sum_{t=1}^T(\mu^0_t - \hat{\mu}_{t\lambda})(\mu^0_t - \hat{\mu}_{t\lambda})' + 2T^{-1}\sum_{t=1}^T e_t(\hat{a}_t - a_t)' + o_P(1) = L_1 + L_2 + L_3 + 2L_4 + o_P(1),$

where, evaluated at $\hat{\lambda}$ with $\hat{\lambda} - \lambda^0 = O_P(T^{-1})$, we already showed that $L_3 = \frac{(\lambda^0-\hat{\lambda})(1-\lambda^0)}{1-\hat{\lambda}}\,\delta^0_\mu\delta^{0\prime}_\mu + o_P(1) = o_P(1)$. Therefore,

$\hat{\Sigma}_e = L_1 + L_2 + 2L_4 + o_P(1).$ (B.33)

We first analyze $L_1$. By A11(i)-(ii) and the weak law of large numbers (WLLN) for near-epoch dependent processes,

$L_1 = T^{-1}\sum_{t=1}^T e_t e_t' = \Sigma_e + o_P(1) = O_P(1).$

Next, we analyze $L_2$. From (B.25),

$L_2 = T^{-1}\sum_{t=1}^T(\hat{a}_t - a_t)(\hat{a}_t - a_t)' = T^{-1}\sum_{t=1}^T[2u_t j_t(w_t'\delta) + j_t(w_t'\delta)^2][2u_t j_t(w_t'\delta) + j_t(w_t'\delta)^2]'$
$= 4T^{-1}\sum_{t=1}^T u_t^2 j_t j_t'(w_t'\delta)^2 + 4T^{-1}\sum_{t=1}^T u_t j_t j_t'(w_t'\delta)^3 + T^{-1}\sum_{t=1}^T j_t j_t'(w_t'\delta)^4 = 4A_3 + 4A_4 + A_5.$

By Markov's inequality and repeated application of Holder's inequality, very much as in showing that $T_{1\lambda}^{-1}\sum_{1\lambda}(\hat{a}_t - a_t) = o_P(1)$, it can be shown that $A_3 = o_P(1)$, $A_4 = o_P(1)$ and $A_5 = o_P(1)$ (the only difference is that, to do so, we need the moment conditions $\sup_t\|z_{t,i}\|_8 < \infty$ and $\sup_t\|h_{t,i}\|_4 < \infty$). Therefore, $L_2 = o_P(1)$. Similarly, using $\|e_{t,i}\|_4 < \infty$, it can be shown that $L_4 = T^{-1}\sum_{t=1}^T e_t(\hat{a}_t - a_t)' = o_P(1)$.
Since $L_1 = \Sigma_e + o_P(1)$, $L_2 = o_P(1)$, and $L_4 = o_P(1)$, from (B.33) we have:

$\hat{\Sigma}_e = \Sigma_e + o_P(1).$

Similarly, it can be shown that, because $\hat{\lambda} - \lambda^0 = O_P(T^{-1})$, $\hat{\Sigma}_{e,i} = T_{i\lambda}^{-1}\sum_{i\lambda}(\hat{a}_t - \hat{\mu}_{i\lambda})(\hat{a}_t - \hat{\mu}_{i\lambda})' = T_{i\lambda}^{-1}\sum_{i\lambda^0}(\hat{a}_t - \hat{\mu}_{i\lambda})(\hat{a}_t - \hat{\mu}_{i\lambda})' + o_P(1) = \Sigma_e + o_P(1)$; therefore $\hat{\Sigma}_{e,t} = \hat{\Sigma}_{e,1}1[t \leq \hat{T}] + \hat{\Sigma}_{e,2}1[t > \hat{T}] = \Sigma_e + o_P(1)$. This concludes the proof of the consistency of $\hat{\Sigma}_{e,t}$.
• Part (ii). Consistency of $\tilde{\lambda}$. The proof largely follows the same steps as the proof of the consistency of $\hat{\lambda}$. Let $\tilde{\mu}_t = \tilde{\mu}_{1\lambda}1[t \leq T_{1\lambda}] + \tilde{\mu}_{2\lambda}1[t > T_{1\lambda}]$, where $\tilde{\mu}_{i\lambda} = \big(\sum_{i\lambda}\hat{\Sigma}_{e,t}^{-1}\big)^{-1}\sum_{i\lambda}\hat{\Sigma}_{e,t}^{-1}\hat{a}_t$ are the GLS estimators of the mean of $\hat{a}_t$ before and after $[T\lambda] = T_{1\lambda}$. The FGLS objective function evaluated at the FGLS parameter estimators $\tilde{\mu}_{i\lambda}$ is:

$L_{FGLS}(\lambda, \hat{a}_t) = T^{-1}\sum_{t=1}^T[e_t + (\hat{a}_t - a_t) + (\mu^0_t - \tilde{\mu}_t)]'\hat{\Sigma}_{e,t}^{-1}[e_t + (\hat{a}_t - a_t) + (\mu^0_t - \tilde{\mu}_t)]$
$= T^{-1}\sum_{t=1}^T e_t'\hat{\Sigma}_{e,t}^{-1}e_t + T^{-1}\sum_{t=1}^T(\hat{a}_t - a_t)'\hat{\Sigma}_{e,t}^{-1}(\hat{a}_t - a_t) + T^{-1}\sum_{t=1}^T(\mu^0_t - \tilde{\mu}_t)'\hat{\Sigma}_{e,t}^{-1}(\mu^0_t - \tilde{\mu}_t) + 2T^{-1}\sum_{t=1}^T e_t'\hat{\Sigma}_{e,t}^{-1}(\hat{a}_t - a_t) + 2T^{-1}\sum_{t=1}^T e_t'\hat{\Sigma}_{e,t}^{-1}(\mu^0_t - \tilde{\mu}_t) + 2T^{-1}\sum_{t=1}^T(\hat{a}_t - a_t)'\hat{\Sigma}_{e,t}^{-1}(\mu^0_t - \tilde{\mu}_t) = \sum_{i=1}^6 L_i.$

As before, $L_1$, $L_2$ and $L_4$ do not depend on $\lambda$, so

$\min_\lambda L_{FGLS}(\lambda, \hat{a}_t) = \min_\lambda[L^*_{FGLS}(\lambda, \hat{a}_t) = L_3 + L_5 + L_6].$

Because $\hat{\Sigma}_{e,t} = \Sigma_e + o_P(1)$, we can follow the same steps as in the proof of the consistency of $\hat{\lambda}$ to show that $L_3 = O_P(1)$, while $L_5 = o_P(1)$ and $L_6 = o_P(1)$, so they are dominated by $L_3$. We can also follow the same steps as before to show that

$L_3 = \frac{(\lambda^0-\lambda)(1-\lambda^0)}{1-\lambda}\,\delta^{0\prime}_\mu\Sigma_e^{-1}\delta^0_\mu + o_P(1),$

uniformly in $\lambda < \lambda^0$. Since $\Sigma_e$ is pd, $\Sigma_e^{-1}$ is pd, and, because $\delta^0_\mu \neq 0$, $\delta^{0\prime}_\mu\Sigma_e^{-1}\delta^0_\mu > 0$. Therefore, $\frac{(\lambda^0-\lambda)(1-\lambda^0)}{1-\lambda}\,\delta^{0\prime}_\mu\Sigma_e^{-1}\delta^0_\mu$ is uniquely minimized at $\lambda = \lambda^0$. It follows that $\tilde{\lambda} \stackrel{P}{\rightarrow} \lambda^0$.
• Part (ii). Rate of convergence of $\tilde{\lambda}$. Using $\hat{\Sigma}_{e,t} \stackrel{P}{\rightarrow} \Sigma_e$, the proof follows the same steps as the proof of the convergence rate of $\hat{\lambda}$ and is omitted for simplicity.
Proof of Theorem 6.

• Part (i). We start by analyzing $\hat\lambda = \hat\lambda^{BP}_{RF}$, and therefore we treat $X_t$ as a scalar. From the RF with $\Pi = \Pi_i$ and $\alpha = \alpha_i$ for $i = 1,2$, we have:
\[
\hat\lambda = \arg\min_{\lambda\in I}\mathcal{L}_{BP}(\lambda,\hat\Pi_{i\lambda}) = \arg\min_{\lambda\in I} T\big[\mathcal{L}_{BP}(\lambda,\hat\Pi_{i\lambda}) - \mathcal{L}_{BP}(\hat\Pi)\big] + o_P(1),
\]
where $\mathcal{L}_{BP}(\hat\Pi)$ is the full-sample RF SSR evaluated at the full-sample estimator of $\Pi^0 = \Pi/r_T$, which we denote by $\hat\Pi$.
\begin{align*}
T\big[\mathcal{L}_{BP}(\lambda,\hat\Pi_{i\lambda}) - \mathcal{L}_{BP}(\hat\Pi)\big] &= \sum_{i=1}^{2}\sum_{i\lambda}(X_t - Z_t'\hat\Pi_{i\lambda})^2 - \sum_{t=1}^{T}(X_t - Z_t'\hat\Pi)^2\\
&= \sum_{1\lambda}\big[v_t - Z_t'(\hat\Pi_{1\lambda}-\Pi^0)\big]^2 + \sum_{2\lambda}\big[v_t - Z_t'(\hat\Pi_{2\lambda}-\Pi^0)\big]^2 - \sum_{t=1}^{T}\big[v_t - Z_t'(\hat\Pi-\Pi^0)\big]^2\\
&= -2\sum_{1\lambda}v_tZ_t'(\hat\Pi_{1\lambda}-\Pi^0) + (\hat\Pi_{1\lambda}-\Pi^0)'\Big(\sum_{1\lambda}Z_tZ_t'\Big)(\hat\Pi_{1\lambda}-\Pi^0)\\
&\quad - 2\sum_{2\lambda}v_tZ_t'(\hat\Pi_{2\lambda}-\Pi^0) + (\hat\Pi_{2\lambda}-\Pi^0)'\Big(\sum_{2\lambda}Z_tZ_t'\Big)(\hat\Pi_{2\lambda}-\Pi^0)\\
&\quad + 2\sum_{t=1}^{T}v_tZ_t'(\hat\Pi-\Pi^0) - (\hat\Pi-\Pi^0)'\Big(\sum_{t=1}^{T}Z_tZ_t'\Big)(\hat\Pi-\Pi^0)\\
&= -\sum_{1\lambda}v_tZ_t'(\hat\Pi_{1\lambda}-\Pi^0) - \sum_{2\lambda}v_tZ_t'(\hat\Pi_{2\lambda}-\Pi^0) + \sum_{t=1}^{T}v_tZ_t'(\hat\Pi-\Pi^0).
\end{align*}
Substituting the OLS expressions for $\hat\Pi_{i\lambda}-\Pi^0$ and $\hat\Pi-\Pi^0$,
\begin{align}
T\big[\mathcal{L}_{BP}(\lambda,\hat\Pi_{i\lambda}) - \mathcal{L}_{BP}(\hat\Pi)\big] &= -\Big(T^{-1/2}\sum_{1\lambda}v_tZ_t'\Big)\Big(T^{-1}\sum_{1\lambda}Z_tZ_t'\Big)^{-1}\Big(T^{-1/2}\sum_{1\lambda}Z_tv_t\Big)\nonumber\\
&\quad - \Big(T^{-1/2}\sum_{2\lambda}v_tZ_t'\Big)\Big(T^{-1}\sum_{2\lambda}Z_tZ_t'\Big)^{-1}\Big(T^{-1/2}\sum_{2\lambda}Z_tv_t\Big)\nonumber\\
&\quad + \Big(T^{-1/2}\sum_{t=1}^{T}v_tZ_t'\Big)\Big(T^{-1}\sum_{t=1}^{T}Z_tZ_t'\Big)^{-1}\Big(T^{-1/2}\sum_{t=1}^{T}Z_tv_t\Big).\tag{B.34}
\end{align}
By A1, A4-A6 and the FCLT, $T^{-1/2}\sum_{1\lambda}Z_tv_t \Rightarrow N_v^{*1/2}B_{h,q+1:2q}(\lambda)$ and $T^{-1/2}\sum_{2\lambda}Z_tv_t \Rightarrow N_v^{*1/2}\big[B_{h,q+1:2q}(1)-B_{h,q+1:2q}(\lambda)\big]$, where $B_h(\lambda)$ and $N_v^*$ were defined at the beginning of section 5. By A3 and A6, $T^{-1}\sum_{1\lambda}Z_tZ_t' \xrightarrow{P}\lambda Q$ and $T^{-1}\sum_{2\lambda}Z_tZ_t'\xrightarrow{P}(1-\lambda)Q$, uniformly in $\lambda$, and $T^{-1}\sum_{t=1}^{T}Z_tZ_t'\xrightarrow{P}Q$. Substituting these into (B.34),
\begin{align*}
T\big[\mathcal{L}_{BP}(\lambda,\hat\Pi_{i\lambda}) - \mathcal{L}_{BP}(\hat\Pi)\big] &\Rightarrow -\lambda^{-1}B_{h,q+1:2q}(\lambda)'N_v^{*1/2}Q^{-1}N_v^{*1/2}B_{h,q+1:2q}(\lambda)\\
&\quad - (1-\lambda)^{-1}\big[B_{h,q+1:2q}(1)-B_{h,q+1:2q}(\lambda)\big]'N_v^{*1/2}Q^{-1}N_v^{*1/2}\big[B_{h,q+1:2q}(1)-B_{h,q+1:2q}(\lambda)\big]\\
&\quad + B_{h,q+1:2q}(1)'N_v^{*1/2}Q^{-1}N_v^{*1/2}B_{h,q+1:2q}(1)\\
&= -[\lambda(1-\lambda)]^{-1}\big[B_{h,q+1:2q}(\lambda)-\lambda B_{h,q+1:2q}(1)\big]'(N_v^*)^{1/2}Q^{-1}(N_v^*)^{1/2}\big[B_{h,q+1:2q}(\lambda)-\lambda B_{h,q+1:2q}(1)\big].
\end{align*}
Letting $\Phi = (N_v^*)^{1/2}Q^{-1}(N_v^*)^{1/2}$, we have:
\begin{align*}
\hat\lambda^{BP}_{RF} &\Rightarrow \arg\inf_{\lambda\in\Lambda_\epsilon}\ -[\lambda(1-\lambda)]^{-1}\big[B_{h,q+1:2q}(\lambda)-\lambda B_{h,q+1:2q}(1)\big]'\,\Phi\,\big[B_{h,q+1:2q}(\lambda)-\lambda B_{h,q+1:2q}(1)\big]\\
&= \arg\sup_{\lambda\in\Lambda_\epsilon}\ [\lambda(1-\lambda)]^{-1}\big[B_{h,q+1:2q}(\lambda)-\lambda B_{h,q+1:2q}(1)\big]'\,\Phi\,\big[B_{h,q+1:2q}(\lambda)-\lambda B_{h,q+1:2q}(1)\big] = \mathcal{D}\big(B_{h,q+1:2q}(\lambda),\Phi\big).
\end{align*}
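In practice, the break-fraction estimator $\hat\lambda^{BP}_{RF}$ is computed by a grid search over trimmed candidate break dates, minimizing the sum of the two subsample SSRs of the reduced form. A minimal sketch (illustrative only, not the paper's code; `epsilon` denotes the trimming fraction):

```python
import numpy as np

def bp_break_fraction(X, Z, epsilon=0.15):
    """Grid-search the Bai-Perron-type SSR objective: for each candidate
    break date k, fit OLS of X on Z separately on t<=k and t>k and return
    the fraction k/T minimizing the total sum of squared residuals."""
    T = len(X)
    def ssr(Xs, Zs):
        coef, *_ = np.linalg.lstsq(Zs, Xs, rcond=None)
        resid = Xs - Zs @ coef
        return resid @ resid
    best_k, best_val = None, np.inf
    for k in range(int(epsilon * T), int((1 - epsilon) * T) + 1):
        val = ssr(X[:k], Z[:k]) + ssr(X[k:], Z[k:])
        if val < best_val:
            best_k, best_val = k, val
    return best_k / T

# toy reduced form with a coefficient break at t = 200 (T = 400)
rng = np.random.default_rng(1)
T, q = 400, 2
Z = rng.normal(size=(T, q))
t = np.arange(T)
Pi1, Pi2 = np.array([1.0, 0.5]), np.array([-1.0, 1.5])
X = np.where(t < 200, Z @ Pi1, Z @ Pi2) + 0.5 * rng.normal(size=T)
lam_hat = bp_break_fraction(X, Z)
assert abs(lam_hat - 0.5) < 0.05
```

With a sizable coefficient shift, the estimated fraction concentrates tightly around the true break fraction, consistent with the fast rate used in the proof.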
• Part (i). We continue by analyzing $\tilde\lambda_{RF}$. Since there is no RF break, the $\tilde\beta_{i\lambda}$ are subsample estimators of $\beta^0 = \beta_1^0 = \beta_2^0$. Therefore, by standard arguments, it can be shown that $\hat\Sigma_{v,t}\xrightarrow{P}\Sigma_v$ uniformly in $\lambda$, regardless of which candidate break-point $\lambda$ is used to compute $\hat\Sigma_{v,t}$. Therefore, $\hat\Sigma_{v,t} = \Sigma_v + o_P(1)$ also when $\lambda = \hat\lambda^{BP}_{RF}$ is used to construct $\hat\Sigma_{v,t}$. Letting $\tilde\lambda = \tilde\lambda_{RF}$, we have:
\[
\tilde\lambda = \arg\min_{\lambda\in I}\mathcal{L}_{FGLS}(\lambda,\tilde\beta_{i\lambda}) = \arg\min_{\lambda\in I}T\big[\mathcal{L}_{FGLS}(\lambda,\tilde\beta_{i\lambda}) - \mathcal{L}_{FGLS}(\tilde\beta)\big],
\]
where $\mathcal{L}_{FGLS}(\tilde\beta)$ is the full-sample RF FGLS objective function evaluated at the full-sample estimator of $\beta^0 = \mathrm{vec}(\Pi^0)$, which we denote by $\tilde\beta$.
\[
T\big[\mathcal{L}_{FGLS}(\lambda,\tilde\beta_{i\lambda}) - \mathcal{L}_{FGLS}(\tilde\beta)\big] = \sum_{i=1}^{2}\sum_{i\lambda}(X_t - Z_t'\tilde\beta_{i\lambda})'\hat\Sigma_{v,t}^{-1}(X_t - Z_t'\tilde\beta_{i\lambda}) - \sum_{t=1}^{T}(X_t - Z_t'\tilde\beta)'\hat\Sigma_{v,t}^{-1}(X_t - Z_t'\tilde\beta).
\]
Using calculations similar to those for the BP estimator, together with $\hat\Sigma_{v,t} = \Sigma_v + o_P(1)$, one can show that:
\begin{align*}
T\big[\mathcal{L}_{FGLS}(\lambda,\tilde\beta_{i\lambda}) - \mathcal{L}_{FGLS}(\tilde\beta)\big] &= -\sum_{i=1}^{2}\Big(T^{-1/2}\sum_{i\lambda}v_t\otimes Z_t\Big)'\Big[\Sigma_v^{-1}\otimes\Big(T^{-1}\sum_{i\lambda}Z_tZ_t'\Big)^{-1}\Big]\Big(T^{-1/2}\sum_{i\lambda}v_t\otimes Z_t\Big)\\
&\quad + \Big(T^{-1/2}\sum_{t=1}^{T}v_t\otimes Z_t\Big)'\Big[\Sigma_v^{-1}\otimes\Big(T^{-1}\sum_{t=1}^{T}Z_tZ_t'\Big)^{-1}\Big]\Big(T^{-1/2}\sum_{t=1}^{T}v_t\otimes Z_t\Big) + o_P(1)\\
&= -\Big(T^{-1/2}\sum_{1\lambda}v_t\otimes Z_t\Big)'\big[\Sigma_v^{-1}\otimes(\lambda Q)^{-1}\big]\Big(T^{-1/2}\sum_{1\lambda}v_t\otimes Z_t\Big)\\
&\quad - \Big(T^{-1/2}\sum_{2\lambda}v_t\otimes Z_t\Big)'\big\{\Sigma_v^{-1}\otimes[(1-\lambda)Q]^{-1}\big\}\Big(T^{-1/2}\sum_{2\lambda}v_t\otimes Z_t\Big)\\
&\quad + \Big(T^{-1/2}\sum_{t=1}^{T}v_t\otimes Z_t\Big)'\big[\Sigma_v^{-1}\otimes Q^{-1}\big]\Big(T^{-1/2}\sum_{t=1}^{T}v_t\otimes Z_t\Big) + o_P(1)\\
&= -[\lambda(1-\lambda)]^{-1}\,T^{-1/2}\Big(\sum_{1\lambda}v_t\otimes Z_t - \lambda\sum_{t=1}^{T}v_t\otimes Z_t\Big)'\big(\Sigma_v^{-1}\otimes Q^{-1}\big)\\
&\qquad\qquad\times T^{-1/2}\Big(\sum_{1\lambda}v_t\otimes Z_t - \lambda\sum_{t=1}^{T}v_t\otimes Z_t\Big) + o_P(1).
\end{align*}
Under A1, A4 and A6, by the FCLT, $T^{-1/2}\sum_{1\lambda}v_t\otimes Z_t \Rightarrow N_v^{1/2}B_{h,q+1:(p_2+1)q}(\lambda)$. Let $B_{vz}(\lambda) = B_{h,q+1:(p_2+1)q}(\lambda)$. Then
\[
T\big[\mathcal{L}_{FGLS}(\lambda,\tilde\beta_{i\lambda}) - \mathcal{L}_{FGLS}(\tilde\beta)\big] \Rightarrow -[\lambda(1-\lambda)]^{-1}\big[B_{vz}(\lambda)-\lambda B_{vz}(1)\big]'N_v^{1/2}\big(\Sigma_v^{-1}\otimes Q^{-1}\big)N_v^{1/2}\big[B_{vz}(\lambda)-\lambda B_{vz}(1)\big],
\]
so
\[
\tilde\lambda \Rightarrow \arg\sup_{\lambda\in I}\ [\lambda(1-\lambda)]^{-1}\big[B_{vz}(\lambda)-\lambda B_{vz}(1)\big]'N_v^{1/2}\big(\Sigma_v^{-1}\otimes Q^{-1}\big)N_v^{1/2}\big[B_{vz}(\lambda)-\lambda B_{vz}(1)\big] = \mathcal{D}\big(B_{vz}(\lambda),\,N_v^{1/2}(\Sigma_v^{-1}\otimes Q^{-1})N_v^{1/2}\big).
\]
Note that under conditional homoskedasticity, $N_v = \Sigma_v\otimes Q$, so $N_v^{1/2}(\Sigma_v^{-1}\otimes Q^{-1})N_v^{1/2} = I_{p_2 q}$.
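Critical values and quantiles of the limit functionals $\mathcal{D}(\cdot,\cdot)$ above are typically obtained by simulation, replacing the Brownian motion with normalized partial sums of Gaussian draws. A rough sketch of one draw in the identity-weighting (conditionally homoskedastic) case noted above (an illustration under stated assumptions, not the paper's simulation code):

```python
import numpy as np

def draw_argmax_D(dim, n_grid=1000, eps=0.15, rng=None):
    """One draw from the argmax functional D(B(lam), I): simulate B on a
    grid via normalized partial sums, form the bridge B(lam) - lam*B(1),
    and return the argmax of [lam(1-lam)]^{-1} ||bridge||^2 over trimmed
    candidate fractions lam."""
    rng = rng or np.random.default_rng()
    increments = rng.normal(size=(n_grid, dim)) / np.sqrt(n_grid)
    B = np.cumsum(increments, axis=0)            # B(lam) at lam = t/n_grid
    lam = np.arange(1, n_grid + 1) / n_grid
    bridge = B - lam[:, None] * B[-1]
    obj = (bridge ** 2).sum(axis=1) / (lam * (1 - lam) + 1e-12)
    keep = (lam >= eps) & (lam <= 1 - eps)
    return lam[keep][np.argmax(obj[keep])]

draws = [draw_argmax_D(dim=2, rng=np.random.default_rng(s)) for s in range(200)]
assert all(0.15 <= d <= 0.85 for d in draws)
```

Repeating many draws yields the simulated distribution of the break-fraction estimator under the corresponding weighting matrix.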
• Part (ii). This part can be shown by exactly the same steps as in part (i) and is omitted.
• Part (iii). The proof for a VMC break is slightly more complicated because of the estimation error $(\hat a_t - a_t)$. We start with $\hat\lambda = \hat\lambda^{BP}_{VMC}$, therefore treating $a_t = j_tu_t^2$ as a scalar. As before,
\[
\hat\lambda = \arg\min_{\lambda\in[\epsilon,1-\epsilon]} T\big[\mathcal{L}_{BP}(\lambda,\hat\mu_{i\lambda},\hat a_t) - \mathcal{L}_{BP}(\hat\mu,\hat a_t)\big],
\]
where $\mathcal{L}_{BP}(\lambda,\hat\mu_{i\lambda},\hat a_t)$ is the BP objective function from estimating (4.2) using $\hat a_t = j_t\hat u_t^2$ instead of $a_t = j_tu_t^2$, $\mathcal{L}_{BP}(\hat\mu,\hat a_t)$ is the OLS objective function obtained from estimating (4.2) over the full sample, using $\hat a_t$ instead of $a_t$, and $\hat\mu = T^{-1}\sum_{t=1}^{T}\hat a_t$. We have:
\[
T\big[\mathcal{L}_{BP}(\lambda,\hat\mu_{i\lambda},\hat a_t) - \mathcal{L}_{BP}(\hat\mu,\hat a_t)\big] = \sum_{i=1}^{2}\sum_{i\lambda}(\hat a_t - \hat\mu_{i\lambda})^2 - \sum_{t=1}^{T}(\hat a_t - \hat\mu)^2 = \sum_{i=1}^{2}\sum_{i\lambda}(\hat\mu - \hat\mu_{i\lambda})(2\hat a_t - \hat\mu_{i\lambda} - \hat\mu) = \sum_{i=1}^{2}(\hat\mu - \hat\mu_{i\lambda})\,T_{i\lambda}\,(\hat\mu_{i\lambda} - \hat\mu).
\]
Now, $\hat\mu = T^{-1}\sum_{t=1}^{T}\hat a_t = T^{-1}\sum_{i=1}^{2}\sum_{i\lambda}\hat a_t = \lambda\hat\mu_{1\lambda} + (1-\lambda)\hat\mu_{2\lambda}$. Therefore, $\hat\mu - \hat\mu_{1\lambda} = (1-\lambda)(\hat\mu_{2\lambda} - \hat\mu_{1\lambda})$ and $\hat\mu - \hat\mu_{2\lambda} = \lambda(\hat\mu_{1\lambda} - \hat\mu_{2\lambda})$. Therefore,
\begin{align}
T\big[\mathcal{L}_{BP}(\lambda,\hat\mu_{i\lambda},\hat a_t) - \mathcal{L}_{BP}(\hat\mu,\hat a_t)\big] &= -\lambda(1-\lambda)^2T(\hat\mu_{1\lambda}-\hat\mu_{2\lambda})^2 - \lambda^2(1-\lambda)T(\hat\mu_{1\lambda}-\hat\mu_{2\lambda})^2\nonumber\\
&= -\lambda(1-\lambda)T(\hat\mu_{1\lambda}-\hat\mu_{2\lambda})^2.\tag{B.35}
\end{align}
Let $\mu^0 = \mu_1^0 = \mu_2^0$. Because $\hat a_t = (\hat a_t - a_t) + \mu^0 + e_t$,
\begin{align}
T^{1/2}(\hat\mu_{1\lambda}-\hat\mu_{2\lambda}) &= T^{1/2}\Big[T_{1\lambda}^{-1}\sum_{1\lambda}\hat a_t - T_{2\lambda}^{-1}\sum_{2\lambda}\hat a_t\Big]\nonumber\\
&= T^{1/2}\Big[T_{1\lambda}^{-1}\sum_{1\lambda}(\hat a_t - a_t) - T_{2\lambda}^{-1}\sum_{2\lambda}(\hat a_t - a_t)\Big] + T^{1/2}\Big[T_{1\lambda}^{-1}\sum_{1\lambda}e_t - T_{2\lambda}^{-1}\sum_{2\lambda}e_t\Big]\nonumber\\
&\equiv \mathcal{B}_1 + \mathcal{B}_2.\tag{B.36}
\end{align}
We first show that $\mathcal{B}_1 = o_P(1)$ uniformly in $\lambda$. To that end, recall that $j_t = z_{t,k}z_{t,k^*}$ for some $k, k^* \in \{1,\ldots,q\}$. Then:
\begin{align}
T^{-1/2}\sum_{1\lambda}(\hat a_t - a_t) &= T^{-1/2}\sum_{1\lambda}u_tj_tw_t'\hat\delta + T^{-1/2}\sum_{1\lambda}j_t(w_t'\hat\delta)^2\nonumber\\
&= \sum_{i=1}^{p}T^{-1/2}\sum_{1\lambda}h_{t,k}z_{t,k}w_{t,i}\,\hat\delta_i + \sum_{i,j=1}^{p}T^{-1/2}\sum_{1\lambda}h_{t,k}z_{t,k}w_{t,i}w_{t,j}\,\hat\delta_i\hat\delta_j \equiv \sum_{i=1}^{p}\mathcal{L}_i^*\,\hat\delta_i + \sum_{i,j=1}^{p}\mathcal{L}_{i,j}^*\,\hat\delta_i\hat\delta_j.\tag{B.37}
\end{align}
We now analyze $\mathcal{L}_i^*$ and $\mathcal{L}_{i,j}^*$ in turn. By A12,
\begin{align}
\mathcal{L}_i^* = T^{-1/2}\sum_{1\lambda}h_{t,k}z_{t,k}w_{t,i} &= T^{-1/2}\sum_{1\lambda}\big[h_{t,k}z_{t,k}w_{t,i} - E(h_{t,k}z_{t,k}w_{t,i})\big] + T^{-1/2}\sum_{1\lambda}E(h_{t,k}z_{t,k}w_{t,i})\nonumber\\
&= T^{-1/2}\sum_{1\lambda}\big[h_{t,k}z_{t,k}w_{t,i} - E(h_{t,k}z_{t,k}w_{t,i})\big] + \lambda T^{1/2}\ell_i^{(1)} + o(1).\tag{B.38}
\end{align}
By A12, $[h_{t,k}z_{t,k}w_{t,i} - E(h_{t,k}z_{t,k}w_{t,i})]$ is a mean-zero mixing process with the rates defined in A12. If $\sup_t\|h_{t,k}z_{t,k}w_{t,i} - E(h_{t,k}z_{t,k}w_{t,i})\|_a < \infty$ for some $a > 2$, then by the FCLT for mixing processes (a special case of the FCLT in Wooldridge and White (1988), Theorem 2.11), $T^{-1/2}\sum_{1\lambda}[h_{t,k}z_{t,k}w_{t,i} - E(h_{t,k}z_{t,k}w_{t,i})] = O_P(1)$. We only consider the case $w_{t,i} = x_{t,i_1}$, with $i_1 = i - p_1$, because the case $w_{t,i} = z_{1t,i}$ leads to no additional insights. We now verify that $\sup_t\|h_{t,k}z_{t,k}x_{t,i_1} - E(h_{t,k}z_{t,k}x_{t,i_1})\|_a < \infty$ for $a$ defined in A12. By the triangle inequality and Hölder's inequality, for some $s\in\{q+1,\ldots,(p_2+1)q\}$, we have:
\begin{align*}
\sup_t\|h_{t,k}z_{t,k}x_{t,i_1} - E(h_{t,k}z_{t,k}x_{t,i_1})\|_a &\le 2\sup_t\|h_{t,k}z_{t,k}x_{t,i_1}\|_a = 2\sup_t\Big\|\sum_{i_2=1}^{q}h_{t,k}z_{t,k}z_{t,i_2}\Pi^0_{t,i_1,i_2} + h_{t,k}z_{t,k}v_{t,i_1}\Big\|_a\\
&= 2\sup_t\Big\|\sum_{i_2=1}^{q}h_{t,k}z_{t,k}z_{t,i_2}\Pi^0_{t,i_1,i_2} + h_{t,k}h_{t,s}\Big\|_a\\
&\le 2\sum_{i_2=1}^{q}\sup_t\|h_{t,k}z_{t,k}z_{t,i_2}\|_a\,|\Pi^0_{t,i_1,i_2}| + 2\sup_t\|h_{t,k}h_{t,s}\|_a\\
&\le 2\sum_{i_2=1}^{q}\sup_t\|h_{t,k}\|_{2a}\sup_t\|z_{t,k}\|_{4a}\sup_t\|z_{t,i_2}\|_{4a}\,|\Pi^0_{t,i_1,i_2}| + 2\sup_t\|h_{t,k}\|_{2a}\sup_t\|h_{t,s}\|_{2a} < \infty.
\end{align*}
Therefore, $T^{-1/2}\sum_{1\lambda}[h_{t,k}z_{t,k}w_{t,i} - E(h_{t,k}z_{t,k}w_{t,i})] = O_P(1)$. Using this in (B.38), we have:
\[
\mathcal{L}_i^* = O_P(1) + \lambda T^{1/2}\ell_i^{(1)}.
\]
Similarly, using A12 again, it can be shown that:
\[
\mathcal{L}_{i,j}^* = O_P(1) + \lambda T^{1/2}\ell_{i,j}^{(2)}.
\]
Substituting these last two equations into (B.37), we have:
\begin{align*}
T^{-1/2}\sum_{1\lambda}(\hat a_t - a_t) &= \sum_{i=1}^{p}O_P(1)\hat\delta_i + \lambda T^{1/2}\sum_{i=1}^{p}\ell_i^{(1)}\hat\delta_i + \sum_{i,j=1}^{p}O_P(1)\hat\delta_i\hat\delta_j + \lambda T^{1/2}\sum_{i,j=1}^{p}\ell_{i,j}^{(2)}\hat\delta_i\hat\delta_j\\
&= o_P(1) + \lambda T^{1/2}\sum_{i=1}^{p}\ell_i^{(1)}\hat\delta_i + \lambda T^{1/2}\sum_{i,j=1}^{p}\ell_{i,j}^{(2)}\hat\delta_i\hat\delta_j.
\end{align*}
Similarly, it can be shown that:
\[
T^{-1/2}\sum_{2\lambda}(\hat a_t - a_t) = o_P(1) + (1-\lambda)T^{1/2}\sum_{i=1}^{p}\ell_i^{(1)}\hat\delta_i + (1-\lambda)T^{1/2}\sum_{i,j=1}^{p}\ell_{i,j}^{(2)}\hat\delta_i\hat\delta_j.
\]
Therefore,
\begin{align*}
\mathcal{B}_1 &= T^{1/2}\Big[T_{1\lambda}^{-1}\sum_{1\lambda}(\hat a_t - a_t) - T_{2\lambda}^{-1}\sum_{2\lambda}(\hat a_t - a_t)\Big]\\
&= o_P(1) + T^{1/2}\sum_{i=1}^{p}\ell_i^{(1)}\hat\delta_i + T^{1/2}\sum_{i,j=1}^{p}\ell_{i,j}^{(2)}\hat\delta_i\hat\delta_j - o_P(1) - T^{1/2}\sum_{i=1}^{p}\ell_i^{(1)}\hat\delta_i - T^{1/2}\sum_{i,j=1}^{p}\ell_{i,j}^{(2)}\hat\delta_i\hat\delta_j = o_P(1),
\end{align*}
uniformly in $\lambda$. Therefore, the estimation error $\mathcal{B}_1$ plays no role in the asymptotic distribution of the VMC break-point estimator. On the other hand, by A11(i)-(ii) and the FCLT, $T^{-1/2}\sum_{1\lambda}e_t \Rightarrow \Sigma_e^{*1/2}B_{e,1}(\lambda)$, where $\Sigma_e^*$ is the $(1,1)$ element of $\Sigma_e$, a scalar. Therefore,
\begin{align}
\mathcal{B}_2 &= T^{1/2}\Big[T_{1\lambda}^{-1}\sum_{1\lambda}e_t - T_{2\lambda}^{-1}\sum_{2\lambda}e_t\Big] = \lambda^{-1}T^{-1/2}\sum_{1\lambda}e_t - (1-\lambda)^{-1}T^{-1/2}\sum_{2\lambda}e_t + o_P(1)\nonumber\\
&= [\lambda(1-\lambda)]^{-1}\,T^{-1/2}\Big(\sum_{1\lambda}e_t - \lambda\sum_{t=1}^{T}e_t\Big) + o_P(1).\tag{B.39}
\end{align}
Substituting $\mathcal{B}_1 = o_P(1)$ and (B.39) into (B.36), we have $T^{1/2}(\hat\mu_{1\lambda}-\hat\mu_{2\lambda}) = o_P(1) + [\lambda(1-\lambda)]^{-1}\,T^{-1/2}\big(\sum_{1\lambda}e_t - \lambda\sum_{t=1}^{T}e_t\big)$. Using this in (B.35), we have:
\begin{align}
-T\big[\mathcal{L}_{BP}(\lambda,\hat\mu_{i\lambda},\hat a_t) - \mathcal{L}_{BP}(\hat\mu,\hat a_t)\big] &= [\lambda(1-\lambda)]^{-1}\Big[T^{-1/2}\Big(\sum_{1\lambda}e_t - \lambda\sum_{t=1}^{T}e_t\Big)\Big]^2 + o_P(1)\tag{B.40}\\
&\Rightarrow \Sigma_e^*\,[\lambda(1-\lambda)]^{-1}\big[B_{e,1}(\lambda)-\lambda B_{e,1}(1)\big]^2.\nonumber
\end{align}
Because the scalar $\Sigma_e^*$ plays no role in maximizing $-T[\mathcal{L}_{BP}(\lambda,\hat\mu_{i\lambda},\hat a_t) - \mathcal{L}_{BP}(\hat\mu,\hat a_t)]$ over $\lambda$,
\[
\hat\lambda \Rightarrow \arg\max_{\lambda\in[\epsilon,1-\epsilon]}\ [\lambda(1-\lambda)]^{-1}\big[B_{e,1}(\lambda)-\lambda B_{e,1}(1)\big]^2 = \mathcal{D}\big(B_{e,1}(\lambda),1\big).
\]
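By (B.35), minimizing the BP objective over $\lambda$ is equivalent to maximizing $\lambda(1-\lambda)(\hat\mu_{1\lambda}-\hat\mu_{2\lambda})^2$. A minimal mean-shift sketch on a scalar series standing in for $\hat a_t$ (illustrative only, not the paper's code):

```python
import numpy as np

def vmc_break_fraction(a_hat, eps=0.15):
    """Mean-shift break-fraction estimator: maximize
    lam*(1-lam)*(mean_pre - mean_post)^2 over trimmed fractions, i.e.
    the reduction in SSR from splitting the sample (cf. eq. B.35)."""
    T = len(a_hat)
    best_lam, best_val = None, -np.inf
    for k in range(int(eps * T), int((1 - eps) * T) + 1):
        lam = k / T
        gap = a_hat[:k].mean() - a_hat[k:].mean()
        val = lam * (1 - lam) * gap ** 2
        if val > best_val:
            best_lam, best_val = lam, val
    return best_lam

# toy series with a mean shift from 1 to 3 at the midpoint
rng = np.random.default_rng(2)
a_hat = np.concatenate([1.0 + rng.normal(size=150), 3.0 + rng.normal(size=150)])
lam_hat = vmc_break_fraction(a_hat)
assert abs(lam_hat - 0.5) < 0.05
```

A large shift in the mean of $\hat a_t$ produces a break-fraction estimate concentrated tightly at the true fraction, as the proof's fast convergence rate suggests.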
• Part (iii). Similarly to the proof in part (i), it can be shown that $\hat\Sigma_{e,t}\xrightarrow{P}\Sigma_e$. Then, treating $e_t$, $a_t$ and $\hat a_t$ as vectors and following the same steps as for the BP VMC break-point estimator, it can be shown that:
\[
T\big[\mathcal{L}_{FGLS}(\lambda,\tilde\mu_{i\lambda}) - \mathcal{L}_{FGLS}(\tilde\mu)\big] = -[\lambda(1-\lambda)]^{-1}\Big[T^{-1/2}\Big(\sum_{1\lambda}e_t - \lambda\sum_{t=1}^{T}e_t\Big)\Big]'\Sigma_e^{-1}\Big[T^{-1/2}\Big(\sum_{1\lambda}e_t - \lambda\sum_{t=1}^{T}e_t\Big)\Big] + o_P(1),
\]
so
\[
\tilde\lambda_{VMC} \Rightarrow \arg\max_{\lambda\in[\epsilon,1-\epsilon]}\ [\lambda(1-\lambda)]^{-1}\big[B_e(\lambda)-\lambda B_e(1)\big]'\big[B_e(\lambda)-\lambda B_e(1)\big] = \mathcal{D}\big(B_e(\lambda),\,I_{q(q+1)/2}\big).
\]
Proof of Theorem 7.

• Part (i). This result is a special case of the limit distribution derived in Theorem 1. In Theorem 1, imposing A6 and no RF break ($\Pi^a = \Pi_i^a$, $\alpha = \alpha_i$), we have:
\begin{align*}
V_{GMM} &= \big[(\Pi_1^{a\prime}M_1 + \Pi_2^{a\prime}M_2)(N_u)^{-1}(M_1\Pi_1^a + M_2\Pi_2^a)\big]^{-1}\\
&= \big[\big(\Pi^{a\prime}\lambda Q + \Pi^{a\prime}(1-\lambda)Q\big)(N_u)^{-1}\big(\lambda Q\Pi^a + (1-\lambda)Q\Pi^a\big)\big]^{-1}\\
&= \big[\Pi^{a\prime}QN_u^{-1}Q\Pi^a\big]^{-1} = V^*.
\end{align*}
• Part (ii). Under no break, using (B.1) but with $\tilde Z$, defined just after (3.1), replacing $Z$, we have:
\begin{align}
\hat\theta_{B\text{-}GMM} - \theta^0 &= R_T\big[R_TT^{-1}W'\tilde Z(\hat N_u^a)^{-1}\tilde Z'WT^{-1}R_T\big]^{-1}R_TT^{-1}W'\tilde Z(\hat N_u^a)^{-1}\big[T^{-1}\tilde Z'U\big]\tag{B.41}\\
&= \big[R_TT^{-1}W'\tilde Z(\hat N_u^a)^{-1}\tilde Z'WT^{-1}R_T\big]^{-1}R_TT^{-1}W'\tilde Z(\hat N_u^a)^{-1}\big[r_TT^{-1}\tilde Z'U\big] + o_P(1).\nonumber
\end{align}
By the FCLT, $T^{-1/2}\sum_{i\lambda}Z_tu_t = O_P(1)$, so $r_TT^{-1}\sum_{i\lambda}Z_tu_t = o_P(1)$ uniformly in $\lambda$. So, $r_TT^{-1}\tilde Z'U = \big[r_TT^{-1}\sum_{1\lambda}u_tZ_t',\ r_TT^{-1}\sum_{2\lambda}u_tZ_t'\big]' = o_P(1)$. As in the proof of Theorem 1, $R_TT^{-1}W'\tilde Z = O_P(1)$ and $(\hat N_u^a)^{-1} = O_P(1)$, so $\hat\theta_{B\text{-}GMM}\xrightarrow{P}\theta^0$. We first state the limit distribution of B-GMM conditional on the break-fraction estimator, which we call $\hat\tau$ and which could be either $\hat\lambda^{BP}_{RF}$ or $\tilde\lambda_{RF}$. Following the same steps as for Theorem 1(ii), under A1, A3, and A4,
\[
\Lambda_T(\hat\theta_{B\text{-}GMM} - \theta^0)\xrightarrow{D}\mathcal{N}(0, V_{\hat\tau}),\quad\text{where}\quad V_{\hat\tau} = \big[\Pi^{a\prime}\big(M_{1\hat\tau}N_{u,1\hat\tau}^{-1}M_{1\hat\tau} + M_{2\hat\tau}N_{u,2\hat\tau}^{-1}M_{2\hat\tau}\big)\Pi^a\big]^{-1}.
\]
By A6, $M_{1\hat\tau} = \hat\tau Q$, $M_{2\hat\tau} = (1-\hat\tau)Q$, $N_{u,1\hat\tau} = \hat\tau N_u$ and $N_{u,2\hat\tau} = (1-\hat\tau)N_u$. Therefore, $V_{\hat\tau} = [\Pi^{a\prime}QN_u^{-1}Q\Pi^a]^{-1} = V^*$, and $\Lambda_T(\hat\theta_{B\text{-}GMM}-\theta^0)\xrightarrow{D}\mathcal{N}(0,V^*)$. We now show that $\hat\tau$ and $\Lambda_T(\hat\theta_{B\text{-}GMM}-\theta^0)$ are asymptotically independent. We only show this for the BP break-fraction estimator, $\hat\tau = \hat\lambda$; the proof for $\hat\tau = \tilde\lambda$ follows by very similar arguments. Using the facts that $T^{-1}\sum_{1\tau}Z_tZ_t'\xrightarrow{P}\tau Q$ and $T^{-1}\sum_{2\tau}Z_tZ_t'\xrightarrow{P}(1-\tau)Q$ uniformly in $\tau$, and $T^{-1}\sum_{t=1}^{T}Z_tZ_t'\xrightarrow{P}Q$, in (B.34), and rearranging terms, we obtain:
\[
-T\big[\mathcal{L}_{BP}(\tau,\hat\Pi_{i\tau}) - \mathcal{L}_{BP}(\hat\Pi)\big] = [\tau(1-\tau)]^{-1}\,T^{-1/2}\Big(\sum_{1\tau}Z_tv_t - \tau\sum_{t=1}^{T}Z_tv_t\Big)'Q^{-1}\,T^{-1/2}\Big(\sum_{1\tau}Z_tv_t - \tau\sum_{t=1}^{T}Z_tv_t\Big) + o_P(1).
\]
On the other hand, using (B.41) and Assumption A6,
\[
\Lambda_T(\hat\theta_{B\text{-}GMM} - \theta^0) = \big[(V^*)^{-1}\Pi^{a\prime}QN_u^{-1}\big]\,T^{-1/2}\sum_{t=1}^{T}Z_tu_t + o_P(1).
\]
Now, $T^{-1/2}\big(\sum_{1\tau}Z_tv_t - \tau\sum_{t=1}^{T}Z_tv_t\big)$ and $T^{-1/2}\sum_{t=1}^{T}Z_tu_t$ have asymptotic covariance $\tau N_{uv} - \tau N_{uv} = 0$ under A6, and are jointly asymptotically normally distributed by the FCLT for all $\tau$ (applied under A1, A4 and A6). Therefore, they are asymptotically independent, and hence so are $\hat\lambda$ and $\Lambda_T(\hat\theta_{B\text{-}GMM}-\theta^0)$. Therefore, the unconditional limit distribution of B-GMM is the same as the limit distribution conditional on $\hat\tau$: $\Lambda_T(\hat\theta_{B\text{-}GMM}-\theta^0)\xrightarrow{D}\mathcal{N}(0,V^*)$.
• Part (iii). Let $\hat\tau = \hat\lambda^{BP}_{SMI}$ or $\hat\tau = \hat\lambda^{BP}_{VMC}$. Then, following the same steps as in part (ii), $\Lambda_T(\hat\theta_{B\text{-}GMM}-\theta^0)\xrightarrow{D}\mathcal{N}(0,V^*)$ conditional on $\hat\tau$. From (B.40) (or a similar equation for the SMI break-point estimator), we have:
\[
-T\big[\mathcal{L}_{BP}(\tau,\hat\mu_{i\tau}) - \mathcal{L}_{BP}(\hat\mu)\big] = [\tau(1-\tau)]^{-1}\Big[T^{-1/2}\Big(\sum_{1\tau}e_t - \tau\sum_{t=1}^{T}e_t\Big)\Big]^2 + o_P(1).
\]
Note that $T^{-1/2}\sum_{t=1}^{T}Z_tu_t$ and $T^{-1/2}\big(\sum_{1\tau}e_t - \tau\sum_{t=1}^{T}e_t\big)$ have asymptotic covariance $\tau N_{ue} - \tau N_{ue} = 0$ under A13, and are jointly asymptotically normally distributed by the FCLT for all $\tau$, under A1 and either A10(i)-(ii) or A11(i)-(ii). Therefore, they are asymptotically independent, so $\hat\lambda^{BP}_{SMI}$ and $\hat\theta_{B\text{-}GMM}$ are asymptotically independent, and $\hat\lambda^{BP}_{VMC}$ and $\hat\theta_{B\text{-}GMM}$ are asymptotically independent. Therefore, also unconditionally, $\Lambda_T(\hat\theta_{B\text{-}GMM}-\theta^0)\xrightarrow{D}\mathcal{N}(0,V^*)$. A similar argument holds for $\tilde\lambda_{SMI}$ and $\tilde\lambda_{VMC}$.
C Extended framework

In this section, we consider the extended framework of section 7.4, where the moment conditions become nonlinear in the parameters of interest, but remain linear with respect to (nonlinear) transformations of these parameters. Specifically, using the notation introduced in section 2, we consider the following model:
\begin{align*}
y_t &= Z_{1t}'\theta_{z_1}^0 + X_t'\theta_x^0 + u_t = W_t'\theta^0 + u_t,\\
X_t' &= Z_t'\Pi^* + v_t',
\end{align*}
where we are not directly interested in estimating the vector of $p$ parameters $\theta^0$, but rather some underlying (structural) parameters, say $\eta^0$, that are connected to $\theta^0$ by nonlinear transformations. Similarly to the application considered in section 7.4, we consider
\[
\theta^0 = f(\eta^0)\quad\text{with}\quad \begin{pmatrix}\theta_{z_1}^0\\ \theta_x^0\end{pmatrix} = \begin{pmatrix}f_1(\eta^0)\\ f_2(\eta^0)\end{pmatrix},
\]
where $f(\cdot)$ is a nonlinear vectorial function and $\eta^0$ is a vector of $p_\eta$ parameters of interest (with $p_\eta \le p$) such that $\partial f(\eta^0)/\partial\eta'$ is a full column rank matrix. The associated standard optimal GMM estimator minimizes
\[
\mathcal{L}_{GMM}(\eta) = g_T(\eta)'\hat N_u^{-1}g_T(\eta)\quad\text{with}\quad g_T(\eta) = \frac1T\sum_{t=1}^{T}Z_t\big(y_t - W_t'f(\eta)\big),
\]
where $\hat N_u \xrightarrow{P} N_u = \mathrm{AVar}\big[T^{1/2}g_T(\eta^0)\big]$. The moment conditions remain affine with respect to a (nonlinear) transformation of the parameters:
\[
g_T(\eta) = A_T + B_Tf(\eta).
\]
This feature greatly simplifies our analysis, which remains close to the linear model already discussed. For example, the probability limit of the Hessian of the GMM minimand can be written as
\begin{align*}
\operatorname*{Plim}_T\Big[\frac12\frac{\partial^2\mathcal{L}_{GMM}(\eta^0)}{\partial\eta\,\partial\eta'}\Big] &= \frac{\partial f'(\eta^0)}{\partial\eta}\Big(\operatorname*{Plim}_T T^{-1}\sum_{t=1}^{T}W_tZ_t'\Big)N_u^{-1}\Big(\operatorname*{Plim}_T T^{-1}\sum_{t=1}^{T}Z_tW_t'\Big)\frac{\partial f(\eta^0)}{\partial\eta'}\\
&\quad + \Big[\frac{\partial^2}{\partial\eta\,\partial\eta'}f'(\eta^0)\Big]\Big(\operatorname*{Plim}_T T^{-1}\sum_{t=1}^{T}W_tZ_t'\Big)N_u^{-1}g_T(\eta^0).
\end{align*}
Under our maintained assumption of stability of the structural parameter $\eta^0$ over time, the limiting Hessian changes, in general, whenever $\Pi^*$, $Q$, or $N_u$ changes over the sample, which is similar to the linear framework considered in section 2. Hence, similarly to the linear framework, the associated change-points (or breaks) can be interacted with the instruments to construct more moment conditions and more efficient GMM estimators of $\eta^0$, as we show below.
We focus on the model of section 3.1, where the RF has one break-point at $T_{RF}$. The B-GMM estimator of $\eta^0$ is then defined as
\begin{equation}
\hat\eta_{B\text{-}GMM} = \arg\min_\eta\big[\bar g_T'(\eta)(\hat N_u^a)^{-1}\bar g_T(\eta)\big]\tag{C.1}
\end{equation}
with
\[
\bar g_T(\eta) = \begin{pmatrix}\sum_{t=1}^{T_{RF}}Z_t\big(y_t - W_t'f(\eta)\big)/T_{RF}\\[2pt] \sum_{t=T_{RF}+1}^{T}Z_t\big(y_t - W_t'f(\eta)\big)/(T - T_{RF})\end{pmatrix}\quad\text{and}\quad \hat N_u^a \xrightarrow{P} \mathrm{AVar}\big(\sqrt T\,\bar g_T(\eta^0)\big),
\]
while the (standard) full-sample GMM estimator of $\eta^0$ is defined as
\begin{equation}
\hat\eta_{GMM} = \arg\min_\eta\big[g_T'(\eta)\hat N_u^{-1}g_T(\eta)\big]\tag{C.2}
\end{equation}
with
\[
g_T(\eta) = \sum_{t=1}^{T}Z_t\big(y_t - W_t'f(\eta)\big)/T\quad\text{and}\quad \hat N_u \xrightarrow{P} \mathrm{AVar}\big(\sqrt T\,g_T(\eta^0)\big).
\]
Corollary A1 (Extensions of Theorems 1 and 2). Under Assumptions A1 to A5, the B-GMM estimator $\hat\eta_{B\text{-}GMM}$ of $\eta^0$ defined in (C.1) is at least as efficient as the standard (full-sample) GMM estimator $\hat\eta_{GMM}$ defined in (C.2).
Proof of Corollary A1.

(i) Consistency of $\hat\eta_{GMM}$ and $\hat\eta_{B\text{-}GMM}$:

- The consistency of $\hat\eta_{GMM}$ follows almost directly from Theorem 2.1 in Antoine and Renault (2012; AR hereafter). It requires a slight extension of the AR framework. We now define and adapt AR's key quantities to the above framework:
\[
\Psi_T(\eta) \equiv \sqrt T\big(g_T(\eta) - m_T(\eta)\big),
\]
where
\begin{align*}
m_T(\eta) &= E\Big(\frac1T\sum_{t=1}^{T}g_t(\eta)\Big)\quad\text{with}\quad g_t(\eta) = Z_t\big(y_t - W_t'f(\eta)\big)\\
&= \frac1T\sum_{t=1}^{T}E(Z_tZ_{1t}')\big(f_1(\eta^0)-f_1(\eta)\big) + \frac1T\sum_{t=1}^{T}E(Z_tX_t')\big(f_2(\eta^0)-f_2(\eta)\big)\\
&= \frac1T\sum_{t=1}^{T}E(Z_tZ_{1t}')\big(f_1(\eta^0)-f_1(\eta)\big) + E(Z'\tilde Z/T)\begin{pmatrix}\Pi_1/T^{\alpha_1}\\ \Pi_2/T^{\alpha_2}\end{pmatrix}\big(f_2(\eta^0)-f_2(\eta)\big),
\end{align*}
where recall that $Z$ is the $(T,q)$ matrix with rows $Z_t'$ and $\tilde Z$ is the $(T,2q)$ matrix defined as
\[
\tilde Z' = \begin{pmatrix}Z_1 & \cdots & Z_{T_{RF}} & 0 & \cdots & 0\\ 0 & \cdots & 0 & Z_{T_{RF}+1} & \cdots & Z_T\end{pmatrix}.
\]
It is useful to rewrite $m_T(\cdot)$ as follows:
\[
m_T(\eta) = m_1(\eta) + \frac{1}{r_T}m_2(\eta)
\]
with $m_1(\eta) = \frac1T\sum_{t=1}^{T}E(Z_tZ_{1t}')\big(f_1(\eta^0)-f_1(\eta)\big)$, $r_T = T^{\min\{\alpha_1,\alpha_2\}}$, and
\[
m_2(\eta) = \begin{cases}
E(Z'\tilde Z/T)\begin{pmatrix}\Pi_1\\ \Pi_2\end{pmatrix}\big(f_2(\eta^0)-f_2(\eta)\big) & \text{when } \alpha_1 = \alpha_2,\\[6pt]
E(Z'\tilde Z/T)\begin{pmatrix}\Pi_1\\ 0\end{pmatrix}\big(f_2(\eta^0)-f_2(\eta)\big) + o(T^{\alpha_1-\alpha_2}) & \text{when } \alpha_1 < \alpha_2,\\[6pt]
E(Z'\tilde Z/T)\begin{pmatrix}0\\ \Pi_2\end{pmatrix}\big(f_2(\eta^0)-f_2(\eta)\big) + o(T^{\alpha_2-\alpha_1}) & \text{when } \alpha_1 > \alpha_2.
\end{cases}
\]
As a result, we have:
\[
\Psi_T(\eta) = \sqrt T\Big(g_T(\eta) - \frac{\Lambda_T}{\sqrt T}\rho(\eta)\Big)\quad\text{with}\quad \Lambda_T = \Big[\sqrt T\,I_q\ \vdots\ \frac{\sqrt T}{r_T}\,I_q\Big]\quad\text{and}\quad \rho(\eta) = \begin{pmatrix}m_1(\eta)\\ m_2(\eta)\end{pmatrix}.
\]
Strictly speaking, this is a slight extension of the AR framework, which considers a square diagonal matrix $\Lambda_T$. The weak consistency of $\hat\eta_{GMM}$ follows directly from Theorem 2.1 in AR under their Assumptions 1 and 2. Assumption 1 in AR is an identification assumption, while AR's Assumption 2 maintains a functional CLT on $\Psi_T$ and sufficiently strong identification:
• $\rho(\eta) = 0 \Leftrightarrow \eta = \eta^0$;
• $\Psi_T(\eta)$ weakly converges towards a Gaussian stochastic process with mean zero;
• $\Lambda_T$ is deterministic; its minimal coefficient $\underline\lambda_T$ and maximal coefficient $\overline\lambda_T$ are such that $\lim_T \underline\lambda_T = \infty$ and $\lim_T \overline\lambda_T/\sqrt T < \infty$.

The regularity assumptions A1-A4 maintained in the present paper ensure that AR's Assumptions 1 and 2 hold; therefore, the consistency of $\hat\eta_{GMM}$ follows.
- The consistency of $\hat\eta_{B\text{-}GMM}$ follows from Theorem 2.1 in AR and the proof of Theorem 1 in this paper. Recall that the definition of $\hat\eta_{B\text{-}GMM}$ requires replacing $T_{RF}$ (the unknown break-point) by $\hat T_{RF}$ (the estimated break-point). Therefore, we can follow the proof of Theorem 1 to show that this replacement has no effect on the asymptotic properties of $\hat\eta_{B\text{-}GMM}$. It remains to justify that, for given $T_{RF}$ (assumed known), we are back to AR's (slightly extended) framework introduced above. We start by rewriting $\hat\eta_{B\text{-}GMM}$ accordingly:
\[
\hat\eta_{B\text{-}GMM}(T_{RF}) = \arg\min_\eta\big[\bar g_T'(\eta)(\hat N_u^a)^{-1}\bar g_T(\eta)\big]
\]
with
\[
\bar g_T(\eta) = \begin{pmatrix}\sum_{t=1}^{T_{RF}}Z_t\big(y_t - W_t'f(\eta)\big)/T_{RF}\\[2pt] \sum_{t=T_{RF}+1}^{T}Z_t\big(y_t - W_t'f(\eta)\big)/(T-T_{RF})\end{pmatrix}\quad\text{and}\quad \hat N_u^a \xrightarrow{P}\mathrm{AVar}\big(\sqrt T\,\bar g_T(\eta^0)\big).
\]
Following the computations done in the GMM case above, we can now introduce AR's key quantities in the B-GMM case:
\[
\Psi_T(\eta) = \sqrt T\Big(\bar g_T(\eta) - \frac{\Lambda_T}{\sqrt T}\rho(\eta)\Big)\quad\text{with}\quad \Lambda_T = \Big[\sqrt T\,I_{2q}\ \vdots\ \frac{\sqrt T}{r_T}\,I_{2q}\Big],\quad \rho(\eta) = \begin{pmatrix}m_1(\eta)\\ m_2(\eta)\end{pmatrix},
\]
and
\[
m_1(\eta) = \begin{pmatrix}\sum_{1\lambda^0}E(Z_tZ_{1t}')/T_{RF}\\[2pt] \sum_{t=1+T_{RF}}^{T}E(Z_tZ_{1t}')/(T-T_{RF})\end{pmatrix}\big(f_1(\eta)-f_1(\eta^0)\big),
\]
\[
m_2(\eta) = \begin{cases}
\begin{pmatrix}\sum_{1\lambda^0}E(Z_tZ_t')\Pi_1/T_{RF}\\[2pt] \sum_{t=1+T_{RF}}^{T}E(Z_tZ_t')\Pi_2/(T-T_{RF})\end{pmatrix}\big(f_2(\eta)-f_2(\eta^0)\big) & \text{when } \alpha_1 = \alpha_2,\\[10pt]
\begin{pmatrix}\sum_{1\lambda^0}E(Z_tZ_t')\Pi_1/T_{RF}\\ 0\end{pmatrix}\big(f_2(\eta)-f_2(\eta^0)\big) + o(T^{\alpha_1-\alpha_2}) & \text{when } \alpha_1 < \alpha_2,\\[10pt]
\begin{pmatrix}0\\ \frac{1}{T-T_{RF}}\sum_{t=1+T_{RF}}^{T}E(Z_tZ_t')\Pi_2\end{pmatrix}\big(f_2(\eta)-f_2(\eta^0)\big) + o(T^{\alpha_2-\alpha_1}) & \text{when } \alpha_1 > \alpha_2.
\end{cases}
\]
The regularity assumptions A1-A4 maintained in the present paper ensure that AR's Assumptions 1 and 2 hold for the newly defined $\Psi_T(\eta)$ and $\rho(\eta)$, and the consistency of $\hat\eta_{B\text{-}GMM}(T_{RF})$ follows.
(ii) Asymptotic theory of $\hat\eta_{B\text{-}GMM}$ and $\hat\eta_{GMM}$:

We now derive the asymptotic distribution of $\hat\eta_{B\text{-}GMM}$. We start with a mean-value expansion of the moment conditions $\bar g_T$ around $\eta^0$,
\[
\bar g_T(\hat\eta_{B\text{-}GMM}) = \bar g_T(\eta^0) + \frac{\partial\bar g_T(\bar\eta)}{\partial\eta'}(\hat\eta_{B\text{-}GMM} - \eta^0),
\]
with $\bar\eta$ between $\hat\eta_{B\text{-}GMM}$ and $\eta^0$. We substitute it into the first-order conditions,
\[
\frac{\partial\bar g_T'(\hat\eta_{B\text{-}GMM})}{\partial\eta}(\hat N_u^a)^{-1}\bar g_T(\hat\eta_{B\text{-}GMM}) = 0,
\]
to obtain
\begin{align}
&\frac{\partial\bar g_T'(\hat\eta_{B\text{-}GMM})}{\partial\eta}(\hat N_u^a)^{-1}\Big[\bar g_T(\eta^0) + \frac{\partial\bar g_T(\bar\eta)}{\partial\eta'}(\hat\eta_{B\text{-}GMM}-\eta^0)\Big] = 0\nonumber\\
\Rightarrow\ &\Big[\frac{\partial\bar g_T'(\hat\eta_{B\text{-}GMM})}{\partial\eta}(\hat N_u^a)^{-1}\frac{\partial\bar g_T(\bar\eta)}{\partial\eta'}\Big](\hat\eta_{B\text{-}GMM}-\eta^0) = -\frac{\partial\bar g_T'(\hat\eta_{B\text{-}GMM})}{\partial\eta}(\hat N_u^a)^{-1}\bar g_T(\eta^0)\nonumber\\
\Rightarrow\ &\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}\big[W'\tilde Z(\hat N_u^a)^{-1}\tilde Z'W\big]\frac{\partial f(\bar\eta)}{\partial\eta'}(\hat\eta_{B\text{-}GMM}-\eta^0) = -\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}W'\tilde Z(\hat N_u^a)^{-1}\tilde Z'U,\tag{C.3}
\end{align}
since
\[
\frac{\partial\bar g_T(\eta)}{\partial\eta'} = \begin{bmatrix}\sum_{t=1}^{T_{RF}}Z_tW_t'/T_{RF}\\[2pt] \sum_{t=T_{RF}+1}^{T}Z_tW_t'/(T-T_{RF})\end{bmatrix}\frac{\partial f(\eta)}{\partial\eta'} = \tilde Z'W\frac{\partial f(\eta)}{\partial\eta'},
\]
where we use $\tilde Z$ as introduced in the consistency proof (i) above. We use (B.12) and the following,
\begin{equation}
R_TW'\tilde Z = K + o_P(1)\quad\text{with}\quad R_T = \begin{pmatrix}I_{p_1} & 0\\ 0 & r_T I_{p_2}\end{pmatrix}\quad\text{and}\quad \sqrt T\,\tilde Z'U \xrightarrow{D}\mathcal{N}(0, N_u^a),\tag{C.4}
\end{equation}
to rewrite (C.3) as:
\begin{align}
&\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}R_T^{-1}\big[R_TW'\tilde Z(\hat N_u^a)^{-1}\tilde Z'WR_T\big]R_T^{-1}\frac{\partial f(\bar\eta)}{\partial\eta'}\,\sqrt T(\hat\eta_{B\text{-}GMM}-\eta^0)\tag{C.5}\\
&\qquad = -\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}R_T^{-1}R_TW'\tilde Z(\hat N_u^a)^{-1}\sqrt T\,\tilde Z'U.\tag{C.6}
\end{align}
Recall that the matrix $R_T$ is related to the identification strength of the moment conditions, and as such to the rates of convergence of the estimators of $f(\eta^0)$ (that is, of $\theta^0$). In order to conclude, we need to introduce another rescaling matrix, which plays a role similar to $R_T$ but is tied to the identification strength of the estimators of $\eta^0$, as explained in AR. The general result concerning the existence of such a rescaling matrix follows from Antoine and Renault (2012). However, it is not straightforward to obtain such a rescaling matrix explicitly in the general case. In what follows, we work out two special cases of interest to gain some intuition: (a) the special case of strong identification, where such a rescaling matrix is not needed; (b) the mixed identification-strength case that is plausible for our application in section 7.4.
(a) Case of strong identification, where $R_T = I_p$:

Starting from (C.6) after replacing $R_T$ by the identity matrix, we get:
\begin{equation}
\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}\big[W'\tilde Z(\hat N_u^a)^{-1}\tilde Z'W\big]\frac{\partial f(\bar\eta)}{\partial\eta'}\,\sqrt T(\hat\eta_{B\text{-}GMM}-\eta^0) = -\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}W'\tilde Z(\hat N_u^a)^{-1}\sqrt T\,\tilde Z'U.\tag{C.7}
\end{equation}
Using (C.4) and the consistency of $\hat\eta_{B\text{-}GMM}$, we have:
\[
\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}\big[W'\tilde Z(\hat N_u^a)^{-1}\tilde Z'W\big]\frac{\partial f(\bar\eta)}{\partial\eta'} \xrightarrow{P} \mathcal{K}(N_u^a)^{-1}\mathcal{K}'\quad\text{with}\quad \mathcal{K} = \frac{\partial f'(\eta^0)}{\partial\eta}K,
\]
and $\mathcal{K}(N_u^a)^{-1}\mathcal{K}'$ an invertible matrix of size $p_\eta$. Then, it follows that
\[
\sqrt T(\hat\eta_{B\text{-}GMM}-\eta^0)\xrightarrow{D}\mathcal{N}(0, V_{B\text{-}GMM})
\]
with
\[
V_{B\text{-}GMM} = \big[\mathcal{K}(N_u^a)^{-1}\mathcal{K}'\big]^{-1} = \big[\bar\Pi_1^{a\prime}M_1(N_{u,1})^{-1}M_1'\bar\Pi_1^a + \bar\Pi_2^{a\prime}M_2(N_{u,2})^{-1}M_2'\bar\Pi_2^a\big]^{-1},
\]
where $\bar\Pi_i^a = \Pi_i^a\,\partial f(\eta^0)/\partial\eta'$ and the $\Pi_i^a$ ($i=1,2$) are defined in Theorem 1. Similarly, we can show that:
\[
\sqrt T(\hat\eta_{GMM}-\eta^0)\xrightarrow{D}\mathcal{N}(0, V_{GMM})
\]
with
\[
V_{GMM} = \big[(M_1'\bar\Pi_1^a + M_2'\bar\Pi_2^a)'(N_u)^{-1}(M_1'\bar\Pi_1^a + M_2'\bar\Pi_2^a)\big]^{-1}.
\]
(b) Model in section 7.4 with mixed identification strengths, where the coefficients of the endogenous variables may be weakly identified:

Let us first recall the linear model considered in section 7, where the main equation is:
\[
\pi_t = \alpha_c + \alpha_{b,1}\pi_{t-1} + \alpha_{b,2}\pi_{t-2} + \alpha_{b,3}\pi_{t-3} + \alpha_f\pi_t^e + \alpha_y y_t + u_t = Z_{1,t}'\theta_{z_1} + X_t'\theta_x + u_t,
\]
with $\theta_{z_1} = [\alpha_c,\alpha_{b,1},\alpha_{b,2},\alpha_{b,3}]'$, $\theta_x = [\alpha_f,\alpha_y]'$, $Z_{1t} = [1,\pi_{t-1},\pi_{t-2},\pi_{t-3}]'$, $X_t = [\pi_t^e, y_t]'$, $p_1 = 4$, and $p_2 = 2$. In section 7.3, we estimate the slope parameters $\theta_{z_1}$ and $\theta_x$: the model is linear in the parameters. Direct applications of Theorems 1 and 2 deliver their B-GMM estimators with associated convergence rates given by $\sqrt T\,R_T^{-1}$ (that is, $\sqrt T$ and $\sqrt T/r_T$, respectively), where the underlying matrix $R_T$ is
\[
R_T = \begin{pmatrix}I_4 & 0\\ 0 & r_T I_2\end{pmatrix}\quad\text{with some } r_T = O(T^{\alpha}),\ 0 \le \alpha < 1/2.
\]
In section 7.4, we are not interested in estimating the above slope parameters, but rather the deep structural parameters derived from the underlying macroeconomic model. The main equation writes instead
\[
\pi_t = \psi_c + \Big(\frac{\rho_1-\rho_2}{1+\rho_1}\Big)\pi_{t-1} + \Big(\frac{\rho_2-\rho_3}{1+\rho_1}\Big)\pi_{t-2} + \Big(\frac{\rho_3}{1+\rho_1}\Big)\pi_{t-3} + \Big(\frac{1}{1+\rho_1}\Big)\pi_t^e + \frac{3(1-\theta)^2}{\theta(1+\rho_1)}\,y_t + \epsilon_t.
\]
The vector of unknown parameters of interest is
\[
\eta = [\psi_c,\ \rho_1,\ \rho_2,\ \rho_3,\ \theta]',
\]
while the above slope parameters are given by
\[
f(\eta) = \Big[\psi_c,\ \frac{\rho_1-\rho_2}{1+\rho_1},\ \frac{\rho_2-\rho_3}{1+\rho_1},\ \frac{\rho_3}{1+\rho_1},\ \frac{1}{1+\rho_1},\ \frac{3(1-\theta)^2}{\theta(1+\rho_1)}\Big]'.
\]
One can show that $\partial f(\eta^0)/\partial\eta'$ is full column rank:
\[
\frac{\partial f(\eta)}{\partial\eta'} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0\\
0 & \frac{1+\rho_2}{(1+\rho_1)^2} & \frac{-1}{1+\rho_1} & 0 & 0\\
0 & \frac{\rho_3-\rho_2}{(1+\rho_1)^2} & \frac{1}{1+\rho_1} & \frac{-1}{1+\rho_1} & 0\\
0 & \frac{-\rho_3}{(1+\rho_1)^2} & 0 & \frac{1}{1+\rho_1} & 0\\
0 & \frac{-1}{(1+\rho_1)^2} & 0 & 0 & 0\\
0 & \frac{-3(1-\theta)^2}{\theta(1+\rho_1)^2} & 0 & 0 & \frac{3(\theta^2-1)}{\theta^2(1+\rho_1)}
\end{pmatrix}.
\]
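The full-column-rank claim for $\partial f(\eta)/\partial\eta'$ can be spot-checked numerically by finite-differencing the map $f$; a small sketch (the chosen $\eta$ values are arbitrary illustrations of an admissible parameter point):

```python
import numpy as np

def f(eta):
    """Slope map theta = f(eta) from the structural NKPC parameters."""
    psi_c, r1, r2, r3, th = eta
    d = 1.0 + r1
    return np.array([psi_c, (r1 - r2) / d, (r2 - r3) / d,
                     r3 / d, 1.0 / d, 3.0 * (1.0 - th) ** 2 / (th * d)])

def jacobian(func, x, h=1e-6):
    """Central finite-difference Jacobian of func at x."""
    J = np.empty((len(func(x)), len(x)))
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (func(x + e) - func(x - e)) / (2 * h)
    return J

eta0 = np.array([0.1, 0.4, 0.2, 0.1, 0.6])   # arbitrary admissible values
J = jacobian(f, eta0)
assert J.shape == (6, 5)
assert np.linalg.matrix_rank(J) == 5          # full column rank
```

The numerical Jacobian also provides a quick cross-check of the analytical derivative entries above.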
The first four components of $f(\eta)$ are coefficients on the exogenous regressors, are strongly identified, and only depend on four unknown parameters: $\psi_c$, $\rho_1$, $\rho_2$, and $\rho_3$. Necessarily, $\psi_c$, $\rho_1$, $\rho_2$, and $\rho_3$ are strongly identified, while $\theta$ may not be. This suggests the following rescaling matrix:
\[
R_T^\eta = \begin{pmatrix}I_4 & 0\\ 0 & r_T\end{pmatrix}.
\]
We can now rewrite (C.6) as follows:
\begin{align}
&R_T^\eta\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}R_T^{-1}\big[R_TW'\tilde Z(\hat N_u^a)^{-1}\tilde Z'WR_T\big]R_T^{-1}\frac{\partial f(\bar\eta)}{\partial\eta'}R_T^\eta\ \sqrt T\,(R_T^\eta)^{-1}(\hat\eta_{B\text{-}GMM}-\eta^0)\nonumber\\
&\qquad = -R_T^\eta\frac{\partial f'(\hat\eta_{B\text{-}GMM})}{\partial\eta}R_T^{-1}R_TW'\tilde Z(\hat N_u^a)^{-1}\sqrt T\,\tilde Z'U.\tag{C.8}
\end{align}
To conclude, we simply need to show that
\[
R_T^{-1}\frac{\partial f(\bar\eta)}{\partial\eta'}R_T^\eta
\]
converges in probability to a full column rank matrix when $T$ goes to infinity. Direct computation, along with the consistency of $\bar\eta$, yields:
\[
\operatorname{Plim}\Big[R_T^{-1}\frac{\partial f(\bar\eta)}{\partial\eta'}R_T^\eta\Big] = \operatorname{Plim}\begin{pmatrix}
1 & 0 & 0 & 0 & 0\\
0 & \frac{1+\rho_2}{(1+\rho_1)^2} & \frac{-1}{1+\rho_1} & 0 & 0\\
0 & \frac{\rho_3-\rho_2}{(1+\rho_1)^2} & \frac{1}{1+\rho_1} & \frac{-1}{1+\rho_1} & 0\\
0 & \frac{-\rho_3}{(1+\rho_1)^2} & 0 & \frac{1}{1+\rho_1} & 0\\
0 & \frac{-1}{r_T(1+\rho_1)^2} & 0 & 0 & 0\\
0 & \frac{-3(1-\theta)^2}{r_T\theta(1+\rho_1)^2} & 0 & 0 & \frac{3(\theta^2-1)}{\theta^2(1+\rho_1)}
\end{pmatrix} = \begin{pmatrix}
1 & 0 & 0 & 0 & 0\\
0 & \frac{1+\rho_2}{(1+\rho_1)^2} & \frac{-1}{1+\rho_1} & 0 & 0\\
0 & \frac{\rho_3-\rho_2}{(1+\rho_1)^2} & \frac{1}{1+\rho_1} & \frac{-1}{1+\rho_1} & 0\\
0 & \frac{-\rho_3}{(1+\rho_1)^2} & 0 & \frac{1}{1+\rho_1} & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & \frac{3(\theta^2-1)}{\theta^2(1+\rho_1)}
\end{pmatrix} \equiv F(\eta^0),
\]
which is full column rank. Similarly, $\operatorname{Plim}\big[R_T^{-1}\big(\partial f(\hat\eta_{B\text{-}GMM})/\partial\eta'\big)R_T^\eta\big] = F(\eta^0)$. The asymptotic distribution then directly follows:
\[
\sqrt T\,(R_T^\eta)^{-1}(\hat\eta_{B\text{-}GMM}-\eta^0)\xrightarrow{D}\mathcal{N}(0, V_{B\text{-}GMM})
\]
with
\[
V_{B\text{-}GMM} = \big[F'(\eta^0)K(N_u^a)^{-1}K'F(\eta^0)\big]^{-1} = \big\{F'(\eta^0)\big[\Pi_1^{a\prime}M_1(N_{u,1})^{-1}M_1'\Pi_1^a + \Pi_2^{a\prime}M_2(N_{u,2})^{-1}M_2'\Pi_2^a\big]F(\eta^0)\big\}^{-1}.
\]
Similarly, we can show that:
\[
\sqrt T\,(R_T^\eta)^{-1}(\hat\eta_{GMM}-\eta^0)\xrightarrow{D}\mathcal{N}(0, V_{GMM})
\]
with
\[
V_{GMM} = \big[F'(\eta^0)(M_1'\Pi_1^a + M_2'\Pi_2^a)'(N_u)^{-1}(M_1'\Pi_1^a + M_2'\Pi_2^a)F(\eta^0)\big]^{-1}.
\]
The asymptotic variance-covariance matrices of B-GMM and GMM are very similar to the
formulas obtained for the linear model. It is then straightforward to show that
VB−GMM ≤ VGMM
by following the proof of Theorem 2.