Spatial Dynamic Panel Data Models with Interactive Fixed Effects

Wei Shi

Department of Economics, The Ohio State University, Columbus, Ohio 43210.

Lung-Fei Lee

Department of Economics, The Ohio State University, Columbus, Ohio 43210.

Abstract

This paper studies the estimation of a dynamic spatial panel data model with interactive individual and time

effects with large n and T . The model has a rich spatial structure including contemporaneous spatial inter-

action and spatial heterogeneity. Dynamic features include individual time lag and spatial diffusion. The

interactive effects capture heterogeneous impacts of time effects on cross sectional units. The interactive

effects are treated as parameters, so as to allow correlations between the interactive effects and the regres-

sors. We consider a quasi-maximum likelihood estimation and show estimator consistency and characterize

its asymptotic distribution. The Monte Carlo experiment shows that the estimator performs well and the

proposed bias correction is effective. We illustrate the empirical relevance of the model by applying it to

examine the effects of house price dynamics on reverse mortgage origination rates in the US.

Keywords: Spatial panel, dynamics, multiplicative individual and time effects

JEL classification: C13, C23, C51

This version: 11/26/2015. We would like to thank the participants of the 2014 China Meeting of the Econometric Society at Xiamen University, the 2014 Shanghai Econometrics Workshop at Shanghai University of Finance and Economics, the 2015 MEA Annual Meeting, New York Camp Econometrics X, the 11th World Congress of the Econometric Society, and the Econometrics seminars at the Ohio State University for many valuable comments. We appreciate receiving valuable comments and suggestions from referees, an associate editor and a coeditor of this journal.

Email addresses: [email protected] (Wei Shi), [email protected] (Lung-Fei Lee)

1. Introduction

Spatial interaction is present in many economic problems. When a state determines its tax rate, it

takes into account not only its domestic constituents, but also what neighboring states might do (e.g., Han

(2013)). Fluctuations in one industrial sector can spill over to other sectors if the sectors are “close” in terms

of using similar production technology, using similar inputs, etc. (e.g., Conley and Dupor (2003)). Early

contributions to spatial econometrics include Cliff and Ord (1973) and Anselin (1988). Kelejian and Prucha

(2010) examine the GMM for estimation of spatial models. Lee (2004a) establishes asymptotic properties

of the quasi-maximum likelihood (QML) estimator of a spatial autoregressive (SAR) model. A spatial

panel can take into account dynamics and control for unobserved heterogeneity. Dynamic spatial panel data

models with fixed individual and/or time effects where spatial effects appear as lags in time and in space

have been studied in Yu et al. (2008) and Lee and Yu (2010b). Su and Yang (2014) examine QML estimation

of dynamic panel data models with spatial effects in the errors. Lee and Yu (2013b) is a recent survey on

spatial panel data models.

On the other hand, individual units can be differently affected by common factors.1 In the example

of industrial sectors above, common factors like interest rate, demographic trend, etc., can simultaneously

affect various industrial sectors, but magnitudes of the impact can be different across sectors. A few possibly

unobserved common factors may drive much essential comovement between sectors although those sectors

are far from each other according to economic distance measures. Essentially, a factor induces a time fixed

effect which may affect individuals differently. Bai (2009) labels this an interactive effect.

In recent years, much progress has been made in the estimation and inference of panel data models with

interactive effects. When interactive effects are viewed as fixed parameters, the model can be estimated

by the nonlinear least squares (NLS) method involving principal components. Bai (2009) systematically

studies asymptotic properties of the NLS estimator. The estimation is iterative where in each iteration, slope

parameters are estimated given factors, and then factors are estimated by principal components given the

estimated slope parameters. Moon and Weidner (2015) show that, under additional assumptions, the limiting

distribution of the NLS estimator does not depend on the number of factors assumed in the estimation, as

long as it does not fall below the true number of factors. They analyze asymptotic properties of the NLS

estimator by perturbation method in Kato (1995), where the objective function is expanded involving its

1 In the literature, “common shocks”, “common factors” or “factors” refer to the time varying factors. “Factor loading” quantifies the magnitude of the effect of the time varying factors on an individual. There can be multiple factors. In this paper, they are treated as fixed parameters to be estimated. Because the main purpose of this paper is to consistently estimate slope coefficients, it is not necessary to separately identify time factors from their loadings when they are concentrated out in the estimation.


approximated gradient vector and Hessian matrix. Assuming that factors and factor loadings are random

and factors also enter regressors linearly, Pesaran (2006) proposes the common correlated effects (CCE)

estimator. His idea is that variations due to factors can be captured by cross sectional averages of the

dependent and explanatory variables. Ahn et al. (2013) propose a GMM estimator for fixed T case. The

interactive effects can be eliminated by a transformation involving the factors and then GMM can be applied.
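The alternating scheme described above for the NLS estimator (slopes given factors, then factors by principal components given slopes) can be sketched as follows. This is a minimal illustration for a plain regression panel without spatial terms, not the estimator of this paper; the function name `iterate_interactive_ls`, the array layout and the simulated data are assumptions made for the example, and the factor estimate is only normalized to have orthonormal columns.

```python
import numpy as np

def iterate_interactive_ls(Y, X, r, n_iter=200, tol=1e-10):
    """Alternating least squares / principal components sketch.

    Y: n x T outcomes, X: K x n x T regressors, r >= 1 factors assumed.
    Returns the slope estimate (length K) and a T x r basis of the factor space.
    """
    K, n, T = X.shape
    # Pooled OLS ignoring the factors, used only as a starting value.
    beta = np.linalg.lstsq(X.reshape(K, n * T).T, Y.reshape(n * T), rcond=None)[0]
    for _ in range(n_iter):
        resid = Y - np.tensordot(beta, X, axes=1)          # n x T
        # Factors given slopes: eigenvectors of resid'resid for the r largest eigenvalues.
        _, vecs = np.linalg.eigh(resid.T @ resid)
        F = vecs[:, -r:]
        M_F = np.eye(T) - F @ F.T                          # projects off the factor space
        XM = np.stack([Xk @ M_F for Xk in X])
        A = np.tensordot(XM, X, axes=([1, 2], [1, 2]))     # A[j, k] = tr(X_j M_F X_k')
        b = np.tensordot(XM, Y, axes=([1, 2], [0, 1]))     # b[j] = tr(X_j M_F Y')
        beta_new = np.linalg.solve(A, b)                   # slopes given factors
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, F

# Hypothetical data: one regressor that is correlated with the factor component.
rng = np.random.default_rng(0)
n, T = 50, 40
common = rng.normal(size=(n, 1)) @ rng.normal(size=(1, T))
X = (rng.normal(size=(n, T)) + 0.5 * common)[None]
Y = 1.5 * X[0] + common + rng.normal(size=(n, T))
beta_hat, F_hat = iterate_interactive_ls(Y, X, r=1)
```

Because the regressor loads on the common component, pooled OLS is biased in this design, while the iterated estimate projects the factor space out before estimating the slope.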

Spatial interaction and common factors are two specifications explored in the literature where individu-

als’ activities and outcomes are not independently distributed. In this paper, an individual can be influenced

by its neighbors’ actions or outcomes which is modeled by the SAR specification. Individuals are also ex-

posed to unobserved common factors which are modeled as interactive effects following Bai (2009). By

treating interactive effects as parameters to be estimated, this approach allows flexible correlation between

the interactive effects and the regressors. As a data generating process involves spatial spillovers and inter-

active effects, both should be taken into account for estimators to be consistent.

Existing literature on factor models (Bai (2003), Forni et al. (2004), Stock and Watson (2002)) ignores

spatial interactions. Literature on spatial interactions (Lee and Yu (2010a), Lee and Yu (2013b)) has not

considered unobserved interactive effects. To the best of our knowledge, there are only a few papers that

jointly model spatial correlation and interactive effects. Pesaran and Tosetti (2011) model different forms of

error correlation, including spatial error correlation and common factor, and show that the CCE estimator

continues to work well. Bai and Li (2014) consider a model with spatial correlation in the dependent variable

and common factors.

This paper jointly models spatial interactions and interactive effects in panel data with large n and T.

We consider spatial interaction in the dependent variable where the degree of spatial correlation is of interest,

in which case the CCE estimator is not directly applicable. The spatial panel model under consideration is a

general dynamic spatial panel data model where spatial effects can appear both in the form of lags and errors.

In addition to contemporaneous spatial interaction, time lagged dependent variables, diffusion and spatially

correlated and heterogeneous disturbances are included to allow a rich specification of the state dependence

and guard against spurious spatial correlation. We do not impose specific structures on how interactive

effects affect the regressors. The interactive effects are treated as nuisance parameters and are concentrated

out in the estimation. Moon and Weidner (2015) show how to derive the approximated gradient vector and

the Hessian matrix of the concentrated NLS objective function of a regression panel. In spatial panel setting,

the log of the sum of squared residuals from the regression panel is a component of the likelihood function,

and we adapt their approach to that component. We provide conditions for identification and show that the

QML estimation method works well. The estimator is shown to be consistent and asymptotically normal.

Asymptotic biases of order $1/\sqrt{nT}$ exist due to incidental parameters, and a bias correction method is proposed.

The paper is organized as follows. Section 2 presents the model and discusses assumptions and iden-

tification. We prove consistency and derive the limiting distribution of the QML estimator in Section 3.

We then illustrate the empirical relevance of the theory by demonstrating the estimator’s good finite sample

performance and applying the model to analyze the effect of house price dynamics on reverse mortgage

origination rates in the U.S. Section 6 concludes. Proofs of the main results are collected in the appendix. A

supplementary file is available online which has detailed proofs on relevant useful lemmas.

We use the following notation. For a vector $\eta$, $\|\eta\|_1 = \sum_k |\eta_k|$ and $\|\eta\|_2 = \sqrt{\sum_k |\eta_k|^2}$. Let $\mu_i(M)$ denote the $i$-th largest eigenvalue of a symmetric matrix $M$ of dimension $n$, with eigenvalues listed in decreasing order such that $\mu_n(M) \le \mu_{n-1}(M) \le \cdots \le \mu_1(M)$. For a real $m \times n$ matrix $A$, its spectral norm is $\|A\|_2 = \sqrt{\mu_1(A'A)}$. In addition, $\|A\|_1 = \max_{1\le j\le n}\sum_{i=1}^{m}|A_{ij}|$ is its maximum column sum norm, $\|A\|_\infty = \max_{1\le i\le m}\sum_{j=1}^{n}|A_{ij}|$ is its maximum row sum norm, and $\|A\|_F = \sqrt{\operatorname{tr}(AA')}$ is its Frobenius norm. Denote the projection matrices $P_A = A(A'A)^{-1}A'$ and $M_A = I - P_A$. In cases where $A$ might not have full rank, we use $(A'A)^{\dagger}$ to denote the Moore-Penrose generalized inverse of $A'A$. For a real number $x$, $\lceil x \rceil$ is the smallest integer greater than or equal to $x$. “wpa 1” stands for “with probability approaching 1”.
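As a quick numerical companion to the notation above, the following sketch computes the four matrix norms and the projectors for a small example. The helper names `matrix_norms` and `projectors` and the example matrix are illustrative assumptions, not part of the paper.

```python
import numpy as np

def matrix_norms(A):
    """Return the norms defined in the text for a real m x n matrix A."""
    A = np.asarray(A, dtype=float)
    return {
        "2": np.sqrt(np.linalg.eigvalsh(A.T @ A).max()),  # sqrt(mu_1(A'A))
        "1": np.abs(A).sum(axis=0).max(),                 # maximum column sum
        "inf": np.abs(A).sum(axis=1).max(),               # maximum row sum
        "F": np.sqrt(np.trace(A @ A.T)),                  # Frobenius norm
    }

def projectors(A):
    """P_A and M_A = I - P_A, using the Moore-Penrose inverse so that
    a rank-deficient A is also handled."""
    A = np.asarray(A, dtype=float)
    P = A @ np.linalg.pinv(A.T @ A) @ A.T
    return P, np.eye(A.shape[0]) - P

A = np.array([[1.0, -2.0], [3.0, 4.0], [0.0, 1.0]])
norms = matrix_norms(A)
P_A, M_A = projectors(A)
```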

2. The Spatial Dynamic Panel Data (SDPD) Model with Interactive Effects

2.1. The Model

There are n individual units and T time periods. The SDPD model has the following specification,

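$$Y_{nt} = \lambda W_n Y_{nt} + \gamma Y_{n,t-1} + \rho W_n Y_{n,t-1} + X_{nt}\beta + \Gamma_n f_t + U_{nt}, \quad \text{and} \quad U_{nt} = \alpha \tilde{W}_n U_{nt} + \varepsilon_{nt}, \tag{1}$$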

where Ynt is an n-dimensional column vector of observed dependent variables and Xnt is an n×(K−2) matrix

of exogenous regressors, so that the total number of variables in Yn,t−1, WnYn,t−1 and Xnt is K. The model

accommodates two types of cross sectional dependences, namely, local dependence and global (strong)

dependence. Individual units are impacted by potentially time varying unknown common factors ft , which

captures global (strong) dependence. The effects of the factors can be heterogeneous on the cross section

units, as described by the factor loading parameter matrix Γn. For example, in an earnings regression where

Ynt is the wage rate, each row of Γn may correspond to a vector of an individual’s skills and ft is the

skill premium which may be time varying. The number of unobserved factors is assumed to be a fixed

constant $r$ that is much smaller than $n$ and $T$.2 The $n \times r$ matrix of factor loadings $\Gamma_n$ and the $T \times r$ matrix of factors $F_T = (f_1, f_2, \cdots, f_T)'$ are not observed and are treated as parameters. The fixed effects approach is flexible and allows unknown correlation between the common factor components and the regressors. The $n \times n$ spatial weights matrices $W_n$ and $\tilde{W}_n$ in Eq. (1) are used to model spatial dependences. The term $\lambda W_n Y_{nt}$ describes the contemporaneous spatial interactions. There are also dynamics in model (1): $\gamma Y_{n,t-1}$ captures the pure dynamic effect, and $\rho W_n Y_{n,t-1}$ is a spatial time lag of interactions, which captures diffusion (Lee and Yu (2013b)).3 The disturbance $U_{nt}$, whose innovations $\varepsilon_{it}$ are i.i.d. $(0,\sigma^2)$, also possesses a spatial structure $\tilde{W}_n$, which may or may not be the same as $W_n$.

2 In many empirical studies, the number of factors is much smaller than the dimension of the dataset. For example, in Stock and Watson (2002), 6 factors are used to model 215 macroeconomic time series.
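To make the data generating process in Eq. (1) concrete, the sketch below simulates one panel under assumed Gaussian innovations, Gaussian factors and loadings, and a simple circular weights matrix. All names (`ring_weights`, `simulate_sdpd`), the burn-in length and the way the regressors are made to depend on the factor component are illustrative assumptions rather than the design used in the paper's Monte Carlo study.

```python
import numpy as np

def ring_weights(n):
    """Row-normalized weights: each unit is linked to its two neighbors on a circle."""
    eye = np.eye(n)
    return 0.5 * (np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1))

def simulate_sdpd(W, W_tilde, lam, gamma, rho, alpha, beta, r, T, burn=50, seed=0):
    """Simulate Eq. (1); returns Y (n x (T+1), column 0 is Y_{n0}) and the pieces used."""
    rng = np.random.default_rng(seed)
    n, k = W.shape[0], len(beta)
    S_inv = np.linalg.inv(np.eye(n) - lam * W)
    R_inv = np.linalg.inv(np.eye(n) - alpha * W_tilde)
    Gamma = rng.normal(size=(n, r))
    Y = np.empty((n, T + 1))
    X, F, eps = np.empty((T, n, k)), np.empty((T, r)), np.empty((n, T))
    y_prev = np.zeros(n)
    for t in range(-burn, T + 1):
        f_t = rng.normal(size=r)
        common = Gamma @ f_t
        x_t = rng.normal(size=(n, k)) + common[:, None]   # regressors correlated with factors
        e_t = rng.normal(size=n)
        u_t = R_inv @ e_t                                 # U_nt = alpha * W_tilde U_nt + eps_nt
        y_t = S_inv @ (gamma * y_prev + rho * (W @ y_prev) + x_t @ beta + common + u_t)
        if t >= 0:
            Y[:, t] = y_t
        if t >= 1:
            X[t - 1], F[t - 1], eps[:, t - 1] = x_t, f_t, e_t
        y_prev = y_t
    return {"Y": Y, "X": X, "Gamma": Gamma, "F": F, "eps": eps}

W = ring_weights(6)
beta = np.array([1.0, -0.5])
sim = simulate_sdpd(W, W, lam=0.3, gamma=0.2, rho=0.1, alpha=0.25, beta=beta, r=2, T=10)
```

The burn-in periods are discarded so that the retained observations depend only weakly on the arbitrary zero starting value.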

The specification in (1) is general and encompasses many models of empirical interest.

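• Additive fixed individual and time effects:

$$Y_{nt} = \lambda W_n Y_{nt} + \gamma Y_{n,t-1} + \rho W_n Y_{n,t-1} + X_{nt}\beta + \vec{\zeta}_n + \ell_n \xi_t + \varepsilon_{nt}, \tag{2}$$

where $\vec{\zeta}_n = (\zeta_1, \zeta_2, \cdots, \zeta_n)'$ are individual effects, and $\xi_t$ are time effects with $\ell_n = (1, 1, \cdots, 1)'$. Eq. (2) is a special case of Eq. (1) with

$$\Gamma_n = \begin{pmatrix} \zeta_1 & \zeta_2 & \cdots & \zeta_n \\ 1 & 1 & \cdots & 1 \end{pmatrix}' \quad \text{and} \quad F_T = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \xi_1 & \xi_2 & \cdots & \xi_T \end{pmatrix}'.$$

• Spatial panel data model with common shocks by Bai and Li (2014):

$$Y_{nt} = \lambda W_n Y_{nt} + X_{nt}\beta + \vec{\zeta}_{yn} + \Gamma_{yn} f_t + \varepsilon_{nt},$$

$$X_{nt,k} = \vec{\zeta}_{xkn} + \Gamma_{xkn} f_t + \nu_{nt,k}, \quad k = 1, \cdots, K,$$

where $X_{nt,k}$ is the $k$-th column of $X_{nt}$, $\vec{\zeta}_{yn} = (\zeta_{y1}, \zeta_{y2}, \cdots, \zeta_{yn})'$ and $\vec{\zeta}_{xkn} = (\zeta_{xk1}, \zeta_{xk2}, \cdots, \zeta_{xkn})'$ are fixed effects, and $f_t$ are $r \times 1$ common factors with loadings $\Gamma_{yn}$ and $\Gamma_{xkn}$. While heteroskedasticity in $\varepsilon_{nt}$ across $i$ (but invariant over $t$) is allowed, their model is static and the common shocks are restricted to affect $X_{nt}$ linearly. Pesaran (2006) considers the case with $\lambda = 0$ and heterogeneous coefficients.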
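The claim that additive individual and time effects are a two-factor special case of the interactive structure can be checked numerically. The sketch below builds the loading and factor matrices given for Eq. (2) from arbitrary simulated effects (the variable names and dimensions are assumptions for illustration) and compares their product with the additive form.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 5, 4
zeta = rng.normal(size=n)   # individual effects
xi = rng.normal(size=T)     # time effects

Gamma = np.column_stack([zeta, np.ones(n)])   # n x 2 loadings (zeta, ell_n)
F = np.column_stack([np.ones(T), xi])         # T x 2 factors (ell_T, xi)

interactive = Gamma @ F.T                     # (i, t) entry is Gamma_i' f_t
additive = zeta[:, None] + xi[None, :]        # (i, t) entry is zeta_i + xi_t
rank = np.linalg.matrix_rank(interactive)
```

The common component has rank two here, so the additive model corresponds to r = 2 with one column of loadings and one column of factors restricted to be constant.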

Define $A_n = S_n^{-1}(\gamma I_n + \rho W_n)$, where $S_n = I_n - \lambda W_n$. From Eq. (1), $Y_{nt} = A_n Y_{n,t-1} + S_n^{-1}(X_{nt}\beta + \Gamma_n f_t + U_{nt})$. Repeated substitution gives $Y_{nt} = \sum_{h=0}^{\infty} A_n^h S_n^{-1}(X_{n,t-h}\beta + \Gamma_n f_{t-h} + U_{n,t-h})$, assuming that the series converges. With $\|A_n\|_2 < 1$, $\{A_n^h\}_{h=0}^{\infty}$ is absolutely summable and the initial condition $Y_{n,0}$ does not affect the asymptotic analysis when $T \to \infty$. Lee and Yu (2013b) discuss the parameter space of $\gamma$, $\rho$ and $\lambda$ and regularity conditions that guarantee $\|A_n\|_2 < 1$. Let $\varpi_{ni}$ denote an eigenvalue of $W_n$ and $d_{ni}$ the corresponding eigenvalue of $A_n$; we then have $d_{ni} = \frac{\gamma + \rho\varpi_{ni}}{1 - \lambda\varpi_{ni}}$. If the spatial weights matrix $W_n$ is row normalized from a symmetric matrix (as in Ord (1975)),4 all eigenvalues of $W_n$ are real. Furthermore, if $\operatorname{tr}(W_n) = 0$, the condition $\frac{1}{\varpi_{n,\min}} < \lambda < 1 = \varpi_{n,\max}$ implies that $I_n - \lambda W_n$ is invertible, where $\varpi_{n,\min}$ and $\varpi_{n,\max}$ are, respectively, the smallest and largest eigenvalues of $W_n$. Stationarity further requires that $|d_{ni}| < 1$ for all $i$. Lee and Yu (2013b) show that with a row normalized $W_n$, the parameter space for $\|A_n\|_2 < 1$ can be characterized as a region enclosed by four linear hyperplanes

$$R_s = \{(\lambda,\gamma,\rho) : \gamma + (\rho - \lambda)\varpi_{n,\min} > -1,\ \gamma + \rho + \lambda < 1,\ \gamma + \rho - \lambda > -1,\ \gamma + (\lambda + \rho)\varpi_{n,\min} < 1\}.$$

Elements in $R_s$ already imply that $-1 < \gamma < 1$ and $\frac{1}{\varpi_{n,\min}} < \lambda < 1$. Because $|\varpi_{n,i}| \le 1$, a sufficient condition for $\|A_n\|_2 < 1$ is $|\lambda| + |\gamma| + |\rho| < 1$.

3 In general, the spatial weights matrices for the contemporaneous spatial interactions and for the diffusion can be different. However, it is straightforward to extend this paper’s analysis to such cases. QMLE is still consistent and asymptotically normal under assumptions that will be introduced in Section 2.3. Assuming identical spatial weights simplifies the notation.
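The eigenvalue relation and the region $R_s$ can be checked numerically. The sketch below uses a symmetric circular weights matrix (an assumption chosen so that all eigenvalues are real and the spectral norm of $A_n$ equals its largest eigenvalue in absolute value); the function names and parameter values are illustrative only.

```python
import numpy as np

def ring_weights(n):
    """Symmetric, row-normalized circular weights matrix (illustrative choice)."""
    eye = np.eye(n)
    return 0.5 * (np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1))

def ar_matrix(W, lam, gamma, rho):
    """A_n = S_n^{-1} (gamma I_n + rho W_n) with S_n = I_n - lam W_n."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - lam * W, gamma * np.eye(n) + rho * W)

def eigen_map(w, lam, gamma, rho):
    """d = (gamma + rho w) / (1 - lam w) for an eigenvalue w of W_n."""
    return (gamma + rho * w) / (1.0 - lam * w)

def in_region_Rs(lam, gamma, rho, w_min):
    """Membership in the region R_s bounded by the four hyperplanes in the text."""
    return (gamma + (rho - lam) * w_min > -1 and gamma + rho + lam < 1
            and gamma + rho - lam > -1 and gamma + (lam + rho) * w_min < 1)

W = ring_weights(6)
w = np.linalg.eigvalsh(W)                       # real eigenvalues, ascending
lam, gamma, rho = 0.3, 0.2, 0.1
A = ar_matrix(W, lam, gamma, rho)
d_direct = np.sort(np.linalg.eigvals(A).real)   # eigenvalues of A_n computed directly
d_mapped = np.sort(eigen_map(w, lam, gamma, rho))
stable = in_region_Rs(lam, gamma, rho, w.min())
```

A parameter point such as (0.5, 0.4, 0.3) violates the second inequality, and the corresponding $A_n$ has an eigenvalue above one in this example.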

2.2. Estimation Method

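The parameters for the model (1) are $\theta = (\delta', \lambda, \alpha)'$ with $\delta = (\gamma, \rho, \beta')'$, $\sigma^2$, $\Gamma_n$ and $F_T$. It is convenient to collect the predetermined variables and exogenous regressors in the $n \times K$ matrix $Z_{nt} = (Y_{n,t-1}, W_n Y_{n,t-1}, X_{nt})$, where $K = k + 2$, with $k$ being the number of exogenous regressors in $X_{nt}$. Denote $S_n(\lambda) = I_n - \lambda W_n$ and $R_n(\alpha) = I_n - \alpha \tilde{W}_n$. The sample averaged quasi-log likelihood function is

$$Q_{nT}(\theta,\sigma^2,\Gamma_n,F_T) = -\frac{1}{2}\log 2\pi - \frac{1}{2}\log\sigma^2 + \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2\sigma^2 nT}\sum_{t=1}^{T}\left(S_n(\lambda)Y_{nt} - Z_{nt}\delta - \Gamma_n f_t\right)' R_n(\alpha)'R_n(\alpha)\left(S_n(\lambda)Y_{nt} - Z_{nt}\delta - \Gamma_n f_t\right). \tag{3}$$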

Concentrating out $\sigma^2$ from the objective function (3) and dropping the overall constant term for simplicity,

$$Q_{nT}(\theta,\Gamma_n,F_T) = \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\left(\frac{1}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)Y_{nt} - Z_{nt}\delta - \Gamma_n f_t\right)' R_n(\alpha)'R_n(\alpha)\left(S_n(\lambda)Y_{nt} - Z_{nt}\delta - \Gamma_n f_t\right)\right)$$

is a concentrated sample averaged log likelihood function of $\theta$, $\Gamma_n$ and $F_T$. In view of the unobserved nature of the common factor component, we shall make minimal assumptions on their structures. The number of factors is set at $r$ in the estimation, which does not necessarily equal the true number of factors $r_0$. Later sections will show that consistency requires $r \ge r_0$, while the results on the limiting distribution require $r = r_0$. Because our estimation method imposes no restriction on $\Gamma_n$, and $R_n(\alpha)$ is assumed invertible for $\alpha$ in its parameter space, optimizing with respect to $\Gamma_n \in \mathbb{R}^{n\times r}$ is equivalent to optimizing with respect to the transformed $\tilde{\Gamma}_n$ with $\tilde{\Gamma}_n = R_n(\alpha)\Gamma_n$. The objective function can be equivalently written as

$$Q_{nT}(\theta,\tilde{\Gamma}_n,F_T) = \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\left(\frac{1}{nT}\sum_{t=1}^{T}\left(R_n(\alpha)(S_n(\lambda)Y_{nt} - Z_{nt}\delta) - \tilde{\Gamma}_n f_t\right)'\left(R_n(\alpha)(S_n(\lambda)Y_{nt} - Z_{nt}\delta) - \tilde{\Gamma}_n f_t\right)\right). \tag{4}$$

4 This paper does not require $W_n$ to be row normalized; see Assumption R2, which is a weaker condition. Although a row normalized spatial weights matrix is convenient to work with, in some applications it is not appropriate. For example, in the analysis of social interactions where the effect of network structure (e.g. centrality) is of interest, or where there are individuals who influence others but are not influenced by others, row normalization might not be appropriate; see Liu and Lee (2010).

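As the sample expands, the number of parameters in the factors and their loadings also increases. Because the parameter of interest is $\theta$, we concentrate out factors and their loadings using the principal component theory: for an $n \times T$ matrix $H_{nT}$,

$$\min_{F_T\in\mathbb{R}^{T\times r},\,\tilde{\Gamma}_n\in\mathbb{R}^{n\times r}} \operatorname{tr}\left((H_{nT} - \tilde{\Gamma}_n F_T')(H_{nT} - \tilde{\Gamma}_n F_T')'\right) = \min_{F_T\in\mathbb{R}^{T\times r}} \operatorname{tr}\left(H_{nT} M_{F_T} H_{nT}'\right) = \sum_{i=r+1}^{n}\mu_i\left(H_{nT}H_{nT}'\right).$$

The concentrated log likelihood is

$$Q_{nT}(\theta) = \max_{\tilde{\Gamma}_n\in\mathbb{R}^{n\times r},\,F_T\in\mathbb{R}^{T\times r}} Q_{nT}(\theta,\tilde{\Gamma}_n,F_T) = \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log L_{nT}(\theta), \tag{5}$$

with

$$L_{nT}(\theta) = \frac{1}{nT}\sum_{i=r+1}^{n}\mu_i\left(R_n(\alpha)\Big(S_n(\lambda)Y - \sum_{k=1}^{K} Z_k\delta_k\Big)\Big(S_n(\lambda)Y - \sum_{k=1}^{K} Z_k\delta_k\Big)' R_n(\alpha)'\right),$$

where $Y = (Y_{n1}, \cdots, Y_{nT})$ and $Z_k$ is the $n \times T$ matrix whose $t$-th column is the $k$-th column of $Z_{nt}$. The QML estimator is $\hat{\theta}_{nT} = \arg\max_{\theta\in\Theta} Q_{nT}(\theta)$. The estimate for $\tilde{\Gamma}_n$ can be obtained as the eigenvectors associated with the $r$ largest eigenvalues of $R_n(\alpha)\big(S_n(\lambda)Y - \sum_{k=1}^{K} Z_k\delta_k\big)\big(S_n(\lambda)Y - \sum_{k=1}^{K} Z_k\delta_k\big)' R_n(\alpha)'$. By switching $n$ and $T$, the estimate for $F_T$ can be similarly obtained. Note that the estimated $\tilde{\Gamma}_n$ and $F_T$ are not unique, as $\tilde{\Gamma}_n H H^{-1} F_T'$ is observationally equivalent to $\tilde{\Gamma}_n F_T'$ for any invertible $r \times r$ matrix $H$. However, the column spaces of $\tilde{\Gamma}_n$ and $F_T$ are invariant to $H$, hence the projectors $M_{\tilde{\Gamma}_n}$ and $M_{F_T}$ are uniquely determined.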
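The concentrated objective in Eq. (5) only requires eigenvalues of an n × n matrix, which the following sketch makes explicit. The function names, the ordering of the parameter vector and the array layout (X stored as T × n × k) are assumptions made for this illustration; a full estimator would additionally maximize this function over the parameter space with a numerical optimizer.

```python
import numpy as np

def trailing_eig_sum(H, r):
    """Sum of the n - r smallest eigenvalues of H H', i.e. the minimized
    sum of squares after removing the best rank-r approximation of H."""
    n = H.shape[0]
    return np.linalg.eigvalsh(H @ H.T)[: n - r].sum()   # eigvalsh sorts ascending

def concentrated_loglik(theta, Y, Y_lag, X, W, W_tilde, r):
    """Concentrated log likelihood of Eq. (5), up to the dropped constant.

    theta = (gamma, rho, beta_1..beta_k, lam, alpha); Y and Y_lag are n x T
    (Y_lag holds Y_{n,t-1} in column t), X is T x n x k.
    """
    theta = np.asarray(theta, dtype=float)
    n, T = Y.shape
    gamma, rho, beta = theta[0], theta[1], theta[2:-2]
    lam, alpha = theta[-2], theta[-1]
    S = np.eye(n) - lam * W
    R = np.eye(n) - alpha * W_tilde
    Xb = np.einsum("tnk,k->nt", X, beta)                 # n x T matrix of X_nt beta
    H = R @ (S @ Y - gamma * Y_lag - rho * (W @ Y_lag) - Xb)
    L = trailing_eig_sum(H, r) / (n * T)
    # slogdet returns log|det|; the determinant is assumed positive on the parameter space.
    _, logdet = np.linalg.slogdet(S @ R)
    return logdet / n - 0.5 * np.log(L)
```

With r = 0 and zero weights matrices the function reduces to minus one half of the log mean squared residual, which gives a simple sanity check.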

2.3. Assumptions

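The true values of $\theta$, $\Gamma_n$ and $F_T$ are denoted by $\theta_0$, $\Gamma_{n0}$ and $F_{T0}$. Note that the dimensions of $\Gamma_n$ and $F_T$ are $n \times r$ and $T \times r$, and may not equal the dimensions of $\Gamma_{n0}$ and $F_{T0}$, which are $n \times r_0$ and $T \times r_0$ respectively. Denote $\varepsilon = (\varepsilon_{n1}, \varepsilon_{n2}, \cdots, \varepsilon_{nT})$, an $n \times T$ matrix.

Assumption E

1. The disturbances $\varepsilon_{it}$ are independently distributed across $i$ and over $t$ with $E\varepsilon_{it} = 0$ and $E\varepsilon_{it}^2 = \sigma_0^2 > 0$, and have uniformly bounded moments $E|\varepsilon_{it}|^{4+\eta}$ for some $\eta > 0$.

2. The disturbances in $\varepsilon$ are independently distributed from regressors $X_k$ and the factors $F_{0T}$ and $\Gamma_{0n}$.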

The disturbances of the model have a spatial structure. From Eq. (1), $U = R_n(\alpha_0)^{-1}\varepsilon$. Its spatial heterogeneity is captured by $\tilde{W}_n$ and the coefficient $\alpha$. In panels with factors, many estimation methods allow idiosyncratic errors to be cross sectionally correlated and heteroskedastic in an unknown form, up to a degree (Bai (2009), Pesaran (2006)). However, when a spatial autoregressive model is estimated by QML assuming homoskedastic errors, the QMLE is generally inconsistent if the errors are in fact heteroskedastic and this is ignored (Lin and Lee (2010)).5 In the current setting, consistency of the QMLE requires this stronger homoskedasticity assumption on $\varepsilon$. Latala (2005) shows that, under Assumption E, $\|\varepsilon\|_2 = O_P\left(\sqrt{\max(n,T)}\right)$.

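To simplify notation, define the $n \times n$ matrices $S_n = S_n(\lambda_0)$, $G_n = W_n S_n^{-1}$, $R_n = R_n(\alpha_0)$, $\tilde{G}_n = \tilde{W}_n R_n^{-1}$, and the $n \times T$ matrices $Z_{K+1} = (G_n Z_{n1}\delta_0, \cdots, G_n Z_{nT}\delta_0) = \sum_{k=1}^{K}\delta_{0k} G_n Z_k$, $Y = (Y_{n1}, \cdots, Y_{nT})$ and $Y_{-1} = (Y_{n0}, \cdots, Y_{n,T-1})$. In the case of nonnormal disturbances, denote $\mu^{(3)} = E\varepsilon_{it}^3$ and $\mu^{(4)} = E\varepsilon_{it}^4$.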

Assumption R

1. The parameter $\theta_0$ is in the interior of $\Theta$, where $\Theta$ is a compact subset of $\mathbb{R}^{K+2}$. We use $\Theta_\chi$ to denote the parameter space for parameter $\chi$, $\chi = \lambda, \alpha$, etc.

2. The spatial weights matrices $W_n$ and $\tilde{W}_n$ are non-stochastic. $W_n$, $S_n^{-1}$, $\tilde{W}_n$ and $R_n^{-1}$ are uniformly bounded in absolute value in both row and column sums (UB). $S_n(\lambda)$ and $R_n(\alpha)$ are invertible for any $\lambda \in \Theta_\lambda$ and $\alpha \in \Theta_\alpha$. Furthermore, $\liminf_{n,T\to\infty}\inf_{\lambda\in\Theta_\lambda}|S_n(\lambda)| > 0$ and $\liminf_{n,T\to\infty}\inf_{\alpha\in\Theta_\alpha}|R_n(\alpha)| > 0$.

3. The elements of $X_{nt}$ have uniformly bounded 4-th moments.

4. The number of factors is constant $r_0$, and elements of $\Gamma_{n0}$ and $F_{T0}$ have uniformly bounded 4-th moments.

5. $\sum_{h=1}^{\infty}\operatorname{abs}(A_n^h)$ is UB, where $[\operatorname{abs}(A_n)]_{ij} = |A_{n,ij}|$. In addition, there exist a constant $b < 1$ and $n_0$ such that $\|A_n\|_2 \le b$ for all $n \ge n_0$.

6. $n$ is a nondecreasing function of $T$. As $T$ goes to infinity, so does $n$.6

Assumption R1 is standard. The sum $\sum_{\ell=0}^{\infty}(\lambda W_n)^{\ell}$ is convergent if $\|\lambda W_n\| < 1$ for some norm $\|\cdot\|$.7 In this situation, $S_n(\lambda)$ is invertible and $S_n(\lambda)^{-1} = \sum_{\ell=0}^{\infty}(\lambda W_n)^{\ell}$ is the Neumann series. Therefore a sufficient condition for the invertibility of $S_n(\lambda)$ is $|\lambda| < \frac{1}{\|W_n\|}$. Similar properties hold for $R_n(\alpha)$.
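The Neumann series representation of the inverse can be verified numerically for a particular weights matrix. In the sketch below the circular weights matrix, the value of λ and the truncation length are illustrative assumptions; the maximum row sum norm is used to check the sufficient condition.

```python
import numpy as np

def ring_weights(n):
    """Row-normalized circular weights matrix (illustrative choice)."""
    eye = np.eye(n)
    return 0.5 * (np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1))

def neumann_inverse(W, lam, terms=200):
    """Truncated Neumann series: sum over l = 0..terms of (lam W)^l."""
    n = W.shape[0]
    total = np.eye(n)
    power = np.eye(n)
    for _ in range(terms):
        power = power @ (lam * W)
        total += power
    return total

W = ring_weights(6)
lam = 0.4
condition_holds = abs(lam) < 1.0 / np.linalg.norm(W, np.inf)   # |lam| < 1 / ||W||_inf
S_inv_series = neumann_inverse(W, lam)
S_inv_direct = np.linalg.inv(np.eye(6) - lam * W)
```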

2.4. Identification

The following Assumptions ID1 and ID2 are used to show that $\theta_0$, $\Gamma_{0n}F_{0T}'$ and the number of factors can be uniquely recovered from the distribution of data on $Y$ and $Z$. Their sample counterparts are the subsequent Assumptions NC1 and NC2. Assumptions ID1 and NC1 require that $Z_1, \cdots, Z_{K+1}$ are linearly independent, while Assumptions ID2 and NC2 relax this, but impose more restrictions on the variance structure. The linear independence conditions can fail if $G_n Z_{nt}\delta_0$ is linearly dependent on $Z_{nt}$ for all $t = 1, \cdots, T$. This can happen in a pure SAR model with no regressors, in which case $\delta_0 = 0$.

5 In Bai and Li (2014), disturbances can be heteroskedastic along the cross section but invariant over time. They are treated as parameters. They show that the estimates of spatial correlation and slope coefficients are consistent as $T \to \infty$. Consistent estimation methods remain to be seen if variances depend on individual explanatory variables across units and time.

6 Nondecreasing function could be a constant function.

7 This condition does not depend on the type of norm used, since all norms on $\mathbb{R}^{n\times n}$ are equivalent and therefore convergence of a series does not depend on a specific type of norm.

Assumption ID1

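1. Let $z = (\operatorname{vec}(Z_1), \cdots, \operatorname{vec}(Z_{K+1}))$, which is an $nT \times (K+1)$ matrix of regressors. The $(K+1)\times(K+1)$ matrix $E\left(z'\left(M_{F_{0T}} \otimes \left(R_n(\alpha)' M_{\tilde{\Gamma}_n} R_n(\alpha)\right)\right) z\right)$ is positive definite for any $\alpha \in \Theta_\alpha$ and $\tilde{\Gamma}_n \in \mathbb{R}^{n\times r}$ with some $r \ge r_0$, where $r_0$ is the true number of factors, $M_{F_{0T}} = I_T - F_{0T}(F_{0T}'F_{0T})^{\dagger}F_{0T}'$ and $M_{\tilde{\Gamma}_n} = I_n - \tilde{\Gamma}_n(\tilde{\Gamma}_n'\tilde{\Gamma}_n)^{\dagger}\tilde{\Gamma}_n'$.

2. For any $\alpha \ne \alpha_0$, $R_n(\alpha)'R_n(\alpha)$ is linearly independent of $R_n'R_n$.

Assumption ID1(2) implies that, for $\alpha \ne \alpha_0$,

$$\frac{1}{n}\operatorname{tr}\left(R_n^{-1\prime}R_n(\alpha)'R_n(\alpha)R_n^{-1}\right) - \left|R_n^{-1\prime}R_n(\alpha)'R_n(\alpha)R_n^{-1}\right|^{\frac{1}{n}} > 0,$$

by the inequality of arithmetic and geometric means. A sufficient condition for ID1(2) is that $I_n$, $\tilde{W}_n + \tilde{W}_n'$ and $\tilde{W}_n'\tilde{W}_n$ are linearly independent.8 Assumption ID1 generalizes those in Lee and Yu (2013a) on the identification of spatial panel models with additive individual and time effects.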
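The arithmetic-geometric mean gap that follows from Assumption ID1(2) can be evaluated directly: it is zero at the true α and strictly positive elsewhere when the eigenvalues of the matrix inside the trace are not all equal. The sketch below assumes a circular weights matrix for the disturbance process and uses the hypothetical helper `amgm_gap`.

```python
import numpy as np

def ring_weights(n):
    """Row-normalized circular weights matrix (illustrative choice)."""
    eye = np.eye(n)
    return 0.5 * (np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1))

def amgm_gap(alpha, alpha0, W_tilde):
    """(1/n) tr(M) - |M|^(1/n) with M = R_n^{-1}' R_n(alpha)' R_n(alpha) R_n^{-1}."""
    n = W_tilde.shape[0]
    R0 = np.eye(n) - alpha0 * W_tilde
    R = np.eye(n) - alpha * W_tilde
    B = R @ np.linalg.inv(R0)
    M = B.T @ B                                # positive definite when R is invertible
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) / n - np.exp(logdet / n)

W_tilde = ring_weights(6)
gap_at_truth = amgm_gap(0.2, 0.2, W_tilde)    # M is the identity, so the gap vanishes
gap_away = amgm_gap(0.5, 0.2, W_tilde)        # eigenvalues of M differ, so the gap is positive
```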

Assumption ID2

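1. Let $z = (\operatorname{vec}(Z_1), \cdots, \operatorname{vec}(Z_K))$, which is $nT \times K$. The $K \times K$ matrix $E\left(z'\left(M_{F_{0T}} \otimes \left(R_n(\alpha)' M_{\tilde{\Gamma}_n} R_n(\alpha)\right)\right) z\right)$ is positive definite for any $\alpha \in \Theta_\alpha$ and $\tilde{\Gamma}_n \in \mathbb{R}^{n\times r}$ with some $r \ge r_0$, where $r_0$ is the true number of factors, $M_{F_{0T}} = I_T - F_{0T}(F_{0T}'F_{0T})^{\dagger}F_{0T}'$ and $M_{\tilde{\Gamma}_n} = I_n - \tilde{\Gamma}_n(\tilde{\Gamma}_n'\tilde{\Gamma}_n)^{\dagger}\tilde{\Gamma}_n'$.

2. For any $\lambda \in \Theta_\lambda$ and $\alpha \in \Theta_\alpha$, if $\lambda \ne \lambda_0$ or $\alpha \ne \alpha_0$, then $S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)$ is linearly independent of $S_n'R_n'R_nS_n$.9

Assumption ID2(2) is equivalent to the following: for any $\lambda \in \Theta_\lambda$ and $\alpha \in \Theta_\alpha$, if $\lambda \ne \lambda_0$ or $\alpha \ne \alpha_0$,

$$\frac{1}{n}\operatorname{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right) - \left|R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right|^{\frac{1}{n}} > 0.$$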

Assumption ID requires that regressors are not linearly dependent. For parameter identification, we need the concentrated expected objective function to be uniquely maximized at the truth. We assume that the number of factors used in the concentrated expected objective function is not smaller than the true number of factors. Given that the number of latent factors is small in many empirical applications, it is reasonable to assume that an upper bound of the factor number is known. The estimated $\Gamma_n$ and $F_T$ need not have full column rank, and the true number of factors, $r_0$, can be recovered from the rank of $\Gamma_n F_T'$, as the following proposition shows.

8 Let $c_1$ and $c_2$ be two scalars. For $\alpha \ne \alpha_0$, $c_1 R_n(\alpha)'R_n(\alpha) + c_2 R_n'R_n = (c_1 + c_2)I_n - (c_1\alpha + c_2\alpha_0)(\tilde{W}_n + \tilde{W}_n') + (c_1\alpha^2 + c_2\alpha_0^2)\tilde{W}_n'\tilde{W}_n = 0$ only if $c_1 = c_2 = 0$, because $I_n$, $\tilde{W}_n + \tilde{W}_n'$ and $\tilde{W}_n'\tilde{W}_n$ are assumed to be linearly independent. Therefore for any $\alpha \ne \alpha_0$, $R_n(\alpha)'R_n(\alpha)$ is linearly independent of $R_n'R_n$. Notice that $\tilde{W}_n$ can be symmetric.

9 A sufficient condition is that the following 9 matrices are linearly independent: $I_n$, $W_n + W_n'$, $\tilde{W}_n + \tilde{W}_n'$, $W_n'(\tilde{W}_n + \tilde{W}_n') + (\tilde{W}_n + \tilde{W}_n')W_n$, $W_n'W_n$, $\tilde{W}_n'\tilde{W}_n$, $\tilde{W}_n'\tilde{W}_nW_n + W_n'\tilde{W}_n'\tilde{W}_n$, $W_n'(\tilde{W}_n + \tilde{W}_n')W_n$ and $W_n'\tilde{W}_n'\tilde{W}_nW_n$, for the case that $W_n \ne \tilde{W}_n$. In the event that $W_n = \tilde{W}_n$, Assumption ID2(2) can only give local identification for $\lambda_0$ and $\alpha_0$ in the sense that $(\lambda_0,\alpha_0)$ cannot be distinguished from $(\alpha_0,\lambda_0)$. The latter situation is similar to the identification issue of a pure spatial autoregressive model with spatial error process, $Y_n = \lambda W_n Y_n + U_n$ with $U_n = \alpha W_n U_n + \varepsilon_n$.

Proposition 1. Under Assumptions E, R and ID1 (or ID2), θ0, Γ0nF ′0T and r0 are identified.

The proof is in Appendix B. Assumptions NC below are sample counterparts of Assumptions ID. They

are specifically needed for the consistency of the proposed estimator. They can be slightly weakened, but

will then involve the unobserved factors, as in Assumption A of Bai (2009).

Assumption NC1

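1. There exists a positive constant $b$ such that $\min_{\eta\in B_{K+1},\,\alpha\in\Theta_\alpha}\sum_{i=2r+1}^{n}\mu_i\left(\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'\right) \ge b > 0$ wpa 1 as $n, T \to \infty$, where $B_{K+1}$ is the unit ball of the $(K+1)$-dimensional Euclidean space; $\eta$ is a $(K+1)\times 1$ nonzero vector with $\|\eta\|_2 = \sqrt{\eta'\eta} = 1$; and $\eta\cdot Z \equiv \sum_{k=1}^{K+1}\eta_k Z_k$ is a linear combination of the $n \times T$ matrices $Z_k$.

2. For any $\alpha \in \Theta_\alpha$, $\alpha \ne \alpha_0$, $\liminf_{n,T\to\infty}\left(\frac{1}{n}\operatorname{tr}\left(R_n^{-1\prime}R_n(\alpha)'R_n(\alpha)R_n^{-1}\right) - \left|R_n^{-1\prime}R_n(\alpha)'R_n(\alpha)R_n^{-1}\right|^{\frac{1}{n}}\right) > 0$.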

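The assumption in NC1(1) requires no perfect collinearity between regressors and sufficient variation in each regressor. Notice that this excludes constant regressors: because they are constant along $i$ or $t$, the corresponding $Z_k$ has rank one and $\sum_{i=2r+1}^{n}\mu_i\left(\frac{1}{nT}R_n(\alpha)Z_kZ_k'R_n(\alpha)'\right)$ is 0 for them. Such regressors include those that do not vary over time, e.g., gender, race, and those that are common across the individuals, e.g., common time effects.

It is desirable to understand more about the condition in NC1(1) in terms of regressors implied by the SAR model. Define the $n \times (K+1)$ matrices $\mathcal{Z}_{nt} = (Z_{nt,1}, \cdots, Z_{nt,K}, G_nZ_{nt}\delta_0) = (Z_{nt}, G_nZ_{nt}\delta_0)$ for $t = 1, \cdots, T$, and the overall $n \times (K+1)T$ matrix $\mathcal{Z}_{nT} = [\mathcal{Z}_{n1}, \cdots, \mathcal{Z}_{nT}]$. We have

$$\begin{aligned}
R_n(\alpha)(\eta\cdot Z) &= \sum_{k=1}^{K+1}\eta_k R_n(\alpha)Z_k = R_n(\alpha)\left(\sum_{k=1}^{K}\eta_k\left[Z_{n1,k}\ Z_{n2,k}\ \cdots\ Z_{nT,k}\right] + \eta_{K+1}\left[G_nZ_{n1}\delta_0\ G_nZ_{n2}\delta_0\ \cdots\ G_nZ_{nT}\delta_0\right]\right) \\
&= R_n(\alpha)\left[\sum_{k=1}^{K}Z_{n1,k}\eta_k + (G_nZ_{n1}\delta_0)\eta_{K+1},\ \cdots,\ \sum_{k=1}^{K}Z_{nT,k}\eta_k + (G_nZ_{nT}\delta_0)\eta_{K+1}\right] \\
&= R_n(\alpha)\left[\mathcal{Z}_{n1}\eta\ \ \mathcal{Z}_{n2}\eta\ \cdots\ \mathcal{Z}_{nT}\eta\right] = R_n(\alpha)\mathcal{Z}_{nT}(I_T\otimes\eta),
\end{aligned}$$

where $\eta = (\eta_1, \cdots, \eta_{K+1})'$. So the NC1(1) condition concerns the smallest $(n - 2r)$ eigenvalues of the $n \times n$ matrix $\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)' = \frac{1}{nT}R_n(\alpha)\mathcal{Z}_{nT}(I_T\otimes\eta)(I_T\otimes\eta')\mathcal{Z}_{nT}'R_n(\alpha)'$ for each $\eta \in B_{K+1}$ and $\alpha \in \Theta_\alpha$. Because these matrices are nonnegative definite, their eigenvalues are nonnegative but some can be zero. If there were an $\alpha \in \Theta_\alpha$ and $\eta \in B_{K+1}$ with the $n - 2r$ smallest eigenvalues of $\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'$ being all zero, then the NC1 assumption would not be satisfied. So we need some sufficient conditions to rule out such cases.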
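The stacking identity derived above, and the fact that a time-invariant regressor contributes nothing to the sum in NC1(1), can both be checked on simulated arrays. In this sketch the regressors are stored as a (K+1) × n × T array, the first regressor is made time invariant on purpose, and all names and dimensions are illustrative assumptions.

```python
import numpy as np

def eta_dot_Z(eta, Z):
    """Linear combination sum_k eta_k Z_k of the n x T matrices Z_k."""
    return np.tensordot(eta, Z, axes=1)

def stacked_regressors(Z):
    """n x (K+1)T matrix [Z_n1, ..., Z_nT], where block t holds column t of every Z_k."""
    return np.hstack([Z[:, :, t].T for t in range(Z.shape[2])])

def nc1_sum(eta, alpha, Z, W_tilde, r):
    """Sum of the n - 2r smallest eigenvalues of (1/nT) R(alpha)(eta.Z)(eta.Z)'R(alpha)'."""
    n, T = Z.shape[1:]
    H = (np.eye(n) - alpha * W_tilde) @ eta_dot_Z(eta, Z)
    return np.linalg.eigvalsh(H @ H.T / (n * T))[: n - 2 * r].sum()

rng = np.random.default_rng(1)
n, T, K, r, alpha = 8, 5, 2, 1, 0.3
eye = np.eye(n)
W_tilde = 0.5 * (np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1))
Z = rng.normal(size=(K + 1, n, T))
Z[0] = rng.normal(size=(n, 1)) * np.ones((1, T))   # time-invariant regressor, rank one

eta = rng.normal(size=K + 1)
eta /= np.linalg.norm(eta)
R = np.eye(n) - alpha * W_tilde
lhs = R @ eta_dot_Z(eta, Z)
rhs = R @ stacked_regressors(Z) @ np.kron(np.eye(T), eta[:, None])
sum_generic = nc1_sum(eta, alpha, Z, W_tilde, r)
sum_constant = nc1_sum(np.array([1.0, 0.0, 0.0]), alpha, Z, W_tilde, r)
```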

Proposition 2. If $\sum_{i=2r+1+KT}^{n}\mu_i\left(\frac{1}{nT}\mathcal{Z}_{nT}\mathcal{Z}_{nT}'\right) > 0$, i.e., the sum of the smallest $n - 2r - KT$ eigenvalues of $\frac{1}{nT}\mathcal{Z}_{nT}\mathcal{Z}_{nT}'$ is positive, and $\mu_n\left(R_n(\alpha)'R_n(\alpha)\right) > 0$ for all $\alpha \in \Theta_\alpha$, with probability approaching 1 as $n, T \to \infty$, then Assumption NC1(1) is satisfied.

The proof is in Appendix B. In order for Proposition 2 to hold, it is necessary that $\mathcal{Z}_{nT}$ has rank at least as large as $KT + 2r + 1$, which in turn requires $(2r + 1 + KT) \le \min\{n, (K+1)T\}$ because $\mathcal{Z}_{nT}$ is an $n \times (K+1)T$ matrix. As the problem under consideration has both $n$ and $T$ tending to infinity and $r$ is finite, the latter requires $\frac{n}{T} > K$ for large enough $n$ and $T$.

The above analysis can be generalized to the case where $G_nZ_{nt}\delta_0$ is linearly dependent on $Z_{nt}$ for all $t = 1, \cdots, T$, such that $G_nZ_{nt}\delta_0 = Z_{nt}C$ for a constant vector $C$. As pointed out preceding Assumption ID1, this can happen in a pure SAR model. In this case, let $\eta^* = (\eta_1, \cdots, \eta_K)'$. Here,

$$R_n(\alpha)(\eta\cdot Z) = R_n(\alpha)\left[Z_{n1}\eta^* + (G_nZ_{n1}\delta_0)\eta_{K+1},\ \cdots,\ Z_{nT}\eta^* + (G_nZ_{nT}\delta_0)\eta_{K+1}\right] = R_n(\alpha)\left[Z_{n1}, \cdots, Z_{nT}\right](I_T\otimes\tilde{\eta}),$$

where $\tilde{\eta} = \eta^* + \eta_{K+1}C$. The previous result can now be applied to the $n \times KT$ matrix $\mathcal{Z}^*_{nT} = [Z_{n1}, \cdots, Z_{nT}]$. In such a case, we have an alternative set of conditions that guarantee consistency, as follows.

Assumption NC2

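1. Suppose that $G_nZ_{nt}\delta_0 = Z_{nt}C$ for a constant vector $C$ for all $t = 1, \cdots, T$. There exists a positive constant $b$ such that $\min_{\eta\in B_K,\,\alpha\in\Theta_\alpha}\sum_{i=2r+1}^{n}\mu_i\left(\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'\right) \ge b > 0$ wpa 1 as $n, T \to \infty$, where $B_K$ is the unit ball of the $K$-dimensional Euclidean space; $\eta$ is a $K \times 1$ nonzero vector with $\|\eta\|_2 = \sqrt{\eta'\eta} = 1$; and $\eta\cdot Z \equiv \sum_{k=1}^{K}\eta_kZ_k$.

2. For any $\lambda \in \Theta_\lambda$ and $\alpha \in \Theta_\alpha$, if $\lambda \ne \lambda_0$ or $\alpha \ne \alpha_0$,

$$\liminf_{n,T\to\infty}\left(\frac{1}{n}\operatorname{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right) - \left|R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right|^{\frac{1}{n}}\right) > 0.$$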

When GnZntδ0 is linearly dependent on Znt for all t, NC1(1) will not be satisfied. The additional condition

(2) of NC2 on the variance structure will make up for it, as will be clear in Proposition 3 below.


3. Asymptotic Theory

3.1. Consistency

The standard argument for consistency of an extremum estimator consists of showing that, for any $\tau > 0$, $\limsup_{n,T\to\infty}\left(\max_{\theta\in\Theta(\theta_0,\tau)} EQ_{nT}(\theta) - EQ_{nT}(\theta_0)\right) < 0$, where $\Theta(\theta_0,\tau)$ is the complement of an open neighborhood of $\theta_0$ in $\Theta$ with radius $\tau$ (i.e., identification uniqueness); and that $Q_{nT}(\theta) - EQ_{nT}(\theta)$ converges to zero uniformly on its parameter space $\Theta$.

In many situations, the objective function with a finite number of parameters contains averages, and

LLN follows with regularity assumptions (e.g. Amemiya (1985)). With an additional smoothness condition,

uniform LLN would also follow (Andrews (1987)). In our model, the concentrated likelihood function (Eq.

(5)) involves sum of certain eigenvalues of a random matrix and is not in the direct form of sample averages.

Furthermore, the number of parameters increases to infinity as n (and T ) tends to infinity. It turns out that

for consistency proof, it is relatively easier to work with the objective function without concentrating out Γn

and FT . The idea is to respectively find a lower bound and an upper bound of the objective function, and

then to show that the former is strictly greater than the latter for any θ that is outside the τ−neighborhood

of $\theta_0$ for any $\tau > 0$, as $n$ and $T$ increase. Since we are maximizing the objective function, the upper bound

at $\hat{\theta}_{nT}$ must be not smaller than the lower bound at $\theta_0$, which implies that the distance between $\hat{\theta}_{nT}$ and $\theta_0$

is collapsing to 0 as $n$ and $T$ increase. Lemma 1 of Wu (1981) demonstrates this idea formally. Using this

method, Moon and Weidner (2015) show consistency of an NLS estimator for a regression panel model with

interactive effects. We adapt these arguments to the spatial panel setting.

Proposition 3. Under Assumptions NC1 (or NC2), E and R, and assuming that the number of factors is not underestimated, i.e., $r \ge r_0$, then $\|\hat{\theta}_{nT} - \theta_0\|_1 = o_P(1)$.

The proof of Proposition 3 is in Appendix B. For consistency, we do not need to know the exact number

of factors. It is enough that the true number of factors is less than or equal to the number of factors assumed

in estimation. Intuitively, if the number of factors used is not fewer than the true number of factors, variations

due to factors can be accounted for. This feature has been observed in Moon and Weidner (2015) for the

panel regression model. This property turns out to hold also for spatial panels. A step in the consistency

proof is based on the following inequality (Eq. B.9),

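\[
\begin{aligned}
Q_{nT}(\theta_0) &= \frac{1}{n}\log|S_n| - \frac{1}{2}\log\Big(\min_{\Gamma_n\in\mathbb{R}^{n\times r},\,F_T\in\mathbb{R}^{T\times r}} \frac{1}{nT}\sum_{t=1}^{T}(\Gamma_{0n}f_{0t} + U_{nt} - \Gamma_n f_t)'R_n'R_n(\Gamma_{0n}f_{0t} + U_{nt} - \Gamma_n f_t)\Big) \\
&\ge \frac{1}{n}\log|S_n| - \frac{1}{2}\log\Big(\frac{1}{nT}\sum_{t=1}^{T}\sum_{i=1}^{n}\varepsilon_{it}^2\Big),
\end{aligned}
\]
which together with an upper bound of $Q_{nT}(\hat\theta_{nT})$ justifies the condition of Eq. (B.10) for consistency. This inequality trivially holds if the true number of factors is at most r, but might not hold if the number of factors in the estimation is smaller than the number of true factors.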
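The inequality can be checked numerically. Setting $\Gamma_n f_t = \Gamma_{0n} f_{0t}$ (padding with zero columns when $r > r_0$) is feasible in the minimization and leaves $R_n U_{nt} = \varepsilon_{nt}$, so the minimized sum of squares cannot exceed $\sum_t\sum_i \varepsilon_{it}^2$. The sketch below is only illustrative: the sizes, the seed and the helper `min_residual` are assumptions, the loadings are drawn directly in transformed form (standing in for $R_n\Gamma_{0n}$), and the minimum over rank-r factor structures is computed from singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, r0, r = 30, 40, 2, 3            # r >= r0: the factor number is not underestimated

Gamma0 = rng.standard_normal((n, r0))  # assumption: stands in for R_n * Gamma_0n
F0 = rng.standard_normal((T, r0))
eps = rng.standard_normal((n, T))
D = Gamma0 @ F0.T + eps                # columns are R_n (Gamma_0n f_0t + U_nt)


def min_residual(D, r):
    """min over rank-r Gamma F' of (1/nT) * ||D - Gamma F'||_F^2.

    By the Eckart-Young theorem this equals the sum of the squared
    singular values of D beyond the r largest, divided by nT.
    """
    n, T = D.shape
    s = np.linalg.svd(D, compute_uv=False)
    return np.sum(s[r:] ** 2) / (n * T)


bound = np.sum(eps ** 2) / (n * T)     # (1/nT) sum_t sum_i eps_it^2
profiled = min_residual(D, r)          # r >= r0: never above the bound
under = min_residual(D, r0 - 1)        # too few factors: the bound can fail

print(profiled <= bound, under > bound)
```

With fewer factors than the truth, the residual keeps the variation of an omitted factor, which is why the inequality is not guaranteed in that case.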

3.2. Limiting Distribution

In this section, we derive the limiting distribution of the QML estimator $\hat\theta_{nT}$. Recall that $\tilde\Gamma_n = R_n\Gamma_n$.

Because FT and Γn are concentrated out in estimation, only the true factor and loading will be needed via

the analysis of first and second order derivatives of the concentrated objective function. Therefore, as no

confusion will arise, in subsequent sections FT and Γn refer to the true factor and loading with the subscript

’0’ omitted for simplicity. For the consistency of θnT , we do not need to make limiting assumptions on Γn

and FT . However, for the limiting distribution of θnT , additional assumptions on limiting behaviors of Γn

and FT are needed.

Assumption SF

The number of factors, $r_0$, is constant and known. $\operatorname{plim}_{n,T\to\infty}\frac{1}{n}\Gamma_n'\Gamma_n = \Gamma$ and $\operatorname{plim}_{n,T\to\infty}\frac{1}{T}F_T'F_T = F$ exist and are positive definite.

The above assumption implies that for large enough n and T , all the eigenvalues of F and Γ are bounded

away from zero and are bounded from above. For consistency, it is not necessary that the true number of

factors is known, as long as it is constant and not larger than the number of factors specified in estimation.

But for deriving the limiting distribution, the number of factors needs to be exact in order for asymptotic

analysis to be tractable.10

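Define $d^2_{\min}(\Gamma_n,F_T) = \frac{1}{nT}\mu_{r_0}(\Gamma_n F_T'F_T\Gamma_n')$ and $d^2_{\max}(\Gamma_n,F_T) = \frac{1}{nT}\mu_1(\Gamma_n F_T'F_T\Gamma_n')$. Notice that $\frac{1}{nT}\Gamma_n F_T'F_T\Gamma_n'$ has at most $r_0$ positive eigenvalues. As a consequence of Assumption SF, $\operatorname{plim}_{n,T\to\infty} d^2_{\min}(\Gamma_n,F_T) > 0$ and $\operatorname{plim}_{n,T\to\infty} d^2_{\max}(\Gamma_n,F_T) < \infty$.¹¹ The total variation in Y is $\frac{1}{nT}\operatorname{tr}(YY')$, and its component $\frac{1}{nT}\operatorname{tr}(\Gamma_n F_T'F_T\Gamma_n')$ is due to common factors. Assumption SF guarantees that each of the $r_0$ factors has a nontrivial contribution towards $\frac{1}{nT}\operatorname{tr}(\Gamma_n F_T'F_T\Gamma_n')$. A similar assumption is made in Bai (2003, 2009). Moon and Weidner (2015) label this the “strong factor assumption”.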
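The eigenvalue facts behind $d^2_{\min}$ and $d^2_{\max}$ can be illustrated on simulated loadings and factors. This is a small sketch with arbitrary sizes and seed (assumptions, not the paper's design): it compares the nonzero eigenvalues of the $n\times n$ matrix with those of the $r_0\times r_0$ product, and evaluates the product bounds on the largest and the $r_0$-th eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, r0 = 50, 40, 3
Gamma = rng.standard_normal((n, r0))
F = rng.standard_normal((T, r0))

big = Gamma @ F.T @ F @ Gamma.T / (n * T)        # n x n, rank at most r0
small = (Gamma.T @ Gamma / n) @ (F.T @ F / T)    # r0 x r0

# eigenvalues sorted in decreasing order
eig_big = np.sort(np.linalg.eigvalsh(big))[::-1]
eig_small = np.sort(np.linalg.eigvals(small).real)[::-1]

d2_max, d2_min = eig_big[0], eig_big[r0 - 1]
mu_G = np.sort(np.linalg.eigvalsh(Gamma.T @ Gamma / n))[::-1]
mu_F = np.sort(np.linalg.eigvalsh(F.T @ F / T))[::-1]

print(np.allclose(eig_big[:r0], eig_small))      # same nonzero eigenvalues
print(d2_max <= mu_G[0] * mu_F[0])               # upper bound on d2_max
print(d2_min >= mu_G[-1] * mu_F[-1])             # lower bound on d2_min
```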

In deriving the limiting distribution of $\hat\theta_{nT}$, we need to express $L_{nT}(\theta)$ around $\theta_0$, where
\[
L_{nT}(\theta) = \frac{1}{nT}\sum_{i=r_0+1}^{n}\mu_i\Big(R_n(\alpha)\Big(S_n(\lambda)Y - \sum_{k=1}^{K}Z_k\delta_k\Big)\Big(S_n(\lambda)Y - \sum_{k=1}^{K}Z_k\delta_k\Big)'R_n(\alpha)'\Big).
\]
The perturbation theory of linear operators is used. The technical details of perturbation are in Appendix C and the supplementary file.

10 In Moon and Weidner (2015), they show that with additional assumptions on regressors and error distribution, the additional term does not change the limiting distribution of the estimator. However, those additional assumptions are rather strong. In spatial models, relevant assumptions remain to be seen.

11 This is so because the $n\times n$ matrix $\frac{1}{nT}\Gamma_n F_T'F_T\Gamma_n'$ and the $r_0\times r_0$ matrix $\frac{1}{n}\Gamma_n'\Gamma_n\frac{1}{T}F_T'F_T$ have the same nonzero eigenvalues, counting multiplicity. For large n and T, $d^2_{\max}(\Gamma_n,F_T) = \mu_1\big(\frac{1}{n}\Gamma_n'\Gamma_n\frac{1}{T}F_T'F_T\big) \le \mu_1\big(\frac{1}{n}\Gamma_n'\Gamma_n\big)\mu_1\big(\frac{1}{T}F_T'F_T\big) < \infty$, and $d^2_{\min}(\Gamma_n,F_T) = \mu_{r_0}\big(\frac{1}{n}\Gamma_n'\Gamma_n\frac{1}{T}F_T'F_T\big) \ge \mu_{r_0}\big(\frac{1}{n}\Gamma_n'\Gamma_n\big)\mu_{r_0}\big(\frac{1}{T}F_T'F_T\big) > 0$. See Theorem 8.12 (2) in Zhang (2011), which shows that for Hermitian and positive semidefinite $n\times n$ matrices A and B, $\mu_i(A)\mu_n(B) \le \mu_i(AB) \le \mu_i(A)\mu_1(B)$.

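We now provide the limiting distribution of $\hat\theta_{nT}$. The detailed proofs are in Appendix D. Let $\mathcal{C}_{nT}$ denote the sigma algebra generated by $X_{n1},\cdots,X_{nT}$, $\Gamma_n$ and $F_T$. Define the $n\times T$ matrices $\bar Z_k = E(Z_k|\mathcal{C}_{nT})$, $\tilde Z_k \equiv M_{\tilde\Gamma_n}R_n\bar Z_k M_{F_T} + M_{\tilde\Gamma_n}R_n(Z_k - \bar Z_k)$, $k = 1,\cdots,K+1$, and the $n\times T$ matrix of lagged disturbances $\varepsilon_h = (\varepsilon_{n,1-h}\;\cdots\;\varepsilon_{n,T-h})$, $h \ge 1$, where we drop subscripts n and T for those matrices for simplicity. Using the reduced form of the dynamic equation in (1), we have
\[
\begin{aligned}
Z_1 - \bar Z_1 &= \sum_{h=1}^{\infty}A_{0n}^{h-1}S_n^{-1}R_n^{-1}\varepsilon_h, \qquad Z_2 - \bar Z_2 = W_n\sum_{h=1}^{\infty}A_{0n}^{h-1}S_n^{-1}R_n^{-1}\varepsilon_h, \\
Z_k - \bar Z_k &= 0 \text{ for } k = 3,\cdots,K, \qquad Z_{K+1} - \bar Z_{K+1} = (\gamma_0 G_n + \rho_0 G_n W_n)\sum_{h=1}^{\infty}A_{0n}^{h-1}S_n^{-1}R_n^{-1}\varepsilon_h.
\end{aligned}
\tag{6}
\]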

Theorems 1 and 3 characterize the asymptotic distribution, asymptotic bias and asymptotic variance of θnT .

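Theorem 1. Assume that $n/T \to \kappa^2 > 0$ and Assumptions NC1 (or NC2), E, R and SF hold, then
\[
\sqrt{nT}(\hat\theta_{nT} - \theta_0) - (\sigma_0^2 D_{nT})^{-1}\varphi_{nT} = (\sigma_0^2 D_{nT})^{-1}\frac{1}{\sqrt{nT}}\nu_{nT} + o_P(1), \tag{7}
\]
where $D_{nT}$ is defined in Eq. (9) and is assumed to be positive definite, $\varphi_{nT} = (\varphi_{nT,\gamma},\varphi_{nT,\rho},0_{1\times(K-2)},\varphi_{nT,\lambda},\varphi_{nT,\alpha})'$ with
\[
\begin{aligned}
\varphi_{nT,\gamma} &= -\frac{\sigma_0^2}{\sqrt{nT}}\sum_{h=1}^{T-1}\operatorname{tr}(J_0 P_{F_T}J_h')\operatorname{tr}(A_n^{h-1}S_n^{-1}), \\
\varphi_{nT,\rho} &= -\frac{\sigma_0^2}{\sqrt{nT}}\sum_{h=1}^{T-1}\operatorname{tr}(J_0 P_{F_T}J_h')\operatorname{tr}(W_n A_n^{h-1}S_n^{-1}), \\
\varphi_{nT,\lambda} &= -\frac{\sigma_0^2}{\sqrt{nT}}\sum_{h=1}^{T-1}\operatorname{tr}(J_0 P_{F_T}J_h')\operatorname{tr}\big((\gamma G_n + \rho G_n W_n)A_n^{h-1}S_n^{-1}\big) + \sqrt{\frac{T}{n}}\,\sigma_0^2\Big(\frac{r_0}{n}\operatorname{tr}(G_n) - \operatorname{tr}(P_{\tilde\Gamma_n}R_n G_n R_n^{-1})\Big), \\
\varphi_{nT,\alpha} &= \sqrt{\frac{T}{n}}\,\sigma_0^2\Big(\frac{r_0}{n}\operatorname{tr}(\tilde G_n) - \operatorname{tr}(P_{\tilde\Gamma_n}\tilde G_n)\Big).
\end{aligned}
\]
$J_h = (0_{T\times(T-h)}, I_{T\times T}, 0_{T\times h})'$, $I_{T\times T}$ is the $T\times T$ identity matrix, and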

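\[
\nu_{nT} = \Big(\operatorname{tr}(\tilde Z_1\varepsilon'),\cdots,\operatorname{tr}(\tilde Z_K\varepsilon'),\;
\operatorname{tr}(\tilde Z_{K+1}\varepsilon') + \operatorname{tr}(R_n G_n R_n^{-1}\varepsilon\varepsilon') - \operatorname{tr}(\varepsilon\varepsilon')\frac{1}{n}\operatorname{tr}(G_n),\;
\operatorname{tr}(\tilde G_n\varepsilon\varepsilon') - \operatorname{tr}(\varepsilon\varepsilon')\frac{1}{n}\operatorname{tr}(\tilde G_n)\Big)'.
\]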
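The selection matrices $J_h$ have a simple reading: $\operatorname{tr}(J_0 P_{F_T}J_h')$ is the sum of the h-th superdiagonal of $P_{F_T}$, so the dynamic part of the bias weights each lag h by how strongly the factor projector links periods that are h apart. The following sketch checks this identity on an arbitrary simulated $F_T$ (sizes and seed are assumptions for illustration).

```python
import numpy as np


def J(h, T):
    """J_h = (0_{T x (T-h)}, I_T, 0_{T x h})', a 2T x T selection matrix."""
    return np.hstack([np.zeros((T, T - h)), np.eye(T), np.zeros((T, h))]).T


rng = np.random.default_rng(2)
T, r0 = 8, 2
F = rng.standard_normal((T, r0))
P_F = F @ np.linalg.solve(F.T @ F, F.T)   # projector onto the column space of F_T

# trace form used in Theorem 1 versus the h-th superdiagonal sum of P_F
lhs = np.array([np.trace(J(0, T) @ P_F @ J(h, T).T) for h in range(1, T)])
rhs = np.array([np.trace(P_F, offset=h) for h in range(1, T)])

print(np.allclose(lhs, rhs))
```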

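To derive the joint distribution of $\nu_{nT}$, the Cramér-Wold device can be used. Let $c \in \mathbb{R}^{K+2}$; then
\[
\begin{aligned}
c'\nu_{nT} &= \operatorname{tr}\Big(\sum_{k=1}^{K+1}c_k\tilde Z_k\varepsilon'\Big) + c_{K+1}\Big(\operatorname{tr}(R_n G_n R_n^{-1}\varepsilon\varepsilon') - \frac{1}{n}\operatorname{tr}(G_n)\operatorname{tr}(\varepsilon\varepsilon')\Big) + c_{K+2}\Big(\operatorname{tr}(\tilde G_n\varepsilon\varepsilon') - \frac{1}{n}\operatorname{tr}(\tilde G_n)\operatorname{tr}(\varepsilon\varepsilon')\Big) \\
&= \operatorname{vec}\Big(\sum_{k=1}^{K+1}c_k\tilde Z_k\Big)'\operatorname{vec}(\varepsilon) + \operatorname{vec}(\varepsilon)'\Big[c_{K+1}\,I_T\otimes\Big(R_n G_n R_n^{-1} - \frac{1}{n}\operatorname{tr}(G_n)I_n\Big) + c_{K+2}\,I_T\otimes\Big(\tilde G_n - \frac{1}{n}\operatorname{tr}(\tilde G_n)I_n\Big)\Big]\operatorname{vec}(\varepsilon) \\
&= b^{c\prime}_{nT}\operatorname{vec}(\varepsilon) + \omega^{c\prime}_{nT}\operatorname{vec}(\varepsilon) + \operatorname{vec}(\varepsilon)'A^{c}_{nT}\operatorname{vec}(\varepsilon),
\end{aligned}
\]
where $b^c_{nT} = \sum_{k=1}^{K+1}c_k\operatorname{vec}(M_{\tilde\Gamma_n}R_n\bar Z_k M_{F_T})$, $\omega^c_{nT} = \operatorname{vec}\big(\sum_{h=1}^{\infty}P^c_{nh}\varepsilon_h\big)$ with $P^c_{nh} = B^c_n A_{0n}^{h-1}S_n^{-1}R_n^{-1}$, and $B^c_n = M_{\tilde\Gamma_n}R_n\big(c_1 I_n + c_2 W_n + c_{K+1}(\gamma_0 G_n + \rho_0 G_n W_n)\big)$, and $A^c_{nT} = \frac{1}{2}(A^{c1}_{nT} + A^{c1\prime}_{nT})$ is an $nT\times nT$ symmetric matrix with $A^{c1}_{nT} = c_{K+1}H_{K+1} + c_{K+2}H_{K+2}$, $H_{K+1} = I_T\otimes\big(R_n G_n R_n^{-1} - \frac{1}{n}\operatorname{tr}(G_n)I_n\big)$, and $H_{K+2} = I_T\otimes\big(\tilde G_n - \frac{1}{n}\operatorname{tr}(\tilde G_n)I_n\big)$.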


Under Assumptions R and SF, elements of $b^c_{nT}$ have uniformly bounded 4-th moments; $A^c_{nT}$, $B^c_n$, $\sum_{h=1}^{\infty}\operatorname{abs}(A_{0n}^{h-1})$, $S_n^{-1}$ and $R_n^{-1}$ are UB. Together with Assumption E, the CLT of the martingale difference array for linear-quadratic forms (Kelejian and Prucha (2001) and Yu et al. (2008), Lemma 13) is applicable.

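Theorem 2. Under Assumptions E, R and SF, $Ec'\nu_{nT} = \sigma_0^2\operatorname{tr}(A^c_{nT}) = 0$,
\[
\begin{aligned}
\operatorname{var}(c'\nu_{nT}) &= T\sigma_0^4 E\operatorname{tr}\Big(\sum_{h=1}^{\infty}P^{c\prime}_{nh}P^c_{nh}\Big) + \sigma_0^2 E\,b^{c\prime}_{nT}b^c_{nT} + 2\sigma_0^4\operatorname{tr}(A^{c\,2}_{nT}) + 2\mu^{(3)}E\sum_{i=1}^{nT}[b^c_{nT}]_i[A^c_{nT}]_{ii} + (\mu^{(4)} - 3\sigma_0^4)\sum_{i=1}^{nT}[A^c_{nT}]^2_{ii} \\
&= nT\sigma_0^4\,c'(D_{nT} + \Sigma_{nT} + o_P(1))\,c,
\end{aligned}
\tag{8}
\]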

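where the $(K+2)\times(K+2)$ matrix $D_{nT}$ is
\[
D_{nT} = \frac{1}{nT\sigma_0^2}X_{nT}'X_{nT} +
\begin{pmatrix}
0_{K\times K} & 0_{K\times 1} & 0_{K\times 1} \\
0_{1\times K} & \psi_{K+1,K+1} & \psi_{K+1,K+2} \\
0_{1\times K} & \psi_{K+1,K+2} & \psi_{K+2,K+2}
\end{pmatrix}, \tag{9}
\]
where $X_{nT} = (X_1\;\cdots\;X_{K+1}\;0)$, with $X_k = \operatorname{vec}(M_{\tilde\Gamma_n}R_n\bar Z_k M_{F_T})$, $k = 1,\cdots,K+1$;
\[
\begin{aligned}
\psi_{K+1,K+1} &= \frac{1}{n}\operatorname{tr}(R_n G_n R_n^{-1}R_n^{-1\prime}G_n'R_n') + \frac{1}{n}\operatorname{tr}(G_n^2) - 2\Big(\frac{1}{n}\operatorname{tr}(G_n)\Big)^2, \\
\psi_{K+1,K+2} &= \frac{1}{n}\operatorname{tr}(G_n\tilde G_n) + \frac{1}{n}\operatorname{tr}(R_n G_n R_n^{-1}\tilde G_n') - \frac{2}{n^2}\operatorname{tr}(G_n)\operatorname{tr}(\tilde G_n), \\
\psi_{K+2,K+2} &= \frac{1}{n}\operatorname{tr}(\tilde G_n\tilde G_n') + \frac{1}{n}\operatorname{tr}(\tilde G_n^2) - 2\Big(\frac{1}{n}\operatorname{tr}(\tilde G_n)\Big)^2;
\end{aligned}
\]
and furthermore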

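\[
\Sigma_{nT} = \frac{\mu^{(3)}}{\sigma_0^4}
\begin{pmatrix}
 & & & \Sigma^{1,K+1}_{nT,A} & \Sigma^{1,K+2}_{nT,A} \\
 & 0_{K\times K} & & \vdots & \vdots \\
 & & & \Sigma^{K,K+1}_{nT,A} & \Sigma^{K,K+2}_{nT,A} \\
\Sigma^{1,K+1}_{nT,A} & \cdots & \Sigma^{K,K+1}_{nT,A} & 2\Sigma^{K+1,K+1}_{nT,A} & \Sigma^{K+1,K+2}_{nT,A} \\
\Sigma^{1,K+2}_{nT,A} & \cdots & \Sigma^{K,K+2}_{nT,A} & \Sigma^{K+1,K+2}_{nT,A} & 0
\end{pmatrix}
+ \frac{\mu^{(4)} - 3\sigma_0^4}{\sigma_0^4}
\begin{pmatrix}
0_{K\times K} & 0_{K\times 1} & 0_{K\times 1} \\
0_{1\times K} & \Sigma^{K+1,K+1}_{nT,B} & \Sigma^{K+1,K+2}_{nT,B} \\
0_{1\times K} & \Sigma^{K+1,K+2}_{nT,B} & \Sigma^{K+2,K+2}_{nT,B}
\end{pmatrix}, \tag{10}
\]
where $\Sigma^{k_1,k_2}_{nT,A} = \frac{1}{nT}\sum_{i=1}^{nT}[\operatorname{vec}(M_{\tilde\Gamma_n}R_n\bar Z_{k_1}M_{F_T})]_i H_{k_2,ii}$ for $k_1 = 1,\cdots,K+1$ and $k_2 = K+1$ or $K+2$; $\Sigma^{K+1,K+1}_{nT,B} = \frac{1}{nT}\sum_{i=1}^{nT}(H_{K+1,ii})^2$; $\Sigma^{K+2,K+2}_{nT,B} = \frac{1}{nT}\sum_{i=1}^{nT}(H_{K+2,ii})^2$; and $\Sigma^{K+1,K+2}_{nT,B} = \frac{1}{nT}\sum_{i=1}^{nT}H_{K+1,ii}H_{K+2,ii}$.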
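The terms in Eq. (8) that involve $b^c_{nT}$ and $A^c_{nT}$ follow the standard variance formula for a linear-quadratic form $b'\varepsilon + \varepsilon'A\varepsilon$ with i.i.d. mean-zero $\varepsilon_i$ and symmetric A. The sketch below verifies that formula exactly, by enumerating a small discrete distribution. The support, probabilities, vector `b` and matrix `A` are arbitrary illustrative choices, `b` is treated as nonrandom, and the lagged-disturbance term $\omega^c_{nT}$ is left out.

```python
import itertools
import numpy as np

vals = np.array([-1.0, 0.0, 2.0])     # support of eps_i (illustrative)
prob = np.array([0.4, 0.4, 0.2])      # mean zero, skewed, non-normal kurtosis
s2 = prob @ vals**2                   # sigma^2
m3 = prob @ vals**3                   # third moment mu^(3)
m4 = prob @ vals**4                   # fourth moment mu^(4)

m = 3
b = np.array([0.5, -1.0, 2.0])
A = np.array([[1.0, 0.3, -0.2],
              [0.3, -0.5, 0.4],
              [-0.2, 0.4, 0.7]])      # symmetric

# exact mean and variance of b'e + e'Ae by enumerating all 3^m outcomes
mean = second = 0.0
for idx in itertools.product(range(3), repeat=m):
    e = vals[list(idx)]
    p = prob[list(idx)].prod()
    q = b @ e + e @ A @ e
    mean += p * q
    second += p * q**2
var_exact = second - mean**2

# closed form matching the b- and A-terms of Eq. (8)
var_formula = (s2 * (b @ b)
               + 2 * s2**2 * np.trace(A @ A)
               + 2 * m3 * (b @ np.diag(A))
               + (m4 - 3 * s2**2) * np.sum(np.diag(A)**2))

print(np.isclose(var_exact, var_formula))
```

Under normality $\mu^{(3)} = 0$ and $\mu^{(4)} = 3\sigma_0^4$, so the last two terms vanish, which is the source of $\Sigma = 0$ in that case.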

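Theorem 3. Assume that $T/n \to \kappa^2 > 0$; $D = \operatorname{plim}_{n,T\to\infty}D_{nT}$ is positive definite; $\Sigma = \operatorname{plim}_{n,T\to\infty}\Sigma_{nT}$; $\varphi = \operatorname{plim}_{n,T\to\infty}\varphi_{nT}$; and suppose that Assumptions NC1 (or NC2), E, R and SF hold, then
\[
\sqrt{nT}(\hat\theta_{nT} - \theta_0) - (\sigma_0^2 D)^{-1}\varphi \xrightarrow{d} N\big(0,\; D^{-1}(D+\Sigma)D^{-1}\big).
\]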

Theorem 3 shows that the limiting distribution of $\hat\theta_{nT}$ may not be centered at $\theta_0$, with an asymptotic bias term $(\sigma_0^2 D)^{-1}\varphi$. For a regression panel with factors, Moon and Weidner (2014) show that leading order biases are due to the correlation between the predetermined regressors and the disturbances, and to heteroskedastic disturbances. In our model, the biases arise from the predetermined regressors and the interaction between the spatial effects and the factor loadings. As the time factors and loadings contain an infinite number of parameters when sample sizes n and T go to infinity, the asymptotic bias must be due to the incidental parameter problem. In the next subsection, a bias corrected estimator will be proposed. The variance matrix in the limiting distribution has a sandwich form to accommodate possible non-normal disturbances. When disturbances in the model are normally distributed, $\Sigma = 0$ and the limiting variance will be $D^{-1}$ in a single matrix form.
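As a small illustration of the sandwich form, the sketch below turns given matrices D and Σ into the limiting covariance and the implied standard errors of $\hat\theta_{nT}$. The function names and the numerical inputs are hypothetical; in practice D and Σ would be replaced by estimates.

```python
import numpy as np


def asymptotic_cov(D, Sigma):
    """D^{-1} (D + Sigma) D^{-1}: limiting covariance of sqrt(nT) * (theta_hat - theta_0)."""
    D_inv = np.linalg.inv(D)
    return D_inv @ (D + Sigma) @ D_inv


def std_errors(D, Sigma, n, T):
    """Standard errors of theta_hat implied by the limiting covariance."""
    return np.sqrt(np.diag(asymptotic_cov(D, Sigma)) / (n * T))


# illustrative inputs only
D = np.array([[2.0, 0.5],
              [0.5, 1.0]])
Sigma = np.array([[0.2, 0.0],
                  [0.0, 0.1]])

print(asymptotic_cov(D, np.zeros((2, 2))))   # collapses to the inverse of D
print(std_errors(D, Sigma, n=49, T=49))
```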

3.3. Bias Correction

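This section proposes a method to correct the bias $\varphi$ in the limiting distribution. The bias depends on $\theta$, $P_{\tilde\Gamma_n}$, $P_{F_T}$, $\sigma_0^2$ and $D^{-1}$. The parameter $\theta$ can be estimated by $\hat\theta_{nT}$. The projectors $P_{\tilde\Gamma_n}$ and $P_{F_T}$ can be estimated as follows. From Eq. (5), let $\hat B_{\tilde\Gamma_n}$ denote the $n\times r_0$ matrix of the eigenvectors associated with the largest $r_0$ eigenvalues of
\[
\frac{1}{nT}R_n(\hat\alpha)\Big(S_n(\hat\lambda)Y - \sum_{k=1}^{K}Z_k\hat\delta_k\Big)\Big(S_n(\hat\lambda)Y - \sum_{k=1}^{K}Z_k\hat\delta_k\Big)'R_n(\hat\alpha)',
\]
where the subscripts nT of parameter estimates are dropped for simplicity; then $\hat P_{\tilde\Gamma_n} = \hat B_{\tilde\Gamma_n}\hat B_{\tilde\Gamma_n}'$ and $\hat M_{\tilde\Gamma_n} = I_n - \hat P_{\tilde\Gamma_n}$. Interchanging the roles of n and T, the projectors $P_{F_T}$ and $M_{F_T}$ can be similarly estimated. Let $\hat B_{F_T}$ denote the $T\times r_0$ matrix of the eigenvectors associated with the largest $r_0$ eigenvalues of
\[
\frac{1}{nT}\Big(S_n(\hat\lambda)Y - \sum_{k=1}^{K}Z_k\hat\delta_k\Big)'R_n(\hat\alpha)'R_n(\hat\alpha)\Big(S_n(\hat\lambda)Y - \sum_{k=1}^{K}Z_k\hat\delta_k\Big),
\]
and $\hat P_{F_T} = \hat B_{F_T}\hat B_{F_T}'$, $\hat M_{F_T} = I_T - \hat P_{F_T}$. Lemma 11 in Appendix C shows that under the assumptions of Theorem 3, $\|\hat M_{\tilde\Gamma_n} - M_{\tilde\Gamma_n}\|_2 = \|\hat P_{\tilde\Gamma_n} - P_{\tilde\Gamma_n}\|_2 = O_P(\frac{1}{\sqrt n})$ and $\|\hat M_{F_T} - M_{F_T}\|_2 = \|\hat P_{F_T} - P_{F_T}\|_2 = O_P(\frac{1}{\sqrt n})$.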
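The eigenvector step above can be sketched as follows, taking the $n\times T$ residual matrix (the term inside the outer product, already evaluated at the parameter estimates) as given. The function name `factor_projectors` and the simulated input are assumptions for illustration only.

```python
import numpy as np


def factor_projectors(D, r0):
    """Projectors onto the estimated loading and factor spaces.

    D is the n x T residual matrix; the top-r0 eigenvectors of DD'/(nT) and
    D'D/(nT) give orthonormal bases for the two spaces.
    """
    n, T = D.shape
    # eigh returns eigenvalues in ascending order, so keep the last r0 columns
    _, vec_n = np.linalg.eigh(D @ D.T / (n * T))
    _, vec_T = np.linalg.eigh(D.T @ D / (n * T))
    B_G, B_F = vec_n[:, -r0:], vec_T[:, -r0:]
    P_G, P_F = B_G @ B_G.T, B_F @ B_F.T
    return P_G, np.eye(n) - P_G, P_F, np.eye(T) - P_F


rng = np.random.default_rng(4)
n, T, r0 = 20, 15, 2
Gamma = rng.standard_normal((n, r0))
F = rng.standard_normal((T, r0))
D = Gamma @ F.T + 0.1 * rng.standard_normal((n, T))

P_G, M_G, P_F, M_F = factor_projectors(D, r0)
print(np.trace(P_G), np.trace(P_F))   # both traces equal r0 up to rounding
```

Because the eigenvectors returned by `eigh` are orthonormal, $\hat B\hat B'$ is a projector without any further normalization.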

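The variance $\sigma_0^2$ can be estimated by
\[
\hat\sigma^2 = L_{nT}(\hat\theta_{nT}) = \frac{1}{nT}\sum_{i=r_0+1}^{n}\mu_i\Big(R_n(\hat\alpha)\Big(S_n(\hat\lambda)Y - \sum_{k=1}^{K}Z_k\hat\delta_k\Big)\Big(S_n(\hat\lambda)Y - \sum_{k=1}^{K}Z_k\hat\delta_k\Big)'R_n(\hat\alpha)'\Big). \tag{11}
\]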

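Finally, $\hat D_{nT}$ is an estimate of $D_{nT}$ (see Eq. (9)) with estimated $\hat\theta_{nT}$, $\hat P_{\tilde\Gamma_n}$, $\hat P_{F_T}$ and $\hat\sigma^2$, and hence $\hat D_{nT}$ also estimates D. The bias $\varphi$ can then be estimated by plugging in these estimated elements. The following theorem shows that the bias corrected estimator $\hat\theta^c_{nT}$ is asymptotically normal and centered at $\theta_0$.

Theorem 4. Assume that $T/n \to \kappa^2 > 0$; $D = \operatorname{plim}_{n,T\to\infty}D_{nT}$ is positive definite; $\Sigma = \lim_{n,T\to\infty}\Sigma_{nT}$; and suppose that Assumptions NC1 (or NC2), E, R and SF hold. The bias corrected estimator is $\hat\theta^c_{nT} = \hat\theta_{nT} - (\hat\sigma^2_{nT}\hat D_{nT})^{-1}\frac{1}{\sqrt{nT}}\hat\varphi_{nT}$. Then
\[
\sqrt{nT}(\hat\theta^c_{nT} - \theta_0) \xrightarrow{d} N\big(0,\; D^{-1}(D+\Sigma)D^{-1}\big).
\]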
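Given plug-in estimates, the correction in Theorem 4 is a single linear solve. The sketch below uses a hypothetical helper `bias_correct` and made-up inputs, and only illustrates the arithmetic of the formula.

```python
import numpy as np


def bias_correct(theta_hat, sigma2_hat, D_hat, phi_hat, n, T):
    """theta_hat - (sigma2_hat * D_hat)^{-1} phi_hat / sqrt(nT)."""
    adjustment = np.linalg.solve(sigma2_hat * D_hat, phi_hat) / np.sqrt(n * T)
    return theta_hat - adjustment


# made-up numbers, chosen so the result is easy to verify by hand
theta_hat = np.array([0.3, 0.3])
D_hat = 2.0 * np.eye(2)
phi_hat = np.array([0.0, 4.0])
theta_c = bias_correct(theta_hat, 1.0, D_hat, phi_hat, n=4, T=4)
print(theta_c)
```

Here the second component moves by $(4/2)/\sqrt{16} = 0.5$ while the first is unchanged, since its estimated bias is zero.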

Section 4 investigates the finite sample performance of the QML estimators in a Monte Carlo study.

3.4. The number of factors

As long as the number of factors specified is no fewer than the true number of factors, the QML estimator is consistent. However, the limiting distribution is derived under the premise that the number of factors is correctly specified. Although Moon and Weidner (2015) show that, under certain conditions, the limiting distribution of the NLS estimator for a regression panel is invariant to the inclusion of redundant factors, its finite sample performance may suffer, as Lu and Su (2015) emphasize. If factors are interpreted as omitted variables, their detection is a first step in trying to measure them. In this section, we demonstrate how the factor number can be consistently determined. Denote by $\tilde\theta_{nT}$ the QML estimator with $r \ge r_0$, which is a preliminary consistent estimator using a large number of factors. The residuals $\tilde D_{nT} = R_n(\tilde\alpha)\big(S_n(\tilde\lambda)Y - \sum_{k=1}^{K}Z_k\tilde\delta_k\big)$ have approximate factor structures as in Bai and Ng (2002). To see this, using the notation of Eq. (C.1), we have $\tilde D_{nT} = \tilde\Gamma_n F_T' + \varepsilon + E_{nT}(\tilde\theta)$, with $E_{nT}(\theta) = \sum_{k=1}^{K+2}\eta_k V_k + \sum_{k_1,k_2=1}^{K+2}\eta_{k_1}\eta_{k_2}V_{k_1k_2}$. Several criteria for factor number selection have been proposed in the literature, including Bai and Ng (2002)'s PC and IC criteria, Onatski (2010)'s Edge Distribution estimator, and Ahn and Horenstein (2013)'s eigenvalue ratio tests. We specifically show how Ahn and Horenstein (2013)'s eigenvalue ratio criterion is used here. Denote $\tilde\mu_{nT,0} = V(0)/\log(\min(n,T))$, $\tilde\mu_{nT,k} = \mu_k\big(\frac{1}{nT}\tilde D_{nT}\tilde D_{nT}'\big)$ for $k \ge 1$, and $V(k) = \sum_{j=k+1}^{n}\tilde\mu_{nT,j}$ for $k \ge -1$. Define the “eigenvalue ratio” statistic, $ER(k) = \tilde\mu_{nT,k}/\tilde\mu_{nT,k+1}$, and the “growth ratio” statistic, $GR(k) = \log\big(V(k-1)/V(k)\big)/\log\big(V(k)/V(k+1)\big)$. The number of factors can be selected according to $\tilde k_{ER} = \arg\max_{0\le k\le k_{\max}}ER(k)$ or $\tilde k_{GR} = \arg\max_{0\le k\le k_{\max}}GR(k)$, where $k_{\max}$ is a pre-specified constant.
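The two selection rules can be sketched directly from these definitions. The code below takes an $n\times T$ residual matrix as input; the function name `er_gr_select`, the simulated residuals with two strong factors, and the choice $k_{\max} = 8$ are assumptions for illustration and not the paper's simulation design.

```python
import numpy as np


def er_gr_select(D, kmax):
    """Return the maximizers of ER(k) and GR(k) over k = 0, ..., kmax (needs kmax <= n - 2)."""
    n, T = D.shape
    mu = np.sort(np.linalg.eigvalsh(D @ D.T / (n * T)))[::-1]   # mu[j-1] = mu_{nT,j}
    V0 = mu.sum()                                               # V(0)
    m = np.concatenate([[V0 / np.log(min(n, T))], mu])          # m[k] = mu_{nT,k}, k = 0..n
    V = np.cumsum(m[::-1])[::-1]                                # V[k+1] = V(k), k = -1..n-1
    ks = np.arange(kmax + 1)
    ER = m[ks] / m[ks + 1]
    GR = np.log(V[ks] / V[ks + 1]) / np.log(V[ks + 1] / V[ks + 2])
    return int(ks[np.argmax(ER)]), int(ks[np.argmax(GR)])


rng = np.random.default_rng(3)
n, T, r0 = 60, 60, 2
Gamma = rng.standard_normal((n, r0))
F = rng.standard_normal((T, r0))
D = Gamma @ F.T + 0.5 * rng.standard_normal((n, T))   # two strong factors plus noise

k_er, k_gr = er_gr_select(D, kmax=8)
print(k_er, k_gr)
```

The mock eigenvalue $\tilde\mu_{nT,0}$ is what allows both rules to return zero when the residuals contain no factor.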

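Theorem 5. Assume that Assumptions E, R and SF hold, $\lim_{n,T\to\infty} n/T = \kappa > 0$, the preliminary estimator satisfies $\|\tilde\theta_{nT} - \theta_0\|_2 = o_p(n^{-1/2})$, and $r_0 \ge 0$. Then $\lim_{n,T\to\infty}\Pr(\tilde k_{ER} = r_0) = \lim_{n,T\to\infty}\Pr(\tilde k_{GR} = r_0) = 1$.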

Therefore the number of factors can be determined consistently. The proof, which is in Appendix D,

checks that the relevant assumptions of Ahn and Horenstein (2013) are satisfied and their result then applies

here. Note that the case with no factors (r0 = 0) is covered.

4. Monte Carlo simulations

4.1. Design

We study the finite sample performance of the QML estimator and the accuracy of the factor number

selection in samples of different sizes, different degrees of spatial interaction, and different ratios between

the variances of the idiosyncratic error and the factors. We also illustrate finite sample biases due to mis-

specification when the DGP has either spatial or interactive effects but they are ignored.

The DGP is $Y_{nt} = \lambda_0 W_n Y_{nt} + \gamma_0 Y_{n,t-1} + \rho_0 W_n Y_{n,t-1} + X_{nt}\beta_0 + \Gamma_{0n}f_{0t} + U_{nt}$, with $U_{nt} = \alpha_0 W_n U_{nt} + \varepsilon_{nt}$. The dependent variable is affected by 2 unobserved factors $F_{0T} = (f_{01}\; f_{02}\; \cdots\; f_{0T})'$, which is $T\times 2$, and their $n\times 2$ loadings matrix $\Gamma_{0n} = (\gamma_{01}\; \gamma_{02}\; \cdots\; \gamma_{0n})'$. $X_{nt} = (X_{nt,1}\; X_{nt,2})$ is an $n\times 2$ matrix of two regressors, which are generated according to $X_{nt,1i} = 0.25\big(\gamma_{0i}'f_{0t} + (\gamma_{0i}'f_{0t})^2 + \ell'\gamma_{0i} + \ell'f_{0t}\big) + \eta_{it,1}$ and $X_{nt,2i} = \eta_{it,2}$, where $\ell = (1\; 1)'$. Elements of $\gamma_{0i}$, $f_{0t}$, $\eta_{it,1}$ and $\eta_{it,2}$ are generated from independent standard normal variables. We see that $X_{nt,1i}$ is correlated with the common component $\gamma_{0i}'f_{0t}$, its square, and the factors and loadings separately. $X_{nt,2i}$ is not affected by the factors. The spatial weights matrix $W_n$ is generated from a rook matrix. Individual units are arranged row by row on a $\sqrt{n}\times\sqrt{n}$ chessboard where

neighbors are defined as those who share a common border. Units in the interior of the chessboard have 4

neighbors, and units on the border and corner have respectively 3 and 2 neighbors. This design of the spatial

weights matrix is motivated by the observation that regions in most observed regional structures have similar

connectivity as units in the rook matrix. Define n×n matrix Mn, such that Mn,i j = 1 if and only if individuals

i and j are neighbors on the chessboard, and Mn,i j = 0 otherwise. Then the spatial weights matrix Wn is

defined as a row normalized Mn. εit’s are generated from independent normal (0,ϑ) distributions. Elements

of the common factor components and the idiosyncratic errors have the same variances when ϑ = 1, and

the latter has more variation when ϑ > 1. For each Monte Carlo experiment, Xnt , Γ0n and F0T are generated

according to the above specification for T +1000 time periods and the last T periods are taken as our sample.
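A minimal simulation sketch of this design is given below. It builds the row-normalized rook matrix and generates one panel with a burn-in. The function names are hypothetical, ϑ is read as the variance of $\varepsilon_{it}$, the process is started from zero, and the board side must be at least 2; these are assumptions of the sketch rather than details stated in the text.

```python
import numpy as np


def rook_weights(side):
    """Rook contiguity on a side x side board, units numbered row by row.

    Returns the row-normalized weights W_n and the 0/1 neighbor matrix M_n.
    """
    n = side * side
    M = np.zeros((n, n))
    for i in range(side):
        for j in range(side):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                a, b = i + di, j + dj
                if 0 <= a < side and 0 <= b < side:
                    M[i * side + j, a * side + b] = 1.0
    return M / M.sum(axis=1, keepdims=True), M


def simulate_panel(side, T, theta, vartheta=1.0, burn=1000, seed=0):
    """Simulate Y (n x T) and X (n x T x 2) for theta = (lam, gam, rho, alpha, b1, b2)."""
    lam, gam, rho, alpha, b1, b2 = theta
    rng = np.random.default_rng(seed)
    W, _ = rook_weights(side)
    n = side * side
    S_inv = np.linalg.inv(np.eye(n) - lam * W)
    R_inv = np.linalg.inv(np.eye(n) - alpha * W)
    Gamma = rng.standard_normal((n, 2))          # loadings, fixed over time
    ell = np.ones(2)
    Y_prev = np.zeros(n)                         # assumption: start at zero
    Y, X = np.empty((n, T)), np.empty((n, T, 2))
    for t in range(burn + T):
        f = rng.standard_normal(2)
        common = Gamma @ f                       # gamma_0i' f_0t for all i
        x1 = 0.25 * (common + common**2 + Gamma @ ell + ell @ f) + rng.standard_normal(n)
        x2 = rng.standard_normal(n)
        U = R_inv @ (np.sqrt(vartheta) * rng.standard_normal(n))
        Y_t = S_inv @ (gam * Y_prev + rho * (W @ Y_prev) + b1 * x1 + b2 * x2 + common + U)
        if t >= burn:                            # keep only the last T periods
            Y[:, t - burn] = Y_t
            X[:, t - burn, 0], X[:, t - burn, 1] = x1, x2
        Y_prev = Y_t
    return Y, X, W


Y, X, W = simulate_panel(side=5, T=25, theta=(0.3, 0.3, 0.3, 0.3, 1.0, 1.0), burn=200)
print(Y.shape, X.shape)
```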

4.2. Monte Carlo results

1000 Monte Carlo replications are carried out for each design. The numerical maximization routine

starts at multiple values, because the objective function is not concave and multiple local maxima might

exist. The baseline specification is $\theta_0^a = (0.3, 0.3, 0.3, 0.3, 1, 1)$ for $\theta = (\lambda, \gamma, \rho, \alpha, \beta_1, \beta_2)$. We then consider the cases with negative spatial interaction ($\theta_0^b = (-0.3, 0.3, 0.3, 0.3, 1, 1)$), no spatial correlation in disturbances ($\theta_0^c = (0.3, 0.3, 0.3, 0, 1, 1)$), and no contemporaneous spatial interaction ($\theta_0^d = (0, 0.3, 0.3, 0.3, 1, 1)$).

We use combinations of n = 25,49,81 and T = 25,49,81. The Monte Carlo designs cover panels with small

and moderate n and T . Table 1 reports the Monte Carlo results for the QML estimator. The magnitude of

biases generally decreases as n and T increase. The coverage probability (CP) is calculated using the asymp-

totic variance covariance matrix and the nominal coverage probability is set at 95%. The estimates for α

have noticeable biases in finite sample, and as a result, their CPs are well below 95%. The CPs for other pa-

rameter estimates are also generally below 95% and therefore hypothesis tests will have over-rejection. The

Monte Carlo results of the bias corrected estimator are reported in Table 2. The biases have been reduced

significantly, especially for α. The CPs also improve, which indicates more reliable statistical inference based on the bias corrected estimator. Due to limited space, Monte Carlo results for $\theta_0^c$ and $\theta_0^d$ are reported in the supplementary file.
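For reference, the two summary statistics reported in the tables can be computed from the replications as in the sketch below. The function name and the toy inputs are hypothetical; the interval is the usual normal-approximation interval at the 95% nominal level.

```python
import numpy as np


def bias_and_coverage(estimates, std_errs, truth, z=1.959964):
    """Bias and coverage probability across Monte Carlo replications.

    estimates, std_errs: arrays of shape (replications, parameters);
    truth: array of shape (parameters,).
    """
    bias = estimates.mean(axis=0) - truth
    covered = np.abs(estimates - truth) <= z * std_errs
    return bias, covered.mean(axis=0)


# toy example: four replications of a single parameter with true value 0.3
est = np.array([[0.3], [0.5], [0.1], [0.3]])
se = np.full((4, 1), 0.05)
bias, cp = bias_and_coverage(est, se, np.array([0.3]))
print(bias, cp)
```

In the toy example two of the four intervals contain the true value, so the coverage is 0.5, and the average estimate equals the truth.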

Moon and Weidner (2015) show that in regression panel, the limiting distribution of the QML estimator

does not change when the number of factors is overspecified. Such a result is not available for the spatial

panel considered in this paper, but consistency is still possible as argued. In Table 3, we report the perfor-

mance of the bias corrected estimators when the number of factors is over-specified by 1, i.e. r = 3 while

the true $r_0 = 2$. Although estimates are still consistent, α has noticeable bias in small sample that is not removed by the bias correction procedure. Tables S.5 and S.6 in the supplementary file report additional

Monte Carlo results with more redundant factors. As the number of redundant factors increases, the CP

deteriorates. Note that the biases and CP improve in large samples (e.g., n = 81 and T = 81), and this is

consistent with the results of Moon and Weidner (2015). Therefore for valid inference in small sample, it is

important that a correct number of factors is chosen. The estimators are less sensitive to redundant factors

in large samples.

We check the accuracy of factor selections given by the eigenvalue ratio (ER) and growth ratio (GR)

criteria. Figure 1 shows the number of incorrect selections in 1000 simulations. The accuracy is almost 100% when the variances of the idiosyncratic error and the factors are the same (ϑ = 1), even in small sample (n = 25, T = 25). We then let the idiosyncratic error have up to 9 times more variation than the factors, so that factors are relatively weaker, and find that the selection errors increase as a result. However,

the selection accuracy quickly improves as sample size increases, and close to 100% accuracy is achieved

in the sample with n = 81 and T = 81. The ER and GR criteria have similar overall performances, and GR

criterion performs slightly better when the factors are weak (high ϑ ).

For misspecification issues, Tables S.3 and S.4 in the supplementary file show that estimators might not

be consistent as the biases are substantial if factors or spatial effects are ignored in the estimation but in fact

they exist. Such biases are rather severe for the estimates ρ and α .

5. Empirical application: spatial spillovers in mortgage originations

Our empirical application is motivated by Haurin et al. (2014), which analyzes the effect of house prices

on state-level origination rates of the Home Equity Conversion Mortgage (HECM), but does not consider

spatial spillovers. HECM is the predominant type of reverse mortgages which enable senior homeowners

to withdraw their home equity without home sale or monthly payments. HECMs are insured and regulated

by the federal government, although the private market originates the loans. The insurance is provided by

the Federal Housing Administration through the mutual mortgage insurance fund, which guarantees that the

borrower can have access to the loan fund in the future even when the lender is no longer in business, and

the lender can be fully repaid when the loan terminates even if the house value is less than the loan balance.

The borrower pays mortgage insurance premium both at loan closing and monthly over the lifetime of the

loan. Haurin et al. (2014) find that states with past volatile house prices and current house price levels above

long term norms have higher origination rates. This is consistent with the hypothesis that households use

HECMs to insure against house price declines and therefore the mortgage insurance should take into account

this behavioral response to house price dynamics, as the insurance fund will face higher claim risk when the


Table 1: Performance of the QML Estimator

θ0     n   T            λ          γ          ρ          α          β1         β2
θ0^a   25  25   Bias   -0.00023   -0.00135    0.00007    0.01910    0.00173   -0.00137
                CP      0.921      0.920      0.909      0.869      0.929      0.919
       25  49   Bias    0.00026   -0.00076   -0.00049    0.02024   -0.00024   -0.00038
                CP      0.921      0.927      0.936      0.859      0.918      0.919
       25  81   Bias    0.00074   -0.00047   -0.00039    0.02142    0.00057   -0.00093
                CP      0.933      0.931      0.924      0.830      0.932      0.933
       49  25   Bias   -0.00078   -0.00126    0.00103    0.00940   -0.00020   -0.00080
                CP      0.934      0.918      0.936      0.907      0.920      0.940
       49  49   Bias    0.00015   -0.00022   -0.00017    0.01268   -0.00054    0.00017
                CP      0.946      0.938      0.942      0.890      0.943      0.938
       49  81   Bias   -0.00037   -0.00000    0.00001    0.01080    0.00070    0.00087
                CP      0.944      0.934      0.944      0.901      0.939      0.937
       81  25   Bias   -0.00031    0.00014   -0.00038    0.00526    0.00043   -0.00072
                CP      0.932      0.937      0.933      0.914      0.919      0.944
       81  49   Bias    0.00013    0.00019   -0.00074    0.00488    0.00017    0.00058
                CP      0.944      0.935      0.955      0.937      0.929      0.927
       81  81   Bias   -0.00037   -0.00017    0.00028    0.00692    0.00025   -0.00006
                CP      0.926      0.945      0.924      0.905      0.941      0.957
θ0^b   25  25   Bias    0.00238   -0.00173   -0.00118    0.02102    0.00194   -0.00087
                CP      0.915      0.920      0.923      0.857      0.929      0.921
       25  49   Bias    0.00159   -0.00139   -0.00189    0.02106   -0.00003   -0.00001
                CP      0.928      0.922      0.919      0.861      0.911      0.917
       25  81   Bias    0.00253   -0.00038   -0.00042    0.02106    0.00101   -0.00032
                CP      0.921      0.922      0.917      0.837      0.933      0.926
       49  25   Bias    0.00108   -0.00141   -0.00039    0.00967   -0.00025   -0.00055
                CP      0.917      0.920      0.943      0.912      0.924      0.937
       49  49   Bias    0.00133   -0.00030   -0.00057    0.01255   -0.00037    0.00050
                CP      0.944      0.931      0.929      0.896      0.950      0.934
       49  81   Bias    0.00036   -0.00020   -0.00037    0.01064    0.00066    0.00100
                CP      0.944      0.936      0.951      0.902      0.945      0.946
       81  25   Bias    0.00020   -0.00024   -0.00109    0.00587    0.00036   -0.00069
                CP      0.949      0.928      0.938      0.910      0.928      0.943
       81  49   Bias    0.00081    0.00002   -0.00057    0.00482    0.00017    0.00078
                CP      0.934      0.945      0.952      0.920      0.931      0.929
       81  81   Bias    0.00011   -0.00024    0.00003    0.00680    0.00019   -0.00000
                CP      0.921      0.944      0.929      0.910      0.936      0.954

Note: $\theta_0^a = (0.3, 0.3, 0.3, 0.3, 1, 1)$, $\theta_0^b = (-0.3, 0.3, 0.3, 0.3, 1, 1)$, and $\theta = (\lambda, \gamma, \rho, \alpha, \beta_1, \beta_2)$. ϑ = 1.


Table 2: Performance of Bias Corrected QML Estimator

θ0     n   T            λ          γ          ρ          α          β1         β2
θ0^a   25  25   Bias   -0.00037   -0.00124    0.00010    0.00040    0.00177   -0.00136
                CP      0.921      0.928      0.910      0.905      0.931      0.919
       25  49   Bias    0.00019   -0.00075   -0.00045    0.00159   -0.00023   -0.00039
                CP      0.920      0.929      0.936      0.919      0.919      0.920
       25  81   Bias    0.00067   -0.00046   -0.00034    0.00199    0.00057   -0.00094
                CP      0.929      0.934      0.923      0.934      0.933      0.933
       49  25   Bias   -0.00082   -0.00114    0.00097   -0.00026   -0.00016   -0.00078
                CP      0.928      0.931      0.936      0.925      0.922      0.941
       49  49   Bias    0.00015   -0.00023   -0.00017    0.00258   -0.00053    0.00018
                CP      0.942      0.949      0.944      0.936      0.945      0.938
       49  81   Bias   -0.00038   -0.00001    0.00003    0.00043    0.00070    0.00087
                CP      0.944      0.937      0.946      0.950      0.938      0.937
       81  25   Bias   -0.00029    0.00015   -0.00042   -0.00077    0.00048   -0.00070
                CP      0.934      0.944      0.937      0.921      0.921      0.944
       81  49   Bias    0.00015    0.00016   -0.00073   -0.00141    0.0001     0.00058
                CP      0.944      0.944      0.955      0.937      0.929      0.927
       81  81   Bias   -0.00038   -0.00015    0.00027    0.00065    0.00025   -0.00006
                CP      0.926      0.947      0.925      0.938      0.942      0.957
θ0^b   25  25   Bias    0.00134   -0.00152   -0.00089    0.00276    0.00181   -0.00113
                CP      0.917      0.925      0.921      0.905      0.928      0.920
       25  49   Bias    0.00074   -0.00130   -0.00168    0.00294   -0.00018   -0.00024
                CP      0.934      0.930      0.920      0.928      0.913      0.919
       25  81   Bias    0.00150   -0.00026   -0.00013    0.00234    0.00083   -0.00060
                CP      0.928      0.929      0.915      0.935      0.932      0.929
       49  25   Bias    0.00064   -0.00125   -0.00026    0.00028   -0.00029   -0.00065
                CP      0.919      0.930      0.942      0.925      0.926      0.937
       49  49   Bias    0.00083   -0.00026   -0.00044    0.00285   -0.00045    0.00037
                CP      0.945      0.934      0.930      0.935      0.951      0.933
       49  81   Bias   -0.00011   -0.00015   -0.00024    0.00065    0.00058    0.00087
                CP      0.944      0.941      0.952      0.950      0.945      0.944
       81  25   Bias   -0.00001   -0.00020   -0.00105    0.00003    0.00038   -0.00072
                CP      0.949      0.933      0.938      0.920      0.930      0.944
       81  49   Bias    0.00057    0.00002   -0.00051   -0.00124    0.00014    0.00072
                CP      0.938      0.950      0.954      0.945      0.930      0.930
       81  81   Bias   -0.00015   -0.00020    0.00011    0.00075    0.00014   -0.00007
                CP      0.921      0.944      0.929      0.938      0.937      0.953

Note: $\theta_0^a = (0.3, 0.3, 0.3, 0.3, 1, 1)$, $\theta_0^b = (-0.3, 0.3, 0.3, 0.3, 1, 1)$, and $\theta = (\lambda, \gamma, \rho, \alpha, \beta_1, \beta_2)$. ϑ = 1. The bias-corrected estimator is from Theorem 4.


Table 3: Performance of the Bias Corrected Estimator When the Number of Factors is Overspecified by 1

θ0     n   T            λ          γ          ρ          α          β1         β2
θ0^a   25  25   Bias   -0.00001   -0.00178    0.00040    0.00327    0.00221   -0.00208
                CP      0.875      0.882      0.875      0.829      0.894      0.874
       25  49   Bias    0.00054   -0.00088   -0.00068    0.00597   -0.00027    0.00013
                CP      0.896      0.890      0.903      0.872      0.879      0.891
       25  81   Bias    0.00062   -0.00051   -0.00031    0.00692    0.00061   -0.00073
                CP      0.904      0.914      0.907      0.863      0.912      0.907
       49  25   Bias   -0.00025   -0.00125    0.00043   -0.00111   -0.00030   -0.00115
                CP      0.891      0.905      0.892      0.880      0.900      0.902
       49  49   Bias    0.00015   -0.00038   -0.00006    0.00425   -0.00089    0.00033
                CP      0.930      0.917      0.911      0.902      0.929      0.916
       49  81   Bias   -0.00026   -0.00001   -0.00009    0.00165    0.00073    0.00077
                CP      0.928      0.915      0.935      0.919      0.924      0.933
       81  25   Bias   -0.00025    0.00000   -0.00043   -0.00082    0.00030   -0.00096
                CP      0.909      0.916      0.903      0.885      0.902      0.928
       81  49   Bias    0.00020    0.00018   -0.00081   -0.00115    0.00022    0.00067
                CP      0.936      0.929      0.941      0.935      0.919      0.918
       81  81   Bias   -0.00036   -0.00019    0.00028    0.00081    0.00025   -0.00018
                CP      0.919      0.937      0.908      0.914      0.929      0.947
θ0^b   25  25   Bias    0.00322   -0.00240   -0.00164    0.00509    0.00270   -0.00157
                CP      0.869      0.898      0.882      0.829      0.882      0.889
       25  49   Bias    0.00167   -0.00156   -0.00218    0.00738   -0.00004    0.00037
                CP      0.901      0.896      0.898      0.879      0.880      0.892
       25  81   Bias    0.00212   -0.00041   -0.00031    0.00643    0.00093   -0.00028
                CP      0.905      0.916      0.897      0.855      0.913      0.903
       49  25   Bias    0.00138   -0.00143   -0.00064   -0.00027   -0.00026   -0.00097
                CP      0.902      0.904      0.913      0.889      0.898      0.911
       49  49   Bias    0.00091   -0.00042   -0.00042    0.00458   -0.00079    0.00054
                CP      0.934      0.922      0.924      0.909      0.932      0.918
       49  81   Bias    0.00011   -0.00019   -0.00041    0.00190    0.00064    0.00080
                CP      0.937      0.926      0.934      0.918      0.925      0.936
       81  25   Bias    0.00005   -0.00035   -0.00112    0.00004    0.00017   -0.00101
                CP      0.917      0.909      0.913      0.889      0.903      0.924
       81  49   Bias    0.00051    0.00005   -0.00058   -0.00080    0.00017    0.00081
                CP      0.922      0.927      0.934      0.935      0.919      0.918
       81  81   Bias   -0.00014   -0.00023    0.00010    0.00093    0.00015   -0.00019
                CP      0.919      0.933      0.923      0.917      0.931      0.951

Note: $\theta_0^a = (0.3, 0.3, 0.3, 0.3, 1, 1)$, $\theta_0^b = (-0.3, 0.3, 0.3, 0.3, 1, 1)$, and $\theta = (\lambda, \gamma, \rho, \alpha, \beta_1, \beta_2)$. ϑ = 1. The DGP is the same as described in the text. The true number of factors is 2 and the estimation assumes 3 factors. θ is the bias corrected QML estimator assuming 3 factors.


Figure 1: Frequencies of Incorrect Estimation

True parameter values: θ a0 = (0.3,0.3,0.3,0.3,1,1), where θ = (λ ,γ,ρ,α,β1,β2). ϑ = 1. True number of

factors: 2. Initial estimates assume 10 factors in both equations.


Figure 2: Average HECM Origination Rates by US Regions

Figure 3: Average House Price Deviations and Volatility by US Regions

insured HECMs concentrate disproportionately in areas that are more likely to see house price declines.

Observing that the origination rates exhibit spatial clustering, it is of interest to quantify the spatial

spillover effect. If spatial effects are present, the HECM activity in a state can be affected by developments

in the neighboring states. Our data covers 51 states and 52 quarters from 2001 to 2013. Let yit denote the

HECM origination rate, defined as the number of newly originated HECM loans in state i at quarter t as a

percentage of the senior population (age 65 plus) in state i from the 2010 census. The n×n spatial weights matrix is Wn, with Wn,ij = 1 if states i and j share a border and Wn,ij = 0 otherwise. House price

dynamic variables are constructed using the Federal Housing Finance Agency’s quarterly all-transactions

house price indexes (HPI) deflated by the CPI, and include deviations from the previous 9 year averages

(hpi_dev), standard deviations of house price changes in the previous 9 years (hpi_v) and the interaction

between the two. Figures 2 and 3 show the averages of these variables by U.S. regions in our sample period.

It is likely that the origination rates are affected by some macroeconomic factors which are captured by


Table 4: Estimation of State-Level Origination Rates

                                        Coefficient     SD
Contemporaneous Spatial Effect   λ      −0.05527∗∗∗     0.01426
Own Time Lag                     γ       0.68981∗∗∗     0.01263
Spatial Diffusion                ρ       0.05405∗∗∗     0.00998
Spatial Effect in Disturbances   α       0.12180∗∗∗     0.00808
House Price Deviation            β1     −0.00025        0.00055
House Price Volatility           β2      0.01346∗∗∗     0.00153
Deviation × Volatility           β3      0.03203∗∗∗     0.00847

Note: The sample size is n = 51 and T = 52. Bias corrected estimates are reported. ∗∗∗ p < 0.01, ∗∗ p < 0.05, ∗ p < 0.1. σ = 0.0063.

the time factors fyt . It is also likely that macroeconomic factors have different impacts in different states, as

captured by state-specific factor loadings. Note that the factor loadings may be spatially correlated, capturing

the residual spatial effects not directly modeled. The interactive effects include additive individual and time

effects as special cases, hence state time invariant variables and time variables invariant across states are not

included. The model consists of

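\[
\begin{aligned}
y_{it} &= \lambda\sum_{j=1}^{n}W_{n,ij}\,y_{jt} + \gamma y_{i,t-1} + \rho\sum_{j=1}^{n}W_{n,ij}\,y_{j,t-1} + \text{hpi\_dev}_{it}\beta_1 + \text{hpi\_v}_{it}\beta_2 + (\text{hpi\_dev}_{it}\times\text{hpi\_v}_{it})\beta_3 + \Gamma_{n,i}'f_t + u_{it}, \\
u_{it} &= \alpha\sum_{j=1}^{n}W_{n,ij}\,u_{jt} + \varepsilon_{it}.
\end{aligned}
\]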

The preliminary estimates use 10 factors. Both the eigenvalue ratio and the growth ratio criteria indicate 1

unobserved factor in yit . Table 4 reports the estimation results with one unobserved factor.

The results reveal interesting spatial patterns. Higher HECM activity in a state negatively influences

neighboring states (λ ). This is consistent with the hypothesis that lenders shift resources towards states with

higher activity, resulting in lower activity in the neighboring states. On the other hand, spatial diffusion (ρ)

and spatial error (α) have positive effects, reflecting spatially correlated demand and supply effects. The

own time lag (γ) has positive effect, capturing serially correlated effects. In addition, the results show that

states with high past house price volatilities and current house prices above long term averages have higher

origination rates, consistent with the findings of Haurin et al. (2014).

6. Conclusion

Dynamic spatial panels with interactive effects are of practical interest. Outcomes of spatial units are

correlated due to spatial interactions and common factors. The model under consideration has a rich spatial

structure, which includes contemporaneous spatial interaction, spatial diffusion and spatial disturbances.

Unobserved interactive effects of individual loadings and time factors account for additional cross sectional


dependence and may correlate with observed regressors. This paper shows that the QML method provides

consistent and asymptotically normal estimators. There are asymptotic biases arising from the predeter-

mined regressors and the interaction between the spatial effects and the factor loadings, but bias correction

is possible. The Monte Carlo study shows that the proposed bias correction is effective in reducing bias.

Consistency of estimators may still hold when the number of factors is overspecified. There are various

criteria which can determine the number of factors in a sample. An application of the model to reverse

mortgage originations reveals interesting spatial patterns.

7. References

Ahn, Seung C. and Alex R. Horenstein, “Eigenvalue Ratio Test for the Number of Factors,” Econometrica,

May 2013, 81 (3), 1203–1227.

Ahn, Seung C., Young H. Lee, and Peter Schmidt, “Panel Data Models with Multiple Time-Varying Individual Effects,” Journal of Econometrics, 2013, 174, 1–14.

Amemiya, Takeshi, Advanced Econometrics, Harvard University Press, 1985.

Andrews, Donald W. K., “Consistency in Nonlinear Econometric Models: A Generic Uniform Law of

Large Numbers,” Econometrica, Nov. 1987, 55 (6), 1465–1471.

Anselin, Luc, Spatial Econometrics: Methods and Models, Springer, 1988.

Bai, Jushan, “Inferential Theory for Factor Models of Large Dimensions,” Econometrica, Jan. 2003, 71

(1), 135–171.

Bai, Jushan, “Panel Data Models with Interactive Fixed Effects,” Econometrica, 2009, 77 (4), 1229–1279.

Bai, Jushan and Kunpeng Li, “Spatial Panel Data Models with Common Shocks,” Manuscript, Columbia University, 2014.

Bai, Jushan and Serena Ng, “Determining the Number of Factors in Approximate Factor Models,” Econometrica, Jan. 2002, 70 (1), 191–221.

Bai, Z.D. and Y.Q. Yin, “Limit of the Smallest Eigenvalue of a Large Dimensional Sample Covariance

Matrix,” The Annals of Probability, 1993, 21 (3), 1275–1294.

Bernstein, D.S., Matrix Mathematics: Theory, Facts, and Formulas, Princeton, NJ: Princeton University

Press, 2009.


Cliff, Andrew David and J. K. Ord, Spatial Autocorrelation, London: Pion, 1973.

Conley, Timothy G. and Bill Dupor, “A Spatial Analysis of Sectoral Complementarity,” Journal of Politi-

cal Economy, 2003, 111, 311–352.

Forni, Mario, Marc Hallin, Marco Lippi, and Lucrezia Reichlin, “The Generalized Dynamic Factor Model: Consistency and Rates,” Journal of Econometrics, 2004, 119, 231–255.

Han, Xiaoyi, “Bayesian Estimation of a Spatial Autoregressive Model with an Unobserved Endogenous

Spatial Weight Matrix and Unobserved Factors,” Manuscript, The Ohio State University, 2013.

Haurin, Donald, Chao Ma, Stephanie Moulton, Maximilian Schmeiser, Jason Seligman, and Wei Shi,

“Spatial Variation in Reverse Mortgage Usage: House Price Dynamics and Consumer Selection,” Journal

of Real Estate Finance and Economics, Forthcoming, 2014.

Kato, Tosio, Perturbation Theory for Linear Operators, Springer-Verlag, 1995.

Kelejian, Harry H. and Ingmar R. Prucha, “On the Asymptotic Distribution of the Moran I Test Statistic

with Applications,” Journal of Econometrics, 2001, 104, 219–257.

Kelejian, Harry H. and Ingmar R. Prucha, “Specification and Estimation of Spatial Autoregressive Models with Autoregressive and Heteroskedastic Disturbances,” Journal of Econometrics, 2010, 157, 53–67.

Latala, Rafal, “Some Estimates of Norms of Random Matrices,” Proceedings of the American Mathemati-

cal Society, 2005, 133 (5), 1273–1282.

Lee, Lung-Fei, “Asymptotic Distributions of Quasi-Maximum Likelihood Estimator for Spatial Autore-

gressive Models,” Econometrica, Nov. 2004, 72 (6), 1899–1925.

Lee, Lung-Fei, “Asymptotic Distributions of Quasi-Maximum Likelihood Estimator for Spatial Autoregressive Models, Online Appendix,” Nov. 2004.

Lee, Lung-Fei and Jihai Yu, “Some Recent Developments in Spatial Panel Data Models,” Regional Science and Urban Economics, 2010, 40, 255–271.

Lee, Lung-Fei and Jihai Yu, “A Spatial Dynamic Panel Data Model with Both Time and Individual Fixed Effects,” Econometric Theory, 2010, 26, 564–597.

Lee, Lung-Fei and Jihai Yu, “Identification of Spatial Durbin Panel Models,” Oct. 2013.

Lee, Lung-Fei and Jihai Yu, “Spatial Panel Data Models,” in The Oxford Handbooks: Panel Data, Oxford, England: Oxford University Press, 2013.

Lin, Xu and Lung-Fei Lee, “GMM Estimation of Spatial Autoregressive Models with Unknown Het-

eroskedasticity,” Journal of Econometrics, 2010, 157, 34–52.

Liu, Xiaodong and Lung-Fei Lee, “GMM Estimation of Social Interaction Models with Centrality,” Jour-

nal of Econometrics, 2010, 159, 99–115.

Lu, Xun and Liangjun Su, “Shrinkage Estimation of Dynamic Panel Data Models with Interactive Fixed

Effects,” Manuscript, Singapore Management University, 2015.

Moon, Hyungsik Roger and Martin Weidner, “Linear Regression for Panel with Unknown Number of

Factors as Interactive Fixed Effects,” Econometrica, July 2015, 83 (4), 1543–1579.

Moon, Hyungsik Roger and Martin Weidner, “Dynamic Linear Panel Regression Models with Interactive Fixed Effects,” Manuscript, University of Southern California, 2014.

Onatski, Alexei, “Determining the Number of Factors from Empirical Distribution of Eigenvalues,” Review of Economics and Statistics, 2010, 92, 1004–1016.

Ord, Keith, “Estimation Methods for Models of Spatial Interaction,” Journal of the American Statistical

Association, 1975, 70, 120–126.

Pesaran, M. Hashem, “Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error

Structure,” Econometrica, 2006, 74, 967–1012.

Pesaran, M. Hashem and Elisa Tosetti, “Large Panels with Common Factors and Spatial Correlation,” Journal of Econometrics, 2011, 161, 182–202.

Stock, James H. and Mark W. Watson, “Macroeconomic Forecasting Using Diffusion Indexes,” Journal

of Business and Economic Statistics, Apr. 2002, 20 (2), 147–162.

Su, Liangjun and Zhenlin Yang, “QML Estimation of Dynamic Panel Data Models with Spatial Errors,”

Journal of Econometrics, Forthcoming, 2014.

Wu, Chien-Fu, “Asymptotic Theory of Nonlinear Least Squares Estimation,” The Annals of Statistics, 1981,

9 (3), 501–513.


Yu, Jihai, Robert De Jong, and Lung-Fei Lee, “Quasi-maximum Likelihood Estimators for Spatial Dy-

namic Panel Data with Fixed Effects when Both n and T are Large,” Journal of Econometrics, 2008, 146,

118–134.

Zhang, Fuzhen, Matrix Theory: Basic Results and Techniques, Springer, 2011.


Appendix A. Some Matrix Algebra and Useful Lemmas

This section lists some results in matrix algebra used frequently throughout the paper. All matrices are

real. Lemma 1 on the uniform boundedness of some matrices is in Lee (2004b).

Lemma 1. Suppose that $\|W_n\|$ and $\|S_n^{-1}\|$ are uniformly bounded for some matrix norm $\|\cdot\|$, where $S_n = I_n - \lambda_0 W_n$. Then $\|S_n(\lambda)^{-1}\|$ is uniformly bounded for $|\lambda-\lambda_0| < \frac{1}{c^2}$, where $S_n(\lambda) = I_n - \lambda W_n$ and $c = \max\left(\limsup_n \|W_n\|,\ \limsup_n \|S_n^{-1}\|\right)$.

The following results on matrix norms are standard; see Bernstein (2009). Let $\mu_i(M)$ denote the $i$-th largest eigenvalue of a symmetric matrix $M$, so that $\mu_n(M) \le \mu_{n-1}(M) \le \cdots \le \mu_1(M)$.
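Lemma 1 can be illustrated numerically. The sketch below is only an illustration under assumptions that are not part of the paper: it uses a randomly generated row-normalized weights matrix, the maximum row sum norm, and the hypothetical helper names `row_normalized_weights` and `inverse_norm_and_bound`. It compares $\|S_n(\lambda)^{-1}\|_\infty$ with the bound obtained from writing $S_n(\lambda) = S_n\left(I_n - (\lambda-\lambda_0)S_n^{-1}W_n\right)$ and applying a Neumann series.

```python
import numpy as np

def row_normalized_weights(n, rng):
    """Illustrative row-normalized spatial weights matrix with zero diagonal."""
    W = rng.random((n, n))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)

def inverse_norm_and_bound(W, lam0, lam):
    """Return (||S_n(lam)^{-1}||_inf, Neumann-series bound) with S_n(lam) = I - lam * W."""
    n = W.shape[0]
    a = np.linalg.norm(np.linalg.inv(np.eye(n) - lam0 * W), np.inf)  # ||S_n^{-1}||
    w = np.linalg.norm(W, np.inf)                                     # ||W_n||
    c = max(a, w)
    if abs(lam - lam0) >= 1.0 / c**2:
        raise ValueError("lam is outside the neighborhood |lam - lam0| < 1/c^2")
    # S_n(lam) = S_n (I - (lam - lam0) S_n^{-1} W_n), hence
    # ||S_n(lam)^{-1}|| <= ||S_n^{-1}|| / (1 - |lam - lam0| ||S_n^{-1}|| ||W_n||).
    bound = a / (1.0 - abs(lam - lam0) * a * w)
    actual = np.linalg.norm(np.linalg.inv(np.eye(n) - lam * W), np.inf)
    return actual, bound
```

Because $|\lambda-\lambda_0| < 1/c^2$ implies $|\lambda-\lambda_0|\,\|S_n^{-1}\|\,\|W_n\| < 1$, the denominator in the bound is positive throughout the neighborhood of Lemma 1.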

Lemma 2. A is a n×m matrix, B is a m× p matrix, and C and D are n×n matrices. Then

1. $\|A\|_2 \le \|A\|_F \le \|A\|_2\,\operatorname{rank}(A)^{1/2}$; and $\|A\|_2^2 \le \|A\|_1\|A\|_\infty$;

2. ‖AB‖F ≤ ‖A‖F ‖B‖2 ≤ ‖A‖F ‖B‖F ; and |tr(AB)| ≤ ‖A‖F ‖B‖F , if n = p;

3. |tr(C)| ≤ rank(C)‖C‖, where ‖·‖ is an induced matrix norm;

4. µi(C)µ1(D)≥ µi(CD)≥ µi(C)µn(D), where C, D are n×n symmetric positive semidefinite matrices.
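The inequalities in Lemma 2 are easy to spot-check on random matrices. The following is a minimal sketch, not a proof; the function name `check_lemma2`, the matrix sizes, and the use of the spectral norm as the induced norm in item 3 are choices made here for illustration only.

```python
import numpy as np

def check_lemma2(n=6, m=4, seed=0, rtol=1e-9):
    """Evaluate the inequalities of Lemma 2 on random matrices; returns a dict of booleans."""
    rng = np.random.default_rng(seed)
    spec = lambda M: np.linalg.norm(M, 2)        # spectral norm
    frob = lambda M: np.linalg.norm(M, "fro")    # Frobenius norm
    leq = lambda x, y: bool(x <= y * (1 + rtol) + rtol)

    A = rng.standard_normal((n, m))
    B = rng.standard_normal((m, n))              # p = n, so tr(AB) is defined
    out = {}
    out["1a"] = leq(spec(A), frob(A)) and leq(frob(A), spec(A) * np.sqrt(np.linalg.matrix_rank(A)))
    out["1b"] = leq(spec(A) ** 2, np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    out["2a"] = leq(frob(A @ B), frob(A) * spec(B)) and leq(frob(A) * spec(B), frob(A) * frob(B))
    out["2b"] = leq(abs(np.trace(A @ B)), frob(A) * frob(B))

    # Item 3 with a rank-2 matrix and the spectral norm as the induced norm.
    C_low = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
    out["3"] = leq(abs(np.trace(C_low)), np.linalg.matrix_rank(C_low) * spec(C_low))

    # Item 4 with symmetric positive semidefinite C and D.
    X, Y = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    C, D = X @ X.T, Y @ Y.T
    mu_C = np.sort(np.linalg.eigvalsh(C))[::-1]
    mu_D = np.sort(np.linalg.eigvalsh(D))[::-1]
    mu_CD = np.sort(np.linalg.eigvals(C @ D).real)[::-1]  # eigenvalues of CD are real, nonnegative
    out["4"] = all(leq(lo, mid) and leq(mid, hi)
                   for lo, mid, hi in zip(mu_C * mu_D[-1], mu_CD, mu_C * mu_D[0]))
    return out
```

Calling `check_lemma2()` with different seeds returns one boolean per item; a `False` entry would indicate a counterexample (or a tolerance that is too tight).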

We have the following results on matrix norms for the n×T dimensional explanatory variables, the proofs of which are in the supplementary file.

Lemma 3. Under Assumptions E and R, $\|Z_k\|_2 = O_P(\sqrt{nT})$, $\|Z_k\|_F = O_P(\sqrt{nT})$, and $\operatorname{tr}\left((A_nZ_k)'\varepsilon\right) = O_P(\sqrt{nT})$ for $k = 1,\cdots,K+1$, where $A_n$ is an $n\times n$ nonstochastic UB matrix.

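Lemma 4. Define $\bar Z_k = E(Z_k\,|\,\mathcal{C}_{nT})$, where the conditioning set $\mathcal{C}_{nT}$ is the sigma algebra generated by $X_{n1},\cdots,X_{nT}$ and $\Gamma_n$ and $F_T$. Under Assumptions E, R and SF, we have

(1) $\left\|P_{\tilde\Gamma_n,F_T}\right\|_2 = O_P\left(\frac{1}{\sqrt{nT}}\right)$, where $P_{\tilde\Gamma_n,F_T} = \tilde\Gamma_n(\tilde\Gamma_n'\tilde\Gamma_n)^{-1}(F_T'F_T)^{-1}F_T'$; $\left\|P_{\tilde\Gamma_n}\varepsilon P_{F_T}\right\|_2 = O_P(1)$;

(2) For $k = 1,\cdots,K+1$, $\|Z_k-\bar Z_k\|_2 = O_P\left(\max(\sqrt n,\sqrt T)\right)$; and $\left\|\bar Z_k\varepsilon' P_{\tilde\Gamma_n}\right\|_2 = O_P(\max(n,T))$.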

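Lemma 5. Under Assumptions E, R and SF, $\frac{1}{\sqrt{nT}}\left[\operatorname{tr}\left(Q_n(Z_k-\bar Z_k)P_{F_T}\varepsilon'\right) - E\left(\operatorname{tr}\left(Q_n(Z_k-\bar Z_k)P_{F_T}\varepsilon'\right)\,|\,\mathcal{C}_{nT}\right)\right] = O_P\left(\frac{1}{\sqrt T}\right)$ for all $k = 1,\cdots,K+1$, where $Q_n$ is any $n\times n$ UB matrix.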

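Lemma 6. Under Assumptions E, R and SF, we have

(1) $\frac{1}{\sqrt{nT}}\left(\operatorname{tr}(P_{\tilde\Gamma_n}\varepsilon\varepsilon') - \sigma_0^2Tr_0\right) = O_P\left(\frac{1}{\sqrt n}\right)$.

(2) $\frac{1}{\sqrt{nT}}\left(\operatorname{tr}(P_{\tilde\Gamma_n}R_nG_nR_n^{-1}\varepsilon\varepsilon') - \sigma_0^2T\operatorname{tr}(P_{\tilde\Gamma_n}R_nG_nR_n^{-1})\right) = O_P\left(\frac{1}{\sqrt n}\right)$;
$\frac{1}{\sqrt{nT}}\left(\operatorname{tr}(R_nG_nR_n^{-1}P_{\tilde\Gamma_n}\varepsilon\varepsilon') - \sigma_0^2T\operatorname{tr}(P_{\tilde\Gamma_n}R_nG_nR_n^{-1})\right) = O_P\left(\frac{1}{\sqrt n}\right)$; and
$\frac{1}{\sqrt{nT}}\left(\operatorname{tr}(P_{\tilde\Gamma_n}R_nG_nR_n^{-1}P_{\tilde\Gamma_n}\varepsilon\varepsilon') - \sigma_0^2T\operatorname{tr}(P_{\tilde\Gamma_n}R_nG_nR_n^{-1})\right) = O_P\left(\frac{1}{\sqrt n}\right)$.

(3) $\frac{1}{\sqrt{nT}}\left(\operatorname{tr}(P_{\tilde\Gamma_n}\tilde G_n\varepsilon\varepsilon') - \sigma_0^2T\operatorname{tr}(P_{\tilde\Gamma_n}\tilde G_n)\right) = O_P\left(\frac{1}{\sqrt n}\right)$;
$\frac{1}{\sqrt{nT}}\left(\operatorname{tr}(\tilde G_nP_{\tilde\Gamma_n}\varepsilon\varepsilon') - \sigma_0^2T\operatorname{tr}(P_{\tilde\Gamma_n}\tilde G_n)\right) = O_P\left(\frac{1}{\sqrt n}\right)$; and
$\frac{1}{\sqrt{nT}}\left(\operatorname{tr}(P_{\tilde\Gamma_n}\tilde G_nP_{\tilde\Gamma_n}\varepsilon\varepsilon') - \sigma_0^2T\operatorname{tr}(P_{\tilde\Gamma_n}\tilde G_n)\right) = O_P\left(\frac{1}{\sqrt n}\right)$.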

Appendix B. Identification and Consistency: Proof of Propositions 1-3

Proof of Proposition 1.

Dropping constant terms, the expected objective function is equivalent to

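\[
Q_{nT}(\theta,\tilde\Gamma_n,F_T) = \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\left(E\,\frac{1}{nT}\operatorname{tr}\left[\left(R_n(\alpha)\Big(S_n(\lambda)Y-\sum_{k=1}^{K}Z_k\delta_k\Big)-\tilde\Gamma_nF_T'\right)\left(R_n(\alpha)\Big(S_n(\lambda)Y-\sum_{k=1}^{K}Z_k\delta_k\Big)-\tilde\Gamma_nF_T'\right)'\right]\right),
\]

with $\tilde\Gamma_n = R_n(\alpha)\Gamma_n$, $\tilde\Gamma_n\in\mathbb{R}^{n\times r}$, because no restriction is imposed on $\Gamma_n$ and $R_n(\alpha)$ is invertible for $\alpha\in\Theta_\alpha$. Denote $\tilde\Gamma_{0n} = R_n\Gamma_{0n}$. Thus, $Q_{nT}(\theta_0,\tilde\Gamma_{0n},F_{0T}) = \frac{1}{n}\log|S_nR_n| - \frac{1}{2}\log\sigma_0^2$. Substituting in the DGP of $Y$,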

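\[
R_n(\alpha)\Big(S_n(\lambda)Y-\sum_{k=1}^{K}Z_k\delta_k\Big)-\tilde\Gamma_nF_T' = R_n(\alpha)\Big(S_n(\lambda)S_n^{-1}\Gamma_{0n}F_{0T}' + S_n(\lambda)S_n^{-1}U + \sum_{k=1}^{K+1}Z_k(\theta_{0k}-\theta_k)\Big)-\tilde\Gamma_nF_T',
\]

where for simplicity, $\theta_k = \delta_k$ for $k = 1,\cdots,K$, $\theta_{K+1} = \lambda$, and $Z_{K+1} = G_n\sum_{k=1}^{K}Z_k\theta_{0k}$. Under Assumption E, as the idiosyncratic disturbances are independent from the interactive effects and regressors,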

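\[
\begin{aligned}
Q_{nT}(\theta,\tilde\Gamma_n,F_T)
&\le \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\Bigg(\frac{\sigma_0^2}{n}\operatorname{tr}\left(R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'\right)\\
&\qquad + E\,\frac{1}{nT}\operatorname{tr}\Big[M_{\tilde\Gamma_n}R_n(\alpha)\Big(S_n(\lambda)S_n^{-1}\Gamma_{0n}F_{0T}'+\sum_{k=1}^{K+1}Z_k(\theta_{0k}-\theta_k)\Big)\Big(S_n(\lambda)S_n^{-1}\Gamma_{0n}F_{0T}'+\sum_{k=1}^{K+1}Z_k(\theta_{0k}-\theta_k)\Big)'R_n(\alpha)'M_{\tilde\Gamma_n}\Big]\Bigg)\\
&\le \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\Bigg(\frac{\sigma_0^2}{n}\operatorname{tr}\left(R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'\right)\\
&\qquad + E\,\frac{1}{nT}\operatorname{tr}\Big(M_{\tilde\Gamma_n}R_n(\alpha)\Big(\sum_{k=1}^{K+1}Z_k(\theta_{0k}-\theta_k)\Big)M_{F_{0T}}\Big(\sum_{k=1}^{K+1}Z_k(\theta_{0k}-\theta_k)\Big)'R_n(\alpha)'M_{\tilde\Gamma_n}\Big)\Bigg)\\
&= \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\Bigg(\frac{\sigma_0^2}{n}\operatorname{tr}\left(R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'\right)\\
&\qquad + E\,\frac{1}{nT}\sum_{k_1,k_2=1}^{K+1}\operatorname{vec}(Z_{k_1})'\left(M_{F_{0T}}\otimes R_n(\alpha)'M_{\tilde\Gamma_n}R_n(\alpha)\right)\operatorname{vec}(Z_{k_2})(\theta_{0k_1}-\theta_{k_1})(\theta_{0k_2}-\theta_{k_2})\Bigg).
\end{aligned}
\]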

• Case 1: Assumption ID1 holds.


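If $\delta\ne\delta_0$ or $\lambda\ne\lambda_0$, because $E\,z'\left(M_{F_{0T}}\otimes R_n(\alpha)'M_{\tilde\Gamma_n}R_n(\alpha)\right)z$ with $z = \left(\operatorname{vec}(Z_1)\ \cdots\ \operatorname{vec}(Z_{K+1})\right)$ is positive definite,

\[
\begin{aligned}
Q_{nT}(\theta,\tilde\Gamma_n,F_T) &< \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\left(\frac{\sigma_0^2}{n}\operatorname{tr}\left(R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'\right)\right)\\
&\le \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\left(\sigma_0^2\left|R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'\right|^{\frac{1}{n}}\right)\\
&= -\frac{1}{2}\log\sigma_0^2 + \frac{1}{n}\log|S_nR_n| = Q_{nT}(\theta_0,\tilde\Gamma_{0n},F_{0T}).
\end{aligned}
\]

Therefore $\delta_0$ and $\lambda_0$ are identified. At $\delta=\delta_0$ and $\lambda=\lambda_0$,

\[
Q_{nT}(\theta,\tilde\Gamma_n,F_T) \le \frac{1}{n}\log|S_n| + \frac{1}{n}\log|R_n(\alpha)| - \frac{1}{2}\log\left(\sigma_0^2\left|R_n(\alpha)R_n^{-1}R_n^{-1\prime}R_n(\alpha)'\right|^{\frac{1}{n}}\right) = Q_{nT}(\theta_0,\tilde\Gamma_{0n},F_{0T}).
\]

If $\alpha\ne\alpha_0$, this inequality will be strict by Assumption ID1(2). Therefore $\alpha_0$ is identified.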

• Case 2: Assumption ID2 holds.

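Because $z'\left(M_{F_{0T}}\otimes R_n(\alpha)'M_{\tilde\Gamma_n}R_n(\alpha)\right)z$ with $z = \left(\operatorname{vec}(Z_1)\ \cdots\ \operatorname{vec}(Z_{K+1})\right)$ is positive semi-definite,

\[
Q_{nT}(\theta,\tilde\Gamma_n,F_T) \le \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\left(\sigma_0^2\left|R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'\right|^{\frac{1}{n}}\right) = -\frac{1}{2}\log\sigma_0^2 + \frac{1}{n}\log|S_nR_n| = Q_{nT}(\theta_0,\tilde\Gamma_{0n},F_{0T}).
\]

$\lambda$ and $\alpha$ are identified, because this inequality holds strictly for $\lambda\ne\lambda_0$ or $\alpha\ne\alpha_0$ under Assumption ID2(2). With $\lambda=\lambda_0$, $\delta$ is then identified, because this inequality is also strict for $\delta\ne\delta_0$, due to $E\left(z'\left(M_{F_{0T}}\otimes R_n(\alpha)'M_{\tilde\Gamma_n}R_n(\alpha)\right)z\right)$ with $z = \left(\operatorname{vec}(Z_1)\ \cdots\ \operatorname{vec}(Z_K)\right)$ being positive definite under Assumption ID2(1).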

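Therefore $\delta_0$, $\lambda_0$ and $\alpha_0$ are identified under either Assumption ID1 or ID2. With $\delta=\delta_0$, $\lambda=\lambda_0$, and $\alpha=\alpha_0$,

\[
Q_{nT}(\theta_0,\tilde\Gamma_n,F_T) = \frac{1}{n}\log|S_nR_n| - \frac{1}{2}\log\left(\sigma_0^2 + \frac{1}{nT}\operatorname{tr}\left[\left(\tilde\Gamma_{0n}F_{0T}'-\tilde\Gamma_nF_T'\right)\left(\tilde\Gamma_{0n}F_{0T}'-\tilde\Gamma_nF_T'\right)'\right]\right),
\]

which is strictly less than $Q_{nT}(\theta_0,\tilde\Gamma_{0n},F_{0T})$ unless $\tilde\Gamma_nF_T' = \tilde\Gamma_{0n}F_{0T}'$. As a result, the number of factors $r_0$ is identified from the rank of $\tilde\Gamma_{0n}F_{0T}' = R_n\Gamma_{0n}F_{0T}'$.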


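Instead of considering the positive eigenvalues of $\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'$, one may consider the relevant eigenvalues of $\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z)$, because these two matrices have the same nonzero eigenvalues, counting multiplicity.¹² There are $T$ eigenvalues of $\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z) = \frac{1}{nT}(I_T\otimes\eta')Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}(I_T\otimes\eta)$. Because $\eta\in B_{K+1}$, $(I_T\otimes\eta)'(I_T\otimes\eta) = \|\eta\|_2^2I_T = I_T$. By the Poincaré interlacing theorem, $\mu_{i+KT}\left(\frac{1}{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}\right) \le \mu_i\left(\frac{1}{nT}(I_T\otimes\eta')Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}(I_T\otimes\eta)\right)$, for $i = 1,\cdots,T$.¹³ Because $\frac{1}{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}$ is of dimension $(T+KT)\times(T+KT)$, its $T$ smallest eigenvalues (including multiplicity) are less than or equal to the $T$ corresponding eigenvalues of $\frac{1}{nT}(I_T\otimes\eta')Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}(I_T\otimes\eta)$ uniformly in $\eta\in B_{K+1}$. A way to see this is the following:

¹² See Theorem 2.8 in Zhang (2011).

1. For each $\eta$, let $n_\eta$ be the number of positive eigenvalues of $\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'$. Suppose $n_\eta\ge 2r+1$, then $\sum_{i=2r+1}^{n}\mu_i\left(\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'\right) = \sum_{i=2r+1}^{n_\eta}\mu_i\left(\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'\right)$.

2. There are $n_\eta$ positive eigenvalues of $\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z)$ and $\mu_i\left(\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z)\right) = \mu_i\left(\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'\right)$ for $i = 1,\cdots,n_\eta$. The remaining $T-n_\eta$ eigenvalues of $\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z)$ are zero. Note that it is necessary that $n_\eta\le\min\{n,T\}$.

3. Because $\mu_{i+KT}\left(\frac{1}{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}\right) \le \mu_i\left(\frac{1}{nT}(I_T\otimes\eta')Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}(I_T\otimes\eta)\right) = \mu_i\left(\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z)\right)$ for $i = 2r+1,\cdots,n_\eta$, it follows that

\[
\sum_{i=2r+1+KT}^{T+KT}\mu_i\left(\frac{1}{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}\right) \le \sum_{i=2r+1}^{T}\mu_i\left(\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z)\right) = \sum_{i=2r+1}^{n_\eta}\mu_i\left(\frac{1}{nT}(\eta\cdot Z)'R_n(\alpha)'R_n(\alpha)(\eta\cdot Z)\right).
\]

Because $\frac{1}{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}$ and $\frac{1}{nT}Z_{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)$ have the same positive eigenvalues, counting multiplicity,

\[
\sum_{i=2r+1+KT}^{T+KT}\mu_i\left(\frac{1}{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)Z_{nT}\right) = \sum_{i=2r+1+KT}^{n}\mu_i\left(\frac{1}{nT}Z_{nT}Z_{nT}'R_n(\alpha)'R_n(\alpha)\right) \ge \sum_{i=2r+1+KT}^{n}\mu_i\left(\frac{1}{nT}Z_{nT}Z_{nT}'\right)\mu_n\left(R_n(\alpha)'R_n(\alpha)\right),
\]

where the last inequality is due to Lemma 2(4). Therefore, a set of sufficient conditions for Assumption NC1(1) is $\sum_{i=2r+1+KT}^{n}\mu_i\left(\frac{1}{nT}Z_{nT}Z_{nT}'\right) > 0$ and $\mu_n\left(R_n(\alpha)'R_n(\alpha)\right) > 0$ wpa 1 as $n,T\to\infty$.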
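Two linear-algebra facts carry the argument above: $AA'$ and $A'A$ share their nonzero eigenvalues, and compressing a symmetric matrix with the orthonormal columns of $I_T\otimes\eta$ interlaces its eigenvalues. The sketch below checks both on random inputs; the function names and dimensions are illustrative assumptions, and the stacked regressor matrix is replaced by an arbitrary random matrix.

```python
import numpy as np

def desc_eigs(H):
    """Eigenvalues of a symmetric matrix in descending order."""
    return np.sort(np.linalg.eigvalsh(H))[::-1]

def shared_nonzero_eigs(A, tol=1e-8):
    """True if A A' and A' A have the same nonzero eigenvalues (counting multiplicity)."""
    k = min(A.shape)
    left, right = desc_eigs(A @ A.T), desc_eigs(A.T @ A)
    return bool(np.allclose(left[:k], right[:k], atol=tol)
                and np.all(np.abs(left[k:]) < tol) and np.all(np.abs(right[k:]) < tol))

def interlacing_holds(H, eta, T, tol=1e-8):
    """Check mu_{i+KT}(H) <= mu_i(Q'HQ) <= mu_i(H), i = 1..T, with Q = I_T kron eta."""
    Q = np.kron(np.eye(T), eta.reshape(-1, 1))   # (T(K+1)) x T; Q'Q = I_T when ||eta||_2 = 1
    mu_H, mu_sub = desc_eigs(H), desc_eigs(Q.T @ H @ Q)
    shift = H.shape[0] - T                       # equals K*T
    return bool(np.all(mu_H[shift:] <= mu_sub + tol) and np.all(mu_sub <= mu_H[:T] + tol))
```

The lower interlacing bound is the one used in step 3: the $T$ smallest eigenvalues of the large matrix bound the eigenvalues of the compressed matrix from below, uniformly over unit-norm $\eta$.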


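Maximizing Eq. (4) with respect to $\Gamma_n$ and $F_T$ is equivalent to

\[
\begin{aligned}
&\min_{\Gamma_n\in\mathbb{R}^{n\times r},\,F_T\in\mathbb{R}^{T\times r}}\frac{1}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)S_n^{-1}(Z_{nt}\delta_0+U_{nt})-Z_{nt}\delta+S_n(\lambda)S_n^{-1}\Gamma_{0n}f_{0t}-\Gamma_nf_t\right)'R_n(\alpha)'R_n(\alpha)\\
&\qquad\qquad\times\left(S_n(\lambda)S_n^{-1}(Z_{nt}\delta_0+U_{nt})-Z_{nt}\delta+S_n(\lambda)S_n^{-1}\Gamma_{0n}f_{0t}-\Gamma_nf_t\right)\\
&\ge \min_{\tilde\Gamma_n\in\mathbb{R}^{n\times 2r},\,F_T\in\mathbb{R}^{T\times 2r}}\frac{1}{nT}\sum_{t=1}^{T}\left[R_n(\alpha)\left(S_n(\lambda)S_n^{-1}(Z_{nt}\delta_0+U_{nt})-Z_{nt}\delta\right)-\tilde\Gamma_nf_t\right]'\left[R_n(\alpha)\left(S_n(\lambda)S_n^{-1}(Z_{nt}\delta_0+U_{nt})-Z_{nt}\delta\right)-\tilde\Gamma_nf_t\right]\\
&= \min_{\tilde\Gamma_n\in\mathbb{R}^{n\times 2r}}\frac{1}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)S_n^{-1}(Z_{nt}\delta_0+U_{nt})-Z_{nt}\delta\right)'R_n(\alpha)'M_{\tilde\Gamma_n}R_n(\alpha)\left(S_n(\lambda)S_n^{-1}(Z_{nt}\delta_0+U_{nt})-Z_{nt}\delta\right) \qquad\text{(B.1)}\\
&\ge \min_{\tilde\Gamma_n\in\mathbb{R}^{n\times 2r}}\operatorname{tr}\left(\frac{1}{nT}\sum_{t=1}^{T}R_n(\alpha)\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0-Z_{nt}\delta\right)\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0-Z_{nt}\delta\right)'R_n(\alpha)'M_{\tilde\Gamma_n}\right)\\
&\quad+\frac{2}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0-Z_{nt}\delta\right)'R_n(\alpha)'R_n(\alpha)\left(S_n(\lambda)S_n^{-1}U_{nt}\right)+\frac{1}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)S_n^{-1}U_{nt}\right)'R_n(\alpha)'R_n(\alpha)\left(S_n(\lambda)S_n^{-1}U_{nt}\right)\\
&\quad-\max_{\tilde\Gamma_n\in\mathbb{R}^{n\times 2r}}\operatorname{tr}\left(\frac{2}{nT}\sum_{t=1}^{T}R_n(\alpha)S_n(\lambda)S_n^{-1}U_{nt}\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0-Z_{nt}\delta\right)'R_n(\alpha)'P_{\tilde\Gamma_n}\right)\\
&\quad-\max_{\tilde\Gamma_n\in\mathbb{R}^{n\times 2r}}\operatorname{tr}\left(\frac{1}{nT}\sum_{t=1}^{T}R_n(\alpha)S_n(\lambda)S_n^{-1}U_{nt}\left(S_n(\lambda)S_n^{-1}U_{nt}\right)'R_n(\alpha)'P_{\tilde\Gamma_n}\right). \qquad\text{(B.2)}
\end{aligned}
\]

¹³ See Zhang (2011), Theorem 8.10 and p. 271.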

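The first inequality above follows because the value of the minimization problem can be no less than the case where we were also able to optimally choose $S_n(\lambda)S_n^{-1}\Gamma_{0n}$ and $F_{0T}$. Eq. (B.1) is obtained by concentrating out the factor $f_t$. Because there is no restriction on $\Gamma_n\in\mathbb{R}^{n\times 2r}$, optimization with respect to $R_n(\alpha)\Gamma_n$ is equivalent to optimization with respect to $\tilde\Gamma_n$ in Eq. (B.1). Now we examine the terms in Eq. (B.2) one by one. Denote $\eta = \left(\delta_0'-\delta',\ \lambda_0-\lambda\right)'$. Then

\[
\min_{\tilde\Gamma_n\in\mathbb{R}^{n\times 2r}}\operatorname{tr}\left(\frac{1}{nT}\sum_{t=1}^{T}R_n(\alpha)\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0-Z_{nt}\delta\right)\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0-Z_{nt}\delta\right)'R_n(\alpha)'M_{\tilde\Gamma_n}\right) = \sum_{i=2r+1}^{n}\mu_i\left(\frac{1}{nT}R_n(\alpha)(\eta\cdot Z)(\eta\cdot Z)'R_n(\alpha)'\right) \ge b\|\eta\|_1^2, \qquad\text{(B.3)}
\]

for some constant $b > 0$ wpa 1 as $n,T\to\infty$, by Assumption NC1. When $G_nZ_{nt}\delta_0 = Z_{nt}C$ for a constant vector $C$, then Assumption NC2 can be used in (B.3). In this case, $S_n(\lambda)S_n^{-1}Z_{nt}\delta_0-Z_{nt}\delta = Z_{nt}(\delta_0-\delta)+Z_{nt}C(\lambda_0-\lambda) = Z_{nt}\eta^*$, where $\eta^* = \delta_0-\delta+C(\lambda_0-\lambda)$. Denote $\eta^*\cdot Z^* = \sum_{k=1}^{K}\eta_k^*Z_k$; by Assumption NC2,

\[
\min_{\tilde\Gamma_n\in\mathbb{R}^{n\times 2r}}\operatorname{tr}\left(\frac{1}{nT}\sum_{t=1}^{T}R_n(\alpha)\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0-Z_{nt}\delta\right)\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0-Z_{nt}\delta\right)'R_n(\alpha)'M_{\tilde\Gamma_n}\right) = \sum_{i=2r+1}^{n}\mu_i\left(\frac{1}{nT}R_n(\alpha)(\eta^*\cdot Z^*)(\eta^*\cdot Z^*)'R_n(\alpha)'\right) \ge b\|\eta^*\|_1^2,
\]

for some constant $b\ge 0$ wpa 1 as $n,T\to\infty$. Consider the next term,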

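\[
\frac{2}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0-Z_{nt}\delta\right)'R_n(\alpha)'R_n(\alpha)\left(S_n(\lambda)S_n^{-1}U_{nt}\right) = \frac{2}{nT}\sum_{k=1}^{K+1}\eta_k\operatorname{tr}\left(\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)Z_k\right)'\varepsilon\right) = O_P\left(\frac{\|\eta\|_1}{\sqrt{nT}}\right), \qquad\text{(B.4)}
\]

by Lemma 3. For the next term, by using a law of large numbers for quadratic forms (Lemma 9 in Yu et al. (2008)),

\[
\frac{1}{nT}\sum_{t=1}^{T}\left(S_n(\lambda)S_n^{-1}U_{nt}\right)'R_n(\alpha)'R_n(\alpha)\left(S_n(\lambda)S_n^{-1}U_{nt}\right) = \frac{\sigma_0^2}{n}\operatorname{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right)+O_P\left(\frac{1}{\sqrt{nT}}\right). \qquad\text{(B.5)}
\]

From Lemma 2(3), for an $n\times n$ matrix $A$, $|\operatorname{tr}(A)|\le\operatorname{rank}(A)\|A\|$, where $\|\cdot\|$ is an induced matrix norm,

\[
\frac{2}{nT}\left|\max_{\tilde\Gamma_n\in\mathbb{R}^{n\times 2r}}\operatorname{tr}\left(\sum_{t=1}^{T}R_n(\alpha)\left(S_n(\lambda)S_n^{-1}U_{nt}\right)\left(S_n(\lambda)S_n^{-1}Z_{nt}\delta_0-Z_{nt}\delta\right)'R_n(\alpha)'P_{\tilde\Gamma_n}\right)\right| \le \frac{4r}{nT}\sum_{k=1}^{K+1}|\eta_k|\left\|R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\varepsilon Z_k'R_n(\alpha)'\right\|_2 = O_P\left(\frac{\|\eta\|_1}{\sqrt{\min(n,T)}}\right). \qquad\text{(B.6)}
\]

The last equality is due to the sub-multiplicative property of the $\|\cdot\|_2$ matrix norm, and that for a matrix $A$, $\|A\|_2^2\le\|A\|_1\|A\|_\infty$; furthermore, Assumptions E and R guarantee that $\|\varepsilon\|_2 = O_P\left(\sqrt{\max(n,T)}\right)$ and $\|Z_k\|_2 = O_P(\sqrt{nT})$. Similarly,

\[
\frac{1}{nT}\left|\max_{\tilde\Gamma_n\in\mathbb{R}^{n\times 2r}}\operatorname{tr}\left(\sum_{t=1}^{T}R_n(\alpha)\left(S_n(\lambda)S_n^{-1}U_{nt}\right)\left(S_n(\lambda)S_n^{-1}U_{nt}\right)'R_n(\alpha)'P_{\tilde\Gamma_n}\right)\right| = O_P\left(\frac{1}{\min(n,T)}\right). \qquad\text{(B.7)}
\]

Under Assumption NC1, by substituting Eqs. (B.3)-(B.7) into Eq. (B.2),

\[
Q_{nT}(\theta) \le \frac{1}{n}\log|S_n(\lambda)R_n(\alpha)| - \frac{1}{2}\log\left(b\|\eta\|_1^2+\frac{\sigma_0^2}{n}\operatorname{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right)+O_P\left(\frac{1}{\sqrt{\min(n,T)}}\right)\right) \equiv \bar Q_{nT}(\theta). \qquad\text{(B.8)}
\]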

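At $\theta=\theta_0$,

\[
\begin{aligned}
Q_{nT}(\theta_0) &= \frac{1}{n}\log|S_nR_n| - \frac{1}{2}\log\left(\min_{\Gamma_n\in\mathbb{R}^{n\times r},\,F_T\in\mathbb{R}^{T\times r}}\frac{1}{nT}\sum_{t=1}^{T}\left(\Gamma_{0n}f_{0t}+U_{nt}-\Gamma_nf_t\right)'R_n'R_n\left(\Gamma_{0n}f_{0t}+U_{nt}-\Gamma_nf_t\right)\right)\\
&\ge \frac{1}{n}\log|S_nR_n| - \frac{1}{2}\log\left(\frac{1}{nT}\sum_{t=1}^{T}\varepsilon_{nt}'\varepsilon_{nt}\right) = \frac{1}{n}\log|S_nR_n| - \frac{1}{2}\log\left(\sigma_0^2+O_P\left(\frac{1}{\sqrt{nT}}\right)\right) \equiv \tilde Q_{nT}(\theta_0). \qquad\text{(B.9)}
\end{aligned}
\]

Lemma 1 in Wu (1981) provides a criterion for consistency of $\hat\theta_{nT} = \arg\max_{\theta\in\Theta}Q_{nT}(\theta)$. To show that $\|\hat\theta_{nT}-\theta_0\|_1\overset{p}{\to}0$, it is sufficient to show that for all $\tau>0$, $\liminf_{n,T\to\infty}P\left(\inf_{\theta\in\Theta,\,\|\theta-\theta_0\|_1\ge\tau}\left(Q_{nT}(\theta_0)-Q_{nT}(\theta)\right)>0\right) = 1$. From Eqs. (B.8) and (B.9),

\[
\begin{aligned}
Q_{nT}(\theta_0)-Q_{nT}(\theta) &\ge \tilde Q_{nT}(\theta_0)-\bar Q_{nT}(\theta)\\
&= \frac{1}{2}\log\left(b\|\eta\|_1^2+\frac{\sigma_0^2}{n}\operatorname{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right)\right)\\
&\quad-\frac{1}{2}\log\left(\sigma_0^2\left|R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right|^{\frac{1}{n}}\right)+O_P\left(\frac{1}{\sqrt{\min(n,T)}}\right)\\
&\ge \frac{1}{2}\log\left(b\|\eta\|_1^2+\frac{\sigma_0^2}{n}\operatorname{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right)\right)\\
&\quad-\frac{1}{2}\log\left(\frac{\sigma_0^2}{n}\operatorname{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right)\right)+o_P(1), \qquad\text{(B.10)}
\end{aligned}
\]

because $\frac{1}{n}\operatorname{tr}\left(R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right) \ge \left|R_n^{-1\prime}S_n^{-1\prime}S_n(\lambda)'R_n(\alpha)'R_n(\alpha)S_n(\lambda)S_n^{-1}R_n^{-1}\right|^{\frac{1}{n}}$ by the inequality of arithmetic and geometric means. Under Assumption NC1, when $\|\delta-\delta_0\|_1\ge\tau$ or $|\lambda-\lambda_0|\ge\tau$, $b>0$ and $\liminf_{n,T\to\infty}P\left(\inf_{\theta\in\Theta,\,\|\theta-\theta_0\|_1\ge\tau}\left(Q_{nT}(\theta_0)-Q_{nT}(\theta)\right)>0\right) = 1$ holds. In addition, (B.10) is strict when $\alpha\ne\alpha_0$ even if $\lambda=\lambda_0$, as guaranteed by Assumption NC1(2). In either case, consistency follows from Lemma 1 of Wu (1981).

When Assumption NC2 holds instead of NC1, the inequality in (B.10) is strict when $\alpha\ne\alpha_0$ or $\lambda\ne\lambda_0$ from Assumption NC2(2), and even with $\lambda=\lambda_0$, $b\|\eta\|_1^2>0$ when $\|\delta-\delta_0\|_1\ge\tau$. Therefore consistency follows from $\liminf_{n,T\to\infty}P\left(\inf_{\theta\in\Theta,\,\|\theta-\theta_0\|_1\ge\tau}\left(Q_{nT}(\theta_0)-Q_{nT}(\theta)\right)>0\right) = 1$.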
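The consistency proof leans on two elementary facts: the minimum of $\operatorname{tr}(AM_{\Gamma})$ over $n\times k$ matrices $\Gamma$ equals the sum of the $n-k$ smallest eigenvalues of a symmetric positive semidefinite $A$ (used in Eq. (B.3) with $k=2r$), and $\frac{1}{n}\operatorname{tr}(B)\ge|B|^{1/n}$ for positive definite $B$ (used in Eq. (B.10)). The following sketch checks both numerically; all function names are hypothetical and the matrices are random stand-ins rather than the paper's quantities.

```python
import numpy as np

def annihilator_trace(A, Gamma):
    """tr(A M_Gamma) with M_Gamma = I - Gamma (Gamma'Gamma)^{-1} Gamma'."""
    P = Gamma @ np.linalg.solve(Gamma.T @ Gamma, Gamma.T)
    return float(np.trace(A @ (np.eye(A.shape[0]) - P)))

def smallest_eigs_sum(A, k):
    """Sum of the n - k smallest eigenvalues of the symmetric matrix A."""
    return float(np.linalg.eigvalsh(A)[: A.shape[0] - k].sum())  # eigvalsh sorts ascending

def top_eigvecs(A, k):
    """Eigenvectors belonging to the k largest eigenvalues of A."""
    return np.linalg.eigh(A)[1][:, -k:]

def am_gm_gap(B):
    """tr(B)/n - |B|^{1/n} for symmetric positive definite B; nonnegative by AM-GM."""
    n = B.shape[0]
    _, logdet = np.linalg.slogdet(B)
    return float(np.trace(B) / n - np.exp(logdet / n))
```

The minimum in the first fact is attained when $\Gamma$ spans the leading $k$ eigenvectors; the AM-GM gap is zero exactly when all eigenvalues of $B$ coincide, which is why strictness in (B.10) requires a separate assumption.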

Appendix C. Perturbation Theory and Series Expansion of the Concentrated Log Likelihood Function

Kato (1995) has a systematic presentation of perturbation theory. Moon and Weidner (2015) apply the

perturbation theory to a regression panel with common factors. Here we show specifically how to expand

LnT (θ0) itself and LnT (θ) around θ0. Firstly,

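\[
\begin{aligned}
R_n(\alpha)\Big(S_n(\lambda)Y-\sum_{k=1}^{K}Z_k\delta_k\Big)
&= \left(R_n+(\alpha_0-\alpha)W_n\right)\Big(\Gamma_nF_T'+R_n^{-1}\varepsilon+\sum_{k=1}^{K}Z_k(\delta_{0k}-\delta_k)+\left(Z_{K+1}+G_n\Gamma_nF_T'+G_nR_n^{-1}\varepsilon\right)(\lambda_0-\lambda)\Big)\\
&= \tilde\Gamma_nF_T'+\sum_{k=0}^{K+2}\xi_kV_k+\sum_{k_1=0,k_2=0}^{K+2}\xi_{k_1}\xi_{k_2}V_{k_1k_2}, \qquad\text{(C.1)}
\end{aligned}
\]

where $\tilde\Gamma_n = R_n\Gamma_n$, $\xi_0 = \frac{\|\varepsilon\|_2}{\sqrt{nT}}$, $\xi_k = \delta_{0k}-\delta_k$ for $k = 1,\cdots,K$, $\xi_{K+1} = \lambda_0-\lambda$, $\xi_{K+2} = \alpha_0-\alpha$, $V_0 = \frac{\sqrt{nT}\,\varepsilon}{\|\varepsilon\|_2}$, $V_k = R_nZ_k$ for $k = 1,\cdots,K$, $V_{K+1} = R_n\left(Z_{K+1}+G_nR_n^{-1}\tilde\Gamma_nF_T'\right)$, and $V_{K+2} = W_nR_n^{-1}\tilde\Gamma_nF_T'$; and also, $V_{k_1k_2} = R_nG_nR_n^{-1}\frac{\sqrt{nT}\,\varepsilon}{\|\varepsilon\|_2}$ for $k_1 = 0$, $k_2 = K+1$; $V_{k_1k_2} = \tilde G_n\frac{\sqrt{nT}\,\varepsilon}{\|\varepsilon\|_2}$ for $k_1 = 0$, $k_2 = K+2$; $V_{k_1k_2} = W_nZ_{k_1}$ for $k_1 = 1,\cdots,K$, $k_2 = K+2$; $V_{k_1k_2} = W_n\left(Z_{K+1}+G_nR_n^{-1}\tilde\Gamma_nF_T'+G_nR_n^{-1}\varepsilon\right)$ for $k_1 = K+1$, $k_2 = K+2$; and $V_{k_1k_2} = 0$ otherwise. All the $V_k$ and $V_{k_1k_2}$ above are $n\times T$ matrices. In Moon and Weidner (2015), for the NLS estimator of a factor panel regression, only the similar term $\Gamma_nF_T'+\sum_{k=0}^{K}\epsilon_kV_k$ appears. For our model, $\xi_{K+1}$, $V_{K+1}$, $\xi_{K+2}$, $V_{K+2}$ and $V_{k_1k_2}$ are extra terms due to the contemporaneous spatial lag and spatial disturbances. In addition, our objective function (Eq. (5)) has the Jacobian term $\log|S_n(\lambda)R_n(\alpha)|$.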
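The decomposition in Eq. (C.1) is an exact algebraic identity, so it can be verified on simulated data. The sketch below is an illustration under assumptions made here: the weights matrix, parameter values and dimensions are arbitrary, the term multiplying $\xi_0\xi_{K+2}$ is taken to be $W_nR_n^{-1}V_0$, and `c1_decomposition_gap` is a hypothetical helper name.

```python
import numpy as np

def c1_decomposition_gap(n=12, T=9, K=2, r=2, seed=0):
    """Max abs difference between the two sides of Eq. (C.1) on simulated data."""
    rng = np.random.default_rng(seed)
    W = rng.random((n, n)); np.fill_diagonal(W, 0.0); W /= W.sum(axis=1, keepdims=True)
    lam0, alpha0, lam, alpha = 0.3, 0.2, 0.35, 0.1
    I = np.eye(n)
    S0, R0 = I - lam0 * W, I - alpha0 * W
    G = W @ np.linalg.inv(S0)                  # G_n = W_n S_n^{-1}
    R0_inv = np.linalg.inv(R0)
    Z = [rng.standard_normal((n, T)) for _ in range(K)]
    delta0, delta = rng.standard_normal(K), rng.standard_normal(K)
    Gamma, F = rng.standard_normal((n, r)), rng.standard_normal((T, r))
    eps = rng.standard_normal((n, T))
    Zd0 = sum(d * Zk for d, Zk in zip(delta0, Z))
    ZK1 = G @ Zd0                               # Z_{K+1}
    common = Gamma @ F.T                        # Gamma_n F_T'
    Y = np.linalg.solve(S0, Zd0 + common + R0_inv @ eps)   # DGP with U = R_n^{-1} eps

    lhs = (I - alpha * W) @ ((I - lam * W) @ Y - sum(d * Zk for d, Zk in zip(delta, Z)))

    # Perturbation parameters xi_0, ..., xi_{K+2} and first-order terms V_k.
    xi0 = np.linalg.norm(eps, 2) / np.sqrt(n * T)
    xi = np.concatenate(([xi0], delta0 - delta, [lam0 - lam, alpha0 - alpha]))
    V0 = eps / xi0                              # sqrt(nT) * eps / ||eps||_2
    V = [V0] + [R0 @ Zk for Zk in Z] + [R0 @ (ZK1 + G @ common), W @ common]
    rhs = R0 @ common + sum(x * Vk for x, Vk in zip(xi, V))
    # Second-order terms V_{k1 k2}; all pairs not listed are zero.
    rhs = rhs + xi[0] * xi[K + 1] * (R0 @ G @ R0_inv @ V0)
    rhs = rhs + xi[0] * xi[K + 2] * (W @ R0_inv @ V0)
    for k in range(1, K + 1):
        rhs = rhs + xi[k] * xi[K + 2] * (W @ Z[k - 1])
    rhs = rhs + xi[K + 1] * xi[K + 2] * (W @ (ZK1 + G @ common + G @ R0_inv @ eps))
    return float(np.max(np.abs(lhs - rhs)))
```

A gap at the level of floating-point rounding indicates that the listed $V_k$ and $V_{k_1k_2}$ account for every term of the product.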

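For our spatial model, the part $L_{nT}(\theta)$ of the objective function without the Jacobian term is

\[
L_{nT}(\xi) = \sum_{i=r_0+1}^{n}\mu_i\Bigg(\frac{1}{nT}T^{(0)}+\frac{1}{nT}\sum_{k_1=0}^{K+2}\xi_{k_1}T^{(1)}_{k_1}+\frac{1}{nT}\sum_{k_1=0}^{K+2}\sum_{k_2=0}^{K+2}\xi_{k_1}\xi_{k_2}T^{(2)}_{k_1k_2}+\frac{1}{nT}\sum_{k_1=0}^{K+2}\sum_{k_2=0}^{K+2}\sum_{k_3=0}^{K+2}\xi_{k_1}\xi_{k_2}\xi_{k_3}T^{(3)}_{k_1k_2k_3}+\frac{1}{nT}\sum_{k_1=0}^{K+2}\sum_{k_2=0}^{K+2}\sum_{k_3=0}^{K+2}\sum_{k_4=0}^{K+2}\xi_{k_1}\xi_{k_2}\xi_{k_3}\xi_{k_4}T^{(4)}_{k_1k_2k_3k_4}\Bigg), \qquad\text{(C.2)}
\]

where $T^{(0)} = \tilde\Gamma_nF_T'F_T\tilde\Gamma_n'$, $T^{(1)}_{k_1} = V_{k_1}F_T\tilde\Gamma_n'+\tilde\Gamma_nF_T'V_{k_1}'$, $T^{(2)}_{k_1k_2} = V_{k_1k_2}F_T\tilde\Gamma_n'+\tilde\Gamma_nF_T'V_{k_1k_2}'+V_{k_1}V_{k_2}'$, $T^{(3)}_{k_1k_2k_3} = V_{k_1k_2}V_{k_3}'+V_{k_3}V_{k_1k_2}'$, and $T^{(4)}_{k_1k_2k_3k_4} = V_{k_1k_2}V_{k_3k_4}'$, with $k_j = 0,\cdots,K+2$ and $j = 1,2,3,4$. In our model, $T^{(0)}$ is the unperturbed operator and it has exactly $n-r_0$ zero eigenvalues and the remaining $r_0$ eigenvalues are strictly positive. Therefore, if there is no perturbation, i.e., $\xi_k = 0$ for $k = 0,\cdots,K+2$, $L_{nT}(\xi=0) = 0$. The objective is to expand $L_{nT}(\xi)$ around $L_{nT}(\xi=0) = 0$ in terms of $\xi$, $T^{(1)}$, $T^{(2)}$, $T^{(3)}$ and $T^{(4)}$ using the formula in Kato (1995). Define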

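\[
b_{nT} = \max\Bigg\{\sum_{k_1=0}^{K+2}|\xi_{k_1}|\left\|\frac{T^{(1)}_{k_1}}{nT}\right\|_2\frac{1}{d^2_{\max}(\tilde\Gamma_n,F_T)},\ \Bigg(\sum_{k_1,k_2=0}^{K+2}|\xi_{k_1}||\xi_{k_2}|\left\|\frac{T^{(2)}_{k_1k_2}}{nT}\right\|_2\Bigg)^{\frac12}\Bigg(\frac{1}{d^2_{\max}(\tilde\Gamma_n,F_T)}\Bigg)^{\frac12},\ \Bigg(\sum_{k_1,k_2,k_3=0}^{K+2}|\xi_{k_1}||\xi_{k_2}||\xi_{k_3}|\left\|\frac{T^{(3)}_{k_1k_2k_3}}{nT}\right\|_2\Bigg)^{\frac13}\Bigg(\frac{1}{d^2_{\max}(\tilde\Gamma_n,F_T)}\Bigg)^{\frac13},\ \Bigg(\sum_{k_1,k_2,k_3,k_4=0}^{K+2}|\xi_{k_1}||\xi_{k_2}||\xi_{k_3}||\xi_{k_4}|\left\|\frac{T^{(4)}_{k_1k_2k_3k_4}}{nT}\right\|_2\Bigg)^{\frac14}\Bigg(\frac{1}{d^2_{\max}(\tilde\Gamma_n,F_T)}\Bigg)^{\frac14}\Bigg\}. \qquad\text{(C.3)}
\]

Because $\left\|T^{(1)}_{k_1}\right\|_2 = O_P(nT)$, $\left\|T^{(2)}_{k_1k_2}\right\|_2 = O_P(nT)$, $\left\|T^{(3)}_{k_1k_2k_3}\right\|_2 = O_P(nT)$, $\left\|T^{(4)}_{k_1k_2k_3k_4}\right\|_2 = O_P(nT)$, and $d^2_{\max}(\tilde\Gamma_n,F_T)$ is converging to some positive constant, $b_{nT} = O_P(\|\xi\|_1)$. The following lemma provides an expansion of $L_{nT}(\xi)$ as a power series in $\xi$. The expansion is valid if $\xi$ is small such that the condition on $b_{nT}$ in Lemma 7 below holds. We list the expansion below for its use, but the detailed derivation of this lemma and the bound on the remainder can be found in the supplementary file.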

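Lemma 7. Under Assumption SF and assuming that $\frac{d^2_{\min}(\tilde\Gamma_n,F_T)}{16d^2_{\max}(\tilde\Gamma_n,F_T)}-b_{nT}>0$, $L_{nT}(\xi)$ has a convergent series expansion $L_{nT}(\xi) = \frac{1}{nT}\sum_{g=1}^{\infty}\sum_{k_1=0}^{K+2}\sum_{k_2=0}^{K+2}\cdots\sum_{k_g=0}^{K+2}\xi_{k_1}\xi_{k_2}\cdots\xi_{k_g}\operatorname{tr}\left(T^{(g)}_{k_1k_2\cdots k_g}\right)$, where

\[
T^{(g)}_{k_1k_2\cdots k_g} = -\sum_{p=\lceil g/4\rceil}^{g}(-1)^p\sum_{\substack{v_1+v_2+\cdots+v_p=g\\ m_1+\cdots+m_{p+1}=p-1\\ v_j=1,\cdots,4,\ m_j=0,1,\cdots}}S^{(m_1)}T^{(v_1)}_{k_1\cdots}S^{(m_2)}\cdots S^{(m_p)}T^{(v_p)}_{\cdots k_g}S^{(m_{p+1})}
\]

with $S^{(0)} = -M_{\tilde\Gamma_n}$, $S^{(m)} = S_0^m$ for $m\ge1$, where $S_0 = \tilde\Gamma_n(\tilde\Gamma_n'\tilde\Gamma_n)^{-1}(F_T'F_T)^{-1}(\tilde\Gamma_n'\tilde\Gamma_n)^{-1}\tilde\Gamma_n'$.¹⁴ For $g\ge5$,

\[
\frac{1}{nT}\sum_{k_1,\cdots,k_g=0}^{K+2}\left|\xi_{k_1}\cdots\xi_{k_g}\operatorname{tr}\left(T^{(g)}_{k_1k_2\cdots k_g}\right)\right| \le \frac{16r_0d^2_{\max}(\tilde\Gamma_n,F_T)d^2_{\min}(\tilde\Gamma_n,F_T)}{16d^2_{\max}(\tilde\Gamma_n,F_T)-d^2_{\min}(\tilde\Gamma_n,F_T)}\left(\frac{16d^2_{\max}(\tilde\Gamma_n,F_T)b_{nT}}{d^2_{\min}(\tilde\Gamma_n,F_T)}\right)^{g}.
\]

When the series expansion is truncated at an order $G\ge4$,

\[
\left|L_{nT}(\xi)-\frac{1}{nT}\sum_{g=1}^{G}\sum_{k_1=0}^{K+2}\sum_{k_2=0}^{K+2}\cdots\sum_{k_g=0}^{K+2}\xi_{k_1}\xi_{k_2}\cdots\xi_{k_g}\operatorname{tr}\left(T^{(g)}_{k_1k_2\cdots k_g}\right)\right| \le \frac{16r_0d^2_{\max}(\tilde\Gamma_n,F_T)d^2_{\min}(\tilde\Gamma_n,F_T)}{\left(16d^2_{\max}(\tilde\Gamma_n,F_T)-d^2_{\min}(\tilde\Gamma_n,F_T)\right)\left(1-\frac{16d^2_{\max}(\tilde\Gamma_n,F_T)b_{nT}}{d^2_{\min}(\tilde\Gamma_n,F_T)}\right)}\left(\frac{16d^2_{\max}(\tilde\Gamma_n,F_T)b_{nT}}{d^2_{\min}(\tilde\Gamma_n,F_T)}\right)^{G+1} = O_P\left(\|\xi\|_1^{G+1}\right).
\]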

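The condition on $b_{nT}$ can be satisfied wpa 1 for $n,T\to\infty$, because $b_{nT} = O_P(\|\xi\|_1)$, $|\xi_0| = O_P\left(\frac{1}{\sqrt{\min(n,T)}}\right)$, and $\|\hat\theta-\theta_0\|_1 = o_P(1)$ by Proposition 3. At $\theta_0$, $L_{nT}(\theta_0)$ has the perturbation $\xi_0 = \frac{\|\varepsilon\|_2}{\sqrt{nT}}$ while the other $\xi_k$, $k = 1,\cdots,K+2$, are zero. $L_{nT}(\theta_0)$ has an expansion in terms of $\xi_0$ as

\[
\begin{aligned}
L_{nT}(\theta_0) &= \frac{1}{nT}\sum_{i=r_0+1}^{n}\mu_i\left(\left(\tilde\Gamma_nF_T'+\varepsilon\right)\left(\tilde\Gamma_nF_T'+\varepsilon\right)'\right)\\
&= \frac{1}{nT}\operatorname{tr}\left(M_{\tilde\Gamma_n}\varepsilon M_{F_T}\varepsilon'\right)-\frac{2}{nT}\operatorname{tr}\left(M_{\tilde\Gamma_n}\varepsilon M_{F_T}\varepsilon'P_{\tilde\Gamma_n,F_T}\varepsilon'\right)+J_{nT}+O_P\left(\left(\frac{\|\varepsilon\|_2}{\sqrt{nT}}\right)^5\right) \qquad\text{(C.4)}\\
&= \sigma_0^2+O_P\left(\min(n,T)^{-1}\right), \qquad\text{(C.5)}
\end{aligned}
\]

where in Eq. (C.4), the series expansion is truncated at order 4 with

\[
\begin{aligned}
J_{nT} ={}& -\frac{1}{nT}\operatorname{tr}\left(S^{(0)}T^{(2)}_{00}S^{(1)}T^{(2)}_{00}S^{(0)}\right)+\frac{1}{nT}\operatorname{tr}\left(S^{(0)}T^{(2)}_{00}S^{(0)}T^{(1)}_{0}S^{(2)}T^{(1)}_{0}S^{(0)}\right)+\frac{1}{nT}\operatorname{tr}\left(S^{(0)}T^{(2)}_{00}S^{(1)}T^{(1)}_{0}S^{(1)}T^{(1)}_{0}S^{(0)}\right)\\
&+\frac{1}{nT}\operatorname{tr}\left(S^{(1)}T^{(1)}_{0}S^{(0)}T^{(2)}_{00}S^{(0)}T^{(1)}_{0}S^{(1)}\right)+\frac{1}{nT}\operatorname{tr}\left(S^{(0)}T^{(1)}_{0}S^{(1)}T^{(2)}_{00}S^{(1)}T^{(1)}_{0}S^{(0)}\right)+\frac{1}{nT}\operatorname{tr}\left(S^{(0)}T^{(1)}_{0}S^{(2)}T^{(1)}_{0}S^{(0)}T^{(2)}_{00}S^{(0)}\right)\\
&+\frac{1}{nT}\operatorname{tr}\left(S^{(0)}T^{(1)}_{0}S^{(1)}T^{(1)}_{0}S^{(1)}T^{(2)}_{00}S^{(0)}\right)-\frac{1}{nT}\operatorname{tr}\left(S^{(0)}T^{(1)}_{0}S^{(2)}T^{(1)}_{0}S^{(0)}T^{(1)}_{0}S^{(1)}T^{(1)}_{0}S^{(0)}\right)-\frac{1}{nT}\operatorname{tr}\left(S^{(0)}T^{(1)}_{0}S^{(1)}T^{(1)}_{0}S^{(0)}T^{(1)}_{0}S^{(2)}T^{(1)}_{0}S^{(0)}\right)\\
&-\frac{1}{nT}\operatorname{tr}\left(S^{(1)}T^{(1)}_{0}S^{(0)}T^{(1)}_{0}S^{(1)}T^{(1)}_{0}S^{(0)}T^{(1)}_{0}S^{(1)}\right)-\frac{1}{nT}\operatorname{tr}\left(S^{(0)}T^{(1)}_{0}S^{(1)}T^{(1)}_{0}S^{(1)}T^{(1)}_{0}S^{(1)}T^{(1)}_{0}S^{(0)}\right). \qquad\text{(C.6)}
\end{aligned}
\]

¹⁴ Notice that $v_1+\cdots+v_p = g$. When $v = 1$, $T^{(v)}_{k}$ has a single $k$ subscript; when $v = 2$, $T^{(v)}_{kk'}$ has two subscripts, $k$, $k'$. For example, when $g = 4$, and in a particular summand, $v_1 = 1$, $v_2 = 2$, $v_3 = 1$, a typical term in the summand is $S^{(m_1)}T^{(1)}_{k_1}S^{(m_2)}T^{(2)}_{k_2k_3}S^{(m_3)}T^{(1)}_{k_4}$.

It follows that $\frac{1}{L_{nT}(\theta_0)} = \frac{1}{\sigma_0^2}+O_P\left(\min(n,T)^{-1}\right)$.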
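Eq. (C.5) says that the average of the $n-r_0$ smallest eigenvalues of $(\tilde\Gamma_nF_T'+\varepsilon)(\tilde\Gamma_nF_T'+\varepsilon)'$, scaled by $\frac{1}{nT}$, is close to $\sigma_0^2$ in large panels. A small simulation makes this concrete. It is a sketch under assumptions not taken from the paper: loadings, factors and disturbances are i.i.d. standard normal draws, and `concentrated_variance` is a hypothetical name.

```python
import numpy as np

def concentrated_variance(n, T, r0, sigma0=1.0, seed=0):
    """(1/nT) * sum of the n - r0 smallest eigenvalues of (Gamma F' + eps)(Gamma F' + eps)'."""
    rng = np.random.default_rng(seed)
    Gamma = rng.standard_normal((n, r0))          # stands in for the transformed loadings
    F = rng.standard_normal((T, r0))
    eps = sigma0 * rng.standard_normal((n, T))
    X = Gamma @ F.T + eps
    mu = np.linalg.eigvalsh(X @ X.T)              # ascending order
    return float(mu[: n - r0].sum() / (n * T))
```

For example, `concentrated_variance(200, 200, 2)` should lie slightly below 1, the shortfall being of order $1/\min(n,T)$ because the leading $r_0$ eigenvalues absorb part of the noise.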

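For the derivation of the asymptotic distribution of the QMLE, it is sufficient to consider the series expansion of $L_{nT}(\theta)$ truncated at order $G = 4$. Define the following $(K+2)\times1$ vectors,

\[
C^{(1)} = \left(\frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_1M_{F_T}\varepsilon'\right),\ \cdots,\ \frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_KM_{F_T}\varepsilon'\right),\ \frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_{K+1}M_{F_T}\varepsilon'\right),\ 0\right)', \qquad\text{(C.7)}
\]

\[
C^{(2)} = \begin{pmatrix}
-\frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}\varepsilon M_{F_T}\varepsilon'P_{\tilde\Gamma_n,F_T}Z_1'R_n'\right)-\frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}\varepsilon M_{F_T}Z_1'R_n'P_{\tilde\Gamma_n,F_T}\varepsilon'\right)-\frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_1M_{F_T}\varepsilon'P_{\tilde\Gamma_n,F_T}\varepsilon'\right)\\
\vdots\\
-\frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}\varepsilon M_{F_T}\varepsilon'P_{\tilde\Gamma_n,F_T}Z_K'R_n'\right)-\frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}\varepsilon M_{F_T}Z_K'R_n'P_{\tilde\Gamma_n,F_T}\varepsilon'\right)-\frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_KM_{F_T}\varepsilon'P_{\tilde\Gamma_n,F_T}\varepsilon'\right)\\
-\frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}\varepsilon M_{F_T}\varepsilon'P_{\tilde\Gamma_n,F_T}Z_{K+1}'R_n'\right)-\frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}\varepsilon M_{F_T}Z_{K+1}'R_n'P_{\tilde\Gamma_n,F_T}\varepsilon'\right)-\frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_{K+1}M_{F_T}\varepsilon'P_{\tilde\Gamma_n,F_T}\varepsilon'\right)\\
0
\end{pmatrix}, \qquad\text{(C.8)}
\]

\[
C^{(3)} = \left(0,\ \cdots,\ 0,\ \frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nG_nR_n^{-1}M_{\tilde\Gamma_n}\varepsilon M_{F_T}\varepsilon'\right),\ \frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}\tilde G_nM_{\tilde\Gamma_n}\varepsilon M_{F_T}\varepsilon'\right)\right)', \qquad\text{(C.9)}
\]

where $P_{\tilde\Gamma_n,F_T} = \tilde\Gamma_n(\tilde\Gamma_n'\tilde\Gamma_n)^{-1}(F_T'F_T)^{-1}F_T'$. Define the following $(K+2)\times(K+2)$ matrices,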

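\[
C_1 = \begin{pmatrix}
\frac{1}{nT}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_1M_{F_T}Z_1'R_n'\right) & \cdots & \frac{1}{nT}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_1M_{F_T}Z_K'R_n'\right) & \frac{1}{nT}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_1M_{F_T}Z_{K+1}'R_n'\right) & 0\\
\vdots & & \vdots & \vdots & \vdots\\
\frac{1}{nT}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_KM_{F_T}Z_1'R_n'\right) & \cdots & \frac{1}{nT}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_KM_{F_T}Z_K'R_n'\right) & \frac{1}{nT}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_KM_{F_T}Z_{K+1}'R_n'\right) & 0\\
\frac{1}{nT}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_{K+1}M_{F_T}Z_1'R_n'\right) & \cdots & \frac{1}{nT}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_{K+1}M_{F_T}Z_K'R_n'\right) & \frac{1}{nT}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nZ_{K+1}M_{F_T}Z_{K+1}'R_n'\right) & 0\\
0 & \cdots & 0 & 0 & 0
\end{pmatrix},
\]

\[
C_2 = \begin{pmatrix}
0 & \cdots & 0 & 0 & 0\\
\vdots & & \vdots & \vdots & \vdots\\
0 & \cdots & 0 & 0 & 0\\
0 & \cdots & 0 & \frac{\sigma_0^2}{n}\operatorname{tr}\left(R_nG_nR_n^{-1}R_n^{-1\prime}G_n'R_n'\right) & \frac{\sigma_0^2}{n}\operatorname{tr}\left(G_n\tilde G_n\right)+\frac{\sigma_0^2}{n}\operatorname{tr}\left(R_nG_nR_n^{-1}\tilde G_n'\right)\\
0 & \cdots & 0 & \frac{\sigma_0^2}{n}\operatorname{tr}\left(G_n\tilde G_n\right)+\frac{\sigma_0^2}{n}\operatorname{tr}\left(R_nG_nR_n^{-1}\tilde G_n'\right) & \frac{\sigma_0^2}{n}\operatorname{tr}\left(\tilde G_n\tilde G_n'\right)
\end{pmatrix}, \quad\text{and}
\]

\[
C = C_1+C_2. \qquad\text{(C.10)}
\]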

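Lemma 8. Assume that $\frac{n}{T}\to\kappa^2>0$ and that Assumptions NC1 (or NC2), E, R and SF hold. Then with $\theta = \left(\delta',\ \lambda,\ \alpha\right)'$ in a small neighborhood of $\theta_0$,

\[
L_{nT}(\theta) = L_{nT}(\theta_0)-\frac{2}{\sqrt{nT}}(\theta-\theta_0)'\left(C^{(1)}+C^{(2)}+C^{(3)}\right)+(\theta-\theta_0)'C(\theta-\theta_0)+L^{rem}_{nT}(\theta),
\]

where $C^{(1)}$, $C^{(2)}$, $C^{(3)}$ and $C$ are defined in Eqs. (C.7)-(C.10), and the remainder term is $L^{rem}_{nT}(\theta) = O_P\left(\|\theta-\theta_0\|_1^3\right)+O_P\left(\|\theta-\theta_0\|_1^2(nT)^{-\frac14}\right)+O_P\left(\|\theta-\theta_0\|_1(nT)^{-\frac34}\right)+O_P\left((nT)^{-\frac54}\right)$.

Lemma 8 is an application of Lemma 7 by rearrangement.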

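Lemma 9. Under Assumptions E, R and SF, and assuming that $\frac{n}{T}\to\kappa>0$, we have $C = O_P(1)$; also for $k = 1,\cdots,K,K+1$,

\[
C^{(1)}_k = \frac{1}{\sqrt{nT}}\operatorname{tr}\left(\mathcal{Z}_k\varepsilon'\right)-\frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_n(Z_k-\bar Z_k)P_{F_T}\varepsilon'\right) = O_P(1),\qquad C^{(2)}_k = o_P(1),
\]

and

\[
C^{(3)} = \left(0,\ \cdots,\ 0,\ \frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nG_nR_n^{-1}M_{\tilde\Gamma_n}\varepsilon\varepsilon'\right)+O_P(1),\ \frac{1}{\sqrt{nT}}\operatorname{tr}\left(M_{\tilde\Gamma_n}\tilde G_nM_{\tilde\Gamma_n}\varepsilon\varepsilon'\right)+O_P(1)\right)' = \left(0,\ \cdots,\ 0,\ \sqrt{\tfrac{T}{n}}\operatorname{tr}(G_n)\sigma_0^2+O_P(1),\ \sqrt{\tfrac{T}{n}}\operatorname{tr}(\tilde G_n)\sigma_0^2+O_P(1)\right)'.
\]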

Lemma 10. Assume that nT → κ2 > 0 and Assumptions E, R and SF hold, then DnT −DnT = oP(1), where

DnT and DnT are respectively defined in Eq. (9) and Eq. (D.9).

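Lemma 11. Under the assumptions of Theorem 3, $\left\|M_{\hat{\tilde\Gamma}_n}-M_{\tilde\Gamma_n}\right\|_2 = \left\|P_{\hat{\tilde\Gamma}_n}-P_{\tilde\Gamma_n}\right\|_2 = O_P\left(\frac{1}{\sqrt n}\right)$, and $\left\|M_{\hat F_T}-M_{F_T}\right\|_2 = \left\|P_{\hat F_T}-P_{F_T}\right\|_2 = O_P\left(\frac{1}{\sqrt T}\right)$, where $M_{\hat{\tilde\Gamma}_n}$ and $M_{\hat F_T}$ are defined in Section 3.3.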

Appendix D. Asymptotic Distributions: Proof of Theorems 1-5

Proof of Theorem 1.

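Lemma 8 shows that, for $\theta$ close to $\theta_0$, $L_{nT}(\theta) = L_{nT}(\theta_0)-\frac{2}{\sqrt{nT}}(\theta-\theta_0)'\left(C^{(1)}+C^{(2)}+C^{(3)}\right)+(\theta-\theta_0)'C(\theta-\theta_0)+L^{rem}_{nT}(\theta)$, where $L^{rem}_{nT}(\theta) = O_P\left(\|\theta-\theta_0\|_1^3\right)+O_P\left(\|\theta-\theta_0\|_1^2(nT)^{-\frac14}\right)+O_P\left(\|\theta-\theta_0\|_1(nT)^{-\frac34}\right)+O_P\left((nT)^{-\frac54}\right)$. In the following, we provide concise expressions for those terms in the expansion of $Q_{nT}(\theta_{nT})$ around $\theta_0$, where $\theta_{nT}$ satisfies $\|\theta_{nT}-\theta_0\|_1 = o_P(1)$.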

Before proceeding further, let us examine the $(K+2)\times1$ vectors $C^{(1)}$, $C^{(2)}$, $C^{(3)}$ and the $(K+2)\times(K+2)$ matrix $C$ more closely. Notice that the $(K+1)$-th and $(K+2)$-th entries of $C^{(3)}$ satisfy $C^{(3)}_{K+1} = O_P(\sqrt{nT})$ and $C^{(3)}_{K+2} = O_P(\sqrt{nT})$, which are of a higher stochastic order than $C^{(1)}$ and $C^{(2)}$. However, as Theorem 1 shows, the higher order parts of $C^{(3)}$ will be canceled with terms from the log Jacobian determinant in the log likelihood function.

Recall that $\mathcal{Z}_k = M_{\tilde\Gamma_n}R_n\bar Z_kM_{F_T}+M_{\tilde\Gamma_n}R_n(Z_k-\bar Z_k)$ for $k = 1,\cdots,K+1$. Notice that $\mathcal{Z}_{k,it}$ is independent from $\varepsilon_{it}$. The concentrated likelihood function is

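\[
\begin{aligned}
Q_{nT}(\theta_{nT}) &= \frac{1}{n}\log\left|S_n(\lambda_{nT})R_n(\alpha_{nT})\right|-\frac{1}{2}\log L_{nT}(\theta_{nT})\\
&= \frac{1}{n}\log\left|S_n(\lambda_{nT})R_n(\alpha_{nT})\right|-\frac{1}{2}\log\left(L_{nT}(\theta_0)-\frac{2}{\sqrt{nT}}(\theta_{nT}-\theta_0)'\left(C^{(1)}+C^{(2)}+C^{(3)}\right)+(\theta_{nT}-\theta_0)'C(\theta_{nT}-\theta_0)+L^{rem}_{nT}(\theta_{nT})\right)\\
&= Q_{nT}(\theta_0)-\frac{1}{n}\operatorname{tr}(G_n)(\lambda_{nT}-\lambda_0)-\frac{1}{2n}\operatorname{tr}(G_n^2)(\lambda_{nT}-\lambda_0)^2+O\left(|\lambda_{nT}-\lambda_0|^3\right)\\
&\quad-\frac{1}{n}\operatorname{tr}(\tilde G_n)(\alpha_{nT}-\alpha_0)-\frac{1}{2n}\operatorname{tr}(\tilde G_n^2)(\alpha_{nT}-\alpha_0)^2+O\left(|\alpha_{nT}-\alpha_0|^3\right)\\
&\quad-\frac{1}{2}\log\Bigg(1+\underbrace{\Big(-\frac{2}{\sqrt{nT}}(\theta_{nT}-\theta_0)'\frac{C^{(1)}+C^{(2)}+C^{(3)}}{L_{nT}(\theta_0)}+(\theta_{nT}-\theta_0)'\frac{C}{L_{nT}(\theta_0)}(\theta_{nT}-\theta_0)+\frac{L^{rem}_{nT}(\theta_{nT})}{L_{nT}(\theta_0)}\Big)}_{x}\Bigg), \qquad\text{(D.1)}
\end{aligned}
\]

where the Taylor expansions of $\log|S_n(\lambda_{nT})|$ and $\log|R_n(\alpha_{nT})|$ are used. From Lemma 9 and by using $\log(1+x) = x-\frac{x^2}{2}+O(x^3)$ with

\[
x = -\frac{2}{nT\,L_{nT}(\theta_0)}\left[\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nG_nR_n^{-1}M_{\tilde\Gamma_n}\varepsilon\varepsilon'\right)(\lambda_{nT}-\lambda_0)+\operatorname{tr}\left(M_{\tilde\Gamma_n}\tilde G_nM_{\tilde\Gamma_n}\varepsilon\varepsilon'\right)(\alpha_{nT}-\alpha_0)\right]+O_P\left(\|\theta_{nT}-\theta_0\|_1(nT)^{-\frac12}\right)+O_P\left(\|\theta_{nT}-\theta_0\|_1^2(nT)^{-\frac14}\right)+O_P\left(\|\theta_{nT}-\theta_0\|_1^2\right)+O_P\left((nT)^{-\frac54}\right),
\]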

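\[
\begin{aligned}
Q_{nT}(\theta_{nT}) &= Q_{nT}(\theta_0)-\frac{1}{n}\operatorname{tr}(G_n)(\lambda_{nT}-\lambda_0)-\frac{1}{2n}\operatorname{tr}(G_n^2)(\lambda_{nT}-\lambda_0)^2-\frac{1}{n}\operatorname{tr}(\tilde G_n)(\alpha_{nT}-\alpha_0)-\frac{1}{2n}\operatorname{tr}(\tilde G_n^2)(\alpha_{nT}-\alpha_0)^2\\
&\quad+\frac{1}{\sqrt{nT}}(\theta_{nT}-\theta_0)'\frac{C^{(1)}+C^{(2)}+C^{(3)}}{L_{nT}(\theta_0)}-\frac{1}{2}(\theta_{nT}-\theta_0)'\frac{C}{L_{nT}(\theta_0)}(\theta_{nT}-\theta_0)\\
&\quad+\frac{1}{(nT)^2}(\lambda_{nT}-\lambda_0)^2\frac{\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nG_nR_n^{-1}M_{\tilde\Gamma_n}\varepsilon\varepsilon'\right)^2}{L_{nT}(\theta_0)^2}+\frac{1}{(nT)^2}(\alpha_{nT}-\alpha_0)^2\frac{\operatorname{tr}\left(M_{\tilde\Gamma_n}\tilde G_nM_{\tilde\Gamma_n}\varepsilon\varepsilon'\right)^2}{L_{nT}(\theta_0)^2}\\
&\quad+\frac{2}{(nT)^2}(\lambda_{nT}-\lambda_0)(\alpha_{nT}-\alpha_0)\frac{\operatorname{tr}\left(M_{\tilde\Gamma_n}R_nG_nR_n^{-1}M_{\tilde\Gamma_n}\varepsilon\varepsilon'\right)\operatorname{tr}\left(M_{\tilde\Gamma_n}\tilde G_nM_{\tilde\Gamma_n}\varepsilon\varepsilon'\right)}{L_{nT}(\theta_0)^2}\\
&\quad-\frac{L^{rem}_{nT}(\theta_{nT})}{2L_{nT}(\theta_0)}+O_P\left(\|\theta_{nT}-\theta_0\|_1^2(nT)^{-\frac12}\right)+O_P\left(\|\theta_{nT}-\theta_0\|_1^3(nT)^{-\frac14}\right)+O_P\left(\|\theta_{nT}-\theta_0\|_1^3\right)+O_P\left((nT)^{-\frac52}\right)\\
&= Q_{nT}(\theta_0)+\frac{1}{\sqrt{nT}}(\theta_{nT}-\theta_0)'\left(\frac{C^{(1)}+C^{(2)}+C^{(3)}}{L_{nT}(\theta_0)}-D^{(1)}\right)-\frac{1}{2}(\theta_{nT}-\theta_0)'D_{nT}(\theta_{nT}-\theta_0)+Q^{rem}_{nT}(\theta_{nT}), \qquad\text{(D.2)}
\end{aligned}
\]

where $D^{(1)} = \left(0,\ \cdots,\ 0,\ \sqrt{\tfrac{T}{n}}\operatorname{tr}(G_n),\ \sqrt{\tfrac{T}{n}}\operatorname{tr}(\tilde G_n)\right)'$ is a $(K+2)\times1$ vector and the $(K+2)\times(K+2)$ matrix $D_{nT}$ is the one in Eq. (9). The remainder term is

\[
\begin{aligned}
Q^{rem}_{nT}(\theta_{nT}) &= O_P\left((nT)^{-\frac54}\left(1+nT\|\theta_{nT}-\theta_0\|_1^2+2\sqrt{nT}\|\theta_{nT}-\theta_0\|_1\right)\right)+O_P\left(\|\theta_{nT}-\theta_0\|_1^3\right) \qquad\text{(D.3)}\\
&= o_P\left((nT)^{-1}\left(1+\sqrt{nT}\|\theta_{nT}-\theta_0\|_1\right)^2\right). \qquad\text{(D.4)}
\end{aligned}
\]

In Eq. (D.3), $O_P\left(\|\theta_{nT}-\theta_0\|_1^3\right) = o_P\left(\|\theta_{nT}-\theta_0\|_1^2\right)$, because $\|\theta_{nT}-\theta_0\|_1 = o_P(1)$.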
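The Taylor expansion of the log Jacobian used in Eq. (D.1), $\log|S_n(\lambda)| = \log|S_n|-\operatorname{tr}(G_n)(\lambda-\lambda_0)-\frac12\operatorname{tr}(G_n^2)(\lambda-\lambda_0)^2+O(|\lambda-\lambda_0|^3)$ with $G_n = W_nS_n^{-1}$, can be checked directly. The sketch below is illustrative only; the weights matrix is randomly generated and `logdet_taylor_error` is a hypothetical helper.

```python
import numpy as np

def logdet_taylor_error(W, lam0, lam):
    """(1/n) * [log|S_n(lam)| - second-order Taylor approximation around lam0]."""
    n = W.shape[0]
    S0 = np.eye(n) - lam0 * W
    G = W @ np.linalg.inv(S0)                       # G_n = W_n S_n^{-1}
    d = lam - lam0
    exact = np.linalg.slogdet(np.eye(n) - lam * W)[1]
    approx = np.linalg.slogdet(S0)[1] - np.trace(G) * d - 0.5 * np.trace(G @ G) * d**2
    return float((exact - approx) / n)
```

The same computation with $R_n(\alpha) = I_n-\alpha W_n$ in place of $S_n(\lambda)$ gives the expansion of $\log|R_n(\alpha)|$; the approximation error shrinks at a cubic rate as $\lambda$ approaches $\lambda_0$.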

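Lemma 9 shows that $\frac{C^{(1)}+C^{(2)}+C^{(3)}}{L_{nT}(\theta_0)}-D^{(1)} = O_P(1)$. Define $\gamma = \frac{1}{\sqrt{nT}}D_{nT}^{-1}\left(\frac{C^{(1)}+C^{(2)}+C^{(3)}}{L_{nT}(\theta_0)}-D^{(1)}\right)$, which is $O_P\left(\frac{1}{\sqrt{nT}}\right)$. The rest of the proof is similar to Corollary 4.3 of Moon and Weidner (2015). Completing the squares in Eq. (D.2), $Q_{nT}(\theta_{nT}) = Q_{nT}(\theta_0)-\frac12(\theta_{nT}-\theta_0-\gamma)'D_{nT}(\theta_{nT}-\theta_0-\gamma)+\frac12\gamma'D_{nT}\gamma+Q^{rem}_{nT}(\theta_{nT})$. Consider the following two cases, $\theta_{nT} = \hat\theta_{nT} = \arg\max Q_{nT}(\theta)$ and $\theta_{nT} = \theta_0+\gamma$. Notice that both $\hat\theta_{nT}$ and $\theta_0+\gamma$ satisfy the condition that $\|\theta_{nT}-\theta_0\|_1 = o_P(1)$. As $Q_{nT}(\hat\theta_{nT}) = Q_{nT}(\theta_0)-\frac12(\hat\theta_{nT}-\theta_0-\gamma)'D_{nT}(\hat\theta_{nT}-\theta_0-\gamma)+\frac12\gamma'D_{nT}\gamma+Q^{rem}_{nT}(\hat\theta_{nT})$, $Q_{nT}(\theta_0+\gamma) = Q_{nT}(\theta_0)+\frac12\gamma'D_{nT}\gamma+Q^{rem}_{nT}(\theta_0+\gamma)$, and $Q_{nT}(\hat\theta_{nT})\ge Q_{nT}(\theta_0+\gamma)$, it follows that $(\hat\theta_{nT}-\theta_0-\gamma)'D_{nT}(\hat\theta_{nT}-\theta_0-\gamma)\le 2Q^{rem}_{nT}(\hat\theta_{nT})-2Q^{rem}_{nT}(\theta_0+\gamma)$. Using Eq. (D.4), $Q^{rem}_{nT}(\hat\theta_{nT})-Q^{rem}_{nT}(\theta_0+\gamma)\le o_P\left((nT)^{-1}\left(\left(1+\sqrt{nT}\|\hat\theta_{nT}-\theta_0\|_1\right)+\left(1+\sqrt{nT}\|\gamma\|_1\right)\right)^2\right)$. Because $D_{nT}$ is assumed to be positive definite,

\[
\sqrt{nT}\,\|\hat\theta_{nT}-\theta_0-\gamma\|_1 \le o_P\left(1+\sqrt{nT}\|\hat\theta_{nT}-\theta_0\|_1\right)+o_P\left(1+\sqrt{nT}\|\gamma\|_1\right) = o_P\left(1+\sqrt{nT}\|\hat\theta_{nT}-\theta_0-\gamma\|_1\right)+o_P\left(1+\sqrt{nT}\|\gamma\|_1\right).
\]

Because $\gamma = O_P\left(\frac{1}{\sqrt{nT}}\right)$, $o_P\left(1+\sqrt{nT}\|\gamma\|_1\right) = o_P(1)$; therefore $\sqrt{nT}\,\|\hat\theta_{nT}-\theta_0-\gamma\|_1 = o_P(1)$, and $\sqrt{nT}(\hat\theta_{nT}-\theta_0)-\sqrt{nT}\gamma = o_P(1)$, which implies $\sqrt{nT}(\hat\theta_{nT}-\theta_0) = D_{nT}^{-1}\left(\frac{C^{(1)}+C^{(2)}+C^{(3)}}{L_{nT}(\theta_0)}-D^{(1)}\right)+o_P(1)$. Because $L_{nT}(\theta_0) = \sigma_0^2+O_P\left(\frac1n\right)$,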

√nT (θnT −θ0)

=D−1nT

(C(1)+C(2)+C(3)

LnT (θ0)−D(1)

)+oP (1)

=(LnT (θ0)DnT )−1(

C(1)+C(2)+C(3)−D(1)LnT (θ0))+oP(1)

=(LnT (θ0)DnT )−1(

C(1)+C(2)+C(3)−D(1)σ

20

)+(LnT (θ0)DnT )

−1 D(1) (σ

20 −LnT (θ0)

)+oP(1)

=(σ

20 DnT

)−1(

C(1)+C(2)+C(3)−D(1)σ

20

)+(σ

20 DnT

)−1D(1)

(σ

20 −

1nT

tr(εε′)+ 1

nTtr(P

Γnεε′)+ 1

nTtr(εPFT ε

′)+OP(n−32 )

)+oP(1)

=(σ

20 DnT

)−1

1√nT ∑

ni=1 ∑

Tt=1Z1,itεit − 1√

nTtr(M

ΓnRn (Z1− Z1)PFT ε ′

)1√nT ∑

ni=1 ∑

Tt=1Z2,itεit − 1√

nTtr(M


)...

1√nT ∑

ni=1 ∑

Tt=1ZK,itεit

1√nT ∑

ni=1 ∑

Tt=1ZK+1,itεit − 1√

nTtr(M

ΓnRn (ZK+1− ZK+1)PFT ε ′

)0


+(σ

20 DnT

)−1

0...

01√nT

tr(M

ΓnRnGnR−1

n MΓn

εε ′)− 1√

nTtr(M

ΓnRnGnR−1

n MΓn

εPFT ε ′)−√

Tn tr(Gn)σ2

0

1√nT

tr(M

ΓnGnM

Γnεε ′)− 1√

nTtr(M

ΓnGnM

ΓnεPFT ε ′

)−√

Tn tr(Gn)

σ20

+(σ

20 DnT

)−1

0...

01n tr(Gn)

(√nT σ2

0 − 1√nT

tr(εε ′))+ 1

n tr(Gn)1√nT

tr(P

Γnεε ′)+ 1

n tr(Gn)1√nT

tr(εPFT ε ′)

1n tr(Gn)(√

nT σ20 − 1√

nTtr(εε ′)

)+ 1

n tr(Gn) 1√

nTtr(P

Γnεε ′)+ 1

n tr(Gn) 1√

nTtr(εPFT ε ′)

+oP(1),

(D.5)

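where the results of Lemma 9 are used. Eq. (D.5) can be further simplified. Notice that

$$\frac{1}{n}\operatorname{tr}(G_n)\operatorname{tr}(\varepsilon P_{F_T}\varepsilon') - \operatorname{tr}\!\left(M_{\Gamma_n} R_n G_n R_n^{-1} M_{\Gamma_n}\varepsilon P_{F_T}\varepsilon'\right) = \operatorname{vec}(\varepsilon)'\,\mathcal{M}\,\operatorname{vec}(\varepsilon),$$

where $\mathcal{M} = P_{F_T}\otimes\frac{1}{2}\left(\frac{2}{n}\operatorname{tr}(G_n) I_n - M_{\Gamma_n} R_n G_n R_n^{-1} M_{\Gamma_n} - M_{\Gamma_n} R_n^{-1\prime} G_n' R_n' M_{\Gamma_n}\right)$. $\mathcal{M}$ is symmetric, $\operatorname{tr}(\mathcal{M}) = O(1)$, and $\operatorname{tr}(\mathcal{M}^2) = O(n)$. We have

$$E\left(\operatorname{vec}(\varepsilon)'\mathcal{M}\operatorname{vec}(\varepsilon)\right)^2 = \left(\mu^{(4)}-3\sigma_0^4\right)\sum_{i=1}^{nT}[\mathcal{M}]_{ii}^2 + 2\sigma_0^4\operatorname{tr}(\mathcal{M}^2) + \left(\sigma_0^2\operatorname{tr}(\mathcal{M})\right)^2 = O(n).$$

Therefore $\frac{1}{\sqrt{nT}}\left[\frac{1}{n}\operatorname{tr}(G_n)\operatorname{tr}(\varepsilon P_{F_T}\varepsilon') - \operatorname{tr}\!\left(M_{\Gamma_n} R_n G_n R_n^{-1} M_{\Gamma_n}\varepsilon P_{F_T}\varepsilon'\right)\right] = O_P\!\left(\frac{1}{\sqrt{T}}\right)$. Similarly, we have $\frac{1}{\sqrt{nT}}\left[\frac{1}{n}\operatorname{tr}(\tilde G_n)\operatorname{tr}(\varepsilon P_{F_T}\varepsilon') - \operatorname{tr}\!\left(M_{\Gamma_n}\tilde G_n M_{\Gamma_n}\varepsilon P_{F_T}\varepsilon'\right)\right] = O_P\!\left(\frac{1}{\sqrt{T}}\right)$. Putting the above two terms into the remainder, Eq. (D.5) becomes,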

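$$
\begin{aligned}
\sqrt{nT}\left(\hat\theta_{nT}-\theta_0-\left(\sigma_0^2 D_{nT}\right)^{-1}\frac{1}{\sqrt{nT}}\Delta_{nT}\right)
&= \left(\sigma_0^2 D_{nT}\right)^{-1}
\begin{pmatrix}
\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\sum_{t=1}^{T} Z_{1,it}\varepsilon_{it} \\
\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\sum_{t=1}^{T} Z_{2,it}\varepsilon_{it} \\
\vdots \\
\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\sum_{t=1}^{T} Z_{K,it}\varepsilon_{it} \\
\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\sum_{t=1}^{T} Z_{K+1,it}\varepsilon_{it} \\
0
\end{pmatrix} \\
&\quad + \left(\sigma_0^2 D_{nT}\right)^{-1}
\begin{pmatrix}
0 \\ \vdots \\ 0 \\
\frac{1}{\sqrt{nT}}\operatorname{tr}\!\left(R_n G_n R_n^{-1}\varepsilon\varepsilon'\right) - \frac{1}{\sqrt{nT}}\operatorname{tr}(\varepsilon\varepsilon')\frac{1}{n}\operatorname{tr}(G_n) \\
\frac{1}{\sqrt{nT}}\operatorname{tr}\!\left(\tilde G_n\varepsilon\varepsilon'\right) - \frac{1}{\sqrt{nT}}\operatorname{tr}(\varepsilon\varepsilon')\frac{1}{n}\operatorname{tr}(\tilde G_n)
\end{pmatrix} + o_P(1),
\end{aligned}
\tag{D.6}
$$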

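where

$$
\Delta_{nT} = \frac{1}{\sqrt{nT}}
\begin{pmatrix}
-\operatorname{tr}\!\left(M_{\Gamma_n} R_n (Z_1-\bar Z_1) P_{F_T}\varepsilon'\right) \\
-\operatorname{tr}\!\left(M_{\Gamma_n} R_n (Z_2-\bar Z_2) P_{F_T}\varepsilon'\right) \\
0 \\ \vdots \\ 0 \\
-\operatorname{tr}\!\left(M_{\Gamma_n} R_n (Z_{K+1}-\bar Z_{K+1}) P_{F_T}\varepsilon'\right) \\
0
\end{pmatrix}
\tag{D.7}
$$

$$
\quad + \frac{1}{\sqrt{nT}}
\begin{pmatrix}
0 \\ 0 \\ 0 \\ \vdots \\ 0 \\
-\operatorname{tr}\!\left(P_{\Gamma_n} R_n G_n R_n^{-1}\varepsilon\varepsilon'\right) - \operatorname{tr}\!\left(R_n G_n R_n^{-1} P_{\Gamma_n}\varepsilon\varepsilon'\right) + \operatorname{tr}\!\left(P_{\Gamma_n} R_n G_n R_n^{-1} P_{\Gamma_n}\varepsilon\varepsilon'\right) + \frac{1}{n}\operatorname{tr}(G_n)\operatorname{tr}(P_{\Gamma_n}\varepsilon\varepsilon') \\
-\operatorname{tr}\!\left(P_{\Gamma_n}\tilde G_n\varepsilon\varepsilon'\right) - \operatorname{tr}\!\left(\tilde G_n P_{\Gamma_n}\varepsilon\varepsilon'\right) + \operatorname{tr}\!\left(P_{\Gamma_n}\tilde G_n P_{\Gamma_n}\varepsilon\varepsilon'\right) + \frac{1}{n}\operatorname{tr}(\tilde G_n)\operatorname{tr}(P_{\Gamma_n}\varepsilon\varepsilon')
\end{pmatrix}.
\tag{D.8}
$$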

From (D.6) it is clear that the limiting distribution is not centered but has the bias $\left(\sigma_0^2 D_{nT}\right)^{-1}\frac{1}{\sqrt{nT}}\Delta_{nT}$. The bias arises from the predetermined regressors (D.7) and from the interactions between the spatial effects and the factor loadings (D.8). The bias terms would disappear if all regressors were strictly exogenous and no spatial effects were present.
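The second-moment formula for the quadratic form $\operatorname{vec}(\varepsilon)'\mathcal{M}\operatorname{vec}(\varepsilon)$ used in simplifying Eq. (D.5) can be checked numerically. The following is a minimal sketch and not part of the paper's argument: it assumes i.i.d. mean-zero disturbances drawn from a hypothetical three-point distribution, and a small symmetric matrix `M` chosen only for illustration. It compares exact enumeration of $E(e'Me)^2$ with the closed form $(\mu^{(4)}-3\sigma^4)\sum_i M_{ii}^2 + 2\sigma^4\operatorname{tr}(M^2) + (\sigma^2\operatorname{tr}(M))^2$.

```python
from itertools import product


def second_moment_exact(M, values, probs):
    """E[(e' M e)^2] by enumerating every outcome of an i.i.d. discrete vector e."""
    n = len(M)
    total = 0.0
    for idx in product(range(len(values)), repeat=n):
        e = [values[k] for k in idx]
        p = 1.0
        for k in idx:
            p *= probs[k]
        q = sum(e[i] * M[i][j] * e[j] for i in range(n) for j in range(n))
        total += p * q * q
    return total


def second_moment_closed_form(M, sigma2, mu4):
    """Closed form for symmetric M and i.i.d. mean-zero e with variance sigma2."""
    n = len(M)
    diag_sq = sum(M[i][i] ** 2 for i in range(n))
    tr_M = sum(M[i][i] for i in range(n))
    tr_M2 = sum(M[i][j] * M[j][i] for i in range(n) for j in range(n))
    return (mu4 - 3 * sigma2 ** 2) * diag_sq + 2 * sigma2 ** 2 * tr_M2 + (sigma2 * tr_M) ** 2


# Illustrative (hypothetical) inputs: a symmetric 3x3 matrix and a three-point law.
M = [[1.0, 0.5, -0.2],
     [0.5, 2.0, 0.3],
     [-0.2, 0.3, -1.0]]
values = [-2.0, 0.0, 2.0]
probs = [0.125, 0.75, 0.125]

sigma2 = sum(p * v ** 2 for p, v in zip(probs, values))  # second moment
mu4 = sum(p * v ** 4 for p, v in zip(probs, values))     # fourth moment

exact = second_moment_exact(M, values, probs)
closed = second_moment_closed_form(M, sigma2, mu4)
print(exact, closed)
```

The third moment does not enter the formula, which is why a symmetric distribution is sufficient for this check.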

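Applying Lemmas 5 and 6, $\Delta_{nT} = \varphi_{nT} + O_P\!\left(\frac{1}{\sqrt{n}}\right)$, where

$$
\varphi_{nT} =
\begin{pmatrix}
-\frac{\sigma_0^2}{\sqrt{nT}}\sum_{h=1}^{T-1}\operatorname{tr}\!\left(J_0 P_{F_T} J_h'\right)\operatorname{tr}\!\left(A_n^{h-1} S_n^{-1}\right) \\
-\frac{\sigma_0^2}{\sqrt{nT}}\sum_{h=1}^{T-1}\operatorname{tr}\!\left(J_0 P_{F_T} J_h'\right)\operatorname{tr}\!\left(W_n A_n^{h-1} S_n^{-1}\right) \\
0 \\ \vdots \\ 0 \\
-\frac{\sigma_0^2}{\sqrt{nT}}\sum_{h=1}^{T-1}\operatorname{tr}\!\left(J_0 P_{F_T} J_h'\right)\operatorname{tr}\!\left((\gamma G_n + \rho G_n W_n) A_n^{h-1} S_n^{-1}\right) + \sqrt{\frac{T}{n}}\,\sigma_0^2\left(\frac{r_0}{n}\operatorname{tr}(G_n) - \operatorname{tr}\!\left(P_{\Gamma_n} R_n G_n R_n^{-1}\right)\right) \\
\sqrt{\frac{T}{n}}\,\sigma_0^2\left(\frac{r_0}{n}\operatorname{tr}(\tilde G_n) - \operatorname{tr}\!\left(P_{\Gamma_n}\tilde G_n\right)\right)
\end{pmatrix},
$$

$J_h = \left(0_{T\times(T-h)}, I_{T\times T}, 0_{T\times h}\right)'$ for $h = 0,\cdots,T-1$, and $I_{T\times T}$ is the $T\times T$ identity matrix. It follows that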

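$$
\begin{aligned}
\sqrt{nT}(\hat\theta_{nT}-\theta_0) - \left(\sigma_0^2 D_{nT}\right)^{-1}\varphi_{nT}
&= \left(\sigma_0^2 D_{nT}\right)^{-1}
\begin{pmatrix}
\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\sum_{t=1}^{T} Z_{1,it}\varepsilon_{it} \\
\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\sum_{t=1}^{T} Z_{2,it}\varepsilon_{it} \\
\vdots \\
\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\sum_{t=1}^{T} Z_{K,it}\varepsilon_{it} \\
\frac{1}{\sqrt{nT}}\sum_{i=1}^{n}\sum_{t=1}^{T} Z_{K+1,it}\varepsilon_{it} \\
0
\end{pmatrix} \\
&\quad + \left(\sigma_0^2 D_{nT}\right)^{-1}
\begin{pmatrix}
0 \\ \vdots \\ 0 \\
\frac{1}{\sqrt{nT}}\operatorname{tr}\!\left(R_n G_n R_n^{-1}\varepsilon\varepsilon'\right) - \frac{1}{\sqrt{nT}}\operatorname{tr}(\varepsilon\varepsilon')\frac{1}{n}\operatorname{tr}(G_n) \\
\frac{1}{\sqrt{nT}}\operatorname{tr}\!\left(\tilde G_n\varepsilon\varepsilon'\right) - \frac{1}{\sqrt{nT}}\operatorname{tr}(\varepsilon\varepsilon')\frac{1}{n}\operatorname{tr}(\tilde G_n)
\end{pmatrix} + o_P(1).
\end{aligned}
$$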
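The terms $\operatorname{tr}(J_0 P_{F_T} J_h')$ in the bias vector have a simple reading: with $J_h$ as defined above, the trace equals the sum of the $h$-th off-diagonal of $P_{F_T}$. The sketch below illustrates this on simulated factors; the dimensions `T` and `r0`, the random factor matrix `F`, and the helper name `J` are assumptions made only for this illustration, and NumPy is assumed to be available.

```python
import numpy as np


def J(h, T):
    """J_h = (0_{T x (T-h)}, I_T, 0_{T x h})', a 2T x T selection matrix."""
    return np.hstack([np.zeros((T, T - h)), np.eye(T), np.zeros((T, h))]).T


rng = np.random.default_rng(0)
T, r0 = 6, 2
F = rng.standard_normal((T, r0))            # hypothetical factor matrix
P_F = F @ np.linalg.solve(F.T @ F, F.T)     # projector onto the column space of F

# Compare tr(J_0 P_F J_h') with the sum of the h-th off-diagonal of P_F.
gaps = []
for h in range(T):
    lhs = np.trace(J(0, T) @ P_F @ J(h, T).T)
    rhs = np.trace(P_F, offset=h)
    gaps.append(abs(lhs - rhs))

print(max(gaps))
```

For $h = 0$ the trace is $\operatorname{tr}(P_{F_T})$, the number of factors; the bias formula itself sums over $h = 1,\cdots,T-1$ only.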

Proof of Theorem 2.

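The object of interest is $c'\nu_{nT} = b^{c\prime}_{nT}\operatorname{vec}(\varepsilon) + \omega^{c\prime}_{nT}\operatorname{vec}(\varepsilon) + \operatorname{vec}(\varepsilon)' A^{c}_{nT}\operatorname{vec}(\varepsilon)$, where $\omega^{c}_{nT} = \operatorname{vec}\!\left(\sum_{h=1}^{\infty} P^{c}_{nh}\varepsilon_h\right)$ and $A^{c}_{nT}$ is a nonstochastic symmetric matrix. From Yu et al. (2008), p.128, $E\,c'\nu_{nT} = \sigma_0^2\operatorname{tr}(A^{c}_{nT}) = 0$,

$$
\operatorname{var}(c'\nu_{nT}) = T\sigma_0^4\,E\operatorname{tr}\!\left(\sum_{h=1}^{\infty} P^{c\prime}_{nh} P^{c}_{nh}\right) + \sigma_0^2\,E\,b^{c\prime}_{nT} b^{c}_{nT} + 2\sigma_0^4\operatorname{tr}\!\left(A^{c\,2}_{nT}\right) + 2\mu^{(3)} E\sum_{i=1}^{nT}[b^{c}_{nT}]_i [A^{c}_{nT}]_{ii} + \left(\mu^{(4)}-3\sigma_0^4\right)\sum_{i=1}^{nT}[A^{c}_{nT}]_{ii}^2.
$$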

Explicitly, 1nT var(c′νnT ) = σ4

0 c′ (DnT +ΣnT +o(1))c, where DnT = p limn,T→∞ DnT , and

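$$
\begin{aligned}
D_{nT} &= \frac{1}{\sigma_0^2}
\begin{pmatrix}
\frac{1}{nT}\operatorname{tr}\!\left(M_{\Gamma_n} R_n Z_1 M_{F_T} Z_1' R_n'\right) & \cdots & \frac{1}{nT}\operatorname{tr}\!\left(M_{\Gamma_n} R_n Z_1 M_{F_T} Z_K' R_n'\right) & \frac{1}{nT}\operatorname{tr}\!\left(M_{\Gamma_n} R_n Z_1 M_{F_T} Z_{K+1}' R_n'\right) & 0 \\
\vdots & & \vdots & \vdots & \vdots \\
\frac{1}{nT}\operatorname{tr}\!\left(M_{\Gamma_n} R_n Z_K M_{F_T} Z_1' R_n'\right) & \cdots & \frac{1}{nT}\operatorname{tr}\!\left(M_{\Gamma_n} R_n Z_K M_{F_T} Z_K' R_n'\right) & \frac{1}{nT}\operatorname{tr}\!\left(M_{\Gamma_n} R_n Z_K M_{F_T} Z_{K+1}' R_n'\right) & 0 \\
\frac{1}{nT}\operatorname{tr}\!\left(M_{\Gamma_n} R_n Z_{K+1} M_{F_T} Z_1' R_n'\right) & \cdots & \frac{1}{nT}\operatorname{tr}\!\left(M_{\Gamma_n} R_n Z_{K+1} M_{F_T} Z_K' R_n'\right) & \frac{1}{nT}\operatorname{tr}\!\left(M_{\Gamma_n} R_n Z_{K+1} M_{F_T} Z_{K+1}' R_n'\right) & 0 \\
0 & \cdots & 0 & 0 & 0
\end{pmatrix} \\
&\quad +
\begin{pmatrix}
\phi_{1,1} & \phi_{1,2} & 0 & \cdots & \phi_{1,K+1} & 0 \\
\phi_{1,2} & \phi_{2,2} & 0 & \cdots & \phi_{2,K+1} & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 \\
\phi_{1,K+1} & \phi_{2,K+1} & 0 & \cdots & \phi_{K+1,K+1}+\psi_{K+1,K+1} & \psi_{K+1,K+2} \\
0 & 0 & 0 & \cdots & \psi_{K+1,K+2} & \psi_{K+2,K+2}
\end{pmatrix},
\end{aligned}
\tag{D.9}
$$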

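and

$$
\begin{aligned}
\phi_{1,1} &= \tfrac{1}{n}\operatorname{tr}\!\left(R_n P_n R_n'\right), \qquad
\phi_{1,2} = \tfrac{1}{n}\operatorname{tr}\!\left(R_n P_n W_n' R_n'\right), \qquad
\phi_{2,2} = \tfrac{1}{n}\operatorname{tr}\!\left(R_n W_n P_n W_n' R_n'\right), \\
\phi_{1,K+1} &= \tfrac{1}{n}\operatorname{tr}\!\left(R_n P_n (\gamma_0 G_n' + \rho_0 W_n' G_n') R_n'\right), \qquad
\phi_{2,K+1} = \tfrac{1}{n}\operatorname{tr}\!\left(R_n W_n P_n (\gamma_0 G_n' + \rho_0 W_n' G_n') R_n'\right), \\
\phi_{K+1,K+1} &= \tfrac{1}{n}\operatorname{tr}\!\left(R_n (\gamma_0 G_n + \rho_0 G_n W_n) P_n (\gamma_0 G_n' + \rho_0 W_n' G_n') R_n'\right),
\end{aligned}
$$

where $P_n = \sum_{h=1}^{\infty} A_{0n}^{h-1} S_n^{-1} R_n^{-1} R_n^{-1\prime} S_n^{-1\prime} A_{0n}^{h-1\prime}$; and

$$
\begin{aligned}
\psi_{K+1,K+1} &= \tfrac{1}{n}\operatorname{tr}\!\left(R_n G_n R_n^{-1} R_n^{-1\prime} G_n' R_n'\right) + \tfrac{1}{n}\operatorname{tr}\!\left(G_n^2\right) - 2\left(\tfrac{1}{n}\operatorname{tr}(G_n)\right)^2, \\
\psi_{K+1,K+2} &= \tfrac{1}{n}\operatorname{tr}\!\left(G_n\tilde G_n\right) + \tfrac{1}{n}\operatorname{tr}\!\left(R_n G_n R_n^{-1}\tilde G_n'\right) - \tfrac{2}{n^2}\operatorname{tr}(G_n)\operatorname{tr}(\tilde G_n), \\
\psi_{K+2,K+2} &= \tfrac{1}{n}\operatorname{tr}\!\left(\tilde G_n\tilde G_n'\right) + \tfrac{1}{n}\operatorname{tr}\!\left(\tilde G_n^2\right) - 2\left(\tfrac{1}{n}\operatorname{tr}(\tilde G_n)\right)^2.
\end{aligned}
$$

$\Sigma_{nT}$ is defined in Eq. (10).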

Lemma 10 shows that DnT = DnT +oP(1), therefore 1nT var(c′νnT ) = σ4

0 c′ (DnT +ΣnT +oP(1))c.

Proof of Theorem 3.

The theorem follows from Theorem 2 by applying the CLT of the martingale difference array.

Proof of Theorem 4.

We have the following useful results.

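(1) $\sigma_0^2$ can be estimated by $L_{nT}(\hat\theta_{nT})$. This is because, from Lemma 8 and $\|\hat\theta_{nT}-\theta_0\|_1 = O_P\!\left(\frac{1}{\sqrt{nT}}\right)$, $L_{nT}(\hat\theta_{nT}) = L_{nT}(\theta_0) + O_P\!\left(\frac{1}{\sqrt{nT}}\right)$. From Eq. (C.5), $L_{nT}(\theta_0) = \sigma_0^2 + O_P\!\left(\frac{1}{\sqrt{nT}}\right)$.

(2) $\|\hat R_n - R_n\|_2 = O_P\!\left(\frac{1}{\sqrt{nT}}\right)$, because $\hat R_n = R_n + (\alpha_0-\hat\alpha)\tilde W_n$ and $\|\hat\theta_{nT}-\theta_0\|_1 = O_P\!\left(\frac{1}{\sqrt{nT}}\right)$.

(3) By the mean value theorem, $\hat{\tilde G}_n - \tilde G_n = \tilde W_n R_n(\bar\alpha)^{-1}\tilde W_n R_n(\bar\alpha)^{-1}(\hat\alpha-\alpha_0)$, for some $\bar\alpha$ between $\alpha_0$ and $\hat\alpha$. Because $\|\tilde W_n\|_2$ and $\|R_n^{-1}\|_2$ are uniformly bounded, $\|R_n^{-1}(\alpha)\|_2$ is uniformly bounded in a neighborhood of $\alpha_0$ by Lemma 1. Hence $\|\hat{\tilde G}_n - \tilde G_n\|_2 = O_P\!\left(\frac{1}{\sqrt{nT}}\right)$. Similarly, $\|\hat G_n - G_n\|_2 = O_P\!\left(\frac{1}{\sqrt{nT}}\right)$ and $\|\hat R_n^{-1} - R_n^{-1}\|_2 = O_P\!\left(\frac{1}{\sqrt{nT}}\right)$.

(4) $\|\hat Z_{K+1} - Z_{K+1}\|_2 = O_P(1)$, because $\hat Z_{K+1} - Z_{K+1} = \sum_{k=1}^{K}\hat\delta_k\hat G_n Z_k - \sum_{k=1}^{K}\delta_{0k} G_n Z_k = \sum_{k=1}^{K}\hat\delta_k(\hat G_n - G_n) Z_k + \sum_{k=1}^{K}(\hat\delta_k-\delta_{0k}) G_n Z_k$, $\|\hat\delta-\delta_0\|_1 = O_P\!\left(\frac{1}{\sqrt{nT}}\right)$, and from (3), $\|\hat G_n - G_n\|_2 = O_P\!\left(\frac{1}{\sqrt{nT}}\right)$.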
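The mean value expansion in (3) rests on the derivative of $W R(\alpha)^{-1}$ with respect to $\alpha$ being $W R(\alpha)^{-1} W R(\alpha)^{-1}$ when $R(\alpha) = I_n - \alpha W$. A minimal numerical sketch of that identity follows; the weights matrix, its size, the value of `alpha0`, and the helper name `G_of_alpha` are hypothetical choices for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def G_of_alpha(alpha, W):
    """W R(alpha)^{-1} with R(alpha) = I - alpha W."""
    n = W.shape[0]
    return W @ np.linalg.inv(np.eye(n) - alpha * W)


rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)   # row-normalised, zero diagonal

alpha0, step = 0.3, 1e-6

# Analytic derivative: W R^{-1} W R^{-1}.
analytic = G_of_alpha(alpha0, W) @ G_of_alpha(alpha0, W)

# Central finite difference of alpha -> W R(alpha)^{-1}.
numeric = (G_of_alpha(alpha0 + step, W) - G_of_alpha(alpha0 - step, W)) / (2 * step)

max_gap = np.max(np.abs(analytic - numeric))
print(max_gap)
```

With a row-normalised weights matrix and $|\alpha| < 1$, $R(\alpha)$ is invertible, so the derivative is well defined over the whole interval between the two parameter values.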

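To prove Theorem 4, it is sufficient to show that $\hat\varphi_{nT} - \varphi \xrightarrow{p} 0$, which immediately follows from (1) to (4) above. For example,

$$
\left|\operatorname{tr}\!\left(P_{\hat\Gamma_n}\hat{\tilde G}_n\right) - \operatorname{tr}\!\left(P_{\Gamma_n}\tilde G_n\right)\right|
\le \left|\operatorname{tr}\!\left((P_{\hat\Gamma_n} - P_{\Gamma_n})\hat{\tilde G}_n\right)\right| + \left|\operatorname{tr}\!\left(P_{\Gamma_n}(\hat{\tilde G}_n - \tilde G_n)\right)\right|
\le 2r_0\left\|P_{\hat\Gamma_n} - P_{\Gamma_n}\right\|_2\left\|\hat{\tilde G}_n\right\|_2 + r_0\left\|P_{\Gamma_n}\right\|_2\left\|\hat{\tilde G}_n - \tilde G_n\right\|_2 = O_P\!\left(\frac{1}{\sqrt{n}}\right),
$$

where $\left\|P_{\hat\Gamma_n} - P_{\Gamma_n}\right\|_2 = O_P\!\left(\frac{1}{\sqrt{n}}\right)$ by Lemma 11 and $\|\hat G_n - G_n\|_2 = O_P\!\left(\frac{1}{\sqrt{nT}}\right)$ from (3).

Furthermore, $\hat D_{nT} = D_{nT} + o_P(1)$, because, for example, $\frac{1}{nT}\left|\operatorname{tr}\!\left(M_{\hat\Gamma_n}\hat R_n Z_k M_{\hat F_T}\hat Z_{K+1}'\hat R_n'\right) - \operatorname{tr}\!\left(M_{\Gamma_n} R_n Z_k M_{F_T} Z_{K+1}' R_n'\right)\right| = o_P(1)$ for $k = 1,\cdots,K$ by (1) to (4) above. Also, $\frac{1}{n}\left|\operatorname{tr}(\hat{\tilde G}_n) - \operatorname{tr}(\tilde G_n)\right| = O_P\!\left(\frac{1}{\sqrt{nT}}\right)$, $\frac{1}{n}\left|\operatorname{tr}(\hat G_n) - \operatorname{tr}(G_n)\right| = O_P\!\left(\frac{1}{\sqrt{nT}}\right)$ and $\left|\operatorname{tr}\!\left(\hat R_n^{-1} P_{\hat\Gamma_n}\hat R_n\hat G_n\right) - \operatorname{tr}\!\left(R_n^{-1} P_{\Gamma_n} R_n G_n\right)\right| = O_P\!\left(\frac{1}{\sqrt{nT}}\right)$. Finally, $\hat D_{nT}^{-1} = D_{nT}^{-1} + o_P(1)$.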
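The example above bounds a trace by a rank times a product of spectral norms, using $|\operatorname{tr}(PB)| \le \operatorname{rank}(P)\,\|P\|_2\,\|B\|_2$. A short numerical sketch of that inequality for a projector onto an $r_0$-dimensional column space is given below; the loading matrix `Gamma`, the matrix `B`, and the dimensions are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r0 = 8, 2

Gamma = rng.standard_normal((n, r0))                   # hypothetical loadings
P = Gamma @ np.linalg.solve(Gamma.T @ Gamma, Gamma.T)  # rank-r0 projector
B = rng.standard_normal((n, n))                        # arbitrary square matrix

lhs = abs(np.trace(P @ B))
rhs = r0 * np.linalg.norm(P, 2) * np.linalg.norm(B, 2)  # ord=2 is the spectral norm

print(lhs, rhs)
```

Since a projector has spectral norm one, the bound reduces to $r_0\|B\|_2$, which is why the number of factors $r_0$ appears as the constant in the displayed inequalities.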

Proof of Theorem 5.

We show that Corollary 1 of Ahn and Horenstein (2013) holds by checking that their Assumptions A-D are satisfied. Assuming that $n$ and $T$ are proportional, and that the preliminary estimator satisfies $\|\tilde\theta-\theta_0\|_2 = o_p\!\left(n^{-\frac{1}{2}}\right)$, we have,

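(1) $\|\varepsilon + E_{nT}(\tilde\theta)\|_2 = O_p(\sqrt{n})$, because $\|\varepsilon\|_2 = O_p(\sqrt{n})$ by Assumption E and $\|E_{nT}(\tilde\theta)\|_2 \le \sum_{k=1}^{K+2}\|\eta_k\|_2\|V_k\|_2 + \sum_{k_1,k_2=1}^{K+2}\|\eta_{k_1}\|_2\|\eta_{k_2}\|_2\|V_{k_1k_2}\|_2 = o_p(\sqrt{n})$.

(2) $\mu_n\!\left(\frac{1}{n}\left(\varepsilon + E_{nT}(\tilde\theta)\right)\left(\varepsilon + E_{nT}(\tilde\theta)\right)'\right) \ge c + o_p(1)$ for some positive constant $c$. This is because

$$
\begin{aligned}
\mu_n\!\left(\tfrac{1}{n}\left(\varepsilon + E_{nT}(\tilde\theta)\right)\left(\varepsilon + E_{nT}(\tilde\theta)\right)'\right)
&\ge \mu_n\!\left(\tfrac{1}{n}\varepsilon\varepsilon'\right) + \mu_n\!\left(\tfrac{1}{n}E_{nT}(\tilde\theta)E_{nT}(\tilde\theta)'\right) + \mu_n\!\left(\tfrac{1}{n}\varepsilon E_{nT}(\tilde\theta)' + \tfrac{1}{n}E_{nT}(\tilde\theta)\varepsilon'\right) \\
&\ge \mu_n\!\left(\tfrac{1}{n}\varepsilon\varepsilon'\right) - \tfrac{1}{n}\left\|E_{nT}(\tilde\theta)E_{nT}(\tilde\theta)'\right\|_2 - \tfrac{1}{n}\left\|\varepsilon E_{nT}(\tilde\theta)' + E_{nT}(\tilde\theta)\varepsilon'\right\|_2 \\
&= \mu_n\!\left(\tfrac{1}{n}\varepsilon\varepsilon'\right) + o_P(1) \ge c + o_P(1),
\end{aligned}
$$

where the last inequality is from Lemma A.1 of Ahn and Horenstein (2013) (due to Bai and Yin (1993)).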

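(3) For any matrix $A_{T\times q} = (a_1,\cdots,a_q)$ such that $A'A = T I_q$,

$$
\frac{1}{n^3}\left|\operatorname{tr}\!\left(A' F_T\Gamma_n'\left(\varepsilon + E_{nT}(\tilde\theta)\right)A\right)\right| \le \frac{r_0}{n^3}\left\|AA'\right\|_2\left\|F_T\Gamma_n'\right\|_2\left\|\varepsilon + E_{nT}(\tilde\theta)\right\|_2 = O_P\!\left(n^{-\frac{1}{2}}\right).
$$

Similarly, we have

$$
\frac{1}{n^3}\left|\operatorname{tr}\!\left(A'\left(\varepsilon + E_{nT}(\tilde\theta)\right)'\Gamma_n(\Gamma_n'\Gamma_n)^{-1}\Gamma_n'\left(\varepsilon + E_{nT}(\tilde\theta)\right)A\right)\right| \le \frac{r_0}{n^3}\left\|AA'\right\|_2\left\|\varepsilon + E_{nT}(\tilde\theta)\right\|_2^2\left\|\Gamma_n(\Gamma_n'\Gamma_n)^{-1}\Gamma_n'\right\|_2 = O_P\!\left(n^{-1}\right).
$$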

In Ahn and Horenstein (2013), Assumptions C and D can be replaced by the conditions in their Eqs. (2)

and (3), which are satisfied by (1) and (2) above. Their Assumption B is used to prove their Lemma A.10,

which is satisfied by (3), and their Assumption A is satisfied by our Assumption SF. Therefore, Ahn and

Horenstein (2013)’s result applies here.
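To make the object of Theorem 5 concrete, the eigenvalue ratio criterion of Ahn and Horenstein (2013) picks the number of factors as the $k$ that maximises the ratio of the $k$-th to the $(k+1)$-th largest eigenvalue. The sketch below applies it to a simulated pure factor panel. This is a simplification and an assumption of the illustration: in the setting of the theorem the criterion is applied to residuals formed with a preliminary estimator, whereas here there are no regressors or spatial terms. The function name, the dimensions, the noise scale, and `kmax` are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def eigenvalue_ratio_factor_count(X, kmax):
    """Return the k in 1..kmax maximising mu_k / mu_{k+1}, where mu_k are the
    ordered eigenvalues of X X' / (nT), together with the ratios themselves."""
    n, T = X.shape
    eig = np.linalg.eigvalsh(X @ X.T / (n * T))[::-1]   # descending order
    ratios = eig[:kmax] / eig[1:kmax + 1]
    return int(np.argmax(ratios)) + 1, ratios


rng = np.random.default_rng(0)
n, T, r0 = 60, 50, 2

Gamma = rng.standard_normal((n, r0))          # simulated loadings
F = rng.standard_normal((T, r0))              # simulated factors
noise = 0.5 * rng.standard_normal((n, T))     # idiosyncratic part
X = Gamma @ F.T + noise

r_hat, ratios = eigenvalue_ratio_factor_count(X, kmax=8)
print(r_hat)
```

The upper bound `kmax` must be smaller than $\min(n, T)$ so that every ratio has a nonzero denominator in practice.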
