Estimation of spatial autoregressive panel data models with fixed effects

Lung-fei Lee, Department of Economics, Ohio State University, [email protected]
Jihai Yu, Department of Economics, University of Kentucky, [email protected]

March 4, 2008

Abstract

This paper establishes asymptotic properties of quasi-maximum likelihood estimators for fixed effects SAR panel data models with SAR disturbances where the time periods T and/or the number of spatial units n can be finite or large in all combinations except that both T and n are finite. A direct approach is to estimate all the parameters including fixed effects. We propose alternative estimation methods based on transformation. For the model with only individual effects, the transformation approach yields consistent estimators for all the parameters when either n or T is large, while the direct approach does not yield a consistent estimator of the variance of disturbances unless T is large, although the estimators for other parameters are the same as those of the transformation approach. For the model with both individual and time effects, the transformation approach yields consistent estimators of all the parameters when either n or T is large. When we estimate both individual and time effects directly, consistency of the variance parameter requires both n and T to be large, and consistency of other parameters requires n to be large.

JEL classification: C13; C23; R15
Keywords: Spatial autoregression, Panel data, Fixed effects, Time effects, Quasi-maximum likelihood estimation, Conditional likelihood

Lee acknowledges financial support for his research from NSF under Grant No. SES-0519204.
1 Introduction
Spatial econometrics deals with the spatial interactions of economic units in cross-section and/or panel data. To capture correlation among cross-sectional units, the spatial autoregressive (SAR) model by Cliff and Ord (1973) has received the most attention in economics. It extends autocorrelation in time series to spatial dimensions and captures interactions or competition among spatial units. Early development in estimation and testing is summarized in Anselin (1988), Cressie (1993), Kelejian and Robinson (1993), and Anselin and Bera (1998), among others. The spatial correlation can be extended to panel data models (Anselin, 1988).
Baltagi et al. (2003) consider the specification test of the spatial correlation in a panel regression with error components and SAR disturbances. Kapoor et al. (2007) provide a rigorous theoretical analysis of a panel model with SAR disturbances which incorporate error components. Baltagi et al. (2007) generalize Baltagi et al. (2003) by allowing for spatial correlations in both individual and error components such that they might have different spatial autoregressive parameters, which encompasses the spatial correlation specifications in Baltagi et al. (2003) and Kapoor et al. (2007). Instead of random effect error components, an alternative specification for panel data models assumes fixed effects. The fixed effects specification has the advantage of robustness in that the fixed effects are allowed to correlate with included regressors in the model (Hausman, 1978). Yu et al. (2006, 2007) and Yu and Lee (2007) consider the spatial correlation in a dynamic panel data setting, where the data generating processes (DGPs) are specified to be, respectively, stationary, partially nonstationary and nonstationary.
For panel data models with fixed individual effects, when the time dimension T is fixed, we are likely to encounter the "incidental parameters" problem discussed in Neyman and Scott (1948). This is because the introduction of fixed effects increases the number of parameters to be estimated. In a linear panel regression model or a logit panel regression model with fixed individual effects, the fixed effects can be eliminated by the method of conditional likelihood when effective sufficient statistics can be found for each of the fixed effects. For those panel models, the time average of the dependent variables provides the sufficient statistic (see Hsiao, 1986).
For the linear panel regression model with fixed effects, the direct maximum likelihood (ML) approach will estimate jointly the common parameters and fixed effects. The corresponding ML estimates (MLEs) of the regression coefficients are known as the within estimates, which happen to be the conditional likelihood estimates conditional on the time means.[1] For the SAR panel data models with individual effects, similar findings for the direct ML approach will be shown in this paper. This direct estimation approach will yield consistent estimates for the spatial and regression coefficients, but not for the variance of the disturbances when T is small (but n is large).[2] However, for SAR panel models with time effects, the direct estimation approach will be shown to be inconsistent for all parameters when n is small (but T is large). The inconsistent estimates are consequences of the incidental parameters problem (Neyman and Scott, 1948).

[1] However, effective sufficient statistics might not be available for many other models. The well-known example is the probit panel regression model, where the time average of the dependent variables does not provide the sufficient statistic even though probit and logit models are close substitutes (see Chamberlain, 1982).
In this paper, in order to avoid the incidental parameters problem, we suggest alternative estimation methods. By using the data transformation (I_T − (1/T) l_T l_T') to eliminate the individual effects, the transformed disturbances are uncorrelated, although not i.i.d. in general. The transformed equation can be estimated by the quasi-maximum likelihood (QML) approach. For the more general model with both individual and time fixed effects, one may combine the transformation (I_n − (1/n) l_n l_n') with the transformation (I_T − (1/T) l_T l_T') to eliminate both the individual and time fixed effects. By exploring the generalized inverse of the transformed equation, one may end up with a QML approach for the transformed model.[3]
Panel regression models with SAR disturbances have been recently considered in the literature. The model considered in Baltagi et al. (2003) is Y_nt = X_nt β0 + c_n0 + U_nt, U_nt = ρ0 W_n U_nt + V_nt, t = 1, 2, ..., T, where elements of V_nt are i.i.d. (0, σ0²), c_n0 is an n × 1 vector of individual error components, and the spatial correlation is in U_nt. A different specification has been considered in Kapoor et al. (2007) with Y_nt = X_nt β0 + U+_nt and U+_nt = ρ0 W_n U+_nt + d_n0 + V_nt, t = 1, 2, ..., T, where d_n0 is the vector of individual error components. Kapoor et al. (2007) propose a method of moments (MOM) procedure for the estimation of ρ0 and the variance parameters of d_n0 and V_nt. The two panel models are different in terms of the variance matrices of the overall disturbances. The variance matrix in Baltagi et al. (2003) is more complicated and its inverse is computationally demanding; the variance matrix in Kapoor et al. (2007) has a special pattern and its inverse is easier to compute. Baltagi et al. (2007) allow for spatial correlations in both individual and error components, where they might have different spatial autoregressive parameters. Both Baltagi et al. (2003) and Baltagi et al. (2007) have emphasized the testing of spatial correlation in their models. With the fixed effects specification, these panel models can have the same representation. By the transformation (I_n − ρ0 W_n), the DGP of Kapoor et al. (2007) becomes Y_nt = X_nt β0 + c_n0 + U_nt, where c_n0 = (I_n − ρ0 W_n)^(−1) d_n0 and U_nt = U+_nt − (I_n − ρ0 W_n)^(−1) d_n0. The U_nt = ρ0 W_n U_nt + V_nt forms a SAR process. By regarding (I_n − ρ0 W_n)^(−1) d_n0 as a vector of unknown fixed effect parameters, these two equations are identical to a linear panel regression with fixed effects and SAR disturbances. Hence, to generalize Baltagi et al. (2003), Baltagi et al. (2007) and Kapoor et al. (2007), where the spatial effects are in the disturbances, and to generalize the SAR panel model where the spatial effects are in the regression equation, we are going to consider the estimation of the SAR panel model with both a spatial lag and spatial disturbances. We allow the time periods T and/or the number of spatial units n to be finite or large in all combinations except

[2] When a dynamic effect is considered in the SAR panel data, we will have an "initial conditions" problem which will cause the inconsistency of the direct likelihood estimates for all the parameters unless T is large (see Yu et al. (2006, 2007) and Yu and Lee (2007)).
[3] The use of (I_T − (1/T) l_T l_T') to eliminate time fixed effects has been considered in Lee and Yu (2007a) for a spatial dynamic panel model with large T. In a group setting with group fixed effects, a similar transformation can eliminate the group effects (Lee et al., 2008).
that both T and n are finite. In this paper, we pay special attention to the model with individual effects when n is large but T is small. On the other hand, for the model with time effects, the special interest is in the model with large T but small n.
This paper is organized as follows. In Section 2, the model with individual fixed effects is introduced and the data transformation procedure is proposed. We then establish the consistency and asymptotic distribution of the QML estimator of the transformation approach. The direct ML approach is discussed in Section 3, where the individual effects are estimated directly. Section 4 generalizes the model to include both individual and time effects. After the individual effects are eliminated, we can further eliminate the time effects, and the asymptotics are derived. Alternatively, we can estimate the transformed time effects directly, or estimate both effects directly, both of which are discussed in Section 5. Simulation results are reported in Section 6 to compare different approaches. Section 7 concludes the paper. Proofs are collected in the Appendix.
2 Transformation Approach
The SAR panel model with SAR disturbances where we have individual effects is

Y_nt = λ0 W_n Y_nt + X_nt β0 + c_n0 + U_nt,  U_nt = ρ0 M_n U_nt + V_nt,  t = 1, 2, ..., T,  (2.1)

where Y_nt = (y_1t, y_2t, ..., y_nt)' and V_nt = (v_1t, v_2t, ..., v_nt)' are n × 1 column vectors and v_it is i.i.d. across i and t with zero mean and variance σ0², W_n is an n × n spatial weights matrix, which is predetermined and generates the spatial dependence among cross-sectional units y_it, X_nt is an n × kX matrix of nonstochastic regressors, and c_n0 is an n × 1 column vector of fixed effects.
In panel data models, when T is finite, we need to take care of the incidental parameters problem. In dynamic panel data, the first difference or Helmert transformation can be made to eliminate the individual effects (see Anderson and Hsiao (1981) and Arellano and Bover (1995), among others). In this paper, we use an orthogonal transformation which includes the Helmert transformation as a special case. Our asymptotic results are obtained where T and/or n can be finite or large in all combinations except that both T and n are finite.[4] Define S_n(λ) = I_n − λW_n and R_n(ρ) = I_n − ρM_n for any λ and ρ. At the true parameters, S_n = S_n(λ0) and R_n = R_n(ρ0). Then, presuming S_n and R_n are invertible, (2.1) can be rewritten as

Y_nt = S_n^(−1) X_nt β0 + S_n^(−1) c_n0 + S_n^(−1) R_n^(−1) V_nt.  (2.2)
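The reduced form (2.2) also gives a direct way to simulate the model. The sketch below is illustrative only (the weights matrix, parameter values, and disturbance draws are our own choices, not from the paper); it draws one period from (2.2) and checks that the structural equation (2.1) is satisfied:

```python
import numpy as np

# Sketch: generate one cross-section Y_nt from the reduced form (2.2),
# Y_nt = S_n^{-1} X_nt b0 + S_n^{-1} c_n0 + S_n^{-1} R_n^{-1} V_nt,
# then verify the structural form (2.1). W_n here is an illustrative
# row-normalized "ahead/behind neighbours" matrix.
rng = np.random.default_rng(1)
n, kX = 6, 2
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
M = W.copy()
lam0, rho0, beta0 = 0.3, 0.2, np.array([1.0, -0.5])
S = np.eye(n) - lam0 * W            # S_n = I_n - lambda0 W_n
R = np.eye(n) - rho0 * M            # R_n = I_n - rho0 M_n
X = rng.standard_normal((n, kX))
c = rng.standard_normal(n)          # individual effects c_n0
V = rng.standard_normal(n)
U = np.linalg.solve(R, V)           # U_nt solves U = rho0 M U + V
Y = np.linalg.solve(S, X @ beta0 + c + U)
# Structural equation (2.1) holds up to floating-point error:
print(np.allclose(Y, lam0 * W @ Y + X @ beta0 + c + U))
```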
For our analysis of the asymptotic properties of estimators, we make the following assumptions:

Assumption 1. W_n and M_n are nonstochastic spatial weights matrices and their diagonal elements satisfy w_n,ii = 0 and m_n,ii = 0 for i = 1, 2, ..., n.

[4] We do not have an exact finite small sample theory for the estimators with both n and T being finite.
Assumption 2. The disturbances {v_it}, i = 1, 2, ..., n and t = 1, 2, ..., T, are i.i.d. across i and t with zero mean, variance σ0², and E|v_it|^(4+η) < ∞ for some η > 0.

Assumption 3. S_n(λ) and R_n(ρ) are invertible for all λ ∈ Λ and ρ ∈ P. Furthermore, Λ and P are compact, λ0 is in the interior of Λ, and ρ0 is in the interior of P.

Assumption 4. The elements of X_nt are nonstochastic and bounded,[5] uniformly in n and t. Also, under the setting in Assumption 6, the limit of (1/(nT)) Σ_{t=1}^{T} X̃'_nt X̃_nt exists and is nonsingular.[6]

Assumption 5. W_n and M_n are uniformly bounded in row and column sums in absolute value (for short, UB).[7] Also, S_n^(−1)(λ) and R_n^(−1)(ρ) are UB,[8] uniformly in λ ∈ Λ and ρ ∈ P.

Assumption 6. (1) n is large, where T can be finite or large; or (2) T is large, where n can be finite or large.
Assumption 1 is a standard normalization assumption in spatial econometrics. This assumption helps the interpretation of the spatial effect, as self-influence shall be excluded in practice. Assumption 2 provides regularity assumptions for v_it, and our analysis is based on i.i.d. disturbances. If there is unknown heteroskedasticity, the MLE (QMLE) would not be consistent. Consistent methods such as the GMM in Lin and Lee (2005) and that in Kelejian and Prucha (2007) may be designed for the model. Invertibility of S_n(λ) and R_n(ρ) in Assumption 3 guarantees that (2.2) is valid. Also, compactness is a condition for theoretical analysis. In many empirical applications, each of the rows of W_n and M_n sums to 1, which ensures that all the weights are between 0 and 1. When W_n and M_n are row normalized, it is common to take a compact subset of (−1, 1) as the parameter space. When exogenous variables X_nt are included in the model, it is convenient to assume that the exogenous regressors are uniformly bounded, as in Assumption 4. Assumption 5 originates from Kelejian and Prucha (1998, 2001) and is also used in Lee (2004, 2007). That W_n, M_n, S_n^(−1)(λ) and R_n^(−1)(ρ) are UB is a condition that limits the spatial correlation to a manageable degree. Assumption 6 allows three cases: (i) both n and T are large; (ii) T is fixed and n is large; (iii) n is fixed and T is large. For (ii), we are interested in the short panel data case, in contrast to the case where T needs to be large in other studies, e.g., Hahn and Kuersteiner (2002) and Yu et al. (2006). When n is large and T is finite, the incidental parameters problem may appear, so that careful estimation methods need to be designed. However, our suggested transformation approach for the estimation of (2.1) is general, and it may also apply to the cases (i) and (iii) where T can be large.
[5] If X_nt is allowed to be stochastic and unbounded, appropriate moment conditions can be imposed instead.
[6] For notational purposes, we define Ỹ_nt = Y_nt − Ȳ_nT and Ỹ_n,t−1 = Y_n,t−1 − Ȳ_nT,−1 for t = 1, 2, ..., T, where Ȳ_nT = (1/T) Σ_{t=1}^{T} Y_nt and Ȳ_nT,−1 = (1/T) Σ_{t=1}^{T} Y_n,t−1. Similarly, we define X̃_nt = X_nt − X̄_nT and Ṽ_nt = V_nt − V̄_nT.
[7] We say a (sequence of n × n) matrix P_n is uniformly bounded in row and column sums if sup_{n≥1} ||P_n||_∞ < ∞ and sup_{n≥1} ||P_n||_1 < ∞, where ||P_n||_∞ = sup_{1≤i≤n} Σ_{j=1}^{n} |p_ij,n| is the row sum norm and ||P_n||_1 = sup_{1≤j≤n} Σ_{i=1}^{n} |p_ij,n| is the column sum norm.
[8] This assumption has effectively ruled out some cases and, hence, imposed limited dependence across spatial units. For example, if λ_0n = 1 − 1/n under n → ∞, it is a near unit root case for a cross-sectional spatial autoregressive model and S_n^(−1) will not be UB (see Lee and Yu (2007b)).
2.1 Data Transformation and Conditional Likelihood
Let [F_{T,T−1}, (1/√T) l_T] be the orthonormal matrix of the eigenvectors of J_T = I_T − (1/T) l_T l_T', where F_{T,T−1} is the T × (T−1) eigenvector matrix[9] corresponding to the eigenvalues of one, and l_T is the T-dimensional column vector of ones. For any n × T matrix [Z_n1, ..., Z_nT], where each Z_nt, t = 1, ..., T, is an n-dimensional column vector, we define the corresponding transformed n × (T−1) matrix [Z*_n1, ..., Z*_n,T−1] = [Z_n1, ..., Z_nT] F_{T,T−1}. Denote X*_nt = [X*_nt,1, X*_nt,2, ..., X*_nt,kX]. Then, (2.1) implies

Y*_nt = λ0 W_n Y*_nt + X*_nt β0 + U*_nt,  U*_nt = ρ0 M_n U*_nt + V*_nt,  t = 1, ..., T−1.  (2.3)

Because (V*'_n1, ..., V*'_n,T−1)' = (F'_{T,T−1} ⊗ I_n)(V'_n1, ..., V'_nT)' and v_it is i.i.d., we have

E[(V*'_n1, ..., V*'_n,T−1)'(V*'_n1, ..., V*'_n,T−1)] = σ0² (F'_{T,T−1} ⊗ I_n)(F_{T,T−1} ⊗ I_n) = σ0² (F'_{T,T−1} F_{T,T−1} ⊗ I_n) = σ0² I_{n(T−1)}.

Hence, the v*_it's are uncorrelated for all i and t (and independent under normality), where v*_it is the ith element of V*_nt.
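The eigenvector matrix F_{T,T−1} can be obtained numerically from the eigendecomposition of J_T. The sketch below (an illustration, not the paper's code) constructs it and checks the two properties used above, F'_{T,T−1} F_{T,T−1} = I_{T−1} and F'_{T,T−1} l_T = 0:

```python
import numpy as np

# Sketch: build F_{T,T-1} as the eigenvectors of J_T = I_T - (1/T) l_T l_T'
# associated with the unit eigenvalues. J_T has one zero eigenvalue (with
# eigenvector l_T/sqrt(T)) and T-1 unit eigenvalues; transformed i.i.d.
# disturbances V* = V F stay uncorrelated because F' F = I_{T-1}.
def eigenvector_matrix(T):
    J_T = np.eye(T) - np.ones((T, T)) / T
    vals, vecs = np.linalg.eigh(J_T)     # ascending eigenvalues, orthonormal columns
    return vecs[:, vals > 0.5]           # keep the T-1 unit-eigenvalue columns

T = 5
F = eigenvector_matrix(T)
print(np.allclose(F.T @ F, np.eye(T - 1)))   # orthonormal columns
print(np.allclose(F.T @ np.ones(T), 0.0))    # annihilates l_T
```

The Helmert transformation mentioned earlier corresponds to one particular choice of F_{T,T−1}; any orthonormal basis of the unit-eigenvalue space works.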
Denote θ = (β', λ, ρ, σ²)' and ζ = (β', λ, ρ)'. At the true values, θ0 = (β0', λ0, ρ0, σ0²)' and ζ0 = (β0', λ0, ρ0)'. The likelihood function of (2.3), as if the disturbances were normally distributed, is

ln L_{n,T}(θ) = −(n(T−1)/2) ln 2π − (n(T−1)/2) ln σ² + (T−1)[ln |S_n(λ)| + ln |R_n(ρ)|] − (1/(2σ²)) Σ_{t=1}^{T−1} V*'_nt(ζ) V*_nt(ζ),  (2.4)

where V*_nt(ζ) = R_n(ρ)[S_n(λ) Y*_nt − X*_nt β]. Thus, V*_nt = V*_nt(ζ0). The QMLE θ̂_nT is the extremum estimator derived from the maximization of (2.4). For any n-dimensional column vectors p_nt and q_nt, as Σ_{t=1}^{T−1} p*'_nt q*_nt = Σ_{t=1}^{T} p̃'_nt q̃_nt, the likelihood function (2.4) is numerically identical to

ln L_{n,T}(θ) = −(n(T−1)/2) ln 2π − (n(T−1)/2) ln σ² + (T−1)[ln |S_n(λ)| + ln |R_n(ρ)|] − (1/(2σ²)) Σ_{t=1}^{T} Ṽ'_nt(ζ) Ṽ_nt(ζ),  (2.5)

where Ṽ_nt(ζ) = R_n(ρ)[S_n(λ) Ỹ_nt − X̃_nt β], with Ũ_nt = ρ0 M_n Ũ_nt + Ṽ_nt. As Ṽ_nt, t = 1, ..., T, are independent of V̄_nT under normality, the likelihood in (2.5) corresponds to the density function of Ỹ_nt, t = 1, ..., T.
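The numerical identity between the quadratic forms in (2.4) and (2.5) follows from F_{T,T−1} F'_{T,T−1} = J_T and the idempotency of J_T. The sketch below (arbitrary simulated data and illustrative parameter values of our own) verifies it:

```python
import numpy as np

# Sketch: check sum_t V*'_t(zeta) V*_t(zeta) = sum_t Vtil'_t(zeta) Vtil_t(zeta)
# numerically. Stacking V_nt(zeta) as the columns of an n x T matrix A, the
# left side is tr(F' A'A F) = tr(A'A J_T) and the right side is
# tr(J_T A'A J_T) = tr(A'A J_T), since F F' = J_T and J_T^2 = J_T.
rng = np.random.default_rng(2)
n, T, kX = 4, 6, 2
lam, rho, beta = 0.2, 0.1, np.array([0.5, -1.0])
W = np.roll(np.eye(n), 1, axis=1)          # illustrative weights matrix
M = W
S, R = np.eye(n) - lam * W, np.eye(n) - rho * M
Y = rng.standard_normal((n, T))
X = rng.standard_normal((n, T, kX))
A = R @ (S @ Y - np.einsum('ntk,k->nt', X, beta))   # columns are V_nt(zeta)

J_T = np.eye(T) - np.ones((T, T)) / T
vals, vecs = np.linalg.eigh(J_T)
F = vecs[:, vals > 0.5]                    # F_{T,T-1}

qf_star = np.sum((A @ F) ** 2)             # quadratic form in (2.4)
qf_tilde = np.sum((A @ J_T) ** 2)          # quadratic form in (2.5)
print(np.allclose(qf_star, qf_tilde))
```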
2.2 Asymptotic Properties
For the likelihood function (2.5) divided by the effective sample size n(T−1), the corresponding expected value function is Q_{n,T}(θ) = E max_{c_n} (1/(n(T−1))) ln L_{n,T}(θ, c_n), which is

Q_{n,T}(θ) = (1/(n(T−1))) E ln L_{n,T}(θ)  (2.6)
  = −(1/2) ln 2π − (1/2) ln σ² + (1/n)[ln |S_n(λ)| + ln |R_n(ρ)|] − (1/(2σ²)) (1/(n(T−1))) E Σ_{t=1}^{T} Ṽ'_nt(ζ) Ṽ_nt(ζ).
To show the consistency of θ̂_nT, we need the following uniform convergence result.

Claim 1. Let Θ be any compact parameter space of θ. Under Assumptions 1-6, (1/(n(T−1))) ln L_{n,T}(θ) − Q_{n,T}(θ) →p 0 uniformly in θ ∈ Θ, and Q_{n,T}(θ) is uniformly equicontinuous for θ ∈ Θ.

Proof. See Appendix A.2.
For local identification, a sufficient (but not necessary) condition is that the information matrix Σ_{θ0,nT}, where Σ_{θ0,nT} = −E[(1/(n(T−1))) ∂² ln L_{n,T}(θ0)/∂θ∂θ'], is nonsingular and −E[(1/(n(T−1))) ∂² ln L_{n,T}(θ)/∂θ∂θ'] has full rank for any θ in some neighborhood N(θ0) of θ0 (see Rothenberg (1971)). The Σ_{θ0,nT} is derived in (A.4) of Appendix A.1 and its nonsingularity is analyzed in Appendix A.3. While the conditions for the nonsingularity of the information matrix provide local identification, the conditions in the following assumption are global ones.
Denote

H_{nT}(ρ) = (1/(n(T−1))) Σ_{t=1}^{T} (X̃_nt, G_n X̃_nt β0)' R'_n(ρ) R_n(ρ) (X̃_nt, G_n X̃_nt β0),
σ²_n(ρ) = (σ0²/n) tr[(R_n(ρ) R_n^(−1))' (R_n(ρ) R_n^(−1))],
σ²_n(λ, ρ) = (σ0²/n) tr[(R_n(ρ) S_n(λ) S_n^(−1) R_n^(−1))' (R_n(ρ) S_n(λ) S_n^(−1) R_n^(−1))].
Assumption 7. Either (a) the limit of H_{nT}(ρ) is nonsingular for each possible ρ in P, and the limit of [(1/n) ln |σ0² R_n^(−1)' R_n^(−1)| − (1/n) ln |σ²_n(ρ) R_n^(−1)(ρ)' R_n^(−1)(ρ)|] is not zero[10] for ρ ≠ ρ0; or (b) the limit of

(1/n) ln |σ0² R_n^(−1)' S_n^(−1)' S_n^(−1) R_n^(−1)| − (1/n) ln |σ²_n(λ, ρ) R_n^(−1)(ρ)' S_n^(−1)(λ)' S_n^(−1)(λ) R_n^(−1)(ρ)|

is not zero for (λ, ρ) ≠ (λ0, ρ0).[11]

[10] When n is finite and T is large, this inequality becomes (1/n) ln |σ0² R_n^(−1)' R_n^(−1)| − (1/n) ln |σ²_n(ρ) R_n^(−1)(ρ)' R_n^(−1)(ρ)| ≠ 0.
[11] The inequality will be (1/n) ln |σ0² R_n^(−1)' S_n^(−1)' S_n^(−1) R_n^(−1)| − (1/n) ln |σ²_n(λ, ρ) R_n^(−1)(ρ)' S_n^(−1)(λ)' S_n^(−1)(λ) R_n^(−1)(ρ)| ≠ 0 when n is finite and T is large. When M_n = W_n and λ0 ≠ ρ0, this condition would not be satisfied, as (λ0, ρ0) and (ρ0, λ0) could not be distinguished from each other. Identification will rely on either Assumption 7(a) or extra information on the order of magnitudes of λ0 and ρ0.
This assumption states the identification conditions of the model, which generalize those for a cross-section SAR model in Lee and Liu (2006) to the panel case. Part (a) of Assumption 7 represents the possible identification of λ0 and β0 through the deterministic part of the reduced form equation of (2.3) and the identification of ρ0 and σ0² from the SAR process of U*_nt in (2.3). Part (b) of Assumption 7 provides identification through the SAR process of the reduced form of disturbances of Y*_nt. The global identification and consistency are shown in the following theorem.

Theorem 1. Under Assumptions 1-7, θ0 is globally identified and, for the extremum estimator θ̂_nT derived from (2.5), θ̂_nT →p θ0.

Proof. See Appendix A.4.
The asymptotic distribution of the QMLE θ̂_nT can be derived from the Taylor expansion of ∂ ln L_{n,T}(θ̂_nT)/∂θ around θ0. At θ0, the first order derivative of the likelihood function involves both linear and quadratic functions of Ṽ_nt and is derived in (A.3). The variance matrix of (1/√(n(T−1))) ∂ ln L_{n,T}(θ0)/∂θ is equal to

E[(1/(n(T−1))) (∂ ln L_{n,T}(θ0)/∂θ)(∂ ln L_{n,T}(θ0)/∂θ')] = Σ_{θ0,nT} + Ω_{θ0,n},

where Ω_{θ0,n} = ((μ4 − 3σ0⁴)/σ0⁴) × the symmetric matrix

[ 0_{kX×kX}  ∗  ∗  ∗
  0_{1×kX}  (1/n) Σ_{i=1}^{n} Ḡ²_{n,ii}  ∗  ∗
  0_{1×kX}  (1/n) Σ_{i=1}^{n} Ḡ_{n,ii} H_{n,ii}  (1/n) Σ_{i=1}^{n} H²_{n,ii}  ∗
  0_{1×kX}  (1/(2σ0²n)) tr Ḡ_n  (1/(2σ0²n)) tr H_n  1/(4σ0⁴) ],

with μ4 being the fourth moment of v_it, where Ḡ_{n,ii} is the (i,i) entry of Ḡ_n, H_{n,ii} is the (i,i) entry of H_n, and Ḡ_n is a matrix transformed from G_n as defined in Appendix A.1 after (A.4). When the V_nt are normally distributed, Ω_{θ0,n} = 0 because μ4 − 3σ0⁴ = 0 for a normal distribution. Denote Σ_{θ0} as the limit of Σ_{θ0,nT} and Ω_{θ0} as the limit of Ω_{θ0,n}; then, the limiting variance matrix of (1/√(n(T−1))) ∂ ln L_{n,T}(θ0)/∂θ is equal to Σ_{θ0} + Ω_{θ0}. The asymptotic distribution of (1/√(n(T−1))) ∂ ln L_{n,T}(θ0)/∂θ can be derived from the central limit theorem for martingale difference arrays.[12] Denote C_n = G_n − (tr G_n / n) I_n and D_n = H_n − (tr H_n / n) I_n.
Assumption 8. The limit of (1/n²)[tr(C_n^s C_n^s) tr(D_n^s D_n^s) − tr²(C_n^s D_n^s)] is strictly positive.[13]

Assumption 8 is a condition for the nonsingularity of the limiting information matrix Σ_{θ0} (see Appendix A.3). When the limit of H_{nT} is singular, as long as the limit of (1/n²)[tr(C_n^s C_n^s) tr(D_n^s D_n^s) − tr²(C_n^s D_n^s)] is strictly positive, the limiting information matrix Σ_{θ0} is still nonsingular. Also, its rank does not change in a small neighborhood of θ0.[14]

Claim 2. Under Assumptions 1-6 and 7(a), or 1-6, 7(b) and 8, (1/√(n(T−1))) ∂ ln L_{n,T}(θ0)/∂θ →d N(0, Σ_{θ0} + Ω_{θ0}). When {v_it}, i = 1, 2, ..., n and t = 1, 2, ..., T, are normal, (1/√(n(T−1))) ∂ ln L_{n,T}(θ0)/∂θ →d N(0, Σ_{θ0}).

[12] When T is finite, we can use the central limit theorem in Kelejian and Prucha (2001). When T is large, we can use the central limit theorem in Yu et al. (2006).
[13] When n is finite and T is large, Assumption 8 is "(1/n²)[tr(C_n^s C_n^s) tr(D_n^s D_n^s) − tr²(C_n^s D_n^s)] > 0".
[14] See (C.10) in Yu et al. (2006) for the case where T is large. When T is finite, it still holds according to Lee (2004).
Proof. See Appendix A.5.

Also, under Assumptions 1-7, we have (1/(n(T−1))) ∂² ln L_{n,T}(θ)/∂θ∂θ' − (1/(n(T−1))) ∂² ln L_{n,T}(θ0)/∂θ∂θ' = ||θ − θ0|| · O_p(1) and (1/(n(T−1))) ∂² ln L_{n,T}(θ0)/∂θ∂θ' − ∂²Q_{n,T}(θ0)/∂θ∂θ' = O_p(1/√(n(T−1))).[15] Combined with Claim 2, we have the following theorem for the distribution of θ̂_nT.

Theorem 2. Under Assumptions 1-6 and 7(a), or 1-6, 7(b) and 8, for the extremum estimator θ̂_nT derived from (2.5),

√(n(T−1)) (θ̂_nT − θ0) →d N(0, Σ_{θ0}^(−1)(Σ_{θ0} + Ω_{θ0})Σ_{θ0}^(−1)).  (2.7)

Additionally, if {v_it}, i = 1, 2, ..., n and t = 1, 2, ..., T, are normal, (2.7) becomes √(n(T−1)) (θ̂_nT − θ0) →d N(0, Σ_{θ0}^(−1)).

Proof. See Appendix A.6.

Hence, after the data transformation to eliminate the individual effects, the QMLE is consistent and asymptotically normal when either n or T is large.
3 The Direct Approach
For the estimation of the linear panel regression model with fixed individual effects, the ML approach which estimates the fixed effects directly provides consistent estimates of the regression coefficients, which are known as the within estimates. For the spatial panel model with fixed individual effects, one may wonder whether or not the ML approach will yield consistent estimates when T is small. As we will see below, this direct approach will yield the same consistent estimator as the transformation approach for ζ0 = (β0', λ0, ρ0)'; however, the estimator of σ0² is inconsistent unless T is large.
3.1 The Likelihood Function
The likelihood function for the model before transformation, (2.1), is

ln L^d_{n,T}(θ, c_n) = −(nT/2) ln 2π − (nT/2) ln σ² + T[ln |S_n(λ)| + ln |R_n(ρ)|] − (1/(2σ²)) Σ_{t=1}^{T} V'_nt(ζ) V_nt(ζ),  (3.1)

where V_nt(ζ) = R_n(ρ)[S_n(λ) Y_nt − X_nt β − c_n]. We can estimate c_n directly and conduct the asymptotic analysis of the estimator of θ0 via the concentrated likelihood function.

Using the first order condition ∂ ln L^d_{n,T}(θ, c_n)/∂c_n = (1/σ²) R'_n(ρ) Σ_{t=1}^{T} V_nt(ζ), we have ĉ_nT(ζ) = (1/T) Σ_{t=1}^{T} (S_n(λ) Y_nt − X_nt β), and the concentrated likelihood is

ln L^d_{n,T}(θ) = −(nT/2) ln 2π − (nT/2) ln σ² + T[ln |S_n(λ)| + ln |R_n(ρ)|] − (1/(2σ²)) Σ_{t=1}^{T} Ṽ'_nt(ζ) Ṽ_nt(ζ),  (3.2)

[15] See (C.7) and (C.8) in Yu et al. (2006) for the case where T is large. When T is finite, it still holds according to Lee (2004).
with Ṽ_nt(ζ) being the same as in (2.5). One may compare the concentrated likelihood function in (3.2) with the likelihood function from the transformation approach in (2.5). We see that the difference is in the use of T in (3.2) but (T−1) in (2.5). For large T, the two functions can be very close to each other. Therefore, we may expect that the estimates of ζ0 from these two approaches could be asymptotically equivalent when T is large. The interesting comparison is for the case where T is finite.
3.2 Asymptotic Properties
For (3.2), we can further concentrate out β and σ² and focus on (λ, ρ). Denote

β̂^d_nT(λ, ρ) = [Σ_{t=1}^{T} X̃'_nt R'_n(ρ) R_n(ρ) X̃_nt]^(−1) [Σ_{t=1}^{T} X̃'_nt R'_n(ρ) R_n(ρ) S_n(λ) Ỹ_nt],

σ̂^d2_nT(λ, ρ) = (1/(nT)) Σ_{t=1}^{T} [S_n(λ) Ỹ_nt − X̃_nt β̂^d_nT(λ, ρ)]' R'_n(ρ) R_n(ρ) [S_n(λ) Ỹ_nt − X̃_nt β̂^d_nT(λ, ρ)].

The concentrated log likelihood function of (λ, ρ) from the direct approach is

ln L^d_{n,T}(λ, ρ) = −(nT/2)(ln 2π + 1) − (nT/2) ln σ̂^d2_nT(λ, ρ) + T[ln |S_n(λ)| + ln |R_n(ρ)|],  (3.3)

and its counterpart from the transformation approach, obtained by concentrating (2.5), is

ln L_{n,T}(λ, ρ) = −(n(T−1)/2)(ln 2π + 1) − (n(T−1)/2) ln σ̂²_nT(λ, ρ) + (T−1)[ln |S_n(λ)| + ln |R_n(ρ)|].  (3.4)

By comparing (3.3) and (3.4), we can see that they will yield the same maximizer (λ̂_nT, ρ̂_nT). As β̂^d_nT(λ, ρ) has the same expression as β̂_nT(λ, ρ), we can conclude that the QMLE of ζ0 = (β0', λ0, ρ0)' from this direct approach will yield the same consistent estimate as the transformation approach. However, the estimation of σ0² from the direct approach will not be consistent unless T is large, which can be seen from σ̂^d2_nT(λ, ρ) and σ̂²_nT(λ, ρ).[16]
Hence, the ML estimation of the spatial panel model with fixed individual effects shares some common features in its estimates with those of the ML estimation of the linear panel regression model with fixed effects.[17]

[16] Note that, for the linear panel regression model with fixed effects, while the within estimates of the regression coefficients are consistent, the corresponding MLE of σ0² is not, which is the consequence of the incidental parameters problem (Neyman and Scott, 1948).
[17] As the bias of the direct estimate of σ0² is due to the degrees of freedom being (T−1) instead of T, one may easily correct the biased estimate to a bias corrected estimate. The bias corrected estimator will become the conditional likelihood estimator in this model.
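The degrees-of-freedom nature of this bias can be seen in the simplest special case, the within regression with λ0 = ρ0 = 0. In the sketch below (our own illustration, with simulated data), the direct MLE divides the within residual sum of squares by nT rather than n(T−1), so rescaling by T/(T−1) recovers the transformation-approach estimate exactly:

```python
import numpy as np

# Sketch: with lambda0 = rho0 = 0 and no regressors, the model reduces to
# y_it = c_i + v_it. The direct MLE of sigma0^2 divides the within RSS by
# nT; the transformation (conditional likelihood) estimator divides the
# same RSS by n(T-1). Their ratio is exactly (T-1)/T, so multiplying the
# direct estimate by T/(T-1) removes the degrees-of-freedom bias.
rng = np.random.default_rng(3)
n, T = 200, 3
c = rng.standard_normal((n, 1))            # individual effects
Y = c + rng.standard_normal((n, T))        # sigma0^2 = 1
rss = np.sum((Y - Y.mean(axis=1, keepdims=True)) ** 2)
sigma2_direct = rss / (n * T)              # plim (T-1)/T * sigma0^2 as n grows
sigma2_transf = rss / (n * (T - 1))        # consistent as n grows
print(np.isclose(sigma2_direct * T / (T - 1), sigma2_transf))
```

With T = 3, the direct estimate converges to 2/3 of the true variance, which is why the bias does not vanish unless T is large.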
4 A General Model With Time Effects: Transformation Approach
Both Baltagi et al. (2003) and Kapoor et al. (2007) focus on models with only individual effects. In the panel data literature, there are also two-way error component regression models where we have not only unobservable individual effects but also unobservable time effects (see Wallace and Hussain (1969), Amemiya (1971), Nerlove (1971) and Hahn and Moon (2006), etc.). Hence, it is natural to generalize the model to include both individual effects and time effects. This would be useful for empirical applications where the time dummy effects might be important and should be taken into account, for example, in growth theory and regional economics (see Ertur and Koch (2007) and Foote (2007) for recent empirical applications of panel data models with both time dummy effects and spatial effects). Hence, we generalize (2.1) to

Y_nt = λ0 W_n Y_nt + X_nt β0 + c_n0 + α_t0 l_n + U_nt,  U_nt = ρ0 M_n U_nt + V_nt,  t = 1, 2, ..., T,  (4.1)

where α_t0 is the fixed time effect. For (4.1), we may first eliminate the individual effects by F_{T,T−1}, similar to (2.3), which yields

Y*_nt = λ0 W_n Y*_nt + X*_nt β0 + α*_t l_n + U*_nt,  U*_nt = ρ0 M_n U*_nt + V*_nt,  t = 1, 2, ..., T−1,  (4.2)

where [α*_1 l_n, α*_2 l_n, ..., α*_{T−1} l_n] = [α_1 l_n, α_2 l_n, ..., α_T l_n] F_{T,T−1} can be considered as the transformed time effects. We can make a further transformation of (4.2) to eliminate the transformed time effects. This further transformation approach is investigated in this section. Alternatively, we can estimate the α*_t directly; Section 5 covers the direct approach where we estimate the transformed time effects directly. Furthermore, we might be interested in investigating the estimators when we estimate both time effects and individual effects directly. This is also discussed in Section 5.
4.1 Data Transformation and the Likelihood Function
To eliminate the time dummy effects, we need W_n and M_n to be row normalized for analytical purposes.[18] Also, Assumption 4 is changed accordingly. Let J_n = I_n − (1/n) l_n l_n' be the deviation from the group mean transformation over spatial units.

Assumption 1'. W_n and M_n are row normalized nonstochastic spatial weights matrices.

Assumption 4'. The elements of X_nt are nonstochastic and bounded, uniformly in n and t. Also, under the setting in Assumption 6, the limit of (1/(nT)) Σ_{t=1}^{T} X̃'_nt J_n X̃_nt exists and is nonsingular.

Let [F_{n,n−1}, l_n/√n] be the orthonormal matrix of eigenvectors of J_n, where F_{n,n−1} corresponds to the eigenvalues of one and l_n/√n corresponds to the eigenvalue zero. Similar to Lee and Yu (2007a), we can transform the n-dimensional vector Y*_nt to an (n−1)-dimensional vector Y**_nt such that Y**_nt = F'_{n,n−1} Y*_nt.

[18] When W_n and M_n are not row normalized, we can still eliminate the transformed time effects; however, we will not have the presentation of (4.3).
Hence, (4.2) will be transformed into

Y**_nt = λ0 (F'_{n,n−1} W_n F_{n,n−1}) Y**_nt + X**_nt β0 + U**_nt,  U**_nt = ρ0 (F'_{n,n−1} M_n F_{n,n−1}) U**_nt + V**_nt,  (4.3)

where X**_nt,k = F'_{n,n−1} X*_nt,k and V**_nt = F'_{n,n−1} V*_nt. After the transformations, the effective sample size is now (n−1)(T−1). Because

(V**'_n1, ..., V**'_n,T−1)' = (I_{T−1} ⊗ F'_{n,n−1})(V*'_n1, ..., V*'_n,T−1)' = (I_{T−1} ⊗ F'_{n,n−1})(F'_{T,T−1} ⊗ I_n)(V'_n1, ..., V'_nT)' = (F'_{T,T−1} ⊗ F'_{n,n−1})(V'_n1, ..., V'_nT)',

we have

E[(V**'_n1, ..., V**'_n,T−1)'(V**'_n1, ..., V**'_n,T−1)] = σ0² (F'_{T,T−1} ⊗ F'_{n,n−1})(F_{T,T−1} ⊗ F_{n,n−1}) = σ0² (I_{T−1} ⊗ I_{n−1}) = σ0² I_{(n−1)(T−1)}.

Hence, the v**_it's are uncorrelated for all i and t (and independent under normality), where v**_it is the ith element of V**_nt.
The likelihood function for (4.3) is
$$\begin{aligned}\ln L_{n,T}(\theta) = {} & -\frac{(n-1)(T-1)}{2}\ln 2\pi - \frac{(n-1)(T-1)}{2}\ln\sigma^2 + (T-1)\ln\left|I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1}\right| \\ & + (T-1)\ln\left|I_{n-1} - \rho F_{n,n-1}' M_n F_{n,n-1}\right| - \frac{1}{2\sigma^2}\sum_{t=1}^{T-1} V_{nt}^{**\prime}(\theta)V_{nt}^{**}(\theta), \end{aligned} \tag{4.4}$$
where $V_{nt}^{**}(\theta) = R_n^{*}(\rho)[(I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1})Y_{nt}^{**} - X_{nt}^{**}\beta]$, $R_n^{*}(\rho) = I_{n-1} - \rho F_{n,n-1}' M_n F_{n,n-1}$, and the determinant and inverse of $(I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1})$ are
$$\left|I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1}\right| = \frac{1}{1-\lambda}\left|I_n - \lambda W_n\right|, \qquad (I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1})^{-1} = F_{n,n-1}'(I_n - \lambda W_n)^{-1}F_{n,n-1},$$
and similarly for $(I_{n-1} - \rho F_{n,n-1}' M_n F_{n,n-1})$ (see Lee and Yu (2007a)). For any $n$-dimensional column vectors $p_{nt}$ and $q_{nt}$, as $J_n(p_{n1}, \cdots, p_{nT})J_T = J_n(\tilde p_{n1}, \cdots, \tilde p_{nT})$, the likelihood function (4.4) is numerically identical to
$$\begin{aligned}\ln L_{n,T}(\theta) = {} & -\frac{(n-1)(T-1)}{2}\ln 2\pi - \frac{(n-1)(T-1)}{2}\ln\sigma^2 - (T-1)\ln(1-\lambda) - (T-1)\ln(1-\rho) \\ & + (T-1)\ln|S_n(\lambda)| + (T-1)\ln|R_n(\rho)| - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta), \end{aligned} \tag{4.5}$$
where $\tilde V_{nt}(\theta) = R_n(\rho)[(I_n - \lambda W_n)\tilde Y_{nt} - \tilde X_{nt}\beta]$.^{19}

^{19} We note that this likelihood function is, in general, not necessarily a conditional likelihood, as the sample average over spatial units at each $t$ might not be a sufficient statistic for the time dummy.
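The two algebraic facts used in this transformation are easy to verify numerically. The following sketch (our own illustration, not code from the paper) builds $J_n$ for a small row-normalized $W_n$, extracts $F_{n,n-1}$ from its eigenvectors, and checks the determinant identity $|I_{n-1} - \lambda F_{n,n-1}'W_nF_{n,n-1}| = \frac{1}{1-\lambda}|I_n - \lambda W_n|$:

```python
import numpy as np

# A sketch (ours, not the authors' code) verifying the facts above for a small
# row-normalized W_n: the eigenvalue-one eigenvectors of J_n form F_{n,n-1},
# and |I_{n-1} - lam F' W F| = |I_n - lam W| / (1 - lam).
n, lam = 4, 0.3
rng = np.random.default_rng(0)
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)       # row normalization (Assumption 1')
l = np.ones((n, 1))
J = np.eye(n) - l @ l.T / n             # J_n: eigenvalue one (n-1 times), zero (once)
vals, vecs = np.linalg.eigh(J)
F = vecs[:, np.isclose(vals, 1.0)]      # F_{n,n-1}

assert np.allclose(F.T @ F, np.eye(n - 1))   # orthonormal columns
assert np.allclose(F.T @ l, 0.0)             # orthogonal to l_n
lhs = np.linalg.det(np.eye(n - 1) - lam * F.T @ W @ F)
rhs = np.linalg.det(np.eye(n) - lam * W) / (1 - lam)
assert np.isclose(lhs, rhs)
```

The identity holds because $W_n l_n = l_n$ under row normalization, so in the orthonormal basis $(F_{n,n-1}, l_n/\sqrt{n})$ the matrix $I_n - \lambda W_n$ becomes block lower-triangular with scalar block $1-\lambda$.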
4.2 Asymptotic Properties

The first and second order derivatives of (4.5) are (C.1) and (C.2) in Appendix C.1. From (C.1) and (C.2), the score is in (C.3) and the information matrix $\Sigma_{\theta_0,nT} = -\mathrm{E}\left[\frac{1}{(n-1)(T-1)}\frac{\partial^2 \ln L_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'}\right]$ is in (C.4). The following assumptions provide conditions for global identification. Denote
$$H_{nT}(\rho) = \frac{1}{(n-1)(T-1)}\sum_{t=1}^{T}(\tilde X_{nt},\, G_n\tilde X_{nt}\beta_0)' R_n'(\rho) J_n R_n(\rho) (\tilde X_{nt},\, G_n\tilde X_{nt}\beta_0),$$
$$\sigma_n^2(\rho) = \frac{\sigma_0^2}{n-1}\operatorname{tr}\!\left[(R_n(\rho)R_n^{-1})' J_n (R_n(\rho)R_n^{-1})\right],$$
$$\sigma_n^2(\lambda,\rho) = \frac{\sigma_0^2}{n-1}\operatorname{tr}\!\left[(R_n(\rho)S_n(\lambda)S_n^{-1}R_n^{-1})' J_n (R_n(\rho)S_n(\lambda)S_n^{-1}R_n^{-1})\right].$$
Assumption 7′. Either (a) the limit of $H_{nT}(\rho)$ is nonsingular for each possible $\rho$ in $P$ and the limit of $\frac{1}{n-1}\ln\left|\sigma_0^2 R_n^{-1\prime}J_nR_n^{-1}\right| - \frac{1}{n-1}\ln\left|\sigma_n^2(\rho)R_n^{-1}(\rho)'J_nR_n^{-1}(\rho)\right|$ is not zero^{20} for $\rho \ne \rho_0$; or (b) the limit of $\frac{1}{n-1}\ln\left|\sigma_0^2 R_n^{-1\prime}S_n^{-1\prime}J_nS_n^{-1}R_n^{-1}\right| - \frac{1}{n-1}\ln\left|\sigma_n^2(\lambda,\rho)R_n^{-1}(\rho)'S_n^{-1}(\lambda)'J_nS_n^{-1}(\lambda)R_n^{-1}(\rho)\right|$ is not zero for $(\lambda,\rho) \ne (\lambda_0,\rho_0)$.^{21}
Assumption 8′. The limit of $\frac{1}{(n-1)^2}\left[\operatorname{tr}(C_n^sC_n^s)\operatorname{tr}(D_n^sD_n^s) - \operatorname{tr}^2(C_n^sD_n^s)\right]$ is strictly positive, where $C_n = J_nG_n - \frac{\operatorname{tr}J_nG_n}{n-1}I_n$ and $D_n = J_nH_n - \frac{\operatorname{tr}J_nH_n}{n-1}I_n$.^{22}
The variance matrix of $\frac{1}{\sqrt{(n-1)(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta}$ is equal to
$$\mathrm{E}\left[\frac{1}{(n-1)(T-1)}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta}\cdot\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta'}\right] = \Sigma_{\theta_0,nT} + \Omega_{\theta_0,n},$$
where (with entries marked $*$ filled in by symmetry)
$$\Omega_{\theta_0,n} = \frac{\mu_4 - 3\sigma_0^4}{\sigma_0^4}\begin{pmatrix}
0_{k_X\times k_X} & * & * & * \\
0_{1\times k_X} & \frac{1}{n-1}\sum_{i=1}^{n}[(J_n\bar G_n)_{ii}]^2 & * & * \\
0_{1\times k_X} & \frac{1}{n-1}\sum_{i=1}^{n}[(J_n\bar G_n)_{ii}(J_nH_n)_{ii}] & \frac{1}{n-1}\sum_{i=1}^{n}[(J_nH_n)_{ii}]^2 & * \\
0_{1\times k_X} & \frac{1}{2\sigma_0^2(n-1)}\operatorname{tr}(J_n\bar G_n) & \frac{1}{2\sigma_0^2(n-1)}\operatorname{tr}(J_nH_n) & \frac{1}{4\sigma_0^4}
\end{pmatrix}.$$
The asymptotics of the transformation approach with both time and individual effects eliminated can be obtained similarly to Theorem 2.

Theorem 3 Under Assumptions 1′, 2, 3, 4′, 5, 6 and 7′(a); or 1′, 2, 3, 4′, 5, 6, 7′(b) and 8′, for the extremum estimator $\hat\theta_{nT}$ derived from (4.5),
$$\sqrt{(n-1)(T-1)}(\hat\theta_{nT} - \theta_0) \xrightarrow{d} N\!\left(0,\ \Sigma_{\theta_0}^{-1}(\Sigma_{\theta_0} + \Omega_{\theta_0})\Sigma_{\theta_0}^{-1}\right). \tag{4.6}$$
^{20} When $n$ is finite and $T$ is large, this inequality becomes $\frac{1}{n-1}\ln\left|\sigma_0^2 R_n^{-1\prime}J_nR_n^{-1}\right| - \frac{1}{n-1}\ln\left|\sigma_n^2(\rho)R_n^{-1}(\rho)'J_nR_n^{-1}(\rho)\right| \ne 0$.

^{21} The inequality will be $\frac{1}{n-1}\ln\left|\sigma_0^2 R_n^{-1\prime}S_n^{-1\prime}J_nS_n^{-1}R_n^{-1}\right| - \frac{1}{n-1}\ln\left|\sigma_n^2(\lambda,\rho)R_n^{-1}(\rho)'S_n^{-1}(\lambda)'J_nS_n^{-1}(\lambda)R_n^{-1}(\rho)\right| \ne 0$ when $n$ is finite and $T$ is large. When $M_n = W_n$ and $\lambda_0 \ne \rho_0$, this condition would not be satisfied, as $(\lambda_0,\rho_0)$ and $(\rho_0,\lambda_0)$ could not be distinguished from each other. Identification will rely on either Assumption 7′(a) or extra information on the orders of magnitude of $\lambda_0$ and $\rho_0$.

^{22} When $n$ is finite and $T$ is large, Assumption 8′ is "$\frac{1}{(n-1)^2}\left[\operatorname{tr}(C_n^sC_n^s)\operatorname{tr}(D_n^sD_n^s) - \operatorname{tr}^2(C_n^sD_n^s)\right] > 0$".
Additionally, if $\{v_{it}\}$, $i = 1, 2, \dots, n$ and $t = 1, 2, \dots, T$, are normal, (4.6) becomes
$$\sqrt{(n-1)(T-1)}(\hat\theta_{nT} - \theta_0) \xrightarrow{d} N(0,\ \Sigma_{\theta_0}^{-1}).$$
Proof. See Appendix C.2.

Hence, after the data transformation to eliminate both the individual effects and time effects, the QMLE is consistent and asymptotically normal when either $n$ or $T$ is large.
5 A General Model With Time Effects: Direct Approaches

5.1 Direct Approach I: Estimation of Transformed Time Effects

Given (4.2), where the individual effects are eliminated and the time effects are still present, when $n \to \infty$ and $T$ might be finite or large, we can estimate the transformed time effects consistently. Denote $\alpha_T^* = (\alpha_1^*, \alpha_2^*, \cdots, \alpha_T^*)$; the likelihood function for (4.2) is
$$\ln L_{n,T}^d(\theta, \alpha_T^*) = -\frac{n(T-1)}{2}\ln 2\pi - \frac{n(T-1)}{2}\ln\sigma^2 + (T-1)\left[\ln|S_n(\lambda)| + \ln|R_n(\rho)|\right] - \frac{1}{2\sigma^2}\sum_{t=1}^{T-1} V_{nt}^{*\prime}(\theta,\alpha_T^*)V_{nt}^{*}(\theta,\alpha_T^*), \tag{5.1}$$
where $V_{nt}^{*}(\theta,\alpha_T^*) = R_n(\rho)[S_n(\lambda)Y_{nt}^{*} - X_{nt}^{*}\beta - \alpha_t^* l_n]$. By using the first order condition, given $\theta$, the estimate of $\alpha_t^*$ is $\hat\alpha_t^*(\theta) = (l_n'R_n'(\rho)R_n(\rho)l_n)^{-1} l_n'R_n'(\rho)R_n(\rho)(S_n(\lambda)Y_{nt}^{*} - X_{nt}^{*}\beta)$. Using $R_n(\rho)l_n = (1-\rho)l_n$, the likelihood function with $\alpha_T^*$ concentrated out is
$$\ln L_{n,T}^d(\theta) = -\frac{n(T-1)}{2}\ln 2\pi - \frac{n(T-1)}{2}\ln\sigma^2 + (T-1)\left[\ln|S_n(\lambda)| + \ln|R_n(\rho)|\right] - \frac{1}{2\sigma^2}\sum_{t=1}^{T-1} V_{nt}^{*\prime}(\theta)J_nV_{nt}^{*}(\theta), \tag{5.2}$$
where $V_{nt}^{*}(\theta) = R_n(\rho)[S_n(\lambda)Y_{nt}^{*} - X_{nt}^{*}\beta]$. For any $n$-dimensional column vectors $p_{nt}$ and $q_{nt}$, as $\sum_{t=1}^{T-1} p_{nt}^{*\prime}J_nq_{nt}^{*} = \sum_{t=1}^{T}\tilde p_{nt}'J_n\tilde q_{nt}$, the likelihood function (5.2) is numerically identical to
$$\ln L_{n,T}^d(\theta) = -\frac{n(T-1)}{2}\ln 2\pi - \frac{n(T-1)}{2}\ln\sigma^2 + (T-1)\left[\ln|S_n(\lambda)| + \ln|R_n(\rho)|\right] - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta). \tag{5.3}$$
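The concentration step can be illustrated numerically. The sketch below (our own small example, not the paper's code) checks that, because $M_nl_n = l_n$ for a row-normalized $M_n$ and hence $R_n(\rho)l_n = (1-\rho)l_n$, concentrating out $\alpha_t^*$ amounts to demeaning $R_n(\rho)[S_n(\lambda)Y_{nt}^* - X_{nt}^*\beta]$ with $J_n$:

```python
import numpy as np

# A sketch (ours): concentrating out the time effect alpha*_t is the same as
# applying J_n, since R_n(rho) l_n = (1 - rho) l_n for row-normalized M_n.
n, rho = 5, 0.4
rng = np.random.default_rng(1)
M = rng.random((n, n))
np.fill_diagonal(M, 0.0)
M /= M.sum(axis=1, keepdims=True)        # row normalization
l = np.ones((n, 1))
R = np.eye(n) - rho * M
assert np.allclose(R @ l, (1 - rho) * l)

z = rng.normal(size=(n, 1))              # stands in for S_n(lam) Y*_nt - X*_nt beta
Rl = R @ l
alpha = (Rl.T @ R @ z).item() / (Rl.T @ Rl).item()   # concentrated alpha*_t(theta)
resid = R @ (z - alpha * l)              # V*_nt at the concentrated alpha*_t
J = np.eye(n) - np.ones((n, n)) / n
assert np.allclose(resid, J @ R @ z)     # identical to demeaning R_n(rho) z with J_n
```

This is exactly why the concentrated likelihood (5.2) involves $V_{nt}^{*\prime}(\theta)J_nV_{nt}^{*}(\theta)$ rather than the unconcentrated quadratic form.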
For the concentrated likelihood function (5.3), the first and second order derivatives are in (D.1) and (D.2) in Appendix D.1.

From Sections 2 and 3, we can see that for the SAR panel data model with only individual effects, both the transformation approach and the direct approach yield the same consistent estimator of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$. But the direct approach does not yield a consistent estimator of $\sigma_0^2$ as the transformation approach does, unless $T$ is large. However, for the SAR panel model with both individual and time effects, this direct approach will not yield any consistent estimator unless $n$ is large.
For the SAR panel data with both individual and time effects, one can see the difference between the two approaches via their log likelihood functions in (4.5) and (5.3). For the direct approach, the concentrated likelihood (5.3) does not adjust the degrees of freedom in the spatial dimension $n$, and also does not adjust the components on the determinants of $S_n(\lambda)$ and $R_n(\rho)$, while the likelihood of the transformation approach in (4.5) does. These differences result in inconsistent estimates of $\lambda_0$ and $\rho_0$, in addition to that of $\sigma_0^2$. Because the estimate of $\beta_0$ depends on the estimates of $\lambda_0$ and $\rho_0$, it is also inconsistent. To be convincing, the inconsistency of the QMLE with a finite (small) $n$ can be revealed by investigating the probability limit of the normalized gradient vector $\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\theta}$ from (D.1) and comparing it with $\frac{1}{(n-1)(T-1)}\frac{\partial\ln L_{nT}(\theta_0)}{\partial\theta}$ from (C.1) of the transformation approach. As the one from (C.1) is zero because the transformation approach is consistent, the differences are in the derivatives with respect to $\lambda$, $\rho$ and $\sigma^2$. For simplicity, let plim and lim denote limits taken as at least one of $n$ and $T$ goes to infinity. We have
$$\operatorname{plim}\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\lambda} = -\frac{1}{1-\lambda_0}\lim\frac{1}{n},\qquad \operatorname{plim}\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\rho} = -\frac{1}{1-\rho_0}\lim\frac{1}{n},$$
$$\operatorname{plim}\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\sigma^2} = -\frac{1}{2\sigma_0^2}\lim\frac{1}{n}.$$
These three limits are, in general, not zero unless $n$ is large. When $n$ is finite, $\theta_0$ does not solve the equation $\operatorname{plim}\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta)}{\partial\theta} = 0$. The estimator $\hat\theta_{ml}$ which maximizes the concentrated log likelihood $\ln L_{n,T}^d(\theta)$ solves the normal equation $\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\hat\theta_{ml})}{\partial\theta} = 0$. From the asymptotic theory of extremum (or M-) estimation, $\hat\theta_{ml}$ converges in probability to a $\theta_1$ which solves the limiting equation $\operatorname{plim}\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_1)}{\partial\theta} = 0$ (see, e.g., Amemiya (1985), Ch. 4). But $\theta_1 \ne \theta_0$, so the estimates from the concentrated likelihood function are not consistent unless $n$ is large. Compared to Section 3, with time effects included, the direct approach does not give a consistent estimate of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$ when $n$ is finite ($T$ goes to infinity).
5.2 Direct Approach II: Estimation of Both Time and Individual Effects

We can also estimate both the time effects and individual effects directly for (4.1). The likelihood function of (4.1) is
$$\ln L_{n,T}^d(\theta, c_n, \alpha_T) = -\frac{nT}{2}\ln 2\pi - \frac{nT}{2}\ln\sigma^2 + T\left[\ln|S_n(\lambda)| + \ln|R_n(\rho)|\right] - \frac{1}{2\sigma^2}\sum_{t=1}^{T} V_{nt}'(\theta,c_n,\alpha_T)V_{nt}(\theta,c_n,\alpha_T), \tag{5.4}$$
where $V_{nt}(\theta,c_n,\alpha_T) = R_n(\rho)[S_n(\lambda)Y_{nt} - X_{nt}\beta - c_n - \alpha_t l_n]$. Using the first order conditions for $\alpha_t$ and $c_n$, the likelihood function with both $c_n$ and $\alpha_T$ concentrated out is
$$\ln L_{n,T}^d(\theta) = -\frac{nT}{2}\ln 2\pi - \frac{nT}{2}\ln\sigma^2 + T\left[\ln|S_n(\lambda)| + \ln|R_n(\rho)|\right] - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta). \tag{5.5}$$
For (5.5), the first and second order derivatives are, respectively, (D.5) and (D.6) in Appendix D.2.

The concentrated likelihood estimates of $\theta_0$ from (5.5) can be derived from the first order conditions which set the first order derivatives in (D.5) to zero. These first order conditions characterize the concentrated likelihood estimates of the direct approach. Denote these estimates as $\hat\beta_{nd}$, $\hat\lambda_{nd}$, $\hat\rho_{nd}$ and $\hat\sigma_{nd}^2$. For the direct estimation of the transformed time effects in Section 5.1, the estimates, denoted by $\tilde\beta_{nd}$, $\tilde\lambda_{nd}$, $\tilde\rho_{nd}$ and $\tilde\sigma_{nd}^2$, are characterized by the first order conditions with (D.1). We see that these two sets of first order conditions are the same except that the parameter $\sigma^2$ in (D.5) is replaced by $\frac{T-1}{T}\sigma^2$ in (D.1).^{23} Thus, it follows that $(\tilde\beta_{nd}', \tilde\lambda_{nd}, \tilde\rho_{nd}) = (\hat\beta_{nd}', \hat\lambda_{nd}, \hat\rho_{nd})$ and $\tilde\sigma_{nd}^2 = \frac{T-1}{T}\hat\sigma_{nd}^2$. From Section 5.1, the direct estimation of transformed time effects yields inconsistent estimators for all the parameters unless $n$ is large. If we estimate both the time effects and individual effects directly, the consistency of $\delta_0$ requires that $n$ is large, and the consistency of $\sigma_0^2$ requires that both $n$ and $T$ are large.^{24}
6 Monte Carlo

We conduct a small Monte Carlo experiment to evaluate the performance of our transformation approach and the direct ML estimators under different settings. We first check the case where there are individual effects but no time effects in the DGP (see (2.1)), comparing the transformation approach in Section 2 with the direct approach in Section 3. Then, we check the case where time effects are also included in the DGP (see (4.1)), comparing the transformation approach in Section 4 with the direct approaches in Section 5.

We first generate samples from (2.1):
$$Y_{nt} = \lambda_0 W_n Y_{nt} + X_{nt}\beta_0 + c_{n0} + U_{nt},\qquad U_{nt} = \rho_0 M_n U_{nt} + V_{nt},\qquad t = 1, 2, \dots, T,$$
using $\theta_0^a = (1.0, 0.2, 0.5, 1)'$ and $\theta_0^b = (1, 0.5, 0.2, 1)'$, where $\theta_0 = (\beta_0', \lambda_0, \rho_0, \sigma_0^2)'$; $X_{nt}$, $c_{n0}$ and $V_{nt}$ are generated from independent standard normal distributions, and the spatial weights matrices $W_n$ and $M_n$ are the same rook matrices.^{25} We use $T = 5, 10, 50$ and $n = 9, 16, 49$. For each set of generated sample observations, we calculate the ML estimator $\hat\theta_{nT}$ and evaluate the bias $\hat\theta_{nT} - \theta_0$. We do this 1000 times to get $\frac{1}{1000}\sum_{i=1}^{1000}(\hat\theta_{nT} - \theta_0)_i$. With two different values of $\theta_0$ for each $n$ and $T$, finite sample properties of both estimators are summarized in Table 1. For each case, we report the bias (Bias), empirical standard deviation (E-SD), root mean square error (RMSE) and theoretical standard deviation (T-SD).^{26}

Both approaches have the same estimate of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$, while the estimator of $\sigma_0^2$ by the direct approach has a larger bias. The transformation approach yields a consistent estimator of $\sigma_0^2$ and the direct approach does not, which can be seen from the last two columns of Table 1 when $T$ is small. We can see that the Biases, E-SDs, RMSEs and T-SDs for the estimators of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$ are small when either $n$ or $T$ is large. Also, the T-SDs are similar to the E-SDs, which implies that the Hessian matrix provides proper estimates of the variances of the estimators. Also, when $T$ is larger, the bias of the estimator of $\sigma_0^2$ by the direct approach decreases.

^{23} Instead of the first order conditions, one may also follow the analysis in Section 3 by investigating the two concentrated likelihood functions of $(\lambda, \rho)$, concentrating out $\beta$ and $\sigma^2$.

^{24} For this direct approach, the asymptotic bias will be of the order $O(\max(1/n, 1/T))$, and we can have bias corrected estimators which have centered normal distributions as long as $n/T^3 \to 0$ and $T/n^3 \to 0$. See Appendix D.2 for more details.

^{25} We use the rook matrix based on an $r \times r$ board (so that $n = r^2$). The rook matrix represents a square tessellation with a connectivity of four for the inner fields on the chessboard and two and three for the corner and border fields, respectively. Most empirically observed regional structures in spatial econometrics are made up of regions with connectivity close to the range of the rook tessellation.

^{26} The T-SD is obtained from the diagonal elements of the estimated Hessian matrix.
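As an illustration of this design (our own sketch, not the authors' code), the following builds the row-normalized rook matrix on an $r \times r$ board and draws one panel from the DGP (2.1); the parameter values mirror $\theta_0^a$ with a scalar regressor:

```python
import numpy as np

# A sketch (ours) of the Monte Carlo design: rook contiguity on an r x r board
# (inner cells have 4 neighbors, borders 3, corners 2), then one draw from
# Y_nt = lam W Y_nt + X_nt beta + c_n + U_nt,  U_nt = rho M U_nt + V_nt.
def rook_matrix(r):
    n = r * r
    W = np.zeros((n, n))
    for i in range(r):
        for j in range(r):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < r and 0 <= jj < r:
                    W[i * r + j, ii * r + jj] = 1.0
    return W / W.sum(axis=1, keepdims=True)   # row normalization

def simulate(W, M, beta, lam, rho, sig2, T, rng):
    n = W.shape[0]
    S_inv = np.linalg.inv(np.eye(n) - lam * W)
    R_inv = np.linalg.inv(np.eye(n) - rho * M)
    c = rng.standard_normal(n)                # individual effects c_n0
    X = rng.standard_normal((T, n))
    V = np.sqrt(sig2) * rng.standard_normal((T, n))
    Y = np.empty((T, n))
    for t in range(T):
        U = R_inv @ V[t]                      # SAR disturbance
        Y[t] = S_inv @ (X[t] * beta + c + U)  # reduced form
    return Y, X

W = rook_matrix(3)                            # n = 9, the smallest design
counts = (W > 0).sum(axis=1)
assert counts.min() == 2 and counts.max() == 4
Y, X = simulate(W, W, beta=1.0, lam=0.2, rho=0.5, sig2=1.0,
                T=5, rng=np.random.default_rng(0))
assert Y.shape == (5, 9) and X.shape == (5, 9)
```

The estimation step itself (maximizing (2.5) or (3.2) over $\theta$) is omitted here; the sketch only reproduces the data-generating side of the experiment.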
We then generate samples from (4.1):
$$Y_{nt} = \lambda_0 W_n Y_{nt} + X_{nt}\beta_0 + c_{n0} + \alpha_t l_n + U_{nt},\qquad U_{nt} = \rho_0 M_n U_{nt} + V_{nt},\qquad t = 1, 2, \dots, T,$$
using the same $n$, $T$, $\theta_0^a$, $\theta_0^b$, $W_n$ and $M_n$. The $X_{nt}$, $c_{n0}$, $\alpha_{T0} = (\alpha_1, \alpha_2, \cdots, \alpha_T)$ and $V_{nt}$ are generated from independent standard normal distributions. The finite sample properties of the estimators are summarized in Tables 2 and 3, where Table 2 reports the performance of the estimators using the transformation approach in Section 4 and Table 3 reports the performance of the estimators using both direct approaches discussed in Sections 5.1 and 5.2. We can see that the bias of the transformation approach is small. For the approach that estimates the transformed time effects directly, the bias is small when $n$ is large, while the bias is large when $n$ is small even though $T$ might be large. The direct approach has the same estimate of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$ as the approach that estimates the transformed time effects directly, while the bias for the estimate of $\sigma_0^2$ is small only when both $n$ and $T$ are large. This is consistent with the theoretical prediction. Also, when both $n$ and $T$ are large, the biases of all the parameters from the three approaches are small and the RMSEs are reduced.

Tables 1-3 here.
7 Conclusion

In this paper, we consider the estimation of a SAR panel model with fixed effects and SAR disturbances where the number of time periods $T$ and/or the number of spatial units $n$ can be finite or large in all combinations, except that both $T$ and $n$ are finite.

We first consider the SAR panel model with individual effects. If $T$ is finite but $n$ is large, we show that direct ML estimation, which jointly estimates all the parameters including the fixed effects, yields consistent estimators except for the variance of disturbances. These features are similar to those of the direct ML estimation of the linear panel regression model with fixed individual effects. In this paper, we suggest a transformation approach, which eliminates the individual fixed effects and provides consistent estimates of all the parameters, including the variance of disturbances. When the individual effects are eliminated by taking deviations from time averages for each spatial unit, the resulting disturbances are correlated over the time dimension and there is linear dependence among them. The transformation approach is motivated by an ML approach which takes into account the generalized inverse of the variance matrix of the resulting disturbances. The transformation approach is shown to be a conditional likelihood approach if the disturbances were normally distributed.

We next consider the SAR model with both individual and time fixed effects. We investigate two possible direct ML approaches for the estimation. The first direct approach transforms the data to eliminate the individual effects and then estimates the remaining parameters, including the time effects, by the ML method. The second direct approach estimates both individual and time effects directly. We show that the first direct ML approach yields inconsistent estimates of all the parameters unless $n$ is large, and the second direct approach yields consistent estimates of all the parameters only when both $n$ and $T$ are large. In fact, these two direct ML approaches provide identical estimates of the spatial effects and the regression coefficients, and differ only in their estimates of $\sigma_0^2$. These results stand in contrast to those of the direct ML estimation of panel regression models with both individual and time effects, where the regression coefficients can be consistently estimated as long as either $n$ or $T$ is large. Consistent estimation based on transformations is available, where both the individual and time effects can be eliminated by proper transformations. All the parameter estimates are then consistent when either $n$ or $T$ is large. Monte Carlo results are provided to illustrate finite sample properties of the various estimators with $n$ and/or $T$ being small or moderately large.

Compared with Baltagi et al. (2003), Baltagi et al. (2007) and Kapoor et al. (2007), where random effects are assumed, the SAR model in this paper considers a fixed effects specification. The proposed estimation methods are robust regardless of the different specifications in Baltagi et al. (2003) and Kapoor et al. (2007), and are computationally simpler than the ML approach for the estimation of the generalized random effects model in Baltagi et al. (2007). However, when the individual effects are random in the true DGP, proper methods which take into account the random effects' variance structure can improve the efficiency of the estimates. A Hausman-type specification test of fixed effects vs. random effects may also be constructed. These may be investigated in future research.
Appendices
A Transformation Approach

A.1 The First and Second Order Derivatives

For the first and second order derivatives of (2.5), we have
$$\frac{\partial\ln L_{n,T}(\theta)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'\tilde V_{nt}(\theta) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'\tilde V_{nt}(\theta) - (T-1)\operatorname{tr}G_n(\lambda) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(H_n(\rho)\tilde V_{nt}(\theta))'\tilde V_{nt}(\theta) - (T-1)\operatorname{tr}H_n(\rho) \\
\frac{1}{2\sigma^4}\sum_{t=1}^{T}\left(\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) - n\frac{T-1}{T}\sigma^2\right)
\end{pmatrix}, \tag{A.1}$$
and, with all sums running over $t = 1, \dots, T$ and entries marked $*$ filled in by symmetry,
$$-\frac{\partial^2\ln L_{n,T}(\theta)}{\partial\theta\,\partial\theta'} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'R_n(\rho)\tilde X_{nt} & * & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'R_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'R_n(\rho)W_n\tilde Y_{nt} + (T-1)\operatorname{tr}(G_n^2(\lambda)) & * & * \\
\frac{1}{\sigma^2}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'R_n(\rho)\tilde X_{nt} + \frac{1}{\sigma^2}\sum_t\tilde V_{nt}'(\theta)M_n\tilde X_{nt} & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'H_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_nW_n\tilde Y_{nt})'\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'H_n(\rho)\tilde V_{nt}(\theta) + (T-1)\operatorname{tr}(H_n^2(\rho)) & * \\
\frac{1}{\sigma^4}\sum_t\tilde V_{nt}'(\theta)R_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^4}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'\tilde V_{nt}(\theta) & \frac{1}{\sigma^4}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'\tilde V_{nt}(\theta) & -\frac{n(T-1)}{2\sigma^4} + \frac{1}{\sigma^6}\sum_t\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta)
\end{pmatrix}. \tag{A.2}$$
At the true $\theta_0$, we have
$$\frac{1}{\sqrt{n(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\bar X_{nt}'\tilde V_{nt} \\
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'\tilde V_{nt} + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\bar G_n'\tilde V_{nt} - \frac{T-1}{T}\sigma_0^2\operatorname{tr}\bar G_n\right) \\
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'H_n'\tilde V_{nt} - \frac{T-1}{T}\sigma_0^2\operatorname{tr}H_n\right) \\
\frac{1}{2\sigma_0^4}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\tilde V_{nt} - n\frac{T-1}{T}\sigma_0^2\right)
\end{pmatrix}, \tag{A.3}$$
and the information matrix is equal to
$$\Sigma_{\theta_0,nT} = -\mathrm{E}\left[\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'}\right] = \frac{1}{\sigma_0^2}\begin{pmatrix} H_{nT} & 0_{(k_X+1)\times 1} & 0_{(k_X+1)\times 1} \\ 0_{1\times(k_X+1)} & 0 & 0 \\ 0_{1\times(k_X+1)} & 0 & 0 \end{pmatrix} + \begin{pmatrix}
0_{k_X\times k_X} & * & * & * \\
0_{1\times k_X} & \frac{1}{n}\operatorname{tr}(\bar G_n^s\bar G_n) & * & * \\
0_{1\times k_X} & \frac{1}{n}\operatorname{tr}(H_n^s\bar G_n) & \frac{1}{n}\operatorname{tr}(H_n^sH_n) & * \\
0_{1\times k_X} & \frac{1}{\sigma_0^2 n}\operatorname{tr}(\bar G_n) & \frac{1}{\sigma_0^2 n}\operatorname{tr}(H_n) & \frac{1}{2\sigma_0^4}
\end{pmatrix}, \tag{A.4}$$
where we denote $A_n^s = A_n' + A_n$ for any $n\times n$ matrix $A_n$, $G_n = W_nS_n^{-1}$, $\bar W_n = R_nW_nR_n^{-1}$, $\bar G_n = \bar W_n(I_n - \lambda_0\bar W_n)^{-1}$, $H_n = M_nR_n^{-1}$, $\bar X_{nt} = R_n\tilde X_{nt}$ and $H_{nT} = \frac{1}{n(T-1)}\sum_{t=1}^{T}(\bar X_{nt},\, \bar G_n\bar X_{nt}\beta_0)'(\bar X_{nt},\, \bar G_n\bar X_{nt}\beta_0)$.
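As a quick numerical sanity check on these definitions (our own sketch, not part of the paper), note that $\bar W_n = R_nW_nR_n^{-1}$ implies $\bar G_n = R_nG_nR_n^{-1}$; the following verifies this for an arbitrary small pair of row-normalized weights matrices:

```python
import numpy as np

# A numerical check (ours): W_bar = R W R^{-1} implies
# G_bar = W_bar (I - lam0 W_bar)^{-1} = R G R^{-1}, with G = W S^{-1}.
n, lam0, rho0 = 4, 0.3, 0.2
rng = np.random.default_rng(2)
W = rng.random((n, n)); np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)
M = rng.random((n, n)); np.fill_diagonal(M, 0.0)
M /= M.sum(axis=1, keepdims=True)
I = np.eye(n)
S, R = I - lam0 * W, I - rho0 * M
G = W @ np.linalg.inv(S)                     # G_n = W_n S_n^{-1}
W_bar = R @ W @ np.linalg.inv(R)             # W_bar = R_n W_n R_n^{-1}
G_bar = W_bar @ np.linalg.inv(I - lam0 * W_bar)
assert np.allclose(G_bar, R @ G @ np.linalg.inv(R))
```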
A.2 Proof of Claim 1

To prove $\frac{1}{n(T-1)}\ln L_{n,T}(\theta) - Q_{n,T}(\theta) \xrightarrow{p} 0$ uniformly in $\theta$ in any compact parameter space $\Theta$: Similarly to Lee (2004) and Yu et al. (2006), we can show that^{27}
$$\frac{1}{n(T-1)}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) - \frac{1}{n(T-1)}\mathrm{E}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) \xrightarrow{p} 0 \quad\text{uniformly in }\theta.$$
Hence, using the fact that $\sigma^2$ is bounded away from zero in $\Theta$,
$$\frac{1}{n(T-1)}\ln L_{n,T}(\theta) - Q_{n,T}(\theta) = -\frac{1}{2\sigma^2}\left(\frac{1}{n(T-1)}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) - \frac{1}{n(T-1)}\mathrm{E}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta)\right) \xrightarrow{p} 0$$
uniformly in $\theta$ in $\Theta$.

To prove $Q_{n,T}(\theta)$ is uniformly equicontinuous in $\theta$ in any compact parameter space $\Theta$: From (2.6),
$$Q_{n,T}(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 + \frac{1}{n}\left[\ln|S_n(\lambda)| + \ln|R_n(\rho)|\right] - \frac{1}{2\sigma^2 n(T-1)}\mathrm{E}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta).$$
The uniform equicontinuity of $Q_{n,T}(\theta)$ can be shown similarly to Lee (2004) and Yu et al. (2006). ∎
A.3 Information Matrix

We can prove the nonsingularity of the limiting information matrix by an argument by contradiction (similar to Lee (2004)). Denote the limit of $\Sigma_{\theta_0,nT}$ in (A.4) by $\Sigma_{\theta_0}$. We need to prove that $\Sigma_{\theta_0}c = 0$ implies $c = 0$, where $c = (c_1', c_2, c_3, c_4)'$, $c_2$, $c_3$, $c_4$ are scalars and $c_1$ is a $k_X \times 1$ vector. If this is true, then the columns of $\Sigma_{\theta_0}$ are linearly independent and $\Sigma_{\theta_0}$ is nonsingular. Denote $H_\beta$ as the limit of $\frac{1}{n(T-1)}\sum_{t=1}^{T}\bar X_{nt}'\bar X_{nt}$, $H_{\beta\lambda}$ as the limit of $\frac{1}{n(T-1)}\sum_{t=1}^{T}\bar X_{nt}'\bar G_n\bar X_{nt}\beta_0$, $H_{\lambda\beta} = H_{\beta\lambda}'$, and $H_\lambda$ as the limit of $\frac{1}{n(T-1)}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'\bar G_n\bar X_{nt}\beta_0$; then^{28}
$$\Sigma_{\theta_0} = \frac{1}{\sigma_0^2}\begin{pmatrix}
H_\beta & H_{\beta\lambda} & 0_{k_X\times 1} & 0_{k_X\times 1} \\
H_{\lambda\beta} & H_\lambda + \lim_{n\to\infty}\frac{\sigma_0^2}{n}\operatorname{tr}(\bar G_n^s\bar G_n) & \lim_{n\to\infty}\frac{\sigma_0^2}{n}\operatorname{tr}(H_n^s\bar G_n) & \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(\bar G_n) \\
0_{1\times k_X} & \lim_{n\to\infty}\frac{\sigma_0^2}{n}\operatorname{tr}(H_n^s\bar G_n) & \lim_{n\to\infty}\frac{\sigma_0^2}{n}\operatorname{tr}(H_n^sH_n) & \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(H_n) \\
0_{1\times k_X} & \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(\bar G_n) & \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(H_n) & \frac{1}{2\sigma_0^2}
\end{pmatrix}.$$
Hence, $\Sigma_{\theta_0}c = 0$ implies

(1) $H_\beta c_1 + H_{\beta\lambda}c_2 = 0$;

^{27} When $n$ is large and $T$ is fixed, the derivation is similar to Lee (2004) for the cross-sectional SAR model. When $T$ is large and $n$ could be finite or large, the derivation is similar to Yu et al. (2006).

^{28} When $n$ is finite and $T$ is large, we do not need the limit before each trace operator in the entries of $\Sigma_{\theta_0}$.
(2) $\frac{1}{\sigma_0^2}H_{\lambda\beta}c_1 + \left(\frac{1}{\sigma_0^2}H_\lambda + \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(\bar G_n^s\bar G_n)\right)c_2 + \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(H_n^s\bar G_n)\,c_3 + \lim_{n\to\infty}\frac{1}{\sigma_0^2 n}\operatorname{tr}(\bar G_n)\,c_4 = 0$;

(3) $\lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(H_n^s\bar G_n)\,c_2 + \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(H_n^sH_n)\,c_3 + \frac{1}{\sigma_0^2}\lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(H_n)\,c_4 = 0$;

(4) $\lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(\bar G_n)\,c_2 + \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(H_n)\,c_3 + \frac{1}{2\sigma_0^2}c_4 = 0$.

The first equation implies $c_1 = -(H_\beta)^{-1}H_{\beta\lambda}c_2$. Denote $C_n = \bar G_n - \frac{\operatorname{tr}\bar G_n}{n}I_n$ and $D_n = H_n - \frac{\operatorname{tr}H_n}{n}I_n$, so that
$$\frac{1}{n}\operatorname{tr}(\bar G_n^s\bar G_n) - 2\left(\frac{\operatorname{tr}\bar G_n}{n}\right)^2 = \frac{1}{2n}\operatorname{tr}(C_n^sC_n^s),\quad \frac{1}{n}\operatorname{tr}(H_n^sH_n) - 2\left(\frac{\operatorname{tr}H_n}{n}\right)^2 = \frac{1}{2n}\operatorname{tr}(D_n^sD_n^s),\quad \frac{1}{n}\operatorname{tr}(H_n^s\bar G_n) - 2\,\frac{\operatorname{tr}H_n}{n}\,\frac{\operatorname{tr}\bar G_n}{n} = \frac{1}{2n}\operatorname{tr}(C_n^sD_n^s).$$
From the third and fourth equations, we have
$$\frac{1}{n}\operatorname{tr}(C_n^sD_n^s)\,c_2 + \frac{1}{n}\operatorname{tr}(D_n^sD_n^s)\,c_3 = 0,$$
$$\frac{4}{n^2}\left[\operatorname{tr}(H_n^sH_n)\operatorname{tr}\bar G_n - \operatorname{tr}(H_n^s\bar G_n)\operatorname{tr}H_n\right]c_2 + \frac{1}{n\sigma_0^2}\operatorname{tr}(D_n^sD_n^s)\,c_4 = 0.$$
By eliminating $c_1$, $c_3$ and $c_4$, the second equation becomes
$$\left\{\lim_{n\to\infty}\left[\frac{1}{\sigma_0^2}\,\frac{1}{n}\operatorname{tr}(D_n^sD_n^s)\left(H_\lambda - H_{\lambda\beta}(H_\beta)^{-1}H_{\beta\lambda}\right) + \Delta_n\right]\right\}c_2 = 0,$$
where
$$\Delta_n = \frac{1}{4n^2}\left[\operatorname{tr}(C_n^sC_n^s)\operatorname{tr}(D_n^sD_n^s) - \operatorname{tr}^2(C_n^sD_n^s)\right] \tag{A.5}$$
is nonnegative by the Cauchy inequality. The term $H_\lambda - H_{\lambda\beta}(H_\beta)^{-1}H_{\beta\lambda}$ is nonnegative by the Schwartz inequality. The nonsingularity of $\Sigma_{\theta_0}$ then follows from Assumption 7. ∎
A.4 Proof of Theorem 1

As $\mathrm{E}\sum_{t=1}^{T}\tilde V_{nt}'\tilde V_{nt} = n(T-1)\sigma_0^2$, at $\theta_0$, (2.6) implies that $\frac{1}{n(T-1)}\mathrm{E}\ln L_{n,T}(\theta_0) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma_0^2 + \frac{1}{n}\left[\ln|S_n| + \ln|R_n|\right] - \frac{1}{2}$.

From the pure SAR panel model with SAR disturbances, using the information inequality, $T_{1,n}(\lambda,\rho,\sigma^2) \le 0$ for any $(\lambda,\rho,\sigma^2)$. Also, $T_{2,n,T}(\beta,\lambda,\rho)$ is a quadratic function of $\beta$ and $\lambda$ given $\rho$.

Under the condition that the limit of $H_{nT}(\rho)$ is nonsingular, $T_{2,n,T}(\beta,\lambda,\rho) > 0$ given any $\rho$ whenever $(\beta,\lambda) \ne (\beta_0,\lambda_0)$. Hence, $(\beta,\lambda)$ is globally identified. Given $\lambda_0$, the parameters $\rho_0$ and $\sigma_0^2$ are the unique maximizers of the limiting function of $T_{1,n}(\lambda,\rho,\sigma^2)$ under the condition that the limit of $\frac{1}{n}\ln\left|\sigma_0^2R_n^{-1\prime}R_n^{-1}\right| - \frac{1}{n}\ln\left|\sigma_n^2(\rho)R_n^{-1}(\rho)'R_n^{-1}(\rho)\right|$ is not zero for $\rho \ne \rho_0$.^{29} Hence, $(\beta,\lambda,\rho,\sigma^2)$ is globally identified.

When the limit of $H_{nT}(\rho)$ is singular, $\beta_0$ and $\lambda_0$ cannot be identified from $T_{2,n,T}(\beta,\lambda,\rho)$. Global identification then requires that the limit of $T_{1,n}(\lambda,\rho,\sigma^2)$ is strictly less than zero. As $T_{1,n}(\lambda,\rho,\sigma^2) \le 0$ by the information inequality for the pure SAR model with SAR disturbances, the limit of $T_{1,n}(\lambda,\rho,\sigma^2)$ being nonzero is equivalent to the limit of $\frac{1}{n}\ln\left|\sigma_0^2R_n^{-1\prime}S_n^{-1\prime}S_n^{-1}R_n^{-1}\right| - \frac{1}{n}\ln\left|\sigma_n^2(\lambda,\rho)R_n^{-1}(\rho)'S_n^{-1}(\lambda)'S_n^{-1}(\lambda)R_n^{-1}(\rho)\right|$ being nonzero (similar to Lee (2004), Proof of Theorem 4.1). After $\lambda_0$, $\rho_0$ and $\sigma_0^2$ are identified, given $\lambda_0$ and $\rho_0$, $\beta_0$ can be identified from $T_{2,n,T}(\beta,\lambda,\rho)$.

Combined with the uniform convergence and equicontinuity in Claim 1, the consistency follows. ∎
A.5 Proof of Claim 2

The central limit theorem for martingale difference arrays can be applied. When $T$ is finite and $n$ is large, we can use the central limit theorem in Kelejian and Prucha (2001). When $T$ is large and $n$ could be finite or large, we can use the central limit theorem in Yu et al. (2006). ∎
A.6 Proof of Theorem 2

By the Taylor expansion, $\sqrt{n(T-1)}(\hat\theta_{nT} - \theta_0) = \left(-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'}\right)^{-1}\left(\frac{1}{\sqrt{n(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta}\right)$, where $\frac{1}{\sqrt{n(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta} \xrightarrow{d} N(0, \Sigma_{\theta_0} + \Omega_{\theta_0})$ and $\bar\theta_{nT}$ lies between $\theta_0$ and $\hat\theta_{nT}$. As
$$-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'} = \left(-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'} + \frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'}\right) + \left(-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'} - \Sigma_{\theta_0,nT}\right) + \Sigma_{\theta_0,nT},$$
where the first and second terms are, respectively, $\|\bar\theta_{nT} - \theta_0\|\cdot O_p(1)$ and $O_p\!\left(\frac{1}{\sqrt{n(T-1)}}\right)$,^{30} we have $-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'} = \|\bar\theta_{nT} - \theta_0\|\cdot O_p(1) + O_p\!\left(\frac{1}{\sqrt{n(T-1)}}\right) + \Sigma_{\theta_0,nT}$. Because $\bar\theta_{nT} - \theta_0 = o_p(1)$ and $\Sigma_{\theta_0,nT}$ is nonsingular in the limit, $-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'}$ is invertible for large $n$ or $T$ and $\left(-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'}\right)^{-1}$ is $O_p(1)$. It then follows that $\hat\theta_{nT} - \theta_0 = O_p\!\left(\frac{1}{\sqrt{n(T-1)}}\right)$. Hence,
$$\sqrt{n(T-1)}(\hat\theta_{nT} - \theta_0) = \left(\Sigma_{\theta_0,nT} + O_p\!\left(\tfrac{1}{\sqrt{n(T-1)}}\right)\right)^{-1}\left(\frac{1}{\sqrt{n(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta}\right).$$
Using the fact that $\left(\Sigma_{\theta_0,nT} + O_p\!\left(\tfrac{1}{\sqrt{n(T-1)}}\right)\right)^{-1} = \Sigma_{\theta_0,nT}^{-1} + O_p\!\left(\tfrac{1}{\sqrt{n(T-1)}}\right)$, we have $\sqrt{n(T-1)}(\hat\theta_{nT} - \theta_0) \xrightarrow{d} N\!\left(0, \Sigma_{\theta_0}^{-1}(\Sigma_{\theta_0} + \Omega_{\theta_0})\Sigma_{\theta_0}^{-1}\right)$. ∎

^{29} This is equivalent to the identification of a pure SAR model. See Proof of Theorem 4.1 in Lee (2004).

^{30} When $n$ is large and $T$ is fixed, the derivation is similar to Lee (2004) for the cross-sectional SAR model. When $T$ is large and $n$ could be finite or large, the derivation is similar to Yu et al. (2006).
B Direct Approach: The First and Second Order Derivatives

For the concentrated likelihood function (3.2), the first and second order derivatives are
$$\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{n,T}^d(\theta)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'\tilde V_{nt}(\theta) \\
\frac{1}{\sigma^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left[(R_n(\rho)W_n\tilde Y_{nt})'\tilde V_{nt}(\theta) - \sigma^2\operatorname{tr}G_n(\lambda)\right] \\
\frac{1}{\sigma^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left[(H_n(\rho)\tilde V_{nt}(\theta))'\tilde V_{nt}(\theta) - \sigma^2\operatorname{tr}H_n(\rho)\right] \\
\frac{1}{2\sigma^4}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) - n\sigma^2)
\end{pmatrix}, \tag{B.1}$$
and, with all sums over $t = 1, \dots, T$ and entries marked $*$ filled in by symmetry,
$$-\frac{1}{nT}\frac{\partial^2\ln L_{n,T}^d(\theta)}{\partial\theta\,\partial\theta'} = \frac{1}{nT}\begin{pmatrix}
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'R_n(\rho)\tilde X_{nt} & * & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'R_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'R_n(\rho)W_n\tilde Y_{nt} + T\operatorname{tr}(G_n^2(\lambda)) & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'H_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_n\tilde X_{nt})'\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'H_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_nW_n\tilde Y_{nt})'\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'H_n(\rho)\tilde V_{nt}(\theta) + T\operatorname{tr}(H_n^2(\rho)) & * \\
\frac{1}{\sigma^4}\sum_t\tilde V_{nt}'(\theta)R_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^4}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'\tilde V_{nt}(\theta) & \frac{1}{\sigma^4}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'\tilde V_{nt}(\theta) & -\frac{nT}{2\sigma^4} + \frac{1}{\sigma^6}\sum_t\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta)
\end{pmatrix}. \tag{B.2}$$
Hence,
$$\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\bar X_{nt}'\tilde V_{nt} \\
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'\tilde V_{nt} + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(\tilde V_{nt}'\bar G_n\tilde V_{nt} - \sigma_0^2\operatorname{tr}G_n) \\
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(\tilde V_{nt}'H_n'\tilde V_{nt} - \sigma_0^2\operatorname{tr}H_n) \\
\frac{1}{2\sigma_0^4}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(\tilde V_{nt}'\tilde V_{nt} - n\sigma_0^2)
\end{pmatrix}, \tag{B.3}$$
and the information matrix is equal to $\Sigma_{\theta_0,nT}^d = -\mathrm{E}\left[\frac{1}{nT}\frac{\partial^2\ln L_{n,T}^d(\theta_0)}{\partial\theta\,\partial\theta'}\right]$, where
$$\Sigma_{\theta_0,nT}^d = \begin{pmatrix}
\frac{1}{\sigma_0^2 nT}\sum_{t=1}^{T}\bar X_{nt}'\bar X_{nt} & * & * & * \\
\frac{1}{\sigma_0^2 nT}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'\bar X_{nt} & \frac{1}{\sigma_0^2 nT}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'\bar G_n\bar X_{nt}\beta_0 + \frac{T-1}{T}\frac{1}{n}\operatorname{tr}(\bar G_n'\bar G_n) + \frac{1}{n}\operatorname{tr}(\bar G_n^2) & * & * \\
0_{1\times k_X} & \frac{T-1}{T}\left[\frac{1}{n}\operatorname{tr}(H_n\bar G_n) + \frac{1}{n}\operatorname{tr}(H_n'\bar G_n)\right] & \frac{1}{n}\left[\frac{T-1}{T}\operatorname{tr}(H_n'H_n) + \operatorname{tr}(H_n^2)\right] & * \\
0_{1\times k_X} & \frac{T-1}{T}\frac{1}{\sigma_0^2 n}\operatorname{tr}(\bar G_n) & \frac{T-1}{T}\frac{1}{\sigma_0^2 n}\operatorname{tr}(H_n) & \frac{T-1}{T}\frac{1}{2\sigma_0^4}
\end{pmatrix}. \tag{B.4}$$
C Transformation Approach with Time Dummy

C.1 The First and Second Order Derivatives of (4.5)

Using $\operatorname{tr}G_n(\lambda) - \operatorname{tr}(J_nG_n(\lambda)) = \frac{1}{1-\lambda}$ and $\operatorname{tr}(G_n^2(\lambda)) - \operatorname{tr}((J_nG_n(\lambda))^2) = \frac{1}{(1-\lambda)^2}$ (see Lee and Yu (2007a)), for the concentrated likelihood function (4.5), the first and second order derivatives are
$$\frac{\partial\ln L_{n,T}(\theta)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'J_n\tilde V_{nt}(\theta) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) - (T-1)\operatorname{tr}(J_nG_n(\lambda)) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(H_n(\rho)\tilde V_{nt}(\theta))'J_n\tilde V_{nt}(\theta) - (T-1)\operatorname{tr}(J_nH_n(\rho)) \\
\frac{1}{2\sigma^4}\sum_{t=1}^{T}\left(\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta) - (n-1)\frac{T-1}{T}\sigma^2\right)
\end{pmatrix}, \tag{C.1}$$
and, with all sums over $t = 1, \dots, T$ and entries marked $*$ filled in by symmetry,
$$-\frac{\partial^2\ln L_{n,T}(\theta)}{\partial\theta\,\partial\theta'} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'J_nR_n(\rho)\tilde X_{nt} & * & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nR_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nR_n(\rho)W_n\tilde Y_{nt} + (T-1)\operatorname{tr}(J_nG_n^2(\lambda)) & * & * \\
\frac{1}{\sigma^2}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'J_nR_n(\rho)\tilde X_{nt} + \frac{1}{\sigma^2}\sum_t\tilde V_{nt}'(\theta)J_nM_n\tilde X_{nt} & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nH_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_nW_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'J_nH_n(\rho)\tilde V_{nt}(\theta) + (T-1)\operatorname{tr}(J_nH_n^2(\rho)) & * \\
\frac{1}{\sigma^4}\sum_t\tilde V_{nt}'(\theta)J_nR_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^4}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^4}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'J_n\tilde V_{nt}(\theta) & -\frac{(n-1)(T-1)}{2\sigma^4} + \frac{1}{\sigma^6}\sum_t\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta)
\end{pmatrix}. \tag{C.2}$$
From (C.1), the score vector and the information matrix are
$$\frac{1}{\sqrt{(n-1)(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma_0^2\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}\bar X_{nt}'J_n\tilde V_{nt} \\
\frac{1}{\sigma_0^2\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'J_n\tilde V_{nt} + \frac{1}{\sigma_0^2\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\bar G_nJ_n\tilde V_{nt} - \frac{T-1}{T}\sigma_0^2\operatorname{tr}(J_n\bar G_n)\right) \\
\frac{1}{\sigma_0^2\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'H_nJ_n\tilde V_{nt} - \frac{T-1}{T}\sigma_0^2\operatorname{tr}(J_nH_n)\right) \\
\frac{1}{2\sigma_0^4\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'J_n\tilde V_{nt} - \frac{T-1}{T}(n-1)\sigma_0^2\right)
\end{pmatrix}, \tag{C.3}$$
$$\Sigma_{\theta_0,nT} = \frac{1}{\sigma_0^2}\begin{pmatrix} H_{nT} & 0_{(k_X+1)\times 1} & 0_{(k_X+1)\times 1} \\ 0_{1\times(k_X+1)} & 0 & 0 \\ 0_{1\times(k_X+1)} & 0 & 0 \end{pmatrix} + \begin{pmatrix}
0_{k_X\times k_X} & * & * & * \\
0_{1\times k_X} & \frac{1}{n-1}\operatorname{tr}(\bar G_n^sJ_n\bar G_n) & * & * \\
0_{1\times k_X} & \frac{1}{n-1}\operatorname{tr}(H_n^sJ_n\bar G_n) & \frac{1}{n-1}\operatorname{tr}(H_n^sJ_nH_n) & * \\
0_{1\times k_X} & \frac{1}{\sigma_0^2(n-1)}\operatorname{tr}(J_n\bar G_n) & \frac{1}{\sigma_0^2(n-1)}\operatorname{tr}(J_nH_n) & \frac{1}{2\sigma_0^4}
\end{pmatrix}, \tag{C.4}$$
where $H_{nT} = \frac{1}{(n-1)(T-1)}\sum_{t=1}^{T}(\bar X_{nt},\, G_n\bar X_{nt}\beta_0)'J_n(\bar X_{nt},\, G_n\bar X_{nt}\beta_0)$.
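The two trace identities quoted at the beginning of this appendix can also be checked numerically. A self-contained sketch (ours, with an arbitrary row-normalized $W_n$; the identities rely on $G_n(\lambda)l_n = \frac{1}{1-\lambda}l_n$ under row normalization):

```python
import numpy as np

# A numerical check (ours) of tr G(lam) - tr(J G(lam)) = 1/(1-lam) and
# tr(G^2(lam)) - tr((J G(lam))^2) = 1/(1-lam)^2 for row-normalized W.
n, lam = 5, 0.4
rng = np.random.default_rng(3)
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)       # row normalization
I = np.eye(n)
G = W @ np.linalg.inv(I - lam * W)      # G_n(lam) = W_n S_n^{-1}(lam)
J = I - np.ones((n, n)) / n             # J_n
assert np.isclose(np.trace(G) - np.trace(J @ G), 1 / (1 - lam))
assert np.isclose(np.trace(G @ G) - np.trace(J @ G @ J @ G), 1 / (1 - lam) ** 2)
```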
C.2 Proof for Theorem 3

The proof is similar to the proofs of Theorems 1 and 2.
D Direct Approaches with Time Dummy

D.1 The First and Second Order Derivatives of (5.3)

For the concentrated likelihood function (5.3), the first and second order derivatives are
$$\frac{\partial\ln L_{n,T}^d(\theta)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'J_n\tilde V_{nt}(\theta) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(J_nR_n(\rho)W_n\tilde Y_{nt})'\tilde V_{nt}(\theta) - (T-1)\operatorname{tr}G_n(\lambda) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(J_nH_n(\rho)\tilde V_{nt}(\theta))'\tilde V_{nt}(\theta) - (T-1)\operatorname{tr}H_n(\rho) \\
\frac{1}{2\sigma^4}\sum_{t=1}^{T}\left(\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta) - n\frac{T-1}{T}\sigma^2\right)
\end{pmatrix}, \tag{D.1}$$
and, with all sums over $t = 1, \dots, T$ and entries marked $*$ filled in by symmetry,
$$-\frac{\partial^2\ln L_{n,T}^d(\theta)}{\partial\theta\,\partial\theta'} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'J_nR_n(\rho)\tilde X_{nt} & * & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nR_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nR_n(\rho)W_n\tilde Y_{nt} + (T-1)\operatorname{tr}(G_n^2(\lambda)) & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'J_nH_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_n\tilde X_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nH_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_nW_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'J_nH_n(\rho)\tilde V_{nt}(\theta) + (T-1)\operatorname{tr}(H_n^2(\rho)) & * \\
\frac{1}{\sigma^4}\sum_t\tilde V_{nt}'(\theta)J_nR_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^4}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^4}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'J_n\tilde V_{nt}(\theta) & -\frac{n(T-1)}{2\sigma^4} + \frac{1}{\sigma^6}\sum_t\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta)
\end{pmatrix}. \tag{D.2}$$
Hence, for the first order derivative evaluated at $\theta_0$, we have
$$\frac{1}{\sqrt{n(T-1)}}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\bar X_{nt}'J_n\tilde V_{nt} \\
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'J_n\tilde V_{nt} + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\bar G_n'J_n\tilde V_{nt} - \sigma_0^2\frac{T-1}{T}\operatorname{tr}G_n\right) \\
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'H_n'J_n\tilde V_{nt} - \sigma_0^2\frac{T-1}{T}\operatorname{tr}H_n\right) \\
\frac{1}{2\sigma_0^4}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'J_n\tilde V_{nt} - \frac{T-1}{T}n\sigma_0^2\right)
\end{pmatrix}. \tag{D.3}$$
For the information matrix, denote $\Sigma_{\theta_0,nT}^d = -\mathrm{E}\left[\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}^d(\theta_0)}{\partial\theta\,\partial\theta'}\right]$; we have
$$\Sigma_{\theta_0,nT}^d = \frac{1}{\sigma_0^2}\begin{pmatrix} H_{nT}^d & 0_{(k_X+1)\times 1} & 0_{(k_X+1)\times 1} \\ 0_{1\times(k_X+1)} & 0 & 0 \\ 0_{1\times(k_X+1)} & 0 & 0 \end{pmatrix} + \begin{pmatrix}
0_{k_X\times k_X} & * & * & * \\
0_{1\times k_X} & \frac{1}{n}\left[\operatorname{tr}(\bar G_n'J_n\bar G_n) + \operatorname{tr}(\bar G_n^2)\right] & * & * \\
0_{1\times k_X} & \frac{1}{n}\operatorname{tr}(H_n^sJ_n\bar G_n) & \frac{1}{n}\left[\operatorname{tr}(H_n'J_nH_n) + \operatorname{tr}(H_n^2)\right] & * \\
0_{1\times k_X} & \frac{1}{\sigma_0^2 n}\operatorname{tr}(J_n\bar G_n) & \frac{1}{\sigma_0^2 n}\operatorname{tr}(J_nH_n) & \frac{1}{2\sigma_0^4}
\end{pmatrix}, \tag{D.4}$$
where $H_{nT}^d = \frac{1}{n(T-1)}\sum_{t=1}^{T}(\bar X_{nt},\, \bar G_n\bar X_{nt}\beta_0)'J_n(\bar X_{nt},\, \bar G_n\bar X_{nt}\beta_0)$.
D.2 The First and Second Order Derivatives of (5.5) and Asymptotic Bias

The first and second order derivatives of the concentrated log likelihood in (5.5) are
$$\frac{\partial\ln L_{n,T}^d(\theta)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'J_n\tilde V_{nt}(\theta) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) - T\operatorname{tr}G_n(\lambda) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(H_n(\rho)\tilde V_{nt}(\theta))'J_n\tilde V_{nt}(\theta) - T\operatorname{tr}H_n(\rho) \\
\frac{1}{2\sigma^4}\sum_{t=1}^{T}(\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta) - n\sigma^2)
\end{pmatrix}, \tag{D.5}$$
and, with all sums over $t = 1, \dots, T$ and entries marked $*$ filled in by symmetry,
$$-\frac{\partial^2\ln L_{n,T}^d(\theta)}{\partial\theta\,\partial\theta'} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'J_nR_n(\rho)\tilde X_{nt} & * & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nR_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nR_n(\rho)W_n\tilde Y_{nt} + T\operatorname{tr}(G_n^2(\lambda)) & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'J_nH_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_n\tilde X_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nH_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_nW_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'J_nH_n(\rho)\tilde V_{nt}(\theta) + T\operatorname{tr}(H_n^2(\rho)) & * \\
\frac{1}{\sigma^4}\sum_t\tilde V_{nt}'(\theta)J_nR_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^4}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^4}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'J_n\tilde V_{nt}(\theta) & -\frac{nT}{2\sigma^4} + \frac{1}{\sigma^6}\sum_t\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta)
\end{pmatrix}. \tag{D.6}$$
The first order derivative evaluated at $\theta_0$ has three components, such that
$$\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\theta} = \frac{\partial\ln L_{n,T}^{d,u}(\theta_0)}{\partial\theta} - n\,a_{\theta_0,n,1} - (T-1)\,a_{\theta_0,2}, \tag{D.7}$$
where
$$\frac{\partial\ln L_{n,T}^{d,u}(\theta_0)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma_0^2}\sum_{t=1}^{T}\bar X_{nt}'J_n\tilde V_{nt} \\
\frac{1}{\sigma_0^2}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'J_n\tilde V_{nt} + \frac{1}{\sigma_0^2}\sum_{t=1}^{T}\left(\tilde V_{nt}'\bar G_n'J_n\tilde V_{nt} - \sigma_0^2\frac{T-1}{T}\operatorname{tr}(\bar G_n'J_n)\right) \\
\frac{1}{\sigma_0^2}\sum_{t=1}^{T}\left(\tilde V_{nt}'H_n'J_n\tilde V_{nt} - \sigma_0^2\frac{T-1}{T}\operatorname{tr}(H_n'J_n)\right) \\
\frac{1}{2\sigma_0^4}\sum_{t=1}^{T}\left(\tilde V_{nt}'J_n\tilde V_{nt} - \frac{T-1}{T}(n-1)\sigma_0^2\right)
\end{pmatrix},$$
$a_{\theta_0,n,1} = \left(0_{1\times k_X},\ \frac{1}{n}\operatorname{tr}G_n,\ \frac{1}{n}\operatorname{tr}H_n,\ \frac{1}{2\sigma_0^2}\right)'$ and $a_{\theta_0,2} = \left(0_{1\times k_X},\ \frac{1}{1-\lambda_0},\ \frac{1}{1-\rho_0},\ \frac{1}{2\sigma_0^2}\right)'$. For the information matrix, denote $\Sigma_{\theta_0,nT}^d = -\mathrm{E}\left[\frac{1}{nT}\frac{\partial^2\ln L_{n,T}^d(\theta_0)}{\partial\theta\,\partial\theta'}\right]$ and $H_{nT}^d = \frac{1}{nT}\sum_{t=1}^{T}(\bar X_{nt},\, \bar G_n\bar X_{nt}\beta_0)'J_n(\bar X_{nt},\, \bar G_n\bar X_{nt}\beta_0)$; we have
$$\Sigma_{\theta_0,nT}^d = \frac{1}{\sigma_0^2}\begin{pmatrix} H_{nT}^d & 0_{(k_X+1)\times 1} & 0_{(k_X+1)\times 1} \\ 0_{1\times(k_X+1)} & 0 & 0 \\ 0_{1\times(k_X+1)} & 0 & 0 \end{pmatrix} + \begin{pmatrix}
0_{k_X\times k_X} & * & * & * \\
0_{1\times k_X} & \frac{1}{n}\left[\operatorname{tr}(\bar G_n'J_n\bar G_n) + \operatorname{tr}(\bar G_n^2)\right] & * & * \\
0_{1\times k_X} & \frac{1}{n}\operatorname{tr}(H_n^sJ_n\bar G_n) & \frac{1}{n}\left[\operatorname{tr}(H_n'J_nH_n) + \operatorname{tr}(H_n^2)\right] & * \\
0_{1\times k_X} & \frac{1}{\sigma_0^2 n}\operatorname{tr}(J_n\bar G_n) & \frac{1}{\sigma_0^2 n}\operatorname{tr}(J_nH_n) & \frac{1}{2\sigma_0^4}
\end{pmatrix}.$$
As $\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{n,T}^{d,u}(\theta_0)}{\partial\theta}$ will be normally distributed asymptotically, we can see that the estimators from this direct approach will have $O(1/T)$ bias $\frac{1}{T}(\Sigma_{\theta_0,nT}^d)^{-1}a_{\theta_0,n,1}$ and $O(1/n)$ bias $\frac{1}{n}(\Sigma_{\theta_0,nT}^d)^{-1}a_{\theta_0,2}$. Similar to Lee and Yu (2007a), a bias correction procedure can be designed to eliminate the bias. Denoting by $\hat\theta_{nT}^d$ the QMLE that solves (5.5), the bias corrected estimator can be
$$\hat\theta_{nT}^{d1} = \hat\theta_{nT}^d - \frac{B_{1,nT}}{T} - \frac{B_{2,nT}}{n},\quad\text{where } B_{1,nT} = \left[-(\Sigma_{\theta,nT}^d)^{-1}a_{\theta,n,1}\right]\Big|_{\theta=\hat\theta_{nT}^d}\ \text{and}\ B_{2,nT} = \left[-(\Sigma_{\theta,nT}^d)^{-1}a_{\theta,2}\right]\Big|_{\theta=\hat\theta_{nT}^d}.$$
Similar to Lee and Yu (2007a), it can be shown that when $n/T^3 \to 0$ and $T/n^3 \to 0$, $\hat\theta_{nT}^{d1}$ is $\sqrt{nT}$ consistent and asymptotically centered normal.
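The correction step itself is a few lines of arithmetic. The sketch below is our own illustration: `Sigma`, `a1` and `a2` stand for $\Sigma_{\theta,nT}^d$, $a_{\theta,n,1}$ and $a_{\theta,2}$ evaluated at the QMLE, and the numeric inputs are placeholders, not estimates from the paper:

```python
import numpy as np

# A hedged sketch of the bias correction above: theta^{d1} = theta_hat
# - B1/T - B2/n with B1 = -Sigma^{-1} a1 and B2 = -Sigma^{-1} a2.
def bias_correct(theta_hat, Sigma, a1, a2, n, T):
    Sinv = np.linalg.inv(Sigma)
    B1 = -Sinv @ a1                      # B_{1,nT}
    B2 = -Sinv @ a2                      # B_{2,nT}
    return theta_hat - B1 / T - B2 / n   # theta^{d1}_{nT}

theta_hat = np.array([0.20, 0.05, 0.04, 0.90])   # placeholder (beta, lam, rho, sig2)
Sigma = np.eye(4)                                # placeholder information matrix
a1 = np.array([0.0, 0.1, 0.1, 0.5])              # placeholder a_{theta,n,1}
a2 = np.array([0.0, 0.2, 0.2, 0.5])              # placeholder a_{theta,2}
out = bias_correct(theta_hat, Sigma, a1, a2, n=49, T=10)
assert np.isclose(out[0], 0.20)          # beta entry has zero bias components here
```

Note the zero first entries of $a_{\theta,n,1}$ and $a_{\theta,2}$: the leading-order bias affects $(\lambda, \rho, \sigma^2)$ but not $\beta$ directly.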
References
Amemiya, T., 1971. The estimation of the variances in a variance-components model. International Eco-
nomic Review 12, 1-13.
Amemiya, T., 1985. Advanced Econometrics. Harvard University Press, Cambridge, MA.
Anderson, T.W. and C. Hsiao, 1981. Estimation of dynamic models with error components. Journal of the
American Statistical Association 76, 598-606.
Anselin, L., 1988. Spatial Econometrics: Methods and Models. Kluwer Academic, The Netherlands.
Anselin, L. and A.K. Bera, 1998. Spatial dependence in linear regression models with an introduction to
spatial econometrics, in: A. Ullah and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics.
Marcel Dekker, New York.
Arellano, M. and O. Bover, 1995. Another look at the instrumental-variable estimation of error-components
models. Journal of Econometrics 68, 29-51.
Baltagi, B., S.H. Song and W. Koh, 2003. Testing panel data regression models with spatial error correlation.
Journal of Econometrics 117, 123-150.
Baltagi, B., P. Egger and M. Pfaffermayr, 2007. A generalized spatial panel data model with random effects.
Working Paper, Syracuse University.
Chamberlain, G., 1982. Multivariate regression models for panel data. Journal of Econometrics 18, 5-46.
Cliff, A.D. and J.K. Ord, 1973. Spatial Autocorrelation. Pion Ltd, London.
Cressie, N., 1993. Statistics for Spatial Data. Wiley, New York.
Ertur, C. and W. Koch, 2007. Growth, technological interdependence and spatial externalities: theory and
evidence. Journal of Applied Econometrics 22, 1033-1062.
Foote, C.L., 2007. Space and time in macroeconomic panel data: young workers and state-level unemploy-
ment revisited. Working Paper No. 07-10, Federal Reserve Bank of Boston.
Hahn, J. and G. Kuersteiner, 2002. Asymptotically unbiased inference for a dynamic panel model with fixed
effects when both n and T are large. Econometrica 70, 1639-1657.
Hahn, J. and H.R. Moon, 2006. Reducing bias of MLE in a dynamic panel model. Econometric Theory 22,
499-512.
Hausman, J.A., 1978. Specification tests in econometrics. Econometrica 46, 1251-1271.
Hsiao, C., 1986. Analysis of Panel Data. Cambridge University Press.
Kapoor, M., Kelejian, H.H. and I.R. Prucha, 2007. Panel data models with spatially correlated error
components. Journal of Econometrics, 140, 97-130.
Kelejian, H.H. and I.R. Prucha, 1998. A generalized spatial two-stage least squares procedure for estimating
a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and
Economics 17, 99-121.
Kelejian, H.H. and I.R. Prucha, 2001. On the asymptotic distribution of the Moran I test statistic with
applications. Journal of Econometrics, 104, 219-257.
Kelejian, H.H. and D. Robinson, 1993. A suggested method of estimation for spatial interdependent models
with autocorrelated errors, and an application to a county expenditure model. Papers in Regional Science
72, 297-312.
Kelejian, H.H. and I.R. Prucha, 2007. Specification and estimation of spatial autoregressive models with
autoregressive and heteroskedastic disturbances. Forthcoming in Journal of Econometrics.
Lee, L.F., 2004. Asymptotic distributions of quasi-maximum likelihood estimators for spatial econometric
models. Econometrica 72, 1899-1925.
Lee, L.F., 2007. GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. Journal of
Econometrics 137, 489-514.
Lee, L.F. and X. Liu, 2006. Efficient GMM estimation of a spatial autoregressive model with autoregressive
disturbances. Working Paper, The Ohio State University.
Lee, L.F., X. Liu and X. Lin, 2008. Specification and estimation of social interaction models with network
structure, contextual factors, correlation and fixed effects. Working Paper, The Ohio State University.
Lee, L.F. and J. Yu, 2007a. A spatial dynamic panel data model with both time and individual fixed effects.
Working Paper, The Ohio State University.
Lee, L.F. and J. Yu, 2007b. Near unit root in the spatial autoregressive model. Working Paper, The Ohio
State University.
Lin, X. and L.F. Lee, 2005. GMM estimation of spatial autoregressive models with unknown heteroskedas-
ticity. Working Paper, The Ohio State University. Forthcoming in Journal of Econometrics.
Nerlove, M., 1971. A note on error components models. Econometrica 39, 383-396.
Neyman, J. and E.L. Scott, 1948. Consistent estimates based on partially consistent observations. Econo-