Estimation of spatial autoregressive panel data models with fixed effects

Lung-fei Lee, Department of Economics, Ohio State University, [email protected]
Jihai Yu, Department of Economics, University of Kentucky, [email protected]

March 4, 2008

Abstract

This paper establishes asymptotic properties of quasi-maximum likelihood estimators for fixed effects SAR panel data models with SAR disturbances where the time periods T and/or the number of spatial units n can be finite or large in all combinations except that both T and n are finite. A direct approach is to estimate all the parameters including fixed effects. We propose alternative estimation methods based on transformation. For the model with only individual effects, the transformation approach yields consistent estimators for all the parameters when either n or T is large, while the direct approach does not yield a consistent estimator of the variance of disturbances unless T is large, although the estimators for other parameters are the same as those of the transformation approach. For the model with both individual and time effects, the transformation approach yields consistent estimators of all the parameters when either n or T is large. When we estimate both individual and time effects directly, consistency of the variance parameter requires both n and T to be large, and consistency of other parameters requires n to be large.

JEL classification: C13; C23; R15
Keywords: Spatial autoregression, Panel data, Fixed effects, Time effects, Quasi-maximum likelihood estimation, Conditional likelihood

Lee acknowledges financial support for his research from NSF under Grant No. SES-0519204.
1 Introduction
Spatial econometrics deals with the spatial interactions of economic units in cross-section and/or panel data. To capture correlation among cross-sectional units, the spatial autoregressive (SAR) model by Cliff and Ord (1973) has received the most attention in economics. It extends autocorrelation in time series to spatial dimensions and captures interactions or competition among spatial units. Early development in estimation and testing is summarized in Anselin (1988), Cressie (1993), Kelejian and Robinson (1993), and Anselin and Bera (1998), among others. The spatial correlation can be extended to panel data models (Anselin, 1988).
Baltagi et al. (2003) consider the specification test of the spatial correlation in a panel regression with error components and SAR disturbances. Kapoor et al. (2007) provide a rigorous theoretical analysis of a panel model with SAR disturbances which incorporate error components. Baltagi et al. (2007) generalize Baltagi et al. (2003) by allowing for spatial correlations in both individual and error components such that they might have different spatial autoregressive parameters, which encompasses the spatial correlation specifications in Baltagi et al. (2003) and Kapoor et al. (2007). Instead of random effect error components, an alternative specification for panel data models assumes fixed effects. The fixed effects specification has the advantage of robustness in that the fixed effects are allowed to correlate with included regressors in the model (Hausman, 1978). Yu et al. (2006, 2007) and Yu and Lee (2007) consider the spatial correlation in a dynamic panel data setting, where the data generating processes (DGPs) are specified to be, respectively, stationary, partially nonstationary and nonstationary.
For panel data models with fixed individual effects, when the time dimension T is fixed, we are likely to encounter the "incidental parameters" problem discussed in Neyman and Scott (1948). This is because the introduction of fixed effects increases the number of parameters to be estimated. In a linear panel regression model or a logit panel regression model with fixed individual effects, the fixed effects can be eliminated by the method of conditional likelihood when effective sufficient statistics can be found for each of the fixed effects. For those panel models, the time average of the dependent variables provides the sufficient statistic (see Hsiao, 1986).
For the linear panel regression model with fixed effects, the direct maximum likelihood (ML) approach will estimate jointly the common parameters and fixed effects. The corresponding ML estimates (MLEs) of the regression coefficients are known as the within estimates, which happen to be the conditional likelihood estimates conditional on the time means.[1] For the SAR panel data models with individual effects, similar findings for the direct ML approach will be shown in this paper. This direct estimation approach will yield consistent estimates for the spatial and regression coefficients, but not for the variance of the disturbances when T is small (but n is large).[2] However, for SAR panel models with time effects, the direct estimation approach will be shown to be inconsistent for all parameters when n is small (but T is large). The inconsistent estimates are consequences of the incidental parameters problem (Neyman and Scott, 1948).

[1] However, effective sufficient statistics might not be available for many other models. The well-known example is the probit panel regression model, where the time average of the dependent variables does not provide the sufficient statistic even though probit and logit models are close substitutes (see Chamberlain, 1982).
In this paper, in order to avoid the incidental parameters problem, we suggest alternative estimation methods. By using the data transformation (I_T − (1/T) l_T l_T') to eliminate the individual effects, the transformed disturbances are uncorrelated, although not i.i.d. in general. The transformed equation can be estimated by the quasi-maximum likelihood (QML) approach. For the more general model with both individual and time fixed effects, one may combine the transformation (I_n − (1/n) l_n l_n') with the transformation (I_T − (1/T) l_T l_T') to eliminate both the individual and time fixed effects. By exploring the generalized inverse of the transformed equation, one may end up with a QML approach for the transformed model.[3]
Panel regression models with SAR disturbances have been recently considered in the literature. The model considered in Baltagi et al. (2003) is Y_nt = X_nt β0 + c_n0 + U_nt, U_nt = ρ0 W_n U_nt + V_nt, t = 1, 2, ..., T, where elements of V_nt are i.i.d. (0, σ0²), c_n0 is an n × 1 vector of individual error components, and the spatial correlation is in U_nt. A different specification has been considered in Kapoor et al. (2007) with Y_nt = X_nt β0 + U+_nt and U+_nt = ρ0 W_n U+_nt + d_n0 + V_nt, t = 1, 2, ..., T, where d_n0 is the vector of individual error components. Kapoor et al. (2007) propose a method of moments (MOM) procedure for the estimation of ρ0 and the variance parameters of d_n0 and V_nt. The two panel models are different in terms of the variance matrices of the overall disturbances. The variance matrix in Baltagi et al. (2003) is more complicated and its inverse is computationally demanding; the variance matrix in Kapoor et al. (2007) has a special pattern and its inverse is easier to compute. Baltagi et al. (2007) allow for spatial correlations in both individual and error components, where they might have different spatial autoregressive parameters. Both Baltagi et al. (2003) and Baltagi et al. (2007) have emphasized the testing of spatial correlation in their models. With the fixed effects specification, these panel models can have the same representation. By the transformation (I_n − ρ0 W_n), the DGP of Kapoor et al. (2007) becomes Y_nt = X_nt β0 + c_n0 + U_nt, where c_n0 = (I_n − ρ0 W_n)^(−1) d_n0 and U_nt = U+_nt − (I_n − ρ0 W_n)^(−1) d_n0. The U_nt = ρ0 W_n U_nt + V_nt forms a SAR process. By regarding (I_n − ρ0 W_n)^(−1) d_n0 as a vector of unknown fixed effect parameters, these two equations are identical to a linear panel regression with fixed effects and SAR disturbances. Hence, to generalize Baltagi et al. (2003), Baltagi et al. (2007) and Kapoor et al. (2007), where the spatial effects are in the disturbances, and to generalize the SAR panel model where the spatial effects are in the regression equation, we are going to consider the estimation of the SAR panel model with both a spatial lag and spatial disturbances. We allow the time periods T and/or the number of spatial units n to be finite or large in all combinations except

[2] When a dynamic effect is considered in the SAR panel data, we will have an "initial conditions" problem which will cause the inconsistency of the direct likelihood estimates for all the parameters unless T is large (see Yu et al. (2006, 2007) and Yu and Lee (2007)).
[3] The use of (I_T − (1/T) l_T l_T') to eliminate time fixed effects has been considered in Lee and Yu (2007a) for a spatial dynamic panel model with large T. In a group setting with group fixed effects, a similar transformation can eliminate the group effects (Lee et al., 2008).
that both T and n are finite. In this paper, we pay special attention to the model with individual effects when n is large but T is small. On the other hand, for the model with time effects, the special interest is in the model with large T but small n.
This paper is organized as follows. In Section 2, the model with individual fixed effects is introduced and the data transformation procedure is proposed. We then establish the consistency and asymptotic distribution of the QML estimator of the transformation approach. The direct ML approach is discussed in Section 3, where the individual effects are estimated directly. Section 4 generalizes the model to include both individual and time effects. After the individual effects are eliminated, we can further eliminate the time effects, and the asymptotics are derived. Alternatively, we can estimate the transformed time effects directly, or estimate both effects directly, both of which are discussed in Section 5. Simulation results are reported in Section 6 to compare different approaches. Section 7 concludes the paper. Proofs are collected in the Appendix.
2 Transformation Approach
The SAR panel model with SAR disturbances where we have individual effects is

Y_nt = λ0 W_n Y_nt + X_nt β0 + c_n0 + U_nt,  U_nt = ρ0 M_n U_nt + V_nt,  t = 1, 2, ..., T,  (2.1)

where Y_nt = (y_1t, y_2t, ..., y_nt)' and V_nt = (v_1t, v_2t, ..., v_nt)' are n × 1 column vectors and v_it is i.i.d. across i and t with zero mean and variance σ0², W_n is an n × n spatial weights matrix, which is predetermined and generates the spatial dependence among cross-sectional units y_it, X_nt is an n × kX matrix of nonstochastic regressors, and c_n0 is an n × 1 column vector of fixed effects.
In panel data models, when T is finite, we need to take care of the incidental parameters problem. In dynamic panel data, the first difference or Helmert transformation can be made to eliminate the individual effects (see Anderson and Hsiao (1981) and Arellano and Bover (1995), among others). In this paper, we use an orthogonal transformation which includes the Helmert transformation as a special case. Our asymptotic results are obtained where T and/or n can be finite or large in all combinations except that both T and n are finite.[4] Define S_n(λ) = I_n − λW_n and R_n(ρ) = I_n − ρM_n for any λ and ρ. At the true parameters, S_n = S_n(λ0) and R_n = R_n(ρ0). Then, presuming S_n and R_n are invertible, (2.1) can be rewritten as

Y_nt = S_n^(−1) X_nt β0 + S_n^(−1) c_n0 + S_n^(−1) R_n^(−1) V_nt.  (2.2)
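The reduced form (2.2) also gives a direct way to simulate the model. The sketch below is illustrative only (the weights matrix, parameter values, and disturbance draws are our own choices, not from the paper); it draws one period from (2.2) and checks that the structural equation (2.1) is satisfied:

```python
import numpy as np

# Sketch: generate one cross-section Y_nt from the reduced form (2.2),
# Y_nt = S_n^{-1} X_nt b0 + S_n^{-1} c_n0 + S_n^{-1} R_n^{-1} V_nt,
# then verify the structural form (2.1). W_n here is an illustrative
# row-normalized "ahead/behind neighbours" matrix.
rng = np.random.default_rng(1)
n, kX = 6, 2
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
M = W.copy()
lam0, rho0, beta0 = 0.3, 0.2, np.array([1.0, -0.5])
S = np.eye(n) - lam0 * W            # S_n = I_n - lambda0 W_n
R = np.eye(n) - rho0 * M            # R_n = I_n - rho0 M_n
X = rng.standard_normal((n, kX))
c = rng.standard_normal(n)          # individual effects c_n0
V = rng.standard_normal(n)
U = np.linalg.solve(R, V)           # U_nt solves U = rho0 M U + V
Y = np.linalg.solve(S, X @ beta0 + c + U)
# Structural equation (2.1) holds up to floating-point error:
print(np.allclose(Y, lam0 * W @ Y + X @ beta0 + c + U))
```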
For our analysis of the asymptotic properties of estimators, we make the following assumptions:

Assumption 1. W_n and M_n are nonstochastic spatial weights matrices and their diagonal elements satisfy w_n,ii = 0 and m_n,ii = 0 for i = 1, 2, ..., n.

[4] We do not have an exact finite small sample theory for the estimators with both n and T being finite.
Assumption 2. The disturbances {v_it}, i = 1, 2, ..., n and t = 1, 2, ..., T, are i.i.d. across i and t with zero mean, variance σ0², and E|v_it|^(4+η) < ∞ for some η > 0.

Assumption 3. S_n(λ) and R_n(ρ) are invertible for all λ ∈ Λ and ρ ∈ P. Furthermore, Λ and P are compact, λ0 is in the interior of Λ, and ρ0 is in the interior of P.

Assumption 4. The elements of X_nt are nonstochastic and bounded,[5] uniformly in n and t. Also, under the setting in Assumption 6, the limit of (1/(nT)) Σ_{t=1}^{T} X̃'_nt X̃_nt exists and is nonsingular.[6]

Assumption 5. W_n and M_n are uniformly bounded in row and column sums in absolute value (for short, UB).[7] Also, S_n^(−1)(λ) and R_n^(−1)(ρ) are UB,[8] uniformly in λ ∈ Λ and ρ ∈ P.

Assumption 6. (1) n is large, where T can be finite or large; or (2) T is large, where n can be finite or large.
Assumption 1 is a standard normalization assumption in spatial econometrics. This assumption helps the interpretation of the spatial effect, as self-influence shall be excluded in practice. Assumption 2 provides regularity assumptions for v_it, and our analysis is based on i.i.d. disturbances. If there is unknown heteroskedasticity, the MLE (QMLE) would not be consistent. Consistent methods such as the GMM in Lin and Lee (2005) and that in Kelejian and Prucha (2007) may be designed for the model. Invertibility of S_n(λ) and R_n(ρ) in Assumption 3 guarantees that (2.2) is valid. Also, compactness is a condition for theoretical analysis. In many empirical applications, each of the rows of W_n and M_n sums to 1, which ensures that all the weights are between 0 and 1. When W_n and M_n are row normalized, it is common to take a compact subset of (−1, 1) as the parameter space. When exogenous variables X_nt are included in the model, it is convenient to assume that the exogenous regressors are uniformly bounded, as in Assumption 4. Assumption 5 originates from Kelejian and Prucha (1998, 2001) and is also used in Lee (2004, 2007). That W_n, M_n, S_n^(−1)(λ) and R_n^(−1)(ρ) are UB is a condition that limits the spatial correlation to a manageable degree. Assumption 6 allows three cases: (i) both n and T are large; (ii) T is fixed and n is large; (iii) n is fixed and T is large. For (ii), we are interested in the short panel data case, in contrast to the case where T needs to be large in other studies, e.g., Hahn and Kuersteiner (2002) and Yu et al. (2006). When n is large and T is finite, the incidental parameters problem may appear, so that careful estimation methods need to be designed. However, our suggested transformation approach for the estimation of (2.1) is general, and it may also apply to the cases (i) and (iii) where T can be large.
[5] If X_nt is allowed to be stochastic and unbounded, appropriate moment conditions can be imposed instead.
[6] For notational purposes, we define Ỹ_nt = Y_nt − Ȳ_nT and Ỹ_n,t−1 = Y_n,t−1 − Ȳ_nT,−1 for t = 1, 2, ..., T, where Ȳ_nT = (1/T) Σ_{t=1}^{T} Y_nt and Ȳ_nT,−1 = (1/T) Σ_{t=1}^{T} Y_n,t−1. Similarly, we define X̃_nt = X_nt − X̄_nT and Ṽ_nt = V_nt − V̄_nT.
[7] We say a (sequence of n × n) matrix P_n is uniformly bounded in row and column sums if sup_{n≥1} ||P_n||_∞ < ∞ and sup_{n≥1} ||P_n||_1 < ∞, where ||P_n||_∞ = sup_{1≤i≤n} Σ_{j=1}^{n} |p_ij,n| is the row sum norm and ||P_n||_1 = sup_{1≤j≤n} Σ_{i=1}^{n} |p_ij,n| is the column sum norm.
[8] This assumption has effectively ruled out some cases and, hence, imposed limited dependence across spatial units. For example, if λ_0n = 1 − 1/n under n → ∞, it is a near unit root case for a cross-sectional spatial autoregressive model and S_n^(−1) will not be UB (see Lee and Yu (2007b)).
2.1 Data Transformation and Conditional Likelihood
Let [F_{T,T−1}, (1/√T) l_T] be the orthonormal matrix of the eigenvectors of J_T = I_T − (1/T) l_T l_T', where F_{T,T−1} is the T × (T−1) eigenvector matrix[9] corresponding to the eigenvalues of one, and l_T is the T-dimensional column vector of ones. For any n × T matrix [Z_n1, ..., Z_nT], where each Z_nt, t = 1, ..., T, is an n-dimensional column vector, we define the corresponding transformed n × (T−1) matrix [Z*_n1, ..., Z*_n,T−1] = [Z_n1, ..., Z_nT] F_{T,T−1}. Denote X*_nt = [X*_nt,1, X*_nt,2, ..., X*_nt,kX]. Then, (2.1) implies

Y*_nt = λ0 W_n Y*_nt + X*_nt β0 + U*_nt,  U*_nt = ρ0 M_n U*_nt + V*_nt,  t = 1, ..., T−1.  (2.3)

Because (V*'_n1, ..., V*'_n,T−1)' = (F'_{T,T−1} ⊗ I_n)(V'_n1, ..., V'_nT)' and v_it is i.i.d., we have

E[(V*'_n1, ..., V*'_n,T−1)'(V*'_n1, ..., V*'_n,T−1)] = σ0² (F'_{T,T−1} ⊗ I_n)(F_{T,T−1} ⊗ I_n) = σ0² (F'_{T,T−1} F_{T,T−1} ⊗ I_n) = σ0² I_{n(T−1)}.

Hence, the v*_it's are uncorrelated for all i and t (and independent under normality), where v*_it is the ith element of V*_nt.
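The eigenvector matrix F_{T,T−1} can be obtained numerically from the eigendecomposition of J_T. The sketch below (an illustration, not the paper's code) constructs it and checks the two properties used above, F'_{T,T−1} F_{T,T−1} = I_{T−1} and F'_{T,T−1} l_T = 0:

```python
import numpy as np

# Sketch: build F_{T,T-1} as the eigenvectors of J_T = I_T - (1/T) l_T l_T'
# associated with the unit eigenvalues. J_T has one zero eigenvalue (with
# eigenvector l_T/sqrt(T)) and T-1 unit eigenvalues; transformed i.i.d.
# disturbances V* = V F stay uncorrelated because F' F = I_{T-1}.
def eigenvector_matrix(T):
    J_T = np.eye(T) - np.ones((T, T)) / T
    vals, vecs = np.linalg.eigh(J_T)     # ascending eigenvalues, orthonormal columns
    return vecs[:, vals > 0.5]           # keep the T-1 unit-eigenvalue columns

T = 5
F = eigenvector_matrix(T)
print(np.allclose(F.T @ F, np.eye(T - 1)))   # orthonormal columns
print(np.allclose(F.T @ np.ones(T), 0.0))    # annihilates l_T
```

The Helmert transformation mentioned earlier corresponds to one particular choice of F_{T,T−1}; any orthonormal basis of the unit-eigenvalue space works.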
Denote θ = (β', λ, ρ, σ²)' and ζ = (β', λ, ρ)'. At the true values, θ0 = (β0', λ0, ρ0, σ0²)' and ζ0 = (β0', λ0, ρ0)'. The likelihood function of (2.3), as if the disturbances were normally distributed, is

ln L_{n,T}(θ) = −(n(T−1)/2) ln 2π − (n(T−1)/2) ln σ² + (T−1)[ln |S_n(λ)| + ln |R_n(ρ)|] − (1/(2σ²)) Σ_{t=1}^{T−1} V*'_nt(ζ) V*_nt(ζ),  (2.4)

where V*_nt(ζ) = R_n(ρ)[S_n(λ) Y*_nt − X*_nt β]. Thus, V*_nt = V*_nt(ζ0). The QMLE θ̂_nT is the extremum estimator derived from the maximization of (2.4). For any n-dimensional column vectors p_nt and q_nt, as Σ_{t=1}^{T−1} p*'_nt q*_nt = Σ_{t=1}^{T} p̃'_nt q̃_nt, the likelihood function (2.4) is numerically identical to

ln L_{n,T}(θ) = −(n(T−1)/2) ln 2π − (n(T−1)/2) ln σ² + (T−1)[ln |S_n(λ)| + ln |R_n(ρ)|] − (1/(2σ²)) Σ_{t=1}^{T} Ṽ'_nt(ζ) Ṽ_nt(ζ),  (2.5)

where Ṽ_nt(ζ) = R_n(ρ)[S_n(λ) Ỹ_nt − X̃_nt β], with Ũ_nt = ρ0 M_n Ũ_nt + Ṽ_nt. As Ṽ_nt, t = 1, ..., T, are independent of V̄_nT under normality, the likelihood in (2.5) corresponds to the density function of Ỹ_nt, t = 1, ..., T.
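The numerical identity between the quadratic forms in (2.4) and (2.5) follows from F_{T,T−1} F'_{T,T−1} = J_T and the idempotency of J_T. The sketch below (arbitrary simulated data and illustrative parameter values of our own) verifies it:

```python
import numpy as np

# Sketch: check sum_t V*'_t(zeta) V*_t(zeta) = sum_t Vtil'_t(zeta) Vtil_t(zeta)
# numerically. Stacking V_nt(zeta) as the columns of an n x T matrix A, the
# left side is tr(F' A'A F) = tr(A'A J_T) and the right side is
# tr(J_T A'A J_T) = tr(A'A J_T), since F F' = J_T and J_T^2 = J_T.
rng = np.random.default_rng(2)
n, T, kX = 4, 6, 2
lam, rho, beta = 0.2, 0.1, np.array([0.5, -1.0])
W = np.roll(np.eye(n), 1, axis=1)          # illustrative weights matrix
M = W
S, R = np.eye(n) - lam * W, np.eye(n) - rho * M
Y = rng.standard_normal((n, T))
X = rng.standard_normal((n, T, kX))
A = R @ (S @ Y - np.einsum('ntk,k->nt', X, beta))   # columns are V_nt(zeta)

J_T = np.eye(T) - np.ones((T, T)) / T
vals, vecs = np.linalg.eigh(J_T)
F = vecs[:, vals > 0.5]                    # F_{T,T-1}

qf_star = np.sum((A @ F) ** 2)             # quadratic form in (2.4)
qf_tilde = np.sum((A @ J_T) ** 2)          # quadratic form in (2.5)
print(np.allclose(qf_star, qf_tilde))
```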
2.2 Asymptotic Properties
For the likelihood function (2.5) divided by the effective sample size n(T−1), the corresponding expected value function is Q_{n,T}(θ) = E max_{c_n} (1/(n(T−1))) ln L_{n,T}(θ, c_n), which is

Q_{n,T}(θ) = (1/(n(T−1))) E ln L_{n,T}(θ)  (2.6)
  = −(1/2) ln 2π − (1/2) ln σ² + (1/n)[ln |S_n(λ)| + ln |R_n(ρ)|] − (1/(2σ²)) (1/(n(T−1))) E Σ_{t=1}^{T} Ṽ'_nt(ζ) Ṽ_nt(ζ).
To show the consistency of θ̂_nT, we need the following uniform convergence result.

Claim 1. Let Θ be any compact parameter space of θ. Under Assumptions 1-6, (1/(n(T−1))) ln L_{n,T}(θ) − Q_{n,T}(θ) →p 0 uniformly in θ ∈ Θ, and Q_{n,T}(θ) is uniformly equicontinuous for θ ∈ Θ.

Proof. See Appendix A.2.
For local identification, a sufficient (but not necessary) condition is that the information matrix Σ_{θ0,nT}, where Σ_{θ0,nT} = −E[(1/(n(T−1))) ∂² ln L_{n,T}(θ0)/∂θ∂θ'], is nonsingular and −E[(1/(n(T−1))) ∂² ln L_{n,T}(θ)/∂θ∂θ'] has full rank for any θ in some neighborhood N(θ0) of θ0 (see Rothenberg (1971)). The Σ_{θ0,nT} is derived in (A.4) of Appendix A.1 and its nonsingularity is analyzed in Appendix A.3. While the conditions for the nonsingularity of the information matrix provide local identification, the conditions in the following assumption are global ones.
Denote

H_{nT}(ρ) = (1/(n(T−1))) Σ_{t=1}^{T} (X̃_nt, G_n X̃_nt β0)' R'_n(ρ) R_n(ρ) (X̃_nt, G_n X̃_nt β0),
σ²_n(ρ) = (σ0²/n) tr[(R_n(ρ) R_n^(−1))' (R_n(ρ) R_n^(−1))],
σ²_n(λ, ρ) = (σ0²/n) tr[(R_n(ρ) S_n(λ) S_n^(−1) R_n^(−1))' (R_n(ρ) S_n(λ) S_n^(−1) R_n^(−1))].
Assumption 7. Either (a) the limit of H_{nT}(ρ) is nonsingular for each possible ρ in P, and the limit of [(1/n) ln |σ0² R_n^(−1)' R_n^(−1)| − (1/n) ln |σ²_n(ρ) R_n^(−1)(ρ)' R_n^(−1)(ρ)|] is not zero[10] for ρ ≠ ρ0; or (b) the limit of

(1/n) ln |σ0² R_n^(−1)' S_n^(−1)' S_n^(−1) R_n^(−1)| − (1/n) ln |σ²_n(λ, ρ) R_n^(−1)(ρ)' S_n^(−1)(λ)' S_n^(−1)(λ) R_n^(−1)(ρ)|

is not zero for (λ, ρ) ≠ (λ0, ρ0).[11]

[10] When n is finite and T is large, this inequality becomes (1/n) ln |σ0² R_n^(−1)' R_n^(−1)| − (1/n) ln |σ²_n(ρ) R_n^(−1)(ρ)' R_n^(−1)(ρ)| ≠ 0.
[11] The inequality will be (1/n) ln |σ0² R_n^(−1)' S_n^(−1)' S_n^(−1) R_n^(−1)| − (1/n) ln |σ²_n(λ, ρ) R_n^(−1)(ρ)' S_n^(−1)(λ)' S_n^(−1)(λ) R_n^(−1)(ρ)| ≠ 0 when n is finite and T is large. When M_n = W_n and λ0 ≠ ρ0, this condition would not be satisfied, as (λ0, ρ0) and (ρ0, λ0) could not be distinguished from each other. Identification will rely on either Assumption 7(a) or extra information on the order of magnitudes of λ0 and ρ0.
This assumption states the identification conditions of the model, which generalize those for a cross-section SAR model in Lee and Liu (2006) to the panel case. Part (a) of Assumption 7 represents the possible identification of λ0 and β0 through the deterministic part of the reduced form equation of (2.3) and the identification of ρ0 and σ0² from the SAR process of U*_nt in (2.3). Part (b) of Assumption 7 provides identification through the SAR process of the reduced form of disturbances of Y*_nt. The global identification and consistency are shown in the following theorem.

Theorem 1. Under Assumptions 1-7, θ0 is globally identified and, for the extremum estimator θ̂_nT derived from (2.5), θ̂_nT →p θ0.

Proof. See Appendix A.4.
The asymptotic distribution of the QMLE θ̂_nT can be derived from the Taylor expansion of ∂ ln L_{n,T}(θ̂_nT)/∂θ around θ0. At θ0, the first order derivative of the likelihood function involves both linear and quadratic functions of Ṽ_nt and is derived in (A.3). The variance matrix of (1/√(n(T−1))) ∂ ln L_{n,T}(θ0)/∂θ is equal to

E[(1/(n(T−1))) (∂ ln L_{n,T}(θ0)/∂θ)(∂ ln L_{n,T}(θ0)/∂θ')] = Σ_{θ0,nT} + Ω_{θ0,n},

where Ω_{θ0,n} = ((μ4 − 3σ0⁴)/σ0⁴) × the symmetric matrix

[ 0_{kX×kX}  ∗  ∗  ∗
  0_{1×kX}  (1/n) Σ_{i=1}^{n} Ḡ²_{n,ii}  ∗  ∗
  0_{1×kX}  (1/n) Σ_{i=1}^{n} Ḡ_{n,ii} H_{n,ii}  (1/n) Σ_{i=1}^{n} H²_{n,ii}  ∗
  0_{1×kX}  (1/(2σ0²n)) tr Ḡ_n  (1/(2σ0²n)) tr H_n  1/(4σ0⁴) ],

with μ4 being the fourth moment of v_it, where Ḡ_{n,ii} is the (i,i) entry of Ḡ_n, H_{n,ii} is the (i,i) entry of H_n, and Ḡ_n is a matrix transformed from G_n as defined in Appendix A.1 after (A.4). When the V_nt are normally distributed, Ω_{θ0,n} = 0 because μ4 − 3σ0⁴ = 0 for a normal distribution. Denote Σ_{θ0} as the limit of Σ_{θ0,nT} and Ω_{θ0} as the limit of Ω_{θ0,n}; then, the limiting variance matrix of (1/√(n(T−1))) ∂ ln L_{n,T}(θ0)/∂θ is equal to Σ_{θ0} + Ω_{θ0}. The asymptotic distribution of (1/√(n(T−1))) ∂ ln L_{n,T}(θ0)/∂θ can be derived from the central limit theorem for martingale difference arrays.[12] Denote C_n = G_n − (tr G_n / n) I_n and D_n = H_n − (tr H_n / n) I_n.
Assumption 8. The limit of (1/n²)[tr(C_n^s C_n^s) tr(D_n^s D_n^s) − tr²(C_n^s D_n^s)] is strictly positive.[13]

Assumption 8 is a condition for the nonsingularity of the limiting information matrix Σ_{θ0} (see Appendix A.3). When the limit of H_{nT} is singular, as long as the limit of (1/n²)[tr(C_n^s C_n^s) tr(D_n^s D_n^s) − tr²(C_n^s D_n^s)] is strictly positive, the limiting information matrix Σ_{θ0} is still nonsingular. Also, its rank does not change in a small neighborhood of θ0.[14]

Claim 2. Under Assumptions 1-6 and 7(a), or 1-6, 7(b) and 8, (1/√(n(T−1))) ∂ ln L_{n,T}(θ0)/∂θ →d N(0, Σ_{θ0} + Ω_{θ0}). When {v_it}, i = 1, 2, ..., n and t = 1, 2, ..., T, are normal, (1/√(n(T−1))) ∂ ln L_{n,T}(θ0)/∂θ →d N(0, Σ_{θ0}).

[12] When T is finite, we can use the central limit theorem in Kelejian and Prucha (2001). When T is large, we can use the central limit theorem in Yu et al. (2006).
[13] When n is finite and T is large, Assumption 8 is "(1/n²)[tr(C_n^s C_n^s) tr(D_n^s D_n^s) − tr²(C_n^s D_n^s)] > 0".
[14] See (C.10) in Yu et al. (2006) for the case where T is large. When T is finite, it still holds according to Lee (2004).
Proof. See Appendix A.5.

Also, under Assumptions 1-7, we have (1/(n(T−1))) ∂² ln L_{n,T}(θ)/∂θ∂θ' − (1/(n(T−1))) ∂² ln L_{n,T}(θ0)/∂θ∂θ' = ||θ − θ0|| · O_p(1) and (1/(n(T−1))) ∂² ln L_{n,T}(θ0)/∂θ∂θ' − ∂²Q_{n,T}(θ0)/∂θ∂θ' = O_p(1/√(n(T−1))).[15] Combined with Claim 2, we have the following theorem for the distribution of θ̂_nT.

Theorem 2. Under Assumptions 1-6 and 7(a), or 1-6, 7(b) and 8, for the extremum estimator θ̂_nT derived from (2.5),

√(n(T−1)) (θ̂_nT − θ0) →d N(0, Σ_{θ0}^(−1)(Σ_{θ0} + Ω_{θ0})Σ_{θ0}^(−1)).  (2.7)

Additionally, if {v_it}, i = 1, 2, ..., n and t = 1, 2, ..., T, are normal, (2.7) becomes √(n(T−1)) (θ̂_nT − θ0) →d N(0, Σ_{θ0}^(−1)).

Proof. See Appendix A.6.

Hence, after the data transformation to eliminate the individual effects, the QMLE is consistent and asymptotically normal when either n or T is large.
3 The Direct Approach
For the estimation of the linear panel regression model with fixed individual effects, the ML approach which estimates the fixed effects directly provides consistent estimates of the regression coefficients, which are known as the within estimates. For the spatial panel model with fixed individual effects, one may wonder whether or not the ML approach will yield consistent estimates when T is small. As we will see below, this direct approach will yield the same consistent estimator as the transformation approach for ζ0 = (β0', λ0, ρ0)'; however, the estimator of σ0² is inconsistent unless T is large.
3.1 The Likelihood Function
The likelihood function for the model before transformation, (2.1), is

ln L^d_{n,T}(θ, c_n) = −(nT/2) ln 2π − (nT/2) ln σ² + T[ln |S_n(λ)| + ln |R_n(ρ)|] − (1/(2σ²)) Σ_{t=1}^{T} V'_nt(ζ) V_nt(ζ),  (3.1)

where V_nt(ζ) = R_n(ρ)[S_n(λ) Y_nt − X_nt β − c_n]. We can estimate c_n directly and conduct the asymptotic analysis of the estimator of θ0 via the concentrated likelihood function.

Using the first order condition ∂ ln L^d_{n,T}(θ, c_n)/∂c_n = (1/σ²) R'_n(ρ) Σ_{t=1}^{T} V_nt(ζ), we have ĉ_nT(ζ) = (1/T) Σ_{t=1}^{T} (S_n(λ) Y_nt − X_nt β), and the concentrated likelihood is

ln L^d_{n,T}(θ) = −(nT/2) ln 2π − (nT/2) ln σ² + T[ln |S_n(λ)| + ln |R_n(ρ)|] − (1/(2σ²)) Σ_{t=1}^{T} Ṽ'_nt(ζ) Ṽ_nt(ζ),  (3.2)

[15] See (C.7) and (C.8) in Yu et al. (2006) for the case where T is large. When T is finite, it still holds according to Lee (2004).
with Ṽ_nt(ζ) being the same as in (2.5). One may compare the concentrated likelihood function in (3.2) with the likelihood function from the transformation approach in (2.5). We see that the difference is in the use of T in (3.2) but (T−1) in (2.5). For large T, the two functions can be very close to each other. Therefore, we may expect that the estimates of ζ0 from these two approaches could be asymptotically equivalent when T is large. The interesting comparison is for the case where T is finite.
3.2 Asymptotic Properties
For (3.2), we can further concentrate out β and σ² and focus on (λ, ρ). Denote

β̂^d_nT(λ, ρ) = [Σ_{t=1}^{T} X̃'_nt R'_n(ρ) R_n(ρ) X̃_nt]^(−1) [Σ_{t=1}^{T} X̃'_nt R'_n(ρ) R_n(ρ) S_n(λ) Ỹ_nt],

σ̂^d2_nT(λ, ρ) = (1/(nT)) Σ_{t=1}^{T} [S_n(λ) Ỹ_nt − X̃_nt β̂^d_nT(λ, ρ)]' R'_n(ρ) R_n(ρ) [S_n(λ) Ỹ_nt − X̃_nt β̂^d_nT(λ, ρ)].

The concentrated log likelihood function of (λ, ρ) from the direct approach is

ln L^d_{n,T}(λ, ρ) = −(nT/2)(ln 2π + 1) − (nT/2) ln σ̂^d2_nT(λ, ρ) + T[ln |S_n(λ)| + ln |R_n(ρ)|],  (3.3)

and its counterpart from the transformation approach, obtained by concentrating (2.5), is

ln L_{n,T}(λ, ρ) = −(n(T−1)/2)(ln 2π + 1) − (n(T−1)/2) ln σ̂²_nT(λ, ρ) + (T−1)[ln |S_n(λ)| + ln |R_n(ρ)|].  (3.4)

By comparing (3.3) and (3.4), we can see that they will yield the same maximizer (λ̂_nT, ρ̂_nT). As β̂^d_nT(λ, ρ) has the same expression as β̂_nT(λ, ρ), we can conclude that the QMLE of ζ0 = (β0', λ0, ρ0)' from this direct approach will yield the same consistent estimate as the transformation approach. However, the estimation of σ0² from the direct approach will not be consistent unless T is large, which can be seen from σ̂^d2_nT(λ, ρ) and σ̂²_nT(λ, ρ).[16]
Hence, the ML estimation of the spatial panel model with fixed individual effects shares some common features in its estimates with those of the ML estimation of the linear panel regression model with fixed effects.[17]

[16] Note that, for the linear panel regression model with fixed effects, while the within estimates of the regression coefficients are consistent, the corresponding MLE of σ0² is not, which is the consequence of the incidental parameters problem (Neyman and Scott, 1948).
[17] As the bias of the direct estimate of σ0² is due to the degrees of freedom being (T−1) instead of T, one may easily correct the biased estimate to a bias corrected estimate. The bias corrected estimator will become the conditional likelihood estimator in this model.
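The degrees-of-freedom nature of this bias can be seen in the simplest special case, the within regression with λ0 = ρ0 = 0. In the sketch below (our own illustration, with simulated data), the direct MLE divides the within residual sum of squares by nT rather than n(T−1), so rescaling by T/(T−1) recovers the transformation-approach estimate exactly:

```python
import numpy as np

# Sketch: with lambda0 = rho0 = 0 and no regressors, the model reduces to
# y_it = c_i + v_it. The direct MLE of sigma0^2 divides the within RSS by
# nT; the transformation (conditional likelihood) estimator divides the
# same RSS by n(T-1). Their ratio is exactly (T-1)/T, so multiplying the
# direct estimate by T/(T-1) removes the degrees-of-freedom bias.
rng = np.random.default_rng(3)
n, T = 200, 3
c = rng.standard_normal((n, 1))            # individual effects
Y = c + rng.standard_normal((n, T))        # sigma0^2 = 1
rss = np.sum((Y - Y.mean(axis=1, keepdims=True)) ** 2)
sigma2_direct = rss / (n * T)              # plim (T-1)/T * sigma0^2 as n grows
sigma2_transf = rss / (n * (T - 1))        # consistent as n grows
print(np.isclose(sigma2_direct * T / (T - 1), sigma2_transf))
```

With T = 3, the direct estimate converges to 2/3 of the true variance, which is why the bias does not vanish unless T is large.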
4 A General Model With Time Effects: Transformation Approach
Both Baltagi et al. (2003) and Kapoor et al. (2007) focus on models with only individual effects. In the panel data literature, there are also two-way error component regression models where we have not only unobservable individual effects but also unobservable time effects (see Wallace and Hussain (1969), Amemiya (1971), Nerlove (1971) and Hahn and Moon (2006), etc.). Hence, it is natural to generalize the model to include both individual effects and time effects. This would be useful for empirical applications where the time dummy effects might be important and should be taken into account, for example, in growth theory and regional economics (see Ertur and Koch (2007) and Foote (2007) for recent empirical applications of panel data models with both time dummy effects and spatial effects). Hence, we generalize (2.1) to

Y_nt = λ0 W_n Y_nt + X_nt β0 + c_n0 + α_t0 l_n + U_nt,  U_nt = ρ0 M_n U_nt + V_nt,  t = 1, 2, ..., T,  (4.1)

where α_t0 is the fixed time effect. For (4.1), we may first eliminate the individual effects by F_{T,T−1}, similar to (2.3), which yields

Y*_nt = λ0 W_n Y*_nt + X*_nt β0 + α*_t l_n + U*_nt,  U*_nt = ρ0 M_n U*_nt + V*_nt,  t = 1, 2, ..., T−1,  (4.2)

where [α*_1 l_n, α*_2 l_n, ..., α*_{T−1} l_n] = [α_1 l_n, α_2 l_n, ..., α_T l_n] F_{T,T−1} can be considered as the transformed time effects. We can make a further transformation of (4.2) to eliminate the transformed time effects. This further transformation approach is investigated in this section. Alternatively, we can estimate the α*_t directly; Section 5 covers the direct approach where we estimate the transformed time effects directly. Furthermore, we might be interested in investigating the estimators when we estimate both time effects and individual effects directly. This is also discussed in Section 5.
4.1 Data Transformation and the Likelihood Function
To eliminate the time dummy effects, we need W_n and M_n to be row normalized for analytical purposes.[18] Also, Assumption 4 is changed accordingly. Let J_n = I_n − (1/n) l_n l_n' be the deviation from the group mean transformation over spatial units.

Assumption 1'. W_n and M_n are row normalized nonstochastic spatial weights matrices.

Assumption 4'. The elements of X_nt are nonstochastic and bounded, uniformly in n and t. Also, under the setting in Assumption 6, the limit of (1/(nT)) Σ_{t=1}^{T} X̃'_nt J_n X̃_nt exists and is nonsingular.

Let [F_{n,n−1}, l_n/√n] be the orthonormal matrix of eigenvectors of J_n, where F_{n,n−1} corresponds to the eigenvalues of one and l_n/√n corresponds to the eigenvalue zero. Similar to Lee and Yu (2007a), we can transform the n-dimensional vector Y*_nt to an (n−1)-dimensional vector Y**_nt such that Y**_nt = F'_{n,n−1} Y*_nt.

[18] When W_n and M_n are not row normalized, we can still eliminate the transformed time effects; however, we will not have the presentation of (4.3).
Hence, (4.2) will be transformed into

Y**_nt = λ0 (F'_{n,n−1} W_n F_{n,n−1}) Y**_nt + X**_nt β0 + U**_nt,  U**_nt = ρ0 (F'_{n,n−1} M_n F_{n,n−1}) U**_nt + V**_nt,  (4.3)

where X**_nt,k = F'_{n,n−1} X*_nt,k and V**_nt = F'_{n,n−1} V*_nt. After the transformations, the effective sample size is now (n−1)(T−1). Because

(V**'_n1, ..., V**'_n,T−1)' = (I_{T−1} ⊗ F'_{n,n−1})(V*'_n1, ..., V*'_n,T−1)' = (I_{T−1} ⊗ F'_{n,n−1})(F'_{T,T−1} ⊗ I_n)(V'_n1, ..., V'_nT)' = (F'_{T,T−1} ⊗ F'_{n,n−1})(V'_n1, ..., V'_nT)',

we have

E[(V**'_n1, ..., V**'_n,T−1)'(V**'_n1, ..., V**'_n,T−1)] = σ0² (F'_{T,T−1} ⊗ F'_{n,n−1})(F_{T,T−1} ⊗ F_{n,n−1}) = σ0² (I_{T−1} ⊗ I_{n−1}) = σ0² I_{(n−1)(T−1)}.

Hence, the v**_it's are uncorrelated for all i and t (and independent under normality), where v**_it is the ith element of V**_nt.
The likelihood function for (4.3) is
$$\begin{aligned}\ln L_{n,T}(\theta) = {} & -\frac{(n-1)(T-1)}{2}\ln 2\pi - \frac{(n-1)(T-1)}{2}\ln\sigma^2 + (T-1)\ln\left|I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1}\right| \\ & + (T-1)\ln\left|I_{n-1} - \rho F_{n,n-1}' M_n F_{n,n-1}\right| - \frac{1}{2\sigma^2}\sum_{t=1}^{T-1} V_{nt}^{**\prime}(\theta)V_{nt}^{**}(\theta), \end{aligned} \tag{4.4}$$
where $V_{nt}^{**}(\theta) = R_n^{*}(\rho)[(I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1})Y_{nt}^{**} - X_{nt}^{**}\beta]$, $R_n^{*}(\rho) = I_{n-1} - \rho F_{n,n-1}' M_n F_{n,n-1}$, and the determinant and inverse of $(I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1})$ are
$$\left|I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1}\right| = \frac{1}{1-\lambda}\left|I_n - \lambda W_n\right|, \qquad (I_{n-1} - \lambda F_{n,n-1}' W_n F_{n,n-1})^{-1} = F_{n,n-1}'(I_n - \lambda W_n)^{-1}F_{n,n-1},$$
and similarly for $(I_{n-1} - \rho F_{n,n-1}' M_n F_{n,n-1})$ (see Lee and Yu (2007a)). For any $n$-dimensional column vectors $p_{nt}$ and $q_{nt}$, as $J_n(p_{n1}, \cdots, p_{nT})J_T = J_n(\tilde p_{n1}, \cdots, \tilde p_{nT})$, the likelihood function (4.4) is numerically identical to
$$\begin{aligned}\ln L_{n,T}(\theta) = {} & -\frac{(n-1)(T-1)}{2}\ln 2\pi - \frac{(n-1)(T-1)}{2}\ln\sigma^2 - (T-1)\ln(1-\lambda) - (T-1)\ln(1-\rho) \\ & + (T-1)\ln|S_n(\lambda)| + (T-1)\ln|R_n(\rho)| - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta), \end{aligned} \tag{4.5}$$
where $\tilde V_{nt}(\theta) = R_n(\rho)[(I_n - \lambda W_n)\tilde Y_{nt} - \tilde X_{nt}\beta]$.^{19}

^{19} We note that this likelihood function is, in general, not necessarily a conditional likelihood, as the sample average over spatial units at each $t$ might not be a sufficient statistic for the time dummy.
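The two algebraic facts used in this transformation are easy to verify numerically. The following sketch (our own illustration, not code from the paper) builds $J_n$ for a small row-normalized $W_n$, extracts $F_{n,n-1}$ from its eigenvectors, and checks the determinant identity $|I_{n-1} - \lambda F_{n,n-1}'W_nF_{n,n-1}| = \frac{1}{1-\lambda}|I_n - \lambda W_n|$:

```python
import numpy as np

# A sketch (ours, not the authors' code) verifying the facts above for a small
# row-normalized W_n: the eigenvalue-one eigenvectors of J_n form F_{n,n-1},
# and |I_{n-1} - lam F' W F| = |I_n - lam W| / (1 - lam).
n, lam = 4, 0.3
rng = np.random.default_rng(0)
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)       # row normalization (Assumption 1')
l = np.ones((n, 1))
J = np.eye(n) - l @ l.T / n             # J_n: eigenvalue one (n-1 times), zero (once)
vals, vecs = np.linalg.eigh(J)
F = vecs[:, np.isclose(vals, 1.0)]      # F_{n,n-1}

assert np.allclose(F.T @ F, np.eye(n - 1))   # orthonormal columns
assert np.allclose(F.T @ l, 0.0)             # orthogonal to l_n
lhs = np.linalg.det(np.eye(n - 1) - lam * F.T @ W @ F)
rhs = np.linalg.det(np.eye(n) - lam * W) / (1 - lam)
assert np.isclose(lhs, rhs)
```

The identity holds because $W_n l_n = l_n$ under row normalization, so in the orthonormal basis $(F_{n,n-1}, l_n/\sqrt{n})$ the matrix $I_n - \lambda W_n$ becomes block lower-triangular with scalar block $1-\lambda$.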
4.2 Asymptotic Properties

The first and second order derivatives of (4.5) are (C.1) and (C.2) in Appendix C.1. From (C.1) and (C.2), the score is in (C.3) and the information matrix $\Sigma_{\theta_0,nT} = -\mathrm{E}\left[\frac{1}{(n-1)(T-1)}\frac{\partial^2 \ln L_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'}\right]$ is in (C.4). The following assumptions provide conditions for global identification. Denote
$$H_{nT}(\rho) = \frac{1}{(n-1)(T-1)}\sum_{t=1}^{T}(\tilde X_{nt},\, G_n\tilde X_{nt}\beta_0)' R_n'(\rho) J_n R_n(\rho) (\tilde X_{nt},\, G_n\tilde X_{nt}\beta_0),$$
$$\sigma_n^2(\rho) = \frac{\sigma_0^2}{n-1}\operatorname{tr}\!\left[(R_n(\rho)R_n^{-1})' J_n (R_n(\rho)R_n^{-1})\right],$$
$$\sigma_n^2(\lambda,\rho) = \frac{\sigma_0^2}{n-1}\operatorname{tr}\!\left[(R_n(\rho)S_n(\lambda)S_n^{-1}R_n^{-1})' J_n (R_n(\rho)S_n(\lambda)S_n^{-1}R_n^{-1})\right].$$
Assumption 7′. Either (a) the limit of $H_{nT}(\rho)$ is nonsingular for each possible $\rho$ in $P$ and the limit of $\frac{1}{n-1}\ln\left|\sigma_0^2 R_n^{-1\prime}J_nR_n^{-1}\right| - \frac{1}{n-1}\ln\left|\sigma_n^2(\rho)R_n^{-1}(\rho)'J_nR_n^{-1}(\rho)\right|$ is not zero^{20} for $\rho \ne \rho_0$; or (b) the limit of $\frac{1}{n-1}\ln\left|\sigma_0^2 R_n^{-1\prime}S_n^{-1\prime}J_nS_n^{-1}R_n^{-1}\right| - \frac{1}{n-1}\ln\left|\sigma_n^2(\lambda,\rho)R_n^{-1}(\rho)'S_n^{-1}(\lambda)'J_nS_n^{-1}(\lambda)R_n^{-1}(\rho)\right|$ is not zero for $(\lambda,\rho) \ne (\lambda_0,\rho_0)$.^{21}
Assumption 8′. The limit of $\frac{1}{(n-1)^2}\left[\operatorname{tr}(C_n^sC_n^s)\operatorname{tr}(D_n^sD_n^s) - \operatorname{tr}^2(C_n^sD_n^s)\right]$ is strictly positive, where $C_n = J_nG_n - \frac{\operatorname{tr}J_nG_n}{n-1}I_n$ and $D_n = J_nH_n - \frac{\operatorname{tr}J_nH_n}{n-1}I_n$.^{22}
The variance matrix of $\frac{1}{\sqrt{(n-1)(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta}$ is equal to
$$\mathrm{E}\left[\frac{1}{(n-1)(T-1)}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta}\cdot\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta'}\right] = \Sigma_{\theta_0,nT} + \Omega_{\theta_0,n},$$
where (with entries marked $*$ filled in by symmetry)
$$\Omega_{\theta_0,n} = \frac{\mu_4 - 3\sigma_0^4}{\sigma_0^4}\begin{pmatrix}
0_{k_X\times k_X} & * & * & * \\
0_{1\times k_X} & \frac{1}{n-1}\sum_{i=1}^{n}[(J_n\bar G_n)_{ii}]^2 & * & * \\
0_{1\times k_X} & \frac{1}{n-1}\sum_{i=1}^{n}[(J_n\bar G_n)_{ii}(J_nH_n)_{ii}] & \frac{1}{n-1}\sum_{i=1}^{n}[(J_nH_n)_{ii}]^2 & * \\
0_{1\times k_X} & \frac{1}{2\sigma_0^2(n-1)}\operatorname{tr}(J_n\bar G_n) & \frac{1}{2\sigma_0^2(n-1)}\operatorname{tr}(J_nH_n) & \frac{1}{4\sigma_0^4}
\end{pmatrix}.$$
The asymptotics of the transformation approach with both time and individual effects eliminated can be obtained similarly to Theorem 2.

Theorem 3 Under Assumptions 1′, 2, 3, 4′, 5, 6 and 7′(a); or 1′, 2, 3, 4′, 5, 6, 7′(b) and 8′, for the extremum estimator $\hat\theta_{nT}$ derived from (4.5),
$$\sqrt{(n-1)(T-1)}(\hat\theta_{nT} - \theta_0) \xrightarrow{d} N\!\left(0,\ \Sigma_{\theta_0}^{-1}(\Sigma_{\theta_0} + \Omega_{\theta_0})\Sigma_{\theta_0}^{-1}\right). \tag{4.6}$$
^{20} When $n$ is finite and $T$ is large, this inequality becomes $\frac{1}{n-1}\ln\left|\sigma_0^2 R_n^{-1\prime}J_nR_n^{-1}\right| - \frac{1}{n-1}\ln\left|\sigma_n^2(\rho)R_n^{-1}(\rho)'J_nR_n^{-1}(\rho)\right| \ne 0$.

^{21} The inequality will be $\frac{1}{n-1}\ln\left|\sigma_0^2 R_n^{-1\prime}S_n^{-1\prime}J_nS_n^{-1}R_n^{-1}\right| - \frac{1}{n-1}\ln\left|\sigma_n^2(\lambda,\rho)R_n^{-1}(\rho)'S_n^{-1}(\lambda)'J_nS_n^{-1}(\lambda)R_n^{-1}(\rho)\right| \ne 0$ when $n$ is finite and $T$ is large. When $M_n = W_n$ and $\lambda_0 \ne \rho_0$, this condition would not be satisfied, as $(\lambda_0,\rho_0)$ and $(\rho_0,\lambda_0)$ could not be distinguished from each other. Identification will rely on either Assumption 7′(a) or extra information on the orders of magnitude of $\lambda_0$ and $\rho_0$.

^{22} When $n$ is finite and $T$ is large, Assumption 8′ is "$\frac{1}{(n-1)^2}\left[\operatorname{tr}(C_n^sC_n^s)\operatorname{tr}(D_n^sD_n^s) - \operatorname{tr}^2(C_n^sD_n^s)\right] > 0$".
Additionally, if $\{v_{it}\}$, $i = 1, 2, \dots, n$ and $t = 1, 2, \dots, T$, are normal, (4.6) becomes
$$\sqrt{(n-1)(T-1)}(\hat\theta_{nT} - \theta_0) \xrightarrow{d} N(0,\ \Sigma_{\theta_0}^{-1}).$$
Proof. See Appendix C.2.

Hence, after the data transformation to eliminate both the individual effects and time effects, the QMLE is consistent and asymptotically normal when either $n$ or $T$ is large.
5 A General Model With Time Effects: Direct Approaches

5.1 Direct Approach I: Estimation of Transformed Time Effects

Given (4.2), where the individual effects are eliminated and the time effects are still present, when $n \to \infty$ and $T$ might be finite or large, we can estimate the transformed time effects consistently. Denote $\alpha_T^* = (\alpha_1^*, \alpha_2^*, \cdots, \alpha_T^*)$; the likelihood function for (4.2) is
$$\ln L_{n,T}^d(\theta, \alpha_T^*) = -\frac{n(T-1)}{2}\ln 2\pi - \frac{n(T-1)}{2}\ln\sigma^2 + (T-1)\left[\ln|S_n(\lambda)| + \ln|R_n(\rho)|\right] - \frac{1}{2\sigma^2}\sum_{t=1}^{T-1} V_{nt}^{*\prime}(\theta,\alpha_T^*)V_{nt}^{*}(\theta,\alpha_T^*), \tag{5.1}$$
where $V_{nt}^{*}(\theta,\alpha_T^*) = R_n(\rho)[S_n(\lambda)Y_{nt}^{*} - X_{nt}^{*}\beta - \alpha_t^* l_n]$. By using the first order condition, given $\theta$, the estimate of $\alpha_t^*$ is $\hat\alpha_t^*(\theta) = (l_n'R_n'(\rho)R_n(\rho)l_n)^{-1} l_n'R_n'(\rho)R_n(\rho)(S_n(\lambda)Y_{nt}^{*} - X_{nt}^{*}\beta)$. Using $R_n(\rho)l_n = (1-\rho)l_n$, the likelihood function with $\alpha_T^*$ concentrated out is
$$\ln L_{n,T}^d(\theta) = -\frac{n(T-1)}{2}\ln 2\pi - \frac{n(T-1)}{2}\ln\sigma^2 + (T-1)\left[\ln|S_n(\lambda)| + \ln|R_n(\rho)|\right] - \frac{1}{2\sigma^2}\sum_{t=1}^{T-1} V_{nt}^{*\prime}(\theta)J_nV_{nt}^{*}(\theta), \tag{5.2}$$
where $V_{nt}^{*}(\theta) = R_n(\rho)[S_n(\lambda)Y_{nt}^{*} - X_{nt}^{*}\beta]$. For any $n$-dimensional column vectors $p_{nt}$ and $q_{nt}$, as $\sum_{t=1}^{T-1} p_{nt}^{*\prime}J_nq_{nt}^{*} = \sum_{t=1}^{T}\tilde p_{nt}'J_n\tilde q_{nt}$, the likelihood function (5.2) is numerically identical to
$$\ln L_{n,T}^d(\theta) = -\frac{n(T-1)}{2}\ln 2\pi - \frac{n(T-1)}{2}\ln\sigma^2 + (T-1)\left[\ln|S_n(\lambda)| + \ln|R_n(\rho)|\right] - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta). \tag{5.3}$$
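The concentration step can be illustrated numerically. The sketch below (our own small example, not the paper's code) checks that, because $M_nl_n = l_n$ for a row-normalized $M_n$ and hence $R_n(\rho)l_n = (1-\rho)l_n$, concentrating out $\alpha_t^*$ amounts to demeaning $R_n(\rho)[S_n(\lambda)Y_{nt}^* - X_{nt}^*\beta]$ with $J_n$:

```python
import numpy as np

# A sketch (ours): concentrating out the time effect alpha*_t is the same as
# applying J_n, since R_n(rho) l_n = (1 - rho) l_n for row-normalized M_n.
n, rho = 5, 0.4
rng = np.random.default_rng(1)
M = rng.random((n, n))
np.fill_diagonal(M, 0.0)
M /= M.sum(axis=1, keepdims=True)        # row normalization
l = np.ones((n, 1))
R = np.eye(n) - rho * M
assert np.allclose(R @ l, (1 - rho) * l)

z = rng.normal(size=(n, 1))              # stands in for S_n(lam) Y*_nt - X*_nt beta
Rl = R @ l
alpha = (Rl.T @ R @ z).item() / (Rl.T @ Rl).item()   # concentrated alpha*_t(theta)
resid = R @ (z - alpha * l)              # V*_nt at the concentrated alpha*_t
J = np.eye(n) - np.ones((n, n)) / n
assert np.allclose(resid, J @ R @ z)     # identical to demeaning R_n(rho) z with J_n
```

This is exactly why the concentrated likelihood (5.2) involves $V_{nt}^{*\prime}(\theta)J_nV_{nt}^{*}(\theta)$ rather than the unconcentrated quadratic form.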
For the concentrated likelihood function (5.3), the first and second order derivatives are in (D.1) and (D.2) in Appendix D.1.

From Sections 2 and 3, we can see that for the SAR panel data model with only individual effects, both the transformation approach and the direct approach yield the same consistent estimator of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$. But the direct approach does not yield a consistent estimator of $\sigma_0^2$ as the transformation approach does, unless $T$ is large. However, for the SAR panel model with both individual and time effects, this direct approach will not yield any consistent estimator unless $n$ is large.
For the SAR panel data with both individual and time effects, one can see the difference between the two approaches via their log likelihood functions in (4.5) and (5.3). For the direct approach, the concentrated likelihood (5.3) does not adjust the degrees of freedom in the spatial dimension $n$, and also does not adjust the components on the determinants of $S_n(\lambda)$ and $R_n(\rho)$, while the likelihood of the transformation approach in (4.5) does. These differences result in inconsistent estimates of $\lambda_0$ and $\rho_0$, in addition to that of $\sigma_0^2$. Because the estimate of $\beta_0$ depends on the estimates of $\lambda_0$ and $\rho_0$, it is also inconsistent. To be convincing, the inconsistency of the QMLE with a finite (small) $n$ can be revealed by investigating the probability limit of the normalized gradient vector $\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\theta}$ from (D.1) and comparing it with $\frac{1}{(n-1)(T-1)}\frac{\partial\ln L_{nT}(\theta_0)}{\partial\theta}$ from (C.1) of the transformation approach. As the one from (C.1) is zero because the transformation approach is consistent, the differences are in the derivatives with respect to $\lambda$, $\rho$ and $\sigma^2$. For simplicity, let plim and lim denote limits taken as at least one of $n$ and $T$ goes to infinity. We have
$$\operatorname{plim}\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\lambda} = -\frac{1}{1-\lambda_0}\lim\frac{1}{n},\qquad \operatorname{plim}\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\rho} = -\frac{1}{1-\rho_0}\lim\frac{1}{n},$$
$$\operatorname{plim}\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\sigma^2} = -\frac{1}{2\sigma_0^2}\lim\frac{1}{n}.$$
These three limits are, in general, not zero unless $n$ is large. When $n$ is finite, $\theta_0$ does not solve the equation $\operatorname{plim}\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta)}{\partial\theta} = 0$. The estimator $\hat\theta_{ml}$ which maximizes the concentrated log likelihood $\ln L_{n,T}^d(\theta)$ solves the normal equation $\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\hat\theta_{ml})}{\partial\theta} = 0$. From the asymptotic theory of extremum (or M-) estimation, $\hat\theta_{ml}$ converges in probability to a $\theta_1$ which solves the limiting equation $\operatorname{plim}\frac{1}{n(T-1)}\frac{\partial\ln L_{n,T}^d(\theta_1)}{\partial\theta} = 0$ (see, e.g., Amemiya (1985), Ch. 4). But $\theta_1 \ne \theta_0$, so the estimates from the concentrated likelihood function are not consistent unless $n$ is large. Compared to Section 3, with time effects included, the direct approach does not give a consistent estimate of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$ when $n$ is finite ($T$ goes to infinity).
5.2 Direct Approach II: Estimation of Both Time and Individual Effects

We can also estimate both the time effects and individual effects directly for (4.1). The likelihood function of (4.1) is
$$\ln L_{n,T}^d(\theta, c_n, \alpha_T) = -\frac{nT}{2}\ln 2\pi - \frac{nT}{2}\ln\sigma^2 + T\left[\ln|S_n(\lambda)| + \ln|R_n(\rho)|\right] - \frac{1}{2\sigma^2}\sum_{t=1}^{T} V_{nt}'(\theta,c_n,\alpha_T)V_{nt}(\theta,c_n,\alpha_T), \tag{5.4}$$
where $V_{nt}(\theta,c_n,\alpha_T) = R_n(\rho)[S_n(\lambda)Y_{nt} - X_{nt}\beta - c_n - \alpha_t l_n]$. Using the first order conditions for $\alpha_t$ and $c_n$, the likelihood function with both $c_n$ and $\alpha_T$ concentrated out is
$$\ln L_{n,T}^d(\theta) = -\frac{nT}{2}\ln 2\pi - \frac{nT}{2}\ln\sigma^2 + T\left[\ln|S_n(\lambda)| + \ln|R_n(\rho)|\right] - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta). \tag{5.5}$$
For (5.5), the first and second order derivatives are, respectively, (D.5) and (D.6) in Appendix D.2.

The concentrated likelihood estimates of $\theta_0$ from (5.5) can be derived from the first order conditions which set the first order derivatives in (D.5) to zero. These first order conditions characterize the concentrated likelihood estimates of the direct approach. Denote these estimates as $\hat\beta_{nd}$, $\hat\lambda_{nd}$, $\hat\rho_{nd}$ and $\hat\sigma_{nd}^2$. For the direct estimation of the transformed time effects in Section 5.1, the estimates, denoted by $\tilde\beta_{nd}$, $\tilde\lambda_{nd}$, $\tilde\rho_{nd}$ and $\tilde\sigma_{nd}^2$, are characterized by the first order conditions with (D.1). We see that these two sets of first order conditions are the same except that the parameter $\sigma^2$ in (D.5) is replaced by $\frac{T-1}{T}\sigma^2$ in (D.1).^{23} Thus, it follows that $(\tilde\beta_{nd}', \tilde\lambda_{nd}, \tilde\rho_{nd}) = (\hat\beta_{nd}', \hat\lambda_{nd}, \hat\rho_{nd})$ and $\tilde\sigma_{nd}^2 = \frac{T-1}{T}\hat\sigma_{nd}^2$. From Section 5.1, the direct estimation of transformed time effects yields inconsistent estimators for all the parameters unless $n$ is large. If we estimate both the time effects and individual effects directly, the consistency of $\delta_0$ requires that $n$ is large, and the consistency of $\sigma_0^2$ requires that both $n$ and $T$ are large.^{24}
6 Monte Carlo

We conduct a small Monte Carlo experiment to evaluate the performance of our transformation approach and the direct ML estimators under different settings. We first check the case where there are individual effects but no time effects in the DGP (see (2.1)), comparing the transformation approach in Section 2 with the direct approach in Section 3. Then, we check the case where time effects are also included in the DGP (see (4.1)), comparing the transformation approach in Section 4 with the direct approaches in Section 5.

We first generate samples from (2.1):
$$Y_{nt} = \lambda_0 W_n Y_{nt} + X_{nt}\beta_0 + c_{n0} + U_{nt},\qquad U_{nt} = \rho_0 M_n U_{nt} + V_{nt},\qquad t = 1, 2, \dots, T,$$
using $\theta_0^a = (1.0, 0.2, 0.5, 1)'$ and $\theta_0^b = (1, 0.5, 0.2, 1)'$, where $\theta_0 = (\beta_0', \lambda_0, \rho_0, \sigma_0^2)'$; $X_{nt}$, $c_{n0}$ and $V_{nt}$ are generated from independent standard normal distributions, and the spatial weights matrices $W_n$ and $M_n$ are the same rook matrices.^{25} We use $T = 5, 10, 50$ and $n = 9, 16, 49$. For each set of generated sample observations, we calculate the ML estimator $\hat\theta_{nT}$ and evaluate the bias $\hat\theta_{nT} - \theta_0$. We do this 1000 times to get $\frac{1}{1000}\sum_{i=1}^{1000}(\hat\theta_{nT} - \theta_0)_i$. With two different values of $\theta_0$ for each $n$ and $T$, finite sample properties of both estimators are summarized in Table 1. For each case, we report the bias (Bias), empirical standard deviation (E-SD), root mean square error (RMSE) and theoretical standard deviation (T-SD).^{26}

Both approaches have the same estimate of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$, while the estimator of $\sigma_0^2$ by the direct approach has a larger bias. The transformation approach yields a consistent estimator of $\sigma_0^2$ and the direct approach does not, which can be seen from the last two columns of Table 1 when $T$ is small. We can see that the Biases, E-SDs, RMSEs and T-SDs for the estimators of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$ are small when either $n$ or $T$ is large. Also, the T-SDs are similar to the E-SDs, which implies that the Hessian matrix provides proper estimates of the variances of the estimators. Also, when $T$ is larger, the bias of the estimator of $\sigma_0^2$ by the direct approach decreases.

^{23} Instead of the first order conditions, one may also follow the analysis in Section 3 by investigating the two concentrated likelihood functions of $(\lambda, \rho)$, concentrating out $\beta$ and $\sigma^2$.

^{24} For this direct approach, the asymptotic bias will be of the order $O(\max(1/n, 1/T))$, and we can have bias corrected estimators which have centered normal distributions as long as $n/T^3 \to 0$ and $T/n^3 \to 0$. See Appendix D.2 for more details.

^{25} We use the rook matrix based on an $r \times r$ board (so that $n = r^2$). The rook matrix represents a square tessellation with a connectivity of four for the inner fields on the chessboard and two and three for the corner and border fields, respectively. Most empirically observed regional structures in spatial econometrics are made up of regions with connectivity close to the range of the rook tessellation.

^{26} The T-SD is obtained from the diagonal elements of the estimated Hessian matrix.
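As an illustration of this design (our own sketch, not the authors' code), the following builds the row-normalized rook matrix on an $r \times r$ board and draws one panel from the DGP (2.1); the parameter values mirror $\theta_0^a$ with a scalar regressor:

```python
import numpy as np

# A sketch (ours) of the Monte Carlo design: rook contiguity on an r x r board
# (inner cells have 4 neighbors, borders 3, corners 2), then one draw from
# Y_nt = lam W Y_nt + X_nt beta + c_n + U_nt,  U_nt = rho M U_nt + V_nt.
def rook_matrix(r):
    n = r * r
    W = np.zeros((n, n))
    for i in range(r):
        for j in range(r):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < r and 0 <= jj < r:
                    W[i * r + j, ii * r + jj] = 1.0
    return W / W.sum(axis=1, keepdims=True)   # row normalization

def simulate(W, M, beta, lam, rho, sig2, T, rng):
    n = W.shape[0]
    S_inv = np.linalg.inv(np.eye(n) - lam * W)
    R_inv = np.linalg.inv(np.eye(n) - rho * M)
    c = rng.standard_normal(n)                # individual effects c_n0
    X = rng.standard_normal((T, n))
    V = np.sqrt(sig2) * rng.standard_normal((T, n))
    Y = np.empty((T, n))
    for t in range(T):
        U = R_inv @ V[t]                      # SAR disturbance
        Y[t] = S_inv @ (X[t] * beta + c + U)  # reduced form
    return Y, X

W = rook_matrix(3)                            # n = 9, the smallest design
counts = (W > 0).sum(axis=1)
assert counts.min() == 2 and counts.max() == 4
Y, X = simulate(W, W, beta=1.0, lam=0.2, rho=0.5, sig2=1.0,
                T=5, rng=np.random.default_rng(0))
assert Y.shape == (5, 9) and X.shape == (5, 9)
```

The estimation step itself (maximizing (2.5) or (3.2) over $\theta$) is omitted here; the sketch only reproduces the data-generating side of the experiment.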
We then generate samples from (4.1):
$$Y_{nt} = \lambda_0 W_n Y_{nt} + X_{nt}\beta_0 + c_{n0} + \alpha_t l_n + U_{nt},\qquad U_{nt} = \rho_0 M_n U_{nt} + V_{nt},\qquad t = 1, 2, \dots, T,$$
using the same $n$, $T$, $\theta_0^a$, $\theta_0^b$, $W_n$ and $M_n$. The $X_{nt}$, $c_{n0}$, $\alpha_{T0} = (\alpha_1, \alpha_2, \cdots, \alpha_T)$ and $V_{nt}$ are generated from independent standard normal distributions. The finite sample properties of the estimators are summarized in Tables 2 and 3, where Table 2 reports the performance of the estimators using the transformation approach in Section 4 and Table 3 reports the performance of the estimators using both direct approaches discussed in Sections 5.1 and 5.2. We can see that the bias of the transformation approach is small. For the approach that estimates the transformed time effects directly, the bias is small when $n$ is large, while the bias is large when $n$ is small even though $T$ might be large. The direct approach has the same estimate of $\delta_0 = (\beta_0', \lambda_0, \rho_0)'$ as the approach that estimates the transformed time effects directly, while the bias for the estimate of $\sigma_0^2$ is small only when both $n$ and $T$ are large. This is consistent with the theoretical prediction. Also, when both $n$ and $T$ are large, the biases of all the parameters from the three approaches are small and the RMSEs are reduced.

Tables 1-3 here.
7 Conclusion

In this paper, we consider the estimation of a SAR panel model with fixed effects and SAR disturbances where the number of time periods $T$ and/or the number of spatial units $n$ can be finite or large in all combinations, except that both $T$ and $n$ are finite.

We first consider the SAR panel model with individual effects. If $T$ is finite but $n$ is large, we show that direct ML estimation, which jointly estimates all the parameters including the fixed effects, yields consistent estimators except for the variance of disturbances. These features are similar to those of the direct ML estimation of the linear panel regression model with fixed individual effects. In this paper, we suggest a transformation approach, which eliminates the individual fixed effects and provides consistent estimates of all the parameters, including the variance of disturbances. When the individual effects are eliminated by taking deviations from time averages for each spatial unit, the resulting disturbances are correlated over the time dimension and there is linear dependence among them. The transformation approach is motivated by an ML approach which takes into account the generalized inverse of the variance matrix of the resulting disturbances. The transformation approach is shown to be a conditional likelihood approach if the disturbances were normally distributed.

We next consider the SAR model with both individual and time fixed effects. We investigate two possible direct ML approaches for the estimation. The first direct approach transforms the data to eliminate the individual effects and then estimates the remaining parameters, including the time effects, by the ML method. The second direct approach estimates both individual and time effects directly. We show that the first direct ML approach yields inconsistent estimates of all the parameters unless $n$ is large, and the second direct approach yields consistent estimates of all the parameters only when both $n$ and $T$ are large. In fact, these two direct ML approaches provide identical estimates of the spatial effects and the regression coefficients, and differ only in their estimates of $\sigma_0^2$. These results stand in contrast to those of the direct ML estimation of panel regression models with both individual and time effects, where the regression coefficients can be consistently estimated as long as either $n$ or $T$ is large. Consistent estimation based on transformations is available, where both the individual and time effects can be eliminated by proper transformations. All the parameter estimates are then consistent when either $n$ or $T$ is large. Monte Carlo results are provided to illustrate finite sample properties of the various estimators with $n$ and/or $T$ being small or moderately large.

Compared with Baltagi et al. (2003), Baltagi et al. (2007) and Kapoor et al. (2007), where random effects are assumed, the SAR model in this paper considers a fixed effects specification. The proposed estimation methods are robust regardless of the different specifications in Baltagi et al. (2003) and Kapoor et al. (2007), and are computationally simpler than the ML approach for the estimation of the generalized random effects model in Baltagi et al. (2007). However, when the individual effects are random in the true DGP, proper methods which take into account the random effects' variance structure can improve the efficiency of the estimates. A Hausman-type specification test of fixed effects vs. random effects may also be constructed. These may be investigated in future research.
Appendices
A Transformation Approach

A.1 The First and Second Order Derivatives

For the first and second order derivatives of (2.5), we have
$$\frac{\partial\ln L_{n,T}(\theta)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'\tilde V_{nt}(\theta) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'\tilde V_{nt}(\theta) - (T-1)\operatorname{tr}G_n(\lambda) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(H_n(\rho)\tilde V_{nt}(\theta))'\tilde V_{nt}(\theta) - (T-1)\operatorname{tr}H_n(\rho) \\
\frac{1}{2\sigma^4}\sum_{t=1}^{T}\left(\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) - n\frac{T-1}{T}\sigma^2\right)
\end{pmatrix}, \tag{A.1}$$
and, with all sums running over $t = 1, \dots, T$ and entries marked $*$ filled in by symmetry,
$$-\frac{\partial^2\ln L_{n,T}(\theta)}{\partial\theta\,\partial\theta'} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'R_n(\rho)\tilde X_{nt} & * & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'R_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'R_n(\rho)W_n\tilde Y_{nt} + (T-1)\operatorname{tr}(G_n^2(\lambda)) & * & * \\
\frac{1}{\sigma^2}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'R_n(\rho)\tilde X_{nt} + \frac{1}{\sigma^2}\sum_t\tilde V_{nt}'(\theta)M_n\tilde X_{nt} & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'H_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_nW_n\tilde Y_{nt})'\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'H_n(\rho)\tilde V_{nt}(\theta) + (T-1)\operatorname{tr}(H_n^2(\rho)) & * \\
\frac{1}{\sigma^4}\sum_t\tilde V_{nt}'(\theta)R_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^4}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'\tilde V_{nt}(\theta) & \frac{1}{\sigma^4}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'\tilde V_{nt}(\theta) & -\frac{n(T-1)}{2\sigma^4} + \frac{1}{\sigma^6}\sum_t\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta)
\end{pmatrix}. \tag{A.2}$$
At the true $\theta_0$, we have
$$\frac{1}{\sqrt{n(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\bar X_{nt}'\tilde V_{nt} \\
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'\tilde V_{nt} + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\bar G_n'\tilde V_{nt} - \frac{T-1}{T}\sigma_0^2\operatorname{tr}\bar G_n\right) \\
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'H_n'\tilde V_{nt} - \frac{T-1}{T}\sigma_0^2\operatorname{tr}H_n\right) \\
\frac{1}{2\sigma_0^4}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\tilde V_{nt} - n\frac{T-1}{T}\sigma_0^2\right)
\end{pmatrix}, \tag{A.3}$$
and the information matrix is equal to
$$\Sigma_{\theta_0,nT} = -\mathrm{E}\left[\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'}\right] = \frac{1}{\sigma_0^2}\begin{pmatrix} H_{nT} & 0_{(k_X+1)\times 1} & 0_{(k_X+1)\times 1} \\ 0_{1\times(k_X+1)} & 0 & 0 \\ 0_{1\times(k_X+1)} & 0 & 0 \end{pmatrix} + \begin{pmatrix}
0_{k_X\times k_X} & * & * & * \\
0_{1\times k_X} & \frac{1}{n}\operatorname{tr}(\bar G_n^s\bar G_n) & * & * \\
0_{1\times k_X} & \frac{1}{n}\operatorname{tr}(H_n^s\bar G_n) & \frac{1}{n}\operatorname{tr}(H_n^sH_n) & * \\
0_{1\times k_X} & \frac{1}{\sigma_0^2 n}\operatorname{tr}(\bar G_n) & \frac{1}{\sigma_0^2 n}\operatorname{tr}(H_n) & \frac{1}{2\sigma_0^4}
\end{pmatrix}, \tag{A.4}$$
where we denote $A_n^s = A_n' + A_n$ for any $n\times n$ matrix $A_n$, $G_n = W_nS_n^{-1}$, $\bar W_n = R_nW_nR_n^{-1}$, $\bar G_n = \bar W_n(I_n - \lambda_0\bar W_n)^{-1}$, $H_n = M_nR_n^{-1}$, $\bar X_{nt} = R_n\tilde X_{nt}$ and $H_{nT} = \frac{1}{n(T-1)}\sum_{t=1}^{T}(\bar X_{nt},\, \bar G_n\bar X_{nt}\beta_0)'(\bar X_{nt},\, \bar G_n\bar X_{nt}\beta_0)$.
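As a quick numerical sanity check on these definitions (our own sketch, not part of the paper), note that $\bar W_n = R_nW_nR_n^{-1}$ implies $\bar G_n = R_nG_nR_n^{-1}$; the following verifies this for an arbitrary small pair of row-normalized weights matrices:

```python
import numpy as np

# A numerical check (ours): W_bar = R W R^{-1} implies
# G_bar = W_bar (I - lam0 W_bar)^{-1} = R G R^{-1}, with G = W S^{-1}.
n, lam0, rho0 = 4, 0.3, 0.2
rng = np.random.default_rng(2)
W = rng.random((n, n)); np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)
M = rng.random((n, n)); np.fill_diagonal(M, 0.0)
M /= M.sum(axis=1, keepdims=True)
I = np.eye(n)
S, R = I - lam0 * W, I - rho0 * M
G = W @ np.linalg.inv(S)                     # G_n = W_n S_n^{-1}
W_bar = R @ W @ np.linalg.inv(R)             # W_bar = R_n W_n R_n^{-1}
G_bar = W_bar @ np.linalg.inv(I - lam0 * W_bar)
assert np.allclose(G_bar, R @ G @ np.linalg.inv(R))
```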
A.2 Proof of Claim 1

To prove $\frac{1}{n(T-1)}\ln L_{n,T}(\theta) - Q_{n,T}(\theta) \xrightarrow{p} 0$ uniformly in $\theta$ in any compact parameter space $\Theta$: Similarly to Lee (2004) and Yu et al. (2006), we can show that^{27}
$$\frac{1}{n(T-1)}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) - \frac{1}{n(T-1)}\mathrm{E}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) \xrightarrow{p} 0 \quad\text{uniformly in }\theta.$$
Hence, using the fact that $\sigma^2$ is bounded away from zero in $\Theta$,
$$\frac{1}{n(T-1)}\ln L_{n,T}(\theta) - Q_{n,T}(\theta) = -\frac{1}{2\sigma^2}\left(\frac{1}{n(T-1)}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) - \frac{1}{n(T-1)}\mathrm{E}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta)\right) \xrightarrow{p} 0$$
uniformly in $\theta$ in $\Theta$.

To prove $Q_{n,T}(\theta)$ is uniformly equicontinuous in $\theta$ in any compact parameter space $\Theta$: From (2.6),
$$Q_{n,T}(\theta) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma^2 + \frac{1}{n}\left[\ln|S_n(\lambda)| + \ln|R_n(\rho)|\right] - \frac{1}{2\sigma^2 n(T-1)}\mathrm{E}\sum_{t=1}^{T}\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta).$$
The uniform equicontinuity of $Q_{n,T}(\theta)$ can be shown similarly to Lee (2004) and Yu et al. (2006). ∎
A.3 Information Matrix

We can prove the nonsingularity of the limiting information matrix by an argument by contradiction (similar to Lee (2004)). Denote the limit of $\Sigma_{\theta_0,nT}$ in (A.4) by $\Sigma_{\theta_0}$. We need to prove that $\Sigma_{\theta_0}c = 0$ implies $c = 0$, where $c = (c_1', c_2, c_3, c_4)'$, $c_2$, $c_3$, $c_4$ are scalars and $c_1$ is a $k_X \times 1$ vector. If this is true, then the columns of $\Sigma_{\theta_0}$ are linearly independent and $\Sigma_{\theta_0}$ is nonsingular. Denote $H_\beta$ as the limit of $\frac{1}{n(T-1)}\sum_{t=1}^{T}\bar X_{nt}'\bar X_{nt}$, $H_{\beta\lambda}$ as the limit of $\frac{1}{n(T-1)}\sum_{t=1}^{T}\bar X_{nt}'\bar G_n\bar X_{nt}\beta_0$, $H_{\lambda\beta} = H_{\beta\lambda}'$, and $H_\lambda$ as the limit of $\frac{1}{n(T-1)}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'\bar G_n\bar X_{nt}\beta_0$; then^{28}
$$\Sigma_{\theta_0} = \frac{1}{\sigma_0^2}\begin{pmatrix}
H_\beta & H_{\beta\lambda} & 0_{k_X\times 1} & 0_{k_X\times 1} \\
H_{\lambda\beta} & H_\lambda + \lim_{n\to\infty}\frac{\sigma_0^2}{n}\operatorname{tr}(\bar G_n^s\bar G_n) & \lim_{n\to\infty}\frac{\sigma_0^2}{n}\operatorname{tr}(H_n^s\bar G_n) & \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(\bar G_n) \\
0_{1\times k_X} & \lim_{n\to\infty}\frac{\sigma_0^2}{n}\operatorname{tr}(H_n^s\bar G_n) & \lim_{n\to\infty}\frac{\sigma_0^2}{n}\operatorname{tr}(H_n^sH_n) & \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(H_n) \\
0_{1\times k_X} & \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(\bar G_n) & \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(H_n) & \frac{1}{2\sigma_0^2}
\end{pmatrix}.$$
Hence, $\Sigma_{\theta_0}c = 0$ implies

(1) $H_\beta c_1 + H_{\beta\lambda}c_2 = 0$;

^{27} When $n$ is large and $T$ is fixed, the derivation is similar to Lee (2004) for the cross-sectional SAR model. When $T$ is large and $n$ could be finite or large, the derivation is similar to Yu et al. (2006).

^{28} When $n$ is finite and $T$ is large, we do not need the limit before each trace operator in the entries of $\Sigma_{\theta_0}$.
(2) $\frac{1}{\sigma_0^2}H_{\lambda\beta}c_1 + \left(\frac{1}{\sigma_0^2}H_\lambda + \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(\bar G_n^s\bar G_n)\right)c_2 + \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(H_n^s\bar G_n)\,c_3 + \lim_{n\to\infty}\frac{1}{\sigma_0^2 n}\operatorname{tr}(\bar G_n)\,c_4 = 0$;

(3) $\lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(H_n^s\bar G_n)\,c_2 + \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(H_n^sH_n)\,c_3 + \frac{1}{\sigma_0^2}\lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(H_n)\,c_4 = 0$;

(4) $\lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(\bar G_n)\,c_2 + \lim_{n\to\infty}\frac{1}{n}\operatorname{tr}(H_n)\,c_3 + \frac{1}{2\sigma_0^2}c_4 = 0$.

The first equation implies $c_1 = -(H_\beta)^{-1}H_{\beta\lambda}c_2$. Denote $C_n = \bar G_n - \frac{\operatorname{tr}\bar G_n}{n}I_n$ and $D_n = H_n - \frac{\operatorname{tr}H_n}{n}I_n$, so that
$$\frac{1}{n}\operatorname{tr}(\bar G_n^s\bar G_n) - 2\left(\frac{\operatorname{tr}\bar G_n}{n}\right)^2 = \frac{1}{2n}\operatorname{tr}(C_n^sC_n^s),\quad \frac{1}{n}\operatorname{tr}(H_n^sH_n) - 2\left(\frac{\operatorname{tr}H_n}{n}\right)^2 = \frac{1}{2n}\operatorname{tr}(D_n^sD_n^s),\quad \frac{1}{n}\operatorname{tr}(H_n^s\bar G_n) - 2\,\frac{\operatorname{tr}H_n}{n}\,\frac{\operatorname{tr}\bar G_n}{n} = \frac{1}{2n}\operatorname{tr}(C_n^sD_n^s).$$
From the third and fourth equations, we have
$$\frac{1}{n}\operatorname{tr}(C_n^sD_n^s)\,c_2 + \frac{1}{n}\operatorname{tr}(D_n^sD_n^s)\,c_3 = 0,$$
$$\frac{4}{n^2}\left[\operatorname{tr}(H_n^sH_n)\operatorname{tr}\bar G_n - \operatorname{tr}(H_n^s\bar G_n)\operatorname{tr}H_n\right]c_2 + \frac{1}{n\sigma_0^2}\operatorname{tr}(D_n^sD_n^s)\,c_4 = 0.$$
By eliminating $c_1$, $c_3$ and $c_4$, the second equation becomes
$$\left\{\lim_{n\to\infty}\left[\frac{1}{\sigma_0^2}\,\frac{1}{n}\operatorname{tr}(D_n^sD_n^s)\left(H_\lambda - H_{\lambda\beta}(H_\beta)^{-1}H_{\beta\lambda}\right) + \Delta_n\right]\right\}c_2 = 0,$$
where
$$\Delta_n = \frac{1}{4n^2}\left[\operatorname{tr}(C_n^sC_n^s)\operatorname{tr}(D_n^sD_n^s) - \operatorname{tr}^2(C_n^sD_n^s)\right] \tag{A.5}$$
is nonnegative by the Cauchy inequality. The term $H_\lambda - H_{\lambda\beta}(H_\beta)^{-1}H_{\beta\lambda}$ is nonnegative by the Schwartz inequality. The nonsingularity of $\Sigma_{\theta_0}$ then follows from Assumption 7. ∎
A.4 Proof of Theorem 1

As $\mathrm{E}\sum_{t=1}^{T}\tilde V_{nt}'\tilde V_{nt} = n(T-1)\sigma_0^2$, at $\theta_0$, (2.6) implies that $\frac{1}{n(T-1)}\mathrm{E}\ln L_{n,T}(\theta_0) = -\frac{1}{2}\ln 2\pi - \frac{1}{2}\ln\sigma_0^2 + \frac{1}{n}\left[\ln|S_n| + \ln|R_n|\right] - \frac{1}{2}$.

From the pure SAR panel model with SAR disturbances, using the information inequality, $T_{1,n}(\lambda,\rho,\sigma^2) \le 0$ for any $(\lambda,\rho,\sigma^2)$. Also, $T_{2,n,T}(\beta,\lambda,\rho)$ is a quadratic function of $\beta$ and $\lambda$ given $\rho$.

Under the condition that the limit of $H_{nT}(\rho)$ is nonsingular, $T_{2,n,T}(\beta,\lambda,\rho) > 0$ given any $\rho$ whenever $(\beta,\lambda) \ne (\beta_0,\lambda_0)$. Hence, $(\beta,\lambda)$ is globally identified. Given $\lambda_0$, the parameters $\rho_0$ and $\sigma_0^2$ are the unique maximizers of the limiting function of $T_{1,n}(\lambda,\rho,\sigma^2)$ under the condition that the limit of $\frac{1}{n}\ln\left|\sigma_0^2R_n^{-1\prime}R_n^{-1}\right| - \frac{1}{n}\ln\left|\sigma_n^2(\rho)R_n^{-1}(\rho)'R_n^{-1}(\rho)\right|$ is not zero for $\rho \ne \rho_0$.^{29} Hence, $(\beta,\lambda,\rho,\sigma^2)$ is globally identified.

When the limit of $H_{nT}(\rho)$ is singular, $\beta_0$ and $\lambda_0$ cannot be identified from $T_{2,n,T}(\beta,\lambda,\rho)$. Global identification then requires that the limit of $T_{1,n}(\lambda,\rho,\sigma^2)$ is strictly less than zero. As $T_{1,n}(\lambda,\rho,\sigma^2) \le 0$ by the information inequality for the pure SAR model with SAR disturbances, the limit of $T_{1,n}(\lambda,\rho,\sigma^2)$ being nonzero is equivalent to the limit of $\frac{1}{n}\ln\left|\sigma_0^2R_n^{-1\prime}S_n^{-1\prime}S_n^{-1}R_n^{-1}\right| - \frac{1}{n}\ln\left|\sigma_n^2(\lambda,\rho)R_n^{-1}(\rho)'S_n^{-1}(\lambda)'S_n^{-1}(\lambda)R_n^{-1}(\rho)\right|$ being nonzero (similar to Lee (2004), Proof of Theorem 4.1). After $\lambda_0$, $\rho_0$ and $\sigma_0^2$ are identified, given $\lambda_0$ and $\rho_0$, $\beta_0$ can be identified from $T_{2,n,T}(\beta,\lambda,\rho)$.

Combined with the uniform convergence and equicontinuity in Claim 1, the consistency follows. ∎
A.5 Proof of Claim 2

The central limit theorem for martingale difference arrays can be applied. When $T$ is finite and $n$ is large, we can use the central limit theorem in Kelejian and Prucha (2001). When $T$ is large and $n$ could be finite or large, we can use the central limit theorem in Yu et al. (2006). ∎
A.6 Proof of Theorem 2

By the Taylor expansion, $\sqrt{n(T-1)}(\hat\theta_{nT} - \theta_0) = \left(-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'}\right)^{-1}\left(\frac{1}{\sqrt{n(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta}\right)$, where $\frac{1}{\sqrt{n(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta} \xrightarrow{d} N(0, \Sigma_{\theta_0} + \Omega_{\theta_0})$ and $\bar\theta_{nT}$ lies between $\theta_0$ and $\hat\theta_{nT}$. As
$$-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'} = \left(-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'} + \frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'}\right) + \left(-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\theta_0)}{\partial\theta\,\partial\theta'} - \Sigma_{\theta_0,nT}\right) + \Sigma_{\theta_0,nT},$$
where the first and second terms are, respectively, $\|\bar\theta_{nT} - \theta_0\|\cdot O_p(1)$ and $O_p\!\left(\frac{1}{\sqrt{n(T-1)}}\right)$,^{30} we have $-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'} = \|\bar\theta_{nT} - \theta_0\|\cdot O_p(1) + O_p\!\left(\frac{1}{\sqrt{n(T-1)}}\right) + \Sigma_{\theta_0,nT}$. Because $\bar\theta_{nT} - \theta_0 = o_p(1)$ and $\Sigma_{\theta_0,nT}$ is nonsingular in the limit, $-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'}$ is invertible for large $n$ or $T$ and $\left(-\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}(\bar\theta_{nT})}{\partial\theta\,\partial\theta'}\right)^{-1}$ is $O_p(1)$. It then follows that $\hat\theta_{nT} - \theta_0 = O_p\!\left(\frac{1}{\sqrt{n(T-1)}}\right)$. Hence,
$$\sqrt{n(T-1)}(\hat\theta_{nT} - \theta_0) = \left(\Sigma_{\theta_0,nT} + O_p\!\left(\tfrac{1}{\sqrt{n(T-1)}}\right)\right)^{-1}\left(\frac{1}{\sqrt{n(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta}\right).$$
Using the fact that $\left(\Sigma_{\theta_0,nT} + O_p\!\left(\tfrac{1}{\sqrt{n(T-1)}}\right)\right)^{-1} = \Sigma_{\theta_0,nT}^{-1} + O_p\!\left(\tfrac{1}{\sqrt{n(T-1)}}\right)$, we have $\sqrt{n(T-1)}(\hat\theta_{nT} - \theta_0) \xrightarrow{d} N\!\left(0, \Sigma_{\theta_0}^{-1}(\Sigma_{\theta_0} + \Omega_{\theta_0})\Sigma_{\theta_0}^{-1}\right)$. ∎

^{29} This is equivalent to the identification of a pure SAR model. See Proof of Theorem 4.1 in Lee (2004).

^{30} When $n$ is large and $T$ is fixed, the derivation is similar to Lee (2004) for the cross-sectional SAR model. When $T$ is large and $n$ could be finite or large, the derivation is similar to Yu et al. (2006).
B Direct Approach: The First and Second Order Derivatives

For the concentrated likelihood function (3.2), the first and second order derivatives are
$$\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{n,T}^d(\theta)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'\tilde V_{nt}(\theta) \\
\frac{1}{\sigma^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left[(R_n(\rho)W_n\tilde Y_{nt})'\tilde V_{nt}(\theta) - \sigma^2\operatorname{tr}G_n(\lambda)\right] \\
\frac{1}{\sigma^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\left[(H_n(\rho)\tilde V_{nt}(\theta))'\tilde V_{nt}(\theta) - \sigma^2\operatorname{tr}H_n(\rho)\right] \\
\frac{1}{2\sigma^4}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta) - n\sigma^2)
\end{pmatrix}, \tag{B.1}$$
and, with all sums over $t = 1, \dots, T$ and entries marked $*$ filled in by symmetry,
$$-\frac{1}{nT}\frac{\partial^2\ln L_{n,T}^d(\theta)}{\partial\theta\,\partial\theta'} = \frac{1}{nT}\begin{pmatrix}
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'R_n(\rho)\tilde X_{nt} & * & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'R_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'R_n(\rho)W_n\tilde Y_{nt} + T\operatorname{tr}(G_n^2(\lambda)) & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'H_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_n\tilde X_{nt})'\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'H_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_nW_n\tilde Y_{nt})'\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'H_n(\rho)\tilde V_{nt}(\theta) + T\operatorname{tr}(H_n^2(\rho)) & * \\
\frac{1}{\sigma^4}\sum_t\tilde V_{nt}'(\theta)R_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^4}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'\tilde V_{nt}(\theta) & \frac{1}{\sigma^4}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'\tilde V_{nt}(\theta) & -\frac{nT}{2\sigma^4} + \frac{1}{\sigma^6}\sum_t\tilde V_{nt}'(\theta)\tilde V_{nt}(\theta)
\end{pmatrix}. \tag{B.2}$$
Hence,
$$\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}\bar X_{nt}'\tilde V_{nt} \\
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'\tilde V_{nt} + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(\tilde V_{nt}'\bar G_n\tilde V_{nt} - \sigma_0^2\operatorname{tr}G_n) \\
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(\tilde V_{nt}'H_n'\tilde V_{nt} - \sigma_0^2\operatorname{tr}H_n) \\
\frac{1}{2\sigma_0^4}\frac{1}{\sqrt{nT}}\sum_{t=1}^{T}(\tilde V_{nt}'\tilde V_{nt} - n\sigma_0^2)
\end{pmatrix}, \tag{B.3}$$
and the information matrix is equal to $\Sigma_{\theta_0,nT}^d = -\mathrm{E}\left[\frac{1}{nT}\frac{\partial^2\ln L_{n,T}^d(\theta_0)}{\partial\theta\,\partial\theta'}\right]$, where
$$\Sigma_{\theta_0,nT}^d = \begin{pmatrix}
\frac{1}{\sigma_0^2 nT}\sum_{t=1}^{T}\bar X_{nt}'\bar X_{nt} & * & * & * \\
\frac{1}{\sigma_0^2 nT}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'\bar X_{nt} & \frac{1}{\sigma_0^2 nT}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'\bar G_n\bar X_{nt}\beta_0 + \frac{T-1}{T}\frac{1}{n}\operatorname{tr}(\bar G_n'\bar G_n) + \frac{1}{n}\operatorname{tr}(\bar G_n^2) & * & * \\
0_{1\times k_X} & \frac{T-1}{T}\left[\frac{1}{n}\operatorname{tr}(H_n\bar G_n) + \frac{1}{n}\operatorname{tr}(H_n'\bar G_n)\right] & \frac{1}{n}\left[\frac{T-1}{T}\operatorname{tr}(H_n'H_n) + \operatorname{tr}(H_n^2)\right] & * \\
0_{1\times k_X} & \frac{T-1}{T}\frac{1}{\sigma_0^2 n}\operatorname{tr}(\bar G_n) & \frac{T-1}{T}\frac{1}{\sigma_0^2 n}\operatorname{tr}(H_n) & \frac{T-1}{T}\frac{1}{2\sigma_0^4}
\end{pmatrix}. \tag{B.4}$$
C Transformation Approach with Time Dummy

C.1 The First and Second Order Derivatives of (4.5)

Using $\operatorname{tr}G_n(\lambda) - \operatorname{tr}(J_nG_n(\lambda)) = \frac{1}{1-\lambda}$ and $\operatorname{tr}(G_n^2(\lambda)) - \operatorname{tr}((J_nG_n(\lambda))^2) = \frac{1}{(1-\lambda)^2}$ (see Lee and Yu (2007a)), for the concentrated likelihood function (4.5), the first and second order derivatives are
$$\frac{\partial\ln L_{n,T}(\theta)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'J_n\tilde V_{nt}(\theta) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) - (T-1)\operatorname{tr}(J_nG_n(\lambda)) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(H_n(\rho)\tilde V_{nt}(\theta))'J_n\tilde V_{nt}(\theta) - (T-1)\operatorname{tr}(J_nH_n(\rho)) \\
\frac{1}{2\sigma^4}\sum_{t=1}^{T}\left(\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta) - (n-1)\frac{T-1}{T}\sigma^2\right)
\end{pmatrix}, \tag{C.1}$$
and, with all sums over $t = 1, \dots, T$ and entries marked $*$ filled in by symmetry,
$$-\frac{\partial^2\ln L_{n,T}(\theta)}{\partial\theta\,\partial\theta'} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'J_nR_n(\rho)\tilde X_{nt} & * & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nR_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nR_n(\rho)W_n\tilde Y_{nt} + (T-1)\operatorname{tr}(J_nG_n^2(\lambda)) & * & * \\
\frac{1}{\sigma^2}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'J_nR_n(\rho)\tilde X_{nt} + \frac{1}{\sigma^2}\sum_t\tilde V_{nt}'(\theta)J_nM_n\tilde X_{nt} & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nH_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_nW_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'J_nH_n(\rho)\tilde V_{nt}(\theta) + (T-1)\operatorname{tr}(J_nH_n^2(\rho)) & * \\
\frac{1}{\sigma^4}\sum_t\tilde V_{nt}'(\theta)J_nR_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^4}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^4}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'J_n\tilde V_{nt}(\theta) & -\frac{(n-1)(T-1)}{2\sigma^4} + \frac{1}{\sigma^6}\sum_t\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta)
\end{pmatrix}. \tag{C.2}$$
From (C.1), the score vector and the information matrix are
$$\frac{1}{\sqrt{(n-1)(T-1)}}\frac{\partial\ln L_{n,T}(\theta_0)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma_0^2\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}\bar X_{nt}'J_n\tilde V_{nt} \\
\frac{1}{\sigma_0^2\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'J_n\tilde V_{nt} + \frac{1}{\sigma_0^2\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\bar G_nJ_n\tilde V_{nt} - \frac{T-1}{T}\sigma_0^2\operatorname{tr}(J_n\bar G_n)\right) \\
\frac{1}{\sigma_0^2\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'H_nJ_n\tilde V_{nt} - \frac{T-1}{T}\sigma_0^2\operatorname{tr}(J_nH_n)\right) \\
\frac{1}{2\sigma_0^4\sqrt{(n-1)(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'J_n\tilde V_{nt} - \frac{T-1}{T}(n-1)\sigma_0^2\right)
\end{pmatrix}, \tag{C.3}$$
$$\Sigma_{\theta_0,nT} = \frac{1}{\sigma_0^2}\begin{pmatrix} H_{nT} & 0_{(k_X+1)\times 1} & 0_{(k_X+1)\times 1} \\ 0_{1\times(k_X+1)} & 0 & 0 \\ 0_{1\times(k_X+1)} & 0 & 0 \end{pmatrix} + \begin{pmatrix}
0_{k_X\times k_X} & * & * & * \\
0_{1\times k_X} & \frac{1}{n-1}\operatorname{tr}(\bar G_n^sJ_n\bar G_n) & * & * \\
0_{1\times k_X} & \frac{1}{n-1}\operatorname{tr}(H_n^sJ_n\bar G_n) & \frac{1}{n-1}\operatorname{tr}(H_n^sJ_nH_n) & * \\
0_{1\times k_X} & \frac{1}{\sigma_0^2(n-1)}\operatorname{tr}(J_n\bar G_n) & \frac{1}{\sigma_0^2(n-1)}\operatorname{tr}(J_nH_n) & \frac{1}{2\sigma_0^4}
\end{pmatrix}, \tag{C.4}$$
where $H_{nT} = \frac{1}{(n-1)(T-1)}\sum_{t=1}^{T}(\bar X_{nt},\, G_n\bar X_{nt}\beta_0)'J_n(\bar X_{nt},\, G_n\bar X_{nt}\beta_0)$.
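The two trace identities quoted at the beginning of this appendix can also be checked numerically. A self-contained sketch (ours, with an arbitrary row-normalized $W_n$; the identities rely on $G_n(\lambda)l_n = \frac{1}{1-\lambda}l_n$ under row normalization):

```python
import numpy as np

# A numerical check (ours) of tr G(lam) - tr(J G(lam)) = 1/(1-lam) and
# tr(G^2(lam)) - tr((J G(lam))^2) = 1/(1-lam)^2 for row-normalized W.
n, lam = 5, 0.4
rng = np.random.default_rng(3)
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)       # row normalization
I = np.eye(n)
G = W @ np.linalg.inv(I - lam * W)      # G_n(lam) = W_n S_n^{-1}(lam)
J = I - np.ones((n, n)) / n             # J_n
assert np.isclose(np.trace(G) - np.trace(J @ G), 1 / (1 - lam))
assert np.isclose(np.trace(G @ G) - np.trace(J @ G @ J @ G), 1 / (1 - lam) ** 2)
```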
C.2 Proof for Theorem 3

The proof is similar to the proofs of Theorems 1 and 2.
D Direct Approaches with Time Dummy

D.1 The First and Second Order Derivatives of (5.3)

For the concentrated likelihood function (5.3), the first and second order derivatives are
$$\frac{\partial\ln L_{n,T}^d(\theta)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'J_n\tilde V_{nt}(\theta) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(J_nR_n(\rho)W_n\tilde Y_{nt})'\tilde V_{nt}(\theta) - (T-1)\operatorname{tr}G_n(\lambda) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(J_nH_n(\rho)\tilde V_{nt}(\theta))'\tilde V_{nt}(\theta) - (T-1)\operatorname{tr}H_n(\rho) \\
\frac{1}{2\sigma^4}\sum_{t=1}^{T}\left(\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta) - n\frac{T-1}{T}\sigma^2\right)
\end{pmatrix}, \tag{D.1}$$
and, with all sums over $t = 1, \dots, T$ and entries marked $*$ filled in by symmetry,
$$-\frac{\partial^2\ln L_{n,T}^d(\theta)}{\partial\theta\,\partial\theta'} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'J_nR_n(\rho)\tilde X_{nt} & * & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nR_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nR_n(\rho)W_n\tilde Y_{nt} + (T-1)\operatorname{tr}(G_n^2(\lambda)) & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'J_nH_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_n\tilde X_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nH_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_nW_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'J_nH_n(\rho)\tilde V_{nt}(\theta) + (T-1)\operatorname{tr}(H_n^2(\rho)) & * \\
\frac{1}{\sigma^4}\sum_t\tilde V_{nt}'(\theta)J_nR_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^4}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^4}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'J_n\tilde V_{nt}(\theta) & -\frac{n(T-1)}{2\sigma^4} + \frac{1}{\sigma^6}\sum_t\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta)
\end{pmatrix}. \tag{D.2}$$
Hence, for the first order derivative evaluated at $\theta_0$, we have
$$\frac{1}{\sqrt{n(T-1)}}\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\bar X_{nt}'J_n\tilde V_{nt} \\
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'J_n\tilde V_{nt} + \frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'\bar G_n'J_n\tilde V_{nt} - \sigma_0^2\frac{T-1}{T}\operatorname{tr}G_n\right) \\
\frac{1}{\sigma_0^2}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'H_n'J_n\tilde V_{nt} - \sigma_0^2\frac{T-1}{T}\operatorname{tr}H_n\right) \\
\frac{1}{2\sigma_0^4}\frac{1}{\sqrt{n(T-1)}}\sum_{t=1}^{T}\left(\tilde V_{nt}'J_n\tilde V_{nt} - \frac{T-1}{T}n\sigma_0^2\right)
\end{pmatrix}. \tag{D.3}$$
For the information matrix, denote $\Sigma_{\theta_0,nT}^d = -\mathrm{E}\left[\frac{1}{n(T-1)}\frac{\partial^2\ln L_{n,T}^d(\theta_0)}{\partial\theta\,\partial\theta'}\right]$; we have
$$\Sigma_{\theta_0,nT}^d = \frac{1}{\sigma_0^2}\begin{pmatrix} H_{nT}^d & 0_{(k_X+1)\times 1} & 0_{(k_X+1)\times 1} \\ 0_{1\times(k_X+1)} & 0 & 0 \\ 0_{1\times(k_X+1)} & 0 & 0 \end{pmatrix} + \begin{pmatrix}
0_{k_X\times k_X} & * & * & * \\
0_{1\times k_X} & \frac{1}{n}\left[\operatorname{tr}(\bar G_n'J_n\bar G_n) + \operatorname{tr}(\bar G_n^2)\right] & * & * \\
0_{1\times k_X} & \frac{1}{n}\operatorname{tr}(H_n^sJ_n\bar G_n) & \frac{1}{n}\left[\operatorname{tr}(H_n'J_nH_n) + \operatorname{tr}(H_n^2)\right] & * \\
0_{1\times k_X} & \frac{1}{\sigma_0^2 n}\operatorname{tr}(J_n\bar G_n) & \frac{1}{\sigma_0^2 n}\operatorname{tr}(J_nH_n) & \frac{1}{2\sigma_0^4}
\end{pmatrix}, \tag{D.4}$$
where $H_{nT}^d = \frac{1}{n(T-1)}\sum_{t=1}^{T}(\bar X_{nt},\, \bar G_n\bar X_{nt}\beta_0)'J_n(\bar X_{nt},\, \bar G_n\bar X_{nt}\beta_0)$.
D.2 The First and Second Order Derivatives of (5.5) and Asymptotic Bias

The first and second order derivatives of the concentrated log likelihood in (5.5) are
$$\frac{\partial\ln L_{n,T}^d(\theta)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)\tilde X_{nt})'J_n\tilde V_{nt}(\theta) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(R_n(\rho)W_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) - T\operatorname{tr}G_n(\lambda) \\
\frac{1}{\sigma^2}\sum_{t=1}^{T}(H_n(\rho)\tilde V_{nt}(\theta))'J_n\tilde V_{nt}(\theta) - T\operatorname{tr}H_n(\rho) \\
\frac{1}{2\sigma^4}\sum_{t=1}^{T}(\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta) - n\sigma^2)
\end{pmatrix}, \tag{D.5}$$
and, with all sums over $t = 1, \dots, T$ and entries marked $*$ filled in by symmetry,
$$-\frac{\partial^2\ln L_{n,T}^d(\theta)}{\partial\theta\,\partial\theta'} = \begin{pmatrix}
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'J_nR_n(\rho)\tilde X_{nt} & * & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nR_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nR_n(\rho)W_n\tilde Y_{nt} + T\operatorname{tr}(G_n^2(\lambda)) & * & * \\
\frac{1}{\sigma^2}\sum_t(R_n(\rho)\tilde X_{nt})'J_nH_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_n\tilde X_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_nH_n(\rho)\tilde V_{nt}(\theta) + \frac{1}{\sigma^2}\sum_t(M_nW_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^2}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'J_nH_n(\rho)\tilde V_{nt}(\theta) + T\operatorname{tr}(H_n^2(\rho)) & * \\
\frac{1}{\sigma^4}\sum_t\tilde V_{nt}'(\theta)J_nR_n(\rho)\tilde X_{nt} & \frac{1}{\sigma^4}\sum_t(R_n(\rho)W_n\tilde Y_{nt})'J_n\tilde V_{nt}(\theta) & \frac{1}{\sigma^4}\sum_t(H_n(\rho)\tilde V_{nt}(\theta))'J_n\tilde V_{nt}(\theta) & -\frac{nT}{2\sigma^4} + \frac{1}{\sigma^6}\sum_t\tilde V_{nt}'(\theta)J_n\tilde V_{nt}(\theta)
\end{pmatrix}. \tag{D.6}$$
The first order derivative evaluated at $\theta_0$ has three components, such that
$$\frac{\partial\ln L_{n,T}^d(\theta_0)}{\partial\theta} = \frac{\partial\ln L_{n,T}^{d,u}(\theta_0)}{\partial\theta} - n\,a_{\theta_0,n,1} - (T-1)\,a_{\theta_0,2}, \tag{D.7}$$
where
$$\frac{\partial\ln L_{n,T}^{d,u}(\theta_0)}{\partial\theta} = \begin{pmatrix}
\frac{1}{\sigma_0^2}\sum_{t=1}^{T}\bar X_{nt}'J_n\tilde V_{nt} \\
\frac{1}{\sigma_0^2}\sum_{t=1}^{T}(\bar G_n\bar X_{nt}\beta_0)'J_n\tilde V_{nt} + \frac{1}{\sigma_0^2}\sum_{t=1}^{T}\left(\tilde V_{nt}'\bar G_n'J_n\tilde V_{nt} - \sigma_0^2\frac{T-1}{T}\operatorname{tr}(\bar G_n'J_n)\right) \\
\frac{1}{\sigma_0^2}\sum_{t=1}^{T}\left(\tilde V_{nt}'H_n'J_n\tilde V_{nt} - \sigma_0^2\frac{T-1}{T}\operatorname{tr}(H_n'J_n)\right) \\
\frac{1}{2\sigma_0^4}\sum_{t=1}^{T}\left(\tilde V_{nt}'J_n\tilde V_{nt} - \frac{T-1}{T}(n-1)\sigma_0^2\right)
\end{pmatrix},$$
$a_{\theta_0,n,1} = \left(0_{1\times k_X},\ \frac{1}{n}\operatorname{tr}G_n,\ \frac{1}{n}\operatorname{tr}H_n,\ \frac{1}{2\sigma_0^2}\right)'$ and $a_{\theta_0,2} = \left(0_{1\times k_X},\ \frac{1}{1-\lambda_0},\ \frac{1}{1-\rho_0},\ \frac{1}{2\sigma_0^2}\right)'$. For the information matrix, denote $\Sigma_{\theta_0,nT}^d = -\mathrm{E}\left[\frac{1}{nT}\frac{\partial^2\ln L_{n,T}^d(\theta_0)}{\partial\theta\,\partial\theta'}\right]$ and $H_{nT}^d = \frac{1}{nT}\sum_{t=1}^{T}(\bar X_{nt},\, \bar G_n\bar X_{nt}\beta_0)'J_n(\bar X_{nt},\, \bar G_n\bar X_{nt}\beta_0)$; we have
$$\Sigma_{\theta_0,nT}^d = \frac{1}{\sigma_0^2}\begin{pmatrix} H_{nT}^d & 0_{(k_X+1)\times 1} & 0_{(k_X+1)\times 1} \\ 0_{1\times(k_X+1)} & 0 & 0 \\ 0_{1\times(k_X+1)} & 0 & 0 \end{pmatrix} + \begin{pmatrix}
0_{k_X\times k_X} & * & * & * \\
0_{1\times k_X} & \frac{1}{n}\left[\operatorname{tr}(\bar G_n'J_n\bar G_n) + \operatorname{tr}(\bar G_n^2)\right] & * & * \\
0_{1\times k_X} & \frac{1}{n}\operatorname{tr}(H_n^sJ_n\bar G_n) & \frac{1}{n}\left[\operatorname{tr}(H_n'J_nH_n) + \operatorname{tr}(H_n^2)\right] & * \\
0_{1\times k_X} & \frac{1}{\sigma_0^2 n}\operatorname{tr}(J_n\bar G_n) & \frac{1}{\sigma_0^2 n}\operatorname{tr}(J_nH_n) & \frac{1}{2\sigma_0^4}
\end{pmatrix}.$$
As $\frac{1}{\sqrt{nT}}\frac{\partial\ln L_{n,T}^{d,u}(\theta_0)}{\partial\theta}$ will be normally distributed asymptotically, we can see that the estimators from this direct approach will have $O(1/T)$ bias $\frac{1}{T}(\Sigma_{\theta_0,nT}^d)^{-1}a_{\theta_0,n,1}$ and $O(1/n)$ bias $\frac{1}{n}(\Sigma_{\theta_0,nT}^d)^{-1}a_{\theta_0,2}$. Similar to Lee and Yu (2007a), a bias correction procedure can be designed to eliminate the bias. Denoting by $\hat\theta_{nT}^d$ the QMLE that solves (5.5), the bias corrected estimator can be
$$\hat\theta_{nT}^{d1} = \hat\theta_{nT}^d - \frac{B_{1,nT}}{T} - \frac{B_{2,nT}}{n},\quad\text{where } B_{1,nT} = \left[-(\Sigma_{\theta,nT}^d)^{-1}a_{\theta,n,1}\right]\Big|_{\theta=\hat\theta_{nT}^d}\ \text{and}\ B_{2,nT} = \left[-(\Sigma_{\theta,nT}^d)^{-1}a_{\theta,2}\right]\Big|_{\theta=\hat\theta_{nT}^d}.$$
Similar to Lee and Yu (2007a), it can be shown that when $n/T^3 \to 0$ and $T/n^3 \to 0$, $\hat\theta_{nT}^{d1}$ is $\sqrt{nT}$ consistent and asymptotically centered normal.
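The correction step itself is a few lines of arithmetic. The sketch below is our own illustration: `Sigma`, `a1` and `a2` stand for $\Sigma_{\theta,nT}^d$, $a_{\theta,n,1}$ and $a_{\theta,2}$ evaluated at the QMLE, and the numeric inputs are placeholders, not estimates from the paper:

```python
import numpy as np

# A hedged sketch of the bias correction above: theta^{d1} = theta_hat
# - B1/T - B2/n with B1 = -Sigma^{-1} a1 and B2 = -Sigma^{-1} a2.
def bias_correct(theta_hat, Sigma, a1, a2, n, T):
    Sinv = np.linalg.inv(Sigma)
    B1 = -Sinv @ a1                      # B_{1,nT}
    B2 = -Sinv @ a2                      # B_{2,nT}
    return theta_hat - B1 / T - B2 / n   # theta^{d1}_{nT}

theta_hat = np.array([0.20, 0.05, 0.04, 0.90])   # placeholder (beta, lam, rho, sig2)
Sigma = np.eye(4)                                # placeholder information matrix
a1 = np.array([0.0, 0.1, 0.1, 0.5])              # placeholder a_{theta,n,1}
a2 = np.array([0.0, 0.2, 0.2, 0.5])              # placeholder a_{theta,2}
out = bias_correct(theta_hat, Sigma, a1, a2, n=49, T=10)
assert np.isclose(out[0], 0.20)          # beta entry has zero bias components here
```

Note the zero first entries of $a_{\theta,n,1}$ and $a_{\theta,2}$: the leading-order bias affects $(\lambda, \rho, \sigma^2)$ but not $\beta$ directly.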
References
Amemiya, T., 1971. The estimation of the variances in a variance-components model. International Eco-
nomic Review 12, 1-13.
Amemiya, T., 1985. Advanced Econometrics. Harvard University Press, Cambridge, MA.
Anderson, T.W. and C. Hsiao, 1981. Estimation of dynamic models with error components. Journal of the
American Statistical Association 76, 598-606.
Anselin, L., 1988. Spatial Econometrics: Methods and Models. Kluwer Academic, The Netherlands.
Anselin, L. and A.K. Bera, 1998. Spatial dependence in linear regression models with an introduction to
spatial econometrics, in: A. Ullah and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics.
Marcel Dekker, New York.
Arellano, M. and O. Bover, 1995. Another look at the instrumental-variable estimation of error-components
models. Journal of Econometrics 68, 29-51.
Baltagi, B., S.H. Song and W. Koh, 2003. Testing panel data regression models with spatial error correlation.
Journal of Econometrics 117, 123-150.
Baltagi, B., P. Egger and M. Pfaffermayr, 2007. A generalized spatial panel data model with random effects.
Working Paper, Syracuse University.
Chamberlain, G., 1982. Multivariate regression models for panel data. Journal of Econometrics 18, 5-46.
Cliff, A.D. and J.K. Ord, 1973. Spatial Autocorrelation. Pion Ltd, London.
Cressie, N., 1993. Statistics for Spatial Data. Wiley, New York.
Ertur, C. and W. Koch, 2007. Growth, technological interdependence and spatial externalities: theory and
evidence. Journal of Applied Econometrics 22, 1033-1062.
Foote, C.L., 2007. Space and time in macroeconomic panel data: young workers and state-level unemploy-
ment revisited. Working Paper No. 07-10, Federal Reserve Bank of Boston.
Hahn, J. and G. Kuersteiner, 2002. Asymptotically unbiased inference for a dynamic panel model with fixed
effects when both n and T are large. Econometrica 70, 1639-1657.
Hahn, J. and H.R. Moon, 2006. Reducing bias of MLE in a dynamic panel model. Econometric Theory 22,
499-512.
Hausman, J.A., 1978. Specification tests in econometrics. Econometrica 46, 1251-1271.
Hsiao, C., 1986. Analysis of Panel Data. Cambridge University Press.
Kapoor, M., Kelejian, H.H. and I.R. Prucha, 2007. Panel data models with spatially correlated error
components. Journal of Econometrics, 140, 97-130.
Kelejian, H.H. and I.R. Prucha, 1998. A generalized spatial two-stage least squares procedure for estimating
a spatial autoregressive model with autoregressive disturbances. Journal of Real Estate Finance and
Economics 17, 99-121.
Kelejian, H.H. and I.R. Prucha, 2001. On the asymptotic distribution of the Moran I test statistic with
applications. Journal of Econometrics, 104, 219-257.
Kelejian, H.H. and D. Robinson, 1993. A suggested method of estimation for spatial interdependent models
with autocorrelated errors, and an application to a county expenditure model. Papers in Regional Science
72, 297-312.
Kelejian, H.H. and I.R. Prucha, 2007. Specification and estimation of spatial autoregressive models with
autoregressive and heteroskedastic disturbances. Forthcoming in Journal of Econometrics.
Lee, L.F., 2004. Asymptotic distributions of quasi-maximum likelihood estimators for spatial econometric
models. Econometrica 72, 1899-1925.
Lee, L.F., 2007. GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. Journal of
Econometrics 137, 489-514.
Lee, L.F. and X. Liu, 2006. Efficient GMM estimation of a spatial autoregressive model with autoregressive
disturbances. Working Paper, The Ohio State University.
Lee, L.F., X. Liu and X. Lin, 2008. Specification and estimation of social interaction models with network
structure, contextual factors, correlation and fixed effects. Working Paper, The Ohio State University.
Lee, L.F. and J. Yu, 2007a. A spatial dynamic panel data model with both time and individual fixed effects.
Working Paper, The Ohio State University.
Lee, L.F. and J. Yu, 2007b. Near unit root in the spatial autoregressive model. Working Paper, The Ohio
State University.
Lin, X. and L.F. Lee, 2005. GMM estimation of spatial autoregressive models with unknown heteroskedas-
ticity. Working Paper, The Ohio State University. Forthcoming in Journal of Econometrics.
Nerlove, M., 1971. A note on error components models. Econometrica 39, 383-396.
Neyman, J. and E.L. Scott, 1948. Consistent estimates based on partially consistent observations. Econo-