
Econometrics Journal (2005), volume 8, pp. 115–142.

Moment approximation for least-squares estimators in dynamic regression models with a unit root∗

JAN F. KIVIET† AND GARRY D. A. PHILLIPS‡

†Tinbergen Institute & Faculty of Economics and Econometrics, Universiteit van Amsterdam, Roetersstraat 11, 1018 WB Amsterdam, The Netherlands
E-mail: [email protected]

‡Cardiff Business School, Aberconway Building, Colum Drive, CF10 3EU, Cardiff, Wales, UK
E-mail: [email protected]

Received: February 2005

Summary: To find approximations for bias, variance and mean-squared error of least-squares estimators for all coefficients in a linear dynamic regression model with a unit root, we derive asymptotic expansions and examine their accuracy by simulation. It is found that in this particular context useful expansions exist only when the autoregressive model contains at least one non-redundant exogenous explanatory variable. Surprisingly, the large-sample and small-disturbance asymptotic techniques give closely related results, which is not the case in stable dynamic regression models. We specialize our general expressions for moment approximations to the case of the random walk with drift model and find that they are unsatisfactory when the drift is small. Therefore, we develop what we call small-drift asymptotics, which proves to be very accurate, especially when the sample size is very small.

Keywords: Asymptotic expansions, Dynamic regression, Finite sample bias, Moment approximation, Small-drift asymptotics, Unit root.

∗ An early version of this paper was presented at the Econometric Society European Meeting of 1997 in Toulouse, and a later version has been catalogued as Tinbergen Institute discussion paper 2001-118/4. We are grateful for the stimulating criticism by two anonymous referees and by Co-ordinating Editor Karim Abadir, and for the many constructive remarks that we received from various colleagues over several years.

1. INTRODUCTION

A huge literature in statistics focuses on finding the first and second moments of estimators for the parameters in classes of non-standard models, aiming at high accuracy in finite samples. In dynamic regression models with normally distributed white noise disturbances, least-squares (maximum likelihood) estimators are seriously biased in small samples. Sawa (1978) used numerical integration of moment generating functions to evaluate the exact bias and variance of the least-squares estimator of the lagged-dependent variable coefficient in the case of a constant but no exogenous variables, that is, in the stable AR(1) model. This work was corrected by Nankervis and Savin (1988), and extended by Hoque and Peters (1986) to the case of included exogenous variables under normality assumptions, while Peters (1987) analysed the same ARX(1) model with non-normal disturbances. These papers did not provide explicit expressions for moments that allow further analysis, but they did yield useful numerical results for different disturbance structures and specific exogenous data series. Such numerical results, however, can also be obtained to a high degree of accuracy by straightforward simulation.

An alternative approach to obtaining the moments of non-standard econometric estimators is to find explicit analytic expressions, for instance by higher-order asymptotic approximations. These have the advantage that they may allow further theoretical investigations and suggest corrections to standard estimators. In the context of stable ARX(1) models, this was the method followed by Grubb and Symons (1987), who used large-T asymptotics, where T is the sample size, in the tradition of Kendall (1954). They derived an expression for the bias to the order of T^-1 of the lagged-dependent variable coefficient, while the present authors (henceforth referred to as KP) analysed the bias of the full coefficient vector, see KP (1993). Later, KP (1994) extended the analysis to the higher-order dynamic regression model, that is, ARX(p), and suggested bias-reduction techniques for it, while Kiviet et al. (1995) generalized this to the dynamic seemingly unrelated regression model. More recently, KP (1998b) found the bias to the order of T^-2 in the stable ARX(1) model. The moment approximations in dynamic regression models referred to above were all obtained in stable models with stationary or non-stationary exogenous regressors. Although the large-T approximations, in particular the second-order approximations, are often remarkably accurate, it has also been demonstrated that they are of limited use for models where the AR parameter is close to unity. In KP (1998b) it was even established that for near unit root models an approximation to the order of T^-2 is generally much more vulnerable than the simpler approximation to order T^-1.

In econometrics there are two main approaches to finding asymptotic approximations to the moments of estimators in models with random regressors. The first was introduced by Nagar (1959), who found large sample approximations to the moments of consistent k-class estimators in a static simultaneous equation model, while a second alternative procedure was employed in the same model by Kadane (1971), based upon small-disturbance asymptotics. This yielded small-σ asymptotic approximations which, remarkably, were essentially the same as the large-T ones. However, KP (1993, 1994) compared bias approximations from these two approaches and found that they can produce quite different results for least-squares estimators in dynamic regression models. In particular, the large-T approximation (which was also used by Grubb and Symons) was found to be superior, both theoretically and numerically, although Evans and Savin (1984) had already established equivalence between the first-order asymptotic distributions of the lagged-dependent variable coefficient estimator when obtained from large-T or small-σ methods.

Given the current interest in non-stationary models, a natural extension of the KP work is to a model in which the stability assumption is relaxed. However, the joint occurrence of exogenous (possibly non-stationary) regressors and a unit root, which generates complicated forms of stochastic and deterministic trends in the dependent variable, does have profound effects on the order of magnitude of the relevant terms in asymptotic expansions, so that earlier results for the stable model cannot simply be extrapolated to the unit root case. Therefore, in this paper we develop expansions specially designed for non-stable regressions. As we shall see, this will lead, for a much wider class of models, to a better understanding of the underlying reasons for the occurrence of (lack of) correspondence between large-T and small-σ asymptotic results.

Approximations for moments of least-squares estimators in normal unit root models have been examined before. In the model with no regressors and zero start-up, Abadir (1993) provides a closed-form analytical approximation for the bias in terms of cylinder functions; he also develops a simpler heuristic approximation. Similar results for VAR systems have been derived in Abadir et al. (1999), and Lawford and Stamatogiannis (2004) build on this and produce response surfaces for OLS bias and variance in possibly over-parameterized pure unit root VARs. However, results for unit root models including arbitrary regressors are not yet available. The derivation of explicit analytic expressions for the leading terms of estimator bias and of higher-order approximations to the true variance of least-squares estimators in such models is of great importance. Such expressions give insight into the limitations of first-order asymptotic results, and when the latter prove to be inaccurate the higher-order asymptotic approximations can be used to correct standard parameter estimators and their variance estimators, as has been demonstrated in KP (1994, 1998a) and also, for instance, by MacKinnon and Smith (1998). The need for bias-reduction methods in unit root models is also expounded in Abadir (1995).

Therefore, in this paper we examine the moments of the least-squares estimator in the single normal ARX(1) model with an arbitrary number of exogenous regressors when the true coefficient of the lagged-dependent variable is unity. Our major achievements are: (i) derivation of an approximation accurate to order T^-3 for the bias of the lagged-dependent variable coefficient (this bias is of order T^-2 when the exogenous regressors are stationary); (ii) presentation of an approximation accurate to order T^-4 for the mean-squared error (MSE) of this coefficient (when the bias is of order T^-2, the variance and the MSE are of order T^-3); (iii) demonstration that in this case the large-T and small-σ approximations produce very closely related results, with (unlike in the stable case) the small-σ results potentially slightly superior here; (iv) derivation of simplified formulas for the random walk with drift model, and development of urgently needed special small-drift asymptotic approximations; (v) derivation of approximations to the first two moments of the full vector of coefficients for unit root models containing an arbitrary number of exogenous regressors that may be either stationary or non-stationary; and (vi) numerical illustrations of the relevance and accuracy of all these higher-order asymptotic approximations in finite samples.

This study is organized as follows. In Section 2, we focus on obtaining asymptotic expansions for the lagged-dependent variable coefficient estimator in ARX(1) models with a unit root. In separate subsections we distinguish large sample and small-disturbance asymptotic methods, obtain actual expansions and highlight the special role played by the so-called base of an expansion, identify existence problems for expansions when all regressors are redundant, and overcome inaccuracy problems, arising when regressors are close to being redundant, by exploiting an alternative asymptotic sequence. In Section 3, we exploit the expansions to obtain various general large-T and small-σ approximations to finite sample bias, variance and MSE in the unit root ARX(1) model, and specialize these results for the random walk with drift case. Section 4 extends the results to the full coefficient vector of a general unit root ARX(1) model. In Section 5, we investigate the accuracy of the analytic expressions via simulation methods, and Section 6 concludes. Proofs are given in the appendices.

2. EXPANSIONS FOR THE UNIT ROOT COEFFICIENT

2.1. Two types of expansions

In the Nagar approach to finding moment approximations we commence by expressing the estimation error in terms of stochastic components which are of decreasing order of magnitude in terms of the sample size T. In particular, we determine a positive constant δ such that for an estimator λ̂ of the unknown parameter λ we have the expansion

$$ T^{\delta}(\hat{\lambda} - \lambda) = a_0 + T^{-1/2}a_1 + T^{-1}a_2 + T^{-3/2}a_3 + \cdots + T^{-q/2}a_q + T^{-(q+1)/2}r_{q+1}, \qquad (1) $$

where a_j, j = 0, …, q and the remainder r_{q+1} are all O_p(1) as T → ∞. Notice that the first-order asymptotic distribution is determined by the leading term, that is, $T^{\delta}(\hat{\lambda} - \lambda) \xrightarrow{L} a_0$ as T → ∞. For least-squares estimators in classic and in stationary dynamic regression models δ = 1/2, but, as we will see, it may take other values in dynamic regressions involving non-stationarity.

The small-disturbance approach requires that a normalized estimation error be represented in terms of stochastic components which are of decreasing order of magnitude with respect to the standard deviation of the disturbance term, σ. Typically, the expansion takes the form

$$ \sigma^{-1}(\hat{\lambda} - \lambda) = a_0 + \sigma a_1 + \sigma^2 a_2 + \sigma^3 a_3 + \cdots + \sigma^q a_q + \sigma^{q+1} r_{q+1}, \qquad (2) $$

where a_i, i = 0, …, q and r_{q+1} are all bounded in probability as σ → 0.

When large-T or small-σ expansions have been found, moment approximations can be obtained by dividing the corresponding moments of the retained terms in the expansion by the normalizing constant (i.e. T^δ or σ^-1). However, there is no standard approach to finding these expansions; we shall return to this point later and demonstrate when the alternative approaches are feasible and how they are related.

2.2. Implementation in a unit root model

The autoregressive model of interest will be written as

$$ y = \lambda y_{-1} + X\beta + \sigma\varepsilon, \qquad (3) $$

where y = (y_1, …, y_T)′ is a T × 1 vector of observations on a dependent variable, y_{-1} is the y vector lagged one time period, that is, y_{-1} = (y_0, …, y_{T-1})′, X is a T × k regressor matrix and u := σε is the T × 1 disturbance vector, and initially we make

Assumption 1. In model (3) the scalar λ and the k × 1 vector β are unknown coefficients and we have: (i) a positive unit root, that is, λ = 1; (ii) all elements of β are finite and not all are redundant, that is, β ≠ 0; (iii) the regressors are stationary, so for their realisations we have X′X = O(T); (iv) the T × (k+1) matrix Z = (y_{-1} : X) has rank(Z) = k + 1 with probability one; (v) the regressors X are strongly exogenous, that is, X and u are independent; (vi) the disturbances follow u ∼ N(0, σ²I_T), 0 < σ < ∞; (vii) the start-up value is y_0 ∼ N(ȳ_0, ω²σ²), 0 ≤ ω < ∞; (viii) y_0 and u are independent.

We shall examine the moments of the least-squares estimators of λ and β conditional on X and y_0. In particular, we focus on the bias, variance and MSE of these estimators in finite samples under Assumption 1, but will also consider some of the effects of relaxing (ii), (iii) and (vi).

The least-squares estimator for λ in (3) is given by

$$ \hat{\lambda} = \frac{y_{-1}' M y}{y_{-1}' M y_{-1}} = \lambda + \sigma\,\frac{y_{-1}' M \varepsilon}{y_{-1}' M y_{-1}}, \qquad (4) $$


where M = I_T − X(X′X)^-1 X′. We may write

$$ y_{-1} = y_0\, c(\lambda) + C(\lambda) X \beta + \sigma C(\lambda) \varepsilon, \qquad (5) $$

where c(λ) is a T × 1 vector with (c(λ))_t = λ^{t-1} and C(λ) a T × T lower triangular matrix with zeroes on its main diagonal and remaining elements (C(λ))_{ts} = λ^{t-s-1} 1_{t>s}, t, s = 1, …, T. In the unit root case these specialize to ι := c(1), which is just a vector with all elements unity, whereas J := C(1) has zeroes on and above its main diagonal and elements unity below. Hence, under Assumption 1 we have y_{-1} = y_0 ι + JXβ + σJε. In Appendix A various properties of expressions in ι and J are collected, which will be used in the derivations to follow. Due to the unit root these differ markedly from those given for C(λ) in Appendix A of KP (1998a), where |λ| < 1.
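To make the algebra concrete, here is a minimal numerical sketch (assuming Python with NumPy; the parameter values are illustrative, not taken from the paper) that constructs ι and J, generates data from model (3) with λ = 1 via the identity y_{-1} = y_0 ι + JXβ + σJε, and computes the least-squares estimator (4):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 25
iota = np.ones(T)                        # ι = c(1): all elements unity
J = np.tril(np.ones((T, T)), k=-1)       # J = C(1): ones strictly below the diagonal

beta, sigma, y0 = np.array([0.5]), 1.0, 0.0
X = iota.reshape(T, 1)                   # intercept-only regressor matrix
eps = rng.standard_normal(T)
y_lag = y0 * iota + J @ (X @ beta) + sigma * (J @ eps)   # y_{-1} = y0·ι + JXβ + σJε
y = y_lag + X @ beta + sigma * eps                       # model (3) with λ = 1

# Least-squares estimator (4): λ̂ = y'_{-1}My / y'_{-1}My_{-1}
M = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)
lam_hat = (y_lag @ M @ y) / (y_lag @ M @ y_lag)
print(lam_hat)                           # close to, but typically below, unity
```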

In the presence of a constant in the model, or more generally under Mι = 0, we find that My_{-1} = MJXβ + σMJε does not depend on y_0. Then it follows from (4) that the distribution of λ̂ is invariant with respect to assumptions (vii) and (viii). This is in sharp contrast to the stable model, for which KP (1998b) showed to what extent finite sample bias is affected by the actual value and stochastic properties of the start-up value y_0.

From (4) and (5) we find for the estimation error of the unit root model with intercept

$$ \hat{\lambda} - 1 = \frac{\sigma\beta' X' J' M \varepsilon + \sigma^2 \varepsilon' J' M \varepsilon}{\beta' X' J' M J X \beta + 2\sigma\beta' X' J' M J \varepsilon + \sigma^2 \varepsilon' J' M J \varepsilon}, \qquad (6) $$

for which an expansion is to be developed. We first focus on obtaining large-T results. To proceed, we have to examine the orders of magnitude of all terms in the above ratio. Under Assumption 1, exploiting in particular β ≠ 0 and X′X = O(T), this is done in Appendix B. The dominating term of the denominator will be used as a so-called base. The term β′X′J′MJXβ = O(T³) is found to be the 'largest', and, since it is non-zero under (ii), we can use it as a base. By defining µ := (β′X′J′MJXβ)^-1 and multiplying both numerator and denominator of (6) by µ we obtain

$$ \hat{\lambda} - 1 = (\mu\sigma\beta' X' J' M \varepsilon + \mu\sigma^2 \varepsilon' J' M \varepsilon)(1 + 2\mu\sigma\beta' X' J' M J \varepsilon + \mu\sigma^2 \varepsilon' J' M J \varepsilon)^{-1}. \qquad (7) $$

This operation reduces the inverted factor to unity plus terms that are O_p(T^-κ) with κ > 0; therefore, when this inverse is expanded as a power series, successive terms are of decreasing order in probability. The two additional terms in the inverted factor of (7) are in fact of order O_p(T^-1/2) and O_p(T^-1), respectively. A very simple expansion is thus

$$ (1 + 2\mu\sigma\beta' X' J' M J \varepsilon + \mu\sigma^2 \varepsilon' J' M J \varepsilon)^{-1} = 1 - 2\mu\sigma\beta' X' J' M J \varepsilon + O_p(T^{-1}). $$

Since the first factor of (7) has two terms which are O_p(T^-3/2) and O_p(T^-2), respectively, it is easily shown that, upon omitting in (7) all terms of stochastic magnitude o_p(T^-2),

$$ \hat{\lambda} - 1 = \mu\sigma\beta' X' J' M \varepsilon + \mu\sigma^2 \varepsilon' J' M \varepsilon - 2\mu^2\sigma^2 \beta' X' J' M \varepsilon\,\varepsilon' J' M J X \beta + o_p(T^{-2}). \qquad (8) $$

This reveals the O_p(T^-3/2) and O_p(T^-2) components of the estimation error. The leading term µσβ′X′J′Mε determines the first-order asymptotic distribution of the estimator. Under Assumption 1, it is readily shown that the limiting distribution is still normal, that is, $T^{3/2}(\hat{\lambda} - 1) \xrightarrow{L} N(0,\ \sigma^2 \lim_{T\to\infty} T^3\mu)$, but that the rate of convergence is faster (δ = 3/2) than in the stable model. This surprising result is well known now, see for example West (1988) and Banerjee et al. (1993, Chapter 6), who give particular attention to the case where X = ι and β is just a constant term, yielding lim_{T→∞} T³µ = 12/β².
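As a quick numerical illustration of this limit (a sketch under the same Python/NumPy assumptions as above), T³µ can be computed exactly for X = ι and compared with 12/β²:

```python
import numpy as np

def T3_mu(T: int, beta: float) -> float:
    """T^3 · µ with µ = (β'X'J'MJXβ)^{-1}, for X = ι (so M = I - ιι'/T)."""
    iota = np.ones(T)
    J = np.tril(np.ones((T, T)), k=-1)
    v = J @ iota
    v = (v - v.mean()) * beta            # M J ι β
    return T**3 / (v @ v)

beta = 0.5
for T in (25, 100, 400, 1600):
    print(T, T3_mu(T, beta), 12 / beta**2)   # T^3·µ approaches 12/β² = 48.0
```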


2.3. Coherence between large-T and small-σ

The large-T expansion (8) exhibits a close correspondence with a small-σ expansion. It involves terms of non-decreasing order in σ, so that the expansions to O_p(σ) and O_p(σ²) coincide with the expansions to O_p(T^-3/2) and O_p(T^-2), respectively. However, when focusing on bias, the expansions again show some difference between small-σ and large-T approximations. Below, we will establish that the third term in the expansion (8), which is O_p(T^-2), has an expectation that is actually O(T^-3), and which is thus omitted from a large-T approximation to order T^-2. The expectation of this third term is also O(σ²), and therefore the O(σ²) contribution to the bias contains, apart from the components of order of magnitude T^-2, some contributions of order T^-3. So, surprisingly (because small-σ was found to be deficient in stable dynamic models), in this unit root model the first-order small-σ bias approximation is very effective and includes a contribution which is of second order in a large-T sense.

To explain correspondence or disparity between large-T and small-σ expansions, we highlight here some earlier specific findings on expansions of various estimators in different types of regression models. In the stable dynamic model, and using the approach followed by Kendall (1954), Grubb and Symons (1987) and KP (1993, 1994, 1998a,b), the largest non-stochastic components of the denominator of (4) are collected by E(y_{-1}′My_{-1}), giving a base for the chosen expansion which is generally affine with respect to σ². As a result the large-T and the small-σ expansions yield qualitatively very different results, as shown in KP (1993, 1994), because any O_p(T^-κ) term in the expansion involves powers of the inverse of the base, and this inverse itself corresponds to an infinite power series in σ. Therefore, any finite-order small-σ approximation omits terms of order T^-1, whereas a finite-order large-T approximation includes all such contributions irrespective of their order with respect to σ. Small-σ is thus inferior in stable dynamic regression. Note, however, that in the unit root model the denominator of (7) naturally contains a decomposition of y_{-1}′My_{-1} into its stochastic and non-stochastic parts, such that the non-stochastic part β′X′J′MJXβ = µ^-1 is independent of σ², whereas it is also the 'largest' term and subsequently may then form a suitable base for the expansion of the denominator in both the large-T and small-σ expansion. Close correspondence of large-T and small-σ asymptotic results has earlier been established by the findings of Nagar (1959) and Kadane (1971) for consistent k-class estimators in a static simultaneous equation framework. As shown in KP (1996), this equivalence breaks down, however, for the inconsistent least-squares estimator in the static simultaneous equations model, where an appropriate base for the large-T expansion is again affine in σ², whereas it is self-evidently independent of σ² in the small-σ approach.

A specific finding in KP (1993) is that in the stable model the small-σ expansion is not feasible when y_0 = 0 and β = 0, because the estimator λ̂ is invariant with respect to σ in that case. When β = 0 in the present unit root model (and y_0 = 0 or Mι = 0), small-σ is again not feasible because the estimation error (7) then reduces to a simple ratio of quadratic forms in standard normal variates. Hence, its moments can be accurately determined by well-known numerical methods; see Paolella (2003) for a recent overview. Nevertheless, it is instructive to examine large-T expansions for this case. We obtain

$$ \hat{\lambda} - 1 = \frac{\sigma^2 \varepsilon' J' M \varepsilon}{\sigma^2 \varepsilon' J' M J \varepsilon} = \frac{\varepsilon' J' M \varepsilon}{\varepsilon' J' M J \varepsilon} = \frac{\varepsilon' J' M \varepsilon}{E(\varepsilon' J' M J \varepsilon)} \left[ 1 + \frac{\varepsilon' J' M J \varepsilon - E(\varepsilon' J' M J \varepsilon)}{E(\varepsilon' J' M J \varepsilon)} \right]^{-1}. \qquad (9) $$

Since ε′J′MJε − E(ε′J′MJε) = O_p(T²) and E(ε′J′MJε) = O(T²), the random term in the inverse factor is O_p(1) and not O_p(T^-κ) with κ > 0, as would be required for a convergent expansion; that is, E(ε′J′MJε) is an improper base, because it does not collect all the leading (i.e. largest) terms of the denominator ε′J′MJε. An alternative formulation is, however,

$$ \frac{\varepsilon' J' M \varepsilon}{\varepsilon' J' M J \varepsilon} = \frac{\varepsilon' J' M \varepsilon}{\varepsilon' J' J \varepsilon}\left[ 1 - \frac{\varepsilon' J' X (X'X)^{-1} X' J \varepsilon}{\varepsilon' J' J \varepsilon} \right]^{-1} \qquad (10) $$

and now the random term in the inverse factor is O_p(T^-1/2), enabling a valid expansion. However, evaluation of moment approximations from this expansion requires the evaluation of products of ratios of stochastic terms, and hence is in fact more involved than obtaining the expectation of the left-hand side simple ratio directly.

The above illustrates that a base should preferably be non-random, that it must at the same time collect all dominating terms, and that it has to be independent of σ in order to yield coherence between large-T and small-σ results.
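In line with the remark above, the expectation of the simple ratio can be obtained directly by simulation rather than via expansion; a minimal sketch (Python/NumPy assumed, with simulation substituted for the exact numerical methods referenced in the text) for the intercept-only case X = ι, where M = I − ιι′/T:

```python
import numpy as np

def mean_ratio(T: int, reps: int = 50_000, seed: int = 0) -> float:
    """Estimate E(ε'J'Mε / ε'J'MJε) directly; here X = ι, so M = I - ιι'/T."""
    rng = np.random.default_rng(seed)
    J = np.tril(np.ones((T, T)), k=-1)
    vals = np.empty(reps)
    for r in range(reps):
        eps = rng.standard_normal(T)
        Je = J @ eps
        Me, MJe = eps - eps.mean(), Je - Je.mean()   # Mε and MJε
        vals[r] = (Je @ Me) / (Je @ MJe)             # ε'J'Mε / ε'J'MJε
    return vals.mean()

print(mean_ratio(25))   # bias of λ̂ in the driftless unit root model with intercept
```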

2.4. Small-drift asymptotics

As we shall see from the simulations in Section 5, the accuracy of our large-T or small-σ moment approximations, to be obtained in Sections 3 and 4, deteriorates when β is close to zero. Therefore, it could be worthwhile to develop special approximations for the case where β gets local to zero in some sense. Thus, alongside the cases β = O(1) and β = 0, where β is fixed, we examined sequences β_T = O(T^-κ) and β_σ = O(σ^κ) for κ > 0. The result is that for 0 < κ < 1/2 the original expansion is valid and so yields similarly unsatisfactory results. For κ = 1/2 no valid expansion can be found, while for κ > 1/2 the largest term in the numerator of (7) is u′J′Mu and in the denominator u′J′MJu. This implies the same problems as in the β = 0 case. Hence, expansions for β local to zero in a small-σ or in a large-T sense either do not exist, or are inaccurate, or they cannot be usefully employed in this context.

However, an expansion in which β may get small in its own right will prove to be useful. We will label this 'small-drift asymptotics', which operates as follows. Consider as a special case under Assumption 1 the random walk with drift model y_t = λy_{t-1} + β + σε_t (with start-up value y_0). Defining y*_t = (y_t − y_0)/σ, the model can be rewritten as

$$ y^*_t = \lambda y^*_{t-1} + \beta^* + \varepsilon_t, \qquad (11) $$

with λ = 1, β* = β/σ, y*_0 = 0 and ε_t ∼ i.i.d. N(0, 1). Note that β* is the drift standardized by σ. Defining A := I_T − ιι′/T, we can rewrite (7) for this special case as

$$ \hat{\lambda} - 1 = \frac{\iota' J' A \varepsilon\,\beta^* + \varepsilon' J' A \varepsilon}{\iota' J' A J \iota\,\beta^{*2} + 2\,\iota' J' A J \varepsilon\,\beta^* + \varepsilon' J' A J \varepsilon}, \qquad (12) $$

where now, when β* gets small, ε′J′AJε is the largest term in the denominator. Despite its randomness, we shall use it as a base, and from

$$ \hat{\lambda} - 1 = \frac{\varepsilon' J' A \varepsilon + \iota' J' A \varepsilon\,\beta^*}{\varepsilon' J' A J \varepsilon}\left[ 1 + \frac{2\,\iota' J' A J \varepsilon\,\beta^* + \iota' J' A J \iota\,\beta^{*2}}{\varepsilon' J' A J \varepsilon} \right]^{-1}, $$

we find the expansion (accurate to order β*²)

$$ \hat{\lambda} - 1 = \frac{\varepsilon' J' A \varepsilon + \iota' J' A \varepsilon\,\beta^*}{\varepsilon' J' A J \varepsilon}\left[ 1 - \frac{2\,\iota' J' A J \varepsilon\,\beta^* + \iota' J' A J \iota\,\beta^{*2}}{\varepsilon' J' A J \varepsilon} + 4\,\frac{(\iota' J' A J \varepsilon)^2 \beta^{*2}}{(\varepsilon' J' A J \varepsilon)^2} \right] + O_p(\beta^{*3}). $$


[Figure 1. Functions g0(T) (solid line) and g1(T) (dashed line).]

Taking expectations we lose all first-order terms in β∗ and obtain

$$ E(\hat{\lambda} - 1) = E\!\left(\frac{\varepsilon' J' A \varepsilon}{\varepsilon' J' A J \varepsilon}\right) - \beta^{*2} E\!\left[\frac{(\iota' J' A J \iota)\,\varepsilon' J' A \varepsilon + 2\,\varepsilon' A J \iota\,\iota' J' A J \varepsilon}{(\varepsilon' J' A J \varepsilon)^2} - \frac{4\,\varepsilon' J' A \varepsilon\,(\iota' J' A J \varepsilon)^2}{(\varepsilon' J' A J \varepsilon)^3}\right] + o(\beta^{*2}). \qquad (13) $$

Note that the two right-hand side expectations are functions of T only. Since they do not involve any unknown parameters, they have to be calculated only once. The first term is the least-squares bias of λ̂ in the unit root model with zero drift (and any start-up), and the second term approximates to order O(β*²) the increment in bias due to a non-zero drift. We write (13) as

$$ E(\hat{\lambda} - 1) = g_0(T) - g_1(T)\,\beta^{*2} + o(\beta^{*2}). \qquad (14) $$

The functions g0 and g1 can be obtained either by sophisticated analytical approximations or by straightforward simulation. We chose the latter and examined T = 10, 11, …, 80. Using 10⁷ replications, we inferred from the standard errors of the estimates of these expectations that the precision for g0(T) is satisfactory (standard error smaller than 0.0001), whereas the standard errors for g1(T) are much larger, especially when T is small, though not exceeding 0.015 when T ≥ 15. Since g1(T) is to be multiplied by β*² (whereas the approximation requires that β* should be smaller than about 0.1, as is found in Section 5), the overall precision seems satisfactory. Graphs of these functions are given in Figure 1. It is easily derived that g0 converges to zero; since the two terms in square brackets in (13) are ratios where both numerator and denominator are of the same order in T, they remain random in the limit, which explains the larger simulation variance of g1(T).
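The following sketch mimics that Monte Carlo step on a much smaller scale (Python/NumPy assumed; the paper uses 10⁷ replications, far more than here), estimating g0(T) and g1(T) as the sample means of the two bracketed expressions in (13):

```python
import numpy as np

def estimate_g0_g1(T: int, reps: int = 20_000, seed: int = 0):
    """Monte Carlo estimates of g0(T) and g1(T) in (14), via the expectations in (13)."""
    rng = np.random.default_rng(seed)
    J = np.tril(np.ones((T, T)), k=-1)     # J = C(1)
    Ji = J @ np.ones(T)                    # Jι
    AJi = Ji - Ji.mean()                   # AJι, with A = I - ιι'/T
    c_ii = Ji @ AJi                        # ι'J'AJι (non-random)
    g0 = np.empty(reps); g1 = np.empty(reps)
    for r in range(reps):
        eps = rng.standard_normal(T)
        Ae = eps - eps.mean()              # Aε
        Je = J @ eps
        AJe = Je - Je.mean()               # AJε
        num = Je @ Ae                      # ε'J'Aε
        den = Je @ AJe                     # ε'J'AJε
        cross = Ji @ AJe                   # ι'J'AJε
        lin = Ji @ Ae                      # ι'J'Aε = ε'AJι
        g0[r] = num / den
        g1[r] = (c_ii * num + 2 * lin * cross) / den**2 - 4 * num * cross**2 / den**3
    return g0.mean(), g1.mean()

print(estimate_g0_g1(20))
```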


3. THE MOMENTS OF THE UNIT ROOT COEFFICIENT ESTIMATOR

We now derive, in the model with an arbitrary X matrix, approximations to the bias, the variance and the MSE of the estimator λ̂ given in (4) according to large-T and small-σ principles. An approximation to the bias accurate to O(T^-2) is obtained by summing the expected values of the three terms in (8). Since the expected value of the first term is zero and that of the third term is of order T^-3, just the second term determines the O(T^-2) bias. Extending this expansion and including all terms of O_p(T^-3) leads to the following result (proved in Appendix C).

Theorem 1 Under Assumption 1 and Mι = 0 the bias of the least-squares estimator of λ to the order of T^-3 is given by

$$ E(\hat{\lambda} - 1) = \sigma^2\mu\,[1 + \operatorname{tr}(MJ)] - \sigma^4\mu^2\operatorname{tr}(MJ)\left[\operatorname{tr}(J'MJ) - 4\mu\,\beta' X' J' M J J' M J X \beta\right] + o(T^{-3}). \qquad (15) $$

Note that λ̂ is unbiased to order T^-3/2 and also to order σ. Also note that the bias of λ̂ is O(T^-2) and that the bias to order T^-2 is given by

$$ E(\hat{\lambda} - 1) = -\sigma^2\mu\operatorname{tr}\{(X'X)^{-1} X' J X\} + o(T^{-2}), \qquad (16) $$

whereas an approximation to order σ² incorporates an extra O(T^-3) contribution, namely, the term σ²µ. Hence, large-T and small-σ asymptotic expansions correspond here more closely than in the stable dynamic model (where small-σ is inferior because any finite-order small-σ approximation omits terms of order T^-1), but they are not equivalent, and the leading term of small-σ incorporates contributions here which are omitted in the leading term of the large-T approximation.
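For concreteness, a sketch (Python/NumPy assumed; function and variable names are illustrative) that evaluates the bias approximations (16) and (15) for a given regressor matrix X:

```python
import numpy as np

def bias_approx(X: np.ndarray, beta: np.ndarray, sigma: float):
    """Evaluate the bias approximations (16) and (15) for λ̂ under a unit root."""
    T = X.shape[0]
    J = np.tril(np.ones((T, T)), k=-1)                 # J = C(1)
    M = np.eye(T) - X @ np.linalg.solve(X.T @ X, X.T)  # M = I - X(X'X)^{-1}X'
    v = M @ (J @ (X @ beta))                           # v = MJXβ
    mu = 1.0 / (v @ v)                                 # µ = (β'X'J'MJXβ)^{-1}
    MJ = M @ J
    bias16 = -sigma**2 * mu * np.trace(np.linalg.solve(X.T @ X, X.T @ J @ X))
    w = J.T @ v                                        # so that w'w = β'X'J'MJJ'MJXβ
    bias15 = (sigma**2 * mu * (1.0 + np.trace(MJ))
              - sigma**4 * mu**2 * np.trace(MJ)
              * (np.trace(J.T @ MJ) - 4.0 * mu * (w @ w)))
    return bias16, bias15

T = 20
X = np.ones((T, 1))                                    # intercept only: X = ι
print(bias_approx(X, np.array([0.5]), sigma=1.0))
```

With these inputs (σ/β = 2, T = 20) the (16) value is about −0.0571, which matches the corresponding entry of Table 1 in Section 5.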

The case where there is a constant term but no further exogenous variables is of particular interest. The corresponding bias can be obtained by substituting X = ι in Theorem 1, but the resulting expression will then also include some elements which are o(T^-3). These can be eliminated; the resulting 'trimmed' expression is evaluated in Appendix D, yielding:

Corollary 1 If in the model of Theorem 1 we have X = ι, then the bias simplifies to

$$ E(\hat{\lambda} - 1) = -6\left(\frac{\sigma}{\beta}\right)^{2}\frac{1}{T^{2}} + 18\left(\frac{\sigma}{\beta}\right)^{2}\frac{1}{T^{3}} - \frac{84}{5}\left(\frac{\sigma}{\beta}\right)^{4}\frac{1}{T^{3}} + o(T^{-3}). \qquad (17) $$

From this we see that the bias is always negative to the order T^-2, and that the magnitude of the bias crucially depends on the ratio σ/β. From Corollary 1 it is fully evident that an approximation to order O(σ²) incorporates some of the order T^-3 bias, namely, a positive contribution, whereas a negative order T^-3 contribution is omitted because it is O(σ⁴). Note that when σ/β = (90/84)^1/2 ≈ 1.035 the two O(T^-3) terms cancel.
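As a numeric check of Corollary 1 (Python assumed), evaluating (17) at T = 20 and σ/β = 2 reproduces the −0.0846 figure reported in Table 1 of Section 5:

```python
def bias17(T: int, s_over_b: float) -> float:
    """Corollary 1: bias of λ̂ to order T^-3 in the random walk with drift (X = ι)."""
    r2 = s_over_b**2
    return -6 * r2 / T**2 + 18 * r2 / T**3 - 84 / 5 * r2**2 / T**3

print(bias17(20, 2.0))   # ≈ -0.0846 (cf. Table 1, column β/σ = 0.5)
```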

To obtain an approximation to the MSE of λ̂ we use (7) and write

$$ (\hat{\lambda} - 1)^2 = \mu^2(\sigma\beta' X' J' M \varepsilon + \sigma^2 \varepsilon' J' M \varepsilon)^2 (1 + 2\mu\sigma\beta' X' J' M J \varepsilon + \mu\sigma^2 \varepsilon' J' M J \varepsilon)^{-2}. \qquad (18) $$


Expanding the right-hand side as a power series in which successive terms are of increasing order in powers of T^-1/2 yields the following (proof in Appendix E).

Theorem 2 In the model of Theorem 1 the MSE of the least-squares estimator of λ to the order of T^-4 is given by

$$ E(\hat{\lambda} - 1)^2 = \sigma^2\mu + \sigma^4\mu^2\{[\operatorname{tr}(MJ)]^2 + \operatorname{tr}(JMJM) - \operatorname{tr}(J'MJ)\} + 4\sigma^4\mu^3(\beta' X' J' M J J' M J X \beta - \beta' X' J' M J M J M J X \beta) + o(T^{-4}). \qquad (19) $$

Because the squared bias of λ̂ is O(T^-4), the first term in the above expression, which is the only O(T^-3) contribution to the MSE, also establishes an approximation to Var(λ̂) = MSE(λ̂) − [E(λ̂ − 1)]². For the special case of a constant and no further exogenous regressors this yields:

Corollary 2 If X = ι in the model of Theorem 2 then the MSE and variance simplify to

$$ E(\hat{\lambda} - 1)^2 = 12\left(\frac{\sigma}{\beta}\right)^{2}\frac{1}{T^{3}} + \frac{336}{5}\left(\frac{\sigma}{\beta}\right)^{4}\frac{1}{T^{4}} + o(T^{-4}) \qquad (20) $$

and

$$ \operatorname{Var}(\hat{\lambda}) = 12\left(\frac{\sigma}{\beta}\right)^{2}\frac{1}{T^{3}} + \frac{156}{5}\left(\frac{\sigma}{\beta}\right)^{4}\frac{1}{T^{4}} + o(T^{-4}). \qquad (21) $$

From the results of the two corollaries it is apparent that, for a given sample size T, the quality of the approximations will deteriorate as |σ/β| increases above unity. Thus the smaller the absolute value of the standardized drift |β/σ|, the larger the sample size will need to be to achieve a desired accuracy. This point will be addressed in Section 5, where the accuracy of the approximations will be examined and compared not only with their 'untrimmed' counterparts, but also with the specially designed small-drift asymptotic approximations.
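A companion check to the bias sketch above (Python assumed) evaluates (20) and (21); at T = 20 and σ/β = 2 these give 0.0127 and 0.0091, matching the corresponding entries of Table 1 in Section 5:

```python
def mse20(T: int, s_over_b: float) -> float:
    """Corollary 2, eq. (20): MSE of λ̂ to order T^-4 (X = ι)."""
    r2 = s_over_b**2
    return 12 * r2 / T**3 + 336 / 5 * r2**2 / T**4

def var21(T: int, s_over_b: float) -> float:
    """Corollary 2, eq. (21): variance of λ̂ to order T^-4 (X = ι)."""
    r2 = s_over_b**2
    return 12 * r2 / T**3 + 156 / 5 * r2**2 / T**4

print(mse20(20, 2.0), var21(20, 2.0))   # ≈ 0.0127, 0.0091
```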

4. THE MOMENTS OF THE FULL COEFFICIENT VECTOR

We now approximate the first two moments (conditional on X and y_0) of the least-squares estimator of the full vector of coefficients (λ, β′)′, and at the same time we shall relax Assumption 1(iii) on the stationarity of the exogenous regressors. We rewrite model (3) as

$$ y = \lambda y_{-1} + X\beta + \sigma\varepsilon = Z\alpha + \sigma\varepsilon, \qquad (22) $$

where Z = (y_{-1} : X), α = (λ, β′)′ and the least-squares estimator

$$ \hat{\alpha} = (\hat{\lambda}, \hat{\beta}')' = (Z'Z)^{-1} Z' y \qquad (23) $$

has estimation error

$$ \begin{pmatrix} \hat{\lambda} - 1 \\ \hat{\beta} - \beta \end{pmatrix} = \hat{\alpha} - \alpha = \sigma (Z'Z)^{-1} Z' \varepsilon. \qquad (24) $$

In the case where all regressors X are finite, the estimation error of λ̂ is O_p(T^-3/2) while that of β̂ is O_p(T^-1/2). If some of the regressors in X are realisations of a non-stationary process, this affects the order of probability of both the corresponding coefficients' estimation error and that of λ̂. For regressors that are realisations of an I(1) process (integrated of order one), the estimation error will be O_p(T^-3/2), and if such a regressor has a non-zero coefficient the dependent variable will in principle be I(2), due to the unit root, which reduces the estimation error of λ̂ to O_p(T^-5/2); the same happens when a non-redundant linear deterministic trend occurs in the model.

To facilitate the development of an appropriate asymptotic expansion for general X we shall re-scale the regressors and coefficients so that all components of the re-scaled estimation error vector are of the same stochastic magnitude. Thus, we consider the (k+1) × (k+1) diagonal matrix D designed such that, conditional on X:

$$ D = \operatorname{diag}(d_1, \ldots, d_{k+1}), \qquad d_i = T^{\delta_i} \ (i = 1, \ldots, k+1), \qquad D Z' Z D = O_p(T). \qquad (25) $$

In the unit root model with stationary X we should have δ1 = −1 and δ_i = 0 for i > 1; in a model with k = 2, where the first column of X corresponds to the constant and the second is a linear trend (or the realisation of an I(1) process), we should select δ1 = −2 (if β2 ≠ 0, and δ1 = −1 otherwise), δ2 = 0 and δ3 = −1.
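A small sketch (Python/NumPy assumed; the data-generating choices are illustrative) of this re-scaling for the constant-plus-linear-trend design with β2 ≠ 0, checking that the elements of DZ′ZD grow like T:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
tau = np.arange(1.0, T + 1.0)
X = np.column_stack([np.ones(T), tau])      # constant and linear trend, k = 2
beta, sigma = np.array([0.3, 0.1]), 1.0     # β2 ≠ 0, so y is dominated by a t² trend
J = np.tril(np.ones((T, T)), k=-1)
eps = rng.standard_normal(T)
y_lag = J @ (X @ beta) + sigma * (J @ eps)  # y_{-1} with y0 = 0 and λ = 1
Z = np.column_stack([y_lag, X])             # Z = (y_{-1} : X)

# δ1 = -2 (β2 ≠ 0), δ2 = 0 (constant), δ3 = -1 (trend)
D = np.diag([T**-2.0, 1.0, T**-1.0])
print((D @ Z.T @ Z @ D) / T)                # entries remain O(1): DZ'ZD = O_p(T)
```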

The model is now

$$ y = Z D (D^{-1}\alpha) + \sigma\varepsilon, \qquad (26) $$

with re-scaled coefficients D^-1 α and estimation error

$$ D^{-1}(\hat{\alpha} - \alpha) = \sigma (D Z' Z D)^{-1} D Z' \varepsilon. \qquad (27) $$

To simplify subsequent analysis, we put

W = Z D = Z D + Z D = W + W , (28)

where W = Z D = E (Z ) D is non-stochastic and W = Z D = (Z − Z )D is stochastic with zeromean. Since Z = σ Jεe′

1, with e1 = (1, 0, . . . , 0)′ a (k + 1) × 1 unit-vector, we may writeW = σ Jεe′

1 D. Now (27) can be expressed as

D−1 (α − α) = σ (W ′W + W ′W + W ′W + W ′W )−1(W + W )′ε. (29)

Note that D is designed such that W̄′W̄ = DZ̄′Z̄D = O(T), W̄′W̃ = DZ̄′Z̃D = O_p(T^1/2), W̃′W̃ = DZ̃′Z̃D = O_p(1), W̄′u = σW̄′ε = O_p(T^1/2) and W̃′u = O_p(1). Assuming that W̄′W̄ is invertible, we use this as a base-matrix. Putting

$$ R = (\bar{W}'\bar{W})^{-1}, \qquad P = \bar{W}'\tilde{W} R + \tilde{W}'\bar{W} R, \qquad S = \tilde{W}'\tilde{W} R, \qquad (30) $$

where R = O(T^-1), P = O_p(T^-1/2) and S = O_p(T^-1), we may write

$$ D^{-1}(\hat{\alpha} - \alpha) = \sigma R (I + P + S)^{-1} (\bar{W} + \tilde{W})' \varepsilon, \qquad (31) $$

and the inverse matrix can be expanded with successive terms being of descending stochastic order. It is our intention here to find a stochastic expansion of (31) including terms up to O_p(T^-1) only. Hence, it will suffice to approximate the inverse matrix by

$$ (I + P + S)^{-1} = I - P + o_p(T^{-1/2}). $$

The required expansion is then

$$ D^{-1}(\hat{\alpha} - \alpha) = \sigma R (I - P)(\bar{W} + \tilde{W})' \varepsilon + o_p(T^{-1}) = \sigma R \bar{W}' \varepsilon + \sigma R \tilde{W}' \varepsilon - \sigma R P \bar{W}' \varepsilon + o_p(T^{-1}), \qquad (32) $$

from which the following bias approximation readily follows (see Appendix G).


Theorem 3 In the first-order dynamic regression model (22), where the coefficient λ of the lagged-dependent variable is equal to unity, the regressor matrix is Z = (y_{-1} : X) and the scaling matrix D = diag(d_1, …, d_{k+1}), with d_i = T^{δ_i} (i = 1, …, k+1), is such that, conditional on X, DZ′ZD = O_p(T), the bias of the least-squares estimator of the separate elements of the coefficient vector α = (λ, β′)′ can be approximated, provided that Z̄ = (ȳ_0 ι + JXβ : X) has full column rank and β is finite and non-zero, as (i = 1, …, k):

$$ E(\hat{\beta}_i - \beta_i) = -\sigma^2 e_{i+1}'\left[(\bar{Z}'\bar{Z})^{-1}\bar{Z}' J \bar{Z} + \tfrac{1}{2}(T - k - 1) I_{k+1}\right](\bar{Z}'\bar{Z})^{-1} e_1 + o(T^{-1+\delta_{i+1}}), \qquad (33) $$

and

$$ E(\hat{\lambda} - 1) = -\tfrac{1}{2}(T - k)\,\sigma^2 e_1'(\bar{Z}'\bar{Z})^{-1} e_1 + o(T^{-1+\delta_1}). \qquad (34) $$

This bias approximation of order O(T^{-1+δ1}) for λ̂ is equivalent to the O(T^-2) expression given in (16). From this, and more generally from the lines followed in the proof of Theorem 3, it is evident that non-stationarity of the regressors does not change the algebraic form of the approximations; the principal difference is just that the various terms in the approximations may be of smaller order of magnitude. Hence, the full approximation given in Theorem 1 also applies to a model which includes a non-redundant I(1) regressor or a linear trend, but then its accuracy is actually of order O(T^-4) rather than O(T^-3).
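A sketch of the Theorem 3 bias approximations (Python/NumPy assumed; coded directly from (33) and (34), with Z̄ built from E(y_{-1}) = ȳ_0 ι + JXβ):

```python
import numpy as np

def bias_theorem3(X: np.ndarray, beta: np.ndarray, sigma: float, y0_bar: float = 0.0):
    """Evaluate (34) for λ̂ and (33) for the β̂_i under a unit root."""
    T, k = X.shape
    J = np.tril(np.ones((T, T)), k=-1)
    iota = np.ones(T)
    Zbar = np.column_stack([y0_bar * iota + J @ (X @ beta), X])  # Z̄ = (E(y_{-1}) : X)
    Q = np.linalg.inv(Zbar.T @ Zbar)                             # Q = (Z̄'Z̄)^{-1}
    e1 = np.zeros(k + 1); e1[0] = 1.0
    bias_lam = -0.5 * (T - k) * sigma**2 * (e1 @ Q @ e1)             # (34)
    B = Q @ Zbar.T @ J @ Zbar + 0.5 * (T - k - 1) * np.eye(k + 1)    # bracket in (33)
    bias_beta = -sigma**2 * (B @ (Q @ e1))[1:]                       # (33), i = 1..k
    return bias_lam, bias_beta

T = 40
X = np.ones((T, 1))
print(bias_theorem3(X, np.array([0.5]), sigma=1.0))
```

With X = ι, T = 40, σ = 1 and β = 0.5 the λ̂ bias value is about −0.0146, in line with the corresponding entry of Table 2 in Section 5.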

Finally, we shall derive an approximation to the MSE of all elements of the coefficient vector. From (31) we obtain the expansion

$$ D^{-1}(\hat{\alpha} - \alpha) = \sigma R (I - P - S + PP)\bar{W}'\varepsilon + \sigma R (I - P)\tilde{W}'\varepsilon + o_p(T^{-3/2}), \qquad (35) $$

from which an expansion for D^-1 (α̂ − α)(α̂ − α)′ D^-1 to order T^-2 easily follows, and this yields (proof in Appendix H):

Theorem 4 In the model of Theorem 3 the elements of the MSE(α̂) matrix, that is, E(α̂_i − α_i)(α̂_j − α_j) for i, j = 1, …, k+1, are given by

$$
\begin{aligned}
&\sigma^2 e_i' Q e_j \\
&\quad + \sigma^4\big[\operatorname{tr}(Q\bar{Z}'JJ'\bar{Z}) - 2\operatorname{tr}(Q\bar{Z}'JJ\bar{Z}) - \operatorname{tr}(J'J) \\
&\qquad\quad + \operatorname{tr}(Q\bar{Z}'J\bar{Z}\,Q\bar{Z}'J\bar{Z}) + \operatorname{tr}(Q\bar{Z}'J\bar{Z})\operatorname{tr}(Q\bar{Z}'J\bar{Z})\big](e_i'Qe_1)(e_j'Qe_1) \\
&\quad + \sigma^4 (e_1'Qe_1)\big(e_i'Q\bar{Z}'[JJ' - JJ - J'J' + J\bar{Z}Q\bar{Z}'J + J'\bar{Z}Q\bar{Z}'J']\bar{Z}Qe_j\big) \\
&\quad + \sigma^4 (e_1'Qe_j)\big(e_1'Q\bar{Z}'[JJ' - J'J - J'J']\bar{Z}Qe_i\big) \\
&\quad + \sigma^4 (e_1'Qe_i)\big(e_1'Q\bar{Z}'[JJ' - J'J - JJ]\bar{Z}Qe_j\big) \\
&\quad + \sigma^4\big[(e_1'Q\bar{Z}'J\bar{Z}Qe_1) + \operatorname{tr}(Q\bar{Z}'J\bar{Z})(e_1'Qe_1)\big]\big(e_i'Q\bar{Z}'[J + J']\bar{Z}Qe_j\big) \\
&\quad + o(T^{-2+\delta_i+\delta_j}),
\end{aligned} \qquad (36)
$$

where Q = (Z̄′Z̄)^-1, Z̄ = E(Z) and e_i is the i-th unit vector.

From the results in Theorems 3 and 4, approximations to the elements of Var(α̂) can be obtained straightforwardly.


5. THE ACCURACY OF THE APPROXIMATIONS

The accuracy of the approximations has been examined for three types of autoregressive models with a unit root, namely, (i) the AR(1) model with a constant only, (ii) the same model including a linear trend as well and (iii) the model where this linear trend has been replaced by an arbitrary exogenous regressor generated according to a stationary AR(1) process. Although some of the moments of these least-squares estimators can be obtained efficiently by advanced numerical techniques, see Paolella (2003), we shall estimate them straightforwardly by simulation. With a sufficiently large number of replications, the exact moments can be obtained to a very high degree of accuracy. Our estimates of true bias, variance and MSE presented below are based on 10⁵ replications, and we also present estimated standard errors of our Monte Carlo estimates, indicated by MCSE. With respect to the estimated expectation of estimators of the unit root, this MCSE is never larger than 0.0006, so the simulation error will be much smaller than 0.5%, which is adequate, because we just want to check whether or not the approximation formulae are adequate in one or perhaps even two digits.

5.1. Random walk with drift

For the random walk with drift case, the model actually simulated was y*_t = λy*_{t-1} + β* + ε_t (t = 1, …, T) of (11), where y*_0 = 0, λ = 1 and β* = β/σ ≠ 0. Note that we have already found that λ̂ is invariant with respect to y_0; so taking this to be zero has no consequences for our findings on λ̂. From (7) it is directly seen that the properties of λ̂ are not determined by β and σ separately, but only by their ratio, and that is why we scaled the simulation model and gave it unit variance. For 0 ≤ |β/σ| < 1 the stochastic trend of the random walk with drift model dominates the deterministic trend in a certain sense; for |β/σ| > 1 the deterministic trend dominates. Since we are especially interested in cases where β is non-negative, and because the present approximations are not valid for β = 0 (for Monte Carlo results on estimator bias in this model when β = 0, see the g0(T) function of Figure 1 and also MacKinnon and Smith, 1998), we examined cases where 10 ≥ β/σ ≥ 0.1. Results for three different sample sizes are given in Tables 1, 2 and 3, respectively. As is to be expected, the bias of λ̂ depends strongly on β/σ. For β much larger than σ, the bias is very small, even in very small samples. For relatively small values of β the bias is substantial in samples of a limited size, and there is a very serious bias problem in small samples when β is much smaller than σ.
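A compact version of this Monte Carlo design (Python/NumPy assumed; fewer replications here than the 10⁵ used for the tables):

```python
import numpy as np

def mc_moments(T: int, beta_star: float, reps: int = 10_000, seed: int = 0):
    """Simulate model (11) and estimate bias, variance and MSE of λ̂."""
    rng = np.random.default_rng(seed)
    lam_hats = np.empty(reps)
    for r in range(reps):
        eps = rng.standard_normal(T)
        y = np.cumsum(beta_star + eps)             # y*_t with λ = 1, y*_0 = 0
        y_lag = np.concatenate(([0.0], y[:-1]))
        Z = np.column_stack([y_lag, np.ones(T)])   # regress y on (y_{-1}, ι)
        lam_hats[r] = np.linalg.lstsq(Z, y, rcond=None)[0][0]
    err = lam_hats - 1.0
    return err.mean(), err.var(), (err**2).mean()  # bias, Var, MSE

print(mc_moments(20, 0.5))   # cf. Table 1, column β/σ = 0.5
```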

The case β < σ seems to be relevant in practice. Some empirical evidence is provided by Rudebusch (1992, table 3), where a difference stationary model is fitted by least-squares to 14 time-series. Some care is required in interpreting these results, because they are biased and even inconsistent in case of model mis-specification, but also because the random walk with drift model was estimated directly in only a few cases. The more usual case involved augmented equations which implicitly use a transformation to remove serial correlation and, hence, change the constant term. However, one can easily recover an estimate of the original constant. For the 12 cases that have a positive estimate of β/σ, 4 range from 0.19 to 0.26, 7 range from 0.45 to 0.66 and 1 is 1.18. Empirical evidence is also found in Hylleberg and Mizon (1989, p. 227) and Banerjee et al. (1993, p. 171), which report β/σ values of 0.25, 0.72, 0.77 and about 1, respectively.

The numerical results for the various approximations to moments derived in this paper are labelled in the tables by the order of their smallest fully included term and also by the formula from which they originate (sometimes by deliberately omitting terms in order to be able to examine the effects of these higher-order terms). Note that non-trimmed approximations may include parts of terms which are of the same or of smaller order as the remainder term.


Table 1. Bias, variance and MSE of coefficient estimators in model (11) for various values of β/σ.

T = 20            Ref.     10        5         2         1         0.5       0.2       0.1
bias λ̂                  −0.0001   −0.0005   −0.0033   −0.0151   −0.0774   −0.1916   −0.2229
(MCSE)                  (0.0000)  (0.0000)  (0.0001)  (0.0001)  (0.0004)  (0.0006)  (0.0006)
O(T^-2)           (17)   −0.0002   −0.0006   −0.0037   −0.0150   −0.0600   −0.3750   −1.5000
O(T^-2)           (16)   −0.0001   −0.0006   −0.0036   −0.0143   −0.0571   −0.3571   −1.4286
O(σ^2)            (17)   −0.0001   −0.0005   −0.0032   −0.0127   −0.0510   −0.3187   −1.2750
O(σ^2)            (15)   −0.0001   −0.0005   −0.0032   −0.0128   −0.0511   −0.3195   −1.2782
O(T^-3)           (17)   −0.0001   −0.0005   −0.0033   −0.0148   −0.0846   −1.6313   −22.275
O(T^-3)           (15)   −0.0001   −0.0005   −0.0033   −0.0148   −0.0834   −1.5803   −21.450
O(β*^2)           (14)      –         –         –         –      −0.0037   −0.1967   −0.2244
Var λ̂                    0.0000    0.0001    0.0004    0.0018    0.0138    0.0311    0.0331
O(T^-3)           (21)    0.0000    0.0001    0.0004    0.0015    0.0060    0.0375    0.1500
O(T^-3)           (19)    0.0000    0.0001    0.0004    0.0015    0.0060    0.0376    0.1504
O(T^-4)           (21)    0.0000    0.0001    0.0004    0.0017    0.0091    0.1594    2.1000
MSE λ̂                    0.0000    0.0001    0.0004    0.0020    0.0198    0.0678    0.0828
O(T^-4)           (20)    0.0000    0.0001    0.0004    0.0019    0.0127    0.3000    4.3500
O(T^-4)           (19)    0.0000    0.0001    0.0004    0.0019    0.0127    0.3002    4.3519
O(T^-4)           (36)    0.0000    0.0001    0.0004    0.0019    0.0121    0.2747    3.9448
bias β̂/σ                 0.0159    0.0331    0.0853    0.1757    0.3075    0.2381    0.1323
(MCSE)                  (0.0014)  (0.0014)  (0.0014)  (0.0013)  (0.0015)  (0.0020)  (0.0022)
O(T^-1)           (33)    0.0171    0.0343    0.0857    0.1714    0.3429    0.8571    1.7143
Var β̂/σ                  0.1864    0.1862    0.1847    0.1816    0.2131    0.4030    0.4821
O(T^-2)           (36)    0.1857    0.1856    0.1848    0.1820    0.1709    0.0929   −0.1857
MSE β̂/σ                  0.1866    0.1873    0.1920    0.2125    0.3077    0.4596    0.4996
O(T^-2)           (36)    0.1860    0.1867    0.1921    0.2114    0.2884    0.8276    2.7531


In the majority of the cases examined, the approximations are very good. However, in situations where the bias is very substantial, and these are the cases where β/σ is small, the quality of the large-T and small-σ approximations is generally poor, or sometimes extremely bad, and in those situations the higher-order approximation is even worse than the approximation established by the leading term only; note for this phenomenon the difference for the result of Corollary 1 when the full O(T^-3) formula is used or only its O(T^-2) term. For large β/σ the higher-order approximation is better. We find no systematic quality difference between the trimmed and the untrimmed approximations, and the O(σ²) approximation is not found to be systematically better than the O(T^-2) approximation. Note that the small-drift approximation (only calculated for β* ≤ 0.5) does well where the other approximations break down. Especially when T and the drift are both small, the O(β*²) approximation is remarkably accurate.


Table 2. Bias, variance and MSE of coefficient estimators in model (11) for various values of β/σ.

T = 40            Ref.     10        5         2         1         0.5       0.2       0.1
bias λ̂                  −0.0000   −0.0001   −0.0009   −0.0037   −0.0193   −0.0845   −0.1131
(MCSE)                  (0.0000)  (0.0000)  (0.0000)  (0.0001)  (0.0001)  (0.0003)  (0.0003)
O(T^-2)           (17)   −0.0000   −0.0002   −0.0009   −0.0037   −0.0150   −0.0938   −0.3750
O(T^-2)           (16)   −0.0000   −0.0001   −0.0009   −0.0037   −0.0146   −0.0915   −0.3659
O(σ^2)            (17)   −0.0000   −0.0001   −0.0009   −0.0035   −0.0139   −0.0867   −0.3469
O(σ^2)            (15)   −0.0000   −0.0001   −0.0009   −0.0035   −0.0139   −0.0868   −0.3471
O(T^-3)           (17)   −0.0000   −0.0001   −0.0009   −0.0037   −0.0181   −0.2508   −2.9719
O(T^-3)           (15)   −0.0000   −0.0001   −0.0009   −0.0037   −0.0180   −0.2472   −2.9136
O(β*^2)           (14)      –         –         –         –       0.1599   −0.0795   −0.1137
Var λ̂                    0.0000    0.0000    0.0000    0.0002    0.0014    0.0081    0.0098
O(T^-3)           (21)    0.0000    0.0000    0.0000    0.0002    0.0008    0.0047    0.0187
O(T^-3)           (19)    0.0000    0.0000    0.0000    0.0002    0.0008    0.0047    0.0188
O(T^-4)           (21)    0.0000    0.0000    0.0000    0.0002    0.0009    0.0123    0.1406
MSE λ̂                    0.0000    0.0000    0.0000    0.0002    0.0018    0.0153    0.0226
O(T^-4)           (20)    0.0000    0.0000    0.0000    0.0002    0.0012    0.0211    0.2813
O(T^-4)           (19)    0.0000    0.0000    0.0000    0.0002    0.0012    0.0211    0.2813
O(T^-4)           (36)    0.0000    0.0000    0.0000    0.0002    0.0011    0.0203    0.2679
bias β̂/σ                 0.0078    0.0171    0.0452    0.0930    0.1895    0.2269    0.1407
(MCSE)                  (0.0010)  (0.0010)  (0.0010)  (0.0010)  (0.0010)  (0.0014)  (0.0016)
O(T^-1)           (33)    0.0093    0.0185    0.0463    0.0927    0.1854    0.4634    0.9268
Var β̂/σ                  0.0962    0.0962    0.0958    0.0951    0.0993    0.1886    0.2604
O(T^-2)           (36)    0.0963    0.0963    0.0961    0.0955    0.0930    0.0758    0.0140
MSE β̂/σ                  0.0963    0.0965    0.0979    0.1037    0.1352    0.2401    0.2802
O(T^-2)           (36)    0.0964    0.0967    0.0983    0.1041    0.1274    0.2905    0.8730

The variance of λ̂, and even more so its MSE, increases when β/σ decreases. We find here again that trimming has little effect and that the large-T and small-σ approximations are very bad for very small β/σ values, especially for small T. However, it would be relatively straightforward to develop an adequate small-drift asymptotic approximation for these second moments. Note that the untrimmed approximations for the MSE of λ̂ given in Theorems 2 and 4, respectively, give slightly different results. This is because they are obtained in different ways, and hence the retained terms may include different contributions that are actually of the same order as the approximation error.

For β/σ not very small, the bias in the estimator of the intercept increases when the intercept decreases. The relative bias increases sharply over the full range examined. It is very substantial (over 100%) for β/σ = 0.1, and then it does not change much with T (for 20 ≤ T ≤ 80). We should keep in mind, however, that the distribution of β̂ is not independent of y_0, so choosing y_0 = 0 in the Monte Carlo does not provide general results in this respect. The O(T^-1) approximation to the bias in β̂ given in Theorem 3 is found to be very accurate as long as the relative bias is less than, say, 50%. For β/σ > 0.5 the approximations to the variance and MSE of β̂ are very good, even for samples as small as T = 20, but for small β/σ they are very bad and we even find a negative approximation to the variance, that is, the leading term in the variance matrix approximation is not positive definite.


Table 3. Bias, variance and MSE of coefficient estimators in model (11) for various values of β/σ.

T = 80            Ref.     10        5         2         1         0.5       0.2       0.1
bias λ̂                  −0.0000   −0.0000   −0.0002   −0.0009   −0.0042   −0.0307   −0.0531
(MCSE)                  (0.0000)  (0.0000)  (0.0000)  (0.0000)  (0.0000)  (0.0001)  (0.0002)
O(T^-2)           (17)   −0.0000   −0.0000   −0.0002   −0.0009   −0.0037   −0.0234   −0.0938
O(T^-2)           (16)   −0.0000   −0.0000   −0.0002   −0.0009   −0.0037   −0.0231   −0.0926
O(σ^2)            (17)   −0.0000   −0.0000   −0.0002   −0.0009   −0.0036   −0.0226   −0.0902
O(σ^2)            (15)   −0.0000   −0.0000   −0.0002   −0.0009   −0.0036   −0.0226   −0.0902
O(T^-3)           (17)   −0.0000   −0.0000   −0.0002   −0.0009   −0.0041   −0.0431   −0.4184
O(T^-3)           (15)   −0.0000   −0.0000   −0.0002   −0.0009   −0.0041   −0.0428   −0.4145
O(β*^2)           (14)      –         –         –         –         –      −0.0149   −0.0523
Var λ̂                    0.0000    0.0000    0.0000    0.0000    0.0001    0.0016    0.0026
O(T^-3)           (21)    0.0000    0.0000    0.0000    0.0000    0.0001    0.0006    0.0023
O(T^-3)           (19)    0.0000    0.0000    0.0000    0.0000    0.0001    0.0006    0.0023
O(T^-4)           (21)    0.0000    0.0000    0.0000    0.0000    0.0001    0.0011    0.0100
MSE λ̂                    0.0000    0.0000    0.0000    0.0000    0.0001    0.0025    0.0054
O(T^-4)           (20)    0.0000    0.0000    0.0000    0.0000    0.0001    0.0016    0.0187
O(T^-4)           (19)    0.0000    0.0000    0.0000    0.0000    0.0001    0.0016    0.0188
O(T^-4)           (36)    0.0000    0.0000    0.0000    0.0000    0.0001    0.0016    0.0183
bias β̂/σ                 0.0037    0.0085    0.0230    0.0474    0.0983    0.1817    0.1368
(MCSE)                  (0.0007)  (0.0007)  (0.0007)  (0.0007)  (0.0007)  (0.0009)  (0.0011)
O(T^-1)           (33)    0.0048    0.0096    0.0241    0.0481    0.0963    0.2407    0.4815
Var β̂/σ                  0.0491    0.0491    0.0491    0.0489    0.0489    0.0763    0.1235
O(T^-2)           (36)    0.0491    0.0491    0.0490    0.0489    0.0483    0.0442    0.0295
MSE β̂/σ                  0.0492    0.0492    0.0496    0.0511    0.0586    0.1093    0.1422
O(T^-2)           (36)    0.0491    0.0492    0.0496    0.0512    0.0576    0.1021    0.2613


5.2. The unit root model with intercept and another regressor

First we add a linear trend to the random walk with drift model. We simulated

$$ y^*_t = \lambda y^*_{t-1} + \frac{\beta_1}{\sigma} + \frac{\beta_2}{\sigma}\, t + \varepsilon_t, \qquad (37) $$

with y*_0 = 0, λ = 1, β1/σ ≠ 0, β2/σ ≠ 0 and ε_t ∼ i.i.d. N(0, 1). Note that our approximations are not valid for the case where β1 = β2 = 0.


Table 4. Moments of λ̂, β̂1 and β̂2 in model (37) for various values of β1/σ and β2/σ = 0.1.

T = 20            Ref.     10        5         2         1         0.5       0.2       0.1
bias λ̂                  −0.1776   −0.1776   −0.1776   −0.1776   −0.1776   −0.1776   −0.1776
(MCSE)                  (0.0005)  (0.0005)  (0.0005)  (0.0005)  (0.0005)  (0.0005)  (0.0005)
O(T^-3)           (34)   −0.2051   −0.2051   −0.2051   −0.2051   −0.2051   −0.2051   −0.2051
O(T^-4)           (15)   −0.2391   −0.2391   −0.2391   −0.2391   −0.2391   −0.2391   −0.2391
Var λ̂                    0.0287    0.0287    0.0287    0.0287    0.0287    0.0287    0.0287
O(T^-6)                   0.0200    0.0200    0.0200    0.0200    0.0200    0.0200    0.0200
MSE λ̂                    0.0602    0.0602    0.0602    0.0602    0.0602    0.0602    0.0602
O(T^-6)           (19)    0.0708    0.0708    0.0708    0.0708    0.0708    0.0708    0.0708
O(T^-6)           (36)    0.0620    0.0620    0.0620    0.0620    0.0620    0.0620    0.0620
bias β̂1                 −2.4414   −1.5534   −1.0207   −0.8431   −0.7543   −0.7010   −0.6833
(MCSE)                  (0.0065)  (0.0040)  (0.0028)  (0.0024)  (0.0023)  (0.0022)  (0.0022)
O(T^-1)           (33)   −2.6824   −1.7710   −1.2242   −1.0419   −0.9508   −0.8961   −0.8779
Var β̂1                   4.1908    1.6147    0.7577    0.5869    0.5230    0.4915    0.4822
O(T^-2)                   4.3787    1.6642    0.7261    0.5286    0.4513    0.4119    0.3999
MSE β̂1                   10.151    4.0278    1.7995    1.2977    1.0919    0.9829    0.9490
O(T^-2)           (36)    11.574    4.8007    2.2248    1.6142    1.3553    1.2149    1.1706
bias β̂2                  1.9638    1.0759    0.5432    0.3656    0.2768    0.2235    0.2058
(MCSE)                  (0.0059)  (0.0032)  (0.0016)  (0.0011)  (0.0008)  (0.0006)  (0.0006)
O(T^-2)           (33)    2.0201    1.1087    0.5619    0.3796    0.2884    0.2338    0.2155
Var β̂2                   3.4329    1.0138    0.2510    0.1115    0.0633    0.0413    0.0351
O(T^-4)           (36)    3.4165    1.0013    0.2429    0.1051    0.0579    0.0364    0.0304
MSE β̂2                   7.2895    2.1714    0.5461    0.2452    0.1399    0.0912    0.0774
O(T^-4)           (36)    7.4971    2.2305    0.5585    0.2492    0.1411    0.0911    0.0769

We could have included the case where the intercept is redundant (β1 = 0) and not the linear trend (β2 ≠ 0), but we did not, because this does not seem to be a particularly relevant case. We have to exclude the case where the linear trend is the only redundant regressor (i.e. β1 ≠ 0, β2 = 0), because then we have X = (ι : τ) with τ_t = t, so that JX = (Jι : Jτ) = (τ − ι : Jτ) and hence Z̄ = ((τ − ι)β1/σ : τ − ι : Jτ) does not have full column rank. So this is another case for which the large-T and small-σ expansions do not exist.

When X = (ι : τ), it follows that My_{-1} is invariant with respect to β1. Then we see from (4) that the distribution of λ̂ will not depend on β1, and neither will its bias and MSE, nor their approximations. Note that the estimators β̂1 and β̂2 are not invariant with respect to either β1 or β2 or y_0. In Tables 4–6, we present numerical results for the unit root model with trending drift for parametrizations with β2/σ = 0.1 only; some support for a value in this range (or smaller) may be obtained again from Rudebusch (1992, table 9). For very small samples the bias in λ̂ is substantial. Its approximation by the leading term given in Theorem 3 works adequately, even for T = 20. Including the O(T^-4) term, which can be obtained readily from Theorem 1, is found to be counterproductive in a very small sample. Note that the approximation for the MSE of λ̂ given in Theorem 4 works well. As always, the quality of the approximation of the variance suffers when the bias approximation is poor. Note that especially in small samples the relative biases of β̂1 and β̂2 are very substantial, and that these biases are opposite in sign. Their approximations, even for huge biases, are remarkably good, and also the second moments can be approximated extremely well. In additional calculations, we also examined β2/σ values equal to 0.01, 0.05, 0.5, 1.0. For the larger β2/σ values, the biases get smaller and the approximations very accurate. For β2/σ and the sample size both very small, the large-T and small-σ approximations break down, implying that for this case it could be worthwhile to develop special 'small-trend' asymptotic approximations.


Table 5. Moments of λ̂, β̂1 and β̂2 in model (37) for various values of β1/σ and β2/σ = 0.1.

T = 40        Ref.   β1/σ = 10       5         2         1         0.5       0.2       0.1
bias λ̂               −0.0131   −0.0131   −0.0131   −0.0131   −0.0131   −0.0131   −0.0131
(MCSE)               (0.0001)  (0.0001)  (0.0001)  (0.0001)  (0.0001)  (0.0001)  (0.0001)
O(T⁻³)        (34)   −0.0134   −0.0134   −0.0134   −0.0134   −0.0134   −0.0134   −0.0134
O(T⁻⁴)        (15)   −0.0131   −0.0131   −0.0131   −0.0131   −0.0131   −0.0131   −0.0131
Var λ̂                0.0007    0.0007    0.0007    0.0007    0.0007    0.0007    0.0007
O(T⁻⁶)               0.0007    0.0007    0.0007    0.0007    0.0007    0.0007    0.0007
MSE λ̂                0.0009    0.0009    0.0009    0.0009    0.0009    0.0009    0.0009
O(T⁻⁶)        (19)   0.0009    0.0009    0.0009    0.0009    0.0009    0.0009    0.0009
O(T⁻⁶)        (36)   0.0009    0.0009    0.0009    0.0009    0.0009    0.0009    0.0009
bias β̂1              −0.3510   −0.2856   −0.2464   −0.2333   −0.2268   −0.2229   −0.2216
(MCSE)               (0.0022)  (0.0019)  (0.0017)  (0.0016)  (0.0016)  (0.0016)  (0.0016)
O(T⁻¹)        (33)   −0.3476   −0.2841   −0.2460   −0.2333   −0.2270   −0.2232   −0.2219
Var β̂1               0.4999    0.3526    0.2813    0.2605    0.2506    0.2448    0.2429
O(T⁻²)               0.5021    0.3533    0.2814    0.2603    0.2503    0.2445    0.2426
MSE β̂1               0.6231    0.4341    0.3420    0.3149    0.3020    0.2945    0.2920
O(T⁻²)        (36)   0.6229    0.4341    0.3419    0.3148    0.3018    0.2943    0.2918
bias β̂2              0.1588    0.0934    0.0542    0.0411    0.0345    0.0306    0.0293
(MCSE)               (0.0010)  (0.0006)  (0.0003)  (0.0003)  (0.0002)  (0.0002)  (0.0002)
O(T⁻²)        (33)   0.1542    0.0907    0.0527    0.0400    0.0336    0.0298    0.0285
Var β̂2               0.1027    0.0350    0.0115    0.0065    0.0046    0.0036    0.0033
O(T⁻⁴)        (36)   0.1033    0.0351    0.0115    0.0065    0.0046    0.0036    0.0033
MSE β̂2               0.1279    0.0437    0.0144    0.0082    0.0058    0.0045    0.0041
O(T⁻⁴)        (36)   0.1271    0.0434    0.0143    0.0081    0.0057    0.0045    0.0041

As always, the quality of the approximation of the variance suffers when the bias approximation is poor. Note that especially in small samples the relative biases of β̂1 and β̂2 are very substantial, and that these biases are opposite in sign. Their approximations, even for huge biases, are remarkably good, and also the second moments can be approximated extremely well. In additional calculations, we also examined β2/σ values equal to 0.01, 0.05, 0.5, 1.0. For the larger β2/σ values, the biases get smaller and the approximations become very accurate. For β2/σ and the sample size both very small, the large-T and small-σ approximations break down, implying that for this case it could be worthwhile to develop special 'small-trend' asymptotic approximations.

We also looked into the model where we did not add a linear trend but an exogenous regressor generated by a stationary AR(1) process (full results available from the authors). We found the bias in all coefficients to decrease and the quality of the bias approximations to improve when the signal-to-noise ratio increases and when the exogenous regressor is less smooth. The approximations to the bias in the β coefficients all overstate the actual bias, except when this bias is very small. All approximations to second moments are found to be quite satisfactory, except when both the signal-to-noise ratio and the sample size are very small.
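The exact design of these additional experiments is not reproduced here; the sketch below merely illustrates one possible setup, in which the linear trend of (37) is replaced by a stationary AR(1) regressor held fixed across replications (the analysis conditions on X), with rho governing its smoothness and beta2 the signal-to-noise ratio. All names and parameter values are illustrative assumptions, not the authors' specification:

```python
import numpy as np

def mc_bias_ar1_regressor(T=40, rho=0.7, beta1=1.0, beta2=1.0, reps=100_000, seed=1):
    rng = np.random.default_rng(seed)
    # One fixed draw of the exogenous AR(1) regressor, scaled to unit
    # unconditional variance; rho controls how smooth it is.
    v = rng.standard_normal(T)
    x = np.empty(T)
    x[0] = v[0]                                   # stationary start: Var(x_1) = 1
    for s in range(1, T):
        x[s] = rho * x[s - 1] + np.sqrt(1 - rho**2) * v[s]
    bias = np.zeros(3)
    for _ in range(reps):
        # unit root process with increments beta1 + beta2*x_t + eps_t (sigma = 1)
        y = np.concatenate(([0.0], np.cumsum(beta1 + beta2 * x + rng.standard_normal(T))))
        Z = np.column_stack((y[:-1], np.ones(T), x))
        bias += np.linalg.lstsq(Z, y[1:], rcond=None)[0] - np.array([1.0, beta1, beta2])
    return bias / reps   # Monte Carlo bias of (lambda, beta1, beta2)
```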


Table 6. Moments of λ̂, β̂1 and β̂2 in model (37) for various values of β1/σ and β2/σ = 0.1.

T = 80        Ref.   β1/σ = 10       5         2         1         0.5       0.2       0.1
bias λ̂               −0.0008   −0.0008   −0.0008   −0.0008   −0.0008   −0.0008   −0.0008
(MCSE)               (0.0000)  (0.0000)  (0.0000)  (0.0000)  (0.0000)  (0.0000)  (0.0000)
O(T⁻³)        (34)   −0.0009   −0.0009   −0.0009   −0.0009   −0.0009   −0.0009   −0.0009
O(T⁻⁴)        (15)   −0.0008   −0.0008   −0.0008   −0.0008   −0.0008   −0.0008   −0.0008
Var λ̂                0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
O(T⁻⁶)               0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
MSE λ̂                0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
O(T⁻⁶)        (19)   0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
O(T⁻⁶)        (36)   0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
bias β̂1              −0.0653   −0.0611   −0.0586   −0.0577   −0.0573   −0.0571   −0.0570
(MCSE)               (0.0012)  (0.0011)  (0.0011)  (0.0011)  (0.0011)  (0.0011)  (0.0011)
O(T⁻¹)        (33)   −0.0641   −0.0599   −0.0574   −0.0566   −0.0562   −0.0559   −0.0558
Var β̂1               0.1440    0.1303    0.1226    0.1201    0.1189    0.1182    0.1179
O(T⁻²)               0.1442    0.1305    0.1228    0.1203    0.1191    0.1183    0.1181
MSE β̂1               0.1482    0.1340    0.1260    0.1235    0.1222    0.1214    0.1212
O(T⁻²)        (36)   0.1483    0.1341    0.1261    0.1235    0.1222    0.1215    0.1212
bias β̂2              0.0120    0.0078    0.0053    0.0045    0.0040    0.0038    0.0037
(MCSE)               (0.0002)  (0.0001)  (0.0001)  (0.0001)  (0.0001)  (0.0001)  (0.0001)
O(T⁻²)        (33)   0.0119    0.0078    0.0052    0.0044    0.0040    0.0037    0.0037
Var β̂2               0.0043    0.0018    0.0008    0.0006    0.0005    0.0004    0.0004
O(T⁻⁴)        (36)   0.0043    0.0018    0.0008    0.0006    0.0005    0.0004    0.0004
MSE β̂2               0.0045    0.0019    0.0008    0.0006    0.0005    0.0004    0.0004
O(T⁻⁴)        (36)   0.0045    0.0019    0.0008    0.0006    0.0005    0.0004    0.0004

6. CONCLUSIONS

The above derivations and calculations shed light on the factors that are important in determining the bias, the variance and MSE in the normal unit root dynamic regression model. Earlier, in KP (1993, 1998b), we developed bias approximation formulae for the stable model and found that small-σ versions are inferior to large-T versions and that these deteriorate close to the unit root case, especially when higher-order terms are taken into account. In the present study, we develop special approximations for the unit root case, both for first and second moments, and establish that the large-T and small-σ versions may both work very well, apart from cases where the regressors are, or are close to being, redundant; but then alternative asymptotic parameter sequences can be exploited.

The accuracy of our analytic formulae has been examined in numerical experiments. These were designed such that they expose the (extreme) cases where the bias and variance approximations break down. They clearly indicate that in this respect, when the first-order asymptotic approximations are not good, their higher-order counterparts are likely to be worse


and, in extreme cases, much worse. As a rule of thumb, in our experiments the higher-order approximations do not lead to improvements if the first-order approximation is in error by more than 20%. For the random walk with drift model we show that the coefficient bias is substantial when the sample size is small and the drift is smaller than the standard deviation of the random shock. We also illustrate that this may often be a realistic case, and then bias correction seems worth pursuing. The large-T or small-σ approximations obtained in this study can be used for that purpose, but our numerical experiments show that they will not work well for very small values of the (scaled) drift term. However, for that situation we present an alternative approximation based on what we call a small-drift asymptotic expansion, and this proves to be very accurate in the special cases it is meant for. When further exogenous regressors are added to the model the bias may get worse for practically relevant parameter values. We give special attention to the model with an intercept and a linear deterministic trend, which is so often applied in practice, namely when the Dickey–Fuller test is applied. We illustrate that the coefficient bias in this model may be approximated quite accurately, but not when both the trend coefficient and the sample size are small.
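To indicate what such a bias correction could look like in practice, the sketch below plugs the observed regressors and the OLS residual variance into the O(T⁻¹) bias term of equation (38) in Appendix G. The feasible plug-in step (using the observed (y₋₁ : X) in place of the deterministic Z of the theory, and s² for σ²) is our own illustrative assumption, not a procedure evaluated in this paper:

```python
import numpy as np

def bias_corrected_ols(y, X):
    """OLS for y_t = lambda*y_{t-1} + x_t'beta + u_t, corrected by the leading
    bias term of (38): -sigma^2 [A + tr(A) I] (Z'Z)^{-1} e1, A = (Z'Z)^{-1} Z'JZ."""
    T = len(y) - 1
    Z = np.column_stack((y[:-1], X))     # first column: lagged dependent variable
    m = Z.shape[1]
    G = np.linalg.inv(Z.T @ Z)
    alpha = G @ Z.T @ y[1:]
    u = y[1:] - Z @ alpha
    s2 = u @ u / (T - m)                 # residual variance estimate
    J = np.tril(np.ones((T, T)), -1)     # strictly lower-triangular matrix of ones
    A = G @ Z.T @ J @ Z                  # tr(A) = (T - m)/2 if X has a constant, cf. (39)
    e1 = np.zeros(m); e1[0] = 1.0
    bias = -s2 * (A + np.trace(A) * np.eye(m)) @ G @ e1
    return alpha - bias                  # bias-corrected (lambda, beta') estimates
```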

In all models examined, we also show that the second moment of the least-squares estimator can be approximated quite accurately by our higher-order asymptotic expressions. Note that all our approximations have been obtained assuming that the distribution of the disturbance terms is symmetric and has the same kurtosis as the normal distribution. However, all results can be generalized straightforwardly by making alternative assumptions on the skewness and kurtosis of the disturbances.

We conclude that over a substantial and practically very relevant part of the parameter space of autoregressive models, the analytical tools developed in this paper can be exploited to improve inference methods when samples are small or only moderately large and a unit root is present. To that end, it is important that future research should examine the performance of modified estimators of coefficients and their variances, for example bias-corrected estimators, which take account of the analytical results of this study.

REFERENCES

Abadir, K. M. (1993). OLS bias in a nonstationary regression. Econometric Theory 9, 81–93.
Abadir, K. M. (1995). Unbiased estimation as a solution to testing for random walks. Economics Letters 47, 263–68.
Abadir, K. M., K. Hadri and E. Tzavalis (1999). The influence of VAR dimensions on estimator biases. Econometrica 67, 163–81.
Banerjee, A., J. Dolado, J. W. Galbraith and D. F. Hendry (1993). Co-integration, Error-Correction, and the Econometric Analysis of Non-stationary Data. Advanced Texts in Econometrics. Oxford: Oxford University Press.
Evans, G. B. A. and N. E. Savin (1984). Testing for unit roots: 2. Econometrica 52, 1241–69 (and 53, 1253).
Grubb, D. and J. Symons (1987). Bias in regressions with a lagged-dependent variable. Econometric Theory 3, 371–86.
Hoque, A. S. and T. A. Peters (1986). Finite sample analysis of the ARMAX models. Sankhya B 48, 266–83.
Hylleberg, S. and G. E. Mizon (1989). A note on the distribution of the least squares estimator of a random walk with drift. Economics Letters 29, 225–30.
Kadane, J. B. (1971). Comparison of k-class estimators when the disturbances are small. Econometrica 39, 723–37.


Kendall, M. G. (1954). Note on bias in the estimation of autocorrelation. Biometrika 41, 403–4.
Kiviet, J. F. and G. D. A. Phillips (1993). Alternative bias approximations in regressions with a lagged-dependent variable. Econometric Theory 9, 62–80.
Kiviet, J. F. and G. D. A. Phillips (1994). Bias assessment and reduction in linear error-correction models. Journal of Econometrics 63, 215–43.
Kiviet, J. F. and G. D. A. Phillips (1996). The bias of the ordinary least squares estimator in simultaneous equation models. Economics Letters 53, 161–7.
Kiviet, J. F. and G. D. A. Phillips (1998a). Degrees of freedom adjustment for disturbance variance estimators in dynamic regression models. The Econometrics Journal 1, 44–70.
Kiviet, J. F. and G. D. A. Phillips (1998b). Higher-order asymptotic expansions of the least-squares estimation bias in first-order dynamic regression models. Paper presented at ESWM-95, Tokyo, August 1995; Tinbergen Institute Discussion Paper 96-167/7, revised October 1998.
Kiviet, J. F., G. D. A. Phillips and B. Schipp (1995). The bias of OLS, GLS and ZEF estimators in dynamic seemingly unrelated regression models. Journal of Econometrics 69, 241–66.
Lawford, S. and P. Stamatogiannis (2004). The finite-sample effects of VAR dimensions on OLS bias, OLS variance, and minimum MSE estimators: Purely nonstationary case. Department of Economics and Finance Discussion Paper 04-05, Brunel University.
MacKinnon, J. G. and A. A. Smith (1998). Approximate bias correction in econometrics. Journal of Econometrics 85, 205–30.
Nagar, A. L. (1959). The bias and moment matrix of the general k-class estimators of the parameters in simultaneous equations. Econometrica 27, 575–95.
Nankervis, J. C. and N. E. Savin (1988). The exact moments of the least-squares estimator for the autoregressive model: Corrections and extensions. Journal of Econometrics 37, 381–8.
Paolella, M. S. (2003). Computing moments of ratios of quadratic forms in normal variables. Computational Statistics & Data Analysis 42, 313–31.
Peters, T. A. (1987). The exact moments of OLS in dynamic regression models with non-normal errors. Journal of Econometrics 17, 279–305.
Rudebusch, G. D. (1992). Trends and random walks in macroeconomic time series: A re-examination. International Economic Review 33, 661–80.
Sawa, T. (1978). The exact moments of the least squares estimator for the autoregressive model. Journal of Econometrics 8, 159–72.
West, K. D. (1988). Asymptotic normality when regressors have a unit root. Econometrica 56, 1397–418.

APPENDIX A: BASIC RESULTS ON ι AND J

For the T × 1 vector ι and the T × T matrix J, introduced in Section 2.2, we have the following results for t = 1, . . . , T: (Jι)_t = t − 1, (JJι)_t = (t − 1)(t − 2)/2, (J′Jι)_t = [T(T − 1) − t(t − 1)]/2, (J′ι)_t = T − t and (JJ′ι)_t = (t − 1)(T − t/2). From these, and making use of the well-known results for Σ_{t=1}^T t^j (j = 1, . . . , 4), we also find:

ι′Jι = Σ_{t=1}^T (t − 1) = (1/2)T² − (1/2)T,

ι′J′Jι = Σ_{t=1}^T (t − 1)² = (1/3)T³ − (1/2)T² + (1/6)T,

ι′JJ′ι = Σ_{t=1}^T (T − t)² = (1/3)T³ − (1/2)T² + (1/6)T,

ι′JJι = Σ_{t=1}^T (T − t)(t − 1) = (1/6)T³ − (1/2)T² + (1/3)T,

ι′JJ′Jι = Σ_{t=1}^T (t − 1)²(T − t/2) = (5/24)T⁴ − (5/12)T³ + (7/24)T² − (1/12)T,

ι′J′J′Jι = (1/2)Σ_{t=1}^T (t − 1)²(t − 2) = (1/8)T⁴ − (5/12)T³ + (3/8)T² − (1/12)T,

ι′JJJι = (1/2)Σ_{t=1}^T (T − t)(t − 1)(t − 2) = (1/24)T⁴ − (1/4)T³ + (11/24)T² − (1/4)T,

ι′J′JJ′Jι = Σ_{t=1}^T [(T/2)(T − 1) − (t/2)(t − 1)]² = (2/15)T⁵ + O(T⁴),

ι′J′J′J′Jι = (1/2)Σ_{t=1}^T [(T/2)(T − 1) − (t/2)(t − 1)](t − 1)(t − 2) = (1/30)T⁵ + O(T⁴).

The simple structure of J implies tr(J^i) = 0 for integer i > 0 and

tr(J′J) = (1/2)T² − (1/2)T,

tr(J′JJ′J) = Σ_{t=1}^T [(t − 1)(T − t)² + Σ_{i=0}^{T−t} i²] = (1/6)T⁴ + O(T³).
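These identities can be verified numerically; the sketch below assumes (consistently with (Jι)_t = t − 1 and J + J′ = ιι′ − I_T as used above) that J is the strictly lower-triangular T × T matrix of ones:

```python
import numpy as np

T = 9
iota = np.ones(T)
J = np.tril(np.ones((T, T)), -1)   # (J @ iota)[t] = t - 1 for t = 1, ..., T

assert np.allclose(J @ iota, np.arange(T))                      # (J iota)_t = t - 1
assert np.allclose(J + J.T, np.outer(iota, iota) - np.eye(T))   # J + J' = iota iota' - I
assert np.isclose(iota @ J @ iota, T**2 / 2 - T / 2)
assert np.isclose(iota @ J.T @ J @ iota, T**3 / 3 - T**2 / 2 + T / 6)
assert np.isclose(iota @ J @ J.T @ iota, T**3 / 3 - T**2 / 2 + T / 6)
assert np.isclose(iota @ J @ J @ iota, T**3 / 6 - T**2 / 2 + T / 3)
assert np.isclose(iota @ J @ J.T @ J @ iota,
                  5 * T**4 / 24 - 5 * T**3 / 12 + 7 * T**2 / 24 - T / 12)
assert np.isclose(np.trace(J.T @ J), T**2 / 2 - T / 2)
assert np.isclose(np.trace(np.linalg.matrix_power(J, 3)), 0.0)  # tr(J^i) = 0
```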

APPENDIX B: BASIC RESULTS ON ORDERS OF MAGNITUDE

Here we collect results that support the statements made on orders of magnitude of relevant expressions (where we condition upon X). From E(X′u) = 0 and Var(X′u) = σ²(X′X) = O(T) it follows that X′u = Op(T^1/2). Since JX = O(T) we have X′J′JX = O(T³) and X′J′u = Op(T^3/2), also giving X′J′MJX = O(T³) and X′J′Mu = Op(T^3/2). Along similar lines X′J′JJ′JX = O(T⁵) yields X′J′Ju = Op(T^5/2), from which X′J′MJu = Op(T^5/2) follows. From E(u′Ju) = 0 and Var(u′Ju) = σ⁴tr(J′J) = O(T²) we find u′Ju = Op(T), which yields u′J′Mu = Op(T). Moreover, because E(u′J′Ju) = σ²tr(J′J) = O(T²) and Var(u′J′Ju) = 2σ⁴tr(J′JJ′J) = O(T⁴), we find u′J′Ju = Op(T²), from which u′J′MJu = Op(T²) follows.
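The exact moments quoted here for the quadratic forms u′Ju and u′J′Ju are easy to confirm by simulation; a minimal sketch, again assuming that J is the strictly lower-triangular matrix of ones and u ∼ N(0, σ²I_T):

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma, reps = 10, 2.0, 200_000
J = np.tril(np.ones((T, T)), -1)

u = sigma * rng.standard_normal((reps, T))
q1 = np.einsum('rt,ts,rs->r', u, J, u)         # u'Ju, one value per replication
q2 = np.einsum('rt,ts,rs->r', u, J.T @ J, u)   # u'J'Ju

print(q1.mean(), 0.0)                                        # E(u'Ju) = 0
print(q1.var(), sigma**4 * np.trace(J.T @ J))                # Var(u'Ju) = sigma^4 tr(J'J)
print(q2.mean(), sigma**2 * np.trace(J.T @ J))               # E(u'J'Ju) = sigma^2 tr(J'J)
print(q2.var(), 2 * sigma**4 * np.trace(J.T @ J @ J.T @ J))  # 2 sigma^4 tr(J'JJ'J)
```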

APPENDIX C: PROOF OF THEOREM 1

Expanding the inverse factor of (7) further than in (8) we obtain

(1 + 2µσβ′X′J′MJε + µσ²ε′J′MJε)⁻¹
= 1 − 2µσβ′X′J′MJε − µσ²ε′J′MJε + 4µ²σ²(β′X′J′MJε)²
+ 4µ²σ³(β′X′J′MJε)(ε′J′MJε) − 8µ³σ³(β′X′J′MJε)³ + op(T⁻³/²).

Substitution in (7) yields

λ̂ − 1 = µσ(β′X′J′Mε + σε′J′Mε)
− µ²σ²[2β′X′J′Mε(β′X′J′MJε + σε′J′Mε) + ε′J′MJε(σβ′X′J′Mε + σ²ε′J′Mε)]
+ 4µ³σ³[(β′X′J′MJε)²(β′X′J′Mε + σε′J′Mε) + β′X′J′MJε(β′X′J′Mε)(σε′J′MJε)]
− 8µ⁴σ⁴(β′X′J′MJε)³(β′X′J′Mε) + op(T⁻³).

To approximate the bias we take the expectation of these terms. Terms involving an odd number of zero-mean normal random variables can be ignored. Occasionally we can simplify the expressions by using the fact that in traces or in scalars the expression is sometimes unchanged when J is replaced by J′, and hence J can be replaced by (1/2)[J + J′] = (1/2)[ιι′ − I_T]. Because Mι = 0, this may lead to some simplification. Using

E(ε′J′Mε) = tr(MJ) = 0.5tr[M(ιι′ − I_T)] = −0.5(T − k) = O(T),

E(β′X′J′Mεε′J′MJXβ) = β′X′J′MJMJXβ = −0.5µ⁻¹ = O(T³),

E[(ε′J′MJε)(ε′J′Mε)] = tr(MJ)tr(J′MJ) + 2tr(J′MJJ′M)
= tr(MJ)tr(J′MJ) − tr(J′MJ) = tr(MJ)tr(J′MJ) + o(T³),

E[(β′X′J′MJε)²(ε′J′Mε)] = E[(ε′J′MJXββ′X′J′MJε)(ε′MJε)]
= tr(J′MJXββ′X′J′MJ)tr(MJ) + 2tr(J′MJXββ′X′J′MJMJ)
= tr(MJ)β′X′J′MJJ′MJXβ + 2β′X′J′MJMJJ′MJXβ = O(T⁶),

E[(β′X′J′MJε)(β′X′J′Mε)(ε′J′MJε)] = E[(ε′J′MJε)(ε′J′MJXββ′X′J′Mε)]
= tr(J′MJ)β′X′J′MJ′MJXβ + 2β′X′J′MJ′MJJ′MJXβ
= −2⁻¹tr(J′MJ)β′X′J′MJXβ + 2β′X′J′MJ′MJJ′MJXβ
= 2β′X′J′MJ′MJJ′MJXβ + o(T⁶),

E[(β′X′J′MJu)³(β′X′J′Mu)] = E[(u′J′MJXββ′X′J′MJu)(u′J′MJXββ′X′J′Mu)]
= 3σ⁴(β′X′J′MJJ′MJXβ)(β′X′J′MJMJXβ)
= −1.5σ⁴µ⁻¹(β′X′J′MJJ′MJXβ) = O(T⁸),

and removing terms that are of such magnitude that they can be neglected in an O(T⁻³) approximation, yields

E(λ̂ − 1) = σ²µ[1 + tr(MJ)] − σ⁴µ²tr(MJ)tr(J′MJ)
+ 4σ⁴µ³tr(MJ)β′X′J′MJJ′MJXβ + 8σ⁴µ³β′X′J′MJMJJ′MJXβ
+ 8σ⁴µ³β′X′J′MJ′MJJ′MJXβ + op(T⁻³).


Exploiting

β′X′J′M(J + J′)MJJ′MJXβ = −0.5β′X′J′MJJ′MJXβ = O(T⁵),

we find that the 4th and 5th terms can be neglected in the result of the theorem.

APPENDIX D: PROOF OF COROLLARY 1

Putting X = ι, M = I_T − (1/T)ιι′ and β scalar in the various terms of Theorem 1 and using results from Appendix A leads to

tr(MJ) = −(1/T)ι′J′ι = −(1/2)(T − 1),

X′J′MJX = ι′J′Jι − (1/T)(ι′Jι)² = (1/12)T(T − 1)(T + 1),

tr(J′MJ) = tr(J′J) − (1/T)ι′JJ′ι = (1/6)(T² − 1),

X′J′MJJ′MJX = ι′J′JJ′Jι − (2/T)ι′Jι ι′JJ′Jι + (1/T²)ι′J′ι ι′JJ′ι ι′Jι = (1/120)T⁵ + O(T⁴),

which after substitution leads to the result of the Corollary.

APPENDIX E: PROOF OF THEOREM 2

To find the approximation to the MSE we commence from (18). Using an expansion for the inverse factor of the form (1 + x)⁻² = 1 − 2x + 3x² − 4x³ + · · ·, the MSE may be approximated by

E(λ̂ − 1)² = µ²σ²[E(β′X′J′Mε)² + σ²E(ε′J′Mε)²]
− 2µ³σ⁴{E[(β′X′J′Mε)²ε′J′MJε] + 4E[(β′X′J′Mε)(ε′J′Mε)(β′X′J′MJε)]}
+ 12µ⁴σ⁴E[(β′X′J′Mε)²(β′X′J′MJε)²] + o(T⁻⁴).

Here we have removed terms from the original expansion which involve a product of an odd number of normal random variables with mean zero, together with terms which are op(T⁻⁴). Below we evaluate the expectations in the various numerators and exploit the same simplification as in Appendix C:

E(β′X′J′Mε)² = µ⁻¹,

E(ε′J′Mε)² = [tr(MJ)]² + tr(JMJM) + tr(J′MJ),

E[(β′X′J′Mε)²ε′J′MJε] = E[(ε′MJXββ′X′J′Mε)ε′J′MJε]
= µ⁻¹tr(J′MJ) + 2β′X′J′MJ′MJMJXβ,

E[(β′X′J′Mε)(ε′J′Mε)(β′X′J′MJε)] = E[(ε′J′MJXββ′X′J′Mε)ε′J′Mε]
= tr(MJ)β′X′J′MJ′MJXβ + β′X′J′M[J′MJ′ + JJ′]MJXβ
= −0.5µ⁻¹tr(MJ) + β′X′J′M[JMJ + JJ′]MJXβ,

E[(β′X′J′Mε)²(β′X′J′MJε)²] = E[(ε′MJXββ′X′J′Mε)(ε′J′MJXββ′X′J′MJε)]
= µ⁻¹β′X′J′MJJ′MJXβ + 2(β′X′J′MJMJXβ)²
= µ⁻¹β′X′J′MJJ′MJXβ + 0.5µ⁻².

Substitution yields

E(λ̂ − 1)² = σ²µ + σ⁴µ²{[tr(MJ)]² + tr(JMJM) + tr(J′MJ)}
− 2σ⁴µ³[µ⁻¹tr(J′MJ) + 2β′X′J′MJ′MJMJXβ]
− 8σ⁴µ³{β′X′J′M[JMJ + JJ′]MJXβ − 0.5µ⁻¹tr(MJ)}
+ 12σ⁴µ⁴[µ⁻¹β′X′J′MJJ′MJXβ + 0.5µ⁻²] + o(T⁻⁴)

and by simplifying and removing terms of small order we obtain

E(λ̂ − 1)² = σ²µ + σ⁴µ²{[tr(MJ)]² + tr(JMJM) − tr(J′MJ)}
− 4σ⁴µ³(β′X′J′MJ′MJMJXβ + 2β′X′J′MJMJMJXβ − β′X′J′MJJ′MJXβ) + o(T⁻⁴),

which then, after minor further simplification, gives the result of the theorem.

APPENDIX F: PROOF OF COROLLARY 2

Following up on the proof in Appendix D, we have to evaluate a few extra expressions after putting X = ι, M = I_T − (1/T)ιι′ and β scalar. We find

tr(JMJM) = tr(JJ) − (2/T)ι′JJι + (1/T²)(ι′Jι)² = −(1/12)T² + (1/2)T − 5/12,

and

X′J′MJMJMJX = ι′J′JJJι − (1/T)ι′J′ι ι′JJJι − (1/T)ι′J′Jι ι′JJι − (1/T)ι′J′JJι ι′Jι
+ (2/T²)(ι′J′ι)² ι′JJι + (1/T²)ι′J′Jι (ι′J′ι)² − (1/T³)(ι′J′ι)⁴ = −(1/720)T⁵ + O(T⁴).

Substitution yields

E(λ̂ − 1)² = (σ/β)² 12/[T(T² − 1)] + (σ/β)⁴ (12²/T⁴)(1/4 − 1/12 − 1/6)
+ 4(σ/β)⁴ (12³/T⁴)(1/120 + 1/720) + o(T⁻⁴),

which leads to the result in the corollary.


APPENDIX G: PROOF OF THEOREM 3

The required bias is obtained from the expansion (32). Since terms with an odd number of stochastic factors have zero expectation, we have to evaluate

E[D⁻¹(α̂ − α)] = σE[RW̄′ε] − σRE[(W′W̄ + W̄′W)RW′ε] + o(T⁻¹).

We find

E(RW̄′ε) = σRDe1E(ε′J′ε) = 0,

E(W′W̄RW′ε) = σE(W′Jεe′1DRW′ε) = σW′JWRDe1,

E(W̄′WRW′ε) = σE(De1ε′J′WRW′ε) = σtr(RW′JW)De1.

Hence, using R = D⁻¹(Z′Z)⁻¹D⁻¹ and W = ZD, we find

E[D⁻¹(α̂ − α)] = −σ²D⁻¹[(Z′Z)⁻¹Z′JZ + tr{(Z′Z)⁻¹Z′JZ}I_{k+1}](Z′Z)⁻¹e1 + o(T⁻¹). (38)

Some further simplification is possible. Note that

tr{(Z′Z)⁻¹Z′JZ} = 0.5tr{(Z′Z)⁻¹Z′(J + J′)Z} = 0.5tr{Z(Z′Z)⁻¹Z′(ιι′ − I_T)}.

Since the regression contains a constant, we have Z(Z′Z)⁻¹Z′ι = ι, and thus

tr{(Z′Z)⁻¹Z′JZ} = 0.5[tr(ιι′) − tr(I_{k+1})] = 0.5(T − k − 1). (39)

Finally, consider

e′1(Z′Z)⁻¹Z′JZ(Z′Z)⁻¹e1 = 0.5e′1(Z′Z)⁻¹Z′[ιι′ − I]Z(Z′Z)⁻¹e1.

Because ι is the second column of Z we have (Z′Z)⁻¹Z′ι = e2, and hence, using e′1e2 = 0,

e′1(Z′Z)⁻¹Z′JZ(Z′Z)⁻¹e1 = −0.5e′1(Z′Z)⁻¹e1. (40)

Pre-multiplying both sides of (38) with e′1D and making use of (39) and (40) yields the bias of λ̂ as stated in the theorem, which can be shown to be equivalent to (16) for the case δ1 = −1. Pre-multiplying (38) by e′i+1D yields the bias of the individual elements β̂i, i = 1, . . . , k.
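The simplifications (39) and (40) are purely algebraic and hold for any full column rank Z that contains the constant (with ι as its second column for (40)); a quick numerical check, again taking J to be the strictly lower-triangular matrix of ones, could look as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 30, 2
iota = np.ones(T)
J = np.tril(np.ones((T, T)), -1)
# Z is T x (k+1) with iota as its second column, as in the proof above.
Z = np.column_stack((rng.standard_normal(T), iota, np.arange(1.0, T + 1)))

G = np.linalg.inv(Z.T @ Z)
A = G @ Z.T @ J @ Z
e1 = np.array([1.0, 0.0, 0.0])

assert np.isclose(np.trace(A), 0.5 * (T - k - 1))        # (39)
assert np.isclose(e1 @ A @ G @ e1, -0.5 * e1 @ G @ e1)   # (40)
```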

APPENDIX H: PROOF OF THEOREM 4

Upon removing the terms which are a product of an odd number of normal random variables with zero mean, we may write

E[D⁻¹(α̂ − α)(α̂ − α)′D⁻¹] = RE[W′uu′W + W̄′uu′W̄ + PW′uu′WP′
− PW′uu′W̄ − W̄′uu′WP′ − PW̄′uu′W − W′uu′W̄P′
+ PPW′uu′W + W′uu′WP′P′ − SW′uu′W − W′uu′WS′]R + o(T⁻²). (41)

The required approximation to the MSE of α̂ is obtained by evaluating this expectation and pre- and post-multiplying the result by D. We make use of the substitutions W′W = R⁻¹, W̄ = Jue′1D, P = W′Jue′1DR + De1u′J′WR and S = De1u′J′Jue′1DR and, often using the result E(uu′Buu′) = σ⁴[tr(B)I + B + B′] for general B matrices, we find for the successive terms:

E(W′uu′W) = σ²W′W = σ²R⁻¹,

E(W̄′uu′W̄) = E(De1u′J′uu′Jue′1D) = σ⁴tr(J′J)De1e′1D,

E(PW′uu′WP′) = E(W′Jue′1DRW′uu′WRDe1u′J′W + De1u′J′WRW′uu′WRDe1u′J′W
+ W′Jue′1DRW′uu′WRW′Jue′1D + De1u′J′WRW′uu′WRW′Jue′1D)
= E(W′Juu′WRDe1e′1DRW′uu′J′W + De1e′1DRW′uu′J′WRW′uu′J′W
+ W′Juu′WRW′Juu′WRDe1e′1D + u′J′WRW′uu′WRW′Ju De1e′1D)
= σ⁴(e′1DRDe1)W′JJ′W + 2σ⁴W′JWRDe1e′1DRW′J′W
+ σ⁴tr(RW′JW)De1e′1DRW′J′W + σ⁴De1e′1DRW′J′WRW′J′W
+ σ⁴De1e′1DRW′JJ′W + σ⁴tr(RW′JW)W′JWRDe1e′1D
+ σ⁴W′JWRW′JWRDe1e′1D + σ⁴W′JJ′WRDe1e′1D
+ σ⁴tr(RW′JW)tr(RW′JW)De1e′1D
+ σ⁴tr(RW′JJ′W)De1e′1D + σ⁴tr(RW′JWRW′JW)De1e′1D,

E(PW′uu′W̄) = E(W′Jue′1DRW′uu′Jue′1D + De1u′J′WRW′uu′Jue′1D)
= E(W′Juu′Juu′WRDe1e′1D + u′J′WRW′uu′Ju De1e′1D)
= σ⁴W′JJWRDe1e′1D + σ⁴W′JJ′WRDe1e′1D
+ σ⁴tr(RW′JJ′W)De1e′1D + σ⁴tr(RW′JJW)De1e′1D,

E(W̄′uu′WP′) = [E(PW′uu′W̄)]′,

E(PW̄′uu′W) = E(W′Jue′1DRDe1u′J′uu′W + De1u′J′WRDe1u′J′uu′W)
= E[(e′1DRDe1)W′Juu′J′uu′W + De1e′1DRW′Juu′J′uu′W]
= σ⁴(e′1DRDe1)W′JJ′W + σ⁴(e′1DRDe1)W′JJW
+ σ⁴De1e′1DRW′JJ′W + σ⁴De1e′1DRW′JJW,

E(W′uu′W̄P′) = [E(PW̄′uu′W)]′,

E(PPW′uu′W) = E(W′Jue′1DRW′Jue′1DRW′uu′W + De1u′J′WRW′Jue′1DRW′uu′W
+ W′Jue′1DRDe1u′J′WRW′uu′W + De1u′J′WRDe1u′J′WRW′uu′W)
= E[W′Juu′J′WRDe1e′1DRW′uu′W + De1e′1DRW′uu′J′WRW′Juu′W
+ (e′1DRDe1)W′Juu′J′WRW′uu′W + De1e′1DRW′Juu′J′WRW′uu′W]
= σ⁴(e′1DRW′JWRDe1)W′JW + σ⁴W′JJ′WRDe1e′1D
+ σ⁴W′JWRDe1e′1DRW′JW + σ⁴tr(RW′JJ′W)De1e′1D
+ 2σ⁴De1e′1DRW′J′WRW′JW + σ⁴tr(RW′JW)(e′1DRDe1)W′JW
+ σ⁴(e′1DRDe1)W′JJ′W + σ⁴(e′1DRDe1)W′JWRW′JW
+ σ⁴tr(RW′JW)De1e′1DRW′JW + σ⁴De1e′1DRW′JJ′W
+ σ⁴De1e′1DRW′JWRW′JW,

E(W′uu′WP′P′) = [E(PPW′uu′W)]′,

E(SW′uu′W) = E(De1u′J′Jue′1DRW′uu′W) = E(De1e′1DRW′uu′J′Juu′W)
= σ⁴tr(J′J)De1e′1D + 2σ⁴De1e′1DRW′J′JW,

E(W′uu′WS′) = [E(SW′uu′W)]′.

Exploiting J + J′ = ιι′ − I in the above yields

WRW′(J + J′)W = WRW′(ιι′ − I)W = (ιι′ − I)W,

W′(J + J′)WR = W′(ιι′ − I)WR = W′ιι′WR − I = W′ιe′2 − I,

W′(J + J′)WRDe1 = (W′ιe′2 − I)De1 = −De1.

Now we can evaluate and simplify the expectation of the term in square brackets in (41) and obtain

σ²R⁻¹ + σ⁴[2 − tr(J′J) − 2tr(RW′JW) − 2tr(RW′JJW) + tr(RW′JJ′W)
+ tr(RW′JWRW′JW) + tr(RW′JW)tr(RW′JW)]De1e′1D
+ σ⁴(e′1DRDe1)W′(JJ′ − JJ − J′J′)W
+ σ⁴W′(JJ′ − J′J − J′J′)WRDe1e′1D
+ σ⁴De1e′1DRW′(JJ′ − J′J − JJ)W
+ σ⁴[(e′1DRW′JWRDe1) + tr(RW′JW)(e′1DRDe1)]W′(J + J′)W
+ σ⁴(e′1DRDe1)(W′JWRW′JW + W′J′WRW′J′W).

To obtain the required result for (41) we should pre- and post-multiply the above by R, but first we may remove terms from it that are o(1). Finally, pre- and post-multiplying this expression by D yields the result of the theorem.
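The Gaussian fourth-moment result E(uu′Buu′) = σ⁴[tr(B)I + B + B′], used repeatedly above, can itself be checked by simulation; the following is a minimal illustrative sketch (not part of the original derivation), exploiting uu′Buu′ = (u′Bu)uu′:

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma, reps = 4, 1.5, 500_000
B = rng.standard_normal((T, T))          # arbitrary, non-symmetric B

acc = np.zeros((T, T))
for _ in range(reps):
    u = sigma * rng.standard_normal(T)
    acc += (u @ B @ u) * np.outer(u, u)  # u u' B u u' = (u'Bu) u u'
lhs = acc / reps
rhs = sigma**4 * (np.trace(B) * np.eye(T) + B + B.T)
print(np.abs(lhs - rhs).max())           # tends to 0 as reps grows
```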
