The Review of Economic Studies Ltd.
On the Rigour of Some Misspecification Tests for Modelling Dynamic Relationships
Author(s): Jan F. Kiviet
Source: The Review of Economic Studies, Vol. 53, No. 2 (Apr., 1986), pp. 241-261
Published by: The Review of Economic Studies Ltd.
Stable URL: http://www.jstor.org/stable/2297649
Accessed: 26/02/2009 17:07
Review of Economic Studies (1986) LIII, 241-261
0034-6527/86/00150241$02.00
© 1986 The Society for Economic Analysis Limited
On the Rigour of Some
Misspecification Tests for
Modelling Dynamic Relationships
JAN F. KIVIET
University of Amsterdam
For regression models alternative asymptotically equivalent
misspecification tests may lead to conflicting inference in small
samples. Effective misspecification tests should have correct
significance levels irrespective of the true parameters and any
redundant regressors in the model, and reasonable power against a
wide class of alternative specifications. A simulation study of
various tests for serial correlation and predictive failure in
models with lagged dependent variables finds many tests defective
in small samples. Only particular degrees of freedom adjustments to
the test statistics yield improved small sample behaviour.
1. INTRODUCTION
Increasing attention is nowadays paid to model selection and
specification procedures, especially in modelling dynamic
relationships. Numerous test statistics and diagnostic checks have
been suggested as tools in model selection strategies; many of
these recent developments are summarized in Harvey (1981). The
existence of alternative principles (such as likelihood-ratio, Wald
and Lagrange multiplier) for generating test statistics alone means
that more than one test statistic with desirable properties is
usually available for testing a particular null hypothesis against
a specific alternative hypothesis. These statistics often have the
same limiting distribution, but their small sample distributions
generally differ. Moreover, there is a variety of ways of modifying
these tests to generate new tests which retain the original
asymptotic properties but may have improved small sample
properties. The practitioner is thus faced with a proliferation of
tests for the same null and alternative hypotheses. Because these
alternative test statistics may have different power functions and
different true significance levels in small samples, they may cause
conflicting statistical inference and consequently confuse model
builders.
For the hypothesis of linear constraints on the coefficients in
the general linear model with spherical normal disturbances, Savin
(1976) and Breusch (1979) show that the asymptotically equivalent
Wald (W), likelihood-ratio (LR), and Lagrange multiplier (LM) tests
satisfy the systematic inequality $W \ge LR \ge LM$. Evans and Savin
(1982) show that in the classical normal linear regression model
the probability of conflict among these tests can be substantial.
By applying various small-sample correction factors they also
investigated how accurately these tests can approximate an exact
test (i.e. a test with correct significance level). As the usual F
test is exact in the case they investigated, there
is no need to apply a (possibly modified) asymptotic test. Hence
their results are only of practical interest if they represent
general characteristics of W, LR or LM tests and so suggest a
sensible way to modify these tests in more general cases such as
those involving nonlinearities or where for other reasons it is
difficult to derive an exact test.
In this paper we assess the inaccuracies involved in applying
asymptotic tests in small samples and examine the effectiveness of
various small sample correction factors in cases where no exact
test is available. Rather than consider specification tests of
coefficient restrictions, we investigate two types of
misspecification tests, viz. tests for serial correlation and tests
for predictive failure, in the linear regression model with lagged
dependent variables. (For a discussion of the distinction between
tests of specification and misspecification, see Mizon (1977b).)
Both these types of misspecification test have been used in
recent applied work to reduce the risk of accepting misspecified
models; see, inter alia, Davidson et al. (1978), Hendry and Mizon
(1978), Hendry (1980), Mizon and Hendry (1980), Davidson and Hendry
(1981) and Hendry and Richard (1982). In these and other applied
studies different tests for the same misspecification have led to
conflicting inferences. We use Monte Carlo methods to investigate
whether it is worthwhile computing several test statistics for the
same alternative or whether there is one particular (possibly
modified) test available which is a useful and reliable tool in
model selection.
In addition, overparameterisation, or reduction in the effective
sample size, can adversely affect the small sample behaviour of
test statistics, and so robustness to overparameterisation is
desirable. When the small sample distribution of a test statistic
is not known and an approximate sampling distribution based on
asymptotic theory is used, it is possible that the approximation
will deteriorate with reductions in the number of degrees of
freedom. For example, if in checking the adequacy of a general
model as part of a general to specific modelling exercise a
misspecification test is significant, then a further generalisation
of the already overparameterised model may result in an even more
striking rejection of the null of no misspecification. Indeed, it
was precisely this phenomenon encountered in using tests for serial
correlation and for predictive failure on a general model that led
us to investigate more carefully the small sample behaviour of such
tests. It is important to distinguish between evidence of
misspecification arising from the inadequacy of the model and
evidence of misspecification resulting from the assumed (usually
asymptotically valid) distribution of the test statistic providing
a poor approximation to the unknown small sample distribution.
Evidence of misspecification of the latter type is potentially
misleading, and so we explore the value of degrees of freedom
adjustments and other modifications in reducing this problem.
Since we wish to examine the effectiveness of misspecification
tests as tools in a complete model selection strategy, Section 2
discusses some crucial aspects of the successive stages of a
specification search. This leads us to formulate three
characteristics of an effective misspecification test. In Sections
3 and 4 we introduce the different test statistics and try to
establish systematic inequalities between the various tests for
serial correlation and predictive failure respectively. The Monte
Carlo design is described in Section 5. The detailed results,
including powers and the noting of cases in which type I error
probabilities are parameter invariant, are presented in Sections 6
and 7. We find that in general asymptotic chi-squared critical
values are reasonably accurate only when the test statistics are
adjusted by an Edgeworth-based correction factor used by Anderson
(1958, p. 208). Alternatively, when numerator and denominator of
the statistics are corrected for degrees of freedom the F critical
values are also reasonably accurate. Section 8 summarizes the
conclusions.
2. MODEL SELECTION IN AD-MODELS
We consider the linear regression model with predetermined
explanatory variables, paying special attention to lagged dependent
variables. Mizon (1977a) and Mizon and Hendry (1980) describe a
model selection strategy for the Autoregressive Distributed lag
(AD) dynamic model, denoted AD($m_0, m_1, \ldots, m_k$):
$$\alpha^{(0)}(L)\,y_t = c + \alpha^{(1)}(L)\,x_t^{(1)} + \cdots + \alpha^{(k)}(L)\,x_t^{(k)} + \varepsilon_t \tag{1}$$
where $\alpha^{(0)}(L)$ is a polynomial of order $m_0$ in the lag-operator $L$
associated with the dependent variable $y_t$, the lag-operator
polynomials $\alpha^{(1)}(L), \ldots, \alpha^{(k)}(L)$ associated with the $k$ exogenous
variables $x_t^{(1)}, \ldots, x_t^{(k)}$ have orders $m_1, \ldots, m_k$
respectively, $\varepsilon_t$ is a white-noise disturbance term, and $c$ is a
constant. The polynomial $\alpha^{(0)}(L)$ has all its roots outside the unit
circle and is normalized so that it includes $m_0$ coefficients; the
total number of regressors in (1) is $K = k + 1 + \sum_{j=0}^{k} m_j$. If the
data generation process (DGP) is (1) and no extra information is
available to restrict these K coefficients, then consistent and
asymptotically efficient estimates are obtained by ordinary least
squares.
However, in practice the DGP is unknown, so that in a
specification search misspecified models will be estimated and
tested. These might, for example, omit relevant explanatory
variables, choose lag polynomials of too low order, or use
inappropriately transformed variables. Such misspecifications will
in general lead to inconsistent parameter estimates and incorrect
assessment of the estimates' sampling properties. A general to
specific modelling strategy (see Mizon (1977a, b) and Mizon and
Hendry (1980)) aims to lessen the risk of inconsistency by starting
from a quite general model determined by relevant economic theory,
the available data and computing limitations, and then testing for
acceptable simplifications of it. Hence, initial
overparameterisation is a deliberate element of the strategy.
Conditional on the general model asymptotically valid t- and
F-specification tests can be used to find a parsimonious model. The
adequacy of the general model itself must also be tested and this
is achieved using misspecification tests, though clearly these
tests will also be used at subsequent stages of the modelling
process to check the adequacy of simplified models. If a
misspecification test statistic is significant, the model
specification has to be reconsidered. In the context of AD-models,
the order of the polynomials will often be increased or explanatory
variables will be replaced or added. These model respecifications
need not correspond directly to the alternative hypotheses for
which the particular misspecification tests have high power. For
example, a significant value of a statistic testing for serial
independence need not imply that the model should be augmented by a
serially correlated error process; nor should predictive failure
lead the investigator automatically to specify a model with shift
dummy variables.
In this paper we investigate characteristics of separate tests
only, and ignore problems of applying several tests sequentially to
different specifications using the same data. We assert that the
effectiveness of a misspecification test in any modelling strategy
(particularly in the general to specific strategy mentioned above)
depends on the following three criteria. First, the test should
have an actual significance level (size) close to the nominal
level, and so ceteris paribus tests with known size should be
preferred to tests with only asymptotic validity. When the model
builder is unaware of a substantial difference between the actual
and nominal size of the test, the decisions taken may be
inappropriate. Too low an actual size (relative to the nominal
size) favours the initial specification, leaving the model builder
less critical than he thinks. Too high an actual size implies too
frequent rejection of an adequate specification. Secondly, the size
of the test should
be robust to possible overparameterisation (i.e. including
redundant lagged regressors) to help avoid the problems associated
with overparameterisation mentioned above. Thirdly, the
effectiveness of a misspecification test depends on its power to
reject misspecified models other than the one against which the
test is optimal.
In our simulation study we investigate, with these three criteria
in mind, the small sample behaviour of alternative forms of serial
correlation and predictive failure misspecification tests. We
examine the rejection frequencies of these tests for evidence on
their size, on their robustness to changes in the degree of
overparameterisation, and on their "power" against a range of
alternative hypotheses. We then assess which test statistics can be
recommended for practical model selection.
3. TESTS FOR SERIAL CORRELATION
Three types of test for the serial independence of the
disturbances in dynamic models are often used: tests based upon Box
and Pierce (1970)'s time series portmanteau lack-of-fit test, tests
suggested in Durbin (1970), and tests based on the Lagrange
multiplier principle presented in Breusch (1978) and Godfrey
(1978). Serial correlation tests based on the likelihood-ratio and
Wald principles are computationally less attractive because they
require estimation of the model under the (nonlinear) alternative
hypothesis. Here we review over a dozen particular versions of
these tests (listed in Table I), and we investigate thoroughly
their effectiveness in small samples. In addition, we also consider
the Durbin-Watson test statistic, which is often reported for
models containing lagged dependent variables, despite its
well-known inadequacy in dynamic models.
The Lagrange multiplier test statistic takes the same form for
both AR (n) and MA (n) alternatives, where n is the order of the
process. We denote by LM one particular version of a number of
asymptotically equivalent expressions for this test statistic:
$$\mathrm{LM} = T \cdot e'E\,[E'E - E'X(X'X)^{-1}X'E]^{-1}E'e \,/\, e'e, \tag{2}$$
TABLE I
List of tests for serial correlation of order n, their
null-distribution, some characteristics and references

Test        Formula    Null             Characteristics                   References
statistic   in text    distribution
LM          (2), (3)   $\chi^2_n$       can be expressed as $T \cdot R^2$ Breusch (1978, (31)); Godfrey (1978, (16))
LMC         (4)        $\chi^2_n$
LMF         (5)        $F_{n,T-K-n}$                                      Durbin (1970, p. 420); Harvey (1981, p. 227)
LMR         (6)        $\chi^2_n$       LMR > LMC
LMW         (7)        $\chi^2_n$       LMW $\ge$ LM; LMW > LMF
LMP         (8)        $\chi^2_n$       can be negative
LMN         (9)        $\chi^2_n$       can be negative
LML         (10)       $\chi^2_n$       can be negative                   Breusch (1978, (23))
LMD         (11)       $\chi^2_n$       can be negative                   Durbin (1970, (11))
LM*                    $\chi^2_n$       LM* < LM
LMW*                   $\chi^2_n$       LMF < LMW* < LMW
LMP*                   $\chi^2_n$       LMP* [...] BP                     Ljung and Box (1978)
where $T$ is the number of sample observations, $e' = (e_1, \ldots, e_T)$
is the $T$-element vector of residuals from the OLS regression of
(1), and $X$ is the $T \times K$ matrix of all the model's explanatory
variables. The $T \times n$ matrix $E = [e^1, \ldots, e^n]$ is constructed from
the $T$-element vectors $(e^i)' = (0, \ldots, 0, e_1, \ldots, e_{T-i})$ which
include the residuals lagged $i$ periods. Under the null hypothesis
of white-noise disturbances $\varepsilon$, LM is asymptotically $\chi^2_n$ distributed,
even if $X$ includes lagged dependent or redundant regressors.
We can show (see Breusch (1978), p. 354) that LM can be
expressed as $T$ times the coefficient of determination $R^2$ in the
auxiliary regression of $e$ on the $T \times (K + n)$ matrix $[X : E]$:
$$\mathrm{LM} = T \cdot (RSS[X] - RSS[X : E]) \,/\, RSS[X], \tag{3}$$
where $RSS[X : E]$ is the sum of squared residuals from this
artificial regression, and $RSS[X] = e'e$. So LM can be considered
as a specification test applied to this artificial regression. We
also examine the performance of the Edgeworth corrected test
statistic discussed by Anderson and used by Evans and Savin (1982, p. 742)
$$\mathrm{LMC} = \left(T - K - n - 1 + \tfrac{n}{2}\right) \ln\left(RSS[X] / RSS[X : E]\right), \tag{4}$$
and the usual F test
$$\mathrm{LMF} = \frac{T - K - n}{n} \cdot \frac{RSS[X] - RSS[X : E]}{RSS[X : E]}. \tag{5}$$
Crude versions (incorporating no small sample corrections) of
LMC and LMF respectively are the likelihood-ratio statistic
$$\mathrm{LMR} = T \cdot \ln\left(RSS[X] / RSS[X : E]\right), \tag{6}$$
and the Wald-type statistic
$$\mathrm{LMW} = T \cdot (RSS[X] - RSS[X : E]) \,/\, RSS[X : E]. \tag{7}$$
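As an illustrative sketch (not the author's code), the auxiliary-regression statistics (3) to (7) might be computed as below. The function names `lagged_residual_matrix`, `rss` and `lm_family`, the simulated data and the coefficient values are assumptions introduced here for illustration; in particular the LMC multiplier is taken to be $T - K - n - 1 + n/2$, which is an assumed reading of the Anderson-type correction in (4).

```python
import numpy as np

def lagged_residual_matrix(e, n):
    """T x n matrix E whose i-th column holds e lagged i periods (zeros first)."""
    T = len(e)
    E = np.zeros((T, n))
    for i in range(1, n + 1):
        E[i:, i - 1] = e[:T - i]
    return E

def rss(y, Z):
    """Residual sum of squares from an OLS regression of y on Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return float(resid @ resid)

def lm_family(y, X, n):
    """LM-type serial correlation statistics from the auxiliary regression on [X : E]."""
    T, K = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ coef                      # OLS residuals of the model under test
    E = lagged_residual_matrix(e, n)
    rss_x = float(e @ e)                  # RSS[X] = e'e
    rss_xe = rss(e, np.hstack([X, E]))    # RSS[X : E]
    log_ratio = np.log(rss_x / rss_xe)
    return {
        "LM": T * (rss_x - rss_xe) / rss_x,                   # (3)
        "LMC": (T - K - n - 1 + n / 2) * log_ratio,           # (4), assumed n/2 term
        "LMF": (T - K - n) / n * (rss_x - rss_xe) / rss_xe,   # (5)
        "LMR": T * log_ratio,                                 # (6)
        "LMW": T * (rss_x - rss_xe) / rss_xe,                 # (7)
    }

# Hypothetical AD(1, 1) data, used only to exercise the functions.
rng = np.random.default_rng(0)
T = 40
x = rng.standard_normal(T + 1)
y_full = np.zeros(T + 1)
for t in range(1, T + 1):
    y_full[t] = 0.5 * y_full[t - 1] + 0.3 * x[t] + 0.2 * x[t - 1] + rng.standard_normal()
y = y_full[1:]
X = np.column_stack([np.ones(T), y_full[:-1], x[1:], x[:-1]])
stats = lm_family(y, X, n=4)
```

Because $1 - 1/z \le \ln z \le z - 1$ for $z = RSS[X]/RSS[X : E] \ge 1$, the values returned always satisfy LMW $\ge$ LMR $\ge$ LM, whatever the data.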
We also examine the conjecture (see Breusch and Godfrey (1981,
p. 74) and Godfrey (1978, p. 1300)) that the finite sample
performance of tests may be improved by omitting asymptotically
negligible terms. Since under $H_0$, $\operatorname{plim}\,(1/T)E'E = \sigma^2 I_n$ and
$\operatorname{plim}\,E'E/e'e = I_n$, a test statistic asymptotically equivalent to LM
is
$$\mathrm{LMP} = T \cdot \frac{e'E}{e'e}\left[I_n - \frac{E'X(X'X)^{-1}X'E}{e'e}\right]^{-1}\frac{E'e}{e'e}. \tag{8}$$
Omitting the asymptotically negligible covariance between lagged
disturbances and exogenous regressors in the LMP statistic
yields
$$\mathrm{LMN} = T \cdot \frac{e'E}{e'e}\left[I_n - \frac{E'[X_0 \; 0](X'X)^{-1}[X_0 \; 0]'E}{e'e}\right]^{-1}\frac{E'e}{e'e}, \tag{9}$$
where the $T \times K$ matrix $[X_0 \; 0]$ contains the $T \times m_0$ matrix $X_0$ of
lagged dependent regressors. Yet another asymptotically equivalent
test is obtained (see Breusch 1978, formula (23)) by replacing the
matrix $E'X_0/e'e$ in LMN by a consistent estimate $Q_0$ of $\operatorname{plim}\,E'X_0/e'e$
under $H_0$. Then we obtain
$$\mathrm{LML} = T \cdot \frac{e'E}{e'e}\left[I_n - e'e \cdot Q(X'X)^{-1}Q'\right]^{-1}\frac{E'e}{e'e}, \tag{10}$$
where $Q = [Q_0 \; 0]$ is an $n \times K$ matrix. The LML statistic is very
closely related to Durbin (1970)'s test against AR($n$) disturbances
which can be written as
$$\mathrm{LMD} = T \cdot e'E(E'E)^{-1}\left[I_n - e'e \cdot Q(X'X)^{-1}Q'\right]^{-1}(E'E)^{-1}E'e. \tag{11}$$
Since $\operatorname{plim}\,e'e\,[E'E]^{-1} = I_n$ under $H_0$, LMD is asymptotically
equivalent to all test statistics mentioned earlier. When $n = 1$,
$e'e \ge (e^1)'e^1$, so we have LML $\le$ LMD.
Note that the LMP, LMN, LML and LMD statistics, although
asymptotically $\chi^2_n$ distributed under $H_0$, can all be negative in small
samples, because the matrices in square brackets are not
necessarily positive definite. However, as the $\chi^2_n$ distribution only
approximates the unknown exact distribution, negative values could
be interpreted as insignificant values, but the small sample
properties of such a testing procedure would need to be
investigated. In fact, for the n = 1 case (where the square root of
LMD is well-known as Durbin's h statistic) simulations in Hendry
and Trivedi (1972) and Spencer (1975) found both rather frequent
negative values and a substantial difference between the actual and
nominal sizes of the h-test. This suggests that in small samples
the distribution of LMD may differ substantially from its
asymptotic distribution.
We investigate the impact of multiplying each of the statistics
LM, LMW, LMP, LMN, LML and LMD by $(T - K)/T$. These degrees of
freedom corrected versions, where $\sigma^2$ in the denominator is
estimated by $e'e/(T - K)$ rather than $e'e/T$, are superscripted *
(see Table I). From the inequalities
T-K T-K LM*= KLM
are the statistic
$$\mathrm{BP} = T \cdot r'r = T \cdot \frac{e'E}{e'e} \cdot \frac{E'e}{e'e} \tag{15}$$
(see Box and Pierce (1970) and Pierce (1971)) and the
modification suggested in Ljung and Box (1978)
$$\mathrm{LBP} = T \sum_{i=1}^{n} \frac{T + 2}{T - i}\, r_i^2 > \mathrm{BP}, \tag{16}$$
where $r' = (r_1, \ldots, r_n) = e'E/e'e$ is an $n$-element vector of
autocorrelation coefficients of the residual vector, with $r_i =
e'e^i/e'e$. BP can be obtained by neglecting the second term within
the square brackets of the formulae of LMP, LMN or LML. Only when
$m_0 = 0$ (i.e. (1) contains no lagged dependent variables) does BP =
LMN = LML hold; if $m_0 > 0$ then BP is not asymptotically
equivalent to any of the Lagrange multiplier type test statistics.
Following Pierce (1971), BP is usually treated as a $\chi^2_{n - \tilde m_0}$ statistic
in an autoregressive model, where $\tilde m_0$ is the number of lagged
dependent variables. Note that $\tilde m_0 \le m_0$ with $\tilde m_0 < m_0$ only if
$\alpha^{(0)}(L)$ in (1) contains zero coefficients; obviously this test
can only be applied when $n > \tilde m_0$. Breusch and Pagan (1980, p.
245) argue that the BP test is inappropriate in models containing
both exogenous and lagged dependent variables. However, we will
compare its small sample performance with that of the other tests
for serial independence discussed above.
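The two portmanteau statistics can be sketched in a few lines. This is an illustration only: the function name `portmanteau` and the artificial residual vector are assumptions, and the LBP weights $(T+2)/(T-i)$ are those of the standard Ljung-Box statistic, which is taken here to be what (16) denotes.

```python
import numpy as np

def portmanteau(e, n):
    """Residual autocorrelations r_1..r_n with the BP and LBP statistics."""
    e = np.asarray(e, dtype=float)
    T = len(e)
    denom = float(e @ e)
    # r_i = e'e^i / e'e, where e^i is e lagged i periods with zeros at the start.
    r = np.array([float(e[i:] @ e[:T - i]) / denom for i in range(1, n + 1)])
    lags = np.arange(1, n + 1)
    bp = T * float(r @ r)                                   # (15)
    lbp = T * float(np.sum((T + 2) / (T - lags) * r ** 2))  # (16), standard Ljung-Box weights
    return r, bp, lbp

# Hypothetical residuals; in practice e would be the OLS residuals of model (1).
rng = np.random.default_rng(2)
e = rng.standard_normal(40)
e -= e.mean()
r, bp, lbp = portmanteau(e, n=4)
```

Since every weight $(T+2)/(T-i)$ exceeds one, LBP > BP holds for any residual vector with at least one non-zero autocorrelation.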
4. POST-SAMPLE PREDICTIVE TESTS
The tests used in a specification search are all employed on the
same set of data. So although it may be possible to determine
overall significance levels of sequences of such tests (see inter
alia Mizon (1977a) and Pagan and Hall (1983)), the perils of data
mining are far from imaginary. Thus it is wise to check a model on
a fresh set of data (typically the most recent time series data) by
a post-sample prediction test. Prediction tests check whether the
model specification derived from the sample data also fits the
post-sample data relatively well. However, using this additional
data at every stage of such a search means that the test statistic
is being used as a model selection criterion. Note that post-sample
predictive failure can occur for two reasons: the model may be
inadequate within sample, or it may be misspecified for only the
post-sample period. Table II lists the tests investigated.
Chow (1960)'s test in the classical linear regression model
is
$$\mathrm{LRF} = \frac{T - K}{m} \cdot \frac{RSS_{T+m} - RSS_T}{RSS_T} \tag{17}$$
TABLE II
List of m period post-sample prediction tests, their
null-distributions, some characteristics and references

Test        Formula            Null            Characteristics     References
statistic   in text            distribution
LRF         (17), (20), (21)   $F_{m,T-K}$                         Chow (1960)
LR          (18)               $\chi^2_m$
LRC         (19)               $\chi^2_m$      LRC < LR
PR          (22)               $\chi^2_m$
PR*         (23)               $\chi^2_m$      PR > PR* > LRF      Hendry (1980, p. 222)
where $RSS_{T+m}$ and $RSS_T = e'e$ are the sums of squared residuals of
the model estimated from $T + m$ and $T$ observations respectively. If
all the regressors are exogenous, LRF has an $F_{m,T-K}$ distribution
under the null hypothesis $H_0$ of constant parameters over the entire
sample. Note that this test can be applied for any positive value
of $m$, and so should not be considered as merely a substitute for
the usual test of structural change between two sub-samples with
$m < K$ (see Wilson (1978)). It is often the case that the
prediction test is used without a particular constructive
alternative hypothesis in mind, i.e. it is used as a
misspecification test, whereas the structural change test is
usually employed with the hypothesis of coefficient non-constancy
(with constant error variance) as a constructive alternative.
Anderson and Mizon (1984) provide a recent discussion of these
tests.
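The alternative hypothesis for the LRF test can be represented
by
$$\begin{pmatrix} y \\ y_* \end{pmatrix} = \begin{bmatrix} X & 0 \\ X_* & Z \end{bmatrix} \begin{pmatrix} \beta \\ \gamma \end{pmatrix} + \begin{pmatrix} \varepsilon \\ \varepsilon_* \end{pmatrix} \quad \text{with} \quad \begin{pmatrix} \varepsilon \\ \varepsilon_* \end{pmatrix} \sim N(0, \sigma^2 I_{T+m})$$
where $y_*$ and $\varepsilon_*$ are $m$-element vectors, $X_*$ is the $m \times K$ matrix of
post-sample regressors, and $Z$ is an arbitrary $m \times m$ non-singular
matrix (possibly a matrix $I_m$ of $m$ dummy variables). The
straightforward F-test of the linear restrictions $\gamma = 0$ leads to
the LRF statistic in (17). Hence we see that this test, like the
serial correlation test, can also be viewed as a significance test
in a classical regression framework. Again we examine modifying the
likelihood-ratio test for $\gamma = 0$
$$\mathrm{LR} = (T + m) \ln\left(RSS_{T+m} / RSS_T\right) \tag{18}$$
by the Anderson correction factor, to give
$$\mathrm{LRC} = \left(T + \tfrac{m}{2} - K - 1\right) \ln\left(RSS_{T+m} / RSS_T\right) \tag{19}$$
Both LR and LRC are asymptotically distributed as $\chi^2_m$ under $H_0$.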
The corrected test has a smaller probability of type I error than
LR, but no such inequality exists between LR and LRF.
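Two alternative expressions for LRF are
$$\mathrm{LRF} = \frac{T - K}{m} \cdot \frac{(y_* - \hat y_*)'\,[I_m + X_*(X'X)^{-1}X_*']^{-1}\,(y_* - \hat y_*)}{RSS_T} \tag{20}$$
$$\phantom{\mathrm{LRF}} = \frac{T - K}{m} \cdot \frac{(y_* - \hat y_*)'\,[I_m - X_*(X'X + X_*'X_*)^{-1}X_*']\,(y_* - \hat y_*)}{RSS_T} \tag{21}$$
where $\hat y_* = X_*(X'X)^{-1}X'y$ is the predicted value of $y_*$. From (21)
we see that
$$\mathrm{PR} = T \cdot (y_* - \hat y_*)'(y_* - \hat y_*) / RSS_T = T \cdot PSS_m / RSS_T \tag{22}$$
(where $PSS_m$ is the sum of $m$ squared prediction errors) is
asymptotically $\chi^2_m$ distributed under $H_0$, and is asymptotically
equivalent to $m \cdot \mathrm{LRF}$, since for finite $m$ and uniformly bounded
regressors, so that as $T \to \infty$ $\operatorname{plim}\,(1/T)X_*'X_* = 0$ and
$\operatorname{plim}\,X_*'\varepsilon_*/\sqrt{T} = 0$, we have:
$$\operatorname{plim}\,(y_* - \hat y_*)'X_*(X'X + X_*'X_*)^{-1}X_*'(y_* - \hat y_*) = 0.$$
Davidson et al. (1978) and Hendry (1980) use a degrees of freedom
corrected version of PR
$$\mathrm{PR}^* = (T - K) \cdot PSS_m / RSS_T. \tag{23}$$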
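A minimal numerical sketch of the prediction statistics follows; it is not taken from the paper. The function name `predictive_tests`, the simulated static regression and its coefficients are hypothetical, and the LRC multiplier $T + m/2 - K - 1$ is an assumed reading of (19). The code computes LRF both from the two residual sums of squares as in (17) and from the weighted prediction errors as in (20), so that their agreement can be checked on any data set.

```python
import numpy as np

def predictive_tests(y, X, y_star, X_star):
    """Post-sample prediction statistics of equations (17) to (23)."""
    T, K = X.shape
    m = len(y_star)

    # Estimation-sample fit and RSS_T = e'e.
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    rss_T = float(e @ e)

    # Fit on all T + m observations, giving RSS_{T+m}.
    X_all = np.vstack([X, X_star])
    y_all = np.concatenate([y, y_star])
    b_all, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)
    e_all = y_all - X_all @ b_all
    rss_Tm = float(e_all @ e_all)

    # Prediction errors y* - yhat* and their sum of squares PSS_m.
    f = y_star - X_star @ b
    pss = float(f @ f)

    # Expression (20): weight the errors by [I_m + X*(X'X)^{-1}X*']^{-1}.
    V = np.eye(m) + X_star @ np.linalg.solve(X.T @ X, X_star.T)
    lrf_20 = (T - K) / m * float(f @ np.linalg.solve(V, f)) / rss_T

    log_ratio = np.log(rss_Tm / rss_T)
    return {
        "LRF": (T - K) / m * (rss_Tm - rss_T) / rss_T,  # (17)
        "LRF_20": lrf_20,                                # (20)
        "LR": (T + m) * log_ratio,                       # (18)
        "LRC": (T + m / 2 - K - 1) * log_ratio,          # (19), assumed m/2 term
        "PR": T * pss / rss_T,                           # (22)
        "PR*": (T - K) * pss / rss_T,                    # (23)
    }

# Hypothetical data with exogenous regressors only.
rng = np.random.default_rng(1)
T, m, K = 40, 8, 3
X_full = np.column_stack([np.ones(T + m), rng.standard_normal((T + m, K - 1))])
y_full = X_full @ np.array([1.0, 0.5, -0.3]) + rng.standard_normal(T + m)
pred = predictive_tests(y_full[:T], X_full[:T], y_full[T:], X_full[T:])
```

On any data set `pred["PR"]` exceeds `pred["PR*"]`, which in turn exceeds `m * pred["LRF"]`, so the ordering of rejection frequencies does not depend on the model being correct.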
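From (21), (22) and (23) it is clear that these statistics are
concerned with predictive failure, and that PR > PR* > $m \cdot$ LRF;
with (13) this implies the inequalities:
$$P\{\mathrm{PR} > \chi^2_m(\alpha)\} > P\{\mathrm{PR}^* > \chi^2_m(\alpha)\} > P\{m \cdot \mathrm{LRF} > \chi^2_m(\alpha)\} = P\{\mathrm{LRF} > \tfrac{1}{m}\chi^2_m(\alpha)\} > P\{\mathrm{LRF} > F_{m,T-K}(\alpha)\}. \tag{24}$$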
So PR has a higher rejection frequency than PR*, which in turn
is higher than that for LRF, irrespective of the correctness of the
model. Again we see that alternative asymptotically equivalent
tests may in small samples yield systematically conflicting
inference for a chosen nominal size. Such conflict emerges at the
5% level in Hendry (1980, equation (11)), Mizon and Hendry (1980,
Table II) and in Davidson and Hendry (1981, equation (13) and
Tables 4b and 4c).
5. THE MONTE CARLO DESIGN
In the Monte Carlo experiments the actual data generation
process was the AD(1, 1) model
$$y_t = \gamma y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t \quad \text{with } |\gamma| < 1 \text{ and } \varepsilon_t \sim IIN(0, \sigma^2). \tag{25}$$
We calculated the misspecification tests of interest
from four estimated regression equations (each of which included a
constant term): the correct AD(1, 1) model; the overparameterised
AD(2,2) specification; the misspecified AD(0, 0) model; and the
AD(1) model. For the first two regressions, the empirical rejection
frequencies estimate the tests' true significance levels and are
used to assess sensitivity to redundant regressors. For the two
remaining regressions, the rejection frequencies estimate the power
of the tests against two particular types of dynamic
misspecification. Note that the AD(0, 0) model excludes any
dynamics; the AD(1) model specifies a parsimonious univariate AR(1)
time-series process. These experiments do not cover all conceivable
circumstances, but in our view tests that fail in these simple
cases should no longer be used in models containing lagged
dependent variables.
For (25) two alternative processes to generate $x_t$ were
investigated. With
$$x_t = \lambda x_{t-1} + \mu + \xi_t, \tag{26}$$
where $\xi_t \sim IIN(0, \sigma_\xi^2)$ and $\{\varepsilon_t\}$, $\{\xi_t\}$
mutually independent, (i) $|\lambda| < 1$ and $\mu = 0$ generates a
stationary AR(1) process for $x_t$, while (ii) $\lambda = 1$ generates a
(nonstationary) random walk with drift parameter $\mu$. Appendix A
shows how to obtain starting values for the xt and Yt series in a
computationally efficient and statistically satisfactory way.
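The data generation process can be sketched as a simple recursion. This is an assumption-laden illustration: the function name `simulate_ad11`, the default innovation standard deviations and, above all, the burn-in period used in place of the starting-value scheme of Appendix A (which is not reproduced in this section) are choices made here and not the author's. For the random walk case a burn-in merely shifts the starting level, so a faithful replication would need the Appendix A procedure instead.

```python
import numpy as np

def simulate_ad11(gamma, beta0, beta1, lam, mu, n_obs, rng,
                  sigma_eps=1.0, sigma_x=1.0, burn_in=200):
    """Simulate y_t = gamma*y_{t-1} + beta0*x_t + beta1*x_{t-1} + eps_t
    with x_t = lam*x_{t-1} + mu + xi_t, discarding `burn_in` initial values.

    sigma_eps and sigma_x are the innovation standard deviations of the
    y and x equations (hypothetical defaults, not the paper's settings).
    """
    total = n_obs + burn_in
    xi = sigma_x * rng.standard_normal(total)
    eps = sigma_eps * rng.standard_normal(total)
    x = np.zeros(total)
    y = np.zeros(total)
    for t in range(1, total):
        x[t] = lam * x[t - 1] + mu + xi[t]
        y[t] = gamma * y[t - 1] + beta0 * x[t] + beta1 * x[t - 1] + eps[t]
    return y[burn_in:], x[burn_in:]

# (i) stationary regressor: |lam| < 1 and mu = 0
y_s, x_s = simulate_ad11(0.5, 0.2, 0.3, lam=0.8, mu=0.0, n_obs=100,
                         rng=np.random.default_rng(0))
# (ii) random walk with drift: lam = 1
y_n, x_n = simulate_ad11(0.5, 0.2, 0.3, lam=1.0, mu=0.02, n_obs=100,
                         rng=np.random.default_rng(0))
```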
The Monte Carlo design fixes the values of 12 parameters:
$\{\gamma, \beta_0, \beta_1, \lambda, \mu, x_{-2}, \sigma^2, \sigma_\xi^2, T, m, n, \alpha\}$.
We chose the sample size $T$, the post-sample size $m$, the order of
serial correlation $n$, and the nominal significance level $\alpha$ by
the grid $T \in \{20, 40, 80\}$, $m \in \{4, 8, 20\}$, $n \in \{1, 4, 8\}$,
$\alpha \in \{0.01, 0.05, 0.10\}$. The values of the coefficients $\gamma$, $\beta_0$ and $\beta_1$
determine the systematic dynamics of the model and we selected
values we believe are relevant for econometric practice. We first
chose the true coefficients of the lagged dependent variable as
$\gamma \in \{0.5, 0.9\}$. Given $\gamma$, we then chose $\beta_0$ and $\beta_1$ to obtain values of
the total multiplier (the long-run effect of a unit increase in $x$
on $y$) as
$$TMP = (\beta_0 + \beta_1)/(1 - \gamma) \in \{0.2, 1.0, 5.0\},$$
and values of the immediate standardized impact multiplier (the
proportion of the total multiplier realised instantaneously) as
$$SIM_0 = \beta_0 / TMP \in \{0.2, 0.5, 0.8\}.$$
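The mapping from the design quantities to the coefficients can be made explicit by inverting the two definitions: $\beta_0 = SIM_0 \cdot TMP$ and $\beta_1 = TMP\,(1 - \gamma) - \beta_0$. The short sketch below enumerates the resulting combinations; the function name `coefficient_grid` and the dictionary keys are hypothetical, and the values tabulated in Appendix B remain the authoritative ones.

```python
from itertools import product

def coefficient_grid(gammas=(0.5, 0.9), tmps=(0.2, 1.0, 5.0), sims=(0.2, 0.5, 0.8)):
    """Enumerate (gamma, beta0, beta1) implied by each (gamma, TMP, SIM0) triple."""
    grid = []
    for gamma, tmp, sim0 in product(gammas, tmps, sims):
        beta0 = sim0 * tmp                   # from SIM0 = beta0 / TMP
        beta1 = tmp * (1.0 - gamma) - beta0  # from TMP = (beta0 + beta1) / (1 - gamma)
        grid.append({"gamma": gamma, "TMP": tmp, "SIM0": sim0,
                     "beta0": beta0, "beta1": beta1})
    return grid

grid = coefficient_grid()
```

Note that $\beta_1$ is negative whenever $SIM_0 > 1 - \gamma$, for example in every combination with $\gamma = 0.9$.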
The sets for y, TMP and SIMo define a grid of 18 combinations of
coefficient values, which are detailed fully in Appendix B. The
values of the parameters in the generating process (26) for the
explanatory variable were fixed as follows. In the stationary case
we set $\lambda = 0.8$; in the non-stationary case we fixed $\mu = 0.02$ and
the starting value x-2 = 1. In all cases we fixed o-2 = 1, and
after some initial experimentation chose o-2 = 3, to give
reasonable values for the coefficient of determination R2 in the
AD(1, 1) specification (see Appendix B).
What number of replications $M$ would be adequate for the present
purpose? The probability $p$ that a test statistic leads to rejection
of the null hypothesis is estimated by the corresponding rejection
frequency $\hat p$ in $M$ replications of each Monte Carlo experiment. The
variance of $\hat p$ will be $p(1 - p)/M$, and a 95% confidence interval for
$p$ will be approximately (for large $M$)
$$\left[\hat p - 2\sqrt{\hat p(1 - \hat p)/M},\; \hat p + 2\sqrt{\hat p(1 - \hat p)/M}\right]. \tag{27}$$
This implies that $M$ should be around 10,000 (which is prohibitive) if we
want to estimate any $p \in [0, 1]$ by a 95% confidence interval no wider
than $\pm 0.01$. Our results are obtained from merely 500
replications. Because this choice of $M$ implies a relatively large
confidence interval for $p \approx 0.01$, we will not be able to analyse
satisfactorily the differences between actual and nominal sizes at
$\alpha = 0.01$ and so report no details for this case. However, our
results support the conjecture that any considerable difference
between a test statistic's small sample and asymptotic
distributions usually affects the entire right- hand tail area in
the same direction, so it would be rash to suppose that a test
statistic that performs badly at the 5% and 10% levels might
perform better at the 1% level.
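The arithmetic behind these statements is easy to verify. In the sketch below the function names `ci_half_width` and `replications_needed` are hypothetical; the factor 2 is the one used in (27).

```python
import math

def ci_half_width(p, M):
    """Half-width 2*sqrt(p(1-p)/M) of the approximate 95% interval (27)."""
    return 2.0 * math.sqrt(p * (1.0 - p) / M)

def replications_needed(half_width, p=0.5):
    """Replications M for which the interval is no wider than +/- half_width.

    p = 0.5 is the worst case, since p(1-p) is largest there.
    """
    return math.ceil(4.0 * p * (1.0 - p) / half_width ** 2)

worst_case_M = replications_needed(0.01)   # about 10,000
width_5pct = ci_half_width(0.05, 500)      # about 0.0195
width_1pct = ci_half_width(0.01, 500)      # about 0.0089
```

With $M = 500$ the half-width at $p = 0.01$ is nearly as large as $p$ itself, which is why no results are reported at the 1% level.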
Ideally, the observed significance level of the tests examined
in the next two sections should be close to the nominal level,
regardless of the values of the parameters; failure at any
parameter set is sufficient to disqualify a test for practical
purposes, as we wish to use it when the parameters are either
unknown (i.e. coefficients) or uncontrolled (e.g. the regressors
and sample size). Our results for type I errors are reported using
simple summary statistics (minimum, mean and maximum) over all 18
coefficient combinations of either the stationary or nonstationary
$x_t$ series, or over all 36 different DGP's considered. The precision
determined by the confidence interval (27) leads us to suggest that
a test fails the criterion of correct size if the estimated
significance level exceeds 0.09 or is below 0.02 at the 5% nominal
level, or if it exceeds 0.15 or is below 0.06 at the 10% nominal
level. Of course, power values deserve a more detailed presentation
since they will vary over the different parameter sets.
Finally, note that different series of random numbers $\{\varepsilon_t\}$, $\{\xi_t\}$
were generated for each of the 36 different DGP's defined by the 18
coefficient combinations and $x_t$ either stationary or
non-stationary. In each replication of a DGP one hundred
observations on the relevant explanatory and dependent variables
have been generated, and these data were used in all the
experiments for the various different values of T, n, m and a.
6. RESULTS FOR TESTS FOR SERIAL CORRELATION
When the model with the correct AD(1, 1) specification is
estimated, the test rejection frequencies estimate the actual
significance level at a given critical value of the asymptotic
distribution. We first discuss the results for the inadequate but
popular Durbin-Watson and Box-Pierce type tests. Note that in all
the Tables rejection frequencies are expressed as a proportion of
the 500 replications.
TABLE III
Rejection frequencies of the Durbin-Watson bounds- and
"asymptotic" tests at the nominal 5% level in the correctly
specified AD(1, 1) model over the 36 combinations in the Monte
Carlo experiments

              $d_L$                   $d_U$                   "asymptotic" d
Sample size   min    mean   max       min    mean   max       min    mean   max
T = 20        0.000  0.002  0.008     0.104  0.201  0.322     0.004  0.024  0.046
T = 40        0.000  0.007  0.018     0.046  0.110  0.204     0.006  0.028  0.058
T = 80        0.000  0.013  0.028     0.018  0.074  0.124     0.000  0.033  0.062
Table III presents rejection frequencies, summarised by their minimum,
mean and maximum over all 36 combinations of coefficients and
exogenous variables, for the Durbin-Watson test statistic DW at the
tabulated 5% critical lower and upper bounds $d_L$ and $d_U$. We see
that the actual significance level is always very low if $d_L$ is
used, but becomes very irregular and generally too high if the
inconclusive region is added to the critical region by using $d_U$ as
the critical value. If we avoid the inconvenience of the bounding
critical values and use $T^{1/2}(1 - DW/2)$ as an approximately
asymptotically normal test for serial correlation then we obtain
uncalibrated statistical inference: in general the actual
significance level is too low, but is sometimes fortuitously correct.
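As an illustration of the "asymptotic" variant, the statistic $T^{1/2}(1 - DW/2)$ can be computed directly from a vector of OLS residuals. This is a minimal sketch using the textbook definition of DW; the function names are hypothetical and the toy residual vector is made up purely to show the arithmetic.

```python
import math


def durbin_watson(residuals):
    """DW = sum_{t=2}^{T} (e_t - e_{t-1})^2 / sum_{t=1}^{T} e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def dw_asymptotic_statistic(residuals):
    """T^{1/2} (1 - DW/2), to be compared with standard normal critical values."""
    T = len(residuals)
    return math.sqrt(T) * (1.0 - durbin_watson(residuals) / 2.0)


e = [1.0, -1.0, 1.0, -1.0]         # toy residuals that alternate in sign
print(durbin_watson(e))            # 3.0
print(dw_asymptotic_statistic(e))  # -1.0
```

Since $1 - DW/2$ is approximately the first-order residual autocorrelation, a positive value of the statistic points towards positive serial correlation and a negative value towards negative serial correlation.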
If we apply the DW test to the overparameterised AD(2, 2) model
(using $d_U$ and $d_L$ values for K = 6) the significance level is nil
for $d_L$, varies wildly with T (we found values from 0.00 to 0.56)
for $d_U$, and is very small for the asymptotic test. These results
confirm that in the model with lagged dependent variables the DW
statistic cannot give valid evidence about autocorrelated
disturbances; therefore it is best not calculated at all.
We investigated both BP and LBP variants of portmanteau residual
correlation tests, but as the AD(1, 1) model includes one lagged
dependent variable we have mo = imo = 1 and so the tests cannot be
performed for serial correlation of order n = 1. Therefore we
considered $n \in \{2, 4, 8\}$ for these two tests; the statistics are
then compared with critical values of the $\chi^2$ distribution with 1, 3
and 7 degrees of freedom respectively. As with the DW test there is
little difference between the stationary and the non-stationary
cases. Table IV shows rather high rejection frequencies (even with
T = 80) especially for low values of n. Because of this and the
wide gap between the minimum and maximum rejection frequencies over
the 18 different coefficient combinations, these tests cannot be
recommended. The unstable true significance levels-anything from
half to three times
TABLE IV
Rejection frequencies of the Box-Pierce and the Ljung-Box test in the
correctly specified AD(1, 1) model

              Stationary data, 5% level                Non-stationary data, 10% level
                   BP                 LBP                   BP                 LBP
 T    n     min   mean  max    min   mean  max       min   mean  max    min   mean  max
20    2     0.04  0.08  0.15   0.06  0.12  0.21      0.10  0.19  0.33   0.14  0.25  0.42
      4     0.03  0.04  0.07   0.06  0.10  0.14      0.05  0.10  0.14   0.12  0.18  0.27
      8     0.01  0.02  0.05   0.06  0.09  0.14      0.02  0.05  0.08   0.12  0.17  0.25
40    2     0.03  0.08  0.14   0.04  0.10  0.17      0.10  0.19  0.31   0.12  0.22  0.34
      4     0.02  0.05  0.09   0.04  0.07  0.12      0.08  0.12  0.19   0.10  0.17  0.25
      8     0.02  0.04  0.07   0.05  0.08  0.12      0.05  0.08  0.12   0.11  0.16  0.20
80    2     0.04  0.09  0.16   0.04  0.10  0.18      0.09  0.20  0.28   0.10  0.21  0.30
      4     0.03  0.06  0.10   0.04  0.07  0.11      0.08  0.13  0.20   0.10  0.16  0.23
      8     0.02  0.05  0.08   0.04  0.07  0.11      0.08  0.11  0.16   0.10  0.14  0.21
TABLE V
Rejection frequencies of some of the Lagrange multiplier type tests
for serial correlation at the nominal 5% level in the AD(1, 1) and
AD(2, 2) models, averaged over the 18 coefficient combinations
(stationary data, $\alpha = 0.05$)

                                 AD(1, 1)                                       AD(2, 2)
 T   n     LM   LM*  LMW*  LMP*  LMN*  LML*  LMD*   LMC   LMF      LM  LMP*  LMN*  LML*   LMC   LMF
20   1   0.08  0.05  0.08  0.04  0.05  0.07  0.08  0.05  0.05    0.12  0.03  0.06  0.10  0.06  0.06
     4   0.09  0.02  0.21  0.04  0.04  0.04  0.16  0.06  0.06    0.15  0.06  0.04  0.05  0.06  0.06
     8   0.04  0.00  0.50  0.05  0.03  0.02  0.33  0.06  0.04    0.13  0.05  0.03  0.03  0.07  0.05
40   1   0.06  0.05  0.06  0.04  0.05  0.05  0.06  0.05  0.05    0.07  0.03  0.06  0.10  0.05  0.05
     4   0.06  0.03  0.09  0.05  0.05  0.05  0.11  0.04  0.04    0.08  0.07  0.05  0.07  0.05  0.05
     8   0.04  0.02  0.17  0.05  0.04  0.04  0.18  0.04  0.04    0.07  0.05  0.04  0.05  0.04  0.04
80   1   0.05  0.05  0.05  0.05  0.05  0.05  0.05  0.05  0.05    0.06  0.03  0.06  0.10  0.05  0.05
     4   0.05  0.04  0.07  0.05  0.06  0.05  0.08  0.04  0.05    0.06  0.07  0.04  0.07  0.05  0.05
     8   0.04  0.03  0.09  0.05  0.05  0.05  0.12  0.04  0.04    0.05  0.06  0.04  0.06  0.04  0.04
the nominal level-preclude effective use of BP or LBP in a model
selection strategy. This conclusion is confirmed in the
overparameterised AD(2, 2) model and when the LBP test is used with
the $\chi^2_n$ (rather than $\chi^2_{n-m_0}$) critical value. In the AD(1) model with
n = 20 Davies, Triggs and Newbold (1977) find significance levels
for BP considerably less than those predicted by asymptotic theory,
especially for small T and $\gamma$. Our results indicate the opposite
finding for low values of n in models with exogenous
regressors.
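For reference, the two portmanteau statistics can be sketched as follows. The definitions used are the standard Box and Pierce (1970) and Ljung and Box (1978) forms; treating the residual autocorrelations without mean adjustment is an assumption made here for brevity, and the function names are hypothetical. The degrees of freedom follow the rule used above, n minus the number of lagged dependent variables $m_0$.

```python
def residual_autocorrelation(e, k):
    """r_k = sum_{t=k+1}^{T} e_t e_{t-k} / sum_{t=1}^{T} e_t^2 (no mean adjustment)."""
    return sum(e[t] * e[t - k] for t in range(k, len(e))) / sum(v * v for v in e)


def box_pierce(e, n):
    """BP = T * sum_{k=1}^{n} r_k^2."""
    T = len(e)
    return T * sum(residual_autocorrelation(e, k) ** 2 for k in range(1, n + 1))


def ljung_box(e, n):
    """LBP = T (T + 2) * sum_{k=1}^{n} r_k^2 / (T - k)."""
    T = len(e)
    return T * (T + 2) * sum(
        residual_autocorrelation(e, k) ** 2 / (T - k) for k in range(1, n + 1)
    )


def portmanteau_degrees_of_freedom(n, m0):
    """Chi-square degrees of freedom after allowing for m0 lagged dependent variables."""
    return n - m0


e = [1.0, -1.0, 1.0, -1.0]  # toy residuals, chosen only to show the arithmetic
print(box_pierce(e, 2))     # 3.25
print(ljung_box(e, 2))      # 7.5
print([portmanteau_degrees_of_freedom(n, 1) for n in (2, 4, 8)])  # [1, 3, 7]
```

The Ljung-Box weights $(T+2)/(T-k)$ all exceed one, so LBP is never smaller than BP on the same residuals, which is consistent with the uniformly higher LBP rejection frequencies in Table IV.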
We now look at the significance levels of the asymptotically
valid Lagrange multiplier type tests. Table V presents the average
rejection frequencies over the 18 parameter combinations for the
stationary data at the 5% level. The results for the non-stationary
data lead to the same conclusions and are not reproduced here. We
list results for only the degrees of freedom corrected tests and
the popular $T \cdot R^2$ version of LM. The rejection frequencies for the
latter test appear to increase in the presence of redundant
regressors, particularly for T = 20, while LM* fails in the
correctly specified model especially for large values of n relative
to T. Applying the inequalities in (12) to the results in Table V
implies that the LMW, LMP, LMN, LML and LMD tests are even less
attractive than their degrees of freedom corrected counterparts,
which themselves have generally unsatisfactory type I error
probabilities. LMW* and LMD* in particular have extremely high
rejection frequencies for small T and large n, while LMP*, LMN* and
LML* are adversely influenced by overparameterisation. This
provides evidence against the conjecture that omitting
asymptotically negligible terms gives better small sample
behaviour. We also examined the frequency in the simulations of
negative values for the tests (LMP, LMN, LML and LMD) that have
non-definite quadratic forms in the numerator of the statistic.
Negative values are particularly associated with specific
coefficient combinations; frequencies averaged over all
combinations ranged from 0.00 to 0.09 in the AD(1, 1) model, and
from 0.00 to 0.36 in the AD(2, 2) model.
It appears from Table V that the LMC and LMF statistics are the
only ones that satisfy the criteria of correct size and invariance
to overparameterisation. Table VI provides more detailed results
for these two tests. Taking into account the Monte Carlo sampling
variability approximated by (27) the results for LMF and LMC are
satisfying on the whole, especially for $T \geq 40$ (LMC appears a bit
vulnerable in the extreme case T = 20 and n = 8). All other tests
for serial correlation investigated here are found to be unfit in
some respect.
TABLE VI
Estimated type I error probabilities of the LMF and LMC test

              AD(2, 2) model                          AD(1, 1) model
              stationary data, 5% level               non-stationary data, 10% level
                   LMF                LMC                   LMF                LMC
 T    n     min   mean  max    min   mean  max       min   mean  max    min   mean  max
20    1     0.04  0.06  0.07   0.05  0.06  0.07      0.08  0.10  0.13   0.08  0.10  0.13
      4     0.04  0.06  0.08   0.04  0.06  0.09      0.08  0.11  0.15   0.08  0.12  0.14
      8     0.03  0.05  0.06   0.05  0.07  0.09      0.06  0.10  0.14   0.08  0.12  0.16
40    1     0.03  0.05  0.08   0.03  0.05  0.08      0.09  0.10  0.13   0.09  0.10  0.13
      4     0.04  0.05  0.06   0.04  0.05  0.06      0.08  0.11  0.14   0.08  0.11  0.15
      8     0.03  0.04  0.05   0.03  0.04  0.06      0.07  0.10  0.14   0.07  0.10  0.14
80    1     0.04  0.05  0.07   0.04  0.05  0.07      0.09  0.10  0.13   0.09  0.10  0.13
      4     0.03  0.05  0.06   0.03  0.05  0.06      0.05  0.10  0.14   0.05  0.10  0.14
      8     0.03  0.04  0.06   0.03  0.04  0.06      0.06  0.10  0.12   0.06  0.10  0.12
In the discussion of Breusch and Godfrey (1981, p. 108), Osborn
wonders how the test $LMW \cdot (T - K - n)/T$ compares with LM: she shows
there is no simple systematic inequality between them. As we see
from (14) this test has a significance level below that of LMW* but
exceeding that of LMF. From Table V, we conclude that this
particular degrees of freedom correction to LMW will not outperform
LMF.
Coefficient combination 5 in our simulation experiments (see
Appendix B for details) is very closely related to the pilot
simulation study in Davidson and Hendry (1981) who find acceptable
actual significance levels for the crude Lagrange multiplier test
LM. Our Monte Carlo study suggests that this results from their
large sample size (T = 74) relative to the number of coefficients
in a model with no redundant regressors.
We now examine the power of the tests with respect to
alternative hypotheses that do not correspond to the DGP (25). As
tests for serial correlation are often used as general diagnostic
checks to reveal any serious misspecifications, including the
omission of lagged (dependent) variables, it is important that
these tests should have power in AD models against misspecification
of the dynamic adjustment process. Table VII presents rejection
frequencies for the individual coefficient combinations for some
tests of the misspecified AD(0, 0) and AD(1) models. Applying the
DW test to the AD(0, 0) model (which excludes any lagged dependent
variables) gives quite attractive power figures even when $d_L$ is
used as the critical value. This test has a significance level
below the nominal level in an adequately specified model;
nevertheless its power here outperforms the LMF test. Note that in
this model $m_0 = 0$ so the BP, LMN and LML tests are equivalent. The
power of BP appears to exceed that of LMF, although this may be
overstated as BP may have too high a rejection frequency even when
the AD(0, 0) specification is correct. However, we know the LMF
test has a significance level very close to the nominal level
TABLE VII
Rejection frequencies of some particular tests for serial correlation
when applied to misspecified models at the nominal 5% level
(stationary data). DGP is AD(1, 1)

                                   AD(0, 0) model                            AD(1) model
                       $d_L$      BP (n = 4)   LMF (n = 1)  LMF (n = 4)     LMF (T = 40)
Combination TMP  MNL   T=20 T=40   T=20 T=40    T=20 T=40    T=20 T=40      n = 1  n = 4
     1      0.2  8     0.75 1.00   0.55 0.97    0.74 0.96    0.50 0.78      0.07   0.06
     2      1.0  8     0.78 1.00   0.52 0.98    0.75 0.98    0.53 0.83      0.09   0.08
     3      5.0  8     0.97 1.00   0.84 1.00    0.95 1.00    0.85 0.99      0.09   0.07
     4      0.2  5     0.74 1.00   0.49 0.98    0.71 0.97    0.49 0.73      0.06   0.04
     5      1.0  5     0.77 1.00   0.52 0.97    0.76 0.97    0.50 0.82      0.06   0.07
     6      5.0  5     0.93 1.00   0.75 1.00    0.93 1.00    0.77 0.98      0.06   0.06
     7      0.2  2     0.76 1.00   0.55 0.98    0.74 0.97    0.52 0.78      0.06   0.05
     8      1.0  2     0.77 0.99   0.49 0.97    0.74 0.97    0.47 0.77      0.04   0.05
     9      5.0  2     0.87 1.00   0.67 0.99    0.85 0.99    0.66 0.90      0.08   0.05
    10      0.2  1.6   0.25 0.81   0.12 0.49    0.25 0.36    0.12 0.05      0.05   0.05
    11      1.0  1.6   0.49 0.92   0.23 0.66    0.47 0.56    0.21 0.07      0.05   0.07
    12      5.0  1.6   0.70 0.99   0.31 0.86    0.70 0.76    0.28 0.09      0.09   0.08
    13      0.2  1.0   0.23 0.81   0.13 0.50    0.22 0.36    0.12 0.04      0.06   0.07
    14      1.0  1.0   0.42 0.93   0.20 0.63    0.41 0.53    0.17 0.06      0.06   0.04
    15      5.0  1.0   0.67 0.97   0.27 0.82    0.64 0.71    0.26 0.09      0.10   0.06
    16      0.2  0.4   0.25 0.83   0.13 0.50    0.26 0.36    0.12 0.03      0.04   0.05
    17      1.0  0.4   0.28 0.84   0.14 0.50    0.24 0.37    0.12 0.04      0.04   0.05
    18      5.0  0.4   0.58 0.97   0.23 0.72    0.52 0.61    0.19 0.06      0.06   0.05
even in small samples and is a reliable model selection
guideline for this reason. In the AD(0, 0) model the power
increases with the mean lag MNL, the total multiplier TMP, and the
sample size (except for high n and low $\gamma$).
Whether the power figures in the AD(0, 0) model are satisfactory
is a matter for personal judgement. This type of misspecification
is often tested by applying the DW or LMF tests, yet cases of
insignificant test statistics are frequently found especially for
$\gamma = 0.5$ and n = 4. Since we find that serial correlation tests may
have reasonable power against general alternatives, this
illustrates the message in Hendry and Mizon (1978) that, for
instance, a significant DW statistic should not automatically be
followed by estimating an autocorrelated error process, but rather
by the specification of a more general AD model that allows for
systematic dynamics instead of pure disturbance dynamics.
The power of the LMF test in the misspecified AD(1) model is
very disappointing: in many cases it is indistinguishable from the
nominal and actual significance level. This also holds for the
non-stationary data. So if the DGP is an AD(1, 1) model, a model
selection strategy that starts from an incorrect univariate AR(1)
time-series model and tests for residual autocorrelation is
unlikely to reject this simple model. This illustrates the
assertion in Hendry and Richard (1982, p. 11) that the randomness
of the residuals is a necessary but by no means sufficient
condition for the adequacy of a model's specification.
7. RESULTS FOR THE POST-SAMPLE PREDICTION TESTS
All the tests for predictive failure we investigate are only
asymptotically valid. Table VIII presents the main results,
averaged over the 18 coefficient combinations, for the correctly
specified AD(1, 1) model. Even a sample size of T = 80 is too small
for the PR and PR* tests to exhibit their asymptotic qualities; in
moderate sample sizes these tests have too large actual
significance levels leading to a too frequent incidence of type I
errors. As the LRF statistic yields a test with actual size close
to nominal size over the whole experimental design, it appears that
omitting asymptotically negligible terms (to obtain PR and PR* from
(21)) worsens small sample behaviour. Although the results for the
crude likelihood-ratio test LR are very poor, it is remarkable that
(except for the extreme case T = m = 20) the simple Edgeworth
correction in LRC gives much better results.
TABLE VIII
Rejection frequencies of post-sample predictive tests in the correctly
specified AD(1, 1) model, averaged over the 18 coefficient
combinations

              stationary data; 5% level          non-stationary data; 10% level
 T    m      PR    PR*   LR    LRC   LRF         PR    PR*   LR    LRC   LRF
20    4     0.30  0.21  0.16  0.06  0.06        0.41  0.32  0.26  0.11  0.11
      8     0.42  0.31  0.26  0.07  0.06        0.55  0.43  0.37  0.12  0.12
     20     0.61  0.47  0.57  0.10  0.06        0.74  0.61  0.69  0.17  0.12
40    4     0.15  0.12  0.10  0.06  0.06        0.25  0.20  0.18  0.11  0.11
      8     0.20  0.15  0.13  0.06  0.05        0.32  0.26  0.23  0.12  0.11
     20     0.32  0.24  0.30  0.07  0.06        0.47  0.38  0.42  0.13  0.11
80    4     0.09  0.08  0.07  0.05  0.05        0.16  0.14  0.13  0.10  0.10
      8     0.11  0.09  0.09  0.05  0.05        0.19  0.16  0.15  0.10  0.10
     20     0.16  0.13  0.15  0.05  0.05        0.26  0.22  0.24  0.10  0.10
The superiority of LRF is again found in the overparameterised
AD(2, 2) model dealt with in Table IX. Including redundant
regressors appears to further increase the significance levels of
PR and PR*, perhaps leading to the unnecessary extension of an
already overparameterised (but otherwise adequate) model. The LRF
and LRC tests prove to be relatively invariant with respect to both
overparameterisation and the coefficient values of the DGP.
In Table X the power of the LRF test is shown to be very poor
with respect to the AD(0, 0) and AD(1) alternatives considered.
Apparently the generated sample data are too smooth to produce
serious prediction errors. We observe the curious phenomenon that
the rejection frequencies decrease for larger sample sizes. This
occurs because, for
TABLE IX
Rejection frequencies of some post-sample predictive tests at the
nominal 5% level in the overparameterized AD(2, 2) model (stationary
data)

                                 LRC                  LRF
 T    m     PR*    LR      min   mean  max     min   mean  max
20    4     0.28  0.22     0.04  0.06  0.07    0.04  0.06  0.07
      8     0.39  0.34     0.04  0.07  0.10    0.04  0.06  0.09
     20     0.56  0.68     0.07  0.11  0.15    0.04  0.06  0.10
40    4     0.14  0.11     0.04  0.06  0.07    0.04  0.06  0.08
      8     0.18  0.16     0.03  0.05  0.08    0.03  0.05  0.07
     20     0.29  0.35     0.03  0.06  0.09    0.02  0.06  0.07
80    4     0.09  0.08     0.04  0.05  0.07    0.04  0.05  0.07
      8     0.11  0.10     0.03  0.05  0.07    0.03  0.05  0.07
     20     0.15  0.17     0.04  0.05  0.07    0.03  0.05  0.06
TABLE X
Rejection frequencies of the LRF test at the nominal 5% level in
misspecified models. DGP is AD(1, 1)

              AD(0, 0) model               AD(1) model                  AD(1) model
              stationary data              stationary data              non-stationary data
              m = 4        m = 8           m = 4        m = 8           m = 4        m = 8
Combination   T=20  T=80   T=20  T=80      T=20  T=80   T=20  T=80      T=20  T=80   T=20  T=80
     1        0.24  0.00   0.30  0.00      0.06  0.04   0.07  0.04      0.07  0.06   0.08  0.05
     2        0.25  0.00   0.33  0.00      0.09  0.05   0.09  0.06      0.08  0.06   0.08  0.06
     3        0.25  0.00   0.35  0.01      0.09  0.06   0.09  0.06      0.12  0.11   0.12  0.14
     4        0.20  0.00   0.29  0.01      0.06  0.05   0.06  0.05      0.07  0.05   0.07  0.05
     5        0.19  0.00   0.28  0.00      0.06  0.06   0.06  0.06      0.07  0.05   0.09  0.06
     6        0.26  0.00   0.35  0.00      0.06  0.04   0.06  0.06      0.07  0.07   0.07  0.08
     7        0.19  0.00   0.27  0.00      0.06  0.06   0.06  0.06      0.08  0.06   0.09  0.04
     8        0.22  0.00   0.30  0.00      0.05  0.05   0.04  0.06      0.06  0.04   0.07  0.04
     9        0.22  0.00   0.27  0.00      0.06  0.05   0.08  0.05      0.07  0.04   0.09  0.03
    10        0.09  0.00   0.12  0.00      0.05  0.05   0.05  0.04      0.07  0.07   0.06  0.07
    11        0.12  0.00   0.13  0.00      0.06  0.06   0.05  0.06      0.09  0.08   0.10  0.08
    12        0.16  0.00   0.15  0.00      0.09  0.08   0.09  0.09      0.14  0.09   0.18  0.12
    13        0.08  0.00   0.10  0.00      0.05  0.06   0.06  0.07      0.05  0.07   0.06  0.06
    14        0.11  0.00   0.14  0.00      0.06  0.07   0.06  0.05      0.08  0.06   0.08  0.08
    15        0.13  0.00   0.17  0.00      0.08  0.07   0.10  0.07      0.08  0.07   0.10  0.09
    16        0.09  0.00   0.12  0.00      0.05  0.05   0.04  0.05      0.06  0.05   0.05  0.05
    17        0.09  0.00   0.11  0.00      0.05  0.04   0.04  0.06      0.06  0.05   0.09  0.06
    18        0.11  0.00   0.12  0.00      0.08  0.06   0.06  0.06      0.05  0.07   0.04  0.07
stationary regressors and fixed m, $\mathrm{plim}\,(1/T)\,RSS_{T+m} =
\mathrm{plim}\,(1/T)\,RSS_T$ even in misspecified models, depriving the
test of its power in large samples.
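The role of the two residual sums of squares can be illustrated with a predictive F statistic in the form of Chow (1960). This is a minimal sketch under the assumption that LRF has that form; the definition (21) is given earlier in the paper and is not reproduced here, and the function names and toy data are hypothetical.

```python
import numpy as np


def residual_sum_of_squares(y, X):
    """RSS from an OLS regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return float(residuals @ residuals)


def predictive_failure_f(y, X, T):
    """F = [(RSS_{T+m} - RSS_T) / m] / [RSS_T / (T - K)].

    The first T rows are the estimation sample; the remaining m rows are the
    post-sample observations. Assumed Chow (1960) form, not necessarily (21).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    m = len(y) - T
    K = X.shape[1]
    rss_T = residual_sum_of_squares(y[:T], X[:T])
    rss_Tm = residual_sum_of_squares(y, X)
    return ((rss_Tm - rss_T) / m) / (rss_T / (T - K))


# Toy data: a constant-only model, three sample points and one outlying
# post-sample point. RSS_T = 2 and RSS_{T+m} = 14, so F = 12 / (2 / 2) = 12.
y_demo = np.array([1.0, 2.0, 3.0, 6.0])
X_demo = np.ones((4, 1))
f_demo = predictive_failure_f(y_demo, X_demo, T=3)
print(round(f_demo, 6))  # 12.0
```

With m fixed, the numerator compares two quantities that share the same probability limit after division by T, which is the mechanism the text identifies for the loss of power as T grows.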
We conclude that although a post-sample prediction test with a
correct size in small samples does exist, detecting a dynamic
misspecification is likely only if the relevant regressor variable
$x_t$ is already included in the specification. Starting with a
parsimonious univariate time series model with omitted (lagged)
variables, it is doubtful that this will be detected by such a
test. However, varying data correlations (not considered in our
Monte Carlo design) will obviously enhance the power of a
post-sample prediction test.
8. CONCLUSIONS
Misspecification tests are important tools in empirical
modelling, but they can only be effective if the user has control
over the probability of type I errors. Their usefulness improves if
they have high power against a wide range of alternative model
specifications. In a simulation study we investigated many versions
of two general types of misspecification tests applied in finite
samples to a single equation linear regression model with a lagged
dependent variable and normally distributed disturbances. We found
that the rejection probabilities of the tests may vary
substantially for different parameter values of the data generation
process. For particular tests this may occur both in misspecified
models (as is to be expected) and in adequately parameterised or
overparameterised models. Because of this lack of robustness, a
model builder who is ignorant of the parameter values of the data
generation process can be led astray in the model selection
process.
Our Monte Carlo results corroborate theoretical findings that
the Durbin-Watson, Box-Pierce, and Ljung-Box tests for serial
correlation should not be applied in regression models with lagged
dependent variables. In such models these statistics are best not
calculated at all as they have no sound interpretation. The
simulations also reveal that even asymptotically valid tests such
as (generalizations of) Durbin's h-statistic and various
formulations of the Lagrange multiplier test, including the popular
$T \cdot R^2$ version, have poor small sample properties. However, the
Lagrange multiplier type F-test, denoted here by LMF (and
computationally as simple as the $T \cdot R^2$ version), appears to have a
type I error probability that is relatively invariant to sample
size, order of serial correlation, true coefficient values, and
redundant regressors. Test LMF is suggested in Harvey (1981, p.
277) where it is referred to as the "goodness of fit F-test", but
where the appropriate degrees of freedom correction goes
unrecorded. On the basis of our simulation results, it seems
reasonable to ignore the fact that LMF does not have an exact F
distribution in dynamic models, and to use critical values from the
F distribution with n and T - K - n degrees of freedom. The other
test statistics could only be used with confidence by evaluating
appropriate critical values by simulation, as suggested in Bera and
Jarque (1982); we believe this to be an infeasible alternative for
the practitioner, and besides there is no indication that tests would
be obtained with better power characteristics than LMF. Hence,
of the computationally simple tests for serial correlation
investigated here, we recommend LMF; the Durbin-Watson test might
be preferable only if no lagged dependent variables are included
in the specification (as it is then UMP for particular X matrices
and specific alternative hypotheses).
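To make the recommendation concrete, the $T \cdot R^2$ and F versions can be computed from the same auxiliary regression. The sketch below assumes the usual construction: the OLS residuals are regressed on the original regressors and n lagged residuals, and the F statistic tests the joint significance of the lagged residuals. Setting pre-sample residuals to zero, the simulated data and all names are assumptions introduced for illustration; the paper's exact definitions appear in its earlier sections.

```python
import numpy as np


def lm_serial_correlation_tests(y, X, n):
    """Return (T * R^2, F) for serial correlation of order n.

    F is referred to the F distribution with n and T - K - n degrees of freedom.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T, K = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    # Lagged residuals e_{t-1}, ..., e_{t-n}; pre-sample values set to zero.
    lagged = np.column_stack(
        [np.concatenate([np.zeros(k), e[:-k]]) for k in range(1, n + 1)]
    )
    Z = np.hstack([X, lagged])
    delta, *_ = np.linalg.lstsq(Z, e, rcond=None)
    u = e - Z @ delta
    rss0 = float(e @ e)   # restricted RSS (e is orthogonal to X)
    rss1 = float(u @ u)   # unrestricted RSS of the auxiliary regression
    r_squared = 1.0 - rss1 / rss0
    lm = T * r_squared
    lmf = ((rss0 - rss1) / n) / (rss1 / (T - K - n))
    return lm, lmf


# Illustrative data from an AD(1, 1) process with white-noise disturbances.
rng = np.random.default_rng(0)
T_obs = 40
x = rng.normal(size=T_obs + 1)
y_full = np.zeros(T_obs + 1)
for t in range(1, T_obs + 1):
    y_full[t] = 0.5 * y_full[t - 1] + 0.2 * x[t] + 0.3 * x[t - 1] + rng.normal()
y = y_full[1:]
X = np.column_stack([np.ones(T_obs), y_full[:-1], x[1:], x[:-1]])  # K = 4
lm, lmf = lm_serial_correlation_tests(y, X, n=4)
print(lm, lmf)
```

Both statistics are functions of the same $R^2$, so the F version costs nothing extra to compute; only the reference distribution and the degrees of freedom correction differ.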
For the post-sample prediction tests examined, the F test
(denoted LRF here) is also the most reliable. The divergence of the
asymptotic distribution from the finite-sample distribution of test
statistics is well illustrated here by the poor properties of the
likelihood-ratio test.
-
258 REVIEW OF ECONOMIC STUDIES
For both the LMF and LRF tests, we found that multiplying the
corresponding likelihood-ratio test statistics by a simple
Edgeworth-based scalar correction factor produces tests that
(apart from some extreme cases) have rejection frequencies almost
equal to the chosen F tests. As these Edgeworth corrected
statistics are used with $\chi^2$ critical values, which are much easier
to memorize than F critical values, practitioners might prefer to
use these LMC and LRC statistics. In the context of non-linear
regression both the F- and the Anderson reformulation of test
statistics are employed in Mizon (1977b).
Of all the test statistics investigated here, only the F tests
and the LMC and LRC tests could usefully be re-examined to analyse
their rejection probabilities more thoroughly over a wider
parameter space, perhaps by using response surface techniques. The
present Monte Carlo study has revealed only that many test
procedures are deficient in small samples. It also suggests that it
is questionable whether serial correlation and predictive failure
tests will be very effective in detecting misspecification of an AD
model, especially if the specification search starts from a simple
ARMA representation while the exogenous explanatory variables of
the AD process are themselves modelled parsimoniously by ARIMA
processes.
APPENDIX A
It is not necessary, and it is computationally inefficient, to
adopt starting values, say $x_{-50} = 0 = y_{-50}$, to generate the
required values $\{y_t, x_t;\ t = -1, \ldots, T+m\}$ according to the
formulas (26) and (25), as has been done frequently in comparable
simulation studies. (When estimating the AD(2, 2) model $y_{-1}$ and
$x_{-1}$ are needed.) Since general ARMA(p, q) series of length N can be
synthesized exactly from N + q IIN(0, 1) drawings (see McLeod and
Hipel (1978)), both the waste of random numbers and small sample
non-stationarity problems as indicated in Hendry (1979, Appendix B)
can be avoided as follows. Using the lag-operator L, (25) may be
rewritten

$$y_t = \frac{\beta_0 + \beta_1 L}{1 - \gamma L}\, x_t + \frac{1}{1 - \gamma L}\, \varepsilon_t. \tag{A.1}$$

Substitution of (26) for the stationary $x_t$ series leads to

$$y_t = \frac{\beta_0 + \beta_1 L}{(1 - \gamma L)(1 - \lambda L)}\, \xi_t + \frac{1}{1 - \gamma L}\, \varepsilon_t. \tag{A.2}$$

Hence, $y_t$ is the sum of an ARMA(2, 1) process and an
(independent) AR(1) process, which in general leads to an ARMA(3,
2) process, see Granger and Newbold (1977, p. 29). Because in (A.2)
both processes have the factor $(1 - \gamma L)^{-1}$ in common, $y_t$
reduces to an ARMA(2, 1) process.
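The reduction can be made explicit by putting both components over their common denominator. The following is a sketch of that step, writing $\gamma$ for the coefficient of the lagged dependent variable, $\lambda$ for the autoregressive coefficient of $x_t$, and $\xi_t$, $\varepsilon_t$ for the two independent white-noise series; the symbols $\theta$ and $\eta_t$ are labels introduced here for illustration.

```latex
y_t = \frac{(\beta_0 + \beta_1 L)\,\xi_t + (1 - \lambda L)\,\varepsilon_t}
           {(1 - \gamma L)(1 - \lambda L)}
\quad\Longrightarrow\quad
(1 - \gamma L)(1 - \lambda L)\, y_t
  = (\beta_0 + \beta_1 L)\,\xi_t + (1 - \lambda L)\,\varepsilon_t
  = (1 + \theta L)\,\eta_t .
```

The numerator is the sum of two independent MA(1) processes, so its autocovariances vanish beyond lag one and it can be represented as a single MA(1) process $(1 + \theta L)\eta_t$ with $\eta_t$ white noise. The autoregressive polynomial has degree two, which gives the ARMA(2, 1) form.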
Now the series $\{y_t, x_t;\ t = -1, \ldots, T+m\}$ can be generated from
the mutually independent white-noise series
$\{\varepsilon_t;\ t = 1, \ldots, T+m\}$ and $\{\xi_t;\ t = -2, \ldots, T+m\}$
in the following way. By means of the method of McLeod and Hipel we
generate the observations $w_{-2}$, $w_{-1}$ and $v_{-1}$ of the AR(2)
process $w_t = [(1 - \gamma L)(1 - \lambda L)]^{-1}\xi_t$ and the AR(1)
process $v_t = (1 - \gamma L)^{-1}\varepsilon_t$. Then, as

$$y_t = (\beta_0 + \beta_1 L)\, w_t + v_t, \tag{A.3}$$

we can calculate $y_{-1}$. From $\{\xi_t\}$ we also generate the AR(1)
process $\{x_t;\ t = -1, \ldots, T+m\}$ and then the remaining required
observations of $\{y_t;\ t = -1, \ldots, T+m\}$ are obtained directly from
the generating formula (25).
For the experiments where $x_t$ is non-stationary we have

$$x_t = x_{t-1} + \mu + \xi_t = x_{-2} + (t+2)\mu + \sum_{i=-1}^{t} \xi_i, \qquad (t \geq -1). \tag{A.4}$$

Now the explanatory variable can be calculated directly from a
given starting value $x_{-2}$, from the drift parameter $\mu$, and from the
white-noise series $\{\xi_t\}$. Substitution of (A.4) in (A.1) leads to

$$y_t = \frac{\beta_0 + \beta_1 L}{1 - \gamma L}\,[x_{-2} + (t+2)\mu] + (\beta_0 + \beta_1 L) \sum_{i=-1}^{t} \frac{\xi_i}{1 - \gamma L} + \frac{\varepsilon_t}{1 - \gamma L}.$$

Hence, the time-series $\{y_t\}$ can be calculated from

$$y_t = \frac{\beta_0 + \beta_1 L}{1 - \gamma L}\,[x_{-2} + (t+2)\mu] + (\beta_0 + \beta_1 L) \sum_{i=-1}^{t} u_i + v_t, \tag{A.5}$$

where $v_t$ is again the AR(1) process $v_t = (1 - \gamma L)^{-1}\varepsilon_t$
and $u_i = (1 - \gamma L)^{-1}\xi_i$ is another AR(1) process with the same
autocorrelation function, but generated from the white-noise series
$\{\xi_i\}$.
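The closed form for the non-stationary regressor, a starting value plus accumulated drift plus accumulated shocks, can be checked numerically against the recursion. This is a minimal sketch; the starting value 1 and drift 0.02 are the settings reported in Table XI, while the shock variance, the seed and the function names are assumptions introduced for illustration.

```python
import random


def random_walk_by_recursion(x_start, drift, shocks):
    """x_t = x_{t-1} + drift + shock_t, starting from x_{-2} = x_start."""
    path = []
    x = x_start
    for shock in shocks:
        x = x + drift + shock
        path.append(x)
    return path


def random_walk_closed_form(x_start, drift, shocks):
    """x_t = x_{-2} + (t + 2) * drift + sum_{i=-1}^{t} shock_i, for t = -1, 0, 1, ..."""
    path = []
    cumulative_shock = 0.0
    for steps, shock in enumerate(shocks, start=1):  # steps equals t + 2
        cumulative_shock += shock
        path.append(x_start + steps * drift + cumulative_shock)
    return path


generator = random.Random(0)
shocks = [generator.gauss(0.0, 3.0 ** 0.5) for _ in range(102)]  # assumed variance 3
recursive = random_walk_by_recursion(1.0, 0.02, shocks)
closed = random_walk_closed_form(1.0, 0.02, shocks)
max_gap = max(abs(a - b) for a, b in zip(recursive, closed))
print(max_gap < 1e-9)  # True
```

Because the closed form needs only the starting value, the drift and the shocks, no burn-in period is required to generate the non-stationary series.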
APPENDIX B
Table XI details all 18 possible coefficient combinations of the
AD(1, 1) model that serves as the DGP in the Monte Carlo
experiments. For each combination it reports the various dynamic
adjustment characteristics and the $R^2$ values averaged over the 500
replications for both types of exogenous explanatory variable $x_t$
obtained in the AD(1, 1) specification. For the definition of these
dynamic characteristics see Harvey (1981, pp. 224-235). Note that
the mean lag MNL is defined only if all lag coefficients in
$(\beta_0 + \beta_1 L)(1 - \gamma L)^{-1}$ have the same sign. Since
$\gamma > 0$ and $\beta_0 > 0$ in our models, this happens if
$\beta_1 > -\gamma\beta_0$, which is always the case here. The formula
for the mean lag in (25) is

$$MNL = \beta_1/(\beta_0 + \beta_1) + \gamma/(1 - \gamma), \qquad (\beta_1 > -\gamma\beta_0).$$

The median lag measures the number of time periods it takes for
50% of the total multiplier to be realised. Since the standardised
interim impact multipliers are

$$SIM_i = \begin{cases} \beta_0/TMP, & i = 0, \\ 1 - \gamma^i(\beta_1 + \gamma\beta_0)/(\beta_0 + \beta_1) = 1 - \gamma^i(1 - SIM_0), & i \geq 1, \end{cases}$$

the median lag MDL becomes

$$MDL = \begin{cases} 0, & SIM_0 \geq 0.5, \\ (0.5 - SIM_0)/(SIM_1 - SIM_0), & SIM_1 \geq 0.5, \\ (\ln\gamma)^{-1} \ln\bigl(0.5/(1 - SIM_0)\bigr), & \text{otherwise.} \end{cases}$$
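These formulas are easy to evaluate for any coefficient combination. The sketch below assumes the usual definition of the total multiplier, TMP = (β0 + β1)/(1 − γ), which agrees with the values in Table XI; the function name and the small tolerance used in the comparisons with 0.5 are choices made here for illustration.

```python
import math


def dynamic_characteristics(gamma, beta0, beta1, tol=1e-12):
    """Return (TMP, SIM0, SIM1, MNL, MDL) for an AD(1, 1) model."""
    tmp = (beta0 + beta1) / (1.0 - gamma)   # total multiplier (assumed definition)
    sim0 = beta0 / tmp                      # standardised impact multiplier
    sim1 = 1.0 - gamma * (1.0 - sim0)       # standardised interim multiplier, lag 1
    mnl = beta1 / (beta0 + beta1) + gamma / (1.0 - gamma)
    # The tolerance guards against floating-point noise when SIM equals 0.5.
    if sim0 >= 0.5 - tol:
        mdl = 0.0
    elif sim1 >= 0.5 - tol:
        mdl = (0.5 - sim0) / (sim1 - sim0)
    else:
        mdl = math.log(0.5 / (1.0 - sim0)) / math.log(gamma)
    return tmp, sim0, sim1, mnl, mdl


# Coefficient combinations 1 and 10 of Table XI.
combination_1 = dynamic_characteristics(0.9, 0.04, -0.02)
combination_10 = dynamic_characteristics(0.5, 0.04, 0.06)
print([round(value, 2) for value in combination_1])
print([round(value, 2) for value in combination_10])
```

For combination 1 the third branch applies, because less than half of the total multiplier is realised after one period; for combination 10 the median lag is found by linear interpolation between lags 0 and 1.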
First version received July 1983; final version accepted October
1985 (Eds.)

For their valuable comments on earlier versions of this paper I
would like to thank Anil Bera, David Hendry, Jan Podivinsky, Stephen
Pollock, participants of a session at the 1981 European Meeting of
the Econometric Society in Amsterdam and members of seminars at the
University of Leeds and Erasmus University Rotterdam. Also the
helpful suggestions from my colleagues at the University of
Amsterdam, especially Mars Cramer, are gratefully acknowledged.
Finally, I want to thank the Managing Editor, Grayham Mizon, and a
number of anonymous referees of this Review for their remarks and
guidance.
TABLE XI
AD(1, 1) coefficients, dynamic adjustment characteristics and mean $R^2$
x stationary: $\lambda = 0.8$; $\sigma^2_\xi = 3$; $\sigma^2_\varepsilon = 1$
x non-stationary: $x_{-2} = 1$; $\mu = 0.02$; $\sigma^2_\xi = 3$

                                                                          mean $R^2$
       AD(1, 1) coefficients    Dynamic adjustment characteristics   x stationary          x non-stationary
Comb.  $\gamma$  $\beta_0$  $\beta_1$   TMP   $SIM_0$  $SIM_1$  MNL   MDL    T=20   T=40   T=80    T=20   T=40   T=80
  1     0.9   0.04   -0.02      0.2   0.2   0.28   8     4.46   0.580  0.665  0.734   0.623  0.723  0.819
  2     0.9   0.20   -0.10      1.0   0.2   0.28   8     4.46   0.650  0.754  0.816   0.842  0.933  0.971
  3     0.9   1.00   -0.50      5.0   0.2   0.28   8     4.46   0.933  0.963  0.977   0.988  0.996  0.999
  4     0.9   0.10   -0.08      0.2   0.5   0.55   5     0      0.590  0.684  0.746   0.640  0.750  0.831
  5     0.9   0.50   -0.40      1.0   0.5   0.55   5     0      0.738  0.808  0.851   0.905  0.951  0.979
  6     0.9   2.50   -2.00      5.0   0.5   0.55   5     0      0.978  0.984  0.988   0.994  0.997  0.999
  7     0.9   0.16   -0.14      0.2   0.8   0.82   2     0      0.621  0.694  0.744   0.682  0.779  0.853
  8     0.9   0.80   -0.70      1.0   0.8   0.82   2     0      0.843  0.870  0.890   0.944  0.969  0.983
  9     0.9   4.00   -3.50      5.0   0.8   0.82   2     0      0.989  0.992  0.993   0.997  0.999  0.999
 10     0.5   0.04    0.06      0.2   0.2   0.60   1.6   0.75   0.344  0.344  0.348   0.541  0.657  0.765
 11     0.5   0.20    0.30      1.0   0.2   0.60   1.6   0.75   0.760  0.812  0.843   0.931  0.967  0.984
 12     0.5   1.00    1.50      5.0   0.2   0.60   1.6   0.75   0.983  0.990  0.992   0.996  0.999  0.999
 13     0.5   0.10    0         0.2   0.5   0.75   1.0   0      0.340  0.343  0.364   0.535  0.642  0.758
 14     0.5   0.50    0         1.0   0.5   0.75   1.0   0      0.779  0.830  0.853   0.929  0.966  0.984
 15     0.5   2.50    0         5.0   0.5   0.75   1.0   0      0.987  0.991  0.993   0.997  0.999  0.999
 16     0.5   0.16   -0.06      0.2   0.8   0.90   0.4   0      0.371  0.369  0.374   0.560  0.664  0.783
 17     0.5   0.80   -0.30      1.0   0.8   0.90   0.4   0      0.821  0.855  0.871   0.948  0.971  0.985
 18     0.5   4.00   -1.50      5.0   0.8   0.90   0.4   0      0.991  0.993  0.994   0.998  0.999  0.999
REFERENCES
ANDERSON, T. W. (1958), An Introduction to Multivariate
Statistical Analysis (John Wiley & Sons).
ANDERSON, G. J. and MIZON, G. E. (1984), "Parameter Constancy
Tests: Old and New", (discussion paper No. 8325, University of
Southampton).
BERA, A. K. and JARQUE, C. M. (1982), "Model Specification Tests;
A Simultaneous Approach", Journal of Econometrics, 20, 59-82.
BOX, G. E. P. and PIERCE, D. A. (1970), "Distribution of Residual
Autocorrelations in Autoregressive-Integrated Moving Average Time
Series Models", Journal of the American Statistical Association, 65,
1509-1526.
BREUSCH, T. S. (1978), "Testing for Autocorrelation in Dynamic
Linear Models", Australian Economic Papers, 17, 334-355.
BREUSCH, T. S. (1979), "Conflict among Criteria for Testing
Hypotheses: Extension and Comments", Econometrica, 47, 203-207.
BREUSCH, T. S. and GODFREY, L. G. (1981), "A Review of Recent
Work on Testing for Auto-Correlation in Dynamic Simultaneous
Models", in Currie, D., Nobay, R. and Peel, D. (eds) Macroeconomic
Analysis (London: Croom Helm).
BREUSCH, T. S. and PAGAN, A. R. (1980), "The Lagrange Multiplier
Test and Its Applications to Model Specification in Econometrics",
Review of Economic Studies, 47, 239-253.
CHOW, G. C. (1960), "Tests of Equality between Sets of
Coefficients in Two Linear Regressions", Econometrica, 28,
591-605.
DAVIDSON, J. E. H. and HENDRY, D. F. (1981), "Interpreting
Econometric Evidence: The Behaviour of Consumers' Expenditure in
the UK", European Economic Review, 16, 177-192.
DAVIDSON, J. E. H., HENDRY, D. F., SRBA, F. and YEO, S. (1978),
"Econometric Modelling of the Aggregate Time-Series Relationship
between Consumers' Expenditure and Income in the United Kingdom",
The Economic Journal, 88, 661-692.
DAVIES, N., TRIGGS, C. M. and NEWBOLD, P. (1977), "Significance
Levels of the Box-Pierce Portmanteau Statistic in Finite Samples",
Biometrika, 64, 3, 517-522.
DURBIN, J. (1970), "Testing for Serial Correlation in
Least-Squares Regression when some of the Regressors are Lagged
Dependent Variables", Econometrica, 38, 410-421.
EVANS, G. B. A. and SAVIN, N. E. (1982), "Conflict Among the
Criteria Revisited: the W, LR and LM Tests", Econometrica, 50,
737-748.
GODFREY, L. G. (1978), "Testing Against General Autoregressive
and Moving Average Error Models when the Regressors include Lagged
Dependent Variables", Econometrica, 46, 1293-1302.
HARVEY, A. C. (1981), The Econometric Analysis of Time Series
(Oxford: Philip Allan).
HENDRY, D. F. (1979), "The Behaviour of Inconsistent Instrumental
Variables Estimators in Dynamic Systems with Autocorrelated Errors",
Journal of Econometrics, 9, 295-314.
HENDRY, D. F. (1980), "Predictive Failure and Econometric
Modelling in Macroeconomics: The Transaction Demand for Money", in
Ormerod, P. (ed) Modelling the Economy (London: Heinemann
Educational Books).
HENDRY, D. F. and MIZON, G. E. (1978), "Serial Correlation as a
Convenient Simplification, not a Nuisance: A Comment on a Study of
the Demand for Money by the Bank of England", The Economic Journal,
88, 549-563.
HENDRY, D. F. and RICHARD, J-F. (1982), "On the Formulation of
Empirical Models in Dynamic Econometrics", Journal of Econometrics,
20, 3-33.
HENDRY, D. F. and TRIVEDI, P. K. (1972), "Maximum Likelihood
Estimation of Difference Equations with Moving Average Errors: A
Simulation Study", Review of Economic Studies, 39, 117-145.
LJUNG, G. M. and BOX, G. E. P. (1978), "On a Measure of Lack of
Fit in Time-Series Models", Biometrika, 65, 297-303.
MARDIA, K. V. and ZEMROCH, P. J. (1978) Tables of the F- and
Related Distributions with Algorithms (London: Academic Press).
MCLEOD, A. I. and HIPEL, K. W. (1978), "Simulation Procedures
for Box-Jenkins Models", Water Resources Research, 14, 969-975.
MIZON, G. E. (1977a), "Model Selection Procedures", in Artis, M.
J. and Nobay, A. R. (eds) Studies in Modern Economic Analysis
(Oxford: Basil Blackwell).
MIZON, G. E. (1977b), "Inferential Procedures in Nonlinear
Models: An Application in a UK Cross Section Study of Factor
Substitution and Returns to Scale", Econometrica, 45,
1221-1242.
MIZON, G. E. and HENDRY, D. F. (1980), "An Empirical Application
and Monte Carlo Analysis of Tests of Dynamic Specification", Review
of Economic Studies, 47, 21-45.
PAGAN, A. R. and HALL, A. D. (1983), "Diagnostic Tests as
Residual Analysis (with discussion)", Econometric Reviews, 2,
159-254.
PIERCE, D. A. (1971), "Distribution of Residual Autocorrelation
in the Regression Model with Autoregressive-Moving Average
Errors", Journal of the Royal Statistical Society, Series B, 33,
140-146.
SAVIN, N. E. (1976), "Conflict Among Testing Procedures in a
Linear Regression Model with Autoregressive Disturbances",
Econometrica, 44, 1303-1315.
SPENCER, B. G. (1975), "The Small Sample Bias of Durbin's Test
for Serial Correlation", Journal of Econometrics, 3, 249-254.
WILSON, A. L. (1978), "When is the Chow Test UMP?", The American
Statistician, 32, 66-68.