7/30/2019 Slide Chapter 2
1/137
Chapter 2 Stationary Time Series Models

This chapter develops the Box-Jenkins methodology for estimating time series models of the form

y_t = a_0 + a_1 y_{t-1} + ... + a_p y_{t-p} + ε_t + β_1 ε_{t-1} + ... + β_q ε_{t-q}

which are called autoregressive integrated moving average (ARIMA) models.
The chapter has three aims:
1. Present the theory of stochastic linear difference equations and consider time series properties of stationary ARIMA models; a stationary ARIMA model is called an autoregressive moving average (ARMA) model.
2. Develop tools used in estimating ARMA models. Especially useful are the autocorrelation functions (ACF) and partial autocorrelation functions (PACF).
3. Consider various test statistics to check for model adequacy and show how a properly estimated model can be used for forecasting.

1. Stochastic Difference Equation Models

Stochastic difference equations are a convenient way of modeling dynamic economic processes. To take a simple example, suppose the Federal Reserve's money supply target grows 3% each period. Hence,

m*_t = 1.03 m*_{t-1} (1)

so that, given the initial condition m*_0, the particular solution is

m*_t = (1.03)^t m*_0
where m*_t = the logarithm of the money supply target in period t, and m*_0 = the logarithm of the money supply target in period 0.

Of course, the actual money supply, m_t, and the target money supply, m*_t, need not be equal.

Suppose that at the beginning of period t there are m_{t-1} dollars, so that the gap between the target and the actual money supply is m*_t − m_{t-1}

Suppose that the Fed cannot perfectly control the money supply but attempts to change the money supply by a fraction ρ of any gap between the desired and actual money supply

We can model this behavior as

Δm_t = ρ[m*_t − m_{t-1}] + ε_t
Using (1), we obtain

m_t = ρ(1.03)^t m*_0 + (1 − ρ)m_{t-1} + ε_t (2)

where ε_t is the uncontrollable portion of the money supply, and we assume its mean is zero in all time periods.

Although the model is overly simple, it does illustrate the key points:

1. Equation (2) is a discrete difference equation. Since {ε_t} is stochastic, the money supply is stochastic; we call (2) a linear stochastic difference equation.
2. If we knew the distribution of {ε_t}, we could calculate the distribution for each element in the {m_t} sequence. Since (2) shows how the realizations of the {m_t} sequence are linked across time, we would be able to calculate the various joint probabilities. We note that the distribution of the money supply sequence is completely determined by the parameters of the difference equation (2) and the distribution of the {ε_t} sequence.

3. Having observed the first t observations in the {m_t} sequence, we can make forecasts of m_{t+1}, m_{t+2}, .... For example, updating (2) by one period and taking the conditional expectation, the forecast of m_{t+1} is: E_t m_{t+1} = ρ(1.03)^{t+1} m*_0 + (1 − ρ)m_t.
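The three points above can be illustrated with a short simulation of equation (2). The parameter values below (ρ = 0.5, m*_0 = 1, σ = 0.01) and the function names are hypothetical choices for illustration, not values from the text:

```python
import random

def simulate_money_supply(rho=0.5, m0_target=1.0, T=50, sigma=0.01, seed=1):
    """Simulate eq. (2): m_t = rho*(1.03)^t * m*_0 + (1 - rho)*m_{t-1} + eps_t."""
    random.seed(seed)
    m = [m0_target]                      # take m_0 equal to the period-0 target
    for t in range(1, T + 1):
        eps = random.gauss(0.0, sigma)   # uncontrollable portion, mean zero
        m.append(rho * 1.03 ** t * m0_target + (1 - rho) * m[t - 1] + eps)
    return m

def forecast_next(m_t, t, rho=0.5, m0_target=1.0):
    """One-step forecast: E_t m_{t+1} = rho*(1.03)^(t+1) * m*_0 + (1 - rho)*m_t."""
    return rho * 1.03 ** (t + 1) * m0_target + (1 - rho) * m_t
```

With σ = 0 the simulated path and the one-step forecasts coincide exactly, which is one way to check point 3.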
A white noise process can be used to construct more interesting time series processes. For example, the time series

x_t = Σ_{i=0}^{q} β_i ε_{t-i} (3)

is constructed by taking the values ε_t, ε_{t-1}, ..., ε_{t-q} and multiplying each by the associated value of β_i.
A series formed in this manner is called a moving average of order q

It is denoted by MA(q)

Although the sequence {ε_t} is a white noise process, the sequence {x_t} will not be a white noise process if two or more of the β_i are different from zero
To illustrate using an MA(1) process, set β_0 = 1, β_1 = 0.5, and all other β_i = 0. Then

E(x_t) = E(ε_t + 0.5ε_{t-1}) = 0
Var(x_t) = Var(ε_t + 0.5ε_{t-1}) = 1.25σ²
E(x_t) = E(x_{t-s}) and Var(x_t) = Var(x_{t-s}) for all s

Hence, the first two conditions for {x_t} to be a white noise process are satisfied. However,

E(x_t x_{t-1}) = E[(ε_t + 0.5ε_{t-1})(ε_{t-1} + 0.5ε_{t-2})]
= E[ε_t ε_{t-1} + 0.5(ε_{t-1})² + 0.5ε_t ε_{t-2} + 0.25ε_{t-1}ε_{t-2}]
= 0.5σ²

Given that there exists a value of s ≠ 0 such that E(x_t x_{t-s}) ≠ 0, the sequence {x_t} is not a white noise process
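These moments are easy to check by simulation. A quick Monte Carlo sketch, assuming standard normal ε_t (so σ² = 1); the sample size and seed are arbitrary:

```python
import random

random.seed(0)
N = 100_000
eps = [random.gauss(0.0, 1.0) for _ in range(N)]        # white noise, sigma^2 = 1
x = [eps[t] + 0.5 * eps[t - 1] for t in range(1, N)]    # MA(1) with beta_1 = 0.5

mean_x = sum(x) / len(x)
var_x = sum((v - mean_x) ** 2 for v in x) / len(x)
cov1 = sum((x[t] - mean_x) * (x[t - 1] - mean_x) for t in range(1, len(x))) / len(x)
# Theory: E(x_t) = 0, Var(x_t) = 1.25, E(x_t x_{t-1}) = 0.5 -- the nonzero
# lag-1 autocovariance is what disqualifies {x_t} as a white noise process.
```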
2. ARMA Models

It is possible to combine a moving average process with a linear difference equation to obtain an autoregressive moving average model. Consider the p-th order difference equation:

y_t = a_0 + Σ_{i=1}^{p} a_i y_{t-i} + x_t. (4)

Now let {x_t} be the MA(q) process given by (3) so that we can write

y_t = a_0 + Σ_{i=1}^{p} a_i y_{t-i} + Σ_{i=0}^{q} β_i ε_{t-i} (5)

where by convention we normalize β_0 to unity.

If the characteristic roots of (5) are all in the unit circle, then y_t is said to follow an autoregressive moving average (ARMA) model
The autoregressive part of the model is the difference equation given by the homogeneous portion of (4), and the moving average part is the x_t sequence

If the homogeneous part of the difference equation contains p lags and the model for x_t contains q lags, the model is called an ARMA(p,q) model

If q = 0, the model is a pure autoregressive model denoted by AR(p)

If p = 0, the model is a pure moving average model denoted by MA(q)

In an ARMA model, it is permissible to allow p and/or q to be infinite
If one or more characteristic roots of (5) are greater than or equal to unity, the {y_t} sequence is called an integrated process and (5) is called an autoregressive integrated moving average (ARIMA) model

This chapter considers only models in which all of the characteristic roots of (5) are within the unit circle

Treating (5) as a difference equation suggests that y_t can be solved in terms of the {ε_t} sequence

The solution of an ARMA(p,q) model expressing y_t in terms of the {ε_t} sequence is the moving average representation of y_t
For the AR(1) model y_t = a_0 + a_1 y_{t-1} + ε_t, the moving average representation can be shown to be

y_t = a_0/(1 − a_1) + Σ_{i=0}^{∞} a_1^i ε_{t-i}

For the general ARMA(p,q) model, using the lag operator L, (5) can be rewritten as

(1 − Σ_{i=1}^{p} a_i L^i) y_t = a_0 + Σ_{i=0}^{q} β_i ε_{t-i}

so the particular solution for y_t is

y_t = (a_0 + Σ_{i=0}^{q} β_i ε_{t-i}) / (1 − Σ_{i=1}^{p} a_i L^i) (6)

The expansion of (6) yields an MA(∞) process
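The equivalence of the recursive AR(1) form and its moving average representation can be checked numerically. A small sketch; the parameter values and the fixed shock history are invented for the illustration:

```python
# AR(1): y_t = a0 + a1*y_{t-1} + eps_t, with moving average representation
# y_t = a0/(1 - a1) + sum_{i=0}^inf a1^i * eps_{t-i}
a0, a1 = 2.0, 0.7
eps = [0.5, -1.0, 0.3, 0.8, -0.2, 0.1] * 40      # shock history, oldest first

# Recursive computation, starting from the unconditional mean a0/(1 - a1)
y = a0 / (1 - a1)
for e in eps:
    y = a0 + a1 * y + e

# Moving average representation (truncating the sum is harmless: a1^240 ~ 0)
y_ma = a0 / (1 - a1) + sum(a1 ** i * e for i, e in enumerate(reversed(eps)))
```

The two computations agree to floating-point precision, which is exactly what the MA representation asserts.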
Issue: whether the expansion is convergent, so that the stochastic difference equation given by (6) is stable

We will see in the next section that the stability condition is that the roots of the polynomial (1 − Σ_{i=1}^{p} a_i L^i) must lie outside the unit circle

We will also see that, if y_t is a linear stochastic difference equation, the stability condition is a necessary condition for the time series {y_t} to be stationary

3. Stationarity

Suppose the quality control division of a manufacturing firm samples four machines each hour. Every hour, quality control finds the mean of the machines' output levels.
The plot of each machine's hourly output is shown in Figure 2.1. If y_{it} represents machine i's output at hour t, the means (ȳ_t) are readily calculated as

ȳ_t = Σ_{i=1}^{4} y_{it}/4.

For hours 5, 10, and 15, these mean values are 4.61, 5.14, and 5.03, respectively.

The sample variance for each hour can similarly be constructed.
Unfortunately, we do not usually have the luxury of being able to obtain an ensemble, that is, multiple observations of the same process over the same time period

Typically, we observe only one set of realizations, that is, one observation of the process over a given time period
Fortunately, if {y_t} is a stationary series, the mean, variance, and autocorrelations can be well approximated by sufficiently long time averages based on the single set of realizations

Suppose you observed the output of machine 1 for 20 periods. If you knew that the output was stationary, you could approximate the mean level of output by

ȳ ≈ Σ_{t=1}^{20} y_{1t}/20.

In using this approximation you would be assuming that the mean was the same for each period. Formally, a stochastic process having a finite mean and variance is covariance stationary if for all t and t − s,

E(y_t) = E(y_{t-s}) = μ (7)
Var(y_t) = Var(y_{t-s}) = σ_y², i.e.,
E[(y_t − μ)²] = E[(y_{t-s} − μ)²] = σ_y² (8)

Cov(y_t, y_{t-s}) = Cov(y_{t-j}, y_{t-j-s}) = γ_s, i.e.,
E[(y_t − μ)(y_{t-s} − μ)] = E[(y_{t-j} − μ)(y_{t-j-s} − μ)] = γ_s (9)

where μ, σ_y², and γ_s are all constants. (For s = 0, (8) and (9) are identical, so γ_0 equals the variance of y_t.)

To reiterate, a time series is covariance stationary if its mean and all autocovariances are unaffected by a change in time origin

A covariance stationary process is also referred to as a weakly stationary, second-order stationary, or wide-sense stationary process
A strongly stationary process need not have a finite mean and/or variance

In our course, we consider only covariance stationary series, so there is no ambiguity in using the terms stationary and covariance stationary interchangeably

In multivariate models, the term autocovariance is reserved for the covariance between y_t and its own lags

In univariate time series models, there is no ambiguity, and the terms autocovariance and covariance are used interchangeably

For a covariance stationary series, we can define the autocorrelation between y_t and y_{t-s} as

ρ_s ≡ γ_s/γ_0

where γ_s and γ_0 are defined by (9).
Since γ_s and γ_0 are time-independent, the autocorrelation coefficients ρ_s are also time-independent

Although the autocorrelation between y_t and y_{t-1} can differ from the autocorrelation between y_t and y_{t-2}, the autocorrelation between y_t and y_{t-1} must be identical to that between y_{t-s} and y_{t-s-1}

Obviously, ρ_0 = 1

Stationarity Restrictions for an AR(1) Model

Let

y_t = a_0 + a_1 y_{t-1} + ε_t

where ε_t is a white noise process.
Case: y_0 known

Suppose the process started in period zero, so that y_0 is a deterministic initial condition. The solution to this equation is

y_t = a_0 Σ_{i=0}^{t-1} a_1^i + a_1^t y_0 + Σ_{i=0}^{t-1} a_1^i ε_{t-i}. (10)

Taking the expected value of (10), we obtain

Ey_t = a_0 Σ_{i=0}^{t-1} a_1^i + a_1^t y_0. (11)

Updating by s periods yields

Ey_{t+s} = a_0 Σ_{i=0}^{t+s-1} a_1^i + a_1^{t+s} y_0. (12)

Comparing (11) and (12), it is clear that both means are time-dependent

Since Ey_t ≠ Ey_{t+s}, the sequence cannot be stationary
However, if t is large, we can consider the limiting value of y_t in (10)

If |a_1| < 1, then a_1^t y_0 converges to zero as t becomes infinitely large, and the sum a_0[1 + a_1 + (a_1)² + (a_1)³ + ...] converges to a_0/(1 − a_1)

Thus, if |a_1| < 1, as t → ∞, we have

lim y_t = a_0/(1 − a_1) + Σ_{i=0}^{∞} a_1^i ε_{t-i}. (13)

Now take expectations of (13)

Then we have, for sufficiently large values of t, Ey_t = a_0/(1 − a_1), since E(ε_{t-i}) = 0 for all i
Thus, the mean value of y_t is finite and time-independent:

Ey_t = Ey_{t-s} = a_0/(1 − a_1) ≡ μ for all t.

Turning to the variance, we find

E(y_t − μ)² = E[(ε_t + a_1 ε_{t-1} + (a_1)² ε_{t-2} + ...)²]
= σ²[1 + (a_1)² + (a_1)⁴ + ...]
= σ²/[1 − (a_1)²]

which is also finite and time-independent

Finally, the limiting values of all autocovariances, γ_s, s = 0, 1, 2, ..., are also finite and time-independent:

γ_s = E[(y_t − μ)(y_{t-s} − μ)]
= E{[ε_t + a_1 ε_{t-1} + (a_1)² ε_{t-2} + ...][ε_{t-s} + a_1 ε_{t-s-1} + (a_1)² ε_{t-s-2} + ...]}
= σ²(a_1)^s[1 + (a_1)² + (a_1)⁴ + ...]
= σ²(a_1)^s/[1 − (a_1)²] (14)
Case: y_0 unknown

Little would change were we not given the initial condition. Without the initial value y_0, the sum of the particular solution and homogeneous solution for y_t is

y_t = a_0/(1 − a_1) + Σ_{i=0}^{∞} a_1^i ε_{t-i} [particular solution]
 + A(a_1)^t [homogeneous solution] (15)

where A = an arbitrary constant = deviation from long-run equilibrium.

If we take the expectation of (15), it is clear that the {y_t} sequence cannot be stationary unless the homogeneous solution A(a_1)^t is equal to zero

Either the sequence must have started infinitely long ago (so that, with |a_1| < 1, a_1^t equals zero) or the arbitrary constant A must be zero
Thus, we have the stability conditions:

The homogeneous solution must be zero. Either the sequence must have started infinitely far in the past or the process must always be in equilibrium (so that the arbitrary constant is zero).

The characteristic root a_1 must be less than unity in absolute value.

These two conditions readily generalize to all ARMA(p,q) processes. The homogeneous solution to (5) has the form

Σ_{i=1}^{p} A_i (α_i)^t

or, if there are m repeated roots,

Σ_{i=1}^{m} A_i t^{i-1} η^t + Σ_{i=m+1}^{p} A_i (α_i)^t
where the A_i are arbitrary constants, η is the repeated root, and the α_i are the distinct roots.

If any portion of the homogeneous equation is present, the mean, variance, and all covariances will be time-dependent

Hence, for any ARMA(p,q) model, stationarity necessitates that the homogeneous solution be zero

The next section addresses stationarity restrictions for the particular solution.

4. Stationarity Restrictions for an ARMA(p,q) Model

As a prelude to the stationarity conditions for the general ARMA(p,q) model, first consider the stationarity conditions for an ARMA(2,1)
model. Since the magnitude of the intercept term does not affect the stability (or stationarity) condition, set a_0 = 0 and write

y_t = a_1 y_{t-1} + a_2 y_{t-2} + ε_t + β_1 ε_{t-1}. (16)

From the previous section, we know that the homogeneous solution must be zero, so it is only necessary to find the particular solution. Using the method of undetermined coefficients, we can write the challenge solution as

y_t = Σ_{i=0}^{∞} ψ_i ε_{t-i}. (17)

For (17) to be a solution of (16), the various ψ_i must satisfy

ψ_0 ε_t + ψ_1 ε_{t-1} + ψ_2 ε_{t-2} + ψ_3 ε_{t-3} + ...
= a_1(ψ_0 ε_{t-1} + ψ_1 ε_{t-2} + ψ_2 ε_{t-3} + ψ_3 ε_{t-4} + ...)
+ a_2(ψ_0 ε_{t-2} + ψ_1 ε_{t-3} + ψ_2 ε_{t-4} + ψ_3 ε_{t-5} + ...)
+ ε_t + β_1 ε_{t-1}.
Matching the coefficients of ε_t, ε_{t-1}, ε_{t-2}, ... yields

1. ψ_0 = 1
2. ψ_1 = a_1 ψ_0 + β_1, so ψ_1 = a_1 + β_1
3. ψ_i = a_1 ψ_{i-1} + a_2 ψ_{i-2} for all i ≥ 2.

The key point is that for i ≥ 2, the coefficients must satisfy the difference equation ψ_i = a_1 ψ_{i-1} + a_2 ψ_{i-2}

If the characteristic roots of (16) are within the unit circle, then the {ψ_i} must constitute a convergent sequence

To verify that the {y_t} sequence given by (17) is stationary, take the expectation of (17) and note that Ey_t = Ey_{t-i} = 0 for all t and i

Hence, the mean is finite and time-invariant
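The ψ-recursion in point 3 is straightforward to compute. A minimal sketch; the function name and the example parameters a_1 = 0.6, a_2 = 0.2, β_1 = 0.3 are ours, chosen so that the characteristic roots lie inside the unit circle:

```python
def psi_weights(a1, a2, b1, n):
    """First n psi-weights of an ARMA(2,1): psi_0 = 1, psi_1 = a1 + b1,
    then psi_i = a1*psi_{i-1} + a2*psi_{i-2} for i >= 2."""
    psi = [1.0, a1 + b1]
    for i in range(2, n):
        psi.append(a1 * psi[i - 1] + a2 * psi[i - 2])
    return psi[:n]

weights = psi_weights(0.6, 0.2, 0.3, 20)   # converges toward zero
```

Because the roots are inside the unit circle here, the later weights shrink geometrically, which is the convergence the text requires.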
Since the {ε_t} sequence is assumed to be a white noise process, the variance of y_t is constant and time-independent:

Var(y_t) = E[(ψ_0 ε_t + ψ_1 ε_{t-1} + ψ_2 ε_{t-2} + ψ_3 ε_{t-3} + ...)²] = σ² Σ_{i=0}^{∞} ψ_i²

Var(y_{t-s}) = E[(ψ_0 ε_{t-s} + ψ_1 ε_{t-s-1} + ψ_2 ε_{t-s-2} + ψ_3 ε_{t-s-3} + ...)²] = σ² Σ_{i=0}^{∞} ψ_i²

Hence, Var(y_t) = Var(y_{t-s}) for all t and s
Finally, note,

Cov(y_t, y_{t-1}) = E[(ε_t + ψ_1 ε_{t-1} + ψ_2 ε_{t-2} + ψ_3 ε_{t-3} + ...)(ε_{t-1} + ψ_1 ε_{t-2} + ψ_2 ε_{t-3} + ψ_3 ε_{t-4} + ...)]
= σ²(ψ_1 + ψ_2 ψ_1 + ψ_3 ψ_2 + ...)

Cov(y_t, y_{t-2}) = E[(ε_t + ψ_1 ε_{t-1} + ψ_2 ε_{t-2} + ψ_3 ε_{t-3} + ...)(ε_{t-2} + ψ_1 ε_{t-3} + ψ_2 ε_{t-4} + ψ_3 ε_{t-5} + ...)]
= σ²(ψ_2 + ψ_3 ψ_1 + ψ_4 ψ_2 + ...)

From the above pattern, it is clear that the s-th autocovariance, γ_s, is given by

γ_s = Cov(y_t, y_{t-s}) = σ²(ψ_s + ψ_{s+1} ψ_1 + ψ_{s+2} ψ_2 + ...) (18)

Thus, the s-th autocovariance, γ_s, is constant and independent of t
Conversely, if the characteristic roots of (16) do not lie within the unit circle, the {ψ_i} sequence will not be convergent, and hence, the {y_t} sequence cannot be convergent

Stationarity Restrictions for the Moving Average Coefficients

Next, we look at the conditions ensuring the stationarity of a pure MA(∞) process:

x_t = Σ_{i=0}^{∞} β_i ε_{t-i}

where ε_t ~ WN(0, σ²). We have already determined that {x_t} is not a white noise process; now the issue is whether {x_t} is covariance stationary. Given conditions (7), (8), and (9), we ask the following:
1. Is the mean finite and time-independent?

E(x_t) = E(ε_t + β_1 ε_{t-1} + β_2 ε_{t-2} + ...)
= Eε_t + β_1 Eε_{t-1} + β_2 Eε_{t-2} + ... = 0

Repeating the calculation with x_{t-s}, we obtain

E(x_{t-s}) = E(ε_{t-s} + β_1 ε_{t-s-1} + β_2 ε_{t-s-2} + ...)
= Eε_{t-s} + β_1 Eε_{t-s-1} + β_2 Eε_{t-s-2} + ... = 0

Hence, all elements in the {x_t} sequence have the same finite mean (μ = 0).

2. Is the variance finite and time-independent?

Var(x_t) = E[(ε_t + β_1 ε_{t-1} + β_2 ε_{t-2} + ...)²]
= E(ε_t)² + (β_1)² E(ε_{t-1})² + (β_2)² E(ε_{t-2})² + ... [since Eε_t ε_{t-s} = 0 for s ≠ 0]
= σ²[1 + (β_1)² + (β_2)² + ...]
Therefore, a necessary condition for Var(x_t) to be finite is that Σ_{i=0}^{∞} (β_i)² be finite.

Repeating the calculation with x_{t-s} yields

Var(x_{t-s}) = E[(ε_{t-s} + β_1 ε_{t-s-1} + β_2 ε_{t-s-2} + ...)²]
= E(ε_{t-s})² + (β_1)² E(ε_{t-s-1})² + (β_2)² E(ε_{t-s-2})² + ... [since Eε_{t-s} ε_{t-s-i} = 0 for i ≠ 0]
= σ²[1 + (β_1)² + (β_2)² + ...]

Thus, if Σ_{i=0}^{∞} (β_i)² is finite, then Var(x_t) = Var(x_{t-s}) for all t and t − s, and hence, all elements in the {x_t} sequence have the same finite variance.

3. Are all autocovariances finite and time-independent?
The s-th autocovariance, γ_s, is given by

γ_s = Cov(x_t, x_{t-s}) = E(x_t x_{t-s})
= E(ε_t + β_1 ε_{t-1} + β_2 ε_{t-2} + ...)(ε_{t-s} + β_1 ε_{t-s-1} + β_2 ε_{t-s-2} + ...)
= σ²(β_s + β_{s+1} β_1 + β_{s+2} β_2 + ...)

Therefore, for γ_s to be finite, the sum β_s + β_{s+1} β_1 + β_{s+2} β_2 + ... must be finite.

In summary, the necessary and sufficient conditions for an MA(∞) process to be stationary are that the sums

(i) β_0² + β_1² + β_2² + ..., and
(ii) β_s + β_{s+1} β_1 + β_{s+2} β_2 + ...

be finite.

However, since (ii) must hold for all values of s ≥ 0, and β_0 = 1, condition (i) is redundant.
Stationarity Restrictions for the Autoregressive Coefficients

Now consider the pure autoregressive model of order p:

y_t = a_0 + Σ_{i=1}^{p} a_i y_{t-i} + ε_t. (19)

If the characteristic roots of the homogeneous equation of (19) all lie inside the unit circle, we can write the particular solution as

y_t = a_0/(1 − Σ_{i=1}^{p} a_i) + Σ_{i=0}^{∞} ψ_i ε_{t-i} (20)

where ψ_0 = 1 and {ψ_i, i ≥ 1} are undetermined coefficients. We know that (20) is a convergent sequence so long as the characteristic roots of (19) are inside the unit circle. We also know that the sequence {ψ_i} will solve the difference equation

ψ_i − a_1 ψ_{i-1} − a_2 ψ_{i-2} − ... − a_p ψ_{i-p} = 0. (21)

If the characteristic roots of (21) are all inside the unit circle, the {ψ_i} sequence will be convergent.
Although (20) is an infinite-order moving average process, the convergence of the MA coefficients implies that Σ_{i=0}^{∞} ψ_i² is finite. Thus, we can use (20) to check the three conditions of stationarity.

Ey_t = Ey_{t-s} = a_0/(1 − Σ_{i=1}^{p} a_i)

A necessary condition for all characteristic roots to lie inside the unit circle is 1 − Σ_{i=1}^{p} a_i > 0. Hence, the mean of the sequence is finite and time-invariant.

Var(y_t) = E[(ε_t + ψ_1 ε_{t-1} + ψ_2 ε_{t-2} + ...)²]
= E(ε_t)² + (ψ_1)² E(ε_{t-1})² + (ψ_2)² E(ε_{t-2})² + ...
= σ²[1 + (ψ_1)² + (ψ_2)² + ...]
= σ² Σ_{i=0}^{∞} ψ_i²
Similarly,

Var(y_{t-s}) = E[(ε_{t-s} + ψ_1 ε_{t-s-1} + ψ_2 ε_{t-s-2} + ...)²]
= E(ε_{t-s})² + (ψ_1)² E(ε_{t-s-1})² + (ψ_2)² E(ε_{t-s-2})² + ...
= σ²[1 + (ψ_1)² + (ψ_2)² + ...]
= σ² Σ_{i=0}^{∞} ψ_i²

Thus, if Σ_{i=0}^{∞} (ψ_i)² is finite, then Var(y_t) = Var(y_{t-s}) for all t and t − s, and hence, all elements in the {y_t} sequence have the same finite variance.

Finally, let us look at the s-th autocovariance, γ_s, which is given by

γ_s = Cov(y_t, y_{t-s}) = E(y_t y_{t-s})
= E(ε_t + ψ_1 ε_{t-1} + ψ_2 ε_{t-2} + ...)(ε_{t-s} + ψ_1 ε_{t-s-1} + ψ_2 ε_{t-s-2} + ...)
= σ²(ψ_s + ψ_{s+1} ψ_1 + ψ_{s+2} ψ_2 + ...)
Therefore, for γ_s to be finite, the sum ψ_s + ψ_{s+1} ψ_1 + ψ_{s+2} ψ_2 + ... must be finite.

Nothing of substance is changed by combining the AR(p) and MA(q) models into the general ARMA(p,q) model:

y_t = a_0 + Σ_{i=1}^{p} a_i y_{t-i} + x_t
x_t = Σ_{i=0}^{q} β_i ε_{t-i}. (22)

If the roots of the inverse characteristic equation lie outside the unit circle [that is, if the roots of the homogeneous form of (22) lie inside the unit circle] and if the {x_t} sequence is stationary, the {y_t} sequence will be stationary. Consider

y_t = a_0/(1 − Σ_{i=1}^{p} a_i) + ε_t/(1 − Σ_{i=1}^{p} a_i L^i) + β_1 ε_{t-1}/(1 − Σ_{i=1}^{p} a_i L^i) + β_2 ε_{t-2}/(1 − Σ_{i=1}^{p} a_i L^i) + ... (23)
Each of the expressions on the right-hand side of (23) is stationary as long as the roots of 1 − Σ_{i=1}^{p} a_i L^i are outside the unit circle

Given that {x_t} is stationary, only the roots of the autoregressive portion of (22) determine whether the {y_t} sequence is stationary

5. The Autocorrelation Function

The autocovariances and autocorrelations of the type found in (18) serve as useful tools in the Box-Jenkins approach to identifying and estimating time series models. Illustrated below are four important examples: the AR(1), AR(2), MA(1), and ARMA(1,1) models.

The Autocorrelation Function of an AR(1) Process

For an AR(1) model, y_t = a_0 + a_1 y_{t-1} + ε_t, (14) shows
γ_0 = σ²/[1 − (a_1)²]
γ_s = σ²(a_1)^s/[1 − (a_1)²].

Now dividing γ_s by γ_0 gives the autocorrelation function (ACF) at lag s: ρ_s = γ_s/γ_0. Thus, we find that

ρ_0 = 1, ρ_1 = a_1, ρ_2 = (a_1)², ..., ρ_s = (a_1)^s.

A necessary condition for an AR(1) process to be stationary is that |a_1| < 1

Thus, the plot of ρ_s against s, called the correlogram, should converge to zero geometrically if the series is stationary
If a_1 is positive, convergence will be direct, and if a_1 is negative, the correlogram will follow a damped oscillatory path around zero

The first two graphs on the left-hand side of Figure 2.2 show the theoretical autocorrelation function for a_1 = 0.7 and a_1 = −0.7, respectively

In these diagrams, ρ_0 is not shown since its value is necessarily equal to one

The Autocorrelation Function of an AR(2) Process

We now consider the AR(2) process y_t = a_1 y_{t-1} + a_2 y_{t-2} + ε_t (with a_0 omitted since this intercept term has no effect on the ACF). For the AR(2) to be stationary, we know that it is necessary to restrict the roots of the second-order lag
polynomial (1 − a_1 L − a_2 L²) to be outside the unit circle. In Section 4, we derived the autocovariances of an ARMA(2,1) process by use of the method of undetermined coefficients. Now we use an alternative technique known as the Yule-Walker equations. Multiply the second-order difference equation by y_t, y_{t-1}, y_{t-2}, ..., y_{t-s} and take expectations. This yields

Ey_t y_t = a_1 Ey_{t-1} y_t + a_2 Ey_{t-2} y_t + Eε_t y_t
Ey_t y_{t-1} = a_1 Ey_{t-1} y_{t-1} + a_2 Ey_{t-2} y_{t-1} + Eε_t y_{t-1}
Ey_t y_{t-2} = a_1 Ey_{t-1} y_{t-2} + a_2 Ey_{t-2} y_{t-2} + Eε_t y_{t-2}
...
Ey_t y_{t-s} = a_1 Ey_{t-1} y_{t-s} + a_2 Ey_{t-2} y_{t-s} + Eε_t y_{t-s} (24)

By definition, the autocovariances of a stationary series are such that Ey_t y_{t-s} = Ey_{t-s} y_t = Ey_{t-k} y_{t-k-s} = γ_s. We also know that Eε_t y_t = σ² and Eε_t y_{t-s} = 0 for s > 0. Hence, we can use equations (24) to form

γ_0 = a_1 γ_1 + a_2 γ_2 + σ² (25)
γ_1 = a_1 γ_0 + a_2 γ_1 (26)
γ_s = a_1 γ_{s-1} + a_2 γ_{s-2} (27)
Dividing (26) and (27) by γ_0 yields

ρ_1 = a_1 ρ_0 + a_2 ρ_1 (28)
ρ_s = a_1 ρ_{s-1} + a_2 ρ_{s-2} (29)

We know that ρ_0 = 1. So, from (28), we have ρ_1 = a_1/(1 − a_2). Hence, we can find all ρ_s for s ≥ 2 by solving the difference equation (29). For example, for s = 2 and s = 3,

ρ_2 = (a_1)²/(1 − a_2) + a_2
ρ_3 = a_1[(a_1)²/(1 − a_2) + a_2] + a_2 a_1/(1 − a_2)

Given the solutions for ρ_0 and ρ_1, the key point to note is that the ρ_s all satisfy the difference equation (29)

The solution may be oscillatory or direct

Note that the stationarity condition for y_t necessitates that the characteristic roots of (29) lie inside the unit circle
Hence, the {ρ_s} sequence must be convergent

The correlogram for an AR(2) process must be such that ρ_0 = 1 and ρ_1 is determined by (28)

These two values can be viewed as the initial values for the second-order difference equation (29)

The fourth panel on the left-hand side of Figure 2.2 shows the ACF for the process y_t = 0.7y_{t-1} − 0.49y_{t-2} + ε_t

The properties of the various ρ_s follow directly from the homogeneous equation y_t − 0.7y_{t-1} + 0.49y_{t-2} = 0
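Equations (28) and (29) give a simple recursion for the whole ACF. A sketch applied to this example process (the function name is ours):

```python
def ar2_acf(a1, a2, nlags):
    """ACF of an AR(2): rho_0 = 1, rho_1 = a1/(1 - a2),
    then rho_s = a1*rho_{s-1} + a2*rho_{s-2}, per eqs. (28)-(29)."""
    rho = [1.0, a1 / (1 - a2)]
    for s in range(2, nlags + 1):
        rho.append(a1 * rho[s - 1] + a2 * rho[s - 2])
    return rho

# The text's example: y_t = 0.7*y_{t-1} - 0.49*y_{t-2} + eps_t
rho = ar2_acf(0.7, -0.49, 12)   # oscillates and damps toward zero
```

Because the characteristic roots here are complex with modulus 0.7, the computed autocorrelations alternate in sign while shrinking, matching the damped oscillatory correlogram described above.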
The roots are obtained as

α = {0.7 ± [(0.7)² − 4(0.49)]^{1/2}}/2

Since the discriminant d = (0.7)² − 4(0.49) is negative, the characteristic roots are imaginary, so the solution oscillates

However, since a_2 = −0.49, the modulus of the complex roots is (0.49)^{1/2} = 0.7 < 1, so the solution is convergent and {y_t} is stationary

Finally, we may wish to find the autocovariances, γ_s. Since we know all the autocorrelations, if we can find the variance of y_t, that is, γ_0, we can find all of the others.

Since ρ_i = γ_i/γ_0, from (25) we have

γ_0 = a_1(ρ_1 γ_0) + a_2(ρ_2 γ_0) + σ²
γ_0(1 − a_1 ρ_1 − a_2 ρ_2) = σ²
γ_0 = σ²/(1 − a_1 ρ_1 − a_2 ρ_2)
Substituting for ρ_1 and ρ_2 yields

γ_0 = Var(y_t) = (1 − a_2)σ²/[(1 + a_2)(a_1 + a_2 − 1)(a_2 − a_1 − 1)].

The Autocorrelation Function of an MA(1) Process

Next consider the MA(1) process: y_t = ε_t + βε_{t-1}. Again, we can obtain the Yule-Walker equations by multiplying y_t by each y_{t-s}, s = 0, 1, 2, ... and taking expectations. This yields

γ_0 = Var(y_t) = Ey_t y_t = E[(ε_t + βε_{t-1})(ε_t + βε_{t-1})] = (1 + β²)σ²
γ_1 = Ey_t y_{t-1} = E[(ε_t + βε_{t-1})(ε_{t-1} + βε_{t-2})] = βσ²
...
γ_s = Ey_t y_{t-s} = E[(ε_t + βε_{t-1})(ε_{t-s} + βε_{t-s-1})] = 0 for all s > 1
Dividing each γ_s by γ_0, it can be seen that the ACF is simply

ρ_0 = 1, ρ_1 = β/(1 + β²), and ρ_s = 0 for all s > 1.

The third graph on the left-hand side of Figure 2.2 shows the ACF for the MA(1) process y_t = ε_t − 0.7ε_{t-1}

You saw above that for an MA(1) process, ρ_s = 0 for all s > 1.

As an easy exercise, convince yourself that, for an MA(2) process, ρ_s = 0 for all s > 2; for an MA(3) process, ρ_s = 0 for all s > 3; and so on.
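The cutoff property can be computed directly from γ_s = σ²(β_s + β_{s+1}β_1 + ...) with β_0 = 1. A small helper, our own, that works for any finite MA(q):

```python
def ma_acf(betas, nlags):
    """ACF of y_t = eps_t + beta_1*eps_{t-1} + ... + beta_q*eps_{t-q}."""
    b = [1.0] + list(betas)                  # beta_0 normalized to one
    g0 = sum(v * v for v in b)               # gamma_0 / sigma^2
    acf = [1.0]
    for s in range(1, nlags + 1):
        g = sum(b[i + s] * b[i] for i in range(len(b) - s)) if s < len(b) else 0.0
        acf.append(g / g0)
    return acf

rho = ma_acf([-0.7], 5)    # the text's MA(1): rho_1 = -0.7/1.49, zero beyond lag 1
```

Calling it with two coefficients, e.g. ma_acf([0.5, 0.25], 4), shows the MA(2) cutoff after lag 2 in the same way.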
The Autocorrelation Function of an ARMA(1,1) Process

Finally, consider the ARMA(1,1) process: y_t = a_1 y_{t-1} + ε_t + β_1 ε_{t-1}. Using the now-familiar procedure, the Yule-Walker equations are:

Ey_t y_t = a_1 Ey_{t-1} y_t + Eε_t y_t + β_1 Eε_{t-1} y_t
→ γ_0 = a_1 γ_1 + σ² + β_1(a_1 + β_1)σ² (30)

Ey_t y_{t-1} = a_1 Ey_{t-1} y_{t-1} + Eε_t y_{t-1} + β_1 Eε_{t-1} y_{t-1}
→ γ_1 = a_1 γ_0 + β_1 σ² (31)

Ey_t y_{t-2} = a_1 Ey_{t-1} y_{t-2} + Eε_t y_{t-2} + β_1 Eε_{t-1} y_{t-2}
→ γ_2 = a_1 γ_1 (32)
...
Ey_t y_{t-s} = a_1 Ey_{t-1} y_{t-s} + Eε_t y_{t-s} + β_1 Eε_{t-1} y_{t-s}
→ γ_s = a_1 γ_{s-1}. (33)

Solving (30) and (31) simultaneously for γ_0 and γ_1 yields
γ_0 = (1 + β_1² + 2a_1 β_1)σ²/(1 − a_1²), and
γ_1 = (1 + a_1 β_1)(a_1 + β_1)σ²/(1 − a_1²).

Hence,

ρ_1 = (1 + a_1 β_1)(a_1 + β_1)/(1 + β_1² + 2a_1 β_1) (34)

and ρ_s = a_1 ρ_{s-1} for all s ≥ 2.

Thus, the ACF for an ARMA(1,1) process is such that the magnitude of ρ_1 depends on both a_1 and β_1. Beginning with this value of ρ_1, the ACF of an ARMA(1,1) process looks like that of the AR(1) process. If 0 < a_1 < 1, convergence will be direct, and if −1 < a_1 < 0, the autocorrelations will oscillate. The ACF for the function y_t = −0.7y_{t-1} + ε_t − 0.7ε_{t-1} is shown as the last graph on the left-hand side of Figure 2.2.
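Equation (34) plus the recursion ρ_s = a_1ρ_{s-1} pin down the whole ACF. A sketch (function name ours); for illustration take a_1 = β_1 = −0.7:

```python
def arma11_acf(a1, b1, nlags):
    """ACF of y_t = a1*y_{t-1} + eps_t + b1*eps_{t-1}:
    rho_1 from eq. (34), then rho_s = a1*rho_{s-1} for s >= 2."""
    rho = [1.0, (1 + a1 * b1) * (a1 + b1) / (1 + b1 ** 2 + 2 * a1 * b1)]
    for s in range(2, nlags + 1):
        rho.append(a1 * rho[s - 1])
    return rho

rho = arma11_acf(-0.7, -0.7, 8)   # oscillating, AR(1)-like decay after lag 1
```

With a negative a_1 the computed autocorrelations alternate in sign from lag to lag, which is the oscillation the text describes.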
From the above, you should be able to recognize that the correlogram can reveal the pattern of the autoregressive coefficients

For an ARMA(p,q) model, beginning after lag q, the values of ρ_i will satisfy

ρ_i = a_1 ρ_{i-1} + a_2 ρ_{i-2} + ... + a_p ρ_{i-p}.

6. The Partial Autocorrelation Function

In an AR(1) process, y_t and y_{t-2} are correlated even though y_{t-2} does not directly appear in the model
The most direct way to find the partial autocorrelation function is to first form the series {y*_t} by subtracting the mean of the series (i.e., μ) from each observation: y*_t ≡ y_t − μ

Next, form the first-order autoregression

y*_t = φ_{11} y*_{t-1} + e_t

where e_t is the regression error term, which need not be a white noise process

Since there are no intervening values, φ_{11} is both the autocorrelation and the partial autocorrelation between y_t and y_{t-1}

Now form the second-order autoregression

y*_t = φ_{21} y*_{t-1} + φ_{22} y*_{t-2} + e_t

Here φ_{22} is the partial autocorrelation coefficient between y_t and y_{t-2}
In other words, φ_{22} is the correlation between y_t and y_{t-2} controlling for (i.e., netting out) the effect of y_{t-1}

Repeating the process for all additional lags s yields the partial autocorrelation function (PACF)

Using the Yule-Walker equations, one can form the partial autocorrelations from the autocorrelations as

φ_{11} = ρ_1 (35)

φ_{22} = (ρ_2 − ρ_1²)/(1 − ρ_1²) (36)

and for additional lags,

φ_{ss} = (ρ_s − Σ_{j=1}^{s-1} φ_{s-1,j} ρ_{s-j}) / (1 − Σ_{j=1}^{s-1} φ_{s-1,j} ρ_j) (37)

where φ_{sj} = φ_{s-1,j} − φ_{ss} φ_{s-1,s-j}, j = 1, 2, 3, ..., s − 1.
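Equations (35)-(37) form a recursion (often called the Durbin-Levinson recursion) that converts autocorrelations into partial autocorrelations. A sketch (function name ours):

```python
def pacf_from_acf(rho):
    """Partial autocorrelations phi_{ss} from rho = [rho_1, rho_2, ...],
    via eqs. (35)-(37)."""
    phi = {}                               # phi[(s, j)] coefficients
    out = []
    for s in range(1, len(rho) + 1):
        if s == 1:
            p = rho[0]                     # eq. (35)
        else:
            num = rho[s - 1] - sum(phi[(s - 1, j)] * rho[s - 1 - j] for j in range(1, s))
            den = 1.0 - sum(phi[(s - 1, j)] * rho[j - 1] for j in range(1, s))
            p = num / den                  # eq. (37); s = 2 reduces to eq. (36)
        phi[(s, s)] = p
        for j in range(1, s):
            phi[(s, j)] = phi[(s - 1, j)] - p * phi[(s - 1, s - j)]
        out.append(p)
    return out

# AR(1) with a1 = 0.6: rho_s = 0.6^s, so the PACF cuts off after lag 1
pacf = pacf_from_acf([0.6, 0.36, 0.216, 0.1296])
```

Feeding in the ACF of an AR(1) yields a single spike at lag 1 and zeros thereafter, illustrating the cutoff property discussed next.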
For an AR(p) process, there is no direct correlation between y_t and y_{t-s} for s > p

Hence, for s > p, all values of φ_{ss} will be zero, and the PACF for a pure AR(p) process should cut off to zero for all lags greater than p

In contrast, consider the PACF of an MA(1) process: y_t = ε_t + βε_{t-1}

As long as |β| < 1, we can write y_t/(1 + βL) = ε_t, which we know has the AR(∞) representation

y_t − βy_{t-1} + β²y_{t-2} − β³y_{t-3} + ... = ε_t.

Therefore, the PACF will not jump to zero, since y_t will be correlated with all of its own lags
Instead, the PACF coefficients exhibit a geometrically decaying pattern

If β < 0, decay is direct, and if β > 0, the PACF coefficients oscillate

The right-hand side of the fifth panel in Figure 2.2 shows the PACF for the ARMA(1,1) model:

y_t = −0.7y_{t-1} + ε_t − 0.7ε_{t-1}

More generally, the PACF of a stationary ARMA(p,q) process must ultimately decay toward zero beginning at lag p

The decay pattern depends on the coefficients of the lag polynomial (1 + β_1 L + β_2 L² + ... + β_q L^q)
Table 2.1 summarizes some of the properties of the ACF and PACF for various ARMA processes. For stationary processes, the key points to note are the following:

The ACF of an ARMA(p,q) process will begin to decay after lag q. After lag q, the coefficients of the ACF (i.e., the ρ_i) will satisfy the difference equation ρ_i = a_1 ρ_{i-1} + a_2 ρ_{i-2} + ... + a_p ρ_{i-p}. Since the characteristic roots are inside the unit circle, the autocorrelations will decay after lag q. Moreover, the pattern of the autocorrelation coefficients will mimic that suggested by the characteristic roots.

The PACF of an ARMA(p,q) process will begin to decay after lag p. After lag p, the coefficients of the PACF (i.e., the φ_{ss}) will mimic the ACF coefficients from the model y_t/(1 + β_1 L + β_2 L² + ... + β_q L^q).
We can illustrate the usefulness of the ACF and PACF using the model y_t = a_0 + 0.7y_{t-1} + ε_t. If we compare the top two graphs in Figure 2.2, the ACF shows the monotonic decay of the autocorrelations, while the PACF exhibits a single spike at lag 1. Suppose a researcher collected sample data and plotted the ACF and PACF. If the actual patterns compared favorably to the theoretical patterns, the researcher might try to fit an AR(1) model. Conversely, if the ACF exhibited a single spike and the PACF exhibited monotonic decay, the researcher might try an MA(1) model.

7. Sample Autocorrelations of Stationary Time Series

Let there be T observations y_1, y_2, ..., y_T. If the data series is stationary, we can use the sample mean ȳ, sample variance σ̂², and
sample autocorrelations r_s as estimates of the population mean μ, population variance σ², and population autocorrelations ρ_s, respectively, where

ȳ = (1/T) Σ_{t=1}^{T} y_t (38)

σ̂² = (1/T) Σ_{t=1}^{T} (y_t − ȳ)² (39)

and, for s = 1, 2, ...,

r_s = Σ_{t=s+1}^{T} (y_t − ȳ)(y_{t-s} − ȳ) / Σ_{t=1}^{T} (y_t − ȳ)². (40)
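Equations (38)-(40) translate directly into code. A sketch (the function name is ours):

```python
def sample_acf(y, nlags):
    """Sample autocorrelations r_s of eqs. (38)-(40)."""
    T = len(y)
    ybar = sum(y) / T                                   # eq. (38)
    denom = sum((v - ybar) ** 2 for v in y)             # T * sigma-hat^2, eq. (39)
    r = []
    for s in range(1, nlags + 1):
        num = sum((y[t] - ybar) * (y[t - s] - ybar) for t in range(s, T))
        r.append(num / denom)                           # eq. (40)
    return r
```

For the toy series y = [1, 2, 3, 4], working (40) by hand gives r_1 = 1.25/5 = 0.25, which the function reproduces.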
The sample ACF and PACF can be compared to the theoretical ACF and PACF to identify the actual data-generating process. If the true value of ρ_s = 0, that is, if the true data-generating process is MA(s−1), the sampling variance of
r_s is given by

Var(r_s) = T⁻¹ for s = 1
= T⁻¹(1 + 2 Σ_{j=1}^{s-1} r_j²) for s > 1 (41)

If T is large, r_s is distributed normally with mean zero. For the PACF coefficients, under the null hypothesis of an AR(p) model, that is, under the null that all φ_{p+i,p+i} are zero, the variance of the estimated φ_{p+i,p+i} is approximately T⁻¹.

We can test for the significance of the sample ACF and sample PACF using (41). For example, if we use a 95% confidence interval (i.e., 2 standard deviations), and the calculated value of r_1 exceeds 2T^{-1/2}, it is possible to reject the null hypothesis that the first-order autocorrelation is zero. Rejecting this hypothesis means rejecting an MA(s − 1) = MA(0) process and accepting the alternative q > 0. Next, try s = 2.
Then Var(r_2) = (1 + 2r_1²)/T. If r_1 = 0.5 and T = 100, then Var(r_2) = 0.015 and SD(r_2) = 0.123. Thus, if the calculated value of r_2 exceeds 2(0.123), it is possible to reject the null hypothesis H_0: ρ_2 = 0. Again, rejecting the null means accepting the alternative that q > 1. Proceeding in this way, it is possible to identify the order of the process.

Box and Pierce (1970) developed the Q-statistic to test whether a group of autocorrelations is significantly different from zero. Under the null hypothesis H_0: ρ_1 = ρ_2 = ... = ρ_s = 0, the statistic

Q = T Σ_{k=1}^{s} r_k²

is asymptotically distributed as a χ² with s degrees of freedom
The intuition behind the use of this statistic is that large sample autocorrelations will lead to large values of Q, while a white noise process (in which the autocorrelations at all lags should be zero) would have a Q value of zero

Thus, if the calculated value of Q exceeds the appropriate value in a χ² table, we can reject the null of no significant autocorrelations

Rejecting the null means accepting the alternative that at least one autocorrelation is non-zero

A problem with the Box-Pierce Q-statistic is that it works poorly even in moderately large samples
Remedy: the modified Q-statistic of Ljung and
Box (1978):

Q = T(T + 2) Σ_{k=1}^{s} rk²/(T − k)    (42)

If the sample value of Q from (42) exceeds
the critical value of χ² with s degrees of
freedom, then at least one value of rk is
statistically significantly different from zero
at the specified significance level.

The Box-Pierce and Ljung-Box Q-statistics
also serve as a check to see whether the residuals
from an estimated ARMA(p, q) model
behave as a white noise process.
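As a small sketch of equation (42) (the helper name is mine, not from the text):

```python
# Sketch: the Ljung-Box statistic of equation (42).
def ljung_box_q(r, n):
    """Q = T(T+2) * sum_{k=1..s} r_k^2/(T-k), with r = [r_1, ..., r_s]."""
    return n * (n + 2) * sum(rk ** 2 / (n - k)
                             for k, rk in enumerate(r, start=1))

# With T = 100 and small autocorrelations, Q stays far below the 5%
# chi-square critical value for s = 3 degrees of freedom (7.81):
print(round(ljung_box_q([0.05, -0.03, 0.02], 100), 3))  # -> 0.393
```

In practice the same routine is applied to the residual autocorrelations of a fitted model, with the degrees of freedom adjusted as described next.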
However, when the s autocorrelations from
an estimated ARMA(p, q) model are formed,
the degrees of freedom are reduced by the
number of estimated coefficients.

Hence, using the residuals of an ARMA(p, q)
model, Q has a χ² distribution with s − p − q
degrees of freedom (if a constant is included
in the estimation, the degrees of
freedom are s − p − q − 1).
Model Selection Criteria
A natural question to ask of any estimated
model is: How well does it fit the data?
The larger the lag orders p and/or q, the
smaller is the sum of squares of the esti-
mated residuals of the fitted model
However, adding such lags entails estimation
of additional coefficients and an associated
loss of degrees of freedom.

Moreover, inclusion of extraneous coefficients
will reduce the forecasting performance
of the fitted model.

Thus, increasing the lag lengths p and/or
q involves both benefits and costs.

If we choose a lag order that is lower than
necessary, we will omit valuable information
contained in the more distant lags and, thus,
will underfit the model.

If we choose a lag order that is higher than
necessary, we will overfit the model, estimate
extraneous coefficients, and inject additional
estimation error into our forecasts.
Model selection criteria attempt to choose
the most parsimonious model, selecting the
lag orders p and/or q by balancing the
benefit of a reduced sum of squared residuals
due to additional lags against the cost of
additional estimation error.

The two most commonly used model selection
criteria are the Akaike Information Criterion
(AIC) and the Schwarz Bayesian Criterion (SBC):

AIC = T ln(SSR) + 2n
SBC = T ln(SSR) + n ln(T)

where n = number of parameters estimated
          (p + q + possible constant term),
      T = number of observations,
      SSR = sum of squared residuals.
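The SSR-based forms above can be sketched directly (helper names are mine, not from the text):

```python
import math

# Sketch: the SSR-based AIC and SBC defined above.
def aic(ssr, n_params, n_obs):
    return n_obs * math.log(ssr) + 2 * n_params

def sbc(ssr, n_params, n_obs):
    return n_obs * math.log(ssr) + n_params * math.log(n_obs)

# An extra parameter must cut SSR enough to pay for itself; here it
# does not, so both criteria prefer the smaller model (lower is better):
print(aic(85.1, 2, 100) < aic(84.9, 3, 100))  # -> True
print(sbc(85.1, 2, 100) < sbc(84.9, 3, 100))  # -> True
```

Note that the SBC penalty n ln(T) exceeds the AIC penalty 2n whenever T > e², so the SBC tends to select the more parsimonious model.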
Estimation of an AR(1) Model

Beginning with t = 1, 100 values of {yt} are
generated using the AR(1) process yt = 0.7yt−1 +
εt, with the initial condition y0 = 0. The upper
left graph of Figure 2.3 shows the sample
ACF and the upper right graph shows the
sample PACF of this AR(1) process. It is
important that you compare these ACF and PACF
to those of the theoretical processes shown in
Figure 2.2.
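A similar experiment can be replicated as follows (a sketch; the draws depend on the seed, so the numbers differ from the book's):

```python
import random

# Sketch: 100 values of y_t = 0.7*y_{t-1} + eps_t with y_0 = 0, then
# the first three sample autocorrelations.
random.seed(0)
y, prev = [], 0.0
for _ in range(100):
    prev = 0.7 * prev + random.gauss(0.0, 1.0)
    y.append(prev)

m = sum(y) / len(y)
c0 = sum((v - m) ** 2 for v in y)
r = [sum((y[t] - m) * (y[t - k] - m) for t in range(k, len(y))) / c0
     for k in range(1, 4)]
print([round(rk, 2) for rk in r])  # compare with the theoretical 0.7, 0.49, 0.343
```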
In practice, we never know the true data gen-
erating process. However, suppose we were
presented with those 100 sample values and
were asked to uncover the true process.
The first step might be to compare the sample
ACF and PACF to those of the various
theoretical models. The decaying pattern of the
ACF and the single large spike at lag 1 in the
sample PACF suggest an AR(1) model. The
first three sample autocorrelations are r1 =
0.74, r2 = 0.58, and r3 = 0.47 (which are
somewhat greater than the corresponding
theoretical autocorrelations of 0.7, 0.49, and 0.343).
In the PACF, there is a sizeable spike of 0.74
at lag 1, and all other partial autocorrelations
(except for lag 12) are very small.

Under the null hypothesis of an MA(0) process,
the standard deviation of r1 is T^(−1/2) =
0.1. Since the sample value of r1 = 0.74 is
more than seven standard deviations from
zero, we can reject the null hypothesis H0 :
ρ1 = 0.

The standard deviation of r2 is obtained
from (41) by taking s = 2:

Var(r2) = (1 + 2(0.74)²)/100 = 0.021.
Since (0.021)^(1/2) = 0.1449, the sample value
of r2 is more than 3 standard deviations
from zero; at conventional significance levels,
we can reject the null hypothesis H0 :
ρ2 = 0.

Similarly, we can test for the significance of
all other values of the sample autocorrelations.

It can be seen in the second panel of Figure
2.3 that, other than φ11, all partial autocorrelations
(except for lag 12) are less than 2T^(−1/2) = 0.2.
The decay of the ACF and the single spike
of the PACF give a strong indication of an AR(1)
model. Nevertheless, if we did not know the
true underlying process and happened to be
using monthly data, we might be concerned
with the significant partial autocorrelation at
lag 12. After all, with monthly data we might
expect some direct relationship between yt and
yt−12.
Although we know that the data were actually
generated from an AR(1) process, it is
illuminating to compare the estimates of two
different models. Suppose we estimate an AR(1)
model and also try to capture the spike at lag
12 with an MA coefficient. Thus, we can con-
sider the two tentative models as
Model 1: yt = a1yt−1 + εt
Model 2: yt = a1yt−1 + εt + β12εt−12
Table 2.2 reports the results of the two esti-
mations. The coefficient of Model 1 satisfies
the stability condition |a1| < 1 and has a low
standard error (the associated t-statistic for a
null of zero is more than 12). As a useful
diagnostic check, we plot the correlogram of the
residuals of the fitted model in Figure 2.4.
The Ljung-Box Q-statistics for these residuals
indicate that each one of the autocorrelations
is less than 2 standard deviations from zero.
The Q-statistics indicate that as a group, lags
1 through 8, 1 through 16, and 1 through 24
are not significantly different from zero.
This is strong evidence that the AR(1) model
fits the data well. If the residual
autocorrelations were significant, the AR(1)
model would not utilize all available information
concerning movements in the yt sequence. For
example, suppose we wanted to forecast yt+1
conditional on all available information up to and
including period t. With Model 1, the value of
yt+1 is yt+1 = a1yt + εt+1. Hence, the forecast
from Model 1 is:

Etyt+1 = Et(a1yt + εt+1)
       = Et(a1yt) + Et(εt+1)
       = a1yt.
If the residual autocorrelations had been
significant, this forecast would not capture all of
the available information.
Examining the results for Model 2, note that
both models yield similar estimates for the
first-order autoregressive coefficient and the
associated standard error. However, the estimate
for β12 is of poor quality; the insignificant t-value
suggests that it should be dropped from the
model. Moreover, comparing the AIC and the
SBC values of the two models suggests that
any benefit of a reduced sum of squared
residuals is overwhelmed by the detrimental effects
of estimating an additional parameter. All of
these indicators point to the choice of Model
1.
Estimation of an ARMA(1,1) Model
See ARMA(1,1) & Table 2.3 under Figures
& Tables in Chapter 2.
Estimation of an AR(2) Model
See AR(2) under Figures & Tables in Chap-
ter 2.
8. Box-Jenkins Model Selection
The estimates of the AR(1), ARMA(1,1) and
AR(2) models in the previous section illustrate
the Box-Jenkins (1976) strategy for appropriate
model selection. Box and Jenkins popularized
a three-stage method aimed at selecting
an appropriate model for the purpose of esti-
mating and forecasting a univariate time series.
In the identification stage, the researcher
visually examines the time plot of the series, the
autocorrelation function, and the partial
autocorrelation function. Plotting the time path
of the {yt} sequence provides useful informa-
tion concerning outliers, missing values, and
structural breaks in the data. Nonstationary
variables may have a pronounced trend or ap-
pear to meander without a constant long-run
mean or variance. Missing values and outliers
can be corrected at this point. Earlier, a stan-
dard practice was to first-difference any series
deemed to be nonstationary. Currently, a large
literature is evolving that develops formal
procedures to check for nonstationarity. We defer
this discussion until Chapter 4 and
assume that we are working with stationary
data. A comparison of the sample ACF and
sample PACF to those of various theoretical
ARMA processes may suggest several plausi-
ble models. In the estimation stage, each of
the tentative models is fit and the various ai
and βi coefficients are examined. In this second
stage, the estimated models are compared
using the following criteria.
Parsimony
A fundamental idea in the Box-Jenkins approach
is the principle of parsimony. Incorporating
additional coefficients will necessarily increase
fit (e.g., the value of R² will increase) at a
cost of reducing degrees of freedom. Box and
Jenkins argue that parsimonious models pro-
duce better forecasts than overparameterized
models. A parsimonious model fits the data
well without incorporating any needless coef-
ficients. The aim is to approximate the true
data generating process but not to pin down
the exact process. The goal of parsimony
suggested eliminating the MA(12) coefficient in
the simulated AR(1) model shown earlier.
In selecting an appropriate model, the econo-
metrician needs to be aware that several differ-
ent models may have similar properties. As an
extreme example, note that the AR(1) model
yt = 0.5yt−1 + εt
has the equivalent infinite-order moving-average
representation of
yt = εt + 0.5εt−1 + 0.25εt−2 + 0.125εt−3
     + 0.0625εt−4 + . . . .
In most samples, approximating this MA(∞)
process with an MA(2) or MA(3) model will
give a very good fit. However, the AR(1)
model is the more parsimonious model and is
preferred.
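The equivalence above comes from repeatedly substituting for lagged y: the MA(∞) weights of an AR(1) are simply the powers of a1. A minimal sketch (helper name is mine):

```python
# Sketch: the MA-infinity weights implied by the AR(1) y_t = a1*y_{t-1} + e_t
# are psi_j = a1**j, obtained by repeated substitution for lagged y.
def ma_weights(a1, n):
    """First n moving-average weights of the AR(1) with coefficient a1."""
    return [a1 ** j for j in range(n)]

print(ma_weights(0.5, 5))  # -> [1.0, 0.5, 0.25, 0.125, 0.0625]
```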
One also needs to be aware of the common
factor problem. Suppose we wanted to fit the
ARMA(2,3) model

(1 − a1L − a2L²)yt = (1 + β1L + β2L² + β3L³)εt.   (43)

Suppose that (1 − a1L − a2L²) and (1 + β1L +
β2L² + β3L³) can each be factored as (1 +
cL)(1 + aL) and (1 + cL)(1 + b1L + b2L²),
respectively. Since (1 + cL) is a common factor
to each, (43) has the equivalent, but more
parsimonious, form

(1 + aL)yt = (1 + b1L + b2L²)εt.   (44)

In order to ensure that the model is parsimonious,
the various ai and βi should all have
t-statistics of 2.0 or greater (so that each
coefficient is significantly different from zero at the
5% level). Moreover, the coefficients should
not be strongly correlated with each other.
Highly collinear coefficients are unstable;
usually one or more can be eliminated from the
model without reducing forecasting performance.
Stationarity and Invertibility
The distribution theory underlying the use of
the sample ACF and PACF as approximations
to those of the true data generating process
is based on the assumption of stationarity of
the yt sequence. Moreover, t-statistics and
Q-statistics also presume that the data are
stationary. The estimated autoregressive
coefficients should be consistent with this underlying
assumption. Hence, we should be suspicious of
an AR(1) model if the estimated value of a1
is close to unity. For an ARMA(2, q) model,
the characteristic roots of the estimated
polynomial (1 − a1L − a2L²) should be outside the
unit circle.
The Box-Jenkins methodology also necessitates
that the model be invertible. Formally, yt is
invertible if it can be represented by a finite-
order or convergent autoregressive process. In-
vertibility is important because the use of the
ACF and PACF implicitly assumes that the {yt}
sequence can be represented by an
autoregressive model. As a demonstration, consider the
simple MA(1) model
yt = εt − β1εt−1   (45)

so that, if |β1| < 1,

yt/(1 − β1L) = εt

or

yt + β1yt−1 + β1²yt−2 + β1³yt−3 + . . . = εt.   (46)

If |β1| < 1, (46) can be estimated using the
Box-Jenkins method. However, if |β1| ≥ 1,
the {yt} sequence cannot be represented by a
finite-order AR process, and thus, it is not
invertible. More generally, for an ARMA model
to have a convergent AR representation, the
roots of the polynomial (1 + β1L + β2L² + . . . +
βqLq) must lie outside the unit circle.
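The root condition is easy to check by hand for low-order MA polynomials. A sketch for the MA(1) and MA(2) cases (the helper is mine, not from the text):

```python
import cmath

# Sketch: invertibility requires every root of 1 + b1*z (+ b2*z^2)
# to lie outside the unit circle.
def is_invertible(b1, b2=0.0):
    if b2 == 0.0:
        if b1 == 0.0:
            return True                      # white noise: trivially invertible
        roots = [-1.0 / b1]                  # root of 1 + b1*z
    else:                                    # roots of b2*z^2 + b1*z + 1
        disc = cmath.sqrt(b1 * b1 - 4.0 * b2)
        roots = [(-b1 + disc) / (2.0 * b2), (-b1 - disc) / (2.0 * b2)]
    return all(abs(z) > 1.0 for z in roots)

print(is_invertible(0.5))   # root at -2: invertible -> True
print(is_invertible(1.0))   # root at -1, on the unit circle -> False
```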
We note that there is nothing improper about a
noninvertible model. The {yt} sequence implied
by yt = εt − εt−1 is stationary in that
it has a constant time-invariant mean [Eyt =
Eyt−s = 0], a constant time-invariant variance
[Var(yt) = Var(yt−s) = 2σ²], and the
autocovariances γ1 = −σ² and all other
γs = 0. The problem is that the technique does
not allow for the estimation of such models. If
β1 = 1, (46) becomes

yt + yt−1 + yt−2 + yt−3 + yt−4 + . . . = εt.

Clearly, the autocorrelations and partial
autocorrelations between yt and yt−s will never
decay.
Goodness of Fit
R² and the average of the residual sum of
squares are common measures of goodness of
fit in ordinary least squares.

AIC and SBC are more appropriate measures
of fit in time series models.
Caution must be exercised if estimates fail to
converge rapidly. Failure of rapid convergence
might indicate that the estimates are unstable.
In such circumstances, adding an additional
observation or two can greatly alter the
estimates.
The third stage of the Box-Jenkins methodology
involves diagnostic checking. The standard
practice is to plot the residuals to look
for outliers and for evidence of periods in which
the model does not fit the data well. If all plau-
sible ARMA models show evidence of a poor fit
during a reasonably long portion of the sample,
it is wise to consider using intervention analy-
sis, transfer function analysis, or any other of
the multivariate estimation methods, discussed
in later chapters. If the variance of the residual
is increasing, a logarithmic transformation may
be appropriate. Alternatively, we may wish to
actually model any tendency of the variance to
change using the ARCH techniques discussed
in Chapter 3.
It is particularly important that the residuals
from an estimated model be serially uncorre-
lated. Any evidence of serial correlation implies
a systematic movement in the {yt} sequence
that is not accounted for by the ARMA coeffi-
cients included in the model. Hence, any of the
tentative models yielding nonrandom residuals
should be eliminated from consideration. To
check for correlation in the residuals, construct
the ACF and the PACF of the residuals of the
estimated model. Then use (41) and (42) to
determine whether any or all of the residual
autocorrelations or partial autocorrelations are
statistically significant. Although there is no
significance level that is deemed most appro-
priate, be wary of any model yielding
(1) several residual correlations that are
marginally significant, and
(2) a Q-statistic that is barely significant at
the 10% level.

In such circumstances, it is usually possible to
formulate a better performing model. If there
are sufficient observations, fitting the same ARMA
model to each of two subsamples can provide
useful information concerning the validity of
the assumption that the data generating pro-
cess is unchanging. In the AR(2) model that
was estimated in the last section, the sam-
ple was split in half. In general, suppose you
estimated an ARMA(p, q) model using a sam-
ple of T observations. Denote the sum of the
squared residuals as SSR. Now divide the T
observations into two subsamples, with tm
observations in the first and tn = T − tm
observations in the second. Use each subsample
to estimate the two models:

yt = a0(1) + a1(1)yt−1 + . . . + ap(1)yt−p
     + εt + β1(1)εt−1 + . . . + βq(1)εt−q   [using t = 1, . . . , tm]

yt = a0(2) + a1(2)yt−1 + . . . + ap(2)yt−p
     + εt + β1(2)εt−1 + . . . + βq(2)εt−q   [using t = tm + 1, . . . , T].
Let the sum of the squared residuals from the
two models be, respectively, SSR1 and SSR2.
To test the restriction that all coefficients are
equal [i.e., a0(1) = a0(2) and a1(1) = a1(2)
and . . . ap(1) = ap(2) and β1(1) = β1(2) and
. . . βq(1) = βq(2)], conduct an F-test using

F = [(SSR − SSR1 − SSR2)/n] / [(SSR1 + SSR2)/(T − 2n)]   (47)

where n = number of parameters estimated
      = p + q + 1 (if an intercept is included)
      = p + q (if no intercept is included)

and the numbers of degrees of freedom are
(n, T − 2n).
Intuitively, if the coefficients are equal, that
is, if the restriction is not binding, then the
sum of squared residuals SSR from the re-
stricted model and the sum of squared residu-
als (SSR1+SSR2) from the unrestricted model
should be equal. Hence, F should be zero.
Conversely, if the restriction is binding, SSR should
exceed (SSR1 + SSR2). And the larger the
difference between SSR and (SSR1 + SSR2), and
thus the larger the calculated value of F, the
stronger is the evidence against the hypothesis
that the coefficients are equal.
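Equation (47) reduces to simple arithmetic once the three sums of squared residuals are in hand. A sketch (the helper name is mine, not from the text):

```python
# Sketch: the sample-split F-statistic of equation (47), comparing a
# full-sample fit against separate fits on two subsamples.
def split_sample_f(ssr, ssr1, ssr2, n, t):
    """F = [(SSR - SSR1 - SSR2)/n] / [(SSR1 + SSR2)/(T - 2n)],
    with (n, T - 2n) degrees of freedom."""
    return ((ssr - ssr1 - ssr2) / n) / ((ssr1 + ssr2) / (t - 2 * n))

# If the subsample fits barely improve on the full-sample fit, F is small:
print(round(split_sample_f(ssr=100.0, ssr1=49.0, ssr2=49.0, n=2, t=100), 3))  # -> 0.98
```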
Similarly, a model can be estimated over only a
portion of the data set. The estimated model
can then be used to forecast the known values
of the series. The sum of the squared forecast
errors is a useful way to compare the adequacy
of alternative models. Those models with poor
out-of-sample forecasts should be eliminated.
9. Properties of Forecasts
One of the most important uses of ARMA
models is to forecast future values of the {yt}
sequence. To simplify the following discussion,
it is assumed that the actual data-generating
process and the current and past realizations
of the {yt} and {εt} sequences are known to the
researcher. First, consider the forecasts of an
AR(1) model: yt = a0 + a1yt−1 + εt. Updating
one period, we obtain yt+1 = a0 + a1yt + εt+1.
If we know the coefficients a0 and a1, we can
forecast yt+1 conditioned on the information
available at period t as

Etyt+1 = a0 + a1yt   (48)

where the notation Etyt+j stands for the
conditional expectation of yt+j given the
information available at period t. Formally,

Etyt+j = E(yt+j | yt, yt−1, yt−2, . . . , εt, εt−1, . . .).

In the same way, since yt+2 = a0 + a1yt+1 +
εt+2, the forecast of yt+2 conditioned on the
information available at period t is

Etyt+2 = a0 + a1Etyt+1

and, using (48),

Etyt+2 = a0 + a1(a0 + a1yt).
Thus, the forecast of yt+1 can be used to forecast
yt+2. In other words, forecasts can be
constructed using forward iteration; the forecast
of yt+j can be used to forecast yt+j+1. Since
yt+j+1 = a0 + a1yt+j + εt+j+1, it follows that

Etyt+j+1 = a0 + a1Etyt+j.   (49)

From (48) and (49) it should be clear that
it is possible to obtain the entire sequence
of j-step-ahead forecasts by forward iteration.
Consider

Etyt+j = a0(1 + a1 + a1² + . . . + a1^(j−1)) + a1^j yt.
This equation, called the forecast function,
expresses all of the j-step-ahead forecasts as
functions of the information set in period t.
Unfortunately, the quality of the forecasts
declines as we forecast further out into the
future. Think of (49) as a first-order difference
equation in the {Etyt+j} sequence. Since
|a1| < 1, the difference equation is stable, and
it is straightforward to find the particular
solution to the difference equation. If we take
the limit of Etyt+j as j → ∞, we find that
Etyt+j → a0/(1 − a1). This result is quite general:
for any stationary ARMA model, the
conditional forecast of yt+j converges to the
unconditional mean as j → ∞.
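The forward iteration in (49) and the convergence to the unconditional mean can be sketched directly (helper name is mine, not from the text):

```python
# Sketch: forward iteration of the AR(1) forecast function
# E_t y_{t+j} = a0 + a1 * E_t y_{t+j-1}, starting from the observed y_t.
def ar1_forecasts(a0, a1, y_t, horizon):
    """Return [E_t y_{t+1}, ..., E_t y_{t+horizon}]."""
    forecasts, prev = [], y_t
    for _ in range(horizon):
        prev = a0 + a1 * prev
        forecasts.append(prev)
    return forecasts

# With a0 = 1 and a1 = 0.7, the forecasts converge to a0/(1 - a1) = 10/3:
f = ar1_forecasts(1.0, 0.7, 5.0, 50)
print(round(f[0], 3), round(f[-1], 3))  # -> 4.5 3.333
```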
Because the forecasts from an ARMA model
will not be perfectly accurate, it is important to
consider the properties of the forecast errors.
Forecasting from time period t, we can define
the j-step-ahead forecast error et(j) as the
difference between the realized value of yt+j and
the forecast value Etyt+j. Thus,

et(j) ≡ yt+j − Etyt+j.

Hence, the 1-step-ahead forecast error is et(1) =
yt+1 − Etyt+1 = εt+1 (i.e., the unforecastable
portion of yt+1 given the information available
in period t).

To find the two-step-ahead forecast error, we
need to form et(2) = yt+2 − Etyt+2. Since
yt+2 = a0 + a1yt+1 + εt+2 and Etyt+2 = a0 +
a1Etyt+1, it follows that

et(2) = a1(yt+1 − Etyt+1) + εt+2
      = εt+2 + a1εt+1.

Proceeding in a like manner, you can demonstrate
that for the AR(1) model, the j-step-ahead
forecast error et(j) is given by

et(j) = εt+j + a1εt+j−1 + a1²εt+j−2 + a1³εt+j−3
        + . . . + a1^(j−1)εt+1.   (50)
Since the mean of (50) is zero, the forecasts
are unbiased estimates of each value yt+j. This
can be seen as follows. Since Etεt+j = Etεt+j−1 =
. . . = Etεt+1 = 0, the conditional expectation
of (50) is Etet(j) = 0. Since the expected
value of the forecast error is zero, the
forecasts are unbiased.

Next, we look at the variance of the forecast
error. To compute it, continue to assume that
the elements of the {εt} sequence are
independent with a variance equal to σ². Then,
using (50), the variance of the forecast error is

Var[et(j)] = σ²[1 + a1² + a1⁴ + a1⁶ + . . . + a1^(2(j−1))]   (51)

for j = 1, 2, . . . . Thus, the one-step-ahead
forecast error variance is σ², the two-step-ahead
forecast error variance is σ²(1 + a1²), and so
forth. The essential point to note is that the
forecast error variance is an increasing function
of j. Consequently, we can have more confidence
in short-term forecasts than in long-term
forecasts. In the limit, as j → ∞, the forecast
error variance converges to σ²/(1 − a1²); hence,
the forecast error variance converges to the
unconditional variance of the {yt} sequence.

Moreover, assuming the {εt} sequence is
normally distributed, you can place confidence
intervals around the forecasts. The one-step-ahead
forecast of yt+1 is a0 + a1yt and the forecast
error variance is σ². Therefore, the 95%
confidence interval for the one-step-ahead forecast
can be constructed as

a0 + a1yt ± 1.96σ.

We can construct a confidence interval for the
two-step-ahead forecast in a similar way.
Using (49), the two-step-ahead forecast is Etyt+2
= a0(1 + a1) + a1²yt. Again using (51), we know
that Var[et(2)] = σ²(1 + a1²). Thus, the 95%
confidence interval for the two-step-ahead
forecast is

a0(1 + a1) + a1²yt ± 1.96σ(1 + a1²)^(1/2).
Higher-Order Models
Now we generalize the above discussion to
derive forecasts for any ARMA(p, q) model. To
keep the algebra simple, consider the ARMA(2,1)
model

yt = a0 + a1yt−1 + a2yt−2 + εt + β1εt−1.   (52)

Updating one period yields

yt+1 = a0 + a1yt + a2yt−1 + εt+1 + β1εt.

If we continue to assume that (1) all the
coefficients are known; (2) all variables subscripted
t, t − 1, t − 2, . . . are known at period t; and (3)
Etεt+j = 0 for j > 0, the conditional
expectation of yt+1 is

Etyt+1 = a0 + a1yt + a2yt−1 + β1εt.   (53)
Equation (53) is the one-step-ahead forecast
of yt+1. The one-step-ahead forecast error is
et(1) = yt+1 − Etyt+1 = εt+1.

To find the two-step-ahead forecast, update
(52) by two periods:

yt+2 = a0 + a1yt+1 + a2yt + εt+2 + β1εt+1.

The conditional expectation of yt+2 is

Etyt+2 = a0 + a1Etyt+1 + a2yt.   (54)

Equation (54) expresses the two-step-ahead
forecast in terms of the one-step-ahead forecast
and the current value of yt. Combining (53)
and (54) yields

Etyt+2 = a0 + a1[a0 + a1yt + a2yt−1 + β1εt] + a2yt
       = a0(1 + a1) + [a1² + a2]yt + a1a2yt−1
         + a1β1εt.
To find the two-step-ahead forecast error,
subtract (54) from yt+2. Thus,

et(2) = yt+2 − Etyt+2
      = [a0 + a1yt+1 + a2yt + εt+2 + β1εt+1]
        − [a0 + a1Etyt+1 + a2yt]
      = a1(yt+1 − Etyt+1) + εt+2
        + β1εt+1.   (55)

Since yt+1 − Etyt+1 is equal to the one-step-ahead
forecast error εt+1, we can write the
forecast error as et(2) = (a1 + β1)εt+1 + εt+2.
Alternatively,

et(2) = yt+2 − Etyt+2
      = [a0 + a1yt+1 + a2yt + εt+2 + β1εt+1]
        − [a0(1 + a1) + [a1² + a2]yt + a1a2yt−1
        + a1β1εt]
      = (a1 + β1)εt+1 + εt+2.   (56)

Finally, all j-step-ahead forecasts can be
obtained from

Etyt+j = a0 + a1Etyt+j−1 + a2Etyt+j−2,  j ≥ 2.   (57)
Equation (57) suggests that the forecasts will
satisfy a second-order difference equation. As
long as the characteristic roots of (57) lie
inside the unit circle, the forecasts will converge
to the unconditional mean a0/(1 − a1 − a2).
We can use (57) to find the j-step-ahead
forecast errors. Since yt+j = a0 + a1yt+j−1 +
a2yt+j−2 + εt+j + β1εt+j−1, the j-step-ahead
forecast error is

et(j) = yt+j − Etyt+j
      = [a0 + a1yt+j−1 + a2yt+j−2 + εt+j
        + β1εt+j−1]
        − Et[a0 + a1yt+j−1 + a2yt+j−2
        + εt+j + β1εt+j−1]
      = a1(yt+j−1 − Etyt+j−1)
        + a2(yt+j−2 − Etyt+j−2)
        + εt+j + β1εt+j−1
      = a1et(j − 1) + a2et(j − 2)
        + εt+j + β1εt+j−1.   (58)
In practice, we will not know the actual order
of the ARMA process or the actual values of
the coefficients of that process. Instead, to
create out-of-sample forecasts, it is necessary
to use the estimated coefficients from what
we believe to be the appropriate form of the
ARMA model. Suppose we have T observa-
tions of the {yt} sequence and choose to fit
an ARMA(2,1) model to the data. Let a hat
or caret (i.e., ^) over a parameter denote the
estimated value of the parameter, and let {ε̂t}
denote the residuals of the estimated model.
Hence, the estimated ARMA(2,1) model can
be written as

yt = â0 + â1yt−1 + â2yt−2 + ε̂t + β̂1ε̂t−1.

Given that the sample contains T observations,
the out-of-sample forecasts can be easily
constructed. For example, we can use (53) to
forecast the value of yT+1 conditional on the
T observations as

ETyT+1 = â0 + â1yT + â2yT−1 + β̂1ε̂T.   (59)
Once we know the values of â0, â1, â2, and β̂1,
(59) can easily be constructed using the
actual values of yT, yT−1, and ε̂T. Similarly, the
forecast of yT+2 can be constructed as

ETyT+2 = â0 + â1ETyT+1 + â2yT

where ETyT+1 is the forecast from (59).

Given these two forecasts, all subsequent
forecasts can be obtained from the difference
equation

ETyT+j = â0 + â1ETyT+j−1 + â2ETyT+j−2,  j ≥ 2.

Note: it is much more difficult to construct
confidence intervals for the forecast errors. Not
only is it necessary to include the effects of
the stochastic variation in the future values of
{yT+j}, it is also necessary to incorporate the
fact that the coefficients are estimated with
error.
Now that we have estimated a series and have
forecasted its future values, the obvious question
is: How good are our forecasts? Typically,
there will be several plausible models that we
can select to use for our forecasts. Do not be
fooled into thinking that the one with the best
fit is the one that will forecast the best. To
make a simple point, suppose you wanted to
forecast the future values of the ARMA(2,1)
process given by (52). If you could forecast
the value of yT+1 using (53), you would obtain
the one-step-ahead forecast error

eT(1) = yT+1 − a0 − a1yT − a2yT−1 − β1εT = εT+1.

Since the forecast error is the pure unforecastable
portion of yT+1, no other ARMA model
can provide you with superior forecasting
performance. However, we need to estimate the
parameters of the process, so our forecasts
must be made using (59). Therefore, our
estimated forecast error will be

êT(1) = yT+1 − (â0 + â1yT + â2yT−1 + β̂1ε̂T).
Clearly, the two forecast errors are not identical.
When we forecast using (59), the coefficients
(and residuals) are estimated imprecisely.
The forecasts made using the estimated
model extrapolate this coefficient uncertainty
into the future. Since coefficient uncertainty
increases as the model becomes more complex,
it could be that an estimated AR(1) model
forecasts the process given by (52) better than
an estimated ARMA(2,1) model.
How do we know which one of several
reasonable models has the best forecasting
performance? One way to determine that is to
put the alternative models to a head-to-head
test. Since the future values of the series are
unknown, you can hold back a portion of the
observations from the estimation process and
estimate the alternative models over the short-
ened span of data and use these estimates to
forecast the observations of the holdback pe-
riod. You can then compare the properties of
the forecast errors from the alternative mod-
els. To take a simple example, suppose that
{yt} contains a total of 150 observations and
that you are unsure as to whether an AR(1) or
an MA(1) model best captures the behavior of
the series. One way to proceed is to use the
first 100 observations to estimate both mod-
els and use each to forecast the value of y101.
Since you know the actual value of y101, you
can construct the forecast error obtained from
AR(1) and from MA(1). These two forecast
errors are precisely those that someone would
have made if they had been making a one-step-ahead forecast in period 100. Now, re-
estimate an AR(1) and an MA(1) model using
the first 101 observations. Although the esti-
mated coefficients will change somewhat, they
are those that someone would have obtained
in period 101. Use the two models to forecast
the value of y102. Given that you know the ac-
tual value of y102, you can construct two more
forecast errors. Since you know all the values
of the {yt} sequence through period 150, you
can continue this process so as to obtain two
series of one-step-ahead forecast errors, each
containing 50 errors. To keep the notation
simple, let {f1t} and {f2t} denote the sequences
of forecasts from the AR(1) and the MA(1),
respectively. Similarly, let {e1t} and {e2t} de-
note the sequences of forecast errors from the
AR(1) and the MA(1), respectively. Then it
should be clear that f11 = E100y101 is the first
forecast using the AR(1), e11 = y101 f11 is
the first forecast error (where the first hold
back observation is y101), and e2,50 is the lastforecast error from the MA(1).
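The rolling exercise is mechanical once the estimation step is coded. Below is a sketch; for simplicity the two rivals are an AR(1), fit by OLS, and a constant-mean model (the text compares an AR(1) with an MA(1), but fitting an MA(1) requires nonlinear estimation, so a simpler rival stands in here):

```python
import random

# Sketch of the rolling one-step-ahead forecast comparison described above.
random.seed(1)
y = [0.0]
for _ in range(149):                        # 150 observations of an AR(1)
    y.append(0.7 * y[-1] + random.gauss(0.0, 1.0))

def fit_ar1(sample):
    """OLS intercept and slope of y_t on y_{t-1}."""
    x, z = sample[:-1], sample[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    a1 = (sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
          / sum((xi - mx) ** 2 for xi in x))
    return mz - a1 * mx, a1

e1, e2 = [], []                             # one-step forecast errors
for t in range(100, 150):                   # expanding estimation window
    a0, a1 = fit_ar1(y[:t])
    e1.append(y[t] - (a0 + a1 * y[t - 1]))  # AR(1) forecast error
    e2.append(y[t] - sum(y[:t]) / t)        # constant-mean forecast error

mspe1 = sum(e * e for e in e1) / len(e1)
mspe2 = sum(e * e for e in e2) / len(e2)
print(mspe1 < mspe2)                        # AR(1) should win on this data
```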
It is desirable that the forecast errors have a
mean of zero and a small variance. A
regression-based method to assess the forecasts
is to use the 50 forecasts from the AR(1) to
estimate an equation of the form

y100+t = a0 + a1f1t + v1t,  t = 1, 2, . . . , 50.
If the forecasts are unbiased, an F-test should
allow us to impose the restriction a0 = 0 and
a1 = 1. Similarly, the residual series v1t should
act as a white noise process. It is a good idea
to plot v1t against y100+t to determine if there
are periods in which our forecasts are
especially poor. Now repeat the process with the
forecasts from the MA(1). In particular, use
the 50 forecasts from the MA(1) to estimate

y100+t = b0 + b1f2t + v2t,  t = 1, 2, . . . , 50.

Again, if we use an F-test, we should not be
able to reject the joint hypothesis b0 = 0 and
b1 = 1. If the significance levels from the two
F-tests are similar, we might select the model
with the smallest residual variance: that is,
select the AR(1) if Var(v1t) < Var(v2t).
More generally, we might want to have a hold-
back period that differs from 50 observations.
With a very small sample, it may not be
possible to hold back 50 observations. Small
samples are a problem since Ashley (1997) shows
that very large samples are often necessary to
reveal a significant difference between the out-
of-sample forecasting performances of similar
models. Hence, we need to have enough ob-
servations to have well-estimated coefficients
for the in-sample period and enough out-of-
sample forecasts so that the test has good
power. If we have a large sample, it is typical
to hold back as much as 50% of the data
set. Also, we might want to use j-step-ahead
forecasts instead of one-step-ahead forecasts.
For example, if we have quarterly data and
want to forecast one year into the future, we
can perform the analysis using four-step-ahead
forecasts. Nevertheless, once we have the two
sequences of forecast errors, we can compare
their properties.
Instead of using a regression-based approach, a researcher could select the model with the smallest mean square prediction error (MSPE). If there are H observations in the holdback period, the MSPE for the AR(1) can be calculated as
MSPE = (1/H) Σ_{i=1}^{H} e_{1i}²
Several methods have been proposed to deter-
mine whether one MSPE is statistically differ-
ent from the other. If we put the larger of
the two MSPEs in the numerator, a standard
recommendation is to use the F-statistic
F = Σ_{i=1}^{H} e_{1i}² / Σ_{i=1}^{H} e_{2i}²    (60)
The intuition is that the value of F will equal
unity if the forecast errors from the two models
are identical. A very large value of F implies
that the forecast errors from the first model
are substantially larger than those from the
second. Under the null hypothesis of equal
forecasting performance, (60) has a standard F-distribution with (H, H) degrees of freedom if the following three assumptions hold. The forecast errors are
1. normally distributed with zero mean,
2. serially uncorrelated, and
3. contemporaneously uncorrelated.
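Under the three assumptions above, the MSPE comparison and the F-ratio in (60) can be sketched as follows; the two error series are simulated stand-ins for the AR(1) and MA(1) forecast errors:

```python
# Hedged sketch: MSPEs of two forecast-error series and the F-ratio of (60).
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)
H = 50
e1 = rng.normal(0.0, 1.2, H)   # hypothetical errors from model 1
e2 = rng.normal(0.0, 1.0, H)   # hypothetical errors from model 2

mspe1 = np.mean(e1 ** 2)
mspe2 = np.mean(e2 ** 2)

F = max(mspe1, mspe2) / min(mspe1, mspe2)   # larger MSPE in the numerator
p_value = f.sf(F, H, H)                     # upper tail of the F(H, H) distribution
print(F, p_value)
```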
Although it is common practice to assume that
the {et} sequence is normally distributed, it is
not necessarily the case that the forecast errors
are normally distributed with zero mean. Similarly, the forecast errors may be serially correlated; this is particularly true if we use multi-step-ahead forecasts. For example, equation (56) indicated that the two-step-ahead forecast error for y_{t+2} is
e_t(2) = (a_1 + β_1)ε_{t+1} + ε_{t+2}
and updating e_t(2) by one period yields the two-step-ahead forecast error for y_{t+3} as

e_{t+1}(2) = (a_1 + β_1)ε_{t+2} + ε_{t+3}.
Thus predicting y_{t+2} from the perspective of period t and predicting y_{t+3} from the perspective of period t+1 both contain an error due to the presence of ε_{t+2}. This induces serial correlation between the two forecast errors. Formally, it can be seen as follows:

E[e_t(2)e_{t+1}(2)] = (a_1 + β_1)σ² ≠ 0.
However, for i > 1, E[e_t(2)e_{t+i}(2)] = 0 since there are no overlapping forecast errors. Hence, the autocorrelations of the two-step-ahead forecast errors cut off to zero after lag 1. As an exercise, you can demonstrate the general result that j-step-ahead forecast errors act as an MA(j−1) process.
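The exercise can be checked numerically. With illustrative values a1 = 0.7 and β1 = 0.3 (so a1 + β1 = 1), the theoretical lag-1 autocorrelation of the two-step-ahead errors is 0.5 while higher lags are zero:

```python
# Sketch: for e_t(2) = (a1 + b1)*eps_{t+1} + eps_{t+2}, the lag-1
# autocorrelation is nonzero while lags beyond 1 are (approximately) zero.
import numpy as np

rng = np.random.default_rng(0)
a1, b1 = 0.7, 0.3              # arbitrary illustrative ARMA(1,1) coefficients
eps = rng.standard_normal(100_000)
e2step = (a1 + b1) * eps[:-1] + eps[1:]   # sequence of two-step-ahead errors

def acf(x, k):
    """Sample autocorrelation of x at lag k."""
    x = x - x.mean()
    return np.dot(x[:-k], x[k:]) / np.dot(x, x)

print(round(acf(e2step, 1), 3), round(acf(e2step, 2), 3))
```

The printed values should be near 0.5 and 0, the MA(1) pattern claimed above.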
Finally, the forecast errors from the two alternative models will usually be highly correlated with each other. For example, a negative realization of ε_{t+1} will tend to cause the forecasts from both models to be too high. Also note: the violation of any of the three assumptions means that the ratio of the MSPEs in (60) does not have an F-distribution.
The Granger-Newbold Test
Granger and Newbold (1976) show how to overcome the problem of contemporaneously correlated forecast errors. Use the two sequences of forecast errors to form

x_t = e_{1t} + e_{2t} and z_t = e_{1t} − e_{2t}.
If assumptions 1 and 2 are valid, then under the null hypothesis of equal forecast accuracy, x_t and z_t should be uncorrelated. That is,

σ_{xz} = E[x_t z_t] = E(e_{1t}² − e_{2t}²)

should be zero. Model 1 has a larger MSPE if σ_{xz} is positive and Model 2 has a larger MSPE if σ_{xz} is negative. Let r_{xz} denote the sample correlation coefficient between {x_t} and {z_t}. Granger and Newbold (1976) show that

r_{xz}/√[(1 − r_{xz}²)/(H − 1)]    (61)

has a t-distribution with H − 1 degrees of freedom. Thus, if r_{xz} is statistically significantly different from zero, model 1 has a larger MSPE if r_{xz} is positive and model 2 has a larger MSPE if r_{xz} is negative.
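A sketch of the Granger-Newbold computation in (61), with simulated error series in which model 1 is deliberately given the larger error variance:

```python
# Hedged sketch: Granger-Newbold test for equal MSPE using (61).
import numpy as np
from scipy.stats import t

def granger_newbold(e1, e2):
    """GN t-statistic; positive values point to a larger MSPE for model 1."""
    H = len(e1)
    x = e1 + e2
    z = e1 - e2
    r = np.corrcoef(x, z)[0, 1]            # sample correlation r_xz
    stat = r / np.sqrt((1.0 - r ** 2) / (H - 1))
    pval = 2.0 * t.sf(abs(stat), H - 1)    # two-sided, H - 1 dof
    return stat, pval

rng = np.random.default_rng(3)
e1 = rng.normal(0.0, 2.0, 500)   # hypothetical errors, model 1 (worse)
e2 = rng.normal(0.0, 1.0, 500)   # hypothetical errors, model 2
print(granger_newbold(e1, e2))
```

A significantly positive statistic indicates that model 1 has the larger MSPE, as constructed here.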
The Diebold-Mariano Test
Diebold and Mariano (1995) relax assumptions 1 to 3 and allow for an objective function that is not quadratic. This is important
because if, for example, an investor's loss depends on the size of the forecast error, the forecaster should be concerned with the absolute values of the forecast errors. As another
example, an options trader receives a pay-off
of zero if the value of the underlying asset
lies below the strike price but receives a one-
dollar pay-off for each dollar the asset price
rises above the strike price.
If we consider only one-step-ahead forecasts,
we can eliminate the subscript j and let the
loss from a forecast error in period i be denoted
by g(e_i). In the typical case of mean squared errors, the loss is e_i². To allow the loss function
to be general, we can write the differential loss
in period i from using model 1 versus model 2
as d_i = g(e_{1i}) − g(e_{2i}). The mean loss can be
obtained as
d̄ = (1/H) Σ_{i=1}^{H} [g(e_{1i}) − g(e_{2i})].    (62)
Under the null hypothesis of equal forecast accuracy, the value of d̄ is zero. Since d̄ is the mean of the individual losses, under fairly weak conditions the Central Limit Theorem implies that d̄ should have a normal distribution. Hence it is not necessary to assume that the individual forecast errors are normally distributed. Thus, if we knew Var(d̄), we could construct the ratio d̄/√Var(d̄) and test the null hypothesis of equal forecast accuracy using a standard normal distribution. In practice, to implement the test we first need to estimate Var(d̄).
If the {d_i} series is serially uncorrelated with a sample variance of γ_0, the estimate of Var(d̄) is γ_0/(H − 1), so the null can be tested by comparing d̄/√[γ_0/(H − 1)] to a t-distribution with H − 1 degrees of freedom. More generally, for sequences of j-step-ahead forecasts the loss differentials are serially correlated, and the DM statistic is

DM = d̄/√[(γ_0 + 2γ_1 + ... + 2γ_q)/(H + 1 − 2j + H⁻¹j(j − 1))]

where γ_i denotes the i-th autocovariance of the {d_i} sequence.
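A sketch of the DM computation, implementing DM = d̄/√[(γ_0 + 2γ_1 + ...)/(H + 1 − 2j + H⁻¹j(j − 1))]. The autocovariances are taken up to lag j − 1 (reading the q in the formula as j − 1, an assumption), the default loss g is squared error, and the error series are simulated stand-ins:

```python
# Hedged sketch: Diebold-Mariano statistic; g is the loss function, j the horizon.
import numpy as np
from scipy.stats import norm

def diebold_mariano(e1, e2, j=1, g=lambda e: e ** 2):
    d = g(e1) - g(e2)                       # differential loss d_i
    H = len(d)
    dbar = d.mean()
    # autocovariances gamma_0, ..., gamma_{j-1} of the {d_i} sequence
    gam = [np.mean((d[: H - k] - dbar) * (d[k:] - dbar)) for k in range(j)]
    num = gam[0] + 2.0 * sum(gam[1:])
    dm = dbar / np.sqrt(num / (H + 1 - 2 * j + j * (j - 1) / H))
    pval = 2.0 * norm.sf(abs(dm))           # compared to a standard normal
    return dm, pval

rng = np.random.default_rng(4)
e1 = rng.normal(0.0, 1.5, 200)   # hypothetical errors, model 1 (worse)
e2 = rng.normal(0.0, 1.0, 200)   # hypothetical errors, model 2
print(diebold_mariano(e1, e2))
```

Note that for j = 1 the expression under the square root reduces to γ_0/(H − 1), matching the serially uncorrelated case above.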
An example showing the appropriate use of the Granger-Newbold and Diebold-Mariano tests is provided in the next section.
10. A Model of the Producer Price Index
This section is intended to illustrate some of
the ambiguities frequently encountered in the
Box-Jenkins technique. These ambiguities may lead two equally skilled econometricians to estimate and forecast the same series using very
different ARMA processes. Nonetheless, if you
make reasonable choices, you will select mod-
els that come very close to mimicking the ac-
tual data generating process.
We now illustrate the Box-Jenkins modeling procedure by estimating a quarterly
model of the U.S. Producer Price Index (PPI).
The data used in this section are the series labeled PPI in the file QUARTERLY.XLS. Panel
(a) of Figure 2.5 clearly reveals that there is
little point in modeling the series as being sta-
tionary; there is a decidedly positive trend or
drift throughout the period 1960Q1 to 2002Q1. The first difference of the series seems to have
a constant mean, although inspection of Panel
(b) suggests that the variance is an increasing
function of time. As shown in Panel (c), the
first difference of the logarithm (denoted by Δlppi_t) is the most likely candidate to be covariance stationary. Moreover, there is a strong economic reason to be interested in the logarithmic change since Δlppi_t is a measure of
inflation. However, the large volatility of the
PPI accompanying the oil price shocks in the
1970s should make us somewhat wary of the
assumption that the process is covariance sta-
tionary. At this point, some researchers would
make
additional transformations intended to reduce
the volatility exhibited in the 1970s. However,
it seems reasonable to estimate a model of the
{Δlppi_t} sequence without any further trans-
formations. As always, you should maintain
a healthy skepticism of the accuracy of your
model.
The autocorrelation and partial autocorrela-
tion functions of the {Δlppi_t} sequence can
be seen in Figure 2.6. Let us try to identify
the tentative models that we would want to estimate. In making our decision, we note the
following:
1. The ACF and PACF converge to zero reasonably quickly. We do not want to overdifference the data and try to model the {Δ²lppi_t} sequence.
2. The theoretical ACF of a pure MA(q) pro-
cess cuts off to zero at lag q and the theoretical
ACF of an AR(1) process decays geometrically.
Examination of Figure 2.6 suggests that neither of these specifications seems appropriate
for the sample data.
3. The ACF does not decay geometrically.
The value of ρ_1 is 0.603 and the values of ρ_2, ρ_3, and ρ_4 are 0.494, 0.451, and 0.446, respectively. Thus the ACF is suggestive of
an AR(2) process or a process with both au-
toregressive and moving average components.
The PACF is such that φ_11 = 0.604 and cuts off abruptly to 0.203 (i.e., φ_22 = 0.203). Overall, the PACF suggests that we should consider models such that p = 1 and p = 2.
4. Note the jump in the ACF after lag 4 and the small jump in the PACF at lag 4 (φ_44 = 0.148 while φ_55 = −0.114). Since we are using quarterly data, we might want to incorporate a seasonal factor at lag 4.
Points 1 to 4 suggest an ARMA(1,1) or an
AR(2) model. In addition, we might want to
consider models with a seasonal term at lag 4.
However, to compare a variety of models, Table 2.4 reports estimates of six tentative models. To ensure comparability, all were estimated over the same sample period. We make
the following observations:
1. The estimated AR(1) model confirms our analysis conducted in the identification stage.
Even though the estimated value of a1 (0.603)
is less than unity in absolute value and al-
most four standard deviations from zero, the
AR(1) specification is inadequate. Forming the Ljung-Box Q-statistic for 4 lags of the residuals yields a value of 13.9, so we can reject the null that the residual autocorrelations are jointly zero at the 1% significance level.
Hence, the residuals of this model exhibit substantial serial correlation and we must eliminate this model from consideration.
2. The AR(2) model is an improvement over
the AR(1) specification. The estimated coef-
ficients (a1 = 0.480 and a2 = 0.209) are each
significantly different from zero at conventional levels and imply characteristic roots inside the unit circle. However, there is some ambiguity about the information content of the residuals.
Q-statistics indicate that the autocorrelations
of the residuals are not statistically significant
at the 5% level but are significant at the 10%
level. As measured by the AIC and SBC, the
fit of the AR(2) model is superior to that of
the AR(1). Overall, the AR(2) model domi-
nates the AR(1) specification.
3. The ARMA(1,1) specification is superior
to the AR(2) model. The estimated coefficients are highly significant (with t-values of 14.9 and −4.41). The estimated value of a1 is
positive and less than unity and the Q-statistics
indicate that the autocorrelations of the resid-
uals are not significant at conventional levels.
Moreover, all goodness-of-fit measures select
the ARMA(1,1) specification over the AR(2)
model. Thus, there is little reason to maintain
the AR(2) specification.
4. In order to account for the possibility of seasonality, we estimated the ARMA(1,1) model with an additional moving average coefficient at lag 4. That is, we estimated a model of the form

y_t = a_0 + a_1 y_{t−1} + ε_t + β_1 ε_{t−1} + β_4 ε_{t−4}.
Other seasonal patterns are considered in the next section. For now, note that the additive expression β_4 ε_{t−4} is often preferable to an additive autoregressive term a_4 y_{t−4}. For truly seasonal shocks, the expression β_4 ε_{t−4} captures spikes, not decay, at the quarterly lags. The
slope coefficients of the estimated ARMA(1,
(1,4)) model are all highly significant with t-
statistics of 9.46, -3.41, and 3.63. The Q-
statistics of the residuals are all very low, im-
plying that the autocorrelations are not statis-
tically significantly different from zero. Moreover, the AIC and SBC select this model over the ARMA(1,1) model.