
# Slide Chapter 2

Apr 04, 2018

Shai Aron R

7/30/2019 Slide Chapter 2

Chapter 2: Stationary Time Series Models

This chapter develops the Box-Jenkins methodology for estimating time series models of the form

y_t = a_0 + a_1 y_{t-1} + ... + a_p y_{t-p} + ε_t + β_1 ε_{t-1} + ... + β_q ε_{t-q}

which are called autoregressive integrated moving average (ARIMA) models.

The chapter has three aims:

1. Present the theory of stochastic linear difference equations and consider the time series properties of stationary ARIMA models; a stationary ARIMA model is called an autoregressive moving average (ARMA) model.


2. Develop tools used in estimating ARMA models. Especially useful are the autocorrelation function (ACF) and partial autocorrelation function (PACF).

3. Consider various test statistics to check for model adequacy and show how a properly estimated model can be used for forecasting.

1. Stochastic Difference Equation Models

Stochastic difference equations are a convenient way of modeling dynamic economic processes. To take a simple example, suppose the Federal Reserve's money supply target grows 3% each period. Hence,

m*_t = 1.03 m*_{t-1}    (1)

so that, given the initial condition m*_0, the particular solution is

m*_t = (1.03)^t m*_0


where m*_t = the logarithm of the money supply target in period t, and m*_0 = the logarithm of the money supply target in period 0.

Of course, the actual money supply, m_t, and the target money supply, m*_t, need not be equal.

Suppose that at the beginning of period t there are m_{t-1} dollars, so that the gap between the target and the actual money supply is m*_t − m_{t-1}.

Suppose that the Fed cannot perfectly control the money supply but attempts to close ρ percent of any gap between the desired and actual money supply.

We can model this behavior as

Δm_t = ρ[m*_t − m_{t-1}] + ε_t


Using (1), we obtain

m_t = ρ(1.03)^t m*_0 + (1 − ρ) m_{t-1} + ε_t    (2)

where ε_t is the uncontrollable portion of the money supply, and we assume its mean is zero in all time periods.

Although the model is overly simple, it does illustrate the key points:

1. Equation (2) is a discrete difference equation. Since {ε_t} is stochastic, the money supply is stochastic; we call (2) a linear stochastic difference equation.


2. If we knew the distribution of {ε_t}, we could calculate the distribution for each element in the {m_t} sequence. Since (2) shows how the realizations of the {m_t} sequence are linked across time, we would be able to calculate the various joint probabilities. We note that the distribution of the money supply sequence is completely determined by the parameters of the difference equation (2) and the distribution of the {ε_t} sequence.

3. Having observed the first t observations in the {m_t} sequence, we can make forecasts of m_{t+1}, m_{t+2}, .... For example, updating (2) by one period and taking the conditional expectation, the forecast of m_{t+1} is

E_t m_{t+1} = ρ(1.03)^{t+1} m*_0 + (1 − ρ) m_t.


A white noise process can be used to construct more interesting time series processes. For example, the time series

x_t = Σ_{i=0}^q β_i ε_{t-i}    (3)

is constructed by taking the values ε_t, ε_{t-1}, ..., ε_{t-q} and multiplying each by the associated value of β_i.

A series formed in this manner is called a moving average of order q.

It is denoted by MA(q).

Although the sequence {ε_t} is a white noise process, the sequence {x_t} will not be a white noise process if two or more of the β_i are different from zero.


To illustrate using an MA(1) process, set β_0 = 1, β_1 = 0.5, and all other β_i = 0. Then

E(x_t) = E(ε_t + 0.5 ε_{t-1}) = 0

var(x_t) = var(ε_t + 0.5 ε_{t-1}) = 1.25 σ²

E(x_t) = E(x_{t-s}) and var(x_t) = var(x_{t-s}) for all s.

Hence, the first two conditions for {x_t} to be a white noise process are satisfied. However,

E(x_t x_{t-1}) = E[(ε_t + 0.5 ε_{t-1})(ε_{t-1} + 0.5 ε_{t-2})]
             = E[ε_t ε_{t-1} + 0.5(ε_{t-1})² + 0.5 ε_t ε_{t-2} + 0.25 ε_{t-1} ε_{t-2}]
             = 0.5 σ²

Given that there exists a value of s ≠ 0 such that E(x_t x_{t-s}) ≠ 0, the sequence {x_t} is not a white noise process.
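These moment calculations can be checked by simulation. The sketch below is an illustration (not part of the original slides); it assumes a standard normal white noise, so σ² = 1 and the theoretical values are var(x_t) = 1.25 and E(x_t x_{t-1}) = 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
eps = rng.standard_normal(T + 1)          # white noise with sigma^2 = 1

# MA(1): x_t = eps_t + 0.5*eps_{t-1}
x = eps[1:] + 0.5 * eps[:-1]

var_x = x.var()                                            # theory: 1.25
cov1 = np.mean((x[1:] - x.mean()) * (x[:-1] - x.mean()))   # theory: 0.50
print(round(var_x, 2), round(cov1, 2))
```

The nonzero lag-1 autocovariance is exactly what disqualifies {x_t} as a white noise process.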


2. ARMA Models

It is possible to combine a moving average process with a linear difference equation to obtain an autoregressive moving average model. Consider the p-th order difference equation:

y_t = a_0 + Σ_{i=1}^p a_i y_{t-i} + x_t.    (4)

Now let {x_t} be the MA(q) process given by (3), so that we can write

y_t = a_0 + Σ_{i=1}^p a_i y_{t-i} + Σ_{i=0}^q β_i ε_{t-i}    (5)

where by convention we normalize β_0 to unity.

If the characteristic roots of (5) are all within the unit circle, then y_t is said to follow an autoregressive moving average (ARMA) model.


The autoregressive part of the model is the difference equation given by the homogeneous portion of (4), and the moving average part is the x_t sequence.

If the homogeneous part of the difference equation contains p lags and the model for x_t contains q lags, the model is called an ARMA(p,q) model.

If q = 0, the model is a pure autoregressive model, denoted by AR(p).

If p = 0, the model is a pure moving average model, denoted by MA(q).

In an ARMA model, it is permissible to allow p and/or q to be infinite.


If one or more characteristic roots of (5) are greater than or equal to unity, the {y_t} sequence is called an integrated process and (5) is called an autoregressive integrated moving average (ARIMA) model.

This chapter considers only models in which all of the characteristic roots of (5) are within the unit circle.

Treating (5) as a difference equation suggests that y_t can be solved in terms of the {ε_t} sequence.

The solution of an ARMA(p,q) model expressing y_t in terms of the {ε_t} sequence is the moving average representation of y_t.


For the AR(1) model y_t = a_0 + a_1 y_{t-1} + ε_t, the moving average representation can be shown to be

y_t = a_0/(1 − a_1) + Σ_{i=0}^∞ a_1^i ε_{t-i}

For the general ARMA(p,q) model, using the lag operator L, (5) can be rewritten as

(1 − Σ_{i=1}^p a_i L^i) y_t = a_0 + Σ_{i=0}^q β_i ε_{t-i}

so the particular solution for y_t is

y_t = (a_0 + Σ_{i=0}^q β_i ε_{t-i}) / (1 − Σ_{i=1}^p a_i L^i)    (6)

The expansion of (6) yields an MA(∞) process.
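The MA(∞) expansion can be illustrated numerically in the AR(1) case, where the weights are a_1^i. The sketch below (an illustration with arbitrary values a_0 = 2, a_1 = 0.6, not from the slides) builds y_t by the recursion and compares it with a truncated moving average representation.

```python
import numpy as np

rng = np.random.default_rng(1)
a0, a1 = 2.0, 0.6
T, burn = 50, 500
eps = rng.standard_normal(burn + T)

# Build y_t by the AR(1) recursion y_t = a0 + a1*y_{t-1} + eps_t
y = np.zeros(burn + T)
for t in range(1, burn + T):
    y[t] = a0 + a1 * y[t - 1] + eps[t]

# Truncated MA(infinity) representation: a0/(1-a1) + sum_i a1^i * eps_{t-i}
K = 60
t = burn + T - 1
ma_rep = a0 / (1 - a1) + sum(a1**i * eps[t - i] for i in range(K))

print(abs(y[t] - ma_rep) < 1e-8)
```

Because a_1^60 is already negligible, the truncated representation reproduces the recursion essentially exactly.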


Issue: whether the expansion is convergent, so that the stochastic difference equation given by (6) is stable.

We will see in the next section that the stability condition is that the roots of the polynomial (1 − Σ_{i=1}^p a_i L^i) must lie outside the unit circle.

We will also see that, if y_t is a linear stochastic difference equation, the stability condition is a necessary condition for the time series {y_t} to be stationary.

3. Stationarity

Suppose the quality control division of a manufacturing firm samples four machines each hour. Every hour, quality control finds the mean of the machines' output levels.


The plot of each machine's hourly output is shown in Figure 2.1. If y_{it} represents machine i's output at hour t, the means (ȳ_t) are readily calculated as

ȳ_t = Σ_{i=1}^4 y_{it}/4.

For hours 5, 10, and 15, these mean values are 4.61, 5.14, and 5.03, respectively.

The sample variance for each hour can similarly be constructed.

Unfortunately, we do not usually have the luxury of being able to obtain an ensemble, that is, multiple observations of the same process over the same time period.

Typically, we observe only one set of realizations, that is, one observation of a process, over a given time period.


Fortunately, if {y_t} is a stationary series, the mean, variance, and autocorrelations can be well approximated by sufficiently long time averages based on the single set of realizations.

Suppose you observed the output of machine 1 for 20 periods. If you knew that the output was stationary, you could approximate the mean level of output by

ȳ = Σ_{t=1}^{20} y_{1t}/20.

In using this approximation you would be assuming that the mean was the same for each period. Formally, a stochastic process having a finite mean and variance is covariance stationary if, for all t and t − s,

E(y_t) = E(y_{t-s}) = μ    (7)


Var(y_t) = Var(y_{t-s}) = σ²_y, that is,
E[(y_t − μ)²] = E[(y_{t-s} − μ)²] = σ²_y    (8)

Cov(y_t, y_{t-s}) = Cov(y_{t-j}, y_{t-j-s}) = γ_s, that is,
E[(y_t − μ)(y_{t-s} − μ)] = E[(y_{t-j} − μ)(y_{t-j-s} − μ)] = γ_s    (9)

where μ, σ²_y, and γ_s are all constants. (For s = 0, (8) and (9) are identical, so γ_0 equals the variance of y_t.)

To reiterate, a time series is covariance stationary if its mean and all autocovariances are unaffected by a change in time origin.

A covariance stationary process is also referred to as a weakly stationary, second-order stationary, or wide-sense stationary process.


A strongly stationary process need not have a finite mean and/or variance.

In our course, we consider only covariance stationary series, so there is no ambiguity in using the terms stationary and covariance stationary interchangeably.

In multivariate models, the term autocovariance is reserved for the covariance between y_t and its own lags.

In univariate time series models, there is no ambiguity, and the terms autocovariance and covariance are used interchangeably.

For a covariance stationary series, we can define the autocorrelation between y_t and y_{t-s} as

ρ_s ≡ γ_s/γ_0

where γ_s and γ_0 are defined by (9).


Since γ_s and γ_0 are time-independent, the autocorrelation coefficients ρ_s are also time-independent.

Although the autocorrelation between y_t and y_{t-1} can differ from the autocorrelation between y_t and y_{t-2}, the autocorrelation between y_t and y_{t-1} must be identical to that between y_{t-s} and y_{t-s-1}.

Obviously, ρ_0 = 1.

Stationarity Restrictions for an AR(1) Model

Let

y_t = a_0 + a_1 y_{t-1} + ε_t

where ε_t is white noise.


Case: y_0 known

Suppose the process started in period zero, so that y_0 is a deterministic initial condition. The solution to this equation is

y_t = a_0 Σ_{i=0}^{t-1} a_1^i + a_1^t y_0 + Σ_{i=0}^{t-1} a_1^i ε_{t-i}.    (10)

Taking the expected value of (10), we obtain

E y_t = a_0 Σ_{i=0}^{t-1} a_1^i + a_1^t y_0.    (11)

Updating by s periods yields

E y_{t+s} = a_0 Σ_{i=0}^{t+s-1} a_1^i + a_1^{t+s} y_0.    (12)

Comparing (11) and (12), it is clear that both means are time-dependent.

Since E y_t ≠ E y_{t+s}, the sequence cannot be stationary.


However, if t is large, we can consider the limiting value of y_t in (10).

If |a_1| < 1, then a_1^t y_0 converges to zero as t becomes infinitely large, and the sum a_0[1 + a_1 + (a_1)² + (a_1)³ + ...] converges to a_0/(1 − a_1).

Thus, if |a_1| < 1, as t → ∞, we have

lim y_t = a_0/(1 − a_1) + Σ_{i=0}^∞ a_1^i ε_{t-i}.    (13)

Now take expectations of (13).

Then we have, for sufficiently large values of t, E y_t = a_0/(1 − a_1), since E(ε_{t-i}) = 0 for all i.


Thus, the mean value of y_t is finite and time-independent:

E y_t = E y_{t-s} = a_0/(1 − a_1) ≡ μ for all t.

Turning to the variance, we find

E(y_t − μ)² = E[(ε_t + a_1 ε_{t-1} + (a_1)² ε_{t-2} + ...)²]
            = σ²[1 + (a_1)² + (a_1)⁴ + ...]
            = σ²/[1 − (a_1)²]

which is also finite and time-independent.

Finally, the limiting values of all autocovariances, γ_s, s = 0, 1, 2, ..., are also finite and time-independent:

γ_s = E[(y_t − μ)(y_{t-s} − μ)]
    = E{[ε_t + a_1 ε_{t-1} + (a_1)² ε_{t-2} + ...][ε_{t-s} + a_1 ε_{t-s-1} + (a_1)² ε_{t-s-2} + ...]}
    = σ²(a_1)^s[1 + (a_1)² + (a_1)⁴ + ...]
    = σ²(a_1)^s/[1 − (a_1)²]    (14)
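Formula (14) can be verified by simulation. The sketch below is illustrative (not from the slides), assuming Gaussian white noise with σ = 1, a_0 = 0, and a_1 = 0.5, so that γ_0 = σ²/(1 − a_1²) = 4/3 and γ_1 = a_1 γ_0 = 2/3.

```python
import numpy as np

rng = np.random.default_rng(2)
a1, sigma = 0.5, 1.0
T, burn = 100_000, 1_000
eps = sigma * rng.standard_normal(T + burn)

y = np.zeros(T + burn)
for t in range(1, T + burn):
    y[t] = a1 * y[t - 1] + eps[t]       # a0 = 0, so mu = 0
y = y[burn:]                             # discard the start-up transient

var_theory = sigma**2 / (1 - a1**2)            # gamma_0 = 4/3
gamma1_theory = sigma**2 * a1 / (1 - a1**2)    # gamma_1 = 2/3
print(abs(y.var() - var_theory) < 0.05,
      abs(np.mean(y[1:] * y[:-1]) - gamma1_theory) < 0.05)
```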


Case: y_0 unknown

Little would change were we not given the initial condition. Without the initial value y_0, the sum of the particular solution and homogeneous solution for y_t is

y_t = [a_0/(1 − a_1) + Σ_{i=0}^∞ a_1^i ε_{t-i}] + A(a_1)^t    (15)

where the bracketed term is the particular solution, A(a_1)^t is the homogeneous solution, and A = an arbitrary constant = the deviation from long-run equilibrium.

If we take the expectation of (15), it is clear that the {y_t} sequence cannot be stationary unless the homogeneous solution A(a_1)^t is equal to zero.

Either the sequence must have started infinitely long ago (so that a_1^t → 0) or the arbitrary constant A must be zero.


Thus, we have the stability conditions:

The homogeneous solution must be zero. Either the sequence must have started infinitely far in the past, or the process must always be in equilibrium (so that the arbitrary constant is zero).

The characteristic root a_1 must be less than unity in absolute value.

These two conditions readily generalize to all ARMA(p,q) processes. The homogeneous solution to (5) has the form

Σ_{i=1}^p A_i (α_i)^t

or, if there are m repeated roots,

Σ_{i=1}^m A_i t^{i-1} α^t + Σ_{i=m+1}^p A_i (α_i)^t


where the A_i are arbitrary constants, α is the repeated root, and the α_i are the distinct roots.

If any portion of the homogeneous equation is present, the mean, variance, and all covariances will be time-dependent.

Hence, for any ARMA(p,q) model, stationarity necessitates that the homogeneous solution be zero.

The next section addresses stationarity restrictions for the particular solution.

4. Stationarity Restrictions for an ARMA(p,q) Model

As a prelude to the stationarity conditions for the general ARMA(p,q) model, first consider the stationarity conditions for an ARMA(2,1)


model. Since the magnitude of the intercept term does not affect the stability (or stationarity) condition, set a_0 = 0 and write

y_t = a_1 y_{t-1} + a_2 y_{t-2} + ε_t + β_1 ε_{t-1}.    (16)

From the previous section, we know that the homogeneous solution must be zero, so it is only necessary to find the particular solution. Using the method of undetermined coefficients, we can write the challenge solution as

y_t = Σ_{i=0}^∞ α_i ε_{t-i}.    (17)

For (17) to be a solution of (16), the various α_i must satisfy

α_0 ε_t + α_1 ε_{t-1} + α_2 ε_{t-2} + α_3 ε_{t-3} + ...
= a_1(α_0 ε_{t-1} + α_1 ε_{t-2} + α_2 ε_{t-3} + α_3 ε_{t-4} + ...)
+ a_2(α_0 ε_{t-2} + α_1 ε_{t-3} + α_2 ε_{t-4} + α_3 ε_{t-5} + ...)
+ ε_t + β_1 ε_{t-1}.


Matching the coefficients of ε_t, ε_{t-1}, ε_{t-2}, ..., yields

1. α_0 = 1
2. α_1 = a_1 α_0 + β_1, so α_1 = a_1 + β_1
3. α_i = a_1 α_{i-1} + a_2 α_{i-2} for all i ≥ 2.

The key point is that for i ≥ 2, the coefficients must satisfy the difference equation

α_i = a_1 α_{i-1} + a_2 α_{i-2}

If the characteristic roots of (16) are within the unit circle, then the {α_i} must constitute a convergent sequence.

To verify that the {y_t} sequence generated by (17) is stationary, take the expectation of (17) and note that E y_t = E y_{t-i} = 0 for all t and i.

Hence, the mean is finite and time-invariant.
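The recursion for the α_i is easy to compute. The sketch below (illustrative coefficients a_1 = 0.5, a_2 = 0.2, β_1 = 0.4, chosen so that the characteristic roots are inside the unit circle, not values from the slides) generates the MA(∞) weights of the ARMA(2,1) model.

```python
def arma21_ma_weights(a1, a2, b1, n=20):
    """MA(infinity) weights alpha_i of y_t = a1*y_{t-1} + a2*y_{t-2} + eps_t + b1*eps_{t-1},
    from alpha_0 = 1, alpha_1 = a1 + b1, alpha_i = a1*alpha_{i-1} + a2*alpha_{i-2}."""
    alpha = [1.0, a1 + b1]
    for _ in range(2, n):
        alpha.append(a1 * alpha[-1] + a2 * alpha[-2])
    return alpha

# With roots inside the unit circle the weights form a convergent sequence
w = arma21_ma_weights(0.5, 0.2, 0.4)
print([round(v, 3) for v in w[:4]])   # alpha_0..alpha_3 = 1.0, 0.9, 0.65, 0.505
```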


Since the {ε_t} sequence is assumed to be a white noise process, the variance of y_t is constant and time-independent:

Var(y_t) = E[(α_0 ε_t + α_1 ε_{t-1} + α_2 ε_{t-2} + α_3 ε_{t-3} + ...)²]
         = σ² Σ_{i=0}^∞ α_i²

Var(y_{t-s}) = E[(α_0 ε_{t-s} + α_1 ε_{t-s-1} + α_2 ε_{t-s-2} + α_3 ε_{t-s-3} + ...)²]
             = σ² Σ_{i=0}^∞ α_i²

Hence, Var(y_t) = Var(y_{t-s}) for all t and s.


Finally, note,

Cov(y_t, y_{t-1}) = E[(ε_t + α_1 ε_{t-1} + α_2 ε_{t-2} + α_3 ε_{t-3} + ...)
                      (ε_{t-1} + α_1 ε_{t-2} + α_2 ε_{t-3} + α_3 ε_{t-4} + ...)]
                  = σ²(α_1 + α_2 α_1 + α_3 α_2 + ...)

Cov(y_t, y_{t-2}) = E[(ε_t + α_1 ε_{t-1} + α_2 ε_{t-2} + α_3 ε_{t-3} + ...)
                      (ε_{t-2} + α_1 ε_{t-3} + α_2 ε_{t-4} + α_3 ε_{t-5} + ...)]
                  = σ²(α_2 + α_3 α_1 + α_4 α_2 + ...)

From the above pattern, it is clear that the s-th autocovariance, γ_s, is given by

γ_s = Cov(y_t, y_{t-s}) = σ²(α_s + α_{s+1} α_1 + α_{s+2} α_2 + ...)    (18)

Thus, the s-th autocovariance, γ_s, is constant and independent of t.


Conversely, if the characteristic roots of (16) do not lie within the unit circle, the {α_i} sequence will not be convergent, and hence, the {y_t} sequence cannot be convergent.

Stationarity Restrictions for the Moving Average Coefficients

Next, we look at the conditions ensuring the stationarity of a pure MA(∞) process:

x_t = Σ_{i=0}^∞ β_i ε_{t-i}

where ε_t ~ WN(0, σ²). We have already determined that {x_t} is not a white noise process; now the issue is whether {x_t} is covariance stationary. Given conditions (7), (8), and (9), we ask the following:


1. Is the mean finite and time-independent?

E(x_t) = E(ε_t + β_1 ε_{t-1} + β_2 ε_{t-2} + ...)
       = E ε_t + β_1 E ε_{t-1} + β_2 E ε_{t-2} + ...
       = 0

Repeating the calculation with x_{t-s}, we obtain

E(x_{t-s}) = E(ε_{t-s} + β_1 ε_{t-s-1} + β_2 ε_{t-s-2} + ...)
           = E ε_{t-s} + β_1 E ε_{t-s-1} + β_2 E ε_{t-s-2} + ...
           = 0

Hence, all elements in the {x_t} sequence have the same finite mean (μ = 0).

2. Is the variance finite and time-independent?

Var(x_t) = E[(ε_t + β_1 ε_{t-1} + β_2 ε_{t-2} + ...)²]
         = E(ε_t)² + (β_1)² E(ε_{t-1})² + (β_2)² E(ε_{t-2})² + ...
           [since E ε_t ε_{t-s} = 0 for s ≠ 0]
         = σ²[1 + (β_1)² + (β_2)² + ...]


Therefore, a necessary condition for Var(x_t) to be finite is that Σ_{i=0}^∞ (β_i)² be finite.

Repeating the calculation with x_{t-s} yields

Var(x_{t-s}) = E[(ε_{t-s} + β_1 ε_{t-s-1} + β_2 ε_{t-s-2} + ...)²]
             = E(ε_{t-s})² + (β_1)² E(ε_{t-s-1})² + (β_2)² E(ε_{t-s-2})² + ...
               [since E ε_{t-s} ε_{t-s-i} = 0 for i ≠ 0]
             = σ²[1 + (β_1)² + (β_2)² + ...]

Thus, if Σ_{i=0}^∞ (β_i)² is finite, then Var(x_t) = Var(x_{t-s}) for all t and t − s, and hence, all elements in the {x_t} sequence have the same finite variance.

3. Are all autocovariances finite and time-independent?


The s-th autocovariance, γ_s, is given by

γ_s = Cov(x_t, x_{t-s}) = E(x_t x_{t-s})
    = E[(ε_t + β_1 ε_{t-1} + β_2 ε_{t-2} + ...)(ε_{t-s} + β_1 ε_{t-s-1} + β_2 ε_{t-s-2} + ...)]
    = σ²(β_s + β_{s+1} β_1 + β_{s+2} β_2 + ...)

Therefore, for γ_s to be finite, the sum β_s + β_{s+1} β_1 + β_{s+2} β_2 + ... must be finite.

In summary, the necessary and sufficient conditions for an MA(∞) process to be stationary are that the sums

(i) β_0² + β_1² + β_2² + ..., and
(ii) β_s + β_{s+1} β_1 + β_{s+2} β_2 + ...

be finite.

However, since (ii) must hold for all values of s ≥ 0, and β_0 = 1, condition (i) is redundant.
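A concrete check of condition (ii): with geometrically declining weights β_i = b^i, |b| < 1, the sum σ²(β_s + β_{s+1}β_1 + β_{s+2}β_2 + ...) converges to σ² b^s/(1 − b²). The sketch below (illustrative values b = 0.8, s = 3, σ² = 1, not from the slides) compares partial sums with the closed form.

```python
# Geometrically declining MA(infinity) weights beta_i = b**i satisfy the
# stationarity sums; gamma_s = sigma2 * sum_i b**(s+i) * b**i -> sigma2*b**s/(1-b**2)
b, sigma2, s = 0.8, 1.0, 3

partial = [sigma2 * sum(b**(s + i) * b**i for i in range(n)) for n in (5, 50, 500)]
closed_form = sigma2 * b**s / (1 - b**2)
print([round(p, 4) for p in partial], round(closed_form, 4))
```

The partial sums settle quickly onto the closed-form value, illustrating why convergent MA weights deliver finite, time-independent autocovariances.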


Stationarity Restrictions for the Autoregressive Coefficients

Now consider the pure autoregressive model of order p:

y_t = a_0 + Σ_{i=1}^p a_i y_{t-i} + ε_t.    (19)

If the characteristic roots of the homogeneous equation of (19) all lie inside the unit circle, we can write the particular solution as

y_t = a_0/(1 − Σ_{i=1}^p a_i) + Σ_{i=0}^∞ α_i ε_{t-i}    (20)

where α_0 = 1 and {α_i, i ≥ 1} are undetermined coefficients. We know that (20) is a convergent sequence so long as the characteristic roots of (19) are inside the unit circle. We also know that the sequence {α_i} will solve the difference equation

α_i − a_1 α_{i-1} − a_2 α_{i-2} − ... − a_p α_{i-p} = 0.    (21)

If the characteristic roots of (21) are all inside the unit circle, the {α_i} sequence will be convergent.


Although (20) is an infinite-order moving average process, the convergence of the MA coefficients implies that Σ_{i=0}^∞ α_i² is finite. Thus, we can use (20) to check the three conditions of stationarity.

E y_t = E y_{t-s} = a_0/(1 − Σ_{i=1}^p a_i)

A necessary condition for all characteristic roots to lie inside the unit circle is 1 − Σ_{i=1}^p a_i > 0. Hence, the mean of the sequence is finite and time-invariant.

Var(y_t) = E[(ε_t + α_1 ε_{t-1} + α_2 ε_{t-2} + ...)²]
         = E(ε_t)² + (α_1)² E(ε_{t-1})² + (α_2)² E(ε_{t-2})² + ...
         = σ²[1 + (α_1)² + (α_2)² + ...]
         = σ² Σ_{i=0}^∞ α_i²


Similarly,

Var(y_{t-s}) = E[(ε_{t-s} + α_1 ε_{t-s-1} + α_2 ε_{t-s-2} + ...)²]
             = E(ε_{t-s})² + (α_1)² E(ε_{t-s-1})² + (α_2)² E(ε_{t-s-2})² + ...
             = σ²[1 + (α_1)² + (α_2)² + ...]
             = σ² Σ_{i=0}^∞ α_i²

Thus, if Σ_{i=0}^∞ α_i² is finite, then Var(y_t) = Var(y_{t-s}) for all t and t − s, and hence, all elements in the {y_t} sequence have the same finite variance.

Finally, let us look at the s-th autocovariance, γ_s, which is given by

γ_s = Cov(y_t, y_{t-s}) = E(y_t y_{t-s})
    = E[(ε_t + α_1 ε_{t-1} + α_2 ε_{t-2} + ...)(ε_{t-s} + α_1 ε_{t-s-1} + α_2 ε_{t-s-2} + ...)]
    = σ²(α_s + α_{s+1} α_1 + α_{s+2} α_2 + ...)


Therefore, for γ_s to be finite, the sum α_s + α_{s+1} α_1 + α_{s+2} α_2 + ... must be finite.

Nothing of substance is changed by combining the AR(p) and MA(q) models into the general ARMA(p,q) model:

y_t = a_0 + Σ_{i=1}^p a_i y_{t-i} + x_t

x_t = Σ_{i=0}^q β_i ε_{t-i}.    (22)

If the roots of the inverse characteristic equation lie outside the unit circle [that is, if the roots of the homogeneous form of (22) lie inside the unit circle] and if the {x_t} sequence is stationary, the {y_t} sequence will be stationary. Consider

y_t = a_0/(1 − Σ_{i=1}^p a_i) + ε_t/(1 − Σ_{i=1}^p a_i L^i)
      + β_1 ε_{t-1}/(1 − Σ_{i=1}^p a_i L^i) + β_2 ε_{t-2}/(1 − Σ_{i=1}^p a_i L^i) + ...    (23)


Each of the expressions on the right-hand side of (23) is stationary as long as the roots of 1 − Σ_{i=1}^p a_i L^i are outside the unit circle.

Given that {x_t} is stationary, only the roots of the autoregressive portion of (22) determine whether the {y_t} sequence is stationary.

5. The Autocorrelation Function

The autocovariances and autocorrelations of the type found in (18) serve as useful tools in the Box-Jenkins approach to identifying and estimating time series models. Illustrated below are four important examples: the AR(1), AR(2), MA(1), and ARMA(1,1) models.

The Autocorrelation Function of an AR(1) Process

For an AR(1) model, y_t = a_0 + a_1 y_{t-1} + ε_t, (14) shows


γ_0 = σ²/[1 − (a_1)²]

γ_s = σ²(a_1)^s/[1 − (a_1)²].

Now, dividing γ_s by γ_0 gives the autocorrelation function (ACF) at lag s: ρ_s = γ_s/γ_0. Thus, we find that

ρ_0 = 1, ρ_1 = a_1, ρ_2 = (a_1)², ..., ρ_s = (a_1)^s.

A necessary condition for an AR(1) process to be stationary is that |a_1| < 1.

Thus, the plot of ρ_s against s, called the correlogram, should converge to zero geometrically if the series is stationary.
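The two decay patterns can be tabulated directly from ρ_s = a_1^s. A small sketch (an illustration, using the parameter values described for Figure 2.2):

```python
# Theoretical AR(1) autocorrelations rho_s = a1**s:
# direct geometric decay for a1 = 0.7, damped oscillation for a1 = -0.7
acfs = {a1: [a1**s for s in range(1, 6)] for a1 in (0.7, -0.7)}
for a1, acf in acfs.items():
    print(a1, [round(r, 3) for r in acf])
```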


If a_1 is positive, convergence will be direct, and if a_1 is negative, the correlogram will follow a damped oscillatory path around zero.

The first two graphs on the left-hand side of Figure 2.2 show the theoretical autocorrelation function for a_1 = 0.7 and a_1 = −0.7, respectively.

In these diagrams, ρ_0 is not shown since its value is necessarily equal to one.

The Autocorrelation Function of an AR(2) Process

We now consider the AR(2) process y_t = a_1 y_{t-1} + a_2 y_{t-2} + ε_t (with a_0 omitted, since this intercept term has no effect on the ACF). For the AR(2) to be stationary, we know that it is necessary to restrict the roots of the second-order lag


polynomial (1 − a_1 L − a_2 L²) to be outside the unit circle. In Section 4, we derived the autocovariances of an ARMA(2,1) process by use of the method of undetermined coefficients. Now we use an alternative technique known as the Yule-Walker equations. Multiply the second-order difference equation by y_t, y_{t-1}, y_{t-2}, ..., y_{t-s} and take expectations. This yields

E y_t y_t = a_1 E y_{t-1} y_t + a_2 E y_{t-2} y_t + E ε_t y_t
E y_t y_{t-1} = a_1 E y_{t-1} y_{t-1} + a_2 E y_{t-2} y_{t-1} + E ε_t y_{t-1}
E y_t y_{t-2} = a_1 E y_{t-1} y_{t-2} + a_2 E y_{t-2} y_{t-2} + E ε_t y_{t-2}
...
E y_t y_{t-s} = a_1 E y_{t-1} y_{t-s} + a_2 E y_{t-2} y_{t-s} + E ε_t y_{t-s}    (24)

By definition, the autocovariances of a stationary series are such that E y_t y_{t-s} = E y_{t-s} y_t = E y_{t-k} y_{t-k-s} = γ_s. We also know that E ε_t y_t = σ² and E ε_t y_{t-s} = 0 for s > 0. Hence, we can use equations (24) to form

γ_0 = a_1 γ_1 + a_2 γ_2 + σ²    (25)
γ_1 = a_1 γ_0 + a_2 γ_1    (26)
γ_s = a_1 γ_{s-1} + a_2 γ_{s-2}    (27)


Dividing (26) and (27) by γ_0 yields

ρ_1 = a_1 ρ_0 + a_2 ρ_1    (28)
ρ_s = a_1 ρ_{s-1} + a_2 ρ_{s-2}    (29)

We know that ρ_0 = 1. So, from (28), we have ρ_1 = a_1/(1 − a_2). Hence, we can find all ρ_s for s ≥ 2 by solving the difference equation (29). For example, for s = 2 and s = 3,

ρ_2 = (a_1)²/(1 − a_2) + a_2
ρ_3 = a_1[(a_1)²/(1 − a_2) + a_2] + a_2 a_1/(1 − a_2)

Given the solutions for ρ_0 and ρ_1, the key point to note is that the ρ_s all satisfy the difference equation (29).

The solution may be oscillatory or direct.

Note that the stationarity condition for y_t necessitates that the characteristic roots of (29) lie inside the unit circle.


Hence, the {ρ_s} sequence must be convergent.

The correlogram for an AR(2) process must be such that ρ_0 = 1 and ρ_1 is determined by (28).

These two values can be viewed as the initial values for the second-order difference equation (29).

The fourth panel on the left-hand side of Figure 2.2 shows the ACF for the process y_t = 0.7 y_{t-1} − 0.49 y_{t-2} + ε_t.

The properties of the various ρ_s follow directly from the homogeneous equation y_t − 0.7 y_{t-1} + 0.49 y_{t-2} = 0


The roots are obtained as

α = {0.7 ± [(0.7)² − 4(0.49)]^{1/2}}/2

Since the discriminant d = (0.7)² − 4(0.49) is negative, the characteristic roots are imaginary, so the solution oscillates.

However, since a_2 = −0.49, the solution is convergent and the {y_t} sequence is stationary.

Finally, we may wish to find the autocovariances, γ_s. Since we know all the autocorrelations, if we can find the variance of y_t, that is, γ_0, we can find all of the others.

Since ρ_i = γ_i/γ_0, from (25) we have

γ_0 = a_1(ρ_1 γ_0) + a_2(ρ_2 γ_0) + σ²
γ_0(1 − a_1 ρ_1 − a_2 ρ_2) = σ²
γ_0 = σ²/(1 − a_1 ρ_1 − a_2 ρ_2)
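The ACF and variance of this example can be generated numerically from the recursion (29). The sketch below (an illustration, assuming σ² = 1) starts from ρ_0 = 1 and ρ_1 = a_1/(1 − a_2) and shows the damped oscillation; it also cross-checks γ_0 against the closed-form variance given below.

```python
# Autocorrelations of y_t = 0.7*y_{t-1} - 0.49*y_{t-2} + eps_t via the
# Yule-Walker recursion rho_s = a1*rho_{s-1} + a2*rho_{s-2}
a1, a2 = 0.7, -0.49
rho = [1.0, a1 / (1 - a2)]
for s in range(2, 12):
    rho.append(a1 * rho[-1] + a2 * rho[-2])

print([round(r, 3) for r in rho[:6]])      # sign changes: a damped oscillation

# gamma_0 = sigma^2 / (1 - a1*rho_1 - a2*rho_2), with sigma^2 = 1
gamma0 = 1.0 / (1 - a1 * rho[1] - a2 * rho[2])
print(round(gamma0, 4))
```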


Substituting for ρ_1 and ρ_2 yields

γ_0 = Var(y_t) = [(1 − a_2)/(1 + a_2)] · σ²/[(a_1 + a_2 − 1)(a_2 − a_1 − 1)].

The Autocorrelation Function of an MA(1) Process

Next, consider the MA(1) process y_t = ε_t + β ε_{t-1}. Again, we can obtain the Yule-Walker equations by multiplying y_t by each y_{t-s}, s = 0, 1, 2, ..., and taking expectations. This yields

γ_0 = Var(y_t) = E y_t y_t = E[(ε_t + β ε_{t-1})(ε_t + β ε_{t-1})] = (1 + β²)σ²

γ_1 = E y_t y_{t-1} = E[(ε_t + β ε_{t-1})(ε_{t-1} + β ε_{t-2})] = β σ²

...

γ_s = E y_t y_{t-s} = E[(ε_t + β ε_{t-1})(ε_{t-s} + β ε_{t-s-1})] = 0 for s > 1


Dividing each γ_s by γ_0, it can be seen that the ACF is simply

ρ_0 = 1, ρ_1 = β/(1 + β²), and ρ_s = 0 for s > 1.

The third graph on the left-hand side of Figure 2.2 shows the ACF for the MA(1) process y_t = ε_t − 0.7 ε_{t-1}.

You saw above that, for an MA(1) process, ρ_s = 0 for s > 1.

As an easy exercise, convince yourself that, for an MA(2) process, ρ_s = 0 for s > 2; for an MA(3) process, ρ_s = 0 for s > 3; and so on.


The Autocorrelation Function of an ARMA(1,1) Process

Finally, consider the ARMA(1,1) process y_t = a_1 y_{t-1} + ε_t + β_1 ε_{t-1}. Using the now-familiar procedure, the Yule-Walker equations are:

E y_t y_t = a_1 E y_{t-1} y_t + E ε_t y_t + β_1 E ε_{t-1} y_t
γ_0 = a_1 γ_1 + σ² + β_1(a_1 + β_1)σ²    (30)

E y_t y_{t-1} = a_1 E y_{t-1} y_{t-1} + E ε_t y_{t-1} + β_1 E ε_{t-1} y_{t-1}
γ_1 = a_1 γ_0 + β_1 σ²    (31)

E y_t y_{t-2} = a_1 E y_{t-1} y_{t-2} + E ε_t y_{t-2} + β_1 E ε_{t-1} y_{t-2}
γ_2 = a_1 γ_1    (32)

...

E y_t y_{t-s} = a_1 E y_{t-1} y_{t-s} + E ε_t y_{t-s} + β_1 E ε_{t-1} y_{t-s}
γ_s = a_1 γ_{s-1}.    (33)

Solving (30) and (31) simultaneously for γ_0 and γ_1 yields


γ_0 = [(1 + β_1² + 2 a_1 β_1)/(1 − a_1²)]σ², and

γ_1 = [(1 + a_1 β_1)(a_1 + β_1)/(1 − a_1²)]σ².

Hence,

ρ_1 = (1 + a_1 β_1)(a_1 + β_1)/(1 + β_1² + 2 a_1 β_1)    (34)

and ρ_s = a_1 ρ_{s-1} for all s ≥ 2.

Thus, the ACF for an ARMA(1,1) process is such that the magnitude of ρ_1 depends on both a_1 and β_1. Beginning with this value of ρ_1, the ACF of an ARMA(1,1) process looks like that of the AR(1) process. If 0 < a_1 < 1, convergence will be direct, and if −1 < a_1 < 0, the autocorrelations will oscillate. The ACF for the function y_t = −0.7 y_{t-1} + ε_t − 0.7 ε_{t-1} is shown as the last graph on the left-hand side of Figure 2.2.
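Formula (34) and the recursion ρ_s = a_1 ρ_{s-1} can be packaged as a small function. The sketch below (an illustration) evaluates the ACF for the parameter values a_1 = β_1 = −0.7 used in the figure discussion.

```python
# ACF of the ARMA(1,1) model y_t = a1*y_{t-1} + eps_t + b1*eps_{t-1},
# using (34) for rho_1 and rho_s = a1*rho_{s-1} thereafter.
def arma11_acf(a1, b1, nlags=6):
    rho = [1.0, (1 + a1 * b1) * (a1 + b1) / (1 + b1**2 + 2 * a1 * b1)]
    for _ in range(2, nlags + 1):
        rho.append(a1 * rho[-1])
    return rho

print([round(r, 3) for r in arma11_acf(-0.7, -0.7)])   # alternating signs
```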


From the above, you should be able to recognize that the correlogram can reveal the pattern of the autoregressive coefficients.

For an ARMA(p,q) model, beginning after lag q, the values of ρ_i will satisfy

ρ_i = a_1 ρ_{i-1} + a_2 ρ_{i-2} + ... + a_p ρ_{i-p}.

6. The Partial Autocorrelation Function

In an AR(1) process, y_t and y_{t-2} are correlated even though y_{t-2} does not directly appear in the model.


The most direct way to find the partial autocorrelation function is to first form the series y*_t by subtracting the mean of the series (i.e., μ) from each observation: y*_t ≡ y_t − μ.

Next, form the first-order autoregression

y*_t = φ_{11} y*_{t-1} + e_t

where e_t is the regression error term, which need not be a white noise process.

Since there are no intervening values, φ_{11} is both the autocorrelation and the partial autocorrelation between y_t and y_{t-1}.

Now form the second-order autoregression

y*_t = φ_{21} y*_{t-1} + φ_{22} y*_{t-2} + e_t

Here φ_{22} is the partial autocorrelation coefficient between y_t and y_{t-2}.


In other words, φ_{22} is the correlation between y_t and y_{t-2} controlling for (i.e., netting out) the effect of y_{t-1}.

Repeating the process for all additional lags s yields the partial autocorrelation function (PACF).

Using the Yule-Walker equations, one can form the partial autocorrelations from the autocorrelations as

φ_{11} = ρ_1    (35)

φ_{22} = (ρ_2 − ρ_1²)/(1 − ρ_1²)    (36)

φ_{ss} = [ρ_s − Σ_{j=1}^{s-1} φ_{s-1,j} ρ_{s-j}] / [1 − Σ_{j=1}^{s-1} φ_{s-1,j} ρ_j]    (37)

where φ_{sj} = φ_{s-1,j} − φ_{ss} φ_{s-1,s-j}, j = 1, 2, 3, ..., s − 1.
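Equations (35)-(37) are the Durbin-Levinson recursion and translate directly into code. The sketch below (an illustration, not from the slides) computes the PACF from a list of autocorrelations [ρ_1, ρ_2, ...] and confirms that an AR(1) PACF cuts off after lag 1.

```python
# Partial autocorrelations from autocorrelations via (35)-(37).
# `rho` holds [rho_1, rho_2, ...]; rho_0 = 1 is implicit.
def pacf_from_acf(rho):
    pacf = [rho[0]]                       # phi_11 = rho_1
    phi_prev = [rho[0]]                   # phi_prev[j] = phi_{s-1, j+1}
    for s in range(2, len(rho) + 1):
        num = rho[s - 1] - sum(phi_prev[j] * rho[s - 2 - j] for j in range(s - 1))
        den = 1 - sum(phi_prev[j] * rho[j] for j in range(s - 1))
        phi_ss = num / den
        phi_prev = [phi_prev[j] - phi_ss * phi_prev[s - 2 - j]
                    for j in range(s - 1)] + [phi_ss]
        pacf.append(phi_ss)
    return pacf

# AR(1) with a1 = 0.6 has rho_s = 0.6**s: the PACF should be 0.6 at lag 1
# and (numerically) zero at every higher lag
print(pacf_from_acf([0.6**s for s in range(1, 6)]))
```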


For an AR(p) process, there is no direct correlation between y_t and y_{t-s} for s > p.

Hence, for s > p, all values of φ_{ss} will be zero, and the PACF for a pure AR(p) process should cut off to zero for all lags greater than p.

In contrast, consider the PACF of an MA(1) process: y_t = ε_t + β ε_{t-1}.

As long as |β| < 1, we can write y_t/(1 + βL) = ε_t, which we know has the AR(∞) representation

y_t − β y_{t-1} + β² y_{t-2} − β³ y_{t-3} + ... = ε_t

Since y_t is thus correlated with all of its own lags, the PACF will not cut off to zero.


Instead, the PACF coefficients exhibit a geometrically decaying pattern.

If β < 0, decay is direct, and if β > 0, the PACF coefficients oscillate.

The right-hand side of the fifth panel in Figure 2.2 shows the PACF for the ARMA(1,1) model:

y_t = −0.7 y_{t-1} + ε_t − 0.7 ε_{t-1}

More generally, the PACF of a stationary ARMA(p,q) process must ultimately decay toward zero beginning at lag p.

The decay pattern depends on the coefficients of the lag polynomial (1 + β_1 L + β_2 L² + ... + β_q L^q).


Table 2.1 summarizes some of the properties of the ACF and PACF for various ARMA processes. For stationary processes, the key points to note are the following:

The ACF of an ARMA(p,q) process will begin to decay after lag q. After lag q, the coefficients of the ACF (i.e., the ρ_i) will satisfy the difference equation ρ_i = a_1 ρ_{i-1} + a_2 ρ_{i-2} + ... + a_p ρ_{i-p}. Since the characteristic roots are inside the unit circle, the autocorrelations will decay after lag q. Moreover, the pattern of the autocorrelation coefficients will mimic that suggested by the characteristic roots.

The PACF of an ARMA(p,q) process will begin to decay after lag p. After lag p, the coefficients of the PACF (i.e., the φ_{ss}) will mimic the ACF coefficients from the model y_t/(1 + β_1 L + β_2 L² + ... + β_q L^q).


We can illustrate the usefulness of the ACF and PACF functions using the model y_t = a_0 + 0.7 y_{t-1} + ε_t. If we compare the top two graphs in Figure 2.2, the ACF shows the monotonic decay of the autocorrelations, while the PACF exhibits a single spike at lag 1. Suppose a researcher collected sample data and plotted the ACF and PACF functions. If the actual patterns compared favorably to the theoretical patterns, the researcher might try to fit an AR(1) model. Conversely, if the ACF exhibited a single spike and the PACF exhibited monotonic decay, the researcher might try an MA(1) model.

7. Sample Autocorrelations of Stationary Time Series

Let there be T observations y_1, y_2, ..., y_T. If the data series is stationary, we can use the sample mean ȳ, sample variance σ̂², and


sample autocorrelations r_s as estimates of the population mean μ, population variance σ², and population autocorrelations ρ_s, respectively, where

ȳ = (1/T) Σ_{t=1}^T y_t    (38)

σ̂² = (1/T) Σ_{t=1}^T (y_t − ȳ)²    (39)

and, for s = 1, 2, ...,

r_s = Σ_{t=s+1}^T (y_t − ȳ)(y_{t-s} − ȳ) / Σ_{t=1}^T (y_t − ȳ)².    (40)
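Formulas (38)-(40) translate directly into code. The sketch below (an illustration, not from the slides) computes the r_s of (40) and applies it to simulated white noise, for which all sample autocorrelations should be close to zero.

```python
import numpy as np

def sample_acf(y, nlags):
    """Sample autocorrelations r_s as defined in (40)."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    denom = np.sum(dev**2)
    return [np.sum(dev[s:] * dev[:-s]) / denom for s in range(1, nlags + 1)]

# White noise: every r_s should be small (the text gives Var(r_1) = 1/T)
rng = np.random.default_rng(3)
r = sample_acf(rng.standard_normal(10_000), 3)
print(max(abs(v) for v in r) < 0.05)
```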

The sample ACF and PACF can be compared to the theoretical ACF and PACF to identify the actual data-generating process. If the true value of ρ_s = 0, that is, if the true data-generating process is MA(s−1), the sampling variance of


r_s is given by

Var(r_s) = T^{-1} for s = 1
         = T^{-1}(1 + 2 Σ_{j=1}^{s-1} r_j²) for s > 1    (41)

If T is large, r_s is distributed normally with mean zero. For the PACF coefficients, under the null hypothesis of an AR(p) model, that is, under the null that all φ_{p+i,p+i} are zero, the variance of the estimated φ_{p+i,p+i} is approximately T^{-1}.

We can test for the significance of the sample ACF and sample PACF using (41). For example, if we use a 95% confidence interval (i.e., 2 standard deviations) and the calculated value of r_1 exceeds 2T^{-1/2}, it is possible to reject the null hypothesis that the first-order autocorrelation is zero. Rejecting this hypothesis means rejecting an MA(s − 1) = MA(0) process and accepting the alternative q > 0. Next, try s = 2.

57

• 7/30/2019 Slide Chapter 2

58/137

Then Var(r2) = (1 + 2r2

1)/T. If r1 = 0.5 andT = 100, then Var(r2) = 0.015 and SD(r2)

= 0.123. Thus, if the calculated value of r2exceeds 2(0.123), it is possible to reject the

null hypothesis, H0 : 2 = 0. Again, rejecting

the null means accepting the alternative that

q > 1. Proceeding in this way it is possible to

identify the order of the process.
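The worked example (r1 = 0.5, T = 100) can be checked numerically using only the formulas in (41); the numbers below are the ones quoted in the text.

```python
import math

T = 100
r1 = 0.5
# Under H0: rho_1 = 0, Var(r1) = 1/T, so SD(r1) = T**(-1/2) = 0.1
sd_r1 = 1 / math.sqrt(T)
# Under H0: rho_2 = 0, equation (41) gives Var(r2) = (1 + 2*r1**2)/T
var_r2 = (1 + 2 * r1 ** 2) / T          # 0.015
sd_r2 = math.sqrt(var_r2)               # about 0.123
# a sample r2 larger than 2 * sd_r2 rejects H0: rho_2 = 0
reject_cutoff = 2 * sd_r2
```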

Box and Pierce (1970) developed the Q-statistic to test whether a group of autocorrelations is significantly different from zero. Under the null hypothesis H0: ρ1 = ρ2 = . . . = ρs = 0, the statistic

Q = T Σ_{k=1}^{s} rk²

is asymptotically distributed as χ² with s degrees of freedom


The intuition behind the use of this statis-

tic is that large sample autocorrelations will

lead to large values of Q, while a white

noise process (in which autocorrelations at

all lags should be zero) would have a Q

value of zero

Thus if the calculated value of Q exceeds

the appropriate value in a 2 table, we can

reject the null of no significant autocorre-

lations

Rejecting the null means accepting an al-

ternative that at least one autocorrelation

is non-zero

A problem with the Box-Pierce Q-statistic:

it works poorly even in moderately large

samples


Remedy: the modified Q-statistic of Ljung and Box (1978):

Q = T(T + 2) Σ_{k=1}^{s} rk²/(T − k)   (42)

If the sample value of Q from (42) exceeds the critical value of χ² with s degrees of freedom, then at least one value of rk is statistically significantly different from zero at the specified significance level
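Both Q-statistics are one-line computations. A sketch follows; the autocorrelation values are made up purely for illustration.

```python
def box_pierce_q(r, T):
    # Box-Pierce (1970): Q = T * sum of squared autocorrelations
    return T * sum(rk ** 2 for rk in r)

def ljung_box_q(r, T):
    # Ljung-Box (1978) modification, equation (42):
    # Q = T(T+2) * sum_k r_k^2 / (T - k)
    return T * (T + 2) * sum(rk ** 2 / (T - k)
                             for k, rk in enumerate(r, start=1))

r = [0.25, 0.10, -0.05, 0.08]     # hypothetical r_1 .. r_4, T = 100
q_bp = box_pierce_q(r, 100)
q_lb = ljung_box_q(r, 100)
# compare q_lb to the chi-square critical value with s = 4 degrees of
# freedom (9.49 at the 5% level); exceeding it rejects H0: all rho_k = 0
```

The Ljung-Box weights 1/(T − k) inflate the contribution of longer lags slightly, which is what improves the statistic's small-sample behavior.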

The Box-Pierce and Ljung-Box Q-statistics

also serve as a check to see if the residuals from an estimated ARMA(p, q) model behave as a white noise process


However, when the s autocorrelations from an estimated ARMA(p, q) model are formed, the degrees of freedom are reduced by the number of estimated coefficients

Hence, using the residuals of an ARMA(p, q) model, Q has a χ² distribution with s − p − q degrees of freedom (if a constant is included in the estimation, the degrees of freedom are s − p − q − 1)

Model Selection Criteria

A natural question to ask of any estimated

model is: How well does it fit the data?

The larger the lag orders p and/or q, the

smaller is the sum of squares of the esti-

mated residuals of the fitted model


However, adding such lags entails estimation of additional coefficients and an associated loss of degrees of freedom

Moreover, inclusion of extraneous coefficients will reduce the forecasting performance of the fitted model

Thus, increasing the lag lengths p and/or q involves both benefits and costs

If we choose a lag order that is lower than necessary, we will omit valuable information contained in the more distant lags and thus will underfit the model

If we choose a lag order that is higher than necessary, we will overfit the model, estimating extraneous coefficients and injecting additional estimation error into our forecasts


Model selection criteria attempt to choose the most parsimonious model by selecting the lag orders p and/or q, balancing the benefit of a reduced sum of squares of estimated residuals due to additional lags against the cost of additional estimation error

The two most commonly used model selection criteria are the Akaike Information Criterion (AIC) and the Schwartz Bayesian Criterion (SBC):

AIC = T ln(SSR) + 2n
SBC = T ln(SSR) + n ln(T)

where n = number of parameters estimated (p + q + possible constant term), T = number of observations, and SSR = sum of squared residuals.
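The AIC/SBC formulas can be applied directly to compare two fitted models; the SSR values and parameter counts below are hypothetical.

```python
import math

def aic_sbc(ssr, T, n):
    # AIC = T ln(SSR) + 2n ;  SBC = T ln(SSR) + n ln(T)
    # n = number of estimated parameters (p + q + possible constant)
    return T * math.log(ssr) + 2 * n, T * math.log(ssr) + n * math.log(T)

aic1, sbc1 = aic_sbc(ssr=85.1, T=100, n=1)   # hypothetical Model 1
aic2, sbc2 = aic_sbc(ssr=85.0, T=100, n=2)   # hypothetical Model 2
# the tiny SSR gain of Model 2 does not pay for its extra parameter;
# SBC penalizes the extra coefficient more heavily than AIC, since
# ln(100) > 2
```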


Estimation of an AR(1) Model

Beginning with t = 1, 100 values of {yt} are generated using the AR(1) process yt = 0.7yt−1 + εt, with the initial condition y0 = 0. The upper left graph of Figure 2.3 shows the sample ACF and the upper right graph shows the sample PACF of this AR(1) process. It is important that you compare these ACF and PACF to those of the theoretical processes shown in Figure 2.2.

In practice, we never know the true data gen-

erating process. However, suppose we were

presented with those 100 sample values and

were asked to uncover the true process.

The first step might be to compare the sample ACF and PACF to those of the various theoretical models. The decaying pattern of the ACF and the single large spike at lag 1 in the sample PACF suggest an AR(1) model. The


first three sample autocorrelations are r1 = 0.74, r2 = 0.58, and r3 = 0.47 (which are somewhat greater than the corresponding theoretical autocorrelations of 0.7, 0.49, and 0.343). In the PACF, there is a sizeable spike of 0.74 at lag 1, and all other partial autocorrelations (except for lag 12) are very small.

Under the null hypothesis of an MA(0) process, the standard deviation of r1 is T^(−1/2) = 0.1. Since the sample value of r1 = 0.74 is more than seven standard deviations from zero, we can reject the null hypothesis H0: ρ1 = 0

The standard deviation of r2 is obtained from (41) by taking s = 2:

Var(r2) = (1 + 2(0.74)²)/100 = 0.021.


Since (0.021)^(1/2) = 0.1449, the sample value of r2 is more than 3 standard deviations from zero; at conventional significance levels, we can reject the null hypothesis H0: ρ2 = 0

Similarly, we can test for the significance of all other values of the sample autocorrelations

As can be seen in the second panel of Figure 2.3, other than φ11, all partial autocorrelations (except for lag 12) are less than 2T^(−1/2) = 0.2. The decay of the ACF and the single spike of the PACF give a strong indication of an AR(1) model. Nevertheless, if we did not know the

true underlying process, and happened to be

using monthly data, we might be concerned

with the significant partial autocorrelation at lag 12. After all, with monthly data we might expect some direct relationship between yt and yt−12.


Although we know that the data were actually generated from an AR(1) process, it is illuminating to compare the estimates of two different models. Suppose we estimate an AR(1) model and also try to capture the spike at lag 12 with an MA coefficient. Thus, we can consider the two tentative models:

Model 1: yt = a1yt−1 + εt
Model 2: yt = a1yt−1 + εt + β12εt−12.

Table 2.2 reports the results of the two esti-

mations. The coefficient of Model 1 satisfies

the stability condition |a1| < 1 and has a low standard error (the associated t-statistic for a null of zero is more than 12). As a useful diagnostic check, we plot the correlogram of the residuals of the fitted model in Figure 2.4.

The Ljung-Box Q-statistics for these residuals

indicate that each one of the autocorrelations

is less than 2 standard deviations from zero.

The Q-statistics indicate that as a group, lags

1 through 8, 1 through 16, and 1 through 24

are not significantly different from zero.


This is strong evidence that the AR(1) model fits the data well. If the residual autocorrelations were significant, the AR(1) model would not utilize all available information concerning movements in the yt sequence. For example, suppose we wanted to forecast yt+1 conditional on all available information up to and including period t. With Model 1, the value of yt+1 is yt+1 = a1yt + εt+1. Hence, the forecast from Model 1 is:

Etyt+1 = Et(a1yt + εt+1)
       = Et(a1yt) + Et(εt+1)
       = a1yt.

If the residual autocorrelation had been significant, this forecast would not capture all of the available information set.

Examining the results for Model 2, note that both models yield similar estimates for the first-order autoregressive coefficient and the associated standard error. However, the estimate for β12 is of poor quality; the insignificant t-value suggests that it should be dropped from the model. Moreover, comparing the AIC and the SBC values of the two models suggests that any benefit of a reduced sum of squared residuals is overwhelmed by the detrimental effects of estimating an additional parameter. All of these indicators point to the choice of Model 1.

Estimation of an ARMA(1,1) Model

See ARMA(1,1) & Table 2.3 under Figures & Tables in Chapter 2.

Estimation of an AR(2) Model

See AR(2) under Figures & Tables in Chapter 2.

8. Box-Jenkins Model Selection

The estimates of the AR(1), ARMA(1,1) and

AR(2) models in the previous section illustrate


the Box-Jenkins (1976) strategy for appropri-

ate model selection. Box and Jenkins popular-

ized a three-stage method aimed at selecting

an appropriate model for the purpose of esti-

mating and forecasting a univariate time series.

In the identification stage, the researcher visually examines the time plot of the series, the autocorrelation function, and the partial autocorrelation function. Plotting the time path

of the {yt} sequence provides useful informa-

tion concerning outliers, missing values, and

structural breaks in the data. Nonstationary

variables may have a pronounced trend or ap-

pear to meander without a constant long-run

mean or variance. Missing values and outliers

can be corrected at this point. Earlier, a stan-

dard practice was to first-difference any series

deemed to be nonstationary. Currently, a large literature is evolving that develops formal procedures to check for nonstationarity. We defer this discussion until Chapter 4 and


assume that we are working with stationary data. A comparison of the sample ACF and sample PACF to those of various theoretical ARMA processes may suggest several plausible models. In the estimation stage, each of the tentative models is fit and the various ai and βi coefficients are examined. In this second stage, the estimated models are compared using the following criteria.

Parsimony

A fundamental idea in the Box-Jenkins approach is the principle of parsimony. Incorporating additional coefficients will necessarily increase fit (e.g., the value of R² will increase) at a

cost of reducing degrees of freedom. Box and

Jenkins argue that parsimonious models pro-

duce better forecasts than overparameterized

models. A parsimonious model fits the data

well without incorporating any needless coef-

ficients. The aim is to approximate the true

data generating process but not to pin down


the exact process. The goal of parsimony suggested eliminating the MA(12) coefficient in

the simulated AR(1) model shown earlier.

In selecting an appropriate model, the econo-

metrician needs to be aware that several differ-

ent models may have similar properties. As an

extreme example, note that the AR(1) model

yt = 0.5yt−1 + εt

has the equivalent infinite-order moving-average representation

yt = εt + 0.5εt−1 + 0.25εt−2 + 0.125εt−3 + 0.0625εt−4 + . . . .

In most samples, approximating this MA(∞) process with an MA(2) or MA(3) model will give a very good fit. However, the AR(1) model is the more parsimonious model and is preferred.
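The equivalence above can be checked numerically: the MA(∞) weights of yt = 0.5yt−1 + εt are 0.5^j, so a low-order truncation captures nearly all of the variance. An illustrative calculation:

```python
a1 = 0.5
weights = [a1 ** j for j in range(6)]        # 1, 0.5, 0.25, 0.125, ...
# variance of the MA(inf) representation is sigma^2 times the sum of
# squared weights, which has the closed form 1/(1 - a1^2)
total = 1 / (1 - a1 ** 2)
captured = sum(w ** 2 for w in weights[:3])  # keep only an MA(2) truncation
share = captured / total                     # about 0.984
```

An MA(2) truncation already accounts for roughly 98% of the process variance, which is why the low-order MA fit looks so good in practice.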


One also needs to be aware of the common factor problem. Suppose we wanted to fit the ARMA(2,3) model

(1 − a1L − a2L²)yt = (1 + β1L + β2L² + β3L³)εt.   (43)

Suppose that (1 − a1L − a2L²) and (1 + β1L + β2L² + β3L³) can be factored as (1 + cL)(1 + aL) and (1 + cL)(1 + b1L + b2L²), respectively. Since (1 + cL) is a common factor to each, (43) has the equivalent, but more parsimonious, form

(1 + aL)yt = (1 + b1L + b2L²)εt.   (44)

In order to ensure that the model is parsimonious, the various ai and βi should all have t-statistics of 2.0 or greater (so that each coefficient is significantly different from zero at the 5% level). Moreover, the coefficients should not be strongly correlated with each other.

Highly collinear coefficients are unstable; usually one or more can be eliminated from the model without reducing forecasting performance.


Stationarity and Invertibility

The distribution theory underlying the use of

the sample ACF and PACF as approximations

to those of the true data generating process

is based on the assumption of stationarity of

the yt sequence. Moreover, t-statistics and Q-statistics also presume that the data are stationary. The estimated autoregressive coefficients should be consistent with this underlying assumption. Hence, we should be suspicious of an AR(1) model if the estimated value of a1 is close to unity. For an ARMA(2, q) model, the characteristic roots of the estimated polynomial (1 − a1L − a2L²) should be outside the unit circle.

The Box-Jenkins methodology also necessitates

that the model be invertible. Formally, yt is

invertible if it can be represented by a finite-

order or convergent autoregressive process. In-

vertibility is important because the use of the


ACF and PACF implicitly assumes that the {yt} sequence can be represented by an autoregressive model. As a demonstration, consider the simple MA(1) model

yt = εt − β1εt−1   (45)

so that, if |β1| < 1,

yt/(1 − β1L) = εt

or

yt + β1yt−1 + β1²yt−2 + β1³yt−3 + . . . = εt.   (46)

If |β1| < 1, (46) can be estimated using the Box-Jenkins method. However, if |β1| ≥ 1, the {yt} sequence cannot be represented by a finite-order AR process, and thus it is not invertible. More generally, for an ARMA model to have a convergent AR representation, the roots of the polynomial (1 + β1L + β2L² + . . . + βqL^q) must lie outside the unit circle.


We note that there is nothing improper about a noninvertible model. The {yt} sequence implied by yt = εt − εt−1 is stationary in that it has a constant time-invariant mean [Eyt = Eyt−s = 0], a constant time-invariant variance [Var(yt) = Var(yt−s) = σ²(1 + β1²) = 2σ²], autocovariance γ1 = −σ², and all other γs = 0. The problem is that the technique does not allow for the estimation of such models. If β1 = 1, (46) becomes

yt + yt−1 + yt−2 + yt−3 + yt−4 + . . . = εt.

Clearly, the autocorrelations and partial autocorrelations between yt and yt−s will never decay.

Goodness of Fit

R² and the average of the residual sum of squares are common measures of goodness of fit in ordinary least squares.

AIC and SBC are more appropriate measures

of fit in time series models.


Caution must be exercised if estimates fail to converge rapidly. Failure of rapid convergence might be indicative of unstable estimates; in such circumstances, adding an additional observation or two can greatly alter the estimates.

The third stage of the Box-Jenkins methodology involves diagnostic checking. The standard practice is to plot the residuals to look

for outliers and for evidence of periods in which

the model does not fit the data well. If all plau-

sible ARMA models show evidence of a poor fit

during a reasonably long portion of the sample,

it is wise to consider using intervention analy-

sis, transfer function analysis, or any other of

the multivariate estimation methods, discussed

in later chapters. If the variance of the residual

is increasing, a logarithmic transformation may

be appropriate. Alternatively, we may wish to

actually model any tendency of the variance to

change using the ARCH techniques discussed

in Chapter 3.


It is particularly important that the residuals

from an estimated model be serially uncorre-

lated. Any evidence of serial correlation implies

a systematic movement in the {yt} sequence

that is not accounted for by the ARMA coeffi-

cients included in the model. Hence, any of the

tentative models yielding nonrandom residuals should be eliminated from consideration. To

check for correlation in the residuals, construct

the ACF and the PACF of the residuals of the

estimated model. Then use (41) and (42) to

determine whether any or all of the residual

autocorrelations or partial autocorrelations are

statistically significant. Although there is no

significance level that is deemed most appro-

priate, be wary of any model yielding

(1) several residual correlations that are marginally significant, and
(2) a Q-statistic that is barely significant at the 10% level.

In such circumstances, it is usually possible to


formulate a better performing model. If there

are sufficient observations, fitting the same ARMA

model to each of two subsamples can provide

useful information concerning the validity of

the assumption that the data generating pro-

cess is unchanging. In the AR(2) model that

was estimated in the last section, the sam-

ple was split in half. In general, suppose you estimated an ARMA(p, q) model using a sample of T observations. Denote the sum of the squared residuals as SSR. Now divide the T observations into two subsamples, with tm observations in the first and tn = T − tm observations in the second. Use each subsample to estimate the two models

yt = a0(1) + a1(1)yt−1 + . . . + ap(1)yt−p + εt + β1(1)εt−1 + . . . + βq(1)εt−q   [using t = 1, . . . , tm]
yt = a0(2) + a1(2)yt−1 + . . . + ap(2)yt−p + εt + β1(2)εt−1 + . . . + βq(2)εt−q   [using t = tm + 1, . . . , T].


Let the sums of the squared residuals from the two models be, respectively, SSR1 and SSR2. To test the restriction that all coefficients are equal [i.e., a0(1) = a0(2) and a1(1) = a1(2) and . . . ap(1) = ap(2) and β1(1) = β1(2) and . . . βq(1) = βq(2)], conduct an F-test using

F = [(SSR − SSR1 − SSR2)/n] / [(SSR1 + SSR2)/(T − 2n)]   (47)

where n = number of parameters estimated
        = p + q + 1 (if an intercept is included)
        = p + q (if no intercept is included)

and the numbers of degrees of freedom are (n, T − 2n).

Intuitively, if the coefficients are equal, that

is, if the restriction is not binding, then the

sum of squared residuals SSR from the re-

stricted model and the sum of squared residu-

als (SSR1+SSR2) from the unrestricted model

should be equal. Hence, F should be zero.

Conversely, if the restriction is binding, SSR should


exceed (SSR1+SSR2). And, the larger the dif-

ference between SSR and (SSR1 + SSR2), and

thus, the larger the calculated value of F, the stronger is the evidence against the hypothesis that the coefficients are equal.
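Equation (47) can be computed directly once the three sums of squared residuals are in hand; the numbers below are hypothetical.

```python
def coefficient_stability_f(ssr, ssr1, ssr2, T, n):
    # F-statistic of equation (47); degrees of freedom are (n, T - 2n)
    return ((ssr - ssr1 - ssr2) / n) / ((ssr1 + ssr2) / (T - 2 * n))

# hypothetical AR(1)-with-intercept fit (n = 2) on T = 100 observations
F = coefficient_stability_f(ssr=110.0, ssr1=50.0, ssr2=52.0, T=100, n=2)
# compare F to the F(2, 96) critical value (about 3.09 at the 5% level);
# a larger F is evidence against equal coefficients across subsamples
```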

Similarly, a model can be estimated over only a portion of the data set. The estimated model

can then be used to forecast the known values

of the series. The sum of the squared forecast

errors is a useful way to compare the adequacy

of alternative models. Those models with poor

out-of-sample forecasts should be eliminated.

9. Properties of Forecasts

One of the most important uses of ARMA

models is to forecast future values of the {yt}

sequence. To simplify the following discussion, it is assumed that the actual data generating

process and the current and past realizations

of {yt} and {t} sequences are known to the


researcher. First consider the forecasts of an AR(1) model: yt = a0 + a1yt−1 + εt. Updating one period, we obtain yt+1 = a0 + a1yt + εt+1. If we know the coefficients a0 and a1, we can forecast yt+1 conditioned on the information available at period t as

Etyt+1 = a0 + a1yt   (48)

where the notation Etyt+j stands for the con-

ditional expectation of yt+j given the informa-

tion available at period t. Formally,

Etyt+j = E(yt+j | yt, yt−1, yt−2, . . . , εt, εt−1, . . .).

In the same way, since yt+2 = a0 + a1yt+1 + εt+2, the forecast of yt+2 conditioned on the information available at period t is

Etyt+2 = a0 + a1Etyt+1

and, using (48),

Etyt+2 = a0 + a1(a0 + a1yt).


Thus the forecast of yt+1 can be used to forecast yt+2. In other words, forecasts can be constructed using forward iteration; the forecast of yt+j can be used to forecast yt+j+1. Since yt+j+1 = a0 + a1yt+j + εt+j+1, it follows that

Etyt+j+1 = a0 + a1Etyt+j.   (49)

From (48) and (49) it should be clear that it is possible to obtain the entire sequence of j-step-ahead forecasts by forward iteration. Consider

Etyt+j = a0(1 + a1 + a1² + . . . + a1^(j−1)) + a1^j yt.


This equation, called the forecast function, expresses all of the j-step-ahead forecasts as functions of the information set in period t. Unfortunately, the quality of the forecasts declines as we forecast further out into the future. Think of (49) as a first-order difference equation in the {Etyt+j} sequence. Since |a1| < 1, the difference equation is stable, and it is straightforward to find the particular solution to the difference equation. If we take the limit of Etyt+j as j → ∞, we find that Etyt+j → a0/(1 − a1). This result is quite general: for any stationary ARMA model, the conditional forecast of yt+j converges to the unconditional mean as j → ∞.
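The forward iteration in (49) and the convergence of the forecast function to a0/(1 − a1) are easy to see numerically; the parameter values below are hypothetical.

```python
a0, a1, y_t = 1.0, 0.7, 5.0    # hypothetical stationary AR(1)
forecast = y_t
path = []
for j in range(25):
    # iterate E_t y_{t+j+1} = a0 + a1 * E_t y_{t+j}, equation (49)
    forecast = a0 + a1 * forecast
    path.append(forecast)
long_run = a0 / (1 - a1)       # unconditional mean, 10/3
# path[0] is the one-step forecast 4.5; the path decays
# geometrically toward the unconditional mean
```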

Because the forecasts from an ARMA model will not be perfectly accurate, it is important to consider the properties of the forecast errors. Forecasting from time period t, we can define the j-step-ahead forecast error et(j) as the difference between the realized value of yt+j and the forecast value Etyt+j. Thus

et(j) ≡ yt+j − Etyt+j.

Hence, the 1-step-ahead forecast error is et(1) = yt+1 − Etyt+1 = εt+1 (i.e., the unforecastable portion of yt+1 given the information available in period t).

To find the two-step-ahead forecast error, we need to form et(2) = yt+2 − Etyt+2. Since yt+2 = a0 + a1yt+1 + εt+2 and Etyt+2 = a0 + a1Etyt+1, it follows that

et(2) = a1(yt+1 − Etyt+1) + εt+2 = εt+2 + a1εt+1.

Proceeding in a like manner, you can demonstrate that for the AR(1) model the j-step-ahead forecast error et(j) is given by

et(j) = εt+j + a1εt+j−1 + a1²εt+j−2 + a1³εt+j−3 + . . . + a1^(j−1)εt+1.   (50)


Since the mean of (50) is zero, the forecasts are unbiased estimates of each value yt+j. This can be seen as follows. Since Etεt+j = Etεt+j−1 = . . . = Etεt+1 = 0, the conditional expectation of (50) is Etet(j) = 0. Since the expected value of the forecast error is zero, the forecasts are unbiased.

Next we look at the variance of the forecast error. To compute it, continue to assume that the elements of the {εt} sequence are independent with a variance equal to σ². Then, using (50), the variance of the forecast error is

Var[et(j)] = σ²[1 + a1² + a1⁴ + a1⁶ + . . . + a1^(2(j−1))]   (51)

for j = 1, 2, . . . . Thus, the one-step-ahead forecast error variance is σ², the two-step-ahead forecast error variance is σ²(1 + a1²), and so forth. The essential point to note is that the


forecast error variance is an increasing function of j. Consequently, we can have more confidence in short-term forecasts than in long-term forecasts. In the limit, as j → ∞, the forecast error variance converges to σ²/(1 − a1²); hence, the forecast error variance converges to the unconditional variance of the {yt} sequence. Moreover, assuming the {εt} sequence is normally distributed, you can place confidence intervals around the forecasts. The one-step-ahead forecast of yt+1 is a0 + a1yt and the forecast error variance is σ². Therefore, the 95% confidence interval for the one-step-ahead forecast can be constructed as

a0 + a1yt ± 1.96σ.

We can construct a confidence interval for the two-step-ahead forecast in a similar way. Using (49), the two-step-ahead forecast is Etyt+2 = a0(1 + a1) + a1²yt. Again using (51), we know


that Var[et(2)] = σ²(1 + a1²). Thus, the 95% confidence interval for the two-step-ahead forecast is

a0(1 + a1) + a1²yt ± 1.96σ(1 + a1²)^(1/2).
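Combining the forecast function with (51) gives point forecasts and 95% intervals in a few lines. A sketch; the values of a0, a1, yt, and σ are hypothetical.

```python
import math

def ar1_forecast_interval(a0, a1, y_t, j, sigma):
    # j-step-ahead point forecast and 95% interval for an AR(1):
    # E_t y_{t+j} = a0 (1 + a1 + ... + a1^(j-1)) + a1^j y_t
    # Var[e_t(j)] = sigma^2 (1 + a1^2 + ... + a1^(2(j-1))), equation (51)
    point = a0 * sum(a1 ** i for i in range(j)) + a1 ** j * y_t
    half = 1.96 * sigma * math.sqrt(sum(a1 ** (2 * i) for i in range(j)))
    return point, point - half, point + half

p1, lo1, hi1 = ar1_forecast_interval(1.0, 0.7, 5.0, j=1, sigma=1.0)
p2, lo2, hi2 = ar1_forecast_interval(1.0, 0.7, 5.0, j=2, sigma=1.0)
# the two-step interval is wider by the factor (1 + a1^2)^(1/2)
```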

Higher-Order Models

Now we generalize the above discussion to derive forecasts for any ARMA(p, q) model. To keep the algebra simple, consider the ARMA(2,1) model

yt = a0 + a1yt−1 + a2yt−2 + εt + β1εt−1.   (52)

Updating one period yields

yt+1 = a0 + a1yt + a2yt−1 + εt+1 + β1εt.

If we continue to assume that (1) all the coefficients are known; (2) all variables subscripted t, t − 1, t − 2, . . . are known at period t; and (3) Etεt+j = 0 for j > 0, the conditional expectation of yt+1 is

Etyt+1 = a0 + a1yt + a2yt−1 + β1εt.   (53)


Equation (53) is the one-step-ahead forecast of yt+1. The one-step-ahead forecast error is et(1) = yt+1 − Etyt+1 = εt+1.

To find the two-step-ahead forecast, update (52) by two periods:

yt+2 = a0 + a1yt+1 + a2yt + εt+2 + β1εt+1.

The conditional expectation of yt+2 is

Etyt+2 = a0 + a1Etyt+1 + a2yt.   (54)

Equation (54) expresses the two-step-ahead forecast in terms of the one-step-ahead forecast and the current value of yt. Combining (53) and (54) yields

Etyt+2 = a0 + a1[a0 + a1yt + a2yt−1 + β1εt] + a2yt
       = a0(1 + a1) + (a1² + a2)yt + a1a2yt−1 + a1β1εt.


To find the two-step-ahead forecast error, subtract (54) from yt+2. Thus,

et(2) = yt+2 − Etyt+2
      = [a0 + a1yt+1 + a2yt + εt+2 + β1εt+1] − [a0 + a1Etyt+1 + a2yt]
      = a1(yt+1 − Etyt+1) + εt+2 + β1εt+1.   (55)

Since yt+1 − Etyt+1 is equal to the one-step-ahead forecast error εt+1, we can write the forecast error as et(2) = (a1 + β1)εt+1 + εt+2. Alternatively,

et(2) = yt+2 − Etyt+2
      = [a0 + a1yt+1 + a2yt + εt+2 + β1εt+1] − [a0(1 + a1) + (a1² + a2)yt + a1a2yt−1 + a1β1εt]
      = (a1 + β1)εt+1 + εt+2.   (56)

Finally, all j-step-ahead forecasts can be obtained from

Etyt+j = a0 + a1Etyt+j−1 + a2Etyt+j−2,   j ≥ 2.   (57)


Equation (57) suggests that the forecasts will satisfy a second-order difference equation. As long as the characteristic roots of (57) lie inside the unit circle, the forecasts will converge to the unconditional mean a0/(1 − a1 − a2). We can use (57) to find the j-step-ahead forecast errors. Since yt+j = a0 + a1yt+j−1 + a2yt+j−2 + εt+j + β1εt+j−1, the j-step-ahead forecast error is

et(j) = yt+j − Etyt+j
      = [a0 + a1yt+j−1 + a2yt+j−2 + εt+j + β1εt+j−1] − Et[a0 + a1yt+j−1 + a2yt+j−2 + εt+j + β1εt+j−1]
      = a1(yt+j−1 − Etyt+j−1) + a2(yt+j−2 − Etyt+j−2) + εt+j + β1εt+j−1
      = a1et(j − 1) + a2et(j − 2) + εt+j + β1εt+j−1.   (58)
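Equations (53) and (57) translate into a short forecast recursion: the MA term enters only the one-step forecast, after which the forecasts follow the AR dynamics. The coefficient values below are hypothetical.

```python
def arma21_forecasts(a0, a1, a2, beta1, y_t, y_tm1, eps_t, horizon):
    # one-step forecast from (53); later steps from the recursion (57)
    f = [a0 + a1 * y_t + a2 * y_tm1 + beta1 * eps_t]
    prev2, prev1 = y_t, f[0]
    for _ in range(horizon - 1):
        prev2, prev1 = prev1, a0 + a1 * prev1 + a2 * prev2
        f.append(prev1)
    return f

f = arma21_forecasts(a0=1.0, a1=0.6, a2=0.2, beta1=0.3,
                     y_t=4.0, y_tm1=3.0, eps_t=0.5, horizon=20)
long_run = 1.0 / (1 - 0.6 - 0.2)   # forecasts converge to a0/(1 - a1 - a2)
```

With these (stationary) coefficients the forecast path decays toward the unconditional mean of 5, illustrating the convergence result stated above.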


In practice, we will not know the actual order of the ARMA process or the actual values of the coefficients of that process. Instead, to create out-of-sample forecasts, it is necessary to use the estimated coefficients from what we believe to be the appropriate form of the ARMA model. Suppose we have T observations of the {yt} sequence and choose to fit an ARMA(2,1) model to the data. Let a hat or caret (i.e., ^) over a parameter denote the estimated value of the parameter and let {ε̂t} denote the residuals of the estimated model. Hence the estimated ARMA(2,1) model can be written as

yt = â0 + â1yt−1 + â2yt−2 + ε̂t + β̂1ε̂t−1.

Given that the sample contains T observations, the out-of-sample forecasts can easily be constructed. For example, we can use (53) to forecast the value of yT+1 conditional on the T observations as

ETyT+1 = â0 + â1yT + â2yT−1 + β̂1ε̂T.   (59)


Once we know the values of â0, â1, â2, and β̂1, (59) can easily be constructed using the actual values of yT, yT−1, and ε̂T. Similarly, the forecast of yT+2 can be constructed as

ETyT+2 = â0 + â1ETyT+1 + â2yT

where ETyT+1 is the forecast from (59). Given these two forecasts, all subsequent forecasts can be obtained from the difference equation

ETyT+j = â0 + â1ETyT+j−1 + â2ETyT+j−2,   j ≥ 2.

Note: it is much more difficult to construct

confidence intervals for the forecast errors. Not

only is it necessary to include the effects of

the stochastic variation in the future values of{yT+1}, it is also necessary to incorporate the

fact that the coefficients are estimated with

errors.


Now that we have estimated a series and have forecasted its future values, the obvious question is: How good are our forecasts? Typically, there will be several plausible models that we can select to use for our forecasts. Do not be fooled into thinking that the one with the best fit is the one that will forecast the best. To make a simple point, suppose you wanted to forecast the future values of the ARMA(2,1) process given by (52). If you could forecast the value of yT+1 using (53), you would obtain the forecast error

eT(1) = yT+1 − a0 − a1yT − a2yT−1 − β1εT = εT+1.

Since the forecast error is the pure unforecastable portion of yT+1, no other ARMA model can provide you with superior forecasting performance. However, we need to estimate the parameters of the process, so our forecasts must be made using (59). Therefore, our estimated forecast error will be

êT(1) = yT+1 − (â0 + â1yT + â2yT−1 + β̂1ε̂T).


Clearly, the two forecast errors are not identical. When we forecast using (59), the coefficients (and residuals) are estimated imprecisely. The forecasts made using the estimated

model extrapolate this coefficient uncertainty

into the future. Since coefficient uncertainty

increases as the model becomes more complex,

it could be that an estimated AR(1) model

forecasts the process given by (52) better than

an estimated ARMA(2,1) model.

How do we know which one of several reasonable models has the best forecasting performance? One way to find out is to test. Since the future values of the series are unknown, you can hold back a portion of the observations from the estimation process, estimate the alternative models over the shortened span of data, and use these estimates to forecast the observations of the holdback period. You can then compare the properties of


the forecast errors from the alternative models. To take a simple example, suppose that {yt} contains a total of 150 observations and that you are unsure as to whether an AR(1) or an MA(1) model best captures the behavior of the series. One way to proceed is to use the first 100 observations to estimate both models and use each to forecast the value of y101. Since you know the actual value of y101, you can construct the forecast errors obtained from the AR(1) and from the MA(1). Now reestimate the AR(1) and the MA(1) model using the first 101 observations. Although the estimated coefficients will change somewhat, they are those that someone would have obtained in period 101. Use the two models to forecast the value of y102. Given that you know the actual value of y102, you can construct two more forecast errors. Since you know all the values


of the {yt} sequence through period 150, you can continue this process to obtain two series of one-step-ahead forecast errors, each containing 50 errors. To keep the notation simple, let {f1t} and {f2t} denote the sequences of forecasts from the AR(1) and the MA(1), respectively. Similarly, let {e1t} and {e2t} denote the sequences of forecast errors from the AR(1) and the MA(1), respectively. Then it should be clear that f11 = E100y101 is the first forecast using the AR(1), e11 = y101 − f11 is the first forecast error (where the first holdback observation is y101), and e2,50 is the last forecast error from the MA(1).

It is desirable that the forecast errors have mean zero and a small variance. A regression-based method to assess the forecasts is to use the 50 forecasts from the AR(1) to estimate an equation of the form

y100+t = a0 + a1f1t + v1t,   t = 1, 2, . . . , 50.


If the forecasts are unbiased, an F-test shouldallow us to impose the restriction a0 = 0 and

a1 = 1. Similarly, the residual series v1t should

act as a white noise process. It is a good idea

to plot v1t against y100+t to determine if there

are periods in which our forecasts are espe-

cially poor. Now repeat the process with the

forecasts from the MA(1). In particular, use

the 50 forecasts from the MA(1) to estimate

y100+t = b0 + b1 f2t + v2t,   t = 1, 2, . . . , 50.

Again, if we use an F-test, we should not be able to reject the joint hypothesis b0 = 0 and

b1 = 1. If the significance levels from the two

F-tests are similar, we might select the model

with the smallest residual variance; that is, select the AR(1) if Var(v1t) < Var(v2t).
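The unbiasedness regression and the joint test a0 = 0, a1 = 1 can be sketched as follows, computing the F-statistic from restricted and unrestricted sums of squared residuals. The function name and the simulated forecasts are our own, not from the text:

```python
import numpy as np

def forecast_unbiasedness_f(actual, forecast):
    """Regress actual on forecast; return the OLS estimates and the
    F-statistic for the joint restriction a0 = 0, a1 = 1
    (2 restrictions, H - 2 residual degrees of freedom)."""
    H = len(actual)
    X = np.column_stack([np.ones(H), forecast])
    beta, *_ = np.linalg.lstsq(X, actual, rcond=None)
    ssr_u = np.sum((actual - X @ beta) ** 2)  # unrestricted SSR
    ssr_r = np.sum((actual - forecast) ** 2)  # SSR with a0 = 0, a1 = 1 imposed
    F = ((ssr_r - ssr_u) / 2) / (ssr_u / (H - 2))
    return beta[0], beta[1], F

# Unbiased forecasts: actual = forecast + noise, so a0 ~ 0, a1 ~ 1, F small
rng = np.random.default_rng(1)
f = rng.standard_normal(50)
y = f + 0.3 * rng.standard_normal(50)
a0_hat, a1_hat, F = forecast_unbiasedness_f(y, f)
```

Running the same function on the MA(1) forecasts and comparing the two significance levels mirrors the model-selection step described above.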

More generally, we might want to have a hold-

back period that differs from 50 observations.


With a very small sample, it may not be possible to hold back 50 observations. Small samples are a problem since Ashley (1997) shows

that very large samples are often necessary to

reveal a significant difference between the out-

of-sample forecasting performances of similar

models. Hence, we need to have enough ob-

servations to have well-estimated coefficients

for the in-sample period and enough out-of-

sample forecasts so that the test has good

power. If we have a large sample, it is typi-

cal to hold back as much as 50% of the data set. Also, we might want to use j-step-ahead forecasts.

For example, if we have quarterly data and

want to forecast one year into the future, we

can perform the analysis using four-step-ahead forecasts. Nevertheless, once we have the two

sequences of forecast errors, we can compare

their properties.


Instead of using a regression-based approach, a researcher could select the model with the smallest mean square prediction error (MSPE). If there are H observations in the hold-back period, the MSPE for the AR(1) can be calculated as

MSPE = (1/H) Σ_{i=1}^{H} e_{1i}^2.

Several methods have been proposed to deter-

mine whether one MSPE is statistically differ-

ent from the other. If we put the larger of

the two MSPEs in the numerator, a standard

recommendation is to use the F-statistic

F = Σ_{i=1}^{H} e_{1i}^2 / Σ_{i=1}^{H} e_{2i}^2    (60)

The intuition is that the value of F will equal

unity if the forecast errors from the two models

are identical. A very large value of F implies

that the forecast errors from the first model

are substantially larger than those from the

second. Under the null hypothesis of equal


forecasting performance, (60) has a standard

F distribution with (H, H) degrees of freedom if the following three assumptions hold. The fore-

cast errors are

1. normally distributed with zero mean,

2. serially uncorrelated, and

3. contemporaneously uncorrelated.
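The MSPE ratio in (60) is simple enough to compute directly; a minimal sketch with the larger MSPE placed in the numerator, using our own helper names:

```python
import numpy as np

def mspe(e):
    """Mean square prediction error over the hold-back errors."""
    return np.mean(np.asarray(e, dtype=float) ** 2)

def mspe_f_ratio(e1, e2):
    """F-ratio of the two MSPEs with the larger in the numerator, as in (60).
    Equals 1 when the two forecast-error series are identical."""
    m1, m2 = mspe(e1), mspe(e2)
    return max(m1, m2) / min(m1, m2)
```

A value near 1 is consistent with equal forecast accuracy; large values favor the model in the denominator.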

Although it is common practice to assume that

the {et} sequence is normally distributed, it is

not necessarily the case that the forecast errors

are normally distributed with zero mean. Similarly, the forecast errors may be serially correlated; this is particularly true if we use multi-step-

ahead forecasts. For example, equation (56)

indicated that the two-step-ahead forecast er-

ror for yt+2 is

et(2) = (a1 + β1)ε_{t+1} + ε_{t+2}

and updating et(2) by one period yields the two-step-ahead forecast error for y_{t+3} as

e_{t+1}(2) = (a1 + β1)ε_{t+2} + ε_{t+3}.


Thus predicting y_{t+2} from the perspective of period t and predicting y_{t+3} from the perspective of period t + 1 both contain an error due to the presence of ε_{t+2}. This induces serial correlation between the two forecast errors. Formally, it can be seen as follows:

E[et(2) e_{t+1}(2)] = (a1 + β1)σ^2 ≠ 0.

However, for i > 1, E[et(2) e_{t+i}(2)] = 0 since there are no overlapping forecasts. Hence, the autocorrelations of the two-step-ahead forecast errors cut off to zero after lag 1. As an exercise, you can demonstrate the general result that j-step-ahead forecast errors act as an MA(j − 1) process.
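The MA(1) behavior of two-step-ahead errors is easy to verify by simulation. For a pure AR(1), the two-step error is et(2) = a1 ε_{t+1} + ε_{t+2}, whose lag-1 autocorrelation is a1/(1 + a1^2) while higher lags are zero. The model, seed, and use of the true a1 below are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
a1, n = 0.5, 5000
eps = rng.standard_normal(n)
y = np.empty(n)
y[0] = eps[0]
for t in range(1, n):
    y[t] = a1 * y[t - 1] + eps[t]      # pure AR(1)

# Two-step-ahead errors using the true a1: the conditional forecast of
# y_{t+2} is a1^2 * y_t, so e_t(2) = y_{t+2} - a1^2 * y_t
e2 = y[2:] - a1 ** 2 * y[:-2]

def acorr(x, k):
    """Sample autocorrelation of x at lag k."""
    x = x - x.mean()
    return (x[k:] @ x[:-k]) / (x @ x)

# acorr(e2, 1) should be near a1/(1 + a1**2) = 0.4; acorr(e2, 2) near 0
```

The lag-1 autocorrelation is clearly nonzero while the lag-2 value is negligible, matching the MA(j − 1) result for j = 2.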

Finally, the forecast errors from the two alternative models will usually be highly correlated with each other. For example, a negative realization of ε_{t+1} will tend to cause the forecasts from both models to be too high. Also note: the violation of any of the three assumptions means that the ratio of the MSPEs in (60) does not have an F-distribution.


The Granger-Newbold Test

Granger and Newbold (1976) show how to overcome the problem of contemporaneously correlated forecast errors. Use the two sequences of forecast errors to form

xt = e1t + e2t and zt = e1t − e2t.

If assumptions 1 and 2 are valid, then under the null hypothesis of equal forecast accuracy, xt and zt should be uncorrelated. That is,

E(xt zt) = E(e_{1t}^2 − e_{2t}^2)

should be zero. Model 1 has a larger MSPE if E(xt zt) is positive and Model 2 has a larger MSPE if E(xt zt) is negative. Let rxz denote the sample correlation coefficient between {xt} and {zt}. Granger and Newbold (1976) show that

rxz / sqrt[(1 − rxz^2)/(H − 1)]    (61)

has a t-distribution with H − 1 degrees of freedom. Thus, if rxz is statistically significantly different from zero, model 1 has a larger MSPE if rxz is positive and model 2 has a larger MSPE if rxz is negative.
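The statistic in (61) can be sketched directly from the two error series. The function name and the simulated errors, in which model 1's errors are deliberately twice as volatile as model 2's, are our own:

```python
import numpy as np

def granger_newbold(e1, e2):
    """Granger-Newbold statistic (61): the sample correlation between
    x = e1 + e2 and z = e1 - e2, scaled so that it is t-distributed with
    H - 1 degrees of freedom under equal forecast accuracy."""
    e1, e2 = np.asarray(e1, dtype=float), np.asarray(e2, dtype=float)
    H = len(e1)
    x, z = e1 + e2, e1 - e2
    r = np.corrcoef(x, z)[0, 1]
    return r / np.sqrt((1 - r ** 2) / (H - 1))

# Model 1 errors twice as volatile as model 2's, so the statistic
# should be significantly positive
rng = np.random.default_rng(2)
stat = granger_newbold(2 * rng.standard_normal(100),
                       rng.standard_normal(100))
```

A significantly positive value indicates the larger MSPE belongs to model 1, exactly as the text describes.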


The Diebold-Mariano Test

Diebold and Mariano (1995) relax assumptions 1 to 3 and allow for an objective function that is not quadratic. This is important because if, for example, an investor's loss depends on the size of the forecast error, the forecaster should be concerned with the absolute values of the forecast errors. As another example, consider an option that has a pay-off of zero if the value of the underlying asset lies below the strike price but receives a one-dollar pay-off for each dollar the asset price rises above the strike price.

If we consider only one-step-ahead forecasts,

we can eliminate the subscript j and let the

loss from a forecast error in period i be denoted

by g(ei). In the typical case of mean squared errors, the loss is e_i^2. To allow the loss function

to be general, we can write the differential loss

in period i from using model 1 versus model 2


as di = g(e1i) − g(e2i). The mean loss can be obtained as

d̄ = (1/H) Σ_{i=1}^{H} [g(e_{1i}) − g(e_{2i})].    (62)

Under the null hypothesis of equal forecast accuracy, the value of d̄ is zero. Since d̄ is the mean of the individual loss differentials, under fairly weak conditions, the Central Limit Theorem implies that d̄ should have a normal distribution. Hence it is not necessary to assume that the individual forecast errors are normally distributed. Thus, if we knew Var(d̄), we could construct the ratio d̄/sqrt(Var(d̄)) and test the null hypothesis of equal forecast accuracy using a standard normal distribution. In practice, to implement the test we first need to estimate Var(d̄).

If the {di} series is serially uncorrelated with a sample variance of γ0, the estimate of Var(d̄) is γ0/(H − 1).


For sequences of j-step-ahead forecasts, the DM statistic is

DM = d̄ / sqrt[ (γ0 + 2γ1 + . . . + 2γq) / (H + 1 − 2j + H^{-1} j(j − 1)) ].
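The one-step-ahead version (j = 1, serially uncorrelated loss differentials) can be sketched as follows, with Var(d̄) estimated by γ0/(H − 1). The function name, the default quadratic loss, and the simulated errors are our own illustrative assumptions:

```python
import numpy as np

def diebold_mariano(e1, e2, loss=np.square):
    """One-step-ahead DM statistic with a general loss g: the mean loss
    differential divided by sqrt(gamma0/(H - 1)), where gamma0 is the
    sample variance of the differentials (serially uncorrelated case)."""
    d = loss(np.asarray(e1, dtype=float)) - loss(np.asarray(e2, dtype=float))
    H = len(d)
    gamma0 = d.var()                 # sample variance of the d_i
    return d.mean() / np.sqrt(gamma0 / (H - 1))

# Model 1 errors twice as volatile as model 2's: the statistic
# should be clearly positive
rng = np.random.default_rng(4)
stat = diebold_mariano(2 * rng.standard_normal(200),
                       rng.standard_normal(200))
```

Passing `loss=np.abs` gives the absolute-error version that a non-quadratic objective, such as the options example above, would call for.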

An example showing the appropriate use of the

Granger-Newbold and Diebold-Mariano tests is provided in the next section.

10. A Model of the Producer Price Index

This section is intended to illustrate some of

the ambiguities frequently encountered in the

Box-Jenkins technique. These ambiguities may lead two equally skilled econometricians to es-

timate and forecast the same series using very

different ARMA processes. Nonetheless, if you

make reasonable choices, you will select mod-

els that come very close to mimicking the ac-

tual data generating process.

Now we illustrate the Box-Jenkins

modeling procedure by estimating a quarterly


model of the U.S. Producer Price Index (PPI).

The data used in this section are the series labeled PPI in the file QUARTERLY.XLS. Panel

(a) of Figure 2.5 clearly reveals that there is

little point in modeling the series as being sta-

tionary; there is a decidedly positive trend or

drift throughout the period 1960Q1 to 2002Q1. The first difference of the series seems to have

a constant mean, although inspection of Panel

(b) suggests that the variance is an increasing

function of time. As shown in Panel (c), the

first difference of the logarithm (denoted by Δlppi_t) is the most likely candidate to be covariance stationary. Moreover, there is a strong economic reason to be interested in the logarithmic change since Δlppi_t is a measure of

inflation. However, the large volatility of the

PPI accompanying the oil price shocks in the

1970s should make us somewhat wary of the

assumption that the process is covariance stationary. At this point, some researchers would make additional transformations intended to reduce the volatility exhibited in the 1970s. However,

it seems reasonable to estimate a model of the

{Δlppi_t} sequence without any further trans-

formations. As always, you should maintain

a healthy skepticism of the accuracy of your

model.

The autocorrelation and partial autocorrela-

tion functions of the {Δlppi_t} sequence can

be seen in Figure 2.6. Let us try to identify

the tentative models that we would want to estimate. In making our decision, we note the

following:

1. The ACF and PACF converge to zero rea-

sonably quickly. We do not want to overdifference the data and try to model the {Δ²lppi_t} sequence.

2. The theoretical ACF of a pure MA(q) pro-

cess cuts off to zero at lag q and the theoretical


ACF of an AR(1) process decays geometrically.

Examination of Figure 2.6 suggests that neither of these specifications seems appropriate

for the sample data.

3. The ACF does not decay geometrically.

The value of ρ1 is 0.603 and the values of ρ2, ρ3, and ρ4 are 0.494, 0.451, and 0.446, respectively. Thus the ACF is suggestive of

an AR(2) process or a process with both au-

toregressive and moving average components.

The PACF is such that φ11 = 0.604 and cuts off abruptly to 0.203 (i.e., φ22 = 0.203). Overall, the PACF suggests that we should consider models with p = 1 and p = 2.

4. Note the jump in the ACF after lag 4 and the small jump in the PACF at lag 4 (φ44 = 0.148 while φ55 = −0.114). Since we are using quar-

terly data, we might want to incorporate a sea-

sonal factor at lag 4.

Points 1 to 4 suggest an ARMA(1,1) or an

AR(2) model. In addition, we might want to


consider models with a seasonal term at lag 4.
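The identification statistics quoted above (ρ1, φ11, and so on) take only a few lines to compute. This sketch estimates the PACF by successive AR(k) regressions, one of several standard estimators, and uses a simulated AR(1) series rather than the Δlppi data:

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_1, ..., rho_max_lag."""
    y = y - y.mean()
    c0 = (y @ y) / len(y)
    return np.array([(y[k:] @ y[:-k]) / len(y) / c0
                     for k in range(1, max_lag + 1)])

def sample_pacf(y, max_lag):
    """Sample PACF: phi_kk is the last slope coefficient of an OLS AR(k)."""
    phis = []
    for k in range(1, max_lag + 1):
        # columns: constant, y_{t-1}, ..., y_{t-k}; target: y_t for t >= k
        X = np.column_stack([np.ones(len(y) - k)] +
                            [y[k - j: len(y) - j] for j in range(1, k + 1)])
        b, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        phis.append(b[-1])
    return np.array(phis)

# AR(1) with a1 = 0.6: the ACF decays geometrically while the PACF
# cuts off after lag 1
rng = np.random.default_rng(5)
eps = rng.standard_normal(500)
y = np.empty(500)
y[0] = eps[0]
for t in range(1, 500):
    y[t] = 0.6 * y[t - 1] + eps[t]
```

For the AR(1) series, the estimated ρ1 and φ11 are both close to 0.6 while φ22 is close to zero, the pattern points 2 and 3 look for in the data.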

However, to compare a variety of models, Table 2.4 reports estimates of six tentative

models. To ensure comparability, all were esti-

mated over the same sample period. We make

the following observations:

1. The estimated AR(1) model confirms our analysis conducted in the identification stage. Even though the estimated value of a1 (0.603) is less than unity in absolute value and almost four standard deviations from zero, the Ljung-Box Q-statistic for 4 lags of the residuals yields a value of 13.9, so we can reject the null hypothesis that these residual autocorrelations are jointly zero at the 1% significance level. Hence, the residuals of this model exhibit substantial serial correlation and we

must eliminate this model from consideration.
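The Q-statistic used in that comparison can be sketched as follows; the comparison value 13.28 in the comment is the 1% critical value of a chi-square with 4 degrees of freedom, consistent with rejecting the AR(1) at Q(4) = 13.9. The white-noise residual series below is simulated, not the Table 2.4 residuals:

```python
import numpy as np

def ljung_box_q(resid, max_lag):
    """Ljung-Box Q = T(T+2) * sum_{k=1}^{max_lag} r_k^2 / (T - k);
    approximately chi-square with max_lag degrees of freedom when the
    residuals are white noise."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    T = len(r)
    c0 = r @ r
    q = sum(((r[k:] @ r[:-k]) / c0) ** 2 / (T - k)
            for k in range(1, max_lag + 1))
    return T * (T + 2) * q

# White-noise residuals should give a small Q(4); the 1% critical value
# of a chi-square with 4 degrees of freedom is about 13.28
rng = np.random.default_rng(6)
q4 = ljung_box_q(rng.standard_normal(169), 4)
```

A strongly autocorrelated residual series, by contrast, produces a Q-statistic far above the critical value.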

2. The AR(2) model is an improvement over

the AR(1) specification. The estimated coef-

ficients (a1 = 0.480 and a2 = 0.209) are each


significantly different from zero at conventional

levels and imply characteristic roots inside the unit circle. However, there is some ambiguity about

the information content of the residuals. The

Q-statistics indicate that the autocorrelations

of the residuals are not statistically significant

at the 5% level but are significant at the 10%

level. As measured by the AIC and SBC, the

fit of the AR(2) model is superior to that of

the AR(1). Overall, the AR(2) model domi-

nates the AR(1) specification.

3. The ARMA(1,1) specification is superior

to the AR(2) model. The estimated coefficients are highly significant (with t-values of 14.9 and −4.41). The estimated value of a1 is

positive and less than unity and the Q-statistics

indicate that the autocorrelations of the resid-

uals are not significant at conventional levels.

Moreover, all goodness-of-fit measures select

the ARMA(1,1) specification over the AR(2)

model. Thus, there is little reason to maintain

the AR(2) specification.


4. In order to account for the possibility of seasonality, we estimated the ARMA(1,1) model with an additional moving average coefficient at lag 4. That is, we estimated a model of the form

yt = a0 + a1 y_{t-1} + ε_t + β1 ε_{t-1} + β4 ε_{t-4}.

Other seasonal patterns are considered in the next section. For now, note that the additive expression β4 ε_{t-4} is often preferable to an additive autoregressive term a4 y_{t-4}. For truly seasonal shocks, the expression β4 ε_{t-4} captures spikes, rather than decay, at the quarterly lags. The

slope coefficients of the estimated ARMA(1,

(1,4)) model are all highly significant with t-

statistics of 9.46, -3.41, and 3.63. The Q-

statistics of the residuals are all very low, im-

plying that the autocorrelations are not statis-

tically significantly different from zero. Moreover, the AIC and SBC select this model over the ARMA(1,1) model.
