
Chapter 2 Stationary Time Series Models

This chapter develops the Box-Jenkins methodology for estimating time series models of the form

y_t = a_0 + a_1 y_{t-1} + ... + a_p y_{t-p} + ε_t + β_1 ε_{t-1} + ... + β_q ε_{t-q}

which are called autoregressive integrated moving average (ARIMA) models.

The chapter has three aims:

1. Present the theory of stochastic linear difference equations and consider the time series properties of stationary ARIMA models; a stationary ARIMA model is called an autoregressive moving average (ARMA) model.


2. Develop tools used in estimating ARMA models. Especially useful are the autocorrelation function (ACF) and the partial autocorrelation function (PACF).

3. Consider various test statistics to check for model adequacy and show how a properly estimated model can be used for forecasting.

1. Stochastic Difference Equation Models

Stochastic difference equations are a convenient way of modeling dynamic economic processes. To take a simple example, suppose the Federal Reserve's money supply target grows 3% each period. Hence,

m*_t = 1.03 m*_{t-1}     (1)

so that, given the initial condition m*_0, the particular solution is

m*_t = (1.03)^t m*_0


where m*_t = the logarithm of the money supply target in period t and m*_0 = the logarithm of the money supply target in period 0.

Of course, the actual money supply, m_t, and the target money supply, m*_t, need not be equal.

Suppose that at the beginning of period t there are m_{t-1} dollars, so that the gap between the target and the actual money supply is m*_t - m_{t-1}.

Suppose that the Fed cannot perfectly control the money supply but attempts to change the money supply by ρ percent of any gap between the desired and actual money supply. We can model this behavior as

Δm_t = ρ[m*_t - m_{t-1}] + ε_t


Using (1), we obtain

m_t = ρ(1.03)^t m*_0 + (1 - ρ)m_{t-1} + ε_t     (2)

where ε_t is the uncontrollable portion of the money supply, and we assume its mean is zero in all time periods.

Although the model is overly simple, it does illustrate the key points:

1. Equation (2) is a discrete difference equation. Since {ε_t} is stochastic, the money supply is stochastic; we call (2) a linear stochastic difference equation.


2. If we knew the distribution of {ε_t}, we could calculate the distribution for each element in the {m_t} sequence. Since (2) shows how the realizations of the {m_t} sequence are linked across time, we would be able to calculate the various joint probabilities. We note that the distribution of the money supply sequence is completely determined by the parameters of the difference equation (2) and the distribution of the {ε_t} sequence.

3. Having observed the first t observations in the {m_t} sequence, we can make forecasts of m_{t+1}, m_{t+2}, .... For example, updating (2) by one period and taking the conditional expectation, the forecast of m_{t+1} is E_t m_{t+1} = ρ(1.03)^{t+1} m*_0 + (1 - ρ)m_t.
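To make these points concrete, here is a minimal simulation sketch in Python with numpy. The parameter values ρ = 0.5, m*_0 = 1, σ = 0.01, and T = 40 are illustrative assumptions, not values taken from the slides. The sketch generates one realization of equation (2) and forms the one-step-ahead forecast E_t m_{t+1}.

    import numpy as np

    rng = np.random.default_rng(0)
    rho, m0_star, sigma, T = 0.5, 1.0, 0.01, 40    # illustrative values only

    m_star = m0_star * 1.03 ** np.arange(T + 1)    # target path, equation (1)
    m = np.empty(T + 1)
    m[0] = m0_star                                 # start the actual series at the target
    for t in range(1, T + 1):
        eps = rng.normal(0.0, sigma)               # uncontrollable portion of the money supply
        m[t] = rho * m_star[t] + (1 - rho) * m[t - 1] + eps   # equation (2)

    # One-step-ahead forecast made at period T; the conditional expectation sets eps to zero
    forecast = rho * m0_star * 1.03 ** (T + 1) + (1 - rho) * m[T]
    print(f"m_T = {m[T]:.4f}, forecast of m_(T+1) = {forecast:.4f}")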


A white noise process (a sequence {ε_t} with zero mean, constant variance σ², and E ε_t ε_{t-s} = 0 for all s ≠ 0) can be used to construct more interesting time series processes. For example, the time series

x_t = Σ_{i=0}^{q} β_i ε_{t-i}     (3)

is constructed by taking the values ε_t, ε_{t-1}, ..., ε_{t-q} and multiplying each by the associated value of β_i.

A series formed in this manner is called a moving average of order q.

It is denoted by MA(q).

Although the sequence {ε_t} is a white noise process, the sequence {x_t} will not be a white noise process if two or more of the β_i are different from zero.


To illustrate using an MA(1) process, set β_0 = 1, β_1 = 0.5, and all other β_i = 0. Then

E(x_t) = E(ε_t + 0.5ε_{t-1}) = 0
Var(x_t) = Var(ε_t + 0.5ε_{t-1}) = 1.25σ²
E(x_t) = E(x_{t-s}) and Var(x_t) = Var(x_{t-s}) for all s

Hence, the first two conditions for {x_t} to be white noise are satisfied. However,

E(x_t x_{t-1}) = E[(ε_t + 0.5ε_{t-1})(ε_{t-1} + 0.5ε_{t-2})]
= E[ε_t ε_{t-1} + 0.5(ε_{t-1})² + 0.5ε_t ε_{t-2} + 0.25ε_{t-1} ε_{t-2}]
= 0.5σ²

Given that there exists a value of s ≠ 0 such that E(x_t x_{t-s}) ≠ 0, the sequence {x_t} is not a white noise process.
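These moment calculations are easy to check numerically. The sketch below (Python with numpy; σ = 1 and the sample size are assumptions made only for illustration) simulates x_t = ε_t + 0.5ε_{t-1} and compares the sample variance and lag-1 autocovariance with the theoretical values 1.25σ² and 0.5σ².

    import numpy as np

    rng = np.random.default_rng(1)
    sigma, n = 1.0, 200_000                     # illustrative settings

    eps = rng.normal(0.0, sigma, n + 1)
    x = eps[1:] + 0.5 * eps[:-1]                # MA(1): x_t = eps_t + 0.5*eps_(t-1)

    xbar = x.mean()
    var_x = x.var()
    cov_1 = np.mean((x[1:] - xbar) * (x[:-1] - xbar))
    cov_2 = np.mean((x[2:] - xbar) * (x[:-2] - xbar))

    print(f"Var(x):  sample {var_x:.3f}  vs theory {1.25 * sigma**2:.3f}")
    print(f"gamma_1: sample {cov_1:.3f}  vs theory {0.5 * sigma**2:.3f}")
    print(f"gamma_2: sample {cov_2:.3f}  vs theory 0.000")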


2. ARMA Models

It is possible to combine a moving average process with a linear difference equation to obtain an autoregressive moving average model. Consider the p-th order difference equation

y_t = a_0 + Σ_{i=1}^{p} a_i y_{t-i} + x_t.     (4)

Now let {x_t} be the MA(q) process given by (3), so that we can write

y_t = a_0 + Σ_{i=1}^{p} a_i y_{t-i} + Σ_{i=0}^{q} β_i ε_{t-i}     (5)

where by convention we normalize β_0 to unity.

If the characteristic roots of (5) are all inside the unit circle, then y_t is said to follow an autoregressive moving average (ARMA) model.


The autoregressive part of the model is the difference equation given by the homogeneous portion of (4), and the moving average part is the x_t sequence.

If the homogeneous part of the difference equation contains p lags and the model for x_t contains q lags, the model is called an ARMA(p,q) model.

If q = 0, the model is a pure autoregressive model denoted by AR(p).

If p = 0, the model is a pure moving average model denoted by MA(q).

In an ARMA model, it is permissible to allow p and/or q to be infinite.


If one or more characteristic roots of (5) are greater than or equal to unity, the {y_t} sequence is called an integrated process and (5) is called an autoregressive integrated moving average (ARIMA) model.

This chapter considers only models in which all of the characteristic roots of (5) are within the unit circle.

Treating (5) as a difference equation suggests that y_t can be solved in terms of the {ε_t} sequence.

The solution of an ARMA(p,q) model expressing y_t in terms of the {ε_t} sequence is the moving average representation of y_t.


For the AR(1) model y_t = a_0 + a_1 y_{t-1} + ε_t, the moving average representation can be shown to be

y_t = a_0/(1 - a_1) + Σ_{i=0}^{∞} a_1^i ε_{t-i}

For the general ARMA(p,q) model, using the lag operator L, (5) can be rewritten as

(1 - Σ_{i=1}^{p} a_i L^i) y_t = a_0 + Σ_{i=0}^{q} β_i ε_{t-i}

so the particular solution for y_t is

y_t = (a_0 + Σ_{i=0}^{q} β_i ε_{t-i}) / (1 - Σ_{i=1}^{p} a_i L^i)     (6)

The expansion of (6) yields an MA(∞) process.


Issue: whether the expansion is convergent, so that the stochastic difference equation given by (6) is stable.

We will see in the next section that the stability condition is that the roots of the polynomial (1 - Σ_{i=1}^{p} a_i L^i) must lie outside the unit circle.

We will also see that, if y_t is a linear stochastic difference equation, the stability condition is a necessary condition for the time series {y_t} to be stationary.

3. Stationarity

Suppose the quality control division of a manufacturing firm samples four machines each hour. Every hour, quality control finds the mean of the machines' output levels.


The plot of each machine's hourly output is shown in Figure 2.1. If y_{it} represents machine i's output at hour t, the hourly means (ȳ_t) are readily calculated as

ȳ_t = Σ_{i=1}^{4} y_{it}/4.

For hours 5, 10, and 15, these mean values are 4.61, 5.14, and 5.03, respectively.

The sample variance for each hour can similarly be constructed.

Unfortunately, we do not usually have the luxury of being able to obtain an ensemble, that is, multiple observations of the same process over the same time period.

Typically, we observe only one set of realizations, that is, one observation of the process, over a given time period.


Fortunately, if {y_t} is a stationary series, the mean, variance, and autocorrelations can be well approximated by sufficiently long time averages based on the single set of realizations.

Suppose you observed the output of machine 1 for 20 periods. If you knew that the output was stationary, you could approximate the mean level of output by

ȳ = Σ_{t=1}^{20} y_{1t}/20.

In using this approximation you would be assuming that the mean was the same for each period. Formally, a stochastic process having a finite mean and variance is covariance stationary if, for all t and t - s,

E(y_t) = E(y_{t-s}) = μ     (7)


Var(y_t) = Var(y_{t-s}) = σ²_y, i.e.,
E[(y_t - μ)²] = E[(y_{t-s} - μ)²] = σ²_y     (8)

Cov(y_t, y_{t-s}) = Cov(y_{t-j}, y_{t-j-s}) = γ_s, i.e.,
E[(y_t - μ)(y_{t-s} - μ)] = E[(y_{t-j} - μ)(y_{t-j-s} - μ)] = γ_s     (9)

where μ, σ²_y, and γ_s are all constants. (For s = 0, (8) and (9) are identical, so γ_0 equals the variance of y_t.)

To reiterate, a time series is covariance stationary if its mean and all autocovariances are unaffected by a change in time origin.

A covariance stationary process is also referred to as a weakly stationary, second-order stationary, or wide-sense stationary process.


A strongly stationary process need not have a finite mean and/or variance.

In our course, we consider only covariance stationary series, so there is no ambiguity in using the terms stationary and covariance stationary interchangeably.

In multivariate models, the term autocovariance is reserved for the covariance between y_t and its own lags.

In univariate time series models, there is no ambiguity, and the terms autocovariance and covariance are used interchangeably.

For a covariance stationary series, we can define the autocorrelation between y_t and y_{t-s} as

ρ_s ≡ γ_s/γ_0

where γ_s and γ_0 are defined by (9).


Since γ_s and γ_0 are time-independent, the autocorrelation coefficients ρ_s are also time-independent.

Although the autocorrelation between y_t and y_{t-1} can differ from the autocorrelation between y_t and y_{t-2}, the autocorrelation between y_t and y_{t-1} must be identical to that between y_{t-s} and y_{t-s-1}.

Obviously, ρ_0 = 1.

Stationarity Restrictions for an AR(1) Model

Let

y_t = a_0 + a_1 y_{t-1} + ε_t

where ε_t is white noise.


Case: y_0 known

Suppose the process started in period zero, so that y_0 is a deterministic initial condition. The solution to this equation is

y_t = a_0 Σ_{i=0}^{t-1} a_1^i + a_1^t y_0 + Σ_{i=0}^{t-1} a_1^i ε_{t-i}.     (10)

Taking the expected value of (10), we obtain

E y_t = a_0 Σ_{i=0}^{t-1} a_1^i + a_1^t y_0.     (11)

Updating by s periods yields

E y_{t+s} = a_0 Σ_{i=0}^{t+s-1} a_1^i + a_1^{t+s} y_0.     (12)

Comparing (11) and (12), it is clear that both means are time-dependent.

Since E y_t ≠ E y_{t+s}, the sequence cannot be stationary.


However, if t is large, we can consider the limiting value of y_t in (10).

If |a_1| < 1, then a_1^t y_0 converges to zero as t becomes infinitely large, and the sum a_0[1 + a_1 + (a_1)² + (a_1)³ + ...] converges to a_0/(1 - a_1).

Thus, if |a_1| < 1, as t → ∞ we have

lim y_t = a_0/(1 - a_1) + Σ_{i=0}^{∞} a_1^i ε_{t-i}.     (13)

Now take expectations of (13).

Then we have, for sufficiently large values of t, E y_t = a_0/(1 - a_1), since E(ε_{t-i}) = 0 for all i.


Thus, the mean value of y_t is finite and time-independent:

E y_t = E y_{t-s} = a_0/(1 - a_1) ≡ μ for all t.

Turning to the variance, we find

E(y_t - μ)² = E[(ε_t + a_1 ε_{t-1} + (a_1)² ε_{t-2} + ...)²]
= σ²[1 + (a_1)² + (a_1)⁴ + ...]
= σ²/[1 - (a_1)²]

which is also finite and time-independent.

Finally, the limiting values of all autocovariances, γ_s, s = 0, 1, 2, ..., are also finite and time-independent:

γ_s = E[(y_t - μ)(y_{t-s} - μ)]
= E{[ε_t + a_1 ε_{t-1} + (a_1)² ε_{t-2} + ...][ε_{t-s} + a_1 ε_{t-s-1} + (a_1)² ε_{t-s-2} + ...]}
= σ²(a_1)^s [1 + (a_1)² + (a_1)⁴ + ...]
= σ²(a_1)^s / [1 - (a_1)²]     (14)
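Formula (14) can be verified directly from a long simulated AR(1) series. In the sketch below (Python with numpy; a_1 = 0.7, σ = 1, and the sample size are illustrative assumptions), the sample autocovariances at lags 0, 1, and 2 are compared with σ²(a_1)^s/[1 - (a_1)²].

    import numpy as np

    rng = np.random.default_rng(2)
    a0, a1, sigma, n = 0.0, 0.7, 1.0, 200_000      # illustrative values

    y = np.zeros(n)
    for t in range(1, n):                          # y_t = a0 + a1*y_(t-1) + eps_t
        y[t] = a0 + a1 * y[t - 1] + rng.normal(0.0, sigma)

    mu = y.mean()
    for s in range(3):
        gamma_hat = np.mean((y[s:] - mu) * (y[:n - s] - mu))
        gamma_theory = sigma**2 * a1**s / (1 - a1**2)
        print(f"s={s}: sample {gamma_hat:.3f}  theory {gamma_theory:.3f}")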


Case: y_0 unknown

Little would change were we not given the initial condition. Without the initial value y_0, the sum of the particular solution and the homogeneous solution for y_t is

y_t = a_0/(1 - a_1) + Σ_{i=0}^{∞} a_1^i ε_{t-i}   [particular solution]
      + A(a_1)^t   [homogeneous solution]     (15)

where A = an arbitrary constant = the deviation from long-run equilibrium.

If we take the expectation of (15), it is clear that the {y_t} sequence cannot be stationary unless the homogeneous solution A(a_1)^t is equal to zero.

Either the sequence must have started infinitely long ago (so that a_1^t = 0) or the arbitrary constant A must be zero.


Thus, we have the stability conditions:

The homogeneous solution must be zero. Either the sequence must have started infinitely far in the past or the process must always be in equilibrium (so that the arbitrary constant is zero).

The characteristic root a_1 must be less than unity in absolute value.

These two conditions readily generalize to all ARMA(p,q) processes. The homogeneous solution to (5) has the form

Σ_{i=1}^{p} A_i α_i^t

or, if there are m repeated roots,

Σ_{i=1}^{m} A_i t^{i-1} α^t + Σ_{i=m+1}^{p} A_i α_i^t


where the A_i are arbitrary constants, α is the repeated root, and the α_i are the distinct roots.

If any portion of the homogeneous equation is present, the mean, variance, and all covariances will be time-dependent.

Hence, for any ARMA(p,q) model, stationarity necessitates that the homogeneous solution be zero.

The next section addresses stationarity restrictions for the particular solution.

4. Stationarity Restrictions for an ARMA(p,q) Model

As a prelude to the stationarity conditions for the general ARMA(p,q) model, first consider the stationarity conditions for an ARMA(2,1) model.


Since the magnitude of the intercept term does not affect the stability (or stationarity) condition, set a_0 = 0 and write

y_t = a_1 y_{t-1} + a_2 y_{t-2} + ε_t + β_1 ε_{t-1}.     (16)

From the previous section, we know that the homogeneous solution must be zero, so it is only necessary to find the particular solution. Using the method of undetermined coefficients, we can write the challenge solution as

y_t = Σ_{i=0}^{∞} α_i ε_{t-i}.     (17)

For (17) to be a solution of (16), the various α_i must satisfy

α_0 ε_t + α_1 ε_{t-1} + α_2 ε_{t-2} + α_3 ε_{t-3} + ...
= a_1(α_0 ε_{t-1} + α_1 ε_{t-2} + α_2 ε_{t-3} + α_3 ε_{t-4} + ...)
+ a_2(α_0 ε_{t-2} + α_1 ε_{t-3} + α_2 ε_{t-4} + α_3 ε_{t-5} + ...)
+ ε_t + β_1 ε_{t-1}.


Matching the coefficients of ε_t, ε_{t-1}, ε_{t-2}, ..., yields

1. α_0 = 1
2. α_1 = a_1 α_0 + β_1, so α_1 = a_1 + β_1
3. α_i = a_1 α_{i-1} + a_2 α_{i-2} for all i ≥ 2.

The key point is that, for i ≥ 2, the coefficients must satisfy the difference equation α_i = a_1 α_{i-1} + a_2 α_{i-2}.

If the characteristic roots of (16) are within the unit circle, the {α_i} must constitute a convergent sequence (see the short sketch below).

To verify that the {y_t} sequence generated by (17) is stationary, take the expectation of (17) and note that E y_t = E y_{t-i} = 0 for all t and i.

Hence, the mean is finite and time-invariant.
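The recursion α_i = a_1 α_{i-1} + a_2 α_{i-2} is easy to iterate numerically. The sketch below (Python with numpy; the coefficient values a_1 = 0.6, a_2 = 0.2, β_1 = 0.4 are illustrative assumptions chosen so the characteristic roots lie inside the unit circle) generates the undetermined coefficients of (17) and shows that they die out.

    import numpy as np

    a1, a2, b1 = 0.6, 0.2, 0.4          # illustrative ARMA(2,1) coefficients
    n_coef = 20

    alpha = np.empty(n_coef)
    alpha[0] = 1.0                      # alpha_0 = 1
    alpha[1] = a1 + b1                  # alpha_1 = a1*alpha_0 + beta_1
    for i in range(2, n_coef):
        alpha[i] = a1 * alpha[i - 1] + a2 * alpha[i - 2]   # the difference equation

    # Characteristic roots of y_t = a1*y_(t-1) + a2*y_(t-2): both inside the unit circle here
    roots = np.roots([1.0, -a1, -a2])
    print("characteristic roots:", roots)
    print("alpha_i:", np.round(alpha, 4))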


Since the {ε_t} sequence is assumed to be a white noise process, the variance of y_t is constant and time-independent:

Var(y_t) = E[(α_0 ε_t + α_1 ε_{t-1} + α_2 ε_{t-2} + α_3 ε_{t-3} + ...)²] = σ² Σ_{i=0}^{∞} α_i²

Var(y_{t-s}) = E[(α_0 ε_{t-s} + α_1 ε_{t-s-1} + α_2 ε_{t-s-2} + α_3 ε_{t-s-3} + ...)²] = σ² Σ_{i=0}^{∞} α_i²

Hence, Var(y_t) = Var(y_{t-s}) for all t and s.


Finally, note that

Cov(y_t, y_{t-1}) = E[(ε_t + α_1 ε_{t-1} + α_2 ε_{t-2} + α_3 ε_{t-3} + ...)(ε_{t-1} + α_1 ε_{t-2} + α_2 ε_{t-3} + α_3 ε_{t-4} + ...)]
= σ²(α_1 + α_2 α_1 + α_3 α_2 + ...)

Cov(y_t, y_{t-2}) = E[(ε_t + α_1 ε_{t-1} + α_2 ε_{t-2} + α_3 ε_{t-3} + ...)(ε_{t-2} + α_1 ε_{t-3} + α_2 ε_{t-4} + α_3 ε_{t-5} + ...)]
= σ²(α_2 + α_3 α_1 + α_4 α_2 + ...)

From the above pattern, it is clear that the s-th autocovariance, γ_s, is given by

γ_s = Cov(y_t, y_{t-s}) = σ²(α_s + α_{s+1} α_1 + α_{s+2} α_2 + ...)     (18)

Thus, the s-th autocovariance, γ_s, is constant and independent of t.


Conversely, if the characteristic roots of (16) do not lie within the unit circle, the {α_i} sequence will not be convergent, and hence, the {y_t} sequence cannot be convergent.

Stationarity Restrictions for the Moving Average Coefficients

Next, we look at the conditions ensuring the stationarity of a pure MA(∞) process:

x_t = Σ_{i=0}^{∞} β_i ε_{t-i}

where ε_t ~ WN(0, σ²). We have already determined that {x_t} is not a white noise process; now the issue is whether {x_t} is covariance stationary. Given conditions (7), (8), and (9), we ask the following:


1. Is the mean finite and time-independent?

E(x_t) = E(ε_t + β_1 ε_{t-1} + β_2 ε_{t-2} + ...)
= E ε_t + β_1 E ε_{t-1} + β_2 E ε_{t-2} + ...
= 0

Repeating the calculation with x_{t-s}, we obtain

E(x_{t-s}) = E(ε_{t-s} + β_1 ε_{t-s-1} + β_2 ε_{t-s-2} + ...)
= E ε_{t-s} + β_1 E ε_{t-s-1} + β_2 E ε_{t-s-2} + ...
= 0

Hence, all elements in the {x_t} sequence have the same finite mean (μ = 0).

2. Is the variance finite and time-independent?

Var(x_t) = E[(ε_t + β_1 ε_{t-1} + β_2 ε_{t-2} + ...)²]
= E(ε_t)² + (β_1)² E(ε_{t-1})² + (β_2)² E(ε_{t-2})² + ...   [since E ε_t ε_{t-s} = 0 for s ≠ 0]
= σ²[1 + (β_1)² + (β_2)² + ...]


Therefore, a necessary condition for Var(x_t) to be finite is that Σ_{i=0}^{∞} (β_i)² be finite.

Repeating the calculation with x_{t-s} yields

Var(x_{t-s}) = E[(ε_{t-s} + β_1 ε_{t-s-1} + β_2 ε_{t-s-2} + ...)²]
= E(ε_{t-s})² + (β_1)² E(ε_{t-s-1})² + (β_2)² E(ε_{t-s-2})² + ...   [since E ε_{t-s} ε_{t-s-i} = 0 for i ≠ 0]
= σ²[1 + (β_1)² + (β_2)² + ...]

Thus, if Σ_{i=0}^{∞} (β_i)² is finite, then Var(x_t) = Var(x_{t-s}) for all t and t - s, and hence, all elements in the {x_t} sequence have the same finite variance.

3. Are all autocovariances finite and time-independent?


The s-th autocovariance, γ_s, is given by

γ_s = Cov(x_t, x_{t-s}) = E(x_t x_{t-s})
= E[(ε_t + β_1 ε_{t-1} + β_2 ε_{t-2} + ...)(ε_{t-s} + β_1 ε_{t-s-1} + β_2 ε_{t-s-2} + ...)]
= σ²(β_s + β_{s+1} β_1 + β_{s+2} β_2 + ...)

Therefore, for γ_s to be finite, the sum β_s + β_{s+1} β_1 + β_{s+2} β_2 + ... must be finite.

In summary, the necessary and sufficient conditions for an MA(∞) process to be stationary are that the sums

(i) β_0² + β_1² + β_2² + ..., and
(ii) β_s + β_{s+1} β_1 + β_{s+2} β_2 + ...

be finite.

However, since (ii) must hold for all values of s ≥ 0, and β_0 = 1, condition (i) is redundant.


Stationarity Restrictions for the Autoregressive Coefficients

Now consider the pure autoregressive model of order p:

y_t = a_0 + Σ_{i=1}^{p} a_i y_{t-i} + ε_t.     (19)

If the characteristic roots of the homogeneous equation of (19) all lie inside the unit circle, we can write the particular solution as

y_t = a_0/(1 - Σ_{i=1}^{p} a_i) + Σ_{i=0}^{∞} α_i ε_{t-i}     (20)

where α_0 = 1 and the {α_i, i ≥ 1} are undetermined coefficients. We know that (20) is a convergent sequence so long as the characteristic roots of (19) are inside the unit circle. We also know that the sequence {α_i} will solve the difference equation

α_i - a_1 α_{i-1} - a_2 α_{i-2} - ... - a_p α_{i-p} = 0.     (21)

If the characteristic roots of (21) are all inside the unit circle, the {α_i} sequence will be convergent.


Although (20) is an infinite-order moving average process, the convergence of the MA coefficients implies that Σ_{i=0}^{∞} α_i² is finite. Thus, we can use (20) to check the three conditions of stationarity.

E y_t = E y_{t-s} = a_0/(1 - Σ_{i=1}^{p} a_i)

A necessary condition for all characteristic roots to lie inside the unit circle is 1 - Σ_{i=1}^{p} a_i > 0. Hence, the mean of the sequence is finite and time-invariant.

Var(y_t) = E[(ε_t + α_1 ε_{t-1} + α_2 ε_{t-2} + ...)²]
= E(ε_t)² + (α_1)² E(ε_{t-1})² + (α_2)² E(ε_{t-2})² + ...
= σ²[1 + (α_1)² + (α_2)² + ...]
= σ² Σ_{i=0}^{∞} α_i²


Similarly,

Var(y_{t-s}) = E[(ε_{t-s} + α_1 ε_{t-s-1} + α_2 ε_{t-s-2} + ...)²]
= E(ε_{t-s})² + (α_1)² E(ε_{t-s-1})² + (α_2)² E(ε_{t-s-2})² + ...
= σ²[1 + (α_1)² + (α_2)² + ...]
= σ² Σ_{i=0}^{∞} α_i²

Thus, if Σ_{i=0}^{∞} (α_i)² is finite, then Var(y_t) = Var(y_{t-s}) for all t and t - s, and hence, all elements in the {y_t} sequence have the same finite variance.

Finally, let us look at the s-th autocovariance, γ_s, which is given by

γ_s = Cov(y_t, y_{t-s}) = E(y_t y_{t-s})
= E[(ε_t + α_1 ε_{t-1} + α_2 ε_{t-2} + ...)(ε_{t-s} + α_1 ε_{t-s-1} + α_2 ε_{t-s-2} + ...)]
= σ²(α_s + α_{s+1} α_1 + α_{s+2} α_2 + ...)


Therefore, for γ_s to be finite, the sum α_s + α_{s+1} α_1 + α_{s+2} α_2 + ... must be finite.

Nothing of substance is changed by combining the AR(p) and MA(q) models into the general ARMA(p,q) model:

y_t = a_0 + Σ_{i=1}^{p} a_i y_{t-i} + x_t
x_t = Σ_{i=0}^{q} β_i ε_{t-i}.     (22)

If the roots of the inverse characteristic equation lie outside the unit circle [that is, if the roots of the homogeneous form of (22) lie inside the unit circle] and if the {x_t} sequence is stationary, the {y_t} sequence will be stationary. Consider

y_t = a_0/(1 - Σ_{i=1}^{p} a_i) + ε_t/(1 - Σ_{i=1}^{p} a_i L^i) + β_1 ε_{t-1}/(1 - Σ_{i=1}^{p} a_i L^i) + β_2 ε_{t-2}/(1 - Σ_{i=1}^{p} a_i L^i) + ...     (23)


Each of the expressions on the right-hand side of (23) is stationary as long as the roots of (1 - Σ_{i=1}^{p} a_i L^i) are outside the unit circle.

Given that {x_t} is stationary, only the roots of the autoregressive portion of (22) determine whether the {y_t} sequence is stationary.

5. The Autocorrelation Function

The autocovariances and autocorrelations of the type found in (18) serve as useful tools in the Box-Jenkins approach to identifying and estimating time series models. Illustrated below are four important examples: the AR(1), AR(2), MA(1), and ARMA(1,1) models.

The Autocorrelation Function of an AR(1) Process

For the AR(1) model y_t = a_0 + a_1 y_{t-1} + ε_t, (14) shows


γ_0 = σ²/[1 - (a_1)²]
γ_s = σ²(a_1)^s/[1 - (a_1)²].

Dividing γ_s by γ_0 gives the autocorrelation function (ACF) at lag s: ρ_s = γ_s/γ_0. Thus, we find that

ρ_0 = 1, ρ_1 = a_1, ρ_2 = (a_1)², ..., ρ_s = (a_1)^s.

A necessary condition for an AR(1) process to be stationary is that |a_1| < 1.

Thus, the plot of ρ_s against s, called the correlogram, should converge to zero geometrically if the series is stationary.


If a_1 is positive, convergence will be direct, and if a_1 is negative, the correlogram will follow a damped oscillatory path around zero.

The first two graphs on the left-hand side of Figure 2.2 show the theoretical autocorrelation function for a_1 = 0.7 and a_1 = -0.7, respectively.

In these diagrams, ρ_0 is not shown since its value is necessarily equal to one.

The Autocorrelation Function of an AR(2) Process

We now consider the AR(2) process y_t = a_1 y_{t-1} + a_2 y_{t-2} + ε_t (with a_0 omitted since the intercept term has no effect on the ACF). For the AR(2) process to be stationary, we know that it is necessary to restrict the roots of the second-order lag


polynomial (1 - a_1 L - a_2 L²) to be outside the unit circle. In Section 4, we derived the autocovariances of an ARMA(2,1) process by use of the method of undetermined coefficients. Now we use an alternative technique known as the Yule-Walker equations. Multiply the second-order difference equation by y_t, y_{t-1}, y_{t-2}, ..., y_{t-s} and take expectations. This yields

E y_t y_t = a_1 E y_{t-1} y_t + a_2 E y_{t-2} y_t + E ε_t y_t
E y_t y_{t-1} = a_1 E y_{t-1} y_{t-1} + a_2 E y_{t-2} y_{t-1} + E ε_t y_{t-1}
E y_t y_{t-2} = a_1 E y_{t-1} y_{t-2} + a_2 E y_{t-2} y_{t-2} + E ε_t y_{t-2}
...
E y_t y_{t-s} = a_1 E y_{t-1} y_{t-s} + a_2 E y_{t-2} y_{t-s} + E ε_t y_{t-s}     (24)

By definition, the autocovariances of a stationary series are such that E y_t y_{t-s} = E y_{t-s} y_t = E y_{t-k} y_{t-k-s} = γ_s. We also know that E ε_t y_t = σ² and E ε_t y_{t-s} = 0 for s > 0. Hence, we can use equations (24) to form

γ_0 = a_1 γ_1 + a_2 γ_2 + σ²     (25)
γ_1 = a_1 γ_0 + a_2 γ_1     (26)
γ_s = a_1 γ_{s-1} + a_2 γ_{s-2}     (27)


Dividing (26) and (27) by γ_0 yields

ρ_1 = a_1 ρ_0 + a_2 ρ_1     (28)
ρ_s = a_1 ρ_{s-1} + a_2 ρ_{s-2}     (29)

We know that ρ_0 = 1. So, from (28), we have ρ_1 = a_1/(1 - a_2). Hence, we can find all ρ_s for s ≥ 2 by solving the difference equation (29). For example, for s = 2 and s = 3,

ρ_2 = (a_1)²/(1 - a_2) + a_2
ρ_3 = a_1[(a_1)²/(1 - a_2) + a_2] + a_2 a_1/(1 - a_2)

Given the solutions for ρ_0 and ρ_1, the key point to note is that the ρ_s all satisfy the difference equation (29).

The solution may be oscillatory or direct.

Note that the stationarity condition for y_t necessitates that the characteristic roots of (29) lie inside the unit circle.


Hence, the {ρ_s} sequence must be convergent.

The correlogram for an AR(2) process must be such that ρ_0 = 1 and ρ_1 is determined by (28).

These two values can be viewed as the initial values for the second-order difference equation (29).

The fourth panel on the left-hand side of Figure 2.2 shows the ACF for the process y_t = 0.7y_{t-1} - 0.49y_{t-2} + ε_t.

The properties of the various ρ_s follow directly from the homogeneous equation y_t - 0.7y_{t-1} + 0.49y_{t-2} = 0.


The roots are obtained from

α = {0.7 ± [(0.7)² - 4(0.49)]^{1/2}}/2

Since the discriminant d = (0.7)² - 4(0.49) is negative, the characteristic roots are imaginary, so the solution oscillates.

However, since a_2 = -0.49, the solution is convergent and the {y_t} sequence is stationary.

Finally, we may wish to find the autocovariances, γ_s. Since we know all the autocorrelations, if we can find the variance of y_t, that is, γ_0, we can find all of the others.

Since ρ_i = γ_i/γ_0, from (25) we have

γ_0 = a_1(ρ_1 γ_0) + a_2(ρ_2 γ_0) + σ²
γ_0(1 - a_1 ρ_1 - a_2 ρ_2) = σ²
γ_0 = σ²/(1 - a_1 ρ_1 - a_2 ρ_2)


Substituting for ρ_1 and ρ_2 yields

γ_0 = Var(y_t) = [(1 - a_2)/(1 + a_2)] · σ²/[(a_1 + a_2 - 1)(a_2 - a_1 - 1)].
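The ACF and the variance of the AR(2) example can be computed directly from (28), (29), and the expression for γ_0. A minimal Python sketch for y_t = 0.7y_{t-1} - 0.49y_{t-2} + ε_t (with σ² = 1 assumed for illustration):

    import numpy as np

    a1, a2, sigma2 = 0.7, -0.49, 1.0     # AR(2) from Figure 2.2; sigma^2 = 1 assumed

    # rho_0 = 1 and rho_1 = a1/(1 - a2) start the recursion; (29) does the rest
    rho = np.empty(20)
    rho[0] = 1.0
    rho[1] = a1 / (1 - a2)
    for s in range(2, 20):
        rho[s] = a1 * rho[s - 1] + a2 * rho[s - 2]

    # gamma_0 from (25): gamma_0 = sigma^2 / (1 - a1*rho_1 - a2*rho_2)
    gamma0 = sigma2 / (1 - a1 * rho[1] - a2 * rho[2])
    print("rho_1..rho_6:", np.round(rho[1:7], 3))   # damped, oscillating pattern
    print("gamma_0 =", round(gamma0, 3))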

The Autocorrelation Function of an MA(1) Process

Next consider the MA(1) process y_t = ε_t + βε_{t-1}. Again, we can obtain the Yule-Walker equations by multiplying y_t by each y_{t-s}, s = 0, 1, 2, ..., and taking expectations. This yields

γ_0 = Var(y_t) = E y_t y_t = E[(ε_t + βε_{t-1})(ε_t + βε_{t-1})] = (1 + β²)σ²
γ_1 = E y_t y_{t-1} = E[(ε_t + βε_{t-1})(ε_{t-1} + βε_{t-2})] = βσ²
...
γ_s = E y_t y_{t-s} = E[(ε_t + βε_{t-1})(ε_{t-s} + βε_{t-s-1})] = 0 for all s > 1


Dividing each γ_s by γ_0, it can be seen that the ACF is simply

ρ_0 = 1, ρ_1 = β/(1 + β²), and ρ_s = 0 for all s > 1.

The third graph on the left-hand side of Figure 2.2 shows the ACF for the MA(1) process y_t = ε_t - 0.7ε_{t-1}.

You saw above that, for an MA(1) process, ρ_s = 0 for all s > 1.

As an easy exercise, convince yourself that, for an MA(2) process, ρ_s = 0 for all s > 2; for an MA(3) process, ρ_s = 0 for all s > 3; and so on.


The Autocorrelation Function of an ARMA(1,1) Process

Finally, consider the ARMA(1,1) process y_t = a_1 y_{t-1} + ε_t + β_1 ε_{t-1}. Using the now-familiar procedure, the Yule-Walker equations are:

E y_t y_t = a_1 E y_{t-1} y_t + E ε_t y_t + β_1 E ε_{t-1} y_t, so γ_0 = a_1 γ_1 + σ² + β_1(a_1 + β_1)σ²     (30)
E y_t y_{t-1} = a_1 E y_{t-1} y_{t-1} + E ε_t y_{t-1} + β_1 E ε_{t-1} y_{t-1}, so γ_1 = a_1 γ_0 + β_1 σ²     (31)
E y_t y_{t-2} = a_1 E y_{t-1} y_{t-2} + E ε_t y_{t-2} + β_1 E ε_{t-1} y_{t-2}, so γ_2 = a_1 γ_1     (32)
...
E y_t y_{t-s} = a_1 E y_{t-1} y_{t-s} + E ε_t y_{t-s} + β_1 E ε_{t-1} y_{t-s}, so γ_s = a_1 γ_{s-1}.     (33)

Solving (30) and (31) simultaneously for γ_0 and γ_1 yields


γ_0 = (1 + β_1² + 2a_1 β_1)σ²/(1 - a_1²), and
γ_1 = (1 + a_1 β_1)(a_1 + β_1)σ²/(1 - a_1²).

Hence,

ρ_1 = (1 + a_1 β_1)(a_1 + β_1)/(1 + β_1² + 2a_1 β_1)     (34)

and ρ_s = a_1 ρ_{s-1} for all s ≥ 2.

Thus, the ACF for an ARMA(1,1) process is such that the magnitude of ρ_1 depends on both a_1 and β_1. Beginning with this value of ρ_1, the ACF of an ARMA(1,1) process looks like that of an AR(1) process. If 0 < a_1 < 1, convergence will be direct, and if -1 < a_1 < 0, the autocorrelations will oscillate. The ACF for the process y_t = -0.7y_{t-1} + ε_t - 0.7ε_{t-1} is shown as the last graph on the left-hand side of Figure 2.2.
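Equation (34) together with the recursion ρ_s = a_1 ρ_{s-1} pins down the whole ACF once a_1 and β_1 are given. A small sketch (plain Python; it uses the coefficient values of the Figure 2.2 example as reconstructed above, a_1 = -0.7 and β_1 = -0.7) follows.

    a1, b1 = -0.7, -0.7                 # ARMA(1,1) example values as reconstructed above

    rho = [1.0]
    rho.append((1 + a1 * b1) * (a1 + b1) / (1 + b1**2 + 2 * a1 * b1))  # equation (34)
    for s in range(2, 10):
        rho.append(a1 * rho[-1])        # rho_s = a1 * rho_(s-1) for s >= 2

    print([round(r, 3) for r in rho])   # oscillating, geometrically decaying ACF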


From the above, you should be able to recognize that the correlogram can reveal the pattern of the autoregressive coefficients.

For an ARMA(p,q) model, beginning after lag q, the values of ρ_i will satisfy

ρ_i = a_1 ρ_{i-1} + a_2 ρ_{i-2} + ... + a_p ρ_{i-p}.

6. The Partial Autocorrelation Function

In an AR(1) process, y_t and y_{t-2} are correlated even though y_{t-2} does not directly appear in the model.


The most direct way to find the partial autocorrelation function is to first form the series {y*_t} by subtracting the mean of the series (i.e., μ) from each observation to obtain y*_t ≡ y_t - μ.

Next, form the first-order autoregression

y*_t = φ_{11} y*_{t-1} + e_t

where e_t is the regression error term, which need not be a white noise process.

Since there are no intervening values, φ_{11} is both the autocorrelation and the partial autocorrelation between y_t and y_{t-1}.

Now form the second-order autoregression

y*_t = φ_{21} y*_{t-1} + φ_{22} y*_{t-2} + e_t

Here φ_{22} is the partial autocorrelation coefficient between y_t and y_{t-2}.


In other words, φ_{22} is the correlation between y_t and y_{t-2} controlling for (i.e., netting out) the effect of y_{t-1}.

Repeating the process for all additional lags s yields the partial autocorrelation function (PACF).

Using the Yule-Walker equations, one can form the partial autocorrelations from the autocorrelations as

φ_{11} = ρ_1     (35)

φ_{22} = (ρ_2 - ρ_1²)/(1 - ρ_1²)     (36)

and, for additional lags,

φ_{ss} = [ρ_s - Σ_{j=1}^{s-1} φ_{s-1,j} ρ_{s-j}] / [1 - Σ_{j=1}^{s-1} φ_{s-1,j} ρ_j]     (37)

where φ_{sj} = φ_{s-1,j} - φ_{ss} φ_{s-1,s-j}, j = 1, 2, 3, ..., s - 1.
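Equations (35)-(37) translate directly into a short recursion (a Durbin-Levinson type updating). The sketch below (Python with numpy) computes φ_{11}, φ_{22}, ... from a given sequence of autocorrelations; as a check, feeding it the AR(1) autocorrelations ρ_s = 0.7^s should give φ_{11} = 0.7 and φ_{ss} ≈ 0 for s > 1.

    import numpy as np

    def pacf_from_acf(rho, max_lag):
        """Partial autocorrelations phi_ss from autocorrelations rho[1], rho[2], ..."""
        phi = np.zeros((max_lag + 1, max_lag + 1))
        phi[1, 1] = rho[1]                                     # equation (35)
        for s in range(2, max_lag + 1):
            num = rho[s] - sum(phi[s - 1, j] * rho[s - j] for j in range(1, s))
            den = 1.0 - sum(phi[s - 1, j] * rho[j] for j in range(1, s))
            phi[s, s] = num / den                              # equation (37)
            for j in range(1, s):                              # updating rule for phi_sj
                phi[s, j] = phi[s - 1, j] - phi[s, s] * phi[s - 1, s - j]
        return [phi[s, s] for s in range(1, max_lag + 1)]

    rho = [0.7 ** s for s in range(11)]        # AR(1) autocorrelations with a1 = 0.7
    print(np.round(pacf_from_acf(rho, 5), 3))  # approximately [0.7, 0, 0, 0, 0]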


For an AR(p) process, there is no direct correlation between y_t and y_{t-s} for s > p.

Hence, for s > p, all values of φ_{ss} will be zero, and the PACF for a pure AR(p) process should cut off to zero for all lags greater than p.

In contrast, consider the PACF of an MA(1) process: y_t = ε_t + βε_{t-1}.

As long as |β| < 1, we can write y_t/(1 + βL) = ε_t, which we know has the AR(∞) representation

y_t - βy_{t-1} + β²y_{t-2} - β³y_{t-3} + ... = ε_t.

Therefore, the PACF will not jump to zero, since y_t will be correlated with all of its own lags.


Instead, the PACF coefficients exhibit a geometrically decaying pattern.

If β < 0, decay is direct, and if β > 0, the PACF coefficients oscillate.

The right-hand side of the fifth panel in Figure 2.2 shows the PACF for the ARMA(1,1) model y_t = -0.7y_{t-1} + ε_t - 0.7ε_{t-1}.

More generally, the PACF of a stationary ARMA(p,q) process must ultimately decay toward zero beginning at lag p.

The decay pattern depends on the coefficients of the lag polynomial (1 + β_1 L + β_2 L² + ... + β_q L^q).


Table 2.1 summarizes some of the properties of the ACF and PACF for various ARMA processes. For stationary processes, the key points to note are the following:

The ACF of an ARMA(p,q) process will begin to decay after lag q. After lag q, the coefficients of the ACF (i.e., the ρ_i) will satisfy the difference equation ρ_i = a_1 ρ_{i-1} + a_2 ρ_{i-2} + ... + a_p ρ_{i-p}. Since the characteristic roots are inside the unit circle, the autocorrelations will decay after lag q. Moreover, the pattern of the autocorrelation coefficients will mimic that suggested by the characteristic roots.

The PACF of an ARMA(p,q) process will begin to decay after lag p. After lag p, the coefficients of the PACF (i.e., the φ_{ss}) will mimic the ACF coefficients from the model y_t/(1 + β_1 L + β_2 L² + ... + β_q L^q).


We can illustrate the usefulness of the ACF and PACF using the model y_t = a_0 + 0.7y_{t-1} + ε_t. If we compare the top two graphs in Figure 2.2, the ACF shows the monotonic decay of the autocorrelations while the PACF exhibits the single spike at lag 1. Suppose a researcher collected sample data and plotted the ACF and PACF. If the actual patterns compared favorably to the theoretical patterns, the researcher might try to fit an AR(1) model. Conversely, if the ACF exhibited a single spike and the PACF exhibited monotonic decay, the researcher might try an MA(1) model.

7. Sample Autocorrelations of Stationary Time Series

Let there be T observations y_1, y_2, ..., y_T. If the data series is stationary, we can use the sample mean ȳ, the sample variance σ̂², and the sample autocorrelations r_s as estimates of the population mean μ, the population variance σ², and the population autocorrelations ρ_s, respectively, where


ȳ = (1/T) Σ_{t=1}^{T} y_t     (38)

σ̂² = (1/T) Σ_{t=1}^{T} (y_t - ȳ)²     (39)

and, for s = 1, 2, ...,

r_s = Σ_{t=s+1}^{T} (y_t - ȳ)(y_{t-s} - ȳ) / Σ_{t=1}^{T} (y_t - ȳ)².     (40)

The sample ACF and PACF can be compared to the theoretical ACF and PACF to identify the actual data generating process. If the true value of ρ_s = 0, that is, if the true data-generating process is MA(s-1), the sampling variance of r_s is given by


Var(r_s) = T^{-1} for s = 1
Var(r_s) = T^{-1}(1 + 2 Σ_{j=1}^{s-1} r_j²) for s > 1     (41)

If T is large, r_s is distributed normally with mean zero. For the PACF coefficients, under the null hypothesis of an AR(p) model, that is, under the null that all φ_{p+i,p+i} are zero, the variance of the estimated φ_{p+i,p+i} is approximately T^{-1}.
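Formulas (38)-(41) are simple to code. The sketch below (Python with numpy; the AR(1) test series and its parameters are illustrative assumptions) computes the r_s exactly as in (40) and the MA(s-1)-null standard errors implied by (41).

    import numpy as np

    def sample_acf(y, max_lag):
        """Sample autocorrelations r_s from equation (40)."""
        y = np.asarray(y, dtype=float)
        ybar = y.mean()
        denom = np.sum((y - ybar) ** 2)
        return np.array([np.sum((y[s:] - ybar) * (y[:-s] - ybar)) / denom
                         for s in range(1, max_lag + 1)])

    def acf_std_errors(r, T):
        """Standard errors from (41) under the null that the process is MA(s-1)."""
        se = [np.sqrt(1.0 / T)]
        for s in range(2, len(r) + 1):
            se.append(np.sqrt((1.0 + 2.0 * np.sum(r[:s - 1] ** 2)) / T))
        return np.array(se)

    rng = np.random.default_rng(3)
    y = np.zeros(100)
    for t in range(1, 100):                      # illustrative AR(1) series, a1 = 0.7
        y[t] = 0.7 * y[t - 1] + rng.normal()

    r = sample_acf(y, 8)
    print(np.round(r, 3))
    print(np.round(acf_std_errors(r, len(y)), 3))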

We can test for the significance of the sample ACF and PACF using (41). For example, if we use a 95% confidence interval (i.e., 2 standard deviations) and the calculated value of r_1 exceeds 2T^{-1/2}, it is possible to reject the null hypothesis that the first-order autocorrelation is zero. Rejecting this hypothesis means rejecting an MA(s-1) = MA(0) process and accepting the alternative q > 0. Next, try s = 2.


Then Var(r_2) = (1 + 2r_1²)/T. If r_1 = 0.5 and T = 100, then Var(r_2) = 0.015 and SD(r_2) = 0.123. Thus, if the calculated value of r_2 exceeds 2(0.123), it is possible to reject the null hypothesis H_0: ρ_2 = 0. Again, rejecting the null means accepting the alternative that q > 1. Proceeding in this way, it is possible to identify the order of the process.

Box and Pierce (1970) developed the Q-statistic to test whether a group of autocorrelations is significantly different from zero. Under the null hypothesis H_0: ρ_1 = ρ_2 = ... = ρ_s = 0, the statistic

Q = T Σ_{k=1}^{s} r_k²

is asymptotically distributed as a χ² with s degrees of freedom.


The intuition behind the use of this statistic is that large sample autocorrelations will lead to large values of Q, while a white noise process (in which the autocorrelations at all lags should be zero) would have a Q value of zero.

Thus, if the calculated value of Q exceeds the appropriate value in a χ² table, we can reject the null of no significant autocorrelations.

Rejecting the null means accepting the alternative that at least one autocorrelation is non-zero.

A problem with the Box-Pierce Q-statistic is that it works poorly even in moderately large samples.


Remedy: the modified Q-statistic of Ljung and Box (1978):

Q = T(T + 2) Σ_{k=1}^{s} r_k²/(T - k)     (42)

If the sample value of Q from (42) exceeds the critical value of χ² with s degrees of freedom, then at least one value of r_k is statistically significantly different from zero at the specified significance level.

The Box-Pierce and Ljung-Box Q-statistics also serve as a check to see whether the residuals from an estimated ARMA(p,q) model behave as a white noise process.
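The Ljung-Box statistic (42) needs only the first s sample autocorrelations of the series (or of the residuals). A minimal, self-contained Python sketch follows; the white-noise test series is an illustrative assumption.

    import numpy as np

    def ljung_box_q(y, s):
        """Ljung-Box Q from equation (42) using the first s sample autocorrelations."""
        y = np.asarray(y, dtype=float)
        T = len(y)
        ybar = y.mean()
        denom = np.sum((y - ybar) ** 2)
        r = np.array([np.sum((y[k:] - ybar) * (y[:-k] - ybar)) / denom
                      for k in range(1, s + 1)])
        return T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, s + 1)))

    rng = np.random.default_rng(4)
    white = rng.normal(size=200)          # white noise: Q should be small
    print(round(ljung_box_q(white, 8), 2))
    # Compare with the chi-square critical value with 8 degrees of freedom
    # (about 15.5 at the 5% level); no rejection is expected for white noise.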


However, when the s autocorrelations from an estimated ARMA(p,q) model are formed, the degrees of freedom are reduced by the number of estimated coefficients.

Hence, using the residuals of an ARMA(p,q) model, Q has a χ² distribution with s - p - q degrees of freedom (if a constant is included in the estimation, the degrees of freedom are s - p - q - 1).

Model Selection Criteria

A natural question to ask of any estimated model is: How well does it fit the data?

The larger the lag orders p and/or q, the smaller is the sum of squares of the estimated residuals of the fitted model.


However, adding such lags entails estimation of additional coefficients and an associated loss of degrees of freedom.

Moreover, inclusion of extraneous coefficients will reduce the forecasting performance of the fitted model.

Thus, increasing the lag lengths p and/or q involves both benefits and costs.

If we choose a lag order that is lower than necessary, we will omit valuable information contained in the more distant lags and thus underfit the model.

If we choose a lag order that is higher than necessary, we will overfit the model, estimate extraneous coefficients, and inject additional estimation error into our forecasts.


Model selection criteria attempt to choose the most parsimonious model by selecting the lag orders p and/or q so as to balance the benefit of a reduced sum of squared residuals from additional lags against the cost of additional estimation error.

The two most commonly used model selection criteria are the Akaike Information Criterion (AIC) and the Schwartz Bayesian Criterion (SBC):

AIC = T ln(SSR) + 2n
SBC = T ln(SSR) + n ln(T)

where n = the number of parameters estimated (p + q + a possible constant term), T = the number of observations, and SSR = the sum of squared residuals.
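With SSR in hand, the two criteria are one-liners. A minimal sketch (plain Python; the SSR values, n, and T below are made-up illustrative numbers, not results from the chapter) shows how a lower AIC/SBC favors the smaller model when an extra coefficient buys little reduction in SSR.

    import math

    def aic(ssr, T, n):
        return T * math.log(ssr) + 2 * n            # AIC = T ln(SSR) + 2n

    def sbc(ssr, T, n):
        return T * math.log(ssr) + n * math.log(T)  # SBC = T ln(SSR) + n ln(T)

    T = 100                                         # illustrative sample size
    ssr_small, ssr_big = 85.2, 84.9                 # made-up SSRs for two candidate models
    print("1 parameter :", round(aic(ssr_small, T, 1), 2), round(sbc(ssr_small, T, 1), 2))
    print("2 parameters:", round(aic(ssr_big, T, 2), 2), round(sbc(ssr_big, T, 2), 2))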


Estimation of an AR(1) Model

Beginning with t = 1, 100 values of {y_t} are generated using the AR(1) process y_t = 0.7y_{t-1} + ε_t, with the initial condition y_0 = 0. The upper left graph of Figure 2.3 shows the sample ACF and the upper right graph shows the sample PACF of this AR(1) process. It is important that you compare these to the ACF and PACF of the theoretical processes shown in Figure 2.2.

In practice, we never know the true data generating process. However, suppose we were presented with those 100 sample values and were asked to uncover the true process.

The first step might be to compare the sample ACF and PACF to those of the various theoretical models. The decaying pattern of the ACF and the single large spike at lag 1 in the sample PACF suggest an AR(1) model.


The first three sample autocorrelations are r_1 = 0.74, r_2 = 0.58, and r_3 = 0.47 (which are somewhat greater than the corresponding theoretical autocorrelations of 0.7, 0.49, and 0.343). In the PACF, there is a sizeable spike of 0.74 at lag 1, and all other partial autocorrelations (except for lag 12) are very small.

Under the null hypothesis of an MA(0) process, the standard deviation of r_1 is T^{-1/2} = 0.1. Since the sample value of r_1 = 0.74 is more than seven standard deviations from zero, we can reject the null hypothesis H_0: ρ_1 = 0.

The standard deviation of r_2 is obtained from (41) by taking s = 2:

Var(r_2) = (1 + 2(0.74)²)/100 = 0.021.


Since (0.021)^{1/2} = 0.1449, the sample value of r_2 is more than 3 standard deviations from zero; at conventional significance levels, we can reject the null hypothesis H_0: ρ_2 = 0.

Similarly, we can test for the significance of all other values of the sample autocorrelations.

As can be seen in the second panel of Figure 2.3, other than φ_{11}, all partial autocorrelations (except for lag 12) are less than 2T^{-1/2} = 0.2. The decay of the ACF and the single spike of the PACF give a strong indication of an AR(1) model. Nevertheless, if we did not know the true underlying process and happened to be using monthly data, we might be concerned with the significant partial autocorrelation at lag 12. After all, with monthly data we might expect some direct relationship between y_t and y_{t-12}.
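This identification exercise can be reproduced end to end. The sketch below (Python with numpy; the random seed is arbitrary, and the simple no-intercept OLS regression of y_t on y_{t-1} is an illustrative estimator choice, not necessarily the routine behind Table 2.2) simulates 100 observations from y_t = 0.7y_{t-1} + ε_t, computes the first sample autocorrelations, and estimates a_1.

    import numpy as np

    rng = np.random.default_rng(5)
    T = 100
    y = np.zeros(T + 1)
    for t in range(1, T + 1):                    # y_t = 0.7*y_(t-1) + eps_t, y_0 = 0
        y[t] = 0.7 * y[t - 1] + rng.normal()
    y = y[1:]

    ybar = y.mean()
    denom = np.sum((y - ybar) ** 2)
    r = [np.sum((y[s:] - ybar) * (y[:-s] - ybar)) / denom for s in range(1, 4)]
    print("r_1, r_2, r_3:", np.round(r, 3))      # compare with 0.7, 0.49, 0.343

    # OLS estimate of a_1 for Model 1 (no intercept): regress y_t on y_(t-1)
    a1_hat = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)
    resid = y[1:] - a1_hat * y[:-1]
    se = np.sqrt(np.sum(resid ** 2) / (len(resid) - 1) / np.sum(y[:-1] ** 2))
    print(f"a1_hat = {a1_hat:.3f}  (s.e. {se:.3f})")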


Although we know that the data were actually generated from an AR(1) process, it is illuminating to compare the estimates of two different models. Suppose we estimate an AR(1) model and also try to capture the spike at lag 12 with an MA coefficient. Thus, we can consider the two tentative models:

Model 1: y_t = a_1 y_{t-1} + ε_t
Model 2: y_t = a_1 y_{t-1} + ε_t + β_{12} ε_{t-12}

Table 2.2 reports the results of the two estimations. The coefficient of Model 1 satisfies the stability condition |a_1| < 1 and has a low standard error (the associated t-statistic for a null of zero is more than 12). As a useful diagnostic check, we plot the correlogram of the residuals of the fitted model in Figure 2.4.

The Ljung-Box Q-statistics for these residuals indicate that each one of the autocorrelations is less than 2 standard deviations from zero. The Q-statistics indicate that, as a group, lags 1 through 8, 1 through 16, and 1 through 24 are not significantly different from zero.


This is strong evidence that the AR(1) model fits the data well. If the residual autocorrelations were significant, the AR(1) model would not be utilizing all available information concerning movements in the {y_t} sequence. For example, suppose we wanted to forecast y_{t+1} conditional on all available information up to and including period t. With Model 1, the value of y_{t+1} is y_{t+1} = a_1 y_t + ε_{t+1}. Hence, the forecast from Model 1 is

E_t y_{t+1} = E_t(a_1 y_t + ε_{t+1}) = E_t(a_1 y_t) + E_t(ε_{t+1}) = a_1 y_t.

If the residual autocorrelations had been significant, this forecast would not capture all of the information in the available information set.

Examining the results for Model 2, note that both models yield similar estimates of the first-order autoregressive coefficient and the associated standard error.


However, the estimate of β_{12} is of poor quality; the insignificant t-value suggests that it should be dropped from the model. Moreover, comparing the AIC and the SBC values of the two models suggests that any benefit of a reduced sum of squared residuals is overwhelmed by the detrimental effects of estimating an additional parameter. All of these indicators point to the choice of Model 1.

Estimation of an ARMA(1,1) Model

See ARMA(1,1) & Table 2.3 under Figures & Tables in Chapter 2.

Estimation of an AR(2) Model

See AR(2) under Figures & Tables in Chapter 2.

8. Box-Jenkins Model Selection


The estimates of the AR(1), ARMA(1,1), and AR(2) models in the previous section illustrate the Box-Jenkins (1976) strategy for appropriate model selection. Box and Jenkins popularized a three-stage method aimed at selecting an appropriate model for the purpose of estimating and forecasting a univariate time series.

In the identification stage, the researcher visually examines the time plot of the series, the autocorrelation function, and the partial autocorrelation function. Plotting the time path of the {y_t} sequence provides useful information concerning outliers, missing values, and structural breaks in the data. Nonstationary variables may have a pronounced trend or appear to meander without a constant long-run mean or variance. Missing values and outliers can be corrected at this point. Earlier, a standard practice was to first-difference any series deemed to be nonstationary. Currently, a large literature is evolving that develops formal procedures to check for nonstationarity. We defer this discussion until Chapter 4.


For now, assume that we are working with stationary data. A comparison of the sample ACF and sample PACF to those of various theoretical ARMA processes may suggest several plausible models. In the estimation stage, each of the tentative models is fit and the various a_i and β_i coefficients are examined. In this second stage, the estimated models are compared using the following criteria.

Parsimony

A fundamental idea in the Box-Jenkins approach is the principle of parsimony. Incorporating additional coefficients will necessarily increase fit (e.g., the value of R² will increase) at a cost of reducing degrees of freedom. Box and Jenkins argue that parsimonious models produce better forecasts than overparameterized models. A parsimonious model fits the data well without incorporating any needless coefficients. The aim is to approximate the true data generating process but not to pin down the exact process.


The goal of parsimony suggested eliminating the MA(12) coefficient in the simulated AR(1) model shown earlier.

In selecting an appropriate model, the econometrician needs to be aware that several different models may have similar properties. As an extreme example, note that the AR(1) model y_t = 0.5y_{t-1} + ε_t has the equivalent infinite-order moving-average representation

y_t = ε_t + 0.5ε_{t-1} + 0.25ε_{t-2} + 0.125ε_{t-3} + 0.0625ε_{t-4} + ....

In most samples, approximating this MA(∞) process with an MA(2) or MA(3) model will give a very good fit. However, the AR(1) model is the more parsimonious model and is preferred.


One also needs to be aware of the common factor problem. Suppose we wanted to fit the ARMA(2,3) model

(1 - a_1 L - a_2 L²)y_t = (1 + β_1 L + β_2 L² + β_3 L³)ε_t.     (43)

Suppose that (1 - a_1 L - a_2 L²) and (1 + β_1 L + β_2 L² + β_3 L³) can be factored as (1 + cL)(1 + aL) and (1 + cL)(1 + b_1 L + b_2 L²), respectively. Since (1 + cL) is a common factor to each, (43) has the equivalent, but more parsimonious, form

(1 + aL)y_t = (1 + b_1 L + b_2 L²)ε_t.     (44)

In order to ensure that the model is parsimonious, the various a_i and β_i should all have t-statistics of 2.0 or greater (so that each coefficient is significantly different from zero at the 5% level). Moreover, the coefficients should not be strongly correlated with each other. Highly collinear coefficients are unstable; usually one or more can be eliminated from the model without reducing forecasting performance.


Stationarity and Invertibility

The distribution theory underlying the use of the sample ACF and PACF as approximations to those of the true data generating process is based on the assumption that the {y_t} sequence is stationary. Moreover, t-statistics and Q-statistics also presume that the data are stationary. The estimated autoregressive coefficients should be consistent with this underlying assumption. Hence, we should be suspicious of an AR(1) model if the estimated value of a_1 is close to unity. For an ARMA(2,q) model, the characteristic roots of the estimated polynomial (1 - a_1 L - a_2 L²) should be outside the unit circle.

The Box-Jenkins methodology also necessitates that the model be invertible. Formally, {y_t} is invertible if it can be represented by a finite-order or convergent autoregressive process. Invertibility is important because the use of the ACF and PACF implicitly assumes that the {y_t} sequence can be represented by an autoregressive model.


As a demonstration, consider the simple MA(1) model

y_t = ε_t - β_1 ε_{t-1}     (45)

so that, if |β_1| < 1,

y_t/(1 - β_1 L) = ε_t

or

y_t + β_1 y_{t-1} + β_1² y_{t-2} + β_1³ y_{t-3} + ... = ε_t.     (46)

If |β_1| < 1, (46) can be estimated using the Box-Jenkins method. However, if |β_1| ≥ 1, the {y_t} sequence cannot be represented by a finite-order or convergent AR process, and thus, it is not invertible. More generally, for an ARMA model to have a convergent AR representation, the roots of the polynomial (1 + β_1 L + β_2 L² + ... + β_q L^q) must lie outside the unit circle.


We note that there is nothing improper about a noninvertible model. The {y_t} sequence implied by y_t = ε_t - ε_{t-1} is stationary in that it has a constant time-invariant mean [E y_t = E y_{t-s} = 0], a constant time-invariant variance [Var(y_t) = Var(y_{t-s}) = σ²(1 + β_1²) = 2σ²], and time-invariant autocovariances γ_1 = -β_1 σ² = -σ² and γ_s = 0 for all other s. The problem is that the technique does not allow for the estimation of such models. If β_1 = 1, (46) becomes

y_t + y_{t-1} + y_{t-2} + y_{t-3} + y_{t-4} + ... = ε_t.

Clearly, the autocorrelations and partial autocorrelations between y_t and y_{t-s} will never decay.

Goodness of Fit

R² and the average of the residual sum of squares are common measures of goodness of fit in ordinary least squares. AIC and SBC are more appropriate measures of fit in time series models.


Caution must be exercised if the estimates fail to converge rapidly. Failure of rapid convergence might be an indication that the estimates are unstable. In such circumstances, adding an additional observation or two can greatly alter the estimates.

The third stage of the Box-Jenkins methodology involves diagnostic checking. The standard practice is to plot the residuals to look for outliers and for evidence of periods in which the model does not fit the data well. If all plausible ARMA models show evidence of a poor fit during a reasonably long portion of the sample, it is wise to consider using intervention analysis, transfer function analysis, or any other of the multivariate estimation methods discussed in later chapters. If the variance of the residuals is increasing, a logarithmic transformation may be appropriate. Alternatively, we may wish to actually model any tendency of the variance to change using the ARCH techniques discussed in Chapter 3.


It is particularly important that the residuals from an estimated model be serially uncorrelated. Any evidence of serial correlation implies a systematic movement in the {y_t} sequence that is not accounted for by the ARMA coefficients included in the model. Hence, any of the tentative models yielding nonrandom residuals should be eliminated from consideration. To check for correlation in the residuals, construct the ACF and the PACF of the residuals of the estimated model. Then use (41) and (42) to determine whether any or all of the residual autocorrelations or partial autocorrelations are statistically significant. Although there is no significance level that is deemed most appropriate, be wary of any model yielding

(1) several residual correlations that are marginally significant, and
(2) a Q-statistic that is barely significant at the 10% level.

In such circumstances, it is usually possible to formulate a better-performing model.


If there are sufficient observations, fitting the same ARMA model to each of two subsamples can provide useful information concerning the validity of the assumption that the data generating process is unchanging. In the AR(2) model that was estimated in the last section, the sample was split in half. In general, suppose you estimated an ARMA(p,q) model using a sample of T observations. Denote the sum of the squared residuals as SSR. Now divide the T observations into two subsamples, with t_m observations in the first and t_n = T - t_m observations in the second. Use each subsample to estimate the two models

y_t = a_0(1) + a_1(1)y_{t-1} + ... + a_p(1)y_{t-p} + ε_t + β_1(1)ε_{t-1} + ... + β_q(1)ε_{t-q}   [using t = 1, ..., t_m]
y_t = a_0(2) + a_1(2)y_{t-1} + ... + a_p(2)y_{t-p} + ε_t + β_1(2)ε_{t-1} + ... + β_q(2)ε_{t-q}   [using t = t_m + 1, ..., T].


Let the sums of the squared residuals from the two models be, respectively, SSR_1 and SSR_2. To test the restriction that all coefficients are equal [i.e., a_0(1) = a_0(2) and a_1(1) = a_1(2) and ... a_p(1) = a_p(2) and β_1(1) = β_1(2) and ... β_q(1) = β_q(2)], conduct an F-test using

F = [(SSR - SSR_1 - SSR_2)/n] / [(SSR_1 + SSR_2)/(T - 2n)]     (47)

where n = the number of parameters estimated (n = p + q + 1 if an intercept is included and n = p + q if not), and the numbers of degrees of freedom are (n, T - 2n).

Intuitively, if the coefficients are equal, that is, if the restriction is not binding, then the sum of squared residuals SSR from the restricted model and the sum of squared residuals (SSR_1 + SSR_2) from the unrestricted models should be equal, and hence F should be zero. Conversely, if the restriction is binding, SSR should exceed (SSR_1 + SSR_2).


    exceed (SSR1+SSR2). And, the larger the dif-

    ference between SSR and (SSR1 + SSR2), and

    thus, the larger the calculated value of F, the

    larger is the evidence against the hypothesis

    that the coefficients are equal.

    Similarly, a model can be estimated over only aportion of the data set. The estimated model

    can then be used to forecast the known values

    of the series. The sum of the squared forecast

    errors is a useful way to compare the adequacy

    of alternative models. Those models with poor

    out-of-sample forecasts should be eliminated.

    9. Properties of Forecasts

    One of the most important uses of ARMA

    models is to forecast future values of the {yt}

    sequence. To simplify the following discussion, it is assumed that the actual data generating

    process and the current and past realizations

    of {yt} and {t} sequences are known to the

    researcher. First consider the forecasts of an AR(1) model: y_t = a0 + a1 y_{t-1} + ε_t. Updating
    one period, we obtain: y_{t+1} = a0 + a1 y_t + ε_{t+1}.

    If we know the coefficients a0 and a1, we can

    forecast yt+1 conditioned on the information

    available at period t as

    E_t y_{t+1} = a0 + a1 y_t                (48)

    where the notation E_t y_{t+j} stands for the con-

    ditional expectation of yt+j given the informa-

    tion available at period t. Formally,

    E_t y_{t+j} = E(y_{t+j} | y_t, y_{t-1}, y_{t-2}, . . . , ε_t, ε_{t-1}, . . .).

    In the same way, since y_{t+2} = a0 + a1 y_{t+1} +
    ε_{t+2}, the forecast of y_{t+2} conditioned on the

    information available at period t is

    E_t y_{t+2} = a0 + a1 E_t y_{t+1}

    and using (48)

    E_t y_{t+2} = a0 + a1(a0 + a1 y_t).

    Thus the forecast of y_{t+1} can be used to forecast
    y_{t+2}. In other words, forecasts can be con-
    structed using forward iteration; the forecast of y_{t+j} can be used to forecast y_{t+j+1}. Since
    y_{t+j+1} = a0 + a1 y_{t+j} + ε_{t+j+1}, it follows that

    E_t y_{t+j+1} = a0 + a1 E_t y_{t+j}.                (49)

    From (48) and (49) it should be clear that it is possible to obtain the entire sequence

    of j-step-ahead forecasts by forward iteration.

    Consider

    E_t y_{t+j} = a0(1 + a1 + a1^2 + . . . + a1^{j-1}) + a1^j y_t.

    This equation, called the forecast function,

    expresses all of the j-step-ahead forecasts as functions of the information set in period t.

    Unfortunately, the quality of the forecasts de-

    clines as we forecast further out into the fu-

    ture. Think of (49) as a first-order differ-

    ence equation in the {E_t y_{t+j}} sequence. Since

    |a1| < 1, the difference equation is stable, and

    it is straightforward to find the particular so-

    lution to the difference equation. If we take

    the limit of E_t y_{t+j} as j → ∞, we find that
    E_t y_{t+j} → a0/(1 - a1). This result is quite gen-
    eral. For any stationary ARMA model, the conditional forecast of y_{t+j} converges to the
    unconditional mean as j → ∞.
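
    A minimal numerical sketch of this forward iteration, with made-up values a0 = 1, a1 = 0.9, and y_t = 5 (none taken from the text), confirms that the iterated forecasts match the closed-form forecast function and approach a0/(1 - a1):

    # Forward iteration of E_t y_{t+j} = a0 + a1 * E_t y_{t+j-1}, equation (49).
    a0, a1, y_t = 1.0, 0.9, 5.0                # illustrative values only

    forecast = y_t
    for j in range(1, 41):
        forecast = a0 + a1 * forecast                                  # iterate (49)
        closed_form = a0 * (1 - a1 ** j) / (1 - a1) + a1 ** j * y_t    # forecast function
        assert abs(forecast - closed_form) < 1e-10

    print(forecast, a0 / (1 - a1))             # 40-step forecast vs. unconditional mean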

    Because the forecasts from an ARMA model

    will not be perfectly accurate, it is important to

    consider the properties of the forecast errors. Forecasting from time period t, we can define

    the j-step-ahead forecast error et(j) as the dif-

    ference between the realized value of yt+j and

    the forecast value E_t y_{t+j}. Thus
    e_t(j) ≡ y_{t+j} - E_t y_{t+j}.
    Hence, the 1-step-ahead forecast error e_t(1) =
    y_{t+1} - E_t y_{t+1} = ε_{t+1} (i.e., the unforecastable
    portion of y_{t+1} given the information available

    in period t).

    To find the two-step-ahead forecast error, we

    need to form e_t(2) = y_{t+2} - E_t y_{t+2}. Since
    y_{t+2} = a0 + a1 y_{t+1} + ε_{t+2} and E_t y_{t+2} = a0 +
    a1 E_t y_{t+1}, it follows that
    e_t(2) = a1(y_{t+1} - E_t y_{t+1}) + ε_{t+2} = ε_{t+2} + a1 ε_{t+1}.

    Proceeding in a like manner, you can demon-

    strate that for the AR(1) model, the j-step-ahead
    forecast error e_t(j) is given by
    e_t(j) = ε_{t+j} + a1 ε_{t+j-1} + a1^2 ε_{t+j-2} + a1^3 ε_{t+j-3} + . . . + a1^{j-1} ε_{t+1}.                (50)

    Since the mean of (50) is zero, the forecasts

    are unbiased estimates of each value yt+j. It

    can be seen as follows. Since E_t ε_{t+j} = E_t ε_{t+j-1} =
    . . . = E_t ε_{t+1} = 0, the conditional expectation
    of (50) is E_t e_t(j) = 0. Since the expected

    value of the forecast error is zero, the fore-

    casts are unbiased.

    Next we look at the variance of the forecast er-

    ror. To compute this variance, continue
    to assume that the elements of the {ε_t} se-
    quence are independent with a variance equal to σ². Then using (50), the variance of the
    forecast error is
    Var[e_t(j)] = σ²[1 + a1^2 + a1^4 + a1^6 + . . . + a1^{2(j-1)}]                (51)

    for j = 1, 2, . . . . Thus, the one-step-ahead forecast error variance is σ², the two-step-ahead
    forecast error variance is σ²(1 + a1^2), and so

    forth. The essential point to note is that the

    forecast error variance is an increasing function

    of j. Consequently, we can have more confi-

    dence in short-term forecasts than in long-term

    forecasts. In the limit, as j → ∞, the forecast
    error variance converges to σ²/(1 - a1^2); hence,

    the forecast error variance converges to the

    unconditional variance of the {yt} sequence.

    Moreover, assuming the {ε_t} sequence is nor-
    mally distributed, you can place confidence in-
    tervals around the forecasts. The one-step-
    ahead forecast of y_{t+1} is a0 + a1 y_t and the forecast error variance is σ². Therefore, the 95% confi-
    dence interval for the one-step-ahead forecast
    can be constructed as
    a0 + a1 y_t ± 1.96σ.

    We can construct a confidence interval for the two-step-ahead forecast in a similar way.
    Using (49), the two-step-ahead forecast is E_t y_{t+2} = a0(1 + a1) + a1^2 y_t. Again using (51), we know

    that Var[e_t(2)] = σ²(1 + a1^2). Thus, the 95%
    confidence interval for the two-step-ahead fore-
    cast is
    a0(1 + a1) + a1^2 y_t ± 1.96σ(1 + a1^2)^{1/2}.
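
    The interval arithmetic can be iterated for any horizon. A sketch, with illustrative values for a0, a1, σ, and y_t (not taken from the text), builds the point forecasts from (49) and the error variance from (51):

    # j-step-ahead forecasts and 95% intervals for an AR(1), using (49) and (51).
    import numpy as np

    a0, a1, sigma, y_t = 1.0, 0.7, 2.0, 4.0    # illustrative values only

    forecast, var_e = y_t, 0.0
    for j in range(1, 9):
        forecast = a0 + a1 * forecast          # E_t y_{t+j}
        var_e = sigma ** 2 + a1 ** 2 * var_e   # sigma^2 (1 + a1^2 + ... + a1^(2(j-1)))
        half = 1.96 * np.sqrt(var_e)
        print(j, forecast, forecast - half, forecast + half)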

    Higher-Order Models

    Now we generalize the above discussion to de-rive forecasts for any ARMA(p, q) model. To

    keep the algebra simple, consider the ARMA(2,1)

    model:

    y_t = a0 + a1 y_{t-1} + a2 y_{t-2} + ε_t + β1 ε_{t-1}.                (52)

    Updating one period yields

    y_{t+1} = a0 + a1 y_t + a2 y_{t-1} + ε_{t+1} + β1 ε_t.

    If we continue to assume that (1) all the coef-

    ficients are known; (2) all variables subscripted

    t, t-1, t-2, . . . are known at period t; and (3)
    E_t ε_{t+j} = 0 for j > 0, the conditional expecta-

    tion of yt+1 is

    E_t y_{t+1} = a0 + a1 y_t + a2 y_{t-1} + β1 ε_t.                (53)

    Equation (53) is the one-step-ahead forecast

    of y_{t+1}. The one-step-ahead forecast error is
    e_t(1) = y_{t+1} - E_t y_{t+1} = ε_{t+1}.

    To find the two-step-ahead forecast, update

    (52) by two periods

    y_{t+2} = a0 + a1 y_{t+1} + a2 y_t + ε_{t+2} + β1 ε_{t+1}.

    The conditional expectation of yt+2 is

    E_t y_{t+2} = a0 + a1 E_t y_{t+1} + a2 y_t.                (54)

    Equation (54) expresses the two-step-ahead

    forecast in terms of the one-step-ahead fore-

    cast and current value of yt. Combining (53)

    and (54) yields

    E_t y_{t+2} = a0 + a1[a0 + a1 y_t + a2 y_{t-1} + β1 ε_t] + a2 y_t
             = a0(1 + a1) + (a1^2 + a2) y_t + a1 a2 y_{t-1} + a1 β1 ε_t.

    To find the two-step-ahead forecast error, sub-

    tract (54) from y_{t+2}. Thus,
    e_t(2) = y_{t+2} - E_t y_{t+2}
           = [a0 + a1 y_{t+1} + a2 y_t + ε_{t+2} + β1 ε_{t+1}] - [a0 + a1 E_t y_{t+1} + a2 y_t]
           = a1(y_{t+1} - E_t y_{t+1}) + ε_{t+2} + β1 ε_{t+1}.                (55)

    Since y_{t+1} - E_t y_{t+1} is equal to the one-step-ahead forecast error ε_{t+1}, we can write the
    forecast error as e_t(2) = (a1 + β1)ε_{t+1} + ε_{t+2}.
    Alternatively,
    e_t(2) = y_{t+2} - E_t y_{t+2}
           = [a0 + a1 y_{t+1} + a2 y_t + ε_{t+2} + β1 ε_{t+1}]
             - [a0(1 + a1) + (a1^2 + a2) y_t + a1 a2 y_{t-1} + a1 β1 ε_t]
           = (a1 + β1)ε_{t+1} + ε_{t+2}.                (56)

    Finally, all j-step-ahead forecasts can be ob-tained from

    E_t y_{t+j} = a0 + a1 E_t y_{t+j-1} + a2 E_t y_{t+j-2},  j ≥ 2.                (57)

    Equation (57) suggests that the forecasts will

    satisfy a second-order difference equation. As

    long as the characteristic roots of (57) lie in-

    side the unit circle, the forecasts will converge

    to the unconditional mean: a0/(1 - a1 - a2).

    We can use (57) to find the j-step-ahead fore-

    cast errors. Since y_{t+j} = a0 + a1 y_{t+j-1} + a2 y_{t+j-2} + ε_{t+j} + β1 ε_{t+j-1}, the j-step-ahead
    forecast error:
    e_t(j) = y_{t+j} - E_t y_{t+j}
           = [a0 + a1 y_{t+j-1} + a2 y_{t+j-2} + ε_{t+j} + β1 ε_{t+j-1}]
             - E_t[a0 + a1 y_{t+j-1} + a2 y_{t+j-2} + ε_{t+j} + β1 ε_{t+j-1}]
           = a1(y_{t+j-1} - E_t y_{t+j-1}) + a2(y_{t+j-2} - E_t y_{t+j-2}) + ε_{t+j} + β1 ε_{t+j-1}
           = a1 e_t(j-1) + a2 e_t(j-2) + ε_{t+j} + β1 ε_{t+j-1}.                (58)
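
    A sketch of this recursion with made-up ARMA(2,1) coefficients and initial conditions (none from the text) shows how (53), (54), and (57) generate the whole forecast profile and how it approaches a0/(1 - a1 - a2):

    # Forecasts from a known ARMA(2,1): (53) and (54) start the recursion, (57) continues it.
    a0, a1, a2, b1 = 0.5, 0.6, 0.2, 0.3        # illustrative coefficients
    y_t, y_tm1, eps_t = 2.0, 1.5, 0.4          # y_t, y_{t-1}, eps_t assumed known

    fc = {1: a0 + a1 * y_t + a2 * y_tm1 + b1 * eps_t}      # equation (53)
    fc[2] = a0 + a1 * fc[1] + a2 * y_t                     # equation (54)
    for j in range(3, 13):
        fc[j] = a0 + a1 * fc[j - 1] + a2 * fc[j - 2]       # equation (57)

    print(fc[12], a0 / (1 - a1 - a2))          # drifting toward the unconditional mean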

    In practice, we will not know the actual order

    of the ARMA process or the actual values of the coefficients of that process. Instead, to

    create out-of-sample forecasts, it is necessary

    to use the estimated coefficients from what

    we believe to be the appropriate form of the

    ARMA model. Suppose we have T observa-

    tions of the {yt} sequence and choose to fit

    an ARMA(2,1) model to the data. Let a hat

    or caret (i.e., ˆ ) over a parameter denote the
    estimated value of the parameter and let {ε̂_t}

    denote the residuals of the estimated model.

    Hence the estimated ARMA(2,1) model can

    be written as

    y_t = â0 + â1 y_{t-1} + â2 y_{t-2} + ε̂_t + β̂1 ε̂_{t-1}.

    Given that the sample contains T observations,

    the out-of-sample forecasts can be easily con-

    structed. For example, we can use (53) to

    forecast the value of yT+1 conditional on the

    T observations as

    E_T y_{T+1} = â0 + â1 y_T + â2 y_{T-1} + β̂1 ε̂_T.                (59)

    Once we know the values of â0, â1, â2, and β̂1, (59) can easily be constructed using the ac-
    tual values of y_T, y_{T-1}, and ε̂_T. Similarly, the

    forecast of yT+2 can be constructed as

    E_T y_{T+2} = â0 + â1 E_T y_{T+1} + â2 y_T
    where E_T y_{T+1} is the forecast from (59).

    Given these two forecasts, all subsequent fore-

    casts can be obtained from the difference equa-

    tion

    E_T y_{T+j} = â0 + â1 E_T y_{T+j-1} + â2 E_T y_{T+j-2},  j ≥ 2.

    Note: it is much more difficult to construct

    confidence intervals for the forecast errors. Not

    only is it necessary to include the effects of

    the stochastic variation in the future values of {y_{T+1}}, it is also necessary to incorporate the

    fact that the coefficients are estimated with

    errors.
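
    In practice a package performs the estimation and then iterates (59) forward. A sketch with statsmodels, using simulated data as a stand-in for an actual series (the order and coefficients are illustrative):

    # Estimate an ARMA(2,1) over T observations, then form out-of-sample forecasts.
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(1)
    T, eps, y = 300, rng.normal(size=300), np.zeros(300)
    for t in range(2, T):                      # simulated ARMA(2,1) stand-in data
        y[t] = 0.3 + 0.6 * y[t - 1] + 0.2 * y[t - 2] + eps[t] + 0.4 * eps[t - 1]

    res = ARIMA(y, order=(2, 0, 1), trend="c").fit()
    fc = res.get_forecast(steps=8)
    print(fc.predicted_mean)                   # E_T y_{T+1}, ..., E_T y_{T+8}
    print(fc.conf_int(alpha=0.05))             # note: these ignore coefficient uncertainty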

    Now that we have estimated a series and have

    forecasted its future values, the obvious ques-tion is: How good are our forecasts? Typically,

    there will be several plausible models that we

    can select to use for our forecasts. Do not be

    fooled into thinking that the one with the best

    fit is the one that will forecast the best. To

    make a simple point, suppose you wanted to

    forecast the future values of the ARMA(2,1)

    process given by (52). If you could forecast

    the value of yT+1 using (53), you would ob-

    tain the one-step-ahead forecast error

    e_T(1) = y_{T+1} - a0 - a1 y_T - a2 y_{T-1} - β1 ε_T = ε_{T+1}.

    Since the forecast error is the pure unfore-

    castable portion of y_{T+1}, no other ARMA model

    can provide you with superior forecasting per-

    formance. However, we need to estimate the

    parameters of the process, so our forecasts must be made using (59). Therefore, our es-

    timated forecast error will be

    ê_T(1) = y_{T+1} - (â0 + â1 y_T + â2 y_{T-1} + β̂1 ε̂_T).

    Clearly, the two forecast errors are not iden-

    tical. When we forecast using (59), the co-efficients (and residuals) are estimated impre-

    cisely. The forecasts made using the estimated

    model extrapolate this coefficient uncertainty

    into the future. Since coefficient uncertainty

    increases as the model becomes more complex,

    it could be that an estimated AR(1) model

    forecasts the process given by (52) better than

    an estimated ARMA(2,1) model.

    How do we know which one of several rea-

    sonable models has the best forecasting per-formance? One way to determine that is to

    put the alternative models to a head-to-head

    test. Since the future values of the series are

    unknown, you can hold back a portion of the

    observations from the estimation process and

    estimate the alternative models over the short-

    ened span of data and use these estimates to

    forecast the observations of the holdback pe-

    riod. You can then compare the properties of

    the forecast errors from the alternative mod-

    els. To take a simple example, suppose that {y_t} contains a total of 150 observations and

    that you are unsure as to whether an AR(1) or

    an MA(1) model best captures the behavior of

    the series. One way to proceed is to use the

    first 100 observations to estimate both mod-

    els and use each to forecast the value of y101.

    Since you know the actual value of y101, you

    can construct the forecast error obtained from

    AR(1) and from MA(1). These two forecast

    errors are precisely those that someone would

    have made if they had been making a one-step-ahead forecast in period 100. Now, re-

    estimate an AR(1) and an MA(1) model using

    the first 101 observations. Although the esti-

    mated coefficients will change somewhat, they

    are those that someone would have obtained

    in period 101. Use the two models to forecast

    the value of y102. Given that you know the ac-

    tual value of y102, you can construct two more

    forecast errors. Since you know all the values

    of the {yt} sequence through period 150, you

    can continue this process so as to obtain two series of one-step-ahead forecast errors, each

    containing 50 errors. To keep the notation

    simple, let {f1t} and {f2t} denote the sequences

    of forecasts from the AR(1) and the MA(1),

    respectively. Similarly, let {e1t} and {e2t} de-

    note the sequences of forecast errors from the

    AR(1) and the MA(1), respectively. Then it

    should be clear that f_{11} = E_{100} y_{101} is the first
    forecast using the AR(1), e_{11} = y_{101} - f_{11} is
    the first forecast error (where the first hold-
    back observation is y_{101}), and e_{2,50} is the last forecast error from the MA(1).
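
    A sketch of this recursive exercise, with a simulated series standing in for the 150 observations discussed above, re-estimates each model every period and stores the two one-step-ahead forecast-error series:

    # Recursive one-step-ahead forecast comparison of an AR(1) and an MA(1).
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(2)
    eps, y = rng.normal(size=150), np.zeros(150)
    for t in range(1, 150):                    # illustrative data-generating process
        y[t] = 0.5 * y[t - 1] + eps[t]

    f1, f2, e1, e2 = [], [], [], []
    for t in range(100, 150):                  # hold-back observations y[100], ..., y[149]
        fc_ar = ARIMA(y[:t], order=(1, 0, 0), trend="c").fit().forecast(1)[0]
        fc_ma = ARIMA(y[:t], order=(0, 0, 1), trend="c").fit().forecast(1)[0]
        f1.append(fc_ar); f2.append(fc_ma)
        e1.append(y[t] - fc_ar); e2.append(y[t] - fc_ma)

    e1, e2 = np.array(e1), np.array(e2)        # 50 one-step-ahead errors per model
    print(e1.mean(), e2.mean(), (e1 ** 2).mean(), (e2 ** 2).mean())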

    It is desirable that the forecast errors have a

    mean zero and a small variance. A regression-

    based method to assess the forecasts is to use

    the 50 forecasts from the AR(1) to estimate an equation of the form
    y_{100+t} = a0 + a1 f_{1t} + v_{1t},  t = 1, 2, . . . , 50.

    If the forecasts are unbiased, an F-test should allow us to impose the restriction a0 = 0 and

    a1 = 1. Similarly, the residual series v1t should

    act as a white noise process. It is a good idea

    to plot v1t against y100+t to determine if there

    are periods in which our forecasts are espe-

    cially poor. Now repeat the process with the

    forecasts from the MA(1). In particular, use

    the 50 forecasts from the MA(1) to estimate

    y_{100+t} = b0 + b1 f_{2t} + v_{2t},  t = 1, 2, . . . , 50.

    Again, if we use an F-test, we should not be able to reject the joint hypothesis b0 = 0 and

    b1 = 1. If the significance levels from the two

    F-tests are similar, we might select the model

    with the smallest residual variance: that is, se-
    lect the AR(1) if Var(v_{1t}) < Var(v_{2t}).
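
    A sketch of this regression check, with simulated stand-ins for the 50 forecasts and realizations (the variable names are illustrative, not from the text):

    # Unbiasedness regression y_{100+t} = a0 + a1 f_t + v_t with a joint test of a0 = 0, a1 = 1.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    f = rng.normal(size=50)                        # stand-in for the stored forecasts
    actual = f + rng.normal(scale=0.5, size=50)    # stand-in for the realized values

    X = sm.add_constant(f)                         # exog columns named 'const' and 'x1'
    res = sm.OLS(actual, X).fit()
    print(res.params)                              # close to (0, 1) if forecasts are unbiased
    print(res.f_test("const = 0, x1 = 1"))         # joint F-test of the two restrictions
    print(res.resid.var())                         # residual variance used to rank models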

    More generally, we might want to have a hold-

    back period that differs from 50 observations.

    With a very small sample, it may not be possible to hold back 50 observations. Small sam-

    ples are a problem since Ashley (1997) shows

    that very large samples are often necessary to

    reveal a significant difference between the out-

    of-sample forecasting performances of similar

    models. Hence, we need to have enough ob-

    servations to have well-estimated coefficients

    for the in-sample period and enough out-of-

    sample forecasts so that the test has good

    power. If we have a large sample, it is typi-

    cal to hold back as much as 50% of the dataset. Also, we might want to use j-step-ahead

    forecasts instead of one-step-ahead forecasts.

    For example, if we have quarterly data and

    want to forecast one year into the future, we

    can perform the analysis using four-step-ahead forecasts. Nevertheless, once we have the two

    sequences of forecast errors, we can compare

    their properties.

    Instead of using a regression based approach, a

    researcher could select a model with the smallest mean square prediction error (MSPE). If
    there are H observations in the holdback pe-
    riod, the MSPE for the AR(1) can be calcu-
    lated as
    MSPE = (1/H) Σ_{i=1}^{H} e_{1i}^2.

    Several methods have been proposed to deter-

    mine whether one MSPE is statistically differ-

    ent from the other. If we put the larger of

    the two MSPEs in the numerator, a standard

    recommendation is to use the F-statistic

    F = Σ_{i=1}^{H} e_{1i}^2 / Σ_{i=1}^{H} e_{2i}^2                (60)

    The intuition is that the value of F will equal

    unity if the forecast errors from the two models

    are identical. A very large value of F implies

    that the forecast errors from the first model

    are substantially larger than those from the

    second. Under the null hypothesis of equal

    forecasting performance, (60) has a standard

    F distribution with (H, H) degrees of freedom if the following 3 assumptions hold. The fore-

    cast errors are

    1. normally distributed with zero mean,

    2. serially uncorrelated, and

    3. contemporaneously uncorrelated.
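
    Under those assumptions, the test in (60) amounts to a ratio of summed squared errors. A sketch, with two simulated error series as stand-ins and the larger MSPE placed in the numerator:

    # MSPE comparison via the F-ratio in (60), with (H, H) degrees of freedom.
    import numpy as np
    from scipy.stats import f

    rng = np.random.default_rng(4)
    e1 = rng.normal(scale=1.2, size=50)        # illustrative errors from model 1
    e2 = rng.normal(scale=1.0, size=50)        # illustrative errors from model 2

    H = len(e1)
    num, den = np.sum(e1 ** 2), np.sum(e2 ** 2)
    if num < den:                              # keep the larger MSPE in the numerator
        num, den = den, num
    F = num / den
    print(F, f.sf(F, H, H))                    # p-value under equal forecast accuracy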

    Although it is common practice to assume that

    the {et} sequence is normally distributed, it is

    not necessarily the case that the forecast errors

    are normally distributed with zero mean. Sim-

    ilarly, the forecast errors may be serially correlated; this is particularly true if we use multi-step-

    ahead forecasts. For example, equation (56)

    indicated that the two-step-ahead forecast er-

    ror for yt+2 is

    e_t(2) = (a1 + β1)ε_{t+1} + ε_{t+2}

    and updating et(2) by one period yields the

    two-step-ahead forecast error for yt+3 as

    e_{t+1}(2) = (a1 + β1)ε_{t+2} + ε_{t+3}.

    Thus predicting y_{t+2} from the perspective of
    period t and predicting y_{t+3} from the perspec-
    tive of period t+1 both contain an error due to
    the presence of ε_{t+2}. This induces serial cor-
    relation between the two forecast errors. For-
    mally, it can be seen as follows:
    E[e_t(2) e_{t+1}(2)] = (a1 + β1)σ² ≠ 0.
    However, for i > 1, E[e_t(2) e_{t+i}(2)] = 0 since
    there are no overlapping forecasts. Hence, the
    autocorrelations of the two-step-ahead fore-
    cast errors cut off to zero after lag 1. As an
    exercise, you can demonstrate the general re-
    sult that j-step-ahead forecast errors act as an
    MA(j-1) process.

    Finally, the forecast errors from the two alter-
    native models will usually be highly correlated
    with each other. For example, a negative re-
    alization of ε_{t+1} will tend to cause the fore-
    casts from both models to be too high. Also
    note: the violation of any of the 3 assump-
    tions means that the ratio of the MSPEs in
    (60) does not have an F-distribution.

    The Granger-Newbold Test

    Granger and Newbold (1976) show how to over-
    come the problem of contemporaneously cor-
    related forecast errors. Use the two sequences
    of forecast errors to form
    x_t = e_{1t} + e_{2t} and z_t = e_{1t} - e_{2t}.
    If assumptions 1 and 2 are valid, then under
    the null hypothesis of equal forecast accuracy,
    x_t and z_t should be uncorrelated. That is,
    ρ_xz = E(x_t z_t) = E(e_{1t}^2 - e_{2t}^2)
    should be zero. Model 1 has a larger MSPE if
    ρ_xz is positive and model 2 has a larger MSPE
    if ρ_xz is negative. Let r_xz denote the sample
    correlation coefficient between {x_t} and {z_t}.
    Granger and Newbold (1976) show that
    r_xz / [(1 - r_xz^2)/(H - 1)]^{1/2}                (61)
    has a t-distribution with H - 1 degrees of free-
    dom. Thus, if r_xz is statistically significantly
    different from zero, model 1 has a larger MSPE
    if r_xz is positive and model 2 has a larger MSPE
    if r_xz is negative.
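
    A sketch of the Granger-Newbold calculation, applied to two illustrative error series that share a common component (so they are contemporaneously correlated, as is typical):

    # Granger-Newbold test: correlate x_t = e1_t + e2_t with z_t = e1_t - e2_t.
    import numpy as np
    from scipy.stats import t

    rng = np.random.default_rng(5)
    common = rng.normal(size=50)                       # shared shock in both error series
    e1 = common + rng.normal(scale=0.8, size=50)       # illustrative errors, model 1
    e2 = common + rng.normal(scale=0.5, size=50)       # illustrative errors, model 2

    x, z = e1 + e2, e1 - e2
    H = len(e1)
    r_xz = np.corrcoef(x, z)[0, 1]
    stat = r_xz / np.sqrt((1 - r_xz ** 2) / (H - 1))   # equation (61), t with H-1 d.o.f.
    print(stat, 2 * t.sf(abs(stat), H - 1))            # positive r_xz: model 1 has the larger MSPE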

    The Diebold-Mariano Test

    Diebold and Mariano (1995) relax assump-

    tions 1 - 3 and allow for an objective func-

    tion that is not quadratic. This is important

    because if, for example, an investor's loss de-
    pends on the size of the forecast error rather than on its square, the
    forecaster should be concerned with the absolute values of the forecast errors. As another

    example, an options trader receives a pay-off

    of zero if the value of the underlying asset

    lies below the strike price but receives a one-

    dollar pay-off for each dollar the asset price

    rises above the strike price.

    If we consider only one-step-ahead forecasts,

    we can eliminate the subscript j and let the

    loss from a forecast error in period i be denoted

    by g(e_i). In the typical case of mean squared errors, the loss is e_i^2. To allow the loss function

    to be general, we can write the differential loss

    in period i from using model 1 versus model 2

    as d_i = g(e_{1i}) - g(e_{2i}). The mean loss can be
    obtained as
    d̄ = (1/H) Σ_{i=1}^{H} [g(e_{1i}) - g(e_{2i})].                (62)

    Under the null hypothesis of equal forecast

    accuracy, the value of d̄ is zero. Since
    d̄ is the mean of the individual losses, under fairly
    weak conditions, the Central Limit Theorem
    implies that d̄ should have a normal distribu-

    tion. Hence it is not necessary to assume that

    the individual forecast errors are normally dis-

    tributed. Thus, if we know Var(d̄), we could
    construct the ratio d̄/[Var(d̄)]^{1/2} and test the null
    hypothesis of equal forecast accuracy using a
    standard normal distribution. In practice, to
    implement the test we first need to estimate
    Var(d̄).

    If the {d_i} series is serially uncorrelated with
    a sample variance of γ0, the estimate of Var(d̄)
    is simply γ0/(H - 1).

    With H observations used to construct
    sequences of j-step-ahead forecasts, the DM
    statistic is

    DM = d̄ / { [γ0 + 2γ1 + . . . + 2γq] / [H + 1 - 2j + H^{-1} j(j-1)] }^{1/2}.
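
    A sketch of the calculation, written for a general loss function; the error series are simulated stand-ins, the small-sample denominator follows the formula as reconstructed above, and q is set to j - 1 here because j-step-ahead errors behave as an MA(j-1) process, as noted earlier:

    # Diebold-Mariano statistic for sequences of j-step-ahead forecast errors.
    import numpy as np
    from scipy.stats import norm

    def dm_statistic(e1, e2, j=1, loss=np.square):
        d = loss(e1) - loss(e2)                # per-period loss differential d_i
        H, dbar = len(d), d.mean()
        # autocovariances gamma_0, ..., gamma_{j-1} of the loss differential
        gam = [np.mean((d[k:] - dbar) * (d[:H - k] - dbar)) for k in range(j)]
        var_num = gam[0] + 2.0 * sum(gam[1:])
        denom = H + 1 - 2 * j + j * (j - 1) / H      # as in the statistic above
        return dbar / np.sqrt(var_num / denom)

    rng = np.random.default_rng(6)
    e1 = rng.normal(scale=1.1, size=60)        # illustrative j-step forecast errors
    e2 = rng.normal(scale=1.0, size=60)
    stat = dm_statistic(e1, e2, j=2)
    print(stat, 2 * norm.sf(abs(stat)))        # compared with the standard normal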

    An example showing the appropriate use of the

    Granger-Newbold and Diebold-Mariano tests is provided in the next section.

    10. A Model of the Producer Price Index

    This section is intended to illustrate some of

    the ambiguities frequently encountered in the

    Box-Jenkins technique. These ambiguities may lead two equally skilled econometricians to es-

    timate and forecast the same series using very

    different ARMA processes. Nonetheless, if you

    make reasonable choices, you will select mod-

    els that come very close to mimicking the ac-

    tual data generating process.

    Now we turn to an illustration of the Box-Jenkins

    modeling procedure by estimating a quarterly

    model of the U.S. Producer Price Index (PPI).

    The data used in this section are the series labeled PPI in the file QUARTERLY.XLS. Panel

    (a) of Figure 2.5 clearly reveals that there is

    little point in modeling the series as being sta-

    tionary; there is a decidedly positive trend or

    drift throughout the period 1960Q1 to 2002Q1. The first difference of the series seems to have

    a constant mean, although inspection of Panel

    (b) suggests that the variance is an increasing

    function of time. As shown in Panel (c), the

    first difference of the logarithm (denoted by

    Δlppi_t) is the most likely candidate to be covari-

    economic reason to be interested in the log-

    arithmic change since lppit is a measure of

    inflation. However, the large volatility of the

    PPI accompanying the oil price shocks in the

    1970s should make us somewhat wary of the

    assumption that the process is covariance sta-

    tionary. At this point, some researchers would

    make

    additional transformations intended to reduce

    the volatility exhibited in the 1970s. However,

    it seems reasonable to estimate a model of the

    {Δlppi_t} sequence without any further trans-

    formations. As always, you should maintain

    a healthy skepticism of the accuracy of your

    model.
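
    A sketch of this transformation step; if QUARTERLY.XLS is at hand the PPI column can be read directly (the read line is only indicative), otherwise a simulated trending series stands in:

    # Build Delta log(PPI) and inspect its ACF and PACF before choosing a model.
    # With the data file: ppi = pd.read_excel("QUARTERLY.XLS")["PPI"]   (indicative only)
    import numpy as np
    from statsmodels.tsa.stattools import acf, pacf

    rng = np.random.default_rng(7)
    ppi = 30 * np.exp(np.cumsum(0.01 + 0.008 * rng.normal(size=169)))   # stand-in series

    dlppi = np.diff(np.log(ppi))               # quarterly inflation measure, Delta log PPI
    print(acf(dlppi, nlags=8))                 # geometric decay would point to an AR(1)
    print(pacf(dlppi, nlags=8))                # spikes suggest candidate AR orders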

    The autocorrelation and partial autocorrela-

    tion functions of the {Δlppi_t} sequence can

    be seen in Figure 2.6. Let us try to identify

    the tentative models that we would want to estimate. In making our decision, we note the

    following:

    1. The ACF and PACF converge to zero rea-

    sonably quickly. We do not want to overdiffer-

    ence the data and try to model the {2

    lppit}sequence.

    2. The theoretical ACF of a pure MA(q) pro-

    cess cuts off to zero at lag q and the theoretical

    ACF of an AR(1) process decays geometrically.

    Examination of Figure 2.6 suggests that nei-ther of these specifications seems appropriate

    for the sample data.

    3. The ACF does not decay geometrically.

    The value of ρ1 is 0.603 and the values of
    ρ2, ρ3, and ρ4 are 0.494, 0.451, and 0.446, respectively. Thus the ACF is suggestive of

    an AR(2) process or a process with both au-

    toregressive and moving average components.

    The PACF is such that φ11 = 0.604 and cuts
    off to 0.203 abruptly (i.e., φ22 = 0.203). Over-
    all, the PACF suggests that we should consider models such that p = 1 and p = 2.

    4. Note the jump in ACF after lag 4 and the

    small jump in the PACF at lag 4 (φ44 = 0.148
    while φ55 = -0.114). Since we are using quar-

    terly data, we might want to incorporate a sea-

    sonal factor at lag 4.

    Points 1 to 4 suggest an ARMA(1,1) or an

    AR(2) model. In addition, we might want to

    consider models with a seasonal term at lag 4.

    However, to compare with a variety of mod-els, Table 2.4 reports estimates of 6 tentative

    models. To ensure comparability, all were esti-

    mated over the same sample period. We make

    the following observations:

    1. The estimated AR(1) model confirms our analysis conducted in the identification stage.

    Even though the estimated value of a1 (0.603)

    is less than unity in absolute value and al-

    most four standard deviations from zero, the

    AR(1) specification is inadequate. Forming

    the Ljung-Box Q-statistic for 4 lags of the resid-

    uals yields a value of 13.9, so we can reject the
    null that the first four residual autocorrelations are zero at the 1% significance level.

    Hence, the residuals of this model ex-
    hibit substantial serial correlation and we

    must eliminate this model from consideration.

    2. The AR(2) model is an improvement over

    the AR(1) specification. The estimated coef-

    ficients (a1 = 0.480 and a2 = 0.209) are each

    significantly different from zero at conventional

    levels and imply characteristic roots inside the unit circle. However, there is some ambiguity about

    the information content of the residuals. The

    Q-statistics indicate that the autocorrelations

    of the residuals are not statistically significant

    at the 5% level but are significant at the 10%

    level. As measured by the AIC and SBC, the

    fit of the AR(2) model is superior to that of

    the AR(1). Overall, the AR(2) model domi-

    nates the AR(1) specification.

    3. The ARMA(1,1) specification is superior

    to the AR(2) model. The estimated coefficients are highly significant (with t-values of
    14.9 and -4.41). The estimated value of a1 is

    positive and less than unity and the Q-statistics

    indicate that the autocorrelations of the resid-

    uals are not significant at conventional levels.

    Moreover, all goodness-of-fit measures select

    the ARMA(1,1) specification over the AR(2)

    model. Thus, there is little reason to maintain

    the AR(2) specification.

    4. In order to account for the possibility of sea-

    sonality, we estimated the ARMA(1,1) model with the additional moving average coefficient

    at lag 4. That is, we estimated a model of the

    form: y_t = a0 + a1 y_{t-1} + ε_t + β1 ε_{t-1} + β4 ε_{t-4}.

    Other seasonal patterns are considered in the

    next section. For now, note that the additive expression β4 ε_{t-4} is often preferable to an addi-
    tive autoregressive term a4 y_{t-4}. For truly sea-
    sonal shocks, the expression β4 ε_{t-4} captures

    spikes - not decay - at the quarterly lags. The

    slope coefficients of the estimated ARMA(1,

    (1,4)) model are all highly significant with t-

    statistics of 9.46, -3.41, and 3.63. The Q-

    statistics of the residuals are all very low, im-

    plying that the autocorrelations are not statis-

    tically significantly different from zero. More-

    over, the AIC and SBC select this model over the ARMA(1,1) model.
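
    A sketch of how such a table of candidate models could be produced; the series is a simulated stand-in unless the actual Δlppi_t data are loaded, and the seasonal specification relies on statsmodels accepting a list of specific MA lags:

    # Fit the candidate models for Delta log PPI and compare AIC, SBC, and Q-statistics.
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.stats.diagnostic import acorr_ljungbox

    rng = np.random.default_rng(8)
    dlppi = 0.009 + 0.02 * rng.normal(size=168)          # stand-in for the actual series

    candidates = {
        "AR(1)":         (1, 0, 0),
        "AR(2)":         (2, 0, 0),
        "ARMA(1,1)":     (1, 0, 1),
        "ARMA(1,(1,4))": (1, 0, [1, 4]),                 # MA terms at lags 1 and 4 only
    }
    for name, order in candidates.items():
        res = ARIMA(dlppi, order=order, trend="c").fit()
        q = acorr_ljungbox(res.resid, lags=[4, 8], return_df=True)
        print(name, round(res.aic, 2), round(res.bic, 2),
              q["lb_pvalue"].round(3).tolist())          # residual autocorrelation check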

