Time Series Analysis
Lecture Notes for 475.726

    Ross Ihaka

    Statistics Department

    University of Auckland

    June 23, 2003

    Contents

1 Introduction
   1.1 Time Series
   1.2 Stationarity and Non-Stationarity
   1.3 Some Examples
       1.3.1 Annual Auckland Rainfall
       1.3.2 Nile River Flow
       1.3.3 Yield on British Government Securities
       1.3.4 Ground Displacement in an Earthquake
       1.3.5 United States Housing Starts
       1.3.6 Iowa City Bus Ridership

2 Simple Forecasting Methods
   2.1 Generalities
   2.2 Exponential Smoothing
   2.3 Updating
   2.4 An Example
   2.5 Parameter Choice

3 Some Time Series Theory
   3.1 Time Series
   3.2 Hilbert Spaces
   3.3 Linear Processes
   3.4 Autoregressive Series
       3.4.1 The AR(1) Series
       3.4.2 The AR(2) Series
       3.4.3 Computations
   3.5 Moving Average Series
       3.5.1 The MA(1) Series
       3.5.2 Invertibility
       3.5.3 Computation
   3.6 Autoregressive Moving Average Series
       3.6.1 The ARMA(1,1) Series
       3.6.2 The ARMA(p,q) Model
       3.6.3 Computation
       3.6.4 Common Factors
   3.7 The Partial Autocorrelation Function
       3.7.1 Computing the PACF
       3.7.2 Computation
   3.8 Appendix: Prediction Theory

4 Identifying Time Series Models
   4.1 ACF Estimation
   4.2 PACF Estimation
   4.3 System Identification
   4.4 Model Generalisation
       4.4.1 Non-Zero Means
       4.4.2 Deterministic Trends
       4.4.3 Models With Nonstationary AR Components
       4.4.4 The Effect of Differencing
   4.5 ARIMA Models

5 Fitting and Forecasting
   5.1 Model Fitting
       5.1.1 Computations
   5.2 Assessing Quality of Fit
   5.3 Residual Correlations
   5.4 Forecasting
       5.4.1 Computation
   5.5 Seasonal Models
       5.5.1 Seasonal ARIMA Models
       5.5.2 Stationary Series
       5.5.3 Seasonal Series with Trends

6 Frequency Domain Analysis
   6.1 Some Background
       6.1.1 Complex Exponentials, Sines and Cosines
       6.1.2 Properties of Cosinusoids
       6.1.3 Frequency and Angular Frequency
       6.1.4 Invariance and Complex Exponentials
   6.2 Filters and Filtering
       6.2.1 Filters
       6.2.2 Transfer Functions
       6.2.3 Filtering Sines and Cosines
       6.2.4 Filtering General Series
       6.2.5 Computing Transfer Functions
       6.2.6 Sequential Filtering
   6.3 Spectral Theory
       6.3.1 The Power Spectrum
       6.3.2 The Cramer Representation
       6.3.3 Using The Cramer Representation
       6.3.4 Power Spectrum Examples
   6.4 Statistical Inference
       6.4.1 Some Distribution Theory
       6.4.2 The Periodogram and its Distribution
       6.4.3 An Example: Sunspot Numbers
       6.4.4 Estimating The Power Spectrum
       6.4.5 Tapering and Prewhitening
       6.4.6 Cross Spectral Analysis
   6.5 Computation
       6.5.1 A Simple Spectral Analysis Package for R
       6.5.2 Power Spectrum Estimation
       6.5.3 Cross-Spectral Analysis
   6.6 Examples


    Chapter 1

    Introduction

    1.1 Time Series

Time series arise as recordings of processes which vary over time. A recording can either be a continuous trace or a set of discrete observations. We will concentrate on the case where observations are made at discrete, equally spaced times. By appropriate choice of origin and scale we can take the observation times to be $1, 2, \ldots, T$ and we can denote the observations by $Y_1, Y_2, \ldots, Y_T$.

There are a number of things which are of interest in time series analysis. The most important of these are:

Smoothing: The observed $Y_t$ are assumed to be the result of noise values $\varepsilon_t$ additively contaminating a smooth signal $\mu_t$,
$$Y_t = \mu_t + \varepsilon_t.$$
We may wish to recover the values of the underlying $\mu_t$.

Modelling: We may wish to develop a simple mathematical model which explains the observed pattern of $Y_1, Y_2, \ldots, Y_T$. This model may depend on unknown parameters and these will need to be estimated.

Forecasting: On the basis of observations $Y_1, Y_2, \ldots, Y_T$, we may wish to predict what the value of $Y_{T+L}$ will be ($L \geq 1$), and possibly to give an indication of the uncertainty in the prediction.

Control: We may wish to intervene in the process which is producing the $Y_t$ values in such a way that future values are altered to produce a favourable outcome.

    1.2 Stationarity and Non-Stationarity

A key idea in time series is that of stationarity. Roughly speaking, a time series is stationary if its behaviour does not change over time. This means, for example, that the values always tend to vary about the same level and that their variability is constant over time. Stationary series have a rich theory and their behaviour is well understood. This means that they play a fundamental role in the study of time series.

Obviously, not all time series that we encounter are stationary. Indeed, non-stationary series tend to be the rule rather than the exception. However, many time series are related in simple ways to series which are stationary. Two important examples of this are:

Trend models: The series we observe is the sum of a deterministic trend series and a stationary noise series. A simple example is the linear trend model:
$$Y_t = \beta_0 + \beta_1 t + \varepsilon_t.$$
Another common trend model assumes that the series is the sum of a periodic "seasonal" effect and stationary noise. There are many other variations.

Integrated models: The time series we observe satisfies
$$Y_{t+1} - Y_t = \varepsilon_{t+1}$$
where $\varepsilon_t$ is a stationary series. A particularly important model of this kind is the random walk. In that case, the $\varepsilon_t$ values are independent shocks which perturb the current state $Y_t$ by an amount $\varepsilon_{t+1}$ to produce a new state $Y_{t+1}$.

    1.3 Some Examples

    1.3.1 Annual Auckland Rainfall

Figure 1.1 shows the annual amount of rainfall in Auckland for the years from 1949 to 2000. The general pattern of rainfall looks similar throughout the record, so this series could be regarded as being stationary. (There is a hint that rainfall amounts are declining over time, but this type of effect can occur over shortish time spans for stationary series.)

    1.3.2 Nile River Flow

Figure 1.2 shows the flow volume of the Nile at Aswan from 1871 to 1970. These are yearly values. The general pattern of this data does not change over time so it can be regarded as stationary (at least over this time period).

    1.3.3 Yield on British Government Securities

Figure 1.3 shows the percentage yield on British Government securities, monthly over a 21 year period. There is a steady long-term increase in the yields. Over the period of observation a trend-plus-stationary series model looks like it might be appropriate. An integrated stationary series is another possibility.


    1.3.4 Ground Displacement in an Earthquake

Figure 1.4 shows one component of the horizontal ground motion resulting from an earthquake. The initial motion (a little after 4 seconds) corresponds to the arrival of the p-wave and the large spike just before six seconds corresponds to the arrival of the s-wave. Later features correspond to the arrival of surface waves. This is an example of a transient signal, and techniques appropriate for stationary series cannot be applied to it.

    1.3.5 United States Housing Starts

Figure 1.5 shows the monthly number of housing starts in the United States (in thousands). Housing starts are a leading economic indicator. This means that an increase in the number of housing starts indicates that economic growth is likely to follow, and a decline in housing starts indicates that a recession may be on the way.

    1.3.6 Iowa City Bus Ridership

Figure 1.6 shows the monthly average weekday bus ridership for Iowa City over the period from September 1971 to December 1982. There is clearly a strong seasonal effect superimposed on top of a general upward trend.

Figure 1.1: Annual Auckland rainfall (in cm) from 1949 to 2000 (from Paul Cowpertwait).

Figure 1.2: Flow volume of the Nile at Aswan from 1871 to 1970 (from Durbin and Koopman).

Figure 1.3: Monthly percentage yield on British Government securities over a 21 year period (from Chatfield).

Figure 1.4: Horizontal ground displacement during a small California earthquake (from Bill Peppin).

Figure 1.5: Housing starts in the United States (000s) (from S-Plus).

Figure 1.6: Average weekday bus ridership, Iowa City (monthly averages), September 1971 to December 1982.


    Chapter 2

    Simple Forecasting Methods

    2.1 Generalities

Given observations $Y_1, \ldots, Y_T$ we want to predict $Y_{T+1}$. Denote this forecast by $\hat{Y}_{T+1}$. The forecast error is
$$Y_{T+1} - \hat{Y}_{T+1}$$
and we can measure the quality of the forecasting procedure by the mean-squared error:
$$\mathrm{MSE} = E(Y_{T+1} - \hat{Y}_{T+1})^2.$$
It may not be possible to compute this value theoretically, but it can be estimated from the observations by
$$\frac{1}{T} \sum_{t=1}^{T} (Y_t - \hat{Y}_t)^2.$$

2.2 Exponential Smoothing

Suppose that the underlying process has a constant mean $\mu$. If $\mu$ were known it would provide the minimum mean-square error predictor ($E(Y - a)^2$ is minimised at $a = \mu$). Since $\mu$ is not known, we can use the sample mean in its place:
$$\hat{Y}_{T+1} = \frac{1}{T} \sum_{t=1}^{T} Y_t.$$

Now suppose that instead of being constant, the process mean is a slowly varying function of $t$. It now makes sense to use a weighted mean, with the more recent values receiving greater weights.

One way of setting the weights is to progressively discount older observations. This leads to a forecast of the form
$$\hat{Y}_{T+1} = c(Y_T + \omega Y_{T-1} + \omega^2 Y_{T-2} + \cdots)$$
for some $|\omega| < 1$. To ensure that the weights sum to 1, we require
$$c(1 + \omega + \omega^2 + \cdots) = 1.$$


This means that $c = 1 - \omega$, and hence that
$$\hat{Y}_{T+1} = (1 - \omega)(Y_T + \omega Y_{T-1} + \omega^2 Y_{T-2} + \cdots).$$
Generating forecasts in this fashion is known as exponential smoothing. The coefficient $\omega$ is called the discount coefficient.

    2.3 Updating

The exponential smoothing procedure can be formulated in simple recursive form:
$$\hat{Y}_{T+1} = (1 - \omega)Y_T + \omega\hat{Y}_T = (1 - \omega)Y_T + \hat{Y}_T - (1 - \omega)\hat{Y}_T = \hat{Y}_T + (1 - \omega)(Y_T - \hat{Y}_T). \qquad (2.1)$$
Formula 2.1 provides a recursive way of updating forecasts for the series. Note that the recursion is sometimes written in terms of $\alpha = 1 - \omega$. This makes the formula even simpler:
$$\hat{Y}_{T+1} = \hat{Y}_T + \alpha(Y_T - \hat{Y}_T).$$

In order to run the recursions we need a starting value for $\hat{Y}_1$. A number of alternatives have been suggested.

a) $\hat{Y}_1 = Y_1$.

b) $\hat{Y}_1 = \bar{Y}$ (the mean of an initial stretch of the data).

c) Use backward forecasting to predict $Y_1$ from later values.

Since $|\omega| < 1$, values in the remote past do not affect predictions to any major extent. This means that the procedure is relatively insensitive to the value of $\omega$ (or $\alpha$) used.
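
As a concrete illustration, the updating recursion (2.1) can be coded directly in R, written here in terms of the smoothing constant $\alpha = 1 - \omega$. This is only a minimal sketch: the function name expsmooth and the default choice $\hat{Y}_1 = Y_1$ for the starting value are arbitrary conventions adopted for the example.

expsmooth <- function(y, alpha, start = y[1]) {
    # One-step-ahead exponential smoothing forecasts:
    # yhat[t] is the forecast of y[t] made from y[1], ..., y[t-1].
    yhat <- numeric(length(y))
    yhat[1] <- start
    for (t in seq_along(y)[-1])
        yhat[t] <- yhat[t - 1] + alpha * (y[t - 1] - yhat[t - 1])
    yhat
}

The built-in HoltWinters function (called with beta = FALSE and gamma = FALSE) provides a more general implementation of the same idea.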

    2.4 An Example

In their book, Statistical Methods for Forecasting, Abraham and Ledolter present a time series for the growth rate of Iowa non-farm income. This series is shown in figure 2.1. The series is relatively noisy, but contains longer-term slow variation. Any smoothing process applied to the series should follow this slower variation. The results of forecasting this series using $\alpha = .1$ and $\alpha = .4$, with a starting value of 1, are shown in figures 2.2 and 2.3. The first of these follows the slow variation present in the series, while the second is probably tracking the noisy (non-predictable) part of the series too closely.

Figure 2.1: The growth rate of Iowa non-farm income (from Abraham and Ledolter).

Figure 2.2: Exponential smoothing of the Iowa growth-rate data using $\alpha = .1$ and a starting value of 1.

Figure 2.3: Exponential smoothing of the Iowa growth-rate data using $\alpha = .4$ and a starting value of 1.

2.5 Parameter Choice

The value of $\omega$ (or $\alpha$) can have a large effect on the forecasts produced by exponential smoothing. This naturally leads to the question: which value of $\omega$ will give the best forecasts?


For many observed time series, values of $\omega$ between .7 and .95 seem to work well. The difficulty is that, without a theoretical description of the underlying process, it is impossible to be more precise.

One possible approach is to choose the value which minimises
$$\mathrm{SSE} = \sum_{t=1}^{T} (Y_t - \hat{Y}_t)^2.$$
In the case of the Iowa growth data, we can minimise the value of SSE by using a starting value of 1.478 and an $\alpha$ of 0.1031. This produces forecasts which are quite close to those of figure 2.2.

The problem with this choice is that it is optimised for the observations already seen, and may not work so well for future observations. In order to understand the properties of such ad hoc procedures, and to develop better ones, it is important that we develop more theory.
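
The SSE-minimising choice described above can be found numerically. The sketch below assumes that y holds the observed series and uses the illustrative expsmooth function from section 2.3; it jointly optimises the starting value and $\alpha$ with optim.

sse <- function(p, y) {
    # p[1] is the starting value, p[2] the smoothing constant alpha
    sum((y - expsmooth(y, alpha = p[2], start = p[1]))^2)
}
fit <- optim(c(y[1], 0.2), sse, y = y,
             method = "L-BFGS-B",
             lower = c(-Inf, 0.01), upper = c(Inf, 0.99))
fit$par    # SSE-minimising starting value and alpha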


    Chapter 3

    Some Time Series Theory

    3.1 Time Series

We will assume that the time series values we observe are the realisations of random variables $Y_1, \ldots, Y_T$, which are in turn part of a larger stochastic process $\{Y_t : t \in \mathbb{Z}\}$. It is this underlying process that will be the focus for our theoretical development.

The mean and the variance of random variables have a special place in the theory of statistics. In time series analysis, the analogs of these are the mean function and the autocovariance function.

Definition 3.1.1 (Mean and Autocovariance Functions): The mean function of a time series is defined to be $\mu(t) = E Y_t$ and the autocovariance function is defined to be $\gamma(s, t) = \mathrm{cov}(Y_s, Y_t)$.

The mean and the autocovariance functions are fundamental parameters and it would be useful to obtain sample estimates of them. For a general time series there are $2T + T(T - 1)/2$ parameters associated with $Y_1, \ldots, Y_T$ and it is not possible to estimate all these parameters from $T$ data values.

To make any progress at all we must impose constraints on the time series we are investigating. The most common constraint is that of stationarity. There are two common definitions of stationarity.

Definition 3.1.2 (Strict Stationarity): A time series $\{Y_t : t \in \mathbb{Z}\}$ is said to be strictly stationary if for any $k > 0$ and any $t_1, \ldots, t_k \in \mathbb{Z}$, the distribution of
$$(Y_{t_1}, \ldots, Y_{t_k})$$
is the same as that for
$$(Y_{t_1+L}, \ldots, Y_{t_k+L})$$
for every value of $L$.

This definition says that the stochastic behaviour of the process does not change through time. If $Y_t$ is stationary then
$$\mu(t) = \mu(0)$$


and
$$\gamma(s, t) = \gamma(s - t, 0).$$
So for stationary series, the mean function is constant and the autocovariance function depends only on the time lag between the two values for which the covariance is being computed.

These two restrictions on the mean and covariance functions are enough for a reasonable amount of theory to be developed. Because of this, a less restrictive definition of stationarity is often used in place of strict stationarity.

Definition 3.1.3 (Weak Stationarity): A time series is said to be weakly, wide-sense or covariance stationary if $E|Y_t|^2 < \infty$, $\mu(t) = \mu$ and $\gamma(t+u, t) = \gamma(u, 0)$ for all $t$ and $u$.

In the case of Gaussian time series, the two definitions of stationarity are equivalent. This is because the finite-dimensional distributions of the time series are completely characterised by the mean and covariance functions.

When time series are stationary it is possible to simplify the parameterisation of the mean and autocovariance functions. In this case we can define the mean of the series to be $\mu = E(Y_t)$ and the autocovariance function to be $\gamma(u) = \mathrm{cov}(Y_{t+u}, Y_t)$. We will also have occasion to examine the autocorrelation function
$$\rho(u) = \frac{\gamma(u)}{\gamma(0)} = \mathrm{cor}(Y_{t+u}, Y_t).$$

Example 3.1.1 (White Noise): If the random variables which make up $\{Y_t\}$ are uncorrelated, each with mean 0 and variance $\sigma^2$, then $\{Y_t\}$ is stationary with autocovariance function
$$\gamma(u) = \begin{cases} \sigma^2 & u = 0, \\ 0 & \text{otherwise.} \end{cases}$$
This type of series is referred to as white noise.
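
A quick simulation illustrates the definition; this small R example is purely illustrative.

> set.seed(1)
> y <- rnorm(200)    # 200 uncorrelated values: Gaussian white noise
> acf(y)             # sample autocorrelations beyond lag 0 should all be close to zero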

    3.2 Hilbert Spaces

Suppose that $\{Y_t : t \in \mathbb{Z}\}$ is a stationary zero-mean time series. Linear combinations of the form
$$\sum_{i=1}^{N} c_i Y_{t_i}$$
can be added together and multiplied by constants and thus form a vector space (provided we identify elements which are equal with probability 1). It is possible to define an inner product on this space by
$$\langle X, Y \rangle = \mathrm{cov}(X, Y) = E(XY),$$
and a norm by
$$\|X\|^2 = E|X|^2 = \mathrm{var}(X).$$
The distance between two elements of the space is defined in terms of this norm,
$$\|X - Y\|^2 = \mathrm{var}(X - Y).$$


The space can be completed by adding limits of sequences. This complete inner product space, or Hilbert space, is called the Hilbert space generated by $\{Y_t : t \in \mathbb{Z}\}$, and we will denote it by $H$.

Now consider the lag operator $L$ defined by
$$L Y_t = Y_{t-1}.$$
The operator can be defined for linear combinations by
$$L(c_1 Y_{t_1} + c_2 Y_{t_2}) = c_1 Y_{t_1 - 1} + c_2 Y_{t_2 - 1}$$
and can be extended to all of $H$ by a suitable definition for limits.

Note that, in addition to being linear, the lag operator preserves inner products,
$$\langle L Y_s, L Y_t \rangle = \mathrm{cov}(Y_{s-1}, Y_{t-1}) = \mathrm{cov}(Y_s, Y_t) = \langle Y_s, Y_t \rangle,$$
and so is a unitary operator.

There is a natural calculus of operators on $H$. For example, we can define powers of $L$ naturally by
$$L^2 Y_t = L L Y_t = L Y_{t-1} = Y_{t-2}, \quad L^3 Y_t = L L^2 Y_t = Y_{t-3}, \quad \ldots, \quad L^k Y_t = Y_{t-k}$$
and linear combinations by
$$(L^k + L^l) Y_t = Y_{t-k} + Y_{t-l}.$$

Other operators can be defined in terms of $L$. The differencing operator defined by
$$\nabla Y_t = (1 - L) Y_t = Y_t - Y_{t-1}$$
is of fundamental importance when dealing with models for non-stationary time series. Again, we can define powers of this operator:
$$\nabla^2 Y_t = \nabla(\nabla Y_t) = \nabla(Y_t - Y_{t-1}) = (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) = Y_t - 2Y_{t-1} + Y_{t-2}.$$

We will not dwell on the rich Hilbert space theory associated with time series, but it is important to know that many of the operator manipulations which we will carry out can be placed on a rigorous footing.
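
In R, the differencing operator corresponds to the diff function. A small illustration (with an arbitrary series):

> y <- cumsum(rnorm(10))
> diff(y)                      # first differences, (1 - L) y
> diff(y, differences = 2)     # second differences, (1 - L)^2 y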


    3.3 Linear Processes

We will now turn to an examination of a large class of useful time series models. These are almost all defined in terms of the lag operator $L$. As the simplest example, consider the autoregressive model defined by
$$Y_t = \phi Y_{t-1} + \varepsilon_t, \qquad (3.1)$$
with the $\varepsilon_t$ being a set of uncorrelated random variables, each with mean 0 and variance $\sigma^2$. From a statistical point of view this model makes perfect sense, but it is not clear that any $Y_t$ which satisfies this equation exists.

One way to proceed is to re-arrange equation 3.1 and write it in its operator form
$$(1 - \phi L) Y_t = \varepsilon_t.$$
Formally inverting $(1 - \phi L)$ leads to
$$Y_t = (1 - \phi L)^{-1} \varepsilon_t = \sum_{u=0}^{\infty} \phi^u L^u \varepsilon_t = \sum_{u=0}^{\infty} \phi^u \varepsilon_{t-u}.$$
The series on the right is defined as the limit as $n \to \infty$ of
$$\sum_{u=0}^{n} \phi^u \varepsilon_{t-u}.$$
Loosely speaking, this limit exists if
$$\Big\| \sum_{u=n+1}^{\infty} \phi^u \varepsilon_{t-u} \Big\|^2 \to 0.$$
Since
$$\Big\| \sum_{u=n+1}^{\infty} \phi^u \varepsilon_{t-u} \Big\|^2 = \mathrm{var}\Big( \sum_{u=n+1}^{\infty} \phi^u \varepsilon_{t-u} \Big) = \sum_{u=n+1}^{\infty} |\phi|^{2u} \sigma^2,$$
which tends to zero when $|\phi| < 1$, there is indeed a well-defined solution of 3.1 in that case. Further, this solution can be written as an (infinite) moving average of current and earlier $\varepsilon_t$ values. This type of infinite moving average plays a special role in the theory of time series.

Definition 3.3.1 (Linear Processes): The time series $Y_t$ defined by
$$Y_t = \sum_{u=-\infty}^{\infty} \psi_u \varepsilon_{t-u},$$


where $\varepsilon_t$ is a white-noise series and
$$\sum_{u=-\infty}^{\infty} |\psi_u|^2 < \infty,$$
is called a linear process.

Many time series can be represented as linear processes. This provides a unifying theoretical underpinning for time series theory, but may be of limited practical interest because of the potentially infinite number of parameters required.

    3.4 Autoregressive Series

Definition 3.4.1 (Autoregressive Series): If $Y_t$ satisfies
$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t,$$
where $\varepsilon_t$ is white noise and the $\phi_u$ are constants, then $Y_t$ is called an autoregressive series of order $p$, denoted by AR($p$).

Autoregressive series are important because:

1. They have a natural interpretation: the next value observed is a slight perturbation of the most recent observation.

2. It is easy to estimate their parameters. It can be done with standard regression software.

3. They are easy to forecast. Again, standard regression software will do the job.

    3.4.1 The AR(1) Series

The AR(1) series is defined by
$$Y_t = \phi Y_{t-1} + \varepsilon_t. \qquad (3.2)$$
Because $Y_{t-1}$ and $\varepsilon_t$ are uncorrelated, the variance of this series is
$$\mathrm{var}(Y_t) = \phi^2\,\mathrm{var}(Y_{t-1}) + \sigma_\varepsilon^2.$$
If $\{Y_t\}$ is stationary then $\mathrm{var}(Y_t) = \mathrm{var}(Y_{t-1}) = \sigma_Y^2$ and so
$$\sigma_Y^2 = \phi^2 \sigma_Y^2 + \sigma_\varepsilon^2. \qquad (3.3)$$
This implies that $\sigma_Y^2 > \phi^2 \sigma_Y^2$ and hence $1 > \phi^2$. In order for equation 3.2 to define a stationary series we must have $|\phi| < 1$.

There is an alternative view of this, using the operator formulation of equation 3.2, namely
$$(1 - \phi L) Y_t = \varepsilon_t.$$


It is possible to formally invert the autoregressive operator to obtain
$$(1 - \phi L)^{-1} = \sum_{u=0}^{\infty} \phi^u L^u.$$
Applying this to the series $\{\varepsilon_t\}$ produces the representation
$$Y_t = \sum_{u=0}^{\infty} \phi^u \varepsilon_{t-u}.$$
If $|\phi| < 1$ this series converges in mean square because $\sum |\phi|^{2u} < \infty$. The limit series is stationary and satisfies equation 3.2. Thus, if $|\phi| < 1$ then there is a stationary solution to 3.2. An equivalent condition is that the root of the equation
$$1 - \phi z = 0$$
(namely $1/\phi$) lies outside the unit circle in the complex plane.

If we multiply both sides of equation 3.2 by $Y_{t-u}$ and take expectations we obtain
$$E(Y_t Y_{t-u}) = \phi E(Y_{t-1} Y_{t-u}) + E(\varepsilon_t Y_{t-u}).$$
The last term on the right is zero because, from the linear process representation, $\varepsilon_t$ is independent of earlier $Y_t$ values. This means that the autocovariances must satisfy the recursion
$$\gamma(u) = \phi\,\gamma(u - 1), \qquad u = 1, 2, 3, \ldots.$$
This is a first-order linear difference equation with solution
$$\gamma(u) = \phi^u\,\gamma(0), \qquad u = 0, 1, 2, \ldots$$
By rearranging equation 3.3 we find $\gamma(0) = \sigma_\varepsilon^2/(1 - \phi^2)$, and hence that
$$\gamma(u) = \frac{\phi^u \sigma_\varepsilon^2}{1 - \phi^2}, \qquad u = 0, 1, 2, \ldots$$
This in turn means that the autocorrelation function is given by
$$\rho(u) = \phi^u, \qquad u = 0, 1, 2, \ldots$$
The autocorrelation functions for the AR(1) series with $\phi_1 = .7$ and $\phi_1 = -.7$ are shown in figure 3.1. Both functions show exponential decay in magnitude; for $\phi_1 = -.7$ the values alternate in sign.

Figure 3.1: Autocorrelation functions for two AR(1) models.

    3.4.2 The AR(2) Series

The AR(2) model is defined by
$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t \qquad (3.4)$$
or, in operator form,
$$(1 - \phi_1 L - \phi_2 L^2) Y_t = \varepsilon_t.$$
As in the AR(1) case we can consider inverting the AR operator. To see whether this is possible, we can consider factorising the operator
$$1 - \phi_1 L - \phi_2 L^2$$


and inverting each factor separately. Suppose that
$$1 - \phi_1 L - \phi_2 L^2 = (1 - c_1 L)(1 - c_2 L);$$
then it is clear that we can invert the operator if we can invert each factor separately. This is possible if $|c_1| < 1$ and $|c_2| < 1$, or equivalently, if the roots of the polynomial
$$1 - \phi_1 z - \phi_2 z^2$$
lie outside the unit circle. A little algebraic manipulation shows that this is equivalent to the conditions
$$\phi_1 + \phi_2 < 1, \qquad -\phi_1 + \phi_2 < 1, \qquad \phi_2 > -1.$$
These constraints define a triangular region in the $\phi_1, \phi_2$ plane. The region is shown as the shaded triangle in figure 3.3.

The autocovariance function for the AR(2) series can be investigated by multiplying both sides of equation 3.4 by $Y_{t-u}$ and taking expectations:
$$E(Y_t Y_{t-u}) = \phi_1 E(Y_{t-1} Y_{t-u}) + \phi_2 E(Y_{t-2} Y_{t-u}) + E(\varepsilon_t Y_{t-u}).$$
This in turn leads to the recurrence
$$\gamma(u) = \phi_1\gamma(u - 1) + \phi_2\gamma(u - 2)$$
with initial conditions
$$\gamma(0) = \phi_1\gamma(-1) + \phi_2\gamma(-2) + \sigma_\varepsilon^2, \qquad \gamma(1) = \phi_1\gamma(0) + \phi_2\gamma(-1),$$


or, using the fact that $\gamma(-u) = \gamma(u)$,
$$\gamma(0) = \phi_1\gamma(1) + \phi_2\gamma(2) + \sigma_\varepsilon^2, \qquad \gamma(1) = \phi_1\gamma(0) + \phi_2\gamma(1).$$
The solution to these equations has the form
$$\gamma(u) = A_1 G_1^u + A_2 G_2^u$$
where $G_1^{-1}$ and $G_2^{-1}$ are the roots of the polynomial
$$1 - \phi_1 z - \phi_2 z^2 \qquad (3.5)$$
and $A_1$ and $A_2$ are constants which can be determined from the initial conditions. In the case that the roots are equal, the solution has the general form
$$\gamma(u) = (A_1 + A_2 u) G^u.$$
These equations indicate that the autocovariance function for the AR(2) series will exhibit (exponential) decay as $u \to \infty$.

If $G_k$ corresponds to a complex root, then
$$G_k = |G_k| e^{i\theta_k}$$
and hence
$$G_k^u = |G_k|^u e^{i\theta_k u} = |G_k|^u(\cos \theta_k u + i \sin \theta_k u).$$
Complex roots will thus introduce a pattern of decaying sinusoidal variation into the covariance function (or autocorrelation function). The region of the $\phi_1, \phi_2$ plane corresponding to complex roots is indicated by the cross-hatched region in figure 3.3.

    The AR(p) Series

The AR($p$) series is defined by
$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t. \qquad (3.6)$$
This is stationary if the roots of
$$1 - \phi_1 z - \cdots - \phi_p z^p \qquad (3.7)$$
lie outside the unit circle.

The autocovariance function can be investigated by multiplying equation 3.6 by $Y_{t-u}$ and taking expectations. This yields
$$\gamma(u) = \phi_1\gamma(u - 1) + \cdots + \phi_p\gamma(u - p).$$
This is a linear homogeneous difference equation and has the general solution
$$\gamma(u) = A_1 G_1^u + \cdots + A_p G_p^u$$
(this is for distinct roots), where $G_1, \ldots, G_p$ are the reciprocals of the roots of equation 3.7. Note that the stationarity condition means that $\gamma(u) \to 0$, exhibiting exponential decay. As in the AR(2) case, complex roots will introduce oscillatory behaviour into the autocovariance function.

Figure 3.2: Autocorrelation functions for a variety of AR(2) models.

Figure 3.3: The regions of $\phi_1/\phi_2$ space where the series produced by the AR(2) scheme is stationary (indicated in grey) and has complex roots (indicated by cross-hatching).

3.4.3 Computations

R contains a good deal of time series functionality. On older versions of R (those prior to 1.7.0) you will need to type the command

> library(ts)

to ensure that this functionality is loaded.

The function polyroot can be used to find the roots of polynomials and so determine whether a proposed model is stationary. Consider the model
$$Y_t = 2.5 Y_{t-1} - Y_{t-2} + \varepsilon_t$$
or its equivalent operator form
$$(1 - 2.5L + L^2) Y_t = \varepsilon_t.$$
We can compute the magnitudes of the roots of the polynomial $1 - 2.5z + z^2$ with polyroot.

    > Mod(polyroot(c(1,-2.5,1)))

    [1] 0.5 2.0

The roots have magnitudes .5 and 2. Because the first of these is less than 1 in magnitude, the model is non-stationary.

For the model
$$Y_t = 1.5 Y_{t-1} - .75 Y_{t-2} + \varepsilon_t$$


or its operator equivalent
$$(1 - 1.5L + .75L^2) Y_t = \varepsilon_t,$$
we can check stationarity by examining the magnitudes of the roots of $1 - 1.5z + .75z^2$.

    > Mod(polyroot(c(1,-1.5,.75)))

    [1] 1.154701 1.154701

Both roots are bigger than 1 in magnitude, so the series is stationary. We can obtain the roots themselves as follows.

    > polyroot(c(1,-1.5,.75))

    [1] 1+0.5773503i 1-0.5773503i

Because the roots are complex we can expect to see a cosine-like ripple in the autocovariance function and the autocorrelation function.

The autocorrelation function for a given model can be computed using the ARMAacf function. The acf for the model above can be computed and plotted as follows.

    > plot(0:14, ARMAacf(ar=c(1.5,-.75), lag=14), type="h",

    xlab = "Lag", ylab = "ACF")

    > abline(h = 0)

Figure 3.4: The acf for the model $Y_t = 1.5 Y_{t-1} - .75 Y_{t-2} + \varepsilon_t$.

The result is shown in figure 3.4.

Finally, it may be useful to simulate a time series of a given form. We can create a time series from the model $Y_t = 1.5 Y_{t-1} - .75 Y_{t-2} + \varepsilon_t$ and plot it with the following statements.


    > x = arima.sim(model = list(ar=c(1.5,-.75)), n = 100)

    > plot(x)

Figure 3.5: Simulation of the model $Y_t = 1.5 Y_{t-1} - .75 Y_{t-2} + \varepsilon_t$.

The result is shown in figure 3.5. Note that there is evidence that the series contains a quasi-periodic component with period about 12, as suggested by the autocorrelation function.

    3.5 Moving Average Series

A time series $\{Y_t\}$ which satisfies
$$Y_t = \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q} \qquad (3.8)$$
(with $\{\varepsilon_t\}$ white noise) is said to be a moving average process of order $q$, or MA($q$) process. No additional conditions are required to ensure stationarity.

The autocovariance function for the MA($q$) process is
$$\gamma(u) = \begin{cases}
(1 + \theta_1^2 + \cdots + \theta_q^2)\sigma^2 & u = 0 \\
(\theta_u + \theta_1\theta_{u+1} + \cdots + \theta_{q-u}\theta_q)\sigma^2 & u = 1, \ldots, q \\
0 & \text{otherwise,}
\end{cases}$$
which says there is only a finite span of dependence in the series.

Note that it is easy to distinguish MA and AR series by the behaviour of their autocorrelation functions. The acf for an MA series cuts off sharply while that for an AR series decays exponentially (with a possible sinusoidal ripple superimposed).


    3.5.1 The MA(1) Series

The MA(1) series is defined by
$$Y_t = \varepsilon_t + \theta\varepsilon_{t-1}. \qquad (3.9)$$
It has autocovariance function
$$\gamma(u) = \begin{cases}
(1 + \theta^2)\sigma^2 & u = 0 \\
\theta\sigma^2 & u = 1 \\
0 & \text{otherwise}
\end{cases}$$
and autocorrelation function
$$\rho(u) = \begin{cases}
\dfrac{\theta}{1 + \theta^2} & u = 1, \\
0 & \text{otherwise.}
\end{cases} \qquad (3.10)$$

    3.5.2 Invertibility

If we replace $\theta$ by $1/\theta$ and $\sigma^2$ by $\theta^2\sigma^2$, the autocorrelation function given by 3.10 is unchanged. There are thus two sets of parameter values which can explain the structure of the series.

For the general process defined by equation 3.8, there is a similar identifiability problem. The problem can be resolved by requiring that the operator
$$1 + \theta_1 L + \cdots + \theta_q L^q$$
be invertible, i.e. that all roots of the polynomial
$$1 + \theta_1 z + \cdots + \theta_q z^q$$
lie outside the unit circle.

    3.5.3 Computation

The function polyroot can be used to check invertibility for MA models. Remember that the invertibility requirement is there only so that each MA model is defined by a single set of parameters.

The function ARMAacf can be used to compute the acf for MA series. For example, the acf of the model
$$Y_t = \varepsilon_t + 0.9\varepsilon_{t-1}$$
can be computed and plotted as follows.

    > plot(0:14, ARMAacf(ma=.9, lag=14), type="h",

    xlab = "Lag", ylab = "ACF")

    > abline(h = 0)

The result is shown in figure 3.6.

A simulation of the series can be computed and plotted as follows.

    > x = arima.sim(model = list(ma=.9), n = 100)

    > plot(x)

    The result of the simulation is shown in figure 3.7.

Figure 3.6: The acf for the model $Y_t = \varepsilon_t + .9\varepsilon_{t-1}$.

Figure 3.7: Simulation of the model $Y_t = \varepsilon_t + .9\varepsilon_{t-1}$.


    3.6 Autoregressive Moving Average Series

Definition 3.6.1: If a series satisfies
$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q} \qquad (3.11)$$
(with $\{\varepsilon_t\}$ white noise), it is called an autoregressive-moving average series of order $(p, q)$, or an ARMA($p,q$) series.

An ARMA($p,q$) series is stationary if the roots of the polynomial
$$1 - \phi_1 z - \cdots - \phi_p z^p$$
lie outside the unit circle.

3.6.1 The ARMA(1,1) Series

The ARMA(1,1) series is defined by
$$Y_t = \phi Y_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1}. \qquad (3.12)$$
To derive the autocovariance function for $Y_t$, note that
$$E(\varepsilon_t Y_t) = E[\varepsilon_t(\phi Y_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1})] = \sigma^2$$
and
$$E(\varepsilon_{t-1} Y_t) = E[\varepsilon_{t-1}(\phi Y_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1})] = \phi\sigma^2 + \theta\sigma^2 = (\phi + \theta)\sigma^2.$$
Multiplying equation 3.12 by $Y_{t-u}$ and taking expectations yields
$$\gamma(u) = \begin{cases}
\phi\gamma(1) + (1 + \theta(\phi + \theta))\sigma^2 & u = 0 \\
\phi\gamma(0) + \theta\sigma^2 & u = 1 \\
\phi\gamma(u - 1) & u \geq 2.
\end{cases}$$
Solving the first two equations produces
$$\gamma(0) = \frac{(1 + 2\phi\theta + \theta^2)\sigma^2}{1 - \phi^2}$$
and using the last equation recursively shows
$$\gamma(u) = \frac{(1 + \phi\theta)(\phi + \theta)}{1 - \phi^2}\,\phi^{u-1}\sigma^2 \qquad \text{for } u \geq 1.$$
The autocorrelation function can then be computed as
$$\rho(u) = \frac{(1 + \phi\theta)(\phi + \theta)}{1 + 2\phi\theta + \theta^2}\,\phi^{u-1} \qquad \text{for } u \geq 1.$$
The pattern here is similar to that for the AR(1), except for the first term.


    3.6.2 The ARMA(p,q) Model

It is possible to make general statements about the behaviour of general ARMA($p,q$) series. When values are more than $q$ time units apart, the memory of the moving-average part of the series is lost. The functions $\gamma(u)$ and $\rho(u)$ will then behave very similarly to those for the AR($p$) series
$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \varepsilon_t$$
for large $u$, but the first few terms will exhibit additional structure.

    3.6.3 Computation

Stationarity can be checked by examining the roots of the characteristic polynomial of the AR operator, and model parameterisation can be checked by examining the roots of the characteristic polynomial of the MA operator. Both checks can be carried out with polyroot.

The autocorrelation function for an ARMA series can be computed with ARMAacf. For the model
$$Y_t = -.5 Y_{t-1} + \varepsilon_t + .3\varepsilon_{t-1}$$
this can be done as follows

    > plot(0:14, ARMAacf(ar=-.5, ma=.3, lag=14), type="h",

    xlab = "Lag", ylab = "ACF")

    > abline(h = 0)

    and produces the result shown in figure 3.8.

Simulation can be carried out using arima.sim

    > x = arima.sim(model = list(ar=-.5,ma=.3), n = 100)

    > plot(x)

    producing the result in figure 3.9.

    3.6.4 Common Factors

If the AR and MA operators in an ARMA($p,q$) model possess common factors, then the model is over-parameterised. By dividing through by the common factors we can obtain a simpler model giving an identical description of the series. It is important to recognise common factors in an ARMA model because they will produce numerical problems in model fitting.

    3.7 The Partial Autocorrelation Function

The autocorrelation function of an MA series exhibits different behaviour from that of AR and general ARMA series. The acf of an MA series cuts off sharply whereas those for AR and ARMA series exhibit exponential decay (with possible sinusoidal behaviour superimposed). This makes it possible to identify an ARMA series as being a purely MA one just by plotting its autocorrelation function. The partial autocorrelation function provides a similar way of identifying a series as a purely AR one.

Figure 3.8: The acf for the model $Y_t = -.5 Y_{t-1} + \varepsilon_t + .3\varepsilon_{t-1}$.

Figure 3.9: Simulation of the model $Y_t = -.5 Y_{t-1} + \varepsilon_t + .3\varepsilon_{t-1}$.


Given a stretch of time series values
$$\ldots, Y_{t-u}, Y_{t-u+1}, \ldots, Y_{t-1}, Y_t, \ldots$$
the partial correlation of $Y_t$ and $Y_{t-u}$ is the correlation between these random variables which is not conveyed through the intervening values.

If the $Y$ values are normally distributed, the partial autocorrelation between $Y_t$ and $Y_{t-u}$ can be defined as
$$\phi(u) = \mathrm{cor}(Y_t, Y_{t-u} \mid Y_{t-1}, \ldots, Y_{t-u+1}).$$
A more general approach is based on regression theory. Consider predicting $Y_t$ based on $Y_{t-1}, \ldots, Y_{t-u+1}$. The prediction is
$$\hat{Y}_t = \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_{u-1} Y_{t-u+1}$$
with the $\beta$s chosen to minimize
$$E(Y_t - \hat{Y}_t)^2.$$
It is also possible to think backwards in time and consider predicting $Y_{t-u}$ with the same set of predictors. The best predictor will be
$$\hat{Y}_{t-u} = \beta_1 Y_{t-u+1} + \beta_2 Y_{t-u+2} + \cdots + \beta_{u-1} Y_{t-1}.$$
(The coefficients are the same because the correlation structure is the same whether the series is run forwards or backwards in time.)

The partial correlation function at lag $u$ is the correlation between the prediction errors,
$$\phi(u) = \mathrm{cor}(Y_t - \hat{Y}_t,\ Y_{t-u} - \hat{Y}_{t-u}).$$
By convention we take $\phi(1) = \rho(1)$.

It is quite straightforward to compute the value of $\phi(2)$. Using the results of Appendix 3.8, the best predictor of $Y_t$ based on $Y_{t-1}$ is just $\rho(1)Y_{t-1}$. Thus
$$\mathrm{cov}(Y_t - \rho(1)Y_{t-1},\ Y_{t-2} - \rho(1)Y_{t-1}) = \sigma_Y^2(\rho(2) - \rho(1)^2 - \rho(1)^2 + \rho(1)^2) = \sigma_Y^2(\rho(2) - \rho(1)^2)$$
and
$$\mathrm{var}(Y_t - \rho(1)Y_{t-1}) = \sigma_Y^2(1 + \rho(1)^2 - 2\rho(1)^2) = \sigma_Y^2(1 - \rho(1)^2).$$
This means that
$$\phi(2) = \frac{\rho(2) - \rho(1)^2}{1 - \rho(1)^2}. \qquad (3.13)$$

Example 3.7.1 For the AR(1) series, recall that
$$\rho(u) = \phi^u \qquad (u \geq 0).$$
Substituting this into equation 3.13 we find
$$\phi(2) = \frac{\phi^2 - \phi^2}{1 - \phi^2} = 0.$$


Example 3.7.2 For the MA(1) series,
$$\rho(u) = \begin{cases}
\dfrac{\theta}{1 + \theta^2} & \text{if } u = 1, \\
0 & \text{otherwise.}
\end{cases}$$
Substituting this into 3.13 we find
$$\phi(2) = \frac{0 - (\theta/(1 + \theta^2))^2}{1 - (\theta/(1 + \theta^2))^2} = \frac{-\theta^2}{(1 + \theta^2)^2 - \theta^2} = \frac{-\theta^2}{1 + \theta^2 + \theta^4}.$$
More generally, it is possible to show that
$$\phi(u) = \frac{-(-\theta)^u(1 - \theta^2)}{1 - \theta^{2(u+1)}} \qquad \text{for } u \geq 1.$$

For the general AR($p$) series, it is possible to show that $\phi(u) = 0$ for all $u > p$. For such a series, the best predictor of $Y_t$ using $Y_{t-1}, \ldots, Y_{t-u+1}$ for $u > p$ is
$$\phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p},$$
because
$$Y_t - (\phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p}) = \varepsilon_t$$
and $\varepsilon_t$ is uncorrelated with $Y_{t-1}, Y_{t-2}, \ldots$, so that the fit cannot be improved. The prediction error corresponding to the best linear predictor of $Y_{t-u}$ is based on $Y_{t-1}, \ldots, Y_{t-u+1}$ and so must be uncorrelated with $\varepsilon_t$. This shows that $\phi(u) = 0$.

For the general MA($q$) series, it is possible to show that $\phi(u)$ decays exponentially as $u \to \infty$.

    3.7.1 Computing the PACF

The definition of the partial autocorrelation function given in the previous section is conceptually simple, but it makes computations hard. In this section we'll see that there is an equivalent form which is computationally simple.

Consider the $k$th order autoregressive prediction of $Y_{k+1}$,
$$\hat{Y}_{k+1} = \phi_{k1} Y_k + \cdots + \phi_{kk} Y_1, \qquad (3.14)$$
obtained by minimizing $E(Y_{k+1} - \hat{Y}_{k+1})^2$. We will show that the $k$th partial autocorrelation value is given by $\phi(k) = \phi_{kk}$. The proof of this is a geometric one which takes place in the space $H$ generated by the series $\{Y_t\}$.

We begin by defining the subspace $H_1 = \mathrm{sp}\{Y_2, \ldots, Y_k\}$ and associated projection $P_{H_1}$, the subspace $H_2 = \mathrm{sp}\{Y_1 - P_{H_1}Y_1\}$, and the subspace $H_k = \mathrm{sp}\{Y_1, \ldots, Y_k\}$. For any $Y \in H$,
$$P_{H_k} Y = P_{H_1} Y + P_{H_2} Y.$$


Thus
$$\hat{Y}_{k+1} = P_{H_k} Y_{k+1} = P_{H_1} Y_{k+1} + P_{H_2} Y_{k+1} = P_{H_1} Y_{k+1} + a(Y_1 - P_{H_1} Y_1),$$
where
$$a = \langle Y_{k+1},\ Y_1 - P_{H_1} Y_1 \rangle / \| Y_1 - P_{H_1} Y_1 \|^2. \qquad (3.15)$$
Rearranging, we find
$$\hat{Y}_{k+1} = P_{H_1}(Y_{k+1} - aY_1) + aY_1.$$
The first term on the right must be a linear combination of $Y_2, \ldots, Y_k$, so comparing with equation 3.14 we see that $a = \phi_{kk}$.

Now, the $k$th partial correlation is defined as the correlation between the residuals from the regressions of $Y_{k+1}$ and $Y_1$ on $Y_2, \ldots, Y_k$. Because the two residuals have the same norm (by stationarity), this is just
$$\mathrm{cor}(Y_{k+1} - P_{H_1} Y_{k+1},\ Y_1 - P_{H_1} Y_1)
= \langle Y_{k+1} - P_{H_1} Y_{k+1},\ Y_1 - P_{H_1} Y_1 \rangle / \| Y_1 - P_{H_1} Y_1 \|^2
= \langle Y_{k+1},\ Y_1 - P_{H_1} Y_1 \rangle / \| Y_1 - P_{H_1} Y_1 \|^2
= a$$
by equation 3.15.

A recursive way of computing the regression coefficients in equation 3.14 from the autocorrelation function was given by Levinson (1947) and Durbin (1960). The Durbin-Levinson algorithm updates the coefficients from those of the $(k-1)$st order model to those of the $k$th order model as follows:
$$\phi_{kk} = \frac{\rho(k) - \sum_{j=1}^{k-1} \phi_{k-1,j}\,\rho(k-j)}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j}\,\rho(j)},$$
$$\phi_{k,j} = \phi_{k-1,j} - \phi_{kk}\,\phi_{k-1,k-j}, \qquad j = 1, 2, \ldots, k-1.$$
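
The recursion is easy to program. The following function is an illustrative sketch (the name durbin.levinson is just a label used here); it takes the autocorrelations $\rho(1), \ldots, \rho(m)$ and returns the partial autocorrelations $\phi(1), \ldots, \phi(m)$.

durbin.levinson <- function(rho) {
    m <- length(rho)
    pacf <- numeric(m)
    phi <- numeric(0)    # coefficients of the current AR(k) prediction
    for (k in 1:m) {
        if (k == 1)
            phikk <- rho[1]
        else {
            j <- 1:(k - 1)
            phikk <- (rho[k] - sum(phi * rho[k - j])) /
                     (1 - sum(phi * rho[j]))
        }
        phi <- c(phi - phikk * rev(phi), phikk)
        pacf[k] <- phikk
    }
    pacf
}

The result can be checked against ARMAacf, for example:

> rho <- ARMAacf(ar = c(1.5, -.75), lag.max = 5)[-1]     # drop lag 0
> durbin.levinson(rho)
> ARMAacf(ar = c(1.5, -.75), lag.max = 5, pacf = TRUE)   # should agree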

    3.7.2 Computation

The R function ARMAacf can be used to obtain the partial autocorrelation function associated with a stationary ARMA series. The call to ARMAacf is identical to its use for obtaining the ordinary autocorrelation function, except it has the additional argument pacf=TRUE.

The following code computes and plots the partial autocorrelation function for the ARMA(1,1) model with $\phi = -.5$ and $\theta = .3$.

    > plot(1:14, ARMAacf(ar=-.5, ma=.3, lag=14, pacf=TRUE),

    type="h", xlab = "Lag", ylab = "ACF")

    > abline(h = 0)

    The resulting plot is shown in figure 3.10.

Figure 3.10: The partial acf for the model $Y_t = -.5 Y_{t-1} + \varepsilon_t + .3\varepsilon_{t-1}$.

    3.8 Appendix: Prediction Theory

Suppose we have a random variable $Y$ with mean $\mu_Y$ and variance $\sigma_Y^2$, and that we want to predict the value of $Y$. One way to proceed is to minimize the mean-square prediction error
$$g(c) = E(Y - c)^2.$$
Now,
$$g(c) = E(Y - \mu_Y + \mu_Y - c)^2 = E(Y - \mu_Y)^2 + 2(\mu_Y - c)E(Y - \mu_Y) + (\mu_Y - c)^2 = \sigma_Y^2 + (\mu_Y - c)^2.$$
This is minimized by taking $c = \mu_Y$.

Now suppose that we have a random variable $X$ which has mean $\mu_X$ and variance $\sigma_X^2$ and whose correlation with $Y$ is $\rho_{XY}$. Because $X$ is correlated with $Y$ it contains information which can be used to predict $Y$. In particular, we can consider predicting $Y$ using a linear predictor $a + bX$.

The mean-square prediction error for such a predictor is
$$g(a, b) = E(Y - a - bX)^2.$$
This can be expanded as
$$g(a, b) = E(Y^2) + a^2 + b^2 E(X^2) - 2aE(Y) + 2abE(X) - 2bE(XY)
= E(Y^2) + a^2 + b^2 E(X^2) - 2a\mu_Y + 2ab\mu_X - 2bE(XY)$$


and we can minimize it by differentiating and setting the partial derivatives equal to 0:
$$\frac{\partial g(a, b)}{\partial a} = 0 \implies 2a - 2\mu_Y + 2b\mu_X = 0,$$
$$\frac{\partial g(a, b)}{\partial b} = 0 \implies 2bE(X^2) + 2a\mu_X - 2E(XY) = 0.$$
Multiplying the first equation by $\mu_X$ and subtracting it from the second yields
$$bE(X^2) - b\mu_X^2 = E(XY) - \mu_X\mu_Y$$
and hence
$$b = \frac{E(XY) - \mu_X\mu_Y}{E(X^2) - \mu_X^2} = \frac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)} = \rho_{XY}\frac{\sigma_Y}{\sigma_X}.$$
Additionally,
$$a = \mu_Y - \rho_{XY}\frac{\sigma_Y}{\sigma_X}\mu_X.$$
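
These formulas are easy to verify numerically; the following small simulation is purely illustrative. The least-squares fit computed by lm should reproduce the slope $b = \mathrm{cov}(X, Y)/\mathrm{var}(X)$ and intercept $a = \mu_Y - b\,\mu_X$ up to sampling error.

> set.seed(2)
> x <- rnorm(1000)
> y <- 2 + 3 * x + rnorm(1000)    # an arbitrary linear relationship plus noise
> b <- cov(x, y) / var(x)
> a <- mean(y) - b * mean(x)
> c(a = a, b = b)
> coef(lm(y ~ x))                 # intercept and slope; should agree closely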


    Chapter 4

Identifying Time Series Models

    4.1 ACF Estimation

We have seen that it is possible to distinguish between AR, MA and ARMA models by the behaviour of their acf and pacf functions. In practice, we don't know these functions and so we must estimate them.

Given a stretch of data $Y_1, \ldots, Y_T$, the usual estimate of the autocovariance function is
$$\hat{\gamma}(u) = \frac{1}{T} \sum_{t=1}^{T-u} (Y_{t+u} - \bar{Y})(Y_t - \bar{Y}).$$
Note that this estimator is biased; an unbiased estimator would have a divisor of $T - u$ in place of $T$. There are two reasons for using this estimator.

The first of these reasons is that it produces a $\hat{\gamma}(u)$ which is positive definite. This means that for any constants $c_1, \ldots, c_k$,
$$\sum_{u=1}^{k} \sum_{v=1}^{k} c_u c_v\,\hat{\gamma}(u - v) \geq 0.$$
This ensures that our estimate of the variance of
$$\sum_{u=1}^{k} c_u X_{t-u}$$
will be non-negative, something which might not be the case for the unbiased estimate.

The second reason is that for many time series $\gamma(u) \to 0$ as $u \to \infty$. For such time series, the biased estimate can have lower mean-squared error.

The estimate of $\rho(u)$ based on $\hat{\gamma}(u)$ is
$$r(u) = \frac{\hat{\gamma}(u)}{\hat{\gamma}(0)} = \frac{\sum_{t=1}^{T-u} (Y_{t+u} - \bar{Y})(Y_t - \bar{Y})}{\sum_{t=1}^{T} (Y_t - \bar{Y})^2}.$$
(Again, this can have better mean-squared error properties than the estimate based on the unbiased estimate of $\gamma(u)$.)
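
In R, the acf function computes these estimates directly. A small illustration with a simulated series:

> y <- arima.sim(model = list(ar = .7), n = 200)
> acf(y)                                               # plot of r(u)
> acf(y, type = "covariance", plot = FALSE)$acf[1:3]   # biased estimates of gamma(0), gamma(1), gamma(2)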


In order to say whether an observed correlation is significantly different from zero, we need some distribution theory. Like most time series results, the theory here is asymptotic (as $T \to \infty$). The original results in this area were obtained by Bartlett in 1947. We will look at results due to T. W. Anderson in 1971.

Suppose that
$$Y_t = \mu + \sum_{u=0}^{\infty} \psi_u \varepsilon_{t-u}$$
with the $\varepsilon_t$ independent and identically distributed with zero mean and non-zero variance. Suppose also that
$$\sum_{u=0}^{\infty} |\psi_u| < \infty \qquad \text{and} \qquad \sum_{u=0}^{\infty} u\,\psi_u^2 < \infty.$$
(This is true for all stationary ARMA series.) The last condition can be replaced by the requirement that the $\{Y_t\}$ values have a finite fourth moment. Under these conditions, for any fixed $m$, the joint distribution of
$$\sqrt{T}\,(r(1) - \rho(1)),\ \sqrt{T}\,(r(2) - \rho(2)),\ \ldots,\ \sqrt{T}\,(r(m) - \rho(m))$$
is asymptotically normal with zero means and covariances $c_{uv}$, where
$$c_{uv} = \sum_{t=-\infty}^{\infty} \big[ \rho(t+u)\rho(t+v) + \rho(t-u)\rho(t+v) - 2\rho(u)\rho(t)\rho(t+v) - 2\rho(v)\rho(t)\rho(t+u) + 2\rho(u)\rho(v)\rho(t)^2 \big]. \qquad (4.1)$$
That is, for large $T$,
$$r(u) \approx N(\rho(u),\ c_{uu}/T), \qquad \mathrm{cor}\big(r(u), r(v)\big) \approx \frac{c_{uv}}{\sqrt{c_{uu} c_{vv}}}.$$
Notice that $\mathrm{var}\,r(u) \to 0$ but that the correlations stay approximately constant.

Equation 4.1 is clearly not easy to interpret in general. Let's examine some special cases.

Example 4.1.1 White Noise

The theory applies to the case that the $Y_t$ are i.i.d.:
$$\mathrm{var}\,r(u) \approx \frac{1}{T}, \qquad \mathrm{cor}\big(r(u), r(v)\big) \approx 0.$$

Example 4.1.2 The AR(1) Series

In this case $\rho(u) = \phi^u$ for $u > 0$. After a good deal of algebra (summing geometric series) one finds
$$\mathrm{var}\,r(u) \approx \frac{1}{T}\left[\frac{(1 + \phi^2)(1 - \phi^{2u})}{1 - \phi^2} - 2u\phi^{2u}\right].$$
In particular, for $u = 1$,
$$\mathrm{var}\,r(1) \approx \frac{1 - \phi^2}{T}.$$


Table 4.1: Large Sample Results for r_k for an AR(1) Model.

    φ      √var r(1)    √var r(2)    cor(r(1), r(2))    √var r(10)
    0.9    0.44/√T      0.807/√T     0.97               2.44/√T
    0.7    0.71/√T      1.12/√T      0.89               1.70/√T
    0.4    0.92/√T      1.11/√T      0.66               1.18/√T
    0.2    0.98/√T      1.04/√T      0.38               1.04/√T

Notice that the closer $\phi$ is to 1, the more accurate the estimate becomes. As $u \to \infty$, $\phi^{2u} \to 0$. In that case
$$\mathrm{var}\,r(u) \approx \frac{1}{T}\,\frac{1 + \phi^2}{1 - \phi^2}.$$
For values of $\phi$ close to 1 this produces large variances for the $r(u)$.

For $0 < u \leq v$ (after much algebra),
$$c_{uv} = \frac{(\phi^{v-u} - \phi^{v+u})(1 + \phi^2)}{1 - \phi^2} + (v - u)\phi^{v-u} - (v + u)\phi^{v+u}.$$
In particular,
$$\mathrm{cor}\big(r(1), r(2)\big) \approx 2\phi\left(\frac{1 - \phi^2}{1 + 2\phi^2 - 3\phi^4}\right)^{1/2}.$$
Using these formulae it is possible to produce the results in table 4.1.

    Example 4.1.3 The MA(1) Series

For the MA(1) series it is straightforward to show that
$$c_{11} = 1 - 3\rho(1)^2 + 4\rho(1)^4, \qquad c_{uu} = 1 + 2\rho(1)^2 \quad (u > 1), \qquad c_{12} = 2\rho(1)(1 - \rho(1)^2).$$
Using these results it is easy to produce the results in table 4.2.

Example 4.1.4 The General MA(q) Series

In the case of the general MA($q$) series it is easy to see that
$$c_{uu} = 1 + 2\sum_{v=1}^{q} \rho(v)^2, \qquad \text{for } u > q,$$
and hence that
$$\mathrm{var}\,r(u) = \frac{1}{T}\left(1 + 2\sum_{v=1}^{q} \rho(v)^2\right), \qquad \text{for } u > q.$$


Table 4.2: Large Sample Results for r_k for an MA(1) Model.

    θ      √var r(1)    √var r(k) (k > 1)    cor(r(1), r(2))
    0.9    0.71/√T      1.22/√T              0.86
    0.7    0.73/√T      1.20/√T              0.84
    0.5    0.79/√T      1.15/√T              0.74
    0.4    0.84/√T      1.11/√T              0.65

    Notes

1. In practice we don't know the parameters of the model generating the data we might have. We can still estimate the variances and covariances of the $r(u)$ by substituting estimates of $\rho(u)$ into the formulae above.

2. Note that there can be quite large correlations between the $r(u)$ values, so caution must be used when examining plots of $r(u)$.

    4.2 PACF Estimation

In section 3.7.1 we saw that the theoretical pacf can be computed by solving the Durbin-Levinson recursion
$$\phi_{kk} = \frac{\rho(k) - \sum_{j=1}^{k-1} \phi_{k-1,j}\,\rho(k-j)}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j}\,\rho(j)},$$
$$\phi_{k,j} = \phi_{k-1,j} - \phi_{kk}\,\phi_{k-1,k-j}, \qquad j = 1, 2, \ldots, k-1,$$
and setting $\phi(u) = \phi_{uu}$. In practice, the estimated autocorrelation function is used in place of the theoretical autocorrelation function to generate estimates of the partial autocorrelation function.

To decide whether partial autocorrelation values are significantly different from zero, we can use a 1949 result of Quenouille which states that if the true underlying model is AR($p$), then the estimated partial autocorrelations at lags greater than $p$ are approximately independently normal with means equal to zero and variance $1/T$. Thus $\pm 2/\sqrt{T}$ can be used as critical limits on $\hat{\phi}(u)$ for $u > p$ to test the hypothesis of an AR($p$) model.
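
In R, the sample pacf is produced by the pacf function (which draws its own approximate 95% limits). A small illustration with a simulated AR(2) series:

> y <- arima.sim(model = list(ar = c(1.5, -.75)), n = 200)
> pacf(y)            # estimated partial autocorrelations
> 2 / sqrt(200)      # the rule-of-thumb critical value described above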


Figure 4.1: Monthly differences between the yield on mortgages and government loans in the Netherlands, January 1961 to March 1974.

    4.3 System Identification

Given a set of observations $Y_1, \ldots, Y_T$ we will need to decide what the appropriate model might be. The estimated acf and pacf are the tools which can be used to do this. If the acf exhibits slow decay and the pacf cuts off sharply after lag $p$, we would identify the series as AR($p$). If the pacf shows slow decay and the acf shows a sharp cutoff after lag $q$, we would identify the series as being MA($q$). If both the acf and pacf show slow decay we would identify the series as being mixed ARMA. In this case the orders of the AR and MA parts are not clear, but it is reasonable to first try ARMA(1,1) and move on to higher order models if the fit of this model is not good.
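
In practice this identification step amounts to looking at the sample acf and pacf side by side. A minimal sketch with a simulated series:

> y <- arima.sim(model = list(ar = .8), n = 200)    # an arbitrary AR(1) example
> par(mfrow = c(1, 2))
> acf(y)
> pacf(y)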

    Example 4.3.1 Interest Yields

Figure 4.1 shows a plot of the monthly differences between the yield on mortgages and government loans in the Netherlands, January 1961 to March 1974. The series appears stationary, so we can attempt to use the acf and pacf to decide whether an AR, MA or ARMA model might be appropriate.

Figures 4.2 and 4.3 show the estimated acf and pacf functions for the yield series. The horizontal lines in the plots are drawn at the $y$ values $\pm 1.96/\sqrt{159}$ (the series has 159 values). These provide 95% confidence limits for what can be expected under the hypothesis of white noise. Note that these limits are point-wise, so that we would expect to see roughly 5% of the values lying outside the limits.

The acf plot shows evidence of slow decay, while the pacf plot shows a sharp cutoff after lag 1. On the basis of these two plots we might hypothesise that an AR(1) model is an appropriate description of the data.

Figure 4.2: The acf for the yield data.

Figure 4.3: The pacf for the yield data.

Figure 4.4: Yields from a chemical process (from Box and Jenkins).


    Example 4.3.2 Box and Jenkins Chemical Yields

Figure 4.4 shows an example from the classic time series text by Box and Jenkins. This series contains consecutive yields recorded from a chemical process. Again, the series is apparently stationary, so that we can consider identifying an appropriate model on the basis of the acf and pacf.

Figure 4.5: The acf for the chemical process data.

Figure 4.6: The pacf for the chemical process data.

Again, the acf seems to show slow decay, this time with alternating signs. The pacf shows sudden cutoff after lag 1, suggesting that again an AR(1) model might be appropriate.

    4.4 Model Generalisation

ARMA series provide a flexible class of models for stationary mean-zero series, with AR and MA series being special cases.

Unfortunately, many series are clearly not in this general class of models. It is worthwhile looking at some generalisations of this class.

    4.4.1 Non-Zero Means

If a series {Y_t} has a non-zero mean μ, the mean can be subtracted and the deviations from the mean modelled as an ARMA series.

Y_t - μ = φ_1(Y_{t-1} - μ) + ⋯ + φ_p(Y_{t-p} - μ) + ε_t + θ_1 ε_{t-1} + ⋯ + θ_q ε_{t-q}


Figure 4.5: The acf for the chemical process data.

Figure 4.6: The pacf for the chemical process data.


    Alternatively, the model can be adjusted by introducing a constant directly.

Y_t = φ_1 Y_{t-1} + ⋯ + φ_p Y_{t-p} + θ_0 + ε_t + θ_1 ε_{t-1} + ⋯ + θ_q ε_{t-q}

The two characterisations are connected by

μ = θ_0 + (φ_1 + ⋯ + φ_p)μ

so that

μ = θ_0 / (1 - φ_1 - ⋯ - φ_p)

or

θ_0 = μ(1 - φ_1 - ⋯ - φ_p).
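When a model with a constant is fitted with R's arima function, the value labelled intercept in the output is the mean μ rather than θ_0; the relationship above can be used to recover θ_0. A minimal sketch, assuming y holds a stationary series for which an AR(2) model is appropriate:

> z <- arima(y, order = c(2, 0, 0))                   # the reported "intercept" is the estimated mean mu
> mu <- coef(z)["intercept"]
> theta0 <- mu * (1 - sum(coef(z)[c("ar1", "ar2")]))  # theta_0 = mu (1 - phi_1 - ... - phi_p)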

    4.4.2 Deterministic Trends

Consider the model

Y_t = f(t) + Z_t

where Z_t is a stationary ARMA series and f(t) is a deterministic function of t. Considering Y_t - f(t) reduces Y_t to an ARMA series. (If f(t) contains unknown parameters we can estimate them.)
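For example, a linear trend f(t) = a + bt can be estimated by least squares and removed before an ARMA model is entertained. A minimal sketch, assuming y is a ts object with a roughly linear trend:

> tt <- time(y)          # the time index of the series
> trend <- lm(y ~ tt)    # least squares estimate of the linear trend
> z <- resid(trend)      # detrended values, to be modelled as a stationary ARMA series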

    4.4.3 Models With Nonstationary AR Components

We've seen that an AR model is stationary only when the roots of its characteristic equation lie outside the unit circle; if any root lies on or inside the unit circle the model is non-stationary.

    Example 4.4.1 Random Walks

    A random walk is defined by the equation

Y_t = Y_{t-1} + ε_t

where {ε_t} is a series of uncorrelated (perhaps independent) random variables. In operator form this equation is

(1 - L)Y_t = ε_t.

The equation 1 - z = 0 has a root at 1, so that Y_t is non-stationary.

    Example 4.4.2 Integrated Moving Averages

    Consider the model

Y_t = Y_{t-1} + ε_t + θ ε_{t-1}.

This is similar to an ARMA(1,1) model, but again it is non-stationary.

Both these models can be transformed to stationarity by differencing, that is, transforming to ∇Y_t = Y_t - Y_{t-1}. In the first example we have

∇Y_t = ε_t,

which is the white noise model. In the second we have

∇Y_t = ε_t + θ ε_{t-1},

which is the MA(1) model.
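Both reductions are easy to check by simulation. A rough sketch (the value θ = 0.7 is arbitrary):

> y1 <- cumsum(rnorm(200))                                      # a random walk
> acf(diff(y1))                                                 # its differences look like white noise
> y2 <- arima.sim(list(order = c(0, 1, 1), ma = 0.7), n = 200)  # an integrated moving average
> acf(diff(y2))                                                 # its differences show an MA(1) signature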


    4.4.4 The Effect of Differencing

Suppose that Y_t has a linear trend

Y_t = β_0 + β_1 t + Z_t

where Z_t is stationary with E(Z_t) = 0. Differencing gives

∇Y_t = β_1 + ∇Z_t

Differencing has removed the trend. Now suppose that {Y_t} has a deterministic quadratic trend

Y_t = β_0 + β_1 t + β_2 t² + Z_t

    then

∇Y_t = (β_0 + β_1 t + β_2 t²) - (β_0 + β_1(t-1) + β_2(t-1)²) + ∇Z_t
     = β_1 + β_2(t² - (t² - 2t + 1)) + ∇Z_t
     = (β_1 - β_2) + 2β_2 t + ∇Z_t
     = linear trend + stationary.

    Differencing again produces

∇²Y_t = 2β_2 + ∇²Z_t.

In general, a polynomial trend of order k can be eliminated by differencing k times.
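This is easy to verify numerically; the sketch below (with arbitrary coefficients) removes a quadratic trend with two rounds of differencing.

> tt <- 1:100
> y <- 2 + 0.5 * tt + 0.1 * tt^2 + rnorm(100)   # quadratic trend plus white noise
> plot(diff(y, differences = 2), type = "l")    # second differences: the trend has been removed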

Now let's consider the case of a stochastic trend. Suppose that

Y_t = M_t + ε_t

where M_t is a random process which changes slowly over time. In particular, we can assume that M_t is generated by a random walk model

M_t = M_{t-1} + η_t

with η_t independent of ε_t. Then

∇Y_t = η_t + ε_t - ε_{t-1}.

∇Y_t is stationary and has an autocorrelation function like that of an MA(1) series:

ρ(1) = -1 / (2 + (σ_η/σ_ε)²)

More generally, higher order stochastic trend models can be reduced to stationarity by repeated differencing.


    4.5 ARIMA Models

If W_t = ∇^d Y_t is an ARMA(p,q) series then Y_t is said to be an integrated autoregressive moving-average (p,d,q) series, denoted ARIMA(p,d,q). If we write

φ(L) = 1 - φ_1 L - ⋯ - φ_p L^p

and

θ(L) = 1 + θ_1 L + ⋯ + θ_q L^q

    then we can write down the operator formulation

φ(L)∇^d Y_t = θ(L)ε_t

Example 4.5.1 The IMA(1,1) Model

This model is widely used in business and economics. It is defined by

Y_t = Y_{t-1} + ε_t + θ ε_{t-1}

Y_t can be thought of as a random walk with correlated errors. Notice that

Y_t = Y_{t-1} + ε_t + θ ε_{t-1}
    = Y_{t-2} + ε_{t-1} + θ ε_{t-2} + ε_t + θ ε_{t-1}
    = Y_{t-2} + ε_t + (1 + θ)ε_{t-1} + θ ε_{t-2}
    ...
    = Y_{-m} + ε_t + (1 + θ)ε_{t-1} + ⋯ + (1 + θ)ε_{-m} + θ ε_{-m-1}

If we assume that Y_{-m} = 0 (i.e. observation started at time -m),

Y_t = ε_t + (1 + θ)ε_{t-1} + ⋯ + (1 + θ)ε_{-m} + θ ε_{-m-1}

    This representation can be used to derive the formulae

var(Y_t) = (1 + θ² + (1 + θ)²(t + m)) σ_ε²

cor(Y_t, Y_{t-k}) = (1 + θ + θ² + (1 + θ)²(t + m - k)) σ_ε² / (var(Y_t) var(Y_{t-k}))^{1/2}
                  ≈ ((t + m - k) / (t + m))^{1/2}
                  ≈ 1

for m large and k moderate. (We are considering behaviour after burn-in.) This means that we can expect to see very slow decay in the autocorrelation function.

The very slow decay of the acf is characteristic of ARIMA series with d > 0. Figures 4.7 and 4.8 show the estimated autocorrelation functions from a simulated random walk and an integrated autoregressive model. Both functions show very slow declines in the acf.
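Plots along the lines of figures 4.7 and 4.8 can be produced with a couple of lines of R (a sketch; the sample size is arbitrary, and φ_1 = 0.5 follows the caption of figure 4.8):

> acf(cumsum(rnorm(200)))                                      # ARIMA(0,1,0): a random walk
> acf(arima.sim(list(order = c(1, 1, 0), ar = 0.5), n = 200))  # ARIMA(1,1,0) with phi_1 = 0.5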


Figure 4.7: The autocorrelation function of the ARIMA(0,1,0) (random walk) model.

Figure 4.8: The autocorrelation function of the ARIMA(1,1,0) model with φ_1 = 0.5.


    Chapter 5

    Fitting and Forecasting

    5.1 Model Fitting

Suppose that we have identified a particular ARIMA(p,d,q) model which appears to describe a given time series. We now need to fit the identified model and assess how well the model fits. Fitting is usually carried out using maximum likelihood. For a given set of model parameters, we calculate a series of one-step-ahead predictions

Ŷ_{k+1} = P_{H_k} Y_{k+1}

where H_k is the linear space spanned by Y_1, . . . , Y_k. The predictions are obtained in a recursive fashion using a process known as Kalman filtering. Each prediction results in a prediction error Y_{k+1} - Ŷ_{k+1}. These are, by construction, uncorrelated. If we add the requirement that the {Y_t} series is normally distributed, the prediction errors are independent normal random variables and this can be used as the basis for computing the likelihood.

The parameters which need to be estimated are the AR coefficients φ_1, . . . , φ_p, the MA coefficients θ_1, . . . , θ_q and a constant term (either μ or θ_0 as outlined in section 4.4.1). Applying maximum likelihood produces both estimates and standard errors.

    5.1.1 Computations

Given a time series y in R, we can fit an ARIMA(p,d,q) model to the series as follows

    > z = arima(y, order = c(p, d, q))

    The estimation results can be inspected by printing them.

    Example 5.1.1 U.S. Unemployment Rates

Figure 5.1 shows a plot of the seasonally adjusted quarterly United States unemployment rates from the first quarter of 1948 to the first quarter of 1978. If the data is stored as a time series in the R data set unemp, the plot can be produced with the command

    > plot(unemp)



Figure 5.1: United States quarterly unemployment rates (seasonally adjusted).

Neither the original series nor its acf (figure 5.2) shows the very slow decay that would indicate that differencing is required, so we will leave the series undifferenced and attempt to find an appropriate ARMA model.

The acf and pacf functions for the series can be computed and plotted with the following commands

    > acf(unemp)

    > pacf(unemp)

The resulting plots (figures 5.2 and 5.3) show the strong signature of an AR(2) series (slow decay of the acf and sharp cutoff of the pacf after two lags).

With the series identified we can go about estimating the unknown parameters. We do this with the arima function in R.

> z <- arima(unemp, order = c(2, 0, 0))
> z

    Call:

    arima(x = unemp, order = c(2, 0, 0))

    Coefficients:

    ar1 ar2 intercept

    1.5499 -0.6472 5.0815

    s.e. 0.0681 0.0686 0.3269

    sigma^2 estimated as 0.1276: log likelihood = -48.76, aic = 105.53


Figure 5.2: The acf for the unemployment series.

Figure 5.3: The pacf for the unemployment series.


    The fitted model in this case is

Y_t - 5.0815 = 1.5499 (Y_{t-1} - 5.0815) - 0.6472 (Y_{t-2} - 5.0815) + ε_t

where ε_t has an estimated variance of 0.1276. (Note that the value labelled intercept in the arima output is the estimated mean μ, not the constant θ_0.)

    Example 5.1.2 Railroad Bond Yields

Figure 5.4 shows a plot of the monthly yields on AA rated railroad bonds (as a percentage times 100). With the data values stored in the R variable rrbonds, the plot was produced with the command

    > plot(rrbonds, ylab = "Bond Yields")

In this case it is clear that the series is non-stationary and we need to transform to stationarity by taking differences. The series can be differenced and the result plotted as follows

> rrdiffs <- diff(rrbonds)
> plot(rrdiffs, ylab = "Differenced Yields")

Figure 5.5 shows a plot of the differenced series and it is clear from this plot that the assumption of stationarity is much more reasonable. To confirm this, and to suggest possible models, we need to examine the acf and pacf functions. These are shown in figures 5.6 and 5.7.

    The model signature here is less certain than that of the previous example.

A possible interpretation is that the acf is showing rapid decay and the pacf is showing a sharp cutoff after one lag. This suggests that the original series can be modelled as an ARIMA(1,1,0) series.

    The estimation of parameters can be carried out for this model as follows

> z <- arima(rrbonds, order = c(1, 1, 0))
> z

    Call:

    arima(x = rrbonds, order = c(1, 1, 0))

    Coefficients:

         ar1
      0.4778

    s.e. 0.0865

    sigma^2 estimated as 84.46: log likelihood = -367.47, aic = 738.94

    The fitted model in this case is

∇Y_t = 0.4778 ∇Y_{t-1} + ε_t

with ε_t having an estimated variance of 84.46.


Figure 5.4: Monthly AA railroad bond yields (% × 100).

Figure 5.5: Differenced monthly AA railroad bond yields.


Figure 5.6: The ACF for the differenced monthly AA railroad bond yields.

Figure 5.7: The PACF for the differenced monthly AA railroad bond yields.


    5.2 Assessing Quality of Fit

Once a model has been fitted to a set of data it is always important to assess how well the model fits. This is because the inferences we make depend crucially on the appropriateness of the fitted model. The usual way of assessing goodness of fit is through the examination of residuals.

In the case of autoregressive models it is easy to see how residuals might be defined. In the case of the AR(p) model

Y_t = φ_1 Y_{t-1} + ⋯ + φ_p Y_{t-p} + ε_t

it is clear that we can take the residuals to be

ε̂_t = Y_t - φ̂_1 Y_{t-1} - ⋯ - φ̂_p Y_{t-p}

    In the general ARMA case

Y_t = φ_1 Y_{t-1} + ⋯ + φ_p Y_{t-p} + ε_t + θ_1 ε_{t-1} + ⋯ + θ_q ε_{t-q}

or

φ(L)Y_t = θ(L)ε_t,

we must first transform to autoregressive form by inverting the MA operator:

θ(L)⁻¹φ(L)Y_t = ε_t

or

Y_t = Σ_{u=1}^{∞} π_u Y_{t-u} + ε_t.

The residuals can then be defined as

ε̂_t = Y_t - Σ_{u=1}^{∞} π̂_u Y_{t-u}.

This is a useful theoretical approach, but in practice the residuals are obtained as a byproduct of the computation of the likelihood (they are the prediction errors from the one-step-ahead forecasts). If the ARMA model is correct (and the series is normally distributed) then the residuals are approximately independent normal random variables with mean zero and variance σ_ε².

A simple diagnostic is to simply plot the residuals and to see whether they appear to be a white noise series. In the case of the US unemployment series we can do this as follows

> z <- arima(unemp, order = c(2, 0, 0))
> plot(resid(z))

The results are shown in figure 5.8. It is also possible to carry out tests of normality on the residuals. This can be done by simply producing a histogram of the residuals or by performing a normal quantile-quantile plot. This can be done as follows:

    > hist(resid(z))

    > qqnorm(resid(z))


Figure 5.8: Residuals from the unemployment data.

A simple check of heteroscedasticity can be carried out by plotting the residuals against the fitted values. This is done with the following command:

    > plot(unemp - resid(z), resid(z), xy.lines = FALSE,

    + xy.labels = FALSE)

    None of these plots indicate that there is any problem with the fit of the model.

    5.3 Residual Correlations

It is tempting to examine the quality of model fit by seeing whether the residuals form an uncorrelated sequence. One might for example plot the estimated acf of the residuals and look for lags where the correlations exceed ±2/√T. Unfortunately, while these limits are approximately correct for large lags, for small lags they overstate the variability of the estimated correlations. This should be no surprise: the effect of model fitting is to remove as much of the correlation present in the series as possible. The correlations between the residuals should be closer to zero than for a non-fitted series.

In addition to checking whether there are large individual correlations present in a time series, it can be useful to pool information from successive correlations to see whether there is significant correlation left in the residuals.

    One test statistic in common use is the modified Box-Pierce (or Ljung-Box-Pierce) statistic

Q = T(T + 2) Σ_{k=1}^{K} r_k² / (T - k).


If the true underlying model is ARMA(p,q), the distribution of Q is approximately χ² with K - p - q degrees of freedom. In R, the modified Box-Pierce statistic (and an older variant) can be computed with the function Box.test (you should specify type = "Ljung"). An alternative is to use the function tsdiag which plots the (standardised) residuals, the acf of the residuals and the modified Box-Pierce statistic for a variety of values of K.
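For the AR(2) fit to the unemployment series used earlier, both checks might look as follows (the choice K = 12 is arbitrary; the fitdf argument supplies the p + q adjustment to the degrees of freedom):

> Box.test(resid(z), lag = 12, type = "Ljung", fitdf = 2)  # modified Box-Pierce with K = 12, p + q = 2
> tsdiag(z)                                                # residual plots and Ljung-Box p-values for several K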

    5.4 Forecasting

Once we have decided on an appropriate time-series model, estimated its unknown parameters and established that the model fits well, we can turn to the problem of forecasting future values of the series.

    The autoregressive representation

Y_t = Σ_{u=1}^{∞} π_u Y_{t-u} + ε_t

suggests predicting the next observation beyond Y_1, . . . , Y_T using

Ŷ_{T+1} = Σ_{u=1}^{∞} π̂_u Y_{T+1-u},

where the π̂_u are obtained by substituting the estimated parameters in place of the theoretical ones.

Once a forecast is obtained for Y_{T+1} we can use this forecast to obtain a forecast for Y_{T+2} and so on. Because uncertainty increases as we predict further and further from the data we have, we can expect the standard errors associated with our predictions to increase.

In practice, forecasts are generated by the same Kalman filter algorithm used to compute the likelihood for parameter estimation.

    5.4.1 Computation

Once a model has been fitted using arima, the function predict can be used to obtain forecasts. In the case of the US unemployment series we could obtain 10 future forecasts as follows:

> z <- arima(unemp, order = c(2, 0, 0))
> p <- predict(z, n.ahead = 10)
> p$pred

    Qtr1 Qtr2 Qtr3 Qtr4

    1978 5.812919 5.491268 5.243253

    1979 5.067017 4.954379 4.893856 4.872947

    1980 4.879708 4.903719 4.936557

    and their standard errors with


Figure 5.9: Forecasts for the unemployment data.

    > p$se

    Qtr1 Qtr2 Qtr3 Qtr4

    1978 0.3572328 0.6589087 0.9095040

    1979 1.0969935 1.2248696 1.3040856 1.3479499

    1980 1.3689669 1.3771201 1.3792859

It can be useful to plot the forecasts on the same graph as the original series. This is a relatively complex task, and would probably be worth packaging up as an R function. Here is how 20 forecasts for the unemp series and their standard errors can be plotted

> p <- predict(z, n.ahead = 20)
> xlim <- range(time(unemp), time(p$pred))
> ylim <- range(unemp, p$pred - 2 * p$se, p$pred + 2 * p$se)
> plot(unemp, xlim = xlim, ylim = ylim)

    > lines(p$pred, lwd = 2)

    > lines(p$pred - 2 * p$se, lty = "dotted")

    > lines(p$pred + 2 * p$se, lty = "dotted")

    The result of this appears in figure 5.9.


Figure 5.10: Monthly average temperatures (°F) in Dubuque, Iowa.

    5.5 Seasonal Models

    5.5.1 Seasonal ARIMA Models

Many time series exhibit strong seasonal characteristics. We'll use s to denote the seasonal period. For monthly series, s = 12, and for quarterly series s = 4. These are the most common cases, but seasonal patterns can show up in other places (e.g. weekly patterns in daily observations or daily patterns in hourly data).

    Consider the series generated by the model

Y_t = ε_t + Θ ε_{t-s}

This series has an acf which is zero except at lag s, and will exhibit seasonal characteristics.

In general we can define a seasonal MA(Q) model by

Y_t = ε_t + Θ_1 ε_{t-s} + Θ_2 ε_{t-2s} + ⋯ + Θ_Q ε_{t-Qs}
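arima.sim has no special seasonal argument, but a seasonal MA(1) with s = 12 can be simulated by placing the single non-zero coefficient at lag 12 (a sketch; the value 0.8 is arbitrary):

> y <- arima.sim(list(ma = c(rep(0, 11), 0.8)), n = 240)  # MA coefficients zero except at lag 12
> acf(y, lag.max = 36)                                    # apart from lag 0, the acf spikes at lag 12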

    5.5.2 Stationary Series

Figure 5.10 shows a plot of monthly average temperatures in Dubuque, Iowa, from January 1964 to December 1975. The series shows a very strong seasonal pattern. This pattern is also apparent in the acf of the series, which is shown in figure 5.11.

    One way to handle this


Figure 5.11: The acf of the Dubuque temperature series.

    5.5.3 Seasonal Series with Trends

Observed time series data often exhibit strong seasonal effects. Figure 5.12 shows a plot of a data set presented by Box and Jenkins in their classic time series text. The plot shows the number of international airline passengers, recorded monthly. There is clearly a strong seasonal trend present in the data; the number of passengers peaks during the Northern summer.

Before considering the seasonal effect present in the series, we will first examine another feature of the data set. Although the number of passengers is clearly increasing with time, the number travelling in July and August is always roughly 50% greater than the number travelling in January and February. This kind of proportional variability suggests that it would be more appropriate to examine the series on a log scale.

Figure 5.13 shows the data plotted on a log scale. On that scale the series shows a consistent level of seasonal variation across time. It seems appropriate to analyse this time series on the log scale. We now need to consider what the appropriate methods might be for analysing series which show strong seasonal effects.
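R ships with this data set as AirPassengers, so a plot like figure 5.13 can be reproduced directly:

> plot(log10(AirPassengers), ylab = "log10 Thousands of Passengers")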


Figure 5.12: International airline passengers, monthly totals (in thousands).

Figure 5.13: Log10 international airline passengers.


    Chapter 6

    Frequency Domain Analysis

    6.1 Some Background

    6.1.1 Complex Exponentials, Sines and Cosines

The following formula defines the relationship between the complex exponential function and the real sine and cosine functions.

e^{iθ} = cos θ + i sin θ

From this it is possible to derive many trigonometric identities. For example, we know that

e^{i(θ+φ)} = cos(θ + φ) + i sin(θ + φ)

and also that

e^{i(θ+φ)} = e^{iθ} e^{iφ}
          = (cos θ + i sin θ)(cos φ + i sin φ)
          = cos θ cos φ - sin θ sin φ + i(cos θ sin φ + sin θ cos φ).

Equating real and imaginary parts,

cos(θ + φ) = cos θ cos φ - sin θ sin φ
sin(θ + φ) = cos θ sin φ + sin θ cos φ.

It is also possible to invert the basic formula to obtain the following representations for the sine and cosine functions.

cos θ = (e^{iθ} + e^{-iθ}) / 2

sin θ = (e^{iθ} - e^{-iθ}) / 2i.

For many people it is a natural instinct to try to rewrite e^{iθ} in the cos θ + i sin θ form. This is often a mistake because the exponential function is usually easier to handle.



    Example 6.1.1 (From Homework)

Σ_{t=-T}^{T} e^{iλt} = sin(λ(T + 1/2)) / sin(λ/2)

While it is possible to show this by converting immediately to cosines and sines, it is much simpler to recognize that this is just a geometric series and use the formula for summing a geometric series:

Σ_{t=-T}^{T} e^{iλt} = e^{-iλT} Σ_{t=0}^{2T} e^{iλt}
                    = e^{-iλT} (e^{iλ(2T+1)} - 1) / (e^{iλ} - 1)
                    = (e^{iλ(T+1)} - e^{-iλT}) / (e^{iλ} - 1)
                    = (e^{iλ(T+1)} - e^{-iλT}) / (e^{iλ} - 1) × (e^{-iλ/2} / e^{-iλ/2})
                    = (e^{iλ(T+1/2)} - e^{-iλ(T+1/2)}) / (e^{iλ/2} - e^{-iλ/2})
                    = sin(λ(T + 1/2)) / sin(λ/2)

    6.1.2 Properties of Cosinusoids

The general cosinusoid function is defined to be

f(t) = a cos(λt + φ)

where a is the amplitude, λ is the angular frequency and φ is the phase of the cosinusoid.

We will be concerned with the case that t takes integer values. In that case it makes sense to restrict λ to the range [0, 2π]. This is because

a cos((λ + 2kπ)t + φ) = a cos(λt + φ + 2kπt)
                      = a cos(λt + φ)

because cos is periodic with period 2π and 2kπt is a multiple of 2π. This lack of identifiability is known as the aliasing problem. It is illustrated in figure 6.1.

Note that

a sin(λt + φ) = a cos(λt + (φ - π/2))

so that sines are also cosinusoids. It is also common to refer to the function

a e^{i(λt+φ)}

as a complex cosinusoid.


Figure 6.1: The aliasing problem for cosines.

    6.1.3 Frequency and Angular Frequency

A cosinusoid with angular frequency π,

a cos(πt + φ),

repeats itself every two time units. Such a function is usually said to have a frequency of 0.5 because it goes through 0.5 cycles in a unit time period.

In general,

Frequency = Angular Frequency / 2π.

Frequency is more meaningful but leads to lots of 2πs in formulae. Because of this, it is usual to carry out theoretical studies of time series with angular frequency, but to perform data analysis in terms of frequency.

    Theory              Practise
    a cos(λt + φ)       a cos(2πλt + φ)

The curse of time series analysis is that 2π ≠ 1.

    6.1.4 Invariance and Complex Exponentials

We will be considering experimental situations which are in some sense time invariant. This means that we can expect to obtain the same types of results today as we might tomorrow. One definition of invariance for a function f is that it satisfy

f(t + u) = f(t),    t, u = 0, ±1, ±2, . . .

The only functions which satisfy this condition are constant.

    A less restrictive condition would require

f(t + u) = C_u f(t),    t, u = 0, ±1, ±2, . . . , with C_1 ≠ 0    (6.1)

Setting u = 1 we obtain

f(t) = C_1 f(t - 1) = C_1^2 f(t - 2) = ⋯ = C_1^t f(0),    t ≥ 0


    and

f(t) = C_1^{-1} f(t + 1) = C_1^{-2} f(t + 2) = ⋯ = C_1^{t} f(0),    t ≤ 0.

This means that for any t ∈ Z we can write

f(t) = C_1^t f(0).

If we write C_1 = e^{ξ} (for ξ real or complex) and A = f(0), then the general solution of equation 6.1 is

f(t) = A e^{ξt}.

The bounded solutions correspond to ξ = iλ for λ real. In other words, the general bounded solution to 6.1 is

f(t) = A e^{iλt}.

    This type of invariance also extends by linearity to functions of the form

f(t) = Σ_j c_j e^{iλ_j t}

because

f(t + u) = Σ_j c_j e^{iλ_j (t+u)}
         = Σ_j c_j e^{iλ_j u} e^{iλ_j t},

which is again a sum of complex exponentials at the same frequencies λ_j.

The study of functions which can be written as a sum of complex exponentials is called harmonic analysis or Fourier analysis.

    6.2 Filters and Filtering

    6.2.1 Filters

A filter is an operation which takes one time series as input and produces another as output. We indicate that Y(t) is a filtered version of X(t) as follows:

Y(t) = A[X](t).

A filter is called linear if it satisfies

A[αX + βY](t) = αA[X](t) + βA[Y](t)

for all constants α and β, and time-invariant if it satisfies

A[L^u X](t) = L^u A[X](t)

for all integer values of u.

An important class of linear, time-invariant filters can be written in the form

A[X](t) = Σ_{u=-∞}^{∞} a(u) X(t - u).

    Here the a(u) are called the filter coefficients.


    Example 6.2.1 Moving Average Filters

    Moving average filters of the form

A[X](t) = (1 / (2M + 1)) Σ_{u=-M}^{M} X(t - u)

    are used for smoothing time series.

    Example 6.2.2 The Differencing Filter

    The differencing filter defined by

A[X](t) = X(t) - X(t - 1)

    is used to eliminate long-term trends from time series.
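Both kinds of filter are available directly in R: filter applies moving averages of the kind in example 6.2.1 and diff implements the differencing filter. A quick sketch (the window M = 2 and the built-in LakeHuron series are arbitrary choices):

> x <- LakeHuron
> sm <- filter(x, rep(1/5, 5), sides = 2)   # moving average filter with M = 2
> d <- diff(x)                              # differencing filter X(t) - X(t-1)
> plot(x); lines(sm, col = "red")           # smoothed series overlaid on the original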

    6.2.2 Transfer Functions

While the filter coefficients provide a complete description of what a filter does, they may not provide much intuition about what kind of effect a filter will produce. An alternative way of investigating what a filter does is to examine its effect on complex exponentials (or equivalently on sine and cosine functions). For notational convenience we will define the function E_λ(t) by

E_λ(t) = e^{iλt}.

Clearly,

L^u E_λ(t) = e^{iλ(t+u)}
           = e^{iλu} E_λ(t).

Time invariance and linearity then allow us to show

A[E_λ](t + u) = L^u A[E_λ](t) = A[L^u E_λ](t) = A[e^{iλu} E_λ](t) = e^{iλu} A[E_λ](t).

Setting t = 0 produces

A[E_λ](u) = e^{iλu} A[E_λ](0).

The function A(λ) = A[E_λ](0) is known as the transfer function of the filter. The argument above has shown that

A[E_λ](u) = A(λ) E_λ(u).

In other words, linear time invariant filtering of a complex exponential function produces a constant multiple of a complex exponential function with the same frequency.
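From the definitions above, a filter with coefficients a(u) has transfer function A(λ) = Σ_u a(u) e^{-iλu}, so it can be evaluated numerically. A rough sketch for the 5-point moving average (M = 2):

> a <- rep(1/5, 5)                                             # moving average coefficients
> u <- -2:2                                                    # the lags they apply to
> lambda <- seq(0, pi, length = 200)
> A <- sapply(lambda, function(l) sum(a * exp(-1i * l * u)))   # transfer function A(lambda)
> plot(lambda, Mod(A), type = "l", ylab = "Gain")              # the gain function G(lambda)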


    Note that for any integer value of k,

E_{λ+2kπ}(t) = E_λ(t)

for all t ∈ Z. This means that transfer functions are periodic with period 2π. In general, transfer functions are complex-valued. It can often be useful to rewrite them in their polar form

A(λ) = G(λ) e^{iφ(λ)}.

G(λ) is the gain function of the filter and φ(λ) is the phase function. If the filter's coefficients are real-valued then it is easy to show that the transfer function satisfies

A(-λ) = A(λ)*,

where * denotes complex conjugation. This in turn means that the gain must satisfy

G(-λ) = G(λ)

and the phase must satisfy

φ(-λ) = -φ(λ).

    6.2.3 Filtering Sines and Cosines

The transfer function describes the effect of a filter on complex exponential functions. Using linearity and the representations of sin and cos in terms of complex exponentials, it is easy to show that filtering

R cos(λt + φ)

produces the result

G(λ) R cos(λt + φ + φ(λ)).

The gain and phase functions (or equivalently the transfer function) of a filter describe the action of the filter on cosinusoids in exactly the same way as they do