Modeling with ARMA Processes: I
• goal: determine which ARMA(p, q) process is best model for observed time series x_1, ..., x_n
• tasks at hand include
  − determining p and q (order selection)
  − estimating process mean (easily done!), coefficients φ_j & θ_j (not so easy) and white noise variance σ² (relatively easy)
  − subjecting selected model to goodness-of-fit tests
• note: will assume that, if need be, series x_1, ..., x_n has been adjusted so that it can be regarded as realization of zero-mean stationary process (usual procedure: take sample mean x̄′ = (1/n) Σ_{t=1}^n x′_t of original series x′_1, ..., x′_n and set x_t = x′_t − x̄′)
BD–137, CC–149, SS–121 XIII–1
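The mean-adjustment step above can be sketched in a couple of lines; the series values here are made-up numbers purely for illustration:

```python
import numpy as np

x_orig = np.array([5.2, 4.8, 6.1, 5.5, 5.9])  # hypothetical raw series x'_1, ..., x'_n
x = x_orig - x_orig.mean()                    # x_t = x'_t - xbar', a zero-mean series
```

After this adjustment all the estimators that follow can assume a zero-mean process.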
Modeling with ARMA Processes: II
• with p & q assumed initially to be known, will advocate Gaussian maximum likelihood (ML) estimators for φ_j, θ_j & σ²
• requires use of nonlinear optimization procedure, for which need good initial estimates of coefficients φ_j & θ_j
• can base initial estimates on easier-to-compute estimators
  − Yule–Walker (Y–W) estimator (good for AR(p) case)
  − Burg estimator (for AR(p) also)
  − innovations algorithm (handles MA(q) and ARMA(p, q))
  − Hannan–Rissanen (adapts Y–W to handle ARMA(p, q))
• will now describe these estimators, along with some preliminary discussion about order selection
BD–138, CC–149, SS–121 XIII–2
Yule–Walker Estimation: I
• assume causal AR(p) model φ(B)X_t = Z_t, i.e.,
      X_t − Σ_{j=1}^p φ_j X_{t−j} = Z_t,   (∗)
  with {Z_t} ∼ WN(0, σ²)
• can develop set of p linear equations linking
      φ = [φ_1, ..., φ_p]′
  to ACVF values γ(0), γ(1), ..., γ(p − 1)
• Y–W estimators gotten by substituting usual estimator γ̂(h) for γ(h) in the p equations (so-called ‘moment matching’)
• one additional equation needed to estimate σ² via this scheme
• to (mis)quote Yogi Berra: ‘This is like deja vu all over again!’
BD–139, CC–149, SS–121 XIII–5
Yule–Walker Estimation: IV
• exactly same matrix equation arose when trying to find coefficients of best linear predictor of X_{n+1} given X_n, ..., X_1 for a general stationary process {X_t} (i.e., not necessarily AR(p))
• given time series x_1, ..., x_n, form usual estimate of ACVF:
      γ̂(h) = (1/n) Σ_{t=1}^{n−|h|} x_t x_{t+|h|}
• with Γ̂_p & γ̂_p formed by replacing γ(h)’s in Γ_p & γ_p by γ̂(h)’s, Y–W estimator φ̂ of AR(p) coefficients φ given by
      φ̂ = Γ̂_p^{−1} γ̂_p,
  where inverse Γ̂_p^{−1} exists as long as time series isn’t ‘boring’
BD–139, 140, CC–149, SS–121 XIII–6
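A minimal sketch of the Yule–Walker computation — sample ACVF, the p × p Toeplitz system, and the extra equation for σ². Function names and the simulated AR(2) parameters are my own, chosen only to exercise the estimator:

```python
import numpy as np

def sample_acvf(x, max_lag):
    """Biased sample ACVF: gamma_hat(h) = (1/n) sum_t x_t x_{t+h}, x demeaned."""
    x = np.asarray(x, float) - np.mean(x)
    n = len(x)
    return np.array([np.dot(x[:n - h], x[h:]) / n for h in range(max_lag + 1)])

def yule_walker(x, p):
    """Solve Gamma_hat_p phi = gamma_hat_p; then sigma2 = gamma_hat(0) - phi' gamma_hat_p."""
    g = sample_acvf(x, p)
    Gamma = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(Gamma, g[1:p + 1])
    sigma2 = g[0] - phi @ g[1:p + 1]
    return phi, sigma2

# demo on a simulated causal AR(2): X_t = 1.33 X_{t-1} - 0.44 X_{t-2} + Z_t
rng = np.random.default_rng(0)
n = 5000
x = np.zeros(n + 100)
z = rng.standard_normal(n + 100)
for t in range(2, n + 100):
    x[t] = 1.33 * x[t - 1] - 0.44 * x[t - 2] + z[t]
phi, sigma2 = yule_walker(x[100:], 2)  # discard burn-in
```

With n = 5000 the estimates land close to the true (1.33, −0.44) and σ² = 1, illustrating the consistency claimed for Y–W.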
Yule–Walker Estimation: V
• can solve equation using Levinson–Durbin recursions
• note: provides Y–W estimates not only for order p, but also for all lower orders 1, ..., p − 1
• once φ̂ has been computed, can return to h = 0 case of (∗∗), namely,
      σ² = γ(0) − Σ_{j=1}^p φ_j γ(j),   to get estimator   σ̂² = γ̂(0) − Σ_{j=1}^p φ̂_j γ̂(j)
  (similar equation arose for getting MSE of best linear predictor)
• fitted model has theoretical ACVF that is identical to estimates γ̂(h) at lags h = 0, 1, ..., p, but in general is different at higher lags
• as an example, let’s revisit the sunspot time series:
  − fit AR models of orders p = 1, ..., 8 & then 29 using Y–W
  − compare sample ACVF to the theoretical ACVFs corresponding to the 9 fitted AR models
BD–141, CC–149, SS–121 XIII–8
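The Levinson–Durbin recursions mentioned above can be sketched as follows (the function name is my own; the demo uses the theoretical ACVF of an AR(1) with φ = 0.5 and σ² = 1, for which γ(h) = 0.5^h/0.75):

```python
import numpy as np

def levinson_durbin(gamma, p):
    """From ACVF gamma(0..p), get AR(k) coefficient vectors and MSEs for k = 1..p.
    Implements: pacf_k = (gamma(k) - sum_j phi_{k-1,j} gamma(k-j)) / v_{k-1};
    phi_{k,j} = phi_{k-1,j} - pacf_k * phi_{k-1,k-j}; v_k = v_{k-1}(1 - pacf_k^2)."""
    phi = np.array([])
    v = gamma[0]
    phis, vs = [], [v]
    for k in range(1, p + 1):
        num = gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        pacf_k = num / v
        phi = np.concatenate([phi - pacf_k * phi[::-1], [pacf_k]])
        v = v * (1 - pacf_k ** 2)
        phis.append(phi.copy())
        vs.append(v)
    return phis, vs

gamma = np.array([0.5 ** h / 0.75 for h in range(3)])  # AR(1), phi = 0.5, sigma^2 = 1
phis, vs = levinson_durbin(gamma, 2)
# phis[0] -> [0.5]; phis[1] -> [0.5, 0.0]; vs[2] -> 1.0
```

Note that, exactly as the slide says, one pass of the recursions yields fitted models for every order up to p.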
Sunspots (1749–1963)
[Figure: annual sunspot numbers x_t versus year]
BD–99 I–7
Sample PACF for Sunspots
[Figure: sample PACF versus lag h, lags 0–40]
BD–99 XII–13
Sample and Fitted AR(p) ACVFs for Sunspots
[Figures: sample ACVF versus theoretical ACVFs of the fitted AR(p) models, for p = 1, 2, ..., 8 and 29, plotted against lag h for lags 0–40]
XIII–9 to XIII–17
Yule–Walker Estimation: VII
• distribution of Y–W estimator φ̂ is approximately multivariate normal with mean φ & covariance σ²Γ_p^{−1}/n for large n
• large sample distribution of ML estimators is the same
• don’t even need to worry about inverting Γ_p: can show that
      σ²Γ_p^{−1} = A′A − B′B = AA′ − BB′,
  where A and B are p × p lower triangular matrices whose first columns are, respectively,
      [1, −φ_1, ..., −φ_{p−1}]′ and [φ_p, φ_{p−1}, ..., φ_1]′;
  A has the same element along any given diagonal, and B has a similar structure (sometimes referred to as a Toeplitz structure)
BD–141, CC–161, SS–122 XIII–18
Confidence Intervals and Regions for φ
• can use large sample distribution to get approximate confidence intervals for individual φ_j’s or confidence region for vector φ
• approximate 95% confidence interval for φ_j given by
      [φ̂_j − 1.96 v̂_{j,j}^{1/2}/√n, φ̂_j + 1.96 v̂_{j,j}^{1/2}/√n],
  where v̂_{j,j} is jth diagonal element of σ̂²Γ̂_p^{−1}
• letting χ²_{0.95}(p) denote 95% quantile of chi-squared distribution with p degrees of freedom, approximate 95% confidence region for φ is the set of all φ’s such that
      (φ − φ̂)′Γ̂_p(φ − φ̂) ≤ χ²_{0.95}(p) σ̂²/n
BD–142, 143 XIII–19
Yule–Walker Estimation and Order Selection: I
• when Y–W is used to estimate coefficients for AR(h) model
      X_t − φ_1X_{t−1} − · · · − φ_hX_{t−h} = Z_t,
  estimate φ̂_h is same as φ̂_{h,h} (hth member of sample PACF)
• as noted before, large sample theory suggests that φ̂_{h,h} is approximately N(0, 1/n) for h > p (the true AR model order)
• given estimates of φ_{h,h} out to some maximum order, say H, Brockwell & Davis suggest setting p to be smallest m such that |φ̂_{h,h}| < 1.96/√n for m < h ≤ H
• obvious danger: sampling variability might result in p being set too high
  − with H = 40 in sunspot example, would select p = 29, which might not be a reasonable choice
BD–96, 141, CC–115, SS–122 XIII–20
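The B&D rule above amounts to scanning the sample PACF from lag H downward for the last value that exceeds the 1.96/√n threshold. A small sketch (function name and toy PACF values are my own):

```python
import math

def select_ar_order(pacf, n):
    """Smallest m such that |phi_hat_{h,h}| < 1.96/sqrt(n) for all m < h <= H.
    pacf[h-1] holds the lag-h sample PACF value, h = 1, ..., H."""
    thresh = 1.96 / math.sqrt(n)
    m = 0
    for h in range(len(pacf), 0, -1):  # scan from lag H down to lag 1
        if abs(pacf[h - 1]) >= thresh:
            m = h
            break
    return m

# toy PACF: large at lags 1-2, noise-level beyond, n = 400 -> selects p = 2
order = select_ar_order([0.8, 0.5, 0.01, -0.03, 0.02], 400)
```

The sunspot danger noted on the slide shows up here too: a single noisy PACF value beyond the true order would inflate the selected p.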
Yule–Walker Estimation and Order Selection: II
• another approach is to select order that minimizes AICC statistic (bias-corrected version of Akaike’s information criterion):
      AICC = −2 ln(L(φ̂, S(φ̂)/n)) + 2(p + 1)n/(n − p − 2),
  where L is Gaussian likelihood function, and S(φ̂) is defined below
• given zero-mean Gaussian AR(p) time series X_n with covariance matrix Γ_n (implicitly dependent on φ & σ²), can write
      L(Γ_n) = (2π)^{−n/2}(det Γ_n)^{−1/2} exp(−½ X′_n Γ_n^{−1} X_n)
  and hence
      −2 ln(L(Γ_n)) = n ln(2π) + ln(det Γ_n) + X′_n Γ_n^{−1} X_n
BD–141, 142, 158, CC–130, SS–53, 153 XIII–21
Yule–Walker Estimation and Order Selection: III
• when considering ML estimation later on, will argue that
      −2 ln(L(Γ_n)) = n ln(2π) + ln(det Γ_n) + X′_n Γ_n^{−1} X_n
  can be rewritten in AR(p) case as
      −2 ln L(φ, σ²) = n ln(2πσ²) + Σ_{j=0}^{p−1} ln(r_j) + Σ_{j=1}^n (X_j − X̂_j)²/(σ²r_{j−1}),
  where r_j ≡ v_j/σ² (note: r_j = 1 for j ≥ p)
• dependence on φ is through the v_j’s and coefficients determining X̂_j (can get these from φ using reverse L–D recursions)
• can remove σ² by replacing it with S(φ)/n, where
      S(φ) = Σ_{j=1}^n (X_j − X̂_j)²/r_{j−1}
BD–141, 142, 158, CC–130, SS–53, 153 XIII–22
Yule–Walker Estimation and Order Selection: IV
• with removal of σ², AICC statistic becomes
      AICC = C_n + n ln(Σ_{j=1}^n (X_j − X̂_j)²/r_{j−1}) + Σ_{j=0}^{p−1} ln(r_j) + 2(p + 1)n/(n − p − 2),
  where C_n ≡ n + n ln(2π/n)
• note: will discuss other order selection statistics (BIC etc.) later
• let’s see what order the AICC picks out for sunspot series
BD–141, 142, 158, CC–130, SS–53, 153 XIII–23
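The profiled-out AICC formula above is straightforward to code once the one-step prediction errors and r_j’s are in hand. A sketch (the helper name and its calling convention are my own):

```python
import math

def aicc_ar(pred_err_sq, r, p, n):
    """AICC with sigma^2 profiled out, per the slide:
    AICC = C_n + n ln(S) + sum_{j=0}^{p-1} ln(r_j) + 2(p+1)n/(n-p-2),
    where S = sum_j (X_j - Xhat_j)^2 / r_{j-1} and C_n = n + n ln(2*pi/n).
    pred_err_sq[j-1] = (X_j - Xhat_j)^2, r[j] = v_j / sigma^2."""
    C_n = n + n * math.log(2 * math.pi / n)
    S = sum(e / r[j] for j, e in enumerate(pred_err_sq))
    return (C_n + n * math.log(S)
            + sum(math.log(r[j]) for j in range(p))
            + 2 * (p + 1) * n / (n - p - 2))

# sanity check: for p = 0 with unit prediction errors and r_j = 1, this reduces
# to n ln(2*pi) + n + 2n/(n-2), i.e. the white-noise value with sigma_hat^2 = 1
val = aicc_ar([1.0] * 10, [1.0] * 10, 0, 10)
```

The sanity check works because S = n makes n ln(2π/n) + n ln(S) collapse to n ln(2π).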
AICC for Sunspots
[Figure: AICC versus model order, orders 0–40]
XIII–24
Sample and Fitted AR(9) ACVFs for Sunspots
[Figure: sample ACVF versus theoretical ACVF of fitted AR(9) model, lags 0–40]
XIII–25
Example – Recruitment Time Series: I
• monthly measure of number of new fish entering Pacific Ocean (453 months covering 1950–87; Shumway & Stoffer got it from Roy Mendelssohn, NOAA/PFEL, who got it from Pierre Kleiber, NOAA/NMFS, who generated measures using a model . . . )
SS–7 XIII–26
Recruitment Time Series (1950–1987)
[Figure: recruitment x_t versus t (months starting with Jan 1950)]
SS–8 XIII–27

Sample ACF for Recruitment Series
[Figure: sample ACF versus lag h, lags 0–40]
SS–109 XIII–28

Sample PACF for Recruitment Series
[Figure: sample PACF versus lag h, lags 0–40]
SS–109 XIII–29

Sample & Fitted AR(2) ACFs for Recruitment Series
[Figure: sample ACF versus theoretical ACF of fitted AR(2) model, lags 0–40]
XIII–30
Example – Recruitment Time Series: II
• for AR(2) model, Y–W estimates are
      φ̂ = [φ̂_1, φ̂_2]′ ≐ [1.3316, −0.4445]′ and σ̂² ≐ 94.171
  (note: R function ar gives σ̂² ≐ 94.799 . . . hmmm)
• using large sample approximation that φ̂ is multivariate normal with mean φ and covariance σ²Γ_2^{−1}/n, can get 95% confidence intervals (CIs) and regions based upon
      σ̂²Γ̂_2^{−1} = σ̂² [γ̂(0) γ̂(1); γ̂(1) γ̂(0)]^{−1} = [v̂_{1,1} v̂_{1,2}; v̂_{2,1} v̂_{2,2}] ≐ [0.8024 −0.7396; −0.7396 0.8024]
• using [φ̂_j − 1.96 v̂_{j,j}^{1/2}/√n, φ̂_j + 1.96 v̂_{j,j}^{1/2}/√n] yields 95% CIs
      [1.2491, 1.4141] for φ_1 and [−0.5270, −0.3621] for φ_2
XIII–31
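These interval endpoints follow directly from the large-sample CI formula; a quick numerical check using the numbers on the slide (n = 453 for the recruitment series):

```python
import math

n = 453
phi_hat = [1.3316, -0.4445]  # Y-W AR(2) estimates from the slide
v_diag = [0.8024, 0.8024]    # diagonal of sigma_hat^2 * Gamma_hat_2^{-1}

half_widths = [1.96 * math.sqrt(v) / math.sqrt(n) for v in v_diag]
cis = [(p - w, p + w) for p, w in zip(phi_hat, half_widths)]
# cis reproduce [1.2491, 1.4141] and [-0.5270, -0.3621] to rounding accuracy
```

Any small discrepancy in the last digit comes from the slide’s values being rounded before this arithmetic.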
AICC for Recruitment Series
[Figure: AICC versus model order, orders 0–40]
XIII–32

Sample & Fitted Y–W AR(13) ACFs
[Figure: sample ACF versus theoretical ACF of Y–W-fitted AR(13) model, lags 0–40]
XIII–33
Burg’s Algorithm: I
• Y–W estimator � of � is based on L–D recursions with �(h)replaced by �(h)
• given �k�1 & vk�1, recursion gives us �k & vk via 3 steps
1. get kth order partial autocorrelation:
�k,k =�(k)�
Pk�1j=1 �k�1,j�(k � j)
vk�1
2. get remaining �k,j’s:264
�k,1...
�k,k�1
375 =
264
�k�1,1...
�k�1,k�1
375� �k,k
264�k�1,k�1
...�k�1,1
375
3. get kth order MSE: vk = vk�1(1� �2k,k)
BD–147, 148 XIII–34
Burg’s Algorithm: II
• when k = p, get Y–W estimators � = �p and �2 = vp
• start procedure by setting
�1 = [�1,1] =
�(1)
�(0)
�and v1 = �(0)(1� �2
1,1)
• sample ACVF comes into play in forming �1 and in 1st step ofL–D recursions, but not in 2nd and 3rd steps
• sample ACVF just used to get PACF estimates �1,1, . . . , �p,p
• kth component of PACF is a correlation coe�cient:
�k,k = corr {Xk � bXk,X0 � bX0|k�1}• Burg’s algorithm is based on estimating �k,k in keeping with
the above rather than via sample ACVF
BD–147, 148 XIII–35
Burg’s Algorithm: III
• let �k�1 = [�k�1,1, . . . , �k�1,1]0 be Burg estimator of coe�-
cients for AR(k � 1) process based on X1, . . . , Xn
• can show that, for any estimator �k,k with �k,1, . . . , �k,k�1generated by step 2 of L–D, have, for k + 1 t n
�!Ut(k) =
�!Ut(k � 1)� �k,k
�Ut�k(k � 1)
�Ut�k(k) =
�Ut�k(k � 1)� �k,k
�!Ut(k � 1)
BD–147, 148 XIII–36
Burg’s Algorithm: IV
• Burg’s idea: choose �k,k that minimizes
SSk(�k,k) ⌘nX
t=k+1
�!U 2
t (k) + �U 2
t�k(k)
• yields Burg’s estimator
�k,k ⌘Pn
t=k+1�!Ut(k � 1)
�Ut�k(k � 1)
12Pn
t=k+1�!U 2
t (k � 1) + �U 2
t�k(k � 1)
• compare above to following expression:
�k,k = corr {Xk � bXk,X0 � bX0|k�1}
=cov {Xk � bXk,X0 � bX0|k�1}�
var {Xk � bXk} var {X0 � bX0|k�1}�1/2
BD–147, 148 XIII–37
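Burg’s estimator can be sketched compactly: each pass computes the reflection coefficient from the forward/backward residuals, applies step 2 of L–D, then updates the residuals. The function name and simulated AR(2) parameters are my own, and the σ² track (sample variance times Π(1 − φ̂²_{k,k})) is the usual step-3 approximation, not B&D’s exact bookkeeping:

```python
import numpy as np

def burg(x, p):
    """Burg AR(p) estimates via forward/backward prediction residuals."""
    x = np.asarray(x, float)
    f, b = x[1:].copy(), x[:-1].copy()   # order-0 residuals: U->_t(0), U<-_{t-1}(0)
    phi = np.array([])
    v = np.dot(x, x) / len(x)            # start MSE track at gamma_hat(0)
    for k in range(1, p + 1):
        rc = 2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))  # phi_hat_{k,k}
        phi = np.concatenate([phi - rc * phi[::-1], [rc]])       # step 2 of L-D
        f, b = (f - rc * b)[1:], (b - rc * f)[:-1]               # residual update
        v *= 1 - rc ** 2                                         # step 3 of L-D
    return phi, v

# demo: recover the coefficients of a simulated causal AR(2)
rng = np.random.default_rng(42)
n = 5000
x = np.zeros(n + 100)
z = rng.standard_normal(n + 100)
for t in range(2, n + 100):
    x[t] = 1.35 * x[t - 1] - 0.46 * x[t - 2] + z[t]
phi, v = burg(x[100:], 2)  # discard burn-in
```

Note |rc| ≤ 1 automatically, by Cauchy–Schwarz, which is the causality guarantee claimed on the next slide.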
Burg’s Algorithm: V
• initialize with�!Ut(0) ⌘ Xt and
�Ut�1(0) ⌘ Xt�1
• guaranteed to have |�k,k| 1
• if |�p,p| 6= 1, Burg estimators � = �p of coe�cients � alwayscorrespond to stationary & causal AR(p) process (same is truefor Y–W, except that |�p| = 1 can’t happen)
• large sample distribution for Burg same as for Y–W and ML,but Monte Carlo studies show Burg outperforming Y–W
• fitted model has theoretical ACVF that need not be identicalto sample ACVF, as another visit to sunspot time series shows
BD–147, 148 XIII–38
Sample and Fitted AR(p) ACVFs for Sunspots
[Figures: sample ACVF versus theoretical ACVFs of the fitted AR(p) models, for p = 5, 10, 15 and 20, plotted against lag h for lags 0–40]
XIII–39 to XIII–42
Example – Recruitment Time Series: III
• reconsider recruitment time series, this time using Burg’s algorithm
• can use Burg to get estimate of PACF that is an alternative to sample PACF (latter is based on Y–W)
XIII–43
Sample PACF for Recruitment Series
[Figure: sample PACF versus lag h, lags 0–40; repeat of overhead XIII–29]
SS–109 XIII–29

Burg Estimate of PACF for Recruitment Series
[Figure: Burg PACF estimate versus lag h, lags 0–40]
XIII–44

Sample & Fitted AR(2) ACFs for Recruitment Series
[Figure: sample ACF versus theoretical ACF of fitted AR(2) model, lags 0–40]
XIII–45
Example – Recruitment Time Series: IV
• for AR(2) model, Burg estimates are
      φ̂ = [φ̂_1, φ̂_2]′ ≐ [1.3515, −0.4620]′ and σ̂² ≐ 89.337,
  as compared to Y–W estimates:
      φ̂ = [φ̂_1, φ̂_2]′ ≐ [1.3316, −0.4445]′ and σ̂² ≐ 94.171
• can determine Burg estimator γ̂(h) of ACVF by feeding φ̂ into one of the methods for computing theoretical ARMA ACVFs
• yields
      σ̂²Γ̂_2^{−1} = σ̂² [γ̂(0) γ̂(1); γ̂(1) γ̂(0)]^{−1} = [v̂_{1,1} v̂_{1,2}; v̂_{2,1} v̂_{2,2}] ≐ [0.7866 −0.7271; −0.7271 0.7866]
  & 95% CIs [1.2698, 1.4332] for φ_1 & [−0.5436, −0.3803] for φ_2
XIII–46
95% Confidence Regions for φ (Y–W and Burg)
[Figure: 95% confidence regions in the (φ_1, φ_2) plane]
XIII–47

95% Confidence Regions and Causality Region for φ
[Figure: 95% confidence regions shown within the causality region in the (φ_1, φ_2) plane]
XIII–48
AICC for Recruitment Series
[Figure: AICC versus model order, orders 0–40; repeat of overhead XIII–32]
XIII–49

Sample & Fitted Burg AR(13) ACFs
[Figure: sample ACF versus theoretical ACF of Burg-fitted AR(13) model, lags 0–40]
XIII–50

Sample & Fitted Y–W AR(13) ACFs
[Figure: sample ACF versus theoretical ACF of Y–W-fitted AR(13) model, lags 0–40; repeat of overhead XIII–33]
XIII–33
Moment Matching and MA(q) Processes: I
• Y–W & Burg give preliminary estimates of φ & σ² for AR(p)
• Y–W estimator based on moment matching
• relates φ_1, ..., φ_p and σ² to ACVF values γ(0), ..., γ(p) via p + 1 linear equations
• knowing ψ_1, ..., ψ_{p+q}, can solve for φ_i’s & θ_j’s, as follows
• equation (∗) for j = 1, ..., q gives
      ψ_1 = θ_1 + φ_1, ..., ψ_q = θ_q + Σ_{i=1}^{min{p,q}} φ_i ψ_{q−i}   (•)
• (∗) for j = q + 1, ..., q + p does not involve θ_j’s directly:
      ψ_{q+1} = Σ_{i=1}^{min{p,q+1}} φ_i ψ_{q+1−i}, ..., ψ_{q+p} = Σ_{i=1}^p φ_i ψ_{q+p−i}   (†)
• use (†) to solve for φ_i’s, after which (•) gives
      θ_j = ψ_j − Σ_{i=1}^{min{p,j}} φ_i ψ_{j−i}, j = 1, ..., q
BD–154, 155 XIII–77
Innovations Algorithm for Mixed ARMA Models: III
• IA takes ACVF and gives θ_{m,j}’s, where θ_{m,j} → ψ_j as m → ∞
• to get estimates of φ_i’s and θ_j’s,
  1. use IA with sample ACVF to get estimates θ̂_{m,j} (with m chosen large enough to ensure convergence)
  2. set ψ̂_j equal to estimate θ̂_{m,j}
  3. use ψ̂_j’s with p + q equations to get φ̂_i’s and θ̂_j’s
• B&D note that
  − resulting φ̂_i need not correspond to a causal process
  − order selection using sample ACVF and PACF dicey with mixed models because no clear patterns to distinguish between, e.g., ARMA(2,1) and ARMA(1,2)
  − order selection can still be done using AICC
BD–154, 155 XIII–78
Innovations Algorithm for Mixed ARMA Models: IV
• can base estimate of σ² on normalized one-step-ahead MSEs:
      σ̂² = (1/n) Σ_{t=1}^n (X_t − X̂_t)²/r_{t−1},   where r_{t−1} = E{(X_t − X̂_t)²}/σ²,
  and X̂_t is predictor of X_t based upon X_{t−1}, ..., X_1
• can get σ̂² by using γ̂(0) (sample variance for time series) along with estimates φ̂_i & θ̂_j to calculate ACVF for theoretical ARMA(p, q) process and feeding this ACVF into L–D recursions – desired estimate σ̂² is nth order MSE v̂_n
• alternatively, once ARMA(p, q) ACVF has been determined, apply IA to it with m set large enough so that v̂_m is stable, and then use σ̂² = v̂_m
BD–154, 155 XIII–79
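The innovations algorithm itself can be sketched directly from its defining recursions (the function name is my own; the demo uses the exact ACVF of an MA(1) with θ = 0.5 and σ² = 1, for which θ_{m,1} → 0.5 and v_m → 1, illustrating the “m large enough so that v̂_m is stable” advice above):

```python
import numpy as np

def innovations(gamma, m):
    """Innovations algorithm for a stationary ACVF gamma(0..m):
    theta[n, n-k] = (gamma(n-k) - sum_{j<k} theta[k,k-j] theta[n,n-j] v_j) / v_k,
    v_n = gamma(0) - sum_{j<n} theta[n,n-j]^2 v_j."""
    theta = np.zeros((m + 1, m + 1))
    v = np.zeros(m + 1)
    v[0] = gamma[0]
    for n in range(1, m + 1):
        for k in range(n):
            s = sum(theta[k, k - j] * theta[n, n - j] * v[j] for j in range(k))
            theta[n, n - k] = (gamma[n - k] - s) / v[k]
        v[n] = gamma[0] - sum(theta[n, n - j] ** 2 * v[j] for j in range(n))
    return theta, v

# demo: MA(1) with theta = 0.5, sigma^2 = 1 has gamma(0) = 1.25, gamma(1) = 0.5
gamma = np.zeros(51)
gamma[0], gamma[1] = 1.25, 0.5
theta, v = innovations(gamma, 50)
```

In practice one feeds in the sample ACVF (or a fitted model’s theoretical ACVF, as the slide describes) rather than an exact one.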
Example – Atomic Clock Series: I
• as an example, let’s use IA to fit an ARMA(1,1) model
      X_t − φX_{t−1} = Z_t + θZ_{t−1}, {Z_t} ∼ WN(0, σ²),
  to atomic clock series
• the p + q = 2 relevant equations are
      ψ_1 = θ + φ & ψ_2 = φψ_1, yielding φ = ψ_2/ψ_1 & θ = ψ_1 − φ
• basing our estimates of ψ_1 and ψ_2 on
      θ̂_{15,1} ≐ −0.5879 and θ̂_{15,2} ≐ −0.1316
  (see overhead XIII–71) yields
      φ̂ ≐ 0.2238 and θ̂ ≐ −0.8117
  (corresponds to a causal and invertible ARMA(1,1) model)
• get σ̂² ≐ 20.860 (compared to σ̂² ≐ 20.782 for MA(2) model)
XIII–80
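The moment-matching step for this ARMA(1,1) fit is just two lines of arithmetic, and the slide’s numbers can be reproduced directly:

```python
# reproduce the ARMA(1,1) moment-matching step with the slide's IA output
psi1, psi2 = -0.5879, -0.1316   # theta_hat_{15,1} and theta_hat_{15,2}
phi = psi2 / psi1               # phi_hat = psi_2 / psi_1   (~ 0.2238)
theta = psi1 - phi              # theta_hat = psi_1 - phi_hat (~ -0.8117)
causal = abs(phi) < 1           # AR(1) causality check
invertible = abs(theta) < 1     # MA(1) invertibility check
```

Both checks pass here, matching the slide’s remark that the fitted model is causal and invertible.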
Estimation of σ² via v̂_100 from Innovations Algorithm
[Figure: v̂_m versus m, m = 0–100]
XIII–81

Sample & ARMA(1,1) ACF for Atomic Clock
[Figure: sample ACF versus theoretical ACF of fitted ARMA(1,1) model, lags 0–40]
XIII–82

Sample & ARMA(1,1) PACF for Atomic Clock
[Figure: sample PACF versus theoretical PACF of fitted ARMA(1,1) model, lags up to 20]
XIII–83

AICC for Atomic Clock Series
[Figure: AICC versus number of parameters in model, 0–40]
XIII–84
Higher-Order Yule–Walker Method: I
• alternative to IA algorithm for handling mixed ARMA models is based on structure of ACVF {γ(h)} for such models
• as noted before (overhead IX–20), ARMA(p, q) ACVF satisfies
      γ(k) − φ_1γ(k − 1) − · · · − φ_pγ(k − p) = 0
  for all k ≥ q + 1
• does not involve MA coefficients
• can use so-called higher-order Y–W equations to get φ_i’s
• with φ_i’s known, can filter time series X_1, ..., X_n and get output Y_{p+1}, ..., Y_n with MA(q) structure:
      Y_t ≡ X_t − φ_1X_{t−1} − · · · − φ_pX_{t−p}
          = Z_t + θ_1Z_{t−1} + · · · + θ_qZ_{t−q}
• higher-order Y–W method with IA thus consists of
  − substituting γ̂(h)’s into higher-order Y–W equations and solving to get estimates φ̂_i
  − using φ̂_i’s to filter time series to get output, say Y′_t
  − forming sample ACVF for Y′_t’s and using these as input to IA to estimate MA coefficients θ_j
BD–145 XIII–86
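The three-step scheme above can be sketched for the ARMA(1,1) case, where the single higher-order Y–W equation reduces to φ̂ = γ̂(2)/γ̂(1) (function names are my own; the toy sequence in the demo is an exact AR(1), so the filter output is identically zero):

```python
import numpy as np

def hoyw_ar1_arma11(gamma_hat):
    """Higher-order Y-W step for ARMA(1,1): phi_hat = gamma_hat(2)/gamma_hat(1)."""
    return gamma_hat[2] / gamma_hat[1]

def ar_filter(x, phi):
    """Form Y_t = X_t - sum_i phi_i X_{t-i} for t = p+1, ..., n."""
    p = len(phi)
    x = np.asarray(x, float)
    return np.array([x[t] - sum(phi[i] * x[t - 1 - i] for i in range(p))
                     for t in range(p, len(x))])

# sanity check: filtering an exact AR(1) sequence with the true phi gives zeros
y = ar_filter([1.0, 0.5, 0.25, 0.125], [0.5])
```

In the real method one would then form the sample ACVF of the filtered output and hand it to the innovations algorithm to get the MA coefficients.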
Example – Atomic Clock Series: II
• as an example, let’s use scheme to fit an ARMA(1,1) model to atomic clock series
• relevant higher-order Y–W equation is φγ(1) = γ(2), yielding
      φ̂ = γ̂(2)/γ̂(1) ≐ 0.1823, as compared to φ̂ ≐ 0.2238 using IA
• estimate corresponds to a causal process here, but that need not happen for other time series (no reason why γ̂(1) ≈ 0 can’t occur)
• forming sample ACVF for Y′_t = X_t − φ̂X_{t−1}, t = 2, ..., n, and feeding it into IA yields θ̂_{n,j}’s and v̂_n’s shown on next overheads
XIII–87
Convergence of θ̂_{n,j}’s for Y′_t’s
[Figure: θ̂_{n,j} versus n for j = 1, 2, 3, 4, n = 0–40]
XIII–88

Convergence of v̂_n’s for Y′_t’s
[Figure: v̂_n versus n, n = 0–40]
XIII–89
Example – Atomic Clock Series: III
• using θ̂_{15,1} to estimate θ in ARMA(1,1) model, get θ̂ ≐ −0.7748 compared to θ̂ ≐ −0.8117 using IA by itself (overhead XIII–80)
• using v̂_15 to estimate σ² yields σ̂² ≐ 20.631 as compared to IA-based σ̂² ≐ 20.860
• sampling theory for θ̂_{n,j}’s suggests that those for j = 2, 3 and 4 are not significantly different from zero; i.e., ARMA(1,q) model with q > 1 not indicated
• AICC for fitted ARMA(1,1) model is 6022.7, so model is less likely than IA-based model with AICC of 6016.2
• next overheads
  − show AICC compared to ones for IA-based MA(q) models
  − compare theoretical and sample ACVFs and PACFs
XIII–90
AICC for Higher-Order Y–W Method
[Figure: AICC versus number of parameters in model, 0–40]
XIII–91

Sample & ARMA(1,1) ACF for Atomic Clock
[Figure: sample ACF versus theoretical ACF of higher-order Y–W fitted ARMA(1,1) model, lags 0–40]
XIII–92

Sample & ARMA(1,1) PACF for Atomic Clock
[Figure: sample PACF versus theoretical PACF of higher-order Y–W fitted ARMA(1,1) model, lags up to 20]
XIII–93
Least Squares Estimators: I
• as prelude to Hannan–Rissanen algorithm, consider least squares (LS) estimators for AR(p) coefficients
• refined H–R estimator adds a correction term β̂† to the initial estimator β̂ (i.e., β̃ = β̂ + β̂†), but B&D stick with just β̂
• three comments
  1. can handle both pure MA(q) & mixed ARMA(p, q) models
  2. usual formulation of H–R calls for use of Y–W, but Burg is a better choice (in particular, Z_t’s are computed as part of Burg’s algorithm)
  3. as in IA, choice of m > max{p, q} requires some care
BD–155, 156, 157 XIII–102
Example – Atomic Clock Series: IV
• as an example, let’s use H–R to fit an MA(4) model to atomic clock series to compare with IA results
• next two overheads look at
  1. dependence of estimates θ̂_1, ..., θ̂_4 on order m for approximating AR process; m = 5 to 40; m = 15 looks like good choice; dotted lines indicate θ̂_j = θ̂_{15,j} from IA
  2. same, but now for refinement θ̃_j; will use m = 15 again