EVALUATION OF THE BIAS OF YULE-WALKER ESTIMATES IN AUTOREGRESSIVE TIME SERIES PROCESSES

A Writing Project Presented to The Faculty of the Department of Mathematics, San José State University

In Partial Fulfillment of the Requirements for the Degree Master of Arts

by Saeid M. Assadi

May 2006
Figure 4.3: Accuracy of the Asymptotic Expected Bias
Since the bias increases as the absolute value of the coefficient α1 increases, we
would like to investigate the relative error, or relative accuracy, of the asymptotic expected
bias. We define the relative accuracy of the asymptotic expected bias as follows:
\[
\text{Relative Accuracy of AE Bias}
= \frac{\text{Observed Bias} - \text{AE Bias}}{\text{True value of coefficient}}
= \frac{\hat{\alpha}_1 - AE(\hat{\alpha}_1)}{\alpha_1}. \tag{4.4}
\]
We plotted the relative accuracy of the bias in figure 4.4. Clearly, the relative
bias increases as we approach the ends of the horizontal axis. Note also that when
α1 = 0, the relative accuracy of the bias function is undefined because of division by
zero.
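The metric in equation (4.4) can be estimated by Monte Carlo. The sketch below is a minimal illustration, assuming the thesis's sign convention y_t + α1·y_{t−1} = ε_t with standard normal innovations; since the AE bias formula from Chapter 3 is not reproduced in this section, it is passed in as a plain number, and the function names are ours.

```python
import numpy as np

def simulate_ar1(alpha1, t_obs, rng):
    """Simulate the zero-mean AR(1) process y_t + alpha1 * y_{t-1} = eps_t."""
    y = np.zeros(t_obs)
    eps = rng.standard_normal(t_obs)
    for t in range(1, t_obs):
        y[t] = -alpha1 * y[t - 1] + eps[t]
    return y

def yule_walker_ar1(y):
    """Yule-Walker estimate alpha1_hat = -r(1)/r(0), using the biased
    autocovariance estimates (sums divided by T_obs)."""
    t_obs = len(y)
    r0 = np.sum(y * y) / t_obs
    r1 = np.sum(y[1:] * y[:-1]) / t_obs
    return -r1 / r0

def relative_accuracy_of_ae_bias(alpha1, ae_bias, t_obs, n_sims, rng):
    """Equation (4.4): (observed bias - AE bias) / alpha1, where the observed
    bias is the mean of (alpha1_hat - alpha1) over n_sims simulations."""
    estimates = [yule_walker_ar1(simulate_ar1(alpha1, t_obs, rng))
                 for _ in range(n_sims)]
    observed_bias = np.mean(estimates) - alpha1
    return (observed_bias - ae_bias) / alpha1
```

As in the text, the metric is undefined at α1 = 0, where the denominator vanishes.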
Figure 4.4: Relative Accuracy of the Expected Bias
Figures 4.5 through 4.7 show how the coefficient α1, the bias of α1, and the bias
accuracy of α1 respectively, vary with sample size for the AR(1) process.
It can be seen that as the sample size increases we obtain estimates for α1
which are closer to the true values. In other words, the coefficient estimates from
the Tobs = 100 simulation are closer to the no-bias line (α̂ = α) than those from the
Tobs = 10 simulation (see figure 4.5). We also see a higher accuracy of the asymptotic
expected bias for the larger sample size. This is expected, since our asymptotic
bias approaches the true expected bias as our observed sample size Tobs approaches
infinity:
\[
\lim_{T_{obs} \to \infty} \text{AE Bias}
= \lim_{T_{obs},\, T \to \infty} \frac{T\, E(\hat{\alpha} - \alpha)}{T_{obs}}
= T\, E(\hat{\alpha} - \alpha)
= \text{Expected Bias}.
\]
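This limiting behavior can be checked numerically: if the expected bias is O(1/Tobs), then Tobs times the mean simulated bias should settle near a constant as Tobs grows rather than drift. A rough Monte Carlo sketch, under the same sign convention and Yule-Walker estimator as above; the sample sizes and simulation counts are illustrative.

```python
import numpy as np

def scaled_mean_bias(alpha1, t_obs, n_sims, rng):
    """Monte Carlo estimate of T_obs * E(alpha1_hat - alpha1) for the
    Yule-Walker estimator of y_t + alpha1 * y_{t-1} = eps_t."""
    biases = np.empty(n_sims)
    for i in range(n_sims):
        y = np.zeros(t_obs)
        eps = rng.standard_normal(t_obs)
        for t in range(1, t_obs):
            y[t] = -alpha1 * y[t - 1] + eps[t]
        r0 = np.sum(y * y) / t_obs
        r1 = np.sum(y[1:] * y[:-1]) / t_obs
        biases[i] = (-r1 / r0) - alpha1
    return t_obs * np.mean(biases)

# If the bias is O(1/T_obs), these values should stay of comparable size
# rather than growing with T_obs.
rng = np.random.default_rng(1)
for t_obs in (25, 100, 400):
    print(t_obs, scaled_mean_bias(-0.5, t_obs, 1000, rng))
```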
Figure 4.5: Sample size comparison for coefficient α1
Figure 4.6: Sample size comparison for bias
Figure 4.7: Sample size comparison for AE bias accuracy
Recall the term B from equation (3.2). From Chapter 3, B is used in the derivation
of the bias expression presented by Shaman and Stine (1988). The assumptions made
around B in deriving the bias expression were:
1. For AR(2) or higher order processes, the matrix B is assumed to have
eigenvalues less than 1 in absolute value in order to use a Taylor series
sum. If the process is AR(1), then this assumption can be restated as:
the value |B| is assumed to be less than 1.
2. The Taylor series expansion of B is truncated by taking all the terms of
higher order than B^2 and grouping them into the term O(T^{-1/2}), which
is then omitted from the expression.
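The role of the eigenvalue condition can be illustrated with a generic matrix: the series (I − B)^{-1} = I + B + B^2 + ⋯ converges only when every eigenvalue of B is less than 1 in absolute value, and the error from truncating after B^2 grows as the eigenvalues approach 1. The matrices below are illustrative examples, not the estimator B from Chapter 3.

```python
import numpy as np

def truncation_error(B, order=2):
    """Max-abs difference between (I - B)^{-1} and the truncated series
    I + B + ... + B^order; the series is valid only when every
    eigenvalue of B has absolute value below 1."""
    eye = np.eye(B.shape[0])
    exact = np.linalg.inv(eye - B)
    approx = sum(np.linalg.matrix_power(B, k) for k in range(order + 1))
    return np.max(np.abs(exact - approx))

# Eigenvalues well inside the unit disk: truncation after B^2 is accurate.
small = truncation_error(np.array([[0.20, 0.0], [0.10, 0.10]]))
# Eigenvalues close to 1: the truncated series is badly off.
large = truncation_error(np.array([[0.95, 0.0], [0.10, 0.90]]))
```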
We have simulated results indicating what values B took on for various values
of α1 (see figure 4.8).
Figure 4.8: Minimum, maximum, mean, and median of |B| vs. α1
For these results, a group of 1000 simulations was run for each given value of
α1 ranging from −1 to 1. The values of α1 were chosen such that the leftmost data point
of each graph represents α1 = −0.999, with a resolution of 0.001 between neighboring
points. Likewise, the rightmost data point of each graph represents α1 = 0.999. The
minimum graph shows that for every α1 there is at least one simulation in the group
which yields a |B| close to zero, with the exception of the edges of the horizontal axis.
The maximum graph shows |B| exceeding 1 for almost all values of α1, indicating
that the assumptions made around B do not hold for at least those instances. Both the
maximum and the minimum graphs show values of |B| that are nearly the same at the
left and right edges of the horizontal axis, indicating hardly any variation between
outputs of each simulation for these values of α1. The mean and median graphs show
a consistent rise of |B| as α1 approaches ±1. This indicates that for α1 near ±1, |B| is
expected to be near 1, which violates the truncation assumption of the infinite
series representation of B.
Figure 4.9: Sample size comparison for mean |B| vs. α1
We can also observe the impact of sample size on the mean of |B| (see figure
4.9). Each curve represents a simulation run for the labeled value of the sample
size Tobs. As the sample size increases, the mean of the 1000 values of |B| moves
closer to zero for all values of α1 except toward the edges of the horizontal axis. No
matter what sample size we used in the simulation, the mean of |B| always
approaches 1 as α1 approaches ±1.
In this section, we have discussed the AR(1) process, focusing on simulated
output. We notice that the coefficients produced from simulation are biased, as expected.
However, when the observed bias is compared to the expected bias given by our
equations in Chapter 3, we notice differences. The differences are mostly apparent
in situations where the coefficient α1 is close to ±1. Additionally, the bias equations
indicate a linear relationship between the true coefficient α and the estimated coeffi-
cient α̂, whereas the simulated output points toward a nonlinear relationship. Finally,
looking into what causes the difference, we notice that the assumptions made around B
are violated in several cases, mainly when the AR coefficient is close to ±1.
4.3 Simulation of the AR(2) Process: Real Roots
Once again we look at the AR(2) process with zero mean:
\[
y_t + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} = \varepsilon_t.
\]
Also, recall from Chapter 3 the characteristic equation where A(z−1) = 0:
\[
z^2 + \alpha_1 z + \alpha_2 = 0.
\]
Our simulation design treats the solutions of the characteristic equation in sep-
arate cases: one simulation for the case where there are real roots, and one for
complex roots.
Figure 4.10: Bias of Alpha 1 and Alpha 2
We will use the labels z1 and z2 to represent each root. Recall from the beginning
of this chapter that we start the simulation by choosing either coefficients or roots
which in turn imply a set of coefficients. In figure 4.10 we can see that the bias
increases as either or both roots increase in absolute value. The contour lines in both
graphs show an exponentially increasing pattern as we approach (z1 = z2 = ±1).
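When the simulation starts from roots rather than coefficients, the implied coefficients follow from factoring the characteristic polynomial: z^2 + α1·z + α2 = (z − z1)(z − z2) gives α1 = −(z1 + z2) and α2 = z1·z2. A small sketch of this mapping (function names are ours):

```python
import numpy as np

def ar2_coefficients_from_roots(z1, z2):
    """Coefficients of z^2 + alpha1*z + alpha2 = (z - z1)(z - z2).
    For a real-valued process the roots are either both real or a
    conjugate pair, so the coefficients come out real."""
    return float(np.real(-(z1 + z2))), float(np.real(z1 * z2))

def root_case(alpha1, alpha2):
    """Classify the roots of z^2 + alpha1*z + alpha2 by the discriminant."""
    return "real" if alpha1 ** 2 - 4 * alpha2 >= 0 else "complex"
```

For example, the conjugate pair 0.3 ± 0.4i yields α1 = −0.6 and α2 = 0.25, which `root_case` classifies as the complex scenario of section 4.4.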
4.4 Simulation of the AR(2) Process: Complex Roots
The solution to the characteristic equation in the AR(2) process may yield
complex roots. Since the roots can be plotted in the complex plane, we will use the
real and imaginary axes to plot all of our observations. Since complex roots come in
conjugate pairs, we will make all of our observations only with respect to the root
with the positive imaginary part. Plotting the conjugate with the negative imaginary
part would yield a mirror image. Figures 4.11 and 4.12 show the bias (α̂ − α) of the
coefficients averaged over 100 simulations.
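The averaging behind these figures can be sketched as follows, assuming standard normal innovations and the thesis's sign convention; for a conjugate pair z, z̄, the identity (x − z)(x − z̄) = x^2 − 2 Re(z) x + |z|^2 gives α1 = −2 Re(z) and α2 = |z|^2. The chosen root and counts are illustrative.

```python
import numpy as np

def simulate_ar2(alpha1, alpha2, t_obs, rng):
    """Simulate the zero-mean AR(2) process
    y_t + alpha1*y_{t-1} + alpha2*y_{t-2} = eps_t."""
    y = np.zeros(t_obs)
    eps = rng.standard_normal(t_obs)
    for t in range(2, t_obs):
        y[t] = -alpha1 * y[t - 1] - alpha2 * y[t - 2] + eps[t]
    return y

def yule_walker_ar2(y):
    """Order-2 Yule-Walker estimates; with this sign convention the
    system is R @ [alpha1, alpha2] = -[r1, r2]."""
    t_obs = len(y)
    r = [np.sum(y[k:] * y[:t_obs - k]) / t_obs for k in range(3)]
    R = np.array([[r[0], r[1]], [r[1], r[0]]])
    return np.linalg.solve(R, -np.array([r[1], r[2]]))

# Bias of (alpha1_hat, alpha2_hat) for one conjugate root pair,
# averaged over 100 simulations of length T_obs = 100.
rng = np.random.default_rng(2)
z = 0.5 + 0.5j                        # root with positive imaginary part
alpha1, alpha2 = -2 * z.real, abs(z) ** 2
estimates = np.array([yule_walker_ar2(simulate_ar2(alpha1, alpha2, 100, rng))
                      for _ in range(100)])
bias = estimates.mean(axis=0) - np.array([alpha1, alpha2])
```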
Figure 4.11: Bias of Alpha 1
The contour lines in figures 4.11 and 4.12 indicate that the bias of both α1 and
α2 increases in absolute value as the roots approach (±1 + 0i). As a comparison to
the real roots scenario, the real (horizontal) axis of these plots is equivalent to the
line with slope equal to 1 in the plots of figure 4.10. In both α1 and α2, the presence
of bias is expected. Shaman and Stine (1988) have produced equations for the bias
which we can compare to the bias observed from simulation (see Chapter 3). Using
Figure 4.12: Bias of Alpha 2
the Accuracy of the Asymptotic Expected Bias metric defined earlier, we can plot
how close the observed bias is to the expected bias. Figures 4.13 and 4.14 show the
difference between the observed and the expected bias using our expression defined
in the previous section (see equation 4.3).
The inaccuracy of the expected bias is greatest at the corners of the
plot (±1 + 0i). We will now consider the possibility that the magnitude of the roots
has an impact on the accuracy of the estimate. Figures 4.15 and 4.16 show that
dividing the bias by the magnitude of the coefficient still results in an increasing value
as the roots approach (±1 + 0i).
Now that we see there is a difference between the expected and observed bias,
we are interested in what role the sample size plays in the accuracy
of the bias formulas. In figures 4.11 through 4.16, our sample size was Tobs = 100;
however, in figures 4.17 and 4.18, we can see the effect of both a larger and a smaller
sample size.
Figure 4.13: Accuracy of Asymptotic Expected Bias of Alpha 1
Figure 4.14: Accuracy of Asymptotic Expected Bias of Alpha 2
For the AR(2) process, the matrix B is 2×2 and yields two eigenvalues. In
the case of complex conjugate roots, we are interested in several different instances of
B, one for every root pair we study. In our program design, we have run simulations
Figure 4.15: Relative Accuracy of Asymptotic Expected Bias for Alpha 1
Figure 4.16: Relative Accuracy of Asymptotic Expected Bias for Alpha 2
for 5055 unique data points in the top half of the unit circle in the complex plane. For
each point in the unit circle we have several simulations, depending on the sample
size. Therefore, at the end of the simulation process, we have a set
Figure 4.17: Accuracy of Asymptotic Expected Bias for Tobs = 10
Figure 4.18: Accuracy of Asymptotic Expected Bias for Tobs = 1000
of many eigenvalues. For each simulated data set, we take the larger of the two
eigenvalues of the computed matrix in absolute value. Then, for every point in the
unit circle, we organize these values in an array and look at the largest value, mean,
and median per point on the unit circle.
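This bookkeeping reduces to a few lines; the B matrices themselves come from the Chapter 3 estimator, which is not reproduced here, so the sketch takes them as given.

```python
import numpy as np

def largest_abs_eigenvalue(B):
    """Larger of the eigenvalues of B in absolute value."""
    return float(np.max(np.abs(np.linalg.eigvals(B))))

def summarize_grid_point(B_matrices):
    """For one point in the unit circle: max, mean, and median of the
    largest |eigenvalue| over all simulated data sets at that point."""
    vals = np.array([largest_abs_eigenvalue(B) for B in B_matrices])
    return {"max": vals.max(), "mean": vals.mean(), "median": np.median(vals)}
```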
Figure 4.19 shows that for a sample size such as Tobs = 100, the largest eigen-
value of the B matrix exceeds 1 at points near the corners (±1 + 0i) of the unit
circle. However, even for a larger sample size, Tobs = 10,000, the largest
eigenvalues still exceed 1 well within the edges of the unit circle (see figure 4.20).
Figure 4.19: Largest Eigenvalue of B for Tobs = 100
Figure 4.20: Largest Eigenvalue of B for Tobs = 10000
Figures 4.21 and 4.22 show the mean eigenvalues for the sample sizes Tobs = 100
and Tobs = 10,000, respectively.
Figure 4.21: Mean of the largest of the pair of Eigenvalues of B for Tobs = 100
Figure 4.22: Mean of the largest of the pair of Eigenvalues of B for Tobs = 10000
Figures 4.23 and 4.24 show the median eigenvalues for the sample sizes Tobs = 100
and Tobs = 10,000, respectively.
Figure 4.23: Median of the largest of the pair of Eigenvalues of B for Tobs = 100
Figure 4.24: Median of the largest of the pair of Eigenvalues of B for Tobs = 10000
For the mean plots, we can see from visual inspection that there is not much
difference in the magnitude of the eigenvalues. However, the larger sample size displays
more defined contour lines. The same can be said of the median plots. Eigenvalues
of B close to 1 or over 1 are certainly not a rare occurrence, as most of the
figures show.
For the AR(2) process, we have examined the real and complex roots cases.
In both scenarios, we have shown how the bias, the accuracy of the expected bias,
and the eigenvalues of the matrix B all increase in magnitude as the roots approach
(Root 1 = 1, Root 2 = 1). Just as with the AR(1) process, this implies that the
Yule-Walker estimates for α in the AR(2) process increase in error in these areas.
Looking toward the AR(3) process and beyond, we can apply a similar methodology
in examining the behavior: find the roots of the characteristic equation, break out
scenarios for each kind of root set (combinations of real and complex roots), and
then examine the bias of each coefficient with respect to one or more of the roots.
4.5 Higher Order AR(p) Processes
We have investigated in detail the AR(p) process for p = 1 and p = 2. In
representing the graphs for p = 3, we need to consider more cases than before. The
characteristic equation is now a third order polynomial with three roots. First, we
can separate cases by the types of roots: one case for all real roots, and
one case for only one real root and two complex roots. Then, for each case, we
need to set up a plotting environment. For the case with all real roots, an example
of the environment could be a three-dimensional coordinate system with each root
represented by an axis. If we were to plot the bias of one of the three coefficients
on this system, we could use contoured or colored surfaces to represent
a change in the bias. This would be quite difficult to observe with ordinary tools.
A simpler plotting environment for this case could be the same two-dimensional one
which was used for p = 2. In this environment, we could only use two of the roots
to be represented by the axes, then the third root would have to be a fixed value
for any given plot. Then, we could look at parameters such as the bias of one of
the coefficients and plot it using contour lines in the same fashion as in the case for
p = 2. This plot would have to be repeated for several different values of the third
root since it is fixed for any given plot. This produces a set of plots which would then
have to be repeated for the other two coefficients to study their biases. Any other
parameters, such as the bias accuracy or maximum eigenvalue, could be plotted
similarly. Lastly, this entire set of plots could be produced once again for the case with
one real and two complex roots. Using this style of studying parameters leads to
several plots for the AR(3) process. Studying higher order processes this way will lead
to an exponentially growing set of plots to examine.
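The scheme above, fixing the third root and scanning the other two, can be organized around Vieta's formulas for the cubic characteristic polynomial z^3 + α1·z^2 + α2·z + α3 = (z − r1)(z − r2)(z − r3). A sketch of the bookkeeping, with illustrative grid values; the inner simulation and estimation step is elided.

```python
import numpy as np

def ar3_coefficients_from_roots(r1, r2, r3):
    """Coefficients of z^3 + alpha1*z^2 + alpha2*z + alpha3
    = (z - r1)(z - r2)(z - r3), by Vieta's formulas."""
    alpha1 = -(r1 + r2 + r3)
    alpha2 = r1 * r2 + r1 * r3 + r2 * r3
    alpha3 = -(r1 * r2 * r3)
    return alpha1, alpha2, alpha3

# One two-dimensional contour plot per fixed value of the third root:
# scan an (r1, r2) grid, derive the coefficients, and record whatever
# parameter (bias, bias accuracy, maximum eigenvalue) is being plotted.
grid = np.linspace(-0.9, 0.9, 7)
for r3 in (-0.8, 0.0, 0.8):          # fixed third root, one plot each
    for r1 in grid:
        for r2 in grid:
            a1, a2, a3 = ar3_coefficients_from_roots(r1, r2, r3)
            # ... simulate, estimate, and record the bias at (r1, r2) ...
```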
CHAPTER 5
DISCUSSION
The Yule-Walker equations are a well-known means of estimating parameters in
stationary autoregressive time series processes. The bias of the Yule-Walker estimates
is also known and documented; however, the documented expression requires an
infinite sample size (see Shaman and Stine 1988). We have approximated the expected
bias using the asymptotic expected bias, which uses a reasonable sample size. We
have seen the results from
Chapter 4 indicating a significant difference between the asymptotic expected bias
produced by the equations and the observed bias from simulation output at
a reasonable sample size. The greatest differences occur when the roots
of the characteristic equation of the process are close to the edges of the unit circle,
where z = ±1 + 0i. As we look further into the derivation of the bias equations, we
can see theoretical problems with the assumptions made around the estimator B. For
reasonable sample sizes, and for roots near z = ±1 + 0i, the estimator B may not have
eigenvalues less than one in absolute value. In these cases, we can see that there are
large differences between the asymptotic expected bias and the true bias of simulated
data sets. For the complete derivation of the bias formulas, see Shaman and Stine
(1988). Additionally, there is other research done on making Yule-Walker estimates
better by tapering (see Crunk 1999).
For further research, we could investigate the behavior of the Yule-Walker
estimates and expected bias calculations for higher order AR(p) processes. For these
processes, the general form of the expected O(1/T) bias may contain many terms and
be irreducible; however, it could still be calculated numerically with ease. Additionally,
we could investigate revising the expected bias formula. Our bias formula was of order
O(1/T); however, we might obtain better results from an O(1/T^2), O(1/T^3), or higher
order expansion in T.
BIBLIOGRAPHY
[BD91] P. J. Brockwell and R. A. Davis, Time series: Theory and methods, 2nd ed., Springer, 1991.
[Bha81] R. J. Bhansali, Effects of not knowing the order of an autoregressive process on the mean squared error of prediction - 1, Journal of the American Statistical Association 76 (1981), no. 375, 588–597.
[Cru99] S. Crunk, Dissertation on tapering to improve Yule-Walker estimation in autoregressive processes, 1999.
[SS88] P. Shaman and R. A. Stine, The bias of autoregressive coefficient estimators, Journal of the American Statistical Association 83 (1988), 842–848.
[SS89] R. Stine and P. Shaman, A fixed point characterization for bias of autoregressive estimators, The Annals of Statistics 17 (1989), no. 3, 1275–1284.
[Zha90] H. C. Zhang, Reduction of the asymptotic bias of autoregressive and spectral estimators by tapering, Journal of Time Series Analysis 13 (1990), no. 5, 451–469.