Page 1: Parameter Estimation of AR(p) series

Parameter Estimation of AR(p) series

20 April 2015


Page 2: Parameter Estimation of AR(p) series

The parameters are to be estimated first, and the order selection comes only later. The equation of autoregression is:

X(t) = µ + α1(X(t−1) − µ) + … + αp(X(t−p) − µ) + ε(t)

So zero expectation is not necessarily assumed here; the mean is an unknown constant:

EX(t) = µ

Altogether the model parameters are (µ, α1, …, αp, σ_ε).

Suppose that the roots of the characteristic polynomial

P(z) = 1 − (α1 z + … + αp z^p)

lie outside the unit circle, so that there exists a stationary solution X(t) of the equation.


Page 3: Parameter Estimation of AR(p) series

The autoregression resembles a regression, but the dependent variable X(t) and the independent (explanatory) variables X(t−1), …, X(t−p) are obtained from the same observed series, so the rows, i.e. the cases, are interdependent (correlated).

This does not change the estimation of the coefficients, but it does change their properties, goodness and distribution, invalidating the usual variances, tests and confidence bounds.

The method of least squares is applicable as in regression.


Page 4: Parameter Estimation of AR(p) series

The following quadratic functional Q is to be minimised:

Q(µ, α1, …, αp) = Σ_{t=p+1}^T ε(t)² =

= Σ_{t=p+1}^T [X(t) − µ − α1(X(t−1) − µ) − … − αp(X(t−p) − µ)]² → min

The terms ε(1)², …, ε(p)² must be omitted, because X(0), …, X(−p+1) are not observable.
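Minimising Q is an ordinary least squares problem once the lagged values are arranged into a design matrix. The following Python fragment is a minimal sketch (not part of the slides; the simulated series and true parameters are assumptions), recovering µ from the intercept via c = µ(1 − α1 − … − αp):

import numpy as np

def ar_least_squares(x, p):
    """Conditional least squares for AR(p): regress X(t) on
    1, X(t-1), ..., X(t-p) for the observable t only."""
    T = len(x)
    # Design matrix: intercept plus p lagged columns.
    Z = np.column_stack([np.ones(T - p)] +
                        [x[p - j:T - j] for j in range(1, p + 1)])
    y = x[p:]
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    c, alphas = beta[0], beta[1:]
    # Recover mu from the intercept: c = mu * (1 - sum(alphas)).
    mu = c / (1.0 - alphas.sum())
    return mu, alphas

# Example on a simulated AR(2) series (assumed parameters).
rng = np.random.default_rng(0)
T, mu_true, a_true = 2000, 5.0, np.array([0.5, -0.3])
x = np.empty(T); x[:2] = mu_true
for t in range(2, T):
    x[t] = mu_true + a_true @ (x[[t - 1, t - 2]] - mu_true) + rng.normal()
print(ar_least_squares(x, 2))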


Page 5: Parameter Estimation of AR(p) series

Consider now the likelihood under Gaussian white noise (WN). For the AR(1) case:

(X(t) − µ) − α(X(t−1) − µ) = ε(t)

and here the noise values ε(t) are independent of each other and, for t ≥ 2, independent even of X(1). So,

f(x1, …, xT) = f(x1) · f(ε(2)) · … · f(ε(T))

and, disregarding a multiplicative normalizing constant:

f(x1, …, xT) ∝ f(x1) · (1/σ_ε^{T−1}) · exp{ −(1/(2σ_ε²)) Σ_{t=2}^T ε(t)² }

Given the initial value, the conditional likelihood can be derived from this. Up to proportionality again it equals:

f(X(2), …, X(T) | X(1)) ∝ (1/σ_ε^{T−1}) · exp{ −(1/(2σ_ε²)) Σ_{t=2}^T [X(t) − µ − α(X(t−1) − µ)]² }


Page 6: Parameter Estimation of AR(p) series

For the general AR(p) case the summation starts at p+1, whereas the terms in the bracket go back to (t − p):

f(X(p+1), …, X(T) | X(1), …, X(p)) ∝

∝ (1/σ_ε^{T−p}) · exp{ −(1/(2σ_ε²)) Σ_{t=p+1}^T [Xµ(t) − α1 Xµ(t−1) − … − αp Xµ(t−p)]² }

where Xµ(s) denotes X(s) − µ. This leads to the minimisation of the same quadratic functional Q(µ, α1, …, αp) as in the least squares method. The same relationship remains true that holds in ordinary regression between least squares and Gaussian maximum likelihood.


Page 7: Parameter Estimation of AR(p) series

As for the full (not the conditional) likelihood, the initial Gaussian value also plays a role. Since

X(1) ∼ N(µ, σ_ε² / (1 − α²))

the unconditional likelihood is the product of the density of the initial value and the conditional likelihood:

f(x1, …, xT) ∝ (√(1 − α²) / σ_ε) · exp{ −((1 − α²)/(2σ_ε²)) (x1 − µ)² } · f(x2, …, xT | X1 = x1)

and that, in turn, leads to a nonlinear minimisation. The difference from the linear optimisation (i.e. the conditional case) is small, however, and tends quickly to zero as T grows.
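As a rough numerical illustration (not from the slides), the full AR(1) log-likelihood above can be maximised directly, e.g. with scipy; the simulated series, the starting values and the log-variance parametrisation are assumptions of this sketch:

import numpy as np
from scipy.optimize import minimize

def neg_full_loglik(theta, x):
    """Negative unconditional Gaussian AR(1) log-likelihood,
    theta = (mu, alpha, log sigma_eps). Stationary initial value:
    X(1) ~ N(mu, sigma_eps^2 / (1 - alpha^2))."""
    mu, alpha, log_s = theta
    if abs(alpha) >= 1:
        return np.inf                      # outside the stationary region
    s2 = np.exp(2 * log_s)
    e = (x[1:] - mu) - alpha * (x[:-1] - mu)   # conditional residuals
    ll = (0.5 * np.log(1 - alpha**2) - 0.5 * np.log(s2)
          - (1 - alpha**2) * (x[0] - mu)**2 / (2 * s2))  # initial term
    ll += -0.5 * (len(x) - 1) * np.log(s2) - (e @ e) / (2 * s2)
    return -ll

rng = np.random.default_rng(1)
x = np.empty(500); x[0] = 0.0
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + rng.normal()
res = minimize(neg_full_loglik, x0=(0.0, 0.0, 0.0), args=(x,),
               method="Nelder-Mead")
print(res.x[:2], np.exp(res.x[2]))   # mu, alpha, sigma_eps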


Page 8: Parameter Estimation of AR(p) series

Let us return to Q:

∂Q/∂µ = 0 ⇒ Σ_{t=p+1}^T [X(t) − µ − α1(X(t−1) − µ) − … − αp(X(t−p) − µ)] = 0   (∗)

∂Q/∂αj = 0 ⇒ Σ_{t=p+1}^T [X(t) − µ − α1(X(t−1) − µ) − … − αp(X(t−p) − µ)] · (X(t−j) − µ) = 0   (∗∗)

j = 1, …, p


Page 9: Parameter Estimation of AR(p) series

The estimation of µ from the previous equations is:

µ̂ = [X̄1 − (α1 X̄2 + … + αp X̄_{p+1})] / [1 − (α1 + … + αp)],

where

X̄_{j+1} = (1/(T−p)) Σ_{t=p+1−j}^{T−j} X(t),   j = 0, 1, …, p

If p ≪ T, then X̄_{j+1} ≅ X̄ for all j, and so

µ̂ = X̄

The estimates of the autoregressive coefficients can be obtained from the (∗∗) equations, but we do not give them here in closed form.


Page 10: Parameter Estimation of AR(p) series

Consider now the maximum likelihood estimation for the AR(1) equation. Suppose that µ is known to be equal to 0: µ = 0. In this case the solution of the (∗∗) equation can be found easily and written in the form:

α̂ = Σ_{t=2}^T X(t) X(t−1) / Σ_{t=2}^T X(t−1)²

Substituting X(t) from the AR(1) equation:

α̂ = α + Σ_{t=2}^T ε(t) X(t−1) / Σ_{t=2}^T X(t−1)².
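A quick simulated check of this closed-form estimator (a sketch; the true α = 0.7 and the sample size are assumptions):

import numpy as np

rng = np.random.default_rng(2)
alpha, T = 0.7, 5000
x = np.empty(T); x[0] = rng.normal()
for t in range(1, T):
    x[t] = alpha * x[t - 1] + rng.normal()

# Closed-form least-squares / conditional ML estimator (mu = 0):
alpha_hat = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1])
print(alpha_hat)    # close to 0.7 for large T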


Page 11: Parameter Estimation of AR(p) series

As a next step we would like to compute the asymptotic variance of the parameter estimate in the AR(1) model. This is a good example of how the argument in the autoregressive case differs from ordinary multivariate regression. If we regard the problem as a multivariate regression, then, conditioning on the regressors, i.e. treating the explanatory (independent) variables as known (non-random) values, the only random term in the expression of α̂ is the linear combination of the regression error terms in the numerator; its variability creates the variability of the estimator α̂, hence it is the only source of its variance. The estimator would then be normally distributed, and computing its variance we would arrive at:

α̂ ∼ N(α, σ_ε² / Σ_{t=2}^T X(t−1)²)

as in the case of multivariate regression. However, when the explanatory variables, i.e. the past, are given in an AR(1) model (i.e. we condition on them), then the dependent (response) variable is also given (hence the name autoregression), and consequently the regression error terms are not random any more, so there is no variance to compute.


Page 12: Parameter Estimation of AR(p) series

As a result of the previous arguments, one has to approach the problem differently. Let us try another way. The factors in each product in the numerator are independent ⇒ the numerator has zero expectation. Although the terms of the sum in the numerator are not independent, they are still uncorrelated:

E[ε(t) X(t−1) · ε(t−1) X(t−2)] = E ε(t) · E[X(t−1) ε(t−1) X(t−2)] = 0

therefore the variance of the sum is the sum of the variances. So:

D² Σ_{t=2}^T ε(t) X(t−1) = (T−1) · σ_ε² · σ_X²


Page 13: Parameter Estimation of AR(p) series

The denominator, divided by T−1, is the corrected empirical variance, which, for a large sample, can be replaced by the variance itself (i.e. it tends to the variance). So

D²α̂ ≅ D²[ Σ ε(t) X(t−1) / ((T−1) σ_X²) ] = (1/((T−1)² σ_X⁴)) · D² Σ ε(t) X(t−1) = (T−1) σ_ε² σ_X² / ((T−1)² σ_X⁴) = σ_ε² / ((T−1) σ_X²) = (∗)

Here σ_X² = σ_ε² / (1 − α²) ⇒

(∗) = (1 − α²) / (T − 1)

We conclude that

α̂ ∼ N(α, (1 − α²)/(T − 1))

An important exception is when α = 1, or when the model is a so-called nearly nonstationary one, that is, when α is near to, or tends to, 1 in some sense.
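The asymptotic variance (1 − α²)/(T − 1) can be checked with a small Monte Carlo experiment; a sketch with assumed α, T and replication count:

import numpy as np

rng = np.random.default_rng(3)
alpha, T, reps = 0.5, 400, 2000
est = np.empty(reps)
for r in range(reps):
    # simulate a stationary AR(1) path and estimate alpha on it
    x = np.empty(T); x[0] = rng.normal() / np.sqrt(1 - alpha**2)
    for t in range(1, T):
        x[t] = alpha * x[t - 1] + rng.normal()
    est[r] = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1])
print(est.var(), (1 - alpha**2) / (T - 1))   # the two should be close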


Page 14: Parameter Estimation of AR(p) series

Let us return to the AR(p) model and the Q functional. The derivative with respect to µ, equation (∗), yields µ̂, which simplifies to X̄ if the end effect is ignored. Let us substitute it into the (∗∗) equations obtained from the derivatives with respect to αj. So:

Σ_{t=p+1}^T [X(t) − X̄ − (α1(X(t−1) − X̄) + … + αp(X(t−p) − X̄))] · (X(t−j) − X̄) = 0

Ignoring the end effect again and using the autocovariance estimator

Σ_{t=p+1}^T (X(t−k) − X̄)(X(t−j) − X̄) ≅ N · R(j−k)

Substituting this into our equation and dividing by N, it transforms into

R(j) = α1 R(j−1) + … + αp R(j−p),   j = 1, …, p

That is, we have arrived at the Yule-Walker equations with the estimated autocovariances.


Page 15: Parameter Estimation of AR(p) series

The Yule-Walker equations in matrix form are:

R_p = ℜ_p α

where R_p = (R(1), …, R(p))ᵀ is the vector of autocovariances and α = (α(1), …, α(p))ᵀ is the vector of estimated parameters. The matrix ℜ_p is

ℜ_p =
  ( R(0)     R(1)     ⋯   R(p−1) )
  ( R(1)     R(0)     ⋯   R(p−2) )
  (  ⋮        ⋮        ⋱    ⋮    )
  ( R(p−1)   R(p−2)   ⋯   R(0)   )

These equations are identical to the theoretical Yule-Walker equations, hence the name: the Yule-Walker estimator of the parameters.


Page 16: Parameter Estimation of AR(p) series

The next important issue is the estimation of the variance of the noise. The usual variance estimator in regression, computed by the least squares method, is

σ̂_ε² = (1/(N − 2p − 1)) Q(µ̂, α̂1, …, α̂p) = ((N − p)/(N − 2p − 1)) { R(0) − (α̂1 R(1) + … + α̂p R(p)) }

The denominator, as usual in regression, is the number of observations (of which the first p are lost because of the time lags, so it is N − p altogether) minus the number of estimated parameters (p + 1). On the contrary, the maximum likelihood estimator, which plays an important role in the sequel, is (Brockwell and Davis, p. 138):

(∗)   σ̂_ε² = R(0) − R_pᵀ ℜ_p⁻¹ R_p = R(0) − R_pᵀ α̂,

so it does not coincide with the least squares estimator:

σ̂_ε² = (1/(N − p)) Q(µ̂, α̂1, …, α̂p) = R(0) − (α̂1 R(1) + … + α̂p R(p)).
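A sketch of the Yule-Walker estimation of these two pages (the simulated AR(2) series is an assumption; scipy's Toeplitz solver handles ℜ_p α = R_p):

import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker(x, p):
    """Yule-Walker estimates of an AR(p) model: biased autocovariance
    estimator (divisor n), mu estimated by the sample mean."""
    n = len(x)
    xc = x - x.mean()
    R = np.array([xc[:n - k] @ xc[k:] for k in range(p + 1)]) / n
    alpha = solve_toeplitz(R[:-1], R[1:])   # solve the Toeplitz system
    sigma2 = R[0] - R[1:] @ alpha           # ML-type noise variance estimate
    return alpha, sigma2

rng = np.random.default_rng(4)
x = np.empty(3000); x[:2] = 0.0
for t in range(2, 3000):
    x[t] = 0.4 * x[t - 1] - 0.2 * x[t - 2] + rng.normal()
print(yule_walker(x, 2))    # close to (0.4, -0.2) and 1.0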


Page 17: Parameter Estimation of AR(p) series

Asymptotic distribution: often the estimators based on moments have greater variance than the maximum likelihood ones. This is not the case here, because the asymptotic distributions simply coincide (Mann and Wald, 1943):

α̂ ∼ N(α, N⁻¹ · σ_ε² ℜ_p⁻¹)


Page 18: Parameter Estimation of AR(p) series

How to check for the white noise property:

1. The empirical autocorrelation function has to be 'insignificantly' different from 0: since the estimated autocorrelation values r(j) = R(j)/R(0) are practically independent and have N(0, 1/n) distribution, 95% of them have to fall within the ±1.96/√n bounds.

So, computing the ACF e.g. for 40 lags, about 2 of the values may fall slightly outside the bounds by chance; if noticeably more do, or if one stands out definitely, the white-noise null hypothesis has to be rejected.

2. Portmanteau tests:

Compute the test statistic Q = n · Σ_{j=1}^h r²(j),

i.e. take the sum of the squared autocorrelations for h lags and multiply it by n. Under the null hypothesis it behaves like the sum of squares of h standard normal N(0,1) variables ⇒ it has a χ²_h distribution. The test is built accordingly: if Q > χ²_h(1 − α) ⇒ the null hypothesis is rejected.


Page 19: Parameter Estimation of AR(p) series

Ljung and Box refine the previous test by using:

Q_LB = n(n+2) · Σ_{j=1}^h r²(j)/(n−j)

because this statistic approaches the χ²_h distribution better, and the test is more powerful. These tests work not only for Gaussian white noise.

The test of McLeod and Li (1983) is specific to Gaussian white noise and is based on the squared process X → X². It takes the sum of the squared autocorrelations of the squared process:

Q = n(n+2) · Σ_{j=1}^h r²_{X²}(j)/(n−j)

This is also χ²_h distributed, but it is a far more sensitive statistic. The idea is that a Gaussian white noise (GWN) is also an i.i.d. sequence, hence its square is i.i.d. as well. This is not true for a general WN: the square of a WN is, in general, not a WN any more.
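A sketch of the Ljung-Box statistic (a manual implementation of the formula above; statsmodels also ships a version, but the formula is simple enough to code directly):

import numpy as np
from scipy.stats import chi2

def ljung_box(x, h):
    """Ljung-Box portmanteau test for white noise."""
    n = len(x)
    xc = x - x.mean()
    acvf = np.array([xc[:n - j] @ xc[j:] for j in range(h + 1)]) / n
    rho = acvf[1:] / acvf[0]               # sample autocorrelations r(j)
    q = n * (n + 2) * np.sum(rho**2 / (n - np.arange(1, h + 1)))
    return q, chi2.sf(q, df=h)             # statistic and p-value

rng = np.random.default_rng(5)
print(ljung_box(rng.normal(size=1000), h=20))  # no rejection expected under H0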


Page 20: Parameter Estimation of AR(p) series

Return (turning) point test:

This works in general for any sample y1, …, yn, whether a time series or not.
Def.: there is a return point at i, 1 < i < n, if either

yi−1 < yi and yi > yi+1 (growth followed by descent), or

yi−1 > yi and yi < yi+1 (descent followed by growth).

The probability of a return point at i is 2/3, therefore the expected number T of return points is ET = µ_T = (n−2) · 2/3. It can be shown that D²(T) = (16n − 29)/90 = σ_T², and T is approximately N(µ_T, σ_T²) distributed (because it is a sum of indicator variables which are 1 if there is a return point and 0 if there is not). So T is tested against this distribution.
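A sketch of the return point test as a two-sided normal test (the simulated i.i.d. sample is an assumption):

import numpy as np
from scipy.stats import norm

def turning_point_test(y):
    """Return (turning) point test for iid-ness."""
    y = np.asarray(y); n = len(y)
    mid, left, right = y[1:-1], y[:-2], y[2:]
    T = np.sum(((mid > left) & (mid > right)) | ((mid < left) & (mid < right)))
    mu_T = 2 * (n - 2) / 3
    var_T = (16 * n - 29) / 90
    z = (T - mu_T) / np.sqrt(var_T)
    return T, 2 * norm.sf(abs(z))          # count and two-sided p-value

rng = np.random.default_rng(6)
print(turning_point_test(rng.normal(size=500)))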


Page 21: Parameter Estimation of AR(p) series

Difference-sign test:

Count those i for which yi > yi−1, i = 2, …, n. This is the same as the number S of positive terms in the differenced series. For an i.i.d. sequence:

µ_S = ES = (n − 1)/2

σ_S² = D²S = (n + 1)/12

and for large n, approximately S ∼ N(µ_S, σ_S²).

So again the computed value is tested against this normal distribution. When S − µ_S is a large positive (negative) number, there is probably an increasing (decreasing) trend in the process.
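A sketch of the difference-sign test along the same lines (assumed i.i.d. input):

import numpy as np
from scipy.stats import norm

def difference_sign_test(y):
    """Difference-sign test: S = number of increases y(i) > y(i-1)."""
    y = np.asarray(y); n = len(y)
    S = np.sum(np.diff(y) > 0)
    mu_S, var_S = (n - 1) / 2, (n + 1) / 12
    z = (S - mu_S) / np.sqrt(var_S)
    return S, 2 * norm.sf(abs(z))

rng = np.random.default_rng(7)
print(difference_sign_test(rng.normal(size=500)))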


Page 22: Parameter Estimation of AR(p) series

Wilcoxon rank test: useful when a linear trend is to be detected. Let P be the number of those pairs (i, j) for which yj > yi and j > i, i = 1, …, n−1.

Altogether there are n(n−1)/2 pairs with j > i, and for each of them the probability that yj > yi is 1/2. Hence

µ_P = EP = (1/4) · n(n − 1)

It can be shown that

σ_P² = D²P = n(n − 1)(2n + 5)/72

and again for large n: P ∼ N(µ_P, σ_P²).

A large positive (negative) value of the statistic P − µ_P indicates an increasing (decreasing) trend.
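A sketch of this rank test (the weak linear trend in the example is an assumption):

import numpy as np
from scipy.stats import norm

def rank_trend_test(y):
    """Rank test for trend: P = #{(i, j): j > i, y_j > y_i}."""
    y = np.asarray(y); n = len(y)
    P = sum(np.sum(y[i + 1:] > y[i]) for i in range(n - 1))
    mu_P = n * (n - 1) / 4
    var_P = n * (n - 1) * (2 * n + 5) / 72
    z = (P - mu_P) / np.sqrt(var_P)
    return P, 2 * norm.sf(abs(z))

rng = np.random.default_rng(8)
y = np.arange(300) * 0.01 + rng.normal(size=300)  # weak linear trend
print(rank_trend_test(y))    # small p-value: trend detected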


Page 23: Parameter Estimation of AR(p) series

Nonparametric regression:

Running line: a smoothing method, in general a biased estimate of the trend. If we smooth an i.i.d. N(0,1) sample, we may even obtain a periodic-looking curve from it. In smoothing there are two basic questions:

1. How to 'average' within a given neighbourhood

2. How to choose the neighbourhood

Nearest neighbours: the k nearest points.

Symmetric nearest neighbours: k/2 points on each side.

For time series, as for regression, the distinction is of little significance.


Page 24: Parameter Estimation of AR(p) series

Let

Yi = µ(Xi) + εi,

where µ is a smooth function. Running line: we choose a moving window, and within the window we fit a simple linear regression of Y on X. Yi is estimated, based on Xi, from the window centred at it. E.g. for k = 11, Y14 is estimated from the window (X9, Y9), …, (X19, Y19): we run a regression on these pairs and predict Y14 from X14 with its coefficients. This procedure is good for irregularly observed time series too: then Xi is the time t, which is 'random-like', i.e. it can also be regarded as a regressor.
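A sketch of the running-line smoother (the clamping of the window at the ends and the simulated irregular design are assumptions of this illustration):

import numpy as np

def running_line(x, y, k):
    """Running-line smoother: at each point fit a simple linear
    regression within the window of k nearest points centred
    (as far as possible) at that point, then predict there."""
    n = len(x); half = k // 2
    fitted = np.empty(n)
    for i in range(n):
        lo = max(0, min(i - half, n - k))   # clamp the window at the ends
        xs, ys = x[lo:lo + k], y[lo:lo + k]
        b, a = np.polyfit(xs, ys, deg=1)    # slope, intercept
        fitted[i] = a + b * x[i]
    return fitted

rng = np.random.default_rng(9)
x = np.sort(rng.uniform(0, 10, 200))        # irregular observation times
y = np.sin(x) + rng.normal(scale=0.3, size=200)
print(running_line(x, y, k=11)[:5])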


Page 25: Parameter Estimation of AR(p) series

Kernel regression:

Here too we choose neighbourhoods, but within them the points are not taken into account with equal weight. If we want the smoothed prediction at x0, the observation 'locations' (the values of the regressor) are weighted according to their distance from x0:

w_{0,i} = (C0/λ) · K(|x0 − xi| / λ)

where K is a kernel function and λ is the bandwidth. (One possibility is, e.g., to divide xi by its standard deviation.) With these weights we run a weighted regression, i.e. the least squares expression to be minimised is built with these weights.

µ̂(x0) = Σ_i K((x0 − xi)/λ) · yi / Σ_i K((x0 − xi)/λ)
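A sketch of this weighted-average (Nadaraya-Watson) form; the Gaussian weight function, the bandwidth and the simulated data are assumptions:

import numpy as np

def nadaraya_watson(x0, x, y, lam, kernel):
    """Kernel-regression estimate of mu(x0): a locally weighted average."""
    w = kernel((x0 - x) / lam)     # weights decay with distance from x0
    return (w @ y) / w.sum()

gauss = lambda t: np.exp(-t**2 / 2)   # Gaussian kernel (unnormalised)

rng = np.random.default_rng(10)
x = np.sort(rng.uniform(0, 10, 300))
y = np.sin(x) + rng.normal(scale=0.3, size=300)
print(nadaraya_watson(5.0, x, y, lam=0.5, kernel=gauss))  # approx sin(5)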

Page 26: Parameter Estimation of AR(p) series

Kernel functions:

Gaussian kernel: the density function of the Gaussian distribution.

Minimum-variance kernel: K(t) = (3/8)(3 − 5t²), |t| ≤ 1.

Epanechnikov kernel: K(t) = (3/4)(1 − t²), |t| ≤ 1.
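The three kernels written as Python functions (a sketch), usable as the kernel argument of the Nadaraya-Watson snippet on the previous page:

import numpy as np
from scipy.stats import norm

gaussian_kernel = norm.pdf                  # the N(0,1) density

def min_variance_kernel(t):
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1, 3/8 * (3 - 5 * t**2), 0.0)

def epanechnikov_kernel(t):
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1, 3/4 * (1 - t**2), 0.0)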


Page 27: Parameter Estimation of AR(p) series

Local regression: LOESS

LOESS combines the running line with weighted least squares fitting. (Number of regressors, Mallows-type Cp statistic?)
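A minimal LOESS-style sketch under that reading: a running-line regression with smooth kernel weights instead of a hard window (the tricube weight is the customary LOESS choice, an assumption here):

import numpy as np

def loess_point(x0, x, y, lam):
    """Locally weighted linear fit at x0: kernel-weighted running line."""
    t = (x0 - x) / lam
    w = np.where(np.abs(t) < 1, (1 - np.abs(t)**3)**3, 0.0)  # tricube weights
    b, a = np.polyfit(x, y, deg=1, w=np.sqrt(w))   # weighted least squares
    return a + b * x0

rng = np.random.default_rng(11)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=200)
print(loess_point(5.0, x, y, lam=1.5))   # approximately sin(5)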
