V.I.1.a Basic Definitions and Theorems about ARIMA models

First we define some important concepts. A stochastic process (i.e. a probabilistic process) is defined by a T-dimensional distribution function.

(V.I.1-1)

Before analyzing the structure of a time series model one must make sure that the time series is stationary with respect to the variance and with respect to the mean. First, we will assume statistical stationarity of all time series (later on, this restriction will be relaxed). Statistical stationarity of a time series implies that the marginal probability distribution is time-independent, which means that: the expected values and variances are constant

(V.I.1-2)

where T is the number of observations in the time series; the autocovariances (and autocorrelations) must be constant

(V.I.1-3)

where k is an integer time-lag; and the variable has a joint normal distribution f(X_1, X_2, ..., X_T) with a marginal normal distribution in each dimension

(V.I.1-4)

If only this last condition is not met, we speak of weak stationarity. Now it is possible to define white noise as a stochastic process (which is statistically stationary) defined by a marginal distribution function (V.I.1-1), where all X_t are independent variables (with zero covariances), with a joint normal distribution f(X_1, X_2, ..., X_T), and with

(V.I.1-5)

It is obvious from this definition that for any white noise process the probability function can be written as

(V.I.1-6)

Define the autocovariance as

(V.I.1-7)

or

(V.I.1-8)

whereas the autocorrelation is defined as

(V.I.1-9)

In practice, however, we only have the sample observations at our disposal. Therefore we use the sample autocorrelations

(V.I.1-10)

for any integer k. Remark that the autocovariance matrix and the autocorrelation matrix associated with a stochastic stationary process

(V.I.1-11)

(V.I.1-12)

are always positive definite, which can easily be shown, since a linear combination of the stochastic variable

(V.I.1-13)

has a variance of

(V.I.1-14)

which is always positive. This implies, for instance for T = 3, that

(V.I.1-15)

or

(V.I.1-16)

Bartlett proved that the variance of the autocorrelation of a stationary normal stochastic process can be formulated as

(V.I.1-17)

This expression can be shown to reduce to

(V.I.1-18)

if the autocorrelation coefficients decrease exponentially like

(V.I.1-19)

Since the autocorrelations for i > q (a natural number) are equal to zero, expression (V.I.1-17) can be reformulated as

(V.I.1-20)

which is the so-called large-lag variance. Now it is possible to vary q from 1 to any desired integer number of autocorrelations, replace the theoretical correlations by their sample estimates, and compute the square root of (V.I.1-20) to find the standard deviation of the sample autocorrelation. Note that the standard deviation of one autocorrelation coefficient is almost always approximated by

(V.I.1-21)

The covariances between autocorrelation coefficients have also been deduced by Bartlett

(V.I.1-22)

which is a good indicator for dependencies between autocorrelations. Bear in mind, therefore, that inter-correlated autocorrelations can seriously distort the picture of the autocorrelation function (ACF, i.e. the autocorrelations as a function of the time-lag). It is, however, possible to remove the intervening correlations between X_t and X_{t-k} by defining a partial autocorrelation function (PACF). The partial autocorrelation coefficients are defined as the last coefficient of a partial autoregression equation of order k

(V.I.1-23)

It is obvious that there exists a relationship between the PACF and the ACF, since (V.I.1-23) can be rewritten as

(V.I.1-24)

or (on taking expectations and dividing by the variance)

(V.I.1-25)

Sometimes (V.I.1-25) is written in matrix formulation according to the Yule-Walker relations

(V.I.1-26)

or simply

(V.I.1-27)

Solving (V.I.1-27) according to Cramer's Rule yields

(V.I.1-28)

Note that the determinant of the numerator contains the same elements as the determinant of the denominator, except for the last column, which has been replaced. A practical numerical estimation algorithm for the PACF is given by Durbin

(V.I.1-29)

with

(V.I.1-30)

The standard error of a partial autocorrelation coefficient for k > p (where p is the order of the autoregressive data generating process; see later) is given by

(V.I.1-31)
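As a concrete numerical illustration of the estimators above, the following Python sketch computes the sample autocorrelations, the approximate standard error of (V.I.1-21), and the partial autocorrelations via Durbin's recursion (V.I.1-29)-(V.I.1-30). Since the equations are referenced here only by number, the standard textbook (Box-Jenkins) forms are assumed.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_k (cf. V.I.1-10): mean-corrected products,
    normalized by the lag-0 sample autocovariance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    c0 = np.sum(xc * xc) / n
    return np.array([np.sum(xc[:n - k] * xc[k:]) / (n * c0)
                     for k in range(max_lag + 1)])

def pacf_durbin(r):
    """Durbin's recursion for the partial autocorrelations phi_kk,
    given autocorrelations r[0..K] (cf. V.I.1-29 and V.I.1-30)."""
    K = len(r) - 1
    phi = np.zeros((K + 1, K + 1))
    phi[1, 1] = r[1]
    for k in range(2, K + 1):
        num = r[k] - np.dot(phi[k - 1, 1:k], r[k - 1:0:-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], r[1:k])
        phi[k, k] = num / den
        for j in range(1, k):
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
    return np.array([phi[k, k] for k in range(1, K + 1)])

rng = np.random.default_rng(0)
x = rng.normal(size=400)      # white noise: all theoretical r_k (k > 0) are 0
r = sample_acf(x, 10)
se = 1.0 / np.sqrt(len(x))    # the usual approximation (cf. V.I.1-21)
print("r_1..r_10:", np.round(r[1:], 3))
print("2 s.e. band: +/-", round(2 * se, 3))
print("pacf:", np.round(pacf_durbin(r), 3))
```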

Finally, we define the following polynomial lag-processes

(V.I.1-32)

where B is the backshift operator (i.e. B^i Y_t = Y_{t-i}) and where

(V.I.1-33)

These polynomial expressions are used to define linear filters. By definition, a linear filter

(V.I.1-34)

generates a stochastic process

(V.I.1-35)

where a_t is a white noise variable.

(V.I.1-36)

for which the following is obvious

(V.I.1-37)

We call eq. (V.I.1-36) the random-walk model: a model that describes time series which fluctuate around X_0 in the short and in the long run (since a_t is white noise). It is interesting to note that a random walk is normally distributed. This can be proved by using the definition of white noise and computing the moment generating function of the random walk

(V.I.1-38)

(V.I.1-39)

from which we deduce

(V.I.1-40)

(Q.E.D.). A deterministic trend is generated by a random-walk model with an added constant

(V.I.1-41)

The trend can be illustrated by re-expressing (V.I.1-41) as

(V.I.1-42)

where ct is a linear deterministic trend (as a function of time t). The linear filter (V.I.1-35) is normally distributed with

(V.I.1-43)

due to the additivity property of eq. (I.III-33), (I.III-34), and (I.III-35) applied to a_t. Now the autocorrelation of a linear filter can be quite easily computed as

(V.I.1-44)

since

(V.I.1-45)

and

(V.I.1-46)

Now it is quite evident that, if the linear filter (V.I.1-35) generates the variable X_t, then X_t is a stationary stochastic process ((V.I.1-1) - (V.I.1-3)) defined by a normal distribution (V.I.1-4) (and therefore strongly stationary), with an autocovariance function (V.I.1-45) which depends only on the time-lag k. The set of equations resulting from a linear filter (V.I.1-35) with ACF (V.I.1-44) is sometimes called a set of stochastic difference equations. These stochastic difference equations can be used in practice to forecast (economic) time series. The forecasting function is given by

(V.I.1-47)

On using (V.I.1-35), the density of the forecasting function (V.I.1-47) is

(V.I.1-48)

where

(V.I.1-49)

is known, and therefore equal to a constant term. Therefore it is obvious that

(V.I.1-50)

(V.I.1-51)

The concepts defined and described above are all time-related. This implies, for instance, that autocorrelations are defined as a function of time. Historically, this time-domain viewpoint was preceded by the frequency-domain viewpoint, where it is assumed that time series consist of sine and cosine waves at different frequencies. In practice there are both advantages and disadvantages to each viewpoint; nevertheless, the two should be seen as complementary to each other. Consider the model

(V.I.1-52)

for the Fourier series model

(V.I.1-53)

In (V.I.1-53) we define

(V.I.1-54)

The least squares estimates of the parameters in (V.I.1-52) are computed by

(V.I.1-55)

In the case of a time series with an even number of observations, T = 2q, the same definitions are applicable, except for

(V.I.1-56)

It can furthermore be shown that

(V.I.1-57)

(V.I.1-58)

such that

(V.I.1-59)

(V.I.1-60)

Obviously

(V.I.1-61)

It is also possible to show that

(V.I.1-62)

If

(V.I.1-63)

then

(V.I.1-64)

and

(V.I.1-65)

and

(V.I.1-66)

and

(V.I.1-67)

and

(V.I.1-68)

which state the orthogonality properties of sinusoids, and which can be proved. Remark that (V.I.1-67) is a special case of (V.I.1-64), and (V.I.1-68) is a special case of (V.I.1-66). Eq. (V.I.1-66) is particularly interesting for our discussion in regard to (V.I.1-60) and (V.I.1-53), since it states that sinusoids are independent. If (V.I.1-52) is redefined as

(V.I.1-69)

then I(f) is called the sample spectrum. The sample spectrum is in fact a Fourier cosine transformation of the autocovariance function estimate. Denote the covariance estimate of (V.I.1-7) by the sample covariance (i.e. the numerator of (V.I.1-10)), the complex number by i, and the frequency by f; then

(V.I.1-70)

On using (V.I.1-55) and (V.I.1-70) it follows that

(V.I.1-71)

which can be substituted into (V.I.1-70), yielding

(V.I.1-72)

Now from (V.I.1-10) it follows that

(V.I.1-73)

and if (t - t') is substituted by k, then (V.I.1-72) becomes

(V.I.1-74)

which proves the link between the sample spectrum and the estimated autocovariance function. On taking expectations of the spectrum we obtain

(V.I.1-75)

for which it can be shown that

(V.I.1-76)

On combining (V.I.1-75) and (V.I.1-76), and on defining the power spectrum as p(f), we find

(V.I.1-77)

It is quite obvious that

(V.I.1-78)

so that it follows that the power spectrum converges if the covariance decreases rather quickly. The power spectrum is a Fourier cosine transformation of the (population) autocovariance function. This implies that for any theoretical autocovariance function (cf. the following sections) a corresponding theoretical power spectrum can be formulated. Of course the power spectrum can also be formulated with respect to autocorrelations instead of autocovariances

(V.I.1-79)

which is the so-called spectral density function. Since

(V.I.1-80)

it follows that

(V.I.1-81)

and since g(f) > 0, the properties of g(f) are quite similar to those of a frequency distribution function. Since it can be shown that the sample spectrum fluctuates wildly around the theoretical power spectrum, a modified (i.e. smoothed) estimate of the power spectrum is suggested as

(V.I.1-82)
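As an illustration of how the sample spectrum follows from the sample autocovariances, here is a short Python sketch in the spirit of (V.I.1-74) and the smoothed estimate (V.I.1-82). The Bartlett lag window used for the smoothing is an assumed choice of weights, since the original weighting scheme is referenced only by number.

```python
import numpy as np

def sample_spectrum(x, freqs, M=None):
    """I(f) = 2 * (c_0 + 2 * sum_k c_k cos(2 pi f k)), cf. (V.I.1-74);
    if M is given, Bartlett lag-window weights are applied (an assumed
    smoothing kernel in the spirit of V.I.1-82)."""
    x = np.asarray(x, float)
    n = len(x)
    xc = x - x.mean()
    c = np.array([np.sum(xc[:n - k] * xc[k:]) / n for k in range(n)])
    K = M if M is not None else n - 1
    w = 1 - np.arange(K + 1) / (K + 1) if M is not None else np.ones(K + 1)
    k = np.arange(1, K + 1)
    return np.array([2 * (w[0] * c[0]
                          + 2 * np.sum(w[1:] * c[1:K + 1]
                                       * np.cos(2 * np.pi * f * k)))
                     for f in freqs])

rng = np.random.default_rng(1)
x = rng.normal(size=512)                 # white noise has a flat spectrum
freqs = np.linspace(0.01, 0.5, 5)
print(np.round(sample_spectrum(x, freqs, M=20), 2))
```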

b. The AR(1) process

The AR(1) process is defined as

(V.I.1-83)

where W_t is a stationary time series, e_t is a white noise error term, and F_t is called the forecasting function. We now derive the theoretical pattern of the ACF of an AR(1) process for identification purposes. First, we note that (V.I.1-83) may alternatively be written in the form

(V.I.1-84)

Second, we multiply the AR(1) process in (V.I.1-83) by W_{t-k}, in expectations form

(V.I.1-85)

Since we know that for k = 0 the RHS of eq. (V.I.1-85) may be rewritten as

(V.I.1-86)

and that for k > 0 the RHS of eq. (V.I.1-85) is

(V.I.1-87)

we may write the LHS of (V.I.1-85) as

(V.I.1-88)

From (V.I.1-88) we deduce

(V.I.1-89)

and

(V.I.1-90)

(figure V.I.1-1)

We can now easily observe what the theoretical ACF of an AR(1) process should look like. Note that we have already added the theoretical PACF of the AR(1) process, since the first partial autocorrelation coefficient is exactly equivalent to the first autocorrelation coefficient. In general, a linear filter process is stationary if the ψ(B) polynomial converges. Remark that the AR(1) process is stationary if the solution of (1 - φ₁B) = 0 is larger than 1 in absolute value (i.e. the root of φ(B) = 0 lies outside the unit circle). This solution is φ₁⁻¹. Hence, if the absolute value of the AR(1) parameter is less than 1, the model is stationary, which can be illustrated by the fact that

(V.I.1-91)

For a general AR(p) model, the solutions of

(V.I.1-92)

must satisfy

(V.I.1-93)

in order to obtain stationarity.
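These stationarity statements are easy to verify numerically. The sketch below simulates an AR(1) process with an illustrative parameter value |φ₁| < 1 and compares the sample autocorrelations with the theoretical pattern ρ_k = φ₁^k of (V.I.1-90):

```python
import numpy as np

phi = 0.7                                # AR(1) parameter (illustrative value)
assert abs(phi) < 1, "stationarity requires |phi| < 1"

rng = np.random.default_rng(2)
n, burn = 500, 100
e = rng.normal(size=n + burn)
w = np.zeros(n + burn)
for t in range(1, n + burn):             # W_t = phi * W_{t-1} + e_t
    w[t] = phi * w[t - 1] + e[t]
w = w[burn:]                             # drop the start-up transient

# theoretical ACF of a stationary AR(1): rho_k = phi**k (cf. V.I.1-90)
for k in range(1, 6):
    r_k = np.corrcoef(w[:-k], w[k:])[0, 1]
    print(f"k={k}: sample {r_k:+.3f}  theoretical {phi**k:+.3f}")
```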

c. The AR(2) process

The AR(2) process is defined as

(V.I.1-94)

where W_t is a stationary time series, e_t is a white noise error term, and F_t is the forecasting function. The process defined in (V.I.1-94) can be written in the form

(V.I.1-95)

and therefore

(V.I.1-96)

Now, for (V.I.1-96) to be valid, it easily follows that

(V.I.1-97)

and that

(V.I.1-98)

and that

(V.I.1-99)

and finally that

(V.I.1-100)

The model is stationary if the ψ_i weights converge. This is the case when certain conditions on φ₁ and φ₂ are imposed. These conditions can be found from the solutions of the polynomial of the AR(2) model. The so-called characteristic equation is used to find these solutions

(V.I.1-101)

The solutions, in terms of φ₁ and φ₂, are

(V.I.1-102)

which can be either real or complex. Notice that the roots are complex if φ₁² + 4φ₂ < 0. When these solutions are, in absolute value, smaller than 1, the AR(2) model is stationary. Later, it will be shown that these conditions are satisfied if φ₁ and φ₂ lie in a (Stralkowski) triangular region restricted by

(V.I.1-103)

The derivation of the theoretical ACF and PACF for an AR(2) model is described below. On multiplying the AR(2) model by W_{t-k} and taking expectations, we obtain

(V.I.1-104)

From (V.I.1-97) and (V.I.1-98) it follows that

(V.I.1-105)

Now it is possible to combine (V.I.1-104) with (V.I.1-105) such that

(V.I.1-106)

from which it follows that

(V.I.1-107)

Therefore

(V.I.1-108)

Eq. (V.I.1-106) can be rewritten as

(V.I.1-109)

such that, on using (V.I.1-108), it is obvious that

(V.I.1-110)

According to (V.I.1-107), the ACF satisfies a second-order stochastic difference equation of the form

(V.I.1-111)

where (due to (V.I.1-108))

(V.I.1-112)

are the starting values of the difference equation. In general, the solution to this difference equation is, according to Box and Jenkins (1976), given by

(V.I.1-113)

In particular, three different cases can be worked out for the solutions of the difference equation

(V.I.1-114)

of (V.I.1-102). The general solution of eq. (V.I.1-113) can be written in the form

(V.I.1-115)

(V.I.1-116)

Remark that for this case the following stationarity conditions

(V.I.1-117)

(V.I.1-118)

have two solutions: one due to (V.I.1-114), and one due to

(V.I.1-119)

Hence we find the general solution to the difference equation

(V.I.1-120)

In order to impose convergence, the following must hold

(V.I.1-121)

Hence two conditions have to be satisfied

(V.I.1-122)

which describes a part of a parabola consisting of acceptable parameter values for φ₁ and φ₂. Remark that this parabola is the frontier between acceptable real-valued roots and acceptable complex roots (cf. the triangle of Stralkowski). The complex roots can be written as

(V.I.1-123)

in trigonometric (polar) notation. The general solution for the second-order difference equation can be found by

(V.I.1-124)

On defining

(V.I.1-125)

the ACF can be shown to be real-valued, since

(V.I.1-126)

On using the property

(V.I.1-127)

eq. (V.I.1-126) becomes

(V.I.1-128)

with

(V.I.1-129)

Eq. (V.I.1-128) shows that the ACF oscillates with a period of 2π divided by the angular frequency of the complex roots, and with a variable amplitude of

(V.I.1-130)

as a function of k. A useful equation can be found to compute the period of the pseudo-periodic behavior of the time series:

(V.I.1-131)

which must satisfy the convergence condition (i.e. the amplitude is exponentially decreasing)

(V.I.1-132)

The pattern of the theoretical PACF can be deduced from relations (V.I.1-25) - (V.I.1-28). The theoretical ACF and PACF are illustrated below. Figure (V.I.1-2) contains two possible ACF and PACF patterns for real roots, while figure (V.I.1-3) shows the ACF and PACF patterns when the roots are complex.

(figure V.I.1-2)

(figure V.I.1-3)
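The root conditions can be checked mechanically. The sketch below evaluates the triangle conditions, computes the roots of the characteristic equation 1 - φ₁B - φ₂B² = 0, and, when the roots are complex, the pseudo-period of the ACF. Both the spelled-out triangle inequalities and the closed-form period are the standard Box-Jenkins expressions, assumed here to correspond to (V.I.1-103) and (V.I.1-131).

```python
import numpy as np

def ar2_diagnostics(phi1, phi2):
    # Stralkowski triangle (assumed standard form of V.I.1-103):
    # phi2 + phi1 < 1, phi2 - phi1 < 1, |phi2| < 1
    stationary = (phi2 + phi1 < 1) and (phi2 - phi1 < 1) and (abs(phi2) < 1)
    roots = np.roots([-phi2, -phi1, 1.0])   # roots of 1 - phi1*B - phi2*B**2
    complex_roots = phi1**2 + 4 * phi2 < 0
    out = dict(stationary=stationary, roots=roots, complex=complex_roots)
    if complex_roots:
        # pseudo-period of the damped-sine ACF; standard closed form,
        # assumed to match (V.I.1-131)
        out["period"] = 2 * np.pi / np.arccos(phi1 / (2 * np.sqrt(-phi2)))
    return out

print(ar2_diagnostics(1.0, -0.5))   # complex roots -> oscillating ACF, period 8
print(ar2_diagnostics(0.5, 0.3))    # real roots -> mixture of exponentials
```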

d. The AR(p) process

An AR(p) process is defined by

(V.I.1-133)

where W_t is a stationary time series, e_t is a white noise error component, and F_t is the forecasting function. As described above, the AR(p) process can be written as

(V.I.1-134)

Hence

(V.I.1-135)

The ψ weights converge if the stationarity conditions hold for the roots of the characteristic equation

(V.I.1-136)

The variance can be shown to be

(V.I.1-137)

(V.I.1-138)

which can be used to study the behavior of the theoretical ACF pattern. Remember that the Yule-Walker relations (V.I.1-26) and (V.I.1-27) hold for all AR(p) models. These can be used (together with the application of Cramer's Rule, (V.I.1-28)) to derive the theoretical PACF pattern from the theoretical ACF.
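This route from the theoretical ACF to the theoretical PACF can be sketched in a few lines: for each k, solve the Yule-Walker system (V.I.1-27) and keep the last coefficient, which is numerically equivalent to applying Cramer's Rule as in (V.I.1-28).

```python
import numpy as np

def pacf_yule_walker(rho):
    """phi_kk from the Yule-Walker system: for each k, solve R_k a = r_k
    and take the last component (equivalent to Cramer's Rule, V.I.1-28)."""
    K = len(rho) - 1
    out = []
    for k in range(1, K + 1):
        R = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
        a = np.linalg.solve(R, rho[1:k + 1])
        out.append(a[-1])
    return np.array(out)

# check on an AR(1) with phi = 0.6: rho_k = 0.6**k, so phi_11 = 0.6
# and phi_kk = 0 for all k > 1
rho = 0.6 ** np.arange(6)
print(np.round(pacf_yule_walker(rho), 6))
```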

e. The MA(1) process

The definition of the MA(1) process is given by

(V.I.1-139)

where W_t is a stationary time series, e_t is a white noise error component, and F_t is the forecasting function. On substituting into eq. (V.I.1-46) and (V.I.1-45) we obtain

(V.I.1-140)

Therefore the pattern of the theoretical ACF is

(V.I.1-141)

Note that from eq. (V.I.1-141) it follows that

(V.I.1-142)

This implies that there exist at least two MA(1) processes which generate the same theoretical ACF. Since an MA process consists of a finite number of ψ weights, it follows that the process is always stationary. However, it is necessary to impose so-called invertibility restrictions such that the MA(q) process can be rewritten as an AR(∞) model. For the MA(1) process this requires that

(V.I.1-143)

converges. On using the Yule-Walker equations and eq. (V.I.1-141), it can be shown that the theoretical PACF is

(V.I.1-144)

Hence the theoretical PACF is dominated by a decaying exponential function.

The theoretical ACF and PACF for the MA(1) are illustrated in figure (V.I.1-4).

(figure V.I.1-4)
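A quick numerical check of this cut-off pattern, assuming the usual Box-Jenkins sign convention W_t = e_t - θ₁e_{t-1}: then ρ₁ = -θ₁/(1 + θ₁²), ρ_k = 0 for k > 1, and θ₁ and 1/θ₁ yield the same ρ₁, which illustrates the duality noted after (V.I.1-142).

```python
import numpy as np

def ma1_rho1(theta):
    # rho_1 for W_t = e_t - theta * e_{t-1} (assumed sign convention)
    return -theta / (1.0 + theta**2)

theta = 0.5
print(ma1_rho1(theta), ma1_rho1(1.0 / theta))   # both -0.4: two processes,
                                                # one and the same ACF
rng = np.random.default_rng(3)
e = rng.normal(size=50001)
w = e[1:] - theta * e[:-1]
for k in (1, 2, 3):                             # the ACF cuts off after lag 1
    print(k, round(np.corrcoef(w[:-k], w[k:])[0, 1], 3))
```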

f. The MA(2) process

By definition, the MA(2) process is

(V.I.1-145)

which can be rewritten, on using (V.I.1-139), as

(V.I.1-146)

where W_t is a stationary time series, e_t is a white noise error component, and F_t is the forecasting function. On substituting the weights of (V.I.1-146) into eq. (V.I.1-46) and (V.I.1-45) we obtain

(V.I.1-147)

Hence the theoretical ACF can be deduced

(V.I.1-148)

The invertibility conditions can be shown to be

(V.I.1-149)

(compare with the stationarity conditions of the AR(2) process). The deduction of the theoretical PACF is rather complicated, but the PACF can be shown to be dominated by the sum of two exponentials (in the case of real roots), or by decreasing sine waves (in case the roots are complex). These two possible cases are shown in figures (V.I.1-5) and (V.I.1-6).

(figure V.I.1-5)

(figure V.I.1-6)
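The invertibility region can be verified through the roots of the MA polynomial, exactly as the AR(2) stationarity conditions were verified; the form W_t = e_t - θ₁e_{t-1} - θ₂e_{t-2} is assumed.

```python
import numpy as np

def ma2_invertible(t1, t2):
    """Invertibility check for W_t = e_t - t1*e_{t-1} - t2*e_{t-2}:
    the roots of theta(B) = 1 - t1*B - t2*B**2 must lie outside the
    unit circle (mirrors the AR(2) stationarity triangle, cf. V.I.1-149)."""
    roots = np.roots([-t2, -t1, 1.0])
    return bool(np.all(np.abs(roots) > 1.0))

print(ma2_invertible(0.4, 0.2))    # True: inside the invertibility region
print(ma2_invertible(1.2, 0.5))    # False: t1 + t2 >= 1
```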

g. The MA(q) process

The MA(q) process is defined by

(V.I.1-150)

where W_t is a stationary time series, e_t is a white noise error component, and F_t is the forecasting function. Remark that this description of the MA(q) process is not straightforward to use for forecasting purposes, due to its recursive character. On substituting the weights into (V.I.1-46) and (V.I.1-45), the following autocovariances can be deduced

(V.I.1-151)

Hence the theoretical ACF is

(V.I.1-152)

The theoretical PACF for higher-order MA(q) processes is extremely complicated and not extensively discussed in the literature.

(V.I.1-153)
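A small sketch of the general pattern, assuming the MA(q) form W_t = e_t - θ₁e_{t-1} - ... - θ_q e_{t-q}: the autocovariances are sums of products of the weights, and the ACF cuts off after lag q.

```python
import numpy as np

def ma_acf(thetas, max_lag):
    """Theoretical ACF of an MA(q) process via
    gamma_k = sigma_e^2 * sum_j psi_j psi_{j+k}, with psi = (1, -theta_1,
    ..., -theta_q); rho_k = 0 for k > q (cf. V.I.1-151/152)."""
    psi = np.concatenate(([1.0], -np.asarray(thetas, float)))
    q = len(psi) - 1
    gamma = [np.sum(psi[:q + 1 - k] * psi[k:]) for k in range(q + 1)]
    rho = np.array(gamma) / gamma[0]
    return np.concatenate((rho, np.zeros(max_lag - q)))   # cut-off after q

print(np.round(ma_acf([0.5, -0.3], 6), 3))   # non-zero only up to lag 2
```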

h. The ARMA(1,1) process

On combining an AR(1) and an MA(1) process, one obtains an ARMA(1,1) model, which is defined as

(V.I.1-154)

where W_t is a stationary time series, e_t is a white noise error component, and F_t is the forecasting function. Note that the model of (V.I.1-154) may alternatively be written as

(V.I.1-155)

such that

(V.I.1-156)

in ψ-weight notation. The ψ-weights can be related to the ARMA parameters on using

(V.I.1-157)

such that the following is obtained

(V.I.1-158)

Also the π-weights can be related to the ARMA parameters on using

(V.I.1-159)

such that the following is obtained

(V.I.1-160)

From (V.I.1-158) and (V.I.1-160) it can be clearly seen that an ARMA(1,1) is in fact a parsimonious description of either an AR or an MA process with an infinite number of weights. This does not imply that all higher-order AR(p) or MA(q) processes may be written as an ARMA(1,1). In practice, though, an ARMA process (i.e. a mixed model) is quite frequently capable of capturing higher-order pure AR weights or pure MA weights. On writing the ARMA(1,1) process as

(V.I.1-161)

(which is a difference equation) we may multiply by W_{t-k} and take expectations. This gives

(V.I.1-162)

In case k > 1 the RHS of (V.I.1-162) is zero, thus

(V.I.1-163)

If k = 0 or if k = 1, then

(V.I.1-164)

Hence we obtain

(V.I.1-165)

The theoretical ACF is therefore

(V.I.1-166)

The theoretical ACF and PACF patterns for the ARMA(1,1) are illustrated in figures (V.I.1-7), (V.I.1-8), and (V.I.1-9).

(figure V.I.1-7)

(figure V.I.1-8)

(figure V.I.1-9)
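The theoretical ACF can be checked by simulation. The closed form used below, ρ₁ = (1 - φ₁θ₁)(φ₁ - θ₁)/(1 + θ₁² - 2φ₁θ₁) with ρ_k = φ₁ρ_{k-1} for k ≥ 2, is the standard Box-Jenkins result and is assumed to correspond to (V.I.1-166).

```python
import numpy as np

phi, theta = 0.8, 0.4        # illustrative ARMA(1,1) parameter values
rho1 = (1 - phi * theta) * (phi - theta) / (1 + theta**2 - 2 * phi * theta)

rng = np.random.default_rng(4)
n, burn = 20000, 200
e = rng.normal(size=n + burn)
w = np.zeros(n + burn)
for t in range(1, n + burn):  # W_t = phi*W_{t-1} + e_t - theta*e_{t-1}
    w[t] = phi * w[t - 1] + e[t] - theta * e[t - 1]
w = w[burn:]

for k in (1, 2, 3):           # after lag 1 the ACF decays like an AR(1)
    sample = np.corrcoef(w[:-k], w[k:])[0, 1]
    print(f"k={k}: sample {sample:+.3f}  theory {rho1 * phi**(k - 1):+.3f}")
```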

i. The ARMA(p,q) process

The general ARMA(p,q) process can be defined by

(V.I.1-167)

or alternatively in MA(∞) notation

(V.I.1-168)

or in AR(∞) notation

(V.I.1-169)

where π(B) = 1/ψ(B). The stationarity conditions depend on the AR part: the roots of φ(B) = 0 must be larger than 1 in absolute value. The invertibility conditions depend only on the MA part: the roots of θ(B) = 0 must also be larger than 1 in absolute value. The theoretical ACF and PACF patterns are deduced from the so-called difference equations

(V.I.1-170)

k. Non-stationary time series

Most economic (and many other) time series do not satisfy the stationarity conditions stated earlier, for which the ARMA models have been derived. Such time series are called non-stationary and should be re-expressed such that they become stationary with respect to the variance and the mean. It is not suggested that the description of the following re-expression tools is exhaustive! Rather, they form a set of tools which have been shown to be useful in practice. It is quite evident that many extensions are possible with respect to re-expression tools: these are discussed in the literature, e.g. in JENKINS (1976 and 1978), MILLS (1990), MCLEOD (1983), etc.

Transformation of time series

If we write a time series as the sum of a deterministic mean and a disturbance term

(V.I.1-193)

(V.I.1-194)

where h is an arbitrary function.

(V.I.1-195)

This can be used to obtain the variance of the transformed series

(V.I.1-196)

which implies that the variance can be stabilized by imposing

(V.I.1-197)

Accordingly, if the standard deviation of the series is proportional to the mean level

(V.I.1-198)

then

(V.I.1-199)

from which it follows that

(V.I.1-200)

In case the variance of the series is proportional to the mean level, then

(V.I.1-201)

from which it follows that

(V.I.1-202)

With the use of a Standard Deviation / Mean Procedure (SMP) we are able to detect heteroskedasticity in the time series. Moreover, with the help of the SMP it is quite often possible to find an appropriate transformation which will ensure that the time series is homoskedastic. In fact, it is assumed that there exists a relationship between the mean level of the time series and the variance or standard deviation, as in

(V.I.1-203)

which is an explicitly assumed relationship, in contrast to (V.I.1-194). The SMP is generated by a three-step process:

the time series is split into equal (chronological) segments;

for each segment the arithmetic mean and standard deviation is computed;

the mean and S.D. of each segment is plotted or regressed against each other.

By selecting the length of the segments equal to the seasonal period, one ensures that the S.D. and mean are independent of the seasonal period. In practice one of the following patterns will be recognized (as summarized in the graph). Note that the lambda parameter should take a value of zero when a linearly proportional association between the S.D. and the mean is recognized. The value of lambda is in fact the transformation parameter, which implies the following:

(V.I.1-204)
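A sketch of the SMP in code. The mapping from the fitted slope b of log(S.D.) on log(mean) to the transformation parameter, λ = 1 - b, is a commonly used rule of thumb and is assumed here, since the original chart is referenced only by number.

```python
import numpy as np

def smp_lambda(x, seg_len):
    """Standard Deviation / Mean Procedure: split the series into
    chronological segments, compute each segment's mean and S.D.,
    regress log(S.D.) on log(mean), and map the slope b to a
    power-transform parameter lambda = 1 - b (lambda = 0, i.e. logs,
    when the S.D. is proportional to the mean)."""
    x = np.asarray(x, float)
    n_seg = len(x) // seg_len
    segs = x[:n_seg * seg_len].reshape(n_seg, seg_len)
    m, s = segs.mean(axis=1), segs.std(axis=1, ddof=1)
    b = np.polyfit(np.log(m), np.log(s), 1)[0]
    return 1.0 - b

rng = np.random.default_rng(5)
level = np.linspace(10, 100, 240)
x = level * (1 + 0.1 * rng.normal(size=240))   # S.D. proportional to the mean
print(round(smp_lambda(x, 12), 2))             # close to 0 -> log transform
```

Choosing seg_len equal to the seasonal period (here, 12) follows the remark above about making the segment statistics independent of the seasonal period.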

(Figure V.I.1-10)

Differencing of time series

With the use of the Autocorrelation Function (ACF) (with the autocorrelations on the y axis and the time lags on the x axis), it is possible to detect non-stationarity of the time series with respect to the mean level.

(figure V.I.1-11)

When the ACF of the time series is slowly decreasing, this is an indication that the mean is not stationary. An example of such an ACF is given in figure (V.I.1-11). The differencing operator ∇ (nabla) is used to make the time series stationary.

l. Differencing (Nabla and B operator)

We have already used the backshift operator in previous sections. As we know, the backshift operator (B-operator) transforms an observation of a time series into the previous one

(V.I.1-205)

Also, it can easily be shown that

(V.I.1-206)

Note that if we backshift k times seasonally, we get

(V.I.1-207)

The backshift operator can be useful in defining the nabla operator, which is

(V.I.1-208)

or in general

(V.I.1-209)

which is sometimes also called the differencing operator. As stated before, a time series which is not stationary with respect to the mean can be made stationary by differencing. How can this be interpreted?

(figure V.I.1-12)

In figure (V.I.1-12) a function is displayed with two points on the graph, (a, f(a)) and (b, f(b)). Assume that a time series is generated by the function f(x). The derivative of the function gives the slope of the line tangent to the graph at every point of the function's domain. The derivative of a function is defined as

(V.I.1-210)

If we compute the slope of the chord in figure (V.I.1-12), this is in fact the same as the derivative of f(x) between a and b, with a discrete step instead of an infinitesimally small step. This results in computing

(V.I.1-211)

Although we have assumed the time series to be generated by f(x), in practice we only observe sample values at discrete time intervals. Therefore the best approximation of f(x) between two known points (a, f(a)) and (b, f(b)) is a straight line with slope given by (V.I.1-211). If this approximation is to be optimal, the distance between a and b should be as small as possible. Since the smallest difference between observations of an equally spaced time series is the time lag itself, the smallest value of h in eq. (V.I.1-211) is in fact equal to 1. Therefore (V.I.1-211) reduces to

(V.I.1-212)

which is nothing else but the differencing operator. We conclude that by differencing a time series we 'derive' the function by which it is generated, and therefore reduce the degree of the generating function by 1. If, for example, we had a time series generated by a quadratic function, we could make it stationary by differencing the series twice. Furthermore, it should be noted that if a time series is non-stationary, and must therefore be 'derived' to induce stationarity, the series is often said to be generated by an integrated process. The ARMA models described before can now be extended to the class of ARIMA (i.e. Autoregressive Integrated Moving Average) models.
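The quadratic example can be reproduced directly with the differencing operator:

```python
import numpy as np

t = np.arange(100, dtype=float)
rng = np.random.default_rng(6)
x = 2.0 + 0.5 * t + 0.05 * t**2 + rng.normal(size=100)  # quadratic trend

d1 = np.diff(x)        # nabla x_t = x_t - x_{t-1}: still trending (linearly)
d2 = np.diff(x, n=2)   # nabla^2 x_t: trend removed, roughly stationary

# the first difference still drifts; the second fluctuates about a constant
print(round(d1[:10].mean(), 2), round(d1[-10:].mean(), 2))
print(round(d2[:10].mean(), 2), round(d2[-10:].mean(), 2))
```

The first difference of a quadratic trend still drifts linearly; only the second difference fluctuates around a constant, in line with the 'reduce the degree by 1' interpretation above.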

m. The behavior of non-stationary time series

In the previous subsections, non-stationarity has been discussed at a rather intuitive level. Now we will discuss some more fundamental properties of the behavior of non-stationary time series. Consider a time series that is generated by

(V.I.1-213)

with φ*(B) a generalized AR operator which is not stationary: φ*(B) has d roots equal to 1, while all other roots lie outside the unit circle. Thus eq. (V.I.1-213) can be written by factoring out the unit roots

(V.I.1-214)

where φ(B) is stationary. In general, a univariate stochastic process such as (V.I.1-214) is denoted an ARIMA(p,d,q) model, where p is the autoregressive order, d is the number of non-seasonal differences, and q is the order of the moving average component. Quite evidently, time series exhibiting non-stationarity in both variance and mean are first to be transformed in order to induce a stable variance, and then differenced to achieve stationarity with respect to the mean level. The reason for this order is that power and logarithmic transformations are not always defined for negative (real) numbers. The ARIMA(p,d,q) model can be expanded by introducing deterministic d-order polynomial trends. This is simply achieved by adding a parameter constant to (V.I.1-214), expressed in terms of a (non-seasonal) non-stationary time series Z_t

(V.I.1-215)

The same properties can be achieved by writing (V.I.1-215) as an invertible ARMA process

(V.I.1-216)

where c is a parameter constant. This is because

(V.I.1-217)

Also remark that the p AR parameters must not add to unity, since this would, according to (V.I.1-217), imply (in the limit) an infinite mean level, which is obvious nonsense! An ARIMA model can generally be written as a difference equation. For instance, the ARIMA(1,1,1) model can be formulated as

(V.I.1-218)

which illustrates the postulated fact. This form of the ARIMA model is used for recursive forecasting purposes. The ARIMA model can also generally be written as a random shock model (i.e. a model in terms of the ψ-weights and the white noise error components): since

(V.I.1-219)

it follows that

(V.I.1-220)

Hence, if j is the maximum of (p + d - 1, q),

(V.I.1-221)

it follows that the ψ-weights satisfy

(V.I.1-222)

which implies that large-lag ψ-weights are composed of polynomials, damped exponentials, and damped sinusoids with respect to the index j. This form of the ARIMA model (i.e. eq. (V.I.1-219)) is used to compute the forecast confidence intervals. A third way of writing an ARIMA model is the truncated random shock model form.
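The ψ-weights of the random shock form can be computed by equating coefficients of B in φ(B)(1 - B)^d ψ(B) = θ(B). The sketch below does this for an illustrative ARIMA(1,1,1); the cumulative sums of squared ψ-weights are what enter the forecast confidence intervals (multiplied by the error variance).

```python
import numpy as np

def psi_weights(phi, theta, d, n):
    """psi-weights of an ARIMA(p,d,q): expand
    phi(B) * (1 - B)^d * psi(B) = theta(B)   (cf. V.I.1-219/220)
    by equating powers of B; phi and theta are the AR/MA coefficient
    lists, and psi_0 = 1."""
    # generalized AR operator phi*(B) = phi(B) * (1 - B)^d
    ar = np.array([1.0] + [-p for p in phi])
    for _ in range(d):
        ar = np.convolve(ar, [1.0, -1.0])
    th = np.array([1.0] + [-t for t in theta])
    psi = np.zeros(n + 1)
    psi[0] = 1.0
    for j in range(1, n + 1):
        acc = th[j] if j < len(th) else 0.0
        acc -= sum(ar[i] * psi[j - i]
                   for i in range(1, min(j, len(ar) - 1) + 1))
        psi[j] = acc
    return psi

# ARIMA(1,1,1) with phi = 0.5, theta = 0.3 (illustrative values)
psi = psi_weights([0.5], [0.3], d=1, n=8)
print(np.round(psi, 4))                 # weights approach a constant (d = 1)
# 5-step-ahead forecast error variance (for unit error variance):
print(round(np.cumsum(psi**2)[4], 4))   # sum of psi_0^2 .. psi_4^2
```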

The parameter k may be interpreted as the time origin of the observable data. First, we observe that if Y_t' is a particular solution of (V.I.1-213), thus if

(V.I.1-223)

then it follows from (V.I.1-213) and (V.I.1-223) that

(V.I.1-224)

Hence, the general solution of (V.I.1-213) is the sum of Y_t'' (i.e. a complementary function, which is the solution of (V.I.1-224)) and Y_t' (i.e. a particular integral, which is a particular solution of (V.I.1-213)). It can be shown that the complementary function satisfies

(V.I.1-225)

and that the general solution of the homogeneous difference equation with respect to time origin k < t is given by

(V.I.1-226)

(V.I.1-227)

(V.I.1-228)

since

(V.I.1-229)

see also (V.I.1-227).

The general complementary function for

(V.I.1-230)

is

(V.I.1-231)

with D_i described in

(V.I.1-232)

From (V.I.1-231) it can be concluded that the complementary function involves a mixture of polynomials, damped exponentials, and damped sinusoids. The particular integral can be written as

(V.I.1-233)

(with ψ-weights of the random shock model form) satisfying the ARIMA model structure (where B operates on t, not on k)

(V.I.1-234)

which can be easily proved on noting that

(V.I.1-235)

such that

(V.I.1-236)

Hence, if t - k > q, eq. (V.I.1-233) is the particular integral of (V.I.1-234).

If, in an extreme case, k = -∞, then

(V.I.1-237)

which is called the nontruncated random shock form of the ARIMA model.

(V.I.1-238)

(compare this result with (V.I.1-237)). Also remark that it is evident that

(V.I.1-239)

This implies that when using the complementary function for forecasting purposes, it is advisable to update the forecast as new observations become available.

o. Unit root tests

There are d unit roots in a non-stationary time series (with respect to the mean) if φ(B) is stationary and θ(B) is invertible in

(V.I.1-252)

The most frequently used test for unit roots is the augmented Dickey-Fuller (ADF) regression

(V.I.1-253)

(V.I.1-254)

An example of the use of the ADF is the following LR-test

(V.I.1-255)

where

(V.I.1-256)

(V.I.1-257)

Some critical 95% values for this LR-test (K1) are: 7.24 (for T = 24), 6.73 (for T = 50), 6.49 (for T = 100), and 6.25 (for T = 120). It is also possible to perform an Engle-Granger cointegration test between the variables X_{t,i}. This test estimates the cointegrating regression in a first step

(V.I.1-258)
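In practice both tests are available in standard software. The following sketch applies the statsmodels implementations to simulated data (the series and parameters are illustrative, not taken from the text):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, coint

rng = np.random.default_rng(7)
x = np.cumsum(rng.normal(size=300))          # random walk: one unit root

stat, pval, *_ = adfuller(x, regression="c", autolag="AIC")
print(f"ADF on levels:      stat={stat:+.2f}  p={pval:.3f}")  # not rejected

stat, pval, *_ = adfuller(np.diff(x), regression="c", autolag="AIC")
print(f"ADF on differences: stat={stat:+.2f}  p={pval:.3f}")  # rejected

# Engle-Granger: y shares the stochastic trend of x, so the residual of
# the first-step cointegrating regression should be stationary
y = 2.0 * x + rng.normal(size=300)
t_stat, p_value, _ = coint(y, x)
print(f"Engle-Granger: t={t_stat:+.2f}  p={p_value:.3f}")
```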