Econometric Forecasting
Robert M. Kunst, [email protected]
University of Vienna and Institute for Advanced Studies Vienna
November 10, 2012
Introduction · Model-free extrapolation · Univariate time-series models
Given some data (observations) on a (possibly multivariate) variable $x$, i.e. $x_1, \ldots, x_N$, we want to find a good approximation to the (as yet ‘unknown’) observation $x_{N+h}$. We use Chatfield’s notation: $\hat{x}_N(h)$ is an $h$-step forecast for $x_{N+h}$ given observations (a time series) until and including $x_N$.
The information set available at $t = N$ for the forecast necessarily includes the observed time series, but it may be much larger in practice.
Forecasting and predicting
To many authors, forecasting and prediction are equivalent. Some authors distinguish the terms: prediction is the technical word, forecasting relates predictions to the substance-matter environment.
Clements and Hendry define: predictability is a theoretical property (unconditional and conditional distributions differ), forecastability is the possibility that this property can be exploited in practice.
The words prognosis and projection are related but their usage is more restricted.
The participle forecasted is incorrect but ubiquitous in the literature.
Some extrapolation methods can be justified by time-series models. It is easier to evaluate model-based procedures, as the models can be simulated. With actual data, extrapolation can be a surprisingly good benchmark.
If decisions are based on forecasts, forecasts may affect the variables being forecast. Effects can be positive or negative:
1. self-fulfilling forecasts: a bad growth forecast may cause pessimism and decrease demand; a high inflation forecast may raise incentives for wage bargaining;
2. self-defeating forecasts: a high unemployment forecast may cause active labor market policies; a high inflation forecast may cause central banks to implement anti-inflationary policies.
A good excuse for inaccurate economic forecasts?
1. a true out-of-sample forecast $\hat{x}_N(h)$ only uses information over the time range $t \le N$. If it is model-based, all parameters are estimated for $t \le N$, and this includes data-based elements of model specification;
2. an in-sample forecast uses information over $t \le N + h$. Such information may be exogenous variables, or a model is fitted to a time range ending even after $N + h$. Forecast errors will be residuals, not true prediction errors.
In forecasting, good performance in out-of-sample prediction is viewed as the acid test for a good forecast model.
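A minimal sketch of a true out-of-sample exercise may make the distinction concrete. The function names and the naive last-value benchmark are assumptions chosen for illustration; any forecaster that only sees the history up to the forecast origin could be passed in instead.

```python
def naive_forecast(history, h):
    """Hypothetical benchmark: the last observed value serves as the h-step forecast."""
    return history[-1]

def out_of_sample_errors(x, h, first_origin, forecaster=naive_forecast):
    """Forecast x_{N+h} from x_1..x_N only, for each origin N >= first_origin."""
    errors = []
    for n in range(first_origin, len(x) - h + 1):
        history = x[:n]                     # information available at the origin
        forecast = forecaster(history, h)   # never sees x[n:] 
        errors.append(x[n + h - 1] - forecast)
    return errors

x = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
errs = out_of_sample_errors(x, h=1, first_origin=3)
mse = sum(e * e for e in errs) / len(errs)
```

If the forecaster involved estimated parameters, they would have to be re-estimated inside the loop from `history` alone for the errors to count as true prediction errors.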
Simple exponential smoothing (SES) determines the filtered value $\hat{x}_t$ as a weighted average of the latest observation and the previous filtered value:
$$\hat{x}_t = \alpha x_t + (1-\alpha)\hat{x}_{t-1}$$
The constant $\alpha \in (0, 1)$ is a damping or smoothing factor. This equation is called the ‘recurrence form’ of SES. Note that a smoothed past value needs to be known here.
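Repeated substitution in the SES recurrence gives $\hat{x}_t = \alpha \sum_{j=0}^{t-1} (1-\alpha)^j x_{t-j} + (1-\alpha)^t \hat{x}_0$, which is often given without the last term, assuming $\hat{x}_0 = 0$. Large $\alpha$ implies strong ‘discounting’ of the past and weak smoothing. In any case, the past enters with geometrically declining weights.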
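The SES recurrence and its geometrically declining weights can be checked numerically. This is only a sketch: the function names and the start value of zero are assumptions made for the illustration.

```python
def ses(x, alpha, start=0.0):
    """Filtered values from the SES recurrence; `start` is the initial smoothed value."""
    smoothed = []
    prev = start
    for obs in x:
        prev = alpha * obs + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed

def ses_weights(alpha, t):
    """Weights on x_t, x_{t-1}, ..., x_1 implied by the recurrence."""
    return [alpha * (1 - alpha) ** j for j in range(t)]

x = [10.0, 12.0, 11.0, 13.0]
alpha = 0.5
filtered = ses(x, alpha)
weights = ses_weights(alpha, len(x))
# With a zero start value, the weighted sum reproduces the last filtered value.
explicit = sum(w * obs for w, obs in zip(weights, reversed(x)))
```

With $\alpha = 0.5$ the weights halve at every lag, so the most recent observations dominate the filtered value.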
◮ SES and double exponential smoothing (DES) are quick and simple procedures that often perform well;
◮ SES can be shown to be equivalent to a time-series forecast based on an ARIMA(0,1,1) model, i.e. MA on first differences;
◮ DES is equivalent to a time-series forecast for a specific ARIMA(0,2,2) model, i.e. MA(2) on second differences with parameter restrictions on MA coefficients: not a very plausible model;
◮ For this reason, many forecasters avoid DES and use the more flexible Holt-Winters methods instead.
Holt’s method generalizes DES and introduces a second tuning parameter. It has two recursion equations, for local trend (or rather ‘slope’) $T$ and local level $L$:
$$L_t = \alpha x_t + (1-\alpha)(L_{t-1} + T_{t-1}),$$
$$T_t = \gamma (L_t - L_{t-1}) + (1-\gamma) T_{t-1}.$$
$L$ averages data and ‘forecast’, $T$ averages the old slope and the new slope estimate from $L$. The meaning of $T$ differs from DES! A very popular method.
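The two recursions translate directly into a short loop. The sketch below assumes that starting values for level and slope are supplied by the user; the function name and the example numbers are illustrative only.

```python
def holt(x, alpha, gamma, level0, trend0):
    """Run Holt's level and slope recursions; return the final level and slope."""
    level, trend = level0, trend0
    for obs in x:
        # Level: average of the new observation and the previous one-step 'forecast'.
        new_level = alpha * obs + (1 - alpha) * (level + trend)
        # Slope: average of the new slope estimate and the old slope.
        trend = gamma * (new_level - level) + (1 - gamma) * trend
        level = new_level
    return level, trend

x = [2.0, 4.0, 6.0, 8.0]
level, trend = holt(x, alpha=0.5, gamma=0.5, level0=0.0, trend0=2.0)
```

For exactly linear data with matching starting values, the recursions reproduce the line: the final level equals the last observation and the slope stays at 2.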
Forecasting using Holt’s method
The standard definition for $h$-step forecasts is
$$\hat{x}_N(h) = L_N + h T_N,$$
such that $\hat{x}_{t-1}(1) = L_{t-1} + T_{t-1}$ is a smoothed version of $x_t$. Gardner & McKenzie suggest forecasting from Holt’s method via
$$\hat{x}_N(h) = L_N + \left( \sum_{j=1}^{h} \phi^j \right) T_N.$$
This forecast corresponds to an ARIMA(1,1,2) generating model, while Holt’s method relies on ARIMA(0,2,2). Chatfield warns that smoothing extrapolations are not genuinely justified by prediction in time-series models, as they would imply absurd parameter restrictions.
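The two forecast rules differ only in how the slope is accumulated over the horizon. A small sketch, with illustrative values for the final level and slope and hypothetical function names:

```python
def holt_forecast(level, trend, h):
    """Standard h-step forecast: level plus h times the slope."""
    return level + h * trend

def damped_forecast(level, trend, h, phi):
    """Damped variant: slope contributions shrink geometrically with phi."""
    damp = sum(phi ** j for j in range(1, h + 1))
    return level + damp * trend

level, trend = 8.0, 2.0                          # illustrative final level and slope
standard = holt_forecast(level, trend, 3)
damped = damped_forecast(level, trend, 3, 0.5)
undamped = damped_forecast(level, trend, 3, 1.0)  # phi = 1 gives the standard rule
```

For $\phi < 1$ the damped forecast flattens out as $h$ grows, whereas the standard forecast keeps extrapolating the last slope linearly.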
Quarterly and monthly economic data often have considerable seasonal variation. Traditional seasonal models distinguish multiplicative seasonality (seasonal factors) and additive seasonality (seasonal dummy intercepts). The Holt-Winters method allows for slow changes in these seasonal factors and intercepts.
◮ The procedure needs three smoothing parameters that are often determined by least-squares fitting. Low $\gamma$ prescribes a time-constant, deterministic seasonal cycle.
◮ Convenient starting values for $T$ are sample averages over $\Delta x$. For $S$, one may use $s$ averages over the specific season. Averages may be restricted to a first portion of the sample.
◮ While the Holt method corresponds to an ARIMA(0,2,2) generating model, there is no simple time-series model that justifies Holt-Winters. Nonetheless, the method works well in practice.
The time-series researchers Brockwell & Davis suggested an appealingly simple alternative to the complex Holt-Winters algorithm:
1. Calculate annual averages for the series, interpret them as ‘trend’, and subtract the trend from the observed $x_t$ to yield a seasonal, but not trending $\tilde{x}_t$;
2. Calculate averages for each season in $\tilde{x}_t$ over all years, which yields an estimate of the seasonal cycle;
3. Extrapolate the trend (which one?) plus cycle into the future to obtain a forecast.
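The three steps can be sketched as follows. Step 3 leaves the trend extrapolation open, so the sketch simply extends the annual averages by their mean change per year; that choice, the function name, and the toy data are assumptions, not part of the original proposal.

```python
def seasonal_average_forecast(x, s, h):
    """x covers whole years, s is the number of seasons per year, h the horizon."""
    years = [x[i:i + s] for i in range(0, len(x), s)]
    annual = [sum(y) / s for y in years]                      # step 1: 'trend'
    detrended = [[v - a for v in y] for y, a in zip(years, annual)]
    cycle = [sum(d[k] for d in detrended) / len(years)        # step 2: seasonal cycle
             for k in range(s)]
    # Step 3 (assumption): extend the annual averages by their mean yearly change.
    slope = (annual[-1] - annual[0]) / (len(annual) - 1)
    forecasts = []
    for step in range(1, h + 1):
        pos = len(x) + step - 1              # 0-based index of the target period
        years_ahead = pos // s - (len(years) - 1)
        forecasts.append(annual[-1] + slope * years_ahead + cycle[pos % s])
    return forecasts

x = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]           # three 'years' with two seasons each
fc = seasonal_average_forecast(x, s=2, h=2)
```

The sketch needs at least two complete years of data; other trend extrapolations (for example a flat continuation of the last annual average) would only change the line that computes `slope`.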
Forecasting using time-series models
These methods tentatively assume a data-generating process; the degree of belief in their model varies among researchers: “All models are wrong, some are useful” [G.E.P. Box]
Notes:
◮ ‘useful’ may refer to forecasting;
◮ ‘Model’ usually refers to a parametric model class. The best-fitting or true parameter value is unknown and has to be estimated;
◮ knowing the true model class does not guarantee the best forecasting performance if parameters have to be estimated. The wrong model class may outperform the wrong parameter in the true class: the true model class can be ‘useless’;
◮ simple linear time-series models are good forecasters even if the data-generation process is nonlinear.
The current $X_t$ depends on its past and on an error:
$$X_t = g(X_{t-1}, X_{t-2}, \ldots; \theta) + \varepsilon_t,$$
where $g$ is a nonlinear or linear function, $\theta$ is an unknown parameter, and $(\varepsilon_t)$ is an unobserved error process. $(\varepsilon_t)$ is often assumed i.i.d. but is at least a martingale-difference sequence (MDS) defined by
$$E(\varepsilon_t | I_{t-1}) = 0.$$
$I_{t-1}$ is an information set containing the process past. White noise (uncorrelated) $(\varepsilon_t)$ is not sufficient for prediction!
The conditional expectation $E(X_t | I_{t-1}) = g(X_{t-1}, X_{t-2}, \ldots; \theta)$ is a convenient forecast $\hat{X}_{t-1}(1)$. It is easily shown that it minimizes the expected squared prediction error $E(e_t^2)$ with $e_t = X_t - \hat{X}_t$ among all feasible $\hat{X}_t$.
If $\theta$ is unknown, it is estimated from the sample and plugged in, as if it were known. If the model class is correct and the sample is large, many authors claim that the reduction in accuracy is minor.
The AR(p) model is said to be stable if it permits a stationary process that satisfies the equation and if the future depends on the past in that solution. For example, $X_t = 2X_{t-1} + \varepsilon_t$ has a stationary solution that is useless for forecasting. A stable model is also called asymptotically stationary.
The AR(p) model is stable if its characteristic polynomial equation
$$1 - \phi_1 z - \phi_2 z^2 - \ldots - \phi_p z^p = 0$$
has only roots greater than one in modulus. For small $p$, this property is easily checked by hand.
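For larger $p$ the root condition is easier to check numerically. The sketch below uses `numpy.roots`; the function name `ar_is_stable` is an assumption for illustration.

```python
import numpy as np

def ar_is_stable(phi):
    """phi = [phi_1, ..., phi_p]; check the roots of 1 - phi_1 z - ... - phi_p z^p."""
    # numpy.roots expects coefficients ordered from the highest power to the constant.
    coeffs = [-c for c in reversed(phi)] + [1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

stable_ar1 = ar_is_stable([0.5])         # root z = 2
explosive_ar1 = ar_is_stable([2.0])      # root z = 0.5, the example from the text
stable_ar2 = ar_is_stable([0.5, 0.3])
unstable_ar2 = ar_is_stable([0.5, 0.6])
```

With estimated coefficients, roots close to the unit circle should be read with care, since sampling error can move them to either side.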
Time-series analysis knows three approaches for lag-order search:
1. Plot the empirical partial autocorrelation function (PACF) $\rho_P(k)$ that should differ from 0 for $k \le p$ and equal 0 for $k > p$ (recommended by Box & Jenkins);
2. fit AR(p) models for different $p$ and test residuals for white noise: choose the smallest $p$ such that the test is passed (unreliable);
3. fit AR(p) models for different $p$ and calculate information criteria IC(p) for each model: choose the $p$ that minimizes the criterion.
The information-criterion approach is the most suitable one for forecasting.
Information criteria
There are two main classes of information criteria:
1. Consistent criteria: as the sample size $N \to \infty$, the true lag order tends to be found with probability one, for example Schwarz’ BIC;
2. efficient criteria: as the sample size $N \to \infty$, the forecast based on the selected model minimizes the expected mean-squared error, for example Akaike’s AIC
$$\mathrm{AIC} = \log \hat{\sigma}^2(p) + \frac{2p}{N},$$
with $\hat{\sigma}^2(p)$ an estimated error variance from an AR(p) model.
If the aim is forecasting, criteria of the second class, which includes AIC, AICu, AICc, FPE, may be a natural choice.
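A lag-order search by AIC might look as follows. This is a sketch under several assumptions: AR(p) models are fitted by ordinary least squares with an intercept, all candidate orders share the same estimation sample, $N$ is taken as the number of usable observations, and the data are simulated from an AR(1) with coefficient 0.7.

```python
import numpy as np

def fit_ar_aic(x, p, max_p):
    """OLS fit of AR(p) on a common sample; returns log(sigma2) + 2p/N."""
    x = np.asarray(x, dtype=float)
    y = x[max_p:]
    n = len(y)
    # Intercept plus lags 1..p, aligned with y.
    cols = [np.ones(n)] + [x[max_p - k:len(x) - k] for k in range(1, p + 1)]
    design = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / n
    return np.log(sigma2) + 2 * p / n

def select_order(x, max_p):
    """Return the order minimizing AIC together with all AIC values."""
    aics = [fit_ar_aic(x, p, max_p) for p in range(max_p + 1)]
    return int(np.argmin(aics)), aics

rng = np.random.default_rng(0)
eps = rng.standard_normal(500)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.7 * x[t - 1] + eps[t]       # simulated AR(1)
best_p, aics = select_order(x, max_p=4)
```

Because AIC is efficient rather than consistent, the selected order may exceed the true order of one in some samples; replacing the penalty `2 * p / n` by `p * np.log(n) / n` would give a BIC-type criterion.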
In iterated plugging-in, the forecast $\hat{X}_t(h)$ is formed as
$$\hat{X}_t(h) = \zeta_h X_t + \zeta_{h+1} X_{t-1} + \ldots + \zeta_{h+p-1} X_{t-p+1},$$
with $\zeta_j$ depending on $\phi_1, \ldots, \phi_p$. Alternatively, one may fit models of the type
$$X_t = \phi_h X_{t-h} + \ldots + \phi_p X_{t-p} + \varepsilon_t$$
to the sample and use
$$\hat{X}_t(h) = \phi_h X_t + \ldots + \phi_p X_{t-p+h}.$$
The relative merits of this direct modeling method are an issue of ongoing research. For small $h$ and correctly specified models, iterated forecasting can be shown to be better.
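The difference between the two approaches can be seen in a few lines. The sketch assumes known coefficients and a zero-mean process; both function names are hypothetical.

```python
def iterated_forecast(history, phi, h):
    """Iterate the one-step AR(p) equation h times, plugging in earlier forecasts."""
    values = list(history)
    for _ in range(h):
        nxt = sum(c * values[-k] for k, c in enumerate(phi, start=1))
        values.append(nxt)
    return values[-1]

def direct_forecast(history, phi_direct):
    """Apply coefficients of a model fitted on lags h..p directly to the data.

    phi_direct[0] multiplies X_t, phi_direct[1] multiplies X_{t-1}, and so on.
    """
    return sum(c * history[-1 - i] for i, c in enumerate(phi_direct))

# AR(1) with phi = 0.5: the iterated 3-step forecast is 0.5**3 * X_t.
it_ar1 = iterated_forecast([8.0], [0.5], 3)
# A direct model on lag 3 with the matching coefficient gives the same number.
di_ar1 = direct_forecast([8.0], [0.125])
it_ar2 = iterated_forecast([4.0, 8.0], [0.5, 0.25], 2)
```

With estimated parameters the two forecasts generally differ, because the direct coefficients are fitted freely instead of being derived from the one-step coefficients.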
Stability of the MA model
The MA(q) model is always stable. Excluding some $q$ starting values, MA processes are stationary, not only asymptotically stationary.
Evaluation of the characteristic polynomial equation
$$1 + \theta_1 z + \theta_2 z^2 + \ldots + \theta_q z^q = 0$$
is nevertheless helpful. If it has only roots greater than one in modulus, there exists a convergent infinite-order autoregressive representation
$$\sum_{j=0}^{\infty} \psi_j X_{t-j} = \varepsilon_t,$$
which can be useful for prediction. In this case, the MA(q) model is said to be invertible.
1. If any of the polynomial roots have modulus exactly equal to one, prediction becomes very difficult. Sometimes, this non-invertibility is due to pre-processing the data by filtering, seasonal adjustment, differencing;
2. If any roots have modulus less than one, there exists an observationally equivalent MA model with all roots larger than one. This non-invertibility is due to a non-optimal estimation routine.
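The second case is easiest to see for an MA(1) model $X_t = \varepsilon_t + \theta \varepsilon_{t-1}$, whose characteristic root is $-1/\theta$. The sketch below, with hypothetical function names, shows that $\theta$ and $1/\theta$ imply the same lag-one autocorrelation; the two models match fully once the error variance is rescaled by $\theta^2$.

```python
def ma1_autocorrelation(theta):
    """Lag-one autocorrelation of X_t = eps_t + theta * eps_{t-1}."""
    return theta / (1.0 + theta ** 2)

def invertible_twin(theta):
    """Coefficient of the observationally equivalent invertible MA(1)."""
    # |theta| < 1 means the root -1/theta lies outside the unit circle already.
    return theta if abs(theta) < 1.0 else 1.0 / theta

rho_noninvertible = ma1_autocorrelation(2.0)   # root modulus 0.5
twin = invertible_twin(2.0)
rho_invertible = ma1_autocorrelation(twin)     # root modulus 2
```

Since the data cannot distinguish the two parameter values, an estimation routine is free to report the invertible one, which is the version suited to prediction.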
Determining the lag order q of an MA model
Again, time-series analysis knows three approaches for lag-order search:
1. Plot the empirical autocorrelation function (ACF) or correlogram $\rho(k)$ that should differ from 0 for $k \le q$ and equal 0 for $k > q$ (recommended by Box & Jenkins);
2. fit MA(q) models for different $q$ and test residuals for white noise: choose the smallest $q$ such that the test is passed (unreliable);
3. fit MA(q) models for different $q$ and calculate information criteria IC(q) for each model: choose the $q$ that minimizes the criterion.
Again, the IC approach is the most suitable one for forecasting, and there may be a preference for using ‘efficient’ criteria, such as AIC.
If parameters are unknown, $q$ is determined empirically, and $\varepsilon_t$ must be estimated, one may still plug in estimates. Alternatively, program routines may use the ‘inverted’ AR($\infty$) model and cut off the sum at some large value.
The ARMA(p, q) model inherits its properties from its AR and MA components.
1. For a unique definition, the characteristic polynomials for the AR and MA parts must not have common zeros, otherwise a simpler representation ARMA(p − 1, q − 1) exists and the additional parameters cannot be estimated;
2. under condition #1, if the AR polynomial has only roots larger than one, the ARMA model is stable;
3. under condition #1, if the MA polynomial has only roots larger than one, the ARMA model is invertible, i.e. there is an AR($\infty$) representation.
Determining the lag orders p and q of an ARMA model
In principle, time-series analysis knows three approaches for lag-order search:
1. Plot advanced tools such as the empirical extended autocorrelation function (EACF) and guess a good combination of lag orders by visual inspection (rarely used);
2. fit ARMA(p, q) models for different $p$ and $q$ and test residuals for white noise: choose the smallest $p$ and $q$ such that the test is passed (unreliable);
3. fit ARMA(p, q) models for different $p$ and $q$ and calculate information criteria IC(p, q) for each model: choose the pair $(p, q)$ that minimizes the criterion.
Again, there may be a preference for using ‘efficient’ criteria, such as AIC.
Forecasting from an ARMA(p, q) model
Forecasting must proceed carefully, using a variant of the method used in AR models. Suppose an ARMA(2, 2) model has generated the data and the parameters are known. Then one may reconstruct the true $\varepsilon_t$ and use them in the forecast recursion, with unknown future errors replaced by their zero expectation.
In practice, parameters are estimated, $p$ and $q$ are determined empirically, and $\varepsilon_t$ must be estimated, and these estimates are plugged in. Alternatively, program routines may use the ‘inverted’ AR($\infty$) model and cut off the sum at some large value.
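A sketch of the known-parameter case for an ARMA(2, 2) model, written under the sign convention $X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}$. The function name, the zero pre-sample values used to rebuild the errors, and the toy numbers are assumptions for illustration.

```python
def arma22_forecasts(x, phi, theta, h):
    """Rebuild the errors recursively, then forecast 1..h steps ahead."""
    phi1, phi2 = phi
    th1, th2 = theta
    eps = []
    for t, obs in enumerate(x):
        # Assumption: observations and errors before the sample are zero.
        x1 = x[t - 1] if t >= 1 else 0.0
        x2 = x[t - 2] if t >= 2 else 0.0
        e1 = eps[t - 1] if t >= 1 else 0.0
        e2 = eps[t - 2] if t >= 2 else 0.0
        eps.append(obs - phi1 * x1 - phi2 * x2 - th1 * e1 - th2 * e2)
    xs, es = list(x), list(eps)
    for _ in range(h):
        xs.append(phi1 * xs[-1] + phi2 * xs[-2] + th1 * es[-1] + th2 * es[-2])
        es.append(0.0)        # future errors are replaced by their expectation
    return xs[len(x):]

x = [1.0, -0.1, 0.55, 0.155]  # generated from errors 1, -1, 0.5, 0 with zero start
fc = arma22_forecasts(x, phi=(0.5, 0.2), theta=(0.4, 0.3), h=3)
```

From the third step onward the MA terms have dropped out, so the forecasts follow the pure AR(2) recursion.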
The class of integrated models is the most popular class of non-stationary time-series models. An integrated process $(X_t)$ is defined by the property that it is not stationary but $d$-th order differences $(\Delta^d X_t)$ are stationary. Only $d = 1$ and $d = 2$ are of empirical interest.
The class of integrated processes is a very special class of non-stationary processes. They model near-polynomial and random-walk trends well but not structural breaks, outliers, increasing variation, and other observed non-stationary features. Note that data cannot be non-stationary.
Box & Jenkins called a process $X_t$ ARIMA(p, d, q) if $\Delta^d X_t$ is a stable and well-defined ARMA(p, q) process but $\Delta^{d-1} X_t$ is not.
Engle & Granger called a process $X_t$ $d$-th order integrated, in symbols I(d), if $\Delta^d X_t$ is stationary but $\Delta^{d-1} X_t$ is not stationary. This is slightly more general.
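The role of the differencing order $d$ can be illustrated with a short helper. The deterministic quadratic series below is only a stand-in chosen so that the result can be checked by hand; integrated processes proper carry stochastic trends.

```python
def difference(x, d=1):
    """Apply the first-difference operator d times."""
    for _ in range(d):
        x = [b - a for a, b in zip(x, x[1:])]
    return x

quadratic = [t * t for t in range(6)]     # 0, 1, 4, 9, 16, 25
first = difference(quadratic, 1)          # still trending
second = difference(quadratic, 2)         # constant after two differences
```

Each application of the operator shortens the series by one observation, which matters for the effective sample size in short series.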
How to decide whether data stem from integrated processes
Two main ideas:
1. Box & Jenkins suggest considering the correlogram. If it decays too slowly, take differences. Use the differencing order that makes the correlogram as simple as possible: over-differencing would make it more volatile;
2. Most economists today base this decision on the test by Dickey & Fuller and comparable tests. The null hypothesis is the ‘unit root’: if the test does not reject, take differences.
1. Unit-root tests have low power and tend to support the null, i.e. differencing;
2. in finite samples, it is not certain that the statistical unit-root decision and the optimal procedure for forecasting coincide. In other words, $\Delta X_t$ may be easier to forecast even if $X_t$ is stationary;
3. there is no general guideline for the significance level of the test that optimizes forecasting performance;
4. Clements and Hendry provide evidence that differencing may improve forecasting performance in the presence of breaks and outliers, even though unit-root tests reject.
The most popular ARCH generalization today is still the GARCH model by Bollerslev. The lagged unobserved conditional variance serves to reflect an ARMA-type geometric decay of volatility shocks. The GARCH(1,1) model reads
$$X_t = \mu + \varepsilon_t,$$
$$E(\varepsilon_t^2 | I_{t-1}) = h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta h_{t-1},$$
which models well (log differences of) near random walks in the financial world.
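The variance recursion and the geometric decay of volatility shocks can be sketched as follows, assuming known parameters with $\alpha_1 + \beta < 1$ and a user-supplied starting variance. The function names and numbers are illustrative.

```python
def garch11_variances(eps, alpha0, alpha1, beta, h0):
    """Conditional variances: h[0] = h0, h[i] is the variance following eps[i-1]."""
    h = [h0]
    for e in eps:
        h.append(alpha0 + alpha1 * e * e + beta * h[-1])
    return h

def variance_forecast(h_next, alpha0, alpha1, beta, k):
    """k-step conditional variance forecast, given the one-step value h_next."""
    uncond = alpha0 / (1.0 - alpha1 - beta)     # requires alpha1 + beta < 1
    return uncond + (alpha1 + beta) ** (k - 1) * (h_next - uncond)

h = garch11_variances([1.0, 2.0], alpha0=0.1, alpha1=0.2, beta=0.7, h0=1.0)
one_step = variance_forecast(h[-1], 0.1, 0.2, 0.7, k=1)
two_step = variance_forecast(h[-1], 0.1, 0.2, 0.7, k=2)
```

The forecast reverts to the unconditional variance at the rate $\alpha_1 + \beta$, so a sum close to one implies very persistent volatility.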
To the time-series forecaster who models serially correlated variables, the most interesting extensions are ARMA-ARCH models with non-trivial mean equation and an ARCH-type variance equation, for example:
$$X_t = \mu + \phi X_{t-1} + \varepsilon_t,$$
$$E(\varepsilon_t^2 | I_{t-1}) = h_t = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2.$$
Note that this form already appears in the work of Engle (1982), where monthly U.K. inflation was modelled.
If the mean equation fulfills the usual ARMA stability conditions and the variance equation fulfills the ARCH stability conditions, $\varepsilon_t$ is white noise and $(X_t)$ is a stationary homoskedastic process. The models view $h_t$ (‘volatility’, ‘risk’?) as time-dependent but the unconditional $\operatorname{var} X_t$ as time-constant.
ARCH models for forecasting: worth the additional work?
1. Forecasts for the ‘level’ of $X_t$ are as good as the mean equation; the ARCH parameters only enter indirectly: they serve to estimate e.g. $\phi$ more efficiently and they specify the standard error of $\hat{\phi}$;
2. for any data with monthly or lower frequency, modelling the ARCH part is not worth the work, as gains in efficiency are low;
3. it is tempting to forecast $X_t^2$ on the basis of the ARCH model but such forecasts are often surprisingly poor;
4. current research opines that a systematic forecast of ‘local risk’ aiming at commercial advice to risk-conscious traders based on ARCH models is not possible or at least ‘very difficult’ (Granger).
Neurons may be linked to further neurons (synapses), which permits ‘multiple layers’. The simplest version of neural nets used in practice has just one ‘hidden layer’ of such synapses. Assume the ‘stimulus’ or ‘input’ are past $x$, and the ‘reaction’ or ‘output’ is current $x$. This forecast net function follows Chatfield:
$$\hat{x}_t = \phi_0\left( w_{c0} + \sum_{h=1}^{H} w_{h0}\, \phi_h\left( w_{ch} + \sum_{j=1}^{h} w_{jh} x_{t-j} \right) \right),$$
where all $\phi_h$ are sigmoid functions.
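The net function is a nested weighted sum and can be evaluated in a few lines. In this sketch the hidden units use a logistic sigmoid and the output activation defaults to the identity; both choices, like the argument names, are assumptions. Each hidden unit uses as many lags as it has weights.

```python
import math

def logistic(u):
    """Logistic sigmoid, one common choice of activation function."""
    return 1.0 / (1.0 + math.exp(-u))

def nn_forecast(lags, hidden_bias, hidden_weights, out_bias, out_weights,
                out_activation=lambda u: u):
    """lags = [x_{t-1}, x_{t-2}, ...]; hidden_weights[h][j] multiplies lags[j]."""
    hidden = [logistic(b + sum(w * x for w, x in zip(ws, lags)))
              for b, ws in zip(hidden_bias, hidden_weights)]
    return out_activation(out_bias + sum(w * v for w, v in zip(out_weights, hidden)))

# Two hidden units with zero weights: each outputs logistic(0) = 0.5.
flat = nn_forecast([1.0, 2.0], [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]],
                   0.0, [1.0, 1.0])
squashed = nn_forecast([1.0, 2.0], [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]],
                       0.0, [1.0, 1.0], out_activation=logistic)
```

Passing `out_activation=logistic` gives a sigmoid output unit as well, which restricts forecasts to the interval (0, 1) and would require rescaled data.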
Weights and numbers of layers are typically optimized over an estimation interval (training set) and are then used for prediction based on the identified architecture. Extending the training set to an intermediate sample to update the weights is called learning.
Neural nets are really just a class of nonlinear time-series models. Their reported forecasting successes may be rooted in the usage of sigmoid reaction functions.
The simplest of the unobserved-components (UC) models due to Harvey has two state variables $\mu$ and $\beta$:
$$X_t = \mu_t + n_t,$$
$$\mu_t = \mu_{t-1} + \beta_{t-1} + w_{1,t},$$
$$\beta_t = \beta_{t-1} + w_{2,t}.$$
With white-noise input, $X_t$ is a special I(2) or ARIMA(p, 2, q) process. UC adepts claim that the different parameterization (variances of errors instead of ARMA coefficients) is more ‘natural’. UC models sometimes perform surprisingly well in forecasting economic data.
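A small simulation can illustrate why second differences of this model are a short moving average. Substituting the state equations gives $\Delta^2 X_t = w_{2,t-1} + w_{1,t} - w_{1,t-1} + n_t - 2n_{t-1} + n_{t-2}$, which involves errors up to lag two only. The sketch below assumes Gaussian errors and zero starting states; the function names are hypothetical.

```python
import random

def simulate_local_linear_trend(n, sd_n, sd_w1, sd_w2, seed=0):
    """Simulate X_t = mu_t + n_t with random-walk slope; also return the errors."""
    rng = random.Random(seed)
    mu, beta = 0.0, 0.0
    xs, ns, w1s, w2s = [], [], [], []
    for _ in range(n):
        w1 = rng.gauss(0.0, sd_w1)
        w2 = rng.gauss(0.0, sd_w2)
        noise = rng.gauss(0.0, sd_n)
        mu = mu + beta + w1          # uses the previous slope beta_{t-1}
        beta = beta + w2
        xs.append(mu + noise)
        ns.append(noise)
        w1s.append(w1)
        w2s.append(w2)
    return xs, ns, w1s, w2s

def max_identity_gap(xs, ns, w1s, w2s):
    """Largest deviation between second differences and their error representation."""
    gap = 0.0
    for t in range(2, len(xs)):
        lhs = xs[t] - 2 * xs[t - 1] + xs[t - 2]
        rhs = (w2s[t - 1] + w1s[t] - w1s[t - 1]
               + ns[t] - 2 * ns[t - 1] + ns[t - 2])
        gap = max(gap, abs(lhs - rhs))
    return gap

xs, ns, w1s, w2s = simulate_local_linear_trend(50, 1.0, 0.5, 0.1)
gap = max_identity_gap(xs, ns, w1s, w2s)
```

The three error variances are the only free parameters here, which is the ‘natural’ parameterization the text refers to.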