INTERVENTION BASED ARIMA TIME-SERIES MODELS
Ramasubramanian V.
Indian Agricultural Statistics Research Institute
Library Avenue, New Delhi – 110012
[email protected]

1. Time series models
Time series (TS) data refer to observations on a variable that occur in a time sequence. Mostly these observations are collected at equally spaced, discrete time intervals. The TS movements of such chronological data can be resolved or decomposed into discernible components: trend, periodic (say, seasonal), cyclical and irregular variations. One or two of these components may overshadow the others in some series. A basic assumption in any TS analysis/modeling is that some aspects of the past pattern will continue into the future. It is tacitly assumed here that information about the past is available in the form of numerical data. Ideally, at least 50 observations are necessary for performing TS analysis/modeling, as propounded by Box and Jenkins, who were pioneers in TS modeling.
TS models have advantages over other statistical models in certain situations. They can be used more easily for forecasting purposes, because historical sequences of observations on the study variables are readily available from published secondary sources. These successive observations are statistically dependent, and TS modeling is concerned with techniques for the analysis of such dependencies. Thus, in TS modeling, the prediction of values for future periods is based on the pattern of past values of the variable under study, but not generally on explanatory variables which may affect the system. There are two main reasons for resorting to such TS models. First, the system may not be understood, and even if it is understood it may be extremely difficult to measure the cause and effect relationships; second, the main concern may be only to predict what will happen and not to know why it happens.
Many a time, collection of information on causal factors (explanatory variables) affecting the study variable(s) may be cumbersome or impossible, and hence availability of long series data on explanatory variables is a problem. In such situations, TS models are a boon for forecasters. If TS models are put to use, say for forecasting purposes, they are especially applicable only in the 'short term'. Decomposition models are among the oldest approaches to TS analysis, albeit with a number of theoretical weaknesses from a statistical point of view. These were followed by the crudest form of forecasting methods, the moving averages method. As an improvement over this method, which gives equal weights to past observations, exponential smoothing methods came into being, giving more weight to recent data. Exponential smoothing methods were initially proposed as just recursive methods, without any distributional assumptions about the error structure in them; later, they were found to be particular cases of the statistically sound AutoRegressive Integrated Moving Average (ARIMA) models.
M III: 1: Intervention based ARIMA Time-series Models
M III-12
6. Forecasting using simple decomposition methods
In class, we will discuss how time series decomposition can be used for
forecasting values for future time periods by means of MSExcel, as given below:
Year Qtr   x   Data   Seasonal   Deseas.   Smoothed   Trend           Deviation
                      average    data      data       (Y=39.3+4.6X)   (Data - Trend)
1989 Q1    0     9      29         31         31         39             -8.6
1989 Q2    1    50      52         96         44         44             -0.1
1989 Q3    2     1       8         10         37         49            -11.6
1989 Q4    3     7      30         22         34         53            -19.1
1990 Q1    4    38      29        131         53         58             -4.4
1990 Q2    5    68      52        132         69         62              6.7
1990 Q3    6    11       8        134         82         67             15.1
1990 Q4    7    43      30        142         94         72             22.5
1991 Q1    8     7      29         26         80         76              4.2
1991 Q2    9    41      52         79         80         81             -0.6
1991 Q3   10    13       8        153         95         85              9.4
1991 Q4   11    12      30         41         84         90             -6.1
1992 Q1   12    50      29        173        102         95              7.0
1992 Q2   13    36      52         70         95         99             -4.0
1992 Q3   14    16       8        188        114        104              9.8
1992 Q4   15    35      30        118        115        109              6.0
1993 Q1   16    41      29        140        120        113              6.5
1993 Q2   17    64      52        123        120        118              2.6
1993 Q3   18     1       8         15         99        122            -23.1
1993 Q4   19    53      30        177        115        127            -12.1
1994 Q1   20    36      29        124        124        132             -8.0
1994 Q2   21    69      52        132        132        136             -4.0
1994 Q3   22    11       8        141        141        141              0.0
1994 Q4   23    44      30        145        145        145              0.0

Worksheet annotations: the deseasonalised data are obtained as Data x 100/seasonal average; the smoothed data use weight 0.8; for forecasting, the trend line Y = 39.3 + 4.6X is extended to the new values, the deviation is taken as the average of the last 6 points (no change thereafter), the smoothed data for future periods are the sum of trend and deviation, and the forecast data are smoothed data x seasonal average/100.
7. Other Exponential smoothing methods
If the data have a trend or also exhibit seasonality, the following methods are employed for
forecasting purposes.
(i) Double exponential smoothing (Holt) (For illustration see Table 2)
The above SES method was then extended to linear exponential smoothing, which allows
forecasting of non-seasonal time series data with trend. The forecast for Holt's linear
exponential smoothing is found by having two equations to deal with: one for the level and
one for the trend. The forecast is found using two smoothing constants, α and β (with
values between 0 and 1), and three equations:
Level: l_t = α y_t + (1 − α)(l_{t−1} + b_{t−1})
Trend: b_t = β (l_t − l_{t−1}) + (1 − β) b_{t−1}
Forecast: y_t(h) = l_t + b_t h
Here l_t denotes the level of the series at time t and b_t denotes the (additive) trend of the
series at time t. The optimal combination of the smoothing parameters α and β should be
chosen by minimizing the MSE over the observations of the model data set.
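As a rough illustration (not part of the original notes), the three Holt equations can be sketched in Python; the initialisation l_1 = y_1, b_1 = y_2 − y_1 is one common convention and is an assumption here.

```python
# A minimal sketch of Holt's linear exponential smoothing, following the
# level/trend/forecast equations above; the initial values are an assumption.
def holt_forecast(y, alpha, beta, h=1):
    """h-step-ahead forecast from the end of series y."""
    level, trend = y[0], y[1] - y[0]          # simple initialisation
    for t in range(1, len(y)):
        prev_level = level
        level = alpha * y[t] + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend * h                   # forecast: l_t + b_t * h
```

For a perfectly linear series the method reproduces the line exactly, whatever smoothing constants are chosen, which is a convenient sanity check.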
(ii) Triple exponential smoothing (Winters) (For illustration see Tables 3 (a) & (b))
If the data have no trend or seasonal patterns, then SES is appropriate. If the data exhibit
a linear trend, Holt's method is appropriate. But if the data are seasonal, these methods,
on their own, cannot handle the problem well. Holt's method was later extended to
capture seasonality directly. The Winters' method is based on three smoothing
equations: one for the level, one for the trend, and one for seasonality. It is similar to Holt's
method, with one additional equation to deal with seasonality. In fact there are two
different Winters' methods, depending on whether seasonality is modelled in an additive
or multiplicative way.
The basic equations for Winters' multiplicative method are as follows.
Level: l_t = α y_t / s_{t−m} + (1 − α)(l_{t−1} + b_{t−1})
Trend: b_t = β (l_t − l_{t−1}) + (1 − β) b_{t−1}
Seasonal: s_t = γ y_t / (l_{t−1} + b_{t−1}) + (1 − γ) s_{t−m}
Forecast: y_t(h) = (l_t + b_t h) s_{t−m+h}
where m is the length of the seasonality (e.g., the number of months or 'seasons' in a year),
l_t represents the level of the series, b_t denotes the trend of the series at time t, s_t is the
seasonal component, and y_t(h) is the forecast for h periods ahead. As with all exponential
smoothing methods, we need initial values of the components and parameter values.
The basic equations for Winters' additive method are as follows.
Level: l_t = α (y_t − s_{t−m}) + (1 − α)(l_{t−1} + b_{t−1})
Trend: b_t = β (l_t − l_{t−1}) + (1 − β) b_{t−1}
Seasonal: s_t = γ (y_t − l_{t−1} − b_{t−1}) + (1 − γ) s_{t−m}
Forecast: y_t(h) = l_t + b_t h + s_{t−m+h}
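A hedged sketch of Winters' multiplicative method, with initial values for the level, trend and seasonal indices taken from the first two seasons (one common convention, assumed here rather than taken from the notes):

```python
# Minimal sketch of Winters' multiplicative method following the four
# equations above; the initialisation is one common convention (an assumption).
def winters_mult_forecast(y, m, alpha, beta, gamma, h=1):
    level = sum(y[:m]) / m                                  # mean of season 1
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)        # season-to-season slope
    s = [y[i] / level for i in range(m)]                    # initial seasonal indices
    for t in range(m, len(y)):
        prev = level
        level = alpha * y[t] / s[t - m] + (1 - alpha) * (prev + trend)
        trend = beta * (level - prev) + (1 - beta) * trend
        s.append(gamma * y[t] / (prev + trend) + (1 - gamma) * s[t - m])
    return (level + h * trend) * s[len(y) - 1 - m + h]      # (l_t + b_t h) s_{t-m+h}
```

On a purely seasonal toy series such as 10, 20, 10, 20 with m = 2, the one-step forecast returns to the appropriate seasonal level.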
8. ARIMA models
8.1 Stationarity of a TS process
A TS is said to be stationary if its underlying generating process is based on a constant
mean and constant variance, with its autocorrelation function (ACF) essentially constant
through time. Thus, if we consider different subsets of a realization (a TS 'sample'), the
different subsets will typically have means, variances and autocorrelation functions that
do not differ significantly.
The most widely used statistical test for stationarity is the Dickey-Fuller test. To carry out
the test, estimate by OLS the regression model
y'_t = φ y_{t−1} + b_1 y'_{t−1} + … + b_p y'_{t−p}
where y'_t denotes the differenced series (y_t − y_{t−1}). The number of lagged difference
terms in the regression, p, is usually set to about 3. Then, if φ is nearly zero, the original
series y_t needs differencing; and if φ < 0, then y_t is already stationary.
8.2 Autocorrelation functions
(i) Autocorrelation
Autocorrelation refers to the way the observations in a TS are related to each other, and is
measured by the simple correlation between the current observation (Y_t) and the
observation p periods before it (Y_{t−p}). That is, for a given series Y_t, the autocorrelation
at lag p is the correlation between the pair (Y_t, Y_{t−p}) and is given by

r_p = Σ_{t=1..n−p} (Y_t − Ȳ)(Y_{t+p} − Ȳ) / Σ_{t=1..n} (Y_t − Ȳ)²

It ranges from −1 to +1. Box and Jenkins have suggested that the maximum number of useful
r_p is roughly N/4, where N is the number of periods upon which information on y_t is
available.
(ii) Partial autocorrelation
Partial autocorrelations are used to measure the degree of association between y_t and y_{t−p}
when the y-effects at the other time lags 1, 2, 3, …, p−1 are removed.
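The lag-p sample autocorrelation r_p defined above can be computed directly; a minimal sketch:

```python
import numpy as np

# Direct computation of the sample autocorrelation r_p for lags 1..max_lag,
# following the formula for r_p given above.
def sample_acf(y, max_lag):
    y = np.asarray(y, dtype=float)
    d = y - y.mean()                       # deviations from the mean
    denom = (d ** 2).sum()
    return [float((d[:-p] * d[p:]).sum() / denom) for p in range(1, max_lag + 1)]
```

For the toy series 1, 2, 3, 4, 5 this gives r_1 = 0.4 and r_2 = −0.1, which can be verified by hand from the formula.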
(iii) Autocorrelation function (ACF) and partial autocorrelation function (PACF)
Theoretical ACFs and PACFs (autocorrelations versus lags) are available for the various
candidate models (see, e.g., Pankratz, 1983) for various values of the orders of the
autoregressive and moving average components, i.e. p and q. One thus compares the
correlograms (plots of sample ACFs versus lags) obtained from the given TS data with
these theoretical ACFs/PACFs, finds a reasonably good match, and tentatively selects one
or more ARIMA models. The general characteristics of theoretical ACFs and PACFs are as
follows (here a 'spike' represents the line at a given lag in the plot, with length equal to
the magnitude of the autocorrelation):

Model   ACF                          PACF
AR      Spikes decay towards zero    Spikes cut off to zero
MA      Spikes cut off to zero       Spikes decay to zero
ARMA    Spikes decay to zero         Spikes decay to zero
8.3 Description of ARIMA representation
(i) ARIMA modeling (For illustration see Table 4)
In general, an ARIMA model is characterized by the notation ARIMA(p,d,q), where p, d
and q denote the orders of auto-regression, integration (differencing) and moving average
respectively. In ARIMA parlance, the TS is a linear function of past actual values and
random shocks. For instance, given a TS process {y_t}, a first order auto-regressive
process is denoted by ARIMA(1,0,0) or simply AR(1) and is given by
y_t = C + φ_1 y_{t−1} + ε_t
and a first order moving average process is denoted by ARIMA(0,0,1) or simply MA(1)
and is given by
y_t = C − θ_1 ε_{t−1} + ε_t
Alternatively, the model ultimately derived may be a mixture of these processes and of
higher orders as well. Thus a stationary ARMA(p,q) process is defined by the equation
y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + … + φ_p y_{t−p} − θ_1 ε_{t−1} − θ_2 ε_{t−2} − … − θ_q ε_{t−q} + ε_t
where the ε_t's are independently and normally distributed with zero mean and constant
variance σ² for t = 1, 2, …, n. Note here that the values of p and q, in practice, lie between
0 and 3. The degree of differencing of the main variable y_t is discussed in section 8.4 (i).
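As a small illustration (with simulated, hypothetical data), an AR(1) series can be generated from the equation above and φ_1 recovered by least squares:

```python
import numpy as np

# Sketch: simulate an AR(1) process y_t = 0.6 y_{t-1} + e_t (C = 0 assumed)
# and recover phi_1 by the lag-1 least squares regression.
rng = np.random.default_rng(42)
phi, n = 0.6, 5000
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.standard_normal()

phi_hat = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])   # OLS estimate of phi_1
```

With a long simulated series the estimate lands close to the true value 0.6.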
(ii) Seasonal ARIMA modeling
Identification of relevant models and inclusion of suitable seasonal variables are
necessary for seasonal models. The Seasonal ARIMA, i.e. ARIMA(p,d,q)(P,D,Q)_s, model
is defined by

φ_p(B) Φ_P(B^s) ∇^d ∇_s^D y_t = θ_q(B) Θ_Q(B^s) ε_t

where
φ_p(B) = 1 − φ_1 B − … − φ_p B^p,   θ_q(B) = 1 − θ_1 B − … − θ_q B^q
Φ_P(B^s) = 1 − Φ_1 B^s − … − Φ_P B^{sP},   Θ_Q(B^s) = 1 − Θ_1 B^s − … − Θ_Q B^{sQ}
B is the backshift operator (i.e. B y_t = y_{t−1}, B² y_t = y_{t−2} and so on), s is the seasonal
lag, and ε_t is a sequence of independent normal error variables with mean 0 and variance σ².
The Φ's and φ's are respectively the seasonal and non-seasonal autoregressive parameters.
The Θ's and θ's are respectively the seasonal and non-seasonal moving average parameters.
p and q are the orders of the non-seasonal autoregressive and moving average components
respectively, whereas P and Q are those of the seasonal autoregressive and moving average
components respectively. Also d and D denote the non-seasonal and seasonal differences
respectively (∇ = 1 − B, ∇_s = 1 − B^s).
8.4 Model building
(i) Identification
The foremost step in the process of modeling is to check for the stationarity of the
series, as the estimation procedures are available only for stationary series. There are two
kinds of stationarity, viz., stationarity in 'mean' and stationarity in 'variance'. A cursory
look at the graph of the data and the structure of the autocorrelation and partial
autocorrelation coefficients may provide clues to the presence of stationarity. Another way
of checking for stationarity is to fit a first order autoregressive model to the raw data and
test whether the coefficient φ_1 is less than one. If the model is found to be non-stationary,
stationarity can mostly be achieved by differencing the series; alternatively, a Dickey-Fuller
test may be carried out (see section 8.1). This is applicable for both seasonal and
non-seasonal stationarity.
Thus, if X_t denotes the original series, the non-seasonal difference of first order is
Y_t = X_t − X_{t−1}
followed by the seasonal differencing (if needed)
Z_t = Y_t − Y_{t−s} = (X_t − X_{t−1}) − (X_{t−s} − X_{t−s−1})
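These two differencing steps can be sketched as follows, on a toy series with a linear trend and an assumed quarterly seasonal lag s = 4:

```python
import numpy as np

# The two differencing steps above, applied to a toy linear-trend series.
x = np.arange(1.0, 25.0)     # X_t, t = 1..24 (hypothetical data)
y = x[1:] - x[:-1]           # Y_t = X_t - X_{t-1}: non-seasonal first difference
s = 4                        # seasonal lag (quarterly data assumed)
z = y[s:] - y[:-s]           # Z_t = Y_t - Y_{t-s}: seasonal difference
```

A linear trend is removed entirely by the first difference (every Y_t equals 1 here), so the subsequent seasonal difference is identically zero.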
The next step in the identification process is to find the initial values for the orders of the
seasonal and non-seasonal parameters, p, q, and P, Q. They can be obtained by looking
for significant autocorrelation and partial autocorrelation coefficients (see section 8.2 (iii)).
Say, if the second order autocorrelation coefficient is significant, then an AR(2), MA(2)
or ARMA(2) model could be tried to start with. This is not a hard and fast rule, as
sample autocorrelation coefficients are poor estimates of population autocorrelation
coefficients. Still, they can be used as initial values, while the final models are achieved
after going through the stages repeatedly. Note that orders up to 2 for p, d, or q are
usually sufficient for developing a good model in practice.
(ii) Estimation
At the identification stage one or more models are tentatively chosen that seem to provide statistically adequate representations of the available data. Then we attempt to obtain precise estimates of the parameters of the model by least squares, as advocated by Box and Jenkins. Standard computer packages like SAS, SPSS etc. are available for finding the estimates of the relevant parameters using iterative procedures. The methods of estimation are not discussed here for brevity.
(iii) Diagnostics
Different models can be obtained for various combinations of AR and MA, individually
and collectively. The best model is selected with the help of the following diagnostics.
(a) Low Akaike Information Criteria (AIC)/ Bayesian Information Criteria (BIC)/
Schwarz-Bayesian Information Criteria (SBC)
AIC is given by
AIC = −2 log L + 2m
where m = p + q + P + Q and L is the likelihood function. Since −2 log L is approximately
equal to {n(1 + log 2π) + n log σ²}, where σ² is the model MSE, AIC can be written as
AIC = {n(1 + log 2π) + n log σ² + 2m}
and because the first term in this equation is a constant, it is usually omitted while
comparing between models. As an alternative to AIC, sometimes SBC is also used, which
is given by
SBC = log σ² + (m log n)/n
(b) Plot of residual ACF
Once the appropriate ARIMA model has been fitted, one can examine the goodness of fit
by means of plotting the ACF of the residuals of the fitted model. If most of the sample
autocorrelation coefficients of the residuals are within the limits ±1.96/√N, where N is
the number of observations upon which the model is based, then the residuals are white
noise, indicating that the model is a good fit.
(c) Non-significance of autocorrelations of residuals via portmanteau tests (Q-tests
based on chi-square statistics): Box-Pierce or Ljung-Box tests
After a tentative model has been fitted to the data, it is important to perform diagnostic checks to test the adequacy of the model and, if need be, to suggest potential improvements. One way to accomplish this is through the analysis of residuals. It has been found effective to measure the overall adequacy of the chosen model by examining a quantity Q, known as the Box-Pierce statistic (a function of the autocorrelations of residuals), whose approximate distribution is chi-square and which is computed as follows:
Q = n Σ_{j=1..k} r²(j)
where the summation extends from 1 to k, with k as the maximum lag considered, n is the
number of observations in the series, and r(j) is the estimated autocorrelation at lag j; k can
be any positive integer and is usually around 20. Q follows a chi-square distribution with
(k − m1) degrees of freedom, where m1 is the number of parameters estimated in the model.
A modified Q statistic is the Ljung-Box statistic, which is given by
Q = n(n + 2) Σ_{j=1..k} r²(j)/(n − j)
The Q statistic is compared to critical values from the chi-square distribution. If the model
is correctly specified, the residuals should be uncorrelated and Q should be small (the
probability value should be large). A significant value indicates that the chosen model
does not fit well.
All these stages require considerable care and work, and they are not themselves
exhaustive.
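The Ljung-Box statistic above can be sketched directly from the residual autocorrelations (an illustration on hypothetical white-noise residuals):

```python
import numpy as np

# Sketch of the Ljung-Box Q statistic, following the formula above:
# Q = n(n+2) * sum_{j=1..k} r(j)^2 / (n - j).
def ljung_box_q(resid, k=20):
    r = np.asarray(resid, dtype=float)
    n = len(r)
    d = r - r.mean()
    denom = (d ** 2).sum()
    q = 0.0
    for j in range(1, k + 1):
        rj = (d[:-j] * d[j:]).sum() / denom   # residual autocorrelation at lag j
        q += rj ** 2 / (n - j)
    return n * (n + 2) * q

rng = np.random.default_rng(1)
q = ljung_box_q(rng.standard_normal(500), k=10)   # white noise: Q should be small
```

For genuinely uncorrelated residuals, Q stays well below the chi-square critical value for the chosen degrees of freedom, so the adequacy of the model is not rejected.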
Table 2: Illustration for Double Exponential Smoothing model (Holt) using
MSExcel calculations
Monthly wholesale prices (in rupees per quintal) -U.P.-Hapur-Sugar variety C-29 (24
months period 1997Apr - 1999Mar)
Forecasts with values of alpha & beta as 0.1 & 0.2
respectively
Year Month t yt h Ft bt
1997 4 1 1275 1275 5.094*
* for t=0, b_t is obtained by fitting y_t = a_0 + b_0 t
5 2 1315 1283.58 5.79
6 3 1325 1292.94 6.50
7 4 1400 1309.50 8.52
8 5 1420 1328.21 10.56
9 6 1400 1344.89 11.78
10 7 1400 1361.00 12.65
11 8 1430 1379.29 13.77
12 9 1450 1398.75 14.91
1998 1 10 1450 1417.30 15.64
2 11 1450 1434.64 15.98
3 12 1420 1447.56 15.37
4 13 1450 1461.64 15.11
5 14 1450 1474.07 14.57
6 15 1460 1485.78 14.00
7 16 1400 1489.80 12.01
8 17 1400 1491.63 9.97
9 18 1465 1497.94 9.24
10 19 1475 1503.96 8.59
11 20 1500 1511.30 8.34
12 21 1455 1513.18 7.05
1999 1 22 1515 1519.70 6.95
2 23 1465 1520.48 5.71
3 24 1400 1513.58 3.19
4 25 1460 1 1516.77
5 26 1440 2 1519.95
6 27 1515 3 1523.14
7 28 1525 4 1526.33
8 29 1530 5 1529.52
9 30 1525 6 1532.71
Table 3 (a): Illustration for Triple Exponential Smoothing model (Winters) using
MSExcel calculations
Quarterly exports (in thousands of francs) of a French company over a six year
period (Makridakis et al. 1998; page 162)
alpha beta gamma
0.1 0.1 0.2
Year Quarter t yt lead lt bt st
1 1 1 362 362.00 17.83** 0.96*
** the initial bt value is obtained as b0 by fitting a simple linear regression of yt
on t i.e. yt = a0 + b0 t
2 2 385 385.00 18.35 1.02*
3 3 432 432.00 21.21 1.14*
4 4 341 341.00 9.99 0.87*
*Refer Table 3 (b)
2 1 5 382 355.77 10.47 0.98
2 6 409 369.74 10.82 1.04
3 7 498 386.30 11.39 1.17
4 8 387 402.16 11.84 0.89
3 1 9 473 420.67 12.51 1.02
2 10 513 439.24 13.11 1.07
3 11 582 456.80 13.56 1.19
4 12 474 476.31 14.15 0.92
4 1 13 544 494.98 14.60 1.03
2 14 582 513.12 14.96 1.08
3 15 681 532.29 15.38 1.21
4 16 557 553.63 15.98 0.94
5 1 17 628 573.36 16.35 1.05
2 18 707 596.03 16.98 1.11
3 19 773 615.42 17.22 1.22
4 20 592 632.55 17.21 0.94
6 1 21 627 644.61 16.70 1.03
2 22 725 660.73 16.64 1.10
3 23 854 679.47 16.85 1.23
4 24 661 697.24 16.94 0.94
25 ? 1 736.62
26 ? 2 813.27
27 ? 3 1000.71
28 ? 4 940.03
Table 3 (b): Illustration for ‘seasonals’ used in Triple Exponential Smoothing model
(Winters) using MSExcel calculations
Year Quarter t yt 4MA 2x4MA yt/(2x4MA) Seas. index
1 1 1 362 0.96
2 2 385 1.02
3 3 432 382.50 1.13 1.14
4 4 341 380.00 388.00 0.88 0.87
2 1 5 382 385.00 399.25 0.96
2 6 409 391.00 413.25 0.99
3 7 498 407.50 430.38 1.16
4 8 387 419.00 454.75 0.85
3 1 9 473 441.75 478.25 0.99
2 10 513 467.75 499.63 1.03
3 11 582 488.75 519.38 1.12
4 12 474 510.50 536.88 0.88
4 1 13 544 528.25 557.88 0.98
2 14 582 545.50 580.63 1.00
3 15 681 570.25 601.50 1.13
4 16 557 591.00 627.63 0.89
5 1 17 628 612.00 654.75 0.96
2 18 707 643.25 670.63 1.05
3 19 773 666.25 674.88 1.15
4 20 592 675.00 677.00 0.87
6 1 21 627 674.75 689.38 0.91
2 22 725 679.25 708.13 1.02
3 23 854 699.50
4 24 661 716.75
Table 4: ARIMA model upon cotton crop yield (in kg/ha) for Maharashtra state
using SPSS package
1970-71 31 1987-88 100
1971-72 69 1988-89 89
1972-73 75 1989-90 143
1973-74 77 1990-91 117
1974-75 112 1991-92 72
1975-76 57 1992-93 124
1976-77 67 1993-94 180
1977-78 93 1994-95 154
1978-79 89 1995-96 155
1979-80 111 1996-97 173
1980-81 81 1997-98 95
1981-82 92 1998-99 139
1982-83 103 1999-00 162
1983-84 52 2000-01 100
1984-85 93 2001-02 147
1985-86 123 2002-03 158
1986-87 56 2003-04 189
SPSS Output
[ACF plot: sample autocorrelations of y against lag numbers 1-8, with upper and lower confidence limits]
[PACF plot: sample partial autocorrelations of y against lag numbers 1-8, with upper and lower confidence limits]
Parameter Estimates
                              Estimate   Std. Error      t     Approx. Sig.
Non-seasonal lags   AR1         -.613       .172      -3.558      .001
                    AR2         -.499       .173      -2.880      .008
Constant                        3.169      2.819       1.124      .271
9. Intervention based ARIMA models
Every time series modelling approach has its own advantages and limitations. Time
series intervention models have advantages in certain situations. Sometimes it may be
known that certain exceptional external events, called 'interventions', could affect the time
series under study. Under such circumstances, 'transfer function' models may be used to
account for the effects of the intervention event on the series, but wherein the input series
(apart from the main variable) is a simple indicator variable indicating the presence or
absence of the event. Here transfer function modeling refers to accounting for the dynamic
relationship between two time series Y_t and X_t (the latter being the input series), wherein
past values of both series may be used in forecasting Y_t, leading to a considerable
reduction in the errors of the forecast.
Intervention analysis or event study is used to assess the impact of a special event on the
time series of interest. Alternatively, intervention analysis may be undertaken to adjust
for any unusual values in the series Yt that might have resulted as a consequence of the
intervention event. This will ensure that the results of the time series analysis of the
series, such as the structure of the fitted model, estimates of model parameters, and
forecasts of future values, are not seriously distorted by the influence of these unusual
values. In agriculture, intervention may occur due to introduction of new variety, new
environmental regulations, economic policy changes, strikes, special promotion
campaigns, natural disaster etc. There are broadly three kinds of intervention viz., step,
pulse and ramp.
In its simplest form, intervention analysis itself may be regarded as a generalization of
the two-sample problem (corresponding to pre and post intervention periods) to the case
where the error or noise term is autocorrelated rather than independent. It is well known
that the usual two-sample procedures are not robust against alternatives involving
autocorrelation. Moreover, in many intervention analysis applications, time series data
may be expensive or otherwise difficult to collect. In such cases, 'power functions' are
helpful, because they can be used to determine the probability that a proposed
intervention analysis application will detect a meaningful change. Power is the statistical
term used for the probability that a test will reject the null hypothesis of no change at a
given level of significance for a prescribed change. McLeod and Vingilis (2005) have
suggested power computation methods for use with time series analyses in certain
cases of intervention analysis. They have also shown that the power function helps to
compute the sample size required for intervention analysis.
9.1 Time Series Intervention Model:
An intervention model is a time series model used to explore the impact of external
factors on the series. Suppose that Y_t is a time series; the ARIMA(p,d,q) model, with
∇ = 1 − B denoting differencing, is written as:

∇^d Y_t = φ_1 ∇^d Y_{t−1} + … + φ_p ∇^d Y_{t−p} + ε_t − θ_1 ε_{t−1} − … − θ_q ε_{t−q}
where
φ = autoregressive parameter
θ = moving average parameter
d = degree of differencing
B = backshift operator
ε_t = white noise
Let φ(B) = 1 − φ_1 B − φ_2 B² − … − φ_p B^p
and θ(B) = 1 − θ_1 B − θ_2 B² − … − θ_q B^q.
Now the ARIMA model can be written as

∇^d Y_t = [θ(B)/φ(B)] ε_t
Sometimes the time series depends on the season, i.e. Y_t depends on Y_{t−s}, Y_{t−2s}, etc.,
where s is the length of the periodicity. Linearly this relation may be represented as

∇_s^D Y_t = Φ_1 Y_{t−s} + … + Φ_P Y_{t−Ps} + ε_t − Θ_1 ε_{t−s} − … − Θ_Q ε_{t−Qs}

where
Φ = seasonal autoregressive parameter
Θ = seasonal moving average parameter
D = degree of seasonal differencing (∇_s = 1 − B^s)
Let Φ(B^s) = 1 − Φ_1 B^s − Φ_2 B^{2s} − … − Φ_P B^{Ps}
and Θ(B^s) = 1 − Θ_1 B^s − Θ_2 B^{2s} − … − Θ_Q B^{Qs}.
Now the model can be written as

∇_s^D Y_t = [Θ(B^s)/Φ(B^s)] ε_t
The 'seasonal non-seasonal multiplicative' model with the above notation can be written
as

∇^d ∇_s^D Y_t = [θ(B) Θ(B^s) / (φ(B) Φ(B^s))] ε_t
The time series input-output model is of the form

Y_t = δ_1 Y_{t−1} + … + δ_r Y_{t−r} + ω_0 X_{t−b} − ω_1 X_{t−b−1} − … − ω_s X_{t−b−s} + N_t

where
X_t = exogenous variable
b = delay parameter
N_t = error term
Let δ(B) = 1 − δ_1 B − … − δ_r B^r
and ω(B) = ω_0 − ω_1 B − … − ω_s B^s.
Then the input-output model can be written as

Y_t = [ω(B)/δ(B)] B^b X_t + N_t
When N_t is generated by an ARIMA process, the input-output model is known as a
transfer function model. The ARIMA model can be extended to a Seasonal ARIMA model
in a straightforward manner, as shown above.
Intervention models are special cases of transfer function modeling in which the
exogenous variable is a deterministic categorical variable. Accordingly, an intervention
model with a Seasonal ARIMA noise process can be written as

Y_t = [ω(B)/δ(B)] B^b I_t + [θ(B) Θ(B^s) / (φ(B) Φ(B^s))] ε_t

where
I_t = dummy (intervention) variable
The term [ω(B)/δ(B)] B^b is called the intervention component, and the model may be
extended to include several intervention components and thereby to account for several
types of interventions that influence the process.
9.2. Input variables:
In general, the intervention variable takes one of three functional forms. The step function
runs from a given time until the last time period. Mathematically, the step intervention is
written as:

I_t = S_t^(T) = 0 for t < T;  1 for t ≥ T

where T is the time at which the intervention first occurred. The pulse function is the
intervention type that occurs at one particular period of time only. Mathematically, the
pulse intervention is usually written as:

I_t = P_t^(T) = 0 for t ≠ T;  1 for t = T

Apart from the pulse and step functions, there is another kind of function known as the
ramp function. Mathematically, the ramp intervention is usually written as:

I_t = R_t^(T) = 0 for t < T;  t − T + 1 for t ≥ T

To fix ideas, the illustration is given with the help of the pulse function.
Indicator coding for pulse, step and ramp functions are given in Table 3.1.
Table 3.1: Values of the intervention variable under different functions (when the
intervention occurred at the 6th time point)
Time t Pulse function It Step function It Ramp function It
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 1 1 1
7 0 1 2
8 0 1 3
9 0 1 4
10 0 1 5
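The indicator coding of Table 3.1 can be generated as follows (a minimal sketch):

```python
# Indicator coding for pulse, step and ramp interventions, reproducing
# Table 3.1 for time points t = 1..10 with the intervention at T = 6.
T = 6
ts = range(1, 11)
pulse = [1 if t == T else 0 for t in ts]
step = [1 if t >= T else 0 for t in ts]
ramp = [t - T + 1 if t >= T else 0 for t in ts]
```

The three lists match the pulse, step and ramp columns of Table 3.1 exactly.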
9.3 Graphical representation of input and various types of outputs
When the input variable takes the form of step and pulse functions, their forms can be
graphically represented as shown below in Fig-9.1 and Fig-9.2 respectively.
Fig-9.1: step input S_t^(T)        Fig-9.2: pulse input P_t^(T)
Output:
Fig-9.3: ω B S_t^(T)               Fig-9.4: [ω_1 B / (1 − δ B)] P_t^(T)
In Fig-9.3, when an input step intervention has occurred, the output pattern can be
described by a single parameter ω, and the response is abrupt-permanent; in Fig-9.4,
when a pulse intervention has occurred, the output pattern can be described by two
parameters ω_1 and δ, and the response due to the intervention is abrupt-temporary.
Fig-9.5: [ω B / (1 − δ B)] S_t^(T)     Fig-9.6: [ω_1 B / (1 − δ B) + ω_2 B / (1 − B)] P_t^(T)
In Fig-9.5 a step intervention has occurred and the response type is gradual-permanent;
in Fig-9.6 a pulse intervention has occurred and the response can be described by two
intervention components, one explaining the abrupt temporary effect and the other
explaining the permanent effect.
Fig-9.7: [ω B / (1 − B)] S_t^(T)       Fig-9.8: [ω_0 + ω_1 B / (1 − δ B) + ω_2 B / (1 − B)] P_t^(T)
In Fig-9.7 a step intervention has occurred and the response type is gradual-increasing,
and in Fig-9.8 a pulse intervention has occurred and the response can be explained by
three different intervention components.
9.4 Intervention model of null order pulse function:
The intervention model with a null (zero) order pulse function can be written as:

Y_t = ω I_t + N_t

with
Y_t = response variable at time t
ω = intervention effect on Y_t
I_t = intervention variable
N_t = ARIMA noise process fitted over the pre-intervention data period
In the above equation, the effect of I on Y is assumed to be captured by a single
intervention element. The estimated value of ω can be used to estimate the difference
between the two periods before and after the occurrence of the intervention. In general,
the effect of I on Y is categorized as temporary, gradual, permanent, or occurring after a
delay of a certain time.
9.5 Intervention model of first order pulse function:
The null order pulse function occurs when the δ parameter is zero. An alternative
approach, which accommodates a gradual sort of effect, is denoted as the first order
pulse function. If the intervention component is denoted as

Y_t* = Y_t − N_t

then an additional parameter δ is needed to define f(I_t) as follows:

Y_t* = f(I_t) = [ω B / (1 − δ B)] I_t,  such that −1 ≤ δ ≤ 1

We have

Y_t* = δ Y_{t−1}* + ω I_{t−1}

and since Y_{t−1}* = δ Y_{t−2}* + ω I_{t−2}, and so on, recursively the equation becomes:

Y_t* = ω Σ_{j=0..∞} δ^j I_{t−1−j}

If this equation is applied for k (k = 0, 1, 2, …) periods after the intervention, noting that
for a pulse at time T the indicator sequence is (0, 0, 0, …, 0, 1, 0, 0, …) with the 1 at t = T,
the equation below can be obtained:

Y*_{T+k} = ω (I_{T+k−1} + δ I_{T+k−2} + … + δ^{k−1} I_T + …) = ω δ^{k−1}
Fig-9.9 (a), (b): Intervention response with a single pulse occurring at t = T
The last equation means that the effect of the pulse vanishes gradually according to a
geometric sequence determined by the value of δ. Fig-9.9 (a) shows the Y_t* values for the
model with ω = 1, and Fig-9.9 (b) those with ω = −1, with a single pulse occurring at t = T,
for δ values satisfying 0 < δ < 1. From the figure it can also be seen that the Y_t* value
approaches zero asymptotically. In this case, as δ approaches 0, the effect of I on Y lasts
for only a minimal period. On the other hand, as δ approaches 1, the effect of I lasts
longer. In the extreme case of δ = 1, the effect of I on Y becomes permanent.
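The recursion Y*_t = δ Y*_{t−1} + ω I_{t−1} and its geometric decay can be sketched numerically (the values ω = 2, δ = 0.5 and a pulse at the 6th point are illustrative choices, not from the notes):

```python
# Sketch: first order pulse response Y*_t = delta * Y*_{t-1} + omega * I_{t-1},
# with a single pulse at index T (0-based), showing the decay omega*delta**(k-1).
omega, delta, T, n = 2.0, 0.5, 5, 12
I = [1 if t == T else 0 for t in range(n)]
Ystar = [0.0] * n
for t in range(1, n):
    Ystar[t] = delta * Ystar[t - 1] + omega * I[t - 1]
```

Here Y*_{T+1} = ω = 2, Y*_{T+2} = ωδ = 1, Y*_{T+3} = ωδ² = 0.5, and so on, matching the geometric sequence derived above.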
9.6 Procedure for intervention model building
The intervention response, written as Y_t*, in essence consists of the differences (errors)
between the original data after the intervention period and the corresponding forecasts
of the ARIMA model fitted on the pre-intervention data period.
The time series data Y_t are divided into Data I, the time series period before the
intervention occurred, Y_1t, and Data II, the time series period after the intervention
occurred, Y_2t:
Step I: The forming of the ARIMA model for the time series period before the intervention,
Y_1t. The model building, with respect to identification of the autoregressive and moving
average parameters, is done in the usual way based on the autocorrelation function (ACF)
and the partial autocorrelation function (PACF).
Step II: To identify the orders of b, r and s, the cross-correlation function is used in the
case of the transfer function model; in the case of the intervention model, however, the
cross-correlation function cannot be used, because the intervention variable is not
continuous and cross-correlations between the intervention variable and the dependent
variable are meaningless. The only way to identify the intervention model is through the
impulse response function, by matching the pattern of the estimated intervention response
against the response implied by candidate forms of the intervention component ω(B)/δ(B),
such as:

ω_0
ω_0 − ω_1 B
ω_0 / (1 − δ_1 B)
(ω_0 − ω_1 B) / (1 − δ_1 B)
Step III: The parameter estimates of the intervention model are tested using statistical
tests (such as t or z-tests; the latter will be elaborated in a subsequent section). Model
suitability is diagnosed through residual tests for adherence to white noise.
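The three steps can be sketched end-to-end on simulated data (all series and parameter values below are hypothetical; a simple AR(1) noise model and least squares estimation stand in for a full ARIMA fit):

```python
import numpy as np

# Hedged sketch of Steps I-III: fit an AR(1) to the pre-intervention data,
# form the intervention response as post-intervention data minus the
# pre-intervention model forecasts, then estimate (omega, delta) of a first
# order pulse component by least squares.
rng = np.random.default_rng(7)
n, T, phi, omega, delta = 200, 100, 0.5, 8.0, 0.6

# simulate AR(1) noise N_t plus a first order pulse response at t = T
e = 0.1 * rng.standard_normal(n)
N = np.zeros(n)
for t in range(1, n):
    N[t] = phi * N[t - 1] + e[t]
I = np.zeros(n)
I[T] = 1.0
Ystar = np.zeros(n)
for t in range(1, n):
    Ystar[t] = delta * Ystar[t - 1] + omega * I[t - 1]
Y = N + Ystar

# Step I: AR(1) coefficient from the pre-intervention data (Data I)
pre = Y[:T]
phi_hat = (pre[1:] @ pre[:-1]) / (pre[:-1] @ pre[:-1])

# intervention response: post-intervention data (Data II) minus the h-step
# forecasts of the pre-intervention model from origin T-1
h = np.arange(1, n - T + 1)
resp = Y[T:] - (phi_hat ** h) * Y[T - 1]

# Steps II-III: least squares fit of Y*_t = delta * Y*_{t-1} + omega * I_{t-1}
X = np.column_stack([resp[:-1], I[T:n - 1]])
(delta_hat, omega_hat), *_ = np.linalg.lstsq(X, resp[1:], rcond=None)
```

With a small noise level relative to the intervention effect, the recovered estimates land close to the simulated φ = 0.5, δ = 0.6 and ω = 8.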
Bibliography
Box, G.E.P., Jenkins, G.M. and Reinsel, G.C. (1994). Time Series Analysis: Forecasting
and Control. Pearson Education, Delhi.
Croxton, F.E., Cowden, D.J. and Klein, S. (1979). Applied General Statistics. Prentice
Hall of India Pvt. Ltd., New Delhi.
Makridakis, S., Wheelwright, S.C. and Hyndman, R.J. (1998). Forecasting: Methods and
Applications, 3rd Edition. John Wiley, New York.
Pankratz, A. (1983). Forecasting with Univariate Box-Jenkins Models: Concepts and
Cases. John Wiley, New York.