Characterizing the Cyclical Component
Ka-fu Wong, University of Hong Kong
Jan 22, 2016
1
2
Unobserved components model of time series
According to the unobserved components model of a time series, the series yt has three components
yt = Tt + St + Ct
Time trend
Seasonal component
Cyclical component
3
Cyclical component so far
Assuming that the trend function is a polynomial and the seasonal component can be formulated in terms of seasonal dummies,
yt = β0 + β1t + … + βpt^p + δ2D2t + … + δsDs,t + εt
Once we have estimated the trend and seasonal components (by a regression of yt on 1, t, …, t^p, D2t, …, Dst), the estimated cyclical component is simply that part of yt that is not part of the trend or seasonal, i.e.,
Ĉt = yt – β̂0 – β̂1t – … – β̂pt^p – δ̂2D2t – … – δ̂sDs,t
With respect to your time series (which are seasonally adjusted) and my time series (which is annual), the estimated cyclical component is simply the deviation of the series from the estimated trend line (curve).
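This detrending-and-deseasonalizing regression can be sketched in a few lines. The example below uses synthetic quarterly data (the trend slope, quarter effects, and noise scale are all invented for illustration); the residual from an OLS regression of yt on a constant, a trend, and seasonal dummies serves as the estimated cyclical component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic quarterly series: linear trend + quarter effects + noise.
# Every number here (trend slope 0.3, quarter effects, noise scale) is
# an invented illustration, not an estimate from real data.
T = 80
t = np.arange(1, T + 1)
season = np.tile([0.0, 1.5, -0.5, 2.0], T // 4)
y = 10 + 0.3 * t + season + rng.normal(0, 1, T)

# Regressors: intercept, trend, and dummies for quarters 2..4
# (quarter 1 is the omitted base case, so there are s - 1 = 3 dummies).
X = np.column_stack([
    np.ones(T),
    t,
    (t % 4 == 2).astype(float),   # quarter 2
    (t % 4 == 3).astype(float),   # quarter 3
    (t % 4 == 0).astype(float),   # quarter 4
])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimated cyclical component: the part of y left over after removing
# the fitted trend and seasonal components.
C_hat = y - X @ beta_hat
```

Because the regression includes an intercept, the estimated cyclical component sums to zero by construction.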
4
Possible to predict the cyclical components
So far we have been assuming that the cyclical component is made up of the outcomes of drawings of i.i.d. random variables, which is simple but not very plausible.
Generally, there is some predictability in the cyclical component.
5
Cyclical movement
6
Possible to predict the cyclical components
We would like to model the cyclical component in a way that can then be applied to help us forecast that component.
The cyclical component of a time series is, in contrast to our assumptions about the trend and seasonal components, a “stochastic” (vs. “deterministic”) process. That is, its time path is, to some extent, fundamentally unpredictable. Our model of that component must take this into account.
It turns out the cyclical component of a time series can be modeled as a covariance stationary time series.
7
The underlying probabilistic structure
Consider our data sample, y1,…,yT
Imagine that these observed y’s are the outcomes of drawings of random variables.
In fact, we imagine that this data sample, or sample path, is just part of a sequence of drawings of random variables that goes back infinitely far into the past and infinitely far forward into the future. In other words, “nature” has drawn values of yt for t = 0, ±1, ±2, … :
{…,y-2,y-1,y0,y1,y2,…}
This entire set of drawings is called a realization of the time series. The data we observe, {y1,…,yT} is part of a realization of the time series.
(This is a model. It is not supposed to be interpreted literally!)
8
The underlying probabilistic structure
We want to describe the underlying probabilistic structure that generated this realization and, more important, the probabilistic structure that governs that part of the realization that extends beyond the end of the sample period (since that is part of the realization that we want to forecast).
Based on the sample path, we can estimate the probabilistic structure that generated the sample. If the key elements of this probabilistic structure remain fixed over time, we can draw inferences about the probabilistic structure that will generate future values of the series.
9
Example: Stable Probabilistic structure
Suppose the time series is a sequence of 0’s and 1’s corresponding to the outcomes of a sequence of coin tosses, with H = 0 and T = 1: {…0,0,1,0,1,1,1,0,…}
What is the probability that at time T+1 the value of the series will be equal to 0?
If the same coin is being tossed for every t, then the probability of tossing an H at time T+1 is the same as the probability of tossing an H at times 1,2,…,T. What is that probability? A good estimate would be the number of H’s observed at times 1,…,T divided by T. (By assuming that future probabilities are the same as past probabilities we are able to use the sample information to draw inferences about those probabilities.)
If a different coin will be tossed at T+1 than the one that was tossed in the past, our data sample will be of no help in estimating the probability of an H at T+1. All we can do is make a blind guess!
10
Covariance stationarity
Covariance stationarity refers to a set of restrictions/conditions on the underlying probability structure of a time series that has proven to be especially valuable for the purpose of forecasting.
A time series, yt, is said to be covariance stationary if it meets the following conditions:
1. Constant mean
2. Constant (and finite) variance
3. Stable autocovariance function
11
1. Constant mean
Eyt = μ {vs. μt} for all t. That is, for each t, yt is drawn from a population with the same mean.
Consider, for example, a sequence of coin tosses and set yt = 0 if H at t and yt = 1 if T at t. If Prob(T)=p for all t, then Eyt = p for all t.
Caution – the conditions that define covariance stationarity refer to the underlying probability distribution that generated the data sample rather than to the sample itself. However, the best we can do to assess whether these conditions hold is to look at the sample and consider the plausibility that this sample was drawn from a stationary time series.
12
2. Constant Variance
Var(yt) = E[(yt – μ)²] = σ² {vs. σt²} for all t.
That is, the dispersion of the value of yt around its mean is constant over time.
A sample in which the dispersion of the data around the sample mean seems to be increasing or decreasing over time is not likely to have been drawn from a time series with a constant variance.
Consider, for example, a sequence of coin tosses and set yt = 0 if H at t and yt = 1 if T at t. If Prob(T)=p for all t, then, for all t
Eyt = p and
Var(yt) = E[(yt – p)²] = p(1-p)² + (1-p)p² = p(1-p).
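The mean and variance formulas for the coin-toss series are easy to check by simulation. In this sketch, p = 0.3 is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)

# Coin-toss series: y_t = 1 with probability p (a Tail), 0 otherwise.
# p = 0.3 is an arbitrary illustrative choice.
p = 0.3
y = (rng.random(100_000) < p).astype(float)

mean_hat = y.mean()   # should be close to E(y_t) = p
var_hat = y.var()     # should be close to Var(y_t) = p * (1 - p)
```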
13
Digression on covariance and correlation
Recall that Cov(X,Y) = E[(X-EX)(Y-EY)], and Corr(X,Y) = Cov(X,Y)/[Var(X)Var(Y)]1/2
measure the relationship between the random variables X and Y.
A positive covariance means that when X > EX, Y will tend to be greater than EY (and vice versa).
A negative covariance means that when X > EX, Y will tend to be less than EY (and vice versa).
14
Digression on covariance and correlation
Cov(X,Y) = E[(X-EX)(Y-EY)], and Corr(X,Y) = Cov(X,Y)/[Var(X)Var(Y)]1/2
The correlation between X and Y will have the same sign as the covariance but its value will lie between -1 and 1. The stronger the relationship between X and Y, the closer their correlation will be to 1 (or, in the case of negative correlation, -1).
If the correlation is 1, X and Y are perfectly positively correlated. If the correlation is -1, X and Y are perfectly negatively correlated. X and Y are uncorrelated if the correlation is 0.
Remark: Independent random variables are uncorrelated. Uncorrelated random variables are not necessarily independent.
15
3. Stable autocovariance function
The autocovariance function of a time series refers to covariances of the form:
Cov(yt,ys) = E[(yt - Eyt)( ys – Eys)]
i.e., the covariance between the drawings of yt and ys.
Note that Cov(yt,yt) = Var(yt) and Cov(yt,ys) = Cov(ys,yt)
For instance, the autocovariance at displacement 1 Cov(yt,yt-1) measures the relationship between yt and yt-1.
We expect that Cov(yt,yt-1) > 0 for most economic time series: if an economic time series is greater than normal in one period it is likely to be above normal in the subsequent period – economic time series tend to display positive first order correlation.
16
3. Stable autocovariance function
In the coin toss example (yt = 0 if H, yt = 1 if T), what is
Cov(yt,yt-1)?
17
3. Stable autocovariance function
Suppose that
Cov(yt,yt-1) = γ(1) for all t
where γ(1) is some constant.
That is, the autocovariance at displacement 1 is the same for all t:
…Cov(y2,y1)=Cov(y3,y2)=…=Cov(yT,yT-1)=…
In this special case, we might also say that the autocovariance at displacement 1 is stable over time.
18
3. Stable autocovariance function
The third condition for covariance stationarity is that the autocovariance function is stable at all displacements. That is,
Cov(yt,yt-τ) = γ(τ) for all integers t and τ
The covariance between yt and ys depends on t and s only through t – s (how far apart they are in time), not on t and s themselves (where they are in time). For example,
Cov(y1,y3) = Cov(y2,y4) = … = Cov(yT-2,yT) = γ(2)
and so on.
Stationarity means if we break the entire time series up into different segments, the general behavior of the series looks roughly the same for each segment.
19
3. Stable autocovariance function
Remark #1: Note that stability of the autocovariance function (condition 3) actually implies a constant variance (condition 2). {Set τ = 0.}
Remark #2: γ(τ) = γ(-τ), since γ(τ) = Cov(yt,yt-τ) = Cov(yt-τ,yt) = γ(-τ).
Remark #3: If yt is an i.i.d. sequence of random variables then it is a covariance stationary time series. {The “identical distribution” means that the mean and variance are the same for all t. The independence assumption means that γ(τ) = 0 for all nonzero τ, which implies a stable autocovariance function.}
20
3. Stable autocovariance function
The conditions for covariance stationarity are the main conditions we will need regarding the stability of the probability structure generating the time series in order to be able to use the past to help us predict the future.
Remark: It is important to note that these conditions do not imply that the y’s are identically distributed (or independent). They do not place restrictions on third, fourth, and higher order moments (skewness, kurtosis, …).
[Covariance stationarity only restricts the first two moments and so it is also referred to as “second-order stationarity.”]
21
Autocorrelation Function
The stationary time series yt has, by definition, a stable autocovariance function
Cov(yt,yt-τ) = γ(τ), for all integers t and τ
Since for any pair of random variables X and Y
Corr(X,Y) = Cov(X,Y)/[Var(X)Var(Y)]^1/2
and since Var(yt) = Var(yt-τ) = γ(0),
it follows that
Corr(yt,yt-τ) = γ(τ)/γ(0) ≡ ρ(τ) for all integers t and τ.
We call ρ(τ) the autocorrelation function.
22
Autocorrelation Function
The autocorrelation and autocovariance functions provide the same information about the relationship of the y’s across time, but in different ways. It turns out that the autocorrelation function is more useful in applications of this theory, such as ours.
The way we will use the autocorrelation function: each member of the class of candidate models will have a unique autocorrelation function that will identify it, much like a fingerprint or DNA uniquely identifies an individual in a population of humans.
We will estimate the autocorrelation function of the series (i.e., get a partial fingerprint) and use that to help us select the appropriate model.
23
Example of autocorrelation function
One-sided gradual damping
24
Example of autocorrelation function
Non-damping
25
Example of autocorrelation function
Gradual damped oscillation
26
Example of autocorrelation function
Sharp cut off
27
The Partial Autocorrelation Function
Consider ρ(2), Corr(yt,yt-2). This correlation can be thought of as having two parts. First, yt-2 is correlated with yt-1 which, in turn, is correlated with yt: the indirect effect of yt-2 on yt through yt-1. Second, yt-2 has its own direct effect on yt.
The partial autocorrelation function expresses the correlation between yt and yt-τ after controlling for the effects of the correlations between yt and yt-1,…,yt-τ+1. p(τ) is the coefficient of yt-τ in a population linear regression of yt on yt-1, yt-2,…,yt-τ.
28
White Noise
29
White Noise Processes
Recall that covariance stationary processes are time series, yt, such that
E(yt) = μ for all t
Var(yt) = σ² for all t, σ² < ∞
Cov(yt,yt-τ) = γ(τ) for all t and τ
An example of a covariance stationary process is an i.i.d. sequence. The “identical distribution” property means that the series has a constant mean and a constant variance. The “independent” property means that γ(1) = γ(2) = … = 0. [γ(0) = σ².]
30
White Noise Processes
A time series yt is a white noise process if:
E(yt) = 0 for all t
Var(yt) = σ² for all t, σ² < ∞
Cov(yt,ys) = 0 if t ≠ s
That is, a white noise process is a serially uncorrelated, zero-mean, constant and finite variance process. In this case we often write yt ~ WN(0,σ²).
If yt ~ WN(0,σ²), then
γ(τ) = σ² if τ = 0, and γ(τ) = 0 if τ ≠ 0
ρ(τ) = 1 if τ = 0, and ρ(τ) = 0 if τ ≠ 0
31
Autocorrelation of White Noise
ρ(τ) ≡ γ(τ)/γ(0)
32
Partial autocorrelation of White Noise
p(τ) is the coefficient of yt-τ in a population linear regression of yt on yt-1, yt-2,…,yt-τ.
33
White Noise Processes
Remark #1: Technically, an i.i.d. process (with a finite variance) is a white noise process but a white noise process is not necessarily an i.i.d. process (since the y’s are not necessarily identically distributed or independent). However, for most of our purposes we will use these two interchangeably.
Remark #2: In regression models we often assume that the regression errors are zero-mean, homoskedastic, and serially uncorrelated random variables, i.e., they are white noise errors.
34
White Noise Processes
In our study of the trend and seasonal components, we assumed that the cyclical component of the series was a white noise process and, therefore, was unpredictable. That was a convenient assumption at the time.
We now want to allow the cyclical component to be a serially correlated process, since our graphs of the deviations of our seasonally adjusted series from their estimated trends indicate that these deviations do not behave like white noise.
However, the white noise process will still be very important to us: It forms the basic building block for the construction of more complicated time series.
35
Estimating the Autocorrelation Function
Soon we will specify the class of models that we will use for covariance stationary processes. These models (ARMA and ARIMA models) are built up from the white noise process. We will use the estimated autocorrelation and partial autocorrelation functions of the series to help us select the particular model that we will estimate to help us forecast the series.
How to estimate the autocorrelation function?
The principle we use is referred to in your textbook as the analog principle. The analog principle, which turns out to have a sound basis in statistical theory, is to estimate population moments by the analogous sample moment, i.e., replace expected values with analogous sample averages.
36
The analog principle
For instance, since the yt’s are assumed to be drawn from a distribution with the same mean, μ, the analog principle directs us to use the sample mean to estimate the population mean:
μ̂ = (1/T) Σt=1..T yt
Similarly, to estimate σ² = Var(yt) = E[(yt – μ)²], the analog principle directs us to replace the expected value with the sample average, i.e.,
σ̂² = (1/T) Σt=1..T (yt – μ̂)²
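As a quick illustration of the analog principle (the data are simulated, so the particular numbers are only illustrative), note that these are exactly the sample mean and the 1/T-divisor sample variance:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(5.0, 2.0, size=1_000)   # a simulated sample; numbers illustrative

T = len(y)
mu_hat = y.sum() / T                        # sample analog of E(y_t)
sigma2_hat = ((y - mu_hat) ** 2).sum() / T  # divides by T, not T - 1

# NumPy's defaults use the same 1/T convention (ddof=0), so these agree:
assert np.isclose(mu_hat, y.mean())
assert np.isclose(sigma2_hat, y.var())
```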
37
Estimating the autocorrelation function
The autocorrelation function at displacement τ is
ρ(τ) = E[(yt – μ)(yt-τ – μ)] / E[(yt – μ)²]
The analog principle directs us to estimate ρ(τ) by using its sample analog:
ρ̂(τ) = [(1/T) Σt=τ+1..T (yt – μ̂)(yt-τ – μ̂)] / [(1/T) Σt=1..T (yt – μ̂)²]
     = [Σt=τ+1..T (yt – μ̂)(yt-τ – μ̂)] / [Σt=1..T (yt – μ̂)²]
We call ρ̂(τ) the sample autocorrelation function or the correlogram of the series.
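A minimal hand-rolled version of this correlogram might look as follows; the AR(1)-style series used to exercise it is invented for the example.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelation function (correlogram).

    Follows the textbook convention: the numerator sums T - tau terms,
    and the 1/T factors in numerator and denominator cancel.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    mu = y.mean()
    denom = ((y - mu) ** 2).sum()
    rho = np.empty(max_lag + 1)
    for tau in range(max_lag + 1):
        rho[tau] = ((y[tau:] - mu) * (y[:T - tau] - mu)).sum() / denom
    return rho

# An AR(1)-style test series (invented for illustration) should show
# positive, gradually damping autocorrelation.
rng = np.random.default_rng(3)
e = rng.normal(size=500)
y = np.empty(500)
y[0] = e[0]
for t in range(1, 500):
    y[t] = 0.8 * y[t - 1] + e[t]

rho = sample_acf(y, 5)   # rho[0] is always exactly 1
```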
38
Remarks
Remark #1: The autocorrelation function and sample autocorrelation function will always be equal to one for τ = 0. For τ ≠ 0, the absolute values of the autocorrelations will be less than one.
Remark #2: The summation in the numerator of the sample autocorrelation function begins with t = τ + 1 (rather than t = 1). {Why? Consider, e.g., the sample autocorrelation at displacement 1. If it started at t = 1, what would we use for y0, since our sample begins at y1?}
Remark #3: The summation in the numerator of the τ-th sample autocorrelation coefficient is the sum of T – τ terms, but we divide the sum by T to compute the “average”. This is partly justified by statistical theory and partly a matter of convenience. For large values of T – τ, whether we divide by T or T – τ will have no practical effect.
39
The Sample Autocorrelation Function for a White Noise Process
Suppose that yt is a white noise process (i.e., yt is a zero-mean, constant variance, and serially uncorrelated process). We know that the population autocorrelation function, ρ(τ), will be zero for all nonzero τ. What will the sample autocorrelation function look like?
For large samples,
ρ̂(τ) ~ N(0, 1/T), i.e., √T ρ̂(τ) ~ N(0, 1)
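This large-sample result is easy to verify by Monte Carlo. The simulation below (sample size and replication count are arbitrary illustrative choices) checks that the lag-1 sample autocorrelations of white-noise realizations have a standard deviation of roughly 1/√T:

```python
import numpy as np

rng = np.random.default_rng(4)
T, n_reps = 200, 2_000   # arbitrary illustrative choices

# Lag-1 sample autocorrelation for each of n_reps white-noise realizations.
rho1 = np.empty(n_reps)
for i in range(n_reps):
    y = rng.normal(size=T)
    mu = y.mean()
    rho1[i] = ((y[1:] - mu) * (y[:-1] - mu)).sum() / ((y - mu) ** 2).sum()

# The theory says rho1 is approximately N(0, 1/T): mean near zero and
# standard deviation near 1/sqrt(200), about 0.071.
```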
40
Testing ρ(τ) = 0
Suppose, for large samples,
ρ̂(τ) ~ N(0, 1/T), i.e., √T ρ̂(τ) ~ N(0, 1)
When will we reject the null that ρ(τ) = 0? When the sample estimate of ρ(τ) appears too far from 0, i.e., when |√T ρ̂(τ)| is too large.
41
The Sample Autocorrelation Function for a White Noise Process
This result means that if yt is a white noise process then, for any given τ, ρ̂(τ) should lie in the interval [-2/√T, 2/√T] for 95% of the realizations of this time series.
That is, for a white noise process, 95% of the time ρ̂(τ) will lie within the two-standard-error band around 0.
{The “2” comes in because it is approximately the 97.5 percentile of the N(0,1) distribution; 1/√T comes in because it is the approximate standard deviation of ρ̂(τ).}
42
Testing White noise
This result allows us to check whether a particular displacement has a statistically significant sample autocorrelation.
For example, if |ρ̂(1)| > 2/√T, then we would likely conclude that the evidence of first-order autocorrelation appears to be too strong for the series to be a white noise series.
43
Testing white noise
However, it is not reasonable to say that you will reject the white noise hypothesis if any of the rho-hats falls outside of the two-standard error band around zero.
Why not? Because even if the series is a white noise series, we expect some of the sample autocorrelations to fall outside that band – the band was constructed so that most of the sample autocorrelations would typically fall within the band if the time series is a realization of a white noise process.
A better way to conduct a general test of the null hypothesis of a zero autocorrelation function (i.e., white noise) against the alternative of a nonzero autocorrelation function is to conduct a Q-test.
44
Testing white noise
Under the null hypothesis that yt is a white noise process, the Box-Pierce Q-statistic
QBP = T Στ=1..m ρ̂(τ)² ~ χ²(m)
for large T.
So, we reject the joint hypothesis H0: ρ(1) = 0, …, ρ(m) = 0 against the alternative that at least one of ρ(1),…,ρ(m) is nonzero at the 5% (10%, 1%) test size if QBP is greater than the 95th percentile (90th percentile, 99th percentile) of the χ²(m) distribution.
45
How to choose m?
Suppose, for example, we set m = 10 and it turns out that there is autocorrelation in the series but it is primarily due to autocorrelation at displacements larger than 10. Then, we are likely to incorrectly conclude that our series is the realization of a white noise process.
On the other hand, if, for example we set m = 50 and there is autocorrelation in the series but only because of autocorrelation at displacements 1,2, and 3, we are likely to incorrectly conclude that our series is a white noise process.
So, selecting m is a balance of two competing concerns. Practice has suggested that a reasonable rule of thumb is to select m = T^(1/2), i.e., m = √T.
46
The Ljung-Box Q-Statistic
The Box-Pierce Q-statistic has a χ²(m) distribution provided that the sample is sufficiently large.
It turns out that this approximation does not work very well for “moderate” sample sizes.
The Ljung-Box Q-statistic makes an adjustment to the B-P statistic to make it work better in finite-sample settings without affecting its performance in large samples: QLB = T(T+2) Στ=1..m ρ̂(τ)²/(T – τ), which is also distributed χ²(m) under the null.
The Q-statistic reported in EViews is the Ljung-Box statistic.
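Both Q-statistics can be computed directly from the sample autocorrelations. The sketch below hand-rolls them rather than relying on any particular package; the series, the sample size, and the hard-coded χ² critical value (for the illustrative choice m = 20) are all assumptions for the example.

```python
import numpy as np

def box_pierce_ljung_box(y, m):
    """Box-Pierce and Ljung-Box Q-statistics for displacements 1..m."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    mu = y.mean()
    denom = ((y - mu) ** 2).sum()
    q_bp = 0.0
    q_lb = 0.0
    for tau in range(1, m + 1):
        rho = ((y[tau:] - mu) * (y[:T - tau] - mu)).sum() / denom
        q_bp += rho ** 2               # Box-Pierce term: rho(tau)^2
        q_lb += rho ** 2 / (T - tau)   # Ljung-Box term: rho(tau)^2 / (T - tau)
    return T * q_bp, T * (T + 2) * q_lb

rng = np.random.default_rng(5)
y = rng.normal(size=400)         # a white-noise realization
m = int(np.sqrt(len(y)))         # rule-of-thumb choice m = sqrt(T) = 20

q_bp, q_lb = box_pierce_ljung_box(y, m)

# The 95th percentile of chi-square(20) is about 31.41; a true white-noise
# series produces a statistic above that only about 5% of the time.
```

EViews (and, e.g., `statsmodels` in Python) report the Ljung-Box version; for a strongly autocorrelated series both statistics blow up far past the critical value.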
47
Estimating the Partial Autocorrelation Function
The partial a.c. function p(1), p(2), … is estimated through a sequence of “autoregressions”:
p(1): Regress yt on 1, yt-1 to obtain coefficient estimates β̂0, β̂1; then p̂(1) = β̂1.
p(2): Regress yt on 1, yt-1, yt-2 to obtain β̂0, β̂1, β̂2; then p̂(2) = β̂2.
and so on.
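This sequence of autoregressions is straightforward to implement with ordinary least squares. The helper below, and the AR(1) test series used with it, are illustrative sketches:

```python
import numpy as np

def sample_pacf(y, max_lag):
    """Estimate p(1), ..., p(max_lag) by the sequence of autoregressions:
    p(k) is the coefficient on y_{t-k} when y_t is regressed on a
    constant and y_{t-1}, ..., y_{t-k}."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    pacf = np.empty(max_lag)
    for k in range(1, max_lag + 1):
        # Regressors: [1, y_{t-1}, ..., y_{t-k}] for t = k, ..., T-1.
        X = np.column_stack(
            [np.ones(T - k)] + [y[k - j:T - j] for j in range(1, k + 1)]
        )
        beta, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        pacf[k - 1] = beta[-1]   # coefficient on the longest lag, y_{t-k}
    return pacf

# Illustrative AR(1) series: its PACF should be near 0.7 at displacement 1
# and near zero at higher displacements.
rng = np.random.default_rng(6)
e = rng.normal(size=1_000)
y = np.empty(1_000)
y[0] = e[0]
for t in range(1, 1_000):
    y[t] = 0.7 * y[t - 1] + e[t]

pacf = sample_pacf(y, 3)
```

The sharp cut-off of the PACF after displacement 1 is exactly the kind of “fingerprint” the slides describe: it identifies an AR(1) model.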
48
End