Seasonal Unit Root Tests in Long Periodicity Cases David A. Dickey North Carolina State University U v Tilburg, Tinbergen Inst. 2011
Jan 11, 2016
Some models:
1. Regression with time series errors
   Y(t) = a + bt + seasonal effects + Z(t), with Z(t) a stationary time series
   Seasonal effects: sinusoids, seasonal dummy variables
2. Dynamic seasonal models
   Y(t) = Y(t-d) + e(t)   (copy of last season)
   Y(t) = Y(t-d) + e(t) - θ e(t-d)   (EWMA of past seasons)
   Y(t) = Y(t-1) + [Y(t-d) - Y(t-d-1)] + Z(t), with either
      Z(t) = e(t)   ("cut and paste"), or
      Z(t) = e(t) - θ1 e(t-1) - θ2 e(t-d) + θ1 θ2 e(t-d-1) = (1 - θ1 B)(1 - θ2 B^d) e(t)   ("airline")
Y(t) = Y(t-1) + [Y(t-d)-Y(t-d-1)] (+ e(t))
Y(t) = 10 + t + 8X3 - 8X5 - 5X8 - 5X9 - 5X10 (+ e(t))
Summary:
1. Both models can give the same predictions for pure trend + seasonal functions.
2. For data, the lag model looks back 1 year and ignores (or discounts) others. Good for slowly changing seasonality.
3. For data, the dummy variable model weights all years equally. Good for very regular seasonality.
4. Differences in forecast errors too!
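Summary point 1 can be checked numerically. The following is a minimal sketch, assuming a period of d = 12 (the dummy-variable example uses seasons up to X10, so the period is a guess) and NumPy; the names `D`, `EFFECTS` and `dummy_model` are hypothetical. It evaluates the deterministic dummy-variable function 10 + t + 8X3 - 8X5 - 5X8 - 5X9 - 5X10 and checks that the noise-free lag recursion reproduces it once d + 1 starting values are supplied.

```python
import numpy as np

D = 12  # assumed period; not stated on the slide
# Seasonal effects from the dummy-variable example; all other seasons are 0.
EFFECTS = {3: 8.0, 5: -8.0, 8: -5.0, 9: -5.0, 10: -5.0}

def dummy_model(t):
    """Trend plus seasonal dummies: 10 + t + effect of the season of time t."""
    season = (t - 1) % D + 1              # seasons numbered 1..D
    return 10.0 + t + EFFECTS.get(season, 0.0)

n = 5 * D
truth = np.array([dummy_model(t) for t in range(1, n + 1)])

# Lag model without noise, started from the first D + 1 values of the series:
# Y(t) = Y(t-1) + [Y(t-d) - Y(t-d-1)]
lagged = truth.copy()
for t in range(D + 1, n):                 # 0-based positions in the arrays
    lagged[t] = lagged[t - 1] + (lagged[t - D] - lagged[t - D - 1])

max_gap = float(np.max(np.abs(lagged - truth)))
print(max_gap)
```

The two forecasts agree because the change from t-1 to t in a trend-plus-seasonal function equals the change one period earlier; with noisy data the two models weight past years differently, as points 2 and 3 say.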
Natural gas, a colorless, odorless, gaseous hydrocarbon, may be stored in a number of different ways. It is most commonly held in inventory underground under pressure in three types of facilities: (1) depleted reservoirs in oil and/or gas fields, (2) aquifers, and (3) salt cavern formations. (Natural gas is also stored in liquid form in above-ground tanks.)
Weekly natural gas data – unit root forecast
Weekly natural gas data – seasonal dummy variable forecast
A general seasonal model:
$Y_t - f(t) = \rho\,(Y_{t-d} - f(t-d)) + e_t$
$(1 - \rho B^d)(Y_t - f(t)) = e_t$
f(t) = deterministic components
H0: $\rho = 1$
Under H0, period d functions are annihilated: $(1 - B^d)\,g(t) = 0$ whenever $g(t) = g(t-d)$.
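Normalized estimator, writing $Y_{i,s} = Y_{d(i-1)+s}$ for season $s$ of year $i$, with $Y_{0,s} = 0$ and $e_{i,s} = Y_{i,s} - Y_{i-1,s}$:
$$m\sqrt{d}\,(\hat\rho - 1) = \frac{\dfrac{1}{\sqrt{d}\,m}\displaystyle\sum_{s=1}^{d}\sum_{i=1}^{m} Y_{i-1,s}\,e_{i,s}}{d^{-1}m^{-2}\displaystyle\sum_{s=1}^{d}\sum_{i=1}^{m} Y_{i-1,s}^{2}}$$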
Use double subscripts. Quarterly (d=4) example:

           s=1          s=2          s=3          s=4
Year 1     Y1=e1        Y2=e2        Y3=e3        Y4=e4
           (Y1,1)       (Y1,2)       (Y1,3)       (Y1,4)
Year 2     Y5=e5+e1     Y6=e6+e2     Y7=e7+e3     Y8=e8+e4
           (Y2,1)       (Y2,2)       (Y2,3)       (Y2,4)

Numerator is (sum of d terms)/d^(1/2)
Denominator is (sum of d terms)/d
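The double-subscript layout maps directly onto a two-dimensional array. The sketch below is a hedged illustration with NumPy, not the author's code: the function name `seasonal_sums` is hypothetical, and it assumes the start-up convention Y(0,s) = 0 implied by Y1 = e1 in the quarterly example. It returns the normalized numerator, the normalized denominator, and their ratio m d^(1/2) (estimate - 1).

```python
import numpy as np

def seasonal_sums(y, d):
    """Normalized numerator, denominator and estimator for a series of m*d values."""
    y = np.asarray(y, dtype=float)
    m = len(y) // d
    Y = y[: m * d].reshape(m, d)                 # Y[i-1, s-1] holds Y(i, s)
    lag = np.vstack([np.zeros(d), Y[:-1]])       # Y(i-1, s), taking Y(0, s) = 0
    e = Y - lag                                  # seasonal differences e(i, s)
    numerator = np.sum(lag * e) / (np.sqrt(d) * m)
    denominator = np.sum(lag ** 2) / (d * m ** 2)
    return numerator, denominator, numerator / denominator

# Quarterly layout from the slide with illustrative shocks e1..e8 = 1..8:
# Y1..Y4 = 1, 2, 3, 4 and Y5..Y8 = 5+1, 6+2, 7+3, 8+4.
num, den, ratio = seasonal_sums([1, 2, 3, 4, 6, 8, 10, 12], d=4)
print(num, den, ratio)
```

Each column of the reshaped array is one seasonal "channel", so the sums over s are sums of d independent ordinary unit-root numerators and denominators.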
Known unit root facts:
(1) Moments (d=1 case or individual terms), error variance 1:
    E{numerator} = 0
    Var{numerator} = E{denominator} = (m-1)/(2m) → 1/2
    Var{denominator} = (m-1)(m^2-m+1)/(3m^3) → 1/3
    Covariance = (m-1)(m-2)/(3m^2) → 1/3
(2) Studentized statistic is asymptotically equivalent to (numerator sum) / (denominator sum)^(1/2)
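The moment formulas in (1) can be verified exactly for small m. This sketch assumes Gaussian errors with variance 1 and the convention that the lagged level starts at 0; the function name `exact_moments` is hypothetical. It uses three standard Gaussian facts: Var(S_{i-1} e_i) = i - 1, Cov(S_j^2, S_k^2) = 2 min(j,k)^2, and the identity sum S_{i-1} e_i = (S_m^2 - sum e_i^2)/2.

```python
from fractions import Fraction

def exact_moments(m):
    """Exact moments of N = (1/m) sum S_{i-1} e_i and D = (1/m^2) sum S_{i-1}^2,
    i = 1..m, where S_k = e_1 + ... + e_k, S_0 = 0 and the e_i are iid N(0, 1)."""
    ks = range(1, m)                        # S_1 .. S_{m-1} are the nonzero lagged levels
    var_num = Fraction(sum(ks), m ** 2)     # terms S_{i-1} e_i are uncorrelated
    mean_den = Fraction(sum(ks), m ** 2)    # E(S_k^2) = k
    var_den = Fraction(sum(2 * min(j, k) ** 2 for j in ks for k in ks), m ** 4)
    # Cov(S_m^2, S_k^2) = 2k^2 and Cov(sum e_i^2, S_k^2) = 2k for k < m
    cov = Fraction(sum(2 * k * k - 2 * k for k in ks), 2 * m ** 3)
    return var_num, mean_den, var_den, cov

for m in (2, 5, 10, 50):
    var_num, mean_den, var_den, cov = exact_moments(m)
    print(m, float(var_num), float(var_den), float(cov))
```

For each m the exact sums match the closed forms on the slide, and the printed values approach 1/2, 1/3 and 1/3 as m grows.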
Basic idea is simple:
Large d numerator approximately normal
Large d denominator converges to E{denominator}
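In double-subscript notation ($Y_{i,s} = Y_{d(i-1)+s}$, $Y_{0,s} = 0$):
$$\frac{1}{\sqrt{d}\,m}\sum_{s=1}^{d}\sum_{i=1}^{m} Y_{i-1,s}\,e_{i,s} \xrightarrow{D} N\!\left(0, \tfrac{1}{2}\right)$$
$$d^{-1}\sum_{s=1}^{d}\Big[\sum_{i=1}^{m} Y_{i-1,s}^{2}\Big]/m^{2} \xrightarrow{P} \tfrac{1}{2}$$
$$\text{ratio} = m\sqrt{d}\,(\hat\rho - 1) \xrightarrow{D} N(0, 2)$$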
t statistic: with the numerator converging in distribution to N(0, 1/2) and the denominator converging in probability to 1/2,
$$\text{numerator}/\sqrt{\text{denominator}} \xrightarrow{D} N(0, 1)$$
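Alternative approximation (large m, normal errors; $Y_{i,s} = Y_{d(i-1)+s}$, $Y_{0,s} = 0$):
$$\frac{1}{\sqrt{d}\,m}\sum_{s=1}^{d}\sum_{i=1}^{m} Y_{i-1,s}\,e_{i,s} \;\sim\; \frac{1}{2}\,(\chi^2_d - d)/\sqrt{d}$$
for
$$\frac{1}{m}\sum_{i=1}^{m} Y_{i-1,s}\,e_{i,s} = \frac{1}{2}\Big[\frac{Y_{m,s}^{2}}{m} - \frac{1}{m}\sum_{i=1}^{m} e_{i,s}^{2}\Big]$$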
[Figure: CDFs (SAS), d=4 and N(0,1); axis marks at -1.645, 0, 1.645]
[Figure: CDFs, d=4 m d^(1/2) (estimate - 1) and N(0,2); axis marks at -2.386, 0, 2.386]
Improving the Normal Approximation:
Older JASA paper (Dickey, Hasza, Fuller) gives the limit distribution for the studentized statistic (d=12):
5th %ile = -1.80, 95th %ile = 1.52, 50th %ile = -0.14 (Note: (1.52-1.80)/2 = -0.14 !!)
Difference: 1.52+1.80 = 3.32, 2(1.645) = 3.29 (close !!)
Suggestion: shift by median.
CLT limit distribution median is 0.
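Taylor:
N = mean of the d numerator terms N_i, $N \sim \frac{1}{2d}(\chi^2_d - d)$
D = mean of the d denominator terms D_i, D ~ ?(1/2, ·): mean 1/2, distribution not specified
$\mathrm{Cov}(N, D) \approx 1/(3d)$ (approx.) [Dickey, 1976]
$$\sqrt{d}\,N/\sqrt{D} = 0 + \sqrt{2}\,\sqrt{d}\,N + 0 - \sqrt{2}\,\sqrt{d}\,N\,(D - 1/2) + \text{remainder}$$
$$E\{\sqrt{d}\,N/\sqrt{D}\} \approx 0 + 0 - \sqrt{2}\,\sqrt{d}\;\mathrm{Cov}(N, D) = -(\sqrt{2}/3)/\sqrt{d} = -0.4714/\sqrt{d}$$
(could use $-1/(2\sqrt{d})$)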
Median as function of seasonality d:
1. Get medians for d=2, 4, 12 from DHF
2. Plot median vs. d^(-1/2) (d=2, 4, 12, limit)
3. Regress median on d^(-1/2)
Slope very close to 1/2 in magnitude (the medians are negative, so the fitted slope is about -1/2). Intercept very close to 0.
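The regression of the median on d^(-1/2) is easy to reproduce. This is a small sketch using NumPy and the rounded medians quoted in these slides (-0.35, -0.24, -0.14 for d = 2, 4, 12 and 0 for the limit); using the rounded values rather than the original DHF tables is an assumption made here.

```python
import numpy as np

d = np.array([2.0, 4.0, 12.0])
x = np.append(d ** -0.5, 0.0)                 # d^(-1/2); the limit d -> infinity gives 0
med = np.array([-0.35, -0.24, -0.14, 0.00])   # medians quoted in the slides
slope, intercept = np.polyfit(x, med, 1)      # least-squares line med = slope * x + intercept
print(round(float(slope), 3), round(float(intercept), 3))
```

With these four points the fitted slope comes out near -0.49 and the intercept near 0, which is what motivates the shift of -1/(2 d^(1/2)).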
Median Shifts and Tau Percentiles.
d      med     -1/(2√d)    p01        p025       p05        p10
2      -0.35   -0.35355    -2.67990   -2.31352   -1.99841   -1.63510
4      -0.24   -0.25000    -2.57635   -2.20996   -1.89485   -1.53155
12     -0.14   -0.14434    -2.47069   -2.10430   -1.78919   -1.42589
inf     0.00    0          -2.32685   -1.96046   -1.64535   -1.28205
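The finite-d rows of the table are N(0,1) percentiles shifted by -1/(2 d^(1/2)). A minimal sketch with the Python standard library follows; the function name `shifted_percentile` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def shifted_percentile(p, d):
    """N(0,1) percentile shifted by the approximate median -1/(2*sqrt(d))."""
    return NormalDist().inv_cdf(p) - 1.0 / (2.0 * sqrt(d))

for d in (2, 4, 12):
    row = [round(shifted_percentile(p, d), 5) for p in (0.01, 0.025, 0.05, 0.10)]
    print(d, row)
```

The rows for d = 2, 4 and 12 agree with the table to five decimals; the table's last row lies within 0.0005 of the unshifted normal percentiles.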
Simulation Evidence
• m = 100, various d values
• 2 sets of 40,000 t statistics at each (m,d)
• e.g. d=365 and m=100 (daily data, 100 years)
  – 36500 x 40000 = 1.46 billion generated data points
  – SAS, 10 minutes run time
  – Overlay percentiles (adjusted t) on N(0,1)
  – The duplicate sets are almost exactly the same
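A much smaller version of this kind of experiment can be sketched in NumPy. This is not the SAS program behind the slides: the function name `simulate_tau`, the sizes (m = 20, d = 52, 2,000 replicates) and the seed are illustrative assumptions, and no detrending is applied. Each replicate regresses the seasonal differences on the lagged levels and records the studentized statistic.

```python
import numpy as np

def simulate_tau(m, d, reps, seed=0):
    """Studentized statistics for simulated seasonal random walks (no detrending)."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((reps, m, d))       # shocks e(i, s)
    Y = np.cumsum(e, axis=1)                    # Y(i, s) = e(1, s) + ... + e(i, s)
    lag = Y[:, :-1, :]                          # Y(i-1, s) for years i = 2..m
    diff = e[:, 1:, :]                          # seasonal differences for years 2..m
    sxy = np.sum(lag * diff, axis=(1, 2))
    sxx = np.sum(lag ** 2, axis=(1, 2))
    coef = sxy / sxx                            # estimate of rho - 1
    resid = diff - coef[:, None, None] * lag
    s2 = np.sum(resid ** 2, axis=(1, 2)) / ((m - 1) * d - 1)
    return sxy / np.sqrt(s2 * sxx)

tau = simulate_tau(m=20, d=52, reps=2000)
median = float(np.median(tau))
spread = float(np.percentile(tau, 95) - np.percentile(tau, 5))
print(round(median, 2), round(spread, 2))
```

With only 2,000 replicates the percentiles are noisy, but the median should sit slightly below 0 and the 5th-to-95th spread near the N(0,1) value of 3.29.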
Simulation Evidence - Detrending
• m = 20, d = 4, 6, 12, 24, 52, 96, 168, 365
• 96 quarter hours/day, 168 hours/week
• Detrending:
  – None
  – Constant, linear, quadratic
  – Period d sinusoids (fundamental & harmonic)
• 3 sets of 20,000 t statistics at each (m,d).
[Six plots, each titled: 20 years of weekly data, 20,000 simulated series, TAU]
Standard tau percentiles for various adjustments
Three replicates per d value
Conclusions:
Spread between percentiles about constant (and close to N(0,1) spread)
Medians smooth function of 1/sqrt(d)
Degree of detrending matters
Cubic smoothing regression plotted with raw medians.
Focus on Medians:
Claim: As d → ∞, Tau → N(0,1) for all of these forms of detrending.
Seasonal random walk Z, data Y, with $Y = X\beta + Z$. Detrend by OLS:
$$R = PY = (I - X(X'X)^{-1}X')\,Y = (I - X(X'X)^{-1}X')\,Z$$
Seasonal random walk has d "channels" of m values.
Denominator is a sum of d quadratic forms. Without detrending each has eigenvalues no larger than
$$\frac{1}{4\sin^{2}\!\left(\dfrac{\pi}{2(2m+1)}\right)} = O(m^{2})$$
$X(X'X)^{-1}X'$ can be written as $T'\Lambda T$ with $T'T = I$.
k = rank of X matrix. Middle matrix $\Lambda$ is diagonal. Projection => k diagonal entries 1, rest 0.
Denominator quadratic form contains
$$Z'X(X'X)^{-1}X'Z = Z'T'\Lambda TZ$$
which is at most k times maximum eigenvalue = O(km^2): an upper probability bound on the unnormalized quadratic form.
Normalization is m^2 d, so k/d → 0 suffices for no limit effect of detrending.
Same for numerator, estimator, tau statistic.
Based on Taylor series (for large m), adjustment is
$$\sqrt{2}\,(1/3 + k/2)/\sqrt{d}$$
for regression adjustments with k columns selected from intercept and Fourier sines and cosines.
Allowing for augmenting terms, as in seasonal multiplicative model, follows the same proof as in DHF.
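The O(m^2) eigenvalue bound can be checked numerically. The sketch below relies on a standard closed form that is supplied here rather than taken from the slides: the matrix with entries min(j,k), which is the covariance matrix of a random walk S_1..S_m and has the same eigenvalues as the matrix of the quadratic form sum S_k^2 in the shocks, has eigenvalues 1/(4 sin^2((2j-1)π/(2(2m+1)))), j = 1..m. The choice m = 8 is illustrative.

```python
import numpy as np

m = 8
idx = np.arange(1, m + 1)
A = np.minimum.outer(idx, idx).astype(float)      # A[j, k] = min(j, k)
eig = np.sort(np.linalg.eigvalsh(A))[::-1]        # eigenvalues, largest first

j = np.arange(1, m + 1)
formula = 1.0 / (4.0 * np.sin((2 * j - 1) * np.pi / (2 * (2 * m + 1))) ** 2)

largest = float(eig[0])
bound = (2 * m + 1) ** 2 / 4.0                    # from sin(x) >= 2x/pi on [0, pi/2]
print(np.allclose(eig, formula), largest, bound)
```

The largest eigenvalue (j = 1) is the expression on the slide, and the crude bound (2m+1)^2/4 makes its O(m^2) growth explicit.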
Natural gas data: Procedure
(1) Compute residuals (trend + harmonics)
(2) AR(2) fit to span 52 differences of residuals
(3) Filter with AR(2)
    F_t = filtered series
    W_t = span 52 differences F_t - F_{t-52}
(4) Regress W_t on F_{t-52}, W_{t-1}, W_{t-2}
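Steps (1) to (4) can be sketched in NumPy as follows. This is one reading of the procedure, not the SAS code used for the natural gas data: the helper names `ols` and `seasonal_tau`, the choice of two harmonics, and the synthetic series (linear trend, one sinusoid, stationary AR(1) noise with no seasonal unit root) are all assumptions made for illustration.

```python
import numpy as np

def ols(X, y):
    """Plain least squares: coefficients, residuals and t values."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, resid, beta / se

def seasonal_tau(y, d=52, harmonics=2):
    n = len(y)
    t = np.arange(n, dtype=float)
    # (1) residuals from trend + harmonics
    cols = [np.ones(n), t]
    for h in range(1, harmonics + 1):
        cols += [np.sin(2 * np.pi * h * t / d), np.cos(2 * np.pi * h * t / d)]
    _, r, _ = ols(np.column_stack(cols), y)
    # (2) AR(2) fit to span-d differences of the residuals
    dr = r[d:] - r[:-d]
    a, _, _ = ols(np.column_stack([dr[1:-1], dr[:-2]]), dr[2:])
    # (3) filter the residuals with the fitted AR(2), then difference at span d
    f = r[2:] - a[0] * r[1:-1] - a[1] * r[:-2]
    w = f[d:] - f[:-d]
    # (4) regress W_t on an intercept, F_{t-d}, W_{t-1}, W_{t-2}
    X = np.column_stack([np.ones(len(w) - 2), f[2:-d], w[1:-1], w[:-2]])
    beta, _, tvals = ols(X, w[2:])
    return float(beta[1]), float(tvals[1])

# Synthetic weekly series, 15 years, stationary noise (illustrative only).
rng = np.random.default_rng(1)
n = 15 * 52
eps = rng.standard_normal(n)
u = np.zeros(n)
for i in range(1, n):
    u[i] = 0.5 * u[i - 1] + eps[i]
tt = np.arange(n)
y = 100 + 0.1 * tt + 20 * np.sin(2 * np.pi * tt / 52) + u
coef, tau = seasonal_tau(y)
print(round(coef, 2), round(tau, 1))
```

For a series like this one, with purely deterministic seasonality, the coefficient on the lag-d filtered series should come out near -1 with a very large negative t value, the same pattern as in the natural gas output.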
The REG Procedure
Dependent Variable: Diff
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 718362 239454 231.53 <.0001
Error 679 702233 1034.21632
Corrected Total 682 1420595
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 -0.68125 1.23111 -0.55 0.5802
L52FY 1 -0.99746 0.03800 -26.25 <.0001
Diff1 1 0.01417 0.00777 1.82 0.0686
Diff2 1 -0.01152 0.00730 -1.58 0.1151
Follow up:
Lag 52 coefficient near -1 suggests ρ - 1 near -1 for the lag 52 coefficient ρ, that is, ρ near 0 rather than 1.
Perhaps no lag correlation in the presence of sinusoids
Fit ARIMAX model as a check (AR(2), no seasonal lag):

                          Standard              Approx
Parameter    Estimate     Error      t Value    Pr > |t|   Lag   Variable
MU           727.58194    684.44164     1.06    0.2878     0     total
AR1,1          1.37442      0.03379    40.67    <.0001     1     total
AR1,2         -0.38964      0.03381   -11.53    <.0001     2     total
NUM1           0.09520      0.04525     2.10    0.0354     0     date
NUM2        -883.25146     23.18237   -38.10    <.0001     0     s1
NUM3         240.92573     23.05715    10.45    <.0001     0     c1
NUM4        -133.27021     11.51098   -11.58    <.0001     0     s2
NUM5         122.42419     11.53277    10.62    <.0001     0     c2
Lack of fit? Box-Ljung test on residuals
Autocorrelation Check of Residuals
 To    Chi-          Pr >
Lag    Square   DF   ChiSq    -------------Autocorrelations------------
  6      1.40    4   0.8449    0.008  -0.012   0.001  -0.000  -0.023   0.033
 12     18.66   10   0.0448   -0.086   0.034   0.089  -0.009   0.017   0.077
 18     23.67   16   0.0970    0.022   0.002   0.025   0.012   0.047   0.055
 24     26.61   22   0.2263   -0.014  -0.037   0.022  -0.027  -0.028  -0.017
 30     29.61   28   0.3821    0.010   0.036   0.042  -0.012  -0.021   0.012
 36     33.03   34   0.5150    0.001   0.030  -0.027  -0.031   0.042  -0.010
 42     46.84   40   0.2122   -0.026  -0.081  -0.035  -0.034   0.078  -0.042
 48     51.65   46   0.2625    0.011   0.042  -0.044  -0.027   0.036   0.014
 54     65.50   52   0.0989   -0.055   0.037  -0.024  -0.008   0.085  -0.070
 60     75.05   58   0.0654   -0.096   0.023  -0.027  -0.002  -0.029   0.022
 66     80.14   64   0.0838   -0.006  -0.035  -0.053  -0.030  -0.035  -0.009
 72     85.28   70   0.1033   -0.060  -0.017   0.034   0.032  -0.007   0.011
 78     87.52   76   0.1724   -0.034  -0.012  -0.026  -0.004  -0.027  -0.001
 84     91.06   82   0.2312    0.018  -0.029  -0.011  -0.050   0.010   0.017
 90     96.17   88   0.2586    0.000  -0.030  -0.048   0.049   0.006  -0.018
 96    107.69   94   0.1582   -0.011  -0.053   0.006  -0.020  -0.066  -0.075
102    117.16  100   0.1158    0.082  -0.059  -0.013   0.018   0.016  -0.003
108    137.48  106   0.0215   -0.021  -0.058   0.044   0.021  -0.067  -0.112
Lag 104, 52
AR(2) characteristic polynomial: m^2 - 1.37442 m + 0.38964 (m = 1/B)
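The roots of this characteristic polynomial show how close the fitted AR(2) is to a regular (non-seasonal) unit root. A minimal NumPy sketch, using the coefficients printed in the ARIMAX output:

```python
import numpy as np

# Roots of m^2 - 1.37442 m + 0.38964 = 0
roots = np.roots([1.0, -1.37442, 0.38964])
moduli = np.sort(np.abs(roots))
print(moduli)
```

Both roots are real and inside the unit circle, roughly 0.40 and 0.97, so the fitted model is stationary but its larger root is close to 1.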
Regn.            Adjustment (order d^(-1/2) terms)   k (regn.)   √2(1/3 + k/2)
No adjustment    add (2/d)^(1/2)/3                   0           0.4714
Polynomial       add 1.16/d^(1/2)                    1           1.1785
Sine (fund.)     add 2.53/d^(1/2)                    3           2.5927
+ harmonic       add 3.80/d^(1/2)                    5           4.0069

Sine + linear is about the same as sine:
  Generated 3 sets of percentiles (20,000 reps) for both models
  Sorted on d and 5th percentile
  Result: percentiles interspersed (see below)
  Moral: use the same adjustments for sine and sine + linear.

Based on Taylor series, for large m, the adjustment is
$$\sqrt{2}\,(1/3 + k/2)/\sqrt{d}$$
for regression adjustments with k columns selected from intercept and Fourier sines and cosines.
------------------------------------------- d=52 -------------------------------------------

trend       t_1    t_2_5  t_5    t_10   t_25   t_50   t_75  t_90  t_95  t_97_5  t_99  n     r
harmonic    -2.95  -2.58  -2.24  -1.86  -1.23  -0.53  0.16  0.78  1.14  1.48    1.85  1040  20000
harmonic    -2.93  -2.54  -2.23  -1.84  -1.22  -0.54  0.15  0.78  1.15  1.46    1.86  1040  20000
harmonic    -2.94  -2.55  -2.21  -1.85  -1.21  -0.52  0.17  0.77  1.16  1.50    1.88  1040  20000
sine wave   -2.75  -2.36  -2.03  -1.66  -1.04  -0.35  0.34  0.95  1.34  1.65    2.03  1040  20000
sine wave   -2.73  -2.34  -2.03  -1.65  -1.03  -0.34  0.34  0.96  1.34  1.67    2.05  1040  20000
lin&sine    -2.73  -2.35  -2.03  -1.66  -1.03  -0.34  0.34  0.97  1.34  1.66    2.01  1040  20000
sine wave   -2.69  -2.35  -2.01  -1.64  -1.03  -0.34  0.34  0.95  1.31  1.65    2.03  1040  20000
lin&sine    -2.71  -2.33  -1.98  -1.62  -1.01  -0.33  0.35  0.98  1.35  1.65    2.04  1040  20000
lin&sine    -2.71  -2.33  -1.98  -1.62  -1.01  -0.33  0.35  0.98  1.35  1.65    2.04  1040  20000
mean        -2.49  -2.15  -1.83  -1.47  -0.84  -0.17  0.52  1.13  1.48  1.80    2.16  1040  20000
mean        -2.52  -2.16  -1.83  -1.46  -0.84  -0.16  0.53  1.16  1.52  1.84    2.21  1040  20000
linear      -2.51  -2.13  -1.82  -1.45  -0.81  -0.15  0.54  1.16  1.51  1.82    2.18  1040  20000
quadratic   -2.49  -2.13  -1.82  -1.45  -0.84  -0.15  0.52  1.12  1.48  1.80    2.22  1040  20000
linear      -2.53  -2.14  -1.81  -1.45  -0.83  -0.15  0.54  1.15  1.53  1.87    2.22  1040  20000
quadratic   -2.53  -2.12  -1.80  -1.42  -0.82  -0.14  0.53  1.13  1.49  1.83    2.19  1040  20000
mean        -2.44  -2.09  -1.79  -1.44  -0.83  -0.16  0.52  1.13  1.50  1.84    2.25  1040  20000
quadratic   -2.50  -2.10  -1.78  -1.43  -0.84  -0.15  0.52  1.14  1.51  1.83    2.18  1040  20000
linear      -2.52  -2.10  -1.78  -1.42  -0.83  -0.15  0.53  1.16  1.52  1.85    2.22  1040  20000
none        -2.38  -2.05  -1.73  -1.36  -0.75  -0.07  0.62  1.23  1.60  1.90    2.25  1040  20000
none        -2.46  -2.07  -1.73  -1.36  -0.75  -0.07  0.61  1.22  1.61  1.93    2.31  1040  20000
none        -2.43  -2.04  -1.73  -1.37  -0.75  -0.07  0.62  1.23  1.59  1.90    2.27  1040  20000
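The Taylor-series adjustment can be set beside the simulated medians in the d=52 table. In this sketch the function name `median_shift` is hypothetical, the medians are approximate t_50 values read from the table, and the pairing of table rows with k (none: 0, mean: 1, sine wave: 3, harmonic: 5) follows the adjustment table and is an interpretation made here.

```python
from math import sqrt

def median_shift(k, d):
    """Taylor-series adjustment sqrt(2) * (1/3 + k/2) / sqrt(d)."""
    return sqrt(2.0) * (1.0 / 3.0 + k / 2.0) / sqrt(d)

# Approximate medians (t_50) from the d=52 table, keyed by detrending type.
observed = {"none": (0, -0.07), "mean": (1, -0.16),
            "sine wave": (3, -0.34), "harmonic": (5, -0.53)}

gaps = {}
for name, (k, med) in observed.items():
    predicted = -median_shift(k, 52)       # the median is shifted downward
    gaps[name] = abs(predicted - med)
    print(name, k, round(predicted, 3), med)
```

The predicted shifts track the simulated medians to within a few hundredths; for the sinusoidal cases the empirical coefficients in the adjustment table (2.53 and 3.80) are slightly smaller than the Taylor values.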
Recall: based on Taylor series, for large m, the adjustment is $\sqrt{2}\,(1/3 + k/2)/\sqrt{d}$.
Claim: This holds for any orthogonal set of periodic regressors.
Use double subscript arrays:

           Jan.      Feb.      ---   Dec.
Year 1     Y(1,1)    Y(1,2)    ---   Y(1,12)
Year 2     Y(2,1)    Y(2,2)    ---   Y(2,12)
  |          |         |        |      |
Year m     Y(m,1)    Y(m,2)    ---   Y(m,12)

Monthly data, double array Y_t = Y(i,s)
X'Y = ΣΣ c_s y_is,   X'e = ΣΣ c_s e_is
Why not c_is?? (A periodic regressor depends only on the season s, not on the year i.)
Example: 20 years of sinusoidal c_s values (plotted)
X column vertically stacks c_1 1, c_2 1, …, c_12 1 (1 = column of ones)
With $c_s$ the regressor value in season $s$ and $y_{\cdot s} = \sum_i y_{is}$:
$$X'X = (m-1)\sum_{s=1}^{d} c_s^{2}, \qquad X'Y = \sum_{s=1}^{d} c_s\, y_{\cdot s}$$
Regressing seasonal differences (e's) on lagged Y's and X variables. Lag Y coefficient is (matrix form)
$$e'(I - X(X'X)^{-1}X')\,Y \;/\; Y'(I - X(X'X)^{-1}X')\,Y$$
In double subscript form, expectation of numerator "correction term" $e'X(X'X)^{-1}X'Y$:
$$\Big((m-1)\sum_{s=1}^{d} c_s^{2}\Big)^{-1}\sum_{s=1}^{d} c_s^{2}\,E\{y_{\cdot s}\,e_{\cdot s}\} \approx (m-1)^{-1}\,\big((m-1)(m/2)\big) = m/2$$
Numerator normalized by $1/(m\sqrt{d})$, denominator → 1/2, so the correction to tau is about $(1/(2\sqrt{d}))/\sqrt{1/2}$.
Suggested adjustment for each such periodic regressor: $1/\sqrt{2d}$
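The expectation of the correction term can be checked by simulation. This sketch makes several assumptions not stated on the slides: a single sine regressor c_s = sin(2πs/d), Gaussian shocks, lagged levels from years 1..m-1 paired with differences from years 2..m, and small sizes (m = 20, d = 12, 10,000 replicates); the function name `mean_correction` is hypothetical. Under this indexing the exact expectation is (m-2)/2, which is the slide's m/2 for large m.

```python
import numpy as np

def mean_correction(m, d, reps, seed=0):
    """Monte Carlo mean of (X'e)(X'Y)/(X'X) for one periodic regressor."""
    rng = np.random.default_rng(seed)
    s = np.arange(1, d + 1)
    c = np.sin(2 * np.pi * s / d)                # regressor value c_s in season s
    e = rng.standard_normal((reps, m, d))        # shocks e(i, s)
    Y = np.cumsum(e, axis=1)                     # Y(i, s) = e(1, s) + ... + e(i, s)
    Xe = np.sum(c * e[:, 1:, :], axis=(1, 2))    # X'e: differences for years 2..m
    XY = np.sum(c * Y[:, :-1, :], axis=(1, 2))   # X'Y: lagged levels for years 1..m-1
    XX = (m - 1) * np.sum(c ** 2)                # X'X = (m-1) * sum of c_s^2
    return float(np.mean(Xe * XY / XX))

m, d = 20, 12
est = mean_correction(m, d, reps=10000)
print(round(est, 2), (m - 2) / 2, 1 / np.sqrt(2 * d))
```

Dividing m/2 by the numerator normalization m d^(1/2) gives 1/(2 d^(1/2)), and dividing by the square root of the denominator limit 1/2 gives the suggested 1/(2d)^(1/2) per periodic regressor.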