Wavelet-Based Parameter Estimation for Trend Contaminated Fractionally Differenced Processes
Peter F. Craigmile Donald B. Percival Peter Guttorp
NRCSE Technical Report Series
NRCSE-TRS No. 047
May 30, 2000
The NRCSE was established in 1996 through a cooperative agreement with the United States Environmental Protection Agency, which provides the Center's primary funding.
Wavelet-Based Parameter Estimation for Trend
Contaminated Fractionally Differenced Processes

Peter F. Craigmile,¹ Donald B. Percival²,³ and Peter Guttorp¹

¹Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195-4322.
²Applied Physics Laboratory, Box 355640, University of Washington, Seattle, WA 98195-5640.
³MathSoft Inc., 1700 Westlake Avenue North, Suite 500, Seattle, WA 98109-3044.
Correspondence Address:
Peter F. Craigmile,
Department of Statistics,
Box 354322,
University of Washington,
Seattle, WA 98195-4322.
Abstract:
A common problem in the analysis of time series is how to deal with a possible trend component,
which is usually thought of as large scale (or low frequency) variations or patterns in the series
that might be best modelled separately from the rest of the series. Trend is often confounded
with low frequency stochastic fluctuations, particularly in the case of models such as fractionally
differenced (FD) processes, which can account for long memory dependence (slowly decaying
autocorrelation) and can be extended to encompass non-stationary processes exhibiting quite significant
low frequency components. In this paper we assume a model of polynomial trend plus FD noise
and apply the discrete wavelet transform (DWT) to separate a time series into pieces that can be
used to estimate both the FD parameters and the trend. The estimation of the FD parameters is
based on an approximate maximum likelihood approach that is made possible by the fact that the
DWT decorrelates FD processes approximately. We demonstrate our methodology by applying it to a
Northern Hemisphere temperature dataset.

1 Introduction
In recent years long memory processes have been used to model natural phenomena in areas such as
atmospheric sciences, geosciences and hydrology. Such processes are characterised by slowly decaying
autocorrelations which can be hard to model using short term models such as the auto-regressive
moving average (ARMA) class of models (Box, Jenkins, and Reinsel (1994)). One common example
of a long memory process, the fractionally differenced (FD) process (Granger and Joyeux (1980),
Hosking (1981)), extends existing (integer) integrated processes. The succinct definition of an FD
process in terms of its spectral density function allows for a varied range of estimation methods.
Considering the FD process on its own, a common method of parameter estimation involves calculating
the exact likelihood and maximising with respect to the parameters. Beran (1994) gives a review and
evaluation of this. He concludes that two factors hampering such methods in practice are (1) slow
computations (a consideration that is becoming less and less important with current technology) and
(2) inaccuracies due to a large number of computations. The second can be a significant problem in
this framework because the matrix calculations involved are O(N²). Various approximate likelihood
methods have been proposed to overcome this (Beran (1994)). Some of these methods exploit
fast transforms of the data such as the FFT (Robinson (1994)) or wavelet transforms (Wornell
(1995), McCoy and Walden (1996)). Jensen (2000) considers a wavelet method for estimation of
auto-regressive, fractionally integrated, moving average (ARFIMA) processes.
There is less literature on the case of such a process being contaminated by a trend component. The
topic of long-range dependence and trends is dealt with in Smith (1989a), Smith (1989b) and Smith
(1993). Teverovsky and Taqqu (1997) consider tests for long memory dependence in the presence
of two types of trend (shifting means and slowly decaying trend). Percival and Bruce (1998) extend
the wavelet based approximate likelihood estimates of McCoy and Walden (1996) to work in the
presence of polynomial trends. Beran (1999) uses variable bandwidth smoothing to estimate such
processes with additive trend.
In this paper we consider estimation of the parameters of a trend contaminated FD process using
the discrete wavelet transform (DWT). Wavelet transforms of such time series are useful because:
1. They approximately de-correlate FD and related processes. We will show that the resulting wavelet
coefficients form a near independent Gaussian sequence, simplifying the statistics significantly;
2. Wavelets have excellent time and frequency localisation, which can be useful for expressing
local deviations from a statistical model;
3. Wavelets can separate certain non-linear trends from noise, thus allowing us to analyse depen-
dent time series with a trend.
In this paper we concentrate on polynomial trends, but it is easy using the final two properties to
extend these results to other smooth and non-smooth trends (see Craigmile, Percival, and Guttorp
(2000a) for further details on this).
By using the wavelet coefficients of the transform in a multivariate Gaussian model (with an assumed
simplified correlation structure of the coefficients), we can estimate the parameters using maximum
likelihood. In particular we consider two models:

• White noise wavelet model: we assume the wavelet coefficients are independent both within
and across wavelet scales;

• AR(1) wavelet model: we show that there is often a small lag-one auto-correlation between
wavelet coefficients on a specific scale. As an approximation to this we assume independence
across scales, and an AR(1) model for each wavelet scale.
We derive limit theorems and approximate confidence intervals for the parameters in these models,
and Monte Carlo simulations are used to assess these methods. We end by applying the theory to a
northern hemisphere temperature dataset obtained from the Climate Research Unit, University of
East Anglia, UK.
2 The Discrete Wavelet Transform
Suppose $X = (X_0, \ldots, X_{N-1})$ is the observed time series with $N$ divisible by $2^J$ for some integer
$J$. For an even integer $L$, let $\{h_l\}_{l=0}^{L-1}$ denote a Daubechies (Daubechies (1992)) wavelet filter. By
definition this filter has a squared gain function given by
\[ \mathcal{H}_1(f) \equiv 2 \sin^L(\pi f) \sum_{l=0}^{L/2-1} \binom{L/2-1+l}{l} \cos^{2l}(\pi f). \qquad (1) \]
Note that the squared gain function does not uniquely specify the wavelet filter. Daubechies distinguishes
between two types:

• The extremal phase, D(L), filters are the ones which exhibit the smallest delay (have maximum
cumulative energy) over other choices of wavelet filter;

• The least asymmetric, LA(L), filters are defined for $L = 8, 10, \ldots$. These filters are closest to
a linear phase filter.

We refer to the D(2) wavelet as the Haar wavelet for historical reasons. All these wavelet filters are
useful because they can reduce polynomials to zero due to an inherent differencing in the filter.
Corresponding to the filter we let $\mathcal{W}$ denote an $N \times N$ orthonormal DWT matrix of level $J$. The
DWT coefficients are then given by $\mathbf{W} = \mathcal{W} X$. We partition these coefficients as
\[ \mathbf{W} = (\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_J, \mathbf{V}_J), \]
where

• $\mathbf{W}_j$ are the $N/2^j$ wavelet coefficients associated with changes in average on scale $\tau_j \equiv 2^{j-1}$
and with times spaced $2^j$ units apart;

• $\mathbf{V}_J$ are the $N/2^J$ scaling coefficients associated with averages on scale $2^{J-1}$ and with
times spaced $2^J$ units apart.
Equivalent to the matrix form, the "pyramid algorithm" (Mallat (1989)) can be used to calculate
$\mathbf{W}$ hierarchically (see e.g. Percival and Walden (2000) for a step-by-step guide to this algorithm).
Letting $L_j \equiv (2^j - 1)(L - 1) + 1$, the $j$th level wavelet coefficients can also be computed using the
$j$th level wavelet filter $\{h_{j,l}\}_{l=0}^{L_j-1}$:
\[ W_{j,k} = \sum_{l=0}^{L_j-1} h_{j,l} X_{2^j(k+1)-1-l \bmod N}, \qquad j = 1, \ldots, J; \; k = 0, \ldots, N_j - 1, \]
where $N_j \equiv N/2^j$ and $\{h_{j,l}\}$ has a squared gain function given by
\[ \mathcal{H}_j(f) \equiv \mathcal{H}_1(2^{j-1} f) \prod_{k=0}^{j-2} \mathcal{H}_1(\tfrac{1}{2} - 2^k f). \qquad (2) \]
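As an illustration of the pyramid algorithm, the Haar (D(2)) case is particularly simple: at each stage the wavelet coefficients are scaled differences of adjacent scaling coefficients, and the new scaling coefficients are scaled sums. The following sketch (illustrative code, not the authors') assumes this filter:

```python
import math

def haar_dwt(x, J):
    """Level-J Haar DWT via the pyramid algorithm.

    At each stage the current scaling coefficients are split into
    wavelet coefficients (scaled differences) and new scaling
    coefficients (scaled sums); both are downsampled by two.
    """
    v = list(x)
    W = []
    for _ in range(J):
        w = [(v[2*k+1] - v[2*k]) / math.sqrt(2) for k in range(len(v) // 2)]
        v = [(v[2*k+1] + v[2*k]) / math.sqrt(2) for k in range(len(v) // 2)]
        W.append(w)
    return W, v  # (W_1, ..., W_J) and V_J

# The DWT is orthonormal, so it preserves the energy of the series.
x = [float(t % 5) for t in range(16)]
W, vJ = haar_dwt(x, 4)
energy = sum(c*c for w in W for c in w) + sum(c*c for c in vJ)
```

Because the transform matrix is orthonormal, `energy` equals the sum of squares of the input, which is a useful sanity check on any implementation.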
The DWT handles filtering operations periodically; that is to say, a number of the wavelet coefficients
on each level are a weighted sum of points at the start and the end of the original signal. In
particular the first $B_j \equiv \lceil (L-2)(1-2^{-j}) \rceil$ wavelet coefficients are affected by this circularity problem.
We call these coefficients the boundary dependent coefficients. The remaining $M_j \equiv N/2^j - B_j$ are
unaffected by boundaries and are named the boundary independent coefficients. We let $M \equiv \sum_{j=1}^{J} M_j$
denote the total number of boundary-independent wavelet coefficients. As we shall see, the statistical
properties of the two sets of wavelet coefficients are very different.
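For a concrete sense of the counts $B_j$ and $M_j$, the sketch below tabulates them; the choices $N = 512$, $L = 8$ (the LA(8) filter used later) and $J = 4$ are illustrative.

```python
import math

def boundary_counts(N, L, J):
    """Return (B_j, M_j) for j = 1..J, where the first
    B_j = ceil((L-2)(1-2^-j)) level-j wavelet coefficients are
    affected by circularity and M_j = N/2^j - B_j are not."""
    out = []
    for j in range(1, J + 1):
        Bj = math.ceil((L - 2) * (1 - 2.0 ** (-j)))
        Mj = N // 2 ** j - Bj
        out.append((Bj, Mj))
    return out

counts = boundary_counts(N=512, L=8, J=4)
M = sum(Mj for _, Mj in counts)  # total boundary-independent count
```

Note how quickly $B_j$ saturates near $L - 2$ as $j$ grows, so the loss of coefficients to the boundary is modest when $N$ is large.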
3 Fractionally Differenced Processes

The fractionally differenced (FD) process is an example of a long memory dependence model in
which the covariance fades slowly over increasing lags. The process was originally proposed by
Granger and Joyeux (1980) and Hosking (1981) as an extension of ARIMA$(0, d, 0)$ models to allow
for fractional values of $d$.
Definition 3.1 Let $d \in [-1/2, 1/2)$ and $\sigma_\epsilon^2 > 0$. We say that $\{X_t\}_{t \in \mathbb{Z}}$ is an FD$(d, \sigma_\epsilon^2)$ or
ARFIMA$(0, d, 0)$ process if it has spectral density function
\[ S_X(f) = \sigma_\epsilon^2 \, |2\sin(\pi f)|^{-2d} \quad \text{for } |f| \le 1/2. \qquad (3) \]
Here $d$ is known as the difference parameter and $\sigma_\epsilon^2$ is the innovation variance. The auto-covariance
sequence for this process is given by
\[ s_{X,k} = \frac{\sigma_\epsilon^2 \, (-1)^k \, \Gamma(1-2d)}{\Gamma(1-d+k)\,\Gamma(1-d-k)}. \]
When $d = 0$, $\{X_t\}$ is a white noise (i.e., uncorrelated) process. Extending this model by letting
$d \ge 1/2$ in equation (3), we obtain a class of non-stationary processes which are stationary if we
difference $\lfloor d + 1/2 \rfloor$ times. Beran (1994) lists further properties of FD processes.
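For computation it is convenient to use the lag recursion equivalent to the Gamma-function formula, $s_{X,0} = \sigma_\epsilon^2 \, \Gamma(1-2d)/\Gamma^2(1-d)$ and $s_{X,k} = s_{X,k-1}(k-1+d)/(k-d)$ (Hosking (1981)); a minimal sketch:

```python
import math

def fd_acvs(d, sigma2, nlags):
    """Autocovariances s_{X,0}, ..., s_{X,nlags} of an FD(d, sigma2)
    process, via the lag recursion equivalent to the Gamma formula."""
    s0 = sigma2 * math.exp(math.lgamma(1 - 2*d) - 2*math.lgamma(1 - d))
    s = [s0]
    for k in range(1, nlags + 1):
        s.append(s[-1] * (k - 1 + d) / (k - d))
    return s

s = fd_acvs(d=0.25, sigma2=1.0, nlags=3)
rho1 = s[1] / s[0]  # lag-one autocorrelation
```

For $d = 0.25$ the lag-one autocorrelation is $d/(1-d) = 1/3$, which the recursion reproduces; the recursion also avoids evaluating $\Gamma(1-d-k)$ at large negative arguments.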
Suppose we observe a realisation of a Gaussian FD$(d, \sigma_\epsilon^2)$ process, $\{X_t\}_{t=0}^{N-1}$. By the linearity of the
DWT, the wavelet coefficients are clearly Gaussian. We also have that the Daubechies level $j$ wavelet
filter acts as an approximate band-pass filter with pass-band $[2^{-(j+1)}, 2^{-j}]$ (see Daubechies (1992)).
This approximation improves with increasing $L$, and hence it can be argued from the spectral
representation theorem for stationary processes that boundary independent wavelet coefficients at
different scales are asymptotically uncorrelated. We will use this approximation in our modelling
procedure.
Theorem 3.2 When $d < (L+1)/2$ the boundary independent wavelet coefficients within a given
level $j$ are a portion of a zero mean stationary process with auto-covariance sequence given by
\[ \sigma_\epsilon^2 \, \eta_{j,\tau}(d) \equiv \int_{-1/2}^{1/2} e^{i 2\pi f \tau} S_j(f) \, df, \qquad (4) \]
where we define
\[ S_j(f) \equiv 2^{-j} \sum_{k=0}^{2^j - 1} \mathcal{H}_j(2^{-j}(f+k)) \, S_X(2^{-j}(f+k)). \qquad (5) \]
By Tewfik and Kim (1992), the boundary-independent wavelet coefficients of an FD process are
approximately uncorrelated. To verify this fact we check that $S_j(\cdot)$ is close to the spectral density
function (SDF) for a white noise process, i.e., that $S_j(\cdot)$ is approximately flat. Figure 1 illustrates this for
an FD(0.25, 1) process analysed using an LA(8) wavelet filter. The top left panel shows the spectrum of
the process along with the approximate pass-bands that correspond to the first four wavelet levels.
The top right panel shows $S_j(\cdot)$ for $j = 1, \ldots, 4$. The lower panels illustrate the approximations to
this spectrum used in the paper. If we assume that the wavelet coefficients are uncorrelated within
each wavelet level, we obtain the flat spectra given in the lower left panel. Clearly the lower right
panel shows spectra that better model the true spectrum of the wavelet coefficients. In this case we
assume that the wavelet coefficients on each level follow an AR(1) model, where the AR parameters are given
by $\eta_{j,1}(d)/\eta_{j,0}(d)$ and hence depend on $d$ alone.
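The size of this lag-one term is easy to check in the simplest case. For the Haar filter at level 1 the boundary-independent coefficients are $W_{1,k} = (X_{2k+1} - X_{2k})/\sqrt{2}$, so their covariances follow directly from the process autocovariances; the sketch below (assuming the Hosking lag recursion for an FD(0.25, 1) process) gives a lag-one correlation of roughly $-0.04$, small but nonzero.

```python
import math

def fd_acvs(d, nlags):
    """Autocovariances of an FD(d, 1) process via the lag recursion."""
    s = [math.exp(math.lgamma(1 - 2*d) - 2*math.lgamma(1 - d))]
    for k in range(1, nlags + 1):
        s.append(s[-1] * (k - 1 + d) / (k - d))
    return s

def haar_level1_cov(d, tau):
    """Cov(W_{1,k}, W_{1,k+tau}) for Haar level-1 wavelet coefficients
    W_{1,k} = (X_{2k+1} - X_{2k})/sqrt(2) of an FD(d, 1) process."""
    s = fd_acvs(d, 2 * abs(tau) + 1)
    t = 2 * abs(tau)
    # expand the covariance of the two scaled differences
    return s[t] - 0.5 * s[abs(t - 1)] - 0.5 * s[t + 1]

phi = haar_level1_cov(0.25, 1) / haar_level1_cov(0.25, 0)  # AR(1) parameter
```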
For future use, we note that the first two derivatives of $\eta_{j,\tau}(d)$ for $d < (L+1)/2$ are
\[ \eta'_{j,\tau}(d) \equiv \frac{\partial}{\partial d} \eta_{j,\tau}(d) = -4 \int_0^{1/2} \log(2\sin(\pi f)) \, \mathcal{H}_j(f) \cos(2^{j+1}\pi f \tau) \, (2\sin(\pi f))^{-2d} \, df \]
and
\[ \eta''_{j,\tau}(d) \equiv \frac{\partial^2}{\partial d^2} \eta_{j,\tau}(d) = 8 \int_0^{1/2} (\log(2\sin(\pi f)))^2 \, \mathcal{H}_j(f) \cos(2^{j+1}\pi f \tau) \, (2\sin(\pi f))^{-2d} \, df. \]
(These follow from Leibniz's rule, which allows us to interchange differentiation and integration.)
4 A Model for Trend

Let $\{X_t\}_{t=0}^{N-1}$ be a Gaussian FD$(d, \sigma_\epsilon^2)$ process and $\{T_t\}_{t=0}^{N-1}$ be a deterministic polynomial trend of
order $K$. The data we observe are the sum of these two components, $Y_t \equiv T_t + X_t$. Suppose we perform
a DWT on the data $\{Y_t\}$. Since the error process is Gaussian and the trend is deterministic, the
observed data are Gaussian, and by the linearity of the transform the wavelet coefficients are Gaussian.
Because a Daubechies wavelet filter of order $L$ has $L/2$ embedded differencing operations, we can
zero out a trend of polynomial order $K$ in the boundary-independent wavelet coefficients if $K \le L/2$;
i.e., only the boundary wavelet and scaling coefficients will contain the trend component. The result
due to Tewfik and Kim (1992) also holds for this model since the trend component is not included
in the boundary independent coefficients. Thus boundary-independent wavelet coefficients can be
regarded approximately as either uncorrelated or following an AR(1) model on each level.
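This annihilation of trend can be verified numerically. A D(4) filter has $L/2 = 2$ embedded differences, so away from the boundary it reduces a linear trend to zero; the sketch below uses the standard D(4) scaling coefficients and an illustrative trend.

```python
import math

# D(4) wavelet filter h_l = (-1)^l g_{L-1-l}, built from the scaling filter g
r3, r2 = math.sqrt(3.0), math.sqrt(2.0)
g = [(1 + r3) / (4*r2), (3 + r3) / (4*r2),
     (3 - r3) / (4*r2), (1 - r3) / (4*r2)]
h = [g[3], -g[2], g[1], -g[0]]

# interior (non-circular) filtering of a linear trend T_t = 2 + 0.5 t
trend = [2.0 + 0.5 * t for t in range(64)]
interior = [sum(h[l] * trend[t - l] for l in range(4)) for t in range(3, 64)]
max_abs = max(abs(c) for c in interior)  # ~0: the linear trend is annihilated
```

The filter has two vanishing moments ($\sum_l h_l = 0$ and $\sum_l l\,h_l = 0$), which is exactly why any affine sequence is mapped to zero away from the boundary.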
5 The White Noise Discrete Wavelet Transform Model

In this section we consider the simplest model for estimating the parameters of the FD process
using the wavelet coefficients (the next section explores the refinement given by the AR(1) model).
Denote the boundary-independent wavelet coefficients by $\{(W_w)_{j,k} : j = 1, \ldots, J; \; k = 0, \ldots, M_j - 1\}$
and assume they form an independent sample with $(W_w)_{j,k} \sim N(0, \eta_{j,0}(d)\,\sigma_\epsilon^2)$. The likelihood for
this model is given by
\[ L_N(d, \sigma_\epsilon^2 \mid (W_w)_{j,k}) = \prod_{j=1}^{J} \prod_{k=0}^{M_j - 1} (2\pi \eta_{j,0}(d)\,\sigma_\epsilon^2)^{-1/2} \exp\left( -\frac{(W_w)_{j,k}^2}{2 \eta_{j,0}(d)\,\sigma_\epsilon^2} \right). \]
If we let $R_j \equiv \sum_{k=0}^{M_j-1} (W_w)_{j,k}^2$ denote the sum of squares of the $j$th level boundary independent
wavelet coefficients and $M = \sum_{j=1}^{J} M_j$, the log-likelihood is
\[ l_N(d, \sigma_\epsilon^2 \mid (W_w)_{j,k}) = -\frac{M}{2} \log(2\pi\sigma_\epsilon^2) - \sum_{j=1}^{J} \frac{M_j}{2} \log(\eta_{j,0}(d)) - \sum_{j=1}^{J} \frac{R_j}{2 \eta_{j,0}(d)\,\sigma_\epsilon^2}, \qquad (6) \]
which is maximised by
\[ \hat{\sigma}_{\epsilon,N}^2(d) \equiv \frac{1}{M} \sum_{j=1}^{J} \frac{R_j}{\eta_{j,0}(d)} \qquad (7) \]
for a given $d$. Substituting this estimate into equation (6) we obtain the profile log-likelihood with
respect to $d$:
\[ l_N(d, \hat{\sigma}_{\epsilon,N}^2(d) \mid (W_w)_{j,k}) = -\frac{M}{2} \left[ \log(2\pi\hat{\sigma}_{\epsilon,N}^2(d)) + 1 \right] - \sum_{j=1}^{J} \frac{M_j}{2} \log(\eta_{j,0}(d)). \qquad (8) \]
Maximising with respect to $d$ yields $\hat{d}_N$.
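Maximising the profile log-likelihood reduces to a one-dimensional search over $d$. The sketch below implements equations (7) and (8) with a simplifying assumption not made in the paper: $\eta_{j,0}(d)$ is approximated by the average of the FD spectral density (with $\sigma_\epsilon^2 = 1$) over the octave band $[2^{-(j+1)}, 2^{-j}]$, rather than computed exactly from equation (4). The level sums of squares $R_j$ are synthetic draws from the same white noise wavelet model, and the grid and sample sizes are illustrative.

```python
import numpy as np

def eta_bandpass(j, d, n=2048):
    """Band-pass approximation to eta_{j,0}(d): the average of the FD
    SDF (sigma^2 = 1) over the octave band [2^-(j+1), 2^-j]."""
    lo, hi = 2.0 ** -(j + 1), 2.0 ** -j
    f = lo + (hi - lo) * (np.arange(n) + 0.5) / n   # midpoint rule
    return np.mean((2.0 * np.sin(np.pi * f)) ** (-2.0 * d))

def profile_loglik(d, R, Mvec):
    """Profile log-likelihood (8) under the white noise wavelet model."""
    eta = np.array([eta_bandpass(j + 1, d) for j in range(len(R))])
    M = Mvec.sum()
    sig2_hat = np.sum(R / eta) / M                  # equation (7)
    return (-0.5 * M * (np.log(2 * np.pi * sig2_hat) + 1)
            - 0.5 * np.sum(Mvec * np.log(eta)))

# synthetic level sums of squares drawn from the model itself (d0 = 0.25)
rng = np.random.default_rng(0)
d0, sig2 = 0.25, 1.0
Mvec = np.array([2048, 1024, 512, 256])
R = np.array([sig2 * eta_bandpass(j + 1, d0) * rng.chisquare(Mj)
              for j, Mj in enumerate(Mvec)])

grid = np.linspace(-0.45, 0.45, 181)
d_hat = grid[np.argmax([profile_loglik(d, R, Mvec) for d in grid])]
```

On these synthetic $R_j$ the grid search recovers a value of $d$ close to the generating value $d_0 = 0.25$.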
5.1 Limit Theory for the White Noise Wavelet Model

For any twice differentiable function $g$, define the two operators
\[ \delta_1(g(x)) = \frac{\frac{\partial}{\partial x} g(x)}{g(x)} \quad \text{and} \quad \delta_2(g(x)) = \frac{\partial}{\partial x} \delta_1(g(x)) = \frac{\frac{\partial^2}{\partial x^2} g(x)}{g(x)} - \left( \frac{\frac{\partial}{\partial x} g(x)}{g(x)} \right)^2. \]
Let $\Theta_L \equiv \{(d, \sigma_\epsilon^2) : d < (L+1)/2 \text{ and } \sigma_\epsilon^2 > 0\}$. Suppose that $\theta_0 \in \Theta_L$ denotes the true value of the
parameters, which are estimated in our model by $\hat{\theta}_N \in \Theta_L$.
Theorem 5.1 As $N \to \infty$,

(a) (Consistency) $(\hat{\theta}_N - \theta_0) \to_p 0$;

(b) (Joint Asymptotic Normality) $\sqrt{N}(\hat{\theta}_N - \theta_0) \to_d N(0, \Gamma^{-1}(\theta_0))$, where
\[ \Gamma(\theta) \equiv \frac{1}{2} \begin{bmatrix} \sum_{j=1}^{J} 2^{-j} \delta_1^2(\eta_{j,0}(d)) & \sigma_\epsilon^{-2} \sum_{j=1}^{J} 2^{-j} \delta_1(\eta_{j,0}(d)) \\[4pt] \sigma_\epsilon^{-2} \sum_{j=1}^{J} 2^{-j} \delta_1(\eta_{j,0}(d)) & \sigma_\epsilon^{-4} \end{bmatrix}; \]

(c) (Marginal Asymptotic Normality of $\hat{d}_N$) $\sqrt{N}(\hat{d}_N - d_0) \to_d N(0, \sigma_{d_0}^2)$, where
\[ \sigma_d^2 \equiv 2 \left[ \left( \sum_{j=1}^{J} 2^{-j} \delta_1^2(\eta_{j,0}(d)) \right) - \left( \sum_{j=1}^{J} 2^{-j} \delta_1(\eta_{j,0}(d)) \right)^2 \right]^{-1}. \]

This limit theorem also holds if we replace $\sqrt{N}$ by $\sqrt{M}$ and replace $\sigma_d^2$ by
\[ \tilde{\sigma}_d^2 \equiv 2 \left[ \left( \sum_{j=1}^{J} (M_j/M) \, \delta_1^2(\eta_{j,0}(d)) \right) - \left( \sum_{j=1}^{J} (M_j/M) \, \delta_1(\eta_{j,0}(d)) \right)^2 \right]^{-1}. \]
Monte Carlo studies indicate that these replacements yield a better small sample approximation.

Corollary 5.2 (Exact distribution of $\hat{\sigma}_{\epsilon,N}^2(d_0)$) Under the white noise DWT model,
$\hat{\sigma}_{\epsilon,N}^2(d_0) =_d M^{-1} \sigma_\epsilon^2 \chi_M^2$.
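Corollary 5.2 implies that $\hat{\sigma}_{\epsilon,N}^2(d_0)$ has mean $\sigma_\epsilon^2$ and variance $2\sigma_\epsilon^4/M$; a quick Monte Carlo sketch of the scaled chi-squared law (the sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
M, sig2, nrep = 400, 2.0, 50000

# draws of sigma2_hat = M^{-1} sigma^2 chi^2_M
sig2_hat = sig2 * rng.chisquare(M, size=nrep) / M

mean_est = sig2_hat.mean()  # should be close to sigma^2 = 2.0
var_est = sig2_hat.var()    # should be close to 2 sigma^4 / M = 0.02
```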
6 The AR(1) Discrete Wavelet Transform Model
Figure 1 suggests that a better approximation to the spectrum of the boundary-independent wavelet
coefficients may be to assume that within each level $\{(W_w)_{j,k} : k = 0, \ldots, M_j - 1\}$ is a portion of
an AR(1) process (as before we assume independence between the levels). Thus
Proof (a) holds by the assumptions of the white noise model, (b) holds by the Strong Law of
Large Numbers, and the proofs of (c) and (d) are obvious.
Lemma 11.2 (Derivatives of White Noise model likelihood) For $\theta \in \Theta_L$, let $\dot{l}_N(\theta)$ denote
the vector of derivatives of equation (6) with respect to $d$ and $\sigma_\epsilon^2$ respectively, and let $\ddot{l}_N(\theta)$ be the
$2 \times 2$ matrix of second derivatives. Then
\[ -2 \dot{l}_N(\theta) = \begin{bmatrix} \sum_{j=1}^{J} (M_j - A_j(\theta)) \, \delta_1(\eta_{j,0}(d)) \\[6pt] \dfrac{M - \sum_{j=1}^{J} A_j(\theta)}{\sigma_\epsilon^2} \end{bmatrix}, \qquad (12) \]
and
\[ -2 \ddot{l}_N(\theta) = \begin{bmatrix} \sum_{j=1}^{J} \left[ (M_j - A_j(\theta)) \, \delta_2(\eta_{j,0}(d)) + A_j(\theta) \, \delta_1^2(\eta_{j,0}(d)) \right] & \dfrac{\sum_{j=1}^{J} A_j(\theta) \, \delta_1(\eta_{j,0}(d))}{\sigma_\epsilon^2} \\[8pt] \dfrac{\sum_{j=1}^{J} A_j(\theta) \, \delta_1(\eta_{j,0}(d))}{\sigma_\epsilon^2} & \dfrac{2 \sum_{j=1}^{J} A_j(\theta) - M}{(\sigma_\epsilon^2)^2} \end{bmatrix}. \qquad (13) \]
Proof We can rewrite equation (6) as
\[ -2 l_N(\theta) = M \log(2\pi\sigma_\epsilon^2) + \sum_{j=1}^{J} \left[ A_j(\theta) + M_j \log(\eta_{j,0}(d)) \right]. \qquad (14) \]
The result follows by taking derivatives of this equation and noting that
\[ \frac{\partial}{\partial d} A_j(\theta) = -A_j(\theta) \, \delta_1(\eta_{j,0}(d)), \qquad \frac{\partial^2}{\partial d^2} A_j(\theta) = -A_j(\theta) \left[ \delta_2(\eta_{j,0}(d)) - \delta_1^2(\eta_{j,0}(d)) \right], \qquad \frac{\partial}{\partial \sigma_\epsilon^2} A_j(\theta) = -\frac{A_j(\theta)}{\sigma_\epsilon^2}. \]
Lemma 11.3 (Strong Laws for the derivatives) Suppose $\theta_0 \in \Theta_L$. Then as $N \to \infty$,
\[ N^{-1} \dot{l}_N(\theta_0) \to_{as} 0 \quad \text{and} \quad -N^{-1} \ddot{l}_N(\theta_0) \to_{as} \Gamma(\theta_0), \]
for $\Gamma(\cdot)$ defined in Theorem 5.1.

Proof The result follows directly from Lemmas 11.1 and 11.2.
Lemma 11.4 For $d < (L+1)/2$ define $f_L(d) \equiv \eta_{2,0}(d)/\eta_{1,0}(d)$. Then $f_L(d)$ is a strictly increasing
function of $d$.

Proof For $L = 2$ it can be shown that
\[ f_2(d) = \frac{6}{(2-d)(3-d)}, \]
which is a strictly increasing function for $d < 3/2$. Since $f'_L(d)$ is a continuous function of $L$ and $d$,
one can validate graphically that $f'_L(d) > 0$ for a particular $L > 2$ and $d < (L+1)/2$ (see Craigmile
(2000)).
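The $L = 2$ case can be checked directly from the closed form, mimicking numerically the graphical check suggested for general $L$:

```python
def f2(d):
    """f_2(d) = eta_{2,0}(d) / eta_{1,0}(d) for the Haar filter."""
    return 6.0 / ((2.0 - d) * (3.0 - d))

# check strict monotonicity on a fine grid over d < 3/2
ds = [-0.5 + 0.01 * i for i in range(190)]  # d in [-0.5, 1.39]
values = [f2(d) for d in ds]
increasing = all(a < b for a, b in zip(values, values[1:]))
```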
Lemma 11.5 (Existence and Consistency of ML Estimates) Suppose $\hat{\theta}_N, \theta_0 \in \Theta_L$. Then
with probability converging to 1, there exist solutions $\hat{\theta}_N$ of the likelihood equations such that $\hat{\theta}_N \to_p \theta_0$
as $N \to \infty$.

Proof We follow the proof of Lehmann (1997), p. 430. For $r > 0$, let $Q_r \equiv \{\theta \in \Theta_L : |\theta - \theta_0| = r\}$.
We want to show that
\[ P(l_N(\theta) < l_N(\theta_0) \text{ for all } \theta \in Q_r) \to 1 \]
as $N \to \infty$. This implies that the likelihood equations have a local maximum inside $Q_r$. Since the
equations are satisfied at a local maximum, for any $r > 0$, with probability converging to one the
likelihood equations have a solution within $Q_r$. Now define
\[ \gamma_j(\theta, \theta_0) \equiv \frac{\eta_{j,0}(d_0) \, \sigma_{\epsilon,0}^2}{\eta_{j,0}(d) \, \sigma_\epsilon^2}. \]
Then $A_j(\theta) = A_j(\theta_0) \, \gamma_j(\theta, \theta_0)$, and by equation (14)
\begin{align*}
2 N^{-1} (l_N(\theta) - l_N(\theta_0)) &= -\frac{M}{N} \log(2\pi\sigma_\epsilon^2) - \frac{1}{N} \sum_{j=1}^{J} \left[ A_j(\theta) + M_j \log(\eta_{j,0}(d)) \right] \\
&\quad + \frac{M}{N} \log(2\pi\sigma_{\epsilon,0}^2) + \frac{1}{N} \sum_{j=1}^{J} \left[ A_j(\theta_0) + M_j \log(\eta_{j,0}(d_0)) \right] \\
&= \frac{M}{N} \left( \log(\sigma_{\epsilon,0}^2) - \log(\sigma_\epsilon^2) \right) + \sum_{j=1}^{J} N^{-1} \left( A_j(\theta_0) - A_j(\theta) \right) \\
&\quad + \sum_{j=1}^{J} \frac{M_j}{N} \left( \log(\eta_{j,0}(d_0)) - \log(\eta_{j,0}(d)) \right) \\
&= \sum_{j=1}^{J} \frac{M_j}{N} \log(\gamma_j(\theta, \theta_0)) + \sum_{j=1}^{J} \frac{A_j(\theta_0)}{N} \left( 1 - \gamma_j(\theta, \theta_0) \right).
\end{align*}
Now $\gamma_j(\theta, \theta_0) \ge 0$ for all $\theta$, since we have a ratio of variances. Thus by Lemma 11.1, as $N \to \infty$,
\[ \Lambda_{j,N}(\theta, \theta_0) \equiv \frac{M_j}{N} \log(\gamma_j(\theta, \theta_0)) + \frac{A_j(\theta_0)}{N} \left( 1 - \gamma_j(\theta, \theta_0) \right) \to_{as} 2^{-j} \left[ \log(\gamma_j(\theta, \theta_0)) + 1 - \gamma_j(\theta, \theta_0) \right] \equiv \Lambda_j(\theta, \theta_0), \]
which is a function of $r$ and is non-positive for $\theta \in Q_r$ (since $\log(x) + 1 - x \le 0$ for $x > 0$). By
independence of the $A_j(\theta_0)$'s ($j = 1, \ldots, J$),
\[ \sum_{j=1}^{J} \Lambda_{j,N}(\theta, \theta_0) \to_{as} \sum_{j=1}^{J} \Lambda_j(\theta, \theta_0), \]
which is also non-positive for $\theta \in Q_r$. This sum is negative if at least one of the summands is
negative. This is indeed the case if we show for $\theta \in Q_r$ that
\[ \frac{\gamma_1(\theta, \theta_0)}{\gamma_2(\theta, \theta_0)} \ne 1, \]
which is equivalent to proving that for $d, d_0 < (L+1)/2$,
\[ f_L(d) \ne f_L(d_0). \]
This is confirmed by Lemma 11.4. Thus with probability converging to one the likelihood evaluated at $\theta \in Q_r$ is
smaller than that at $\theta_0$. Letting $r \to 0$ we obtain the consistency result by always taking the root
of the likelihood equations closest to $\theta_0$.
Proof of 5.1 Consistency follows from Lemma 11.5. A Taylor series expansion of $\dot{l}_N(\hat{\theta}_N)$ about
$\theta_0$ is given by
\[ \dot{l}_N(\hat{\theta}_N) = \dot{l}_N(\theta_0) + \ddot{l}_N(\theta^*)(\hat{\theta}_N - \theta_0), \]
where $\theta^*$ lies between $\theta_0$ and $\hat{\theta}_N$. Since $\dot{l}_N(\hat{\theta}_N) = 0$,
\[ \dot{l}_N(\theta_0) = [-\ddot{l}_N(\theta^*)](\hat{\theta}_N - \theta_0). \qquad (15) \]
To show asymptotic normality of $\sqrt{N}(\hat{\theta}_N - \theta_0)$ we first show that as $N \to \infty$,
\[ N^{-1/2} \dot{l}_N(\theta_0) \to_d N(0, \Gamma(\theta_0)). \]
We prove this by the Cramér–Wold theorem. Let $\lambda \equiv (\lambda_1, \lambda_2)^T \in \mathbb{R}^2$ and consider the characteristic
function of $N^{-1/2} \lambda^T \dot{l}_N(\theta_0)$. By Lemma 11.1 we can show for large $N$ that
\[ \phi_{N^{-1/2} \lambda^T \dot{l}_N(\theta_0)}(t) \approx \exp\left( -\frac{t^2}{2} \sum_{j=1}^{J} \frac{2^{-j}}{2} \left( \lambda_1 \delta_1(\eta_{j,0}(d_0)) + \frac{\lambda_2}{\sigma_{\epsilon,0}^2} \right)^2 \right). \]
By the uniqueness of characteristic functions, $N^{-1/2} \lambda^T \dot{l}_N(\theta_0)$ converges to a
\[ N\left( 0, \; \sum_{j=1}^{J} 2^{-(j+1)} \left( \lambda_1 \delta_1(\eta_{j,0}(d_0)) + \lambda_2/\sigma_{\epsilon,0}^2 \right)^2 \right) \]
random variable. Note that the variance term is equal to $\lambda^T \Gamma(\theta_0) \lambda$. The asymptotic normality of
$\hat{\theta}_N$ follows by dividing equation (15) by $\sqrt{N}$, and using Lemma 11.5 and Lemma 11.3 along with
the above to show

(1) $N^{-1/2} \dot{l}_N(\theta_0) \to_d N(0, \Gamma(\theta_0))$;

(2) $-N^{-1} \ddot{l}_N(\theta_0) \to_p \Gamma(\theta_0)$;

(3) $N^{-1} \left[ \ddot{l}_N(\theta_0) - \ddot{l}_N(\theta^*) \right] \to_p 0$.

Invertibility of $\Gamma(\theta_0)$ and Slutsky's theorem yield the required result. For the marginal asymptotic
distribution of $\hat{d}_N$ use the Cramér–Wold theorem with the vector $(1, 0)^T$; $\sigma_d^2$ corresponds to the first
diagonal element of $\Gamma^{-1}(\theta_0)$.
Proof of 5.2
\[ \hat{\sigma}_{\epsilon,N}^2(d_0) = M^{-1} \sigma_\epsilon^2 \sum_{j=1}^{J} A_j(\theta_0) =_d M^{-1} \sigma_\epsilon^2 \chi_M^2. \]

Proposition 11.6 For $\theta \in \Theta_L$ the first two derivatives of the log-likelihood are given by