Licentiate Thesis:
A Multiresolution Analysis of Stock Market Volatility
Using Wavelet Methodology
Copyrighted material.
Tommi A. Vuorenmaa
[e-mail: [email protected]]
Department of Economics, University of Helsinki
September 21, 2004
Tiedekunta - Facultet - Faculty: Faculty of Social Sciences
Laitos - Institution - Department: Department of Economics
Tekijä - Författare - Author: Vuorenmaa, Tommi
Työn nimi - Arbetets titel - Title: A Multiresolution Analysis of Stock Market Volatility Using Wavelet Methodology
Oppiaine - Läroämne - Subject: Econometrics
Työn laji - Arbetets art - Level: Licentiate thesis
Aika - Datum - Month and year: 2004-09-22
Sivumäärä - Sidantal - Number of pages: 140
Tiivistelmä - Referat - Abstract: The non-stationary character of stock market returns manifests itself through the volatility clustering effect and large jumps. Wavelet methodology gives an efficient way of representing a time series with such complex dynamics. With the help of a wavelet basis, the discrete wavelet transform (DWT) is able to break a time series down with respect to time-scale while preserving the time dimension and energy, unlike the traditional Fourier transform, which "trades" time for frequency. Time-scale specific information is important if one accepts the view that the stock market consists of heterogeneous investors operating at different time-scales. In that case considerably more insight into the volatility dynamics is gained by looking at the data at several time-scales. At small time-scales, in particular, the locality of wavelet analysis allows one to fully exploit high-frequency data. Wavelet transforms are also fast to calculate, so they are ideally suited for analyzing large data sets.
The "large-scale aim" of this licentiate thesis is first to introduce wavelet methodology to econometricians and then to analyze stock market volatility with it. In more detail, the data consist of 5-minute observations of the liquid Nokia Oyj stock at the Helsinki Stock Exchange (HEX). Several microstructure problems have to be dealt with, some characteristic of the HEX. The pre-filtered volatility series is then analyzed with the "maximal overlap" DWT to study both the global and local scaling laws in a turbulent "IT-bubble" period (1999 - 2000) and its calmer aftermath (2001 - 2002). Significant time-scale specific differences between these two periods are found. The global scaling laws may not be time-invariant, as usually claimed. The bubble period also experienced stronger long-memory in volatility than its aftermath, so long-memory may be time-varying as well. Such a finding can be applied in a locally stationary stochastic volatility model. Finally, the effects of the intraday volatility periodicity are studied and they are also found to be significant.
Avainsanat - Nyckelord - Keywords: high-frequency data, long-memory, stock markets, time-scale, volatility, wavelet
Säilytyspaikka - Förvaringsställe - Where deposited: (filled in by the library)
Muita tietoja - Övriga uppgifter - Additional information:
Acknowledgements
Most of the empirical research for this licentiate thesis was conducted while visiting the
Bank of Finland (Research Department). Their hospitality is gratefully acknowledged. I
also gratefully acknowledge the grants from OP-Ryhmän Tutkimussäätiö and the Yrjö
Jahnsson Foundation. Several people have made academic contributions to this work. I
would especially like to thank Professor Pentti Saikkonen for his patient guidance. In
addition, the following persons gave valuable comments: Professors Erkki Koskela, Seppo
Honkapohja, Markku Lanne, Matti Virén, Lasse Holmström, and Seppo Pynnönen, and
Doctors Juha Tarkka, Tuomas Takalo, and Jouko Vilmunen. I also thank the participants
at the Economics and Econometrics of the Market Microstructure Summer School
(Constance, Germany, 2004), my RAKA colleagues, and my father, M.Sc. Osmo Vuorenmaa,
for helpful discussions. Finally, I thank the Fulbright Center and the Finnish Cultural
Foundation for the essential financial backing to study for a year at the University of
California (San Diego) in La Jolla ("The Jewel" in English), where this thesis was "ground out".
La Jolla, CA
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 A historical glance at wavelet methodology . . . . . . . . . . . . . . . . . . . . . 5
3 Essentials of Fourier and wavelet theories . . . . . . . . . . . . . . . . . . . . . . 8
3.1 Fourier theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.1.2 Continuous Fourier transform . . . . . . . . . . . . . . . . . . . . . 10
3.1.3 Discrete Fourier transform . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Wavelet theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.2 Continuous wavelet transform . . . . . . . . . . . . . . . . . . . . . 18
3.2.3 Discrete wavelet transform . . . . . . . . . . . . . . . . . . . . . . . 21
4 Multiresolution analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2 Multiresolution composition . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3 Filters and wavelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4 Compactly supported wavelets . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.5 Daubechies wavelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5 Decomposing time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.1 Practical issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.2 Discrete wavelet transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.3 Partial discrete wavelet transform . . . . . . . . . . . . . . . . . . . . . . . 48
5.4 Maximal overlap discrete wavelet transform . . . . . . . . . . . . . . . . . 49
5.5 Wavelet variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6 Volatility modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.1 Measures of volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.2 Stochastic volatility and long-memory . . . . . . . . . . . . . . . . . . . . . 63
7 Empirical analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.1 Data description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.2 Preliminary data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.3 Multiresolution decomposition . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.4 Global scaling laws and long-memory . . . . . . . . . . . . . . . . . . . . . 88
7.5 Local scaling laws and long-memory . . . . . . . . . . . . . . . . . . . . . . 93
7.6 Effects of volatility periodicity . . . . . . . . . . . . . . . . . . . . . . . . . 99
8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
A Appendix: Some functional analysis . . . . . . . . . . . . . . . . . . . . . . . . . 130
B Appendix: Orthonormal transforms . . . . . . . . . . . . . . . . . . . . . . . . . 132
C Appendix: Fractional differencing and long-memory . . . . . . . . . . . . . . . . 133
D Appendix: Locally stationary process . . . . . . . . . . . . . . . . . . . . . . . . 134
E Appendix: Fourier flexible form . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
F Appendix: List of abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
1 Introduction
Financial time series share common characteristics. For example, in stock market return
data one typically observes discontinuities, i.e. sudden big changes, and clusters of volatility,
i.e. the alternation of highly volatile and tranquil periods (see Fig. 1). These phenomena
are so widely recognized today that they are called stylized facts (see e.g. Cont (2001) or
Dacorogna et al. (2001)). Unavoidably, then, a good model of financial returns needs at
least to capture the non-Gaussianity and the time-varying volatility.
Following the pioneering work of Mandelbrot (1963) and Fama (1965), a lot of effort was
put into the study of discontinuities in the 1970s. In fact, the study of non-Gaussian heavy-
tailed distributions dominated the empirical finance literature back then. Although there
is nowadays a vibrant new line of research on jumps (see e.g. Ait-Sahalia (2003), Barndorff-
Nielsen and Shephard (2003a, 2003b), Andersen et al. (2003)), for the last two decades the
main emphasis has been on volatility clustering, or what has come to be known as the
"ARCH effect". In particular, the seminal articles of Engle (1982) and Bollerslev (1986)
launched a huge interest in different kinds of (generalized) autoregressive conditional
heteroskedastic ((G)ARCH) models (for a review, see e.g. Bollerslev et al. (1992, 1994)
and Bollerslev (2001)).1 The immense interest in the conditional variance stems from the
fact that a correctly specified volatility model is important in the valuation of stocks and
stock options and in designing optimal dynamic hedging strategies for options and futures,
among other reasons.
But as important as the ARCH models and their numerous extensions have been for
the newborn field of financial econometrics, they have not helped much in explaining the
stylized facts. True enough, they are only models and as such perhaps only meant for
successful data fitting, but are they missing something crucial? In a way they are, since
they model only one time-scale (usually a day, or longer) at a time. But stock market
data have no single specific time-scale to analyze! A notable exception in this respect is the
heterogeneous ARCH model introduced in Müller et al. (1997), based on the hypothesis of
a heterogeneous market of Müller et al. (1993). According to this hypothesis stock markets
1In 2003 Engle was awarded (half of) the Nobel Prize in Economic Sciences "for methods of analyzing
economic time series with time-varying volatility" (see http://www.nobel.se/economics/laureates/2003).
[Figure 1: "Nokia at the Helsinki Stock Exchange (1999 - 2002)"; two panels plotting log(Price) and log-Return against Time.]
Figure 1: Logarithmic price and return history of Nokia at the HEX sampled every 5
minutes from January 4 (1999) to December 30 (2002).
consist of multiple layers of investment horizons (time-scales), varying from extremely
short (minutes) to long (years). The small time-scales are commonly thought to be related
to speculative activity while the bigger time-scales are related to investment activity. It is
well known that the players in the stock market form a heterogeneous group with respect
to perceptions of the market, risk profiles, institutional constraints, degree of information,
prior beliefs, and other characteristics such as geographical location. Interestingly,
however, Müller et al. argue that many of these differences translate into sensitivity
to different time-scales. Indeed, Müller et al. (1997) show some support for the view that
time-scale is "one of the most important aspects in which trading behaviours differ" (see
also Lynch and Zumbach (2003)). For example, big institutional investors have relatively
long trading horizons and trade on economic fundamentals. On the other hand, the
so-called day-traders do not keep positions open overnight and simply trade on market
sentiment. The small time-scales, in particular, have become increasingly important
because of the recent availability of high-frequency data (see e.g. Engle (2000)). In fact,
Goodhart and O'Hara (1997) conjecture that "the ability to analyze higher frequency data
may be particularly useful in pursuing [why volatility persistence endures]". One can then
argue that to better capture the dynamics of stock markets one must analyze data at
multiple, perhaps even a continuum of, time-scales.
Multiple time-scales are especially important from the viewpoint of risk management.
For example, in the risk management industry one often needs to scale a risk measure
(standard deviation, say) from one time-scale to another. The industry standard is to scale
by the square-root of time, familiar from Brownian motion (i.e. a continuous-time random
walk). But one is then implicitly assuming that the data generating process (DGP) is
made of independent and identically distributed (IID) random variables. This assumption
is not reasonable for financial time series; just consider volatility clustering, for example.
Indeed, the existence of serial correlation in the conditional second moments is usually
obvious even by eye (as in Fig. 1). The persistence is universally found to be so strong and
long-lasting that volatility is said to exhibit long-memory (or long-range dependence). Under
such non-IID circumstances square-root scaling may indeed lead to wrong conclusions (see
Diebold et al. (1997)).
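To see concretely why the square-root rule can mislead, consider the following sketch (my own illustration, not from the thesis; the exponentially decaying AR(1)-style correlation structure and the parameter values are assumptions). It compares the true standard deviation of an n-period sum of equally volatile returns with correlation Corr(r_i, r_j) = phi^|i-j| against the sqrt(n) rule:

```python
import math

def true_sum_sd(sigma, phi, n):
    """Std. dev. of the sum of n returns with Corr(r_i, r_j) = phi**|i - j|.

    Var(sum) = sigma^2 * (n + 2 * sum_{k=1}^{n-1} (n - k) * phi^k).
    """
    var = sigma**2 * (n + 2 * sum((n - k) * phi**k for k in range(1, n)))
    return math.sqrt(var)

def sqrt_rule_sd(sigma, n):
    """Industry-standard square-root-of-time scaling (valid only under IID)."""
    return sigma * math.sqrt(n)

sigma, n = 0.01, 10          # hypothetical 1-period vol, 10-period horizon
for phi in (0.0, 0.3, 0.6):  # phi = 0 reproduces the IID case exactly
    print(phi, true_sum_sd(sigma, phi, n), sqrt_rule_sd(sigma, n))
```

With phi = 0 the two numbers agree exactly; any positive persistence makes the true multi-period risk strictly larger than the square-root rule suggests.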
Another commonly used, overly simplifying assumption is stationarity, often of second
order. Most of the parametric GARCH models assume this, for instance. Of the less
restrictive non-parametric methods, spectral analysis requires "covariance stationarity"
as well. This method is useful because it allows one to represent a stationary time series
in the frequency domain, in which the frequency aspects can be easily studied. This should
be contrasted with a time-domain representation (the customary way of presenting stock
market data), which hides the frequency information. In this context the requirement
of stationarity stems from the fact that the power spectrum is just the Fourier transform of
the corresponding sample autocovariance function.2 As is well known, the autocovariance
is incapable of detecting non-stationarities. And because the Fourier transform looks
only for sinusoids "globally", spectral analysis is not suitable for the transient and evolving
behavior of stock markets. Clean sinusoidal components are rarely (never, dare I say!)
encountered in empirical finance.
Clearly, therefore, one needs to use more flexible tools and representations to study
stock market data. Considering the arguments made above, the most straightforward way
of increasing flexibility would be to use a non-parametric multiscale approach. This is ex-
actly where wavelet analysis enters the picture. Wavelet analysis offers a non-parametric,
mathematically concise way of studying the heterogeneity of stock markets under non-
stationary conditions. But what is wavelet analysis and how is it able to provide a time-
scale perspective on a complex (deterministic) function or, in the case of time series, on a
non-stationary realization of a stochastic process? Loosely speaking, wavelet methodology
extends Fourier methodology by replacing frequency by time-scale while still preserving
time dimension (Fourier "trades" time for frequency). The best asset of wavelet analysis,
local adaptiveness, stems from the mathematical fact that the basis functions used
in wavelet analysis, called wavelets, are well-localized in both time and scale. This gives
wavelets a distinct advantage over standard frequency domain methods when analyzing
complex dynamics such as the one found in stock markets.
There is a wide variety of wavelets available today. However, in the empirical analysis
of this thesis only wavelets belonging to the family of Daubechies compactly supported
2In certain cases one can analyze non-stationary series with Fourier methods. The Fourier transform
can also be time-localized to a certain degree (see e.g. Priestley (1996)).
wavelets are applied. Specifically, the least asymmetric wavelets are used to analyze the
volatility of Nokia Oyj sampled every 5 minutes at the Helsinki Stock Exchange.
Numerous microstructure problems arise that have to be carefully dealt with. The empirical
results are interesting in several respects. First, wavelet multiresolution analysis offers
new and useful insight into the volatility dynamics. More precisely, the analysis reveals
time-varying changes in global and local scaling laws that have remained hidden from traditional
methods. Quantitatively, the semiparametric wavelet approach allows the estimation of a
locally stationary stochastic volatility model with a time-varying long-memory parameter.
The wavelet-based approach is ideally suited in the high-frequency context because of the
complex volatility dependencies and the large number of observations. It is found that the
bubble period experienced stronger long-memory than its aftermath. The long-memory
is argued to be true and not spuriously caused by structural breaks or a trend. Finally,
the analysis reveals significant effects of the intraday volatility periodicity. As a whole,
this thesis is more than just a "deployment of wavelet technique to financial data", for
which Norsworthy et al. (2000) have, justifiably, criticized some early authors.
Before turning to the results of the empirical analysis (Sec. 7), however, it is essential
to go through the basics of classical Fourier theory as well as wavelet theory (Secs. 3.1 and
3.2). These two subsections serve as the backbone to the idea and theory of multiresolution
analysis, discussed next in a somewhat technical manner (Sec. 4). Only then, I believe,
is one well enough prepared to handle stochasticity (Sec. 5). There is also a quick review
of volatility measures and modeling (Sec. 6) before the actual analysis. I begin this rather
long tour by first taking a quick glance at the history of wavelet methodology, a form of
atomic decomposition.
2 A historical glance at wavelet methodology
The theory of wavelets originates from several different, overlapping sources, which
makes it a relatively hard research topic. Contrary to common belief, however, wavelets
have a quite long and fascinating history in mathematics. The misconception is probably due
to the fact that the word "wavelet" does not appear in the literature until the 1980s, when
applications in signal and image processing started to emerge. The interested reader should
turn to Meyer (1994) for more details than are given below.
As the reader may already know, the origins of frequency-based analysis lie with Joseph
Fourier, who in 1807 asserted that any 2π-periodic function can be represented as a sum of
sinusoidal components with appropriate coefficients. In fact, Fourier happened to discover
a new functional universe. This space of square-integrable functions on the interval [0, 2π],
denoted henceforth by L2[0, 2π], was later formalized by Henri Lebesgue. For example, the
sequence 1/√(2π), cos(x)/√π, sin(x)/√π, cos(2x)/√π, sin(2x)/√π, ... is an orthonormal
basis for this space. In 1909 Alfred Haar found another orthogonal system of functions,
this time defined on [0, 1] (chosen for convenience), that forms a series converging uniformly
to a continuous function on [0, 1]. The Haar basis functions were extremely simple
objects built from indicator functions 1[a,b] that are 1 on the interval [a, b] and 0 elsewhere.
Haar's discovery was important because it made it possible to describe small and complicated
detail that the Fourier basis could not represent. In the 1930s Paul Lévy used the Haar
basis functions to redefine the mathematically complicated Brownian motion
(Schleicher (2002, p. 3)). Unfortunately, there are some serious problems with the simple
Haar basis. First, indicator functions are discontinuous, so approximating a continuous
function with them is not coherent. Secondly, the Haar construction is suitable only for
continuous or square-integrable functions defined on [0, 1].
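The averaging-and-differencing structure that the Haar functions induce can be sketched in a few lines (an illustrative implementation of one level of the orthonormal Haar transform, not code from the thesis; the function name and the example sequence are my own):

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar transform of an even-length list:
    pairwise scaled averages (approximation) and scaled differences (detail)."""
    approx = [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
    return approx, detail

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_step(x)

# Orthonormality preserves energy: the sum of squares is unchanged.
print(sum(v * v for v in x))
print(sum(v * v for v in approx) + sum(v * v for v in detail))
```

The detail coefficients are nonzero only where the sequence changes, which is exactly the locality the Fourier basis lacks.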
To overcome these problems, Faber and Schauder replaced the indicator functions by
triangles. One then obtains a Schauder basis for the Banach space of continuous functions
on [0, 1], and the series so constructed converges uniformly on [0, 1]. Notice that the Haar
system is not a Schauder basis for this space because of the discontinuity of the Haar basis
functions. The Schauder basis is superior to the Fourier basis for studying local regularity
properties. For example, it can be used to study the multifractal structure of Brownian
motion. In contrast, the trigonometric system does not allow direct and easy access to
local regularity properties. It is also known that the trigonometric system has trouble
localizing the energy of a function. One solution to this problem is to manipulate the
series by the so-called Littlewood-Paley methods. One then essentially analyzes a function
at various scales. This type of analysis has been used extensively in numerical image
processing and it is akin to wavelet analysis.
Other orthonormal wavelet bases were discovered too. For example, in 1927 Philip
Franklin created an orthonormal basis from the Schauder basis by using the Gram-Schmidt
process. The Franklin basis can decompose any function in L2[0, 1], and it works well in
relatively irregular cases too. The main problem with the Franklin basis is its complex
algorithmic structure: Franklin wavelets are not derived from a fixed wavelet function.
Another direction in the 1930s was that of Lusin, who worked with Hardy spaces, which
can be identified with closed subspaces of Lp(R). These spaces play an important
role in signal processing today. Guido Weiss and Ronald R. Coifman were the first to
interpret Lusin's theory in terms of "atoms" (the simplest elements of the function space)
and "atomic decompositions". The aim is then to find these atoms and the assembly rules that
allow the reconstruction of all the elements of the function space. For example, the Haar
system is the simplest atomic decomposition for the spaces Lp[0, 1], 1 < p < ∞. One can
use the so-called Calderón identity to look at an atomic decomposition. This identity was
rediscovered by Grossmann and Morlet in 1980, although their original interpretation was
in terms of quantum mechanics. They came up with the notion of an analyzing wavelet
ψ (nowadays known as the mother wavelet) and wavelets ψ(a,b) (defined later). Grossmann
and Morlet were also the first to define wavelet coefficients as the inner product
of the function to be analyzed and the wavelets. They also gave an inversion formula for the
synthesis of a function from its wavelet coefficients.
It was not until the work of Stéphane Mallat and Yves Meyer in the late 1980s, however,
that wavelets entered mainstream science. By combining the signal processing theory
of quadrature mirror filters with orthonormal wavelet bases, Mallat came up with the
concept of a "multiresolution analysis". This notion succeeded in uniting different aspects of
wavelet theory and gave an elegant way of constructing wavelets. After Mallat's
contribution one more major breakthrough took place: in 1988 Ingrid Daubechies constructed
"consumer-ready" wavelets with a preassigned degree of smoothness, which made applied
work much easier.
Although wavelets have been applied in engineering for nearly two decades now,
applications in economics and finance started to emerge only in the mid-1990s. One of the
reasons was that most of the theory of wavelets was developed in the context of deterministic
functions, not stochastic processes. Indeed, statistical theory and applications lagged behind
(the first most important ones being e.g. Donoho (1992), Donoho and Johnstone (1992)
and Donoho et al. (1993, 1995)). Today, however, as Schleicher (2002) notes, the number
of successful applications indicates that "wavelets are on the verge of entering mainstream
econometrics".
3 Essentials of Fourier and wavelet theories
The purpose of this section is to give the necessary technical background for the wavelet
multiresolution analysis (to be defined mathematically in the next section). It is suggested
that even those readers familiar with the basics of Fourier and wavelet theories at
least skim through this section to become acquainted with the notation used.
3.1 Fourier theory
3.1.1 Introduction
Not long ago many thought that the mathematical world was created out of
analytic functions. It was the Fourier series which disclosed a terra incognita
in a second hemisphere. — E. B. van Vleck 1914 (Bachman et al. (2000, p.
159).)
Begin by considering a Taylor series approximation of a function. Obviously the quality
of this approximation depends on the number of polynomial terms included. A defect of this
approach is that for a function to have a Taylor series, it must (among other things) be
infinitely differentiable in some interval. Sines and cosines, on the other hand, are more
versatile "prime elements" than powers of t. Indeed, sines and cosines may be used not
only to approximate non-analytic functions but also wildly discontinuous ones. Even the
periodicity of sines and cosines does not constitute a very serious limitation. Sines and
cosines triumphantly approximate functions f on [−π, π] in the sense that their Fourier
series converge (in some sense) to f. (Bachman et al. (2000, p. 139).)
In what follows, some basic concepts from functional analysis are required (see App.
A). Standard abbreviations are used throughout the thesis (see App. F) as well as the
following notation:
Notation 1 The boldfaced R denotes the set of real numbers, C denotes the set of complex
numbers, and K = R or C without specifying which. As usual, Z denotes the set of integers
and N denotes the set of natural numbers.
An integrable function f ∈ L_1^r[−π, π] (the superscript r refers to the real-valued elements
of L1[−π, π]) can be approximated by the trigonometric form of its Fourier series

$$\frac{a_0}{2} + \sum_{n\in\mathbb{N}} a_n \cos nt + \sum_{n\in\mathbb{N}} b_n \sin nt,$$

where

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\cos nt\,dt = \langle f, \cos nt/\pi\rangle, \quad n \in \mathbb{N}\cup\{0\}$$

(i.e. the inner product of f and the cosine term), and

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\sin nt\,dt = \langle f, \sin nt/\pi\rangle, \quad n \in \mathbb{N},$$

are the cosine and sine Fourier coefficients of f, respectively (Bachman et al. (2000, pp.
139-40)).3 The coefficients a_n and b_n tell in what way the analyzing functions, i.e. the
cosines and sines, need to be modified in order to reconstruct the "signal", which is in this
case a deterministic function.
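As a numerical sanity check (my own illustration, not part of the thesis), the sine coefficients of the square wave f(t) = sign(t) on [−π, π] are known to be b_n = 4/(nπ) for odd n and 0 for even n; a simple midpoint-rule approximation of the defining integral reproduces them:

```python
import math

def fourier_b(f, n, m=20000):
    """Approximate b_n = (1/pi) * integral_{-pi}^{pi} f(t) sin(nt) dt
    by a midpoint Riemann sum with m subintervals."""
    h = 2 * math.pi / m
    total = 0.0
    for k in range(m):
        t = -math.pi + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += f(t) * math.sin(n * t)
    return total * h / math.pi

sign = lambda t: 1.0 if t > 0 else (-1.0 if t < 0 else 0.0)

print(fourier_b(sign, 1))  # ~ 4/pi      ≈ 1.2732
print(fourier_b(sign, 2))  # ~ 0
print(fourier_b(sign, 3))  # ~ 4/(3*pi)  ≈ 0.4244
```

The slow 1/n decay of these coefficients reflects how poorly the Fourier basis copes with a jump, which foreshadows the case for wavelets.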
Equivalently,

$$\sum_{n\in\mathbb{Z}} F[f](n)\,e^{int}$$

is the exponential form of the Fourier series for f ∈ L1[−π, π], where the nth Fourier
coefficient

$$F[f](n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\,e^{-int}\,dt, \quad n \in \mathbb{Z},$$

is the finite Fourier transform of f evaluated at n. It is said to be "finite" because
the domain of integration is finite. (Bachman et al. (2000, p. 264).)

3An early attempt to use trigonometric functions to approximate a function was made by Daniel
Bernoulli in 1753 when trying to solve the so-called wave equation ∂²w/∂t² = a²∂²w/∂x². In 1757 Euler
came up with the integral form of the Fourier coefficients. (Bachman et al. (2000, p. 140).)
3.1.2 Continuous Fourier transform
Clearly, one must be able to handle functions defined for all real t instead of just [−π, π].
For any f ∈ L1(R), then, the Fourier transform of f is

$$F[f](\omega) = \int_{-\infty}^{+\infty} f(t)\,e^{-i\omega t}\,dt,$$

and its inverse Fourier transform4 is

$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} F[f](\omega)\,e^{i\omega t}\,d\omega.$$
(Bachman et al. (2000, p. 278).) Intuitively, the Fourier transform serves as a bridge
between the time domain and the frequency domain by eliminating all time resolution
and leaving only frequency resolution. The Fourier transform basis functions are indexed
by a single parameter ω. In physical terms this means that the Fourier transform is char-
acteristic of the global behavior of the function. The physical concept of ”frequency” is
related purely to the family of coupled exponential functions that are used in the Fourier
transform (Priestley (1996, p. 90)).
Example 2 (Rectangular pulse) The Fourier transform of the indicator function of
[−a, a], i.e. of

$$f(t) = 1_{[-a,a]}(t) = \begin{cases} 1, & |t| \le a, \\ 0, & |t| > a, \end{cases} \quad a > 0,$$

is $F[1_{[-a,a]}](\omega) = \frac{\sin(a\omega)}{\omega/2} \notin L^1(\mathbb{R})$. Thus one has a non-integrable transform even in the very
tame case of a time-limited f; "time-limited" because the function f of time t vanishes
outside a closed interval. (Bachman et al. (2000, p. 281).)

4The inversion formula holds for a large class of functions f ∈ L1(R). However, integrable functions
can have non-integrable Fourier transforms. An example of such a function is f(t) = e^{−t}U(t), where
U(t) = 1_{[0,∞)} is the unit step function. Then F[f](ω) = 1/(1 + iω). But for large ω, |F[f](ω)| ≈ 1/|ω|, so
F[f](ω) ∉ L1(R). (Bachman et al. (2000, Ch. 5.5).)
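The closed form 2 sin(aω)/ω can be verified numerically (my own check, not from the thesis; the parameter values are arbitrary) by approximating the defining integral of e^{−iωt} over [−a, a]:

```python
import cmath
import math

def ft_pulse_numeric(a, w, m=20000):
    """Midpoint-rule approximation of integral_{-a}^{a} e^{-i w t} dt,
    i.e. the Fourier transform of the indicator of [-a, a] at frequency w."""
    h = 2 * a / m
    return sum(cmath.exp(-1j * w * (-a + (k + 0.5) * h)) for k in range(m)) * h

a, w = 1.0, 2.5
closed_form = math.sin(a * w) / (w / 2)  # = 2 sin(a w) / w, real-valued
print(abs(ft_pulse_numeric(a, w) - closed_form) < 1e-6)  # True
```

Note the 1/ω decay of the transform, which is why it fails to be integrable.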
The general properties of the Fourier transform are skipped here (see e.g. Bachman et al. (2000, Ch. 5.5)), but the following key concept must be introduced: the convolution of f, g ∈ L1(R) is

f ⋆ g(t) = ∫_{−∞}^{∞} f(t − x)g(x) dx.

It satisfies

F[f ⋆ g](ω) = ∫_{−∞}^{∞} f ⋆ g(t) e^{−iωt} dt = F[f](ω)F[g](ω).

It is important to realize that from a filtering point of view convolution acts like a linear filter. Because only "past values" are being transformed, the filter is causal. If the output is a finite sequence, one has a finite impulse response (FIR) filter.5 FIR filters are commonly used in technical analysis of financial markets; consider, for example, a simple moving average over a finite period. More generally, filters are used in economics and finance to extract components of time series such as trends, seasonalities, business cycles, and noise (see e.g. Hamilton (1994) and Gencay et al. (2002a)). Identification and extraction of components is important in terms of modeling and inference.6
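The moving-average reading of a causal FIR filter can be sketched numerically. The toy series and the window length below are illustrative choices, and NumPy is assumed:

```python
import numpy as np

# A 3-period simple moving average as a causal FIR filter: the filter
# weights (the impulse response) form a finite sequence.
x = np.array([3.0, 6.0, 9.0, 12.0])   # a toy "price" series
w = np.ones(3) / 3                    # equal FIR weights

# Full linear convolution; entry t equals sum_k w_k * x_{t-k}, so only
# "past values" of x enter -- the filter is causal.
y = np.convolve(x, w)

print(y[2], y[3])  # averages of (x_0,x_1,x_2) and (x_1,x_2,x_3): 6.0 9.0
```

The first few output entries are "warm-up" values computed from fewer than three observations, which is the usual boundary effect of a finite moving average.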
Example 3 (Rectangular pulse and smoothing) The convolution of a function f ∈ L1(R) with a rectangular pulse 1_{[−a,a]} results in a pulse that is generally smoother than f:

f ⋆ 1_{[−a,a]}(t) = ∫_{−∞}^{∞} f(t − x)1_{[−a,a]}(x) dx = ∫_{−a}^{a} f(t − x) dx = ∫_{t−a}^{t+a} f(u) du,

where u = t − x (Bachman et al. (2000, p. 292)). This operation eliminates spikes from f caused by transient phenomena in the manner of a moving average.

5 "Impulse response" refers to the fact that when one convolves with a (unit) "impulse", i.e. the so-called Dirac delta function, the output is the "response" to that convolution operation.

6 Some of the most well-known filters used in economics and finance are the exponentially weighted moving average, the Hodrick–Prescott filter, and the Baxter–King filter. Linear filters that are optimal in the mean square error sense are called Wiener filters. Filters based on the state-space technique and recursive algorithms are called Kalman filters. See Gencay et al. (2002a, Chs. 2.4 and 3).
Notice that all of the above has been defined for f ∈ L1(R). Functional analysis says that for finite intervals [a, b], the bigger p gets, the smaller Lp[a, b] becomes: q > p implies Lq[a, b] ⊂ Lp[a, b]. But the space Lp(R) does not shrink in this way as p increases. It is therefore possible that a square-integrable function does not belong to L1(R) and thus has no well-defined Fourier transform in the L1 sense.7 It turns out, fortunately, that it is possible to establish L2-versions of many results for L1 functions (see Bachman et al. (2000, Ch. 5.18)). In particular, for f, g ∈ L2(R) it holds that

‖f‖₂² = (1/2π)‖F[f]‖₂²,

i.e. the Fourier transform preserves energy (Bachman et al. (2000, p. 358)). So the Fourier transform is (practically) a linear isometry of L2(R) onto L2(R).8
In anticipation of the wavelet transform, a refinement of the Fourier transform is worth a discussion (for more details, see e.g. Priestley (1996) and Daubechies (1990)). The windowed Fourier transform is (Bachman et al. (2000, p. 480))

(T_{g,b}f)(ω) = ∫_{−∞}^{∞} f(t)e^{−iωt} g̅(t − b) dt,

where the window function g ∈ L2(R) is such that tg ∈ L2(R) (and the "bar" denotes a complex conjugate). This transform, better known as the short-time Fourier transform (STFT), attempts to balance between time and frequency by sliding a window across the time series and taking the Fourier transform of the windowed series. The result is a function of two parameters, frequency and time-shift, so it is time-dependent. The problem of the STFT is that the window size is fixed with respect to frequency. So the STFT still suffers from the lack of time resolution and thus does not purely characterize the properties of the function f. Of course, in theory, one could change the width of the window as a function of frequency ω, but then the relationship between ω and the width of g would be quite arbitrary. Furthermore, the physical interpretation of (T_{g,b}f) is open to the following question: "what exactly it is measuring and what type of representation for f leads to a meaningful interpretation of (T_{g,b}f)" (Priestley (1996, p. 91)).

7 For example, the function f(t) = (1/t)U(t − 1) belongs to L2(R) but not to L1(R) (Bachman et al. (2000, p. 356)).

8 A linear map A : X → Y (where X and Y are normed spaces) is said to be a linear isometry if ‖Ax‖ = ‖x‖ for every x ∈ X. This notion "enables us to exclude many apparently different objects as only trivially different". (Bachman et al. (2000, p. 25).)
The choice of the window g would appear to be very important in applications.9 If it is chosen so that it decays to zero very fast, then the integral above operates only over a very small time-domain "width". But then, according to the "uncertainty principle" (see e.g. Priestley (1988, p. 151)), the STFT loses resolution in the frequency domain (and vice versa). The most famous example is the Gaussian window

g_a(t) = (1/(2√(πa))) e^{−t²/(4a)},

resulting in the Gabor transform (Bachman et al. (2000, p. 480))

G(a, b; f)(ω) = ∫_{−∞}^{∞} f(t)e^{−iωt} (1/(2√(πa))) e^{−(t−b)²/(4a)} dt.

This window decays to zero as t → ±∞ (and integrates to one, unlike wavelets).
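Two of the Gaussian window's stated properties can be checked numerically: it integrates to one, and for a pure tone f(t) = e^{iω₀t} the Gabor transform evaluated at ω = ω₀ reduces to exactly that window mass, for any shift b. The grid, truncation range, and parameter values below are illustrative choices; NumPy and plain Riemann-sum quadrature are assumed:

```python
import numpy as np

a, b, w0 = 0.5, 1.0, 3.0
t = np.linspace(-30.0, 30.0, 240001)
dt = t[1] - t[0]

# Gaussian window g_a(t - b) = (1 / (2*sqrt(pi*a))) * exp(-(t-b)^2 / (4a)).
g = np.exp(-(t - b) ** 2 / (4 * a)) / (2 * np.sqrt(np.pi * a))

# The window integrates to one (unlike a wavelet, which integrates to zero).
window_mass = np.sum(g) * dt

# Gabor transform of the pure tone f(t) = exp(i*w0*t), evaluated at w = w0:
# the complex exponentials cancel exactly and only the window mass remains.
f = np.exp(1j * w0 * t)
G = np.sum(f * np.exp(-1j * w0 * t) * g) * dt

print(window_mass, G)  # both ~ 1
```

The cancellation at ω = ω₀ is why the STFT "detects" a tone of that frequency regardless of where the window is placed, which is another way of seeing its fixed, frequency-independent resolution.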
3.1.3 Discrete Fourier transform
In some troublesome cases a closed-form evaluation cannot be found and one has to use approximation. The range of integration is then truncated to an interval [a, b] and the integral for F[f] is approximated by a finite sum such as

F[f](ω) ≈ Σ_{k=0}^{N−1} f(t_k) e^{−iωt_k} Δt.

From a discrete perspective, one is dealing with the values of f at only a finite number of points {0, 1, ..., N − 1}. Consider f as defined on the cyclic group of integers modulo the positive integer N,

Z_N = Z/(N) (Z modulo N), where (N) = {kN : k ∈ Z},

and

f : Z_N → C, k + (N) ↦ f(k).
9 Actually the choice of a window is a theoretically interesting question too. It was proved in 1981 by Balian (incompletely, though) that one cannot have an orthogonal representation with windowed Fourier analysis when the window is reasonably regular and well-localized (Gaussian, say) (Hubbard (1998, p. 38)). It is thus a small miracle that very smooth (i.e. infinitely differentiable) orthogonal wavelets do indeed exist, as shown later (in Sec. 4).

This function f can be viewed as the N-periodic function defined on Z by taking

f(k + nN) = f(k) on Z, for k = 0, 1, ..., N − 1, n ∈ Z.

(Bachman et al. (2000, pp. 383–5).)
Since Z_N is finite, any function defined on it is integrable. Thus L1(Z_N) = L2(Z_N) = C^N, the collection of all functions f : Z_N → C. One gets the discrete Fourier transform (DFT) for f : Z_N → C,

D[f]_k = Σ_{t=0}^{N−1} f_t e^{−i2πtk/N},

and its inverse discrete Fourier transform,

f_t = (1/N) Σ_{k=0}^{N−1} D[f]_k e^{i2πtk/N}

(Bachman et al. (2000, p. 390)).
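The transform pair above can be implemented directly from its definition; NumPy's FFT routine uses the same sign convention, so the two agree. The length and the random test vector below are arbitrary illustrative choices:

```python
import numpy as np

N = 8
rng = np.random.default_rng(0)
f = rng.standard_normal(N)

# Direct O(N^2) evaluation of D[f]_k = sum_t f_t exp(-i 2 pi t k / N).
t = np.arange(N)
D = np.array([np.sum(f * np.exp(-2j * np.pi * t * k / N)) for k in range(N)])

# numpy.fft.fft uses the same (negative-exponent) convention.
assert np.allclose(D, np.fft.fft(f))

# Inverse DFT: f_t = (1/N) sum_k D[f]_k exp(+i 2 pi t k / N).
f_rec = np.array([np.sum(D * np.exp(2j * np.pi * t * tt / N))
                  for tt in range(N)]) / N
assert np.allclose(f_rec, f)
```

Note that sign and normalization conventions differ across references; here the 1/N factor sits in the inverse, matching the formulas in the text.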
Analogously to the continuous case, the energy preservation for discrete f, g ∈ L1(Z_N) = L2(Z_N) = C^N is

Σ_{k=0}^{N−1} |f_k|² = (1/N) Σ_{k=0}^{N−1} |D[f]_k|²

(Bachman et al. (2000, p. 393)). Another analogy is the discrete (cyclic) convolution f ⋆ g_t of f, g ∈ L1(Z_N), defined at each t ∈ Z_N by

f ⋆ g_t = Σ_{u=0}^{N−1} f_u g_{t−u},

satisfying

D[f ⋆ g]_k = Σ_{t=0}^{N−1} f ⋆ g_t e^{−i2πtk/N} = D[f]_k D[g]_k
(Bachman et al. (2000, p. 397)). Notice that this can also be written as (Percival and Walden (2000, pp. 29–30))

f ⋆ g_t = Σ_{u=0}^{N−1} f_u g_{(t−u) mod N}, for t = 0, ..., N − 1,

where the "modulo operation" is to remind that one is in fact using a periodic extension, i.e., g_{−1} = g_{N−1}, g_{−2} = g_{N−2}, and so forth.10 Yet another way of writing the discrete convolution is in matrix form:

  [ f⋆g_0     ]   [ g_0      g_{N−1}  g_{N−2}  ···  g_1 ] [ f_0     ]
  [ f⋆g_1     ] = [ g_1      g_0      g_{N−1}  ···  g_2 ] [ f_1     ]
  [   ⋮       ]   [   ⋮        ⋮        ⋮      ⋱    ⋮  ] [   ⋮     ]
  [ f⋆g_{N−1} ]   [ g_{N−1}  g_{N−2}  g_{N−3}  ···  g_0 ] [ f_{N−1} ],

where the rows of the N × N matrix are just circularly shifted versions of each other, i.e., the vectors g^{(1)}, g^{(2)}, ..., g^{(N−1)}, g^{(0)} shifted to the right by the amount k = 0, ..., N − 1. Finally, notice that the DFT can be represented in matrix form as D[f] = M_N f, f ∈ C^N, where M_N is an N × N matrix. I do not discuss the structure of this matrix here (see e.g. Bachman et al. (2000, Ch. 6)), but just comment briefly on its computational complexity. Namely, evaluating the DFT requires N² multiplications and N(N − 1) additions. However, the fast Fourier transform (FFT) algorithm for N = 2^k, a clever way of factorizing the matrix M_N, reduces the number of multiplications to something proportional to N log₂ N. This reduction has been so important for a wide range of applications that the FFT has been called "the most valuable numerical algorithm of our lifetime" (Strang (1993)). And although the modern literature on the FFT starts from the classic article of Cooley and Tukey (1965), the notion can be traced back as far as Gauss. (Bachman et al. (2000, Ch. 6).)
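The three equivalent formulations of cyclic convolution above (the defining modular sum, the circulant-matrix product, and the DFT convolution theorem computed with the FFT) can be cross-checked numerically. The vector length and random inputs are illustrative; NumPy is assumed:

```python
import numpy as np

N = 8
rng = np.random.default_rng(1)
f = rng.standard_normal(N)
g = rng.standard_normal(N)

# (1) Defining sum with periodic extension: (f*g)_t = sum_u f_u g_{(t-u) mod N}.
conv_sum = np.array([sum(f[u] * g[(t - u) % N] for u in range(N))
                     for t in range(N)])

# (2) Circulant-matrix form: entry (t, u) of the matrix is g_{(t-u) mod N}.
C = np.array([[g[(t - u) % N] for u in range(N)] for t in range(N)])
conv_mat = C @ f

# (3) DFT convolution theorem: D[f*g]_k = D[f]_k D[g]_k, inverted by the FFT.
conv_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

assert np.allclose(conv_sum, conv_mat)
assert np.allclose(conv_sum, conv_fft)
```

Route (3) is the practically important one: it replaces the O(N²) sum or matrix product with O(N log N) work, which is precisely the FFT speed-up discussed above.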
3.2 Wavelet theory
3.2.1 Introduction
One of the main ideas of wavelet analysis is to use functions different from sinusoids to approximate a function.11 The crucial difference to Fourier analysis is that in wavelet analysis one is expressing a possibly continuous function in terms of discontinuous wavelets. By stretching (dilating) and shifting (translating) a "mother wavelet", one is able to capture features that are local both in time and in frequency. This property alone makes wavelets more suitable for analyzing the non-stationary or transient signals that are often present in nature, finance, and economics. In other words, the wavelet basis is more interesting than the Fourier basis because "unlike sines and cosines, individual wavelet functions are quite localized in space; simultaneously, like sines and cosines, individual wavelet functions are quite localized in frequency or (more precisely) characteristic scale" (Press et al. (1992, p. 584)).

10 More precisely, consider the expression "j mod N". If 0 ≤ j ≤ N − 1, then j mod N := j. But if j is any other integer, then j mod N := j + nN, where n is the unique integer such that 0 ≤ j + nN ≤ N − 1. (Percival and Walden (2000, p. 30).)

11 Almost all the literature on wavelet analysis deals with the representation of deterministic square-integrable functions. Admittedly, this approach is not the most insightful when dealing with time series.
Wavelets are functions that satisfy the following two conditions (an exact definition of a wavelet will be given later):

∫_{−∞}^{∞} ψ(t) dt = 0,    (1)

∫_{−∞}^{∞} |ψ(t)|² dt = 1.    (2)

That is, wavelets have zero average and unit energy. This guarantees that a wavelet has non-zero entries but that those entries must eventually cancel out. Originally, Morlet saw the zero average condition as a physical necessity: seismic time series undergo compressions and rarefactions that must eventually cancel out. (Percival and Walden (2000, Ch. 1.1).)
Example 4 (Morlet wavelet) The classic example of a continuous-time wavelet is the Morlet wavelet (Gencay et al. (2002a, p. 102)):

ψ_{Morlet}(t) = (1/√(2π)) e^{−iωt} e^{−t²/2}

(see Fig. 2).
There are two central concepts in wavelet theory that appear constantly: dilation and translation. They should therefore be well understood. For a real-valued, square-integrable function f(x) ∈ L2(R), a (dyadic) dilation by j ∈ Z is defined as f_{j,0}(x) := 2^{j/2} f(2^j x), where the factor 2^{j/2} guarantees energy preservation, i.e., ‖f(x)‖₂ = ‖f_{j,0}(x)‖₂. On the other hand, a translation of a real-valued function f(x) by k is defined as f_{0,k}(x) :=

Nevertheless, I will follow the tradition and deal with stochasticity later (in Sec. 5).
Figure 2: Real portion of the Morlet wavelet in time domain. In frequency domain the Morlet wavelet is well localized, too (not plotted here, though).
f(x − k) for all k ∈ Z. However, in practical applications the indexing of j often runs in the opposite direction and the definition of dilation must then be changed accordingly. In this "reverse" custom (used in Secs. 5 and 7), the jth level dilation of the kth level translation of f(x) is defined as

f_{j,k}(x) := 2^{−j/2} f(2^{−j}x − k),

where x ∈ (−∞, ∞) and j, k ∈ Z (see e.g. Percival and Walden (2000, p. 459)).
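The energy-preserving role of the 2^{−j/2} factor in this "reverse" convention can be checked numerically. The Gaussian test function, grid, and index choices below are illustrative; NumPy and Riemann-sum integration are assumed:

```python
import numpy as np

def energy(h, x):
    """Riemann-sum approximation of the squared L2 norm of h on grid x."""
    return np.sum(np.abs(h) ** 2) * (x[1] - x[0])

f = lambda x: np.exp(-x ** 2)          # an arbitrary finite-energy function
x = np.linspace(-60.0, 60.0, 600001)

# "Reverse" convention: f_{j,k}(x) = 2^{-j/2} f(2^{-j} x - k).
j, k = 2, 3
f_jk = 2.0 ** (-j / 2) * f(2.0 ** (-j) * x - k)

# Dilation and translation together preserve the L2 norm.
print(energy(f(x), x), energy(f_jk, x))  # equal (up to quadrature error)
```

The substitution u = 2^{−j}x − k in the norm integral shows why: dx = 2^j du exactly cancels the factor 2^{−j} coming from |2^{−j/2}|².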
3.2.2 Continuous wavelet transform
The wavelet transform was originally developed as an analysis and synthesis tool for finite-
energy signals x(t) ∈ L2(R). Loosely speaking, the wavelet transform is an intelligently
adaptive tool that, in Meyer’s words (from Hubbard (1998, pp. 33—35)), jumps ”straight
to essentials"; and, "contrary to what happens with Fourier series, the coefficients of
the wavelet series translate the properties of the function or distribution simply, precisely
and faithfully”. Indeed, the wavelet transform is especially suited for analyzing signals
that display strong transients, for example discontinuity, rupture, or ”the unforeseen”.
More precisely, if f(t) has such singularities, these will affect only the coefficients at time
points near the singularities. In contrast, the standard Fourier transform described above
depends on the global properties of f(t) and any singularity in it will affect all coefficients.
The continuous wavelet transform (CWT) of f ∈ L2(R) by ψ ∈ L2(R) is a projection of the function f onto a particular wavelet ψ_{s,u}(t):

(W_ψ f)(s, u) = ∫_{−∞}^{∞} f(t) ψ_{s,u}(t) dt,

where

ψ_{s,u}(t) = (1/√s) ψ((t − u)/s)

is the dilated (by s) and translated (by u) version of the mother wavelet ψ (Bachman et al. (2000, p. 483)). The most obvious difference between the Fourier transform and the
CWT is that the wavelet basis functions are indexed by two parameters instead of just
one. The scale s is assumed to be restricted to R+, which is natural since s may, although
tenuously, be interpreted as a reciprocal of frequency. As Priestley (1996, p. 90) explains,
this relationship may be established in the case of an oscillatory mother wavelet ψ (such
as Morlet), because then as s decreases the ”oscillations” become more intense and show
”high-frequency” behavior. Similarly, when s increases the ”oscillations” become drawn
out and show ”low-frequency” behavior.
So the CWT turns the function f(t) of one parameter t to continuous wavelet
coefficients (Wψf)(s, u) that depend on two parameters, scale and location. Obviously,
a large wavelet coefficient occurs when the wavelet ψs,u(t) and the function f(t) match
in shape. Thus the above integral measures the variation of f in the neighborhood of u,
whose size is proportional to s. In other words, by calculating the wavelet transform one is
analyzing the potentially complex structure of the function by decomposing it into simpler
components. The key element of wavelets is their adaptiveness to different components of
a signal. Namely, by using a small window one looks at high-frequency components and
by using a large window one looks at low-frequency components. For this reason wavelets
are often called a ”mathematical microscope”. The capability of zooming by changing
the width of the window differentiates the CWT from the STFT. And because of the
time indexing of wavelets (and their compactness), the projection onto a wavelet space is
essentially local. This should be contrasted with the projection in Fourier analysis which
is essentially global although some localization can be achieved by first convolving the
function with a filter that decreases rapidly in modulus of the weights (as explained in
Sec. 3.1.2). The uncertainty principle still holds true, however: in an ideal sense, one
cannot choose a mother wavelet in such a way that it achieves ”good localization” in both
the time and frequency domains (Priestley (1996, p. 91)).
The wavelet transform is an energy preserving transformation,

∫_{−∞}^{∞} |f(x)|² dx = (1/C_ψ) ∫_{−∞}^{∞} ∫_{0}^{∞} |(W_ψ f)(s, u)|² (ds/s²) du,

if the mother wavelet ψ ∈ L2(R) satisfies the so-called admissibility condition (Mallat (1998, Th. 4.3))

C_ψ = ∫_{0}^{∞} (|F[ψ](ω)|²/ω) dω < ∞,

first proven by the mathematician Calderón in 1964. To guarantee the finiteness of the above integral, one must have F[ψ](0) = 0, meaning that ∫_{−∞}^{∞} ψ(t) dt = 0. Conversely, if the zero average property holds and if ∫_{−∞}^{∞} (1 + |t|^α)|ψ(t)| dt < ∞ for some α > 0, then C_ψ < ∞. So if the admissibility condition indeed does hold, then the inverse continuous wavelet transform exists and any f ∈ L2(R) is given by the so-called resolution of identity (or Calderón's reproducing identity)

f(t) = (1/C_ψ) ∫_{0}^{∞} ∫_{−∞}^{∞} (W_ψ f)(s, u) ψ_{s,u}(t) (ds/s²) du.

That is, the function f(t) can be synthesized by reconstructing it from the corresponding wavelet coefficients (W_ψ f)(s, u). In fact, this perfect reconstruction property of square-integrable functions is the key property of wavelet transforms.
Other properties of the CWT are skipped here (see e.g. Vidakovic (1999, pp. 46–47)). Also the regularity (smoothness) of scaling and wavelet functions, measured in terms of Lipschitz (or Hölder) exponents, is skipped because of its excess technicality. One practically useful characterization of wavelets is their number of vanishing moments, however. If the wavelet function ψ(t) and P − 1 of its derivatives are everywhere continuous and satisfy certain regularity conditions, then ψ(t) has P vanishing moments, i.e.,

∫_{−∞}^{∞} t^p ψ(t) dt = 0, for p = 0, 1, ..., P − 1.

Only if the wavelet function has P vanishing moments can the wavelet function ψ(t) and P − 1 of its derivatives be continuous, but this is not guaranteed (Percival and Walden (2000, p. 483)).12 The continuity of the wavelet and its derivatives is important when analyzing a signal in practice because it helps to prevent artifacts due to the wavelet itself. As a rule of thumb, the smoother the analyzing wavelet, the more "reliable" the outcome. Furthermore, a wavelet with P vanishing moments guarantees that one has a multiscale differential operator of order P (see below).
12One can measure the local regularity of a signal with a wavelet that has enough vanishing moments.
The decay of the wavelet transform amplitude across scales is related to the uniform and pointwise
Lipschitz regularity of the signal. By zooming into signal structures with a scale going to zero one can
then measure the asymptotic decay. (Mallat (1998, Ch. 6.1).)
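As a concrete instance of the moment condition, the Haar wavelet (introduced formally below as Example 6) has exactly one vanishing moment: its zeroth moment vanishes while its first moment equals −1/4. A quick Riemann-sum check, with NumPy assumed and the grid chosen for illustration:

```python
import numpy as np

t = np.linspace(-0.5, 1.5, 200001)
dt = t[1] - t[0]

# Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere.
psi = np.where((t >= 0) & (t < 0.5), 1.0,
               np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

m0 = np.sum(psi) * dt       # int psi(t) dt:    vanishes (p = 0)
m1 = np.sum(t * psi) * dt   # int t psi(t) dt:  equals -1/4, so P = 1
print(m0, m1)
```

Consequently the Haar wavelet, which is not even continuous, sits at the bottom of the smoothness scale sketched above; the smoother Daubechies wavelets discussed later trade longer support for more vanishing moments.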
3.2.3 Discrete wavelet transform
The CWT is a highly redundant transform.13 It is defined at all points in the time-frequency plane and the wavelet coefficients contain more information than necessary for the perfect reconstruction property to hold. By a clever discretization of the CWT one can reduce the number of wavelet coefficients to the minimum while still preserving all information of the function. By choosing s and u according to a rule known as critical sampling (Vidakovic (1999, Sec. 3.2)),

s = 2^{−j} and u = k2^{−j},

one gets the discrete wavelet transform (DWT)

(W_ψ f)(j, k) = 2^{j/2} ∫_{−∞}^{∞} f(t) ψ(2^j t − k) dt = ⟨f, ψ_{j,k}⟩, j, k ∈ Z.
Another possibility is to choose s = 2−j and u = k, in which case one has the maximal
overlap DWT (Gencay et al. (2002a, p. 106)). This transform has different properties
from the ”ordinary” DWT and it will be discussed in more detail later. The computational
aspects of these transforms are discussed later, as well.
The critical sampling gives the functions ψ_{j,k}(t) = 2^{j/2} ψ(2^j t − k) and the set of discrete wavelet coefficients (W_ψ f)(j, k) for f ∈ L2(R), where j and k represent the set of discrete dilations and translations, respectively. Importantly, the functions ψ_{j,k} form an orthonormal system called a wavelet basis. One has arrived at a key definition (see e.g. Wojtaszczyk (1997, p. 17)):

Definition 5 Wavelet. A mother wavelet is a function ψ(t) ∈ L2(R) such that the family of functions, called wavelets,

ψ_{j,k}(t) = 2^{j/2} ψ(2^j t − k), j, k ∈ Z,

is an orthonormal basis in the Hilbert space L2(R).
13 Typically the CWT is redundant by a factor of ten. Alternatively, the signal can be said to be oversampled, meaning that Shannon's sampling theorem would hold with fewer samples. (Hubbard (1998, p. 36).)
Example 6 (Haar wavelet) The oldest, the most famous, and the simplest wavelet basis is the Haar basis. It was discovered by the mathematician Alfred Haar in 1909. The functions that create the Haar basis are constructed from

ψ_{Haar}(t) = 1 on [0, 1/2), −1 on [1/2, 1), and 0 elsewhere

(see Fig. 3). The function ψ_{Haar} is extremely localized in time space but clearly not continuous. However, in frequency space it is poorly localized (see Fig. 4). It enjoys a special role because it is the only known symmetric compactly supported wavelet.
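The orthonormality asserted in Definition 5 can be spot-checked for the Haar family on a few (j, k) pairs. The grid and the particular pairs are illustrative choices; NumPy and Riemann-sum inner products are assumed:

```python
import numpy as np

def haar(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

t = np.linspace(-4.0, 4.0, 800001)
dt = t[1] - t[0]

def psi(j, k):
    """Wavelet family member psi_{j,k}(t) = 2^{j/2} psi(2^j t - k)."""
    return 2.0 ** (j / 2) * haar(2.0 ** j * t - k)

def inner(u, v):
    return np.sum(u * v) * dt

print(inner(psi(0, 0), psi(0, 0)))  # ~ 1: unit energy
print(inner(psi(0, 0), psi(0, 1)))  # ~ 0: disjoint supports within a scale
print(inner(psi(0, 0), psi(1, 0)))  # ~ 0: orthogonal across scales
```

The cross-scale orthogonality is the less obvious case: ψ_{1,0} is constant (+1 then −1, scaled) on each half of [0, 1/2), so the product against ψ_{0,0} integrates to zero there.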
4 Multiresolution analysis
This section explains, mathematically, how wavelet methodology is used to decompose
a deterministic finite-energy function with respect to a resolution (time-scale). The key
concept of multiresolution analysis is introduced. Importantly, wavelets are shown to act
as linear filters. The construction of compactly supported wavelets belonging to the family
of Daubechies is discussed in detail.
4.1 Introduction
The process of understanding is always facilitated if more complicated struc-
tures are known to be synthesized from simpler ones — matter from molecules,
molecules from atoms, atoms from quarks, organisms from cells, integers from
primes. (Bachman et al. (2000, p. 139).)
Economists often emphasize the importance of discerning between long-run and short-
run behavior. The distinction between permanent and transitory shocks, or the distinction
between equilibrium and the efficiency of dynamic adjustment are examples of situations
that involve the notion of time-scale. In econometrics, a cointegrated vector autoregressive
model is an implementation of the idea of long-run equilibrium supplemented with short-
run dynamics, as Diebold (2004) notes. But there is no a priori reason to restrict to this
Figure 3: The Haar mother wavelet ψ being first translated by 1 to yield ψ_{0,1}, dilated by −1 to yield ψ_{−1,0}, and finally dilated by −1 and translated by 1 to yield ψ_{−1,1}.
Figure 4: Time and frequency representation of the Haar function. In time domain the Haar function is extremely localized, but in frequency domain it is very dispersed.
simplistic dichotomy instead of generalizing into multiple time horizons! This has led to
a growing awareness of the importance of different time-scales on economic and financial
actions and decisions. In fact, one of the central themes running through Nobel laureate
Engle’s work is the difference in economic dynamics across frequencies. For example, Engle
(1973) explicitly considered the idea of decomposition of dynamics by frequency through
band-spectral regression. More recent examples that use a time-scale dependent approach
in economics are Shleifer and Vishny (1990) and Osler (1995).
Not surprisingly, time-scale dependency is found useful also in finance where the het-
erogeneity of investors (see e.g. Muller et al. (1993)) makes it natural to study different
time-scales. Consider for example data on interest rates and bond maturities that encap-
sulate the distinctions between decision makers with different time horizons. Furthermore,
the recent availability of high-frequency financial data has allowed the analysis of high-
frequency dynamics such as microstructure noise, intraday calendar effects, and the arrival
of quotes and trades in real time (see e.g. Engle and Russell (1998) and Engle (2000)).
A recent example taking a multiscale look at high-frequency data is Ghysels et al. (2003)
who use the so-called ”mixed data sampling” regressions to compare the predictive per-
formance of various models of volatility at several different ”frequencies”.
Wavelet methodology is by its nature a multiscale approach: by using several different
combinations of dilations and translations one is able to capture the information hidden
in several different time-scales. In a selective review article, Ramsey (1996) describes
the time-scale decomposition based on wavelets as their ”most promising” opportunity
in economics. In particular, Ramsey regards the investigation of potentially time varying
phase relationships (causalities) between variables of interest as uniquely well suited
for wavelet decompositions. Such time varying relationships are studied in, for example,
Ramsey and Lampart (1998a) and Ramsey and Lampart (1998b). In the former, the
causality between money and income is found to be ambiguous: the direction of the
relationship depends on the level of the time-scale. Extending this to European countries
entering the European Monetary Union, Chew (2001) found that the lower frequency
components of volatility of velocity are primarily affected. Similarly, Atkins and Sun (2003)
found that the Fisher effect, i.e. the empirical relationship between nominal interest rates
and inflation, cannot be identified at short time-scales but only at the largest time-scale.
In stock and foreign exchange rate markets the success of time-scale decomposition
seems to be less persuasive, though. Ramsey (2002) argues that this might be so because
there is a much greater degree of "mixing" inherent in these data, originating from rapid and
extensive arbitrage activities. Maybe so, but this does not necessarily imply a lost case.
For example, Gencay and co-authors report several interesting applications of wavelets in
finance (see Gencay et al. (2002a)). In particular, Gencay et al. (2002c) study the robust-
ness of systematic risk (beta of an asset) across time-scales using wavelets and conclude
that the beta varies as a function of time-scale. In a financial risk management related
paper, Gencay et al. (2002b) use hidden Markov models in combination with wavelets
to study the asymmetry of information flow between volatilities across time-scales. The
lesson to be learned is that time-scale decompositions can provide very useful information
of the underlying complex dynamics of financial markets. At the bare minimum, one will
obtain a first-hand qualitative description that can be utilized in risk management, for
instance.
In the next subsection the concept of multiresolution analysis is first discussed in a fairly
formal way. Theorems and proofs are avoided to make the exposition more readable. Those
looking for a more rigorous treatment can for example turn to Bachman et al. (2000), who,
quite amusingly in their own Foreword, wonder if their coverage might be best described
as ”wavelets for idiots?” (mine is for econometricians?).
4.2 Multiresolution decomposition
One of the central theoretical questions of wavelets is how to construct an orthonormal
basis different from the Fourier basis. Even the existence of such a basis is not obvious
for a smooth function. To solve this problem one can use a technique called (dyadic)
multiresolution analysis (MRA) originally developed by Yves Meyer and Stephane
Mallat in the 1980s. In short, the construction relies heavily on the fact that MRAs
produce an orthogonal direct sum decomposition of L2(R) (shown in this subsection). It
then follows that an MRA is able to produce a mother wavelet ψ(t) such that the wavelets
ψj,k(t) = 2j/2ψ(2jt− k), where j, k ∈ Z, comprise an orthonormal basis for L2(R) (see the
next two subsections). Indeed, one often sees statements like ”good wavelets are usually
constructed starting from a multiresolution analysis" (Wojtaszczyk (1997, p. 17)).14
Definition 7 Multiresolution analysis. A sequence of closed subspaces {V_j : j ∈ Z} of L2(R) together with a function ϕ ∈ V_0 is called a multiresolution analysis if it satisfies the following conditions (e.g. Bachman et al. (2000, p. 414) and Vidakovic (1999, p. 51)):

(a) ··· ⊂ V_{−1} ⊂ V_0 ⊂ V_1 ⊂ ··· (Increasing);

(b) Closure(∪_{j∈Z} V_j) = L2(R) (Density);

(c) ∩_{j∈Z} V_j = {0} (Separation);

(d) f(t) ∈ V_j if and only if f(2t) ∈ V_{j+1} (Scaling);

(e) There exists a scaling function (of the MRA) ϕ ∈ V_0 whose integer translates span the space V_0, i.e., V_0 = {f ∈ L2(R) | f(t) = Σ_{k∈Z} g_k ϕ(t − k)}, and the set {ϕ(t − k) : k ∈ Z} is an orthonormal basis for V_0 (Orthonormality).
In words, Condition (a) means that the signal to be analyzed at a given resolution contains all the information of the signal at coarser resolutions.15 Condition (b) means that any signal can be approximated with arbitrary precision. Condition (c) means that the zero function is the only object common to all the spaces V_j. Condition (d) shows that there is really only one space, e.g. V_0; all the other spaces are scaled versions (resolutions) of the prototype V_0. And finally, Condition (e) means that the scaling function must be
14 Dyadic MRAs have been powerful in image compression and noise reduction. However, non-dyadic MRAs also exist. There the study has concentrated on generalizing dyadic wavelets to M-band wavelets (Steffen et al. (1993)) and dyadic wavelet analysis to mixed-radix wavelet analysis (Pollock and Lo Cascio (2003, 2004)). The motivation behind non-dyadic constructions is that the dyadic scheme might be too restrictive for empirical time series because there is no a priori reason why economic and financial structures, in particular, should stay within dyadic bands. Nevertheless, in what follows I will only consider a dyadic MRA.

15 One should be very careful with the dilation index j here. The above convention is due to Mallat, but in applications (algorithms) one often prefers Daubechies' convention, where the indexing runs in the opposite direction. For this reason I will follow Mallat's convention in this section but use Daubechies' in the next ones that deal with empirics.
orthonormal to its translates by integers.16
It is assumed that the scaling function ϕ satisfies ∫_{−∞}^{∞} ϕ(t) dt ≠ 0 (made more precise later), which obviously differentiates it from wavelets (see Eq. (1)). By Conditions (d) and (e), the functions ϕ_{1,k}(t) = √2 ϕ(2t − k), k ∈ Z, constitute an orthonormal basis in V_1. Since V_0 ⊂ V_1, the scaling function ϕ belongs to V_1 and can be represented as a linear combination of functions from V_1:

ϕ(t) = Σ_{k∈Z} g_k √2 ϕ(2t − k) (dilation equation),    (3)

where the g_k's are the Fourier coefficients of ϕ in V_1 with respect to the basis {√2 ϕ(2t − n), n ∈ Z}:

g_k = ∫_{−∞}^{+∞} ϕ(t) ϕ_{1,k}(t) dt = ⟨ϕ(t), √2 ϕ(2t − k)⟩.

The dilation equation (Eq. (3)) is also known as the two-scale difference equation, relating two scales to each other.
By assuming that functions satisfying the conditions (a)–(e) of an MRA exist (they do in fact!), it follows that there exists a wavelet that "with its translates by integers and dilates by a factor of two, can encode the difference of information between the signal seen at two successive resolutions" (Hubbard (1998, p. 173)).17 In purely mathematical terms this means that the space W_j, called resolution level j of an MRA, associated with the wavelet is the orthogonal complement W_j = V_j^⊥ of V_j in V_{j+1} (the superscript refers to orthogonality), and it represents the difference between V_j and V_{j+1}:

V_{j+1} = V_j ⊕ W_j,

i.e. every x ∈ V_{j+1} can be uniquely expressed as a sum x = y + z, y ∈ V_j, z ∈ W_j, and V_j ∩ W_j = {0}. By iteration one then finds that

V_{n+1} = V_0 ⊕ (⊕_{j=0}^{n} W_j), n ∈ N,
16 The requirement of orthogonality can be relaxed to obtain a Riesz MRA. Namely, it is sufficient to assume that {h(t − k) : k ∈ Z} be a Riesz basis for V_0, which was Mallat's original assumption (see e.g. Mallat (1998)). There would then exist a function ϕ ∈ V_0 such that {ϕ(t − k) : k ∈ Z} is an orthonormal basis for V_0.

17 These conditions are not independent of each other, however: Conditions (a), (d), and (e) imply (c) (Bachman et al. (2000, pp. 449–50)). Furthermore, Conditions (a) and (e) are not at all obvious.
and the closed subspaces V_0, W_0, W_1, ... are mutually orthogonal by construction. And since (by similar arguments) V_0 = ⊕_{n=1}^{+∞} W_{−n}, it follows that an MRA ((V_n), ϕ) produces an orthogonal direct sum decomposition of L2(R):

L2(R) = ⊕_{n∈Z} W_n,

where W_n = V_n^⊥ in V_{n+1} (Bachman et al. (2000, pp. 417–8)).
Concretely, the above means that any finite-energy function can be decomposed in terms of scaling functions and wavelets. Specifically, the so-called inhomogeneous wavelet expansion is

x(t) = Σ_k v_{0,k} ϕ_{0,k}(t) + Σ_{j=0}^{∞} Σ_k w_{j,k} ψ_{j,k}(t),

where the v_{0,k} coefficients summarize the general form of the function and the w_{j,k} coefficients represent the local details. These coefficients, known as the scaling and wavelet coefficients, are calculated as, respectively,

v_{0,k} = ∫_{−∞}^{∞} x(t) ϕ_{0,k}(t) dt and w_{j,k} = ∫_{−∞}^{∞} x(t) ψ_{j,k}(t) dt.

This can also be formulated in terms of wavelets only by the so-called homogeneous wavelet expansion

x(t) = Σ_{j=−∞}^{∞} Σ_k w_{j,k} ψ_{j,k}(t),

where the "reference" space V_0 is eliminated.18 (Härdle et al. (1998, p. 28).)
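A discrete analogue of the inhomogeneous expansion, sketched here for illustration with the Haar filters on a toy vector (NumPy assumed): one analysis step splits the signal into coarse scaling coefficients and detail wavelet coefficients, from which the signal is perfectly reconstructed.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 6.0, 4.0, 2.0])

# One-level discrete Haar analysis: normalized pairwise averages give the
# scaling coefficients v ("general form"), normalized pairwise differences
# give the wavelet coefficients w ("local details").
v = (x[0::2] + x[1::2]) / np.sqrt(2)
w = (x[0::2] - x[1::2]) / np.sqrt(2)

# Synthesis: the expansion reconstructs x exactly from (v, w).
x_rec = np.empty_like(x)
x_rec[0::2] = (v + w) / np.sqrt(2)
x_rec[1::2] = (v - w) / np.sqrt(2)

assert np.allclose(x_rec, x)                                   # perfect reconstruction
assert np.isclose(np.sum(x**2), np.sum(v**2) + np.sum(w**2))   # energy splits across V and W
```

Iterating the same step on v produces the multi-level pyramid used by the DWT algorithms discussed in the empirical sections.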
4.3 Filters and wavelets
Wavelets can be regarded as special kind of ”band-pass” filters. In the present context, the
notion of a filter is defined mathematically as follows (Bachman et al. (2000, pp. 423—4)):
Definition 8 Filter. Let ((Vn), ϕ) be an MRA. Then any g ∈ V1 can be written in terms of the orthonormal basis {√2ϕ(2t − k) : k ∈ Z} for V1 as
\[ g(t) = \sum_{k \in \mathbb{Z}} b_k \sqrt{2}\,\varphi(2t - k). \]
18 One could as well choose Vj0 , j0 ∈ Z, as the reference space. One then obtains the same expansion formulas with 0 replaced by j0. (Hardle et al. (1998, p. 28).)
Since $\sum_{k \in \mathbb{Z}} |b_k|^2 < \infty$, one can form the 2π-periodic function
\[ m_g(\omega) = \frac{1}{\sqrt{2}} \sum_{k \in \mathbb{Z}} b_k e^{-ik\omega} \in L^2(\mathbb{T}), \]
where T denotes the circle group, i.e., the multiplicative group consisting of the points of the unit circle T = {z ∈ C : |z| = 1} in the complex plane. The function mg is called the filter associated with g.
The function mg is better known as the transfer function or the frequency response function (FRF). In general, the transfer function characterizes a filter. The use of the transfer function mϕ in particular (notice the subscript ϕ) is that it enables one to express the Fourier transform F[ϕ] of ϕ ∈ V1 in terms of mϕ and the Fourier transform of the scaling function itself. Namely, by taking the Fourier transform of the dilation equation (Eq. (3)) one has (Vidakovic (1999, p. 53))
\[ F[\varphi](\omega) = \sum_k \sqrt{2}\,g_k \int_{-\infty}^{\infty} \varphi(2t-k)e^{-i\omega t}\,dt = \sum_k \frac{g_k}{\sqrt{2}} e^{-ik\omega/2} \int_{-\infty}^{\infty} \varphi(2t-k)e^{-i(2t-k)\omega/2}\,d(2t-k) = m_\varphi\Big(\frac{\omega}{2}\Big)\, F[\varphi]\Big(\frac{\omega}{2}\Big) \quad \text{(filter equality)}, \tag{4} \]
for ϕ ∈ V1; or equivalently, F[ϕ](2ω) = mϕ(ω)F[ϕ](ω). This implies that different mϕ(ω/2) give different scaling functions.
In Bachman et al. (2000, Th. 7.4.8) it is shown that a similar equality holds for a mother wavelet ψ:
\[ F[\psi](\omega) = e^{-i(\frac{\omega}{2}+\pi)}\, m_\varphi\Big(\frac{\omega}{2}+\pi\Big)\, F[\varphi]\Big(\frac{\omega}{2}\Big). \]
By the inverse Fourier transform one then recovers a mother wavelet ψ ∈ V1 whose integer translates {ψ(t − n) : n ∈ Z} form an orthonormal basis for W0 = V1 ∩ V0⊥, and which satisfies
\[ \psi(t) = \sum_{k \in \mathbb{Z}} h_k \sqrt{2}\,\varphi(2t-k), \tag{5} \]
where the wavelet coefficients are h_k = (−1)^k g_{1−k}, and (g_k) ∈ ℓ²(Z) are the scaling coefficients. It follows that ϕ and ψ have the same regularity properties.
Importantly, the filter (transfer function, FRF)
\[ m_\varphi(\omega) = \frac{1}{\sqrt{2}} \sum_{k \in \mathbb{Z}} g_k e^{-ik\omega} \]
is a low-pass filter associated with the scaling function ϕ, defined as follows:
Definition 9 Low-pass filter. A "system" consists of an output v, an impulse response h, and an input u. If the Fourier transformed impulse response h in a system F[v](ω) = F[h](ω)F[u](ω) is zero for high-frequency components, i.e. if F[h](ω) = 0 for |ω| ≥ ω0, then the system is called a low-pass filter.
A simple discrete example of a filter of finite length (width) 2 is the following:

Example 10 (2-period simple moving average) Gencay et al. (2002a, p. 35) consider a moving average of the form y_t = ½(x_t + x_{t−1}) having the FRF (with ω = 2πf)
\[ \frac{1}{2}\left(1 + e^{-i\omega}\right) = \frac{1}{2}\left(e^{-i\pi f} + e^{i\pi f}\right)e^{-i\pi f} = \left[\cos(\pi f)\right] e^{-i\pi f}. \]
When f = 0 the FRF is 1, and as f approaches 1/2 the FRF approaches 0, implying that this is a low-pass filter.
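This low-pass behaviour is easy to check numerically. The following sketch (plain Python; the helper names are my own) evaluates the FRF directly and compares it with the closed form derived above:

```python
import cmath
import math

def frf_ma2(f):
    # FRF of y_t = (x_t + x_{t-1})/2 at frequency f, with omega = 2*pi*f
    return 0.5 * (1 + cmath.exp(-2j * math.pi * f))

def frf_closed(f):
    # the closed form above: cos(pi f) * e^{-i pi f}
    return math.cos(math.pi * f) * cmath.exp(-1j * math.pi * f)

# zero frequency passes with unit gain; the Nyquist frequency is annihilated
assert abs(abs(frf_ma2(0.0)) - 1.0) < 1e-12
assert abs(frf_ma2(0.5)) < 1e-12
# the two expressions agree at an arbitrary interior frequency
assert abs(frf_ma2(0.3) - frf_closed(0.3)) < 1e-12
```

The endpoint checks confirm the low-pass interpretation: unit gain at f = 0 and (numerically) zero gain at f = 1/2.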
In practice, this means that convolving the scaling function with data gives the empirical scaling coefficients, which are an approximation of the original series with the high-frequency detail filtered out. Notice that when the filter is of finite length, the FRF is often defined a bit differently (as in Percival and Walden (2000)). I adapt to this by using a parallel notation and define
\[ G(f) \doteq \sum_{l=0}^{L-1} g_l e^{-il2\pi f} \]
for the transfer function of a scaling filter g_l of length L. Notice in particular that the scaling factor has changed by √2. It can now be shown that such a filter must satisfy:
\[ \sum_{l=0}^{L-1} g_l = \pm\sqrt{2}, \tag{6} \]
\[ \sum_{l=0}^{L-1} g_l^2 = 1, \qquad \text{and} \qquad \sum_{l=0}^{L-1} g_l g_{l+2n} = 0, \tag{7} \]
for all non-zero integers n (Percival and Walden (2000, p. 76)).
Similarly to the above:
Definition 11 High-pass filter. If F [h](ω) = 0 for |ω| ≤ ω0, then the system is called
a high-pass filter.
Example 12 (2-period moving difference) Gencay et al. (2002a, p. 38) consider a moving difference y_t = ½(x_t − x_{t−1}) having the FRF
\[ \frac{1}{2}\left(1 - e^{-i\omega}\right) = \frac{1}{2}\left(e^{i\pi f} - e^{-i\pi f}\right)e^{-i\pi f} = \sin(\pi f)\, i e^{-i\pi f}. \]
When f = 0 the FRF is 0, and as f approaches 1/2 the magnitude of the FRF approaches 1, implying that this is a high-pass filter.
Definition 13 Band-pass filter. If F[h](ω) = 0 outside a band, i.e. F[h](ω) = 0 for |ω| ≤ ω1 and for |ω| ≥ ω0 (with ω1 < ω0), then the system is called a band-pass filter.
The wavelet filter acts as a high-pass (or band-pass) filter. By analogy, therefore, the FRF H(f) for h_l is defined as
\[ H(f) \doteq \sum_{l=0}^{L-1} h_l e^{-i2\pi f l}. \]
In practice, convolving a wavelet filter with data gives the empirical wavelet (sometimes
called differencing) coefficients, i.e. the details of the original series with low-frequencies
filtered out. It is important to notice that the above two filter types are interconnected
in a simple, rather remarkable way. By what are known as the quadrature mirror
relationships (QMRs)19,
\[ g_l \doteq (-1)^{l+1} h_{L-1-l} \qquad \text{or} \qquad h_l \doteq (-1)^{l} g_{L-1-l}, \tag{8} \]
a wavelet filter h_l of length L must satisfy (cf. Eqs. (1) and (2)):
\[ \sum_{l=0}^{L-1} h_l = 0, \tag{9} \]
19 Mathematically speaking, the QMRs (Eq. (8)) and h_k = (−1)^k g_{1−k}, implying g_k = (−1)^{k−1} h_{1−k} (used above), are equivalent (Percival and Walden (2000, p. 79)). The latter is useful for filters of infinite width since the formula does not include the filter width L. I prefer to use the former, however, because in applications filters are of finite width.
\[ \sum_{l=0}^{L-1} h_l^2 = 1, \qquad \text{and} \qquad \sum_{l=0}^{L-1} h_l h_{l+2n} = 0 \quad \text{(orthonormality)}, \tag{10} \]
for all non-zero integers n (Percival and Walden (2000, p. 69)).
Example 14 (Haar filter coefficients) The Haar scaling and wavelet filter coefficients are, respectively,
\[ g_0^{\mathrm{Haar}} = 1/\sqrt{2}, \quad g_1^{\mathrm{Haar}} = 1/\sqrt{2}, \quad h_0^{\mathrm{Haar}} = 1/\sqrt{2}, \quad \text{and} \quad h_1^{\mathrm{Haar}} = -1/\sqrt{2}. \]
The QMRs (Eq. (8)) and the three conditions for both filters (Eqs. (6), (7), (9) and (10)) are easily checked to hold.
Example 15 (Daubechies 4 filter coefficients) The Haar filter belongs to the family
of Daubechies filters (see Sec. 4.5) for which there are, in general, no explicit time-domain
formulae. However, in the case of ”Daubechies 4”, the unique solution of scaling filter
coefficients is (see e.g. Press et al. (1992, pp. 585—6)):
\[ g_0^{\mathrm{Daub}} = \frac{1+\sqrt{3}}{4\sqrt{2}}, \quad g_1^{\mathrm{Daub}} = \frac{3+\sqrt{3}}{4\sqrt{2}}, \quad g_2^{\mathrm{Daub}} = \frac{3-\sqrt{3}}{4\sqrt{2}}, \quad \text{and} \quad g_3^{\mathrm{Daub}} = \frac{1-\sqrt{3}}{4\sqrt{2}}. \]
The QMRs can then be used to find the wavelet filter coefficients, for instance.
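The three filter conditions and the QMRs can be verified numerically for both examples. The sketch below (plain Python; the function names are my own) derives the wavelet filter from the scaling filter via the QMR of Eq. (8) and checks Eqs. (6), (7), (9) and (10):

```python
import math

sqrt2, sqrt3 = math.sqrt(2.0), math.sqrt(3.0)

def wavelet_from_scaling(g):
    # QMR (Eq. (8)): h_l = (-1)^l g_{L-1-l}
    L = len(g)
    return [(-1) ** l * g[L - 1 - l] for l in range(L)]

def check_filters(g):
    L, h = len(g), wavelet_from_scaling(g)
    assert abs(abs(sum(g)) - sqrt2) < 1e-12          # Eq. (6)
    assert abs(sum(x * x for x in g) - 1.0) < 1e-12  # Eq. (7), unit energy
    assert abs(sum(h)) < 1e-12                       # Eq. (9), zero sum
    assert abs(sum(x * x for x in h) - 1.0) < 1e-12  # Eq. (10), unit energy
    for n in range(1, L // 2):                       # even-shift orthogonality
        assert abs(sum(g[l] * g[l + 2 * n] for l in range(L - 2 * n))) < 1e-12
        assert abs(sum(h[l] * h[l + 2 * n] for l in range(L - 2 * n))) < 1e-12
    return h

# Haar (Example 14): the QMR reproduces h = (1/sqrt(2), -1/sqrt(2))
assert wavelet_from_scaling([1 / sqrt2, 1 / sqrt2]) == [1 / sqrt2, -1 / sqrt2]

# Daubechies 4 (Example 15)
g4 = [(1 + sqrt3) / (4 * sqrt2), (3 + sqrt3) / (4 * sqrt2),
      (3 - sqrt3) / (4 * sqrt2), (1 - sqrt3) / (4 * sqrt2)]
check_filters(g4)
```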
The squared gain function (SGF) associated with the wavelet filters is defined as $\mathcal{H}(f) \doteq |H(f)|^2$, where the gain function |H(f)| represents the magnitude of the FRF. The orthonormality property of wavelet filters in the time domain (Eq. (10)) can then be stated in terms of SGFs. By defining $\mathcal{G}(f) \doteq |G(f)|^2$, the analogous formula for scaling filters (Eq. (7)) becomes (Percival and Walden (2000, p. 69))
\[ \mathcal{G}(f) + \mathcal{G}(f + 1/2) = 2 \quad \text{for all } f. \tag{11} \]
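Equation (11) can be checked numerically for any orthonormal scaling filter. The following sketch (plain Python; helper names mine) does so on a few frequencies for the Haar and Daubechies 4 scaling filters of Examples 14 and 15:

```python
import cmath
import math

def sgf(coeffs, f):
    # squared gain function |G(f)|^2 of G(f) = sum_l g_l exp(-i 2 pi f l)
    G = sum(c * cmath.exp(-2j * math.pi * f * l) for l, c in enumerate(coeffs))
    return abs(G) ** 2

sqrt2, sqrt3 = math.sqrt(2.0), math.sqrt(3.0)
haar = [1 / sqrt2, 1 / sqrt2]
d4 = [(1 + sqrt3) / (4 * sqrt2), (3 + sqrt3) / (4 * sqrt2),
      (3 - sqrt3) / (4 * sqrt2), (1 - sqrt3) / (4 * sqrt2)]

for g in (haar, d4):
    for f in (0.0, 0.1, 0.25, 0.4):
        # Eq. (11): the squared gains at f and f + 1/2 sum to 2
        assert abs(sgf(g, f) + sgf(g, f + 0.5) - 2.0) < 1e-12
```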
The FRF (of wavelet filters) can also be expressed in polar notation as H(f) =
|H(f)| eiθ(f), where the phase function θ(f) represents the phase of the FRF (the same
is true for scaling filters, of course). It is worthwhile to notice that neither the gain function
nor the SGF carry any phase information but that the gain function is handy in visualizing
the frequency properties of a filter. For example, when plotted as a function of frequency,
the gain function of an ideal band-pass filter would have well defined cut-off frequencies
(cf. Def. 13). In practice, however, an ideal filter is computationally not realizable as
it would require infinitely many coefficients, and one must settle for an approximation.
(Gencay et al. (2002a, Sec. 2.3.1).)
The case θ(f) = 0 for all f is special because then H(f) = |H(f)|; such a filter is called a zero phase filter. Zero phase filters are important in practice because they do not shift the location of a discontinuity in the original series. The filter in Example 10 is not a zero phase filter because its phase is −πf. But a centered moving average is a zero phase filter (see Gencay et al. (2002a, pp. 35–36)). More generally, θ(f) = 2πfν corresponds to the case of a linear phase filter, H(f) = |H(f)|e^{i2πfν}. Linear phase filters are such that advancing the filter by ν units advances the output by ν units. For example, if ν = 2, then the events in the output occur two units in advance of the original output. This would correspond to using a filter whose coefficients are advanced circularly by two units. The above concepts are needed to understand the rationale for using the so-called Daubechies least asymmetric filters later (in Sec. 4.5). (Percival and Walden (2000, pp. 110–1).)
4.4 Compactly supported wavelets
The construction of a compactly supported scaling function and wavelets is now sketched because this class is often preferred in applications. In the construction I will use the fact that the associated filter mϕ is a trigonometric polynomial,
\[ m_\varphi(\omega) = \sum_{k=-M}^{M} \left(g_k/\sqrt{2}\right) e^{-ik\omega}. \]
Details can be found in Bachman et al. (2000, Ch. 7.5), for example.
First, by iterating the filter equality (Eq. (4)), one has
\[ F[\varphi](\omega) = m_\varphi\Big(\frac{\omega}{2}\Big)\, F[\varphi]\Big(\frac{\omega}{2}\Big) = m_\varphi\Big(\frac{\omega}{2}\Big)\, m_\varphi\Big(\frac{\omega}{4}\Big)\, F[\varphi]\Big(\frac{\omega}{4}\Big) = \cdots = \Bigg( \prod_{j=1}^{n} m_\varphi\Big(\frac{\omega}{2^j}\Big) \Bigg) F[\varphi]\Big(\frac{\omega}{2^n}\Big), \qquad n \in \mathbb{N}. \]
The uniform convergence and continuity of F[ϕ](ω) can be proved, as well as the fact that F[ϕ] ∈ L²(R). From the continuity it follows that lim_n F[ϕ](ω/2^n) = F[ϕ](0) = 1. This means that in the limit one has infinitely many multiplications,
\[ F[\varphi](\omega) = \prod_{j \in \mathbb{N}} m_\varphi\Big(\frac{\omega}{2^j}\Big) \quad \text{(convolution cascade)}, \tag{12} \]
which can be interpreted as corresponding ”in ’physical’ space to a cascade of convolutions
of the low-pass filter with itself at different scales” (Hubbard (1998, p. 177)). In practice
the product converges very fast to the limit, so one routinely needs only as few as six
terms.
Example 16 (Convolution cascade for Haar) The convolution cascade in the Haar filter case is
\[ F[\varphi]^{\mathrm{Haar}}(\omega) = \prod_{j \in \mathbb{N}} m_\varphi^{\mathrm{Haar}}\Big(\frac{\omega}{2^j}\Big) = e^{-i\omega/2}\, \frac{\sin(\omega/2)}{\omega/2} \]
(Woljtaszczyk (1997, p. 98)).
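The fast convergence of the cascade can be illustrated numerically in the Haar case. A sketch (plain Python; the truncation depth n is a free parameter of my own choosing) compares a truncated product with the closed-form limit of Example 16:

```python
import cmath
import math

def m_haar(w):
    # Haar low-pass filter m_phi(omega) = (1 + e^{-i omega}) / 2
    return 0.5 * (1 + cmath.exp(-1j * w))

def cascade(w, n):
    # truncated convolution cascade: prod_{j=1}^{n} m_phi(omega / 2^j)
    prod = complex(1.0)
    for j in range(1, n + 1):
        prod *= m_haar(w / 2 ** j)
    return prod

w = 3.0
limit = cmath.exp(-1j * w / 2) * math.sin(w / 2) / (w / 2)  # Example 16
assert abs(cascade(w, 30) - limit) < 1e-8   # essentially converged
assert abs(cascade(w, 6) - limit) < 0.02    # six factors are already close
```

This agrees with the remark above that the product converges very fast, so that only about six terms are routinely needed.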
At this point some of the main properties of the filter mϕ are worth summarizing (see
Bachman et al. (2000, pp. 436—7) for more details):
Summary 17 Properties of the filter mϕ. The function mϕ is a trigonometric polynomial if ϕ has compact support and F[ϕ](0) ≠ 0. Then it is also true that F[ϕ](ω) is continuous and mϕ satisfies
(a) mϕ is continuous and 2π-periodic;
(b) |mϕ(ω)|² + |mϕ(ω + π)|² = 1 for all ω ∈ R (scaling identity);
(c) mϕ(0) = 1.
Condition (b) corresponds to a condition for the low-pass filter of a pair of complementary twin filters (with just a different scaling factor, see Eq. (11)). In general it is not clear that one can find such a polynomial mϕ, but for the Haar system it is relatively easy to check:
Example 18 (Scaling identity for Haar) Hubbard (1998, pp. 174–6) happens to define the Fourier transform a bit differently, by replacing e^{−ikω} by e^{ikω}. This, and ω = 2πf, then implies that m_ϕ^Haar(f) = ½(1 + e^{2πif}). From basic trigonometry one knows that cos ω = ½(e^{iω} + e^{−iω}), and so
\[ m_\varphi^{\mathrm{Haar}}(f) = e^{\pi i f} \cos \pi f. \]
Because cos(ω + π/2) = −sin ω, and because e^{\pi i/2} = i, it is easy to confirm that
\[ m_\varphi^{\mathrm{Haar}}\Big(f + \frac{1}{2}\Big) = e^{\pi i f} \left(-i \sin \pi f\right). \]
Thus
\[ \left| e^{\pi i f} \cos \pi f \right|^2 + \left| e^{\pi i f} \left(-i \sin \pi f\right) \right|^2 = 1, \]
which is Equation (11) (up to the scaling factor), or equivalently, the scaling identity (i.e. Property (b)).
The convolution cascade (Eq. (12)) satisfies $F[\varphi](\omega/2) = \prod_{j=1}^{\infty} m_\varphi(\omega/2^{j+1})$. So
\[ F[\varphi](\omega) = m_\varphi\Big(\frac{\omega}{2}\Big) \prod_{j \in \mathbb{N}} m_\varphi\Big(\frac{\omega}{2^{j+1}}\Big) = m_\varphi\Big(\frac{\omega}{2}\Big)\, F[\varphi]\Big(\frac{\omega}{2}\Big) = \Bigg( \sum_{k=-n}^{n} \frac{g_k}{\sqrt{2}} e^{-ik\omega/2} \Bigg) F[\varphi]\Big(\frac{\omega}{2}\Big). \]
Because F[ϕ](ω) ∈ L²(R), the inverse Fourier transform F⁻¹[ϕ] is well defined. It turns out to be
\[ \varphi(t) = \sum_{k=-n}^{n} g_k \sqrt{2}\, \varphi(2t - k), \]
which indeed has compact support (Bachman et al. (2000, Ex. 7.5.1)). By Equation (5), then, one gets the mother wavelet
\[ \psi(t) = \sum_{k=-n}^{n} h_k \sqrt{2}\, \varphi(2t - k), \]
which, as a linear combination of compactly supported scaling functions, is also compactly supported.20
Notice that the filter mϕ being a trigonometric polynomial is equivalent to the dilation equation $\varphi(t) = \sum_{k\in\mathbb{Z}} g_k \sqrt{2}\,\varphi(2t-k)$ having only a finite number of non-zero coefficients g_k. This is indeed the case for the Haar and, more generally, for the scaling functions
corresponding to the Daubechies wavelets (Hubbard (1998, p. 177)). As Woljtaszczyk
(1997, p. 75) puts it: ”It is a surprising fact that [compactly supported wavelets], other
than Haar wavelet, exist and moreover can be chosen arbitrary smooth.”
20 To be careful, there do exist trigonometric polynomials satisfying Properties (a)–(c) of Summary 17 for a compactly supported scaling function ϕ but for which {ϕ(t − n) : n ∈ Z} is not orthonormal. It actually takes a bit more than a trigonometric polynomial to generate a compactly supported wavelet. To guarantee the orthonormality of {ϕ(t − n) : n ∈ Z}, one must have mϕ(ω) ≠ 0 for ω ∈ [−π/2, π/2] (see Bachman et al. (2000, pp. 443–4)). When one has an orthonormal family {ϕ(t − n) : n ∈ Z}, then one is able to construct an MRA for which ϕ is the scaling function.
4.5 Daubechies wavelets
Although Daubechies wavelets are not the only compactly supported wavelets (for coiflets,
see e.g. Hardle et al. (1998, Ch. 7.2)), these filters are very practical because they yield a
DWT that can be described in terms of generalized differences of weighted averages. This
in turn implies that the Daubechies wavelet filters are capable of producing stationary
wavelet coefficient vectors from ”higher degree” non-stationary stochastic processes. In
fact, these particular filters have L/2 embedded differencing operations. Such a property
is handy in many applications.21 Several studies have also confirmed that ”long-memory”
processes such as fractional Brownian motion (Flandrin (1992), Tewfik and Kim (1992),
and Dijkerman and Mazumdar (1994)), autoregressive fractionally integrated moving av-
erage (ARFIMA) processes (Jensen (1998, 2000)) and fractionally differenced (FD) pro-
cesses (McCoy and Walden (1996) and Vannucci and Corradi (1999)) can be decorrelated
up to a certain degree, both within and between scales. Craigmile and Percival (2002),
in particular, demonstrate that for a wide class of stochastic processes the covariance of
between-scale wavelet coefficients decreases to zero as the width L of the wavelet filter increases. Unfortunately, as documented in Percival and Walden (2000), increasing L can
lead to an increase in the covariance of within-scale wavelet coefficients. This dilemma
can be solved by modeling the remaining within-scale covariance using an autoregressive
process of order p. Useful statistical asymptotic theory of wavelet coefficients could then
be achieved by letting L→∞ and p→∞ (see Craigmile et al. (2000)).
A concrete problem with the Daubechies class of filters is that there are in general no
explicit time-domain formulae for them. Originally the Daubechies scaling and wavelet
filters were obtained by specifying certain vanishing moment conditions on a wavelet func-
tion that is entirely determined by the associated scaling filter. More precisely, Daubechies
wanted to find the exact form of trigonometric polynomials mϕ(ω) which produce scaling
and wavelet functions with compact supports such that the moments of the scaling and
wavelet function of order from 1 to n vanish. This would guarantee good approximation
21Coiflets have remarkably good phase properties (i.e. they provide a very good approximation to zero
phase filters). The problem with coiflets is that they can introduce artifacts into an MRA and they have
only L/3 embedded differencing operations. (Percival and Walden (2000, Ch. 4.9).)
properties of the corresponding wavelet expansions.22 A sketch of Daubechies’ (1988) con-
struction can be found in Hardle et al. (1998, Ch. 7.1), from which the following definition
is also adapted:
Definition 19 Daubechies wavelets (Hardle et al. (1998, p. 61)). Wavelets constructed with the use of functions mϕ(ω) satisfying
\[ |m_\varphi(\omega)|^2 = c_N \int_{\omega}^{\pi} \sin^{2N-1} t \, dt, \tag{13} \]
where the constant c_N is chosen so that mϕ(0) = 1, are called the Daubechies wavelets.
Example 20 (Daubechies 2 coincides with Haar) By setting c_N = ½ and N = 1 in Equation (13), one gets
\[ |m_\varphi(\omega)|^2 = \frac{1}{2} \int_{\omega}^{\pi} \sin t \, dt = \frac{1 + \cos\omega}{2}. \]
On the other hand, by choosing m_ϕ^Haar(ω) = ½(1 + e^{−iω}), one has
\[ |m_\varphi(\omega)|^2 = m_\varphi(\omega)\, m_\varphi(-\omega) = \frac{1 + \cos\omega}{2}, \]
so that the Daubechies 2 coincides with the Haar. (Hardle et al. (1998, p. 61).)
For such functions mϕ(ω) one can tabulate the scaling filter coefficients g_l (see e.g. Percival and Walden (2000, Ch. 4.8)). In discrete terms, the definition of Daubechies wavelets can be stated via the SGF of the associated Daubechies scaling filters g_l (Gencay et al. (2002a, p. 112)):
\[ \mathcal{G}^{\mathrm{Daub}}(f) = 2\cos^{L}(\pi f) \sum_{l=0}^{L/2-1} \binom{L/2-1+l}{l} \sin^{2l}(\pi f), \]
where L is a positive even integer. Notice that by setting L = 2, one has $\mathcal{G}^{\mathrm{Haar}}(f) = 2\cos^2(\pi f)$, from which one gets the Haar scaling filter coefficients by inversion. Thus the Haar is again seen to belong to the Daubechies family.
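The closed-form SGF can be cross-checked against the explicit coefficients of Examples 14 and 15. The sketch below (plain Python, requiring Python 3.8+ for `math.comb`; helper names are mine) confirms both the Haar case L = 2 and the D(4) case L = 4:

```python
import cmath
import math

def sgf_daub(L, f):
    # closed-form SGF of the Daubechies scaling filter of even length L
    s = sum(math.comb(L // 2 - 1 + l, l) * math.sin(math.pi * f) ** (2 * l)
            for l in range(L // 2))
    return 2.0 * math.cos(math.pi * f) ** L * s

def sgf_from_coeffs(g, f):
    # SGF computed directly from the filter coefficients
    G = sum(c * cmath.exp(-2j * math.pi * f * l) for l, c in enumerate(g))
    return abs(G) ** 2

sqrt2, sqrt3 = math.sqrt(2.0), math.sqrt(3.0)
g4 = [(1 + sqrt3) / (4 * sqrt2), (3 + sqrt3) / (4 * sqrt2),
      (3 - sqrt3) / (4 * sqrt2), (1 - sqrt3) / (4 * sqrt2)]

for f in (0.0, 0.1, 0.3, 0.45):
    # L = 2 reproduces the Haar SGF 2 cos^2(pi f)
    assert abs(sgf_daub(2, f) - 2 * math.cos(math.pi * f) ** 2) < 1e-12
    # L = 4 matches the D(4) coefficients of Example 15
    assert abs(sgf_daub(4, f) - sgf_from_coeffs(g4, f)) < 1e-10
```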
22Daubechies’ wavelets have vanishing moments for wavelet functions, but not for scaling functions.
Coiflets have vanishing moments also for scaling functions. (Hardle et al. (1998, Ch. 7.2).)
The problem with this approach is that the SGF does not uniquely characterize a sequence of Daubechies wavelet filters. This is because the phase information is lost when taking the modulus of the FRF (see Sec. 4.3). In fact, given $\mathcal{G}^{\mathrm{Daub}}(f)$, one can obtain all possible g_l by a procedure known as spectral factorization (see Percival and Walden (2000)). The factorization that Daubechies originally used corresponds to an extremal phase choice for the transfer function and produces what is known as a minimum delay filter in the engineering literature. These filters are henceforth referred to as D(L) filters (see Fig. 5, where "N" refers to the family-specific number of the filter within the wavelet family). As mentioned on several occasions now, only the D(4) and D(2) (Haar) wavelets have simple expressions.23 As a rule of thumb, the number of vanishing moments for the Daubechies wavelets is half the filter length (i.e. L/2).
Another factorization leads to the least asymmetric family of scaling filters, which are henceforth referred to as the LA(L) filters (Hardle et al. (1998) call them symmlets). These filters have a phase function with the smallest maximum deviation in frequency from the best fitting linear phase function. Put differently, the phase of mϕ(ω) is minimal among all the mϕ(ω) with the same value of |mϕ(ω)|. This means that the degree of asymmetry of a filter is measured by the deviation of its phase from linearity. The LA(L) filters try to be as close as possible to symmetry without losing compactness (see Fig. 6).24 Shann and Yen (1999) provide exact values for both the D(L) and LA(L) filters of length L = {8, 10}. Notice that a greater degree of symmetry does not mean increased regularity, even in the case of the same gain function (Percival and Walden (2000, p. 494)). In fact, although the LA(L) filters are more symmetric than the D(L) filters, the "Holder regularity" of the LA(L) scaling and wavelet functions is lower than that of the D(L) scaling and wavelet functions (see Rioul (1992)).
23 A mathematical curiosity of the D(4) filter is that although its scaling and wavelet function are continuous, they do not have continuous derivatives (Percival and Walden (2000, pp. 386–7)).
24 Recall that only in the Haar system can both the scaling and wavelet function be at the same time compactly supported and symmetric.
[Figure 5 here: four panels plotting the wavelet functions ψ(x).]
Figure 5: Daubechies extremal phase compactly supported wavelets of different lengths L: (a) D(L = 2), (b) D(L = 4), (c) D(L = 6), and (d) D(L = 8).
[Figure 6 here: four panels plotting the wavelet functions ψ(x).]
Figure 6: Daubechies least asymmetric compactly supported wavelets of different lengths L: (a) LA(L = 8), (b) LA(L = 10), (c) LA(L = 12), and (d) LA(L = 14).
5 Decomposing time series
This section aims to show how multiresolution analysis is done in a time series context. Notice that in this section the function to be analyzed is not deterministic but rather a realization of a stochastic process. Special attention is paid to the energy preservation property that allows the decomposition of the total variance into time-scale specific wavelet variances. The most comprehensive coverage of wavelet analysis in a statistical time series context at the moment is Percival and Walden (2000). A nice complementary review article is Nason and von Sachs (1999).
5.1 Practical issues
In order to come up with a useful wavelet analysis of a time series, one must begin by
taking into account several things. The most important ones are (see Percival and Walden
(2000, Ch. 4.11)):
1. the choice of a wavelet filter;
2. handling boundary conditions;
3. sample sizes that are not a power of two.
Choice of a wavelet filter. The problem with the smallest width wavelet filters is that they can sometimes introduce undesirable artifacts into the resulting analysis, such as unrealistic blocks, "sharks' fins", etc. Wider wavelet filters can better match the characteristic features of a time series. Unfortunately, however, as the width grows, (i) more coefficients are unduly influenced by boundary conditions, (ii) there is some decrease in the degree of localization of the DWT coefficients, and (iii) there is an increase in the computational burden. Thus one should search for the smallest L that gives reasonable results. In practice, if one also wants the DWT coefficients to be alignable in time, the optimal choice is often LA(8).
Handling boundary conditions. The DWT uses circular filtering which means that the
time series is treated as a portion of a periodic sequence with period N . For financial
time series this is problematic since there is rarely evidence to support this assumption.
Furthermore, there may be a large discontinuity between the last and first observations. The extent to which circularity influences the DWT coefficients and the corresponding MRA is quantified in Percival and Walden (2000, pp. 145–9). Notice that the Haar wavelet yields coefficients that are free of the circularity assumption.
As a minimal way of dealing with the circularity assumption, Percival and Walden suggest indicating exactly on plots which DWT coefficients and MRAs are affected by the boundary. They argue, however, that the influence of circularity can be quite small, particularly when the discrepancy between the beginning and the end of the series is not too large. Thus the marked regions are usually quite conservative measures of the influence of circularity. One way of reducing the impact of circularity is to reflect the time series about its end point. The resulting series of length 2N has the same mean and variance as the original series. This method eliminates the effects due to a serious mismatch between the first and last values. The cost is an increased, but "quite acceptable", computational burden.25
Handling sample sizes that are not a power of two. The ordinary DWT requires N to be a power of two, and the "partial" DWT requires N to be an integer multiple of 2^{J0}. In reality, however, it rarely happens that the data at hand are of dyadic length or even an integer multiple of it. There are some ad hoc methods for dealing with this problem. The most obvious one is to truncate the series to the closest integer multiple of 2^{J0}, but then one needs to consider what choice of J0 is reasonable. An easy alternative, familiar from Fourier analysis, is to "pad" the series with zeros or the sample mean. Note that padding with the sample mean does not change the sample mean of the original series.
One could also pad by replicating a data value (typically the last one). Ogden (1997) has compared various ways of preconditioning data that do not meet the power-of-two criterion. He found that no method is clearly superior in every respect; rather, the choice is dictated by the particular application of interest. As one might suspect, though, extending the data by padding is the easiest to implement and also the least computationally expensive. Ogden points out, however, that the wavelet coefficients resulting from preconditioned data should be used cautiously. Padding by repeating the last value, for example, introduces a flat artifact towards the end of the interval, causing new problems that remain even with very large samples.
25 There are also other ways to cope with circularity, such as polynomial extrapolation at both ends of the time series and specially designed "boundary wavelets" that are zero outside the range of the data (see e.g. Bruce and Gao (1996)). These are however not considered in this thesis.
In what follows, I will consider only DWTs for two reasons: (i) financial data are
inherently discrete, and (ii) discrete transforms are computationally less demanding than
continuous ones. This is also the standard way in time series applications.
5.2 Discrete wavelet transform
For a thorough description of the DWT, see Percival and Walden (2000, Ch. 4). For a quick and "dirty" treatment, see Gencay et al. (2002a, Ch. 4.4). My presentation parallels the latter because of space limitations. Notice that in practice the DWT is calculated via Mallat's (1989) pyramid algorithm (see e.g. Percival and Walden (2000)). This algorithm has been nicknamed the fast wavelet transform (FWT) because it needs only O(N) (i.e., at most of order N) multiplications instead of the O(N²) multiplications of a brute-force matrix DWT. And quite remarkably, the FWT is even faster than the celebrated FFT, which demands O(N log₂ N) calculations. This computational efficiency makes the FWT a good choice for analyzing very large data sets like the ones used in high-frequency finance.
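To make the pyramid idea concrete, here is a minimal sketch for the Haar filter only (plain Python; the function names are my own, and the sign convention matches the Haar DWT matrix of Example 21). Each pass splits the current scaling coefficients into normalized pairwise differences and averages:

```python
import math

def haar_fwt(x):
    # Mallat-style pyramid algorithm for the Haar DWT (dyadic length input).
    # Each pass costs O(len(v)) operations, so the total cost is O(N).
    v = list(x)
    w = []  # wavelet coefficients, level 1 first
    s = math.sqrt(2.0)
    while len(v) > 1:
        w.append([(v[2 * i] - v[2 * i + 1]) / s for i in range(len(v) // 2)])
        v = [(v[2 * i] + v[2 * i + 1]) / s for i in range(len(v) // 2)]
    return w, v  # details per level and the final scaling coefficient

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
w, v = haar_fwt(x)

# energy preservation: ||x||^2 equals the sum of all squared coefficients
energy = sum(c * c for lvl in w for c in lvl) + v[0] ** 2
assert abs(energy - sum(t * t for t in x)) < 1e-10
# the last scaling coefficient is sqrt(N) times the sample mean
assert abs(v[0] - sum(x) / math.sqrt(len(x))) < 1e-10
```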
Construction. An easy way to introduce the DWT is through a matrix operation.
Consider a dyadic length (i.e., N = 2^J) column vector of observations x. The length-N column vector of discrete wavelet coefficients w is obtained via
\[ \mathbf{w} = \mathcal{W}\mathbf{x}, \]
where W ∈ M(N × N) is an orthonormal matrix defining the DWT (see App. B). The matrix W is composed of the wavelet and scaling filter coefficients arranged on a row-by-row basis. The structure of the resulting w and the matrix W may be seen through the subvectors w1, w2, ..., wJ, vJ and submatrices W1, ..., WJ, VJ, respectively:
\[ \mathbf{w} = \begin{bmatrix} \mathbf{w}_1 \\ \mathbf{w}_2 \\ \vdots \\ \mathbf{w}_J \\ \mathbf{v}_J \end{bmatrix} \qquad \text{and} \qquad \mathcal{W} = \begin{bmatrix} \mathcal{W}_1 \\ \mathcal{W}_2 \\ \vdots \\ \mathcal{W}_J \\ \mathcal{V}_J \end{bmatrix}, \]
where w_j is a length N/2^j column vector of wavelet coefficients associated with changes on a scale of length λ_j = 2^{j−1}, and v_J is a length N/2^J column vector of scaling coefficients associated with averages on a scale of length 2^J = 2λ_J. Similarly, W_j ∈ M(N/2^j × N) and V_J ∈ M(N/2^J × N). As an illustration of the structure of the matrix W, consider a filter of length L = 2 and a signal of length N = 8. Then the matrix W1 ∈ M(4 × 8) is
\[ \mathcal{W}_1 = \begin{bmatrix}
h_1 & h_0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & h_1 & h_0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & h_1 & h_0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & h_1 & h_0
\end{bmatrix} = \begin{bmatrix} \mathbf{h}_1^{(2)} \\ \mathbf{h}_1^{(4)} \\ \mathbf{h}_1^{(6)} \\ \mathbf{h}_1 \end{bmatrix}, \]
where h(k)1 , k ∈ {2, 4, 6}, is the vector of zero-padded unit scale wavelet filter coefficients
in reverse order, circularly shifted to the right by amount k.
Similarly, by letting h2 and h3 denote the vectors of zero-padded scale-2 and scale-4 wavelet filter coefficients, respectively, one can construct the matrix W2 ∈ M(2 × 8) and the row vector W3 ∈ M(1 × 8). In this case, the circular shift is by factors of 4 and 8 (i.e., no change), respectively:
\[ \mathcal{W}_2 = \begin{bmatrix} \mathbf{h}_2^{(4)} \\ \mathbf{h}_2 \end{bmatrix} \qquad \text{and} \qquad \mathcal{W}_3 = \mathbf{h}_3. \]
The matrix V3 ∈ M(1 × 8) is just a row vector whose elements are all equal to 1/√N
(Gencay et al. (2002a, p. 120)).
Of course, one must be able to explicitly compute the wavelet filter coefficients for levels j = 1, ..., J to complete the construction of W. Given the FRFs of the unit scale wavelet and scaling filters, it is possible to recover the wavelet filter h_{j,l} for scale λ_j by the inverse DFT of
\[ H_{j,k} = H_{1,\, 2^{j-1}k \bmod N} \prod_{l=0}^{j-2} G_{1,\, 2^{l}k \bmod N}, \qquad \text{for } k = 0, \dots, N-1. \]
The length of the resulting wavelet filter is L_j = (2^j − 1)(L − 1) + 1. Similarly, one can recover the scaling filter g_J for scale λ_J by the inverse DFT of
\[ G_{J,k} = \prod_{l=0}^{J-1} G_{1,\, 2^{l}k \bmod N}, \qquad \text{for } k = 0, \dots, N-1. \]
(Gencay et al. (2002a, p. 121).)
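For the Haar filter this frequency-domain recursion is easy to verify with a naive O(N²) DFT (plain Python sketch; helper names are mine). At level j = 2 it reproduces the scale-2 Haar wavelet filter of length L₂ = (2² − 1)(2 − 1) + 1 = 4:

```python
import cmath
import math

def dft(v):
    N = len(v)
    return [sum(v[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(V):
    N = len(V)
    return [sum(V[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

N = 8
s = math.sqrt(2.0)
g1 = [1 / s, 1 / s] + [0.0] * (N - 2)    # zero-padded Haar scaling filter
h1 = [1 / s, -1 / s] + [0.0] * (N - 2)   # zero-padded Haar wavelet filter
G1, H1 = dft(g1), dft(h1)

# level j = 2 of the recursion: H_{2,k} = H_{1, 2k mod N} * G_{1, k mod N}
H2 = [H1[(2 * k) % N] * G1[k % N] for k in range(N)]
h2 = [z.real for z in idft(H2)]

expected = [0.5, 0.5, -0.5, -0.5, 0.0, 0.0, 0.0, 0.0]  # length L_2 = 4
assert all(abs(a - b) < 1e-10 for a, b in zip(h2, expected))
```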
Example 21 (Haar) In the case of L = 2 and N = 8, the matrix W is
\[ \mathcal{W} = \begin{bmatrix} \mathcal{W}_1 \\ \mathcal{W}_2 \\ \mathcal{W}_3 \\ \mathcal{V}_3 \end{bmatrix} = \begin{bmatrix}
\tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\
\tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} \\
\tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & -\tfrac{1}{\sqrt{8}} & -\tfrac{1}{\sqrt{8}} & -\tfrac{1}{\sqrt{8}} & -\tfrac{1}{\sqrt{8}} \\
\tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}}
\end{bmatrix}. \]
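The orthonormality of this matrix, on which the invertibility of the transform rests, can be verified row by row (plain Python sketch):

```python
import math

r2, r8 = math.sqrt(2.0), math.sqrt(8.0)
# the Haar DWT matrix for N = 8 from Example 21
W = [
    [1 / r2, -1 / r2, 0, 0, 0, 0, 0, 0],
    [0, 0, 1 / r2, -1 / r2, 0, 0, 0, 0],
    [0, 0, 0, 0, 1 / r2, -1 / r2, 0, 0],
    [0, 0, 0, 0, 0, 0, 1 / r2, -1 / r2],
    [0.5, 0.5, -0.5, -0.5, 0, 0, 0, 0],
    [0, 0, 0, 0, 0.5, 0.5, -0.5, -0.5],
    [1 / r8] * 4 + [-1 / r8] * 4,
    [1 / r8] * 8,
]

# W W^T = I_8, i.e. the rows form an orthonormal basis of R^8
for i in range(8):
    for j in range(8):
        dot = sum(W[i][k] * W[j][k] for k in range(8))
        assert abs(dot - (1.0 if i == j else 0.0)) < 1e-12
```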
Boundary effects. Clearly the number of affected coefficients grows as the level j and the length L grow. More precisely, Percival and Walden (2000, Ch. 4.11) show that the number of affected DWT coefficients is
\[ L_j' = \left\lceil (L - 2)\left(1 - 2^{-j}\right) \right\rceil, \]
where ⌈x⌉ is the smallest integer greater than or equal to x. Thus, for display purposes, it is possible to approximately line up the DWT coefficient vectors with the original time series by circularly shifting the level-j vector of DWT coefficients appropriately. Percival and Walden (2000, Ch. 4.11) give a precise table of integer shifts for the least asymmetric wavelet filter. The following heuristic can be used, however: if the number of coefficients affected by the boundary is even, then place half of the boundary coefficients at each end of the series. If, on the other hand, the number of coefficients affected by the boundary is odd, then place the "extra" coefficient at the beginning of the series. Notice that shifting coefficients from the extremal phase wavelet filter is not as straightforward given its poor phase properties (see Sec. 4.5). (Gencay et al. (2002a, p. 145).)
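The count L′_j is trivial to compute. A small sketch (plain Python; the function name is mine) reproduces the fact that the Haar filter is boundary-free and gives the counts implied by the formula for LA(8), the filter recommended above:

```python
import math

def n_boundary(L, j):
    # number of level-j DWT coefficients affected by the circular boundary
    return math.ceil((L - 2) * (1 - 2.0 ** (-j)))

# Haar (L = 2) coefficients are free of boundary effects at every level
assert all(n_boundary(2, j) == 0 for j in range(1, 8))
# LA(8): 3, 5 and 6 affected coefficients at levels 1, 2 and 3
assert [n_boundary(8, j) for j in (1, 2, 3)] == [3, 5, 6]
```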
Multiresolution analysis. An additive decomposition of a time series can be obtained using the DWT by first defining the jth level wavelet detail
\[ \mathbf{d}_j \doteq \mathcal{W}_j^T \mathbf{w}_j, \qquad \text{for } j = 1, \dots, J, \]
which is associated with changes in x at scale λ_j. The wavelet coefficients w_j = W_j x represent the portion of the wavelet analysis attributable to scale λ_j; thus W_j^T w_j is the portion of the wavelet synthesis attributable to scale λ_j. For a dyadic length N = 2^J time series, each element of the final wavelet detail d_{J+1} = V_J^T v_J equals the sample mean of the observations. Next define the jth level wavelet smooth as
\[ \mathbf{s}_j \doteq \sum_{k=j+1}^{J+1} \mathbf{d}_k, \qquad \text{for } j = 0, \dots, J, \]
where s_{J+1} is defined to be a vector of zeros. In contrast to the wavelet detail d_j, which is associated with variations at a particular scale, the wavelet smooth s_j becomes smoother as more details are summed. Indeed, it holds that $\mathbf{x} - \mathbf{s}_j = \sum_{k=1}^{j} \mathbf{d}_k$. The jth level wavelet rough characterizes the remaining lower-scale details through
\[ \mathbf{r}_j \doteq \sum_{k=1}^{j} \mathbf{d}_k, \qquad \text{for } j = 1, \dots, J+1, \]
where r_0 is defined to be a vector of zeros. A time series x may then be decomposed as
\[ \mathbf{x} = \mathbf{r}_j + \mathbf{s}_j = \sum_{k=1}^{j} \mathbf{d}_k + \sum_{k=j+1}^{J+1} \mathbf{d}_k = \sum_{k=1}^{J+1} \mathbf{d}_k. \]
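Using the N = 8 Haar matrix of Example 21, this additive decomposition can be verified numerically (plain Python sketch; the data vector and helper names are my own):

```python
import math

r2, r8 = math.sqrt(2.0), math.sqrt(8.0)
# rows of the Haar DWT matrix for N = 8 (Example 21), grouped by level
W1 = [[1 / r2, -1 / r2, 0, 0, 0, 0, 0, 0],
      [0, 0, 1 / r2, -1 / r2, 0, 0, 0, 0],
      [0, 0, 0, 0, 1 / r2, -1 / r2, 0, 0],
      [0, 0, 0, 0, 0, 0, 1 / r2, -1 / r2]]
W2 = [[0.5, 0.5, -0.5, -0.5, 0, 0, 0, 0],
      [0, 0, 0, 0, 0.5, 0.5, -0.5, -0.5]]
W3 = [[1 / r8] * 4 + [-1 / r8] * 4]
V3 = [[1 / r8] * 8]

def analyze(M, x):        # w_j = W_j x
    return [sum(r * t for r, t in zip(row, x)) for row in M]

def synthesize(M, w):     # d_j = W_j^T w_j
    return [sum(M[i][t] * w[i] for i in range(len(M))) for t in range(len(M[0]))]

x = [2.0, 5.0, -1.0, 3.0, 0.0, 4.0, 1.0, -2.0]
details = [synthesize(M, analyze(M, x)) for M in (W1, W2, W3, V3)]

# x = d_1 + d_2 + d_3 + d_4, where the last "detail" is the smooth s_3
for t in range(8):
    assert abs(sum(d[t] for d in details) - x[t]) < 1e-12
# every element of V_3^T v_3 equals the sample mean
mean = sum(x) / 8
assert all(abs(d - mean) < 1e-12 for d in details[-1])
```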
Variance decomposition. One of the most important properties of the DWT is its ability to decompose the sample variance of a time series on a scale-by-scale basis. This is possible because the DWT is an energy (variance) preserving transform:
\[ \|\mathbf{w}\|^2 = \mathbf{w}^T \mathbf{w} = (\mathcal{W}\mathbf{x})^T \mathcal{W}\mathbf{x} = \mathbf{x}^T \mathcal{W}^T \mathcal{W} \mathbf{x} = \mathbf{x}^T \mathbf{x} = \|\mathbf{x}\|^2, \]
where W is the orthonormal matrix defining the DWT. In other words,
\[ \|\mathbf{x}\|^2 = \sum_{t=0}^{N-1} x_t^2 = \sum_{j=1}^{J} \sum_{t=0}^{N/2^j - 1} w_{j,t}^2 + v_{J,0}^2 = \|\mathbf{w}\|^2. \]
Given the structure of the wavelet coefficients, ‖x‖² is decomposed on a scale-by-scale basis via
\[ \|\mathbf{x}\|^2 = \sum_{j=1}^{J} \|\mathbf{w}_j\|^2 + \|\mathbf{v}_J\|^2, \]
where ‖w_j‖² is the energy (proportional to variance) of x due to changes at scale λ_j and ‖v_J‖² is the information due to changes at scales λ_J and higher. Because the submatrices W_j and V_J have orthonormal rows, one has d_j^T d_j = w_j^T w_j for 1 ≤ j ≤ J and s_J^T s_J = v_J^T v_J, so alternatively,
\[ \|\mathbf{x}\|^2 = \sum_{j=1}^{J} \|\mathbf{d}_j\|^2 + \|\mathbf{s}_J\|^2. \]
The theoretical counterpart of the variance decomposition in the context of stationary
long-memory processes will be discussed later (in Sec. 5.5).
5.3 Partial discrete wavelet transform
For a closer look at the partial discrete wavelet transform (pDWT) than is given below, see either Percival and Walden (2000, Ch. 4.7) or Gencay et al. (2002a, Ch. 4.4.2). I will be rather brief here because the pDWT is a straightforward generalization of the DWT. The pDWT offers more flexibility due to the choice of a scale beyond which a wavelet analysis into individual large scales is no longer of real interest. A practical benefit of this is that the sample size no longer needs to be of dyadic length. It is enough that the sample size be a multiple of 2^{J0} (the choice of J0 depends on the goals of the analysis, of course).
Construction. The structure of the orthonormal matrix W is similar to that of the DWT:

W = \begin{pmatrix} W_1 \\ W_2 \\ \vdots \\ W_{J_p} \\ V_{J_p} \end{pmatrix},

except that the matrix of scaling filter coefficients V_{J_p} ∈ M(N/2^{J_p} × N) is a matrix of circularly shifted scaling coefficient vectors.
Multiresolution analysis. For a level J_0 < J pDWT, one has the MRA

x = \sum_{j=1}^{J_0} d_j + s_{J_0},

where the details d_j are related to changes on a scale of λ_j = 2^{j-1} and the smooth s_{J_0} to averages on a scale of λ_{J_0} = 2^{J_0} (as above).
Variance decomposition. The energy decomposition is

\|x\|^2 = \sum_{j=1}^{J_0} \|w_j\|^2 + \|v_{J_0}\|^2 = \sum_{j=1}^{J_0} \|d_j\|^2 + \|s_{J_0}\|^2.
5.4 Maximal overlap discrete wavelet transform
For a throughout discussion of the maximal overlap discrete wavelet transform
(MODWT), see Percival and Walden (2000, Ch. 5). Once again, Gencay et al. (2002a, Ch.
4.5) give a compact introduction. I will mostly follow the latter because of the increased
complexity of the transform. Notice that in practice a pyramid algorithm similar to
that of the DWT is utilized (see Percival and Mojfeld (1997)). This algorithm requires
O(N log2N) multiplications, so it is computationally a bit heavier to execute than the
DWT, but still only as heavy as the FFT. The increased complexity stems from the fact
that the MODWT gives up orthogonality in order to gain features that the DWT does
not posses.26
The following properties distinguish the MODWT from the DWT (Percival andWalden
(2000, pp. 159—60)):
1. The MODWT can handle any sample size N , while the Jpth order pDWT restricts
the sample size to a multiple of 2Jp;
2. The detail and smooth coefficients of a MODWT multiresolution analysis are asso-
ciated with zero-phase filters;
3. The MODWT is invariant to circularly shifting the original time series;
4. The MODWTwavelet variance (to be defined in Sec. 5.5) estimator is asymptotically
more efficient than the same estimator based on the DWT.26The MODWT is also known as the ”stationary DWT”, the ”translation-invariant DWT” and the
”time-invariant DWT”.
Construction. Let x be a length-N column vector of observations. The length-(J+1)N column vector of MODWT coefficients \tilde{w} is obtained via

\tilde{w} = \tilde{W} x,

where \tilde{W} ∈ M((J+1)N × N) is a non-orthogonal matrix defining the MODWT. The resulting \tilde{w} and the matrix \tilde{W} consist of column subvectors \tilde{w}_1, ..., \tilde{w}_J, \tilde{v}_J, each of length N, and submatrices \tilde{W}_1, ..., \tilde{W}_J, \tilde{V}_J ∈ M(N × N):

\tilde{w} = \begin{pmatrix} \tilde{w}_1 \\ \tilde{w}_2 \\ \vdots \\ \tilde{w}_J \\ \tilde{v}_J \end{pmatrix} and \tilde{W} = \begin{pmatrix} \tilde{W}_1 \\ \tilde{W}_2 \\ \vdots \\ \tilde{W}_J \\ \tilde{V}_J \end{pmatrix},

where \tilde{w}_j is associated with changes on a scale of length λ_j = 2^{j-1} and \tilde{v}_J is associated with averages on a scale of length 2^J = 2λ_J. For any positive integer J_0, the level J_0 MODWT of x is a transform consisting of the J_0 + 1 vectors \tilde{w}_1, ..., \tilde{w}_{J_0} and \tilde{v}_{J_0}, which are all N-dimensional. The vector \tilde{w}_j contains the MODWT wavelet coefficients associated with changes on scale λ_j, while \tilde{v}_{J_0} contains the MODWT scaling coefficients associated with averages on scale λ_{J_0} = 2^{J_0}. In the special case of a dyadic-length time series, the MODWT may be subsampled and rescaled to obtain the DWT wavelet and scaling coefficients via

w_{j,t} = 2^{j/2} \tilde{w}_{j, 2^j(t+1)-1} and v_{J,t} = 2^{J/2} \tilde{v}_{J, 2^J(t+1)-1}, for t = 0, ..., N/2^j - 1.
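The subsampling relation can be checked in code for the Haar filter at unit level. The following sketch is mine (Python with numpy), and it assumes the Percival-Walden normalization in which the level-1 MODWT filter is the DWT filter divided by the square root of two; it filters circularly to obtain the MODWT coefficients and compares every second one, rescaled by 2^{1/2}, with the DWT coefficients:

```python
import numpy as np

# Level-1 Haar filters: unit-norm DWT filter and its MODWT rescaling
# (Percival-Walden normalization, an assumption of this sketch).
h = np.array([1.0, -1.0]) / np.sqrt(2.0)   # DWT wavelet filter
h_mod = h / np.sqrt(2.0)                    # MODWT wavelet filter

rng = np.random.default_rng(1)
x = rng.standard_normal(8)
N = len(x)

# MODWT: circular filtering, one coefficient per time point
w_mod = np.array([sum(h_mod[l] * x[(t - l) % N] for l in range(2))
                  for t in range(N)])

# Level-1 DWT coefficients by filtering and downsampling
w_dwt = np.array([(x[2*t + 1] - x[2*t]) / np.sqrt(2.0) for t in range(N // 2)])

# Subsampling/rescaling relation: w_{1,t} = 2^{1/2} * w~_{1, 2(t+1)-1}
print(np.allclose(w_dwt, np.sqrt(2.0) * w_mod[1::2]))   # True
```

Note that, unlike the DWT, the MODWT produces one coefficient per time point, which is what makes it shift-invariant.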
In this case the DWT and MODWT filter coefficients are related in the following way: instead of using the wavelet and scaling filters from the previous section, the MODWT uses the rescaled filters

\tilde{h}_j = h_j / 2^{j/2} and \tilde{g}_J = g_J / 2^{J/2}, for j = 1, ..., J.

The submatrix \tilde{W}_1 is constructed by circularly shifting the rescaled wavelet filter vector \tilde{h}_1 by integer units. The submatrices \tilde{W}_2 and \tilde{W}_3 are formed similarly by
replacing \tilde{h}_1 by \tilde{h}_2 and \tilde{h}_3. Thus, revisiting the case of L = 2 and N = 8 of Example 21, one has

\tilde{W}_1 =
\begin{pmatrix}
\tilde{h}_1 & 0 & 0 & 0 & 0 & 0 & 0 & \tilde{h}_2 \\
\tilde{h}_2 & \tilde{h}_1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \tilde{h}_2 & \tilde{h}_1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \tilde{h}_2 & \tilde{h}_1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \tilde{h}_2 & \tilde{h}_1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tilde{h}_2 & \tilde{h}_1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \tilde{h}_2 & \tilde{h}_1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \tilde{h}_2 & \tilde{h}_1
\end{pmatrix}
=
\begin{pmatrix}
\tilde{h}_1^{(1)} \\ \tilde{h}_1^{(2)} \\ \tilde{h}_1^{(3)} \\ \tilde{h}_1^{(4)} \\ \tilde{h}_1^{(5)} \\ \tilde{h}_1^{(6)} \\ \tilde{h}_1^{(7)} \\ \tilde{h}_1
\end{pmatrix}.
Example 22 (Haar) Insert \tilde{h}_1 = -1/\sqrt{8} and \tilde{h}_2 = 1/\sqrt{8} into the matrix \tilde{W}_1 above. The full matrix \tilde{W} ∈ M(32 × 8) is too large to write down here.
Boundary effects. The MODWT uses integer translates of the wavelet and scaling
filters, both of length L_j = (2^j - 1)(L - 1) + 1. This causes a total of L_j wavelet coefficients to be affected by the boundary at each level j. The time-alignment property of an MRA (Property 2) no longer holds for the MODWT wavelet and scaling coefficients without a proper adjustment, however. Precise integer shifts for the least asymmetric low-pass filter g_{j,l} and high-pass filter h_{j,l} are given by

\xi_j^{g} =
\begin{cases}
-\frac{(L_j - 1)(L - 2)}{2(L - 1)} & \text{if } L/2 \text{ is even} \\
-\frac{(L_j - 1)L}{2(L - 1)} & \text{if } L = 10 \text{ or } 18 \\
-\frac{(L_j - 1)(L - 4)}{2(L - 1)} & \text{if } L = 14
\end{cases}

and

\xi_j^{h} =
\begin{cases}
-\frac{L_j}{2} & \text{if } L/2 \text{ is even} \\
-\frac{L_j}{2} + 1 & \text{if } L = 10 \text{ or } 18 \\
-\frac{L_j}{2} - 1 & \text{if } L = 14
\end{cases}

respectively (Gencay et al. (2002a, p. 145)).27
27An alternative definition of shifts for both extremal phase and least asymmetric wavelet filters is provided by Hess-Nielsen and Wickerhauser (1996), but Gencay et al. (2002a, p. 145) argue the differences to be of minor importance.
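The case formulas above translate directly into a small function. The sketch below is mine (illustrative Python; la_shifts is a hypothetical helper name) and simply encodes the cases of Gencay et al. (2002a, p. 145):

```python
def la_shifts(L, j):
    """Zero-phase integer shifts xi_j^g (scaling) and xi_j^h (wavelet) for
    Daubechies least asymmetric filters of width L at level j, following
    the case formulas of Gencay et al. (2002a, p. 145)."""
    Lj = (2**j - 1) * (L - 1) + 1              # width of the level-j filter
    if (L // 2) % 2 == 0:                      # L/2 even: L = 8, 12, 16, 20
        xi_g = -(Lj - 1) * (L - 2) // (2 * (L - 1))
        xi_h = -Lj // 2
    elif L in (10, 18):
        xi_g = -(Lj - 1) * L // (2 * (L - 1))
        xi_h = -Lj // 2 + 1
    elif L == 14:
        xi_g = -(Lj - 1) * (L - 4) // (2 * (L - 1))
        xi_h = -Lj // 2 - 1
    else:
        raise ValueError("no case formula for L = %d" % L)
    return xi_g, xi_h

print(la_shifts(8, 1))   # (-3, -4): the LA(8) level-1 shifts
```

All divisions above are exact, so integer (floor) division introduces no rounding.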
Multiresolution analysis. A MODWT MRA can be written as

x_t = \sum_{j=1}^{J+1} \tilde{d}_{j,t}, for t = 0, ..., N - 1,

where \tilde{d}_{j,t} is the tth element of the jth level MODWT detail \tilde{d}_j := \tilde{W}_j^T \tilde{w}_j, for j = 1, ..., J. The MODWT wavelet smooth and rough are, respectively,

\tilde{s}_{j,t} = \sum_{k=j+1}^{J+1} \tilde{d}_{k,t} and \tilde{r}_{j,t} = \sum_{k=1}^{j} \tilde{d}_{k,t}, for t = 0, ..., N - 1.

Importantly, although the MODWT is not an orthonormal transform, the MRA

x = \sum_{j=1}^{J_0} \tilde{d}_j + \tilde{s}_{J_0},     (14)

where \tilde{s}_{J_0} := \tilde{V}_{J_0}^T \tilde{v}_{J_0} is the level J_0 MODWT smooth, still holds true. This is useful in practice because, as stated above, features in the original time series are aligned with the wavelet details and smooth without an adjustment.
Variance decomposition. In order to retain the variance-preserving property of the DWT, the wavelet and scaling coefficients must be rescaled properly, as seen above. But although the MODWT is capable of producing a scale-by-scale analysis of variance through the energy decomposition (Percival and Mofjeld (1997)),

\|x\|^2 = \sum_{j=1}^{J_0} \|\tilde{w}_j\|^2 + \|\tilde{v}_{J_0}\|^2,     (15)

energy preservation does not hold for the MODWT details and smooths in general:

\|x\|^2 \neq \sum_{j=1}^{J_0} \|\tilde{d}_j\|^2 + \|\tilde{s}_{J_0}\|^2.

This is because the MODWT is not an orthonormal transform. Percival and Walden (2000, Ch. 5.3) show, for example, that \|\tilde{d}_1\|^2 \leq \|\tilde{w}_1\|^2. Thus, when using the MODWT, one is restricted to analyzing the wavelet and scaling coefficients in order to quantitatively study the scale-dependent variance properties.
5.5 Wavelet variance
As seen above, the DWT and the MODWT can decompose the sample variance of a time
series on a scale-by-scale basis. A wavelet based analysis of variance is sometimes called
a wavelet spectrum. Such a spectrum may be of interest for several reasons (Percival
and Walden (2000, p. 296)):
1. a scale-by-scale decomposition of variance is useful if the phenomenon consists of variations over a range of different scales;
2. wavelet variance (to be defined below) is closely related to the concept of spectral
density function (SDF, Fourier spectrum);
3. wavelet variance is a useful substitute for the variance of a process for certain pro-
cesses with infinite variance.
Consider a discrete-parameter real-valued stochastic ARFIMA process {X_t} (see App. C) whose dth order backward difference Y_t is a stationary process with mean µ_Y (not necessarily zero). Then a Daubechies wavelet filter \tilde{h}_l of width L ≥ 2d results in the jth wavelet coefficient process

w_{j,t} := \sum_{l=0}^{L_j - 1} \tilde{h}_{j,l} X_{t-l}

being a stationary process (for a stationary process any L would suffice). Now define the (time-independent, global) wavelet variance for {X_t} at scale λ_j to be

\nu_X^2(\lambda_j) := V\{w_{j,t}\},

which represents the contribution to the total variability in {X_t} due to changes at scale λ_j. Then, by summing up these time-scale specific wavelet variances, one gets the variance of {X_t}:

\sum_{j=1}^{\infty} \nu_X^2(\lambda_j) = V\{X_t\}.     (16)
The wavelet variance is well defined for both stationary and non-stationary processes with
stationary dth order backward differences as long as the width L of the wavelet filter is
large enough. In the non-stationary case the sum of the wavelet variances diverges to
infinity. An advantage of the wavelet variance is that it handles both types of processes
equally well. (Percival and Walden (2000, Ch. 8.2).)
The wavelet coefficients affected by the boundary warrant special attention. By taking into account only coefficients that are not affected by the periodic boundary conditions, an unbiased estimator of \nu_X^2(\lambda_j) is

\tilde{\nu}_X^2(\lambda_j) = \frac{1}{M_j} \sum_{t=L_j-1}^{N-1} \tilde{w}_{j,t}^2,

where M_j := N - L_j + 1 > 0 and \tilde{w}_{j,t} := \sum_{l=0}^{L_j-1} \tilde{h}_{j,l} X_{t-l \bmod N} are the (periodically extended) MODWT coefficients. Furthermore, if a sufficiently long wavelet filter is used, i.e. if L > 2d (or if µ_Y = 0), then E\{w_{j,t}\} = 0, which in turn implies that

\nu_X^2(\lambda_j) = E\{w_{j,t}^2\} = E\{\tilde{w}_{j,t}^2\},

where the last equality follows from using coefficients not affected by the boundary. If the sample mean over all possible t were used, then in general one would get a biased estimator of \nu_X^2(\lambda_j). (Percival and Walden (2000, Ch. 8).)
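As an illustration of the unbiased estimator, the following sketch (my own Python/numpy code, Haar filter; the function name is hypothetical) computes \tilde{\nu}_X^2(\lambda_j) by circular filtering and discards the L_j - 1 boundary-affected coefficients. For Gaussian white noise with unit variance, the estimates should be close to 1/2^j:

```python
import numpy as np

def haar_modwt_variance(x, j):
    """Unbiased MODWT wavelet variance estimate at level j (Haar filter):
    averages the M_j = N - L_j + 1 squared coefficients not affected by
    the circular boundary."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    Lj = 2**j
    # Level-j equivalent Haar MODWT wavelet filter: +2^{-j} on the first
    # half of its support, -2^{-j} on the second half.
    h = np.concatenate([np.full(Lj // 2, 2.0**-j), np.full(Lj // 2, -2.0**-j)])
    # Circular filtering: w~_{j,t} = sum_l h_l x_{(t - l) mod N}
    w = np.array([np.dot(h, x[(t - np.arange(Lj)) % N]) for t in range(N)])
    Mj = N - Lj + 1
    return np.sum(w[Lj - 1:]**2) / Mj

rng = np.random.default_rng(2)
x = rng.standard_normal(4096)                  # white noise, sigma^2 = 1
for j in (1, 2, 3):
    print(j, haar_modwt_variance(x, j))        # should be near 1 / 2^j
```

The halving of the wavelet variance per level for white noise reflects the octave-band nature of the wavelet filters.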
Now consider {X_t} to be a stationary process with SDF S_X(f) defined over the frequency interval [-1/2, 1/2]. A fundamental property of the Fourier spectrum is that

\int_{-1/2}^{1/2} S_X(f)\, df = V\{X_t\},

i.e. the SDF decomposes the variance of a series across different frequencies (Percival and Walden (2000, p. 296)). On the other hand, by Equation (16) the wavelet spectrum decomposes the variance of a series across different scales. Since there is a close (although fragile) relationship between frequency and time-scale (see Sec. 3.2.2), it is no surprise that estimates of the wavelet variance can be turned into SDF estimates. The band-pass nature of the MODWT wavelet filter implies that

\nu_X^2(\lambda_j) \approx 2 \int_{1/(2^{j+1}\Delta t)}^{1/(2^{j}\Delta t)} S_X(f)\, df,

where S_X is the SDF, which may be estimated by the squared magnitude of the coefficients of the DFT, called the periodogram,

\hat{S}_X(f_k) = \frac{1}{N} \left| \sum_{t=0}^{N-1} X_t e^{-i 2\pi f_k t} \right|^2,
and f_k = k/N denotes the kth Fourier frequency, k = 0, ..., ⌊N/2⌋. This approximation improves as the width L of the wavelet filter increases, because \tilde{h}_{j,l} then becomes a better approximation to an ideal band-pass filter. In fact, if the filter is wide enough, one can estimate S_X using piecewise constant functions over each interval [1/(2^{j+1}\Delta t), 1/(2^{j}\Delta t)]. In the case of long memory, however, this approximation underestimates the lowest frequencies (see Percival and Walden (2000, Ch. 8.5)).
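The variance-decomposing property of the SDF has a simple discrete analogue that can be verified directly: averaging the periodogram over all N Fourier frequencies returns the sample variance of a demeaned series (Parseval's relation). A minimal sketch (my own illustration in Python/numpy):

```python
import numpy as np

def periodogram(x):
    """Periodogram S^(f_k) = (1/N) |sum_t x_t e^{-i 2 pi f_k t}|^2 at the
    Fourier frequencies f_k = k/N, k = 0, ..., N - 1."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    return np.abs(np.fft.fft(x))**2 / N

rng = np.random.default_rng(3)
x = rng.standard_normal(512)
x = x - x.mean()                  # demean so that f_0 carries no mass

S = periodogram(x)
# Discrete analogue of the variance decomposition across frequencies:
# the mean of the periodogram equals the sample variance.
print(np.allclose(S.mean(), np.mean(x**2)))   # True
```

Note that this exact decomposition says nothing about consistency: as discussed below, the periodogram ordinates do not converge to the true SDF as N grows.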
The SDF can be used to construct a confidence interval for \nu_X^2(\lambda_j). Assuming that \{w_{j,t}\} is a Gaussian process, then for large M_j the random variable \tilde{\nu}_X^2(\lambda_j) is approximately Gaussian distributed with mean \nu_X^2(\lambda_j) and variance 2A_j/M_j, where A_j := \int_{-1/2}^{1/2} S_j^2(f)\, df, provided that A_j is finite and S_j(f) > 0 (almost everywhere). The downside of this construction is that the confidence interval may have a negative lower limit, which is problematic when plotting wavelet variance estimates on a double-logarithmic scale. Furthermore, an incorrect Gaussian assumption will produce too narrow intervals that do not reflect the true variability of the point estimate.28
It is well known that the periodogram is an inconsistent estimator of the Fourier spectrum (see e.g. Priestley (1992, p. 425)). Likewise, the popularly used GPH-estimator (Geweke and Porter-Hudak (1983)), based on an ordinary least squares (OLS) regression of the log-periodogram for frequencies close to zero, is in general an inconsistent estimator of the long-memory parameter of a fractionally integrated process with |d| < 1/2. Other asymptotic properties of this estimator are problematic too (see Hurvich and Beltrao (1993) and Robinson (1995)).29 Using the fact that the wavelet variance is a regularization of the Fourier spectrum (so that the scales that contribute the most to the variance of the series are associated with the coefficients with the largest variance), Jensen (1999) showed that an OLS-based wavelet estimator is consistent when the sample variance of the wavelet coefficients is used in the regression.
28There do exist other, more complex ways of constructing confidence intervals. They could for example be constructed using an "equivalent degrees of freedom" argument and a chi-squared distribution (Percival and Walden (2000, pp. 336-7)) or multitaper spectrum estimation (Serroukh et al. (2000)). These are not considered in this thesis, however.
29However, in the case of 0 < d < 1/2 and under certain regularity conditions (Gaussianity, in particular), Robinson (1995) has proven that the GPH-estimator is consistent and asymptotically Gaussian. The problem then is that volatility is not distributed normally (but exponentially).
Specifically, using the wavelet variance
of the DWT coefficients w_{j,t},

\hat{\nu}_X^2(\lambda_j) = \frac{1}{2^j} \sum_{k=0}^{2^j - 1} w_{j,k}^2,     (17)

one has that

V\{w_{j,t}\} = \nu_X^2(\lambda_j) \to \sigma^2 2^{j(2d-1)}

as j → ∞ (here σ² is a finite constant). If a large number of wavelet coefficients is available for scale j, then the sample wavelet variance provides a consistent estimator of the true wavelet variance (Jensen (1999, p. 22)). Thus, by taking logarithms on both sides, one obtains the (approximate) log-linear relationship

\log \nu_X^2(\lambda_j) = \log \sigma^2 + (2d - 1) \log 2^j,     (18)

from which the unknown d can be estimated consistently by OLS regression after replacing \nu_X^2 with its sample counterpart \hat{\nu}_X^2 of Equation (17). The asymptotic variance of the estimator of d was derived by Jensen, too. He also found that the (negative) bias in this estimator is offset by its low variance. In the mean square error (MSE) sense the wavelet OLS-estimator fared significantly better than the GPH-estimator.30
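The regression behind Equation (18) can be sketched as follows (illustrative Python/numpy of my own, with the Haar DWT; the sample wavelet variances are normalized so that white noise gives \nu^2(\lambda_j) = \sigma^2 2^{-j}, consistent with the limit stated above). For white noise the true d is 0, so the fitted slope should be near -1:

```python
import numpy as np

def haar_wavelet_variances(x, J):
    """Sample wavelet variances nu^2(lambda_j), j = 1..J, from Haar DWT
    coefficients, normalized so that white noise gives sigma^2 / 2^j."""
    v = np.asarray(x, dtype=float)
    nu2 = []
    for j in range(1, J + 1):
        w = (v[1::2] - v[0::2]) / np.sqrt(2.0)   # level-j wavelet coefficients
        v = (v[1::2] + v[0::2]) / np.sqrt(2.0)   # level-j scaling coefficients
        nu2.append(np.mean(w**2) / 2.0**j)
    return np.array(nu2)

def wavelet_ols_d(x, J):
    """OLS estimate of d from log nu^2(lambda_j) = log sigma^2 + (2d - 1) log 2^j."""
    nu2 = haar_wavelet_variances(x, J)
    j = np.arange(1, J + 1)
    slope = np.polyfit(j * np.log(2.0), np.log(nu2), 1)[0]
    return (slope + 1.0) / 2.0                   # slope = 2d - 1

rng = np.random.default_rng(4)
x = rng.standard_normal(2**14)                   # white noise: true d = 0
print(round(wavelet_ols_d(x, J=6), 2))
```

In practice the number of usable levels J is limited by the sample size, since the coarsest scales contain very few coefficients and dominate the regression noise.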
The wavelet variance can also be defined locally. But unlike in Jensen (1999), where all wavelet coefficients were used in calculating the wavelet variance, now only those "close" to the time point t are used. Given L > 2d(u), an unbiased estimator of the local wavelet variance for {X_t} at scale λ_j based upon the MODWT is

\tilde{\nu}_X^2(u, \lambda_j) = \frac{1}{K_j} \sum_{s=\tau_j}^{\tau_j + K_j} \tilde{w}_{j,t+s,T}^2,     (19)

where u represents a time point in the rescaled time domain [0, 1] (i.e. u = t/T), K_j is a "cone of influence", and τ_j is an "offset" (Whitcher and Jensen (2000, p. 98)). In principle, K_j (the central portion of a filter) includes only those wavelet coefficients to which the corresponding observation made a significant contribution. This is motivated by the fact that the width L of the filter is not a very good measure of the effective width of
30To further reduce the MSE of \hat{d}, a weighted least squares estimator could be applied (see e.g. Abry et al. (1993) and Abry and Veitch (1998)). In particular, Percival and Walden (2000, Ch. 9.5) have shown via simulation that weighting reduces the MSE by a factor of two in comparison to the OLS estimator.
Table 1: Cones of influence and offsets.

Level j   K_j^Haar   K_j^LA(8)      τ_j^LA(8)
   1          2          3 (8)           2
   2          4          9 (22)          6
   3          8         19 (50)         14
   4         16         38 (106)        31
   5         32         76 (218)        64
   6         64        150 (442)       131
   7        128        299 (890)       264
   8        256        597 (1786)      531
   9        512       1192 (3578)     1065
  10       1024       2383 (7162)     2132
the filter, because coefficients around l = 0 and l = L_j - 1 are very close to zero and thus do not contribute significantly to the calculation of the wavelet coefficient (Percival and Walden (2000, p. 103)). A slight inconvenience in using K_j is that it varies across scales and different filters. The tabulated values for the Daubechies family of wavelets are given in Whitcher and Jensen (2000). Also the values of the "offsets" τ_j for each wavelet filter L > 2 are needed to indicate where the width K_j begins (given in Whitcher and Jensen). For the relevant part, these tables are reproduced here (see Table 1, where the numbers in parentheses are the lengths L_j of the scale λ_j wavelet filter). Notice that the offset τ_j^Haar for the Haar is zero for all levels since K_j^Haar = L_j^Haar = 2^j.
Whitcher and Jensen (2000) have shown that when the MODWT is applied to a locally stationary (in the sense of Dahlhaus (1996, 1997); see App. D) long-memory process {X_{t,T}}, the level-j MODWT wavelet coefficients {\tilde{w}_{j,t,T}} form a locally stationary process with mean zero and time-varying variance

V\{\tilde{w}_{j,t,T}\} = \nu_X^2(u, \lambda_j) \to \sigma^2(u)\, 2^{j[2d(u)-1]}

as j → ∞ (the expression for σ²(u) is given in Whitcher and Jensen). Thus, analogously to Equation (18),

\log \nu_X^2(u, \lambda_j) = \log \sigma^2(u) + [2d(u) - 1] \log 2^j,     (20)
from which the unknown d(u)'s can be estimated consistently by OLS after replacing \nu_X^2 with its time-varying sample counterpart \tilde{\nu}_X^2 from Equation (19). Gencay et al. (2002a, p. 172) argue that parameter estimation for a non-stationary long-memory time series model through OLS "should benefit greatly" from wavelet-based methods.
Using simulations, Whitcher and Jensen (2000) showed that the median of \hat{d}(u) accurately estimates the true value of the fractional differencing parameter (with a slight negative bias near the boundaries) in the case of a globally stationary ARFIMA process. Because less information is used to construct the local estimator than the global one, \hat{d}(u) also exhibited a slight increase in its MSE. Importantly, when the ARFIMA process was disturbed by a sudden shift in the long-memory parameter to imitate local stationarity, the estimated fractional differencing parameter still performed well (on both sides of the change), although with a slight bias and an increase in MSE at the boundaries.
6 Volatility modeling
Volatility, interpreted as uncertainty, is one of the key variables in most models in modern
finance.31 The explosive growth in derivative markets and the recent availability of high-
frequency data have only highlighted its relevance. In option pricing, for example, volatility
of the underlying asset (which may be volatility itself, by the way) must be known from now
until the option expires as accurately as possible. In financial risk management, volatility forecasting has even become compulsory after the "1996 Basle Accord Amendment", which sets the minimal capital requirements for banks. More generally, periods of high uncertainty have economically paralyzing consequences; just consider the terrorist attacks in New York on September 11, 2001. In a way, therefore, volatility estimates can be considered ”as a
barometer for the vulnerability of financial markets and the economy” (Poon and Granger
(2003, p. 479)).
31Volatility is not the same as risk, however. In particular, risk is usually associated with small or
negative returns (the so-called ”downside risk”) whereas most measures of dispersion (e.g. standard
deviation) make no such distinction. Furthermore, standard deviation is a useful risk measure only when
it is attached to a distribution or a pricing dynamic. For further details on the conceptual differences
between volatility, risk, and standard deviation, see Poon and Granger (2003, Sec. 2.1).
In this section some of the most popular measures of volatility and types of models are reviewed. There are basically two strands of volatility models: those assuming that conditional variance depends on past values (i.e. observation-driven models) and those assuming that conditional variance is stochastic (made precise later). Although this section starts with the former approach, most attention is paid to the latter. Poon and Granger (2003) provide an extensive up-to-date review of the different types of volatility models used for forecasting in financial markets.
6.1 Measures of volatility
It is an indisputable stylized fact that volatility tends to cluster, so that the variance is time-varying and shows persistent behavior. A systematic search for the causes of serial correlation in conditional second moments is in its infancy, though. Diebold and Nerlove
(1989) discuss the possibility of a serially correlated news arrival process as the generating
mechanism for which Engle et al. (1990) find some evidence. Using the mixture-of-
distribution hypothesis (see Clark (1973)), Andersen and Bollerslev (1997b) show that
long-memory features of volatility (the slowly decaying autocorrelation function) may in-
deed arise through the interaction of a large number of heterogeneous information arrivals.
Such a finding is important because then long-memory characteristics reflect inherent prop-
erties of the DGP, rather than structural shifts as suggested for example by Lamoureux
and Lastrapes (1990b).
The basic paradigm in volatility estimation is that ”volatility” (now approximated by
the square of returns rt) can be decomposed into predictable and unpredictable compo-
nents via
rt = σtεt,
where εt are IID disturbances with mean 0 and variance 1. By definition, then, the pre-
dictable component is the conditional variance \sigma_t^2 of a series.32 In a seminal paper, Engle (1982) proposed that the conditional variance depends linearly on the past squared values
32The determinants of the predictable part are of special interest in finance because the risk premium is a function of it.
of the process,

\sigma_t^2 = \sigma^2 + \sum_{k=1}^{q} \alpha_k r_{t-k}^2;
the well-known ARCH(q) model. Bollerslev (1986) generalized this to the parsimonious
GARCH(p, q) model,

\sigma_t^2 = \sigma^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2 + \sum_{k=1}^{q} \alpha_k r_{t-k}^2,

where the volatility is a linear function of both lagged squared returns and lagged volatilities. Bollerslev showed that this equation defines a second-order stationary solution if (and only if) \sum_{j=1}^{p} \beta_j + \sum_{k=1}^{q} \alpha_k < 1 (and \sigma^2 > 0). In practice, GARCH(1, 1) often suffices.
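The GARCH(1, 1) recursion is straightforward to simulate. The sketch below is my own illustration (Python/numpy; the parameter omega plays the role of the constant σ² above):

```python
import numpy as np

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate r_t = sigma_t * eps_t with the GARCH(1, 1) recursion
    sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    r = np.empty(n)
    sigma2 = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    r[0] = np.sqrt(sigma2[0]) * eps[0]
    for t in range(1, n):
        sigma2[t] = omega + alpha * r[t - 1]**2 + beta * sigma2[t - 1]
        r[t] = np.sqrt(sigma2[t]) * eps[t]
    return r, sigma2

# alpha + beta < 1 gives a second-order stationary solution whose
# unconditional variance is omega / (1 - alpha - beta) = 1 here.
r, sigma2 = simulate_garch11(100_000, omega=0.05, alpha=0.05, beta=0.90)
print(np.var(r))    # should be close to 1 in a sample this large
```

With alpha + beta = 0.95 the simulated path exhibits the pronounced volatility clustering discussed above.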
Moreover, it is usually found (in long time series, at least) that the estimated parameters β and α sum close to one, suggesting the presence of a "unit root" in the volatility equation. This so-called "Taylor-effect" (Taylor (1986)) has commonly been interpreted as evidence of volatility persistence (see e.g. Poterba and Summers (1986)), although it has faced a considerable amount of criticism lately (see e.g. Mikosch and Starica (2004)). In any case, Engle and Bollerslev (1986) extended the model to integrated GARCH (IGARCH).33 Counterintuitively, the (approximate) autocorrelation function of IGARCH(1, 1) decays exponentially (not hyperbolically), indicative of "short memory", although the effect of a shock to expectation is permanent (see Ding and Granger (1996)).
A more successful (in the stock markets, at least) extension of GARCH is to let \sigma_t^2 be an asymmetric function of the past data. To model the stylized fact that in stock markets volatility is negatively correlated with lagged returns, the so-called leverage effect (first noted by Black (1976)), Nelson (1988) proposed the exponential GARCH (EGARCH) model, which models the logarithm of the variance \log \sigma_t^2. Some more flexibility is achieved by applying a "fractional differencing operator" d (a parameter that is going to play a big role later; see App. C), resulting in the fractionally integrated EGARCH (FIEGARCH) model (Bollerslev and Mikkelsen (1996)). This model nests the conventional EGARCH for d = 0. In general, the fractional generalization has proved to be empirically useful in modeling long-term dependence in conditional variances (see e.g. Vilasuso (2002)).34
33Here the prefix "integrated" does not imply non-stationarity as in the case of a random walk, however (proved in Bougerol and Picard (1992)). Strict stationarity still holds, but because the marginal variance of r_t is infinite, weak stationarity does not (Gourieroux and Jasiak (2001, Ch. 6.2.4)).
This is mainly
because the weakly stationary FIEGARCH succeeds in modeling the slow hyperbolic rate of decay of a shock to the forecast of \log \sigma_{t+T}^2 if 0 < d < 1/2, thus capturing the observed "long memory" in volatility. Of course there exist numerous other extensions, too. For
some of the most popular ones, see Bollerslev et al. (1992, 1994), Hentschel (1995), and
(in the multivariate case) Kroner and Ng (1998).35
Several other types of measures have been used to approximate volatility. It is, for example, known that normalized squared or absolute returns over an appropriate horizon
provide an unbiased estimate of volatility. Although the majority of time series volatility models are squared-returns models (as above), absolute-returns-based models seem
to produce better volatility forecasts in practice (see e.g. Taylor (1986) and McKenzie
(1999)). Indeed, it seems that the long-memory property is strongest for absolute returns
(in stock markets, at least). For example Ding et al. (1993), Ding and Granger (1996),
and Lobato and Savin (1998) suggest measuring volatility directly from absolute returns.36
Absolute returns are also relatively outlier-resistant compared to squared returns. In par-
ticular, log-squared returns suffer from an ”inlier” problem because a return very close
to zero generate a large negative number. Furthermore, Wright (2000) has demonstrated
that squared returns result in a large downward bias when using semiparametric methods
to estimate long-memory in the context of conditionally heavy-tailed data such as stock
returns (which does not occur with absolute returns). The downside is that absolute (as
well as squared) returns over longer horizons (days, say) provide a very noisy estimate
for volatility (e.g. Andersen and Bollerslev (1997)). Simply taking the average over a fixed horizon (as e.g. in Poterba and Summers (1986)) does not seem to be a satisfactory solution.
34This generalization corresponds to the generalization of the standard ARIMA class of models to fractionally integrated ARMA models that model long-term dependence in mean.
35An estimate of volatility in these models is attained via (quasi) maximum likelihood or via the generalized method of moments.
36The use of squares is most likely a reflection of the Gaussian assumption made regarding the data (McKenzie (1999, p. 50)). The error distribution of stock market returns is not Gaussian, however, and therefore higher than second moments must be considered. Davidian and Carroll (1987) have shown that the absolute returns specification is more robust against asymmetry and non-normality.
Although standard time series techniques could then be applied to assess
the temporal dependence, this two-stage procedure is subject to the following criticism (see Bollerslev et al. (1992, pp. 17-18)): First, it does not make efficient use of all the data, and the conventional standard errors from the second-stage estimation may not be appropriate. Second, there is the possibility that the actual parameter estimates may be inconsistent. Third, this kind of procedure may lead to misleading conclusions about the true underlying dependence in the second-order movements of the data. It seems that only by increasing the sampling frequency can the noise be reduced in an appropriate way.
This has become possible in finance after the availability of high-frequency data. Blair et
al. (2001) have for example reported a significant increase in forecasting ability for 1-day
ahead forecast when intraday 5-minute squared returns are used instead of daily ones.
Another interesting approach is to extract volatility from option prices. This method uses a potentially richer information set and could therefore lead to improved forecasting performance. For example, the traditional Black-Scholes formula can be inverted to give an estimate of volatility under the assumption of a constant variance. Engle and Mustafa (1992), among others, have also successfully considered the case of ARCH-volatility in this setting. Unfortunately, when using stochastic volatility (to be discussed in the next subsection) several complications arise (see e.g. Wiggins (1987) and Melino and Turnbull (1990)). Moreover, on the practical side, not every asset of interest has actively traded options, and implied volatilities derived from frictionless market models may be affected by institutional factors distorting the time series analysis (Fung and Hsieh (1991)). Option markets may simply not be sufficiently developed to allow meaningful variations in intraday implied volatility to be derived (Goodhart and O'Hara (1997)).
Yet another method is the use of historical "highs and lows", as in Parkinson (1980), who assumes that prices follow a Brownian motion with constant variance. Garman and Klass (1980) derive several efficient estimators of volatility using highs, lows, opening and closing prices, and transaction volume. Beckers (1983) tests the accuracy of these estimators and suggests an adjustment, which could even be improved upon by including implied variances. However, the generalization of these ideas to other stochastic processes allowing for time-varying variances is not straightforward (Bollerslev et al. (1992, p. 19)).
Theoretically the most attractive way of measuring volatility seems to be the sum of short-term intraday squared (or absolute) returns over a predetermined horizon (usually a
day), the so-called realized volatility (Fung and Hsieh (1991) and Andersen and Bollerslev
(1997a)). This measure is based on Merton’s (1980) seminal idea that the variance of
returns can be estimated far more accurately from the available time series of realized
returns than can the expected return.37 For example Schwert (1989) has used this ap-
proach to estimate monthly volatility from daily returns. Technically speaking, realized
volatility is a consistent estimator of the 1-day integrated volatility under the assumption
of a continuous-time diffusion. In theory the stochastic error of the measure can be re-
duced arbitrarily by increasing the sampling frequency of returns. Empirically, Andersen
and Bollerslev (1997a) have shown that realized variance takes the beloved ARCH mod-
els ”back into business” in the sense that they again seem to serve as good forecasting
devices (which has been put into serious doubt lately). The problem is however that one
does not observe the price continuously. The small time intervals are also contaminated
by microstructure effects such as ”bid-ask bounce” (see e.g. Roll (1984)) and volatility
seasonalities (see e.g. Andersen and Bollerslev (1997a)). Thus the precision to which one
can measure the volatility using high-frequency data depends on the characteristics of the
return series analyzed (Bai et al. (2001)). And finally, the realized variance approach is
computationally quite expensive.
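Realized volatility is simple to compute once returns are arranged day by day. The sketch below is my own illustration on simulated data (not the HEX data of this thesis; all names and parameter values are hypothetical), and it also shows why the measure is attractive: even with moderate intraday sampling it tracks the true daily variance closely:

```python
import numpy as np

def realized_variance(intraday_returns):
    """Daily realized variance: the sum of squared intraday returns,
    one row per day (e.g. 5-minute log returns)."""
    return np.sum(np.asarray(intraday_returns)**2, axis=1)

# Illustration: 250 "days" of m intraday returns drawn with a
# day-specific true variance (lognormal across days).
rng = np.random.default_rng(5)
true_daily_var = np.exp(0.5 * rng.standard_normal(250))
m = 101                                     # intraday intervals per day
r = rng.standard_normal((250, m)) * np.sqrt(true_daily_var[:, None] / m)

rv = realized_variance(r)
# The measurement noise shrinks as the sampling frequency m grows
print(np.corrcoef(rv, true_daily_var)[0, 1])
```

In this idealized setting there are no microstructure effects; with real high-frequency data, bid-ask bounce and intraday seasonality limit how far the sampling frequency can usefully be increased, as noted above.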
6.2 Stochastic volatility and long-memory
Most of the modern financial theory is based on continuous-time semimartingales (see e.g.
Shiryaev (1999)). In particular, stochastic volatility (SV) models of the form
dX(t) = µ(t)dt+ σ(t)dW (t)
belong to this family (here the drift µ(t) and the instantaneous standard deviation σ(t)
are time-varying random functions and W (t) is a standard Brownian motion). Many
specifications for σ(t) are available (see Taylor (1994)). It may be, for example, that the
logarithm of the volatility follows an Ornstein—Uhlenbeck process (as in Wiggins (1987)).
37Recall that for a random walk, for example, a minimal exhaustive statistic for volatility is essentially
given by the full set of increments (Corsi et al. (2001)).
See Ghysels et al. (1995) for a good review. Although continuous-time models are elegant to work with, in practice one settles for a discrete model. A discrete-time SV
model may be written as
y_t = \sigma_t \varepsilon_t,

where y_t denotes the demeaned return process y_t = \log(S_t/S_{t-1}) - µ, \{\varepsilon_t\} is a series of IID random disturbances with mean 0 and variance 1, and the conditional variance \{\sigma_t^2\} is modeled through a stochastic process \{\log \sigma_t^2\} := \{h_t\}. Modeling volatility as a stochastic variable immediately leads to heavy-tailed distributions for returns (Poon and Granger (2003, p. 485)). Here the logarithm ensures that \{\sigma_t^2\} is always positive, but it is not directly observable. Furthermore, \{h_t\} is independent of \{\varepsilon_t\}. Notice that the model for \{y_t\} is usually represented in the form

y_t = \sigma \exp(h_t/2) \varepsilon_t,

where the scale parameter σ > 0 removes the need for the constant term γ in the first-order autoregression.38
Naturally there exist many specifications for the volatility scheme {ht} such as ARMAor random walk. From the point of view of financial theory, a particularly attractive and
simple model for {ht} is an AR(1)-process ht = γ+φht−1+ηt, where ηt ∼ IID(0,σ2η), and|φ| < 1 ensures that {ht} (and hence {yt}) is strictly stationary.39 Such an autoregressiveterm introduces persistance (i.e. volatility clustering). In general, SV models are more
flexible than GARCH models because of the extra volatility noise term ηt in the volatility
equation. For example, the simple ARSV(1) specification has been shown by Carnero et
al. (2001) to be empirically more adequate than the most popularly used GARCH(1, 1).
Carnero et al. also demonstrate that ARSV(1) produces smoother volatility estimates.
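To make the discussion concrete, the ARSV(1) recursion above is easy to simulate. The sketch below (parameter values are illustrative, not estimates from the HEX data) draws h_t = φh_{t−1} + η_t and returns y_t = σ exp(h_t/2)ε_t; a persistence parameter φ close to one produces the volatility clustering discussed in the text.

```python
import numpy as np

def simulate_arsv1(n, phi=0.95, sigma_eta=0.3, sigma=1.0, seed=0):
    """Simulate y_t = sigma * exp(h_t / 2) * eps_t with AR(1) log-volatility
    h_t = phi * h_{t-1} + eta_t (gamma = 0, absorbed into sigma)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma_eta, n)
    eps = rng.normal(0.0, 1.0, n)
    h = np.empty(n)
    # start from the stationary distribution of h_t
    h[0] = rng.normal(0.0, sigma_eta / np.sqrt(1.0 - phi**2))
    for t in range(1, n):
        h[t] = phi * h[t - 1] + eta[t]
    return sigma * np.exp(h / 2.0) * eps

returns = simulate_arsv1(20000)

# persistence in h induces volatility clustering:
# |y_t| is positively autocorrelated even though y_t itself is uncorrelated
a = np.abs(returns) - np.abs(returns).mean()
acf1 = np.dot(a[:-1], a[1:]) / np.dot(a, a)
```

The extra volatility noise term η_t is what distinguishes this from a GARCH recursion, in which σ_t is a deterministic function of past observations.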
Recently, the long-memory stochastic volatility (LMSV) model proposed in Breidt et al. (1998) (and in Harvey (1998)) has attracted a lot of attention. In their model log-volatility
38 The estimation of SV models is, however, notoriously difficult and usually done by variants of the method of moments (as in Melino and Turnbull (1990), for example). For a survey of estimation methods for stochastic volatility models, see Broto and Ruiz (2002).
39 This ARSV(1) specification (proposed in Taylor (1986)) is attractive because an AR(1) process is the natural discrete-time approximation to a continuous-time Ornstein-Uhlenbeck process.
{ht} is generated by fractionally integrated Gaussian noise,
(1 − B)^d ht = ηt,
where |d| < 1/2 and ηt ∼ NID(0, σ2η). More generally, {ht} can be modeled as an ARFIMA(p, d, q) process,
φ(B)(1 − B)^d ht = θ(B)ηt, (21)
where φ(z) = 1 − φ1z − ... − φpz^p is an autoregressive polynomial of order p,
θ(z) = 1 + θ1z + ... + θqz^q is a moving average polynomial of order q, both φ(z) and θ(z)
have all of their roots outside the unit circle, and θ(z) has no roots in common with φ(z).
Notice that this model encompasses a "short-memory" model when d = 0. Breidt et al. argued that the LMSV model has certain advantages over observation-driven models (e.g.
FIEGARCH). For example, because it is built from the widely used ARFIMA class of long-
memory models, LMSV inherits most of the statistical properties of ARFIMA models and
is therefore analytically tractable. Even the limiting distribution of the GPH-estimator
of d has been derived (see Velasco (1999) and Deo and Hurvich (2001)). Moreover, the
estimation of d is not crucially dependent on the choice of a unit discrete time interval
(Bollerslev and Wright (2000, p. 87)): although the LMSV (like ARFIMA) model is not
closed under temporal aggregation, the rate of decay of the autocovariance function of
squared (or absolute) returns is invariant to the length of the return interval.
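The fractional-difference operator in Eq. (21) can be made operational through its MA(∞) expansion: (1 − B)^{−d} has weights ψ_0 = 1 and ψ_k = ψ_{k−1}(k − 1 + d)/k, which decay hyperbolically. The sketch below (a truncated approximation, not an exact simulation method; function names are illustrative) uses this recursion to generate long-memory log-volatility.

```python
import numpy as np

def frac_diff_weights(d, n):
    """MA(infinity) coefficients psi_k of (1 - B)^{-d}:
    psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k."""
    psi = np.empty(n)
    psi[0] = 1.0
    for k in range(1, n):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    return psi

def simulate_lmsv_logvol(n, d=0.4, sigma_eta=1.0, seed=0):
    """Approximate h_t = (1 - B)^{-d} eta_t by truncating the MA expansion at n lags."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma_eta, 2 * n)
    psi = frac_diff_weights(d, n)
    # convolve innovations with the weights; keep the last n values
    # so that each retained h_t uses a full n-term history
    return np.convolve(eta, psi)[n:2 * n]

psi = frac_diff_weights(0.4, 5)
# psi_1 = d = 0.4 and psi_2 = d(1 + d)/2 = 0.28, matching the recursion
```

The hyperbolic decay of the ψ_k is what produces the slowly decaying autocovariances of squared (or absolute) returns noted above.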
The LMSV model is still a stationary model, however. Thus it ignores ”the known
intraday volatility patterns and the irregular occurrences of market crashes, mergers and
political coups” as Jensen and Whitcher (2000) note. In particular, the long-memory
parameter d may not be constant over time. This motivated Jensen and Whitcher to
introduce a non-stationary class of long-memory stochastic volatility models with time-
varying parameters. In their model, the logarithmic transform of the squared returns is a
locally stationary process that has a time-varying spectral representation (see App. D).
This means that the level of persistence associated with a shock to conditional variance
(which itself is allowed to vary in time) is dependent on when the shock takes place.
The shocks themselves, of course, still produce responses that persist hyperbolically. Specifically, Jensen and Whitcher defined yt,T to be
yt,T = exp(Ht,T/2)εt,
Φ(t/T,B)(1 − B)^{d(t/T)} Ht,T = Θ(t/T,B)ηt,
where |d(u)| < 1/2, εt ∼ NID(0, 1), and ηt ∼ NID(0, σ2η) are independent of each other. The functions Φ(u,B) and Θ(u,B) are, respectively, order p and q polynomials whose roots lie outside the unit circle uniformly in u and whose coefficient functions, φj(u), for j = 1, ..., p, and θk(u), for k = 1, ..., q, are continuous on R. The coefficient functions satisfy φj(u) = φj(0) and θk(u) = θk(0) for u < 0, and φj(u) = φj(1) and θk(u) = θk(1) for u > 1, and they are differentiable with bounded derivatives for u ∈ [0, 1]. Notice that by setting Φ(u,B) = Φ(B), Θ(u,B) = Θ(B), and d(u) = 0 for all u ∈ [0, 1], one gets the SV model of Harvey et al. (1994). If, on the other hand, one sets d(u) = d for all u ∈ [0, 1], one gets the LMSV model (Eq. (21)).
7 Empirical analysis
This section describes how, in practice, wavelet methodology gives additional insight into volatility dynamics through time-scale decomposition. Wavelet variances at different time-scales are related to each other to uncover possible differences among players in the market during an IT-bubble period and its aftermath. The global and local scaling laws also provide a consistent estimate of long-memory in volatility. Finally, the effect of volatility periodicity on these results is studied.
7.1 Data description
The original data set included all stock transactions done at the Helsinki Stock Exchange
(HEX) between January 4 (1999) and December 30 (2002), i.e. it was so-called ”tick-
by-tick" data. Because it had the highest liquidity, the stock of Nokia Oyj was chosen and
the data were discretized: 5-minute prices were extracted using the closest transaction
price to the relevant time mark.40 Discretizing is necessary for the wavelet decomposition
40The HEX is by far the most liquid market place trading Nokia: In year 2003, the HEX accounted for
62.1% of the total number of shares traded while the percentage for New York Stock Exchange (NYSE)
was only 20.3% (the HEX (May 4, 2004)). At NYSE (and NASDAQ), Nokia has the largest trading
volume ($1.5 million) of cross-listed non-U.S. companies (Citigroup (June 23, 2004)).
to be interpretable in terms of time-scales that capture a band of frequencies (as it is
necessary in spectral analysis, too). From a theoretical perspective discretizing can be
justified by assuming that the DGP does not vary significantly over short time intervals.41
To minimize microstructure effects, one could have also used the last transaction before
the relevant time mark (a method originally introduced by Wasserfallen and Zimmermann
(1985) and used in Hol and Koopman (2002), among others) or linearly interpolated
price (introduced by Andersen and Bollerslev (1997c)) but because of occasional liquidity
problems, the closest one was considered the best compromise.42 This particular choice should not have any significant effect on the conclusions of the subsequent analysis, however.
The interval of 5 minutes has been used in many earlier studies (e.g. Andersen and
Bollerslev (1998)). It has been found ”optimal” in the sense that it is often the smallest
interval that doesn’t suffer too badly from effects such as ”bid-ask bounce” (see Campbell
et al. (1997, Ch. 3) or Gourieroux and Jasiak (2002, Ch. 14)). Concerning missing
observations for a specific time mark (such as technical breaks and incomplete trading
days), the previous price standing was always used. The 5-minute returns were then
calculated as the scaled difference between successive log-prices, i.e.,
rt,n = 100 (lnPt,n − lnPt,n−1) ,
where rt,n denotes the return for intraday period n on trading day t, with n ≥ 1 and
t = 1, ..., T. Notice that the prices Pt,n were adjusted for splits but not for dividends. This is because there were only four dividend-paying days in the whole four-year period 1999-2002 and their impact was very small. As a general rule, the empirical analysis was
done including overnight returns unless otherwise mentioned (like in Sec. 7.6). Finally,
the so-called ”block trades” were not removed, thus possibly causing a few artificially
generated jumps per month. Their impact is considered insignificant for the type of
analysis conducted.
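The return construction just described amounts to a previous-tick fill for missing price marks followed by scaled log-differencing. A minimal sketch (the helper name and the use of NaN to mark missing price marks are illustrative choices):

```python
import numpy as np

def five_min_returns(prices):
    """Scaled log-differences: r_{t,n} = 100 * (ln P_{t,n} - ln P_{t,n-1}).
    Missing price marks (np.nan) are filled with the previous price standing."""
    p = np.asarray(prices, dtype=float)
    # previous-tick fill for missing observations
    for i in range(1, len(p)):
        if np.isnan(p[i]):
            p[i] = p[i - 1]
    return 100.0 * np.diff(np.log(p))

# a price gap filled with the previous price yields a zero return
r = five_min_returns([10.0, 10.1, np.nan, 10.1])
```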
41 The statistical behavior of the sampled data could, however, differ significantly from the behavior of the DGP from which the sample was obtained. In the current context this principle might manifest itself through the so-called "non-synchronous trading" effect and possibly induce negative serial correlation (see Campbell et al. (1997, Ch. 3.1) and Lo and MacKinlay (1999, Ch. 4)).
42 For some other ways to deal with the "sporadic nature" of trading, see the discussion in Goodhart and O'Hara (1997) or Dacorogna et al. (2001, Ch. 3.2.1).
Table 2: Different data periods available.
Time period Trading (I) AMT (I) Trading (II) AMT (II)
1/4/99− 8/31/00 10:30—17:30 17:30—18:00 - 9:00—9:30
9/1/00− 4/11/01 10:00—18:00 18:00—18:15 - 8:30—9:00
4/17/01− 3/27/02 10:00—18:00 18:03—18:30 18:03—21:00 8:30—9:00
4/2/02− 12/30/02 10:00—18:00 18:03—18:30 18:03—20:00 8:30—9:00
At the HEX an electronic trading system called Helsinki Stock Exchange Automated
Trading and Information System (HETI) has been in use since 1990. This means that
there is no ”floor” but brokers trade electronically, the smallest ”tick-size” (i.e. price
change) being 0.01.43 As a general rule, all banking days are market days at the HEX.
From the point of view of data handling, one of the main problems was that the HEX did not have constant trading hours during the four years. In fact, the trading day was
first extended to include evening hours, but then this trend was reversed. These changes
were mainly caused by an international pressure towards harmonization of exchange open
hours. For example, the long-run trend of longer trading days was suppressed by the weak
market conditions during the last few years.44 Therefore four different time periods were
available (see Table 2).
In Period I, from January 4 (1999) to August 31 (2000), continuous trading took place between 10:30 a.m. and 5:30 p.m., totaling 7 hours and 85 intraday 5-minute prices.
Transactions between 8 and 10:30 a.m. were discarded, most of them belonging to the
43 Several types of market places are possible (see e.g. Gourieroux and Jasiak (2001, Ch. 14.1)). This is important to acknowledge since different systems may affect the dynamics of price differently. So, to be precise: the HEX is a continuous, order-driven, limit-order-book market place with a call auction at the market opening. For comparison, the NYSE is an order-driven, floor-based, continuous market with a specialist (acting as the market maker). One general advantage of a continuous market is that it provides good intraday market information.
44 In the beginning of 2004, for example, the evening trading hours at the HEX formed only around 6% of the total daily trading volume. The HEX is going to cut its trading hours to 10:00-18:20 when joining the SAXESS system in September 2004. A similar cut was carried out at the Deutsche Börse (from 9:00-20:00 to 9:00-17:30) in the beginning of November 2003.
Figure 7: Four examples of trading days that experienced extremely high return variability during the AMT (I), 6:03-6:30 p.m. (all transactions included). [Panels: 06/12/01, 09/11/01, 09/14/01, and 04/18/02; returns plotted against time.]
after market trading II (AMT (II)) taking place between 9 and 9:30 a.m.45 Likewise,
transactions between 5:30 and 6 p.m. were discarded because they belonged to the AMT
(I). Only one day, April 20 (2000), was an incomplete day. In total, there were 419 trading
days, resulting in 35,615 (= 419 · 85) price observations (i.e. 35,614 return observations).
In Period II, from September 1 (2000) to April 11 (2001), continuous trading was extended from both ends by half an hour.46 Thus trading took place between 10 a.m. and 6 p.m., totaling 8 hours and 91 intraday 5-minute prices. December 12 (2000)
was an incomplete trading day. In total, there were 155 trading days resulting in 14,105
price observations.
In Period III, from April 17 (2001) to March 27 (2002), continuous trading was extended further by including evening hours from 6 to 9 p.m. A technical break (when no transactions took place) occurred every day between 6 and 6:03 p.m. Continuous trading and the AMT (I) took place simultaneously. This simultaneity required very careful pre-filtering: when the trading day experienced a big cumulative price change, artificially big returns (even around 20%) could appear (see Fig. 7). An example of such a trading day was April 18 (2002), when Nokia announced its first-quarter results, which triggered a significant price drop earlier that day. To guard against the generation of artificial returns, the following pre-filtering rule was applied: prices with a percentage price change of more than 3% relative to the last genuine price recorded (before the technical break at 6 p.m.) were flagged as artificial and replaced by the previous genuine price. This rule was based on a careful inspection of the data (for some other rules, see Dacorogna et al. (2001)).47 The noise reduction obtained with this 3%-filter was
45 During the AMT, the trading price can fluctuate within the trading range established during continuous trading for round-lot trades (http://www.porssisaatio.fi).
46 Actually, this period ended the day before, but the data of April 11 included transactions only up to 6:52 p.m. For simplicity, therefore, this day was ended at 6 p.m.
47 The percentage change was calculated relative to the last genuine price because there is no guarantee that two artificial prices could not be adjacent. In fact, if the percentage had been calculated simply from adjacent prices, an artificial price would then have survived the filter. Admittedly, however, there is a small "defect" in this 3%-filter. Namely, fixing the denominator to the last genuine price is not reasonable if there is a strong price trend in either direction. But because the AMT (I) lasted only 27 minutes, a trend was regarded as of minor importance.
so considerable that the difference to the non-filtered series was clearly evident by eye. In
summary, continuous trading took place for 11 hours (including the 3-minute break) and
produced 133 intraday 5-minute prices. There were no incomplete trading days. In total,
there were 237 trading days resulting in 31,521 price observations.
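The 3%-filter can be stated compactly in code. The sketch below follows the rule in the text: each price is compared to the last genuine price, and a deviation of more than 3% flags the price as artificial (the function name is illustrative):

```python
def three_percent_filter(prices, threshold=0.03):
    """Replace prices deviating more than `threshold` from the last *genuine*
    price by the previous genuine price (the rule used for the AMT (I) overlap)."""
    genuine = prices[0]  # the last genuine price recorded before the break
    out = [prices[0]]
    for price in prices[1:]:
        if abs(price / genuine - 1.0) > threshold:
            out.append(genuine)      # flagged artificial: carry the genuine price
        else:
            genuine = price          # accepted: this becomes the new genuine price
            out.append(price)
    return out

# the 8.9% jump to 110 is flagged; the subsequent 102 survives,
# because it is compared against the genuine price 101, not 110
filtered = three_percent_filter([100.0, 101.0, 110.0, 102.0])
```

Comparing against the last genuine price (rather than the adjacent price) is exactly what prevents two adjacent artificial prices from slipping through, as footnote 47 explains.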
In Period IV, from April 2 (2002) to December 30 (2002), continuous trading was cut from the end by an hour, so it took place between 10 a.m. and 8 p.m. (apart from the technical break and simultaneity just described), totaling 10 hours and
121 intraday 5-minute prices. The same 3%-filter was employed. There were no incomplete
trading days in this period either. In total, there were 188 trading days resulting in 22,748
price observations.
7.2 Preliminary data analysis
Statistical key figures of Periods I and III are summarized below (Table 3). There are
at least two reasons for preferring to analyze Periods I and III over the other two. First,
Periods I and III are of approximately equal size and also contain the greatest number
of observations which is convenient from the statistical inference point of view. Second,
Periods I and III represent turbulent and calm regimes, respectively: Period I is represen-
tative of the ”IT-bubble” and Period III of its aftermath.48 The volatilities of Periods I
and III seem to differ by simply eyeballing the return series (see the bottom plots of Figs.
8 and 9).49 This observation is valuable because structural breaks can generate artificial
long-memory (see e.g. Lamoureux and Lastrapes (1990b), Granger and Hyung (1999),
and Diebold and Inoue (2001)). In particular, Mikosch and Starica (2004) have argued
that long-memory might be due to non-stationarity, thus rendering stationary models (e.g. GARCH) inappropriate over longer horizons. It is therefore safer to analyze these periods separately. There is another aspect in which Periods I and III differ from each other: the
former has a strong positive trend component while the latter does not (see the top plots
48 Polzehl et al. (2004) find that the "2001 recession" in the U.S. might have started as early as October 2000 and ended as late as the summer of 2003, which neatly supports these two categories.
49 The standard deviation of returns in Period I is actually smaller than that of Period III (0.3789 and 0.3869, respectively). Similarly, the means of absolute returns (proxying volatility) are 0.1874 and 0.2287, respectively. Wavelet variances will shed more light on this counterintuitive finding.
Table 3: Statistical key figures of Periods I and III.
Period I
Min. 1st Q. Med. Mean 3rd Q. Max. Std.
−11.61 −0.1118 0 +3.747e−03 0.1161 10.97 0.3789
Period III
Min. 1st Q. Med. Mean 3rd Q. Max. Std.
−11.16 −0.1523 0 −6.798e−04 0.1534 14.05 0.3869
of Figs. 8 and 9). Because a trend is another possible source of spurious long-memory
(e.g. Bhattacharya et al. (1983)), one may then a priori expect Period I to show stronger
long-memory.
The sample autocorrelation functions (ACFs) of returns in Periods I and III differ from each other in a non-trivial way (see the top plots of Figs. 10 and 11). It seems that there is a statistically significant pattern in Period I: the opening of the HEX as well as of the U.S. markets (New York) at 5:30 p.m. (Central European Time +1) has caused some linear dependence.50 This finding does not necessarily imply any kind of arbitrage opportunities in an economic sense, however. When transaction costs are included, a minor amount of autocorrelation is consistent with a martingale process and the notion of efficient markets (see Fama (1970)). Considering the slightly different results of Period III, it seems that markets became more liquid and efficient. A bit surprisingly, though, in Period I there is no significant negative autocorrelation of MA(1) type at lag one, which is typically reported (Andersen and Bollerslev (1997b) found it to be −0.04 in the FX markets with 5-minute data) and attributed to bid-ask bounce. In Period III, a significant negative first-lag autocorrelation (−0.08) does appear, however. It is then somewhat puzzling how increased liquidity would be related to the appearance of negative first-lag autocorrelation. One possibility is that the sizes of the trades have increased as well, thus causing the microstructure effects to last longer. In the analysis below, the MA(1) dynamics have not been considered that essential, however, and therefore they have not been filtered out.
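The sample ACFs and the Gaussian white-noise band used in Figs. 10 and 11 are straightforward to reproduce. A minimal sketch with the usual biased estimator (divisor N):

```python
import numpy as np

def sample_acf(x, max_lag):
    """Biased sample autocorrelations rho_1..rho_max_lag (divisor N, as usual)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

N = 100
band = 1.96 / np.sqrt(N)          # 95% band for Gaussian white noise
x = np.tile([1.0, -1.0], N // 2)  # alternating series: strong negative lag-1 ACF
acf = sample_acf(x, 3)
```

An estimated autocorrelation falling outside ±band at a given lag is what the dashed confidence lines in the figures flag as significant.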
To proxy volatility, I used absolute returns (for reasons explained in Sec. 6.1). The
50The NYSE opens at 9:30 a.m. local time (Eastern Standard Time). When comparing the figures,
recall that the length of the trading day was different in Periods I and III.
Figure 8: Price and return series of Period I (IT-bubble period), January 4, 1999 - August 31, 2000. [Panels: log(Price) and log-Return vs. time.]
Figure 9: Price and return series of Period III (aftermath period), April 17, 2001 - March 30, 2002. [Panels: log(Price) and log-Return vs. time.]
sample ACFs of absolute returns stay significantly positive for a long time in both periods,
statistically as well as economically (see the bottom plots of Figs. 10 and 11). In Period
III, for example, the first-lag autocorrelation (0.32) is well above the confidence interval
(Andersen and Bollerslev (1997b) found 0.309). Clearly, then, returns are not independent.
Although the pattern is quite similar in both periods, there are some important differences
here too. First, the ACF peaks higher in Period I than in Period III. This peak is caused
by the large (on average) overnight return in Period I. The larger ”overnight effect” is
most probably caused by the frequent news arrivals, the hype that took place during the
bubble, and the shorter trading day at the HEX (so that information had more time to
accumulate overnight). Second, in Period I the first peak just prior to the highest peak is a reflection of the opening of the New York stock markets, i.e. it is the "New York effect" (discussed more closely in Sec. 7.6).51 In Period III no distinct New York effect exists in the autocorrelations, which is probably due to the weaker link between the U.S. and European markets after the burst of the IT-bubble.
7.3 Multiresolution decomposition
A description of the long-run dynamics is achieved conveniently by a wavelet MRA. A
MODWT MRA(J = 14) of price using a LA(8) filter (with reflecting boundary) produces
a set of wavelet smooths with varying amounts of detail included. These smooths show,
for example, that Period I has a strong positive trend while Period III does not (see Figs.
12 and 13). Notice that all the smooths are automatically aligned in time with the original
series (see Eq. (14)). Furthermore, these smooths converge to the original price series as
more and more details are being added.52 Concerning volatility, however, very little can
be inferred from these figures.
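The additive structure of an MRA is easiest to see with the Haar filter rather than the LA(8) filter used in the text. The sketch below performs one level of a Haar MODWT with a circular (rather than reflecting) boundary, purely to illustrate the two properties relied on here: the detail and smooth add back to the original series, and the transform preserves energy.

```python
import numpy as np

def haar_modwt_mra_level1(x):
    """One level of a Haar MODWT multiresolution analysis (circular boundary).
    Analysis:  W_t = (x_t - x_{t-1})/2,  V_t = (x_t + x_{t-1})/2.
    Synthesis: D_t = (W_t - W_{t+1})/2,  S_t = (V_t + V_{t+1})/2,
    so that the detail and smooth add back to the original series: D + S = x."""
    xm1 = np.roll(x, 1)            # x_{t-1}, circularly
    W = (x - xm1) / 2.0            # wavelet (detail) coefficients
    V = (x + xm1) / 2.0            # scaling (smooth) coefficients
    D = (W - np.roll(W, -1)) / 2.0
    S = (V + np.roll(V, -1)) / 2.0
    return W, V, D, S

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 7.0])
W, V, D, S = haar_modwt_mra_level1(x)
```

Iterating the same two-step scheme on V (with filters upsampled at each level) produces the higher-level smooths shown in Figs. 12 and 13.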
51 Of course, it is possible that other market places affect volatility at the HEX too, but as will be demonstrated later (in Sec. 7.6), the average intraday volatility peaks consistently at the opening of the New York market. In the literature, volatility spillover effects have been reported, for example, by Engle et al. (1990). A possible source of "meteor showers" (as they call them) is heterogeneous expectations (Hogan and Melvin (1994)).
52 These "moving averages" could be applied in, e.g., forecasting in the spirit of "double" and "triple crossing" methods (methods that are briefly discussed in Gencay et al. (2002a, pp. 48-49)).
75
ACF of Returns (Period I)
AC
F
0 50 100 150 200 250 300
-0.0
4-0
.01
0.01
ACF of Absolute Returns
Lag (5 minutes)
AC
F
0 50 100 150 200 250 300
0.0
0.1
0.2
0.3
Figure 10: The sample ACFs of returns and absolute returns in Period I. The 95% confi-
dence interval (dashed line) is for Gaussian white noise: ±1.96/√N .
76
Figure 11: The sample ACFs of returns and absolute returns in Period III. [Lag axis in 5-minute units.]
Figure 12: Price series of Period I and its wavelet smooths of varying levels. [Panels: log(Price) and smooths of levels 6, 8, 10, 12, and 14.]
Figure 13: Price series of Period III and its wavelet smooths of varying levels. [Panels: log(Price) and smooths of levels 6, 8, 10, 12, and 14.]
In order to study volatility at different time-scales, the MODWT(J = 12) is applied to absolute returns using LA(8) (with reflecting boundary). The first 12 wavelet levels with the corresponding time-scales and associated changes are listed below (see Table 4). Notice that I here use Daubechies' indexing, so that a bigger level is associated with a larger time-scale (instead of Mallat's convention; see Sec. 4.2).53 When interpreting the time-scales in "calendar time", one should be careful with the period in question since the length of the trading day has varied (see Sec. 7.1). So, for instance, in Period I the first 6 levels correspond to intraday (and daily) dynamics capturing frequencies 1/64 ≤ f ≤ 1/2, i.e. oscillations with a period of 10-320 minutes (approx. 5 hours). In Period III, on the other hand, the first 7 levels correspond to intraday (and daily) dynamics capturing frequencies 1/128 ≤ f ≤ 1/2, i.e. oscillations with a period of 10-640 minutes (approx. 11 hours). In terms of changes (not oscillations), then, the 6th level in Period I corresponds to approximately half of a trading day. In Period III this corresponds to the 7th level. These levels will serve as a watershed between intraday and interday dynamics.
The MODWT wavelet coefficients at different levels j are useful as a descriptive tool
(see Figs. 14—15 and 16—17). The approximate zero-phase filter property (i.e. alignment
in time) is readily apparent: rapid changes in volatility stand out at the smallest scales
(i.e. highest frequencies). As the scale gets bigger (frequency lower), the changes tend
to be smoothed out because a wider filter averages more. For example, in Period III the
large spike in volatility between observations 5,000 and 10,000 (see Fig. 16) has died out already at the 6th level (see Fig. 17). On the other hand, the spike between 10,000 and 15,000 continues to prevail even at the 10th level. This means that the former spike
was a high-frequency event only while the latter was a more severe and longer lasting
burst of volatility. Short-time speculators and long-term investors would then have to
react differently in such an event: the former would be a concern for speculators but the
latter would interest investors as well. And since the wavelet coefficients in theory form
a stationary series at each level (see Sec. 5.5), the same statistical characteristics should
53An unfortunate consequence of the dyadic dilation is that time-scales become coarse rapidly so that
not all of the potentially interesting scales are recovered. Thus the non-dyadic extension (Pollock and Lo
Cascio (2003, 2004)) might be worthwhile to look at (see Footnote in Sec. 4).
Table 4: Wavelet levels and time-scales.
Level Scale Associated with changes of
1 1 5 min.
2 2 10 min.
3 4 20 min.
4 8 40 min.
5 16 80 min.
6 32 160 min. ≈ 3 h.
7 64 320 min. ≈ 5 h.
8 128 640 min. ≈ 11 h.
9 256 1280 min. ≈ 21 h.
10 512 2560 min. ≈ 43 h.
11 1024 5120 min. ≈ 85 h.
12 2048 10240 min. ≈ 171 h.
persist in the future also (forecasting is not considered explicitly here, however).54
The MODWT coefficients lend themselves to a quantitative study because of their energy-preserving property (see Eq. (15)). A few general observations can be made immediately. For example, the unconditional distributions converge from a highly leptokurtic distribution to a Gaussian one (see Figs. 18 and 19). In Period I, for instance, Gaussianity is reached at the 10th level, where the Jarque-Bera test statistic is 5.1067 (with a p-value of 0.07782). In Period III it is 7.8698 (0.01955) at the 10th level, but the distribution continues to vary its shape. Notice that the mean is zero at all levels (a consequence of the zero-average property of wavelets) while the range gets constantly smaller as the levels grow. A closer look at the unconditional wavelet variance is the next topic.
54This does not imply, however, that for example the ”cycle” visible at the end of the 10th level of
Period I is something that could be easily exploited. While it is not an artifact of the wavelet filter,
there is simply no reason why such a cycle should persist in an efficient stock market. A wavelet based
forecasting tool for short and long-memory time series is presented for example by Renaud et al. (2002)
who argue the concept to be ”very simple and easy to implement” with ”significant potential”.
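The Jarque-Bera statistic quoted above combines sample skewness and kurtosis, JB = N/6 (S² + (K − 3)²/4), and is asymptotically χ²(2) under normality. A minimal sketch:

```python
import numpy as np

def jarque_bera(x):
    """Jarque-Bera statistic JB = N/6 * (S^2 + (K - 3)^2 / 4), where S and K are
    the sample skewness and kurtosis; asymptotically chi-squared with 2 df."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dev = x - x.mean()
    m2 = np.mean(dev**2)
    S = np.mean(dev**3) / m2**1.5          # sample skewness
    K = np.mean(dev**4) / m2**2            # sample kurtosis (Gaussian: 3)
    return n / 6.0 * (S**2 + (K - 3.0)**2 / 4.0)

# symmetric, thin-tailed toy data: S = 0, K = 1.7, so JB = 5/6 * 1.69/4
jb = jarque_bera(np.array([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Applied level by level to the MODWT coefficients, a JB value below the χ²(2) critical value (5.99 at the 5% level) is what "Gaussianity is reached" means above.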
Figure 14: Volatility (absolute returns, January 4, 1999 - August 31, 2000) and the MODWT wavelet coefficients of Period I (j = 2 and 4).
Figure 15: The MODWT wavelet coefficients of Period I (j = 6, 8 and 10).
Figure 16: Volatility (absolute returns, April 17, 2001 - March 30, 2002) and the MODWT wavelet coefficients of Period III (j = 2 and 4).
Figure 17: The MODWT wavelet coefficients of Period III (j = 6, 8 and 10).
Figure 18: Unconditional distributions in Period I (j = 1, ..., 12). [Histograms of the wavelet coefficients at each level.]
Figure 19: Unconditional distributions in Period III (j = 1, ..., 12). [Histograms of the wavelet coefficients at each level.]
7.4 Global scaling laws and long-memory
Several authors have provided evidence of monotonic scaling laws in the FX markets (e.g.
Muller et al. (1990, 1993), Guillaume et al. (1997), and Andersen et al. (2000)) but less
so in the stock markets. This is probably because of the larger turnover, higher liquidity,
and lower transaction costs in the FX markets. However, in both markets it is possible
that a single scaling factor is appropriate only in a subset of time-scales. To study this
”multi-scaling” in the FX markets, Gencay et al. (2001) used wavelet methodology. Using
absolute returns as a volatility proxy, they confirmed that a different scaling regime exists for intraday time-scales than for interday and larger time-scales. It is now interesting to see whether (i) the same phenomenon appears with stock market data, (ii) the scaling factor is stable in time, and (iii) there is any reasonable explanation for such phenomena.
The MODWT coefficients of absolute returns of Periods I and III were again formed
by the MODWT(J = 12) using LA(8). The reflecting barrier seemed to suffer less from
the boundary effects at large levels than the periodic one so the former was used (see the
top subplots of Fig. 20). The good localization properties of wavelets are able to reveal
that most of the total energy of volatility is located at the smallest time-scales (the highest
frequencies).55 The relationship is actually approximately hyperbolic which is observed
as an approximate linear relationship on a double-logarithmic scale. Notice that although
there is no reason a priori to exclude any specific time-scale from the analysis, the results
from the smallest time-scale (level 1) in Period III are to be interpreted a bit cautiously
because of the negative autocorrelation in returns. Notice also that only the Gaussian
confidence bands were calculated (see the bottom plots of Fig. 20). With this in mind, there seem to exist two different scaling regions in Period I, with a visible break at the seventh level associated with 320-minute changes, or oscillations with a period of approximately 640 minutes (see Fig. 21). The first six levels capture frequencies 1/64 ≤ f ≤ 1/2, i.e. oscillations with a period of 10-320 minutes, corresponding to intraday dynamics in Period I.56 The seventh and higher levels are related to one-day and longer dynamics.
One might expect a break at the 7th level in Period III because of its longer trading day,
55 As Dr. Stephen Pollock suggested to me at the "Workshop on Computational Econometrics and
Statistics" (Neuchatel, Switzerland 2004), financial markets tend to "shriek" under stress.
56 The first level is discarded below to minimize microstructure effects (just in case).
[Figure 20: four panels of log(Wavelet variance) against scale index 1−12: Period I (a), Period I (b), Period III (a), Period III (b).]
Figure 20: Wavelet variances of Periods I (on left) and III (right) on a double-logarithmic
scale. The upper plots show the result using reflecting (continuous line) and periodic
(dotted) boundaries. The lower plots show the Gaussian 95% confidence intervals with the
reflecting boundary only.
[Figure 21: log(Wavelet variance) against scale index 1−12, Periods I and III overlaid.]
Figure 21: The wavelet variances of Periods I (continuous line) and III (dashed). The Gaussian
95% confidence interval (dotted) of Period III has been drawn to address the significance.
[Figure 22: level 1 MODWT wavelet coefficients plotted against time for Periods I and III.]
Figure 22: The 1st level MODWT wavelet coefficients of Periods I and III compared.
but there is only a slight one at the 6th level. Indeed, the difference between the scaling
laws of Periods I and III is most evident at level 6. This is visible even by eye when the
wavelet coefficients of level 6 are compared to each other (see Figs. 15 and 17). Clearly
Period I experienced more middle-sized jumps at this particular time-scale than Period III
did (and hence Period I looks fatter). This observation is, however, not enough
to explain the extremely jumpy look of Period I. Because sudden jumps are high-frequency
events, they should be well captured by the 1st level. This intuition is confirmed by the
1st level wavelet variance of Period I, which also lies outside the 95% confidence interval of
Period III (see Fig. 21). By plotting the level 1 wavelet coefficients side by side,
the difference becomes obvious (see Fig. 22). So the "more volatile" outlook of Period I
is mainly caused by the different dynamics at levels 1 and 6, corresponding to 5-minute
and approximately 3-hour changes, respectively. Now the difference in the overall level
of volatility can be attributed to specific time-scales and to short-run speculators in general.
More precisely, the jumps at the 1st level measure the flow of new information and the
general level of nervousness of the market. It is easily confirmed that most of these jumps
are caused by overnight returns (for reasons stated in Sec. 7.2). The difference at the 6th
level is not so easily interpretable, though. It may be due to the volatility seasonality that
is particularly strong in Period I. This will be studied more carefully later (in Sec. 7.6).
Scaling laws are intimately related to the memory of the DGP. The observed initial
rapid decay of the sample autocorrelation followed by a very slow rate of dissipation (see
Sec. 7.2) is characteristic of slowly mean-reverting fractionally integrated processes that
exhibit a hyperbolic rate of decay (i.e. long-memory).57 From a statistical point of view
the quantification of this decay is important, as standard statistical tools for inference are
invalid in the case of long-memory. For example, standard errors for the estimates of
the coefficients of ARCH or stochastic volatility models would be incorrect, and hence so
would the confidence intervals for predictions (Lobato and Savin (1998); see also Beran (1994)).
Economically, long-memory has consequences for option pricing. For instance, long-memory
has a significant impact upon the term structure of implied volatilities (see Taylor (2000)).
And of course, estimation of the fractional differencing parameter d allows the use of
long-memory stochastic volatility models (such as LMSV) for simulation and forecasting.
57 Basic ARCH models exhibit an exponential rate of decay and fail in this respect (see e.g. Bollerslev and
Mikkelsen (1996), Breidt et al. (1998), Ding et al. (1993), and Granger and Ding (1996)).
The semiparametric wavelet-domain method is in theory better suited for estimating
the rate of the decay than the one based on the spectral density (see Sec. 5.5). Using Equation
(18), the fractional differencing parameter d is therefore estimated for Periods
I and III by OLS. The same type of approach has been used by Jensen (2000) and
Tkacz (2000), for example. Following Ray and Tsay (2000), the standard errors obtained
from regression theory are used to judge significance. Overall, the estimates of d
support the conjectured long-memory (see Table 5). Period I has a slightly larger value
than Period III, which could in principle be caused by the strong trend in the former.
However, because of the 4 embedded differencing operations of LA(8) (see Sec. 4.5), this
is unlikely, as Craigmile et al. (2004) have shown. The coefficients using levels j1 = 2, ..., 6
and j2 = 7, ..., 10 in Period III do not differ statistically, but the results of Period I are not
as clear-cut (and are discussed later). The relatively short time-span used (approx.
1.5 years) can be criticized in this context, and it has in fact been a topic of debate in past
years. It is true that when estimating long-memory dependencies in the mean, the small-sample
bias depends crucially on the time-span of the data. But the most recent evidence
(see Andersen and Bollerslev (1997a, 1997b) and Bollerslev and Wright (2000)) suggests
that the performance of the estimates from the volatility series may be greatly enhanced
by increasing the observation frequency instead of the time-span. In particular, Bollerslev
and Wright (2000) have argued that high-frequency data allow for vastly superior and
nearly unbiased estimation of d.
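The slope-to-d mapping behind Table 5 can be sketched as follows. This is a minimal illustration under the assumption that the wavelet variance follows a pure power law across levels (Equation (18) gives the exact regression used in the text; the helper names are mine):

```python
import numpy as np

def d_from_slope(beta):
    # If the wavelet variance scales as tau^(2d-1), the slope of
    # log2(wavelet variance) against the level index is 2d - 1.
    return (beta + 1.0) / 2.0

def estimate_d(levels, wavelet_vars):
    # OLS fit of log2 wavelet variance on the level index
    slope, _intercept = np.polyfit(levels, np.log2(wavelet_vars), 1)
    return d_from_slope(slope)

# Sanity check on exact power-law data with d = 0.2:
levels = np.arange(2, 11)
wavelet_vars = 2.0 ** ((2 * 0.2 - 1) * (levels - 1))
print(round(estimate_d(levels, wavelet_vars), 3))  # -> 0.2
```

Applied to the Coefficient column of Table 5, d_from_slope(−0.62044) reproduces the reported d̂ = 0.18978 for levels 2−10 of Period I.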
7.5 Local scaling laws and long-memory
The assumption of a constant long-memory structure may not always be reasonable.
Bayraktar et al. (2003) tackled the problem of time-varying long-memory by segmenting
the data before estimating the Hurst coefficient H(t) (a closely related measure
of long-memory; see e.g. Beran (1994)). But this scheme might not always be sufficient,
as Whitcher and Jensen (2000) have pointed out. In particular, they argued that "the
ability to estimate local behavior by applying a partitioning scheme to a global estimating
procedure is inadequate when compared with an estimator designed to capture time-varying
Table 5: OLS-regression results.

Period I
Levels   Coefficient   Std. error   t-value   P(>|t|)    d̂
2−10     −0.62044      0.05016      −12.37    5.19e−06   0.18978
2−6      −0.44433      0.09108      −4.879    0.01646    0.277835
7−10     −0.37730      0.02267      −16.65    0.003590   0.31135

Period III
Levels   Coefficient   Std. error   t-value   P(>|t|)    d̂
2−10     −0.64082      0.02061      −31.10    9.18e−09   0.17959
2−6      −0.56431      0.04396      −12.84    0.00102    0.217845
7−10     −0.57769      0.03998      −14.45    0.00475    0.211155
features". In this respect the work of Goncalves and Abry (1997), who estimated a local
scaling exponent for continuous-time multifractal Brownian motion, seems more appropriate.
Unfortunately, their approach involves the construction of non-standard wavelets,
which hinders practical implementation. To overcome this difficulty, Whitcher and Jensen
(2000) introduced an estimator based on the MODWT that allowed them to stay in the
traditional ARFIMA framework.
In contrast to Jensen and Whitcher (2000), who use log-squared returns to proxy volatility,
I continued to use absolute returns in order to prevent an inlier problem (see Sec. 6.1).
For Equation (20) to hold, I then implicitly assume that absolute returns are generated
by a locally stationary process. Considering the jumps and clustering of volatility, this
assumption seems more reasonable than covariance stationarity (although no formal tests
were conducted). Using only the smallest levels in the OLS-regression resulted in estimates
of d(u) that varied in a white-noise fashion. This is in agreement with the argument
of Jensen and Whitcher that intraday levels are irrelevant to long-memory phenomena.
But it contradicts the finding of Andersen and Bollerslev (1997b) that intraday volatility
can be informative even in the long run. Because of this contradiction, and the fact
that 5-minute returns are still subject to bid-ask bounce at the first lag in Period III, I
considered only the 1st level as uninformative. On the other hand, using only the larger
levels resulted in severe instability of the estimate, most probably due to the small support
(only 4 levels).58 Unsurprisingly, levels 2−10 together gave the most reliable results and therefore they are cited below.
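The local estimation step can be sketched in the spirit of the Whitcher-Jensen estimator. The MODWT coefficients themselves are assumed to be computed elsewhere, the smoothing of the squared coefficients is omitted, and the array layout and names are mine:

```python
import numpy as np

def local_d(W_sq, levels):
    """Estimate d(t) at each time point from squared MODWT coefficients.

    W_sq: array of shape (len(levels), T); row k holds the squared,
    time-aligned MODWT coefficients of level levels[k].
    """
    logv = np.log2(W_sq)  # local log2 "wavelet variances"
    T = W_sq.shape[1]
    d = np.empty(T)
    for t in range(T):
        # same slope-to-d mapping as in the global OLS regression
        slope, _ = np.polyfit(levels, logv[:, t], 1)
        d[t] = (slope + 1.0) / 2.0
    return d
```

In practice the squared coefficients would be smoothed over a window before the regression; this unsmoothed version only illustrates the level-wise regression at each t.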
Consistent with the global results, the median of the local long-memory parameter
estimate of Period I is larger than that of Period III (see Table 6). In general, the
estimate of d(t) shows similar characteristics in both periods (see Figs. 23 and 24): most
of the time the estimate stays in the interval (0, 1/2), indicative of stationary long-memory,
although "outliers" tend to pull the estimate downwards and "out of bounds". Fortunately,
however, the occasional crossovers should not present a serious modeling problem since
the process is mean-reverting.59 The estimate also stabilizes during less volatile times.
Notice that an increase in the estimate during periods of steady growth (or decline) is in
agreement with the definition of long-memory. Furthermore, the estimates are unconditionally
Gaussian (see Fig. 25). The distributions are slightly skewed to the left because
of the large drops (especially in Period I). The idea of finding a structure in d(t) (such as
ARFIMA) that could be exploited in forecasting may be hampered by the possibility of
the long-memory being spurious, however. For instance, structural breaks might have biased
the estimate of d(t) upwards, in which case the timing and size of the breaks would
become an equally important research problem (see Granger and Hyung (1999)). Fortunately,
no clear sign of a structural break is visible in either period. In fact, one of the
reasons for the division of the data was to avoid this argument (see Sec. 7.2). Regarding
modeling with the locally stationary LMSV model (see Sec. 6.2), the possibility of
misspecification has to be acknowledged, too. In fact, the medians of d(t) in both periods
differ from (are larger than) the corresponding ds, although they are expected to match
quite closely (see Sec. 5.5). It is possible that the jumps that pull the estimate downwards
are too frequent and severe. The modeling of d(t) and the potential problems involved are
not studied further here, however.
58 Recall that levels larger than level 10 were seriously affected by the boundary and were therefore
excluded. Nevertheless, it is probable that including levels 11 and 12 would stabilize the estimate a bit,
but this is computationally very costly.
59 I am indebted to Prof. Christian Gourieroux for pointing this out at the Economics and Econometrics
of the Market Microstructure Summer School (Constance, Germany 2004).
[Figure 23: three panels against time for Period I: local long-memory parameter estimates d(t), log returns, and log price.]
Figure 23: Local long-memory parameter estimates of Period I. Return and price series
are plotted below to align the features in time.
[Figure 24: three panels against time for Period III: local long-memory parameter estimates d(t), log returns, and log price.]
Figure 24: Local long-memory parameter estimates of Period III.
[Figure 25: time paths and histograms of the d(t) estimates for Periods I and III.]
Figure 25: Unconditional distributions of the estimate of d(t) are Gaussian.
Table 6: Statistical key figures of time-varying long-memory.

Period I
Min.      1st Q.   Med.     Mean     3rd Q.   Max.
−0.3426   0.2084   0.3117   0.3020   0.4056   1.0313
Jarque–Bera: X² = 5.5039, df = 2, p = 0.0638

Period III
Min.      1st Q.   Med.     Mean     3rd Q.   Max.
−0.3287   0.1663   0.2505   0.2412   0.3262   0.6624
Jarque–Bera: X² = 3.6219, df = 2, p = 0.1635
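For reference, the Jarque–Bera statistic quoted in Table 6 combines the sample skewness S and excess kurtosis K as JB = (n/6)(S² + K²/4), which is asymptotically chi-squared with 2 degrees of freedom under Gaussianity. A minimal moment-based sketch (no small-sample corrections; the function name is mine):

```python
import numpy as np

def jarque_bera(x):
    # JB = n/6 * (S^2 + K^2/4), where S is the sample skewness and
    # K the sample excess kurtosis, both from central moments.
    x = np.asarray(x, dtype=float)
    n = x.size
    m = x - x.mean()
    s2 = np.mean(m ** 2)
    S = np.mean(m ** 3) / s2 ** 1.5
    K = np.mean(m ** 4) / s2 ** 2 - 3.0
    return n / 6.0 * (S ** 2 + K ** 2 / 4.0)
```

Large JB values reject Gaussianity; since the p-values in Table 6 (0.0638 and 0.1635) exceed 0.05, Gaussianity is not rejected at the 5% level, consistent with Fig. 25.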
7.6 Effects of volatility periodicity
The effect of intraday volatility periodicity on the global and local scaling laws and long-memory
will now be studied. Martens et al. (2002) have argued that the intraday patterns
in the FX and stock markets are so distinctive that there is "a strong case for taking them
into account before attempting to model the dynamics of volatility".60 Note first that, on
average, the shape of intraday volatility is similar in Periods I and III (see Figs. 26 and 27).
Note also that the overnight 5-minute returns are now excluded. Although this clearly has
not removed all the overnight effects (the first intraday interval still exhibits considerably
larger volatility than the rest of the intervals), the first interval is not modified. After the
highly volatile first 5 minutes the average volatility calms down smoothly and stabilizes,
although some predetermined days have significant turbulence at midday because of the
announcement of quarterly reports; a finding that is consistent with the results obtained
by Felixson (2004) at the HEX. In the afternoon hours, however, the behavior of volatility
becomes abrupt again. The first peak occurs at 3:35 p.m. and the next one at 4:35. The
former 5-minute peak is most probably due to regular U.S. macro news announcements61
and the latter is the New York effect (reported in Sec. 7.2). There is also a small but
distinct 5-minute peak half an hour later, at 5:05, which is probably caused by macro news,
too. In Period III the highest peak is at 6:05 p.m. (and right after it) when the AMT (I)
starts. It is very likely that this is only an artifact which the 3%-filter (described in
Sec. 7.1) was unable to remove totally. The last 5 minutes of trading also experience a
60 In the FX markets the periodicity is clearly associated with the opening and closing of various financial
centers around the world (see e.g. Dacorogna et al. (1993) or Andersen and Bollerslev (1997a, 1997b)).
61 The most important U.S. macro news announcements are released at 8:30 a.m. (e.g. the employment report)
and 10 a.m. (e.g. the Humphrey–Hawkins testimony) Eastern Standard Time.
sudden but relatively small increase in volatility in both periods. Thus the general average
pattern is an "inverse J". Such a pattern has been reported in other markets too (see e.g. Wood
et al. (1985), Harris (1986), and Lockwood and Linn (1989)).
The wavelet method could be used to annihilate intraday dependencies. This would
work in a similar fashion to the low-pass filtering technique based on a two-sided weighted
average of both past and future absolute returns used in Andersen and Bollerslev (1997b).
Unfortunately, though, by considering the interdaily and longer dynamics (i.e. the wavelet
smooth of level J ≥ 6) as proposed by Gencay et al. (2000), I was not able to reproduce the hyperbolic decay in the sample ACF of the filtered series. So the intraday seasonalities
were removed by the Fourier flexible form (FFF) instead (see App. E). Luckily the FFF
is straightforward to execute, and it has previously been successfully applied in the FX as
well as stock markets (see e.g. Andersen and Bollerslev (1997c, 1998)). Although the results
are quite adequate here as well, it may be that the FFF is not so well suited for individual
stocks, which tend to have more outliers and abrupt volatility patterns. Clearly the results
also depend in a fairly complex way on how many sinusoids P are included in the regression.
Simply increasing P does not necessarily imply a better result in every respect: the
filtered returns tend to become more autocorrelated (see the top subplot of Fig. 28). On
the other hand, a small P does not necessarily remove all the intraday patterns (see the
bottom subplots of Figs. 28 and 29). My choice was to settle for the smallest number of
sinusoids that gave a reasonable fit. This meant setting P = 3 and 4 for Periods I and III,
respectively. Although Andersen and Bollerslev (1997c) argued that the daily volatility factor
(i.e. J ≥ 1) could be important in the FFF-regression in the case of stock markets, here it did not make much of a difference and it was therefore left out (in fact, using
J = 1 only seemed to emphasize some outliers in the filtered return series). The inclusion
of dummy variables was found particularly important when J = 0, however. In Period I
two dummies, at n = 1 and 61, were considered essential (see below), corresponding to the
market opening (10:35 a.m.) and the U.S. macro news announcements (3:35 p.m.). In
Period III additional dummies had to be introduced at n = 97, 98 and 99, corresponding
to the beginning of the AMT (I). To be precise, the FFF-regression fit for Period I is
f̂(θ; n) = −4.06 (0.94)*** + 1.29 (2.69) n/N1 − 0.27 (0.90) n²/N2 + 2.29 (0.34)*** 1{n=d1} + 1.23 (0.27)*** 1{n=d2}
          + [ 1.20 (0.52)* cos(2πn/N) + 0.23 (0.10)* sin(2πn/N) + 0.33 (0.13)* cos(4πn/N) − 0.05 (0.06) sin(4πn/N)
          + 0.14 (0.07)* cos(6πn/N) − 0.06 (0.05) sin(6πn/N) ],
while for Period III the fit is
f̂(θ; n) = 1.85 (0.87)* − 11.17 (2.54)*** n/N1 + 3.54 (0.84)*** n²/N2 + 1.62 (0.27)*** 1{n=d1} + 0.74 (0.22)*** 1{n=d2}
          + 0.14 (0.23) 1{n=d3} + 1.32 (0.23)*** 1{n=d4} + 0.91 (0.23)*** 1{n=d5}
          + [ −2.19 (0.50)*** cos(2πn/N) − 0.55 (0.07)*** sin(2πn/N) − 0.63 (0.13)*** cos(4πn/N) + 0.54 (0.04)*** sin(4πn/N)
          + 0.13 (0.06)* cos(6πn/N) + 0.18 (0.04)*** sin(6πn/N) − 0.03 (0.04) cos(8πn/N) − 0.04 (0.03) sin(8πn/N) ],
where the numbers in parentheses are standard errors and the asterisks are significance
codes (for the 0.001, 0.01, and 0.05 levels).62
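The structure of these regressions can be illustrated by constructing the FFF design matrix: a quadratic in the interval index n, event dummies, and P cosine-sine pairs. This is only a sketch: the function name is mine, and the normalizers N1 = (N+1)/2 and N2 = (N+1)(N+2)/6 follow the usual Andersen–Bollerslev convention, which is an assumption here since the text does not spell them out:

```python
import numpy as np

def fff_design(N, P, dummy_intervals):
    # N: number of intraday intervals; P: number of sinusoid pairs;
    # dummy_intervals: interval indices that get an event dummy.
    n = np.arange(1, N + 1, dtype=float)
    N1 = (N + 1) / 2.0                    # assumed normalizer for n
    N2 = (N + 1) * (N + 2) / 6.0          # assumed normalizer for n^2
    cols = [np.ones(N), n / N1, n ** 2 / N2]
    cols += [(n == d).astype(float) for d in dummy_intervals]
    for p in range(1, P + 1):             # cosine-sine pairs
        cols.append(np.cos(2 * np.pi * p * n / N))
        cols.append(np.sin(2 * np.pi * p * n / N))
    return np.column_stack(cols)

# Period I shape: P = 3 pairs and dummies at n = 1 and 61 (N illustrative).
X = fff_design(100, 3, [1, 61])
# theta would then be estimated by OLS, e.g.:
# theta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The estimated coefficients in the equations above correspond, column by column, to such a design.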
It is easy to observe from the figures that the volatility shocks associated with predetermined
events (used as dummies) are short-lived. In fact, a new equilibrium price
is found in just 5 minutes, which is consistent with the findings from the FX (Andersen
and Bollerslev (1998)) and the U.S. treasury bond markets (Bollerslev et al. (2000))
(see also Engle and Ng (1993)). Following Andersen and Bollerslev (1998, p. 244), the
regression coefficients for the dummy variables can be interpreted such that
volatility for intervals n = 1 and 61 in Period I increased by exp(2.29/2) ≈ 3.14 and 1.85 percent, respectively. The point estimates imply that these events were most probably
also economically, not only statistically, significant. Their usage in constructing
arbitrage strategies is, however, limited by the fact that the sign of the change is unknown.
In Period III the effects of these same events were a bit weaker, accounting only for 2.25
and 1.45 percent, respectively. This indicates that markets reacted more strongly in
Period I than in Period III; an observation that is not surprising and is consistent with
observations made earlier (in Sec. 7.2). And just by eyeballing the average volatility
pattern, the New York effect is seen to be relatively larger in Period I, too.
62 The fit in the figures is obtained by applying |r_{t,n} − r̄| = σ̂_t N^{1/2} exp(f̂(θ; σ_t, n)/2) exp(û_{t,n}/2). Andersen and Bollerslev (1997c) are vague about this point (especially about the scaling factor).
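The translation from dummy coefficients to volatility responses can be checked directly; following Andersen and Bollerslev (1998), as in the text, a coefficient c maps into the factor exp(c/2):

```python
import math

# Volatility response factors implied by the dummy coefficients of the
# FFF fits above (coefficient values taken from the estimated equations).
for label, c in [("Period I,   n = 1 ", 2.29), ("Period I,   n = 61", 1.23),
                 ("Period III, n = 1 ", 1.62), ("Period III, n = 61", 0.74)]:
    print(label, "->", round(math.exp(c / 2), 2))
# prints the factors 3.14, 1.85, 2.25 and 1.45 quoted in the text
```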
[Figure 26: average volatility against the index of the intraday interval, Period I, with its FFF fit.]
Figure 26: Average intraday volatility of Period I (continuous line) and its FFF fit (dashed).
[Figure 27: average volatility against the index of the intraday interval, Period III, with its FFF fit.]
Figure 27: Average intraday volatility of Period III (continuous line) and its FFF fit (dashed).
[Figure 28: sample ACFs against lag of filtered returns and filtered absolute returns, Period I.]
Figure 28: Sample ACFs of filtered returns and absolute returns of Period I. The 95%
confidence interval (dashed line) is for Gaussian white noise: ±1.96/√N.
[Figure 29: sample ACFs against lag of filtered returns and filtered absolute returns, Period III.]
Figure 29: Sample ACFs of filtered returns and absolute returns of Period III.
To be able to compare the scaling laws of the periodicity-filtered returns to the original
series, the overnight returns must be omitted. Naturally this reduces the total energy of the
series. The form of the scaling laws remains about the same, though (see the upper subplots
of Fig. 30). Comparing the scaling laws, one notices that the removal of the
intraday periodicity has smoothed out the kink at the 6th level. In Period I, in particular,
the wavelet variances at levels larger than the 6th have increased considerably. The law is
still not linear, however. The first region includes time-scales smaller than an hour and the
second one the rest, up to 2560 minutes (approx. 43 hours). Interestingly, almost all of
the periodicity-filtered wavelet variances are significantly different from the original ones (with
the overnight returns excluded). In Period III, the change in the distribution of energy
across the scales is not as dramatic, but the slight kink at the 6th level has disappeared
as well. In both periods the wavelet variances at levels 11 and 12 are still behaving wildly,
thus supporting the claim that the largest levels suffer from boundary effects.
Regarding long-memory, the estimates of d have increased in both periods (see Table
7). This is in contrast to what Bayraktar et al. (2003) found; they argued that the OLS-based
wavelet variance estimation is robust to seasonalities. One possible explanation
for this discrepancy is the FFF method used here (Bayraktar et al. use a different method):
although the mean of ŝ_{t,n} (see Eq. (23) in App. E) is 1, so that on average the returns are not changed, it may be that the second-order characteristics are affected unduly (this
is not studied here, though). Also, it is known that OLS is sensitive to jumps, which are
indeed more frequent in Period I than in Period III. This is however an unlikely cause,
because most of the jumps are overnight returns whose exclusion did not seem to change
the shape of the scaling law. Besides, the OLS was fit to the logarithms, which reduces
the impact of outliers. It is more probable that the removal of the originally stronger
intraday periodicity in Period I is really the cause of the larger change in the estimate of
d (in Period I the change is from 0.18978 to 0.27583, while in Period III the change is from
0.17959 to 0.226605).
Considering the results thus far, it is no surprise that the removal of volatility periodicity
affects the local long-memory estimates as well. In fact, the median of the time-varying
d(t) increased by approximately 0.03 in both periods (see Table 8): in Period I the increase
[Figure 30: four panels of log(Wavelet variance) against scale index 1−12: Period I (a)/(b) and Period III (a)/(b).]
Figure 30: The scaling laws of the overnight-return-excluded series (continuous line in the top
subplots) lie below the original scaling laws (dark), especially in Period I. The removal
of volatility periodicity (dark in the bottom subplots) smooths out the kink found at the
6th level previously. The Gaussian 95% confidence intervals are drawn (dotted thin line)
to address the significance of the periodicity.
Table 7: OLS-regression results (filtered volatility).

Period I
Levels   Coefficient   Std. error   t-value   P(>|t|)    d̂
2−10     −0.44833      0.03343      −13.41    3.01e−06   0.27583
2−6      −0.33260      0.09553      −3.482    0.040006   0.3337
7−10     −0.54285      0.01986      −27.34    0.00134    0.228575

Period III
Levels   Coefficient   Std. error   t-value   P(>|t|)    d̂
2−10     −0.54679      0.01549      −35.29    3.81e−09   0.226605
2−6      −0.621757     0.006855     −90.7     2.95e−06   0.1891215
7−10     −0.46923      0.03197      −14.68    0.00461    0.265385
Table 8: Statistical key figures of time-varying long-memory (filtered volatility).

Period I
Min.      1st Q.   Med.     Mean     3rd Q.   Max.
−0.3737   0.2591   0.3464   0.3415   0.4300   1.1873
Jarque–Bera: X² = 12.322, df = 2, p = 0.0021

Period III
Min.      1st Q.   Med.     Mean     3rd Q.   Max.
−0.3446   0.1992   0.2803   0.2682   0.3481   0.7073
Jarque–Bera: X² = 5.3449, df = 2, p = 0.0691
is from 0.3117 to 0.3464 and in Period III from 0.2505 to 0.2803. Notice that this increase is
more modest than in the global estimation, and that the global estimates and the medians
of the local estimates continue to differ from each other. In Period I the FFF method has
increased the amplitude of some big returns, thus making the range of the estimates of
d(t) wider and the unconditional distribution only nearly Gaussian. But in between the
jumps (which are less frequent than in the original return series) there now exist steadier
periods of growth (see the left-hand part of Fig. 31). Indeed, the 1st and 3rd quartiles
confirm that the unconditional distribution is more concentrated around the mean value.
In Period III there are no enlarged outliers and the path of the time-varying d(t) is, in
general, more stable too (see the right-hand part of Fig. 31). There the range of filtered
returns has become only a bit wider and the unconditional distribution remains Gaussian.
[Figure 31: time paths and histograms of the d(t) estimates for Periods I and III, filtered volatility.]
Figure 31: Unconditional distributions of the estimate of d(t).
8 Conclusions
This licentiate thesis has presented wavelet methodology in a manner that should be
accessible to econometricians. First, wavelet theory is contrasted with the more traditional
frequency-oriented Fourier analysis in a deterministic environment. It is then argued that
the local adaptiveness of wavelets makes them ideally suited for analyzing high-frequency
data where most of the energy is located in jumps, clusters, and other non-stationarities.
Finally, wavelets are applied in a stochastic environment. Using the liquid stock of Nokia
as an example, it is shown that a wavelet multiresolution analysis can give useful new
insight into the dynamics (in this case the energy) of a key variable in finance: volatility.
This is motivated by the hypothesis that players in stock markets operate at multiple
time-scales, with different impacts on the whole.
One of the main findings is that wavelet variances at specific intradaily levels differ
from each other between the turbulent IT-bubble period and its more tranquil aftermath.
This is partly because the former is characterized by a larger number of jumps (attributable
to overnight returns) and a stronger New York effect. There is also some
evidence of stronger long-memory in the bubble period. Furthermore, this period experienced
multi-scaling, so that the traditional time-scale-invariant square-root scaling would
have been improper; intraday speculators and long-term investors faced inherently different
dynamics (in the sense of energy) that should be accounted for in risk management.
The removal of the intradaily periodicity does not change these findings qualitatively, so
they are indeed genuine differences between the two periods. Another finding is that in the
aftermath period the scaling law behaves in a nicer fashion, implying that the form of the
scaling law is not time-invariant. The intimate relationship between the scaling law and
the fractional differencing parameter d then implies that long-memory in volatility is not
time-invariant either. Standard models of long-memory do not take this into account. On
top of that, the removal of the volatility periodicity increases the global estimate in both
periods. This increase is larger in the bubble period, presumably because of its originally
stronger periodicity. This suggests that the periodicity must be taken into account too.
A local analysis is applied to have a closer look at the time-varying characteristics of the
scaling law and long-memory in volatility. The results support time-varying long-memory
over medium-length periods (months, say). A locally stationary stochastic volatility model
with time-varying parameters might therefore be a good choice for these and longer horizons.
As expected, the removal of the volatility periodicity stabilizes the local estimate
to some extent and increases its median. The bubble period retains its stronger long-memory
property, giving further support to the original conjecture of different long-range
dynamics. Again, the significant effect of the removal of the periodicity implies that the
estimation of d by the OLS-based wavelet method is not robust with respect to intraday
volatility seasonalities. Therefore the periodicity should first be taken care of by some
suitable method. In this thesis the Fourier flexible form was considered adequate, but it
might well be improved upon by methods that better account for the inherent roughness
of stock market volatility.
Although the specific dynamics and the underlying reason for long-memory are not addressed
here in detail, it is probable that the reported higher market activity and hype
caused the bubble period to experience stronger long-memory. While the long-memory
may be argued to be spurious and caused by structural breaks or a trend, the division into
two different periods and the embedded differencing operations of the wavelet filters minimize
the ambiguity and support the hypothesis of a true phenomenon. The precise identification
of structural breaks and the separation of a trend could however be helpful in evaluating
the economic significance of the estimates. These can be achieved by wavelet methods too,
and this is left for future research.
References
[1] Ait-Sahalia, Yacine (2003): Disentangling Volatility from Jumps. NBER working paper 9915. http://www.nber.org/papers/W9915.
[2] Abry, P. — Goncalves, P. — Flandrin, P. (1993): Wavelet-Based Spectral
Analysis of 1/f processes. IEEE International Conference on Acoustics, Speech and
Signal Processing, Munich, Germany.
[3] Abry, P. — Veitch, D. (1998): Wavelet Analysis of Long-Range-Dependent Traffic. IEEE Transactions on Information Theory 44, 2—15.
[4] Andersen, Torben G. — Bollerslev, Tim (1997a): Answering the Critics: Yes, ARCH Models Do Provide Good Volatility Forecasts. NBER working paper 6023. http://www.nber.org/papers/w6023.
[5] – (1997b): Heterogeneous Information Arrivals and Return Volatility Dynamics: Uncovering the Long-Run in High Frequency Returns. Journal of Finance 52, 975—1005.
[6] – (1997c): Intraday Periodicity and Volatility Persistence in Financial Markets. Journal of Empirical Finance 4, 115—158.
[7] – (1998): Deutsche Mark-Dollar Volatility: Intraday Activity Patterns, Macroeconomic Announcements, and Longer Run Dependencies. Journal of Finance 53, 219—265.
[8] Andersen, Torben G. — Bollerslev, Tim — Diebold, Francis X. — Labys, Paul (2000): The Distribution of Realized Exchange Rate Volatility. NBER working paper 6961. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=210889.
[9] Andersen, Torben G. — Bollerslev, Tim — Diebold, Francis X. (2003): Some Like it Smooth, and Some Like it Rough: Untangling Continuous and Jump Components in Measuring, Modeling, and Forecasting Asset Return Volatility. Working paper, Wharton Financial Institutions Center. http://fic.wharton.upenn.edu/fic/papers/03/p0333.html.
[10] Atkins, Frank J. — Sun, Zhen (2003): Using Wavelets to Un-
cover the Fisher Effect. Discussion paper 2003-09, University of Calgary.
Http://econ.ucalgary.ca/research/WP2003-09.pdf.
[11] Bachman, George — Narici, Lawrence — Beckenstein, Edward (2000):
Fourier and Wavelet Analysis. Springer-Verlag, New York, USA.
[12] Bai, Xuezheng — Russell, Jeffrey R. — Tiao, George C.
(2001): Beyond Merton’s Utopia (I): Effects of Non-Normality and De-
pendence on the Precision of Variance Estimates Using High-Frequency
Financial Data. Working paper, University of Chicago. Http://gsb-
www.uchicago.edu/fac/jeffrey.russell/research/variance.pdf.
[13] Baillie, Richard T. (1996): Long Memory Processes and Fractional Integration
in Econometrics. Journal of Econometrics 73, 5—59.
[14] Barndorff-Nielsen, Ole E. — Shephard, Neil (2003a): Power and Bipower
Variation with Stochastic Volatility and Jumps. Manuscript, Oxford University.
[15] – (2003b): Econometrics of Testing for Jumps in Financial Economics Using
Bipower Variation. Manuscript, Oxford University.
[16] Bayraktar, Erhan — Poor, H. Vincent — Sircar, K. Ron-
nie (2000): Estimating the Fractal Dimension of the S&P 500 In-
dex Using Wavelet Analysis. Submitted manuscript, Princeton University.
Http://www.math.lsa.umich.edu/˜erhan/SP500.pdf.
[17] Beran, Jan (1994): Statistics for Long Memory Processes, Volume 61 of Mono-
graphs on Statistics and Applied Probability. Chapman and Hall, New York.
[18] Bhattacharya, R.N. — Gupta, V.K. — Waymire, E. (1983): The Hurst Effect
under Trends. Journal of Applied Probability 20, 649—662.
[19] Black, Fischer (1976): Studies of Stock Price Volatility Changes. Proceedings of
the 1976 Meetings of the American Statistical Association, Business and Economics
Statistics Section, 177—181.
[20] Blair, Bevan J. — Poon, Ser-Huang — Taylor, Stephen J. (2001): Forecast-
ing S&P 100 Volatility: the Incremental Information Content of Implied Volatilities
and High-Frequency Stock Returns. Journal of Econometrics 105, 5—26.
[21] Bollerslev, Tim (1986): Generalized Autoregressive Conditional Heteroskedas-
ticity. Journal of Econometrics 31, 307—327.
[22] – (2001): Financial Econometrics: Past Developments and Future Challenges.
Journal of Econometrics 100, 41—51.
[23] Bollerslev, Tim — Chou, Ray Y. — Kroner, Kenneth F. (1992): ARCH
Modeling in Finance: A Review of the Theory and Empirical Evidence. Journal of
Econometrics 52, 5—59.
[24] Bollerslev, Tim — Engle, Robert F. — Nelson, Daniel B. (1994): ARCH
Models. In Engle and McFadden (eds.), Handbook of Econometrics, Vol. IV, 2959—
3038. Elsevier Science.
[25] Bollerslev, Tim — Cai, Jun — Song, Frank M. (2000): Intraday Periodic-
ity, Long Memory Volatility, and Macroeconomic Announcement Effects in the US
Treasury Bond Market. Journal of Empirical Finance 7, 37—55.
[26] Bollerslev, Tim — Mikkelsen, Hans-Ole (1996): Modeling and Pricing Long-
Memory in Stock Market Volatility. Journal of Econometrics 73, 151—184.
[27] Bollerslev, Tim — Wright, Johathan H. (2000): Semiparametric Estimation
of Long-Memory Volatility Dependencies: The Role of High-Frequency Data. Journal
of Econometrics 98, 81—106.
[28] Bougerol, Philippe — Picard, Nico (1992): Stationarity of GARCH Processes
and of Some Nonnegative Time Series. Journal of Econometrics 52, 115—127.
[29] Breidt, F. Jay — Crato, Nuno — de Lima, Pedro (1998): The Detection and
Estimation of Long Memory in Stochastic Volatility. Journal of Econometrics 83,
325—348.
[30] Broto, Carmen — Ruiz, Esther (2002): Estimation Methods for Stochastic
Volatility Models: A Survey. Working paper 02-54 (14), Universidad Carlos III de
Madrid.
[31] Bruce, A.G. — Gao, H.-Y. (1996): Applied Wavelet Analysis with S-PLUS.
Springer, New York.
[32] Campbell, John Y. — Lo, Andrew W. — MacKinlay, A. Craig (1997): The
Econometrics of Financial Markets. Second corrected printing, Princeton University
Press.
[33] Carnero, M. Angeles — Pena, Daniel — Ruiz, Esther (2001): Is Stochastic
Volatility More Flexible than GARCH? Working paper 01-08, Universidad Carlos
III de Madrid.
[34] Chew, Cyrene (2001): The Money and Income Relationship of European Coun-
tries by Time Scale Decomposition Using Wavelets. Preliminary version, New York
University. Http://homepages.nyu.edu/˜cc486/paper.pdf.
[35] Clark, Peter K. (1973): A Subordinated Stochastic Process Model with Finite
Variance for Speculative Prices. Econometrica 41, 135—155.
[36] Cont, Rama (2001): Empirical Properties of Asset Returns: Stylized Facts and
Statistical Issues. Quantitative Finance 1, 223—236.
[37] Cooley, James W. — Tukey, John W. (1965): An Algorithm for the Machine
Calculation of Complex Fourier Series. Mathematics of Computation 19, 297—301.
[38] Corsi, Fulvio — Zumbach, Gilles — Muller, Ulrich — Dacorogna,
Michel (2001): Consistent High-Precision Volatility from High-Frequency Data.
Working paper, Olsen Ltd. Http://www.olsen.ch/research/working_papers.html.
[39] Craigmile, Peter F. — Percival, Donald B. (2002): Asymptotic Decor-
relation of Between-Scale Wavelet Coefficients. Submitted for review, Ohio State
University. Http://www.stat.ohio-state.edu/˜pfc/research/papers/decorrelate.pdf.
[40] Craigmile, Peter F. — Percival, Donald B. — Guttorp, Peter (2000):
The Impact of Wavelet Coefficient Correlations on Fractionally Differenced Process
Estimation. Technical report 049, The National Research Center for Statistics and
the Environment. Http://www.nrcse.washington.edu/pdf/trs49_wave.pdf.
[41] Craigmile, Peter F. — Guttorp, Peter — Percival, Donald B. (2004):
Wavelet Based Estimation for Trend Contaminated Fractionally Differenced Pro-
cesses. Technical report 077, The National Research Center for Statistics and the
Environment. Http://www.nrcse.washington.edu/pdf/trs77.pdf.
[42] Dacorogna, Michel M. — Muller, Ulrich A. — Nagler, Robert J. —
Olsen, Richard B. — Pictet, Olivier V. (1993): A Geographical Model for
the Daily and Weekly Seasonal Volatility in the Foreign Exchange Market. Journal
of International Money and Finance 12, 413—438.
[43] Dacorogna, Michel M. — Gencay, Ramazan — Muller, Ulrich A. —
Olsen, Richard B. — Pictet, Olivier V. (2001): An Introduction to High-
Frequency Finance. Academic Press.
[44] Dahlhaus, R. (1996): On the Kullback—Leibler Information Divergence of Locally
Stationary Processes. Stochastic Processes and their Applications 62, 139—168.
[45] – (1997): Fitting Time Series Models to Nonstationary Processes. The Annals of
Statistics 25, 1—37.
[46] Daubechies, I. (1988): Orthonormal Bases of Compactly Supported Wavelets.
Communications of Pure and Applied Mathematics 41, 909—996.
[47] – (1990): The Wavelet Transform, Time-Frequency Localization and Signal Anal-
ysis. IEEE Transactions on Information Theory 36, 961—1005.
[48] Davidian, Marie — Carroll, Raymond J. (1987): Variance Function Estima-
tion. Journal of American Statistical Association 82, 1079—1091.
[49] Deo, Rohit S. — Hurvich, Clifford M. (2001): On the Log Periodogram Re-
gression Estimator of the Memory Parameter in Long Memory Stochastic Volatility
Models. Econometric Theory 17, 686—710.
[50] Diebold, Francis X. (2004): The Nobel Memorial Prize for Robert
F. Engle. Working paper, Wharton Financial Institutions Center.
Http://fic.wharton.upenn.edu/fic/papers/04/p0409.html.
[51] Diebold, Francis X. — Nerlove, Marc (1989): The Dynamics of Exchange
Rate Volatility: A Multivariate Latent Factor ARCH Model. Journal of Applied
Econometrics 4, 1—21.
[52] Diebold, Francis X. — Hickman, Andrew — Inoue, Atsushi — Schuer-
mann, Til (1997): Converting 1-Day Volatility to h-Day Volatility: Scaling by √h
is Worse than You Think. Working paper, Wharton Financial Institutions Center.
Http://fic.wharton.upenn.edu/fic/wfic.html.
[53] Diebold, Francis X. — Inoue, Atsushi (2001): Long Memory and Regime
Switching. Journal of Econometrics 105, 131—159.
[54] Dijkerman, R. — Mazumdar, R. (1994): On the Correlation Structure of Wavelet
Coefficients of Fractional Brownian Motion. IEEE Transactions on Information The-
ory 40, 1609—1612.
[55] Ding, Zhuanxin — Granger, Clive W.J. — Engle, Robert F. (1993): A
Long Memory Property of Stock Market Returns and a New Model. Journal of
Empirical Finance 1, 83—106.
[56] Ding, Zhuanxin — Granger, Clive W.J. (1996): Modeling Volatility Persistence
of Speculative Returns: A New Approach. Journal of Econometrics 73,
185—215.
[57] Donoho, David L. (1992): De-Noising via Soft-Thresholding. Tech-
nical report, Department of Statistics, Stanford University. Http://www-
stat.stanford.edu/˜donoho/reports.html.
[58] Donoho, David L. — Johnstone, Iain M. (1992): Ideal Spatial Adaptation via
Wavelet Shrinkage. Technical report, Department of Statistics, Stanford University.
Http://www-stat.stanford.edu/˜donoho/reports.html.
[59] Donoho, David L. — Johnstone, Iain M. — Kerkyacharian, Gerard
— Picard, Dominique (1993): Density Estimation by Wavelet Thresholding.
Technical report, Department of Statistics, Stanford University. Http://www-
stat.stanford.edu/˜donoho/reports.html.
[60] – (1995): Wavelet Shrinkage: Asymptopia? Journal of the Royal Statistical Society,
Series B (Methodological) 57, 301—369.
[61] Engle, Robert F. (1973): Band Spectrum Regression. International Economic
Review 15, 1—11.
[62] – (1982): Autoregressive Conditional Heteroskedasticity with Estimates of the
Variance of United Kingdom Inflation. Econometrica 50, 987—1008.
[63] – (2000): The Econometrics of Ultra High Frequency Data. Econometrica 68, 1—22.
[64] Engle, Robert F. — Bollerslev, Tim (1986): Modelling the Persistence of
Conditional Variances. Econometric Reviews 5, 1—50.
[65] Engle, Robert F. — Ito, Takatoshi — Lin, Wen-Ling (1990): Meteor Showers
or Heat Waves? Heteroskedastic Intra Daily Volatility in the Foreign Exchange
Market. Econometrica 58, 525—542.
[66] Engle, Robert F. — Mustafa, Chowdhury (1992): Implied ARCH Models
from Options Prices. Journal of Econometrics 52, 289—311.
[67] Engle, Robert F. — Ng, Victor K. (1993): Measuring and Testing the Impact
of News on Volatility. Journal of Finance 48, 1749—1778.
[68] Engle, Robert F. — Russell, J.R. (1998): Autoregressive Conditional Duration:
A New Model for Irregularly Spaced Transaction Data. Econometrica 66, 1127—1162.
[69] Fama, Eugene F. (1965): The Behavior of Stock Market Prices. Journal of Busi-
ness 38, 34—105.
[70] – (1970): Efficient Capital Markets: A Review of Theory and Empirical Work.
Journal of Finance 25, 383—417.
[71] Felixson, Karl (2004): Finnish Short-Term Stock Returns. PhD thesis, Swedish
School of Economics and Business Administration.
[72] Flandrin, Patrick (1992): Wavelet Analysis and Synthesis of Fractional Brown-
ian Motion. IEEE Transactions on Information Theory 38, 910—917.
[73] Fung, William K.H. — Hsieh, David A. (1991): Empirical Analysis of Implied
Volatility: Stocks, Bonds and Currencies. Working paper, Fuqua School of Business.
Http://faculty.fuqua.duke.edu/˜dah7/WP1991.pdf.
[74] Gallant, A. Ronald (1981): On the Bias in Flexible Functional Forms and an
Essentially Unbiased Form. Journal of Econometrics 15, 211—245.
[75] – (1982): Unbiased Determination of Production Technologies. Journal of Econo-
metrics 20, 285—323.
[76] Garman, Mark B. — Klass, Michael J. (1980): On the Estimation of Security
Price Volatilities from Historical Data. Journal of Business 53, 67—78.
[77] Gencay, Ramazan — Selcuk, Faruk — Whitcher, Brandon (2001): Scaling
Properties of Foreign Exchange Volatility. Physica A 289, 249—266.
[78] – (2002a): An Introduction to Wavelets and Other Filtering Methods in Finance
and Economics. Academic Press.
[79] – (2002b): Asymmetry of Information Flow Between Volatili-
ties Across Time Scales. Submitted manuscript, University of Windsor,
Bilkent University, National Center for Atmospheric Research.
Http://www.cgd.ucar.edu/˜whitcher/papers/whmm.pdf.
[80] – (2002c): Robustness of Systematic Risk Across Time Scales. Submitted
manuscript, University of Windsor, Bilkent University, National Center for Atmo-
spheric Research. Http://www.sfu.ca/˜rgencay/jarticles/jimf-capm.pdf.
[81] Geweke, J. — Porter-Hudak, S. (1983): The Estimation and Application of
Long Memory Time Series Models. Journal of Time Series Analysis 4, 221—238.
[82] Ghysels, Eric — Harvey, Andrew — Renault, Eric (1995):
Stochastic Volatility. Working paper, CIRANO Scientific Series.
Http://www.cirano.qc.ca/pdf/publication/95s-49.pdf.
[83] Ghysels, Eric — Santa-Clara, Pedro — Valkanov, Rossen (2003): Predict-
ing Volatility: Getting the Most out of Return Data Sampled at Different Frequen-
cies. Presented at the Conference ”New Frontiers of Financial Volatility Modeling”,
Florence, Italy, May 25—27.
[84] Goodhart, Charles A.E. — O’Hara, Maureen (1997): High Frequency Data
in Financial Markets: Issues and Applications. Journal of Empirical Finance 4,
73—114.
[85] Gourieroux, Christian — Jasiak, Joann (2001): Financial Econometrics.
Princeton University Press, Princeton.
[86] Granger, Clive W.J. — Joyeux, R. (1980): An Introduction to Long Memory
Time Series Models and Fractional Differencing. Journal of Time Series Analysis 1,
15—29.
[87] Granger, Clive W.J. — Ding, Zhuanxin (1996): Varieties of Long Memory
Models. Journal of Econometrics 73, 61—77.
[88] Granger, Clive W.J. — Hyung, Namwon (1999): Occasional Structural Breaks
and Long Memory. Discussion paper 99-14, University of California, San Diego.
Http://www.econ.ucsd.edu/papers/files/ucsd9914.pdf.
[89] Guillaume, D.M. — Dacorogna, M.M. — Dave, R.D. — Muller, U.A. —
Olsen, R.B. — Pictet, O.V. (1997): From the Bird’s Eye to the Microscope: a
Survey of New Stylized Facts of the Intra-Daily Foreign Exchange Markets. Finance
and Stochastics 1, 95—129.
[90] Hamilton, James D. (1994): Time Series Analysis. Princeton University Press.
[91] Harris, Lawrence (1986): A Transaction Data Study of Weekly and Intradaily
Patterns in Stock Returns. Journal of Financial Economics 16, 99—117.
[92] Harvey, Andrew (1998): Long Memory in Stochastic Volatility. In Stephen
Satchell and John Knight (eds.), Forecasting Volatility in Financial Markets, 307—
320. Butterworth-Heinemann, Oxford.
[93] Harvey, Andrew — Ruiz, Esther — Shephard, Neil (1994): Multivariate
Stochastic Variance Models. Review of Economic Studies 61, 247—264.
[94] Hentschel, Ludger (1995): All in the Family: Nesting Symmetric and Asymmet-
ric GARCH Models. Journal of Financial Economics 39, 71—104.
[95] Hess-Nielsen, N. — Wickerhauser, M.V. (1996): Wavelets and Time-
Frequency Analysis. Proceedings of the IEEE 84, 523—540.
[96] Hogan, Kedreth C. — Melvin, Michael T. (1994): Sources of Meteor Show-
ers and Heat Waves in the Foreign Exchange Market. Journal of International Eco-
nomics 37, 239—247.
[97] Hol, Eugenie — Koopman, Siem Jan (2002): Stock Index Volatility
Forecasting with High Frequency Data. Discussion paper, Tinbergen Institute.
Http://www.tinbergen.nl/discussionpapers/02068.pdf.
[98] Hosking, J.R.M. (1981): Fractional Differencing. Biometrika 68, 165—176.
[99] Hubbard, Barbara Burke (1998): The World According to Wavelets: The Story
of a Mathematical Technique in the Making. Second edition, A.K. Peters, USA.
[100] Hurvich, Clifford M. — Beltrao, Kaizo I. (1993): Asymptotics for the Low-
Frequency Ordinates of the Periodogram of a Long-Memory Time Series. Journal of
Time Series Analysis 14, 455—472.
[101] Hardle, Wolfgang — Kerkyacharian, Gerard — Picard, Dominique —
Tsybakov, Alexander (1998): Wavelets, Approximation, and Statistical Appli-
cations. Springer, New York.
[102] Jensen, Mark J. (1998): An Approximate Wavelet MLE of Short and Long-
Memory Parameters. Studies in Nonlinear Dynamics & Econometrics 3, Article 5.
[103] – (1999): Using Wavelets to Obtain a Consistent Ordinary Least Squares Estimator
of the Long-Memory Parameter. Journal of Forecasting 18, 17—32.
[104] – (2000): An Alternative Maximum Likelihood Estimator of Long-Memory Pro-
cesses Using Compactly Supported Wavelets. Journal of Economic Dynamics and
Control 24, 361—387.
[105] Jensen, Mark J. — Whitcher, Brandon (2000): Time-
Varying Long-Memory in Volatility: Detection and Estimation with
Wavelets. Technical report, University of Missouri and EURANDOM.
Http://www.cgd.ucar.edu/˜whitcher/papers/vol.pdf.
[106] Kroner, Kenneth F. — Ng, Victor K. (1998): Modeling Asymmetric Comove-
ments of Asset Returns. Review of Financial Studies 11, 817—844.
[107] Lamoureux, Christopher G. — Lastrapes, William D. (1990a): Het-
eroskedasticity in Stock Return Data: Volume versus GARCH Effects. Journal of
Finance 45, 221—229.
[108] – (1990b): Persistence in Variance, Structural Change, and the GARCH Model.
Journal of Business and Economic Statistics 8, 225—234.
[109] Lo, Andrew W. — MacKinlay, A. Craig (1999): A Non-Random Walk Down
Wall Street. Princeton University Press, New Jersey.
[110] Lobato, L.N. — Savin, N.E. (1998): Real and Spurious Long-Memory Properties
of Stock-Market Data. Journal of Business and Economic Statistics 16, 261—267.
[111] Lockwood, Larry J. — Linn, Scott C. (1990): An Examination of Stock Mar-
ket Return Volatility During Overnight and Intraday Periods, 1964—1989. Journal
of Finance 45, 591—601.
[112] Lynch, Paul E. — Zumbach, Gilles O. (2003): Market Hetero-
geneities and the Causal Structure of Volatility. Working paper, Olsen Ltd.
Http://www.olsen.ch/research/working_papers.html.
[113] Mallat, Stephane G. (1989): A Theory for Multiresolution Signal Decompo-
sition: the Wavelet Representation. IEEE Transactions on Pattern Analysis and
Machine Intelligence 11, 674—693.
[114] – (1998): A Wavelet Tour of Signal Processing. Second edition, Academic Press,
San Diego.
[115] Mandelbrot, Benoit (1963): The Variation of Certain Speculative Prices. Jour-
nal of Business 36, 394—419.
[116] Martens, Martin — Chang, Yuan-Chen — Taylor, Stephen J. (2002): A
Comparison of Seasonal Adjustment Methods when Forecasting Intraday Volatility.
Journal of Financial Research XXV, 283—299.
[117] McCoy, Emma J. — Walden, Andrew T. (1996): Wavelet Analysis and Synthe-
sis of Stationary Long-Memory Processes. Journal of Computational and Graphical
Statistics 5, 26—56.
[118] McKenzie, Michael D. (1999): Power Transformation and Forecasting the Mag-
nitude of Exchange Rate Changes. International Journal of Forecasting 15, 49—55.
[119] Melino, Angelo — Turnbull, Stuart M. (1990): Pricing Foreign Currency
Options with Stochastic Volatility. Journal of Econometrics 45, 239—265.
[120] Merton, Robert C. (1980): On Estimating the Expected Return on the Market:
An Exploratory Investigation. Journal of Financial Economics 8, 323—361.
[121] Meyer, Yves (1994): Wavelets: Algorithms and Applications. Society for Indus-
trial and Applied Mathematics.
[122] Mikosch, Thomas — Starica, Catalin (2004): Non-Stationarities
in Financial Time Series, the Long Range Dependence and the
IGARCH Effects. Submitted paper, Chalmers University of Technology.
Http://www.math.chalmers.se/˜starica/15.04.02.lm.thomas.pdf.
[123] Muller, U.A. — Dacorogna, M.M. — Olsen, R.B. — Pictet, O.V. —
Schwarz, M. — Morgenegg, C. (1990): Statistical Study of Foreign Exchange
Rates, Empirical Evidence of a Price Change Scaling Law, and Intraday Analysis.
Journal of Banking and Finance 14, 1189—1208.
[124] Muller, U.A. — Dacorogna, M.M. — Dave, R.D. — Pictet, O.V. — Ward,
R.B. (1993): Fractals and Intrinsic Time, a Challenge to Econometricians. Working
paper, Olsen Ltd. Http://www.olsen.ch/research/working_papers.html.
[125] Muller, Ulrich A. — Dacorogna, Michel M. — Dave, Rakhal D. — Olsen,
Richard B. — Pictet, Olivier V. — von Weizsacker Jacob E. (1997):
Volatilities of Different Time Resolutions — Analyzing the Dynamics of Market Com-
ponents. Journal of Empirical Finance 4, 213—239.
[126] Nason, Guy P. — von Sachs, Rainer (1999): Wavelets in Time Series Analysis.
Philosophical Transactions of the Royal Society of London, Series A 357, 2511—2526.
[127] Nelson, D.B. (1988): Time Series Behavior of Stock Market Volatility and Re-
turns. PhD thesis, MIT.
[128] Norsworthy, John R. — Li, Ding — Gorener, Rifat (2000): Wavelet-Based
Analysis of Time Series: An Export from Engineering to Finance. Proceedings of
the 2000 IEEE International Engineering Management Society Conference, Albu-
querque, New Mexico. Http://www.norsworthy.net/papers.php.
[129] Ogden, Todd (1997): On Preconditioning the Data for the Wavelet Transform
When the Sample Size is Not a Power of Two. Communications in Statistics B 26,
267—285.
[130] Osler, C.L. (1995): Exchange Rate Dynamics and Speculator Horizons. Journal
of International Money and Finance 14, 695—719.
[131] Parkinson, Michael (1980): The Extreme Value Method for Estimating the Variance
of the Rate of Return. Journal of Business 53, 61—65.
[132] Percival, Donald B. — Mofjeld, Harold O. (1997): Analysis of Subtidal
Coastal Sea Level Fluctuations Using Wavelets. Journal of the American Statistical
Association 92, 868—880.
[133] Percival, Donald B. — Walden, Andrew T. (2000): Wavelet Methods for
Time Series Analysis. Cambridge University Press.
[134] Pollock, D.S.G. — Lo Cascio, Iolanda (2003): Orthogonality Conditions for
Non-Dyadic Wavelet Analysis. Manuscript, Queen Mary, University of London.
[135] – (2004): Adapting Discrete Wavelet Analysis to the Circumstances of Economics.
Manuscript, Queen Mary, University of London.
[136] Polzehl, Jorg — Spokoiny, Vladimir — Starica, Catalin (2004): When
Did the 2001 Recession Really Start? Submitted paper, Chalmers University of
Technology. Http://www.math.chalmers.se/˜starica/paper2004 5 3.pdf.
[137] Poon, Ser-Huang — Granger, Clive W.J. (2003): Forecasting Volatility in
Financial Markets: A Review. Journal of Economic Literature XLI, 478—539.
[138] Poterba, James M. — Summers, Lawrence H. (1986): The Persistance of
Volatility and Stock Market Fluctuations. The American Economic Review 76,
1142—1151.
[139] Press, William H. — Teukolsky, Saul A. — Vetterling, William
T. — Flannery, Brian P. (1992): Numerical Recipes in Fortran 77:
The Art of Scientific Computing. Cambridge University Press. Online version:
http://www.library.cornell.edu/nr/bookfpdf.html.
[140] Priestley, M.B. (1988): Nonlinear and Nonstationary Time Series Analysis. Aca-
demic Press, London.
[141] – (1992): Spectral Analysis and Time Series. Academic Press, San Diego.
[142] – (1996): Wavelets and Time-Dependent Spectral Analysis. Journal of Time Series
Analysis 17, 85—103.
[143] Ramsey, James B. (1996): The Contribution of Wavelets to the Analy-
sis of Economic and Financial Data. Unpublished paper, New York University.
Http://www.econ.nyu.edu/user/ramseyj/publish/publish.htm.
[144] – (2002): Wavelets in Economics and Finance: Past and Future. Research report,
New York University. Http://www.econ.nyu.edu/cvstarr/working/2002/RR02-
02.PDF.
[145] Ramsey, James B. — Lampart, Camille (1998a): Decomposition of Economic
Relationships by Time Scale Using Wavelets: Money and Income. Macroeconomic
Dynamics 2, 49—71.
[146] – (1998b): The Decomposition of Economic Relationships by Time Scale Using
Wavelets: Expenditure and Income. Studies in Nonlinear Dynamics and Economet-
rics 3, 23—42.
[147] Ray, Bonnie K. — Tsay, Ruey S. (2000): Long-Range Dependence in Daily
Stock Volatilities. Journal of Business and Economic Statistics 18, 254—262.
[148] Renaud, O. — Starck, J.-L. — Murtagh, F. (2002): Wavelet-Based Forecasting
of Short and Long Memory Time Series. Cahiers du departement d’econometrie,
Faculte des sciences economiques et sociales, Universite de Geneve.
[149] Rioul, O. (1992): Simple Regularity Criteria for Subdivision Schemes. SIAM Jour-
nal on Mathematical Analysis 23, 1544—1576.
[150] Robinson, Peter M. (1995): Log-Periodogram Regression of Time Series with
Long Range Dependence. Annals of Statistics 23, 1048—1072.
[151] Roll, Richard (1984): A Simple Implicit Measure of the Effective Bid-Ask Spread
in an Efficient Market. Journal of Finance 39, 1127—1139.
[152] Schleicher, Christoph (2002): An Introduction to Wavelets for Economists.
Working paper 2002-3, Monetary and Financial Analysis Department, Bank of
Canada. Http://www.bankofcanada.ca/en/res/wp02-3.htm.
[153] Schwert, G. William (1989): Why Does Stock Market Volatility Change Over
Time? Journal of Finance 44, 1115—1153.
[154] Serroukh, A. — Walden, A.T. — Percival D.B. (2000): Statistical Properties
and Uses of the Wavelet Variance Estimator for the Scale Analysis of Time Series.
Journal of the American Statistical Association 95, 184—196.
[155] Shann, W.C. — Yen, C.C. (1999): On the Exact Values of Orthonormal Scaling
Coefficients of Lengths 8 and 10. Applied and Computational Harmonic Analysis 6,
109—112.
[156] Shiryaev, A.N. (1999): Essentials of Stochastic Finance: Facts, Models, and The-
ory. World Scientific, Singapore.
[157] Shleifer, Andrei — Vishny, Robert W. (1990): Equilibrium Short Horizons
of Investors and Firms. American Economic Review 80, Papers and Proceedings of
the Hundred and Second Annual Meeting of the American Economic Association,
148—153.
[158] Steffen, P. — Heller, P.N. — Gopinath, R.A. — Burrus, C.S. (1993): Theory
of Regular M-band Wavelets. IEEE Transactions on Signal Processing 41, 3497—
3511.
[159] Strang, Gilbert (1993): Wavelet Transforms Versus Fourier Transforms. Bulletin
of American Mathematical Society 28, 288-305.
[160] Taylor, Stephen J. (1986): Modeling Financial Time Series. John Wiley & Sons,
New York.
[161] – (1994): Modelling Stochastic Volatility: A Review and Comparative Study.
Mathematical Finance 4, 183—204.
[162] – (2000): Consequences for Option Pricing of a Long Memory in Volatility.
Manuscript, Lancaster University.
[163] Tewfik, A.H. — Kim, M. (1992): Correlation Structure of the Discrete Wavelet
Coefficients of Fractional Brownian Motion. IEEE Transactions on Information The-
ory 38, 904—909.
[164] Tkacz, Greg (2000): Estimating the Fractional Order of Integration of Interest
Rates Using a Wavelet OLS Estimator. Working paper 2000-5, Bank of Canada.
Http://www.bankofcanada.ca/en/res/wp00-5.htm.
[165] Vannucci, Marina — Corradi, Fabio (1999): Covariance Structure of Wavelet
Coefficients: Theory and Models in a Bayesian Perspective. Journal of the Royal
Statistical Society, Series B 61, 971—986.
[166] Velasco, Carlos (1999): Non-Stationary Log-Periodogram Regression. Journal
of Econometrics 91, 325—371.
[167] Vidakovic, Brani (1999): Statistical Modeling by Wavelets. John Wiley & Sons.
[168] Vilasuso, Jon (2002): Forecasting Exchange Rate Volatility. Economics Letters
76, 59—64.
[169] Wasserfallen, W. — Zimmermann, H. (1985): The Behavior of Intraday Ex-
change Rates. Journal of Banking and Finance 9, 55—72.
[170] Whitcher, Brandon — Jensen, Mark J. (2000): Wavelet Estimation of a Local
Long Memory Parameter. Exploration Geophysics 31, 94—103.
[171] Wiggins, James B. (1987): Option Values Under Stochastic Volatility. Journal of
Financial Economics 19, 351—372.
[172] Wojtaszczyk, P. (1997): A Mathematical Introduction to Wavelets. Cambridge
University Press, Cambridge, UK.
[173] Wood, Robert A. — McInish, Thomas H. — Ord, J. Keith (1985): An
Investigation of Transaction Data for NYSE Stocks. Journal of Finance 40, 723—
739.
[174] Wright, Jonathan H. (2000): Log-Periodogram Estima-
tion of Long Memory Volatility Dependencies with Conditionally
Heavy Tailed Returns. International finance discussion paper 685.
Http://www.federalreserve.gov/pubs/ifdp/2000/685/ifdp685.pdf.
A Appendix: Some functional analysis
The following can be found in any basic book on functional analysis (e.g. Bachman et
al. (2000)).
For 1 ≤ p < ∞, the collection of pth power integrable functions on E (a Lebesgue-
measurable set), denoted L^p(E), is equipped with the p-norm ‖f‖_p:

    L^p(E) = { f : E → K : ∫_E |f(t)|^p dt < ∞ },

    ‖f‖_p = ( ∫_E |f(t)|^p dt )^{1/p}.

If the interval is finite, then E = [a, b], and if the interval is the whole real line, then
E = R. These cases are denoted by L^p[a, b] and L^p(R), respectively, but because usually
p = 1 or 2, one has L^1(R) or L^2(R).
For 1 ≤ p < ∞, the collection of pth power summable sequences, denoted ℓ^p, is
equipped with the p-norm ‖x‖_p, x = (a_n) ∈ ℓ^p:

    ℓ^p(N) = ℓ^p = { (a_n) ∈ K^∞ : Σ_{n∈N} |a_n|^p < ∞ },

    ‖x‖_p = ( Σ_{n∈N} |a_n|^p )^{1/p}.
For f, g ∈ L^2(R), the inner product is given by

    ⟨f, g⟩ = ∫_{−∞}^{+∞} f(t) g(t) dt.

For x = (a_i) and y = (b_i) in K^n, the inner product is given by

    ⟨x, y⟩ = Σ_{i=1}^{n} a_i b_i.

Notice that two vectors x and y are said to be orthogonal if ⟨x, y⟩ = 0. A sequence
(x_n) in a Hilbert space is orthonormal if

    ⟨x_m, x_n⟩ = δ_{mn} = 1 (m = n), 0 (m ≠ n), for all m, n.
Example 23 A well-known fact in Fourier analysis is that the complex exponentials
(and hence sines and cosines) form an orthogonal sequence in L^2[−π, π]:

    ∫_{−π}^{π} e^{int} e^{−imt} dt = 2π δ_{mn} = 2π (m = n), 0 (m ≠ n), for all m, n ∈ Z.
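The orthogonality relation in Example 23 can be checked numerically. The sketch below (an illustration, not part of the thesis) approximates the integral by a Riemann sum on an equispaced grid; because the integrand is a complex exponential with integer frequency, the sum over a full period is essentially exact:

```python
import numpy as np

# Check numerically that the integral of e^{int} e^{-imt} over [-pi, pi]
# equals 2*pi for m = n and 0 for m != n. Omitting the right endpoint
# turns the Riemann sum into a sum over roots of unity, which cancels
# exactly for integer m != n.
t = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
dt = 2 * np.pi / 4096

def inner(n, m):
    """Approximate the L^2[-pi, pi] inner product of e^{int} and e^{imt}."""
    return np.sum(np.exp(1j * n * t) * np.exp(-1j * m * t)) * dt

print(abs(inner(3, 3) - 2 * np.pi) < 1e-9)  # diagonal: equals 2*pi
print(abs(inner(3, 5)) < 1e-9)              # off-diagonal: vanishes
```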
Definition 24 Cauchy sequence. A sequence (a_n) in a metric space (X, d) is called a
Cauchy sequence if d(a_i, a_j) → 0 as i, j → ∞.

Definition 25 Completeness. A metric space (X, d) is called complete if every
Cauchy sequence converges, i.e., if for every Cauchy sequence (a_n) in X there exists a
point a in X such that d(a_j, a) → 0 as j → ∞.
Definition 26 Hilbert space. A complete inner product space is called a Hilbert space.
Consider two (complete) subspaces M and N of the inner product space X such that
M ∩ N = {0}. If every vector x in X = M + N can be written as

    x = m + n, m ∈ M, n ∈ N,

then X = M ⊕ N is called the (internal) direct sum.
It can be proved that an inner product space X is separable if and only if it has
a complete orthonormal sequence (x_n), i.e. a sequence (x_n) that is an orthonormal basis.
So in every separable Hilbert space there exists an orthonormal basis. In such a space any
x ∈ X can be uniquely written in the form

    x = Σ_{n∈N} ⟨x, x_n⟩ x_n

for any orthonormal basis (x_n). The values ⟨x, x_n⟩ are called the Fourier coefficients
of x and the series Σ_{n∈N} ⟨x, x_n⟩ x_n is called the Fourier series for x. So in Hilbert spaces
one recovers the ability to write a vector as the sum of its projections ”on the basis
vectors”. For instance, any f ∈ L^2(R) can be written with orthogonal wavelets ψ_{j,k} as
f = Σ_{j,k∈Z} ⟨f, ψ_{j,k}⟩ ψ_{j,k} in ‖·‖_2 (Bachman et al. (2000, p. 419)).
B Appendix: Orthonormal transforms
This section borrows from Percival and Walden (2000, Ch. 3.1).
Let 𝒪 be an orthonormal real-valued N × N matrix, i.e. 𝒪^T 𝒪 = I_N. Let 𝒪_{j•} and 𝒪_{•k}
refer to the jth row vector and kth column vector, respectively. Then the matrix can be
written row-wise or column-wise as

    𝒪 = [𝒪_{0•}, 𝒪_{1•}, ..., 𝒪_{N−1•}]^T = [𝒪_{•0}, 𝒪_{•1}, ..., 𝒪_{•N−1}].

One can use this matrix to analyze an arbitrary real-valued time series, given by the
N × 1 column vector x, in the following way:

    O = 𝒪x = (𝒪_{0•}^T x, 𝒪_{1•}^T x, ..., 𝒪_{N−1•}^T x)^T = (⟨x, 𝒪_{0•}⟩, ⟨x, 𝒪_{1•}⟩, ..., ⟨x, 𝒪_{N−1•}⟩)^T,

since 𝒪_{j•}^T x = ⟨𝒪_{j•}, x⟩ = ⟨x, 𝒪_{j•}⟩. The N × 1 column vector O consists of the transform
coefficients for x with respect to the orthonormal transform 𝒪. Specifically, the jth transform
coefficient O_j is given by the inner product ⟨x, 𝒪_{j•}⟩.

On the other hand, premultiplying both sides of the above equation by 𝒪^T , and using
orthonormality, one can synthesize the time series x as

    x = 𝒪^T O = [𝒪_{0•}, 𝒪_{1•}, ..., 𝒪_{N−1•}] O = Σ_{j=0}^{N−1} O_j 𝒪_{j•}.

Furthermore, since O_j = ⟨x, 𝒪_{j•}⟩, it is possible to re-express the time series x as a unique
linear combination of 𝒪_{0•}, 𝒪_{1•}, ..., 𝒪_{N−1•}:

    x = Σ_{j=0}^{N−1} ⟨x, 𝒪_{j•}⟩ 𝒪_{j•}.

That is, one has recovered the ability to write a vector as the sum of its projections on
the basis vectors.
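The analysis and synthesis steps above hold for any orthonormal matrix, of which the DWT matrix is one particular case. As a small illustration (not part of Percival and Walden's exposition), the sketch below generates an arbitrary orthonormal matrix via a QR decomposition and verifies perfect reconstruction and energy preservation:

```python
import numpy as np

# Analysis O = (matrix @ x) and synthesis x = (matrix^T @ O) for an
# arbitrary orthonormal N x N matrix, here obtained from the QR
# decomposition of a random matrix (Q^T Q = I_N by construction).
rng = np.random.default_rng(0)
N = 8
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))

x = rng.standard_normal(N)   # an arbitrary real-valued "time series"
O = Q @ x                    # transform coefficients O_j = <x, jth row of Q>
x_hat = Q.T @ O              # synthesis: sum of O_j times the jth row of Q

print(np.allclose(Q.T @ Q, np.eye(N)))         # orthonormality
print(np.allclose(x_hat, x))                   # perfect reconstruction
print(np.isclose(np.sum(O**2), np.sum(x**2)))  # energy preservation
```

The last check is the finite-dimensional analogue of Parseval's relation: an orthonormal transform redistributes, but does not change, the energy of the series.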
C Appendix: Fractional differencing and long-memory
The fractional differencing operator (1 − B)^d is formally defined by its infinite
Maclaurin series expansion,

    (1 − B)^d := Σ_{k=0}^{∞} [Γ(k − d) / (Γ(k + 1) Γ(−d))] B^k,

where B and Γ(·) denote the lag operator and the gamma function, respectively (e.g.
Breidt et al. (1998, p. 328)). A real-valued discrete parameter fractional ARIMA
(ARFIMA) process {X_t} is often defined with a binomial series expansion (Gencay et
al. (2002a, p. 163)),

    (1 − B)^d X_t := Σ_{k=0}^{∞} C(d, k) (−1)^k X_{t−k},

where the (generalized) binomial coefficient is

    C(a, b) := a! / (b!(a − b)!) = Γ(a + 1) / (Γ(b + 1) Γ(a − b + 1)).
These models were introduced by Granger and Joyeux (1980) and Hosking (1981).

In ARFIMA models, the ”long-memory” dependency is characterized solely by the
fractional differencing parameter d. A time series is said to exhibit long-memory
when it has a covariance function γ(j) and a spectrum f(λ) such that they are of the
same order as j^{2d−1} and λ^{−2d}, as j → ∞ and λ → 0, respectively.63 For 0 < d < 1/2, an
ARFIMA model exhibits long-memory, and for −1/2 < d < 0 it exhibits antipersistence.
In practice, the range |d| < 1/2 is of particular interest because then an ARFIMA model
is stationary and invertible (Hosking (1981)).

More detailed definitions of long-memory can be found in Beran (1994), for example.
Concerning fractionally integrated processes in econometrics, see Baillie (1996).
63 The rate of decay of the covariance does not necessarily imply the rate of decay of the spectrum, as noted in Bollerslev and Wright (2000, p. 87). Formal conditions for the equivalence are discussed in Beran (1994), for example (see also Granger and Ding (1996)).
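The Maclaurin coefficients of $(1-B)^d$ above can be computed without evaluating gamma functions directly, using the recursion $\pi_0 = 1$, $\pi_k = \pi_{k-1}(k-1-d)/k$, which follows from $\Gamma(k-d) = (k-1-d)\,\Gamma(k-1-d)$. A minimal sketch (the function name is illustrative, not from the thesis):

```python
def frac_diff_weights(d, n_terms):
    """Coefficients pi_k = Gamma(k-d) / (Gamma(k+1) Gamma(-d)) of (1-B)^d,
    computed by the numerically stable recursion pi_k = pi_{k-1}*(k-1-d)/k."""
    w = [1.0]                       # pi_0 = 1
    for k in range(1, n_terms):
        w.append(w[-1] * (k - 1 - d) / k)
    return w

# For d = 0.4 (long memory, 0 < d < 1/2) the weights decay hyperbolically:
w = frac_diff_weights(0.4, 5)
# w[1] = -d = -0.4, w[2] = -d(1-d)/2 = -0.12, w[3] = -0.064, ...
```

For $0 < d < 1/2$ the weights decay only hyperbolically (like $k^{-d-1}$), which is the source of the long-memory behavior discussed above.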
D Appendix: Locally stationary process
Dahlhaus (1996, 1997) defines a locally stationary process $X_{t,T}$ ($t = 0, 1, \ldots, T-1$) as the triangular array, with transfer function $A^0$, drift $\mu$, and spectral representation
\[
X_{t,T} = \mu(t/T) + \int_{-\pi}^{\pi} e^{i\omega t} A^0_{t,T}(\omega)\, dZ(\omega),
\]
where the components satisfy certain technical conditions (see Dahlhaus (1997, Def. 2.1) or Jensen and Whitcher (2000)). For example, autoregressive processes with time-varying coefficients are locally stationary (Dahlhaus (1996, Th. 2.3)).
Jensen and Whitcher (2000) give another example of a locally stationary process. It is constructed by considering a stationary, invertible moving average process $Y_t$ with spectral representation
\[
Y_t = \int_{-\pi}^{\pi} e^{i\omega t} A(\omega)\, dZ(\omega),
\]
where the transfer function is $A(\omega) = \left(1 + \theta e^{-i\omega}\right)/2\pi$ and $|\theta| < 1$. If the process $X_{t,T}$ is now defined as
\[
X_{t,T} = \mu(t/T) + \sigma(t/T)\, Y_t,
\]
where $\mu, \sigma : [0,1] \to \mathbb{R}$ are continuous functions, then $X_{t,T}$ is a locally stationary process with the time-varying transfer function
\[
A(u, \omega) = A^0_{t,T}(\omega) = \frac{\sigma(u)}{2\pi} \left(1 + \theta e^{-i\omega}\right).
\]
Thus the time-path of $X_{t,T}$ exhibits the periodic behavior of a stationary moving average process but with time-varying amplitude equal to $\sigma(u)$.
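The Jensen-Whitcher construction is easy to simulate: generate a stationary MA(1) process in the time domain and modulate it by smooth drift and amplitude functions. The particular choices of $\mu$ and $\sigma$ below are illustrative, not from the thesis:

```python
import numpy as np

T, theta = 1000, 0.5
rng = np.random.default_rng(1)

# Stationary, invertible MA(1): Y_t = eps_t + theta * eps_{t-1}, |theta| < 1
eps = rng.standard_normal(T + 1)
Y = eps[1:] + theta * eps[:-1]

# Rescaled time u = t/T on [0, 1]; continuous mu, sigma : [0,1] -> R
u = np.arange(T) / T
mu = 0.1 * np.sin(2 * np.pi * u)     # slowly varying drift (illustrative)
sigma = 0.5 + u                      # slowly increasing amplitude (illustrative)

# Locally stationary process X_{t,T} = mu(t/T) + sigma(t/T) * Y_t
X = mu + sigma * Y
```

A plot of `X` would show the oscillation pattern of an MA(1) process whose local variance grows along the sample, in line with the time-varying transfer function above.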
E Appendix: Fourier flexible form
Following Andersen and Bollerslev (1997c, 1998), intraday returns can be decomposed as
\[
r_{t,n} = E(r_{t,n}) + \frac{\sigma_t\, s_{t,n}\, Z_{t,n}}{\sqrt{N}},
\]
where in their notation $N$ refers to the number of return intervals $n$ per day (i.e. not to the total length of the series!), $\sigma_t$ is the daily volatility factor, and $Z_{t,n}$ is an IID random variable with mean 0 and variance 1. Notice that $s_{t,n}$, the periodic component for the $n$th intraday interval, depends on the characteristics of trading day $t$. By then squaring both sides and taking logarithms, define $x_{t,n}$ to be
\[
x_{t,n} \doteq 2 \log \left| r_{t,n} - E(r_{t,n}) \right| - \log \sigma_t^2 + \log N = \log s_{t,n}^2 + \log Z_{t,n}^2,
\]
so that $x_{t,n}$ consists of a deterministic and a stochastic component.
The modeling of $x_{t,n}$ is done via non-linear regression in $n$ and $\sigma_t$,
\[
x_{t,n} = f(\theta; \sigma_t, n) + u_{t,n},
\]
where $u_{t,n} \doteq \log Z_{t,n}^2 - E\left(\log Z_{t,n}^2\right)$ is an IID random variable with mean 0. In practice, the estimation of $f$ is implemented by the following parametric expression:
\[
f(\theta; \sigma_t, n) = \sum_{j=0}^{J} \sigma_t^j \left[ \mu_{0j} + \mu_{1j} \frac{n}{N_1} + \mu_{2j} \frac{n^2}{N_2} + \sum_{i=1}^{D} \lambda_{ij}\, 1_{\{n = d_i\}} + \sum_{p=1}^{P} \left( \gamma_{pj} \cos \frac{2\pi p n}{N} + \delta_{pj} \sin \frac{2\pi p n}{N} \right) \right],
\]
where $N_1 \doteq (N+1)/2$ and $N_2 \doteq (N+1)(N+2)/6$ are normalizing constants. If one sets $J = 0$ and $D = 0$, then this reduces to the standard FFF proposed by Gallant (1981, 1982). The trigonometric functions are ideally suited for smoothly varying patterns. Andersen and Bollerslev (1997c) have argued, however, that in equity markets allowing for $J \geq 1$ might be important. Including cross-terms in the regression allows $s_{t,n}$ to depend on the overall level of volatility on trading day $t$, which is often the case in stock market data. The actual estimation of $f$ is most easily accomplished using a two-step procedure described in Andersen and Bollerslev (1997c, App. B).
The normalized estimator of the intraday periodic component for interval $n$ on day $t$ is found to be
\[
\widehat{s}_{t,n} = \frac{T \exp\left( \widehat{f}_{t,n}/2 \right)}{\sum_{t=1}^{[T/N]} \sum_{n=1}^{N} \exp\left( \widehat{f}_{t,n}/2 \right)}, \tag{23}
\]
where $T$ is the total length of the sample and $[T/N]$ denotes the number of trading days. The filtered returns (returns free from the volatility periodicity) are then obtained via
\[
\widetilde{r}_{t,n} \doteq r_{t,n} / \widehat{s}_{t,n}.
\]
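The normalization in Eq. (23) simply rescales $\exp(\widehat{f}_{t,n}/2)$ so that the estimated periodic components average to one over the sample. A minimal sketch, with synthetic fitted values standing in for the actual FFF regression output (the data and array names are illustrative):

```python
import numpy as np

days, N = 250, 102            # [T/N] trading days, N intraday intervals (illustrative)
T = days * N                  # total length of the sample
rng = np.random.default_rng(2)

# Synthetic fitted values f_hat on a (day, interval) grid; in practice these
# come from the two-step FFF regression of Andersen and Bollerslev (1997c).
f_hat = rng.standard_normal((days, N))

# Eq. (23): normalized periodic component, which averages to one by construction
s_hat = T * np.exp(f_hat / 2) / np.exp(f_hat / 2).sum()

# Filtered returns: r_tilde = r / s_hat removes the intraday volatility periodicity
r = rng.standard_normal((days, N))
r_tilde = r / s_hat
```

Since the denominator sums $\exp(\widehat{f}_{t,n}/2)$ over all $T$ observations, the sample mean of $\widehat{s}_{t,n}$ equals one exactly, so the filtering rescales the intraday pattern without changing the overall level of the returns.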
F Appendix: List of abbreviations
Technical abbreviations used in this thesis are for the most part standard:
ACF Autocorrelation function
ARFIMA Autoregressive fractionally integrated moving average [process]
ARSV Autoregressive stochastic volatility [process]
CWT Continuous wavelet transform
D(L) Daubechies extremal phase filter of length L
DFT Discrete Fourier transform
DGP Data generating process
DWT Discrete wavelet transform
FFF Fourier flexible form
FFT Fast Fourier transform
FIR Finite impulse response
FRF Frequency response function (or transfer function)
FWT Fast wavelet transform
(G)ARCH (Generalized) Autoregressive conditional heteroskedastic [process]
GPH Geweke and Porter-Hudak
IID Independent and identically distributed
LA(L) Daubechies least asymmetric filter of length L
LMSV Long memory stochastic volatility [process]
MODWT Maximal overlap discrete wavelet transform
MRA Multiresolution analysis
MSE Mean square error
OLS Ordinary least squares
pDWT Partial discrete wavelet transform
QMR Quadrature mirror relationship
SDF Spectral density function (or Fourier spectrum)
SGF Squared gain function
STFT Short-time Fourier transform