Licentiate Thesis:
A Multiresolution Analysis of Stock Market Volatility
Using Wavelet Methodology
Copyrighted material.
Tommi A. Vuorenmaa
[e-mail: [email protected]]
Department of Economics, University of Helsinki
September 21, 2004
Tiedekunta - Facultet - Faculty: Faculty of Social Sciences
Laitos - Institution - Department: Department of Economics
Tekijä - Författare - Author: Vuorenmaa, Tommi
Työn nimi - Arbetets titel - Title: A Multiresolution Analysis of Stock Market Volatility Using Wavelet Methodology
Oppiaine - Läroämne - Subject: Econometrics
Työn laji - Arbetets art - Level: Licentiate thesis
Aika - Datum - Month and year: 2004-09-22
Sivumäärä - Sidantal - Number of pages: 140
Tiivistelmä - Referat - Abstract: The non-stationary character of stock market returns manifests itself through the volatility clustering effect and large jumps. Wavelet methodology gives an efficient way of representing a time series with such complex dynamics. With the help of a wavelet basis, the discrete wavelet transform (DWT) is able to break a time series down with respect to time-scale while preserving the time dimension and energy, unlike the traditional Fourier transform, which "trades" time for frequency. Time-scale specific information is important if one accepts the view that the stock market consists of heterogeneous investors operating at different time-scales. In that case considerably more insight into the volatility dynamics is gained by looking at the data at several time-scales. At small time-scales, in particular, the locality of wavelet analysis allows one to fully exploit high-frequency data. Wavelet transforms are also fast to calculate, so they are ideally suited for analyzing large data sets.
The "large-scale aim" of this licentiate thesis is first to introduce wavelet methodology to econometricians and then to analyze stock market volatility with it. In more detail, the data consist of 5-minute observations of the liquid Nokia Oyj stock at the Helsinki Stock Exchange (HEX). Several microstructure problems have to be dealt with, some characteristic of the HEX. The pre-filtered volatility series is then analyzed with the "maximal overlap" DWT to study both the global and local scaling laws in a turbulent "IT-bubble" period (1999 - 2000) and its calmer aftermath (2001 - 2002). Significant time-scale specific differences between these two periods are found. The global scaling laws may not be time-invariant, as usually claimed. The bubble period also experienced stronger long-memory in volatility than its aftermath, so long-memory may be time-varying as well. Such a finding can be applied in a locally stationary stochastic volatility model. Finally, the effects of the intraday volatility periodicity are studied and they are also found to be significant.
Avainsanat - Nyckelord - Keywords: high-frequency data, long-memory, stock markets, time-scale, volatility, wavelet
Säilytyspaikka - Förvaringsställe - Where deposited: (filled in by the library)
Muita tietoja - Övriga uppgifter - Additional information:
Acknowledgements
Most of the empirical research for this licentiate thesis was conducted while visiting the
Bank of Finland (Research Department). Their hospitality is gratefully acknowledged. I
also gratefully acknowledge the grants from OP-Ryhmän Tutkimussäätiö and the Yrjö
Jahnsson Foundation. Several people have made academic contributions to this work. I
would especially like to thank Professor Pentti Saikkonen for his patient guidance. In
addition, the following persons gave valuable comments: Professors Erkki Koskela, Seppo
Honkapohja, Markku Lanne, Matti Virén, Lasse Holmström, and Seppo Pynnönen, and
Doctors Juha Tarkka, Tuomas Takalo, and Jouko Vilmunen. I also thank the participants
at the Economics and Econometrics of the Market Microstructure Summer School
(Constance, Germany, 2004), my RAKA colleagues, and my father, M.Sc. Osmo Vuorenmaa,
for helpful discussions. Finally, I thank the Fulbright Center and the Finnish Cultural
Foundation for the essential financial backing to study for a year at the University of
California (San Diego) in La Jolla ("The Jewel" in English), where this thesis was "ground out".
La Jolla, CA
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 A historical glance at wavelet methodology . . . . . . . . . . . . . . . . . . . . . 5
3 Essentials of Fourier and wavelet theories . . . . . . . . . . . . . . . . . . . . . . 8
3.1 Fourier theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.1.2 Continuous Fourier transform . . . . . . . . . . . . . . . . . . . . . 10
3.1.3 Discrete Fourier transform . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Wavelet theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.2 Continuous wavelet transform . . . . . . . . . . . . . . . . . . . . . 18
3.2.3 Discrete wavelet transform . . . . . . . . . . . . . . . . . . . . . . . 21
4 Multiresolution analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2 Multiresolution composition . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3 Filters and wavelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4 Compactly supported wavelets . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.5 Daubechies wavelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5 Decomposing time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.1 Practical issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.2 Discrete wavelet transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.3 Partial discrete wavelet transform . . . . . . . . . . . . . . . . . . . . . . . 48
5.4 Maximal overlap discrete wavelet transform . . . . . . . . . . . . . . . . . 49
5.5 Wavelet variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6 Volatility modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.1 Measures of volatility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.2 Stochastic volatility and long-memory . . . . . . . . . . . . . . . . . . . . . 63
7 Empirical analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.1 Data description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
7.2 Preliminary data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.3 Multiresolution decomposition . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.4 Global scaling laws and long-memory . . . . . . . . . . . . . . . . . . . . . 88
7.5 Local scaling laws and long-memory . . . . . . . . . . . . . . . . . . . . . . 93
7.6 Effects of volatility periodicity . . . . . . . . . . . . . . . . . . . . . . . . . 99
8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
A Appendix: Some functional analysis . . . . . . . . . . . . . . . . . . . . . . . . . 130
B Appendix: Orthonormal transforms . . . . . . . . . . . . . . . . . . . . . . . . . 132
C Appendix: Fractional differencing and long-memory . . . . . . . . . . . . . . . . 133
D Appendix: Locally stationary process . . . . . . . . . . . . . . . . . . . . . . . . 134
E Appendix: Fourier flexible form . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
F Appendix: List of abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
1 Introduction
Financial time series share common characteristics. For example, in stock market return
data one typically observes discontinuities, i.e. sudden big changes, and clusters of volatility,
i.e. the alternation of highly volatile and tranquil periods (see Fig. 1). These phenomena
are so widely recognized today that they are called stylized facts (see e.g. Cont (2001) or
Dacorogna et al. (2001)). Unavoidably, then, a good model of financial returns needs at
least to capture the non-Gaussianity and the time-varying volatility.
Following the pioneering work of Mandelbrot (1963) and Fama (1965), a lot of effort was
put into the study of discontinuities in the 1970s. In fact, the study of non-Gaussian heavy-
tailed distributions dominated the empirical finance literature back then. Although there
is nowadays a vibrant new line of research on jumps (see e.g. Ait-Sahalia (2003), Barndorff-
Nielsen and Shephard (2003a, 2003b), Andersen et al. (2003)), for the last two decades the
main emphasis has been on volatility clustering, or what has come to be known as the
"ARCH effect". In particular, the seminal articles of Engle (1982) and Bollerslev (1986)
launched a huge interest in different kinds of (generalized) autoregressive conditional
heteroskedastic ((G)ARCH) models (for a review, see e.g. Bollerslev et al. (1992, 1994)
and Bollerslev (2001)).1 The immense interest in the conditional variance stems from the
fact that a correctly specified volatility model is important in the valuation of stocks and
stock options and in designing optimal dynamic hedging strategies for options and futures,
among other reasons.
But as important as the ARCH models and their numerous extensions have been for
the newborn field of financial econometrics, they have not helped much in explaining the
stylized facts. True enough, they are only models and as such perhaps only meant for
successful data fitting, but are they missing something crucial? In a way they are, since
they model only one time-scale (usually a day, or longer) at a time. But stock market
data have no single specific time-scale to analyze! A notable exception in this respect is the
heterogeneous ARCH model introduced in Müller et al. (1997), based on the hypothesis of
a heterogeneous market of Müller et al. (1993). According to this hypothesis stock markets
1In 2003 Engle was awarded (half of) the Nobel Prize in Economic Sciences "for methods of analyzing
economic time series with time-varying volatility" (see http://www.nobel.se/economics/laureates/2003).
[Figure 1: "Nokia at the Helsinki Stock Exchange (1999 - 2002)"; two panels plotting log(Price) and log-Return against Time.]
Figure 1: Logarithmic price and return history of Nokia at the HEX sampled every 5
minutes from January 4 (1999) to December 30 (2002).
consist of multiple layers of investment horizons (time-scales), varying from extremely
short (minutes) to long (years). The small time-scales are commonly thought to be related
to speculative activity while the bigger time-scales are related to investment activity. It is
well known that the players in the stock market form a heterogeneous group with respect
to perceptions of the market, risk profiles, institutional constraints, degree of information,
prior beliefs, and other characteristics such as geographical location. Interestingly,
however, Müller et al. argue that many of these differences translate into sensitivity
to different time-scales. Indeed, Müller et al. (1997) show some support for the view that
time-scale is "one of the most important aspects in which trading behaviours differ" (see
also Lynch and Zumbach (2003)). For example, big institutional investors have relatively
long trading horizons and trade on economic fundamentals. On the other hand, the
so-called day-traders do not keep positions open overnight and simply trade on market
sentiment. The small time-scales, in particular, have become increasingly important
because of the recent availability of high-frequency data (see e.g. Engle (2000)). In fact,
Goodhart and O'Hara (1997) conjecture that "the ability to analyze higher frequency data
may be particularly useful in pursuing [why volatility persistence endures]". One can then
argue that to better capture the dynamics of stock markets one must analyze data at
multiple, perhaps even a continuum of, time-scales.
Multiple time-scales are especially important from the viewpoint of risk management.
For example, in the risk management industry one often needs to scale a risk measure
(standard deviation, say) from one time-scale to another. The industry standard is to scale
by the square-root of time, familiar from Brownian motion (i.e. a continuous-time random
walk). But one is then implicitly assuming that the data generating process (DGP) is
made of independent and identically distributed (IID) random variables. This assumption
is not reasonable for financial time series; just consider volatility clustering, for example.
Indeed, the existence of serial correlation in the conditional second moments is usually
obvious even by eye (as in Fig. 1). The persistence is universally found to be so strong and
long-lasting that volatility is said to exhibit long-memory (or long-range dependence). Under
such non-IID circumstances square-root scaling may indeed lead to wrong conclusions (see
Diebold et al. (1997)).
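To see concretely why the square-root rule can mislead, consider the following sketch (my own illustration, not from the thesis; the exponentially decaying AR(1)-style correlation structure and the parameter values are assumptions). It compares the true standard deviation of an n-period sum of equally volatile returns with correlation Corr(r_i, r_j) = phi^|i-j| against the sqrt(n) rule:

```python
import math

def true_sum_sd(sigma, phi, n):
    """Std. dev. of the sum of n returns with Corr(r_i, r_j) = phi**|i - j|.

    Var(sum) = sigma^2 * (n + 2 * sum_{k=1}^{n-1} (n - k) * phi^k).
    """
    var = sigma**2 * (n + 2 * sum((n - k) * phi**k for k in range(1, n)))
    return math.sqrt(var)

def sqrt_rule_sd(sigma, n):
    """Industry-standard square-root-of-time scaling (valid only under IID)."""
    return sigma * math.sqrt(n)

sigma, n = 0.01, 10          # hypothetical 1-period vol, 10-period horizon
for phi in (0.0, 0.3, 0.6):  # phi = 0 reproduces the IID case exactly
    print(phi, true_sum_sd(sigma, phi, n), sqrt_rule_sd(sigma, n))
```

With phi = 0 the two numbers agree exactly; any positive persistence makes the true multi-period risk strictly larger than the square-root rule suggests.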
Another commonly used, overly simplifying assumption is stationarity, often of second
order. Most of the parametric GARCH models assume this, for instance. Of the less
restrictive non-parametric methods, spectral analysis requires "covariance stationarity"
as well. This method is useful because it allows one to represent a stationary time series
in the frequency domain, in which the frequency aspects can be easily studied. This should
be contrasted with a time-domain representation (the customary way of presenting stock
market data), which hides the frequency information. In this context the requirement
of stationarity stems from the fact that the power spectrum is just the Fourier transform of
the corresponding sample autocovariance function.2 As is well known, the autocovariance
is incapable of detecting non-stationarities. And because the Fourier transform looks
only for sinusoids "globally", spectral analysis is not suitable for the transient and evolving
behavior of stock markets. Clean sinusoidal components are rarely (never, dare I say!)
encountered in empirical finance.
Clearly, therefore, one needs to use more flexible tools and representations to study
stock market data. Considering the arguments made above, the most straightforward way
of increasing flexibility would be to use a non-parametric multiscale approach. This is ex-
actly where wavelet analysis enters the picture. Wavelet analysis offers a non-parametric,
mathematically concise way of studying the heterogeneity of stock markets under non-
stationary conditions. But what is wavelet analysis and how is it able to provide a time-
scale perspective on a complex (deterministic) function or, in the case of time series, on a
non-stationary realization of a stochastic process? Loosely speaking, wavelet methodology
extends Fourier methodology by replacing frequency by time-scale while still preserving
time dimension (Fourier "trades" time for frequency). The best asset of wavelet analysis,
local adaptiveness, stems from the mathematical fact that the basis functions used
in wavelet analysis, called wavelets, are well-localized in both time and scale. This gives
wavelets a distinct advantage over standard frequency domain methods when analyzing
complex dynamics such as the one found in stock markets.
There is a wide variety of wavelets available today. However, in the empirical analysis
of this thesis only wavelets belonging to the family of Daubechies compactly supported
2In certain cases one can analyze non-stationary series with Fourier methods. The Fourier transform
can also be time-localized to a certain degree (see e.g. Priestley (1996)).
wavelets are applied. Specifically, the least asymmetric wavelets are used to analyze the
volatility of Nokia Oyj sampled every 5 minutes at the Helsinki Stock Exchange.
Numerous microstructure problems arise that have to be carefully dealt with. The empirical
results are interesting in several respects. First, wavelet multiresolution analysis offers
new and useful insight into the volatility dynamics. More precisely, the analysis reveals
time-varying changes in global and local scaling laws that have remained hidden from traditional
methods. Quantitatively, the semiparametric wavelet approach allows the estimation of a
locally stationary stochastic volatility model with a time-varying long-memory parameter.
The wavelet-based approach is ideally suited in the high-frequency context because of the
complex volatility dependencies and the large number of observations. It is found that the
bubble period experienced stronger long-memory than its aftermath. The long-memory
is argued to be true and not spuriously caused by structural breaks or a trend. Finally,
the analysis reveals significant effects of the intraday volatility periodicity. As a whole,
this thesis is more than just a "deployment of wavelet technique to financial data", for
which Norsworthy et al. (2000) have, justifiably, criticized some early authors.
Before turning to the results of the empirical analysis (Sec. 7), however, it is essential
to go through the basics of classical Fourier theory as well as wavelet theory (Secs. 3.1 and
3.2). These two subsections serve as the backbone to the idea and theory of multiresolution
analysis, discussed next in a somewhat technical manner (Sec. 4). Only then, I believe,
is one well enough prepared to handle stochasticity (Sec. 5). There is also a quick review
of volatility measures and modeling (Sec. 6) before the actual analysis. I begin this rather
long tour by first taking a quick glance at the history of wavelet methodology, a form of
atomic decomposition.
2 A historical glance at wavelet methodology
The theory of wavelets originates from several different, overlapping sources, which
makes it a relatively hard research topic. Contrary to common belief, however, wavelets
have a quite long and fascinating history in mathematics. The misconception is probably due
to the fact that the word "wavelet" does not appear in the literature until the 1980s, when
applications in signal and image processing started to emerge. The interested reader should
turn to Meyer (1994) for more details than are given below.
As the reader may already know, the origins of frequency-based analysis lie with Joseph
Fourier, who in 1807 asserted that any 2π-periodic function can be represented as a sum of
sinusoidal components with appropriate coefficients. In fact, Fourier happened to discover
a new functional universe. This space of square-integrable functions on the interval [0, 2π],
denoted henceforth by L2[0, 2π], was later formalized by Henri Lebesgue. For example, the
sequence 1/√(2π), cos(x)/√π, sin(x)/√π, cos(2x)/√π, sin(2x)/√π, ... is an orthonormal
basis for this space. In 1909 Alfred Haar found another orthogonal system of functions,
this time defined on [0, 1] (chosen for convenience), that forms a series converging uniformly
to a continuous function on [0, 1]. The Haar basis functions were extremely simple
objects built from indicator functions 1[a,b] that are 1 on the interval [a, b] and 0 elsewhere.
Haar's discovery was important because it made it possible to describe small and complicated
detail that the Fourier basis could not represent. In the 1930s Paul Lévy used the Haar
basis functions to redefine the mathematically complicated Brownian motion
(Schleicher (2002, p. 3)). Unfortunately, there are some serious problems with the simple
Haar basis. First, indicator functions are discontinuous, so approximating a continuous
function with them is not coherent. Secondly, the Haar construction is suitable only for
continuous or square-integrable functions defined on [0, 1].
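The averaging-and-differencing structure that the Haar functions induce can be sketched in a few lines (an illustrative implementation of one level of the orthonormal Haar transform, not code from the thesis; the function name and the example sequence are my own):

```python
import math

def haar_step(x):
    """One level of the orthonormal Haar transform of an even-length list:
    pairwise scaled averages (approximation) and scaled differences (detail)."""
    approx = [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x), 2)]
    return approx, detail

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_step(x)

# Orthonormality preserves energy: the sum of squares is unchanged.
print(sum(v * v for v in x))
print(sum(v * v for v in approx) + sum(v * v for v in detail))
```

The detail coefficients are nonzero only where the sequence changes, which is exactly the locality the Fourier basis lacks.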
To overcome these problems, Faber and Schauder replaced the indicator functions by
triangles. One then obtains a Schauder basis for the Banach space of continuous functions
on [0, 1], and the series so constructed converges uniformly on [0, 1]. Notice that the Haar
system is not a Schauder basis for this space because of the discontinuity of the Haar basis
functions. The Schauder basis is superior to the Fourier basis for studying local regularity
properties. For example, it can be used to study the multifractal structure of Brownian
motion. In contrast, the trigonometric system does not allow direct and easy access to
local regularity properties. It is also known that the trigonometric system has trouble
localizing the energy of a function. One solution to this problem is to manipulate the
series by the so-called Littlewood-Paley methods. One then essentially analyzes a function
at various scales. This type of analysis has been used extensively in numerical image
processing and it is akin to wavelet analysis.
Other orthonormal wavelet bases were discovered too. For example, in 1927 Philip
Franklin created an orthonormal basis from the Schauder basis by using the Gram-Schmidt
process. The Franklin basis can decompose any function in L2[0, 1], and it works well in
relatively irregular cases too. The main problem with the Franklin basis is its complex
algorithmic structure: Franklin wavelets are not derived from a fixed wavelet function.
Another direction in the 1930s was that of Lusin, who worked with Hardy spaces, which
can be identified with closed subspaces of Lp(R). These spaces play an important
role in signal processing today. Guido Weiss and Ronald R. Coifman were the first to
interpret Lusin's theory in terms of "atoms" (the simplest elements of the function space)
and "atomic decompositions". The aim is then to find these atoms and the assembly rules that
allow the reconstruction of all the elements of the function space. For example, the Haar
system is the simplest atomic decomposition for the spaces Lp[0, 1], 1 < p < ∞. One can
use the so-called Calderón identity to look at an atomic decomposition. This identity was
rediscovered by Grossmann and Morlet in 1980, although their original interpretation was
in terms of quantum mechanics. They came up with the notion of an analyzing wavelet
ψ (nowadays known as the mother wavelet) and wavelets ψ(a,b) (defined later). Grossmann
and Morlet were also the first to define wavelet coefficients as the inner product
of the function to be analyzed and the wavelets. They also gave an inversion formula for the
synthesis of a function from its wavelet coefficients.
It was not until the work of Stéphane Mallat and Yves Meyer in the late 1980s, however,
that wavelets entered mainstream science. By combining the signal processing theory
of quadrature mirror filters with orthonormal wavelet bases, Mallat came up with the
concept of a "multiresolution analysis". This notion succeeded in uniting different aspects of
wavelet theory and gave an elegant way of constructing wavelets. After Mallat's
contribution one more major breakthrough took place: in 1988 Ingrid Daubechies constructed
"consumer-ready" wavelets with a preassigned degree of smoothness, which made applied
work much easier.
Although wavelets have been applied in engineering for nearly two decades now,
applications in economics and finance started to emerge only in the mid-1990s. One of the
reasons was that most of the theory of wavelets was developed in the context of deterministic
functions, not stochastic processes. Indeed, statistical theory and applications lagged behind
(the first most important ones being e.g. Donoho (1992), Donoho and Johnstone (1992)
and Donoho et al. (1993, 1995)). Today, however, as Schleicher (2002) notes, the number
of successful applications indicates that "wavelets are on the verge of entering mainstream
econometrics".
3 Essentials of Fourier and wavelet theories
The purpose of this section is to give the necessary technical background for the wavelet
multiresolution analysis (to be defined mathematically in the next section). It is suggested
that even those readers familiar with the basics of Fourier and wavelet theories at
least skim through this section to become acquainted with the notation used.
3.1 Fourier theory
3.1.1 Introduction
Not long ago many thought that the mathematical world was created out of
analytic functions. It was the Fourier series which disclosed a terra incognita
in a second hemisphere. — E. B. van Vleck 1914 (Bachman et al. (2000, p.
159).)
Begin by considering a Taylor series approximation of a function. Obviously the quality
of this approximation depends on the number of polynomial terms included. A defect of this
approach is that for a function to have a Taylor series, it must (among other things) be
infinitely differentiable in some interval. Sines and cosines, on the other hand, are more
versatile "prime elements" than powers of t. Indeed, sines and cosines may be used not
only to approximate non-analytic functions but also wildly discontinuous ones. Even the
periodicity of sines and cosines does not constitute a very serious limitation. Sines and
cosines triumphantly approximate functions f on [−π, π] in the sense that their Fourier
series converge (in some sense) to f. (Bachman et al. (2000, p. 139).)
In what follows, some basic concepts from functional analysis are required (see App.
A). Standard abbreviations are used throughout the thesis (see App. F) as well as the
following notation:
Notation 1 The boldfaced R denotes the set of real numbers, C denotes the set of complex
numbers, and K = R or C without specifying which. As usual, Z denotes the set of integers
and N denotes the set of natural numbers.
An integrable function f ∈ L_1^r[−π, π] (the superscript r refers to the real-valued elements
of L1[−π, π]) can be approximated by the trigonometric form of its Fourier series

$$\frac{a_0}{2} + \sum_{n\in\mathbb{N}} a_n \cos nt + \sum_{n\in\mathbb{N}} b_n \sin nt,$$

where

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\cos nt\,dt = \langle f, \cos nt/\pi\rangle, \quad n \in \mathbb{N}\cup\{0\}$$

(i.e. the inner product of f and the cosine term), and

$$b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\sin nt\,dt = \langle f, \sin nt/\pi\rangle, \quad n \in \mathbb{N},$$

are the cosine and sine Fourier coefficients of f, respectively (Bachman et al. (2000, pp.
139-40)).3 The coefficients a_n and b_n tell in what way the analyzing functions, i.e. the
cosines and sines, need to be modified in order to reconstruct the "signal", which is in this
case a deterministic function.
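As a numerical sanity check (my own illustration, not part of the thesis), the sine coefficients of the square wave f(t) = sign(t) on [−π, π] are known to be b_n = 4/(nπ) for odd n and 0 for even n; a simple midpoint-rule approximation of the defining integral reproduces them:

```python
import math

def fourier_b(f, n, m=20000):
    """Approximate b_n = (1/pi) * integral_{-pi}^{pi} f(t) sin(nt) dt
    by a midpoint Riemann sum with m subintervals."""
    h = 2 * math.pi / m
    total = 0.0
    for k in range(m):
        t = -math.pi + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += f(t) * math.sin(n * t)
    return total * h / math.pi

sign = lambda t: 1.0 if t > 0 else (-1.0 if t < 0 else 0.0)

print(fourier_b(sign, 1))  # ~ 4/pi      ≈ 1.2732
print(fourier_b(sign, 2))  # ~ 0
print(fourier_b(sign, 3))  # ~ 4/(3*pi)  ≈ 0.4244
```

The slow 1/n decay of these coefficients reflects how poorly the Fourier basis copes with a jump, which foreshadows the case for wavelets.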
Equivalently,

$$\sum_{n\in\mathbb{Z}} F[f](n)\,e^{int}$$

is the exponential form of the Fourier series for f ∈ L1[−π, π], where the nth Fourier
coefficient

$$F[f](n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\,e^{-int}\,dt, \quad n \in \mathbb{Z},$$

is the finite Fourier transform of f evaluated at n. It is said to be "finite" because
the domain of integration is finite. (Bachman et al. (2000, p. 264).)

3An early attempt to use trigonometric functions to approximate a function was made by Daniel
Bernoulli in 1753 when trying to solve the so-called wave equation ∂²w/∂t² = a²∂²w/∂x². In 1757 Euler
came up with the integral form of the Fourier coefficients. (Bachman et al. (2000, p. 140).)
3.1.2 Continuous Fourier transform
Clearly, one must be able to handle functions defined for all real t instead of just [−π, π].
For any f ∈ L1(R), then, the Fourier transform of f is

$$F[f](\omega) = \int_{-\infty}^{+\infty} f(t)\,e^{-i\omega t}\,dt,$$

and its inverse Fourier transform4 is

$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} F[f](\omega)\,e^{i\omega t}\,d\omega.$$
(Bachman et al. (2000, p. 278).) Intuitively, the Fourier transform serves as a bridge
between the time domain and the frequency domain by eliminating all time resolution
and leaving only frequency resolution. The Fourier transform basis functions are indexed
by a single parameter ω. In physical terms this means that the Fourier transform is char-
acteristic of the global behavior of the function. The physical concept of ”frequency” is
related purely to the family of coupled exponential functions that are used in the Fourier
transform (Priestley (1996, p. 90)).
Example 2 (Rectangular pulse) The Fourier transform of the indicator function of
[−a, a], i.e. of

$$f(t) = 1_{[-a,a]}(t) = \begin{cases} 1, & |t| \le a, \\ 0, & |t| > a, \end{cases} \quad a > 0,$$

is $F[1_{[-a,a]}](\omega) = \frac{\sin(a\omega)}{\omega/2} \notin L^1(\mathbb{R})$. Thus one has a non-integrable transform even in the very
tame case of a time-limited f; "time-limited" because the function f of time t vanishes
outside a closed interval. (Bachman et al. (2000, p. 281).)

4The inversion formula holds for a large class of functions f ∈ L1(R). However, integrable functions
can have non-integrable Fourier transforms. An example of such a function is f(t) = e^{−t}U(t), where
U(t) = 1_{[0,∞)} is the unit step function. Then F[f](ω) = 1/(1 + iω). But for large ω, |F[f](ω)| ≈ 1/|ω|, so
F[f](ω) ∉ L1(R). (Bachman et al. (2000, Ch. 5.5).)
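The closed form 2 sin(aω)/ω can be verified numerically (my own check, not from the thesis; the parameter values are arbitrary) by approximating the defining integral of e^{−iωt} over [−a, a]:

```python
import cmath
import math

def ft_pulse_numeric(a, w, m=20000):
    """Midpoint-rule approximation of integral_{-a}^{a} e^{-i w t} dt,
    i.e. the Fourier transform of the indicator of [-a, a] at frequency w."""
    h = 2 * a / m
    return sum(cmath.exp(-1j * w * (-a + (k + 0.5) * h)) for k in range(m)) * h

a, w = 1.0, 2.5
closed_form = math.sin(a * w) / (w / 2)  # = 2 sin(a w) / w, real-valued
print(abs(ft_pulse_numeric(a, w) - closed_form) < 1e-6)  # True
```

Note the 1/ω decay of the transform, which is why it fails to be integrable.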
The general properties of the Fourier transform are skipped here (see e.g. Bachman et al. (2000, Ch. 5.5)), but the following key concept must be introduced: the convolution of f, g ∈ L1(R) is

f ⋆ g(t) = ∫_{−∞}^{∞} f(t − x)g(x) dx.

It satisfies

F[f ⋆ g](ω) = ∫_{−∞}^{∞} f ⋆ g(t) e^{−iωt} dt = F[f](ω)F[g](ω).

It is important to realize that from a filtering point of view convolution acts like a linear filter. Because only "past values" are being transformed, the filter is causal. If the output is a finite sequence, one has a finite impulse response (FIR) filter.5 FIR filters are commonly used in technical analysis of financial markets; consider, for example, a simple moving average over a finite period. More generally, filters are used in economics and finance to extract components of time series such as trends, seasonalities, business cycles, and noise (see e.g. Hamilton (1994) and Gencay et al. (2002a)). Identification and extraction of components is important in terms of modeling and inference.6
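The moving-average reading of a causal FIR filter can be sketched numerically. The toy series and the window length below are illustrative choices, and NumPy is assumed:

```python
import numpy as np

# A 3-period simple moving average as a causal FIR filter: the filter
# weights (the impulse response) form a finite sequence.
x = np.array([3.0, 6.0, 9.0, 12.0])   # a toy "price" series
w = np.ones(3) / 3                    # equal FIR weights

# Full linear convolution; entry t equals sum_k w_k * x_{t-k}, so only
# "past values" of x enter -- the filter is causal.
y = np.convolve(x, w)

print(y[2], y[3])  # averages of (x_0,x_1,x_2) and (x_1,x_2,x_3): 6.0 9.0
```

The first few output entries are "warm-up" values computed from fewer than three observations, which is the usual boundary effect of a finite moving average.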
Example 3 (Rectangular pulse and smoothing) The convolution of a function f ∈ L1(R) with a rectangular pulse 1_{[−a,a]} results in a pulse that is generally smoother than f:

f ⋆ 1_{[−a,a]}(t) = ∫_{−∞}^{∞} f(t − x)1_{[−a,a]}(x) dx = ∫_{−a}^{a} f(t − x) dx = ∫_{t−a}^{t+a} f(u) du,

where u = t − x (Bachman et al. (2000, p. 292)). This operation eliminates spikes from f caused by transient phenomena in the manner of a moving average.

5 "Impulse response" refers to the fact that when one convolves with a (unit) "impulse", i.e. the so-called Dirac delta function, the output is the "response" to that convolution operation.

6 Some of the most well-known filters used in economics and finance are the exponentially weighted moving average, the Hodrick–Prescott filter, and the Baxter–King filter. Linear filters that are optimal in the mean square error sense are called Wiener filters. Filters based on the state-space technique and recursive algorithms are called Kalman filters. See Gencay et al. (2002a, Chs. 2.4 and 3).
Notice that all of the above has been defined for f ∈ L1(R). Functional analysis says that for finite intervals [a, b], the bigger p gets, the smaller Lp[a, b] becomes: q > p implies Lq[a, b] ⊂ Lp[a, b]. But the space Lp(R) does not shrink in this way as p increases. It is therefore possible that a square-integrable function does not belong to L1(R) and thus has no well-defined Fourier transform in the L1 sense.7 It turns out, fortunately, that it is possible to establish L2-versions of many results for L1 functions (see Bachman et al. (2000, Ch. 5.18)). In particular, for f, g ∈ L2(R) it holds that

‖f‖₂² = (1/2π)‖F[f]‖₂²,

i.e. the Fourier transform preserves energy (Bachman et al. (2000, p. 358)). So the Fourier transform is (practically) a linear isometry of L2(R) onto L2(R).8
In anticipation of the wavelet transform, a refinement of the Fourier transform is worth a discussion (for more details, see e.g. Priestley (1996) and Daubechies (1990)). The windowed Fourier transform is (Bachman et al. (2000, p. 480))

(T_{g,b}f)(ω) = ∫_{−∞}^{∞} f(t)e^{−iωt} g̅(t − b) dt,

where the window function g ∈ L2(R) is such that tg ∈ L2(R) (and the "bar" denotes a complex conjugate). This transform, better known as the short-time Fourier transform (STFT), attempts to balance between time and frequency by sliding a window across the time series and taking the Fourier transform of the windowed series. The result is a function of two parameters, frequency and time-shift, so it is time-dependent. The problem of the STFT is that the window size is fixed with respect to frequency. So the STFT still suffers from the lack of time resolution and thus does not purely characterize the properties of the function f. Of course, in theory, one could change the width of the window as a function of frequency ω, but then the relationship between ω and the width of g would be quite arbitrary. Furthermore, the physical interpretation of (T_{g,b}f) is open to the following question: "what exactly it is measuring and what type of representation for f leads to a meaningful interpretation of (T_{g,b}f)" (Priestley (1996, p. 91)).

7 For example, the function f(t) = (1/t)U(t − 1) belongs to L2(R) but not to L1(R) (Bachman et al. (2000, p. 356)).

8 A linear map A : X → Y (where X and Y are normed spaces) is said to be a linear isometry if ‖Ax‖ = ‖x‖ for every x ∈ X. This notion "enables us to exclude many apparently different objects as only trivially different". (Bachman et al. (2000, p. 25).)
The choice of the window g would appear to be very important in applications.9 If it is chosen so that it decays to zero very fast, then the integral above operates only over a very small time-domain "width". But then, according to the "uncertainty principle" (see e.g. Priestley (1988, p. 151)), the STFT loses resolution in the frequency domain (and vice versa). The most famous example is the Gaussian window

g_a(t) = (1/(2√(πa))) e^{−t²/(4a)},

resulting in the Gabor transform (Bachman et al. (2000, p. 480))

G(a, b; f)(ω) = ∫_{−∞}^{∞} f(t)e^{−iωt} (1/(2√(πa))) e^{−(t−b)²/(4a)} dt.

This window decays to zero as t → ±∞ (and integrates to one, unlike wavelets).
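Two of the Gaussian window's stated properties can be checked numerically: it integrates to one, and for a pure tone f(t) = e^{iω₀t} the Gabor transform evaluated at ω = ω₀ reduces to exactly that window mass, for any shift b. The grid, truncation range, and parameter values below are illustrative choices; NumPy and plain Riemann-sum quadrature are assumed:

```python
import numpy as np

a, b, w0 = 0.5, 1.0, 3.0
t = np.linspace(-30.0, 30.0, 240001)
dt = t[1] - t[0]

# Gaussian window g_a(t - b) = (1 / (2*sqrt(pi*a))) * exp(-(t-b)^2 / (4a)).
g = np.exp(-(t - b) ** 2 / (4 * a)) / (2 * np.sqrt(np.pi * a))

# The window integrates to one (unlike a wavelet, which integrates to zero).
window_mass = np.sum(g) * dt

# Gabor transform of the pure tone f(t) = exp(i*w0*t), evaluated at w = w0:
# the complex exponentials cancel exactly and only the window mass remains.
f = np.exp(1j * w0 * t)
G = np.sum(f * np.exp(-1j * w0 * t) * g) * dt

print(window_mass, G)  # both ~ 1
```

The cancellation at ω = ω₀ is why the STFT "detects" a tone of that frequency regardless of where the window is placed, which is another way of seeing its fixed, frequency-independent resolution.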
3.1.3 Discrete Fourier transform
In some troublesome cases a closed-form evaluation cannot be found and one has to use approximation. The range of integration is then truncated to an interval [a, b] and the integral for F[f] is approximated by a finite sum such as

F[f](ω) ≈ Σ_{k=0}^{N−1} f(t_k) e^{−iωt_k} Δt.

From a discrete perspective, one is dealing with the values of f at only a finite number of points {0, 1, ..., N − 1}. Consider f as defined on the cyclic group of integers modulo the positive integer N,

Z_N = Z/(N) (Z modulo N), where (N) = {kN : k ∈ Z},

and

f : Z_N → C, k + (N) ↦ f(k).
9 Actually the choice of a window is a theoretically interesting question too. It was proved in 1981 by Balian (incompletely, though) that one cannot have an orthogonal representation with windowed Fourier analysis when the window is reasonably regular and well-localized (Gaussian, say) (Hubbard (1998, p. 38)). It is thus a small miracle that very smooth (i.e. infinitely differentiable) orthogonal wavelets do indeed exist, as shown later (in Sec. 4).

This function f can be viewed as the N-periodic function defined on Z by taking

f(k + nN) = f(k) on Z, for k = 0, 1, ..., N − 1, n ∈ Z.

(Bachman et al. (2000, pp. 383–5).)
Since Z_N is finite, any function defined on it is integrable. Thus L1(Z_N) = L2(Z_N) = C^N, the collection of all functions f : Z_N → C. One gets the discrete Fourier transform (DFT) for f : Z_N → C,

D[f]_k = Σ_{t=0}^{N−1} f_t e^{−i2πtk/N},

and its inverse discrete Fourier transform,

f_t = (1/N) Σ_{k=0}^{N−1} D[f]_k e^{i2πtk/N}

(Bachman et al. (2000, p. 390)).
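The transform pair above can be implemented directly from its definition; NumPy's FFT routine uses the same sign convention, so the two agree. The length and the random test vector below are arbitrary illustrative choices:

```python
import numpy as np

N = 8
rng = np.random.default_rng(0)
f = rng.standard_normal(N)

# Direct O(N^2) evaluation of D[f]_k = sum_t f_t exp(-i 2 pi t k / N).
t = np.arange(N)
D = np.array([np.sum(f * np.exp(-2j * np.pi * t * k / N)) for k in range(N)])

# numpy.fft.fft uses the same (negative-exponent) convention.
assert np.allclose(D, np.fft.fft(f))

# Inverse DFT: f_t = (1/N) sum_k D[f]_k exp(+i 2 pi t k / N).
f_rec = np.array([np.sum(D * np.exp(2j * np.pi * t * tt / N))
                  for tt in range(N)]) / N
assert np.allclose(f_rec, f)
```

Note that sign and normalization conventions differ across references; here the 1/N factor sits in the inverse, matching the formulas in the text.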
Analogously to the continuous case, the energy preservation for discrete f, g ∈ L1(Z_N) = L2(Z_N) = C^N is

Σ_{k=0}^{N−1} |f_k|² = (1/N) Σ_{k=0}^{N−1} |D[f]_k|²

(Bachman et al. (2000, p. 393)). Another analogy is the discrete (cyclic) convolution f ⋆ g_t of f, g ∈ L1(Z_N), defined at each t ∈ Z_N by

f ⋆ g_t = Σ_{u=0}^{N−1} f_u g_{t−u},

satisfying

D[f ⋆ g]_k = Σ_{t=0}^{N−1} f ⋆ g_t e^{−i2πtk/N} = D[f]_k D[g]_k
(Bachman et al. (2000, p. 397)). Notice that this can also be written as (Percival and Walden (2000, pp. 29–30))

f ⋆ g_t = Σ_{u=0}^{N−1} f_u g_{(t−u) mod N}, for t = 0, ..., N − 1,

where the "modulo operation" is to remind that one is in fact using a periodic extension, i.e., g_{−1} = g_{N−1}, g_{−2} = g_{N−2}, and so forth.10 Yet another way of writing the discrete convolution is in matrix form:

  [ f⋆g_0     ]   [ g_0      g_{N−1}  g_{N−2}  ···  g_1 ] [ f_0     ]
  [ f⋆g_1     ] = [ g_1      g_0      g_{N−1}  ···  g_2 ] [ f_1     ]
  [   ⋮       ]   [   ⋮        ⋮        ⋮      ⋱    ⋮  ] [   ⋮     ]
  [ f⋆g_{N−1} ]   [ g_{N−1}  g_{N−2}  g_{N−3}  ···  g_0 ] [ f_{N−1} ],

where the rows of the N × N matrix are just circularly shifted versions of each other, i.e., the vectors g^{(1)}, g^{(2)}, ..., g^{(N−1)}, g^{(0)} shifted to the right by the amount k = 0, ..., N − 1. Finally, notice that the DFT can be represented in matrix form as D[f] = M_N f, f ∈ C^N, where M_N is an N × N matrix. I do not discuss the structure of this matrix here (see e.g. Bachman et al. (2000, Ch. 6)), but just comment briefly on its computational complexity. Namely, evaluating the DFT requires N² multiplications and N(N − 1) additions. However, the fast Fourier transform (FFT) algorithm for N = 2^k, a clever way of factorizing the matrix M_N, reduces the number of multiplications to something proportional to N log₂ N. This reduction has been so important for a wide range of applications that the FFT has been called "the most valuable numerical algorithm of our lifetime" (Strang (1993)). And although the modern literature on the FFT starts from the classic article of Cooley and Tukey (1965), the notion can be traced back as far as Gauss. (Bachman et al. (2000, Ch. 6).)
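The three equivalent formulations of cyclic convolution above (the defining modular sum, the circulant-matrix product, and the DFT convolution theorem computed with the FFT) can be cross-checked numerically. The vector length and random inputs are illustrative; NumPy is assumed:

```python
import numpy as np

N = 8
rng = np.random.default_rng(1)
f = rng.standard_normal(N)
g = rng.standard_normal(N)

# (1) Defining sum with periodic extension: (f*g)_t = sum_u f_u g_{(t-u) mod N}.
conv_sum = np.array([sum(f[u] * g[(t - u) % N] for u in range(N))
                     for t in range(N)])

# (2) Circulant-matrix form: entry (t, u) of the matrix is g_{(t-u) mod N}.
C = np.array([[g[(t - u) % N] for u in range(N)] for t in range(N)])
conv_mat = C @ f

# (3) DFT convolution theorem: D[f*g]_k = D[f]_k D[g]_k, inverted by the FFT.
conv_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

assert np.allclose(conv_sum, conv_mat)
assert np.allclose(conv_sum, conv_fft)
```

Route (3) is the practically important one: it replaces the O(N²) sum or matrix product with O(N log N) work, which is precisely the FFT speed-up discussed above.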
3.2 Wavelet theory
3.2.1 Introduction
One of the main ideas of wavelet analysis is to use functions different from sinusoids to approximate a function.11 The crucial difference to Fourier analysis is that in wavelet analysis one is expressing a possibly continuous function in terms of discontinuous wavelets. By stretching (dilating) and shifting (translating) a "mother wavelet", one is able to capture features that are local both in time and in frequency. This property alone makes wavelets more suitable for analyzing the non-stationary or transient signals that are often present in nature, finance, and economics. In other words, the wavelet basis is more interesting than the Fourier basis because "unlike sines and cosines, individual wavelet functions are quite localized in space; simultaneously, like sines and cosines, individual wavelet functions are quite localized in frequency or (more precisely) characteristic scale" (Press et al. (1992, p. 584)).

10 More precisely, consider the expression "j mod N". If 0 ≤ j ≤ N − 1, then j mod N := j. But if j is any other integer, then j mod N := j + nN, where n is the unique integer such that 0 ≤ j + nN ≤ N − 1. (Percival and Walden (2000, p. 30).)

11 Almost all the literature on wavelet analysis deals with the representation of deterministic square-integrable functions. Admittedly, this approach is not the most insightful when dealing with time series.
Wavelets are functions that satisfy the following two conditions (an exact definition of a wavelet will be given later):

∫_{−∞}^{∞} ψ(t) dt = 0,    (1)

∫_{−∞}^{∞} |ψ(t)|² dt = 1.    (2)

That is, wavelets have zero average and unit energy. This guarantees that a wavelet has non-zero entries but that those entries must eventually cancel out. Originally, Morlet saw the zero average condition as a physical necessity: seismic time series undergo compressions and rarefactions that must eventually cancel out. (Percival and Walden (2000, Ch. 1.1).)
Example 4 (Morlet wavelet) The classic example of a continuous-time wavelet is the Morlet wavelet (Gencay et al. (2002a, p. 102)):

ψ_{Morlet}(t) = (1/√(2π)) e^{−iωt} e^{−t²/2}

(see Fig. 2).
There are two central concepts in wavelet theory that appear constantly: dilation and translation. They should therefore be well understood. For a real-valued, square-integrable function f(x) ∈ L2(R), a (dyadic) dilation by j ∈ Z is defined as f_{j,0}(x) := 2^{j/2} f(2^j x), where the factor 2^{j/2} guarantees energy preservation, i.e., ‖f(x)‖₂ = ‖f_{j,0}(x)‖₂. On the other hand, a translation of a real-valued function f(x) by k is defined as f_{0,k}(x) :=

Nevertheless, I will follow the tradition and deal with stochasticity later (in Sec. 5).
Figure 2: Real portion of the Morlet wavelet in time domain. In frequency domain the Morlet wavelet is well localized, too (not plotted here, though).
f(x − k) for all k ∈ Z. However, in practical applications the indexing of j often runs in the opposite direction and the definition of dilation must then be changed accordingly. In this "reverse" custom (used in Secs. 5 and 7), the jth level dilation of the kth level translation of f(x) is defined as

f_{j,k}(x) := 2^{−j/2} f(2^{−j}x − k),

where x ∈ (−∞, ∞) and j, k ∈ Z (see e.g. Percival and Walden (2000, p. 459)).
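The energy-preserving role of the 2^{−j/2} factor in this "reverse" convention can be checked numerically. The Gaussian test function, grid, and index choices below are illustrative; NumPy and Riemann-sum integration are assumed:

```python
import numpy as np

def energy(h, x):
    """Riemann-sum approximation of the squared L2 norm of h on grid x."""
    return np.sum(np.abs(h) ** 2) * (x[1] - x[0])

f = lambda x: np.exp(-x ** 2)          # an arbitrary finite-energy function
x = np.linspace(-60.0, 60.0, 600001)

# "Reverse" convention: f_{j,k}(x) = 2^{-j/2} f(2^{-j} x - k).
j, k = 2, 3
f_jk = 2.0 ** (-j / 2) * f(2.0 ** (-j) * x - k)

# Dilation and translation together preserve the L2 norm.
print(energy(f(x), x), energy(f_jk, x))  # equal (up to quadrature error)
```

The substitution u = 2^{−j}x − k in the norm integral shows why: dx = 2^j du exactly cancels the factor 2^{−j} coming from |2^{−j/2}|².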
3.2.2 Continuous wavelet transform
The wavelet transform was originally developed as an analysis and synthesis tool for finite-
energy signals x(t) ∈ L2(R). Loosely speaking, the wavelet transform is an intelligently
adaptive tool that, in Meyer’s words (from Hubbard (1998, pp. 33—35)), jumps ”straight
to essentials"; and, "contrary to what happens with Fourier series, the coefficients of
the wavelet series translate the properties of the function or distribution simply, precisely
and faithfully”. Indeed, the wavelet transform is especially suited for analyzing signals
that display strong transients, for example discontinuity, rupture, or ”the unforeseen”.
More precisely, if f(t) has such singularities, these will affect only the coefficients at time
points near the singularities. In contrast, the standard Fourier transform described above
depends on the global properties of f(t) and any singularity in it will affect all coefficients.
The continuous wavelet transform (CWT) of f ∈ L2(R) by ψ ∈ L2(R) is a projection of the function f onto a particular wavelet ψ_{s,u}(t):

(W_ψ f)(s, u) = ∫_{−∞}^{∞} f(t) ψ_{s,u}(t) dt,

where

ψ_{s,u}(t) = (1/√s) ψ((t − u)/s)

is the dilated (by s) and translated (by u) version of the mother wavelet ψ (Bachman et al. (2000, p. 483)). The most obvious difference between the Fourier transform and the
CWT is that the wavelet basis functions are indexed by two parameters instead of just
one. The scale s is assumed to be restricted to R+, which is natural since s may, although
tenuously, be interpreted as a reciprocal of frequency. As Priestley (1996, p. 90) explains,
this relationship may be established in the case of an oscillatory mother wavelet ψ (such
as Morlet), because then as s decreases the ”oscillations” become more intense and show
”high-frequency” behavior. Similarly, when s increases the ”oscillations” become drawn
out and show ”low-frequency” behavior.
So the CWT turns the function f(t) of one parameter t to continuous wavelet
coefficients (Wψf)(s, u) that depend on two parameters, scale and location. Obviously,
a large wavelet coefficient occurs when the wavelet ψs,u(t) and the function f(t) match
in shape. Thus the above integral measures the variation of f in the neighborhood of u,
whose size is proportional to s. In other words, by calculating the wavelet transform one is
analyzing the potentially complex structure of the function by decomposing it into simpler
components. The key element of wavelets is their adaptiveness to different components of
a signal. Namely, by using a small window one looks at high-frequency components and
by using a large window one looks at low-frequency components. For this reason wavelets
are often called a ”mathematical microscope”. The capability of zooming by changing
the width of the window differentiates the CWT from the STFT. And because of the
time indexing of wavelets (and their compactness), the projection onto a wavelet space is
essentially local. This should be contrasted with the projection in Fourier analysis which
is essentially global although some localization can be achieved by first convolving the
function with a filter that decreases rapidly in modulus of the weights (as explained in
Sec. 3.1.2). The uncertainty principle still holds true, however: in an ideal sense, one
cannot choose a mother wavelet in such a way that it achieves ”good localization” in both
the time and frequency domains (Priestley (1996, p. 91)).
The wavelet transform is an energy preserving transformation,

∫_{−∞}^{∞} |f(x)|² dx = (1/C_ψ) ∫_{−∞}^{∞} ∫_{0}^{∞} |(W_ψ f)(s, u)|² (ds/s²) du,

if the mother wavelet ψ ∈ L2(R) satisfies the so-called admissibility condition (Mallat (1998, Th. 4.3))

C_ψ = ∫_{0}^{∞} (|F[ψ](ω)|²/ω) dω < ∞,

first proven by the mathematician Calderón in 1964. To guarantee the finiteness of the above integral, one must have F[ψ](0) = 0, meaning that ∫_{−∞}^{∞} ψ(t) dt = 0. Conversely, if the zero average property holds and if ∫_{−∞}^{∞} (1 + |t|^α)|ψ(t)| dt < ∞ for some α > 0, then C_ψ < ∞. So if the admissibility condition indeed does hold, then the inverse continuous wavelet transform exists and any f ∈ L2(R) is given by the so-called resolution of identity (or Calderón's reproducing identity)

f(t) = (1/C_ψ) ∫_{0}^{∞} ∫_{−∞}^{∞} (W_ψ f)(s, u) ψ_{s,u}(t) (ds/s²) du.

That is, the function f(t) can be synthesized by reconstructing it from the corresponding wavelet coefficients (W_ψ f)(s, u). In fact, this perfect reconstruction property of square-integrable functions is the key property of wavelet transforms.
Other properties of the CWT are skipped here (see e.g. Vidakovic (1999, pp. 46–47)). Also the regularity (smoothness) of scaling and wavelet functions, measured in terms of Lipschitz (or Hölder) exponents, is skipped because of its excess technicality. One practically useful characterization of wavelets is their number of vanishing moments, however. If the wavelet function ψ(t) and P − 1 of its derivatives are everywhere continuous and satisfy certain regularity conditions, then ψ(t) has P vanishing moments, i.e.,

∫_{−∞}^{∞} t^p ψ(t) dt = 0, for p = 0, 1, ..., P − 1.

Only if the wavelet function has P vanishing moments can the wavelet function ψ(t) and P − 1 of its derivatives be continuous, but this is not guaranteed (Percival and Walden (2000, p. 483)).12 The continuity of the wavelet and its derivatives is important when analyzing a signal in practice because it helps to prevent artifacts due to the wavelet itself. As a rule of thumb, the smoother the analyzing wavelet, the more "reliable" the outcome. Furthermore, a wavelet with P vanishing moments guarantees that one has a multiscale differential operator of order P (see below).
12One can measure the local regularity of a signal with a wavelet that has enough vanishing moments.
The decay of the wavelet transform amplitude across scales is related to the uniform and pointwise
Lipschitz regularity of the signal. By zooming into signal structures with a scale going to zero one can
then measure the asymptotic decay. (Mallat (1998, Ch. 6.1).)
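As a concrete instance of the moment condition, the Haar wavelet (introduced formally below as Example 6) has exactly one vanishing moment: its zeroth moment vanishes while its first moment equals −1/4. A quick Riemann-sum check, with NumPy assumed and the grid chosen for illustration:

```python
import numpy as np

t = np.linspace(-0.5, 1.5, 200001)
dt = t[1] - t[0]

# Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere.
psi = np.where((t >= 0) & (t < 0.5), 1.0,
               np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

m0 = np.sum(psi) * dt       # int psi(t) dt:    vanishes (p = 0)
m1 = np.sum(t * psi) * dt   # int t psi(t) dt:  equals -1/4, so P = 1
print(m0, m1)
```

Consequently the Haar wavelet, which is not even continuous, sits at the bottom of the smoothness scale sketched above; the smoother Daubechies wavelets discussed later trade longer support for more vanishing moments.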
3.2.3 Discrete wavelet transform
The CWT is a highly redundant transform.13 It is defined at all points in the time-frequency plane and the wavelet coefficients contain more information than necessary for the perfect reconstruction property to hold. By a clever discretization of the CWT one can reduce the number of wavelet coefficients to the minimum while still preserving all information of the function. By choosing s and u according to a rule known as critical sampling (Vidakovic (1999, Sec. 3.2)),

s = 2^{−j} and u = k2^{−j},

one gets the discrete wavelet transform (DWT)

(W_ψ f)(j, k) = 2^{j/2} ∫_{−∞}^{∞} f(t) ψ(2^j t − k) dt = ⟨f, ψ_{j,k}⟩, j, k ∈ Z.
Another possibility is to choose s = 2−j and u = k, in which case one has the maximal
overlap DWT (Gencay et al. (2002a, p. 106)). This transform has different properties
from the ”ordinary” DWT and it will be discussed in more detail later. The computational
aspects of these transforms are discussed later, as well.
The critical sampling gives the functions ψ_{j,k}(t) = 2^{j/2} ψ(2^j t − k) and the set of discrete wavelet coefficients (W_ψ f)(j, k) for f ∈ L2(R), where j and k represent the set of discrete dilations and translations, respectively. Importantly, the functions ψ_{j,k} form an orthonormal system called a wavelet basis. One has arrived at a key definition (see e.g. Wojtaszczyk (1997, p. 17)):

Definition 5 Wavelet. A mother wavelet is a function ψ(t) ∈ L2(R) such that the family of functions, called wavelets,

ψ_{j,k}(t) = 2^{j/2} ψ(2^j t − k), j, k ∈ Z,

is an orthonormal basis in the Hilbert space L2(R).
13 Typically the CWT is redundant by a factor of ten. Alternatively, the signal can be said to be oversampled, meaning that Shannon's sampling theorem would hold with fewer samples. (Hubbard (1998, p. 36).)
Example 6 (Haar wavelet) The oldest, the most famous, and the simplest wavelet basis is the Haar basis. It was discovered by the mathematician Alfred Haar in 1909. The functions that create the Haar basis are constructed from

ψ_{Haar}(t) = 1 on [0, 1/2), −1 on [1/2, 1), and 0 elsewhere

(see Fig. 3). The function ψ_{Haar} is extremely localized in time space but clearly not continuous. However, in frequency space it is poorly localized (see Fig. 4). It enjoys a special role because it is the only known symmetric compactly supported wavelet.
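The orthonormality asserted in Definition 5 can be spot-checked for the Haar family on a few (j, k) pairs. The grid and the particular pairs are illustrative choices; NumPy and Riemann-sum inner products are assumed:

```python
import numpy as np

def haar(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

t = np.linspace(-4.0, 4.0, 800001)
dt = t[1] - t[0]

def psi(j, k):
    """Wavelet family member psi_{j,k}(t) = 2^{j/2} psi(2^j t - k)."""
    return 2.0 ** (j / 2) * haar(2.0 ** j * t - k)

def inner(u, v):
    return np.sum(u * v) * dt

print(inner(psi(0, 0), psi(0, 0)))  # ~ 1: unit energy
print(inner(psi(0, 0), psi(0, 1)))  # ~ 0: disjoint supports within a scale
print(inner(psi(0, 0), psi(1, 0)))  # ~ 0: orthogonal across scales
```

The cross-scale orthogonality is the less obvious case: ψ_{1,0} is constant (+1 then −1, scaled) on each half of [0, 1/2), so the product against ψ_{0,0} integrates to zero there.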
4 Multiresolution analysis
This section explains, mathematically, how wavelet methodology is used to decompose
a deterministic finite-energy function with respect to a resolution (time-scale). The key
concept of multiresolution analysis is introduced. Importantly, wavelets are shown to act
as linear filters. The construction of compactly supported wavelets belonging to the family
of Daubechies is discussed in detail.
4.1 Introduction
The process of understanding is always facilitated if more complicated struc-
tures are known to be synthesized from simpler ones — matter from molecules,
molecules from atoms, atoms from quarks, organisms from cells, integers from
primes. (Bachman et al. (2000, p. 139).)
Economists often emphasize the importance of discerning between long-run and short-
run behavior. The distinction between permanent and transitory shocks, or the distinction
between equilibrium and the efficiency of dynamic adjustment are examples of situations
that involve the notion of time-scale. In econometrics, a cointegrated vector autoregressive
model is an implementation of the idea of long-run equilibrium supplemented with short-
run dynamics, as Diebold (2004) notes. But there is no a priori reason to restrict to this
Figure 3: The Haar mother wavelet ψ being first translated by 1 to yield ψ_{0,1}, dilated by −1 to yield ψ_{−1,0}, and finally dilated by −1 and translated by 1 to yield ψ_{−1,1}.
Figure 4: Time and frequency representation of the Haar function. In time domain the Haar function is extremely localized, but in frequency domain it is very dispersed.
simplistic dichotomy instead of generalizing into multiple time horizons! This has led to
a growing awareness of the importance of different time-scales on economic and financial
actions and decisions. In fact, one of the central themes running through Nobel laureate
Engle’s work is the difference in economic dynamics across frequencies. For example, Engle
(1973) explicitly considered the idea of decomposition of dynamics by frequency through
band-spectral regression. More recent examples that use a time-scale dependent approach
in economics are Shleifer and Vishny (1990) and Osler (1995).
Not surprisingly, time-scale dependency is found useful also in finance where the het-
erogeneity of investors (see e.g. Muller et al. (1993)) makes it natural to study different
time-scales. Consider for example data on interest rates and bond maturities that encap-
sulate the distinctions between decision makers with different time horizons. Furthermore,
the recent availability of high-frequency financial data has allowed the analysis of high-
frequency dynamics such as microstructure noise, intraday calendar effects, and the arrival
of quotes and trades in real time (see e.g. Engle and Russell (1998) and Engle (2000)).
A recent example taking a multiscale look at high-frequency data is Ghysels et al. (2003)
who use the so-called ”mixed data sampling” regressions to compare the predictive per-
formance of various models of volatility at several different ”frequencies”.
Wavelet methodology is by its nature a multiscale approach: by using several different
combinations of dilations and translations one is able to capture the information hidden
in several different time-scales. In a selective review article, Ramsey (1996) describes
the time-scale decomposition based on wavelets as their ”most promising” opportunity
in economics. In particular, Ramsey regards the investigation of potentially time varying
phase relationships (causalities) between variables of interest as uniquely well suited
for wavelet decompositions. Such time varying relationships are studied in, for example,
Ramsey and Lampart (1998a) and Ramsey and Lampart (1998b). In the former, the
causality between money and income is found to be ambiguous: the direction of the
relationship depends on the level of the time-scale. Extending this to European countries
entering the European Monetary Union, Chew (2001) found that the lower frequency
components of volatility of velocity are primarily affected. Similarly, Atkins and Sun (2003)
found that the Fisher effect, i.e. the empirical relationship between nominal interest rates
and inflation, cannot be identified at short time-scales but only at the largest time-scale.
In stock and foreign exchange rate markets the success of time-scale decomposition
seems to be less persuasive, though. Ramsey (2002) argues that this might be so because
there is a much greater degree of "mixing" inherent in these data, originating from rapid and
extensive arbitrage activities. Maybe so, but this does not necessarily imply a lost case.
For example, Gencay and co-authors report several interesting applications of wavelets in
finance (see Gencay et al. (2002a)). In particular, Gencay et al. (2002c) study the robust-
ness of systematic risk (beta of an asset) across time-scales using wavelets and conclude
that the beta varies as a function of time-scale. In a financial risk management related
paper, Gencay et al. (2002b) use hidden Markov models in combination with wavelets
to study the asymmetry of information flow between volatilities across time-scales. The
lesson to be learned is that time-scale decompositions can provide very useful information
of the underlying complex dynamics of financial markets. At the bare minimum, one will
obtain a first-hand qualitative description that can be utilized in risk management, for
instance.
In the next subsection the concept of multiresolution analysis is first discussed in a fairly
formal way. Theorems and proofs are avoided to make the exposition more readable. Those
looking for a more rigorous treatment can for example turn to Bachman et al. (2000), who,
quite amusingly in their own Foreword, wonder if their coverage might be best described
as ”wavelets for idiots?” (mine is for econometricians?).
4.2 Multiresolution decomposition
One of the central theoretical questions of wavelets is how to construct an orthonormal
basis different from the Fourier basis. Even the existence of such a basis is not obvious
for a smooth function. To solve this problem one can use a technique called (dyadic)
multiresolution analysis (MRA) originally developed by Yves Meyer and Stephane
Mallat in the 1980s. In short, the construction relies heavily on the fact that MRAs
produce an orthogonal direct sum decomposition of L2(R) (shown in this subsection). It
then follows that an MRA is able to produce a mother wavelet ψ(t) such that the wavelets
ψj,k(t) = 2j/2ψ(2jt− k), where j, k ∈ Z, comprise an orthonormal basis for L2(R) (see the
next two subsections). Indeed, one often sees statements like ”good wavelets are usually
constructed starting from a multiresolution analysis" (Wojtaszczyk (1997, p. 17)).14
Definition 7 Multiresolution analysis. A sequence of closed subspaces {V_j : j ∈ Z} of L2(R) together with a function ϕ ∈ V_0 is called a multiresolution analysis if it satisfies the following conditions (e.g. Bachman et al. (2000, p. 414) and Vidakovic (1999, p. 51)):

(a) ··· ⊂ V_{−1} ⊂ V_0 ⊂ V_1 ⊂ ··· (Increasing);

(b) Closure(∪_{j∈Z} V_j) = L2(R) (Density);

(c) ∩_{j∈Z} V_j = {0} (Separation);

(d) f(t) ∈ V_j if and only if f(2t) ∈ V_{j+1} (Scaling);

(e) There exists a scaling function (of the MRA) ϕ ∈ V_0 whose integer translates span the space V_0, i.e., V_0 = {f ∈ L2(R) | f(t) = Σ_{k∈Z} g_k ϕ(t − k)}, and the set {ϕ(t − k) : k ∈ Z} is an orthonormal basis for V_0 (Orthonormality).
In words, Condition (a) means that the signal to be analyzed at a given resolution contains all the information of the signal at coarser resolutions.15 Condition (b) means that any signal can be approximated with arbitrary precision. Condition (c) means that the zero function is the only object common to all the spaces V_j. Condition (d) shows that there is really only one space, e.g. V_0; all the other spaces are scaled versions (resolutions) of the prototype V_0. And finally, Condition (e) means that the scaling function must be
14 Dyadic MRAs have been powerful in image compression and noise reduction. However, non-dyadic MRAs also exist. There the study has concentrated on generalizing dyadic wavelets to M-band wavelets (Steffen et al. (1993)) and dyadic wavelet analysis to mixed-radix wavelet analysis (Pollock and Lo Cascio (2003, 2004)). The motivation behind non-dyadic constructions is that the dyadic scheme might be too restrictive for empirical time series because there is no a priori reason why economic and financial structures, in particular, should stay within dyadic bands. Nevertheless, in what follows I will only consider a dyadic MRA.

15 One should be very careful with the dilation index j here. The above convention is due to Mallat, but in applications (algorithms) one often prefers Daubechies' convention, where the indexing runs in the opposite direction. For this reason I will follow Mallat's convention in this section but use Daubechies' in the next ones that deal with empirics.
orthonormal to its translates by integers.16
It is assumed that the scaling function ϕ satisfies ∫_{−∞}^{∞} ϕ(t) dt ≠ 0 (made more precise later), which obviously differentiates it from wavelets (see Eq. (1)). By Conditions (d) and (e), the functions ϕ_{1,k}(t) = √2 ϕ(2t − k), k ∈ Z, constitute an orthonormal basis in V_1. Since V_0 ⊂ V_1, the scaling function ϕ belongs to V_1 and can be represented as a linear combination of functions from V_1:

ϕ(t) = Σ_{k∈Z} g_k √2 ϕ(2t − k) (dilation equation),    (3)

where the g_k's are the Fourier coefficients of ϕ in V_1 with respect to the basis {√2 ϕ(2t − n), n ∈ Z}:

g_k = ∫_{−∞}^{+∞} ϕ(t) ϕ_{1,k}(t) dt = ⟨ϕ(t), √2 ϕ(2t − k)⟩.

The dilation equation (Eq. (3)) is also known as the two-scale difference equation, relating two scales to each other.
By assuming that functions satisfying the conditions (a)–(e) of an MRA exist (they do in fact!), it follows that there exists a wavelet that "with its translates by integers and dilates by a factor of two, can encode the difference of information between the signal seen at two successive resolutions" (Hubbard (1998, p. 173)).17 In purely mathematical terms this means that the space W_j, called resolution level j of an MRA, associated with the wavelet is the orthogonal complement W_j = V_j^⊥ of V_j in V_{j+1} (the superscript refers to orthogonality), and it represents the difference between V_j and V_{j+1}:

V_{j+1} = V_j ⊕ W_j,

i.e. every x ∈ V_{j+1} can be uniquely expressed as a sum x = y + z, y ∈ V_j, z ∈ W_j, and V_j ∩ W_j = {0}. By iteration one then finds that

V_{n+1} = V_0 ⊕ (⊕_{j=0}^{n} W_j), n ∈ N,
16 The requirement of orthogonality can be relaxed to obtain a Riesz MRA. Namely, it is sufficient to assume that {h(t − k) : k ∈ Z} be a Riesz basis for V_0, which was Mallat's original assumption (see e.g. Mallat (1998)). There would then exist a function ϕ ∈ V_0 such that {ϕ(t − k) : k ∈ Z} is an orthonormal basis for V_0.

17 These conditions are not independent of each other, however: Conditions (a), (d), and (e) imply (c) (Bachman et al. (2000, pp. 449–50)). Furthermore, Conditions (a) and (e) are not at all obvious.
and the closed subspaces V_0, W_0, W_1, ... are mutually orthogonal by construction. And since (by similar arguments) V_0 = ⊕_{n=1}^{+∞} W_{−n}, it follows that an MRA ((V_n), ϕ) produces an orthogonal direct sum decomposition of L2(R):

L2(R) = ⊕_{n∈Z} W_n,

where W_n = V_n^⊥ in V_{n+1} (Bachman et al. (2000, pp. 417–8)).
Concretely, the above means that any finite-energy function can be decomposed in terms of scaling functions and wavelets. Specifically, the so-called inhomogeneous wavelet expansion is

x(t) = Σ_k v_{0,k} ϕ_{0,k}(t) + Σ_{j=0}^{∞} Σ_k w_{j,k} ψ_{j,k}(t),

where the v_{0,k} coefficients summarize the general form of the function and the w_{j,k} coefficients represent the local details. These coefficients, known as the scaling and wavelet coefficients, are calculated as, respectively,

v_{0,k} = ∫_{−∞}^{∞} x(t) ϕ_{0,k}(t) dt and w_{j,k} = ∫_{−∞}^{∞} x(t) ψ_{j,k}(t) dt.

This can also be formulated in terms of wavelets only by the so-called homogeneous wavelet expansion

x(t) = Σ_{j=−∞}^{∞} Σ_k w_{j,k} ψ_{j,k}(t),

where the "reference" space V_0 is eliminated.18 (Härdle et al. (1998, p. 28).)
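A discrete analogue of the inhomogeneous expansion, sketched here for illustration with the Haar filters on a toy vector (NumPy assumed): one analysis step splits the signal into coarse scaling coefficients and detail wavelet coefficients, from which the signal is perfectly reconstructed.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 6.0, 4.0, 2.0])

# One-level discrete Haar analysis: normalized pairwise averages give the
# scaling coefficients v ("general form"), normalized pairwise differences
# give the wavelet coefficients w ("local details").
v = (x[0::2] + x[1::2]) / np.sqrt(2)
w = (x[0::2] - x[1::2]) / np.sqrt(2)

# Synthesis: the expansion reconstructs x exactly from (v, w).
x_rec = np.empty_like(x)
x_rec[0::2] = (v + w) / np.sqrt(2)
x_rec[1::2] = (v - w) / np.sqrt(2)

assert np.allclose(x_rec, x)                                   # perfect reconstruction
assert np.isclose(np.sum(x**2), np.sum(v**2) + np.sum(w**2))   # energy splits across V and W
```

Iterating the same step on v produces the multi-level pyramid used by the DWT algorithms discussed in the empirical sections.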
4.3 Filters and wavelets
Wavelets can be regarded as special kind of ”band-pass” filters. In the present context, the
notion of a filter is defined mathematically as follows (Bachman et al. (2000, pp. 423—4)):
Definition 8 Filter. Let ((Vn), ϕ) be an MRA. Then any g ∈ V1 can be written in terms of the orthonormal basis {√2ϕ(2t − k) : k ∈ Z} for V1 as
\[ g(t) = \sum_{k \in \mathbb{Z}} b_k \sqrt{2}\,\varphi(2t - k). \]
18 One could as well choose Vj0 , j0 ∈ Z, as the reference space. One then obtains the same expansion formulas with 0 replaced by j0. (Hardle et al. (1998, p. 28).)
Since $\sum_{k \in \mathbb{Z}} |b_k|^2 < \infty$, one can form the 2π-periodic function
\[ m_g(\omega) = \frac{1}{\sqrt{2}} \sum_{k \in \mathbb{Z}} b_k e^{-ik\omega} \in L^2(\mathbb{T}), \]
where T denotes the circle group, i.e., the multiplicative group consisting of the points of the unit circle T = {z ∈ C : |z| = 1} in the complex plane. The function mg is called the filter associated with g.
The function mg is better known as the transfer function or the frequency response function (FRF). In general, the transfer function characterizes a filter. The use of the transfer function mϕ in particular (notice the subscript ϕ) is that it enables one to express the Fourier transform F[ϕ] of ϕ ∈ V1 in terms of mϕ and the Fourier transform of the scaling function itself. Namely, by taking the Fourier transform of the dilation equation (Eq. (3)) one has (Vidakovic (1999, p. 53))
\[ F[\varphi](\omega) = \sum_k \sqrt{2}\,g_k \int_{-\infty}^{\infty} \varphi(2t-k)e^{-i\omega t}\,dt = \sum_k \frac{g_k}{\sqrt{2}} e^{-ik\omega/2} \int_{-\infty}^{\infty} \varphi(2t-k)e^{-i(2t-k)\omega/2}\,d(2t-k) = m_\varphi\Big(\frac{\omega}{2}\Big)\, F[\varphi]\Big(\frac{\omega}{2}\Big) \quad \text{(filter equality)}, \tag{4} \]
for ϕ ∈ V1; or equivalently, F[ϕ](2ω) = mϕ(ω)F[ϕ](ω). This implies that different mϕ(ω/2) give different scaling functions.
In Bachman et al. (2000, Th. 7.4.8) it is shown that a similar equality holds for a mother wavelet ψ:
\[ F[\psi](\omega) = e^{-i(\frac{\omega}{2}+\pi)}\, m_\varphi\Big(\frac{\omega}{2}+\pi\Big)\, F[\varphi]\Big(\frac{\omega}{2}\Big). \]
By the inverse Fourier transform one then recovers a mother wavelet ψ ∈ V1 whose integer translates {ψ(t − n) : n ∈ Z} form an orthonormal basis for W0 = V1 ∩ V0⊥, and which satisfies
\[ \psi(t) = \sum_{k \in \mathbb{Z}} h_k \sqrt{2}\,\varphi(2t-k), \tag{5} \]
where the wavelet coefficients are h_k = (−1)^k g_{1−k}, and (g_k) ∈ ℓ²(Z) are the scaling coefficients. It follows that ϕ and ψ have the same regularity properties.
Importantly, the filter (transfer function, FRF)
\[ m_\varphi(\omega) = \frac{1}{\sqrt{2}} \sum_{k \in \mathbb{Z}} g_k e^{-ik\omega} \]
is a low-pass filter associated with the scaling function ϕ, defined as follows:
Definition 9 Low-pass filter. A "system" consists of an output v, an impulse response h, and an input u. If the Fourier transformed impulse response h in a system F[v](ω) = F[h](ω)F[u](ω) is zero for high-frequency components, i.e. if F[h](ω) = 0 for |ω| ≥ ω0, then the system is called a low-pass filter.
A simple discrete example of a filter of finite length (width) 2 is the following:

Example 10 (2-period simple moving average) Gencay et al. (2002a, p. 35) consider a moving average of the form y_t = ½(x_t + x_{t−1}) having the FRF (with ω = 2πf)
\[ \frac{1}{2}\left(1 + e^{-i\omega}\right) = \frac{1}{2}\left(e^{-i\pi f} + e^{i\pi f}\right)e^{-i\pi f} = \left[\cos(\pi f)\right] e^{-i\pi f}. \]
When f = 0 the FRF is 1, and as f approaches 1/2 the FRF approaches 0, implying that this is a low-pass filter.
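This low-pass behaviour is easy to check numerically. The following sketch (plain Python; the helper names are my own) evaluates the FRF directly and compares it with the closed form derived above:

```python
import cmath
import math

def frf_ma2(f):
    # FRF of y_t = (x_t + x_{t-1})/2 at frequency f, with omega = 2*pi*f
    return 0.5 * (1 + cmath.exp(-2j * math.pi * f))

def frf_closed(f):
    # the closed form above: cos(pi f) * e^{-i pi f}
    return math.cos(math.pi * f) * cmath.exp(-1j * math.pi * f)

# zero frequency passes with unit gain; the Nyquist frequency is annihilated
assert abs(abs(frf_ma2(0.0)) - 1.0) < 1e-12
assert abs(frf_ma2(0.5)) < 1e-12
# the two expressions agree at an arbitrary interior frequency
assert abs(frf_ma2(0.3) - frf_closed(0.3)) < 1e-12
```

The endpoint checks confirm the low-pass interpretation: unit gain at f = 0 and (numerically) zero gain at f = 1/2.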
In practice, this means that convolving the scaling function with data gives the empirical scaling coefficients, which are an approximation of the original series with the high-frequency detail filtered out. Notice that when the filter is of finite length, the FRF is often defined a bit differently (as in Percival and Walden (2000)). I adapt to this by using a parallel notation and define
\[ G(f) \doteq \sum_{l=0}^{L-1} g_l e^{-il2\pi f} \]
for the transfer function of a scaling filter g_l of length L. Notice in particular that the scaling factor has changed by √2. It can now be shown that such a filter must satisfy:
\[ \sum_{l=0}^{L-1} g_l = \pm\sqrt{2}, \tag{6} \]
\[ \sum_{l=0}^{L-1} g_l^2 = 1, \qquad \text{and} \qquad \sum_{l=0}^{L-1} g_l g_{l+2n} = 0, \tag{7} \]
for all non-zero integers n (Percival and Walden (2000, p. 76)).
Similarly to the above:
Definition 11 High-pass filter. If F [h](ω) = 0 for |ω| ≤ ω0, then the system is called
a high-pass filter.
Example 12 (2-period moving difference) Gencay et al. (2002a, p. 38) consider a moving difference y_t = ½(x_t − x_{t−1}) having the FRF
\[ \frac{1}{2}\left(1 - e^{-i\omega}\right) = \frac{1}{2}\left(e^{i\pi f} - e^{-i\pi f}\right)e^{-i\pi f} = \sin(\pi f)\, i e^{-i\pi f}. \]
When f = 0 the FRF is 0, and as f approaches 1/2 the magnitude of the FRF approaches 1, implying that this is a high-pass filter.
Definition 13 Band-pass filter. If F[h](ω) = 0 outside a band, i.e. F[h](ω) = 0 for |ω| ≤ ω1 and for |ω| ≥ ω0 (with ω1 < ω0), then the system is called a band-pass filter.
The wavelet filter acts as a high-pass (or band-pass) filter. By analogy, therefore, the FRF H(f) for h_l is defined as
\[ H(f) \doteq \sum_{l=0}^{L-1} h_l e^{-i2\pi f l}. \]
In practice, convolving a wavelet filter with data gives the empirical wavelet (sometimes
called differencing) coefficients, i.e. the details of the original series with low-frequencies
filtered out. It is important to notice that the above two filter types are interconnected
in a simple, rather remarkable way. By what are known as the quadrature mirror
relationships (QMRs)19,
\[ g_l \doteq (-1)^{l+1} h_{L-1-l} \qquad \text{or} \qquad h_l \doteq (-1)^{l} g_{L-1-l}, \tag{8} \]
a wavelet filter h_l of length L must satisfy (cf. Eqs. (1) and (2)):
\[ \sum_{l=0}^{L-1} h_l = 0, \tag{9} \]
19 Mathematically speaking, the QMRs (Eq. (8)) and h_k = (−1)^k g_{1−k}, implying g_k = (−1)^{k−1} h_{1−k} (used above), are equivalent (Percival and Walden (2000, p. 79)). The latter is useful for filters of infinite width since the formula does not include the filter width L. I prefer to use the former, however, because in applications filters are of finite width.
\[ \sum_{l=0}^{L-1} h_l^2 = 1, \qquad \text{and} \qquad \sum_{l=0}^{L-1} h_l h_{l+2n} = 0 \quad \text{(orthonormality)}, \tag{10} \]
for all non-zero integers n (Percival and Walden (2000, p. 69)).
Example 14 (Haar filter coefficients) The Haar scaling and wavelet filter coefficients are, respectively,
\[ g_0^{\mathrm{Haar}} = 1/\sqrt{2}, \quad g_1^{\mathrm{Haar}} = 1/\sqrt{2}, \quad h_0^{\mathrm{Haar}} = 1/\sqrt{2}, \quad \text{and} \quad h_1^{\mathrm{Haar}} = -1/\sqrt{2}. \]
The QMRs (Eq. (8)) and the three conditions for both filters (Eqs. (6), (7), (9) and (10)) are easily checked to hold.
Example 15 (Daubechies 4 filter coefficients) The Haar filter belongs to the family
of Daubechies filters (see Sec. 4.5) for which there are, in general, no explicit time-domain
formulae. However, in the case of ”Daubechies 4”, the unique solution of scaling filter
coefficients is (see e.g. Press et al. (1992, pp. 585—6)):
\[ g_0^{\mathrm{Daub}} = \frac{1+\sqrt{3}}{4\sqrt{2}}, \quad g_1^{\mathrm{Daub}} = \frac{3+\sqrt{3}}{4\sqrt{2}}, \quad g_2^{\mathrm{Daub}} = \frac{3-\sqrt{3}}{4\sqrt{2}}, \quad \text{and} \quad g_3^{\mathrm{Daub}} = \frac{1-\sqrt{3}}{4\sqrt{2}}. \]
The QMRs can then be used to find the wavelet filter coefficients, for instance.
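The three filter conditions and the QMRs can be verified numerically for both examples. The sketch below (plain Python; the function names are my own) derives the wavelet filter from the scaling filter via the QMR of Eq. (8) and checks Eqs. (6), (7), (9) and (10):

```python
import math

sqrt2, sqrt3 = math.sqrt(2.0), math.sqrt(3.0)

def wavelet_from_scaling(g):
    # QMR (Eq. (8)): h_l = (-1)^l g_{L-1-l}
    L = len(g)
    return [(-1) ** l * g[L - 1 - l] for l in range(L)]

def check_filters(g):
    L, h = len(g), wavelet_from_scaling(g)
    assert abs(abs(sum(g)) - sqrt2) < 1e-12          # Eq. (6)
    assert abs(sum(x * x for x in g) - 1.0) < 1e-12  # Eq. (7), unit energy
    assert abs(sum(h)) < 1e-12                       # Eq. (9), zero sum
    assert abs(sum(x * x for x in h) - 1.0) < 1e-12  # Eq. (10), unit energy
    for n in range(1, L // 2):                       # even-shift orthogonality
        assert abs(sum(g[l] * g[l + 2 * n] for l in range(L - 2 * n))) < 1e-12
        assert abs(sum(h[l] * h[l + 2 * n] for l in range(L - 2 * n))) < 1e-12
    return h

# Haar (Example 14): the QMR reproduces h = (1/sqrt(2), -1/sqrt(2))
assert wavelet_from_scaling([1 / sqrt2, 1 / sqrt2]) == [1 / sqrt2, -1 / sqrt2]

# Daubechies 4 (Example 15)
g4 = [(1 + sqrt3) / (4 * sqrt2), (3 + sqrt3) / (4 * sqrt2),
      (3 - sqrt3) / (4 * sqrt2), (1 - sqrt3) / (4 * sqrt2)]
check_filters(g4)
```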
The squared gain function (SGF) associated with the wavelet filters is defined as $\mathcal{H}(f) \doteq |H(f)|^2$, where the gain function |H(f)| represents the magnitude of the FRF. The orthonormality property of wavelet filters in the time domain (Eq. (10)) can then be stated in terms of SGFs. By defining $\mathcal{G}(f) \doteq |G(f)|^2$, the analogous formula for scaling filters (Eq. (7)) becomes (Percival and Walden (2000, p. 69))
\[ \mathcal{G}(f) + \mathcal{G}(f + 1/2) = 2 \quad \text{for all } f. \tag{11} \]
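Equation (11) can be checked numerically for any orthonormal scaling filter. The following sketch (plain Python; helper names mine) does so on a few frequencies for the Haar and Daubechies 4 scaling filters of Examples 14 and 15:

```python
import cmath
import math

def sgf(coeffs, f):
    # squared gain function |G(f)|^2 of G(f) = sum_l g_l exp(-i 2 pi f l)
    G = sum(c * cmath.exp(-2j * math.pi * f * l) for l, c in enumerate(coeffs))
    return abs(G) ** 2

sqrt2, sqrt3 = math.sqrt(2.0), math.sqrt(3.0)
haar = [1 / sqrt2, 1 / sqrt2]
d4 = [(1 + sqrt3) / (4 * sqrt2), (3 + sqrt3) / (4 * sqrt2),
      (3 - sqrt3) / (4 * sqrt2), (1 - sqrt3) / (4 * sqrt2)]

for g in (haar, d4):
    for f in (0.0, 0.1, 0.25, 0.4):
        # Eq. (11): the squared gains at f and f + 1/2 sum to 2
        assert abs(sgf(g, f) + sgf(g, f + 0.5) - 2.0) < 1e-12
```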
The FRF (of wavelet filters) can also be expressed in polar notation as H(f) =
|H(f)| eiθ(f), where the phase function θ(f) represents the phase of the FRF (the same
is true for scaling filters, of course). It is worthwhile to notice that neither the gain function
nor the SGF carry any phase information but that the gain function is handy in visualizing
the frequency properties of a filter. For example, when plotted as a function of frequency,
the gain function of an ideal band-pass filter would have well defined cut-off frequencies
(cf. Def. 13). In practice, however, an ideal filter is computationally not realizable as
it would require infinitely many coefficients, and one must settle for an approximation.
(Gencay et al. (2002a, Sec. 2.3.1).)
The case θ(f) = 0 for all f is special because then H(f) = |H(f)|; such a filter is called a zero phase filter. Zero phase filters are important in practice because they do not shift the location of a discontinuity in the original series. The filter in Example 10 is not a zero phase filter because its phase is −πf. But a centered moving average is a zero phase filter (see Gencay et al. (2002a, pp. 35–36)). More generally, θ(f) = 2πfν corresponds to the case of a linear phase filter, H(f) = |H(f)|e^{i2πfν}. Linear phase filters are such that advancing the filter by ν units advances the output by ν units. For example, if ν = 2, then the events in the output occur two units in advance of the original output. This would correspond to using a filter whose coefficients are advanced circularly by two units. The above concepts are needed to understand the rationale for using the so-called Daubechies least asymmetric filters later (in Sec. 4.5). (Percival and Walden (2000, pp. 110–1).)
4.4 Compactly supported wavelets
The construction of a compactly supported scaling function and wavelets is now sketched because this class is often preferred in applications. In the construction I will use the fact that the associated filter mϕ is a trigonometric polynomial,
\[ m_\varphi(\omega) = \sum_{k=-M}^{M} \left(g_k/\sqrt{2}\right) e^{-ik\omega}. \]
Details can be found in Bachman et al. (2000, Ch. 7.5), for example.
First, by iterating the filter equality (Eq. (4)), one has
\[ F[\varphi](\omega) = m_\varphi\Big(\frac{\omega}{2}\Big)\, F[\varphi]\Big(\frac{\omega}{2}\Big) = m_\varphi\Big(\frac{\omega}{2}\Big)\, m_\varphi\Big(\frac{\omega}{4}\Big)\, F[\varphi]\Big(\frac{\omega}{4}\Big) = \cdots = \Bigg( \prod_{j=1}^{n} m_\varphi\Big(\frac{\omega}{2^j}\Big) \Bigg) F[\varphi]\Big(\frac{\omega}{2^n}\Big), \qquad n \in \mathbb{N}. \]
The uniform convergence and continuity of F[ϕ](ω) can be proved, as well as the fact that F[ϕ] ∈ L²(R). From the continuity it follows that lim_n F[ϕ](ω/2^n) = F[ϕ](0) = 1. This means that in the limit one has infinitely many multiplications,
\[ F[\varphi](\omega) = \prod_{j \in \mathbb{N}} m_\varphi\Big(\frac{\omega}{2^j}\Big) \quad \text{(convolution cascade)}, \tag{12} \]
which can be interpreted as corresponding ”in ’physical’ space to a cascade of convolutions
of the low-pass filter with itself at different scales” (Hubbard (1998, p. 177)). In practice
the product converges very fast to the limit, so one routinely needs only as few as six
terms.
Example 16 (Convolution cascade for Haar) The convolution cascade in the Haar filter case is
\[ F[\varphi]^{\mathrm{Haar}}(\omega) = \prod_{j \in \mathbb{N}} m_\varphi^{\mathrm{Haar}}\Big(\frac{\omega}{2^j}\Big) = e^{-i\omega/2}\, \frac{\sin(\omega/2)}{\omega/2} \]
(Woljtaszczyk (1997, p. 98)).
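The fast convergence of the cascade can be illustrated numerically in the Haar case. A sketch (plain Python; the truncation depth n is a free parameter of my own choosing) compares a truncated product with the closed-form limit of Example 16:

```python
import cmath
import math

def m_haar(w):
    # Haar low-pass filter m_phi(omega) = (1 + e^{-i omega}) / 2
    return 0.5 * (1 + cmath.exp(-1j * w))

def cascade(w, n):
    # truncated convolution cascade: prod_{j=1}^{n} m_phi(omega / 2^j)
    prod = complex(1.0)
    for j in range(1, n + 1):
        prod *= m_haar(w / 2 ** j)
    return prod

w = 3.0
limit = cmath.exp(-1j * w / 2) * math.sin(w / 2) / (w / 2)  # Example 16
assert abs(cascade(w, 30) - limit) < 1e-8   # essentially converged
assert abs(cascade(w, 6) - limit) < 0.02    # six factors are already close
```

This agrees with the remark above that the product converges very fast, so that only about six terms are routinely needed.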
At this point some of the main properties of the filter mϕ are worth summarizing (see
Bachman et al. (2000, pp. 436—7) for more details):
Summary 17 Properties of the filter mϕ. The function mϕ is a trigonometric polynomial if ϕ has compact support and F[ϕ](0) ≠ 0. Then it is also true that F[ϕ](ω) is continuous and mϕ satisfies
(a) mϕ is continuous and 2π-periodic;
(b) |mϕ(ω)|² + |mϕ(ω + π)|² = 1 for all ω ∈ R (scaling identity);
(c) mϕ(0) = 1.
Condition (b) corresponds to a condition for the low-pass filter of a pair of complementary twin filters (with just a different scaling factor, see Eq. (11)). In general it is not clear that one can find such a polynomial mϕ, but for the Haar system it is relatively easy to check:
Example 18 (Scaling identity for Haar) Hubbard (1998, pp. 174–6) happens to define the Fourier transform a bit differently, by replacing e^{−ikω} by e^{ikω}. This, and ω = 2πf, then implies that m_ϕ^Haar(f) = ½(1 + e^{2πif}). From basic trigonometry one knows that cos ω = ½(e^{iω} + e^{−iω}), and so
\[ m_\varphi^{\mathrm{Haar}}(f) = e^{\pi i f} \cos \pi f. \]
Because cos(ω + π/2) = −sin ω, and because e^{\pi i/2} = i, it is easy to confirm that
\[ m_\varphi^{\mathrm{Haar}}\Big(f + \frac{1}{2}\Big) = e^{\pi i f} \left(-i \sin \pi f\right). \]
Thus
\[ \left| e^{\pi i f} \cos \pi f \right|^2 + \left| e^{\pi i f} \left(-i \sin \pi f\right) \right|^2 = 1, \]
which is Equation (11) (up to the scaling factor), or equivalently, the scaling identity (i.e. Property (b)).
The convolution cascade (Eq. (12)) satisfies $F[\varphi](\omega/2) = \prod_{j=1}^{\infty} m_\varphi(\omega/2^{j+1})$. So
\[ F[\varphi](\omega) = m_\varphi\Big(\frac{\omega}{2}\Big) \prod_{j \in \mathbb{N}} m_\varphi\Big(\frac{\omega}{2^{j+1}}\Big) = m_\varphi\Big(\frac{\omega}{2}\Big)\, F[\varphi]\Big(\frac{\omega}{2}\Big) = \Bigg( \sum_{k=-n}^{n} \frac{g_k}{\sqrt{2}} e^{-ik\omega/2} \Bigg) F[\varphi]\Big(\frac{\omega}{2}\Big). \]
Because F[ϕ](ω) ∈ L²(R), the inverse Fourier transform F⁻¹[ϕ] is well defined. It turns out to be
\[ \varphi(t) = \sum_{k=-n}^{n} g_k \sqrt{2}\, \varphi(2t - k), \]
which indeed has compact support (Bachman et al. (2000, Ex. 7.5.1)). By Equation (5), then, one gets the mother wavelet
\[ \psi(t) = \sum_{k=-n}^{n} h_k \sqrt{2}\, \varphi(2t - k), \]
which, as a linear combination of compactly supported scaling functions, is also compactly supported.20
Notice that the filter mϕ being a trigonometric polynomial is equivalent to the dilation equation $\varphi(t) = \sum_{k\in\mathbb{Z}} g_k \sqrt{2}\,\varphi(2t-k)$ having only a finite number of non-zero coefficients g_k. This is indeed the case for the Haar and, more generally, for the scaling functions
corresponding to the Daubechies wavelets (Hubbard (1998, p. 177)). As Woljtaszczyk
(1997, p. 75) puts it: ”It is a surprising fact that [compactly supported wavelets], other
than Haar wavelet, exist and moreover can be chosen arbitrary smooth.”
20 To be careful, there do exist trigonometric polynomials satisfying Properties (a)–(c) of Summary 17 for a compactly supported scaling function ϕ but for which {ϕ(t − n) : n ∈ Z} is not orthonormal. It actually takes a bit more than a trigonometric polynomial to generate a compactly supported wavelet. To guarantee the orthonormality of {ϕ(t − n) : n ∈ Z}, one must have mϕ(ω) ≠ 0 for ω ∈ [−π/2, π/2] (see Bachman et al. (2000, pp. 443–4)). When one has an orthonormal family {ϕ(t − n) : n ∈ Z}, then one is able to construct an MRA for which ϕ is the scaling function.
4.5 Daubechies wavelets
Although Daubechies wavelets are not the only compactly supported wavelets (for coiflets,
see e.g. Hardle et al. (1998, Ch. 7.2)), these filters are very practical because they yield a
DWT that can be described in terms of generalized differences of weighted averages. This
in turn implies that the Daubechies wavelet filters are capable of producing stationary
wavelet coefficient vectors from ”higher degree” non-stationary stochastic processes. In
fact, these particular filters have L/2 embedded differencing operations. Such a property
is handy in many applications.21 Several studies have also confirmed that ”long-memory”
processes such as fractional Brownian motion (Flandrin (1992), Tewfik and Kim (1992),
and Dijkerman and Mazumdar (1994)), autoregressive fractionally integrated moving av-
erage (ARFIMA) processes (Jensen (1998, 2000)) and fractionally differenced (FD) pro-
cesses (McCoy and Walden (1996) and Vannucci and Corradi (1999)) can be decorrelated
up to a certain degree, both within and between scales. Craigmile and Percival (2002),
in particular, demonstrate that for a wide class of stochastic processes the covariance of
between-scale wavelet coefficients decreases to zero as the width L of the wavelet filter increases. Unfortunately, as documented in Percival and Walden (2000), increasing L can
lead to an increase in the covariance of within-scale wavelet coefficients. This dilemma
can be solved by modeling the remaining within-scale covariance using an autoregressive
process of order p. Useful statistical asymptotic theory of wavelet coefficients could then
be achieved by letting L→∞ and p→∞ (see Craigmile et al. (2000)).
A concrete problem with the Daubechies class of filters is that there are in general no
explicit time-domain formulae for them. Originally the Daubechies scaling and wavelet
filters were obtained by specifying certain vanishing moment conditions on a wavelet func-
tion that is entirely determined by the associated scaling filter. More precisely, Daubechies
wanted to find the exact form of trigonometric polynomials mϕ(ω) which produce scaling
and wavelet functions with compact supports such that the moments of the scaling and
wavelet function of order from 1 to n vanish. This would guarantee good approximation
21Coiflets have remarkably good phase properties (i.e. they provide a very good approximation to zero
phase filters). The problem with coiflets is that they can introduce artifacts into an MRA and they have
only L/3 embedded differencing operations. (Percival and Walden (2000, Ch. 4.9).)
properties of the corresponding wavelet expansions.22 A sketch of Daubechies’ (1988) con-
struction can be found in Hardle et al. (1998, Ch. 7.1), from which the following definition
is also adapted:
Definition 19 Daubechies wavelets (Hardle et al. (1998, p. 61)). Wavelets constructed with the use of functions mϕ(ω) satisfying
\[ |m_\varphi(\omega)|^2 = c_N \int_{\omega}^{\pi} \sin^{2N-1} t \, dt, \tag{13} \]
where the constant c_N is chosen so that mϕ(0) = 1, are called the Daubechies wavelets.
Example 20 (Daubechies 2 coincides with Haar) By setting c_N = ½ and N = 1 in Equation (13), one gets
\[ |m_\varphi(\omega)|^2 = \frac{1}{2} \int_{\omega}^{\pi} \sin t \, dt = \frac{1 + \cos\omega}{2}. \]
On the other hand, by choosing m_ϕ^Haar(ω) = ½(1 + e^{−iω}), one has
\[ |m_\varphi(\omega)|^2 = m_\varphi(\omega)\, m_\varphi(-\omega) = \frac{1 + \cos\omega}{2}, \]
so that the Daubechies 2 coincides with the Haar. (Hardle et al. (1998, p. 61).)
For such functions mϕ(ω) one can tabulate the scaling filter coefficients g_l (see e.g. Percival and Walden (2000, Ch. 4.8)). In discrete terms, the definition of Daubechies wavelets can be stated via the SGF of the associated Daubechies scaling filters g_l (Gencay et al. (2002a, p. 112)):
\[ \mathcal{G}^{\mathrm{Daub}}(f) = 2\cos^{L}(\pi f) \sum_{l=0}^{L/2-1} \binom{L/2-1+l}{l} \sin^{2l}(\pi f), \]
where L is a positive even integer. Notice that by setting L = 2, one has $\mathcal{G}^{\mathrm{Haar}}(f) = 2\cos^2(\pi f)$, from which one gets the Haar scaling filter coefficients by inversion. Thus the Haar is again seen to belong to the Daubechies family.
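The closed-form SGF can be cross-checked against the explicit coefficients of Examples 14 and 15. The sketch below (plain Python, requiring Python 3.8+ for `math.comb`; helper names are mine) confirms both the Haar case L = 2 and the D(4) case L = 4:

```python
import cmath
import math

def sgf_daub(L, f):
    # closed-form SGF of the Daubechies scaling filter of even length L
    s = sum(math.comb(L // 2 - 1 + l, l) * math.sin(math.pi * f) ** (2 * l)
            for l in range(L // 2))
    return 2.0 * math.cos(math.pi * f) ** L * s

def sgf_from_coeffs(g, f):
    # SGF computed directly from the filter coefficients
    G = sum(c * cmath.exp(-2j * math.pi * f * l) for l, c in enumerate(g))
    return abs(G) ** 2

sqrt2, sqrt3 = math.sqrt(2.0), math.sqrt(3.0)
g4 = [(1 + sqrt3) / (4 * sqrt2), (3 + sqrt3) / (4 * sqrt2),
      (3 - sqrt3) / (4 * sqrt2), (1 - sqrt3) / (4 * sqrt2)]

for f in (0.0, 0.1, 0.3, 0.45):
    # L = 2 reproduces the Haar SGF 2 cos^2(pi f)
    assert abs(sgf_daub(2, f) - 2 * math.cos(math.pi * f) ** 2) < 1e-12
    # L = 4 matches the D(4) coefficients of Example 15
    assert abs(sgf_daub(4, f) - sgf_from_coeffs(g4, f)) < 1e-10
```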
22Daubechies’ wavelets have vanishing moments for wavelet functions, but not for scaling functions.
Coiflets have vanishing moments also for scaling functions. (Hardle et al. (1998, Ch. 7.2).)
The problem with this approach is that the SGF does not uniquely characterize a sequence of Daubechies wavelet filters. This is because the phase information is lost when taking the modulus of the FRF (see Sec. 4.3). In fact, given $\mathcal{G}^{\mathrm{Daub}}(f)$, one can obtain all possible g_l by a procedure known as spectral factorization (see Percival and Walden (2000)). The factorization that Daubechies originally used corresponds to an extremal phase choice for the transfer function and produces what is known as a minimum delay filter in the engineering literature. These filters are henceforth referred to as D(L) filters (see Fig. 5, where "N" refers to the family-specific number of the filter within the wavelet family). As mentioned on several occasions now, only the D(4) and D(2) (Haar) wavelets have simple expressions.23 As a rule of thumb, the number of vanishing moments for the Daubechies wavelets is half the filter length (i.e. L/2).
Another factorization leads to the least asymmetric family of scaling filters, which are henceforth referred to as the LA(L) filters (Hardle et al. (1998) call them symmlets). These filters have a phase function with the smallest maximum deviation in frequency from the best fitting linear phase function. Put differently, the phase of mϕ(ω) is minimal among all the mϕ(ω) with the same value of |mϕ(ω)|. This means that the degree of asymmetry of a filter is measured by the deviation of its phase from linearity. The LA(L) filters try to be as close as possible to symmetry without losing compactness (see Fig. 6).24 Shann and Yen (1999) provide exact values for both the D(L) and LA(L) filters of length L = {8, 10}. Notice that a greater degree of symmetry does not mean increased regularity, even in the case of the same gain function (Percival and Walden (2000, p. 494)). In fact, although the LA(L) filters are more symmetric than the D(L) filters, the "Holder regularity" of the LA(L) scaling and wavelet functions is lower than that of the D(L) scaling and wavelet functions (see Rioul (1992)).
23 A mathematical curiosity of the D(4) filter is that although its scaling and wavelet function are continuous, they do not have continuous derivatives (Percival and Walden (2000, pp. 386–7)).
24 Recall that only in the Haar system can both the scaling and wavelet function be at the same time compactly supported and symmetric.
[Figure 5 here: four panels plotting the wavelet functions ψ(x).]
Figure 5: Daubechies extremal phase compactly supported wavelets of different lengths L: (a) D(L = 2), (b) D(L = 4), (c) D(L = 6), and (d) D(L = 8).
[Figure 6 here: four panels plotting the wavelet functions ψ(x).]
Figure 6: Daubechies least asymmetric compactly supported wavelets of different lengths L: (a) LA(L = 8), (b) LA(L = 10), (c) LA(L = 12), and (d) LA(L = 14).
5 Decomposing time series
This section aims to show how multiresolution analysis is done in a time series context. Notice that in this section the function to be analyzed is not deterministic but rather a realization of a stochastic process. Special attention is paid to the energy preservation property that allows the decomposition of the total variance into time-scale specific wavelet variances. The most comprehensive coverage of wavelet analysis in a statistical time series context at the moment is Percival and Walden (2000). A nice complementary review article is Nason and von Sachs (1999).
5.1 Practical issues
In order to come up with a useful wavelet analysis of a time series, one must begin by
taking into account several things. The most important ones are (see Percival and Walden
(2000, Ch. 4.11)):
1. the choice of a wavelet filter;
2. handling boundary conditions;
3. sample sizes that are not a power of two.
Choice of a wavelet filter. The problem with the smallest width wavelet filters is that they can sometimes introduce undesirable artifacts into the resulting analysis, such as unrealistic blocks, "sharks' fins", etc. Wider wavelet filters can better match the characteristic features of a time series. Unfortunately, however, as the width grows, (i) more coefficients are unduly influenced by boundary conditions, (ii) there is some decrease in the degree of localization of the DWT coefficients, and (iii) there is an increase in the computational burden. Thus one should search for the smallest L that gives reasonable results. In practice, if one also wants the DWT coefficients to be alignable in time, the optimal choice is often LA(8).
Handling boundary conditions. The DWT uses circular filtering which means that the
time series is treated as a portion of a periodic sequence with period N . For financial
time series this is problematic since there is rarely evidence to support this assumption.
Furthermore, there may be a large discontinuity between the last and first observations. The extent to which circularity influences the DWT coefficients and the corresponding MRA is quantified in Percival and Walden (2000, pp. 145–9). Notice that the Haar wavelet yields coefficients that are free of the circularity assumption.
As a minimal way of dealing with the circularity assumption, Percival and Walden suggest indicating exactly on plots which DWT coefficients and MRAs are affected by the boundary. They argue, however, that the influence of circularity can be quite small, particularly when the discrepancy between the beginning and the end of the series is not too large. Thus the marked regions are usually quite conservative measures of the influence of circularity. One way of reducing the impact of circularity is to reflect the time series about its end point. The resulting series of length 2N has the same mean and variance as the original series. This method eliminates the effects due to a serious mismatch between the first and last values. The cost is an increased, but "quite acceptable", computational burden.25
Handling sample sizes that are not a power of two. The ordinary DWT requires N to be a power of two, and the "partial" DWT requires N to be an integer multiple of 2^{J0}. In reality, however, it rarely happens that the data at hand are of dyadic length or even an integer multiple of it. There are some ad hoc methods for dealing with this problem. The most obvious one is to truncate the series to the closest integer multiple of 2^{J0}, but then one needs to consider what choice of J0 is reasonable. An easy alternative, familiar from Fourier analysis, is to "pad" the series with zeros or the sample mean. Note that padding with the sample mean does not change the sample mean of the original series.
One could also pad by replicating a data value (typically the last one). Ogden (1997) has compared various ways of preconditioning data that do not meet the power-of-two criterion. He found that no method is clearly superior in every respect; rather, the choice is dictated by the particular application of interest. As one might suspect, though, extending the data by padding is the easiest to implement and also the least computationally expensive. Ogden points out, however, that the wavelet coefficients resulting from preconditioned data should be used cautiously. Padding by repeating the last value, for example, introduces a flat artifact towards the end of the interval, causing new problems that remain even with very large samples.
25 There are also other ways to cope with circularity, such as polynomial extrapolation at both ends of the time series and specially designed "boundary wavelets" that are zero outside the range of the data (see e.g. Bruce and Gao (1996)). These are however not considered in this thesis.
In what follows, I will consider only DWTs for two reasons: (i) financial data are
inherently discrete, and (ii) discrete transforms are computationally less demanding than
continuous ones. This is also the standard way in time series applications.
5.2 Discrete wavelet transform
For a thorough description of the DWT, see Percival and Walden (2000, Ch. 4). For a quick and "dirty" treatment, see Gencay et al. (2002a, Ch. 4.4). My presentation parallels the latter because of space limitations. Notice that in practice the DWT is calculated via Mallat's (1989) pyramid algorithm (see e.g. Percival and Walden (2000)). This algorithm has been nicknamed the fast wavelet transform (FWT) because it needs only O(N) (i.e., at most of order N) multiplications instead of the O(N²) multiplications of a brute-force matrix DWT. And quite remarkably, the FWT is even faster than the celebrated FFT, which demands O(N log₂ N) calculations. This computational efficiency makes the FWT a good choice for analyzing very large data sets like the ones used in high-frequency finance.
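To make the pyramid idea concrete, here is a minimal sketch for the Haar filter only (plain Python; the function names are my own, and the sign convention matches the Haar DWT matrix of Example 21). Each pass splits the current scaling coefficients into normalized pairwise differences and averages:

```python
import math

def haar_fwt(x):
    # Mallat-style pyramid algorithm for the Haar DWT (dyadic length input).
    # Each pass costs O(len(v)) operations, so the total cost is O(N).
    v = list(x)
    w = []  # wavelet coefficients, level 1 first
    s = math.sqrt(2.0)
    while len(v) > 1:
        w.append([(v[2 * i] - v[2 * i + 1]) / s for i in range(len(v) // 2)])
        v = [(v[2 * i] + v[2 * i + 1]) / s for i in range(len(v) // 2)]
    return w, v  # details per level and the final scaling coefficient

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
w, v = haar_fwt(x)

# energy preservation: ||x||^2 equals the sum of all squared coefficients
energy = sum(c * c for lvl in w for c in lvl) + v[0] ** 2
assert abs(energy - sum(t * t for t in x)) < 1e-10
# the last scaling coefficient is sqrt(N) times the sample mean
assert abs(v[0] - sum(x) / math.sqrt(len(x))) < 1e-10
```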
Construction. An easy way to introduce the DWT is through a matrix operation.
Consider a dyadic length (i.e., N = 2^J) column vector of observations x. The length-N column vector of discrete wavelet coefficients w is obtained via
\[ \mathbf{w} = \mathcal{W}\mathbf{x}, \]
where W ∈ M(N × N) is an orthonormal matrix defining the DWT (see App. B). The matrix W is composed of the wavelet and scaling filter coefficients arranged on a row-by-row basis. The structure of the resulting w and the matrix W may be seen through the subvectors w1, w2, ..., wJ, vJ and submatrices W1, ..., WJ, VJ, respectively:
\[ \mathbf{w} = \begin{bmatrix} \mathbf{w}_1 \\ \mathbf{w}_2 \\ \vdots \\ \mathbf{w}_J \\ \mathbf{v}_J \end{bmatrix} \qquad \text{and} \qquad \mathcal{W} = \begin{bmatrix} \mathcal{W}_1 \\ \mathcal{W}_2 \\ \vdots \\ \mathcal{W}_J \\ \mathcal{V}_J \end{bmatrix}, \]
where w_j is a length N/2^j column vector of wavelet coefficients associated with changes on a scale of length λ_j = 2^{j−1}, and v_J is a length N/2^J column vector of scaling coefficients associated with averages on a scale of length 2^J = 2λ_J. Similarly, W_j ∈ M(N/2^j × N) and V_J ∈ M(N/2^J × N). As an illustration of the structure of the matrix W, consider a filter of length L = 2 and a signal of length N = 8. Then the matrix W1 ∈ M(4 × 8) is
\[ \mathcal{W}_1 = \begin{bmatrix}
h_1 & h_0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & h_1 & h_0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & h_1 & h_0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & h_1 & h_0
\end{bmatrix} = \begin{bmatrix} \mathbf{h}_1^{(2)} \\ \mathbf{h}_1^{(4)} \\ \mathbf{h}_1^{(6)} \\ \mathbf{h}_1 \end{bmatrix}, \]
where h(k)1 , k ∈ {2, 4, 6}, is the vector of zero-padded unit scale wavelet filter coefficients
in reverse order, circularly shifted to the right by amount k.
Similarly, by letting h2 and h3 denote the vectors of zero-padded scale-2 and scale-4 wavelet filter coefficients, respectively, one can construct the matrix W2 ∈ M(2 × 8) and the row vector W3 ∈ M(1 × 8). In this case, the circular shift is by factors of 4 and 8 (i.e., no change), respectively:
\[ \mathcal{W}_2 = \begin{bmatrix} \mathbf{h}_2^{(4)} \\ \mathbf{h}_2 \end{bmatrix} \qquad \text{and} \qquad \mathcal{W}_3 = \mathbf{h}_3. \]
The matrix V3 ∈ M(1 × 8) is just a row vector whose elements are all equal to 1/√N
(Gencay et al. (2002a, p. 120)).
Of course, one must be able to explicitly compute the wavelet filter coefficients for levels j = 1, ..., J to complete the construction of W. Given the FRFs of the unit scale wavelet and scaling filters, it is possible to recover the wavelet filter h_{j,l} for scale λ_j by the inverse DFT of
\[ H_{j,k} = H_{1,\, 2^{j-1}k \bmod N} \prod_{l=0}^{j-2} G_{1,\, 2^{l}k \bmod N}, \qquad \text{for } k = 0, \dots, N-1. \]
The length of the resulting wavelet filter is L_j = (2^j − 1)(L − 1) + 1. Similarly, one can recover the scaling filter g_J for scale λ_J by the inverse DFT of
\[ G_{J,k} = \prod_{l=0}^{J-1} G_{1,\, 2^{l}k \bmod N}, \qquad \text{for } k = 0, \dots, N-1. \]
(Gencay et al. (2002a, p. 121).)
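For the Haar filter this frequency-domain recursion is easy to verify with a naive O(N²) DFT (plain Python sketch; helper names are mine). At level j = 2 it reproduces the scale-2 Haar wavelet filter of length L₂ = (2² − 1)(2 − 1) + 1 = 4:

```python
import cmath
import math

def dft(v):
    N = len(v)
    return [sum(v[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(V):
    N = len(V)
    return [sum(V[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

N = 8
s = math.sqrt(2.0)
g1 = [1 / s, 1 / s] + [0.0] * (N - 2)    # zero-padded Haar scaling filter
h1 = [1 / s, -1 / s] + [0.0] * (N - 2)   # zero-padded Haar wavelet filter
G1, H1 = dft(g1), dft(h1)

# level j = 2 of the recursion: H_{2,k} = H_{1, 2k mod N} * G_{1, k mod N}
H2 = [H1[(2 * k) % N] * G1[k % N] for k in range(N)]
h2 = [z.real for z in idft(H2)]

expected = [0.5, 0.5, -0.5, -0.5, 0.0, 0.0, 0.0, 0.0]  # length L_2 = 4
assert all(abs(a - b) < 1e-10 for a, b in zip(h2, expected))
```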
Example 21 (Haar) In the case of L = 2 and N = 8, the matrix W is
\[ \mathcal{W} = \begin{bmatrix} \mathcal{W}_1 \\ \mathcal{W}_2 \\ \mathcal{W}_3 \\ \mathcal{V}_3 \end{bmatrix} = \begin{bmatrix}
\tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\
\tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} & -\tfrac{1}{2} & -\tfrac{1}{2} \\
\tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & -\tfrac{1}{\sqrt{8}} & -\tfrac{1}{\sqrt{8}} & -\tfrac{1}{\sqrt{8}} & -\tfrac{1}{\sqrt{8}} \\
\tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}} & \tfrac{1}{\sqrt{8}}
\end{bmatrix}. \]
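The orthonormality of this matrix, on which the invertibility of the transform rests, can be verified row by row (plain Python sketch):

```python
import math

r2, r8 = math.sqrt(2.0), math.sqrt(8.0)
# the Haar DWT matrix for N = 8 from Example 21
W = [
    [1 / r2, -1 / r2, 0, 0, 0, 0, 0, 0],
    [0, 0, 1 / r2, -1 / r2, 0, 0, 0, 0],
    [0, 0, 0, 0, 1 / r2, -1 / r2, 0, 0],
    [0, 0, 0, 0, 0, 0, 1 / r2, -1 / r2],
    [0.5, 0.5, -0.5, -0.5, 0, 0, 0, 0],
    [0, 0, 0, 0, 0.5, 0.5, -0.5, -0.5],
    [1 / r8] * 4 + [-1 / r8] * 4,
    [1 / r8] * 8,
]

# W W^T = I_8, i.e. the rows form an orthonormal basis of R^8
for i in range(8):
    for j in range(8):
        dot = sum(W[i][k] * W[j][k] for k in range(8))
        assert abs(dot - (1.0 if i == j else 0.0)) < 1e-12
```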
Boundary effects. Clearly the number of affected coefficients grows as the level j and the length L grow. More precisely, Percival and Walden (2000, Ch. 4.11) show that the number of affected DWT coefficients is
\[ L_j' = \left\lceil (L - 2)\left(1 - 2^{-j}\right) \right\rceil, \]
where ⌈x⌉ is the smallest integer greater than or equal to x. Thus, for display purposes, it is possible to approximately line up the DWT coefficient vectors with the original time series by circularly shifting the level-j vector of DWT coefficients appropriately. Percival and Walden (2000, Ch. 4.11) give a precise table of integer shifts for the least asymmetric wavelet filter. The following heuristic can be used, however: if the number of coefficients affected by the boundary is even, then place half of the boundary coefficients at each end of the series. If, on the other hand, the number of coefficients affected by the boundary is odd, then place the "extra" coefficient at the beginning of the series. Notice that shifting coefficients from the extremal phase wavelet filter is not as straightforward given its poor phase properties (see Sec. 4.5). (Gencay et al. (2002a, p. 145).)
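The count L′_j is trivial to compute. A small sketch (plain Python; the function name is mine) reproduces the fact that the Haar filter is boundary-free and gives the counts implied by the formula for LA(8), the filter recommended above:

```python
import math

def n_boundary(L, j):
    # number of level-j DWT coefficients affected by the circular boundary
    return math.ceil((L - 2) * (1 - 2.0 ** (-j)))

# Haar (L = 2) coefficients are free of boundary effects at every level
assert all(n_boundary(2, j) == 0 for j in range(1, 8))
# LA(8): 3, 5 and 6 affected coefficients at levels 1, 2 and 3
assert [n_boundary(8, j) for j in (1, 2, 3)] == [3, 5, 6]
```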
Multiresolution analysis. An additive decomposition of a time series can be obtained using the DWT by first defining the jth level wavelet detail
\[ \mathbf{d}_j \doteq \mathcal{W}_j^T \mathbf{w}_j, \qquad \text{for } j = 1, \dots, J, \]
which is associated with changes in x at scale λ_j. The wavelet coefficients w_j = W_j x represent the portion of the wavelet analysis attributable to scale λ_j; thus W_j^T w_j is the portion of the wavelet synthesis attributable to scale λ_j. For a dyadic length N = 2^J time series, each element of the final wavelet detail d_{J+1} = V_J^T v_J equals the sample mean of the observations. Next define the jth level wavelet smooth as
\[ \mathbf{s}_j \doteq \sum_{k=j+1}^{J+1} \mathbf{d}_k, \qquad \text{for } j = 0, \dots, J, \]
where s_{J+1} is defined to be a vector of zeros. In contrast to the wavelet detail d_j, which is associated with variations at a particular scale, the wavelet smooth s_j becomes smoother as more details are summed. Indeed, it holds that $\mathbf{x} - \mathbf{s}_j = \sum_{k=1}^{j} \mathbf{d}_k$. The jth level wavelet rough characterizes the remaining lower-scale details through
\[ \mathbf{r}_j \doteq \sum_{k=1}^{j} \mathbf{d}_k, \qquad \text{for } j = 1, \dots, J+1, \]
where r_0 is defined to be a vector of zeros. A time series x may then be decomposed as
\[ \mathbf{x} = \mathbf{r}_j + \mathbf{s}_j = \sum_{k=1}^{j} \mathbf{d}_k + \sum_{k=j+1}^{J+1} \mathbf{d}_k = \sum_{k=1}^{J+1} \mathbf{d}_k. \]
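Using the N = 8 Haar matrix of Example 21, this additive decomposition can be verified numerically (plain Python sketch; the data vector and helper names are my own):

```python
import math

r2, r8 = math.sqrt(2.0), math.sqrt(8.0)
# rows of the Haar DWT matrix for N = 8 (Example 21), grouped by level
W1 = [[1 / r2, -1 / r2, 0, 0, 0, 0, 0, 0],
      [0, 0, 1 / r2, -1 / r2, 0, 0, 0, 0],
      [0, 0, 0, 0, 1 / r2, -1 / r2, 0, 0],
      [0, 0, 0, 0, 0, 0, 1 / r2, -1 / r2]]
W2 = [[0.5, 0.5, -0.5, -0.5, 0, 0, 0, 0],
      [0, 0, 0, 0, 0.5, 0.5, -0.5, -0.5]]
W3 = [[1 / r8] * 4 + [-1 / r8] * 4]
V3 = [[1 / r8] * 8]

def analyze(M, x):        # w_j = W_j x
    return [sum(r * t for r, t in zip(row, x)) for row in M]

def synthesize(M, w):     # d_j = W_j^T w_j
    return [sum(M[i][t] * w[i] for i in range(len(M))) for t in range(len(M[0]))]

x = [2.0, 5.0, -1.0, 3.0, 0.0, 4.0, 1.0, -2.0]
details = [synthesize(M, analyze(M, x)) for M in (W1, W2, W3, V3)]

# x = d_1 + d_2 + d_3 + d_4, where the last "detail" is the smooth s_3
for t in range(8):
    assert abs(sum(d[t] for d in details) - x[t]) < 1e-12
# every element of V_3^T v_3 equals the sample mean
mean = sum(x) / 8
assert all(abs(d - mean) < 1e-12 for d in details[-1])
```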
Variance decomposition. One of the most important properties of the DWT is its ability to decompose the sample variance of a time series on a scale-by-scale basis. This is possible because the DWT is an energy (variance) preserving transform:
\[ \|\mathbf{w}\|^2 = \mathbf{w}^T \mathbf{w} = (\mathcal{W}\mathbf{x})^T \mathcal{W}\mathbf{x} = \mathbf{x}^T \mathcal{W}^T \mathcal{W} \mathbf{x} = \mathbf{x}^T \mathbf{x} = \|\mathbf{x}\|^2, \]
where W is the orthonormal matrix defining the DWT. In other words,
\[ \|\mathbf{x}\|^2 = \sum_{t=0}^{N-1} x_t^2 = \sum_{j=1}^{J} \sum_{t=0}^{N/2^j - 1} w_{j,t}^2 + v_{J,0}^2 = \|\mathbf{w}\|^2. \]
Given the structure of the wavelet coefficients, ‖x‖² is decomposed on a scale-by-scale basis via
\[ \|\mathbf{x}\|^2 = \sum_{j=1}^{J} \|\mathbf{w}_j\|^2 + \|\mathbf{v}_J\|^2, \]
where ‖w_j‖² is the energy (proportional to variance) of x due to changes at scale λ_j and ‖v_J‖² is the information due to changes at scales λ_J and higher. Because the submatrices W_j and V_J have orthonormal rows, one has d_j^T d_j = w_j^T w_j for 1 ≤ j ≤ J and s_J^T s_J = v_J^T v_J, so alternatively,
\[ \|\mathbf{x}\|^2 = \sum_{j=1}^{J} \|\mathbf{d}_j\|^2 + \|\mathbf{s}_J\|^2. \]
The theoretical counterpart of the variance decomposition in the context of stationary
long-memory processes will be discussed later (in Sec. 5.5).
5.3 Partial discrete wavelet transform
For a closer look at the partial discrete wavelet transform (pDWT) than is given below, see either Percival and Walden (2000, Ch. 4.7) or Gencay et al. (2002a, Ch. 4.4.2). I will be rather brief here because the pDWT is a straightforward generalization of the DWT. The pDWT offers more flexibility due to the choice of a scale beyond which a wavelet analysis into individual large scales is no longer of real interest. A practical benefit of this is that the sample size no longer needs to be of dyadic length. It is enough that the sample size be a multiple of 2^{J0} (the choice of J0 depends on the goals of the analysis, of course).
Construction. The structure of the orthonormal matrix W is similar to that of the DWT:

W = \begin{pmatrix} W_1 \\ W_2 \\ \vdots \\ W_{J_p} \\ V_{J_p} \end{pmatrix},

except that the matrix of scaling filter coefficients V_{J_p} ∈ M(N/2^{J_p} × N) is a matrix of circularly shifted scaling coefficient vectors.
Multiresolution analysis. For a level J_0 < J pDWT, one has the MRA

x = \sum_{j=1}^{J_0} d_j + s_{J_0},

where the details d_j are related to changes on a scale of λ_j = 2^{j-1} and the smooth s_{J_0} to averages on a scale of λ_{J_0} = 2^{J_0} (as above).
Variance decomposition. The energy decomposition is

\|x\|^2 = \sum_{j=1}^{J_0} \|w_j\|^2 + \|v_{J_0}\|^2 = \sum_{j=1}^{J_0} \|d_j\|^2 + \|s_{J_0}\|^2.
5.4 Maximal overlap discrete wavelet transform
For a throughout discussion of the maximal overlap discrete wavelet transform
(MODWT), see Percival and Walden (2000, Ch. 5). Once again, Gencay et al. (2002a, Ch.
4.5) give a compact introduction. I will mostly follow the latter because of the increased
complexity of the transform. Notice that in practice a pyramid algorithm similar to
that of the DWT is utilized (see Percival and Mojfeld (1997)). This algorithm requires
O(N log2N) multiplications, so it is computationally a bit heavier to execute than the
DWT, but still only as heavy as the FFT. The increased complexity stems from the fact
that the MODWT gives up orthogonality in order to gain features that the DWT does
not posses.26
The following properties distinguish the MODWT from the DWT (Percival andWalden
(2000, pp. 159—60)):
1. The MODWT can handle any sample size N , while the Jpth order pDWT restricts
the sample size to a multiple of 2Jp;
2. The detail and smooth coefficients of a MODWT multiresolution analysis are asso-
ciated with zero-phase filters;
3. The MODWT is invariant to circularly shifting the original time series;
4. The MODWTwavelet variance (to be defined in Sec. 5.5) estimator is asymptotically
more efficient than the same estimator based on the DWT.26The MODWT is also known as the ”stationary DWT”, the ”translation-invariant DWT” and the
”time-invariant DWT”.
Construction. Let x be a length-N column vector of observations. The length-(J+1)N column vector of MODWT coefficients \tilde{w} is obtained via

\tilde{w} = \tilde{W} x,

where \tilde{W} ∈ M((J+1)N × N) is a non-orthogonal matrix defining the MODWT. The resulting \tilde{w} and the matrix \tilde{W} consist of column subvectors \tilde{w}_1, ..., \tilde{w}_J, \tilde{v}_J, each of length N, and submatrices \tilde{W}_1, ..., \tilde{W}_J, \tilde{V}_J ∈ M(N × N):

\tilde{w} = \begin{pmatrix} \tilde{w}_1 \\ \tilde{w}_2 \\ \vdots \\ \tilde{w}_J \\ \tilde{v}_J \end{pmatrix} and \tilde{W} = \begin{pmatrix} \tilde{W}_1 \\ \tilde{W}_2 \\ \vdots \\ \tilde{W}_J \\ \tilde{V}_J \end{pmatrix},

where \tilde{w}_j is associated with changes on a scale of length λ_j = 2^{j-1} and \tilde{v}_J is associated with averages on a scale of length 2^J = 2λ_J. For any positive integer J_0, the level J_0 MODWT of x is a transform consisting of the J_0 + 1 vectors \tilde{w}_1, ..., \tilde{w}_{J_0} and \tilde{v}_{J_0}, which are all N-dimensional. The vector \tilde{w}_j contains the MODWT wavelet coefficients associated with changes on scale λ_j, while \tilde{v}_{J_0} contains the MODWT scaling coefficients associated with averages on scale λ_{J_0} = 2^{J_0}. In the special case of a dyadic-length time series, the MODWT may be subsampled and rescaled to obtain the DWT wavelet and scaling coefficients via

w_{j,t} = 2^{j/2} \tilde{w}_{j, 2^j(t+1)-1} and v_{J,t} = 2^{J/2} \tilde{v}_{J, 2^J(t+1)-1}, for t = 0, ..., N/2^j - 1.
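The subsampling relation can be checked in code for the Haar filter at unit level. The following sketch is mine (Python with numpy), and it assumes the Percival-Walden normalization in which the level-1 MODWT filter is the DWT filter divided by the square root of two; it filters circularly to obtain the MODWT coefficients and compares every second one, rescaled by 2^{1/2}, with the DWT coefficients:

```python
import numpy as np

# Level-1 Haar filters: unit-norm DWT filter and its MODWT rescaling
# (Percival-Walden normalization, an assumption of this sketch).
h = np.array([1.0, -1.0]) / np.sqrt(2.0)   # DWT wavelet filter
h_mod = h / np.sqrt(2.0)                    # MODWT wavelet filter

rng = np.random.default_rng(1)
x = rng.standard_normal(8)
N = len(x)

# MODWT: circular filtering, one coefficient per time point
w_mod = np.array([sum(h_mod[l] * x[(t - l) % N] for l in range(2))
                  for t in range(N)])

# Level-1 DWT coefficients by filtering and downsampling
w_dwt = np.array([(x[2*t + 1] - x[2*t]) / np.sqrt(2.0) for t in range(N // 2)])

# Subsampling/rescaling relation: w_{1,t} = 2^{1/2} * w~_{1, 2(t+1)-1}
print(np.allclose(w_dwt, np.sqrt(2.0) * w_mod[1::2]))   # True
```

Note that, unlike the DWT, the MODWT produces one coefficient per time point, which is what makes it shift-invariant.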
In this case the DWT and MODWT filter coefficients are related in the following way: instead of using the wavelet and scaling filters from the previous section, the MODWT uses the rescaled filters

\tilde{h}_j = h_j / 2^{j/2} and \tilde{g}_J = g_J / 2^{J/2}, for j = 1, ..., J.

The submatrix \tilde{W}_1 is constructed by circularly shifting the rescaled wavelet filter vector \tilde{h}_1 by integer units. The submatrices \tilde{W}_2 and \tilde{W}_3 are formed similarly by
replacing \tilde{h}_1 by \tilde{h}_2 and \tilde{h}_3. Thus, revisiting the case of L = 2 and N = 8 of Example 21, one has

\tilde{W}_1 =
\begin{pmatrix}
\tilde{h}_1 & 0 & 0 & 0 & 0 & 0 & 0 & \tilde{h}_2 \\
\tilde{h}_2 & \tilde{h}_1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \tilde{h}_2 & \tilde{h}_1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \tilde{h}_2 & \tilde{h}_1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \tilde{h}_2 & \tilde{h}_1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tilde{h}_2 & \tilde{h}_1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & \tilde{h}_2 & \tilde{h}_1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \tilde{h}_2 & \tilde{h}_1
\end{pmatrix}
=
\begin{pmatrix}
\tilde{h}_1^{(1)} \\ \tilde{h}_1^{(2)} \\ \tilde{h}_1^{(3)} \\ \tilde{h}_1^{(4)} \\ \tilde{h}_1^{(5)} \\ \tilde{h}_1^{(6)} \\ \tilde{h}_1^{(7)} \\ \tilde{h}_1
\end{pmatrix}.
Example 22 (Haar) Insert \tilde{h}_1 = -1/\sqrt{8} and \tilde{h}_2 = 1/\sqrt{8} into the matrix \tilde{W}_1 above. The full matrix \tilde{W} ∈ M(32 × 8) is too large to write down here.
Boundary effects. The MODWT uses integer translates of the wavelet and scaling
filters, both of length L_j = (2^j - 1)(L - 1) + 1. This causes a total of L_j wavelet coefficients to be affected by the boundary at each level j. The time-alignment property of an MRA (Property 2) no longer holds for the MODWT wavelet and scaling coefficients without a proper adjustment, however. Precise integer shifts for the least asymmetric low-pass filter g_{j,l} and high-pass filter h_{j,l} are given by

\xi_j^{g} =
\begin{cases}
-\frac{(L_j - 1)(L - 2)}{2(L - 1)} & \text{if } L/2 \text{ is even} \\
-\frac{(L_j - 1)L}{2(L - 1)} & \text{if } L = 10 \text{ or } 18 \\
-\frac{(L_j - 1)(L - 4)}{2(L - 1)} & \text{if } L = 14
\end{cases}

and

\xi_j^{h} =
\begin{cases}
-\frac{L_j}{2} & \text{if } L/2 \text{ is even} \\
-\frac{L_j}{2} + 1 & \text{if } L = 10 \text{ or } 18 \\
-\frac{L_j}{2} - 1 & \text{if } L = 14
\end{cases}

respectively (Gencay et al. (2002a, p. 145)).27
27An alternative definition of shifts for both extremal phase and least asymmetric wavelet filters is provided by Hess-Nielsen and Wickerhauser (1996), but Gencay et al. (2002a, p. 145) argue the differences to be of minor importance.
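The case formulas above translate directly into a small function. The sketch below is mine (illustrative Python; la_shifts is a hypothetical helper name) and simply encodes the cases of Gencay et al. (2002a, p. 145):

```python
def la_shifts(L, j):
    """Zero-phase integer shifts xi_j^g (scaling) and xi_j^h (wavelet) for
    Daubechies least asymmetric filters of width L at level j, following
    the case formulas of Gencay et al. (2002a, p. 145)."""
    Lj = (2**j - 1) * (L - 1) + 1              # width of the level-j filter
    if (L // 2) % 2 == 0:                      # L/2 even: L = 8, 12, 16, 20
        xi_g = -(Lj - 1) * (L - 2) // (2 * (L - 1))
        xi_h = -Lj // 2
    elif L in (10, 18):
        xi_g = -(Lj - 1) * L // (2 * (L - 1))
        xi_h = -Lj // 2 + 1
    elif L == 14:
        xi_g = -(Lj - 1) * (L - 4) // (2 * (L - 1))
        xi_h = -Lj // 2 - 1
    else:
        raise ValueError("no case formula for L = %d" % L)
    return xi_g, xi_h

print(la_shifts(8, 1))   # (-3, -4): the LA(8) level-1 shifts
```

All divisions above are exact, so integer (floor) division introduces no rounding.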
Multiresolution analysis. A MODWT MRA can be written as

x_t = \sum_{j=1}^{J+1} \tilde{d}_{j,t}, for t = 0, ..., N - 1,

where \tilde{d}_{j,t} is the tth element of the jth level MODWT detail \tilde{d}_j := \tilde{W}_j^T \tilde{w}_j, for j = 1, ..., J. The MODWT wavelet smooth and rough are, respectively,

\tilde{s}_{j,t} = \sum_{k=j+1}^{J+1} \tilde{d}_{k,t} and \tilde{r}_{j,t} = \sum_{k=1}^{j} \tilde{d}_{k,t}, for t = 0, ..., N - 1.

Importantly, although the MODWT is not an orthonormal transform, the MRA

x = \sum_{j=1}^{J_0} \tilde{d}_j + \tilde{s}_{J_0},     (14)

where \tilde{s}_{J_0} := \tilde{V}_{J_0}^T \tilde{v}_{J_0} is the level J_0 MODWT smooth, still holds true. This is useful in practice because, as stated above, features in the original time series are aligned with the wavelet details and smooth without an adjustment.
Variance decomposition. In order to retain the variance-preserving property of the DWT, the wavelet and scaling coefficients must be rescaled properly, as seen above. But although the MODWT is capable of producing a scale-by-scale analysis of variance through the energy decomposition (Percival and Mofjeld (1997)),

\|x\|^2 = \sum_{j=1}^{J_0} \|\tilde{w}_j\|^2 + \|\tilde{v}_{J_0}\|^2,     (15)

energy preservation does not hold for the MODWT details and smooths in general:

\|x\|^2 \neq \sum_{j=1}^{J_0} \|\tilde{d}_j\|^2 + \|\tilde{s}_{J_0}\|^2.

This is because the MODWT is not an orthonormal transform. Percival and Walden (2000, Ch. 5.3) show, for example, that \|\tilde{d}_1\|^2 \leq \|\tilde{w}_1\|^2. Thus, when using the MODWT, one is restricted to analyzing the wavelet and scaling coefficients in order to quantitatively study the scale-dependent variance properties.
5.5 Wavelet variance
As seen above, the DWT and the MODWT can decompose the sample variance of a time
series on a scale-by-scale basis. A wavelet based analysis of variance is sometimes called
a wavelet spectrum. Such a spectrum may be of interest for several reasons (Percival
and Walden (2000, p. 296)):
1. a scale-by-scale decomposition of variance is useful if the phenomenon consists of variations over a range of different scales;
2. wavelet variance (to be defined below) is closely related to the concept of spectral
density function (SDF, Fourier spectrum);
3. wavelet variance is a useful substitute for the variance of a process for certain pro-
cesses with infinite variance.
Consider a discrete-parameter real-valued stochastic ARFIMA process {X_t} (see App. C) whose dth order backward difference Y_t is a stationary process with mean µ_Y (not necessarily zero). Then a Daubechies wavelet filter \tilde{h}_l of width L ≥ 2d results in the jth wavelet coefficient process

w_{j,t} := \sum_{l=0}^{L_j - 1} \tilde{h}_{j,l} X_{t-l}

being a stationary process (for a stationary process any L would suffice). Now define the (time-independent, global) wavelet variance for {X_t} at scale λ_j to be

\nu_X^2(\lambda_j) := V\{w_{j,t}\},

which represents the contribution to the total variability in {X_t} due to changes at scale λ_j. Then, by summing up these time-scale specific wavelet variances, one gets the variance of {X_t}:

\sum_{j=1}^{\infty} \nu_X^2(\lambda_j) = V\{X_t\}.     (16)
The wavelet variance is well defined for both stationary and non-stationary processes with
stationary dth order backward differences as long as the width L of the wavelet filter is
large enough. In the non-stationary case the sum of the wavelet variances diverges to
infinity. An advantage of the wavelet variance is that it handles both types of processes
equally well. (Percival and Walden (2000, Ch. 8.2).)
The wavelet coefficients affected by the boundary warrant special attention. By taking into account only coefficients that are not affected by the periodic boundary conditions, an unbiased estimator of \nu_X^2(\lambda_j) is

\tilde{\nu}_X^2(\lambda_j) = \frac{1}{M_j} \sum_{t=L_j-1}^{N-1} \tilde{w}_{j,t}^2,

where M_j := N - L_j + 1 > 0 and \tilde{w}_{j,t} := \sum_{l=0}^{L_j-1} \tilde{h}_{j,l} X_{t-l \bmod N} are the (periodically extended) MODWT coefficients. Furthermore, if a sufficiently long wavelet filter is used, i.e. if L > 2d (or if µ_Y = 0), then E\{w_{j,t}\} = 0, which in turn implies that

\nu_X^2(\lambda_j) = E\{w_{j,t}^2\} = E\{\tilde{w}_{j,t}^2\},

where the last equality follows from using coefficients not affected by the boundary. If the sample mean over all possible t were used, then in general one would get a biased estimator of \nu_X^2(\lambda_j). (Percival and Walden (2000, Ch. 8).)
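As an illustration of the unbiased estimator, the following sketch (my own Python/numpy code, Haar filter; the function name is hypothetical) computes \tilde{\nu}_X^2(\lambda_j) by circular filtering and discards the L_j - 1 boundary-affected coefficients. For Gaussian white noise with unit variance, the estimates should be close to 1/2^j:

```python
import numpy as np

def haar_modwt_variance(x, j):
    """Unbiased MODWT wavelet variance estimate at level j (Haar filter):
    averages the M_j = N - L_j + 1 squared coefficients not affected by
    the circular boundary."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    Lj = 2**j
    # Level-j equivalent Haar MODWT wavelet filter: +2^{-j} on the first
    # half of its support, -2^{-j} on the second half.
    h = np.concatenate([np.full(Lj // 2, 2.0**-j), np.full(Lj // 2, -2.0**-j)])
    # Circular filtering: w~_{j,t} = sum_l h_l x_{(t - l) mod N}
    w = np.array([np.dot(h, x[(t - np.arange(Lj)) % N]) for t in range(N)])
    Mj = N - Lj + 1
    return np.sum(w[Lj - 1:]**2) / Mj

rng = np.random.default_rng(2)
x = rng.standard_normal(4096)                  # white noise, sigma^2 = 1
for j in (1, 2, 3):
    print(j, haar_modwt_variance(x, j))        # should be near 1 / 2^j
```

The halving of the wavelet variance per level for white noise reflects the octave-band nature of the wavelet filters.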
Now consider {X_t} to be a stationary process with SDF S_X(f) defined over the frequency interval [-1/2, 1/2]. A fundamental property of the Fourier spectrum is that

\int_{-1/2}^{1/2} S_X(f)\, df = V\{X_t\},

i.e. the SDF decomposes the variance of a series across different frequencies (Percival and Walden (2000, p. 296)). On the other hand, by Equation (16) the wavelet spectrum decomposes the variance of a series across different scales. Since there is a close (although fragile) relationship between frequency and time-scale (see Sec. 3.2.2), it is no surprise that estimates of the wavelet variance can be turned into SDF estimates. The band-pass nature of the MODWT wavelet filter implies that

\nu_X^2(\lambda_j) \approx 2 \int_{1/(2^{j+1}\Delta t)}^{1/(2^{j}\Delta t)} S_X(f)\, df,

where S_X is the SDF, which may be estimated by the squared magnitude of the coefficients of the DFT, called the periodogram,

\hat{S}_X(f_k) = \frac{1}{N} \left| \sum_{t=0}^{N-1} X_t e^{-i 2\pi f_k t} \right|^2,
and f_k = k/N denotes the kth Fourier frequency, k = 0, ..., ⌊N/2⌋. This approximation improves as the width L of the wavelet filter increases, because \tilde{h}_{j,l} then becomes a better approximation to an ideal band-pass filter. In fact, if the filter is wide enough, one can estimate S_X using piecewise constant functions over each interval [1/(2^{j+1}\Delta t), 1/(2^{j}\Delta t)]. In the case of long memory, however, this approximation underestimates the lowest frequencies (see Percival and Walden (2000, Ch. 8.5)).
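The variance-decomposing property of the SDF has a simple discrete analogue that can be verified directly: averaging the periodogram over all N Fourier frequencies returns the sample variance of a demeaned series (Parseval's relation). A minimal sketch (my own illustration in Python/numpy):

```python
import numpy as np

def periodogram(x):
    """Periodogram S^(f_k) = (1/N) |sum_t x_t e^{-i 2 pi f_k t}|^2 at the
    Fourier frequencies f_k = k/N, k = 0, ..., N - 1."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    return np.abs(np.fft.fft(x))**2 / N

rng = np.random.default_rng(3)
x = rng.standard_normal(512)
x = x - x.mean()                  # demean so that f_0 carries no mass

S = periodogram(x)
# Discrete analogue of the variance decomposition across frequencies:
# the mean of the periodogram equals the sample variance.
print(np.allclose(S.mean(), np.mean(x**2)))   # True
```

Note that this exact decomposition says nothing about consistency: as discussed below, the periodogram ordinates do not converge to the true SDF as N grows.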
The SDF can be used to construct a confidence interval for \nu_X^2(\lambda_j). Assuming that \{w_{j,t}\} is a Gaussian process, then for large M_j the random variable \tilde{\nu}_X^2(\lambda_j) is approximately Gaussian distributed with mean \nu_X^2(\lambda_j) and variance 2A_j/M_j, where A_j := \int_{-1/2}^{1/2} S_j^2(f)\, df, provided that A_j is finite and S_j(f) > 0 (almost everywhere). The downside of this construction is that the confidence interval may have a negative lower limit, which is problematic when plotting wavelet variance estimates on a double-logarithmic scale. Furthermore, an incorrect Gaussian assumption will produce too narrow intervals that do not reflect the true variability of the point estimate.28
It is well known that the periodogram is an inconsistent estimator of the Fourier spectrum (see e.g. Priestley (1992, p. 425)). Likewise, the popularly used GPH-estimator (Geweke and Porter-Hudak (1983)), based on an ordinary least squares (OLS) regression of the log-periodogram for frequencies close to zero, is in general an inconsistent estimator of the long-memory parameter of a fractionally integrated process with |d| < 1/2. Other asymptotic properties of this estimator are problematic too (see Hurvich and Beltrao (1993) and Robinson (1995)).29 Using the fact that the wavelet variance is a regularization of the Fourier spectrum (so that the scales that contribute the most to the variance of the series are associated with the coefficients with the largest variance), Jensen (1999) showed that an OLS-based wavelet estimator is consistent when the sample variance of the wavelet coefficients is used in the regression.
28There do exist other, more complex ways of constructing confidence intervals. They could for example be constructed using an "equivalent degrees of freedom" argument and a chi-squared distribution (Percival and Walden (2000, pp. 336-7)) or multitaper spectrum estimation (Serroukh et al. (2000)). These are not considered in this thesis, however.
29However, in the case of 0 < d < 1/2 and under certain regularity conditions (Gaussianity, in particular), Robinson (1995) has proven that the GPH-estimator is consistent and asymptotically Gaussian. The problem then is that volatility is not distributed normally (but exponentially).
Specifically, using the wavelet variance
of the DWT coefficients w_{j,t},

\hat{\nu}_X^2(\lambda_j) = \frac{1}{2^j} \sum_{k=0}^{2^j - 1} w_{j,k}^2,     (17)

one has that

V\{w_{j,t}\} = \nu_X^2(\lambda_j) \to \sigma^2 2^{j(2d-1)}

as j → ∞ (here σ² is a finite constant). If a large number of wavelet coefficients is available for scale j, then the sample wavelet variance provides a consistent estimator of the true wavelet variance (Jensen (1999, p. 22)). Thus, by taking logarithms on both sides, one obtains the (approximate) log-linear relationship

\log \nu_X^2(\lambda_j) = \log \sigma^2 + (2d - 1) \log 2^j,     (18)

from which the unknown d can be estimated consistently by OLS regression after replacing \nu_X^2 with its sample counterpart \hat{\nu}_X^2 of Equation (17). The asymptotic variance of the estimator of d was derived by Jensen, too. He also found that the (negative) bias in this estimator is offset by its low variance. In the mean square error (MSE) sense the wavelet OLS-estimator fared significantly better than the GPH-estimator.30
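The regression behind Equation (18) can be sketched as follows (illustrative Python/numpy of my own, with the Haar DWT; the sample wavelet variances are normalized so that white noise gives \nu^2(\lambda_j) = \sigma^2 2^{-j}, consistent with the limit stated above). For white noise the true d is 0, so the fitted slope should be near -1:

```python
import numpy as np

def haar_wavelet_variances(x, J):
    """Sample wavelet variances nu^2(lambda_j), j = 1..J, from Haar DWT
    coefficients, normalized so that white noise gives sigma^2 / 2^j."""
    v = np.asarray(x, dtype=float)
    nu2 = []
    for j in range(1, J + 1):
        w = (v[1::2] - v[0::2]) / np.sqrt(2.0)   # level-j wavelet coefficients
        v = (v[1::2] + v[0::2]) / np.sqrt(2.0)   # level-j scaling coefficients
        nu2.append(np.mean(w**2) / 2.0**j)
    return np.array(nu2)

def wavelet_ols_d(x, J):
    """OLS estimate of d from log nu^2(lambda_j) = log sigma^2 + (2d - 1) log 2^j."""
    nu2 = haar_wavelet_variances(x, J)
    j = np.arange(1, J + 1)
    slope = np.polyfit(j * np.log(2.0), np.log(nu2), 1)[0]
    return (slope + 1.0) / 2.0                   # slope = 2d - 1

rng = np.random.default_rng(4)
x = rng.standard_normal(2**14)                   # white noise: true d = 0
print(round(wavelet_ols_d(x, J=6), 2))
```

In practice the number of usable levels J is limited by the sample size, since the coarsest scales contain very few coefficients and dominate the regression noise.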
The wavelet variance can also be defined locally. But unlike in Jensen (1999), where all wavelet coefficients were used in calculating the wavelet variance, now only those "close" to the time point t are used. Given L > 2d(u), an unbiased estimator of the local wavelet variance for {X_t} at scale λ_j based upon the MODWT is

\tilde{\nu}_X^2(u, \lambda_j) = \frac{1}{K_j} \sum_{s=\tau_j}^{\tau_j + K_j} \tilde{w}_{j,t+s,T}^2,     (19)

where u represents a time point in the rescaled time domain [0, 1] (i.e. u = t/T), K_j is a "cone of influence", and τ_j is an "offset" (Whitcher and Jensen (2000, p. 98)). In principle, K_j (the central portion of a filter) includes only those wavelet coefficients to which the corresponding observation made a significant contribution. This is motivated by the fact that the width L of the filter is not a very good measure of the effective width of
30To further reduce the MSE of \hat{d}, a weighted least squares estimator could be applied (see e.g. Abry et al. (1993) and Abry and Veitch (1998)). In particular, Percival and Walden (2000, Ch. 9.5) have shown via simulation that weighting reduces the MSE by a factor of two in comparison to the OLS estimator.
Table 1: Cones of influence and offsets.

Level j   K_j^Haar   K_j^LA(8)      τ_j^LA(8)
   1          2          3 (8)           2
   2          4          9 (22)          6
   3          8         19 (50)         14
   4         16         38 (106)        31
   5         32         76 (218)        64
   6         64        150 (442)       131
   7        128        299 (890)       264
   8        256        597 (1786)      531
   9        512       1192 (3578)     1065
  10       1024       2383 (7162)     2132
the filter, because coefficients around l = 0 and l = L_j - 1 are very close to zero and thus do not contribute significantly to the calculation of the wavelet coefficient (Percival and Walden (2000, p. 103)). A slight inconvenience in using K_j is that it varies across scales and different filters. The tabulated values for the Daubechies family of wavelets are given in Whitcher and Jensen (2000). Also the values of the "offsets" τ_j for each wavelet filter L > 2 are needed to indicate where the width K_j begins (given in Whitcher and Jensen). For the relevant part, these tables are reproduced here (see Table 1, where the numbers in parentheses are the lengths L_j of the scale λ_j wavelet filter). Notice that the offset τ_j^Haar for the Haar is zero for all levels since K_j^Haar = L_j^Haar = 2^j.
Whitcher and Jensen (2000) have shown that when the MODWT is applied to a locally stationary (in the sense of Dahlhaus (1996, 1997); see App. D) long-memory process {X_{t,T}}, the level-j MODWT wavelet coefficients {\tilde{w}_{j,t,T}} form a locally stationary process with mean zero and time-varying variance

V\{\tilde{w}_{j,t,T}\} = \nu_X^2(u, \lambda_j) \to \sigma^2(u)\, 2^{j[2d(u)-1]}

as j → ∞ (the expression for σ²(u) is given in Whitcher and Jensen). Thus, analogously to Equation (18),

\log \nu_X^2(u, \lambda_j) = \log \sigma^2(u) + [2d(u) - 1] \log 2^j,     (20)
from which the unknown d(u)'s can be estimated consistently by OLS after replacing \nu_X^2 with its time-varying sample counterpart \tilde{\nu}_X^2 from Equation (19). Gencay et al. (2002a, p. 172) argue that parameter estimation for a non-stationary long-memory time series model through OLS "should benefit greatly" from wavelet-based methods.
Using simulations, Whitcher and Jensen (2000) showed that the median of \hat{d}(u) accurately estimates the true value of the fractional differencing parameter (with a slight negative bias near the boundaries) in the case of a globally stationary ARFIMA process. Because less information is used to construct the local estimator than the global one, \hat{d}(u) also exhibited a slight increase in its MSE. Importantly, when the ARFIMA process was disturbed by a sudden shift in the long-memory parameter to imitate local stationarity, the estimated fractional differencing parameter still performed well (on both sides of the change), although with a slight bias and an increase in MSE at the boundaries.
6 Volatility modeling
Volatility, interpreted as uncertainty, is one of the key variables in most models in modern
finance.31 The explosive growth in derivative markets and the recent availability of high-
frequency data have only highlighted its relevance. In option pricing, for example, volatility
of the underlying asset (which may be volatility itself, by the way) must be known from now
until the option expires as accurately as possible. In financial risk management, volatility forecasting has even become compulsory after the "1996 Basle Accord Amendment", which sets the minimal capital requirements for banks. More generally, periods of high uncertainty have economically paralyzing consequences; just consider the terrorist attacks in New York on September 11, 2001. In a way, therefore, volatility estimates can be considered ”as a
barometer for the vulnerability of financial markets and the economy” (Poon and Granger
(2003, p. 479)).
31Volatility is not the same as risk, however. In particular, risk is usually associated with small or
negative returns (the so-called ”downside risk”) whereas most measures of dispersion (e.g. standard
deviation) make no such distinction. Furthermore, standard deviation is a useful risk measure only when
it is attached to a distribution or a pricing dynamic. For further details on the conceptual differences
between volatility, risk, and standard deviation, see Poon and Granger (2003, Sec. 2.1).
In this section some of the most popular measures of volatility and types of models are reviewed. There are basically two strands of volatility models: those assuming that conditional variance depends on past values (i.e. observation-driven models) and those assuming that conditional variance is stochastic (made precise later). Although this section starts with the former approach, most attention is paid to the latter. Poon and Granger (2003) provide an extensive up-to-date review of the different types of volatility models used for forecasting in financial markets.
6.1 Measures of volatility
It is an indisputable stylized fact that volatility tends to cluster, so that the variance is time-varying and shows persistent behavior. A systematic search for the causes of serial correlation in conditional second moments is in its infancy, though. Diebold and Nerlove
(1989) discuss the possibility of a serially correlated news arrival process as the generating
mechanism for which Engle et al. (1990) find some evidence. Using the mixture-of-
distribution hypothesis (see Clark (1973)), Andersen and Bollerslev (1997b) show that
long-memory features of volatility (the slowly decaying autocorrelation function) may in-
deed arise through the interaction of a large number of heterogeneous information arrivals.
Such a finding is important because then long-memory characteristics reflect inherent prop-
erties of the DGP, rather than structural shifts as suggested for example by Lamoureux
and Lastrapes (1990b).
The basic paradigm in volatility estimation is that ”volatility” (now approximated by
the square of returns rt) can be decomposed into predictable and unpredictable compo-
nents via
rt = σtεt,
where εt are IID disturbances with mean 0 and variance 1. By definition, then, the pre-
dictable component is the conditional variance \sigma_t^2 of a series.32 In a seminal paper, Engle (1982) proposed that the conditional variance depends linearly on the past squared values
32The determinants of the predictable part are of special interest in finance because the risk premium is a function of it.
of the process,

\sigma_t^2 = \sigma^2 + \sum_{k=1}^{q} \alpha_k r_{t-k}^2;
the well-known ARCH(q) model. Bollerslev (1986) generalized this to the parsimonious
GARCH(p, q) model,

\sigma_t^2 = \sigma^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2 + \sum_{k=1}^{q} \alpha_k r_{t-k}^2,

where the volatility is a linear function of both lagged squared returns and lagged volatilities. Bollerslev showed that this equation defines a second-order stationary solution if (and only if) \sum_{j=1}^{p} \beta_j + \sum_{k=1}^{q} \alpha_k < 1 (and \sigma^2 > 0). In practice, GARCH(1, 1) often suffices.
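The GARCH(1, 1) recursion is straightforward to simulate. The sketch below is my own illustration (Python/numpy; the parameter omega plays the role of the constant σ² above):

```python
import numpy as np

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate r_t = sigma_t * eps_t with the GARCH(1, 1) recursion
    sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    r = np.empty(n)
    sigma2 = np.empty(n)
    sigma2[0] = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    r[0] = np.sqrt(sigma2[0]) * eps[0]
    for t in range(1, n):
        sigma2[t] = omega + alpha * r[t - 1]**2 + beta * sigma2[t - 1]
        r[t] = np.sqrt(sigma2[t]) * eps[t]
    return r, sigma2

# alpha + beta < 1 gives a second-order stationary solution whose
# unconditional variance is omega / (1 - alpha - beta) = 1 here.
r, sigma2 = simulate_garch11(100_000, omega=0.05, alpha=0.05, beta=0.90)
print(np.var(r))    # should be close to 1 in a sample this large
```

With alpha + beta = 0.95 the simulated path exhibits the pronounced volatility clustering discussed above.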
Moreover, it is usually found (in long time series, at least) that the estimated parameters β and α sum close to one, suggesting the presence of a "unit root" in the volatility equation. This so-called "Taylor-effect" (Taylor (1986)) has commonly been interpreted as evidence of volatility persistence (see e.g. Poterba and Summers (1986)), although it has faced a considerable amount of criticism lately (see e.g. Mikosch and Starica (2004)). In any case, Engle and Bollerslev (1986) extended the model to integrated GARCH (IGARCH).33 Counterintuitively, the (approximate) autocorrelation function of IGARCH(1, 1) decays exponentially (not hyperbolically), indicative of "short memory", although the effect of a shock to expectation is permanent (see Ding and Granger (1996)).
A more successful (in the stock markets, at least) extension of GARCH is to let \sigma_t^2 be an asymmetric function of the past data. To model the stylized fact that in stock markets volatility is negatively correlated with lagged returns, the so-called leverage effect (first noted by Black (1976)), Nelson (1988) proposed the exponential GARCH (EGARCH) model, which models the logarithm of the variance \log \sigma_t^2. Some more flexibility is achieved by applying a "fractional differencing operator" d (a parameter that is going to play a big role later; see App. C), resulting in the fractionally integrated EGARCH (FIEGARCH) model (Bollerslev and Mikkelsen (1996)). This model nests the conventional EGARCH for d = 0. In general, the fractional generalization has proved to be empirically useful in modeling long-term dependence in conditional variances (see e.g. Vilasuso (2002)).34
33Here the prefix "integrated" does not imply non-stationarity as in the case of a random walk, however (proved in Bougerol and Picard (1992)). Strict stationarity still holds, but because the marginal variance of r_t is infinite, weak stationarity does not (Gourieroux and Jasiak (2001, Ch. 6.2.4)).
This is mainly
because the weakly stationary FIEGARCH succeeds in modeling the slow hyperbolic rate of decay of a shock to the forecast of \log \sigma_{t+T}^2 if 0 < d < 1/2, thus capturing the observed "long memory" in volatility. Of course there exist numerous other extensions, too. For
some of the most popular ones, see Bollerslev et al. (1992, 1994), Hentschel (1995), and
(in the multivariate case) Kroner and Ng (1998).35
Several other types of measures have been used to approximate volatility. It is, for example, known that normalized squared or absolute returns over an appropriate horizon
provide an unbiased estimate of volatility. Although the majority of time series volatility models are squared-returns models (as above), absolute-returns-based models seem
to produce better volatility forecasts in practice (see e.g. Taylor (1986) and McKenzie
(1999)). Indeed, it seems that the long-memory property is strongest for absolute returns
(in stock markets, at least). For example Ding et al. (1993), Ding and Granger (1996),
and Lobato and Savin (1998) suggest measuring volatility directly from absolute returns.36
Absolute returns are also relatively outlier-resistant compared to squared returns. In par-
ticular, log-squared returns suffer from an ”inlier” problem because a return very close
to zero generate a large negative number. Furthermore, Wright (2000) has demonstrated
that squared returns result in a large downward bias when using semiparametric methods
to estimate long-memory in the context of conditionally heavy-tailed data such as stock
returns (which does not occur with absolute returns). The downside is that absolute (as
well as squared) returns over longer horizons (days, say) provide a very noisy estimate
for volatility (e.g. Andersen and Bollerslev (1997)). Simply taking the average over a fixed horizon (as e.g. in Poterba and Summers (1986)) does not seem to be a satisfactory solution.
34This generalization corresponds to the generalization of the standard ARIMA class of models to fractionally integrated ARMA models that model long-term dependence in mean.
35An estimate of volatility in these models is attained via (quasi) maximum likelihood or via the generalized method of moments.
36The use of squares is most likely a reflection of the Gaussian assumption made regarding the data (McKenzie (1999, p. 50)). The error distribution of stock market returns is not Gaussian, however, and therefore higher than second moments must be considered. Davidian and Carroll (1987) have shown that the absolute returns specification is more robust against asymmetry and non-normality.
Although standard time series techniques could then be applied to assess
the temporal dependence, this two-stage procedure is subject to the following criticism (see Bollerslev et al. (1992, pp. 17-18)): First, it does not make efficient use of all the data, and the conventional standard errors from the second-stage estimation may not be appropriate. Second, there is the possibility that the actual parameter estimates may be inconsistent. Third, this kind of procedure may lead to misleading conclusions about the true underlying dependence in the second-order movements of the data. It seems that only by increasing the sampling frequency can the noise be reduced in an appropriate way.
This has become possible in finance after the availability of high-frequency data. Blair et
al. (2001) have for example reported a significant increase in forecasting ability for 1-day
ahead forecast when intraday 5-minute squared returns are used instead of daily ones.
Another interesting approach is to extract volatility from option prices. This method uses a potentially richer information set and could therefore lead to improved forecasting performance. For example, the traditional Black-Scholes formula can be inverted to give an estimate of volatility under the assumption of a constant variance. Engle and Mustafa (1992), among others, have also successfully considered the case of ARCH-volatility in this setting. Unfortunately, when using stochastic volatility (to be discussed in the next subsection) several complications arise (see e.g. Wiggins (1987) and Melino and Turnbull (1990)). Moreover, on the practical side, not every asset of interest has actively traded options, and implied volatilities derived from frictionless market models may be affected by institutional factors distorting the time series analysis (Fung and Hsieh (1991)). Option markets may simply not be sufficiently developed to allow meaningful variations in intraday implied volatility to be derived (Goodhart and O'Hara (1997)).
Yet another method is the use of historical "highs and lows", as in Parkinson (1980), who assumes that prices follow a Brownian motion with constant variance. Garman and Klass (1980) derive several efficient estimators of volatility using highs, lows, opening and closing prices, and transaction volume. Beckers (1983) tests the accuracy of these estimators and suggests an adjustment, which could even be improved upon by including implied variances. However, the generalization of these ideas to other stochastic processes allowing for time-varying variances is not straightforward (Bollerslev et al. (1992, p. 19)).
Theoretically the most attractive way of measuring volatility seems to be the sum of short-term intraday squared (or absolute) returns over a predetermined horizon (usually a
day), the so-called realized volatility (Fung and Hsieh (1991) and Andersen and Bollerslev
(1997a)). This measure is based on Merton’s (1980) seminal idea that the variance of
returns can be estimated far more accurately from the available time series of realized
returns than can the expected return.37 For example Schwert (1989) has used this ap-
proach to estimate monthly volatility from daily returns. Technically speaking, realized
volatility is a consistent estimator of the 1-day integrated volatility under the assumption
of a continuous-time diffusion. In theory the stochastic error of the measure can be re-
duced arbitrarily by increasing the sampling frequency of returns. Empirically, Andersen
and Bollerslev (1997a) have shown that realized variance takes the beloved ARCH mod-
els ”back into business” in the sense that they again seem to serve as good forecasting
devices (which has been put into serious doubt lately). The problem is however that one
does not observe the price continuously. The small time intervals are also contaminated
by microstructure effects such as ”bid-ask bounce” (see e.g. Roll (1984)) and volatility
seasonalities (see e.g. Andersen and Bollerslev (1997a)). Thus the precision to which one
can measure the volatility using high-frequency data depends on the characteristics of the
return series analyzed (Bai et al. (2001)). And finally, the realized variance approach is
computationally quite expensive.
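Realized volatility is simple to compute once returns are arranged day by day. The sketch below is my own illustration on simulated data (not the HEX data of this thesis; all names and parameter values are hypothetical), and it also shows why the measure is attractive: even with moderate intraday sampling it tracks the true daily variance closely:

```python
import numpy as np

def realized_variance(intraday_returns):
    """Daily realized variance: the sum of squared intraday returns,
    one row per day (e.g. 5-minute log returns)."""
    return np.sum(np.asarray(intraday_returns)**2, axis=1)

# Illustration: 250 "days" of m intraday returns drawn with a
# day-specific true variance (lognormal across days).
rng = np.random.default_rng(5)
true_daily_var = np.exp(0.5 * rng.standard_normal(250))
m = 101                                     # intraday intervals per day
r = rng.standard_normal((250, m)) * np.sqrt(true_daily_var[:, None] / m)

rv = realized_variance(r)
# The measurement noise shrinks as the sampling frequency m grows
print(np.corrcoef(rv, true_daily_var)[0, 1])
```

In this idealized setting there are no microstructure effects; with real high-frequency data, bid-ask bounce and intraday seasonality limit how far the sampling frequency can usefully be increased, as noted above.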
6.2 Stochastic volatility and long-memory
Most of the modern financial theory is based on continuous-time semimartingales (see e.g.
Shiryaev (1999)). In particular, stochastic volatility (SV) models of the form
dX(t) = µ(t)dt+ σ(t)dW (t)
belong to this family (here the drift µ(t) and the instantaneous standard deviation σ(t)
are time-varying random functions and W (t) is a standard Brownian motion). Many
specifications for σ(t) are available (see Taylor (1994)). It may be, for example, that the
logarithm of the volatility follows an Ornstein—Uhlenbeck process (as in Wiggins (1987)).
37Recall that for a random walk, for example, a minimal exhaustive statistic for volatility is essentially
given by the full set of increments (Corsi et al. (2001)).
See Ghysels et al. (1995) for a good review. Although continuous-time models are elegant to work with, in practice one settles for a discrete model. A discrete-time SV
model may be written as
y_t = \sigma_t \varepsilon_t,

where y_t denotes the demeaned return process y_t = \log(S_t/S_{t-1}) - µ, \{\varepsilon_t\} is a series of IID random disturbances with mean 0 and variance 1, and the conditional variance \{\sigma_t^2\} is modeled through a stochastic process \{\log \sigma_t^2\} := \{h_t\}. Modeling volatility as a stochastic variable immediately leads to heavy-tailed distributions for returns (Poon and Granger (2003, p. 485)). Here the logarithm ensures that \{\sigma_t^2\} is always positive, but it is not directly observable. Furthermore, \{h_t\} is independent of \{\varepsilon_t\}. Notice that the model for \{y_t\} is usually represented in the form

y_t = \sigma \exp(h_t/2) \varepsilon_t,

where the scale parameter σ > 0 removes the need for the constant term γ in the first-order autoregression.38
Naturally there exist many specifications for the volatility scheme {ht} such as ARMAor random walk. From the point of view of financial theory, a particularly attractive and
simple model for {ht} is an AR(1)-process ht = γ+φht−1+ηt, where ηt ∼ IID(0,σ2η), and|φ| < 1 ensures that {ht} (and hence {yt}) is strictly stationary.39 Such an autoregressiveterm introduces persistance (i.e. volatility clustering). In general, SV models are more
flexible than GARCH models because of the extra volatility noise term ηt in the volatility
equation. For example, the simple ARSV(1) specification has been shown by Carnero et
al. (2001) to be empirically more adequate than the most popularly used GARCH(1, 1).
Carnero et al. also demonstrate that ARSV(1) produces smoother volatility estimates.
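To make the discussion concrete, the ARSV(1) recursion above is easy to simulate. The sketch below (parameter values are illustrative, not estimates from the HEX data) draws h_t = φh_{t−1} + η_t and returns y_t = σ exp(h_t/2)ε_t; a persistence parameter φ close to one produces the volatility clustering discussed in the text.

```python
import numpy as np

def simulate_arsv1(n, phi=0.95, sigma_eta=0.3, sigma=1.0, seed=0):
    """Simulate y_t = sigma * exp(h_t / 2) * eps_t with AR(1) log-volatility
    h_t = phi * h_{t-1} + eta_t (gamma = 0, absorbed into sigma)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma_eta, n)
    eps = rng.normal(0.0, 1.0, n)
    h = np.empty(n)
    # start from the stationary distribution of h_t
    h[0] = rng.normal(0.0, sigma_eta / np.sqrt(1.0 - phi**2))
    for t in range(1, n):
        h[t] = phi * h[t - 1] + eta[t]
    return sigma * np.exp(h / 2.0) * eps

returns = simulate_arsv1(20000)

# persistence in h induces volatility clustering:
# |y_t| is positively autocorrelated even though y_t itself is uncorrelated
a = np.abs(returns) - np.abs(returns).mean()
acf1 = np.dot(a[:-1], a[1:]) / np.dot(a, a)
```

The extra volatility noise term η_t is what distinguishes this from a GARCH recursion, in which σ_t is a deterministic function of past observations.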
Recently, the long-memory stochastic volatility (LMSV) model proposed in Breidt et al. (1998) (and in Harvey (1998)) has attracted a lot of attention. In their model log-volatility
38 The estimation of SV models is, however, notoriously difficult and usually done by variants of the method of moments (as in Melino and Turnbull (1990), for example). For a survey of estimation methods for stochastic volatility models, see Broto and Ruiz (2002).
39 This ARSV(1) specification (proposed in Taylor (1986)) is attractive because an AR(1) process is the natural discrete-time approximation to a continuous-time Ornstein-Uhlenbeck process.
{ht} is generated by fractionally integrated Gaussian noise,
(1 − B)^d ht = ηt,
where |d| < 1/2 and ηt ∼ NID(0, σ2η). More generally, {ht} can be modeled as an ARFIMA(p, d, q) process,
φ(B)(1 − B)^d ht = θ(B)ηt, (21)
where φ(z) = 1 − φ1z − ... − φpz^p is an autoregressive polynomial of order p,
θ(z) = 1 + θ1z + ... + θqz^q is a moving average polynomial of order q, both φ(z) and θ(z)
have all of their roots outside the unit circle, and θ(z) has no roots in common with φ(z).
Notice that this model encompasses a "short-memory" model when d = 0. Breidt et al. argued that the LMSV model has certain advantages over observation-driven models (e.g.
FIEGARCH). For example, because it is built from the widely used ARFIMA class of long-
memory models, LMSV inherits most of the statistical properties of ARFIMA models and
is therefore analytically tractable. Even the limiting distribution of the GPH-estimator
of d has been derived (see Velasco (1999) and Deo and Hurvich (2001)). Moreover, the
estimation of d is not crucially dependent on the choice of a unit discrete time interval
(Bollerslev and Wright (2000, p. 87)): although the LMSV (like ARFIMA) model is not
closed under temporal aggregation, the rate of decay of the autocovariance function of
squared (or absolute) returns is invariant to the length of the return interval.
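The fractional-difference operator in Eq. (21) can be made operational through its MA(∞) expansion: (1 − B)^{−d} has weights ψ_0 = 1 and ψ_k = ψ_{k−1}(k − 1 + d)/k, which decay hyperbolically. The sketch below (a truncated approximation, not an exact simulation method; function names are illustrative) uses this recursion to generate long-memory log-volatility.

```python
import numpy as np

def frac_diff_weights(d, n):
    """MA(infinity) coefficients psi_k of (1 - B)^{-d}:
    psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k."""
    psi = np.empty(n)
    psi[0] = 1.0
    for k in range(1, n):
        psi[k] = psi[k - 1] * (k - 1 + d) / k
    return psi

def simulate_lmsv_logvol(n, d=0.4, sigma_eta=1.0, seed=0):
    """Approximate h_t = (1 - B)^{-d} eta_t by truncating the MA expansion at n lags."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma_eta, 2 * n)
    psi = frac_diff_weights(d, n)
    # convolve innovations with the weights; keep the last n values
    # so that each retained h_t uses a full n-term history
    return np.convolve(eta, psi)[n:2 * n]

psi = frac_diff_weights(0.4, 5)
# psi_1 = d = 0.4 and psi_2 = d(1 + d)/2 = 0.28, matching the recursion
```

The hyperbolic decay of the ψ_k is what produces the slowly decaying autocovariances of squared (or absolute) returns noted above.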
The LMSV model is still a stationary model, however. Thus it ignores ”the known
intraday volatility patterns and the irregular occurrences of market crashes, mergers and
political coups” as Jensen and Whitcher (2000) note. In particular, the long-memory
parameter d may not be constant over time. This motivated Jensen and Whitcher to
introduce a non-stationary class of long-memory stochastic volatility models with time-
varying parameters. In their model, the logarithmic transform of the squared returns is a
locally stationary process that has a time-varying spectral representation (see App. D).
This means that the level of persistence associated with a shock to conditional variance
(which itself is allowed to vary in time) is dependent on when the shock takes place.
The shocks themselves, of course, still produce responses that persist hyperbolically. Specifically, Jensen and Whitcher defined yt,T to be
yt,T = exp(Ht,T/2)εt,
Φ(t/T,B)(1 − B)^{d(t/T)} Ht,T = Θ(t/T,B)ηt,
where |d(u)| < 1/2, εt ∼ NID(0, 1), and ηt ∼ NID(0, σ2η) are independent of each other. The functions Φ(u,B) and Θ(u,B) are, respectively, order p and q polynomials whose roots lie outside the unit circle uniformly in u and whose coefficient functions, φj(u), for j = 1, ..., p, and θk(u), for k = 1, ..., q, are continuous on R. The coefficient functions satisfy φj(u) = φj(0) and θk(u) = θk(0) for u < 0, and φj(u) = φj(1) and θk(u) = θk(1) for u > 1, and they are differentiable with bounded derivatives for u ∈ [0, 1]. Notice that by setting Φ(u,B) = Φ(B), Θ(u,B) = Θ(B), and d(u) = 0 for all u ∈ [0, 1], one gets the SV model of Harvey et al. (1994). If, on the other hand, one sets d(u) = d for all u ∈ [0, 1], one gets the LMSV model (Eq. (21)).
7 Empirical analysis
This section describes how, in practice, wavelet methodology gives additional insight into volatility dynamics through time-scale decomposition. Wavelet variances at different time-scales are related to each other to uncover possible differences among players in the market during an IT-bubble period and its aftermath. The global and local scaling laws also provide a consistent estimate of long-memory in volatility. Finally, the effect of volatility periodicity on these results is studied.
7.1 Data description
The original data set included all stock transactions done at the Helsinki Stock Exchange
(HEX) between January 4 (1999) and December 30 (2002), i.e. it was so-called ”tick-
by-tick" data. Because it had the highest liquidity, the stock of Nokia Oyj was chosen and
the data were discretized: 5-minute prices were extracted using the closest transaction
price to the relevant time mark.40 Discretizing is necessary for the wavelet decomposition
40The HEX is by far the most liquid market place trading Nokia: In year 2003, the HEX accounted for
62.1% of the total number of shares traded while the percentage for New York Stock Exchange (NYSE)
was only 20.3% (the HEX (May 4, 2004)). At NYSE (and NASDAQ), Nokia has the largest trading
volume ($1.5 million) of cross-listed non-U.S. companies (Citigroup (June 23, 2004)).
to be interpretable in terms of time-scales that capture a band of frequencies (as it is
necessary in spectral analysis, too). From a theoretical perspective discretizing can be
justified by assuming that the DGP does not vary significantly over short time intervals.41
To minimize microstructure effects, one could have also used the last transaction before
the relevant time mark (a method originally introduced by Wasserfallen and Zimmermann
(1985) and used in Hol and Koopman (2002), among others) or linearly interpolated
price (introduced by Andersen and Bollerslev (1997c)) but because of occasional liquidity
problems, the closest one was considered the best compromise.42 This particular choice should not have any significant effect on the conclusions of the subsequent analysis, however.
The interval of 5 minutes has been used in many earlier studies (e.g. Andersen and
Bollerslev (1998)). It has been found ”optimal” in the sense that it is often the smallest
interval that doesn’t suffer too badly from effects such as ”bid-ask bounce” (see Campbell
et al. (1997, Ch. 3) or Gourieroux and Jasiak (2002, Ch. 14)). Concerning missing
observations for a specific time mark (such as technical breaks and incomplete trading
days), the previous price standing was always used. The 5-minute returns were then
calculated as the scaled difference between successive log-prices, i.e.,
rt,n = 100 (lnPt,n − lnPt,n−1) ,
where rt,n denotes the return for intraday period n on trading day t, with n ≥ 1 and
t = 1, ..., T. Notice that the prices Pt,n were adjusted for splits but not for dividends. This is because there were only four dividend-paying days in the whole four-year period 1999-2002 and their impact was very small. As a general rule, the empirical analysis was
done including overnight returns unless otherwise mentioned (like in Sec. 7.6). Finally,
the so-called ”block trades” were not removed, thus possibly causing a few artificially
generated jumps per month. Their impact is considered insignificant for the type of
analysis conducted.
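The return construction just described amounts to a previous-tick fill for missing price marks followed by scaled log-differencing. A minimal sketch (the helper name and the use of NaN to mark missing price marks are illustrative choices):

```python
import numpy as np

def five_min_returns(prices):
    """Scaled log-differences: r_{t,n} = 100 * (ln P_{t,n} - ln P_{t,n-1}).
    Missing price marks (np.nan) are filled with the previous price standing."""
    p = np.asarray(prices, dtype=float)
    # previous-tick fill for missing observations
    for i in range(1, len(p)):
        if np.isnan(p[i]):
            p[i] = p[i - 1]
    return 100.0 * np.diff(np.log(p))

# a price gap filled with the previous price yields a zero return
r = five_min_returns([10.0, 10.1, np.nan, 10.1])
```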
41 The statistical behavior of the sampled data could, however, differ significantly from the behavior of the DGP from which the sample was obtained. In the current context this principle might manifest itself through the so-called "non-synchronous trading" effect and possibly induce negative serial correlation (see Campbell et al. (1997, Ch. 3.1) and Lo and MacKinlay (1999, Ch. 4)).
42 For some other ways to deal with the "sporadic nature" of trading, see the discussion in Goodhart and O'Hara (1997) or Dacorogna et al. (2001, Ch. 3.2.1).
Table 2: Different data periods available.
Time period Trading (I) AMT (I) Trading (II) AMT (II)
1/4/99− 8/31/00 10:30—17:30 17:30—18:00 - 9:00—9:30
9/1/00− 4/11/01 10:00—18:00 18:00—18:15 - 8:30—9:00
4/17/01− 3/27/02 10:00—18:00 18:03—18:30 18:03—21:00 8:30—9:00
4/2/02− 12/30/02 10:00—18:00 18:03—18:30 18:03—20:00 8:30—9:00
At the HEX an electronic trading system called Helsinki Stock Exchange Automated
Trading and Information System (HETI) has been in use since 1990. This means that
there is no ”floor” but brokers trade electronically, the smallest ”tick-size” (i.e. price
change) being 0.01.43 As a general rule, all banking days are market days at the HEX.
From the point of view of data handling, one of the main problems was that the HEX did not have constant trading hours during the four years. In fact, the trading day was
first extended to include evening hours, but then this trend was reversed. These changes
were mainly caused by an international pressure towards harmonization of exchange open
hours. For example, the long-run trend of longer trading days was suppressed by the weak
market conditions during the last few years.44 Therefore four different time periods were
available (see Table 2).
In Period I, from January 4 (1999) to August 31 (2000), continuous trading took place between 10:30 a.m. and 5:30 p.m., totaling 7 hours and 85 intraday 5-minute prices.
Transactions between 8 and 10:30 a.m. were discarded, most of them belonging to the
43 Several types of market places are possible (see e.g. Gourieroux and Jasiak (2001, Ch. 14.1)). This is important to acknowledge since different systems may affect the dynamics of price differently. So, to be precise: the HEX is a continuous, order-driven, limit-order-book market place with a call auction at the market opening. For comparison, the NYSE is an order-driven, floor-based, continuous market with a specialist (acting as the market maker). One general advantage of a continuous market is that it provides good intraday market information.
44 In the beginning of 2004, for example, the evening trading hours at the HEX formed only around 6% of the total daily trading volume. The HEX is going to cut its trading hours to 10:00-18:20 when joining the SAXESS system in September 2004. A similar cut was carried out at the Deutsche Börse (from 9:00-20:00 to 9:00-17:30) in the beginning of November 2003.
Figure 7: Four examples of trading days that experienced extremely high return variability during the AMT (I), 6:03-6:30 p.m. (all transactions included). [Panels: 06/12/01, 09/11/01, 09/14/01, and 04/18/02; returns plotted against time.]
after market trading II (AMT (II)) taking place between 9 and 9:30 a.m.45 Likewise,
transactions between 5:30 and 6 p.m. were discarded because they belonged to the AMT
(I). Only one day, April 20 (2000), was an incomplete day. In total, there were 419 trading
days, resulting in 35,615 (= 419 · 85) price observations (i.e. 35,614 return observations).
In Period II, from September 1 (2000) to April 11 (2001), continuous trading was extended from both ends by half an hour.46 Thus trading took place between 10 a.m. and 6 p.m., totaling 8 hours and 91 intraday 5-minute prices. December 12 (2000)
was an incomplete trading day. In total, there were 155 trading days resulting in 14,105
price observations.
In Period III, from April 17 (2001) to March 27 (2002), continuous trading was extended further by including evening hours from 6 to 9 p.m. A technical break (when no transactions took place) occurred every day between 6 and 6:03 p.m. Continuous trading and the AMT (I) took place simultaneously. This simultaneity required very careful pre-filtering: when the trading day experienced a big cumulative price change, artificially big returns (even around 20%) could appear (see Fig. 7). An example of such a trading day was April 18 (2002), when Nokia announced its first-quarter results, which triggered a significant price drop earlier that day. To guard against the generation of artificial returns, the following pre-filtering rule was applied: prices with a percentage price change of more than 3% relative to the last genuine price recorded (before the technical break at 6 p.m.) were flagged as artificial and replaced by the previous genuine price. This rule was based on a careful inspection of the data (for some other rules, see Dacorogna et al. (2001)).47 The noise reduction obtained with this 3%-filter was
45 During the AMT, the trading price can fluctuate within the trading range established during continuous trading for round-lot trades (http://www.porssisaatio.fi).
46 Actually, this period ended the day before, but the data of April 11 included transactions only up to 6:52 p.m. For simplicity, therefore, this day was ended at 6 p.m.
47 The percentage change was calculated relative to the last genuine price because there is no guarantee that two artificial prices could not be adjacent. In fact, if the percentage had been calculated simply from adjacent prices, an artificial price would then have survived the filter. Admittedly, however, there is a small "defect" in this 3%-filter. Namely, fixing the denominator to the last genuine price is not reasonable if there is a strong price trend in either direction. But because the AMT (I) lasted only 27 minutes, a trend was regarded as of minor importance.
so considerable that the difference to the non-filtered series was clearly evident by eye. In
summary, continuous trading took place for 11 hours (including the 3-minute break) and
produced 133 intraday 5-minute prices. There were no incomplete trading days. In total,
there were 237 trading days resulting in 31,521 price observations.
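The 3%-filter can be stated compactly in code. The sketch below follows the rule in the text: each price is compared to the last genuine price, and a deviation of more than 3% flags the price as artificial (the function name is illustrative):

```python
def three_percent_filter(prices, threshold=0.03):
    """Replace prices deviating more than `threshold` from the last *genuine*
    price by the previous genuine price (the rule used for the AMT (I) overlap)."""
    genuine = prices[0]  # the last genuine price recorded before the break
    out = [prices[0]]
    for price in prices[1:]:
        if abs(price / genuine - 1.0) > threshold:
            out.append(genuine)      # flagged artificial: carry the genuine price
        else:
            genuine = price          # accepted: this becomes the new genuine price
            out.append(price)
    return out

# the 8.9% jump to 110 is flagged; the subsequent 102 survives,
# because it is compared against the genuine price 101, not 110
filtered = three_percent_filter([100.0, 101.0, 110.0, 102.0])
```

Comparing against the last genuine price (rather than the adjacent price) is exactly what prevents two adjacent artificial prices from slipping through, as footnote 47 explains.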
In Period IV, from April 2 (2002) to December 30 (2002), continuous trading was cut from the end by an hour, so it took place between 10 a.m. and 8 p.m. (apart from the technical break and simultaneity just described), totaling 10 hours and
121 intraday 5-minute prices. The same 3%-filter was employed. There were no incomplete
trading days in this period either. In total, there were 188 trading days resulting in 22,748
price observations.
7.2 Preliminary data analysis
Statistical key figures of Periods I and III are summarized below (Table 3). There are
at least two reasons for preferring to analyze Periods I and III over the other two. First,
Periods I and III are of approximately equal size and also contain the greatest number
of observations which is convenient from the statistical inference point of view. Second,
Periods I and III represent turbulent and calm regimes, respectively: Period I is represen-
tative of the ”IT-bubble” and Period III of its aftermath.48 The volatilities of Periods I
and III seem to differ by simply eyeballing the return series (see the bottom plots of Figs.
8 and 9).49 This observation is valuable because structural breaks can generate artificial
long-memory (see e.g. Lamoureux and Lastrapes (1990b), Granger and Hyung (1999),
and Diebold and Inoue (2001)). In particular, Mikosch and Starica (2004) have argued
that long-memory might be due to non-stationarity, thus rendering stationary models (e.g. GARCH) inappropriate over longer horizons. It is therefore safer to analyze these periods separately. There is another aspect in which Periods I and III differ from each other: the
former has a strong positive trend component while the latter does not (see the top plots
48 Polzehl et al. (2004) find that the "2001 recession" in the U.S. might have started as early as October 2000 and ended as late as the summer of 2003, which neatly supports these two categories.
49 The standard deviation of returns in Period I is actually smaller than that of Period III (0.3789 and 0.3869, respectively). Similarly, the means of absolute returns (proxying volatility) are 0.1874 and 0.2287, respectively. Wavelet variances will shed more light on this counterintuitive finding.
Table 3: Statistical key figures of Periods I and III.
Period I
Min. 1st Q. Med. Mean 3rd Q. Max. Std.
−11.61 −0.1118 0 +3.747e−03 0.1161 10.97 0.3789
Period III
Min. 1st Q. Med. Mean 3rd Q. Max. Std.
−11.16 −0.1523 0 −6.798e−04 0.1534 14.05 0.3869
of Figs. 8 and 9). Because a trend is another possible source of spurious long-memory
(e.g. Bhattacharya et al. (1983)), one may then a priori expect Period I to show stronger
long-memory.
The sample autocorrelation functions (ACFs) of returns in Periods I and III differ from each other in a non-trivial way (see the top plots of Figs. 10 and 11). It seems that there is a statistically significant pattern in Period I: the opening of the HEX as well as of the U.S. markets (New York) at 5:30 p.m. (Central European Time +1) has caused some linear dependence.50 This finding does not necessarily imply any kind of arbitrage opportunities in an economic sense, however. When transaction costs are included, a minor amount of autocorrelation is consistent with a martingale process and the notion of efficient markets (see Fama (1970)). Considering the slightly different results of Period III, it seems that markets became more liquid and efficient. A bit surprisingly, though, in Period I there is no significant negative autocorrelation of MA(1) type at lag one, which is typically reported (Andersen and Bollerslev (1997b) found it to be −0.04 in the FX markets with 5-minute data) and attributed to bid-ask bounce. In Period III, a significant negative first-lag autocorrelation (−0.08) does appear, however. It is then somewhat puzzling how increased liquidity would be related to the appearance of negative first-lag autocorrelation. One possibility is that the sizes of the trades have increased as well, thus causing the microstructure effects to last longer. In the analysis below, the MA(1) dynamics have not been considered that essential, however, and therefore they have not been filtered out.
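The sample ACFs and the Gaussian white-noise band used in Figs. 10 and 11 are straightforward to reproduce. A minimal sketch with the usual biased estimator (divisor N):

```python
import numpy as np

def sample_acf(x, max_lag):
    """Biased sample autocorrelations rho_1..rho_max_lag (divisor N, as usual)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

N = 100
band = 1.96 / np.sqrt(N)          # 95% band for Gaussian white noise
x = np.tile([1.0, -1.0], N // 2)  # alternating series: strong negative lag-1 ACF
acf = sample_acf(x, 3)
```

An estimated autocorrelation falling outside ±band at a given lag is what the dashed confidence lines in the figures flag as significant.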
To proxy volatility, I used absolute returns (for reasons explained in Sec. 6.1). The
50The NYSE opens at 9:30 a.m. local time (Eastern Standard Time). When comparing the figures,
recall that the length of the trading day was different in Periods I and III.
Figure 8: Price and return series of Period I (IT-bubble period), January 4, 1999 - August 31, 2000. [Panels: log(Price) and log-Return vs. time.]
Figure 9: Price and return series of Period III (aftermath period), April 17, 2001 - March 30, 2002. [Panels: log(Price) and log-Return vs. time.]
sample ACFs of absolute returns stay significantly positive for a long time in both periods,
statistically as well as economically (see the bottom plots of Figs. 10 and 11). In Period
III, for example, the first-lag autocorrelation (0.32) is well above the confidence interval
(Andersen and Bollerslev (1997b) found 0.309). Clearly, then, returns are not independent.
Although the pattern is quite similar in both periods, there are some important differences
here too. First, the ACF peaks higher in Period I than in Period III. This peak is caused
by the large (on average) overnight return in Period I. The larger ”overnight effect” is
most probably caused by the frequent news arrivals, the hype that took place during the
bubble, and the shorter trading day at the HEX (so that information had more time to
accumulate overnight). Second, in Period I the first peak just prior to the highest peak is a reflection of the opening of the New York stock markets, i.e. it is the "New York effect" (discussed more closely in Sec. 7.6).51 In Period III no distinct New York effect exists in the autocorrelations, which is probably due to the weaker link between the U.S. and European markets after the burst of the IT-bubble.
7.3 Multiresolution decomposition
A description of the long-run dynamics is achieved conveniently by a wavelet MRA. A
MODWT MRA(J = 14) of price using a LA(8) filter (with reflecting boundary) produces
a set of wavelet smooths with varying amounts of detail included. These smooths show,
for example, that Period I has a strong positive trend while Period III does not (see Figs.
12 and 13). Notice that all the smooths are automatically aligned in time with the original
series (see Eq. (14)). Furthermore, these smooths converge to the original price series as
more and more details are being added.52 Concerning volatility, however, very little can
be inferred from these figures.
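The additive structure of an MRA is easiest to see with the Haar filter rather than the LA(8) filter used in the text. The sketch below performs one level of a Haar MODWT with a circular (rather than reflecting) boundary, purely to illustrate the two properties relied on here: the detail and smooth add back to the original series, and the transform preserves energy.

```python
import numpy as np

def haar_modwt_mra_level1(x):
    """One level of a Haar MODWT multiresolution analysis (circular boundary).
    Analysis:  W_t = (x_t - x_{t-1})/2,  V_t = (x_t + x_{t-1})/2.
    Synthesis: D_t = (W_t - W_{t+1})/2,  S_t = (V_t + V_{t+1})/2,
    so that the detail and smooth add back to the original series: D + S = x."""
    xm1 = np.roll(x, 1)            # x_{t-1}, circularly
    W = (x - xm1) / 2.0            # wavelet (detail) coefficients
    V = (x + xm1) / 2.0            # scaling (smooth) coefficients
    D = (W - np.roll(W, -1)) / 2.0
    S = (V + np.roll(V, -1)) / 2.0
    return W, V, D, S

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 7.0])
W, V, D, S = haar_modwt_mra_level1(x)
```

Iterating the same two-step scheme on V (with filters upsampled at each level) produces the higher-level smooths shown in Figs. 12 and 13.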
51 Of course, it is possible that other market places affect volatility at the HEX too, but as will be demonstrated later (in Sec. 7.6), the average intraday volatility peaks consistently at the opening of the New York market. In the literature, volatility spillover effects have been reported, for example, by Engle et al. (1990). A possible source of "meteor showers" (as they call them) is heterogeneous expectations (Hogan and Melvin (1994)).
52 These "moving averages" could be applied in, e.g., forecasting in the spirit of "double" and "triple crossing" methods (methods that are briefly discussed in Gencay et al. (2002a, pp. 48-49)).
75
ACF of Returns (Period I)
AC
F
0 50 100 150 200 250 300
-0.0
4-0
.01
0.01
ACF of Absolute Returns
Lag (5 minutes)
AC
F
0 50 100 150 200 250 300
0.0
0.1
0.2
0.3
Figure 10: The sample ACFs of returns and absolute returns in Period I. The 95% confi-
dence interval (dashed line) is for Gaussian white noise: ±1.96/√N .
76
Figure 11: The sample ACFs of returns and absolute returns in Period III. [Lag axis in 5-minute units.]
Figure 12: Price series of Period I and its wavelet smooths of varying levels. [Panels: log(Price) and smooths of levels 6, 8, 10, 12, and 14.]
Figure 13: Price series of Period III and its wavelet smooths of varying levels. [Panels: log(Price) and smooths of levels 6, 8, 10, 12, and 14.]
In order to study volatility at different time-scales, the MODWT(J = 12) is applied to absolute returns using LA(8) (with reflecting boundary). The first 12 wavelet levels with the corresponding time-scales and associated changes are listed below (see Table 4). Notice that I here use Daubechies' indexing, so that a bigger level is associated with a larger time-scale (instead of Mallat's convention; see Sec. 4.2).53 When interpreting the time-scales in "calendar time", one should be careful with the period in question since the length of the trading day has varied (see Sec. 7.1). So, for instance, in Period I the first 6 levels correspond to intraday (and daily) dynamics capturing frequencies 1/64 ≤ f ≤ 1/2, i.e. oscillations with a period of 10-320 minutes (approx. 5 hours). In Period III, on the other hand, the first 7 levels correspond to intraday (and daily) dynamics capturing frequencies 1/128 ≤ f ≤ 1/2, i.e. oscillations with a period of 10-640 minutes (approx. 11 hours). In terms of changes (not oscillations), then, the 6th level in Period I corresponds to approximately half of a trading day. In Period III this corresponds to the 7th level. These levels will serve as a watershed between intraday and interday dynamics.
The MODWT wavelet coefficients at different levels j are useful as a descriptive tool
(see Figs. 14—15 and 16—17). The approximate zero-phase filter property (i.e. alignment
in time) is readily apparent: rapid changes in volatility stand out at the smallest scales
(i.e. highest frequencies). As the scale gets bigger (frequency lower), the changes tend
to be smoothed out because a wider filter averages more. For example, in Period III the
large spike in volatility between observations 5,000 and 10,000 (see Fig. 16) has died out already at the 6th level (see Fig. 17). On the other hand, the spike between 10,000 and 15,000 continues to prevail even at the 10th level. This means that the former spike
was a high-frequency event only while the latter was a more severe and longer lasting
burst of volatility. Short-time speculators and long-term investors would then have to
react differently in such an event: the former would be a concern for speculators but the
latter would interest investors as well. And since the wavelet coefficients in theory form
a stationary series at each level (see Sec. 5.5), the same statistical characteristics should
53An unfortunate consequence of the dyadic dilation is that time-scales become coarse rapidly so that
not all of the potentially interesting scales are recovered. Thus the non-dyadic extension (Pollock and Lo
Cascio (2003, 2004)) might be worthwhile to look at (see Footnote in Sec. 4).
Table 4: Wavelet levels and time-scales.
Level Scale Associated with changes of
1 1 5 min.
2 2 10 min.
3 4 20 min.
4 8 40 min.
5 16 80 min.
6 32 160 min. ≈ 3 h.
7 64 320 min. ≈ 5 h.
8 128 640 min. ≈ 11 h.
9 256 1280 min. ≈ 21 h.
10 512 2560 min. ≈ 43 h.
11 1024 5120 min. ≈ 85 h.
12 2048 10240 min. ≈ 171 h.
persist in the future also (forecasting is not considered explicitly here, however).54
The MODWT coefficients lend themselves to a quantitative study because of their energy-preserving property (see Eq. (15)). A few general observations can be made immediately. For example, the unconditional distributions converge from a highly leptokurtic distribution to a Gaussian one (see Figs. 18 and 19). In Period I, for instance, Gaussianity is reached at the 10th level, where the Jarque-Bera test statistic is 5.1067 (with a p-value of 0.07782). In Period III it is 7.8698 (0.01955) at the 10th level, but the distribution continues to vary its shape. Notice that the mean is zero at all levels (a consequence of the zero-average property of wavelets) while the range gets constantly smaller as the levels grow. A closer look at the unconditional wavelet variance is the next topic.
54This does not imply, however, that for example the ”cycle” visible at the end of the 10th level of
Period I is something that could be easily exploited. While it is not an artifact of the wavelet filter,
there is simply no reason why such a cycle should persist in an efficient stock market. A wavelet based
forecasting tool for short and long-memory time series is presented for example by Renaud et al. (2002)
who argue the concept to be ”very simple and easy to implement” with ”significant potential”.
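The Jarque-Bera statistic quoted above combines sample skewness and kurtosis, JB = N/6 (S² + (K − 3)²/4), and is asymptotically χ²(2) under normality. A minimal sketch:

```python
import numpy as np

def jarque_bera(x):
    """Jarque-Bera statistic JB = N/6 * (S^2 + (K - 3)^2 / 4), where S and K are
    the sample skewness and kurtosis; asymptotically chi-squared with 2 df."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dev = x - x.mean()
    m2 = np.mean(dev**2)
    S = np.mean(dev**3) / m2**1.5          # sample skewness
    K = np.mean(dev**4) / m2**2            # sample kurtosis (Gaussian: 3)
    return n / 6.0 * (S**2 + (K - 3.0)**2 / 4.0)

# symmetric, thin-tailed toy data: S = 0, K = 1.7, so JB = 5/6 * 1.69/4
jb = jarque_bera(np.array([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Applied level by level to the MODWT coefficients, a JB value below the χ²(2) critical value (5.99 at the 5% level) is what "Gaussianity is reached" means above.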
Figure 14: Volatility (absolute returns, January 4, 1999 - August 31, 2000) and the MODWT wavelet coefficients of Period I (j = 2 and 4).
Figure 15: The MODWT wavelet coefficients of Period I (j = 6, 8 and 10).
Figure 16: Volatility (absolute returns, April 17, 2001 - March 30, 2002) and the MODWT wavelet coefficients of Period III (j = 2 and 4).
Figure 17: The MODWT wavelet coefficients of Period III (j = 6, 8 and 10).
Figure 18: Unconditional distributions in Period I (j = 1, ..., 12). [Histograms of the wavelet coefficients at each level.]
Figure 19: Unconditional distributions in Period III (j = 1, ..., 12). [Histograms of the wavelet coefficients at each level.]
7.4 Global scaling laws and long-memory
Several authors have provided evidence of monotonic scaling laws in the FX markets (e.g.
Muller et al. (1990, 1993), Guillaume et al. (1997), and Andersen et al. (2000)) but less
so in the stock markets. This is probably because of the larger turnover, higher liquidity,
and lower transaction costs in the FX markets. However, in both markets it is possible
that a single scaling factor is appropriate only in a subset of time-scales. To study this
”multi-scaling” in the FX markets, Gencay et al. (2001) used wavelet methodology. Using
absolute returns as a volatility proxy, they confirmed that a different scaling regime exists for intraday time-scales than for interday and larger time-scales. It is now interesting to see whether (i) the same phenomenon appears with stock market data, (ii) the scaling factor is stable in time, and (iii) there is any reasonable explanation for such phenomena.
The MODWT coefficients of absolute returns of Periods I and III were again formed
by the MODWT(J = 12) using LA(8). The reflecting barrier seemed to suffer less from
the boundary effects at large levels than the periodic one so the former was used (see the
top subplots of Fig. 20). The good localization properties of wavelets are able to reveal
that most of the total energy of volatility is located at the smallest time-scales (the highest
frequencies).55 The relationship is actually approximately hyperbolic which is observed
as an approximate linear relationship on a double-logarithmic scale. Notice that although
there is no reason a priori to exclude any specific time-scale from the analysis, the results
from the smallest time-scale (level 1) in Period III are to be interpreted a bit cautiously
because of the negative autocorrelation in returns. Notice also that only the Gaussian
confidence bands were calculated (see the bottom plots of Fig. 20). With this in mind, there seem to exist two different scaling regions in Period I, with a visible break at the seventh level associated with 320-minute changes, or oscillations with a period of approximately 640 minutes (see Fig. 21). The first six levels capture frequencies 1/64 ≤ f ≤ 1/2, i.e. oscillations with a period of 10-320 minutes, corresponding to intraday dynamics in Period I.56 The seventh and higher levels are related to one-day and longer dynamics.
One might expect a break at the 7th level in Period III because of its longer trading day,
55 As Dr. Stephen Pollock suggested to me at the "Workshop on Computational Econometrics and
Statistics" (Neuchatel, Switzerland 2004), financial markets tend to "shriek" under stress.
56 The first level is discarded below to minimize microstructure effects (just in case).
[Figure 20: four panels of log(Wavelet variance) against scale index 1−12: Period I (a), Period I (b), Period III (a), Period III (b).]
Figure 20: Wavelet variances of Periods I (on left) and III (right) on a double-logarithmic
scale. The upper plots show the result using reflecting (continuous line) and periodic
(dotted) boundaries. The lower plots show the Gaussian 95% confidence intervals with the
reflecting boundary only.
[Figure 21: log(Wavelet variance) against scale index 1−12, Periods I and III overlaid.]
Figure 21: The wavelet variances of Periods I (continuous line) and III (dashed). The Gaussian
95% confidence interval (dotted) of Period III has been drawn to address the significance.
[Figure 22: level 1 MODWT wavelet coefficients plotted against time for Periods I and III.]
Figure 22: The 1st level MODWT wavelet coefficients of Periods I and III compared.
but there is only a slight one at the 6th level. Indeed, the difference between the scaling
laws of Periods I and III is most evident at level 6. This is visible even by eye when the
wavelet coefficients of level 6 are compared to each other (see Figs. 15 and 17). Clearly
Period I experienced more middle-sized jumps at this particular time-scale than Period III
did (and hence Period I looks fatter). This observation is, however, not enough
to explain the extremely jumpy look of Period I. Because sudden jumps are high-frequency
events, they should be well captured by the 1st level. This intuition is confirmed by the
1st level wavelet variance of Period I, which also lies outside the 95% confidence interval of
Period III (see Fig. 21). By plotting the level 1 wavelet coefficients side by side,
the difference becomes obvious (see Fig. 22). So the "more volatile" outlook of Period I
is mainly caused by the different dynamics at levels 1 and 6, corresponding to 5-minute
and approximately 3-hour changes, respectively. Now the difference in the overall level
of volatility can be attributed to specific time-scales and to short-run speculators in general.
More precisely, the jumps at the 1st level measure the flow of new information and the
general level of nervousness of the market. It is easily confirmed that most of these jumps
are caused by overnight returns (for reasons stated in Sec. 7.2). The difference at the 6th
level is not so easily interpretable, though. It may be due to the volatility seasonality that
is particularly strong in Period I. This will be studied more carefully later (in Sec. 7.6).
Scaling laws are intimately related to the memory of the DGP. The observed initial
rapid decay of the sample autocorrelation followed by a very slow rate of dissipation (see
Sec. 7.2) is characteristic of slowly mean-reverting fractionally integrated processes that
exhibit a hyperbolic rate of decay (i.e. long-memory).57 From a statistical point of view
the quantification of this decay is important, as standard statistical tools for inference are
invalid in the case of long-memory. For example, standard errors for the estimates of
the coefficients of ARCH or stochastic volatility models would be incorrect, and hence so
would the confidence intervals for predictions (Lobato and Savin (1998); see also Beran (1994)).
Economically, long-memory has consequences for option pricing. For instance, long-memory
has a significant impact upon the term structure of implied volatilities (see Taylor (2000)).
And of course, estimation of the fractional differencing parameter d allows the use of
long-memory stochastic volatility models (such as LMSV) for simulation and forecasting.
57 Basic ARCH models exhibit an exponential rate of decay and fail in this respect (see e.g. Bollerslev and
Mikkelsen (1996), Breidt et al. (1998), Ding et al. (1993), and Granger and Ding (1996)).
The semiparametric wavelet-domain method is in theory better suited for estimating
the rate of the decay than the one based on the spectral density (see Sec. 5.5). Using Equation
(18), the fractional differencing parameter d is therefore estimated for Periods
I and III by OLS. The same type of approach has been used by Jensen (2000) and
Tkacz (2000), for example. Following Ray and Tsay (2000), the standard errors obtained
from regression theory are used to judge significance. Overall, the estimates of d
support the conjectured long-memory (see Table 5). Period I has a slightly larger value
than Period III, which could in principle be caused by the strong trend in the former.
However, because of the 4 embedded differencing operations of LA(8) (see Sec. 4.5), this
is unlikely, as Craigmile et al. (2004) have shown. The coefficients using levels j1 = 2, ..., 6
and j2 = 7, ..., 10 in Period III do not differ statistically, but the results of Period I are not
as clear-cut (and are discussed later). The relatively short time-span used (approx.
1.5 years) can be criticized in this context, and it has in fact been a topic of debate in past
years. It is true that when estimating long-memory dependencies in the mean, the small-sample
bias depends crucially on the time-span of the data. But the most recent evidence
(see Andersen and Bollerslev (1997a, 1997b) and Bollerslev and Wright (2000)) suggests
that the performance of the estimates from the volatility series may be greatly enhanced
by increasing the observation frequency instead of the time-span. In particular, Bollerslev
and Wright (2000) have argued that high-frequency data allow for vastly superior and
nearly unbiased estimation of d.
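The slope-to-d mapping behind Table 5 can be sketched as follows. This is a minimal illustration under the assumption that the wavelet variance follows a pure power law across levels (Equation (18) gives the exact regression used in the text; the helper names are mine):

```python
import numpy as np

def d_from_slope(beta):
    # If the wavelet variance scales as tau^(2d-1), the slope of
    # log2(wavelet variance) against the level index is 2d - 1.
    return (beta + 1.0) / 2.0

def estimate_d(levels, wavelet_vars):
    # OLS fit of log2 wavelet variance on the level index
    slope, _intercept = np.polyfit(levels, np.log2(wavelet_vars), 1)
    return d_from_slope(slope)

# Sanity check on exact power-law data with d = 0.2:
levels = np.arange(2, 11)
wavelet_vars = 2.0 ** ((2 * 0.2 - 1) * (levels - 1))
print(round(estimate_d(levels, wavelet_vars), 3))  # -> 0.2
```

Applied to the Coefficient column of Table 5, d_from_slope(−0.62044) reproduces the reported d̂ = 0.18978 for levels 2−10 of Period I.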
7.5 Local scaling laws and long-memory
The assumption of a constant long-memory structure may not always be reasonable.
Bayraktar et al. (2003) tackled the problem of time-varying long-memory by segmenting
the data before estimating the Hurst coefficient H(t) (a closely related measure
of long-memory; see e.g. Beran (1994)). But this scheme might not always be sufficient,
as Whitcher and Jensen (2000) have pointed out. In particular, they argued that "the
ability to estimate local behavior by applying a partitioning scheme to a global estimating
procedure is inadequate when compared with an estimator designed to capture time-varying
Table 5: OLS-regression results.

Period I
Levels   Coefficient   Std. error   t-value   P(>|t|)    d̂
2−10     −0.62044      0.05016      −12.37    5.19e−06   0.18978
2−6      −0.44433      0.09108      −4.879    0.01646    0.277835
7−10     −0.37730      0.02267      −16.65    0.003590   0.31135

Period III
Levels   Coefficient   Std. error   t-value   P(>|t|)    d̂
2−10     −0.64082      0.02061      −31.10    9.18e−09   0.17959
2−6      −0.56431      0.04396      −12.84    0.00102    0.217845
7−10     −0.57769      0.03998      −14.45    0.00475    0.211155
features". In this respect the work of Goncalves and Abry (1997), who estimated a local
scaling exponent for continuous-time multifractal Brownian motion, seems more appropriate.
Unfortunately, their approach involves the construction of non-standard wavelets,
which hinders practical implementation. To overcome this difficulty, Whitcher and Jensen
(2000) introduced an estimator based on the MODWT that allowed them to stay in the
traditional ARFIMA framework.
In contrast to Jensen and Whitcher (2000), who use log-squared returns to proxy volatility,
I continued to use absolute returns in order to prevent an inlier problem (see Sec. 6.1).
For Equation (20) to hold, I then implicitly assume that absolute returns are generated
by a locally stationary process. Considering the jumps and clustering of volatility, this
assumption seems more reasonable than covariance stationarity (although no formal tests
were conducted). Using only the smallest levels in the OLS-regression resulted in estimates
of d(u) that varied in a white-noise fashion. This is in agreement with the argument
of Jensen and Whitcher that intraday levels are irrelevant to long-memory phenomena.
But it contradicts the finding of Andersen and Bollerslev (1997b) that intraday volatility
can be informative even in the long run. Because of this contradiction, and the fact
that 5-minute returns are still subject to bid-ask bounce at the first lag in Period III, I
considered only the 1st level as uninformative. On the other hand, using only the larger
levels resulted in severe instability of the estimate, most probably due to the small support
(only 4 levels).58 Unsurprisingly, levels 2−10 together gave the most reliable results and therefore they are cited below.
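The local estimation step can be sketched in the spirit of the Whitcher-Jensen estimator. The MODWT coefficients themselves are assumed to be computed elsewhere, the smoothing of the squared coefficients is omitted, and the array layout and names are mine:

```python
import numpy as np

def local_d(W_sq, levels):
    """Estimate d(t) at each time point from squared MODWT coefficients.

    W_sq: array of shape (len(levels), T); row k holds the squared,
    time-aligned MODWT coefficients of level levels[k].
    """
    logv = np.log2(W_sq)  # local log2 "wavelet variances"
    T = W_sq.shape[1]
    d = np.empty(T)
    for t in range(T):
        # same slope-to-d mapping as in the global OLS regression
        slope, _ = np.polyfit(levels, logv[:, t], 1)
        d[t] = (slope + 1.0) / 2.0
    return d
```

In practice the squared coefficients would be smoothed over a window before the regression; this unsmoothed version only illustrates the level-wise regression at each t.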
Consistent with the global results, the median of the local long-memory parameter
estimate of Period I is larger than that of Period III (see Table 6). In general, the
estimate of d(t) shows similar characteristics in both periods (see Figs. 23 and 24): most
of the time the estimate stays in the interval (0, 1/2), indicative of stationary long-memory,
although "outliers" tend to pull the estimate downwards and "out of bounds". Fortunately,
however, the occasional crossovers should not present a serious modeling problem since
the process is mean-reverting.59 The estimate also stabilizes during less volatile times.
Notice that an increase in the estimate during periods of steady growth (or decline) is in
agreement with the definition of long-memory. Furthermore, the estimates are unconditionally
Gaussian (see Fig. 25). The distributions are slightly skewed to the left because
of the large drops (especially in Period I). The idea of finding a structure in d(t) (such as
ARFIMA) that could be exploited in forecasting may be hampered by the possibility of
the long-memory being spurious, however. For instance, structural breaks might have biased
the estimate of d(t) upwards, in which case the timing and size of the breaks would
become an equally important research problem (see Granger and Hyung (1999)). Fortunately,
no clear sign of a structural break is visible in either period. In fact, one of the
reasons for the division of the data was to avoid this argument (see Sec. 7.2). Regarding
modeling with the locally stationary LMSV model (see Sec. 6.2), the possibility of
misspecification has to be acknowledged, too. In fact, the medians of d(t) in both periods
differ from (are larger than) the corresponding ds, although they are expected to match
quite closely (see Sec. 5.5). It is possible that the jumps that pull the estimate downwards
are too frequent and severe. The modeling of d(t) and the potential problems involved are
not studied further here, however.
58 Recall that levels larger than level 10 were seriously affected by the boundary and were therefore
excluded. Nevertheless, it is probable that including levels 11 and 12 would stabilize the estimate a bit,
but this is computationally very costly.
59 I am indebted to Prof. Christian Gourieroux for pointing this out at the Economics and Econometrics
of the Market Microstructure Summer School (Constance, Germany 2004).
[Figure 23: three panels against time for Period I: local long-memory parameter estimates d(t), log returns, and log price.]
Figure 23: Local long-memory parameter estimates of Period I. Return and price series
are plotted below to align the features in time.
[Figure 24: three panels against time for Period III: local long-memory parameter estimates d(t), log returns, and log price.]
Figure 24: Local long-memory parameter estimates of Period III.
[Figure 25: time paths and histograms of the d(t) estimates for Periods I and III.]
Figure 25: Unconditional distributions of the estimate of d(t) are Gaussian.
Table 6: Statistical key figures of time-varying long-memory.

Period I
Min.      1st Q.   Med.     Mean     3rd Q.   Max.
−0.3426   0.2084   0.3117   0.3020   0.4056   1.0313
Jarque–Bera: X² = 5.5039, df = 2, p = 0.0638

Period III
Min.      1st Q.   Med.     Mean     3rd Q.   Max.
−0.3287   0.1663   0.2505   0.2412   0.3262   0.6624
Jarque–Bera: X² = 3.6219, df = 2, p = 0.1635
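For reference, the Jarque–Bera statistic quoted in Table 6 combines the sample skewness S and excess kurtosis K as JB = (n/6)(S² + K²/4), which is asymptotically chi-squared with 2 degrees of freedom under Gaussianity. A minimal moment-based sketch (no small-sample corrections; the function name is mine):

```python
import numpy as np

def jarque_bera(x):
    # JB = n/6 * (S^2 + K^2/4), where S is the sample skewness and
    # K the sample excess kurtosis, both from central moments.
    x = np.asarray(x, dtype=float)
    n = x.size
    m = x - x.mean()
    s2 = np.mean(m ** 2)
    S = np.mean(m ** 3) / s2 ** 1.5
    K = np.mean(m ** 4) / s2 ** 2 - 3.0
    return n / 6.0 * (S ** 2 + K ** 2 / 4.0)
```

Large JB values reject Gaussianity; since the p-values in Table 6 (0.0638 and 0.1635) exceed 0.05, Gaussianity is not rejected at the 5% level, consistent with Fig. 25.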
7.6 Effects of volatility periodicity
The effect of intraday volatility periodicity on the global and local scaling laws and long-memory
will now be studied. Martens et al. (2002) have argued that the intraday patterns
in the FX and stock markets are so distinctive that there is "a strong case for taking them
into account before attempting to model the dynamics of volatility".60 Note first that, on
average, the shape of intraday volatility is similar in Periods I and III (see Figs. 26 and 27).
Note also that the overnight 5-minute returns are now excluded. Although this clearly has
not removed all the overnight effects (the first intraday interval still exhibits considerably
larger volatility than the rest of the intervals), the first interval is not modified. After the
highly volatile first 5 minutes the average volatility calms down smoothly and stabilizes,
although some predetermined days have significant turbulence at midday because of the
announcement of quarterly reports; a finding that is consistent with the results obtained
by Felixson (2004) at the HEX. In the afternoon hours, however, the behavior of volatility
becomes abrupt again. The first peak occurs at 3:35 p.m. and the next one at 4:35. The
former 5-minute peak is most probably due to regular U.S. macro news announcements61
and the latter is the New York effect (reported in Sec. 7.2). There is also a small but
distinct 5-minute peak half an hour later, at 5:05, which is probably caused by macro news,
too. In Period III the highest peak is at 6:05 p.m. (and right after it) when the AMT (I)
starts. It is very likely that this is only an artifact which the 3%-filter (described in
Sec. 7.1) was unable to remove totally. The last 5 minutes of trading also experience a
60 In the FX markets the periodicity is clearly associated with the opening and closing of various financial
centers around the world (see e.g. Dacorogna et al. (1993) or Andersen and Bollerslev (1997a, 1997b)).
61 The most important U.S. macro news announcements are released at 8:30 a.m. (e.g. the employment report)
and 10 a.m. (e.g. the Humphrey–Hawkins testimony) Eastern Standard Time.
sudden but relatively small increase in volatility in both periods. Thus the general average
pattern is an "inverse J". Such a pattern has been reported in other markets too (see e.g. Wood
et al. (1985), Harris (1986), and Lockwood and Linn (1989)).
The wavelet method could be used to annihilate intraday dependencies. This would
work in a similar fashion to the low-pass filtering technique based on a two-sided weighted
average of both past and future absolute returns used in Andersen and Bollerslev (1997b).
Unfortunately, though, by considering the interdaily and longer dynamics (i.e. the wavelet
smooth of level J ≥ 6) as proposed by Gencay et al. (2000), I was not able to reproduce the hyperbolic decay in the sample ACF of the filtered series. So the intraday seasonalities
were removed by the Fourier flexible form (FFF) instead (see App. E). Luckily the FFF
is straightforward to execute, and it has previously been successfully applied in the FX as
well as stock markets (see e.g. Andersen and Bollerslev (1997c, 1998)). Although the results
are quite adequate here as well, it may be that the FFF is not so well suited for individual
stocks, which tend to have more outliers and abrupt volatility patterns. Clearly the results
also depend in a fairly complex way on how many sinusoids P are included in the regression.
Simply increasing P does not necessarily imply a better result in every respect: the
filtered returns tend to become more autocorrelated (see the top subplot of Fig. 28). On
the other hand, a small P does not necessarily remove all the intraday patterns (see the
bottom subplots of Figs. 28 and 29). My choice was to settle for the smallest number of
sinusoids that gave a reasonable fit. This meant setting P = 3 and 4 for Periods I and III,
respectively. Although Andersen and Bollerslev (1997c) argued that the daily volatility factor
(i.e. J ≥ 1) could be important in the FFF-regression in the case of stock markets, here it did not make much of a difference and it was therefore left out (in fact, using
J = 1 only seemed to emphasize some outliers in the filtered return series). The inclusion
of dummy variables was found particularly important when J = 0, however. In Period I
two dummies, at n = 1 and 61, were considered essential (see below), corresponding to the
market opening (10:35 a.m.) and the U.S. macro news announcements (3:35 p.m.). In
Period III additional dummies had to be introduced at n = 97, 98 and 99, corresponding
to the beginning of the AMT (I). To be precise, the FFF-regression fit for Period I is
f̂(θ; n) = −4.06 (0.94)*** + 1.29 (2.69) n/N1 − 0.27 (0.90) n²/N2 + 2.29 (0.34)*** 1{n=d1} + 1.23 (0.27)*** 1{n=d2}
          + [ 1.20 (0.52)* cos(2πn/N) + 0.23 (0.10)* sin(2πn/N) + 0.33 (0.13)* cos(4πn/N) − 0.05 (0.06) sin(4πn/N)
          + 0.14 (0.07)* cos(6πn/N) − 0.06 (0.05) sin(6πn/N) ],
while for Period III the fit is
f̂(θ; n) = 1.85 (0.87)* − 11.17 (2.54)*** n/N1 + 3.54 (0.84)*** n²/N2 + 1.62 (0.27)*** 1{n=d1} + 0.74 (0.22)*** 1{n=d2}
          + 0.14 (0.23) 1{n=d3} + 1.32 (0.23)*** 1{n=d4} + 0.91 (0.23)*** 1{n=d5}
          + [ −2.19 (0.50)*** cos(2πn/N) − 0.55 (0.07)*** sin(2πn/N) − 0.63 (0.13)*** cos(4πn/N) + 0.54 (0.04)*** sin(4πn/N)
          + 0.13 (0.06)* cos(6πn/N) + 0.18 (0.04)*** sin(6πn/N) − 0.03 (0.04) cos(8πn/N) − 0.04 (0.03) sin(8πn/N) ],
where the numbers in parentheses are standard errors and the asterisks are significance
codes (for the 0.001, 0.01, and 0.05 levels).62
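The structure of these regressions can be illustrated by constructing the FFF design matrix: a quadratic in the interval index n, event dummies, and P cosine-sine pairs. This is only a sketch: the function name is mine, and the normalizers N1 = (N+1)/2 and N2 = (N+1)(N+2)/6 follow the usual Andersen–Bollerslev convention, which is an assumption here since the text does not spell them out:

```python
import numpy as np

def fff_design(N, P, dummy_intervals):
    # N: number of intraday intervals; P: number of sinusoid pairs;
    # dummy_intervals: interval indices that get an event dummy.
    n = np.arange(1, N + 1, dtype=float)
    N1 = (N + 1) / 2.0                    # assumed normalizer for n
    N2 = (N + 1) * (N + 2) / 6.0          # assumed normalizer for n^2
    cols = [np.ones(N), n / N1, n ** 2 / N2]
    cols += [(n == d).astype(float) for d in dummy_intervals]
    for p in range(1, P + 1):             # cosine-sine pairs
        cols.append(np.cos(2 * np.pi * p * n / N))
        cols.append(np.sin(2 * np.pi * p * n / N))
    return np.column_stack(cols)

# Period I shape: P = 3 pairs and dummies at n = 1 and 61 (N illustrative).
X = fff_design(100, 3, [1, 61])
# theta would then be estimated by OLS, e.g.:
# theta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The estimated coefficients in the equations above correspond, column by column, to such a design.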
It is easy to observe from the figures that the volatility shocks associated with predetermined
events (used as dummies) are short-lived. In fact, a new equilibrium price
is found in just 5 minutes, which is consistent with the findings from the FX (Andersen
and Bollerslev (1998)) and the U.S. treasury bond markets (Bollerslev et al. (2000))
(see also Engle and Ng (1993)). Following Andersen and Bollerslev (1998, p. 244), the
regression coefficients for the dummy variables can be interpreted such that
volatility for intervals n = 1 and 61 in Period I increased by exp(2.29/2) ≈ 3.14 and 1.85 percent, respectively. The point estimates imply that these events were most probably
also economically, not only statistically, significant. Their usage in constructing
arbitrage strategies is, however, limited by the fact that the sign of the change is unknown.
In Period III the effects of these same events were a bit weaker, accounting only for 2.25
and 1.45 percent, respectively. This indicates that markets reacted more strongly in
Period I than in Period III; an observation that is not surprising and is consistent with
observations made earlier (in Sec. 7.2). And just by eyeballing the average volatility
pattern, the New York effect is seen to be relatively larger in Period I, too.
62 The fit in the figures is obtained by applying |r_{t,n} − r̄| = σ̂_t N^{1/2} exp(f̂(θ; σ_t, n)/2) exp(û_{t,n}/2). Andersen and Bollerslev (1997c) are vague about this point (especially about the scaling factor).
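The translation from dummy coefficients to volatility responses can be checked directly; following Andersen and Bollerslev (1998), as in the text, a coefficient c maps into the factor exp(c/2):

```python
import math

# Volatility response factors implied by the dummy coefficients of the
# FFF fits above (coefficient values taken from the estimated equations).
for label, c in [("Period I,   n = 1 ", 2.29), ("Period I,   n = 61", 1.23),
                 ("Period III, n = 1 ", 1.62), ("Period III, n = 61", 0.74)]:
    print(label, "->", round(math.exp(c / 2), 2))
# prints the factors 3.14, 1.85, 2.25 and 1.45 quoted in the text
```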
[Figure 26: average volatility against the index of the intraday interval, Period I, with its FFF fit.]
Figure 26: Average intraday volatility of Period I (continuous line) and its FFF fit (dashed).
[Figure 27: average volatility against the index of the intraday interval, Period III, with its FFF fit.]
Figure 27: Average intraday volatility of Period III (continuous line) and its FFF fit (dashed).
[Figure 28: sample ACFs against lag of filtered returns and filtered absolute returns, Period I.]
Figure 28: Sample ACFs of filtered returns and absolute returns of Period I. The 95%
confidence interval (dashed line) is for Gaussian white noise: ±1.96/√N.
[Figure 29: sample ACFs against lag of filtered returns and filtered absolute returns, Period III.]
Figure 29: Sample ACFs of filtered returns and absolute returns of Period III.
To be able to compare the scaling laws of the periodicity-filtered returns to the original
series, the overnight returns must be omitted. Naturally this reduces the total energy of the
series. The form of the scaling laws remains about the same, though (see the upper subplots
of Fig. 30). Comparing the scaling laws, one notices that the removal of the
intraday periodicity has smoothed out the kink at the 6th level. In Period I, in particular,
the wavelet variances at levels larger than the 6th have increased considerably. The law is
still not linear, however. The first region includes time-scales smaller than an hour and the
second one the rest, up to 2560 minutes (approx. 43 hours). Interestingly, almost all of
the periodicity-filtered wavelet variances are significantly different from the original ones (with
the overnight returns excluded). In Period III, the change in the distribution of energy
across the scales is not as dramatic, but the slight kink at the 6th level has disappeared
as well. In both periods the wavelet variances at levels 11 and 12 are still behaving wildly,
thus supporting the claim that the largest levels suffer from boundary effects.
Regarding long-memory, the estimates of d have increased in both periods (see Table
7). This is in contrast to what Bayraktar et al. (2003) found; they argued that the OLS-based
wavelet variance estimation is robust to seasonalities. One possible explanation
for this discrepancy is the FFF method used here (Bayraktar et al. use a different method):
although the mean of ŝ_{t,n} (see Eq. (23) in App. E) is 1, so that on average the returns are not changed, it may be that the second-order characteristics are affected unduly (this
is not studied here, though). Also, it is known that OLS is sensitive to jumps, which are
indeed more frequent in Period I than in Period III. This is however an unlikely cause,
because most of the jumps are overnight returns whose exclusion did not seem to change
the shape of the scaling law. Besides, the OLS was fit to the logarithms, which reduces
the impact of outliers. It is more probable that the removal of the originally stronger
intraday periodicity in Period I is really the cause of the larger change in the estimate of
d (in Period I the change is from 0.18978 to 0.27583, while in Period III the change is from
0.17959 to 0.226605).
Considering the results thus far, it is no surprise that the removal of volatility periodicity
affects the local long-memory estimates as well. In fact, the median of the time-varying
d(t) increased by approximately 0.03 in both periods (see Table 8): in Period I the increase
[Figure 30: four panels of log(Wavelet variance) against scale index 1−12: Period I (a)/(b) and Period III (a)/(b).]
Figure 30: The scaling laws of the overnight-return-excluded series (continuous line in the top
subplots) lie below the original scaling laws (dark), especially in Period I. The removal
of volatility periodicity (dark in the bottom subplots) smooths out the kink found at the
6th level previously. The Gaussian 95% confidence intervals are drawn (dotted thin line)
to address the significance of the periodicity.
Table 7: OLS-regression results (filtered volatility).

Period I
Levels   Coefficient   Std. error   t-value   P(>|t|)    d̂
2−10     −0.44833      0.03343      −13.41    3.01e−06   0.27583
2−6      −0.33260      0.09553      −3.482    0.040006   0.3337
7−10     −0.54285      0.01986      −27.34    0.00134    0.228575

Period III
Levels   Coefficient   Std. error   t-value   P(>|t|)    d̂
2−10     −0.54679      0.01549      −35.29    3.81e−09   0.226605
2−6      −0.621757     0.006855     −90.7     2.95e−06   0.1891215
7−10     −0.46923      0.03197      −14.68    0.00461    0.265385
Table 8: Statistical key figures of time-varying long-memory (filtered volatility).

Period I
Min.      1st Q.   Med.     Mean     3rd Q.   Max.
−0.3737   0.2591   0.3464   0.3415   0.4300   1.1873
Jarque–Bera: X² = 12.322, df = 2, p = 0.0021

Period III
Min.      1st Q.   Med.     Mean     3rd Q.   Max.
−0.3446   0.1992   0.2803   0.2682   0.3481   0.7073
Jarque–Bera: X² = 5.3449, df = 2, p = 0.0691
is from 0.3117 to 0.3464 and in Period III from 0.2505 to 0.2803. Notice that this increase is
more modest than in the global estimation, and that the global estimates and the medians
of the local estimates continue to differ from each other. In Period I the FFF method has
increased the amplitude of some big returns, thus making the range of the estimates of
d(t) wider and the unconditional distribution only nearly Gaussian. But in between the
jumps (which are less frequent than in the original return series) there now exist steadier
periods of growth (see the left-hand part of Fig. 31). Indeed, the 1st and 3rd quartiles
confirm that the unconditional distribution is more concentrated around the mean value.
In Period III there are no enlarged outliers and the path of the time-varying d(t) is, in
general, more stable too (see the right-hand part of Fig. 31). There the range of filtered
returns has become only a bit wider and the unconditional distribution remains Gaussian.
[Figure 31: time paths and histograms of the d(t) estimates for Periods I and III, filtered volatility.]
Figure 31: Unconditional distributions of the estimate of d(t).
8 Conclusions
This licentiate thesis has presented wavelet methodology in a manner that should be
accessible to econometricians. First, wavelet theory is contrasted with the more traditional
frequency-oriented Fourier analysis in a deterministic environment. It is then argued that
the local adaptiveness of wavelets makes them ideally suited for analyzing high-frequency
data where most of the energy is located in jumps, clusters, and other non-stationarities.
Finally, wavelets are applied in a stochastic environment. Using the liquid stock of Nokia
as an example, it is shown that a wavelet multiresolution analysis can give useful new
insight into the dynamics (in this case the energy) of a key variable in finance: volatility.
This is motivated by the hypothesis that players in stock markets operate at multiple
time-scales, with different impacts on the whole.
One of the main findings is that wavelet variances at specific intradaily levels differ
from each other between the turbulent IT-bubble period and its more tranquil aftermath.
This is partly because the former is characterized by a larger number of jumps (attributable
to overnight returns) and a stronger New York effect. There is also some
evidence of stronger long-memory in the bubble period. Furthermore, this period experienced
multi-scaling, so that the traditional time-scale-invariant square-root scaling would
have been improper; intraday speculators and long-term investors faced inherently different
dynamics (in the sense of energy) that should be accounted for in risk management.
The removal of the intradaily periodicity does not change these findings qualitatively, so
they are indeed genuine differences between the two periods. Another finding is that in the
aftermath period the scaling law behaves in a nicer fashion, implying that the form of the
scaling law is not time-invariant. The intimate relationship between the scaling law and
the fractional differencing parameter d then implies that long-memory in volatility is not
time-invariant either. Standard models of long-memory do not take this into account. On
top of that, the removal of the volatility periodicity increases the global estimate in both
periods. This increase is larger in the bubble period, presumably because of its originally
stronger periodicity. This suggests that the periodicity must be taken into account too.
A local analysis is applied to have a closer look at the time-varying characteristics of the
scaling law and long-memory in volatility. The results support time-varying long-memory
over medium-length periods (months, say). A locally stationary stochastic volatility model
with time-varying parameters might therefore be a good choice for these and longer horizons.
As expected, the removal of the volatility periodicity stabilizes the local estimate
to some extent and increases its median. The bubble period retains its stronger long-memory
property, giving further support to the original conjecture of different long-range
dynamics. Again, the significant effect of the removal of the periodicity implies that the
estimation of d by the OLS-based wavelet method is not robust with respect to intraday
volatility seasonalities. Therefore the periodicity should first be taken care of by some
suitable method. In this thesis the Fourier flexible form was considered adequate, but it
might well be improved upon by methods that better account for the inherent roughness
of stock market volatility.
Although the specific dynamics and the underlying reason for long-memory are not addressed
here in detail, it is probable that the reported higher market activity and hype
caused the bubble period to experience stronger long-memory. While the long-memory
may be argued to be spurious and caused by structural breaks or a trend, the division into
two different periods and the embedded differencing operations of the wavelet filters minimize
the ambiguity and support the hypothesis of a true phenomenon. The precise identification
of structural breaks and the separation of a trend could however be helpful in evaluating
the economic significance of the estimates. These can be achieved by wavelet methods too,
and this is left for future research.
References
[1] Ait-Sahalia, Yacine (2003): Disentangling Volatility from Jumps. NBER working paper 9915. http://www.nber.org/papers/W9915.
[2] Abry, P. — Goncalves, P. — Flandrin, P. (1993): Wavelet-Based Spectral
Analysis of 1/f processes. IEEE International Conference on Acoustics, Speech and
Signal Processing, Munich, Germany.
[3] Abry, P. — Veitch, D. (1998): Wavelet Analysis of Long-Range-Dependent Traffic. IEEE Transactions on Information Theory 44, 2—15.
[4] Andersen, Torben G. — Bollerslev, Tim (1997a): Answering the Critics: Yes, ARCH Models Do Provide Good Volatility Forecasts. NBER working paper 6023. http://www.nber.org/papers/w6023.
[5] – (1997b): Heterogeneous Information Arrivals and Return Volatility Dynamics: Uncovering the Long-Run in High Frequency Returns. Journal of Finance 52, 975—1005.
[6] – (1997c): Intraday Periodicity and Volatility Persistence in Financial Markets. Journal of Empirical Finance 4, 115—158.
[7] – (1998): Deutsche Mark-Dollar Volatility: Intraday Activity Patterns, Macroeconomic Announcements, and Longer Run Dependencies. Journal of Finance 53, 219—265.
[8] Andersen, Torben G. — Bollerslev, Tim — Diebold, Francis X. — Labys, Paul (2000): The Distribution of Realized Exchange Rate Volatility. NBER working paper 6961. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=210889.
[9] Andersen, Torben G. — Bollerslev, Tim — Diebold, Francis X. (2003): Some Like it Smooth, and Some Like it Rough: Untangling Continuous and Jump Components in Measuring, Modeling, and Forecasting Asset Return Volatility. Working paper, Wharton Financial Institutions Center. http://fic.wharton.upenn.edu/fic/papers/03/p0333.html.
[10] Atkins, Frank J. — Sun, Zhen (2003): Using Wavelets to Un-
cover the Fisher Effect. Discussion paper 2003-09, University of Calgary.
Http://econ.ucalgary.ca/research/WP2003-09.pdf.
[11] Bachman, George — Narici, Lawrence — Beckenstein, Edward (2000):
Fourier and Wavelet Analysis. Springer-Verlag, New York, USA.
[12] Bai, Xuezheng — Russell, Jeffrey R. — Tiao, George C.
(2001): Beyond Merton’s Utopia (I): Effects of Non-Normality and De-
pendence on the Precision of Variance Estimates Using High-Frequency
Financial Data. Working paper, University of Chicago. Http://gsb-
www.uchicago.edu/fac/jeffrey.russell/research/variance.pdf.
[13] Baillie, Richard T. (1996): Long Memory Processes and Fractional Integration
in Econometrics. Journal of Econometrics 73, 5—59.
[14] Barndorff-Nielsen, Ole E. — Shephard, Neil (2003a): Power and Bipower
Variation with Stochastic Volatility and Jumps. Manuscript, Oxford University.
[15] – (2003b): Econometrics of Testing for Jumps in Financial Economics Using
Bipower Variation. Manuscript, Oxford University.
[16] Bayraktar, Erhan — Poor, H. Vincent — Sircar, K. Ron-
nie (2000): Estimating the Fractal Dimension of the S&P 500 In-
dex Using Wavelet Analysis. Submitted manuscript, Princeton University.
Http://www.math.lsa.umich.edu/˜erhan/SP500.pdf.
[17] Beran, Jan (1994): Statistics for Long Memory Processes, Volume 61 of Mono-
graphs on Statistics and Applied Probability. Chapman and Hall, New York.
[18] Bhattacharya, R.N. — Gupta, V.K. — Waymire, E. (1983): The Hurst Effect
under Trends. Journal of Applied Probability 20, 649—662.
[19] Black, Fischer (1976): Studies of Stock Price Volatility Changes. Proceedings of
the 1976 Meetings of the American Statistical Association, Business and Economics
Statistics Section, 177—181.
[20] Blair, Bevan J. — Poon, Ser-Huang — Taylor, Stephen J. (2001): Forecast-
ing S&P 100 Volatility: the Incremental Information Content of Implied Volatilities
and High-Frequency Stock Returns. Journal of Econometrics 105, 5—26.
[21] Bollerslev, Tim (1986): Generalized Autoregressive Conditional Heteroskedas-
ticity. Journal of Econometrics 31, 307—327.
[22] – (2001): Financial Econometrics: Past Developments and Future Challenges.
Journal of Econometrics 100, 41—51.
[23] Bollerslev, Tim — Chou, Ray Y. — Kroner, Kenneth F. (1992): ARCH
Modeling in Finance: A Review of the Theory and Empirical Evidence. Journal of
Econometrics 52, 5—59.
[24] Bollerslev, Tim — Engle, Robert F. — Nelson, Daniel B. (1994): ARCH
Models. In Engle and McFadden (eds.), Handbook of Econometrics, Vol. IV, 2959—
3038. Elsevier Science.
[25] Bollerslev, Tim — Cai, Jun — Song, Frank M. (2000): Intraday Periodic-
ity, Long Memory Volatility, and Macroeconomic Announcement Effects in the US
Treasury Bond Market. Journal of Empirical Finance 7, 37—55.
[26] Bollerslev, Tim — Mikkelsen, Hans-Ole (1996): Modeling and Pricing Long-
Memory in Stock Market Volatility. Journal of Econometrics 73, 151—184.
[27] Bollerslev, Tim — Wright, Johathan H. (2000): Semiparametric Estimation
of Long-Memory Volatility Dependencies: The Role of High-Frequency Data. Journal
of Econometrics 98, 81—106.
[28] Bougerol, Philippe — Picard, Nico (1992): Stationarity of GARCH Processes
and of Some Nonnegative Time Series. Journal of Econometrics 52, 115—127.
[29] Breidt, F. Jay — Crato, Nuno — de Lima, Pedro (1998): The Detection and
Estimation of Long Memory in Stochastic Volatility. Journal of Econometrics 83,
325—348.
[30] Broto, Carmen — Ruiz, Esther (2002): Estimation Methods for Stochastic
Volatility Models: A Survey. Working paper 02-54 (14), Universidad Carlos III de
Madrid.
[31] Bruce, A.G. — Gao, H.-Y. (1996): Applied Wavelet Analysis with S-PLUS.
Springer, New York.
[32] Campbell, John Y. — Lo, Andrew W. — MacKinlay, A. Craig (1997): The
Econometrics of Financial Markets. Second corrected printing, Princeton University
Press.
[33] Carnero, M. Angeles — Pena, Daniel — Ruiz, Esther (2001): Is Stochastic
Volatility More Flexible than GARCH? Working paper 01-08, Universidad Carlos
III de Madrid.
[34] Chew, Cyrene (2001): The Money and Income Relationship of European Coun-
tries by Time Scale Decomposition Using Wavelets. Preliminary version, New York
University. Http://homepages.nyu.edu/˜cc486/paper.pdf.
[35] Clark, Peter K. (1973): A Subordinated Stochastic Process Model with Finite
Variance for Speculative Prices. Econometrica 41, 135—155.
[36] Cont, Rama (2001): Empirical Properties of Asset Returns: Stylized Facts and
Statistical Issues. Quantitative Finance 1, 223—236.
[37] Cooley, James W. — Tukey, John W. (1965): An Algorithm for the Machine
Calculation of Complex Fourier Series. Mathematics of Computation 19, 297—301.
[38] Corsi, Fulvio — Zumbach, Gilles — Muller, Ulrich — Dacorogna,
Michel (2001): Consistent High-Precision Volatility from High-Frequency Data.
Working paper, Olsen Ltd. Http://www.olsen.ch/research/working_papers.html.
[39] Craigmile, Peter F. — Percival, Donald B. (2002): Asymptotic Decor-
relation of Between-Scale Wavelet Coefficients. Submitted for review, Ohio State
University. Http://www.stat.ohio-state.edu/˜pfc/research/papers/decorrelate.pdf.
[40] Craigmile, Peter F. — Percival, Donald B. — Guttorp, Peter (2000):
The Impact of Wavelet Coefficient Correlations on Fractionally Differenced Process
Estimation. Technical report 049, The National Research Center for Statistics and
the Environment. Http://www.nrcse.washington.edu/pdf/trs49_wave.pdf.
[41] Craigmile, Peter F. — Guttorp, Peter — Percival, Donald B. (2004):
Wavelet Based Estimation for Trend Contaminated Fractionally Differenced Pro-
cesses. Technical report 077, The National Research Center for Statistics and the
Environment. Http://www.nrcse.washington.edu/pdf/trs77.pdf.
[42] Dacorogna, Michel M. — Muller, Ulrich A. — Nagler, Robert J. —
Olsen, Richard B. — Pictet, Olivier V. (1993): A Geographical Model for
the Daily and Weekly Seasonal Volatility in the Foreign Exchange Market. Journal
of International Money and Finance 12, 413—438.
[43] Dacorogna, Michel M. — Gencay, Ramazan — Muller, Ulrich A. —
Olsen, Richard B. — Pictet, Olivier V. (2001): An Introduction to High-
Frequency Finance. Academic Press.
[44] Dahlhaus, R. (1996): On the Kullback—Leibler Information Divergence of Locally
Stationary Processes. Stochastic Processes and their Applications 62, 139—168.
[45] – (1997): Fitting Time Series Models to Nonstationary Processes. The Annals of
Statistics 25, 1—37.
[46] Daubechies, I. (1988): Orthonormal Bases of Compactly Supported Wavelets.
Communications of Pure and Applied Mathematics 41, 909—996.
[47] – (1990): The Wavelet Transform, Time-Frequency Localization and Signal Anal-
ysis. IEEE Transactions on Information Theory 36, 961—1005.
[48] Davidian, Marie — Carroll, Raymond J. (1987): Variance Function Estima-
tion. Journal of American Statistical Association 82, 1079—1091.
[49] Deo, Rohit S. — Hurvich, Clifford M. (2001): On the Log Periodogram Re-
gression Estimator of the Memory Parameter in Long Memory Stochastic Volatility
Models. Econometric Theory 17, 686—710.
[50] Diebold, Francis X. (2004): The Nobel Memorial Prize for Robert
F. Engle. Working paper, Wharton Financial Institutions Center.
Http://fic.wharton.upenn.edu/fic/papers/04/p0409.html.
[51] Diebold, Francis X. — Nerlove, Marc (1989): The Dynamics of Exchange
Rate Volatility: A Multivariate Latent Factor ARCH Model. Journal of Applied
Econometrics 4, 1—21.
[52] Diebold, Francis X. — Hickman, Andrew — Inoue, Atsushi — Schuer-
mann, Til (1997): Converting 1-Day Volatility to h-Day Volatility: Scaling by √h
is Worse than You Think. Working paper, Wharton Financial Institutions Center.
Http://fic.wharton.upenn.edu/fic/wfic.html.
[53] Diebold, Francis X. — Inoue, Atsushi (2001): Long Memory and Regime
Switching. Journal of Econometrics 105, 131—159.
[54] Dijkerman, R. — Mazumdar, R. (1994): On the Correlation Structure of Wavelet
Coefficients of Fractional Brownian Motion. IEEE Transactions on Information The-
ory 40, 1609—1612.
[55] Ding, Zhuanxin — Granger, Clive W.J. — Engle, Robert F. (1993): A
Long Memory Property of Stock Market Returns and a New Model. Journal of
Empirical Finance 1, 83—106.
[56] Ding, Zhuanxin — Granger, Clive W.J. (1996): Modeling Volatility Persistence
of Speculative Returns: A New Approach. Journal of Econometrics 73,
185—215.
[57] Donoho, David L. (1992): De-Noising via Soft-Thresholding. Tech-
nical report, Department of Statistics, Stanford University. Http://www-
stat.stanford.edu/˜donoho/reports.html.
[58] Donoho, David L. — Johnstone, Iain M. (1992): Ideal Spatial Adaptation via
Wavelet Shrinkage. Technical report, Department of Statistics, Stanford University.
Http://www-stat.stanford.edu/˜donoho/reports.html.
[59] Donoho, David L. — Johnstone, Iain M. — Kerkyacharian, Gerard
— Picard, Dominique (1993): Density Estimation by Wavelet Thresholding.
Technical report, Department of Statistics, Stanford University. Http://www-
stat.stanford.edu/˜donoho/reports.html.
[60] – (1995): Wavelet Shrinkage: Asymptopia? Journal of the Royal Statistical Society,
Series B (Methodological) 57, 301—369.
[61] Engle, Robert F. (1973): Band Spectrum Regression. International Economic
Review 15, 1—11.
[62] – (1982): Autoregressive Conditional Heteroskedasticity with Estimates of the
Variance of United Kingdom Inflation. Econometrica 50, 987—1008.
[63] – (2000): The Econometrics of Ultra High Frequency Data. Econometrica 68, 1—22.
[64] Engle, Robert F. — Bollerslev, Tim (1986): Modelling the Persistence of
Conditional Variances. Econometric Reviews 5, 1—50.
[65] Engle, Robert F. — Ito, Takatoshi — Lin, Wen-Ling (1990): Meteor Showers
or Heat Waves? Heteroskedastic Intra Daily Volatility in the Foreign Exchange
Market. Econometrica 58, 525—542.
[66] Engle, Robert F. — Mustafa, Chowdhury (1992): Implied ARCH Models
from Options Prices. Journal of Econometrics 52, 289—311.
[67] Engle, Robert F. — Ng, Victor K. (1993): Measuring and Testing the Impact
of News on Volatility. Journal of Finance 48, 1749—1778.
[68] Engle, Robert F. — Russell, J.R. (1998): Autoregressive Conditional Duration:
A New Model for Irregularly Spaced Transaction Data. Econometrica 66, 1127—1162.
[69] Fama, Eugene F. (1965): The Behavior of Stock Market Prices. Journal of Busi-
ness 38, 34—105.
[70] – (1970): Efficient Capital Markets: A Review of Theory and Empirical Work.
Journal of Finance 25, 383—417.
[71] Felixson, Karl (2004): Finnish Short-Term Stock Returns. PhD thesis, Swedish
School of Economics and Business Administration.
[72] Flandrin, Patrick (1992): Wavelet Analysis and Synthesis of Fractional Brown-
ian Motion. IEEE Transactions on Information Theory 38, 910—917.
[73] Fung, William K.H. — Hsieh, David A. (1991): Empirical Analysis of Implied
Volatility: Stocks, Bonds and Currencies. Working paper, Fuqua School of Business.
Http://faculty.fuqua.duke.edu/˜dah7/WP1991.pdf.
[74] Gallant, A. Ronald (1981): On the Bias in Flexible Functional Forms and an
Essentially Unbiased Form. Journal of Econometrics 15, 211—245.
[75] – (1982): Unbiased Determination of Production Technologies. Journal of Econo-
metrics 20, 285—323.
[76] Garman, Mark B. — Klass, Michael J. (1980): On the Estimation of Security
Price Volatilities from Historical Data. Journal of Business 53, 67—78.
[77] Gencay, Ramazan — Selcuk, Faruk — Whitcher, Brandon (2001): Scaling
Properties of Foreign Exchange Volatility. Physica A 289, 249—266.
[78] – (2002a): An Introduction to Wavelets and Other Filtering Methods in Finance
and Economics. Academic Press.
[79] – (2002b): Asymmetry of Information Flow Between Volatili-
ties Across Time Scales. Submitted manuscript, University of Windsor,
Bilkent University, National Center for Atmospheric Research.
Http://www.cgd.ucar.edu/˜whitcher/papers/whmm.pdf.
[80] – (2002c): Robustness of Systematic Risk Across Time Scales. Submitted
manuscript, University of Windsor, Bilkent University, National Center for Atmo-
spheric Research. Http://www.sfu.ca/˜rgencay/jarticles/jimf-capm.pdf.
[81] Geweke, J. — Porter-Hudak, S. (1983): The Estimation and Application of
Long Memory Time Series Models. Journal of Time Series Analysis 4, 221—238.
[82] Ghysels, Eric — Harvey, Andrew — Renault, Eric (1995):
Stochastic Volatility. Working paper, CIRANO Scientific Series.
Http://www.cirano.qc.ca/pdf/publication/95s-49.pdf.
[83] Ghysels, Eric — Santa-Clara, Pedro — Valkanov, Rossen (2003): Predict-
ing Volatility: Getting the Most out of Return Data Sampled at Different Frequen-
cies. Presented at the Conference ”New Frontiers of Financial Volatility Modeling”,
Florence, Italy, May 25—27.
[84] Goodhart, Charles A.E. — O’Hara, Maureen (1997): High Frequency Data
in Financial Markets: Issues and Applications. Journal of Empirical Finance 4,
73—114.
[85] Gourieroux, Christian — Jasiak, Joann (2001): Financial Econometrics.
Princeton University Press, Princeton.
[86] Granger, Clive W.J. — Joyeux, R. (1980): An Introduction to Long Memory
Time Series Models and Fractional Differencing. Journal of Time Series Analysis 1,
15—29.
[87] Granger, Clive W.J. — Ding, Zhuanxin (1996): Varieties of Long Memory
Models. Journal of Econometrics 73, 61—77.
[88] Granger, Clive W.J. — Hyung, Namwon (1999): Occasional Structural Breaks
and Long Memory. Discussion paper 99-14, University of California, San Diego.
Http://www.econ.ucsd.edu/papers/files/ucsd9914.pdf.
[89] Guillaume, D.M. — Dacorogna, M.M. — Dave, R.D. — Muller, U.A. —
Olsen, R.B. — Pictet, O.V. (1997): From the Bird’s Eye to the Microscope: a
Survey of New Stylized Facts of the Intra-Daily Foreign Exchange Markets. Finance
and Stochastics 1, 95—129.
[90] Hamilton, James D. (1994): Time Series Analysis. Princeton University Press.
[91] Harris, Lawrence (1986): A Transaction Data Study of Weekly and Intradaily
Patterns in Stock Returns. Journal of Financial Economics 16, 99—117.
[92] Harvey, Andrew (1998): Long Memory in Stochastic Volatility. In Stephen
Satchell and John Knight (eds.), Forecasting Volatility in Financial Markets, 307—
320. Butterworth-Heinemann, Oxford.
[93] Harvey, Andrew — Ruiz, Esther — Shephard, Neil (1994): Multivariate
Stochastic Variance Models. Review of Economic Studies 61, 247—264.
[94] Hentschel, Ludger (1995): All in the Family: Nesting Symmetric and Asymmet-
ric GARCH Models. Journal of Financial Economics 39, 71—104.
[95] Hess-Nielsen, N. — Wickerhauser, M.V. (1996): Wavelets and Time-
Frequency Analysis. Proceedings of the IEEE 84, 523—540.
[96] Hogan, Kedreth C. — Melvin, Michael T. (1994): Sources of Meteor Show-
ers and Heat Waves in the Foreign Exchange Market. Journal of International Eco-
nomics 37, 239—247.
[97] Hol, Eugenie — Koopman, Siem Jan (2002): Stock Index Volatility
Forecasting with High Frequency Data. Discussion paper, Tinbergen Institute.
Http://www.tinbergen.nl/discussionpapers/02068.pdf.
[98] Hosking, J.R.M. (1981): Fractional Differencing. Biometrika 68, 165—176.
[99] Hubbard, Barbara Burke (1998): The World According to Wavelets: The Story
of a Mathematical Technique in the Making. Second edition, A.K. Peters, USA.
[100] Hurvich, Clifford M. — Beltrao, Kaizo I. (1993): Asymptotics for the Low-
Frequency Ordinates of the Periodogram of a Long-Memory Time Series. Journal of
Time Series Analysis 14, 455—472.
[101] Hardle, Wolfgang — Kerkyacharian, Gerard — Picard, Dominique —
Tsybakov, Alexander (1998): Wavelets, Approximation, and Statistical Appli-
cations. Springer, New York.
[102] Jensen, Mark J. (1998): An Approximate Wavelet MLE of Short and Long-
Memory Parameters. Studies in Nonlinear Dynamics & Econometrics 3, Article 5.
[103] – (1999): Using Wavelets to Obtain a Consistent Ordinary Least Squares Estimator
of the Long-Memory Parameter. Journal of Forecasting 18, 17—32.
[104] – (2000): An Alternative Maximum Likelihood Estimator of Long-Memory Pro-
cesses Using Compactly Supported Wavelets. Journal of Economic Dynamics and
Control 24, 361—387.
[105] Jensen, Mark J. — Whitcher, Brandon (2000): Time-
Varying Long-Memory in Volatility: Detection and Estimation with
Wavelets. Technical report, University of Missouri and EURANDOM.
Http://www.cgd.ucar.edu/˜whitcher/papers/vol.pdf.
[106] Kroner, Kenneth F. — Ng, Victor K. (1998): Modeling Asymmetric Comove-
ments of Asset Returns. Review of Financial Studies 11, 817—844.
[107] Lamoureux, Christopher G. — Lastrapes, William D. (1990a): Het-
eroskedasticity in Stock Return Data: Volume versus GARCH Effects. Journal of
Finance 45, 221—229.
[108] – (1990b): Persistence in Variance, Structural Change, and the GARCH Model.
Journal of Business and Economic Statistics 8, 225—234.
[109] Lo, Andrew W. — MacKinlay, A. Craig (1999): A Non-Random Walk Down
Wall Street. Princeton University Press, New Jersey.
[110] Lobato, L.N. — Savin, N.E. (1998): Real and Spurious Long-Memory Properties
of Stock-Market Data. Journal of Business and Economic Statistics 16, 261—267.
[111] Lockwood, Larry J. — Linn, Scott C. (1990): An Examination of Stock Mar-
ket Return Volatility During Overnight and Intraday Periods, 1964—1989. Journal
of Finance 45, 591—601.
[112] Lynch, Paul E. — Zumbach, Gilles O. (2003): Market Hetero-
geneities and the Causal Structure of Volatility. Working paper, Olsen Ltd.
Http://www.olsen.ch/research/working_papers.html.
[113] Mallat, Stephane G. (1989): A Theory for Multiresolution Signal Decompo-
sition: the Wavelet Representation. IEEE Transactions on Pattern Analysis and
Machine Intelligence 11, 674—693.
[114] – (1998): A Wavelet Tour of Signal Processing. Second edition, Academic Press,
San Diego.
[115] Mandelbrot, Benoit (1963): The Variation of Certain Speculative Prices. Jour-
nal of Business 36, 394—419.
[116] Martens, Martin — Chang, Yuan-Chen — Taylor, Stephen J. (2002): A
Comparison of Seasonal Adjustment Methods when Forecasting Intraday Volatility.
Journal of Financial Research XXV, 283—299.
[117] McCoy, Emma J. — Walden, Andrew T. (1996): Wavelet Analysis and Synthe-
sis of Stationary Long-Memory Processes. Journal of Computational and Graphical
Statistics 5, 26—56.
[118] McKenzie, Michael D. (1999): Power Transformation and Forecasting the Mag-
nitude of Exchange Rate Changes. International Journal of Forecasting 15, 49—55.
[119] Melino, Angelo — Turnbull, Stuart M. (1990): Pricing Foreign Currency
Options with Stochastic Volatility. Journal of Econometrics 45, 239—265.
[120] Merton, Robert C. (1980): On Estimating the Expected Return on the Market:
An Exploratory Investigation. Journal of Financial Economics 8, 323—361.
[121] Meyer, Yves (1994): Wavelets: Algorithms and Applications. Society for Indus-
trial and Applied Mathematics.
[122] Mikosch, Thomas — Starica, Catalin (2004): Non-Stationarities
in Financial Time Series, the Long Range Dependence and the
IGARCH Effects. Submitted paper, Chalmers University of Technology.
Http://www.math.chalmers.se/˜starica/15.04.02.lm.thomas.pdf.
[123] Muller, U.A. — Dacorogna, M.M. — Olsen, R.B. — Pictet, O.V. —
Schwarz, M. — Morgenegg, C. (1990): Statistical Study of Foreign Exchange
Rates, Empirical Evidence of a Price Change Scaling Law, and Intraday Analysis.
Journal of Banking and Finance 14, 1189—1208.
[124] Muller, U.A. — Dacorogna, M.M. — Dave, R.D. — Pictet, O.V. — Ward,
R.B. (1993): Fractals and Intrinsic Time, a Challenge to Econometricians. Working
paper, Olsen Ltd. Http://www.olsen.ch/research/working_papers.html.
[125] Muller, Ulrich A. — Dacorogna, Michel M. — Dave, Rakhal D. — Olsen,
Richard B. — Pictet, Olivier V. — von Weizsacker Jacob E. (1997):
Volatilities of Different Time Resolutions — Analyzing the Dynamics of Market Com-
ponents. Journal of Empirical Finance 4, 213—239.
[126] Nason, Guy P. — von Sachs, Rainer (1999): Wavelets in Time Series Analysis.
Philosophical Transactions of the Royal Society of London, Series A 357, 2511—2526.
[127] Nelson, D.B. (1988): Time Series Behavior of Stock Market Volatility and Re-
turns. PhD thesis, MIT.
[128] Norsworthy, John R. — Li, Ding — Gorener, Rifat (2000): Wavelet-Based
Analysis of Time Series: An Export from Engineering to Finance. Proceedings of
the 2000 IEEE International Engineering Management Society Conference, Albu-
querque, New Mexico. Http://www.norsworthy.net/papers.php.
[129] Ogden, Todd (1997): On Preconditioning the Data for the Wavelet Transform
When the Sample Size is Not a Power of Two. Communications in Statistics B 26,
267—285.
[130] Osler, C.L. (1995): Exchange Rate Dynamics and Speculator Horizons. Journal
of International Money and Finance 14, 695—719.
[131] Parkinson, Michael (1980): The Extreme Value Method for Estimating the Variance
of the Rate of Return. Journal of Business 53, 61—65.
[132] Percival, Donald B. — Mofjeld, Harold O. (1997): Analysis of Subtidal
Coastal Sea Level Fluctuations Using Wavelets. Journal of the American Statistical
Association 92, 868—880.
[133] Percival, Donald B. — Walden, Andrew T. (2000): Wavelet Methods for
Time Series Analysis. Cambridge University Press.
[134] Pollock, D.S.G. — Lo Cascio, Iolanda (2003): Orthogonality Conditions for
Non-Dyadic Wavelet Analysis. Manuscript, Queen Mary, University of London.
[135] – (2004): Adapting Discrete Wavelet Analysis to the Circumstances of Economics.
Manuscript, Queen Mary, University of London.
[136] Polzehl, Jorg — Spokoiny, Vladimir — Starica, Catalin (2004): When
Did the 2001 Recession Really Start? Submitted paper, Chalmers University of
Technology. Http://www.math.chalmers.se/˜starica/paper2004 5 3.pdf.
[137] Poon, Ser-Huang — Granger, Clive W.J. (2003): Forecasting Volatility in
Financial Markets: A Review. Journal of Economic Literature XLI, 478—539.
[138] Poterba, James M. — Summers, Lawrence H. (1986): The Persistance of
Volatility and Stock Market Fluctuations. The American Economic Review 76,
1142—1151.
[139] Press, William H. — Teukolsky, Saul A. — Vetterling, William
T. — Flannery, Brian P. (1992): Numerical Recipes in Fortran 77:
The Art of Scientific Computing. Cambridge University Press. Online version:
http://www.library.cornell.edu/nr/bookfpdf.html.
[140] Priestley, M.B. (1988): Nonlinear and Nonstationary Time Series Analysis. Aca-
demic Press, London.
[141] – (1992): Spectral Analysis and Time Series. Academic Press, San Diego.
[142] – (1996): Wavelets and Time-Dependent Spectral Analysis. Journal of Time Series
Analysis 17, 85—103.
[143] Ramsey, James B. (1996): The Contribution of Wavelets to the Analy-
sis of Economic and Financial Data. Unpublished paper, New York University.
Http://www.econ.nyu.edu/user/ramseyj/publish/publish.htm.
[144] – (2002): Wavelets in Economics and Finance: Past and Future. Research report,
New York University. Http://www.econ.nyu.edu/cvstarr/working/2002/RR02-
02.PDF.
[145] Ramsey, James B. — Lampart, Camille (1998a): Decomposition of Economic
Relationships by Time Scale Using Wavelets: Money and Income. Macroeconomic
Dynamics 2, 49—71.
[146] – (1998b): The Decomposition of Economic Relationships by Time Scale Using
Wavelets: Expenditure and Income. Studies in Nonlinear Dynamics and Economet-
rics 3, 23—42.
[147] Ray, Bonnie K. — Tsay, Ruey S. (2000): Long-Range Dependence in Daily
Stock Volatilities. Journal of Business and Economic Statistics 18, 254—262.
[148] Renaud, O. — Starck, J.-L. — Murtagh, F. (2002): Wavelet-Based Forecasting
of Short and Long Memory Time Series. Cahiers du departement d’econometrie,
Faculte des sciences economiques et sociales, Universite de Geneve.
[149] Rioul, O. (1992): Simple Regularity Criteria for Subdivision Schemes. SIAM Jour-
nal on Mathematical Analysis 23, 1544—1576.
[150] Robinson, Peter M. (1995): Log-Periodogram Regression of Time Series with
Long Range Dependence. Annals of Statistics 23, 1048—1072.
[151] Roll, Richard (1984): A Simple Implicit Measure of the Effective Bid-Ask Spread
in an Efficient Market. Journal of Finance 39, 1127—1139.
[152] Schleicher, Christoph (2002): An Introduction to Wavelets for Economists.
Working paper 2002-3, Monetary and Financial Analysis Department, Bank of
Canada. Http://www.bankofcanada.ca/en/res/wp02-3.htm.
[153] Schwert, G. William (1989): Why Does Stock Market Volatility Change Over
Time? Journal of Finance 44, 1115—1153.
[154] Serroukh, A. — Walden, A.T. — Percival D.B. (2000): Statistical Properties
and Uses of the Wavelet Variance Estimator for the Scale Analysis of Time Series.
Journal of the American Statistical Association 95, 184—196.
[155] Shann, W.C. — Yen, C.C. (1999): On the Exact Values of Orthonormal Scaling
Coefficients of Lengths 8 and 10. Applied and Computational Harmonic Analysis 6,
109—112.
[156] Shiryaev, A.N. (1999): Essentials of Stochastic Finance: Facts, Models, and The-
ory. World Scientific, Singapore.
[157] Shleifer, Andrei — Vishny, Robert W. (1990): Equilibrium Short Horizons
of Investors and Firms. American Economic Review 80, Papers and Proceedings of
the Hundred and Second Annual Meeting of the American Economic Association,
148—153.
[158] Steffen, P. — Heller, P.N. — Gopinath, R.A. — Burrus, C.S. (1993): Theory
of Regular M-band Wavelets. IEEE Transactions on Signal Processing 41, 3497—
3511.
[159] Strang, Gilbert (1993): Wavelet Transforms Versus Fourier Transforms. Bulletin
of American Mathematical Society 28, 288-305.
[160] Taylor, Stephen J. (1986): Modeling Financial Time Series. John Wiley & Sons,
New York.
[161] – (1994): Modelling Stochastic Volatility: A Review and Comparative Study.
Mathematical Finance 4, 183—204.
[162] – (2000): Consequences for Option Pricing of a Long Memory in Volatility.
Manuscript, Lancaster University.
[163] Tewfik, A.H. — Kim, M. (1992): Correlation Structure of the Discrete Wavelet
Coefficients of Fractional Brownian Motion. IEEE Transactions on Information The-
ory 38, 904—909.
[164] Tkacz, Greg (2000): Estimating the Fractional Order of Integration of Interest
Rates Using a Wavelet OLS Estimator. Working paper 2000-5, Bank of Canada.
Http://www.bankofcanada.ca/en/res/wp00-5.htm.
[165] Vannucci, Marina — Corradi, Fabio (1999): Covariance Structure of Wavelet
Coefficients: Theory and Models in a Bayesian Perspective. Journal of the Royal
Statistical Society, Series B 61, 971—986.
[166] Velasco, Carlos (1999): Non-Stationary Log-Periodogram Regression. Journal
of Econometrics 91, 325—371.
[167] Vidakovic, Brani (1999): Statistical Modeling by Wavelets. John Wiley & Sons.
[168] Vilasuso, Jon (2002): Forecasting Exchange Rate Volatility. Economics Letters
76, 59—64.
[169] Wasserfallen, W. — Zimmermann, H. (1985): The Behavior of Intraday Ex-
change Rates. Journal of Banking and Finance 9, 55—72.
[170] Whitcher, Brandon — Jensen, Mark J. (2000): Wavelet Estimation of a Local
Long Memory Parameter. Exploration Geophysics 31, 94—103.
[171] Wiggins, James B. (1987): Option Values Under Stochastic Volatility. Journal of
Financial Economics 19, 351—372.
[172] Wojtaszczyk, P. (1997): A Mathematical Introduction to Wavelets. Cambridge
University Press, Cambridge, UK.
[173] Wood, Robert A. — McInish, Thomas H. — Ord, J. Keith (1985): An
Investigation of Transaction Data for NYSE Stocks. Journal of Finance 40, 723—
739.
[174] Wright, Jonathan H. (2000): Log-Periodogram Estima-
tion of Long Memory Volatility Dependencies with Conditionally
Heavy Tailed Returns. International finance discussion paper 685.
Http://www.federalreserve.gov/pubs/ifdp/2000/685/ifdp685.pdf.
A Appendix: Some functional analysis
The following can be found in any basic book on functional analysis (e.g. Bachman et
al. (2000)).
For 1 ≤ p < ∞, the collection of pth power integrable functions on E (a Lebesgue-
measurable set), denoted L^p(E), is equipped with the p-norm ‖f‖_p:

    L^p(E) = { f : E → K : ∫_E |f(t)|^p dt < ∞ },

    ‖f‖_p = ( ∫_E |f(t)|^p dt )^{1/p}.

If the interval is finite, then E = [a, b], and if the interval is the whole real line, then
E = R. These cases are denoted by L^p[a, b] and L^p(R), respectively, but because usually
p = 1 or 2, one has L^1(R) or L^2(R).
For 1 ≤ p < ∞, the collection of pth power summable sequences, denoted ℓ^p, is
equipped with the p-norm ‖x‖_p, x = (a_n) ∈ ℓ^p:

    ℓ^p(N) = ℓ^p = { (a_n) ∈ K^∞ : Σ_{n∈N} |a_n|^p < ∞ },

    ‖x‖_p = ( Σ_{n∈N} |a_n|^p )^{1/p}.
For f, g ∈ L^2(R), the inner product is given by

    ⟨f, g⟩ = ∫_{−∞}^{+∞} f(t) g(t) dt.

For x = (a_i) and y = (b_i) in K^n, the inner product is given by

    ⟨x, y⟩ = Σ_{i=1}^{n} a_i b_i.

Notice that two vectors x and y are said to be orthogonal if ⟨x, y⟩ = 0. A sequence
(x_n) in a Hilbert space is orthonormal if

    ⟨x_m, x_n⟩ = δ_{mn} = 1 (m = n), 0 (m ≠ n), for all m, n.
Example 23 A well-known fact in Fourier analysis is that the complex exponentials
(and hence sines and cosines) form an orthogonal sequence in L^2[−π, π]:

    ∫_{−π}^{π} e^{int} e^{−imt} dt = 2π δ_{mn} = 2π (m = n), 0 (m ≠ n), for all m, n ∈ Z.
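The orthogonality relation in Example 23 can be checked numerically. The sketch below (an illustration, not part of the thesis) approximates the integral by a Riemann sum on an equispaced grid; because the integrand is a complex exponential with integer frequency, the sum over a full period is essentially exact:

```python
import numpy as np

# Check numerically that the integral of e^{int} e^{-imt} over [-pi, pi]
# equals 2*pi for m = n and 0 for m != n. Omitting the right endpoint
# turns the Riemann sum into a sum over roots of unity, which cancels
# exactly for integer m != n.
t = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
dt = 2 * np.pi / 4096

def inner(n, m):
    """Approximate the L^2[-pi, pi] inner product of e^{int} and e^{imt}."""
    return np.sum(np.exp(1j * n * t) * np.exp(-1j * m * t)) * dt

print(abs(inner(3, 3) - 2 * np.pi) < 1e-9)  # diagonal: equals 2*pi
print(abs(inner(3, 5)) < 1e-9)              # off-diagonal: vanishes
```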
Definition 24 Cauchy sequence. A sequence (a_n) in a metric space (X, d) is called a
Cauchy sequence if d(a_i, a_j) → 0 as i, j → ∞.

Definition 25 Completeness. A metric space (X, d) is called complete if every
Cauchy sequence converges, i.e., if for every Cauchy sequence (a_n) in X there exists a
point a in X such that d(a_j, a) → 0 as j → ∞.
Definition 26 Hilbert space. A complete inner product space is called a Hilbert space.
Consider two (complete) subspaces M and N of the inner product space X such that
M ∩ N = {0}. If every vector x in X = M + N can be written as

    x = m + n, m ∈ M, n ∈ N,

then X = M ⊕ N is called the (internal) direct sum.
It can be proved that an inner product space X is separable if and only if it has
a complete orthonormal sequence (x_n), i.e. a sequence (x_n) that is an orthonormal basis.
So in every separable Hilbert space there exists an orthonormal basis. In such a space any
x ∈ X can be uniquely written in the form

    x = Σ_{n∈N} ⟨x, x_n⟩ x_n

for any orthonormal basis (x_n). The values ⟨x, x_n⟩ are called the Fourier coefficients
of x and the series Σ_{n∈N} ⟨x, x_n⟩ x_n is called the Fourier series for x. So in Hilbert spaces
one recovers the ability to write a vector as the sum of its projections ”on the basis
vectors”. For instance, any f ∈ L^2(R) can be written with orthogonal wavelets ψ_{j,k} as
f = Σ_{j,k∈Z} ⟨f, ψ_{j,k}⟩ ψ_{j,k} in ‖·‖_2 (Bachman et al. (2000, p. 419)).
B Appendix: Orthonormal transforms
This section borrows from Percival and Walden (2000, Ch. 3.1).
Let 𝒪 be an orthonormal real-valued N × N matrix, i.e. 𝒪^T 𝒪 = I_N. Let 𝒪_{j•} and 𝒪_{•k}
refer to the jth row vector and kth column vector, respectively. Then the matrix can be
written row-wise or column-wise as

    𝒪 = [𝒪_{0•}, 𝒪_{1•}, ..., 𝒪_{N−1•}]^T = [𝒪_{•0}, 𝒪_{•1}, ..., 𝒪_{•N−1}].

One can use this matrix to analyze an arbitrary real-valued time series, given by the
N × 1 column vector x, in the following way:

    O = 𝒪x = (𝒪_{0•}^T x, 𝒪_{1•}^T x, ..., 𝒪_{N−1•}^T x)^T = (⟨x, 𝒪_{0•}⟩, ⟨x, 𝒪_{1•}⟩, ..., ⟨x, 𝒪_{N−1•}⟩)^T,

since 𝒪_{j•}^T x = ⟨𝒪_{j•}, x⟩ = ⟨x, 𝒪_{j•}⟩. The N × 1 column vector O consists of the transform
coefficients for x with respect to the orthonormal transform 𝒪. Specifically, the jth transform
coefficient O_j is given by the inner product ⟨x, 𝒪_{j•}⟩.

On the other hand, premultiplying both sides of the above equation by 𝒪^T , and using
orthonormality, one can synthesize the time series x as

    x = 𝒪^T O = [𝒪_{0•}, 𝒪_{1•}, ..., 𝒪_{N−1•}] O = Σ_{j=0}^{N−1} O_j 𝒪_{j•}.

Furthermore, since O_j = ⟨x, 𝒪_{j•}⟩, it is possible to re-express the time series x as a unique
linear combination of 𝒪_{0•}, 𝒪_{1•}, ..., 𝒪_{N−1•}:

    x = Σ_{j=0}^{N−1} ⟨x, 𝒪_{j•}⟩ 𝒪_{j•}.

That is, one has recovered the ability to write a vector as the sum of its projections on
the basis vectors.
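The analysis and synthesis steps above hold for any orthonormal matrix, of which the DWT matrix is one particular case. As a small illustration (not part of Percival and Walden's exposition), the sketch below generates an arbitrary orthonormal matrix via a QR decomposition and verifies perfect reconstruction and energy preservation:

```python
import numpy as np

# Analysis O = (matrix @ x) and synthesis x = (matrix^T @ O) for an
# arbitrary orthonormal N x N matrix, here obtained from the QR
# decomposition of a random matrix (Q^T Q = I_N by construction).
rng = np.random.default_rng(0)
N = 8
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))

x = rng.standard_normal(N)   # an arbitrary real-valued "time series"
O = Q @ x                    # transform coefficients O_j = <x, jth row of Q>
x_hat = Q.T @ O              # synthesis: sum of O_j times the jth row of Q

print(np.allclose(Q.T @ Q, np.eye(N)))         # orthonormality
print(np.allclose(x_hat, x))                   # perfect reconstruction
print(np.isclose(np.sum(O**2), np.sum(x**2)))  # energy preservation
```

The last check is the finite-dimensional analogue of Parseval's relation: an orthonormal transform redistributes, but does not change, the energy of the series.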
C Appendix: Fractional differencing and long-memory
The fractional differencing operator (1 − B)^d is formally defined by its infinite
Maclaurin series expansion,

    (1 − B)^d := Σ_{k=0}^{∞} [Γ(k − d) / (Γ(k + 1) Γ(−d))] B^k,

where B and Γ(·) denote the lag operator and the gamma function, respectively (e.g.
Breidt et al. (1998, p. 328)). A real-valued discrete parameter fractional ARIMA
(ARFIMA) process {X_t} is often defined with a binomial series expansion (Gencay et
al. (2002a, p. 163)),

    (1 − B)^d X_t := Σ_{k=0}^{∞} C(d, k) (−1)^k X_{t−k},

where the (generalized) binomial coefficient is

    C(a, b) := a! / (b!(a − b)!) = Γ(a + 1) / (Γ(b + 1) Γ(a − b + 1)).
These models were introduced by Granger and Joyeux (1980) and Hosking (1981).

In ARFIMA models, the ”long-memory” dependency is characterized solely by the
fractional differencing parameter d. A time series is said to exhibit long-memory
when it has a covariance function γ(j) and a spectrum f(λ) such that they are of the
same order as j^{2d−1} and λ^{−2d}, as j → ∞ and λ → 0, respectively.63 For 0 < d < 1/2, an
ARFIMA model exhibits long-memory, and for −1/2 < d < 0 it exhibits antipersistence.
In practice, the range |d| < 1/2 is of particular interest because then an ARFIMA model
is stationary and invertible (Hosking (1981)).

More detailed definitions of long-memory can be found in Beran (1994), for example.
Concerning fractionally integrated processes in econometrics, see Baillie (1996).
63 The rate of decay of the covariance does not necessarily imply the rate of decay of the spectrum, as noted in Bollerslev and Wright (2000, p. 87). Formal conditions for the equivalence are discussed in Beran (1994), for example (see also Granger and Ding (1996)).
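The Maclaurin coefficients of $(1-B)^d$ above can be computed without evaluating gamma functions directly, using the recursion $\pi_0 = 1$, $\pi_k = \pi_{k-1}(k-1-d)/k$, which follows from $\Gamma(k-d) = (k-1-d)\,\Gamma(k-1-d)$. A minimal sketch (the function name is illustrative, not from the thesis):

```python
def frac_diff_weights(d, n_terms):
    """Coefficients pi_k = Gamma(k-d) / (Gamma(k+1) Gamma(-d)) of (1-B)^d,
    computed by the numerically stable recursion pi_k = pi_{k-1}*(k-1-d)/k."""
    w = [1.0]                       # pi_0 = 1
    for k in range(1, n_terms):
        w.append(w[-1] * (k - 1 - d) / k)
    return w

# For d = 0.4 (long memory, 0 < d < 1/2) the weights decay hyperbolically:
w = frac_diff_weights(0.4, 5)
# w[1] = -d = -0.4, w[2] = -d(1-d)/2 = -0.12, w[3] = -0.064, ...
```

For $0 < d < 1/2$ the weights decay only hyperbolically (like $k^{-d-1}$), which is the source of the long-memory behavior discussed above.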
D Appendix: Locally stationary process
Dahlhaus (1996, 1997) defines a locally stationary process $X_{t,T}$ ($t = 0, 1, \ldots, T-1$) as the triangular array, with transfer function $A^0$, drift $\mu$, and spectral representation
\[
X_{t,T} = \mu(t/T) + \int_{-\pi}^{\pi} e^{i\omega t} A^0_{t,T}(\omega)\, dZ(\omega),
\]
where the components satisfy certain technical conditions (see Dahlhaus (1997, Def. 2.1) or Jensen and Whitcher (2000)). For example, autoregressive processes with time-varying coefficients are locally stationary (Dahlhaus (1996, Th. 2.3)).
Jensen and Whitcher (2000) give another example of a locally stationary process. It is constructed by considering a stationary, invertible moving average process $Y_t$ with spectral representation
\[
Y_t = \int_{-\pi}^{\pi} e^{i\omega t} A(\omega)\, dZ(\omega),
\]
where the transfer function is $A(\omega) = \left(1 + \theta e^{-i\omega}\right)/2\pi$ and $|\theta| < 1$. If the process $X_{t,T}$ is now defined as
\[
X_{t,T} = \mu(t/T) + \sigma(t/T)\, Y_t,
\]
where $\mu, \sigma : [0,1] \to \mathbb{R}$ are continuous functions, then $X_{t,T}$ is a locally stationary process with the time-varying transfer function
\[
A(u, \omega) = A^0_{t,T}(\omega) = \frac{\sigma(u)}{2\pi} \left(1 + \theta e^{-i\omega}\right).
\]
Thus the time-path of $X_{t,T}$ exhibits the periodic behavior of a stationary moving average process but with time-varying amplitude equal to $\sigma(u)$.
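The Jensen-Whitcher construction is easy to simulate: generate a stationary MA(1) process in the time domain and modulate it by smooth drift and amplitude functions. The particular choices of $\mu$ and $\sigma$ below are illustrative, not from the thesis:

```python
import numpy as np

T, theta = 1000, 0.5
rng = np.random.default_rng(1)

# Stationary, invertible MA(1): Y_t = eps_t + theta * eps_{t-1}, |theta| < 1
eps = rng.standard_normal(T + 1)
Y = eps[1:] + theta * eps[:-1]

# Rescaled time u = t/T on [0, 1]; continuous mu, sigma : [0,1] -> R
u = np.arange(T) / T
mu = 0.1 * np.sin(2 * np.pi * u)     # slowly varying drift (illustrative)
sigma = 0.5 + u                      # slowly increasing amplitude (illustrative)

# Locally stationary process X_{t,T} = mu(t/T) + sigma(t/T) * Y_t
X = mu + sigma * Y
```

A plot of `X` would show the oscillation pattern of an MA(1) process whose local variance grows along the sample, in line with the time-varying transfer function above.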
E Appendix: Fourier flexible form
Following Andersen and Bollerslev (1997c, 1998), intraday returns can be decomposed as
\[
r_{t,n} = E(r_{t,n}) + \frac{\sigma_t\, s_{t,n}\, Z_{t,n}}{\sqrt{N}},
\]
where in their notation $N$ refers to the number of return intervals $n$ per day (i.e. not to the total length of the series!), $\sigma_t$ is the daily volatility factor, and $Z_{t,n}$ is an IID random variable with mean 0 and variance 1. Notice that $s_{t,n}$, the periodic component for the $n$th intraday interval, depends on the characteristics of trading day $t$. By then squaring both sides and taking logarithms, define $x_{t,n}$ to be
\[
x_{t,n} \doteq 2 \log \left| r_{t,n} - E(r_{t,n}) \right| - \log \sigma_t^2 + \log N = \log s_{t,n}^2 + \log Z_{t,n}^2,
\]
so that $x_{t,n}$ consists of a deterministic and a stochastic component.
The modeling of $x_{t,n}$ is done via non-linear regression in $n$ and $\sigma_t$,
\[
x_{t,n} = f(\theta; \sigma_t, n) + u_{t,n},
\]
where $u_{t,n} \doteq \log Z_{t,n}^2 - E\left(\log Z_{t,n}^2\right)$ is an IID random variable with mean 0. In practice, the estimation of $f$ is implemented by the following parametric expression:
\[
f(\theta; \sigma_t, n) = \sum_{j=0}^{J} \sigma_t^j \left[ \mu_{0j} + \mu_{1j} \frac{n}{N_1} + \mu_{2j} \frac{n^2}{N_2} + \sum_{i=1}^{D} \lambda_{ij}\, 1_{\{n = d_i\}} + \sum_{p=1}^{P} \left( \gamma_{pj} \cos \frac{2\pi p n}{N} + \delta_{pj} \sin \frac{2\pi p n}{N} \right) \right],
\]
where $N_1 \doteq (N+1)/2$ and $N_2 \doteq (N+1)(N+2)/6$ are normalizing constants. If one sets $J = 0$ and $D = 0$, then this reduces to the standard FFF proposed by Gallant (1981, 1982). The trigonometric functions are ideally suited for smoothly varying patterns. Andersen and Bollerslev (1997c) have argued, however, that in equity markets allowing for $J \geq 1$ might be important. Including cross-terms in the regression allows $s_{t,n}$ to depend on the overall level of volatility on trading day $t$, which is often the case in stock market data. The actual estimation of $f$ is most easily accomplished using a two-step procedure described in Andersen and Bollerslev (1997c, App. B).
The normalized estimator of the intraday periodic component for interval $n$ on day $t$ is found to be
\[
\widehat{s}_{t,n} = \frac{T \exp\left( \widehat{f}_{t,n}/2 \right)}{\sum_{t=1}^{[T/N]} \sum_{n=1}^{N} \exp\left( \widehat{f}_{t,n}/2 \right)}, \tag{23}
\]
where $T$ is the total length of the sample and $[T/N]$ denotes the number of trading days. The filtered returns (returns free from the volatility periodicity) are then obtained via
\[
\widetilde{r}_{t,n} \doteq r_{t,n} / \widehat{s}_{t,n}.
\]
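The normalization in Eq. (23) simply rescales $\exp(\widehat{f}_{t,n}/2)$ so that the estimated periodic components average to one over the sample. A minimal sketch, with synthetic fitted values standing in for the actual FFF regression output (the data and array names are illustrative):

```python
import numpy as np

days, N = 250, 102            # [T/N] trading days, N intraday intervals (illustrative)
T = days * N                  # total length of the sample
rng = np.random.default_rng(2)

# Synthetic fitted values f_hat on a (day, interval) grid; in practice these
# come from the two-step FFF regression of Andersen and Bollerslev (1997c).
f_hat = rng.standard_normal((days, N))

# Eq. (23): normalized periodic component, which averages to one by construction
s_hat = T * np.exp(f_hat / 2) / np.exp(f_hat / 2).sum()

# Filtered returns: r_tilde = r / s_hat removes the intraday volatility periodicity
r = rng.standard_normal((days, N))
r_tilde = r / s_hat
```

Since the denominator sums $\exp(\widehat{f}_{t,n}/2)$ over all $T$ observations, the sample mean of $\widehat{s}_{t,n}$ equals one exactly, so the filtering rescales the intraday pattern without changing the overall level of the returns.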
F Appendix: List of abbreviations
Technical abbreviations used in this thesis are for the most part standard:
ACF Autocorrelation function
ARFIMA Autoregressive fractionally integrated moving average [process]
ARSV Autoregressive stochastic volatility [process]
CWT Continuous wavelet transform
D(L) Daubechies extremal phase filter of length L
DFT Discrete Fourier transform
DGP Data generating process
DWT Discrete wavelet transform
FFF Fourier flexible form
FFT Fast Fourier transform
FIR Finite impulse response
FRF Frequency response function (or transfer function)
FWT Fast wavelet transform
(G)ARCH (Generalized) Autoregressive conditional heteroskedastic [process]
GPH Geweke and Porter-Hudak
IID Independent and identically distributed
LA(L) Daubechies least asymmetric filter of length L
LMSV Long memory stochastic volatility [process]
MODWT Maximal overlap discrete wavelet transform
MRA Multiresolution analysis
MSE Mean square error
OLS Ordinary least squares
pDWT Partial discrete wavelet transform
QMR Quadrature mirror relationship
SDF Spectral density function (or Fourier spectrum)
SGF Squared gain function
STFT Short-time Fourier transform