Two Empirical Methods for Forecasting Spot Prices and Constructing Price Forward Curves in the Swiss Power Market. A master thesis submitted to EIDGENÖSSISCHE TECHNISCHE HOCHSCHULE ZÜRICH, Master of Science in Management, Technology and Economics. GRÉGOIRE CARO. Jointly supervised by: Centre for Energy Economics and Policy (CEPE), Department of Management, Technology and Economics, Swiss Federal Institute of Technology: Dr. Carlos Ordás Criado, Prof. Thomas Rutherford; swissQuant Group: MSc. Marcus Hildmann, Dr. Florian Herzog. Zürich, May 2010
Two Empirical Methods for Forecasting Spot Prices and
Constructing Price Forward Curves in the Swiss Power
Market
A master thesis submitted to
EIDGENÖSSISCHE TECHNISCHE HOCHSCHULE ZÜRICH
Master of Science in Management, Technology and Economics
GRÉGOIRE CARO
Jointly supervised by:
Centre for Energy Economics and Policy (CEPE)
Department of Management, Technology and Economics
Swiss Federal Institute of Technology
Dr. Carlos Ordás Criado
Prof. Thomas Rutherford
swissQuant Group
MSc. Marcus Hildmann
Dr. Florian Herzog
Zürich, May 2010
Contents
1 Introduction
2 Electricity Market Settings
2.1 Physical and Economic Features of Electricity
3 An arbitrage occurs when one can make a profit out of the difference in prices in two markets. As an example, if the forward price did not converge to the spot price at the delivery date, there would be an arbitrage opportunity. This convergence property is assumed to hold in all models presented in this paper.
4 See Harris [2006].
Chapter 2. Electricity Market Settings 14
where $F_t$ refers to the market price of a forward delivered at date $t$, $S_0$ is
the current spot price and $r_f$ corresponds to the risk-free rate. This formula
proposes a theoretical link between the price of buying a storable commod-
ity and storing it for further usage and the price of buying a right (forward
contract) of getting the commodity when needed in the future. This equality
simply states that the owner of the commodity must be compensated for all
incurred costs. Note that a commodity bought now can be used any time and
therefore provides a convenience yield. The difference between the storage
costs and the convenience yield is called the net convenience yield. Since
electricity cannot be stored, the net convenience yield becomes the conve-
nience yield which is difficult to estimate if the terms of the contracts are not
standardized [Carmona and Ludkovski, 2004].
Electricity futures contracts are defined according to three main criteria:
the delivery date, the length of the delivery period and a daily component
(8am-8pm vs. 24/7). Delivery starts at the beginning of either a month, a
quarter (January, April, July and October) or a year5. Peak futures are
delivered only between 8am and 8pm whereas base futures are delivered 24/7.
Note that country-specific standards exist. In Germany, the upper limits for
each delivery range are seven months, seven quarters and six years ahead.
In France, these limits are three months, four quarters and three years ahead.
Fig. 2.6 shows a series of typical Peak and Base futures prices where the
horizontal regions indicate delivery period intervals. We employ a
representation called “implied futures curve”, which allows us to build a
non-overlapping and continuous sequence of prices for futures with different
delivery periods.
5 The only exception is the first monthly contract, which covers delivery for the ongoing month.
It consists in plotting the prices of the futures contracts, starting with the
contracts with the shortest delivery period over the permitted trading horizon
(monthly delivery starting at the beginning of months 1 to 7 for the German
case), followed by those with longer delivery ranges (quarterly contracts
starting in the next 2nd to 7th quarters and finally yearly contracts delivered
in the next 2nd to 6th years).
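The ordering rule described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in this study: the contract list, the prices and the names `contracts` and `implied_curve` are hypothetical, and the part of a longer contract that is not covered by shorter contracts simply keeps the longer contract's price.

```python
from datetime import date, timedelta

# Hypothetical contracts: (granularity, first delivery day, last delivery day, price per MWh).
# The prices are invented for illustration only.
contracts = [
    ("year",    date(2010, 1, 1), date(2010, 12, 31), 52.0),
    ("quarter", date(2010, 1, 1), date(2010, 3, 31),  55.0),
    ("month",   date(2010, 1, 1), date(2010, 1, 31),  58.0),
    ("month",   date(2010, 2, 1), date(2010, 2, 28),  56.0),
]

# Shorter delivery periods take precedence over longer ones.
PRIORITY = {"month": 0, "quarter": 1, "year": 2}


def implied_curve(contracts):
    """Return non-overlapping (start, end, price) steps, shortest delivery period first."""
    price_by_day = {}
    for _, start, end, price in sorted(contracts, key=lambda c: PRIORITY[c[0]]):
        day = start
        while day <= end:
            # A day already priced by a shorter contract keeps that price.
            price_by_day.setdefault(day, price)
            day += timedelta(days=1)

    # Compress consecutive days with the same price into steps.
    steps = []
    for day in sorted(price_by_day):
        price = price_by_day[day]
        if steps and steps[-1][2] == price and steps[-1][1] + timedelta(days=1) == day:
            steps[-1][1] = day
        else:
            steps.append([day, day, price])
    return [tuple(step) for step in steps]


steps = implied_curve(contracts)
for start, end, price in steps:
    print(start, end, price)
```

Working at daily resolution keeps the sketch short; a Peak curve would additionally need an hourly mask for the 8am-8pm weekday window.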
Figure 2.6: Implied German Futures (Base and Peak series; vertical axis: price per MWh; horizontal axis: time, Aug 2009 to Jun 2016).
2.2.3 The Hourly Price Forward Curve
Futures curves such as the one in Fig. 2.6 provide information about
the long-term expectations of the market players. However, products with
maturities exceeding 6-12 months are rather scarce, and interpolating a small
number of points with similar (but not identical) characteristics is not an
optimal way to price futures. Looking at forward prices would be of little
help. As noted by Fleten and Lemming [2003], forward contracts are usually
exchanged in large chunks, which makes it difficult to find prices for
specific maturities or, more generally, to construct a high-resolution
term-structure curve. This is why futures or forward curves benefit from
being complemented with high-resolution price forward curves. The Hourly
Price Forward Curve (HPFC) is such a curve: it describes the prices as
of today for the delivery of electricity at each hour in the future. It represents
the term structure of forward prices in hourly resolution. It is important to
remember that the HPFC is not a forecast of the spot price. One can hardly
say whether today’s futures price will reflect the spot price at the delivery
date. This depends on many uncertain factors such as the global economic
climate, oil prices and agents’ irrational behaviour. Therefore, assessing the
quality of the generated forward curves is not straightforward.
2.3 The Swiss Electricity Market
Compared to other West-European electricity markets, the Swiss market
has three major differences: the relative importance of hydraulic power, its
particular position in the middle of Europe and its relative youth.
2.3.1 Power Production
Due to the federal system, the Swiss Power Market is highly fragmented.
About 900 companies are active in production and retailing, most of them
working at a cantonal or regional scale. However, only one transporter is in
charge of the grid: Swissgrid. Like 75% of the power utilities, Swissgrid is
state-owned.
With 56% of its electricity produced by hydropower plants, Switzerland
has a particularly high share of hydropower in its total electricity output.
Together with the 5 nuclear plants, which represent about 35% of total
production, this gives Switzerland almost CO2-free power generation.
Figure 2.7: Swiss power production in 2008. Source: Swiss Federal Office for Energy - Schweizerische Gesamtenergiestatistik 2008.
2.3.2 A Key-Location
The location of Switzerland in the heart of Europe makes it an important
transit country. The main transit axis is the French-Italian one (see Fig. 2.9).
Switzerland was a net power exporter for years, but became a net importer
in 2007. With a total production of 64 TWh and a total consumption
of 63 TWh in 2008, the Swiss electricity trade is almost balanced over the
year, but there are strong seasonal and daily variations. Switzerland cannot
rely entirely on hydropower from winter to late spring because the snow starts
melting only in spring (see Fig. 2.8). The reservoirs reach their lowest level in
May and fill up in summer, when precipitation is highest. In order
to keep a safety margin for spring in case a winter brings high heating demand,
the Swiss market is a net importer in winter (-4.5 TWh). By contrast, once
the reservoirs start to fill up again in late spring, Switzerland becomes a net
exporter until the end of summer (+5.5 TWh for summer 2008).
Interestingly enough, the large hydropower capacity of Switzerland gives
it a comparative advantage with respect to its neighbors. Since the only way
to store electrical power is by pumping water up to the reservoirs, and
since hydropower plants can deliver electricity at any time, Switzerland has
become a supplier for its EU neighbours at peak hours. Indeed, as one can hardly
slow down a nuclear or a thermal plant, the energy price drops significantly
during off-peak hours. Swiss generators take advantage of this situation by
pumping water into their dams during off-peak hours and selling their ‘stored’
electricity during peak hours. With an 8.5 TWh water storage capacity, Swiss
generators have enough capacity to meet demand from home and from abroad
during peak hours. This profit scenario applies especially to France, which
relies heavily on nuclear power: as shown in Fig. 2.9, France is the only net
exporter of electricity to Switzerland. This interdependence between France
and Switzerland will be exploited to build the HPFC in Section 5.2.

Figure 2.8: Average reservoir level in Switzerland (vertical axis: level (%); horizontal axis: time, Dec 2007 to Aug 2009). The filling takes place in summer, when precipitation is highest. To avoid shortages in spring, the Swiss market is a net importer in winter. Source: Swiss Federal Office of Energy - Schweizerische Gesamtenergiestatistik 2008.
2.3.3 Financial Data
The only financial indexes available for the Swiss power market are the
Swiss Electricity Price Index (SWEP) and the Swiss Electricity Index (Swissix).
The SWEP is a local indicator of one-day-ahead over-the-counter prices.
It was launched in 1998 and became the first wholesale electricity price index
published on the European continent. The Swissix is the average price
at the European Energy Exchange (EEX) for next-day deliveries in the Swiss
grid, on an hourly basis, with base and peak series. It was launched in 2006.
Relative to the SWEP, it has a wider range and is not affected by local effects6.
The Swissix is the reference spot price for this study.

Figure 2.9: Trade balance (yearly-based and winter-based). Source: Centre for Energy Policy and Economics, Swiss Federal Institute of Technology - Report: Electricity gap; ways to face the challenge.
Although these two indicators give a good idea of the historical evolution
of electricity prices, they capture only around 10%7 of the Swiss market. The
remaining 90% consists of over-the-counter contracts (not referenced by the
SWEP), which are largely influenced by public service obligations and therefore
carry biased, unreferenced prices. This problem is also common in other
European countries.
Finally, the main issue with the Swiss financial data, relative to France
and Germany, is the absence of futures products. The only financial data
we possess are spot prices, which are past-oriented, so that there are no
indicators of long-term expectations. The models developed in the literature to
build Price Forward Curves therefore cannot be directly applied. This issue
is addressed in Section 5.2.

6 Since there is no official price for over-the-counter trading, the SWEP is just the volume-weighted average at the 380-kV Laufenburg grid hub.
7 IEA [2007].
Chapter 3
Literature Review
Since the beginning of the liberalization of the European Energy market,
papers related to predicting energy consumption or prices have flourished
with the purpose of improving risk management. The HPFC is a key tool
for optimizing power plants’ production capacities and better estimating firms’
upcoming income. Energy producers can also use load models for real-time
scheduling of electricity generation. In this literature review, we focus
on the papers related to price forecasting and the construction of forward
curves.
3.1 Load Models
Given the strong correlation between the electrical load and the
electricity spot prices, load models can help in identifying the main drivers
of electricity prices. These models generally assume a deterministic path and
employ hourly data to forecast up to seven days ahead. Based on load data
from Brazil, Soares and Medeiros [2008] compare a purely stochastic-trend
model (SARIMA-type) with an autoregressive model with a flexible
deterministic trend component (TLSAR-type) and conclude that a
deterministic-based approach performs better for short-run forecasts1. They also find no
evidence that nonlinear models are better in terms of predictive performance.
Therefore, capturing the deterministic trend explicitly seems to be important
for short-run load forecasts. These authors do not include weather variables
in their model, but they emphasize that such variables can significantly
improve the fit.
Taylor [2008] uses a very short-term model (ten minutes ahead) based
only on past load data. He finds that for forecasts longer than four hours
ahead, models with weather variables are superior to purely autoregressive
models based on the previous week’s data. He then emphasizes the importance of
the accuracy of the weather forecasts for the predictive performance.
Finally, Amaral et al. [2008] compare different linear and non-linear
methods, based on Australian load data. They propose a specific treatment
for special days like holidays, and point out that non-linear methods can be
more efficient than linear ones for short-term forecasting (one day ahead),
while basic linear models are better for longer time spans.
3.2 Spot and Forward Price Models
Most energy price models are short-term prediction models
of the price on the spot market (the day-ahead market). They usually focus
on the most liberalized markets, where data are abundant and rigidities are
low, such as the Nord Pool market (which includes Sweden, Norway and
Finland). Weron and Misiorek [2008] use this market to contrast parametric
with semi-parametric models and show that the semi-parametric models
perform better and are more sensitive to exceptional market conditions (peak
demand, weather conditions). They conclude that autoregressive models for
short-term forecasts on an hourly basis are the best in terms of predictive
power.

1 SARIMA stands for Seasonal Autoregressive Integrated Moving Average and TLSAR is a Two-Level Seasonal Autoregressive model.
Not all price models focus on short-term predictions. Some recent models
aim at constructing forward prices, i.e. the Price Forward Curve itself.
Based on the Nordic market, Fleten and Lemming [2003] use bid-ask data
of futures products to construct the long-term product and emphasize that
the method performs well in the range of four to ten months ahead.
Jump-diffusion models are also often employed to capture the spiky behavior of
the spot price. However, Chan et al. [2008] argue that traditional financial
approaches (like the jump-diffusion model) are unsuccessful in capturing the
spot price dynamics.
Chapter 4
Estimation Methodology
As outlined in Chapter 2, electricity prices display multiple cyclical
behaviors which must be taken into account in the modeling process. These
structural determinants (e.g. daily/monthly/yearly seasonal patterns) may
be heavily correlated among themselves and with other independent variables
(e.g. weather indicators). Moreover, daily transactions generate a small
number of peak values (extreme spot prices) which may excessively influence
the fits. Section 4.1 presents the spot price and the forward price models
while Section 4.2 details the regression methods used to estimate them. The
lad-lasso regression is a regularized method that allows us to control the
bias-variance trade-off with a robust approach
(LAD minimization). The ls-svm regression is particularly appropriate to
estimate non-linear relationships in the presence of strongly correlated
predictors.
4.1 Presentation of the Models
In this section we describe two models based on the mixed approach
introduced by Fleten and Lemming [2003], see page 6. The short-term spot
price model is directly inspired by the short-term vertical load model
described in Espinoza et al. [2007]. The HPFC model has been developed by
swissQuant Group and Axpo and corresponds to a model widely used in
the power industry.
4.1.1 A Short-Term Spot Price Model
Methodology
Espinoza et al. [2007] use an ls-svm regression technique to estimate an
autoregressive equation (called ar-lssvm in JAK and Vandewalle [2000]) for
predicting short term loads. Their model mixes a load dynamics component
(autoregressive part) with daily trends and weather indicators to predict
future loads. Although vertical load series display a pattern similar to that
of the spot price, the former are smoother and less noisy in general.
Let us consider the following model:

$$y_t = f(x_t) + e_t,$$

where $y_t$ denotes the electricity price at time $t$ (each hour), $f(x_t)$ is an
unknown (possibly non-linear) function and $x_t \in \mathbb{R}^n$ is the vector of regressors:

$$x_t = \{y_{t-1}, \dots, y_{t-j}, H_t, D_t, W_t\},$$

with
• $W_t$: weather forecast indicators composed of the temperature $FT_t$ and
heating and cooling indicators defined respectively as $FH_t = \max(18 - T_t, 0)$ and $FC_t = \max(T_t - 20, 0)$;
• $D_t \in \{0,1\}^7$: a binary-valued vector which captures the effects of each
day of the week;
• $H_t \in \{0,1\}^{24}$: a binary-valued vector which captures the effects of each
hour of the day.
The parameter j denotes the size of the auto-regressive part. Let ∆ be the
number of days forming the auto-regressive part (j = 24 × ∆). When hourly
spot price curves are estimated, hourly temperature forecasts are needed in
Wt. In general, the temperature for the forthcoming days is predicted in terms
of expected mean, maximum and minimum (as is the case in Switzerland).
Hourly forecasts can then be reconstructed with the help of profiles
derived from temperatures observed on an hourly basis in the past. The
global estimation procedure is illustrated in Fig. 4.1.
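As a rough sketch of how the regressor vector defined above could be assembled, the following Python snippet builds one row per hour from lagged prices, hour and weekday dummies, and the three weather indicators. The function name `build_regressors`, the toy input series and the use of the same temperature value for both the forecast temperature and the heating/cooling indicators are assumptions made for illustration; the study itself relies on reconstructed hourly temperature forecasts.

```python
from datetime import datetime, timedelta


def build_regressors(prices, temps, start, delta_days=1):
    """Build rows x_t = (lags, H_t, D_t, W_t) and targets y_t from hourly series.

    prices, temps: hourly lists of equal length; start: datetime of the first hour.
    delta_days: number of days in the auto-regressive part (j = 24 * delta_days).
    """
    j = 24 * delta_days
    rows, targets = [], []
    for t in range(j, len(prices)):
        stamp = start + timedelta(hours=t)
        lags = [prices[t - k] for k in range(1, j + 1)]                    # y_{t-1}, ..., y_{t-j}
        hour = [1.0 if h == stamp.hour else 0.0 for h in range(24)]        # H_t
        day = [1.0 if d == stamp.weekday() else 0.0 for d in range(7)]     # D_t (Monday = 0)
        temp = temps[t]
        weather = [temp, max(18.0 - temp, 0.0), max(temp - 20.0, 0.0)]     # W_t
        rows.append(lags + hour + day + weather)
        targets.append(prices[t])
    return rows, targets


# Toy data: three days of hourly values starting on Monday, 14 September 2009.
start = datetime(2009, 9, 14)
prices = [40.0 + (h % 24) for h in range(72)]
temps = [10.0 + (h % 24) for h in range(72)]
rows, targets = build_regressors(prices, temps, start, delta_days=1)
print(len(rows), len(rows[0]))
```

Each row has 24 × ∆ + 24 + 7 + 3 entries; any regression method that accepts a numeric design matrix can then be applied to `rows` and `targets`.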
Validation of the Model
Performance Indicators Since our spot price model provides spot price
forecasts, standard indicators can be used to assess the quality of the fit
out of sample (this is called ‘backtesting the model’ in finance). Denoting the
observed data $y$, the fit $\hat{y}$, the number of observations $N$ and the arithmetic
mean of $z$ as $\bar{z}$, the following four performance indicators are considered:

• The correlation coefficient:
$$\frac{\sigma_{y,\hat{y}}}{\sigma_y \sigma_{\hat{y}}} = \frac{\sum_i (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_i (y_i - \bar{y})^2}\,\sqrt{\sum_i (\hat{y}_i - \bar{\hat{y}})^2}},$$
• The mean error:
$$\frac{\frac{1}{N}\sum_i |y_i - \hat{y}_i|}{\frac{1}{N}\sum_i y_i} \times 100,$$
• The mean absolute prediction error:
$$\frac{1}{N}\sum_i \frac{|y_i - \hat{y}_i|}{|y_i|},$$
• The mean standard deviation error:
$$\operatorname{var}\left(\frac{y - \hat{y}}{y}\right) \times 100 = \frac{1}{N}\sum_{i=1}^{N} (\epsilon_i - \bar{\epsilon})^2 \times 100,$$
with $y > 0$ and $\epsilon$ the relative error: $\epsilon_i = \frac{y_i - \hat{y}_i}{y_i}$.

Figure 4.1: The short term spot price model framework.
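The four indicators can be computed directly from their definitions. The sketch below is a plain Python transcription; the function name `performance_indicators`, the dictionary keys and the sample numbers are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt


def performance_indicators(y, y_hat):
    """Return the four fit indicators for observed values y (all > 0) and fitted values y_hat."""
    n = len(y)
    mean_y = sum(y) / n
    mean_f = sum(y_hat) / n

    # Correlation coefficient between observations and fit.
    cov = sum((a - mean_y) * (b - mean_f) for a, b in zip(y, y_hat))
    norm_y = sqrt(sum((a - mean_y) ** 2 for a in y))
    norm_f = sqrt(sum((b - mean_f) ** 2 for b in y_hat))
    correlation = cov / (norm_y * norm_f)

    # Mean error: mean absolute error relative to the mean observed level, in percent.
    mean_error = (sum(abs(a - b) for a, b in zip(y, y_hat)) / n) / mean_y * 100.0

    # Relative errors eps_i = (y_i - y_hat_i) / y_i.
    eps = [(a - b) / a for a, b in zip(y, y_hat)]
    mean_abs_prediction_error = sum(abs(e) for e in eps) / n
    mean_eps = sum(eps) / n
    mean_std_deviation_error = sum((e - mean_eps) ** 2 for e in eps) / n * 100.0

    return {
        "correlation": correlation,
        "mean_error": mean_error,
        "mean_abs_prediction_error": mean_abs_prediction_error,
        "mean_std_deviation_error": mean_std_deviation_error,
    }


# Invented observations and fits, used only to exercise the formulas.
observed = [50.0, 40.0, 60.0, 50.0]
fitted = [45.0, 44.0, 60.0, 55.0]
indicators = performance_indicators(observed, fitted)
print(indicators)
```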
Falsifiability Tests In addition to the statistical measures of fit, we submit
the spot model to a ‘falsifiability test’ à la Popper. Indeed, the true
data generation process of the spot prices is expected to depend on weather
predictions, among other determinants. The reason for this is that the
transactions on the spot market are settled one day before delivery. The traders
use weather forecasts to decide on the quantity to purchase
or sell and at which price. Therefore we would expect the spot model to
perform better when weather predictions for the next day are used as predictors
than when, say, the true weather observed the next day is used.
Several weather scenarios can be tested against the weather forecast
variable in that perspective:
• the observed (true) weather
• a normalized weather (seasonal weather)
• a random weather.
Letting ℘(.) be the prediction accuracy of a model, we expect the following
preference order to hold:
℘(forecasted weather) ≻ ℘(seasonal expectations) ≻ ℘(random walk weather).
We also expect that
℘(forecasted weather) ⪰ ℘(observed weather).
In the latter preference order, equality arises in the case of perfect
weather forecasts. These tests may be useful to discriminate between two
estimation methods which perform similarly in terms of out-of-sample predictions.
4.1.2 An Hourly Price Forward Curve Model for Electricity (HPFC Model)
A fundamental component of the HPFC is the futures price series. No
futures products for electricity exist in Switzerland. However, the German and
French markets offer a small set of them. Figure 2.6 on page 15 shows the
only forward curve that can be observed in the market. Note the September
and the winter peaks, which correspond to periods of the year where firms
need to secure power supply. After the first seven months, we notice that the
curve becomes flat and its shape provides no precise guidance on future
fluctuations (see [Espinoza et al., 2006]). Note also the upward trend of
long-term contracts: the further away the delivery date, the higher the cost of
hedging.
The construction of the HPFC relies on a combination of characteristics
extracted from the spot price series and observed futures prices. The
seasonal, weekly and daily variations are taken from the historical spot prices,
whereas the average values of the HPFC are provided by observed futures
products. In other words, the guideline is to apply a complex coefficient structure
to the observed futures prices1 to obtain hourly values. This approach relies
on structural links between the short-term (spot) and long-term markets.
Although equation 2.1 does not apply formally to electricity products, we
assume that an approximate link between the spot and the forward prices
exists.

1 The construction of the futures curve is described on page 15.
Methodology
Our goal is to estimate $P(t,h)$, the “hourly forward price”, for every day
$t$ and hour $h$ over the time horizon defined by the futures curve. The Peak
and Base futures curves can be denoted by $(F_i^{Base}, F_i^{Peak})$ with $i \in [1, N_F]$,
where $N_F$ represents the number of products used to build the curve.
These two curves can alternatively be represented as a single stepwise
function fluctuating between Peak and Base values. The latter representation is
denoted $F(t,h)$. In order for the HPFC to be fully consistent with $F(t,h)$,
the following arbitrage-free conditions must hold:

$$E_{(t,h)}[P(t,h)] = F_i^{Base} \quad \text{for } (t,h) \in T_{F_i^{Base}} \qquad (4.1)$$
$$E_{(t,h)}[P(t,h)] = F_i^{Peak} \quad \text{for } (t,h) \in T_{F_i^{Peak}} \qquad (4.2)$$

where $T_F$ represents the time horizon of a specific contract represented
in the implied curve. In order to capture the seasonal patterns in the
high-resolution representation of the price forward curve, we introduce the hourly
coefficient $s(t,h)$ and set

$$P(t,h) = F(t,h) \times s(t,h),$$

where $s(.)$ must comply with conditions (4.1) and (4.2). To determine $s(.)$,
we adopt a bottom-up approach which captures the weather factors and
daily and seasonal components likely to influence $F(.)$. The details of the
four-step procedure for building $s(.)$ are outlined below:
1. We first estimate the following regression over two years of data and
obtain $\hat{m}$, i.e. daily estimates of the spot price based on purely bottom-up
components:
$$S(t) = m(W(t), D(t), M(t)) + \epsilon_t,$$
with
• $S(t)$: historical daily spot prices;
• $W(t)$: daily weather consisting of the mean, the highest and the
lowest temperature of the day, the precipitation, the wind speed,
the relative humidity, and the heating and cooling indicators;
• $D(t)$: matrix of dummy variables with 0/1 values for each different
day of the week. A single variable is used for Tuesday, Wednesday
and Thursday, and holidays are treated as Sundays;
• $M(t)$: matrix of dummy variables with 0/1 values for each different
month of the year.
2. The second step consists in an out-of-sample simulation over the time
horizon given by the time range of the implied futures curve, where
each component of $W(t)$ is set to its daily mean value over the last
40 years to capture expected values for the season and where the daily
and monthly indicators $(D_{future}(t), M_{future}(t))$ are projected over the
pertinent time horizon:
$$\hat{S}(t)_{future} = \hat{m}(W_{norm}(t), D_{future}(t), M_{future}(t)).$$
3. In the third step, we transform the daily predicted values $\hat{S}(t)_{future}$
into hourly profiles. This is done with the coefficients $p(t,h)$, which are
built from daily means of hourly spot prices observed during the last
two years2. Note that for each day, the profile is normalized so that
the following condition is met3:
$$\forall t,\ E_h[p(t,h)] = 1.$$
Then, an estimate for $s(.)$ can be obtained by setting
$$s(t,h) = \hat{S}(t) \times p(t,h).$$
4. In the last step, once the predicted $\hat{S}_t$ are hourly-based, they are
calibrated or adjusted to each of the steps of the implied futures curve,
so that they fluctuate around these steps and match an arbitrage-free
condition. More precisely, for each futures product (e.g. Month 2 Peak
product4), we normalize the coefficients $s(t,h)$ to $\hat{s}_{T_F}(t,h)$ in order to
fulfill the following arbitrage-free condition:
$$E_{(t,h) \in T_F}[\hat{s}_{T_F}(t,h)] = 1.$$
Recall that $T_F$ is the time horizon of a specific contract represented
in the implied futures curve, i.e. the length of one of the steps in the
implied futures curve. Denoting $F(t,h)$ the price of the futures for the
delivery hour $h$ at day $t$ and $P(t,h)$ the estimated HPFC at day $t$ and
hour $h$, we have:
$$P(t,h) = F(t,h) \times \hat{s}_{T_F}(t,h).$$

2 Here, we use month-day clusters. As an example, a historical profile for the Mondays of September is obtained by averaging, for each hour, all spot values of the eight Mondays of September over the last 2 years.
3 This is done by simply dividing the mean hourly-based daily profiles by their mean over the whole day.
4 Starting arbitrarily on September 15, 2009, since the first monthly product concerns the current month, the Month 2 Peak product would have a delivery period from the 1st to the 31st of October, from Monday to Friday, 8 am to 8 pm.
A few words on the arbitrage-free condition used at step 4 of the HPFC
procedure may be necessary. This condition guarantees that the mean value
of the HPFC over the delivery period of a certain product is equal to the
product value itself. The global estimation steps of the HPFC are illustrated
in Fig. 4.2.
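Steps 3 and 4 of the procedure can be sketched in a few lines of Python. This is a simplified illustration, not the swissQuant/Axpo implementation: it treats a single Base product covering all hours of its delivery period, takes the predicted daily levels from step 2 as given, and uses invented numbers. The function names `normalize_profile` and `hpfc_for_product` are hypothetical.

```python
def normalize_profile(hourly_means):
    """Step 3: scale a 24-hour profile so that its mean over the day equals 1."""
    day_mean = sum(hourly_means) / len(hourly_means)
    return [value / day_mean for value in hourly_means]


def hpfc_for_product(daily_levels, profiles, futures_price):
    """Steps 3 and 4 for one futures product (assumed here to cover every hour).

    daily_levels: predicted daily spot levels over the delivery period (output of step 2)
    profiles: one normalized 24-hour profile p(t, .) per day
    futures_price: price F of the product covering these days
    """
    # s(t, h) = S_hat(t) * p(t, h)
    s = [[level * p for p in profile] for level, profile in zip(daily_levels, profiles)]
    # Normalize s over the delivery period so that its mean equals 1.
    n_hours = sum(len(row) for row in s)
    mean_s = sum(sum(row) for row in s) / n_hours
    s_hat = [[value / mean_s for value in row] for row in s]
    # P(t, h) = F(t, h) * s_hat(t, h)
    return [[futures_price * value for value in row] for row in s_hat]


# Invented inputs: a two-day delivery period with a simple day/night profile.
raw_profile = [30.0] * 8 + [60.0] * 12 + [30.0] * 4
profile = normalize_profile(raw_profile)
curve = hpfc_for_product([40.0, 50.0], [profile, profile], futures_price=48.0)

all_hours = [value for day in curve for value in day]
print(sum(all_hours) / len(all_hours))  # mean of the hourly curve over the delivery period
```

By construction the mean of the hourly curve over the delivery period equals the futures price, which is the arbitrage-free condition; handling Peak products would require restricting the normalization to the hours covered by each product.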
Validation of the HPFC Model
Since the HPFC is not a prediction of the spot price, evaluating the
quality of the fit is not straightforward. A qualitative assessment can be
based on criteria such as the plausibility of the HPFC shape, i.e. its ability to
capture seasonal patterns usually observed in high-frequency price series,
or stylized facts such as low/high prices during holiday/working periods.
In addition, the first stage of the HPFC estimation procedure provides results
that can be contrasted with facts observed in the market under study. Finally,
the HPFC can also be tested in the short run in the same spirit as the spot
price model. Indeed, for a long-term contract close to its delivery date, the
no-arbitrage condition implies convergence toward the spot price and we
expect the contract value to be close to the spot price.
Figure 4.2: The statistical approach for the HPFC model
4.2 Regression Techniques
4.2.1 The Lad-lasso Regression
According to Tibshirani [1996], the ordinary least squares estimator (OLS)
suffers from two main drawbacks:
• Lack of accuracy: OLS estimates achieve low bias at the expense of
a high variance. The prediction accuracy can be improved by tolerating
a small amount of bias, i.e. dropping the less significant factors or
“shrinking” some coefficients toward zero.
• Interpretation: OLS gives all factors non-zero coefficients, whereas a smaller
subset of factors (those with the strongest effects) may suffice to achieve a
model that performs well and provides a sensible interpretation.
One of the main ideas to address these two drawbacks is to introduce
a Tikhonov regularization term [Tikhonov, 1963] that controls the
bias-variance trade-off in the regression. This procedure gave rise to ridge
regression, a variant of the least squares technique intended to counter the
effects of collinearity among the regressors. More recently, Tibshirani [1996]
introduced the Lasso (Least absolute shrinkage and selection operator), which
combines in a single procedure the attractive features of ridge regression and
subset selection, i.e. a continuous process of “shrinking coefficients” that
results in setting some coefficients to zero. Later, Wang et al. [2007] suggested
a robust regression method based on a combination of the least absolute
deviation and the lasso, giving the lad-lasso. The use of the absolute deviation
instead of the squared deviation makes the model more robust to outliers,
and the lasso provides variable selection and therefore a good bias-variance
trade-off.
Consider the usual linear model equation:
$$Y = X\beta + \epsilon,$$
where $\beta$ is the unknown coefficient vector, $X$ represents the matrix of the
explanatory variables, $Y$ is the response or explained variable and $\epsilon$ is an
error term. The lad-lasso optimization problem is given in its canonical
form by:
$$\hat{\beta}(s) = \arg\min_{\sum_{j \in [1,p]} |\beta_j| \le s} |Y - X\beta|,$$
where $p$ is the number of factors (columns of the factor matrix $X$) and $s$ is the
regularization coefficient which controls the amount of shrinkage that is applied
to the estimates. This parameter can be determined by cross-validation5. However,
when a separate tuning parameter $\lambda_j$ is used for each of the $p$ coefficients,
the search can be computationally heavy.
To address this issue, Wang et al. propose the BIC-type objective function:
$$\sum_{i=1}^{n} |y_i - x_i\beta| + n\sum_{j=1}^{p} \lambda_j|\beta_j| - \sum_{j=1}^{p} \log(5n\lambda_j)\log(n),$$
which avoids the lengthy cross-validation process and sets the $\lambda_j$ parameters
to
$$\lambda_j = \frac{\log(n)}{n|\beta_j|},$$
where $n$ is the number of sample points and the $\lambda_j$ are the tuning parameters
controlling the shrinkage. This corresponds to a relatively ‘tight’ lasso, which is
therefore very robust. The code used for the lad-lasso has been developed by
swissQuant Group.

5 A numerical criterion measuring the forecasting power of the regression.
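To make the role of the tuning parameters concrete, the sketch below evaluates the penalized lad-lasso criterion (sum of absolute residuals plus $n \sum_j \lambda_j |\beta_j|$) with the $\lambda_j$ set by the closed-form rule above. It does not solve the optimization problem and is not the swissQuant code. The function names, the toy data and the choice of a preliminary estimate `beta_tilde` as the $\beta_j$ entering the rule are assumptions made for illustration; a coefficient that is exactly zero would need special handling because the rule divides by its absolute value.

```python
from math import log


def bic_lambdas(beta_tilde, n):
    """lambda_j = log(n) / (n * |beta_tilde_j|), from a preliminary estimate beta_tilde."""
    return [log(n) / (n * abs(b)) for b in beta_tilde]


def lad_lasso_objective(y, X, beta, lambdas):
    """Sum of absolute residuals plus the weighted L1 penalty n * sum_j lambda_j * |beta_j|."""
    n = len(y)
    lad = sum(abs(yi - sum(x * b for x, b in zip(row, beta))) for yi, row in zip(y, X))
    penalty = n * sum(lam * abs(b) for lam, b in zip(lambdas, beta))
    return lad + penalty


# Invented data: the second factor has a small effect.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 0.1, 2.2, 4.0]
beta_tilde = [2.0, 0.1]            # hypothetical preliminary (unpenalized) estimate
lambdas = bic_lambdas(beta_tilde, len(y))

full = lad_lasso_objective(y, X, beta_tilde, lambdas)
sparse = lad_lasso_objective(y, X, [2.0, 0.0], lambdas)
print(lambdas, full, sparse)
```

Because $\lambda_j$ is inversely proportional to the size of the preliminary coefficient, small coefficients receive a heavy penalty: in this toy case, setting the second coefficient to zero lowers the criterion even though the absolute residuals increase.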
4.2.2 The Least-squares Support Vector Machine Regression
A major development in the field of non-linear regression is the statistical
learning theory developed by Vapnik. This author gives a framework
based on the concept of empirical risk minimization (how to minimize the
loss in data modeling from an empirical sample), leading to the support vector
machine (svm) theory. Suykens and Vandewalle [1999] propose a slightly
modified version of this theory, the least-squares svm (ls-svm), which introduces
a least squares term in the optimization problem. The support vector machine is
currently considered to be the state-of-the-art technique for classification
problems. Subsequent developments include the ls-svm
with a symmetric part [Espinoza et al., 2005] and the fixed-size least squares
svm [Espinoza et al., 2006]. It is important to point out that the ls-svm
estimator behaves like a black box, i.e. the non-linear transformations involved
do not allow one to recover parameters associated with each predictor.
The philosophy here is therefore very different from that of the lad-lasso.
For the computation, the toolbox LS-SVMLab developed by Suykens et al.
[2002] was used.
Overview of the Objective Function
The key idea of the ls-svm regression is to map the regression space into a
higher-dimensional space and find a linear hyperplane with the help of kernel
functions (the so-called kernel trick). We briefly introduce the objective
function here; the optimization problem as well as the two key ideas of the
ls-svm theory are presented in Appendix A.
Let us consider the sample of points $(x_t, y_t)$, $t \in [1, N]$, with $M$ independent variables ($\forall t \in [1, N]$, $x_t = [x_{t1}, \dots, x_{tM}]$) entering the unknown function $f$, and let

$$y_t = f(x_t) + e_t, \quad t \in [1, N].$$

In the absence of prior information about the structure of $f()$, this function can be parametrized in a primal space based on ls-svm, that is

$$y_t = \omega^T \varphi(x_t) + b + e_t,$$

where $\omega$ is an unknown coefficient vector in the feature space and $b$ a bias term. The feature map $\varphi$ is unknown and transforms the input data into a higher dimensional vector. Introducing the least squares cost function, we can define the objective function of the problem and its constraints:

$$\min_{\omega, b, e_t} \; \frac{1}{2}\omega^T\omega + \gamma \frac{1}{2}\sum_{t=1}^{N} e_t^2,$$

with $y_t = \omega^T \varphi(x_t) + b + e_t$. The parameter $\gamma$ is a regularization parameter.
As in the lad-lasso, we use a Tikhonov regularization. Introducing Lagrange
multipliers and using Mercer’s theorem (see Appendix A), the ls-svm theory
shows that f() can be expressed in terms of a positive-definite kernel function
K() without having to compute the feature map ϕ. The new objective
function expressed in this dual space is:
$$y_t = \sum_{i=1}^{N} \alpha_i K_\sigma(x_t, x_i) + b + e_t,$$
where σ is an exogenous parameter. The kernel function can be any standard
kernel, like the polynomial or the uniform kernel. In our application, we use
the Gaussian radial basis function kernel given by

$$K(x, z) = \exp\left(-\frac{\|x - z\|_2^2}{\sigma^2}\right).$$
The estimates of the model depend heavily on the two hyper-parameters
(γ,σ).
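As a rough illustration of the dual formulation above, the following Python sketch fits an ls-svm regression by solving the standard linear system in (α, b) that results from the Lagrangian conditions. It is not the LS-SVMLab code used in the thesis: the function names `rbf_kernel`, `lssvm_fit` and `lssvm_predict` are hypothetical, and the kernel is assumed to follow the convention exp(−‖x − z‖²/σ²).

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B.

    Assumed convention: K(a, b) = exp(-||a - b||^2 / sigma^2).
    """
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma**2)


def lssvm_fit(X, y, gamma, sigma):
    """Solve the ls-svm dual system for the multipliers alpha and the bias b.

    The system reads:
        [ K + I/gamma   1 ] [alpha]   [y]
        [ 1^T           0 ] [  b  ] = [0]
    """
    n = X.shape[0]
    system = np.zeros((n + 1, n + 1))
    system[:n, :n] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    system[:n, n] = 1.0
    system[n, :n] = 1.0
    solution = np.linalg.solve(system, np.append(y, 0.0))
    return solution[:n], solution[n]


def lssvm_predict(X_train, alpha, b, sigma, X_new):
    """Evaluate sum_i alpha_i K(x, x_i) + b at each row x of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```

At the solution, the multipliers sum to zero and the training residuals equal α/γ, which gives a quick sanity check for any implementation of this kind.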
Determination of the Optimal Parameters (γ,σ)
The two parameters (γ,σ) control the bias-variance trade-off. If the bias
is low, the fit on the training set will be very good but the variance of the
prediction might be very high. The better the fit, the higher the sensitivity
to outliers and the risk of overfitting6. Increasing the smoothness of the fit
reduces the variance but increases the bias. There is therefore an optimal
pair (γ,σ) to find.
One popular way to assess the predictive power of the model for a given
set of parameters is cross-validation (CV). Once the kernel matrix has
been computed, the leave-one-out cross-validation score can easily be obtained
for different pairs (γ,σ) defined over a grid. Another method, proposed by
Keerthi et al. [2007], is to perform a gradient search on the same grid, for
instance with Newton's algorithm. According to Cawley and Talbot [2007], the
cross-validation criterion prevents overfitting if there are only two hyper-parameters.
For more than two parameters, they recommend a Bayesian regularization
approach. In this paper, we use cross-validation to find the optimal pair.
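The grid search described above can be sketched as follows. The example relies on a known identity for ls-svm models, under which the leave-one-out residual of observation i equals α_i divided by the i-th diagonal element of the inverse system matrix, so that no refitting is needed. This is an assumption about how the score may be computed, not a description of what LS-SVMLab does internally; the function names, the kernel convention exp(−‖x − z‖²/σ²) and the grid values are illustrative.

```python
import numpy as np


def rbf_kernel(X, sigma):
    """Gaussian RBF kernel matrix of X with itself (assumed convention: / sigma^2)."""
    sq = np.sum(X**2, axis=1)
    sq_dist = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-sq_dist / sigma**2)


def loo_mse(X, y, gamma, sigma):
    """Exact leave-one-out mean squared error of an ls-svm fit, without refitting."""
    n = X.shape[0]
    system = np.zeros((n + 1, n + 1))
    system[:n, :n] = rbf_kernel(X, sigma) + np.eye(n) / gamma
    system[:n, n] = 1.0
    system[n, :n] = 1.0
    system_inv = np.linalg.inv(system)
    alpha = (system_inv @ np.append(y, 0.0))[:n]
    # Leave-one-out residual of observation i: alpha_i / (system^-1)_ii.
    loo_residuals = alpha / np.diag(system_inv)[:n]
    return float(np.mean(loo_residuals**2))


def grid_search(X, y, gammas, sigmas):
    """Return the (gamma, sigma) pair with the lowest LOO score, plus the score grid."""
    scores = np.array([[loo_mse(X, y, g, s) for s in sigmas] for g in gammas])
    i, j = np.unravel_index(np.argmin(scores), scores.shape)
    return gammas[i], sigmas[j], scores
```

The returned `scores` array has one row per γ value and one column per σ value, which is the kind of surface plotted when inspecting the cross-validation score over a grid.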
6Overfitting occurs when a regression captures irrelevant features of a particular sample.
Chapter 5
Empirical Analysis
The main goal of the thesis is to apply the models presented in the pre-
vious section to Switzerland. This chapter describes the data employed in
each model as well as the results obtained with the lad-lasso and the ls-svm
approaches. Note that all data come from Bloomberg.
5.1 Spot Price Model
The equation of the spot price model is given by:
$$S_t = f(S_{t-1}, \dots, S_{t-j}, H_t, D_t, W_t).$$
The first decision we need to make is about the time horizon and the frequency
of the observations. We estimate hourly spot prices based on 30 days of
data taken from August 1-31, 2009, i.e. the St vector is of size 24 × 30. We
use the Swiss spot price index traded on EPEX (the Swissix).
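To make the structure of the regression concrete, the sketch below assembles a design matrix from an hourly price series: each row contains the lagged prices followed by the exogenous columns for the same hour. The function name `build_lagged_design` is hypothetical, and the exogenous block is assumed to be a ready-made array holding the hour, day and weather regressors; the thesis does not specify how these columns are laid out in the code.

```python
import numpy as np


def build_lagged_design(spot, exog, n_lags):
    """Build (X, y) for a regression of the price on its lags and exogenous columns.

    spot   : 1-D array of hourly spot prices, length T.
    exog   : 2-D array of shape (T, k) with the exogenous regressors for each hour.
    n_lags : number of hourly lags of the price to include (0 means no lag).
    """
    spot = np.asarray(spot, dtype=float)
    exog = np.asarray(exog, dtype=float)
    if n_lags == 0:
        return exog, spot
    T = spot.shape[0]
    # Column k-1 holds the price observed k hours earlier.
    lags = np.column_stack(
        [spot[n_lags - k : T - k] for k in range(1, n_lags + 1)]
    )
    X = np.hstack([lags, exog[n_lags:]])
    y = spot[n_lags:]
    return X, y
```

With 24 × 30 = 720 hourly observations and 24 lags, the first 24 hours serve only as lag values, so 696 rows remain for estimation.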
Regarding the weather indicator, the only hourly-based temperatures
available in Switzerland are those measured at Geneva and Zurich airports.
The mean of these two series is used as the Swiss hourly temperature indi-
cator. We build hourly temperature forecasts by combining the latter mean
with the expected daily mean, maximum and minimum temperatures from
MeteoSuisse1.
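The rescaling given in footnote 1 can be written in a few lines of Python. The sketch takes the mean hourly temperature profile z(h) and stretches it so that its minimum and maximum match the predicted daily low and high. The function name `hourly_temperature_forecast` is hypothetical, and the predicted daily mean mentioned in the text is not used here, since the footnote formula only involves the daily extremes.

```python
import numpy as np


def hourly_temperature_forecast(z, t_high, t_low):
    """Rescale the mean hourly profile z(h) to the predicted daily range.

    z      : 1-D array with the mean temperature for each hour of the day.
    t_high : predicted maximum temperature for the day.
    t_low  : predicted minimum temperature for the day.
    """
    z = np.asarray(z, dtype=float)
    profile_range = z.max() - z.min()
    if profile_range == 0.0:
        raise ValueError("the hourly profile is flat and cannot be rescaled")
    scale = (t_high - t_low) / profile_range
    return scale * (z - z.min()) + t_low
```

By construction, the hour with the lowest profile value receives the predicted daily low and the hour with the highest profile value receives the predicted daily high.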
The daily dummy matrix Dt defined in Section 4.1.1 is used with the
following slight modification: Swiss national holidays are coded as Sundays.
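A minimal sketch of this modification is given below. It assumes that the daily dummy matrix has one column per weekday (Monday first, Sunday last); the exact layout defined in Section 4.1.1 may differ. The function name `daily_dummies` and the holiday set passed to it are illustrative.

```python
from datetime import date, timedelta

import numpy as np


def daily_dummies(days, holidays):
    """One-hot weekday matrix in which holidays are coded as Sundays.

    days     : sequence of datetime.date objects.
    holidays : set of datetime.date objects to be treated as Sundays.
    """
    SUNDAY = 6  # date.weekday() returns 0 for Monday and 6 for Sunday
    D = np.zeros((len(days), 7), dtype=int)
    for row, day in enumerate(days):
        column = SUNDAY if day in holidays else day.weekday()
        D[row, column] = 1
    return D


# Example: the Swiss National Day (1 August) fell on a Saturday in 2009.
days = [date(2009, 8, 1) + timedelta(days=i) for i in range(3)]
D = daily_dummies(days, holidays={date(2009, 8, 1)})
```

For hourly data, each row would simply be repeated 24 times, for instance with `np.repeat(D, 24, axis=0)`.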
Finally, the spot model is tested for different lag orders and its predic-
tive performance is evaluated over various forecasting horizons (1, 4 and 7
days ahead). Before presenting the empirical results, we analyze the cross-
validation procedure for the ls-svm regression model.
Cross-Validation Score for the ls-svm Model
The cross-validation score of the ls-svm fits is computed for a given grid
of plausible values for the pair (γ,σ). Fig. 5.1 presents the results for two
different specifications of the spot model: the left-hand side plot refers to
an auto-regressive model with 24 lags (a full day), while the right-hand side
model has no lag. We notice that the auto-regressive component is key to get
accurate in-sample predictions. Though the global shape of the CV function
is quite similar, the CV magnitudes are much lower for the auto-regressive
1Denoting $T_{high}(d)$ and $T_{low}(d)$ the maximum and minimum predicted temperatures for day $d$, $z(h)$ the hourly mean temperature for each hour over the 30 training days and $F_T(d,h)$ the hourly forecasts, we have:

$$F_T(d,h) = \frac{T_{high}(d) - T_{low}(d)}{\max_h z(h) - \min_h z(h)} \times \left(z(h) - \min_h z(h)\right) + T_{low}(d).$$

Note that $z(h)$ has first been normalized by dividing each observed hourly temperature by the mean over the 30 days.
model.
Figure 5.1: Grid search for a training set (a) with a one-day autoregressive part and (b) without autoregressive part. [Surface plots of the cross-validation score over the grid; horizontal axes: sigma and lambda, each from 0 to 1000; vertical axis from 26 to 38 in (a) and from 195 to 220 in (b).]
Lad-lasso vs Ls-svm In-sample Fits
We notice in Fig. 5.2 that the in-sample fit is excellent for both the lad-
lasso and ls-svm when an auto-regressive part is included in the model. The
visual inspection of the graphs also seems to indicate that increasing the lag
order does not markedly improve the fit.
Out-of-sample Predictive Performance
A typical out-of-sample prediction is shown in Fig. 5.3. The prediction
starts on 10/30 and ends on 11/3. November 1st and 2nd correspond to a
Saturday and a Sunday. We notice that the fit performs well in the short-run
(day one and two), but the week-end peaks are not well captured. A more ac-
curate picture of the out-of-sample predictive performance of the spot model
is given in Tables 5.1 to 5.3 for the lad-lasso and ls-svm regressions and for
different lag orders over three forecast horizons. It is important to note that
the performance indicators are mean values. In order to evaluate
the predictive performance of the model more robustly, we performed 1, 4
and 7 days-ahead predictions every three days starting from Sept 1, 2009,
until Dec 15, 2009. This gave us 33 measures for each performance indicator
whose average2 is reported in Tables 5.1 to 5.3.
Figure 5.2: In-sample fit with parameters (15,900) for three window sizes (in days): (a) ∆ = 0, (b) ∆ = 1, (c) ∆ = 4.
We first notice in Tables 5.1 to 5.3 that ls-svm generally performs better
than lad-lasso whatever the performance indicator. Second, when the
forecast horizon is extended, the quality of the prediction increases if the
length of the auto-regressive part is also increased. Finally, the best
performance in terms of correlation and mean standard deviation is given by
the ls-svm 1-day-ahead/2-lags model, the ls-svm 4-days-ahead/2-lags model
has the lowest MAPE, while the lowest mean error is given by the ls-svm
7-days-ahead/4-lags model.
2We are aware that a confidence interval could have been provided.
Figure 5.3: A typical one week ahead forecast with parameters: (∆, γ, σ) = (1,50,500)
Figure 5.4: The weather forecast and the observed weather (Fall 2009). By
construction, the peak values of the forecast model are the one day ahead
forecast. [Plot of temperature in degrees Celsius (0 to 18) against time, from 28 Oct to 09 Nov; series: weather forecast and observed weather.]
Lags (∆):      0 Day             2 Days            4 Days            6 Days
Performance    ladlasso  lssvm   ladlasso  lssvm   ladlasso  lssvm   ladlasso  lssvm
As indicated in Section 4.1.1, we can apply various falsifiability tests to
check whether or not the model behaves as the true model should do. The
most powerful way to verify this hypothesis is by testing:
℘(forecasted weather) ⪰ ℘(observed weather).
We also test the two less stringent conditions described in Section 4.1.1.
Two independent quality indicators are used for these falsifiability tests: the
MAPE and the correlation coefficient. We apply the following rule:
℘(A) ≻ ℘(B) if the two indicators of A perform better,
℘(A) ∼ ℘(B) if only one indicator of A performs better,
℘(A) ≺ ℘(B) if none performs better.
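The decision rule above can be sketched in Python as follows. The helper names `mape`, `correlation` and `compare_forecasts` are hypothetical, the MAPE is assumed to be the mean absolute error relative to the observed value, and ties on an indicator are counted as "not better".

```python
import numpy as np


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (lower is better)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)


def correlation(actual, forecast):
    """Pearson correlation between observed and forecast values (higher is better)."""
    return float(np.corrcoef(actual, forecast)[0, 1])


def compare_forecasts(actual, forecast_a, forecast_b):
    """Rank forecast A against forecast B using the two indicators."""
    wins = int(mape(actual, forecast_a) < mape(actual, forecast_b))
    wins += int(correlation(actual, forecast_a) > correlation(actual, forecast_b))
    return {2: "better", 1: "equivalent", 0: "worse"}[wins]
```

In the falsifiability tests, A and B would be the price forecasts obtained from the same model fed with two different temperature inputs, for example forecast and observed temperatures.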
Fig. 5.4 compares the predicted hourly temperature series with the one
observed a posteriori. As expected, the observed temperature exhibits a more
wiggly pattern due to the variations induced by the wind, precipitation and
snowfall. The results of the falsifiability tests are presented in Table 5.4
for both the auto-regressive ls-svm and the lad-lasso. The ls-svm model
successfully passes the test: it performs better with temperature forecasts
than with observed temperatures. We can therefore state that the ls-svm
estimator is likely to capture the true data generation process. The
lad-lasso model, by contrast, performs better when observed temperatures are
used as predictor in lieu of temperature forecasts. This estimation technique
therefore seems less adequate in terms of the falsifiability criteria, which
seems to indicate that the true model is non-linear.
Performance under transitions/vacation periods
All the above results were obtained with a dataset that mixes regular
patterns with more noisy ones such as those encountered during transition
phases from fall to winter or official/local holiday periods. As an example,
the regression based on September values may include a no-heating period,
but the forecasts are done for the heating period. Another example is the
forecasts for early January which are based on a training set that includes
Christmas and New Year’s Eve and erratic patterns in between.
AR-lssvm:
• ℘(Forecast one day ahead) ≻ ℘(Observed (real) weather)
• ℘(Forecast one day ahead) ≻ ℘(Seasonal expectations)
• ℘(Seasonal expectations) ≻ ℘(Random walk weather)
Lad-lasso:
• ℘(Forecast one day ahead) ≺ ℘(Observed weather)
• ℘(Forecast one day ahead) ≻ ℘(Seasonal expectations)
• ℘(Seasonal expectations) ≻ ℘(Random walk weather)
Table 5.4: Falsifiability test: results
Empirical evidence shows that during these particular periods, the ls-svm
approach can perform very badly whereas the lad-lasso is more robust and
displays lower volatility. This is indeed what we find in our data. In
Fig. 5.5, the presence of the heating days at the beginning of the simulation
set in Exhibit (a) biases the forecasts in Exhibit (b), though the lad-lasso fits
are better than the ls-svm ones. Another example is given in Fig. 5.6, where
the training set includes Christmas holidays. The past spot price displays
a pretty irregular pattern (note the midnight peak of New Year's Eve) due to
the drop in many economic activities. Again the ls-svm fits are misled while