Value at Risk for a high-dimensional equity portfolio
A comparative study investigating computational complexity and accuracy for different methods
Robin Lundberg
Master Thesis, 30 credits
M.Sc. in Management & Industrial Engineering, Risk Management, 300 credits
Department of Mathematics and Mathematical Statistics
Spring 2018
Abstract
Risk management is practiced in many financial institutions and one of the most commonly used risk measures is Value at Risk. This measure represents how much a portfolio of assets could lose over a pre-specified time horizon with a certain probability. Value at Risk is often used to calculate capital requirements and margins, which serve as collateral to cover potential losses that might occur due to market turbulence. It is important that the calculation of Value at Risk is accurate, which requires complex and time-demanding models, but many financial institutions also wish to calculate Value at Risk continuously throughout the day, which requires computational speed.

Today's most commonly used method for calculating Value at Risk is historical simulation, which is a simple but often inaccurate method. It is criticized by many scholars since it relies heavily on the assumption that history will repeat itself. A substitute for historical simulation is Monte Carlo simulation, which is seen as a more accurate and robust method. However, for a high-dimensional portfolio, Monte Carlo simulated Value at Risk is very computationally demanding and in many cases it cannot be used due to time constraints.

The study investigates alternative methods for calculating Value at Risk with the purpose of finding a method that could be used as a substitute for the Monte Carlo method. The portfolio used in this thesis is a high-dimensional equity portfolio containing 2520 equities with 10 years of observations. I find that by first using a clustering algorithm to divide the equities into groups based on their correlation, and then applying principal component analysis to obtain a lower-dimensional problem, computational time can be reduced by approximately 99% while still providing an accurate result.
Sammanfattning
Risk management is a tool commonly used among financial institutions, where one of the most common risk measures is Value at Risk. This is a quantitative measure that describes how much of a portfolio of assets one risks losing over a predetermined time period with a specific probability. These properties make Value at Risk a good measure for calculating the amount of capital needed to cover potential future losses. It is important that the calculation of Value at Risk is accurate, which places high demands on the model's complexity. In addition, many financial institutions want to calculate Value at Risk continuously throughout the day, which places further demands on the model's computational time.

The model most commonly used for Value at Risk calculations today is historical simulation, which is a simple but often inaccurate model. Historical simulation is frequently criticized for its heavy reliance on the assumption that historical market movements will repeat themselves. An alternative to historical simulation is Monte Carlo simulation, which is often considered a more robust and accurate method. The method does, however, have drawbacks when it comes to computational time, and for a high-dimensional portfolio the time constraints risk being exceeded.

In this thesis I investigate alternative methods for calculating Value at Risk with the goal of finding a substitute for Monte Carlo simulation. The portfolio that is tested is a high-dimensional equity portfolio consisting of 2520 equities and 10 years of data. In the study I present a method that combines clustering based on the equities' correlation with principal component analysis. The method reduces computational time by approximately 99% compared to the Monte Carlo method while still producing an accurate result.
Acknowledgements
I would like to express my gratitude to my supervisors Mikael Öhman and Monika Monstvilaite at Cinnober Financial Technology, who have guided me through any problems that have arisen during the model implementation. Furthermore, I would like to thank Lisa Hed at Umeå University for her guidance in writing this thesis and Markus Ådahl at Umeå University, who has given me some valuable insights regarding the choice of methods.
Contents

1 Introduction
  1.1 Background
    1.1.1 Cinnober Financial Technology
    1.1.2 Value at Risk
    1.1.3 Risk Management in Practice
    1.1.4 Risks
  1.2 Problem Statement
  1.3 Purpose
  1.4 Delimitations
  1.5 Data Description
  1.6 Approach & Outline
2 Theory
  2.1 Risk Measures
    2.1.1 Coherent Risk Measure
    2.1.2 Value at Risk
  2.2 Financial Time Series
    2.2.1 Asset Returns
    2.2.2 Volatility clustering
    2.2.3 Autocorrelation
    2.2.4 EWMA
    2.2.5 ARMA-GARCH
    2.2.6 IGARCH
    2.2.7 Model verification
  2.3 Multivariate Time Series
    2.3.1 Dependence
    2.3.2 Copula
    2.3.3 Simulation with copula
  2.4 VaR approaches
    2.4.1 Historical Simulation
    2.4.2 Monte Carlo Simulation
  2.5 Dimension Reduction
    2.5.1 Principal Component Analysis
    2.5.2 Orthogonal ARMA-GARCH
    2.5.3 Cluster Analysis
  2.6 Backtesting
3 Model Implementation
  3.1 The Historical Simulation Method
  3.2 The Monte Carlo Method
  3.3 The Principal Component Analysis Method
  3.4 The Cluster Principal Component Analysis Method
  3.5 Backtesting
4 Results
  4.1 Data Exploration
  4.2 VaR Result
  4.3 Model Verification
  4.4 Computational Complexity
5 Conclusion and Further Studies
6 Appendix
A Total Backtesting Results
B Best Fitting Models for Multiple Lags
Glossary
Annualizing factor: The factor by which a rate over a given period is multiplied to express it as a rate on a yearly basis.

Clearing house: A third-party intermediary between buyers and sellers of financial instruments responsible for settling trading accounts, clearing trades, collecting and maintaining margins, and reporting trading data.

Copula: Translates to "connection" in Latin. Here, it connects the marginal distributions of a multivariate time series to a dependency structure.

Derivative: A financial contract whose value depends on one or multiple underlying assets.

Exchange: An institution used for the trading of financial instruments.

Exponential smoothing: A technique applied to a time series where exponentially decreasing weights are assigned to the observations, from the newest to the oldest observation.

Instrument: A financial contract.

Long/short position: Generally describes whether the investor has purchased an asset believing it will increase in value, or has sold the asset believing that it will decrease in value.

Margin requirement: Percentage of a position's value that may be used as collateral to finance its purchase.

Quantile: A cut point dividing the range of a probability distribution.
Regulatory requirements: Rules or laws imposed by an outside (usually governmental) agency that must be met by every product or service under the purview of that agency.

Risk: Possibility of financial losses from investments.

Risk management: The process of identification, analysis and acceptance or mitigation of uncertainty in investment decisions.

Standard and Poor's 500: An American equity index based on the market capitalization of the 500 largest companies on the NYSE or NASDAQ.

Tail loss: The extreme losses that correspond to the "tails" of the distribution.

Time lag: The period between two related events.

Volatility shock: A shock or a change to the standard deviation.

White noise: A time series is called white noise if it is a sequence of independent and identically distributed random variables with finite mean and variance.
Acronyms
BIC Bayesian Information Criterion.
CCI Christoffersen’s Confidence Interval.
CDF Cumulative Distribution Function.
CPCA Cluster PCA.
ES Expected Shortfall.
EWMA Exponentially Weighted Moving Average.
GARCH Generalized Autoregressive Conditional Heteroskedasticity.
HS Historical Simulation.
i.i.d. Independent and identically distributed.
IGARCH Integrated GARCH.
MCS Monte Carlo Simulated.
ML Maximum Likelihood.
O-GARCH Orthogonal GARCH.
P&L Profit and Loss.
PC Principal Component.
PCA Principal Component Analysis.
PDF Probability Density Function.
POF Proportion of Failures.
S&P500 Standard and Poor’s 500.
ST Spanning Tree.
VaR Value at Risk.
List of Tables
1 HS 1-day VaR for S&P500 with different values of α and different lengths of observation windows.
2 MCS 1-day VaR for S&P500 with different values of α and different numbers of simulations.
3 Correlation matrix for the five stocks.
4 Distance matrix for the five stocks.
5 New distance matrix including one cluster of two stocks and 3 individual stocks.
6 New distance matrix including two clusters of two stocks and 1 individual stock.
7 Distance matrix for one cluster of three stocks and one cluster of two stocks.
8 Presentation of how many percent of the log return series and PCs that experience time dependent observations and volatility clustering.
9 Presentation of how many log return series, standard PCs and EWMA PCs that are modeled with what type of model.
10 MCS 1-day VaRα with different values of α and different numbers of simulations.
11 HS results for different windows of observations.
12 Standard PCA and EWMA PCA results.
13 CPCA results for cluster sizes of 50, 25, 10 and 5.
14 Backtesting statistics for HS with observation windows of 252, 500 and 1000, and MCS with 10000 simulations and different copula parameters.
15 Backtesting statistics for Standard PCA, EWMA PCA and CPCA with one PC and different numbers of clusters.
16 Computational time for MCS and CPCA with different cluster sizes and 10000 simulations.
17 Backtesting statistics for standard and EWMA PCA for all ε∗.
18 CPCA backtesting results for 50 clusters and all ε∗.
19 CPCA backtesting results for 25 clusters and all ε∗.
20 CPCA backtesting results for 10 clusters and all ε∗.
21 CPCA backtesting results for 5 clusters and all ε∗.
List of Figures
1 Illustration of VaR0.05 which is represented by the 5% quantile of the P&L distribution.
2 Illustration of S&P500's closing prices between January 2000 and December 2016.
3 Illustration of S&P500's log returns between January 2000 and December 2016.
4 Illustration of randomly drawn observations from a log normal distribution and normal distribution fitted to historical S&P500 log returns.
5 Illustration of the autocorrelation of S&P500's log returns and absolute log returns with lags between 1 and 150 days.
6 16 year simulation of S&P500's returns using GARCH(1,1) and ARMA(1,1)-GARCH(1,1).
7 Seadrill's log returns between 2008-02-07 and 2017-12-29.
8 P&L distribution of S&P500 obtained from HS with 252 (a), 500 (b) and 1000 days (c) as observation window. The red lines represent the 1-day VaR0.01.
9 Distribution of S&P500 log returns.
10 P&L distribution of S&P500 obtained from the MCS method with 1000 (a), 5000 (b), 10000 (c) and 100000 simulations (d). The red line represents the 1-day VaR0.01.
11 Illustration of historical closing prices.
12 ST of the five stocks.
13 Dendrogram of the five stocks.
14 Illustration of HS, MC, CPCA and PCA methods for calculating VaRα.
15 Portfolio log returns between 2008-02-07 and 2017-12-29.
16 Illustration of the three first PCs achieved by standard PCA and EWMA PCA.
17 Cumulative variance explained by PCs.
18 The number of equities within each cluster for 5 (a), 10 (b), 25 (c) and 50 clusters (d).
19 Backtesting of HS with observation windows of 252 (a), 500 (b) and 1000 days (c). The blue, orange and yellow lines indicate VaR0.01, VaR0.05 and VaR0.1 respectively and the red dots illustrate where a loss greater than VaR0.01 has occurred in the backtesting.
20 Backtesting of MCS with 1000 days of estimation window and different copulas. The blue, orange and yellow lines indicate VaR0.01, VaR0.05 and VaR0.1 respectively and the red dots illustrate where a loss greater than VaR0.01 has occurred in the backtesting.
21 Backtesting of Standard PCA (a) and EWMA PCA (b) with estimation window of 1000 days. The blue, orange and yellow lines indicate VaR0.01, VaR0.05 and VaR0.1 respectively and the red dots illustrate where a loss greater than VaR0.01 has occurred in the backtesting.
22 Backtesting of CPCA with estimation window of 1000 days and standard correlation method. Results from 50 clusters are illustrated in (a), 25 clusters in (b), 10 clusters in (c) and 5 clusters in (d). The blue, orange and yellow lines indicate VaR0.01, VaR0.05 and VaR0.1 respectively and the red dots illustrate where a loss greater than VaR0.01 has occurred in the backtesting.
23 Illustration of best fitting models with multiple lags tested for.
1 Introduction
1.1 Background
Risk management practices are used all over the world and especially within the financial markets. Authorities and regulators demand that parties within this sector apply methods for quantifying their risks so that they can hold capital to cover them. However, as history has shown, regulations do not seem to be sufficient for preventing crises. We have experienced new crises even with stricter demands and it seems like regulators are always lagging behind. In other words, even though regulations are not enough to prevent a new crisis, they work as an insurance that past crises will not repeat themselves.

The biggest regulator within the banking industry is the Basel Committee. It was founded in the aftermath of some serious problems regarding the international currency market, with the purpose of enhancing financial stability around the world. Today, the Basel Committee consists of 45 institutions from 28 jurisdictions and is most famous for its regulations Basel I, Basel II and Basel III (Bank of International Settlements, 2016).

One of the most common ways of quantifying a financial institution's risks is through the measure Value at Risk (VaR). The Basel framework requires banks to use this measure and, even though the explicit method used for the calculations might differ, they all have in common that they estimate the maximum potential future loss at a specified probability, over a preset time period.
1.1.1 Cinnober Financial Technology
The study is completed at Cinnober Financial Technology (Cinnober), which is a software provider within the finance sector that develops systems for clearing houses, exchanges and banks, amongst others. In particular, trading, clearing and risk management systems are focal points and the company is an international player with customers all over the world.

The financial markets are continuously changing, making the requirements on the systems used in the industry extremely high, both in terms of speed and accuracy. New regulatory requirements come into play each year, meaning that Cinnober needs to be able to quickly adapt their systems' functionality to new market conditions with more complex algorithms and faster calculations.

One functionality that Cinnober implements in the majority of their systems is the calculation of VaR. It is therefore of great importance for them to stay updated on newly developed calculation methods which they can offer customers.
1.1.2 Value at Risk
VaR is perhaps the most common risk measure for banks; it is used in everyday practice and works as a foundation for capital requirements. VaR is also used by other financial institutions such as clearing houses and exchanges to calculate the margin requirements their members must meet in order to be allowed to hold their positions. In other words, VaR is used to find out how much margin the members need to hold so that the clearing house/exchange can be certain that the members can cover their potential losses.
It is common to see VaR denoted as

1-day VaRα = $X   or   1-day VaRα = Y%.

In other words: "with a probability of (1 − α) · 100%, the potential loss over one day will not be greater than $X, or Y%."
There are different ways of calculating VaRα but the ultimate goal is always to estimate the potential future profit & loss (P&L) distribution and find its α-quantile. One of the most common methods to do so is historical simulation (HS). This method is a quick way of calculating VaRα and an easy approach to explain to management, but the result depends solely on historical data and it assumes that history will be relived. One can think of it as estimating the future potential P&L distribution under the empirical distribution of the data.
As an example, suppose a bank applies a 252-day window (the average number of trading days per year) to calculate its 1-day VaR0.05 for a specific portfolio of assets (i.e. the bank wants to know the maximum 1-day loss this portfolio can experience with a probability of 95%). HS implies that we can either look at the P&L distribution of the portfolio's daily returns for the past 252 days and extract the value that represents the 5% quantile, or we can evaluate the 252 portfolio returns as losses (i.e. the positive returns will here be seen as negative losses) and extract the value that represents the 95% quantile. The latter method uses the loss distribution rather than the P&L distribution, but both methods will carry the same result.
The biggest drawback of HS is that the result might not be accurate enough due to its dependence on historical observations. For example, if the market has been upward trending over the past 252 days, the magnitude of the VaRα measure will be low and thus not reflect the potential loss in case of a crisis.
An alternative method is Monte Carlo simulation (MCS), which involves simulating an explicit parametric model for risk factor changes such as stock returns. These changes are then applied to the portfolio to find the future potential P&L distribution (or loss distribution). It is important to note that, generally,
the number of simulations can be chosen to be much larger than the number of observations used in HS and thus it is possible to obtain a more reliable result.

While MCS might seem preferable, it too has its weaknesses. For large portfolios, the computational cost can be extensive, as every simulation requires a full evaluation of the portfolio to compute the P&L. In fact, the dimensions of the problem increase rapidly with the number of risk factors and simulations, so a more extensive portfolio comes with a larger-dimensional problem and the computational time might be too high to be useful.

Another problem is the assumption about the returns' distributions. In contrast to HS, where we use the historical data to represent the distribution, we now have to make an assumption about it. A common practice is to fit a distribution to the financial time series we are interested in simulating and then draw samples from it, but this causes another problem. Even though it is possible to tweak the distribution based on personal assumptions, it might also be a tedious assignment. Making the wrong assumption about the distribution will heavily alter the final result.
1.1.3 Risk Management in Practice
Through history, financial disasters have arisen in both financial and non-financial firms, and authorities all over the world have increased the regulatory requirements for various forms of risk management in attempts to prevent this. To emphasize the need for continuously updated regulations, Pyle (1999) states in his article about risk management that "Financial misadventures are hardly a new phenomenon, but the rapidity with which economic entities can get into trouble is". It might therefore be more important for regulators to focus on preventing a financial crisis rather than fixing it when it already has happened, as by then it is often too late to react.

An example of fiscal irresponsibility that could have been avoided with common risk management practices is Orange County's bankruptcy. In 1994, the manager of the Orange County Investment Pool used the power of leverage in interest rate derivatives when trying to create excess returns for the county's schools, cities and districts. What the manager did not account for was the interest rate risk that accompanies such investments; instead he focused the entire fund's capital on his own speculation that interest rates would keep falling (Jorion, 2011). $1.6 billion was lost due to the Fed's unexpected increases of interest rates that year and the county has not yet been able to repay the bonds it had to use to avoid bankruptcy (Castillo, 2017). While this is just one example, it illustrates the need for financial firms to control their risk and, maybe more so, the need for authorities to ensure that financial institutions actually manage their risks. Jorion (2011) claims that if the county had used VaR as a risk measure to hold capital, it could easily have avoided the severe impact of these events.
1.1.4 Risks
For a financial institution such as a bank, there are multiple risk factors to account for when quantifying risk exposure. Three main risks should be of concern: market risk, operational risk and credit risk. According to the European Banking Authority (2017), the risks are defined as:

• Market risk is the risk of losses in on- and off-balance sheet positions arising from adverse movements in market prices.

• Operational risk is the risk of losses stemming from inadequate or failed internal processes, people and systems, or from external events such as fraud, security breaches, legal situations etc.

• Credit risk is measured in respect to the bank's activities, excluding the trading book business.

Together, market risk, operational risk and credit risk cover a big portion of the bank's risk exposure, and the Basel Committee requires quantification methods for each one to be in place.

This thesis will focus on market risk, which stems from the market positions that the bank holds. Thus, risk factors such as equity price changes are of big interest.
1.2 Problem Statement
Banks and other financial institutions using VaR face a big problem regarding the calculation of the measure. On the one hand, they often need to calculate the risk measure multiple times per day, which requires speed, and on the other, they want the calculations to be accurate, which requires time. Therefore, the problem boils down to investigating:

• How different techniques of calculating VaR differ in terms of time and accuracy

as well as resolving:

• Which method or methods, if any, provide acceptable computational time and accuracy?
1.3 Purpose
The project aims to investigate different methods of quantifying VaR in terms of accuracy and speed. The goal is to find out whether there are any preferable methods for a financial institution to use in its daily calculations of VaR that are both accurate and not too computationally demanding.
1.4 Delimitations
Regarding the main risks (market risk, operational risk and credit risk), this study will only investigate market risk.

Due to time constraints, the study will only investigate portfolios including equities. Portfolios of other types of instruments could be the subject of a sequential study.

Equities that have low liquidity cause problems when trying to fit a conditional variance model and are in practice often handled separately. Fitting the conditional variance model is a vital part of modeling returns and therefore these equities are excluded from the study.
1.5 Data Description
The study includes 2520 equities that are traded on any of the US exchanges. The original data set contained approximately 5000 equities but due to low liquidity, nearly half were excluded. Each equity consists of almost 10 years of observations, from 2008-02-07 to 2017-12-29, which amounts to 2493 closing prices. All data has been downloaded from Yahoo Finance.
1.6 Approach & Outline
In this thesis, different methods of calculating VaR will be investigated and implemented on a high-dimensional equity portfolio using the software Matlab 2017b. The approach is to investigate the two most conventional methods, historical simulation and Monte Carlo simulation (as well as different variations of them), in terms of both accuracy and computational time. Subsequently, an attempt to improve these methods will be implemented with the aim of reducing the computational time of the Monte Carlo method while still upholding accuracy. Here, a statistical technique called principal component analysis will be applied in order to reduce the high-dimensional problem to a lower-dimensional one and thus, hopefully, also reduce the computational time.

The thesis is structured as follows: In Chapter 2, important theories will be provided so the reader can establish some knowledge of the area. The chapter
includes theories regarding risks, returns, VaR, time series analysis, simulation approaches as well as dimension reduction techniques. In Chapter 3, an explanation of the model implementation is given together with a review of the necessary steps to complete the simulations and evaluate the models. In Chapter 4, a presentation and visualization of the results is provided to the reader and finally, in Chapter 5, a discussion regarding the results follows, together with some recommendations for further studies.
2 Theory
2.1 Risk Measures
A market risk measure is used to quantify the uncertainty about the future value of a portfolio, i.e. it is related to the randomness and uncertainty of the risk factors that affect the portfolio. The fundamentals of handling risk essentially come down to estimating the possible deviation from an expected value, and in today's financial institutions it is of interest to both measure and manage these risk exposures.

Some types of financial institutions are particularly exposed to market risk due to their exposure to the financial markets. Banks, for example, hold positions in various types of instruments, and to be able to understand and handle the risks that follow, they rely on modeling.
2.1.1 Coherent Risk Measure
A risk measure is a single number which summarizes the uncertainty of an outcome. In finance, a common risk measure is volatility, but there are several others, such as VaR or Expected Shortfall (ES), that are used in daily practice. To define a fair risk measure, Artzner et al. (1999) provide a definition of what is called a coherent risk measure. It illustrates properties that a good risk measure should have.
Definition 2.1 (Coherent risk measure)
Let G denote a vector space of random variables representing portfolio values at a fixed future time. Furthermore, let X and Y be random variables denoting a set of future net worths of an investment. A coherent risk measure is a function ρ : G → R that satisfies the following axioms:

Monotonicity. For all X, Y ∈ G with X ≤ Y:
$$\rho(X) \le \rho(Y).$$

Sub-additivity. If X, Y ∈ G, then
$$\rho(X+Y) \le \rho(X) + \rho(Y).$$

Homogeneity. For all λ ≥ 0 and all X ∈ G:
$$\rho(\lambda X) = \lambda\rho(X).$$
Translation invariance. For all X ∈ G and all real numbers α, we have:
$$\rho(X + \alpha \cdot r) = \rho(X) - \alpha$$
where r is the total gain of a risk-free investment.
2.1.2 Value at Risk
VaR measures a portfolio's exposure to a certain set of risk factors. The measure can be interpreted as the loss that, with a specified probability, will not be exceeded if the current portfolio is held over some period of time.

VaR has two basic parameters (Alexander, 2009c):

• The significance level α ∈ (0,1).

• The time horizon, denoted h, which is the period of time, traditionally measured in trading days rather than calendar days, over which the VaR is measured.

VaR is a quantile-based measure, meaning that it represents the α quantile of a distribution, in this case the P&L distribution. Figure 1 illustrates VaR0.05 for a hypothetical portfolio where the value is obtained by extracting the 5% quantile from the distribution. The red line illustrates where we can find the value that represents this quantile. Since VaR is measured as a loss and the α quantile represents a negative value, we interpret the VaR measure as the negative of the α quantile.
Figure 1: Illustration of VaR0.05 which is represented by the 5%
quantile of the P&L distribution.
Definition 2.2 (Value at Risk)
Value at Risk at significance level α for a portfolio whose value at some future time is described by the random variable X is defined as:

$$\mathrm{VaR}_\alpha(X) = \min\{m \in \mathbb{R} : P(m \cdot r + X < 0) \le \alpha\} \qquad (1)$$

where r is the total return of a risk-free asset.

If X is assumed to have a right-continuous and increasing cumulative distribution function (CDF) F(·), it also follows from Equation 1 that

$$\mathrm{VaR}_\alpha(X) = \min\{m \in \mathbb{R} : P(-X > m \cdot r) \le \alpha\} = -F_X^{-1}(\alpha) = \min\{m \in \mathbb{R} : P(L \le m) \ge 1-\alpha\} = F_L^{-1}(1-\alpha)$$

where L = −X/r is the discounted loss.
In Figure 1, note that VaR does not take the most extreme losses into account, i.e. the part of the distribution below the α level is not included. This can cause big problems, especially when the portfolio returns are not normally distributed but rather have distributions with fat tails. A risk measure like this could be exploited by so-called shadow traders through "hiding" their risky investments by making the losses more extreme and thus missed by the VaR measure.

Since VaR does not account for the tail loss, it is easy to find examples where it does not fulfill the sub-additivity condition from Definition 2.1 and thus is not a coherent risk measure (Alexander, 2009c). However, the measure is intuitive to work with and provides a good quantification of a major part of the risk. To complement VaR with a risk measure that accounts for the tail loss, banks often incorporate the calculation of ES, which is the expected loss for a portfolio given that the VaR is exceeded. The reader can find out more about ES in Hult et al. (2012).
2.2 Financial Time Series
When simulating VaR, we are interested in forecasting the future of the equities in our portfolio. A common way of doing so is to fit a mathematical model to the equity's financial time series which can accurately describe the time series' characteristics. A financial time series can be presented in terms of different units and for an equity it is common to see it in terms of prices at certain time frequencies. As an example, Figure 2 illustrates a univariate financial time series representing the S&P500 index's prices between January 2000 and December 2016.
Figure 2: Illustration of S&P500's closing prices between January 2000 and December 2016.
2.2.1 Asset Returns
Many studies that include financial time series use returns rather than prices. If Pt denotes the price of an asset at time t, we can describe the discrete return between time period t−1 and t as Pt/Pt−1 − 1. Tsay (2010) proposes two main reasons for using returns instead of prices. First, for the average investor, asset returns are a complete and scale-free summary of the investment opportunity and second, return series are easier to handle than price series because the former have more attractive statistical properties.

There is more than one type of return, but we will be considering the continuously compounded return, or log return, since it is simple to work with when applying continuous stochastic models and it approximates the discrete return
accurately for short time periods (e.g. daily returns).
Definition 2.3 (Log return)
Let Pt be the price of an asset at time t. The log return over the time interval t−1 to t is then defined as:

$$r_t = \ln\!\left(\frac{P_t}{P_{t-1}}\right). \qquad (2)$$
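As a small illustration (a Python sketch, not code from the thesis), daily log returns can be computed from a vector of closing prices as follows.

```python
import numpy as np

def log_returns(prices):
    """Compute daily log returns r_t = ln(P_t / P_{t-1}) from closing prices."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

# Example with a short hypothetical price series:
print(log_returns([100.0, 101.5, 99.8, 102.3]))
```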
Figure 3 presents the log return series for the S&P500 index over the same time period as Figure 2. If we try to fit a mathematical model to the time series of log returns, there are some important characteristics which we have to account for, one of them being volatility clustering (Cont, 2007).
Figure 3: Illustration of S&P500's log returns between January 2000 and December 2016.
2.2.2 Volatility clustering
Non-randomness in a time series is important to account for when we want to simulate thousands of new hypothetical risk factor changes. Volatility clustering is one of these characteristics and it can be observed in Figure 3. By inspection, we can find clusters of low and high volatility, which means that there are time periods experiencing multiple sequential days of large price movements as well as periods with small price movements.
When modeling the future returns of a financial time series, it is important to account for the volatility clusters rather than simply simulating random numbers from a distribution. Figure 4 gives a clear visualization of how wrong this approach can be. Here, we have random drawings from a fitted log-normal distribution as well as from a fitted normal distribution. Comparing them to Figure 3, one notices that there is no volatility clustering present in either of the distributions in Figure 4. In other words, we have not accounted for the non-randomness that the time series exhibits.
Figure 4: Illustration of randomly drawn observations from a log normal distribution and a normal distribution fitted to historical S&P500 log returns.
2.2.3 Autocorrelation
A common way to describe volatility clustering is through the autocorrelation function. Here we denote the log return of an asset over a time period ∆t as rt. Observations are sampled at discrete times tn = n∆t and time lags are denoted by k. If we let ∆t = 1 day for simplicity, we can describe the correlation between the daily return at period t and the daily return at t + k as Corr(rt+k, rt).
Definition 2.4 (Autocorrelation function)
Given the observations r1, r2, ..., rN at times t1, t2, ..., tN and the lag k, the autocorrelation function is defined as

$$\rho_k = \frac{\sum_{i=1}^{N-k}(r_i - \bar{r})(r_{i+k} - \bar{r})}{\sum_{i=1}^{N}(r_i - \bar{r})^2}$$

where $\bar{r}$ denotes the mean of the returns.
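A direct translation of Definition 2.4 into code (a sketch assuming a plain NumPy environment) is shown below; applying it to absolute log returns reveals the slowly decaying pattern discussed next.

```python
import numpy as np

def autocorrelation(r, k):
    """Sample autocorrelation rho_k of a return series r at lag k (Definition 2.4)."""
    r = np.asarray(r, dtype=float)
    r_bar = r.mean()
    num = np.sum((r[:-k] - r_bar) * (r[k:] - r_bar))
    den = np.sum((r - r_bar) ** 2)
    return num / den

# Example: lag-1 autocorrelation of the absolute returns of a white-noise series
rng = np.random.default_rng(1)
r = rng.standard_normal(1000) * 0.01
print(autocorrelation(np.abs(r), k=1))
```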
The autocorrelation function is used to find non-randomness in data in the form of volatility clustering and to apply that knowledge when choosing an appropriate model for simulation. Figure 5 illustrates the autocorrelation with lags k = 1, 2, ..., 150 for the S&P500 time series illustrated in Figure 3. While there is no clear autocorrelation for the log returns, there is a pattern for the absolute log returns that converges to zero as the lag k increases. Thus, we can conclude that there is non-randomness in the financial data: a return of a specific magnitude (either positive or negative) seems to be followed by returns of similar magnitude (either positive or negative) the next day. Consequently, we can likely predict the magnitude of tomorrow's return, but not its direction.
Figure 5: Illustration of the autocorrelation of S&P500's log returns and absolute log returns with lags between 1 and 150 days.
2.2.4 EWMA
A method commonly used for forecasting the volatility of financial time series is the exponentially weighted moving average (EWMA) model. This model utilizes the concept of exponential smoothing, meaning that more weight is put on
the most recently observed volatilities.
Definition 2.5 (EWMA)
Consider the univariate financial time series X with an observation window WE which describes the number of time periods included in our time series. The EWMA model for forecasting the variance is then described as:

$$\sigma_t^2 = \frac{1-\lambda}{\lambda(1-\lambda^{W_E})}\sum_{i=1}^{W_E}\lambda^i x_{t-i}^2 \qquad (3)$$

where $\sigma_t^2$ is the forecasted variance, 0 < λ < 1 is the chosen weight and x are observed values in X.
McNeil, Rüdiger, and Embrechts (2015) show that if WE is large, we can use an approximation of Equation 3:

$$\sigma_t^2 = \frac{1-\lambda}{\lambda(1-\lambda^{W_E})}\sum_{i=1}^{W_E}\lambda^i x_{t-i}^2 \approx \frac{1-\lambda}{\lambda}\sum_{i=1}^{\infty}\lambda^i x_{t-i}^2$$

and then they show that:

$$\sigma_t^2 = (1-\lambda)x_{t-1}^2 + \lambda\sigma_{t-1}^2. \qquad (4)$$

To describe Equation 4 more intuitively, we call (1−λ) the coefficient of reaction and λ the coefficient of endurance. The former determines how much the forecasted variance $\sigma_t^2$ should react to the new information $x_{t-1}^2$, while the latter determines how enduring the variance is. A common value for λ is 0.94 for daily data (Zumbach, 2007).
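The recursive form in Equation 4 is straightforward to implement; the sketch below (illustrative only, with λ = 0.94 as suggested for daily data and an arbitrary initialization from the first observations) filters a return series into a sequence of EWMA variance forecasts.

```python
import numpy as np

def ewma_variance(returns, lam=0.94):
    """EWMA variance forecasts via Equation 4:
    sigma_t^2 = (1 - lam) * x_{t-1}^2 + lam * sigma_{t-1}^2."""
    x = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(x))
    sigma2[0] = x[:20].var() if len(x) >= 20 else x.var()  # simple, arbitrary start value
    for t in range(1, len(x)):
        sigma2[t] = (1 - lam) * x[t - 1] ** 2 + lam * sigma2[t - 1]
    return sigma2

# Example: latest forecasted daily volatility for a hypothetical return series
rng = np.random.default_rng(2)
r = rng.standard_normal(500) * 0.01
print(ewma_variance(r)[-1] ** 0.5)
```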
The EWMA model is easy to implement and accounts for the problem of volatility clustering. However, a substantial drawback of EWMA is that it only includes one parameter, λ. Therefore, in every EWMA model, the sum of the coefficient of reaction and the coefficient of endurance always equals one. Due to the complexity of some time series, we might need a more flexible model to describe them accurately, and a commonly used substitute is the GARCH model, which will be described in the next section.
2.2.5 ARMA-GARCH
Tsay (2010) states that to describe the dynamic structure of observed data, we might need a high-order model including multiple parameters. In fact, apart from volatility clusters, daily log returns are often also time dependent, which is another non-random characteristic that must be accounted for when simulating.
To account for seasonality, trends and dependencies we can fit an autoregressive moving-average (ARMA) model, and for volatility clustering we can fit a generalized autoregressive conditional heteroskedasticity (GARCH) model. Below, the reader can find descriptions of each model separately before a combination of them is presented.

First of all, it is important to understand how we apply these models to a time series. The first step is to understand how we describe a financial time series of returns, rt, as a process:

$$r_t = \mu_t + \varepsilon_t \qquad (5)$$

where rt is the log return at time t, µt is the conditional expected value of rt and εt is a white noise or error term.

The ARMA(m,n) model is a combination of an autoregressive model (left-hand side of Equation 6) and a moving average model (right-hand side of Equation 6), and we use it to forecast rt in a more accurate manner when the time series is time-dependent. Formally, Tsay (2010) defines it as:

Definition 2.6 (ARMA(m,n) process)
The ARMA(m,n) process can be written as

$$r_t - \sum_{i=1}^{m} a_i r_{t-i} = c + \varepsilon_t + \sum_{j=1}^{n} b_j \varepsilon_{t-j} \qquad (6)$$

which gives

$$r_t = c + \sum_{i=1}^{m} a_i r_{t-i} + \varepsilon_t + \sum_{j=1}^{n} b_j \varepsilon_{t-j} \qquad (7)$$

where m and n represent how many time lags we include in the model, c, {a1, ..., am} and {b1, ..., bn} are ARMA parameters and the random variable εt is white noise.

Basically, the ARMA model is used to specify the conditional mean of the return process:

$$E(r_t \mid \mathcal{F}_{t-1}) = c + \sum_{i=1}^{m} a_i r_{t-i} + \sum_{j=1}^{n} b_j \varepsilon_{t-j}$$

where $\mathcal{F}_{t-1}$ is the information about the time series obtained up to time t−1.

With a model for the conditional mean specified, one is needed for the conditional variance as well. Engle (1982) introduced the autoregressive conditional heteroskedasticity (ARCH) model in 1982, which in 1986 was generalized (GARCH) by Tim Bollerslev (Bollerslev, 1986). The model is used for volatility modeling purposes and is defined by:
Definition 2.7 (GARCH process)
Let εt = rt − µt be the error at time t for a process rt. Then εt follows a GARCH(p,q) process if it satisfies:

$$\varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \sum_{i=1}^{p}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{q}\beta_j\sigma_{t-j}^2$$

where p and q represent how many time lags we include in the model, zt are i.i.d. random variables with mean 0 and unit variance, ω > 0, {α1 ≥ 0, ..., αp ≥ 0} and {β1 ≥ 0, ..., βq ≥ 0} are GARCH parameters and $\sum_{i=1}^{p}\alpha_i + \sum_{j=1}^{q}\beta_j < 1$.
In the GARCH model, α describes to what degree the last period's volatility shock should affect next period's volatility, and β to what degree the last volatility should affect next period's volatility. ω is a constant which ensures that the volatility never equals zero, and Alexander (2009b) suggests using Maximum Likelihood (ML) to estimate these parameters, which will be applied in this thesis.

Finally, a model that accurately describes a time series which exhibits both time dependencies and volatility clustering is obtained and can be applied for simulation purposes. The model is called ARMA-GARCH and is defined as:
Definition 2.8 (ARMA(m,n)-GARCH(p,q) process)
For time t, the ARMA(m,n)-GARCH(p,q) process is defined through:

$$r_t = c + \sum_{i=1}^{m} a_i r_{t-i} + \varepsilon_t + \sum_{j=1}^{n} b_j \varepsilon_{t-j} \qquad (8)$$

$$\varepsilon_t = \sigma_t z_t, \qquad z \sim \text{i.i.d.}(0,1) \qquad (9)$$

$$\sigma_t^2 = \omega + \sum_{i=1}^{p}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{q}\beta_j\sigma_{t-j}^2 \qquad (10)$$
GARCH(1,1) models were among the first models to take volatility clustering into account. Here, the variance $\sigma_t^2$ only depends on the last period's shock and variance (Cont, 2007):

$$\sigma_t^2 = \omega + \alpha\varepsilon_{t-1}^2 + \beta\sigma_{t-1}^2,$$

which leads to a positive autocorrelation in the volatility process σ, with a rate of decay described by α + β. For illustration purposes, Example 2.1 provides results from forecasting with GARCH(1,1) and ARMA(1,1)-GARCH(1,1), which should be compared to the results with just random drawings in Figure 4.
Example 2.1 (GARCH(1,1) & ARMA(1,1)-GARCH(1,1))
If we fit a GARCH(1,1) model to the S&P500 log returns in Figure 3 and simulate new returns, we obtain a financial time series which takes volatility clustering into account. The result of this is illustrated in Figure 6.

For illustration purposes, Figure 6 also presents a fitted ARMA(1,1)-GARCH(1,1) model which not only accounts for volatility clustering, but also time dependencies.
Figure 6: 16 year simulation of S&P500's returns using GARCH(1,1) and ARMA(1,1)-GARCH(1,1).
As observed in Figure 6, these simulations show a similar pattern to the observed returns in Figure 3, especially compared to the simulations in Figure 4.
2.2.6 IGARCH
The constraint $\sum_{i=1}^{p}\alpha_i + \sum_{j=1}^{q}\beta_j < 1$ in Definition 2.7 ensures that the financial time series can be described by a stationary process such as the GARCH model (Alexander, 2009a). However, the constraint is not always fulfilled, so Tsay (2010) describes a model where $\sum_{i=1}^{p}\alpha_i + \sum_{j=1}^{q}\beta_j = 1$. Such a process is referred to as a GARCH process with a unit root, or an integrated GARCH (IGARCH) process.
Definition 2.9 (IGARCH(p,q) process)
Let εt = rt − µt be the error at time t for a process rt. Then εt follows an IGARCH(p,q) process if the fitted GARCH parameters satisfy:

$$\sum_{i=1}^{p}\alpha_i + \sum_{j=1}^{q}\beta_j = 1. \qquad (11)$$

In that case, we can model the time series as:

$$\varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \sum_{i=1}^{p}(1-\beta_i)\varepsilon_{t-i}^2 + \sum_{j=1}^{q}\beta_j\sigma_{t-j}^2$$

where p and q represent how many time lags we include in the model, zt are i.i.d. random variables with mean 0 and unit variance, and ω ≥ 0, {β1 ≥ 0, ..., βq ≥ 0} are GARCH parameters.
For an insight into what a time series looks like when the errors follow an IGARCH(1,1) model, Figure 7 presents such a case. The figure illustrates the log returns of Seadrill's stock (SDRL) over the time period 2008-02-07 to 2017-12-29.
Figure 7: Seadrill's log returns between 2008-02-07 and 2017-12-29.
2.2.7 Model verification
When presented with a financial time series, it is important to know whether the observations are time-dependent, whether they exhibit volatility clustering, or perhaps both, before fitting a model. It would be unnecessary, time-consuming work to fit a more complex model (e.g. ARMA-GARCH rather than only GARCH) to a time series if it is not needed.

Two common tests for time dependencies and volatility clustering are the Ljung-Box test (Ljung and Box, 1978) and Engle's ARCH test (Tsay, 2010).
Definition 2.10 (Ljung-Box test)
The Ljung-Box statistic Q is defined as:

$$Q = n(n+2)\sum_{k=1}^{h}\frac{\rho_k^2}{n-k}$$

where n is the sample size, ρk is the sample autocorrelation at lag k and h is the number of lags tested.

The Ljung-Box test is a hypothesis test where:

H0: The data is i.i.d.
H1: The data is not i.i.d. but rather exhibits time dependencies.

For a significance level α, we reject H0 when:

$$Q > \chi^2_{1-\alpha,h}$$

where $\chi^2_{1-\alpha,h}$ is the (1−α)-quantile of the chi-squared distribution with h degrees of freedom.
Definition 2.11 (Engle's ARCH test)
Consider a time series rt = µt + εt where εt are the residuals. Engle's ARCH test is then performed through the following steps:

• Estimate the best fitting AR(p) model by:

$$r_t = c + \sum_{i=1}^{p} a_i r_{t-i} + \varepsilon_t.$$

• Obtain the squared least-squares residuals, $\hat{\varepsilon}^2$, and regress them on a constant and p lagged values:

$$\hat{\varepsilon}_t^2 = \hat{\alpha}_0 + \sum_{i=1}^{p}\hat{\alpha}_i\hat{\varepsilon}_{t-i}^2$$

where p is the number of lags.

• In the sample of T residuals, the test statistic $TR^2$, where $R^2$ is the coefficient of determination of the regression, follows a chi-squared distribution with p degrees of freedom.

Engle's ARCH test is a hypothesis test where:

H0: Absence of volatility clustering in the time series.
H1: Volatility clustering is present in the time series.

and we reject the null hypothesis if $TR^2 > \chi^2_{1-\alpha,p}$.
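The LM form of the test can be sketched in a few lines: regress squared residuals on their own lags by ordinary least squares, compute TR², and compare with the chi-squared quantile (again assuming NumPy/SciPy; this is illustrative, not the thesis implementation).

```python
import numpy as np
from scipy.stats import chi2

def engle_arch_test(resid, p=5, alpha=0.05):
    """Engle's ARCH-LM test: regress squared residuals on p lagged squared
    residuals and a constant; reject H0 (no ARCH effects) if T*R^2 exceeds
    the chi-squared (1-alpha) quantile with p degrees of freedom."""
    e2 = np.asarray(resid, dtype=float) ** 2
    y = e2[p:]                                   # dependent variable
    X = np.column_stack([np.ones(len(y))] +
                        [e2[p - i:-i] for i in range(1, p + 1)])  # constant + lags
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    stat = len(y) * r2
    return stat, stat > chi2.ppf(1 - alpha, df=p)

rng = np.random.default_rng(4)
print(engle_arch_test(rng.standard_normal(1000), p=5))
```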
By applying the Ljung-Box test and Engle's ARCH test, one gets a greater understanding of what type of model should be fitted to each time series. The tests do not, however, give any information regarding the number of lags m, n, p, q that should be included in the model; this must also be tested for.

According to Akaike (1998), a model should be evaluated based on the results when it is used in a prediction which depends on the estimated parameters. Akaike suggests a method for evaluating the model based on the concept of closeness between the generic distribution defined by the model and the true distribution, the AIC criterion, which is commonly used. However, Javed and Mantalos (2013) show that the AIC criterion is not consistent. In fact, AIC will with high probability lead to a false result when dealing with a large amount of data. Schwarz (1978) developed an alternative criterion that is more consistent, the BIC criterion, which is also suggested by Alexander (2009b).

Thus, to find the appropriate set of lags m, n, p, q, i.e. the combination that provides the best model fit for each time series, the BIC criterion is computed for each set of chosen lags m, n, p, q and the model with the lowest BIC is used.
Definition 2.12 (BIC criterion)
The BIC criterion is defined as

$$\mathrm{BIC} = -2\ln L + k\ln T \qquad (12)$$

where L is the maximized value of the likelihood function, k is the number of free parameters used in the model and T is the number of observations.
In other words, the BIC criterion is a goodness-of-fit measure of an estimated statistical model. It consists of two parts: a compromise takes place between the maximized log likelihood (the lack-of-fit component) and k ln T (the penalty component), which increases with the number of parameters and observations and thus prevents overfitting (Javed and Mantalos, 2013).
2.3 Multivariate Time Series
The previous sections mostly focused on univariate time series. In reality, a financial institution often holds multi-asset portfolios and it is therefore important to consider multivariate time series.

When working with multivariate returns, it is important to consider the joint distribution and not only the marginal distributions of each individual time series. In other words, it is important to consider the dependence structure of the time series rather than just applying mathematical models to the marginal distributions.
2.3.1 Dependence
Two random variables are independent if any information on either variable does not affect the distribution of the other random variable (Meucci, 2005). To introduce the concept of dependence, consider the concept of conditional distributions:

Definition 2.13 (Conditional distribution in two dimensions)
Consider an N-dimensional random variable X and split it into two subsets: the k-dimensional random variable XA of the first k entries and the (N−k)-dimensional random variable XB of the remaining entries, so that X can be expressed as:

$$X = \begin{pmatrix} X_A \\ X_B \end{pmatrix}.$$

The conditional distribution of the variable XB given xA is the distribution of XB knowing that the realization of XA is the specific value xA. The conditional random variable is denoted XB|xA.

Formally, we denote the conditional probability density function (PDF) as:

$$f_{X_B|x_A}(x_B) = \frac{f_X(x_A, x_B)}{\int f_X(x_A, x_B)\,dx_B} = \frac{f_X(x_A, x_B)}{f_{X_A}(x_A)}.$$

In other words, we define the conditional PDF of XB given knowledge of XA as the joint PDF of XA and XB divided by the marginal PDF of XA evaluated at the observed xA.
To describe how different financial time series behave over time in terms of each other, it is common to use correlation or covariance as measures. Covariance is the multivariate form of variance and the covariance matrix is defined as:

Definition 2.14 (Covariance matrix)
Consider the multivariate financial time series X. The covariance matrix of X is then defined as:

$$\mathrm{Cov}(\mathbf{X}) = \Sigma = E\big((\mathbf{X} - E(\mathbf{X}))(\mathbf{X} - E(\mathbf{X}))^{T}\big)$$

where E represents the expected value. Each element in Σ can be expressed as:

$$\sigma_{i,j} = \mathrm{Cov}(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j)$$

which is the covariance between asset i and asset j.

With the help of covariance, we can now also define a correlation matrix:
Definition 2.15 (Correlation matrix)
Consider the multivariate financial time series of asset returns X with N time series. The correlation matrix ρ(X) of X is defined by introducing the standardized vector Y such that $Y_i = X_i/\sqrt{\mathrm{Var}(X_i)}$ for i = 1, ..., N and taking ρ(X) = Cov(Y).

Each element in ρ(X), the correlation between the i:th and j:th asset in X, can be expressed as:

$$\rho_{i,j} = \rho(X_i, X_j) = \frac{\mathrm{Cov}(X_i, X_j)}{\sqrt{\mathrm{Var}(X_i)\mathrm{Var}(X_j)}}. \qquad (13)$$
When simulating multi-asset returns, one should not model the assets independently; it is important to account for the dependence between the time series. Historically, some equities tend to move together while others might not, e.g. when the market is upward trending, some assets' values increase in a similar fashion and it would therefore not make sense to simulate each asset individually. Because of stock movements, the dependence might change over time, and the covariance matrix Σ and the correlation matrix ρ(X) depend on the number of observations that are included in the time series.

When using covariance or correlation to describe dependencies, it is sometimes common practice to put larger weights on the most recent covariances and correlations. Just like in the univariate case, we can do so through the EWMA technique described in Definition 2.5. Equation 3 describes how to estimate the one-dimensional covariance, i.e. the variance $\sigma_t^2$. In the multivariate case one would instead like to estimate the covariance matrix Σt. It is easy to construct a multivariate model from the univariate one by exchanging the single return x for the vector of returns x. By doing so, Equation 3 becomes:

$$\Sigma_t = \frac{1-\lambda}{\lambda(1-\lambda^{W_E})}\sum_{i=1}^{W_E}\lambda^i \mathbf{x}_{t-i}\mathbf{x}_{t-i}^{T}. \qquad (14)$$
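A direct implementation of Equation 14 (a sketch assuming a T×N matrix of demeaned returns with the newest observation in the last row) could look as follows.

```python
import numpy as np

def ewma_covariance(returns, lam=0.94, window=None):
    """EWMA covariance matrix (Equation 14) from a T x N matrix of returns,
    where the last row is the most recent observation x_{t-1}."""
    X = np.asarray(returns, dtype=float)
    W = window or len(X)
    X = X[-W:]
    weights = lam ** np.arange(1, W + 1)          # lambda^i for i = 1..W (newest first)
    scale = (1 - lam) / (lam * (1 - lam ** W))
    Xrev = X[::-1]                                # reverse so row i-1 corresponds to x_{t-i}
    return scale * (Xrev * weights[:, None]).T @ Xrev

# Example with three hypothetical return series:
rng = np.random.default_rng(5)
r = rng.standard_normal((1000, 3)) * 0.01
print(ewma_covariance(r).round(6))
```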
2.3.2 Copula
A recurring problem in financial time series analysis is finding a structure for the dependence between multiple time series. In the previous section, theory regarding covariance and correlation was provided, which are measures of dependence. However, as Alexander (2009b) emphasizes, these are measures of dependence that can only represent a certain type of risk, where each asset return must follow an i.i.d. process and the joint distribution of the variables must be elliptical. In
practice, few assets and portfolios satisfy these conditions. Instead we must work with the entire joint distribution of returns, which is built by first specifying the marginal distributions for each time series and then, with the help of a copula, describing the dependence between the marginals.

Copula theory was introduced by Sklar (1959) and is based on the idea of isolating the dependence structure from the structure of the marginal distributions. The advantage of this is that each marginal distribution may differ and the copula can still describe the dependency between them. Basically, copulas are used to specify a joint distribution in a two-stage process. First we specify the type of the marginal distributions and then we specify the copula distribution which "binds" the marginals together.

Since the copula only specifies the dependence structure, different copulas produce different joint distributions when applied to the same marginals. There are multiple families of copulas, which can be found in Alexander (2009b) or McNeil, Rüdiger, and Embrechts (2015). In this study, due to their common usage in the financial industry, both the Gaussian and the t-copula will be applied in the MCS method.

Before defining a copula, recall the following statistical properties described by Hult et al. (2012):

Proposition 2.1
Let F be a distribution function on R. Then:

1. u ≤ F(x) if and only if F−1(u) ≤ x.

2. If F is continuous, then F(F−1(u)) = u.

3. (Quantile transform) If U is U(0,1)-distributed, then P(F−1(U) ≤ x) = F(x).

4. (Probability transform) If X has a distribution function F, then F(X) ∼ U(0,1) if and only if F is continuous.
Now, the concept of a copula is defined as:

Definition 2.16 (Copula)
An N-dimensional copula is the distribution function C of a random vector U whose components Uk, k = 1, 2, ..., N, are uniformly distributed, i.e.:

$$C(u_1, ..., u_N) = P(U_1 \le u_1, ..., U_N \le u_N)$$

where (u1, ..., uN) ∈ (0,1)^N.

Now, let X = (X1, ..., XN) be a random vector with joint distribution

$$F(x_1, ..., x_N) = P(X_1 \le x_1, ..., X_N \le x_N)$$

and continuous marginal distribution functions

$$F_k(x_k) = P(X_k \le x_k)$$

for k = 1, ..., N.

From Proposition 2.1 we know that the components of the vector

$$\mathbf{U} = (U_1, ..., U_N) = (F_1(X_1), ..., F_N(X_N))$$

are uniformly distributed. More specifically, we can also say that the distribution C of U is a copula, which we will call the copula function of X since:

$$C(F_1(x_1), ..., F_N(x_N)) = P(U_1 \le F_1(x_1), ..., U_N \le F_N(x_N)) = P(F_1^{-1}(U_1) \le x_1, ..., F_N^{-1}(U_N) \le x_N) = F(x_1, x_2, ..., x_N) \qquad (15)$$

Equation 15 is the result of Sklar's Theorem, which explains the representation of the joint distribution F in terms of the copula C and the marginal distributions Fk (Ruschendorf, 2013).

As previously stated, we have different copula families, all of which apply a separate dependence structure to the marginals. Hence, it is of great importance to choose a copula that represents the actual dependence, which is not always a trivial task. Below follow definitions of two of the most commonly used copulas for financial time series, the Gaussian and the t-copula.
Definition 2.17 (Gaussian copula)
The Gaussian copula is derived from the N-dimensional multivariate and the univariate standard normal distribution functions, denoted $\boldsymbol{\Phi}$ and $\Phi$ respectively. It is defined by:

$$C(u_1, u_2, ..., u_N; \rho(\mathbf{X})) = \boldsymbol{\Phi}\big(\Phi^{-1}(u_1), \Phi^{-1}(u_2), ..., \Phi^{-1}(u_N)\big),$$

where ρ(X) is the correlation matrix.

Definition 2.18 (t-copula)
The N-dimensional symmetric t-copula is derived implicitly from a multivariate t-distribution function and defined by:

$$C_v(u_1, u_2, ..., u_N; \rho(\mathbf{X})) = \mathbf{t}_v\big(t_v^{-1}(u_1), t_v^{-1}(u_2), ..., t_v^{-1}(u_N)\big),$$

where $\mathbf{t}_v$ and $t_v$ are the multivariate and univariate t-distributions with v degrees of freedom and ρ(X) is the correlation matrix.

The correlation matrix ρ(X) appears in Definitions 2.17 and 2.18 and is a crucial component which accounts for the dependencies between the marginals. The correlation matrix is used when finding the joint distribution and simulating from it. The explicit formula for generating numbers from a copula can be found in Alexander (2009b).
Like any marginal or joint distribution function, copulas have conditional distributions. For simplicity, Alexander (2009b) describes the conditional distribution for a bivariate copula C(u1, u2), but this can of course be applied to multiple dimensions as well.

Let us consider the bivariate case, where there are two conditional distributions of the copula defined as:

$$C(u_1|u_2) = P(U_1 \le u_1 \mid U_2 = u_2)$$

and

$$C(u_2|u_1) = P(U_2 \le u_2 \mid U_1 = u_1).$$

The conditional distributions are derived by taking the first derivatives of the copula with respect to each variable:

$$C(u_1|u_2) = \frac{\partial C(u_1, u_2)}{\partial u_2}, \qquad C(u_2|u_1) = \frac{\partial C(u_1, u_2)}{\partial u_1}.$$
2.3.3 Simulation with copula
Using copulas for simulation purposes is a vital part of this
thesis. The methodwill be applied to the benchmark method, MCS, to
account for the dependenciesbetween the equities. Below follows a
general description of simulation withcopula by Salvadori et al.
(2007).
Let F be a multivariate distribution with continuous marginals F1, F2, ..., FN, and suppose that F can be expressed in a unique way via an N-copula, C, through Equation 15. In order to simulate a vector (X1, X2, ..., XN) ∼ F, it is sufficient to simulate a vector (U1, U2, ..., UN) ∼ C, where the random variables Ui are uniformly distributed on (0,1). By Equation 15 and Proposition 2.1 we know:

Ui = Fi(Xi) ⟺ Xi = Fi^{-1}(Ui),

where i = 1, 2, ..., N, the random variables Xi have the marginal distributions Fi, and joint distribution F. In general, to simulate values we apply the following steps:
• To simulate a sample uk from Uk, conditional on the previously sampled u1, u2, ..., u_{k−1}, we need to know the distribution of Uk conditional on the events {U1 = u1, U2 = u2, ..., U_{k−1} = u_{k−1}}. Let us denote this law by Gk(uk | u1, u2, ..., u_{k−1}), given by:

Gk(uk | u1, u2, ..., u_{k−1}) = P(Uk ≤ uk | U1 = u1, U2 = u2, ..., U_{k−1} = u_{k−1})
= ∂_{u1,u2,...,u_{k−1}} C(u1, u2, ..., uk, 1, ..., 1) / ∂_{u1,u2,...,u_{k−1}} C(u1, u2, ..., u_{k−1}, 1, ..., 1).

Then we take uk = Gk^{-1}(u'k | u1, u2, ..., u_{k−1}), where u'k is the realization of a random variable U'k that is uniformly distributed and independent of U'1, U'2, ..., U'_{k−1}.
• Using the probability integral transform, it is then easy to generate the sample (x1, x2, ..., xN) extracted from F:

(x1, x2, ..., xN) = (F1^{-1}(u1), F2^{-1}(u2), ..., FN^{-1}(uN)).    (16)
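As a concrete illustration of this procedure, the following is a minimal sketch in Python for the special case of a Gaussian copula (Definition 2.17): correlated standard normals are drawn via a Cholesky factor of ρ(X), mapped to uniforms with Φ, and then pushed through the inverse marginals as in Equation 16. The function name gaussian_copula_sample and the example marginals are hypothetical and chosen only for illustration; this is not the exact implementation used later in the thesis.

    import numpy as np
    from scipy import stats

    def gaussian_copula_sample(corr, marginal_ppfs, n_sim, seed=0):
        # Draw n_sim vectors whose dependence follows a Gaussian copula with
        # correlation matrix `corr` and whose marginals are given by the
        # inverse CDFs (ppf) in `marginal_ppfs`, as in Equation 16.
        rng = np.random.default_rng(seed)
        L = np.linalg.cholesky(corr)                             # corr = L L^T
        z = rng.standard_normal((n_sim, corr.shape[0])) @ L.T    # correlated normals
        u = stats.norm.cdf(z)                                    # U_i = Phi(Z_i), uniform marginals
        return np.column_stack([ppf(u[:, i])                     # X_i = F_i^{-1}(U_i)
                                for i, ppf in enumerate(marginal_ppfs)])

    # Hypothetical usage: two t-distributed marginals joined by a 0.7-correlation copula.
    corr = np.array([[1.0, 0.7], [0.7, 1.0]])
    sample = gaussian_copula_sample(corr, [stats.t(3).ppf, stats.t(5).ppf], n_sim=10_000)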
2.4 VaR approaches
Financial institutions apply different methods to calculate VaR. In the sections below, two of the most common methods, Historical Simulation and Monte Carlo Simulation, are presented.
2.4.1 Historical Simulation
In the HS approach, few assumptions about the statistical distributions need to be made. The approach involves using historical risk factor changes, or log returns, to construct a distribution of potential future portfolio P&Ls and then extracting VaRα as the loss that is exceeded only α·100% of the time (Linsmeier and Pearson, 2000).

The distribution of the portfolio's P&Ls is obtained by taking the current portfolio and applying the actual historical changes of the assets in the portfolio over the past N periods. This allows us to compute N hypothetical scenarios for our current portfolio, and we can extract the α·N:th scenario of the ordered portfolio values as our −VaRα.
Formally, Linsmeier and Pearson (2000) describe the process of HS in five steps:
• Step 1 - Identify the basic risk factors for the current portfolio (in our case, the stocks' log returns).
• Step 2 - Obtain historical values of the risk factors over the past N periods.
• Step 3 - Subject the current assets of the portfolio to the historical risk factor changes and calculate the daily P&Ls.
• Step 4 - Order the N portfolio P&Ls from the largest loss to the largest profit.
• Step 5 - Extract the loss that is equaled or exceeded α·100% of the time.
HS is a non-parametric approach and is only reasonable under the assumption that the market changes which historically produced a certain set of returns are the same as those that will occur in the future. HS has advantages in the form of its simplicity, both in comprehension and in computation. However, the major disadvantage is that it relies solely on the assumption that history will repeat itself. Results will vary depending on the chosen history, and an under- or overestimation of extreme losses is likely.
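A minimal sketch of the five-step HS procedure for a single long position might look as follows. The function name historical_var is hypothetical; the sketch assumes a one-dimensional price series and reports VaR as a positive dollar loss.

    import numpy as np

    def historical_var(prices, position_value, alpha, window):
        # Steps 1-2: the risk factor changes are the daily log returns over the window.
        log_ret = np.diff(np.log(prices))[-window:]
        # Step 3: apply the historical changes to the current position to get daily P&Ls.
        pnl = position_value * (np.exp(log_ret) - 1.0)
        # Step 4: order the P&Ls from largest loss to largest profit.
        pnl_sorted = np.sort(pnl)
        # Step 5: the alpha * N:th worst P&L, reported as a positive loss.
        return -pnl_sorted[int(np.floor(alpha * window))]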
Example 2.2 (Historical Simulation)
For illustration purposes, we assume that a bank holds one long position in the univariate S&P500 index on the first of December 2017 and wants to complete a HS to obtain the 1-day VaRα with 252, 500 and 1000 days in the observation window, at α-levels of 0.005, 0.01, 0.05 and 0.1. We follow the five-step process described above:
• Step 1 - The risk factor is the daily log return of the S&P500 index.
• Step 2 - Historical prices are presented in Figure 2. Log returns are calculated by applying Equation 2 to the values in the observation windows.
• Step 3 - The current value of the position is $2238.8. Apply the log returns obtained in Step 2 to the current portfolio and calculate the P&Ls.
• Step 4 - Order the P&Ls from the largest loss to the largest profit.
• Step 5 - Obtain the loss that is exceeded α·100% of the time, i.e. the α·N:th value in the ordered vector of P&Ls, where N is the number of observations.
Figure 8 illustrates the P&L distribution and the 1-day VaR0.01 obtained by HS. Each observation window gives a specific P&L distribution and therefore also a different VaRα result.
[Figure 8 (histograms of P&L vs. number of observations): P&L distribution of S&P500 obtained from HS with 252 (a), 500 (b) and 1000 days (c) as observation window. The red lines represent the 1-day VaR0.01.]
Table 1 presents the 1-day VaRα for different values of α and different observation windows, and as we can see, there is variation in the results. For example, the VaR0.005 with 252 days of observations equals $81.9, while 1000 days of observations gives $58.2. One can therefore conclude that the market has in general experienced more frequent large losses over the past 252 days than over the past 1000 days, thus resulting in a higher VaR0.005.
Table 1: HS 1-day VaR for S&P500 with different values of α and different lengths of observation windows.

α       252 days    500 days    1000 days
0.005   $81.9       $72.5       $58.2
0.01    $55.6       $58.2       $52.0
0.05    $31.1       $32.8       $31.0
0.1     $21.0       $23.1       $20.8
From Example 2.2 it should be clear that there are uncertainties involved in HS and that the result depends heavily on the chosen α level as well as on the size of the observation window.
2.4.2 Monte Carlo Simulation
MCS has a few similarities to HS but differs in one major aspect: when applying MCS, one chooses a distribution that is believed to adequately capture or approximate the possible risk factor changes, rather than using the observed changes over the last N periods.
Formally, the definition of MCS is the average expectation of the function f (in our case, f = σz from Equation 9) evaluated at a random location (Russel, 1998):

Definition 2.19 (Monte Carlo method)
Consider an integral over the one-dimensional unit interval,

I[f] = ∫₀¹ f(x) dx = f̄.

Let x be a random variable that is uniformly distributed on the unit interval; then

I[f] = E[f(x)].
For an integral over the unit cube I^d = [0,1]^d in d dimensions we have:

I[f] = E[f(x)] = ∫_{I^d} f(x) dx,

in which x is a uniformly distributed vector in the d-dimensional unit space.

The Monte Carlo method is based on this probabilistic interpretation of an integral. If we consider a sequence {x_n} sampled from the uniform distribution, we can approximate the expectation as:

I_N[f] = (1/N) ∑_{n=1}^{N} f(x_n),

and according to the Strong Law of Large Numbers this approximation converges with probability one, i.e.:

lim_{N→∞} I_N[f] = I[f].
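The following short sketch, using a simple hypothetical integrand, illustrates Definition 2.19: the integral is estimated by the sample mean I_N[f] of uniformly drawn points, which converges to I[f] by the Strong Law of Large Numbers.

    import numpy as np

    rng = np.random.default_rng(42)
    f = lambda x: x ** 2            # example integrand; the true integral over [0,1] is 1/3
    x = rng.uniform(0.0, 1.0, 100_000)
    estimate = f(x).mean()          # I_N[f] = (1/N) * sum f(x_n)
    print(estimate)                 # approaches 1/3 as N grows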
In MCS, a pseudo-random number generator is used to generate (often) many thousands of hypothetical risk factor changes, which are then applied to the current positions to create hypothetical future portfolio P&Ls. Finally, the VaR is determined by extracting the α quantile of the obtained P&L distribution. Linsmeier and Pearson (2000) describe the process of MCS in five steps:
• Step 1 - Identify the risk factors for the current portfolio (in our case, the stocks' returns).
• Step 2 - Determine or assume a specific distribution for changes in the risk factors and estimate the parameters of that distribution.
• Step 3 - Use a pseudo-random number generator to generate N hypothetical values of changes in the risk factors, where N is large. These hypothetical changes are applied to the assets in the portfolio and used to calculate N hypothetical portfolio values. Finally, from each of the hypothetical portfolio values, one subtracts the current portfolio value to obtain N hypothetical daily P&Ls.
• Step 4 - Order the portfolio P&Ls from the largest loss to the largest profit.
• Step 5 - Select the loss that is equaled or exceeded α·100% of the time.
The ability to pick the distribution is the feature that really distinguishes MCS from other approaches. Here, it is possible to choose a distribution based on one's own
preferences, even though it is often based on observed data. The distribution can also include tweaks in terms of personal assumptions about future market changes.
MCS is a very powerful and flexible technique used in many areas. One of its advantages is that the method can incorporate any desirable distributional properties that one might want to include (e.g. fat tails). However, the result depends heavily on the chosen distribution, and different choices can give highly different results. Furthermore, MCS is a complex method, and for high-dimensional portfolios it is very computationally demanding.
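Before turning to the numerical example, the sketch below outlines the five-step MCS procedure for a single long position, assuming a Student t-distribution is fitted to the historical log returns. The function name mcs_var is hypothetical and the sketch is only meant to illustrate the steps.

    import numpy as np
    from scipy import stats

    def mcs_var(prices, position_value, alpha, n_sim, seed=0):
        rng = np.random.default_rng(seed)
        # Steps 1-2: the risk factor is the log return; fit a Student t-distribution to it.
        log_ret = np.diff(np.log(prices))
        df, loc, scale = stats.t.fit(log_ret)
        # Step 3: generate n_sim pseudo-random risk factor changes and the hypothetical P&Ls.
        sim_ret = stats.t.rvs(df, loc=loc, scale=scale, size=n_sim, random_state=rng)
        pnl = position_value * (np.exp(sim_ret) - 1.0)
        # Steps 4-5: order the P&Ls and read off the alpha-quantile loss.
        pnl_sorted = np.sort(pnl)
        return -pnl_sorted[int(np.floor(alpha * n_sim))]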
Example 2.3 (Monte Carlo simulation)
For comparison purposes, we assume that a bank holds the same position as in Example 2.2 and wants to complete an MCS to obtain the 1-day VaRα with 1000, 5000, 10000 and 100000 simulations and α-levels of 0.005, 0.01, 0.05 and 0.1. We follow the five-step process above:
• Step 1 - The risk factor is the daily return of the S&P500 index.
• Step 2 - Historical prices are presented in Figure 2 and the log returns are calculated by applying Equation 2 to the total time series. Fit a distribution to these log returns.
• Step 3 - Generate 1000, 5000, 10000 and 100000 pseudo-random numbers from the distribution to represent the potential future risk factor changes. Apply these to the current portfolio to obtain the P&L distribution.
• Step 4 - Order the P&Ls from the largest loss to the largest profit.
• Step 5 - Obtain the loss that is exceeded α·100% of the time, i.e. the α·N:th value in the ordered P&Ls, where N is the number of simulations.
Figure 9 shows the distribution of the S&P500's log returns between January 1st 2000 and December 1st 2016. The distribution that appears to describe the returns best is the Student t-distribution. More precisely, the fitted distribution has a mean of 0.000483, a standard deviation of 0.0074 and 2.7293 degrees of freedom.
[Figure 9 (density vs. log return): Distribution of S&P500 log returns, together with fitted normal and Student t densities.]
From the fitted t-distribution, we generate N pseudo-random numbers. Figure 10 presents the obtained P&L distributions for 1000, 5000, 10000 and 100000 simulations. The red lines represent the VaR0.01.
[Figure 10 (histograms of P&L vs. number of observations): P&L distribution of S&P500 obtained from the MCS method with 1000 (a), 5000 (b), 10000 (c) and 100000 (d) simulations. The red lines represent the 1-day VaR0.01.]
Table 2 shows the 1-day VaRα for different values of α and different numbers of simulations. As we can see, there are large differences between the numbers of simulations, especially for VaR at lower α. For example, the VaR0.005 obtained with 1000 simulations equals $165.4, while 100000 simulations result
in $110.0. Furthermore, it appears that as the number of simulations increases, the VaRα converges to a specific value, and more simulations will most likely not result in a more accurate VaRα.
Table 2: MCS 1-day VaR for S&P500 with different values of α and different numbers of simulations.

α       1000 sim    5000 sim    10000 sim   100000 sim
0.005   $165.4      $116.9      $110.2      $110.0
0.01    $85.8       $88.9       $85.8       $85.6
0.05    $41.6       $45.0       $42.2       $42.1
0.1     $28.7       $30.4       $29.3       $29.0
The results in Example 2.3 are obtained using the fitted Student t-distribution from Figure 9. The reader should be aware of the dangers of changing the distribution to, for example, a normal distribution when generating pseudo-random numbers. As can be observed in Figure 9, the normal distribution does not accurately represent the distribution of the log returns and might therefore not reflect the potential future loss.
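To illustrate the point, the short sketch below fits both a Student t and a normal distribution to the same (here synthetic, hypothetical) log-return sample and compares their 1% quantiles; the normal fit typically understates the size of the tail loss.

    import numpy as np
    from scipy import stats

    # Hypothetical, synthetic log returns only for illustration.
    log_ret = 0.0074 * np.random.default_rng(1).standard_t(3, size=4000) + 0.0005
    t_q = stats.t.ppf(0.01, *stats.t.fit(log_ret))          # 1% quantile under the fitted t
    n_q = stats.norm.ppf(0.01, *stats.norm.fit(log_ret))    # 1% quantile under the fitted normal
    print(t_q, n_q)   # the t quantile typically lies further out in the left tail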
2.5 Dimension Reduction
One way of speeding up a simulation is to reduce a high-dimensional problem to a lower-dimensional one. Dimension reduction techniques are used for many statistical problems, and below the reader is presented with one such technique, principal component analysis (PCA), together with an application of it to GARCH models.
2.5.1 Principal Component Analysis
PCA is a statistical technique which allows the user to obtain a set of relatively few describing variables from a large set of variables. The technique is popular when working in high dimensions because it is an effective approach for reducing dimensionality while still describing the majority of the variance in the data set (James et al., 2014).
Kreinin et al. (1998) describe how PCA can be applied in MCS. The idea behind PCA is fairly straightforward when considering a set of risk factors and their joint distribution. First, PCA obtains a new set of uncorrelated vectors that are linear combinations of the original data. These variables are called principal components (PCs). An MCS of the risk factors is performed by generating independent random forecasts of the uncorrelated PCs and then using the linear transformation to obtain the forecasted correlated risk factors. By selecting only a subset of the PCs that represents a large part of the variance in the data, the dimensionality is reduced.
Alexander (2009b) describes PCA as a technique based on the eigenvalue-eigenvector decomposition of the risk factors' correlation matrix. So, when applying PCA to a data set containing multiple financial time series, we first denote by Y the T × N matrix of daily log returns to be analyzed by PCA, where T is the number of data points used in the analysis and N is the number of equities. Let y_{t,i} denote the daily return of stock i at time t, i = 1, 2, ..., N, t = 1, ..., T. Normalizing the data gives:

x_{t,i} = (y_{t,i} − µi) / σi,    (17)

where µi and σi are the mean and standard deviation of yi. Normalization is an important step in PCA since the results are sensitive to the scaling of the data.
The T × N matrix X now represents the data set of normalized log returns, and its columns x1, x2, ..., xN are the univariate time series.

The standardized log returns' correlations are summarized by the N × N matrix

X^T X,

whose diagonal elements are ones.
As already stated, PCA utilizes the eigenvalue-eigenvector decomposition, and by extracting the eigenvectors of X^T X we get what is referred to as the factor weights, which are the column vectors of the matrix W. W is an orthogonal matrix, and we denote by Λ the diagonal matrix of the corresponding eigenvalues. The weights in W are crucial for finding the PCs, which we formally define as:
Definition 2.20 (Principal components)
Let X be the T × N matrix of T data points and N equities. Then the principal components are defined as vectors which are linear combinations of the columns of X, where the vector weights are chosen in such a way that (Alexander, 2009a):
• the principal components are uncorrelated with each other,
• the first principal component explains the most variation (i.e. the greatest amount of the total variation in X), the second component explains the greatest amount of the remaining variation, etc.

Now, let Λ denote the diagonal matrix of ordered eigenvalues obtained from X^T X and W the orthogonal matrix of corresponding eigenvectors, also called weights, ordered according to Λ. Then the principal components are the columns of a T × N matrix P, defined as the product of the matrix of input data X and the eigenvector matrix W:

P = XW.
More specifically, the m:th principal component is defined as the m:th column of P, which can be written as:

p_m = w_{1m} x1 + w_{2m} x2 + ... + w_{Nm} xN,    (18)

where w_m = (w_{1m}, w_{2m}, ..., w_{Nm})^T is the eigenvector corresponding to the eigenvalue λm in Λ, the m:th largest eigenvalue of X^T X.
Equation 18 shows that each PC is a linear combination of the columns of X with factor weights given by the elements of W. The total variation in X is the sum of the eigenvalues in Λ, i.e. λ1 + λ2 + ... + λN. Hence, the proportion of this total variation that is explained by the m:th principal component is

λm / (λ1 + ... + λN),

and in a highly correlated system, the first eigenvalue alone can explain a large part of the total variation.
Alexander (2009a) continues and describes how we can represent the data with the principal components. Since P = XW and W^T = W^{-1}, we have

X = PW^T.

In other words, each of the original return series in X can be described as a linear combination of the principal components by:

xi = w_{i1} p1 + w_{i2} p2 + ... + w_{iN} pN,    (19)

and since, in a highly correlated system, there are only a few independent sources of variation, we can represent the matrix X with k < N principal components and still explain a large part of the variation. For example, by using the first three principal components we get:

xi ≈ w_{i1} p1 + w_{i2} p2 + w_{i3} p3,

which in matrix notation is:

X ≈ P*W*^T,    (20)

where X is the T × N matrix of standardized returns, P* is the T × k matrix whose k columns are the first k principal components, and W* is the N × k matrix whose k columns are the first k eigenvectors.
Another way of describing X with only k principal components is through:

X = P*W*^T + E,

where E is the part of the variation that is not captured by using only k PCs.

When choosing the number of PCs to use, we usually want to explain a certain proportion of the variation in the data set. Kreinin et al. (1998) suggest an approach where ε* is an admissible proportion of unexplained variance. We then select the minimal number k of principal components satisfying the inequality

(λ1 + ... + λk) / (λ1 + ... + λN) > 1 − ε*.    (21)

The PCs with indices j > k have a smaller effect on the underlying vector of risk factors since the corresponding eigenvalues are smaller.
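The PCA steps above can be summarized in a short sketch: it standardizes the returns as in Equation 17, eigendecomposes the correlation matrix, keeps the smallest number k of components satisfying Equation 21, and forms the approximation in Equation 20. The function name pca_reduce and the argument eps_star are hypothetical.

    import numpy as np

    def pca_reduce(Y, eps_star=0.05):
        # Equation 17: standardize the T x N return matrix column by column.
        X = (Y - Y.mean(axis=0)) / Y.std(axis=0)
        corr = (X.T @ X) / X.shape[0]                  # correlation matrix of standardized returns
        eigval, W = np.linalg.eigh(corr)               # eigenvalues/eigenvectors (ascending order)
        order = np.argsort(eigval)[::-1]               # reorder so the largest eigenvalue comes first
        eigval, W = eigval[order], W[:, order]
        explained = np.cumsum(eigval) / eigval.sum()
        k = int(np.searchsorted(explained, 1.0 - eps_star) + 1)   # smallest k satisfying Equation 21
        P_star = X @ W[:, :k]                          # first k principal components (P = XW)
        X_approx = P_star @ W[:, :k].T                 # Equation 20: X ~ P* W*^T
        return P_star, W[:, :k], k, X_approx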
2.5.2 Orthogonal ARMA-GARCH
It might not be obvious how PCA can be used as a forecasting technique, but Equation 19 shows that each equity's standardized return can, for all data points, be described with the help of the obtained PCs and the weights. This means that the PCs can be viewed as individual time series which are uncorrelated with each other.

So, when applying PCA for forecasting purposes, the goal is to forecast tomorrow's PC coefficients, and since the PCs can be seen as individual time series, it is possible to use the same models as in MCS. Definition 2.7 presents a method for forecasting conditional variances, and Alexander (2009b) refers to applying GARCH models to the orthogonal PCs as orthogonal GARCH (O-GARCH). Just as with a regular financial time series, we might also want to apply an ARMA model to handle time dependencies, which is then referred to as an O-ARMA-GARCH process.
Since the PCs are independent, univariate GARCH models can be used and we do not have to account for dependencies between the PCs. Once k conditional variances are forecasted and new PC coefficients calculated, it is easy to back-transform the results to obtain forecasted returns by:

X_t = P̃*W*^T,

where X_t is the forecasted vector of standardized returns, P̃* is the vector of k forecasted PC values and W* contains the k corresponding vectors of weights. Alexander (2009b) emphasizes that when forecasting X_t we do not have to forecast a new W*
but rather use the observed matrix, since it appears to be fairly constant between observations.
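A minimal sketch of this back-transform, assuming GARCH(1,1) parameters (omega, alpha, beta) have already been estimated for each PC, could look like the following. The helper o_garch_forecast is hypothetical and uses a plain GARCH(1,1) variance recursion with a normal innovation for each component; it is not the exact model fitted in this thesis.

    import numpy as np

    def o_garch_forecast(P_star, W_star, omega, alpha, beta, seed=0):
        # P_star: T x k matrix of historical PC values, W_star: N x k weight matrix.
        rng = np.random.default_rng(seed)
        k = P_star.shape[1]
        sigma2 = np.empty(k)
        for j in range(k):                                   # one univariate GARCH(1,1) per PC
            p = P_star[:, j]
            s2 = p.var()                                     # initialize with the sample variance
            for t in range(1, len(p)):                       # filter the conditional variance
                s2 = omega[j] + alpha[j] * p[t - 1] ** 2 + beta[j] * s2
            sigma2[j] = omega[j] + alpha[j] * p[-1] ** 2 + beta[j] * s2   # tomorrow's variance
        pc_forecast = np.sqrt(sigma2) * rng.standard_normal(k)            # sigma * z
        return pc_forecast @ W_star.T                        # X_t = P~* W*^T, standardized returns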
2.5.3 Cluster Analysis
The previous section describes PCA and why it is a good method to apply when the data is highly correlated. While this might be the case for interest rates or some types of derivatives, it is not the case for equities. In fact, equity price series can vary a lot, as shown in Figures 11a and 11b. The former shows Apple's and Amgen's stock prices while the latter shows Apple's and Arcerol Mittal's, all of which are traded on the New York Stock Exchange. Figure 11a provides a good example of two stocks with high correlation, i.e. the price movements are similar, while Figure 11b illustrates a low correlation.
[Figure 11: Illustration of historical closing prices. (a) Historical closing prices of Apple and Amgen with ρ = 0.9151. (b) Historical closing prices of Apple and Arcerol Mittal with ρ = −0.6824.]
The data set used in this thesis includes 2520 equities and is therefore bound to include price series with both high and low correlations. When this is the case, Alexander (2009b) suggests dividing the data set into clusters of highly correlated equities and subsequently performing PCA on each cluster. By doing so, the first PC of each cluster, which describes the general trend of the data, will be more accurate in terms of describing all stocks in that particular cluster and will thus also generate a more accurate forecast. By combining the results from each cluster, the total portfolio result is obtained.

To organize the stocks into clusters it is common to use a distance metric, d_{i,j}, between for example stocks i and j. Mantegna (1999) emphasizes that a distance metric must uphold the following properties:
1. d_{i,j} ≥ 0,
2. d_{i,j} = 0 ⟺ i = j,
3. d_{i,j} = d_{j,i},

and since the correlation metric described in Equation 13 does not always uphold property (1), it cannot be used without manipulation. A simple but effective method, which is applied in this thesis, is to let:

d_{i,j} = 1 − ρ_{i,j}.    (22)

By doing so, highly correlated stocks will have distance metrics close to zero and all metrics are ≥ 0.
Dividing the data into c clusters based on their distance metrics can be accomplished by first creating a spanning tree (ST). This is a concept from graph theory, where many different methods can be used depending on the clustering preferences (West, 2001). Tola et al. (2008) describe the method used in this study, complete linkage clustering, in the following way:

Assume the matrix D consists of the distances between pairs of elements in the system to be clustered. Begin by disjointing distance metrics with value 0. Start from N clusters, each containing one item. Then, at each iteration:

1. Using the current matrix D of cluster distances, find the two closest clusters.
2. Update the list of clusters by merging the two closest entries.
3. Update the distance matrix accordingly.

Repeat until all items are joined in one cluster.
The complete linkage clustering method updates the distance matrix in step 3 in a particular way: the distance between the new, merged entry and another item or cluster is the largest distance between any pair of items in these clusters. For illustration purposes I first give a short code sketch of the procedure and then present a simple example:
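For reference, the clustering step can be sketched with standard hierarchical clustering routines, assuming a T x N matrix of returns as input; the distance is the one in Equation 22 and the linkage method is complete linkage. The function name correlation_clusters and the choice of scipy are assumptions made only for illustration.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def correlation_clusters(returns, n_clusters):
        corr = np.corrcoef(returns, rowvar=False)            # N x N correlation matrix of the columns
        dist = 1.0 - corr                                    # Equation 22: d_ij = 1 - rho_ij
        np.fill_diagonal(dist, 0.0)
        condensed = squareform(dist, checks=False)           # condensed form expected by linkage
        tree = linkage(condensed, method='complete')         # complete linkage clustering
        return fcluster(tree, t=n_clusters, criterion='maxclust')   # cluster label per stock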
Example 2.4 (Complete linkage clustering)
Assume we have the closing prices between the 1st of January 2010 and the 31st of December 2017 for five stocks: Bank of America (BAC), Citibank (C), J.P. Morgan (JPM), Apple (APPL) and Amazon (AMZN). To cluster these stocks based on their correlations with the complete linkage clustering method, we start by calculating each pair's correlation, ρ_{i,j}, using Equation 13. We get the correlation matrix (due to symmetry, only the upper triangular part is shown):

Table 3: Correlation matrix for the five stocks.

        BAC     C       JPM     APPL    AMZN
BAC     1       0.92    0.90    0.67    0.76
C               1       0.91    0.72    0.75
JPM                     1       0.87    0.93
APPL                            1       0.86
AMZN                                    1
Now, calculate the distance metric d_{i,j} for each pair of stocks by applying Equation 22. We obtain the distance matrix D:

Table 4: Distance matrix for the five stocks.

        BAC     C       JPM     APPL    AMZN
BAC     0       0.08    0.10    0.33    0.24
C               0       0.09    0.28    0.25
JPM                     0       0.13    0.07
APPL                            0       0.14
AMZN                                    0
The ST corresponding to the distance matrix D is computed using the complete linkage method. In Table 4 we find that the minimum distance is between AMZN and JPM (d = 0.07). The first step is thus to merge these two equities into one cluster, AMZN-JPM, which becomes a new item. We then create a new distance matrix D, where the distances between AMZN-JPM and the other items, BAC, C and APPL, are the maximum distances between the item and AMZN or JPM:

Table 5: New distance matrix including one cluster of two stocks and three individual stocks.

            AMZN-JPM    BAC     C       APPL
AMZN-JPM    0           0.24    0.25    0.14
BAC                     0       0.08    0.33
C                               0       0.28
APPL                                    0
The next shortest distance in the new distance matrix is between BAC and C (d = 0.08). This means that we now have two separate clusters containing multiple stocks in the ST, as shown in Figure 12A: the first cluster, AMZN-JPM, and the second cluster, C-BAC.

By merging BAC and C we get a new distance matrix D:

Table 6: New distance matrix including two clusters of two stocks and one individual stock.

            AMZN-JPM    BAC-C   APPL
AMZN-JPM    0           0.25    0.14
BAC-C                   0       0.33
APPL                            0
and the next two closest elements in D are AMZN-JPM and APPL (d = 0.14), which creates the ST in Figure 12B.

Once again, a new distance matrix D is computed:

Table 7: Distance matrix for one cluster of three stocks and one cluster of two stocks.

                    AMZN-JPM-APPL   BAC-C
AMZN-JPM-APPL       0               0.33
BAC-C                               0

where we find that the two clusters, AMZN-JPM-APPL and BAC-C, ar