  • Value at Risk for a high-dimensional equity portfolio

    A comparative study investigating computational complexity and accuracy for different methods

    Robin Lundberg

    Master Thesis, 30 credits
    M.Sc. in Management & Industrial Engineering, Risk Management, 300 credits

    Department of Mathematics and Mathematical Statistics
    Spring 2018

  • Abstract

    Risk management is practiced in many financial institutions, and one of the most commonly used risk measures is Value at Risk. This measure represents how much a portfolio of assets could lose over a pre-specified time horizon with a certain probability. Value at Risk is often used to calculate capital requirements and margins, which serve as collateral to cover potential losses that might occur due to market turbulence. It is important that the calculation of Value at Risk is accurate, which requires complex and time-demanding models, but many financial institutions also wish to calculate Value at Risk continuously throughout the day, which requires computational speed.

    Today's most commonly used method for calculating Value at Risk is historical simulation, which is simple but often inaccurate. It is criticized by many scholars since it depends heavily on the assumption that history will repeat itself. A substitute for historical simulation is Monte Carlo simulation, which is seen as a more accurate and robust method. However, for a high-dimensional portfolio, Monte Carlo simulated Value at Risk is very computationally demanding, and in many cases it cannot be used due to time constraints.

    The study investigates alternative methods for calculating Value at Risk with the purpose of finding a method that could be used as a substitute for the Monte Carlo method. The portfolio used in this thesis is a high-dimensional equity portfolio containing 2520 equities with 10 years of observations. I find that by first using a clustering algorithm to divide the equities into groups based on their correlation, and then applying principal component analysis to obtain a lower-dimensional problem, computational time can be reduced by approximately 99% while still providing an accurate result.

    Summary

    Risk management is a tool commonly used among financial institutions, and one of the most common risk measures is Value at Risk. This is a quantitative measure describing how much of a portfolio of assets one risks losing over a predetermined time period with a specific probability. These properties make Value at Risk a suitable measure for calculating the amount of capital needed to cover potential future losses. It is important that the calculation of Value at Risk is accurate, which places high demands on the model's complexity. In addition, many financial institutions want to calculate Value at Risk continuously throughout the day, which places further demands on the model's computational time.

    The model most commonly used today for Value at Risk calculations is historical simulation, which is a simple but often inaccurate model. Historical simulation is frequently criticized for its heavy reliance on the assumption that historical market movements will repeat themselves. An alternative to historical simulation is Monte Carlo simulation, which is often considered a more robust and accurate method. The method does, however, have drawbacks when it comes to computational time, and for a high-dimensional portfolio the time constraints risk being exceeded.

    In this thesis I investigate alternative methods for calculating Value at Risk, with the goal of finding a substitute for Monte Carlo simulation. The portfolio tested is a high-dimensional equity portfolio consisting of 2520 equities and 10 years of data. In the study I present a method that combines clustering based on the equities' correlation with principal component analysis. The method reduces computational time by approximately 99% compared to the Monte Carlo method while still producing an accurate result.


  • Acknowledgements

    I would like to express my gratitude to my supervisors Mikael Öhman and Monika Monstvilaite at Cinnober Financial Technology, who have guided me through any problems that have arisen during the model implementation. Furthermore, I would like to thank Lisa Hed at Umeå University for her guidance in writing this thesis, and Markus Ådahl at Umeå University, who has given me some valuable insights regarding the choice of methods.


    Contents

    1 Introduction
      1.1 Background
        1.1.1 Cinnober Financial Technology
        1.1.2 Value at Risk
        1.1.3 Risk Management in Practice
        1.1.4 Risks
      1.2 Problem Statement
      1.3 Purpose
      1.4 Delimitations
      1.5 Data Description
      1.6 Approach & Outline

    2 Theory
      2.1 Risk Measures
        2.1.1 Coherent Risk Measure
        2.1.2 Value at Risk
      2.2 Financial Time Series
        2.2.1 Asset Returns
        2.2.2 Volatility clustering
        2.2.3 Autocorrelation
        2.2.4 EWMA
        2.2.5 ARMA-GARCH
        2.2.6 IGARCH
        2.2.7 Model verification
      2.3 Multivariate Time Series
        2.3.1 Dependence
        2.3.2 Copula
        2.3.3 Simulation with copula
      2.4 VaR approaches
        2.4.1 Historical Simulation
        2.4.2 Monte Carlo Simulation
      2.5 Dimension Reduction
        2.5.1 Principal Component Analysis
        2.5.2 Orthogonal ARMA-GARCH
        2.5.3 Cluster Analysis
      2.6 Backtesting

    3 Model Implementation
      3.1 The Historical Simulation Method
      3.2 The Monte Carlo Method
      3.3 The Principal Component Analysis Method
      3.4 The Cluster Principal Component Analysis Method
      3.5 Backtesting

    4 Results
      4.1 Data Exploration
      4.2 VaR Result
      4.3 Model Verification
      4.4 Computational Complexity

    5 Conclusion and Further Studies

    6 Appendix
      A Total Backtesting Results
      B Best Fitting Models for Multiple Lags


    Glossary

    Annualizing factor  The factor by which a rate over a given time length is multiplied to convert it into a rate on a yearly basis.

    Clearing house  A third-party intermediary between buyers and sellers of financial instruments responsible for settling trading accounts, clearing trades, collecting and maintaining margins, and reporting trading data.

    Copula  Translates to "connection" in Latin. Here, it connects the marginal distributions of a multivariate time series to a dependency structure.

    Derivative  A financial contract whose value depends on one or multiple underlying assets.

    Exchange  An institution used for the trading of financial instruments.

    Exponential smoothing  A technique applied to a time series where exponentially decreasing weights are assigned to the observations, from the newest to the oldest observation.

    Instrument  A financial contract.

    Long/short position  Describes whether the investor has purchased an asset believing it will increase in value, or has sold the asset believing that it will decrease in value.

    Margin requirement  The percentage of a position's value that may be used as collateral to finance its purchase.

    Quantile  A cut point dividing the range of a probability distribution.

    Regulatory requirements  Rules or laws imposed by an outside (usually governmental) agency that must be met by every product or service under the purview of that agency.

    Risk  The possibility of financial losses from investments.

    Risk management  The process of identification, analysis and acceptance or mitigation of uncertainty in investment decisions.

    Standard and Poor's 500  An American equity index based on the market capitalization of the 500 largest companies on the NYSE or NASDAQ.

    Tail loss  The extreme losses that correspond to the "tails" of the distribution.

    Time lag  The period between two related events.

    Volatility shock  A shock or a change to the standard deviation.

    White noise  A time series is called white noise if it is a sequence of independent and identically distributed random variables with finite mean and variance.


    Acronyms

    BIC Bayesian Information Criterion.

    CCI Christoffersen’s Confidence Interval.

    CDF Cumulative Distribution Function.

    CPCA Cluster PCA.

    ES Expected Shortfall.

    EWMA Exponentially Weighted Moving Average.

    GARCH Generalized Autoregressive Conditional Heteroskedasticity.

    HS Historical Simulation.

    i.i.d. Independent and identically distributed.

    IGARCH Integrated GARCH.

    MCS Monte Carlo Simulated.

    ML Maximum Likelihood.

    O-GARCH Orthogonal GARCH.

    P&L Profit and Loss.

    PC Principal Component.

    PCA Principal Component Analysis.

    PDF Probability Density Function.

    POF Proportion of Failures.

    S&P500 Standard and Poor’s 500.

    ST Spanning Tree.

    VaR Value at Risk.


    List of Tables

    1  HS 1-day VaR for S&P500 with different values of α and different lengths of observation windows.

    2  MCS 1-day VaR for S&P500 with different values of α and different numbers of simulations.

    3  Correlation matrix for the five stocks.

    4  Distance matrix for the five stocks.

    5  New distance matrix including one cluster of two stocks and 3 individual stocks.

    6  New distance matrix including two clusters of two stocks and 1 individual stock.

    7  Distance matrix for one cluster of three stocks and one cluster of two stocks.

    8  Presentation of how many percent of the log return series and PCs that experience time-dependent observations and volatility clustering.

    9  Presentation of how many log return series, standard PCs and EWMA PCs that are modeled with what type of model.

    10 MCS 1-day VaRα with different values of α and different numbers of simulations.

    11 HS results for different windows of observations.

    12 Standard PCA and EWMA PCA results.

    13 CPCA results for cluster sizes of 50, 25, 10 and 5.

    14 Backtesting statistics for HS with observation windows of 252, 500 and 1000, and MCS with 10000 simulations and different copula parameters.

    15 Backtesting statistics for Standard PCA, EWMA PCA and CPCA with one PC and different numbers of clusters.

    16 Computational time for MCS and CPCA with different cluster sizes and 10000 simulations.

    17 Backtesting statistics for standard and EWMA PCA for all ε∗.

    18 CPCA backtesting results for 50 clusters and all ε∗.

    19 CPCA backtesting results for 25 clusters and all ε∗.

    20 CPCA backtesting results for 10 clusters and all ε∗.

    21 CPCA backtesting results for 5 clusters and all ε∗.


    List of Figures

    1  Illustration of VaR0.05, which is represented by the 5% quantile of the P&L distribution.

    2  Illustration of S&P500's closing prices between January 2000 and December 2016.

    3  Illustration of S&P500's log returns between January 2000 and December 2016.

    4  Illustration of randomly drawn observations from a log-normal distribution and a normal distribution fitted to historical S&P500 log returns.

    5  Illustration of the autocorrelation of S&P500's log returns and absolute log returns with lags between 1 and 150 days.

    6  16-year simulation of S&P500's returns using GARCH(1,1) and ARMA(1,1)-GARCH(1,1).

    7  Seadrill's log returns between 2008-02-07 and 2017-12-29.

    8  P&L distribution of S&P500 obtained from HS with 252 (a), 500 (b) and 1000 days (c) as observation window. The red lines represent the 1-day VaR0.01.

    9  Distribution of S&P500 log returns.

    10 P&L distribution of S&P500 obtained from the MCS method with 1000 (a), 5000 (b), 10000 (c) and 100000 simulations (d). The red line represents the 1-day VaR0.01.

    11 Illustration of historical closing prices.

    12 ST of the five stocks.

    13 Dendrogram of the five stocks.

    14 Illustration of HS, MC, CPCA and PCA methods for calculating VaRα.

    15 Portfolio log returns between 2008-02-07 and 2017-12-29.

    16 Illustration of the three first PCs achieved by standard PCA and EWMA PCA.

    17 Cumulative variance explained by PCs.

    18 The number of equities within each cluster for 5 (a), 10 (b), 25 (c) and 50 clusters (d).

    19 Backtesting of HS with observation windows of 252 (a), 500 (b) and 1000 days (c). The blue, orange and yellow lines indicate VaR0.01, VaR0.05 and VaR0.1 respectively, and the red dots illustrate where a loss greater than VaR0.01 has occurred in the backtesting.

    20 Backtesting of MCS with 1000 days of estimation window and different copulas. The blue, orange and yellow lines indicate VaR0.01, VaR0.05 and VaR0.1 respectively, and the red dots illustrate where a loss greater than VaR0.01 has occurred in the backtesting.

    21 Backtesting of Standard PCA (a) and EWMA PCA (b) with estimation window of 1000 days. The blue, orange and yellow lines indicate VaR0.01, VaR0.05 and VaR0.1 respectively, and the red dots illustrate where a loss greater than VaR0.01 has occurred in the backtesting.

    22 Backtesting of CPCA with estimation window of 1000 days and standard correlation method. Results from 50 clusters are illustrated in (a), 25 clusters in (b), 10 clusters in (c) and 5 clusters in (d). The blue, orange and yellow lines indicate VaR0.01, VaR0.05 and VaR0.1 respectively, and the red dots illustrate where a loss greater than VaR0.01 has occurred in the backtesting.

    23 Illustration of best fitting models with multiple lags tested for.


    1 Introduction

    1.1 Background

    Risk management practices are used all over the world, especially within the financial markets. Authorities and regulators demand that parties within this sector apply methods for quantifying their risks so that they can hold capital to cover them. However, as history has shown, regulations do not seem to be sufficient for preventing crises. We have experienced new crises even with stricter demands, and it seems like regulators are always lagging behind. In other words, even though regulations are not enough to prevent a new crisis, they work as an insurance that past crises will not repeat themselves.

    The biggest regulator within the banking industry is the Basel Committee. It was founded in the aftermath of some serious problems regarding the international currency market, with the purpose of enhancing financial stability around the world. Today, the Basel Committee consists of 45 institutions from 28 jurisdictions and is most famous for its regulations Basel I, Basel II and Basel III (Bank for International Settlements, 2016).

    One of the most common ways of quantifying a financial institution's risks is through the measure Value at Risk (VaR). The Basel framework requires banks to use this measure, and even though the explicit calculation method might differ, all variants estimate the maximum potential future loss, to a specified probability, over a preset time period.

    1.1.1 Cinnober Financial Technology

    The study is completed at Cinnober Financial Technology (Cinnober), which is a software provider within the finance sector that develops systems for clearing houses, exchanges and banks, amongst others. In particular, trading, clearing and risk management systems are focal points, and the company is an international player with customers all over the world.

    The financial markets are continuously changing, making the requirements on the systems used in the industry extremely high, both in terms of speed and accuracy. New regulatory requirements come into play each year, meaning that Cinnober needs to be able to quickly adapt their systems' functionality to new market conditions with more complex algorithms and faster calculations.

    One functionality that Cinnober implements in the majority of their systems is the calculation of VaR. It is therefore of great importance for them to stay updated on newly developed calculation methods which they can offer customers.


    1.1.2 Value at Risk

    VaR is perhaps the most common risk measure for banks; it is used in everyday practice and works as a foundation for capital requirements. VaR is also used by other financial institutions such as clearing houses and exchanges to calculate the margin requirements that members must meet in order to be allowed to hold their positions. In other words, VaR is used to find out how much margin the members need to hold so that the clearing house/exchange can be certain that the members can cover their potential losses.

    It is common to see VaR denoted as

    $$\text{1-day VaR}_\alpha = \$X \quad \text{or} \quad \text{1-day VaR}_\alpha = Y\%.$$

    In other words: "with a probability of $(1-\alpha)\cdot 100\%$, the potential loss over one day will not be greater than $X, or Y%."

    There are different ways of calculating VaRα, but the ultimate goal is always to estimate the potential future profit & loss (P&L) distribution and find its α-quantile. One of the most common methods to do so is historical simulation (HS). This method is a quick way of calculating VaRα and an easy approach to explain to management, but the result depends solely on historical data and it assumes that history will be relived. One can think of it as estimating the future potential P&L distribution under the empirical distribution of the data.

    As an example, suppose a bank applies a 252-day window (the average number of trading days per year) to calculate its 1-day VaR0.05 for a specific portfolio of assets (i.e. the bank wants to know the maximum 1-day loss this portfolio can experience with a probability of 95%). HS implies that we can either look at the P&L distribution of the portfolio's daily returns for the past 252 days and extract the value that represents the 5% quantile, or we can evaluate the 252 portfolio returns as losses (i.e. the positive returns are seen as negative losses) and extract the value that represents the 95% quantile. The latter method uses the loss distribution rather than the P&L distribution, but both methods yield the same result.
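
    To make the recipe above concrete, the following is a minimal sketch in Python (the thesis implementation itself uses Matlab 2017b); the return series, window length and significance level are placeholders, not data from the study.

```python
import numpy as np

def hs_var(returns, alpha=0.05, window=252):
    """1-day historical-simulation VaR from the last `window` daily returns.

    VaR is reported as a positive number: the negative of the alpha-quantile
    of the empirical P&L distribution (approximated here by the returns).
    """
    recent = np.asarray(returns, dtype=float)[-window:]   # observation window
    return -np.quantile(recent, alpha)                    # sign-flipped alpha-quantile

# Placeholder daily returns, only to show the call
rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0, 0.01, size=1000)
print(hs_var(daily_returns, alpha=0.05, window=252))
```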

    The biggest drawback of HS is that the result might not be accurate enough due to its dependence on historical observations. For example, if the market has been trending upward over the past 252 days, the magnitude of the VaRα measure will be low and thus not reflect the potential loss in case of a crisis.

    An alternative method is Monte Carlo simulation (MCS), which involves simulating an explicit parametric model for risk factor changes such as stock returns. These changes are then applied to the portfolio to find the future potential P&L distribution (or loss distribution). It is important to note that, generally, the number of simulations can be chosen to be much larger than the number of observations used in HS, and thus it is possible to obtain a more probable result.

    While MCS might seem preferable, it too has its weaknesses. For large portfolios, the computational cost can be extensive, as every simulation requires a full evaluation of the portfolio to compute the P&L. In fact, the dimensions of the problem increase rapidly with the number of risk factors and simulations, so a more extensive portfolio means a larger-dimensional problem, and the computational time might become too high for the method to be useful.

    Another problem is the assumption about the returns' distributions. In contrast to HS, where we use the historical data to represent the distribution, we now have to make an assumption about it. A common practice is to fit a distribution to the financial time series we are interested in simulating and then draw samples from it, but this causes another problem. Even though it is possible to tweak the distribution based on personal assumptions, it might be a tedious task, and making the wrong assumption about the distribution will heavily alter the final result.

    1.1.3 Risk Management in Practice

    Throughout history, financial disasters have arisen in both financial and non-financial firms, and authorities all over the world have increased the regulatory requirements for various forms of risk management in attempts to prevent this. To emphasize the need for continuously updated regulations, Pyle (1999) states in his article about risk management that "Financial misadventures are hardly a new phenomenon, but the rapidity with which economic entities can get into trouble is". It might therefore be more important for regulators to focus on preventing a financial crisis rather than fixing it once it has happened, as by then it is often too late to react.

    An example of fiscal irresponsibility that could have been avoided with common risk management practices is Orange County's bankruptcy. In 1994, the manager of the Orange County Investment Pool used the power of leverage in interest rate derivatives when trying to create excess returns for the county's schools, cities and districts. What the manager did not account for was the interest rate risk that accompanies such investments; instead, he focused the entire fund's capital on his own speculation that interest rates would keep falling (Jorion, 2011). $1.6 billion was lost due to the Fed's unexpected interest rate increases that year, and the county has not yet been able to repay the bonds it had to use to avoid bankruptcy (Castillo, 2017). While this is just one example, it illustrates the need for financial firms to control their risk and, perhaps more so, the need for authorities to ensure that financial institutions actually manage their risks. Jorion (2011) claims that if the county had used VaR as a risk measure to hold capital, it could easily have avoided the severe impact of these events.


    1.1.4 Risks

    For a financial institution such as a bank, there are multiple risk factors to account for when quantifying risk exposure. Three main risks should be of concern: market risk, operational risk and credit risk. According to the European Banking Authority (2017), the risks are defined as:

    • Market risk is the risk of losses in on- and off-balance-sheet positions arising from adverse movements in market prices.

    • Operational risk is the risk of losses stemming from inadequate or failed internal processes, people and systems, or from external events such as fraud, security, legal situations, etc.

    • Credit risk is measured with respect to the bank's activities, excluding the trading book business.

    Together, market risk, operational risk and credit risk cover a large portion of a bank's risk exposure, and the Basel Committee requires quantification methods for each of them to be in place.

    This thesis focuses on market risk, which stems from the market positions that the bank holds. Thus, risk factors such as equity price changes are of great interest.

    1.2 Problem Statement

    Banks and other financial institutions using VaR face a significant problem when calculating the measure. On the one hand, they often need to calculate the risk measure multiple times per day, which requires speed; on the other hand, they want the calculations to be accurate, which requires time. Therefore, the problem boils down to investigating:

    • How different techniques for calculating VaR differ in terms of time and accuracy,

    as well as resolving:

    • Which method or methods, if any, provide acceptable computational time and accuracy?


    1.3 Purpose

    The project aims to investigate different methods of quantifying VaR in terms of accuracy and speed. The goal is to find out whether there are any preferable methods for a financial institution to use in its daily calculations of VaR that are both accurate and not too computationally demanding.

    1.4 Delimitations

    Of the main risks (market risk, operational risk and credit risk), this study will only investigate market risk.

    Due to time constraints, the study will only investigate portfolios consisting of equities. Portfolios of other types of instruments could be the subject of a subsequent study.

    Equities with low liquidity cause problems when trying to fit a conditional variance model and are in practice often handled separately. Fitting the conditional variance model is a vital part of modeling returns, and therefore these equities are excluded from the study.

    1.5 Data Description

    The study includes 2520 equities traded on US exchanges. The original data set contained approximately 5000 equities, but due to low liquidity nearly half were excluded. Each equity consists of almost 10 years of observations, from 2008-02-07 to 2017-12-29, which amounts to 2493 closing prices. All data has been downloaded from Yahoo Finance.

    1.6 Approach & Outline

    In this thesis, different methods of calculating VaR will be investigated and implemented on a high-dimensional equity portfolio using the software Matlab 2017b. The approach is to investigate the two most conventional methods, historical simulation and Monte Carlo simulation (as well as different variations of them), in terms of both accuracy and computational time. Subsequently, an attempt to improve these methods will be implemented with the aim of reducing the computational time of the Monte Carlo method while still upholding accuracy. Here, a statistical technique called principal component analysis will be applied in order to reduce the high-dimensional problem to a lower-dimensional one and thus, hopefully, also reduce the computational time.

    The thesis is structured as follows: In Chapter 2, important theories will be provided so the reader can establish some knowledge of the area. The chapter includes theories regarding risks, returns, VaR, time series analysis, simulation approaches as well as dimension reduction techniques. In Chapter 3, an explanation of the model implementation will be given together with a review of the steps necessary to complete the simulations and evaluate the models. In Chapter 4, a presentation and visualization of the results will be provided to the reader and finally, in Chapter 5, a discussion of the results follows, together with some recommendations for further studies.


    2 Theory

    2.1 Risk Measures

    A market risk measure is used to quantify the uncertainty about the future value of a portfolio, i.e. it is related to the randomness and uncertainty of the risk factors that affect a portfolio. The fundamentals of handling risk essentially amount to estimating the possible deviation from an expected value, and in today's financial institutions it is of interest to both measure and manage these risk exposures.

    Some types of financial institutions are particularly exposed to market risk due to their exposure to the financial markets. Banks, for example, hold positions in various types of instruments, and to be able to understand and handle the risks that follow, they rely on modeling.

    2.1.1 Coherent Risk Measure

    A risk measure is a single number which summarizes the uncertainty of an outcome. In finance, a common risk measure is volatility, but there are several others, such as VaR or Expected Shortfall (ES), that are used in daily practice. To define a fair risk measure, Artzner et al. (1999) provide a definition of what is called a coherent risk measure. It illustrates the properties that a good risk measure should have.

    Definition 2.1 (Coherent risk measure)
    Let G denote a vector space of random variables representing portfolio values at a fixed future time. Furthermore, let X and Y be random variables denoting a set of future net worths of an investment. A coherent risk measure is a function ρ : G → ℝ that satisfies the following axioms:

    Monotonicity
    For all X, Y ∈ G with X ≤ Y:

    $$\rho(Y) \le \rho(X).$$

    Sub-additivity
    If X, Y ∈ G, then

    $$\rho(X + Y) \le \rho(X) + \rho(Y).$$

    Homogeneity
    For all λ ≥ 0 and all X ∈ G:

    $$\rho(\lambda X) = \lambda\rho(X).$$

    Translation invariance
    For all X ∈ G and all real numbers α, we have

    $$\rho(X + \alpha \cdot r) = \rho(X) - \alpha,$$

    where r is the total gain of a risk-free investment.

    2.1.2 Value at Risk

    VaR measures a portfolio's exposure to a certain set of risk factors. The measure can be interpreted as the loss that, to a specified probability, will not be exceeded if the current portfolio is held over some period of time.

    VaR has two basic parameters (Alexander, 2009c):

    • The significance level α ∈ (0,1).

    • The time horizon, denoted h, which is the period of time, traditionally measured in trading days rather than calendar days, over which the VaR is measured.

    VaR is a quantile-based measure, meaning that it represents the α quantile of a distribution, in this case the P&L distribution. Figure 1 illustrates VaR0.05 for a hypothetical portfolio, where the value is obtained by extracting the 5% quantile from the distribution. The red line illustrates where we can find the value that represents this quantile. Since VaR is measured as a loss and the α quantile represents a negative value, we interpret the VaR measure as the negative of the α quantile.

    Figure 1: Illustration of VaR0.05, which is represented by the 5% quantile of the P&L distribution.


    Definition 2.2 (Value at Risk)
    Value at Risk at significance level α for a portfolio whose value at some future time is described by the random variable X is defined as:

    $$\mathrm{VaR}_\alpha(X) = \min\{m \in \mathbb{R} : P(m \cdot r + X < 0) \le \alpha\} \qquad (1)$$

    where r is the total return of a risk-free asset.

    If X is assumed to have a right-continuous and increasing cumulative distribution function (CDF) F(·), it also follows from Equation 1 that

    $$\mathrm{VaR}_\alpha(X) = \min\{m \in \mathbb{R} : P(-X > m \cdot r) \le \alpha\} = -F_X^{-1}(\alpha) = \min\{m \in \mathbb{R} : P(L \le m) \ge 1-\alpha\} = F_L^{-1}(1-\alpha)$$

    where L = −X/r is the discounted loss.

    In Figure 1, note that VaR does not take the most extreme losses into account, i.e. the part of the distribution below the α quantile is not included. This can cause big problems, especially when the portfolio returns are not normally distributed but rather have distributions with fat tails. A risk measure like this could be exploited by so-called shadow traders, who "hide" risky investments by making the losses more extreme and thus missed by the VaR measure.

    Since VaR does not account for the tail loss, it is easy to find examples where it does not fulfill the sub-additivity condition from Definition 2.1 and thus is not a coherent risk measure (Alexander, 2009c). However, the measure is intuitive to work with and provides a good quantification of a major part of the risk. To complement VaR with a risk measure that accounts for the tail loss, banks often incorporate the calculation of ES, which is the expected loss for a portfolio given that the VaR is exceeded. The reader can find out more about ES in Hult et al. (2012).


    2.2 Financial Time Series

    When simulating VaR, we are interested in forecasting the future of the equities in the portfolio. A common way of doing so is by fitting a mathematical model to each equity's financial time series which can accurately describe the time series' characteristics. A financial time series can be presented in terms of different units, and for an equity it is common to see it in terms of prices at certain time frequencies. As an example, Figure 2 illustrates a univariate financial time series representing the S&P500 index's prices between January 2000 and December 2016.

    Figure 2: Illustration of S&P500's closing prices between January 2000 and December 2016.

    2.2.1 Asset Returns

    Many studies that include financial time series use returns rather than prices. If Pt denotes the price of an asset at time t, we can describe the discrete return between time period t−1 and t as Pt/Pt−1 − 1. Tsay (2010) proposes two main reasons for using returns instead of prices. First, for the average investor, asset returns are a complete and scale-free summary of the investment opportunity, and second, return series are easier to handle than price series because the former have more attractive statistical properties.

    There is more than one type of return, but we will be considering the continuously compounded return, or log return, since it is simple to work with when applying continuous stochastic models and it approximates the discrete return accurately for short time periods (e.g. daily returns).

    Definition 2.3 (Log return)
    Let Pt be the price of an asset at time t. The log return over the time interval t−1 to t is then defined as:

    $$r_t = \ln\!\left(\frac{P_t}{P_{t-1}}\right). \qquad (2)$$

    Figure 3 presents the log return series for the S&P500 index over the same time period as Figure 2. If we try to fit a mathematical model to the time series of log returns, there are some important characteristics we have to account for, one of them being volatility clustering (Cont, 2007).

    Figure 3: Illustration of S&P500's log returns between January 2000 and December 2016.

    2.2.2 Volatility clustering

    Non-randomness in a time series is important to account for when we want to simulate thousands of new hypothetical risk factor changes. Volatility clustering is one of these characteristics and it can be observed in Figure 3. By inspection, we can find clusters of low and high volatility, which means that there are periods with multiple sequential days of large price movements as well as periods with small price movements.

    When modeling the future returns of a financial time series, it is important to account for the volatility clusters rather than simply simulating random numbers from a distribution. Figure 4 presents a clear visualization of how wrong this approach can be. Here, we have random drawings from a fitted log-normal distribution as well as from a fitted normal distribution. Comparing them to Figure 3, one notices that there is no volatility clustering present in either of the distributions in Figure 4. In other words, we have not accounted for the non-randomness that the time series exhibits.

    Figure 4: Illustration of randomly drawn observations from a log-normal distribution and a normal distribution fitted to historical S&P500 log returns.

    2.2.3 Autocorrelation

    A common way to describe volatility clustering is through the autocorrelation function. Here we denote the log return of an asset over a time period ∆t as rt. Observations are sampled at discrete times tn = n∆t and time lags are denoted by k. If we let ∆t = 1 day for simplicity, we can describe the correlation between the daily return at period t and the daily return at t + k as Corr(rt+k, rt).


    Definition 2.4 (Autocorrelation function)
    Given the observations r1, r2, ..., rN at times t1, t2, ..., tN and the lag k, the autocorrelation function is defined as

    $$\rho_k = \frac{\sum_{i=1}^{N-k}(r_i - \bar{r})(r_{i+k} - \bar{r})}{\sum_{i=1}^{N}(r_i - \bar{r})^2}$$

    where $\bar{r}$ denotes the mean of the returns.

    The autocorrelation function is used to find non-randomness in data in the form of volatility clustering and to apply that knowledge to find an appropriate model for simulation. Figure 5 illustrates the autocorrelation with lags k = 1, 2, ..., 150 for the S&P500 time series illustrated in Figure 3. While there is no clear autocorrelation for the log returns, there is a pattern for the absolute log returns that converges to zero as the lag k increases. Thus, we can conclude that there is non-randomness in the financial data: a return of a specific magnitude (either positive or negative) tends to be followed by returns of similar magnitude (either positive or negative) the next day. Consequently, we can likely predict the magnitude of tomorrow's return, but not its direction.
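
    A small sketch of Definition 2.4 (in Python rather than the thesis's Matlab; the return series below is a placeholder) shows how the lagged autocorrelations plotted in Figure 5 can be computed:

```python
import numpy as np

def autocorrelation(r, k):
    """Sample autocorrelation at lag k >= 1, as in Definition 2.4."""
    r = np.asarray(r, dtype=float)
    rbar = r.mean()
    num = np.sum((r[:-k] - rbar) * (r[k:] - rbar))
    den = np.sum((r - rbar) ** 2)
    return num / den

# Lags 1..150 for the plain and absolute returns (placeholder series)
log_returns = np.random.default_rng(1).standard_t(df=5, size=2500) * 0.01
acf_ret = [autocorrelation(log_returns, k) for k in range(1, 151)]
acf_abs = [autocorrelation(np.abs(log_returns), k) for k in range(1, 151)]
```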

    Figure 5: Illustration of the autocorrelation of S&P500's log returns and absolute log returns with lags between 1 and 150 days.

    2.2.4 EWMA

    A method commonly used for forecasting the volatility of financial time series is the exponentially weighted moving average (EWMA) model. This model utilizes the concept of exponential smoothing, meaning that more weight is put on the most recently observed volatilities.

    Definition 2.5 (EWMA)
    Consider the univariate financial time series X with an observation window WE which describes the number of time periods included in our time series. The EWMA model for forecasting the variance is then described as:

    $$\sigma_t^2 = \frac{1-\lambda}{\lambda(1-\lambda^{W_E})}\sum_{i=1}^{W_E}\lambda^i x_{t-i}^2 \qquad (3)$$

    where $\sigma_t^2$ is the forecasted variance, 0 < λ < 1 is the chosen weight and x are the observed values in X.

    McNeil, Frey, and Embrechts (2015) show that if WE is large, we can use an approximation of Equation 3:

    $$\sigma_t^2 = \frac{1-\lambda}{\lambda(1-\lambda^{W_E})}\sum_{i=1}^{W_E}\lambda^i x_{t-i}^2 \approx \frac{1-\lambda}{\lambda}\sum_{i=1}^{\infty}\lambda^i x_{t-i}^2,$$

    and then they show that:

    $$\sigma_t^2 = (1-\lambda)x_{t-1}^2 + \lambda\sigma_{t-1}^2. \qquad (4)$$

    To describe Equation 4 more intuitively, we call (1−λ) the coefficient of reaction and λ the coefficient of endurance. The former determines how much the forecasted variance $\sigma_t^2$ should react to the new information $x_{t-1}^2$, while the latter determines how enduring the variance is. A common value for λ is 0.94 for daily data (Zumbach, 2007).
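
    As an illustration of the recursion in Equation 4, here is a minimal Python sketch (not the thesis's Matlab code); the initialisation of the first variance with the first squared observation is an assumed choice.

```python
import numpy as np

def ewma_variance(x, lam=0.94):
    """EWMA variance forecasts via Equation (4):
    sigma_t^2 = (1 - lam) * x_{t-1}^2 + lam * sigma_{t-1}^2."""
    x = np.asarray(x, dtype=float)
    sigma2 = np.empty_like(x)
    sigma2[0] = x[0] ** 2                      # starting value (an assumed choice)
    for t in range(1, len(x)):
        sigma2[t] = (1.0 - lam) * x[t - 1] ** 2 + lam * sigma2[t - 1]
    return sigma2

# lambda = 0.94 is the value quoted above for daily data (Zumbach, 2007)
```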

    The EWMA model is easy to implement and accounts for the problem of volatility clustering. However, a substantial drawback of EWMA is that it only includes one parameter, λ. Therefore, in every EWMA model the sum of the coefficient of reaction and the coefficient of endurance always equals one. Due to the complexity of some time series, we might need a more flexible model to describe them accurately, and a commonly used substitute is the GARCH model, which is described in the next section.

    2.2.5 ARMA-GARCH

    Tsay (2010) states that to describe the dynamic structure of observed data, we might need a high-order model including multiple parameters. In fact, apart from volatility clusters, daily log returns are often also time-dependent, which is another non-random characteristic that must be accounted for when simulating.


    To account for seasonality, trends and dependencies we can fit an autoregressive moving-average (ARMA) model, and for volatility clustering we can fit a generalized autoregressive conditional heteroskedasticity (GARCH) model. Below, the reader can find descriptions of each model separately before a combination of them is presented.

    First of all, it is important to understand how we apply these models to a time series. The first step is to understand how we describe a financial time series of returns, rt, as a process:

    $$r_t = \mu_t + \varepsilon_t \qquad (5)$$

    where rt is the log return at time t, µt is the conditional expected value of rt and εt is a white noise or error term.

    The ARMA(m,n) model is a combination of an autoregressive model (the left-hand side of Equation 6) and a moving-average model (the right-hand side of Equation 6), and we use it to forecast rt more accurately when the time series is time-dependent. Formally, Tsay (2010) defines it as:

    Definition 2.6 (ARMA(m,n) process)
    The ARMA(m,n) process can be written as

    $$r_t - \sum_{i=1}^{m} a_i r_{t-i} = c + \varepsilon_t + \sum_{j=1}^{n} b_j\varepsilon_{t-j} \qquad (6)$$

    which gives

    $$r_t = c + \sum_{i=1}^{m} a_i r_{t-i} + \varepsilon_t + \sum_{j=1}^{n} b_j\varepsilon_{t-j} \qquad (7)$$

    where m and n represent how many time lags we include in the model, c, {a1, ..., am} and {b1, ..., bn} are ARMA parameters, and the random variable εt is white noise.

    Basically, the ARMA model is used to specify the conditional mean of the return process:

    $$E(r_t\,|\,\mathcal{F}_{t-1}) = c + \sum_{i=1}^{m} a_i r_{t-i} + \sum_{j=1}^{n} b_j\varepsilon_{t-j}$$

    where $\mathcal{F}_{t-1}$ is the information about the time series obtained up to t−1.

    With a model for the conditional mean specified, one is needed for the conditional variance as well. Engle (1982) introduced the autoregressive conditional heteroskedasticity (ARCH) model in 1982, which was generalized (GARCH) in 1986 by Tim Bollerslev (Bollerslev, 1986). The model is used for volatility modeling purposes and is defined by:


    Definition 2.7 (GARCH process)
    Let εt = rt − µt be the error at time t for a process rt. Then εt follows a GARCH(p,q) process if it satisfies:

    $$\varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \sum_{i=1}^{p}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{q}\beta_j\sigma_{t-j}^2$$

    where p and q represent how many time lags we include in the model, zt are i.i.d. random variables with mean 0 and unit variance, ω > 0, {α1 ≥ 0, ..., αp ≥ 0} and {β1 ≥ 0, ..., βq ≥ 0} are GARCH parameters, and $\sum_{i=1}^{p}\alpha_i + \sum_{j=1}^{q}\beta_j < 1$.

    In the GARCH model, α describes to what degree the last period's volatility shock should affect the next period's volatility, and β to what degree the last volatility should affect the next period's volatility. ω is a constant which ensures that the volatility never equals zero, and Alexander (2009b) suggests using Maximum Likelihood (ML) to estimate these parameters, which is the approach applied in this thesis.

    Finally, a model that accurately describes a time series which exhibits both time dependencies and volatility clustering is obtained and can be applied for simulation purposes. The model is called ARMA-GARCH and is defined as:

    Definition 2.8 (ARMA(m,n)-GARCH(p,q) process)
    For time t, the ARMA(m,n)-GARCH(p,q) process is defined through:

    $$r_t = c + \sum_{i=1}^{m} a_i r_{t-i} + \varepsilon_t + \sum_{j=1}^{n} b_j\varepsilon_{t-j} \qquad (8)$$

    $$\varepsilon_t = \sigma_t z_t, \quad z \sim \text{i.i.d.}(0,1) \qquad (9)$$

    $$\sigma_t^2 = \omega + \sum_{i=1}^{p}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{q}\beta_j\sigma_{t-j}^2 \qquad (10)$$

    GARCH(1,1) models were among the first models to take volatility clustering into account. Here, the variance, $\sigma_t^2$, only depends on the last period's shock and variance (Cont, 2007):

    $$\sigma_t^2 = \omega + \alpha\varepsilon_{t-1}^2 + \beta\sigma_{t-1}^2,$$

    which leads to a positive autocorrelation in the volatility process σ, with a rate of decay described by α + β. For illustration purposes, Example 2.1 provides results from forecasting with GARCH(1,1) and ARMA(1,1)-GARCH(1,1), which should be compared to the results with just random drawings in Figure 4.


    Example 2.1 (GARCH(1,1) & ARMA(1,1)-GARCH(1,1))
    If we fit a GARCH(1,1) model to the S&P500 log returns in Figure 3 and simulate new returns, we obtain a financial time series which takes volatility clustering into account. The result of this is illustrated in Figure 6.

    For illustration purposes, Figure 6 also presents a fitted ARMA(1,1)-GARCH(1,1) model, which accounts not only for volatility clustering but also for time dependencies.

    Figure 6: 16-year simulation of S&P500's returns using GARCH(1,1) and ARMA(1,1)-GARCH(1,1).

    As observed in Figure 6, these simulations show a pattern similar to the observed returns in Figure 3, especially compared to the simulations in Figure 4.
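
    A minimal sketch of the simulation step in Example 2.1, written in Python with hypothetical parameter values (the thesis fits the parameters by maximum likelihood in Matlab, which is not shown here):

```python
import numpy as np

def simulate_garch11(n, omega, alpha, beta, rng=None):
    """Simulate n returns from a GARCH(1,1) process with Gaussian innovations:
    eps_t = sigma_t * z_t,  sigma_t^2 = omega + alpha*eps_{t-1}^2 + beta*sigma_{t-1}^2."""
    rng = rng or np.random.default_rng()
    eps = np.zeros(n)
    sigma2 = np.zeros(n)
    sigma2[0] = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return eps

# Hypothetical daily-scale parameters, chosen only for illustration
simulated_returns = simulate_garch11(4000, omega=2e-6, alpha=0.08, beta=0.90)
```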

    2.2.6 IGARCH

    The constraint $\sum_{i=1}^{p}\alpha_i + \sum_{j=1}^{q}\beta_j < 1$ in Definition 2.7 ensures that the financial time series can be described by a stationary process such as the GARCH model (Alexander, 2009a). However, the constraint is not always fulfilled, so Tsay (2010) describes a model where $\sum_{i=1}^{p}\alpha_i + \sum_{j=1}^{q}\beta_j = 1$. Such a process is referred to as a GARCH process with a unit root, or an integrated GARCH (IGARCH) process.


    Definition 2.9 (IGARCH(p,q) process)
    Let εt = rt − µt be the error at time t for a process rt. Then εt follows an IGARCH(p,q) process if the fitted GARCH parameters satisfy:

    $$\sum_{i=1}^{p}\alpha_i + \sum_{j=1}^{q}\beta_j = 1. \qquad (11)$$

    In that case, we can model the time series as:

    $$\varepsilon_t = \sigma_t z_t, \qquad \sigma_t^2 = \omega + \sum_{i=1}^{p}(1-\beta_i)\varepsilon_{t-i}^2 + \sum_{j=1}^{q}\beta_j\sigma_{t-j}^2$$

    where p and q represent how many time lags we include in the model, zt are i.i.d. random variables with mean 0 and unit variance, and ω ≥ 0, {β1 ≥ 0, ..., βq ≥ 0} are GARCH parameters.

    For insight into how a time series looks when the errors follow an IGARCH(1,1) model, Figure 7 presents such a case. The figure illustrates the log returns of Seadrill's stock (SDRL) over the time period 2008-02-07 to 2017-12-29.

    Figure 7: Seadrill's log returns between 2008-02-07 and 2017-12-29.

    2.2.7 Model verification

    When presented with a financial time series, it is important to know whether the observations are time-dependent, whether they exhibit volatility clustering, or perhaps both, before fitting a model. It would be unnecessary, time-consuming work to fit a more complex model (e.g. ARMA-GARCH rather than only GARCH) to a time series if it is not needed.

    Two common tests for time dependencies and volatility clustering are the Ljung-Box test (Ljung and Box, 1978) and Engle's ARCH test (Tsay, 2010).


    Definition 2.10 (Ljung-Box Test)
    The Ljung-Box statistic Q is defined as:

    $$Q = n(n+2)\sum_{k=1}^{h}\frac{\rho_k^2}{n-k}$$

    where n is the sample size, ρk is the sample autocorrelation at lag k and h is the number of lags tested.

    The Ljung-Box test is a hypothesis-driven test where:

    H0: The data is i.i.d.
    H1: The data is not i.i.d. but rather exhibits time dependencies.

    For a significance level α, we reject H0 when:

    $$Q > \chi^2_{1-\alpha,h}$$

    where $\chi^2_{1-\alpha,h}$ is the (1−α)-quantile of the chi-squared distribution with h degrees of freedom.
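
    A compact Python sketch of the statistic and the rejection rule (the chi-squared quantile comes from scipy; the input series and the choice of h are placeholders):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box_q(r, h):
    """Ljung-Box Q statistic over lags 1..h (Definition 2.10)."""
    r = np.asarray(r, dtype=float)
    n = len(r)
    rbar = r.mean()
    den = np.sum((r - rbar) ** 2)
    q = 0.0
    for k in range(1, h + 1):
        rho_k = np.sum((r[:-k] - rbar) * (r[k:] - rbar)) / den
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

# Reject H0 (i.i.d. data) at level alpha if Q exceeds the chi-squared quantile
alpha, h = 0.05, 20
series = np.random.default_rng(2).standard_normal(1000)   # placeholder series
reject_h0 = ljung_box_q(series, h) > chi2.ppf(1 - alpha, df=h)
```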

    Definition 2.11 (Engle's ARCH test)
    Consider a time series rt = µt + εt where εt are the residuals. Engle's ARCH test is then performed through the following steps:

    • Estimate the best fitting AR(p) model by:

      $$r_t = c + \sum_{i=1}^{p} a_i r_{t-i} + \varepsilon_t.$$

    • Obtain the squared least-squares errors, $\hat{\varepsilon}^2$, and regress them on a constant and p lagged values:

      $$\hat{\varepsilon}_t^2 = \hat{\alpha}_0 + \sum_{i=1}^{p}\hat{\alpha}_i\hat{\varepsilon}_{t-i}^2$$

      where p is the number of lags.

    • In the sample of T residuals, the test statistic $TR^2$ follows a $\chi^2$ distribution with p degrees of freedom.

    Engle's ARCH test is a hypothesis-driven test where:

    H0: Absence of volatility clustering in the time series.
    H1: Volatility clustering is present in the time series.

    and we reject the null hypothesis if $TR^2 > \chi^2_{1-\alpha,p}$.
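
    The regression step of Engle's test can be sketched as follows (Python, with ordinary least squares via numpy; the residual series and lag order below are placeholders):

```python
import numpy as np
from scipy.stats import chi2

def engle_arch_test(resid, p=5, alpha=0.05):
    """Regress squared residuals on a constant and p lags and compare
    T*R^2 with the chi-squared(1 - alpha, p) quantile."""
    e2 = np.asarray(resid, dtype=float) ** 2
    y = e2[p:]                                             # dependent variable
    X = np.column_stack([np.ones(len(y))] +
                        [e2[p - i:-i] for i in range(1, p + 1)])   # constant + lags
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r2 = 1.0 - np.sum((y - X @ coef) ** 2) / np.sum((y - y.mean()) ** 2)
    stat = len(y) * r2
    return stat, stat > chi2.ppf(1 - alpha, df=p)

# Example call on a placeholder residual series
residuals = np.random.default_rng(3).standard_normal(2000)
tr2, reject_h0 = engle_arch_test(residuals, p=5)
```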


    By applying the Ljung-Box test and Engle's ARCH test, one gets a better understanding of what type of model should be fitted to each time series. The tests do not, however, give any information about the number of lags m, n, p, q that should be included in the model; this must also be tested for.

    According to Akaike (1998), a model should be evaluated based on its results when used for prediction, which depends on the estimated parameters. Akaike suggests a method for evaluating the model based on the concept of closeness between the generic distribution defined by the model and the true distribution, the AIC criterion, which is commonly used. However, Javed and Mantalos (2013) show that the AIC criterion is not consistent. In fact, AIC will with high probability lead to a false result when dealing with a large amount of data. Schwarz (1978) developed an alternative criterion that is consistent, the BIC criterion, which is also suggested by Alexander (2009b).

    Thus, to find the appropriate set of lags m, n, p, q, i.e. which combination provides the best model fit for each time series, the BIC criterion is computed for each set of chosen lags m, n, p, q and the model with the lowest BIC is used.

    Definition 2.12 (BIC criterion)
    The BIC criterion is defined as

    $$\mathrm{BIC} = -2\ln L + k\ln T \qquad (12)$$

    where L is the maximized value of the likelihood function, k is the number of free parameters used in the model and T is the number of observations.

    In other words, the BIC criterion is a goodness-of-fit measure for an estimated statistical model. It consists of two parts, and a compromise takes place between the maximized log likelihood (the lack-of-fit component) and k ln T (the penalty component), which increases with the number of parameters and observations and thereby prevents overfitting (Javed and Mantalos, 2013).
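
    The criterion itself is a one-liner; the Python sketch below shows how candidate lag orders could be ranked by it. The log-likelihood values and parameter counts are hypothetical placeholders; in practice they come from the fitted ARMA-GARCH models.

```python
import numpy as np

def bic(log_likelihood, k, T):
    """Equation (12): BIC = -2 ln L + k ln T."""
    return -2.0 * log_likelihood + k * np.log(T)

# Hypothetical fits: (name, maximized log-likelihood, number of free parameters)
T = 2493
candidates = [("GARCH(1,1)", 8101.3, 3), ("ARMA(1,1)-GARCH(1,1)", 8105.9, 6)]
best = min(candidates, key=lambda c: bic(c[1], c[2], T))   # lowest BIC wins
```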

    2.3 Multivariate Time Series

    The previous sections mostly focused on univariate time series. In reality, a financial institution often holds multi-asset portfolios, and it is therefore important to consider multivariate time series.

    When working with multivariate returns, it is important to consider the joint distribution and not only the marginal distributions of each individual time series. In other words, it is important to consider the dependence structure of the time series rather than just applying mathematical models to the marginal distributions.


    2.3.1 Dependence

    Two random variables are independent if information about either variable does not affect the distribution of the other (Meucci, 2005). To introduce the concept of dependence, consider conditional distributions:

    Definition 2.13 (Conditional distribution in 2-dim)
    Consider an N-dimensional random variable X and split it into two subsets: the k-dimensional random variable XA of the first k entries and the (N−k)-dimensional random variable XB of the remaining entries. X can be expressed as:

    $$\mathbf{X} = \begin{pmatrix} \mathbf{X}_A \\ \mathbf{X}_B \end{pmatrix}.$$

    The conditional distribution of the variable XB given xA is the distribution of XB knowing that the realization of XA is the specific value xA. The conditional random variable is denoted XB|xA.

    Formally, we denote the conditional probability density function (PDF) as:

    $$f_{\mathbf{X}_B|\mathbf{x}_A}(\mathbf{x}_B) = \frac{f_{\mathbf{X}}(\mathbf{x}_A,\mathbf{x}_B)}{\int f_{\mathbf{X}}(\mathbf{x}_A,\mathbf{x}_B)\,d\mathbf{x}_B} = \frac{f_{\mathbf{X}}(\mathbf{x}_A,\mathbf{x}_B)}{f_{\mathbf{X}_A}(\mathbf{x}_A)}.$$

    In other words, we define the conditional PDF of XB given knowledge of XA as the joint PDF of XA and XB divided by the marginal PDF of XA evaluated at the observed xA.

    To describe how different financial time series behave over time in relation to each other, it is common to use correlation or covariance as measures. Covariance is the multivariate form of variance, and the covariance matrix is defined as:

    Definition 2.14 (Covariance Matrix)
    Consider the multivariate financial time series X. The covariance matrix of X is then defined as:

    $$\mathrm{Cov}(\mathbf{X}) = \Sigma = E\big((\mathbf{X}-E(\mathbf{X}))(\mathbf{X}-E(\mathbf{X}))^T\big)$$

    where E represents the expected value. Each element in Σ can be expressed as:

    $$\sigma_{i,j} = \mathrm{Cov}(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j)$$

    which is the covariance between asset i and asset j.

    With the help of covariance, we can now also define a correlation matrix:


    Definition 2.15 (Correlation matrix)
    Consider the multivariate financial time series of asset returns X with N time series. The correlation matrix ρ(X) of X is defined by introducing the standardized vector Y such that $Y_i = X_i/\sqrt{\mathrm{Var}(X_i)}$ for i = 1, ..., N and taking ρ(X) = Cov(Y).

    Each element in ρ(X), the correlation between the i:th and j:th asset in X, can be expressed as:

    $$\rho_{i,j} = \rho(X_i, X_j) = \frac{\mathrm{Cov}(X_i, X_j)}{\sqrt{\mathrm{Var}(X_i)\mathrm{Var}(X_j)}}. \qquad (13)$$

    When simulating multi-asset returns, one would not model them independently. It is important to account for the dependence between time series. Historically, some equities tend to move together while others might not; e.g. when the market is trending upward, some assets' values increase in a similar fashion, and it would therefore not make sense to simulate each asset individually. Because of stock movements, the dependence might change over time, and the covariance matrix Σ and the correlation matrix ρ(X) depend on the number of observations that are included in the time series.

    When using covariance or correlation to describe dependencies, it is sometimes common practice to put larger weights on the most recent covariances and correlations. Just like in the univariate case, we can do so through the EWMA technique described in Definition 2.5. Equation 3 describes how to estimate the one-dimensional covariance, i.e. the variance, $\sigma_t^2$. In the multivariate case one would instead like to estimate the covariance matrix Σt. It is easy to construct a multivariate model from the univariate one by exchanging the single return x for the vector of returns x. By doing so, Equation 3 becomes:

    $$\Sigma_t = \frac{1-\lambda}{\lambda(1-\lambda^{W_E})}\sum_{i=1}^{W_E}\lambda^i \mathbf{x}_{t-i}\mathbf{x}_{t-i}^T. \qquad (14)$$
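
    A direct Python transcription of Equation 14 is given below (the return matrix is a placeholder, and reusing λ = 0.94 from the univariate case is an assumption):

```python
import numpy as np

def ewma_covariance(X, lam=0.94):
    """EWMA covariance matrix, Equation (14).
    X has shape (W_E, N) with rows ordered from oldest to newest return vector."""
    X = np.asarray(X, dtype=float)
    W_E = X.shape[0]
    weights = lam ** np.arange(1, W_E + 1)                 # lambda^i, i = 1..W_E
    scale = (1.0 - lam) / (lam * (1.0 - lam ** W_E))
    lagged = X[::-1]                                       # row i-1 corresponds to x_{t-i}
    return scale * sum(w * np.outer(x, x) for w, x in zip(weights, lagged))

# Placeholder: 1000 days of returns for 5 assets
Sigma_t = ewma_covariance(np.random.default_rng(4).normal(0, 0.01, size=(1000, 5)))
```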

    2.3.2 Copula

    A recurring problem in financial time series analysis is finding a structure for the dependence between multiple time series. The previous section provided theory regarding covariance and correlation, which are measures of dependence. However, as Alexander (2009b) emphasizes, these measures can only represent a certain type of risk where each asset return must follow an i.i.d. process and the joint distribution of the variables must be elliptical. In practice, few assets and portfolios satisfy these conditions. Instead we must work with the entire joint distribution of returns, which is built by first specifying the marginal distributions for each time series and then, with the help of a copula, describing the dependence between the marginals.

    Copula theory was introduced by Sklar (1959) and is based on the idea of isolating the dependence structure from the structure of the marginal distributions. The advantage of this is that each marginal distribution may differ and the copula can still describe the dependency between them. Basically, copulas are used to specify a joint distribution in a two-stage process: first we specify the type of the marginal distributions, and then we specify the copula distribution which "binds" the marginals together.

    Since the copula only specifies the dependence structure, different copulas produce different joint distributions when applied to the same marginals. There are multiple families of copulas, which can be found in Alexander (2009b) or McNeil, Frey, and Embrechts (2015). In this study, due to their common usage in the financial industry, both the Gaussian and the t-copula will be applied in the MCS method.

    Before defining a copula, recall the following statistical properties described by Hult et al. (2012):

    Proposition 2.1
    Let F be a distribution function on ℝ. Then:

    1. u ≤ F(x) if and only if $F^{-1}(u) \le x$.

    2. If F is continuous, then $F(F^{-1}(u)) = u$.

    3. (Quantile transform) If U is U(0,1)-distributed, then $P(F^{-1}(U) \le x) = F(x)$.

    4. (Probability transform) If X has a distribution function F, then F(X) ∼ U(0,1) if and only if F is continuous.


Now, the concept of a copula is defined as:

Definition 2.16 (Copula)
An N-dimensional copula is the distribution function C of a random vector U whose components Uk, k = 1, 2, ..., N, are uniformly distributed, i.e.:

C(u1, ..., uN) = P(U1 ≤ u1, ..., UN ≤ uN)

where (u1, ..., uN) ∈ (0,1)^N.

Now, let X = (X1, ..., XN) be a random vector with joint distribution

    F(x1, ...,xN) = P(X1 ≤ x1, ...,XN ≤ xN)

    and continuous marginal distribution functions

Fk(xk) = P(Xk ≤ xk)

    for k = 1, ...,N.

    From Proposition 2.1 we know that the components of the vector

U = (U1, ..., UN) = (F1(X1), ..., FN(XN))

are uniformly distributed. More specifically, the distribution C of U is a copula, which we will call the copula function of X, since:

C(F1(x1), ..., FN(xN)) = P(U1 ≤ F1(x1), ..., UN ≤ FN(xN))
                       = P(F1^{-1}(U1) ≤ x1, ..., FN^{-1}(UN) ≤ xN)
                       = F(x1, x2, ..., xN)    (15)

Equation 15 is the result of Sklar's Theorem, which explains the representation of the joint distribution F in terms of the copula C and the marginal distributions Fk (Ruschendorf, 2013).

As previously stated, there are different copula families, each of which applies a separate dependence structure to the marginals. Hence, it is of great importance to choose a copula that represents the actual dependence, which is not always a trivial task. Below follow definitions of two of the most commonly used copulas for financial time series, the Gaussian copula and the t-copula.


Definition 2.17 (Gaussian copula)
The Gaussian copula is derived from the N-dimensional multivariate and the univariate standard normal distribution functions, denoted Φ_N and Φ, respectively. It is defined by:

C(u1, u2, ..., uN; ρ(X)) = Φ_N(Φ^{-1}(u1), Φ^{-1}(u2), ..., Φ^{-1}(uN)),

where Φ_N has correlation matrix ρ(X).

Definition 2.18 (t-copula)
The N-dimensional symmetric t-copula is derived implicitly from a multivariate distribution function and is defined by:

C_v(u1, u2, ..., uN; ρ(X)) = t_{v,N}(t_v^{-1}(u1), t_v^{-1}(u2), ..., t_v^{-1}(uN)),

where t_{v,N} and t_v are the N-dimensional and the univariate t-distribution functions with v degrees of freedom and ρ(X) is the correlation matrix.

The correlation matrix ρ(X) appears in Definitions 2.17 and 2.18 and is a crucial component that accounts for the dependencies between the marginals. The correlation matrix is used when constructing the joint distribution and when simulating from it. The explicit formulas for generating numbers from a copula can be found in Alexander (2009b).

Like any marginal or joint distribution function, copulas have conditional distributions. For simplicity, Alexander (2009b) describes the conditional distributions for a bivariate copula C(u1, u2), but the concept extends to higher dimensions as well.

Let us consider the bivariate case, where the two conditional distributions of the copula are defined as:

    C(u1|u2) = P(U1 ≤ u1|U2 = u2)

and

C(u2|u1) = P(U2 ≤ u2|U1 = u1).

The conditional distributions are derived by taking the first derivative of the copula with respect to the conditioning variable:

C(u1|u2) = ∂C(u1, u2)/∂u2,    C(u2|u1) = ∂C(u1, u2)/∂u1.
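For the bivariate Gaussian copula, for instance, this derivative has a known closed form, C(u1|u2) = Φ((Φ^{-1}(u1) − ρ Φ^{-1}(u2)) / sqrt(1 − ρ²)). The small Python sketch below evaluates it; this is an illustration only and the function name is my own, not part of the thesis.

import numpy as np
from scipy import stats

def gaussian_copula_conditional(u1, u2, rho):
    """Conditional distribution C(u1 | u2) of the bivariate Gaussian copula."""
    z1, z2 = stats.norm.ppf(u1), stats.norm.ppf(u2)   # transform to the standard normal scale
    return stats.norm.cdf((z1 - rho * z2) / np.sqrt(1.0 - rho ** 2))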


    2.3.3 Simulation with copula

Using copulas for simulation purposes is a vital part of this thesis. The method will be applied in the benchmark method, MCS, to account for the dependencies between the equities. Below follows a general description of simulation with copulas, based on Salvadori et al. (2007).

Let F be a multivariate distribution with continuous marginals F1, F2, ..., FN, and suppose that F can be expressed in a unique way via an N-copula, C, through Equation 15. In order to simulate a vector (X1, X2, ..., XN) ∼ F, it is sufficient to simulate a vector (U1, U2, ..., UN) ∼ C, where the random variables Ui are uniformly distributed on (0,1). By Equation 15 and Proposition 2.1 we know:

Ui = Fi(Xi) ⟺ Xi = Fi^{-1}(Ui),

where, for i = 1, 2, ..., N, the random variables Xi have the marginal distributions Fi and joint distribution F. In general, to simulate values we apply the following steps:

• To simulate a sample uk from Uk, conditional on the previously sampled u1, u2, ..., uk−1, we need to know the distribution of Uk conditional on the events {U1 = u1, U2 = u2, ..., Uk−1 = uk−1}. Let us denote this law by Gk(uk|u1, u2, ..., uk−1), given by:

Gk(uk|u1, u2, ..., uk−1) = P(Uk ≤ uk | U1 = u1, U2 = u2, ..., Uk−1 = uk−1)
                         = ∂_{u1,u2,...,uk−1} C(u1, u2, ..., uk, 1, ..., 1) / ∂_{u1,u2,...,uk−1} C(u1, u2, ..., uk−1, 1, ..., 1).

Then we take uk = Gk^{-1}(u′k | u1, u2, ..., uk−1), where u′k is the realization of a random variable U′k that is uniformly distributed and independent of U′1, U′2, ..., U′k−1.

• Using the probability integral transform, it is then easy to generate the sample (x1, x2, ..., xN) extracted from F:

(x1, x2, ..., xN) = (F1^{-1}(u1), F2^{-1}(u2), ..., FN^{-1}(uN))    (16)
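In practice, Gaussian and t-copulas are usually not sampled through the conditional distributions above; instead one draws from the corresponding multivariate distribution, applies the probability transform and then the quantile transform of Equation 16. The sketch below is a minimal illustration, not the thesis implementation; the correlation matrix rho, the list of fitted SciPy marginal distributions and the function name are assumed. A t-copula would analogously use draws from the multivariate t-distribution and the cdf t_v.

import numpy as np
from scipy import stats

def simulate_gaussian_copula(rho, marginals, n_sims, seed=None):
    """Simulate dependent returns through a Gaussian copula.

    rho       : N x N correlation matrix rho(X).
    marginals : list of N fitted "frozen" scipy distributions, one per asset.
    n_sims    : number of simulated return vectors.
    Returns an n_sims x N array of simulated returns.
    """
    rng = np.random.default_rng(seed)
    N = rho.shape[0]
    # Draw from the multivariate normal with the given correlation matrix.
    z = rng.multivariate_normal(mean=np.zeros(N), cov=rho, size=n_sims)
    # Probability transform: each component becomes U(0,1), jointly distributed as the Gaussian copula.
    u = stats.norm.cdf(z)
    # Quantile transform (Equation 16): apply each marginal's inverse cdf.
    return np.column_stack([marginals[i].ppf(u[:, i]) for i in range(N)])

The marginals could, for example, be Student's t-distributions fitted to each equity's log returns.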


    2.4 VaR approaches

There are different methods applied by financial institutions to calculate VaR. In the sections below, two of the most common methods, Historical Simulation and Monte Carlo Simulation, are presented.

    2.4.1 Historical Simulation

In the HS approach, few assumptions about the statistical distributions need to be made. The approach involves using historical risk factor changes, or log returns, to construct a distribution of potential future portfolio P&Ls and then extracting VaR_α as the loss that is exceeded only α · 100% of the time (Linsmeier and Pearson, 2000).

The distribution of the portfolio's P&Ls is obtained by taking the current portfolio and applying the actual historical changes of the assets in the portfolio over the past N periods. This allows us to compute N hypothetical scenarios for our current portfolio, and we can extract the (α · N):th scenario of the ordered portfolio P&Ls as our −VaR_α.

Formally, Linsmeier and Pearson (2000) describe the process of HS in five steps:

• Step 1 - Identify the basic risk factors for the current portfolio (in our case, this will be the stocks' log returns).

• Step 2 - Obtain historical values for the risk factors over the past N periods.

• Step 3 - Subject the current assets of the portfolio to the historical risk factor changes and calculate the daily P&Ls.

• Step 4 - Order the N portfolio P&Ls from the largest loss to the largest profit.

    • Step 5 - Extract the loss that is equaled or exceeded α ·100% of the time.

HS is a non-parametric approach and is only reasonable under the assumption that the market changes which historically produced a certain set of returns are the same as the changes that will occur in the future. HS has the advantage of simplicity, both in comprehension and in computation. The major disadvantage, however, is that it relies solely on the assumption that history will repeat itself. Results will vary depending on the chosen history, and under- or overestimation of extreme losses is likely.
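The five steps translate almost directly into code. The sketch below is an illustration with assumed inputs, not the implementation used in the thesis; the P&L is computed here as position_value·(exp(r) − 1), while a linear approximation position_value·r is also common.

import numpy as np

def historical_var(prices, position_value, alpha=0.01, window=500):
    """1-day historical simulation VaR for a single long position.

    prices         : vector of historical closing prices, most recent last.
    position_value : current market value of the position.
    alpha          : VaR level, e.g. 0.01.
    window         : number of daily returns N in the observation window.
    """
    log_returns = np.diff(np.log(prices))[-window:]     # Steps 1-2: risk factor history
    pnl = position_value * (np.exp(log_returns) - 1.0)  # Step 3: hypothetical daily P&Ls
    pnl_sorted = np.sort(pnl)                           # Step 4: largest loss first
    k = max(int(np.floor(alpha * len(pnl_sorted))) - 1, 0)
    return -pnl_sorted[k]                               # Step 5: the (alpha*N):th worst outcome, as a positive loss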


Example 2.2 (Historical Simulation)
For illustration purposes, we assume that a bank holds one long position in the univariate S&P500 index on the first of December 2017 and wants to perform an HS to obtain the 1-day VaR_α with 252, 500 and 1000 days in the observation window, at α-levels of 0.005, 0.01, 0.05 and 0.1. We follow the five-step process described above:

    • Step 1 - The risk factor is the daily log returns of the S&P500 index.

• Step 2 - Historical prices are presented in Figure 2. Log returns are calculated by applying Equation 2 to the values in the observation windows.

• Step 3 - The current value of the position is $2238.8. Apply the log returns obtained in Step 2 to the current position and calculate the P&Ls.

• Step 4 - Order the P&Ls from the largest loss to the largest profit.

• Step 5 - Obtain the loss that is exceeded α · 100% of the time, i.e. the (α · N):th value in the ordered vector of P&Ls, where N is the number of observations.

Figure 8 illustrates the P&L distribution and the 1-day VaR_0.01 obtained by HS. For each observation window, there is a specific P&L distribution and therefore also a different VaR_α result.

Figure 8: P&L distribution of S&P500 obtained from HS with 252 (a), 500 (b) and 1000 days (c) as the observation window. The red lines represent the 1-day VaR_0.01.


Table 1 presents the 1-day VaR_α for different values of α and different observation windows, and as we can see, the results vary. For example, VaR_0.005 with 252 days of observations equals $81.9, while with 1000 days of observations it equals $58.2. One can therefore conclude that the market has experienced relatively more frequent large losses over the past 252 days than over the past 1000 days, resulting in a higher VaR_0.005.

Table 1: HS 1-day VaR for S&P500 with different values of α and different lengths of observation windows.

α       252 days   500 days   1000 days
0.005   $81.9      $72.5      $58.2
0.01    $55.6      $58.2      $52.0
0.05    $31.1      $32.8      $31.0
0.1     $21.0      $23.1      $20.8

From Example 2.2, it should be clear that there are uncertainties involved with HS and that the result depends heavily on the chosen α level as well as on the size of the observation window.

    2.4.2 Monte Carlo Simulation

MCS has a few similarities to HS but differs in one major aspect: when applying MCS, one chooses a distribution that is believed to adequately capture or approximate the possible risk factor changes, rather than using the observed changes over the last N periods.

Formally, the Monte Carlo method is defined through the expectation of the function f (in our case, f = σz from Equation 9) evaluated at random locations (Russel, 1998):

Definition 2.19 (Monte Carlo method)
Consider an integral over the one-dimensional unit interval,

I[f] = ∫_0^1 f(x) dx = f̄.

Let x be a random variable that is uniformly distributed on the unit interval; then

I[f] = E[f(x)].


For an integral over the unit cube I_d = [0,1]^d in d dimensions we have:

I[f] = E[f(x)] = ∫_{I_d} f(x) dx,

in which x is a uniformly distributed vector in the d-dimensional unit cube.

The Monte Carlo method is based on this probabilistic interpretation of an integral. If we consider a sequence {x_n} sampled from the uniform distribution, we can approximate the expectation as:

I_N[f] = (1/N) ∑_{n=1}^{N} f(x_n)

and according to the Strong Law of Large Numbers, this approximation converges with probability one, i.e.:

lim_{N→∞} I_N[f] = I[f].
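As a tiny numerical illustration of the definition (the chosen function and sample size are my own, not from the thesis), the integral of f(x) = x² over [0,1] can be estimated by averaging f over uniform draws:

import numpy as np

def mc_integral(f, n_samples=100_000, seed=0):
    """Monte Carlo estimate I_N[f] of the integral of f over [0, 1]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n_samples)  # uniform samples on the unit interval
    return np.mean(f(x))                       # I_N[f] = (1/N) * sum_n f(x_n)

# The true value of the integral of x^2 over [0, 1] is 1/3; the estimate approaches it as N grows.
estimate = mc_integral(lambda x: x ** 2)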

In MCS, a pseudo-random number generator is used to generate (often) many thousands of hypothetical risk factor changes, which are then applied to the current positions to create hypothetical future portfolio P&Ls. Finally, the VaR is determined by extracting the α-quantile of the obtained P&L distribution.

Linsmeier and Pearson (2000) describe the process of MCS in five steps:

• Step 1 - Identify the risk factors for the current portfolio (in our case, this will be the stocks' returns).

• Step 2 - Determine or assume a specific distribution for changes in the risk factors and estimate the parameters of that distribution.

• Step 3 - Use a pseudo-random number generator to generate N hypothetical values of changes in the risk factors, where N is large. These hypothetical changes are then applied to the assets in the portfolio and used to calculate N hypothetical portfolio values. Finally, from each of the hypothetical portfolio values, one subtracts the current portfolio value to obtain N hypothetical daily P&Ls.

    • Step 4 - Order the portfolio P&Ls from the largest loss to the largest profit.

    • Step 5 - Select the loss that is equaled or exceeded α ·100% of the time.

The ability to pick the distribution is the feature that really distinguishes MCS from other approaches. Here, it is possible to choose a distribution based on one's own preferences, even though it is often based on observed data. The distribution can also include adjustments reflecting personal assumptions about future market changes.

MCS is a very powerful and flexible technique used in many areas. One of its advantages is that the method can incorporate any desirable distributional properties (e.g. fat tails). However, the result depends heavily on the chosen distribution, and different choices can produce very different results. Furthermore, MCS is a complex method, and for high-dimensional portfolios it is very computationally demanding.
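For a single position, the five MCS steps can be sketched as follows. This is a simplified illustration with assumed inputs, not the thesis code; the Student's t-distribution is used as one possible choice of distribution, mirroring Example 2.3 below.

import numpy as np
from scipy import stats

def monte_carlo_var(log_returns, position_value, alpha=0.01, n_sims=100_000, seed=0):
    """1-day Monte Carlo VaR for a single position with t-distributed log returns."""
    # Step 2: fit a Student's t-distribution to the observed daily log returns.
    df, loc, scale = stats.t.fit(log_returns)
    # Step 3: generate hypothetical risk factor changes and the corresponding P&Ls.
    simulated = stats.t.rvs(df, loc=loc, scale=scale, size=n_sims, random_state=seed)
    pnl = position_value * (np.exp(simulated) - 1.0)
    # Steps 4-5: order the P&Ls and read off the loss at the alpha-quantile.
    return -np.quantile(pnl, alpha)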

Example 2.3 (Monte Carlo simulation)
For comparison purposes, we assume that a bank holds the same position as in Example 2.2 and wants to perform an MCS to obtain the 1-day VaR_α with 1000, 5000, 10000 and 100000 simulations and α-levels of 0.005, 0.01, 0.05 and 0.1. We follow the five-step process above:

    • Step 1 - The risk factor is the daily return of the S&P500 index.

• Step 2 - Historical prices are presented in Figure 2 and the log returns are calculated by applying Equation 2 to the total time series. Fit a distribution to these log returns.

• Step 3 - Generate 1000, 5000, 10000 and 100000 pseudo-random numbers from the distribution to represent the potential future risk factor changes. Apply these to the current portfolio to obtain the P&L distribution.

• Step 4 - Order the P&Ls from the largest loss to the largest profit.

• Step 5 - Obtain the loss that is exceeded α · 100% of the time, i.e. the (α · N):th value in the ordered P&Ls, where N is the number of simulations.

Figure 9 shows the distribution of the S&P500's log returns between January 1st 2000 and December 1st 2016. The distribution that appears to describe the returns best is the Student's t-distribution. More precisely, the fitted distribution has a mean of 0.000483, a standard deviation of 0.0074 and 2.7293 degrees of freedom.


Figure 9: Distribution of S&P500 log returns with fitted normal and Student's t densities.

From the fitted t-distribution, we generate N pseudo-random numbers. Figure 10 presents the obtained P&L distributions for 1000, 5000, 10000 and 100000 simulations. The red lines represent the VaR_0.01.

Figure 10: P&L distribution of S&P500 obtained from the MCS method with 1000 (a), 5000 (b), 10000 (c) and 100000 (d) simulations. The red lines represent the 1-day VaR_0.01.

Table 2 shows the 1-day VaR_α for different values of α and different numbers of simulations. As we can see, there are big differences between the numbers of simulations, especially for VaR at lower α. For example, the VaR_0.005 obtained with 1000 simulations equals $165.4, while 100000 simulations result in $110.0. Furthermore, it appears that as the number of simulations increases, the VaR_α converges towards a specific value, and additional simulations will most likely not result in a materially more accurate VaR_α.

Table 2: MCS 1-day VaR for S&P500 with different values of α and different numbers of simulations.

α       1000 sim   5000 sim   10000 sim   100000 sim
0.005   $165.4     $116.9     $110.2      $110.0
0.01    $85.8      $88.9      $85.8       $85.6
0.05    $41.6      $45.0      $42.2       $42.1
0.1     $28.7      $30.4      $29.3       $29.0

The results in Example 2.3 are obtained using the fitted Student's t-distribution from Figure 9. The reader should be aware of the dangers of switching the distribution to, for example, a normal distribution when generating pseudo-random numbers. As can be observed in Figure 9, the normal distribution does not accurately represent the distribution of the log returns and might therefore not reflect the potential future loss.

    2.5 Dimension Reduction

One way of speeding up a simulation is to reduce a high-dimensional problem to a lower-dimensional one. Dimension reduction techniques are used for many statistical problems, and below the reader is presented with one such technique, principal component analysis (PCA), together with an application of it to GARCH models.

    2.5.1 Principal Component Analysis

PCA is a statistical technique which allows the user to obtain a relatively small set of describing variables from a large set of variables. The technique is popular when working in high dimensions because it is an effective approach for reducing the dimensionality while still describing the majority of the variance in the data set (James et al., 2014).

Kreinin et al. (1998) describe how PCA can be applied in MCS. The idea behind PCA is fairly straightforward when considering a set of risk factors and their joint distribution. First, PCA obtains a new set of uncorrelated vectors that are linear combinations of the original data. These variables are called principal components (PCs). An MCS of the risk factors is performed by generating independent random forecasts of the uncorrelated PCs and then using the linear transformation to obtain the forecasted, correlated risk factors. By selecting only a subset of the PCs which represents a large part of the variance in the data, the dimensionality is reduced.


Alexander (2009b) describes PCA as a technique based on the eigenvalue-eigenvector decomposition of the risk factors' correlation matrix. So, when applying PCA to a data set containing multiple financial time series, we first denote by Y the T × N matrix of daily log returns to be analyzed by PCA, where T is the number of data points used in the analysis and N is the number of equities. Let y_{t,i} denote the daily return of stock i at time t, i = 1, 2, ..., N, t = 1, ..., T. Normalizing the data gives:

x_{t,i} = (y_{t,i} − µ_i) / σ_i    (17)

where µ_i and σ_i are the mean and standard deviation of y_i. Normalization is an important step in PCA since the results are sensitive to the scaling of the data.

The T × N matrix X now represents the data set of normalized log returns, and its columns x1, x2, ..., xN are the univariate time series.

The standardized log returns' correlations are summarized by the N × N matrix

X^T X,

which is proportional to the sample correlation matrix of the returns, a matrix with ones on its diagonal.

As already stated, PCA utilizes the eigenvalue-eigenvector decomposition, and by extracting the eigenvectors of X^T X we obtain what are referred to as the factor weights, the vectors contained in the matrix W. W is an orthogonal matrix, and we denote by Λ the diagonal matrix of the corresponding eigenvalues. The weights in W are crucial for finding the PCs, which we formally define as:

Definition 2.20 (Principal components)
Let X be the T × N matrix of T data points and N equities. Then, the principal components are defined as vectors which are linear combinations of the columns of X, where the vector weights are chosen in such a way that (Alexander, 2009a):

    • the principal components are uncorrelated with each other

• the first principal component explains the most variation (i.e. the greatest amount of the total variation in X), the second component explains the greatest amount of the remaining variation, and so on.

Now, let Λ denote the diagonal matrix of ordered eigenvalues obtained from X^T X and W the orthogonal matrix of corresponding eigenvectors, also called weights, ordered according to Λ. Then, the principal components are the column vectors of a T × N matrix, P, defined as the product of the input data matrix X and the eigenvector matrix W:

    P = XW.


More specifically, the m:th principal component is defined as the m:th column of P, which can be written as:

p_m = w_{1m} x1 + w_{2m} x2 + ... + w_{Nm} xN    (18)

where w_m = (w_{1m}, w_{2m}, ..., w_{Nm})^T is the eigenvector corresponding to the eigenvalue λ_m in Λ, the m:th largest eigenvalue of X^T X.

Equation 18 shows that each PC is a linear combination of the columns of X with factor weights given by the elements of W. The total variation in X is the sum of the eigenvalues in Λ, i.e. λ1 + λ2 + ... + λN. Hence, the proportion of this total variation that is explained by the m:th principal component is

λ_m / (λ1 + ... + λN)

and in a highly correlated system, the first eigenvalue can explain a large part of the total variation.

Alexander (2009a) continues by describing how we can represent the data with the principal components:

Since P = XW and W^T = W^{-1}, we have

    X = PWT .

In other words, each of the original return series in X can be described as a linear combination of the principal components by:

x_i = w_{i1} p1 + w_{i2} p2 + ... + w_{iN} pN    (19)

and since in a highly correlated system there are only a few independent sources of variation, we can represent the matrix X with k < N principal components and still explain a large part of the variation. For example, by using the first three principal components we get:

    xi ≈ wi1p1 +wi2p2 +wi3p3

    which in matrix notation is:

    X≈ P∗W∗T (20)

where X is the T × N matrix of standardized returns, P* is a T × k matrix whose k columns are the first k principal components, and W* is an N × k matrix whose k columns are the weights given by the first k eigenvectors.


    Another way of describing X with only k principal components is through:

    X = P∗W∗T +E

where E is the part of the variation that is not captured by using only k PCs.

When choosing the number of PCs that should be used, we usually want to explain a certain proportion of the variation in the data set. Kreinin et al. (1998) suggest an approach where we suppose that ε* is an admissible proportion of unexplained variance. Then we select the minimal number, k, of principal components satisfying the inequality

(λ1 + ... + λk) / (λ1 + ... + λN) > 1 − ε*.    (21)

The PCs with indices j > k have a smaller effect on the underlying vector of risk factors since the corresponding eigenvalues are smaller.
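The PCA steps in Equations 17-21 can be summarized in a short sketch. This is an illustration under assumed inputs (a T × N return matrix Y and an admissible unexplained variance eps), not the thesis implementation.

import numpy as np

def pca_components(Y, eps=0.05):
    """Principal components of standardized log returns (Equations 17-21).

    Y   : T x N matrix of daily log returns.
    eps : admissible proportion of unexplained variance (epsilon* in Equation 21).
    Returns P* (T x k), W* (N x k) and the chosen k.
    """
    X = (Y - Y.mean(axis=0)) / Y.std(axis=0)     # Equation 17: standardize each column
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigen-decomposition of the symmetric matrix X^T X
    order = np.argsort(eigvals)[::-1]            # sort eigenvalues in decreasing order
    eigvals, W = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, 1.0 - eps) + 1)   # smallest k satisfying Equation 21
    W_star = W[:, :k]
    P_star = X @ W_star                          # Definition 2.20: P = XW, truncated to k columns
    return P_star, W_star, k                     # X is then approximated by P_star @ W_star.T (Equation 20)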

    2.5.2 Orthogonal ARMA-GARCH

It might not be obvious how PCA can be used as a forecasting technique, but Equation 19 shows that it is possible to describe each equity's standardized return at every data point with the help of the obtained PCs and the weights. This means that the PCs can be seen as individual time series which are uncorrelated with each other.

So, when applying PCA for forecasting purposes, the goal is to forecast tomorrow's PC coefficients, and since the PCs can be seen as individual time series, it is possible to utilize the same models as in MCS. Definition 2.7 presents a method for forecasting conditional variances, and Alexander (2009b) refers to applying GARCH models to the orthogonal PCs as orthogonal GARCH (O-GARCH). Just as for a regular financial time series, we might also want to apply an ARMA model to handle time dependencies, which is then referred to as an O-ARMA-GARCH process.

Since the PCs are uncorrelated, univariate GARCH models can be used and we do not have to account for dependencies between the PCs. When the k conditional variances have been forecasted and new PC coefficients calculated, it is easy to back-transform the results to obtain forecasted returns by:

    Xt = P̃∗W∗T

where X_t is the forecasted vector of standardized returns, P̃* is the vector of k forecasted PC values and W*^T contains the k corresponding vectors of weights. Alexander (2009b) emphasizes that when forecasting X_t we do not have to forecast a new W*, but can rather use the observed matrix, since it appears to be fairly constant between observations.
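The GARCH forecasting of each PC is not shown here, but the back-transformation itself is a single matrix operation. The sketch below assumes a length-k vector p_forecast of forecasted PC values (produced, for example, by univariate ARMA-GARCH models fitted to each PC), the weight matrix W_star from the PCA step, and the means and standard deviations used in Equation 17; names and conventions are mine, not the thesis's.

import numpy as np

def backtransform_forecast(p_forecast, W_star, mu, sigma):
    """Back-transform forecasted principal components to asset log returns.

    p_forecast : length-k vector of one-step-ahead PC forecasts.
    W_star     : N x k matrix of eigenvector weights.
    mu, sigma  : length-N vectors used to standardize the returns (Equation 17).
    """
    x_std = W_star @ p_forecast       # X_t = P~* W*^T, written for a single forecasted observation
    return mu + sigma * x_std         # undo the standardization to recover raw log returns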

    2.5.3 Cluster Analysis

Section 2.5.1 describes PCA and why it is a good method to apply when the data is highly correlated. While this might be the case for interest rates or some types of derivatives, it is not generally the case for equities. In fact, equity price series can vary considerably, as shown in Figures 11a and 11b. The former illustrates Apple's and Amgen's stock prices while the latter illustrates Apple's and ArcelorMittal's, all of which are traded on the New York Stock Exchange. Figure 11a provides a good example of two stocks that have a high correlation, i.e. the price movements are similar, while Figure 11b illustrates a low (negative) correlation.

Figure 11: Illustration of historical closing prices. (a) Apple and Amgen, ρ = 0.9151. (b) Apple and ArcelorMittal, ρ = −0.6824.

The data set used in this thesis, containing 2520 equities, is bound to include price series with both high and low correlations. When this is the case, Alexander (2009b) suggests dividing the data set into clusters of highly correlated equities and subsequently performing PCA on each cluster. By doing so, the first PC of each cluster, which describes the general trend of the data, will be more accurate in describing all stocks in that particular cluster and will thus also generate a more accurate forecast. By combining the results from each cluster, the total portfolio result is obtained.

To organize the stocks into clusters it is common to use a distance metric, d_{i,j}, between, for example, stocks i and j. Mantegna (1999) emphasizes that a distance metric must satisfy the following properties:


    1. di, j ≥ 0,

    2. di, j = 0⇐⇒ i = j,

    3. di, j = d j,i

and since the correlation, described in Equation 13, does not always satisfy property (1), it cannot be used as a distance without manipulation. A simple but effective transformation, which is applied in this thesis, is to let:

    di, j = 1−ρi, j. (22)

By doing so, highly correlated stocks will have distances close to zero, and all distances are ≥ 0.

Dividing the data into c clusters based on the distances can be accomplished by first creating a spanning tree (ST). This is a concept from graph theory, and many different methods can be used depending on the clustering preferences (West, 2001). Tola et al. (2008) describe the method used in this study, complete linkage clustering, in the following way:

Assume the matrix D consists of the distances between pairs of elements in the system to be clustered; distances of exactly 0, i.e. each element's distance to itself, are disregarded.

    Start from N clusters, each containing one item. Then at each iteration:

    1. Using the current matrix D of cluster distances, find the two closest clusters.

    2. Update the list of clusters by merging the two closest entries.

    3. Update the distance matrix accordingly.

    Repeat until all items are joined in one cluster.

The complete linkage clustering method updates the distance matrix in step 3 in a particular way: the distance between the newly merged entry and another item or cluster is the largest distance between any pair of items in the two clusters. A small code sketch of the procedure is given below, and Example 2.4 then illustrates the method step by step over the next few pages.
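The sketch below (an illustration, not the thesis code) uses SciPy's hierarchical clustering to perform the same procedure: the correlation matrix is turned into the distance of Equation 22 and the resulting tree is cut into a chosen number of clusters.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def correlation_clusters(rho, n_clusters):
    """Cluster equities by correlation using complete linkage.

    rho        : N x N correlation matrix.
    n_clusters : number of clusters c to cut the hierarchy into.
    Returns an array of cluster labels (1, ..., n_clusters), one per equity.
    """
    dist = 1.0 - rho                              # Equation 22: d_ij = 1 - rho_ij
    np.fill_diagonal(dist, 0.0)                   # enforce d_ii = 0 exactly
    condensed = squareform(dist, checks=False)    # condensed distance vector required by linkage
    tree = linkage(condensed, method='complete')  # complete linkage hierarchy
    return fcluster(tree, t=n_clusters, criterion='maxclust')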


Example 2.4 (Complete linkage clustering)
Assume we have the closing prices between the 1st of January 2010 and the 31st of December 2017 for five stocks: Bank of America (BAC), Citibank (C), J.P. Morgan (JPM), Apple (APPL) and Amazon (AMZN). To cluster these stocks based on their correlation with the complete linkage clustering method, we start by calculating each pair's correlation, ρ_{i,j}, using Equation 13. We get the correlation matrix (due to symmetry, only the upper triangular part is shown):

Table 3: Correlation matrix for the five stocks.

        BAC    C      JPM    APPL   AMZN
BAC     1      0.92   0.90   0.67   0.76
C              1      0.91   0.72   0.75
JPM                   1      0.87   0.93
APPL                         1      0.86
AMZN                                1

Now, calculate the distance d_{i,j} for each pair of stocks by applying Equation 22. We obtain the distance matrix D:

Table 4: Distance matrix for the five stocks.

        BAC    C      JPM    APPL   AMZN
BAC     0      0.08   0.10   0.33   0.24
C              0      0.09   0.28   0.25
JPM                   0      0.13   0.07
APPL                         0      0.14
AMZN                                0

The ST corresponding to the distance matrix D is computed using the complete linkage method. In Table 4 we find that the minimum distance is between AMZN and JPM (d = 0.07). The first step is thus to merge these two equities into one cluster, AMZN-JPM, which becomes a new item. We then create a new distance matrix, D, where the distance between AMZN-JPM and each of the other items, BAC, C and APPL, is the maximum of that item's distances to AMZN and JPM:

Table 5: New distance matrix including one cluster of two stocks and three individual stocks.

           AMZN-JPM   BAC    C      APPL
AMZN-JPM   0          0.24   0.25   0.14
BAC                   0      0.08   0.33
C                            0      0.28
APPL                                0


The next shortest distance in the new distance matrix is found between BAC and C (d = 0.08). This means that we now have two separate clusters containing multiple stocks in the ST, as shown in Figure 12A: the first cluster, AMZN-JPM, and the second cluster, BAC-C.

    By merging BAC and C we get a new distance matrix, D:

Table 6: New distance matrix including two clusters of two stocks and one individual stock.

           AMZN-JPM   BAC-C   APPL
AMZN-JPM   0          0.25    0.14
BAC-C                 0       0.33
APPL                          0

and the next two closest elements in D are AMZN-JPM and APPL (d = 0.14), which creates the ST in Figure 12B.

    Once again, a new distance matrix D is computed:

Table 7: Distance matrix for one cluster of three stocks and one cluster of two stocks.

                 AMZN-JPM-APPL   BAC-C
AMZN-JPM-APPL    0               0.33
BAC-C                            0

    where we find that the two clusters, AMZN-JPM-APPL and BAC-C ar