-
School of Education, Culture and Communication
Division of Applied Mathematics
MASTER THESIS IN MATHEMATICS / APPLIED MATHEMATICS
Generative Neural Network for Portfolio Optimization
by
Mengxin Liu
Master thesis in mathematics / applied mathematics
DIVISION OF APPLIED MATHEMATICS
Mälardalen University
SE-721 23 Västerås, Sweden
-
School of Education, Culture and Communication
Division of Applied Mathematics
Master thesis in mathematics / applied mathematics
Date: 2021-01-15
Project name: Generative Neural Network for Portfolio Optimization
Author: Mengxin Liu
Supervisor at Qognica AB: George Fodor
Supervisor at MDH: Olha Bodnar
Reviewer: Christopher Engström
Examiner: Daniel Andrén
Comprising: 30 ECTS credits
-
Contents
1 Introduction 1
  1.1 Problem description 1
  1.2 Literature review 2
  1.3 Outline 4
2 Traditional Portfolio Optimization Method 5
  2.1 Mean-Variance Portfolio Optimization 5
  2.2 CAPM 7
3 Limitation of Traditional Portfolio Optimization 9
  3.1 Drawbacks within Assumptions 9
  3.2 Drawbacks within Applications 9
4 Preprocessing 11
  4.1 Distribution of Daily Return 11
    4.1.1 Normal Distribution 13
    4.1.2 Student’s t Distribution 13
    4.1.3 Complete Data 15
  4.2 Whether to Include Technical Indicators 15
  4.3 Scaling 15
    4.3.1 Standard Score 16
    4.3.2 Min-Max Scaler 16
    4.3.3 Robust Scaler 16
    4.3.4 Max Abs Scaler 16
    4.3.5 Power Transform 17
5 Artificial Neural Network 18
  5.1 Introduction to Neural Network 18
    5.1.1 Relation between Different Concepts 18
    5.1.2 Definition of Artificial Neural Network 19
    5.1.3 Differences between ANN and Statistical Method 21
  5.2 Training Neural Network 22
    5.2.1 Hyperparameters 23
i
-
  5.3 Activation Function 24
    5.3.1 Sigmoid function 24
    5.3.2 Hyperbolic Tangent function 24
    5.3.3 Rectified Linear Unit function 25
    5.3.4 Exponential Linear Unit 26
    5.3.5 Leaky ReLU 26
  5.4 Approaches to Prevent Overfitting 27
    5.4.1 Increase Data Size 27
    5.4.2 Reduce Size of Neural Network 27
    5.4.3 L1 Regularization 28
    5.4.4 L2 Regularization 28
    5.4.5 Dropout 28
  5.5 Supervised Learning and Unsupervised Learning 29
  5.6 Generative Adversarial Network 30
    5.6.1 Cost function 31
  5.7 Implement Neural Network in Portfolio Optimization 32
    5.7.1 How to Optimize Portfolio from Output of Neural Network 32
6 Empirical Study 34
  6.1 Data Software and Hardware 34
    6.1.1 Data and Data Source 34
    6.1.2 Software Choice 34
    6.1.3 Hardware 34
  6.2 Risk Measurement 35
    6.2.1 Volatility 35
    6.2.2 Value at Risk 35
    6.2.3 Conditional Value at Risk 35
  6.3 Monte Carlo Simulation 36
    6.3.1 Simulated Path of Monte Carlo Simulation 36
    6.3.2 Calculate VaR using Monte Carlo Method 37
    6.3.3 Calculate CVaR using Monte Carlo Method 37
    6.3.4 Markowitz GMV Portfolio Selection 39
  6.4 Studies on GAN 40
    6.4.1 Structure of GAN 40
    6.4.2 Key Point on Selecting Batches 41
    6.4.3 Output from GAN 42
    6.4.4 The Effect of Epoch 46
    6.4.5 The Effect of Batchsize 47
    6.4.6 The Effect of Latent Dimension 47
    6.4.7 A Portfolio Optimization Example 47
7 Discussion 49
  7.1 Advantages of Generative Neural Network Portfolio Optimization 49
  7.2 Disadvantages of Generative Neural Network Portfolio Optimization 49
ii
-
8 Further Research and Conclusion 51
  8.1 Further Research 51
  8.2 Conclusion 52
A Weight of GMV portfolio
  A.1 Weights of GMV Portfolio
B VaR and CVaR of Stocks Using Normal Monte Carlo
  B.1 Part 1
  B.2 Part 2
  B.3 Part 3
C VaR of the First GAN Result
  C.1 Part 1
  C.2 Part 2
D Epoch Study
E Batchsize Study
F Latent Dimension Study
iii
-
List of Figures
3.1 Rolling correlation coefficient between AAK and ABB 10
4.1 Histogram of ABB 12
4.2 Comparison Between Histogram and Normal Distribution PDF 13
4.3 Comparison Between Histogram and Student’s t Distribution PDF 14
5.1 A brief description of the relation between three different concepts 18
5.2 LTU unit 20
5.3 Neural Network 21
5.4 Graph of Sigmoid function 24
5.5 Graph of Hyperbolic Tangent function 25
5.6 Graph of Rectified Linear Unit function 25
5.7 Graph of Exponential Linear Unit 26
5.8 Graph of Leaky ReLU 27
5.9 Graphical Explanation of Autoencoder 30
5.10 Graphical Representation of GAN 31
6.1 One path generated by Monte Carlo simulation 36
6.2 VaR of ABB using Monte Carlo simulation 37
6.3 CVaR of ABB using Monte Carlo simulation 38
6.4 VaR and CVaR of ABB using Monte Carlo simulation 38
6.5 Value of GMV portfolio in 10 years 39
6.6 VaR and CVaR of GMV portfolio using Monte Carlo simulation 40
6.7 Data structure of input 41
6.8 ABB price paths generated by GAN 42
6.9 Histogram comparison 43
6.10 Histogram comparison (daily return) 44
6.11 Heatmap of one generated data sample 44
6.12 Heatmap of real stock returns data 45
6.13 Rolling Correlation of Neural Network 46
6.14 Value of Portfolio in 10 years 48
iv
-
Abstract
This thesis aims to overcome the drawbacks of traditional portfolio optimization by employing Generative Deep Neural Networks on real stock data. The proposed framework is capable of generating return data that have statistical characteristics similar to the original stock data. The result is acquired using the Monte Carlo simulation method and presented in terms of individual risk. The method is tested on real Swedish stock market data. A practical example demonstrates how to optimize a portfolio based on the output of the proposed Generative Adversarial Networks.
v
-
Acknowledgements
I would like to thank everyone at Qognica AB for giving me the chance to write this thesis. It has been a truly enjoyable journey for me. I would also like to thank my supervisor Olha Bodnar for her constructive comments.
vi
-
Chapter 1
Introduction
1.1 Problem description

Portfolios are sets of financial assets selected to optimize the trade-off between risk and return. Optimal portfolios define a line in the risk vs. return plane called the efficient frontier. The optimization process as such is done by portfolio managers. A manager selecting assets will typically consider factors such as the risk aversion of the investor, the risk/return profile of each asset, the risk-free rate, and the borrowing rate. Advances in financial engineering have led to increased sophistication both on the side of the optimization instruments and in investors' understanding of risk. This trend could be accelerated using recent results in machine learning methods and in advanced computerized mathematical modelling tools.
In order to construct a portfolio, it is important to model the assets. The assets in a portfolio can be represented as a combination of weight, expected return, and risk. The weight w_i represents the portion of stock i in the portfolio. The expected return µ_i represents the investors' expectation of the future return of stock i. Normally, risk is measured by a function that considers the standard deviation σ.
The paper of Harry Markowitz [28] gives a solution of how to construct a portfolio based on the formulation introduced above. The weights of the stocks can be represented by a weight vector W^T = (w_1, w_2, ..., w_N). In order to calculate the variance of the whole portfolio, the covariance matrix Σ needs to be calculated:

$$\Sigma = \begin{pmatrix} \sigma_{1,1} & \cdots & \sigma_{1,N} \\ \vdots & \ddots & \vdots \\ \sigma_{N,1} & \cdots & \sigma_{N,N} \end{pmatrix}$$

where σ_{N,N} = σ_N² is the variance of asset N and σ_{i,j} is the covariance between assets i and j. Now the portfolio risk σ_p can be calculated using the formula:

$$\sigma_p^2 = W^T \Sigma W$$
The problem of minimizing the risk of the portfolio can be solved using techniques like the Lagrange multiplier. The calculated portfolio is called the Global Minimum Variance (GMV) portfolio.
If investors want more return while controlling the risk, the optimization problem can be formulated using the concept of the Sharpe ratio [43]. The Sharpe ratio S_p is calculated with the formula:

$$S_p = \frac{\mu_p - r_f}{\sigma_p}$$

where r_f is the risk-free rate, µ_p is the expected return of the portfolio, and σ_p is the volatility of the portfolio.
Solving the optimization problem that maximizes the Sharpe ratio gives a portfolio called the optimal portfolio. We can define the negative Sharpe ratio as a cost function, so that the optimization problem minimizes this cost function.
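The Sharpe-ratio cost function described above can be sketched in a few lines of Python. The numbers below (returns, covariances, weights, risk-free rate) are purely illustrative and not taken from the thesis data:

```python
import numpy as np

# Hypothetical inputs for three assets: expected returns, covariance matrix,
# risk-free rate, and a candidate weight vector summing to one.
mu = np.array([0.08, 0.12, 0.10])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.06]])
rf = 0.02
w = np.array([0.4, 0.4, 0.2])

mu_p = w @ mu                     # portfolio expected return, mu_p = w^T mu
sigma_p = np.sqrt(w @ Sigma @ w)  # portfolio volatility, sigma_p^2 = w^T Sigma w
sharpe = (mu_p - rf) / sigma_p    # Sharpe ratio S_p
neg_sharpe = -sharpe              # cost function to hand to a minimizer
```

Passing `neg_sharpe`, expressed as a function of the weights, to a numerical optimizer would then yield the maximum-Sharpe portfolio.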
In this thesis, some questions are raised. Is the mean-variance portfolio optimization framework a good portfolio selection method? Estimating risk by the standard deviation alone might not capture all the regularities that could identify risk patterns. Recently, with the development of computing power, Artificial Intelligence methods, and neural network algorithms in particular, have become more and more important in fields such as computer vision, due to their capacity to recognize patterns. This leads to the problem of this thesis: is it possible to apply neural network algorithms in the portfolio selection process? How does a neural-network-based portfolio perform compared with the Markowitz portfolio selection framework?
This thesis aims to answer these questions. A designed unsupervised neural network will try to extract features from the existing stock data. The network will then generate many return series that have characteristics similar to the original data, and a portfolio will be constructed based on the generated data. The designed portfolio will have minimum risk (measured by standard deviation, Value at Risk, or Conditional Value at Risk).
1.2 Literature review

The aim of academic studies in modelling time series is to find a model that can better describe time series characteristics. A more complete model of a time series will give a better prediction or estimation, and the prediction can then be used for optimization. A model that seeks a better representation of a time series has two parts: structure and parameters. When we choose a model, we want to select a structure that leads to the smallest number of parameters. Among the many time series models proposed for financial applications, the Autoregressive Integrated Moving Average (ARIMA) model [47] offers good prediction power with few parameters. As the common rule of Occam's Razor [6] states, the simpler model is preferred in any case, this being a standard regularization principle. Apart from this advantage, what makes ARIMA interesting in financial applications is its capability of simulating Brownian motion, one of the most common ways to model the prices of financial assets. When investors try to identify the parameters of ARIMA, they are essentially doing statistical modelling, which is built upon statistical assumptions [11]. However, if we want to build a model that makes no statistical assumptions, ARIMA is not the most suitable choice. Hence we want to build a model based on no statistical assumptions, enabling us to find correlations or patterns that are hard to recognize in a normal setting. Compared to the traditional method, this approach should give a more precise estimation of the financial characteristics. To address this question, we choose to implement ideas from Artificial Neural Network research, a widely developed field that gives us a new way to model time series (financial time series in this case).
Since the introduction of the neural network, many researchers have tried to implement artificial neural network techniques in financial applications. The article by Cavalcante [7] categorizes the machine-learning-related articles and summarizes their core implications according to research direction. From the author's summarization, the most common application of machine learning in finance is price prediction. Machine learning techniques can also be applied to other tasks such as feature extraction and outlier detection.
Under the category of price prediction, several articles pursue this goal. W. Bao [4] proposes a framework to predict stock prices. Index data are fed into a wavelet transform system, whose purpose is to denoise the price data. The data then go through a stacked autoencoder, an unsupervised learning method designed to extract deep features from the data. Subsequently, the extracted features are fed into a Long Short-Term Memory model (LSTM) in order to acquire a one-step-ahead prediction. According to the author, the proposed framework is capable of predicting price data with a coefficient of determination R² above 90%. This demonstrates the potential of Artificial Neural Networks in the financial market.
Another approach to predicting stock prices involves a commonly implemented way of processing data: technical indicators. Tegner [46] suggests a method to predict future financial asset prices. In his framework, the input of the neural network consists of prices and technical indicators such as moving average and momentum. A selection is then conducted in order to find the technical indicators that have more importance than others. To acquire the prediction from the neural network, the author feeds the selected data into a Recurrent Neural Network (RNN). According to the article, the most effective method achieves an accuracy of 52%. This is one of the articles that combine the idea of technical analysis with the power of Artificial Neural Networks.
One of the latest topics in artificial neural networks is the generative model. One framework, the Variational Autoencoder (VAE) [24], is gaining attention. It can be applied in many applications such as text generation [50][42] and image-related tasks [33]. The Variational Autoencoder can be seen as an extension of the autoencoder; its probabilistic characteristics enable it to generate different outputs with similar characteristics. It belongs to the family of generative autoencoders, which have the ability to generate new data, making them an interesting topic in financial applications.
Another generative model, the Generative Adversarial Network (GAN) [18], has also been applied in many research fields. X. Zhou [52] proposes a framework that implements GAN with high-frequency data to predict future stock data. The framework incorporates Long Short-Term Memory with GAN to predict the stock price. The performance is measured by two metrics, Root Mean Squared Relative Error (RMSRE) and Direction Prediction Accuracy (DPA). The results indicate that GAN could be a good topic in finance-related applications.
A strength of GAN is that there are many variations under the GAN umbrella. The Deep Convolutional Generative Adversarial Network (DCGAN) [34] is one common variation. This technique combines a deep convolutional neural network with GAN and is commonly applied in image-related applications. Another variation called Wasserstein GAN [3], together with its improved version [50], gives better results than the standard GAN structure in some applications.
Reinforcement learning can also be applied in the financial field [10][31]. These articles try to implement reinforcement learning in trade execution. The results demonstrate that reinforcement learning can be applied to the buy-or-sell trade execution problem.
To better understand the ideas and applications of some references, we also choose to run programs to test the different neural network models, which is reflected in our empirical studies part.
1.3 Outline

This thesis has the following structure. In Chapter 2 we give an introduction to the traditional model for portfolio optimization, giving the reader a better understanding of portfolio optimization. In Chapter 3, we discuss the drawbacks of traditional portfolio optimization. In Chapter 4, we give our methods of preprocessing data, including filling and scaling the data. Chapter 5 introduces the Artificial Neural Network and the Generative Adversarial Network implemented in this thesis. In Chapter 6, the empirical results and a study of the effect of hyperparameters are presented. In Chapter 7, the advantages and disadvantages are discussed; based on the results of the empirical studies, a discussion of the proposed Generative Neural Network portfolio optimization framework is presented. Finally, in Chapter 8, directions for further studies, especially directions that could improve the proposed framework, are suggested, and we give the conclusions of our proposed framework.
4
-
Chapter 2
Traditional Portfolio Optimization Method
Modern portfolio theory begins with the 1952 paper of Harry Markowitz [28]. This paper redefined investors' relationship with risk and return. Before the era of Modern Portfolio Theory, risk was not properly incorporated in the stock selection process; investors focused more on the return of individual stocks. Modern portfolio theory allows investors to make decisions in terms of both risk and return, and the weights of the constructed portfolio can be calculated by solving an optimization problem. The following sections give a brief introduction to the two most commonly applied modern portfolio theories.
2.1 Mean-Variance Portfolio Optimization

Mean-Variance Portfolio Optimization starts with the assumption that an investor at time t will hold the portfolio for a time period ∆t. The portfolio is judged based on its terminal value at time t + ∆t. Under this theory, the portfolio selection process is a trade-off between return and risk.

Suppose that an investor needs to construct a portfolio from a pool of N risky assets. Denote by w the weight vector, which represents the weight of each stock; it can be written as w = (w_1, w_2, ..., w_N). To represent that the investor fully invests his/her money, we introduce the first constraint of portfolio optimization.
$$\sum_{i=1}^{N} w_i = 1 \quad (2.1)$$

This constraint represents that the investor invests all the available money in risky assets; therefore the weights sum to one.
The investor then needs to estimate the expected returns of the stocks, either from a statistical model or by another method. The asset returns are denoted µ = (µ_1, µ_2, ..., µ_N). Next, the variance-covariance matrix needs to be calculated. The variance-covariance matrix Σ can be written as:

$$\Sigma = \begin{pmatrix} \sigma_{1,1} & \cdots & \sigma_{1,N} \\ \vdots & \ddots & \vdots \\ \sigma_{N,1} & \cdots & \sigma_{N,N} \end{pmatrix} \quad (2.2)$$

where σ_{i,j} denotes the covariance between assets i and j.
With these assumptions, the expected return of the portfolio µ_p is:

$$\mu_p = w^T \mu \quad (2.3)$$

and the variance of the portfolio σ_p² is:

$$\sigma_p^2 = w^T \Sigma w \quad (2.4)$$
Now we can form an optimization problem that minimizes the risk given a target expected return µ_0:

$$\min_{w} \; w^T \Sigma w$$

subject to

$$\mu_0 = w^T \mu, \qquad w^T I = 1, \quad I = (1, 1, \cdots, 1)^T$$
This optimization can be solved using Lagrange multipliers, and the solution is [15]:

$$w = j + k\mu_0 \quad (2.5)$$

where j and k are given by:

$$j = \frac{1}{ln - m^2}\, \Sigma^{-1}\left[nI - m\mu\right], \qquad k = \frac{1}{ln - m^2}\, \Sigma^{-1}\left[l\mu - mI\right]$$

and

$$l = I^T \Sigma^{-1} I, \qquad m = I^T \Sigma^{-1} \mu, \qquad n = \mu^T \Sigma^{-1} \mu$$
Now, for different choices of µ_0, the optimization problem can be solved to obtain the portfolio weights. The variance of each such portfolio can then be calculated using equation 2.4. This yields many pairs of expected return and standard deviation, which form the so-called efficient frontier.
6
-
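The closed-form frontier solution above translates directly into numpy. The three-asset inputs below are illustrative, not the thesis data; the code computes l, m, n, then j and k, and finally the frontier weights for a chosen target return:

```python
import numpy as np

# Illustrative inputs (not from the thesis): expected returns and covariance matrix.
mu = np.array([0.08, 0.12, 0.10])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.06]])
I = np.ones(len(mu))
Sinv = np.linalg.inv(Sigma)

# Scalars of the closed-form solution.
l = I @ Sinv @ I
m = I @ Sinv @ mu
n = mu @ Sinv @ mu

denom = l * n - m ** 2
j = Sinv @ (n * I - m * mu) / denom
k = Sinv @ (l * mu - m * I) / denom

mu0 = 0.10             # target expected return
w = j + k * mu0        # frontier weights, w = j + k * mu0 (equation 2.5)
var_p = w @ Sigma @ w  # portfolio variance via equation 2.4
```

Sweeping `mu0` over a range of targets and recording the resulting (standard deviation, expected return) pairs traces out the efficient frontier numerically.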
The efficient frontier starts from the Global Minimum Variance (GMV) portfolio. The optimization problem for this portfolio can be described as:

$$\min_{w} \; w^T \Sigma w$$

subject to

$$w^T I = 1, \quad I = (1, 1, \cdots, 1)^T$$

The solution of this optimization problem is [15]:

$$w = \frac{1}{I^T \Sigma^{-1} I}\, \Sigma^{-1} I$$
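The GMV weights follow in one line from this closed form; the covariance matrix below is an illustrative example, not the thesis data:

```python
import numpy as np

# Illustrative covariance matrix (not from the thesis).
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.06]])
I = np.ones(3)
Sinv = np.linalg.inv(Sigma)

# GMV weights: w = Sigma^{-1} I / (I^T Sigma^{-1} I)
w_gmv = Sinv @ I / (I @ Sinv @ I)
```

By construction the weights sum to one, and no other fully invested portfolio has a lower variance under this Σ.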
If the investor has an aversion toward risk denoted by λ, the optimization problem can be formulated as:

$$\max_{w} \; \left(w^T \mu - \lambda\, w^T \Sigma w\right)$$

subject to

$$w^T I = 1, \quad I = (1, 1, \cdots, 1)^T$$
2.2 CAPM

The Capital Asset Pricing Model (CAPM) is an equilibrium asset pricing model. The CAPM is founded on the following assumptions [14]:

1. Investors make decisions based on the expected return and the standard deviation of return.
2. Investors are rational and risk-averse.
3. Investors use Modern Portfolio Theory for portfolio diversification.
4. Investors invest over the same time period.
5. All investors have the same expected return and risk evaluation for all assets.
6. Investors can borrow or lend at the risk-free rate in unlimited amounts.
7. There are no transaction costs.
To introduce the formula of CAPM, we start with a more general case: the single-index model. The single-index model can be described as the linear regression between the index return and the stock return. In other words, the return of stock i can be described as:

$$R_i = a_i + \beta_i R_m$$

where a_i is the part of the stock return that is irrelevant to the market return, R_m is the return of the market, and β_i is a constant that describes the relation between the market return and the return of the stock.
Then rewrite a_i as:

$$a_i = \alpha_i + e_i$$

where α_i is the mean value of a_i and e_i is the random part of a_i, with expected value 0. The return of stock i can now be written as:

$$R_i = \alpha_i + \beta_i R_m + e_i$$

The correlation between e_i and R_m is then 0. Now we give:

1. The mean return: $\bar{R}_i = \alpha_i + \beta_i \bar{R}_m$
2. The variance of the return: $\sigma_i^2 = \beta_i^2 \sigma_m^2 + \sigma_{e_i}^2$
3. The covariance between the returns of stocks i and j: $\sigma_{ij} = \beta_i \beta_j \sigma_m^2$
The proofs of the above formulas can be found in [13]. The formulation of CAPM is written as:

$$R_i = R_f + \beta_i (R_M - R_f)$$

This is the standard form of CAPM. According to the formula, the expected return on a particular stock can be calculated from the beta of the stock, the risk-free rate on the market, and the expected return of the market. This standard form of the CAPM is also known as the Sharpe-Lintner-Mossin form. There are many other forms that try to solve some of the problems in the standard form.
In theory, CAPM is a good estimation of the expected return of stocks. In practice, however, implementation is more complicated. From the CAPM formula, we can see that the variables are expressed in terms of future values: investors need to estimate the return of the market and the future beta of the stocks. This exposes a problem: large-scale systematic data for estimating these expectations does not exist, and therefore the accuracy of CAPM cannot be guaranteed.
8
-
Chapter 3
Limitation of Traditional Portfolio Optimization
3.1 Drawbacks within Assumptions

Traditional Portfolio Optimization is very useful in many ways; however, it has many drawbacks. The Lagrange multiplier is implemented to solve the optimization problem, and for the solution to be optimal it is necessary to fulfil the Karush-Kuhn-Tucker conditions [26]. From the necessary conditions, the whole process needs to be stationary; however, this is not necessarily the case, as there is no proof for it. Moreover, the optimization process does not account for the stochastic characteristics of the data. This kind of optimization is therefore not robust enough, since it ignores one of the most important characteristics of stock price data. When we discuss potential improvements to this type of optimization, stochastic properties should not be ignored; after all, investors want a portfolio that gives them a secure position in most cases.
3.2 Drawbacks within Applications

When investors try to implement the Markowitz portfolio optimization theory in reality, they may face another weakness of Modern Portfolio Theory: the difficulty of estimating the required inputs. Starting with the GMV portfolio, the required input of GMV portfolio optimization is the covariance matrix. The core idea behind this optimization is that, by combining the covariances and variances of the stocks, the investor can find the right combination to minimize the variance of the designed portfolio. The problem with this idea is that covariance is not a good representation of the relations between two stocks. Specifically, a single number cannot explain how two stocks move together. In Figure 3.1 we present the rolling correlation coefficient between the stocks AAK and ABB.
9
-
Figure 3.1: Rolling correlation coefficient between AAK and ABB
As can be seen in the figure, the correlation between the stocks varies considerably; therefore an optimization based on the covariance matrix cannot construct a portfolio that guarantees the minimum portfolio variance in the future.
Besides the difficulty of finding an accurate covariance matrix, if investors also care about return they need to provide another input: the expected return vector. The expected return is what investors expect from a stock in the future. It is very hard to estimate the future return of a stock, so investors may produce inaccurate estimates. Since maximum-Sharpe-ratio portfolio optimization is very sensitive to the expected return [14], this results in a scenario that can be described as "garbage in, garbage out": if investors have a wrong estimation of the expected return, the constructed portfolio cannot perform well.
10
-
Chapter 4
Preprocessing
Sometimes the original data contain missing values, which should either be ignored or filled in. In this thesis, the core idea is that missing values should be filled. In the following section, an approach is proposed to fill the missing data. Note that we carry out the distribution studies only to inform how to fill the missing data. Since the missing data do not contribute much to the whole dataset, this does not change our premise in the main neural network application part: we make no statistical assumption about the distribution of asset returns.
4.1 Distribution of Daily Return

To compute the daily return of stocks, the log return of the daily stock price data is selected to represent the daily return. The formula for calculating the log return is [47]:

$$R_i = \ln(P_f / P_i)$$

where R_i is the return at the end of the period, P_f is the price at the end of the period, and P_i is the price at the start of the period.
The reason for choosing the log return is that the return over a period can be calculated by summing the individual daily log returns. For example, denote by P_n the price at time n; then the log return over the period from 1 to n can be calculated as:

$$R_{1-n} = \ln\left(\frac{P_n}{P_1}\right) = \ln\left(\frac{P_2}{P_1} \cdot \frac{P_3}{P_2} \cdots \frac{P_n}{P_{n-1}}\right)$$

And we know that:

$$\ln(a \cdot b) = \ln(a) + \ln(b)$$

Consequently:

$$R_{1-n} = \sum_{i=1}^{n} R_i$$
Also, if we use the simple (normal) return, the return is bounded: it ranges from −1 to +∞, because a stock price cannot lose more than its current value. This creates difficulties in studying or emulating daily returns. The log return, on the other hand, ranges from −∞ to +∞, making it a good choice for calculating daily returns in our studies.
To identify the statistical properties of the daily stock returns, it is instructive to plot a histogram of the daily returns. Figure 4.1 is the histogram of the ABB daily return. This gives us a first glimpse of the statistical properties of stock returns.
Figure 4.1: Histogram of ABB
With the histogram of the daily log returns, we now try to find the distribution that best describes the daily returns of the stock. Some potential candidates are suggested. In the following part, these distributions are introduced, and the Probability Density Function (PDF) of each fitted distribution is compared with the histogram of the stock returns.
12
-
4.1.1 Normal Distribution

A random variable X follows the normal distribution if its probability density function can be written as [39]:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty \quad (4.1)$$

where −∞ < µ < ∞ and 0 < σ² < ∞. We denote µ as the mean and σ² as the variance. A random variable with mean µ and variance σ² can then be written as X ∼ N(µ, σ²).
A random variable generated from the normal distribution is not suitable in this case. Figure 4.2 shows the PDF of the normal distribution with mean and variance estimated from the historical daily returns; one can see that the normal PDF does not fit the histogram well.
Figure 4.2: Comparison Between Histogram and Normal Distribution PDF
4.1.2 Student's t Distribution

To introduce the Student's t distribution we first introduce the concept of the gamma function. The gamma function Γ(z) can be written as:

Γ(z) = ∫_0^∞ x^(z−1) e^(−x) dx
Then a random variable X follows the Student's t distribution with ν degrees of freedom if its probability density function can be written as [39]:

f(x; ν) = Γ((ν+1)/2) / (√(πν) Γ(ν/2)) · (1 + x²/ν)^(−(ν+1)/2),   −∞ < x < ∞
The comparison between the fitted Student's t distribution and the histogram is presented in Figure 4.3.
Figure 4.3: Comparison Between Histogram and Student's t Distribution PDF
The Student's t distribution can be viewed as a generalization of both the Cauchy distribution and the normal distribution. First, we look at the probability density function of the Cauchy distribution [39]:

f(x) = 1/(π(1 + x²)),   −∞ < x < ∞

This is the special case of the Student's t distribution with degrees of freedom ν = 1. Now, when we let the degrees of freedom ν → ∞, we have:

lim_{ν→∞} f(x; ν) = (1/√(2π)) e^(−x²/2),   −∞ < x < ∞

Comparing this with Equation 4.1, we find that it is the PDF of the standard normal distribution.
4.1.3 Complete Data

Before filling the missing data, we need to select which stocks to fill. Because the missing data are generated from the fitted distribution, if our inputs to the neural network are filled with too many random values, the true information inside the real stock prices will be distorted. Hence we select only stocks whose missing data span less than one-fourth of the total length of the data.

In terms of distribution, the Student's t distribution is chosen because it fits the statistical properties of daily stock returns well.
To obtain the parameters of the distribution we use the Python package SciPy [48]: its functions fit the distribution to the return data and generate daily returns that follow the fitted distribution.
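As a sketch of this step: `scipy.stats.t.fit` estimates the degrees of freedom, location and scale, and `rvs` then draws synthetic returns from the fitted distribution. The return series below is simulated, since the thesis data are not reproduced here:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder for the observed daily log returns of one stock.
returns = stats.t.rvs(df=4, loc=0.0005, scale=0.01, size=2000, random_state=rng)

# Fit a Student's t distribution; fit() returns (df, loc, scale).
df, loc, scale = stats.t.fit(returns)

# Draw synthetic daily returns from the fitted distribution,
# e.g. to fill gaps in the historical series.
filled = stats.t.rvs(df, loc=loc, scale=scale, size=50, random_state=rng)
print(df, loc, scale)
```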
4.2 Whether to Include Technical Indicators

In the finance industry, technical indicators are applied in many trading strategies. However, in its essence, technical analysis is not an exact science [32]. Because it reflects the market price trend, technical analysis aims to detect the market trend at an early stage. Technical analysis holds that crowd psychology affects the stock price, and investors who study the market trend will decide to buy or sell with some degree of confidence.
Some studies also use technical indicators as input; for example, Tegner 2018 [46] and Widegren 2017 [49] both combine price data and technical indicators as input to an artificial neural network.
However, in this thesis technical indicators will not be used as input to the designed neural network. Technical indicators and technical analysis aim to predict the future asset price, but prediction is not important here: we do not care about the price tomorrow or next month. The focus of this work is to estimate the risk of stocks over any given period, the potential loss of the portfolio to be exact. Therefore, technical indicators cannot improve the risk estimation, and would potentially generate noise that affects our estimates.
4.3 Scaling

Before feeding data into the neural network, it is important to scale the data, because the ranges of different inputs may differ from each other. If data are fed into a neural network or machine learning algorithm without any preprocessing, the results will be deeply compromised by the differing ranges. In the following sections, several scaling techniques are introduced.
4.3.1 Standard Score

The formula for the standard score can be expressed as [53]:

X̂ = (x − µ) / σ

where µ is the mean of the data and σ is the standard deviation of the data.

The standard score is the most common scaling technique and is applied in many machine learning algorithms. However, in our application this method is not appropriate, because our designed network requires inputs in the range −1 to 1. The standard score scales data based on the standard deviation, and from Figure 4.2 one can see that the stock data exhibit fat tails; therefore the scaled data cannot fulfil the requirement of the proposed framework.
4.3.2 Min-Max Scaler

Another common technique that could be implemented in our application is the Min-Max scaler. The formula for the scaling can be expressed as [35]:

X̂ = (x − min(x)) / (max(x) − min(x))

This transforms the data into the range 0 to 1, with the whole data set scaled by its maximum and minimum. Although this technique ensures that the result lies between 0 and 1, in the context of stock data the whole data set is scaled according to the outliers; the scaled data will be overly concentrated in a narrow interval, resulting in inaccurate outputs from the neural network.
4.3.3 Robust Scaler

To solve the issue we faced with the Min-Max scaler, we can implement another type of scaler: the Robust scaler. Its formula can be written as [35]:

X̂ = (x − Q2) / (Q3 − Q1)

where Q1 is the 25th percentile of x, Q2 is the median of x, and Q3 is the 75th percentile of x.

With the Robust scaler, the scaled data are more evenly distributed. In the presence of large outliers, compared to the Min-Max scaler, the Robust scaler allows the data to be spread over a wider range.
4.3.4 Max Abs Scaler

The Max Abs scaler is a variant of the Min-Max scaler; compared to the Min-Max scaler, it scales the data to the range −1 to 1. The formula for the Max Abs scaler is [53]:

X̂ = x / max(|x|)
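The four scalers above are all available in scikit-learn (the library choice is an assumption; the thesis does not name its scaling implementation). A small comparison on a toy series with one outlier:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler, MaxAbsScaler

# Toy return series with one large outlier (hypothetical values, for illustration).
x = np.array([[0.01], [-0.02], [0.015], [-0.01], [0.3]])

for scaler in (StandardScaler(), MinMaxScaler(), RobustScaler(), MaxAbsScaler()):
    scaled = scaler.fit_transform(x)
    print(type(scaler).__name__, scaled.ravel().round(3))

# MaxAbsScaler divides by max(|x|) = 0.3, so all values land in [-1, 1].
```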
4.3.5 Power Transform

The power transform is a technique that tries to transform data into more normally distributed data using a power function. Under this category there are two major approaches: the Box-Cox transform [40] and the Yeo-Johnson transform [51]. In this application we have to use the Yeo-Johnson transform, because the Box-Cox transform can only be applied to strictly positive values, while returns can be negative. Since the Box-Cox transform is not implemented, it will not be introduced further. The Yeo-Johnson transform is formulated as:

y_i^(λ) =
  ((y_i + 1)^λ − 1)/λ                  if λ ≠ 0, y_i ≥ 0
  log(y_i + 1)                         if λ = 0, y_i ≥ 0
  −[(−y_i + 1)^(2−λ) − 1]/(2 − λ)      if λ ≠ 2, y_i < 0
  −log(−y_i + 1)                       if λ = 2, y_i < 0

where 0 ≤ λ ≤ 2. The Yeo-Johnson transform is the choice for this application because it can be applied to negative data, and the transformed data will resemble a normal distribution. After performing the power transform we apply the Max Abs scaler to make the scaling compatible with our network's required inputs. Because the power transform is not a linear scaler, outliers have less impact on the Max Abs scaler's result than they would without the power transform.
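A sketch of the combined preprocessing described above, using scikit-learn's `PowerTransformer` (Yeo-Johnson) followed by `MaxAbsScaler` (the library choice is an assumption; the thesis only names the transforms):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer, MaxAbsScaler

# Hypothetical fat-tailed return series for illustration.
rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=(500, 1)) * 0.01

# Yeo-Johnson transform first, then scale into [-1, 1] with MaxAbs.
pipe = make_pipeline(PowerTransformer(method="yeo-johnson"), MaxAbsScaler())
scaled = pipe.fit_transform(x)
print(scaled.min(), scaled.max())
```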
Chapter 5
Artificial Neural Network
5.1 Introduction to Neural Network
5.1.1 Relation between Different Concepts

With the evolution of computer-related technologies, especially the GPU1, artificial intelligence has become applicable rather than merely a proposed concept. Terminology like machine learning and neural network appears more and more frequently in articles and media. However, some readers may have difficulties understanding the relations between these concepts, so in this section we give a small introduction. Figure 5.1 is a brief description of the three most common concepts in the artificial intelligence research field.
Figure 5.1: A brief description of the relation between three
different concepts
The concept of artificial intelligence began in the 1950s; the core idea of A.I. is to automatically perform the intellectual tasks normally done by humans.

1 https://www.nvidia.com/en-us/about-nvidia/ai-computing/

In the following years, this concept was continuously developed, and under it a new approach was proposed: machine learning.
In the traditional model-based method, data and rules are fed into the system, and the result is calculated according to them. Machine learning implements a whole new paradigm: a relation is found from the given inputs and results. In other words, a machine learning program aims to replicate the given result from a given input. Once a machine learning model is trained, we can give it a new set of data as input and obtain output from the algorithm.
To explain the concept of deep learning, we first elaborate on how a machine learning algorithm works. In a machine learning program, three types of information are necessary: input data, which can for example be numbers, pictures, or sound; examples of the known results, which can be the tags of pictures in an image recognition task; and finally a measurement of the algorithm's performance. This measurement captures the distance between the algorithm's result and the expected output. Adjustments are then made according to this measurement, and the output of the machine learning algorithm improves accordingly.
Now we come to the difference between machine learning and deep learning. Contrary to first impressions, a deep learning algorithm does not necessarily give a deeper interpretation of the data than a normal machine learning algorithm. The term deep learning describes algorithms that feed input data through a successive set of layers that yield increasingly meaningful representations [16]. By doing this, the input data can be represented by different layers of interpretation. The depth of a model is the number of layers contributing to it. Under the definition of deep learning, the artificial neural network is one commonly applied technique. In the following section, a detailed explanation of the neural network is presented.
5.1.2 Definition of Artificial Neural Network

The concept of the artificial neural network (ANN) borrows the idea of how biological neurons work in real life. The term artificial neural network is defined as an interconnected assembly of simple elements (nodes or units) whose functionality is similar to that of a biological neuron. The processing ability is stored in the inter-unit weights, which are obtained by learning [21].
The concept of the artificial neural network starts with the paper by McCulloch and Pitts (1943) [29], which proposes a computational model that imitates the way neurons work when performing complex computation. This was the world's first artificial neural network structure. One of the simplest ANNs, the Perceptron, was proposed by Frank Rosenblatt in 1957 [38]. It is a variation of another network, the Linear Threshold Unit (LTU). Figure 5.2 is a description of the LTU.
Figure 5.2: LTU unit
First, the LTU computes the weighted sum of its inputs, z = w_1 x_1 + w_2 x_2 + · · · + w_n x_n, which can be rewritten as z = wᵀx. Then the LTU applies a step function to the weighted sum z, and the output is represented as G(x) = step(wᵀx). With the LTU defined, we can introduce the Perceptron: a Perceptron consists of one layer of LTUs, with each neuron connected to the input layer [17]. When we stack multiple Perceptrons together, we create a Multi-Layer Perceptron (MLP).
In an MLP we have an input layer, which handles the input. In the middle, one can choose to have one or multiple layers of LTUs; these middle layers are called hidden layers. The number of hidden layers is predetermined and can be adjusted according to the application. The last hidden layer is connected to an output layer. If an ANN has two or more hidden layers, it is called a Deep Neural Network (DNN). Figure 5.3 is an example of a deep neural network; as the figure shows, this network has 2 hidden layers with 5 and 4 units respectively.
Figure 5.3: Neural network with an input layer in ℝ⁶, hidden layers in ℝ⁵ and ℝ⁴, and an output layer in ℝ⁶
For convenience, in the remainder of this thesis the term neural network refers to the artificial neural network.
5.1.3 Differences between ANN and Statistical Method

Traditionally, when one wants to complete a task, a common option is a statistical method. To explain the philosophy of the ANN, assume that we want to solve a practical problem: identifying handwritten numbers. To complete this task using statistical techniques, a model has to be proposed that appropriately represents the relationships between inputs and outputs. Denote this model as y = f(x, β), where x is the input (picture) and y is the desired output (identified number). The input of this task can be a large amount of data, and the function f is unknown. Consequently, a relatively accurate model requires a large number of parameters, which means that the model for identifying handwritten numbers is large and complex.
The neural network, on the other hand, takes a different approach. Compared to the statistical approach, a neural network model has far more parameters, and therefore many possible parameter combinations. In practice, different combinations of parameters can give the same output, making the parameters inside the neural network hard to interpret. To be more precise, a neural network works as a black-box method and does not give an interpretable result from its parameters [17]. However, in this example we just want a model that can recognize handwritten numbers and do not care about the relationships between pixels; the neural network works well in this kind of application. Moreover, in the financial market there are hundreds of variables, so finding a robust statistical model is very hard. A neural network is a good choice in this type of situation.
5.2 Training Neural Network

First, in a linear model, the relation between input and output can be expressed as:

f(x) = wᵀx + b

where x is the input, w the weights, and b the bias. This can be elaborated as:

f(x) = w_1 x_1 + w_2 x_2 + · · · + w_n x_n + b

The output of one layer can then be fed as input to the subsequent layer. However, the previous equation can only represent a linear relationship, so it is important to make the output non-linear. To achieve this, an activation function is applied, and the output becomes:

u = τ(w_1 x_1 + w_2 x_2 + · · · + w_n x_n + b)
where u is the output and τ the activation function.

Next, we demonstrate how the parameters in the model are estimated. The weights are chosen to minimize the discrepancy between the output and the expected output, in other words the error. The error of a model can be expressed as the Mean Squared Error:

E = Σ_l Σ_i (ŷ_{li} − y_{li})²
Other error measures can substitute for the MSE. When the structure and loss function of the designed network are known, the neural network problem becomes a non-linear optimization problem. This type of problem can be solved in many ways; for neural networks the choice is the backpropagation (gradient descent) algorithm. The weight w_i is altered according to the error; the simplest expression of this idea is:

∆w_i = α dE/dw_i   (5.1)

Denote w(k) as the weights at iteration k; the weights at iteration k+1 then become:

w(k+1) = w(k) + ∆w(k)   (5.2)

Since we want to minimize the error, the step should be taken in the direction opposite to the gradient, so combining Equations 5.1 and 5.2:

w(k+1) = w(k) − α dE/dw(k)   (5.3)
This is the simplest form of gradient descent. One of its drawbacks is that it requires the sum of all gradients, which is computationally heavy. Stochastic gradient descent is designed to mitigate this workload: instead of computing the sum of all gradients, it randomly selects observations for the gradient calculation. Writing the total error as:

E(w) = Σ_{n=1}^{N} E_n(w)

we then update:

w(k+1) = w(k) − α dE_n/dw(k)
In practical applications, both gradient descent and stochastic gradient descent require computing the gradient. This is achieved using the chain rule, considering the output of the network as a function of the weights.
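The update rule in Equation 5.3, in its stochastic form, can be sketched on a toy linear model; all data here are made up for illustration:

```python
import numpy as np

# Stochastic gradient descent on f(x) = w*x + b with squared error,
# one randomly chosen observation per step.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 3.0 * x + 0.5 + rng.normal(0, 0.01, 200)  # true w = 3, b = 0.5

w, b, alpha = 0.0, 0.0, 0.1
for _ in range(5000):
    i = rng.integers(len(x))        # pick one random observation
    err = (w * x[i] + b) - y[i]
    w -= alpha * err * x[i]         # dE/dw for a single sample
    b -= alpha * err                # dE/db for a single sample

print(round(w, 2), round(b, 2))
```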
5.2.1 Hyperparameters

Hyperparameters are the parameters that are specified before the training process; unlike the normal parameters in a neural network, hyperparameters cannot be derived or improved by the normal training process. Therefore it is important to select a good set of hyperparameters. In some cases, hyperparameter optimization techniques can be applied to improve accuracy; however, due to the length and complexity of this topic, such techniques are not implemented in this thesis. Instead, the chosen hyperparameters are given, and readers can improve the network using their own methods. To explain this terminology, we start from the gradient descent technique, which as introduced before is an iterative method. Following the previous notation, the parameter α is called the learning rate. This parameter is itself a hyperparameter that needs to be optimized in some applications.
If we could feed all of our data to the neural network at once, there would be no need for a batch size. In almost all cases, however, this cannot be achieved because the data are too large for the computer to handle at once. To solve this problem, we divide the data into smaller pieces and update the weights of the neural network once per piece; at the end, we obtain the weights of the trained network. The size of such a piece is the batch size.
One epoch is one pass of the whole data set through the neural network. Readers unfamiliar with this topic may naturally ask: why do we need more than one epoch, that is, why feed the same data more than once? Gradient descent is an iterative method, so for a limited data set one epoch will not give a satisfying result, in other words an underfitted result. However, for a very large number of epochs, the weights inside the network become too focused on the training data, resulting in an overfitted model.
With the epoch defined, we can define the iteration, which describes how many batches are required to finish one epoch. This is not a hyperparameter, since it is already determined by the batch size.
5.3 Activation Function

As illustrated before, a neural network requires an activation function to transform a linear function into a non-linear one. In the following part, we introduce some commonly implemented activation functions [5].
5.3.1 Sigmoid function

The sigmoid function, also known as the logistic function, is one common activation function implemented in neural networks. The formula for the sigmoid function is:

f(x) = 1 / (1 + e^(−x))

The graph of the sigmoid function is given in Figure 5.4.
Figure 5.4: Graph of Sigmoid function
5.3.2 Hyperbolic Tangent function

The hyperbolic tangent function (tanh) is another commonly used activation function. It is a zero-centered function with limits between −1 and 1. Its output can be calculated using the following formula:

f(x) = (e^x − e^(−x)) / (e^x + e^(−x))

Figure 5.5 is the graph of the tanh function.
Figure 5.5: Graph of Hyperbolic Tangent function
5.3.3 Rectified Linear Unit function

The Rectified Linear Unit function (ReLU) is one of the most commonly used activation functions in deep learning. The ReLU function can be represented as:

f(x) = max(0, x)

Because the positive part of the function is linear, the ReLU function is easy to optimize using gradient descent methods. Figure 5.6 is the graph of this function.
Figure 5.6: Graph of Rectified Linear Unit function
5.3.4 Exponential Linear Unit

The Exponential Linear Unit (ELU) is a variation of ReLU that can converge faster than the regular version. The ELU function is formulated as:

f(x) = x if x > 0;   α(e^x − 1) if x ≤ 0

The difference between ReLU and ELU is in the negative part of the function: ELU decays smoothly towards −α, while ReLU cuts off sharply. Figure 5.7 is the graph of this function, where α = 0.7.
Figure 5.7: Graph of Exponential Linear Unit
5.3.5 Leaky ReLU

Leaky ReLU is a variant of ReLU with the formula:

f(x) = x if x > 0;   αx if x ≤ 0

Figure 5.8 is the graph of Leaky ReLU, where α = 0.1.
Figure 5.8: Graph of Leaky ReLU
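The five activation functions above can be written as small vectorized helpers; a minimal sketch:

```python
import numpy as np

# The five activation functions from this section as NumPy helpers.
def sigmoid(x):      return 1.0 / (1.0 + np.exp(-x))
def tanh(x):         return np.tanh(x)
def relu(x):         return np.maximum(0.0, x)
def elu(x, a=0.7):   return np.where(x > 0, x, a * np.expm1(x))  # a*(e^x - 1) for x <= 0
def leaky(x, a=0.1): return np.where(x > 0, x, a * x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # approx [0.119, 0.5, 0.881]
print(relu(x))     # [0., 0., 2.]
```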
5.4 Approaches to Prevent Overfitting

Overfitting is a common problem in machine learning, and neural networks are no exception. Overfitting compromises the performance and accuracy of the neural network in its actual applications, so it is necessary to take measures against it. We therefore introduce several measures to prevent overfitting.
5.4.1 Increase Data Size

The most obvious and easiest solution is to increase the size of the data. After all, the cause of overfitting is that there is not enough data to fully train a complicated neural network. However, in many circumstances it is not possible to acquire more data; in this case, there is no more stock return data than what exists on the market. Therefore, other measures are needed to prevent overfitting the training data.
5.4.2 Reduce Size of Neural Network

Another approach is to reduce the size of the neural network. To elaborate on this concept, we first define the complexity (capacity) of a neural network: the capacity of a network is the number of trainable parameters it has [23]. A more complex neural network has more parameters, which means more capacity to learn and even perfectly represent the training data. For example, if our training data consist of 10,000 numbers, a network with 200,000 trainable parameters will easily find a perfect fit for the training set; in this case, we have an overfitted neural network.
An overfitted model will not provide any meaningful prediction for a new set of data, because the network is a perfect representation of the training data rather than a general description of the data. Naturally, to solve this problem, the number of trainable parameters in the network needs to be reduced. On the other hand, a network with only 100 trainable parameters will also not give any meaningful prediction, because it lacks the capacity to represent the training data; in the actual application phase, such a model cannot give a meaningful result. Hence, finding the right number of trainable parameters is the key to training a well-performing neural network.
5.4.3 L1 Regularization

One technique that can mitigate the effect of overfitting is to regularize the size of the parameters. This can be achieved by adding a regularization term to the loss function. Denote L as the loss function; this process can be expressed as [46]:

L(f(w, b), y) + λ R(w)

where R(w) is the regularization function. If the regularization function is the L1 norm, it is called L1 regularization:

R(w) = ||w||₁ = Σ_{k=1}^{L} Σ_{i,j} |W_{i,j}^{k}|

5.4.4 L2 Regularization

When we replace the L1 norm with the L2 norm, we construct L2 regularization. The L2 norm is formulated as:

R(w) = ||w||₂² = Σ_m Σ_k W_{mk}²
The adjusted loss function is then minimized by the neural network, which means that the loss is minimized under the condition that the elements of the weights do not become too large. The parameter λ represents the amount of regularization. It should be chosen in a balanced way, because a large λ will result in an underfitted model.
5.4.5 Dropout

Another way to prevent overfitting is to apply the dropout technique, proposed by Srivastava et al. in 2014 [44]. First, denote l ∈ {1, ..., L} as the layer index, u^(l) as the input to layer l, y^(l) as the output from layer l, and w^(l) and b^(l) as the weights and bias at layer l. A normal feed-forward network is then represented as:

u_i^(l+1) = w_i^(l+1) y^(l) + b_i^(l+1)
y_i^(l+1) = τ(u_i^(l+1))

where τ is the activation function. After implementing dropout, the network becomes:

r_j^(l) ∼ Bernoulli(p)
ỹ^(l) = r^(l) ∗ y^(l)
u_i^(l+1) = w_i^(l+1) ỹ^(l) + b_i^(l+1)
y_i^(l+1) = τ(u_i^(l+1))

where ∗ denotes the element-wise product and the vector r^(l) follows the Bernoulli distribution with probability p.
5.5 Supervised Learning and Unsupervised Learning

The traditional application of the neural network is classification: given a set of data and the corresponding labels or tags, an artificial neural network is trained on the data and the corresponding labels. This kind of task is called supervised learning, because the labels or examples are provided by humans; in other words, the algorithm tries to replicate human judgments or decisions. To be more precise, supervised learning requires us to provide input data as well as responses or outcomes, and its job is to predict the output for a given input.
However, in the stock or derivative market it is hard to implement supervised learning. The reason behind this argument is the difficulty of finding robust examples or tags for the neural network. For example, to train a neural network that can distinguish good stocks from under-performing stocks, we need to give the network examples, but recognizing a good stock is not as easy as it sounds. First, there is no universally recognized standard for a well-performing stock, and we cannot guarantee that our examples are correct or will have a good return in the future. This leads to the second difficulty of implementing supervised learning: enforcing biases. Even if we manage to find tags or examples for a supervised neural network, the designed network may amplify human judgments and potentially give false answers. After all, we cannot differentiate luck from skill; put another way, we cannot tell whether investors earn money because they are lucky or because they possess the necessary expertise.
To solve this problem, a different type of learning is applied: unsupervised learning. Unsupervised learning needs only input data and tries to extract features by itself, without any human guidance and without making predictions. Consequently, unsupervised learning tends to find interesting features that cannot be found by humans, making it suitable for applications in the stock market and portfolio optimization, because it may reveal interesting new features. Apart from this advantage, unlabelled stock data are easier to acquire, since tagging data takes extra effort.
Under the category of unsupervised learning, one particular technique is gaining attention in research and applications: the autoencoder. An autoencoder is a network whose input and output data are the same. A simple explanation of the autoencoder is illustrated in Figure 5.9:
Figure 5.9: Graphical Explanation of Autoencoder (Data → Encoder → Compressed Data → Decoder → Data)
First, the data are fed into an encoder, whose output has fewer dimensions than the original data. Next, a decoder processes the compressed data and returns data with the same dimensions as the input. An autoencoder thus forces the data to be compressed into a lower dimension; by doing so, the algorithm has to extract useful features from the data in order to minimize the difference between the original data and the recovered data.
Apart from feature extraction, one type of autoencoder, the generative autoencoder, can also randomly generate data similar to the original data. This type of generation leads to a new direction for Monte Carlo simulation: more accurate random data give a better estimate of the portfolio risk, and hence the portfolio optimization will select a lower-risk portfolio.
5.6 Generative Adversarial Network

The Generative Adversarial Network (GAN) is a type of generative network introduced by I. Goodfellow in 2014 [19]. It consists of two parts: a generator and a discriminator (called a critic in some literature). The core idea of a GAN is a competition between the generator and the discriminator. The discriminator tries to identify whether a sample is real or not; it is a supervised network with binary output. In this setting, the examples for the network are the real stock price data.

The job of the discriminator is to classify data into two groups: real and fake. At the same time, the generator tries to generate sample data that will be classified as real. In a real-world analogy, the generator is a counterfeit artist who tries to create a fake copy of a world-famous painting, while the discriminator is an art specialist who can distinguish fake paintings from real artwork. Under this metaphor, the training process is the counterfeit artist repeatedly sending paintings to the art specialist, who judges whether each painting is real or not. Once the paintings are classified as real artwork, the counterfeit artist can produce an unlimited number of paintings that share the characteristics of a real painting. A graphical representation of this process is shown in Figure 5.10.
Figure 5.10: Graphical Representation of GAN (noise → Generator → generated data; real data and generated data → Discriminator → output, which trains both networks)
To help readers further understand the GAN, a more detailed explanation follows. First, denote the discriminator and generator as functions D and G, with parameters θ_D and θ_G. Under this notation, the optimization process of the GAN can be represented as [18]: the discriminator tries to minimize C_D(θ_D, θ_G) while only changing the parameter θ_D, and the generator tries to minimize C_G(θ_D, θ_G) while only changing the parameter θ_G. This process is in essence an optimization problem whose objective is to find the minimum point (sometimes only a local minimum is found).
As introduced before, the training process is carried out using the stochastic gradient descent method. Denote the input noise to the generator as x_G and the observed variable (processed stock returns in this case) as x_D. Because the neural network algorithm cannot process the entire data set at once, the data are divided into several batches, which are then fed into the network to update the gradients. Note that the two updates are conducted simultaneously [18]: at each step, the discriminator adjusts the parameter θ_D to reduce the value of C_D, while the generator adjusts the parameter θ_G to reduce the value of C_G.
5.6.1 Cost function

As previously explained, training a neural network is an optimization problem, so it is important to specify the cost function. The discriminator in a GAN can be constructed as a normal deep neural network with the binary cross-entropy loss function, because in this case the tags for the data are real and fake; this is a binary classification problem.

The cost function of the discriminator is given as [19]:

C_D(θ^(D), θ^(G)) = −(1/2) E_{x_D∼p_data} log D(x_D) − (1/2) E_{x_G} log(1 − D(G(x_G)))   (5.4)
Readers familiar with neural networks will recognize this as the standard form of the cost function for a binary classification problem. The discriminator implements the core idea of binary classification, with the actual data and the data generated by the generator as the two tagged classes.
In terms of the cost function of the Generator, the objective is to minimize the cross-entropy between the output of the Generator and the actual data. The cross-entropy loss function measures the distance between the empirical data distribution and the model distribution, so by repeated training the generated data will acquire a similar distribution. The cost function of the Generator is given as [18]:
C_G = −(1/2) E_{x_G} log D(G(x_G))   (5.5)
This cost function can be read as the Generator trying to maximize the log probability that the Discriminator classifies the generated data as real, that is, fails to tell the generated data apart from the actual data.
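To make the two cost functions concrete, they can be sketched numerically. The following NumPy fragment illustrates equations (5.4) and (5.5) only; the discriminator outputs are made-up numbers, not values from the trained network:

```python
import numpy as np

def discriminator_cost(d_real, d_fake):
    """Equation (5.4): binary cross-entropy over a real and a generated batch.

    d_real: discriminator outputs D(x_D) on real samples, values in (0, 1).
    d_fake: discriminator outputs D(G(x_G)) on generated samples.
    """
    return -0.5 * np.mean(np.log(d_real)) - 0.5 * np.mean(np.log(1.0 - d_fake))

def generator_cost(d_fake):
    """Equation (5.5): the generator wants D(G(x_G)) to be close to 1."""
    return -0.5 * np.mean(np.log(d_fake))

# Illustrative outputs of a discriminator that currently wins:
# real samples scored near 1, generated samples scored near 0.
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.2, 0.05])
cd = discriminator_cost(d_real, d_fake)  # low: the discriminator is accurate
cg = generator_cost(d_fake)              # high: the generator is being caught
```

Gradient steps on θ_D and θ_G then lower cd and cg respectively, which is exactly the simultaneous update described above.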
5.7 Implementing the Neural Network in Portfolio Optimization

With the necessary introduction made, a brief overview is now presented of the application of this type of neural network. Investors can implement the neural network technique, specifically the generative model, to conduct portfolio optimization. As suggested before, we can bring the idea of Monte Carlo simulation to our result: since the generated data shares similar characteristics with actual stock returns, the neural network possesses the capability to simulate stock prices in different scenarios. Hence the optimized portfolio will be more robust than the traditional Markowitz optimization. In other words, the designed portfolio may not be the one that gives the highest return or the smallest risk in one particular case; however, since the Monte Carlo simulation covers the majority of the scenarios, the designed portfolio will give a more secure position across all possible outcomes compared to the static portfolio obtained with the normal optimization method.
5.7.1 How to Optimize the Portfolio from the Output of the Neural Network

Since the goal of portfolio optimization here is to minimize the risk of the constructed portfolio (in terms of VaR or CVaR), it is necessary to determine the weight of each stock that gives the least amount of risk in terms of value at risk or conditional value at risk. This idea has been implemented before: Rockafellar 2000 [37] gives a solution for portfolio optimization in terms of conditional value at risk. That article is groundbreaking; some even call it "Markowitz 2.0" to signify its importance. However, that method cannot be implemented in this application, because it assumes a distribution of returns (a smooth multivariate discrete distribution, to be exact), and as demonstrated before we do not want to make any distribution assumption for asset returns. To solve this problem, the grid search method is implemented to find the optimal weights for the portfolio. This is not the optimal solution, but due to the time constraint we choose this simple approach; readers can research this further and propose a better solution.
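As a sketch of the grid search just described, consider the two-asset case; the simulated terminal values and the grid step below are illustrative assumptions, not the thesis data or parameters:

```python
import numpy as np

def cvar_loss(terminal, alpha=0.05):
    """CVaR of the loss 1 - terminal value, at the 1 - alpha confidence level."""
    losses = 1.0 - terminal
    var = np.quantile(losses, 1.0 - alpha)     # 95% VaR of the loss
    return losses[losses >= var].mean()        # mean loss beyond the VaR

def grid_search_weights(paths, step=0.05, alpha=0.05):
    """Two-asset grid search for the weights minimizing portfolio CVaR.

    paths: (n_simulations, 2) array of simulated terminal values per asset.
    Returns ((w_0, w_1), cvar) with w_0 + w_1 = 1.
    """
    best_w, best_cvar = 0.0, np.inf
    for w in np.arange(0.0, 1.0 + 1e-9, step):
        portfolio = w * paths[:, 0] + (1.0 - w) * paths[:, 1]
        risk = cvar_loss(portfolio, alpha)
        if risk < best_cvar:
            best_w, best_cvar = w, risk
    return (best_w, 1.0 - best_w), best_cvar

# Illustrative simulated terminal values: asset 1 is riskier than asset 0.
rng = np.random.default_rng(0)
paths = np.column_stack([
    rng.normal(1.05, 0.05, 10000),   # low-volatility asset
    rng.normal(1.08, 0.30, 10000),   # high-volatility asset
])
weights, risk = grid_search_weights(paths)
```

With more assets the grid grows exponentially, which is one reason this easy approach is not the optimal solution.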
Chapter 6
Empirical Study
6.1 Data, Software and Hardware
6.1.1 Data and Data Source

The data source of this thesis is Yahoo Finance; we use its API (Application Programming Interface) to download data from the server. In terms of data, we select stocks listed on Nasdaq Stockholm, which comprises 378 stocks1. The data covers the period from 2009-02-09 to 2019-02-08. The prices are quoted daily and consist of the Open, Close, High, Low and Adjusted Close price of each day. The data also contains the daily trading volume of each stock.
6.1.2 Software Choice

For the empirical studies, the programming language Python is selected, with necessary scientific packages such as numpy [22], scipy [48] and pandas [30]. In the neural network application part of the empirical studies, the framework is Keras [8] with a Tensorflow-GPU backend [1], which means the neural networks run on the GPU of the computer. The operating system is Ubuntu 18.042.
6.1.3 Hardware

Throughout this thesis, all the code is run on a PC with an Intel i5-8400 processor and 8 GB of RAM. In terms of graphics card, the computer has an Nvidia GTX 1060 with 6 GB of graphics memory.
1 http://www.nasdaqomxnordic.com/aktier/listed-companies/stockholm
2 https://www.ubuntu.com/desktop
6.2 Risk Measurement
6.2.1 Volatility

The most common way to measure the risk of an investment asset is volatility, mathematically defined as the standard deviation of the return. The core idea of using the standard deviation as a risk representation is that the average is the expected outcome of a stock; it is therefore natural to consider a stock that deviates more from its mean as carrying more risk for investors. In statistical terms, this is measured using the variance or the standard deviation.
6.2.2 Value at Risk

Nonetheless, using the standard deviation to measure the risk of investment assets has its disadvantages, because it only represents the deviation from the mean regardless of the direction of the return, and for investors deviation in the positive direction is preferred. Consequently, another representation of risk can be defined, specifically one that measures downward risk. One such measurement is Value at Risk (VaR). Denote X as the value of the investment asset and let the parameter α satisfy 0% < α < 100%; then the α VaR is defined as [12]:

VaRα(X) = min{c : P(X ≤ c) ≥ α}
VaR can be interpreted as the minimum loss in the (1 − α) worst-case scenarios. By applying VaR, investors can better estimate the downturn risk of their assets.
6.2.3 Conditional Value at Risk

Another form of risk representation is the Conditional Value at Risk. Denote X as the value of the investment asset and let the parameter α satisfy 0% < α < 100%; then the α CVaR is defined as [37]:

CVaRα(X) = E[X | X ≤ VaRα(X)]
Compared to VaR, CVaR has some distinctive advantages which make it preferable. In Sarykalin 2008 [41] the authors discuss these advantages in several aspects. Compared to VaR, CVaR has the following advantages:

1. CVaR has better mathematical properties; the risk represented by CVaR is coherent.

2. CVaR deviation can represent risk; in other words, it is a good substitute for the standard deviation.

3. Risk management based on CVaR is more efficient than risk management based on VaR; more precisely, CVaR can be optimized with regular optimization methods.

4. CVaR considers the effect of losses exceeding a certain level, whereas VaR does not consider the scenarios where the loss exceeds that level.
6.3 Monte Carlo Simulation

A Monte Carlo simulation is a type of simulation that conducts random sampling repeatedly and then uses statistical methods to analyze the results [36]. Put in more detail, it is a simulation of alternative scenarios: the results from a Monte Carlo simulation could be the daily returns of financial assets had the actual results not happened. By repeatedly simulating these potential realities, a more accurate estimation of financial assets can be achieved.

To perform a Monte Carlo simulation, we should first identify a statistical distribution; the most common choice is the normal distribution. First, we draw random variables from the statistical distribution to represent the daily returns of the stock. Then we calculate the value of the stock at a given time along a specific path.
6.3.1 Simulated Path of a Monte Carlo Simulation

Following the introduction from the previous section, we choose to draw random samples from the Gaussian distribution. In this case, we choose to use the daily log return as the random variable, because it makes it easier to calculate the return over a given period.
To demonstrate path generation, we give one of the paths generated by the Monte Carlo simulation based on the statistical properties of the ABB stock. Figure 6.1 shows an example path generated by the Monte Carlo simulation.
Figure 6.1: One path generated by the Monte Carlo simulation
6.3.2 Calculating VaR using the Monte Carlo Method

VaR can be calculated using several approaches; a detailed explanation of the techniques can be found in Duffie 1997 [12]. In this part, the Monte Carlo method is chosen to calculate VaR, because it provides a good comparison with our proposed Monte Carlo method that incorporates deep learning.

In this example, the VaR of stocks is calculated over a one year period. We apply the statistical method to generate random returns over one year, repeat this several times, and compute the terminal value along each path. Then, to calculate the 95% VaR of the stocks, the 5% quantile of the yearly terminal values is computed. Figure 6.2 shows the VaR of ABB using the Monte Carlo method with 250000 generated paths. In this thesis, VaR is represented in a different format: instead of representing the loss in terms of money, we represent it as a portion. For example, if the 95% VaR of a stock is 0.5, then in the worst 5% of scenarios the minimum loss of this stock is 1 − 0.5 = 50%. For convenience, in the following studies the 95% confidence level is chosen when computing VaR and CVaR.
Figure 6.2: VaR of ABB using Monte Carlo simulation
6.3.3 Calculating CVaR using the Monte Carlo Method

The first step in calculating CVaR is to compute the VaR, then select the terminal values that are lower than the VaR and take their mean. Figure 6.3 shows the CVaR of ABB using the Monte Carlo method with 250000 generated paths.
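Both calculations can be sketched from an array of simulated terminal values; the sample below is drawn from an illustrative normal distribution, not from the thesis simulations:

```python
import numpy as np

def var_95(terminal_values):
    """95% VaR in the thesis' portion format: the 5% quantile of terminal value."""
    return np.quantile(terminal_values, 0.05)

def cvar_95(terminal_values):
    """95% CVaR: the mean of the terminal values at or below the 95% VaR."""
    v = var_95(terminal_values)
    return terminal_values[terminal_values <= v].mean()

# Illustrative terminal values of 250000 simulated one-year paths.
rng = np.random.default_rng(42)
terminal = rng.normal(1.06, 0.20, 250000)
v, c = var_95(terminal), cvar_95(terminal)
# The CVaR averages the worse tail, so it is at most the VaR.
```

In the portion format used here, 1 − v and 1 − c are then the corresponding losses.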
Figure 6.3: CVaR of ABB using Monte Carlo simulation
As follows from the definition, the CVaR is lower than the VaR. To help readers understand the relation between VaR and CVaR, we give the comparison in one figure: Figure 6.4 shows the VaR and CVaR of ABB using the Monte Carlo method with 250000 generated paths.
Figure 6.4: VaR and CVaR of ABB using Monte Carlo simulation
In Appendix B, we give the VaR and CVaR of all the stocks fed into the proposed neural network structure.
6.3.4 Markowitz GMV Portfolio Selection

The Global Minimum Variance (GMV) portfolio, as introduced before, is the most traditional form of portfolio optimization; its goal is to minimize the risk of the portfolio. Since the objective of this thesis is to find a framework that reduces the risk in terms of variance or CVaR, it is necessary to also conduct a portfolio optimization using the traditional method.

A program that gives the weights of the global minimum variance portfolio is executed. The weights are presented in the appendix.
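For reference, the GMV weights have the closed form w = Σ⁻¹1 / (1ᵀΣ⁻¹1), the solution of minimizing wᵀΣw subject to the weights summing to one. A sketch with an illustrative covariance matrix (no non-negativity constraint, so short positions may appear):

```python
import numpy as np

def gmv_weights(cov):
    """Closed-form GMV weights: Sigma^-1 1 / (1' Sigma^-1 1)."""
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)   # Sigma^-1 1 without explicit inversion
    return w / w.sum()

# Illustrative 3-asset covariance matrix (not estimated from the thesis data).
cov = np.array([
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.16],
])
w = gmv_weights(cov)
# By construction, w' Sigma w is no larger than for any other full-investment
# weight vector, e.g. equal weights.
```

In practice Σ is replaced by the sample covariance matrix of the historical returns, which is what tends to produce the concentrated weights noted below.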
As can be seen in the table, the weights of this portfolio are overly concentrated, so it is not a very diversified portfolio. The value of the GMV portfolio over 10 years is presented in Figure 6.5.
Figure 6.5: Value of GMV portfolio in 10 years
To evaluate the performance of the constructed GMV portfolio, we present the traditional average yearly return and volatility. The return and risk of this GMV portfolio are presented in Table 6.1.
Yearly Return   Yearly Volatility
6.14%           6.44%
Table 6.1: Yearly return and volatility of GMV portfolio
This evaluation gives a good insight into the constructed portfolio: it has a relatively good return with low volatility. However, this evaluation is not suitable for comparison with our proposed neural-network-based minimum risk portfolio; therefore we also present the VaR and CVaR of the constructed portfolio to compare with our subsequent empirical studies. Figure 6.6 shows the comparison between VaR and CVaR, calculated using the same Monte Carlo simulation method with 250000 paths.
Figure 6.6: VaR and CVaR of GMV portfolio using Monte Carlo
simulation
6.4 Studies on GAN
6.4.1 Structure of the GAN

In this thesis, a GAN is constructed according to the structure of GAN introduced earlier. In terms of the overall parameters of the network, the generator consists of two intermediate fully connected layers with the Leaky ReLU activation function with parameter α = 0.2. The output of the second layer is passed to a fully connected layer whose number of nodes equals the flattened size of the target output, with the hyperbolic tangent activation function, and is then reshaped to the same shape as the input data. The discriminator network contains two fully connected layers with Leaky ReLU (α = 0.2) as the activation function; its output is connected to a layer with a single node with the sigmoid activation function. The discriminator network takes as input both actual price data and simulated data generated by the generator network. The actual input data is reshaped in the following way:
The input to the network is a rank 3 tensor (a three-dimensional tensor): the first axis represents time, the second axis represents the features of the stocks (for example daily returns or closing prices), and the third axis represents the different stocks. The output of the generator network has the same data structure. This is demonstrated in Figure 6.7. This kind of structure allows us to add features to the input flexibly.
Figure 6.7: Data structure of input
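In NumPy this input layout can be sketched as follows; the axis sizes are illustrative:

```python
import numpy as np

# Rank-3 input tensor laid out as (time, features, stocks), as in Figure 6.7.
n_days, n_features, n_stocks = 252, 2, 20   # illustrative sizes
rng = np.random.default_rng(0)
batch = rng.normal(size=(n_days, n_features, n_stocks))

# Adding a feature (e.g. a technical indicator) only grows the second axis,
# which is what makes the structure flexible.
extra = rng.normal(size=(n_days, 1, n_stocks))
extended = np.concatenate([batch, extra], axis=1)
```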
The generator takes a different kind of input. The nature of the GAN's generator requires noise as input, so we choose to feed Gaussian white noise to the generator. We also need to specify the dimension of the noise; this is a hyperparameter that needs to be defined before the training process.
We then program the GAN in Python using the Keras package; the code borrows some structures and methods from open source code on GitHub [27].
6.4.2 Key Point on Selecting Batches

Some readers may be familiar with neural network applications in image identification or generation. In those applications, a batch can be randomly selected from the data set (the pictures) without any consideration of the order; doing so effectively extends the training data and improves the training process. In this application, however, that technique is not implemented, because we do not want any information to leak during the training process. To explain this concept, consider an application in visual identification: when a program tries to identify handwritten digits, the training data are pictures of handwritten digits, and the order of the pictures is not important. In other words, we could feed the last picture first and the result would not change much.
In this application, by contrast, the order of the data is very important. Stock data and returns are time series, so time is a key factor in the structure of the input data. If the same random-sampling approach were applied, information from the future would leak into the present. To put it in a metaphor: if price data from the future were known today, would investors have the same estimation of a stock's risk? Hence, in the empirical study, the batches are selected according to the time order, which makes sure that no information from the future leaks into the present.
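The time-ordered batching described above amounts to plain consecutive slicing instead of random sampling; the batch size below is illustrative:

```python
import numpy as np

def time_ordered_batches(data, batch_size):
    """Yield consecutive, non-shuffled batches along the time axis.

    Keeping the time order (no random sampling) prevents future returns
    from leaking into batches that represent earlier periods.
    """
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

# Stand-in for roughly 10 years of daily observations (2520 trading days).
data = np.arange(2520).reshape(2520, 1)
batches = list(time_ordered_batches(data, batch_size=60))
```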
6.4.3 Output from the GAN

To obtain simulated paths from the GAN, the designed GAN must first be trained with real data. Then, to obtain the simulated price paths, noise is generated and fed to the trained generator.
In the price simulation studies, the simulation covers the price movement of the selected stocks over a one year period (252 trading days). To be more precise, each stock starts at a value of 1, and the value of each stock at the end of the one year period is recorded. Over this one year period, the result is represented in the format of a matrix, with one axis representing time and the other representing the stocks.
Then, to simulate the potential realities over this one year period, we repeatedly generate the values of the stocks; in this case, we choose to do so 10000 times. This number is selected based on the RAM constraints of the working computer: the generated tensor takes up 4 GB of RAM. Readers whose computers have more RAM can increase this number to acquire a more accurate estimation of risk.
The first 5 simulated paths for one of the assets are given to help readers understand the generated paths. The model is run with the following parameters: epochs: 80, batch size: 60, latent dimension: 200; the generator's layers have 128, 256, and 512 nodes respectively.
Figure 6.8: ABB Price paths generated by GAN
To further examine the output of the GAN, it is necessary to study the statistical properties of the output. The most obvious way is to examine the histogram of the generated Monte Carlo simulation. First, we present the histogram of the terminal value after one year with 10000 samples, to help readers better understand the differences between the neural network and the traditional Monte Carlo simulation. Figure 6.9 shows the terminal value histogram comparison.

Figure 6.9: Histogram comparison
From the figure, it is obvious that the neural network estimates more downward risk compared to the normal Monte Carlo simulation. Hence, from these results, investors can infer that, according to the neural network, the ABB stock has more downward risk than the estimate based on the normal distribution. To further understand the output, the daily returns generated by the neural network are presented and compared with data generated from the normal distribution. The comparison of the simulated ABB daily returns is presented in Figure 6.10.
As demonstrated before, in a portfolio setting it is crucial to estimate the covariance between different stocks: a correct estimation of the covariance between stocks is vital to constructing a good low risk portfolio. The heatmap of the correlation coefficient matrix is presented in Figure 6.11. The data is one of the simulated one-year stock return data sets generated by the neural-network-based Monte Carlo simulation.
Figure 6.10: Histogram comparison(daily return)
Figure 6.11: Heatmap of one set of generated data
To verify the accuracy of the results, the heatmap of the correlation coefficient matrix of the real 10 years of stock return data is given in Figure 6.12.
Figure 6.12: Heatmap of the real stock return data
As demonstrated before, the covariance between stocks can vary a lot, so theoretically the simulated data set should exhibit similar characteristics. Therefore, the rolling correlation between stocks is investigated. Figure 6.13 shows the graphical representation for the stocks AAK and ABB. The result shows that the data generated by the neural network can simulate the covariance between different stocks. Compared to the traditional Monte Carlo method with data generated from a multivariate normal distribution, this method should give a more reliable estimation of risk.
Figure 6.13: Rolling Correlation of Neural Network
We then compare these paths with the normal paths and, more importantly, compute the risk of the assets in terms of VaR and CVaR. With the output from the GAN, we can run a VaR and CVaR calculation similar to the one above. The result from the designed GAN with the previously specified hyperparameters is presented in Appendix C. From the output one can observe that this application gives good VaR estimations for many stocks, while some estimations are not realistic: a VaR above 1 cannot be correct, since it would mean that in the worst-case scenarios the stock still has a positive return. Naturally, we ask why there are wrong estimations in the result, given that the GAN estimations for SAAB, ICA and ATRE are very similar to the traditional estimations of VaR. Since the results are generated by the generator from random noise with mean 0 and standard deviation 1, the GAN does possess the ability to replicate the actual data.
The reason behind this problem may relate to the GAN's sensitivity to its hyperparameters, which makes a GAN harder to train than a normal neural network [2]. In the following parts, a small experiment is conducted to observe the effect of the hyperparameters on the result of the GAN. Since 196 stocks is too large a sample for this type of study, in the subsequent studies the sample size is reduced to 20 stocks to better isolate the effect of the different parameters.
6.4.4 The Effect of the Epoch

In this section, we run the program with the same parameters apart from the number of epochs; the result is presented in Appendix D, where the VaRs of these 20 stocks are given. We start from 1 epoch; undeniably, 1 epoch will not give a satisfactory result, and the output resembles the initial state of the neural network. As we can see in the result, the output from the GAN is all wrong, with no practical meaning; this corresponds to the nature of the generator, since it takes noise as input.

Then, as the number of epochs grows, the result starts showing some accurate estimations of the stocks' risk; however, the inst