Forex Market Prediction Using Multi Discrete Hidden
Markov Models
José Pedro de Oliveira Alves
Thesis to obtain the Master of Science Degree in
Electrical and Computer Engineering
Supervisor: Prof. Doutor Rui Fuentecilla Maia Ferreira Neves
Co-supervisor: Prof. Doutor Nuno Cavaco Gomes Horta
Examination Committee
Chairperson: Prof. Doutor Horácio Cláudio Campos Neto
Supervisor: Prof. Doutor Rui Fuentecilla Maia Ferreira Neves
Member of the Committee: Prof. Doutor Joaquim Amaro Graça Pires Faia e Pina
Catalão Lopes
May 2015
Resumo
The work developed throughout this Master's dissertation presents a new methodology for the analysis and daily forecasting of the direction of the Forex market closing price. For this purpose, the Hidden Markov Model (HMM) was used, a model widely applied in different areas of time series analysis (e.g., gene prediction, cryptanalysis, speech recognition). This work uses the discrete version of the HMM (DHMM), since the data are transformed into three discrete values representing an increase, a decrease or the maintenance of the price relative to the previous day. Using the discrete version proved to be a challenge, since most applications of the HMM to financial markets use its continuous version. The model parameters are trained with the Baum-Welch algorithm and the prediction is made with an adaptation of the Viterbi algorithm.
One of the main novelties of this work is the simultaneous use of three DHMMs to predict the direction of the market closing price. Each of the three DHMMs is trained with a different window size, allowing each to acquire a different sensitivity to fluctuations in market behaviour. Adding technical indicators to the model to signal changes in the market trend made it possible to exploit the specific characteristics of each DHMM, making the adaptation to different market behaviours much more efficient. Using the technical indicators and the three DHMMs, sub-models with different characteristics and results were developed. The best ones were then aggregated, creating a supermodel able to adapt and respond to the demands of the Forex market.
The tests used Forex data for the EUR/USD pair from January 2002 to December 2013. The sum of "price interest points" (pips) was the main metric used in the analysis of the results, because it gives a better perception of the profit to be obtained by using the presented model over the analysed dates. The results show an excellent performance, with a hit rate of 57% and a positive total of 26349 pips at the end of the 11 years analysed. This result shows the reliability and quality of the developed model, even though the test interval includes a period of great crisis and financial instability, as seen in 2008.
Keywords: Pattern detection, MACD, financial markets, Hidden Markov Model, multi-HMM, forecasting, RSI, time series.
Abstract
The work developed throughout this dissertation presents a new methodology for analysing and forecasting the direction of the Forex market daily closing price. For this purpose, the Hidden Markov Model (HMM) was used, a model widely applied in different areas of time series analysis (e.g., gene prediction, cryptanalysis, speech recognition). This work uses the discrete version of the HMM (DHMM), since the data are transformed into three discrete values representing an increase, a decrease or the maintenance of the price relative to the previous day. Using the discrete version proved to be a challenge, since most applications of the HMM to financial markets use its continuous version. The model parameters are trained with the Baum-Welch algorithm and the prediction is made with an adaptation of the Viterbi algorithm.
One of the main innovations of this work is the simultaneous use of three DHMMs to predict the direction of the market closing price. Each of the three DHMMs is trained with a different window size, allowing each to acquire a different sensitivity to fluctuations in market behaviour. Adding technical indicators to the model to signal changes in market trends made it possible to exploit the specific characteristics of each DHMM, making the adaptation to different market behaviours much more rapid and efficient. Using the technical indicators and the three DHMMs, sub-models with different characteristics and results were developed. The best ones were later aggregated, creating a supermodel able to adapt and respond to the demands of the Forex market.
Tests were conducted using data for the Forex EUR/USD pair from January 2002 to December 2013. The sum of "price interest points" (pips) was the primary metric used in the analysis of the results because it gives a clearer perception of profit. The results show an excellent performance, with 57% of correct predictions and a total of 26349 pips at the end of the 11 years. This result demonstrates the reliability and quality of the developed model, even though the test range includes a period of great financial instability, as seen in 2008.
Keywords: Financial markets, forecasting, Forex, Hidden Markov Model, MACD, multi-HMM, pattern discovery, RSI, time series analysis.
Acknowledgements
This dissertation could not have been completed without the help, cooperation and interest shown by the supervisor Prof. Dr. Rui Fuentecilla Maia Ferreira Neves and the co-supervisor Prof. Dr. Nuno Cavaco Gomes Horta.
I want to thank my family for all their support.
Finally, I want to thank my wife for all the unconditional support, patience and encouragement given throughout the preparation of this dissertation.
Table of Contents
Resumo .....................................................................................................................................................i
Abstract.................................................................................................................................................... iii
Acknowledgements ..................................................................................................................................v
Table of Contents ................................................................................................................................... vii
List of Tables ........................................................................................................................................... ix
List of Figures .......................................................................................................................................... xi
List of Acronyms and Abbreviations ...................................................................................................... xiii
CHAPTER 1 INTRODUCTION ............................................................................................................. 1
1.1 Motivation ................................................................................................................................ 2
1.2 Work’s Purpose ....................................................................................................................... 2
1.3 Document Structure ................................................................................................................. 2
CHAPTER 2 RELATED WORK ............................................................................................................ 3
2.1 Foreign Exchange Market ....................................................................................................... 3
2.2 Market Analysis ....................................................................................................................... 3
2.2.1 Fundamental Analysis ..................................................................................................... 3
2.2.2 Technical Analysis ........................................................................................................... 4
2.3 Existing Solutions .................................................................................................................... 8
2.3.1 A New Sax-GA Methodology ........................................................................................... 9
2.3.2 A Dynamic Pattern Recognition Approach Based on Neural Network .......................... 11
2.3.3 Stock Market Forecasting Using Hidden Markov Model: A New Approach .................. 13
2.3.4 Stock Market Prediction Using Hidden Markov Models ................................................ 14
2.3.5 A fusion model of HMM, ANN and GA for stock market forecasting ............................. 15
2.4 Why a Discrete HMM approach ............................................................................................. 18
2.5 Conclusions ........................................................................................................................... 19
CHAPTER 3 MULTI DISCRETE HMM APPROACH .......................................................................... 21
3.1 Discrete HMM for Financial Time Series Analysis ................................................................ 21
3.1.1 Historic data transformation ........................................................................................... 21
3.1.2 DHMM Training: Baum-Welch Algorithm ...................................................................... 22
3.1.3 DHMM Testing: Viterbi Algorithm .................................................................................. 24
3.1.4 Why Multi Hidden Markov Models ................................................................................. 26
3.2 Developed Methods ............................................................................................................... 27
3.2.1 Development of a Multi HMM strategy .......................................................................... 27
3.2.2 Fusion between different methods ................................................................................ 30
3.3 Multi DHMM Automation ........................................................................................................ 31
3.4 Conclusion ............................................................................................................................. 32
CHAPTER 4 RESULTS ...................................................................................................................... 33
4.1 HMM Analysis, Comparison and Decision ............................................................................ 33
4.1.1 Case Study I – Analysis with specific patterns .............................................................. 33
4.1.2 Case Study II – Analysis with real data ......................................................................... 34
4.1.3 Case Study III – Continuous and Discrete HMM comparison ........................................ 35
4.2 Construction and Improvement ............................................................................................. 36
4.2.1 Case Study I – Simple means ....................................................................................... 36
4.2.2 Case Study II – Usage of mixed training days .............................................................. 38
4.2.3 Case Study IV – MACD indicator .................................................................................. 41
4.2.4 Case Study V – Combining RSI and MACD with ADX .................................................. 42
4.2.5 Case Study VI – Combining MACD and RSI ................................................................. 42
4.2.6 Case Study VII – Combining MACD, RSI and mixed training days .............................. 43
4.3 Multi DHMM Automation ........................................................................................................ 45
4.4 Conclusions ........................................................................................................................... 48
CHAPTER 5 CONCLUSIONS AND FUTURE WORK ........................................................................ 49
5.1 Conclusions ........................................................................................................................... 49
5.2 Future Work ........................................................................................................................... 49
APPENDIX 1 - GRAPHS USED TO ASSESS THE HMM VIABILITY .................................................... 1
APPENDIX 2 – HIDDEN MARKOV MODEL TUTORIAL ........................................................................ 3
List of Tables
Table 1 - List of macroeconomic indicators ............................................................................................. 4
Table 2 - Breakpoints vs. a division from [15] ....................................................................................... 10
Table 3 - Number of NN inputs and outputs and corresponding patterns from [12] ............................. 13
Table 4 - Comparison between models ................................................................................................. 18
Table 5 - Transform pseudo code ......................................................................................................... 22
Table 6 - Selected models from each phase ......................................................................................... 31
Table 7 - Results using a predefined set of patterns ............................................................................. 34
Table 8 - Results using different training window sizes ........................................................................ 35
Table 9 - DHMM and CHMM comparison using hourly, daily and weekly data .................................... 35
Table 10 - DHMM and CHMM comparison using data from 2002 to 2013 and 60 days window ......... 36
Table 11 - Total of the DHMM and Simple Mean results from 2002 to 2013 ........................................ 37
Table 12 - Results of different sets of DHMM training window sizes from 2002 to 2013 ...................... 38
Table 13 - Results from 2002 to 2013 of the DHMM training using 15, 30 and 90 days window size .. 39
Table 14 – Results from 2002 to 2013 from Multi DHMM and RSI using 0, 20 and 45 days steps ...... 40
Table 15 – Results from 2002 to 2013 from Multi DHMM and RSI using 0, 15 and 20 days steps ...... 40
Table 16 - Results from 2002 to 2013 from Multi DHMM using MACD and Divergence ...................... 41
Table 17 - Results from 2002 to 2013 from Multi DHMM using Divergence ......................................... 41
Table 18 - Results from 2002 to 2013 from Multi DHMM using MACD ................................................ 42
Table 19 - Results from 2002 to 2013 ................................................................................................... 42
Table 20 - Results from 2002 to 2013 from Multi DHMM combining MACD and RSI .......................... 43
Table 21 - Results from 2002 to 2013 combining HMM 15 30 90 and HMM MACD ............................ 43
Table 22 - Results from 2002 to 2013 combining HMM 15 30 90 and HMM RSI ................................. 44
Table 23 - Results from 2002 to 2013 combining HMM MACD and HMM RSI .................................... 44
Table 24 - Resulting percentages from the final model ......................................................................... 45
Table 25 - Resulting percentages from the final model without stop signal .......................................... 45
Table 26 - Results from 2002 to 2013 from the Final Model ................................................................. 46
Table 27 - Summary of the results from sub-models and Final Model ................................................. 47
List of Figures
Figure 1- Example of a 50 days and 30 days moving averages ............................................................. 5
Figure 2 - RSI (below) and EUR/USD (above) Graphic .......................................................................... 6
Figure 3 - MACD (green line), Signal (blue line) and Histogram (red) Graph below with EUR/USD
graph above ............................................................................................................................................. 7
Figure 4 - ADX (black), +DI (green) and -DI (red) graph below with EUR/USD graph above ................ 8
Figure 5 - Normalization process of the stock quote time series from [15] ............................................. 9
Figure 6 - SAX representation from [15] ................................................................................................ 10
Figure 7 - Flexible length window design .............................................................................................. 12
Figure 8 - Flexible length window design .............................................................................................. 13
Figure 9 - Block diagram of the fusion model from [14] ........................................................................ 15
Figure 10 - Integrated ANN-HMM model ............................................................................................... 16
Figure 11 - GA optimization of the parameters of ANN HiMMI method ................................................ 17
Figure 12 - Transformation of the historic data ................................................................... 21
Figure 13 - Flowchart of the HMM training .......................................................................... 22
Figure 14 - Flowchart of the Baum-Welch algorithm for a discrete HMM ........................... 24
Figure 15 – Optimal path using Viterbi Algorithm ................................................................ 25
Figure 16 - Focus of different window sizes ........................................................................ 26
Figure 17 - Aggregation of both 0 and 1 signals into the 0 signal ......................................................... 27
Figure 18 - Flowchart of the three DHMM model integration ................................................................ 28
Figure 19 - Flowchart of a multi DHMM model with technical indicators .............................................. 29
Figure 20 - Technical indicators used in the Technical Indicator box stated in Figure 22 .................... 29
Figure 21 - Flowchart of the fusion model ............................................................................................. 30
Figure 22 - Different sub-models aggregation ....................................................................................... 30
Figure 23 - Flowchart of the final model ................................................................................................ 31
Figure 24 - Comparison of the 200 days SMA (left) and the 50 days SMA (right) in 2011 ................... 37
Figure 25 - Comparison of the 200 days SMA (left) and the 30 days SMA (right) in 2006 ................... 38
Figure 26 - Results from multi DHMM in 2010 ...................................................................................... 39
Figure 27 - Performance of each developed model from 2002 to 2013 ................................................ 45
Figure 28 - Final model resulting performance from 2002 to 2013 ....................................................... 46
List of Acronyms and Abbreviations
Optimization and Computer Engineering Related
ADX – Average Directional Index
ANN – Artificial Neural Network
CHMM – Continuous Hidden Markov Model
DHMM – Discrete Hidden Markov Model
DI – Directional Indicator
EM – Expectation Maximization
EMA – Exponential Moving Average
GA – Genetic Algorithm
HMM – Hidden Markov Model
MACD – Moving Average Convergence Divergence
MAP – Maximum A Posteriori
NN – Same as ANN
PAA – Piecewise Aggregate Approximation
PIP – Perceptual Important Point
RSI – Relative Strength Index
SAX – Symbolic Aggregate Approximation
SMA – Simple Moving Average
Investment Related
CPI – Consumer Price Index
ECB – European Central Bank
EUR – Euro
FED - Federal Reserve System
Forex – Foreign Exchange Market
GDP – Gross Domestic Product
pip – Price interest point
USD – United States Dollar
CHAPTER 1 INTRODUCTION
Due to the enormous cash flow present in financial markets, they constantly attract companies seeking capital to expand their business and small investors looking for a return on their investment. Among the different types of financial markets, the stock market and the Forex (Foreign Exchange) market attract the most investment and curiosity due to the volume of daily transactions performed [1]. Although these markets are very attractive to those who decide to invest, they have important differences that lead some investors to choose one over the other [3].
The main difference between these two markets is the daily average traded value: for the Forex market, according to 2007 data, it is 3.43 times the sum of all the bond markets in the world [1]. Another difference lies in the possibility of using a higher level of leverage, which, despite significantly increasing risk, also significantly increases potential earnings; however, since the Forex market depends on two different currencies, its analysis is more complex. Despite the greater complexity and the risk of high leverage, Forex was the financial market selected for analysis, chosen for the high volume of money traded daily. In recent years, the introduction of online trading and the evolution of technology have increased the ease with which one can invest and access the values of each market, and thereby also the ease with which one can compute technical indicators to analyse future market trends [1]. While it is important to analyse the behaviour of the market, the main goal of an investor should be the prediction of future values and market trends. Many models have been developed, but due to the volatility and non-linearity of financial markets it has been extremely difficult to develop effective models that can credibly predict market behaviour.
Amongst all the research and models developed in recent years, one model that has shown good results is the hidden Markov model. This machine learning model is already widely used in gene prediction [39], protein folding [38], cryptanalysis [40], part-of-speech tagging [41] and speech recognition [42]. For the analysis of financial time series, the continuous variant (CHMM) is typically used in order to predict the exact value, or the closest possible approximation to it. This quest for the real value in a continuous universe is quite difficult due to the infinite number of possibilities: a small deviation in the prediction, apart from providing the wrong value, can give wrong market trend information. For this reason, predicting a trend can be a credible alternative to predicting the real value. With this objective, three DHMMs were used simultaneously to predict the direction of the market closing price. Each of the three DHMMs is trained with a different window size, allowing each one to acquire a different sensitivity to fluctuations in market behavior. Efficient adaptation to different market behaviors is achieved by adding technical indicators to the model to signal changes in market trends. Using the technical indicators and the three DHMMs, sub-models with different characteristics and results were developed. The best ones were later aggregated, creating a supermodel able to adapt and respond to the demands of the Forex market.
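As a hedged illustration of the discretisation mentioned above: the exact transformation rule used in this work is detailed in Chapter 3, so the threshold `eps` and the symbol labels below are assumptions made purely for the sketch.

```python
# Illustrative sketch only: mapping daily closing prices to the three
# discrete DHMM observation symbols (up, down, maintenance) relative to
# the previous day. The 'eps' tolerance and the label values are
# assumptions, not the dissertation's exact rule.
UP, DOWN, HOLD = 0, 1, 2

def discretise(closes, eps=0.0):
    """Return one symbol per day (from the second day on), comparing each
    close with the previous day's close."""
    symbols = []
    for prev, curr in zip(closes, closes[1:]):
        if curr - prev > eps:
            symbols.append(UP)
        elif prev - curr > eps:
            symbols.append(DOWN)
        else:
            symbols.append(HOLD)  # change within tolerance: maintenance
    return symbols

print(discretise([1.30, 1.31, 1.31, 1.29]))  # [0, 2, 1] i.e. up, hold, down
```

A non-zero `eps` would let small oscillations be treated as maintenance rather than genuine direction changes.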
1.1 Motivation
The possibility of taking knowledge commonly used in telecommunications and speech recognition and applying it in a completely different field proved to be an extremely interesting challenge. I have always had an interest in financial markets, but it was the choice of this dissertation that encouraged me to research deeply and expand my knowledge of a subject of great importance today. Of course, there was also the monetary incentive, as it would be possible to use the mechanisms developed in the course of this work to obtain some financial return by investing in Forex.
1.2 Work’s Purpose
This dissertation proposes a new Hidden Markov Model (HMM) approach to pattern discovery, using the MACD and RSI technical indicators to assist the HMM forecast. This approach uses three discrete HMMs (DHMMs), each of which is trained with a different window size. Having been trained differently, each HMM has a different sensitivity to direction variations in the financial time series. With the assistance of the above technical indicators, it is possible to adapt each of the three HMMs to the market behavior. Using this approach, a set of sub-models is then developed and integrated in the final solution.
The developed approach is tested using EUR/USD Forex market historical data from January 2002 to December 2013, and the sum of "price interest points" (pips) over the 11-year period is computed. Pips are used as the primary metric for the analysis of the results because they give a perception of profit that is difficult to obtain from other metrics.
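The pip metric can be illustrated with a short sketch, assuming the conventional EUR/USD pip size of 0.0001; the `pips_earned` helper is hypothetical and not code from this work.

```python
# Illustrative sketch only: summing the pips earned by a sequence of daily
# directional predictions, assuming one pip = 0.0001 for EUR/USD.
PIP_SIZE = 0.0001

def pips_earned(closes, predictions):
    """closes: daily closing prices; predictions[i] is +1 (predicted up)
    or -1 (predicted down) for the move from day i to day i+1.
    A correct direction adds the move's size in pips; a wrong one subtracts it."""
    total = 0.0
    for i, direction in enumerate(predictions):
        move = closes[i + 1] - closes[i]
        total += direction * move / PIP_SIZE
    return total

closes = [1.3000, 1.3025, 1.3010]
print(round(pips_earned(closes, [+1, -1]), 2))  # 25 pips up + 15 pips down, both predicted
```

Summed over many days, this is the cumulative-pips figure reported in the results chapter.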
1.3 Document Structure
The presented dissertation is structured as follows:
Chapter 2: Describes the financial markets, specifically the Forex market, and then discusses the two most popular types of financial analysis. Some of the existing strategies that seek to address the same problem are analysed, and the choice of the HMM as the main model is justified.
Chapter 3: Describes the entire development process of the final model and its sub-models, from the processing of data and model training to the forecast of the next day's market direction.
Chapter 4: Presents and analyses all tests performed throughout the development of the sub-models and the final model.
Chapter 5: Presents the conclusions of the model development and proposes possible improvements to be explored in the future.
APPENDIX: Provides the list of graphics used to assess the HMM viability and provides an HMM tutorial.
CHAPTER 2 RELATED WORK
2.1 Foreign Exchange Market
The Foreign Exchange market is the market that moves the most money daily and includes all world currencies. According to data from the Bank for International Settlements [1], the daily average foreign exchange market turnover reached $4 trillion in April 2010, 20% higher than in 2007. More recently, with the emergence of the internet, Forex trading has ceased to belong exclusively to large corporations, large financial institutions, central banks, hedge funds and extremely wealthy people, and has become accessible to ordinary investors. According to the same data, this change is one of the main factors behind the 20% growth of the Forex market from 2007 to 2010.
Daily fluctuations [3] typically represent less than 1% of a currency's value, making Forex the least volatile financial market in the world; this characteristic allows the use of higher leverage than in other financial markets. Although quite risky, the use of higher leverage has remained possible due to the market's continuous operation (24 hours a day except weekends) and high liquidity, a result of its high trading volume. Each currency [6] is intrinsically linked to its corresponding country or group of countries, being one of the most important indicators of a country's relative level of economic health. This characteristic makes the factors that influence exchange rates different from those of other markets, such as the stock market. The main factors in the Forex market include inflation, interest rates, current-account deficits, public debt, terms of trade, political stability and economic performance, and since each of these involves its own combination of factors, Forex analysis is extremely complex.
2.2 Market Analysis
It is extremely important to analyse the market if one wants to understand its behaviour and predict its movements, and for this purpose a number of different analysis approaches are used. The two most common are Fundamental Analysis and Technical Analysis. Although the purpose of these two techniques is the same, they are quite different. For the Forex market, a fundamental analysis examines macroeconomic indicators, asset markets and political considerations. Analysing all these aspects of a currency pair takes a relatively long time compared to a technical analysis; the latter, on the other hand, uses past price data and charts to find patterns that enable the prediction of future behaviour. A technical analysis assumes that people will constantly repeat their behaviour. This type of analysis has become extremely popular in recent years, as more and more people believe that all the necessary information can be found in the charts.
2.2.1 Fundamental Analysis
To perform a solid fundamental analysis in the Forex market, one must consider the traded currency pair and, using the macroeconomic indicators, asset markets and political considerations corresponding to each currency, try to determine the market behaviour from this information. When we analyse one currency we are indirectly analysing its country or group of relevant countries, and therefore a fundamental analysis in the Forex market differs from an asset market analysis. In Forex it is necessary to take into account macroeconomic indicators such as those shown in Table 1 [3]:
Macroeconomic Indicator – Definition
GDP (Gross Domestic Product) – The value of a country's overall output of goods and services at market prices, excluding net income from abroad
Interest Rates – The annualized cost of credit or debt-capital, computed as the percentage ratio of interest to the principal
Inflation Rates – The percentage change in the value of the Wholesale Price Index (WPI) on a year-on-year basis
Unemployment Rates – The number of unemployed persons divided by the number of people in the labor force
Money Supply – The total amount of money in circulation
Foreign Exchange Reserves – The total of a country's foreign currency deposits and bonds held by the central bank and monetary authorities
Productivity – A measurement of the amount of real GDP produced by an hour of labor
CPI (Consumer Price Index) – A measurement of the changes in the prices paid by urban consumers for a representative basket of goods and services
PPI (Producer Price Index) – A measurement of the average change over time in the selling prices received by domestic producers for their output
Table 1 - List of macroeconomic indicators
In addition to the macroeconomic indicators, it is possible to detect a link between the value of a currency, its asset markets and politics. Markets such as stock markets [5] and gold and oil commodity prices often align with certain currency trends. In the past, for instance, the behavior of the stock markets and of the USD were extremely similar, and rises and falls in gold and oil prices translated into movements in the Forex markets. For example, with the fall of the oil price, the dependence of Russia and Angola on their oil exports has been pointed to as one of the reasons why the ruble is losing its value; on the other hand, Switzerland's political neutrality and the fact that a major part of its currency reserves have been backed by gold made the Swiss franc a safe haven during periods of uncertainty.
2.2.2 Technical Analysis
Over the past two decades technical analysis has become popular and widely applied in financial
markets. Technical analysts [7][22] believe that the market's historical performance is an indication of its
future performance and that changes in price and volume already incorporate all the fundamental factors.
Thus, in order to benefit financially, technical analysts have been developing a set of indicators with
predictive and analytical capabilities.
Due to its heuristic nature [4][22], technical analysis can hardly be proven mathematically, and
consequently different studies have reached different conclusions. Jensen and Benington [43] indicate
that past information cannot be used to predict future prices. Neftçi [17] argues that technical analysis
cannot beat the market if the underlying process is linear. In opposition to these conclusions, Treynor
and Ferguson [18] argue that when non-public information is considered, technical analysis can produce
sizable profits, and Gunasekarage and Power [19], Kwon and Kish [20] and Chong and Ng [21] also
report significant excess returns from technical trading rules.
2.2.2.1 Technical Indicators
Mathematical metrics have been developed based on the historical price and volume data of an asset.
These metrics, named technical indicators, aim to predict the value or direction of an asset, or simply to
understand its behavior. Some of the most used technical indicators are described next.
2.2.2.2 Moving Averages
Moving averages (MA) are among the most used technical indicators; they are trend-following indicators
based on past prices. They are commonly used to smooth out short-term fluctuations and highlight
longer-term trends. The two most commonly used moving averages are the simple moving average
(SMA) and the exponential moving average (EMA): the first is a plain average over a defined number of
time periods, while the second differs from the first by giving a higher weight to more recent prices. This
type of indicator is used to determine levels of support and resistance and to identify trend directions.
Moving averages also provide the basis for more complex indicators, such as the RSI and the MACD,
which will be explained below.
Figure 1 - Example of 50-day and 30-day moving averages
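As a concrete illustration, the two averages can be computed in a few lines of Python. This is a minimal sketch: the function names `sma` and `ema` and the common smoothing factor 2/(n+1) are conventional choices, not something prescribed by this thesis.

```python
def sma(prices, n):
    """Simple moving average: plain mean of the last n prices."""
    return [sum(prices[i - n + 1:i + 1]) / n for i in range(n - 1, len(prices))]

def ema(prices, n):
    """Exponential moving average with the usual smoothing factor 2/(n+1),
    seeded with the first price."""
    alpha = 2 / (n + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out
```

Note that an n-day SMA produces no value for the first n − 1 days, which is why its output list is shorter than the input series.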
2.2.2.3 RSI – Relative Strength Index
Developed by J. Welles Wilder in 1978, the RSI (Relative Strength Index) indicator is widely used to
analyze the momentum of an asset and [16] is called an oscillator because it moves, or oscillates,
between 0 and 100 based on the price movement of the corresponding asset.
Wilder [9][11] considers that when the RSI is above 70, an overbought condition is potentially present
and, as prices are too high, the asset should be sold. Likewise, when the RSI is below 30, an oversold
condition is present and, as prices are considered to be low, it is the best opportunity to buy the asset.
To calculate the value of the RSI one must first decide the period length. When Wilder developed the
RSI he used a 14-day period. As the RSI becomes less sensitive with longer time periods, most books
on technical analysis use a 14-day look-back period when discussing the RSI. The RSI calculation is as
follows:
RSI = 100 − 100 / (1 + RS), (1)
Where,
RS = Average Gain / Average Loss (2)
The initial calculations of the average gain and loss are simple averages over the chosen time period.
Subsequent calculations are based on the prior averages and the current gain or loss, using an EMA
approach:

Average Gain = ((Previous Average Gain) × (time period − 1) + Current Gain) / time period
Average Loss = ((Previous Average Loss) × (time period − 1) + Current Loss) / time period (3)
Figure 2 - RSI (below) and EUR/USD (above) Graphic
Divergences between price and technical indicators are expected to occur. When using the RSI, those
divergences can be interpreted as a signal. When the RSI does not follow a price uptrend, a negative
divergence is present and the asset should be sold; conversely, when the RSI does not follow a price
downtrend, a positive divergence is present and it is an opportunity to acquire the asset.
2.2.2.4 MACD – Moving Averages Convergence and Divergence
The MACD indicator, created by Gerald Appel [35] in the late 1970s, is a trend indicator which shows the
relationship between prices and exponential moving averages (EMA). One of the advantages of the
MACD is its potential to incorporate aspects of both trend and momentum in a single indicator [8]. It is
calculated by computing the difference between the 12-day and 26-day EMAs; in addition, a 9-day
exponential moving average of the MACD, called the "trigger" or "signal" line, is calculated to indicate
long and short opportunities. The MACD and Signal calculations are as follows:
𝑀𝐴𝐶𝐷 = 𝐸𝑀𝐴[𝐶𝑙𝑜𝑠𝑒 𝑉𝑎𝑙𝑢𝑒𝑠, 12] − 𝐸𝑀𝐴[𝐶𝑙𝑜𝑠𝑒 𝑉𝑎𝑙𝑢𝑒𝑠, 26] (4)
𝑆𝑖𝑔𝑛𝑎𝑙 = 𝐸𝑀𝐴[𝑀𝐴𝐶𝐷 𝑉𝑎𝑙𝑢𝑒𝑠, 9] (5)
Figure 3 - MACD (green line), Signal (blue line) and Histogram (red) Graph below with EUR/USD graph above
To interpret and apply the MACD as a trend indicator, one needs to examine the intersections between
the MACD and the Signal:
When the MACD crosses above the Signal, it is an indication to buy the asset.
When the MACD crosses below the Signal, it is an indication to sell the asset.
Divergence in the MACD is also interpreted as a signal. When the MACD does not follow a price uptrend,
a negative divergence is present and there is a sell opportunity. On the other hand, when the MACD
does not follow a price downtrend, a positive divergence is present and there is a buy opportunity.
2.2.2.5 ADX – Average Directional movement Index
The ADX is a trend strength indicator developed by J. Welles Wilder [9] in 1978. The ADX itself measures
trend strength rather than direction; to determine both the direction and the strength of a trend, the ADX
needs to be complemented with two other indicators, the Plus Directional Indicator (+DI) and the Minus
Directional Indicator (-DI) [21]. The ADX combines them and smooths the result with an exponential
moving average [20]. The calculation is as follows:
Calculation of the Up Move and Down Move using the low and high values of the price data:
Up Move = today's high − yesterday's high (6)
Down Move = yesterday's low − today's low (7)
Calculation of the +DM and −DM:
Positive Directional Movement (+DM):
If Up Move > Down Move and Up Move > 0 → +DM = Up Move; else → +DM = 0 (8)
Negative Directional Movement (−DM):
If Down Move > Up Move and Down Move > 0 → −DM = Down Move; else → −DM = 0 (9)
Calculation of the positive and negative Directional Indicators (DI) after selecting the time period (Wilder
originally used 14 days):
+DI = 100 × EMA[+DM, 14] / (average true range) (10)
−DI = 100 × EMA[−DM, 14] / (average true range) (11)
Where
true range = max[(high − low), |high − previous close|, |low − previous close|] (12)
Calculation of the ADX:
ADX = 100 × EMA[|(+DI) − (−DI)| / ((+DI) + (−DI)), 14] (13)
Figure 4 - ADX (black), +DI (green) and -DI (red) graph below with EUR/USD graph above
The value of the ADX oscillates between 0 and 100. If the ADX crosses above 25, it is believed that the
prevailing price trend has enough strength; conversely, when the ADX crosses below 25 it is a warning
to avoid trend-trading strategies.
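Equations (6) to (13) can be chained into one Python sketch. This is illustrative only: the plain EMA below stands in for Wilder's original smoothing, so exact values will differ slightly from those of charting packages.

```python
def ema(xs, n):
    """Exponential moving average with smoothing factor 2/(n+1)."""
    a = 2 / (n + 1)
    out = [xs[0]]
    for x in xs[1:]:
        out.append(a * x + (1 - a) * out[-1])
    return out

def adx(highs, lows, closes, n=14):
    """Directional system of equations (6)-(13): returns (+DI, -DI, ADX)."""
    up = [highs[i] - highs[i - 1] for i in range(1, len(highs))]        # (6)
    dn = [lows[i - 1] - lows[i] for i in range(1, len(lows))]           # (7)
    pdm = [u if u > d and u > 0 else 0.0 for u, d in zip(up, dn)]       # (8)
    ndm = [d if d > u and d > 0 else 0.0 for u, d in zip(up, dn)]       # (9)
    tr = [max(highs[i] - lows[i],
              abs(highs[i] - closes[i - 1]),
              abs(lows[i] - closes[i - 1])) for i in range(1, len(highs))]  # (12)
    atr = ema(tr, n)
    pdi = [100 * p / a for p, a in zip(ema(pdm, n), atr)]               # (10)
    ndi = [100 * q / a for q, a in zip(ema(ndm, n), atr)]               # (11)
    dx = [100 * abs(p - q) / (p + q) if p + q else 0.0
          for p, q in zip(pdi, ndi)]
    return pdi, ndi, ema(dx, n)                                         # (13)
```

On a steadily rising series, +DI stays above −DI and the ADX climbs toward 100, the strong-trend reading discussed above.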
2.3 Existing Solutions
Over the past few years, different models and algorithms dedicated to the analysis and forecasting of
financial markets have been developed. Many of these approaches have proved ineffective or financially
unattractive; others, in turn, have raised the interest of investors and researchers. This section reviews
some of the most interesting methods presented these days, with special focus on those using the HMM
in their process.
2.3.1 A New Sax-GA Methodology
This algorithm [15] combines a Symbolic Aggregate approXimation (SAX) technique, used to describe
the financial time series so that relevant patterns can be efficiently identified, with an optimization kernel
based on genetic algorithms (GA), used to identify the most relevant patterns and generate investment
rules. The program slides a window along the time series and converts it to a SAX sequence; the patterns
in the chromosomes are then compared with each sequence and the algorithm rules are applied to
reach a buy or sell decision.
The Symbolic Aggregate approXimation (SAX) is a relatively new method based on Piecewise
Aggregate Approximation (PAA). First, the algorithm divides the time series into windows and each
window into segments, then reduces the set of points in each segment to their arithmetic mean and
converts the value into a symbol.
2.3.1.1 SAX Method
The aim of the SAX methodology is the transformation of a large time series of dimension m into a
symbolic representation of smaller time series windows of size n << m. With this purpose, a normalization
is performed that scales the data to the same relative magnitude without affecting its original shape,
using (14):

x'_i = (x_i − μ_x) / σ_x (14)

Where x_i are the points in window W_k, μ_x is the mean of the points in W_k and σ_x is the standard
deviation of all the x_i.
Figure 5 - Normalization process of the stock quote time series from [15]
Even though after normalization all data windows are ready to be compared, the dimension of this data
is high. To address this problem, a dimensionality reduction method based on PAA is used.
The time series windows are divided into w equal-size segments and each segment is represented by
the arithmetic mean of its points. This works directly if n/w is an integer, in which case each point
contributes entirely to the frame where it is inserted. If a non-integer relation is present, the method
developed by Li Wei is used, where a point on the frontier between segments contributes partially to
each of the segments.
After the PAA transformation, a normal distribution curve is applied to the vertical axis and breakpoints
(β) are calculated so as to produce equal areas under the curve; the amplitude of the time series is then
divided into a intervals and a symbol is assigned to each of them.
Figure 6 - SAX representation from [15]
Table 2 - Breakpoints vs. a division from [15]
Each segment is evaluated to determine to which interval it belongs, and for each PAA level a symbol
is assigned to represent that segment.
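The whole normalize-PAA-symbolize pipeline described above can be sketched in Python. This is a toy version under stated assumptions: an alphabet of size a = 4 with breakpoints −0.67, 0 and 0.67 (the a = 4 row of Table 2), and a window length that is an exact multiple of w.

```python
import statistics

# N(0,1) breakpoints for an alphabet of size a = 4 (Table 2 gives the
# general case); they split the normal curve into four equal-area bands.
BREAKPOINTS = [-0.67, 0.0, 0.67]

def sax(window, w, alphabet="abcd"):
    """Normalize a window (eq. (14)), reduce it to w PAA segment means
    and map each mean to a symbol."""
    mu, sigma = statistics.mean(window), statistics.pstdev(window)
    norm = [(x - mu) / sigma for x in window]
    seg = len(norm) // w                      # points per segment
    means = [sum(norm[i * seg:(i + 1) * seg]) / seg for i in range(w)]
    word = ""
    for m in means:
        idx = sum(m > b for b in BREAKPOINTS)  # interval the mean falls in
        word += alphabet[idx]
    return word
```

For example, a window that jumps from low to high values maps low segments to early symbols and high segments to late ones.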
To identify patterns, (15) is used to evaluate the distance between sequences P and Q and reveal the
degree of similarity between them.
MINDIST(P, Q) = √(n/w) × √( ∑_{i=1}^{w} dist(p_i, q_i)² ) (15)
Where dist(.) is a function defined as (16):

dist(p_i, q_j) = { 0, if |i − j| ≤ 1; β_{j−1} − β_i, if i < j − 1; β_{i−1} − β_j, if i > j + 1 } (16)
Where the β's are the breakpoints defined in Table 2.
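Equations (15) and (16) transcribe almost directly into Python. The sketch below again assumes the a = 4 breakpoints of Table 2 and uses 0-based symbol indices, so the β subscripts are shifted accordingly; the names are illustrative.

```python
import math

BREAKPOINTS = [-0.67, 0.0, 0.67]   # a = 4 case from Table 2

def cell_dist(i, j):
    """Equation (16) with 0-based symbol indices: adjacent symbols are at
    distance 0, otherwise the gap between the enclosing breakpoints."""
    if abs(i - j) <= 1:
        return 0.0
    hi, lo = max(i, j), min(i, j)
    return BREAKPOINTS[hi - 1] - BREAKPOINTS[lo]

def mindist(p, q, n):
    """Equation (15): lower-bound distance between two SAX words of
    length w drawn from windows of n points."""
    w = len(p)
    s = sum(cell_dist(ord(a) - ord('a'), ord(b) - ord('a')) ** 2
            for a, b in zip(p, q))
    return math.sqrt(n / w) * math.sqrt(s)
```

As the definition implies, words that differ only by adjacent symbols are at distance zero, which is what makes MINDIST a lower bound rather than an exact distance.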
2.3.1.2 Genetic Algorithm
After the SAX discretization of the time series, a Genetic Algorithm (GA) is used to produce patterns
and detect whether they are present in the time series. Since the SAX representation is used, the
patterns are sequences of symbols, and the distance from each pattern to the time series must be
calculated to identify its presence.
The GA used in this algorithm relies on two distance measures. The first one is presented in (15) and
the second measures the distance between symbols (17):
dist = √( ∑_{i=1}^{w} (T_i − P_i)² ) (17)

w → word size; T_i → symbol i of the time series; P_i → symbol i of the pattern (18)
The chromosome presented in figure x is used in the population and is divided into two major parts. The
first, labeled "Parameters for rule decision", includes the two distances that permit evaluating whether
the pattern is present, in order to buy, or whether its effect is no longer present and it is time to sell
(distance to buy and distance to sell); another gene defines after how many days the algorithm should
sell if it is in a buying position (days to sell), and the last gene of this part identifies which of the measures
should be used to evaluate the distances (measure type). The second part, labeled "Pattern Symbols",
includes the symbols that constitute the pattern sequence.
The selection process applies a random selection to the best half of the population and then uses a
two-point crossover to generate the offspring. The fitness function is the total earnings produced by the
investment strategy defined by the pattern and the application rule associated with it.
A buy signal is issued if the distance is less than the "Distance to Buy" defined in the chromosome. The
application sells the stock if the distance is bigger than the "Distance to Sell" gene or if the stock was
bought more days ago than specified by the "Days to Sell" gene.
Although this methodology shows great potential, it proves incapable of identifying the same patterns
at different time scales: either a pattern is set aside in the pattern detection process or, if the pattern
develops over a long period of time, it is segmented and analyzed as a different set of patterns.
2.3.2 A Dynamic Pattern Recognition Approach Based on Neural Network
In this approach [12], vertexes are extracted from stock series using a sliding window with flexible length
and a dynamic perceptual important point (PIP) locating method to avoid the computational expense
problem. For pattern recognition and window length identification this methodology uses a neural
network (ANN) approach.
Figure 7 - Flexible length window design
2.3.2.1 Sliding window method and dynamic feature extraction
This approach uses a sliding window of flexible length, considering that the length of a pattern varies
greatly and that a fixed window length is difficult to decide upon, since it directly affects the performance
and the efficiency of the extraction.
Due to the huge computational cost and reduced efficiency of the plain PIP locating method, this
methodology proposes a dynamic algorithm based on PIP location. First, a binary tree structure is used
to implement the PIP locating method and organize the located PIP points, and then the dynamic PIP
location is used to avoid the computational expense problem.
There are three differences between the PIP method and the dynamic PIP method presented:
1. Density variable: used with the purpose of avoiding a dense distribution of vertexes; normally
larger than 6, this value is set to 5.
2. Shortening sequence lengths: the design of shortening sequence lengths is based on the
observation that, when the sliding window moves forward, the tree structure changes little,
meaning that if a vertex holds a high position in the prior tree Q[1:m], its height will not change
greatly in the next tree Q[2:m+1]. Therefore, when searching for the root node of the tree
Q[2:m+1], traversing the first 3 levels of the nodes in the tree Q[1:m] guarantees that the goal
can be achieved.
3. Vertex validation: the vertex validation aims to remove overlapping computation. This validation
tries to identify in prior trees the starting and ending vertexes of a sequence and uses the
existing sequence to construct the new tree instead of computing the same sequence again.
2.3.2.2 Neural Network (NN) design
For pattern recognition, three-layer NNs are designed to detect the interior relationships between the
pattern vertexes. For the 11 classic patterns, 4 neural networks were designed; each NN has a specific
number of inputs and outputs, and each output node corresponds to a predefined pattern, as described
in Table 3:
| Neural Network | Num. of Inputs | Num. of Outputs | Corresponding Patterns |
| NN1 | 3 | 2 | Two tops / bottoms |
| NN2 | 4 | 7 | Two tops / bottoms; triple bottom; diamond top / bottom; head and shoulders top / bottom |
| NN3 | 5 | 9 | Triple bottom; diamond top / bottom; head and shoulders top / bottom; symmetric triangle upside / downside; bump and run upside / downside |
| NN4 | 6 | 9 | Triple bottom; diamond top / bottom; head and shoulders top / bottom; symmetric triangle upside / downside; bump and run upside / downside |
Table 3 - Number of NN inputs and outputs and corresponding patterns from [12]
Based on the description above, the ANN is first trained and the dynamic PIP method is used with the
sliding window. The result is checked and, if the output binary tree satisfies the ANN input condition, the
algorithm has a match and the output of the neural network is checked. The algorithm obtains either the
full pattern or a vertex of the pattern; if it is a vertex, the algorithm decides whether the window has to
be enlarged or not. The process is described in Figure 8:
Figure 8 - Flexible length window design
Although the dynamic approach performs better than its static counterpart, the use of 11 patterns limits
the detection of patterns that do not match the 11 used patterns. The authors also found that some
patterns were not suitable for this method because their number of segments cannot be fixed.
2.3.3 Stock Market Forecasting Using Hidden Markov Model: A New Approach
In this approach [26] a continuous Hidden Markov Model is trained with past stock datasets to predict
the next day’s closing price. The model receives as input a vector of four observations corresponding to
the opening, high, low and closing prices from those past stock datasets.
To work with the CHMM, given the model λ = (A, B, π) and the observation sequence O = O_1, O_2, …, O_t,
the forward-backward algorithm is used to compute P(O|λ), the Viterbi algorithm to choose the state
sequence that best explains the observations, and the Baum-Welch algorithm to train the HMM. Each
of the continuous algorithms stated above is explained in detail in APPENDIX 2.
To forecast the next day's closing price, the model computes a likelihood value ϑ for the day and locates
in the past dataset those instances that produced the same or the nearest likelihood value ϑ. Assuming
that the next day's stock price should follow the same pattern, the difference between each located past
day's closing price and the following day's closing price is calculated, and the next day's closing price
is obtained by adding this difference to the current day's closing price.
The results showed that this approach can produce results very similar to those of approaches that have
been extensively used in the detection and prediction of patterns in financial markets. Furthermore, the
HMM has a strong mathematical structure and theoretical basis for use in a wide range of applications.
2.3.4 Stock Market Prediction Using Hidden Markov Models
In this approach [13], the daily opening, closing, high and low indices of the stock market are modeled
as continuous observations from underlying hidden states, and the training of the HMM from given
sequences is done using the Baum-Welch algorithm.
Both the continuous Hidden Markov Model (CHMM) and the Baum-Welch algorithm are extensively
explained in APPENDIX 2.
In this model the observations are represented as a 3-dimensional vector,

O_t = ( (close − open)/open, (high − open)/open, (open − low)/open ) = (fracChange, fracHigh, fracLow) (19)
After the training of the model, testing is undertaken using an approximate Maximum a Posteriori (MAP)
approach. The objective of this approach is the computation of the close value for the (d+1)-st day, given
the HMM model λ, the stock values for d days (O_1, O_2, … O_d) and the open value for the (d+1)-st
day.
Let Ô_{d+1} be the MAP estimate of the observation vector on the (d+1)-st day, with the observation
vector O_{d+1} varied over all possible values; this approach can be described as follows:
Ô_{d+1} = argmax_{O_{d+1}} P(O_{d+1} | O_1, O_2, …, O_d, λ) = argmax_{O_{d+1}} ( P(O_1, O_2, …, O_d, O_{d+1} | λ) / P(O_1, O_2, …, O_d | λ) ) (20)
Since the denominator is constant it is possible to simplify the MAP estimate as,
Ô_{d+1} = argmax_{O_{d+1}} P(O_1, O_2, …, O_d, O_{d+1} | λ) (21)
The previous joint probability can be computed using the forward-backward algorithm, which is also
extensively explained in APPENDIX 2.
From the description above it is possible to perceive the usefulness of an HMM approach. In this
particular case, when using MAP it must be taken into account that the best estimate is not always the
best solution: even close values may give a wrong indication of the market direction, that is, of a price
rise or fall, which will result in a wrong buy or sell decision.
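The MAP search of equation (21) can be illustrated with a small discrete stand-in for the continuous case: each candidate next-day observation is scored by the joint likelihood computed with the forward algorithm, and the most likely one is kept. The names and the toy discretization are assumptions, not the authors' code.

```python
import numpy as np

def forward_prob(pi, A, B, obs):
    """P(O_1..O_T | lambda) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def map_next_obs(pi, A, B, history, candidates):
    """Equation (21): argmax over candidate next observations of the
    joint likelihood of history + candidate."""
    return max(candidates,
               key=lambda c: forward_prob(pi, A, B, history + [c]))
```

In the continuous setting of [13] the candidate set is a fine grid over (fracChange, fracHigh, fracLow) rather than a handful of symbols, but the argmax structure is the same.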
2.3.5 A fusion model of HMM, ANN and GA for stock market forecasting
Next, a fusion model [14] is presented, implemented as a combination of a continuous Hidden Markov
Model (CHMM), a Genetic Algorithm (GA) and Artificial Neural Networks (ANN) to forecast financial
market behavior. The GA and ANN are used for HMM optimization, and a weighted average of the price
differences of similar patterns is computed to prepare the forecast for the next day. The diagram shown
in Figure 9 summarizes the links between the various processes and algorithms mentioned above.
Figure 9 - Block diagram of the fusion model from [14]
2.3.5.1 ANN combination with HMM
The ANN is used to transform the daily share prices into groups of independent values that become the
input values for the HMM. The process can be described in the following steps:
1. Create a random ANN structure having "n" number of inputs and "n" number of outputs where
n is the number of predictors
2. Initialize the weights of the ANN randomly
3. Current observations vectors are fed into the ANN as inputs
4. The 𝑌𝑡 output vector produced by the ANN at time 𝑡 is fed into the HMM as input observation
vector
Figure 10 - Integrated ANN-HMM model
2.3.5.2 GA combination with ANN and HMM
After the HMM is fed with the output of the ANN, the GA is used to find the optimal parameters for the
HMM (1. observation emission probability matrix, 2. state transition probability matrix and 3. initial
probability matrix), given the transformed sequences, so as to minimize the Mean Absolute Percentage
Error (MAPE) of the ANN-HMM forecast method. The process can be described in the following steps:
1. Choose randomly the initial parameter values
2. Execute the GA to obtain the observation emission probability matrix initial values keeping the
initial values assigned to the remaining parameters in step 1.
3. Execute the GA to obtain the state transition probability matrix initial values keeping the value
computed in step 2 and the initial value assigned to the remaining parameter in step 1.
4. Execute the GA to obtain the initial probability matrix initial values keeping the values
computed in step 2 and 3.
5. If the resulting fitness value has not yet converged, return to step 2.
In this model the chromosome size is equal to the number of parameters to be optimized. For instance,
for a one-dimensional Gaussian probability distribution as the emission probability distribution, the size
of the chromosome will be 16 (considering a 4-state HMM), while for the initial probability matrix the
chromosome size will be 4.
Figure 11 - GA optimization of the parameters of the ANN-HMM method
2.3.5.3 Next day forecast
To forecast the next day's value, a range of data vectors is identified as having likelihood values close
to that of the current data vector. Next, the price difference between the value of each identified vector
at time t and the value of the vector of the day ahead, t+1, is computed. Then, the following equation is
used to give more weight to the most recent vector price differences, and the result is added to the
current day's price in order to obtain a prediction of the next day's price.
wd_i = ( ∑_m w_m × diff_m ) / ( ∑_m w_m ) (22)
Where,
i – index number of the current day
m – index number of a matched day
w_m – weight assigned to day m, computed as w_m = exp(1 / (i − m + 1))
wd_i – weighted average of the price differences for the current day i
diff_m – price difference between day m and day m+1
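Equation (22) translates into a short Python sketch. The function name and index conventions are hypothetical; `matched_idxs` stands for the past days whose likelihood values were found closest to the current one.

```python
import math

def forecast_next(prices, current_idx, matched_idxs):
    """Equation (22): recency-weighted average of the one-day price
    differences observed after each matched day, added to today's price."""
    weights = [math.exp(1 / (current_idx - m + 1)) for m in matched_idxs]
    diffs = [prices[m + 1] - prices[m] for m in matched_idxs]
    wd = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    return prices[current_idx] + wd
```

The exponential weight decays with the distance i − m, so a match found yesterday influences the forecast more than one found weeks ago.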
This approach offers the ANN and GA as an alternative for the training of the HMM. One of the
differences presented concerns the importance of the initialization of the HMM parameters; such a
concern does not exist when using the Baum-Welch algorithm, which gains in performance and
simplicity. In addition, the process developed around the HMM cannot be trained if the training sequence
does not fit the selected parameters properly. Finally, the proposed model chooses the number of states
as the number of attributes in the observation vectors, and this may not be suitable for some instances.
2.4 Why a Discrete HMM approach
The Hidden Markov Model is a statistical model that has been widely used in different areas for the
analysis of time series, such as biological sequence analysis, protein sequence alignment [ref], genetic
prediction [ref], cryptanalysis [ref], part-of-speech tagging and speech recognition [ref][ref].
Although there are different models for detecting and predicting patterns in financial time series, they
turn out to have limitations that the HMM is able to overcome. Unlike many of the existing models, this
model has the particularity of not requiring a prior list of well-defined patterns in order to proceed with
the analysis. The model automatically identifies the temporal patterns present in the analyzed
sequences, a feature that leads to the identification and characterization of a greater number of patterns.
Unlike SAX, the identification of patterns does not depend on their size, whereas in SAX either a pattern
is set aside in the pattern detection process or, if the pattern develops over a long period of time, it is
segmented and analyzed as a different set of patterns.
| Model | Advantages | Disadvantages |
| Sax-GA | The conversion of data into symbol sequences enables a simple implementation and easy identification of patterns from strings. | Either a pattern is set aside in the pattern detection process or, if it develops over a long period of time, it is segmented and analyzed as a different set of patterns. |
| PIP and ANN | Created for the purpose of handling financial data. Great data preservation even when using a high-level dimensional reduction. | Limited number of patterns. Some patterns are not suitable because their number of segments cannot be fixed. |
| Continuous HMM and likelihood values | Easy to implement; using the HMM gives a mathematical dimension that other analytical models do not have. | The best likelihood estimate is not always the best solution, because even close values may give a wrong indication of the market direction. |
| Continuous HMM and MAP | Easy to implement; using the HMM gives a mathematical dimension that other analytical models do not have. | The best MAP estimate is not always the best solution, because even close values may give a wrong indication of the market direction. |
| HMM, ANN and GA | Using the HMM gives a mathematical dimension that other analytical models do not have. Uses two models to optimally adapt the model to the analyzed financial time series. | Cannot be trained if the training sequence does not fit the selected parameters properly. Chooses the number of states as the number of attributes in the observation vectors, which may not be suitable for some instances. |
Table 4 - Comparison between models
When using the HMM, one of the most common approaches is to use its continuous version to predict
the daily closing value. This quest for the real value in a continuous universe becomes quite difficult due
to the infinite number of possibilities: a small deviation in the prediction, apart from providing the wrong
value, can give wrong market trend information. Whichever method is used, the main objective should
be to predict the market price trend. Thus, for the reasons given above, a discrete approach to the
problem may present advantages that a continuous approach does not.
2.5 Conclusions
Many models have been developed to analyze and predict the behavior of financial markets. Few
actually end up being convincing and presenting themselves as practical solutions used by investors. In
this chapter five of these solutions were analyzed: the first two represent models that have aroused the
curiosity of investors and scholars, while the following three represent strategies using the main model
adopted in the development of this thesis.
The conclusions drawn from the analysis of each model are presented in Table 4. Finally, the use of the
DHMM and its advantages over the other models were explained. This model will therefore be applied
to the analysis and prediction of the Forex market behavior, specifically the analysis of the EUR/USD pair.
CHAPTER 3 MULTI DISCRETE HMM APPROACH
This chapter explains in detail the adjustments made in the development of the Multi DHMM. First, it
describes how the DHMM is used to analyze and forecast the direction of the price of the Forex EUR/USD
pair from the given historical data. Then it explains how the chosen technical indicators interact with the
DHMM and the expected benefits of that interaction. Afterwards, the different developed models are
presented and the choice of the models to enter the Multi DHMM is indicated. In section 4.4 the
implementation process of the developed method is explained in detail. Finally, conclusions are drawn
from all the adaptation and development performed.
3.1 Discrete HMM for Financial Time Series Analysis
To use the discrete HMM, it is necessary to adapt the data to the model so that it can be trained to model
the assigned time series. The forecast, based on the Viterbi algorithm, also requires a small change to
that algorithm. These changes and modifications are presented in this section, along with the justification
of the need to use different DHMMs with different window sizes.
3.1.1 Historic data transformation
The historical data for each day are composed of the opening, high, low and closing values, indicating
the main points of interest in the market on that day. Since the main objective of the model is to forecast
the direction of the market, it is possible to discretize the values: the direction is calculated from the
difference between the closing level of day N and that of day N − 1. This transformation is applied to the
whole set of historical data in order to adjust the values to the DHMM, and is described in Figure 12.
Figure 12 - Transformation of the historic data
The selection of the values 0, 1 and 2 to represent the drop, maintenance and rise signals derived from
the closing values was due to the use of matrices in the HMM. Since the emission probability matrix
B_s(O_t) comprises all possible observations, it became simpler to make a direct association between
each observation and a line of the B_s(O_t) matrix. Since, computationally, an array starts at index 0
and the process needs three indices, the choice of values came trivially. Thus, processing is made
simple, and is represented in Table 5.
Transform(Close_{N−1}, Close_N):
    IF Close_N − Close_{N−1} < 0: return 0 (drop)
    ELSE IF Close_N − Close_{N−1} > 0: return 2 (rise)
    ELSE: return 1 (maintenance)
Table 5 - Transform pseudo code
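A runnable Python version of the transform, written to be consistent with the 0 = drop, 1 = maintenance, 2 = rise mapping stated above (function names are illustrative):

```python
def transform(prev_close, close):
    """Discretize a day's closing move relative to the previous day:
    0 = drop, 1 = maintenance, 2 = rise (the indices used by the B matrix)."""
    diff = close - prev_close
    if diff < 0:
        return 0
    if diff > 0:
        return 2
    return 1

def to_observations(closes):
    """Turn a series of closing prices into a DHMM observation sequence."""
    return [transform(closes[i - 1], closes[i]) for i in range(1, len(closes))]
```

A series of T closing prices thus yields T − 1 discrete observations, the input expected by the training procedure described next.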
3.1.2 DHMM Training: Baum-Welch Algorithm
How can we estimate the model parameters given an observation set?
In order to answer this question it is important to use an algorithm capable of finding the unknown
parameters λ = {π, A, B} of an HMM. Although some algorithms exist with the capacity to address the
question, due to the type of data on which it will be used it is necessary that the algorithm does not
require any model initialization. This algorithm is the Baum-Welch algorithm; it uses the EM algorithm to
find the maximum likelihood estimate of λ = {π, A, B} given the observation sequence O_1, O_2, …, O_t,
and uses the production probability P(O|λ) as the optimization criterion.
The Viterbi training algorithm is another algorithm that can be presented as an answer to the parameter
estimation question; however, Viterbi training requires some reasonable initialization, makes a limited
use of the training data and is less robust, since it only uses observations inside the segments
corresponding to a given HMM state to re-estimate the parameters of that state [29][23][27].
For the training of the DHMM it is crucial to have the data transformed into signals of decline, maintenance
or rise (0, 1, 2) of the market value and to define the number of states of the HMM. The number of states
appears as an unknown variable in the process, but with HMMs it is often taken as a rule, though not a
requirement, to have the number of states equal to the number of observations, that is, to have one
possible strategy (state) for each existing observation. This rule is adopted in the development of the
model. Once the number of states is decided, the use of the Baum-Welch algorithm is straightforward,
as described in section 3.1.2.1.
Figure 13 - Flowchart of the HMM training
The last question that remains is the initialization of the matrices that characterize the HMM. When using the Baum-Welch algorithm the initial values of the HMM parameters are irrelevant, so random values are assigned. These values must comply with the following rules:
0 ≤ 𝑥 ≤ 1, where 𝑥 is the random value assigned
∑_{𝑖} 𝜋𝑖 = 1
∑_{𝑗} 𝐴𝑖𝑗 = 1, for every state 𝑖
∑_{𝑘} 𝐵𝑖(𝑂𝑘) = 1, for every state 𝑖
Where:
𝜋 – Initial matrix,
𝐴 – Transition matrix,
𝐵 – Emission matrix,
𝑖 – State,
𝑂𝑘 – Observation k;
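A minimal sketch of this random, row-stochastic initialization (illustrative Python; the helper name is ours):

```python
import random

def random_stochastic_rows(rows, cols, rng=random.Random(0)):
    """Random matrix with values in [0, 1] whose rows each sum to 1."""
    m = [[rng.random() for _ in range(cols)] for _ in range(rows)]
    return [[v / sum(row) for v in row] for row in m]

N, M = 3, 3                               # states, observation symbols
pi = random_stochastic_rows(1, N)[0]      # initial distribution
A = random_stochastic_rows(N, N)          # transition matrix
B = random_stochastic_rows(N, M)          # emission matrix
```

Dividing each random row by its sum enforces the normalization rules above while keeping every entry in [0, 1].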
3.1.2.1 Baum-Welch Algorithm
The Baum-Welch algorithm starts using the Forward-Backward algorithm. Before starting the iterative process, the transition matrix 𝐴, emission matrix 𝐵 and initial matrix 𝜋 from 𝜆 are set with random initial conditions.
The Baum-Welch algorithm is described as follows [23][29]:
Forward Procedure:
Having 𝛼𝑡(𝑖) = 𝑃(𝑂1, 𝑂2, … , 𝑂𝑡, 𝑠𝑡 = 𝑖|𝜆), the probability of ending in state 𝑠𝑖 given the observation sequence 𝑂1, 𝑂2, … , 𝑂𝑡, recursively computed as:
1. 𝛼1(𝑖) = 𝜋𝑖 𝑏𝑖(𝑂1)
2. 𝛼𝑡+1(𝑗) = 𝑏𝑗(𝑂𝑡+1) ∑_{𝑖=1}^{𝑁} 𝛼𝑡(𝑖) 𝑎𝑖𝑗
Backward Procedure:
Having 𝛽𝑡(𝑖) = 𝑃(𝑂𝑡+1, 𝑂𝑡+2, … , 𝑂𝑇|𝑠𝑡 = 𝑖, 𝜆), the probability of the ending sequence 𝑂𝑡+1, 𝑂𝑡+2, … , 𝑂𝑇 given the model 𝜆 and the state 𝑠𝑖 at time 𝑡, recursively computed as:
1. 𝛽𝑇(𝑖) = 1
2. 𝛽𝑡(𝑖) = ∑_{𝑗=1}^{𝑁} 𝛽𝑡+1(𝑗) 𝑎𝑖𝑗 𝑏𝑗(𝑂𝑡+1)
Optimization:
It is now possible to compute the temporary variables:
𝛾𝑡(𝑖) = 𝑃(𝑠𝑡 = 𝑖|𝑂, 𝜆) = 𝛼𝑡(𝑖)𝛽𝑡(𝑖) / ∑_{𝑗=1}^{𝑁} 𝛼𝑡(𝑗)𝛽𝑡(𝑗), (23)
This quantity 𝛾𝑡(𝑖) represents the probability of being in state 𝑠𝑖 at time 𝑡, given the observation set 𝑂1, 𝑂2, … , 𝑂𝑡 and the parameters 𝜆.
𝜉𝑡(𝑖, 𝑗) = 𝑃(𝑠𝑡 = 𝑖, 𝑠𝑡+1 = 𝑗|𝑂, 𝜆) = 𝛼𝑡(𝑖) 𝑎𝑖𝑗 𝛽𝑡+1(𝑗) 𝑏𝑗(𝑂𝑡+1) / ∑_{𝑖=1}^{𝑁} ∑_{𝑗=1}^{𝑁} 𝛼𝑡(𝑖) 𝑎𝑖𝑗 𝛽𝑡+1(𝑗) 𝑏𝑗(𝑂𝑡+1), (24)
This quantity 𝜉𝑡(𝑖, 𝑗) represents the probability of being in states 𝑖 and 𝑗 at times 𝑡 and 𝑡+1 respectively, given the observation set 𝑂1, 𝑂2, … , 𝑂𝑡 and the parameters 𝜆.
With the computation of these two quantities it is now possible to update the model, determining the expected quantities 𝜆̂ = {𝜋̂, 𝐴̂, 𝐵̂}.
Update of the model 𝜆̂:
𝜋̂𝑖 = 𝛾1(𝑖)
𝑎̂𝑖𝑗 = ∑_{𝑡=1}^{𝑇−1} 𝜉𝑡(𝑖, 𝑗) / ∑_{𝑡=1}^{𝑇−1} 𝛾𝑡(𝑖)
𝑏̂𝑖(𝑘) = ∑_{𝑡=1, 𝑂𝑡=𝑜𝑘}^{𝑇} 𝛾𝑡(𝑖) / ∑_{𝑡=1}^{𝑇} 𝛾𝑡(𝑖)
Termination:
If the quality measure 𝑃(𝑂|𝜆̂) is not improved by the updated model 𝜆̂ compared to 𝜆, the process stops; otherwise, all steps are repeated.
Figure 14 - Flowchart of the Baum-Welch algorithm for a discrete HMM
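The whole procedure above can be sketched in Python with NumPy. This is a simplified single-sequence sketch without the usual log/scaling tricks, so it is only suitable for short sequences; the function name and structure are ours, not the thesis code:

```python
import numpy as np

def baum_welch(O, N, M, n_iter=50, seed=0):
    """Re-estimate lambda = (pi, A, B) of a discrete HMM from one
    observation sequence O (integers in [0, M)), random initialization."""
    O = np.asarray(O)
    T = len(O)
    rng = np.random.default_rng(seed)
    norm = lambda x: x / x.sum(axis=-1, keepdims=True)
    pi, A, B = norm(rng.random(N)), norm(rng.random((N, N))), norm(rng.random((N, M)))
    for _ in range(n_iter):
        # forward procedure: alpha_t(i)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, O[0]]
        for t in range(T - 1):
            alpha[t + 1] = B[:, O[t + 1]] * (alpha[t] @ A)
        # backward procedure: beta_t(i)
        beta = np.zeros((T, N))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
        # temporary quantities gamma (23) and xi (24)
        gamma = norm(alpha * beta)
        xi = np.zeros((T - 1, N, N))
        for t in range(T - 1):
            x = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
            xi[t] = x / x.sum()
        # update step: pi_hat, a_hat, b_hat
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.stack([gamma[O == k].sum(axis=0) for k in range(M)],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return pi, A, B
```

For simplicity the sketch runs a fixed number of iterations instead of the 𝑃(𝑂|𝜆̂) improvement test used as the stopping criterion in the text.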
3.1.3 DHMM Testing: Viterbi Algorithm
The Viterbi algorithm is the chosen option to forecast the direction of the market close price. This algorithm looks at every state sequence and simply selects the most likely sequence, in a process assumed to be a finite-state, discrete-time Markov process.
Like the forward or the backward algorithm, the Viterbi algorithm also has a variable, represented by 𝛿𝑡(𝑖). This variable gives the maximum likelihood of the segment of the observation sequence 𝑂1, 𝑂2, … , 𝑂𝑡 that ends in state 𝑠𝑖:
𝛿𝑡(𝑖) = max_{𝑠1,𝑠2,…,𝑠𝑡−1} 𝑃(𝑂1, 𝑂2, … , 𝑂𝑡, 𝑠1, 𝑠2, … , 𝑠𝑡−1, 𝑠𝑡 = 𝑖|𝜆) (25)
This variable 𝛿𝑡(𝑖) can be compared with the forward variable 𝛼𝑡(𝑖), except that the Viterbi algorithm uses a maximization instead of a summation over previous states.
Figure 15 - Optimal path using the Viterbi Algorithm
The Viterbi algorithm is described as follows:
1. Select the most likely sequence in the process using the Viterbi algorithm [23]:
Initialization:
For all states 𝑖, 𝑗 ∈ [1, 𝑁] at 𝑡 = 1 we have:
𝛿1(𝑖) = 𝜋𝑖 𝑏𝑖(𝑂1), (26)
𝜓1(𝑖) = 0, (27)
Recursion:
For all times 𝑡, 1 ≤ 𝑡 ≤ 𝑇 − 1, and all states 𝑖, 𝑗 ∈ [1, 𝑁] we have:
𝛿𝑡+1(𝑗) = max_{𝑖}{𝛿𝑡(𝑖) 𝑎𝑖𝑗} 𝑏𝑗(𝑂𝑡+1) (28)
𝜓𝑡+1(𝑗) = arg max_{𝑖}{𝛿𝑡(𝑖) 𝑎𝑖𝑗} (29)
Termination:
For all states 𝑖, 𝑗 ∈ [1, 𝑁] at 𝑡 = 𝑇 we have:
𝑃∗(𝑂|𝜆) = 𝑃(𝑂, 𝑠∗|𝜆) = max_{𝑖} 𝛿𝑇(𝑖) (30)
𝑠𝑇∗ = arg max_{𝑗} 𝛿𝑇(𝑗) (31)
2. Having the most likely sequence from 𝑡 = 1 to 𝑡 = 𝑇, the next step is to assess the most likely state at 𝑡 = 𝑇 + 1. This is calculated from a manipulation of the algorithm equations. The matrix 𝜑𝑗(𝑂𝑇+1) is created, holding the probability of going to state 𝑗 in case of having the observation 𝑂𝑇+1 at 𝑇 + 1,
𝜑𝑗(𝑂𝑇+1) = max_{𝑖}{𝛿𝑇(𝑖) 𝑎𝑖𝑗} 𝑏𝑗(𝑂𝑇+1) (32)
3. Next, using (30), the most probable state at 𝑇 is extracted from the results obtained with the Viterbi algorithm,
𝑆𝑡𝑎𝑡𝑒 = arg max_{𝑖} 𝛿𝑇(𝑖) (33)
4. The state extracted in the previous step is used in 𝜓𝑇(𝑗) to extract the most likely predecessor state, which is then used to forecast 𝑇 + 1, where
𝑃𝑟𝑒𝑑𝑒𝑐𝑒𝑠𝑠𝑜𝑟 = 𝜓𝑇(𝑗 = 𝑆𝑡𝑎𝑡𝑒) (34)
5. Having the most probable predecessor state, it is now possible to compute the most probable observation at 𝑇 + 1,
𝐹𝑜𝑟𝑒𝑐𝑎𝑠𝑡 = arg max_{𝑂𝑇+1} 𝜑𝑃𝑟𝑒𝑑𝑒𝑐𝑒𝑠𝑠𝑜𝑟(𝑂𝑇+1) (35)
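The five steps above can be sketched in Python with NumPy (an illustrative sketch; the function name is ours, and it follows the text's adaptation of using the predecessor state to pick the next observation):

```python
import numpy as np

def viterbi_forecast(O, pi, A, B):
    """Run Viterbi over O, then forecast the most probable observation
    at T+1 through the predecessor-state manipulation (steps 1-5)."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                            # (26)
    for t in range(T - 1):
        scores = delta[t][:, None] * A                    # delta_t(i) * a_ij
        psi[t + 1] = scores.argmax(axis=0)                # (29)
        delta[t + 1] = scores.max(axis=0) * B[:, O[t + 1]]  # (28)
    state = delta[-1].argmax()                            # (33)
    predecessor = psi[-1, state]                          # (34)
    # (32): phi[j, o] = max_i{delta_T(i) a_ij} * b_j(o)
    phi = (delta[-1][:, None] * A).max(axis=0)[:, None] * B
    forecast = phi[predecessor].argmax()                  # (35)
    return int(state), int(forecast)
```

On a toy deterministic model in which the two states alternate and state 𝑖 always emits symbol 𝑖, the sketch ends the sequence [0, 1, 0, 1] in state 1 and forecasts observation 0.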
3.1.4 Why Multi Hidden Markov Models
The focus of the HMM is bounded by the size of the training window. Larger training windows give the HMM the capacity to perceive the formation of long-term trends but make the model less sensitive to changes in trends; in contrast, reduced training windows give the model the ability to identify the formation of short, transient patterns and a greater sensitivity in detecting changing trends.
Due to the constant fluctuations in the financial markets, it is important that the developed model is able to adapt to such fluctuations. Thus, the model must be capable of analyzing long-term trends while quickly adapting to these market changes.
Figure 16 - Focus of different window sizes
To this end, it was decided to use three DHMMs trained with different window sizes. The sizes of the windows were obtained through testing. A size of 90 days was chosen because this window size achieved the best results among all tested window sizes; nevertheless, it was found that the 90-day DHMM adapts slowly to changing trends, so it was decided that this size would be the maximum to be considered. The choice of the 15-day size was again determined through tests, which showed it to be the most balanced choice between forecast quality and sensitivity to fluctuations. To bridge the gap between the maximum and minimum window sizes, it was decided to adopt a third DHMM with a window size between the other two. Two cases were then tested: 30 days, equivalent to 1 month, and 60 days, equivalent to 2 months. The test results showed the 30-day DHMM as the best among them.
3.2 Developed Methods
This section analyzes the methods developed over the final solution development process. More experiments and tests were made and their results can be found in Chapter 4, but the focus of this section centers on the discussion and explanation of the methods that are actually used in the final solution. The presentation order of the solutions coincides with the order of development, as new challenges emerged that needed to be overcome with the addition and improvement of the constructed models.
3.2.1 Development of a Multi HMM strategy
This first model combines three different DHMMs in a single model. The idea centers on using the output of each model to generate the final prediction from the most predicted signal. For this decision to be possible, it was necessary to reduce the number of DHMM output observations to two, because otherwise there was a risk of each of the three DHMMs generating a different value, halting the forecast decision. Thus, it was decided to aggregate the observation of a price drop with the observation of price maintenance. It can be concluded from the tests that the probability of the closing price repeating on two consecutive days is greatly reduced.
Figure 17 - Aggregation of both 0 and 1 signals into the 0 signal
The objective of this approach is to balance both the long-term prediction and the adjustment sensitivity of the model. As stated above and shown in Figure 18, each of these three DHMMs produces an output and, having three forecasts for two possible values, there will always be at least two equal forecasts; the value of these forecasts is then chosen as the final forecast of the model.
Figure 18 - Flowchart of the three DHMM model integration
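The voting rule described above reduces to a simple majority over the three outputs; a minimal illustrative sketch in Python (not the thesis implementation):

```python
from collections import Counter

def majority_vote(forecasts):
    """Final prediction from the three DHMM outputs; with only two
    possible signals left (0 and 2), at least two forecasts agree."""
    return Counter(forecasts).most_common(1)[0][0]
```

For example, `majority_vote([2, 0, 2])` returns 2.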
Having three DHMMs with different sensitivities to direction variations enables a rapid and effective adaptation to the behavior of financial time series. Thus, sticking to an analysis of the most frequently predicted value would under-utilize the capabilities that this approach can offer. That said, the new objective focused on the use of the DHMM 90 as the main model and the use of the DHMM 30 and DHMM 15 each time a new trend is detected, to accelerate the adaptation of the model to new market behaviors. To this end, three technical indicators were used for the detection and indication of possible overbought or oversold conditions (RSI), new trends (MACD) or the strength of those trends (ADX), plus a fourth approach in which the RSI and MACD indicators are used simultaneously. The above indicators are explained in detail in chapter 2.
To deal with the detection of a trend change by a technical indicator, a controller was developed that applies each HMM in an orderly manner. This controller applies the HMM 15 for the first 15 days following the indicator signal, the HMM 30 from day 15 to day 20, and finally returns to the use of the HMM 90. These time intervals were chosen, firstly, in such a way that the training window of each HMM model never stops containing the day when the indicator detected a change in trend. The final choice of intervals was made from testing, which showed that these values obtained the best results.
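The controller's window schedule can be sketched as follows (illustrative Python; the function name and the `None` convention for "no signal yet" are ours, while the 15- and 20-day thresholds come from the text):

```python
def select_window(days_since_signal):
    """Choose the DHMM training window after a technical-indicator
    trend signal: DHMM 15 for the first 15 days, DHMM 30 from day
    15 to day 20, then back to the DHMM 90."""
    if days_since_signal is None:   # no trend change detected yet
        return 90
    if days_since_signal < 15:
        return 15
    if days_since_signal < 20:
        return 30
    return 90
```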
Figure 19 - Flowchart of a multi DHMM model with technical indicators
Figure 19 shows the flowchart of the process, where the Technical Indicator block is replaced by the analysis of the results of each indicator and the Compute Indicator block computes the value of the indicator. The analysis process for each indicator can be found in Figure 20.
Figure 20 - Technical indicators used in the Technical Indicator box stated in Figure 19
The first stage of the model development was followed by its respective testing, which showed a significant improvement in the results, except for the ADX, which, due to its poorer performance, will not be included in the second model development. The remaining cases were able to achieve the objectives. For the second stage, the reduction of losses over the years became the primary goal, even at the cost of a decrease in profits, in order to increase the reliability and the investment safety of the model.
3.2.2 Fusion between different methods
For this second phase of development, as mentioned in the previous section, the aim is to reduce the yearly losses of the model even if the yearly profit is eventually reduced. Thus, three of the five models previously developed were added to this new model and a new "0" signal was introduced. The aggregation requires the direction to be confirmed by two of the DHMM-based models: the "0" signal is emitted whenever the predictions of the two algorithms are contradictory. On these days of uncertainty, in which the two DHMMs give different forecasts, no investment is made, avoiding possible losses. From this second stage onwards, the signals are transformed into new representations: in addition to adding the "0" signal, the drop signal of the closing market value is converted to "-1" and the rise signal to "1".
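The fusion rule reduces to a simple agreement check; an illustrative Python sketch using the -1/0/1 signal convention introduced above (not the thesis implementation):

```python
def fuse(pred_a, pred_b):
    """Fusion of two sub-model predictions in {-1, 1}: agreement passes
    the direction through, disagreement yields 0 (no investment)."""
    return pred_a if pred_a == pred_b else 0
```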
Figure 21 - Flowchart of the fusion model
Figure 21 shows the flowchart of the process, where the Aggregation of Developed Models block is replaced by the analysis of the results of each model. The analysis process for each sub-model aggregation can be found in Figure 22.
Figure 22 - Different sub-models aggregation
The results show a substantial reduction in losses despite a slight drop in earnings. So, to try to recover the annual earnings without increasing the losses again, the third and final phase was developed, which is described in the next section.
3.3 Multi DHMM Automation
In the third and final stage, the five developed models that showed the best results were combined. Once again, in order to have a confirmation of the direction provided by the various models, the most predicted value is chosen. Table 6 lists the selected models from phases one and two:
Table 6 - Selected models from each phase
The first-phase models with the highest total gain were therefore chosen but, in order to mitigate the losses occurring in some of the years in these two models, three models of the second phase are also used, thus incorporating into the final model the signal "0", so that no investments are made when there is large uncertainty in the computation of the forecast.
Figure 23 - Flowchart of the final model
Figure 23 shows the flowchart of the final model. Each of the five models that it incorporates issues a forecast and the most predicted forecast of the three possible values is subsequently chosen; its value is saved and, if there are more dates to predict, the windows slide and the process is resumed. When there is no more data, the stored values are recorded in a text file for further analysis.
Phase 1:
DHMM MACD
DHMM MACD RSI
Phase 2:
DHMM RSI and DHMM MACD
DHMM 15 30 90 and DHMM MACD
DHMM 15 30 90 and DHMM RSI
3.4 Conclusion
This chapter described the entire development process and the models used to develop the final model. The chapter is divided into four parts corresponding to the stages of development of the model. In the first phase, a Multi DHMM strategy was used, with three DHMMs trained with different window sizes to take advantage of the specific features of each one. In the second phase, the RSI, ADX and MACD technical indicators were used in order to use the strategy developed in the first phase efficiently. The third phase was developed taking into account the limitations of the prior phases; thus the scope of this phase centered on reducing the losses existing in some years, for which a new signal was added that indicates that the model is uncertain of its prediction. Finally, the final model was developed, combining the best models developed so far to make the most of their abilities; each of these sub-models produces a prediction and the most predicted signal among them is chosen as the final model prediction.
CHAPTER 4 RESULTS
This chapter presents all the results and conclusions from each case study. From the analysis of the algorithm results, the process and decisions that led to the final algorithm combination and subsequent automation are described.
This chapter is divided into three main sections. The first section tests the ability of the DHMM to detect patterns and forecast; in this same section the performance of the discrete version is compared with the continuous version. The second section describes the construction of the solution and the necessary improvements that led to the development of different sub-models to address the DHMM limitations. The last section analyzes the results obtained in the tests performed on the final model and compares them with the results obtained by each sub-model used in the final model.
4.1 HMM Analysis, Comparison and Decision
This section holds the first DHMM algorithm analyses, in order to assess its usefulness as a Forex market direction prediction algorithm. Thereafter, in order to perceive whether a discrete approach would have any advantage over a continuous approach, both algorithms were compared and evaluated under the same circumstances.
4.1.1 Case Study I – Analysis with specific patterns
Before proceeding with the development of models using the DHMM, it was necessary to verify that the algorithm was able to meet the necessary requirements to analyze financial time series. For this purpose, 8 different patterns were designed, which give some idea of the type of analysis and prediction quality that one could expect from the DHMM.
Each pattern contains three different observations (0, 1 and 2), corresponding to the same number of states used in the training of the DHMM. For these tests, each DHMM was trained with 30 data points from each pattern before forecasting the next point. The shape of each pattern can be found in Appendix 1 and the obtained results can be found in Table 7 below:
Table 7 - Results using a predefined set of patterns
The results lead one to believe that the use of the DHMM as a basis for model development is a good choice. Although overall the results are satisfactory, in patterns 5 and 6 one can find some difficulty in detecting abrupt transitions in the pattern direction. Such difficulty in detecting abrupt or semi-abrupt transitions is considered the biggest limitation of the DHMM, so during the development it was necessary to adopt strategies to solve this problem.
4.1.2 Case Study II – Analysis with real data
After assessing the DHMM feasibility for detecting pattern directions, it is now important to know how the same algorithm behaves with actual data from the EUR/USD Forex market. For this purpose, 2011 historical data was used. The year 2011 was chosen due to its characteristics over the months: the year can be divided into three parts, where the first part shows a price rise of the EUR/USD pair, the second a price stagnation zone and the third a drop in price. In addition, throughout the year there are periods of sudden drops and rises. With this variety of cases in the same year, it was possible to conduct a more thorough analysis of the algorithm behavior taking all these factors into account.
For the analysis of these first tests, percentages corresponding to two cases were calculated. In the first case two states and three observations were used, while in the second case three states and three observations were used. The number of observations, as discussed in chapter 3, corresponds to the three possible conditions, i.e. drop, maintenance or rise of the next EUR/USD closing value. For each case, where the number of states varies between two and three, the algorithm is trained with different quantities of data, characterized by weeks, months or years, as follows:
Pattern 1: Direction 2 2 2 2 0 0 0 0 | Prediction 0 2 2 2 2 0 0 0
Pattern 2: Direction 2 0 2 0 2 0 2 0 | Prediction 2 0 2 0 2 0 2 0
Pattern 3: Direction 0 0 0 0 2 0 0 0 | Prediction 0 0 0 2 2 0 0 2
Pattern 4: Direction 1 2 1 2 1 0 1 0 | Prediction 1 0 1 2 1 2 1 2
Pattern 5: Direction 0 2 0 2 0 0 2 0 | Prediction 2 2 0 2 0 2 2 0
Pattern 6: Direction 1 1 1 1 1 1 1 1 | Prediction 1 1 1 1 1 1 1 1
Pattern 7: Direction 2 0 2 0 2 0 2 0 | Prediction 2 0 2 0 2 0 2 0
Pattern 8: Direction 2 2 0 2 2 2 0 2 | Prediction 2 2 0 2 2 2 0 2
Table 8 - Results using different training window sizes
The results show that in all six cases the use of three states, the same number as the observation symbols, obtained a superior performance. The model with the 90-day window size showed the best result, reaching 59% correct predictions, which presents itself as an excellent result for this first test with real data. From the results one can conclude that the best decision is to use a three-state HMM in the development of the next models.
4.1.3 Case Study III – Continuous and Discrete HMM comparison
Next, the continuous version of the HMM was implemented and tested, comparing both the continuous and discrete versions to assess whether the discrete HMM could surpass the performance of its continuous counterpart. As described earlier, the aim of the CHMM algorithm is to forecast the exact price of an asset. To use the same algorithm to predict the direction of the asset, a new procedure was added that compares the predicted value with the previous value and assigns a new value corresponding to a drop, rise or maintenance signal.
Table 9 - DHMM and CHMM comparison using hourly, daily and weekly data
For the test results below, 3 states and 3 observations were used in both algorithms and, complementarily, for each pair of states and observations the impact of having from 2 to 6 mixtures in the CHMM was tested. The data used in the tests correspond to Forex values from 2002 to 2013 and both algorithms were trained with 15, 30, 60 and 90 data values from hourly, daily and weekly series. The comparison of the two alternatives, using the best result from each timeframe, is shown in Table 9:
Training Window Size | States and Observations | Correct
7 days | 2 States, 3 Observations | 52%
7 days | 3 States, 3 Observations | 57%
15 days | 2 States, 3 Observations | 56%
15 days | 3 States, 3 Observations | 59%
30 days | 2 States, 3 Observations | 56%
30 days | 3 States, 3 Observations | 58%
60 days | 2 States, 3 Observations | 55%
60 days | 3 States, 3 Observations | 57%
90 days | 2 States, 3 Observations | 56%
90 days | 3 States, 3 Observations | 59%
1 year | 2 States, 3 Observations | 54%
1 year | 3 States, 3 Observations | 56%
Time | Discrete HMM | Continuous HMM
Hourly | 54.68% | 50.43%
Daily | 55.73% | 44.17%
Weekly | 51.85% | 40.13%
Table 10 - DHMM and CHMM comparison using data from 2002 to 2013 and 60 days window
From Table 10 it is possible to verify that the choice of a DHMM model for direction prediction was the most appropriate. Only in 2005 did the continuous HMM manage to perform better than the discrete model.
4.2 Construction and Improvement
For the construction of the final model, several tests were performed in order to detect the limitations of the model and come up with possible solutions. The analysis and subsequent model improvements led to 7 case studies being carried out until the final model was reached. These tests were made using Forex EUR/USD historical data from 2002 to 2013 and a sum of pips is performed to assess the revenue of using the proposed strategy over the 11-year period. When a wrong prediction is obtained, the pip difference between the closing value of the predicted day and the previous one is subtracted from the pip total; in the event of a correct forecast this difference is added to the total. As mentioned previously, the approach uses a sliding window strategy for training and testing.
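The pip accounting described above can be sketched as follows (illustrative Python; the 0.0001 pip size for EUR/USD and the 0/2 signal convention are assumptions of the sketch, not values stated in the text):

```python
def pip_total(closes, predictions, pip_size=0.0001):
    """Sum of pips over the test period: the absolute day-over-day move
    is added for a correct direction forecast and subtracted otherwise.
    predictions[t] is the forecast (0 = drop, 2 = rise) for day t+1."""
    total = 0.0
    for t, pred in enumerate(predictions):
        move = closes[t + 1] - closes[t]
        actual = 2 if move > 0 else 0
        pips = abs(move) / pip_size
        total += pips if pred == actual else -pips
    return total
```

For instance, for closes [1.1000, 1.1010, 1.1005] and two "rise" predictions, the first day adds 10 pips and the second subtracts 5, for a total of 5 pips.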
At the same time, an analysis was made of the impact of the meetings of the European Central Bank (ECB) and the Federal Reserve System (FED) on the predictions and behavior of the developed solutions, since such meetings introduce a period of uncertainty in the Forex pair EUR/USD.
4.2.1 Case Study I – Simple means
The first indicator chosen to combine with the HMM was the SMA (Simple Moving Average). The choice was due to its importance, simplicity and usability. The idea behind its use is to let the HMM estimate the closing value only when the closing value is higher than the average value calculated by the SMA. Thus, it was expected to avoid times of greater uncertainty that could hinder the pattern analysis and the subsequent prediction of direction.
For this purpose, four sets of moving averages were analyzed, using 30, 50, 100 and 200 days in their calculation. These values were selected for being the most used SMA periods in financial market technical analysis.
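The SMA gate can be sketched as follows (illustrative Python; the function names are ours):

```python
def sma(values, n):
    """Simple moving average over the last n values."""
    return sum(values[-n:]) / n

def should_predict(closes, n=200):
    """Gate: let the HMM forecast only when the latest close sits above
    its n-day SMA, skipping the uncertain zones described above."""
    return len(closes) >= n and closes[-1] > sma(closes, n)
```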
Year | Discrete HMM | Continuous HMM
2002 | 50.97% | 45.56%
2003 | 55.21% | 37.45%
2004 | 52.11% | 44.06%
2005 | 50.38% | 55.38%
2006 | 54.23% | 41.15%
2007 | 50.96% | 45.59%
2008 | 51.72% | 48.28%
2009 | 53.08% | 43.85%
2010 | 51.74% | 44.02%
2011 | 56.54% | 50.00%
2012 | 51.55% | 46.12%
2013 | 53.67% | 50.58%
Mean | 52.84% | 46.00%
Table 11 - Total of the DHMM and Simple Mean results from 2002 to 2013
After analyzing the four cases in Table 11, it was concluded that the HMM / moving average pair features the best performance when the 200-day SMA is used; in contrast, the 50-day SMA showed the worst performance. The difference lies in the prediction of the year 2011, where the 200-day SMA presented lower losses compared to the 50-day SMA; in opposition, in 2006, where the forecasts presented considerable profits, the 200-day SMA stood out, scoring 2191 pips.
As can be seen below, the discrepancy of results in 2011 relates to the last quarter of the year. The respective quarter may be divided into two parts. In the first part, both SMAs managed to stop the HMM when an uncertain area appeared. In the second part, while the 200-day SMA remained stopped, the 50-day SMA no longer considered this part an uncertainty area, causing a considerable loss over the remaining quarter.
Figure 24 - Comparison of the 200 days SMA (left) and the 50 days SMA (right) in 2011
Comparing the 30-day SMA to the 200-day SMA in 2006, i.e. the worst and best cases, one can conclude that the difference concerns, once again, a different analysis of the same area performed by both SMAs. As shown in Figure 25, the 30-day SMA considered some areas that could have been profitable as areas of uncertainty:
SMA | Normal | W/out FED | W/out ECB | W/out FED+ECB
30 | 2414 | 2947 | 3574 | 4107
50 | 2815 | 3549 | 3312 | 4046
100 | 3302 | 3097 | 4089 | 3884
200 | 3735 | 3395 | 4227 | 3887
Figure 25 - Comparison of the 200 days SMA (left) and the 30 days SMA (right) in 2006
It is concluded that the best result is displayed by the HMM 90 / 200-day SMA pair. Although the results are positive, there are other constraints that this indicator could not overcome and that may create profit limitations. One of these limitations is related to the delay of the HMM in realizing a change in direction when leaving a strong trend.
4.2.2 Case Study II – Usage of mixed training days
The time taken to detect a change in direction depends on the size of the window used for training the DHMM. Smaller windows make the algorithm more sensitive to slight changes but less sensitive to long-term trends; on the other hand, larger windows have a low sensitivity to immediate changes because their focus is on long-term trends.
That said, it is important to find a balance between both cases. Thus arises the possibility of using a combination of three DHMMs with different training window sizes: a DHMM sensitive to small changes, another sensitive to long-term trends and a third that sits between both cases. The result is three predictions, one from each, and the most predicted direction of the three is chosen.
Table 12 - Results of different sets of DHMM training window sizes from 2002 to 2013
From the results in Table 12 it is possible to verify that the DHMM 15, 30 and 90 set has a much higher performance than the remaining sets, achieving in 2010 a total of 2461 pips. In Figure 26 the use of a long-term component between January and June is noticeable, as the algorithm does not change the direction of the forecast in April when the direction reversed temporarily. Logically, during this period the prediction proved wrong, resulting in a loss of pips. It is also easy to identify a greater sensitivity to changes: from September onwards the algorithm was able to follow a slight change in direction. In this approach, it was noted that an analysis without the days of the FED or ECB meetings is even more harmful than a global approach.
Set [days] | Normal | W/out FED | W/out ECB | W/out FED+ECB
15, 30 and 90 | 5936 | 5646 | 5167 | 4877
15, 60 and 90 | 4096 | 3072 | 3415 | 2391
30, 60 and 90 | 4230 | 3110 | 3467 | 2347
Table 13 - Results from 2002 to 2013 of the DHMM training using 15, 30 and 90 days window size
Although this approach is able to contemplate a macro and micro view of the patterns and their changes in direction, its use is somewhat random. To overcome this challenge, it is important to use a technical momentum indicator that will inform the algorithm of a possible change in market behavior, for a faster and more efficient adaptation.
Figure 26 - Results from multi DHMM in 2010
To attempt to reduce the time the algorithm takes to detect a new direction transition, the RSI was used. This technical momentum indicator attempts to determine overbought and oversold conditions, and this information is expected to help detect a change in market behavior and direction. That said, and from the results obtained in the previous tests, especially the results in 4.2.2, three DHMM algorithms were used:
Training the algorithm using the previous 15 days:
As mentioned in the previous section, using a small training dataset makes the algorithm more sensitive to small changes of direction. So, when the RSI indicates a possible change of direction, the algorithm is expected to detect it as quickly as possible.
Year | Normal | W/out FED | W/out ECB | W/out FED+ECB
2002 | -446 | -267 | -531 | -352
2003 | 1075 | 819 | 880 | 624
2004 | 222 | 213 | 111 | 102
2005 | 235 | 233 | 546 | 544
2006 | 22 | 222 | 139 | 339
2007 | -46 | -179 | 53 | -80
2008 | 1455 | 1675 | 964 | 1184
2009 | 369 | 449 | -115 | -35
2010 | 2461 | 1739 | 2024 | 1302
2011 | 1080 | 916 | 1578 | 1414
2012 | -763 | -436 | -741 | -414
2013 | 272 | 262 | 259 | 249
TOTAL | 5936 | 5646 | 5167 | 4877
Training the algorithm using the previous 90 days:
As soon as the change of direction detected by the RSI is confirmed, it is necessary that the algorithm is once again insensitive to noise and focused on the detection of medium- and long-term patterns.
Training the algorithm using the previous 30 days:
The need for a smooth transition between the previous two points was considered. This way, false direction changes detected by the RSI can still be discarded within a short period while moving forward to a stage with less sensitivity to noise.
It was necessary to define the time during which each of the DHMMs presented in the previous 3 points should be used after the RSI detects a possible change in market behavior. The first analysis used the DHMM 15 during the first 20 days and the DHMM 30 during the following 15 days, proceeding with the use of the DHMM 90 for the remaining time until a new indication from the RSI. The results are shown below:
Table 14 – Results from 2002 to 2013 from Multi DHMM and RSI using 0, 20 and 45 days steps
The first results were lower than expected. The main reason was the use of the DHMM 15 and DHMM 30 for too long. To analyze the impact, it was decided to reduce the use of each DHMM to the following intervals: the DHMM 15 until 15 days after the RSI statement, the DHMM 30 for the 5 days after that and the DHMM 90 until the next indication from the RSI. The results in Table 15 show the positive effect of this small reduction of the ranges, confirming the utility of this approach in detecting and adapting to changes in market behavior. The supplementary analysis of the presence or absence of the days of the ECB and FED meetings shows that, for this case, their absence has no advantage over the whole period.
Table 15 – Results from 2002 to 2013 from Multi DHMM and RSI using 0, 15 and 20 days steps
Year | Normal | W/out FED | W/out ECB | W/out FED+ECB
2002 | -18 | -301 | -103 | -386
2003 | -229 | -295 | -326 | -392
2004 | 240 | 137 | 577 | 474
2005 | 1481 | 1479 | 1286 | 1284
2006 | -228 | -28 | -377 | -177
2007 | -1408 | -1475 | -1289 | -1356
2008 | 1135 | 807 | 586 | 258
2009 | -2761 | -2681 | -2883 | -2803
2010 | 3185 | 2463 | 2784 | 2062
2011 | -266 | -446 | 316 | 136
2012 | 825 | 976 | 911 | 1062
2013 | -464 | -456 | 127 | 135
TOTAL | 1492 | 180 | 1609 | 297
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002       -60       -273        -45           -258
2003       483        357       1168           1042
2004      1068       1391        889           1212
2005      1533       1399       1664           1530
2006        98        298         -7            193
2007      -422       -489       -227           -294
2008      1867       1293        956            382
2009       879        959        395            475
2010      1235        789        834            388
2011     -1624      -1668       -748           -792
2012       587        642        429            484
2013      -604       -964       -225           -585
TOTAL     5040       3734       5083           3777
(values in pips)
4.2.3 Case Study IV – MACD indicator
This approach tries to improve the results of the model by adding the trend-following momentum indicator MACD. This indicator offers three possibilities for the combination: using the indicator in its entirety (MACD and divergence), using only the divergence, or using only the MACD values. In each case the same strategy as in the previous case study was used: DHMM 15 until 15 days after the indicator's signal, DHMM 30 for the following 5 days and DHMM 90 until the next signal.
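As a reference for how the MACD values mentioned here are conventionally obtained, a minimal sketch with the common 12/26/9 periods (the thesis does not state its parameters in this passage, so these defaults are assumptions) is:

```python
def ema(values, period):
    """Exponential moving average, seeded with the first value."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd(closes, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) for a close series.

    macd_line  = EMA(fast) - EMA(slow)
    signal_line = EMA(signal) of the macd_line
    histogram   = macd_line - signal_line (crossings hint at trend changes)
    """
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram
```

In an uptrend the fast EMA tracks the price more closely than the slow one, so the MACD line is positive; sign changes of the histogram are the usual trend-change cue.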
Table 16 - Results from 2002 to 2013 from Multi DHMM using MACD and Divergence
The results presented in Table 16 show a large total of 8578 pips when combining Divergence and MACD with the DHMM. However, Table 17, which presents only the combination of DHMM and Divergence, shows a decrease to roughly half of the total achieved in Table 16. Thus, the good result of the first case should be attributed entirely to the MACD.
Table 17 - Results from 2002 to 2013 from Multi DHMM using Divergence
This is confirmed by Table 18, where the combination of DHMM and MACD achieved a profit of 10038 pips, far superior to the two previous cases. This result was due to the large profits between 2008 and 2011 and to reduced losses over the 12 years, ensuring a performance far superior to that of the two previous cases.
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002     -1172      -1177      -1219          -1224
2003       835        767        882            814
2004       606        929        325            648
2005       357        283        522            448
2006      -926       -726       -881           -681
2007      -298       -497       -129           -328
2008      2563       2119       1710           1266
2009      1901       1373       1417            889
2010      1899       1843       1396           1340
2011      1164       1136       1956           1928
2012       927        926       1141           1140
2013       722        566        847            691
TOTAL     8578       7542       7967           6931
(values in pips)
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002     -1146      -1151      -1193          -1198
2003       517        223        642            348
2004       416        739        173            496
2005        37        -71        510            402
2006      -556       -594       -621           -659
2007        86       -139        281             56
2008       883       1745         30            892
2009      1887       1359       1403            875
2010       977        905        420            348
2011       506        726        862           1082
2012       473        566        747            840
2013       644        654        769            779
TOTAL     4724       4962       4023           4261
(values in pips)
Table 18 - Results from 2002 to 2013 from Multi DHMM using MACD
From these results it is possible to conclude that the use of the MACD greatly increases the algorithm's performance. The divergence, used alone or combined with the MACD, despite yielding interesting results, falls short of the results reported by the MACD in isolation. It is even possible to say that adding the divergence to the MACD reduces the total result by 1460 pips over the years.
4.2.4 Case Study V – Combining RSI and MACD with ADX
The trend strength indicator ADX was used to confirm whether a new trend indicated by the RSI or MACD was strong or weak. The objective was to ignore false signals of new trends. Therefore, if the ADX classified the trend indicated by the MACD or RSI as strong, the 15, 30 and 90 day analysis described in the two previous case studies was applied; if the trend was weak, the HMM continued with the 90-day analysis. The poor results in Table 19 show that there is no advantage in using this indicator in conjunction with the others already analyzed. The MACD - ADX result could even be considered acceptable, but it does not reflect the performance of the ADX; rather, it reflects the high performance of the MACD indicator, which had the same influence here as in the previous section's analysis of MACD with Divergence.
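The ADX gating just described can be sketched as follows; the strength threshold of 25 is a common convention for the ADX and an assumption here, since the thesis does not state the value used:

```python
def analysis_window(adx_value, days_since_signal, strong_threshold=25):
    """Gate the RSI/MACD trend signal with the ADX trend strength.

    A weak trend (ADX below the threshold) means the signal is treated
    as false: the 90-day DHMM keeps running. A strong trend triggers
    the staged 15 / 30 / 90 day schedule from the previous case study.
    """
    if adx_value < strong_threshold:
        return 90   # weak trend: ignore the indicator's signal
    if days_since_signal < 15:
        return 15   # strong trend, first stage: fastest adaptation
    if days_since_signal < 20:
        return 30   # strong trend, transition stage
    return 90       # back to the long, noise-resistant window
```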
Table 19 - Results from 2002 to 2013 combining RSI and MACD with ADX
4.2.5 Case Study VI – Combining MACD and RSI
Although the results obtained in the previous case studies are quite satisfactory, it would be very useful to decrease the amount of losses in negative years. To that end, a momentum indicator (RSI) was joined with a trend-following momentum indicator (MACD) so that trend changes would be better identified, allowing more efficient action of the HMM in those moments and reducing the amount of losses.
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002       806        975        859           1028
2003      1575       1621       1158           1204
2004      -848       -847       -641           -640
2005        19       -115        150             16
2006       458        658        649            849
2007      -500       -587       -487           -574
2008      3169       2705       2756           2292
2009      1873       1845       1507           1479
2010      1889       1459       1498           1068
2011      1666       1110       2164           1608
2012      -121        170       -599           -308
2013        52       -106       -153           -311
TOTAL    10038       8888       8861           7711
(values in pips)
Model                     Normal  W/out FED  W/out ECB  W/out FED+ECB
Multi DHMM - MACD - ADX     6668       5410       6141           4883
Multi DHMM - RSI - ADX      3434       1582       2413            561
Table 20 - Results from 2002 to 2013 from Multi DHMM combining MACD and RSI
The result met the objective pursued, reducing the maximum annual loss to -680 pips without reducing the total amount of earnings at the end of the 12 years. Once again, it can be seen that removing the days of the communications made by the Fed or the ECB has a negative impact on the analysis.
4.2.6 Case Study VII – Combining MACD, RSI and mixed training days
Although the above results show an increase in the total amount of pips and a reduction of losses over the years, it was decided to try to reduce the possible annual losses even further, even at the cost of lower positive results in the remaining years. The objective is to keep annual losses below 500 pips. To this end, the models analyzed in the previous case studies are paired and, if their predictions differ, the algorithm does not predict any direction. On such a day no position is opened and all existing positions are closed.
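The pairing rule just described can be sketched in a few lines (the constant names are illustrative):

```python
UP, DOWN, STOP = 1, -1, 0

def paired_forecast(pred_a, pred_b):
    """Combine two sub-model direction forecasts.

    Agreement keeps the shared direction; disagreement yields STOP,
    meaning no new position is opened and existing ones are closed.
    """
    return pred_a if pred_a == pred_b else STOP
```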
Table 21 - Results from 2002 to 2013 combining HMM 15 30 90 and HMM MACD
Table 21 presents the results obtained with the combination of HMM 15, 30 and 90 and HMM MACD. On average, this model had an annual loss of -93.45 pips and an annual profit of 842.91 pips. This combination also showed the best pip total over the 12 years of the three combinations analyzed in this section.
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002       790        751        647            608
2003      1787       1833       1312           1358
2004      -464       -463       -257           -256
2005      1061        907       1158           1004
2006       566        766        425            625
2007      -680       -767       -581           -668
2008      3365       2791       2638           2064
2009      1647       1619       1281           1253
2010      1627       1197       1050            620
2011      -282       -838        300           -256
2012       303        458         53            208
2013       -10       -168        -45           -203
TOTAL     9710       8086       7981           6357
(values in pips)
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002       437        700        499            762
2003      1325       1220       1019            914
2004      -313       -317       -265           -269
2005       127         59        348            280
2006       240        440        394            594
2007      -273       -383       -217           -327
2008      2312       2190       1860           1738
2009      1121       1147        696            722
2010      2175       1599       1761           1185
2011      1373       1013       1871           1511
2012      -442       -133       -670           -361
2013       162         78         53            -31
TOTAL     8244       7613       7349           6718
(values in pips)
Table 22 - Results from 2002 to 2013 combining HMM 15 30 90 and HMM RSI
Table 22 shows the results obtained by combining HMM RSI and HMM 15 30 90. Although this sub-model has the lowest total of the three, its average loss is much lower than that of the previous sub-model: an average annual loss of -49.18 pips and average annual earnings of 544.27 pips. Finally, Table 23 shows the results obtained from the HMM RSI and HMM MACD model. This model has the lowest average loss of the three over the 12 years, with a total of only -39.82 pips, a very small value compared to all the other models developed. Its average gain is 698 pips, the second best of the three models analyzed in this section.
Table 23 - Results from 2002 to 2013 combining HMM MACD and HMM RSI
With this approach the aim of keeping annual losses below 500 pips was reached; naturally, along with the lower losses, the gains also suffered from this change. Nevertheless, the three cases studied had very positive results, specifically the pair HMM 15 30 90 and HMM MACD, which obtained one of the best results of all the simulations.
The junction of the various algorithms ultimately creates a stronger forecast, removing large variations in earnings. Although such variations can be seen as beneficial when positive, that is not the case with large negative swings, which can completely invalidate the algorithm.
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002       -10        154         33            197
2003       757        539        765            547
2004       657        648        652            643
2005      -127       -205        231            153
2006       841        906        847            912
2007       107        -21        157             29
2008       129        728       -391            208
2009       247        273       -102            -76
2010      2066       1669       1685           1288
2011       478        233       1227            982
2012      -404       -289       -442           -327
2013       705        520        681            496
TOTAL     5446       5155       5343           5052
(values in pips)
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002       359        429        393            463
2003      1007        940        904            837
2004       122        118        276            272
2005      -235       -379         33           -111
2006      1059       1124       1102           1167
2007      -120       -225       -113           -218
2008       986       1243        505            762
2009       999        971        709            681
2010      1780       1529       1422           1171
2011       771        330       1520           1079
2012       -83         14       -371           -274
2013       595        336        475            216
TOTAL     7240       6430       6855           6045
(values in pips)
Figure 27 - Performance of each developed model from 2002 to 2013
4.3 Multi DHMM Automation
This section analyzes the result of the final adaptation of the Multi DHMM. This final adaptation had as its main objective the substantial reduction of losses in each year. For this purpose, the approaches analyzed in the previous case studies that demonstrated the best performance were introduced into the model. A feature carried over from the model analyzed in the previous case study is the addition of a STOP signal, which indicates that there is uncertainty in the forecast and that it is therefore better to discard the current prediction. Tables 24 and 25 show the positive impact that this new signal had on the final results. When all possible outcomes of the model are considered, the percentage of correct predictions reaches 52%; if the estimates that generated the STOP signal are discarded, since in those moments there were neither losses nor gains, the percentage rises to 57%, a considerable difference of 5 percentage points that has its impact on the final results.
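The two percentages can be reproduced directly from the day counts in Tables 24 and 25 (function name illustrative):

```python
def hit_rates(true_days, false_days, stop_days):
    """Prediction accuracy with and without the STOP days.

    STOP days produced neither gains nor losses, so they can fairly
    be excluded from the denominator of the second rate.
    """
    total = true_days + false_days + stop_days
    with_stop = 100.0 * true_days / total
    without_stop = 100.0 * true_days / (true_days + false_days)
    return with_stop, without_stop

# Day counts from Tables 24 and 25: 1624 correct, 1210 wrong, 283 STOP.
w, wo = hit_rates(1624, 1210, 283)   # rounds to 52% and 57%
```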
Table 24 - Resulting percentages from the final model
Table 25 - Resulting percentages from the final model without stop signal
Table 26 shows the complete elimination of years with a negative pip total. The year 2007, which appeared negative in the previous case studies, recovered to a gain of 217 pips. Although this gain can be considered small, it is also the lowest value obtained over the 12 years analyzed. Furthermore, 2008 and 2010 are well above the average annual gain of 2196 pips.
(Figure 27: line chart of the EUR/USD close values against the cumulative pip totals of HMM 15 30 90 + HMM MACD, HMM 15 30 90 + HMM RSI and HMM MACD + HMM RSI, 2002 to 2013.)
              False   True   Stop   Total
Num of Days    1210   1624    283    3117
Percentage      39%    52%     9%    100%

              False   True   Total
Num of Days    1210   1624    2834
Percentage      43%    57%    100%
The analysis without the days of the Fed and ECB press conferences suggests once again that, although these are days of instability in the market, their inclusion brings more benefits than one might think at first, since a degradation of the results is noticed when those days are removed.
Table 26 - Results from 2002 to 2013 from the Final Model
Over the 12 years analyzed, the EUR/USD often changed behavior. Between 2002 and 2013 one can find relatively stable periods, sharp falls, such as the 2008 financial crisis and the subsequent instability, and periods of large oscillation. From the graph it can be seen that the developed method detects and quickly adapts to new market trends, as in the rapid detection of the 2008 financial crisis, where the current approach had its highest profit. These results suggest that the developed method is well prepared for fluctuations or different market trends that may arise in the future. Thus, the results show the Multi DHMM to be a profitable method, readily adaptable even in unpredictable market conditions.
Figure 28 - Final model resulting performance from 2002 to 2013
The first analysis of the DHMM demonstrated that this model is an excellent algorithm to forecast the direction of Forex values. In the tests conducted, the discrete version was more effective than the continuous version.
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002      1076        934        997            855
2003      1180       1121       1419           1360
2004      2481       2208       2132           1859
2005      2050       1896       2053           1899
2006       754        927        981           1154
2007       217        385        194            362
2008      5690       5058       5027           4395
2009      2923       2458       2717           2252
2010      4567       3761       4000           3194
2011      2249       1898       2736           2385
2012      2398       2269       2123           1994
2013       764        774        639            649
TOTAL    26349      23689      25018          22358
(values in pips)
(Figure 28: line chart of the EUR/USD close value against the cumulative pip summation of the final model, 2001 to 2013.)
The HMM itself has limitations in adapting to new patterns and trends. The speed of adjustment depends on the size of the window on which the DHMM is trained, i.e., the smaller the window, the more sensitive the algorithm will be to variations and, as a result, the more vulnerable it will be to noise.
To help overcome these difficulties, three indicators (RSI, MACD and ADX) were used, together with the combination of DHMM 15, DHMM 30 and DHMM 90. Results showed that the RSI and MACD indicators could provide relevant information on changes in market behavior. With the information provided by these indicators, the adaptation of the algorithm to a new trend becomes much faster. The use of the ADX to confirm the existence of a trend turned out to be of no use, since the obtained results failed to achieve the intended objective.
The following objective focused on limiting the annual losses to -500 pips. The objective was achieved by combining previously tested cases: HMM 15 30 90 with HMM MACD, HMM 15 30 90 with HMM RSI and, lastly, the pair HMM MACD with HMM RSI. The best cases were then selected for incorporation into the automated method, these being:
DHMM MACD
DHMM MACD RSI
DHMM RSI and DHMM MACD
DHMM 15 30 90 and DHMM RSI
DHMM 15 30 90 and DHMM MACD
Having analyzed the automated method, we observed that the aim of reducing annual losses was largely achieved: this new approach registered 12 years of steady profits. In addition, the inclusion of a STOP signal greatly expanded the capacity to contain unnecessary losses and to remain readily adaptable even in unpredictable market conditions. Table 27 describes the results of the underlying methods present in the Multi DHMM, together with the result of the final method.
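The confirmation mechanism of the final method (a direction is kept only when the majority of the sub-models agree, otherwise a STOP) can be sketched as a simple majority vote; the function is illustrative, not the thesis code:

```python
def multi_dhmm_forecast(predictions):
    """Final Multi DHMM direction from a list of sub-model votes.

    Each vote is +1 (rise), -1 (drop) or 0 (the sub-model's own STOP).
    A direction is emitted only with a strict majority; otherwise the
    forecast is discarded (STOP: no position opened, positions closed).
    """
    ups = predictions.count(1)
    downs = predictions.count(-1)
    if ups > len(predictions) / 2:
        return 1
    if downs > len(predictions) / 2:
        return -1
    return 0  # no majority: uncertainty, discard the forecast
```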
Algorithm                   Total (Pip)
HMM MACD                          10038
HMM MACD RSI                       9710
HMM RSI + HMM MACD                 7240
HMM 15 30 90 + HMM RSI             5446
HMM 15 30 90 + HMM MACD            8244
Multi DHMM                        26349
Table 27 - Summary of the results from sub-models and Final Model
The secondary analysis on the inclusion of the press conference days of the ECB and Fed concluded that, although they represent days of greater uncertainty, there is no advantage in removing those days from the algorithm's analysis, since a degradation of the results is noticed when they are removed.
4.4 Conclusions
Throughout this chapter we analyzed the different case studies related to the strategies chosen during the development of the model. The viability of the DHMM was tested and its performance compared with that of the CHMM. Subsequently, all the sub-models were tested until the final model was reached. The results show that the inclusion of sub-models in the developed model led to much higher gains compared to the individual gains of each sub-model: the final model has a gain of 26349 pips over 12 years (from 2002 to 2013), a value 2.6 times higher than the best result obtained from the separate analysis of the sub-models for the same period. This is due to the requirement that the estimate be confirmed by the majority of the sub-models. The adaptation of the model to new trends is also confirmed by its performance over the chosen years, since between 2002 and 2013 the EUR/USD shows diversified behaviors and, despite this diversity, the model is capable of adapting to each of those cases.
CHAPTER 5 CONCLUSIONS AND FUTURE WORK
The work developed throughout this dissertation presents a new methodology for the analysis and forecast of the direction of the Forex market daily closing price. For this purpose, the discrete version of the HMM (DHMM) was used, due to the transformation of the data into three discrete values representing the increase, decrease and maintenance of the price relative to the previous day.
One of the great innovations of this work is the use of three DHMMs simultaneously for the prediction of the direction of the market closing price. Each of these three DHMMs is trained on a different window size, allowing each one to acquire a different sensitivity to fluctuations in market behavior. The addition of technical indicators to the model to indicate changes in market trends made it possible to exploit the specific characteristics of each DHMM, making adaptation to different market behaviors much more rapid and efficient. With the use of the technical indicators and the three DHMMs, sub-models were developed that showed different characteristics and results. The best of them were later combined, creating a supermodel able to adapt and respond to the demands of the Forex market.
Tests were conducted using data from the Forex EUR/USD pair between January 2002 and December 2013. The sum of "price interest points" (pips) was the primary metric used for the analysis of the results because it gives a direct perception of profit.
5.1 Conclusions
After developing the model and analyzing the results, it is easy to conclude that the strategy used to forecast the direction of the daily closing value of the FX EUR/USD pair presents itself as an excellent choice. The ease with which this strategy adapts to new trends and behaviors of the market, and the quality of its predictions, not only allowed faster adaptation but also a substantial reduction of the losses that the DHMM alone had. These features were made possible by the merger of technical indicators (RSI and MACD), already widely used in technical analysis of financial markets, with the three-DHMM implementation strategy. The results show that the inclusion of sub-models in the developed model led to much higher gains compared to the individual gains of each sub-model: the final model has a gain of 26349 pips over 12 years (from 2002 to 2013), a value 2.6 times higher than the best result obtained from the separate analysis of the sub-models for the same period. This is due to the requirement that the estimate be confirmed by the majority of the sub-models. The adaptation of the model to new trends is also confirmed by its performance over the chosen years, since between 2002 and 2013 the EUR/USD shows diversified behaviors and, despite this diversity, the model is capable of adapting to each of those cases.
5.2 Future Work
A major problem identified during the development of the Multi-DHMM was the delay in adapting to new market trends; although mitigated, this problem did not cease to exist. Therefore, one of the key points to be developed in future work is the reduction of the time delay in the detection of new trends in financial time series. Beyond this point, other changes that would be interesting to develop in future work were also identified:
The changing trends and market behaviors, and the delay with which the model adapts to these new circumstances, create losses that, although minimized, are still recorded. One possible solution to this problem could involve the use of wavelets for a time-frequency analysis of the financial time series. It would be desirable to divide the time series into sets of frequencies and use a single model for each set. The different frequencies could be associated with different market behaviors, making it possible to train each model for different behaviors and market trends.
In this model, the most expected forecast is used as the final result. This type of prediction does not take into account the performance of each sub-model at every moment. It would therefore be interesting to study the use of an assessment metric to understand which sub-models have been showing better performance in the latest forecasts, and to predict taking this analysis into account.
Throughout the analysis of financial time series there may be times when the use of three DHMMs is insufficient or excessive, so it would be interesting to examine how to adapt the number of DHMMs to the needs of the analysis, moving to a dynamic number of models dependent on market behavior.
For the training of the DHMM there are alternative methods that could have been used, so it would be interesting to study the impact on the Multi-DHMM of using an alternative to the Baum-Welch algorithm for model training.
One of the methods used to obtain predictions from a Hidden Markov Model is the use of likelihood models. These models search past data for the most likely matching moment and predict the next value from the instant that follows the identified one. It would be interesting to study the impact of this type of prediction and compare its results with those of the Viterbi algorithm used in the model developed for this thesis.
References
[1] M. R. King and D. Rime, “The $4 trillion question: what explains FX growth since the 2007
survey?,” pp. 27-39, December 2010.
[2] K. G. Kulkarni and G. A. Kulkarni, “Fundamental Analysis vs. Technical Analysis: A Choice of Sectoral Analysis,” International Journal of Engineering & Management Sciences, vol. 4, no. 2, p. 234, Apr. 2013.
[3] M. McDonald, “FOREX SIMPLIFIED,” in Behind the Scenes of Currency Trading, Marketplace
Books, Aug. 2007, pp. 19-32.
[4] C. J. Neely and P. A. Weller, “Technical Analysis in the Foreign Exchange Market,” Federal
Reserve Bank of St. Louis Research Division, St. Louis, Jan. 2011.
[5] P. S. Froidevaux, “Fundamental Equity Valuation,” in Stock Selection based on Discounted Cash
Flow, Fribourg, University of Fribourg, Faculty of Economics and Social Sciences, 2004, pp. 3-4.
[6] K. Maciejczyk and X. Hu, “Forex Analysis and Money Management,” in Interactive Qualifying
Project, Worcester, Worcester Polytechnic Institute, 2012, pp. 21-35.
[7] S. B. Achelis, “Technical Analysis From A-To-Z,” Vision Books, 2000.
[8] M. D. Sheimo, “Cashing in on the Dow,” in Using Dow Theory to Trade and Determine Trends in Today's Markets, Apr. 1998, p. 87.
[9] W. J. Wilder, “New Concepts in Technical Trading Systems,” Greensboro, Trend Research,
1978.
[10] A. W. Lo, H. Mamaysky and J. Wang, “Foundations of Technical Analysis: Computational
Algorithms, Statistical Inference and Empirical Implementation,” The Journal of Finance, August
2000.
[11] J. Hayden, Trend Determination - a Quick, Accurate, & Effective Methodology.
[12] B. Zhou and J. Hu, “A Dynamic Pattern Recognition Approach Based on Neural Network for
Stock Time-Series,” Graduate School of Information, Production and System, Waseda
University, Fukuoka, 2009.
[13] A. Gupta and B. Dhingra, “Stock Market Prediction Using Hidden Markov Models”.
[14] M. R. Hassan, B. Nath and M. Kirley, “A fusion model of HMM, ANN and GA for stock market
forecasting,” Elsevier Ltd, Melbourne, 2006.
[15] A. Canelas, R. Neves and N. Horta, “A New SAX-GA Methodology Applied to Investment
Strategies Optimization,” in GECCO'12, 2012.
[16] E. Stephan and G. Kiell, “Decision processes in professional investors: Does expertise moderate
judgemental biases,” IAREP/SABE Proceeding, pp. 416-420.
[17] S. N. Neftci, “Naive Trading Rules in Financial Markets and Wiener-Kolmogorov Prediction Theory: A Study of "Technical Analysis",” Journal of Business, vol. 64, 1991.
[18] J. L. Treynor and R. Ferguson, “In Defense of Technical Analysis,” in Annual Meeting of the American Finance Association, Dallas, 1985.
[19] A. Gunasekarage and D. M. Power, “The profitability of moving average trading rules in South
Asian stock markets.,” in Emerging Markets Review 2, 2001, pp. 17-33.
[20] K.-Y. Kwon and J. R. Kish, “Technical trading strategies and return predictability: NYSE,” in
Applied Financial Economics, 2002.
[21] T. T.-L. Chong and W. K. Ng, “Technical analysis and the London stock exchange: Testing the
MACD and RSI rules using the FT30,” in Appl. Econ. Lett., 2008, pp. 1111-1114.
[22] T. T.-L. Chong, W.-K. Ng and V. K.-S. Liew, “Revisiting the Performance of MACD and RSI
Oscillators,” J. Risk Financial Manag. , vol. 7, pp. 1-12, 2014.
[23] G. A. Fink, “Markov Models for Pattern Recognition,” in From Theory to Applications, Dortmund,
Springer, 1998, pp. 61-92.
[24] A. Papoulis, “Brownian Movement and Markov Processes,” in Probability Random Variables and
Stochastic Processes, 2nd ed., New York, McGraw-Hill, 1984, pp. 515-553.
[25] D. Ramage, “Hidden Markov Models Fundamentals,” 2007.
[26] M. R. Hassan and B. Nath, “Stock Market Forecasting Using Hidden Markov Model: A New
Approach,” Computer Society, Melbourne, 2006.
[27] Y. Zhang, “Prediction of Financial Time Series With Hidden Markov Models,” Shandong, 2001.
[28] M. Collins, “The Forward-Backward Algorithm,” Department of Computer Science, Columbia
University, Columbia.
[29] L. J. Rodríguez and I. Torres, “Comparative Study of the Baum-Welch and Viterbi Training
Algorithms Applied to Read and Spontaneous Speech Recognition,” in Pattern Recognition and
Image Analysis, Springer, 2003, pp. 847-857.
[30] J. A. Bilmes, “A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation
for Gaussian Mixture and Hidden Markov Models,” International Computer Science Institute,
Berkeley, 1998.
[31] J. N. Liu and R. W. Kwong, “Automatic extraction and identification of chart patterns towards
financial forecast,” ScienceDirect, Hong Kong, 2006.
[32] X. Ge and P. Smyth, “Deformable Markov Model Templates for Time-Series Pattern Matching,”
Department of Information and Computer Science, University of California, Irvine, 2000.
[33] A. X. Huang and J. M, “Hidden Markov Models for speech recognition,” Edinburgh University
Press., 1990.
[34] R. L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech
Recognition,” Proceedings of the IEEE, vol. 77(2), pp. 257-286, 1989.
[35] G. Appel, Technical Analysis: Power Tools for Active Investors, FT Press, 2005.
[36] J. J. Murphy, “How to Spot Market Trends,” in The Visual Investor, John Wiley and Sons, 2009,
p. 100.
[37] A. Krogh, “An Introduction to Hidden Markov Models for Biological Sequences,” in Computational
Methods in Molecular Biology, Lyngby, Elsevier, 1998, pp. 45-63.
[38] N. Mimouni, G. Lunter and C. Deane, “Hidden Markov Models for Protein Sequence Alignment,”
University of Oxford, Oxford.
[39] M. Stanke and S. Waack, “Gene prediction with a hidden Markov model and a new intron
submodel,” Oxford Journals, vol. 19, no. Bioinformatics, pp. 215-225, 2003.
[40] C. Karlof and D. Wagner, “Hidden Markov Model Cryptanalysis,” Department of Computer Science, University of California, Berkeley.
[41] S. M. Thede and M. P. Harper, “A Second-Order Hidden Markov Model for Part-of-Speech
Tagging,” pp. 175-182.
[42] M. Gales and S. Young, “The Application of Hidden Markov Models in Speech Recognition,”
Foundations and Trends in Signal Processing, vol. 1, pp. 195-304, 2007.
[43] M. Jensen and G. Bennington, “Random Walks and Technical Theories: Some Additional
Evidences,” Journal of Finance 25 , vol. 2, pp. 469-482, 1970.
APPENDIX 1 - GRAPHS USED TO ASSESS THE HMM VIABILITY
(Eight value-versus-time graphs, 1.txt through 8.txt, of the test series used to assess the HMM viability.)
APPENDIX 2 – HIDDEN MARKOV MODEL TUTORIAL
Markov Models
Consider the discrete stochastic sequence of random variables X1, X2, …, Xt, which take on values xt from a continuous or discrete domain according to individual probability distributions. The process is said to be stationary if the probability distribution is the same for all random variables Xt, and causal if the distribution of the random variable Xt depends only on the past states x1, x2, …, xt−1. For a discrete, stationary and causal stochastic process the probability distribution can be written as:
𝑃(𝑋𝑡 = 𝑥𝑡|𝑋1 = 𝑥1, 𝑋2 = 𝑥2, … , 𝑋𝑡−1 = 𝑥𝑡−1)
Due to the causality of the stochastic process, as the sequence of random variables evolves over time the number of random variables grows considerably and, therefore, an arbitrarily long set of dependencies for the probability distribution can be generated.
This growth of dependencies is restricted by the Markov property. This property states that the conditional probability distribution depends only on the last state for the prediction of the present state; it also means that the future does not depend on the more distant past, i.e., the process is memoryless.
A first order Markov process is a stationary and causal stochastic process that satisfies the Markov
property and can be written as:
𝑃(𝑋𝑡 = 𝑥𝑡|𝑋1 = 𝑥1, 𝑋2 = 𝑥2, … , 𝑋𝑡−1 = 𝑥𝑡−1) = 𝑃(𝑋𝑡 = 𝑥𝑡|𝑋𝑡−1 = 𝑥𝑡−1)
Markov Chains
A Markov chain is a collection of random variables {Xt} having the property that, given the present, the future is conditionally independent of the past; in other words, (24) defines a Markov chain, where the possible values of Xi form a countable set S called the state space of the chain.
For example, considering the Forex market, where we are interested in the movement of the GBP/EUR pair, and representing the nodes 0, 1 and 2 as the possible states {drop, no change, rise}, one can construct the Markov chain in Figure 12.
Example of a Markov Chain
According to Figure 12, a drop in price is followed by another drop 38% of the time, by a rise 60% of the time and by no change 2% of the time. The transition matrix for this example is:

𝐴 = [0.38  0.02  0.60
     0.42  0.05  0.58
     0.60  0.06  0.34]
To predict which price movement is more likely to happen at time t+3, given that at time t the system is in state 0 (drop), the distribution over states can be rewritten as a stochastic row vector 𝑥 with the relation 𝑥(𝑡+1) = 𝑥(𝑡)𝐴:

𝑥(𝑡+3) = 𝑥(𝑡+2)𝐴 = (𝑥(𝑡+1)𝐴)𝐴 = 𝑥(𝑡+1)𝐴² = (𝑥(𝑡)𝐴²)𝐴 = 𝑥(𝑡)𝐴³

Using (27) with x(t) equal to the initial vector 𝜋, one can predict that it is more likely for the price to rise (be in state 2) at time t+3:

𝑥(𝑡+3) = [1 0 0] 𝐴³ = [0.480  0.039  0.484]
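The propagation x(t+3) = x(t)A³ can be checked numerically with a few lines of plain Python (no external libraries):

```python
def row_times_matrix(row, A):
    """Multiply a stochastic row vector by a square matrix: row @ A."""
    n = len(A)
    return [sum(row[i] * A[i][j] for i in range(n)) for j in range(n)]

# Transition matrix from the example (states: 0 drop, 1 no change, 2 rise).
A = [[0.38, 0.02, 0.60],
     [0.42, 0.05, 0.58],
     [0.60, 0.06, 0.34]]

x = [1.0, 0.0, 0.0]           # start in state 0 (drop) at time t
for _ in range(3):            # apply A three times: x(t+3) = x(t) A^3
    x = row_times_matrix(x, A)

print([round(v, 3) for v in x])   # -> [0.48, 0.039, 0.484]
```

The largest entry is the one for state 2, confirming that a rise is the most likely movement three steps ahead.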
From the same model we can compute the probability of a given sequence, e.g., the probability of
seeing "drop, drop, rise, no change". For this example the state sequence is defined as X = {0, 0, 2, 1}
and, given the initial distribution 𝜋 and the transition matrix A, the probability of the sequence is [3], [6]:
$$P(X|A,\pi) = P(0,0,2,1|A,\pi) = \pi_{X_1}\prod_{t=1}^{T-1} a_{X_t,X_{t+1}} = P(0)\,P(0|0)\,P(2|0)\,P(1|2) = \pi_0 \times a_{00} \times a_{02} \times a_{21}$$
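The sequence probability above can be sketched in the same way (NumPy assumed; the start distribution $\pi = [1, 0, 0]$ follows the earlier example of starting in a drop):

```python
import numpy as np

A = np.array([[0.38, 0.02, 0.60],
              [0.42, 0.05, 0.58],
              [0.60, 0.06, 0.34]])
pi = np.array([1.0, 0.0, 0.0])  # start in state 0 (drop), as in the example

X = [0, 0, 2, 1]  # drop, drop, rise, no change

# P(X | A, pi) = pi[X_1] * prod_t a[X_t, X_t+1]
p = pi[X[0]]
for t in range(len(X) - 1):
    p *= A[X[t], X[t + 1]]
print(p)  # pi_0 * a00 * a02 * a21
```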
In Markov chains the state is directly visible to the observer. In the financial market, however, despite
the previous example, there are important factors hidden from the observable data that play an
important role in predicting market behaviour (e.g., volatility, overseas markets, monetary
policies).
Introduction to the Hidden Markov Model
The Hidden Markov Model (HMM) is a model capable of overcoming the limitations of the Markov
chain. In this new model the state is not directly visible; only the observation generated by the
probabilistic function associated with the state is visible. The process can be represented by the Bayesian
network shown in Figure 13 [5].
Figure 13: Bayesian network of a Hidden Markov Model
In an HMM the state transitions are described within a finite and discrete state space, while the
observations can be either discrete or continuous. As a Markov model, the Hidden Markov Model satisfies
the Markov property: each state at a given time t depends only on its immediate predecessor at time
t-1 and, since at every time t a new observation (also known as emission) is generated that depends
only on the state at time t, both can be characterized as (29) and (30).
$$P(S_t \mid S_1, S_2, \ldots, S_{t-1}) = P(S_t \mid S_{t-1}) \quad (29)$$
$$P(O_t \mid O_1, O_2, \ldots, O_{t-1}, S_1, S_2, \ldots, S_t) = P(O_t \mid S_t) \quad (30)$$
A first-order Hidden Markov Model (usually denoted λ) can be completely characterized by the
following [1], [4]:
- a finite set of states;
- a finite (discrete) or infinite (continuous) set of observations;
- a state transition probability matrix A:
$$A = \{a_{ij} \mid a_{ij} = P(S_t = j \mid S_{t-1} = i)\}$$
- a vector π of start probabilities:
$$\pi = \{\pi_i \mid \pi_i = P(S_1 = i)\}$$
- an observation emission probability distribution that characterizes each state:
$$\{b_j(o_k) \mid b_j(o_k) = P(O_t = o_k \mid S_t = j)\}$$
If the observation set is finite, each observation has a symbolic nature and the quantity $b_j(o_k)$
represents a discrete probability distribution, which can be described as an emission probability matrix
(34). If the observation set is an infinite set of vector-valued quantities $x \in \mathbb{R}^n$, the observations are
described by a continuous probability density function (35).
$$B = \{b_j(o_k) \mid b_j(o_k) = P(O_t = o_k \mid S_t = j)\} \quad (34)$$
$$b_j(x) = p(x \mid S_t = j) \quad (35)$$
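The discrete characterization λ = (A, B, π) can be written down directly as a minimal sketch; all numerical values below are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

# Hypothetical discrete HMM lambda = (A, B, pi): 2 hidden states,
# 3 observation symbols. All numbers are illustrative only.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])          # a_ij = P(S_t = j | S_t-1 = i)
B = np.array([[0.5, 0.1, 0.4],
              [0.2, 0.2, 0.6]])     # b_j(o_k) = P(O_t = o_k | S_t = j)
pi = np.array([0.6, 0.4])           # pi_i = P(S_1 = i)

# Sanity checks: every row must be a probability distribution
assert np.allclose(A.sum(axis=1), 1)
assert np.allclose(B.sum(axis=1), 1)
assert np.isclose(pi.sum(), 1)
```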
Three fundamental questions in the HMM
Taking the characterization of the HMM, to understand how the model can be used to analyse and predict a
certain temporal pattern it is important to answer three fundamental questions [1], [4], [8], [9]:
Evaluation
Given the model 𝜆 = (𝐴, 𝐵, 𝜋), how well does the model describe the statistical properties of
certain data, i.e., what is 𝑃(𝑂|𝜆)?
This question, also known as filtering, can also be interpreted as the need to compute the probability of
a state at a certain time, given the history of evidence.
To answer it one can use a purely probabilistic analysis (36) to assess the production
probability 𝑃(𝑂|𝜆); however, this strategy is inefficient, requiring a number of operations
exponential in the sequence length, $N^T$ [9].
$$P(O|\lambda) = \sum_{s} P(O|s,\lambda)P(s|\lambda) \quad (36)$$
where
$$P(O|s,\lambda)P(s|\lambda) = \prod_{t=1}^{T} a_{s_{t-1},s_t}\, b_{s_t}(O_t)$$
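The exponential cost of this brute-force sum becomes visible when written out: the loop below enumerates all $N^T$ state sequences for a toy model (NumPy assumed; model values are illustrative, not from the thesis):

```python
import numpy as np
from itertools import product

# Toy discrete HMM (illustrative values)
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
pi = np.array([0.6, 0.4])
O = [0, 1, 1, 0]  # observation symbol indices

N, T = len(pi), len(O)

# Brute force: sum P(O, s | lambda) over all N^T state sequences
p = 0.0
for s in product(range(N), repeat=T):
    q = pi[s[0]] * B[s[0], O[0]]
    for t in range(1, T):
        q *= A[s[t - 1], s[t]] * B[s[t], O[t]]
    p += q
print(p)  # P(O | lambda)
```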
Forward Algorithm
Given a certain model λ in state j at time t, it is irrelevant to know which path and generated
outputs led to that state: due to the Markov property, it is sufficient to consider all possible
states at t-1.
The forward algorithm presents itself as a solution to the previous problem; by exploiting this
irrelevance of past paths, it drastically decreases the complexity, making it linear in T.
To compute the production probability, the forward algorithm uses the variable 𝛼𝑡(𝑖), known as the forward
variable, defined as the probability of ending in state 𝑠𝑖 given the observation sequence 𝑂1, 𝑂2, … , 𝑂𝑡:
$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t, s_t = i \mid \lambda)$$
The algorithm is described as follows:
Initialization
$$\alpha_1(i) = \pi_i b_i(O_1)$$
Recursion
$$\alpha_{t+1}(j) = b_j(O_{t+1}) \sum_{i=1}^{N} \alpha_t(i)\,a_{ij}, \quad \text{for } t = 1, \ldots, T-1$$
Termination
$$P(O|\lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
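The three steps above can be sketched as a short Python function (NumPy assumed; the two-state model values are illustrative, not from the thesis):

```python
import numpy as np

def forward(A, B, pi, O):
    """Forward algorithm: returns P(O | lambda) in O(T * N^2) time."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                         # initialization
    for t in range(T - 1):
        alpha[t + 1] = B[:, O[t + 1]] * (alpha[t] @ A) # recursion
    return alpha[T - 1].sum()                          # termination

# Illustrative toy model
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
pi = np.array([0.6, 0.4])
print(forward(A, B, pi, [0, 1, 1]))
```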
Complementary to the forward algorithm, a similar process can be used to take future observations into
account. This process is called smoothing and uses the backward algorithm.
Backward Algorithm
Like the forward algorithm, the backward algorithm has its own quantity, referred to as the backward
variable. It represents the probability of the ending sequence $O_{t+1}, O_{t+2}, \ldots, O_T$ given the model 𝜆 and the
state $s_j$ at time t:
$$\beta_t(j) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid s_t = j, \lambda)$$
The algorithm is described as follows:
Initialization
$$\beta_T(i) = 1$$
Recursion
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\,b_j(O_{t+1})\,\beta_{t+1}(j), \quad \text{for } t = T-1, \ldots, 1$$
Termination
$$P(O|\lambda) = \sum_{i=1}^{N} \pi_i b_i(O_1)\beta_1(i)$$
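The backward recursion can be sketched analogously (NumPy assumed; the toy model values are illustrative); its termination value agrees with the forward computation of P(O|λ):

```python
import numpy as np

def backward(A, B, pi, O):
    """Backward algorithm: returns P(O | lambda) via the beta recursion."""
    N, T = len(pi), len(O)
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                                   # initialization
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])    # recursion
    return (pi * B[:, O[0]] * beta[0]).sum()            # termination

# Illustrative toy model
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
pi = np.array([0.6, 0.4])
print(backward(A, B, pi, [0, 1, 1]))
```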
Decoding
What is the most probable state sequence that, for a given model 𝜆 = (𝐴, 𝐵, 𝜋), generates the
observation sequence?
This question is solved by the Viterbi algorithm, which finds the most likely state sequence by
dynamic programming in a process assumed to be a finite-state, discrete-time Markov
process.
Viterbi Algorithm
Like the forward and backward algorithms, the Viterbi algorithm also has a variable, denoted
$\delta_t(i)$. It represents the maximum likelihood over all partial state sequences that generate the
observation segment $O_1, O_2, \ldots, O_t$ and end in state $s_i$:
$$\delta_t(i) = \max_{s_1, s_2, \ldots, s_{t-1}} P(O_1, O_2, \ldots, O_t, s_1, s_2, \ldots, s_{t-1}, s_t = i \mid \lambda)$$
The variable $\delta_t(i)$ can be compared with the forward variable $\alpha_t(i)$, except that the Viterbi algorithm
uses a maximization instead of a summation over previous states.
Optimal path using Viterbi Algorithm
The Viterbi algorithm is described as follows:
Initialization:
For all states $i \in [1, N]$ at $t = 1$ we have:
$$\delta_1(i) = \pi_i b_i(O_1)$$
$$\psi_1(i) = 0$$
Recursion:
For all times $t$, $1 \le t \le T-1$, and all states $i, j \in [1, N]$ we have:
$$\delta_{t+1}(j) = \max_i\{\delta_t(i)\,a_{ij}\}\,b_j(O_{t+1})$$
$$\psi_{t+1}(j) = \arg\max_i\{\delta_t(i)\,a_{ij}\}$$
Termination:
For all states $j \in [1, N]$ at $t = T$ we have:
$$P^*(O|\lambda) = P(O, s^* \mid \lambda) = \max_i \delta_T(i)$$
$$s_T^* = \arg\max_j \delta_T(j)$$
Optimal Path
Back-tracking for all times $t$, $T-1 \ge t \ge 1$, we have:
$$s_t^* = \psi_{t+1}(s_{t+1}^*) \quad (53)$$
In the previous description of the Viterbi algorithm one can observe a new variable $\psi_t(j)$, known as the
backward pointer, which for each $\delta_t(i)$ stores the optimal predecessor state. The optimal path can be
recursively reconstructed in inverse chronological order using (53).
The complexity of the Viterbi algorithm is $O(T \times |S|^2)$.
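The initialization, recursion, termination and back-tracking steps can be sketched together in Python (NumPy assumed; the toy model values are illustrative, not from the thesis):

```python
import numpy as np

def viterbi(A, B, pi, O):
    """Viterbi decoding: most likely state path and its probability,
    complexity O(T * N^2)."""
    N, T = len(pi), len(O)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                         # initialization
    for t in range(T - 1):
        scores = delta[t][:, None] * A                 # delta_t(i) * a_ij
        psi[t + 1] = scores.argmax(axis=0)             # backward pointers
        delta[t + 1] = scores.max(axis=0) * B[:, O[t + 1]]
    # Termination and back-tracking
    path = [int(delta[T - 1].argmax())]
    for t in range(T - 2, -1, -1):
        path.append(int(psi[t + 1][path[-1]]))
    return delta[T - 1].max(), path[::-1]

# Illustrative toy model
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
pi = np.array([0.6, 0.4])
p_star, path = viterbi(A, B, pi, [0, 1, 1])
print(p_star, path)
```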
Parameter Estimation
How can we estimate the model parameters given an observation set?
To answer this last question it is important to use an algorithm capable of finding the unknown
parameters 𝜆 = {𝜋, 𝐴, 𝐵} of an HMM. Although several algorithms can address the question, due to the
type of data on which it will be used, the algorithm must not require a carefully chosen model
initialization. The chosen algorithm is Baum-Welch, which uses the EM algorithm to find the
maximum likelihood estimate of 𝜆 = {𝜋, 𝐴, 𝐵} given the observation sequence $O_1, O_2, \ldots, O_T$, using the
production probability 𝑃(𝑂|𝜆) as the optimization criterion.
The Viterbi training algorithm is another possible answer to the parameter estimation question;
however, Viterbi training requires a reasonable initialization, makes limited use of the training data
and is less robust, since it only uses observations inside the segments corresponding to a given HMM
state to re-estimate the parameters of that state [1], [6], [11], [12].
Baum-Welch Algorithm
The Baum-Welch algorithm starts with the Forward-Backward algorithm. Since this algorithm
corresponds to an aggregation of the forward and backward algorithms explained before, no detailed
explanation will be given.
Before starting the iterative process, the transition matrix A, the emission matrix B and the initial vector 𝜋
of 𝜆 are set with random initial conditions.
The Baum-Welch algorithm is described as follows:
Forward Procedure:
Having $\alpha_t(i) = P(O_1, O_2, \ldots, O_t, s_t = i \mid \lambda)$, the probability of ending in state 𝑠𝑖 given the observation
sequence $O_1, O_2, \ldots, O_t$ is recursively computed:
1. $\alpha_1(i) = \pi_i b_i(O_1)$
2. $\alpha_{t+1}(j) = b_j(O_{t+1}) \sum_{i=1}^{N} \alpha_t(i)\,a_{ij}$
Backward Procedure:
Having $\beta_t(i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid s_t = i, \lambda)$, the probability of the ending sequence $O_{t+1}, O_{t+2}, \ldots, O_T$ given
the model 𝜆 and the state 𝑠𝑖 at time t is recursively computed:
1. $\beta_T(i) = 1$
2. $\beta_t(i) = \sum_{j=1}^{N} \beta_{t+1}(j)\,a_{ij}\,b_j(O_{t+1})$
Optimization:
It is now possible to compute the temporary variables:
$$\gamma_t(i) = P(s_t = i \mid O, \lambda) = \frac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\beta_t(j)}$$
The quantity $\gamma_t(i)$ represents the probability of being in state 𝑠𝑖 at time t given the observation
sequence O and the parameters of 𝜆.
$$\xi_t(i,j) = P(s_t = i, s_{t+1} = j \mid O, \lambda) = \frac{\alpha_t(i)\,a_{ij}\,b_j(O_{t+1})\,\beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_t(i)\,a_{ij}\,b_j(O_{t+1})\,\beta_{t+1}(j)}$$
The quantity $\xi_t(i,j)$ represents the probability of being in states i and j at times t and t+1, respectively,
given the observation sequence O and the parameters of 𝜆.
With these two quantities it is now possible to update the model, determining the
expected quantities $\hat{\lambda} = \{\hat{\pi}, \hat{A}, \hat{B}\}$.
Update of the model $\hat{\lambda}$:
$$\hat{\pi}_i = \gamma_1(i)$$
$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$
$$\hat{b}_j(k) = \frac{\sum_{t=1,\,O_t = o_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$
Termination:
If the quality measure $P(O|\hat{\lambda})$ was not improved by the updated model $\hat{\lambda}$ compared to 𝜆, the process
stops; otherwise all steps are repeated.
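One full Baum-Welch re-estimation step for a discrete HMM can be sketched compactly as follows (NumPy assumed; the toy model and observation sequence are illustrative, not from the thesis). Running two steps also illustrates the termination criterion: the EM update cannot decrease the likelihood.

```python
import numpy as np

def baum_welch_step(A, B, pi, O):
    """One Baum-Welch (EM) re-estimation step for a discrete HMM."""
    N, T = len(pi), len(O)
    # Forward and backward passes
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(T - 1):
        alpha[t + 1] = B[:, O[t + 1]] * (alpha[t] @ A)
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    likelihood = alpha[T - 1].sum()
    # Temporary quantities gamma_t(i) and xi_t(i, j)
    gamma = alpha * beta / likelihood
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * A *
                 (B[:, O[t + 1]] * beta[t + 1])[None, :]) / likelihood
    # Update pi, A and B
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        mask = np.array(O) == k
        B_new[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return A_new, B_new, pi_new, likelihood

# Illustrative toy model and observations
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
pi = np.array([0.6, 0.4])
O = [0, 1, 1, 0, 1]
A1, B1, pi1, ll0 = baum_welch_step(A, B, pi, O)
_, _, _, ll1 = baum_welch_step(A1, B1, pi1, O)
assert ll1 >= ll0 - 1e-12  # EM guarantees a non-decreasing likelihood
```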
Flowchart of the Baum-Welch algorithm for a discrete HMM
The previous description of the Baum-Welch algorithm only takes into account a discrete hidden
Markov model. In a continuous hidden Markov model the function $b_j(o_t)$ is a continuous
probability density function (pdf) or a mixture of continuous pdfs; therefore, the procedure is slightly
different.
The Continuous Hidden Markov Model
To represent a continuous sequence or vector-valued quantities, the emission probability function $b_j(o_t)$
can no longer be described as a simple matrix of probabilities, but rather as a continuous probability
function or a defined number of continuous pdfs, such as the Gaussian mixture in (56), in which
M is the number of mixtures, $w_{jk}$ is the weight of mixture k in state j and $N(o_t \mid \mu_{jk}, C_{jk})$ is the
Gaussian density, also known as the Normal distribution.
$$b_j(o_t) = \sum_{k=1}^{M} w_{jk}\,N(o_t \mid \mu_{jk}, C_{jk}) \quad (56)$$
Gaussian Densities
Depending on whether the observations are scalars or vectors, the type of Gaussian
density differs. In the scalar case a univariate Gaussian density is used:
$$g_{jk}(o_t) = N(o_t \mid \mu_{jk}, \sigma_{jk}^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(o_t - \mu)^2}{2\sigma^2}\right)$$
However, if $o_t$ is a vector, a multivariate Gaussian density is used in the form:
$$g_{jk}(o_t) = N(o_t \mid \mu_{jk}, C_{jk}) = \frac{1}{(2\pi)^{D/2}\,|C_{jk}|^{1/2}}\exp\left(-\frac{1}{2}(o_t - \mu_{jk})^T C_{jk}^{-1} (o_t - \mu_{jk})\right)$$
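The multivariate density can be sketched directly from the formula (NumPy assumed); for D = 1 it reduces to the univariate expression, which serves as a consistency check:

```python
import numpy as np

def multivariate_gaussian(o, mu, C):
    """Multivariate Gaussian density N(o | mu, C), as in the emission model."""
    D = len(mu)
    diff = o - mu
    norm = 1.0 / ((2 * np.pi) ** (D / 2) * np.linalg.det(C) ** 0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(C) @ diff)

# For D = 1 this reduces to 1/(sqrt(2 pi) sigma) exp(-(o - mu)^2 / (2 sigma^2))
sigma = 2.0
p_multi = multivariate_gaussian(np.array([1.0]), np.array([0.0]),
                                np.array([[sigma ** 2]]))
p_uni = 1 / (np.sqrt(2 * np.pi) * sigma) * np.exp(-1.0 ** 2 / (2 * sigma ** 2))
print(p_multi, p_uni)
```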
Baum-Welch Algorithm
Given the new emission probability function, the Baum-Welch algorithm needs to consider the mean and
covariance values of the Gaussian distribution. Therefore, a new quantity $\gamma_t(i,j)$ is defined, which
represents the probability that, in state i, the j-th mixture component generates the
continuous observation $o_t$ at time t.
Similar to the discrete parameter estimation, before starting the iterative process the model
parameters are set with random initial conditions. However, the continuous model has a different
emission function, characterized by Gaussian densities; therefore, three new quantities are initialized in
addition to the transition matrix A and the initial vector 𝜋: the covariances 𝐶, the means 𝜇
and the Gaussian mixture weights 𝑤. 𝐶 is a matrix of covariance matrices, 𝜇 is a
matrix of mean vectors of the vector-valued quantities, and 𝑤 is a
states-by-mixtures matrix that stores the weight of each Gaussian mixture component in each state. Taking this
into account, the parameter estimation for continuous observations using the Baum-Welch algorithm is
described as follows:
Forward Procedure:
Having $\alpha_t(i) = p(o_1, o_2, \ldots, o_t, s_t = i \mid \lambda)$, the probability of ending in state 𝑠𝑖 given the observation
sequence of vectors $o_1, o_2, \ldots, o_t$ is recursively computed:
1. $\alpha_1(i) = \pi_i b_i(o_1)$
2. $\alpha_{t+1}(j) = b_j(o_{t+1}) \sum_{i=1}^{N} \alpha_t(i)\,a_{ij}$
Backward Procedure:
Having $\beta_t(i) = p(o_{t+1}, o_{t+2}, \ldots, o_T \mid s_t = i, \lambda)$, the probability of the ending sequence of vectors $o_{t+1}, o_{t+2}, \ldots, o_T$
given the model 𝜆 and the state 𝑠𝑖 at time t is recursively computed:
1. $\beta_T(i) = 1$
2. $\beta_t(i) = \sum_{j=1}^{N} \beta_{t+1}(j)\,a_{ij}\,b_j(o_{t+1})$
Optimization:
$$\gamma_t(i) = P(s_t = i \mid O, \lambda) = \frac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\beta_t(j)}$$
$$\xi_t(i,j) = P(s_t = i, s_{t+1} = j \mid O, \lambda) = \frac{\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}{\sum_{k=1}^{N}\sum_{l=1}^{N} \alpha_t(k)\,a_{kl}\,b_l(o_{t+1})\,\beta_{t+1}(l)}$$
$$\gamma_t(i,j) = \gamma_t(i)\,\frac{w_{ij}\,g_{ij}(o_t)}{b_i(o_t)}$$
With these three quantities it is now possible to update the model, determining the
expected quantities $\hat{\pi}, \hat{A}, \hat{w}, \hat{\mu}, \hat{C}$:
Update:
$$\hat{\pi}_i = \gamma_1(i)$$
$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$
$$\hat{w}_{ij} = \frac{\sum_{t=1}^{T} \gamma_t(i,j)}{\sum_{t=1}^{T} \gamma_t(i)}$$
$$\hat{\mu}_{ij} = \frac{\sum_{t=1}^{T} \gamma_t(i,j)\,o_t}{\sum_{t=1}^{T} \gamma_t(i,j)}$$
$$\hat{C}_{ij} = \frac{\sum_{t=1}^{T} \gamma_t(i,j)\,(o_t - \hat{\mu}_{ij})(o_t - \hat{\mu}_{ij})^T}{\sum_{t=1}^{T} \gamma_t(i,j)}$$
Termination:
If the quality measure $P(o|\hat{\lambda})$ was not improved by the updated model $\hat{\lambda}$ compared to 𝜆, the process
stops; otherwise all steps are repeated.
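The mixture-parameter updates for w, μ and C can be sketched in NumPy. The responsibilities γ_t(i,j) below are random placeholders standing in for the E-step output; observations and dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, T, D = 2, 3, 50, 2          # states, mixtures, time steps, dimensions
o = rng.normal(size=(T, D))       # toy continuous observations

# gamma_ij[t, i, j]: responsibility of mixture j of state i for o_t
# (would come from the E-step); random positive values for illustration
gamma_ij = rng.random(size=(T, N, M))
gamma_i = gamma_ij.sum(axis=2)    # gamma_t(i) = sum_j gamma_t(i, j)

# Re-estimation of the mixture weights, means and covariances
w = gamma_ij.sum(axis=0) / gamma_i.sum(axis=0)[:, None]
mu = np.einsum('tij,td->ijd', gamma_ij, o) / gamma_ij.sum(axis=0)[..., None]
C = np.zeros((N, M, D, D))
for i in range(N):
    for j in range(M):
        diff = o - mu[i, j]                            # (T, D)
        C[i, j] = (gamma_ij[:, i, j, None, None] *
                   diff[:, :, None] * diff[:, None, :]).sum(axis=0)
        C[i, j] /= gamma_ij[:, i, j].sum()

assert np.allclose(w.sum(axis=1), 1)  # weights of each state sum to one
```

Note that the weight normalization falls out of the update formula itself: since $\gamma_t(i) = \sum_j \gamma_t(i,j)$, the weights of each state necessarily sum to one.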
Flowchart of the Baum-Welch algorithm for a continuous HMM