Forex Market Prediction Using Multi Discrete Hidden
Markov Models
José Pedro de Oliveira Alves
Thesis to obtain the Master of Science Degree in
Electrical and Computer Engineering
Supervisor: Prof. Doutor Rui Fuentecilla Maia Ferreira Neves
Co-supervisor: Prof. Doutor Nuno Cavaco Gomes Horta
Examination Committee
Chairperson: Prof. Doutor Horácio Cláudio Campos Neto
Supervisor: Prof. Doutor Rui Fuentecilla Maia Ferreira Neves
Member of the Committee: Prof. Doutor Joaquim Amaro Graça Pires Faia e Pina
Catalão Lopes
May 2015
Resumo
The work developed throughout this Master's dissertation presents a new methodology for the analysis and daily forecasting of the direction of the Forex market closing price. For this purpose, the Hidden Markov Model (HMM) was used, a model widely applied in different areas of time series analysis (e.g., gene prediction, cryptanalysis, speech recognition). This work uses the discrete version of the HMM (DHMM), since the data are transformed into three discrete values representing an increase, a decrease or the maintenance of the price relative to the previous day. Using the discrete version proved to be a challenge, since most applications of the HMM to financial markets use its continuous version. The model parameters are trained with the Baum-Welch algorithm and the prediction is made with an adaptation of the Viterbi algorithm.
One of the main novelties of this work is the simultaneous use of three DHMMs to predict the direction of the market closing price. Each of the three DHMMs is trained with a different window size, allowing each to acquire a different sensitivity to fluctuations in market behaviour. Adding technical indicators to the model to signal changes in the market trend made it possible to exploit the specific characteristics of each DHMM, making the adaptation to different market behaviours much more efficient. Using the technical indicators and the three DHMMs, sub-models with different characteristics and results were developed. The best ones were then aggregated, creating a supermodel able to adapt and respond to the demands of the Forex market.
The tests used Forex data for the EUR/USD pair from January 2002 to December 2013. The sum of "price interest points" (pips) was the main metric used in the analysis of the results, because it gives a better perception of the profit to be obtained by using the presented model over the analysed dates. The results show an excellent performance, with a hit rate of 57% and a positive total of 26349 pips at the end of the 11 years analysed. This result shows the reliability and quality of the developed model, even though the test interval includes a period of great crisis and financial instability, as seen in 2008.
Keywords: Pattern detection, MACD, financial markets, Hidden Markov Model, multi-HMM, forecasting, RSI, time series.
Abstract
The work developed throughout this dissertation presents a new methodology for analysing and forecasting the direction of the Forex market daily closing price. For this purpose, the Hidden Markov Model (HMM) was used, a model widely applied in different areas of time series analysis (e.g., gene prediction, cryptanalysis, speech recognition). This work uses the discrete version of the HMM (DHMM), since the data are transformed into three discrete values representing an increase, a decrease or the maintenance of the price relative to the previous day. Using the discrete version proved to be a challenge, since most applications of the HMM to financial markets use its continuous version. The model parameters are trained with the Baum-Welch algorithm and the prediction is made with an adaptation of the Viterbi algorithm.
One of the main innovations of this work is the simultaneous use of three DHMMs to predict the direction of the market closing price. Each of the three DHMMs is trained with a different window size, allowing each to acquire a different sensitivity to fluctuations in market behaviour. Adding technical indicators to the model to signal changes in market trends made it possible to exploit the specific characteristics of each DHMM, making the adaptation to different market behaviours much more rapid and efficient. Using the technical indicators and the three DHMMs, sub-models with different characteristics and results were developed. The best ones were later aggregated, creating a supermodel able to adapt and respond to the demands of the Forex market.
Tests were conducted using data for the Forex EUR/USD pair from January 2002 to December 2013. The sum of "price interest points" (pips) was the primary metric used in the analysis of the results because it gives a clearer perception of profit. The results show an excellent performance, with 57% of correct predictions and a total of 26349 pips at the end of the 11 years. This result demonstrates the reliability and quality of the developed model, even though the test range includes a period of great financial instability, as seen in 2008.
Keywords: Financial markets, forecasting, Forex, Hidden Markov Model, MACD, multi-HMM, pattern discovery, RSI, time series analysis.
Acknowledgements
This dissertation could not have been completed without the help, cooperation and interest shown by the supervisor Prof. Dr. Rui Fuentecilla Maia Ferreira Neves and the co-supervisor Prof. Dr. Nuno Cavaco Gomes Horta.
I want to thank my family for all their support.
Finally, I want to thank my wife for all the unconditional support, patience and encouragement given throughout the preparation of this dissertation.
Table of Contents
Resumo .....................................................................................................................................................i
Abstract.................................................................................................................................................... iii
Acknowledgements ..................................................................................................................................v
Table of Contents ................................................................................................................................... vii
List of Tables ........................................................................................................................................... ix
List of Figures .......................................................................................................................................... xi
List of Acronyms and Abbreviations ...................................................................................................... xiii
CHAPTER 1 INTRODUCTION ............................................................................................................. 1
1.1 Motivation ................................................................................................................................ 2
1.2 Work’s Purpose ....................................................................................................................... 2
1.3 Document Structure ................................................................................................................. 2
CHAPTER 2 RELATED WORK ............................................................................................................ 3
2.1 Foreign Exchange Market ....................................................................................................... 3
2.2 Market Analysis ....................................................................................................................... 3
2.2.1 Fundamental Analysis ..................................................................................................... 3
2.2.2 Technical Analysis ........................................................................................................... 4
2.3 Existing Solutions .................................................................................................................... 8
2.3.1 A New Sax-GA Methodology ........................................................................................... 9
2.3.2 A Dynamic Pattern Recognition Approach Based on Neural Network .......................... 11
2.3.3 Stock Market Forecasting Using Hidden Markov Model: A New Approach .................. 13
2.3.4 Stock Market Prediction Using Hidden Markov Models ................................................ 14
2.3.5 A fusion model of HMM, ANN and GA for stock market forecasting ............................. 15
2.4 Why a Discrete HMM approach ............................................................................................. 18
2.5 Conclusions ........................................................................................................................... 19
CHAPTER 3 MULTI DISCRETE HMM APPROACH .......................................................................... 21
3.1 Discrete HMM for Financial Time Series Analysis ................................................................ 21
3.1.1 Historic data transformation ........................................................................................... 21
3.1.2 DHMM Training: Baum-Welch Algorithm ...................................................................... 22
3.1.3 DHMM Testing: Viterbi Algorithm .................................................................................. 24
3.1.4 Why Multi Hidden Markov Models ................................................................................. 26
3.2 Developed Methods ............................................................................................................... 27
3.2.1 Development of a Multi HMM strategy .......................................................................... 27
3.2.2 Fusion between different methods ................................................................................ 30
3.3 Multi DHMM Automation ........................................................................................................ 31
3.4 Conclusion ............................................................................................................................. 32
CHAPTER 4 RESULTS ...................................................................................................................... 33
4.1 HMM Analysis, Comparison and Decision ............................................................................ 33
4.1.1 Case Study I – Analysis with specific patterns .............................................................. 33
4.1.2 Case Study II – Analysis with real data ......................................................................... 34
4.1.3 Case Study III – Continuous and Discrete HMM comparison ........................................ 35
4.2 Construction and Improvement ............................................................................................. 36
4.2.1 Case Study I – Simple means ....................................................................................... 36
4.2.2 Case Study II – Usage of mixed training days .............................................................. 38
4.2.3 Case Study IV – MACD indicator .................................................................................. 41
4.2.4 Case Study V – Combining RSI and MACD with ADX .................................................. 42
4.2.5 Case Study VI – Combining MACD and RSI ................................................................. 42
4.2.6 Case Study VII – Combining MACD, RSI and mixed training days .............................. 43
4.3 Multi DHMM Automation ........................................................................................................ 45
4.4 Conclusions ........................................................................................................................... 48
CHAPTER 5 CONCLUSIONS AND FUTURE WORK ........................................................................ 49
5.1 Conclusions ........................................................................................................................... 49
5.2 Future Work ........................................................................................................................... 49
APPENDIX 1 - GRAPHS USED TO ASSESS THE HMM VIABILITY .................................................... 1
APPENDIX 2 – HIDDEN MARKOV MODEL TUTORIAL ........................................................................ 3
List of Tables
Table 1 - List of macroeconomic indicators ............................................................................................. 4
Table 2 - Breakpoints vs. a division from [15] ....................................................................................... 10
Table 3 - Number of NN inputs and outputs and corresponding patterns from [12] ............................. 13
Table 4 - Comparison between models ................................................................................................. 18
Table 5 - Transform pseudo code ......................................................................................................... 22
Table 6 - Selected models from each phase ......................................................................................... 31
Table 7 - Results using a predefined set of patterns ............................................................................. 34
Table 8 - Results using different training window sizes ........................................................................ 35
Table 9 - DHMM and CHMM comparison using hourly, daily and weekly data .................................... 35
Table 10 - DHMM and CHMM comparison using data from 2002 to 2013 and 60 days window ......... 36
Table 11 - Total of the DHMM and Simple Mean results from 2002 to 2013 ........................................ 37
Table 12 - Results of different sets of DHMM training window sizes from 2002 to 2013 ...................... 38
Table 13 - Results from 2002 to 2013 of the DHMM training using 15, 30 and 90 days window size .. 39
Table 14 – Results from 2002 to 2013 from Multi DHMM and RSI using 0, 20 and 45 days steps ...... 40
Table 15 – Results from 2002 to 2013 from Multi DHMM and RSI using 0, 15 and 20 days steps ...... 40
Table 16 - Results from 2002 to 2013 from Multi DHMM using MACD and Divergence ...................... 41
Table 17 - Results from 2002 to 2013 from Multi DHMM using Divergence ......................................... 41
Table 18 - Results from 2002 to 2013 from Multi DHMM using MACD ................................................ 42
Table 19 - Results from 2002 to 2013 ................................................................................................... 42
Table 20 - Results from 2002 to 2013 from Multi DHMM combining MACD and RSI .......................... 43
Table 21 - Results from 2002 to 2013 combining HMM 15 30 90 and HMM MACD ............................ 43
Table 22 - Results from 2002 to 2013 combining HMM 15 30 90 and HMM RSI ................................. 44
Table 23 - Results from 2002 to 2013 combining HMM MACD and HMM RSI .................................... 44
Table 24 - Resulting percentages from the final model ......................................................................... 45
Table 25 - Resulting percentages from the final model without stop signal .......................................... 45
Table 26 - Results from 2002 to 2013 from the Final Model ................................................................. 46
Table 27 - Summary of the results from sub-models and Final Model ................................................. 47
List of Figures
Figure 1- Example of a 50 days and 30 days moving averages ............................................................. 5
Figure 2 - RSI (below) and EUR/USD (above) Graphic .......................................................................... 6
Figure 3 - MACD (green line), Signal (blue line) and Histogram (red) Graph below with EUR/USD
graph above ............................................................................................................................................. 7
Figure 4 - ADX (black), +DI (green) and -DI (red) graph below with EUR/USD graph above ................ 8
Figure 5 - Normalization process of the stock quote time series from [15] ............................................. 9
Figure 6 - SAX representation from [15] ................................................................................................ 10
Figure 7 - Flexible length window design .............................................................................................. 12
Figure 8 - Flexible length window design .............................................................................................. 13
Figure 9 - Block diagram of the fusion model from [14] ........................................................................ 15
Figure 10 - Integrated ANN-HMM model ............................................................................................... 16
Figure 11 - GA optimization of the parameters of ANN HiMMI method ................................................ 17
Figure 12 - Transformation of the historic data ................................................................... 21
Figure 13 - Flowchart of the HMM training .......................................................................... 22
Figure 14 - Flowchart of the Baum-Welch algorithm for a discrete HMM ........................... 24
Figure 15 – Optimal path using Viterbi Algorithm ................................................................ 25
Figure 16 - Focus of different window sizes ........................................................................ 26
Figure 17 - Aggregation of both 0 and 1 signals into the 0 signal ......................................................... 27
Figure 18 - Flowchart of the three DHMM model integration ................................................................ 28
Figure 19 - Flowchart of a multi DHMM model with technical indicators .............................................. 29
Figure 20 - Technical indicators used in the Technical Indicator box stated in Figure 22 .................... 29
Figure 21 - Flowchart of the fusion model ............................................................................................. 30
Figure 22 - Different sub-models aggregation ....................................................................................... 30
Figure 23 - Flowchart of the final model ................................................................................................ 31
Figure 24 - Comparison of the 200 days SMA (left) and the 50 days SMA (right) in 2011 ................... 37
Figure 25 - Comparison of the 200 days SMA (left) and the 30 days SMA (right) in 2006 ................... 38
Figure 26 - Results from multi DHMM in 2010 ...................................................................................... 39
Figure 27 - Performance of each developed model from 2002 to 2013 ................................................ 45
Figure 28 - Final model resulting performance from 2002 to 2013 ....................................................... 46
List of Acronyms and Abbreviations
Optimization and Computer Engineering Related
ADX – Average Directional Index
ANN – Artificial Neural Network
CHMM – Continuous Hidden Markov Model
DHMM – Discrete Hidden Markov Model
DI – Directional Indicator
EM – Expectation Maximization
EMA – Exponential Moving Average
GA – Genetic Algorithm
HMM – Hidden Markov Model
MACD – Moving Average Convergence Divergence
MAP – Maximum A Posteriori
NN – Same as ANN
PAA – Piecewise Aggregate Approximation
PIP – Perceptual Important Point
RSI – Relative Strength Index
SAX – Symbolic Aggregate Approximation
SMA – Simple Moving Average
Investment Related
CPI – Consumer Price Index
ECB – European Central Bank
EUR – Euro
FED - Federal Reserve System
Forex – Foreign Exchange Market
GDP – Gross Domestic Product
pip – Price interest point
USD – United States Dollar
CHAPTER 1 INTRODUCTION
Due to the enormous cash flow present in financial markets, they constantly attract companies seeking capital to expand their business and small investors looking for a return on their investment. Among the different types of financial markets, the stock market and the Forex (Foreign Exchange) market attract the most investment and curiosity due to the volume of daily transactions performed [1]. Although these markets are very attractive to those who decide to invest, they have important differences that lead some investors to choose one over the other [3].
The main difference between these two markets is the daily average traded value: for the Forex market, according to 2007 data, it is 3.43 times the sum of all the bond markets in the world [1]. Another difference lies in the possibility of using a higher level of leverage, which, despite significantly increasing risk, also significantly increases potential earnings; however, since the Forex market depends on two different currencies, its analysis is more complex. Despite the greater complexity and the risk of high leverage, Forex was the financial market selected for analysis, chosen for the high volume of money traded daily. In recent years, the introduction of online trading and the evolution of technology have increased the ease with which one can invest and access the values of each market, and thereby also the ease with which one can compute technical indicators to analyse future market trends [1]. While it is important to analyse the behaviour of the market, the main goal of an investor should be the prediction of future values and market trends. Many models have been developed, but due to the volatility and non-linearity of financial markets it has been extremely difficult to develop effective models that can credibly predict market behaviour.
Amongst all the research and models developed in recent years, one model that has shown good results is the hidden Markov model. This machine learning model is already widely used in gene prediction [39], protein folding [38], cryptanalysis [40], part-of-speech tagging [41] and speech recognition [42]. For the analysis of financial time series, the continuous variant (CHMM) is typically used in order to predict the exact value, or the closest possible approximation to it. This quest for the real value in a continuous universe is quite difficult due to the infinite number of possibilities: a small deviation in the prediction, apart from providing the wrong value, can give wrong market trend information. For this reason, predicting a trend can be a credible alternative to predicting the real value. With this objective, three DHMMs were used simultaneously to predict the direction of the market closing price. Each of the three DHMMs is trained with a different window size, allowing each one to acquire a different sensitivity to fluctuations in market behavior. Efficient adaptation to different market behaviors is achieved by adding technical indicators to the model to signal changes in market trends. Using the technical indicators and the three DHMMs, sub-models with different characteristics and results were developed. The best ones were later aggregated, creating a supermodel able to adapt and respond to the demands of the Forex market.
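As a hedged illustration of the discretisation mentioned above: the exact transformation rule used in this work is detailed in Chapter 3, so the threshold `eps` and the symbol labels below are assumptions made purely for the sketch.

```python
# Illustrative sketch only: mapping daily closing prices to the three
# discrete DHMM observation symbols (up, down, maintenance) relative to
# the previous day. The 'eps' tolerance and the label values are
# assumptions, not the dissertation's exact rule.
UP, DOWN, HOLD = 0, 1, 2

def discretise(closes, eps=0.0):
    """Return one symbol per day (from the second day on), comparing each
    close with the previous day's close."""
    symbols = []
    for prev, curr in zip(closes, closes[1:]):
        if curr - prev > eps:
            symbols.append(UP)
        elif prev - curr > eps:
            symbols.append(DOWN)
        else:
            symbols.append(HOLD)  # change within tolerance: maintenance
    return symbols

print(discretise([1.30, 1.31, 1.31, 1.29]))  # [0, 2, 1] i.e. up, hold, down
```

A non-zero `eps` would let small oscillations be treated as maintenance rather than genuine direction changes.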
1.1 Motivation
The possibility of taking knowledge commonly used in telecommunications and speech recognition and applying it in a completely different field proved to be an extremely interesting challenge. I have always had an interest in financial markets, but it was the choice of this dissertation that encouraged me to research deeply and expand my knowledge of a subject of great importance today. Of course, there was also the monetary incentive, as it would be possible to use the mechanisms developed in the course of this work to obtain some financial return by investing in Forex.
1.2 Work’s Purpose
This dissertation proposes a new Hidden Markov Model (HMM) approach to pattern discovery, using the MACD and RSI technical indicators to assist the HMM forecast. This approach uses three discrete HMMs (DHMMs), each of which is trained with a different window size. Having been trained differently, each HMM has a different sensitivity to direction variations in the financial time series. With the assistance of the above technical indicators, it is possible to adapt each of the three HMMs to the market behavior. Using this approach, a set of sub-models is then developed and integrated in the final solution.
The developed approach is tested using EUR/USD Forex market historical data from January 2002 to December 2013, and the sum of "price interest points" (pips) over the 11-year period is computed. Pips are used as the primary metric for the analysis of the results because they give a perception of profit that is difficult to obtain from other metrics.
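The pip metric can be illustrated with a short sketch, assuming the conventional EUR/USD pip size of 0.0001; the `pips_earned` helper is hypothetical and not code from this work.

```python
# Illustrative sketch only: summing the pips earned by a sequence of daily
# directional predictions, assuming one pip = 0.0001 for EUR/USD.
PIP_SIZE = 0.0001

def pips_earned(closes, predictions):
    """closes: daily closing prices; predictions[i] is +1 (predicted up)
    or -1 (predicted down) for the move from day i to day i+1.
    A correct direction adds the move's size in pips; a wrong one subtracts it."""
    total = 0.0
    for i, direction in enumerate(predictions):
        move = closes[i + 1] - closes[i]
        total += direction * move / PIP_SIZE
    return total

closes = [1.3000, 1.3025, 1.3010]
print(round(pips_earned(closes, [+1, -1]), 2))  # 25 pips up + 15 pips down, both predicted
```

Summed over many days, this is the cumulative-pips figure reported in the results chapter.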
1.3 Document Structure
The presented dissertation is structured as follows:
Chapter 2: Describes the financial markets, specifically the Forex market, and then discusses the two most popular types of financial analysis. Some of the existing strategies that seek to address the same problem are analysed, and the choice of the HMM as the main model is justified.
Chapter 3: Describes the entire development process of the final model and its sub-models, from the processing of data and model training to the forecast of the next day's market direction.
Chapter 4: Presents and analyses all tests performed throughout the development of the sub-models and the final model.
Chapter 5: Presents the conclusions of the model development and proposes possible improvements to be explored in the future.
APPENDIX: Provides the list of graphics used to assess the HMM viability and provides an HMM tutorial.
CHAPTER 2 RELATED WORK
2.1 Foreign Exchange Market
The Foreign Exchange market is the market that moves the most money daily and includes all world currencies. According to data from the Bank for International Settlements [1], the daily average foreign exchange market turnover reached $4 trillion in April 2010, 20% higher than in 2007. More recently, with the emergence of the internet, Forex trading has ceased to belong exclusively to large corporations, large financial institutions, central banks, hedge funds and extremely wealthy people, and has become accessible to ordinary investors. According to the same data, this change is one of the main factors behind the 20% growth of the Forex market from 2007 to 2010.
Daily fluctuations [3] typically represent less than 1% of a currency's value, making Forex the least volatile financial market in the world; this characteristic allows the use of higher leverage than in other financial markets. Although quite risky, the use of higher leverage has remained possible due to the market's continuous operation (24 hours a day except weekends) and high liquidity, a result of its high trading volume. Each currency [6] is intrinsically linked to its corresponding country or group of countries, being one of the most important indicators of a country's relative level of economic health. This characteristic makes the factors that influence exchange rates different from those of other markets, such as the stock market. The main factors in the Forex market include inflation, interest rates, current-account deficits, public debt, terms of trade, political stability and economic performance, and since each of these involves its own combination of factors, Forex analysis is extremely complex.
2.2 Market Analysis
It is extremely important to analyse the market if one wants to understand its behaviour and predict its movements, and for this purpose a number of different analysis approaches are used. The two most common are Fundamental Analysis and Technical Analysis. Although the purpose of these two techniques is the same, they are quite different. For the Forex market, a fundamental analysis examines macroeconomic indicators, asset markets and political considerations. Analysing all these aspects of a currency pair takes a relatively long time compared to a technical analysis; the latter, on the other hand, uses past price data and charts to find patterns that enable the prediction of future behaviour. A technical analysis assumes that people will constantly repeat their behaviour. This type of analysis has become extremely popular in recent years, as more and more people believe that all the necessary information can be found in the charts.
2.2.1 Fundamental Analysis
To perform a solid fundamental analysis in the Forex market, one must consider the traded currency pair and, using the macroeconomic indicators, asset markets and political considerations corresponding to each currency, try to determine the market behaviour from this information. When we analyse one currency we are indirectly analysing its country or group of relevant countries, and therefore a fundamental analysis in the Forex market differs from an asset market analysis. In Forex it is necessary to take into account macroeconomic indicators such as those shown in Table 1 [3]:
Macroeconomic Indicator – Definition
GDP (Gross Domestic Product) – The value of a country's overall output of goods and services at market prices, excluding net income from abroad
Interest Rates – The annualized cost of credit or debt-capital, computed as the percentage ratio of interest to the principal
Inflation Rates – The percentage change in the value of the Wholesale Price Index (WPI) on a year-on-year basis
Unemployment Rates – The number of unemployed persons divided by the number of people in the labor force
Money Supply – The total amount of money in circulation
Foreign Exchange Reserves – The total of a country's foreign currency deposits and bonds held by the central bank and monetary authorities
Productivity – A measurement of the amount of real GDP produced by an hour of labor
CPI (Consumer Price Index) – A measurement of the changes in the prices paid by urban consumers for a representative basket of goods and services
PPI (Producer Price Index) – A measurement of the average change over time in the selling prices received by domestic producers for their output
Table 1 - List of macroeconomic indicators
In addition to the macroeconomic indicators, it is possible to detect a link between the value of a currency, its asset markets and politics. Markets such as stock markets [5] and gold and oil commodity prices often align with certain currency trends. In the past, for instance, the behavior of the stock markets and of the USD were extremely similar, and rises and falls in gold and oil prices translated into movements in the Forex markets. For example, with the fall of the oil price, the dependence of Russia and Angola on their oil exports has been pointed to as one of the reasons why the ruble is losing its value; on the other hand, Switzerland's political neutrality and the fact that a major part of its currency reserves have been backed by gold made the Swiss franc a safe haven during periods of uncertainty.
2.2.2 Technical Analysis
Over the past two decades technical analysis has become popular and widely applied in financial
markets. Technical analysts [7][22] believe that the market's historical performance is an indication of its
future performance and that changes in price and volume already incorporate all the fundamental factors.
Thus, in order to benefit financially, technical analysts have been developing a set of indicators with
predictive and analytical capabilities.
Due to its heuristic nature [4][22], technical analysis can hardly be proven mathematically, and
consequently different studies have reached different conclusions. Jensen and Benington [43] indicate
that past information cannot be used to predict future prices. Neftçi [17] argues that technical analysis
cannot beat the market if the underlying process is linear. In opposition to these conclusions, Treynor
and Ferguson [18] argue that when non-public information is considered, technical analysis can produce
sizable profits, and Gunasekarage and Power [19], Kwon and Kish [20] and Chong and Ng [21] also
report significant excess returns from technical trading rules.
2.2.2.1 Technical Indicators
Mathematical metrics have been developed based on the historical price and volume data of an asset.
These metrics, named technical indicators, aim to predict the value or direction of an asset, or simply to
understand its behavior. Some of the most used technical indicators are described next.
2.2.2.2 Moving Averages
Moving averages (MA) are among the most used technical indicators; they are trend-following indicators
based on past prices. They are commonly used to smooth out short-term fluctuations and highlight
longer-term trends. The two most commonly used moving averages are the simple moving average
(SMA) and the exponential moving average (EMA): the first is a plain average over a defined number of
time periods, while the second differs from the first by giving a higher weight to more recent prices. This
type of indicator is used to determine levels of support and resistance and to identify trend directions.
Moving averages also provide the basis for more complex indicators, such as the RSI and the MACD,
which will be explained below.
Figure 1 - Example of 50-day and 30-day moving averages
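As a concrete illustration, the two averages can be computed in a few lines of Python. This is a minimal sketch: the function names `sma` and `ema` and the common smoothing factor 2/(n+1) are conventional choices, not something prescribed by this thesis.

```python
def sma(prices, n):
    """Simple moving average: plain mean of the last n prices."""
    return [sum(prices[i - n + 1:i + 1]) / n for i in range(n - 1, len(prices))]

def ema(prices, n):
    """Exponential moving average with the usual smoothing factor 2/(n+1),
    seeded with the first price."""
    alpha = 2 / (n + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out
```

Note that an n-day SMA produces no value for the first n − 1 days, which is why its output list is shorter than the input series.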
2.2.2.3 RSI – Relative Strength Index
Developed by J. Welles Wilder in 1978, the RSI (Relative Strength Index) indicator is widely used to
analyze the momentum of an asset and [16] is called an oscillator because it moves, or oscillates,
between 0 and 100 based on the price movement of the corresponding asset.
Wilder [9][11] considers that when the RSI is above 70, an overbought condition is potentially present
and, as prices are too high, the asset should be sold. Likewise, when the RSI is below 30, an oversold
condition is present and, as prices are considered to be low, it is the best opportunity to buy the asset.
To calculate the value of the RSI one must first decide the period length. When Wilder developed the
RSI he used a 14-day period. As the RSI becomes less sensitive with longer time periods, most books
on technical analysis use a 14-day look-back period when discussing the RSI. The RSI calculation is as
follows:
RSI = 100 − 100 / (1 + RS), (1)
Where,
RS = Average Gain / Average Loss (2)
The initial calculations of the average gain and loss are simple averages over the chosen time period.
Subsequent calculations are based on the prior averages and the current gain or loss, using an EMA
approach:

Average Gain = ((Previous Average Gain) × (time period − 1) + Current Gain) / time period
Average Loss = ((Previous Average Loss) × (time period − 1) + Current Loss) / time period (3)
Figure 2 - RSI (below) and EUR/USD (above) Graphic
Divergences between price and technical indicators are expected to occur. When using the RSI, those
divergences can be interpreted as a signal. When the RSI does not follow a price uptrend, a negative
divergence is present and the asset should be sold; conversely, when the RSI does not follow a price
downtrend, a positive divergence is present and it is an opportunity to acquire the asset.
2.2.2.4 MACD – Moving Averages Convergence and Divergence
The MACD indicator, created by Gerald Appel [35] in the late 1970s, is a trend indicator which shows the
relationship between prices and exponential moving averages (EMA). One of the advantages of the
MACD is its potential to incorporate aspects of both trend and momentum in a single indicator [8]. It is
calculated by computing the difference between the 12-day and 26-day EMAs; in addition, a 9-day
exponential moving average of the MACD, called the "trigger" or "signal" line, is calculated to indicate
long and short opportunities. The MACD and Signal calculations are as follows:
𝑀𝐴𝐶𝐷 = 𝐸𝑀𝐴[𝐶𝑙𝑜𝑠𝑒 𝑉𝑎𝑙𝑢𝑒𝑠, 12] − 𝐸𝑀𝐴[𝐶𝑙𝑜𝑠𝑒 𝑉𝑎𝑙𝑢𝑒𝑠, 26] (4)
𝑆𝑖𝑔𝑛𝑎𝑙 = 𝐸𝑀𝐴[𝑀𝐴𝐶𝐷 𝑉𝑎𝑙𝑢𝑒𝑠, 9] (5)
Figure 3 - MACD (green line), Signal (blue line) and Histogram (red) Graph below with EUR/USD graph above
To interpret and apply the MACD as a trend indicator, one needs to examine the intersections between
the MACD and the Signal:
When the MACD crosses above the Signal, it is an indication to buy the asset.
When the MACD crosses below the Signal, it is an indication to sell the asset.
Divergence in the MACD is also interpreted as a signal. When the MACD does not follow a price uptrend,
a negative divergence is present and there is a sell opportunity. On the other hand, when the MACD
does not follow a price downtrend, a positive divergence is present and there is a buy opportunity.
2.2.2.5 ADX – Average Directional movement Index
The ADX is a trend strength indicator developed by J. Welles Wilder [9] in 1978. The ADX itself measures
trend strength rather than direction; to determine both the direction and the strength of a trend, the ADX
needs to be complemented with two other indicators, the Plus Directional Indicator (+DI) and the Minus
Directional Indicator (-DI) [21]. The ADX combines them and smooths the result with an exponential
moving average [20]. The calculation is as follows:
Calculation of the Up Move and Down Move using the low and high values of the price data:
Up Move = today's high − yesterday's high (6)
Down Move = yesterday's low − today's low (7)
Calculation of the +DM and −DM:
Positive Directional Movement (+DM):
If Up Move > Down Move and Up Move > 0 → +DM = Up Move; else → +DM = 0 (8)
Negative Directional Movement (−DM):
If Down Move > Up Move and Down Move > 0 → −DM = Down Move; else → −DM = 0 (9)
Calculation of the positive and negative Directional Indicators (DI) after selecting the time period (Wilder
originally used 14 days):
+DI = 100 × EMA[+DM, 14] / (average true range) (10)
−DI = 100 × EMA[−DM, 14] / (average true range) (11)
Where
true range = max[(high − low), |high − previous close|, |low − previous close|] (12)
Calculation of the ADX:
ADX = 100 × EMA[|(+DI) − (−DI)| / ((+DI) + (−DI)), 14] (13)
Figure 4 - ADX (black), +DI (green) and -DI (red) graph below with EUR/USD graph above
The value of the ADX oscillates between 0 and 100. If the ADX crosses above 25, it is believed that the
prevailing price trend has enough strength; conversely, when the ADX crosses below 25 it is a warning
to avoid trend-trading strategies.
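Equations (6) to (13) can be chained into one Python sketch. This is illustrative only: the plain EMA below stands in for Wilder's original smoothing, so exact values will differ slightly from those of charting packages.

```python
def ema(xs, n):
    """Exponential moving average with smoothing factor 2/(n+1)."""
    a = 2 / (n + 1)
    out = [xs[0]]
    for x in xs[1:]:
        out.append(a * x + (1 - a) * out[-1])
    return out

def adx(highs, lows, closes, n=14):
    """Directional system of equations (6)-(13): returns (+DI, -DI, ADX)."""
    up = [highs[i] - highs[i - 1] for i in range(1, len(highs))]        # (6)
    dn = [lows[i - 1] - lows[i] for i in range(1, len(lows))]           # (7)
    pdm = [u if u > d and u > 0 else 0.0 for u, d in zip(up, dn)]       # (8)
    ndm = [d if d > u and d > 0 else 0.0 for u, d in zip(up, dn)]       # (9)
    tr = [max(highs[i] - lows[i],
              abs(highs[i] - closes[i - 1]),
              abs(lows[i] - closes[i - 1])) for i in range(1, len(highs))]  # (12)
    atr = ema(tr, n)
    pdi = [100 * p / a for p, a in zip(ema(pdm, n), atr)]               # (10)
    ndi = [100 * q / a for q, a in zip(ema(ndm, n), atr)]               # (11)
    dx = [100 * abs(p - q) / (p + q) if p + q else 0.0
          for p, q in zip(pdi, ndi)]
    return pdi, ndi, ema(dx, n)                                         # (13)
```

On a steadily rising series, +DI stays above −DI and the ADX climbs toward 100, the strong-trend reading discussed above.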
2.3 Existing Solutions
Over the past few years, different models and algorithms dedicated to the analysis and forecasting of
financial markets have been developed. Many of these approaches have proved ineffective or financially
unattractive; others, in turn, have raised the interest of investors and researchers. This section reviews
some of the most interesting methods presented these days, with special focus on those using the HMM
in their process.
2.3.1 A New Sax-GA Methodology
This algorithm [15] combines a Symbolic Aggregate approXimation (SAX) technique, used to describe
the financial time series so that relevant patterns can be efficiently identified, with an optimization kernel
based on genetic algorithms (GA), used to identify the most relevant patterns and generate investment
rules. The program slides a window along the time series and converts it to a SAX sequence; the patterns
in the chromosomes are then compared with each sequence and the algorithm rules are applied to
reach a buy or sell decision.
The Symbolic Aggregate approXimation (SAX) is a relatively new method based on Piecewise
Aggregate Approximation (PAA). First, the algorithm divides the time series into windows and each
window into segments, then reduces the set of points in each segment to their arithmetic mean and
converts the value into a symbol.
2.3.1.1 SAX Method
The aim of the SAX methodology is the transformation of a large time series of dimension m into a
symbolic representation of smaller time series windows of size n << m. With this purpose, a normalization
is performed that scales the data to the same relative magnitude without affecting its original shape,
using (14):

x'_i = (x_i − μ_x) / σ_x (14)

Where x_i are the points in window W_k, μ_x is the mean of the points in W_k and σ_x is the standard
deviation of all the x_i.
Figure 5 - Normalization process of the stock quote time series from [15]
Even though after normalization all data windows are ready to be compared, the dimension of this data
is high. To address this problem, a dimensionality reduction method based on PAA is used.
The time series windows are divided into w equal-size segments and each segment is represented by
the arithmetic mean of its points. This works directly if n/w is an integer, in which case each point
contributes entirely to the frame where it is inserted. If a non-integer relation is present, the method
developed by Li Wei is used, where a point on the frontier between segments contributes partially to
each of the segments.
After the PAA transformation, a normal distribution curve is applied to the vertical axis and breakpoints
(β) are calculated so as to produce equal areas under the curve; the amplitude of the time series is then
divided into a intervals and a symbol is assigned to each of them.
Figure 6 - SAX representation from [15]
Table 2 - Breakpoints vs. a division from [15]
Each segment is evaluated to determine to which interval it belongs, and for each PAA level a symbol
is assigned to represent that segment.
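The whole normalize-PAA-symbolize pipeline described above can be sketched in Python. This is a toy version under stated assumptions: an alphabet of size a = 4 with breakpoints −0.67, 0 and 0.67 (the a = 4 row of Table 2), and a window length that is an exact multiple of w.

```python
import statistics

# N(0,1) breakpoints for an alphabet of size a = 4 (Table 2 gives the
# general case); they split the normal curve into four equal-area bands.
BREAKPOINTS = [-0.67, 0.0, 0.67]

def sax(window, w, alphabet="abcd"):
    """Normalize a window (eq. (14)), reduce it to w PAA segment means
    and map each mean to a symbol."""
    mu, sigma = statistics.mean(window), statistics.pstdev(window)
    norm = [(x - mu) / sigma for x in window]
    seg = len(norm) // w                      # points per segment
    means = [sum(norm[i * seg:(i + 1) * seg]) / seg for i in range(w)]
    word = ""
    for m in means:
        idx = sum(m > b for b in BREAKPOINTS)  # interval the mean falls in
        word += alphabet[idx]
    return word
```

For example, a window that jumps from low to high values maps low segments to early symbols and high segments to late ones.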
To identify patterns, (15) is used to evaluate the distance between sequences P and Q and reveal the
degree of similarity between them.
MINDIST(P, Q) = √(n/w) × √( ∑_{i=1}^{w} dist(p_i, q_i)² ) (15)
Where dist(.) is a function defined as (16):

dist(p_i, q_j) = { 0, if |i − j| ≤ 1; β_{j−1} − β_i, if i < j − 1; β_{i−1} − β_j, if i > j + 1 } (16)
Where the β's are the breakpoints defined in Table 2.
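Equations (15) and (16) transcribe almost directly into Python. The sketch below again assumes the a = 4 breakpoints of Table 2 and uses 0-based symbol indices, so the β subscripts are shifted accordingly; the names are illustrative.

```python
import math

BREAKPOINTS = [-0.67, 0.0, 0.67]   # a = 4 case from Table 2

def cell_dist(i, j):
    """Equation (16) with 0-based symbol indices: adjacent symbols are at
    distance 0, otherwise the gap between the enclosing breakpoints."""
    if abs(i - j) <= 1:
        return 0.0
    hi, lo = max(i, j), min(i, j)
    return BREAKPOINTS[hi - 1] - BREAKPOINTS[lo]

def mindist(p, q, n):
    """Equation (15): lower-bound distance between two SAX words of
    length w drawn from windows of n points."""
    w = len(p)
    s = sum(cell_dist(ord(a) - ord('a'), ord(b) - ord('a')) ** 2
            for a, b in zip(p, q))
    return math.sqrt(n / w) * math.sqrt(s)
```

As the definition implies, words that differ only by adjacent symbols are at distance zero, which is what makes MINDIST a lower bound rather than an exact distance.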
2.3.1.2 Genetic Algorithm
After the SAX discretization of the time series, a Genetic Algorithm (GA) is used to produce patterns
and detect whether they are present in the time series. Since the SAX representation is used, the
patterns are sequences of symbols, and the distance from each pattern to the time series must be
calculated to identify its presence.
The GA used in this algorithm relies on two distance measures. The first one is presented in (15) and
the second measures the distance between symbols (17):
dist = √( ∑_{i=1}^{w} (T_i − P_i)² ) (17)

w → word size; T_i → symbol i of the time series; P_i → symbol i of the pattern (18)
The chromosome presented in figure x is used in the population and is divided into two major parts. The
first, labeled "Parameters for rule decision", includes the two distances that permit evaluating whether
the pattern is present, in order to buy, or whether its effect is no longer present and it is time to sell
(distance to buy and distance to sell); another gene defines after how many days the algorithm should
sell if it is in a buying position (days to sell), and the last gene of this part identifies which of the measures
should be used to evaluate the distances (measure type). The second part, labeled "Pattern Symbols",
includes the symbols that constitute the pattern sequence.
The selection process applies a random selection to the best half of the population and then uses a
two-point crossover to generate the offspring. The fitness function is the total earnings produced by the
investment strategy defined by the pattern and the application rule associated with it.
A buy signal is issued if the distance is less than the "Distance to Buy" defined in the chromosome. The
application sells the stock if the distance is bigger than the "Distance to Sell" gene or if the stock was
bought more days ago than specified by the "Days to Sell" gene.
Although this methodology shows great potential, it proves incapable of identifying the same patterns
at different time scales: either a pattern is set aside in the pattern detection process or, if the pattern
develops over a long period of time, it is segmented and analyzed as a different set of patterns.
2.3.2 A Dynamic Pattern Recognition Approach Based on Neural Network
In this approach [12], vertexes are extracted from stock series using a sliding window with flexible length
and a dynamic perceptual important point (PIP) locating method to avoid the computational expense
problem. For pattern recognition and window length identification this methodology uses a neural
network (ANN) approach.
Figure 7 - Flexible length window design
2.3.2.1 Sliding window method and dynamic feature extraction
This approach uses a sliding window of flexible length, considering that the length of a pattern varies
greatly and that a fixed window length is difficult to decide upon, since it directly affects the performance
and the efficiency of the extraction.
Due to the huge computational cost and reduced efficiency of the plain PIP locating method, this
methodology proposes a dynamic algorithm based on PIP location. First, a binary tree structure is used
to implement the PIP locating method and organize the located PIP points, and then the dynamic PIP
location is used to avoid the computational expense problem.
There are three differences between the PIP method and the dynamic PIP method presented:
1. Density variable: used with the purpose of avoiding a dense distribution of vertexes; normally
larger than 6, this value is set to 5.
2. Shortening sequence lengths: the design of shortening sequence lengths is based on the
observation that, when the sliding window moves forward, the tree structure changes little,
meaning that if a vertex holds a high position in the prior tree Q[1:m], its height will not change
greatly in the next tree Q[2:m+1]. Therefore, when searching for the root node of the tree
Q[2:m+1], traversing the first 3 levels of the nodes in the tree Q[1:m] guarantees that the goal
can be achieved.
3. Vertex validation: the vertex validation aims to remove overlapping computation. This validation
tries to identify in prior trees the starting and ending vertexes of a sequence and uses the
existing sequence to construct the new tree instead of computing the same sequence again.
2.3.2.2 Neural Network (NN) design
For pattern recognition, three-layer NNs are designed to detect the interior relationships between the
pattern vertexes. For the 11 classic patterns, 4 neural networks were designed; each NN has a specific
number of inputs and outputs, and each output node corresponds to a predefined pattern, as described
in Table 3:
| Neural Network | Num. of Inputs | Num. of Outputs | Corresponding Patterns |
| NN1 | 3 | 2 | Two tops / bottoms |
| NN2 | 4 | 7 | Two tops / bottoms; triple bottom; diamond top / bottom; head and shoulders top / bottom |
| NN3 | 5 | 9 | Triple bottom; diamond top / bottom; head and shoulders top / bottom; symmetric triangle upside / downside; bump and run upside / downside |
| NN4 | 6 | 9 | Triple bottom; diamond top / bottom; head and shoulders top / bottom; symmetric triangle upside / downside; bump and run upside / downside |
Table 3 - Number of NN inputs and outputs and corresponding patterns from [12]
Based on the description above, the ANN is first trained and the dynamic PIP method is used with the
sliding window. The result is checked and, if the output binary tree satisfies the ANN input condition, the
algorithm has a match and the output of the neural network is checked. The algorithm obtains either the
full pattern or a vertex of the pattern; if it is a vertex, the algorithm decides whether the window has to
be enlarged or not. The process is described in Figure 8:
Figure 8 - Flexible length window design
Although the dynamic approach performs better than its static counterpart, the use of 11 patterns limits
the detection of patterns that do not match the 11 used patterns. The authors also found that some
patterns were not suitable for this method because their number of segments cannot be fixed.
2.3.3 Stock Market Forecasting Using Hidden Markov Model: A New Approach
In this approach [26] a continuous Hidden Markov Model is trained with past stock datasets to predict
the next day’s closing price. The model receives as input a vector of four observations corresponding to
the opening, high, low and closing prices from those past stock datasets.
To work with the CHMM, given the model λ = (A, B, π) and the observation sequence O = O_1, O_2, …, O_t,
the forward-backward algorithm is used to compute P(O|λ), the Viterbi algorithm to choose the state
sequence that best explains the observations, and the Baum-Welch algorithm to train the HMM. Each
of the continuous algorithms stated above is explained in detail in APPENDIX 2.
To forecast the next day's closing price, the model computes a likelihood value ϑ for the day and locates
in the past dataset those instances that produced the same or the nearest likelihood value ϑ. Assuming
that the next day's stock price should follow the same pattern, the difference between each located past
day's closing price and the following day's closing price is calculated, and the next day's closing price
is obtained by adding this difference to the current day's closing price.
The results showed that this approach can produce results very similar to those of approaches that have
been extensively used in the detection and prediction of patterns in financial markets. Furthermore, the
HMM has a strong mathematical structure and theoretical basis for use in a wide range of applications.
2.3.4 Stock Market Prediction Using Hidden Markov Models
In this approach [13], the daily opening, closing, high and low indices of the stock market are modeled
as continuous observations from underlying hidden states, and the training of the HMM from given
sequences is done using the Baum-Welch algorithm.
Both the continuous Hidden Markov Model (CHMM) and the Baum-Welch algorithm are extensively
explained in APPENDIX 2.
In this model the observations are represented as a 3-dimensional vector,

O_t = ( (close − open)/open, (high − open)/open, (open − low)/open ) = (fracChange, fracHigh, fracLow) (19)
After the training of the model, testing is undertaken using an approximate Maximum a Posteriori (MAP)
approach. The objective of this approach is the computation of the close value for the (d+1)-st day, given
the HMM model λ, the stock values for d days (O_1, O_2, … O_d) and the open value for the (d+1)-st
day.
Let Ô_{d+1} be the MAP estimate of the observation vector on the (d+1)-st day, with the observation
vector O_{d+1} varied over all possible values; this approach can be described as follows:
Ô_{d+1} = argmax_{O_{d+1}} P(O_{d+1} | O_1, O_2, …, O_d, λ) = argmax_{O_{d+1}} ( P(O_1, O_2, …, O_d, O_{d+1} | λ) / P(O_1, O_2, …, O_d | λ) ) (20)
Since the denominator is constant it is possible to simplify the MAP estimate as,
Ô_{d+1} = argmax_{O_{d+1}} P(O_1, O_2, …, O_d, O_{d+1} | λ) (21)
The previous joint probability can be computed using the forward-backward algorithm, which is also
extensively explained in APPENDIX 2.
From the description above it is possible to perceive the usefulness of an HMM approach. In this
particular case, when using MAP it must be taken into account that the best estimate is not always the
best solution: even close values may give a wrong indication of the market direction, that is, of a price
rise or fall, which will result in a wrong buy or sell decision.
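The MAP search of equation (21) can be illustrated with a small discrete stand-in for the continuous case: each candidate next-day observation is scored by the joint likelihood computed with the forward algorithm, and the most likely one is kept. The names and the toy discretization are assumptions, not the authors' code.

```python
import numpy as np

def forward_prob(pi, A, B, obs):
    """P(O_1..O_T | lambda) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

def map_next_obs(pi, A, B, history, candidates):
    """Equation (21): argmax over candidate next observations of the
    joint likelihood of history + candidate."""
    return max(candidates,
               key=lambda c: forward_prob(pi, A, B, history + [c]))
```

In the continuous setting of [13] the candidate set is a fine grid over (fracChange, fracHigh, fracLow) rather than a handful of symbols, but the argmax structure is the same.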
2.3.5 A fusion model of HMM, ANN and GA for stock market forecasting
Next, a fusion model [14] is presented, implemented as a combination of a continuous Hidden Markov
Model (CHMM), a Genetic Algorithm (GA) and Artificial Neural Networks (ANN) to forecast financial
market behavior. The GA and ANN are used for HMM optimization, and a weighted average of the price
differences of similar patterns is computed to prepare the forecast for the next day. The diagram shown
in Figure 9 summarizes the links between the various processes and algorithms mentioned above.
Figure 9 - Block diagram of the fusion model from [14]
2.3.5.1 ANN combination with HMM
The ANN is used to transform the daily share prices into groups of independent values that become the
input values for the HMM. The process can be described in the following steps:
1. Create a random ANN structure having "n" number of inputs and "n" number of outputs where
n is the number of predictors
2. Initialize the weights of the ANN randomly
3. Current observations vectors are fed into the ANN as inputs
4. The 𝑌𝑡 output vector produced by the ANN at time 𝑡 is fed into the HMM as input observation
vector
Figure 10 - Integrated ANN-HMM model
2.3.5.2 GA combination with ANN and HMM
After the HMM is fed with the output of the ANN, the GA is used to find the optimal parameters for the
HMM (1. observation emission probability matrix, 2. state transition probability matrix and 3. initial
probability matrix), given the transformed sequences, so as to minimize the Mean Absolute Percentage
Error (MAPE) of the ANN-HMM forecast method. The process can be described in the following steps:
1. Choose randomly the initial parameter values
2. Execute the GA to obtain the observation emission probability matrix initial values keeping the
initial values assigned to the remaining parameters in step 1.
3. Execute the GA to obtain the state transition probability matrix initial values keeping the value
computed in step 2 and the initial value assigned to the remaining parameter in step 1.
4. Execute the GA to obtain the initial probability matrix initial values keeping the values
computed in step 2 and 3.
5. If the resulting fitness value has not yet converged, return to step 2.
In this model the chromosome size is equal to the number of parameters to be optimized. For instance,
for a one-dimensional Gaussian probability distribution as the emission probability distribution, the size
of the chromosome will be 16 (considering a 4-state HMM), while for the initial probability matrix the
chromosome size will be 4.
Figure 11 - GA optimization of the parameters of the ANN-HMM method
2.3.5.3 Next day forecast
To forecast the next day's value, a range of data vectors is identified as having likelihood values close
to that of the current data vector. Next, the price difference between the value of each identified vector
at time t and the value of the vector of the day ahead, t+1, is computed. Then, the following equation is
used to give more weight to the most recent vector price differences, and the result is added to the
current day's price in order to obtain a prediction of the next day's price.
wd_i = ( ∑_m w_m × diff_m ) / ( ∑_m w_m ) (22)
Where,
i – index number of the current day
m – index number of a matched day
w_m – weight assigned to day m, computed as w_m = exp(1 / (i − m + 1))
wd_i – weighted average of the price differences for the current day i
diff_m – price difference between day m and day m+1
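Equation (22) translates into a short Python sketch. The function name and index conventions are hypothetical; `matched_idxs` stands for the past days whose likelihood values were found closest to the current one.

```python
import math

def forecast_next(prices, current_idx, matched_idxs):
    """Equation (22): recency-weighted average of the one-day price
    differences observed after each matched day, added to today's price."""
    weights = [math.exp(1 / (current_idx - m + 1)) for m in matched_idxs]
    diffs = [prices[m + 1] - prices[m] for m in matched_idxs]
    wd = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    return prices[current_idx] + wd
```

The exponential weight decays with the distance i − m, so a match found yesterday influences the forecast more than one found weeks ago.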
This approach offers the ANN and GA as an alternative for the training of the HMM. One of the
differences presented concerns the importance of the initialization of the HMM parameters; such a
concern does not exist when using the Baum-Welch algorithm, which gains in performance and
simplicity. In addition, the process developed around the HMM cannot be trained if the training sequence
does not fit the selected parameters properly. Finally, the proposed model chooses the number of states
as the number of attributes in the observation vectors, and this may not be suitable for some instances.
2.4 Why a Discrete HMM approach
The Hidden Markov Model is a statistical model that has been widely used in different areas for the
analysis of time series, such as biological sequence analysis, protein sequence alignment [ref], genetic
prediction [ref], cryptanalysis [ref], part-of-speech tagging and speech recognition [ref][ref].
Although there are different models for detecting and predicting patterns in financial time series, they
turn out to have limitations that the HMM is able to overcome. Unlike many of the existing models, this
model has the particularity of not requiring a prior list of well-defined patterns in order to proceed with
the analysis. The model automatically identifies the temporal patterns present in the analyzed
sequences, a feature that leads to the identification and characterization of a greater number of patterns.
Unlike SAX, the identification of patterns does not depend on their size, whereas in SAX either a pattern
is set aside in the pattern detection process or, if the pattern develops over a long period of time, it is
segmented and analyzed as a different set of patterns.
| Model | Advantages | Disadvantages |
| Sax-GA | The conversion of data into symbol sequences enables a simple implementation and easy identification of patterns from strings. | Either a pattern is set aside in the pattern detection process or, if it develops over a long period of time, it is segmented and analyzed as a different set of patterns. |
| PIP and ANN | Created for the purpose of handling financial data. Great data preservation even when using a high-level dimensional reduction. | Limited number of patterns. Some patterns are not suitable because their number of segments cannot be fixed. |
| Continuous HMM and likelihood values | Easy to implement; using the HMM gives a mathematical dimension that other analytical models do not have. | The best likelihood estimate is not always the best solution, because even close values may give a wrong indication of the market direction. |
| Continuous HMM and MAP | Easy to implement; using the HMM gives a mathematical dimension that other analytical models do not have. | The best MAP estimate is not always the best solution, because even close values may give a wrong indication of the market direction. |
| HMM, ANN and GA | Using the HMM gives a mathematical dimension that other analytical models do not have. Uses two models to optimally adapt the model to the analyzed financial time series. | Cannot be trained if the training sequence does not fit the selected parameters properly. Chooses the number of states as the number of attributes in the observation vectors, which may not be suitable for some instances. |
Table 4 - Comparison between models
When using the HMM, one of the most common approaches is to use its continuous version to predict
the daily closing value. This quest for the real value in a continuous universe becomes quite difficult due
to the infinite number of possibilities: a small deviation in the prediction, apart from providing the wrong
value, can give wrong market trend information. Whichever method is used, the main objective should
be to predict the market price trend. Thus, for the reasons given above, a discrete approach to the
problem may present advantages that a continuous approach does not.
2.5 Conclusions
Many models have been developed to analyze and predict the behavior of financial markets. Few
actually end up being convincing and presenting themselves as practical solutions used by investors. In
this chapter five of these solutions were analyzed: the first two represent models that have aroused the
curiosity of investors and scholars, while the following three represent strategies using the main model
adopted in the development of this thesis.
The conclusions drawn from the analysis of each model are presented in Table 4. Finally, the use of the
DHMM and its advantages over the other models were explained. This model will therefore be applied
to the analysis and prediction of the Forex market behavior, specifically the analysis of the EUR/USD pair.
CHAPTER 3 MULTI DISCRETE HMM APPROACH
This chapter explains in detail the adjustments made in the development of the Multi DHMM. First, it
describes how the DHMM is used to analyze and forecast the direction of the price of the Forex EUR/USD
pair from the given historical data. Then it explains how the chosen technical indicators interact with the
DHMM and the expected benefits of that interaction. Afterwards, the different developed models are
presented and the choice of the models to enter the Multi DHMM is indicated. In section 4.4 the
implementation process of the developed method is explained in detail. Finally, conclusions are drawn
from all the adaptation and development performed.
3.1 Discrete HMM for Financial Time Series Analysis
To use the discrete HMM, it is necessary to adapt the data to the model so that it can be trained to model
the assigned time series. The forecast, based on the Viterbi algorithm, also requires a small change to
that algorithm. These changes and modifications are presented in this section, along with the justification
of the need to use different DHMMs with different window sizes.
3.1.1 Historic data transformation
The historical data for each day are composed of the opening, high, low and closing values, indicating
the main points of interest in the market on that day. Since the main objective of the model is to forecast
the direction of the market, it is possible to discretize the values: the direction is calculated from the
difference between the closing level of day N and that of day N − 1. This transformation is applied to the
whole set of historical data in order to adjust the values to the DHMM, and is described in Figure 12.
Figure 12 - Transformation of the historic data
The selection of the values 0, 1 and 2 to represent the drop, maintenance and rise signals derived from
the closing values was due to the use of matrices in the HMM. Since the emission probability matrix
B_s(O_t) comprises all possible observations, it became simpler to make a direct association between
each observation and a line of the B_s(O_t) matrix. Since, computationally, an array starts at index 0
and the process needs three indices, the choice of values came trivially. Thus, processing is made
simple, and is represented in Table 5.
Transform(Close_{N−1}, Close_N):
    IF Close_N − Close_{N−1} < 0: return 0 (drop)
    ELSE IF Close_N − Close_{N−1} > 0: return 2 (rise)
    ELSE: return 1 (maintenance)
Table 5 - Transform pseudo code
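A runnable Python version of the transform, written to be consistent with the 0 = drop, 1 = maintenance, 2 = rise mapping stated above (function names are illustrative):

```python
def transform(prev_close, close):
    """Discretize a day's closing move relative to the previous day:
    0 = drop, 1 = maintenance, 2 = rise (the indices used by the B matrix)."""
    diff = close - prev_close
    if diff < 0:
        return 0
    if diff > 0:
        return 2
    return 1

def to_observations(closes):
    """Turn a series of closing prices into a DHMM observation sequence."""
    return [transform(closes[i - 1], closes[i]) for i in range(1, len(closes))]
```

A series of T closing prices thus yields T − 1 discrete observations, the input expected by the training procedure described next.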
3.1.2 DHMM Training: Baum-Welch Algorithm
How can we estimate the model parameters given an observation set?
In order to answer this question it is important to use an algorithm capable of finding the unknown
parameters λ = {π, A, B} of an HMM. Although some algorithms exist with the capacity to address the
question, due to the type of data on which it will be used it is necessary that the algorithm does not
require any model initialization. This algorithm is the Baum-Welch algorithm; it uses the EM algorithm to
find the maximum likelihood estimate of λ = {π, A, B} given the observation sequence O_1, O_2, …, O_t,
and uses the production probability P(O|λ) as the optimization criterion.
The Viterbi training algorithm is another algorithm that can be presented as an answer to the parameter
estimation question; however, Viterbi training requires some reasonable initialization, makes a limited
use of the training data and is less robust, since it only uses observations inside the segments
corresponding to a given HMM state to re-estimate the parameters of that state [29][23][27].
For the training of the DHMM it is crucial to have the data transformed into signals of decline, maintenance
or rise (0, 1, 2) of the market value and to define the number of states of the HMM. The number of states
appears as an unknown variable in the process, but with HMMs it is often taken as a rule, though not a
requirement, to have the number of states equal to the number of observations, that is, to have one
possible strategy (state) for each existing observation. This rule is adopted in the development of the
model. Once the number of states is decided, the use of the Baum-Welch algorithm is straightforward,
as described in section 3.1.2.1.
Figure 13 - Flowchart of the HMM training
The last question that remains is the initialization of the matrices that characterize the HMM. When using the Baum-Welch algorithm the initial values of the HMM parameters are irrelevant, so random values are assigned. These values must comply with the following rules:
0 ≤ 𝑥 ≤ 1, where 𝑥 is the random value assigned
∑_{𝑖} 𝜋𝑖 = 1
∑_{𝑗} 𝐴𝑖𝑗 = 1, for every state 𝑖
∑_{𝑘} 𝐵𝑖(𝑂𝑘) = 1, for every state 𝑖
Where:
𝜋 – Initial matrix,
𝐴 – Transition matrix,
𝐵 – Emission matrix,
𝑖 – State,
𝑂𝑘 – Observation k;
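A minimal sketch of this random, row-stochastic initialization (illustrative Python; the helper name is ours):

```python
import random

def random_stochastic_rows(rows, cols, rng=random.Random(0)):
    """Random matrix with values in [0, 1] whose rows each sum to 1."""
    m = [[rng.random() for _ in range(cols)] for _ in range(rows)]
    return [[v / sum(row) for v in row] for row in m]

N, M = 3, 3                               # states, observation symbols
pi = random_stochastic_rows(1, N)[0]      # initial distribution
A = random_stochastic_rows(N, N)          # transition matrix
B = random_stochastic_rows(N, M)          # emission matrix
```

Dividing each random row by its sum enforces the normalization rules above while keeping every entry in [0, 1].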
3.1.2.1 Baum-Welch Algorithm
The Baum-Welch algorithm starts using the Forward-Backward algorithm. Before starting the iterative process, the transition matrix 𝐴, emission matrix 𝐵 and initial matrix 𝜋 from 𝜆 are set with random initial conditions.
The Baum-Welch algorithm is described as follows [23][29]:
Forward Procedure:
Having 𝛼𝑡(𝑖) = 𝑃(𝑂1, 𝑂2, … , 𝑂𝑡, 𝑠𝑡 = 𝑖|𝜆), the probability of ending in state 𝑠𝑖 given the observation sequence 𝑂1, 𝑂2, … , 𝑂𝑡, recursively computed as:
1. 𝛼1(𝑖) = 𝜋𝑖 𝑏𝑖(𝑂1)
2. 𝛼𝑡+1(𝑗) = 𝑏𝑗(𝑂𝑡+1) ∑_{𝑖=1}^{𝑁} 𝛼𝑡(𝑖) 𝑎𝑖𝑗
Backward Procedure:
Having 𝛽𝑡(𝑖) = 𝑃(𝑂𝑡+1, 𝑂𝑡+2, … , 𝑂𝑇|𝑠𝑡 = 𝑖, 𝜆), the probability of the ending sequence 𝑂𝑡+1, 𝑂𝑡+2, … , 𝑂𝑇 given the model 𝜆 and the state 𝑠𝑖 at time 𝑡, recursively computed as:
1. 𝛽𝑇(𝑖) = 1
2. 𝛽𝑡(𝑖) = ∑_{𝑗=1}^{𝑁} 𝛽𝑡+1(𝑗) 𝑎𝑖𝑗 𝑏𝑗(𝑂𝑡+1)
Optimization:
It is now possible to compute the temporary variables:
𝛾𝑡(𝑖) = 𝑃(𝑠𝑡 = 𝑖|𝑂, 𝜆) = 𝛼𝑡(𝑖)𝛽𝑡(𝑖) / ∑_{𝑗=1}^{𝑁} 𝛼𝑡(𝑗)𝛽𝑡(𝑗), (23)
This quantity 𝛾𝑡(𝑖) represents the probability of being in state 𝑠𝑖 at time 𝑡, given the observation set 𝑂1, 𝑂2, … , 𝑂𝑡 and the parameters 𝜆.
𝜉𝑡(𝑖, 𝑗) = 𝑃(𝑠𝑡 = 𝑖, 𝑠𝑡+1 = 𝑗|𝑂, 𝜆) = 𝛼𝑡(𝑖) 𝑎𝑖𝑗 𝛽𝑡+1(𝑗) 𝑏𝑗(𝑂𝑡+1) / ∑_{𝑖=1}^{𝑁} ∑_{𝑗=1}^{𝑁} 𝛼𝑡(𝑖) 𝑎𝑖𝑗 𝛽𝑡+1(𝑗) 𝑏𝑗(𝑂𝑡+1), (24)
This quantity 𝜉𝑡(𝑖, 𝑗) represents the probability of being in states 𝑖 and 𝑗 at times 𝑡 and 𝑡+1 respectively, given the observation set 𝑂1, 𝑂2, … , 𝑂𝑡 and the parameters 𝜆.
With the computation of these two quantities it is now possible to update the model, determining the expected quantities 𝜆̂ = {𝜋̂, 𝐴̂, 𝐵̂}.
Update of the model 𝜆̂:
𝜋̂𝑖 = 𝛾1(𝑖)
𝑎̂𝑖𝑗 = ∑_{𝑡=1}^{𝑇−1} 𝜉𝑡(𝑖, 𝑗) / ∑_{𝑡=1}^{𝑇−1} 𝛾𝑡(𝑖)
𝑏̂𝑖(𝑘) = ∑_{𝑡=1, 𝑂𝑡=𝑜𝑘}^{𝑇} 𝛾𝑡(𝑖) / ∑_{𝑡=1}^{𝑇} 𝛾𝑡(𝑖)
Termination:
If the quality measure 𝑃(𝑂|𝜆̂) is not improved by the updated model 𝜆̂ compared to 𝜆, the process stops; otherwise, all steps are repeated.
Figure 14 - Flowchart of the Baum-Welch algorithm for a discrete HMM
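The whole procedure above can be sketched in Python with NumPy. This is a simplified single-sequence sketch without the usual log/scaling tricks, so it is only suitable for short sequences; the function name and structure are ours, not the thesis code:

```python
import numpy as np

def baum_welch(O, N, M, n_iter=50, seed=0):
    """Re-estimate lambda = (pi, A, B) of a discrete HMM from one
    observation sequence O (integers in [0, M)), random initialization."""
    O = np.asarray(O)
    T = len(O)
    rng = np.random.default_rng(seed)
    norm = lambda x: x / x.sum(axis=-1, keepdims=True)
    pi, A, B = norm(rng.random(N)), norm(rng.random((N, N))), norm(rng.random((N, M)))
    for _ in range(n_iter):
        # forward procedure: alpha_t(i)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, O[0]]
        for t in range(T - 1):
            alpha[t + 1] = B[:, O[t + 1]] * (alpha[t] @ A)
        # backward procedure: beta_t(i)
        beta = np.zeros((T, N))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
        # temporary quantities gamma (23) and xi (24)
        gamma = norm(alpha * beta)
        xi = np.zeros((T - 1, N, N))
        for t in range(T - 1):
            x = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
            xi[t] = x / x.sum()
        # update step: pi_hat, a_hat, b_hat
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.stack([gamma[O == k].sum(axis=0) for k in range(M)],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return pi, A, B
```

For simplicity the sketch runs a fixed number of iterations instead of the 𝑃(𝑂|𝜆̂) improvement test used as the stopping criterion in the text.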
3.1.3 DHMM Testing: Viterbi Algorithm
The Viterbi algorithm is the chosen option to forecast the direction of the market close price. This algorithm looks at every state sequence and simply selects the most likely sequence, in a process assumed to be a finite-state, discrete-time Markov process.
Like the forward or the backward algorithm, the Viterbi algorithm also has a variable, represented by 𝛿𝑡(𝑖). This variable gives the maximum likelihood of the segment of the observation sequence 𝑂1, 𝑂2, … , 𝑂𝑡 that ends in state 𝑠𝑖:
𝛿𝑡(𝑖) = max_{𝑠1,𝑠2,…,𝑠𝑡−1} 𝑃(𝑂1, 𝑂2, … , 𝑂𝑡, 𝑠1, 𝑠2, … , 𝑠𝑡−1, 𝑠𝑡 = 𝑖|𝜆) (25)
This variable 𝛿𝑡(𝑖) can be compared with the forward variable 𝛼𝑡(𝑖), except that the Viterbi algorithm uses a maximization instead of a summation over previous states.
Figure 15 - Optimal path using the Viterbi Algorithm
The Viterbi algorithm is described as follows:
1. Select the most likely sequence in the process using the Viterbi algorithm [23]:
Initialization:
For all states 𝑖, 𝑗 ∈ [1, 𝑁] at 𝑡 = 1 we have:
𝛿1(𝑖) = 𝜋𝑖 𝑏𝑖(𝑂1), (26)
𝜓1(𝑖) = 0, (27)
Recursion:
For all times 𝑡, 1 ≤ 𝑡 ≤ 𝑇 − 1, and all states 𝑖, 𝑗 ∈ [1, 𝑁] we have:
𝛿𝑡+1(𝑗) = max_{𝑖}{𝛿𝑡(𝑖) 𝑎𝑖𝑗} 𝑏𝑗(𝑂𝑡+1) (28)
𝜓𝑡+1(𝑗) = arg max_{𝑖}{𝛿𝑡(𝑖) 𝑎𝑖𝑗} (29)
Termination:
For all states 𝑖, 𝑗 ∈ [1, 𝑁] at 𝑡 = 𝑇 we have:
𝑃∗(𝑂|𝜆) = 𝑃(𝑂, 𝑠∗|𝜆) = max_{𝑖} 𝛿𝑇(𝑖) (30)
𝑠𝑇∗ = arg max_{𝑗} 𝛿𝑇(𝑗) (31)
2. Having the most likely sequence from 𝑡 = 1 to 𝑡 = 𝑇, the next step is to assess the most likely state at 𝑡 = 𝑇 + 1. This is calculated from a manipulation of the algorithm equations. The matrix 𝜑𝑗(𝑂𝑇+1) is created, holding the probability of going to state 𝑗 in case of having the observation 𝑂𝑇+1 at 𝑇 + 1,
𝜑𝑗(𝑂𝑇+1) = max_{𝑖}{𝛿𝑇(𝑖) 𝑎𝑖𝑗} 𝑏𝑗(𝑂𝑇+1) (32)
3. Next, using (30), the most probable state at 𝑇 is extracted from the results obtained with the Viterbi algorithm,
𝑆𝑡𝑎𝑡𝑒 = arg max_{𝑖} 𝛿𝑇(𝑖) (33)
4. The state extracted in the previous step is used in 𝜓𝑇(𝑗) to extract the most likely predecessor state, which is then used to forecast 𝑇 + 1, where
𝑃𝑟𝑒𝑑𝑒𝑐𝑒𝑠𝑠𝑜𝑟 = 𝜓𝑇(𝑗 = 𝑆𝑡𝑎𝑡𝑒) (34)
5. Having the most probable predecessor state, it is now possible to compute the most probable observation at 𝑇 + 1,
𝐹𝑜𝑟𝑒𝑐𝑎𝑠𝑡 = arg max_{𝑂𝑇+1} 𝜑𝑃𝑟𝑒𝑑𝑒𝑐𝑒𝑠𝑠𝑜𝑟(𝑂𝑇+1) (35)
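The five steps above can be sketched in Python with NumPy (an illustrative sketch; the function name is ours, and it follows the text's adaptation of using the predecessor state to pick the next observation):

```python
import numpy as np

def viterbi_forecast(O, pi, A, B):
    """Run Viterbi over O, then forecast the most probable observation
    at T+1 through the predecessor-state manipulation (steps 1-5)."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                            # (26)
    for t in range(T - 1):
        scores = delta[t][:, None] * A                    # delta_t(i) * a_ij
        psi[t + 1] = scores.argmax(axis=0)                # (29)
        delta[t + 1] = scores.max(axis=0) * B[:, O[t + 1]]  # (28)
    state = delta[-1].argmax()                            # (33)
    predecessor = psi[-1, state]                          # (34)
    # (32): phi[j, o] = max_i{delta_T(i) a_ij} * b_j(o)
    phi = (delta[-1][:, None] * A).max(axis=0)[:, None] * B
    forecast = phi[predecessor].argmax()                  # (35)
    return int(state), int(forecast)
```

On a toy deterministic model in which the two states alternate and state 𝑖 always emits symbol 𝑖, the sketch ends the sequence [0, 1, 0, 1] in state 1 and forecasts observation 0.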
3.1.4 Why Multi Hidden Markov Models
The focus of the HMM is bounded by the size of the training window. Larger training windows give the HMM the capacity to perceive the formation of long-term trends but make the model less sensitive to changes in trends; in contrast, reduced training windows give the model the ability to identify the formation of short, transient patterns and a greater sensitivity in detecting changing trends.
Due to the constant fluctuations in the financial markets, it is important that the developed model is able to adapt to such fluctuations. Thus, the model must be capable of analyzing long-term trends while quickly adapting to these market changes.
Figure 16 - Focus of different window sizes
To this end, it was decided to use three DHMMs trained with different window sizes. The sizes of the windows were obtained through testing. A size of 90 days was chosen because this window size achieved the best results among all tested window sizes; nevertheless, it was found that the 90-day DHMM adapts slowly to changing trends, so it was decided that this size would be the maximum to be considered. The choice of the 15-day size was again determined through tests, which showed it to be the most balanced choice between forecast quality and sensitivity to fluctuations. To bridge the gap between the maximum and minimum window sizes, it was decided to adopt a third DHMM with a window size between the other two. Two cases were then tested: 30 days, equivalent to 1 month, and 60 days, equivalent to 2 months. The test results showed the 30-day DHMM as the best among them.
3.2 Developed Methods
This section analyzes the methods developed over the final solution development process. More experiments and tests were made and their results can be found in Chapter 4, but the focus of this section centers on the discussion and explanation of the methods that are actually used in the final solution. The presentation order of the solutions coincides with the order of development, as new challenges emerged that needed to be overcome with the addition and improvement of the constructed models.
3.2.1 Development of a Multi HMM strategy
This first model combines three different DHMMs in a single model. The idea centers on using the output of each model to generate the final prediction from the most predicted signal. For this decision to be possible, it was necessary to reduce the number of DHMM output observations to two, because otherwise there was a risk of each of the three DHMMs generating a different value, halting the forecast decision. Thus, it was decided to aggregate the observation of a price drop with the observation of price maintenance. It can be concluded from the tests that the probability of the closing price repeating on two consecutive days is greatly reduced.
Figure 17 - Aggregation of both 0 and 1 signals into the 0 signal
The objective of this approach is to balance both the long-term prediction and the adjustment sensitivity of the model. As stated above and shown in Figure 18, each of these three DHMMs produces an output and, having three forecasts for two possible values, there will always be at least two equal forecasts; the value of these forecasts is then chosen as the final forecast of the model.
Figure 18 - Flowchart of the three DHMM model integration
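The voting rule described above reduces to a simple majority over the three outputs; a minimal illustrative sketch in Python (not the thesis implementation):

```python
from collections import Counter

def majority_vote(forecasts):
    """Final prediction from the three DHMM outputs; with only two
    possible signals left (0 and 2), at least two forecasts agree."""
    return Counter(forecasts).most_common(1)[0][0]
```

For example, `majority_vote([2, 0, 2])` returns 2.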
Having three DHMMs with different sensitivities to direction variations enables a rapid and effective adaptation to the behavior of financial time series. Thus, sticking to an analysis of the most frequently predicted value would under-utilize the capabilities that this approach can offer. That said, the new objective focused on the use of the DHMM 90 as the main model and the use of the DHMM 30 and DHMM 15 each time a new trend is detected, to accelerate the adaptation of the model to new market behaviors. To this end, three technical indicators were used for the detection and indication of possible overbought or oversold conditions (RSI), new trends (MACD) or the strength of those trends (ADX), plus a fourth approach in which the RSI and MACD indicators are used simultaneously. The above indicators are explained in detail in chapter 2.
To deal with the detection of a trend change by a technical indicator, a controller was developed that applies each HMM in an orderly manner. This controller applies the HMM 15 for the first 15 days following the indicator signal, the HMM 30 from day 15 to day 20, and finally returns to the use of the HMM 90. These time intervals were chosen, firstly, in such a way that the training window of each HMM model never stops containing the day when the indicator detected a change in trend. The final choice of intervals was made from testing, which showed that these values obtained the best results.
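The controller's window schedule can be sketched as follows (illustrative Python; the function name and the `None` convention for "no signal yet" are ours, while the 15- and 20-day thresholds come from the text):

```python
def select_window(days_since_signal):
    """Choose the DHMM training window after a technical-indicator
    trend signal: DHMM 15 for the first 15 days, DHMM 30 from day
    15 to day 20, then back to the DHMM 90."""
    if days_since_signal is None:   # no trend change detected yet
        return 90
    if days_since_signal < 15:
        return 15
    if days_since_signal < 20:
        return 30
    return 90
```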
Figure 19 - Flowchart of a multi DHMM model with technical indicators
Figure 19 shows the flowchart of the process, where the Technical Indicator block is replaced by the analysis of the results of each indicator and the Compute Indicator block computes the value of the indicator. The analysis process for each indicator can be found in Figure 20.
Figure 20 - Technical indicators used in the Technical Indicator box stated in Figure 19
The first stage of the model development was followed by its respective testing, which showed a significant improvement in the results, except for the ADX, which, due to its poorer performance, will not be included in the second model development. The remaining cases were able to achieve the objectives. For the second stage, the reduction of losses over the years became the primary goal, even at the cost of a decrease in profits, in order to increase the reliability and the investment safety of the model.
3.2.2 Fusion between different methods
For this second phase of development, as mentioned in the previous section, the aim is to reduce the yearly losses of the model even if the yearly profit is eventually reduced. Thus, three of the five models previously developed were added to this new model and a new "0" signal was introduced. The aggregation requires the direction to be confirmed by two of the DHMM-based models: the "0" signal is emitted whenever the predictions of the two algorithms are contradictory. On these days of uncertainty, in which the two DHMMs give different forecasts, no investment is made, avoiding possible losses. From this second stage onwards, the signals are transformed into new representations: in addition to adding the "0" signal, the drop signal of the closing market value is converted to "-1" and the rise signal to "1".
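The fusion rule reduces to a simple agreement check; an illustrative Python sketch using the -1/0/1 signal convention introduced above (not the thesis implementation):

```python
def fuse(pred_a, pred_b):
    """Fusion of two sub-model predictions in {-1, 1}: agreement passes
    the direction through, disagreement yields 0 (no investment)."""
    return pred_a if pred_a == pred_b else 0
```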
Figure 21 - Flowchart of the fusion model
Figure 21 shows the flowchart of the process, where the Aggregation of Developed Models block is replaced by the analysis of the results of each model. The analysis process for each sub-model aggregation can be found in Figure 22.
Figure 22 - Different sub-models aggregation
The results show a substantial reduction in losses despite a slight drop in earnings. So, to try to recover the annual earnings without increasing the losses again, the third and final phase was developed, which is described in the next section.
3.3 Multi DHMM Automation
In the third and final stage, the five developed models that showed the best results were combined. Once again, in order to have a confirmation of the direction provided by the various models, the most predicted value is chosen. Table 6 lists the selected models from phases one and two:
Table 6 - Selected models from each phase
The first-phase models with the highest total gain were therefore chosen but, in order to mitigate the losses occurring in some of the years in these two models, three models of the second phase are also used, thus incorporating into the final model the signal "0", so that no investments are made when there is large uncertainty in the computation of the forecast.
Figure 23 - Flowchart of the final model
Figure 23 shows the flowchart of the final model. Each of the five models that it incorporates issues a forecast and the most predicted forecast of the three possible values is subsequently chosen; its value is saved and, if there are more dates to predict, the windows slide and the process is resumed. When there is no more data, the stored values are recorded in a text file for further analysis.
Phase 1:
DHMM MACD
DHMM MACD RSI
Phase 2:
DHMM RSI and DHMM MACD
DHMM 15 30 90 and DHMM MACD
DHMM 15 30 90 and DHMM RSI
3.4 Conclusion
This chapter described the entire development process and the models used to develop the final model. The chapter is divided into four parts corresponding to the stages of development of the model. In the first phase, a Multi DHMM strategy was used, with three DHMMs trained with different window sizes to take advantage of the specific features of each one. In the second phase, the RSI, ADX and MACD technical indicators were used in order to use the strategy developed in the first phase efficiently. The third phase was developed taking into account the limitations of the prior phases; thus the scope of this phase centered on reducing the losses existing in some years, for which a new signal was added that indicates that the model is uncertain of its prediction. Finally, the final model was developed, combining the best models developed so far to make the most of their abilities; each of these sub-models produces a prediction and the most predicted signal among them is chosen as the final model prediction.
CHAPTER 4 RESULTS
This chapter presents all the results and conclusions from each case study. From the analysis of the algorithm results, the process and decisions that led to the final algorithm combination and subsequent automation are described.
This chapter is divided into three main sections. The first section tests the ability of the DHMM to detect patterns and forecast; in this same section the performance of the discrete version is compared with the continuous version. The second section describes the construction of the solution and the necessary improvements that led to the development of different sub-models to address the DHMM limitations. The last section analyzes the results obtained in the tests performed on the final model and compares them with the results obtained by each sub-model used in the final model.
4.1 HMM Analysis, Comparison and Decision
This section holds the first DHMM algorithm analyses, in order to assess its usefulness as a Forex market direction prediction algorithm. Thereafter, in order to perceive whether a discrete approach would have any advantage over a continuous approach, both algorithms were compared and evaluated under the same circumstances.
4.1.1 Case Study I – Analysis with specific patterns
Before proceeding with the development of models using the DHMM, it was necessary to verify that the algorithm was able to meet the necessary requirements to analyze financial time series. For this purpose, 8 different patterns were designed, which give some idea of the type of analysis and prediction quality that one could expect from the DHMM.
Each pattern contains three different observations (0, 1 and 2), corresponding to the same number of states used in the training of the DHMM. For these tests, each DHMM was trained with 30 data points from each pattern before forecasting the next point. The shape of each pattern can be found in Appendix 1 and the obtained results can be found in Table 7 below:
Table 7 - Results using a predefined set of patterns
The results lead one to believe that the use of the DHMM as a basis for model development is a good choice. Although overall the results are satisfactory, in patterns 5 and 6 one can find some difficulty in detecting abrupt transitions in the pattern direction. Such difficulty in detecting abrupt or semi-abrupt transitions is considered the biggest limitation of the DHMM, so during the development it was necessary to adopt strategies to solve this problem.
4.1.2 Case Study II – Analysis with real data
After assessing the DHMM feasibility for detecting pattern directions, it is now important to know how the same algorithm behaves with actual data from the EUR/USD Forex market. For this purpose, 2011 historical data was used. The year 2011 was chosen due to its characteristics over the months: the year can be divided into three parts, where the first part shows a price rise of the EUR/USD pair, the second a price stagnation zone and the third a drop in price. In addition, throughout the year there are periods of sudden drops and rises. With this variety of cases in the same year, it was possible to conduct a more thorough analysis of the algorithm behavior taking all these factors into account.
For the analysis of these first tests, percentages corresponding to two cases were calculated. In the first case two states and three observations were used, while in the second case three states and three observations were used. The number of observations, as discussed in chapter 3, corresponds to the three possible conditions, i.e. drop, maintenance or rise of the next EUR/USD closing value. For each case, where the number of states varies between two and three, the algorithm is trained with different quantities of data, characterized by weeks, months or years, as follows:
Pattern 1: Direction 2 2 2 2 0 0 0 0 | Prediction 0 2 2 2 2 0 0 0
Pattern 2: Direction 2 0 2 0 2 0 2 0 | Prediction 2 0 2 0 2 0 2 0
Pattern 3: Direction 0 0 0 0 2 0 0 0 | Prediction 0 0 0 2 2 0 0 2
Pattern 4: Direction 1 2 1 2 1 0 1 0 | Prediction 1 0 1 2 1 2 1 2
Pattern 5: Direction 0 2 0 2 0 0 2 0 | Prediction 2 2 0 2 0 2 2 0
Pattern 6: Direction 1 1 1 1 1 1 1 1 | Prediction 1 1 1 1 1 1 1 1
Pattern 7: Direction 2 0 2 0 2 0 2 0 | Prediction 2 0 2 0 2 0 2 0
Pattern 8: Direction 2 2 0 2 2 2 0 2 | Prediction 2 2 0 2 2 2 0 2
Table 8 - Results using different training window sizes
The results show that in all six cases the use of three states, the same number as the observation symbols, obtained a superior performance. The model with the 90-day window size showed the best result, reaching 59% correct predictions, which presents itself as an excellent result for this first test with real data. From the results one can conclude that the best decision is to use a three-state HMM in the development of the next models.
4.1.3 Case Study III – Continuous and Discrete HMM comparison
Next, the continuous version of the HMM was implemented and tested, comparing both the continuous and discrete versions to assess whether the discrete HMM could surpass the performance of its continuous counterpart. As described earlier, the aim of the CHMM algorithm is to forecast the exact price of an asset. To use the same algorithm to predict the direction of the asset, a new procedure was added that compares the predicted value with the previous value and assigns a new value corresponding to a drop, rise or maintenance signal.
Table 9 - DHMM and CHMM comparison using hourly, daily and weekly data
For the test results below, 3 states and 3 observations were used in both algorithms and, complementarily, for each pair of states and observations the impact of having from 2 to 6 mixtures in the CHMM was tested. The data used in the tests correspond to Forex values from 2002 to 2013 and both algorithms were trained with 15, 30, 60 and 90 data values from hourly, daily and weekly series. The comparison of the two alternatives, using the best result from each timeframe, is shown in Table 9:
Training Window Size | States and Observations | Correct
7 days | 2 States, 3 Observations | 52%
7 days | 3 States, 3 Observations | 57%
15 days | 2 States, 3 Observations | 56%
15 days | 3 States, 3 Observations | 59%
30 days | 2 States, 3 Observations | 56%
30 days | 3 States, 3 Observations | 58%
60 days | 2 States, 3 Observations | 55%
60 days | 3 States, 3 Observations | 57%
90 days | 2 States, 3 Observations | 56%
90 days | 3 States, 3 Observations | 59%
1 year | 2 States, 3 Observations | 54%
1 year | 3 States, 3 Observations | 56%
Time | Discrete HMM | Continuous HMM
Hourly | 54.68% | 50.43%
Daily | 55.73% | 44.17%
Weekly | 51.85% | 40.13%
Table 10 - DHMM and CHMM comparison using data from 2002 to 2013 and 60 days window
From Table 10 it is possible to verify that the choice of a DHMM model for direction prediction was the most appropriate. Only in 2005 did the continuous HMM manage to perform better than the discrete model.
4.2 Construction and Improvement
For the construction of the final model, several tests were performed in order to detect the limitations of the model and come up with possible solutions. The analysis and subsequent model improvements led to 7 case studies being carried out until the final model was reached. These tests were made using Forex EUR/USD historical data from 2002 to 2013 and a sum of pips is performed to assess the revenue of using the proposed strategy over the 11-year period. When a wrong prediction is obtained, the pip difference between the closing value of the predicted day and the previous one is subtracted from the pip total; in the event of a correct forecast this difference is added to the total. As mentioned previously, the approach uses a sliding window strategy for training and testing.
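The pip accounting described above can be sketched as follows (illustrative Python; the 0.0001 pip size for EUR/USD and the 0/2 signal convention are assumptions of the sketch, not values stated in the text):

```python
def pip_total(closes, predictions, pip_size=0.0001):
    """Sum of pips over the test period: the absolute day-over-day move
    is added for a correct direction forecast and subtracted otherwise.
    predictions[t] is the forecast (0 = drop, 2 = rise) for day t+1."""
    total = 0.0
    for t, pred in enumerate(predictions):
        move = closes[t + 1] - closes[t]
        actual = 2 if move > 0 else 0
        pips = abs(move) / pip_size
        total += pips if pred == actual else -pips
    return total
```

For instance, for closes [1.1000, 1.1010, 1.1005] and two "rise" predictions, the first day adds 10 pips and the second subtracts 5, for a total of 5 pips.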
At the same time, an analysis was made of the impact of the meetings of the European Central Bank (ECB) and the Federal Reserve System (FED) on the predictions and behavior of the developed solutions, since such meetings introduce a period of uncertainty in the Forex pair EUR/USD.
4.2.1 Case Study I – Simple means
The first indicator chosen to combine with the HMM was the SMA (Simple Moving Average). The choice was due to its importance, simplicity and usability. The idea behind its use is to let the HMM estimate the closing value only when the closing value is higher than the average value calculated by the SMA. Thus, it was expected to avoid times of greater uncertainty that could hinder the pattern analysis and the subsequent prediction of direction.
For this purpose, four sets of moving averages were analyzed, using 30, 50, 100 and 200 days in their calculation. These values were selected for being the most used SMA periods in financial market technical analysis.
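The SMA gate can be sketched as follows (illustrative Python; the function names are ours):

```python
def sma(values, n):
    """Simple moving average over the last n values."""
    return sum(values[-n:]) / n

def should_predict(closes, n=200):
    """Gate: let the HMM forecast only when the latest close sits above
    its n-day SMA, skipping the uncertain zones described above."""
    return len(closes) >= n and closes[-1] > sma(closes, n)
```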
Year | Discrete HMM | Continuous HMM
2002 | 50.97% | 45.56%
2003 | 55.21% | 37.45%
2004 | 52.11% | 44.06%
2005 | 50.38% | 55.38%
2006 | 54.23% | 41.15%
2007 | 50.96% | 45.59%
2008 | 51.72% | 48.28%
2009 | 53.08% | 43.85%
2010 | 51.74% | 44.02%
2011 | 56.54% | 50.00%
2012 | 51.55% | 46.12%
2013 | 53.67% | 50.58%
Mean | 52.84% | 46.00%
Table 11 - Total of the DHMM and Simple Mean results from 2002 to 2013
After analyzing the four cases in Table 11, it was concluded that the HMM / moving average pair features the best performance when the 200-day SMA is used; in contrast, the 50-day SMA showed the worst performance. The difference lies in the prediction of the year 2011, where the 200-day SMA presented lower losses compared to the 50-day SMA; in opposition, in 2006, where the forecasts presented considerable profits, the 200-day SMA stood out, scoring 2191 pips.
As can be seen below, the discrepancy of results in 2011 relates to the last quarter of the year. The respective quarter may be divided into two parts. In the first part, both SMAs managed to stop the HMM when an uncertain area appeared. In the second part, while the 200-day SMA remained stopped, the 50-day SMA no longer considered this part an uncertainty area, causing a considerable loss over the remaining quarter.
Figure 24 - Comparison of the 200 days SMA (left) and the 50 days SMA (right) in 2011
Comparing the 30-day SMA to the 200-day SMA in 2006, i.e. the worst and best cases, one can conclude that the difference concerns, once again, a different analysis of the same area performed by both SMAs. As shown in Figure 25, the 30-day SMA considered some areas that could have been profitable as areas of uncertainty:
SMA | Normal | W/out FED | W/out ECB | W/out FED+ECB
30 | 2414 | 2947 | 3574 | 4107
50 | 2815 | 3549 | 3312 | 4046
100 | 3302 | 3097 | 4089 | 3884
200 | 3735 | 3395 | 4227 | 3887
Figure 25 - Comparison of the 200 days SMA (left) and the 30 days SMA (right) in 2006
It is concluded that the best result is displayed by the HMM 90 / 200-day SMA pair. Although the results are positive, there are other constraints that this indicator could not overcome and that may create profit limitations. One of these limitations is related to the delay of the HMM in realizing a change in direction when leaving a strong trend.
4.2.2 Case Study II – Usage of mixed training days
The time taken to detect a change in direction depends on the size of the window used for training the DHMM. Smaller windows make the algorithm more sensitive to slight changes but less sensitive to long-term trends; on the other hand, larger windows have a low sensitivity to immediate changes because their focus is on long-term trends.
That said, it is important to find a balance between both cases. Thus arises the possibility of using a combination of three DHMMs with different training window sizes: a DHMM sensitive to small changes, another sensitive to long-term trends and a third that sits between both cases. The result is three predictions, one from each, and the most predicted direction of the three is chosen.
Table 12 - Results of different sets of DHMM training window sizes from 2002 to 2013
From the results in Table 12 it is possible to verify that the DHMM 15, 30 and 90 set has a much higher performance than the remaining sets, achieving in 2010 a total of 2461 pips. In Figure 26 the use of a long-term component between January and June is noticeable, as the algorithm does not change the direction of the forecast in April when the direction reversed temporarily. Logically, during this period the prediction proved wrong, resulting in a loss of pips. It is also easy to identify a greater sensitivity to changes: from September onwards the algorithm was able to follow a slight change in direction. In this approach, it was noted that an analysis without the days of the FED or ECB meetings is even more harmful than a global approach.
Set [days] | Normal | W/out FED | W/out ECB | W/out FED+ECB
15, 30 and 90 | 5936 | 5646 | 5167 | 4877
15, 60 and 90 | 4096 | 3072 | 3415 | 2391
30, 60 and 90 | 4230 | 3110 | 3467 | 2347
Table 13 - Results from 2002 to 2013 of the DHMM training using 15, 30 and 90 days window size
Although this approach is able to contemplate a macro and micro view of the patterns and their changes in direction, its use is somewhat random. To overcome this challenge, it is important to use a technical momentum indicator that will inform the algorithm of a possible change in market behavior, for a faster and more efficient adaptation.
Figure 26 - Results from multi DHMM in 2010
To attempt to reduce the time the algorithm takes to detect a new direction transition, the RSI was used. This technical momentum indicator attempts to determine overbought and oversold conditions, and this information is expected to help detect a change in market behavior and direction. That said, and from the results obtained in the previous tests, especially the results in 4.2.2, three DHMM algorithms were used:
Training the algorithm using the previous 15 days:
As mentioned in the previous section, using a small training dataset makes the algorithm more sensitive to small changes of direction. So, when the RSI indicates a possible change of direction, the algorithm is expected to detect it as quickly as possible.
Year | Normal | W/out FED | W/out ECB | W/out FED+ECB
2002 | -446 | -267 | -531 | -352
2003 | 1075 | 819 | 880 | 624
2004 | 222 | 213 | 111 | 102
2005 | 235 | 233 | 546 | 544
2006 | 22 | 222 | 139 | 339
2007 | -46 | -179 | 53 | -80
2008 | 1455 | 1675 | 964 | 1184
2009 | 369 | 449 | -115 | -35
2010 | 2461 | 1739 | 2024 | 1302
2011 | 1080 | 916 | 1578 | 1414
2012 | -763 | -436 | -741 | -414
2013 | 272 | 262 | 259 | 249
TOTAL | 5936 | 5646 | 5167 | 4877
Training the algorithm using the previous 90 days:
As soon as the change of direction detected by the RSI is confirmed, it is necessary that the algorithm is once again insensitive to noise and focused on the detection of medium- and long-term patterns.
Training the algorithm using the previous 30 days:
The need for a smooth transition between the previous two points was considered. This way, false direction changes detected by the RSI can still be discarded within a short period while moving forward to a stage with less sensitivity to noise.
It was necessary to define the time during which each of the DHMMs presented in the previous 3 points should be used after the RSI detects a possible change in market behavior. The first analysis used the DHMM 15 during the first 20 days and the DHMM 30 during the following 15 days, proceeding with the use of the DHMM 90 for the remaining time until a new indication from the RSI. The results are shown below:
Table 14 – Results from 2002 to 2013 from Multi DHMM and RSI using 0, 20 and 45 days steps
The first results were lower than expected. The main reason was the use of the DHMM 15 and DHMM 30 for too long. To analyze the impact, it was decided to reduce the use of each DHMM to the following intervals: the DHMM 15 until 15 days after the RSI statement, the DHMM 30 for the 5 days after that and the DHMM 90 until the next indication from the RSI. The results in Table 15 show the positive effect of this small reduction of the ranges, confirming the utility of this approach in detecting and adapting to changes in market behavior. The supplementary analysis of the presence or absence of the days of the ECB and FED meetings shows that, for this case, their absence has no advantage over the whole period.
Table 15 – Results from 2002 to 2013 from Multi DHMM and RSI using 0, 15 and 20 days steps
Year | Normal | W/out FED | W/out ECB | W/out FED+ECB
2002 | -18 | -301 | -103 | -386
2003 | -229 | -295 | -326 | -392
2004 | 240 | 137 | 577 | 474
2005 | 1481 | 1479 | 1286 | 1284
2006 | -228 | -28 | -377 | -177
2007 | -1408 | -1475 | -1289 | -1356
2008 | 1135 | 807 | 586 | 258
2009 | -2761 | -2681 | -2883 | -2803
2010 | 3185 | 2463 | 2784 | 2062
2011 | -266 | -446 | 316 | 136
2012 | 825 | 976 | 911 | 1062
2013 | -464 | -456 | 127 | 135
TOTAL | 1492 | 180 | 1609 | 297
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002       -60       -273        -45           -258
2003       483        357       1168           1042
2004      1068       1391        889           1212
2005      1533       1399       1664           1530
2006        98        298         -7            193
2007      -422       -489       -227           -294
2008      1867       1293        956            382
2009       879        959        395            475
2010      1235        789        834            388
2011     -1624      -1668       -748           -792
2012       587        642        429            484
2013      -604       -964       -225           -585
TOTAL     5040       3734       5083           3777
(values in pips)
4.2.3 Case Study IV – MACD indicator
This approach tries to improve the results of the model by adding the trend-following momentum indicator MACD. This indicator offers three possibilities for the combination: using the indicator in its entirety (MACD and divergence), using only the divergence, or using only the MACD values. In each case the same strategy as in the previous case study was used: DHMM 15 until 15 days after the indicator's signal, DHMM 30 for the following 5 days and DHMM 90 until the next signal.
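As a reference for how the MACD values mentioned here are conventionally obtained, a minimal sketch with the common 12/26/9 periods (the thesis does not state its parameters in this passage, so these defaults are assumptions) is:

```python
def ema(values, period):
    """Exponential moving average, seeded with the first value."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd(closes, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) for a close series.

    macd_line  = EMA(fast) - EMA(slow)
    signal_line = EMA(signal) of the macd_line
    histogram   = macd_line - signal_line (crossings hint at trend changes)
    """
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram
```

In an uptrend the fast EMA tracks the price more closely than the slow one, so the MACD line is positive; sign changes of the histogram are the usual trend-change cue.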
Table 16 - Results from 2002 to 2013 from Multi DHMM using MACD and Divergence
The results presented in Table 16 show a large total of 8578 pips when combining Divergence and MACD with the DHMM. However, Table 17, which presents only the combination of DHMM and Divergence, shows a decrease to roughly half of the total achieved in Table 16. Thus, the good result of the first case should be attributed entirely to the MACD.
Table 17 - Results from 2002 to 2013 from Multi DHMM using Divergence
This is confirmed by Table 18, where the combination of DHMM and MACD achieved a profit of 10038 pips, far superior to the two previous cases. This result was due to the large profits between 2008 and 2011 and to reduced losses over the 12 years, ensuring a performance far superior to that of the two previous cases.
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002     -1172      -1177      -1219          -1224
2003       835        767        882            814
2004       606        929        325            648
2005       357        283        522            448
2006      -926       -726       -881           -681
2007      -298       -497       -129           -328
2008      2563       2119       1710           1266
2009      1901       1373       1417            889
2010      1899       1843       1396           1340
2011      1164       1136       1956           1928
2012       927        926       1141           1140
2013       722        566        847            691
TOTAL     8578       7542       7967           6931
(values in pips)
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002     -1146      -1151      -1193          -1198
2003       517        223        642            348
2004       416        739        173            496
2005        37        -71        510            402
2006      -556       -594       -621           -659
2007        86       -139        281             56
2008       883       1745         30            892
2009      1887       1359       1403            875
2010       977        905        420            348
2011       506        726        862           1082
2012       473        566        747            840
2013       644        654        769            779
TOTAL     4724       4962       4023           4261
(values in pips)
Table 18 - Results from 2002 to 2013 from Multi DHMM using MACD
From these results it is possible to conclude that the use of the MACD greatly increases the algorithm's performance. The divergence, used alone or combined with the MACD, despite yielding interesting results, falls short of the results reported by the MACD in isolation. It is even possible to say that adding the divergence to the MACD reduces the total result by 1460 pips over the years.
4.2.4 Case Study V – Combining RSI and MACD with ADX
The trend strength indicator ADX was used to confirm whether a new trend indicated by the RSI or MACD was strong or weak. The objective was to ignore false signals of new trends. Therefore, if the ADX classified the trend indicated by the MACD or RSI as strong, the 15, 30 and 90 day analysis described in the two previous case studies was applied; if the trend was weak, the HMM continued with the 90-day analysis. The poor results in Table 19 show that there is no advantage in using this indicator in conjunction with the others already analyzed. The MACD - ADX result could even be considered acceptable, but it does not reflect the performance of the ADX; rather, it reflects the high performance of the MACD indicator, which had the same influence here as in the previous section's analysis of MACD with Divergence.
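The ADX gating just described can be sketched as follows; the strength threshold of 25 is a common convention for the ADX and an assumption here, since the thesis does not state the value used:

```python
def analysis_window(adx_value, days_since_signal, strong_threshold=25):
    """Gate the RSI/MACD trend signal with the ADX trend strength.

    A weak trend (ADX below the threshold) means the signal is treated
    as false: the 90-day DHMM keeps running. A strong trend triggers
    the staged 15 / 30 / 90 day schedule from the previous case study.
    """
    if adx_value < strong_threshold:
        return 90   # weak trend: ignore the indicator's signal
    if days_since_signal < 15:
        return 15   # strong trend, first stage: fastest adaptation
    if days_since_signal < 20:
        return 30   # strong trend, transition stage
    return 90       # back to the long, noise-resistant window
```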
Table 19 - Results from 2002 to 2013 combining RSI and MACD with ADX
4.2.5 Case Study VI – Combining MACD and RSI
Although the results obtained in the previous case studies are quite satisfactory, it would be very useful to decrease the amount of losses in negative years. To that end, a momentum indicator (RSI) was joined with a trend-following momentum indicator (MACD) so that trend changes would be better identified, allowing more efficient action of the HMM in those moments and reducing the amount of losses.
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002       806        975        859           1028
2003      1575       1621       1158           1204
2004      -848       -847       -641           -640
2005        19       -115        150             16
2006       458        658        649            849
2007      -500       -587       -487           -574
2008      3169       2705       2756           2292
2009      1873       1845       1507           1479
2010      1889       1459       1498           1068
2011      1666       1110       2164           1608
2012      -121        170       -599           -308
2013        52       -106       -153           -311
TOTAL    10038       8888       8861           7711
(values in pips)
Model                     Normal  W/out FED  W/out ECB  W/out FED+ECB
Multi DHMM - MACD - ADX     6668       5410       6141           4883
Multi DHMM - RSI - ADX      3434       1582       2413            561
Table 20 - Results from 2002 to 2013 from Multi DHMM combining MACD and RSI
The result met the objective pursued, reducing the maximum annual loss to -680 pips without reducing the total amount of earnings at the end of the 12 years. Once again, it can be seen that removing the days of the communications made by the Fed or the ECB has a negative impact on the analysis.
4.2.6 Case Study VII – Combining MACD, RSI and mixed training days
Although the above results show an increase in the total amount of pips and a reduction of losses over the years, it was decided to try to reduce the possible annual losses even further, even at the cost of lower positive results in the remaining years. The objective is to keep annual losses below 500 pips. To this end, the models analyzed in the previous case studies are paired and, if their predictions differ, the algorithm does not predict any direction. On such a day no position is opened and all existing positions are closed.
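The pairing rule just described can be sketched in a few lines (the constant names are illustrative):

```python
UP, DOWN, STOP = 1, -1, 0

def paired_forecast(pred_a, pred_b):
    """Combine two sub-model direction forecasts.

    Agreement keeps the shared direction; disagreement yields STOP,
    meaning no new position is opened and existing ones are closed.
    """
    return pred_a if pred_a == pred_b else STOP
```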
Table 21 - Results from 2002 to 2013 combining HMM 15 30 90 and HMM MACD
Table 21 presents the results obtained with the combination of HMM 15, 30 and 90 and HMM MACD. On average, this model had an annual loss of -93.45 pips and an annual profit of 842.91 pips. This combination also showed the best pip total over the 12 years of the three combinations analyzed in this section.
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002       790        751        647            608
2003      1787       1833       1312           1358
2004      -464       -463       -257           -256
2005      1061        907       1158           1004
2006       566        766        425            625
2007      -680       -767       -581           -668
2008      3365       2791       2638           2064
2009      1647       1619       1281           1253
2010      1627       1197       1050            620
2011      -282       -838        300           -256
2012       303        458         53            208
2013       -10       -168        -45           -203
TOTAL     9710       8086       7981           6357
(values in pips)
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002       437        700        499            762
2003      1325       1220       1019            914
2004      -313       -317       -265           -269
2005       127         59        348            280
2006       240        440        394            594
2007      -273       -383       -217           -327
2008      2312       2190       1860           1738
2009      1121       1147        696            722
2010      2175       1599       1761           1185
2011      1373       1013       1871           1511
2012      -442       -133       -670           -361
2013       162         78         53            -31
TOTAL     8244       7613       7349           6718
(values in pips)
Table 22 - Results from 2002 to 2013 combining HMM 15 30 90 and HMM RSI
Table 22 shows the results obtained by combining HMM RSI and HMM 15 30 90. Although this sub-model has the lowest total of the three, its average loss is much lower than that of the previous sub-model: an average annual loss of -49.18 pips and average annual earnings of 544.27 pips. Finally, Table 23 shows the results obtained from the HMM RSI and HMM MACD model. This model has the lowest average loss of the three over the 12 years, with a total of only -39.82 pips, a very small value compared to all the other models developed. Its average gain is 698 pips, the second best of the three models analyzed in this section.
Table 23 - Results from 2002 to 2013 combining HMM MACD and HMM RSI
With this approach the aim of keeping annual losses below 500 pips was reached; naturally, along with the lower losses, the gains also suffered from this change. Nevertheless, the three cases studied had very positive results, specifically the pair HMM 15 30 90 and HMM MACD, which obtained one of the best results of all the simulations.
The junction of the various algorithms ultimately creates a stronger forecast, removing large variations in earnings. Although such variations can be seen as beneficial when positive, that is not the case with large negative swings, which can completely invalidate the algorithm.
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002       -10        154         33            197
2003       757        539        765            547
2004       657        648        652            643
2005      -127       -205        231            153
2006       841        906        847            912
2007       107        -21        157             29
2008       129        728       -391            208
2009       247        273       -102            -76
2010      2066       1669       1685           1288
2011       478        233       1227            982
2012      -404       -289       -442           -327
2013       705        520        681            496
TOTAL     5446       5155       5343           5052
(values in pips)
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002       359        429        393            463
2003      1007        940        904            837
2004       122        118        276            272
2005      -235       -379         33           -111
2006      1059       1124       1102           1167
2007      -120       -225       -113           -218
2008       986       1243        505            762
2009       999        971        709            681
2010      1780       1529       1422           1171
2011       771        330       1520           1079
2012       -83         14       -371           -274
2013       595        336        475            216
TOTAL     7240       6430       6855           6045
(values in pips)
Figure 27 - Performance of each developed model from 2002 to 2013
4.3 Multi DHMM Automation
This section analyzes the result of the final adaptation of the Multi DHMM. This final adaptation had as its main objective the substantial reduction of losses in each year. For this purpose, the approaches analyzed in the previous case studies that demonstrated the best performance were introduced into the model. A feature carried over from the model analyzed in the previous case study is the addition of a STOP signal, which indicates that there is uncertainty in the forecast and that it is therefore better to discard the current prediction. Tables 24 and 25 show the positive impact that this new signal had on the final results. When all possible outcomes of the model are considered, the percentage of correct predictions reaches 52%; if the estimates that generated the STOP signal are discarded, since in those moments there were neither losses nor gains, the percentage rises to 57%, a considerable difference of 5 percentage points that has its impact on the final results.
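The two percentages can be reproduced directly from the day counts in Tables 24 and 25 (function name illustrative):

```python
def hit_rates(true_days, false_days, stop_days):
    """Prediction accuracy with and without the STOP days.

    STOP days produced neither gains nor losses, so they can fairly
    be excluded from the denominator of the second rate.
    """
    total = true_days + false_days + stop_days
    with_stop = 100.0 * true_days / total
    without_stop = 100.0 * true_days / (true_days + false_days)
    return with_stop, without_stop

# Day counts from Tables 24 and 25: 1624 correct, 1210 wrong, 283 STOP.
w, wo = hit_rates(1624, 1210, 283)   # rounds to 52% and 57%
```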
Table 24 - Resulting percentages from the final model
Table 25 - Resulting percentages from the final model without stop signal
Table 26 shows the complete elimination of years with a negative pip total. The year 2007, which appeared negative in the previous case studies, recovered to a gain of 217 pips. Although this gain can be considered small, it is also the lowest value obtained over the 12 years analyzed. Furthermore, 2008 and 2010 are well above the average annual gain of 2196 pips.
(Figure 27: line chart of the EUR/USD close values against the cumulative pip totals of HMM 15 30 90 + HMM MACD, HMM 15 30 90 + HMM RSI and HMM MACD + HMM RSI, 2002 to 2013.)
              False   True   Stop   Total
Num of Days    1210   1624    283    3117
Percentage      39%    52%     9%    100%

              False   True   Total
Num of Days    1210   1624    2834
Percentage      43%    57%    100%
The analysis without the days of the Fed and ECB press conferences suggests once again that, although these are days of instability in the market, their inclusion brings more benefits than one might think at first, since a degradation of the results is noticed when those days are removed.
Table 26 - Results from 2002 to 2013 from the Final Model
Over the 12 years analyzed, the EUR/USD often changed behavior. Between 2002 and 2013 one can find relatively stable periods, sharp falls, such as the 2008 financial crisis and the subsequent instability, and periods of large oscillation. From the graph it can be seen that the developed method detects and quickly adapts to new market trends, as in the rapid detection of the 2008 financial crisis, where the current approach had its highest profit. These results suggest that the developed method is well prepared for fluctuations or different market trends that may arise in the future. Thus, the results show the Multi DHMM to be a profitable method, readily adaptable even in unpredictable market conditions.
Figure 28 - Final model resulting performance from 2002 to 2013
The first analysis of the DHMM demonstrated that this model is an excellent algorithm to forecast the direction of Forex values. In the tests conducted, the discrete version was more effective than the continuous version.
Year    Normal  W/out FED  W/out ECB  W/out FED+ECB
2002      1076        934        997            855
2003      1180       1121       1419           1360
2004      2481       2208       2132           1859
2005      2050       1896       2053           1899
2006       754        927        981           1154
2007       217        385        194            362
2008      5690       5058       5027           4395
2009      2923       2458       2717           2252
2010      4567       3761       4000           3194
2011      2249       1898       2736           2385
2012      2398       2269       2123           1994
2013       764        774        639            649
TOTAL    26349      23689      25018          22358
(values in pips)
(Figure 28: line chart of the EUR/USD close value against the cumulative pip summation of the final model, 2001 to 2013.)
The HMM itself has limitations in adapting to new patterns and trends. The speed of adjustment depends on the size of the window on which the DHMM is trained, i.e., the smaller the window, the more sensitive the algorithm will be to variations and, as a result, the more vulnerable it will be to noise.
To help overcome these difficulties, three indicators (RSI, MACD and ADX) were used, together with the combination of DHMM 15, DHMM 30 and DHMM 90. Results showed that the RSI and MACD indicators could provide relevant information on changes in market behavior. With the information provided by these indicators, the adaptation of the algorithm to a new trend becomes much faster. The use of the ADX to confirm the existence of a trend turned out to be of no use, since the obtained results failed to achieve the intended objective.
The following objective focused on limiting the annual losses to -500 pips. The objective was achieved by combining previously tested cases: HMM 15 30 90 with HMM MACD, HMM 15 30 90 with HMM RSI and, lastly, the pair HMM MACD with HMM RSI. The best cases were then selected for incorporation into the automated method, these being:
DHMM MACD
DHMM MACD RSI
DHMM RSI and DHMM MACD
DHMM 15 30 90 and DHMM RSI
DHMM 15 30 90 and DHMM MACD
Having analyzed the automated method, we observed that the aim of reducing annual losses was largely achieved: this new approach registered 12 years of steady profits. In addition, the inclusion of a STOP signal greatly expanded the capacity to contain unnecessary losses and to remain readily adaptable even in unpredictable market conditions. Table 27 describes the results of the underlying methods present in the Multi DHMM, together with the result of the final method.
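The confirmation mechanism of the final method (a direction is kept only when the majority of the sub-models agree, otherwise a STOP) can be sketched as a simple majority vote; the function is illustrative, not the thesis code:

```python
def multi_dhmm_forecast(predictions):
    """Final Multi DHMM direction from a list of sub-model votes.

    Each vote is +1 (rise), -1 (drop) or 0 (the sub-model's own STOP).
    A direction is emitted only with a strict majority; otherwise the
    forecast is discarded (STOP: no position opened, positions closed).
    """
    ups = predictions.count(1)
    downs = predictions.count(-1)
    if ups > len(predictions) / 2:
        return 1
    if downs > len(predictions) / 2:
        return -1
    return 0  # no majority: uncertainty, discard the forecast
```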
Algorithm                   Total (Pip)
HMM MACD                          10038
HMM MACD RSI                       9710
HMM RSI + HMM MACD                 7240
HMM 15 30 90 + HMM RSI             5446
HMM 15 30 90 + HMM MACD            8244
Multi DHMM                        26349
Table 27 - Summary of the results from sub-models and Final Model
The secondary analysis on the inclusion of the press conference days of the ECB and Fed concluded that, although they represent days of greater uncertainty, there is no advantage in removing those days from the algorithm's analysis, since a degradation of the results is noticed when they are removed.
4.4 Conclusions
Throughout this chapter we analyzed the different case studies related to the strategies chosen during the development of the model. The viability of the DHMM was tested and its performance compared with that of the CHMM. Subsequently, all the sub-models were tested until the final model was reached. The results show that the inclusion of sub-models in the developed model led to much higher gains compared to the individual gains of each sub-model: the final model has a gain of 26349 pips over 12 years (from 2002 to 2013), a value 2.6 times higher than the best result obtained from the separate analysis of the sub-models for the same period. This is due to the requirement that the estimate be confirmed by the majority of the sub-models. The adaptation of the model to new trends is also confirmed by its performance over the chosen years, since between 2002 and 2013 the EUR/USD shows diversified behaviors and, despite this diversity, the model is capable of adapting to each of those cases.
CHAPTER 5 CONCLUSIONS AND FUTURE WORK
The work developed throughout this dissertation presents a new methodology for the analysis and forecast of the direction of the Forex market daily closing price. For this purpose, the discrete version of the HMM (DHMM) was used, due to the transformation of the data into three discrete values representing the increase, decrease and maintenance of the price relative to the previous day.
One of the great innovations of this work is the use of three DHMMs simultaneously for the prediction of the direction of the market closing price. Each of these three DHMMs is trained on a different window size, allowing each one to acquire a different sensitivity to fluctuations in market behavior. The addition of technical indicators to the model to indicate changes in market trends made it possible to exploit the specific characteristics of each DHMM, making adaptation to different market behaviors much more rapid and efficient. With the use of the technical indicators and the three DHMMs, sub-models were developed that showed different characteristics and results. The best of them were later combined, creating a supermodel able to adapt and respond to the demands of the Forex market.
Tests were conducted using data from the Forex EUR/USD pair between January 2002 and December 2013. The sum of "price interest points" (pips) was the primary metric used for the analysis of the results because it gives a direct perception of profit.
5.1 Conclusions
After developing the model and analyzing the results, it is easy to conclude that the strategy used to forecast the direction of the daily closing value of the FX EUR/USD pair presents itself as an excellent choice. The ease with which this strategy adapts to new trends and behaviors of the market, and the quality of its predictions, not only allowed faster adaptation but also a substantial reduction of the losses that the DHMM alone had. These features were made possible by the merger of technical indicators (RSI and MACD), already widely used in technical analysis of financial markets, with the three-DHMM implementation strategy. The results show that the inclusion of sub-models in the developed model led to much higher gains compared to the individual gains of each sub-model: the final model has a gain of 26349 pips over 12 years (from 2002 to 2013), a value 2.6 times higher than the best result obtained from the separate analysis of the sub-models for the same period. This is due to the requirement that the estimate be confirmed by the majority of the sub-models. The adaptation of the model to new trends is also confirmed by its performance over the chosen years, since between 2002 and 2013 the EUR/USD shows diversified behaviors and, despite this diversity, the model is capable of adapting to each of those cases.
5.2 Future Work
A major problem identified during the development of the Multi-DHMM was the delay in adapting to new market trends; although mitigated, this problem did not cease to exist. Therefore, one of the key points to be developed in future work is the reduction of the time delay in the detection of new trends in financial time series. Beyond this point, other changes that would be interesting to develop in future work were also identified:
The changing trends and market behaviors, and the delay with which the model adapts to these new circumstances, create losses that, although minimized, are still recorded. One possible solution to this problem could involve the use of wavelets for a time-frequency analysis of the financial time series. It would be desirable to divide the time series into sets of frequencies and use a single model for each set. The different frequencies could be associated with different market behaviors, making it possible to train each model for different behaviors and market trends.
In this model, the most expected forecast is used as the final result. This type of prediction does not take into account the performance of each sub-model at every moment. It would therefore be interesting to study the use of an assessment metric to understand which sub-models have been showing better performance in the latest forecasts, and to predict taking this analysis into account.
Throughout the analysis of financial time series there may be times when the use of three DHMMs is insufficient or excessive, so it would be interesting to examine how to adapt the number of DHMMs to the needs of the analysis, moving to a dynamic number of models dependent on market behavior.
For the training of the DHMM there are alternative methods that could have been used, so it would be interesting to study the impact on the Multi-DHMM of using an alternative to the Baum-Welch algorithm for model training.
One of the methods used to obtain predictions from a Hidden Markov Model is the use of likelihood models. These models search past data for the most likely matching moment and predict the next value from the instant that follows the identified one. It would be interesting to study the impact of this type of prediction and compare its results with those of the Viterbi algorithm used in the model developed for this thesis.
References
[1] M. R. King and D. Rime, “The $4 trillion question: what explains FX growth since the 2007
survey?,” pp. 27-39, December 2010.
[2] K. G. Kulkarni and G. A. Kulkarni, “Fundamental Analysis vs. Technical Analysis: A Choice of Sectoral Analysis,” International Journal of Engineering & Management Sciences, vol. 4, no. 2, p. 234, Apr. 2013.
[3] M. McDonald, “FOREX SIMPLIFIED,” in Behind the Scenes of Currency Trading, Marketplace
Books, Aug. 2007, pp. 19-32.
[4] C. J. Neely and P. A. Weller, “Technical Analysis in the Foreign Exchange Market,” Federal
Reserve Bank of St. Louis Research Division, St. Louis, Jan. 2011.
[5] P. S. Froidevaux, “Fundamental Equity Valuation,” in Stock Selection based on Discounted Cash
Flow, Fribourg, University of Fribourg, Faculty of Economics and Social Sciences, 2004, pp. 3-4.
[6] K. Maciejczyk and X. Hu, “Forex Analysis and Money Management,” in Interactive Qualifying
Project, Worcester, Worcester Polytechnic Institute, 2012, pp. 21-35.
[7] S. B. Achelis, “Technical Analysis From A-To-Z,” Vision Books, 2000.
[8] M. D. Sheimo, “Cashing in on the Dow,” in Using Dow Theory to Trade and Determine Trends in Today's Markets, Apr. 1998, p. 87.
[9] W. J. Wilder, “New Concepts in Technical Trading Systems,” Greensboro, Trend Research,
1978.
[10] A. W. Lo, H. Mamaysky and J. Wang, “Foundations of Technical Analysis: Computational
Algorithms, Statistical Inference and Empirical Implementation,” The Journal of Finance, August
2000.
[11] J. Hayden, Trend Determination - a Quick, Accurate, & Effective Methodology.
[12] B. Zhou and J. Hu, “A Dynamic Pattern Recognition Approach Based on Neural Network for
Stock Time-Series,” Graduate School of Information, Production and System, Waseda
University, Fukuoka, 2009.
[13] A. Gupta and B. Dhingra, “Stock Market Prediction Using Hidden Markov Models”.
[14] M. R. Hassan, B. Nath and M. Kirley, “A fusion model of HMM, ANN and GA for stock market
forecasting,” Elsevier Ltd, Melbourne, 2006.
[15] A. Canelas, R. Neves and N. Horta, “A New SAX-GA Methodology Applied to Investment
Strategies Optimization,” in GECCO'12, 2012.
[16] E. Stephan and G. Kiell, “Decision processes in professional investors: Does expertise moderate
judgemental biases,” IAREP/SABE Proceeding, pp. 416-420.
[17] S. N. Neftci, “Naive Trading Rules in Financial Markets and Wiener-Kolmogorov Prediction Theory: A Study of "Technical Analysis",” Journal of Business, vol. 64, 1991.
[18] J. L. Treynor and R. Ferguson, “In Defense of Technical Analysis,” in Annual Meeting of the American Finance Association, Dallas, 1985.
[19] A. Gunasekarage and D. M. Power, “The profitability of moving average trading rules in South
Asian stock markets.,” in Emerging Markets Review 2, 2001, pp. 17-33.
[20] K.-Y. Kwon and J. R. Kish, “Technical trading strategies and return predictability: NYSE,” in
Applied Financial Economics, 2002.
[21] T. T.-L. Chong and W. K. Ng, “Technical analysis and the London stock exchange: Testing the
MACD and RSI rules using the FT30,” in Appl. Econ. Lett., 2008, pp. 1111-1114.
[22] T. T.-L. Chong, W.-K. Ng and V. K.-S. Liew, “Revisiting the Performance of MACD and RSI
Oscillators,” J. Risk Financial Manag. , vol. 7, pp. 1-12, 2014.
[23] G. A. Fink, “Markov Models for Pattern Recognition,” in From Theory to Applications, Dortmund,
Springer, 1998, pp. 61-92.
[24] A. Papoulis, “Brownian Movement and Markov Processes,” in Probability Random Variables and
Stochastic Processes, 2nd ed., New York, McGraw-Hill, 1984, pp. 515-553.
[25] D. Ramage, “Hidden Markov Models Fundamentals,” 2007.
[26] M. R. Hassan and B. Nath, “Stock Market Forecasting Using Hidden Markov Model: A New
Approach,” Computer Society, Melbourne, 2006.
[27] Y. Zhang, “Prediction of Financial Time Series With Hidden Markov Models,” Shandong, 2001.
[28] M. Collins, “The Forward-Backward Algorithm,” Department of Computer Science, Columbia
University, Columbia.
[29] L. J. Rodríguez and I. Torres, “Comparative Study of the Baum-Welch and Viterbi Training
Algorithms Applied to Read and Spontaneous Speech Recognition,” in Pattern Recognition and
Image Analysis, Springer, 2003, pp. 847-857.
[30] J. A. Bilmes, “A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation
for Gaussian Mixture and Hidden Markov Models,” International Computer Science Institute,
Berkeley, 1998.
[31] J. N. Liu and R. W. Kwong, “Automatic extraction and identification of chart patterns towards
financial forecast,” ScienceDirect, Hong Kong, 2006.
[32] X. Ge and P. Smyth, “Deformable Markov Model Templates for Time-Series Pattern Matching,”
Department of Information and Computer Science, University of California, Irvine, 2000.
[33] A. X. Huang and J. M, “Hidden Markov Models for speech recognition,” Edinburgh University
Press., 1990.
[34] R. L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech
Recognition,” Proceedings of the IEEE, vol. 77(2), pp. 257-286, 1989.
[35] G. Appel, Technical Analysis: Power Tools for Active Investors, FT Press, 2005.
[36] J. J. Murphy, “How to Spot Market Trends,” in The Visual Investor, John Wiley and Sons, 2009,
p. 100.
[37] A. Krogh, “An Introduction to Hidden Markov Models for Biological Sequences,” in Computational
Methods in Molecular Biology, Lyngby, Elsevier, 1998, pp. 45-63.
[38] N. Mimouni, G. Lunter and C. Deane, “Hidden Markov Models for Protein Sequence Alignment,”
University of Oxford, Oxford.
[39] M. Stanke and S. Waack, “Gene prediction with a hidden Markov model and a new intron
submodel,” Oxford Journals, vol. 19, no. Bioinformatics, pp. 215-225, 2003.
[40] C. Karlof and D. Wagner, “Hidden Markov Model Cryptanalysis,” Department of Computer Science, University of California, Berkeley.
[41] S. M. Thede and M. P. Harper, “A Second-Order Hidden Markov Model for Part-of-Speech
Tagging,” pp. 175-182.
[42] M. Gales and S. Young, “The Application of Hidden Markov Models in Speech Recognition,”
Foundations and Trends in Signal Processing, vol. 1, pp. 195-304, 2007.
[43] M. Jensen and G. Bennington, “Random Walks and Technical Theories: Some Additional
Evidences,” Journal of Finance 25 , vol. 2, pp. 469-482, 1970.
APPENDIX 1 - GRAPHS USED TO ASSESS THE HMM VIABILITY
(Eight value-versus-time graphs, 1.txt through 8.txt, of the test series used to assess the HMM viability.)
APPENDIX 2 – HIDDEN MARKOV MODEL TUTORIAL
Markov Models
Consider the discrete stochastic sequence of random variables X1, X2, …, Xt, which take on values xt from a continuous or discrete domain according to individual probability distributions. The process is said to be stationary if the probability distribution is the same for all random variables Xt, and causal if the distribution of the random variable Xt depends only on the past states x1, x2, …, xt−1. For a discrete, stationary and causal stochastic process the probability distribution can be written as:
𝑃(𝑋𝑡 = 𝑥𝑡|𝑋1 = 𝑥1, 𝑋2 = 𝑥2, … , 𝑋𝑡−1 = 𝑥𝑡−1)
Due to the causality of the stochastic process, as the sequence of random variables evolves over time the number of random variables grows considerably and, therefore, an arbitrarily long set of dependencies for the probability distribution can be generated.
This growth of dependencies is restricted by the Markov property. This property states that the conditional probability distribution depends only on the last state for the prediction of the present state; it also means that the future does not depend on the more distant past, i.e., the process is memoryless.
A first order Markov process is a stationary and causal stochastic process that satisfies the Markov
property and can be written as:
𝑃(𝑋𝑡 = 𝑥𝑡|𝑋1 = 𝑥1, 𝑋2 = 𝑥2, … , 𝑋𝑡−1 = 𝑥𝑡−1) = 𝑃(𝑋𝑡 = 𝑥𝑡|𝑋𝑡−1 = 𝑥𝑡−1)
Markov Chains
A Markov chain is a collection of random variables {Xt} having the property that, given the present, the future is conditionally independent of the past; in other words, (24) defines a Markov chain, where the possible values of Xi form a countable set S called the state space of the chain.
For example, considering the Forex market, where we are interested in the movement of the GBP/EUR pair, and representing the nodes 0, 1 and 2 as the possible states {drop, no change, rise}, one can construct the Markov chain in Figure 12.
Example of a Markov Chain
According to Figure 12, a drop in price is followed by another drop 38% of the time, by a rise 60% of the time and by no change 2% of the time. The transition matrix for this example is:

𝐴 = [0.38  0.02  0.60
     0.42  0.05  0.58
     0.60  0.06  0.34]
To predict which price movement is more likely to happen at time t+3, given that at time t the system is in state 0 (drop), the distribution over states can be rewritten as a stochastic row vector 𝑥 with the relation 𝑥(𝑡+1) = 𝑥(𝑡)𝐴:

𝑥(𝑡+3) = 𝑥(𝑡+2)𝐴 = (𝑥(𝑡+1)𝐴)𝐴 = 𝑥(𝑡+1)𝐴² = (𝑥(𝑡)𝐴²)𝐴 = 𝑥(𝑡)𝐴³

Using (27) with x(t) equal to the initial vector 𝜋, one can predict that it is more likely for the price to rise (be in state 2) at time t+3:

𝑥(𝑡+3) = [1 0 0] 𝐴³ = [0.480  0.039  0.484]
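The propagation x(t+3) = x(t)A³ can be checked numerically with a few lines of plain Python (no external libraries):

```python
def row_times_matrix(row, A):
    """Multiply a stochastic row vector by a square matrix: row @ A."""
    n = len(A)
    return [sum(row[i] * A[i][j] for i in range(n)) for j in range(n)]

# Transition matrix from the example (states: 0 drop, 1 no change, 2 rise).
A = [[0.38, 0.02, 0.60],
     [0.42, 0.05, 0.58],
     [0.60, 0.06, 0.34]]

x = [1.0, 0.0, 0.0]           # start in state 0 (drop) at time t
for _ in range(3):            # apply A three times: x(t+3) = x(t) A^3
    x = row_times_matrix(x, A)

print([round(v, 3) for v in x])   # -> [0.48, 0.039, 0.484]
```

The largest entry is the one for state 2, confirming that a rise is the most likely movement three steps ahead.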
From the same model we can compute the probability of a given sequence, e.g., the probability of
seeing "drop, drop, rise, no change". For this example the state sequence is defined as X = {0, 0, 2, 1}
and, given the initial distribution 𝜋 and the transition matrix A, the probability of the sequence is [3], [6]:
$$P(X|A,\pi) = P(0,0,2,1|A,\pi) = \pi_{X_1}\prod_{t=1}^{T-1} a_{X_t,X_{t+1}} = P(0)\,P(0|0)\,P(2|0)\,P(1|2) = \pi_0 \times a_{00} \times a_{02} \times a_{21}$$
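The sequence probability above can be sketched in the same way (NumPy assumed; the start distribution $\pi = [1, 0, 0]$ follows the earlier example of starting in a drop):

```python
import numpy as np

A = np.array([[0.38, 0.02, 0.60],
              [0.42, 0.05, 0.58],
              [0.60, 0.06, 0.34]])
pi = np.array([1.0, 0.0, 0.0])  # start in state 0 (drop), as in the example

X = [0, 0, 2, 1]  # drop, drop, rise, no change

# P(X | A, pi) = pi[X_1] * prod_t a[X_t, X_t+1]
p = pi[X[0]]
for t in range(len(X) - 1):
    p *= A[X[t], X[t + 1]]
print(p)  # pi_0 * a00 * a02 * a21
```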
In Markov chains the state is directly visible to the observer. In the financial market, however, despite
the previous example, there are important factors hidden from the observable data that play an
important role in predicting market behaviour (e.g., volatility, overseas markets, monetary
policies).
Introduction to the Hidden Markov Model
The Hidden Markov Model (HMM) is a model capable of overcoming the limitations of the Markov
chain. In this new model the state is not directly visible; only the observation generated by the
probabilistic function associated with the state is visible. The process can be represented by the Bayesian
network shown in Figure 13 [5].
Figure 13: Bayesian network of a Hidden Markov Model
In an HMM the state transitions are described within a finite and discrete state space, while the
observations can be either discrete or continuous. As a Markov model, the Hidden Markov Model satisfies
the Markov property: each state at a given time t depends only on its immediate predecessor at time
t-1 and, since at every time t a new observation (also known as emission) is generated that depends
only on the state at time t, both can be characterized as (29) and (30).
$$P(S_t \mid S_1, S_2, \ldots, S_{t-1}) = P(S_t \mid S_{t-1}) \quad (29)$$
$$P(O_t \mid O_1, O_2, \ldots, O_{t-1}, S_1, S_2, \ldots, S_t) = P(O_t \mid S_t) \quad (30)$$
A first-order Hidden Markov Model (usually denoted λ) can be completely characterized by the
following [1], [4]:
- a finite set of states;
- a finite (discrete) or infinite (continuous) set of observations;
- a state transition probability matrix A:
$$A = \{a_{ij} \mid a_{ij} = P(S_t = j \mid S_{t-1} = i)\}$$
- a vector π of start probabilities:
$$\pi = \{\pi_i \mid \pi_i = P(S_1 = i)\}$$
- an observation emission probability distribution that characterizes each state:
$$\{b_j(o_k) \mid b_j(o_k) = P(O_t = o_k \mid S_t = j)\}$$
If the observation set is finite, each observation has a symbolic nature and the quantity $b_j(o_k)$
represents a discrete probability distribution, which can be described as an emission probability matrix
(34). If the observation set is an infinite set of vector-valued quantities $x \in \mathbb{R}^n$, the observations are
described by a continuous probability density function (35).
$$B = \{b_j(o_k) \mid b_j(o_k) = P(O_t = o_k \mid S_t = j)\} \quad (34)$$
$$b_j(x) = p(x \mid S_t = j) \quad (35)$$
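The discrete characterization λ = (A, B, π) can be written down directly as a minimal sketch; all numerical values below are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

# Hypothetical discrete HMM lambda = (A, B, pi): 2 hidden states,
# 3 observation symbols. All numbers are illustrative only.
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])          # a_ij = P(S_t = j | S_t-1 = i)
B = np.array([[0.5, 0.1, 0.4],
              [0.2, 0.2, 0.6]])     # b_j(o_k) = P(O_t = o_k | S_t = j)
pi = np.array([0.6, 0.4])           # pi_i = P(S_1 = i)

# Sanity checks: every row must be a probability distribution
assert np.allclose(A.sum(axis=1), 1)
assert np.allclose(B.sum(axis=1), 1)
assert np.isclose(pi.sum(), 1)
```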
Three fundamental questions in the HMM
Taking the characterization of the HMM, to understand how the model can be used to analyse and predict a
certain temporal pattern it is important to answer three fundamental questions [1], [4], [8], [9]:
Evaluation
Given the model 𝜆 = (𝐴, 𝐵, 𝜋), how well does the model describe the statistical properties of
certain data, i.e., what is 𝑃(𝑂|𝜆)?
This question, also known as filtering, can also be interpreted as the need to compute the probability of
a state at a certain time, given the history of evidence.
To answer it one can use a purely probabilistic analysis (36) to assess the production
probability 𝑃(𝑂|𝜆); however, this strategy is inefficient, requiring a number of operations
exponential in the sequence length, $N^T$ [9].
$$P(O|\lambda) = \sum_{s} P(O|s,\lambda)P(s|\lambda) \quad (36)$$
where
$$P(O|s,\lambda)P(s|\lambda) = \prod_{t=1}^{T} a_{s_{t-1},s_t}\, b_{s_t}(O_t)$$
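The exponential cost of this brute-force sum becomes visible when written out: the loop below enumerates all $N^T$ state sequences for a toy model (NumPy assumed; model values are illustrative, not from the thesis):

```python
import numpy as np
from itertools import product

# Toy discrete HMM (illustrative values)
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
pi = np.array([0.6, 0.4])
O = [0, 1, 1, 0]  # observation symbol indices

N, T = len(pi), len(O)

# Brute force: sum P(O, s | lambda) over all N^T state sequences
p = 0.0
for s in product(range(N), repeat=T):
    q = pi[s[0]] * B[s[0], O[0]]
    for t in range(1, T):
        q *= A[s[t - 1], s[t]] * B[s[t], O[t]]
    p += q
print(p)  # P(O | lambda)
```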
Forward Algorithm
Given a certain model λ in state j at time t, it is irrelevant to know which path and generated
outputs led to that state: due to the Markov property, it is sufficient to consider all possible
states at t-1.
The forward algorithm presents itself as a solution to the previous problem; by exploiting this
irrelevance of past paths, it drastically decreases the complexity, making it linear in T.
To compute the production probability, the forward algorithm uses the variable 𝛼𝑡(𝑖), known as the forward
variable, defined as the probability of ending in state 𝑠𝑖 given the observation sequence 𝑂1, 𝑂2, … , 𝑂𝑡:
$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t, s_t = i \mid \lambda)$$
The algorithm is described as follows:
Initialization
$$\alpha_1(i) = \pi_i b_i(O_1)$$
Recursion
$$\alpha_{t+1}(j) = b_j(O_{t+1}) \sum_{i=1}^{N} \alpha_t(i)\,a_{ij}, \quad \text{for } t = 1, \ldots, T-1$$
Termination
$$P(O|\lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
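The three steps above can be sketched as a short Python function (NumPy assumed; the two-state model values are illustrative, not from the thesis):

```python
import numpy as np

def forward(A, B, pi, O):
    """Forward algorithm: returns P(O | lambda) in O(T * N^2) time."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                         # initialization
    for t in range(T - 1):
        alpha[t + 1] = B[:, O[t + 1]] * (alpha[t] @ A) # recursion
    return alpha[T - 1].sum()                          # termination

# Illustrative toy model
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
pi = np.array([0.6, 0.4])
print(forward(A, B, pi, [0, 1, 1]))
```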
Complementary to the forward algorithm, a similar process can be used to take future observations into
account. This process is called smoothing and uses the backward algorithm.
Backward Algorithm
Like the forward algorithm, the backward algorithm has its own quantity, referred to as the backward
variable. It represents the probability of the ending sequence $O_{t+1}, O_{t+2}, \ldots, O_T$ given the model 𝜆 and the
state $s_j$ at time t:
$$\beta_t(j) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid s_t = j, \lambda)$$
The algorithm is described as follows:
Initialization
$$\beta_T(i) = 1$$
Recursion
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\,b_j(O_{t+1})\,\beta_{t+1}(j), \quad \text{for } t = T-1, \ldots, 1$$
Termination
$$P(O|\lambda) = \sum_{i=1}^{N} \pi_i b_i(O_1)\beta_1(i)$$
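The backward recursion can be sketched analogously (NumPy assumed; the toy model values are illustrative); its termination value agrees with the forward computation of P(O|λ):

```python
import numpy as np

def backward(A, B, pi, O):
    """Backward algorithm: returns P(O | lambda) via the beta recursion."""
    N, T = len(pi), len(O)
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                                   # initialization
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])    # recursion
    return (pi * B[:, O[0]] * beta[0]).sum()            # termination

# Illustrative toy model
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
pi = np.array([0.6, 0.4])
print(backward(A, B, pi, [0, 1, 1]))
```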
Decoding
What is the most probable state sequence that, for a given model 𝜆 = (𝐴, 𝐵, 𝜋), generates the
observation sequence?
This question is solved by the Viterbi algorithm, which finds the most likely state sequence by
dynamic programming in a process assumed to be a finite-state, discrete-time Markov
process.
Viterbi Algorithm
Like the forward and backward algorithms, the Viterbi algorithm also has a variable, denoted
$\delta_t(i)$. It represents the maximum likelihood over all partial state sequences that generate the
observation segment $O_1, O_2, \ldots, O_t$ and end in state $s_i$:
$$\delta_t(i) = \max_{s_1, s_2, \ldots, s_{t-1}} P(O_1, O_2, \ldots, O_t, s_1, s_2, \ldots, s_{t-1}, s_t = i \mid \lambda)$$
The variable $\delta_t(i)$ can be compared with the forward variable $\alpha_t(i)$, except that the Viterbi algorithm
uses a maximization instead of a summation over previous states.
Optimal path using Viterbi Algorithm
The Viterbi algorithm is described as follows:
Initialization:
For all states $i \in [1, N]$ at $t = 1$ we have:
$$\delta_1(i) = \pi_i b_i(O_1)$$
$$\psi_1(i) = 0$$
Recursion:
For all times $t$, $1 \le t \le T-1$, and all states $i, j \in [1, N]$ we have:
$$\delta_{t+1}(j) = \max_i\{\delta_t(i)\,a_{ij}\}\,b_j(O_{t+1})$$
$$\psi_{t+1}(j) = \arg\max_i\{\delta_t(i)\,a_{ij}\}$$
Termination:
For all states $j \in [1, N]$ at $t = T$ we have:
$$P^*(O|\lambda) = P(O, s^* \mid \lambda) = \max_i \delta_T(i)$$
$$s_T^* = \arg\max_j \delta_T(j)$$
Optimal Path
Back-tracking for all times $t$, $T-1 \ge t \ge 1$, we have:
$$s_t^* = \psi_{t+1}(s_{t+1}^*) \quad (53)$$
In the previous description of the Viterbi algorithm one can observe a new variable $\psi_t(j)$, known as the
backward pointer, which for each $\delta_t(i)$ stores the optimal predecessor state. The optimal path can be
recursively reconstructed in inverse chronological order using (53).
The complexity of the Viterbi algorithm is $O(T \times |S|^2)$.
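The initialization, recursion, termination and back-tracking steps can be sketched together in Python (NumPy assumed; the toy model values are illustrative, not from the thesis):

```python
import numpy as np

def viterbi(A, B, pi, O):
    """Viterbi decoding: most likely state path and its probability,
    complexity O(T * N^2)."""
    N, T = len(pi), len(O)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                         # initialization
    for t in range(T - 1):
        scores = delta[t][:, None] * A                 # delta_t(i) * a_ij
        psi[t + 1] = scores.argmax(axis=0)             # backward pointers
        delta[t + 1] = scores.max(axis=0) * B[:, O[t + 1]]
    # Termination and back-tracking
    path = [int(delta[T - 1].argmax())]
    for t in range(T - 2, -1, -1):
        path.append(int(psi[t + 1][path[-1]]))
    return delta[T - 1].max(), path[::-1]

# Illustrative toy model
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
pi = np.array([0.6, 0.4])
p_star, path = viterbi(A, B, pi, [0, 1, 1])
print(p_star, path)
```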
Parameter Estimation
How can we estimate the model parameters given an observation set?
To answer this last question it is important to use an algorithm capable of finding the unknown
parameters 𝜆 = {𝜋, 𝐴, 𝐵} of an HMM. Although several algorithms can address the question, due to the
type of data on which it will be used, the algorithm must not require a carefully chosen model
initialization. The chosen algorithm is Baum-Welch, which uses the EM algorithm to find the
maximum likelihood estimate of 𝜆 = {𝜋, 𝐴, 𝐵} given the observation sequence $O_1, O_2, \ldots, O_T$, using the
production probability 𝑃(𝑂|𝜆) as the optimization criterion.
The Viterbi training algorithm is another possible answer to the parameter estimation question;
however, Viterbi training requires a reasonable initialization, makes limited use of the training data
and is less robust, since it only uses observations inside the segments corresponding to a given HMM
state to re-estimate the parameters of that state [1], [6], [11], [12].
Baum-Welch Algorithm
The Baum-Welch algorithm starts with the Forward-Backward algorithm. Since this algorithm
corresponds to an aggregation of the forward and backward algorithms explained before, no detailed
explanation will be given.
Before starting the iterative process, the transition matrix A, the emission matrix B and the initial vector 𝜋
of 𝜆 are set with random initial conditions.
The Baum-Welch algorithm is described as follows:
Forward Procedure:
Having $\alpha_t(i) = P(O_1, O_2, \ldots, O_t, s_t = i \mid \lambda)$, the probability of ending in state 𝑠𝑖 given the observation
sequence $O_1, O_2, \ldots, O_t$ is recursively computed:
1. $\alpha_1(i) = \pi_i b_i(O_1)$
2. $\alpha_{t+1}(j) = b_j(O_{t+1}) \sum_{i=1}^{N} \alpha_t(i)\,a_{ij}$
Backward Procedure:
Having $\beta_t(i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid s_t = i, \lambda)$, the probability of the ending sequence $O_{t+1}, O_{t+2}, \ldots, O_T$ given
the model 𝜆 and the state 𝑠𝑖 at time t is recursively computed:
1. $\beta_T(i) = 1$
2. $\beta_t(i) = \sum_{j=1}^{N} \beta_{t+1}(j)\,a_{ij}\,b_j(O_{t+1})$
Optimization:
It is now possible to compute the temporary variables:
$$\gamma_t(i) = P(s_t = i \mid O, \lambda) = \frac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\beta_t(j)}$$
The quantity $\gamma_t(i)$ represents the probability of being in state 𝑠𝑖 at time t given the observation
sequence O and the parameters of 𝜆.
$$\xi_t(i,j) = P(s_t = i, s_{t+1} = j \mid O, \lambda) = \frac{\alpha_t(i)\,a_{ij}\,b_j(O_{t+1})\,\beta_{t+1}(j)}{\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_t(i)\,a_{ij}\,b_j(O_{t+1})\,\beta_{t+1}(j)}$$
The quantity $\xi_t(i,j)$ represents the probability of being in states i and j at times t and t+1, respectively,
given the observation sequence O and the parameters of 𝜆.
With these two quantities it is now possible to update the model, determining the
expected quantities $\hat{\lambda} = \{\hat{\pi}, \hat{A}, \hat{B}\}$.
Update of the model $\hat{\lambda}$:
$$\hat{\pi}_i = \gamma_1(i)$$
$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$
$$\hat{b}_j(k) = \frac{\sum_{t=1,\,O_t = o_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$
Termination:
If the quality measure $P(O|\hat{\lambda})$ was not improved by the updated model $\hat{\lambda}$ compared to 𝜆, the process
stops; otherwise all steps are repeated.
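One full Baum-Welch re-estimation step for a discrete HMM can be sketched compactly as follows (NumPy assumed; the toy model and observation sequence are illustrative, not from the thesis). Running two steps also illustrates the termination criterion: the EM update cannot decrease the likelihood.

```python
import numpy as np

def baum_welch_step(A, B, pi, O):
    """One Baum-Welch (EM) re-estimation step for a discrete HMM."""
    N, T = len(pi), len(O)
    # Forward and backward passes
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(T - 1):
        alpha[t + 1] = B[:, O[t + 1]] * (alpha[t] @ A)
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    likelihood = alpha[T - 1].sum()
    # Temporary quantities gamma_t(i) and xi_t(i, j)
    gamma = alpha * beta / likelihood
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * A *
                 (B[:, O[t + 1]] * beta[t + 1])[None, :]) / likelihood
    # Update pi, A and B
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        mask = np.array(O) == k
        B_new[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return A_new, B_new, pi_new, likelihood

# Illustrative toy model and observations
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
pi = np.array([0.6, 0.4])
O = [0, 1, 1, 0, 1]
A1, B1, pi1, ll0 = baum_welch_step(A, B, pi, O)
_, _, _, ll1 = baum_welch_step(A1, B1, pi1, O)
assert ll1 >= ll0 - 1e-12  # EM guarantees a non-decreasing likelihood
```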
Flowchart of the Baum-Welch algorithm for a discrete HMM
The previous description of the Baum-Welch algorithm only takes into account a discrete hidden
Markov model. In a continuous hidden Markov model the function $b_j(o_t)$ is a continuous
probability density function (pdf) or a mixture of continuous pdfs; therefore, the procedure is slightly
different.
The Continuous Hidden Markov Model
To represent a continuous sequence or vector-valued quantities, the emission probability function $b_j(o_t)$
can no longer be described as a simple matrix of probabilities, but rather as a continuous probability
function or a defined number of continuous pdfs, such as the Gaussian mixture in (56), in which
M is the number of mixtures, $w_{jk}$ is the weight of mixture k in state j and $N(o_t \mid \mu_{jk}, C_{jk})$ is the
Gaussian density, also known as the Normal distribution.
$$b_j(o_t) = \sum_{k=1}^{M} w_{jk}\,N(o_t \mid \mu_{jk}, C_{jk}) \quad (56)$$
Gaussian Densities
Depending on whether the observations are scalars or vectors, the type of Gaussian
density differs. In the scalar case a univariate Gaussian density is used:
$$g_{jk}(o_t) = N(o_t \mid \mu_{jk}, \sigma_{jk}^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(o_t - \mu)^2}{2\sigma^2}\right)$$
However, if $o_t$ is a vector, a multivariate Gaussian density is used in the form:
$$g_{jk}(o_t) = N(o_t \mid \mu_{jk}, C_{jk}) = \frac{1}{(2\pi)^{D/2}\,|C_{jk}|^{1/2}}\exp\left(-\frac{1}{2}(o_t - \mu_{jk})^T C_{jk}^{-1} (o_t - \mu_{jk})\right)$$
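The multivariate density can be sketched directly from the formula (NumPy assumed); for D = 1 it reduces to the univariate expression, which serves as a consistency check:

```python
import numpy as np

def multivariate_gaussian(o, mu, C):
    """Multivariate Gaussian density N(o | mu, C), as in the emission model."""
    D = len(mu)
    diff = o - mu
    norm = 1.0 / ((2 * np.pi) ** (D / 2) * np.linalg.det(C) ** 0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(C) @ diff)

# For D = 1 this reduces to 1/(sqrt(2 pi) sigma) exp(-(o - mu)^2 / (2 sigma^2))
sigma = 2.0
p_multi = multivariate_gaussian(np.array([1.0]), np.array([0.0]),
                                np.array([[sigma ** 2]]))
p_uni = 1 / (np.sqrt(2 * np.pi) * sigma) * np.exp(-1.0 ** 2 / (2 * sigma ** 2))
print(p_multi, p_uni)
```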
Baum-Welch Algorithm
Given the new emission probability function, the Baum-Welch algorithm needs to consider the mean and
covariance values of the Gaussian distribution. Therefore, a new quantity $\gamma_t(i,j)$ is defined, which
represents the probability that, in state i, the j-th mixture component generates the
continuous observation $o_t$ at time t.
Similar to the discrete parameter estimation, before starting the iterative process the model
parameters are set with random initial conditions. However, the continuous model has a different
emission function, characterized by Gaussian densities; therefore, three new quantities are initialized in
addition to the transition matrix A and the initial vector 𝜋: the covariances 𝐶, the means 𝜇
and the Gaussian mixture weights 𝑤. 𝐶 is a matrix of covariance matrices, 𝜇 is a
matrix of mean vectors of the vector-valued quantities, and 𝑤 is a
states-by-mixtures matrix that stores the weight of each Gaussian mixture component in each state. Taking this
into account, the parameter estimation for continuous observations using the Baum-Welch algorithm is
described as follows:
Forward Procedure:
Having $\alpha_t(i) = p(o_1, o_2, \ldots, o_t, s_t = i \mid \lambda)$, the probability of ending in state 𝑠𝑖 given the observation
sequence of vectors $o_1, o_2, \ldots, o_t$ is recursively computed:
1. $\alpha_1(i) = \pi_i b_i(o_1)$
2. $\alpha_{t+1}(j) = b_j(o_{t+1}) \sum_{i=1}^{N} \alpha_t(i)\,a_{ij}$
Backward Procedure:
Having $\beta_t(i) = p(o_{t+1}, o_{t+2}, \ldots, o_T \mid s_t = i, \lambda)$, the probability of the ending sequence of vectors $o_{t+1}, o_{t+2}, \ldots, o_T$
given the model 𝜆 and the state 𝑠𝑖 at time t is recursively computed:
1. $\beta_T(i) = 1$
2. $\beta_t(i) = \sum_{j=1}^{N} \beta_{t+1}(j)\,a_{ij}\,b_j(o_{t+1})$
Optimization:
$$\gamma_t(i) = P(s_t = i \mid O, \lambda) = \frac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\beta_t(j)}$$
$$\xi_t(i,j) = P(s_t = i, s_{t+1} = j \mid O, \lambda) = \frac{\alpha_t(i)\,a_{ij}\,b_j(o_{t+1})\,\beta_{t+1}(j)}{\sum_{k=1}^{N}\sum_{l=1}^{N} \alpha_t(k)\,a_{kl}\,b_l(o_{t+1})\,\beta_{t+1}(l)}$$
$$\gamma_t(i,j) = \gamma_t(i)\,\frac{w_{ij}\,g_{ij}(o_t)}{b_i(o_t)}$$
With these three quantities it is now possible to update the model, determining the
expected quantities $\hat{\pi}, \hat{A}, \hat{w}, \hat{\mu}, \hat{C}$:
Update:
$$\hat{\pi}_i = \gamma_1(i)$$
$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$
$$\hat{w}_{ij} = \frac{\sum_{t=1}^{T} \gamma_t(i,j)}{\sum_{t=1}^{T} \gamma_t(i)}$$
$$\hat{\mu}_{ij} = \frac{\sum_{t=1}^{T} \gamma_t(i,j)\,o_t}{\sum_{t=1}^{T} \gamma_t(i,j)}$$
$$\hat{C}_{ij} = \frac{\sum_{t=1}^{T} \gamma_t(i,j)\,(o_t - \hat{\mu}_{ij})(o_t - \hat{\mu}_{ij})^T}{\sum_{t=1}^{T} \gamma_t(i,j)}$$
Termination:
If the quality measure $P(o|\hat{\lambda})$ was not improved by the updated model $\hat{\lambda}$ compared to 𝜆, the process
stops; otherwise all steps are repeated.
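The mixture-parameter updates for w, μ and C can be sketched in NumPy. The responsibilities γ_t(i,j) below are random placeholders standing in for the E-step output; observations and dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, T, D = 2, 3, 50, 2          # states, mixtures, time steps, dimensions
o = rng.normal(size=(T, D))       # toy continuous observations

# gamma_ij[t, i, j]: responsibility of mixture j of state i for o_t
# (would come from the E-step); random positive values for illustration
gamma_ij = rng.random(size=(T, N, M))
gamma_i = gamma_ij.sum(axis=2)    # gamma_t(i) = sum_j gamma_t(i, j)

# Re-estimation of the mixture weights, means and covariances
w = gamma_ij.sum(axis=0) / gamma_i.sum(axis=0)[:, None]
mu = np.einsum('tij,td->ijd', gamma_ij, o) / gamma_ij.sum(axis=0)[..., None]
C = np.zeros((N, M, D, D))
for i in range(N):
    for j in range(M):
        diff = o - mu[i, j]                            # (T, D)
        C[i, j] = (gamma_ij[:, i, j, None, None] *
                   diff[:, :, None] * diff[:, None, :]).sum(axis=0)
        C[i, j] /= gamma_ij[:, i, j].sum()

assert np.allclose(w.sum(axis=1), 1)  # weights of each state sum to one
```

Note that the weight normalization falls out of the update formula itself: since $\gamma_t(i) = \sum_j \gamma_t(i,j)$, the weights of each state necessarily sum to one.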
Flowchart of the Baum-Welch algorithm for a continuous HMM