Robustly Forecasting the Bucharest Stock
Professor Vasile GEORGESCU, PhD
Department of Mathematical Economics
Faculty of Economics and Business Administration
University of Craiova
Email: v_geo@yahoo.com
ROBUSTLY FORECASTING THE BUCHAREST STOCK
EXCHANGE BET INDEX THROUGH A NOVEL
COMPUTATIONAL INTELLIGENCE APPROACH
Abstract. In this paper two computational intelligence approaches are contrasted: the common approach, based on one-value-ahead neural network forecasting methods, and a novel approach, based on a mix of computational intelligence techniques (noise filtering with wavelets, fuzzy clustering, neural mapping of fuzzy transitions between cluster prototypes, and robust prediction) for one-subsequence-ahead forecasting of stock market indices. The first approach serves to demonstrate that emerging markets are deeply affected by global influences such as external shocks or signals, and that, at least with neural network models, the inclusion of exogenous variables from well-established global markets significantly improves the forecasting performance of the emerging market model. However, one-value-ahead forecasting of price levels is not as useful as the shape of middle-term up and down movements, due to their inherent short-term randomness. The second approach proposes a novel one-subsequence-ahead forecasting framework that allows the prediction of stock index movements in a more robust way, focusing on predicting one price subsequence rather than one price level at a time.
Keywords: Computational intelligence, Subsequence time series fuzzy clustering, Neural mapping, One-subsequence-ahead forecasting of time series.
JEL Classification: C22, C45, C51, C53, C63, G17
1. INTRODUCTION
The vision of looking at computational economics from the perspective of Computational Intelligence (CI) arises essentially from acknowledging the legacy of Herbert Simon to economics, and thus primarily tries to face the challenge of modeling intelligent behaviors. As opposed to the human-neutral dynamics in physics, economic dynamics are deeply and inherently induced by either full or at least bounded human rationality. Basically, the idea behind CI is to model the
intelligence observed in natural behavior (neural sciences, linguistic behavior, biology, adaptive ecologic systems, immune systems, and so on). CI is particularly suitable for modeling and forecasting complex nonlinear and time-varying financial processes, where tractability and robustness must be achieved in spite of many difficult problems: the lack of an a priori specification of the model's structure, high noise levels, non-stationarities induced by structural changes over time, fluctuations and shocks, nonlinear effects of either the underlying dynamics or complex human behavior, and so on. Exploiting the potential of computational intelligence techniques is at the core of this paper, which particularly focuses on the prediction of the future change in a stock market index based on information available at the time of the prediction.
The mathematical characterization of stock market movements has been a
subject of intense interest. In principle, stock trading can be profitable if the direction of price movement can be predicted consistently. However, the prediction of financial markets is a very complex task, because financial time series are inherently noisy, non-stationary, and deterministically chaotic (i.e., short-term random but long-term deterministic). Within traditional financial economics, most believe that not only financial crises, but also daily price movements, are simply unpredictable. This conviction is based upon Eugene Fama's efficient-market hypothesis (EMH) and the related random-walk hypothesis, which state, respectively, that market prices contain all information about possible future movements and that the movement of financial prices is random and practically unpredictable. As a consequence, investors' reactions should be random and should follow a normal distribution pattern, so that the net effect on market prices cannot be reliably exploited to make an abnormal profit, especially when considering transaction costs. Benoît Mandelbrot first observed that stock price variations follow complex dynamics where periods characterized by near random-walk movements are occasionally disrupted by large movements (i.e., crashes). Such turbulent events are much more common than would be predicted by a normal distribution. Although the conventional assumption has been that stock markets behave according to a random Gaussian distribution, statistical evidence proves this assumption incorrect. On the contrary, it suggests that stock market prices follow an inverse cubic power law. An empirical confirmation of such an assumption is provided by many financial indices, including the Bucharest Stock Exchange BET Index (see Figure 3). This led to the conclusion that the nature of market movements is generally much better explained using nonlinear dynamics and concepts of chaos theory.
In the last few decades, both the theoretical advances in behavioral finance and the empirical analyses have consistently found problems with the efficient-market hypothesis. It has become controversial because substantial inefficiencies were observed in the market (e.g., stocks with low price to earnings, cash-flow or book value outperform other stocks), leading investors to purchase overpriced growth stocks rather than value stocks. Speculative bubbles (anomalies in markets driven by buyers operating on irrational exuberance) are yet another contradiction of EMH. Despite the erratic fluctuations in stock prices in the short term, non-random-walk and serial correlation evidence shows that the true value will in the long run be reflected in the stock price. This means that the problem of stock market predictability is difficult in its very nature, but not completely intractable.
As a response to real-world complexity, more and more sophisticated
techniques have been proposed in an attempt to increase the predictability of financial instruments, including a wide range of computational intelligence techniques. Among them, feedforward and recurrent neural networks (NNs) gained increasing popularity. They, however, did not deliver outstanding prediction accuracy (only slightly outperforming the 50% benchmark random-walk accuracy rate), partly because of the tremendous noise and non-stationary characteristics in stock market data. Little evidence of predictability is commonly shown when out-of-sample forecasts are considered. On the other hand, the presence of short-term randomness suggests that larger profits can be consistently generated if long-term movements in the stock price are accurately predicted rather than short-term movements. Unfortunately, most of the proposed models focused on the accurate forecasting of the levels (i.e., values) of the underlying stock index (e.g., the next day's closing price forecast). Actually, the absolute value of a stock price is usually not as interesting as the shape of up and down movements (direction of change).
As an alternative to the one-value-ahead forecasting framework, this paper proposes a novel one-subsequence-ahead forecasting approach, which focuses on the predictability of the direction of stock index movement. It is based upon computational intelligence techniques and consists of four stages. We start with the preprocessing stage, which consists of normalizing and de-noising the time series by wavelet decomposition. A non-overlapping subsequence time series clustering procedure, with a sliding window and a lower bound of the Dynamic Time Warping distance, is then addressed when applying the Fuzzy C-Means algorithm. Afterwards, the subsequence time series fuzzy transition function is learned by neural mapping, which consists of deriving, for each subsequence time series, the degrees to which it belongs to the c cluster prototypes, when the p·c membership degrees of the previous p subsequences are presented as inputs to the neural network. Finally, this fuzzy transition function is applied to forecasting the one-subsequence-ahead time series, as a weighted mean of the c cluster prototypes to which it belongs, and the BET index data are used for testing.
In what follows, we will contrast the two computational intelligence based approaches.
2. ONE-VALUE-AHEAD FORECASTING OF BUCHAREST
STOCK EXCHANGE BET INDEX, BASED ON NEURAL
NETWORK AR/ARX MODELS
2.1. Nonlinear Neural Network ARX (NNARX) Architecture
The most wide-spread feedforward NN that has been proved to be a universal function approximator (Hornik, Stinchcombe, & White, 1989) is the
multilayer perceptron (MLP), with hidden units having sigmoidal transfer functions. The class of MLP networks considered here is furthermore confined to those having one hidden layer. Hyperbolic tangent activation functions are usually preferred for hidden nodes and linear activation functions for output nodes. This architecture allows the MLP to approximate any computable function on a compact set arbitrarily closely by

    y_i^{MLP}(z, w, v) = F_i\Big( \sum_{j=1}^{q} v_{ij}\, \varphi_j\Big( \sum_{l=1}^{m} w_{jl} z_l + w_{j0} \Big) + v_{i0} \Big), \quad i = 1, \dots, n    (1)

where \varphi_j is a sigmoidal function, F_i is a linear function, q is the number of hidden units, v_{ij} and w_{jl} are weights, and v_{i0} and w_{j0} are biases (thresholds). MLPs offer a straightforward extension to the classical way of modeling
time series. Namely, they can use a specific mechanism to deal with temporal information (layer delay without feedback, or a time window) and can thus extend the linear autoregressive model with exogenous variables (ARX) to the nonlinear ARX form:

    y_t = F^{MLP}\big( y_{t-1}, \dots, y_{t-n_a}, X_{t-n_k}, \dots, X_{t-n_k-n_b+1} \big) + e_t    (2)

where F^{MLP} is a non-linear function, na is the number of past outputs, nb is the number of past inputs and nk is the time delay.
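As a minimal sketch, the one-hidden-layer MLP of Eq. (1) — tanh hidden units, linear outputs — can be written as follows (a NumPy illustration of ours, not the authors' original implementation):

```python
import numpy as np

def mlp_forward(z, W, w0, V, v0):
    """One-hidden-layer MLP of Eq. (1): tanh hidden units, linear outputs.

    z  : (m,)   input vector
    W  : (q, m) hidden-layer weights w_jl
    w0 : (q,)   hidden-layer biases  w_j0
    V  : (n, q) output-layer weights v_ij
    v0 : (n,)   output-layer biases  v_i0
    """
    h = np.tanh(W @ z + w0)   # hidden activations phi_j(.)
    return V @ h + v0         # linear output functions F_i(.)

# Tiny example: 3 inputs, 4 hidden units, 1 output
rng = np.random.default_rng(0)
y_hat = mlp_forward(rng.normal(size=3),
                    rng.normal(size=(4, 3)), rng.normal(size=4),
                    rng.normal(size=(1, 4)), rng.normal(size=1))
```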
Nonlinear neural network ARX (NNARX) models are potentially more powerful than linear ones, in that they can model more complex underlying characteristics of time series and theoretically do not have to assume stationarity.
Feedforward networks are well suited only for NNARX models, which allow a series-parallel architecture that has a predictor without feedback. In a NNARX model, y_t is a function of its lagged values y_{t-j} and the lagged values of some exogenous variables. In principle, the output of the NNARX network can be considered as an estimate ŷ_t of the output y_t of some nonlinear dynamic system, and thus it should be fed back in the next stage to the input of the feedforward neural network. However, because the true previous outputs y_{t-j} are available at time t during the training of the network, a series-parallel architecture can be created, in which the true outputs y_{t-j} are used instead of feeding back the estimated outputs ŷ_{t-j}, as shown in Figure 1. This has two advantages. The first is that the input to the feedforward network is more accurate. The second is that the resulting network has a purely feedforward architecture, and static backpropagation can be used for training.
For other types of models used in time series processing that involve predictors with feedback, one can resort to recurrent networks, where future network inputs will depend on present and past network outputs.
Figure 1. A purely feedforward architecture of the NNARX(na, nb, nk) neural net
2.2. NNARX based One-Value-Ahead Forecasting of BET Index
One of the major issues in neural network forecasting is how much data are necessary for neural networks to capture the dynamic nature of the underlying process in a time series. There are two facets to this issue:
(i) How many lagged observations should be used as inputs to the neural network (or, equivalently, how many input nodes should the neural network have)? Each actual historical value depends upon a number of preceding values (endogenous and exogenous lagged observations).
(ii) How many historical values should be used in training the neural network? Although a larger sample size, in the form of a longer time series, is usually recommended in model development, empirical results suggest that longer time series do not always yield models that provide the best forecasting performance. Using a smaller sample of the time series, or data close in time to the out-of-sample period, can sometimes produce more accurate neural networks.
Here, we focus our effort on forecasting the Bucharest Stock Exchange BET index. Some of its characteristics, such as the synchronization with well-established indices from global markets (Dow Jones, FTSE100-London, Nikkei-Tokyo), provide us with further guidance for choosing the sample size. The BET index has been introduced relatively recently, and its first period of about three years is characterized by a relatively flat evolution and a lack of synchronization with major indices. After that, it starts to synchronize well with the global market (Figure 2).
On the other hand, a commonly employed neural network design heuristic is to capture the dynamics of a stock market index through a time series model, which represents the movement of an endogenous stochastic variable only in terms of its lagged values. The nonlinear neural network autoregression (NNAR) model is a typical example. As an alternative, a NNARX model may be considered, where one or more exogenous variables are also included. Their role is to capture extraneous influences. Indeed, the ever more global economy causes interaction effects among the various economies around the world. From this perspective, market indices are classified as either local and emerging, or global and mature. Large global markets will have dynamically changing effects on an emerging market. Results indicate that, as global information is introduced, the forecasting performance of the neural network models for the emerging market index improves. Thus, especially with emerging markets, neural network models must incorporate global economic information into the input variable set to achieve optimal performance when forecasting financial time series. All indices in Figure 2 reveal non-stationary patterns. This is confirmed by the empirical distribution of the BET index, which is asymmetric and exhibits a power-law pattern rather than a Gaussian pattern (see Figure 3).
Figure 2. Parallel evolution of the BET, Dow Jones and FTSE100 Indices. Choosing training and validation data samples
Figure 3. The empirical distribution of the BET index, compared with the standard normal distribution
We have tested a large set of model specification structures and their related neural network architectures in order to choose and validate the best settings for our neural-network approach to BET index forecasting. We started with the simplest NNAR(na) model, without exogenous variables, and then we successively introduced one or two exogenous variables (Dow Jones and FTSE100, respectively), thus passing from a purely time series model (NNAR) to a dynamic system representation, i.e., NNARX(na, nb, nk).
It is worth mentioning that the well-known order specification tests, i.e., AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion) and MDL (Minimum Description Length criterion), may produce inconsistent results when used for selecting the appropriate orders na, nb and nk of a nonlinear model such as NNARX(na, nb, nk). However, their indicative guess for the case of linear models can be the starting point in a specification procedure based on numeric experiments. For example, the orders can be chosen such that Akaike's Information Criterion (AIC)

    AIC = \log\big( V\,(1 + 2d/N) \big)

is minimized, where V is the loss function, i.e.,

    V = \det\Big( \frac{1}{N} \sum_{t=1}^{N} \varepsilon(t, \theta_N)\, \varepsilon(t, \theta_N)^{T} \Big),

d is the length of the parameter vector \theta, and N is the number of data points used for the estimation. Akaike's Information Criterion suggests the following linear AR/ARX models:
    AR(1):                 y_{t+1} = a_1 y_t + e_t
    ARX(1, 4, 3):          y_{t+1} = a_1 y_t + b_1 X^1_{t-3} + b_2 X^1_{t-4} + b_3 X^1_{t-5} + b_4 X^1_{t-6} + e_t
    ARX(1, [3, 2], [1, 4]): y_{t+1} = a_1 y_t + b_{11} X^1_{t-1} + b_{12} X^1_{t-2} + b_{13} X^1_{t-3} + b_{21} X^2_{t-4} + b_{22} X^2_{t-5} + e_t

where y stands for BET, X^1 for DJ and X^2 for FTSE100.
For the nonlinear models NNAR(na) and NNARX(na, nb, nk), we chose to determine the orders by numerical experiments, trying orders in the range [1, 5].
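As an illustration of the order-selection procedure, the AIC above can be sketched for a least-squares AR(na) fit in the scalar-output case, where the determinant in V reduces to the mean squared residual (all names are ours):

```python
import numpy as np

def aic(residuals, d):
    """AIC = log(V * (1 + 2d/N)) for a scalar-output model, with V the
    mean squared residual, d the parameter count and N the sample size."""
    N = len(residuals)
    V = np.mean(np.asarray(residuals) ** 2)
    return np.log(V * (1.0 + 2.0 * d / N))

def fit_ar(y, na):
    """Least-squares AR(na) fit; returns residuals and parameter count."""
    Phi = np.column_stack([y[na - j - 1:len(y) - j - 1] for j in range(na)])
    target = y[na:]
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    return target - Phi @ theta, na

# Pick the AR order in [1, 5] minimizing the AIC on a toy series
y = np.sin(0.3 * np.arange(200)) + 0.01 * np.random.default_rng(1).normal(size=200)
best_na = min(range(1, 6), key=lambda na: aic(*fit_ar(y, na)))
```

The same loop extends to ARX orders by appending the exogenous lags to the regressor matrix.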
Once the model structure was specified and the neural network architecture was selected, the neural network BET index forecasting model was trained over the training data and then applied against the validation data. We adopted the standard approach for training and testing, that is, to evaluate a model by testing its performance on a validation set consisting of out-of-sample data (Figure 2).
Financial time series forecasting neural networks may be evaluated in many different ways. However, it is commonly admitted that the forecasting performance must result from a tradeoff between statistical accuracy and trading-strategy profitability. The problem cannot be reduced to how statistically significant the error measures (used to train a forecasting NN) are; the trading profitability based on that forecast is just as important. In other words, the goal is to improve both predictability and profitability. Commonly, neural network forecasting models are evaluated through the MSE (Mean Square Error). But statistical accuracy is not always a good warranty for profitability. It is to be noticed that even very small errors made in the wrong direction of change may cause significant negative returns, resulting in a capital loss for an investor following the recommendation of the neural network model. Hence, predicting the correct direction of change of an existing market index value is the primary criterion for financial forecasting models. Instead of measuring the mean squared error of a forecast, a better method for measuring the performance of neural networks is to analyze the proportion of predictions that correctly indicate the direction of change of the current market value. Therefore, all of the reported results in this paper will be based on both the MSE and the accuracy, or percentage of correct predictions with regard to direction of change.
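The two evaluation criteria — MSE and the percentage of correctly predicted directions of change — can be sketched as follows (our own minimal implementation, including the random-walk forecast used as benchmark):

```python
import numpy as np

def mse(actual, predicted):
    """Mean Square Error of one-step-ahead forecasts."""
    return float(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))

def direction_accuracy(actual, predicted):
    """Share of forecasts whose predicted change from the previous actual
    value has the same sign as the realized change (trading criterion)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    actual_change = np.sign(np.diff(actual))
    predicted_change = np.sign(predicted[1:] - actual[:-1])
    return float(np.mean(actual_change == predicted_change))

# Random-walk benchmark: tomorrow's forecast is today's price.
prices = np.array([10.0, 10.5, 10.2, 10.8, 11.0])
rw_mse = mse(prices[1:], prices[:-1])
```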
The neural network results will be compared to a standard benchmark, the random walk forecast. The random walk hypothesis assumes that, since tomorrow cannot be predicted, the best guess we can make is that tomorrow's price will be the same as today's price. If neural networks outperform the random walk, then it can be concluded that there is a nonlinear function or process inherent in the data tested. The implication is that a short-term (typically, one-step-ahead) forecast can be successfully generated.
We first assumed the hypothesis of an insulated stock market and constructed a neural network forecasting model that captures only the endogenous dynamics of the BET index, relating its current value only to its lagged historical values. This leads to a purely time series based NNAR (nonlinear neural network AR) model. For each order na, with na from 1 to 5, we trained the NNAR(na) architecture and estimated the model. The forecasting model has been evaluated with both the MSE (Mean Square Error) and the trading performance (percentage of correct predictions with regard to direction of change). The table below shows the results.
na=1 na=2 na=3 na=4 na=5
Performance 0.496855 0.507886 0.493671 0.517460 0.490446
MSE 0.000085 0.000087 0.000088 0.000088 0.000089
Although the statistical accuracy measured by MSE was optimal for the NNAR(1) model, the maximum trading performance has been obtained for the NNAR(4) model. The neural network architecture associated with this model is depicted in Figure 4.
The forecasting performance of this model is about 51.75%, which is close to the random walk performance. Based upon such a modest performance, the hypothesis of possible market inefficiencies appears to be rejected. The model rather supports a random walk interpretation of the BET market index, and thus does not enable a forecasting advantage through non-random-walk or predictable behavior.
Figure 4. The MLP architecture of the NNAR(4) model
However, including additional global knowledge, by introducing into the BET index equation some other well-established market indices such as DJ and/or FTSE100, proved to improve the overall forecasting performance of the BET index neural network model. Thus, modeling the BET index as an NNARX-type nonlinear dynamic system with at least one exogenous variable appears to achieve a statistically significant improvement over the random walk benchmark dynamic behavior and to provide exploitable market inefficiency.
The DJ index was the first exogenous variable we introduced in the model. Now we have to choose between a large number of possible combinations, allowing na, nb and nk to vary in the range [1, 5]. The results are partially shown (only for nk=1 and nk=2) in the following tables.
For nk=1:
Performance nb=1 nb=2 nb=3 nb=4 nb=5
na=1 0.484277 0.552050 0.566456 0.542857 0.554140
na=2 0.476341 0.548896 0.522152 0.526984 0.525478
na=3 0.481013 0.528481 0.506329 0.517460 0.522293
na=4 0.473016 0.517460 0.504762 0.526984 0.535032
na=5 0.461783 0.519108 0.506369 0.509554 0.528662
For nk=2:
Performance nb=1 nb=2 nb=3 nb=4 nb=5
na=1 0.533123 0.547468 0.536508 0.525478 0.495208
na=2 0.517350 0.528481 0.507937 0.531847 0.523962
na=3 0.531646 0.522152 0.495238 0.500000 0.488818
na=4 0.495238 0.520635 0.533333 0.500000 0.463259
na=5 0.490446 0.484076 0.535032 0.519108 0.507987
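The exhaustive search tabulated above amounts to a triple loop over the candidate orders. A sketch follows, where `train_and_score` is a hypothetical stand-in (our naming) for training the NNARX net and returning its directional hit rate on the validation set:

```python
def grid_search(train_and_score, na_range=range(1, 6),
                nb_range=range(1, 6), nk_range=range(1, 6)):
    """Return the (na, nb, nk) combination with the best validation score."""
    best, best_score = None, -float("inf")
    for nk in nk_range:
        for na in na_range:
            for nb in nb_range:
                score = train_and_score(na, nb, nk)
                if score > best_score:
                    best, best_score = (na, nb, nk), score
    return best, best_score
```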
The selected model is NNARX(1, 3, 1), with a forecasting performance that increases from 51.75% to 56.65%. The neural network architecture associated with this model is depicted in Figure 5.
Figure 5. The MLP architecture of the NNARX(1, 3, 1) model
Finally, both the DJ index and the FTSE100 index are included as exogenous variables in the model. The structure specification consists of choosing orders and time delays for which the best forecasting performance is reached in the class of NNARX(na, [nb1, nb2], [nk1, nk2]) models. The results are partially given below.
For nb2=4, nk1=1, nk2=4:
Performance nb1=1 nb1=2 nb1=3 nb1=4 nb1=5
na=1 0.426282 0.483974 0.522436 0.583333 0.557692
na=2 0.442308 0.544872 0.564103 0.512821 0.551282
na=3 0.522436 0.544872 0.580769 0.512821 0.554487
na=4 0.458333 0.528846 0.608974 0.544872 0.535256
na=5 0.487179 0.525641 0.563590 0.548077 0.541667
The selected model is NNARX(4, [3, 4], [1, 4]), which increases the forecasting performance to over 60%. Note that the exogenous variable FTSE100 has a time delay of nk2=4. The neural network architecture associated with this model, and its learning capability, are displayed in Figures 6 and 7.
[Figure: a series-parallel architecture of the NNARX(4, [3, 4], [1, 4]) neural network, with inputs BET(t-1)...BET(t-4), DJ(t-1)...DJ(t-3) and FTSE100(t-4)...FTSE100(t-7), one hidden layer, and output BET_hat(t)]
Figure 6. The MLP architecture of the NNARX (4,[3, 4],[1, 4]) model
Figure 7. Learning capability of the selected neural network architecture
The plot comparing predictions to actual measurements for the validation dataset (out-of-sample predictions) confirms by visual inspection that forecasts are reasonably accurate, but are sometimes out of phase, which means over-anticipation (Figure 8).
Figure 8. Comparing out-of-sample predictions to actual measurements
The prediction errors for the validation dataset are depicted in Figure 9.
Figure 9. Out-of-sample prediction errors
Just as expected, the in-sample forecasts (i.e., forecasts derived from the training dataset) are much more accurate than the out-of-sample ones (Figure 10). However, in-sample data cannot serve for performance validation, because they usually produce overfitted results.
Figure 10. Comparing in-sample (i.e. based on the training dataset)
predictions to actual measurements
In conclusion, the highest forecasting performance was achieved by the NNARX(4, [3, 4], [1, 4]) model, which utilizes input values from both the DJ and the FTSE100 indices (recall that the performance measures the percentage of neural network forecasts that are in the same direction, up or down, as the actual index for the forecast period). With this model, a significant improvement over the random walk benchmark dynamic behavior has been obtained, and exploitable market inefficiency has been proved for the Bucharest Stock Exchange BET index, provided that exogenous variables with global impact are involved in the neural network forecasting model.
3. A NOVEL COMPUTATIONAL INTELLIGENCE BASED
FORECASTING FRAMEWORK
3.1. Time Series Preprocessing
This stage consists of de-noising the data by wavelet decomposition, plus some other transformations that rely heavily on the selection of a distance measure for clustering.
The Discrete Wavelet Transform (DWT, [10]) uses scaled and shifted versions of a mother wavelet function, usually with compact support, to form either an orthonormal basis (Haar wavelet, Daubechies) or a bi-orthonormal basis (Symlets, Coiflets). Wavelets allow cutting up data into different frequency components (called approximations and details), and then studying each component with a resolution matched to its scale. They can help de-noise inherently noisy data, such as financial time series, through the wavelet shrinkage and
thresholding methods developed by David Donoho ([3]). The idea is to set to zero all wavelet coefficients corresponding to details in the data set that are less than a particular threshold. These coefficients are then used in an inverse wavelet transformation to reconstruct the data set. An important advantage is that the de-noising is carried out without smoothing out the sharp structures, and thus can help to increase both the clustering accuracy and the predictive performance.
Care has to be taken in choosing suitable transformations, such that the time series distance measure chosen in the clustering stage is meaningful to the application. Normalization of the data is common practice when using Fuzzy C-Means, which means applying scaling and vertical translation to the time series as a whole. Moreover, as we already mentioned, the absolute value of a stock price is not as interesting as the shape of up and down movements. Thus, to allow stock price comparisons subsequence by subsequence, a local translation is also necessary, in such a way that each subsequence starts from zero. A subset of 2048 daily closing BET index values, drawn from Bucharest Stock Exchange Market data, as well as the normalized and de-noised data, are shown in Figure 11, where a level-5 decomposition with Sym8 wavelets and fixed-form soft thresholding were used.
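The de-noising step just described can be sketched with the PyWavelets package (our illustration, not the authors' code: Sym8 at level 5 with soft thresholding, taking the universal threshold as one common "fixed form" choice):

```python
import numpy as np
import pywt

def wavelet_denoise(y, wavelet="sym8", level=5):
    """De-noise a series by soft-thresholding its detail coefficients."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    # Noise scale estimated from the finest details (median absolute deviation)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(y)))  # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(y)]

def normalize(y):
    """Scaling and vertical translation of the whole series to [0, 1]."""
    y = np.asarray(y, dtype=float)
    return (y - y.min()) / (y.max() - y.min())
```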
[Figure: two panels, "Original time series" and "Normalized and de-noised time series"]
Figure 11. A normalized and de-noised data subset, drawn from the daily closing BET index
3.2. Subsequence Time Series Fuzzy Clustering
The idea in subsequence time series (STS) clustering is as follows. Just a single long time series is given at the start of the clustering process, from which we extract short series with a sliding window. The resulting set of subsequences is then clustered, such that each subsequence is allowed to belong to each cluster to a certain degree, because of the fuzzy nature of the fuzzy c-means algorithm we use. The window width and the time delay between consecutive windows are two key choices. The window width depends on the application; it could be some larger time unit (e.g., 10 days for time series sampled as the daily closing BET stock index, in our application). Overlapping or non-overlapping windows can be used. If the delay is equal to the window width, the problem is essentially converted to non-overlapping subsequence time series clustering. We will follow this approach, being motivated by Keogh's criticism presented in [8], where using overlapping windows has been shown to produce meaningless results, due to a surprising anomaly: cluster centers obtained using STS clustering closely resemble sine waves, irrespective of the nature of the original time series itself, this being caused by the superposition of slightly shifted subsequences. Using larger time delays for placing the windows does not really solve the problem as long as there is some overlap. Also, the less overlap, the more problematic the choice of the offsets becomes.
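The non-overlapping sliding window (delay equal to the window width w), together with the local translation from Section 3.1, can be sketched as follows (a NumPy illustration of ours):

```python
import numpy as np

def extract_subsequences(y, w=10):
    """Split a long series into non-overlapping windows of width w,
    translating each window so that it starts from zero (shape-only
    comparison).  Non-overlapping windows avoid the meaningless-clusters
    anomaly reported for overlapping STS clustering."""
    y = np.asarray(y, dtype=float)
    n_windows = len(y) // w
    subs = y[:n_windows * w].reshape(n_windows, w)
    return subs - subs[:, :1]  # local translation to a zero start

subs = extract_subsequences(np.arange(25.0), w=10)  # 2 windows of width 10
```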
Since clustering relies strongly on a good choice of the dissimilarity
measure, this leads to adopting an appropriate distance, depending on the very nature of the subsequence time series.
Let S = y_m, ..., y_{m+w-1} be a subsequence of length w of the time series Y = y_1, ..., y_n, where 1 <= m <= n - w + 1. Subsequences will be represented as vectors in a w-dimensional vector space. For relatively short time series, shape-based distances, such as the L_p norms, are commonly used to compare their overall appearance. The Euclidean distance (L_2) is the most widely used shape-based distance. Other L_p norms can be used as well, such as the Manhattan (L_1) and Maximum (L_inf) distances, putting different emphasis on large deviations.
There are several pitfalls when using an L_p distance on time series: it does not allow for different baselines in the time sequences; it is very sensitive to phase shifts in time; it does not allow for acceleration and deceleration along the time axis (time warping). Another problem with L_p distances of time series arises when scaling and translation of the amplitudes or the time axis are considered, or when outliers and noisy regions are present.
A number of non-metric distance measures have been defined to overcomesome of these problems. Small distortions of the time axis are commonly addressed
with non-uniform time warping, more precisely with Dynamic Time Warping (DTW, [7]). The DTW distance is an extensively used technique in speech recognition and allows warping of the time axes (acceleration/deceleration of signals along the time dimension) in order to better align the shapes of the two time series. The two series can also be of different lengths. The optimal alignment is found by calculating the shortest warping path in the matrix of distances between all pairs of time points, under several constraints (boundary conditions, continuity, monotonicity).
The warping path is also constrained in a global sense by limiting how farit may stray from the diagonal. The subset of the matrix that the warping path isallowed to visit is called the warping window. The two most common constraintsin the literature are the Sakoe-Chiba band and the Itakura parallelogram. We can
view a global or local constraint as constraining the indices of the warping path w_k = (i, j)_k, such that j - r <= i <= j + r, where r is a term defining the allowed range of warping for a given point in a sequence. In the case of the Sakoe-Chiba band (see Figure 12), r is independent of i; for the Itakura parallelogram, r is a function of i.
Figure 12. (a) Aligning two time sequences using DTW. (b) Optimal warping path with the Sakoe-Chiba band as a global constraint
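A minimal sketch of the banded dynamic program may help fix ideas; it assumes a squared-error local cost and a Sakoe-Chiba band of half-width r, and is not the authors' exact implementation:

```python
import numpy as np

def dtw_band(q, c, r):
    """DTW distance between sequences q and c, restricted to the Sakoe-Chiba band |i - j| <= r."""
    n, m = len(q), len(c)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        # only cells inside the warping window are visited
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            # continuity and monotonicity: extend from the three admissible predecessors
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])
```

Identical sequences yield a distance of zero regardless of the band width, while out-of-phase shapes are matched through off-diagonal cells of the window.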
DTW is a much more robust distance measure for time series than $L_2$, allowing similar shapes to match even if they are out of phase in the time axis. Unfortunately, however, DTW is calculated using dynamic programming with time complexity $O(n^2)$. Recent approaches focus more on approximating the DTW distance by bounding it from below. For example, a novel, linear-time (i.e., with complexity reduced to $O(n)$) lower bound of the DTW distance was proposed in [9]. The intuition behind the approach is the construction of a special envelope around the query. It can be shown that the Euclidean distance between a potential match and the nearest orthogonal point on the envelope lower-bounds the DTW distance. To index this representation, an approximate bounding envelope is created.
Let $Q = \{q_1, \ldots, q_n\}$ and $C = \{c_1, \ldots, c_m\}$ be two subsequences and $w_k = (i, j)_k$ be the warping path, such that $j - r \le i \le j + r$, where $r$ is a term defining the range of warping for a given point in a sequence. The term $r$ can be used to define two new sequences, $L$ and $U$, where $L_i = \min(q_{i-r} : q_{i+r})$ and $U_i = \max(q_{i-r} : q_{i+r})$, with $L$ and $U$ standing for Lower and Upper, respectively. An obvious but important property of $L$ and $U$ is the following: $\forall i,\; U_i \ge q_i \ge L_i$.
Given $L$ and $U$, a lower bounding measure for DTW can now be defined (see Figure 13):

$$\mathrm{LB\text{-}Keogh}(Q, C) = \sqrt{\sum_{i=1}^{n} \begin{cases} (c_i - U_i)^2, & \text{if } c_i > U_i \\ (c_i - L_i)^2, & \text{if } c_i < L_i \\ 0, & \text{otherwise} \end{cases}} \qquad (3)$$
Figure 13. The lower bounding function LB-Keogh(Q, C). The original sequence Q is enclosed in the bounding envelope of U and L
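Equation (3) can be sketched directly from the envelope definitions; the snippet below assumes equal-length Q and C for simplicity and is illustrative rather than the paper's code:

```python
import numpy as np

def envelope(q, r):
    """Lower/Upper envelope of q over a warping range r: L_i = min(q_{i-r}:q_{i+r}), U_i = max(...)."""
    q = np.asarray(q, float)
    n = len(q)
    L = np.array([q[max(0, i - r): i + r + 1].min() for i in range(n)])
    U = np.array([q[max(0, i - r): i + r + 1].max() for i in range(n)])
    return L, U

def lb_keogh(q, c, r):
    """O(n) lower bound of DTW(q, c): only the parts of c outside [L, U] contribute."""
    c = np.asarray(c, float)
    L, U = envelope(q, r)
    above = np.where(c > U, (c - U) ** 2, 0.0)
    below = np.where(c < L, (c - L) ** 2, 0.0)
    return np.sqrt((above + below).sum())
```

Any candidate that stays inside the envelope contributes nothing, which is why the bound can be computed in a single linear pass.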
We are now going to generalize the fuzzy c-means algorithm to subsequence time series clustering. In this particular context, the entities to be clustered, denoted by $x_k$, and the cluster prototypes (centroids), denoted by $v_i$, are both set-defined objects, i.e., subsequence time series. The centroids are computed as weighted means, where the weights, denoted by $u_{ik}$, are the fuzzy membership degrees to which each subsequence belongs to a cluster. Both the DTW and LB-Keogh distances outperform $L_2$ and thus are better qualified for use with the fuzzy c-means algorithm. However, LB-Keogh's lower bound of the DTW distance has been preferred, due to its linear time complexity. Figure 14 plots the cluster centroids (prototypes) and the subsequence time series grouped around each centroid.
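A compact fuzzy c-means sketch over subsequences (rows of X) is given below; the distance is pluggable, so an LB-Keogh-style measure could replace the Euclidean one used here for brevity, and the fuzzifier m = 2 is an assumption, not necessarily the paper's setting:

```python
import numpy as np

def fcm(X, c, m=2.0, iters=50, seed=0):
    """Fuzzy c-means: returns memberships U (c x N) and centroids V (c x d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                                # memberships of each entity sum to 1
    for _ in range(iters):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)    # centroids as weighted means
        # pairwise distances d_ik between centroid i and entity k (Euclidean here)
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        # standard update: u_ik = d_ik^{-2/(m-1)} / sum_j d_jk^{-2/(m-1)}
        p = 2.0 / (m - 1.0)
        U = (D ** -p) / (D ** -p).sum(axis=0)
    return U, V
```

On well-separated data the memberships concentrate near 0/1, recovering the crisp clustering as a special case.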
3.3. Estimation of the Fuzzy Transition Function between Clusters by Neural Mapping

At this stage, a fuzzy transition function between clusters must be learned, which is a nonlinear vector function mapping a number of $c$-dimensional membership degree vectors $u(STS_{t-j+1})$, $j = 1, \ldots, p$, into a $c$-dimensional
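A hypothetical sketch of such a mapping is shown below: p past c-dimensional membership vectors are regressed onto the next membership vector with a small one-hidden-layer network trained by plain-numpy gradient descent. The architecture, learning rate, and training scheme are illustrative assumptions, not the paper's specification:

```python
import numpy as np

def make_pairs(U, p):
    """U: (T, c) sequence of membership vectors; returns lagged inputs X and targets Y."""
    X = np.hstack([U[t: len(U) - p + t] for t in range(p)])  # (T-p, p*c)
    return X, U[p:]

def train_mlp(X, Y, hidden=8, lr=0.1, epochs=5000, seed=0):
    """Fit a tanh-hidden / linear-output network by full-batch MSE gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, Y.shape[1])); b2 = np.zeros(Y.shape[1])
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        P = H @ W2 + b2
        G = 2.0 * (P - Y) / len(X)            # d(MSE)/dP
        W2 -= lr * H.T @ G; b2 -= lr * G.sum(0)
        GH = (G @ W2.T) * (1 - H ** 2)        # backprop through tanh
        W1 -= lr * X.T @ GH; b1 -= lr * GH.sum(0)
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2
```

On a deterministic cyclic membership sequence the network learns the transition almost exactly, which is the behavior Figure 15 illustrates for the real clusters.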
Figure 15. Accurate neural mapping: actual and predicted membership
degrees to which each of the 256 subsequence time series belongs to one of the
5 clusters
Figure 16. Prediction errors and their histogram for each of the 5 clusters
Figure 17. One-subsequence-ahead forecasts of 15 out-of-sample subsequences (observed vs. predicted out-of-sample sequences)
The two forecasting approaches can now be easily contrasted. In contrast to the common neural network forecasting approach, which focuses on price-level forecasts attempting to outperform (more or less) the random walk benchmark accuracy, our computational intelligence approach is intended to reliably exploit the shape of middle-term up and down price movements. The 15 out-of-sample subsequence forecasts shown in Figure 17 cover 15 × 10 = 150 transaction days and prove to be considerably robust in filtering out the short-term randomness and in predicting the right direction of change.
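The direction-of-change claim can be scored with a simple hit rate; the following is an illustrative sketch (not the paper's evaluation code) that compares the net movement of each predicted subsequence, last value versus first, against the observed one:

```python
import numpy as np

def direction_hit_rate(observed, predicted):
    """observed, predicted: (k, h) arrays of k out-of-sample subsequences of horizon h."""
    obs = np.sign(observed[:, -1] - observed[:, 0])    # net up/down move of each observed window
    pred = np.sign(predicted[:, -1] - predicted[:, 0]) # net up/down move of each forecast
    return float((obs == pred).mean())
```

A hit rate well above 0.5 indicates that the subsequence forecasts capture the direction of middle-term movements better than chance.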
6. CONCLUSION
Predicting price levels is an intriguing, challenging, and admittedly risky endeavor. Technical analysis uses trend-following strategies to forecast future price movements and to infer trading decision rules, based on the assertion that price changes have inertia. However, experimental work shows little evidence of predictability, with accuracy rates that only slightly exceed the random walk benchmark performance.
The first approach in this paper served essentially to test and reject the hypothesis of an insulated emerging stock market. We presumed that, among other emerging capital markets, the Bucharest Stock Exchange is tremendously affected by global interaction effects and extraneous influences from mature markets. In an attempt to validate this presumption, we compared the statistical accuracy and the trading performance of several neural network forecasting models for the BET index against each other and against the random walk benchmark performance. NNAR(na) models were used to capture only the endogenous dynamics of the BET index, and NNARX(na, nb, nk) models to additionally capture exogenous influences induced by global market indices (Dow Jones and/or FTSE100). To conclude, significantly more accurate results have been obtained through the inclusion of exogenous variables from well-established global markets.
However, because of stock market prices' short-term randomness, next-day forecasts cannot be efficiently exploited for building consistent trading strategies. The second approach introduced a novel computational intelligence framework allowing one-subsequence-ahead instead of one-value-ahead forecasts. Experimental evidence with ten-day-length subsequence-ahead forecasts for the BET index proved to be significantly more robust than one-day-value-ahead forecasts in showing the right direction of change.
REFERENCES
[1] Chen, S.-H., Jain, L., Tai, C.-C. (Eds.) (2006), Computational Economics: A Perspective from Computational Intelligence. Idea Group;
[2] Chen, S.-H., Wang, P.P., Kuo, T.-W. (Eds.) (2007), Computational Intelligence in Economics and Finance. Springer-Verlag;
[3] Donoho, D. (1993), Nonlinear Wavelet Methods for Recovery of Signals, Densities, and Spectra from Indirect and Noisy Data. In: Different Perspectives on Wavelets, Proceedings of Symposia in Applied Mathematics, Vol. 47, I. Daubechies (ed.). Amer. Math. Soc., Providence, R.I., pp. 173-205;
[4] Georgescu, V. (2009), Generalizations of Fuzzy C-Means Algorithm to Granular Feature Spaces, based on Underlying Fuzzy Metrics: Issues and Related Works. In: 13th IFSA World Congress and 6th Conference of EUSFLAT, pp. 1791-1796, Lisbon, Portugal;
[5] Georgescu, V. (2009), A Time Series Knowledge Mining Framework Exploiting the Synergy between Subsequence Clustering and Predictive Markovian Models. Fuzzy Economic Review, Vol. XIV, No. 1, pp. 41-66;
[6] Georgescu, V. (2005), Applied Econometrics: Time Series Analysis (A master course in English). Universitaria, Craiova;
[7] Keogh, E., Pazzani, M.J. (1999), Scaling up Dynamic Time Warping to Massive Datasets. In: Zytkow, J.M., Rauch, J. (eds), 3rd European Conference on Principles of Data Mining and Knowledge Discovery (PKDD'99), pp. 1-11. Springer;
[8] Keogh, E., Lin, J., Truppel, W. (2003), Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research. In: 3rd IEEE International Conference on Data Mining, pp. 115-122;
[9] Keogh, E., Ratanamahatana, C.A. (2005), Exact Indexing of Dynamic Time Warping. Knowledge and Information Systems, 7, pp. 358-386;
[10] Mallat, S.G., Peyré, G. (2009), A Wavelet Tour of Signal Processing: The Sparse Way. Academic Press, 3rd Edition.
Copyright of Economic Computation & Economic Cybernetics Studies & Research is the property of Economic
Computation & Economic Cybernetics Studies & Research and its content may not be copied or emailed to
multiple sites or posted to a listserv without the copyright holder's express written permission. However, users
may print, download, or email articles for individual use.