International Journal of Neural Systems, Vol. 8, No. 4 (August, 1997) 399–415
Special Issue on Data Mining in Finance
© World Scientific Publishing Company
A CONSTRAINED NEURAL NETWORK KALMAN FILTER FOR PRICE ESTIMATION IN HIGH FREQUENCY FINANCIAL DATA
PETER J. BOLLAND∗ and JEROME T. CONNOR†
London Business School, Department of Decision Science, Sussex Place, Regents Park, London NW1 4SA, UK
In this paper we present a neural network extended Kalman filter for modeling noisy financial time series. The neural network is employed to estimate the nonlinear dynamics of the extended Kalman filter. Conditions for the neural network weight matrix are provided to guarantee the stability of the filter. The extended Kalman filter presented is designed to filter three types of noise commonly observed in financial data: process noise, measurement noise, and arrival noise. The erratic arrival of data (arrival noise) results in the neural network predictions being iterated into the future. Constraining the neural network to have a fixed point at the origin produces better iterated predictions and more stable results. The performance of constrained and unconstrained neural networks within the extended Kalman filter is demonstrated on “Quote” tick data from the $/DM exchange rate (1993–1995).
1. Introduction
The study of financial tick data (trade data) is becoming increasingly important as the financial industry trades on shorter and shorter time scales. Tick data has many problematic features: it is often heavy tailed (Dacorogna 1995, Butlin and Connor 1996), it is prone to data corruption and outliers (Chung 1991), and its variance is heteroscedastic with a seasonal pattern within each day (Dacorogna 1995). However, the most serious problem with applying conventional methodologies to tick data is its erratic arrival. The focus of this study is the prediction of erratic time series with neural networks. The issues of robust prediction and non-stationary variance are explored in Bolland and Connor (1996a) and Bolland and Connor (1996b).
There are three distinct types of noise found in real world time series such as financial tick data:

Process noise represents the shocks that drive the dynamics of the stochastic process. The distribution of the process/system noise is generally assumed to be Gaussian; for financial data the noise distributions can often be heavy tailed.

Measurement noise is the noise encountered when observing and measuring the time series. The measurement error is usually assumed to be Gaussian, but the measurement of financial data is often corrupted by gross outliers.

Arrival noise reflects uncertainty concerning whether an observation will occur at the next time step. Foreign exchange quote data is strongly affected by erratic data arrival: at times quotes are missing for forty seconds, at other times several ticks are contemporaneously aggregated.
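As an illustration, the three noise sources can be sketched with a minimal simulated state-space model. The AR(1) dynamics, the noise standard deviations, and the arrival probability below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
phi = 0.8                  # AR(1) state dynamics (illustrative)
q_std, r_std = 0.1, 0.3    # process / measurement noise std (illustrative)
p_arrive = 0.7             # chance a quote arrives at each step (arrival noise)

x = np.zeros(T)            # latent "true" state
y = np.full(T, np.nan)     # observed quotes; NaN marks no tick arriving
for t in range(1, T):
    x[t] = phi * x[t - 1] + q_std * rng.standard_normal()  # process noise
    if rng.random() < p_arrive:                            # arrival noise
        y[t] = x[t] + r_std * rng.standard_normal()        # measurement noise

print(f"fraction of missing quotes: {np.isnan(y).mean():.0%}")
```

A filter applied to such data must cope with the gaps in `y` as well as the two Gaussian disturbances.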
These three types of noise have been widely studied in the engineering field for the case of a known deterministic system. The Kalman filter was invented to estimate the state vector of a linear deterministic system in the presence of process, measurement, and arrival noise. The Kalman filter has been applied in the field of econometrics for the case …
…neural network specification was used, with a 4 hidden unit feedforward network using sigmoidal activation functions.
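To make the role of such a network inside an extended Kalman filter concrete, the sketch below builds a 4-hidden-unit sigmoidal feedforward net as the state transition function and obtains its Jacobian by finite differences; in the EKF covariance update the Jacobian plays the role of the linearized transition matrix F in P_pred = F P F' + Q. The weights here are random placeholders, not fitted values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_net(n_in, n_hidden=4, seed=0):
    """4-hidden-unit sigmoidal feedforward net mapping the lagged state
    vector to a one-step prediction (placeholder random weights)."""
    rng = np.random.default_rng(seed)
    W1 = 0.5 * rng.standard_normal((n_hidden, n_in))
    b1 = np.zeros(n_hidden)
    W2 = 0.5 * rng.standard_normal(n_hidden)
    return lambda x: W2 @ sigmoid(W1 @ x + b1)

def jacobian(f, x, eps=1e-6):
    """Finite-difference gradient of the scalar-valued transition f at x;
    this row enters the EKF's linearized transition matrix."""
    fx = f(x)
    return np.array([(f(x + eps * e) - fx) / eps for e in np.eye(len(x))])

f = make_net(n_in=2)
x = np.array([0.01, -0.02])
print(f(x), jacobian(f, x))
```

In practice the Jacobian of a one-hidden-layer net can also be written in closed form; finite differences are used here only to keep the sketch short.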
An expectation maximization (EM) algorithm is employed at the center of a robust estimation procedure based on filtered data (for full details see Bolland and Connor 1996a). The EM algorithm, see Dempster, Laird, and Rubin (1977), is the standard approach when estimating model parameters with missing data. The EM algorithm has been used in the neural network community before, see for example Jordan and Jacobs (1992) or Connor, Martin, and Atlas (1994). During the estimation step the missing data, namely the xt, εt, and ηt of (1) and (2), must be estimated. With the estimated missing data assumed to be true, the parameters of the state update function f and the noise variance matrices Qt and Rt are then chosen by maximizing the likelihood. This procedure is iterative, with new parameter estimates giving rise to new estimates of the missing data, which in turn give rise to newer parameter estimates. The iterative estimation procedure was initialized by constructing a contiguous data set (no arrival noise) and estimating a linear auto-regressive model. The variances of the disturbance terms are non-stationary; to remove some of this non-stationarity, the intra-day seasonal pattern of the variances was estimated (Bolland and Connor 1996b). The parameters of the state update function were assumed to be stationary across the length of the data set.
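A highly simplified sketch of such an iteration, for a scalar linear-Gaussian state rather than the paper's neural network model, is given below. The E-step runs a Kalman filter that simply propagates its prediction through missing observations; the M-step refits the dynamics and variances from the filtered means. A full implementation would use smoothed states and their covariances in the M-step:

```python
import numpy as np

def kalman_filter(y, phi, Q, R):
    """Scalar Kalman filter; NaNs in y are missing observations
    (arrival noise) and trigger a prediction-only step."""
    x, P = 0.0, 1.0
    xf = np.empty(len(y))
    for t in range(len(y)):
        x, P = phi * x, phi ** 2 * P + Q          # time update
        if not np.isnan(y[t]):                    # measurement update
            K = P / (P + R)
            x, P = x + K * (y[t] - x), (1 - K) * P
        xf[t] = x
    return xf

def em_step(y, phi, Q, R):
    """One simplified EM iteration: estimate the states, then refit
    phi, Q, R from the filtered means (a sketch of the full M-step)."""
    xf = kalman_filter(y, phi, Q, R)
    phi = (xf[1:] @ xf[:-1]) / (xf[:-1] @ xf[:-1])
    Q = np.mean((xf[1:] - phi * xf[:-1]) ** 2)
    obs = ~np.isnan(y)
    R = np.mean((y[obs] - xf[obs]) ** 2)
    return phi, Q, R

# toy series: AR(1) state, noisy quotes, roughly 30% of ticks missing
rng = np.random.default_rng(1)
x = np.zeros(400)
for t in range(1, 400):
    x[t] = 0.8 * x[t - 1] + 0.1 * rng.standard_normal()
y = x + 0.1 * rng.standard_normal(400)
y[rng.random(400) < 0.3] = np.nan

phi, Q, R = 0.5, 0.05, 0.05
for _ in range(10):
    phi, Q, R = em_step(y, phi, Q, R)
print(f"estimated phi = {phi:.2f}")
```

The alternation between state estimates and parameter estimates mirrors the iterative procedure described above.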
Table 1. Non-iterated forecasts.
Fig. 2. Estimated function.
Fig. 3. Estimated function at origin.
Fig. 4. Filtered tick data.
Table 1 gives the performance of the two models for non-iterated forecasts. The constraints on the network are not detrimental to the overall performance, with the percentage variance explained (r-squared) and the correlation being very similar.
Figure 2 shows the fitted function of a simple NAR(1) model for the constrained neural network and the unconstrained neural network. The qualitative differences in the models' estimated functions are only slight.

Figure 3 shows the estimated function around the origin. At the origin the constrained network has a bias, as it has been restrained from learning the mean of the estimation set. Although this bias is only very
Fig. 5. Stable points of network.
Table 2. Test set performance.
Fig. 6. Iterated forecast error.
small (for linear regression the bias is 5.12 × 10^-7 with a t-statistic of 0.872), its effect is large as it is compounded by iterating the forecast.
The filter produces estimated states (shown in Fig. 4) which can be viewed as the "true" mid-prices; the noise due to market frictions (bid-ask bounce, price quantization, etc.) has been estimated and filtered out. The iterated forecasts reach the stable point after only a small number of iterations (approx. 5).
Figure 5 shows a close-up of the iterated forecasts of the two networks. The value of the stable point in the case of the simple NAR(1) is the final gradient of the iterated forecasts. The stable point of the constrained neural network is zero, while the stable point of the unconstrained network is 1.57 × 10^-6, the result of a small bias in the model. When the forecast is iterated this bias accumulates, and therefore the unconstrained network's predictions trend. For the constrained network the iterated forecasts soon reach the stable point of zero, reflecting our prior belief that the series is unpredictable in the long term. The mean squared error (MSE) as well as the median absolute deviation (MAD) of the constrained and unconstrained networks are given in Table 2 and shown in Fig. 6. As the forecast is iterated, the MSE for the unconstrained network grows rapidly, due to its trending forecast.
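The compounding effect can be reproduced with a toy calculation: iterating a contraction with a small bias at the origin drives the predicted increment to a nonzero stable point of roughly bias/(1 − slope). The slope below is an illustrative assumption; only the bias value 5.12 × 10^-7 is taken from the text:

```python
# iterate f(x) = a * x + b: a small bias b shifts the fixed point away
# from the origin, so iterated increment forecasts trend
a = 0.5          # illustrative slope of the fitted map near the origin
b = 5.12e-7      # the small bias reported in the text
x = 0.0
for _ in range(40):
    x = a * x + b
fixed_point = b / (1 - a)
print(x, fixed_point)   # the iterates settle at b / (1 - a), not at zero
```

Even a bias that is statistically insignificant per step thus produces a persistent drift once forecasts are iterated.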
It is clear that the performance is improved by constraining the neural network. The MSE for the constrained neural network remains relatively constant as the forecast is iterated. The accuracy of an iterated prediction should decrease as the forecast is iterated, yet from Fig. 6 it is clear that the MSE is not increasing with the number of iterations. However, forecasts are only iterated for 40 time steps in periods of very low trading activity, and in such periods the variance of the time series is also low. The errors in these periods are therefore small even though the time between observations can be large.
6. Conclusion and Discussion
Using neural networks within an extended Kalman filter is desirable because of the measurement and arrival noise associated with foreign exchange tick data. The desirability of using a stable system within a Kalman filter was used as an analogy for developing a "stable neural network" for use within an extended Kalman filter. The "stable neural network" was obtained by constraining the neural network to have a fixed point at the origin, so that zero input gives zero output. In addition, the fixed point at zero reflected our belief that price increments beyond a certain horizon are unknowable and that a predicted price increment of zero is best (random walk). This constrained neural network is optimized for foreign exchange modeling; for other problems a constrained neural network with a fixed point at zero could be undesirable.

The behavior of the neural network within the extended Kalman filter under normal operating conditions is roughly the same as that of the unconstrained neural network. But in the presence of missing data, the iterated predictions of the constrained neural network far outperformed those of the unconstrained neural network in both quality and performance metrics.
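One straightforward way to realize the zero-input/zero-output constraint, shown here as a sketch rather than the authors' actual parameterization, is to subtract the network's response at the origin, which forces the network output at zero input to be exactly zero:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def constrain_at_origin(W1, b1, W2, b2):
    """Wrap a one-hidden-layer net so that its response at the origin is
    subtracted, guaranteeing zero input -> zero output."""
    def raw(x):
        return W2 @ sigmoid(W1 @ x + b1) + b2
    offset = raw(np.zeros(W1.shape[1]))   # network response at the origin
    return lambda x: raw(x) - offset

rng = np.random.default_rng(0)
g = constrain_at_origin(rng.standard_normal((4, 1)), rng.standard_normal(4),
                        rng.standard_normal(4), rng.standard_normal())
print(g(np.zeros(1)))   # exactly 0.0 at the origin
```

Alternative parameterizations (e.g. removing the biases outright) would achieve the same fixed point; the subtraction form leaves the hidden-layer weights unconstrained.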
References
Y. S. Abu-Mostafa 1990, "Learning from hints in neural networks," J. Complexity 6, 192–198.
Y. S. Abu-Mostafa 1993, "A method for learning from hints," in Advances in Neural Information Processing Systems 5, ed. S. J. Hanson (Morgan Kaufmann), pp. 73–80.
Y. S. Abu-Mostafa 1995, "Financial applications of learning from hints," in Advances in Neural Information Processing Systems 7, eds. G. Tesauro, D. S. Touretzky and T. Leen (Morgan Kaufmann), pp. 411–418.
H. Akaike 1975, "Markovian representation of stochastic processes by stochastic variables," SIAM J. Control 13, 162–173.
M. Aoki 1987, State Space Modeling of Time Series, Second Edition (Springer-Verlag).
J. S. Baras, A. Bensoussan and M. R. James 1988, "Dynamic observers as asymptotic limits of recursive filters: Special cases," SIAM J. Appl. Math. 48, 1147–1158.
E. K. Blum and X. Wang 1992, "Stability of fixed points and periodic orbits and bifurcations in analog neural networks," Neural Networks 5, 577–587.
P. J. Bolland and J. T. Connor 1996a, "A robust non-linear multivariate Kalman filter for arbitrage identification in high frequency data," in Neural Networks in Financial Engineering (Proc. NNCM-95), eds. A.-P. N. Refenes, Y. Abu-Mostafa, J. Moody and A. S. Weigend (World Scientific, Singapore), pp. 122–135.
P. J. Bolland and J. T. Connor 1996b, "Estimation of intra day seasonal variances," Technical Report, London Business School.
S. J. Butlin and J. T. Connor 1996, "Forecasting foreign exchange rates: Bayesian model comparison using Gaussian and Laplacian noise models," in Neural Networks in Financial Engineering (Proc. NNCM-95), eds. A.-P. N. Refenes, Y. Abu-Mostafa, J. Moody and A. Weigend (World Scientific, Singapore), pp. 146–156.
B. P. Carlin, N. G. Polson and D. S. Stoffer 1992, "A Monte Carlo approach to nonnormal and nonlinear state-space modeling," J. Am. Stat. Assoc. 87, 493–500.
P. Y. Chung 1991, "A transactions data test of stock index futures market efficiency and index arbitrage profitability," J. Finance 46, 1791–1809.
M. A. Cohen 1992, "The construction of arbitrary stable dynamics in nonlinear neural networks," Neural Networks 5, 83–103.
J. T. Connor, R. D. Martin and L. E. Atlas 1994, "Recurrent neural networks and robust time series prediction," IEEE Trans. Neural Networks 4, 240–254.
G. Cybenko 1989, "Approximation by superpositions of a sigmoidal function," Math. Control, Signals Syst. 2, 303–314.
M. M. Dacorogna 1995, "Price behavior and models for high frequency data in finance," Tutorial, NNCM conference, London, England, Oct. 11–13.
P. de Jong 1989, "The likelihood for a state space model," Biometrika 75, 165–169.
A. P. Dempster, N. M. Laird and D. B. Rubin 1977, "Maximum likelihood from incomplete data via the EM algorithm," J. Royal Stat. Soc. B 39, 1–38.
R. F. Engle and M. W. Watson 1987, "The Kalman filter: Applications to forecasting and rational expectation models," in Advances in Econometrics, Fifth World Congress, Volume I, ed. T. F. Bewley (Cambridge University Press).
E. Ghysels and J. Jasiak 1995, "Stochastic volatility and time deformation: An application of trading volume and leverage effects," Proc. HFDF-I Conf., Zurich, Switzerland, March 29–31, Vol. 1, pp. 1–14.
C. L. Giles, R. D. Griffen and T. Maxwell 1990, "Encoding geometric invariances in higher order neural networks," in Neural Information Processing Systems, ed. D. Z. Anderson (American Institute of Physics), pp. 301–309.
A. V. M. Herz, Z. Li and J. Leo van Hemmen, "Statistical mechanics of temporal association in neural networks with delayed interactions," NIPS, 176–182.
T. W. Hilands and S. C. Thomopoulos 1994, "High-order filters for estimation in non-Gaussian noise," Inf. Sci. 80, 149–179.
J. J. Hopfield 1984, "Neurons with graded response have collective computational properties like those of two-state neurons," Proc. Natl. Acad. Sci. 81, 3088–3092.
P. J. Huber 1980, Robust Statistics (Wiley, New York).
A. H. Jazwinski 1970, Stochastic Processes and Filtering Theory (Academic Press, New York).
L. Jin, P. N. Nikiforuk and M. Gupta 1994, "Absolute
M. I. Jordan and R. A. Jacobs 1992, "Hierarchical mixtures of experts and the EM algorithm," Neural Comput. 4, 448–472.
R. E. Kalman and R. S. Bucy 1961, "New results in linear filtering and prediction theory," Trans. ASME J. Basic Eng. Series D 83, 95–108.
D. G. Kelly 1990, "Stability in contractive nonlinear neural networks," IEEE Trans. Biomed. Eng. 37, 231–242.
G. Kitagawa 1987, "Non-Gaussian state-space modeling of non-stationary time series," J. Am. Stat. Assoc. 82, 1033–1063.
B. F. La Scala, R. R. Bitmead and M. R. James 1995, "Conditions for stability of the extended Kalman filter and their application to the frequency tracking problem," Math. Control Signals Syst. 8, 1–26.
Y. Le Cun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel 1990, "Handwritten digit recognition with a back-propagation network," in Advances in Neural Information Processing Systems 2, ed. D. S. Touretzky (Morgan Kaufmann), pp. 396–404.
T. K. Leen 1995, "From data distributions to regularization in invariant learning," in Advances in Neural Information Processing Systems 7, eds. G. Tesauro, D. S. Touretzky and T. Leen (Morgan Kaufmann), pp. 223–230.
A. U. Levin and K. S. Narendra 1996, "Control of nonlinear dynamical systems using neural networks, Part II: Observability, identification, and control," IEEE Trans. Neural Networks 7(1).
A. U. Levin and K. S. Narendra 1993, "Control of nonlinear dynamical systems using neural networks: Controllability and stabilization," IEEE Trans. Neural Networks 4(2).
C. M. Marcus and R. M. Westervelt 1989, "Dynamics of analog neural networks with time delay," in Advances in Neural Information Processing Systems 2, ed. D. S. Touretzky (Morgan Kaufmann).
C. J. Mazreliez 1973, "Approximate non-Gaussian filtering with linear state and observation relations," IEEE Trans. Automatic Control, February.
K. Matsuoka 1992, "Stability conditions for nonlinear continuous neural networks with asymmetric connection weights," Neural Networks 5, 495–500.
J. S. Meditch 1969, Stochastic Optimal Linear Estimation and Control (McGraw-Hill, New York).
U. A. Muller, M. M. Dacorogna, R. B. Olsen, O. V. Pictet, M. Schwarz and C. Morgenegg 1990, "Statistical study of foreign exchange rates, empirical evidence of a price change scaling law, and intraday analysis," J. Banking and Finance 14, 1189–1208.
B. A. Pearlmutter 1989, "Learning state-space trajectories in recurrent neural networks," Neural Computation 1, 263–269.
F. J. Pineda 1989, "Recurrent backpropagation and the dynamical approach to adaptive neural computation," Neural Computation 1, 161–172.
R. Roll 1984, "A simple implicit measure of the effective bid-ask spread in an efficient market," J. Finance 39, 1127–1140.
P. Simard, B. Victorri, Y. Le Cun and J. Denker 1992, "Tangent prop: A formalism for specifying selected invariances in an adaptive network," in Advances in
Neural Information Processing Systems 4, eds. J. E. Moody, S. J. Hanson and R. P. Lippmann (Morgan Kaufmann), pp. 895–903.
Y. Song and J. W. Grizzle 1995, "The extended Kalman filter as a local observer for nonlinear discrete-time systems," J. Math. Syst. Estim. Control 5, 59–78.
A. S. Weigend, B. A. Huberman and D. E. Rumelhart 1990, "Predicting the future: A connectionist approach," Int. J. Neural Systems 1, 193–209.