Neural Networks in Financial Engineering: A Study in Methodology

Apostolos-Paul N. Refenes, A. Neil Burgess, and Yves Bentz

(Invited Paper)

Abstract—Neural networks have shown considerable success in modeling financial data series. However, a major weakness of neural modeling is the lack of established procedures for performing tests for misspecified models, and tests of statistical significance for the various parameters that have been estimated. This is a serious disadvantage in applications where there is a strong culture for testing not only the predictive power of a model or the sensitivity of the dependent variable to changes in the inputs but also the statistical significance of the finding at a specified level of confidence. Rarely is this more important than in the case of financial engineering, where the data generating processes are dominantly stochastic and only partially deterministic. Partly a tutorial, partly a review, this paper describes a collection of typical applications in options pricing, cointegration, the term structure of interest rates, and models of investor behavior which highlight these weaknesses and propose and evaluate a number of solutions. We describe a number of alternative ways to deal with the problem of variable selection, show how to use model misspecification tests, deploy a novel way based on cointegration to deal with the problem of nonstationarity, and generally describe approaches to predictive neural modeling which are more in tune with the requirements for modeling financial data series.

Index Terms—Cointegration, computational finance, financial engineering, model identification, model selection, neural networks, variable selection, volatility, yield curve.

I. ACTIVE ASSET MANAGEMENT, NEURAL NETWORKS, AND RISK

THE ultimate goal of any investment strategy is to maximize returns with the minimum risk. In the framework of modern portfolio management theory, this is achieved by constructing a portfolio of investments which is weighted in a way that seeks to achieve the required balance of maximum return and minimum risk. The construction of such an optimal portfolio clearly requires a priori estimates of asset returns and risk. Traditionally, it used to be accepted that returns are random and that the best prediction for tomorrow's return is today's return. Over a longer period, expected returns were calculated by averaging historical returns. Any deviation from this ("naive") prediction was considered as unpredictable noise, and so asset risks were estimated by the standard deviation of historical returns.

Manuscript received August 10, 1996; revised August 10, 1997. This work was supported in part by the ESRC under the ROPA program, Research Grant R022 250 057, Barclays, Citibank, Hermes Investment Management, the Mars group, and Sabre Fund Management. The methodology and applications described in this paper were developed over a period of two years at the Decision Technology Centre, London Business School.

The authors are with the London Business School, Regents Park, London NW1 4SA, U.K.

Publisher Item Identifier S 1045-9227(97)07972-1.

Subsequently, portfolio theory suggested that the efficient frontier [51] is obtained by solving for the weights which maximize a utility of the following form:

$U = \sum_i w_i E(r_i) - \lambda \sum_i \sum_j w_i w_j \sigma_i \sigma_j \rho_{ij}$     (1)

According to (1), the portfolio's expected return is determined by the expected returns $E(r_i)$ of the individual securities in the portfolio and the proportion $w_i$ of each security represented in the portfolio. The expected risk of the portfolio is determined by three factors: the proportion $w_i$ of each security represented in the portfolio, the standard deviation $\sigma_i$ of each security from its expected return, and the correlation $\rho_{ij}$ between these deviations for each pair of securities in the portfolio. (The term $\sigma_i \sigma_j \rho_{ij}$ is commonly referred to as the covariance.)
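To make the mean-variance tradeoff in (1) concrete, the following sketch evaluates the utility for a small portfolio. The risk-aversion coefficient, the function and variable names, and the example figures are illustrative assumptions for this sketch, not values taken from the paper.

```python
import numpy as np

def mean_variance_utility(weights, exp_returns, std_devs, corr, risk_aversion=1.0):
    """Utility of (1): expected portfolio return minus a penalty on portfolio variance.

    weights      -- proportion of each security in the portfolio (sums to 1)
    exp_returns  -- expected return of each security
    std_devs     -- standard deviation of each security's returns
    corr         -- correlation matrix of the return deviations
    """
    cov = np.outer(std_devs, std_devs) * corr      # sigma_i * sigma_j * rho_ij
    expected_return = weights @ exp_returns        # sum_i w_i E(r_i)
    expected_risk = weights @ cov @ weights        # sum_i sum_j w_i w_j cov_ij
    return expected_return - risk_aversion * expected_risk

# Illustrative two-asset example
w = np.array([0.6, 0.4])
mu = np.array([0.08, 0.05])
sigma = np.array([0.20, 0.10])
rho = np.array([[1.0, 0.3], [0.3, 1.0]])
print(mean_variance_utility(w, mu, sigma, rho, risk_aversion=2.0))
```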

This traditional assumption was founded upon the theory of market efficiency, which, stated simply, implies that all public information on future price movement for a tradable asset has already been incorporated in its current price, and that therefore it is not possible to earn economic profits by trading on this information set. In statistical terms, this implies the so-called "random walk" model, whereby the expectation for the next period is the current value. The empirical finance literature up to the 1970's universally reinforced this view for all actively traded capital markets, by testing and failing to refute the random walk hypothesis on daily, weekly, and monthly data. Yet this posed a serious dilemma, a gulf between theory and practice, as traders did continue to make profits in the short term. If they were just lucky, their luck seemed to show no signs of running out.

By the end of the 1980's, theory had matured to provide a more comfortable fit with trading realities. In the first place it was recognized that the conventional tests of the random walk hypothesis were very "weak," in the sense that the evidence would have to be very strong to reject this null hypothesis. Typically, period-by-period changes were tested for zero mean and white noise. Minor departures from randomness would not be significant in these tests; yet it only takes minor departures to offer real trading opportunities. From the perspective of scientific method, it is remarkable that the EMH should have gained such empirical support based upon a testing methodology that started by assuming it is true, and then adopted tests which would rarely have the power to refute it!

Econometric tests introduced during the 1980's specified a more general model for the time series behavior of asset returns, involving autoregressive and other terms, such that the random walk would be a special case if the first-order autoregressive coefficient were equal to one, and all others were zero. Thus a more general structural model was proposed for which the random walk is a special case. It turned out that under this model-based estimation procedure, it was possible to reject the random walk special-case hypothesis for almost all of the major capital market series. Not only is this turnaround more satisfactory in providing results which close the gap between statistical conclusions and practical observation, it also demonstrated the methodological need to propose a general model first, before concluding that a time series has no structure.

Finance theory has now matured to the position whereby markets can still be considered efficient in the more sophisticated way of representing the expectations, risk attitudes, and economic actions of many agents, yet still have a deterministic component to their price movements relating to fundamental factors. Thus we now have the so-called "multifactor" capital asset pricing model [70] and arbitrage pricing theory [65], which attempt to explain asset returns as a weighted combination of the assets' exposure to different factors, as shown in (2)

$r_i = \alpha_i + \sum_k \beta_{ik} F_k + \varepsilon_i$     (2)

where $r_i$ is the return of asset $i$, $F_k$ are the determinant factors, $\beta_{ik}$ is the exposure of asset $i$ to factor $k$, $\alpha_i$ is the expected abnormal return from the asset, and $\varepsilon_i$ is the nonpredictable part of the return, i.e., the error of the model.

Hence, the "naive" estimate (or unconditional expectation) of asset returns is replaced by a more general estimate conditioned on the values of (fundamental or market) factors. Accepting the random walk hypothesis is now the default case, which will be accepted should it turn out that none of the factors under consideration is statistically significant. In the more general case there is no reason why the structured model in (2) should be limited to include only linear structures with noninteracting independent variables and Gaussian distributions. In terms of inviting the question of how general a model should be proposed in the first place, this focus on empirical model building allows us to consider the use of neural-network technology as a consistent, if extreme, example of this new approach. By proposing the most general of modeling frameworks, it also provides a stronger "test" for market efficiency conclusions, albeit with tests that are not based upon statistical hypothesis protocols, but on accuracy and performance metrics.
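As a concrete illustration of estimating the linear factor model in (2), the sketch below fits the coefficients by ordinary least squares on simulated data. The factor structure, sample size, and coefficient values are hypothetical and not drawn from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated factor realizations and one asset's returns
T, K = 250, 3
F = rng.normal(size=(T, K))                                  # factor values F_k
true_alpha, true_beta = 0.01, np.array([0.8, -0.3, 0.5])
r = true_alpha + F @ true_beta + 0.1 * rng.normal(size=T)    # r_i = alpha + sum_k beta_k F_k + eps

# OLS estimates of alpha_i and beta_ik in (2)
X = np.column_stack([np.ones(T), F])                         # intercept column for alpha
coef, *_ = np.linalg.lstsq(X, r, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]
print("alpha:", round(alpha_hat, 4), "betas:", np.round(beta_hat, 3))
```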

However, this seemingly minor departure from the efficient markets hypothesis (EMH) has major implications for the way in which we manage risk and reward. It also induces stringent requirements on the design and validation of predictive modeling methodologies, particularly so on neural networks (NN's). To appreciate the implications of this apparently minor departure from the EMH, let us formulate portfolio management theory in a more general framework in which mean-variance optimization is a special case. The general case is a simple extension to the utility in (1), whereby

$U = \sum_i w_i E(r_i \mid F_i; \theta) - \lambda \sum_i \sum_j w_i w_j s_{i\theta} s_{j\theta} \rho_{ij}$     (3)

where $E(r_i \mid F_i; \theta)$ represents the expected return for security $i$ conditioned on its exposure to a vector of factors $F_i$, and $\theta$ defines the exact nature of the model by indexing a class of structured models or predictors. For example, for the random walk $E(r_i \mid F_i; \theta) = \bar{r}_i$, and for a multifactor CAPM model $E(r_i \mid F_i; \theta)$ takes the form of the structured model in (2).

$s_{i\theta}$ measures the deviation of each security from its expected value, i.e., the standard error of model $\theta$ for each security in the portfolio. For example, for the random walk this is given by the standard deviation, i.e., $s_{i\theta} = \sigma_i = \sqrt{E[(r_i - \bar{r}_i)^2]}$, but for the more general case the prediction risk is $s_{i\theta} = \sqrt{E[(r_i - E(r_i \mid F_i; \theta))^2]}$.

According to (3), the expected return of our portfolio is determined by two factors: 1) the returns of the individual securities in the portfolio, whose expectation for the next period is no longer the historical average but is given by $E(r_i \mid F_i; \theta)$, and 2) the proportion of each security represented in the portfolio.

The expected risk of the portfolio is determined by three factors: 1) the proportion of each security represented in the portfolio; 2) the deviation of each security from its predicted return (i.e., the standard error of model $\theta$ for each security in the portfolio); and 3) the correlation between the prediction errors for each pair of securities in the portfolio.
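A minimal sketch of the generalized risk term in (3): portfolio risk is computed from the prediction errors of each security's model and their correlations, rather than from raw return volatility. The function name and the example numbers are illustrative assumptions.

```python
import numpy as np

def portfolio_prediction_risk(weights, pred_errors):
    """Expected portfolio risk under (3).

    pred_errors -- matrix of historical prediction errors r_i - E(r_i | F_i; theta),
                   one column per security; its covariance plays the role of the
                   s_i * s_j * rho_ij terms in (3).
    """
    error_cov = np.cov(pred_errors, rowvar=False)
    return float(weights @ error_cov @ weights)

# Illustrative example: two securities, 100 periods of prediction errors
rng = np.random.default_rng(1)
errors = rng.normal(scale=[0.05, 0.12], size=(100, 2))
print(portfolio_prediction_risk(np.array([0.5, 0.5]), errors))
```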

Clearly, the expected value of the portfolio [as defined in (3)] assumes that our expectations for the individual returns are accurate. However, this is never the case, and the actual value of the portfolio depends on the accuracy of the individual predictors. To illustrate the effects (benefits and risks) of prediction accuracy on actual portfolio value, let us consider a simple portfolio composed of two securities. In order to separate the effects of prediction accuracy from the effects that are due to covariances between prediction errors, we make one of the securities risk-free. The other asset is simulated as a hypothetical security whose returns are randomly generated (100 independent observations, zero mean, standard deviation one). For each point in the series, we then generate a set of predictors (with increasing predictive power) and for each predictor we compute the actual value of the portfolio. Fig. 1 shows the value of this portfolio as a function of prediction accuracy. The prediction accuracy, plotted on the x axis, is in fact the correlation between actual and predicted returns for the hypothetical security. The y axis plots the actual value of the portfolio equation.

Fig. 1. "It is not actually terribly difficult to make money in the securities markets."¹ Actual return/risk of a simple portfolio of two assets as a function of prediction accuracy (ρ). The x axis shows the correlation between actual and predicted returns. The actual return/risk of the mean-variance portfolio (correlation zero) is positioned at the origin.

Fig. 2. "It is actually terribly difficult to make money in the securities markets."² Actual return/risk of a simple portfolio of two assets as a function of prediction accuracy. The x axis shows the correlation between predicted and actual returns. The return/risk of the mean-variance portfolio (ρ = 0) is shown by the horizontal line.

¹Allegedly attributable to Peter Baring, two months prior to the collapse of Barings Bank.

²Nick Leeson, "Rogue Trader" (1996), immediately after the collapse of Barings.

The return/risk for the mean-variance optimization (correlation zero) is positioned at the origin. As we move from the "naive" predictor (with ρ = 0) toward the theoretically perfect predictor (with ρ = 1), the actual return increases, but not uniformly. It is much steeper in the initial stages but it tails off for the more accurate predictors. Most of the payoff is obtained by predictors which can explain between 15–25% of the variability in the securities' returns. In other words, it only requires minor improvement upon the random walk hypothesis to gain significant improvement in returns.
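A rough sketch of the simulation behind Figs. 1 and 2: for a range of target correlations ρ, a predictor is built as a noisy copy of the security's future return, and the portfolio switches between the risky asset and cash according to the predicted sign. The construction of the predictor and the trading rule are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np

rng = np.random.default_rng(42)
returns = rng.normal(0.0, 1.0, size=100)        # hypothetical security: zero mean, unit std

def make_predictor(actual, rho, rng):
    """Return a forecast series whose correlation with `actual` is approximately rho."""
    noise = rng.normal(0.0, 1.0, size=actual.shape)
    return rho * actual + np.sqrt(max(1.0 - rho**2, 0.0)) * noise

def portfolio_return(actual, forecast):
    """Hold the risky asset when the forecast is positive, otherwise hold the risk-free asset."""
    position = (forecast > 0).astype(float)
    realized = position * actual                # risk-free return taken as zero for simplicity
    return realized.mean(), realized.std()

for rho in (0.0, 0.2, 0.5, 1.0):
    forecast = make_predictor(returns, rho, rng)
    mean_ret, risk = portfolio_return(returns, forecast)
    print(f"rho={rho:.1f}  mean return={mean_ret:+.3f}  risk={risk:.3f}")
```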

In view of this fact, it is seemingly remarkable that the EMH should have gained such wide acceptance and have survived for so long. However, to appreciate the forces which are helping to keep the EMH in widespread use, one does not need to look further than understanding the risks which can arise if the predictive models are of poor quality. Fig. 2 shows how portfolio value decreases as a function of prediction accuracy. The x axis shows the correlation between actual and predicted for each pair of securities in the portfolio. The theoretically perfect predictor (with ρ = 1) is depicted on the right-hand side (RHS) of the axis. The worst-case predictor (with ρ = −1) is depicted on the extreme left-hand side (LHS) of the axis. The "naive" predictor (with ρ = 0) corresponds to the random walk model.

It is clear from Fig. 2 that in terms of risk/reward the random walk model is a rather "efficient" predictor, despite its "naive" nature. But it is also clear from Fig. 1 that it only requires minor improvements upon the random walk hypothesis to gain significant improvement in returns. It is also clear that if any predictive modeling methodology is to become as widely accepted as the random walk model, it must be accompanied by robust procedures for testing the validity of the estimated models. In terms of extending the EMH to provide a more comfortable fit with trading realities, in the more sophisticated way of representing the nonlinearities in the expectations, risk attitudes, and economic actions of many agents participating in the market, NN's can be seen as a consistent example of multifactor CAPM and APT models whereby asset returns can be explained as a nonlinear combination of the assets' exposure to different factors

$r_i = g(F_i; w) + \varepsilon_i$     (4)

where $g$ is an unknown function, the inputs $F_i$ are drawn independently with an unknown stationary probability density function, $w$ is a vector of free parameters which determine the structure of the model, and $\varepsilon_i$ is an independent random variable with a known (or assumed) distribution. The learning or regression problem is to find an estimate $\hat{g}(F_i; w)$ of $g$ given the dataset D from a class of predictors or models indexed by $w$.

The novelty of NN's lies in the ability to model nonlinear processes with few (if any) a priori assumptions about the specific functional form of $g$. This is particularly useful in financial engineering applications where much is assumed and little is actually known about the nature of the processes determining asset prices. Neural networks are a relatively recent development in the field of nonparametric estimation. Well-studied and frequently used members of the family include nearest neighbors regression and kernel smoothers (e.g., [39]), projection pursuit [31], alternating conditional expectations (ACE), average derivative estimation (ADE) [78], classification and regression trees (CART) [11], etc. Because of their universal approximation properties, NN's provide an elegant formalism for unifying all these paradigms. However, much of the application development with neural networks has been done on an ad hoc basis without due consideration for the requirements which are specific to financial data. These requirements include: 1) testing the statistical significance of the input variables; 2) testing for misspecified models; 3) dealing with nonstationary data; 4) handling leverages in the datasets; and 5) generally formulating the problem in a way which is more amenable to predictive modeling.
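As an illustration of estimating $g$ in (4) with a neural network, the sketch below fits a single-hidden-layer MLP by gradient descent on simulated factor data. The architecture, learning rate, and data are assumptions for the example only.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, H = 500, 3, 4                              # observations, factors, hidden units

F = rng.normal(size=(T, K))                      # factor exposures
r = np.tanh(F @ np.array([1.0, -0.5, 0.3])) + 0.1 * rng.normal(size=T)  # r = g(F) + eps

# One hidden layer of logistic units, linear output: r_hat = sigmoid(F W^T + b) V + c
W, b = rng.normal(scale=0.1, size=(H, K)), np.zeros(H)
V, c = rng.normal(scale=0.1, size=H), 0.0
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.05
for epoch in range(2000):
    hidden = sigmoid(F @ W.T + b)                # (T, H)
    pred = hidden @ V + c
    err = pred - r
    # Backpropagation of the squared-error loss
    grad_V = hidden.T @ err / T
    grad_c = err.mean()
    grad_hidden = np.outer(err, V) * hidden * (1 - hidden)
    grad_W = grad_hidden.T @ F / T
    grad_b = grad_hidden.mean(axis=0)
    V -= lr * grad_V; c -= lr * grad_c; W -= lr * grad_W; b -= lr * grad_b

print("in-sample correlation:", np.corrcoef(pred, r)[0, 1])
```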

In this paper we describe a collection of applications in options pricing, cointegration, the term structure of interest rates, and models of investor behavior which highlight these weaknesses, and we propose and evaluate a number of solutions. We describe a number of ways for principled variable selection, including a stepwise procedure building upon the Box–Jenkins methodology [9], analysis of variance, and regularization. We show how model misspecification tests can be used to guard against models which make systematic errors. We describe how the principle of cointegration can be used to deal with nonstationary data, and generally describe ways of formulating the problem in a manner that makes predictive modeling more likely to succeed.

In Section II, we use the problem of high-frequency volatility forecasting to demonstrate a stepwise modeling strategy which builds upon classical linear techniques for model identification and variable selection. The analysis identifies some important linear and nonlinear characteristics of the data with strong implications for the market maker. A strong linear mean-reversion component of intraday implied volatility is reported. A second component, induced by the bid/ask bounce of volatility quotes, is also present. Changes in the underlying asset have a significant nonlinear effect on implied volatility quotes over the following trading hour. The evolution of the strike price also has a strong influence on volatility changes over the next trading hour. Finally, a volatility smile is reported, indicating how implied volatility innovations are related to maturity effects.

In Section III, we use the principle of cointegration to deal with nonstationary data and to demonstrate how partial tests can be used to perform significance tests on parts of an NN, particularly for variable selection. The approach is applied to the nontrivial problem of explaining the cointegration residuals among European equity derivative indexes in the context of exchange rate volatility, interest rate volatility, etc.

In Section IV, we analyze the problem of modeling the yield curve to demonstrate how the careful application of financial economics theory and regularization methods can be used to deal with both the problem of nonstationarity and that of variable selection. A factor analysis indicates that the bulk of changes in Eurodollar futures are accounted for by unpredictable parallel shifts in the yield curve which closely follow a random walk. However, the second and third factors correspond roughly to a tilt and a flex in the yield curve, and these show evidence of a degree of predictability in the form of mean reversion. We construct portfolios of Eurodollar futures which are immunized against the first two factors but exposed to the third, and compare linear and nonlinear techniques to model the expected return on these portfolios. The approach is best described as attempting to identify combinations of assets which offer opportunities for statistical arbitrage.

In Section V, we take a metamodeling approach and argue that even if security price fluctuations are largely unpredictable, it is possible that investor behavior may not be. Within this framework, we construct a metamodel of investor behavior using a dataset composed of financial data on 90 French stocks drawn from the SBF250 index. Based on the predictions of the metamodel, we show how it is possible to actively manage investment strategies, rather than the underlying assets, in order to produce excess returns.

II. VOLATILITY FORECASTING AND VARIABLE SELECTION

A. Overview

Of all the inputs required in option valuation, volatility is the most difficult for traders to understand. At the same time, as any experienced trader will attest, volatility often plays the most important role. Changes in our assumptions about volatility can have a dramatic effect on our evaluation of an option, and the manner in which the market assesses volatility can have an equally dramatic effect on an option's price. The two basic approaches to measuring volatility [15] are either to compute the realized volatility over the recent past from historical price data or to calculate the "implied volatility" from current option prices by solving the pricing model for the volatility that equates the model and market prices. It has become widely recognized in the academic finance literature that the implied volatility is the "market's" volatility forecast, and that it is a better estimate than historical volatility [47], [17], [75], [4], [73], [55]. Other approaches to forecasting volatility have become available in recent years using formal time-varying volatility models, fitting models of the ARCH family to past data and computing the expected value of future variance [79]. Again, it has been found that volatility forecasts estimated from call and put option prices with exogenous variables contain incremental information relative to standard ARCH specifications for conditional volatility, which only use information in past returns. The two approaches are not mutually exclusive. Consequently, it would be desirable to develop nonlinear models of implied volatility which take advantage not only of the time structure in a (univariate) series (of volatility) but also make use of exogenous variables, i.e., information contained in other potentially informative variables that have been reported to have an influence on volatility, such as trading effects, maturity effects, spreads, etc.

The literature suggests several potential advantages that NN's have over statistical methods for this type of modeling, but one of the important weaknesses of NN's is that they are as yet not supported by the rich collection of model diagnostics which are an integral part of statistical procedures. We utilize a stepwise modeling procedure which builds upon classical linear techniques for model identification and variable selection. Within this framework, we investigate the relationship between changes in implied volatility and various exogenous variables suggested in the literature. Using European index Ibex35 option data, we investigate the construction of reliable estimators of implied volatility by a simple stepwise procedure consisting of two phases.

In the first phase the objective is to construct a well-specified model of implied volatility which captures both the time dependency of implied volatility and the most significant linear dependencies on the exogenous variables. In the second phase, the objective is to provide incremental value by extending the model to capture any residual nonlinear dependencies which may still be significant.

We show that, with conservative use of complexity penalty terms and model cross validation, this simple stepwise forward modeling strategy has been successful in capturing both the main linear influences and some important nonlinear dependencies in high-frequency models of implied volatility. Changes in intraday implied volatility are dominated by a mean-reverting time-dependency component which is primarily a linear effect. The NN model is able to capture this effect and also some significant nonlinear characteristics of the data with important implications for the market maker. Of particular significance is the relationship between implied volatility and the effects of the strike, maturity, and changes in spot prices of the underlying market. For example, large movements (either positive or negative) in the price of the underlying index induce a U-shaped response on the quoted volatility, while a volatility smile is reported near expiration dates.

B. Experimental Setup—Empirical Properties of Volatility

We examine intraday movements of the Ibex35 implied volatility series obtained from short-maturity, close-to-the-money call options during a six-month sample period: November 1992 through April 1993. Intraday historical data are available on the Spanish Financial Futures Exchange option contract on Ibex35. The Ibex35 index contains the 35 most liquid stocks that trade in the Spanish Stock Exchange through its CATS system. This dataset has been made available by the research department of MEFF, providing high-quality and precise real-time information from electronically recorded trades. Options on Ibex35 are European-style options, have a monthly expiration date, and at every point in time the three nearest contracts are quoted (i.e., in March 1993, there will be quotes for the end of March, April, and May contracts). The measure of implied volatility is obtained by solving the pricing model [8] for the volatility that equates the model and market prices using the Newton–Raphson method.
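The inversion step can be sketched as follows, assuming a Black–Scholes-style pricing function for a European call; the specific pricing model [8] used in the paper may differ, and the inputs shown are illustrative rather than values from the MEFF dataset.

```python
from math import log, sqrt, exp, erf, pi

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """European call price and vega under Black-Scholes-style assumptions."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    price = S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)
    vega = S * sqrt(T) * exp(-0.5 * d1**2) / sqrt(2.0 * pi)
    return price, vega

def implied_vol(market_price, S, K, T, r, sigma0=0.2, tol=1e-8, max_iter=50):
    """Newton-Raphson: find sigma such that the model price equals the market price."""
    sigma = sigma0
    for _ in range(max_iter):
        price, vega = bs_call(S, K, T, r, sigma)
        diff = price - market_price
        if abs(diff) < tol:
            break
        sigma -= diff / vega        # Newton step using vega as the derivative
    return sigma

# Illustrative inputs
print(implied_vol(market_price=12.5, S=300.0, K=295.0, T=30 / 365, r=0.10))
```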

Our sampling interval is 60 min and the prediction interval one (60-min) period ahead. The calculated volatility is the implied volatility of the option price which is nearest to the next time border. We define the change in implied volatility as

$\Delta\sigma_t = \sigma_t - \sigma_{t-1}$     (5)

where $\Delta\sigma_t$ denotes the hourly change of implied volatility at time $t$. So the change at 12:00:00 is the difference between the implied volatility immediately past the time border 12:00:00 and the implied volatility immediately past the time border 11:00:00.

Our empirical analysis of the implied volatility uses volatility changes as defined in (5) rather than the levels as quoted (in basis points) at MEFF. Two considerations motivate this. First, the academic and practitioner interest lies in understanding the changes or innovations to expected volatility: how changes in expected volatility influence changes in security valuation. Second, the series of Ibex35 implied volatility levels appears to be a near random walk. Implied volatility levels have a first-order autocorrelation of 90%, indicating that although a unit root can be rejected for the series, such high autocorrelation may affect inference in finite samples.

We select our universe of exogenous variables (see the summary in Table I) on the basis of their availability to the market maker at a specific point of time and the variable's relevance to explaining implied volatility as reported in the literature. For example, [27] and [29] found that volatility is larger when the exchange is open than when it is closed, while [30] suggested that volatility is to some extent caused by trading itself. It would therefore seem desirable to make use of an exogenous variable to account for trading effects on volatility. The high-frequency data available in this study facilitate the use of two such variables: volume and velocity. Volume is the number of contracts traded in an hour and velocity is the number of trades per hour.


TABLE I: SUMMARY OF ALL VARIABLES

Day and Lewis [20] found that option prices reflect increases in the volatility of the underlying stock indexes at both quarterly and nonquarterly expiration dates. The behavior of the implied volatility for options spanning the expiration date is consistent with an unexpected component to the increase in volatility near expiration dates. It would be desirable to account for this maturity effect by making use of an exogenous dummy variable which is encoded as one when there are four or fewer days to expiration and zero otherwise. A separate variable, time to maturity, is introduced for the following reason: volatility is calculated for call options close to the money with short time to maturity (less than three months), and the quotation system provides the three nearest contracts. This means that the obtained implied volatility series has been derived from different maturity contracts. Although we expect that the implied volatilities obtained from different maturity contracts will be very similar (due to the fact that we are dealing with short maturity contracts), we shall nevertheless introduce a time to maturity variable.

One possible explanatory factor for future changes in implied volatility that is often described in the literature is the relative bid-ask spread, or any shifts in it. In a market with zero transaction costs, changes in the price of an option are caused by the arrival of new information regarding the distributions of returns on the option or by innovations in the process governing the option value. In other words, if the dealers incur costs in executing a transaction, they require compensation. One part of the compensation includes the bid-ask spread. On the other hand, the market maker will widen the bid-ask spread if the probability increases that he is dealing with better informed traders [18]. The variable average-spread is used to encode this information as the sum of the spreads in all trades taking place within an hour divided by the number of trades in the hour.

Some recent studies of S&P index options show that options with low strikes have significantly higher implied volatilities than those with high strikes [34]. Derman and Kani [21] showed that the average of at-the-money call and put implied volatilities falls as the strike level increases. Out-of-the-money puts trade at higher implied volatilities than out-of-the-money calls. This asymmetry is commonly called the volatility "skew." They attempt to extend the Black–Scholes model to account for the volatility smile, replacing the constant local volatility term with a local volatility function deduced from observed market prices. Pozo [58] has observed U-shaped patterns occurring in the Ibex35 options during various subperiods between 1991 and 1993. It is not the task of this study to look for volatility smile patterns in the Ibex35 data, but in order to account for a possible volatility smile it would be desirable to make use of a measure of the degree to which short-maturity call options are in-, at-, or out-of-the-money. We use the variable moneyness as this measure. It is calculated as the average ratio of the spot/strike price for every option traded within the hour.

It has also been argued that volatility is expected to be higher on certain days than others, as well as at certain times within the day. While this problem cannot be totally eliminated, it is helpful to include a weekend or day effect in our models and confirm or refute the presence of negative correlation in the time series. Three variables are used to capture these effects: day, weekend, and time effect. The day effect is a variable set to (1, 2, ..., 5) for each weekday. The weekend effect is set to one for Fridays and Mondays and zero elsewhere. Likewise, a dummy time of trade variable is incorporated as an input into the models: trades registered in the first hour of the day are coded one, the rest are coded zero (i.e., overnight new information can affect the behavior of the market at the opening hour).

The remaining variables in Table I are easy to encode. Change in spot is a measure of changes in the spot price of the underlying asset (futures on Ibex35) at the end of every hour. So, at 12:00:00 the value of the variable change in spot will be the difference of the closing price at (or just before) 12:00:00 minus the closing price at (or just before) 11:00:00. Historic volatility is the standard deviation of past Ibex35 index returns. The horizon over which historic volatility is computed is related to the time-to-maturity of traded options. When the majority of options traded have a time to maturity longer than 15 days, a historical volatility measure is computed with a sample horizon of 25 days of index returns. Otherwise the sample horizon is 15 days. The combination of these two measures appears to have the highest correlation coefficient with the implied volatility series. Finally, the interest rate is the current yield of the T-bill whose maturity most closely matches the option expiration.
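A small sketch of how the dummy and derived variables described above might be encoded from an hourly quote table; the column names, the assumed opening hour, and the pandas-based layout are illustrative assumptions, not the paper's actual data format.

```python
import pandas as pd

OPEN_HOUR = 10  # assumed opening hour of the trading session

def encode_features(df):
    """df: hourly rows with columns 'timestamp', 'spot', 'strike', 'days_to_expiry'."""
    out = pd.DataFrame(index=df.index)
    out["change_in_spot"] = df["spot"].diff()                          # hourly spot change
    out["maturity_effect"] = (df["days_to_expiry"] <= 4).astype(int)   # 1 if 4 or fewer days to expiry
    out["moneyness"] = df["spot"] / df["strike"]                       # spot/strike ratio
    ts = pd.to_datetime(df["timestamp"])
    out["day_effect"] = ts.dt.dayofweek + 1                            # 1 (Mon) ... 5 (Fri)
    out["weekend_effect"] = ts.dt.dayofweek.isin([0, 4]).astype(int)   # Mondays and Fridays
    out["time_of_trade"] = (ts.dt.hour == OPEN_HOUR).astype(int)       # first hour of the day
    return out

# Usage (illustrative): features = encode_features(quotes)
```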

The objective here is to identify the subset of variables which are most significant in explaining implied volatility variations from the universe of the 11 variables outlined above, augmented by any significant time-dependency terms. This requires a procedure for searching a relatively large space of candidate variables and selecting those which: 1) give the best explanation in terms of explained variance; 2) have the strongest effect; and 3) are statistically the most significant. The modeling strategy is described in the next section.

C. Modeling Strategy

Given the hypothesis that the data generating process for implied volatility is inherently a nonlinear process (as the option pricing model would suggest), it would seem desirable to attempt to estimate the unknown "true" model with a nonparametric method which can capture nonlinear effects, such as an NN. However, a major weakness of NN's is the lack of an established procedure for performing tests of significance for input variables. Many techniques and algorithms have been proposed to "measure" the "effect" of different parts of an NN model, usually for use in one of three cases: input variables (in the case of methods devoted to "sensitivity analysis"), weights (in the case of "pruning" algorithms) or, more rarely, hidden units. Typically these methods are not supported by a coherent statistical argument but are instead justified on the practical grounds of being better than nothing. In the cases where statistical arguments are presented, the algorithms are often very computationally and conceptually complex (for instance requiring the calculation of the Hessian matrix for the network, and/or assuming nonnegative eigenvalues) and always prone to producing misleading results due to overfitting.

This section describes a stepwise procedure for neural model identification which builds upon classical linear techniques for model identification and variable selection. We adopt a conservative approach whereby complexity is introduced only if the resultant model provides significant incremental value, and the additional parameters are statistically significant. The procedure consists of two phases. In the first phase the objective is to construct a well-specified model of the dependent variable (i.e., implied volatility) which captures both the time dependency of implied volatility and the most significant linear dependencies on the exogenous variables. In the second phase, the objective is to provide incremental value by extending the model to capture any residual nonlinear dependencies which may still be significant.

Phase 1:

1) Identifying time structure: Using univariate time series analysis and the Box–Jenkins identification procedure, investigate the time structure of the output variable and build a reliable estimator of implied volatility of the form

$\Delta\sigma_t = \sum_{i=1}^{p} \phi_i \Delta\sigma_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t$     (6)

Arguably, if the ARMA(p, q) orders are "well specified," by definition we accept the hypothesis that the residuals $\varepsilon_t$ are white noise disturbances. In practice, however, due to weaknesses in randomness tests it is often observed that the addition of exogenous variables may also produce "well specified" models with better performance.

2) Identifying exogenous influences: Using stepwise multiple linear regression analysis, identify the predictive power of each exogenous variable and build a parsimonious model, incorporating any error correcting and/or autoregressive terms as suggested in Step 1), e.g.,

$\Delta\sigma_t = \beta_0 + \sum_{i} \phi_i \Delta\sigma_{t-i} + \sum_{j} \theta_j \varepsilon_{t-j} + \sum_{k} \beta_k X_{k,t} + \varepsilon_t$     (7)

where $X_1, \ldots, X_K$ are the most significant exogenous variables.

3) Identifying nonlinear dependencies: Using multivariate NN analysis with the variables and/or error correcting/autoregressive terms identified in Step 2), construct a well-specified neural model of implied volatility

$\Delta\sigma_t = g(A_t, M_t, X_t; w) + \varepsilon_t$     (8)

with $A_t$, $M_t$, and $X_t$ the vectors denoting the significant autoregressive terms, moving average terms, and exogenous variables, respectively, as identified in (7). In this paper the number of hidden units is chosen by simple cross validation, but more sophisticated methods can be used. (A code sketch of this phase is given below.)
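The sketch below walks through Phase 1 on simulated data: an MA(1) fit for step 1, a crude stepwise-style regression with exogenous variables for step 2, and a small neural fit for step 3. It relies on statsmodels and scikit-learn; all orders, variable names, and thresholds are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 600
eps = rng.normal(size=n)
dvol = eps - 0.7 * np.roll(eps, 1)                     # MA(1)-like implied-volatility changes
X = rng.normal(size=(n, 3))                            # hypothetical exogenous variables
dvol += 0.3 * X[:, 0] - 0.2 * X[:, 1]

# Step 1: univariate time-structure identification, here an MA(1) as in (6)
ma1 = ARIMA(dvol, order=(0, 0, 1)).fit()
print("MA(1) parameters:", ma1.params)

# Step 2: linear regression with lagged terms and exogenous variables, as in (7)
lags = np.column_stack([np.roll(dvol, k) for k in (1, 2, 3)])
design = sm.add_constant(np.column_stack([lags, X]))[3:]   # drop rows spoiled by np.roll
ols = sm.OLS(dvol[3:], design).fit()
significant = np.abs(ols.tvalues[1:]) > 2.0                # crude significance screen
print("retained columns:", np.where(significant)[0])

# Step 3: neural model on the retained inputs, as in (8)
inputs = design[:, 1:][:, significant]
nn = MLPRegressor(hidden_layer_sizes=(5,), alpha=1e-3, max_iter=2000, random_state=0)
nn.fit(inputs, dvol[3:])
print("in-sample correlation:", np.corrcoef(nn.predict(inputs), dvol[3:])[0, 1])
```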

Clearly, this simple procedure is rather limiting to the neural-network approach for two reasons. First, the predictive variables have already been selected in Step 2) in a way that best suits the linear regression. It is therefore probable that the selected variables only explain the linear part of the relationship. Nevertheless, even under these restrictive conditions it is possible that a nonlinear estimator can provide incremental value to the model, perhaps arising from the interaction between the exogenous variables and/or the time dependencies. Furthermore, to the extent that one of the main criticisms of NN methods is their lack of statistical explainability, working from a platform of statistical insight and looking for incremental value over a conventionally understood set of input variables, we can view this restriction as the price for potentially improved credibility. Second, it is possible that variables which could have higher explanatory power in the nonlinear sense may have been rejected by the linear criterion in the first phase. This hypothesis needs to be verified in a second pass.

The second phase of the model identification procedure verifies whether incremental value can be achieved by including additional variables in the model which, although rejected by the linearity criterion in the first phase, may still have significant explanatory power in the nonlinear sense. The second phase is an iterative procedure analogous to forward variable selection with regression analysis.


Phase 2:

1) Forward model estimation: For each variable $x_k$ not already included in the input vector of the model at the current iteration, construct a new well-specified model incorporating all the old variables plus each $x_k$, as follows:

$\Delta\sigma_t = g(A_t, M_t, X_t, x_k; w) + \varepsilon_t$     (9)

This will produce as many new models as there are "unused" variables. The starting input variable vector comprises only those significant variables that were identified in (7) and used in (9).

2) Variable significance estimation: Compute a complexity-adjusted payoff measure to evaluate the change in model performance that would result if variable $x_k$ were added to the model. The payoff is computed by

$P_k = \rho(\sigma, \hat{\sigma}_{+k}) - c(d_k)$     (10)

The first term in (10) measures the correlation between the observed values of the dependent variable (implied volatility) and the predictions of the model which uses the additional variable $x_k$. The second term in (10) penalizes the model for the complexity introduced by the additional degrees of freedom, $d_k$. There are several ways in which we can measure payoff against complexity in an NN model. These will be explored in Section II-G.

3) Model extension: If during the current iteration there exists no model whose performance $P_k$ is greater than that of the previous iteration, the procedure is terminated. Our baseline metric is defined as the correlation between observed and predicted by the multivariate neural model which uses only the variables selected by the linear analysis [see (8)]. If there is at least one model which outperforms the baseline metric, we proceed to construct a well-specified model in which the input vector is extended to include all the old variables plus one (or more) new variables selected on the basis of the highest payoff $P_k$. The procedure is repeated from Step 1). (A sketch of this forward selection loop follows below.)
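A compact sketch of the Phase 2 loop: each unused variable is trialled in turn, scored by a complexity-adjusted payoff in the spirit of (10), and the best one is added while the adjusted payoff keeps improving. The penalty form (a fixed cost per added variable), the candidate data, and the network settings are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_correlation(y, X):
    """In-sample correlation between a small NN's fit and the target."""
    net = MLPRegressor(hidden_layer_sizes=(5,), alpha=1e-3, max_iter=2000, random_state=0)
    net.fit(X, y)
    return np.corrcoef(net.predict(X), y)[0, 1]

def forward_select(y, base, candidates, penalty=0.02):
    """Phase 2 loop: add candidates while the complexity-adjusted payoff improves.

    base       -- list of columns already selected by the linear analysis
    candidates -- dict name -> column of variables rejected in Phase 1
    penalty    -- assumed cost per additional input variable
    """
    selected = list(base)
    baseline = fit_correlation(y, np.column_stack(selected))
    while candidates:
        scores = {name: fit_correlation(y, np.column_stack(selected + [col])) - penalty
                  for name, col in candidates.items()}
        best = max(scores, key=scores.get)
        if scores[best] <= baseline:                # no extended model beats the baseline
            break
        selected.append(candidates.pop(best))       # extend the model with the best variable
        baseline = scores[best]
        print("added:", best, "adjusted payoff:", round(baseline, 3))
    return selected

# Illustrative usage
rng = np.random.default_rng(0)
n = 400
x1, x2, x3 = rng.normal(size=(3, n))
y = 0.6 * x1 + 0.4 * np.tanh(x2) + 0.3 * rng.normal(size=n)
forward_select(y, base=[x1], candidates={"x2": x2, "x3": x3})
```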

The procedure described here is only one of several alternative ways in which forward selection can be controlled. Ideally, it would be desirable to consider not only a single candidate variable in each step but (at least) pairwise combinations of variables. This is particularly important if the candidate variables are believed to be interacting in some way, cf. the eXclusive OR problem, where none of the individual inputs has any explanatory power and it is only when both of them are put together that the solution is achievable. We select the simple version of the forward search in order to reduce the computational requirements. Nevertheless, in order to account for potential interaction effects we shall use a "short cut" whereby we construct a single model with the entire universe of variables at the end of the procedure. If the performance of this model (in the validation set) is better than the baseline model (in unadjusted terms), we shall take this as an indication of significant interaction.

Clearly, it is possible to perform the search entirely in the opposite (backward) direction. Our main reason for preferring forward selection is that it is difficult to obtain an accurate estimate of the effective degrees of freedom in a neural model for the complexity penalty term. Although several methods are described in the literature (e.g., [53] among others), they require the Hessian to be positive definite, which in practice is difficult to achieve and very expensive to compute. As we are taking a cautious approach to avoid overfitting, our estimate of the additional degrees of freedom will always be upper bounded by the number of additional parameters that a variable introduces to the model.

Another optimization to the procedure described in Phase 2 above would be to control the selection of candidate variables on the basis of the (linear) correlation of each candidate variable with the residuals of the current model or, more effectively in the nonlinear case, to use analysis of variance. Such optimizations are analyzed with synthetic data under controlled simulation elsewhere (see, for example, [14]) and also in Section III of this paper.

Having described the basic steps in our stepwise modeling strategy, we return our attention to its application to the problem of modeling implied volatility changes.

D. Identifying Time Dependencies—Time Series Analysis

Let us begin by considering the univariate properties of the hourly changes in volatility. Summary statistics are calculated for the output variable over the entire sample and for three different subperiods (to highlight any nonstationarity or drift effects). Each subsample is a consecutive window and contains 200 observations. Table II shows the subsample periods and summarizes the properties of hourly changes in implied volatility. The mean hourly change ranges from −0.011% in the third subsample to 0.0134% in the second period. The mean volatility change over the entire sample is 0.28%. Apparently, over the six-month period, market volatility did not drift in one direction or another. The standard deviation of the hourly volatility changes is also fairly stable, ranging from 0.1985 in the second subsample to 0.2422 in the third.

Table II also provides the autocorrelation structure of the hourly volatility changes for one through six lags. Like the mean and standard deviation, the autocorrelation structure is quite stable for the three subsamples. The first-order autocorrelations, for example, range from −0.507 to −0.435, and the second-order from −0.102 to 0.017, revealing a negative correlation. This degree of correlation is higher than the autocorrelation reported in [40] for S&P100 options. Clearly we encounter a strong mean-reverting phenomenon in our Ibex35 data. Big movements are followed by smaller changes but with opposite sign. This behavior could be induced by different causes. Some primary causes for Ibex35 are described in [61] and also in [58]. A similar behavior is encountered by Jacquillat et al. [44] on the Paris Stock Exchange, and [64] attributes the behavior to the bid-ask bounce.

Our task is to identify a parsimonious representation of the generating process for this data set. If the autocorrelations have a cutoff point, that is, if they are zero for all lags greater than some small number, and the partial autocorrelations taper off for growing lag, an MA representation is suggested. In our case (see Table II), the autocorrelations have a cutoff point after the first lag, and the partial autocorrelations are reduced in magnitude slowly. We identify an MA of order 1 for every subsample period. The MA(1) estimation results are shown in Table III.

TABLE II: STATISTICAL PROPERTIES OF HOURLY Ibex35 MARKET CHANGES OF IMPLIED VOLATILITY

TABLE III: UNIVARIATE ESTIMATION OF HOURLY CHANGES OF IMPLIED VOLATILITY

In every subsample period, the coefficient of the MA(1) is large and negative, ranging from −0.7213 in subsample 1 to −0.6865 in subsample 3. These coefficients are all significant, with t-statistics above 10.00. They are also consistent with the moving average identification reported in Table II, confirming the strong mean reversion effect in the data.

This simple linear model is able to explain a large proportion of the variability in changes of implied volatility. Adjusted R-squared values range from 34.97% for the entire sample to 36.54% for the first subsample. To check the overall acceptability of the residual autocorrelations (white noise), the portmanteau Q statistic is often used. If the ARMA orders are well specified, for an ARMA(p, q) process the statistic Q is approximately chi-squared distributed with K − p − q degrees of freedom, where K is the number of lags tested. In Table III, the hypothesis of white noise disturbance can be accepted for the residuals. Nevertheless, a number of researchers have detected intertemporal relations between expected volatility and market information. Let us now turn our attention more explicitly to these relations.
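The identification and diagnostic steps above can be reproduced with standard time-series tooling; the sketch below fits an MA(1) and runs a portmanteau test on its residuals using statsmodels, with simulated data standing in for the Ibex35 series.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

# Simulated stand-in for hourly implied-volatility changes with an MA(1) structure
rng = np.random.default_rng(0)
eps = rng.normal(size=600)
dvol = eps - 0.7 * np.concatenate(([0.0], eps[:-1]))   # theta approximately -0.7

# Fit an MA(1), i.e., ARMA(0, 1), and inspect the coefficient and its significance
result = ARIMA(dvol, order=(0, 0, 1)).fit()
print(result.summary().tables[1])

# Portmanteau (Ljung-Box) test on the residuals: large p-values are consistent
# with the white-noise hypothesis for the model errors
print(acorr_ljungbox(result.resid, lags=[6], return_df=True))
```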

E. Identifying Exogenous Influences

We are interested in assessing the degree to which the exogenous variables in Table I have a significant influence in explaining changes in implied volatility, and whether by using these variables we obtain incremental value over and above the time series model. We use the regression model proposed in the second step of our modeling strategy [see (7)]. To account for the strong MA(1) component in the regression equation, we incorporate an autoregressive element in the multivariate model. If we reverse the MA(1) model $\Delta\sigma_t = \varepsilon_t - \theta\varepsilon_{t-1}$, we obtain $\varepsilon_t = \Delta\sigma_t + \theta\varepsilon_{t-1}$, that is, when $|\theta| < 1$,

$\Delta\sigma_t = -\theta\Delta\sigma_{t-1} - \theta^2\Delta\sigma_{t-2} - \theta^3\Delta\sigma_{t-3} - \cdots + \varepsilon_t$     (11)

The acf and pacf in Table II suggest that the first three lags are the only ones that make a significant contribution to the explanation of changes in implied volatility. After lag 3 for each subsample there is a significant cutoff in the pacf. We shall therefore use three autoregressive variables in our input set, $\Delta\sigma_{t-1}$, $\Delta\sigma_{t-2}$, and $\Delta\sigma_{t-3}$, to take into account the infinite autoregressive order of (11).

TABLE IV: ESTIMATION RESULTS FOR MLR EQUATIONS ON HOURLY CHANGES IN IMPLIED VOLATILITY

Our regression analysis is based on backward stepwise variable selection. With backward selection, we begin with a model that contains all of the candidate variables and eliminate them one at a time. We check at each stage to verify that previously removed variables are still not significant. We reenter variables into the model if they become significant when other variables are removed. We run regression estimations for all three subsamples.

Table IV gives a summary of the results. Overall, the addition of exogenous variables gives a small but significant improvement in model fit (i.e., 40.4% versus 34.9% in terms of adjusted R-squared).

The results are largely consistent with those obtained in Table III, and confirm the strong negative linear relation between changes in implied volatility and its lagged values. The coefficients for lags one, two, and three are negative and significant, decreasing in magnitude. This pattern is present in every subsample period, dominating the explanation of the output variable. While the autoregressive terms represent this negative relationship, new variables (moneyness, change in spot, and maturity effect) appear in the equation, adding extra value. Note the stability of estimation results among the subsample periods. The coefficient for moneyness is always positive and significant, while the change in spot variable has a negative linear relation with the output variable. The slope of the dummy variable, maturity effect, is also significant. (Volume, which is not shown in the table, also appears as significant, but only in the second subsample, with a weak contribution to the model: t-statistic of 1.831.)

All the estimations present a high R-squared value. Evidence in Table IV indicates that there is an improvement in explaining the variability of the output variable that ranges from 12.8% for the third subsample to 2% in the first subsample. To test normality of the residuals we examine the skewness and kurtosis values. A skewness of zero indicates that the data are symmetrically distributed. Positive values of skewness suggest that the upper tail of the curve is longer than the lower tail, while negative values suggest that the lower tail is longer. To assess the significance of the skewness value we use its standardized value. We can reject the hypothesis of normality of the residuals at the 0.05 significance level for every subsample. For a normal distribution, the kurtosis coefficient is zero. When the coefficient is less than zero, the curve is flat, and when it is greater than zero, the curve is either very steep at the center or has relatively long tails. Table IV shows that the residuals are far from being Gaussian. They represent curves with relatively long tails for each subsample. In summary, the normality assumption does not hold, but the residuals do not present serial correlation (Durbin–Watson statistics range from 1.950 in subsample three to 2.060 in subsample one).

TABLE V: LEARNING RESULTS FOR DIFFERENT TOPOLOGIES AND TRAINING TIME

Having analyzed the time structure of the implied volatility series, and shown that a small but significant incremental value can be obtained with the use of additional variables, particularly moneyness, changes in spot, volume, and maturity, the next step in the modeling procedure is to investigate whether nonlinear analysis using NN's can add further incremental value.

F. Nonlinear Dependencies and Variable Interactions

We start our nonlinear analysis with the third step of phase one of our modeling strategy, whereby we use a multivariate neural model with the variables and autoregressive terms identified in Step 2) to construct a well-specified neural model of implied volatility changes. We then verify whether incremental value can be obtained by including additional variables in the model through a stepwise forward procedure (phase 2 in our modeling strategy). To control model selection we make use of a simple statistical technique based on cross validation. Cross validation requires that the available dataset is divided into two parts: a training set, used for determining the values of the weights and biases, and a validation set, used for deciding when to stop training. In the experiments reported here we allocate the earliest available timespan of data, consisting of 400 observations, to the training set and the following 100 observations to the validation set. The final 50 observations will be used as a (one-off) ex-ante test set to compare all methods. We use the common backpropagation algorithm for estimating the network weights with one layer of (up to five) hidden units, a logistic transfer function, linear outputs, a weight decay term (set to 0.001), and a momentum term (set to 0.1).
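A minimal sketch of this training setup: a chronological 400/100/50 split, a single logistic hidden layer, weight decay 0.001, momentum 0.1, and stopping chosen on the validation set. It uses scikit-learn's MLPRegressor as a stand-in for the backpropagation network described above; the data are assumed to be arranged as an input matrix X and target vector y.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_volatility_net(X, y, hidden_units=5):
    """Chronological split and backpropagation training with weight decay and momentum."""
    X_train, y_train = X[:400], y[:400]          # earliest 400 observations
    X_val, y_val = X[400:500], y[400:500]        # next 100: used to decide when to stop
    X_test, y_test = X[500:550], y[500:550]      # final 50: one-off ex-ante test set

    best_corr, best_net = -np.inf, None
    for epochs in (200, 500, 1000, 2000):        # crude early stopping via validation correlation
        net = MLPRegressor(hidden_layer_sizes=(hidden_units,), activation="logistic",
                           solver="sgd", alpha=1e-3, momentum=0.1, learning_rate_init=0.01,
                           max_iter=epochs, random_state=0)
        net.fit(X_train, y_train)
        corr = np.corrcoef(net.predict(X_val), y_val)[0, 1]
        if corr > best_corr:
            best_corr, best_net = corr, net
    rmse = np.sqrt(np.mean((best_net.predict(X_test) - y_test) ** 2))
    return best_net, best_corr, rmse

# Usage (illustrative): net, val_corr, test_rmse = train_volatility_net(X, y)
```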

Clearly, since the predictive variables have already been selected according to a linear criterion (see Table III), any incremental value obtained in this step will be primarily attributable to possible interactions between the exogenous variables and/or time dependencies. A summary of the results is shown in Table V for four different neural models with two, three, four, and five hidden units, respectively. Three

sets of performance measures are shown for the validation sample (100 observations) and the training sample (400 observations). For each sample the table shows the root mean square error (RMSE), the number of iterations at which the minimum error was obtained (Converg. It.), the correlation between estimated and observed values, and the percentage of correctly predicted directional changes (Poccid).

The performance results are stable and similar for different topologies (except for Net3). The correlations between estimated and observed in the cross-validation set range from 0.6465 for Net2 to 0.6941 for Net1, all of which are better than the in-sample correlation for multiple linear regression (i.e., 0.635, see Table IV). RMSE values range from 0.07515 in Net1 to 0.09184 in Net3. Correlation measures of the (observed, estimated) vector pair are around 0.65 except for Net3, where the correlation is 0.47.

The first and simplest neural model (Net1), with two hidden units, does not appear to be able to fit the data well. Its in-sample RMSE remains constant at 0.1063 throughout training, with a correlation between actual and predicted of 0.6578, which is only marginally better than the multiple linear regression (i.e., 0.635, see Table IV). Also, its performance (in terms of correlation) in the validation set is suspiciously higher than that in the training set.

The third neural model (Net3), with four hidden units, appears to have been trapped in a local minimum with a mean square error similar to the first network and an in-sample correlation between actual and predicted of 0.41, which is worse than the multiple linear regression.

In principle, models with the characteristics of Net1 and Net3 exhibit all the signs of a misspecified model. Indeed, a test for serial correlation on the residuals of both these models shows that their residuals are autocorrelated to nearly the same level as the original implied volatility series, with one significant lag. This can also be seen in their D.W. statistics. Although we cannot formally test the D.W. for these two nonlinear models, we shall reject both models as misspecified. Misspecified models can occur with NN's for the classical reasons, i.e., omitting important variables or fitting a low-order polynomial to a dataset which has been generated by a high-order nonlinear process. Being unable to escape from a local minimum (as, for example, with Net3) or not having a sufficiently high number of free parameters (or a combination of the two) can also lead to a badly specified model.


TABLE VI: IN-SAMPLE PERFORMANCE. ALL DATA

For both the remaining models, Net2 and Net4, with three and five hidden units, respectively, we can accept the null hypothesis of the residuals being white-noise disturbances. Net4 appears to give a better fit for the in-sample data than Net2, but not overly so; this can also be seen in the validation figures. Thus, we select the model provided by Net4 for further experiments.

Table VI gives a summary of the overall results so far. Clearly, the multivariate neural model has provided significant incremental value in terms of explaining changes in implied volatility over the MLR and Box–Jenkins models. As we shall see in Section II-H, this incremental value persists in the ex-ante test set and also in the economic evaluation of the models. But before evaluating the models on the ex-ante test set, let us examine whether further incremental value can be achieved by including additional variables which, although rejected by the linear criterion in the first phase of our modeling methodology, may nevertheless have significant explanatory power in the nonlinear sense.

G. Nonlinear Variable Selection

In this, the second phase of our model identification procedure, we attempt to identify any residual nonlinear dependencies that may be present through a process of forward variable selection.

The first step in this procedure is to compute a complexity-adjusted payoff measure which evaluates the change in model error that would result if an additional variable were added to the model. The payoff measure is analogous to the Akaike information criterion and penalizes the gain in performance by a complexity term which depends upon the sample size and the additional degrees of freedom. Recall that it is computed by (10), where the first term is the correlation between observed and estimated (in the cross-validation set) given by a model which includes all previous variables plus an additional variable. The second term in the equation is the complexity penalty term, which depends on the sample size and the additional number of free parameters. So, for example, if we have a network with four hidden units, the addition of a new variable will use a maximum of four new parameters.
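The sketch below is purely illustrative: equation (10) is given earlier in the paper and is not reproduced here, so we simply assume a penalty of k/N (k extra free parameters, N validation observations), which happens to reproduce the 5% entry threshold discussed in the next paragraph for N = 100 and k = 5.

    # Illustrative complexity-adjusted payoff for forward variable selection.
    # NOTE: the penalty term below (k/N) is an assumption made for this sketch;
    # the authors' exact penalty is defined by (10) earlier in the paper.
    def payoff(corr_with, corr_without, k_extra_params, n_validation):
        gain = corr_with - corr_without            # gain in validation correlation
        penalty = k_extra_params / n_validation    # assumed complexity penalty
        return gain - penalty                      # admit the variable only if > 0

    # Example: a candidate variable raising validation correlation from 0.69 to
    # 0.72 with five new weights and a 100-observation validation set is rejected:
    # payoff(0.72, 0.69, 5, 100) = 0.03 - 0.05 < 0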

With our validation sample size fixed at 100 and a maximum of five potential degrees of freedom (i.e., the new connections from the additional variable to all five hidden units), the complexity penalty term grows almost linearly with the number of additional free parameters. In other words, we shall only include a new variable if it contributes to a performance gain of 5% or more. This is a relatively strong criterion for variable entry, particularly when compared to forward regression analysis

TABLE VII: FORWARD VARIABLE SELECTION—FIRST FORWARD PASS

where we are generally prepared to allow variables to enter the model with a relatively low test statistic. However, by using the test statistic we are effectively testing the hypothesis of variable significance at a specified level of confidence. With neural models it is not straightforward to test the same hypothesis, as it involves making strong and potentially invalid assumptions (see, for example, the seminal work of [76] or [62] for variable significance testing, and also [3] for omitted nonlinearity). An alternative criterion is goodness of fit, but given that NN's can fit the data arbitrarily closely and that the validation sample is relatively small, it is desirable to use a stricter criterion for variable entry into the model.

The results of the first pass are summarized in Table VII. From the remaining eight variables, it appears that the addition of volume, day effect, velocity, average spread, and maturity can produce models with better correlation between observed and estimated in the cross-validation period. However, when adjusted for additional complexity, none of these variables appears to add significant incremental value over and above the benchmark multivariate neural model.

The results shown in Table VII essentially preclude any further analysis for the stepwise forward procedure. With perhaps less restrictive penalties for the additional complexity it might have been desirable to include volume and day effect, but our conservative treatment of extra complexity precludes that. Instead of running the risk of overfitting the data, we choose to verify this result by a variable significance estimation procedure which operates in the opposite, i.e., backward, fashion.

Backward variable selection operates in the opposite direction. It is similar to the sensitivity-based pruning introduced by Moody and Utans [53], whereby we fit a model with all available variables (using cross validation) and attempt to determine the most significant variables by computing the change in error that would result if a variable were removed from the network. By setting each of the input variables to its mean value (one at a time), we compute a measure which assesses the overall contribution to prediction accuracy due to that particular variable. We use two criteria for prediction accuracy: 1) the RMSE between observed and estimated in the validation set and 2) the correlation between


TABLE VIII: LEARNING RESULTS FOR DIFFERENT TOPOLOGIES AND TRAINING TIME. ALL DATA

observed and estimated (also in the validation set)

(26)

and

(27)

where the measure lies between zero and one. Note that in computing it no retraining of the network is required. By ranking the importance of all variables using a nonlinear criterion, the objective of this test is simply to verify or refute the hypothesis that no significant variables have been left out, without having to incur the cost of iteratively reestimating models with all possible combinations of variables. This test differs from Moody and Utans [53] in that the sensitivity (contribution) of each variable is computed only for the validation set, and in that it uses the ratio rather than the difference of the two error measures.
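A minimal sketch of this clamp-to-the-mean sensitivity test follows. The fitted `model` object, the validation arrays, and the exact normalization of (26) and (27) are assumptions for illustration; only the clamping procedure and the use of RMSE and correlation ratios follow the text.

    import numpy as np

    def sensitivities(model, X_val, y_val):
        # performance of the full model on the validation set
        base_pred = model.predict(X_val)
        base_rmse = np.sqrt(np.mean((base_pred - y_val) ** 2))
        base_corr = np.corrcoef(base_pred, y_val)[0, 1]

        scores = {}
        for j in range(X_val.shape[1]):
            X_clamped = X_val.copy()
            X_clamped[:, j] = X_val[:, j].mean()   # clamp variable j to its mean
            pred = model.predict(X_clamped)
            rmse = np.sqrt(np.mean((pred - y_val) ** 2))
            corr = np.corrcoef(pred, y_val)[0, 1]
            # ratios (not differences) of the two error measures, as in the text
            scores[j] = {'rmse_ratio': base_rmse / rmse,
                         'corr_ratio': corr / base_corr}
        return scores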

The results are shown in Tables VIII and IX for the different neural models and variables, respectively. Table VIII shows the performance measures obtained for each network on the cross-validation and training sets. The performance results are similar for different topologies (except for NET1—note its D.W. statistic).

Overall, including all 14 variables in the model gives worse results than both the multiple linear regression with the six most significant variables and the multivariate neural model with the same six variables. RMSE measures in the validation set range from 0.0833 in NET3 to 0.0966 in NET1. Correlation analysis of the pair of vectors (observed, estimated) is around 0.59 except for NET1, where the correlation is below 0.40 (again, note its D.W. statistic). Poccid shows very stable behavior around 75% for all the networks trained, except for NET1, where Poccid is a low 55%.

The objective of this test, however, is to verify that no significant variables have been omitted from the nonlinear analysis. Table IX gives a ranking of all 14 variables according to both versions of the metric. The results broadly confirm the information that we had extracted from the univariate and multivariate linear analysis. The contribution of the autoregressive variables, Vol.lag(1), Vol.lag(2), and Vol.lag(3), is evident, confirming the strong mean-reversion effect in the data. These variables always appear in the top part of the table for both versions of the metric. The maturity effect, moneyness, and changes in spot also appear in the top half of

TABLE IX: STATISTIC BASED ON RMSE AND CORRELATION

the table. However, there are new variables that appear at the top of the table: historical volatility, day effect, and volume; but this is not consistent. Historical volatility is at the bottom of the table under the RMSE-based measure, while volume and day effect are inconsistent in that their importance varies with the subsampling period.

It also appears from Table IX that the contribution of these two variables becomes important when they are both included in the model rather than through their individual contributions (as the forward analysis has shown), indicating an interaction effect between volume and day of the week. With hindsight, it might have been desirable to continue the forward selection process beyond the first pass. Ultimately, this is a price we are prepared to pay for being (perhaps too) conservative with the complexity penalty term in the payoff criterion.

So far we have studied the in-sample and/or validation-set performance of the models, and have concluded that a simple step-wise modeling strategy can capture both the main linear and nonlinear features of the data. In addition, we have shown that there is significant incremental value to be obtained from nonlinear dependencies and interactions between the variables (see Table V). The ultimate goal, however, is to predict future values of the time series. In the next section we compare the ex-ante forecasting performance of Box–Jenkins univariate predictors, forecasts from standard MLR, and nonlinear neural models.


TABLE X: OUT-OF-SAMPLE PERFORMANCE. ALL DATA (a)

H. Ex-Ante Evaluation of the Models

The ex-ante test set consists of 50 observations held out of the sample to test the validity of our findings. The three models are evaluated both in terms of their prediction accuracy on this test set and in terms of their economic performance.

The prediction accuracy of each model is evaluated by single-step-ahead prediction. At the close of each trading hour, when the values of all independent variables are known, we make a forecast for the change in implied volatility over the next trading hour. The forecasts of each model are then compared with the actual changes. Two measures are used to evaluate forecasting accuracy: the correlation between actual and predicted over the unseen 50 observations, and Poccid, the percentage of correctly predicted directional changes. The results for each model are shown in Table X. Recall that the Box–Jenkins and MLR models have been estimated over the entire sample of 500 observations, whereas the neural model has used only the first 400 of these observations directly for training.
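For reference, the two accuracy measures can be computed as in the short sketch below; y_true and y_pred are hypothetical arrays holding the 50 actual and forecast hourly changes.

    import numpy as np

    def evaluate_forecasts(y_true, y_pred):
        # correlation between actual and predicted changes
        corr = np.corrcoef(y_true, y_pred)[0, 1]
        # Poccid: percentage of correctly predicted directional changes
        poccid = 100.0 * np.mean(np.sign(y_true) == np.sign(y_pred))
        return corr, poccid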

The results, both in terms of correlation and of directional changes, are consistent with the expectations from the in-sample period (and, in the case of the neural models, with the validation period). The neural model yields correlation values around 0.63, against 0.55 for the MLR model and 0.50 for the univariate model. The improvement obtained by the neural model is significant. Starting from the Box–Jenkins model, which uses the least information available (i.e., the implied volatility series alone), it is possible to add extra value by introducing exogenous influences via new inputs: first in a straightforward linear model and then by considering possible nonlinear interactions among the variables.

Financial data series are very often nonstationary. One test that is often used to safeguard against spurious models is to divide the training data into a number of subsamples. A separate model is then reestimated for each subsample and a test for stationarity is performed on the parameters of each model. If the separate models are radically different, or if their performance on the ex-ante test set is radically different, then there is a risk that the estimated models have captured some temporary and perhaps unrepeatable relationship rather than a relationship which is invariant through time. The results shown in Table X are consistent with those obtained by applying the modeling strategy to each of the three subsampling periods shown in Table II. The analysis is described in [61] and concludes that all models give acceptably stable performance.

The economic evaluation of the models involves the use of a simple "trading strategy" for a market participant using the predictions of the neural, MLR, and time-series models to purchase or sell delta-hedged short-maturity call options (between ten and 45 days) in the Ibex35 option market. This requires the purchase or sale of at-the-money call options on Ibex35 at every time boundary. The option positions are held until the end of the hour, at which point another position is established based on the direction forecast by the models. In a general sense, the strategy will profit when the forecast direction is the same as the true direction taken by the market and when the implied volatility quotes remain rather volatile, in other words when the magnitude of the movement of true volatility is large. We conduct the evaluation in two stages. In the first stage we ignore transaction costs and assume that trades are available at the end of every time boundary. The number of trades is set to 50. The profit/loss figures are based on hourly investments of 100 option contracts and do not incorporate reinvested profits. Clearly, this is a simplification of real market conditions.

The cumulative profit curves associated with each model's forecasts (NN, MLR, and Box–Jenkins) show steady profitability for each model (see Fig. 3).

Though cumulative profits have a slight drawdown near the end of the forecast period for all three models, the trends for the remainder of the period are strong and consistent. The cumulative profit curve for the MA(1) model predictions presents a considerable number of peaks and troughs. Nonetheless, the strategy using the forecasts from the MA(1) model otherwise earns consistent and positive profits. The NN predictions are more profitable than the MLR or univariate predictions; the univariate forecasts present the worst performance overall. The ability to profit consistently from the trading strategy based on any of the model predictions may be due to the strong mean-reversion effect reported in earlier sections. In that sense, these profits are not riskless; their riskiness must also be considered in evaluating their significance.

Table XII provides a measure of the risk-adjusted returns of the strategy. The mean hourly profit is reported, along with the t-statistic for the null hypothesis that the sample mean is zero. Since the divisor of the computation is the sample standard deviation divided by the square root of the sample size, the t-statistic (other than a scaling factor) corresponds to the reward/risk tradeoff of each model. Table XII indicates that the profits based on the forecasts of all models are impressively large. The reward/risk ratios range from 1.99 for the MA(1) model to 4.00 for the NN system.
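For concreteness, the statistic just described is the usual one-sample t-ratio for the mean hourly profit (the notation below is ours):

$t = \dfrac{\bar{\pi}}{s_{\pi}/\sqrt{n}}$

where $\bar{\pi}$ is the mean hourly profit, $s_{\pi}$ its sample standard deviation, and $n$ the number of trades; up to the factor $\sqrt{n}$, this is the reward/risk ratio reported in Table XII.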


TABLE XII: HOURLY PROFITS FOR Ibex35 OPTION TRADING STRATEGY (a)

Fig. 3. Profit and loss of different models under a simplistic trading strategy.

Fig. 4. Profit and loss of different models, adjusted for transaction costs.

This evidence supports the notion that the model forecasts are able to give economic value to the market participant. However, to demonstrate that the strategy can benefit from the forecasting models, it is also required that we include the effect of transaction costs. This is shown in Fig. 4; the last two columns of Table XII also indicate the influence of bid/ask transaction costs on the trading strategy profits. As the results indicate, the introduction of transaction costs (at 0.5%) virtually eliminates the profitability of the linear models, but not so for the neural forecasts. The t-statistic, or reward/risk ratio, ranges from 0.233 for the univariate model to 2.06 for the NN model. Clearly, trading the forecasts of the neural model appears to have economic significance, in contrast with the MLR and univariate models.

So far we have described a step-wise model selection procedure for NN's and evaluated its use in forecasting implied volatility changes. We have shown that this simple procedure is successful in capturing the main linear and nonlinear dependencies of implied volatility on various exogenous variables described in the literature, and that in the case of Ibex-35


options, neural models give significant incremental value both in terms of forecasting accuracy and in economic terms. However, it has been argued that NN's are "black boxes" which are difficult to analyze and understand, and hence competitive rather than synergetic with theory formulation. In the next section, we use sensitivity analysis to obtain some insight into the nature of the relationship that has been captured by the neural model between innovations in implied volatility and its determinant factors.

I. Sensitivity Analysis

Fig. 5 shows graphs of the response functions of the output with respect to the inputs. The graphs obtained refer to the training of NET4. Recall that the variables Vol.lag1, Vol.lag2, and Vol.lag3 are those with the greatest influence on the change in implied volatility and that they should represent a mean-reverting effect. Fig. 5 clearly confirms this effect.

Fig. 6 indicates how the output variable, volatility changes, moves with the maturity effect variable while holding the remaining inputs constant at their most typical value (the median). The neural model activates only two different points in the maturity-effect domain, i.e., zero and one. This result is not surprising since this is a dummy variable (0,1). What the model does is to sensitize the output variable near these two points. Note that the influence on volatility changes is minimal for the intermediate values of the interval (0,1).

The model succeeds in recognizing and differentiating the effect of the dummy in the data, separating the changes in volatility that occurred on days where the maturity of the contract is equal to or less than three days from those realized when the option has a maturity longer than three trading days. The eventual relationship between these two variables is not entirely clear, since the extreme values, zero and one, appear to compensate each other.

Fig. 7 depicts how the output variable varies as we change another input variable in our model, changes in spot. Large changes in spot would typically induce similarly large changes in implied volatility. Conversely, when changes in spot are relatively small, the derivatives with respect to this variable are almost flat. This behavior of volatility changes with respect to changes in spot is what we would expect but is undetectable with linear methods.

Fig. 8 shows how changes in volatility move with the moneyness variable. Hull and White [43] and Stein and Stein [72] have shown that it is rational for implied volatility to vary with the strike price when the asset volatility is believed to be stochastic. When special assumptions are made, their equations and calculations show that a plot of theoretical implieds against the strike displays a smile: the function has a U-shape and the minimum implied occurs at a value of the strike near the forward price of the underlying asset.

Some authors have reported smile effects in different data sets [69], [71], [32]. Pozo [58] has argued that a U-shaped pattern occurred for Ibex35 options during various subperiods between 1991 and 1993. It is not the task of this study to look for volatility smile patterns in the Ibex35 data. However, Fig. 8 gives a measure of the way in which volatility changes

are influenced by the position of the strike. In other words, it indicates how the volatility smile changes through time with changes in the strike. When moneyness is close to zero, i.e., the strike is close to the spot price, the influence of this input variable is minimal. Conversely, when moneyness moves away from zero the influence on the output becomes larger, although this relation appears to be asymmetric. For in-the-money options, moneyness seems to influence the changes in volatility to a higher degree than for out-of-the-money options. Again, this result has important economic implications for market participants. The neural model has succeeded in capturing the importance of this variable and, in doing so, provides valuable information.

We have described a step-wise model selection procedure for NN's and evaluated its use in forecasting implied volatility changes. We have shown that this simple procedure is successful in capturing the main linear and nonlinear dependencies of implied volatility on various exogenous variables described in the literature, and that in the case of Ibex-35 options neural models give significant incremental value both in terms of forecasting accuracy and in economic terms.

The modeling strategy is based on an integrative approach, combining information in a hierarchical building process. We departed from the least available information set, given by the univariate series itself, using the well-known Box–Jenkins time-series analysis. From this bottom line, we incorporated exogenous influences, first making use of a linear econometric tool (MLR) and then implementing a more complex nonlinear model using NN's. This strategy is intended to overcome the lack of a systematic model-building process for neural models, and aims to make neurotechnology a more understandable and useful tool for financial economists.

Besides the problem of variable selection, the construction of reliable neural estimators for financial data entails two further problems: dealing with nonstationary data and handling influential observations or leveraged data. In this section we addressed the problem of nonstationarity by differencing the implied volatility series. In the next section we use the concept of conditional cointegration as a framework for handling nonstationary data, which has the added benefit of enhancing statistical inference. Within this framework, we describe an alternative way of principled variable selection and address the issue of robustness to influential observations.

III. MODELING COINTEGRATION IN INTERNATIONAL EQUITY INDEX FUTURES

A. Overview

Recent years have witnessed a growing dissatisfaction with the stationarity and ergodicity assumptions upon which the bulk of time-series theory has been founded. Although these assumptions may be reasonable for many time series in the natural sciences, they are rather too restrictive for economic and financial data series, most of which appear to have first and second unconditional moments that are far from being time invariant. A time series can be nonstationary in an unlimited number of ways, but one particular class of


Fig. 5. Mean reversion effects in volatility [panels (a)–(c)].

nonstationary process has monopolized the interest of econometricians, namely that of integrated (i.e., stochastically trending) processes, due primarily to Granger and Newbold [35]. They advocated the use of differencing as a way of dealing with nonstationarity. As a result, a vast literature on testing for

integrated behavior, as well as on statistical inference in the presence of integrated variables, has appeared and keeps growing steadily. However, when modeling only in terms of differenced variables, considerable long-run information (which might be present in the levels of the variables) is lost and


Fig. 6. Volatility changes as a function of the maturity effect.

Fig. 7. Volatility changes as a function of changes in the underlying spot.

Fig. 8. Volatility changes as a function of moneyness.

statistical inference may be impeded. This is particularly important in finance, where theory suggests the existence of long-run equilibrium relationships among variables. Although short-run deviations from the equilibrium point are most likely, due for example to random shocks or indeed to deterministic but unknown factors, these deviations tend to be bounded through the actions of a variety of agents which act as stabilizing mechanisms bringing the system back to


equilibrium. Granger [36]–[38] and Engle [25] developed what can be regarded as the statistical counterpart of these ideas: the concept of cointegration.

Cointegration allows individual time series to be stationary in first differences (but not in levels), while some linear combinations of the series are stationary in levels. By interpreting such a linear combination as a long-run relationship or an "attractor" of the system, cointegration implies that deviations from this attractor are stationary, even though the series themselves have infinite variance. Cointegration provides a way of alleviating the inefficiencies caused by the disuse of long-run information (available in the levels of the variables) while considerably facilitating statistical inference. The need to make use of information about long-run relationships among the levels of variables has long been recognized by the advocates of error-correction mechanisms (ECM's) [66], [19], [42], which attempt to incorporate both short-run dynamics and long-run information through the use of error-correction terms in linear regression. There is, moreover, an important relationship between error-correction mechanisms and cointegration. Granger [37] showed that if a set of variables is cointegrated, then it has an ECM representation and, conversely, an ECM always produces a set of variables that are cointegrated. This means that for variables that move stochastically together over time the ECM provides a linear representation which is capable of adequately representing the time-series properties of the data set.
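To make the ECM idea concrete, a bivariate error-correction model can be sketched (in our own notation, not that of the cited papers) as

$\Delta y_t = \alpha\,(y_{t-1} - \beta x_{t-1}) + \gamma\,\Delta x_t + u_t$

where $(y_{t-1} - \beta x_{t-1})$ is the stationary deviation from the long-run attractor and $\alpha < 0$ measures the speed at which the system is pulled back toward equilibrium.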

Such models of cointegration have been very useful in expanding our understanding of financial time series; nevertheless, many empirical "anomalies" have remained unexplained. This may be due to the linear nature of the models and the fact that short-run deviations from the equilibrium point are attributed to random shocks. It is entirely plausible that short-run deviations may be (at least partially) attributable to other economic factors. For example, if the short-term influence of other factors can disturb the longer-term cointegration, we would hope to be able to model the strength of the cointegration as a function of other events occurring in the financial markets. For instance, it is entirely plausible to suggest that a rapid increase in oil prices would tend to decouple two cointegrated markets, especially if one belonged to an oil-producing nation and the other to an oil consumer. Unfortunately, traditional linear models fail to capture this type of behavior because the sensitivity of the model to a given factor is constant and reflected by a single coefficient. In a traditional regression model this coefficient comes to represent the average strength of the relationship between asset returns and a given factor. It would therefore be desirable to develop nonlinear models of cointegration in which short-run deviations from the equilibrium point are conditioned on other factors which may affect the strength of the cointegration in nonlinear ways.

In this section we introduce the concept of "conditional cointegration" and show how this provides both a framework for dealing with nonstationarity and a means of enhancing statistical inference. The methodology consists of four stages. In the first stage we verify the presence of cointegration in the linear sense. In the second stage we examine whether incremental value can be provided by extending the model to capture any dependencies of the cointegration on other factors, both in the linear and the nonlinear sense. In the third stage we apply a nonlinear variable-selection methodology to identify the most significant of the (unknown) factors which influence the strength of the cointegration. Finally, in the fourth stage we address the issue of robustness to outliers and influential observations, and show that combining a median estimator with the less robust mean-squared-error estimator, using a simple trading rule, further enhances the out-of-sample risk/return performance of the system.

B. Experimental Setup—Identifying Cointegration

We examine cointegration among international equity futures indexes, mostly drawn from the European markets. The idea behind cointegration is quite simple: there are markets and/or economic variables which in the long run share a common stochastic trend but from which there may be temporary divergences. Such markets are called cointegrated if the residuals of the regression of one variable on another are stationary (mean-reverting). The hypothesis is that these residuals, which represent temporary discrepancies in the relative values of the variables, are due to factors outside the cointegration and that, in the short run, current events may take priority over the longer-term cointegration.

To demonstrate how cointegration works, let us consider the index of FTSE futures and its relationship with a basket of other indexes comprising the U.S. S&P, German Dax, French Cac, Dutch EoE, and Swiss SMI. The data are daily closing prices for all the indexes from 6 June 1988 to 17 November 1993.

The procedure for identifying cointegration is as follows: firstly we regress the level of the FTSE on the levels of the other indexes. The coefficients of the regression represent the amounts of each of the other indexes which should be held in a portfolio in order to obtain, on average, the same performance as the FTSE. Saying that the indexes are cointegrated is equivalent to saying that the markets have an idea of a "fair" level of the FTSE compared to the other indexes, and that if the FTSE rises above or falls below this level then it will tend to move back toward it; this mean reversion usually takes place over a longish time period (weeks or months). In a statistical sense we can test for a stable relationship over time by testing the residuals of this regression for stationarity. This is a straightforward procedure using the well-known "unit root test" [22].
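The two-step check just described can be sketched as follows using statsmodels; the DataFrame `prices` and its column names are hypothetical stand-ins for the daily closing levels.

    import statsmodels.api as sm
    from statsmodels.tsa.stattools import adfuller

    def cointegration_check(prices):
        # step 1: regress the level of the FTSE on the levels of the other indexes
        y = prices['FTSE']
        X = sm.add_constant(prices[['DAX', 'CAC', 'EOE', 'SP', 'SMI']])
        fit = sm.OLS(y, X).fit()

        # step 2: unit-root (augmented Dickey-Fuller) test on the residuals;
        # stationary residuals indicate a cointegrating relationship
        adf_stat, p_value, *_ = adfuller(fit.resid)
        return fit.params, adf_stat, p_value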

Fig. 9 shows the levels of the FTSE plotted together with the levels of a basket of DAX, CAC, EoE, S&P, and SMI. The proportions of each of the other indexes are determined by their betas in the regression

$\mathrm{FTSE}_t = \beta_1\,\mathrm{DAX}_t + \beta_2\,\mathrm{CAC}_t + \beta_3\,\mathrm{EoE}_t + \beta_4\,\mathrm{S\&P}_t + \beta_5\,\mathrm{SMI}_t + \varepsilon_t$   (28)

where $\varepsilon_t$ is random noise. It is easy to see that the FTSE is periodically under- or over-valued. This becomes clearer when we plot the residuals of the regression, which show a clear mean reversion (see Fig. 10).


Fig. 9. Cointegration between the FTSE and a portfolio of international indexes—in level terms, the six indexes are individually stochastically trending together over time, but in the long run the system is brought back to equilibrium.

Fig. 10. Residuals from the cointegrating regression.

We test the residuals for stationarity using the unit-root test. We find that the critical value at the 99% confidence level of the stationarity test is 3.5 and that the actual value which we obtain is 5.2. This value is higher than the critical value and hence shows that the cointegration effect is statistically significant.

If cointegration were the only factor at work, then any small discrepancies in prices would rapidly be eliminated by the mean-reversion effect. In fact this would then be indistinguishable from a conventional "no arbitrage" situation in which price discrepancies can be exploited to generate a riskless profit. However, this is clearly not the case in this instance, where significant mispricings can occur and persist for long periods of time. This reflects the fact that economic, financial, political, and industry-specific factors will influence the relative prices of the different markets. The cointegration hypothesis suggests that in spite of these factors the markets will move together in the long term, and that a "mispricing" will cause upward price pressure on the undervalued asset and downward pressure on the overvalued asset, tending to push the prices of the two assets together.

The key difference between cointegration and no-arbitrage relationships is that cointegration is a statistical rather than a guaranteed relationship; the fact that prices have moved together in the past can suggest that they will continue to do so in the future but cannot guarantee that this will in fact happen. Because of this, cointegration does not offer a riskless profit and is perhaps best considered as statistical arbitrage, a term which reflects the similarity to "normal" arbitrage while acknowledging the presence of uncertainty or risk.

If the short-term influence of other factors can disturb the longer-term cointegration, we would hope to be able to model the strength of the cointegration as a function of other events occurring in the financial markets. Table XIII describes a set of candidate variables which might explain short-run deviations from the equilibrium point (i.e., the way in which the residuals fluctuate).

For instance, it is entirely plausible to suggest that a rapid change in oil prices would tend to decouple two cointegrated markets, especially if one belonged to an oil-producing nation and the other to an oil consumer. Unfortunately, traditional linear models fail to capture this type of behavior because the sensitivity of the model to a given factor is constant


TABLE XIII: SUMMARY OF CANDIDATE VARIABLES

and reflected by a single coefficient. In a traditional regression model this coefficient comes to represent the average strength of the relationship between asset returns and a given factor.

However, this situation changes when we consider NN models. The powerful universal approximation abilities of NN's allow them to represent arbitrary nonlinear functions. An important corollary of this is that the sensitivity (or partial derivative) of the model to a given factor can vary arbitrarily as a function of the other inputs to the model. In the case of cointegration, this would allow us to model a relationship which is strongly cointegrated under certain circumstances, weakly cointegrated under others, and perhaps even negatively cointegrated under yet others.

This notion of conditional cointegration becomes particularly relevant when we consider the increasing efficiency of the financial markets. Cointegration effects strong enough to prevail under all market conditions would be relatively easy to detect using linear techniques, and hence would be unlikely to generate excess profits for very long. On the other hand, more subtle conditional cointegration effects are less amenable to linear techniques and hence are less likely to have been eliminated by market activity. For instance, two assets might exhibit an average cointegration which generates insufficient profits, but this might be a composite of a strong (profitable) cointegration under certain circumstances and a weak (or even negative) cointegration under other circumstances. NN's and other nonlinear methodologies provide a tool for identifying and exploiting this type of relationship. From a technical perspective, the concept of conditional cointegration provides an economically and financially feasible framework for applying NN's to financial modeling and forecasting applications.

C. Modeling Strategy

The modeling strategy described here is applicable to markets and instruments where finance theory and/or market dynamics may recognize long-run equilibrium relationships. The objective of the procedure is to 1) verify the presence of cointegration; 2) identify exogenous factors influencing the short-run dynamics of the cointegration; 3) identify the most significant of those factors; and 4) construct a reliable estimator which is robust to influential observations and leveraged data.

There are many ways of dealing with these issues. It is, for example, entirely acceptable to use the modeling strategy described in the previous section of this paper to identify time structure in the cointegrating residuals, exogenous influences, and nonlinear dependencies. However, with this method the criteria for variable selection and hypothesis testing are to a large extent model-dependent. It is often desirable for such criteria to be independent of the actual model. Although not always possible, this is particularly true for models which are susceptible to overfitting. In this section we shall deploy a slightly different procedure for the purposes of illustrating an alternative way of variable selection (which is model-independent) and hypothesis testing. The procedure consists of the following steps.

1) Identifying cointegration: Using multiple linear regression, regress the level of the dependent variable against the levels of the independent variables and test the residuals for stationarity

$y_t = \beta_0 + \sum_i \beta_i x_{i,t} + \varepsilon_t$   (29)

$\Delta\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t.$   (30)

The test for stationarity involves constructing the auxiliary regression in (30) and rejecting the (unit-root) hypothesis that $\rho = 0$. This simple method of identifying cointegration may not be appropriate in all cases (for example, when there is no obvious causal relationship it can produce biased coefficients); nevertheless, since it is one of the simplest available, we retain its use. For more sophisticated tests of identifying cointegration see, for example, [56] and [46].

2) Identifying exogenous influences: The objective of this test is to verify whether short-run deviations from the equilibrium point can be attributed to a vector of exogenous variables such as, for example, those described in Table XIII. Although this can be achieved by using the approach described in Section II of this paper, in this step we shall deploy a slightly different procedure. We start by constructing the following well-specified models of the


cointegrating residuals

(31)

(32)

To test whether incremental value can be obtained by the multivariate linear and neural models, we evaluate their performance on an unseen dataset. For each estimator we test the hypothesis that its performance is significantly better than that of the auxiliary regression in (30). We also repeat the test between the two estimators in (31) and (32). For any two estimators, this is done using the statistic

$z = \dfrac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}$   (33)

where $\mu_1$ and $\mu_2$ denote the respective mean performance of the two estimators and $\sigma_1$ and $\sigma_2$ the corresponding standard errors. The mean performance can be formulated in terms of the correlation between observed and estimated in the test sample; it can also be formulated in terms of economic performance, should the two estimators be used in the context of the same trading strategy. The hypothesis is formulated as H0: $\mu_1 = \mu_2$, in which case the statistic should follow a standard normal distribution, against the alternative H1: $\mu_1 > \mu_2$. (A small sketch of this comparison is given after this list.)

3) Identifying nonlinear dependencies and interactions: The use of an ANOVA-derived F-test of conditional expectations is suggested as an alternative means of preliminary variable selection for NN's. Unlike other model-independent methods such as correlation analysis, it is capable of identifying nonlinearities, either directly, in the relationship between individual explanatory variables and the dependent variable, or indirectly, in the form of interaction effects. Using this approach we refine the models down to a small number of variables. These models both perform better than the original, overspecified, models and also make sense from an economic viewpoint. Again, the nonlinear model performs substantially better out-of-sample than the equivalent linear model.

4) Median learning and robustness: The use of a mean-absolute-deviation cost function leads to an estimator of the conditional median of the dependent variable, which as such is more robust than the conventional estimators of the mean that arise from the use of MSE/OLS. In particular, estimators of the mean and the median will diverge most noticeably under the influence of extreme observations, and such divergence can be taken as an indicator of unreliable predictions. Using a simple rule for combining forecasts, whereby we take a position only if the estimators of the mean and the median agree on the sign of the predicted return, we show that the hybrid system produces very similar out-of-sample returns (approximately 46% over two years) but with substantially reduced risk; the Sharpe ratio of the combined system (representing the risk/return profile) is 2.42, compared to the 1.78 and 1.72 of the individual neural models.
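A sketch of the comparison in Step 2) [see (33)] is given below; perf_a and perf_b are hypothetical arrays of per-window out-of-sample performance figures (correlations or trading returns) for the two estimators being compared.

    import numpy as np
    from scipy.stats import norm

    def compare_estimators(perf_a, perf_b):
        mu_a, mu_b = perf_a.mean(), perf_b.mean()
        se_a = perf_a.std(ddof=1) / np.sqrt(len(perf_a))   # standard errors of the means
        se_b = perf_b.std(ddof=1) / np.sqrt(len(perf_b))
        z = (mu_a - mu_b) / np.sqrt(se_a ** 2 + se_b ** 2)
        p_one_sided = 1.0 - norm.cdf(z)   # H1: estimator A outperforms estimator B
        return z, p_one_sided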

The modeling strategy described above is not necessarily the only, or indeed the most appropriate, procedure. It is, for example, entirely acceptable to use the strategy described in the previous section of this paper in Steps 2) and 3) of the procedure. The two methods are not mutually exclusive, but they differ primarily in the way in which variables are selected for entry into the model. The F-test selects variables in a model-independent way and in many respects is preferable. However, when testing for interaction effects in a large number of dimensions, it requires relatively large samples, which are not always available.

Having applied the first step of our modeling procedure (see Section III-B) and satisfied the cointegration criterion, let us turn our attention to verifying that the cointegration residuals reflect the existence of other factors which affect the markets in the short run.

D. Identifying Exogenous Influences

The cointegration residuals reflect the existence of other factors which affect the markets in the short run. Changes in the cointegration residual reflect changes in the relative return of the FTSE compared to the basket of international indexes [the RHS of (28)]. The evaluation of nonlinear and linear models will be based on a simple trading strategy which takes long/short positions on both sides of (28), based on a prediction of whether the return of the FTSE will be higher or lower than that of the basket portfolio over the subsequent ten-day forecast period.

There is a slight complication to the modeling procedure because the nature of the cointegrating regression itself induces a slight element of mean reversion into the residuals (the in-sample residuals of a regression are, by construction, unbiased). Also, for implementation purposes, the cointegrating relationship should only be estimated using past data. The solution to both of these problems is to use a moving-window regression for the cointegrating relationship and then to generate the (out-of-sample) residuals by applying this model to future data. In our case we use a window of 200 points to estimate the coefficients of the regression [the $\beta$'s in (28)] and reestimate the relationship every 100 points.
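A sketch of this moving-window procedure is given below, again using statsmodels; the `prices` DataFrame and its column names are hypothetical, while the window and step sizes follow the text (200 and 100 observations).

    import numpy as np
    import statsmodels.api as sm

    def rolling_residuals(prices, window=200, step=100):
        y = prices['FTSE'].values
        X = sm.add_constant(prices[['DAX', 'CAC', 'EOE', 'SP', 'SMI']]).values
        resid = np.full(len(y), np.nan)
        for start in range(0, len(y) - window - step + 1, step):
            # estimate the cointegrating weights on the trailing window only
            fit = sm.OLS(y[start:start + window], X[start:start + window]).fit()
            # apply the fitted weights to the *next* block of data so that the
            # resulting residuals are genuinely out of sample
            oos = slice(start + window, start + window + step)
            resid[oos] = y[oos] - X[oos] @ fit.params
        return resid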

The initial step is to verify the existence of the underlying cointegrating relationship itself. Rather than performing a traditional "in-sample" test such as those discussed above, our focus is to test the predictive information contained in the cointegrating residual, as indicated by the generalization ability of the models in the out-of-sample period. We first test a linear model of the form

(34)

Note that modeling the relative price changes on the LHS of (34) is equivalent to modeling changes in the residuals, except during periods where the cointegrating regression is reestimated. We then go on to test for direct nonlinearities by building the model which is the nonlinear generalization of


Fig. 11. Cumulative profit/loss for the initial linear and nonlinear models.

TABLE XIV: OUT-OF-SAMPLE PERFORMANCE OF UNIVARIATE MODELS

(34) given by

(35)

The parameters in (34) and the function in (35) were estimated by gradient descent in the parameter space using a feedforward NN. A moving-window modeling approach was deployed using the first 300 observations for training, the next 100 for cross-validation/early stopping, and a further 100 for out-of-sample testing. The "window" is then shifted forward 100 observations and the process repeated until a total of 500 out-of-sample predictions were available. As the model was built using overlapping daily observations, this is approximately equivalent to a two-year out-of-sample period.

Fig. 11 shows the cumulative returns of the two models in the out-of-sample period of 500 observations (transaction costs of eight basis points are included, which is consistent with the model being executed through the futures markets).

Selected performance metrics for the two models are shown in Table XIV. "Correlation" refers to the standard correlation coefficient of predicted and actual returns, "Correct" represents the proportion of predictions with the same sign as the actual return, "Return" is the profit generated by a simple long–short trading strategy over the two-year period, and "Sharpe Ratio" is the (annualized) Sharpe ratio of the equity curve, which indicates the risk/return profile.

The first thing to notice is that both models show a highly significant predictive correlation between actual and forecast relative returns. This is also reflected in the positive equity produced by each of the two models. Secondly, the overall similarity of the two models, combined with the slightly better performance of the linear model, suggests that there are no significant nonlinearities in this direct relationship and that the slight degradation in the performance of the nonlinear model is due to overfitting. For this reason we will use only the performance of the linear model as a benchmark for the subsequent stages.

The next step was to examine multivariate models. Starting with the set of candidate variables listed in Table XIII, we consider the following two models that might explain the relative returns on the two assets. First, the linear model

(36)

Second, to account for the possibility of 1) unknown nonlinear effects between these factors and the residuals and 2) possible nonlinearities accruing from the interactions between these factors, we also consider a nonparametric nonlinear model with the same independent variables but of the form

(37)

As before, the function and the parameter vector in (37) were estimated by gradient descent in the parameter


Fig. 12. Cumulative profit/loss for the multivariate linear and nonlinear models.

TABLE XV: OUT-OF-SAMPLE PERFORMANCE OF MULTIVARIATE MODELS

TABLE XVI: STATISTICAL AND ECONOMIC SIGNIFICANCE OF RESULTS

space using a feedforward NN, and the same moving-window modeling approach is retained.

Fig. 12 shows the cumulative returns of the two multivariate models.

The performance statistics of these two models are summarized in Table XV.

The results in Table XV indicate that both multivariate models return a profit during the out-of-sample period. Let us now test the hypothesis that the results are significantly different for the different models. We do this by using the statistic in (33) to compare both the correlation and the economic performance of each pair of models. For instance, in comparing the predictive correlation of the multivariate models we use

(38)

The results are summarized in Table XVI. On the whole, the performance of the models is broadly similar, as indicated by the relatively high p-values. The multivariate linear model is not significantly different from the basic model in terms of correlation and actually under-performs in terms of trading performance. Thus any small predictive information which might be contained in the additional variables, in a linear sense, is offset by overfitting. This is consistent with the view that the markets should be efficient with respect to commonly available data applied within a conventional modeling framework. On the other hand, the NN model, while only significantly better in the correlation sense at around an 80% level of confidence, does improve on the other two models from a trading perspective, with a confidence level of 90% against the basic model and 99% against the multivariate linear model.

The much-improved performance of the NN model over the linear model suggests the presence of significant nonlinearities, either in the cointegration itself or in the relationships between the FTSE and the other explanatory variables. Nevertheless, it was our belief that many of the variables were either redundant and/or insignificant and that their presence was more likely to cause problems due to overfitting. The next phase of the study was therefore to perform a variable selection process in order to obtain a parsimonious model of the cointegration.

E. A Methodology for Nonlinear Variable Selection

Within a linear framework there are a variety of statistical procedures which assist in the stages of model identification,


construction, and verification. The model identification stage of these techniques is typically based on correlation analysis. For time-series modeling, for instance, there is the well-known and established Box–Jenkins framework [9], in which examination of the autocorrelation function (ACF) allows the modeller to identify both the order and the type (autoregressive, moving average, or mixed) of the model. Similarly, for multivariate modeling the most commonly used methodology is that of stepwise regression [74], which is likewise based upon the concepts of correlation and covariance.

Currently there is a lack of an established nonlinear model identification procedure. When performing time-series modeling using NN's it is still the norm to use correlation analysis as a preliminary variable selection technique. However, correlation analysis will in many cases fail to identify the existence of significant nonlinear relationships. This might cause variables to be erroneously left out of later stages of modeling and result in poorer models than would be the case had a more powerful identification procedure been used.

Both linear regression models and NN's are commonly fitted by minimizing mean squared error. They provide an estimate of the mean value of the output variable given the current values of the input variables, i.e., a conditional expectation. This insight allows us to realize why measures of dependence which have been developed for linear models are not always suitable for nonlinear models.

In a broad sense, a linear relationship exists if and only if the conditional expectation either consistently increases or consistently decreases as we increase the value of the independent variable. A nonlinear relationship, however, only requires that the conditional expectation varies as we increase the independent variable. The class of functions which would thus be missed by a linear measure but identified by a suitable nonlinear one includes symmetric functions, periodic functions (e.g., sine waves), and many others besides.

We propose a measure of the degree to which the conditional expectation of the dependent variable varies with a given set of independent variables, but which imposes no condition that this variation should be of a particular form.

1) Analysis of Variance or ANOVA: The ANOVA technique is a standard statistical technique, usually used for the analysis of categorical independent variables, which divides a sample into groups and then tests to see if the differences between the groups are statistically significant. It does this by comparing the variability within the individual groups to the variability between the different groups. First, the total variability within the groups is calculated by

$\mathrm{SSW} = \sum_{i=1}^{N} \left(y_i - \bar{y}_{g(i)}\right)^2$   (39)

where $g(i)$ is the group to which the $i$th observation belongs and $\bar{y}_{g(i)}$ is the mean of that group. With $G$ groups we lose $G$ degrees of freedom in estimating the group means, so we can estimate the variance of the data by $\mathrm{SSW}/(N-G)$. Second, the variability between the groups is calculated by

$\mathrm{SSB} = \sum_{g=1}^{G} n_g\left(\bar{y}_g - \bar{y}\right)^2$   (40)

where $n_g$ is the number of observations in group $g$ and $\bar{y}$ is the overall sample mean. Here we lose one degree of freedom for using the sample mean; thus we can also use SSB to estimate the true variance of the data by dividing by $G-1$.

If the different groups represent samples from the same underlying distribution, then SSW and SSB are simply dependent on the underlying true variance. Adjusting for degrees of freedom, each can be used as an estimate of the variance, and the ratio given in (41) follows an $F$ distribution with $(G-1, N-G)$ degrees of freedom

$F = \dfrac{\mathrm{SSB}/(G-1)}{\mathrm{SSW}/(N-G)}.$   (41)

Under the null hypothesis that all groups are drawn from the same distribution, the ratio will be below the 10% critical value nine times out of ten and below the 1% critical value 99 times out of 100, etc. However, if a pattern exists then the between-group variation will be increased. This will cause a higher F-ratio and, if the pattern is sufficiently strong, lead to the rejection of the null hypothesis.

2) Testing a Single Variable: Thus we can perform an ANOVA test to establish whether the variation in the conditional expectation of the dependent variable given different values of the independent variable is statistically significant.

The first step is to choose the number of groups $G$; typically this would be in the range three to ten. Following this, each observation is allocated to the appropriate group by dividing the continuous range of the original variable into $G$ nonoverlapping regions. For a normally distributed variable, using boundary values which correspond to the $1/G, 2/G, \ldots, (G-1)/G$ points of the cumulative normal distribution will cause the number in each group to be approximately the same.

For example, let $x$ be normally distributed with mean ten and standard deviation five, and let $G = 4$; the standard-normal values for 1/4, 2/4, and 3/4 are $-0.675$, zero, and $0.675$. These correspond to $x$ values of 6.625, ten, and 13.375. Group 1 consists of all those observations for which $x < 6.625$; group 2 is those for which $6.625 \le x < 10$; group 3 those where $10 \le x < 13.375$; and group 4 those where $x \ge 13.375$.

The mean value of the dependent variable within each group can then be computed. Under the null hypothesis that the independent variable contains no useful information about the dependent variable, the $F$-ratio $[\mathrm{SSB}/(G-1)]/[\mathrm{SSW}/(N-G)]$ follows an $F$ distribution with $(G-1, N-G)$ degrees of freedom.
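The single-variable test can be sketched as follows using scipy's one-way ANOVA; the arrays x and y and the choice of four groups are hypothetical.

    import numpy as np
    from scipy.stats import f_oneway

    def anova_dependence_test(x, y, n_groups=4):
        # quantile boundaries give approximately equal-sized groups
        edges = np.quantile(x, np.linspace(0.0, 1.0, n_groups + 1))
        labels = np.digitize(x, edges[1:-1])       # group index 0..n_groups-1
        groups = [y[labels == g] for g in range(n_groups)]
        f_ratio, p_value = f_oneway(*groups)       # one-way ANOVA across the groups
        return f_ratio, p_value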

3) Testing Sets of Two or More Independent Variables:Wecan also test sets of variables simultaneously. For instance,with two variables the variation between the groups can bebroken down into one component due to the first variable, onecomponent due to the second variable, and a third componentdue to the interaction between the two, i.e.,

SSB_{total} = SSB_1 + SSB_2 + SSB_{12},    (42)

with (k_1 - 1), (k_2 - 1), and (k_1 - 1)(k_2 - 1) degrees of freedom for the two main effects and the interaction, respectively.

Thus we can use this approach to test directly for both positive interactions, where the two variables together contain


Fig. 13. Cumulative profit/loss for parsimonious models.

more information than separately, and also negative interactions, where some of the information is redundant.
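A brief sketch of this two-variable decomposition, using statsmodels (not necessarily the authors' software), is given below; the main-effect terms and the interaction term each receive their own F-ratio and p-value. The data here are illustrative.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
df = pd.DataFrame({"x1": rng.normal(size=2000), "x2": rng.normal(size=2000)})
# Hypothetical target with a pure interaction effect between x1 and x2.
df["y"] = np.sign(df.x1) * np.sign(df.x2) + rng.normal(scale=2.0, size=2000)

# Bin each continuous variable into four roughly equal-sized groups.
df["g1"] = pd.qcut(df.x1, 4, labels=False)
df["g2"] = pd.qcut(df.x2, 4, labels=False)

# Two-way ANOVA: C(g1) and C(g2) give the main effects, C(g1):C(g2) the
# interaction; the table reports an F-ratio and p-value for each term.
model = smf.ols("y ~ C(g1) + C(g2) + C(g1):C(g2)", data=df).fit()
print(anova_lm(model))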

F. Ex-Ante Evaluation of the Models

From the original set of 12 variables in Table XIII we chose to retain those variables which are consistently the most significant (i.e., those with the largest F-statistics, but also those that retain their significance when we divide the training set into two consecutive parts). According to these criteria the selected variables, at the 99% level of confidence, are the following:

1) the cointegration residual (F-ratio 5.94);
2) the change in oil price (F-ratio 3.87);
3) the change in the sterling index (F-ratio 4.62);
4) the volatility in interest rates (F-ratio 2.76 used alone, but 8.47 allowing for its two-way combination with the cointegration residual).

We then repeated the modeling process using only these four selected variables. Fig. 13 shows the cumulative returns of the resulting linear and NN models for the out-of-sample period.

As can be seen in Fig. 13, there is a marked improvement in the net returns for both models (see also Table XVII). This is reflected in the improved correlations and correctness but also, and most importantly, in the Sharpe ratios of the two models. This suggests that the original models were overspecified and suffered from the problem of overfitting, which is particularly dangerous in high-dimensional, highly noisy data. Intuitively, the reduced models make economic sense because they suggest that uncertainties about short-term interest rates tend to influence the cointegration effect (most likely by distracting the attention of participants from longer term issues such as cointegration). The presence of recent changes in the oil price in the model is perhaps linked to the fact that the U.K. is a net oil producer, whereas the indexes comprising the cointegrating portfolio primarily represent countries which are net consumers of oil. Similarly, the changes in the sterling index reflect changes in the average strength of the U.K. currency against foreign currencies, and it is not surprising that under certain conditions this might influence the relative performance of the FTSE against international equity markets.

TABLE XVII
PERFORMANCE OF PARSIMONIOUS MODELS

Comparing the performance of the network against the linear model, we find that the appropriate test statistics are: correlation, 1.02 (p-value 0.15); and economic, 1.39 (p-value 0.082). While these improvements are not proof of nonlinearities in the statistical sense (i.e., the p-values are too high to reject the null hypothesis of equal performance), they represent confidence levels of over 80 and 90%, respectively, that the nonlinear model has captured interaction effects which are ignored by the linear model.

G. Combining Standard and Robust Estimators

An important issue when using NN's to model noisy data such as asset returns is that of robustness to influential observations. Influential observations are those (groups of) training vectors which, although a relatively small proportion of the sample, have the potential to dominate the characteristics of the fitted function. A main reason for this is the use of the mean-squared-error cost function in training NN's. The effect of using the MSE cost function is to cause the network to learn the conditional expectation of the dependent variable. It may be possible to improve the robustness properties of our trading system by using an NN trained with a mean absolute deviation (MAD) cost function. As opposed to the mean, this causes the NN to learn the conditional median of a function. In contrast to the mean, any given point can only exert a limited influence on an estimate of the median. Thus, moving any single observation to infinity will not drag the median with it, and even changing the sign of an extreme observation would only be expected to affect the median very slightly. More detailed intuitive and theoretical treatments of this and related issues are included in [74] and [12].
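A minimal PyTorch sketch of this contrast is given below (the layer sizes, optimizer, and data are illustrative, not the authors' setup): the same small network is trained once with the MSE cost function, which fits the conditional mean, and once with the MAD/L1 cost function, which fits the conditional median and is far less sensitive to the injected outliers.

import torch
import torch.nn as nn

def train(loss_fn, x, y, epochs=500):
    # Small feedforward regressor; the cost function determines whether it
    # approximates the conditional mean (MSE) or conditional median (MAD).
    net = nn.Sequential(nn.Linear(x.shape[1], 4), nn.Tanh(), nn.Linear(4, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        opt.step()
    return net

torch.manual_seed(0)
x = torch.randn(500, 4)
y = x[:, :1] + 0.1 * torch.randn(500, 1)
y[::50] += 20.0                          # a few influential outliers

mean_net = train(nn.MSELoss(), x, y)     # learns the conditional mean
median_net = train(nn.L1Loss(), x, y)    # learns the conditional median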

In practice, it is unlikely that we would want to largely ignore magnitude information and rely solely on a prediction


Fig. 14. Equity curve for combined strategy.

of the median. In general we might choose to use the two estimates in conjunction, assuming that under normal circumstances our estimate of the mean provides useful information concerning both direction and magnitude, but that we can use our estimate of the median to indicate abnormal circumstances. For instance, if the estimates of the mean and median are substantially different from each other, this suggests that a small number of influential observations are dominating the model at the particular query point. This suggests that we should attach less reliance to the network predictions and consequently either downweight our position or stay out of the market completely. This is the approach which we shall adopt here.

In the first step we train a network using the MAD cost function and the four previously selected variables. Notably, the out-of-sample performance of this model is roughly comparable to that of the NN model trained using the MSE cost function. This is not surprising because, except in the presence of extreme observations, the mean and median are likely to be similar and hence we would expect the two models to be broadly the same.

The next step is to combine the two models using a simple trading strategy—namely that a position in the market is only taken when the two models are in agreement. If the two models produce predictions which are opposite in sign then the system takes a neutral position (i.e., stays out of the market). The equity curve for the combined system is shown in Fig. 14.
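The combination rule can be written in a few lines; the following is a sketch with hypothetical forecasts rather than the study's actual signals. A position is taken only when the MSE-trained ("mean") and MAD-trained ("median") networks agree on direction; otherwise the system stays flat.

import numpy as np

def combined_positions(mean_pred, median_pred):
    """+1 long, -1 short, 0 neutral when the two forecasts disagree in sign."""
    agree = np.sign(mean_pred) == np.sign(median_pred)
    return np.where(agree, np.sign(mean_pred), 0.0)

# Example usage with hypothetical weekly forecasts:
mean_pred = np.array([0.8, -0.3, 0.1, -0.6])
median_pred = np.array([0.5, 0.2, 0.4, -0.1])
print(combined_positions(mean_pred, median_pred))   # [ 1.  0.  1. -1.]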

The performance statistics for both the median network and the combined system are shown in Table XVIII.

In practice, the model combination strategy causes us to take 379 positions (i.e., we are active in the market almost exactly three quarters of the time). Of these trades 66% are correct, which is an improvement over either of the individual networks. Although the overall profit made is very similar to that of the individual networks, it is achieved with fewer trades and a much improved risk/return profile (as shown in the Sharpe ratio). If we adjust the trading strategy for the combined system in such a way as to make the average market exposure match that of the previous models (i.e., allowing for being out of the market roughly one quarter of the time), the Sharpe ratio improves from 2.07 to 2.42, representing almost a 50% improvement on the best individual models and over a 150% improvement on the benchmark cointegration model given by (34). The test statistic for the trading performance of the final system versus that of the basic cointegration model is 2.85, which is significant at a confidence level of over 99.5%. On this evidence at least, it seems that combining the predictions of mean and median does provide useful information about the reliability of the predictions, which can be translated into improved trading performance.

We have described the concept of conditional cointegration and shown how it can be used to deal with nonstationary data and to enhance statistical inference. We have shown that nonlinear methods can explain short-run deviations from the equilibrium point better than their linear counterparts. The key factor contributing to this success is careful problem formulation. This involves the "intelligent use" of financial economics theory on market dynamics so that the problem can be formulated in a way which makes the task of predictive modeling more likely to succeed. In other words, it is much easier to model the dynamics of a (stationary) mispricing than it is to model the dynamics of price fluctuations. In the next section we take this approach one step further and show how such mispricings can be detected in the term structure of interest rates and how they can be exploited, particularly when we have no a priori knowledge of the exogenous variables involved. This approach is complementary to the variable selection approaches described in the previous two sections and is particularly useful in cases where there is a high degree of collinearity between the independent variables.

IV. YIELD CURVE ARBITRAGE IN EURODOLLAR FUTURES

A. Overview

To maximize the return on a portfolio of bonds with different maturities, it is critical to understand how the price differences between bonds of different maturities (i.e., the so-called yield curve) change over time and what factors drive those changes. Recent research using principal components analysis demonstrates that 90–95% of yield curve movements are


TABLE XVIII
PERFORMANCE OF MEDIAN AND COMBINED MODELS

explained by two uncorrelated factors. The first factor accounts for approximately 90% of the variability, the second factor for approximately 5% [49]. Furthermore, these two factors have been identified approximately as (unpredictable) parallel shifts and changes in the slope of the yield curve, which closely follow a random walk. Parallel shifts are largely a function of the long rate, while changes in slope are a function of the spread between the long and short rates. By showing that the correlation between the long rate and the spread is very close to zero, [67] indicated that they are nearly equivalent to the two "unknown" factors postulated by the principal components analysis.

These findings are now applied by major financial institutions in everyday trading. It is therefore reasonable to assume that any arbitrage opportunities arising from discrepancies in how accurately security prices reflect yield curve changes resulting from these two factors are quickly and efficiently "traded away." Thus, the yield curve is essentially arbitrage-free with respect to parallel shifts and changes in slope. However, the residual yield curve movements arising from "unexplained" factors may not be equally well understood by market participants, and there may still exist arbitrage opportunities here. It has been suggested that a third factor is identifiable, and that this "unexplained" factor, responsible for some 2–5% of yield curve movements, is in large part attributable to volatility. Furthermore, several authors suggest

that volatility may be mean reverting. If, therefore, one could construct a portfolio which immunizes against the first two factors, the behavior over time of the third (unknown) factor could be isolated and analyzed. Moreover, if this factor is indeed mean-reverting, one would expect the value of this "immune" portfolio to revert through time. If this reversion is consistent or predictable it would in theory be possible to earn excess returns through arbitrage.3

It is therefore desirable to identify the variables which drive changes in this unknown factor and to construct predictive models of its behavior. Should these changes be attributable to a vector of exogenous variables, this would suggest the use of a methodology similar to that employed in the previous two sections of this paper. However, there is a slight complication in the modeling procedure. In our case it is not clear which, if any, exogenous factors might influence an asset as abstract as the "third factor of the Eurodollar yield curve." We are therefore confined to using a purely time series approach. However, past experience in a variety of financial time-series applications has shown that simply using lagged values of the price as informative variables may lead to models with unstable performance. Another option is to explore the predictive ability of "technical indicators" such as "moving

3Moreover, if this factor is indeed a proxy for volatility, as has been argued, we should be able to extract information about volatility from the yield curve which can be used in a variety of ways (e.g., pricing options) beyond the scope of this paper.


averages" and "oscillators," but since most technical indicators are themselves parameterized, this creates the possibility of "overfitting" the technical indicators themselves by choosing parameter values which are predictive during the training period but which fail to generalize to subsequent data. Instead, we propose an approach whereby we generate a limited set of "technical indicators" (which are unavoidably highly correlated) and then apply a variable selection methodology which explicitly removes the correlations within this dataset. The approach is in many respects complementary to those described in the previous sections and is particularly useful in cases where there is a high degree of collinearity between candidate variables and/or limited sample sizes prohibit the use of analysis of variance.

In this section we describe a study of statistical yield curve arbitrage among Eurodollar futures. A factor analysis confirms that the bulk of changes in Eurodollar futures are accounted for by unpredictable parallel shifts and tilts in the yield curve which closely follow a random walk. However, the third factor corresponds roughly to a flex in the yield curve and shows evidence of a degree of predictability in the form of mean-reversion. We construct portfolios of Eurodollar futures which are immunized against the first two factors but exposed to the third, and use both linear and nonlinear techniques to model the expected return on these portfolios. The third factor is found to be predictable by nonlinear, but not by linear, techniques.

Using daily Eurodollar futures closing prices between 1987 and 1994, we construct a portfolio which is immunized against parallel shifts and rotations of the yield curve. There are weak indications that the value of this portfolio is mean-reverting, which, if correct, gives rise to arbitrage opportunities. We use an NN to model the changes in the portfolio value on a weekly basis. A significant risk-free return can be obtained by actively resetting the portfolio at the predicted turning points.

B. Experimental Setup—Term Structure of Interest Rates and the Yield Curve

Today's value of a portfolio of risk-free interest rate securities is determined by the market interest rates for each relevant point in the future. For example, the value of a portfolio comprising a five-, ten-, and 15-year zero-coupon bond is a function of today's observed five-, ten-, and 15-year spot rates of interest. The "spot" rate is defined as the return demanded by the market for an investment of a given maturity.

The spot rates depend on current (short-term) spot rates as well as anticipated future spot rates, which in turn are a function of market risk premia for different factors and interest rate volatility. If a portfolio contains multiple securities, its value will be a function of the spot rates for all maturities included in the portfolio, or the term structure of interest rates. A common way of referring to the term structure is the yield curve, which plots spot rates as a function of time to maturity. Thus, when a portfolio contains most or all maturities traded in the market, its value will be a function of the yield curve.

Fig. 15(a) shows some examples of different maturity contracts for Eurodollar futures. Overall, our data consist of daily high, low, open, and close prices for Eurodollar futures over the time period August 1987 to July 1994, giving 1760 daily observations in all. The problem of discontinuities is avoided by effectively ignoring any data which cross a changeover point (by setting the "return" to zero for any such period); this is a harsher approach than simply matching the levels of the spliced series and was intended to reflect the possibility that not only the price itself but also the dynamics of the price changes might be discontinuous across the different contracts. The size of each contract is $1 million and at expiry the futures are settled for a cash amount based on the London interbank offer rate (LIBOR) for dollar time deposits of a duration of three months. Prices are quoted in the form of 100 minus the annualized yield; e.g., a price of 91.35 is equivalent to an annualized yield of 8.65%. Each basis point move in the quoted value is equivalent to $25. The margin requirements are $500, or $250 for a spread (long in one future, short in an equivalent future of a different maturity). The small margin requirements allow for very high gearing ($1 million/$500 = 2000) and hence the key target of any trading strategy is not profitability per se but rather consistency and smoothness of the equity curve.

The three different curves shown in Fig. 15(a) depict daily closing prices for the shortest maturity (90 days) contract (close 1), a medium maturity contract (close 2; 450 days), and the longest maturity contract (close 3; 720 days). Note the high correlation in daily movements. This reflects the similarity of the underlying assets: a long or short position in any of these futures is equivalent to a deposit or loan for a given duration, and it would be surprising for the prices to diverge greatly given that the loan durations differ only by three-month increments. It is this property which induces us to search for "statistical arbitrage" opportunities because, although in the short term price anomalies could occur, we would generally expect them to be rectified in due course. This rectification is not guaranteed and is thus a statistical (or risky) proposition rather than a traditional (or riskless) arbitrage condition. Another way of viewing this is that the assets are cointegrated [38] in some way.

Fig. 15(b) shows four examples of the yield curve; the yield curve is constructed at each point in time by joining the prices of all contracts with term to maturity ranging from less than three months at the short end to three years at the long end. Considering cross-sectional "snapshots" of the yield curve on four different dates, we obtain some clues as to the likely structure of the relationships between different contracts. The yield curve can be flat, upward-sloping, or downward-sloping, and may also have a "hump" at some maturities. Samples of these shapes are clearly shown in Fig. 15(b). The yield curve on 12 June 1988 (marked *) is generally flat; the yield curve on 13 July 1987 is upward sloping, and so on. The most striking fact about the curves is simply that they are smooth curves rather than jagged lines. Contracts tend to have a yield which is similar to that of neighboring contracts. Comparing the yield curves on the different dates, we note that they differ primarily in level but also somewhat in slope, and some of the curves are very smooth while others exhibit some kinks and curves. Most of the curves are upward sloping, reflecting the fact that longer term loans are more risky and hence under normal circumstances require a higher premium.


Fig. 15. Sample futures contracts and yield curves. (a) The closing prices of three contracts of different maturities. (b) Different types of yield curve shapes.

Fig. 16. Scree plot for principal components of changes in the yield curve.

In order to better understand the structure of the relationship between different maturity contracts, let us conduct a factor analysis on changes in the yield curve using the method of principal components. The scree plot in Fig. 16 shows the percentage of the total variability which can be attributed to each factor.

The first factor accounts for almost 90% of the total variability. The second factor accounts for just under 8% of the variance. Together, these two factors account for almost 98% of the total variance and thus conceptually our results are largely consistent with theoretical two-factor bond-pricing models such as those in [67] and [41]. However, our analysis also suggests the presence of a third factor which accounts for almost exactly 2% of the total variance. Analysis of the factor loadings (eigenvectors), see Fig. 17, shows that the first factor represents a parallel shift in the curve. The second factor corresponds to a tilt in the yield curve. The loadings for the third factor suggest that it relates to a flex or bend in the yield curve.


Fig. 17. Factor loadings (eigenvectors) for first three principal components.

The results of this factor analysis of Eurodollar futures bear some resemblance to those of Litterman and Scheinkman [49], in which three similar factors were found to be primarily responsible for the price changes of different maturities of U.S. government bonds.

Our expectation is that, due to widespread awareness of the theoretical two-factor models, the markets are likely to be efficient with respect to the first and second of these factors, but that some predictable inefficiencies (such as mean-reversion) might exist with respect to the third factor. This can be tested with a number of tests for "omitted" nonlinearity. Barnett et al. [3] compare a number of such tests, one of which uses the approximating ability of NN's. The modeling strategy we describe in the next section, which attempts to capture the dynamics of this third factor, can itself form the basis of such a test.

C. Modeling Strategy

The purpose of our strategy is to identify opportunities for statistical arbitrage in the sense of predictable price corrections which occur to eliminate short-term anomalies or deviations from long-term relationships. Following the results of our factor analysis, we wish to investigate whether portfolios of bonds constructed specifically to be exposed to a given factor would exhibit a degree of predictability, perhaps in the form of mean-reversion effects. In particular, we expect the markets to be efficient with respect to the first two factors and hence we concentrate our efforts upon modeling the third factor.

The first step in our modeling procedure is to isolate the effects of the first two factors. In other words, using the 12 available contracts c_1, ..., c_12, we wish to construct a portfolio whose value we denote by

P = \sum_{i=1}^{12} w_i c_i

and find weights w_i such that its sensitivity to parallel shifts and tilts of the yield curve is set to zero, i.e.,

\frac{\partial P}{\partial \mathrm{PCA}_1} = \frac{\partial P}{\partial \mathrm{PCA}_2} = 0.    (43)

Since the two principal components PCA_1 and PCA_2 are linear combinations of the 12 contracts, this is easy to do. In

Fig. 18. Immunization to parallel shifts (a) and tilts (b) of the yield curve with a simple butterfly portfolio. If the yield curve shifts up by an amount x, the portfolio appreciates by 2x on the outside positions but simultaneously depreciates by 2x on the center contract. A similar effect is achieved for the tilt.

practice, however, we do not wish to turn over a portfolio of 12 contracts. Due to the high degree of collinearity between adjacent contracts, the same effect can be achieved by a much smaller portfolio with three contracts: at the short, middle, and long end of the yield curve (see, for example, [26]). To avoid liquidity effects at the long end of the yield curve and volatility effects at the short end, we decided to consider only

c_2, c_5, and c_8. Thus, if we construct a portfolio

P = w_2 c_2 + w_5 c_5 + w_8 c_8    (44)

with w_2 = w_8 = 1 and w_5 = -2, we achieve the desired effect. This is illustrated in Fig. 18. By holding the two outside positions in equal sizes and twice the opposite position in the center contract, we simultaneously immunize against both parallel shifts and tilts of the yield curve.
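The immunization property of the 1:-2:1 butterfly can be checked directly: its exposure to the first two components should be approximately zero. In the sketch below the loading vectors are stand-ins for the eigenvectors estimated from the data (a parallel shift loads all maturities equally; a linear tilt loads them in proportion to maturity).

import numpy as np

weights = np.array([1.0, -2.0, 1.0])          # long short-end, 2x short middle, long long-end

# Hypothetical loadings of the chosen contracts (c2, c5, c8) on the first two factors.
shift_loading = np.array([1.0, 1.0, 1.0])     # parallel shift
tilt_loading = np.array([-1.0, 0.0, 1.0])     # tilt

print(weights @ shift_loading)                # 0.0 -> immune to parallel shifts
print(weights @ tilt_loading)                 # 0.0 -> immune to tilts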

Any fluctuations in the value of this portfolio depend solely on changes in the third factor. Fig. 19 depicts the value of the portfolio over time.

The value of the butterfly portfolio appears to be stationary, supporting our original hypothesis that the third factor might exhibit mean reversion. If this were truly the case then it would be possible to generate excess profits using a simple long–short strategy (buy the portfolio if the current value is


Fig. 19. Value (over time) of butterfly portfolio (1x long on contract c2; 2x short on c5; 1x long on c8).

Fig. 20. Value of butterfly portfolio, adjusted for discontinuities.

below average, sell the portfolio when the current value is above average). However, it turns out that the mean-reversion is simply an artifact of the periodic expiry, and consequent shifting along, of the contracts. Adjusting for discontinuities, we obtain the price series shown in Fig. 20 (the series has been inverted so that it is appreciating rather than depreciating in value).

Clearly, the adjusted series is not mean-reverting, but it does appear to exhibit some structured behavior, notably in the persistence of both upward and downward trends.

The second step in our modeling strategy is to attempt to model the dynamics of this price series and to compare neural and linear regression models; however, first it is necessary to identify suitable input variables. Because it is not clear which, if any, exogenous factors might influence an asset as abstract as the "third factor of the Eurodollar yield curve," we are confined to taking a purely time series modeling approach. However, past experience in a variety of financial time-series applications has shown that simply using lagged values of the price as informative variables may lead to models with unstable performance. A better option is to exploit the predictive ability of "technical indicators" such as "moving averages" and "oscillators." However, most technical indicators are parametric; this instantly creates the possibility of "overfitting" the technical indicators themselves by choosing parameter values which are predictive during the training period but which fail to generalize to subsequent data. Instead, we shall adopt the approach of generating a limited set of candidate variables and then applying a variable selection methodology. This provides a set of 16 candidate variables, all of which are derived from the portfolio price. The candidate input "variables" are shown in Table XVIII.

The next step in our modeling procedure is to construct a reliable estimator of weekly changes in portfolio value using the variables in Table XVIII. If we denote the portfolio value at time t as P_t, then our target variable is simply P_{t+5} - P_t. It would appear that there are at least two possible approaches to the variable selection problem as described in the previous sections. However, due to the high degree of correlation and collinearity within the data set in Table XVIII, those techniques are not ideal in this case. The approach we shall follow is one which explicitly removes the correlations within the data set—namely principal components analysis. The scree plot for the principal components analysis of the candidate variables is shown in Fig. 21.

The PCA indicates that much of the information in the 16 variables is actually redundant and that 98% of the total variability within the candidate inputs can be represented by the first six principal components. By simply transforming the data and using the first six principal components as inputs to the models, we reduce both collinearity and model complexity while losing very little information. In fact, by observing the values of the principal components over time (see Fig. 22), we note that the first PC is clearly nonstationary and has captured the overall drift of the portfolio.

Interestingly, the other five PC's are all stationary and appear to be derived from higher frequency components of the portfolio dynamics. On the basis of this we chose to include only the stationary PC's (numbers 2–6) in our predictive


Fig. 21. Scree plot for principal components of all 16 candidate variables.

TABLE XVIII
CANDIDATE "VARIABLES" FOR ESTIMATING P_{t+5} - P_t

models of the third factor in Eurodollar futures. In the next section we use regression and neural analysis to model the weekly changes in this factor.
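The input-preparation step just described can be sketched as follows (names and data are illustrative, not the study's actual indicators): standardize the 16 candidate variables, extract principal components, and retain only the stationary components 2–6 as model inputs, dropping the nonstationary first component which carries the portfolio's trend.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
candidates = rng.normal(size=(352, 16))       # placeholder: 16 derived indicators over time

z = StandardScaler().fit_transform(candidates)
pca = PCA(n_components=6)
components = pca.fit_transform(z)             # columns = PC1..PC6 over time

inputs = components[:, 1:6]                   # keep PC2-PC6, drop the trend (PC1)
print(pca.explained_variance_ratio_.cumsum()) # should reach roughly 0.98 by PC6 on real data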

D. Modeling the Third Factor in the Yield Curve

In this section we construct both linear and neural regression models, using 252 weeks of data for training and the final 100 weeks as a testing period. The results for the regression model in-sample are shown in Table XIX.

TABLE XIX
LINEAR REGRESSION RESULTS (IN-SAMPLE)

These results indicate that the linear model is weak, if not nonexistent: Factor 4, with a t-statistic of 1.71, is (just)


Fig. 22. The charts show the time series which represent the first six principal components of the 16 original variables (a mixture of technical and time series variables).

significant at the 90% level, while the other variables are not at all significant.

Out-of-sample the linear model behaves as might be expected from the statistics above: it predicts the correct direction in 53 out of 100 weeks, and the magnitude of the correctly predicted moves is four basis points more than the magnitude of incorrectly predicted moves. The total volatility of the portfolio is 614 basis points, so the predictive ability of the linear system is clearly insignificant.

To test the hypothesis that changes in this third "unknown" factor might be partially deterministic, but in a nonlinear sense, we use a standard multilayer perceptron. The model uses tanh activation functions and shortcut connections from the input to the output layer; the principle being that the shortcut connections learn the linear component of the input–output relationship while the indirect connections (through the hidden layer) learn the nonlinear component. The number of hidden units is specified a priori as two; this gives a network with a total of 18 parameters—roughly three times the representational capacity of the linear model but still satisfying the heuristic condition that we have more than ten times as many training examples as network parameters (e.g., [33]).
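A PyTorch sketch of this architecture (not the original implementation) is given below: two tanh hidden units plus direct linear "shortcut" connections from the inputs to the output, so the shortcuts can absorb the linear part of the relationship and the hidden layer the nonlinear part. The exact parameter count depends on which bias terms are included, so the total printed here need not equal 18.

import torch
import torch.nn as nn

class ShortcutMLP(nn.Module):
    def __init__(self, n_inputs=5, n_hidden=2):
        super().__init__()
        self.hidden = nn.Linear(n_inputs, n_hidden)          # nonlinear path
        self.hidden_to_out = nn.Linear(n_hidden, 1)
        self.shortcut = nn.Linear(n_inputs, 1, bias=False)   # linear path (shortcuts)

    def forward(self, x):
        return self.hidden_to_out(torch.tanh(self.hidden(x))) + self.shortcut(x)

net = ShortcutMLP()
print(sum(p.numel() for p in net.parameters()))   # parameter count for this variant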

Due to the relatively small size of the NN, its ability to overfit is severely limited and hence we simply train the network to the point where the error is no longer decreasing. In fact, this occurs quite quickly, in around 200 epochs of batch learning. We then employ the technique described in [13] to perform a statistical analysis of the estimated neural model. The in-sample results are shown in Table XX.


TABLE XX
NEURAL-NETWORK RESULTS (IN-SAMPLE)

The model has a predictive correlation of almost 0.4 with an adjusted R² of over 8%; the overall F-statistic for the model (with the full set of 18 potential parameters) is significant at the 99% level of confidence (NB: this is without making any adjustment for the effective number of parameters which, if anything, would increase the significance of the result). The Durbin–Watson statistic is not significantly different from two, indicating that there is no problem with autocorrelated residuals. Within the model, factors 2, 3, and 6 are significant at over 95%; factor 4 presumably still has a linear effect but does not make any use of the additional parameters and hence is insignificant overall; only factor 5 appears to lack any predictive ability. Clearly, the dynamics of the third factor in the yield curve appear to be predictable by nonlinear, but not by linear, techniques.

While the above results are significant in a statistical sense, it is the out-of-sample test which is critical because it indicates whether the results are stable or whether they are invalidated by nonstationarities in the underlying process. The network's performance out-of-sample is very promising: it predicts the correct direction in 60 out of 100 weeks and the magnitude of the correctly predicted moves is 192 basis points more than the magnitude of incorrectly predicted moves. The total volatility of the portfolio is 614 basis points, so the predictive ability of the NN is sufficiently large to be significant.

E. Economic Ex-Ante Evaluation of the Results

In order to evaluate the performance of the NN from a financial rather than a statistical perspective, it is necessary to analyze the properties of the equity curve which the system generates and to compare this to the buy-and-hold performance. The performance should also be evaluated with respect to the other factors involved—such as transaction costs and gearing effects.

Fig. 23 shows the equity curves for the network and the buy-and-hold strategy. The buy-and-hold strategy is to simply maintain a long position in the butterfly portfolio (short contract 2, long 2x contract 5, short contract 8). In contrast, the network strategy alternates between long and short positions in the butterfly portfolio according to the direction of the price change predicted by the network.

The network clearly outperforms the buy-and-hold strategy. During the out-of-sample period (roughly July 1992 to July 1994) the network continues to be profitable, making almost 200 basis points, while the performance of the buy-and-hold is almost flat. The added value of the NN is even more apparent if we look at the chart of outperformance over time [see Fig. 23(b)]. In fact, the ratio of average weekly return to standard deviation of returns is 0.270 in-sample and 0.255 out-of-sample; testing against the null hypothesis that the true average return is zero, we find t-values of 4.32 and 2.55, respectively. Thus both in-sample and, more importantly, out-of-sample performance is significantly better than zero at the 99% level of confidence.
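These t-values appear to follow from the standard test of a zero mean return: if \bar{r} is the average weekly return, s the standard deviation of weekly returns, and n the number of weeks, then

t = \frac{\bar{r}}{s/\sqrt{n}} = \frac{\bar{r}}{s}\sqrt{n},

so that, approximately, 0.270\sqrt{252} \approx 4.3 in-sample and 0.255\sqrt{100} = 2.55 out-of-sample, consistent with the figures quoted above.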

The results so far are interesting from a theoretical and modeling perspective; however, the true test lies in whether they can be used in a practical setting to "beat the market." Do they provide further evidence against the EMH? This requires us to calculate the returns which the system can generate in financial terms, taking into account transaction costs and an appropriate level of gearing. It is only really meaningful to measure the performance during the out-of-sample period. This consists of 100 weeks of data from July 1992 to July 1994. First, let us review the various costs and returns involved. The transaction cost is £1 sterling per contract, which we approximate in dollar terms as $1.50. The value of a price change in a contract is $25 per basis point. The largest drawdown during the in-sample period is approximately 100 basis points, or $2500. We will assume an initial account of $5000, as this would require a drawdown twice as large as the historical maximum in order to wipe out the account.

During the out-of-sample period the system makes total profits of 192 basis points, or $4800; the number of transactions is 32 caused by changing predictions plus 7*2 = 14 caused by switching contracts, or 46 in all; each transaction includes buying or selling four contracts (the constituents of the butterfly portfolio), giving a total cost of 46 * 4 * $1.50 = $276. Thus during the out-of-sample period, the system makes profits of $4800, net of $276 costs, giving net profits of $4524 on an account of $5000 in a period of just under two years. This equates to a rate of return of approximately 47% per year. Note, however, that this figure is sensitive (both upwards and downwards) to the level of gearing which is assumed.

In summary, the factor analysis of the Eurodollar yield curve from August 1987 to July 1994 indicates that the first two factors, which can be viewed as a shift and a tilt of the curve, respectively, jointly account for just under 98% of the total price variability. This is consistent with the two-factor bond-pricing models in the theoretical literature. However, a third factor, which the factor loadings identify as a flex of the yield curve, is also indicated. This third factor represents almost exactly 2% of the total price variability.


Fig. 23. (a) Equity curves for the network and holding strategies. The region to the right of the vertical line (i.e., the last 100 weeks) constitutes the out-of-sample period. (b) Relative outperformance.

Assuming that the markets are efficient with respect to the previously known first two factors, we attempted to model changes in the third factor. A butterfly portfolio was constructed which is immunized against shifts and tilts in the yield curve but exposed to the third, flex, factor. A time series modeling approach was adopted using technical indicators, such as simple oscillators and moving averages, as informative variables. In order to avoid overfitting the parameters of the technical indicators while also avoiding excessive intercorrelations of the variables, a variety of indicators were generated and subjected to principal components analysis. The first six factors were found to account for 98% of the variability in the original set of 16 variables. Of these, the first factor represented the underlying trend of the portfolio and, being clearly nonstationary, was excluded from the subsequent modeling procedure. The second through sixth principal components were used to build both linear and nonlinear predictive models of price changes in the butterfly portfolio.

The linear model was found to be of marginal statistical significance at best, and this was reflected in an out-of-sample test with results which were no better than random. The NN model, however, was found to be significant overall, with no less than three variables being significant at the 95% level of confidence. The out-of-sample test showed the NN to generate consistent profits which were statistically significant at the 99% level. Translated into cash terms, and allowing for transaction costs, the out-of-sample performance was found to result in an average annual return of 47% per year, at a fairly conservative level of gearing. This is sufficient evidence to conclude that the third factor of the Eurodollar yield curve appears to be predictable by nonlinear, but not by linear, techniques.

So far we have used a modeling approach which attempts to develop estimators for expected returns in the context of (3). We have argued that careful problem formulation is in many cases a key to success. In the next section we take this approach one step further and argue that even if security price fluctuations are largely unpredictable, it is possible that investor behavior may not be. By using a simple model of investment strategy performance and switching in and out of a simple investment style, we show that it is possible to produce excess returns.

V. META-MODELS OF INVESTOR BEHAVIOR

A. Overview

The target of traditional predictive modeling approaches has been price fluctuations of "real" assets (e.g., stocks, bonds, derivatives, etc.). These assets can be purchased or sold and their prices can be observed on the market. The motivation


behind the "metamodeling" we describe in this section is that if asset prices cannot be forecast, there might nevertheless be a way to predict the performance of particular investment (or fund management) styles and strategies. The basic idea behind "metamodeling" is that an investment strategy (or a trading rule) can be dealt with in much the same way as an asset because it can be seen as a synthetic asset. The return of this synthetic asset is the profit and loss made by the strategy when applied to an underlying asset or a portfolio of (underlying) assets. Consequently, metamodeling handles a class of objects (i.e., investment strategies) which are abstracted from the "real" assets. The returns of these strategies are not directly observable in the markets, but have to be computed by the modeller; they correspond to the equity that a virtual investor would accumulate if he were using the strategy to take a position in the underlying "real" assets.

The concept of analyzing investment strategy performance is not entirely new. It has been used in one form or another by several researchers and practitioners. Active investment style management, which is a specific case of metamodeling, has become increasingly popular among practitioners (see, for example, [23]). Of particular relevance to our approach is the work of Palmer et al. [54] and Refenes and Azema [60]. Using a large scale simulation in an evolutionary framework, Palmer et al. describe a model of a stockmarket in which independent adaptive agents can buy and sell stock on a central market. The overall market behavior is therefore an emerging property of the agents' behavior. The simulation starts with a predefined set of investment strategies in which investors make their decisions on the basis of technical and fundamental "analysis." Through an evolutionary process of mutation (i.e., random alteration of the decision rules) and crossover (i.e., creation of new investors by systematic combination of existing ones), the study attempts to emulate the way in which markets evolve as a direct result of investor behavior. The analysis also identifies the type of variables that successful investors observe in order to generate excess return. Typical variables which enter the decision process are changes in the stock's fundamental value (measured by an appropriately discounted series of expected future dividends), a technical indicator signifying mean-reversion in prices (measured by an oscillator), and a technical indicator signifying trends in the price of the underlying (measured by a moving average crossover).

Refenes and Azema [60] describe an approach for dynamically selecting trading strategies from a predefined set of trend-following and mean-reversion based trading strategies. The key idea is to predict (on the basis of past performance) which of two alternative strategies is likely to perform better in the current context. The expected returns of the two strategies are conditioned on their relative prediction error. If the error in one of the two strategies is decreasing while the error in the other is increasing, the metamodel switches strategies. In many respects this approach is analogous to a hidden Markov model, and it does not make use of other potentially informative variables that could explain the performance of a particular strategy. The approaches of Palmer et al. [54], [60] and our own are not mutually exclusive. Consequently, it would be desirable to develop nonlinear models of investor behavior

which take advantage not only of the time structure (e.g., cyclicity) and historical performance of a trading strategy but also make use of exogenous variables. It is entirely plausible that the expected returns of a particular investment style may be conditioned not only on a single measure of fundamental value, as suggested by [54], or on historical performance alone, as suggested by Refenes and Azema [60], but also on the current economic environment (e.g., interest rates) as well as market expectations (e.g., estimated growth of earnings per share), or they may vary according to the country or sector in which the investment style is applied.

In this section we take these approaches one step further by combining results from [54] and [60] and introducing other variables suggested in the literature. We utilize a stepwise procedure for modeling investment strategy returns which builds upon these results and also on the results described in earlier sections on model identification and variable selection. Within this framework, we construct a metamodel of investor behavior using a dataset composed of financial data on 90 French stocks drawn from the SBF250 index. We compare two metamodels of investment style performance: a linear regression and a neural model. Both models provide small but consistent incremental value, with the NN outperforming its linear counterpart by a factor of two. This suggests the presence of nonlinearities, probably induced by the interactions between some of the technical, economic, and financial variables, which are undetectable by standard econometric analysis.

B. Experimental Setup—Modeling Investor Behavior

One of the dominant themes in the academic literature since the 1960's has been the concept of an efficient capital market [27]. According to the EMH, security prices fully reflect all available information and therefore cannot be forecast. However, if asset prices are unpredictable, investor behavior might not be. In other words, if asset prices cannot be forecast, there might nevertheless be a way to model the performance of fund management styles and investment strategies. In that context, metamodeling is useful because it models an element of investor behavior.

From a more practical point of view, metamodeling provides a method of quantifying the conditions under which a certain investment strategy should be used. Let us consider, for example, a commonly used investment strategy based on screening, whereby the professional fund manager dynamically maintains a portfolio of "value stocks," i.e., composed of, say, the 10% of stocks having the lowest price/earnings (PE) ratio. The investment strategy can be seen as a synthetic asset that has a dynamic composition but a single and constant characteristic (e.g., "low PE"). In contrast, a stock with a low PE ratio has a fixed composition but no constant and single characteristic. It encompasses a collection of aspects: for example, Air Liquide has a large capitalization; it is also a blue chip stock with low volatility, a high PE, and an average dividend yield. Modeling such a stock in isolation would give very little information about the behavior of high/low PE stocks, but modeling the performance of the fund manager who does have a constant


Fig. 24. Asset price and P&L of a trend-following strategy. The P&L can be seen as a cyclicity indicator: negative values correspond to sideways movements, positive values to trending periods.

characteristic might be more predictable. It is entirely plausible that the market may reward low PE stocks under certain economic conditions but not under others. Variations in the performance of such a manager may therefore be attributable to factors which are exogenous to his decision making. If it were possible to develop a predictive model of the performance of the investment style, then excess returns could be made by buying the portfolio recommended by the strategy when the prediction is positive, and selling the portfolio when its expected return is negative.

In this section, we describe a methodology for developing such predictive models. The modeling procedure consists of two steps. The objective in the first step is to define a universe of commonly used investment strategies and to quantify their expected returns. Since these returns are not directly observable in the market, we need to compute them from historical data; they correspond to the equity that a notional investor would accumulate if he were using the strategy(ies) to take a position on the underlying assets. This is easily done by simulating the notional investor. The second step consists of building a metamodel of these expected returns, which are conditioned on a number of variables which are exogenous to the investment style, and selecting those variables that are the most significant (in the statistical sense) in explaining variations in the investment styles' performance. By treating these investment styles as synthetic assets, it is easy to see how they can be used in the framework of active portfolio management as described in (3). However, since the number of possible investment styles is practically unlimited and the purpose of this section is to describe the methodology, we shall focus our study on the second step of the procedure. To facilitate clarity we shall, without loss of generality, restrict our analysis to a single (and rather simple) investment style, whereby the notional fund manager invests in a set of underlying assets by utilizing a cyclicity indicator. There are several indicators that can be used to measure cyclicity, such as the variance ratio [50], drift-stationarity tests inspired by the unit root test [22], and various measures based on spectral analysis. These measures suffer from two limitations. First, the amount of data required to measure cyclicity is prohibitively large. Second, these measures are sometimes only remotely related to the equity that an investor would accumulate with a trading strategy based upon them.

In this study we shall adopt a simple measure of cyclicity which is directly related to the equity that an investor would accumulate if he were using the spread between two moving averages (i.e., a moving average crossover) to trade the underlying (see [5] for a rationalization). To do so, he uses a simple decision rule whereby, given a time series of prices on the underlying asset, he computes two moving averages at each point. If the short moving average intersects the long moving average from below he purchases the underlying, and conversely if the short moving average intersects from above he (short) sells the underlying.
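A pandas sketch of this cyclicity measure is shown below (window lengths and data are illustrative): the per-period profit and loss of a notional investor who is long when the short moving average is above the long one and short when it is below.

import numpy as np
import pandas as pd

def crossover_pnl(prices, short_window=3, long_window=12):
    """Per-period P&L of a simple moving-average crossover rule."""
    short_ma = prices.rolling(short_window).mean()
    long_ma = prices.rolling(long_window).mean()
    position = np.sign(short_ma - long_ma).shift(1)   # +1 long, -1 short; lagged one period
    return position * prices.diff()                   # marked-to-market P&L

# Example with a hypothetical price series:
prices = pd.Series(np.cumsum(np.random.default_rng(0).normal(size=300)) + 100)
pnl = crossover_pnl(prices)
print(pnl.sum())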

Fig. 24 shows the prices of the underlying asset over 300 days (single line), as well as the weekly returns (i.e., profit and loss) of the notional investor (shaded area). This simple investment style performs quite well when the underlying asset is in a state of trend but rather poorly when the underlying is moving sideways. Notably, there are persistent periods of trending followed by persistent periods of mean reversion. This is reflected in the cyclicity of the manager's expected return (shaded area).

In an earlier paper, Refenes and Azema [60] described a technique for alternating between investment strategies as the market changes from trending to oscillatory behavior. The key idea is analogous to a hidden Markov process, whereby the underlying asset is modeled as the composite of two distinct states. The decision to switch strategies is made on the basis of monitoring the error (i.e., profit/loss) that each state model is producing and choosing the one in which the error is decreasing. There are two main disadvantages of the approach: first, the measures of profitability may not be sufficiently responsive. For example, the underlying might have entered a trending period for some time already, while the measure might still indicate some mean reversion due to the delays in constructing the averages. Essentially the approach is one of error tracking rather than predictive modeling.

The second reason is more important and relates to the inability of the approach to make use of exogenous variables


that might be useful in explaining variations in the performance of the investment style. For example, it is entirely plausible that trending periods are induced by extreme moves in interest rates, while sideways moves may be induced by frequent revisions of earnings expectations for the underlying by financial analysts. It is therefore desirable to develop predictive models of investment style performance which make use of such exogenous variables.

We examine monthly variations in the performance of the investment style applied to a universe of 90 French stocks drawn from the SBF250 stock index. The dataset covers the period from January 1988 to January 1995. We select our universe of exogenous variables on the basis of their availability to the investor at a specific point in time and the variables' relevance to explaining variations in the investor's performance.4 We start with a set of 16 variables (see Table XXI) that could potentially have an influence in explaining variations in a fund manager's performance. We then select those that are significant in explaining variations in the performance of this particular investment style through a simple procedure described in Section V-C.

Among others, Elton et al. [24] found that the performance of a stock might be influenced by its growth characteristics. High growth stocks might move in trends more often than low growth ones. The growth characteristics of the underlying might in turn influence the performance of the investment style. These are captured by five variables (see Table XXI). The EPS growth rate is the growth rate based on a least squares calculation of moving four-quarter actual earnings over the latest 20 quarters. The stability of actual EPS (i.e., the stability in detrended earnings) is likely to be a key factor, and it is computed as the mean absolute difference between actual EPS and the trend line in EPS growth (the previous variable) over the past 20 quarters. A large increase or decrease in the market's expectations for earnings growth might trigger a trend. This effect is captured by two variables: the current year's expected percentage growth in EPS (the analysts' consensus estimate of this year's EPS relative to the actual EPS of the last reported fiscal year, expressed as a percentage), and likewise the one-year-ahead expected percentage growth in EPS, which is also based on analysts' expectations.

It is possible that extreme movements in a stock's expansiveness might provide useful information in triggering trending periods. Several authors have linked the performance of a stock to financial ratios. For example, Fama and French [28] found that the book/market value of equity ratio and firm size, measured by the market value of equity, emerged as variables that have strong relations with the cross section of average stock returns during the 1963–1990 period. Five variables are used to measure expansiveness: earnings yield, both in absolute terms and relative to the market as a whole; the interaction between earnings yield and earnings growth as the single variable earnings_yield * earnings_growth; and

4Some of the variables are computed from data provided by Datastream; others are computed from data provided by I/B/E/S (Institutional Brokers Estimate System).

the commonly used ratios cash_flow/price, dividend_yield, and price/book_value.

To capture any nonlinear effects that are due to the size of a stock, its market capitalization is used as an additional variable. Finally, the following two variables are used to capture any mean-reversion effects that frequent revisions of expected earnings might induce: the percentage of EPS estimates raised and the percentage of EPS estimates lowered over the past four months. They are computed from data provided by I/B/E/S.

It has also been argued that the expected returns of a particular investment strategy might be conditioned on the current economic environment, or may vary according to the country or sector within which the investment style is applied [23]. To capture any effects due to the economic environment we use two variables: the yield on the ten-year French government bond is used to capture long interest rate effects, while the three-month interest rate is used to capture any dependencies on the short rate. We use four dummy variables to specify the sector to which a stock belongs: cyclical is a dummy variable encoded as one for stocks in the chemical, building materials, autos and equipment, and basic industries and encoded as zero elsewhere; noncyclical is similarly encoded as one for stocks in the food, household products, retailer, and utility industries and zero elsewhere; growth is encoded as one for stocks in the cosmetics, drugs, electronics, entertainment, hotels and restaurants, computers, and publishing industries and zero elsewhere; and financial is encoded as one for financial institutions and zero elsewhere. To account for country effects we use the leading indicator for France, as published by the OECD, over the past two years.

It is, however, possible that further exogenous factors may be at work, such as, for example, financial gearing, exposure to international competition, etc., which are not accounted for in the set of variables above. Indeed, the debate as to which variables have the strongest influence on explaining stock returns is still raging in the finance literature (see, for example, [2], [6], and [28], among others). In our case it is not feasible to include all possible variables in our model, purely for reasons of data availability, but to account for any residual effects we shall use lagged values of the target variable: past cyclicity is simply the strategy's realized profit and loss over the preceding period, and lagged past cyclicity is the same measure lagged by a further period. We shall also include the relative difference between two moving averages. A large value (in absolute terms) for this difference means that the stock price has moved strongly in one direction. It is possible that this might trigger trending periods.

The dependent variable is the (marked-to-market) profit an investor would make by using a trend-following strategy on the relative price of each stock over the coming 12 months. The crossover of two moving averages constitutes the buy or sell signal: a long position is taken in the stock after a buy signal, and a short position is taken after a sell signal. The lengths of the moving averages are three and 12 months, respectively. All variable values are expressed in prices which are relative to the market. The reason for this is that the procedure is evaluated against a benchmark, i.e., the market. Besides, positions on the market as a whole can easily be taken through futures contracts. Finally, much of the noise in individual stock prices is due to the market itself; taking relative prices filters this noise out.



TABLE XXI
SUMMARY OF ALL VARIABLES

The fact that the performance of an investment style is not constant over time has two possible explanations. The first is randomness. After all, the strategy's performance is a result of the fluctuations in the underlying stock prices. If these fluctuations are random, there is no reason why investment performance should not be random. However, it is also possible that under certain conditions, stock prices are more likely to be in a certain state than in the alternative one. For instance, it is entirely plausible that stocks of companies that are highly exposed to international competition move in trends for extreme exchange rates between their domestic currency and the dollar. Another example is provided by companies for which earnings prospects are difficult to forecast and are frequently revised by financial analysts; their stock prices are more likely to move sideways. This notion that cyclicity in the performance of an investment strategy is influenced by exogenous variables, rather than merely caused by the randomness of the stock price, suggests that a model of a strategy's performance could be built. The existence of such a model could then reject the first explanation, i.e., that investment strategy performance is a random series.

Unfortunately, traditional linear models are only weak tests of this hypothesis. Failure to find any linear relationship between cyclicity and potential explanatory variables does not imply that there is no relationship at all. Trending periods might well be triggered by a combination of factors. These interaction effects cannot be explored with a linear model unless they are explicitly taken into account by a composite variable. However, such explicit modeling is only possible if strong a priori knowledge is available; often in financial forecasting this is not the case. Besides, an extensive search over interactions is practically impossible, especially when the number of potential explanatory variables is large. NN models do not suffer from such insufficiencies. Their powerful universal approximation abilities allow them to model nonlinear functions, in particular conditional ("if") relationships. In other words, the influence of some variables in the model can vary arbitrarily and may be a function of other variables in the model. Therefore, NN models constitute far stronger tests for rejecting relationships than linear models, but due to overfitting they can easily be misled into finding relationships that merely reflect a temporary market anomaly.

Furthermore, this experimental setup gives us the opportunity to test several beliefs prevailing among some investors. For instance, we investigate whether constantly using a trend-following strategy is profitable. We also test whether past performance of a strategy is related to future performance. In this framework of market efficiency, the concept of conditional cyclicity becomes particularly relevant. Constant stock cyclicity would be relatively easy to detect by traditional methods, and therefore would be rapidly discounted in the market. In contrast, conditional cyclicity is more difficult to model and therefore might provide an economically feasible framework for applying nonlinear methods such as NN's to financial modeling.

C. Modeling Strategy

Our modeling strategy consists of three basic steps. The first step is essentially a data preparation step for the dependent variable. The second step attempts to identify time structure in investment strategy returns, while in the third step we attempt to obtain incremental value by testing the hypothesis that these returns are conditioned on exogenous variables.

1) Simulating investment strategies: Unlike prices of real assets (e.g., stocks, bonds, currencies, derivatives, commodities, etc.), returns from an investment strategy are not directly observable on the markets, and have to be calculated by a simulation procedure. In order to generate the time series for the profit and loss of the strategy, it is necessary to use historical data to simulate the investment strategy and obtain its marked-to-market returns

    R_t = f(investment_style, asset)                                          (43)



where f is a function of the particular investment style applied to a particular (group of) underlying asset(s). In the case that R_t is explicitly quoted, this step can be omitted by using publicly available data. In the study presented here, for each stock in the index, we need to generate the series of returns that the investor would have obtained had he applied the strategy to each stock. The procedure for doing so is given in Section V-D.

2) Identifying time structure in investment strategy returns: Using univariate time series analysis, investigate the time structure of the returns and identify significant lags

    R_{t+1} = β_0 + β_1 R_t + ... + β_p R_{t-p+1} + ε_t                       (44)

    R_{t+1} = g(R_t, ..., R_{t-p+1}; w) + ε_t                                 (45)

In principle, the Box–Jenkins identification procedure can be used to identify time structure for the linear model. In practice, however, because the returns are often serially correlated, one has to be cautious when fitting ARIMA models. In Section V-E, we describe a simple ad hoc procedure for overcoming this difficulty.

3) Identifying exogenous influences: Construct well-specified models of conditional cyclicity for each of the investment styles/instances under consideration

    R_{t+1} = β_0 + β′X_t + ε_t                                               (46)

    R_{t+1} = g(X_t; w) + ε_t                                                 (47)

where X_t is the vector of exogenous variables defined in Table XXI and w is the vector of network parameters. In principle, this can be done using any of the variable selection methodologies described in the previous sections, but in this application the significant exogenous variables are selected using a combination of linear and nonlinear tests, as described in Section V-F, to overcome some special difficulties arising from serially correlated data.

The estimates of expected returns given by the metamodels in (46) and (47) can be used in the context of (3) in the obvious way. For simplicity, however, in this section we shall use a simple asset allocation strategy whereby we purchase the investment strategy if our estimate of the future return is positive and sell the strategy otherwise.
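As an illustration of the form of (46) and (47) and of this allocation rule, the following Python sketch fits a linear and a small one-hidden-layer neural metamodel to the exogenous variables; the placeholder data, network size, and training options are assumptions, not those of the study.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4800, 11))   # placeholder for the exogenous variables of Table XXI
    y = rng.normal(size=4800)         # placeholder for the strategy's 12-month P&L

    linear_meta = LinearRegression().fit(X, y)                      # cf. (46)
    neural_meta = MLPRegressor(hidden_layer_sizes=(5,),             # cf. (47): g(X; w)
                               early_stopping=True, validation_fraction=0.1,
                               max_iter=2000, random_state=0).fit(X, y)

    # Simple allocation rule: buy the investment strategy when the predicted
    # return is positive, sell it otherwise.
    position = np.where(neural_meta.predict(X) > 0, +1, -1)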

Because some of the variable distributions have heavy tails and consequential outliers, the estimated parameters might be strongly influenced by a small number of observations. There are many ways to deal with such influential observations, including the robust estimators described in Section III. In this section, however, we shall use a procedure based on ranking as an alternative. Therefore, we shall also rank the data in each month and estimate the models on this ranked data. At each point in time, instead of using the value of the P&L directly, we use it to rank the underlying assets with respect to the performance of the investment strategy. We then regress this rank against the similarly computed ranks of the underlying assets with respect to the independent variables, i.e.,

    Rank(R_i) = β_0 + β_1 Rank(x_{1,i}) + ... + β_k Rank(x_{k,i}) + ε_i       (48)

where x_j is the jth exogenous variable. Variables are selected if they are significant in both the rank-based and the nonranked models. We apply both a multiple regression and a neural model and compare the results to investigate nonlinearities in the relationship between cyclicity and its explanatory variables.
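A possible implementation of the monthly ranking and of the rank regression (48) is sketched below in Python; the data-frame layout and column names are assumptions.

    import statsmodels.api as sm

    def rank_regression(df, target="pnl", factors=("div_yield", "eps_growth", "ma_spread")):
        """Regress the cross-sectional rank of the strategy P&L on the ranks of
        the candidate explanatory variables, cf. (48)."""
        cols = [target, *factors]
        ranked = df.copy()
        ranked[cols] = ranked.groupby("month")[cols].rank()   # rank within each month
        X = sm.add_constant(ranked[list(factors)])
        return sm.OLS(ranked[target], X).fit()

A variable would then be retained only if it is significant both in this ranked fit and in the corresponding fit on the raw data.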

D. Simulating Investment Strategies

In order to generate the time series of the profit and loss of the strategy, a number of steps are necessary. First, we generate the buy/sell signals recommended by the strategy for each asset at each point in time. These signals, s_t, are defined by

    s_t = +1  if MA_3(t) > MA_12(t)
          −1  if MA_3(t) < MA_12(t)
           0  if MA_3(t) = MA_12(t)                                           (49)

where MA_n(t) = (1/n) Σ_{i=0}^{n-1} p_{t-i}, p_t being the price of the asset at time t and n being the length of the moving average in months. From this we calculate the marked-to-market returns r_t from the strategy

    r_t = s_{t-1} (p_t − p_{t-1}) / p_{t-1}                                   (50)

Finally, we calculate the profit and loss (P&L) of the strategy over the last 12 months

    P&L_t = Σ_{i=0}^{11} r_{t−i}                                              (51)
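Equations (49)–(51) translate directly into a few lines of Python; the sketch below assumes a monthly relative-to-market price series and uses placeholder data.

    import numpy as np
    import pandas as pd

    def simulate_trend_strategy(rel_price, n_short=3, n_long=12, horizon=12):
        """Marked-to-market P&L of the MA-crossover strategy, cf. (49)-(51)."""
        ma_short = rel_price.rolling(n_short).mean()
        ma_long = rel_price.rolling(n_long).mean()
        signal = np.sign(ma_short - ma_long)             # (49): +1 long, -1 short, 0 if equal
        ret = signal.shift(1) * rel_price.pct_change()   # (50): marked-to-market return
        pnl = ret.rolling(horizon).sum()                 # (51): P&L over the last 12 months
        return signal, ret, pnl

    # Placeholder relative-to-market price series (120 months).
    rel_price = pd.Series(np.cumprod(1.0 + 0.01 * np.random.default_rng(0).normal(size=120)))
    signal, ret, pnl = simulate_trend_strategy(rel_price)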

In the first step of our modeling procedure we investigate whether the constant use of this strategy produces excess returns. Indeed, if we apply this simple trend-following strategy to a cross section of stocks drawn from the SBF250 it yields a small but significant excess return. The annual average excess profit after transaction costs (1% per transaction) is 2.4%. The standard error for this average is 1.1%. The average return is therefore different from zero at a 90.2% significance level over this period. The objective of our metamodeling strategy is to improve on this performance by switching in and out of the investment strategy based on the predictions of our models. We start by building a simple univariate metamodel of the performance of this investment strategy.

E. Identifying Time Structure in Investment Strategy Returns

In principle, the Box–Jenkins methodology can be used to estimate the time structure of the investment strategy returns. In practice, however, because the P&L series are serially correlated by construction, one has to be cautious



TABLE XXII
F- AND t-VALUES FOR VARIABLES APPLIED TO RANKED AND UNRANKED DATA

when fitting ARIMA models. The problem arises from the fact that although we have 4800 observations, due to the 12-month overlap the actual independent observations number at best 400. Since we have 90 underlying assets, effectively there are fewer than five independent observations per stock. This is not suitable for traditional ARIMA fitting. Instead, we shall use cross-sectional data and regress the future performance of the strategy on only a limited number of lags, chosen for practical reasons not to exceed two. Since, due to the overlaps, we cannot use the in-sample R² and t-statistics⁵ to evaluate the predictive ability of the models and/or the significance of the parameters, we chose a simple alternative based on out-of-sample testing. The models

    P&L_{t+12} = β_0 + β_1 P&L_t + β_2 P&L_{t−1} + ε_t

    P&L_{t+12} = g(P&L_t, P&L_{t−1}; w) + ε_t                                 (52)

with g being a function approximated by neural learning, and w the vector of parameters, i.e., weights. The models are estimated on 80% of the dataset and are then tested on the remaining 20%. For the neural model, a further validation set (10% of the in-sample observations, randomly chosen) is used to control the complexity of the model through early stopping. The out-of-sample correlations between actual and predicted returns are 0.02 and 0.003 for the regression and neural models, respectively. The frequency of change in sign is on average once every two years for both models. The additional profit obtained by switching the strategy on and off on the basis of its predicted performance is therefore 1.37% and 0.94% for the regression and neural model, respectively. The funding costs for switching in and out of the strategy make this active strategy management approach less profitable than the passive (i.e., buy-and-hold) investment strategy. Apparently neither of the two models is profitable. However, it is entirely plausible that the expected returns of this investment strategy may be conditioned not only on historical performance alone, but also on exogenous influences attributable to the current economic and market environment. In the next section we attempt to verify this possibility.
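A sketch of this estimation scheme for (52) is given below; the chronological 80/20 split and the 10% validation fraction follow the text, while the placeholder data and the network size are assumptions.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(1)
    X = rng.normal(size=(4800, 2))   # placeholder lagged P&L values, cf. (52)
    y = rng.normal(size=4800)        # placeholder future 12-month P&L

    split = int(0.8 * len(y))        # 80% estimation, 20% out-of-sample test
    X_in, X_out, y_in, y_out = X[:split], X[split:], y[:split], y[split:]

    linear = LinearRegression().fit(X_in, y_in)
    neural = MLPRegressor(hidden_layer_sizes=(4,), early_stopping=True,
                          validation_fraction=0.1,   # 10% of the in-sample data for early stopping
                          max_iter=2000, random_state=0).fit(X_in, y_in)

    for name, model in (("regression", linear), ("neural", neural)):
        corr = np.corrcoef(y_out, model.predict(X_out))[0, 1]
        print(name, round(float(corr), 3))   # out-of-sample correlation, actual vs. predicted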

⁵It is actually possible to use these metrics, but they would need to be adjusted for the overlap, which requires the extent of the overlap to be clear.
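One generic way of making the adjustment mentioned in footnote 5, offered purely as an illustration rather than as the procedure adopted here, is to compute heteroskedasticity- and autocorrelation-consistent (Newey–West) standard errors with a lag window matching the 12-month overlap:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    y = rng.normal(size=400)                        # placeholder overlapping 12-month P&L series
    X = sm.add_constant(rng.normal(size=(400, 2)))  # placeholder regressors

    # HAC covariance with 11 lags accounts for the overlap of consecutive 12-month windows.
    fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 11})
    print(fit.tvalues)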

F. Identifying Exogenous Influences

We are interested in assessing the degree to which the exogenous variables in Table XXI have a significant influence in explaining investment strategy returns, and whether by using these variables we obtain incremental value over and above the passive (unconditional buy-and-hold) strategy and the time series model. From the possible methods of variable selection described in earlier sections of this paper, ANOVA appears to be the most suitable in this application. The main reason for this is that our interest lies in identifying the conditions under which to purchase or sell the investment strategy. This is likely to occur due to multiplicative effects (i.e., interactions) between these variables rather than simple additive contributions.

The F-statistics for the 11 most significant variables are shown in Table XXII. According to the F-statistics, three variables seem to have the strongest effect on cyclicity: the dividend yield (relative to the market), the estimated growth in earnings for the current year (fiscal year zero to fiscal year one), and the spread between the two moving averages. The t-statistics also show linear or additive effects. Because ranked data are not normally distributed but uniformly distributed, the significance levels of the t-values are different from those for normally distributed data. Given the large number of data points (4800), even small t-values are significant (a value of three is significant at the 99% level). However, if we take into account the overlap in the data (and adjust the statistics for it), most variables are unlikely to pass the significance test at a confidence level higher than 80%. As far as direct dependency is concerned, only those variables with an F-value higher than ten are influential. However, because some of these variables interact with each other, it is desirable to retain them in the model. These nonlinear interactions are illustrated using sensitivity analysis (Section V-I), but first let us examine whether incremental value can be obtained (in both the statistical and the economic sense) from these metamodels.
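For illustration only, univariate F-statistics and regression t-statistics of the kind summarized in Table XXII can be computed on both ranked and unranked data along the following lines; the column names and the simplified (interaction-free) design are assumptions, not the exact ANOVA layout used in the study.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from sklearn.feature_selection import f_regression

    rng = np.random.default_rng(3)
    df = pd.DataFrame(rng.normal(size=(4800, 4)),
                      columns=["pnl", "div_yield", "eps_growth_fy1", "ma_spread"])
    factors = ["div_yield", "eps_growth_fy1", "ma_spread"]

    for label, data in (("unranked", df), ("ranked", df.rank())):
        f_vals, _ = f_regression(data[factors], data["pnl"])   # univariate F-statistics
        # t-statistics from the joint linear (additive) model.
        tvals = sm.OLS(data["pnl"], sm.add_constant(data[factors])).fit().tvalues
        print(label, dict(zip(factors, np.round(f_vals, 2))), tvals.round(2).to_dict())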

G. Statistical Evaluation

Because the data contain both time series and cross sections of stocks, we split the datasets in two ways. First, in order to test how well the models generalize to unseen periods, a first dataset is created by splitting the database according to time,



TABLE XXIII
STATISTICAL EVALUATION OF LINEAR REGRESSION AND NEURAL MODELS

Fig. 25. Economic performances for the different models with fixed and proportional positions. Transaction costs of 1% are used.

i.e., the period January 1989 to December 1992 is used to calibrate the regression, whereas January 1993 to December 1994 is used for testing. Second, in order to test how well the models generalize to unseen stocks, a second dataset is created by splitting the database according to stock names, i.e., the first 72 stocks (in alphabetical order) are used for calibration, whereas the next 18 are used for testing. For the NN models, a small cross-validation set (10% of the estimation set, randomly chosen) is used for this purpose. Table XXIII shows the performance of the models in terms of the percentage of variance explained.
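The two splits amount to the following simple partitions (a Python sketch; the 'date' and 'stock' column names, and a datetime-typed date column, are assumptions):

    def split_by_period(df):
        """Unseen periods: calibrate on Jan. 1989-Dec. 1992, test on Jan. 1993-Dec. 1994."""
        calib = df[df["date"] < "1993-01-01"]
        test = df[(df["date"] >= "1993-01-01") & (df["date"] < "1995-01-01")]
        return calib, test

    def split_by_stock(df):
        """Unseen stocks: first 72 names (alphabetical) for calibration, next 18 for testing."""
        names = sorted(df["stock"].unique())
        return df[df["stock"].isin(names[:72])], df[df["stock"].isin(names[72:90])]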

The NN model seems to outperform the linear regression model, both on unseen stocks and on unseen periods, suggesting the presence of nonlinear relationships in the dataset. There is a notable difference between the estimation (in-sample) and out-of-sample performance for both models, suggesting nonstationarity. This difference is even more pronounced for the NN, which is expected since it uses a smaller estimation set.

The incremental value provided by the metamodels is small in statistical terms. However, the expected gain from the strategy does not need to be large in order to ensure excess returns. The strategy need only provide a positive expectation net of funding (transaction costs). If we apply the strategy a large number of times (or, alternatively, to a large number of assets) the return should be positive and the risk low. In the next section we evaluate the economic performance of the metamodels.

H. Economic Results from Linear and Nonlinear Models

The statistical performance of the model is of interest when analyzing the nature of the relationships between the different variables. However, it does not provide a measure of how useful the model is in terms of prediction. Some models can have very good explanatory power but be unable to generate profits. Transaction costs are often the reason for this, as are outliers in the data.

Two asset allocation rules will be applied to the models' predictions. The first rule is based on the sign of the metamodels' prediction, i.e., if the expected return is positive we buy the strategy, and if the expectation is negative we sell the strategy, in equal amounts. The second rule consists of weighting the position in proportion to the magnitude of the expected return. The average sizes of the positions taken with the two rules being different, it is important to adjust (i.e., scale down) individual positions so that the aggregate economic performances of the two rules can be compared fairly.
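The two rules can be written compactly as below; the rescaling used to equalize the average absolute position size across rules is one reasonable choice and is an assumption.

    import numpy as np

    def fixed_positions(predicted_return):
        """Rule 1: unit long position when the prediction is positive, unit short otherwise."""
        return np.where(np.asarray(predicted_return) > 0, 1.0, -1.0)

    def proportional_positions(predicted_return):
        """Rule 2: position proportional to the predicted return, rescaled so that the
        average absolute position matches that of the fixed-size rule."""
        pos = np.asarray(predicted_return, dtype=float)
        return pos / np.mean(np.abs(pos))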

Fig. 25 plots the cumulative excess returns for the two models with both allocation rules. The passive (i.e., buy-and-hold) strategy is also shown (solid line). Transaction costs of 1% are taken into account.

As shown in Fig. 25, trading rules based on positions that are proportional to the prediction of the models significantly outperform the trading rules based on fixed-size positions. This suggests that the models have picked up the amplitude of the profits and losses of the MA strategy, as well as their signs.

I. Sensitivity Analysis

The difference in performance between linear and nonlinear models suggests that some nonlinearities exist in the dataset. However, the F-statistics in Table XXII show only minor nonlinear dependencies. These nonlinearities are probably due to interactions and conditional relationships. To illustrate this point, let us analyze the relationship between the expected cyclicity and the estimated growth in earnings per share (provided by I/B/E/S and measured relative to the market). This relationship is plotted in Fig. 26 for two different market capitalizations: the bold line represents capitalizations that are 50% higher than the market average, while the dotted line corresponds to capitalizations that are 50% smaller than the market average. Because the estimated function is multidimensional, we can only plot a cross section of it. The relationship in Fig. 26 is obtained by setting all the other variables to their mean values.
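The cross sections of Fig. 26 correspond to a simple conditional sensitivity computation: one input is swept over its observed range while the remaining inputs are held at their means, with the conditioning variable (here market capitalization) overridden. A minimal sketch, applicable to any fitted model exposing a predict method, is given below; the variable indices are assumptions.

    import numpy as np

    def conditional_response(model, X, var_idx, cond_idx=None, cond_value=None, n_points=50):
        """Model output as column var_idx sweeps its observed range while all other
        inputs are fixed at their means; optionally override one conditioning input."""
        grid = np.linspace(X[:, var_idx].min(), X[:, var_idx].max(), n_points)
        base = np.tile(X.mean(axis=0), (n_points, 1))
        base[:, var_idx] = grid
        if cond_idx is not None:
            base[:, cond_idx] = cond_value   # e.g., market cap 50% above or below its average
        return grid, model.predict(base)

Calling the function twice, once with the capitalization set 50% above its average value and once 50% below, reproduces cross sections of the kind shown in Fig. 26.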



Fig. 26. Conditional relationship between cyclicity and the growth in EPS, relative to the market, for two different market capitalizations. The other factors are set to their mean value.

As shown in Fig. 26, the effect of estimated growth in EPS is different for large and small capitalization stocks. For large stocks, trends in relative prices tend to occur when EPS growth is extreme. On the other hand, when earnings growth is in line with the market's earnings growth, prices tend to oscillate. For small stocks, only small or negative earnings growth engenders trending (probably downward) prices.

Unfortunately, because of the large number of variables, it is difficult to visualize or analyze exhaustively all the interaction effects. Many relationships are probably conditional on more than one other variable.

VI. CONCLUSIONS AND FURTHER RESEARCH

We described a collection of neural applications in options pricing, cointegration, the term structure of interest rates, and models of investor behavior which highlight some key methodological weaknesses of NN's, including: testing the statistical significance of the input variables, testing for misspecified models, dealing with nonstationary data, handling leverage points in the datasets, and generally formulating the problem in a way which is more amenable to predictive modeling.

We proposed and evaluated a number of solutions. We described a number of ways for principled variable selection, including 1) a stepwise procedure building upon the Box–Jenkins methodology; 2) analysis of variance; and 3) regularization. We showed how model misspecification tests can be used to ensure against models which make systematic error. We described how the principle of cointegration can be used to deal with nonstationary data, and generally described ways of formulating the problem in a manner that makes predictive modeling more likely to succeed.

The problems and solutions presented and discussed in this paper represent only the tip of an iceberg. Some of the solutions, although effective in practice, still require more rigorous statistical foundations. For example, the stepwise procedure in Section II is still somewhat ad hoc. Ideally, one would wish to bypass the Box–Jenkins phase altogether, but this is not possible without having a distribution for the partial derivatives of the function in (9), or at least for the statistic in (10). The ANOVA methodology, on the other hand, is model independent and does not assume causality. Many other research issues, particularly model misspecification and distribution theories, remain unexplored, but they are essential if NN's are to become commonplace in financial econometrics.

ACKNOWLEDGMENT

The authors would like to thank D. Bunn for motivating this research and for his continuous technical contributions at various stages in the development of the methodologies. F. Miranda-Gonzales carried out the initial statistical analysis of implied volatility. The authors would also like to thank J.-F. De Laulanie from Societe Generale Asset Management for his useful comments regarding the metamodels for asset selection.

REFERENCES

[1] C. Alexander and A. Johnson, "Dynamic links," RISK, vol. 7, no. 2, 1994.

[2] W. C. Barbee, S. Mukherji, and G. Raines, "Do sales-price and debt-equity explain stock returns better than book-market and firm size?," Financial Analysts J., pp. 56–60, Mar./Apr. 1996.

[3] W. A. Barnett, A. R. Gallant, M. J. Hinich, J. A. Jungeiles, D. T. Kaplan, and M. J. Jensen, "An experimental design to compare tests of nonlinearity and chaos," in Nonlinear Dynamics and Economics, W. A. Barnett et al., Eds. Cambridge, U.K.: Cambridge Univ. Press, 1996, pp. 163–190.

[4] S. Beckers, "Standard deviations in option prices as predictors of future stock price variability," J. Banking and Finance, vol. 5, pp. 363–382.

[5] Y. Bentz, A. N. Refenes, and J.-F. Laulanie, "Modeling the performance of investment strategies: Concepts, tools and examples," in Neural Networks in Financial Engineering, A.-P. N. Refenes et al., Eds. London: World Scientific, 1996, pp. 241–258.

[6] L. C. Bhandari, "Debt/equity ratio and expected common stock returns: Empirical evidence," J. Finance, vol. 43, no. 2, pp. 507–528, June 1988.

[7] F. Black, "The pricing of commodity contracts," J. Financial Economics, vol. 3, pp. 167–179, 1976.

[8] F. Black and M. Scholes, "The pricing of options and corporate liabilities," J. Political Economy, vol. 81, pp. 637–659, May 1973.

[9] G. E. P. Box and G. M. Jenkins, Time Series Analysis, Forecasting, and Control. San Francisco, CA: Holden-Day, 1970.

[10] L. Breiman and J. Friedman, "Estimating optimal transformations for multiple regression and correlation," J. Amer. Statist. Assoc., vol. 80, pp. 580–619, 1985.



[11] L. Breiman, J. Friedman, R. Olshen, and C. J. Stone, Classification and Regression Trees. Belmont, CA: Wadsworth, 1984.

[12] A. N. Burgess, "Robust financial modeling by combining neural-network estimators of mean and median," in Proc. Appl. Decision Technol., UNICOM Seminars, U.K., 1995.

[13] ——, "Nonlinear model identification and statistical significance tests and their application to financial modeling," in Artificial Neural Networks, Inst. Elect. Eng. Conf., June 1995, pp. 312–317.

[14] A. N. Burgess and A.-P. N. Refenes, "A principled approach to neural-network modeling of financial time series," in Proc. IEEE ICNN'95, Perth, Australia, Nov. 1995.

[15] L. Canina and S. Figlewski, "The informational content of implied volatility," Rev. Financial Studies, vol. 6, no. 3, pp. 659–681.

[16] L. Capaal, I. Rowley, and W. F. Sharpe, "International value and growth stock returns," Financial Analysts J., vol. 49, pp. 27–36, 1993.

[17] D. P. Chiras and S. Manaster, "The information content of option prices and a test of market efficiency," J. Financial Economics, vol. 6, pp. 234–256, 1978.

[18] J. Y. Choi and K. Shastri, "Bid-ask spreads and volatility estimates: The implication for option pricing," J. Banking and Finance, 1987.

[19] J. H. E. Davidson, D. F. Hendry, F. Srba, and S. Yeo, "Econometric modeling of the aggregate time-series relationship between consumers' expenditure and income in the United Kingdom," Economic J., vol. 88, pp. 661–692, 1978.

[20] T. E. Day and C. M. Lewis, "The behavior of the volatility implicit in the prices of stock index options," J. Financial Economics, 1988.

[21] E. Derman and I. Kani, "Riding on a smile," Risk Mag., vol. 7, no. 2, 1994.

[22] D. A. Dickey and W. A. Fuller, "Likelihood ratio statistics for autoregressive time series with a unit root," Econometrica, vol. 49, pp. 1057–1072, 1981.

[23] J. L. Dorian and R. D. Arnott, "Tactical style management," in The Handbook of Equity Style Management, T. D. Coggin and F. Fabozzi, Eds. New Hope, PA: FJF, 1995.

[24] E. S. Elton, E. J. Gruber, and J. Gullerkin, "Expectations and share prices," Management Sci., vol. 27, pp. 974–987, 1981.

[25] R. F. Engle and C. W. J. Granger, "Cointegration and error correction: Representation, estimation, and testing," Econometrica, vol. 55, pp. 251–278, 1987.

[26] F. J. Fabozzi, Bond Markets: Analysis and Strategies. Prentice-Hall, 1989.

[27] E. F. Fama, "The behavior of stock market prices," J. Business, vol. 38, pp. 34–105, 1965.

[28] E. F. Fama and K. R. French, "Size and book-to-market factors in earnings and returns," J. Finance, vol. 50, no. 1, pp. 185–224, Mar. 1995.

[29] K. French, "Stock returns and the weekend effect," J. Financial Economics, vol. 8, pp. 55–69, 1980.

[30] K. French and R. Roll, "Stock return variances: The arrival of information and the reaction of traders," J. Financial Economics, vol. 8, pp. 79–96, 1980.

[31] J. Friedman and W. Stuetzle, "Projection pursuit regression," J. Amer. Statist. Assoc., vol. 76, pp. 817–823, 1981.

[32] W. K. H. Fung and D. A. Hsieh, "Empirical analysis of implied volatility: Stocks, bonds, and currencies," presented at the Financial Options Res. Center, Univ. Warwick, 1992.

[33] R. P. Gorman and T. J. Sejnowski, "Analysis of hidden units in a layered network trained to classify sonar targets," Neural Networks, vol. 1, pp. 75–89, 1988.

[34] C. Graham, "The supermodel comes of age, New Angles," Risk Mag., vol. 7, no. 1, 1994.

[35] C. W. J. Granger and P. Newbold, "Spurious regressions in econometrics," J. Econometrics, vol. 2, pp. 111–120, 1974.

[36] C. W. J. Granger and A. A. Weiss, "Time series analysis of error correction models," in Studies in Econometrics, Time Series, and Multivariate Statistics, S. Karlin et al., Eds. New York: Academic, 1983.

[37] C. W. J. Granger, "Cointegrated variables and error correcting models," Univ. California, San Diego, discussion paper 83-13a, 1983.

[38] ——, "Some properties of time series data and their use in econometric model specification," J. Econometrics, pp. 121–130, 1981.

[39] W. Hardle, Applied Nonparametric Regression, Econometric Soc. Monographs, 1990.

[40] C. R. Harvey and R. E. Whaley, "S&P100 index option volatility," J. Finance, vol. 46, pp. 1551–1561, 1991.

[41] D. Heath, R. Jarrow, and A. Morton, "Bond pricing and the term structure of interest rates: A new methodology," Cornell Univ., working paper, Aug. 1988.

[42] D. F. Hendry and T. Von Ungern-Sternberg, "Liquidity and inflation effects on consumers' expenditure," in Essays in the Theory and Measurement of Consumer Behavior, A. S. Deaton, Ed. Cambridge, U.K.: Cambridge Univ. Press, 1981.

[43] J. Hull and A. White, "The pricing of options on assets with stochastic volatilities," J. Finance, vol. 42, pp. 281–300, 1987.

[44] B. Jacquillat, J. Hamon, P. Handa, and R. Schwartz, "The profitability of limit order trading on the Paris stock exchange," Universite Paris Dauphine, unpublished document.

[45] S. Johansen, "Statistical analysis of cointegration vectors," J. Economic Dynamics Contr., vol. 12, pp. 231–254, 1988.

[46] ——, "Hypothesis testing for cointegration vectors in Gaussian vector autoregressive models," preprint, Inst. Math. Statist., Univ. Copenhagen, Denmark.

[47] H. Latane and R. J. Rendleman, "Standard deviations of stock price ratios implied by option premia," J. Finance, vol. 31, pp. 369–381, 1976.

[48] B. Lev, "On the usefulness of earnings and earnings research: Lessons and directions from two decades of empirical research," J. Accounting Res., vol. 27, suppl., pp. 53–192, 1989.

[49] R. Litterman and J. Scheinkman, "Common factors affecting bond returns," J. Fixed Income, June 1991.

[50] A. Lo and A. C. MacKinlay, "Stock market prices do not follow random walks: Evidence from a simple specification test," Rev. Financial Studies, vol. 1, pp. 203–238, 1988.

[51] H. M. Markowitz, "Portfolio selection," J. Finance, vol. 7, pp. 77–91, 1952.

[52] R. Merton, "On estimating the expected return on the market: An exploratory investigation," J. Financial Economics, 1980.

[53] J. Moody and J. Utans, "Architecture selection strategies for neural networks: Application to corporate bond rating prediction," in Neural Networks in the Capital Markets, A.-P. N. Refenes, Ed. Chichester, U.K.: Wiley, 1995, pp. 277–300.

[54] R. G. Palmer, W. B. Arthur, J. H. Holland, B. LeBaron, and P. Taylor, "Artificial economic life: A simple model of a stock market," Santa Fe Inst., working paper 93, also submitted to Elsevier, Oct. 1993.

[55] J. M. Patell and M. Wolfson, "Anticipated information releases reflected in call option prices," J. Accounting and Economics, 1979.

[56] P. C. B. Phillips and S. Ouliaris, "Testing for cointegration using principal components methods," J. Economic Dynamics Contr., vol. 12, pp. 205–230, 1988.

[57] J. Poterba and L. H. Summers, "The persistence of volatility and stock market fluctuations," Amer. Economic Rev., 1986.

[58] E. J. Pozo, "Instrumentos derivados sobre indices bursatiles negociados en mercados organizados: Especial referencia al mercado MEFF RV" (Derivative instruments on stock indexes traded in organized markets: Special reference to the MEFF RV market), Tesis Doctoral, UAM, Madrid.

[59] A.-P. N. Refenes, "Methods for optimal network design," in Neural Networks in the Capital Markets, A.-P. N. Refenes, Ed. Chichester, U.K.: Wiley, 1995, pp. 33–54.

[60] A.-P. N. Refenes and M. Azema-Barac, "Neural-network applications in financial asset management," Neural Computing Applicat., vol. 2, pp. 13–39, 1994.

[61] A.-P. N. Refenes, M. F. Gonzales, and A. N. Burgess, "Intraday volatility forecasting using neural networks: A comparative study with regression models," IJCIO, accepted 1996, to appear 1997.

[62] A.-P. N. Refenes and A. D. Zapranis, "Specification tests for neural networks," J. Forecasting, London Business School, Dept. Decision Sci., Tech. Rep., submitted Apr. 1997.

[63] Risk/Finex, "From Black–Scholes to black holes: New frontiers in options," Risk/Finex, pp. 64–67, 1992.

[64] R. Roll, "A simple implicit measure of the effective bid-ask spread," J. Finance, vol. 39, pp. 1127–1139, 1984.

[65] S. A. Ross, "The arbitrage theory of capital asset pricing," J. Economic Theory, vol. 13, pp. 341–360, 1976.

[66] J. D. Sargan, "Wages and prices in the United Kingdom: A study in econometric methodology," in Econometric Analysis for National Economic Planning, P. E. Hart et al., Eds. London: Butterworths, 1964.

[67] S. Schaefer and E. Schwartz, "A two-factor model of the term structure: An approximate analytical solution," J. Financial and Quantitative Anal., vol. 19, no. 4, Dec. 1984.

[68] G. W. Schwert, "Why does stock market volatility change over time?," J. Finance, 1989.

[69] K. Shastri and K. Wethyavivorn, "The valuation of currency options for alternate stochastic processes," J. Financial Res., vol. 10, pp. 283–293, 1987.

[70] W. Sharpe, "Capital asset prices: A theory of market equilibrium," J. Finance, vol. 19, pp. 425–442, 1964.

[71] A. M. Sheikh, "Transactions data tests of S&P100 call option pricing," J. Financial and Quantitative Anal., vol. 26, pp. 459–474, 1991.



[72] E. M. Stein and J. C. Stein, "Stock price distributions with stochastic volatility: An analytic approach," Rev. Financial Studies, vol. 4, pp. 727–752, 1991.

[73] R. Trippi, "A test of option market efficiency using a random walk valuation model," J. Economics and Business, 1977.

[74] S. Weisberg, Applied Linear Regression. New York: Wiley, 1985.

[75] R. E. Whaley, "Valuation of American call options on dividend-paying stocks: Empirical tests," J. Financial Economics, vol. 10, pp. 29–58, 1982.

[76] H. White, "Nonparametric estimation of conditional quantiles using neural networks," in Proc. 22nd Symp. Interface. New York: Springer-Verlag, 1991, pp. 190–199.

[77] X. Xinzhong and S. Taylor, "Conditional volatility and the informational efficiency of the PHLX currency options markets," J. Banking and Finance, vol. 6, pp. 237–248, 1993.

[78] W. Hardle and T. Stoker, "Investigating smooth multiple regression by the method of average derivatives," J. Amer. Statist. Assoc., vol. 84, pp. 986–995, 1989.

[79] T. Bollerslev, "Generalized autoregressive conditional heteroskedasticity," J. Econometrics, vol. 31, pp. 307–327, 1986.

Apostolos-Paul N. Refenes received the B.Sc. degree in mathematics from the University of North London in 1984 and the Ph.D. degree in computing from the University of Reading, U.K., in 1987.

He is Associate Professor of Decision Science and Director of the Computational Finance Program at London Business School. His current research interests on neural networks and nonparametric statistics include model identification, variable selection, tests for neural model misspecification, and estimation procedures. Applied work includes factor models for equity investment, dynamic risk management, nonlinear cointegration, tactical asset allocation, exchange risk management, etc.

A. Neil Burgess received the B.Sc. degree in computer science from Warwick University, U.K., in 1989. He is currently pursuing the Ph.D. degree on the subject of modeling nonstationary systems using computational intelligence techniques.

Subsequently, he worked in the Thorn-EMI Central Research Laboratories, applying computational intelligence techniques to a range of problems in musical analysis, marketing, signal processing, and financial forecasting. Since September 1993, he has been a Research Fellow in the Decision Technology Centre at London Business School, where he has published more than 20 papers on the use of advanced techniques for financial modeling. His research interests include neural networks, genetic algorithms, nonparametric statistics, and cointegration.

Yves Bentz received the M.Sc. degree in physics from Marseilles National School of Physics, France, in 1991. Subsequently, he received the M.A. degree from Marseilles Business School, France, in 1993. He is currently pursuing the Ph.D. degree at London Business School on the identification of conditional factor sensitivities in the area of investment management.

He is currently a Researcher in the department of Investment Strategy at Société Générale Asset Management, Paris, investigating the applications of advanced modeling techniques to equity investment management. His present interests include factor models based on adaptive and intelligent systems such as the Kalman filter and neural networks.