Automated Trading Strategies with R
3rd April 2014
Richard Pugh, Commercial Director
Agenda
• Overview of Mango
• Data Analytics
• Introduction to Backtesting
• The Backtesting Project
• Leveraging Oracle R Enterprise
• Summary
Mango in a nutshell …
• Providers of analytic products and services
• Specialise in analytic application development
• Unique mix of business-focused statisticians and
mainstream software developers
• Private company founded in 2002
• Offices in UK & China
• Global Team of 65 and expanding
• ISO 9001 Accredited
• Partner with Oracle on R project
Data Analytics
• Companies are awash with structured and
unstructured data
• The insight locked in this data can help us to
make better decisions and gain a competitive
advantage
• Data Analytics can help to extract the key
information from our data
Data Analytic Examples
• Who is a good driver?
• How do we win more games?
• What bonus should I pay?
• Will someone like this?
• When might this break?
• What are they likely to want?
Challenges of Integrating Analytics
• Clear questions are needed
• Data may not be analytic-ready
• Sophisticated analytics require niche technology
that can be difficult to integrate
• The “language” of analytics can be difficult to
penetrate and requires specialists
• Integrating the “right” analytics is key …
Introduction to Backtesting
• Algorithmic trading accounts for a large percentage of market trades
• Backtesting is the process of testing a trading
strategy using historical data
• Allows the development of an automated trading
strategy
Backtesting Example
Buy every stock beginning with ‘A’ and sell all stocks beginning with ‘Z’
How do we know if this works??
Backtest!!
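To make the example concrete, here is a minimal sketch of such a backtest in base R. The tickers, the simulated weekly returns and the function name backtestLongShort are all invented for illustration; no real market data is used and this is not the application described later in the talk.

```r
# Toy backtest: long every stock starting with "A", short every stock starting with "Z".
# All tickers and returns are simulated; this illustrates the idea only.
set.seed(42)

tickers <- c("AAB", "ACME", "AZT", "MID", "QRS", "ZED", "ZYX")
nWeeks  <- 104

# Simulated weekly returns: one row per week, one column per stock
rets <- matrix(rnorm(nWeeks * length(tickers), mean = 0.001, sd = 0.02),
               nrow = nWeeks, dimnames = list(NULL, tickers))

backtestLongShort <- function(rets, longPattern = "^A", shortPattern = "^Z") {
  longs  <- grepl(longPattern,  colnames(rets))
  shorts <- grepl(shortPattern, colnames(rets))
  # Equal-weighted long basket minus equal-weighted short basket, week by week
  weekly <- rowMeans(rets[, longs, drop = FALSE]) -
            rowMeans(rets[, shorts, drop = FALSE])
  list(weekly     = weekly,
       cumulative = cumprod(1 + weekly) - 1,
       longs      = colnames(rets)[longs],
       shorts     = colnames(rets)[shorts])
}

result <- backtestLongShort(rets)
result$longs
result$shorts
tail(result$cumulative, 1)   # total strategy return over the simulated history
```

Because the returns are random, the strategy should show no persistent edge, which is exactly what a backtest is meant to reveal before money is committed.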
Key Factors in Backtesting
• Easy selection and execution of strategies
• Performance of backtest
• Optimisation across sectors, styles, etc
• Comparison with hurdle (e.g. interest rates)
• Transaction costs
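Two of these factors, transaction costs and comparison with a hurdle, are easy to illustrate. The sketch below uses made-up weekly returns, turnover figures, a cost rate of 20 basis points and a 3% annual hurdle; all of these numbers and variable names are assumptions chosen for the example.

```r
# Net-of-cost performance versus a hurdle rate (all numbers are made up)
grossRets <- c(0.012, -0.004, 0.008, 0.015, -0.010, 0.006)  # weekly gross strategy returns
turnover  <- c(1.00, 0.20, 0.35, 0.10, 0.50, 0.25)          # fraction of portfolio traded each week
costRate  <- 0.002                                          # assumed cost per unit of turnover
hurdle    <- 0.03 / 52                                      # assumed 3% annual rate, per week

netRets    <- grossRets - turnover * costRate   # deduct transaction costs
excessRets <- netRets - hurdle                  # compare with the hurdle

# Simple (uncompounded) sums, to keep the arithmetic easy to follow
summaryTable <- data.frame(
  measure     = c("gross", "net of costs", "excess over hurdle"),
  totalReturn = c(sum(grossRets), sum(netRets), sum(excessRets))
)
summaryTable
```

A strategy with high turnover can look attractive on gross returns and still fail to beat the hurdle once costs are deducted.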
The Backtesting Project
• Mango was engaged by a major hedge fund to create a backtesting solution
• Competitive advantage over off-the-shelf solutions
• Particular complexity around transaction cost
(futility switching) and optimisation
• Framework with possibility for extensions
The Backtesting Solution
[Architecture diagram] Components:
• Universe Feed
• Raw Data & Alpha Storage
• Analytic Engine
• Analytic Code Management UI
• Backtest Application
• Graphical User Interface
• Scripting Interface
• Bespoke C layer for data access
The Backtesting Project Outcome
• Deemed a success
• Used to drive an industry-beating fund
• Scripting interface more popular than the
graphical user interface
• Code management interface allowed for the
addition of new routines without impact to the rest
of the application
The Backtesting Project Constraints & Challenges
• A performance bottleneck restricted backtests to weekly data
• The need to create a C layer for data access was unexpected
• As the number of “power users” increased, more sophisticated code management would have helped
Leveraging Oracle R Enterprise
• The project was operated on a shared-cost basis,
with Mango retaining the IP
• Mango now looking to further develop the
application and release as a product
• ORE identified as an ideal way to replace the non-performant parts of the application
• Oracle products familiar to Mango
Steps to Integrating with ORE
• Use Oracle for Object Management
• Replace functions with ORE equivalent
• Use embedded scripts for execution
• Expose interface as SQL
• Build User Interface
ORE Functions
> apropos("^ore")
[1] "OREShowDoc" "ore.attach" "ore.connect" "ore.const"
[5] "ore.corr" "ore.create" "ore.crosstab" "ore.datastore"
[9] "ore.datastoreSummary" "ore.delete" "ore.detach" "ore.disconnect"
[13] "ore.doEval" "ore.drop" "ore.esm" "ore.exec"
[17] "ore.exists" "ore.frame" "ore.freq" "ore.get"
[21] "ore.getXlevels" "ore.getXnlevels" "ore.glm" "ore.glm.control"
[25] "ore.groupApply" "ore.hash" "ore.hiveOptions" "ore.hour"
[29] "ore.indexApply" "ore.is.connected" "ore.lazyLoad" "ore.lm"
[33] "ore.load" "ore.ls" "ore.make.names" "ore.mday"
[37] "ore.minute" "ore.month" "ore.neural" "ore.odmAI"
[41] "ore.odmAssocRules" "ore.odmDT" "ore.odmGLM" "ore.odmKMeans"
[45] "ore.odmNB" "ore.odmNMF" "ore.odmOC" "ore.odmSVM"
[49] "ore.predict" "ore.pull" "ore.pull" "ore.pull"
[53] "ore.push" "ore.push" "ore.rank" "ore.recode"
[57] "ore.rm" "ore.rollmax" "ore.rollmean" "ore.rollmin"
[61] "ore.rollsd" "ore.rollsum" "ore.rollvar" "ore.rowApply"
[65] "ore.save" "ore.scriptCreate" "ore.scriptDrop" "ore.second"
[69] "ore.showHiveOptions" "ore.sort" "ore.stepwise" "ore.summary"
[73] "ore.sync" "ore.tableApply" "ore.toXML" "ore.univariate"
[77] "ore.year" "oreOut"
Step #1: Oracle for Object Management
• Ported the application to use Oracle for object
(data) management
• Suite of ore.* functions to allow easy storage /
retrieval of R objects
• Immediate benefit in performance for data i/o
• Code base simplification (no need for bespoke C
layer)
Step #1: Oracle for Object Management
> writeRdaObject
function(object, fileName, category = "RawData", dataMethod = .backTest$dataMethod, …) {
…
ore.save(object, name = returnObject, overwrite = TRUE)
…
}
> loadRdaObject
function(fileName, category = "RawData", dataMethod = .backTest$dataMethod, …) {
…
get(ore.load(returnObject))
…
}
> grep("^RAW", ore.datastore()[[1]], value = TRUE)
[1] "RAWDATA_BRD_NO" "RAWDATA_BRD_SECT" "RAWDATA_DIVYIELD" "RAWDATA_FISCALYR1"
[5] "RAWDATA_FISCALYR2" "RAWDATA_FY13M" "RAWDATA_FY23M" "RAWDATA_HIGHY1"
[9] "RAWDATA_HIGHY2" "RAWDATA_IH6Y1" "RAWDATA_IH6Y2" "RAWDATA_IH7Y1"
…
> ore.datastoreSummary("RAWDATA_PRICE")
  object.name  class     size  length row.count col.count
1       getIt matrix 34776172 4336119      3917      1107
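The pattern of storing and retrieving R objects by name can be sketched without a database. The example below is not ORE: it uses RDS files in a temporary directory as a stand-in for the datastore, and the helper names (makeStoreName, writeObject, loadObject, listObjects) and the CATEGORY_FILENAME naming rule are assumptions inferred from the object names listed above.

```r
# Sketch of name-keyed object storage, using RDS files in a temporary
# directory as a stand-in for the database datastore (this is not ORE).
storeDir <- file.path(tempdir(), "objectStore")
dir.create(storeDir, showWarnings = FALSE)

# Hypothetical naming rule: CATEGORY_FILENAME in upper case
makeStoreName <- function(fileName, category = "RawData") {
  toupper(paste(category, fileName, sep = "_"))
}

writeObject <- function(object, fileName, category = "RawData") {
  storeName <- makeStoreName(fileName, category)
  saveRDS(object, file.path(storeDir, paste0(storeName, ".rds")))
  invisible(storeName)
}

loadObject <- function(fileName, category = "RawData") {
  storeName <- makeStoreName(fileName, category)
  readRDS(file.path(storeDir, paste0(storeName, ".rds")))
}

# List stored object names, optionally filtered by a regular expression
listObjects <- function(pattern = "") {
  sub("\\.rds$", "", list.files(storeDir, pattern = pattern))
}

prices <- matrix(1:6, nrow = 2)
writeObject(prices, "Price")
listObjects("^RAW")
identical(loadObject("Price"), prices)
```

Keeping all storage behind two wrapper functions is what makes a backend swap cheap: callers never see whether the object lives in a file or a database.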
Step #2: Replace Functions with ore*
• The ORE library contains many optimised
versions of existing R functions
• There are also new functions not available in base R
• Using these ORE functions improves performance
and simplifies the code base
MMIN <- function (data, Lag, …) {
  …
  rMin <- apply(data, 2, ore.rollmin, K = Lag, align = "right")
  …
}
> myMat
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    7    1    1    1
[2,]    2    4    6    2    3
[3,]    4    0    4    2    3
[4,]    2    2    5    4    4
[5,]    4    1    2    2    1
[6,]    5    4    4    4    1
[7,]    2    3    0    0    4
[8,]    4    4    4    4    4
…
> MMIN(myMat, 3)
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA   NA   NA   NA   NA
[2,]   NA   NA   NA   NA   NA
[3,]    2    0    1    1    1
[4,]    2    0    4    2    3
[5,]    2    0    2    2    1
[6,]    2    1    2    2    1
[7,]    2    1    0    0    1
[8,]    2    3    0    0    1
…
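For readers without ORE to hand, the same right-aligned rolling minimum can be sketched in base R. The function names rollMinRight and mminBase are invented for this example. It is a plain-R equivalent for checking the printed output, not the ORE implementation, and it makes no claim about relative performance.

```r
# Trailing (right-aligned) rolling minimum in base R, padded with NA at the start
rollMinRight <- function(x, k) {
  if (length(x) < k) return(rep(NA_real_, length(x)))
  # embed() lays out each length-k window as one row of a matrix
  c(rep(NA_real_, k - 1), apply(embed(x, k), 1, min))
}

# Apply column by column, mirroring the structure of MMIN
mminBase <- function(data, lag) apply(data, 2, rollMinRight, k = lag)

# First eight rows of myMat as printed above
myMat <- matrix(c(4, 7, 1, 1, 1,
                  2, 4, 6, 2, 3,
                  4, 0, 4, 2, 3,
                  2, 2, 5, 4, 4,
                  4, 1, 2, 2, 1,
                  5, 4, 4, 4, 1,
                  2, 3, 0, 0, 4,
                  4, 4, 4, 4, 4), ncol = 5, byrow = TRUE)

mminBase(myMat, 3)
```

With a window of 3, the first two rows have no complete window and are NA; from row 3 onwards each entry is the minimum of that row and the two before it, matching the MMIN output shown.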
Step #3: Use Embedded Scripts
• R Scripts Stored and Managed in the Database
• Execution controlled by Oracle Database and
performed on database server
• Set of ore.* functions for managing and executing
scripts
Step #3: Use Embedded Scripts
try(ore.scriptDrop("doBacktest"))
ore.scriptCreate("doBacktest", function(alphaName, alphaDesc, alphaCat, alphaFormula,
    optimMethod, optimFactors, numBaskets, lowerThreshold, upperThreshold, portName,
    portDesc = alphaDesc) {
  require(backTest) # Load the backTest package
  # Now do the backtest
  myAlpha <- runAlpha(alphaName, alphaDesc, alphaCat, alphaFormula)
  myClass <- switch(optimMethod,
    "Simple" = Classify(simpleOpt(myAlpha, .myData$Factors[optimFactors]), numBaskets),
    Classify(myAlpha, numBaskets))
  myPort <- createPort(myAlpha, myClass, lowerThreshold, upperThreshold, Splits=numBaskets)
  fullReport(myPort, portName, portDesc, alphaName, optimFactors, optimMethod,
    theDir = "/home/oracle/Results", fileName = "backTestReport.pdf")
})
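The script above classifies alpha scores into numBaskets groups, but the slides do not show how Classify works. One common approach is quantile bucketing, sketched below under that assumption. The function classifyBaskets and the simulated scores are hypothetical and are not taken from the backTest package.

```r
# Hypothetical quantile bucketing: assign each stock's alpha score to one of n baskets.
# This is an assumed stand-in for Classify(), whose internals are not shown.
classifyBaskets <- function(alpha, numBaskets = 5) {
  breaks <- quantile(alpha, probs = seq(0, 1, length.out = numBaskets + 1),
                     na.rm = TRUE)
  # include.lowest keeps the minimum score inside basket 1
  cut(alpha, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

set.seed(1)
alpha   <- rnorm(100)                 # simulated alpha scores for 100 stocks
baskets <- classifyBaskets(alpha, 5)
table(baskets)                        # number of stocks per basket
```

Basket 1 holds the lowest scores and basket 5 the highest, so a long-short portfolio could, for example, buy the top basket and sell the bottom one.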
Step #3: Use Embedded Scripts
alphaForm <- c(
  "aUpDwnY1 = (IH7Y1-IH8Y1)/pmax(IH7Y1+IH8Y1,IH6Y1)*100",
  "aUpDwnY2 = (IH7Y2-IH8Y2)/pmax(IH7Y2+IH8Y2,IH6Y2)*100",
  "aUpDwnSc = UPR((UPR(aUpDwnY1)+UPR(aUpDwnY2)))",
  "aFyrevs = UPR((UPR(FY13M)+UPR(FY23M)))",
  "UPR(aUpDwnSc+aFyrevs)")
res <- ore.doEval(FUN.NAME="doBacktest", ore.connect = TRUE,
  alphaName = "aRevSc", alphaDesc = "Simple Revision Alpha", alphaCat = "Revisions",
  alphaFormula = alphaForm, optimMethod = "Simple", optimFactors = c("Style", "Sector"),
  numBaskets = 5, lowerThreshold = .5, upperThreshold = 2, portName = "OptRevScore")
user system elapsed
0.134 0.037 240.697
> 240.697/60
[1] 4.01161
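The alpha is specified as a vector of formula strings, each building on variables defined by the earlier ones. How runAlpha evaluates them is not shown; the sketch below is one plausible mechanism, parsing each string and evaluating it in an environment seeded with the raw data. The function evalFormulas, the simulated inputs and, in particular, the definition of UPR as a uniform percentile rank are all assumptions made for illustration.

```r
# Sketch: evaluate a vector of "name = expression" strings in sequence.
# UPR here is a placeholder (uniform percentile rank); the real UPR is not shown.
UPR <- function(x) rank(x, na.last = "keep") / sum(!is.na(x))

evalFormulas <- function(formulas, data, helpers = list(UPR = UPR)) {
  # Data columns and helper functions live in one environment;
  # base R functions such as pmax() are reachable through the parent
  env <- list2env(c(as.list(data), helpers), parent = baseenv())
  value <- NULL
  for (f in formulas) {
    # Assignments create new variables in env for later formulas to use
    value <- eval(parse(text = f), envir = env)
  }
  value   # value of the last formula
}

set.seed(7)
rawData <- data.frame(FY13M = rnorm(10), FY23M = rnorm(10))   # simulated inputs
forms <- c("aFyrevs = UPR(UPR(FY13M) + UPR(FY23M))",
           "UPR(aFyrevs)")
score <- evalFormulas(forms, rawData)
score
```

Evaluating user-supplied strings is flexible but runs arbitrary code, so a production system would normally restrict who can submit formulas or which functions they may call.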
An Aside … the Backtest Report
• The report is produced by the fullReport() call at the end of the doBacktest script, which writes backTestReport.pdf to /home/oracle/Results
Another Aside … getting interactive!
• Results are stored as ore objects in the database
• I can access the object for more in-depth analysis
> x <- loadPort("OptRevScore", "aRevSc")
Loading object STRATEGIES_REVISIONS_AREVSC_OPTREVSCORE_PORT
> names(x)
[1] "baskets" "bRets" "alpha" "relRets" "hMat" "classed" "tCosts"
[8] "turnOver" "costData"
> ls("package:backTest", pattern = "*lot")
[1] "alphaPlot" "dayPlot" "monthPlot" "pairsPlot" "plotPort"
[6] "qRetsPlot" "qSharpePlot" "qTranCostPlot" "qTurnOverPlot" "qVolsPlot"
[11] "textPlot" "turnOverPlot"
Another Aside … getting interactive!
> plotPort(x, removeTcosts = TRUE, title = "Simple Optimised Revision Strategy")
Step #4: Expose via SQL Interface
• R Scripts Stored and Managed in the Database
• Execution controlled by Oracle Database and
performed on database server
• Set of ore.* functions for managing and executing
scripts
• Outputs can be stored as XML or PNG (blobs)
Updated Application
[Architecture diagram, shown in three builds]
• First build: Universe Feed, Raw Data & Alpha Storage, Analytic Engine, Analytic Code Management UI, Backtest Application, Graphical User Interface, Scripting Interface, Bespoke C
• Second build: the same components with Bespoke C removed
• Final build: Universe Feed, Raw Data & Alpha Storage, Analytic Engine, Analytic Code Interface, Backtest Application, Embedded Scripts, SQL Interface, Graphical User Interface
Benefits of ORE
• Significant immediate benefits in performance and
code management
• Database script management makes deployment
very simple
• Script and SQL interfaces allow for close
integration into business processes in a controlled
manner
Summary
• Oracle R Enterprise provides a sophisticated platform for integrating R into business processes
• Adds scalability and performance improvements to the flexible R environment
• Integrating a legacy application with ORE proved easy to achieve
• We have this running on demo servers if you want to see it …