Page 1
Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
SpringOne2GX, Washington, DC
Implementing a highly scalable Stock prediction system with R, Apache Geode and Spring XD
Fred Melo (@fredmelo_br)
William Markito (@william_markito)
Page 2
About us
Fred Melo
Technical Director for Data
[email protected]
@fredmelo_br
William Markito
Enterprise Architect for GemFire
[email protected]
@william_markito
Page 3
Page 4
It's all about DATA
Data Sources
Look for patterns
Prediction
Page 5
What do we want to build?
"Smart System"
Page 6
… in our specific case
Trading Data
"Smart System"
Historical Data Repository
Learns from historical trends
"What were the moving average price and relative strength readings when the latest failures happened?"
Live data becomes historical over time
Real-Time
Evaluates live data
"According to historical trends, there's an 80% chance this stock's price will go down within the next hour"
Historical
Page 8
The ML pipeline data flow
Live Data
Spring XD
Apache Geode / GemFire
Greenplum DB
Machine Learning model
Data Temperature: Hot, Cold
1 - Live data is ingested into the grid
2 - Trained ML model compares new data to historical patterns
3 - Results are pushed immediately to deployed applications
4 - "Hot" data ages, becoming part of the historical dataset
5 - Re-training is triggered, updating the model with the latest historical data
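The five steps of this data flow can be sketched in a few lines of Python. This is only a single-process illustration under stated assumptions, not the demo's implementation: a dict stands in for the Geode / GemFire grid, a list stands in for the historical store, and the hypothetical `MeanModel` (which just compares a price with the historical mean) stands in for the trained ML model. All class and parameter names are invented for this sketch.

```python
from statistics import mean


class MeanModel:
    """Toy stand-in for the trained ML model: remembers the historical mean."""

    def __init__(self):
        self.value = None

    def train(self, historical_prices):
        self.value = mean(historical_prices)

    def predict(self, price):
        # "down" when the new price is above the historical mean, else "up"
        return "down" if price > self.value else "up"


class Pipeline:
    def __init__(self, historical_prices, hot_capacity):
        self.grid = {}                              # hot data (grid stand-in)
        self.historical = list(historical_prices)   # historical store stand-in
        self.hot_capacity = hot_capacity
        self.pushed = []                            # results sent to applications
        self.model = MeanModel()
        self.model.train(self.historical)
        self.next_key = 0

    def ingest(self, price):
        # 1 - live data is ingested into the grid
        key = self.next_key
        self.next_key += 1
        self.grid[key] = price
        # 2 - the trained model compares new data to historical patterns
        prediction = self.model.predict(price)
        # 3 - results are pushed immediately to deployed applications
        self.pushed.append((key, prediction))
        # 4 - hot data ages, becoming part of the historical dataset
        aged = False
        while len(self.grid) > self.hot_capacity:
            oldest = min(self.grid)
            self.historical.append(self.grid.pop(oldest))
            aged = True
        # 5 - re-training is triggered with the latest historical data
        if aged:
            self.model.train(self.historical)
        return prediction


pipeline = Pipeline(historical_prices=[10.0, 12.0, 14.0], hot_capacity=2)
results = [pipeline.ingest(p) for p in [15.0, 11.0, 20.0]]
```

In this sketch re-training happens synchronously whenever an entry ages out of the hot set; a real deployment would more likely schedule it separately.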
Page 9
Simplified demo model
Live Data
Spring XD
Apache Geode / GemFire
Machine Learning model
Data Temperature: Hot, Warm
1 - Live data is ingested into the grid
2 - Trained ML model compares new data to historical patterns
3 - Results are pushed immediately to deployed applications
4 - Re-training is triggered, updating the model with the latest historical data
Page 10
Transform Sink
SpringXD
Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native
Machine Learning
Enrich Filter
Split
Dashboard
Indicators
1
2
Predict
3
Real data
Simulator
/Stocks
/TechIndicators
/Predictions
Page 11
Eating it in small bites…
Page 12
SpringXD GemFire
Page 13
Apache Geode & GemFire Concepts
• Cache
  • Configurable through XML or Java
• Region
  • Distributed java.util.Map on steroids
  • Highly available, redundant
• Member
  • Locator, Server, Client
• Callbacks
  • Listener, Writer, AsyncEventListener, Parallel/Serial
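To make the "map with callbacks" idea concrete, here is a toy, single-process sketch in Python. It is not the Geode or GemFire API: the `Region` class, its method names, and the `reject_negative_prices` writer are all hypothetical, and nothing here is distributed or redundant. It only illustrates the split between a writer (called before a change and able to veto it) and a listener (called after a change). The stock symbol is illustrative.

```python
class Region:
    """Toy stand-in for a region: a key/value map that fires callbacks."""

    def __init__(self, name):
        self.name = name
        self._data = {}
        self._writers = []    # called before a change; may veto by raising
        self._listeners = []  # called after a change has been applied

    def add_writer(self, writer):
        self._writers.append(writer)

    def add_listener(self, listener):
        self._listeners.append(listener)

    def put(self, key, value):
        is_update = key in self._data
        old = self._data.get(key)
        for writer in self._writers:
            writer(key, old, value)  # an exception here aborts the put
        self._data[key] = value
        event = "update" if is_update else "create"
        for listener in self._listeners:
            listener(event, key, old, value)

    def get(self, key):
        return self._data.get(key)


def reject_negative_prices(key, old, new):
    """Writer callback: refuses values that make no sense as a price."""
    if new < 0:
        raise ValueError(f"negative price for {key}")


events = []
stocks = Region("/Stocks")
stocks.add_writer(reject_negative_prices)
stocks.add_listener(lambda event, key, old, new: events.append((event, key, old, new)))

stocks.put("AAPL", 110.0)      # create
stocks.put("AAPL", 111.5)      # update
try:
    stocks.put("AAPL", -1.0)   # vetoed by the writer, no event is recorded
except ValueError:
    pass
```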
Page 14
Why Apache Geode & GemFire?
• Performance
• Consistency
• Resiliency
Page 15
Why Apache Geode & GemFire?
Pivotal GemFire High Availability and Fault Tolerance in 6 acts
• Failing data copies are replaced transparently
• Data is replicated to other clusters and sites (WAN)
• Network segmentations ("split brain") are identified and fixed automatically
• Client and cluster disconnections are handled gracefully
• Data is persisted on local disk for ultimate durability
• Failed function executions are restarted automatically
Page 16
Some interesting cases…
China Railway Corporation
• 5,700 train stations
• 4.5 million tickets per day
• 20 million daily users
• 1.4 billion page views per day
• 40,000 visits per second
Indian Railways
• 7,000 stations
• 72,000 miles of track
• 23 million passengers daily
• 120,000 concurrent users
• 10,000 transactions per minute
* http://pivotal.io/big-data/pivotal-gemfire
Page 17
Use cases and industries
Indian Railways (population of India: 1,251,695,616)
China Railway Corporation (population of China: 1,401,586,609)
World: ~7,349,000,000
~36% of the world population
Page 18
Apache Geode & Pivotal GemFire
Pivotal GemFire
• Commercial product available since 2004
• Native clients in Java, C++, C#, REST
• Event Subscriptions and Continuous Queries
• Configurable WAN Gateway between clusters
• Enterprise Support, commercial features
Apache Geode
• Open sourced in April 2015
• Java Native Client, REST
• 98% of GemFire API
• Event subscriptions
• ~30 contributors
• Under Incubation
Page 19
SpringXD GemFire
Page 20
SpringXD Basic Concepts
• Streams
• Pipelines
• Sources
• Sinks
• Filters
• Taps
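The concepts listed above can be pictured as a chain of small stages. The following Python sketch uses generators to mimic a source, a filter, a tap and a sink. It is an analogy only, not Spring XD code: the function names, the message shape and the `minimum` parameter are assumptions made for this illustration.

```python
def source(ticks):
    """Source: emits messages into the stream."""
    for tick in ticks:
        yield tick


def tap(messages, collector):
    """Tap: copies each message to a side channel and passes it on unchanged."""
    for msg in messages:
        collector.append(dict(msg))
        yield msg


def price_filter(messages, minimum):
    """Filter: drops messages whose price is below the minimum."""
    for msg in messages:
        if msg["price"] >= minimum:
            yield msg


def sink(messages, store):
    """Sink: terminal stage, writes each message to a store keyed by symbol."""
    for msg in messages:
        store[msg["symbol"]] = msg


ticks = [
    {"symbol": "A", "price": 9.5},
    {"symbol": "B", "price": 12.0},
    {"symbol": "C", "price": 30.25},
]

tapped = []   # everything the tap observed
region = {}   # what finally reached the sink

# The pipeline reads left to right: source | tap | filter | sink
sink(price_filter(tap(source(ticks), tapped), minimum=10.0), region)
```

Because the tap sits before the filter, it sees every message, while the sink only receives those that pass the filter.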
Page 21
SpringXD Basic Concepts
Page 22
A simple example
twittersearch --consumerKey=XXX --consumerSecret=XXX --query=SpringOne2GX --outputType=application/json | gemfire-json-server --useLocator=true --host=localhost --port=10334 --regionName=tweets --keyExpression=payload.getField('id_str')
twittersearch --query=SpringOne2GX | gemfire-json-server --host=localhost --regionName=tweets
Page 23
SpringXD GemFire
Page 24
Apache Spark Concepts
• RDD
• DataFrame
• Driver
• Worker
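As a rough mental model for the RDD entry in the list above, the sketch below implements a tiny immutable, partitioned collection in plain Python. It does not use Spark and runs in a single process. The `ToyRDD` class is hypothetical; its method names merely echo Spark's `parallelize`, `map` and `collect`.

```python
class ToyRDD:
    """Toy immutable collection split into partitions (not Spark)."""

    def __init__(self, partitions):
        # Tuples make the partitions read-only once created.
        self._partitions = tuple(tuple(p) for p in partitions)

    @classmethod
    def parallelize(cls, items, num_partitions):
        items = list(items)
        # Ceiling division, with a floor of 1 so an empty input still works.
        size = max(1, -(-len(items) // num_partitions))
        return cls(items[i:i + size] for i in range(0, len(items), size))

    def map(self, fn):
        # Each partition is transformed independently; a new ToyRDD is returned
        # and the original is left untouched.
        return ToyRDD(tuple(fn(x) for x in p) for p in self._partitions)

    def num_partitions(self):
        return len(self._partitions)

    def collect(self):
        return [x for p in self._partitions for x in p]


prices = ToyRDD.parallelize([10.0, 10.5, 10.2, 10.8, 11.1, 10.9], num_partitions=3)
doubled = prices.map(lambda p: p * 2)
```

In real Spark the partitions would live on different worker nodes and `map` would be evaluated lazily; neither is modelled here.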
"An RDD in Spark is simply an immutable distributed collection of objects. Each RDD is split into multiple partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes."
Page 27
Machine Learning Model (e.g. Linear Regression)
Features: price(x), moving avg(x), relative strength(x)
Label: moving avg(x+1)
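The features and label named on this slide can be illustrated with a small, dependency-free Python sketch. It is not the authors' R or Spark code. It assumes a simple moving average, a simplified relative strength index (100 - 100 / (1 + gains / losses) over the last `window` price changes), a window of 3, made-up prices, and ordinary least squares solved through the normal equations. All function names are hypothetical.

```python
def sma(prices, x, window):
    """Simple moving average of the `window` prices ending at index x."""
    return sum(prices[x - window + 1:x + 1]) / window


def rsi(prices, x, window):
    """Simplified relative strength index (0..100) over the last `window` changes."""
    changes = [prices[k] - prices[k - 1] for k in range(x - window + 1, x + 1)]
    gains = sum(c for c in changes if c > 0)
    losses = -sum(c for c in changes if c < 0)
    return 100.0 if losses == 0 else 100.0 - 100.0 / (1.0 + gains / losses)


def build_rows(prices, window):
    """One (features, label) pair per index x: features at x, label at x + 1."""
    rows = []
    for x in range(window, len(prices) - 1):
        features = [prices[x], sma(prices, x, window), rsi(prices, x, window)]
        rows.append((features, sma(prices, x + 1, window)))
    return rows


def fit_linear(rows):
    """Ordinary least squares; returns [intercept, w1, w2, ...]."""
    xs = [[1.0] + list(f) for f, _ in rows]
    ys = [y for _, y in rows]
    n = len(xs[0])
    # Augmented normal equations (X^T X | X^T y).
    a = [[sum(r[i] * r[j] for r in xs) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(xs, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][n] for i in range(n)]


def predict(weights, features):
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))


# Made-up prices, for illustration only.
prices = [10.0, 10.5, 10.2, 10.8, 11.1, 10.9, 11.4, 11.8, 11.6, 12.0, 12.3, 12.1]
rows = build_rows(prices, window=3)
weights = fit_linear(rows)
next_avg = predict(weights, rows[-1][0])  # fitted estimate of moving avg(x+1) for the last row
```

With so few rows the fit is only a demonstration of the mechanics; it says nothing about predictive quality.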
Page 28
Transform Sink
SpringXD
Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native
Machine Learning
Enrich Filter
Split
Dashboard
Indicators
1
2
Predict
3
Real data
Simulator
/Stocks
/TechIndicators
/Predictions
Page 29
Page 30
Learn more!
https://github.com/Pivotal-Open-Source-Hub/geode-security-samples
https://github.com/Pivotal-Open-Source-Hub/WifiAnalyticsIoT
https://github.com/Pivotal-Open-Source-Hub/geode-social-demo
http://pivotal-open-source-hub.github.io/StockInference-Spark/
Page 31
Thank you
@william_markito @fredmelo_br
Related: Building Highly-Scalable Spring Applications with In-Memory, Distributed Data Grids
by John Blum & Luke Shannon
September 15, 2015 - 10:30 - Salon M
http://pivotal-open-source-hub.github.io/StockInference-Spark/