Awesome Banking APIs: Exposing big data and streaming analytics using Hadoop, Cassandra, Akka and Spray
Jan 27, 2015
Humanize Data
The bank statements
How I read the bank bills
What happened those days
data is the fabric of our lives
Personal history:
Long term Interaction:
Real time events:
>>> from sklearn.datasets import load_iris
>>> from sklearn import tree
>>> iris = load_iris()
>>> clf = tree.DecisionTreeClassifier()
>>> clf = clf.fit(iris.data, iris.target)
● Flexible, concise language
● Quick to code and prototype
● Portable, visualization libraries
Machine learning libraries: scipy, statsmodels, sklearn, matplotlib, ipython
Web libraries: flask, tornado, (no)SQL clients
# Multiple Linear Regression Example
fit <- lm(y ~ x1 + x2 + x3, data=mydata)
summary(fit) # show results
● Language for statistics
● Easy to analyze and shape data
● Advanced statistical packages
● Fueled by academia and professionals
● Very clean visualization packages
Packages for machine learning: time series forecasting, clustering, classification, decision trees, neural networks
Remote procedure calls (RPC): from Scala/Java via RProcess and Rserve
OK, let’s build some banking apps
core banking systems
SOAP services and DBs
System BUS
customer-facing apps
channels
Bank schematic
Challenges
Higher separation!
Bigger and faster
Fewer silos
Interactions with core systems
Reliable
Low cost
Computing Powerhouse
Reliable
Low latency
Tunable CAP
Data model: hashed rows, sorted wide columns
Architecture model: no SPOF, ring of nodes, homogeneous system
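A toy Python sketch of the ideas on this slide, under stated assumptions: row keys are hashed onto a ring of equal nodes, columns inside a row stay sorted, and consistency is tuned by choosing how many replicas must answer. The names `token`, `Ring`, `WideRow` and `overlapping_quorums` are hypothetical, and MD5 is only an illustrative hash; none of this is Cassandra's actual partitioner or client API.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring of equal nodes: no master, every node plays the same role."""

    def __init__(self, node_names, replication_factor=3):
        self.replication_factor = replication_factor
        # Sort nodes by token so the ring can be searched with bisect.
        self.nodes = sorted((token(name), name) for name in node_names)
        self.tokens = [t for t, _ in self.nodes]

    def replicas(self, row_key: str):
        """Return the owner of a row plus the next nodes clockwise on the ring."""
        start = bisect.bisect_left(self.tokens, token(row_key)) % len(self.nodes)
        count = min(self.replication_factor, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)][1] for i in range(count)]


class WideRow:
    """One hashed row whose columns are kept sorted by column name."""

    def __init__(self):
        self._names = []
        self._values = {}

    def put(self, column, value):
        if column not in self._values:
            bisect.insort(self._names, column)
        self._values[column] = value

    def slice(self, start, end):
        """Return (column, value) pairs with start <= column < end, in order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_left(self._names, end)
        return [(name, self._values[name]) for name in self._names[lo:hi]]


def overlapping_quorums(n: int, r: int, w: int) -> bool:
    """With n replicas, reads of r and writes of w replicas overlap when r + w > n."""
    return r + w > n
```

Keeping columns sorted is what makes a range read cheap: for a row keyed by account with date-named columns, `slice("2015-01-01", "2015-02-01")` returns one month of entries without scanning the whole row.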
[Diagram: Actor A, Actor B and Actor C exchanging messages msg 1, msg 2, msg 3 and msg 4]
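The actor picture can be sketched in a few lines of Python. This is a toy model, not Akka: each actor owns a mailbox and handles one message at a time, and sending is fire-and-forget. The classes `Actor`, `ActorSystem`, `Doubler` and `Collector` are hypothetical names, and a single-threaded loop stands in for a real dispatcher.

```python
from collections import deque


class ActorSystem:
    """Single-threaded dispatcher standing in for a real thread-pool dispatcher."""

    def __init__(self):
        self._ready = deque()

    def schedule(self, actor):
        self._ready.append(actor)

    def run(self):
        # Each scheduled slot delivers exactly one message from that actor's mailbox.
        while self._ready:
            actor = self._ready.popleft()
            message, sender = actor.mailbox.popleft()
            actor.receive(message, sender)


class Actor:
    """Toy actor: private state plus a mailbox, no shared memory."""

    def __init__(self, name, system):
        self.name = name
        self.system = system
        self.mailbox = deque()

    def tell(self, message, sender=None):
        """Fire-and-forget send: enqueue the message and return immediately."""
        self.mailbox.append((message, sender))
        self.system.schedule(self)

    def receive(self, message, sender):
        raise NotImplementedError


class Doubler(Actor):
    """Doubles each number it receives and forwards the result to a target actor."""

    def __init__(self, name, system, target):
        super().__init__(name, system)
        self.target = target

    def receive(self, message, sender):
        self.target.tell(message * 2, sender=self)


class Collector(Actor):
    """Records every message it receives, in arrival order."""

    def __init__(self, name, system):
        super().__init__(name, system)
        self.seen = []

    def receive(self, message, sender):
        self.seen.append(message)


system = ActorSystem()
collector = Collector("C", system)
doubler = Doubler("B", system, collector)
doubler.tell(1)
doubler.tell(2)
system.run()
```

Because state is only touched inside `receive`, and only one message is processed at a time per actor, no locks are needed around `seen`.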
[Architecture diagram. Labels: Core flow, HTTP I/O, NoSQL client, Hadoop, batch data science, Cassandra, SOAP client, real-time analytics, bank core services, bank transactions, data science, API]
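Sprayin’
import akka.actor.Props
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
import spray.routing.HttpService
trait ApiService extends HttpService {
  implicit def executionContext = actorRefFactory.dispatcher
  implicit val timeout = Timeout(5.seconds) // required by ask (?); 5 seconds is an assumed value
  // Create the analytics client actor
  val actor = actorRefFactory.actorOf(Props[AnalyticsActor], "analytics-actor")
  // curl -vv -H "Content-Type: application/json" localhost:8888/api/v1/123/567
  val serviceRoute =
    pathPrefix("api" / "v1") {
      pathPrefix(Segment / Segment) { (aid, cid) =>
        get {
          complete { (actor ? (aid, cid)).mapTo[String] } // reply type assumed to be String
        }
      }
    }
}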
Create an actor for analytics
Serve the API path
Message is passed on to the analytics actor
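The three steps above (create a handler, match the API path, pass the extracted ids on) can be mimicked in plain Python to show the data flow without Spray or Akka. This is a sketch under assumptions: `match_route`, `handle_get` and `analytics_handler` are hypothetical names, the handler is a stand-in for the analytics actor, and the match is exact, whereas a prefix match would also accept longer paths.

```python
import json


def match_route(path):
    """Return (aid, cid) when path looks like /api/v1/<aid>/<cid>, else None."""
    segments = [s for s in path.split("/") if s]
    if len(segments) == 4 and segments[:2] == ["api", "v1"]:
        return segments[2], segments[3]
    return None


def analytics_handler(aid, cid):
    """Hypothetical stand-in for the analytics actor: echoes the ids it was sent."""
    return {"aid": aid, "cid": cid}


def handle_get(path, handler=analytics_handler):
    """Match the path, pass the ids to the handler, return (status, JSON body)."""
    ids = match_route(path)
    if ids is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(handler(*ids))
```

Calling `handle_get("/api/v1/123/567")` mirrors the curl request shown in the route comment: both path segments reach the handler as strings.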
https://github.com/natalinobusa/wavr
Latency tradeoffs
Managing computation
Science & Engineering
Statistics, Data Science
Python, R, Visualization
IT Infra, Big Data
Java, Scala, SQL
Hadoop: Big Data Infrastructure, Data Science on large datasets
Big Data and Fast Data require different profiles to achieve the best results
Some lessons learned
● Mixing and matching technologies is a good thing
● Harden the design as you go
● Define clear interfaces
● Ease integration among teams
● Hadoop, Cassandra, and Akka: they work!
● Plug in the data science!
Thanks! Any questions?