© 2015 ligaDATA, Inc. All Rights Reserved.
Powering Real-time Decisioning for Financial & Healthcare using Open Source
August 2015
Community @ http://Kamanja.org
Aug 18, 2015
In 2014, Barclays embarked on transforming how it leverages its data using Open Source & Big Data technologies.
To achieve this goal with Barclays, we needed to:
1. Create a framework to adopt Open Source Software
2. Find a catalyst to attract and retain talent
Marissa Mayer of Yahoo won’t have to go in front of the Senate to explain why 100,000 records were lost – Barbara Desoer of Citibank would.
What is different about Financial Services?
✓ Regulatory requirements demand 100% data protection
✓ Security & Data governance
✓ Auditability
✓ Lineage
✓ ZERO data loss
✓ Integration with legacy ecosystem
✓ Skillset
Open Source in Financial Services
Good enough for Internet companies isn't good enough!
A modified “Crossing the Chasm” view for OSS
OSS – Adoption Chasm
Why have Financial Services not adopted OSS more aggressively?
Creators: Technology organizations, rich resources, solving a problem, creating a competitive advantage
Contributors: Technology organizations, taking a risk while solving a problem
Users: Lower technology skillset, low risk tolerance, solving a problem
Establish the BOSS framework for the consumption of and contribution to open source software (OSS) at scale in Barclays.
Barclays Open Source Software (BOSS)
Maturing Capability: Consumption, Contribution, Creation
Stage labels: Barclays Current Focus, Step Change, Pioneering Target
Consumption: Evaluation & consumption of OSS
Contribution: Contribution to OSS by enhancing existing open source projects, documentation, fixes, enhancements
Creation: Initiation of a new OSS project, championing and facilitating OSS community development and consumption
BOSS optimises Consumption, enables Contribution and Creation
• Input from stakeholders, internal and external, influenced the BOSS framework definition
• OSS advisory board to steer and drive
• Pre-approved license types per use case (consumption and contribution)
• Invest in enabling technology: GitHub, Black Duck, Sonatype
• No new governance steps; leverage and streamline existing controls instead of creating new ones
The BOSS framework is designed based on guidance and feedback received from key representatives within Barclays and from leading open source contributors and fellow banks.
BOSS – Collective Thought Process
Input: Internal, External
Business Units: Retail, Investment, Cards
Control Functions: Legal, Risk, Security, Sourcing
Technology: Data, Design, Infra
Millennial developers …
• Grew up using OSS
• Unaware of Closed Source software
• Want to engage, share and contribute
Real-time decisioning using Kamanja was selected as a capability big enough and important enough to build a Center of Excellence around.
Attracting and Retaining talent
Deriving Decisions from Big Data
REAL-TIME: Individual events; decisioning, detection; in-context and online
BATCH: Cross section of events; analytics, MI; offline, longer cycle
Real-time Decision Requirement
Customer-centric product design requires real-time decisions.
END-TO-END CAPABILITY: Triggers, Real-time Decisions, Actions
Tracking & analyzing (processing) streams of information (real-time) about things that happen (events)
Deriving an opportunity or threat
• Scoring
• Notifications
• Alerts
• Transactional Updates
LigaDATA introduced Kamanja – an open source real-time decisioning project, hardened for Financial Services & Healthcare requirements and scalable to IoT-level data volumes, enabling low-latency use cases.
USE CASES
• Customer churn/retention
• Risk Analysis
• Customer Contact
• Cyber Crime
• Fraud
• Security & Compliance
• Audit & Governance
• Marketing
• Telephony Interception
• Real-Time Offer
Uses of Real-Time Decisioning
Complex Event Processing (CEP)
• A few to possibly 100s of concurrent data streams
• Apply rule logic, select, aggregate
• Decide action on elements in stream
Enterprise Applications, During …
• customer call or chat: recommendations to improve service
• card transaction: offer credit increase
• web application: pre-approval
• web transaction: recommend other product(s)
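The CEP steps listed above (select, aggregate, apply rule logic, decide an action per element) can be sketched in a few lines of Python. This is only an illustration, not Kamanja code: the event fields (`card_id`, `ts`, `amount`), the 60-second window, and the spend threshold are all assumptions made up for the example.

```python
from collections import defaultdict, deque

# Hypothetical rule parameters, chosen only for illustration.
WINDOW_SECONDS = 60
MAX_TOTAL = 1000.0

def decide(events):
    """Yield (event, action) pairs for a time-ordered stream of card events."""
    recent = defaultdict(deque)  # card_id -> deque of (timestamp, amount)
    for ev in events:
        window = recent[ev["card_id"]]
        window.append((ev["ts"], ev["amount"]))
        # Select: keep only this card's events inside the sliding window.
        while window[0][0] <= ev["ts"] - WINDOW_SECONDS:
            window.popleft()
        # Aggregate: total spend for this card within the window.
        total = sum(amount for _, amount in window)
        # Decide: apply the rule to the aggregate and emit an action.
        yield ev, ("alert" if total > MAX_TOTAL else "pass")

events = [
    {"card_id": "A", "ts": 0, "amount": 400.0},
    {"card_id": "B", "ts": 5, "amount": 50.0},
    {"card_id": "A", "ts": 20, "amount": 700.0},
    {"card_id": "A", "ts": 120, "amount": 100.0},
]
actions = [action for _, action in decide(events)]
```

Only the third event triggers an alert here, because card A's spend inside the window crosses the assumed threshold at that point; by the fourth event the earlier transactions have aged out.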
Case Study of a Modeling Department
Monitor $80B of consumer bank transactions / year to detect fraud (between 1,400 banks)
PAIN POINT: ~2 months to deploy (the modeling group was different from the deployment group)
INDUSTRY REVIEW to answer:
• How common is it to use many algorithms or tools in a project?
• What is an easier way to deploy models?
http://www.kdnuggets.com/2015/06/data-mining-data-science-tools-associations.html
Independent use of tools
http://www.kdnuggets.com/2015/06/data-mining-data-science-tools-associations.html
Tools used in combination
PMML Diagram: Predictive Model Markup Language
Training Project Phase: Training & test data (batch) → Data Mining Tool (PMML Producer) → File, Save As PMML → PMML File (full model specification)
Production Scoring Project Phase: PMML File + scoring data (real-time streaming) → Scoring Engine (Kamanja, PMML Consumer) → output data has new score field
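To make the producer/consumer split concrete, here is a minimal sketch of what a PMML consumer does, using only the Python standard library. It is not how Kamanja is implemented. The model below is a hypothetical regression with made-up field names and coefficients, and the sketch handles only `NumericPredictor` terms with the default exponent of 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML file as a producer might save it; all values are made up.
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="illustrative model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="txn_count" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="amount"/>
      <MiningField name="txn_count"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="amount" coefficient="0.01"/>
      <NumericPredictor name="txn_count" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_2"}

def load_regression(pmml_text):
    """Read the intercept and coefficients from a PMML regression table."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefs = {p.get("name"): float(p.get("coefficient"))
             for p in table.findall("p:NumericPredictor", NS)}
    return intercept, coefs

def score(record, model):
    """Return a copy of the record with a new 'score' field added."""
    intercept, coefs = model
    out = dict(record)
    out["score"] = intercept + sum(c * record[name] for name, c in coefs.items())
    return out

model = load_regression(PMML_DOC)
scored = score({"amount": 200.0, "txn_count": 4.0}, model)
```

The point of the format is that the scoring side needs nothing from the training tool except the file: the model specification travels as data.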
Given industry fragmentation, PMML is a solution
PMML Producers (18 companies)
• R (Rattle, PMML)
• RapidMiner
• KNIME
• Spark (MLlib) (Open Source)
• Weka
• SAS Enterprise Miner
PMML Consumers (12 companies)
• Zementis
• SAS
• IBM SPSS
• KNIME
• Microstrategy
• Kamanja
• JPMML
PREDICTIVE: Naïve Bayes, Neural Net, Regression, Rules, Scorecard, Sequence, SVM, Time Series, Trees
DESCRIPTIVE / OTHER: Association Rules; Cluster, K-Nearest Neighbors; Text Models; model ensembles & composition (e.g. Gradient Boosting)
OSS Technology Stack Integration with Kamanja
Real Time Computing: Kamanja (PMML/Java/Scala Consumer)
High Level Languages / Abstractions: MLlib* (PMML Producer)
Compute Fabric: Cloud, EC2, Internal Cloud
Security: Kerberos
Real Time Streaming: Kafka, MQ, Spark*
Data Store: HBase, Cassandra, InfluxDB, HDFS (create adaptors to integrate others)
Resource Management: Zookeeper, Yarn*, Mesos*
Performance Characteristics
Performance
• Throughput of million messages/second
• Uses commodity hardware
Scalability
• Linear scalability, horizontally
• Data partitioning support
• Runtime multi-model optimizations to support thousands of models
• Consistent performance on hundreds of models and thousands of rules
Built for IoT data volumes
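Data partitioning is what lets throughput grow by adding nodes: if every message for a given key lands on the same partition, each partition can be processed independently. The sketch below shows generic hash partitioning in Python. It is an assumption about the general technique, not a description of Kamanja's partitioner, and the `customer_id` key and partition count are hypothetical.

```python
import hashlib

def partition_for(key, num_partitions):
    """Map a key to a partition index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

def route(messages, num_partitions):
    """Distribute messages so that each key always lands on one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for msg in messages:
        index = partition_for(msg["customer_id"], num_partitions)
        partitions[index].append(msg)  # arrival order is kept per partition
    return partitions

# Hypothetical stream: 50 messages spread over 5 customers.
messages = [{"customer_id": f"c{i % 5}", "seq": i} for i in range(50)]
partitions = route(messages, 4)
```

A stable hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would send the same key to different partitions on different nodes.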
• Clinicians (knowledge experts) develop heuristic based rule set models
• The initial model was COPD (Chronic Obstructive Pulmonary Disease) risk assessment
• Support of referenced Beneficiary, HL7, Inpatient Claim, and Outpatient Claim
• Models are expressed with a domain specific language (DSL) they developed
• DSL models are transformed to PMML for Kamanja
• Models consume current + prior related messages over “look back period”
  - Save the “assertions” of a patient in the database (beyond standard PMML)
  - “State” can evolve over time
• The “Medical Company” plans to integrate the DSL with their ontology data modeling effort
• Goal is to generate new models as their “medical world” ontology evolves
Medical Company use of Kamanja
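The look back period and saved assertions described above can be sketched as per-patient state that is updated on every message. This is a hedged illustration only: the message fields, the 30-day window, and the "two inpatient claims" rule are invented for the example and are not the clinicians' COPD logic, and an in-memory dictionary stands in for the database. Messages are assumed to arrive in day order.

```python
from collections import defaultdict, deque

class PatientState:
    """Keeps recent messages and saved assertions for each patient."""

    def __init__(self, look_back_days):
        self.look_back_days = look_back_days
        self.history = defaultdict(deque)   # patient_id -> recent messages
        self.assertions = defaultdict(set)  # patient_id -> saved assertions

    def consume(self, msg):
        past = self.history[msg["patient_id"]]
        past.append(msg)
        # Drop messages that fall outside the look back period.
        while past[0]["day"] < msg["day"] - self.look_back_days:
            past.popleft()
        # Hypothetical rule over current + prior messages in the window.
        inpatient = sum(1 for m in past if m["type"] == "inpatient_claim")
        if inpatient >= 2:
            self.assertions[msg["patient_id"]].add("repeat_inpatient")
        # Saved assertions persist even after the messages age out.
        return sorted(self.assertions[msg["patient_id"]])

state = PatientState(look_back_days=30)
results = [
    state.consume({"patient_id": "p1", "day": 0, "type": "inpatient_claim"}),
    state.consume({"patient_id": "p1", "day": 40, "type": "inpatient_claim"}),
    state.consume({"patient_id": "p1", "day": 50, "type": "outpatient_claim"}),
    state.consume({"patient_id": "p1", "day": 60, "type": "inpatient_claim"}),
    state.consume({"patient_id": "p1", "day": 200, "type": "outpatient_claim"}),
]
```

The first two inpatient claims are more than 30 days apart, so no assertion is made until the claims on days 40 and 60 fall in the same window; after that the assertion stays with the patient.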