Analytics in Action
Summer School 2015
Seshika Fernando, Technical Lead
What’s in store
o Quick recap of key concepts
o Real world applications and demos of
    o Batch Analytics
    o Interactive Analytics
    o Real-time Analytics
    o Predictive Analytics
    o Combinations of the above
o Summary
Data Science is…
“the extraction of knowledge from large volumes of data that are structured or unstructured”
Analytics Landscape
o Batch Analytics
    Extracting knowledge by processing large amounts of stored data
o Interactive Analytics
    Extracting knowledge by interacting with large amounts of stored data by querying
o Real-time Analytics
    Extracting knowledge by processing fast moving data
o Predictive Analytics
    Extracting knowledge from existing data to determine patterns and predict future outcomes and trends
Batch Analytics in the Real world
o KPI Statistics
    o Web application stats monitoring
    o Network/Service statistics
    o Aggregations of sensor data
o Solving optimization problems
    o Urban Planning
    o Revenue distribution analysis
Interactive Analytics in the Real world
o Log Analysis
    o HTTP logs
    o Audit logs
    o System logs
o Activity Monitoring
    o Tracing workflows
    o Detecting performance issues
    o Health data monitoring
o Fraud Detection
    o Once a fraud is detected, querying other events that may be related
Real-time Analytics in the Real world
o Sports
    o Real-time analysis of team/player performance
    o Real-time match analytics for fans
o Geo-spatial
    o Traffic Monitoring and alerts
    o Geo-fencing requirements for Transportation
o Anomaly Detection
    o Fraud Detection
    o Network Intrusion Detection
    o Network/Server health monitoring
Predictive Analytics in the Real world
o Next value prediction
    o Sales forecasts
    o Electricity loads
o Classification
    o Product Categorization
    o Customer Segmentation
o Anomaly Detection
    o Fraud Detection
    o Preventive Maintenance
Website Activity Data
o Product Downloads
o Whitepapers
o Webinars
o Case Studies
o Workshops
Random Forest
Fraud Detection: Real-time queries
from e1 = TransactionStream ->
e2 = TransactionStream[e1.cardNo == e2.cardNo] <3:>
within 5000
select e1.cardNo, e1.txnID, e2[0].txnID, e2[1].txnID, e2[2].txnID
insert into FraudStream;
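For readers without a Siddhi runtime at hand, the idea behind the pattern query above can be sketched in plain Python. This is a simplified approximation, not Siddhi's actual matching semantics. It assumes that "within 5000" means 5000 milliseconds and that events arrive in timestamp order. The function name, the tuple layout, and the reset-after-alert behaviour are hypothetical choices made for illustration.

```python
from collections import defaultdict, deque

WINDOW_MS = 5000      # mirrors "within 5000"; milliseconds is an assumption
MIN_FOLLOW_UPS = 3    # mirrors the <3:> count on e2


def detect_rapid_transactions(events):
    """Yield (card_no, [txn ids]) when one transaction on a card is followed
    by at least MIN_FOLLOW_UPS more on the same card within WINDOW_MS.

    `events` is an iterable of (timestamp_ms, card_no, txn_id) tuples,
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)  # card_no -> deque of (timestamp_ms, txn_id)
    for ts, card_no, txn_id in events:
        window = recent[card_no]
        window.append((ts, txn_id))
        # Drop transactions that are older than the time window.
        while ts - window[0][0] > WINDOW_MS:
            window.popleft()
        # One initial transaction plus the follow-ups triggers an alert.
        if len(window) >= MIN_FOLLOW_UPS + 1:
            yield card_no, [t for _, t in window]
            window.clear()  # hypothetical choice: one alert per burst


if __name__ == "__main__":
    sample = [
        (0, "A", "t1"), (1000, "A", "t2"), (1500, "B", "u1"),
        (2000, "A", "t3"), (3000, "A", "t4"), (9000, "A", "t5"),
    ]
    for alert in detect_rapid_transactions(sample):
        print(alert)
```

In this sketch card A raises a single alert because four of its transactions fall inside one five-second span, while card B, with one transaction, does not.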
Fraud Detection
o Known Fraud Modelling
    o Real-time Analytics
o Unknown Fraud Modelling
    o Predictive Analytics
o Parameters for Fraud detection
    o Batch Analytics
    o Predictive Analytics
o Further Analysis once Fraud is detected
    o Interactive Analytics
Summary
o Many flavors of Analytics
    o Batch, Interactive, Real-time, Predictive
o Real life use cases need to utilize different types of analytics
o Many Technologies available
    o Hadoop MapReduce, Spark, Storm, R, WSO2 Analytics
o WSO2 Analytics Platform provides Batch, Interactive, Real-time and Predictive Analytics all in one place