@louisdorard #ParisDataGeeks
Jul 15, 2015
“Predictive is the ‘killer app’ for big data.”
–Waqar Hasan, Apigee Insights
“Predictive apps are the next big thing in app development.”
–Mike Gualtieri, Principal Analyst at Forrester
Machine Learning
Data
BUT
“A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning.”
–McKinsey & Co.
What the @#?~% is ML?
“How much is this house worth? — X $” -> Regression
Bedrooms Bathrooms Surface (foot²) Year built Type Price ($)
3 1 860 1950 house 565,000
3 1 1012 1951 house
2 1.5 968 1976 townhouse 447,000
4 1315 1950 house 648,000
3 2 1599 1964 house
3 2 987 1951 townhouse 790,000
1 1 530 2007 condo 122,000
4 2 1574 1964 house 835,000
4 2001 house 855,000
3 2.5 1472 2005 house
4 3.5 1714 2005 townhouse
2 2 1113 1999 condo
1 769 1999 condo 315,000
ML is a set of AI techniques where “intelligence” is built by referring to examples
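To make “built by referring to examples” concrete, here is a minimal sketch that learns a price from the labelled rows of the house table above. It is an illustrative assumption, not the method from the talk: it uses a single input (surface) and ordinary least squares, and it leaves out the “4 2001 house” row because its columns are ambiguous.

```python
# (surface in foot², price in $) for every table row where both values are known.
examples = [(860, 565_000), (968, 447_000), (1315, 648_000), (987, 790_000),
            (530, 122_000), (1574, 835_000), (769, 315_000)]


def fit_line(points):
    """Ordinary least squares fit of price = intercept + slope * surface."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov / var
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(examples)

# Predict the rows whose price is missing in the table.
for surface in (1012, 1599, 1472, 1714, 1113):
    print(f"{surface} foot² -> {intercept + slope * surface:,.0f} $")
```

A real model would use all the columns (bedrooms, type, year built...), but the principle is the same: the parameters come from the examples, not from hand-written rules.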
“Which type of email is this? — Spam/Ham”-> Classification
WATCH OUT!
• Need examples of inputs AND outputs
• Need enough examples
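The first warning can be checked mechanically: only rows that have both inputs and an output can be learned from; the others are the rows we want predictions for. A minimal sketch on a subset of the house table's columns (surface, type, price), where `None` marks a missing price and the ambiguous “4 2001 house” row is left out:

```python
# (surface in foot², type, price in $); None marks a price that is not known.
rows = [
    (860, "house", 565_000), (1012, "house", None), (968, "townhouse", 447_000),
    (1315, "house", 648_000), (1599, "house", None), (987, "townhouse", 790_000),
    (530, "condo", 122_000), (1574, "house", 835_000), (1472, "house", None),
    (1714, "townhouse", None), (1113, "condo", None), (769, "condo", 315_000),
]

# Rows with an output are training examples; rows without one are prediction targets.
training = [r for r in rows if r[2] is not None]
to_predict = [r for r in rows if r[2] is None]
print(len(training), "examples to learn from,", len(to_predict), "rows to predict")
```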
Prediction APIs
HTML / CSS / JavaScript
squarespace.com
The two phases of machine learning:
• TRAIN a model
• PREDICT with a model
The two methods of prediction APIs:
• TRAIN a model
• PREDICT with a model
The two methods of prediction APIs:
• model = create_model(dataset)
• predicted_output = create_prediction(model, new_input)
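from bigml.api import BigML

# create a model
api = BigML()  # credentials are read from BIGML_USERNAME / BIGML_API_KEY
source = api.create_source('training_data.csv')
dataset = api.create_dataset(source)
model = api.create_model(dataset)

# make a prediction; new_input maps field names to values
# (hypothetical example, assuming training_data.csv has the house table's columns)
new_input = {'Bedrooms': 3, 'Bathrooms': 2, 'Surface (foot²)': 1599, 'Year built': 1964, 'Type': 'house'}
prediction = api.create_prediction(model, new_input)
api.ok(prediction)  # wait until the prediction resource is finished
print("Predicted output value:", prediction['object']['output'])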
http://bit.ly/bigml_wakari
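To see the TRAIN/PREDICT split without an account, here is a toy local stand-in that reuses the two method names from the slide. It is hypothetical and deliberately naive (the “model” is just the average price per property type); it is not BigML's implementation or any vendor's API, only an illustration of what each call is responsible for.

```python
from collections import defaultdict
from statistics import mean


def create_model(dataset):
    """TRAIN: dataset is a list of (input_dict, output) pairs.

    Toy model for illustration only: remember the average output for each
    value of the "type" input, plus the overall average as a fallback.
    """
    by_type = defaultdict(list)
    for inputs, output in dataset:
        by_type[inputs["type"]].append(output)
    return {
        "means": {t: mean(outputs) for t, outputs in by_type.items()},
        "fallback": mean(output for _, output in dataset),
    }


def create_prediction(model, new_input):
    """PREDICT: look up the learned average for the input's type."""
    return model["means"].get(new_input["type"], model["fallback"])


# Labelled rows from the house table, reduced to the "type" input.
dataset = [({"type": "house"}, 565_000), ({"type": "townhouse"}, 447_000),
           ({"type": "house"}, 648_000), ({"type": "townhouse"}, 790_000),
           ({"type": "condo"}, 122_000), ({"type": "house"}, 835_000),
           ({"type": "condo"}, 315_000)]
model = create_model(dataset)
print(create_prediction(model, {"type": "condo"}))
```

Swapping the body of `create_model` for a real learning algorithm does not change how the application calls it, which is the point of the two-method interface.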
Beyond predictive modelling
Phrase problem as ML task
Engineer features
Prepare data (csv)
Learn model
Make predictions
Deploy model & integrate pred
Evaluate model
Measure impact
PREDICTION APIS
• Deployment to production?
• Maintenance?
  • monitor performance
  • update with new data
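Monitoring can start as simply as comparing predictions with outcomes once they are known. A minimal sketch, assuming a regression model like the house-price example; the class, the window size and the error threshold are all hypothetical choices, not recommendations.

```python
from collections import deque


class PerformanceMonitor:
    """Hypothetical helper: track the absolute error of recent predictions
    once the true outcome is known, and flag when the model should be updated."""

    def __init__(self, window=100, max_mean_error=50_000):
        # window and max_mean_error are illustrative defaults only
        self.errors = deque(maxlen=window)  # keeps only the latest `window` errors
        self.max_mean_error = max_mean_error

    def record(self, predicted, actual):
        self.errors.append(abs(predicted - actual))

    def mean_error(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_update(self):
        """True when recent predictions are, on average, worse than the threshold."""
        return self.mean_error() > self.max_mean_error


monitor = PerformanceMonitor(window=3, max_mean_error=10)
monitor.record(predicted=100, actual=130)
print(monitor.needs_update())
```

When `needs_update()` turns true, the usual response is to retrain with the newer data, i.e. the second maintenance item.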
• Open source
• Spark’s MLlib -> prediction server
• Expose model as (scalable & robust) API
• DASE framework
  • D: Data preparation
  • A: Algorithm
  • S: Serving
  • E: Evaluation
• Send new data/events to event server
• Send prediction queries to engine
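The last two points are plain HTTP calls. A minimal sketch that builds (but does not send) both requests with the standard library, assuming PredictionIO's default local setup: event server on port 7070 with `POST /events.json?accessKey=...`, deployed engine on port 8000 with `POST /queries.json`. The access key is a placeholder, and the customer id, properties and query fields are hypothetical; the query format depends on the engine template.

```python
import json
import urllib.request

# Assumptions: default ports on localhost and a placeholder app access key.
EVENT_SERVER = "http://localhost:7070"
ENGINE_SERVER = "http://localhost:8000"
ACCESS_KEY = "YOUR_ACCESS_KEY"


def _post(url, payload):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def event_request(event, entity_type, entity_id, properties):
    """New data/event for the event server."""
    return _post(f"{EVENT_SERVER}/events.json?accessKey={ACCESS_KEY}", {
        "event": event,
        "entityType": entity_type,
        "entityId": entity_id,
        "properties": properties,
    })


def query_request(query):
    """Prediction query for the engine; accepted fields depend on the engine."""
    return _post(f"{ENGINE_SERVER}/queries.json", query)


# Hypothetical customer data and query fields, for illustration only.
event = event_request("$set", "customer", "c42", {"plan": "monthly"})
query = query_request({"customer": "c42"})
# urllib.request.urlopen(event) / urlopen(query) would send them to running servers.
```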
PredictionIO vs Azure ML on KDnuggets
Case study: churn analysis
• Who: SaaS company selling monthly subscription
• Question asked: “is this customer going to leave within 1 month?”
• Input: customer
• Output: no-churn (negative) or churn (positive)
• Data collection: history up until 1 month ago
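“History up until 1 month ago” implies a cutoff date: inputs are built only from data before it, and the output is what happened in the month after it. A minimal labelling sketch under two assumptions: one month is approximated as 30 days, and customers who had already cancelled before the cutoff are excluded from the examples.

```python
from datetime import date, timedelta

# Assumption: "1 month" is approximated as 30 days.
HORIZON = timedelta(days=30)


def churn_label(cancel_date, cutoff):
    """Label one customer for training, given the cutoff date ("1 month ago").

    Returns 1 (churn, positive), 0 (no-churn, negative), or None for
    customers who had already left by the cutoff and are not used as examples.
    """
    if cancel_date is None:      # still a subscriber today
        return 0
    if cancel_date <= cutoff:    # already gone before the cutoff
        return None
    return 1 if cancel_date <= cutoff + HORIZON else 0


cutoff = date(2015, 6, 15)  # hypothetical "1 month ago"
print(churn_label(date(2015, 7, 1), cutoff))
```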
Learning -> OK, but:
How to represent customers? What to do after predicting churn?
Customer representation:
• basic info (age, income, etc.)
• usage of service (avg call duration, overcharges, leftover minutes/month, etc.)
• interactions with customer support (how many, topics of questions, satisfaction ratings)
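Each customer must end up as one row of features. A minimal sketch that flattens raw records into a feature dict covering the three groups above; every field name (`age`, `duration_min`, `overcharge`, `rating`...) is hypothetical and stands in for whatever the company's systems actually store.

```python
from statistics import mean


def customer_features(customer, calls, tickets):
    """Flatten raw records about one customer into a feature dict.

    All field names are hypothetical placeholders.
    """
    return {
        # basic info
        "age": customer["age"],
        "income": customer["income"],
        # usage of service
        "avg_call_duration": mean(c["duration_min"] for c in calls) if calls else 0.0,
        "overcharges": sum(c["overcharge"] for c in calls),
        # interactions with customer support
        "n_support_contacts": len(tickets),
        "avg_satisfaction": mean(t["rating"] for t in tickets) if tickets else None,
    }


row = customer_features(
    {"age": 30, "income": 50_000},
    [{"duration_min": 2, "overcharge": 0}, {"duration_min": 4, "overcharge": 5}],
    [{"rating": 3}, {"rating": 5}],
)
print(row)
```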
Taking action to prevent churn:
• contact customer
• switch to different plan
• fix issues
• give special offer
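Each of these actions has a cost, so in practice only some predicted churners can be targeted. A minimal sketch, assuming the model returns a churn probability rather than just a churn/no-churn label; the threshold, the budget and the scores are hypothetical.

```python
def pick_customers_to_contact(churn_scores, budget, threshold=0.5):
    """Choose who gets a retention action (contact, plan switch, special offer...).

    churn_scores maps customer id -> predicted churn probability (hypothetical
    model output). Keep customers at or above `threshold`, most at-risk first,
    and stop at `budget` (how many actions the team can afford).
    """
    at_risk = [c for c, p in churn_scores.items() if p >= threshold]
    at_risk.sort(key=churn_scores.get, reverse=True)
    return at_risk[:budget]


scores = {"c1": 0.2, "c2": 0.9, "c3": 0.55, "c4": 0.7}
print(pick_customers_to_contact(scores, budget=2))
```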
Measuring performance:
• #TP, #FP, #FN
• F-measure?
• ROI
• Compare to baseline
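These measures can be computed from the confusion-matrix counts. A minimal sketch: `f_measure` is the standard F1 score; `roi` uses a hypothetical cost model (every predicted churner gets one action of fixed cost, and a fixed fraction of true churners is retained); the “contact everyone” baseline is one possible choice of baseline, assumed here for illustration.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def roi(tp, fp, value_retained, cost_per_action, success_rate):
    """Return on investment under a hypothetical cost model.

    Every customer predicted to churn (tp + fp) receives an action costing
    cost_per_action; a fraction success_rate of true churners is retained,
    each worth value_retained. False negatives add no cost: they leave as
    they would have without the model.
    """
    cost = (tp + fp) * cost_per_action
    gain = tp * success_rate * value_retained
    return (gain - cost) / cost if cost else 0.0


def contact_everyone_counts(n_churners, n_stayers):
    """Baseline without ML: act on every customer, so (tp, fp, fn) = (all, all, 0)."""
    return n_churners, n_stayers, 0


print(f_measure(8, 2, 2), roi(8, 2, 100, 10, 0.5))
tp, fp, fn = contact_everyone_counts(10, 90)
print(f_measure(tp, fp, fn), roi(tp, fp, 100, 10, 0.5))
```

Comparing the model's ROI with the baseline's ROI answers the business question more directly than F-measure alone.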
Machine Learning Canvas
Columns: PREDICTIONS, OBJECTIVES, DATA
• BACKGROUND: End-user, Value prop, Sources
• ENGINE SPECS: ML problem, Perf eval, Preparation
• INTEGRATION: Using pred, Learning model
BACKGROUND 1 2 3
ENGINE SPECS 4 5 6
INTEGRATION
html
End-user Value prop
Sources-> events
ML problem Perf eval Features
Using pred Learning model
DASE
Why fill in ML canvas?
• target the right problem for your company
• choose the right algorithm, infrastructure, or ML solution
• guide project management
• improve team communication
machinelearningcanvas.com
Recap
• Create value from data with ML!
• Creating and deploying models is easy(er)!
• Good data is essential!
• Use the ML canvas!
• Go to PAPIs Connect!
Some real-world insights
• Models that are easier to maintain cost less
• Need to explain predictions?
• One problem may call for another one…