Pragmatic machine learning for the real world

@louisdorard

#ParisDataGeeks

“Predictive is the ‘killer app’ for big data.”

–Waqar Hasan, Apigee Insights

“Predictive apps are the next big thing in app development.”

–Mike Gualtieri, Principal Analyst at Forrester

Machine Learning

Data

BUT

“A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning.”

–McKinsey & Co.

What the @#?~% is ML?

“How much is this house worth? — X $” -> Regression

Bedrooms  Bathrooms  Surface (foot²)  Year built  Type       Price ($)
3         1          860              1950        house      565,000
3         1          1012             1951        house
2         1.5        968              1976        townhouse  447,000
4                    1315             1950        house      648,000
3         2          1599             1964        house
3         2          987              1951        townhouse  790,000
1         1          530              2007        condo      122,000
4         2          1574             1964        house      835,000
4                                     2001        house      855,000
3         2.5        1472             2005        house
4         3.5        1714             2005        townhouse
2         2          1113             1999        condo
1                    769              1999        condo      315,000



ML is a set of AI techniques where “intelligence” is built by referring to examples
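To make "built by referring to examples" concrete, here is a minimal sketch that fits a straight line to the house rows above that list both a surface and a price, then estimates the 1012 sq ft house that has no price. Using surface alone and ordinary least squares is a simplifying assumption for illustration, not the method behind any particular prediction API; the function names are made up.

```python
# Rows from the house table that have both a surface and a price
# (surface in square feet, price in dollars).
examples = [
    (860, 565_000),
    (968, 447_000),
    (1315, 648_000),
    (987, 790_000),
    (530, 122_000),
    (1574, 835_000),
    (769, 315_000),
]


def fit_line(points):
    """Ordinary least squares for price = slope * surface + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(model, surface):
    """Apply the fitted line to a new surface."""
    slope, intercept = model
    return slope * surface + intercept


model = fit_line(examples)
estimate = predict_price(model, 1012)  # a house listed without a price
print(f"Estimated price for 1012 sq ft: ${estimate:,.0f}")
```

Seven examples and one feature give a rough estimate at best; a real model would use every column and far more rows.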

“Which type of email is this? — Spam/Ham”-> Classification

WATCH OUT!

• Need examples of inputs AND outputs

• Need enough examples
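The spam/ham question can be sketched the same way. The example below is a small naive Bayes classifier with add-one smoothing, chosen only because it fits in a few lines. The six emails are invented, and every one pairs an input (the text) with an output (the label), which is what the first bullet asks for.

```python
import math
from collections import Counter

# Hypothetical labelled examples: (input text, known output).
labelled_emails = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
    ("notes from the project meeting", "ham"),
]


def train(examples):
    """Count words per label: the 'model' is just these counts."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary


def classify(model, text):
    """Naive Bayes with add-one smoothing; returns the likelier label."""
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_examples)
        for word in text.lower().split():
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(labelled_emails)
print(classify(model, "free prize inside"))
print(classify(model, "agenda for the meeting"))
```

With only six examples the classifier knows a handful of words, which is the second bullet in practice: too few examples and most new emails look unfamiliar.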

??

Prediction APIs

HTML / CSS / JavaScript

squarespace.com


The two phases of machine learning:

• TRAIN a model

• PREDICT with a model

The two methods of prediction APIs:

• model = create_model(dataset)

• predicted_output = create_prediction(model, new_input)
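A toy, in-memory version of these two methods shows the shape of the interface. This is a sketch under assumptions: the "learning" here is just an average price per property type (figures taken from the house table), and nothing in it reflects how a real prediction API trains its models.

```python
from statistics import mean


def create_model(dataset):
    """TRAIN: learn from (input, output) pairs.

    Toy learner, not a real service: it stores the average output per
    value of the "type" field, plus the overall average as a fallback.
    """
    by_type = {}
    for inputs, output in dataset:
        by_type.setdefault(inputs["type"], []).append(output)
    return {
        "per_type": {t: mean(values) for t, values in by_type.items()},
        "default": mean(output for _, output in dataset),
    }


def create_prediction(model, new_input):
    """PREDICT: apply the trained model to an unseen input."""
    return model["per_type"].get(new_input["type"], model["default"])


dataset = [
    ({"type": "house"}, 565_000),
    ({"type": "house"}, 648_000),
    ({"type": "townhouse"}, 447_000),
    ({"type": "condo"}, 122_000),
    ({"type": "condo"}, 315_000),
]

model = create_model(dataset)
predicted_output = create_prediction(model, {"type": "condo"})
print(predicted_output)
```

The caller only ever sees a dataset going in and a prediction coming out; whatever happens inside `create_model` can be swapped without changing the calling code.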

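from bigml.api import BigML

# create a model
# (BigML() reads credentials from the BIGML_USERNAME and BIGML_API_KEY environment variables)
api = BigML()
source = api.create_source('training_data.csv')
dataset = api.create_dataset(source)
model = api.create_model(dataset)

# make a prediction
# new_input maps field names in training_data.csv to values; the fields below are placeholders
new_input = {'Bedrooms': 3, 'Bathrooms': 2}
prediction = api.create_prediction(model, new_input)
print("Predicted output value:", prediction['object']['output'])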

http://bit.ly/bigml_wakari


Beyond predictive modelling

Phrase problem as ML task

Engineer features

Prepare data (csv)

Learn model

Make predictions

Deploy model & integrate predictions

Evaluate model

Measure impact

PREDICTION APIS

• Deployment to production?

• Maintenance?

• monitor performance

• update with new data
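One possible way to act on the two maintenance bullets is to track accuracy over recent predictions once their true outcomes are known, and to flag the model for retraining on fresh data when accuracy drops. The class name, window size and threshold below are all hypothetical choices for illustration.

```python
from collections import deque


class ModelMonitor:
    """Tracks accuracy over the most recent predictions whose true
    outcome is now known, and flags when retraining looks worthwhile."""

    def __init__(self, window=100, min_accuracy=0.8):
        self.results = deque(maxlen=window)  # True means a correct prediction
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        self.results.append(predicted == actual)

    def accuracy(self):
        if not self.results:
            return None  # nothing observed yet
        return sum(self.results) / len(self.results)

    def needs_retraining(self):
        current = self.accuracy()
        return current is not None and current < self.min_accuracy


monitor = ModelMonitor(window=5, min_accuracy=0.8)
observed = [
    ("spam", "spam"),
    ("ham", "spam"),
    ("ham", "ham"),
    ("spam", "ham"),
]
for predicted, actual in observed:
    monitor.record(predicted, actual)

print(monitor.accuracy(), monitor.needs_retraining())
```

The bounded window means old results fall away, so the figure reflects how the model is doing now rather than at launch.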

• D: Data preparation

• A: Algorithm

• S: Serving

• E: Evaluation

• Open source

• Spark’s MLlib -> prediction server

• Expose model as (scalable & robust) API

• DASE framework

• Send new data/events to event server

• Send prediction queries to engine
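The last two bullets describe two separate HTTP endpoints. The sketch below only builds the requests and does not send them. It assumes a local install with the event server on port 7070 and a deployed engine on port 8000; the access key, entity names and query fields are placeholders, and the query format in particular depends on the engine template in use.

```python
import json
from urllib.request import Request

# Assumptions: local event server and engine on their usual ports;
# ACCESS_KEY is a placeholder for the key of your own app.
EVENT_SERVER = "http://localhost:7070"
ENGINE = "http://localhost:8000"
ACCESS_KEY = "YOUR_ACCESS_KEY"


def json_post(url, payload):
    """Build (but do not send) a JSON POST request."""
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def event_request(event, entity_type, entity_id, properties=None):
    """New data/events go to the event server."""
    payload = {"event": event, "entityType": entity_type, "entityId": entity_id}
    if properties:
        payload["properties"] = properties
    return json_post(f"{EVENT_SERVER}/events.json?accessKey={ACCESS_KEY}", payload)


def query_request(query):
    """Prediction queries go to the deployed engine."""
    return json_post(f"{ENGINE}/queries.json", query)


new_event = event_request("$set", "user", "u42", {"plan": "monthly"})
new_query = query_request({"user": "u42"})  # hypothetical query fields
print(new_event.full_url)
print(new_query.full_url)
# To actually send a request: urllib.request.urlopen(new_event)
```

Keeping the two paths separate mirrors the split in the bullets: data flows in continuously through one door, predictions are served through another.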

PredictionIO vs Azure ML on KDnuggets

http://www.kdnuggets.com/2015/03/predictionio-open-source-vs-microsoft-azure-machine-learning.html/

Case study: churn analysis

• Who: SaaS company selling monthly subscription

• Question asked: “is this customer going to leave within 1 month?”

• Input: customer

• Output: no-churn (negative) or churn (positive)

• Data collection: history up until 1 month ago
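The data collection bullet implies a cutoff: inputs come from what was known one month ago, and the output is what happened since. A minimal labelling sketch follows, assuming hypothetical customer records and treating "1 month" as 30 days.

```python
from datetime import date, timedelta

# Hypothetical customer records: signup date and cancellation date
# (None if the subscription is still active today).
customers = [
    {"id": "a", "signed_up": date(2014, 6, 1), "cancelled": None},
    {"id": "b", "signed_up": date(2014, 9, 15), "cancelled": date(2015, 3, 20)},
    {"id": "c", "signed_up": date(2014, 1, 10), "cancelled": date(2015, 1, 5)},
    {"id": "d", "signed_up": date(2015, 3, 25), "cancelled": None},
]


def label_customers(customers, today, horizon=timedelta(days=30)):
    """Label each customer who was active at the cutoff (today - horizon).

    Inputs may only use history up to the cutoff; the output is whether
    the customer left during the horizon that followed it.
    """
    cutoff = today - horizon
    labelled = []
    for customer in customers:
        cancelled = customer["cancelled"]
        if customer["signed_up"] > cutoff:
            continue  # not yet a customer at the cutoff
        if cancelled is not None and cancelled <= cutoff:
            continue  # already gone at the cutoff
        churned = cancelled is not None and cancelled <= today
        labelled.append((customer["id"], "churn" if churned else "no-churn"))
    return labelled


print(label_customers(customers, today=date(2015, 4, 1)))
```

Customers who had already left, or had not yet joined, at the cutoff are skipped because the question "will this customer leave within 1 month?" did not apply to them.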

Learning -> OK but

How to represent customers?

What to do after predicting churn?

Customer representation:

• basic info (age, income, etc.)

• usage of service (avg call duration, overcharges, leftover minutes/month, etc.)

• interactions with customer support (how many, topics of questions, satisfaction ratings)
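The three groups above can be flattened into one row of named features per customer. The record layout and feature names in this sketch are illustrative assumptions, not a schema from the talk.

```python
from statistics import mean

# Hypothetical raw record for one customer; field names are illustrative.
customer = {
    "age": 34,
    "income": 52_000,
    "call_durations_min": [3.5, 12.0, 7.5],
    "overcharges": 2,
    "support_tickets": [
        {"topic": "billing", "satisfaction": 2},
        {"topic": "billing", "satisfaction": 4},
        {"topic": "outage", "satisfaction": 1},
    ],
}


def to_features(record):
    """Flatten one customer record into a fixed set of named features."""
    calls = record["call_durations_min"]
    tickets = record["support_tickets"]
    return {
        # basic info
        "age": record["age"],
        "income": record["income"],
        # usage of service
        "avg_call_duration": mean(calls) if calls else 0.0,
        "overcharges": record["overcharges"],
        # interactions with customer support
        "n_tickets": len(tickets),
        "n_billing_tickets": sum(t["topic"] == "billing" for t in tickets),
        "avg_satisfaction": (
            mean(t["satisfaction"] for t in tickets) if tickets else None
        ),
    }


features = to_features(customer)
print(features)
```

Each key becomes a column in the CSV handed to the learning step; a customer with no tickets gets a missing satisfaction value rather than a made-up one.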

Taking action to prevent churn:

• contact customer

• switch to different plan

• fix issues

• give special offer

Measuring performance:

• #TP, #FP, #FN

• F-measure?

• ROI

• Compare to baseline
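These four bullets can be tied together in a few lines. The sketch below counts TP, FP and FN, computes the F-measure (F1), and compares a model against a naive baseline using a toy ROI formula. The labels, costs and success rate are invented, and the ROI definition is one simple assumption among many possible ones.

```python
def confusion_counts(actual, predicted, positive="churn"):
    """Return (#TP, #FP, #FN) for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, fp, fn


def f_measure(tp, fp, fn):
    """F1: harmonic mean of precision and recall (0.0 if undefined)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def campaign_roi(tp, fp, value_saved, cost_per_contact, success_rate):
    """Toy ROI: every predicted churner is contacted and a fraction of
    the true churners is retained. All money figures are hypothetical."""
    cost = (tp + fp) * cost_per_contact
    gain = tp * success_rate * value_saved
    return (gain - cost) / cost if cost else 0.0


actual = ["churn", "churn", "no-churn", "no-churn", "churn", "no-churn"]
model_out = ["churn", "no-churn", "no-churn", "churn", "churn", "no-churn"]
baseline = ["churn"] * len(actual)  # naive baseline: flag everyone

for name, predicted in [("model", model_out), ("baseline", baseline)]:
    tp, fp, fn = confusion_counts(actual, predicted)
    roi = campaign_roi(tp, fp, value_saved=300, cost_per_contact=20,
                       success_rate=0.5)
    print(name, (tp, fp, fn), round(f_measure(tp, fp, fn), 3), round(roi, 2))
```

On this toy data the model and the flag-everyone baseline reach the same F-measure (2/3), yet their ROI differs (4.0 versus 2.75 under these made-up costs), which is one reason to look at a business metric alongside the F-measure.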

Machine Learning Canvas

                PREDICTIONS      OBJECTIVES       DATA
BACKGROUND      1. End-user      2. Value prop    3. Sources
ENGINE SPECS    4. ML problem    5. Perf eval     6. Preparation
INTEGRATION     Using pred       Learning model

example: churn

https://github.com/louisdorard/machinelearningcanvas/blob/master/churn.pdf

End-user Value prop

Sources -> events

ML problem Perf eval Features

Using pred Learning model

DASE

Why fill in ML canvas?

• target the right problem for your company

• choose right algorithm, infrastructure, or ML solution

• guide project management

• improve team communication

machinelearningcanvas.com

http://louisdorard.com

Recap

• Create value from data with ML!

• Creating and deploying models is easy(er)!

• Good data is essential!

• Use the ML canvas!

• Go to PAPIs Connect!

http://www.papis.io/connect

Some real-world insights

• Models that are easier to maintain cost less

• Need to explain predictions?

• One problem may call for another one…

papis.io/connect

Discount code: DATAGEEKS

