Everyone can do data science import.io webinar 23/9/14 Louis Dorard (@louisdorard)

Everyone can do data science — import.io webinar

Nov 28, 2014


Data & Analytics

Louis Dorard

Everyone can do data science with the help of tools such as:
- import.io for visually scraping data from the web
- Pandas to wrangle data in Python
- BigML to apply machine learning to data.

In this presentation, I introduce what machine learning is before moving on to a case study where I show how to build a real estate pricing model. Check out import.io's webinar for the whole thing: http://blog.import.io/post/become-a-data-scientist-in-an-hour
Transcript
Page 1: Everyone can do data science — import.io webinar

Everyone can do data science

import.io webinar 23/9/14

Louis Dorard (@louisdorard)

Page 4: Everyone can do data science — import.io webinar

US real estate portals:
- Realtor
- Zillow
- Trulia
- …

Page 5: Everyone can do data science — import.io webinar

Bedrooms  Bathrooms  Surface (ft²)  Year built  Type       Price ($)
3         1          860            1950        house      565,000
3         1          1012           1951        house
2         1.5        968            1976        townhouse  447,000
4                    1315           1950        house      648,000
3         2          1599           1964        house
3         2          987            1951        townhouse  790,000
1         1          530            2007        condo      122,000
4         2          1574           1964        house      835,000
4                                   2001        house      855,000
3         2.5        1472           2005        house
4         3.5        1714           2005        townhouse
2         2          1113           1999        condo
1                    769            1999        condo      315,000

Page 7: Everyone can do data science — import.io webinar

Let’s create a real estate pricing model

Page 8: Everyone can do data science — import.io webinar

Fabien Durand (@thefabiendurand)

www.louisdorard.com/guest/everyone-can-do-data-science-importio

Page 10: Everyone can do data science — import.io webinar

Data Science:
- domain knowledge
- hacking abilities
- machine learning

Page 11: Everyone can do data science — import.io webinar

What the @#?~% is ML?

Page 13: Everyone can do data science — import.io webinar

“Which type of email is this? - Spam/Ham” -> Classification

Page 15: Everyone can do data science — import.io webinar

“How much is this house worth? — X $” -> Regression

Page 16: Everyone can do data science — import.io webinar

Bedrooms  Bathrooms  Surface (ft²)  Year built  Type       Price ($)
3         1          860            1950        house      565,000
3         1          1012           1951        house
2         1.5        968            1976        townhouse  447,000
4                    1315           1950        house      648,000
3         2          1599           1964        house
3         2          987            1951        townhouse  790,000
1         1          530            2007        condo      122,000
4         2          1574           1964        house      835,000
4                                   2001        house      855,000
3         2.5        1472           2005        house
4         3.5        1714           2005        townhouse
2         2          1113           1999        condo
1                    769            1999        condo      315,000
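
Some rows in the table have a price and some do not: the former can serve as examples to learn from, the latter are the houses whose price we want to predict. A minimal sketch of separating the two groups, in plain Python (Pandas would do the same in fewer lines). It assumes the data has been exported as CSV; the column names and the `split_by_known_price` helper are hypothetical, and only rows with every feature present are included.

```python
import csv
import io

# A subset of the rows from the table (assumed CSV export, hypothetical column names).
# An empty price means the price is unknown and has to be predicted.
CSV_TEXT = """bedrooms,bathrooms,surface,year_built,type,price
3,1,860,1950,house,565000
3,1,1012,1951,house,
2,1.5,968,1976,townhouse,447000
3,2,1599,1964,house,
3,2,987,1951,townhouse,790000
1,1,530,2007,condo,122000
4,2,1574,1964,house,835000
2,2,1113,1999,condo,
"""


def split_by_known_price(csv_text):
    """Return (labelled, unlabelled) lists of row dicts."""
    labelled, unlabelled = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["price"]:
            # Known price: usable as a training example.
            row["price"] = int(row["price"])
            labelled.append(row)
        else:
            # Unknown price: a house we would like a prediction for.
            unlabelled.append(row)
    return labelled, unlabelled


labelled, unlabelled = split_by_known_price(CSV_TEXT)
print(len(labelled), "rows with a price,", len(unlabelled), "rows to predict")
```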

Page 18: Everyone can do data science — import.io webinar

ML is a set of AI techniques where “intelligence” is built by referring to examples
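
To make "referring to examples" concrete, here is a minimal sketch of one of the simplest such techniques, nearest-neighbour prediction: the price of a new house is taken from the most similar known house. This is an illustration only, not the method used later in the webinar; the `predict_price` helper is hypothetical and the examples are rows from the real estate table.

```python
# Known examples: (bedrooms, bathrooms, surface in ft², year built) -> price in $
EXAMPLES = [
    ((3, 1, 860, 1950), 565000),
    ((2, 1.5, 968, 1976), 447000),
    ((3, 2, 987, 1951), 790000),
    ((1, 1, 530, 2007), 122000),
    ((4, 2, 1574, 1964), 835000),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_price(features, examples=EXAMPLES):
    """Return the price of the known example closest to `features`."""
    _, closest_price = min(
        examples, key=lambda example: squared_distance(example[0], features)
    )
    return closest_price


# A house from the table whose price is missing.
print(predict_price((3, 2, 1599, 1964)))
```

Features are compared on their raw scale here, so surface dominates the distance; a real model would normalise the features or learn how much each one matters.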

Page 20: Everyone can do data science — import.io webinar

??

Page 21: Everyone can do data science — import.io webinar

(McKinsey & Co.)

“A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning.”

Page 22: Everyone can do data science — import.io webinar

(Bret Victor)

Making ML effortless

Page 24: Everyone can do data science — import.io webinar

HTML / CSS / JavaScript

Page 26: Everyone can do data science — import.io webinar

squarespace.com

Page 29: Everyone can do data science — import.io webinar

The two phases of machine learning:

• TRAIN a model

• PREDICT with a model

Page 30: Everyone can do data science — import.io webinar

The two methods of prediction APIs:

• TRAIN a model

• PREDICT with a model

Page 31: Everyone can do data science — import.io webinar

The two methods of prediction APIs:

• model = create_model(dataset)

• predicted_output = create_prediction(model, new_input)
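
The same two-method shape can be sketched locally with a toy model. The version below is an assumption-laden illustration, not how any real prediction API works internally: `create_model` fits a straight line (price as a function of surface only) by least squares, and `create_prediction` applies it. The (surface, price) pairs come from the real estate table.

```python
def create_model(dataset):
    """TRAIN: fit price = slope * surface + intercept by least squares.

    `dataset` is a list of (surface, price) pairs.
    """
    n = len(dataset)
    mean_x = sum(x for x, _ in dataset) / n
    mean_y = sum(y for _, y in dataset) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in dataset)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in dataset)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def create_prediction(model, new_input):
    """PREDICT: apply the fitted line to a new surface value."""
    return model["slope"] * new_input + model["intercept"]


# (surface in ft², price in $) for houses with a known price
dataset = [(860, 565000), (968, 447000), (987, 790000), (530, 122000), (1574, 835000)]
model = create_model(dataset)
predicted_output = create_prediction(model, 1012)
print(round(predicted_output))
```

Training happens once and can be slow; prediction reuses the stored model and is cheap, which is why the two steps are exposed as separate calls.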

Page 32: Everyone can do data science — import.io webinar

    from bigml.api import BigML

    # create a model
    # BigML() reads credentials from BIGML_USERNAME and BIGML_API_KEY
    api = BigML()
    source = api.create_source('training_data.csv')
    dataset = api.create_dataset(source)
    model = api.create_model(dataset)

    # make a prediction
    # new_input: dict mapping field names to values for the new example
    prediction = api.create_prediction(model, new_input)
    print("Predicted output value:", prediction['object']['output'])

http://bit.ly/bigml_wakari

Page 34: Everyone can do data science — import.io webinar

Recap

Page 35: Everyone can do data science — import.io webinar

• Classification and regression

• 2 phases in ML: train and predict

• Prediction APIs make it easy to build models

• Let’s use them on real estate data to predict price from house characteristics

Page 36: Everyone can do data science — import.io webinar

• Encoding domain knowledge

• Making our life easier: restricting data to only 1 city

Page 38: Everyone can do data science — import.io webinar

BigML

• Look at data

• Split into training and test

• Build model from training

• Evaluate model on test

• Errors: mean absolute error (or percentage?)
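
BigML performs these steps through its interface; the sketch below only illustrates the idea of splitting, training, and measuring error in plain Python. It is a hedged example: the helper names are hypothetical, the "model" is a deliberately naive baseline (the mean training price), and the prices are the known ones from the real estate table.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def mean_absolute_error(actual, predicted):
    """Average absolute difference, in the same unit as the target ($)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute difference relative to the actual value, in %."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Known prices from the table
prices = [565000, 447000, 648000, 790000, 122000, 835000, 855000, 315000]
train, test = train_test_split(prices, test_fraction=0.25)

# Naive baseline "model": always predict the mean training price.
baseline = sum(train) / len(train)
predictions = [baseline] * len(test)

print("MAE ($):", round(mean_absolute_error(test, predictions)))
print("MAPE (%):", round(mean_absolute_percentage_error(test, predictions), 1))
```

The absolute error is easy to read in dollars, while the percentage error is comparable across cheap and expensive houses; which one to report depends on how the predictions will be used.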

Page 40: Everyone can do data science — import.io webinar

Other import.io + BigML use cases:
- Predict ebook rating from description
- Predict sales of Etsy stores

Page 41: Everyone can do data science — import.io webinar

Talk at #APIconUK tomorrow in London

Page 42: Everyone can do data science — import.io webinar

ML Algorithm API

Automated Pred. API

Text Classification API

Vertical Pred. API

Fixed-model Pred. API

ABSTRACTION

Page 45: Everyone can do data science — import.io webinar

www.louisdorard.com/machine-learning-book 50% off for 24 hours with code “importio”

@louisdorard
