Deep learning made doubly easy with reusable deep features Carlos Guestrin Dato, CEO University of Washington, Amazon Prof. of ML
Strata London - Deep Learning 05-2015

Apr 10, 2017

Data & Analytics

Turi, Inc.
Transcript
Page 1: Strata London - Deep Learning 05-2015

Deep learning made doubly easy with reusable deep features

Carlos Guestrin
Dato, CEO
University of Washington, Amazon Prof. of ML

Page 2: Strata London - Deep Learning 05-2015

Successful apps in 2015 must be intelligent

Machine learning key to next-gen apps

• Recommenders
• Fraud detection
• Ad targeting
• Financial models
• Personalized medicine
• Churn prediction
• Smart UX (video & text)
• Personal assistants
• IoT
• Social nets
• …

Last decade: Data management
Last 5 years: Traditional analytics
Now: Intelligent apps

Page 3: Strata London - Deep Learning 05-2015

The ML pipeline circa 2013

DATA

ML Algorithm

My curve is better than your curve

Write a paper

Page 4: Strata London - Deep Learning 05-2015

2015: Production ML pipeline

DATA

Data cleaning & feature eng

ML Algorithm

Offline eval & Parameter search

Deploy model

Your Web Service or Intelligent App

Stages: Data engineering, Data intelligence, Deployment

Using deep learning

Goal: Platform to help implement, manage, optimize entire pipeline

Page 5: Strata London - Deep Learning 05-2015

Today’s talk

• Features in ML
• Neural networks
• Deep learning for computer vision
• Deep learning made easy with deep features
• Applications to text data
• Deployment in production

Page 6: Strata London - Deep Learning 05-2015

Features are key to machine learning

Page 7: Strata London - Deep Learning 05-2015

Simple example: Spam filtering
• A user walks into an email…
- Will she think it’s spam?
• What’s the probability email is spam?

Input: x
- Text of email
- User info
- Source info

MODEL

Output: Probability of y (Yes! / No)

Page 8: Strata London - Deep Learning 05-2015

Feature engineering: the painful black art of transforming raw inputs into useful inputs for an ML algorithm
• E.g., important words, stemming text, complex transformation of inputs,…

Input: x
- Text of email
- User info
- Source info

Feature extraction

Features: Φ(x)

MODEL

Output: Probability of y (Yes! / No)

Page 9: Strata London - Deep Learning 05-2015

Neural networks

Learning *very* non-linear features

Page 10: Strata London - Deep Learning 05-2015

Linear classifiers
• Most common classifier
- Logistic regression
- SVMs
- …
• Decision corresponds to hyperplane:
- Line in high dimensional space

Decision boundary: w0 + w1 x1 + w2 x2 = 0
One side: w0 + w1 x1 + w2 x2 > 0
Other side: w0 + w1 x1 + w2 x2 < 0

Page 11: Strata London - Deep Learning 05-2015

Graph representation of classifier: useful for defining neural networks

Input: 1, x1, x2, …, xd
Weights: w0, w1, w2, …, wd
Output: y

w0 + w1 x1 + w2 x2 + … + wd xd
> 0, output 1
< 0, output 0

Page 12: Strata London - Deep Learning 05-2015

What can a linear classifier represent?

x1 OR x2: inputs 1, x1, x2 with weights -0.5, 1, 1, output y

x1 AND x2: inputs 1, x1, x2 with weights -1.5, 1, 1, output y
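
The OR and AND weights on this slide can be checked with a short sketch. It assumes the first number in each group is the weight on the constant input 1 and that the unit outputs 1 when the weighted sum is above 0, as on the previous slide. The function names are hypothetical.

```python
def threshold_unit(bias, weights, inputs):
    """Output 1 if bias + sum(w_i * x_i) > 0, else 0."""
    score = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if score > 0 else 0


def logical_or(x1, x2):
    # Weights from the slide: -0.5 on the constant input, 1 on x1, 1 on x2.
    return threshold_unit(-0.5, [1, 1], [x1, x2])


def logical_and(x1, x2):
    # Weights from the slide: -1.5 on the constant input, 1 on x1, 1 on x2.
    return threshold_unit(-1.5, [1, 1], [x1, x2])


# Print the full truth table: x1, x2, OR, AND.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, logical_or(x1, x2), logical_and(x1, x2))
```

Only the bias differs between the two units: a lower bias means more inputs must be on before the sum crosses 0.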

Page 13: Strata London - Deep Learning 05-2015

What can’t a simple linear classifier represent?

XOR: the counterexample to everything

Need non-linear features

Page 14: Strata London - Deep Learning 05-2015

Solving the XOR problem: Adding a layer

XOR = (x1 AND NOT x2) OR (NOT x1 AND x2)

z1: inputs 1, x1, x2 with weights -0.5, 1, -1
z2: inputs 1, x1, x2 with weights -0.5, -1, 1
y: inputs 1, z1, z2 with weights -0.5, 1, 1

Thresholded to 0 or 1
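
A minimal sketch of this two-layer network, assuming z1 computes "x1 AND NOT x2", z2 computes "NOT x1 AND x2", and each unit is thresholded at 0. The assignment of weights to units is read off the slide layout and is an assumption.

```python
def step(score):
    """Threshold to 0 or 1."""
    return 1 if score > 0 else 0


def xor_network(x1, x2):
    # Hidden layer: two linear units followed by a threshold.
    z1 = step(-0.5 + 1 * x1 - 1 * x2)  # x1 AND NOT x2
    z2 = step(-0.5 - 1 * x1 + 1 * x2)  # NOT x1 AND x2
    # Output layer: z1 OR z2.
    return step(-0.5 + 1 * z1 + 1 * z2)


# Print the truth table: x1, x2, XOR.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_network(x1, x2))
```

The hidden units z1 and z2 are the non-linear features: the output unit is a plain linear classifier on top of them.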

Page 15: Strata London - Deep Learning 05-2015

A neural network
• Layers and layers and layers of linear models and non-linear transformations
• Around for about 50 years
- Fell in “disfavor” in 90s
• In last few years, big resurgence
- Impressive accuracy on several benchmark problems
- Powered by huge datasets, GPUs, & modeling/learning alg improvements

[Diagram: network with inputs x1, x2, hidden units z1, z2, output y]

Page 16: Strata London - Deep Learning 05-2015

Applications to computer vision (or the deep devil is in the deep details)

Page 17: Strata London - Deep Learning 05-2015

Image features
• Features = local detectors
- Combined to make prediction
- (in reality, features are more low-level)

Detectors: Eye, Eye, Nose, Mouth

Prediction: Face!

Page 18: Strata London - Deep Learning 05-2015

Many hand-created features exist…

Page 19: Strata London - Deep Learning 05-2015

Standard image classification approach

Input → Extract features → Use simple classifier (e.g., logistic regression, SVMs) → Car?

Page 20: Strata London - Deep Learning 05-2015

Many hand-created features exist…

… but very painful to design

Page 21: Strata London - Deep Learning 05-2015

Use neural network to learn features

Each layer learns features, at different levels of abstraction

Page 22: Strata London - Deep Learning 05-2015

Many tricks needed to work well…
• Different types of layers, connections,… needed for high accuracy

Krizhevsky et al. ’12

Page 23: Strata London - Deep Learning 05-2015

Sample performance results

Page 24: Strata London - Deep Learning 05-2015

Sample results
• Traffic sign recognition (GTSRB)
- 99.2% accuracy
• House number recognition (Google)
- 94.3% accuracy

Page 25: Strata London - Deep Learning 05-2015

Krizhevsky et al. ’12: 60M parameters, won 2012 ImageNet competition

Page 26: Strata London - Deep Learning 05-2015

ImageNet 2012 competition: 1.5M images, 1000 categories

Page 27: Strata London - Deep Learning 05-2015

Page 28: Strata London - Deep Learning 05-2015

Application to scene parsing

Page 29: Strata London - Deep Learning 05-2015

Challenges of deep learning

Page 30: Strata London - Deep Learning 05-2015

Deep learning score card

Pros
• Enables learning of features rather than hand tuning
• Impressive performance gains on
- Computer vision
- Speech recognition
- Some text analysis
• Potential for much more impact

Cons

Page 31: Strata London - Deep Learning 05-2015

Deep learning workflow

Lots of labeled data

Training set (80%) / Validation set (20%)

Learn deep neural net model

Validate

Adjust hyper-parameters, model architecture,…
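
The 80% / 20% split in this workflow can be sketched in plain Python. The function name, the seed, and the toy labeled data are hypothetical; shuffling before the cut is an assumption, since the slide only gives the proportions.

```python
import random


def train_validation_split(examples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the examples, then cut it into training and validation sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labeled data: (example id, label) pairs.
labeled_data = [(i, i % 2) for i in range(100)]
train_set, validation_set = train_validation_split(labeled_data)
print(len(train_set), len(validation_set))
```

The model is fit on the training set only. The validation set is held back to compare hyper-parameter and architecture choices.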

Page 32: Strata London - Deep Learning 05-2015

Deep learning score card

Pros
• Enables learning of features rather than hand tuning
• Impressive performance gains on
- Computer vision
- Speech recognition
- Some text analysis
• Potential for much more impact

Cons
• Computationally really expensive
• Requires a lot of data for high accuracy
• Extremely hard to tune
- Choice of architecture
- Parameter types
- Hyperparameters
- Learning algorithm
- …
• Computational + so many choices = incredibly hard to tune

Page 33: Strata London - Deep Learning 05-2015

Deep features: Deep learning + Transfer learning

Page 34: Strata London - Deep Learning 05-2015

Change image classification approach?

Input → Extract features → Use simple classifier (e.g., logistic regression, SVMs) → Car?

Can we learn features from data, even when we don’t have data or time?

Page 35: Strata London - Deep Learning 05-2015

Transfer learning: Use data from one domain to help learn on another

Lots of data: Learn neural net → Great accuracy on cat vs. dog

Some data: Neural net as feature extractor + Simple classifier → Great accuracy on 101 categories

Old idea, explored for deep learning by Donahue et al. ’14

Page 36: Strata London - Deep Learning 05-2015

What’s learned in a neural net

Neural net trained for Task 1: cat vs. dog
• Very specific to Task 1: Should be ignored for other tasks
• More generic: Can be used as feature extractor

Page 37: Strata London - Deep Learning 05-2015

Transfer learning in more detail…

Neural net trained for Task 1: cat vs. dog
• Very specific to Task 1: Should be ignored for other tasks
• More generic: Can be used as feature extractor
- Keep weights fixed!

For Task 2, predicting 101 categories, learn only end part
• Use simple classifier, e.g., logistic regression, SVMs
• Output: Class?
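
A toy sketch of the idea on this slide: keep the feature-extractor weights fixed and learn only a simple classifier on top. Everything here is a stand-in, not the talk's code or any library's API. The "pretrained" layer is two hand-set threshold units (no real Task 1 training happens), Task 2 is a hypothetical four-point problem, and the names `FROZEN_HIDDEN_WEIGHTS`, `extract_features`, and `train_logistic_regression` are made up for illustration.

```python
import math

# Stand-in for the generic layers of a net trained on Task 1.
# Assumption: these weights are set by hand and are never updated below.
FROZEN_HIDDEN_WEIGHTS = ((-0.5, 1.0, -1.0), (-0.5, -1.0, 1.0))


def extract_features(x1, x2):
    """Frozen feature extractor: thresholded linear units."""
    return [1.0 if b + w1 * x1 + w2 * x2 > 0 else 0.0
            for b, w1, w2 in FROZEN_HIDDEN_WEIGHTS]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=500):
    """Batch gradient descent on the logistic loss; only these weights are learned."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for phi, y in zip(features, labels):
            error = sigmoid(bias + sum(w * f for w, f in zip(weights, phi))) - y
            grad_b += error / n
            for j, f in enumerate(phi):
                grad_w[j] += error * f / n
        bias -= learning_rate * grad_b
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    return bias, weights


def predict(bias, weights, phi):
    return 1 if bias + sum(w * f for w, f in zip(weights, phi)) > 0 else 0


# Hypothetical Task 2: label is 1 when the two inputs agree.
task2_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
task2_labels = [1, 0, 0, 1]

task2_features = [extract_features(x1, x2) for x1, x2 in task2_inputs]
bias, weights = train_logistic_regression(task2_features, task2_labels)
predictions = [predict(bias, weights, phi) for phi in task2_features]
print(predictions)
```

Task 2 here is not linearly separable in the raw inputs, so the simple classifier only works because of the reused features.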

Page 38: Strata London - Deep Learning 05-2015

Careful where you cut…

Last few layers tend to be too specific
• Too specific for car detection
• Use these!

Page 39: Strata London - Deep Learning 05-2015

Transfer learning with deep features

Some labeled data

Extract features with neural net trained on different task

Training set (80%) / Validation set (20%)

Learn simple model

Validate

Deploy in production

Page 40: Strata London - Deep Learning 05-2015

How general are deep features?

Page 41: Strata London - Deep Learning 05-2015

Applications to text data

Page 42: Strata London - Deep Learning 05-2015

Simple text classification with bag of words

Word counts (one “feature” per word):
aardvark 0
about 2
all 2
Africa 1
apple 0
anxious 0
...
gas 1
...
oil 1
…
Zaire 0

Use simple classifier, e.g., logistic regression, SVMs

Class?
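
A minimal bag-of-words sketch, assuming lowercase tokens and a small fixed vocabulary. The vocabulary, the tokenizing rule, and the sample sentence are hypothetical. The sentence is chosen so the counts match the ones shown on the slide.

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text (one feature per word)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary and document.
vocabulary = ["about", "africa", "all", "apple", "gas", "oil"]
document = "All about oil and gas in Africa: all about exports."
print(bag_of_words(document, vocabulary))
```

The resulting vector is what gets passed to the simple classifier. Word order is discarded.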

Page 43: Strata London - Deep Learning 05-2015

Word2Vec: Neural network for finding high dimensional representation per word (Mikolov et al. ’13)

Skip-gram Model: From a word, predict nearby words in sentence

Example sentence: Awesome deep learning talk at Strata

Neural net: each word gets a 300 dim representation

Viewed as deep features
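
The skip-gram setup can be sketched as generating (word, nearby word) training pairs. The window size of 2 is an assumption, and this covers only pair generation, not the neural net that learns the 300 dim vectors. The sentence uses the example words from the slide.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word and its neighbors within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


sentence = "awesome deep learning talk at strata".split()
pairs = skipgram_pairs(sentence, window=2)

# Nearby words the model would be asked to predict from "deep".
print([context for center, context in pairs if center == "deep"])
```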

Page 44: Strata London - Deep Learning 05-2015

Related words placed nearby in high dim space

Projecting 300 dim space into 2 dim with PCA (Mikolov et al. ’13)

Page 45: Strata London - Deep Learning 05-2015

Text classification with word embeddings

Word counts:
aardvark 0
about 2
all 2
Africa 1
apple 0
anxious 0
...
gas 1
...
oil 1
…
Zaire 0

Embed each word into 300 dim space

Classifier: e.g., logistic regression, SVMs with 300 x number_of_words parameters

Class?

Page 46: Strata London - Deep Learning 05-2015

Practical example

Page 47: Strata London - Deep Learning 05-2015

Blog corpus

Closest words to LOL in 300 dim: Haha, Yea, Hahaha, Hahah, Lisxc, Umm, Hehe, laughingoutloud

Predicts gender of author with 79% accuracy

Page 48: Strata London - Deep Learning 05-2015

Deploying ML in production

Page 49: Strata London - Deep Learning 05-2015

DATA

ML Algorithm

Deployment?

• Write spec, other team implements in ‘production’ language
o 6-12 months
o Stale/irrelevant model/approach
o 2 teams maintaining 2 systems

Roles: Data Scientist; Data Engineers, Data Architects, DevOps, App Developers

Components: Custom Model, App, API

Page 50: Strata London - Deep Learning 05-2015

ML deployment requirements

• Easy to integrate: REST API
• Scalable
• Fault tolerant
• Flexible: Any model, any Python

[Diagram: App and API in front of a load balancer (LB) with replicated API + CACHE nodes serving GLC models, Dato models, and Python]

Page 51: Strata London - Deep Learning 05-2015

Do-It-Yourself
• Web Service layer:
- Tornado, Flask, Keen, Django, …
• Caching layer:
- Redis, Cassandra, Memcached, DynamoDb, MySQL, …
• Logs:
- Logback, LogStash, Splunk, Loggly, …
• Metrics:
- AWS CloudWatch, Mixpanel, Librato, …

[Diagram: App in front of a load balancer (LB) with replicated API + CACHE nodes serving GLC models and Python]

Page 52: Strata London - Deep Learning 05-2015

… or use Dato Predictive Services

Your Web Service or Intelligent App

Dato Predictive Services
• Caching Layer
• Predictive Object Server

ML Model

Better ML Model

Serves predictions in a robust, scalable, incremental fashion

Serve any model: GraphLab Create, scikit-learn, Python, …

Page 53: Strata London - Deep Learning 05-2015

Dato Platform

Clean, Learn, Deploy

• Out-of-core computation
• Tools for feature engineering
• Rich data type support

• Models built for scale
• App-oriented toolkits
• Advanced ML & Extensible

• Deploy models as low-latency REST services
• Same code for distributed computation
• Elastically scale up or out with one command
• Job monitoring & model management
• Deploy existing Python code & models
• Run on AWS EC2 or Hadoop Yarn

Products: GraphLab Create, Dato Distributed, Dato Predictive Services

Components: SGraph, SFrame, Create Engine, Canvas, Machine Learning Toolkits, SDK, Predictive Engine, REST Client, Direct, Model Mgmt, Distributed Engine, Direct, Job Client, Job Mgmt

Page 54: Strata London - Deep Learning 05-2015

Summary

Page 55: Strata London - Deep Learning 05-2015

Deep learning made easy with deep features

Deep learning: exciting ML development
• Slow, lots of tuning, needs lots of data

Deep features: reuse deep models for new domains
• Needs less data
• Faster training times
• Much simpler tuning

Can still achieve excellent performance