MACHINE LEARNING PIPELINES Evan R. Sparks Graduate Student, AMPLab With: Shivaram Venkataraman, Tomer Kaftan, Gylfi Gudmundsson, Michael Franklin, Benjamin Recht, and others!
Page 1: Machine Learning Pipelines

MACHINE LEARNING PIPELINES

Evan R. Sparks, Graduate Student, AMPLab

With: Shivaram Venkataraman, Tomer Kaftan, Gylfi Gudmundsson, Michael Franklin, Benjamin Recht, and others!

Page 2: Machine Learning Pipelines

WHAT IS MACHINE LEARNING?

Page 3: Machine Learning Pipelines

“Machine learning is a scientific discipline that deals with the construction and study of algorithms that can learn from data. Such algorithms operate by building a model based on inputs and using that to make predictions or decisions, rather than following only explicitly programmed instructions.”

–Wikipedia

Model

Data

Page 4: Machine Learning Pipelines

ML PROBLEMS

• Real data is often not in R^d.

• Real data is often not well-behaved as far as my algorithm is concerned.

• Features need to be engineered.

• Transformations need to be applied.

• Hyperparameters need to be tuned.

SVM Input:

Real Data:

Page 5: Machine Learning Pipelines

SYSTEMS PROBLEMS

• Datasets are huge.

• Distributed computing is hard.

• Mapping common ML techniques to a distributed setting may be untenable.

Page 6: Machine Learning Pipelines

WHAT IS MLBASE?

• Distributed Machine Learning - Made Easy!

• Spark-based platform to simplify the development and usage of large scale machine learning.

Page 7: Machine Learning Pipelines

A STANDARD MACHINE LEARNING PIPELINE

Right?

[Diagram: Data → Train Classifier → Model]

Page 8: Machine Learning Pipelines

A STANDARD MACHINE LEARNING PIPELINE

That’s more like it!

[Diagram: Data → Feature Extraction → Train Linear Classifier → Model; Test Data is fed to the Model to produce Predictions.]

Page 9: Machine Learning Pipelines

A REAL PIPELINE FOR IMAGE CLASSIFICATION

Inspired by Coates & Ng, 2012

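[Diagram: pipeline nodes. Training side: Data, ImageParser, Normalizer, PatchExtractor, Patch Whitener, Patch Selector, Convolver, SymmetricRectifier, Pooler, LabelExtractor, Linear Solver, Model. Test side: Test Data, Feature Extractor, LabelExtractor, ErrorComputer, Test Error. A “Feature Extractor” label marks the featurization stages.]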

Page 10: Machine Learning Pipelines

A SIMPLE EXAMPLE

• Load up some images.

• Featurize.

• Apply a transformation.

• Fit a linear model.

• Evaluate on test data.

Replicates the Fast Food Features pipeline - Le et al., 2012

Page 11: Machine Learning Pipelines

PIPELINES API

• A pipeline is made of nodes which have an expected input and output type.

• Nodes fit together in a sensible way.

• Pipelines are just nodes.

• Nodes should be things that we know how to scale.
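The points above can be sketched in a few lines. This is only an illustration of the idea, not the actual Pipelines API: the `Node` class, its `then` method, and the toy `tokenize` and `count` nodes are all hypothetical names invented for this example.

```python
from typing import Callable, Generic, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


class Node(Generic[A, B]):
    """Hypothetical pipeline stage with input type A and output type B."""

    def __init__(self, fn: Callable[[A], B]) -> None:
        self.fn = fn

    def __call__(self, x: A) -> B:
        return self.fn(x)

    def then(self, nxt: "Node[B, C]") -> "Node[A, C]":
        # Chaining two nodes yields another node,
        # which is why a pipeline is itself just a node.
        return Node(lambda x: nxt(self(x)))


# Toy nodes (assumptions for illustration): str -> list -> int.
tokenize: Node[str, list] = Node(lambda s: s.split())
count: Node[list, int] = Node(len)

word_count = tokenize.then(count)  # behaves like Node[str, int]
print(word_count("pipelines are just nodes"))  # 4
```

Because `then` returns a `Node`, a whole pipeline can be dropped into a larger one wherever a single node is expected.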

Page 12: Machine Learning Pipelines

WHAT’S IN THE TOOLBOX?

Nodes

• Images - Patches, Gabor Filters, HoG, Contrast Normalization

• Text - n-grams, lemmatization, TF-IDF, POS, NER

• General Purpose - ZCA Whitening, FFT, Scaling, Random Signs, Linear Rectifier, Windowing, Pooling, Sampling, QR Decomposition

• Statistics - Borda Voting, Linear Mapping, Matrix Multiply

• ML - Linear Solvers, TSQR, Cholesky Solver, MLlib

• Speech and more - coming soon!

Pipelines

• Example pipelines across domains - CIFAR, MNIST, ImageNet, ACL Argument Extraction, TIMIT.

[Diagram: MLI software stack. Layers as listed on the slide: Spark; MLlib, GraphX, ml-matrix, Featurizers, Stats, Utils; Pipelines; Hyper Parameter Tuning Libraries (Stay Tuned!).]

Page 13: Machine Learning Pipelines

A REAL PIPELINE FOR IMAGE CLASSIFICATION

Inspired by Coates & Ng, 2012

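[Diagram: pipeline nodes. Training side: Data, ImageParser, Normalizer, PatchExtractor, Patch Whitener, Patch Selector, Convolver, SymmetricRectifier, Pooler, LabelExtractor, Linear Solver, Model. Test side: Test Data, Feature Extractor, LabelExtractor, ErrorComputer, Test Error. A “Feature Extractor” label marks the featurization stages.]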

YOU’RE GOING TO BUILD THIS!!

Page 14: Machine Learning Pipelines

BEAR WITH ME

Photo: Andy Rouse, (c) Smithsonian Institute

Page 15: Machine Learning Pipelines

COMPUTER VISION CRASH COURSE

Page 16: Machine Learning Pipelines

SVM Model

Page 17: Machine Learning Pipelines

FEATURE EXTRACTION

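[Diagram: the training portion of the image-classification pipeline. Nodes: Data, ImageParser, Normalizer, PatchExtractor, Patch Whitener, Patch Selector, Convolver, SymmetricRectifier, Pooler, LabelExtractor, Linear Solver, Model. A “Feature Extractor” label marks the featurization stages.]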

Page 18: Machine Learning Pipelines

FEATURE EXTRACTION

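[Diagram: the training portion of the image-classification pipeline. Nodes: Data, ImageParser, Normalizer, PatchExtractor, Patch Whitener, Patch Selector, Convolver, SymmetricRectifier, Pooler, LabelExtractor, Linear Solver, Model. A “Feature Extractor” label marks the featurization stages.]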

Page 19: Machine Learning Pipelines

NORMALIZATION

• Moves pixels from [0, 255] to [-1.0, 1.0].

• Why? Math!

• -1*-1 = 1, 1*1 =1

• If I overlay two pixels and both are strongly bright or both strongly dark, their product will be close to 1; if one is bright and the other dark, it will be close to -1; if either is near mid-gray, it will be close to 0.

• Necessary for whitening.
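As a minimal sketch of the normalization step, assuming a simple linear rescaling (the slide does not give the exact formula, and `normalize` is a name chosen here for illustration):

```python
def normalize(pixel: int) -> float:
    """Map an 8-bit pixel value in [0, 255] linearly onto [-1.0, 1.0]."""
    return pixel / 127.5 - 1.0


print(normalize(0), normalize(255))  # -1.0 1.0

# Products of overlaid pixels after normalization:
bright, also_bright, dark = normalize(250), normalize(240), normalize(5)
print(bright * also_bright)  # two bright pixels: positive, near 1
print(bright * dark)         # bright against dark: negative, near -1
```

The sign of the product is what makes the later convolution step respond to matching regions.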

[Figure: pixel scale from 0 to 255 mapped onto -1 to +1.]

Page 20: Machine Learning Pipelines

FEATURE EXTRACTION

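[Diagram: the training portion of the image-classification pipeline. Nodes: Data, ImageParser, Normalizer, PatchExtractor, Patch Whitener, Patch Selector, Convolver, SymmetricRectifier, Pooler, LabelExtractor, Linear Solver, Model. A “Feature Extractor” label marks the featurization stages.]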

Page 21: Machine Learning Pipelines

PATCH EXTRACTION

• Image patches become our “visual vocabulary”

• Intuition from text classification.

• If I’m trying to classify a document as “sports” - I’d look for words like “football”, “batter”, etc.

• For images - classifying pictures as “face” - I’m looking for things that look like eyes, ears, noses, etc.

Visual Vocabulary
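Patch extraction itself is just a sliding window over the image. The sketch below assumes a single-channel image stored as a list of rows; `extract_patches`, `size`, and `stride` are hypothetical names, and a real pipeline would go on to whiten the patches and select a subset of them.

```python
from typing import List

Image = List[List[float]]


def extract_patches(image: Image, size: int, stride: int = 1) -> List[Image]:
    """Return every size x size window of a 2-D image, scanning row by row."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            # Slice `size` rows, then `size` columns out of each row.
            patches.append([row[c:c + size] for row in image[r:r + size]])
    return patches


image = [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]
patches = extract_patches(image, size=2)
print(len(patches))  # 4
print(patches[0])    # [[0, 1], [3, 4]]
```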

Page 22: Machine Learning Pipelines

FEATURE EXTRACTION

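[Diagram: the training portion of the image-classification pipeline. Nodes: Data, ImageParser, Normalizer, PatchExtractor, Patch Whitener, Patch Selector, Convolver, SymmetricRectifier, Pooler, LabelExtractor, Linear Solver, Model. A “Feature Extractor” label marks the featurization stages.]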

Page 23: Machine Learning Pipelines

CONVOLUTION

• A convolution filter applies a weighted average to sliding patches of data.

• Can be used for lots of things - finding edges, blurring, etc.

• Normalized Input:

• Image, Ear Filter

• Output:

• New image - close to 1 for areas that look like the ear filter.

• Apply many of these simultaneously.
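A rough sketch of one filter applied to one image, under some assumptions: a square filter, no padding, stride 1, no kernel flip (strictly a cross-correlation, as is common in vision code), and division by the filter size so that a perfect match on [-1, 1] data scores 1. The function name and the toy data are made up for illustration.

```python
def convolve(image, kernel):
    """Slide a square kernel over a 2-D image and average the products."""
    k = len(kernel)
    n = k * k
    out = []
    for r in range(len(image) - k + 1):
        row = []
        for c in range(len(image[0]) - k + 1):
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(k) for j in range(k))
            row.append(total / n)
        out.append(row)
    return out


kernel = [[1.0, -1.0],
          [-1.0, 1.0]]
image = [[1.0, -1.0, 1.0],
         [-1.0, 1.0, -1.0],
         [-1.0, -1.0, -1.0]]
print(convolve(image, kernel))  # [[1.0, -1.0], [-0.5, 0.5]]
```

The top-left output is 1.0 because that patch is identical to the filter; its inverse scores -1.0, and partial matches land in between. Applying many filters means calling this once per filter.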

Page 24: Machine Learning Pipelines

FEATURE EXTRACTION

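[Diagram: the training portion of the image-classification pipeline. Nodes: Data, ImageParser, Normalizer, PatchExtractor, Patch Whitener, Patch Selector, Convolver, SymmetricRectifier, Pooler, LabelExtractor, Linear Solver, Model. A “Feature Extractor” label marks the featurization stages.]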

Page 25: Machine Learning Pipelines

LINEAR RECTIFICATION

• For each feature, x, given some a (=0.25):

• x_new = max(x - a, 0)

• What does it do?

• Removes a bunch of noise.
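The formula above, written out as a small sketch (`rectify` is a name chosen here; the default a = 0.25 comes from the slide):

```python
def rectify(x: float, a: float = 0.25) -> float:
    """Shift a feature down by a and clip anything negative to zero."""
    return max(x - a, 0.0)


features = [1.0, 0.5, 0.2, -0.5]
print([rectify(x) for x in features])  # [0.75, 0.25, 0.0, 0.0]
```

Weak responses (here 0.2 and -0.5) are zeroed out, which is the noise removal the slide refers to; strong responses survive, shifted down by a.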

Page 26: Machine Learning Pipelines

FEATURE EXTRACTION

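[Diagram: the training portion of the image-classification pipeline. Nodes: Data, ImageParser, Normalizer, PatchExtractor, Patch Whitener, Patch Selector, Convolver, SymmetricRectifier, Pooler, LabelExtractor, Linear Solver, Model. A “Feature Extractor” label marks the featurization stages.]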

Page 27: Machine Learning Pipelines

POOLING

• convolve(image, k filters) => k filtered images.

• Lots of info - super granular.

• Pooling lets us break the (filtered) images into regions and sum.

• Think of the “sum” as how much an image quadrant is activated.

• Image summarized into 4*k numbers.

[Figure: example 2 x 2 grid of pooled quadrant sums: 0.5 and 8 on top, 0 and 2 below.]
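Quadrant sum-pooling can be sketched as follows. The function names and the input values are invented for illustration; the values were picked so that the quadrant sums reproduce the 2 x 2 grid shown on the slide.

```python
def pool_quadrants(image):
    """Sum a 2-D filtered image over its four quadrants.

    Returns [[top_left, top_right], [bottom_left, bottom_right]].
    """
    rows, cols = len(image), len(image[0])
    h, w = rows // 2, cols // 2

    def region_sum(r0, r1, c0, c1):
        return sum(sum(row[c0:c1]) for row in image[r0:r1])

    return [[region_sum(0, h, 0, w), region_sum(0, h, w, cols)],
            [region_sum(h, rows, 0, w), region_sum(h, rows, w, cols)]]


filtered = [[0.5, 0.0, 4.0, 2.0],
            [0.0, 0.0, 1.0, 1.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
print(pool_quadrants(filtered))  # [[0.5, 8.0], [0.0, 2.0]]


def pooled_features(filtered_images):
    """Flatten the quadrant sums of k filtered images into 4 * k numbers."""
    return [v for img in filtered_images
            for row in pool_quadrants(img) for v in row]


print(len(pooled_features([filtered] * 3)))  # 12
```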

Page 28: Machine Learning Pipelines

LINEAR CLASSIFICATION

Page 29: Machine Learning Pipelines

WHY LINEAR CLASSIFIERS?

They’re simple. They’re fast. They’re well studied. They scale.

With the right features, they do a good job!

Data: A Labels: b Model: x

Hypothesis: Ax = b + error

Find the x that minimizes the error |Ax - b|.
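For small problems, one way to find that x is to solve the normal equations (A^T A) x = A^T b. The sketch below assumes the error is measured with the Euclidean norm and that A^T A is invertible; the helper names are hypothetical, and it uses plain Python lists rather than a distributed solver.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(m, rhs):
    """Solve m @ x = rhs for square m (Gauss-Jordan, partial pivoting)."""
    n = len(m)
    aug = [list(m[i]) + list(rhs[i]) for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def least_squares(A, b):
    """Return the x minimizing the squared error of A x = b."""
    At = transpose(A)
    return solve(matmul(At, A), matmul(At, b))


# Toy data on the line y = 2 * t + 1; columns of A are [t, 1].
A = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
b = [[1.0], [3.0], [5.0], [7.0]]
x = least_squares(A, b)
print(x)  # approximately [[2.0], [1.0]]
```

Since b may have several columns, the same code handles one column per class, giving an x of shape #features x #classes.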

Page 30: Machine Learning Pipelines

BACK TO OUR PROBLEM

• What is A in our problem?

• #images x #features (4f)

• What about x?

• #features x #classes

• For f < 10000, pretty easy to solve!

• Bigger - we have to get creative.

[Figure: A (10m x 100k) times x (100k x 1k) = b (10m x 1k).]
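To see why the larger case calls for creativity, here is a back-of-the-envelope size estimate. It assumes the figure's “10m”, “100k”, and “1k” mean 10 million images, 100,000 features, and 1,000 classes, and that every matrix is stored densely as 8-byte doubles; both are assumptions, not statements from the slides.

```python
BYTES_PER_DOUBLE = 8  # assumption: dense float64 storage


def dense_gigabytes(rows: int, cols: int) -> float:
    """Size of a dense rows x cols matrix of doubles, in GB (10^9 bytes)."""
    return rows * cols * BYTES_PER_DOUBLE / 1e9


n_images, n_features, n_classes = 10_000_000, 100_000, 1_000

print(dense_gigabytes(n_images, n_features))   # A: 8000.0
print(dense_gigabytes(n_features, n_classes))  # x: 0.8
print(dense_gigabytes(n_images, n_classes))    # b: 80.0
```

Under these assumptions A alone is about 8 TB, far more than a single machine's memory, while the model x is under 1 GB.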

Page 31: Machine Learning Pipelines

TODAY’S EXERCISE

• Build 3 image classification pipelines - simple, intermediate, advanced.

• Qualitatively (with your eyes) and quantitatively (with statistics) compare their effectiveness.

Page 32: Machine Learning Pipelines

ML PIPELINES

• Reusable, general purpose components.

• Built with distributed data in mind from day 1.

• Used together, they give a complex system composed of well-understood parts.

GO BEARS