Machine Learning
LW/OB presentation

Arthur Breitman

Nov 02, 2014

A simple presentation explaining what machine learning is.

Transcript
Page 1: Machine Learning

Machine Learning

LW/OB presentation

Page 2: Machine Learning

Machine learning (ML) is a field concerned with studying and developing algorithms that perform better at a task as they gain experience.

(But mostly I wanted to use this cool picture.)

Page 3: Machine Learning

WARNING: This presentation is seriously lacking slides, preparation, and cool running examples.

That being said, I know what I'm talking about ;)

Pages 4-8: Machine Learning

What ML is really about…

• ML is about data, and modeling its distribution

• ML is about a tradeoff between model accuracy and predictive power

• ML is about finding simple yet expressive classes of distributions

• ML is about using approximate numerical methods to perform a Bayesian update on the training data (sketched below)
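
To make that last bullet concrete, here is a minimal sketch (my illustration, not the author's code) of a numerical Bayesian update: a grid approximation of the posterior over a coin's bias, given some training data.

```python
# Hypothetical illustration: approximate Bayesian update on a grid.
# Model: a coin with unknown bias theta; data: 7 heads in 10 flips.
import numpy as np

theta = np.linspace(0, 1, 1001)        # grid over the parameter
prior = np.ones_like(theta)            # uniform prior
prior /= prior.sum()

heads, flips = 7, 10
likelihood = theta**heads * (1 - theta)**(flips - heads)

posterior = prior * likelihood         # Bayes' rule, unnormalized
posterior /= posterior.sum()           # normalize numerically

print("posterior mean:", (theta * posterior).sum())  # ~0.667
```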

Page 9: Machine Learning

ML = intersection of

Pages 10-12: Machine Learning

Data sizes vary…

From a couple of kilobytes to petabytes

Pages 13-16: Machine Learning

Type of problems solved

• Supervised

– Classification

– Regression

• Unsupervised

– Clustering

– Discovering causal links

• Reinforcement learning

– Learn to perform a task from the final result only

• (Transduction)

– Not discussed; improves supervised learning with unlabeled samples

Page 17: Machine Learning

Typical applications

• Image, speech, pattern recognition

• Collaborative filtering

• Time series forecasting

• Game playing

• Denoising

• Any task where experience is valuable

Pages 18-25: Machine Learning

Common ML techniques

• Linear regression

• Factor models

• Decision trees

• Neural networks

– perceptron, multilayer perceptron with backpropagation, Hebbian auto-associative memory, Boltzmann machine, spiking neurons…

• SVMs

• Bayesian networks, white-box models…

Pages 26-32: Machine Learning

Meta-Methods

– Ensemble forecasting

– Bootstrapping, bagging, model averaging

– Boosting

– Inductive bias through

• Out-of-sample testing

• Minimum description length

Pages 33-36: Machine Learning

Neural networks demystified

• Perceptron (1957)

THIS IS… LINEAR ALGEBRA!
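
The joke is literal: a perceptron is a thresholded dot product. A minimal sketch of the classic perceptron learning rule (my illustration; the slides name the technique but show no code):

```python
# Minimal perceptron sketch: a thresholded dot product plus the
# classic error-correction update (Rosenblatt, 1957).
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):             # labels y in {-1, +1}
            if yi * (xi @ w + b) <= 0:       # misclassified point
                w += lr * yi * xi            # nudge the hyperplane toward it
                b += lr * yi
    return w, b

# Linearly separable toy data: logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # [-1, -1, -1, 1]
```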

Pages 37-38: Machine Learning

Neural networks demystified

• Perceptron (1957)

• Linear separability

8 binary inputs => only about 1/2^212 of all possible classifications are linearly separable

Pages 39-41: Machine Learning

Neural networks demystified

• Perceptron (1957)

• Linear separability

• Multilayered perceptron + backpropagation (1969 ~ 1986)

• Smooth interpolation
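
Linear separability is exactly the limitation the multilayer perceptron removes. A minimal backpropagation sketch (mine, not the author's) that learns XOR, a function no single perceptron can represent:

```python
# Tiny multilayer perceptron trained with backpropagation on XOR,
# the canonical non-linearly-separable problem.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # hidden layer
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(5000):
    h = sigmoid(X @ W1 + b1)                     # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)          # backward pass: output delta
    d_h = (d_out @ W2.T) * h * (1 - h)           # hidden delta via chain rule
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(0)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]
```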

Page 42: Machine Learning

Many more types…

Pages 43-45: Machine Learning

SVM in a nutshell

• Maximize the margin

• Embed in a high-dimensional space
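
A minimal sketch of both bullets at once, assuming scikit-learn (my choice of library; the slides do not name one): the RBF kernel performs the implicit high-dimensional embedding, and the SVC fit maximizes the margin in that space.

```python
# SVM sketch: an RBF kernel makes a ring-shaped class separable.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1).astype(int)  # ring: not linearly separable

clf = SVC(kernel="rbf", C=1.0)                   # implicit embedding + max margin
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors:", len(clf.support_))     # the points defining the margin
```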

Pages 46-48: Machine Learning

Ensemble learning

• Combine predictions through voting (with classifiers) or regression, to improve prediction

• Train on random subsets of the data, drawn with replacement (bootstrapping)

• Or weight the data according to the quality of prediction, and train new weak classifiers accordingly (boosting)
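
A minimal bagging sketch combining the first two bullets (mine, assuming scikit-learn decision trees as the weak learners): train each tree on a bootstrap resample, then take a majority vote.

```python
# Bagging sketch: each tree sees a bootstrap resample; predictions
# are combined by majority vote across the ensemble.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, n_models=25, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = np.zeros((n_models, len(X_test)), dtype=int)
    for m in range(n_models):
        idx = rng.integers(0, n, size=n)           # draw with replacement
        tree = DecisionTreeClassifier(max_depth=3, random_state=m)
        tree.fit(X_train[idx], y_train[idx])
        votes[m] = tree.predict(X_test)
    return (votes.mean(axis=0) > 0.5).astype(int)  # majority vote (binary labels)

# Toy data: two Gaussian blobs
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print((bagged_predict(X, y, X) == y).mean())       # training accuracy
```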

Pages 49-52: Machine Learning

Numerical tricks

• Optimization of fit with standard numerical search techniques

• EM algorithm

• MCMC methods (Gibbs sampling, Metropolis algorithm…)
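
As one concrete MCMC instance, here is a minimal Metropolis sampler (my sketch, with a toy target): it needs only an unnormalized density, which is exactly the situation a Bayesian posterior leaves you in.

```python
# Minimal Metropolis sampler: random-walk proposals, accepted with
# probability min(1, p(x') / p(x)); no normalizing constant needed.
import numpy as np

def metropolis(log_p, x0=0.0, steps=10000, scale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(steps):
        proposal = x + rng.normal(0, scale)
        if np.log(rng.random()) < log_p(proposal) - log_p(x):
            x = proposal                  # accept the move
        samples.append(x)                 # otherwise keep the current state
    return np.array(samples)

# Unnormalized standard normal: log p(x) = -x^2 / 2 + const
draws = metropolis(lambda x: -0.5 * x**2)
print(draws.mean(), draws.std())          # ~0 and ~1
```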

Pages 53-55: Machine Learning

A fundamental Bayesian model: the Hidden Markov Model

• Hidden states produce observed states

• Billions of applications

– Finance

– Speech recognition

– Swype

– Kinect

– Open-heart surgery

– Airplane navigation
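
The "hidden states produce observed states" structure comes with a classic workhorse, the forward algorithm, which computes the likelihood of an observed sequence. A minimal sketch (my illustration; all the probabilities are made-up toy numbers):

```python
# Forward algorithm sketch: P(observations) for a 2-state HMM.
import numpy as np

pi = np.array([0.6, 0.4])                 # initial hidden-state probabilities
A = np.array([[0.7, 0.3],                 # A[i, j] = P(next state j | state i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],                 # B[i, k] = P(observation k | state i)
              [0.2, 0.8]])

obs = [0, 1, 1, 0]                        # an observed sequence
alpha = pi * B[:, obs[0]]                 # initialize with the first observation
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]         # propagate, then weight by emission
print("P(sequence) =", alpha.sum())
```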

Page 56: Machine Learning

Questions I was asked

• How does boosting work?

• What is the No Free Lunch theorem?

• Writing style recognition

• Signature recognition

• Rule extraction

• Moving odds in response to informed gamblers

• BellKor's Pragmatic Chaos and the Netflix prize

Pages 57-61: Machine Learning

Writing style recognition

• Naïve Bayes (similar to spam filtering; a bag-of-words approach)

• Clustering of HMM model parameters

• Simple statistics on the text corpus (sentence length distribution, word length distribution, density of punctuation)

• Combine with a logistic regression
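
A minimal bag-of-words Naïve Bayes sketch for authorship (my illustration, using scikit-learn; the texts and author labels are invented):

```python
# Bag-of-words Naive Bayes for authorship, on hypothetical toy data:
# word counts as features, one class per author.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["the sea was grey and the wind was cold",
         "whereas the party of the first part agrees",
         "the wind sang over the grey water",
         "the party of the second part shall comply"]
authors = ["novelist", "lawyer", "novelist", "lawyer"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, authors)
print(model.predict(["the cold grey sea"]))   # likely 'novelist'
```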

Pages 62-67: Machine Learning

Signature recognition

• Depends whether the input is raster or vector

• The post office uses neural networks, but its corpus is gigantic

• Dimensionality reduction is key

• Wavelets on the raster image for feature extraction

• Path following, then learning on path features (total variation, average curvature, etc.)
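
"Dimensionality reduction is key" typically means something like PCA on the raw features before any learning happens. A minimal sketch (mine, on hypothetical 32x32 raster signatures flattened to 1024 features):

```python
# PCA sketch: compress high-dimensional raster features before learning.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 1024))   # stand-in for 100 flattened rasters
pca = PCA(n_components=20)                # keep 20 directions of max variance
reduced = pca.fit_transform(features)
print(reduced.shape)                           # (100, 20)
print(pca.explained_variance_ratio_.sum())     # fraction of variance retained
```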

Page 68: Machine Learning

Rule extraction

• Hard: the hypothesis space is not smooth

• Decision tree regression

• Genetic programming (Koza)
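
Decision trees suit rule extraction because a fitted tree reads off directly as if/then rules. A minimal sketch (my illustration, with scikit-learn and synthetic data):

```python
# Decision tree sketch: the fitted tree prints as explicit if/then rules.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = (X[:, 0] > 5).astype(float) + rng.normal(0, 0.1, 200)  # a noisy step

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["x"]))  # human-readable rules
```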

Pages 69-70: Machine Learning

Netflix prize

• The baseline (Cinematch) = a latent semantic model

• The defining characteristic of the winners: ensemble prediction, with neural networks used to combine predictors

• The best teams were mergers of good teams

Page 71: Machine Learning

Latent semantic model

• There is a set of K "features". Each movie has a score for each feature; each user has a weight for each feature.

• The features are latent; we only assume the value of K.

• Equivalent to representing the rating matrix as the product of a score matrix and a preference matrix. SVD minimizes the RMSE.
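
A minimal sketch of that last bullet (my toy numbers): factor a small rating matrix with SVD and truncate to K latent features; by the Eckart-Young theorem the truncation is the RMSE-optimal rank-K approximation.

```python
# Rank-K SVD sketch on a hypothetical 4x4 rating matrix
# (rows = users, cols = movies).
import numpy as np

R = np.array([[5, 4, 1, 1],
              [4, 5, 1, 2],
              [1, 1, 5, 4],
              [1, 2, 4, 5]], float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
K = 2                                     # assume 2 latent features
R_hat = U[:, :K] * s[:K] @ Vt[:K]         # best rank-2 fit in RMSE

print(np.round(R_hat, 1))
print("RMSE:", np.sqrt(((R - R_hat) ** 2).mean()))
```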

Pages 72-73: Machine Learning

Poker is hard…

• Gigantic, yet not continuous, state space

• Dimensionality reduction isn't easy

• High variance

• Possible to build parametric strategies and optimize them with ML

• Inputs such as pot odds are trivial to compute

Page 74: Machine Learning

Uhuh, slides end here

Page 75: Machine Learning

Sort of… Questions?