Machine Learning
LW/OB presentation

Arthur Breitman

Nov 02, 2014

A simple presentation explaining what machine learning is.

Transcript
Page 1: Machine Learning

Machine Learning

LW/OB presentation

Page 2: Machine Learning

Machine learning (ML) is a field concerned with studying and developing algorithms that perform better at a task as they gain experience.

(But mostly I wanted to use this cool picture.)

Page 3: Machine Learning

WARNING: This presentation is seriously lacking slides, preparation, and cool running examples.

That being said, I know what I'm talking about ;)

Pages 4-8: Machine Learning

What ML is really about…

• ML is about data, and modeling its distribution

• ML is about a tradeoff between model accuracy and predictive power

• ML is about finding simple yet expressive classes of distributions

• ML is about using approximate numerical methods to perform a Bayesian update on the training data (sketched below)
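
To make that last bullet concrete, here is a minimal sketch (my illustration, not the author's code) of a numerical Bayesian update: a grid approximation of the posterior over a coin's bias, given some training data.

```python
# Hypothetical illustration: approximate Bayesian update on a grid.
# Model: a coin with unknown bias theta; data: 7 heads in 10 flips.
import numpy as np

theta = np.linspace(0, 1, 1001)        # grid over the parameter
prior = np.ones_like(theta)            # uniform prior
prior /= prior.sum()

heads, flips = 7, 10
likelihood = theta**heads * (1 - theta)**(flips - heads)

posterior = prior * likelihood         # Bayes' rule, unnormalized
posterior /= posterior.sum()           # normalize numerically

print("posterior mean:", (theta * posterior).sum())  # ~0.667
```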

Page 9: Machine Learning

ML = intersection of

Pages 10-12: Machine Learning

Data sizes vary…

From a couple of kilobytes to petabytes

Pages 13-16: Machine Learning

Type of problems solved

• Supervised

– Classification

– Regression

• Unsupervised

– Clustering

– Discovering causal links

• Reinforcement learning

– Learn to perform a task from the final result only

• (Transduction)

– Not discussed; improves supervised learning with unlabeled samples

Page 17: Machine Learning

Typical applications

• Image, speech, pattern recognition

• Collaborative filtering

• Time series forecasting

• Game playing

• Denoising

• Any task where experience is valuable

Pages 18-25: Machine Learning

Common ML techniques

• Linear regression

• Factor models

• Decision trees

• Neural networks

– perceptron, multilayer perceptron with backpropagation, Hebbian auto-associative memory, Boltzmann machine, spiking neurons…

• SVMs

• Bayesian networks, white-box models…

Pages 26-32: Machine Learning

Meta-Methods

– Ensemble forecasting

– Bootstrapping, bagging, model averaging

– Boosting

– Inductive bias through

• Out-of-sample testing

• Minimum description length

Pages 33-36: Machine Learning

Neural networks demystified

• Perceptron (1957)

THIS IS… LINEAR ALGEBRA!
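
The joke is literal: a perceptron is a thresholded dot product. A minimal sketch of the classic perceptron learning rule (my illustration; the slides name the technique but show no code):

```python
# Minimal perceptron sketch: a thresholded dot product plus the
# classic error-correction update (Rosenblatt, 1957).
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):             # labels y in {-1, +1}
            if yi * (xi @ w + b) <= 0:       # misclassified point
                w += lr * yi * xi            # nudge the hyperplane toward it
                b += lr * yi
    return w, b

# Linearly separable toy data: logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # [-1, -1, -1, 1]
```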

Pages 37-38: Machine Learning

Neural networks demystified

• Perceptron (1957)

• Linear separability

8 binary inputs => only about 1/2^212 of all possible classifications are linearly separable

Pages 39-41: Machine Learning

Neural networks demystified

• Perceptron (1957)

• Linear separability

• Multilayered perceptron + backpropagation (1969 ~ 1986)

• Smooth interpolation
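
Linear separability is exactly the limitation the multilayer perceptron removes. A minimal backpropagation sketch (mine, not the author's) that learns XOR, a function no single perceptron can represent:

```python
# Tiny multilayer perceptron trained with backpropagation on XOR,
# the canonical non-linearly-separable problem.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # hidden layer
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # output layer
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(5000):
    h = sigmoid(X @ W1 + b1)                     # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)          # backward pass: output delta
    d_h = (d_out @ W2.T) * h * (1 - h)           # hidden delta via chain rule
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(0)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]
```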

Page 42: Machine Learning

Many more types…

Pages 43-45: Machine Learning

SVM in a nutshell

• Maximize the margin

• Embed in a high-dimensional space
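
A minimal sketch of both bullets at once, assuming scikit-learn (my choice of library; the slides do not name one): the RBF kernel performs the implicit high-dimensional embedding, and the SVC fit maximizes the margin in that space.

```python
# SVM sketch: an RBF kernel makes a ring-shaped class separable.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1).astype(int)  # ring: not linearly separable

clf = SVC(kernel="rbf", C=1.0)                   # implicit embedding + max margin
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors:", len(clf.support_))     # the points defining the margin
```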

Pages 46-48: Machine Learning

Ensemble learning

• Combine predictions through voting (with classifiers) or regression, to improve prediction

• Train on random subsets of the data, drawn with replacement (bootstrapping)

• Or weight the data according to the quality of prediction, and train new weak classifiers accordingly (boosting)
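
A minimal bagging sketch combining the first two bullets (mine, assuming scikit-learn decision trees as the weak learners): train each tree on a bootstrap resample, then take a majority vote.

```python
# Bagging sketch: each tree sees a bootstrap resample; predictions
# are combined by majority vote across the ensemble.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, n_models=25, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = np.zeros((n_models, len(X_test)), dtype=int)
    for m in range(n_models):
        idx = rng.integers(0, n, size=n)           # draw with replacement
        tree = DecisionTreeClassifier(max_depth=3, random_state=m)
        tree.fit(X_train[idx], y_train[idx])
        votes[m] = tree.predict(X_test)
    return (votes.mean(axis=0) > 0.5).astype(int)  # majority vote (binary labels)

# Toy data: two Gaussian blobs
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print((bagged_predict(X, y, X) == y).mean())       # training accuracy
```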

Pages 49-52: Machine Learning

Numerical tricks

• Optimization of fit with standard numerical search techniques

• EM algorithm

• MCMC methods (Gibbs sampling, Metropolis algorithm…)
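
As one concrete MCMC instance, here is a minimal Metropolis sampler (my sketch, with a toy target): it needs only an unnormalized density, which is exactly the situation a Bayesian posterior leaves you in.

```python
# Minimal Metropolis sampler: random-walk proposals, accepted with
# probability min(1, p(x') / p(x)); no normalizing constant needed.
import numpy as np

def metropolis(log_p, x0=0.0, steps=10000, scale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(steps):
        proposal = x + rng.normal(0, scale)
        if np.log(rng.random()) < log_p(proposal) - log_p(x):
            x = proposal                  # accept the move
        samples.append(x)                 # otherwise keep the current state
    return np.array(samples)

# Unnormalized standard normal: log p(x) = -x^2 / 2 + const
draws = metropolis(lambda x: -0.5 * x**2)
print(draws.mean(), draws.std())          # ~0 and ~1
```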

Pages 53-55: Machine Learning

A fundamental Bayesian model: the Hidden Markov Model

• Hidden states produce observed states

• Billions of applications

– Finance

– Speech recognition

– Swype

– Kinect

– Open-heart surgery

– Airplane navigation
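
The "hidden states produce observed states" structure comes with a classic workhorse, the forward algorithm, which computes the likelihood of an observed sequence. A minimal sketch (my illustration; all the probabilities are made-up toy numbers):

```python
# Forward algorithm sketch: P(observations) for a 2-state HMM.
import numpy as np

pi = np.array([0.6, 0.4])                 # initial hidden-state probabilities
A = np.array([[0.7, 0.3],                 # A[i, j] = P(next state j | state i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],                 # B[i, k] = P(observation k | state i)
              [0.2, 0.8]])

obs = [0, 1, 1, 0]                        # an observed sequence
alpha = pi * B[:, obs[0]]                 # initialize with the first observation
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]         # propagate, then weight by emission
print("P(sequence) =", alpha.sum())
```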

Page 56: Machine Learning

Questions I was asked

• How does boosting work?

• What is the No Free Lunch theorem?

• Writing style recognition

• Signature recognition

• Rule extraction

• Moving odds in response to informed gamblers

• BellKor's Pragmatic Chaos and the Netflix prize

Pages 57-61: Machine Learning

Writing style recognition

• Naïve Bayes (similar to spam filtering; a bag-of-words approach)

• Clustering of HMM model parameters

• Simple statistics on the text corpus (sentence length distribution, word length distribution, density of punctuation)

• Combine with a logistic regression
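
A minimal bag-of-words Naïve Bayes sketch for authorship (my illustration, using scikit-learn; the texts and author labels are invented):

```python
# Bag-of-words Naive Bayes for authorship, on hypothetical toy data:
# word counts as features, one class per author.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["the sea was grey and the wind was cold",
         "whereas the party of the first part agrees",
         "the wind sang over the grey water",
         "the party of the second part shall comply"]
authors = ["novelist", "lawyer", "novelist", "lawyer"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, authors)
print(model.predict(["the cold grey sea"]))   # likely 'novelist'
```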

Pages 62-67: Machine Learning

Signature recognition

• Depends whether the input is raster or vector

• The post office uses neural networks, but its corpus is gigantic

• Dimensionality reduction is key

• Wavelets on the raster image for feature extraction

• Path following, then learning on path features (total variation, average curvature, etc.)
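
"Dimensionality reduction is key" typically means something like PCA on the raw features before any learning happens. A minimal sketch (mine, on hypothetical 32x32 raster signatures flattened to 1024 features):

```python
# PCA sketch: compress high-dimensional raster features before learning.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 1024))   # stand-in for 100 flattened rasters
pca = PCA(n_components=20)                # keep 20 directions of max variance
reduced = pca.fit_transform(features)
print(reduced.shape)                           # (100, 20)
print(pca.explained_variance_ratio_.sum())     # fraction of variance retained
```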

Page 68: Machine Learning

Rule extraction

• Hard: the hypothesis space is not smooth

• Decision tree regression

• Genetic programming (Koza)
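
Decision trees suit rule extraction because a fitted tree reads off directly as if/then rules. A minimal sketch (my illustration, with scikit-learn and synthetic data):

```python
# Decision tree sketch: the fitted tree prints as explicit if/then rules.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = (X[:, 0] > 5).astype(float) + rng.normal(0, 0.1, 200)  # a noisy step

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["x"]))  # human-readable rules
```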

Pages 69-70: Machine Learning

Netflix prize

• The baseline (Cinematch) = a latent semantic model

• The defining characteristic of the winners: ensemble prediction, with neural networks used to combine predictors

• The best teams were mergers of good teams

Page 71: Machine Learning

Latent semantic model

• There is a set of K "features". Each movie has a score for each feature; each user has a weight for each feature.

• The features are latent; we only assume the value of K.

• Equivalent to representing the rating matrix as the product of a score matrix and a preference matrix. SVD minimizes the RMSE.
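
A minimal sketch of that last bullet (my toy numbers): factor a small rating matrix with SVD and truncate to K latent features; by the Eckart-Young theorem the truncation is the RMSE-optimal rank-K approximation.

```python
# Rank-K SVD sketch on a hypothetical 4x4 rating matrix
# (rows = users, cols = movies).
import numpy as np

R = np.array([[5, 4, 1, 1],
              [4, 5, 1, 2],
              [1, 1, 5, 4],
              [1, 2, 4, 5]], float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
K = 2                                     # assume 2 latent features
R_hat = U[:, :K] * s[:K] @ Vt[:K]         # best rank-2 fit in RMSE

print(np.round(R_hat, 1))
print("RMSE:", np.sqrt(((R - R_hat) ** 2).mean()))
```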

Pages 72-73: Machine Learning

Poker is hard…

• Gigantic, yet not continuous, state space

• Dimensionality reduction isn't easy

• High variance

• Possible to build parametric strategies and optimize them with ML

• Inputs such as pot odds are trivial to compute

Page 74: Machine Learning

Uhuh, slides end here

Page 75: Machine Learning

Sort of… Questions?