Machine Learning at Geeky Base

Machine Learning

Kan Ouivirach

Research & Development Engineer

www.kanouivirach.com

Outline

• What is Machine Learning?

• Main Types of Learning

• Model Validation, Selection, and Evaluation

• Applied Machine Learning Process

• Cautions

What is Machine Learning?

http://www.bigdata-madesimple.com/

–Arthur Samuel (1959)

“Field of study that gives computers the ability to learn without being explicitly programmed.”

–Tom Mitchell (1997)

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Statistics vs. Data Mining vs. Machine Learning vs. …?

Programming vs. Machine Learning?

Programming?

“Given a specification of a function f, implement f that meets the specification.”

Machine Learning?

“Given example (x, y) pairs, induce f such that y = f(x) for given pairs and generalizes well for unseen x”

–Peter Norvig (2014)

Why is Machine Learning so hard?

http://veronicaforand.com/

http://www.thinkgeek.com/product/f0ba/

What do you see?

Dog and Cat?

http://thisvsthatshow.com/

Applications of Machine Learning

• Search Engines

• Medical Diagnosis

• Object Recognition

• Stock Market Analysis

• Credit Card Fraud Detection

• Speech Recognition

• etc.

Recommendation System on Amazon.com

Advertisement System on Facebook.com

Speech Recognition from Microsoft

Robot Localization

https://github.com/mjl/particle_filter_demo

Main Types of Learning

• Supervised Learning

• Unsupervised Learning

• Reinforcement Learning

Supervised Learning

y = f(x)

Given x, y pairs, find a function f that will map new x to a proper y.

Supervised Learning Problems

• Regression

• Classification

Regression

Linear Regression

y = wx + b
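
As a minimal sketch of how w and b might be estimated, the code below fits y = wx + b with ordinary least squares. The data points and the helper name `fit_line` are made up for illustration and are not from the talk.

```python
def fit_line(xs, ys):
    """Fit y = w*x + b by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    # Intercept: the fitted line passes through the point of means
    b = mean_y - w * mean_x
    return w, b


# Hypothetical data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
print(w, b)
```

Once w and b are known, predicting y for an unseen x is just `w * x + b`.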

http://thisvsthatshow.com/

Classification

k-Nearest Neighbors

http://bdewilde.github.io/blog/blogger/2012/10/26/classification-of-hand-written-digits-3/

Perceptron

Processor

Input 0

Input 1

Output

One or more inputs, a processor, and a single output

Perceptron

https://datasciencelab.wordpress.com/2014/01/10/machine-learning-classics-the-perceptron/

w0x0 + w1x1
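
The weighted sum above can be turned into a tiny classifier. The following is a sketch only, assuming a step activation, an extra bias term, and the classic perceptron update rule, none of which are spelled out on the slide. The training data (logical AND) and the function names are hypothetical.

```python
def predict(weights, bias, inputs):
    """Output 1 if the weighted sum of the inputs plus bias is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, epochs=10, lr=1.0):
    """Classic perceptron rule: nudge the weights by the prediction error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in zip(samples, labels):
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Hypothetical training data: logical AND of two inputs
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, s) for s in samples]
print(predictions)
```

A single perceptron can only learn linearly separable problems such as AND. It cannot learn XOR.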


Perceptron



Probability Theory

https://seisanshi.wordpress.com/tag/probability/

Naive Bayes

[Figure: class node Ck connected to attribute nodes A1, A2, A3, …, An]

P(Ck | A1, …, An) = P(Ck) * P(A1, …, An | Ck) / P(A1, …, An)

With the independence assumption, we then have

P(Ck | A1, …, An) ∝ P(Ck) * Prod_i P(Ai | Ck)

Naive Bayes

No.  Content              Spam?
1    Party                Yes
2    Sale Discount        Yes
3    Party Sale Discount  Yes
4    Python Party         No
5    Python Programming   No

Naive Bayes

We want to find out whether “Party Programming” is spam.

P(Spam | Party, Programming) ∝ P(Spam) * P(Party | Spam) * P(Programming | Spam)

P(NotSpam | Party, Programming) ∝ P(NotSpam) * P(Party | NotSpam) * P(Programming | NotSpam)

We need to know

P(Spam), P(NotSpam)

P(Party | Spam), P(Party | NotSpam)

P(Programming | Spam), P(Programming | NotSpam)

Naive Bayes

P(Spam) = ? P(NotSpam) = ?

P(Party | Spam) = ? P(Party | NotSpam) = ?

P(Programming | Spam) = ? P(Programming | NotSpam) = ?

Naive Bayes

P(Spam) = 3/5 P(NotSpam) = 2/5

P(Party | Spam) = 2/3 P(Party | NotSpam) = 1/2

P(Programming | Spam) = 0 P(Programming | NotSpam) = 1/2

Naive Bayes

P(Spam | Party, Programming) ∝ 3/5 * 2/3 * 0 = 0

P(NotSpam | Party, Programming) ∝ 2/5 * 1/2 * 1/2 = 0.1

P(NotSpam | Party, Programming) > P(Spam | Party, Programming)

“Party Programming” is NOT spam.
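
The arithmetic above can be reproduced in a few lines. This is a sketch under the same counting scheme the slides appear to use: P(word | class) is the fraction of messages in that class containing the word. The function names are hypothetical.

```python
# The five training messages from the spam table: (content, is_spam)
messages = [
    ("Party", True),
    ("Sale Discount", True),
    ("Party Sale Discount", True),
    ("Python Party", False),
    ("Python Programming", False),
]


def prior(is_spam):
    """Fraction of all messages that belong to the class."""
    return sum(1 for _, label in messages if label == is_spam) / len(messages)


def likelihood(word, is_spam):
    """Fraction of messages in the class that contain the word."""
    docs = [text.split() for text, label in messages if label == is_spam]
    return sum(1 for words in docs if word in words) / len(docs)


def score(words, is_spam):
    """Unnormalized posterior: prior times the product of word likelihoods."""
    result = prior(is_spam)
    for word in words:
        result *= likelihood(word, is_spam)
    return result


query = ["Party", "Programming"]
spam_score = score(query, True)       # 3/5 * 2/3 * 0
not_spam_score = score(query, False)  # 2/5 * 1/2 * 1/2
is_spam = spam_score > not_spam_score
print(spam_score, not_spam_score, is_spam)
```

Note that a single unseen word ("Programming" never appears in spam) drives the whole spam score to zero. Additive (Laplace) smoothing is a common remedy, though it is not used here.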

Decision Tree

Play tennis?

Day  Outlook   Temp  Humidity  Wind    Play
D1   Sunny     Hot   High      Weak    No
D2   Sunny     Hot   High      Strong  No
D3   Overcast  Mild  High      Strong  Yes
D4   Rain      Cool  Normal    Strong  No

Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes
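
A learned decision tree is just a set of nested tests. The sketch below hand-codes the tree from this slide, read as Sunny -> Humidity, Overcast -> Yes, Rain -> Wind, and checks it against the four example days. The function name `play_tennis` is hypothetical, and the tree is written by hand rather than learned from data.

```python
def play_tennis(outlook, humidity, wind):
    """Walk the decision tree from the root (Outlook) down to a leaf."""
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"
    raise ValueError(f"unknown outlook: {outlook}")


# (outlook, humidity, wind) for the four days in the table; Temp is unused by the tree
days = {
    "D1": ("Sunny", "High", "Weak"),
    "D2": ("Sunny", "High", "Strong"),
    "D3": ("Overcast", "High", "Strong"),
    "D4": ("Rain", "Normal", "Strong"),
}
predictions = {day: play_tennis(*features) for day, features in days.items()}
print(predictions)
```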

Support Vector Machines

“Kernel Trick”

[Figure: the same data shown in the current coordinate system (x, y) and in a new coordinate system (x, z)]

http://www.mblondel.org/journal/2010/09/19/support-vector-machines-in-python/

3 support vectors

Unsupervised Learning

f(x)

Given x, find a function f that gives a compact description of x.

Unsupervised Learning

• k-Means Clustering

• Hierarchical Clustering

• Gaussian Mixture Models (GMMs)

k-Means Clustering

http://stackoverflow.com/questions/24645068/k-means-clustering-major-understanding-issue/24645894#24645894
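
As a rough sketch of the algorithm, k-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The points, the starting centroids, and the fixed iteration count below are made up for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign points to the nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from p to every centroid
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D points forming two obvious groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

In practice the result depends on the starting centroids, so the algorithm is usually run several times from different random starts.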

Anomaly Detection

http://modernfarmer.com/2013/11/farm-pop-idioms/

http://boxesandarrows.com/designing-screens-using-cores-and-paths/

Reinforcement Learning

y = f(x)

Given x and a feedback signal z (for example, a reward), find a function f that generates y.

Flappy Bird Hack using Reinforcement Learning

http://sarvagyavaish.github.io/FlappyBirdRL/

Model Validation

I’ve got a perfect classifier!

https://500px.com/photo/65907417/like-a-frog-trapped-inside-a-coconut-shell-by-ellena-susanti

http://blog.csdn.net/love_tea_cat/article/details/25972921

Overfitting (High Variance)

[Figure: normal fit vs. overfitting]



Underfitting (High Bias)

[Figure: normal fit vs. underfitting]


How to Avoid Overfitting and Underfitting

• Using more data does NOT always help.

• Recommended practices:

• find a good number of features;

• perform cross validation;

• use regularization when overfitting is found.

Model Selection

Model Selection

• Use cross validation to find the best parameters for the model.
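
As a rough sketch of that idea, the code below uses k-fold cross validation to pick the number of neighbors k for a tiny k-nearest-neighbors classifier. The data set, the fold count, the candidate values of k, and the helper names are all made up for illustration.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def cross_val_accuracy(data, k, folds):
    """Hold out each fold in turn, train on the rest, and pool the accuracy."""
    correct = 0
    for i in range(folds):
        test = data[i::folds]  # every folds-th item, starting at index i
        train = [d for j, d in enumerate(data) if j % folds != i]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)


# Hypothetical 1-D data: the label tends to be 1 for large x, with some noise
data = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 0),
        (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]
scores = {k: cross_val_accuracy(data, k, folds=5) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The key point is that every score is measured on data the model did not see during training, so the chosen parameter is less likely to be an artifact of overfitting.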

Model Evaluation

Metrics

• Accuracy

• True Positive, False Positive, True Negative, False Negative

• Precision and Recall

• F1 Score

• etc.
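
These metrics all derive from the four confusion-matrix counts. The sketch below computes them from two label lists. The labels are invented for illustration and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision asks "of everything flagged positive, how much was right?", while recall asks "of everything truly positive, how much was found?". F1 is their harmonic mean.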

Precision and Recall

http://en.wikipedia.org/wiki/Precision_and_recall

Applied Machine Learning Process

http://machinelearningmastery.com/process-for-working-through-machine-learning-problems/

Define the Problem

https://youmustdesireit.wordpress.com/2014/03/05/developing-and-nurturing-creative-problem-solving/

Prepare Data

http://vpnexpress.net/big-data-use-a-vpn-block-data-collection/

Spot Check Algorithms

https://www.flickr.com/photos/withassociates/4385364607/sizes/l/

If two models fit the data equally well, choose the simpler one.

Improve Results

http://www.mobilemechanicprosaustin.com/

Present Results

http://www.langevin.com/blog/2013/04/25/5-tips-for-projecting-confidence/presentation-skills-2/

Some Cautions

http://newventurist.com/

• Curse of dimensionality

• Correlation does NOT imply causation.

• Learn many models, not just ONE.

• More data beats a clever algorithm.

• Data alone are not enough.

“A Few Useful Things to Know about Machine Learning”, Pedro Domingos (2012)

Feature engineering is the key.

Example of Feature Engineering

Width (m)  Length (m)  Cost (baht)
100        100         1,200,000
500        50          1,300,000
100        80          1,000,000
400        100         1,500,000

Are these data good for modeling the cost of the area?

Engineer features.

Area (m²)  Cost (baht)
10,000     1,200,000
25,000     1,300,000
8,000      1,000,000
40,000     1,500,000

They look better here.
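
The engineered feature here is simply width multiplied by length. A minimal sketch of that transformation, using the four rows from the width and length table (the dictionary keys are hypothetical names):

```python
# Raw features from the width/length table
plots = [
    {"width_m": 100, "length_m": 100, "cost_baht": 1_200_000},
    {"width_m": 500, "length_m": 50, "cost_baht": 1_300_000},
    {"width_m": 100, "length_m": 80, "cost_baht": 1_000_000},
    {"width_m": 400, "length_m": 100, "cost_baht": 1_500_000},
]

# Engineer a single feature that relates more directly to cost
for plot in plots:
    plot["area_m2"] = plot["width_m"] * plot["length_m"]

areas = [plot["area_m2"] for plot in plots]

# Sort by area to check whether cost rises along with it
by_area = sorted(plots, key=lambda plot: plot["area_m2"])
costs_by_area = [plot["cost_baht"] for plot in by_area]
print(areas, costs_by_area)
```

Sorted by area, the costs increase steadily, which is a pattern that neither width nor length shows on its own.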

Deep Learning at Microsoft’s Speech Group

Let’s get our hands dirty!

https://github.com/zkan/intro-to-machine-learning
