Machine Learning Kan Ouivirach
Page 1: Machine Learning at Geeky Base

Machine Learning

Kan Ouivirach

Page 2: Machine Learning at Geeky Base

Kan Ouivirach

Research & Development Engineer

www.kanouivirach.com

Page 3: Machine Learning at Geeky Base

Outline

• What is Machine Learning?

• Main Types of Learning

• Model Validation, Selection, and Evaluation

• Applied Machine Learning Process

• Cautions

Page 4: Machine Learning at Geeky Base

What is Machine Learning?

http://www.bigdata-madesimple.com/

Page 5: Machine Learning at Geeky Base

–Arthur Samuel (1959)

“Field of study that gives computers the ability to learn without being explicitly programmed.”

Page 6: Machine Learning at Geeky Base

–Tom Mitchell (1997)

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Page 7: Machine Learning at Geeky Base

Statistics vs. Data Mining vs. Machine Learning vs. …?

Page 8: Machine Learning at Geeky Base

Programming vs. Machine Learning?

Page 9: Machine Learning at Geeky Base

Programming?

“Given a specification of a function f, implement f that meets the specification.”

Machine Learning?

“Given example (x, y) pairs, induce f such that y = f(x) for given pairs and generalizes well for unseen x”

–Peter Norvig (2014)

Page 10: Machine Learning at Geeky Base

Why is Machine Learning so hard?

http://veronicaforand.com/

Page 11: Machine Learning at Geeky Base

http://www.thinkgeek.com/product/f0ba/

What do you see?

Page 12: Machine Learning at Geeky Base

Dog and Cat?

http://thisvsthatshow.com/

Page 13: Machine Learning at Geeky Base

Applications of Machine Learning

• Search Engines

• Medical Diagnosis

• Object Recognition

• Stock Market Analysis

• Credit Card Fraud Detection

• Speech Recognition

• etc.

Page 14: Machine Learning at Geeky Base

Recommendation System on Amazon.com

Page 15: Machine Learning at Geeky Base

Advertisement System on Facebook.com

Page 16: Machine Learning at Geeky Base

Speech Recognition from Microsoft

Page 17: Machine Learning at Geeky Base

Robot Localization

https://github.com/mjl/particle_filter_demo

Page 18: Machine Learning at Geeky Base

Main Types of Learning

• Supervised Learning

• Unsupervised Learning

• Reinforcement Learning

Page 19: Machine Learning at Geeky Base

Supervised Learning

y = f(x)

Given x, y pairs, find a function f that will map new x to a proper y.

Page 20: Machine Learning at Geeky Base

Supervised Learning Problems

• Regression

• Classification

Page 21: Machine Learning at Geeky Base

Regression

Page 22: Machine Learning at Geeky Base

Linear Regression

y = wx + b
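As a minimal sketch of how w and b in y = wx + b could be estimated, the snippet below uses ordinary least squares for a single input. The function name `fit_line` and the sample points are illustrative assumptions, not taken from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w*x + b with a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - w * mean_x
    return w, b


# Made-up points that lie exactly on y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
print(w, b)
```

With real, noisy data the fitted line would only approximate the points; the closed form above minimizes the sum of squared errors.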

Page 23: Machine Learning at Geeky Base

http://thisvsthatshow.com/

Classification

Page 24: Machine Learning at Geeky Base

k-Nearest Neighbors

http://bdewilde.github.io/blog/blogger/2012/10/26/classification-of-hand-written-digits-3/
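The idea behind k-nearest neighbors can be sketched in a few lines: a new point takes the majority label of its k closest training points. The function `knn_predict`, the Euclidean distance, and the toy points below are assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # `train` is a list of (point, label) pairs; points are tuples of numbers.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Two made-up, well-separated groups of 2-D points.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.1, 5.0)))
```

There is no training step: all the work happens at prediction time, which is why the method can become slow on large data sets.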

Page 25: Machine Learning at Geeky Base

Perceptron

Processor

Input 0

Input 1

Output

One or more inputs, a processor, and a single output

Page 26: Machine Learning at Geeky Base

Perceptron

https://datasciencelab.wordpress.com/2014/01/10/machine-learning-classics-the-perceptron/

w0x0 + w1x1
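A minimal sketch of a two-input perceptron follows: it computes the weighted sum w0x0 + w1x1, applies a step function, and adjusts the weights after each mistake. The bias term, the learning rate, and the AND-gate training data are assumptions added for illustration; the slide shows only the weighted sum.

```python
def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias, followed by a step function."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=10, lr=1.0):
    """Perceptron learning rule: move the weights by lr * error * input."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so the rule is able to converge on it.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print([predict(weights, bias, inputs) for inputs, _ in and_gate])
```

A single perceptron can only separate classes with a straight line, so a problem such as XOR would not converge with this sketch.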

Page 27: Machine Learning at Geeky Base

Perceptron

https://datasciencelab.wordpress.com/2014/01/10/machine-learning-classics-the-perceptron/

Page 28: Machine Learning at Geeky Base

Probability Theory

https://seisanshi.wordpress.com/tag/probability/

Page 29: Machine Learning at Geeky Base

Naive Bayes

[Diagram: class node Ck connected to attribute nodes A1, A2, A3, …, An]

P(Ck | A1, …, An) = P(Ck) * P(A1, …, An | Ck) / P(A1, …, An)

With the independence assumption, we then have

P(Ck | A1, …, An) ∝ P(Ck) * Prod_i P(Ai | Ck)

Page 30: Machine Learning at Geeky Base

Naive Bayes

No. Content Spam?

1 Party Yes

2 Sale Discount Yes

3 Party Sale Discount Yes

4 Python Party No

5 Python Programming No

Page 31: Machine Learning at Geeky Base

Naive Bayes

We want to find out whether “Party Programming” is spam or not.

P(Spam | Party, Programming) ∝ P(Spam) * P(Party | Spam) * P(Programming | Spam)

P(NotSpam | Party, Programming) ∝ P(NotSpam) * P(Party | NotSpam) * P(Programming | NotSpam)

We need to know

P(Spam), P(NotSpam)

P(Party | Spam), P(Party | NotSpam)

P(Programming | Spam), P(Programming | NotSpam)

Page 32: Machine Learning at Geeky Base

Naive Bayes

No. Content Spam?

1 Party Yes

2 Sale Discount Yes

3 Party Sale Discount Yes

4 Python Party No

5 Python Programming No

P(Spam) = ? P(NotSpam) = ?

P(Party | Spam) = ? P(Party | NotSpam) = ?

P(Programming | Spam) = ? P(Programming | NotSpam) = ?

Page 33: Machine Learning at Geeky Base

Naive Bayes

No. Content Spam?

1 Party Yes

2 Sale Discount Yes

3 Party Sale Discount Yes

4 Python Party No

5 Python Programming No

P(Spam) = 3/5 P(NotSpam) = 2/5

P(Party | Spam) = 2/3 P(Party | NotSpam) = 1/2

P(Programming | Spam) = 0 P(Programming | NotSpam) = 1/2

Page 34: Machine Learning at Geeky Base

Naive Bayes

P(Spam | Party, Programming) ∝ 3/5 * 2/3 * 0 = 0

P(NotSpam | Party, Programming) ∝ 2/5 * 1/2 * 1/2 = 0.1

P(NotSpam | Party, Programming) > P(Spam | Party, Programming)

“Party Programming” is NOT spam.
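The arithmetic for this example can be reproduced with a short sketch that counts directly from the five training messages. The function name `score` is hypothetical, and no smoothing is applied, which is why a word never seen in spam drives the spam score to exactly zero.

```python
from fractions import Fraction

# The five labeled messages from the table: (content, is_spam).
messages = [
    ("Party", True),
    ("Sale Discount", True),
    ("Party Sale Discount", True),
    ("Python Party", False),
    ("Python Programming", False),
]


def score(words, is_spam):
    """Unnormalized posterior: P(class) times the product of P(word | class)."""
    in_class = [set(text.split()) for text, spam in messages if spam == is_spam]
    result = Fraction(len(in_class), len(messages))  # prior P(class)
    for word in words:
        # Fraction of messages in this class that contain the word.
        containing = sum(1 for tokens in in_class if word in tokens)
        result *= Fraction(containing, len(in_class))
    return result


spam_score = score(["Party", "Programming"], True)
not_spam_score = score(["Party", "Programming"], False)
print(spam_score, not_spam_score)
```

In practice a smoothing term (for example, adding one to every count) is often used so that a single unseen word does not rule out a class entirely.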

Page 35: Machine Learning at Geeky Base

Decision Tree

Play tennis?

Outlook
  Sunny → Humidity
    High → No
    Normal → Yes
  Overcast → Yes
  Rain → Wind
    Strong → No
    Weak → Yes

Day  Outlook   Temp  Humidity  Wind    Play
D1   Sunny     Hot   High      Weak    No
D2   Sunny     Hot   High      Strong  No
D3   Overcast  Mild  High      Strong  Yes
D4   Rain      Cool  Normal    Strong  No
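A learned decision tree is just a nested set of tests. As a rough sketch, the tree on this slide (Outlook at the root, then Humidity or Wind) can be written by hand as a function and checked against the four rows of the table. The function name `play_tennis` is an assumption; the sketch does not show how the tree is learned from data.

```python
def play_tennis(outlook, humidity, wind):
    """Hand-coded version of the slide's tree: Outlook, then Humidity or Wind."""
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    # Remaining branch: Rain, decided by Wind.
    return "Yes" if wind == "Weak" else "No"


# Rows from the slide: (day, outlook, temp, humidity, wind, play).
rows = [
    ("D1", "Sunny", "Hot", "High", "Weak", "No"),
    ("D2", "Sunny", "Hot", "High", "Strong", "No"),
    ("D3", "Overcast", "Mild", "High", "Strong", "Yes"),
    ("D4", "Rain", "Cool", "Normal", "Strong", "No"),
]
predictions = [play_tennis(outlook, humidity, wind)
               for _day, outlook, _temp, humidity, wind, _play in rows]
print(predictions)
```

Note that Temp is never consulted: the tree ignores features that do not help separate the classes.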

Page 36: Machine Learning at Geeky Base

Support Vector Machines

[Figure: scatter plot with axes x and y]

Page 37: Machine Learning at Geeky Base

Support Vector Machines

“Kernel Trick”

[Figure: points in the current coordinate system (axes x, y) are mapped to a new coordinate system (axes x, z)]
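The change of coordinate system can be illustrated with an explicit feature map. In the sketch below, points that no straight line can separate in (x, y) become separable by a simple threshold once z = x² + y² is added. The choice of z, the threshold, and the sample points are assumptions for illustration; a kernel SVM gets the same effect implicitly, without computing the new coordinates.

```python
# Made-up data: one class sits near the origin, the other surrounds it.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, -2.0), (-1.5, 1.5)]


def lift(x, y):
    """Map (x, y) to (x, z), where z = x^2 + y^2 is the squared distance to the origin."""
    return x, x * x + y * y


inner_z = [lift(x, y)[1] for x, y in inner]
outer_z = [lift(x, y)[1] for x, y in outer]

# In the new coordinate system a horizontal line z = threshold separates the classes.
threshold = 1.0
print(max(inner_z), min(outer_z))
```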

Page 38: Machine Learning at Geeky Base

Support Vector Machines

http://www.mblondel.org/journal/2010/09/19/support-vector-machines-in-python/

3 support vectors

Page 39: Machine Learning at Geeky Base

Unsupervised Learning

f(x)

Given x, find a function f that gives a compact description of x.

Page 40: Machine Learning at Geeky Base

Unsupervised Learning

• k-Means Clustering

• Hierarchical Clustering

• Gaussian Mixture Models (GMMs)

Page 41: Machine Learning at Geeky Base

k-Means Clustering

http://stackoverflow.com/questions/24645068/k-means-clustering-major-understanding-issue/24645894#24645894
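A minimal sketch of k-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The function name `kmeans`, the fixed iteration count, the hand-picked starting centroids, and the toy points are all assumptions for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its closest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 8.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
```

The result depends on the starting centroids, so implementations commonly run the algorithm several times from different starting points and keep the best run.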

Page 42: Machine Learning at Geeky Base

Anomaly Detection

http://modernfarmer.com/2013/11/farm-pop-idioms/

Page 43: Machine Learning at Geeky Base

http://boxesandarrows.com/designing-screens-using-cores-and-paths/

Page 44: Machine Learning at Geeky Base

Reinforcement Learning

y = f(x)

Given x and z, find a function f that generates y.

z

Page 45: Machine Learning at Geeky Base

Flappy Bird Hack using Reinforcement Learning

http://sarvagyavaish.github.io/FlappyBirdRL/

Page 46: Machine Learning at Geeky Base
Page 47: Machine Learning at Geeky Base

Model Validation

Page 48: Machine Learning at Geeky Base

I’ve got a perfect classifier!

https://500px.com/photo/65907417/like-a-frog-trapped-inside-a-coconut-shell-by-ellena-susanti

Page 49: Machine Learning at Geeky Base

http://blog.csdn.net/love_tea_cat/article/details/25972921

Overfitting (High Variance)

Normal fit Overfitting

Page 50: Machine Learning at Geeky Base

http://blog.csdn.net/love_tea_cat/article/details/25972921

Underfitting (High Bias)

Normal fit Underfitting

Page 51: Machine Learning at Geeky Base

How to Avoid Overfitting and Underfitting

• Using more data does NOT always help.

• Recommended steps:

• find a good number of features;

• perform cross validation;

• use regularization when overfitting is found.

Page 52: Machine Learning at Geeky Base

Model Selection

Page 53: Machine Learning at Geeky Base

Model Selection

• Use cross validation to find the best parameters for the model.
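The mechanics of choosing a parameter by cross validation can be sketched as follows: hold out each fold in turn, fit on the rest, and keep the parameter value with the lowest held-out error. The model (one-feature ridge regression through the origin), the candidate values, and the noiseless data are assumptions chosen so the example stays short; they are not from the slides.

```python
def fit_ridge(points, lam):
    """One-feature ridge regression through the origin: w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in points) / (sum(x * x for x, _ in points) + lam)


def cv_error(points, lam, folds=3):
    """Squared error on each held-out fold, averaged over all points."""
    total = 0.0
    for f in range(folds):
        held_out = points[f::folds]
        train = [p for i, p in enumerate(points) if i % folds != f]
        w = fit_ridge(train, lam)
        total += sum((y - w * x) ** 2 for x, y in held_out)
    return total / len(points)


# Made-up data lying exactly on y = 2x, so no regularization is needed here.
points = [(float(x), 2.0 * x) for x in range(1, 7)]
candidates = [0.0, 1.0, 10.0]
errors = {lam: cv_error(points, lam) for lam in candidates}
best_lam = min(errors, key=errors.get)
print(best_lam, errors)
```

On noisy data with many features, a nonzero regularization strength would often win instead; the selection procedure stays the same.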

Page 54: Machine Learning at Geeky Base

Model Evaluation

Page 55: Machine Learning at Geeky Base

Metrics

• Accuracy

• True Positive, False Positive, True Negative, False Negative

• Precision and Recall

• F1 Score

• etc.
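The metrics listed above can all be derived from the four confusion counts. The sketch below assumes binary labels where 1 is the positive class; the function names and the sample labels are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def precision_recall_f1(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    # Precision: of everything predicted positive, how much really is positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything really positive, how much was found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

Accuracy alone can mislead when one class is rare, which is one reason precision and recall are reported alongside it.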

Page 56: Machine Learning at Geeky Base

Precision and Recall

http://en.wikipedia.org/wiki/Precision_and_recall

Page 57: Machine Learning at Geeky Base

Applied Machine Learning Process

http://machinelearningmastery.com/process-for-working-through-machine-learning-problems/

Page 58: Machine Learning at Geeky Base

Define the Problem

https://youmustdesireit.wordpress.com/2014/03/05/developing-and-nurturing-creative-problem-solving/

Page 59: Machine Learning at Geeky Base

Prepare Data

http://vpnexpress.net/big-data-use-a-vpn-block-data-collection/

Page 60: Machine Learning at Geeky Base

Spot Check Algorithms

https://www.flickr.com/photos/withassociates/4385364607/sizes/l/

Page 61: Machine Learning at Geeky Base

If two models fit the data equally well, choose the simpler one.

Page 62: Machine Learning at Geeky Base

Improve Results

http://www.mobilemechanicprosaustin.com/

Page 63: Machine Learning at Geeky Base

Present Results

http://www.langevin.com/blog/2013/04/25/5-tips-for-projecting-confidence/presentation-skills-2/

Page 64: Machine Learning at Geeky Base

Some Cautions

http://newventurist.com/

• Curse of dimensionality

• Correlation does NOT imply causation.

• Learn many models, not just ONE.

• More data beats a clever algorithm.

• Data alone are not enough.

A Few Useful Things to Know about Machine Learning, Pedro Domingos (2012)

Page 65: Machine Learning at Geeky Base

— Feature engineering is the key. —

Page 66: Machine Learning at Geeky Base

Example of Feature Engineering

Are these data good for modeling the cost of an area?

Width (m) Length (m) Cost (baht)

100 100 1,200,000

500 50 1,300,000

100 80 1,000,000

400 100 1,500,000

Engineer features.

Area (m²) Cost (baht)

10,000 1,200,000

25,000 1,300,000

8,000 1,000,000

40,000 1,500,000

They look better here.
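The engineered feature here is simply width multiplied by length, so 100 m × 100 m gives 10,000 m². A short sketch of that transformation, using the four rows from the width and length table, is shown below; the variable names and the extra cost-per-square-meter column are assumptions added for illustration.

```python
# Rows from the slide: (width in m, length in m, cost in baht).
plots = [
    (100, 100, 1_200_000),
    (500, 50, 1_300_000),
    (100, 80, 1_000_000),
    (400, 100, 1_500_000),
]

# Replace the two raw features with a single engineered one: area in square meters.
engineered = [(width * length, cost) for width, length, cost in plots]

# A further derived quantity: cost per square meter.
cost_per_m2 = [cost / area for area, cost in engineered]
print(engineered)
print(cost_per_m2)
```

A model given the area sees one feature that relates directly to cost, instead of having to discover the product of two features on its own.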

Page 67: Machine Learning at Geeky Base

Deep Learning at Microsoft’s Speech Group

Page 68: Machine Learning at Geeky Base

Let’s get our hands dirty!

https://github.com/zkan/intro-to-machine-learning