Machine Learning, OSU EECS, Fall 2017. Prof. Liang Huang. http://classes.engr.oregonstate.edu/eecs/fall2017/cs534/

Machine Learning - Oregon State Universityweb.engr.oregonstate.edu/~huanlian/teaching/machine-learning/2017... · •0% quiz on Tuesday 2. ... • “Web rankings today are mostly

Jun 23, 2018


hoangkhue

Warm-Up Questions

• what are the geometric interpretations of

• eigenvector

• covariance matrix

• 0% quiz on Tuesday
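One way to explore the warm-up question numerically, as a minimal sketch: for 2-D data, the eigenvectors of the covariance matrix point along the axes of the data cloud, and the eigenvalues give the variance along each axis. The sample points and the helper names (`covariance_2d`, `eig_sym_2x2`) are illustrative assumptions, not part of the slides.

```python
import math

def covariance_2d(points):
    """Sample covariance matrix [[sxx, sxy], [sxy, syy]] of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def eig_sym_2x2(m):
    """Eigenpairs of a symmetric 2x2 matrix, largest eigenvalue first."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    # The axis of largest variance makes angle theta with the x-axis.
    theta = 0.5 * math.atan2(2 * b, a - d)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))  # perpendicular to v1
    return (mean + radius, v1), (mean - radius, v2)

# Hypothetical points scattered around the line y = 2x.
points = [(-2.0, -4.1), (-1.0, -1.9), (0.0, 0.2), (1.0, 2.1), (2.0, 3.9)]
cov = covariance_2d(points)
(lam1, v1), (lam2, v2) = eig_sym_2x2(cov)
slope = v1[1] / v1[0]  # direction of largest variance, close to slope 2
print(lam1, lam2, slope)
```

Here the first eigenvector lines up with the direction in which the points spread the most, and the second eigenvalue is small because the points barely deviate from that line.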


Machine Learning is Everywhere

• “A breakthrough in machine learning would be worth ten Microsofts” (Bill Gates)

• “Machine learning is the hot new thing” (John Hennessy, President, Stanford)

• “Web rankings today are mostly a matter of machine learning” (Prabhakar Raghavan, Dir. Research, Yahoo)


The Future of Software Engineering

• “See when AI comes, I’ll be long gone (being replaced by autonomous cars) but the programmers in those companies will be too, by automatic program generators.” --- an Uber driver to an ML prof


Liang Huang (USC)

Machine Learning Failures


Liang’s rule: if you see “X carefully” in China, just don’t do it.


Clear evidence that MT is used in real life.


What is Machine Learning?

• Machine Learning = Automating Automation

• Getting computers to program themselves

• Let the data do the work instead!

[Figure: Traditional Programming: Input + Program → Computer → Output. Machine Learning: Input + Output → Computer → Program.]


Magic?

No, more like gardening

• Seeds = Algorithms

• Nutrients = Data

• Gardener = You

• Plants = Programs


ML in a Nutshell

• Tens of thousands of machine learning algorithms

• Hundreds of new ones every year

• Every machine learning algorithm has three components:

– Representation
– Evaluation
– Optimization
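To make the three components concrete, here is a minimal sketch for one particular learner: a straight line as the representation, mean squared error as the evaluation, and gradient descent as the optimization. The toy data, learning rate, and function names are assumptions chosen for illustration.

```python
# Representation: a line y = w * x + b, parameterized by (w, b).
def predict(params, x):
    w, b = params
    return w * x + b

# Evaluation: mean squared error of the predictions on the data.
def mean_squared_error(params, data):
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

# Optimization: gradient descent on the mean squared error.
def gradient_descent(data, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical training data generated from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
params = gradient_descent(data)
print(params, mean_squared_error(params, data))
```

Swapping any one component (for example, a decision tree for the line, or accuracy for squared error) gives a different learning algorithm, which is the point of the decomposition.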


Representation

• Separating Hyperplanes

• Support vectors

• Decision trees

• Sets of rules / Logic programs

• Instances (Nearest Neighbor)

• Graphical models (Bayes/Markov nets)

• Neural networks

• Model ensembles

• Etc.


Evaluation

• Accuracy

• Precision and recall

• Squared error

• Likelihood

• Posterior probability

• Cost / Utility

• Margin

• Entropy

• K-L divergence

• Etc.
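A few of these evaluation measures are simple enough to write out directly. The sketch below covers accuracy, precision and recall, and (mean) squared error; the label vectors are made-up examples and the function names are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def mean_squared_error(y_true, y_pred):
    """Average squared difference, for real-valued predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical binary labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
acc = accuracy(y_true, y_pred)                 # 6 of 8 correct -> 0.75
prec, rec = precision_recall(y_true, y_pred)   # 3 TP, 1 FP, 1 FN -> 0.75, 0.75
mse = mean_squared_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
print(acc, prec, rec, mse)
```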


Optimization

• Combinatorial optimization

• E.g.: Greedy search, Dynamic programming

• Convex optimization

• E.g.: Gradient descent, Coordinate descent

• Constrained optimization

• E.g.: Linear programming, Quadratic programming


Gradient Descent

• if learning rate is too big, it’ll diverge

• if learning rate is too small, it’ll converge very slowly
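Both failure modes show up even on the simplest possible objective. The sketch below minimizes f(x) = x² (gradient 2x), so each step multiplies x by (1 − 2·lr); the objective, starting point, and step count are assumptions picked to keep the arithmetic transparent.

```python
def gradient_descent(lr, x0=1.0, steps=50):
    """Minimize f(x) = x**2 from x0 with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        grad = 2 * x      # derivative of x**2
        x -= lr * grad    # equivalent to x *= (1 - 2 * lr)
    return x

too_big = gradient_descent(lr=1.1)      # |1 - 2*lr| = 1.2 > 1: magnitude grows past 1e3
too_small = gradient_descent(lr=0.001)  # factor 0.998 per step: still about 0.90 after 50 steps
reasonable = gradient_descent(lr=0.3)   # factor 0.4 per step: essentially 0
print(too_big, too_small, reasonable)
```

For this objective the iterates converge only when |1 − 2·lr| < 1, that is, for 0 < lr < 1; other objectives have different thresholds.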


Types of Learning

• Supervised (inductive) learning

• Training data includes desired outputs

• Unsupervised learning

• Training data does not include desired outputs

• Semi-supervised learning

• Training data includes a few desired outputs

• Reinforcement learning

• Rewards from sequence of actions


Supervised Learning

• Given examples (X, f(X)) for an unknown function f

• Find a good approximation of function f

• Discrete f(X): Classification (binary, multiclass, structured)

• Continuous f(X): Regression
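The two cases can be illustrated side by side. In this minimal sketch, regression is a least-squares line fit and classification is a nearest-centroid rule on one feature; both learners and all example data are assumptions chosen for brevity, not methods prescribed by the slides.

```python
def fit_line(examples):
    """Regression (continuous f): least-squares fit of y ~ w * x + b."""
    n = len(examples)
    mx = sum(x for x, _ in examples) / n
    my = sum(y for _, y in examples) / n
    w = (sum((x - mx) * (y - my) for x, y in examples)
         / sum((x - mx) ** 2 for x, _ in examples))
    return w, my - w * mx

def fit_centroids(examples):
    """Classification (discrete f): store the mean x of each label."""
    sums = {}
    for x, label in examples:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}

def classify(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

# Examples (X, f(X)) with a continuous target.
w, b = fit_line([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])

# Examples (X, f(X)) with a discrete target.
centroids = fit_centroids([(1.0, "small"), (2.0, "small"),
                           (8.0, "large"), (9.0, "large")])
print(w * 4.0 + b, classify(centroids, 3.0), classify(centroids, 7.0))
```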


When is Supervised Learning useful?

• when there is no human expert

• input x: bond graph for a new molecule

• output f(x): predicted binding strength to AIDS protease

• when humans can perform the task but can’t describe it

• computer vision: face recognition, OCR

• where the desired function changes frequently

• stock price prediction, spam filtering

• where each user needs a customized function

• speech recognition, spam filtering


Classification

• input X: feature representation (“observation”)

[Figure: examples labeled “overfitting” and “underfitting”]


Linear Classification

• Q1: how to learn a separating hyperplane

• Q2: how to learn the optimal separating hyperplane

• Q3: what if the data is NOT linearly separable
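Q1 can be made concrete with the perceptron, which the course outline lists under linear classification. The sketch below is a minimal version under stated assumptions: labels are −1 or +1, the toy data is linearly separable, and the function name and epoch limit are illustrative choices.

```python
def perceptron(data, epochs=100):
    """Learn (w, b) such that sign(w . x + b) matches y on separable data.

    data: list of (x, y) pairs, x a tuple of features, y in {-1, +1}.
    """
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every example is on the correct side
            break
    return w, b

# Hypothetical linearly separable data in two dimensions.
data = [((2.0, 2.0), 1), ((3.0, 1.0), 1), ((2.5, 3.0), 1),
        ((-1.0, -1.0), -1), ((-2.0, 0.5), -1), ((0.0, -2.0), -1)]
w, b = perceptron(data)
print(w, b)
```

This finds some separating hyperplane, not necessarily the optimal one asked for in Q2, and on data that is not linearly separable (Q3) the loop simply stops at the epoch limit without separating the classes.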

[Figure: low-dimension vs. high-dimension; potential overfitting in very high dimensions]


Regression

• linear and non-linear regression

• overfitting and underfitting (same as in classification)

• how to choose the optimal model complexity?

[Figure: regression fits labeled “underfitting” and “overfitting”]


Training, Test, & Generalization Error

• but you don’t know test data a priori

• generalization error: prob. of error on possible test data

• use held-out training data to “simulate” test-data
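The held-out idea can be sketched in a few lines: set aside part of the training data, fit on the rest, and measure error on the part that was set aside. Everything specific here is an assumption for illustration: the synthetic 1-D data with roughly 10% label noise, the 80/20 split, and the 1-nearest-neighbor learner.

```python
import random

def split_held_out(data, held_out_fraction=0.2, seed=0):
    """Shuffle a copy of the data and split it into (train, held_out)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_held = int(len(shuffled) * held_out_fraction)
    return shuffled[n_held:], shuffled[:n_held]

def nearest_neighbor_predict(train, x):
    """Label of the training example whose x is closest to the query."""
    return min(train, key=lambda ex: abs(ex[0] - x))[1]

def error_rate(train, examples):
    """Fraction of examples misclassified by 1-NN built on train."""
    wrong = sum(1 for x, y in examples
                if nearest_neighbor_predict(train, x) != y)
    return wrong / len(examples)

# Synthetic data: label is 1 when x > 0.5, with about 10% of labels flipped.
rng = random.Random(1)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

train, held_out = split_held_out(data)
train_error = error_rate(train, train)        # 1-NN memorizes its training set
held_out_error = error_rate(train, held_out)  # estimate of generalization error
print(train_error, held_out_error)
```

Training error is uninformative for a model that memorizes its data, which is why the held-out estimate is the number to watch.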

[Figure: regions labeled “overfitting” and “underfitting”]


Ways to Prevent Overfitting

• held-out data to simulate generalization error

• more data points (overfitting is more likely on small data)

• assuming same model complexity

• regularization (explicit control of model complexity)
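Regularization can be sketched with a degree-9 polynomial fit to ten noisy points. The example below uses ridge regression (an L2 penalty `lam` on the weights), which is one common form of regularization and an assumption here, as are the sine-shaped data, the fixed noise values, and the helper names.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_polynomial(xs, ys, degree, lam):
    """Ridge fit: minimize squared error + lam * (sum of squared weights)."""
    feats = [[x ** k for k in range(degree + 1)] for x in xs]
    d = degree + 1
    # Normal equations: (X^T X + lam * I) w = X^T y
    gram = [[sum(f[i] * f[j] for f in feats) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(d)]
    return solve(gram, rhs)

def mse(w, xs, ys):
    preds = [sum(wk * x ** k for k, wk in enumerate(w)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

def norm(w):
    return math.sqrt(sum(wk * wk for wk in w))

# Hypothetical data: a sine curve plus fixed noise values.
noise = [0.05, -0.12, 0.20, -0.08, 0.10, -0.15, 0.03, 0.18, -0.10, 0.07]
xs = [i / 9 for i in range(10)]
ys = [math.sin(2 * math.pi * x) + e for x, e in zip(xs, noise)]

w_weak = fit_polynomial(xs, ys, degree=9, lam=1e-9)    # almost no regularization
w_strong = fit_polynomial(xs, ys, degree=9, lam=1e-3)  # stronger regularization
print(norm(w_weak), mse(w_weak, xs, ys))
print(norm(w_strong), mse(w_strong, xs, ys))
```

A larger `lam` trades a somewhat higher training error for smaller weights, which is the sense in which it controls model complexity; in practice `lam` would be chosen on held-out data.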

[Figure: polynomials of degree 9]


What We’ll Cover

• Supervised learning

• Linear Classification (Perceptron)

• Linear Regression

• Logistic Regression

• Support Vector Machines

• Instance-based learning (e.g. Nearest Neighbors)

• Structured Prediction

• Unsupervised learning

• Clustering (k-means, EM)

• Dimensionality reduction (PCA etc.)