Machine Learning
Introduction to Machine Learning
Marek Petrik
January 26, 2017
Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with Applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani
What is machine learning?
Arthur Samuel (1959, IBM):
Field of study that gives computers the ability to learn without being explicitly programmed
The rise of machine learning
ICML: International Conference on Machine Learning
- The world is too complex to model precisely
- Many features are not captured in data sets
- Need to allow for errors ε in f:
Y = f(X) + ε
Machine Learning Algorithm
- Input: Training data set with features and targets
- Output: Prediction function f
Parametric Prediction Methods
[Figure: Income plotted against Years of Education and Seniority]
Linear models (linear regression)
income = f(education, seniority) = β0 + β1 × education + β2 × seniority
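A minimal sketch of what "parametric" means for the linear model above: once the three coefficients are fixed, the prediction function is fully determined. The coefficient values and the inputs below are made up for illustration; they are not fitted to any data and do not come from the slides.

```python
def predict_income(education, seniority, beta):
    """Linear model: income = beta0 + beta1 * education + beta2 * seniority."""
    beta0, beta1, beta2 = beta
    return beta0 + beta1 * education + beta2 * seniority


# Hypothetical coefficients, chosen only to illustrate the formula.
beta = (10.0, 2.0, 0.5)

# 16 years of education and a seniority of 20 (arbitrary units).
income = predict_income(16, 20, beta)
print(income)  # 10 + 2 * 16 + 0.5 * 20 = 52.0
```

Learning a parametric model then amounts to choosing the coefficients from training data, rather than choosing the shape of f.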
Why Estimate f?
[Figure: three panels plotting Sales against TV, Radio, and Newspaper]
1. Prediction: Make predictions about the future: What is the best media mix to spend ad money on?
2. Inference: Understand the relationship: What kinds of ads work? Why?
Prediction or Inference?
Application                                   Prediction   Inference
Identify risk of getting a disease
Predict effectiveness of a treatment
Recognize hand-written text
Speech recognition
Predict probability of an employee leaving
Statistical View of Machine Learning
- Probability space Ω: Set of all adults
- Random variable X(ω) ∈ R: Years of education
- Random variable Y(ω) ∈ R: Salary
[Figure: two panels plotting Income against Years of Education]
How Good are Predictions?
- Learned function f
- Test data: (x_1, y_1), (x_2, y_2), ...
- Mean Squared Error (MSE):
  MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2
- This is the estimate of:
  MSE = E[(Y - f(X))^2] = \frac{1}{|\Omega|} \sum_{\omega \in \Omega} (Y(\omega) - f(X(\omega)))^2
- Important: Samples x_i are i.i.d.
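The test-set MSE can be computed directly from its definition. The sketch below assumes a hypothetical learned function f and a tiny made-up test set, purely to show the arithmetic.

```python
def mean_squared_error(f, test_data):
    """Average squared difference between targets y and predictions f(x)."""
    return sum((y - f(x)) ** 2 for x, y in test_data) / len(test_data)


# Hypothetical learned function, for illustration only.
def f(x):
    return 2.0 * x + 1.0


# Made-up test pairs (x_i, y_i).
test_data = [(0.0, 1.0), (1.0, 2.0), (2.0, 7.0)]

# Predictions are 1, 3, 5, so the squared errors are 0, 1, 4.
mse = mean_squared_error(f, test_data)
print(mse)  # 5 / 3
```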
Do We Need Test Data?
- Why not just test on the training data?
[Figure: left panel plots Y against X; right panel plots Mean Squared Error against Flexibility]
- Flexibility is the degree of the polynomial being fit
- Gray line: training error; red line: testing error
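The effect can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions, not the experiment behind the figure: the true function, the noise level, the sample sizes, and the helper names (solve, fit_polynomial, sample) are all made up. Polynomials are fit by least squares through the normal equations, which is adequate for the low degrees used here.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(data, degree):
    """Least-squares polynomial coefficients (lowest power first)."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x, _ in data) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in data) for i in range(k)]
    return solve(a, b)


def mse(coef, data):
    """Mean squared error of the polynomial with coefficients coef on data."""
    def f(x):
        return sum(c * x ** i for i, c in enumerate(coef))
    return sum((y - f(x)) ** 2 for x, y in data) / len(data)


def true_f(x):
    # Hypothetical true function, chosen only for this illustration.
    return x ** 3 - x


rng = random.Random(0)


def sample(n):
    """Draw n noisy observations Y = f(X) + eps with X uniform on [-1, 1]."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0, 0.2)) for x in xs]


train, test = sample(30), sample(1000)

results = {}
for degree in (1, 2, 3, 5):
    coef = fit_polynomial(train, degree)
    results[degree] = (mse(coef, train), mse(coef, test))
    print(degree, results[degree])
```

Training error can only go down as the degree grows, because each higher-degree fit contains the lower-degree ones as special cases. Test error carries no such guarantee, which is why it is the quantity to watch.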
Bias-Variance Decomposition
Y = f(X) + ε
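Mean Squared Error can be decomposed as:
MSE = E[(Y - \hat{f}(X))^2] = \underbrace{Var(\hat{f}(X))}_{Variance} + \underbrace{(E[\hat{f}(X)] - f(X))^2}_{Bias^2} + Var(\varepsilon)
Here \hat{f} is the learned function, f is the true function, and the expectation is over training sets and noise at a fixed X.
- Bias: How well would the method work with infinite data?
- Variance: How much does the output change with different data sets?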
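The variance and bias terms can be estimated by simulation: draw many training sets, refit each time, and look at how the prediction at one point moves. The sketch below is an assumption-laden illustration, not the method from the slides. It uses a nearest-neighbor average as the learner, a made-up true function, and arbitrary sample sizes and noise level.

```python
import random
import statistics


def true_f(x):
    # Hypothetical true function, chosen only for this illustration.
    return x * x


def knn_regress(train, x0, k):
    """Predict at x0 by averaging the targets of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def bias_variance(k, x0=0.5, n=30, noise_sd=0.3, repeats=500, seed=0):
    """Estimate bias, variance, and reducible error of the prediction at x0."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(repeats):
        # A fresh training set each time: this is the randomness
        # that the variance term measures.
        train = []
        for _ in range(n):
            x = rng.uniform(0, 1)
            train.append((x, true_f(x) + rng.gauss(0, noise_sd)))
        predictions.append(knn_regress(train, x0, k))
    bias = statistics.fmean(predictions) - true_f(x0)
    variance = statistics.pvariance(predictions)
    error = statistics.fmean([(p - true_f(x0)) ** 2 for p in predictions])
    return bias, variance, error


for k in (1, 20):
    bias, variance, error = bias_variance(k)
    # error equals bias**2 + variance up to rounding.
    print(k, bias ** 2, variance, error)
```

The printed error is the reducible part only. The expected error against a fresh noisy observation Y adds Var(ε), which is noise_sd squared in this sketch.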
Bias-Variance Trade-off
[Figure: three panels plotting MSE, Bias, and Var against Flexibility]
Types of Function f
Regression: continuous target
f : X → R
[Figure: Income plotted against Years of Education and Seniority]
Classification: discrete target
f : X → {1, 2, 3, ..., k}
[Figure: scatter plot of observations over features X1 and X2]
Regression or Classification?
Application                                   Regression   Classification
Identify risk of getting a disease
Predict effectiveness of a treatment
Recognize hand-written text
Speech recognition
Predict probability of an employee leaving
Error Rate In Classification
- Learned function f
- Test data: (x_1, y_1), (x_2, y_2), ...
  \frac{1}{n} \sum_{i=1}^{n} I(y_i \neq f(x_i))
- Bayes classifier: assign each observation to the most likely class
  f(x) = \arg\max_j Pr[Y = j | X = x]
- Bayes classifier would require knowing the true conditional distribution of Y given X
- Its error rate is a lower bound on the error of any classifier
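A small sketch of both ideas on this slide: the error rate as the fraction of misclassified test points, and the Bayes classifier as an argmax over conditional probabilities. The conditional distribution and the test set below are made up. In practice the distribution is unknown, which is exactly why the Bayes classifier cannot be used directly.

```python
def error_rate(f, test_data):
    """Fraction of test points whose predicted class differs from the true class."""
    return sum(1 for x, y in test_data if f(x) != y) / len(test_data)


# Hypothetical, fully known Pr[Y = j | X = x] for a feature x that
# takes only two values. Real problems do not come with this table.
conditional = {
    "low": {1: 0.8, 2: 0.2},
    "high": {1: 0.3, 2: 0.7},
}


def bayes_classifier(x):
    """Return the class j with the largest Pr[Y = j | X = x]."""
    probs = conditional[x]
    return max(probs, key=probs.get)


# Made-up test pairs (x_i, y_i).
test_data = [("low", 1), ("low", 2), ("high", 2), ("high", 2), ("high", 1)]

print(bayes_classifier("low"), bayes_classifier("high"))  # 1 2
print(error_rate(bayes_classifier, test_data))  # 2 errors out of 5 = 0.4
```

Even the Bayes classifier makes errors here, because the classes overlap: for x = "low", class 2 still occurs with probability 0.2.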
KNN: K-Nearest Neighbors
- Estimating the Bayes classifier directly from data only gives predictions for values x that appear in the training set
- Idea: Use similar training points when making predictions
[Figure: scatter plot of training observations]
- Non-parametric method (unlike linear regression)
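The idea can be sketched in a few lines: find the k training points closest to the query and take a majority vote over their classes. The training set, class names, and the use of Euclidean distance are assumptions made for this illustration.

```python
import math
from collections import Counter


def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with two classes.
train = [
    ((0.0, 0.0), "blue"),
    ((0.2, 0.1), "blue"),
    ((0.1, 0.3), "blue"),
    ((1.0, 1.0), "orange"),
    ((0.9, 1.2), "orange"),
    ((1.1, 0.8), "orange"),
]

print(knn_predict(train, (0.1, 0.1), k=3))  # blue
print(knn_predict(train, (1.0, 0.9), k=3))  # orange
```

There are no coefficients to fit: the training set itself is the model, which is what makes the method non-parametric.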
KNN: Choosing k
[Figure: two scatter-plot panels titled "KNN: K=1" and "KNN: K=100"]
KNN: Training and Test Errors
[Figure: Error Rate against 1/K (log scale from 0.01 to 1.00), with curves for Training Errors and Test Errors]
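A comparison of this kind can be sketched on synthetic data. Everything below is an assumption for illustration: the two Gaussian classes, the sample sizes, and the values of K are made up and are not the data behind the figure.

```python
import math
import random
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def error_rate(train, data, k):
    """Fraction of points in data that KNN trained on train misclassifies."""
    wrong = sum(1 for x, y in data if knn_predict(train, x, k) != y)
    return wrong / len(data)


def sample(rng, n):
    """Two overlapping Gaussian classes in the plane (synthetic data)."""
    data = []
    for _ in range(n):
        label = rng.choice([0, 1])
        center = (0.0, 0.0) if label == 0 else (1.5, 1.5)
        x = (rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
        data.append((x, label))
    return data


rng = random.Random(0)
train, test = sample(rng, 100), sample(rng, 200)

for k in (1, 5, 25, 100):
    print(k, error_rate(train, train, k), error_rate(train, test, k))
```

With K = 1 the training error is zero, since every training point is its own nearest neighbor, so training error alone would always favor the most flexible setting. The test error is the one that shows which K generalizes.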
R Language
- Download and install R: http://cran.r-project.org
- Try using RStudio as an R IDE
- Read the R lab: ISL 2.3
- Use Piazza for questions