Machine Learning using Matlab
Lecture 8: Advice on ML application
Presentation schedule
Time slot
10:00 - 10:20 Presentation 1
10:25 - 10:45 Presentation 2
10:50 - 11:10 Presentation 3
11:15 - 11:35 Presentation 4
● 20 minutes per group (15-minute talk, 5 minutes for questions)
● Each member should speak for at least 3 minutes
Outline
● Evaluating your machine learning model
● Bias vs. variance
○ Model parameter, e.g., degree of polynomial in linear regression
○ Regularization parameter, e.g., C in SVM
○ Number of training examples
● Handling skewed/unbalanced classes
Debugging a learning model
Suppose you have implemented regularized linear regression to predict housing prices. However, when you test your hypothesis on a new set of houses, you find that it makes unacceptably large errors in its predictions. What should you try next?
● Get more training examples
● Try a smaller set of features
● Try getting additional features
● Try adding polynomial features
● Try decreasing lambda
● Try increasing lambda
Evaluate your model
To evaluate the performance of your ML model, you should:
● Divide your dataset into a training set (70%) and a test set (30%)
● Learn a hypothesis h_θ(x) from the training data
● Predict results on the test set and measure the performance of your model
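The course uses MATLAB, but as a minimal illustration, here is a NumPy sketch of the 70/30 split (variable names are my own; the data matches the housing table in the next slide):

```python
import numpy as np

# Housing data from the lecture's example: size (sq ft) -> price.
X = np.array([2104, 1600, 2400, 1416, 3000, 1985, 1534, 1427, 1380, 1494], dtype=float)
y = np.array([400, 330, 369, 232, 540, 300, 315, 199, 212, 243], dtype=float)

rng = np.random.default_rng(0)      # fixed seed for reproducibility
idx = rng.permutation(len(X))       # randomly shuffle the examples
n_train = int(0.7 * len(X))         # 70% of the data for training

train_idx, test_idx = idx[:n_train], idx[n_train:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(len(X_train), len(X_test))    # 7 3
```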
Example - linear regression

Size Price
2104 400
1600 330
2400 369
1416 232
3000 540
1985 300
1534 315
1427 199
1380 212
1494 243

Randomly shuffle, then split:

Training set (70%):
Size Price
2104 400
2400 369
1416 232
3000 540
1534 315
1427 199
1380 212

Test set (30%):
Size Price
1600 330
1985 300
1494 243
Example - linear regression

Training set:
Size Price
2104 400
2400 369
1416 232
3000 540
1534 315
1427 199
1380 212

Minimize the following cost function using the training set:

J(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)})² + (λ/2m) Σ_{j=1}^{n} θ_j²

to obtain the optimal parameters θ.
Example - linear regression

Test set:
Size Price
1600 330
1985 300
1494 243

Measure performance with the mean squared error on the test set:

MSE_test = (1/m_test) Σ_{i=1}^{m_test} (h_θ(x_test^{(i)}) − y_test^{(i)})²
Question: how do we evaluate the performance of a logistic regression model?
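As a sketch (in NumPy rather than MATLAB), the steps above look like the following; the logistic-regression part at the end illustrates one common answer to the question, namely reporting the misclassification error instead of MSE (the labels there are made up for illustration):

```python
import numpy as np

# Training and test sets from the lecture's housing example.
X_train = np.array([2104, 2400, 1416, 3000, 1534, 1427, 1380], dtype=float)
y_train = np.array([400, 369, 232, 540, 315, 199, 212], dtype=float)
X_test = np.array([1600, 1985, 1494], dtype=float)
y_test = np.array([330, 300, 243], dtype=float)

# Fit h(x) = theta0 + theta1 * x by least squares on the TRAINING set only.
A = np.column_stack([np.ones_like(X_train), X_train])
theta, *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Mean squared error on the test set.
pred = theta[0] + theta[1] * X_test
mse = np.mean((pred - y_test) ** 2)
print(f"test MSE = {mse:.1f}")

# For a classifier such as logistic regression, MSE is not the natural
# metric: report the misclassification error (or accuracy) instead.
y_true = np.array([1, 0, 1, 1, 0])   # hypothetical test labels
y_pred = np.array([1, 0, 0, 1, 0])   # hypothetical predictions
print("classification error =", np.mean(y_pred != y_true))  # 0.2
```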
Parameter selection
Take linear regression as an example: you may need to choose the degree of polynomial d, i.e.,

h_θ(x) = θ_0 + θ_1 x + θ_2 x² + … + θ_d x^d
● You tried d from 1 to 10, and you find that d = 3 has the lowest mean squared error on the test data. So you claim d = 3 is the optimal parameter of your model. Anything wrong?
If you apply your model to other data, the performance may decrease, because the parameter was fit to the test data. In other words, you don't know how well your model generalizes to unseen examples.
Parameter selection (cont.)
To select the optimal parameters, there are two options:
● K-fold Cross Validation (CV) when you have a small dataset
● Divide your data into three parts (training, validation, and test) when you have a big dataset
K-fold Cross Validation
● Divide your training set into K parts
● In each iteration, pick (K−1) parts for training and the remaining part for testing, and measure the performance
● Average the performance over the K iterations
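The steps above can be sketched as follows (a NumPy illustration rather than the course's MATLAB; the linear fit and squared-error metric are assumptions for the demo):

```python
import numpy as np

def kfold_score(X, y, K, fit, error):
    """Average error over K folds: each fold is held out exactly once."""
    idx = np.random.default_rng(0).permutation(len(X))
    folds = np.array_split(idx, K)
    scores = []
    for k in range(K):
        val = folds[k]                                    # held-out part
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        model = fit(X[train], y[train])                   # train on K-1 parts
        scores.append(error(model, X[val], y[val]))       # test on the rest
    return float(np.mean(scores))                         # average over K runs

# Demo: noisy line, degree-1 polynomial fit, mean squared error.
X = np.linspace(0, 10, 20)
y = 3 * X + 1 + np.random.default_rng(1).normal(0, 0.1, 20)
fit = lambda X, y: np.polyfit(X, y, 1)
err = lambda m, X, y: np.mean((np.polyval(m, X) - y) ** 2)
cv_err = kfold_score(X, y, K=5, fit=fit, error=err)
print(f"5-fold CV error = {cv_err:.4f}")
```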
Parameter selection with K-fold CV
● Procedure:
○ For each parameter value, e.g., degree of polynomial, compute the average performance using K-fold CV
○ Pick the parameter that gives the best average performance
● Pros and cons:
○ Less bias
○ Computationally intensive (train K × d times)
Parameter selection with big data
● Procedure:
○ Divide your dataset into three parts: training set (60%), validation set (20%), and test set (20%)
○ Train your model on the training set and measure the performance on the validation set for each parameter value; choose the optimal parameter, i.e., the one with the best validation performance
○ Measure the performance of your model on the test set with the optimal parameter
● Pros and cons:
○ Lower computational cost (train d times)
○ More bias
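A minimal sketch of this hold-out procedure in NumPy (the cubic toy data and candidate degrees are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 100))
y = X**3 - 2 * X + rng.normal(0, 1, 100)     # cubic ground truth + noise

# 60/20/20 split into training, validation, and test indices.
idx = rng.permutation(100)
tr, va, te = idx[:60], idx[60:80], idx[80:]

def mse(d, train_idx, eval_idx):
    """Fit a degree-d polynomial on train_idx, score it on eval_idx."""
    coef = np.polyfit(X[train_idx], y[train_idx], d)
    pred = np.polyval(coef, X[eval_idx])
    return float(np.mean((pred - y[eval_idx]) ** 2))

# Choose d on the VALIDATION set (train once per candidate, d times)...
degrees = range(1, 7)
best_d = min(degrees, key=lambda d: mse(d, tr, va))
# ...then report the final performance once, on the untouched TEST set.
print("best degree:", best_d, " test MSE:", mse(best_d, tr, te))
```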
Example - bias vs. variance on regression

[Figure: three fits of price vs. size. Left: underfitting, high bias; middle: just right; right: overfitting, high variance]
Bias vs. variance on degree of polynomial
Using the mean squared error defined before, we have the training error and validation error:

J_train(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)})²
J_val(θ) = (1/2m_val) Σ_{i=1}^{m_val} (h_θ(x_val^{(i)}) − y_val^{(i)})²

Q: if we change the degree of polynomial d, what will the training error and validation error look like?
Diagnosing bias vs. variance on degree of polynomial
Suppose your machine learning model is performing worse than you had hoped. Is it a bias problem or a variance problem?
● Bias (underfitting): both training error and validation error are high
● Variance (overfitting): validation error ≫ training error
[Figure: training error and validation error as functions of the degree of polynomial]
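These curves can be reproduced numerically; here is a NumPy sketch (the cubic toy data and degree range are assumptions) in which the training error keeps shrinking as d grows while the validation error eventually worsens:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 40)
y = x**3 - x + rng.normal(0, 0.5, 40)       # cubic target + noise
x_tr, y_tr = x[::2], y[::2]                 # even indices -> training
x_va, y_va = x[1::2], y[1::2]               # odd indices  -> validation

train_errs, val_errs = [], []
for d in range(1, 9):
    coef = np.polyfit(x_tr, y_tr, d)        # fit degree-d polynomial
    train_errs.append(float(np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)))
    val_errs.append(float(np.mean((np.polyval(coef, x_va) - y_va) ** 2)))
    print(f"d={d}  train={train_errs[-1]:.3f}  val={val_errs[-1]:.3f}")
```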
Bias vs. variance on regularization
Let's fix the degree of polynomial at d = 4. What will the hypothesis look like with different values of lambda?

[Figure: three fits of price vs. size for small, intermediate, and large lambda]
Diagnosing bias vs. variance on regularization
If we change the value of the regularization parameter λ, what will the training error and validation error look like?

[Figure: training error and validation error vs. lambda, with a "just right" value of lambda between the high-variance (small λ) and high-bias (large λ) regimes]
Q: now you try to tune both the degree of polynomial d and the regularization parameter λ. What should you do?
Grid search
● Pick a set of candidate values for parameter A
● Pick a set of candidate values for parameter B
● For each pair of values of A and B, evaluate the validation error, either by K-fold CV on the training set or by testing on the validation set
● Pick the pair that gives the minimum validation error
Grid search - regularized linear regression

        λ = 0.05   λ = 2    λ = 10
d = 2   0.22       0.10     0.34
d = 4   0.32       0.05     0.21
d = 6   0.52       0.12     0.43

Optimal parameters: (d, λ) = (4, 2)
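Picking the winning pair from such a grid is just an argmin over the table of validation errors; a NumPy sketch using the numbers from this slide:

```python
import numpy as np

# Validation errors from the lecture's grid
# (rows: d = 2, 4, 6; columns: lambda = 0.05, 2, 10).
degrees = [2, 4, 6]
lambdas = [0.05, 2, 10]
val_err = np.array([[0.22, 0.10, 0.34],
                    [0.32, 0.05, 0.21],
                    [0.52, 0.12, 0.43]])

# Locate the cell with the smallest validation error.
i, j = np.unravel_index(np.argmin(val_err), val_err.shape)
print("optimal (d, lambda) =", (degrees[i], lambdas[j]))   # (4, 2)
```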
Bias vs. variance on size of data
If a learning algorithm is suffering from high bias, what will the training error and validation error look like when increasing the number of training examples?

[Figure: learning curves for a high-bias model - training error and validation error vs. number of training examples, both converging to a high error]

Increasing the number of training examples will not help much if the model has high bias.
Bias vs. variance on size of training examples
If a learning algorithm is suffering from high variance, what will the training error and validation error look like when increasing the number of training examples?

[Figure: learning curves for a high-variance model - training error and validation error vs. number of training examples, with a gap between them that narrows as data grows]

Increasing the number of training examples is likely to help if the model has high variance.
Debugging a learning model
Suppose you have implemented regularized linear regression to predict housing prices. However, when you test your hypothesis on a new set of houses, you find that it makes unacceptably large errors in its predictions. What should you try next?
● Get more training examples ➡ fixes high variance
● Try a smaller set of features ➡ fixes high variance
● Try getting additional features ➡ fixes high bias
● Try adding polynomial features ➡ fixes high bias
● Try decreasing lambda ➡ fixes high bias
● Try increasing lambda ➡ fixes high variance
Is your error metric fair?
Suppose you have trained a logistic regression model to predict cancer. In your test set, only 0.5% of patients have cancer (skewed classes). You get 1% error on the test set. Is your model a good classifier?

Positive example (1) - patient has cancer
Negative example (0) - patient does not have cancer

function y = predictCancer(x)
  y = 0;
end

This trivial classifier achieves 0.5% error without doing anything!
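To see this concretely, here is a NumPy simulation of the same trick (the simulated test set of 1000 patients is an assumption; the slide's predictCancer is MATLAB):

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated skewed test labels: roughly 0.5% of patients have cancer.
y_test = (rng.random(1000) < 0.005).astype(int)

y_pred = np.zeros_like(y_test)        # "predict no cancer" for everyone
error = float(np.mean(y_pred != y_test))
print(f"error = {error:.1%}")         # ~0.5%: looks great, detects nobody
```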
Precision/Recall

                          Predicted condition
                          Positive              Negative
True       Positive       True Positive (TP)    False Negative (FN)
condition  Negative       False Positive (FP)   True Negative (TN)

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

Precision: of all patients we predicted to have cancer, what fraction actually have cancer?
Recall: of all patients that actually have cancer, what fraction did we correctly detect as having cancer?
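A small NumPy sketch of both metrics (the toy labels are assumptions for the demo):

```python
import numpy as np

def precision_recall(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)   # of predicted positives, how many are real
    recall = tp / (tp + fn)      # of real positives, how many were found
    return float(precision), float(recall)

# Toy example: 4 real positives, classifier finds 2 of them plus 1 false alarm.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])
p, r = precision_recall(y_true, y_pred)
print(p, r)   # 0.666... 0.5
```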
Tradeoff between precision and recall
● Logistic regression:
○ Predict 1 if h_θ(x) ≥ threshold
○ Predict 0 if h_θ(x) < threshold
● Suppose we want to predict cancer (y = 1) only if very confident
○ Higher precision, lower recall (large threshold)
● Suppose we want to avoid missing too many cases of cancer (avoid false negatives)
○ Higher recall, lower precision (small threshold)
● Generate the precision-recall curve by tuning the threshold
[Figure: precision-recall curve, precision on the y-axis vs. recall on the x-axis; large thresholds sit at the high-precision end, small thresholds at the high-recall end]
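Sweeping the threshold over a set of predicted probabilities shows the tradeoff directly; in this NumPy sketch (scores and labels are made up for illustration), precision rises and recall falls as the threshold grows:

```python
import numpy as np

# Hypothetical predicted probabilities from a classifier, with true labels.
scores = np.array([0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05])
labels = np.array([1,    1,   1,   0,   1,   0,   0,   1,   0,   0])

results = {}
for t in (0.25, 0.5, 0.75):
    pred = (scores >= t).astype(int)                 # threshold the scores
    tp = np.sum((pred == 1) & (labels == 1))
    precision = tp / max(np.sum(pred == 1), 1)
    recall = tp / np.sum(labels == 1)
    results[t] = (float(precision), float(recall))
    print(f"threshold={t:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```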
F1-measure
Suppose you have the precision and recall of three learning algorithms. Which one is better?

              Precision   Recall
Algorithm 1   0.6         0.3
Algorithm 2   0.2         0.9
Algorithm 3   0.9         0.1

F1 = 2 · Precision · Recall / (Precision + Recall)

Algorithm 1 has the highest F1-measure.
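The claim is easy to check by computing F1 for the three rows of the table:

```python
def f1(precision, recall):
    # Harmonic mean of P and R: punishes an extreme imbalance between them.
    return 2 * precision * recall / (precision + recall)

algorithms = {"Algorithm 1": (0.6, 0.3),
              "Algorithm 2": (0.2, 0.9),
              "Algorithm 3": (0.9, 0.1)}
f1s = {name: f1(p, r) for name, (p, r) in algorithms.items()}
for name, score in f1s.items():
    print(f"{name}: F1 = {score:.3f}")
# Algorithm 1 wins: 0.400 vs roughly 0.327 and 0.180
```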
Summary
The procedure of a machine learning project:
1. Collect data and divide it into training, validation, and test sets
2. Choose the machine learning model you would like to use
3. Select the optimal parameters by means of the training and validation sets
4. With the optimal parameters, predict results on the test set
5. Measure and analyze your results; improve your model if possible
6. Write your project report