Machine Learning using Matlab
Lecture 8: Advice on ML application
Presentation schedule
Time slot
10:00 - 10:20 Presentation 1
10:25 - 10:45 Presentation 2
10:50 - 11:10 Presentation 3
11:15 - 11:35 Presentation 4
● 20 minutes per group (15-minute talk, 5 minutes for questions)
● Each member should speak for at least 3 minutes
Outline
● Evaluating your machine learning model
● Bias vs. variance
○ Model parameter, e.g., degree of polynomial in linear regression
○ Regularization parameter, e.g., C in SVM
○ Number of training examples
● Handling skewed/unbalanced classes
Debugging a learning model
Suppose you have implemented regularized linear regression to predict housing prices. However, when you test your hypothesis on a new set of houses, you find that it makes unacceptably large errors in its predictions. What should you try next?
● Get more training examples
● Try a smaller set of features
● Try getting additional features
● Try adding polynomial features
● Try decreasing lambda
● Try increasing lambda
Evaluate your model
To evaluate the performance of your ML model, you should:
● Divide your dataset into a training set (70%) and a test set (30%)
● Learn a hypothesis h_θ(x) from the training data
● Predict results on the test set and measure the performance of your model
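The course uses MATLAB, but as a minimal illustration, here is a NumPy sketch of the 70/30 split (variable names are my own; the data matches the housing table in the next slide):

```python
import numpy as np

# Housing data from the lecture's example: size (sq ft) -> price.
X = np.array([2104, 1600, 2400, 1416, 3000, 1985, 1534, 1427, 1380, 1494], dtype=float)
y = np.array([400, 330, 369, 232, 540, 300, 315, 199, 212, 243], dtype=float)

rng = np.random.default_rng(0)      # fixed seed for reproducibility
idx = rng.permutation(len(X))       # randomly shuffle the examples
n_train = int(0.7 * len(X))         # 70% of the data for training

train_idx, test_idx = idx[:n_train], idx[n_train:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(len(X_train), len(X_test))    # 7 3
```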
Example - linear regression

Size Price
2104 400
1600 330
2400 369
1416 232
3000 540
1985 300
1534 315
1427 199
1380 212
1494 243

Randomly shuffle, then split:

Training set (70%):
Size Price
2104 400
2400 369
1416 232
3000 540
1534 315
1427 199
1380 212

Test set (30%):
Size Price
1600 330
1985 300
1494 243
Example - linear regression

Training set:
Size Price
2104 400
2400 369
1416 232
3000 540
1534 315
1427 199
1380 212

Minimize the following cost function using the training set:

J(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)})² + (λ/2m) Σ_{j=1}^{n} θ_j²

to obtain the optimal parameters θ.
Example - linear regression

Test set:
Size Price
1600 330
1985 300
1494 243

Measure performance with the mean squared error on the test set:

MSE_test = (1/m_test) Σ_{i=1}^{m_test} (h_θ(x_test^{(i)}) − y_test^{(i)})²
Question: how do we evaluate the performance of a logistic regression model?
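As a sketch (in NumPy rather than MATLAB), the steps above look like the following; the logistic-regression part at the end illustrates one common answer to the question, namely reporting the misclassification error instead of MSE (the labels there are made up for illustration):

```python
import numpy as np

# Training and test sets from the lecture's housing example.
X_train = np.array([2104, 2400, 1416, 3000, 1534, 1427, 1380], dtype=float)
y_train = np.array([400, 369, 232, 540, 315, 199, 212], dtype=float)
X_test = np.array([1600, 1985, 1494], dtype=float)
y_test = np.array([330, 300, 243], dtype=float)

# Fit h(x) = theta0 + theta1 * x by least squares on the TRAINING set only.
A = np.column_stack([np.ones_like(X_train), X_train])
theta, *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Mean squared error on the test set.
pred = theta[0] + theta[1] * X_test
mse = np.mean((pred - y_test) ** 2)
print(f"test MSE = {mse:.1f}")

# For a classifier such as logistic regression, MSE is not the natural
# metric: report the misclassification error (or accuracy) instead.
y_true = np.array([1, 0, 1, 1, 0])   # hypothetical test labels
y_pred = np.array([1, 0, 0, 1, 0])   # hypothetical predictions
print("classification error =", np.mean(y_pred != y_true))  # 0.2
```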
Parameter selection
Take linear regression as an example: you may need to choose the degree of polynomial d, i.e.,

h_θ(x) = θ_0 + θ_1 x + θ_2 x² + … + θ_d x^d
● You tried d from 1 to 10, and you find that d = 3 has the lowest mean squared error on the test data. So you claim d = 3 is the optimal parameter of your model. Anything wrong?
If you apply your model to other data, the performance may decrease, because the parameter was fit to the test data. In other words, you don't know how well your model generalizes to unseen examples.
Parameter selection (cont.)
To select the optimal parameters, there are two options:
● K-fold Cross Validation (CV) when you have a small dataset
● Divide your data into three parts (training, validation, and test) when you have a big dataset
K-fold Cross Validation
● Divide your training set into K parts
● In each iteration, pick (K−1) parts for training and the remaining part for testing, and measure the performance
● Average the performance over the K iterations
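The steps above can be sketched as follows (a NumPy illustration rather than the course's MATLAB; the linear fit and squared-error metric are assumptions for the demo):

```python
import numpy as np

def kfold_score(X, y, K, fit, error):
    """Average error over K folds: each fold is held out exactly once."""
    idx = np.random.default_rng(0).permutation(len(X))
    folds = np.array_split(idx, K)
    scores = []
    for k in range(K):
        val = folds[k]                                    # held-out part
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        model = fit(X[train], y[train])                   # train on K-1 parts
        scores.append(error(model, X[val], y[val]))       # test on the rest
    return float(np.mean(scores))                         # average over K runs

# Demo: noisy line, degree-1 polynomial fit, mean squared error.
X = np.linspace(0, 10, 20)
y = 3 * X + 1 + np.random.default_rng(1).normal(0, 0.1, 20)
fit = lambda X, y: np.polyfit(X, y, 1)
err = lambda m, X, y: np.mean((np.polyval(m, X) - y) ** 2)
cv_err = kfold_score(X, y, K=5, fit=fit, error=err)
print(f"5-fold CV error = {cv_err:.4f}")
```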
Parameter selection with K-fold CV
● Procedure:
○ For each parameter value, e.g., degree of polynomial, compute the average performance using K-fold CV
○ Pick the parameter that gives the best average performance
● Pros and cons:
○ Less bias
○ Computationally intensive (train K × d times)
Parameter selection with big data
● Procedure:
○ Divide your dataset into three parts: training set (60%), validation set (20%), and test set (20%)
○ Train your model on the training set and measure the performance on the validation set for each parameter value; choose the optimal parameter, i.e., the one with the best validation performance
○ Measure the performance of your model on the test set with the optimal parameter
● Pros and cons:
○ Lower computational cost (train d times)
○ More bias
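A minimal sketch of this hold-out procedure in NumPy (the cubic toy data and candidate degrees are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 100))
y = X**3 - 2 * X + rng.normal(0, 1, 100)     # cubic ground truth + noise

# 60/20/20 split into training, validation, and test indices.
idx = rng.permutation(100)
tr, va, te = idx[:60], idx[60:80], idx[80:]

def mse(d, train_idx, eval_idx):
    """Fit a degree-d polynomial on train_idx, score it on eval_idx."""
    coef = np.polyfit(X[train_idx], y[train_idx], d)
    pred = np.polyval(coef, X[eval_idx])
    return float(np.mean((pred - y[eval_idx]) ** 2))

# Choose d on the VALIDATION set (train once per candidate, d times)...
degrees = range(1, 7)
best_d = min(degrees, key=lambda d: mse(d, tr, va))
# ...then report the final performance once, on the untouched TEST set.
print("best degree:", best_d, " test MSE:", mse(best_d, tr, te))
```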
Example - bias vs. variance on regression

[Figure: three fits of price vs. size. Left: underfitting, high bias; middle: just right; right: overfitting, high variance]
Bias vs. variance on degree of polynomial
Using the mean squared error defined before, we have the training error and validation error:

J_train(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^{(i)}) − y^{(i)})²
J_val(θ) = (1/2m_val) Σ_{i=1}^{m_val} (h_θ(x_val^{(i)}) − y_val^{(i)})²

Q: if we change the degree of polynomial d, what will the training error and validation error look like?
Diagnosing bias vs. variance on degree of polynomial
Suppose your machine learning model is performing worse than you had hoped. Is it a bias problem or a variance problem?
● Bias (underfitting): both training error and validation error are high
● Variance (overfitting): validation error ≫ training error
[Figure: training error and validation error as functions of the degree of polynomial]
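These curves can be reproduced numerically; here is a NumPy sketch (the cubic toy data and degree range are assumptions) in which the training error keeps shrinking as d grows while the validation error eventually worsens:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 40)
y = x**3 - x + rng.normal(0, 0.5, 40)       # cubic target + noise
x_tr, y_tr = x[::2], y[::2]                 # even indices -> training
x_va, y_va = x[1::2], y[1::2]               # odd indices  -> validation

train_errs, val_errs = [], []
for d in range(1, 9):
    coef = np.polyfit(x_tr, y_tr, d)        # fit degree-d polynomial
    train_errs.append(float(np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)))
    val_errs.append(float(np.mean((np.polyval(coef, x_va) - y_va) ** 2)))
    print(f"d={d}  train={train_errs[-1]:.3f}  val={val_errs[-1]:.3f}")
```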
Bias vs. variance on regularization
Let's fix the degree of polynomial at d = 4. What will the hypothesis look like with different values of lambda?

[Figure: three fits of price vs. size for small, intermediate, and large lambda]
Diagnosing bias vs. variance on regularization
If we change the value of the regularization parameter λ, what will the training error and validation error look like?

[Figure: training error and validation error vs. lambda, with a "just right" value of lambda between the high-variance (small λ) and high-bias (large λ) regimes]
Q: now you try to tune both the degree of polynomial d and the regularization parameter λ. What should you do?
Grid search
● Pick a set of candidate values for parameter A
● Pick a set of candidate values for parameter B
● For each pair of values of A and B, evaluate the validation error, either by K-fold CV on the training set or by testing on the validation set
● Pick the pair that gives the minimum validation error
Grid search - regularized linear regression

        λ = 0.05   λ = 2    λ = 10
d = 2   0.22       0.10     0.34
d = 4   0.32       0.05     0.21
d = 6   0.52       0.12     0.43

Optimal parameters: (d, λ) = (4, 2)
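Picking the winning pair from such a grid is just an argmin over the table of validation errors; a NumPy sketch using the numbers from this slide:

```python
import numpy as np

# Validation errors from the lecture's grid
# (rows: d = 2, 4, 6; columns: lambda = 0.05, 2, 10).
degrees = [2, 4, 6]
lambdas = [0.05, 2, 10]
val_err = np.array([[0.22, 0.10, 0.34],
                    [0.32, 0.05, 0.21],
                    [0.52, 0.12, 0.43]])

# Locate the cell with the smallest validation error.
i, j = np.unravel_index(np.argmin(val_err), val_err.shape)
print("optimal (d, lambda) =", (degrees[i], lambdas[j]))   # (4, 2)
```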
Bias vs. variance on size of data
If a learning algorithm is suffering from high bias, what will the training error and validation error look like when increasing the number of training examples?

[Figure: learning curves for a high-bias model - training error and validation error vs. number of training examples, both converging to a high error]

Increasing the number of training examples will not help much if the model has high bias.
Bias vs. variance on size of training examples
If a learning algorithm is suffering from high variance, what will the training error and validation error look like when increasing the number of training examples?

[Figure: learning curves for a high-variance model - training error and validation error vs. number of training examples, with a gap between them that narrows as data grows]

Increasing the number of training examples is likely to help if the model has high variance.
Debugging a learning model
Suppose you have implemented regularized linear regression to predict housing prices. However, when you test your hypothesis on a new set of houses, you find that it makes unacceptably large errors in its predictions. What should you try next?
● Get more training examples ➡ fixes high variance
● Try a smaller set of features ➡ fixes high variance
● Try getting additional features ➡ fixes high bias
● Try adding polynomial features ➡ fixes high bias
● Try decreasing lambda ➡ fixes high bias
● Try increasing lambda ➡ fixes high variance
Is your error metric fair?
Suppose you have trained a logistic regression model to predict cancer. In your test set, only 0.5% of patients have cancer (skewed classes). You get 1% error on the test set. Is your model a good classifier?

Positive example (1) - patient has cancer
Negative example (0) - patient does not have cancer

function y = predictCancer(x)
  y = 0;
end

This trivial classifier achieves 0.5% error without doing anything!
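To see this concretely, here is a NumPy simulation of the same trick (the simulated test set of 1000 patients is an assumption; the slide's predictCancer is MATLAB):

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated skewed test labels: roughly 0.5% of patients have cancer.
y_test = (rng.random(1000) < 0.005).astype(int)

y_pred = np.zeros_like(y_test)        # "predict no cancer" for everyone
error = float(np.mean(y_pred != y_test))
print(f"error = {error:.1%}")         # ~0.5%: looks great, detects nobody
```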
Precision/Recall

                          Predicted condition
                          Positive              Negative
True       Positive       True Positive (TP)    False Negative (FN)
condition  Negative       False Positive (FP)   True Negative (TN)

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

Precision: of all patients we predicted to have cancer, what fraction actually have cancer?
Recall: of all patients that actually have cancer, what fraction did we correctly detect as having cancer?
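A small NumPy sketch of both metrics (the toy labels are assumptions for the demo):

```python
import numpy as np

def precision_recall(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp)   # of predicted positives, how many are real
    recall = tp / (tp + fn)      # of real positives, how many were found
    return float(precision), float(recall)

# Toy example: 4 real positives, classifier finds 2 of them plus 1 false alarm.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])
p, r = precision_recall(y_true, y_pred)
print(p, r)   # 0.666... 0.5
```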
Tradeoff between precision and recall
● Logistic regression:
○ Predict 1 if h_θ(x) ≥ threshold
○ Predict 0 if h_θ(x) < threshold
● Suppose we want to predict cancer (y = 1) only if very confident
○ Higher precision, lower recall (large threshold)
● Suppose we want to avoid missing too many cases of cancer (avoid false negatives)
○ Higher recall, lower precision (small threshold)
● Generate the precision-recall curve by tuning the threshold
[Figure: precision-recall curve, precision on the y-axis vs. recall on the x-axis; large thresholds sit at the high-precision end, small thresholds at the high-recall end]
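Sweeping the threshold over a set of predicted probabilities shows the tradeoff directly; in this NumPy sketch (scores and labels are made up for illustration), precision rises and recall falls as the threshold grows:

```python
import numpy as np

# Hypothetical predicted probabilities from a classifier, with true labels.
scores = np.array([0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05])
labels = np.array([1,    1,   1,   0,   1,   0,   0,   1,   0,   0])

results = {}
for t in (0.25, 0.5, 0.75):
    pred = (scores >= t).astype(int)                 # threshold the scores
    tp = np.sum((pred == 1) & (labels == 1))
    precision = tp / max(np.sum(pred == 1), 1)
    recall = tp / np.sum(labels == 1)
    results[t] = (float(precision), float(recall))
    print(f"threshold={t:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```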
F1-measure
Suppose you have the precision and recall of three learning algorithms. Which one is better?

              Precision   Recall
Algorithm 1   0.6         0.3
Algorithm 2   0.2         0.9
Algorithm 3   0.9         0.1

F1 = 2 · Precision · Recall / (Precision + Recall)

Algorithm 1 has the highest F1-measure.
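The claim is easy to check by computing F1 for the three rows of the table:

```python
def f1(precision, recall):
    # Harmonic mean of P and R: punishes an extreme imbalance between them.
    return 2 * precision * recall / (precision + recall)

algorithms = {"Algorithm 1": (0.6, 0.3),
              "Algorithm 2": (0.2, 0.9),
              "Algorithm 3": (0.9, 0.1)}
f1s = {name: f1(p, r) for name, (p, r) in algorithms.items()}
for name, score in f1s.items():
    print(f"{name}: F1 = {score:.3f}")
# Algorithm 1 wins: 0.400 vs roughly 0.327 and 0.180
```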
Summary
The procedure of a machine learning project:
1. Collect data and divide it into training, validation, and test sets
2. Choose the machine learning model you would like to use
3. Select the optimal parameters by means of the training and validation sets
4. With the optimal parameters, predict results on the test set
5. Measure and analyze your results; improve your model if possible
6. Write your project report