
Page 1:

Linear Regression & Gradient Descent


Tufts COMP 135: Introduction to Machine Learning
https://www.cs.tufts.edu/comp/135/2019s/

Many slides attributable to: Erik Sudderth (UCI), Finale Doshi-Velez (Harvard), James, Witten, Hastie, Tibshirani (ISL/ESL books)

Prof. Mike Hughes

Page 2:

LR & GD Unit Objectives

• Exact solutions of least squares
  • 1D case without bias
  • 1D case with bias
  • General case

• Gradient descent for least squares

Mike Hughes - Tufts COMP 135 - Spring 2019

Page 3:

What will we learn?


[Diagram: the three machine-learning paradigms — Supervised Learning (data, label pairs $\{x_n, y_n\}_{n=1}^N$; task, performance measure; training, prediction, evaluation), Unsupervised Learning, and Reinforcement Learning.]

Page 4:


Task: Regression

[Diagram: supervised learning, regression task — input x, output y.]

y is a numeric variable, e.g. sales in $$

Page 5:

Visualizing errors


Page 6:

Regression: Evaluation Metrics

• mean squared error: $\frac{1}{N}\sum_{n=1}^{N} (y_n - \hat{y}_n)^2$

• mean absolute error: $\frac{1}{N}\sum_{n=1}^{N} |y_n - \hat{y}_n|$
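Both metrics are one-liners in NumPy; a minimal sketch with illustrative values:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # observed targets y_n
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # predictions yhat_n

# mean squared error: average of squared residuals
mse = np.mean((y_true - y_pred) ** 2)
# mean absolute error: average of absolute residuals
mae = np.mean(np.abs(y_true - y_pred))

print(mse, mae)   # 0.375 0.5
```

Note that MSE penalizes large residuals much more heavily than MAE does, so the two can rank models differently.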

Page 7:

Linear Regression

Parameters:
  weight vector $w = [w_1, w_2, \ldots, w_f, \ldots, w_F]$
  bias scalar $b$

Prediction:
  $\hat{y}(x_i) \triangleq \sum_{f=1}^{F} w_f x_{if} + b$

Training: find weights and bias that minimize error
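The prediction rule is just a dot product plus a scalar; a minimal sketch (weights and inputs are illustrative):

```python
import numpy as np

def predict(x, w, b):
    """yhat(x) = sum_f w_f * x_f + b for a single feature vector x."""
    return np.dot(w, x) + b

w = np.array([2.0, -1.0])   # illustrative weight vector (F = 2)
b = 0.5                     # illustrative bias
x = np.array([3.0, 1.0])
print(predict(x, w, b))     # 2*3 - 1*1 + 0.5 = 5.5
```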

Page 8:

Sales vs. Ad Budgets


Page 9:

Linear Regression: Training

Optimization problem: “Least Squares”

$\min_{w,b} \sum_{n=1}^{N} \left( y_n - \hat{y}(x_n, w, b) \right)^2$

Page 10:


Linear Regression: Training

Optimization problem: “Least Squares”

$\min_{w,b} \sum_{n=1}^{N} \left( y_n - \hat{y}(x_n, w, b) \right)^2$

An exact formula for the optimal values of w, b exists!

With only one feature (F = 1):

$w = \frac{\sum_{n=1}^{N} (x_n - \bar{x})(y_n - \bar{y})}{\sum_{n=1}^{N} (x_n - \bar{x})^2}, \qquad b = \bar{y} - w\bar{x}$

where $\bar{x} = \mathrm{mean}(x_1, \ldots, x_N)$ and $\bar{y} = \mathrm{mean}(y_1, \ldots, y_N)$.

Where does this come from?
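The 1D formula can be checked on noiseless data, where it should recover the generating line exactly; a sketch assuming NumPy (function and variable names are illustrative):

```python
import numpy as np

def fit_1d_least_squares(x, y):
    """Closed-form least-squares fit for one feature: returns (w, b)."""
    xbar, ybar = x.mean(), y.mean()
    w = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b = ybar - w * xbar
    return w, b

# Noiseless check: data generated from y = 2x + 1 should be recovered exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
w, b = fit_1d_least_squares(x, y)
print(w, b)   # w ≈ 2.0, b ≈ 1.0
```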

Page 11:


Linear Regression: Training

Optimization problem: “Least Squares”

$\min_{w,b} \sum_{n=1}^{N} \left( y_n - \hat{y}(x_n, w, b) \right)^2$

An exact formula for the optimal values of w, b exists!

With many features (F >= 1):

$[w_1 \; \ldots \; w_F \; b]^T = (X^T X)^{-1} X^T y$

$X = \begin{bmatrix} x_{11} & \ldots & x_{1F} & 1 \\ x_{21} & \ldots & x_{2F} & 1 \\ & \ldots & & \\ x_{N1} & \ldots & x_{NF} & 1 \end{bmatrix}$

Where does this come from?
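Appending a column of ones to the feature matrix folds the bias into the normal equations. The sketch below uses `np.linalg.lstsq` instead of forming $(X^T X)^{-1}$ explicitly, which is the numerically safer route (function names and data are illustrative):

```python
import numpy as np

def fit_least_squares(X, y):
    """Solve the normal equations with a trailing all-ones bias column."""
    Xt = np.hstack([X, np.ones((X.shape[0], 1))])   # append column of 1s
    theta, *_ = np.linalg.lstsq(Xt, y, rcond=None)  # avoids explicit matrix inverse
    return theta[:-1], theta[-1]                    # weights, bias

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])   # y = 2x + 1, so the fit should recover w=2, b=1
w, b = fit_least_squares(X, y)
print(w, b)
```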

Page 12:

Derivation Notes

http://www.cs.tufts.edu/comp/135/2019s/notes/day03_linear_regression.pdf


Page 13:

When does the Least Squares estimator exist?

• Fewer examples than features (N < F): infinitely many solutions!

• Same number of examples and features (N = F): optimum exists if X is full rank

• More examples than features (N > F): optimum exists if X is full rank

Page 14:

More compact notation

$\theta = [b \; w_1 \; w_2 \; \ldots \; w_F]$

$x_n = [1 \; x_{n1} \; x_{n2} \; \ldots \; x_{nF}]$

$\hat{y}(x_n, \theta) = \theta^T x_n$

$J(\theta) \triangleq \sum_{n=1}^{N} (y_n - \hat{y}(x_n, \theta))^2$

Page 15:

Idea: Optimize via small steps


Page 16:

Derivatives point uphill


Page 17:


To minimize, go downhill

Step in the opposite direction of the derivative

Page 18:

Steepest descent algorithm

input: initial $\theta \in \mathbb{R}$
input: step size $\alpha \in \mathbb{R}_+$

while not converged:
    $\theta \leftarrow \theta - \alpha \frac{d}{d\theta} J(\theta)$
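Specialized to the least-squares objective $J(\theta)$, the loop might look like the sketch below (step size, iteration count, and data are illustrative):

```python
import numpy as np

def gradient_descent_least_squares(X, y, alpha=0.01, n_iters=5000):
    """Minimize J(theta) = sum_n (y_n - theta^T x_n)^2 by steepest descent."""
    Xt = np.hstack([X, np.ones((X.shape[0], 1))])  # x_n with trailing 1 for the bias
    theta = np.zeros(Xt.shape[1])                  # initial theta
    for _ in range(n_iters):
        residual = Xt @ theta - y
        grad = 2.0 * Xt.T @ residual               # dJ/dtheta
        theta = theta - alpha * grad               # step opposite the gradient
    return theta

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])                 # y = 2x + 1
theta = gradient_descent_least_squares(X, y)
print(theta)                                       # ≈ [2.0, 1.0]
```

For this small, well-conditioned problem the iterates converge to the same solution the exact formula gives.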

Page 19:

Steepest descent algorithm

input: initial $\theta \in \mathbb{R}$
input: step size $\alpha \in \mathbb{R}_+$

while not converged:
    $\theta \leftarrow \theta - \alpha \frac{d}{d\theta} J(\theta)$

Page 20:

How to set step size?


Page 21:

How to set step size?

• Simple and usually effective: pick a small constant, e.g. $\alpha = 0.01$

• Improvement: decay over iterations, e.g. $\alpha_t = \frac{C}{t}$ or $\alpha_t = (C + t)^{-0.9}$

• Improvement: line search for the best value at each step
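The decay schedules written as plain functions of the iteration counter t (C is an illustrative tuning constant):

```python
C = 1.0  # illustrative tuning constant

def alpha_constant(t):
    return 0.01                  # fixed small step

def alpha_inverse(t):
    return C / t                 # alpha_t = C / t

def alpha_power(t):
    return (C + t) ** -0.9       # alpha_t = (C + t)^(-0.9)

# the inverse schedule shrinks quickly: 1.0 at t=1, 0.1 at t=10, 0.01 at t=100
print([round(alpha_inverse(t), 4) for t in (1, 10, 100)])
```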

Page 22:

How to assess convergence?

• Ideal: stop when the derivative equals zero

• Practical heuristics: stop when …

  • the change in loss becomes small: $|J(\theta_t) - J(\theta_{t-1})| < \epsilon$

  • the step is indistinguishable from zero: $\alpha \left| \frac{d}{d\theta} J(\theta) \right| < \epsilon$
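The change-in-loss heuristic drops straight into the descent loop; a sketch with illustrative tolerances and data:

```python
import numpy as np

def gd_until_converged(X, y, alpha=0.01, eps=1e-10, max_iters=100000):
    """Steepest descent that stops when |J(theta_t) - J(theta_{t-1})| < eps."""
    Xt = np.hstack([X, np.ones((X.shape[0], 1))])
    theta = np.zeros(Xt.shape[1])
    loss = np.sum((y - Xt @ theta) ** 2)
    for t in range(max_iters):
        grad = 2.0 * Xt.T @ (Xt @ theta - y)
        theta = theta - alpha * grad
        new_loss = np.sum((y - Xt @ theta) ** 2)
        if abs(new_loss - loss) < eps:   # change in loss is tiny: declare convergence
            break
        loss = new_loss
    return theta, t + 1

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])       # y = 2x + 1
theta, n_steps = gd_until_converged(X, y)
print(theta, n_steps)
```

Note the caveat implicit in the slide: a tiny change in loss does not guarantee the derivative is zero, only that progress has stalled at the current step size.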

Page 23:

Visualizing the cost function


“Level set” contours: all points with the same function value

Page 24:

In 2D parameter space


gradient = vector of partial derivatives

Page 25:

Gradient Descent DEMO
https://github.com/tufts-ml-courses/comp135-19s-assignments/blob/master/labs/GradientDescentDemo.ipynb


Page 26:

Fitting a line isn’t always ideal


Page 27:

Can fit linear functions to nonlinear features

A nonlinear function of x:

$\hat{y}(x_i) = \theta_0 + \theta_1 x_i + \theta_2 x_i^2 + \theta_3 x_i^3$

can be written as a linear function of $\phi(x_i) = [x_i \; x_i^2 \; x_i^3]$:

$\hat{y}(\phi(x_i)) = \theta_0 + \theta_1 \phi(x_i)_1 + \theta_2 \phi(x_i)_2 + \theta_3 \phi(x_i)_3$

“Linear regression” means linear in the parameters (weights, biases).
Features can be arbitrary transforms of raw data.
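The polynomial-feature trick can be checked end to end: build $\phi(x_i)$, then solve the ordinary least-squares problem in the transformed features. A sketch assuming NumPy, with an illustrative cubic:

```python
import numpy as np

def poly_features(x, degree=3):
    """phi(x_i) = [x_i, x_i^2, ..., x_i^degree] for a 1-D input array."""
    return np.stack([x ** d for d in range(1, degree + 1)], axis=1)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 - 2.0 * x + 0.5 * x ** 3          # nonlinear in x, linear in theta

Phi = np.hstack([poly_features(x), np.ones((x.size, 1))])  # add bias column
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)            # ordinary least squares
print(theta)   # ≈ [-2.0, 0.0, 0.5, 1.0]  (coefficients on x, x^2, x^3, bias)
```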

Page 28:

What feature transform to use?

• Anything that works for your data!

• sin / cos for periodic data

• polynomials for high-order dependencies: $\phi(x_i) = [x_i \; x_i^2 \; x_i^3]$

• interactions between feature dimensions: $\phi(x_i) = [x_{i1} x_{i2} \;\; x_{i3} x_{i4}]$

• Many other choices possible
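An interaction transform like the one above might be sketched as follows (which feature pairs to multiply is an illustrative choice, not a fixed recipe):

```python
import numpy as np

def interaction_features(X):
    """phi(x_i) = [x_i1 * x_i2, x_i3 * x_i4] for rows of an (N, 4) matrix."""
    return np.stack([X[:, 0] * X[:, 1], X[:, 2] * X[:, 3]], axis=1)

X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 0.5, 1.0, 6.0]])
print(interaction_features(X))   # [[2. 12.] [1. 6.]]
```

The transformed matrix can then be fed to the same least-squares machinery as any other feature matrix.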