Logistic Regression Lyle Ungar Learning objectives Logistic model & loss Decision boundaries as hyperplanes Multi-class regression

7 logistic regression - Penn Engineeringcis520/lectures/7_logistic_regression.… · Logistic Regression uUse as the model for class k uGradient descent simultaneously updates all

Sep 23, 2020


dariahiddleston

Logistic Regression
Lyle Ungar

Learning objectives
Logistic model & loss
Decision boundaries as hyperplanes
Multi-class regression


What do you do with a binary y?
• Can you use linear regression?
  - $y = w^\top x$
• How about a different link function?
  - $y = f(w^\top x)$
• Or a different probability distribution
  - $P(y=1 \mid x) = f(w^\top x)$


Logistic function

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^\top x}}$$
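The logistic function above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the lecture: the function names `sigmoid` and `h` are chosen here for convenience, and the branch on the sign of `z` is a common way to avoid overflow in `exp`.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form exp(z) / (1 + exp(z)),
    # so that exp is only ever called on a non-positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)


def h(theta: Sequence[float], x: Sequence[float]) -> float:
    """h_theta(x) = sigmoid(theta^T x)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))
```

The output always lies between 0 and 1, which is what lets it be read as P(y=1|x).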


Logistic Regression

Log odds
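The equation on this slide did not survive extraction. As a short worked derivation, assuming the logistic model $h_\theta(x) = P(y=1 \mid x)$ from the Logistic function slide, the log odds come out linear in x:

```latex
\begin{align*}
\frac{P(y=1 \mid x)}{1 - P(y=1 \mid x)}
  &= \frac{1/(1 + e^{-\theta^\top x})}{e^{-\theta^\top x}/(1 + e^{-\theta^\top x})}
   = e^{\theta^\top x} \\
\log \frac{P(y=1 \mid x)}{1 - P(y=1 \mid x)}
  &= \theta^\top x
\end{align*}
```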


Log likelihood of data

y = 1 or -1
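The formula on this slide was lost in extraction. A standard way to write the log-likelihood with labels y in {1, -1} is the sum over examples of -log(1 + exp(-y_i w^T x_i)); the sketch below assumes that form. The helper names `log1p_exp` and `log_likelihood` are illustrative, not from the lecture.

```python
import math
from typing import Sequence


def log1p_exp(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def log_likelihood(
    w: Sequence[float],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
) -> float:
    """Sum over examples of log P(y_i | x_i), with y_i in {+1, -1}.

    Assumes P(y | x) = 1 / (1 + exp(-y * w^T x)).
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * sum(wj * xj for wj, xj in zip(w, x_i))
        total -= log1p_exp(-margin)
    return total
```

With w = 0 every example has probability 1/2, so the log-likelihood is -n log 2 for n examples.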


Decision Boundary


Representing Hyperplanes
• How do we represent a line?
• In general, a hyperplane is defined by $0 = w^\top x$

The red vector (w) defines the green hyperplane that is orthogonal to it.

Why bother with this weird representation?

[1,-1]
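A small numeric check of the orthogonality claim, as a sketch. It assumes the `[1,-1]` label from the slide figure is the vector w; the sample points are made up for illustration.

```python
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


w = [1.0, -1.0]  # assumed to be the vector labeled [1,-1] on the slide

# Points of the form (t, t) satisfy w^T x = t - t = 0, so they lie on the hyperplane.
points_on_plane = [[0.0, 0.0], [1.0, 1.0], [-2.5, -2.5]]
on_plane = [dot(w, p) == 0.0 for p in points_on_plane]

# The difference of two points on the hyperplane is a direction inside it,
# and that direction is orthogonal to w.
direction = [a - b for a, b in zip(points_on_plane[1], points_on_plane[2])]
is_orthogonal = dot(w, direction) == 0.0
```

In two dimensions the hyperplane is just the line x1 = x2.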


Projections


Now classification is easy!

$$h(x) = \operatorname{sgn}(w^\top x)$$
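One way to read this rule: projecting x onto w gives a signed distance from the hyperplane, and the sign of that distance is the predicted class. The sketch below assumes labels in {1, -1} and, as an arbitrary convention chosen here, assigns points exactly on the boundary to class 1.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))


def signed_distance(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed distance of x from the hyperplane w^T x = 0 (scalar projection onto w)."""
    return dot(w, x) / math.sqrt(dot(w, w))


def classify(w: Sequence[float], x: Sequence[float]) -> int:
    """h(x) = sgn(w^T x); ties (w^T x == 0) go to +1 by assumption."""
    return 1 if dot(w, x) >= 0 else -1
```

Scaling w by a positive constant changes neither the boundary nor the predicted labels.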


Computing MLE
• Use gradient ascent

Objective = log-likelihood (equivalently, loss function = negative log-likelihood)
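A minimal sketch of gradient ascent on the log-likelihood, assuming labels in {1, -1} and the model P(y|x) = 1/(1 + exp(-y w^T x)). The step size, iteration count, and function names are illustrative choices, not values from the lecture.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def log_likelihood_gradient(w, X, y) -> List[float]:
    """Gradient of sum_i log sigmoid(y_i w^T x_i): sum_i y_i x_i sigmoid(-y_i w^T x_i)."""
    grad = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        coeff = y_i * sigmoid(-y_i * dot(w, x_i))
        for j, xj in enumerate(x_i):
            grad[j] += coeff * xj
    return grad


def fit_mle(X, y, step: float = 0.1, iters: int = 1000) -> List[float]:
    """Gradient ascent: repeatedly move w in the direction of the gradient."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        grad = log_likelihood_gradient(w, X, y)
        w = [wj + step * gj for wj, gj in zip(w, grad)]
    return w
```

On linearly separable data the likelihood has no finite maximizer, so the weights keep growing with more iterations; that is one motivation for the prior on the next slide.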


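Computing MAP
• Prior: $w \sim \mathcal{N}(0, \gamma^2 I)$
• So solve: $\hat{w}_{\text{MAP}} = \arg\max_w \left[ \ell(w) - \frac{1}{2\gamma^2} w^\top w \right]$, where $\ell(w)$ is the log-likelihood of the data
• Again use gradient descent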
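A sketch of the MAP version, under the assumption that the prior is a zero-mean Gaussian with variance gamma^2 on each weight, so the log-prior contributes a penalty of w^T w / (2 gamma^2). Labels are assumed to be in {1, -1}; `fit_map`, the step size, and the iteration count are illustrative, not from the lecture.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def fit_map(X, y, gamma: float, step: float = 0.1, iters: int = 500) -> List[float]:
    """Gradient descent on the negative log-posterior.

    Objective (to minimize): -sum_i log sigmoid(y_i w^T x_i) + w^T w / (2 gamma^2).
    """
    w = [0.0] * len(X[0])
    for _ in range(iters):
        # Gradient of the penalty term is w / gamma^2.
        grad = [wj / gamma ** 2 for wj in w]
        # Gradient of the negative log-likelihood term.
        for x_i, y_i in zip(X, y):
            coeff = -y_i * sigmoid(-y_i * dot(w, x_i))
            for j, xj in enumerate(x_i):
                grad[j] += coeff * xj
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w
```

A smaller gamma means a tighter prior and therefore smaller weights; as gamma grows the solution approaches the MLE.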


Multi-Class Classification

Disease diagnosis: healthy / cold / flu / pneumonia
Object classification: desk / chair / monitor / bookcase

[Figure: two scatter plots on axes x1 and x2, titled "Binary classification:" and "Multi-class classification:"]


Multi-Class Logistic Regression
• For 2 classes:

$$h_\theta(x) = \frac{1}{1 + \exp(-\theta^\top x)} = \frac{\exp(\theta^\top x)}{1 + \exp(\theta^\top x)}$$

In the denominator of the right-hand form, the 1 is the weight assigned to y = -1 and $\exp(\theta^\top x)$ is the weight assigned to y = 1.

• For K classes:

$$p(y = k \mid x; \theta_1, \ldots, \theta_K) = \frac{\exp(\theta_k^\top x)}{\sum_{j=1}^{K} \exp(\theta_j^\top x)}$$

  - Called the softmax function
    - maps a vector to a probability distribution
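The softmax function can be sketched as follows. Subtracting the maximum score before exponentiating is a standard numerical safeguard that leaves the result unchanged; the function name and the example scores are illustrative.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Map a vector of scores (theta_k^T x for each class k) to probabilities.

    Subtracting max(scores) does not change the output, because the common
    factor exp(-max) cancels between numerator and denominator.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two-class special case: scores [0, z] give 1 / (1 + exp(-z)) for the second class,
# matching the logistic function with weight 1 = exp(0) on the first class.
p_two_class = softmax([0.0, 2.0])[1]
```

The outputs are non-negative and sum to 1, which is the sense in which softmax maps a vector to a probability distribution.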


Multi-Class Logistic Regression

• Train a logistic regression classifier for each class k to predict the probability that y = k with

$$h_k(x) = \frac{\exp(\theta_k^\top x)}{\sum_{j=1}^{K} \exp(\theta_j^\top x)}$$

Split into One vs. Rest:

[Figure: three classes on axes x1 and x2, with parameter vectors labeled θ1, θ2, θ3]

Page 16: 7 logistic regression - Penn Engineeringcis520/lectures/7_logistic_regression.… · Logistic Regression uUse as the model for class k uGradient descent simultaneously updates all

Implementing Multi-Class Logistic Regression

• P(y=k|x) estimated by:

$$h_k(x) = \frac{\exp(\theta_k^\top x)}{\sum_{j=1}^{K} \exp(\theta_j^\top x)}$$

• Gradient descent simultaneously updates all parameters for all models
  - Same derivative as before, just with the above $h_k(x)$
• Predict class label as the most probable label
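The three steps on this slide can be sketched as below. This is an illustrative implementation, not the lecture's: it assumes classes are numbered 0 to K-1, minimizes the negative log-likelihood with a fixed step size, and uses made-up function names and defaults. The gradient for class k is the sum over examples of (h_k(x_i) - 1[y_i = k]) x_i.

```python
import math
from typing import List, Sequence


def predict_proba(thetas: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """h_k(x) for every class k: softmax of the scores theta_k^T x."""
    scores = [sum(t * xi for t, xi in zip(theta_k, x)) for theta_k in thetas]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_softmax(X, y, num_classes: int, step: float = 0.05, iters: int = 500):
    """Gradient descent on the negative log-likelihood, all classes at once."""
    d = len(X[0])
    thetas = [[0.0] * d for _ in range(num_classes)]
    for _ in range(iters):
        grads = [[0.0] * d for _ in range(num_classes)]
        for x_i, y_i in zip(X, y):
            probs = predict_proba(thetas, x_i)
            for k in range(num_classes):
                err = probs[k] - (1.0 if y_i == k else 0.0)
                for j in range(d):
                    grads[k][j] += err * x_i[j]
        # Simultaneous update: every theta_k changes using gradients
        # computed from the same (old) set of parameters.
        thetas = [
            [t - step * g for t, g in zip(theta_k, grad_k)]
            for theta_k, grad_k in zip(thetas, grads)
        ]
    return thetas


def predict(thetas, x) -> int:
    """Most probable class label."""
    probs = predict_proba(thetas, x)
    return max(range(len(probs)), key=lambda k: probs[k])
```

A usage example would call `fit_softmax(X, y, num_classes=3)` on feature vectors that include a constant 1 for the bias, then `predict(thetas, x)` for each new point.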


You should know
• Logistic model & loss
  - Linear in log-odds
• Decision boundaries
  - hyperplane
• Softmax
  - Maps vector to probability distribution
