Linear Classification: The Perceptron Robot Image Credit: Viktoriya Sukhanova © 123RF.com These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton’s slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.

Linear Classification: The Perceptronbboots3/CS4641-Fall2018/... · Improving the Perceptron • The Perceptron produces many θ‘sduring training • The standard Perceptron simply

Aug 24, 2020


dariahiddleston
Page 1

Linear Classification: The Perceptron

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton's slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.

Page 2

Linear Classifiers
• A hyperplane partitions $\mathbb{R}^d$ into two half-spaces
  – Defined by the normal vector $\theta \in \mathbb{R}^d$
    • $\theta$ is orthogonal to any vector lying on the hyperplane
  – Assumed to pass through the origin
    • This is because we incorporated the bias term $\theta_0$ into it by $x_0 = 1$
• Consider classification with +1, -1 labels...

Based on slide by Piyush Rai

Page 3

Linear Classifiers
• Linear classifiers: represent decision boundary by hyperplane

$$h(x) = \operatorname{sign}(\theta^\top x) \quad \text{where} \quad \operatorname{sign}(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ -1 & \text{if } z < 0 \end{cases}$$

$$x^\top = \begin{bmatrix} 1 & x_1 & \dots & x_d \end{bmatrix}, \qquad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix}$$

  – Note that:
    $\theta^\top x > 0 \implies y = +1$
    $\theta^\top x < 0 \implies y = -1$
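The hypothesis $h(x) = \operatorname{sign}(\theta^\top x)$ can be sketched in a few lines of plain Python. This is only an illustration: the function names `sign` and `predict` and the weight values are made up for the example and do not come from the slides.

```python
def sign(z):
    """Return +1 if z >= 0 and -1 otherwise (the slide's convention at z = 0)."""
    return 1 if z >= 0 else -1


def predict(theta, features):
    """Compute h(x) = sign(theta^T x), prepending x_0 = 1 for the bias term."""
    x = [1.0] + list(features)
    z = sum(t * xi for t, xi in zip(theta, x))
    return sign(z)


# Hypothetical weights [theta_0, theta_1, theta_2], chosen only for illustration.
theta = [-1.0, 2.0, 0.5]
print(predict(theta, [1.0, 0.0]))  # theta^T x = -1 + 2 = 1, so the label is +1
print(predict(theta, [0.0, 1.0]))  # theta^T x = -1 + 0.5 = -0.5, so the label is -1
```

Prepending the constant feature inside `predict` is one way to realize the $x_0 = 1$ trick, so callers pass only the $d$ raw features.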

Page 4

The Perceptron

$$h(x) = \operatorname{sign}(\theta^\top x) \quad \text{where} \quad \operatorname{sign}(z) = \begin{cases} 1 & \text{if } z \geq 0 \\ -1 & \text{if } z < 0 \end{cases}$$

• The perceptron uses the following update rule each time it receives a new training instance $(x^{(i)}, y^{(i)})$

$$\theta_j \leftarrow \theta_j - \frac{\alpha}{2} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$

  (the difference $h_\theta(x^{(i)}) - y^{(i)}$ is either 2 or -2 on a misclassification)

  – If the prediction matches the label, make no change
  – Otherwise, adjust θ

Page 5

The Perceptron
• The perceptron uses the following update rule each time it receives a new training instance $(x^{(i)}, y^{(i)})$

$$\theta_j \leftarrow \theta_j - \frac{\alpha}{2} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$

  (the difference $h_\theta(x^{(i)}) - y^{(i)}$ is either 2 or -2 on a misclassification)

• Re-write as (only upon misclassification)

$$\theta_j \leftarrow \theta_j + \alpha y^{(i)} x_j^{(i)}$$

  – Can eliminate α in this case, since its only effect is to scale θ by a constant, which doesn't affect performance

Perceptron Rule: If $x^{(i)}$ is misclassified, do $\theta \leftarrow \theta + y^{(i)} x^{(i)}$
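To check that the two forms of the update agree, the sketch below applies both to one misclassified example. The helper names (`update_general`, `update_rule`) and the numbers are assumptions made for this illustration only.

```python
def predict(theta, x):
    """h(x) = sign(theta^T x); x is assumed to already include x_0 = 1."""
    z = sum(t * xj for t, xj in zip(theta, x))
    return 1 if z >= 0 else -1


def update_general(theta, x, y, alpha=1.0):
    """theta_j <- theta_j - (alpha / 2) * (h(x) - y) * x_j for every j."""
    err = predict(theta, x) - y  # 0 if correct, otherwise +2 or -2
    return [t - (alpha / 2) * err * xj for t, xj in zip(theta, x)]


def update_rule(theta, x, y):
    """Perceptron rule: theta <- theta + y * x, applied only on a mistake."""
    if predict(theta, x) != y:
        return [t + y * xj for t, xj in zip(theta, x)]
    return list(theta)


theta = [0.0, -1.0, 1.0]  # hypothetical current weights
x = [1.0, 2.0, 0.5]       # hypothetical instance with x_0 = 1
y = 1                     # true label; theta^T x = -1.5, so this is misclassified

print(update_general(theta, x, y))  # same vector as the line below when alpha = 1
print(update_rule(theta, x, y))
```

With `alpha=1.0` the factor $\frac{\alpha}{2} \cdot (\pm 2)$ collapses to $\pm 1$, which is why the simplified rule needs no learning rate.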

Page 6

Why the Perceptron Update Works

[Figure: a misclassified positive (+) example x, drawn with θ_old and the updated θ_new]

Based on slide by Piyush Rai

Page 7

Why the Perceptron Update Works
• Consider the misclassified example (y = +1)
  – Perceptron wrongly thinks that $\theta_{\text{old}}^\top x < 0$
• Update:

$$\theta_{\text{new}} = \theta_{\text{old}} + yx = \theta_{\text{old}} + x \quad (\text{since } y = +1)$$

• Note that

$$\theta_{\text{new}}^\top x = (\theta_{\text{old}} + x)^\top x = \theta_{\text{old}}^\top x + x^\top x, \qquad x^\top x = \|x\|_2^2 > 0$$

• Therefore, $\theta_{\text{new}}^\top x$ is less negative than $\theta_{\text{old}}^\top x$
  – So, we are making ourselves more correct on this example!

Based on slide by Piyush Rai
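A small numeric check of this argument, using made-up values for $\theta_{\text{old}}$ and $x$ (they are not from the slides). The score $\theta^\top x$ rises by exactly $\|x\|_2^2$, although in this particular example one update is not enough to make it positive.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


theta_old = [0.5, -4.0, -1.0]  # hypothetical weights
x = [1.0, 2.0, 1.0]            # hypothetical positive example (y = +1), x_0 = 1

before = dot(theta_old, x)                           # 0.5 - 8 - 1 = -8.5
theta_new = [t + xj for t, xj in zip(theta_old, x)]  # theta_old + y * x with y = +1
after = dot(theta_new, x)                            # 1.5 - 4 + 0 = -2.5

print(before, after, after - before, dot(x, x))
```

The difference `after - before` equals `dot(x, x)`, the squared norm of `x`, matching the derivation above.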

Page 8

The Perceptron Cost Function
• The perceptron uses the following cost function

$$J_p(\theta) = \frac{1}{n} \sum_{i=1}^{n} \max\left(0, -y^{(i)} \theta^\top x^{(i)}\right)$$

  – $\max(0, -y^{(i)} \theta^\top x^{(i)})$ is 0 if the prediction is correct
  – Otherwise, it is the confidence in the misprediction

Based on slide by Alan Fern
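The cost function can be evaluated directly. The following is a minimal sketch in which the function name `perceptron_cost` and the toy data are assumptions for illustration.

```python
def perceptron_cost(theta, X, y):
    """J_p(theta) = (1/n) * sum_i max(0, -y_i * theta^T x_i)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        score = sum(t * xj for t, xj in zip(theta, x_i))
        total += max(0.0, -y_i * score)  # 0 when y_i and the score share a sign
    return total / len(X)


# Hypothetical data; each row already includes x_0 = 1.
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
y = [1, -1, -1]
theta = [0.0, 1.0]

# Only the third example is misclassified (score 0.5, label -1), contributing 0.5.
print(perceptron_cost(theta, X, y))
```

Here the cost is the single misprediction's magnitude, 0.5, averaged over the three examples.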

Page 9

Online Perceptron Algorithm

1.) Let θ ← [0, 0, ..., 0]
2.) Repeat:
3.)     Receive training example (x^(i), y^(i))
4.)     if y^(i) θᵀx^(i) ≤ 0    // prediction is incorrect
5.)         θ ← θ + y^(i) x^(i)

Online learning – the learning mode where the model update is performed each time a single observation is received

Batch learning – the learning mode where the model update is performed after observing the entire training set

Based on slide by Alan Fern
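The online algorithm translates almost line for line into Python. This is a sketch under assumptions: the name `online_perceptron`, the finite `stream` standing in for "Repeat", and the toy data are all invented for the example.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def online_perceptron(stream, dim):
    """Process (x, y) pairs one at a time, updating theta only on mistakes."""
    theta = [0.0] * dim                      # step 1: start from the zero vector
    for x, y in stream:                      # steps 2-3: receive one example
        if y * dot(theta, x) <= 0:           # step 4: prediction is incorrect
            theta = [t + y * xj for t, xj in zip(theta, x)]  # step 5
    return theta


# Hypothetical linearly separable data; each x already includes x_0 = 1.
data = [
    ([1.0, 2.0, 2.0], 1),
    ([1.0, 1.0, 3.0], 1),
    ([1.0, -1.0, -2.0], -1),
    ([1.0, -2.0, -1.0], -1),
]

theta = online_perceptron(data * 3, dim=3)   # three passes over the toy data
print(theta)
print(all(y * dot(theta, x) > 0 for x, y in data))
```

Because the test in step 4 uses `<= 0`, the all-zero starting vector counts as a mistake on the first example, which is what triggers the first update.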

Page 10

Online Perceptron Algorithm

[Figure: red points are labeled +, blue points are labeled -]

Based on slide by Alan Fern

Page 11

Batch Perceptron

1.) Given training data {(x^(i), y^(i))} for i = 1 ... n
2.) Let θ ← [0, 0, ..., 0]
3.) Repeat:
4.)     Let Δ ← [0, 0, ..., 0]
5.)     for i = 1 ... n, do
6.)         if y^(i) θᵀx^(i) ≤ 0    // prediction for i-th instance is incorrect
7.)             Δ ← Δ + y^(i) x^(i)
8.)     Δ ← Δ/n    // compute average update
9.)     θ ← θ + αΔ
10.) Until ‖Δ‖₂ < ε

• Simplest case: α = 1 and don't normalize, yields the fixed increment perceptron
• Guaranteed to find a separating hyperplane if one exists

Based on slide by Alan Fern
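A possible Python rendering of the batch procedure follows. The function name, the default values of `alpha` and `eps`, and the `max_iters` safety cap are assumptions added for this sketch; the slides specify only the loop itself.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def batch_perceptron(X, y, alpha=1.0, eps=1e-9, max_iters=1000):
    """Accumulate updates over the whole training set, then take one step."""
    n, d = len(X), len(X[0])
    theta = [0.0] * d
    for _ in range(max_iters):               # max_iters is a safety cap (assumption)
        delta = [0.0] * d
        for x_i, y_i in zip(X, y):
            if y_i * dot(theta, x_i) <= 0:   # i-th instance is misclassified
                delta = [dj + y_i * xj for dj, xj in zip(delta, x_i)]
        delta = [dj / n for dj in delta]     # compute average update
        theta = [t + alpha * dj for t, dj in zip(theta, delta)]
        if math.sqrt(dot(delta, delta)) < eps:  # stop once ||delta||_2 < eps
            break
    return theta


# Hypothetical linearly separable data; each row already includes x_0 = 1.
X = [[1.0, 2.0, 2.0], [1.0, 1.0, 3.0], [1.0, -1.0, -2.0], [1.0, -2.0, -1.0]]
y = [1, 1, -1, -1]

theta = batch_perceptron(X, y)
print(theta)
```

On separable data the loop ends when no example is misclassified, since `delta` is then the zero vector. On non-separable data that condition may never hold, which is the reason for the iteration cap.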

Page 12

Improving the Perceptron
• The Perceptron produces many θ's during training
• The standard Perceptron simply uses the final θ at test time
  – This may sometimes not be a good idea!
  – Some other θ may be correct on 1,000 consecutive examples, but one mistake ruins it!
• Idea: Use a combination of multiple perceptrons
  – (i.e., neural networks!)
• Idea: Use the intermediate θ's
  – Voted Perceptron: vote on predictions of the intermediate θ's
  – Averaged Perceptron: average the intermediate θ's

Based on slide by Piyush Rai
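One way to use the intermediate θ's is sketched below. The slides do not say how the intermediate vectors are weighted, so this sketch assumes a simple scheme: record θ after every training example, so that a θ which survives many examples appears many times in the history. All function names, the `epochs` parameter, and the data are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def sign(z):
    return 1 if z >= 0 else -1


def train_with_history(data, dim, epochs=5):
    """Run the online perceptron and record theta after every example."""
    theta = [0.0] * dim
    history = []
    for _ in range(epochs):
        for x, y in data:
            if y * dot(theta, x) <= 0:
                theta = [t + y * xj for t, xj in zip(theta, x)]
            history.append(list(theta))      # long-lived thetas are recorded often
    return history


def averaged_predict(history, x):
    """Averaged perceptron: predict with the mean of the recorded thetas."""
    avg = [sum(col) / len(history) for col in zip(*history)]
    return sign(dot(avg, x))


def voted_predict(history, x):
    """Voted perceptron: each recorded theta casts one +1 or -1 vote."""
    return sign(sum(sign(dot(theta, x)) for theta in history))


# Hypothetical linearly separable data; each x already includes x_0 = 1.
data = [
    ([1.0, 2.0, 2.0], 1),
    ([1.0, 1.0, 3.0], 1),
    ([1.0, -1.0, -2.0], -1),
    ([1.0, -2.0, -1.0], -1),
]

history = train_with_history(data, dim=3)
print(all(averaged_predict(history, x) == y for x, y in data))
print(all(voted_predict(history, x) == y for x, y in data))
```

Averaging needs only one stored vector at test time, while voting must keep every intermediate θ, which is a common practical reason to prefer the averaged variant.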