Perceptron, Kernels, and SVM CSE 546 Recitation November 5, 2013

Perceptron, Kernels, and SVM - University of Washington · What does perceptron optimize? Perceptron appears to work, but is it solving an optimization problem like every other algorithm?

Jul 24, 2019


doanhuong
Perceptron, Kernels, and SVM

CSE 546 Recitation, November 5, 2013

Grading Update

● Midterms: likely by Monday

– Expected average is 60%

● HW 2: after midterms are graded

● Project proposals: mostly or all graded (everyone gets full credit)

– Check your dropbox for comments

● HW 3 scheduled to be released tomorrow, due in two weeks

Perceptron Basics

● Online algorithm

● Linear classifier

● Learns a set of weights

● Always converges, after a finite number of mistakes, on linearly separable data
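The basics above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the names `dot` and `perceptron`, the epoch cap, the toy data, and the trick of appending a constant feature as a bias are all assumptions made for the example.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def perceptron(data, epochs=100):
    """Train on (x, y) pairs with labels y in {-1, +1}.

    Returns the learned weights and the total number of mistakes.
    """
    w = [0.0] * len(data[0][0])
    mistakes = 0
    for _ in range(epochs):
        clean_pass = True
        for x, y in data:              # online: one example at a time
            if y * dot(w, x) <= 0:     # wrong side of (or on) the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                clean_pass = False
        if clean_pass:                 # a full pass without mistakes
            break
    return w, mistakes


# Toy linearly separable data; the trailing 1.0 is a constant bias feature.
data = [([2.0, 1.0, 1.0], 1), ([1.0, 3.0, 1.0], 1),
        ([-1.0, -2.0, 1.0], -1), ([-2.0, -1.0, 1.0], -1)]
w, mistakes = perceptron(data)
```

On this toy set a single update is enough: the first example moves `w` from zero to a vector that already separates all four points.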

What does perceptron optimize?

● Perceptron appears to work, but is it solving an optimization problem like every other algorithm?

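● $y^{(t)} (w \cdot x^{(t)}) \le 0$ is equivalent to making a mistake on example $t$

● Hinge loss penalizes mistakes by $-y^{(t)} (w \cdot x^{(t)})$, giving $\ell(w) = \sum_j \max\left(0, -y^{(j)} (w \cdot x^{(j)})\right)$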

Hinge Loss

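● Gradient descent update rule: $w \leftarrow w + \eta \sum_j \mathbb{1}\left[y^{(j)} (w \cdot x^{(j)}) \le 0\right] y^{(j)} x^{(j)}$

● Stochastic gradient descent update rule = perceptron (with $\eta = 1$): $w^{(t+1)} = w^{(t)} + \mathbb{1}\left[y^{(t)} (w^{(t)} \cdot x^{(t)}) \le 0\right] y^{(t)} x^{(t)}$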
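A small sketch of the connection: one stochastic subgradient step on the loss $\max(0, -y (w \cdot x))$ with step size 1 is the perceptron update. The function names `hinge_loss` and `sgd_step` and the numbers are illustrative assumptions. At the kink, where $y (w \cdot x) = 0$, the sketch picks the subgradient $-y x$ so that ties count as mistakes.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(w, data):
    """Perceptron-style hinge loss: sum of max(0, -y * (w . x))."""
    return sum(max(0.0, -y * dot(w, x)) for x, y in data)


def sgd_step(w, x, y, eta=1.0):
    """One stochastic subgradient step on max(0, -y * (w . x)).

    With eta = 1 this is the perceptron update.
    """
    if y * dot(w, x) <= 0:             # loss is active: subgradient is -y * x
        return [wi + eta * y * xi for wi, xi in zip(w, x)]
    return list(w)                     # zero loss, zero gradient: no change


w_before = [-1.0, 0.0]
x, y = [1.0, 2.0], 1
loss_before = hinge_loss(w_before, [(x, y)])   # 1.0, since w . x = -1
w_after = sgd_step(w_before, x, y)             # [0.0, 2.0]
loss_after = hinge_loss(w_after, [(x, y)])     # 0.0, since w . x = 4
```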

Feature Maps

● What if data aren't linearly separable?

● Sometimes if we map features to new spaces, we can put the data in a form more amenable to an algorithm, e.g. linearly separable

● The maps could have extremely high or even infinite dimension, so is there a shortcut to represent them?

– Don't want to store every $\phi(x)$ or do computation in high dimensions
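A tiny example of a feature map that makes data linearly separable. The XOR-like data set, the map `phi`, and the hand-picked weights are assumptions chosen for illustration. With labels $y = x_1 x_2$, no line through the plane separates the four points, but adding the product feature $x_1 x_2$ does the job.

```python
def phi(x):
    """Map (x1, x2) to (x1, x2, x1 * x2)."""
    x1, x2 = x
    return [x1, x2, x1 * x2]


# XOR-like labels: y = x1 * x2. Not linearly separable in the original 2-D space.
xor_data = [((1, 1), 1), ((-1, -1), 1), ((1, -1), -1), ((-1, 1), -1)]

# In the mapped space, a weight vector that looks only at the new feature works.
w = [0.0, 0.0, 1.0]
margins = [y * sum(wi * fi for wi, fi in zip(w, phi(x))) for x, y in xor_data]
```

Every entry of `margins` is positive, so the mapped points are separated by the hyperplane with normal `w`.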

Kernel Trick

● Kernels (aka kernel functions) compute dot products of mapped features while working in the same dimension as the original features

– Apply to algorithms that only depend on dot products

● $K(u, v) = \phi(u) \cdot \phi(v)$

– Lower dimension for computation

– Don't have to store $\phi(x)$ explicitly

● Choose mappings that have kernels, since not all do

– e.g.
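One standard example of a mapping with a kernel, not necessarily the one shown on the original slide, is the degree-2 map that lists all products $x_i x_j$. Its kernel is $K(u, v) = (u \cdot v)^2$. The sketch below checks that the explicit and implicit computations agree; the names `phi` and `kernel` are illustrative.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit degree-2 map: all ordered products x_i * x_j (d * d features)."""
    return [xi * xj for xi in x for xj in x]


def kernel(u, v):
    """Same dot product, computed in the original d dimensions."""
    return dot(u, v) ** 2


u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0]
explicit = dot(phi(u), phi(v))   # works with 9 mapped features
implicit = kernel(u, v)          # works with 3 original features
```

Both values equal $32^2 = 1024$, since $u \cdot v = 32$. The explicit route touches $d^2$ numbers per point, the kernel only $d$.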

Kernelized Perceptron

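● Recall perceptron update rule: on a mistake, $w^{(t+1)} = w^{(t)} + y^{(t)} x^{(t)}$

– Implies: $w^{(t)} = \sum_{j \in M^t} y^{(j)} x^{(j)}$, where $M^t$ is the set of mistake indices up to $t$ (starting from $w = 0$)

● Classification rule: $\mathrm{sign}(w^{(t)} \cdot x) = \mathrm{sign}\left(\sum_{j \in M^t} y^{(j)} (x^{(j)} \cdot x)\right)$

● With mapping $\phi$: $\mathrm{sign}\left(\sum_{j \in M^t} y^{(j)} (\phi(x^{(j)}) \cdot \phi(x))\right)$

● If we have a kernel $K$: $\mathrm{sign}\left(\sum_{j \in M^t} y^{(j)} K(x^{(j)}, x)\right)$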
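A minimal sketch of the kernelized perceptron: instead of a weight vector it stores the examples it got wrong and scores new points with kernel evaluations against them. The kernel $(1 + u \cdot v)^2$, the XOR-like data, the epoch cap, and all function names are assumptions for illustration.

```python
def kernel(u, v):
    """Degree-2 polynomial kernel (1 + u . v)^2, chosen for illustration."""
    return (1.0 + sum(a * b for a, b in zip(u, v))) ** 2


def score(mistakes, x):
    """Sum over stored mistakes j of y_j * K(x_j, x)."""
    return sum(y_j * kernel(x_j, x) for x_j, y_j in mistakes)


def kernel_perceptron(data, epochs=100):
    """Return the list of (x, y) examples on which a mistake was made."""
    mistakes = []
    for _ in range(epochs):
        clean_pass = True
        for x, y in data:
            if y * score(mistakes, x) <= 0:   # same mistake test as before
                mistakes.append((x, y))       # replaces w <- w + y * phi(x)
                clean_pass = False
        if clean_pass:
            break
    return mistakes


# XOR-like data: not linearly separable in the original space.
xor_data = [((1.0, 1.0), 1), ((-1.0, -1.0), 1),
            ((1.0, -1.0), -1), ((-1.0, 1.0), -1)]
mistakes = kernel_perceptron(xor_data)
predictions = [1 if score(mistakes, x) > 0 else -1 for x, _ in xor_data]
```

The mapped features are never formed; only `kernel` is called, and the learned model is the mistake list itself.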

SVM Basics

● Linear classifier (without kernels)

● Find separating hyperplane by maximizing margin

● One of the most popular and robust classifiers

Setting Up SVM Optimization

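● Weights $w$ and margin $\gamma$: maximize $\gamma$ subject to $y^{(j)} (w \cdot x^{(j)} + w_0) \ge \gamma$ for all $j$

– Optimization unbounded: scaling $(w, w_0)$ by any constant scales $\gamma$ with it

● Use canonical hyperplanes to remedy: fix the scale so that $\min_j y^{(j)} (w \cdot x^{(j)} + w_0) = 1$, which makes the geometric margin $1 / \|w\|$

● If linearly separable data, can solve $\min_{w, w_0} \frac{1}{2} \|w\|^2$ subject to $y^{(j)} (w \cdot x^{(j)} + w_0) \ge 1$ for all $j$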
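A short sketch of why the canonical form removes the scaling ambiguity. The helper names `canonicalize` and `geometric_margin` and the toy data are assumptions. Two weight vectors that differ only by a factor of 10 describe the same hyperplane, and both rescale to the same canonical representative.

```python
import math


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def canonicalize(w, w0, data):
    """Rescale (w, w0) so the smallest value of y * (w . x + w0) is 1."""
    smallest = min(y * (dot(w, x) + w0) for x, y in data)
    if smallest <= 0:
        raise ValueError("hyperplane does not separate the data")
    return [wi / smallest for wi in w], w0 / smallest


def geometric_margin(w):
    """Distance from a canonical hyperplane to the closest point: 1 / ||w||."""
    return 1.0 / math.sqrt(dot(w, w))


data = [([2.0, 0.0], 1), ([4.0, 1.0], 1), ([-2.0, 0.0], -1)]
w_a, w0_a = canonicalize([3.0, 0.0], 0.0, data)
w_b, w0_b = canonicalize([30.0, 0.0], 0.0, data)   # same hyperplane, scaled by 10
margin = geometric_margin(w_a)
```

Both calls return `[0.5, 0.0]` with offset `0.0`, and the geometric margin is 2, the distance from the hyperplane $x_1 = 0$ to the closest points.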

SVM Optimization

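● If the data are not linearly separable, could map to a new space

– But that doesn't guarantee separability

● Therefore, remove the separability constraints and instead penalize violations in the objective

– Soft-margin SVM minimizes regularized hinge loss: $\min_{w, w_0} \frac{1}{2} \|w\|^2 + C \sum_j \max\left(0, 1 - y^{(j)} (w \cdot x^{(j)} + w_0)\right)$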
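A rough sketch of minimizing the regularized hinge loss with stochastic subgradient steps. This is one simple variant under stated assumptions, not the HW 3 solution: the regularizer is split evenly across the $n$ examples, the step size is constant, and the function names, hyperparameters, and toy data are all illustrative.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def svm_objective(w, w0, data, C):
    """0.5 * ||w||^2 + C * sum of max(0, 1 - y * (w . x + w0))."""
    hinge = sum(max(0.0, 1.0 - y * (dot(w, x) + w0)) for x, y in data)
    return 0.5 * dot(w, w) + C * hinge


def sgd_step(w, w0, x, y, C, n, eta):
    """One subgradient step on (1/n) * 0.5 * ||w||^2 + C * hinge(x, y)."""
    violated = y * (dot(w, x) + w0) < 1.0      # inside the margin or misclassified
    new_w = []
    for wi, xi in zip(w, x):
        grad = wi / n - (C * y * xi if violated else 0.0)
        new_w.append(wi - eta * grad)
    new_w0 = w0 + eta * C * y if violated else w0
    return new_w, new_w0


def train_svm(data, C=1.0, eta=0.1, epochs=50):
    """Cycle through the data with a constant step size (a simplification)."""
    w, w0 = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            w, w0 = sgd_step(w, w0, x, y, C, len(data), eta)
    return w, w0


data = [([2.0, 2.0], 1), ([3.0, 1.0], 1),
        ([-2.0, -2.0], -1), ([-1.0, -3.0], -1)]
start = svm_objective([0.0, 0.0], 0.0, data, C=1.0)
w, w0 = train_svm(data)
final = svm_objective(w, w0, data, C=1.0)
```

Unlike the perceptron, the update fires whenever the margin is below 1, not only on outright mistakes, and every step also shrinks `w` slightly because of the regularizer.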

SVM vs Perceptron

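● SVM: $\min_{w, w_0} \frac{1}{2} \|w\|^2 + C \sum_j \max\left(0, 1 - y^{(j)} (w \cdot x^{(j)} + w_0)\right)$

– Has almost the same goal as an L2-regularized perceptron

● Perceptron: $\min_w \sum_j \max\left(0, -y^{(j)} (w \cdot x^{(j)})\right)$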

Other SVM Comments

● $C > 0$ is the “soft margin” parameter: it weights the hinge-loss penalty against the margin term

– High C means we care more about getting a good separation

– Low C means we care more about getting a large margin

● How to implement SVM?

– Suboptimal method is SGD (see HW 3)

– More advanced methods can be used to employ the kernel trick
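The effect of $C$ can be seen by comparing two candidate hyperplanes on a small 1-D data set. The data, the two candidates, and the helper names are assumptions for illustration. The "wide" candidate has a large margin but leaves one point inside it; the "narrow" candidate has every point at margin 1 or more but a much larger $\|w\|$.

```python
def svm_objective(w, w0, data, C):
    """1-D soft-margin objective: 0.5 * w^2 + C * total hinge loss."""
    hinge = sum(max(0.0, 1.0 - y * (w * x + w0)) for x, y in data)
    return 0.5 * w * w + C * hinge


# Two far-apart points plus one positive point close to the boundary.
data = [(-2.0, -1), (2.0, 1), (0.5, 1)]

wide = (0.5, 0.0)     # small |w|: large margin, but x = 0.5 sits inside it
narrow = (2.0, 0.0)   # large |w|: small margin, no hinge penalty at all


def preferred(C):
    """Name the candidate with the lower objective for this C."""
    wide_obj = svm_objective(wide[0], wide[1], data, C)
    narrow_obj = svm_objective(narrow[0], narrow[1], data, C)
    return "wide" if wide_obj < narrow_obj else "narrow"


low_c_choice = preferred(0.1)    # margin term dominates
high_c_choice = preferred(10.0)  # hinge penalty dominates
```

The wide candidate costs $0.125 + 0.75\,C$ and the narrow one costs 2, so the preference flips at $C = 2.5$: low $C$ favors the large margin, high $C$ favors the clean separation.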

Questions?