CSC 4510, Spring 2012. © Paula Matuszek 2012.
Support Vector Machines (SVMs)

Transcript
Page 1

CSC 4510: Support Vector Machines (SVMs)

Page 2

A Motivating Problem

• People who set up bird feeders want to feed birds.

• We don’t want to feed squirrels
– they eat too fast
– they drive the birds away

• So how do we create a bird feeder that isn’t a squirrel feeder?

Page 3

Birds vs Squirrels

• How are birds and squirrels different?

• Take a moment and write down a couple of features that distinguish birds from squirrels.

Page 4

Birds vs Squirrels

• How are birds and squirrels different?

• Take a moment and write down a couple of features that distinguish birds from squirrels.

• And now take another and write down how YOU tell a bird from a squirrel.

Page 5

Possible Features

• Birds –

• Squirrels –

• And what do people actually do?

Page 6

The Typical Bird Feeder Solutions

Squirrels are heavier

Squirrels can’t fly

Page 7

And Then There’s This:

http://youtu.be/_mbvqxjFZpE

http://www.slideshare.net/kgrandis/pycon-2012-militarizing-your-backyard-computer-vision-and-the-squirrel-hordes

Page 8

Knowledge for SVMs

• We are trying here to emulate our human decision making about whether something is a squirrel. So we will have:
– Features
– A classification

• As we saw in our discussion, there are a couple of obvious features, but mostly we decide based on a number of visual cues: size, color, general arrangement of pixels.

Page 9

Kurt Grandis’ Squirrel Gun

• The water gun is driven by a system which

– uses a camera to watch the bird feeder

– detects blobs

– determines whether the blob is a squirrel

– targets the squirrel

– shoots!

• Mostly in Python, using OpenCV for the vision.

• Of interest to us is that the decision about whether a blob is a squirrel is made by an SVM.

Page 10

So What’s an SVM?

• A Support Vector Machine (SVM) is a classifier
– It uses features of instances to decide which class each instance belongs to

• It is a supervised machine-learning classifier
– Training cases are used to calculate parameters for a model, which can then be applied to new instances to make a decision

• It is a binary classifier – it distinguishes between two classes

• For squirrel vs bird, Grandis used size, a histogram of pixels, and a measure of texture as the features

Page 11

Basic Idea Underlying SVMs

• Find a line, or a plane, or a hyperplane, that separates our classes cleanly.
– This is the same concept as we have seen in regression.

• By finding the greatest margin separating them
– This is not the same concept as we have seen in regression. What does it mean?

Page 12

Borrowed heavily from Andrew Moore’s tutorials: http://www.cs.cmu.edu/~awm/tutorials. Copyright © 2001, 2003, Andrew W. Moore

Linear Classifiers

[Figure: scatter plot of the two classes, marked +1 and -1]

How would you classify this data?

Page 13

Linear Classifiers

[Figure: the data with a candidate separating line]

How would you classify this data?

Page 14

Linear Classifiers

[Figure: the data with several candidate separating lines]

Any of these would be fine..

..but which is best?

Page 15

Classifier Margin

[Figure: a separating line with its margin shown]

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
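This definition can be sketched in code. A minimal sketch, assuming a toy 2-D data set and a hand-picked boundary w·x − b = 0 (the points, w, and b below are hypothetical, not from the slides): the margin is the smallest distance from any correctly classified point to the boundary.

```python
import math

def margin(points, labels, w, b):
    # The signed distance of each point to the boundary w.x - b = 0 is
    # y * (w.x - b) / ||w||; the margin is the smallest such distance.
    # It is positive only if every point sits on its own side.
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) - b) / norm
               for x, y in zip(points, labels))

# Hypothetical toy data: two -1 points, two +1 points, boundary x1 + x2 = 3
points = [(0, 0), (0, 1), (3, 3), (4, 2)]
labels = [-1, -1, +1, +1]
print(margin(points, labels, w=(1, 1), b=3))  # sqrt(2), about 1.414
```

The datapoint whose distance equals this minimum, here (0, 1), is one the margin pushes up against — a preview of the support vectors a few slides on.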

Page 16

Maximum Margin

[Figure: the maximum-margin separating line]

The maximum margin linear classifier is the linear classifier with the maximum margin.

Page 17

Maximum Margin

[Figure: the maximum-margin separating line]

The maximum margin linear classifier is the linear classifier with the maximum margin.

Called Linear Support Vector Machine (SVM)

Page 18

Maximum Margin

[Figure: the maximum-margin line with the support vectors marked]

The maximum margin linear classifier is the linear classifier with the maximum margin.

Called Linear Support Vector Machine (SVM)

Support Vectors are those datapoints that the margin pushes up against

Page 19

Why Maximum Margin?

[Figure: the maximum-margin line with the support vectors marked]

f(x, w, b) = sign(w · x − b)

The maximum margin linear classifier is the linear classifier with the maximum margin.

This is the simplest kind of SVM (called an LSVM)

Support Vectors are those datapoints that the margin pushes up against

1. If we’ve made a small error in the location of the boundary (it’s been jolted in its perpendicular direction) this gives us least chance of causing a misclassification.

2. Empirically it works very very well.
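The decision function on this slide, f(x, w, b) = sign(w · x − b), is a one-liner. A minimal sketch, with hypothetical w and b values (not from the slides):

```python
def classify(x, w, b):
    # f(x, w, b) = sign(w . x - b): +1 on one side of the boundary, -1 on the other
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - b >= 0 else -1

# Hypothetical boundary x1 + x2 = 3: points above it are class +1
print(classify((4, 2), w=(1, 1), b=3))  # 1
print(classify((0, 1), w=(1, 1), b=3))  # -1
```

Training an SVM amounts to choosing the w and b that maximize the margin; once they are found, classifying a new instance is just this sign test.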

Page 20

Maximum Margin Classifier (SVM)

• Find the linear classifier that
– separates documents of the positive class from those of the negative class
– has the largest classification margin

• Compute the classification margin

• Search for the linear classifier with the largest classification margin

[Figure: the plus-plane, minus-plane, and classifier boundary, with the classification margin between them]

Page 21

Concept Check

• For which of these could we use a basic linear SVM?
– A: Classify the three kinds of iris in the Orange data set?
– B: Classify email into spam and non-spam?
– C: Classify students into likely to pass or not?

• Which of these is the SVM margin?

[Figure: two candidate margins, labeled A and B]

Page 22

Messy Data

• This is all good so far.

• Suppose our data aren’t that neat:

[Figure: a data set whose classes overlap]

Page 23

Soft Margins

• Intuitively, it still looks like we can make a decent separation here.
– Can’t make a clean margin
– But can almost do so, if we allow some errors

• We introduce slack variables, which measure the degree of misclassification

• A soft margin is one which lets us make some errors, in order to get a wider margin

• Tradeoff between wide margin and classification errors

Page 24

Only Two Errors, Narrow Margin

Page 25

Several Errors, Wider Margin

Page 26

Slack Variables and Cost

• In order to find a soft margin, we allow slack variables, which measure the degree of misclassification.
– Takes into account the distance from the margin as well as the number of misclassified instances

• We then modify this by a cost (C) for these misclassified instances.
– High cost will give relatively narrow margins

– Low cost will give broader margins but misclassify more data.

• How much we want it to cost to misclassify instances depends on our domain -- what we are trying to do
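The slack-and-cost tradeoff can be written down directly. A minimal sketch, assuming the standard hinge-style slack ξᵢ = max(0, 1 − yᵢ(w·xᵢ − b)) — the slides don’t give the formula explicitly — and hypothetical points, weights, and C:

```python
def slacks_and_objective(points, labels, w, b, C):
    # Slack xi_i = max(0, 1 - y_i * (w.x_i - b)):
    #   0         -> point is outside the margin on the correct side
    #   (0, 1]    -> inside the margin but still correctly classified
    #   > 1       -> misclassified
    slacks = [max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) - b))
              for x, y in zip(points, labels)]
    # Soft-margin objective: small ||w||^2 means a wide margin; C weights
    # how much each unit of slack costs relative to margin width.
    objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
    return slacks, objective

points = [(0, 0), (3, 3), (1, 1)]   # hypothetical; (1, 1) sits on the boundary
labels = [-1, +1, -1]
print(slacks_and_objective(points, labels, w=(1, 1), b=2, C=10.0))
# ([0.0, 0.0, 1.0], 11.0)
```

Raising C makes that one unit of slack dominate the objective, pushing the search toward a narrower margin that avoids it; lowering C lets the margin widen and absorb the error.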

Page 27

Concept Check

• Which of these represents a soft margin?

[Figure: two margins, labeled A and B]

Page 28

Evaluating SVMs

• As with any classifier, we need to know how well our trained model performs on other data

• Train on sample data, evaluate on test data (why?)

• Some things to look at:
– classification accuracy: percent correctly classified

– confusion matrix

– sensitivity and specificity

Page 29

Confusion Matrix

Is it spam?    Predicted yes      Predicted no
Actually yes   True positives     False negatives
Actually no    False positives    True negatives

Note that “positive” vs “negative” is arbitrary
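The four cells are easy to tally directly. A minimal sketch, with hypothetical label lists (the spam example follows the table above):

```python
def confusion_matrix(actual, predicted, positive="spam"):
    # Tally the four cells: which class a case really is (actual)
    # versus which class the classifier called it (predicted).
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts

actual    = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham",  "ham", "spam", "spam", "ham"]
print(confusion_matrix(actual, predicted))
# {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 2}
```

Swapping which class counts as `positive` just transposes the cells, which is the point of the note above: “positive” vs “negative” is arbitrary.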

Page 30

Specificity and Sensitivity

• sensitivity: ratio of correctly labeled positives to actual positives
– how much spam are we finding?

• specificity: ratio of correctly labeled negatives to actual negatives
– how much “real” email are we calling email?
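In terms of the confusion-matrix cells, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). A minimal sketch with hypothetical counts:

```python
def sensitivity(tp, fn):
    # Of the cases that are actually positive, what fraction did we
    # label positive? (How much spam are we finding?)
    return tp / (tp + fn)

def specificity(tn, fp):
    # Of the cases that are actually negative, what fraction did we
    # label negative? (How much real email are we calling email?)
    return tn / (tn + fp)

# Hypothetical counts: 8 spam caught, 2 missed; 9 real emails kept, 1 flagged
print(sensitivity(8, 2))  # 0.8
print(specificity(9, 1))  # 0.9
```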

Page 31

More on Evaluating SVMs

• Overfitting: very close fit to training data which takes advantage of irrelevant variations in instances
– performance on test data will be much lower
– may mean that your training sample isn’t representative
– in SVMs, may mean that C is too high

• Is the SVM actually useful?
– Compare to the “majority” classifier
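The “majority” classifier baseline is worth making concrete: always predict the most common class. A minimal sketch with a hypothetical, imbalanced label list:

```python
from collections import Counter

def majority_accuracy(labels):
    # Accuracy of the do-nothing classifier that always predicts the
    # most frequent class in the training data.
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical imbalanced data: 9 non-spam messages, 1 spam
labels = ["ham"] * 9 + ["spam"]
print(majority_accuracy(labels))  # 0.9
```

An SVM that is 85% accurate on this data is doing worse than saying “ham” every time, which is why accuracy alone can mislead.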

Page 32

Concept Check

• For binary classifiers A and B, for balanced data:
– Which is better: A is 80% accurate, B is 60% accurate
– Which is better: A has 90% sensitivity, B has 70% sensitivity
– Which is the better classifier:
• A has 100% sensitivity, 50% specificity
• B has 80% sensitivity, 80% specificity

• Would you use a spam filter that was 80% accurate?

• Would you use a classifier for who needs major surgery that was 80% accurate?

• Would you ever use a binary classifier that is 50% accurate?

Page 33

Orange and SVMs

• We now know enough to start looking at the Orange SVM widget
– create some linearly separable data
– sample it
– run a linear SVM and look at the results

Page 34

Why SVMs?

• Focusing on the instances nearest the margin pays more attention to where the differences are critical

• Can handle very large feature sets effectively

• In practice, SVMs have been shown to work well in a variety of domains

Page 35

But wait, there’s more!

Page 36

Non-Linearly-Separable Data

• Suppose we can’t do a good linear separation of our data?

• As with regression, allowing non-linearity will give us much better modeling of many data sets.

• In SVMs, we do this by using a kernel.

• A kernel is a function which maps our data into a higher-order feature space where we can find a separating hyperplane
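One concrete example of such a mapping (a standard illustration, not taken from the slides): the explicit degree-2 feature map whose ordinary dot products reproduce the polynomial kernel (x · z)². Data separable only by a circle in 2-D becomes separable by a plane in the mapped 3-D space.

```python
import math

def quadratic_map(x):
    # Map (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2), the explicit feature
    # space of the degree-2 polynomial kernel (x . z)^2.
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1, 2), (3, 4)
mapped_dot = sum(a * b for a, b in zip(quadratic_map(x), quadratic_map(z)))
plain_dot = sum(a * b for a, b in zip(x, z))
print(mapped_dot, plain_dot ** 2)  # equal (up to floating-point rounding)
```

The kernel trick is exactly this equality: the SVM only ever needs dot products, so it can evaluate (x · z)² in the original space and never build the higher-order vectors at all.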

Page 37

Kernels for SVMs

• As we saw in Orange, we always specify a kernel for an SVM

• Linear is simplest, but seldom a good match to the data

• Other common ones are
– polynomial
– RBF (Gaussian Radial Basis Function)
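The two kernels named above are short formulas. A minimal sketch (the degree, c, and gamma values are hypothetical defaults, not from the slides):

```python
import math

def polynomial_kernel(x, z, degree=2, c=1.0):
    # (x . z + c)^degree
    return (sum(a * b for a, b in zip(x, z)) + c) ** degree

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian Radial Basis Function: exp(-gamma * ||x - z||^2);
    # equals 1.0 when x == z and falls toward 0 as the points move apart
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

print(polynomial_kernel((1, 2), (3, 4)))  # (11 + 1)^2 = 144.0
print(rbf_kernel((1, 2), (1, 2)))         # 1.0
```

In a toolkit like Orange these are chosen by name, with degree, c, and gamma as the tuning knobs alongside the cost C.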

Page 38

[Figure-only slide, borrowed from Andrew Moore’s tutorials]

Page 39

[Figure-only slide, borrowed from Andrew Moore’s tutorials]

Page 40

Back to Orange

• Let’s try some

Page 41

To be continued next week