Lecture 5: Statistical Methods for Classification
CAP 5415: Computer Vision, Fall 2006
Transcript
Page 1

Lecture 5: Statistical Methods for Classification

CAP 5415: Computer Vision, Fall 2006

Page 2

Classifiers: The Swiss Army Tool of Vision

A HUGE number of vision problems can be reduced to: Is this a _____ or not?

The next two lectures will focus on making that decision

Classifiers that we will cover:
- Bayesian classification
- Logistic regression
- Boosting
- Support Vector Machines
- Nearest-Neighbor Classifiers

Page 3

Motivating Problem Which pixels in this image are “skin pixels”?

Useful for tracking, finding people, finding images with too much skin.

Page 4

How could you find skin pixels?

Step 1: Get Data

Label every pixel as skin or not skin

Page 5

Getting Probabilities

Now that I have a bunch of examples, I can create probability distributions:

P([r,g,b] | skin) = probability of an [r,g,b] tuple given that the pixel is skin

P([r,g,b] | ~skin) = probability of an [r,g,b] tuple given that the pixel is not skin
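
A minimal sketch of how these distributions could be estimated, assuming labeled pixel arrays (the names `skin_pixels` and `nonskin_pixels` are hypothetical) and a coarse binning of RGB values:

```python
import numpy as np

def color_histogram(pixels, bins_per_channel=32):
    """Estimate P([r,g,b] | class) as a normalized 3D histogram.

    pixels: (N, 3) array of RGB values in 0..255 for one class.
    """
    hist, _ = np.histogramdd(
        pixels,
        bins=(bins_per_channel,) * 3,
        range=[(0, 256)] * 3,
    )
    return hist / hist.sum()  # normalize so all bins sum to 1

# Hypothetical labeled training data: every pixel marked skin / not skin.
# skin_pixels    = np.array([...])   # (N_skin, 3)
# nonskin_pixels = np.array([...])   # (N_nonskin, 3)
# p_rgb_given_skin    = color_histogram(skin_pixels)
# p_rgb_given_nonskin = color_histogram(nonskin_pixels)
```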

Page 6

(From Jones and Rehg)

Page 7

Using Bayes Rule

x: the observation
y: some underlying cause (skin / not skin)
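
Written with these variables, Bayes' rule takes the standard form:

```latex
P(y \mid x) = \frac{P(x \mid y)\, P(y)}{P(x)}
```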

Page 8

Using Bayes Rule

P(y | x) = P(x | y) P(y) / P(x)

Likelihood: P(x | y); Prior: P(y); Normalizing constant: P(x)

Page 9

Classification

In this case, P(skin | x) = 1 - P(~skin | x), so the classifier reduces to asking: is P(skin | x) > 0.5?

We can change this to P(skin | x) > c and vary c.
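
A sketch of the resulting classifier, assuming the histograms from the earlier sketch and an assumed prior P(skin) (all names here are illustrative):

```python
def p_skin_given_rgb(rgb_bin, p_rgb_given_skin, p_rgb_given_nonskin, p_skin=0.1):
    """Posterior P(skin | [r,g,b]) via Bayes' rule.

    rgb_bin: (r, g, b) bin indices into the two histograms.
    p_skin:  assumed prior probability that a pixel is skin.
    """
    likelihood_skin = p_rgb_given_skin[rgb_bin]
    likelihood_nonskin = p_rgb_given_nonskin[rgb_bin]
    numerator = likelihood_skin * p_skin
    denominator = numerator + likelihood_nonskin * (1.0 - p_skin)
    return numerator / denominator if denominator > 0 else 0.0

def classify_pixel(rgb_bin, p_rgb_given_skin, p_rgb_given_nonskin, c=0.5):
    """Label a pixel as skin if the posterior exceeds the threshold c."""
    return p_skin_given_rgb(rgb_bin, p_rgb_given_skin, p_rgb_given_nonskin) > c
```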

Page 10

The effect of varying c

This is called a Receiver Operating Characteristic (ROC) curve

From Jones and Rehg
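
One way to trace out such a curve, assuming per-pixel posteriors and ground-truth labels are available (illustrative names):

```python
import numpy as np

def roc_points(posteriors, labels, thresholds=np.linspace(0.0, 1.0, 101)):
    """Sweep the threshold c and record (false positive rate, detection rate).

    posteriors: array of P(skin | x) for each pixel.
    labels:     array of 1 (skin) / 0 (not skin) ground truth.
    """
    points = []
    for c in thresholds:
        predicted = posteriors > c
        tp = np.sum(predicted & (labels == 1))   # skin pixels correctly detected
        fp = np.sum(predicted & (labels == 0))   # non-skin pixels flagged as skin
        detection_rate = tp / max(np.sum(labels == 1), 1)
        false_pos_rate = fp / max(np.sum(labels == 0), 1)
        points.append((false_pos_rate, detection_rate))
    return points
```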

Page 11

Application: Finding Adult Pictures

Let's say you needed to build a web filter for a library

Could look at a few simple measurements based on the skin model
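
A hedged sketch of the kind of simple, image-level measurements this alludes to; the specific features below are illustrative, not necessarily the ones used in the original work:

```python
import numpy as np

def skin_measurements(skin_mask):
    """Summarize a per-pixel skin mask into a few image-level measurements.

    skin_mask: 2D boolean array, True where a pixel was classified as skin.
    """
    total = skin_mask.size
    skin_fraction = skin_mask.sum() / total      # fraction of pixels classified as skin
    if skin_mask.any():
        ys, xs = np.where(skin_mask)
        bbox_area = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
        bbox_fraction = bbox_area / total        # how much of the image the skin region spans
    else:
        bbox_fraction = 0.0
    return skin_fraction, bbox_fraction
```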

Page 12

Example of Misclassified Image

Page 13

Example of Correctly Classified Image

Page 14

ROC Curve

Page 15

Generative versus Discriminative Models

The classifier that I have just described is known as a generative model

Once you know all of the probabilities, you can generate new samples of the data

This may be too much work. You could instead optimize a function to just discriminate skin from not skin.

Page 16

Discriminative Classification using Logistic Regression

Imagine we had two measurements and we plotted each sample on a 2D chart

Page 17

Discriminative Classification using Logistic Regression

Imagine we had two measurements and we plotted each sample on a 2D chart

To separate the two groups, we'll project each point onto a line

Some points will be projected to positive values and some will be projected to negative values

Page 18

Discriminative Classification using Logistic Regression

This defines a separating line: each point is classified based on where its projection falls.

Page 19

How do we get the line?

Common option: Logistic Regression
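
The logistic function, in its standard form (the next slide relies on g ranging from 0 to 1), is:

```latex
g(z) = \frac{1}{1 + e^{-z}}
```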

Page 20

The logistic function

Notice that g(x) goes from 0 to 1. We can use this to estimate the probability of a sample being an x or an o.

We need to find a function that has large positive values for x's and large negative values for o's.

Page 21

Fitting the Line

Remember, we want a line. For the diagram below, x = +1 and o = -1; y is the label of a point (-1 or +1).

Page 22

Fitting the line

The logistic function gives us an estimate of the probability of an example being either +1 or -1

We can fit the line by maximizing the conditional probability of the correct labeling of the training set

(The measurements that make up each input x are also called features.)

Page 23

Fitting the Line

We have multiple samples that we assume are independent, so the probability of the whole training set is
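
One standard way to write this, assuming a weight vector w, the ±1 labels above, and the convention that the per-example probability is g(y_i w·x_i):

```latex
P(\text{labels} \mid \text{data}; w) \;=\; \prod_i P(y_i \mid x_i; w) \;=\; \prod_i g\!\left(y_i\, w^{\top} x_i\right)
```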

Page 24

Fitting the line

It is usually easier to optimize the log conditional probability
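
Taking the log of the product above turns it into a sum, so the quantity being maximized is:

```latex
\log P(\text{labels} \mid \text{data}; w) \;=\; \sum_i \log g\!\left(y_i\, w^{\top} x_i\right)
```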

Page 25

Optimizing

Lots of options. Easiest option: gradient ascent.

The learning-rate parameter (the step size, often written η) can be chosen in many ways.
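
A minimal gradient-ascent sketch for the log-likelihood above, assuming ±1 labels and using `eta` as the name for the learning rate:

```python
import numpy as np

def g(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def gradient_ascent(X, y, eta=0.1, n_steps=1000):
    """Fit w by gradient ascent on sum_i log g(y_i * w . x_i).

    X: (N, D) feature matrix, y: (N,) labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        margins = y * (X @ w)                    # y_i * w . x_i
        grad = X.T @ (y * (1.0 - g(margins)))    # sum_i y_i x_i (1 - g(y_i w.x_i))
        w = w + eta * grad                       # take a step uphill
    return w
```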

Page 26

Choosing the learning rate

My (current) personal favorite method:
- Choose some value for the learning rate
- Update w and compute the new probability
- If the new probability does not rise, divide the learning rate by 2
- Otherwise, multiply it by 1.1 (or something similar)

This is called the “Bold-Driver” heuristic.
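
A sketch of the bold-driver idea layered on the update above; the factors 2 and 1.1 come from the slide, while the choice to keep a step only when the probability rises is one reasonable reading rather than something the slide specifies:

```python
import numpy as np

def g(z): return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(X, y, w):
    """Objective being maximized: sum_i log g(y_i * w . x_i)."""
    return np.sum(np.log(g(y * (X @ w))))

def bold_driver_ascent(X, y, eta=0.1, n_steps=1000):
    w = np.zeros(X.shape[1])
    best = log_likelihood(X, y, w)
    for _ in range(n_steps):
        grad = X.T @ (y * (1.0 - g(y * (X @ w))))
        w_candidate = w + eta * grad
        new = log_likelihood(X, y, w_candidate)
        if new <= best:
            eta /= 2.0                 # probability did not rise: shrink the step
        else:
            w, best = w_candidate, new
            eta *= 1.1                 # probability rose: keep the step, grow eta
    return w
```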

Page 27

Faster Option

Computing the gradient requires summing over every training example

This could be slow for a large training set.

Speed-up: Stochastic Gradient Ascent. Instead of computing the gradient over the whole training set, choose one point at random and do the update based on that one point.
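
The single-point version of the same update, again with the assumed names from the earlier sketches:

```python
import numpy as np

def g(z): return 1.0 / (1.0 + np.exp(-z))

def stochastic_gradient_ascent(X, y, eta=0.01, n_steps=10000):
    """Each step uses one randomly chosen training point instead of the full sum."""
    rng = np.random.default_rng()
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        i = rng.integers(len(X))                        # pick one point at random
        margin = y[i] * (X[i] @ w)
        w = w + eta * y[i] * X[i] * (1.0 - g(margin))   # update from that point only
    return w
```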

Page 28

Limitations

Remember, we are only separating the two classes with a line

Separate this data with a line:

This is a fundamental problem: most things can't be separated by a line.

Page 29

Overcoming these limitations

Two options:
- Train on a more complicated function (quadratic, cubic)
- Make a new set of features

Page 30

Advantages

We achieve non-linear classification by doing linear classification on non-linear transformations of the features

Only the feature-generation code has to be rewritten; the learning code stays the same.
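
An illustrative feature-generation step for the 2D case: map (x1, x2) to quadratic features, then run the same linear/logistic learning code on the expanded vectors. The mapping below is one example of such a transformation, not the specific one from the lecture.

```python
import numpy as np

def quadratic_features(X):
    """Map 2D points (x1, x2) to [1, x1, x2, x1^2, x2^2, x1*x2].

    A line in this 6D feature space corresponds to a conic section
    (e.g. an ellipse) back in the original 2D space, so data that is
    not linearly separable can become separable.
    """
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2], axis=1)

# Usage with the earlier gradient-ascent sketch (same learning code):
# w = gradient_ascent(quadratic_features(X), y)
```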

Page 31

Nearest Neighbor Classifier

Is the “?” an x or an o?

?

Page 32

Nearest Neighbor Classifier

Is the “?” an x or an o?

?

Page 33

Nearest Neighbor Classifier

Is the “?” an x or an o?

?

Page 34

Basic idea

For your new example, find the k nearest neighbors in the training set

Each neighbor casts a vote; the label with the most votes wins.

Disadvantages:
- Have to find the nearest neighbors
- Can be slow for a large training set
- Good approximate methods are available (LSH, Indyk)
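
A brute-force sketch of this voting scheme (exact search; the approximate LSH methods mentioned above replace the distance computation):

```python
import numpy as np
from collections import Counter

def knn_classify(query, X_train, y_train, k=3):
    """Label `query` by a majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - query, axis=1)  # Euclidean distance to every training point
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    votes = Counter(y_train[nearest])                    # each neighbor casts a vote
    return votes.most_common(1)[0][0]                    # label with the most votes wins
```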