Page 1

Computer vision: models, learning and inference

Chapter 9 Classification models

Please send errata to [email protected]

Page 2

Structure


• Logistic regression
• Bayesian logistic regression
• Non-linear logistic regression
• Kernelization and Gaussian process classification
• Incremental fitting, boosting and trees
• Multi-class classification
• Random classification trees
• Non-probabilistic classification
• Applications

Page 3

Models for machine vision

Page 4

Example application: Gender Classification

Page 5

Type 1: Model Pr(w|x) - Discriminative

How to model Pr(w|x)?
– Choose an appropriate form for Pr(w)
– Make parameters a function of x
– Function takes parameters θ that define its shape

Learning algorithm: learn parameters θ from training data x, w
Inference algorithm: just evaluate Pr(w|x)

Page 6

Logistic Regression

Consider the two-class problem:
• Choose a Bernoulli distribution over the world state w
• Make its parameter λ a function of x

Model the activation with a linear function of the data:

a = φ₀ + φᵀx

This creates a number between −∞ and ∞. Map it to the range [0, 1] with the logistic sigmoid:

λ = sig[a] = 1 / (1 + exp[−a])

giving the model Pr(w|x) = Bern_w[ sig[φ₀ + φᵀx] ].
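A minimal numpy sketch of this model; the function names and example parameter values are illustrative, not fitted:

```python
import numpy as np

def sigmoid(a):
    """Map any real activation to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def pr_w_given_x(x, phi0, phi):
    """Return Pr(w = 1 | x) under the logistic regression model."""
    a = phi0 + phi @ x          # linear activation, in (-inf, inf)
    return sigmoid(a)           # Bernoulli parameter lambda, in (0, 1)

x = np.array([0.5, -1.2])                 # illustrative 2D feature vector
phi0, phi = 0.3, np.array([1.0, -0.5])    # illustrative parameters
print(pr_w_given_x(x, phi0, phi))
```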

Page 7

Two parameters: the offset φ₀ and the gradient φ

Learning by standard methods (ML, MAP, Bayesian)
Inference: just evaluate Pr(w|x)

Page 8

Neater Notation

To make notation easier to handle, we
• Attach a 1 to the start of every data vector:  x ← [1  xᵀ]ᵀ
• Attach the offset φ₀ to the start of the gradient vector φ:  φ ← [φ₀  φᵀ]ᵀ

New model:

Pr(w|x) = Bern_w[ sig[φᵀx] ]

Page 9

Logistic regression

Page 10

Maximum Likelihood

Likelihood of the training pairs {xᵢ, wᵢ}:

Pr(w|X, φ) = ∏ᵢ λᵢ^{wᵢ} (1 − λᵢ)^{1−wᵢ},  where λᵢ = sig[φᵀxᵢ]

Take logarithm:

L = Σᵢ [ wᵢ log sig[φᵀxᵢ] + (1 − wᵢ) log(1 − sig[φᵀxᵢ]) ]

Take derivative:

∂L/∂φ = −Σᵢ ( sig[φᵀxᵢ] − wᵢ ) xᵢ
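A short numpy sketch of the log likelihood and its gradient; variable names are illustrative, and X is assumed to carry the leading 1 from the neater notation above:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def log_likelihood(phi, X, w):
    """X: N x D data matrix (first column all ones), w: N binary labels."""
    lam = sigmoid(X @ phi)
    return np.sum(w * np.log(lam) + (1 - w) * np.log(1 - lam))

def gradient(phi, X, w):
    """dL/dphi = -sum_i (sig(a_i) - w_i) x_i."""
    return -X.T @ (sigmoid(X @ phi) - w)
```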

Page 11

Derivatives

Unfortunately, there is no closed-form solution – we cannot get an expression for φ in terms of x and w.

Have to use a general purpose technique:

“iterative non-linear optimization”

Page 12

Optimization

Goal: minimize a cost function (also called an objective function)

φ̂ = argmin_φ f[φ]

How can we find the minimum?

Basic idea:
• Start with an estimate φ[0]
• Take a series of small steps to new estimates φ[1], φ[2], …
• Make sure that each step decreases the cost
• When we can no longer improve, we must be at a minimum

Page 13

Local Minima

Page 14

Convexity

If a function is convex, then it has only a single minimum. We can tell whether a function is convex by looking at its second derivatives: it is convex if the Hessian is positive definite everywhere.

Page 16

Gradient Based Optimization

• Choose a search direction s based on the local properties of the function
• Perform an intensive search along the chosen direction for the best step size, λ̂ = argmin_λ f[φ[t] + λs]. This is called line search.
• Then set

φ[t+1] = φ[t] + λ̂ s

Page 17

Gradient Descent

Consider standing on a hillside

Look at gradient where you are standing

Find the steepest direction downhill

Walk in that direction for some distance (line search)

Page 18

Finite differences

What if we can’t compute the gradient?

Compute a finite-difference approximation:

∂f/∂φⱼ ≈ ( f[φ + a·eⱼ] − f[φ] ) / a

where eⱼ is the unit vector in the jth direction and a is a small step.
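A sketch of the finite-difference approximation, here also used to check against a known gradient; the test function and step size are illustrative choices:

```python
import numpy as np

def finite_difference_gradient(f, phi, step=1e-6):
    """Approximate df/dphi_j one coordinate at a time."""
    grad = np.zeros_like(phi)
    for j in range(len(phi)):
        e_j = np.zeros_like(phi)   # unit vector in the j-th direction
        e_j[j] = 1.0
        grad[j] = (f(phi + step * e_j) - f(phi)) / step
    return grad

# Check against the known gradient of f(phi) = ||phi||^2, which is 2*phi.
phi = np.array([1.0, -2.0, 0.5])
print(finite_difference_gradient(lambda p: p @ p, phi))
```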

Page 19

Steepest Descent Problems

Page 20

Second Derivatives

In higher dimensions, the second derivatives determine how far we should move in each direction, and so change the best direction to move in.

Page 21

Newton’s Method

Approximate the function with a truncated Taylor expansion around the current estimate φ[t]:

f[φ] ≈ f[φ[t]] + (φ − φ[t])ᵀg + ½(φ − φ[t])ᵀH(φ − φ[t])

where g and H are the gradient and Hessian at φ[t]. Take the derivative and set it to zero, then re-arrange:

φ[t+1] = φ[t] − H⁻¹g

Adding line search over the step size λ:

φ[t+1] = φ[t] − λH⁻¹g

Page 22

Newton’s Method

Matrix of second derivatives is called the Hessian.

Expensive to compute via finite differences.

If the Hessian is positive definite everywhere, then the function is convex.

Page 23

Newton vs. Steepest Descent

Page 24

Line Search

Gradually narrow down the range containing the minimum.

Page 25

Optimization for Logistic Regression

Derivatives of the log likelihood:

∂L/∂φ = −Σᵢ ( sig[aᵢ] − wᵢ ) xᵢ

∂²L/∂φ∂φᵀ = −Σᵢ sig[aᵢ] (1 − sig[aᵢ]) xᵢxᵢᵀ

The Hessian of the negative log likelihood is positive definite! The problem is convex, with a single global minimum.
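Putting the pieces together, a sketch of Newton's method applied to maximum-likelihood logistic regression; the toy data, small ridge term, and iteration count are illustrative choices, not from the book:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic_newton(X, w, n_iters=20):
    """X: N x D data matrix (first column all ones), w: N binary labels."""
    phi = np.zeros(X.shape[1])
    for _ in range(n_iters):
        lam = sigmoid(X @ phi)
        grad = X.T @ (lam - w)                      # gradient of -log likelihood
        H = X.T @ ((lam * (1 - lam))[:, None] * X)  # Hessian, positive (semi-)definite
        phi -= np.linalg.solve(H + 1e-8 * np.eye(len(phi)), grad)  # Newton step
    return phi

# Toy data: one informative dimension plus the constant.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
w = (x + 0.2 * rng.normal(size=100) > 0).astype(float)
X = np.column_stack([np.ones(100), x])
print(fit_logistic_newton(X, w))
```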

Page 27

Maximum likelihood fits

Page 28

Structure


• Logistic regression
• Bayesian logistic regression
• Non-linear logistic regression
• Kernelization and Gaussian process classification
• Incremental fitting, boosting and trees
• Multi-class classification
• Random classification trees
• Non-probabilistic classification
• Applications

Page 30

Bayesian Logistic Regression

Likelihood:

Pr(w|X, φ) = ∏ᵢ sig[φᵀxᵢ]^{wᵢ} (1 − sig[φᵀxᵢ])^{1−wᵢ}

Prior (not conjugate):

Pr(φ) = Norm_φ[0, σₚ²I]

Apply Bayes’ rule:

Pr(φ|X, w) = Pr(w|X, φ) Pr(φ) / ∫ Pr(w|X, φ) Pr(φ) dφ

(no closed-form solution for the posterior)

Page 31

Laplace Approximation

Approximate the posterior distribution with a normal:
• Set the mean to the MAP estimate
• Set the covariance so the curvature matches that of the posterior at the MAP estimate

Page 32

Laplace Approximation

Find the MAP solution by optimizing the log posterior:

φ̂ = argmax_φ [ Σᵢ log Pr(wᵢ|xᵢ, φ) + log Pr(φ) ]

Approximate with a normal:

q(φ) = Norm_φ[μ, Σ]

where μ = φ̂ and Σ = −H⁻¹, with H the Hessian of the log posterior evaluated at φ̂.
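A sketch of the Laplace approximation under these definitions; the prior variance and iteration count are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_approximation(X, w, sigma2_prior=10.0, n_iters=30):
    """Return mean mu (the MAP estimate) and covariance Sigma of the normal
    approximation to the posterior over phi."""
    D = X.shape[1]
    phi = np.zeros(D)
    for _ in range(n_iters):                 # Newton steps on -log posterior
        lam = sigmoid(X @ phi)
        grad = X.T @ (lam - w) + phi / sigma2_prior
        H = X.T @ ((lam * (1 - lam))[:, None] * X) + np.eye(D) / sigma2_prior
        phi -= np.linalg.solve(H, grad)
    lam = sigmoid(X @ phi)
    H = X.T @ ((lam * (1 - lam))[:, None] * X) + np.eye(D) / sigma2_prior
    return phi, np.linalg.inv(H)             # mu = MAP, Sigma = H^{-1}
```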

Page 33

Laplace Approximation

[Figure: the prior, the actual posterior, and its normal (Laplace) approximation]

Page 34

Inference

The predictive distribution marginalizes over the parameters:

Pr(w*|x*, X, w) = ∫ Pr(w*|x*, φ) q(φ) dφ

Using the transformation properties of normal distributions, we can re-express this in terms of the activation a = φᵀx*:

Pr(a) = Norm_a[μₐ, σₐ²],  with μₐ = μᵀx*,  σₐ² = x*ᵀΣx*

so that Pr(w* = 1|x*, X, w) = ∫ sig[a] Pr(a) da.

Page 35

Approximation of the Integral

The integral of a sigmoid against a normal has no closed form, but is well approximated by:

∫ sig[a] Norm_a[μₐ, σₐ²] da ≈ sig[ μₐ / √(1 + πσₐ²/8) ]
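A sketch of the resulting Bayesian prediction, reusing the mean and covariance from the Laplace approximation sketched above:

```python
import numpy as np

def predict_bayesian(x_star, mu, Sigma):
    """Pr(w*=1 | x*) ~= sig( mu_a / sqrt(1 + pi * sigma_a^2 / 8) )."""
    mu_a = mu @ x_star                  # mean of activation a = phi^T x*
    var_a = x_star @ Sigma @ x_star     # variance of activation
    return 1.0 / (1.0 + np.exp(-mu_a / np.sqrt(1.0 + np.pi * var_a / 8.0)))
```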

Page 36

Bayesian Solution

Page 37

Structure


• Logistic regression
• Bayesian logistic regression
• Non-linear logistic regression
• Kernelization and Gaussian process classification
• Incremental fitting, boosting and trees
• Multi-class classification
• Random classification trees
• Non-probabilistic classification
• Applications

Page 39

Non-linear logistic regression

Same idea as for regression:

• Apply a non-linear transformation to the data:  z = f[x]

• Build the model as usual:  Pr(w|x) = Bern_w[ sig[φᵀz] ]

Page 40

Non-linear logistic regression

Example transformations:
• Heaviside step functions of projections:  zₖ = heaviside[αₖᵀx]
• Arc tangent functions:  zₖ = arctan[αₖᵀx]
• Radial basis functions:  zₖ = exp[ −(x − αₖ)ᵀ(x − αₖ) / λ² ]

Fit using non-linear optimization, now also over the transformation parameters {αₖ}.
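A sketch of one such transformation, radial basis functions, after which the linear model is fitted on z exactly as before; the centers and bandwidth are illustrative choices:

```python
import numpy as np

def rbf_transform(X, centers, bandwidth=1.0):
    """z_k = exp(-||x - center_k||^2 / (2 * bandwidth^2)), plus a constant 1."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Z = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return np.column_stack([np.ones(len(X)), Z])

# Usage: Z = rbf_transform(X, centers); then fit logistic regression on Z
# exactly as before (e.g. with the Newton routine sketched earlier).
```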

Page 41

Non-linear logistic regression in 1D

Page 42

Non-linear logistic regression in 2D

Page 43

Structure


• Logistic regression
• Bayesian logistic regression
• Non-linear logistic regression
• Kernelization and Gaussian process classification
• Incremental fitting, boosting and trees
• Multi-class classification
• Random classification trees
• Non-probabilistic classification
• Applications

Page 45

Dual Logistic Regression

KEY IDEA:

The gradient parameter φ is just a vector in the data space.

It can be represented as a weighted sum of the data points:

φ = Xψ

Now solve for ψ. There is one parameter per training example.

Page 46

Maximum Likelihood

Likelihood:

Pr(w|X, ψ) = ∏ᵢ sig[ψᵀXᵀxᵢ]^{wᵢ} (1 − sig[ψᵀXᵀxᵢ])^{1−wᵢ}

Derivatives:

∂L/∂ψ = −Σᵢ ( sig[aᵢ] − wᵢ ) Xᵀxᵢ

These depend on the data only through inner products xᵢᵀxⱼ!
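A sketch of the dual formulation with an RBF kernel substituted for the inner products; the kernel choice and names are illustrative:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def rbf_kernel(A, B, bandwidth=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)) for all pairs."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def dual_activations(psi, K):
    """a_i = sum_j psi_j k(x_j, x_i): one dual parameter psi_j per example."""
    return K @ psi

def dual_gradient(psi, K, w):
    """Gradient of -log likelihood with respect to the dual parameters psi."""
    return K.T @ (sigmoid(K @ psi) - w)
```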

Page 47

Kernel Logistic Regression

Since everything depends only on inner products, we can replace them with a kernel function k[xᵢ, xⱼ] (kernelization).

Page 48

ML vs. Bayesian

The Bayesian case is known as Gaussian process classification.

Page 49

Relevance Vector Classification

Apply a sparse prior to the dual variables ψ:

Pr(ψ) = ∏_d Stud_{ψ_d}[0, 1, ν]

As before, write this as a marginalization over hidden variables {h_d}:

Pr(ψ) = ∫ ∏_d Norm_{ψ_d}[0, 1/h_d] Gam_{h_d}[ν/2, ν/2] dh

Page 50

Relevance Vector Classification

Applying this sparse prior to the dual variables gives the likelihood:

Pr(w|X, H) = ∫ Pr(w|X, ψ) ∏_d Norm_{ψ_d}[0, 1/h_d] dψ

Page 51

Relevance Vector Classification

Use the Laplace approximation result to approximate this marginal likelihood, giving an expression for Pr(w|X, H) in terms of the mean and covariance of the approximating normal.

Page 52

Relevance Vector Classification

Apply a second approximation to the previous result.

To solve, alternately update the hidden variables in H and the mean and variance of the Laplace approximation.

Page 53

Relevance vector classification

Results:

Most hidden variables increase to large values.

This means the prior over the corresponding dual variable becomes very tight around zero, so that variable drops out of the solution.

The final solution depends on only a small number of examples – it is sparse and efficient.

Page 54

Structure


• Logistic regression
• Bayesian logistic regression
• Non-linear logistic regression
• Kernelization and Gaussian process classification
• Incremental fitting & boosting
• Multi-class classification
• Random classification trees
• Non-probabilistic classification
• Applications

Page 56

Incremental Fitting

Previously we wrote the activation as:

a = φᵀx

Now write it as a sum of non-linear basis functions:

a = φ₀ + Σₖ φₖ f[x, ξₖ]

where ξₖ are the parameters of the kth basis function.

Page 57

Incremental Fitting

KEY IDEA: Greedily add terms one at a time, keeping earlier terms fixed.

STAGE 1: Fit φ₀, φ₁, ξ₁
STAGE 2: Fit φ₀, φ₂, ξ₂
…
STAGE K: Fit φ₀, φ_K, ξ_K

Page 58

Incremental Fitting

Page 59

Derivative

It is worth considering the form of the derivative in the context of the incremental fitting procedure: each point's contribution is the difference between its actual label wᵢ and its predicted label sig[aᵢ].

Points contribute more to the derivative if they are still misclassified: the later classifiers become increasingly specialized to the difficult examples.

Page 60

Boosting

Incremental fitting with step functions:

f[x, ξ] = heaviside[αᵀx]

Each step function is called a “weak classifier”.

We can’t take the derivative with respect to α, so we have to use exhaustive search over a pool of pre-defined weak classifiers.
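A sketch of one boosting stage under these assumptions: candidate weak-classifier outputs are precomputed, and both the classifier and its weight are chosen by coarse exhaustive search (the candidate pool and weight grid are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def log_lik(a, w):
    """Bernoulli log likelihood of labels w given activations a."""
    lam = np.clip(sigmoid(a), 1e-12, 1 - 1e-12)
    return np.sum(w * np.log(lam) + (1 - w) * np.log(1 - lam))

def add_weak_classifier(a_current, w, candidates, weights=np.linspace(-2, 2, 41)):
    """candidates: list of N-vectors of {0,1} weak-classifier outputs."""
    best = (-np.inf, None, 0.0)
    for c in candidates:                 # exhaustive search, no derivatives
        for beta in weights:             # coarse search over the new weight
            ll = log_lik(a_current + beta * c, w)
            if ll > best[0]:
                best = (ll, c, beta)
    _, c_best, beta_best = best
    return a_current + beta_best * c_best, c_best, beta_best
```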

Page 61

Boosting

Page 62

Branching Logistic Regression

New activation:

a = g[x, ω] φ₁ᵀx + (1 − g[x, ω]) φ₂ᵀx

The term g[x, ω] is a gating function:

• It returns a number between 0 and 1
• If 0, we get one logistic regression model
• If 1, we get a different logistic regression model

This is a different way to make non-linear classifiers; a sketch follows.
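A sketch of this branching activation; all parameter names are illustrative:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def branching_activation(x, omega, phi1, phi2):
    """a = g * phi1^T x + (1 - g) * phi2^T x, with gate g = sig(omega^T x)."""
    g = sigmoid(omega @ x)          # gating value in (0, 1)
    return g * (phi1 @ x) + (1.0 - g) * (phi2 @ x)
```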

Page 63

Branching Logistic Regression

Page 64

Logistic Classification Trees

Page 65

Structure


• Logistic regression
• Bayesian logistic regression
• Non-linear logistic regression
• Kernelization and Gaussian process classification
• Incremental fitting, boosting and trees
• Multi-class classification
• Random classification trees
• Non-probabilistic classification
• Applications

Page 66

Multiclass Logistic Regression

For multi-class recognition, choose a categorical distribution over the world state w and then make the parameters of this distribution a function of x.

The softmax function maps the real activations {aₖ} to numbers between zero and one that sum to one:

Pr(w = k|x) = softmaxₖ[a₁, …, a_K] = exp[aₖ] / Σⱼ exp[aⱼ],  where aₖ = φₖᵀx

The parameters are the vectors {φₖ}.

Page 67

Multiclass Logistic Regression

The softmax function maps activations, which can take any real value, to the parameters of a categorical distribution, which lie between 0 and 1.

Page 68

Multiclass Logistic Regression

To learn the model, maximize the log likelihood:

L = Σᵢ Σₖ δ[wᵢ − k] log softmaxₖ[a₁, …, a_K]

There is no closed-form solution; learn with non-linear optimization, where

∂L/∂φₖ = −Σᵢ ( Pr(w = k|xᵢ) − δ[wᵢ − k] ) xᵢ
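A sketch of the softmax model and the corresponding gradient of the negative log likelihood; names are illustrative:

```python
import numpy as np

def softmax(A):
    """Rows of A are activations [a_1..a_K]; returns rows summing to one."""
    A = A - A.max(axis=1, keepdims=True)     # stabilize the exponentials
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def neg_log_lik_and_grad(Phi, X, w, K):
    """Phi: D x K parameter vectors, X: N x D, w: N integer labels in 0..K-1."""
    P = softmax(X @ Phi)                     # N x K class probabilities
    onehot = np.eye(K)[w]                    # delta[w_i = k]
    nll = -np.sum(np.log(P[np.arange(len(w)), w]))
    grad = X.T @ (P - onehot)                # D x K gradient of -L
    return nll, grad
```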

Page 69

Structure


• Logistic regression
• Bayesian logistic regression
• Non-linear logistic regression
• Kernelization and Gaussian process classification
• Incremental fitting, boosting and trees
• Multi-class classification
• Random classification trees
• Non-probabilistic classification
• Applications

Page 70

Random classification tree

Key idea:
• Binary tree
• Randomly chosen function at each split
• Choose the threshold t to maximize the log probability of the labels

For a given threshold, the parameters in each branch can be computed in closed form. A sketch of one split follows.
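A sketch of choosing one split under these assumptions: random linear projection functions and candidate thresholds are scored by the closed-form Bernoulli log probability of each branch (the projection functions and threshold grid are illustrative choices):

```python
import numpy as np

def branch_log_prob(w):
    """Log probability of binary labels under their closed-form ML Bernoulli."""
    n1 = w.sum(); n = len(w)
    if n == 0 or n1 == 0 or n1 == n:
        return 0.0                        # pure (or empty) branch: probability 1
    lam = n1 / n
    return n1 * np.log(lam) + (n - n1) * np.log(1 - lam)

def best_random_split(X, w, n_functions=10, rng=np.random.default_rng(0)):
    """Return (log prob, split function, threshold) for the best random split."""
    best = (-np.inf, None, None)
    for _ in range(n_functions):          # randomly chosen split functions
        f = rng.normal(size=X.shape[1])
        proj = X @ f
        for t in np.quantile(proj, np.linspace(0.1, 0.9, 9)):  # thresholds
            left = proj < t
            score = branch_log_prob(w[left]) + branch_log_prob(w[~left])
            if score > best[0]:
                best = (score, f, t)
    return best
```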

Page 71

Random classification tree


Related models:

Fern:
• A tree where all of the functions at a level are the same
• Thresholds may be the same or different
• Very efficient to implement

Forest:
• A collection of trees
• Average their results to get a more robust answer
• Similar to the ‘Bayesian’ approach – an average over models with different parameters

Page 72

Structure


• Logistic regression
• Bayesian logistic regression
• Non-linear logistic regression
• Kernelization and Gaussian process classification
• Incremental fitting, boosting and trees
• Multi-class classification
• Random classification trees
• Non-probabilistic classification
• Applications

Page 73

Non-probabilistic classifiers

Most people use non-probabilistic classification methods such as neural networks, AdaBoost, and support vector machines. This is largely for historical reasons.

Probabilistic approaches:
• No serious disadvantages
• Naturally produce estimates of uncertainty
• Easily extensible to the multi-class case
• Easily related to each other

Page 74

Non-probabilistic classifiers

Multi-layer perceptron (neural network):
• Non-linear logistic regression with sigmoid basis functions
• Learning is known as back propagation
• The transformed variable z is the hidden layer

AdaBoost:
• Very closely related to LogitBoost
• Performance is very similar

Support vector machines:
• Similar to relevance vector classification, but the objective function is convex
• No estimate of certainty
• Not easily extended to the multi-class case
• Produce solutions that are less sparse
• More restrictions on the kernel function

Page 75

Structure


• Logistic regression
• Bayesian logistic regression
• Non-linear logistic regression
• Kernelization and Gaussian process classification
• Incremental fitting, boosting and trees
• Multi-class classification
• Random classification trees
• Non-probabilistic classification
• Applications

Page 76

Gender Classification

Incremental logistic regression

300 arc tan basis functions

Results: 87.5% correct (humans = 95%)

Page 77

Fast Face Detection (Viola and Jones 2001)

Page 78

Computing Haar Features

Page 79

Pedestrian Detection

Page 80

Semantic segmentation

Page 81

Recovering surface layout

Page 82

Recovering body pose
