Computer vision: models, learning and inference
Chapter 4: Fitting Probability Models
Slides ©2011 Simon J.D. Prince
Transcript
Page 1: Title

Computer vision: models, learning and inference

Chapter 4 Fitting Probability Models

Page 2: Structure

• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution

Page 3: Maximum likelihood

Fitting: as the name suggests, find the parameters under which the data are most likely.

Predictive density: evaluate a new data point under the probability distribution with the best-fitting parameters.

We have assumed that the data are independent (hence the product of individual likelihoods).
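
The equations on this slide are not reproduced in the transcript; in standard notation the ML criterion and predictive density are:

    \hat{\theta} = \operatorname{argmax}_{\theta} \prod_{i=1}^{I} \Pr(x_i \mid \theta)

    \Pr(x^{*} \mid \hat{\theta})   % predictive density: evaluate a new point x* under the fitted model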

Page 4: Maximum a posteriori (MAP)

Fitting: as the name suggests, we find the parameters which maximize the posterior probability.

Again we have assumed that the data are independent.

Page 5: Maximum a posteriori (MAP)

Fitting: we find the parameters which maximize the posterior probability.

Since the denominator does not depend on the parameters, we can instead maximize the product of the likelihood and the prior.
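
The MAP equations referred to on these two slides are not reproduced in the transcript; in standard notation they are:

    \hat{\theta} = \operatorname{argmax}_{\theta} \Pr(\theta \mid x_{1 \ldots I})
                 = \operatorname{argmax}_{\theta} \frac{\prod_{i=1}^{I} \Pr(x_i \mid \theta)\, \Pr(\theta)}{\Pr(x_{1 \ldots I})}
                 = \operatorname{argmax}_{\theta} \prod_{i=1}^{I} \Pr(x_i \mid \theta)\, \Pr(\theta)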

Page 6: Maximum a posteriori (MAP)

Predictive density: evaluate a new data point under the probability distribution with the MAP parameters.

Page 7: Bayesian approach

Fitting: compute the posterior distribution over possible parameter values using Bayes' rule.

Principle: why pick one set of parameters? There are many values that could have explained the data, so try to capture all of the possibilities.
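
The Bayes' rule expression on this slide is not reproduced in the transcript; in standard notation it is:

    \Pr(\theta \mid x_{1 \ldots I}) = \frac{\prod_{i=1}^{I} \Pr(x_i \mid \theta)\, \Pr(\theta)}{\Pr(x_{1 \ldots I})}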

Page 8: Bayesian approach

Predictive density:
• Each possible parameter value makes a prediction
• Some parameter values are more probable than others

Make a prediction that is an infinite weighted sum (an integral) of the predictions for each parameter value, where the weights are the posterior probabilities (see the sketch below).
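
In symbols (the integral itself is not reproduced in the transcript):

    \Pr(x^{*} \mid x_{1 \ldots I}) = \int \Pr(x^{*} \mid \theta)\, \Pr(\theta \mid x_{1 \ldots I})\, d\theta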

Page 9: Predictive densities for the three methods

Maximum a posteriori: evaluate the new data point under the probability distribution with the MAP parameters.

Maximum likelihood: evaluate the new data point under the probability distribution with the ML parameters.

Bayesian: calculate a weighted sum of the predictions from all possible values of the parameters.

Page 10: Predictive densities for the three methods

How can we rationalize the different forms?

Consider the ML and MAP estimates as probability distributions with zero probability everywhere except at the estimate (i.e. delta functions).
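
Substituting such a delta function into the Bayesian predictive integral recovers the point-estimate forms (a reconstruction; the slide's equation is not in the transcript):

    \int \Pr(x^{*} \mid \theta)\, \delta(\theta - \hat{\theta})\, d\theta = \Pr(x^{*} \mid \hat{\theta})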

Page 11: Structure

• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution

Page 12: Univariate normal distribution

The univariate normal distribution describes a single continuous variable. It takes two parameters, μ and σ² > 0. For short we write it using the shorthand sketched below.
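
The density and shorthand on this slide are not reproduced in the transcript; the standard forms (the Norm shorthand follows the book's notation as I recall it, so treat it as an assumption) are:

    \Pr(x \mid \mu, \sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left[ -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right]

    \Pr(x) = \mathrm{Norm}_{x}[\mu, \sigma^{2}]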

Page 13: Normal inverse gamma distribution

Defined over two variables: μ and σ² > 0.

It has four parameters α, β, γ > 0 and δ. (The density and the shorthand are sketched below.)
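
A reconstruction of the density and shorthand (the NormInvGam notation is an assumption on my part; the density is the product of a normal over μ with variance σ²/γ and an inverse gamma over σ²):

    \Pr(\mu, \sigma^{2}) = \frac{\sqrt{\gamma}}{\sigma\sqrt{2\pi}} \, \frac{\beta^{\alpha}}{\Gamma(\alpha)} \left(\frac{1}{\sigma^{2}}\right)^{\alpha+1} \exp\!\left[ -\frac{2\beta + \gamma(\delta-\mu)^{2}}{2\sigma^{2}} \right]

    \Pr(\mu, \sigma^{2}) = \mathrm{NormInvGam}_{\mu,\sigma^{2}}[\alpha, \beta, \gamma, \delta]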

Page 14: Ready?

• Approach the same problem three different ways:
  – Learn ML parameters
  – Learn MAP parameters
  – Learn Bayesian distribution of parameters
• Will we get the same results?

Page 15: Fitting the normal distribution: ML

As the name suggests, we find the parameters under which the data are most likely. The likelihood of each point is given by the normal pdf.

Page 16: Fitting the normal distribution: ML

Page 17: Fitting the normal distribution: ML

Plot the surface of likelihoods as a function of the possible parameter values; the ML solution is at the peak.

Page 18: Fitting the normal distribution: ML

Algebraically, we maximize the product of the individual normal likelihoods, or alternatively we can maximize its logarithm (sketched below).
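
A reconstruction of the criterion and its logarithm (not reproduced in the transcript):

    \hat{\mu}, \hat{\sigma}^{2} = \operatorname{argmax}_{\mu,\sigma^{2}} \prod_{i=1}^{I} \mathrm{Norm}_{x_i}[\mu, \sigma^{2}]
                                = \operatorname{argmax}_{\mu,\sigma^{2}} \sum_{i=1}^{I} \log \mathrm{Norm}_{x_i}[\mu, \sigma^{2}]

    where \quad \log \mathrm{Norm}_{x_i}[\mu, \sigma^{2}] = -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\log \sigma^{2} - \frac{(x_i - \mu)^{2}}{2\sigma^{2}}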

Page 19: Why the logarithm?

The logarithm is a monotonic transformation, so the position of the peak stays in the same place, but the log likelihood is easier to work with.

Page 20: Fitting the normal distribution: ML

How do we maximize a function? Take the derivative, equate it to zero, and solve (solution sketched below).
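
Setting the derivatives of the log likelihood to zero gives the standard solution (reconstructed here; not in the transcript):

    \frac{\partial L}{\partial \mu} = \sum_{i=1}^{I} \frac{x_i - \mu}{\sigma^{2}} = 0
        \;\Rightarrow\; \hat{\mu} = \frac{1}{I}\sum_{i=1}^{I} x_i

    \frac{\partial L}{\partial \sigma^{2}} = -\frac{I}{2\sigma^{2}} + \sum_{i=1}^{I} \frac{(x_i - \mu)^{2}}{2\sigma^{4}} = 0
        \;\Rightarrow\; \hat{\sigma}^{2} = \frac{1}{I}\sum_{i=1}^{I} (x_i - \hat{\mu})^{2}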

Page 21: Fitting the normal distribution: ML

Maximum likelihood solution: the sample mean and the average squared deviation from it. Should look familiar!
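
A minimal numerical sketch of this ML fit (assuming NumPy; the data values here are made up for illustration):

    import numpy as np

    x = np.array([1.2, 0.7, 2.3, 1.8, 1.1])   # hypothetical 1-D observations

    mu_ml = x.mean()                           # ML estimate of the mean
    sigma2_ml = ((x - mu_ml) ** 2).mean()      # ML estimate of the variance (divides by I, not I-1)

    print(mu_ml, sigma2_ml)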

Page 22: Least squares

Maximum likelihood for the normal distribution gives the 'least squares' fitting criterion.
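
The connection (equations not reproduced in the transcript): for a fixed variance, maximizing the log likelihood over μ is the same as minimizing a sum of squared deviations,

    \operatorname{argmax}_{\mu} \sum_{i=1}^{I} \log \mathrm{Norm}_{x_i}[\mu, \sigma^{2}] = \operatorname{argmin}_{\mu} \sum_{i=1}^{I} (x_i - \mu)^{2}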

Page 23: Fitting the normal distribution: MAP

Fitting: as the name suggests, we find the parameters which maximize the posterior probability. The likelihood is the normal pdf.

Page 24: Fitting the normal distribution: MAP

Prior: use the conjugate prior, the normal-scaled inverse gamma.
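
Putting the two slides together, the MAP criterion for this model is (a reconstruction using the shorthands introduced above):

    \hat{\mu}, \hat{\sigma}^{2} = \operatorname{argmax}_{\mu,\sigma^{2}} \left[ \prod_{i=1}^{I} \mathrm{Norm}_{x_i}[\mu, \sigma^{2}] \right] \mathrm{NormInvGam}_{\mu,\sigma^{2}}[\alpha, \beta, \gamma, \delta]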

Page 25: Fitting the normal distribution: MAP

[Figure: the likelihood, the prior, and the resulting posterior.]

Page 26: Fitting the normal distribution: MAP

Again we maximize the logarithm; this does not change the position of the maximum.

Page 27: Fitting the normal distribution: MAP

MAP solution: the mean can be rewritten as a weighted sum of the data mean and the prior mean (sketched below).
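
A reconstruction of the MAP solution for this model (obtained by differentiating the log posterior and setting it to zero):

    \hat{\mu} = \frac{\sum_{i=1}^{I} x_i + \gamma\delta}{I + \gamma} = \frac{I\bar{x} + \gamma\delta}{I + \gamma}

    \hat{\sigma}^{2} = \frac{\sum_{i=1}^{I} (x_i - \hat{\mu})^{2} + 2\beta + \gamma(\delta - \hat{\mu})^{2}}{I + 3 + 2\alpha}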

Page 28: Fitting the normal distribution: MAP

[Figure: MAP fits with 50 data points, 5 data points, and 1 data point.]

Page 29: Fitting the normal distribution: Bayesian approach

Fitting: compute the posterior distribution over the parameters using Bayes' rule.

Page 30: Fitting the normal distribution: Bayesian approach

Compute the posterior distribution using Bayes' rule. The two constants of proportionality must cancel out, or the left-hand side would not be a valid pdf.

Page 31: Fitting the normal distribution: Bayesian approach

Compute the posterior distribution using Bayes' rule: because the prior is conjugate to the normal likelihood, the posterior is again a normal-scaled inverse gamma with updated parameters (sketched below).
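
A reconstruction of the conjugate posterior update (standard normal / normal-scaled-inverse-gamma algebra; the tildes are my labels for the updated parameters):

    \Pr(\mu, \sigma^{2} \mid x_{1 \ldots I}) = \mathrm{NormInvGam}_{\mu,\sigma^{2}}[\tilde{\alpha}, \tilde{\beta}, \tilde{\gamma}, \tilde{\delta}]

    where \quad \tilde{\alpha} = \alpha + \frac{I}{2}, \qquad \tilde{\gamma} = \gamma + I, \qquad
          \tilde{\delta} = \frac{\gamma\delta + \sum_i x_i}{\gamma + I}, \qquad
          \tilde{\beta} = \beta + \frac{\sum_i x_i^{2}}{2} + \frac{\gamma\delta^{2}}{2} - \frac{\tilde{\gamma}\tilde{\delta}^{2}}{2}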

Page 32: Fitting the normal distribution: Bayesian approach

Predictive density: take a weighted sum of the predictions from the different parameter values.

[Figure: the posterior, and samples drawn from the posterior.]

Page 33: Fitting the normal distribution: Bayesian approach

Predictive density: take a weighted sum of the predictions from the different parameter values.

Page 34: Fitting the normal distribution: Bayesian approach

Predictive density: take a weighted sum of the predictions from the different parameter values; the resulting integral can be evaluated in closed form (sketched below).
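
The predictive integral for this model (a reconstruction; it has a closed form because the integrand is proportional to another normal-scaled inverse gamma, leaving only a ratio of normalizing constants):

    \Pr(x^{*} \mid x_{1 \ldots I}) = \iint \mathrm{Norm}_{x^{*}}[\mu, \sigma^{2}]\, \mathrm{NormInvGam}_{\mu,\sigma^{2}}[\tilde{\alpha}, \tilde{\beta}, \tilde{\gamma}, \tilde{\delta}]\, d\mu\, d\sigma^{2}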

Page 35: Fitting the normal distribution: Bayesian approach

[Figure: Bayesian predictive densities with 50 data points, 5 data points, and 1 data point.]

Page 36: Structure

• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution

Page 37: Categorical distribution

The categorical distribution describes a situation with K possible outcomes, x = 1, ..., K. It takes K parameters λ_1, ..., λ_K, which are non-negative and sum to one. For short we write it with the shorthand sketched below.

Alternatively, we can think of each data point as a vector with all elements zero except the kth, e.g. [0,0,0,1,0].
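
A reconstruction of the probability mass function and shorthand (the Cat notation is an assumption on my part):

    \Pr(x = k) = \lambda_k, \qquad \lambda_k \ge 0, \qquad \sum_{k=1}^{K} \lambda_k = 1

    \Pr(x) = \mathrm{Cat}_{x}[\lambda_{1 \ldots K}]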

Page 38: Dirichlet distribution

Defined over K continuous values λ_1, ..., λ_K, which are non-negative and sum to one.

It has K parameters α_k > 0. (Density and shorthand sketched below.)
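
A reconstruction of the density and shorthand (the Dir notation is an assumption; the density is the standard Dirichlet):

    \Pr(\lambda_{1 \ldots K}) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \lambda_k^{\alpha_k - 1}

    \Pr(\lambda_{1 \ldots K}) = \mathrm{Dir}_{\lambda_{1 \ldots K}}[\alpha_{1 \ldots K}]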

Page 39: Categorical distribution: ML

Maximize the product of the individual likelihoods. Remember that Pr(x = k) = λ_k, so the product depends only on the counts N_k = the number of times we observed bin k.

Page 40: Categorical distribution: ML

Instead maximize the log probability, adding a Lagrange multiplier to ensure that the parameters sum to one. Take the derivative, set it to zero, and rearrange (sketched below).
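
A reconstruction of the constrained objective and its solution (ν denotes the Lagrange multiplier):

    L = \sum_{k=1}^{K} N_k \log \lambda_k + \nu\!\left( \sum_{k=1}^{K} \lambda_k - 1 \right)

    \frac{\partial L}{\partial \lambda_k} = \frac{N_k}{\lambda_k} + \nu = 0
        \;\Rightarrow\; \hat{\lambda}_k = \frac{N_k}{\sum_{m=1}^{K} N_m}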

Page 41: Categorical distribution: MAP

MAP criterion: maximize the product of the categorical likelihood and the Dirichlet prior (sketched below).
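
A reconstruction of the MAP criterion for this model:

    \hat{\lambda}_{1 \ldots K} = \operatorname{argmax}_{\lambda} \prod_{i=1}^{I} \mathrm{Cat}_{x_i}[\lambda]\, \mathrm{Dir}_{\lambda}[\alpha_{1 \ldots K}]
                               = \operatorname{argmax}_{\lambda} \prod_{k=1}^{K} \lambda_k^{N_k + \alpha_k - 1}

(the second equality drops terms that do not depend on λ).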

Page 42: Categorical distribution: MAP

Take the derivative, set it to zero, and rearrange (sketched below). With a uniform prior (α_{1...K} = 1), this gives the same result as maximum likelihood.
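
Using the same Lagrange-multiplier argument as before, the MAP solution (a reconstruction) is:

    \hat{\lambda}_k = \frac{N_k + \alpha_k - 1}{\sum_{m=1}^{K} (N_m + \alpha_m - 1)}

With α_{1...K} = 1 the α terms vanish and this reduces to the ML estimate N_k / Σ_m N_m.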

Page 43: Categorical distribution

[Figure: the observed data, five samples from the prior, and five samples from the posterior.]

Page 44: Categorical distribution: Bayesian approach

Compute the posterior distribution over the parameters. The two constants of proportionality must cancel out, or the left-hand side would not be a valid pdf.
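
Because the Dirichlet prior is conjugate to the categorical likelihood, the posterior is again a Dirichlet (a reconstruction; the tilde marks the updated parameters):

    \Pr(\lambda_{1 \ldots K} \mid x_{1 \ldots I}) = \mathrm{Dir}_{\lambda_{1 \ldots K}}[\tilde{\alpha}_{1 \ldots K}],
        \qquad \tilde{\alpha}_k = \alpha_k + N_k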

Page 45: Categorical distribution: Bayesian approach

Compute the predictive distribution. Again the two constants of proportionality must cancel out, or the left-hand side would not be a valid pdf.
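
The predictive distribution (a reconstruction; it is the posterior mean of λ):

    \Pr(x^{*} = k \mid x_{1 \ldots I}) = \int \mathrm{Cat}_{x^{*}}[\lambda]\, \mathrm{Dir}_{\lambda}[\tilde{\alpha}_{1 \ldots K}]\, d\lambda
        = \frac{N_k + \alpha_k}{\sum_{m=1}^{K} (N_m + \alpha_m)}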

Page 46: ML / MAP vs. Bayesian

[Figure: predictive distributions under MAP/ML and under the Bayesian approach.]
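
A minimal numerical sketch comparing the three estimates for the categorical example (assuming NumPy; the data and prior values here are made up for illustration):

    import numpy as np

    x = np.array([0, 2, 2, 1, 2, 0, 2])          # hypothetical observations from K = 3 categories, coded 0..K-1
    K = 3
    alpha = np.full(K, 2.0)                      # assumed Dirichlet prior parameters

    N = np.bincount(x, minlength=K)              # N_k: number of observations in each bin

    lam_ml   = N / N.sum()                                 # maximum likelihood estimate
    lam_map  = (N + alpha - 1) / (N + alpha - 1).sum()     # MAP estimate
    lam_pred = (N + alpha) / (N + alpha).sum()             # Bayesian predictive probabilities

    print(lam_ml, lam_map, lam_pred)

With little data the three can differ noticeably; as the counts grow they move closer together.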

Page 47: Conclusion

• Three ways to fit probability distributions:
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Two worked examples:
  – Normal distribution (ML gives least squares)
  – Categorical distribution