Computer vision: models, learning and inference
Chapter 4: Fitting Probability Models
Slides ©2011 Simon J.D. Prince
Transcript
Page 1: Title

Computer vision: models, learning and inference

Chapter 4 Fitting Probability Models

Page 2: Structure

• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution

Page 3: Maximum likelihood

Fitting: as the name suggests, find the parameters under which the data are most likely.

Predictive density: evaluate a new data point under the probability distribution with the best-fitting parameters.

We have assumed that the data are independent (hence the product of individual likelihoods).
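
The equations on this slide are not reproduced in the transcript; in standard notation the ML criterion and predictive density are:

    \hat{\theta} = \operatorname{argmax}_{\theta} \prod_{i=1}^{I} \Pr(x_i \mid \theta)

    \Pr(x^{*} \mid \hat{\theta})   % predictive density: evaluate a new point x* under the fitted model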

Page 4: Maximum a posteriori (MAP)

Fitting: as the name suggests, we find the parameters which maximize the posterior probability.

Again we have assumed that the data are independent.

Page 5: Maximum a posteriori (MAP)

Fitting: we find the parameters which maximize the posterior probability.

Since the denominator does not depend on the parameters, we can instead maximize the product of the likelihood and the prior.
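
The MAP equations referred to on these two slides are not reproduced in the transcript; in standard notation they are:

    \hat{\theta} = \operatorname{argmax}_{\theta} \Pr(\theta \mid x_{1 \ldots I})
                 = \operatorname{argmax}_{\theta} \frac{\prod_{i=1}^{I} \Pr(x_i \mid \theta)\, \Pr(\theta)}{\Pr(x_{1 \ldots I})}
                 = \operatorname{argmax}_{\theta} \prod_{i=1}^{I} \Pr(x_i \mid \theta)\, \Pr(\theta)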

Page 6: Maximum a posteriori (MAP)

Predictive density: evaluate a new data point under the probability distribution with the MAP parameters.

Page 7: Bayesian approach

Fitting: compute the posterior distribution over possible parameter values using Bayes' rule.

Principle: why pick one set of parameters? There are many values that could have explained the data, so try to capture all of the possibilities.
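
The Bayes' rule expression on this slide is not reproduced in the transcript; in standard notation it is:

    \Pr(\theta \mid x_{1 \ldots I}) = \frac{\prod_{i=1}^{I} \Pr(x_i \mid \theta)\, \Pr(\theta)}{\Pr(x_{1 \ldots I})}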

Page 8: Bayesian approach

Predictive density:
• Each possible parameter value makes a prediction
• Some parameter values are more probable than others

Make a prediction that is an infinite weighted sum (an integral) of the predictions for each parameter value, where the weights are the posterior probabilities (see the sketch below).
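
In symbols (the integral itself is not reproduced in the transcript):

    \Pr(x^{*} \mid x_{1 \ldots I}) = \int \Pr(x^{*} \mid \theta)\, \Pr(\theta \mid x_{1 \ldots I})\, d\theta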

Page 9: Predictive densities for the three methods

Maximum a posteriori: evaluate the new data point under the probability distribution with the MAP parameters.

Maximum likelihood: evaluate the new data point under the probability distribution with the ML parameters.

Bayesian: calculate a weighted sum of the predictions from all possible values of the parameters.

Page 10: Predictive densities for the three methods

How can we rationalize the different forms?

Consider the ML and MAP estimates as probability distributions with zero probability everywhere except at the estimate (i.e. delta functions).
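
Substituting such a delta function into the Bayesian predictive integral recovers the point-estimate forms (a reconstruction; the slide's equation is not in the transcript):

    \int \Pr(x^{*} \mid \theta)\, \delta(\theta - \hat{\theta})\, d\theta = \Pr(x^{*} \mid \hat{\theta})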

Page 11: Structure

• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution

Page 12: Univariate normal distribution

The univariate normal distribution describes a single continuous variable. It takes two parameters, μ and σ² > 0. For short we write it using the shorthand sketched below.
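
The density and shorthand on this slide are not reproduced in the transcript; the standard forms (the Norm shorthand follows the book's notation as I recall it, so treat it as an assumption) are:

    \Pr(x \mid \mu, \sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left[ -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right]

    \Pr(x) = \mathrm{Norm}_{x}[\mu, \sigma^{2}]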

Page 13: Normal inverse gamma distribution

Defined over two variables: μ and σ² > 0.

It has four parameters α, β, γ > 0 and δ. (The density and the shorthand are sketched below.)
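
A reconstruction of the density and shorthand (the NormInvGam notation is an assumption on my part; the density is the product of a normal over μ with variance σ²/γ and an inverse gamma over σ²):

    \Pr(\mu, \sigma^{2}) = \frac{\sqrt{\gamma}}{\sigma\sqrt{2\pi}} \, \frac{\beta^{\alpha}}{\Gamma(\alpha)} \left(\frac{1}{\sigma^{2}}\right)^{\alpha+1} \exp\!\left[ -\frac{2\beta + \gamma(\delta-\mu)^{2}}{2\sigma^{2}} \right]

    \Pr(\mu, \sigma^{2}) = \mathrm{NormInvGam}_{\mu,\sigma^{2}}[\alpha, \beta, \gamma, \delta]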

Page 14: Ready?

• Approach the same problem three different ways:
  – Learn ML parameters
  – Learn MAP parameters
  – Learn Bayesian distribution of parameters
• Will we get the same results?

Page 15: Fitting the normal distribution: ML

As the name suggests, we find the parameters under which the data are most likely. The likelihood of each point is given by the normal pdf.

Page 16: Fitting the normal distribution: ML

Page 17: Fitting the normal distribution: ML

Plot the surface of likelihoods as a function of the possible parameter values; the ML solution is at the peak.

Page 18: Fitting the normal distribution: ML

Algebraically, we maximize the product of the individual normal likelihoods, or alternatively we can maximize its logarithm (sketched below).
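
A reconstruction of the criterion and its logarithm (not reproduced in the transcript):

    \hat{\mu}, \hat{\sigma}^{2} = \operatorname{argmax}_{\mu,\sigma^{2}} \prod_{i=1}^{I} \mathrm{Norm}_{x_i}[\mu, \sigma^{2}]
                                = \operatorname{argmax}_{\mu,\sigma^{2}} \sum_{i=1}^{I} \log \mathrm{Norm}_{x_i}[\mu, \sigma^{2}]

    where \quad \log \mathrm{Norm}_{x_i}[\mu, \sigma^{2}] = -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\log \sigma^{2} - \frac{(x_i - \mu)^{2}}{2\sigma^{2}}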

Page 19: Why the logarithm?

The logarithm is a monotonic transformation, so the position of the peak stays in the same place, but the log likelihood is easier to work with.

Page 20: Fitting the normal distribution: ML

How do we maximize a function? Take the derivative, equate it to zero, and solve (solution sketched below).
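
Setting the derivatives of the log likelihood to zero gives the standard solution (reconstructed here; not in the transcript):

    \frac{\partial L}{\partial \mu} = \sum_{i=1}^{I} \frac{x_i - \mu}{\sigma^{2}} = 0
        \;\Rightarrow\; \hat{\mu} = \frac{1}{I}\sum_{i=1}^{I} x_i

    \frac{\partial L}{\partial \sigma^{2}} = -\frac{I}{2\sigma^{2}} + \sum_{i=1}^{I} \frac{(x_i - \mu)^{2}}{2\sigma^{4}} = 0
        \;\Rightarrow\; \hat{\sigma}^{2} = \frac{1}{I}\sum_{i=1}^{I} (x_i - \hat{\mu})^{2}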

Page 21: Fitting the normal distribution: ML

Maximum likelihood solution: the sample mean and the average squared deviation from it. Should look familiar!
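
A minimal numerical sketch of this ML fit (assuming NumPy; the data values here are made up for illustration):

    import numpy as np

    x = np.array([1.2, 0.7, 2.3, 1.8, 1.1])   # hypothetical 1-D observations

    mu_ml = x.mean()                           # ML estimate of the mean
    sigma2_ml = ((x - mu_ml) ** 2).mean()      # ML estimate of the variance (divides by I, not I-1)

    print(mu_ml, sigma2_ml)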

Page 22: Least squares

Maximum likelihood for the normal distribution gives the 'least squares' fitting criterion.
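
The connection (equations not reproduced in the transcript): for a fixed variance, maximizing the log likelihood over μ is the same as minimizing a sum of squared deviations,

    \operatorname{argmax}_{\mu} \sum_{i=1}^{I} \log \mathrm{Norm}_{x_i}[\mu, \sigma^{2}] = \operatorname{argmin}_{\mu} \sum_{i=1}^{I} (x_i - \mu)^{2}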

Page 23: Fitting the normal distribution: MAP

Fitting: as the name suggests, we find the parameters which maximize the posterior probability. The likelihood is the normal pdf.

Page 24: Fitting the normal distribution: MAP

Prior: use the conjugate prior, the normal-scaled inverse gamma.
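
Putting the two slides together, the MAP criterion for this model is (a reconstruction using the shorthands introduced above):

    \hat{\mu}, \hat{\sigma}^{2} = \operatorname{argmax}_{\mu,\sigma^{2}} \left[ \prod_{i=1}^{I} \mathrm{Norm}_{x_i}[\mu, \sigma^{2}] \right] \mathrm{NormInvGam}_{\mu,\sigma^{2}}[\alpha, \beta, \gamma, \delta]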

Page 25: Fitting the normal distribution: MAP

[Figure: the likelihood, the prior, and the resulting posterior.]

Page 26: Fitting the normal distribution: MAP

Again we maximize the logarithm; this does not change the position of the maximum.

Page 27: Fitting the normal distribution: MAP

MAP solution: the mean can be rewritten as a weighted sum of the data mean and the prior mean (sketched below).
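
A reconstruction of the MAP solution for this model (obtained by differentiating the log posterior and setting it to zero):

    \hat{\mu} = \frac{\sum_{i=1}^{I} x_i + \gamma\delta}{I + \gamma} = \frac{I\bar{x} + \gamma\delta}{I + \gamma}

    \hat{\sigma}^{2} = \frac{\sum_{i=1}^{I} (x_i - \hat{\mu})^{2} + 2\beta + \gamma(\delta - \hat{\mu})^{2}}{I + 3 + 2\alpha}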

Page 28: Fitting the normal distribution: MAP

[Figure: MAP fits with 50 data points, 5 data points, and 1 data point.]

Page 29: Fitting the normal distribution: Bayesian approach

Fitting: compute the posterior distribution over the parameters using Bayes' rule.

Page 30: Fitting the normal distribution: Bayesian approach

Compute the posterior distribution using Bayes' rule. The two constants of proportionality must cancel out, or the left-hand side would not be a valid pdf.

Page 31: Fitting the normal distribution: Bayesian approach

Compute the posterior distribution using Bayes' rule: because the prior is conjugate to the normal likelihood, the posterior is again a normal-scaled inverse gamma with updated parameters (sketched below).
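
A reconstruction of the conjugate posterior update (standard normal / normal-scaled-inverse-gamma algebra; the tildes are my labels for the updated parameters):

    \Pr(\mu, \sigma^{2} \mid x_{1 \ldots I}) = \mathrm{NormInvGam}_{\mu,\sigma^{2}}[\tilde{\alpha}, \tilde{\beta}, \tilde{\gamma}, \tilde{\delta}]

    where \quad \tilde{\alpha} = \alpha + \frac{I}{2}, \qquad \tilde{\gamma} = \gamma + I, \qquad
          \tilde{\delta} = \frac{\gamma\delta + \sum_i x_i}{\gamma + I}, \qquad
          \tilde{\beta} = \beta + \frac{\sum_i x_i^{2}}{2} + \frac{\gamma\delta^{2}}{2} - \frac{\tilde{\gamma}\tilde{\delta}^{2}}{2}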

Page 32: Fitting the normal distribution: Bayesian approach

Predictive density: take a weighted sum of the predictions from the different parameter values.

[Figure: the posterior, and samples drawn from the posterior.]

Page 33: Fitting the normal distribution: Bayesian approach

Predictive density: take a weighted sum of the predictions from the different parameter values.

Page 34: Fitting the normal distribution: Bayesian approach

Predictive density: take a weighted sum of the predictions from the different parameter values; the resulting integral can be evaluated in closed form (sketched below).
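
The predictive integral for this model (a reconstruction; it has a closed form because the integrand is proportional to another normal-scaled inverse gamma, leaving only a ratio of normalizing constants):

    \Pr(x^{*} \mid x_{1 \ldots I}) = \iint \mathrm{Norm}_{x^{*}}[\mu, \sigma^{2}]\, \mathrm{NormInvGam}_{\mu,\sigma^{2}}[\tilde{\alpha}, \tilde{\beta}, \tilde{\gamma}, \tilde{\delta}]\, d\mu\, d\sigma^{2}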

Page 35: Fitting the normal distribution: Bayesian approach

[Figure: Bayesian predictive densities with 50 data points, 5 data points, and 1 data point.]

Page 36: Structure

• Fitting probability distributions
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Worked example 1: Normal distribution
• Worked example 2: Categorical distribution

Page 37: Categorical distribution

The categorical distribution describes a situation with K possible outcomes, x = 1, ..., K. It takes K parameters λ_1, ..., λ_K, which are non-negative and sum to one. For short we write it with the shorthand sketched below.

Alternatively, we can think of each data point as a vector with all elements zero except the kth, e.g. [0,0,0,1,0].
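
A reconstruction of the probability mass function and shorthand (the Cat notation is an assumption on my part):

    \Pr(x = k) = \lambda_k, \qquad \lambda_k \ge 0, \qquad \sum_{k=1}^{K} \lambda_k = 1

    \Pr(x) = \mathrm{Cat}_{x}[\lambda_{1 \ldots K}]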

Page 38: Dirichlet distribution

Defined over K continuous values λ_1, ..., λ_K, which are non-negative and sum to one.

It has K parameters α_k > 0. (Density and shorthand sketched below.)
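
A reconstruction of the density and shorthand (the Dir notation is an assumption; the density is the standard Dirichlet):

    \Pr(\lambda_{1 \ldots K}) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \lambda_k^{\alpha_k - 1}

    \Pr(\lambda_{1 \ldots K}) = \mathrm{Dir}_{\lambda_{1 \ldots K}}[\alpha_{1 \ldots K}]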

Page 39: Categorical distribution: ML

Maximize the product of the individual likelihoods. Remember that Pr(x = k) = λ_k, so the product depends only on the counts N_k = the number of times we observed bin k.

Page 40: Categorical distribution: ML

Instead maximize the log probability, adding a Lagrange multiplier to ensure that the parameters sum to one. Take the derivative, set it to zero, and rearrange (sketched below).
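
A reconstruction of the constrained objective and its solution (ν denotes the Lagrange multiplier):

    L = \sum_{k=1}^{K} N_k \log \lambda_k + \nu\!\left( \sum_{k=1}^{K} \lambda_k - 1 \right)

    \frac{\partial L}{\partial \lambda_k} = \frac{N_k}{\lambda_k} + \nu = 0
        \;\Rightarrow\; \hat{\lambda}_k = \frac{N_k}{\sum_{m=1}^{K} N_m}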

Page 41: Categorical distribution: MAP

MAP criterion: maximize the product of the categorical likelihood and the Dirichlet prior (sketched below).
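
A reconstruction of the MAP criterion for this model:

    \hat{\lambda}_{1 \ldots K} = \operatorname{argmax}_{\lambda} \prod_{i=1}^{I} \mathrm{Cat}_{x_i}[\lambda]\, \mathrm{Dir}_{\lambda}[\alpha_{1 \ldots K}]
                               = \operatorname{argmax}_{\lambda} \prod_{k=1}^{K} \lambda_k^{N_k + \alpha_k - 1}

(the second equality drops terms that do not depend on λ).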

Page 42: Categorical distribution: MAP

Take the derivative, set it to zero, and rearrange (sketched below). With a uniform prior (α_{1...K} = 1), this gives the same result as maximum likelihood.
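
Using the same Lagrange-multiplier argument as before, the MAP solution (a reconstruction) is:

    \hat{\lambda}_k = \frac{N_k + \alpha_k - 1}{\sum_{m=1}^{K} (N_m + \alpha_m - 1)}

With α_{1...K} = 1 the α terms vanish and this reduces to the ML estimate N_k / Σ_m N_m.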

Page 43: Categorical distribution

[Figure: the observed data, five samples from the prior, and five samples from the posterior.]

Page 44: Categorical distribution: Bayesian approach

Compute the posterior distribution over the parameters. The two constants of proportionality must cancel out, or the left-hand side would not be a valid pdf.
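
Because the Dirichlet prior is conjugate to the categorical likelihood, the posterior is again a Dirichlet (a reconstruction; the tilde marks the updated parameters):

    \Pr(\lambda_{1 \ldots K} \mid x_{1 \ldots I}) = \mathrm{Dir}_{\lambda_{1 \ldots K}}[\tilde{\alpha}_{1 \ldots K}],
        \qquad \tilde{\alpha}_k = \alpha_k + N_k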

Page 45: Categorical distribution: Bayesian approach

Compute the predictive distribution. Again the two constants of proportionality must cancel out, or the left-hand side would not be a valid pdf.
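
The predictive distribution (a reconstruction; it is the posterior mean of λ):

    \Pr(x^{*} = k \mid x_{1 \ldots I}) = \int \mathrm{Cat}_{x^{*}}[\lambda]\, \mathrm{Dir}_{\lambda}[\tilde{\alpha}_{1 \ldots K}]\, d\lambda
        = \frac{N_k + \alpha_k}{\sum_{m=1}^{K} (N_m + \alpha_m)}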

Page 46: ML / MAP vs. Bayesian

[Figure: predictive distributions under MAP/ML and under the Bayesian approach.]
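
A minimal numerical sketch comparing the three estimates for the categorical example (assuming NumPy; the data and prior values here are made up for illustration):

    import numpy as np

    x = np.array([0, 2, 2, 1, 2, 0, 2])          # hypothetical observations from K = 3 categories, coded 0..K-1
    K = 3
    alpha = np.full(K, 2.0)                      # assumed Dirichlet prior parameters

    N = np.bincount(x, minlength=K)              # N_k: number of observations in each bin

    lam_ml   = N / N.sum()                                 # maximum likelihood estimate
    lam_map  = (N + alpha - 1) / (N + alpha - 1).sum()     # MAP estimate
    lam_pred = (N + alpha) / (N + alpha).sum()             # Bayesian predictive probabilities

    print(lam_ml, lam_map, lam_pred)

With little data the three can differ noticeably; as the counts grow they move closer together.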

Page 47: Conclusion

• Three ways to fit probability distributions:
  – Maximum likelihood
  – Maximum a posteriori
  – Bayesian approach
• Two worked examples:
  – Normal distribution (ML gives least squares)
  – Categorical distribution