CHAPTER 3: COMMON PROBABILITY DISTRIBUTIONS
Computer Vision: Models, Learning and Inference
Lukas Tencer
Jul 04, 2015
Why model these complicated quantities?
Because we need probability distributions over model parameters as
well as over data and world state. Hence, some of the distributions
describe the parameters of the others:
Example: the univariate normal models a mean and a variance; those two parameters are in turn modelled by the normal inverse gamma distribution.
Bernoulli Distribution
The Bernoulli distribution describes the situation where there are only two
possible outcomes, $y = 0$ / $y = 1$ (failure / success).

Takes a single parameter $\lambda \in [0, 1]$:

    $\Pr(y = 1) = \lambda, \quad \Pr(y = 0) = 1 - \lambda$

or

    $\Pr(y) = \lambda^{y} (1 - \lambda)^{1 - y}$

For short we write: $\Pr(y) = \text{Bern}_{y}[\lambda]$
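As an illustration (ours, not from the slides), a minimal Python sketch of the Bernoulli pmf and sampling with scipy; the variable name lam for $\lambda$ is our choice:

```python
import numpy as np
from scipy.stats import bernoulli

lam = 0.7                            # the single parameter, lambda in [0, 1]
print(bernoulli.pmf(1, lam))         # Pr(y = 1) = lambda       -> 0.7
print(bernoulli.pmf(0, lam))         # Pr(y = 0) = 1 - lambda   -> 0.3

# The empirical mean of samples approaches lambda.
samples = bernoulli.rvs(lam, size=10_000, random_state=0)
print(samples.mean())
```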
Beta Distribution
Defined over $\lambda \in [0, 1]$ (i.e. the parameter of the Bernoulli):

    $\Pr(\lambda) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \lambda^{\alpha - 1} (1 - \lambda)^{\beta - 1}$

• Two parameters $\alpha, \beta$, both > 0
• Mean depends on relative values: $E[\lambda] = \alpha / (\alpha + \beta)$
• Concentration depends on the magnitude of $\alpha + \beta$

For short we write: $\Pr(\lambda) = \text{Beta}_{\lambda}[\alpha, \beta]$
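A small numerical check (ours, not from the slides) of the two bullet points above, using scipy: the mean depends only on the ratio $\alpha/(\alpha+\beta)$, while a larger magnitude $\alpha+\beta$ concentrates the distribution:

```python
from scipy.stats import beta

# Same ratio alpha / (alpha + beta) = 0.25, increasing magnitude:
for a, b in [(1, 3), (10, 30), (100, 300)]:
    print(f"a={a}, b={b}: mean={beta.mean(a, b):.3f}, var={beta.var(a, b):.5f}")
# The mean stays 0.25 while the variance shrinks (higher concentration).
```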
Categorical Distribution
The categorical distribution describes the situation where there are $K$ possible
outcomes, $y = 1 \ldots y = K$.

Takes $K$ parameters $\lambda_k \geq 0$ where $\sum_k \lambda_k = 1$:

    $\Pr(y = k) = \lambda_k$

or we can think of the data as a vector with all elements zero except the $k$th,
e.g. $\mathbf{e}_4 = [0, 0, 0, 1, 0]^{T}$ (here $K = 5$), in which case
$\Pr(y = \mathbf{e}_k) = \lambda_k$.

For short we write: $\Pr(y) = \text{Cat}_{y}[\boldsymbol{\lambda}]$
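A short sketch (ours) of drawing categorical samples and forming the one-hot vector representation with numpy:

```python
import numpy as np

lam = np.array([0.1, 0.2, 0.3, 0.3, 0.1])   # K = 5 parameters, summing to 1
rng = np.random.default_rng(0)

k = rng.choice(len(lam), p=lam)             # sampled outcome (0-based index)
e_k = np.eye(len(lam), dtype=int)[k]        # one-hot vector, e.g. [0, 0, 0, 1, 0]
print(k + 1, e_k)                           # 1-based outcome and its vector form
```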
Dirichlet Distribution
Defined over $K$ values $\lambda_1 \ldots \lambda_K$ where $\lambda_k \in [0, 1]$
and $\sum_k \lambda_k = 1$:

    $\Pr(\lambda_1 \ldots \lambda_K) = \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)} \prod_{k=1}^{K} \lambda_k^{\alpha_k - 1}$

Has $K$ parameters $\alpha_k > 0$.

Or for short: $\Pr(\lambda_1 \ldots \lambda_K) = \text{Dir}_{\lambda_1 \ldots \lambda_K}[\alpha_1 \ldots \alpha_K]$
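A minimal sketch (ours) showing that a Dirichlet draw is itself a valid set of categorical parameters:

```python
import numpy as np

alpha = np.array([2.0, 5.0, 3.0])   # K = 3 parameters, all > 0
rng = np.random.default_rng(0)

lam = rng.dirichlet(alpha)          # one sample: entries in [0, 1]
print(lam, lam.sum())               # sums to 1, so usable as Cat parameters
print(alpha / alpha.sum())          # E[lambda_k] = alpha_k / sum(alpha)
```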
Univariate Normal Distribution
The univariate normal distribution describes a single continuous variable.

Takes 2 parameters, $\mu$ and $\sigma^2 > 0$:

    $\Pr(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y - \mu)^2}{2\sigma^2}\right]$

For short we write: $\Pr(y) = \text{Norm}_{y}[\mu, \sigma^2]$
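A quick sketch (ours) evaluating the density both via scipy and via the formula above; note scipy's norm is parameterized by the standard deviation, not the variance:

```python
import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.0, 4.0
y = 0.0

print(norm.pdf(y, loc=mu, scale=np.sqrt(sigma2)))                         # scipy
print(np.exp(-(y - mu)**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2))  # formula
```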
Normal Inverse Gamma Distribution
Defined on 2 variables, $\mu$ and $\sigma^2 > 0$:

    $\Pr(\mu, \sigma^2) = \frac{\sqrt{\gamma}}{\sigma\sqrt{2\pi}} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \left(\frac{1}{\sigma^2}\right)^{\alpha + 1} \exp\left[-\frac{2\beta + \gamma(\delta - \mu)^2}{2\sigma^2}\right]$

or for short: $\Pr(\mu, \sigma^2) = \text{NormInvGam}_{\mu, \sigma^2}[\alpha, \beta, \gamma, \delta]$

Four parameters: $\alpha, \beta, \gamma > 0$ and $\delta$.
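scipy has no normal inverse gamma object, but a sampler is easy to sketch from the standard two-stage construction (an assumption consistent with the density above): draw $\sigma^2$ from an inverse gamma, then $\mu$ from a normal with variance $\sigma^2/\gamma$. The function name is ours:

```python
import numpy as np
from scipy.stats import invgamma, norm

def sample_norm_inv_gam(alpha, beta, gamma, delta, rng):
    """One (mu, sigma^2) draw: sigma^2 ~ InvGamma(alpha, beta),
    then mu | sigma^2 ~ Norm(delta, sigma^2 / gamma)."""
    sigma2 = invgamma.rvs(alpha, scale=beta, random_state=rng)
    mu = norm.rvs(loc=delta, scale=np.sqrt(sigma2 / gamma), random_state=rng)
    return mu, sigma2

rng = np.random.default_rng(0)
print(sample_norm_inv_gam(3.0, 2.0, 1.5, 0.0, rng))
```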
Multivariate Normal Distribution
The multivariate normal distribution describes multiple continuous variables.
Takes 2 parameters:
• a vector $\boldsymbol{\mu}$ containing the mean position
• a symmetric "positive definite" covariance matrix $\boldsymbol{\Sigma}$

    $\Pr(\mathbf{y}) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{y} - \boldsymbol{\mu})\right]$

For short we write: $\Pr(\mathbf{y}) = \text{Norm}_{\mathbf{y}}[\boldsymbol{\mu}, \boldsymbol{\Sigma}]$

Positive definite: $\mathbf{z}^{T}\boldsymbol{\Sigma}\mathbf{z}$ is positive for any real vector $\mathbf{z} \neq \mathbf{0}$.
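A practical sketch (ours): the usual way to test positive definiteness in code is to attempt a Cholesky factorization, which succeeds exactly when a symmetric matrix is positive definite:

```python
import numpy as np

def is_positive_definite(Sigma):
    """Cholesky factorization succeeds iff the symmetric matrix is PD."""
    try:
        np.linalg.cholesky(Sigma)
        return True
    except np.linalg.LinAlgError:
        return False

print(is_positive_definite(np.array([[2.0, 0.5], [0.5, 1.0]])))  # True
print(is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]])))  # False
```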
Types of covariance
The covariance matrix has three forms, termed spherical ($\boldsymbol{\Sigma} = \sigma^2\mathbf{I}$, one shared variance on every axis), diagonal (independent per-axis variances), and full (an arbitrary symmetric positive definite matrix, encoding correlations between dimensions).
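A small sketch (ours) constructing the three forms in numpy and sampling under each; the numbers are arbitrary:

```python
import numpy as np

sigma2 = 2.0
spherical = sigma2 * np.eye(2)          # one shared variance on every axis
diagonal  = np.diag([1.0, 4.0])         # per-axis variances, no correlation
full      = np.array([[2.0, 1.2],
                      [1.2, 2.0]])      # off-diagonals encode correlation

rng = np.random.default_rng(0)
for Sigma in (spherical, diagonal, full):
    print(rng.multivariate_normal(np.zeros(2), Sigma, size=3))
```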
Normal Inverse Wishart
Defined on two variables: a mean vector $\boldsymbol{\mu}$ and a symmetric
positive definite matrix, $\boldsymbol{\Sigma}$.

or for short: $\Pr(\boldsymbol{\mu}, \boldsymbol{\Sigma}) = \text{NorIWis}_{\boldsymbol{\mu}, \boldsymbol{\Sigma}}[\alpha, \boldsymbol{\Psi}, \gamma, \boldsymbol{\delta}]$

Has four parameters:
• a positive scalar, $\alpha$
• a positive definite matrix, $\boldsymbol{\Psi}$
• a positive scalar, $\gamma$
• a vector, $\boldsymbol{\delta}$
Samples from the Normal Inverse Wishart
[Figure: samples as each parameter varies: $\alpha$ (dispersion of covariances), $\boldsymbol{\Psi}$ (average covariance), $\gamma$ (dispersion of means), $\boldsymbol{\delta}$ (average of means).]
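Mirroring the normal inverse gamma case, a sampler sketch (ours, with the parameter roles assumed as annotated above): draw $\boldsymbol{\Sigma}$ from an inverse Wishart, then $\boldsymbol{\mu}$ from a normal with covariance $\boldsymbol{\Sigma}/\gamma$:

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

def sample_norm_inv_wishart(alpha, Psi, gamma, delta, rng):
    """One (mu, Sigma) draw: Sigma ~ InvWishart(alpha, Psi),
    then mu | Sigma ~ Norm(delta, Sigma / gamma)."""
    Sigma = invwishart.rvs(df=alpha, scale=Psi, random_state=rng)
    mu = multivariate_normal.rvs(mean=delta, cov=Sigma / gamma, random_state=rng)
    return mu, Sigma

rng = np.random.default_rng(0)
mu, Sigma = sample_norm_inv_wishart(5.0, np.eye(2), 2.0, np.zeros(2), rng)
print(mu, "\n", Sigma)
```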
Conjugate Distributions
The pairs of distributions discussed have a special relationship: they are
conjugate distributions.

• Beta is conjugate to Bernoulli
• Dirichlet is conjugate to categorical
• Normal inverse gamma is conjugate to univariate normal
• Normal inverse Wishart is conjugate to multivariate normal
Conjugate Distributions
When we take the product of a distribution and its conjugate, the result has
the same form as the conjugate. For example:

    $\text{Bern}_{y}[\lambda] \cdot \text{Beta}_{\lambda}[\alpha, \beta] = \kappa(y, \alpha, \beta) \cdot \text{Beta}_{\lambda}[\tilde{\alpha}, \tilde{\beta}]$

where $\kappa$ is a constant and $\text{Beta}_{\lambda}[\tilde{\alpha}, \tilde{\beta}]$
is a new beta distribution with $\tilde{\alpha} = \alpha + y$ and $\tilde{\beta} = \beta + 1 - y$.
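The relation can be checked numerically; in this sketch (ours) the ratio of the product to the new beta density is the same constant $\kappa$ at every $\lambda$:

```python
import numpy as np
from scipy.stats import beta, bernoulli

a, b, y = 2.0, 3.0, 1
lam = np.linspace(0.001, 0.999, 5)

product  = bernoulli.pmf(y, lam) * beta.pdf(lam, a, b)
new_beta = beta.pdf(lam, a + y, b + 1 - y)

print(product / new_beta)   # identical value at every lambda: the constant kappa
```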
Example proof

When we take the product of a distribution and its conjugate, the result has
the same form as the conjugate:

    $\text{Bern}_{y}[\lambda] \cdot \text{Beta}_{\lambda}[\alpha, \beta] = \lambda^{y}(1 - \lambda)^{1-y} \cdot \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \lambda^{\alpha - 1}(1 - \lambda)^{\beta - 1}$
    $= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \lambda^{y + \alpha - 1}(1 - \lambda)^{1 - y + \beta - 1}$
    $= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(y + \alpha)\Gamma(1 - y + \beta)}{\Gamma(\alpha + \beta + 1)} \cdot \text{Beta}_{\lambda}[y + \alpha, 1 - y + \beta]$
    $= \kappa(y, \alpha, \beta) \cdot \text{Beta}_{\lambda}[y + \alpha, 1 - y + \beta]$
Bayes’ Rule Terminology
    $\Pr(y \mid x) = \frac{\Pr(x \mid y)\,\Pr(y)}{\int \Pr(x \mid y)\,\Pr(y)\,dy}$

• Posterior $\Pr(y \mid x)$ – what we know about $y$ after seeing $x$
• Prior $\Pr(y)$ – what we know about $y$ before seeing $x$
• Likelihood $\Pr(x \mid y)$ – propensity for observing a certain value of $x$
  given a certain value of $y$
• Evidence $\int \Pr(x \mid y)\,\Pr(y)\,dy$ – a constant that ensures the left
  hand side is a valid distribution
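A tiny discrete sketch (ours, with made-up numbers) showing all four terms at once for a two-valued world state:

```python
import numpy as np

prior      = np.array([0.5, 0.5])         # Pr(y) before seeing x
likelihood = np.array([0.8, 0.3])         # Pr(x | y) for the observed x
evidence   = np.sum(likelihood * prior)   # normalizing constant
posterior  = likelihood * prior / evidence

print(posterior, posterior.sum())         # a valid distribution: sums to 1
```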
Importance of the Conjugate Relation 1
Learning parameters:

1. Choose a prior that is conjugate to the likelihood.
2. This implies that the posterior must have the same form as the conjugate
   prior distribution.
3. The posterior must be a valid distribution, which implies that the evidence
   must equal the constant $\kappa$ from the conjugate relation.
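For the Beta-Bernoulli pair this recipe reduces to counting, as in this sketch (ours): the posterior is a beta distribution whose parameters are the prior parameters plus the observed success and failure counts:

```python
import numpy as np

a, b = 1.0, 1.0                           # conjugate Beta prior (uniform)
data = np.array([1, 1, 0, 1, 1, 0, 1])    # Bernoulli observations

a_post = a + data.sum()                   # add number of successes
b_post = b + len(data) - data.sum()       # add number of failures
print(a_post, b_post)                     # posterior is Beta(6, 3)
print(a_post / (a_post + b_post))         # posterior mean of lambda
```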
Importance of the Conjugate Relation 2
Marginalizing over parameters:

1. The prior is chosen to be conjugate to the likelihood term inside the
   integral.
2. The integral becomes easy: the product becomes a constant times a
   distribution, and the integral of a constant times a probability
   distribution is that constant times the integral of the probability
   distribution, i.e. constant × 1 = constant.
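For one Bernoulli observation under a beta prior, the marginal can be checked both ways, as in this sketch (ours): numerically integrating likelihood times prior, and reading off the constant analytically (for $y = 1$ it is just $E[\lambda] = \alpha/(\alpha+\beta)$):

```python
from scipy.stats import beta, bernoulli
from scipy.integrate import quad

a, b, y = 2.0, 3.0, 1

numeric, _ = quad(lambda lam: bernoulli.pmf(y, lam) * beta.pdf(lam, a, b), 0, 1)
analytic = a / (a + b)                    # constant from the conjugate relation

print(numeric, analytic)                  # both 0.4
```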
Conclusions
• Presented four distributions which model useful quantities
• Presented four other distributions which model the parameters of the first
  four
• They are paired in a special way – the second set is conjugate to the first
• In the following material we'll see that this relationship is very useful
Based on:
Computer vision: models, learning and inference. ©2011 Simon J.D. Prince
http://www.computervisionmodels.com/
Thank you for your attention.