Common Probability Distribution

Jul 04, 2015

Lukas Tencer

Presentation for a reading session of Computer Vision: Models, Learning, and Inference.
Transcript
Page 1: Common Probability Distribution

CHAPTER 3: COMMON PROBABILITY DISTRIBUTIONS

COMPUTER VISION: MODELS, LEARNING AND INFERENCE

Lukas Tencer

Page 3: Common Probability Distribution

Why model these complicated quantities?


Because we need probability distributions over model parameters as well as over data and world state. Hence, some of the distributions describe the parameters of the others:

Page 4: Common Probability Distribution

Why model these complicated quantities? (continued)

Example: the univariate normal models a mean and a variance; these parameters are in turn modelled by the normal inverse gamma distribution.

Page 5: Common Probability Distribution

Bernoulli Distribution


The Bernoulli distribution describes a situation with only two possible outcomes, y = 0 / y = 1, or failure/success. It takes a single parameter λ ∈ [0, 1]:

Pr(y = 1) = λ    Pr(y = 0) = 1 − λ

or

Pr(y) = λ^y (1 − λ)^(1−y)

For short we write:

Pr(y) = Bern_y[λ]
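
A minimal sketch of this pmf and of sampling from it (plain Python; the helper names are mine, not from the book):

```python
import random

def bern_pmf(y, lam):
    """Pr(y) = lam^y * (1 - lam)^(1 - y) for y in {0, 1}."""
    return lam ** y * (1.0 - lam) ** (1 - y)

def bern_sample(lam):
    """Return 1 with probability lam, else 0."""
    return 1 if random.random() < lam else 0

print(bern_pmf(1, 0.3), bern_pmf(0, 0.3))  # 0.3 0.7
print(bern_sample(0.3))                    # 0 or 1
```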

Page 6: Common Probability Distribution

Beta Distribution


Defined over λ ∈ [0, 1] (i.e., the parameter of the Bernoulli):

Pr(λ) = [Γ(α + β) / (Γ(α) Γ(β))] λ^(α−1) (1 − λ)^(β−1)

• Two parameters α, β, both > 0
• Mean depends on their relative values: E[λ] = α / (α + β)
• Concentration depends on their magnitude α + β

For short we write:

Pr(λ) = Beta_λ[α, β]
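
A sketch of the density and the mean in plain Python, using log-gamma for numerical stability (function names are illustrative):

```python
from math import lgamma, log, exp

def beta_pdf(lam, alpha, beta):
    """Beta density on lam in (0, 1) with parameters alpha, beta > 0."""
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    return exp(log_norm + (alpha - 1) * log(lam) + (beta - 1) * log(1 - lam))

def beta_mean(alpha, beta):
    """E[lam] = alpha / (alpha + beta)."""
    return alpha / (alpha + beta)

print(beta_pdf(0.5, 2.0, 2.0))  # 1.5
print(beta_mean(2.0, 2.0))      # 0.5
```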

Page 7: Common Probability Distribution

Categorical Distribution


The categorical distribution describes a situation with K possible outcomes, y = 1 … y = K. It takes K parameters λ₁ … λ_K, where λ_k ≥ 0 and Σ_k λ_k = 1:

Pr(y = k) = λ_k

or we can think of the data as a vector with all elements zero except the kth, e.g. e₄ = [0, 0, 0, 1, 0], so that Pr(y = e_k) = λ_k.

For short we write:

Pr(y) = Cat_y[λ]
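
A sketch of sampling an outcome and forming the equivalent one-hot vector (plain Python; names are mine):

```python
import random

def cat_sample(lams):
    """Draw k in {1, ..., K} with Pr(y = k) = lams[k - 1]; lams must sum to 1."""
    u, cum = random.random(), 0.0
    for k, lam in enumerate(lams, start=1):
        cum += lam
        if u < cum:
            return k
    return len(lams)  # guard against floating-point rounding

def one_hot(k, K):
    """Unit vector e_k, e.g. one_hot(4, 5) == [0, 0, 0, 1, 0]."""
    return [1 if i == k else 0 for i in range(1, K + 1)]

lams = [0.1, 0.2, 0.3, 0.3, 0.1]
k = cat_sample(lams)
print(k, one_hot(k, len(lams)))
```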

Page 8: Common Probability Distribution

Dirichlet Distribution


Defined over K values λ₁ … λ_K, where λ_k ∈ [0, 1] and Σ_k λ_k = 1:

Pr(λ₁ … λ_K) = [Γ(Σ_k α_k) / Π_k Γ(α_k)] Π_k λ_k^(α_k − 1)

or for short:

Pr(λ₁ … λ_K) = Dir_λ[α₁ … α_K]

Has K parameters α_k > 0.
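
A sketch of drawing λ₁ … λ_K from a Dirichlet with NumPy (the α values are arbitrary examples of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
alphas = np.array([2.0, 3.0, 5.0])   # K = 3 parameters, all > 0

lam = rng.dirichlet(alphas)          # one sample on the simplex
print(lam, lam.sum())                # entries in [0, 1], summing to 1.0

# The mean of each component is alpha_k / sum(alpha):
print(alphas / alphas.sum())         # [0.2 0.3 0.5]
```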

Page 9: Common Probability Distribution

Univariate Normal Distribution


The univariate normal distribution describes a single continuous variable. It takes two parameters, the mean μ and the variance σ² > 0:

Pr(x) = [1 / √(2πσ²)] exp(−(x − μ)² / (2σ²))

For short we write:

Pr(x) = Norm_x[μ, σ²]
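
A direct transcription of the density into plain Python (names are mine):

```python
from math import pi, sqrt, exp

def norm_pdf(x, mu, var):
    """Univariate normal density with mean mu and variance var > 0."""
    return exp(-(x - mu) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)

print(norm_pdf(0.0, 0.0, 1.0))  # ~0.3989, the standard normal at its mode
```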

Page 10: Common Probability Distribution

Normal Inverse Gamma Distribution


Defined on two variables: a mean μ and a variance σ² > 0:

Pr(μ, σ²) = [√γ / (σ√(2π))] [β^α / Γ(α)] (1/σ²)^(α+1) exp(−(2β + γ(δ − μ)²) / (2σ²))

or for short:

Pr(μ, σ²) = NormInvGam_(μ,σ²)[α, β, γ, δ]

Has four parameters α, β, γ and δ.
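
A sketch of drawing a (μ, σ²) pair, assuming the (α, β, γ, δ) parameters map onto the usual decomposition σ² ~ InvGamma(α, β), then μ ~ Norm(δ, σ²/γ); treat that mapping as my assumption rather than the book's definition:

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_inv_gam_sample(alpha, beta, gamma, delta):
    """Assumed decomposition: var ~ InvGamma(alpha, beta), mu ~ Norm(delta, var/gamma)."""
    var = 1.0 / rng.gamma(shape=alpha, scale=1.0 / beta)  # inverse gamma via a gamma draw
    mu = rng.normal(loc=delta, scale=np.sqrt(var / gamma))
    return mu, var

print(norm_inv_gam_sample(alpha=3.0, beta=2.0, gamma=1.0, delta=0.0))
```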

Page 11: Common Probability Distribution

Multivariate Normal Distribution


The multivariate normal distribution describes multiple continuous variables:

Pr(x) = [1 / ((2π)^(D/2) |Σ|^(1/2))] exp(−(x − μ)ᵀ Σ⁻¹ (x − μ) / 2)

It takes two parameters:

• a vector μ containing the mean position
• a symmetric "positive definite" covariance matrix Σ

Positive definite: zᵀΣz is positive for any real vector z ≠ 0.

For short we write:

Pr(x) = Norm_x[μ, Σ]
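
A sketch of the log density in NumPy, solving against Σ rather than inverting it (names are mine):

```python
import numpy as np

def mvn_logpdf(x, mu, Sigma):
    """log Pr(x) for the multivariate normal Norm_x[mu, Sigma]."""
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)  # log |Sigma|
    maha = d @ np.linalg.solve(Sigma, d)  # (x - mu)^T Sigma^-1 (x - mu)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha)

mu = np.zeros(2)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(np.exp(mvn_logpdf(np.zeros(2), mu, Sigma)))
```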

Page 12: Common Probability Distribution

Types of covariance


The covariance matrix takes three forms, termed spherical, diagonal and full: spherical Σ = σ²I (one shared variance), diagonal Σ = diag(σ₁², …, σ_D²) (one variance per dimension), and full, an arbitrary symmetric positive definite matrix.
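
A sketch constructing the three forms (the example numbers are mine):

```python
import numpy as np

D = 3
spherical = 1.5 ** 2 * np.eye(D)     # sigma^2 * I: one shared variance
diagonal = np.diag([0.5, 1.0, 2.0])  # one variance per dimension
A = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 1.0]])
full = A @ A.T                       # symmetric and positive definite

for name, S in [("spherical", spherical), ("diagonal", diagonal), ("full", full)]:
    print(name, np.all(np.linalg.eigvalsh(S) > 0))  # all eigenvalues positive
```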

Page 13: Common Probability Distribution

Normal Inverse Wishart


Defined on two variables: a mean vector μ and a symmetric positive definite matrix Σ,

or for short:

Pr(μ, Σ) = NorIWis_(μ,Σ)[α, Ψ, γ, δ]

Has four parameters:

• a positive scalar α
• a positive definite matrix Ψ
• a positive scalar γ
• a vector δ
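
A sketch of sampling a (μ, Σ) pair, assuming α and Ψ act as the inverse Wishart's degrees of freedom and scale matrix, and that μ ~ Norm(δ, Σ/γ); this mapping onto SciPy is my assumption:

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)

def nor_i_wis_sample(alpha, Psi, gamma, delta):
    """Assumed decomposition: Sigma ~ InvWishart(alpha, Psi), mu ~ Norm(delta, Sigma/gamma)."""
    Sigma = invwishart.rvs(df=alpha, scale=Psi, random_state=rng)
    mu = rng.multivariate_normal(mean=delta, cov=Sigma / gamma)
    return mu, Sigma

mu, Sigma = nor_i_wis_sample(alpha=4.0, Psi=np.eye(2), gamma=2.0, delta=np.zeros(2))
print(mu)
print(Sigma)
```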

Page 14: Common Probability Distribution

Samples from Normal Inverse Wishart


[Figure: samples from the normal inverse Wishart as each parameter varies: α controls the dispersion of covariances, Ψ the average covariance, γ the dispersion of means, and δ the average of means.]

Page 15: Common Probability Distribution

Conjugate Distributions


The pairs of distributions discussed have a special relationship: they are conjugate distributions.

• Beta is conjugate to Bernoulli
• Dirichlet is conjugate to categorical
• Normal inverse gamma is conjugate to univariate normal
• Normal inverse Wishart is conjugate to multivariate normal

Page 16: Common Probability Distribution

Conjugate Distributions


When we take the product of a distribution and its conjugate, the result has the same form as the conjugate. For example, consider the product of a Bernoulli likelihood and its conjugate beta prior:

Bern_y[λ] · Beta_λ[α, β] = κ(y, α, β) · Beta_λ[α + y, β + 1 − y]

i.e., a constant times a new beta distribution.
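
A numeric check of this identity at one point λ (plain Python; κ is computed from ratios of beta functions):

```python
from math import lgamma, log, exp

def log_beta_fn(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_pdf(lam, a, b):
    return exp((a - 1) * log(lam) + (b - 1) * log(1 - lam) - log_beta_fn(a, b))

y, alpha, beta, lam = 1, 2.0, 3.0, 0.4

lhs = lam ** y * (1 - lam) ** (1 - y) * beta_pdf(lam, alpha, beta)  # Bern * Beta
kappa = exp(log_beta_fn(alpha + y, beta + 1 - y) - log_beta_fn(alpha, beta))
rhs = kappa * beta_pdf(lam, alpha + y, beta + 1 - y)                # constant * new Beta
print(lhs, rhs)  # agree up to floating point
```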

Page 17: Common Probability Distribution

Example proof

When we take the product of a distribution and its conjugate, the result has the same form as the conjugate:

Bern_y[λ] · Beta_λ[α, β] = λ^y (1 − λ)^(1−y) · [Γ(α + β) / (Γ(α) Γ(β))] λ^(α−1) (1 − λ)^(β−1)
                         = [Γ(α + β) / (Γ(α) Γ(β))] λ^(y+α−1) (1 − λ)^(1−y+β−1)
                         = κ(y, α, β) · Beta_λ[y + α, 1 − y + β]

where the constant is κ(y, α, β) = [Γ(α + β) Γ(y + α) Γ(1 − y + β)] / [Γ(α) Γ(β) Γ(α + β + 1)].

Page 18: Common Probability Distribution

Bayes’ Rule Terminology


Bayes' rule relates these quantities:

Pr(y|x) = Pr(x|y) Pr(y) / Pr(x)

• Posterior Pr(y|x) – what we know about y after seeing x
• Prior Pr(y) – what we know about y before seeing x
• Likelihood Pr(x|y) – the propensity for observing a certain value of x given a certain value of y
• Evidence Pr(x) – a constant that ensures the left-hand side is a valid distribution
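
A toy numeric illustration of the four terms for a binary world state y (the numbers are arbitrary examples of mine):

```python
prior = {0: 0.7, 1: 0.3}         # Pr(y): what we know about y before seeing x
p_x1_given_y = {0: 0.1, 1: 0.8}  # Pr(x = 1 | y): the likelihood model
x = 1                            # the observed value

def likelihood(x, y):
    return p_x1_given_y[y] if x == 1 else 1 - p_x1_given_y[y]

evidence = sum(likelihood(x, y) * prior[y] for y in prior)  # Pr(x), the normalizer
posterior = {y: likelihood(x, y) * prior[y] / evidence for y in prior}
print(posterior, sum(posterior.values()))  # posterior sums to 1: a valid distribution
```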

Page 19: Common Probability Distribution

Importance of the Conjugate Relation 1


Learning parameters (see the sketch after this list):

1. Choose a prior that is conjugate to the likelihood.
2. This implies that the posterior must have the same form as the conjugate prior distribution.
3. The posterior must be a distribution, which implies that the evidence must equal the constant from the conjugate relation.
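
For the Bernoulli/beta pair this recipe reduces to adding counts to the prior parameters; a minimal sketch (the data and prior values are mine):

```python
data = [1, 0, 1, 1, 0, 1]                  # observed Bernoulli outcomes
alpha0, beta0 = 1.0, 1.0                   # conjugate Beta prior on lambda

# Conjugacy: the posterior is again a beta distribution.
alpha_post = alpha0 + sum(data)            # prior alpha + number of successes
beta_post = beta0 + len(data) - sum(data)  # prior beta + number of failures

print(alpha_post, beta_post)                  # Beta(5.0, 3.0)
print(alpha_post / (alpha_post + beta_post))  # posterior mean of lambda: 0.625
```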

Page 20: Common Probability Distribution

Importance of the Conjugate Relation 2


Marginalizing over parameters (a numeric check follows below):

1. The prior is chosen to be conjugate to the other term.
2. The integral becomes easy: the product becomes a constant times a distribution.

The integral of a constant times a probability distribution is the constant times the integral of the probability distribution, i.e. constant × 1 = constant.
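
A numeric check for the Bernoulli/beta case: the predictive probability Pr(x = 1) = ∫ λ · Beta_λ[α, β] dλ should come out as the constant α/(α + β). A sketch using a simple midpoint-rule integral (the values are mine):

```python
from math import lgamma, log, exp

def beta_pdf(lam, a, b):
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(lam) + (b - 1) * log(1 - lam))

alpha, beta = 2.0, 3.0

# Pr(x = 1) = integral over lam of Bern(x = 1 | lam) * Beta(lam), midpoint rule on (0, 1).
n = 100000
integral = sum((i + 0.5) / n * beta_pdf((i + 0.5) / n, alpha, beta) for i in range(n)) / n

print(integral)                # ~0.4, from numerical integration
print(alpha / (alpha + beta))  # 0.4, the constant from the conjugate relation
```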

Page 21: Common Probability Distribution

Conclusions


• Presented four distributions which model useful quantities
• Presented four other distributions which model the parameters of the first four
• They are paired in a special way – the second set is conjugate to the first
• In the following material we'll see that this relationship is very useful

Page 22: Common Probability Distribution

Based on:

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

http://www.computervisionmodels.com/

Thank you for your attention