CHAPTER 3: COMMON PROBABILITY DISTRIBUTIONS
Computer Vision: Models, Learning and Inference
Lukas Tencer
Jul 04, 2015
Why model these complicated quantities?
Because we need probability distributions over model parameters as
well as over data and world state. Hence, some of the distributions
describe the parameters of the others:
Example: the univariate normal models a mean and a variance; those two parameters are in turn modelled by the normal inverse gamma distribution.
Bernoulli Distribution
The Bernoulli distribution describes the situation where there are only two
possible outcomes, $y = 0$ / $y = 1$ (failure / success).

Takes a single parameter $\lambda \in [0, 1]$:

    $\Pr(y = 1) = \lambda, \quad \Pr(y = 0) = 1 - \lambda$

or

    $\Pr(y) = \lambda^{y} (1 - \lambda)^{1 - y}$

For short we write: $\Pr(y) = \text{Bern}_{y}[\lambda]$
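As an illustration (ours, not from the slides), a minimal Python sketch of the Bernoulli pmf and sampling with scipy; the variable name lam for $\lambda$ is our choice:

```python
import numpy as np
from scipy.stats import bernoulli

lam = 0.7                            # the single parameter, lambda in [0, 1]
print(bernoulli.pmf(1, lam))         # Pr(y = 1) = lambda       -> 0.7
print(bernoulli.pmf(0, lam))         # Pr(y = 0) = 1 - lambda   -> 0.3

# The empirical mean of samples approaches lambda.
samples = bernoulli.rvs(lam, size=10_000, random_state=0)
print(samples.mean())
```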
Beta Distribution
Defined over $\lambda \in [0, 1]$ (i.e. the parameter of the Bernoulli):

    $\Pr(\lambda) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \lambda^{\alpha - 1} (1 - \lambda)^{\beta - 1}$

• Two parameters $\alpha, \beta$, both > 0
• Mean depends on relative values: $E[\lambda] = \alpha / (\alpha + \beta)$
• Concentration depends on the magnitude of $\alpha + \beta$

For short we write: $\Pr(\lambda) = \text{Beta}_{\lambda}[\alpha, \beta]$
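A small numerical check (ours, not from the slides) of the two bullet points above, using scipy: the mean depends only on the ratio $\alpha/(\alpha+\beta)$, while a larger magnitude $\alpha+\beta$ concentrates the distribution:

```python
from scipy.stats import beta

# Same ratio alpha / (alpha + beta) = 0.25, increasing magnitude:
for a, b in [(1, 3), (10, 30), (100, 300)]:
    print(f"a={a}, b={b}: mean={beta.mean(a, b):.3f}, var={beta.var(a, b):.5f}")
# The mean stays 0.25 while the variance shrinks (higher concentration).
```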
Categorical Distribution
The categorical distribution describes the situation where there are $K$ possible
outcomes, $y = 1 \ldots y = K$.

Takes $K$ parameters $\lambda_k \geq 0$ where $\sum_k \lambda_k = 1$:

    $\Pr(y = k) = \lambda_k$

or we can think of the data as a vector with all elements zero except the $k$th,
e.g. $\mathbf{e}_4 = [0, 0, 0, 1, 0]^{T}$ (here $K = 5$), in which case
$\Pr(y = \mathbf{e}_k) = \lambda_k$.

For short we write: $\Pr(y) = \text{Cat}_{y}[\boldsymbol{\lambda}]$
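A short sketch (ours) of drawing categorical samples and forming the one-hot vector representation with numpy:

```python
import numpy as np

lam = np.array([0.1, 0.2, 0.3, 0.3, 0.1])   # K = 5 parameters, summing to 1
rng = np.random.default_rng(0)

k = rng.choice(len(lam), p=lam)             # sampled outcome (0-based index)
e_k = np.eye(len(lam), dtype=int)[k]        # one-hot vector, e.g. [0, 0, 0, 1, 0]
print(k + 1, e_k)                           # 1-based outcome and its vector form
```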
Dirichlet Distribution
Defined over $K$ values $\lambda_1 \ldots \lambda_K$ where $\lambda_k \in [0, 1]$
and $\sum_k \lambda_k = 1$:

    $\Pr(\lambda_1 \ldots \lambda_K) = \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)} \prod_{k=1}^{K} \lambda_k^{\alpha_k - 1}$

Has $K$ parameters $\alpha_k > 0$.

Or for short: $\Pr(\lambda_1 \ldots \lambda_K) = \text{Dir}_{\lambda_1 \ldots \lambda_K}[\alpha_1 \ldots \alpha_K]$
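A minimal sketch (ours) showing that a Dirichlet draw is itself a valid set of categorical parameters:

```python
import numpy as np

alpha = np.array([2.0, 5.0, 3.0])   # K = 3 parameters, all > 0
rng = np.random.default_rng(0)

lam = rng.dirichlet(alpha)          # one sample: entries in [0, 1]
print(lam, lam.sum())               # sums to 1, so usable as Cat parameters
print(alpha / alpha.sum())          # E[lambda_k] = alpha_k / sum(alpha)
```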
Univariate Normal Distribution
The univariate normal distribution describes a single continuous variable.

Takes 2 parameters, $\mu$ and $\sigma^2 > 0$:

    $\Pr(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y - \mu)^2}{2\sigma^2}\right]$

For short we write: $\Pr(y) = \text{Norm}_{y}[\mu, \sigma^2]$
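A quick sketch (ours) evaluating the density both via scipy and via the formula above; note scipy's norm is parameterized by the standard deviation, not the variance:

```python
import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.0, 4.0
y = 0.0

print(norm.pdf(y, loc=mu, scale=np.sqrt(sigma2)))                         # scipy
print(np.exp(-(y - mu)**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2))  # formula
```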
Normal Inverse Gamma Distribution
Defined on 2 variables, $\mu$ and $\sigma^2 > 0$:

    $\Pr(\mu, \sigma^2) = \frac{\sqrt{\gamma}}{\sigma\sqrt{2\pi}} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \left(\frac{1}{\sigma^2}\right)^{\alpha + 1} \exp\left[-\frac{2\beta + \gamma(\delta - \mu)^2}{2\sigma^2}\right]$

or for short: $\Pr(\mu, \sigma^2) = \text{NormInvGam}_{\mu, \sigma^2}[\alpha, \beta, \gamma, \delta]$

Four parameters: $\alpha, \beta, \gamma > 0$ and $\delta$.
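scipy has no normal inverse gamma object, but a sampler is easy to sketch from the standard two-stage construction (an assumption consistent with the density above): draw $\sigma^2$ from an inverse gamma, then $\mu$ from a normal with variance $\sigma^2/\gamma$. The function name is ours:

```python
import numpy as np
from scipy.stats import invgamma, norm

def sample_norm_inv_gam(alpha, beta, gamma, delta, rng):
    """One (mu, sigma^2) draw: sigma^2 ~ InvGamma(alpha, beta),
    then mu | sigma^2 ~ Norm(delta, sigma^2 / gamma)."""
    sigma2 = invgamma.rvs(alpha, scale=beta, random_state=rng)
    mu = norm.rvs(loc=delta, scale=np.sqrt(sigma2 / gamma), random_state=rng)
    return mu, sigma2

rng = np.random.default_rng(0)
print(sample_norm_inv_gam(3.0, 2.0, 1.5, 0.0, rng))
```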
Multivariate Normal Distribution
The multivariate normal distribution describes multiple continuous variables.
Takes 2 parameters:
• a vector $\boldsymbol{\mu}$ containing the mean position
• a symmetric "positive definite" covariance matrix $\boldsymbol{\Sigma}$

    $\Pr(\mathbf{y}) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})^{T} \boldsymbol{\Sigma}^{-1} (\mathbf{y} - \boldsymbol{\mu})\right]$

For short we write: $\Pr(\mathbf{y}) = \text{Norm}_{\mathbf{y}}[\boldsymbol{\mu}, \boldsymbol{\Sigma}]$

Positive definite: $\mathbf{z}^{T}\boldsymbol{\Sigma}\mathbf{z}$ is positive for any real vector $\mathbf{z} \neq \mathbf{0}$.
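A practical sketch (ours): the usual way to test positive definiteness in code is to attempt a Cholesky factorization, which succeeds exactly when a symmetric matrix is positive definite:

```python
import numpy as np

def is_positive_definite(Sigma):
    """Cholesky factorization succeeds iff the symmetric matrix is PD."""
    try:
        np.linalg.cholesky(Sigma)
        return True
    except np.linalg.LinAlgError:
        return False

print(is_positive_definite(np.array([[2.0, 0.5], [0.5, 1.0]])))  # True
print(is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]])))  # False
```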
Types of covariance
The covariance matrix has three forms, termed spherical ($\boldsymbol{\Sigma} = \sigma^2\mathbf{I}$, one shared variance on every axis), diagonal (independent per-axis variances), and full (an arbitrary symmetric positive definite matrix, encoding correlations between dimensions).
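A small sketch (ours) constructing the three forms in numpy and sampling under each; the numbers are arbitrary:

```python
import numpy as np

sigma2 = 2.0
spherical = sigma2 * np.eye(2)          # one shared variance on every axis
diagonal  = np.diag([1.0, 4.0])         # per-axis variances, no correlation
full      = np.array([[2.0, 1.2],
                      [1.2, 2.0]])      # off-diagonals encode correlation

rng = np.random.default_rng(0)
for Sigma in (spherical, diagonal, full):
    print(rng.multivariate_normal(np.zeros(2), Sigma, size=3))
```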
Normal Inverse Wishart
Defined on two variables: a mean vector $\boldsymbol{\mu}$ and a symmetric
positive definite matrix, $\boldsymbol{\Sigma}$.

or for short: $\Pr(\boldsymbol{\mu}, \boldsymbol{\Sigma}) = \text{NorIWis}_{\boldsymbol{\mu}, \boldsymbol{\Sigma}}[\alpha, \boldsymbol{\Psi}, \gamma, \boldsymbol{\delta}]$

Has four parameters:
• a positive scalar, $\alpha$
• a positive definite matrix, $\boldsymbol{\Psi}$
• a positive scalar, $\gamma$
• a vector, $\boldsymbol{\delta}$
Samples from the Normal Inverse Wishart
[Figure: samples as each parameter varies: $\alpha$ (dispersion of covariances), $\boldsymbol{\Psi}$ (average covariance), $\gamma$ (dispersion of means), $\boldsymbol{\delta}$ (average of means).]
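Mirroring the normal inverse gamma case, a sampler sketch (ours, with the parameter roles assumed as annotated above): draw $\boldsymbol{\Sigma}$ from an inverse Wishart, then $\boldsymbol{\mu}$ from a normal with covariance $\boldsymbol{\Sigma}/\gamma$:

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

def sample_norm_inv_wishart(alpha, Psi, gamma, delta, rng):
    """One (mu, Sigma) draw: Sigma ~ InvWishart(alpha, Psi),
    then mu | Sigma ~ Norm(delta, Sigma / gamma)."""
    Sigma = invwishart.rvs(df=alpha, scale=Psi, random_state=rng)
    mu = multivariate_normal.rvs(mean=delta, cov=Sigma / gamma, random_state=rng)
    return mu, Sigma

rng = np.random.default_rng(0)
mu, Sigma = sample_norm_inv_wishart(5.0, np.eye(2), 2.0, np.zeros(2), rng)
print(mu, "\n", Sigma)
```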
Conjugate Distributions
The pairs of distributions discussed have a special relationship: they are
conjugate distributions.

• Beta is conjugate to Bernoulli
• Dirichlet is conjugate to categorical
• Normal inverse gamma is conjugate to univariate normal
• Normal inverse Wishart is conjugate to multivariate normal
Conjugate Distributions
When we take the product of a distribution and its conjugate, the result has
the same form as the conjugate. For example:

    $\text{Bern}_{y}[\lambda] \cdot \text{Beta}_{\lambda}[\alpha, \beta] = \kappa(y, \alpha, \beta) \cdot \text{Beta}_{\lambda}[\tilde{\alpha}, \tilde{\beta}]$

where $\kappa$ is a constant and $\text{Beta}_{\lambda}[\tilde{\alpha}, \tilde{\beta}]$
is a new beta distribution with $\tilde{\alpha} = \alpha + y$ and $\tilde{\beta} = \beta + 1 - y$.
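The relation can be checked numerically; in this sketch (ours) the ratio of the product to the new beta density is the same constant $\kappa$ at every $\lambda$:

```python
import numpy as np
from scipy.stats import beta, bernoulli

a, b, y = 2.0, 3.0, 1
lam = np.linspace(0.001, 0.999, 5)

product  = bernoulli.pmf(y, lam) * beta.pdf(lam, a, b)
new_beta = beta.pdf(lam, a + y, b + 1 - y)

print(product / new_beta)   # identical value at every lambda: the constant kappa
```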
Example proof

When we take the product of a distribution and its conjugate, the result has
the same form as the conjugate:

    $\text{Bern}_{y}[\lambda] \cdot \text{Beta}_{\lambda}[\alpha, \beta] = \lambda^{y}(1 - \lambda)^{1-y} \cdot \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \lambda^{\alpha - 1}(1 - \lambda)^{\beta - 1}$
    $= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \lambda^{y + \alpha - 1}(1 - \lambda)^{1 - y + \beta - 1}$
    $= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(y + \alpha)\Gamma(1 - y + \beta)}{\Gamma(\alpha + \beta + 1)} \cdot \text{Beta}_{\lambda}[y + \alpha, 1 - y + \beta]$
    $= \kappa(y, \alpha, \beta) \cdot \text{Beta}_{\lambda}[y + \alpha, 1 - y + \beta]$
Bayes’ Rule Terminology
    $\Pr(y \mid x) = \frac{\Pr(x \mid y)\,\Pr(y)}{\int \Pr(x \mid y)\,\Pr(y)\,dy}$

• Posterior $\Pr(y \mid x)$ – what we know about $y$ after seeing $x$
• Prior $\Pr(y)$ – what we know about $y$ before seeing $x$
• Likelihood $\Pr(x \mid y)$ – propensity for observing a certain value of $x$
  given a certain value of $y$
• Evidence $\int \Pr(x \mid y)\,\Pr(y)\,dy$ – a constant that ensures the left
  hand side is a valid distribution
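A tiny discrete sketch (ours, with made-up numbers) showing all four terms at once for a two-valued world state:

```python
import numpy as np

prior      = np.array([0.5, 0.5])         # Pr(y) before seeing x
likelihood = np.array([0.8, 0.3])         # Pr(x | y) for the observed x
evidence   = np.sum(likelihood * prior)   # normalizing constant
posterior  = likelihood * prior / evidence

print(posterior, posterior.sum())         # a valid distribution: sums to 1
```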
Importance of the Conjugate Relation 1
Learning parameters:

1. Choose a prior that is conjugate to the likelihood.
2. This implies that the posterior must have the same form as the conjugate
   prior distribution.
3. The posterior must be a valid distribution, which implies that the evidence
   must equal the constant $\kappa$ from the conjugate relation.
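For the Beta-Bernoulli pair this recipe reduces to counting, as in this sketch (ours): the posterior is a beta distribution whose parameters are the prior parameters plus the observed success and failure counts:

```python
import numpy as np

a, b = 1.0, 1.0                           # conjugate Beta prior (uniform)
data = np.array([1, 1, 0, 1, 1, 0, 1])    # Bernoulli observations

a_post = a + data.sum()                   # add number of successes
b_post = b + len(data) - data.sum()       # add number of failures
print(a_post, b_post)                     # posterior is Beta(6, 3)
print(a_post / (a_post + b_post))         # posterior mean of lambda
```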
Importance of the Conjugate Relation 2
Marginalizing over parameters:

1. The prior is chosen to be conjugate to the likelihood term inside the
   integral.
2. The integral becomes easy: the product becomes a constant times a
   distribution, and the integral of a constant times a probability
   distribution is that constant times the integral of the probability
   distribution, i.e. constant × 1 = constant.
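For one Bernoulli observation under a beta prior, the marginal can be checked both ways, as in this sketch (ours): numerically integrating likelihood times prior, and reading off the constant analytically (for $y = 1$ it is just $E[\lambda] = \alpha/(\alpha+\beta)$):

```python
from scipy.stats import beta, bernoulli
from scipy.integrate import quad

a, b, y = 2.0, 3.0, 1

numeric, _ = quad(lambda lam: bernoulli.pmf(y, lam) * beta.pdf(lam, a, b), 0, 1)
analytic = a / (a + b)                    # constant from the conjugate relation

print(numeric, analytic)                  # both 0.4
```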
Conclusions
• Presented four distributions which model useful quantities
• Presented four other distributions which model the parameters of the first
  four
• They are paired in a special way – the second set is conjugate to the first
• In the following material we'll see that this relationship is very useful
Based on:
Computer vision: models, learning and inference. ©2011 Simon J.D. Prince
http://www.computervisionmodels.com/
Thank you for your attention.