Bayesian Inference Ekaterina Lomakina TNU seminar: Bayesian inference 1 March 2013.


Outline
• Probability distributions
• Maximum likelihood estimation
• Maximum a posteriori estimation
• Conjugate priors
• Conceptualizing models as collections of priors
• Noninformative priors
• Empirical Bayes

Probability distribution
• Density estimation: to model the distribution p(x) of a random variable x given a finite set of observations x1, …, xN.

Nonparametric approach
• Histogram
• Kernel density estimation
• Nearest neighbor approach

Parametric approach
• Gaussian distribution
• Beta distribution
• …

The Exponential Family

Gaussian distribution

Binomial distribution

Beta distribution

etc…
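For reference, the distributions listed on this slide can all be written in one general form. The block below is a sketch of that form in the p(x|η) notation used on the later conjugate-prior slide; the symbols h, g and u are assumptions taken from the standard textbook presentation and were not recovered from the slide itself.

```latex
% General exponential-family form (h, g, u are assumed textbook notation)
p(x \mid \eta) = h(x)\, g(\eta)\, \exp\{\eta^{\top} u(x)\}

% Example: binomial with N fixed, m heads, success probability \mu
\mathrm{Bin}(m \mid N, \mu)
  = \binom{N}{m} (1-\mu)^{N} \exp\Big\{ m \ln\frac{\mu}{1-\mu} \Big\}

% so that \eta = \ln\frac{\mu}{1-\mu}, \quad u(m) = m, \quad
% h(m) = \binom{N}{m}, \quad g(\eta) = (1 + e^{\eta})^{-N}
```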

Gaussian distribution
• The central limit theorem (CLT) states that, under certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and a well-defined variance, is approximately normally distributed.

Bean machine by Sir Francis Galton
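The bean machine can be imitated in a few lines, which makes the CLT statement concrete. This is a minimal sketch, assuming each peg sends a ball left or right with probability 1/2; the function name `bean_machine`, the number of balls and the number of rows are illustrative choices, not taken from the slides.

```python
import random
import statistics

def bean_machine(n_balls, n_rows, seed=0):
    """Drop n_balls through n_rows of pegs and return the final bin of each ball."""
    rng = random.Random(seed)
    # The bin index is the number of right bounces: a sum of n_rows independent coin flips.
    return [sum(rng.random() < 0.5 for _ in range(n_rows)) for _ in range(n_balls)]

bins = bean_machine(n_balls=5000, n_rows=50)
mean = statistics.fmean(bins)   # theory: n_rows / 2
std = statistics.pstdev(bins)   # theory: sqrt(n_rows) / 2
# A normal distribution has roughly 68% of its mass within one standard deviation.
within_one_std = sum(abs(b - mean) <= std for b in bins) / len(bins)
print(mean, std, within_one_std)
```

The bin counts are sums of many independent flips, so their histogram should look close to a bell curve even though each flip is binary.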

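Maximum likelihood estimation
• The frequentist approach to estimating the parameters of a distribution from a set of observations is to maximize the likelihood.
– The data are assumed to be i.i.d., so the likelihood factorizes over the observations.
– The logarithm is a monotonic transformation, so maximizing the log-likelihood gives the same estimate.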
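The two annotations on this slide correspond to two standard steps, written out below as a sketch. The symbols θ (parameters) and X (the data set) are assumed notation, since the slide's own equations did not survive.

```latex
% Step 1: i.i.d. data X = \{x_1, \dots, x_N\}, so the likelihood factorizes
p(X \mid \theta) = \prod_{n=1}^{N} p(x_n \mid \theta)

% Step 2: the logarithm is monotonic, so it has the same maximizer
\theta_{\mathrm{ML}} = \arg\max_{\theta} \ln p(X \mid \theta)
                     = \arg\max_{\theta} \sum_{n=1}^{N} \ln p(x_n \mid \theta)
```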

MLE for Gaussian distribution

– simple average
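As a small illustration of the "simple average" result, the sketch below computes the maximum likelihood estimates for a Gaussian on synthetic data. The function name `gaussian_mle` and the data-generating values (mean 2.0, standard deviation 1.5) are hypothetical and chosen only for the example.

```python
import math
import random

def gaussian_mle(xs):
    """Maximum likelihood estimates (mean, variance) for i.i.d. Gaussian data."""
    n = len(xs)
    mu_ml = sum(xs) / n                              # the simple average
    var_ml = sum((x - mu_ml) ** 2 for x in xs) / n   # divides by N, so it is biased
    return mu_ml, var_ml

rng = random.Random(0)
data = [rng.gauss(2.0, 1.5) for _ in range(10000)]   # hypothetical synthetic data
mu_ml, var_ml = gaussian_mle(data)
print(mu_ml, math.sqrt(var_ml))
```

Note that the variance estimate divides by N rather than N - 1, which is what maximizing the likelihood yields.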

Maximum a posteriori estimation
• The Bayesian approach to estimating the parameters of a distribution from a set of observations is to maximize the posterior distribution.
• It allows prior information to be taken into account.

MAP for Gaussian distribution

• The posterior distribution over the mean is again Gaussian.
– Its mean is a weighted average of the prior mean and the sample mean.
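The weighted average can be made explicit with a short sketch. It assumes a Gaussian likelihood with known variance `sigma2` and a Gaussian prior on the mean with parameters `mu0` and `sigma02`; these names, and the function `gaussian_mean_posterior`, are assumptions for illustration rather than the slide's notation.

```python
def gaussian_mean_posterior(xs, sigma2, mu0, sigma02):
    """Posterior (mean, variance) of a Gaussian mean with known noise variance sigma2."""
    n = len(xs)
    mu_ml = sum(xs) / n
    # Weights of the prior mean and of the sample mean; they sum to one.
    w_prior = sigma2 / (n * sigma02 + sigma2)
    w_data = n * sigma02 / (n * sigma02 + sigma2)
    mu_n = w_prior * mu0 + w_data * mu_ml
    # Precisions add: prior precision plus N times the noise precision.
    var_n = 1.0 / (1.0 / sigma02 + n / sigma2)
    return mu_n, var_n

# Three observations, unit noise variance, standard normal prior on the mean.
mu_map, var_post = gaussian_mean_posterior([1.0, 2.0, 3.0], sigma2=1.0, mu0=0.0, sigma02=1.0)
print(mu_map, var_post)
```

Because the posterior is Gaussian, its mode and mean coincide, so `mu_map` is also the MAP estimate. With more data the weight on the prior mean shrinks toward zero.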

Conjugate prior
• In general, for a given probability distribution p(x|η), we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior.
• For any member of the exponential family, there exists a conjugate prior that can be written in a common general form.
• Important conjugate pairs include:
  Binomial – Beta
  Multinomial – Dirichlet
  Gaussian – Gaussian (for mean)
  Gaussian – Gamma (for precision)
  Exponential – Gamma
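A sketch of the general conjugate form for the exponential family follows. The symbols h, g, u, χ, ν and f are assumed textbook notation; only p(x|η) and p(η) appear on the slide.

```latex
% Exponential-family likelihood (assumed textbook notation h, g, u)
p(x \mid \eta) = h(x)\, g(\eta)\, \exp\{\eta^{\top} u(x)\}

% Conjugate prior with hyperparameters \chi, \nu and normalizer f(\chi, \nu)
p(\eta \mid \chi, \nu) = f(\chi, \nu)\, g(\eta)^{\nu} \exp\{\nu\, \eta^{\top} \chi\}

% Posterior after N i.i.d. observations X: same functional form as the prior
p(\eta \mid X, \chi, \nu) \propto g(\eta)^{\nu + N}
  \exp\Big\{ \eta^{\top} \Big( \sum_{n=1}^{N} u(x_n) + \nu \chi \Big) \Big\}
```

Read this way, ν acts like a number of pseudo-observations and χ like their average sufficient statistic.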

MLE for Binomial distribution
• The binomial distribution models the probability of m “heads” out of N tosses.
• Its only parameter, μ, encodes the probability of a single event (“head”).
• The maximum likelihood estimate is μ = m / N, the observed fraction of heads.
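The maximum likelihood estimate follows from setting the derivative of the log-likelihood to zero. The short derivation below is a sketch using only the slide's symbols m, N and μ.

```latex
\mathrm{Bin}(m \mid N, \mu) = \binom{N}{m} \mu^{m} (1-\mu)^{N-m}

\frac{\partial}{\partial \mu} \ln \mathrm{Bin}(m \mid N, \mu)
  = \frac{m}{\mu} - \frac{N-m}{1-\mu} = 0
\quad\Longrightarrow\quad
\mu_{\mathrm{ML}} = \frac{m}{N}
```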

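MAP for Binomial distribution
• The conjugate prior for this distribution is the Beta distribution.
• The posterior is then again a Beta distribution, with m added to the first parameter and l added to the second, where l = N – m is simply the number of “tails”.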
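The Beta update amounts to adding counts, as the sketch below shows. The prior parameters `a` and `b` and the function name `beta_binomial_update` are assumed names for illustration; the slide's own symbols for the Beta parameters were lost.

```python
def beta_binomial_update(m, n_tosses, a, b):
    """Update a Beta(a, b) prior with m heads out of n_tosses; return summary estimates."""
    l = n_tosses - m                       # number of tails
    a_post, b_post = a + m, b + l          # the posterior is Beta(a + m, b + l)
    mu_ml = m / n_tosses                   # maximum likelihood estimate
    # Mode of the Beta posterior; this formula is valid when a_post > 1 and b_post > 1.
    mu_map = (a_post - 1) / (a_post + b_post - 2)
    mu_mean = a_post / (a_post + b_post)   # posterior mean
    return (a_post, b_post), mu_ml, mu_map, mu_mean

# Three heads in three tosses with a Beta(2, 2) prior.
posterior, mu_ml, mu_map, mu_mean = beta_binomial_update(m=3, n_tosses=3, a=2, b=2)
print(posterior, mu_ml, mu_map, mu_mean)
```

With three heads in three tosses the maximum likelihood estimate says the coin always lands heads, while the MAP estimate is pulled back toward the prior.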

Models as collections of priors - 1
• Take a simple regression model

• Add a prior on weights

• And get Bayesian linear regression!
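A one-dimensional sketch shows how adding a prior on the weight changes ordinary regression. It assumes the model y = w·x + noise with noise precision `beta`, and a zero-mean Gaussian prior on w with precision `alpha`; the name `alpha`, the function `bayes_linreg_1d` and the synthetic data are assumptions made for this example.

```python
import random

def bayes_linreg_1d(xs, ys, alpha, beta):
    """Posterior (mean, variance) of the weight w in y = w * x + Gaussian noise."""
    # Posterior precision: prior precision plus the data term.
    precision = alpha + beta * sum(x * x for x in xs)
    mean = beta * sum(x * y for x, y in zip(xs, ys)) / precision
    return mean, 1.0 / precision

rng = random.Random(1)
true_w, noise_std = 0.7, 0.5   # hypothetical ground truth for the synthetic data
xs = [rng.uniform(-3.0, 3.0) for _ in range(200)]
ys = [true_w * x + rng.gauss(0.0, noise_std) for x in xs]
w_mean, w_var = bayes_linreg_1d(xs, ys, alpha=1.0, beta=1.0 / noise_std ** 2)
print(w_mean, w_var)
```

Setting `alpha` to zero recovers the least-squares estimate; a larger `alpha` shrinks the weight toward zero.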

Models as collections of priors - 2
• Take again a simple regression model

• Add a prior on the function

• And get Gaussian processes!

[Figures: graphical models with nodes yn and β (first model) and nodes yn, β and K (second model), where yn is some function of xn]

Models as collections of priors - 3
• Take a model where xn is discrete and unknown
• Add a prior on the states (xn), assuming they are temporally smooth
• And get a Hidden Markov Model!

[Figure: graphical model with parameters θ, a chain of states x1, x2, …, xn-1, xn, xn+1, and observations t1, t2, …, tn-1, tn, tn+1]

Noninformative priors
• Sometimes we have no strong prior belief but still want to apply Bayesian inference. Then we need noninformative priors.
• If our parameter λ is a discrete variable with K states, we can simply set each prior probability to 1/K.
• For continuous variables, however, the choice is less clear.
• One example is a noninformative prior over μ for the Gaussian distribution: a Gaussian prior whose variance is taken to infinity.
• The effect of the prior on the posterior over μ then vanishes.
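The vanishing effect of the prior can be seen by taking a limit, sketched below. The setup is an assumption: a Gaussian likelihood with known variance σ², a Gaussian prior on μ with mean μ0 and variance σ0², and μ_ML denoting the sample mean of N observations.

```latex
% Assumed setup: Gaussian prior on the mean, made broad
p(\mu) = \mathcal{N}(\mu \mid \mu_0, \sigma_0^2), \qquad \sigma_0^2 \to \infty

% Posterior mean: the weight on \mu_0 goes to zero
\mu_N = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\,\mu_0
      + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\,\mu_{\mathrm{ML}}
  \;\longrightarrow\; \mu_{\mathrm{ML}}

% Posterior precision: the prior term drops out
\frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2}
  \;\longrightarrow\; \frac{N}{\sigma^2}
```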

Empirical Bayes
• But what if we still want to use some prior information, but want to learn it from the data instead of assuming it in advance?
• Imagine the following model
• We cannot use full Bayesian inference, but we can approximate it by finding the λ* that maximizes p(X|λ)

[Figure: graphical model of the hierarchy λ → θs → xn, with plate labels S and N]

• We can estimate the result by the following iterative procedure (EM algorithm):
• Initialize λ*
• E-step: compute p(θ|X, λ*) given fixed λ*
• M-step: update λ* by maximizing the expected complete-data log-likelihood under p(θ|X, λ*)
• This illustrates the other name for Empirical Bayes: maximum marginal likelihood.
• This is not a fully Bayesian treatment, but it offers a useful compromise between the Bayesian and frequentist approaches.
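The EM iteration can be sketched on a concrete hierarchy. The model below is an assumption chosen because both steps have closed forms: each θs is drawn from a Gaussian with mean λ and known variance `tau2`, and each xn in group s is drawn from a Gaussian with mean θs and known variance `sigma2`. The function names and the toy data are hypothetical.

```python
def e_step(groups, lam, sigma2, tau2):
    """Posterior mean of each theta_s given the current lambda."""
    means = []
    for xs in groups:
        precision = 1.0 / tau2 + len(xs) / sigma2
        means.append((lam / tau2 + sum(xs) / sigma2) / precision)
    return means

def em_empirical_bayes(groups, sigma2, tau2, lam=0.0, n_iter=200):
    """Estimate lambda by EM; return it with the posterior means of theta_s."""
    for _ in range(n_iter):
        post_means = e_step(groups, lam, sigma2, tau2)
        # M-step: the expected complete-data log-likelihood is quadratic in lambda
        # and is maximized by the average of the posterior means.
        lam = sum(post_means) / len(post_means)
    return lam, e_step(groups, lam, sigma2, tau2)

# Toy data: three groups of two observations each.
groups = [[1.0, 2.0], [3.0, 5.0], [10.0, 12.0]]
lam_star, theta_means = em_empirical_bayes(groups, sigma2=1.0, tau2=4.0)
print(lam_star, theta_means)
```

With equal group sizes the learned λ* settles at the average of the group means, and each posterior mean of θs is shrunk from its group mean toward λ*.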

Thank you for your attention!
