Page 1:

Maximum Likelihood And Expectation Maximization

Lecture Notes for CMPUT 466/551

Nilanjan Ray

Page 2:

MLE and EM

• Maximum Likelihood Estimation (MLE) and Expectation Maximization are two very important tools in Machine Learning

• Essentially, you use them to estimate probability distributions inside a learning algorithm; we have already seen one such example: in logistic regression we used MLE

• We will revisit MLE here and see where it runs into difficulties

• Then Expectation Maximization (EM) will rescue us

Page 3:

Probability Density Estimation: Quick Points

Two different routes:

Parametric
• Provide a parametrized class of density functions
• Tools:
  – Maximum likelihood estimation
  – Expectation Maximization
  – Sampling techniques
  – …

Non-parametric
• Density is modeled by samples
• Tools:
  – Kernel methods
  – Sampling techniques
  – …

Page 4:

Revisiting Maximum Likelihood

The data come from a probability distribution of known form

The probability distribution has some parameters that are unknown to you

Example: the data are Gaussian, $y_i \sim N(\mu, \sigma^2)$, so the unknown parameters here are $\theta = (\mu, \sigma^2)$

MLE is a tool that estimates the unknown parameters of the probability distribution from data

Page 5:

MLE: Recapitulation

• Assume the observation data $y_i$ are independent

• Form the likelihood:

$$L(\theta; Z) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i-\mu)^2}{2\sigma^2}\right), \qquad Z = \{y_1, y_2, \ldots, y_N\}$$

• Form the log-likelihood:

$$l(\theta; Z) = \log L(\theta; Z) = -\frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i-\mu)^2 - \frac{N}{2}\log(2\pi\sigma^2)$$

• To find the unknown parameter values, maximize the log-likelihood with respect to the unknown parameters:

$$\frac{\partial l}{\partial \mu} = 0, \quad \frac{\partial l}{\partial \sigma^2} = 0 \;\;\Rightarrow\;\; \hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}(y_i-\hat{\mu})^2
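For a quick numerical check of these closed-form estimates, here is a minimal NumPy sketch; the true parameter values and sample size are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=1000)   # i.i.d. Gaussian sample with made-up mu=2.0, sigma=1.5

mu_hat = y.mean()                       # hat(mu) = (1/N) sum_i y_i
sigma2_hat = np.mean((y - mu_hat)**2)   # hat(sigma^2) = (1/N) sum_i (y_i - hat(mu))^2 (divides by N, not N-1)

print(mu_hat, sigma2_hat)               # should land near 2.0 and 1.5**2 = 2.25
```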

Page 6:

MLE: A Challenging Example

Observation data: $y_1, y_2, \ldots, y_N$ (the original slide shows a histogram of the observations)

Mixture model:

$$Y_1 \sim N(\mu_1, \sigma_1^2), \qquad Y_2 \sim N(\mu_2, \sigma_2^2), \qquad Y = (1-\Delta)\,Y_1 + \Delta\,Y_2, \qquad \Delta \in \{0, 1\}$$

$$\theta_1 = (\mu_1, \sigma_1^2), \qquad \theta_2 = (\mu_2, \sigma_2^2)$$

$\Delta$ is an indicator variable: $\pi$ is the probability with which the observation is chosen from density 2, and $(1-\pi)$ is the probability with which it is chosen from density 1.

Source: Department of Statistics, CMU

Page 7:

MLE: A Challenging Example…

Maximum likelihood fitting for the parameters $(\pi, \mu_1, \mu_2, \sigma_1, \sigma_2)$ means maximizing the observed-data log-likelihood

$$l(\theta; Z) = \sum_{i=1}^{N} \log\big[(1-\pi)\,\phi_{\theta_1}(y_i) + \pi\,\phi_{\theta_2}(y_i)\big],$$

where $\phi_{\theta_j}$ denotes the Gaussian density with parameters $\theta_j$. Because the sum over components sits inside the logarithm, this is numerically (and of course analytically, too) challenging to solve!
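To make the setup concrete, here is a minimal NumPy/SciPy sketch that samples from the two-component mixture and evaluates the observed-data log-likelihood that would have to be maximized directly. The "true" parameter values are made up for illustration, and `s1`, `s2` are standard deviations:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Assumed "true" parameters, chosen only for illustration
pi_true, mu1, s1, mu2, s2 = 0.3, 0.0, 1.0, 4.0, 1.5

# Sample from the mixture: Delta ~ Bernoulli(pi), Y = (1 - Delta)*Y1 + Delta*Y2
N = 500
delta = rng.random(N) < pi_true
y = np.where(delta, rng.normal(mu2, s2, N), rng.normal(mu1, s1, N))

def observed_loglik(theta, y):
    """l(theta; Z) = sum_i log[(1 - pi)*phi1(y_i) + pi*phi2(y_i)]; the sum sits inside the log."""
    pi, mu1, s1, mu2, s2 = theta
    mix = (1 - pi) * norm.pdf(y, mu1, s1) + pi * norm.pdf(y, mu2, s2)
    return np.log(mix).sum()

print(observed_loglik((pi_true, mu1, s1, mu2, s2), y))
```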

Page 8:

Expectation Maximization: A Rescuer

EM augments the data space: it assumes some latent (missing) data

Source: Department of Statistics, CMU

Page 9:

EM: A Rescuer…

The observed-data log-likelihood $l(\theta; Z)$ cannot be maximized analytically. EM therefore works with the complete data, obtained by augmenting the observations with the latent indicators:

$$T = \{(y_i, \Delta_i)\}_{i=1}^{N}$$

The complete-data log-likelihood is

$$l_0(\theta; T) = \sum_{i=1}^{N}\Big[(1-\Delta_i)\log\phi_{\theta_1}(y_i) + \Delta_i\log\phi_{\theta_2}(y_i)\Big] + \sum_{i=1}^{N}\Big[(1-\Delta_i)\log(1-\pi) + \Delta_i\log\pi\Big]$$

Maximizing this form of the log-likelihood is now tractable.

Source: Department of Statistics, CMU

Page 10:

EM: The Complete Data Likelihood

By simple differentiations of $l_0(\theta; T)$ we have:

$$\frac{\partial l_0}{\partial \mu_1} = 0 \;\Rightarrow\; \hat{\mu}_1 = \frac{\sum_{i=1}^{N}(1-\Delta_i)\,y_i}{\sum_{i=1}^{N}(1-\Delta_i)}, \qquad \frac{\partial l_0}{\partial \mu_2} = 0 \;\Rightarrow\; \hat{\mu}_2 = \frac{\sum_{i=1}^{N}\Delta_i\,y_i}{\sum_{i=1}^{N}\Delta_i}$$

$$\frac{\partial l_0}{\partial \sigma_1^2} = 0 \;\Rightarrow\; \hat{\sigma}_1^2 = \frac{\sum_{i=1}^{N}(1-\Delta_i)(y_i-\hat{\mu}_1)^2}{\sum_{i=1}^{N}(1-\Delta_i)}, \qquad \frac{\partial l_0}{\partial \sigma_2^2} = 0 \;\Rightarrow\; \hat{\sigma}_2^2 = \frac{\sum_{i=1}^{N}\Delta_i(y_i-\hat{\mu}_2)^2}{\sum_{i=1}^{N}\Delta_i}$$

$$\frac{\partial l_0}{\partial \pi} = 0 \;\Rightarrow\; \hat{\pi} = \frac{1}{N}\sum_{i=1}^{N}\Delta_i$$

So, maximization of the complete-data likelihood is much easier!

But how do we get the latent variables $\Delta_i$?
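If the indicators $\Delta_i$ were actually observed, these estimates would be nothing more than per-group sample statistics. A minimal NumPy sketch; the array names `y` and `delta` follow the earlier sampling sketch and are assumptions for illustration:

```python
import numpy as np

def complete_data_mle(y, delta):
    """MLE of (pi, mu1, sigma1^2, mu2, sigma2^2) when the indicators Delta_i are observed."""
    delta = delta.astype(float)
    pi_hat  = delta.mean()                                 # (1/N) sum_i Delta_i
    mu1_hat = np.sum((1 - delta) * y) / np.sum(1 - delta)  # mean of the Delta_i = 0 group
    mu2_hat = np.sum(delta * y) / np.sum(delta)            # mean of the Delta_i = 1 group
    s1_hat  = np.sum((1 - delta) * (y - mu1_hat)**2) / np.sum(1 - delta)
    s2_hat  = np.sum(delta * (y - mu2_hat)**2) / np.sum(delta)
    return pi_hat, mu1_hat, s1_hat, mu2_hat, s2_hat
```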

Page 11:

Obtaining Latent Variables

The latent variables are computed as expected values given the data and parameters:

$$\gamma_i(\theta) = E(\Delta_i \mid \theta, y_i) = \Pr(\Delta_i = 1 \mid \theta, y_i)$$

Apply Bayes' rule:

$$\gamma_i(\theta) = \Pr(\Delta_i = 1 \mid y_i, \theta) = \frac{\Pr(y_i \mid \Delta_i = 1, \theta)\,\Pr(\Delta_i = 1 \mid \theta)}{\Pr(y_i \mid \Delta_i = 1, \theta)\,\Pr(\Delta_i = 1 \mid \theta) + \Pr(y_i \mid \Delta_i = 0, \theta)\,\Pr(\Delta_i = 0 \mid \theta)} = \frac{\pi\,\phi_{\theta_2}(y_i)}{\pi\,\phi_{\theta_2}(y_i) + (1-\pi)\,\phi_{\theta_1}(y_i)}
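For intuition, a tiny SciPy sketch of this computation for a single observation; the current parameter values are made up, and `s1`, `s2` are standard deviations:

```python
from scipy.stats import norm

# Hypothetical current parameter values
pi, mu1, s1, mu2, s2 = 0.3, 0.0, 1.0, 4.0, 1.5

y_i = 3.0
num = pi * norm.pdf(y_i, mu2, s2)                 # pi * phi_theta2(y_i)
den = num + (1 - pi) * norm.pdf(y_i, mu1, s1)     # full mixture density at y_i
gamma_i = num / den
print(gamma_i)  # posterior probability that y_i came from component 2
```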

Page 12:

EM for Two-component Gaussian Mixture

• Initialize $\hat{\mu}_1, \hat{\sigma}_1^2, \hat{\mu}_2, \hat{\sigma}_2^2, \hat{\pi}$

• Iterate until convergence:

– Expectation of latent variables (responsibilities):

$$\hat{\gamma}_i = \frac{\hat{\pi}\,\phi_{\hat{\theta}_2}(y_i)}{(1-\hat{\pi})\,\phi_{\hat{\theta}_1}(y_i) + \hat{\pi}\,\phi_{\hat{\theta}_2}(y_i)}, \qquad \phi_{\hat{\theta}_k}(y_i) = \frac{1}{\sqrt{2\pi\hat{\sigma}_k^2}}\exp\!\left(-\frac{(y_i-\hat{\mu}_k)^2}{2\hat{\sigma}_k^2}\right)$$

– Maximization for finding parameters:

$$\hat{\mu}_1 = \frac{\sum_{i=1}^{N}(1-\hat{\gamma}_i)\,y_i}{\sum_{i=1}^{N}(1-\hat{\gamma}_i)}, \qquad \hat{\mu}_2 = \frac{\sum_{i=1}^{N}\hat{\gamma}_i\,y_i}{\sum_{i=1}^{N}\hat{\gamma}_i}$$

$$\hat{\sigma}_1^2 = \frac{\sum_{i=1}^{N}(1-\hat{\gamma}_i)(y_i-\hat{\mu}_1)^2}{\sum_{i=1}^{N}(1-\hat{\gamma}_i)}, \qquad \hat{\sigma}_2^2 = \frac{\sum_{i=1}^{N}\hat{\gamma}_i(y_i-\hat{\mu}_2)^2}{\sum_{i=1}^{N}\hat{\gamma}_i}, \qquad \hat{\pi} = \frac{1}{N}\sum_{i=1}^{N}\hat{\gamma}_i
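Putting the two steps together, here is a minimal end-to-end sketch of this two-component EM in NumPy. The initialization and stopping rule are simple choices made for illustration, not part of the original notes; it assumes a 1-D data array `y` such as the one generated in the earlier sampling sketch.

```python
import numpy as np

def normal_pdf(y, mu, var):
    return np.exp(-(y - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_two_gaussians(y, n_iter=200, tol=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    # Simple initialization: two data points as means, overall variance, pi = 0.5
    mu1, mu2 = rng.choice(y, size=2, replace=False)
    v1 = v2 = y.var()
    pi = 0.5
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities gamma_i = Pr(Delta_i = 1 | y_i, current parameters)
        p1 = (1 - pi) * normal_pdf(y, mu1, v1)
        p2 = pi * normal_pdf(y, mu2, v2)
        gamma = p2 / (p1 + p2)
        # M-step: gamma-weighted means, variances, and mixing proportion
        w1, w2 = (1 - gamma).sum(), gamma.sum()
        mu1 = np.sum((1 - gamma) * y) / w1
        mu2 = np.sum(gamma * y) / w2
        v1 = np.sum((1 - gamma) * (y - mu1)**2) / w1
        v2 = np.sum(gamma * (y - mu2)**2) / w2
        pi = gamma.mean()
        # Observed-data log-likelihood at the parameters used in this E-step; it never decreases
        ll = np.log(p1 + p2).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu1, v1, mu2, v2
```

Calling `em_two_gaussians(y)` returns the fitted $(\hat{\pi}, \hat{\mu}_1, \hat{\sigma}_1^2, \hat{\mu}_2, \hat{\sigma}_2^2)$; tracking the observed-data log-likelihood across iterations makes the "never decreases" property derived on page 14 visible.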

Page 13:

EM for Mixture of K Gaussians

• Initialize the mean vectors, covariance matrices, and mixing probabilities: $\mu_k, \Sigma_k, \pi_k$, for $k = 1, 2, \ldots, K$

• Expectation Step: compute responsibilities

$$\gamma_{ik} = \frac{\pi_k\,\Phi(y_i; \mu_k, \Sigma_k)}{\sum_{n=1}^{K}\pi_n\,\Phi(y_i; \mu_n, \Sigma_n)}, \qquad i = 1, \ldots, N, \quad k = 1, \ldots, K$$

• Maximization Step: update parameters

$$\mu_k = \frac{\sum_{i=1}^{N}\gamma_{ik}\,y_i}{\sum_{i=1}^{N}\gamma_{ik}}, \qquad \Sigma_k = \frac{\sum_{i=1}^{N}\gamma_{ik}\,(y_i-\mu_k)(y_i-\mu_k)^T}{\sum_{i=1}^{N}\gamma_{ik}}, \qquad \pi_k = \frac{1}{N}\sum_{i=1}^{N}\gamma_{ik}$$

• Iterate the Expectation and Maximization Steps until convergence
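In practice the K-component, multivariate case is usually run through a library. A minimal sketch with scikit-learn's `GaussianMixture`, which fits the mixture by this kind of EM iteration; the synthetic 2-D data and cluster locations are made up for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 2-D data drawn around three made-up cluster centres
X = np.vstack([rng.normal(loc=c, scale=0.7, size=(200, 2)) for c in ([0, 0], [4, 0], [2, 3])])

gmm = GaussianMixture(n_components=3, covariance_type="full", n_init=5, random_state=0)
gmm.fit(X)

print(gmm.weights_)            # mixing probabilities pi_k
print(gmm.means_)              # mean vectors mu_k
print(gmm.covariances_.shape)  # covariance matrices Sigma_k
```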

Page 14:

EM Algorithm in General

$T = (Z, Z^m)$ is the complete data; we only know $Z$, while $Z^m$ is missing.

$$\Pr(Z \mid \theta') = \frac{\Pr(Z, Z^m \mid \theta')}{\Pr(Z^m \mid Z, \theta')} = \frac{\Pr(T \mid \theta')}{\Pr(Z^m \mid Z, \theta')}$$

Taking logarithms:

$$l(\theta'; Z) = l_0(\theta'; T) - l_1(\theta'; Z^m \mid Z)$$

Because we have access to the previous parameter values $\theta$, we can do better: take expectations with respect to $\Pr(Z^m \mid Z, \theta)$,

$$l(\theta'; Z) = E\big[l_0(\theta'; T) \mid Z, \theta\big] - E\big[l_1(\theta'; Z^m \mid Z) \mid Z, \theta\big] \equiv Q(\theta', \theta) - R(\theta', \theta)$$

Let us now consider the expression:

$$l(\theta'; Z) - l(\theta; Z) = \big[Q(\theta', \theta) - Q(\theta, \theta)\big] - \big[R(\theta', \theta) - R(\theta, \theta)\big]$$

It can be shown that $R(\theta', \theta) \le R(\theta, \theta)$; this is actually done by Jensen's inequality.

Thus if $\theta'$ maximizes $Q(\theta', \theta)$, then $l(\theta'; Z) \ge l(\theta; Z)$.
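The Jensen step is short enough to spell out; the expectation below is with respect to $\Pr(Z^m \mid Z, \theta)$, and the inequality uses the concavity of the logarithm:

$$R(\theta', \theta) - R(\theta, \theta) = E\!\left[\log\frac{\Pr(Z^m \mid Z, \theta')}{\Pr(Z^m \mid Z, \theta)} \,\Big|\, Z, \theta\right] \le \log E\!\left[\frac{\Pr(Z^m \mid Z, \theta')}{\Pr(Z^m \mid Z, \theta)} \,\Big|\, Z, \theta\right] = \log \int \Pr(z^m \mid Z, \theta')\,dz^m = \log 1 = 0$$

So improving $Q$ over $\theta'$ can never decrease the observed-data log-likelihood.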

Page 15:

EM Algorithm in General

• Start with initial parameter values $\theta^{(0)}$; set $t = 1$

• Expectation step: compute

$$Q(\theta', \theta^{(t-1)}) = E_{T \mid Z,\, \theta^{(t-1)}}\big[l_0(\theta'; T)\big]$$

• Maximization step:

$$\theta^{(t)} = \arg\max_{\theta'} Q(\theta', \theta^{(t-1)})$$

• Set $t = t + 1$ and iterate
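As a loop, the general scheme is just a few lines. This is a sketch only; `make_Q` and `maximize_Q` are hypothetical model-specific functions (for the two-component mixture above, both steps have the closed forms given on page 12):

```python
def em(theta0, make_Q, maximize_Q, n_iter=100):
    """Generic EM loop: theta^(t) = argmax over theta' of Q(theta', theta^(t-1))."""
    theta = theta0
    for _ in range(n_iter):
        Q = make_Q(theta)       # E-step: build Q(. , theta) = E_{T|Z,theta}[l_0(. ; T)]
        theta = maximize_Q(Q)   # M-step: maximize Q over theta'
    return theta
```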

Page 16:

EM Algorithm: Summary

• Augment the original data space by latent/hidden/missing data

• Frame a suitable probability model for the augmented data space

• In EM iterations, first assume initial values for the parameters

• Iterate the Expectation and the Maximization steps

• In the Expectation step, find the expected values of the latent variables (here you need to use the current parameter values)

• In the Maximization step, first plug in the expected values of the latent variables in the log-likelihood of the augmented data. Then maximize this log-likelihood to re-evaluate the parameters

• Iterate last two steps until convergence

Page 17:

Applications of EM

– Mixture models
– HMMs
– PCA
– Latent variable models
– Missing data problems
– Many computer vision problems
– …

Page 18:

References

• Geoffrey J. McLachlan and Thriyambakam Krishnan, The EM Algorithm and Extensions

• For a non-parametric density estimate by EM look at: http://bioinformatics.uchc.edu/LectureNotes_2006/Tools_EM_SA_2006_files/frame.htm

Page 19:

EM: Important Issues

• Is the convergence of the algorithm guaranteed?

• Does the outcome of EM depend on the initial choice of the parameter values?

• How about the speed of convergence?

• How easy or difficult could it be to compute the expected values of the latent variables?