Incomplete Graphical Models
Nan Hu
Feb 25, 2016
Outline
- Motivation: K-means clustering and the coordinate descent algorithm
- Density estimation: EM on unconditional mixtures
- Regression and classification: EM on conditional mixtures
- A general formulation of the EM algorithm
K-means clustering
Problem: Given a set of observations {x_1, ..., x_N}, how do we group them into K clusters, assuming the value of K is given?
First phase: with the cluster means fixed, assign each observation to the cluster whose mean is nearest.
Second phase: with the assignments fixed, recompute each cluster mean as the average of the observations assigned to it.
K-means clustering
[Figure: K-means on a sample data set, showing the original set and the first, second, and third iterations]
K-means clustering
Coordinate descent algorithm: the algorithm minimizes the distortion measure
J = \sum_n \sum_i z_n^i \| x_n - \mu_i \|^2
by alternately setting the partial derivatives with respect to the assignments z_n^i and the means \mu_i to zero.
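As an illustration, here is a minimal NumPy sketch of this coordinate descent; the data X, the number of clusters K, and the random initialization are assumptions of the example, not part of the original slides.

import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Coordinate descent on J = sum_n sum_i z_n^i * ||x_n - mu_i||^2."""
    rng = np.random.default_rng(seed)
    # Initialize the means with K randomly chosen data points.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # First phase: means fixed, assign each point to its nearest mean.
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, K)
        assign = dists.argmin(axis=1)
        # Second phase: assignments fixed, recompute each mean.
        new_mu = np.array([X[assign == i].mean(axis=0) if np.any(assign == i) else mu[i]
                           for i in range(K)])
        if np.allclose(new_mu, mu):   # converged
            break
        mu = new_mu
    return mu, assign

# Example usage on synthetic two-cluster data.
X = np.vstack([np.random.randn(100, 2) - 3, np.random.randn(100, 2) + 3])
means, labels = kmeans(X, K=2)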
Unconditional Mixture
Problem: If the given sample data exhibit a multimodal density, how do we estimate the true density?
Fitting a single density to this bimodal case: although the algorithm converges, the result bears little relationship to the truth.
Unconditional Mixture
A “divide-and-conquer” way to solve this problem: introduce a latent variable Z.
Graphical model: Z -> X, where Z is a multinomial node taking on one of K values.
Assign a density model to each subpopulation; the overall density is
p(x | \theta) = \sum_{i=1}^{K} \alpha_i f_i(x | \theta_i)
Unconditional Mixture
Gaussian Mixture Models: in this model, the mixture components are Gaussian distributions with parameters \mu_i and \Sigma_i.
Probability model for a Gaussian mixture:
p(x | \theta) = \sum_{i=1}^{K} \pi_i \, N(x | \mu_i, \Sigma_i)
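A small sketch of evaluating this mixture density; the particular parameter values below are illustrative assumptions.

import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, pis, mus, Sigmas):
    """p(x | theta) = sum_i pi_i * N(x | mu_i, Sigma_i)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=Sigma)
               for pi, mu, Sigma in zip(pis, mus, Sigmas))

# Example: a bimodal two-component mixture in one dimension.
pis = [0.4, 0.6]
mus = [np.array([-2.0]), np.array([3.0])]
Sigmas = [np.eye(1), 2.0 * np.eye(1)]
print(gmm_density(np.array([0.0]), pis, mus, Sigmas))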
Unconditional Mixture
Posterior probability of the latent variable Z:
\tau_n^i = p(Z_n^i = 1 | x_n, \theta) = \pi_i N(x_n | \mu_i, \Sigma_i) / \sum_j \pi_j N(x_n | \mu_j, \Sigma_j)
Log likelihood:
l(\theta; x) = \sum_n \log \sum_i \pi_i N(x_n | \mu_i, \Sigma_i)
Unconditional Mixture
Taking the partial derivative of l with respect to \pi_i, with the constraint \sum_i \pi_i = 1 handled using Lagrange multipliers, and solving, we have
\pi_i = \frac{1}{N} \sum_n \tau_n^i
Unconditional Mixture
Taking the partial derivative of l with respect to \mu_i and setting it to zero, we have
\mu_i = \sum_n \tau_n^i x_n / \sum_n \tau_n^i
Unconditional Mixture
Taking the partial derivative of l with respect to \Sigma_i and setting it to zero, we have
\Sigma_i = \sum_n \tau_n^i (x_n - \mu_i)(x_n - \mu_i)^T / \sum_n \tau_n^i
Unconditional Mixture
The EM Algorithm
First phase (E step): with the current parameters, compute the posterior probabilities \tau_n^i for every data point.
Second phase (M step): with the \tau_n^i fixed, update \pi_i, \mu_i, and \Sigma_i using the formulas above.
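A minimal NumPy/SciPy sketch of these two phases for a Gaussian mixture; the initialization and the fixed iteration count are illustrative assumptions.

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    N, d = X.shape
    rng = np.random.default_rng(seed)
    # Illustrative initialization: uniform weights, random means, identity covariances.
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, size=K, replace=False)]
    Sigma = np.array([np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # E step: responsibilities tau_n^i = p(Z_n^i = 1 | x_n, theta).
        tau = np.column_stack([pi[i] * multivariate_normal.pdf(X, mu[i], Sigma[i])
                               for i in range(K)])
        tau /= tau.sum(axis=1, keepdims=True)
        # M step: closed-form updates for pi_i, mu_i, Sigma_i.
        Nk = tau.sum(axis=0)
        pi = Nk / N
        mu = (tau.T @ X) / Nk[:, None]
        for i in range(K):
            diff = X - mu[i]
            Sigma[i] = (tau[:, i, None] * diff).T @ diff / Nk[i]
    return pi, mu, Sigma, tau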
Unconditional Mixture
The EM algorithm from the expected complete log likelihood point of view.
Suppose we observed the latent variables Z_n; the data set (x_n, z_n) becomes completely observed, and the likelihood is the complete log likelihood:
l_c(\theta; x, z) = \sum_n \log p(x_n, z_n | \theta)
                  = \sum_n \log \prod_i [\pi_i N(x_n | \mu_i, \Sigma_i)]^{z_n^i}
                  = \sum_n \sum_i z_n^i \log[\pi_i N(x_n | \mu_i, \Sigma_i)]
Unconditional Mixture
We treat the Z_n as random variables and take expectations conditioned on X and \theta^{(t)}.
Note that the Z_n^i are binary random variables, with
\langle Z_n^i \rangle = E[Z_n^i | x_n, \theta^{(t)}] = p(Z_n^i = 1 | x_n, \theta^{(t)}) = \tau_n^{i(t)}
Using this as the "best guess" for Z_n^i, we have the expected complete log likelihood
\langle l_c(\theta; x, z) \rangle = \sum_n \langle \log p(x_n, Z_n | \theta) \rangle
                                  = \sum_n \sum_i \langle Z_n^i \rangle \log[\pi_i N(x_n | \mu_i, \Sigma_i)]
                                  = \sum_n \sum_i \tau_n^{i(t)} \log[\pi_i N(x_n | \mu_i, \Sigma_i)]
Unconditional Mixture
Maximizing the expected complete log likelihood by setting the derivatives to zero, we recover the same updates:
\pi_i^{(t+1)} = \frac{1}{N} \sum_n \tau_n^{i(t)}
\mu_i^{(t+1)} = \sum_n \tau_n^{i(t)} x_n / \sum_n \tau_n^{i(t)}
\Sigma_i^{(t+1)} = \sum_n \tau_n^{i(t)} (x_n - \mu_i^{(t+1)})(x_n - \mu_i^{(t+1)})^T / \sum_n \tau_n^{i(t)}
Conditional Mixture
Graphical model: X -> Z, X -> Y, and Z -> Y, where X is always observed and Y is the response.
The latent variable Z is a multinomial node taking on one of K values.
Used for regression and classification.
The relationship between X and Z can be modeled in a discriminative classification way, e.g. with a softmax function.
Conditional Mixture
By marginalizing over Z, the conditional density is
p(y_n | x_n, \theta) = \sum_i p(Z_n^i = 1 | x_n, \theta) \, p(y_n | Z_n^i = 1, x_n, \theta)
X is taken to be always observed. The posterior probability of the latent variable is defined as
\tau_n^i = p(Z_n^i = 1 | x_n, y_n, \theta) = \frac{p(Z_n^i = 1 | x_n, \theta) \, p(y_n | Z_n^i = 1, x_n, \theta)}{\sum_j p(Z_n^j = 1 | x_n, \theta) \, p(y_n | Z_n^j = 1, x_n, \theta)}
Conditional Mixture
Some specific choices of mixture components:
Gaussian components (regression): p(y_n | Z_n^i = 1, x_n, \theta) = N(y_n | \beta_i^T x_n, \sigma_i^2)
Logistic components (classification): p(y_n | Z_n^i = 1, x_n, \theta) = \mu(\theta_i^T x_n)^{y_n} [1 - \mu(\theta_i^T x_n)]^{1 - y_n}
where \mu(z) = 1 / (1 + e^{-z}) is the logistic function.
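As an illustration, a small sketch of such a conditional mixture (mixture of experts) with a softmax gating network and Gaussian regression experts; the linear parameterization and the names eta (gating weights), beta and sigma2 (expert parameters) are assumptions of this example.

import numpy as np

def softmax(A):
    A = A - A.max(axis=1, keepdims=True)        # subtract max for numerical stability
    E = np.exp(A)
    return E / E.sum(axis=1, keepdims=True)

def gating(X, eta):
    """p(Z^i = 1 | x): softmax over linear scores, shape (N, K)."""
    return softmax(X @ eta)

def expert_lik(X, y, beta, sigma2):
    """Gaussian experts: p(y | Z^i = 1, x) = N(y | beta_i^T x, sigma2_i), shape (N, K)."""
    means = X @ beta                             # (N, K)
    resid = y[:, None] - means
    return np.exp(-0.5 * resid ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

def posterior(X, y, eta, beta, sigma2):
    """tau_n^i = p(Z_n^i = 1 | x_n, y_n, theta)."""
    joint = gating(X, eta) * expert_lik(X, y, beta, sigma2)
    return joint / joint.sum(axis=1, keepdims=True)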
Conditional Mixture
Parameter estimation via EM. The complete log likelihood is
l_c(\theta; x, y, z) = \sum_n \log p(y_n, z_n | x_n, \theta)
                     = \sum_n \log \prod_i [\pi_i(x_n) \, p(y_n | Z_n^i = 1, x_n, \theta)]^{z_n^i}
                     = \sum_n \sum_i z_n^i \log[\pi_i(x_n) \, p(y_n | Z_n^i = 1, x_n, \theta)]
Using the expectation as the "best guess" for z_n^i, we have
\langle Z_n^i \rangle = p(Z_n^i = 1 | x_n, y_n, \theta^{(t)}) = \tau_n^{i(t)}
Conditional Mixture
The expected complete log likelihood can then be written as
\langle l_c(\theta; x, y, z) \rangle = \sum_n \sum_i \tau_n^{i(t)} \log[\pi_i(x_n) \, p(y_n | Z_n^i = 1, x_n, \theta)]
Taking partial derivatives and setting them to zero yields the update formulas for EM.
Conditional Mixture
Summary of the EM algorithm for the conditional mixture:
(E step): Calculate the posterior probabilities \tau_n^{i(t)}.
(M step): Use the IRLS algorithm to update the gating parameter \eta, based on the data pairs (x_n, \tau_n^{(t)}).
(M step): Use the weighted IRLS algorithm to update the component parameters \theta_i, based on the data points (x_n, y_n), with weights \tau_n^{i(t)}.
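Building on the earlier mixture-of-experts sketch, a rough version of one EM iteration; as a simplification, the gating M step below takes a single gradient step (with an assumed step size) in place of IRLS, and the Gaussian experts use a weighted least-squares update in place of weighted IRLS.

def em_step(X, y, eta, beta, sigma2, lr=0.1):
    N, d = X.shape
    # E step: posterior responsibilities under the current parameters.
    tau = posterior(X, y, eta, beta, sigma2)             # (N, K)
    # M step for the gating parameters: one gradient ascent step on the
    # expected complete log likelihood (a stand-in for the IRLS update).
    grad_eta = X.T @ (tau - gating(X, eta))              # (d, K)
    eta = eta + lr * grad_eta / N
    # M step for each Gaussian expert: weighted least squares with weights tau[:, i].
    K = tau.shape[1]
    for i in range(K):
        w = tau[:, i]
        XtWX = X.T @ (w[:, None] * X)
        beta[:, i] = np.linalg.solve(XtWX, X.T @ (w * y))
        resid = y - X @ beta[:, i]
        sigma2[i] = (w * resid ** 2).sum() / w.sum()
    return eta, beta, sigma2, tau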
General Formulation
X - all observable variables; Z - all latent variables; \theta - all parameters.
Suppose Z is observed; the ML estimate is
\hat{\theta}_{ML} = \arg\max_\theta l_c(\theta; x, z) = \arg\max_\theta \log p(x, z | \theta)   (complete log likelihood)
However, Z is in fact not observed:
l(\theta; x) = \log p(x | \theta) = \log \sum_z p(x, z | \theta)   (incomplete log likelihood)
General Formulation
Suppose p(x, z | \theta) factors in some way; the complete log likelihood can be written as
l_c(\theta; x, z) = \sum_z f(z | x, z) \log p(x, z | \theta)
where f(z | x, z) picks out the observed configuration of Z. Since z is unknown, f(z | x, z) is unknown, and it is not clear how to solve this ML estimation. However, we can average over the random variable Z, replacing f(z | x, z) with a distribution
q(z | x) \approx f(z | x, z)
General Formulation
Using q(z | x) as an estimate of f(z | x, z), the complete log likelihood becomes the expected complete log likelihood
\langle l_c(\theta; x, z) \rangle_q = \sum_z q(z | x) \log p(x, z | \theta)
This expected complete log likelihood becomes solvable, and hopefully maximizing it will also improve the actual log likelihood in some way. (This is the basic idea behind EM.)
General Formulation
EM maximizes the incomplete log likelihood:
l(\theta; x) = \log p(x | \theta)
             = \log \sum_z p(x, z | \theta)
             = \log \sum_z q(z | x) \frac{p(x, z | \theta)}{q(z | x)}
             \ge \sum_z q(z | x) \log \frac{p(x, z | \theta)}{q(z | x)}     (Jensen's inequality)
             = L(q, \theta)     (auxiliary function)
General Formulation
Given q(z | x), maximizing L(q, \theta) over \theta is equal to maximizing the expected complete log likelihood:
L(q, \theta) = \sum_z q(z | x) \log \frac{p(x, z | \theta)}{q(z | x)}
             = \sum_z q(z | x) \log p(x, z | \theta) - \sum_z q(z | x) \log q(z | x)
             = \langle l_c(\theta; x, z) \rangle_q - \sum_z q(z | x) \log q(z | x)
where the second term does not depend on \theta.
General Formulation
Given \theta^{(t)}, the choice q^{(t+1)}(z | x) = p(z | x, \theta^{(t)}) yields the maximum of L(q, \theta^{(t)}):
L(q^{(t+1)}, \theta^{(t)}) = \sum_z p(z | x, \theta^{(t)}) \log \frac{p(x, z | \theta^{(t)})}{p(z | x, \theta^{(t)})}
                           = \sum_z p(z | x, \theta^{(t)}) \log p(x | \theta^{(t)})
                           = \log p(x | \theta^{(t)})
                           = l(\theta^{(t)}; x)
Note: l(\theta^{(t)}; x) is the upper bound of L(q, \theta^{(t)}).
General Formulation
From the above, at every step of EM we maximize L(q, \theta).
However, how do we know that the finally maximized L(q, \theta) also maximizes the incomplete log likelihood l(\theta; x)?
General Formulation
The difference between l(\theta; x) and L(q, \theta):
l(\theta; x) - L(q, \theta) = \log p(x | \theta) - \sum_z q(z | x) \log \frac{p(x, z | \theta)}{q(z | x)}
                            = \sum_z q(z | x) \log p(x | \theta) - \sum_z q(z | x) \log \frac{p(z | x, \theta) \, p(x | \theta)}{q(z | x)}
                            = \sum_z q(z | x) \log \frac{q(z | x)}{p(z | x, \theta)}
                            = D(q(z | x) \| p(z | x, \theta))
This is a KL divergence: non-negative and uniquely minimized at q(z | x) = p(z | x, \theta).
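A quick numeric check of this decomposition on a tiny discrete model (the probability values below are made-up, illustrative numbers): for any q(z | x), log p(x | theta) should equal L(q, theta) plus D(q || p(z | x, theta)), and the gap vanishes when q is the exact posterior.

import numpy as np

# Made-up joint p(x, z | theta) for a single observed x and K = 3 latent states.
p_xz = np.array([0.10, 0.25, 0.15])      # p(x, z | theta) for z = 0, 1, 2
p_x = p_xz.sum()                         # p(x | theta)
post = p_xz / p_x                        # p(z | x, theta)

def L(q):                                # auxiliary function L(q, theta)
    return np.sum(q * np.log(p_xz / q))

def KL(q, p):                            # D(q || p)
    return np.sum(q * np.log(q / p))

q = np.array([0.2, 0.5, 0.3])            # an arbitrary q(z | x)
print(np.log(p_x), L(q) + KL(q, post))   # equal for any valid q
print(np.log(p_x), L(post))              # the bound is attained at q = posterior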
General Formulation
EM and alternating minimization: recall that maximization of the likelihood is exactly the same as minimization of the KL divergence between the empirical distribution and the model.
Including the latent variable Z, the KL divergence becomes a "complete KL divergence" between joint distributions on (x, z).
General Formulation
Reformulated EM algorithm:
(E step): q^{(t+1)}(z | x) = \arg\min_q D(q \| \theta^{(t)})
(M step): \theta^{(t+1)} = \arg\min_\theta D(q^{(t+1)} \| \theta)
This is an alternating minimization algorithm.
Summary
- Unconditional mixture: graphical model, EM algorithm
- Conditional mixture: graphical model, EM algorithm
- A general formulation of the EM algorithm: maximizing the auxiliary function, minimizing the "complete KL divergence"
Incomplete Graphical Models
Thank You!