University of Joensuu, Dept. of Computer Science, P.O. Box 111, FIN-80101 Joensuu
Tel. +358 13 251 7959, fax +358 13 251 7955
www.cs.joensuu.fi
Gaussian Mixture Models
Speech and Image Processing Unit, Department of Computer Science
University of Joensuu, FINLAND
Ville Hautamäki
Clustering Methods: Part 8
Preliminaries
• We assume that the dataset X has been generated by a parametric distribution p(X).
• Estimation of the parameters of p is known as density estimation.
• We consider Gaussian distribution.
Figures taken from: http://research.microsoft.com/~cmbishop/PRML/
Typical parameters (1)
• Mean (μ): average value of X under p(X), also called the expectation.
• Variance (σ²): measures the variability of X around the mean.
Typical parameters (2)
• Covariance: measures how much two
variables vary together.
• Covariance matrix: collection of covariances between all dimensions.
– Diagonal of the covariance matrix
contains the variances of each attribute.
One-dimensional Gaussian
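• Parameters to be estimated are the mean (μ) and the variance (σ²)
\[ \mathrm{Normal}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2} (x - \mu)^2 \right) \]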
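As a minimal sketch, the one-dimensional density can be evaluated directly from the formula. The function name `normal_pdf` and the test points are illustrative choices, not part of the slides.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of a one-dimensional Gaussian with mean mu and variance sigma2."""
    if sigma2 <= 0:
        raise ValueError("variance must be positive")
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# The density peaks at the mean and is symmetric around it.
peak = normal_pdf(0.0, mu=0.0, sigma2=1.0)    # equals 1 / sqrt(2 * pi)
left = normal_pdf(-1.5, mu=0.0, sigma2=1.0)
right = normal_pdf(1.5, mu=0.0, sigma2=1.0)
```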
Multivariate Gaussian (1)
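• In the multivariate case we have a covariance matrix instead of a variance
\[ \mathrm{Normal}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2} \det(\boldsymbol{\Sigma})^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) \]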
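The multivariate density can be sketched with NumPy as follows. The helper name `mvn_pdf` is hypothetical; solving a linear system instead of inverting the covariance matrix is a design choice of this sketch, not something the slides prescribe.

```python
import numpy as np

def mvn_pdf(x, mu, cov):
    """Density of a D-dimensional Gaussian at point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = x.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T cov^{-1} (x - mu), computed via a linear solve.
    maha = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * maha) / norm)

# At the mean of a 2-D Gaussian with identity covariance the density is 1 / (2 * pi).
value = mvn_pdf([0.0, 0.0], mu=[0.0, 0.0], cov=np.eye(2))
```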
Multivariate Gaussian (2)
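• Full covariance: \( \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_{11}^2 & \sigma_{12}^2 \\ \sigma_{12}^2 & \sigma_{22}^2 \end{pmatrix} \)
• Diagonal: \( \boldsymbol{\Sigma} = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix} \)
• Single: \( \boldsymbol{\Sigma} = \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix} \)
Complete data log likelihood:
\[ \ln p(X) = \sum_{n=1}^{N} \ln \mathrm{Normal}(\mathbf{x}_n \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) \]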
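The three covariance structures can be illustrated on synthetic two-dimensional data. This is a sketch under assumptions: the data are randomly generated, and the single shared variance is taken as the average of the per-attribute variances.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated 2-D data set.
X = rng.normal(size=(200, 2)) @ np.array([[1.0, 0.6], [0.0, 0.5]])

full = np.cov(X, rowvar=False, bias=True)   # all variances and covariances
diagonal = np.diag(np.diag(full))           # per-attribute variances only
single = np.eye(2) * np.trace(full) / 2     # one variance shared by all attributes
```

The more restricted the structure, the fewer values have to be estimated, at the cost of ignoring correlations between attributes.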
Maximum Likelihood (ML) parameter estimation
• Maximize the log likelihood formulation
• Setting the gradient of the complete data log likelihood to zero gives a closed-form solution.
– For the mean, this solution is the sample average.
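A small numerical check of the closed-form solution, assuming synthetic data: the sample average should give a higher log likelihood than a slightly shifted mean. The function name `log_likelihood` and the generated data set are illustrative.

```python
import numpy as np

def log_likelihood(X, mu, cov):
    """Sum over n of ln Normal(x_n | mu, cov)."""
    n, d = X.shape
    diff = X - mu
    # Row-wise quadratic forms (x_n - mu)^T cov^{-1} (x_n - mu).
    maha = np.einsum("ni,ni->n", diff @ np.linalg.inv(cov), diff)
    _, logdet = np.linalg.slogdet(cov)
    return float(-0.5 * (maha.sum() + n * (d * np.log(2.0 * np.pi) + logdet)))

rng = np.random.default_rng(1)
X = rng.normal(loc=[2.0, -1.0], scale=[1.0, 0.5], size=(500, 2))

mu_ml = X.mean(axis=0)                        # ML mean: the sample average
cov_ml = np.cov(X, rowvar=False, bias=True)   # ML covariance: divides by N

best = log_likelihood(X, mu_ml, cov_ml)
shifted = log_likelihood(X, mu_ml + 0.1, cov_ml)   # any other mean scores lower
```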
When one Gaussian is not enough
• Real world datasets are rarely unimodal!
Mixtures of Gaussians
\[ p(\mathbf{x}) = \sum_{k=1}^{M} \pi_k \, \mathrm{Normal}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \]
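A one-dimensional sketch of the mixture density, with a hypothetical two-component mixture. The parameter values are made up for illustration; the point is that the density is high near both component means and low between them.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of a one-dimensional Gaussian with variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def mixture_pdf(x, weights, means, variances):
    """p(x) = sum_k pi_k * Normal(x | mu_k, sigma_k^2)."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))

# Hypothetical bimodal mixture with modes near -2 and 3.
weights, means, variances = [0.4, 0.6], [-2.0, 3.0], [1.0, 0.5]
at_modes = [mixture_pdf(m, weights, means, variances) for m in means]
between = mixture_pdf(0.5, weights, means, variances)
```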
Mixtures of Gaussians (2)
• In addition to the mean and covariance parameters (now M sets of each), we have the mixing coefficients π_k.
The following properties hold for the mixing coefficients:
\[ \sum_{k=1}^{M} \pi_k = 1, \qquad 0 \le \pi_k \le 1 \]
π_k can be seen as the prior probability of component k.
Responsibilities (1)
• Component labels (red, green and blue)
cannot be observed.
• We have to calculate approximations
(responsibilities).
[Figure panels: complete data, incomplete data, responsibilities]
Responsibilities (2)
• A responsibility describes how probable it is that observation vector x comes from component k.
• In hard clustering, responsibilities take only the values 0 and 1, and thus define a hard partitioning.
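Responsibilities (3)
We can express the marginal density p(x) as:
\[ p(\mathbf{x}) = \sum_{k=1}^{M} p(k) \, p(\mathbf{x} \mid k) \]
From this, we can find the responsibility of the kth component for x using Bayes' theorem:
\[ \gamma_k(\mathbf{x}) = p(k \mid \mathbf{x}) = \frac{p(k) \, p(\mathbf{x} \mid k)}{\sum_{l} p(l) \, p(\mathbf{x} \mid l)} = \frac{\pi_k \, \mathrm{Normal}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{l} \pi_l \, \mathrm{Normal}(\mathbf{x} \mid \boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l)} \]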
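The Bayes' theorem computation of responsibilities can be sketched in one dimension. The function `responsibilities` and the two-component parameters are illustrative assumptions. For points very far from every component the denominator can underflow to zero, so a practical implementation would likely work with logarithms.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of a one-dimensional Gaussian with variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def responsibilities(x, weights, means, variances):
    """Posterior probability of each component given x (Bayes' theorem)."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)                 # marginal density p(x)
    return [j / total for j in joint]

weights, means, variances = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
gamma_mid = responsibilities(2.0, weights, means, variances)    # equidistant point
gamma_near = responsibilities(0.5, weights, means, variances)   # close to component 0
# Hard clustering keeps only the most responsible component.
hard_label = max(range(len(gamma_near)), key=lambda k: gamma_near[k])
```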
Expectation Maximization (EM)
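• Goal: Maximize the log likelihood of the whole data
\[ \ln p(X \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{M} \pi_k \, \mathrm{Normal}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right\} \]
• When responsibilities are calculated, we can maximize individually for the means, covariances and the mixing coefficients!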
Exact update equations
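New mean estimates:
\[ \boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_k(\mathbf{x}_n) \, \mathbf{x}_n, \qquad N_k = \sum_{n=1}^{N} \gamma_k(\mathbf{x}_n) \]
Covariance estimates:
\[ \boldsymbol{\Sigma}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_k(\mathbf{x}_n) (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^T \]
Mixing coefficient estimates:
\[ \pi_k = \frac{N_k}{N} \]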
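The update equations can be sketched as a single NumPy function. The name `m_step` and the array layout (responsibilities stored as an N × M array `gamma`) are assumptions of this sketch. The tiny example uses hard 0/1 responsibilities, in which case the updates reduce to per-cluster means, covariances and relative cluster sizes.

```python
import numpy as np

def m_step(X, gamma):
    """Update weights, means and covariances from responsibilities gamma (N x M)."""
    n, d = X.shape
    m = gamma.shape[1]
    nk = gamma.sum(axis=0)                  # N_k: effective number of points per component
    means = (gamma.T @ X) / nk[:, None]     # responsibility-weighted averages
    covs = np.empty((m, d, d))
    for k in range(m):
        diff = X - means[k]
        # Responsibility-weighted sum of outer products, divided by N_k.
        covs[k] = (gamma[:, k, None] * diff).T @ diff / nk[k]
    weights = nk / n                        # mixing coefficients
    return weights, means, covs

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
weights, means, covs = m_step(X, gamma)
```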
EM Algorithm
• Initialize parameters
• while not converged
– E step: Calculate responsibilities.
– M step: Estimate new parameters
– Calculate log likelihood of the new
parameters
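The loop above can be sketched for a one-dimensional mixture as follows. This is an illustrative sketch, not the implementation used for the slides: the function name `em_gmm_1d`, the convergence tolerance, the initial values and the synthetic data are all assumptions. The log likelihood is evaluated as a by-product of the E step, so each recorded value belongs to the parameters produced by the previous M step.

```python
import numpy as np

def em_gmm_1d(x, means, variances, weights, max_iter=200, tol=1e-8):
    """EM for a one-dimensional Gaussian mixture (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    means = np.array(means, dtype=float)
    variances = np.array(variances, dtype=float)
    weights = np.array(weights, dtype=float)
    history = []  # log likelihood of the parameters entering each iteration
    for _ in range(max_iter):
        # E step: joint[n, k] = pi_k * Normal(x_n | mu_k, sigma_k^2)
        diff = x[:, None] - means
        joint = weights * np.exp(-diff ** 2 / (2.0 * variances)) / np.sqrt(2.0 * np.pi * variances)
        px = joint.sum(axis=1)          # marginal density p(x_n)
        gamma = joint / px[:, None]     # responsibilities
        history.append(float(np.log(px).sum()))
        # Stop when the log likelihood no longer improves noticeably.
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        # M step: re-estimate means, variances and mixing coefficients.
        nk = gamma.sum(axis=0)
        means = gamma.T @ x / nk
        variances = (gamma * (x[:, None] - means) ** 2).sum(axis=0) / nk
        weights = nk / x.size
    return weights, means, variances, history

# Hypothetical data: two well-separated clusters.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3.0, 1.0, 150), rng.normal(4.0, 1.5, 150)])
weights, means, variances, history = em_gmm_1d(
    x, means=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
```

The recorded log likelihood should never decrease from one iteration to the next, which is a useful sanity check when debugging an EM implementation.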
Example of EM
Computational complexity
• Hard clustering with MSE criterion is NP-complete.
• Can we find optimal GMM in polynomial time?
• Finding optimal GMM is in class NP
Some insights
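• In GMM we need to estimate the parameters, all of which are real numbers
– Number of parameters (full covariance): M + M·D + M·D(D+1)/2, that is, M mixing coefficients, M mean vectors of dimension D, and M symmetric D×D covariance matrices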
• Hard clustering has no parameters, just set partitioning (remember optimality criteria!)
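As a rough sketch, the number of real-valued parameters can be counted per covariance structure. The function name and the structure labels are hypothetical. A symmetric D×D covariance matrix has D(D+1)/2 distinct entries once the diagonal variances are included, and the M mixing coefficients are counted in full here even though only M-1 of them are free, since they sum to one.

```python
def gmm_parameter_count(m, d, covariance="full"):
    """Real-valued parameters of a GMM with m components in d dimensions."""
    per_component_cov = {
        "full": d * (d + 1) // 2,   # distinct entries of a symmetric d x d matrix
        "diagonal": d,              # one variance per attribute
        "single": 1,                # one shared variance
    }[covariance]
    # mixing coefficients + mean vectors + covariance parameters
    return m + m * d + m * per_component_cov

full_count = gmm_parameter_count(3, 2, "full")
diagonal_count = gmm_parameter_count(3, 2, "diagonal")
single_count = gmm_parameter_count(3, 2, "single")
```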
Some further insights (2)
• Both optimization functions are mathematically rigorous!
• Solutions minimizing MSE are always meaningful
• Maximization of log likelihood might lead to singularity!
Example of singularity
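The singularity can be illustrated numerically with a sketch on made-up data: one component is centred exactly on a data point and its variance is shrunk towards zero, while a second, broad component covers the remaining points. The log likelihood then keeps growing, although the resulting model is not meaningful. All names and values below are illustrative.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of a one-dimensional Gaussian with variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def log_likelihood(data, weights, means, variances):
    """Log likelihood of a one-dimensional Gaussian mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )

data = [-1.0, 0.0, 0.5, 2.0, 3.0]
values = []
for sigma2 in (1e-2, 1e-4, 1e-6, 1e-8):
    # Component 0 collapses onto the data point 2.0; component 1 stays broad.
    values.append(log_likelihood(data, [0.5, 0.5], [2.0, 0.5], [sigma2, 4.0]))
```

Each value in `values` is larger than the previous one, so maximizing the log likelihood alone would drive the variance of the collapsed component towards zero.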