Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology

Learning multiple nonredundant clusterings
Presenter: Wei-Hao Huang
Authors: Ying Cui, Xiaoli Z. Fern, Jennifer G. Dy
TKDD, 2010
Feb 22, 2016
Outline
- Motivation
- Objectives
- Methodology
- Experiments
- Conclusions
- Comments
Motivation
- Data often admit multiple groupings that are reasonable and interesting from different perspectives.
- Traditional clustering is restricted to finding only one single clustering.
Objectives
- To propose a new clustering paradigm for finding all non-redundant clustering solutions of the data.
Methodology
- Orthogonal clustering (cluster space)
- Clustering in orthogonal subspaces (feature space)
- Automatically finding the number of clusters
- Stopping criteria
Orthogonal Clustering Framework
[Figure: the orthogonal clustering framework illustrated on X (Face dataset)]
Orthogonal clustering (Method 1)
- Residue space: after clustering X(t), each point is replaced by its residue with respect to its own cluster centroid mu:
  x(t+1) = (I - mu mu^T / (mu^T mu)) x(t)
- The residue data X(t+1) is clustered at the next iteration.
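The residue-space step can be sketched in NumPy (a minimal illustration, not the authors' code; `method1_residue` is an assumed name): each point is projected onto the direction of its own cluster centroid, and the residue becomes the data for the next iteration.

```python
import numpy as np

def method1_residue(X, labels, centroids):
    """Project each row of X onto its cluster centroid and keep the residue.

    X         : (n, d) data matrix, one point per row
    labels    : (n,) cluster assignment of each point
    centroids : (k, d) cluster centroids
    """
    X_next = np.empty_like(X, dtype=float)
    for j, mu in enumerate(centroids):
        mask = labels == j
        # projection coefficient of x onto mu: (x . mu) / (mu . mu)
        coef = X[mask] @ mu / (mu @ mu)
        # residue: x minus its component along mu
        X_next[mask] = X[mask] - np.outer(coef, mu)
    return X_next
```

By construction each residue is orthogonal to the centroid it was projected on, so the next clustering pass cannot rediscover the same structure.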
Clustering in orthogonal subspaces
- Feature space: find a subspace A that captures the current clustering, via
  - linear discriminant analysis (LDA), or
  - singular value decomposition (SVD) of the cluster centroids
- LDA vs. SVD: the SVD solution can be viewed as LDA with the within-cluster scatter ignored (S_w = I)
- Projection: Y = A^T X
Clustering in orthogonal subspaces (cont.)
- Residue space: project the data onto the orthogonal complement of the subspace spanned by A(t):
  X(t+1) = (I - A(t) (A(t)^T A(t))^-1 A(t)^T) X(t)
- A(t) = eigenvectors (LDA directions) or singular vectors of the centroid matrix (SVD) capturing the clustering found at iteration t
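For the SVD variant, A can be taken as the top feature-space singular directions of the centroid matrix, and the residue is (I - A A^T) applied to each point. A hedged NumPy sketch (the name `method2_step` and the choice of `q` are illustrative assumptions; data points are rows here, so Y = A^T X becomes X @ A):

```python
import numpy as np

def method2_step(X, centroids, q):
    """One orthogonalization step of the subspace method (SVD variant, sketch).

    X         : (n, d) data, one point per row
    centroids : (k, d) current cluster centroids
    q         : number of singular directions to keep
    Returns (A, Y, X_next): subspace basis, projection, and residue data.
    """
    # Rows of Vt are feature-space directions; the top q span the
    # subspace that captures the current clustering.
    U, s, Vt = np.linalg.svd(centroids, full_matrices=False)
    A = Vt[:q].T                  # (d, q), orthonormal columns
    Y = X @ A                     # projection onto the clustering subspace
    X_next = X - Y @ A.T          # residue: (I - A A^T) x for each row
    return A, Y, X_next
```

Since A is orthonormal, X decomposes exactly into the subspace part Y @ A.T plus the residue.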
Comparison of Method 1 and Method 2
- Both project the data onto a residue space at each iteration:
  - Method 1 builds the projector from the cluster centroid matrix M directly
  - Method 2 builds it from the subspace A(t) (eigenvectors/singular vectors derived from the clustering)
- Method 1 is a special case of Method 2: when the chosen subspace spans the centroids (M' = M), the two projectors coincide (P1 = P2).
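The special-case claim (P1 = P2 when the subspace spans the centroids) can be checked numerically; a small sketch with randomly generated M and X, using column-vector data as on the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))      # 3 centroid directions in a 5-d space
X = rng.standard_normal((5, 10))     # 10 data points as columns

# P1: residue projector built directly from the centroid matrix M
P1 = np.eye(5) - M @ np.linalg.inv(M.T @ M) @ M.T

# P2: residue projector built from an orthonormal basis A of span(M)
A = np.linalg.svd(M, full_matrices=False)[0]   # left singular vectors
P2 = np.eye(5) - A @ A.T

assert np.allclose(P1, P2)           # same projector...
assert np.allclose(P1 @ X, P2 @ X)   # ...hence identical residue data
```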
Experiments: setup
- PCA is first applied to reduce the dimensionality of the data
- Clustering algorithms:
  - K-means (among restarts, keep the solution with the smallest SSE)
  - Gaussian mixture model (GMM) clustering (keep the solution with the largest likelihood)
- Datasets:
  - Synthetic
  - Real-world: Face, WebKB text, Vowel phoneme, Digit
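The k-means selection rule (run several restarts, keep the one with the smallest SSE) can be sketched in NumPy; `kmeans` and `best_of_restarts` are illustrative names, not the authors' code:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids, SSE)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = ((X[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
    sse = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, sse

def best_of_restarts(X, k, restarts=10):
    """Keep the restart with the smallest SSE, as in the experimental setup."""
    return min((kmeans(X, k, seed=s) for s in range(restarts)),
               key=lambda r: r[2])
```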
Experiments: evaluation
Experiments: synthetic data
Experiments: Face dataset
Experiments: WebKB and Vowel phoneme datasets
Experiments: Digit dataset
Experiments: finding the number of clusters
- K-means: Gap statistic
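The Gap statistic compares the within-cluster dispersion of the data against uniformly sampled reference data: Gap(k) = mean_b log W*_kb - log W_k. A minimal NumPy sketch (the names `gap_statistic` and `_kmeans` are my own, and the tiny inlined k-means is only for illustration):

```python
import numpy as np

def _kmeans(X, k, rng, iters=50):
    """Tiny k-means used only to measure within-cluster dispersion W_k."""
    c = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        lab = ((X[:, None, :] - c[None]) ** 2).sum(-1).argmin(1)
        c = np.array([X[lab == j].mean(0) if np.any(lab == j) else c[j]
                      for j in range(k)])
    lab = ((X[:, None, :] - c[None]) ** 2).sum(-1).argmin(1)
    return float(((X - c[lab]) ** 2).sum())

def gap_statistic(X, k_max=5, B=10, seed=0):
    """Gap(k) = mean_b log W*_kb - log W_k, with B uniform reference sets."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(0), X.max(0)
    gaps = []
    for k in range(1, k_max + 1):
        log_w = np.log(_kmeans(X, k, rng))
        ref = [np.log(_kmeans(rng.uniform(lo, hi, X.shape), k, rng))
               for _ in range(B)]
        gaps.append(np.mean(ref) - log_w)
    return np.array(gaps)   # index k-1 holds Gap(k)
```

A larger Gap(k) means the data's dispersion at k clusters drops far below what uniform noise would give, so k is favored.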
Experiments: finding the number of clusters (cont.)
- GMM: BIC (choose the K maximizing BIC)
- Stopping criteria:
  - SSE is less than 10% of the SSE at the first iteration
  - Kopt = 1
  - Kopt > Kmax: select Kmax
  - Gap statistic
  - BIC: maximize the value of BIC
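BIC in the form maximized here can be written as log L - (p/2) log n; a small sketch (the name `gmm_bic` and the parameter-count convention for full-covariance GMMs are assumptions, not from the slides):

```python
import numpy as np

def gmm_bic(log_likelihood, n_samples, n_components, n_features):
    """BIC = log L - (p/2) log n for a full-covariance GMM (to maximize).

    p counts the free parameters: k-1 mixing weights, k*d means,
    and k * d*(d+1)/2 symmetric covariance entries.
    """
    d, k = n_features, n_components
    p = (k - 1) + k * d + k * d * (d + 1) // 2
    return log_likelihood - 0.5 * p * np.log(n_samples)
```

The number of components K would then be chosen as the argmax of this score across candidate fits.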
Experiments: synthetic dataset
Experiments: Face dataset
Experiments: WebKB dataset
Conclusions
- The framework discovers varied, interesting, and meaningful clustering solutions.
- Method 2 can be applied with any clustering and dimensionality-reduction algorithm.
Comments
- Advantages:
  - Finds multiple non-redundant clustering solutions
- Applications:
  - Data clustering