Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology

Learning multiple nonredundant clusterings
Presenter: Wei-Hao Huang
Authors: Ying Cui, Xiaoli Z. Fern, Jennifer G. Dy
TKDD, 2010
Feb 22, 2016
Outline
- Motivation
- Objectives
- Methodology
- Experiments
- Conclusions
- Comments
Motivation
- Data often admit multiple groupings that are reasonable and interesting from different perspectives.
- Traditional clustering is restricted to finding only one single clustering.
Objectives
- To propose a new clustering paradigm for finding all non-redundant clustering solutions of the data.
Methodology
- Orthogonal clustering (cluster space)
- Clustering in orthogonal subspaces (feature space)
- Automatically finding the number of clusters
- Stopping criteria
Orthogonal Clustering Framework
[Figure: the orthogonal clustering framework illustrated on X (Face dataset)]
Orthogonal clustering (Method 1)
- Residue space: after clustering X(t), each point is replaced by its residue with respect to its own cluster centroid mu:
  x(t+1) = (I - mu mu^T / (mu^T mu)) x(t)
- The residue data X(t+1) is clustered at the next iteration.
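The residue-space step can be sketched in NumPy (a minimal illustration, not the authors' code; `method1_residue` is an assumed name): each point is projected onto the direction of its own cluster centroid, and the residue becomes the data for the next iteration.

```python
import numpy as np

def method1_residue(X, labels, centroids):
    """Project each row of X onto its cluster centroid and keep the residue.

    X         : (n, d) data matrix, one point per row
    labels    : (n,) cluster assignment of each point
    centroids : (k, d) cluster centroids
    """
    X_next = np.empty_like(X, dtype=float)
    for j, mu in enumerate(centroids):
        mask = labels == j
        # projection coefficient of x onto mu: (x . mu) / (mu . mu)
        coef = X[mask] @ mu / (mu @ mu)
        # residue: x minus its component along mu
        X_next[mask] = X[mask] - np.outer(coef, mu)
    return X_next
```

By construction each residue is orthogonal to the centroid it was projected on, so the next clustering pass cannot rediscover the same structure.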
Clustering in orthogonal subspaces
- Feature space: find a subspace A that captures the current clustering, via
  - linear discriminant analysis (LDA), or
  - singular value decomposition (SVD) of the cluster centroids
- LDA vs. SVD: the SVD solution can be viewed as LDA with the within-cluster scatter ignored (S_w = I)
- Projection: Y = A^T X
Clustering in orthogonal subspaces (cont.)
- Residue space: project the data onto the orthogonal complement of the subspace spanned by A(t):
  X(t+1) = (I - A(t) (A(t)^T A(t))^-1 A(t)^T) X(t)
- A(t) = eigenvectors (LDA directions) or singular vectors of the centroid matrix (SVD) capturing the clustering found at iteration t
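For the SVD variant, A can be taken as the top feature-space singular directions of the centroid matrix, and the residue is (I - A A^T) applied to each point. A hedged NumPy sketch (the name `method2_step` and the choice of `q` are illustrative assumptions; data points are rows here, so Y = A^T X becomes X @ A):

```python
import numpy as np

def method2_step(X, centroids, q):
    """One orthogonalization step of the subspace method (SVD variant, sketch).

    X         : (n, d) data, one point per row
    centroids : (k, d) current cluster centroids
    q         : number of singular directions to keep
    Returns (A, Y, X_next): subspace basis, projection, and residue data.
    """
    # Rows of Vt are feature-space directions; the top q span the
    # subspace that captures the current clustering.
    U, s, Vt = np.linalg.svd(centroids, full_matrices=False)
    A = Vt[:q].T                  # (d, q), orthonormal columns
    Y = X @ A                     # projection onto the clustering subspace
    X_next = X - Y @ A.T          # residue: (I - A A^T) x for each row
    return A, Y, X_next
```

Since A is orthonormal, X decomposes exactly into the subspace part Y @ A.T plus the residue.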
Comparison of Method 1 and Method 2
- Both project the data onto a residue space at each iteration:
  - Method 1 builds the projector from the cluster centroid matrix M directly
  - Method 2 builds it from the subspace A(t) (eigenvectors/singular vectors derived from the clustering)
- Method 1 is a special case of Method 2: when the chosen subspace spans the centroids (M' = M), the two projectors coincide (P1 = P2).
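The special-case claim (P1 = P2 when the subspace spans the centroids) can be checked numerically; a small sketch with randomly generated M and X, using column-vector data as on the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))      # 3 centroid directions in a 5-d space
X = rng.standard_normal((5, 10))     # 10 data points as columns

# P1: residue projector built directly from the centroid matrix M
P1 = np.eye(5) - M @ np.linalg.inv(M.T @ M) @ M.T

# P2: residue projector built from an orthonormal basis A of span(M)
A = np.linalg.svd(M, full_matrices=False)[0]   # left singular vectors
P2 = np.eye(5) - A @ A.T

assert np.allclose(P1, P2)           # same projector...
assert np.allclose(P1 @ X, P2 @ X)   # ...hence identical residue data
```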
Experiments: setup
- PCA is first applied to reduce the dimensionality of the data
- Clustering algorithms:
  - K-means (among restarts, keep the solution with the smallest SSE)
  - Gaussian mixture model (GMM) clustering (keep the solution with the largest likelihood)
- Datasets:
  - Synthetic
  - Real-world: Face, WebKB text, Vowel phoneme, Digit
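The k-means selection rule (run several restarts, keep the one with the smallest SSE) can be sketched in NumPy; `kmeans` and `best_of_restarts` are illustrative names, not the authors' code:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids, SSE)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = ((X[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
    sse = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, sse

def best_of_restarts(X, k, restarts=10):
    """Keep the restart with the smallest SSE, as in the experimental setup."""
    return min((kmeans(X, k, seed=s) for s in range(restarts)),
               key=lambda r: r[2])
```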
Experiments: evaluation
Experiments: synthetic data
Experiments: Face dataset
Experiments: WebKB and Vowel phoneme datasets
Experiments: Digit dataset
Experiments: finding the number of clusters
- K-means: Gap statistic
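The Gap statistic compares the within-cluster dispersion of the data against uniformly sampled reference data: Gap(k) = mean_b log W*_kb - log W_k. A minimal NumPy sketch (the names `gap_statistic` and `_kmeans` are my own, and the tiny inlined k-means is only for illustration):

```python
import numpy as np

def _kmeans(X, k, rng, iters=50):
    """Tiny k-means used only to measure within-cluster dispersion W_k."""
    c = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        lab = ((X[:, None, :] - c[None]) ** 2).sum(-1).argmin(1)
        c = np.array([X[lab == j].mean(0) if np.any(lab == j) else c[j]
                      for j in range(k)])
    lab = ((X[:, None, :] - c[None]) ** 2).sum(-1).argmin(1)
    return float(((X - c[lab]) ** 2).sum())

def gap_statistic(X, k_max=5, B=10, seed=0):
    """Gap(k) = mean_b log W*_kb - log W_k, with B uniform reference sets."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(0), X.max(0)
    gaps = []
    for k in range(1, k_max + 1):
        log_w = np.log(_kmeans(X, k, rng))
        ref = [np.log(_kmeans(rng.uniform(lo, hi, X.shape), k, rng))
               for _ in range(B)]
        gaps.append(np.mean(ref) - log_w)
    return np.array(gaps)   # index k-1 holds Gap(k)
```

A larger Gap(k) means the data's dispersion at k clusters drops far below what uniform noise would give, so k is favored.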
Experiments: finding the number of clusters (cont.)
- GMM: BIC (choose the K maximizing BIC)
- Stopping criteria:
  - SSE is less than 10% of the SSE at the first iteration
  - Kopt = 1
  - Kopt > Kmax: select Kmax
  - Gap statistic
  - BIC: maximize the value of BIC
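BIC in the form maximized here can be written as log L - (p/2) log n; a small sketch (the name `gmm_bic` and the parameter-count convention for full-covariance GMMs are assumptions, not from the slides):

```python
import numpy as np

def gmm_bic(log_likelihood, n_samples, n_components, n_features):
    """BIC = log L - (p/2) log n for a full-covariance GMM (to maximize).

    p counts the free parameters: k-1 mixing weights, k*d means,
    and k * d*(d+1)/2 symmetric covariance entries.
    """
    d, k = n_features, n_components
    p = (k - 1) + k * d + k * d * (d + 1) // 2
    return log_likelihood - 0.5 * p * np.log(n_samples)
```

The number of components K would then be chosen as the argmax of this score across candidate fits.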
Experiments: synthetic dataset
Experiments: Face dataset
Experiments: WebKB dataset
Conclusions
- The framework discovers varied, interesting, and meaningful clustering solutions.
- Method 2 can be applied with any clustering and dimensionality-reduction algorithm.
Comments
- Advantages:
  - Finds multiple non-redundant clustering solutions
- Applications:
  - Data clustering