Page 1: 2010 ICML

Multiple Non-Redundant Spectral Clustering Views

Donglin Niu, Jennifer G. Dy
Department of Electrical and Computer Engineering, Northeastern University, Boston, MA

Michael I. Jordan
EECS and Statistics Departments, University of California, Berkeley

Page 2: 2010 ICML

Motivation for Multiple Clusterings

Page 3: 2010 ICML

Another Example

Given medical data:
From a doctor's view: cluster according to type of disease.
From an insurance company's view: cluster based on patient cost/risk.

Page 4: 2010 ICML

Previous Work: Iterative & Simultaneous

Two kinds of approaches.

Iterative: given an existing clustering, find another clustering.
Conditional Information Bottleneck. Gondek and Hofmann (2004)
COALA. Bae and Bailey (2006)
Minimizing KL-divergence. Qi and Davidson (2009)
Multiple alternative clusterings via orthogonal projection. Cui et al. (2007)

Page 5: 2010 ICML

Simultaneous: discover all the possible partitionings at once.
Meta Clustering. Caruana et al. (2006)
De-correlated kmeans. Jain et al. (2008)

Page 6: 2010 ICML

Different from:
Ensemble Clustering
Hierarchical Clustering

Page 7: 2010 ICML

Problem Formulation

(Figure: the same data partitioned two ways, VIEW 1 and VIEW 2.)

There are O(K^N) possible ways to cluster N points into K clusters. We'd like to find solutions that:
1. have high cluster quality, and
2. are non-redundant,
and we'd like to simultaneously
3. learn the subspace in each view.

Page 8: 2010 ICML

Clustering Quality: Normalized Cut (spectral clustering, Ng et al.)

Goal: maximize within-cluster similarity and minimize between-cluster similarity.

Let U be the (relaxed) cluster assignment, K the kernel similarity matrix, and D the degree matrix. The spectral relaxation of normalized cut is

$\max_U \ \operatorname{tr}\big(U^\top D^{-1/2} K D^{-1/2} U\big) \quad \text{s.t. } U^\top U = I$

Advantage: can discover arbitrarily-shaped clusters.
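The relaxation above is solved by an eigendecomposition. Below is a minimal NumPy sketch of this quality criterion; the Gaussian (RBF) kernel and the bandwidth sigma are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from scipy.spatial.distance import cdist

def spectral_embedding(X, n_clusters, sigma=1.0):
    """Relaxed normalized-cut embedding: top eigenvectors of D^{-1/2} K D^{-1/2}."""
    # Gaussian kernel similarity matrix; sigma is an assumed bandwidth.
    K = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma**2))
    d = K.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_inv_sqrt @ K @ D_inv_sqrt
    # Eigenvectors with the largest eigenvalues maximize
    # tr(U^T D^{-1/2} K D^{-1/2} U) subject to U^T U = I.
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, -n_clusters:]
    quality = np.trace(U.T @ L_sym @ U)  # value of the relaxed normalized-cut objective
    return U, quality
```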

Page 9: 2010 ICML

Non-Redundant Clustering Views

There are several possible criteria for measuring dependence between views: correlation and mutual information.
Correlation can capture only linear dependencies.
Mutual information can capture non-linear dependencies, but requires estimating the joint probability distribution.

In this approach, we choose the Hilbert-Schmidt Independence Criterion:

$\mathrm{HSIC}(x, y) = \| C_{xy} \|_{HS}^2$

Advantage: it can detect non-linear dependence and does not need to estimate joint probability distributions.

Page 10: 2010 ICML

Hilbert-Schmidt Independence Criterion (HSIC)

HSIC is the (squared Hilbert-Schmidt) norm of the cross-covariance operator in kernel space:

$\mathrm{HSIC}(x, y) = \| C_{xy} \|_{HS}^2, \qquad C_{xy} = E_{xy}\big[(\phi(x) - \mu_x) \otimes (\psi(y) - \mu_y)\big]$

Empirical estimate of HSIC from n observations:

$\mathrm{HSIC}(X, Y) = \frac{1}{(n-1)^2}\,\operatorname{tr}(K H L H)$

$\text{where } K_{ij} = k(x_i, x_j),\ L_{ij} = l(y_i, y_j),\ H = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top,\quad K, L, H \in \mathbb{R}^{n \times n}$

k and l are kernel functions; n is the number of observations.
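A minimal NumPy sketch of this empirical estimate; the Gaussian kernels and their bandwidths are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def hsic(X, Y, sigma_x=1.0, sigma_y=1.0):
    """Empirical HSIC estimate: tr(K H L H) / (n-1)^2 with centering matrix H."""
    n = X.shape[0]
    K = np.exp(-cdist(X, X, "sqeuclidean") / (2 * sigma_x**2))  # K_ij = k(x_i, x_j)
    L = np.exp(-cdist(Y, Y, "sqeuclidean") / (2 * sigma_y**2))  # L_ij = l(y_i, y_j)
    H = np.eye(n) - np.ones((n, n)) / n                          # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```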

Page 11: 2010 ICML

Overall Objective Function

$\max_{\{U_v, W_v\}} \ \sum_v \Big[\operatorname{tr}\big(U_v^\top D_v^{-1/2} K_v D_v^{-1/2} U_v\big) \;-\; \lambda \sum_{u \neq v} \operatorname{HSIC}\big(W_v^\top x,\ W_u^\top x\big)\Big]$

$\text{s.t. } U_v^\top U_v = I,\ U_v \in \mathbb{R}^{n \times q_v},\ W_v^\top W_v = I,\ K_{v,ij} = k\big(W_v^\top x_i,\ W_v^\top x_j\big)$

Cluster Quality: normalized cut (first term). Redundancy: HSIC between views (second term), computed as $\operatorname{HSIC}(W_v^\top x, W_u^\top x) \propto \operatorname{tr}(K_v H K_u H)$; $\lambda$ trades off the two.

where Uv is the spectral embedding, Kv is the kernel matrix, and Dv is the degree matrix for each view v; H is the matrix that centers the kernel matrices. All of these are defined in the subspace given by Wv.
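A small sketch of evaluating this objective for a fixed set of subspaces, reusing the spectral_embedding and hsic sketches above; the trade-off weight lam and the shared bandwidth sigma are assumed hyperparameters.

```python
def msc_objective(X, Ws, n_clusters, lam=1.0, sigma=1.0):
    """Sum of normalized-cut quality terms minus lam times pairwise HSIC redundancy."""
    projections = [X @ W for W in Ws]   # projected data W_v^T x for every view v
    quality = sum(spectral_embedding(P, n_clusters, sigma)[1] for P in projections)
    redundancy = sum(hsic(projections[v], projections[u], sigma, sigma)
                     for v in range(len(Ws)) for u in range(len(Ws)) if u != v)
    return quality - lam * redundancy
```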

Page 12: 2010 ICML

Algorithm

We use a coordinate ascent approach.

Step 1: Fix Wv, optimize for Uv. The solution for Uv is given by the eigenvectors with the largest eigenvalues of the normalized kernel similarity matrix.

Step 2: Fix Uv, optimize for Wv. We use gradient ascent on the Stiefel manifold.

Repeat Steps 1 & 2 until convergence.

K-means step: normalize Uv and apply k-means on Uv to obtain the cluster labels in each view.
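A simplified, runnable sketch of this alternation, reusing spectral_embedding and hsic from above. The paper derives analytic gradients for the W-step; this sketch substitutes a finite-difference gradient with a QR retraction onto the Stiefel manifold, and the step size, iteration count, and random initialization are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.cluster.vq import kmeans2

def w_step_objective(Wv, Uv, X, other_projs, lam, sigma):
    """Objective for one view with U_v held fixed:
    tr(U_v^T D^{-1/2} K_v D^{-1/2} U_v) - lam * sum_u HSIC(view v, view u)."""
    Pv = X @ Wv
    K = np.exp(-cdist(Pv, Pv, "sqeuclidean") / (2 * sigma**2))
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    K_norm = K * np.outer(d_inv_sqrt, d_inv_sqrt)       # D^{-1/2} K D^{-1/2}
    quality = np.trace(Uv.T @ K_norm @ Uv)
    redundancy = sum(hsic(Pv, Pu, sigma, sigma) for Pu in other_projs)
    return quality - lam * redundancy

def msc(X, n_views=2, subspace_dim=2, n_clusters=3,
        lam=1.0, sigma=1.0, n_iters=10, step=0.1, eps=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Random orthonormal W_v (the paper instead initializes from feature clusters).
    Ws = [np.linalg.qr(rng.standard_normal((d, subspace_dim)))[0] for _ in range(n_views)]
    for _ in range(n_iters):
        # Step 1: fix W_v, solve U_v by eigendecomposition of the normalized kernel.
        Us = [spectral_embedding(X @ W, n_clusters, sigma)[0] for W in Ws]
        # Step 2: fix U_v, gradient ascent on the Stiefel manifold for each W_v.
        for v in range(n_views):
            others = [X @ Ws[u] for u in range(n_views) if u != v]
            base = w_step_objective(Ws[v], Us[v], X, others, lam, sigma)
            grad = np.zeros_like(Ws[v])
            for i in range(d):
                for j in range(subspace_dim):
                    W_try = Ws[v].copy()
                    W_try[i, j] += eps
                    grad[i, j] = (w_step_objective(W_try, Us[v], X, others, lam, sigma) - base) / eps
            Ws[v] = np.linalg.qr(Ws[v] + step * grad)[0]  # retraction keeps W_v^T W_v = I
    # K-means step: normalize rows of U_v, then cluster in each view.
    labels = []
    for U in Us:
        U_norm = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
        labels.append(kmeans2(U_norm, n_clusters, minit="++")[1])
    return Ws, labels
```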

Page 13: 2010 ICML

Initialize Wv

Cluster the features using spectral clustering: for data x = [f1 f2 f3 f4 f5 ... fd], compute feature similarity based on HSIC(fi, fj) and partition the features into groups (e.g., {f1, f2, f4, ...}, {f3, f21, f9, ...}, {f15, f34, f7, ...}).

(Figure: example transformation matrix Wv, a 0/1 matrix whose columns select the features f1, f2, f4 of one group.)
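A small sketch of this initialization, reusing the hsic sketch above; clustering the HSIC-based feature similarity with an eigendecomposition plus k-means is an assumed simplification of the paper's feature-clustering step.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def init_subspaces(X, n_views, sigma=1.0):
    """Group features by pairwise HSIC similarity, then build one 0/1 selector W_v per group."""
    d = X.shape[1]
    S = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            S[i, j] = hsic(X[:, [i]], X[:, [j]], sigma, sigma)   # feature-feature similarity
    # Spectral clustering of the features using S as the similarity matrix.
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    U = np.linalg.eigh(S * np.outer(d_inv_sqrt, d_inv_sqrt))[1][:, -n_views:]
    _, groups = kmeans2(U, n_views, minit="++")
    # W_v selects the features assigned to group v: a single 1 per column, 0 elsewhere.
    Ws = []
    for v in range(n_views):
        idx = np.where(groups == v)[0]
        W = np.zeros((d, len(idx)))
        W[idx, np.arange(len(idx))] = 1.0
        Ws.append(W)
    return Ws
```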

Page 14: 2010 ICML

Synthetic Data

(Figures: Synthetic Data 1, View 1 and View 2; Synthetic Data 2, View 1 and View 2.)

Normalized Mutual Information (NMI) Results

          DATA 1            DATA 2
          VIEW 1  VIEW 2    VIEW 1  VIEW 2
mSC       0.94    0.95      0.90    0.93
OPC       0.89    0.85      0.02    0.07
DK        0.87    0.94      0.03    0.05
SC        0.37    0.42      0.31    0.25
Kmeans    0.36    0.34      0.03    0.05

mSC: our algorithm
OPC: orthogonal projection (Cui et al., 2007)
DK: de-correlated kmeans (Jain et al., 2008)
SC: spectral clustering

Page 15: 2010 ICML

Face Image Data

Clusters are shown as mean faces; the number below each image is cluster purity. The two views are an identity (ID) view and a pose view.

NMI Results

          ID     POSE
mSC       0.79   0.42
OPC       0.67   0.37
DK        0.70   0.40
SC        0.67   0.22
Kmeans    0.64   0.24

Page 16: 2010 ICML

WebKB Data

High-weight words in each subspace view:
view 1: Cornell, Texas, Wisconsin, Madison, Washington
view 2: homework, student, professor, project, Ph.D.

NMI Results

          Univ.  Type
mSC       0.81   0.54
OPC       0.43   0.53
DK        0.48   0.57
SC        0.25   0.39
Kmeans    0.10   0.50

Page 17: 2010 ICML

NSF Award Data

Most frequent words in each subspace view:

Subjects view
Physics: materials, chemical, metal, optical, quantum
Information: control, programming, information, function, languages
Biology: cell, gene, protein, DNA, biological

Work Type view
Theoretical: methods, mathematical, develop, equation, theoretical
Experimental: experiments, processes, techniques, measurements, surface

Page 18: 2010 ICML

Machine Sound Data

Normalized Mutual Information (NMI) Results

          Motor   Fan    Pump
mSC       0.82    0.75   0.83
OPC       0.73    0.68   0.47
DK        0.64    0.58   0.75
SC        0.42    0.16   0.09
Kmeans    0.57    0.16   0.09

Page 19: 2010 ICML

Summary

Most clustering algorithms find only a single clustering solution. However, data may be multi-faceted (i.e., it can be interpreted in many different ways).

We introduced a new method for discovering multiple non-redundant clusterings.

Our approach, mSC, optimizes both a spectral clustering objective (measuring cluster quality) and an HSIC penalty (measuring redundancy between views).

mSC can discover multiple clusterings with flexibly shaped clusters, while simultaneously finding the subspaces in which these clustering views reside.

Page 20: 2010 ICML

Thank you!