Tutorial on Spectral Clustering, ICML 2004, Chris Ding © University of California
A Tutorial on Spectral Clustering
Part 2: Advanced/related Topics

Chris Ding
Computational Research Division
Lawrence Berkeley National Laboratory, University of California
Advanced/related Topics
• Spectral embedding: simplex cluster structure
• Perturbation analysis
• K-means clustering in embedded space
• Equivalence of K-means clustering and PCA
• Connectivity networks: scaled PCA & Green’s function
• Extension to bipartite graphs: correspondence analysis
• Random walks and spectral clustering
• Semi-definite programming and spectral clustering
• Spectral ordering (distance-sensitive ordering)
• Webpage spectral ranking: PageRank and HITS
Spectral Embedding: Simplex Cluster Structure
(Ding, 2004)

• Compute K eigenvectors of the Laplacian.
• Embed objects in the K-dim eigenspace.
What is the structure of the clusters?

Simplex Embedding Theorem. Assume objects are well characterized by spectral clustering objective functions. In the embedded space, objects aggregate to K distinct centroids:
• Centroids locate on K corners of a simplex
• Simplex consists of K basis vectors + coordinate origin
• Simplex is rotated by an orthogonal transformation T
• Columns of T are eigenvectors of a K × K embedding matrix Γ
K-way Clustering Objectives
Cluster size functions ρ(C_k):

  Ratio Cut:       ρ(C_k) = |C_k| = n_k
  Normalized Cut:  ρ(C_k) = d(C_k) = Σ_{i∈C_k} d_i
  MinMaxCut:       ρ(C_k) = s(C_k, C_k) = Σ_{i∈C_k, j∈C_k} w_ij

Objective function:

  J = Σ_k s(C_k, G − C_k) / ρ(C_k)
    = Σ_{1≤p<q≤K} [ s(C_p, C_q)/ρ(C_p) + s(C_p, C_q)/ρ(C_q) ]

G − C_k is the graph complement of C_k (all nodes not in C_k).
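As a concrete check on these definitions, here is a small sketch in Python (the function name `cut_objective` and the 4-node example graph are ours, not from the tutorial) that evaluates J for each choice of ρ:

```python
import numpy as np

def cut_objective(W, labels, kind="ncut"):
    """Evaluate the K-way cut objective J = sum_k s(C_k, G-C_k) / rho(C_k).

    W: symmetric similarity matrix; labels: cluster id per node.
    kind selects the size function rho: "rcut" -> |C_k|,
    "ncut" -> d(C_k), "mmc" -> s(C_k, C_k)."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # node degrees
    J = 0.0
    for k in np.unique(labels):
        in_k = labels == k
        cut = W[in_k][:, ~in_k].sum()      # s(C_k, G - C_k)
        if kind == "rcut":
            rho = in_k.sum()               # |C_k|
        elif kind == "ncut":
            rho = d[in_k].sum()            # d(C_k)
        else:                              # "mmc"
            rho = W[in_k][:, in_k].sum()   # s(C_k, C_k)
        J += cut / rho
    return J

# Two obvious clusters: {0,1} dense, {2,3} dense, one weak cross link.
W = np.array([[0, 1, .1, 0],
              [1, 0, 0, 0],
              [.1, 0, 0, 1],
              [0, 0, 1, 0]])
labels = np.array([0, 0, 1, 1])
```

All three objectives are small here because the between-cluster weight (0.1) is small relative to each cluster's size function.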
Simplex Spectral Embedding Theorem
Simplex orthogonal transform matrix  T = (t_1, ..., t_K),  determined by:

  Γ̃ t_k = λ_k t_k,   Γ̃ = Ω^{-1/2} Γ Ω^{-1/2},   Ω = diag(ρ(C_1), ..., ρ(C_K))

Spectral perturbation matrix:

  Γ = [  h_1    −s_12   ...   −s_1K ]
      [ −s_21    h_2    ...   −s_2K ]
      [  ...            ...         ]
      [ −s_K1   −s_K2   ...    h_K  ]

where s_pq = s(C_p, C_q) and h_k = Σ_{p≠k} s_kp.
Properties of Spectral Embedding
• Original basis vectors:  h_k = (0...0, 1...1, 0...0)^T / n_k^{1/2}
• Dimension of embedding is K−1: (q_2, ..., q_K)
  – q_1 = (1, ..., 1)^T is constant & trivial
  – Eigenvalues of Γ (= eigenvalues of D − W)
  – Eigenvalues determine how well the clustering objective function characterizes the data
• Exact solution for K = 2
2-way Spectral Embedding (Exact Solution)

Eigenvalues:

  λ_Rcut = s(C_1,C_2)/n_1 + s(C_1,C_2)/n_2
  λ_Ncut = s(C_1,C_2)/d(C_1) + s(C_1,C_2)/d(C_2)
  λ_MMC  = s(C_1,C_2)/s(C_1,C_1) + s(C_1,C_2)/s(C_2,C_2)

Recover the original 2-way clustering objectives.

For Normalized Cut, the orthogonal transform T rotates

  h_1 = (1...1, 0...0)^T,   h_2 = (0...0, 1...1)^T

into

  q_1 = (1...1)^T,   q_2 = (a, ..., a, −b, ..., −b)^T

(Ding et al, KDD’01)
Spectral clustering is inherently self-consistent!
Perturbation Analysis
Assume the data has 3 dense clusters C_1, C_2, C_3, sparsely connected:

  W = [ W_11  W_12  W_13 ]
      [ W_21  W_22  W_23 ]
      [ W_31  W_32  W_33 ]

Scaled similarity:  Ŵ = D^{-1/2} W D^{-1/2},   Ŵ z = λ z

With q = D^{-1/2} z this is the generalized problem  W q = λ D q.

Off-diagonal blocks are between-cluster connections, assumed small, and are treated as a perturbation.
(Ding et al, KDD’01)
Perturbation Analysis
Split W into within-cluster and between-cluster parts, W = W^(0) + W^(1):

  W^(0) = [ W_11   0     0   ]     W^(1) = [  0    W_12  W_13 ]
          [  0    W_22   0   ]             [ W_21   0    W_23 ]
          [  0     0    W_33 ]             [ W_31  W_32   0   ]

Correspondingly Ŵ = Ŵ^(0) + Ŵ^(1):

  Ŵ^(0) = diag( Ŵ_11^(0), Ŵ_22^(0), Ŵ_33^(0) )

  Ŵ^(1) = [ Ŵ_11 − Ŵ_11^(0)   Ŵ_12              Ŵ_13            ]
          [ Ŵ_21              Ŵ_22 − Ŵ_22^(0)   Ŵ_23            ]
          [ Ŵ_31              Ŵ_32              Ŵ_33 − Ŵ_33^(0) ]

0th order:  Ŵ_pq^(0) = D_p^{-1/2} W_pq D_q^{-1/2}
            (degrees D_p from within-cluster connections only)

1st order:  Ŵ_pq = (D_p1 + D_p2 + D_p3)^{-1/2} W_pq (D_q1 + D_q2 + D_q3)^{-1/2}
            (D_pk: degrees contributed by block W_pk)
K-means clustering
• Developed in the 1960’s (Lloyd, MacQueen, etc.)
• Computationally efficient (order mN)
• Widely used in practice
  – Benchmark to evaluate other algorithms

Given n points in m-dim:  X = (x_1, x_2, ..., x_n)

K-means objective:

  min J_K = Σ_{k=1}^K Σ_{i∈C_k} || x_i − c_k ||²
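The objective above is minimized by the familiar Lloyd iteration. A minimal sketch (the function name `kmeans` is ours; rows are points here, whereas the slides write points as columns; the deterministic "first K points" initialization is chosen only for reproducibility — practical implementations use smarter seeding):

```python
import numpy as np

def kmeans(X, K, n_iter=100):
    """Lloyd iteration for J_K = sum_k sum_{i in C_k} ||x_i - c_k||^2.
    X: (n, m) array of n points; returns (labels, centroids)."""
    centroids = X[:K].copy()                  # deterministic init: first K points
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster
        # (keep the old centroid if a cluster goes empty).
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centroids[k] for k in range(K)])
        if np.allclose(new, centroids):       # converged
            break
        centroids = new
    return labels, centroids

# Two tight blobs; the first two points seed one centroid in each blob.
X = np.array([[0.0, 0.0], [10.0, 10.0], [0.1, 0.0],
              [0.0, 0.1], [10.1, 10.0], [10.0, 10.1]])
labels, cents = kmeans(X, K=2)
```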
K-means Clustering in Spectral Embedded Space
The simplex spectral embedding theorem provides the theoretical basis for K-means clustering in the embedded eigenspace:
– Cluster centroids are well separated (corners of the simplex)
– K-means clustering is invariant under (i) coordinate rotation x → Tx, and (ii) shift x → x + a
– Thus the orthogonal transform T in the simplex embedding is unnecessary
• Many variants of K-means (Ng et al, Bach & Jordan, Zha et al, Shi & Xu, etc.)
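The whole pipeline — eigenvectors of the scaled similarity, map back by q = D^{-1/2} z, then cluster in the embedded space — can be sketched as follows (the function name `spectral_embed` and the 6-node example are ours; for simplicity the 2-way case is resolved by the sign of the second component instead of a full K-means pass):

```python
import numpy as np

def spectral_embed(W, K):
    """Embed graph nodes via the top-K eigenvectors z_k of
    W_hat = D^{-1/2} W D^{-1/2}, mapped back by q_k = D^{-1/2} z_k."""
    d = W.sum(axis=1)
    Dmh = np.diag(1.0 / np.sqrt(d))
    What = Dmh @ W @ Dmh                    # scaled similarity
    vals, vecs = np.linalg.eigh(What)       # eigenvalues ascending
    Z = vecs[:, -K:]                        # top-K eigenvectors
    return Dmh @ Z                          # columns: q_{K-1}, ..., q_1

# Block-structured similarity: two triangles joined by one weak link.
W = np.array([[0, 1, 1, .01, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [.01, 0, 0, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0.]])
Q = spectral_embed(W, K=2)
# Column 0 is the nontrivial component q_2 (the last column is the trivial
# constant q_1); its sign already separates the two clusters.
labels = (Q[:, 0] > 0).astype(int)
```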
We have proved:
Spectral embedding + K-means clustering is the appropriate method.

We now show:
K-means itself is solved by PCA.
Equivalence of K-means Clustering and Principal Component Analysis
• Cluster indicators specify the solution of K-means clustering
• Principal components are eigenvectors of the Gram (Kernel) matrix = data projections in the principal directions of the covariance matrix
• Optimal solution of K-means clustering: continuous solution of the discrete cluster indicators of K-means are given by Principal components
(Zha et al, NIPS’01; Ding & He, 2003)
Principal Component Analysis (PCA)
• Widely used in a large number of different fields
  – Best low-rank approximation (SVD theorem; Eckart-Young, 1936): noise reduction
  – Unsupervised dimension reduction
  – Many generalizations
• The conventional perspective is inadequate to explain the effectiveness of PCA
• New results: principal components are cluster indicators for a well-motivated clustering objective
Principal Component Analysis
n points in m-dim:  X = (x_1, x_2, ..., x_n)

Covariance matrix:     S = X X^T
Gram (kernel) matrix:  X^T X

Principal directions u_k:    X X^T u_k = λ_k u_k
Principal components v_k:    X^T X v_k = λ_k v_k

Singular Value Decomposition:  X = Σ_{k=1}^m λ_k^{1/2} u_k v_k^T
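A quick numerical check of these relations (random data, our variable names; points are columns of X, matching the slides): the nonzero eigenvalues of the covariance X X^T and of the Gram matrix X^T X coincide, and both equal the squared singular values of X.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))          # m=4 dims, n=8 points (as columns)
X -= X.mean(axis=1, keepdims=True)   # center, so X X^T is the scatter matrix

# Principal directions u_k: eigenvectors of the covariance X X^T.
lam_u, U = np.linalg.eigh(X @ X.T)
# Principal components v_k: eigenvectors of the Gram matrix X^T X.
lam_v, V = np.linalg.eigh(X.T @ X)

# The SVD ties the two together: sigma_k^2 = lambda_k.
sig = np.linalg.svd(X, compute_uv=False)
```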
2-way K-means Clustering

Cluster membership indicator:

  q(i) = + (n_2 / (n n_1))^{1/2}   if i ∈ C_1
         − (n_1 / (n n_2))^{1/2}   if i ∈ C_2

Define the distance matrix D = (d_ij), d_ij = ||x_i − x_j||², and

  J_D = (n_1 n_2 / n) [ 2 d(C_1,C_2)/(n_1 n_2) − d(C_1,C_1)/n_1² − d(C_2,C_2)/n_2² ]

Then

  J_K = n⟨x²⟩ − J_D/2,   so   min J_K ⇒ max J_D

  J_D = −q^T D q = −q^T D̃ q = 2 q^T X^T X q

where D̃ is the centered distance matrix.
2-way K-means Clustering
Cluster indicators satisfy:  Σ_i q(i) = 0,   Σ_i q(i)² = 1

Relax the restriction that q(i) take discrete values; let it take continuous values in [−1, 1]. The solution for q is the eigenvector of the Gram matrix.

Theorem: The (continuous) optimal solution of q is given by the principal component v_1.

Clusters C_1, C_2 are determined by:  C_1 = {i | v_1(i) < 0},   C_2 = {i | v_1(i) ≥ 0}

Once C_1, C_2 are computed, iterate K-means to convergence.
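The theorem can be seen numerically: for well-separated 2-cluster data, thresholding the top eigenvector of the Gram matrix at zero recovers the clusters (toy data and variable names are ours; points are columns, as in the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two clusters in 5-D: 6 points near -2 and 6 points near +2 on every axis.
X = np.hstack([rng.normal(-2, .3, (5, 6)), rng.normal(2, .3, (5, 6))])
X = X - X.mean(axis=1, keepdims=True)      # center the data

# Principal component v1: top eigenvector of the Gram matrix X^T X.
vals, vecs = np.linalg.eigh(X.T @ X)
v1 = vecs[:, -1]

# Threshold at zero: C1 = {i : v1(i) < 0}, C2 = {i : v1(i) >= 0}.
labels = (v1 >= 0).astype(int)
```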
Multi-way K-means Clustering
Unsigned cluster membership indicators h_1, ..., h_K (example: 4 objects, 3 clusters):

                     C_1  C_2  C_3
  (h_1, h_2, h_3) = [ 1    0    0 ]
                    [ 1    0    0 ]
                    [ 0    1    0 ]
                    [ 0    0    1 ]
Multi-way K-means Clustering
For K ≥ 2:

  J_K = Σ_i x_i² − Σ_{k=1}^K (1/n_k) Σ_{i,j∈C_k} x_i^T x_j

(Unsigned) cluster membership indicators h_1, ..., h_K:

  h_k = (0...0, 1...1, 0...0)^T / n_k^{1/2}

  J_K = Σ_i x_i² − Σ_{k=1}^K h_k^T X^T X h_k

Let H = (h_1, ..., h_K). Then

  J_K = Σ_i x_i² − Tr(H^T X^T X H)
Multi-way K-means Clustering
Redundancy in h_1, ..., h_K:

  Σ_{k=1}^K n_k^{1/2} h_k = (1, 1, ..., 1)^T ≡ e

Regularized relaxation of K-means clustering: transform to signed indicator vectors q_1, ..., q_K via a K × K orthogonal matrix T:

  (q_1, ..., q_K) = (h_1, ..., h_K) T,   i.e.  Q_K = H T

Require the 1st column of T to be  t_1 = (n_1^{1/2}, ..., n_K^{1/2})^T / n^{1/2}.
Thus  q_1 = e / n^{1/2} = constant.
Regularized Relaxation of K-means Clustering

  J_K = Tr(Y^T Y) − Tr(Q_{k−1}^T Y^T Y Q_{k−1}),   Q_{k−1} = (q_2, ..., q_k)

Theorem: The optimal solutions of q_2, ..., q_k are given by the principal components v_2, ..., v_k. J_K is bounded below by the total variance minus the sum of the first K−1 eigenvalues of the covariance:

  n⟨y²⟩ − Σ_{k=1}^{K−1} λ_k  <  min J_K  <  n⟨y²⟩
Scaled PCA

Similarity matrix S = (s_ij) (generated from X X^T).

Nonlinear re-scaling:

  S̃ = D^{-1/2} S D^{-1/2},   s̃_ij = s_ij / (s_i. s_.j)^{1/2}

where D = diag(d_1, ..., d_n), d_i = s_i. (row sums).

Apply SVD on S̃ ⇒

  S = D^{1/2} ( Σ_k λ_k z_k z_k^T ) D^{1/2} = D ( Σ_k λ_k q_k q_k^T ) D

q_k = D^{-1/2} z_k is the scaled principal component.

Trivial component:  λ_0 = 1,  z_0 = d^{1/2} / s_..^{1/2},  q_0 = 1. Subtracting it:

  S − d d^T / s_.. = D ( Σ_{k≥1} λ_k q_k q_k^T ) D

(Ding, et al, 2002)
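A minimal sketch of the re-scaling step (the function name `scaled_pca` and the 3-node similarity matrix are ours): eigendecompose S̃ = D^{-1/2} S D^{-1/2} and return the scaled components q_k = D^{-1/2} z_k. The leading component is the trivial one: λ_0 = 1 with q_0 constant.

```python
import numpy as np

def scaled_pca(S):
    """Scaled PCA of a symmetric nonnegative similarity matrix S.
    Returns eigenvalues (descending) and scaled components q_k = D^{-1/2} z_k,
    where z_k are eigenvectors of S~ = D^{-1/2} S D^{-1/2}."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)
    Dmh = np.diag(d ** -0.5)
    vals, Z = np.linalg.eigh(Dmh @ S @ Dmh)
    order = np.argsort(vals)[::-1]          # largest eigenvalue first
    return vals[order], (Dmh @ Z)[:, order]

S = np.array([[1., .9, .1],
              [.9, 1., .1],
              [.1, .1, 1.]])
vals, Q = scaled_pca(S)
```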
Optimality Properties of Scaled PCA

Scaled principal components have optimality properties:

Ordering
– Adjacent objects along the order are similar
– Far-away objects along the order are dissimilar
– Optimal solution for the permutation indexes is given by scaled PCA

Clustering
– Maximize within-cluster similarity
– Minimize between-cluster similarity
– Optimal solution for cluster membership indicators is given by scaled PCA
Difficulty of K-way clustering
• 2-way clustering uses a single eigenvector
• K-way clustering uses several eigenvectors
• How to recover the 0-1 cluster indicators H?

  eigenvectors:  Q = (q_1, ..., q_k)  — have both positive and negative entries
  indicators:    H = (h_1, ..., h_k)

  Q = H T

Avoid computing the transformation T:
• Do K-means, which is invariant under T
• Compute the connectivity network Q Q^T, which cancels T
Connectivity Network
  C_ij = 1 if i, j belong to the same cluster; 0 otherwise

SPCA provides:

  C ≅ Σ_{k=1}^K λ_k D q_k q_k^T D

Green’s function:

  C ≈ G ≡ Σ_{k=2}^K q_k q_k^T / (1 − λ_k)

Projection matrix:

  C ≈ P ≡ Σ_{k=1}^K q_k q_k^T

(Ding et al, 2002)
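The key point — Q Q^T cancels the unknown rotation T — is easy to verify numerically (a generic orthonormal Q stands in for the eigenvectors here; variable names are ours):

```python
import numpy as np

# Connectivity network P = Q Q^T from K orthonormal columns q_1..q_K.
rng = np.random.default_rng(0)
n, K = 6, 2
Q = np.linalg.qr(rng.normal(size=(n, K)))[0]   # orthonormal columns

P = Q @ Q.T                                    # connectivity network

# Mixing the eigenvectors by any orthogonal K x K transform T
# leaves Q Q^T unchanged: (QT)(QT)^T = Q T T^T Q^T = Q Q^T.
T = np.linalg.qr(rng.normal(size=(K, K)))[0]
P2 = (Q @ T) @ (Q @ T).T
```

This is exactly why the connectivity network can be computed without ever recovering T, and why it is a projection matrix (P² = P).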
Connectivity network
• Similar to Hopfield network
• Mathematical basis: projection matrix
• Shows self-aggregation clearly
• Drawback: how to recover clusters?
  – Apply K-means directly on C
  – Use linearized assignment with cluster crossing and spectral ordering (ICML’04)
Connectivity network: Example 1
[Figure: similarity matrix W vs. connectivity matrix; λ_2 = 0.300, 0.268]

Effects of self-aggregation:
• Between-cluster connections suppressed
• Within-cluster connections enhanced
Connectivity of Internet Newsgroups
  NG2:  comp.graphics
  NG9:  rec.motorcycles
  NG10: rec.sport.baseball
  NG15: sci.space
  NG18: talk.politics.mideast

100 articles from each group; 1000 words; tf.idf weighting; cosine similarity.

Clustering accuracy:  Spectral Clustering 89%,  Direct K-means 66%

[Figure: cosine similarity matrix vs. connectivity matrix]
Spectral embedding is not topology preserving
700 3-D data points form 2 interlocked rings
In eigenspace, they shrink and separate
Correspondence Analysis (CA)
• Mainly used in graphical display of data
• Popular in France (Benzécri, 1969)
• Long history
  – Simultaneous row and column regression (Hirschfeld, 1935)
  – Reciprocal averaging (Richardson & Kuder, 1933; Horst, 1935; Fisher, 1940; Hill, 1974)
  – Canonical correlations, dual scaling, etc.
• Formulation is a bit complicated (“convoluted”, Jolliffe, 2002, p. 342)
• “A neglected method” (Hill, 1974)
Scaled PCA on a Contingency Table ⇒ Correspondence Analysis

Nonlinear re-scaling:

  P̃ = D_r^{-1/2} P D_c^{-1/2},   p̃_ij = p_ij / (p_i. p_.j)^{1/2}

Apply SVD on P̃. Subtracting the trivial component:

  P − r c^T / p_.. = D_r ( Σ_{k≥1} λ_k f_k g_k^T ) D_c

where r = (p_1., ..., p_n.)^T and c = (p_.1, ..., p_.n)^T are the row and column sums, and

  f_k = D_r^{-1/2} u_k,   g_k = D_c^{-1/2} v_k

are the scaled row and column principal components (standard coordinates in CA).
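The whole construction fits in a few lines (the function name `correspondence_analysis` and the toy contingency table are ours): SVD of P̃ = D_r^{-1/2} P D_c^{-1/2}, then rescale the singular vectors. The leading singular value is the trivial component (σ = 1, constant f_0).

```python
import numpy as np

def correspondence_analysis(P):
    """Correspondence analysis as scaled PCA of a contingency table P.
    Returns singular values and the scaled row/column components
    f_k = Dr^{-1/2} u_k, g_k = Dc^{-1/2} v_k."""
    P = np.asarray(P, dtype=float)
    P = P / P.sum()                        # normalize to proportions
    r, c = P.sum(axis=1), P.sum(axis=0)    # row and column sums
    Pt = P / np.sqrt(np.outer(r, c))       # D_r^{-1/2} P D_c^{-1/2}
    U, s, Vt = np.linalg.svd(Pt)
    F = U / np.sqrt(r)[:, None]            # scaled row components f_k
    G = Vt.T / np.sqrt(c)[:, None]         # scaled column components g_k
    return s, F, G

# Toy table with two blocks plus weak cross counts (keeps it connected).
P = np.array([[4., 3., .1, 0.],
              [3., 4., 0., .1],
              [.1, 0., 5., 3.],
              [0., .1, 3., 5.]])
s, F, G = correspondence_analysis(P)
```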
Information Retrieval
Bell Lab tech memos: 5 comp-sci and 4 applied-math memo titles.
C1: Human machine interface for lab ABC computer applications
C2: A survey of user opinion of computer system response time
C3: The EPS user interface management system
C4: System and human system engineering testing of EPS
C5: Relation of user-perceived response time to error management
M1: The generation of random, binary, unordered trees
M2: The intersection graph of paths in trees
M3: Graph minors IV: widths of trees and well-quasi-ordering
M4: Graph minors: A survey
Word-document matrix (words × docs): row/column clustering

            m1 m2 m3 m4 | c2 c5 c3 c1 c4
  tree       1  1  1    |
  graph         1  1  1 |
  minors           1  1 |
  survey              1 |  1
  time                  |  1  1
  response              |  1  1
  user                  |  1  1  1
  computer              |  1        1
  system                |  1     1     2
  interface             |        1  1
  EPS                   |        1     1
  human                 |           1  1
Bipartite Graph: 3 Types of Connectivity Networks

Stack the row and column components into one set of eigenvectors:

  Q_K = (q_1, ..., q_K) = [ F_K ]
                          [ G_K ]

  Q_K Q_K^T = [ F_K F_K^T   F_K G_K^T ]
              [ G_K F_K^T   G_K G_K^T ]

• Row-row clustering:        F_K F_K^T
• Column-column clustering:  G_K G_K^T
• Row-column association:    F_K G_K^T

Each of the three networks is approximately block-diagonal, with one dense block (≈ e_k e_k^T) per cluster.
Example

[Figure: original data matrix and the three connectivity networks — row-row FF^T, column-column GG^T, row-column FG^T; λ_2 = 0.456, 0.477]
Internet Newsgroups
Simultaneous clustering of documents and words
Random Walks and Normalized Cut
(Meila & Shi, 2001)

Similarity matrix W; stochastic matrix:  P = D^{-1} W
⇒ equilibrium distribution π:  π^T P = π^T,   π = d / Σ_i d_i

Random walks between A and B:

  J_NormCut = P(A → B)/π(A) + P(B → A)/π(B)

  P x = λ x  ⇒  W x = λ D x  ⇒  (D − W) x = (1 − λ) D x

PageRank:  P = α D_out^{-1} L + (1 − α) e e^T
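The equilibrium claim is easy to check: for a symmetric W the degree distribution d/Σd is stationary for P = D^{-1}W, since (π^T P)_j = Σ_i (d_i/Σd)(w_ij/d_i) = d_j/Σd. A tiny numerical sketch (example graph and variable names are ours):

```python
import numpy as np

# Random walk on an undirected weighted graph.
W = np.array([[0, 2, 1, 0],
              [2, 0, 1, 0],
              [1, 1, 0, 3],
              [0, 0, 3, 0.]])
d = W.sum(axis=1)
P = W / d[:, None]            # row-stochastic transition matrix D^{-1} W
pi = d / d.sum()              # candidate equilibrium distribution
```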
Semi-definite Programming for Normalized Cut
Normalized Cut:
(Xing & Jordan, 2003)

  y_k = D^{1/2} (0...0, 1...1, 0...0)^T / || D^{1/2} h_k ||

Optimize:  min_Y Tr( Y^T (I − W̃) Y ),  subject to Y^T Y = I,   W̃ = D^{-1/2} W D^{-1/2}

  Tr[(I − W̃) Y Y^T]  ⇒  min_Z Tr[(I − W̃) Z]

  s.t.  Z ⪰ 0,  Z ≥ 0,  Z d = d,  Tr Z = K,  Z = Z^T

  Z = Y Y^T = connectivity network

Compute Z via SDP; Z = Y′ Y′^T; Y″ = D^{-1/2} Y′; run K-means on Y″.
Spectral Ordering

Hill, 1970: spectral embedding. Find coordinates x to minimize

  J(x) = Σ_ij (x_i − x_j)² w_ij = x^T (D − W) x

Solutions are eigenvectors of the Laplacian.

Barnard, Pothen, Simon, 1993: envelope reduction of sparse matrices — find an ordering such that the envelope is minimized:

  min_π Σ_ij (π_i − π_j)² w_ij  ⇒  min_x Σ_ij (x_i − x_j)² w_ij
Distance-sensitive ordering

Example (4 variables): for a given ordering there are three distance-1 pairs, two d=2 pairs, and one d=3 pair:

  [ 0 1 2 3 ]
  [ 1 0 1 2 ]
  [ 2 1 0 1 ]
  [ 3 2 1 0 ]

  J_d(π) = Σ_{i=1}^{n−d} s_{π_i, π_{i+d}}

  min_π J(π),   J(π) = Σ_{d=1}^{n−1} d² J_d(π)

The ordering is determined by the permutation indexes (1, ..., n) → (π_1, ..., π_n).

The larger the distance, the larger the weight: large-distance similarities are reduced more than small-distance similarities.
(Ding & He, ICML’04)
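The cost can be sketched directly (function name `ordering_cost` and the 4-item band-similarity example are ours; the d² weight follows the objective above):

```python
import numpy as np

def ordering_cost(S, perm):
    """Distance-sensitive ordering cost J(pi) = sum_d d^2 * J_d(pi),
    where J_d(pi) = sum_i S[perm[i], perm[i+d]] collects the similarities
    of all pairs placed d apart by the ordering."""
    n = len(perm)
    J = 0.0
    for d in range(1, n):
        Jd = sum(S[perm[i], perm[i + d]] for i in range(n - d))
        J += d * d * Jd
    return J

# Band similarity: adjacent items are similar. The identity order keeps
# similar items close; a shuffled order pushes them apart and costs more.
S = np.array([[1., .8, .1, 0.],
              [.8, 1., .8, .1],
              [.1, .8, 1., .8],
              [0., .1, .8, 1.]])
good = ordering_cost(S, [0, 1, 2, 3])
bad = ordering_cost(S, [0, 2, 1, 3])
```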
Distance-sensitive ordering

Theorem. The continuous optimal solution for the discrete inverse permutation indexes is given by the scaled principal component q_1.

The shifted and scaled inverse permutation indexes:

  q_i = ( π_i^{-1} − (n+1)/2 ) / (n/2),   taking values (1−n)/n, (3−n)/n, ..., (n−1)/n

Relax the restriction on q; allow it to be continuous. The solution for q becomes an eigenvector of

  (D − S) q = λ D q
Re-ordering of Genes and Tissues
(C. Ding, RECOMB 2002)

  r = J(π) / J(random)
  r_{d=1} = J_{d=1}(π) / J_{d=1}(random)

  r = 0.18,   r_{d=1} = 3.39
Webpage Spectral Ranking
Rank webpages from the hyperlink topology.

L: adjacency matrix of the web subgraph.

PageRank (Page & Brin): rank according to the principal eigenvector π (equilibrium distribution):

  π^T T = π^T,   T = 0.8 D_out^{-1} L + 0.2 e e^T

HITS (Kleinberg): rank according to the principal eigenvector of the authority matrix:

  (L^T L) q = λ q

Eigenvectors can be obtained in closed form.
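A minimal sketch of HITS on a toy link graph (graph and variable names are ours): authority scores are the principal eigenvector of L^T L, computed by power iteration.

```python
import numpy as np

# L[i, j] = 1 if page i links to page j. Page 2 has the most inlinks.
L = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0.]])
A = L.T @ L                       # authority matrix

# Power iteration for the principal eigenvector of A.
q = np.ones(len(A))
for _ in range(100):
    q = A @ q
    q /= np.linalg.norm(q)        # normalize to avoid overflow

top = int(np.argmax(q))           # page with the highest authority score
```

On this graph the top authority coincides with the page of largest indegree, illustrating the indegree connection made on the next slide.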
Webpage Spectral Ranking
HITS (Kleinberg) ranking algorithm

Assume the web graph is a fixed-degree-sequence random graph (Aiello, Chung, Lu, 2000).

Theorem. The eigenvalues of L^T L satisfy

  λ_1 > λ_2 > ... > λ_n,  ordered with the degree sequence h_1 d_1 > h_2 d_2 > ...,

with eigenvectors

  u_k = ( d_1 h_1/(λ_k − d_1), d_2 h_2/(λ_k − d_2), ..., d_n h_n/(λ_k − d_n) )^T

⇒ HITS ranking is identical to indegree ranking.

The principal eigenvector u_1 is monotonically decreasing if d_1 > d_2 > d_3 > ...
(Ding, et al, SIAM Review ’04)
Webpage Spectral Ranking
PageRank: weight normalization
HITS: mutual reinforcement

Combine PageRank and HITS, and generalize ⇒ ranking based on a similarity graph:

  S = L^T D_out^{-1} L

Random walks on this similarity graph have the equilibrium distribution

  (d_1, d_2, ..., d_n)^T / 2E

⇒ PageRank ranking is identical to indegree ranking (1st-order approximation, due to the combination of PageRank & HITS).
(Ding, et al, SIGIR’02)
PCA: a Unified Framework for clustering and ordering
• PCA is equivalent to K-means clustering
• Scaled PCA has two optimality properties
  – Distance-sensitive ordering
  – Min-max principle clustering
• SPCA on a contingency table ⇒ Correspondence Analysis
  – Simultaneous ordering of rows and columns
  – Simultaneous clustering of rows and columns
• Resolves open problems
  – Relationship between Correspondence Analysis and PCA (open problem since the 1940s)
  – Relationship between PCA and K-means clustering (open problem since the 1960s)
Spectral Clustering:
a rich spectrum of topics,
a comprehensive framework for learning

A tutorial & review of spectral clustering.
The tutorial website will post all related papers (send your papers).
Acknowledgment
Hongyuan Zha, Penn State
Horst Simon, Lawrence Berkeley Lab
Ming Gu, UC Berkeley
Xiaofeng He, Lawrence Berkeley Lab
Michael Jordan, UC Berkeley
Michael Berry, U. Tennessee, Knoxville
Inderjit Dhillon, UT Austin
George Karypis, U. Minnesota
Haesun Park, U. Minnesota
Work supported by Office of Science, Dept. of Energy