Clustering – Part III: Spectral Clustering
COSC 526 Class 14
Arvind Ramanathan
Computational Science & Engineering Division
Oak Ridge National Laboratory, Oak Ridge
Ph: 865-576-7266
E-mail: [email protected]
Slides inspired by: Andrew Moore (CMU), Jure Leskovec (http://www.mmds.org)
Recall: v is an eigenvector of W with eigenvalue λ, i.e., W v = λ v
• smallest eigenvectors of D − A are largest eigenvectors of A
• smallest eigenvectors of I − W are largest eigenvectors of W
Suppose each y(i) = +1 or −1:
• Then y is a cluster indicator that splits the nodes into two sets
• What is yT(D − A)y?
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
$$
y^\top (D - A)\,y \;=\; \sum_i d_i\,y_i^2 \;-\; \sum_{i,j} a_{ij}\,y_i y_j
\;=\; \tfrac{1}{2}\sum_{i,j} a_{ij}\,\bigl(y_i^2 + y_j^2\bigr) \;-\; \sum_{i,j} a_{ij}\,y_i y_j
\;=\; \tfrac{1}{2}\sum_{i,j} a_{ij}\,(y_i - y_j)^2
$$

With y(i) ∈ {+1, −1}, every edge cut by y contributes (y_i − y_j)² = 4 (counted once in each direction), so

$$y^\top (D - A)\,y \;=\; 4 \times \text{size of CUT}(y)$$

Similarly, yT(I − W)y measures the size of NCUT(y).
NCUT: roughly minimize ratio of transitions between classes vs transitions within classes
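As a quick sanity check of the cut identity, here is a small numpy example (my own toy graph, not from the slides):

```python
import numpy as np

# Undirected 4-node graph with edges (0,1), (0,2), (1,2), (2,3)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
D = np.diag(A.sum(axis=1))                 # degree matrix
y = np.array([1, 1, -1, -1])               # indicator: {0,1} vs {2,3}

cut_edges = A[y == 1][:, y == -1].sum()    # edges crossing the partition: 2
print(y @ (D - A) @ y)                     # 8
print(4 * cut_edges)                       # 8 = 4 * CUT(y)
```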
So far…
• How to define a “good” partition of a graph?
– Minimize a given graph cut criterion
• How to efficiently identify such a partition?
– Approximate using information provided by the eigenvalues and eigenvectors of a graph
• Spectral Clustering
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Spectral Clustering Algorithms
• Three basic stages:
– 1) Pre-processing
• Construct a matrix representation of the graph
– 2) Decomposition
• Compute eigenvalues and eigenvectors of the matrix
• Map each point to a lower-dimensional representation based on one or more eigenvectors
– 3) Grouping
• Assign points to two or more clusters, based on the new representation (a minimal sketch follows below)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
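The three stages translate almost line-for-line into numpy/scipy. A minimal sketch for the two-cluster case (the function name, the unnormalized Laplacian, and the sign-based split are my choices, not prescribed by the slide):

```python
import numpy as np
from scipy.linalg import eigh

def spectral_partition(A):
    # 1) Pre-processing: matrix representation of the graph
    D = np.diag(A.sum(axis=1))
    L = D - A
    # 2) Decomposition: keep the eigenvector of the second-smallest
    #    eigenvalue (the Fiedler vector) as a 1-d embedding of the nodes
    _, vecs = eigh(L)
    x2 = vecs[:, 1]
    # 3) Grouping: split the nodes on the sign of their x2 component
    return x2 >= 0
```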
Spectral Partitioning Algorithm
• 1) Pre-processing:
– Build the Laplacian matrix L of the graph:

     1   2   3   4   5   6
1    3  -1  -1   0  -1   0
2   -1   2  -1   0   0   0
3   -1  -1   3  -1   0   0
4    0   0  -1   3  -1  -1
5   -1   0   0  -1   3  -1
6    0   0   0  -1  -1   2

• 2) Decomposition:
– Find eigenvalues λ and eigenvectors x of the matrix L
[Slide shows the full eigen-decomposition Λ, X; the eigenvalues are λ = 0.0, 1.0, 3.0, 3.0, 4.0, 5.0]
– Map vertices to the corresponding components of x2, the eigenvector of the second-smallest eigenvalue λ2 = 1.0:

node:  1     2     3     4     5     6
x2:    0.3   0.6   0.3  −0.3  −0.3  −0.6

How do we now find the clusters?
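To reproduce the decomposition step, here is a minimal numpy check (the Laplacian is the one from the slide; eigenvector signs may be flipped depending on the solver):

```python
import numpy as np

# Laplacian of the 6-node example graph from the slide
L = np.array([[ 3, -1, -1,  0, -1,  0],
              [-1,  2, -1,  0,  0,  0],
              [-1, -1,  3, -1,  0,  0],
              [ 0,  0, -1,  3, -1, -1],
              [-1,  0,  0, -1,  3, -1],
              [ 0,  0,  0, -1, -1,  2]])

vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
print(np.round(vals, 1))             # [0. 1. 3. 3. 4. 5.]
print(np.round(vecs[:, 1], 1))       # x2 ~ [0.3 0.6 0.3 -0.3 -0.3 -0.6] (up to sign)
```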
Spectral Partitioning
• 3) Grouping:
– Sort components of the reduced 1-dimensional vector
– Identify clusters by splitting the sorted vector in two
• How to choose a splitting point?
– Naïve approaches:
• Split at 0 or at the median value
– More expensive approaches:
• Attempt to minimize the normalized cut in 1 dimension (sweep over the ordering of nodes induced by the eigenvector; a sketch follows below)
[Slide figure: components of x2 with the split at 0]
Split at 0:
– Cluster A (positive components): nodes 1, 2, 3 with x2 = 0.3, 0.6, 0.3
– Cluster B (negative components): nodes 4, 5, 6 with x2 = −0.3, −0.3, −0.6
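The “more expensive” sweep approach can be sketched as follows (my own construction from the slide’s description; assumes a connected graph so cluster volumes are nonzero):

```python
import numpy as np

def sweep_cut(A, x2):
    """Try every split of the x2-ordering; return the one minimizing NCUT."""
    order = np.argsort(x2)
    best_ncut, best_split = np.inf, 1
    for i in range(1, len(order)):
        S, T = order[:i], order[i:]
        cut = A[np.ix_(S, T)].sum()                 # weight crossing the split
        ncut = cut / A[S].sum() + cut / A[T].sum()  # normalized cut of this split
        if ncut < best_ncut:
            best_ncut, best_split = ncut, i
    labels = np.zeros(len(x2), dtype=int)
    labels[order[best_split:]] = 1
    return labels, best_ncut
```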
Example: Spectral Partitioning
[Plot: value of x2 vs. rank in x2]
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Example: Spectral Partitioning
[Plots: value of x2 vs. rank in x2; components of x2]
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Example: Spectral Partitioning
[Plots: components of x1; components of x3]
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
k-Way Spectral Clustering
• How do we partition a graph into k clusters?
• Two basic approaches:
– Recursive bi-partitioning [Hagen et al., ’92]
• Recursively apply bi-partitioning algorithm in a hierarchical divisive manner
• Disadvantages: Inefficient, unstable
– Cluster multiple eigenvectors [Shi-Malik, ’00]
• Build a reduced space from multiple eigenvectors
• Commonly used in recent papers
• A preferable approach (a sketch follows below)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
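A sketch of the multiple-eigenvector approach in the spirit of Shi-Malik (the unnormalized Laplacian and sklearn’s KMeans are my simplifications here; Shi-Malik use the normalized Laplacian):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def k_way_spectral(A, k):
    D = np.diag(A.sum(axis=1))
    L = D - A
    # Reduced space: eigenvectors of the k smallest eigenvalues of L
    _, vecs = eigh(L, subset_by_index=[0, k - 1])
    # Group the points with k-means on the k-dimensional embedding
    return KMeans(n_clusters=k, n_init=10).fit_predict(vecs)
```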
Why use multiple eigenvectors?
• Approximates the optimal cut [Shi-Malik, ’00]
– Can be used to approximate the optimal k-way normalized cut
• Emphasizes cohesive clusters
– Increases the unevenness in the distribution of the data
– Associations between similar points are amplified, associations between dissimilar points are attenuated
– The data begins to “approximate a clustering”
• Well-separated space
– Transforms data to a new “embedded space” consisting of k orthogonal basis vectors
• Multiple eigenvectors prevent instability due to information loss
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Recall: v is an eigenvector of W with eigenvalue λ, i.e., W v = λ v
• smallest eigenvectors of D − A are largest eigenvectors of A
• smallest eigenvectors of I − W are largest eigenvectors of W
Suppose each y(i) = +1 or −1:
• Then y is a cluster indicator that cuts the nodes into two sets
• What is yT(D − A)y? The cost of the graph cut defined by y
• What is yT(I − W)y? Also the cost of a graph cut defined by y
• How do we minimize it?
• It turns out that to minimize yT X y / (yT y) we take the eigenvector of X with the smallest eigenvalue
• But this eigenvector will not have ±1 entries, so it is a “relaxed” solution
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
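The relaxation step is the standard Rayleigh-quotient fact (not spelled out on the slide):

$$\min_{y \neq 0} \frac{y^\top X y}{y^\top y} \;=\; \lambda_{\min}(X), \qquad \text{attained at the corresponding eigenvector } v_{\min}$$

Dropping the constraint y(i) ∈ {+1, −1} thus turns the discrete cut problem into an eigenvector computation, and rounding the relaxed solution (e.g., by sign) recovers a clustering. For the Laplacian D − A the minimizer is the trivial constant vector with λ = 0, which is why the earlier slides use the second-smallest eigenvector x2.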
Cluster Validation
• For supervised classification we had ground-truth labels
– which meant we could measure how good our training and testing errors were
– Metrics: Accuracy, Precision, Recall
• For clustering:
– How do we measure the “goodness” of the resulting clusters?
Clustering random data (overfitting)
If you ask a clustering algorithm to find clusters, it will find some
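The point is easy to demonstrate; a quick sketch (my own example) in which k-means dutifully reports clusters in uniform noise:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 2))              # no real cluster structure
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)
print(np.bincount(labels))                  # three "clusters" reported anyway
```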
Different aspects of validating clusters
• Determine the clustering tendency of a set of data, i.e., whether non-random structure actually exists in the data (e.g., to avoid overfitting)
• External Validation: Compare the results of a cluster analysis to externally known class labels (ground truth).
• Internal Validation: Evaluating how well the results of a cluster analysis fit the data without reference to external information.
• Compare clusterings to determine which is better.
• Determining the ‘correct’ number of clusters.
Measures of cluster validity
• External Index: Used to measure the extent to which cluster labels match externally supplied class labels.
– Entropy, Purity, Rand Index
• Internal Index: Used to measure the goodness of a clustering structure without respect to external information.
– Sum of Squared Error (SSE), Silhouette coefficient
• Relative Index: Used to compare two different clusterings or clusters.
– Often an external or internal index is used for this function, e.g., SSE or entropy
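A sketch of computing some of the listed indices (the sklearn metric functions named here are standard; purity is computed by hand since sklearn has no built-in for it, and the variable names are mine):

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, silhouette_score

def purity(true_labels, pred_labels):
    """Fraction of points assigned to the majority true class of their cluster.
    Assumes integer-coded labels."""
    total = 0
    for c in np.unique(pred_labels):
        members = true_labels[pred_labels == c]
        total += np.bincount(members).max()
    return total / len(true_labels)

# External indices need ground truth y_true; internal ones only the data X:
#   adjusted_rand_score(y_true, y_pred)   # external: (adjusted) Rand index
#   silhouette_score(X, y_pred)           # internal: silhouette coefficient
```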
Measuring Cluster Validity with Correlation
• Proximity matrix vs. incidence matrix:
– The incidence matrix K has K_ij = 1 if points i and j belong to the same cluster, and 0 otherwise
• Compute the correlation between the two matrices:
– Only n(n−1)/2 entries need to be computed, since both matrices are symmetric
– A high (in magnitude) correlation indicates that points in the same cluster are close to each other
• Not well suited for density-based clusterings
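A sketch of this check as I read the slide (my own construction; note that with a distance-based proximity matrix, a good clustering shows a strongly negative correlation, since same-cluster pairs have small distances):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def cluster_correlation(X, labels):
    prox = pdist(X)                                # n(n-1)/2 pairwise distances
    same = (labels[:, None] == labels[None, :]).astype(float)
    incidence = squareform(same, checks=False)     # same condensed pair order
    return np.corrcoef(prox, incidence)[0, 1]
```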
Another approach: use similarity matrix for cluster validation
Internal Measures: SSE
• SSE is also a good measure of how good the clustering is
– Lower SSE → better clustering
• Can be used to estimate the number of clusters (a sketch follows below)
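A minimal sketch of that estimate (my own; k-means `inertia_` is exactly the SSE):

```python
from sklearn.cluster import KMeans

def sse_curve(X, k_max=10):
    """SSE for k = 1..k_max; pick k at the 'elbow' where the drop levels off."""
    return [KMeans(n_clusters=k, n_init=10).fit(X).inertia_
            for k in range(1, k_max + 1)]
```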
More on Clustering a little later…
• We will discuss other forms of clustering in the following classes
• Next class:
– Please bring your brief write-up on the two papers
– We will discuss frequent itemset mining and a few other aspects of clustering
– Move on to Dimensionality Reduction
Summary
• We saw spectral clustering techniques:
– only a broad overview
– more details next class
• Speeding up spectral clustering can be challenging