
Chameleon Clustering

Apr 03, 2018

Transcript
Page 1: Chameleon Clustering


Chameleon: A Hierarchical Clustering Algorithm Using Dynamic Modeling

By George Karypis, Eui-Hong Han, Vipin Kumar

and not by Prashant Thiruvengadachari

Page 2: Chameleon Clustering


Existing Algorithms

K-means and PAM

These algorithms assign K representative points to the clusters and try to form clusters based on a distance measure.
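As a point of reference, here is a minimal, illustrative sketch of that idea in Python (all names are mine; this is the generic K-means loop, not code from the paper):

import random

def kmeans(points, k, iters=10):
    """Toy K-means: assign each point to the nearest of k centers, then re-center."""
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign the point to its closest representative (squared Euclidean distance)
            nearest = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        # recompute each representative as the mean of its cluster
        centers = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return clusters

# e.g. kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2) groups the two tight pairs.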

Page 3: Chameleon Clustering


More algorithms

Other algorithms include CURE, ROCK, CLARANS, etc.

CURE takes into account the distance between representatives.

ROCK takes into account inter-cluster aggregate connectivity.

Page 4: Chameleon Clustering


Chameleon

Two-phase approach

Phase I uses a graph partitioning algorithm to divide the data set into a set of individual clusters.

Phase II uses an agglomerative hierarchical clustering algorithm to merge the clusters.

Page 5: Chameleon Clustering


So, basically..

Page 6: Chameleon Clustering


Why not stop with Phase I? We've got the clusters, haven't we?

• Chameleon (Phase II) takes into account:
• Inter-connectivity
• Relative closeness

Hence, Chameleon takes into account features intrinsic to a cluster.

Page 7: Chameleon Clustering


Constructing a sparse graph

Using KNN

Data points that are far away are completely avoided by the algorithm (reducing the noise in the dataset).

The k-NN graph captures the concept of neighbourhood dynamically by taking into account the density of the region.
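A minimal sketch of that construction, assuming Euclidean distance and a simple distance-derived similarity weight (both choices are mine, not fixed by the slides):

from math import dist

def build_knn_graph(points, k):
    """Return the k-NN graph as {node: {neighbour: edge_weight}}; heavier edges mean closer points."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        # the k nearest other points by Euclidean distance
        neighbours = sorted((j for j in range(len(points)) if j != i),
                            key=lambda j: dist(p, points[j]))[:k]
        for j in neighbours:
            w = 1.0 / (1.0 + dist(p, points[j]))
            graph[i][j] = w
            graph[j][i] = w  # keep the graph symmetric
    return graph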

Page 8: Chameleon Clustering


What do you do with the graph?

Partition the k-NN graph such that the edge cut is minimized.

Reason: since the edge cut represents the similarity between points, a smaller edge cut means less similarity across the cut.

Multi-level graph partitioning algorithms are used to partition the graph (the hMETIS library).
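The partitioning itself is delegated to hMETIS in the paper; the sketch below only shows the quantity being minimized, the edge cut, on the graph representation from the k-NN sketch above:

def edge_cut(graph, part_a, part_b):
    """Sum of the weights of the edges that cross between the two sides of a partition."""
    part_b = set(part_b)
    return sum(w for u in part_a for v, w in graph[u].items() if v in part_b)

# e.g. edge_cut(g, {0, 1, 2}, {3, 4}) is what a good bisection of g keeps small.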

Page 9: Chameleon Clustering


Example:

Page 10: Chameleon Clustering


Cluster Similarity

Models cluster similarity based on the relative inter-connectivity and relative closeness of the clusters.

Page 11: Chameleon Clustering


Relative Inter-Connectivity

For clusters Ci and Cj:

RI(Ci, Cj) = AbsoluteIC(Ci, Cj) / ((InternalIC(Ci) + InternalIC(Cj)) / 2)

where AbsoluteIC(Ci, Cj) = sum of the weights of the edges that connect Ci with Cj,

and InternalIC(Ci) = weighted sum of the edges that partition the cluster into roughly equal parts.
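A hedged sketch of that ratio, reusing edge_cut from above; InternalIC is approximated here with an arbitrary half/half split of each cluster, whereas the paper uses a proper min-cut bisection:

def internal_ic(graph, cluster):
    """Crude stand-in for internal inter-connectivity: the edge cut of a half/half split."""
    nodes = sorted(cluster)
    half = len(nodes) // 2
    return edge_cut(graph, nodes[:half], nodes[half:])

def relative_interconnectivity(graph, ci, cj):
    """RI(Ci, Cj) = AbsoluteIC(Ci, Cj) / ((InternalIC(Ci) + InternalIC(Cj)) / 2)."""
    mean_internal = (internal_ic(graph, ci) + internal_ic(graph, cj)) / 2
    return edge_cut(graph, ci, cj) / (mean_internal or 1.0)  # guard against a zero denominator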

Page 12: Chameleon Clustering


Relative Closeness

Absolute closeness normalized with respect to the internal closeness of the two clusters.

Absolute closeness is the average similarity between the points in Ci that are connected to the points in Cj,

i.e. the average weight of the edges from Ci to Cj.

Page 13: Chameleon Clustering


Internal Closeness…

Internal closeness of a cluster is the average of the weights of the edges in the cluster.
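Putting the last two slides together, a sketch of relative closeness under the simplified definitions given here (the paper additionally weights each cluster's internal closeness by its size; this version just averages the two):

def average_weight(graph, nodes_a, nodes_b):
    """Average weight of the edges running between two node sets (0.0 if there are none)."""
    nodes_b = set(nodes_b)
    weights = [w for u in nodes_a for v, w in graph[u].items() if v in nodes_b]
    return sum(weights) / len(weights) if weights else 0.0

def internal_closeness(graph, cluster):
    """Average weight of the edges whose endpoints both lie inside the cluster."""
    return average_weight(graph, cluster, cluster)

def relative_closeness(graph, ci, cj):
    """Absolute closeness of Ci and Cj normalized by their internal closeness."""
    mean_internal = (internal_closeness(graph, ci) + internal_closeness(graph, cj)) / 2
    return average_weight(graph, ci, cj) / (mean_internal or 1.0)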

Page 14: Chameleon Clustering


So, which clusters do we merge?

So far, we have:

Relative Inter-Connectivity measure.

Relative Closeness measure.

Using them,

Page 15: Chameleon Clustering


Merging the clusters…

If the relative inter-connectivity measure and the relative closeness measure are the same, choose inter-connectivity.

You can also use: RI(Ci, Cj) ≥ T(RI) and RC(Ci, Cj) ≥ T(RC).

This allows multiple clusters to merge at each level (a sketch of the thresholded test follows below).
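A minimal sketch of that thresholded merge test, reusing relative_interconnectivity and relative_closeness from the earlier sketches; the thresholds t_ri and t_rc are free parameters, not values taken from the paper:

def should_merge(graph, ci, cj, t_ri=1.0, t_rc=1.0):
    """Merge Ci and Cj only if both measures clear their thresholds."""
    return (relative_interconnectivity(graph, ci, cj) >= t_ri and
            relative_closeness(graph, ci, cj) >= t_rc)

def merge_level(graph, clusters, t_ri=1.0, t_rc=1.0):
    """One level of merging: keep combining qualifying pairs, so several clusters can merge at the same level."""
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if should_merge(graph, clusters[i], clusters[j], t_ri, t_rc):
                    clusters[i] = clusters[i] | clusters[j]   # clusters are sets of node indices
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters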

Page 16: Chameleon Clustering


Good points about the paper:

Nice description of the working of the system.

Gives a note of existing algorithms and why Chameleon is better.

Not specific to a particular domain.

Page 17: Chameleon Clustering


Yucky and reasonably yucky parts…

Not much information given about the Phase-I part of the paper – graph properties?

Finding the complexity of the algorithm: O(nm + n log n + m^2 log m).

Different domains require different measures for connectivity and closeness…

Page 18: Chameleon Clustering


Questions?