TOWARDS HIERARCHICAL CLUSTERING

Mark Sh. Levin, Inst. for Inform. Transm. Problems, Russian Acad. of Sci.

Email: mslevin@acm.org Http://www.iitp.ru/mslevin/

CSR’2007, Ural State University, Ekaterinburg, Russia, Sept. 4, 2007

PLAN:
1. Basic agglomerative algorithm for hierarchical clustering
2. Multicriteria decision making (DM) approach to proximity of objects
3. Integration of objects into several groups/clusters (i.e., clustering with intersection): algorithms & applications
4. Towards resultant performance (quality of results)
5. Conclusion


Hierarchical clustering: agglomerative algorithm

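Matrix Z (objects/alternatives A1, …, An in rows; criteria/characteristics C1, …, Cj, …, Cm in columns):

$$Z = \begin{pmatrix} z_{11} & \cdots & z_{1j} & \cdots & z_{1m} \\ \vdots & & \vdots & & \vdots \\ z_{i1} & \cdots & z_{ij} & \cdots & z_{im} \\ \vdots & & \vdots & & \vdots \\ z_{n1} & \cdots & z_{nj} & \cdots & z_{nm} \end{pmatrix}$$

Matrix D (pair “distances” between objects A1, …, Ai, …, An; the diagonal is zero):

$$D = \begin{pmatrix} 0 & \cdots & d_{1i} & \cdots & d_{1n} \\ \vdots & & \vdots & & \vdots \\ d_{i1} & \cdots & 0 & \cdots & d_{in} \\ \vdots & & \vdots & & \vdots \\ d_{n1} & \cdots & d_{ni} & \cdots & 0 \end{pmatrix}$$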


Hierarchical clustering: agglomerative algorithm

Matrix D:

$$d_{il} = \sqrt{\sum_{j=1}^{m} \left( z_{ij} - z_{lj} \right)^2}$$

Scale for D: from 0 (min) to max.
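The formula for d_il translates directly into code. The sketch below assumes each object is a Python list of m numeric criterion values (one row of matrix Z); the function name `distance_matrix` and the sample data are illustrative, not taken from the talk.

```python
import math


def distance_matrix(Z):
    """Return the n x n matrix D of Euclidean "distances" d_il between rows of Z.

    Z is a list of n objects, each a list of m criterion values (matrix Z).
    """
    n = len(Z)
    D = [[0.0] * n for _ in range(n)]  # zero diagonal: d_ii = 0
    for i in range(n):
        for l in range(i + 1, n):
            d = math.sqrt(sum((zi - zl) ** 2 for zi, zl in zip(Z[i], Z[l])))
            D[i][l] = D[l][i] = d  # D is symmetric
    return D


Z = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = distance_matrix(Z)
print(D[0][1], D[0][2], D[1][2])  # 5.0 10.0 5.0
```

Only the upper triangle is computed, since D is symmetric.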


Hierarchical clustering: agglomerative algorithm

Agglomerative algorithm:

Stage 1. Computing matrix D = | dil | (pair “distances”).

Stage 2. Identification of the smallest pair “distance” (i.e., the minimal off-diagonal element of matrix D) and integration of the corresponding objects (Ax, Ay) into a new joint (integrated) object A = Ax*Ay.

Stage 3. Stopping the process, or re-computing matrix D and GOTO Stage 2.
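Stage 2 reduces to a scan of the upper triangle of D that skips the zero diagonal. A minimal sketch; the name `closest_pair` is hypothetical, and breaking ties by keeping the first pair found is an assumption the slides do not address.

```python
def closest_pair(D):
    """Stage 2: return (x, y, d) for the minimal off-diagonal element of D.

    The diagonal is skipped because d_ii = 0 for every object.
    Ties are broken by keeping the first pair found (an assumption).
    """
    n = len(D)
    best = None
    for x in range(n):
        for y in range(x + 1, n):
            if best is None or D[x][y] < best[2]:
                best = (x, y, D[x][y])
    return best


D = [
    [0.0, 5.0, 10.0],
    [5.0, 0.0, 2.0],
    [10.0, 2.0, 0.0],
]
print(closest_pair(D))  # (1, 2, 2.0)
```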


Hierarchical clustering: agglomerative algorithm

Pair of objects Ax and Ay:

Ax = ( zx1 , … , zxj , … , zxm ), Ay = ( zy1 , … , zyj , … , zym )

Integrated object A = ( Ax * Ay ), computed criterion by criterion:

$$z_j(A) = \frac{z_{xj} + z_{yj}}{2}, \quad j = 1, \dots, m$$
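The integration rule is a criterion-wise mean of the two vectors. A one-function sketch (the name `integrate` is illustrative):

```python
def integrate(ax, ay):
    """Integrated object A = Ax * Ay: z_j(A) = (z_xj + z_yj) / 2 for each criterion j."""
    return [(zx + zy) / 2 for zx, zy in zip(ax, ay)]


print(integrate([1.0, 4.0, 6.0], [3.0, 0.0, 6.0]))  # [2.0, 2.0, 6.0]
```

As stated, the rule weights Ax and Ay equally even when one of them is already an integrated object, so when object 1 joins (3*4) it contributes half of the new vector. This midpoint update is the one used by median (WPGMC) linkage; a size-weighted centroid is a common alternative if that bias matters.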


Hierarchical clustering: agglomerative algorithm

Stage 0: 1 2 3 4 5 6 7
Stage 1: 1 2 (3*4) 5 6 7
Stage 2: 1 2 (3*4) 5 (6*7)
Stage 3: 2 (1*3*4) 5 (6*7)
Stage 4: 2 (1*3*4) (5*6*7)
. . .
Stage 6: (1*2*3*4*5*6*7)
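Putting Stages 1 to 3 together gives the loop below. It is a minimal sketch assuming Euclidean “distances”, the averaging rule for integrated objects, and “stop when one cluster remains” as the stopping rule (the slides leave the Stage 3 stopping condition open). The names `agglomerate` and `fmt` and the one-criterion sample data are illustrative.

```python
import math


def fmt(members):
    """Render a cluster in the slides' notation: 3 or (3*4)."""
    if len(members) == 1:
        return str(members[0])
    return "(" + "*".join(str(k) for k in members) + ")"


def agglomerate(Z):
    """Return the partition at each stage, Stage 0 .. Stage n-1.

    Z: list of n objects, each a list of m criterion values; object k is labelled k+1.
    """
    objs = [list(map(float, z)) for z in Z]
    clusters = [[k + 1] for k in range(len(Z))]
    stages = [[fmt(c) for c in clusters]]
    while len(objs) > 1:
        # Stages 1 and 3: (re)compute pair "distances" and take the minimal one
        x, y = min(
            ((a, b) for a in range(len(objs)) for b in range(a + 1, len(objs))),
            key=lambda p: math.dist(objs[p[0]], objs[p[1]]),
        )
        # Stage 2: integrate Ax and Ay into A = Ax * Ay (criterion-wise mean)
        objs[x] = [(u + v) / 2 for u, v in zip(objs[x], objs[y])]
        clusters[x] = sorted(clusters[x] + clusters[y])
        del objs[y], clusters[y]
        stages.append([fmt(c) for c in clusters])
    return stages


for s, partition in enumerate(agglomerate([[0.0], [1.0], [5.0], [7.0]])):
    print(f"Stage {s}:", " ".join(partition))
# Stage 0: 1 2 3 4
# Stage 1: (1*2) 3 4
# Stage 2: (1*2) (3*4)
# Stage 3: (1*2*3*4)
```

Recomputing every pair “distance” at each stage costs about n² · m operations per stage, and there are n - 1 stages.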


Hierarchical clustering: agglomerative algorithm

ILLUSTRATIVE EXAMPLE (figure): clusters F1, F2, F3, F4, F5, F6.


Hierarchical clustering: agglomerative algorithm

First, complexity of the agglomerative algorithm:
1. Number of stages (each stage performs one integration): $n-1$ stages.
2. Each stage: (a) computing “distances” ($n^2 m$ operations).

THUS: operations $O(m n^3)$, since $(n-1) \cdot n^2 m = O(m n^3)$; memory $O(n(n+m))$ for matrix Z ($n \times m$) and matrix D ($n \times n$).

Second, the result is a TREE-LIKE STRUCTURE.


Hierarchical clustering: IMPROVEMENTS (to do better)

Question 1: What can we do better in the algorithm?

Question 2: What is needed in practice (e.g., in applications)? What can we do for applications?


Hierarchical clustering: IMPROVEMENTS (to do better)

Question 1: What can we do better?

1. Computing: pair “distance” (pair proximity).
Usage of more “correct” approaches from multicriteria decision making, e.g., identification of Pareto layers and usage of an ordinal scale for pair proximity.

2. Complexity: decrease the number of stages.
Integration of several pairs of objects at each stage.

Usage of an ordinal scale: divide the interval [0, max{dxy}] into intervals 0, 1, …, k to get an ordinal scale for pair proximity.

Example: pairs of objects (a,b), (u,v), (p,q), (g,h), with “distances” dab, duv, dpq, dgh placed on this scale.

RESULT (ordinal proximity): dab = 0, duv = 0, dpq = 1, dgh = 1.
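The slides do not say how [0, max{dxy}] is divided. The sketch below assumes k + 1 equal-width intervals numbered 0..k; with the hypothetical distances chosen here and k = 1, it reproduces the RESULT (0, 0, 1, 1), and the pairs at the lowest level, (a,b) and (u,v), are the candidates to integrate within a single stage. The function name `ordinal_levels` and the numbers are illustrative.

```python
def ordinal_levels(distances, k):
    """Map each pair "distance" to an ordinal level 0..k.

    ASSUMPTION: [0, max d] is split into k + 1 equal-width intervals;
    the slides do not specify how the interval is divided.
    """
    dmax = max(distances.values())
    if dmax == 0:  # all objects coincide: everything is at level 0
        return {pair: 0 for pair in distances}
    width = dmax / (k + 1)
    # the maximal distance falls on the upper boundary, so clamp it to level k
    return {pair: min(int(d / width), k) for pair, d in distances.items()}


dist = {("a", "b"): 0.5, ("u", "v"): 1.2, ("p", "q"): 3.1, ("g", "h"): 4.0}
levels = ordinal_levels(dist, k=1)
print(levels)  # {('a', 'b'): 0, ('u', 'v'): 0, ('p', 'q'): 1, ('g', 'h'): 1}
lowest = min(levels.values())
print([pair for pair, lev in levels.items() if lev == lowest])  # [('a', 'b'), ('u', 'v')]
```

If two pairs at the same level share an object, some conflict rule is needed before integrating both at once; the slides do not specify one.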


Hierarchical clustering: IMPROVEMENTS (to do better)

1. Computing: pair “distance” (pair proximity).
Usage of more “correct” approaches from multicriteria decision making, e.g., identification of Pareto layers and usage of an ordinal scale for pair proximity.

Objects: {A1, … , Ai, … , An}

Ax -> (zx1, … , zxj, … , zxm), Ay -> (zy1, … , zyj, … , zym)

Vector of “differences” for Ax, Ay: ( (zx1 - zy1), … , (zxj - zyj), … , (zxm - zym) )

Space of the vectors => ordinal scale & ordinal proximity
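Each pair of objects becomes a point in the space of difference vectors. A minimal sketch, assuming objects are stored in a dict of criterion tuples and that absolute differences |zxj - zyj| are used, so that equal objects map to the ideal point (0, …, 0); the slides show signed differences, so the absolute value is an assumption, as are the name `difference_vectors` and the sample data.

```python
from itertools import combinations


def difference_vectors(objects):
    """Vector of "differences" for every pair (Ax, Ay).

    ASSUMPTION: absolute values |z_xj - z_yj| are used, so equal objects map
    to the ideal point (0, ..., 0) and smaller is better on every coordinate.
    """
    return {
        (x, y): tuple(abs(a - b) for a, b in zip(objects[x], objects[y]))
        for x, y in combinations(sorted(objects), 2)
    }


objects = {"A1": (1, 5), "A2": (2, 5), "A3": (4, 1)}
vecs = difference_vectors(objects)
print(vecs)  # {('A1', 'A2'): (1, 0), ('A1', 'A3'): (3, 4), ('A2', 'A3'): (2, 4)}
```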

Figure: the space of difference vectors (axes C1 and C2) with the ideal point (equal objects), the Pareto-effective layer (1), and layer 2.
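The layers in the figure can be computed by repeatedly peeling off the non-dominated difference vectors, with the layer index serving as an ordinal proximity (1 = closest to the ideal point). This is a standard non-dominated sorting sketch, assuming minimisation of every coordinate of (absolute) difference vectors; it is not necessarily the author's exact procedure, and `pareto_layers` and the sample vectors are hypothetical.

```python
def pareto_layers(vectors):
    """Assign each pair the index of its Pareto layer toward the ideal point 0.

    vectors: dict mapping a pair of objects to its difference vector.
    Layer 1 holds the vectors not dominated by any other (minimisation).
    """
    def dominates(u, v):
        # u is no worse than v on every criterion and strictly better on one
        return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

    remaining = dict(vectors)
    proximity = {}
    layer = 1
    while remaining:
        front = [p for p, v in remaining.items()
                 if not any(dominates(w, v) for q, w in remaining.items() if q != p)]
        for p in front:
            proximity[p] = layer
            del remaining[p]
        layer += 1
    return proximity


vecs = {("A1", "A2"): (1, 0), ("A1", "A3"): (3, 4), ("A2", "A3"): (2, 4), ("A1", "A4"): (0, 2)}
print(pareto_layers(vecs))
# {('A1', 'A2'): 1, ('A1', 'A4'): 1, ('A2', 'A3'): 2, ('A1', 'A3'): 3}
```

Identical vectors do not dominate each other, so they land in the same layer; the loop always terminates because every non-empty finite set has at least one non-dominated vector.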

Hierarchical clustering: IMPROVEMENTS (practice)

Question 2: What is needed in practice (e.g., in applications)? What can we do for applications?

Integration of objects into several groups (clusters) to obtain a richer resultant structure (tree => hierarchy, i.e., clusters with intersection).

Examples of applied domains:
1. Engineering: structures of complex systems
2. CS: structures of software/hardware
3. Communication networks (topology)
4. Biology
5. Others


Hierarchical clustering: IMPROVEMENTS (practice)


Clustering with intersection (figure): clusters F1, F2, F3, F4, F5, F6.


Hierarchical clustering: IMPROVEMENTS (practice)

Stage 0: 1 2 3 4 5 6 7
Stage 1: 1 (2*3) (3*4) (5*6) (6*7)
Stage 2: 1 (2*3*4) (3*4*5*6) (6*7)
Stage 3: (1*2*3*4) (3*4*5*6*7)
Stage 4: (1*2*3*4*5*6*7)


Hierarchical clustering: IMPROVEMENTS (practice)

Resultant structure (figure): the hierarchy over objects 1, …, 7 built in Stages 0 to 3, with the root (1*2*3*4*5*6*7).


Hierarchical clustering: IMPROVEMENTS (practice)

Example from biology (evolution): the traditional evolution process as a tree (figure).

Example from biology (evolution): the evolution process as a hierarchical structure (figure).


Hierarchical clustering: IMPROVEMENTS (practice)

Algorithm 1 (the number of inclusions for each object is not limited):
(i) initial set of objects -> vertices;
(ii) “small” proximity -> edges.
Thus we obtain a graph. Problem: to find cliques in the graph (an NP-hard problem).

Algorithm 2 (the number of inclusions for each object is limited by t, e.g., t = 2, 3, 4). Here the complexity is polynomial.
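Algorithm 1 can be sketched with a threshold graph and a clique enumerator. Two assumptions are made here: “small” proximity is modelled by a hypothetical threshold `eps`, and the Bron–Kerbosch algorithm with pivoting (a standard method for listing all maximal cliques, swapped in here, not taken from the talk) does the search; its worst case is exponential, in line with the NP-hardness noted above. Algorithm 2 is not sketched, since the slides give no details of how the limit t is enforced.

```python
def threshold_graph(D, eps):
    """Vertices = objects; edge (i, l) when the proximity d_il is "small" (<= eps).

    ASSUMPTION: "small" is modelled by a hypothetical threshold eps.
    """
    n = len(D)
    return {i: {l for l in range(n) if l != i and D[i][l] <= eps} for i in range(n)}


def maximal_cliques(adj):
    """List all maximal cliques (Bron-Kerbosch with pivoting; worst case exponential)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))  # r cannot be extended: a maximal clique
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


pts = [0.0, 1.0, 2.0, 10.0]  # one criterion per object
D = [[abs(a - b) for b in pts] for a in pts]
print(sorted(maximal_cliques(threshold_graph(D, eps=1.5))))  # [[0, 1], [1, 2], [3]]
```

Object 1 belongs to two clusters here, which is exactly the “clustering with intersection” that a tree cannot represent.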


Hierarchical clustering: performance (i.e., quality)

Performance (i.e., quality) of clustering procedures:
1. Issues of complexity
2. Quality of results (??). Some traditional approaches:
(a) computing a clustering quality measure, e.g., InterCluster Distance / IntraCluster Distance;
(b) coverage, diversity.

Our case: a research procedure (for investigation and problem structuring).
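One common way to compute the InterCluster / IntraCluster ratio is sketched below, assuming both terms are plain averages of d_il over object pairs (centroid distances or min/max linkages are equally common choices). The function name `inter_intra_ratio`, the cluster labels, and the data are hypothetical. It handles only disjoint clusters (one label per object); clusters with intersection would need a different bookkeeping of pairs.

```python
from itertools import combinations


def inter_intra_ratio(D, labels):
    """Mean inter-cluster distance divided by mean intra-cluster distance.

    ASSUMPTION: both terms are plain averages over object pairs; larger
    values mean better separated, more compact clusters.
    """
    inter, intra = [], []
    for i, l in combinations(range(len(labels)), 2):
        (intra if labels[i] == labels[l] else inter).append(D[i][l])
    if not inter or not intra:
        raise ValueError("need at least two clusters and one non-singleton cluster")
    return (sum(inter) / len(inter)) / (sum(intra) / len(intra))


pts = [0.0, 1.0, 10.0, 11.0]
D = [[abs(a - b) for b in pts] for a in pts]
print(inter_intra_ratio(D, ["F1", "F1", "F2", "F2"]))  # 10.0
```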


Hierarchical clustering: performance (i.e., quality)

Decision Making Paradigm (stages) by Herbert A. Simon:
1. Analysis of an applied problem (to understand the problem: main contradictions, etc.)
2. Structuring the problem:
2.1. Generation of alternatives
2.2. Design of criteria
2.3. Design of scales for assessment of alternatives upon criteria
3. Evaluation of alternatives upon criteria
4. Selection of the best alternative(s)
5. Analysis of results

Basic DM problems: choice, ranking, …


Hierarchical clustering: performance (i.e., quality)

FOR CLUSTERING:
1. Analysis of an applied problem (to understand the problem: main contradictions, etc.)
2. Structuring the problem:
2.1. Generation of alternatives
2.2. Design of criteria
2.3. Design of scales for assessment of alternatives upon criteria
3. Evaluation of alternatives upon criteria
4. Design of CLUSTERS and STRUCTURE OF CLUSTERING PROCESS
5. Analysis of results

THUS: we obtain some prospective RESEARCH PROCEDURES.


CONCLUSION

1.Algorithms, procedures & their analysis

2.New approaches to performance/quality for research procedures

3.Various applied examples

4.Usage in education


That’s All

Thanks!

http://www.iitp.ru/mslevin/

Mark Sh. Levin