Graph-based cluster labeling using Growing Hierarchical SOM
Prepared by: Mahmoud Rafeek Alfarra and Ayman Shehda Ghabayen, College Of Science & Technology
The second International conference of Applied Science & natural
Cluster labeling: the process of selecting descriptive labels (keywords) for the clusters produced by a clustering algorithm.
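The definition leaves the selection rule open. As a minimal sketch, assuming a simple frequency-based heuristic (not necessarily the rule used in this work), the hypothetical function `top_label_terms` scores each term by its frequency inside a cluster, weighted by the share of the term's total occurrences that fall in that cluster, and keeps the top k terms as labels.

```python
from collections import Counter


def top_label_terms(clusters, k=3):
    """Pick k label keywords per cluster (assumed heuristic, for illustration).

    score(term, cluster) = tf_in_cluster * (tf_in_cluster / tf_in_collection),
    so terms that are frequent AND concentrated in one cluster rank highest.
    """
    # Term counts per cluster, then summed over the whole collection.
    per_cluster = {
        cid: Counter(w for doc in docs for w in doc.lower().split())
        for cid, docs in clusters.items()
    }
    overall = Counter()
    for counts in per_cluster.values():
        overall.update(counts)

    labels = {}
    for cid, counts in per_cluster.items():
        scored = {t: c * (c / overall[t]) for t, c in counts.items()}
        # Highest score first; alphabetical order breaks ties deterministically.
        ranked = sorted(scored, key=lambda t: (-scored[t], t))
        labels[cid] = ranked[:k]
    return labels


clusters = {
    "c1": ["river rafting", "mild river rafting", "river rafting trips"],
    "c2": ["fishing trips", "fishing vacation plan", "booking fishing trips"],
}
print(top_label_terms(clusters, k=2))
# → {'c1': ['rafting', 'river'], 'c2': ['fishing', 'trips']}
```

"trips" appears in both clusters, so its score in c1 is damped (1 × 1/3), while in c2 it still ranks second because most of its occurrences fall there.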
Labeling: what and why?
Cluster labeling is an increasingly important task because:
1. Document collections keep growing larger.
2. It helps when processing news, email threads, blogs, reviews, and search results.
[Figure: overall pipeline. Documents collection → preprocessing step → DIG model → clustering process (Growing Hierarchical SOM, a hierarchy of SOM layers 0, 1 and 2 whose nodes are graphs G0, G1, …, Gs) + labeling → labeled clusters.]
Graph-based Representation
[Figure: example document graph with nodes A, B, C, D, S and X, whose paths form the phrases ph1 to ph5.]
The graph-based representation captures the salient features of the data. DIG (Document Index Graph) model: a directed graph.
A document is represented as a vector of sentences, and phrase-indexing information is stored in the graph nodes themselves, in the form of document tables.
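[Figure: DIG fragment with nodes river, rafting, adventures and fishing joined by edges e0, e1 and e2. Each node keeps a document table (columns Doc, TF, ET) listing, per edge, the sentences where it occurs; an entry such as S1(2) means sentence 1, term position 2. Entries shown: e0: S1(1), S2(2), S3(1); e0: S2(1); e2: S1(2); e1: S4(1).]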
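The DIG structure described above can be sketched in Python. This is a minimal illustration, and the class name `DIG`, the dictionary layout of the document table, and the choice to record each outgoing edge at its source node together with the source term's (sentence, position) are assumptions. With that choice, the sentences "river rafting", "mild river rafting" and "river rafting trips" give the edge river → rafting at node river the positions S1(1), S2(2), S3(1), the same pattern as the e0 entry of the document table above.

```python
from collections import defaultdict


class DIG:
    """Minimal Document Index Graph sketch (assumed layout, for illustration).

    - Nodes are unique lower-cased terms.
    - A directed edge (a, b) means term b directly follows term a in a sentence.
    - Each node keeps a document table:
        doc_id -> {"tf": term frequency,
                   "edges": {edge: [(sentence_no, position_of_term), ...]}}
      where each outgoing edge is recorded at its source node.
    """

    def __init__(self):
        self.edges = set()
        self.doc_table = defaultdict(dict)  # node -> doc_id -> entry

    def add_document(self, doc_id, sentences):
        for s_no, sentence in enumerate(sentences, start=1):
            words = sentence.lower().split()
            for pos, word in enumerate(words, start=1):
                entry = self.doc_table[word].setdefault(
                    doc_id, {"tf": 0, "edges": defaultdict(list)})
                entry["tf"] += 1
                if pos < len(words):
                    # words is 0-indexed and pos is 1-based, so words[pos]
                    # is the term that follows `word` in this sentence.
                    edge = (word, words[pos])
                    self.edges.add(edge)
                    entry["edges"][edge].append((s_no, pos))

    def nodes(self):
        return set(self.doc_table)


dig = DIG()
dig.add_document(1, ["river rafting", "mild river rafting", "river rafting trips"])
river = dig.doc_table["river"][1]
print(river["tf"], river["edges"][("river", "rafting")])
# → 3 [(1, 1), (2, 2), (3, 1)]
```

Storing positions per edge is what later allows phrases (paths of edges) to be matched across documents without re-reading the text.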
Example
Document 1:
River rafting
Mild river rafting
River rafting trips
Document 2:
Wild river adventures
River rafting vacation plan
Document 3:
Fishing trips
Fishing vacation plan
Booking fishing trips
River fishing
[Figure: the DIG built incrementally from Documents 1, 2 and 3, with nodes mild, river, rafting, trips, wild, adventures, vacation, plan, booking and fishing.]
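With phrases indexed in the graph, phrases shared between documents can be found by following common edges. The sketch below, with hypothetical helpers `edges_of` and `matching_phrases`, finds maximal runs of consecutive words in a new document whose adjacent-word edges all occur in an earlier document. It is a simplification, not the exact DIG matching procedure: it does not check that the shared edges are also consecutive in the earlier document, as an exact path match would.

```python
def edges_of(sentences):
    """Set of directed word-adjacency edges (a, b) in a document."""
    edges = set()
    for sentence in sentences:
        words = sentence.lower().split()
        edges.update(zip(words, words[1:]))
    return edges


def matching_phrases(new_doc, old_doc):
    """Maximal runs (2+ words) in new_doc whose every edge occurs in old_doc."""
    known = edges_of(old_doc)
    phrases = []
    for sentence in new_doc:
        words = sentence.lower().split()
        run = words[:1]  # current run always ends with the edge's source word
        for a, b in zip(words, words[1:]):
            if (a, b) in known:
                run.append(b)
            else:
                if len(run) > 1:
                    phrases.append(" ".join(run))
                run = [b]
        if len(run) > 1:
            phrases.append(" ".join(run))
    return phrases


doc1 = ["river rafting", "mild river rafting", "river rafting trips"]
doc2 = ["wild river adventures", "river rafting vacation plan"]
doc3 = ["fishing trips", "fishing vacation plan", "booking fishing trips", "river fishing"]
print(matching_phrases(doc2, doc1))  # → ['river rafting']
print(matching_phrases(doc3, doc2))  # → ['vacation plan']
```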
Growing Hierarchical SOM
Determining the winning node
[Figure: the input document graph (Gi) is compared with the graph of each of the n nodes in the SOM (Gs); both are drawn with nodes v1 to v7 and edges e0 to e5.]
Matching criterion: phrase significance between Gi and Gs, normalized by the length of Gi.
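The winning-node step can be sketched as follows. This assumes each graph is reduced to its set of edges (ordered term pairs, i.e. two-word phrases) and that phrase significance is the number of Gi's edges also found in Gs divided by length(Gi), taken here as the number of edges in Gi. Both the exact formula and the names `phrase_significance` and `winning_node` are illustrative assumptions.

```python
def phrase_significance(gi_edges, gs_edges):
    """Assumed score: |E(Gi) & E(Gs)| / length(Gi), with length(Gi) taken
    as the number of edges (two-word phrases) in the input graph Gi."""
    if not gi_edges:
        return 0.0
    return len(gi_edges & gs_edges) / len(gi_edges)


def winning_node(gi_edges, som_nodes):
    """Index of the SOM node graph with the highest phrase significance.
    max() returns the first maximum, so ties go to the lowest index."""
    return max(range(len(som_nodes)),
               key=lambda j: phrase_significance(gi_edges, som_nodes[j]))


# Input document graph Gi and three SOM node graphs Gs, as edge sets.
gi = {("river", "rafting"), ("rafting", "trips")}
som = [
    {("fishing", "trips")},
    {("river", "rafting"), ("mild", "river")},
    {("river", "rafting"), ("rafting", "trips"), ("wild", "river")},
]
print(winning_node(gi, som))  # → 2
```

Normalizing by Gi rather than Gs means a large node graph is not penalized for containing phrases the input lacks; whether the original method intends this is not stated.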
Neuron updating in the graph domain
[Figure: neuron updating in the graph domain; graphs G1 and G2 over nodes A, B, C, D, E, X and Y, with edges e0 to e5.]
We update the neuron graphs by increasing the matching phrases rather than the matching terms (nodes), because adding matching phrases has a stronger effect than adding terms; adding a matching phrase can be seen as adding an ordered pair of nodes.
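The update rule above, which grows the winner by matching phrases (ordered node pairs) rather than by single terms, can be sketched as follows. The learning-rate parameter `rate` and the deterministic (sorted) choice of which missing edges to add are assumptions made for illustration; the hypothetical `update_neuron` only adds edges and never removes any.

```python
import math


def update_neuron(gs_edges, gi_edges, rate=0.5):
    """Return a copy of the winner's edge set moved toward the input graph.

    Adds ceil(rate * k) of the k input edges (ordered term pairs, i.e.
    phrases) that the winner lacks; the two terms of each added edge become
    nodes of the winner implicitly. `rate` is an assumed learning rate.
    """
    missing = sorted(gi_edges - gs_edges)  # sorted only for a deterministic pick
    n_add = math.ceil(rate * len(missing))
    return gs_edges | set(missing[:n_add])


def nodes_of(edges):
    """Terms (nodes) touched by a set of edges."""
    return {term for edge in edges for term in edge}


winner = {("river", "rafting")}
doc = {("mild", "river"), ("river", "rafting"), ("rafting", "trips")}
updated = update_neuron(winner, doc, rate=0.5)
print(sorted(updated), sorted(nodes_of(updated)))
# → [('mild', 'river'), ('river', 'rafting')] ['mild', 'rafting', 'river']
```

Adding one edge here brings in the node "mild" as well, which illustrates why a phrase update changes the graph more than adding a lone term.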
Overall Document Clustering Process
Extracting cluster labels
To extract the keywords, we build a table for each cluster as follows: