
SPIN: Mining Maximal Frequent Subgraphs from Graph Databases Jun Huan, Wei Wang, Jan Prins, Jiong Yang KDD 2004.

Jan 03, 2016

Branden Charles
Transcript
Page 1: SPIN: Mining Maximal Frequent Subgraphs from Graph Databases Jun Huan, Wei Wang, Jan Prins, Jiong Yang KDD 2004.

SPIN: Mining Maximal Frequent Subgraphs from Graph Databases

Jun Huan, Wei Wang, Jan Prins, Jiong Yang
KDD 2004

Page 2:

Introduction

► Graphs model relations among data
    Inter-disciplinary research
► Huge number of recurring patterns
► Mine only maximal frequent subgraphs
    A frequent subgraph is maximal if none of its supergraphs is frequent
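
The definition of maximality above can be sketched in Python. This is only an illustration of the definition, not the SPIN algorithm. It assumes every node has a unique label, so a pattern can be written as a frozenset of edges and "subgraph of" reduces to set inclusion. The function name `maximal_patterns` and the toy patterns are hypothetical.

```python
def maximal_patterns(frequent):
    """Keep only patterns that have no proper superset among the frequent ones.

    Assumption for this sketch: nodes carry unique labels, so each pattern is
    a frozenset of (u, v) edges and the subgraph test is plain set inclusion.
    """
    frequent = set(frequent)
    return {p for p in frequent if not any(p < q for q in frequent)}


# Hypothetical frequent patterns over nodes a, b, c, d.
frequent = [
    frozenset({("a", "b")}),
    frozenset({("b", "c")}),
    frozenset({("a", "b"), ("b", "c")}),
    frozenset({("c", "d")}),
]
result = maximal_patterns(frequent)
```

Here the two single-edge patterns a-b and b-c are dropped because the two-edge pattern contains them, while c-d survives because nothing frequent extends it.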

Page 3:

Advantages

► Reducing the total number of mined subgraphs
    Saving space and analysis effort
► Reducing mining time
► Non-maximal frequent subgraphs can be reconstructed.
► Maximal frequent subgraphs are of most interest in some applications.

Page 4:

Algorithm

► Mining all frequent trees from a general graph database.
    Tree normalization is simpler than graph normalization.
    In certain applications, most of the frequent subgraphs are really trees.
    Use a current subgraph mining algorithm
    Mining subtrees from a forest

Page 5:

Algorithm

► Reconstruct all maximal subgraphs from the mined trees.
    For each frequent tree T, find all frequent subgraphs whose canonical spanning trees are isomorphic to T
    Enumerate the equivalence class of a tree T
    Maximal subgraph mining

Page 6:

Tree-based Equivalence Classes

► A subtree T is a spanning tree of G if T contains all nodes in G.
    Maximal one: canonical spanning tree
► Group all frequent subgraphs into equivalence classes based on their canonical spanning trees.
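
The spanning-tree condition above can be checked mechanically. The following is a minimal sketch, assuming simple undirected graphs whose edges are stored as frozensets of two nodes. It does not compute the canonical (maximal) spanning tree, which requires a tree ordering not given in these slides. The names `is_spanning_tree` and `edge` are hypothetical.

```python
def edge(u, v):
    """Undirected edge as a frozenset (representation chosen for this sketch)."""
    return frozenset((u, v))


def is_spanning_tree(tree_edges, graph_nodes, graph_edges):
    """Return True if tree_edges form a spanning tree of the given graph."""
    tree_edges = set(tree_edges)
    nodes = set(graph_nodes)
    if not nodes:
        return False
    # Every tree edge must be an edge of the graph.
    if not tree_edges <= set(graph_edges):
        return False
    # A tree on n nodes has exactly n - 1 edges.
    if len(tree_edges) != len(nodes) - 1:
        return False
    # With n - 1 edges, reaching every node from one start node means the
    # edges are connected and therefore acyclic.
    adjacency = {n: set() for n in nodes}
    for e in tree_edges:
        u, v = tuple(e)
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == nodes


# Hypothetical example: a 4-cycle and one of its spanning paths.
square_nodes = {1, 2, 3, 4}
square_edges = {edge(1, 2), edge(2, 3), edge(3, 4), edge(4, 1)}
path = {edge(1, 2), edge(2, 3), edge(3, 4)}
path_is_spanning = is_spanning_tree(path, square_nodes, square_edges)
```

Two graphs fall into the same equivalence class when their canonical spanning trees are isomorphic, so a check like this is only the first ingredient of the grouping.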

Page 7:

Spanning tree

Page 8:

Tree-based Equivalence Classes

Page 9:

12 singletons group

[Figure: graph drawings labeled with a, b, x, y]

Page 10:

Enumerating Graphs from Trees

► G, C: {e1, e2, …, en}
    If G + e is frequent -> edge e ∈ C (candidate set)
► Search space of G: G:C = {G + y | y ∈ 2^C}
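
The search space G:C can be enumerated directly. The sketch below reads y as ranging over all subsets of the candidate set C, which is an interpretation of the slide's notation, and it again assumes graphs written as sets of edges over fixed node ids. The name `search_space` and the example edges are hypothetical.

```python
from itertools import combinations


def search_space(graph_edges, candidate_edges):
    """Yield G + y for every subset y of the candidate set C."""
    base = frozenset(graph_edges)
    candidates = sorted(set(candidate_edges) - base)
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            yield base | frozenset(subset)


# Hypothetical example: G is the path 1-2-3-4, C holds two chords.
G = {(1, 2), (2, 3), (3, 4)}
C = {(1, 3), (2, 4)}
space = list(search_space(G, C))
```

With n candidate edges the space holds 2^n graphs, which is why the pruning steps on the following slides matter.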


Page 11:

Optimizations

► Removing a set of frequent subgraphs that cannot be maximal from a search space
► Locally maximal: a frequent subgraph G is maximal in its equivalence class
► Globally maximal: maximal frequent in a graph database
► Avoid enumerating subgraphs which are not locally maximal.

Page 12:

Bottom-up Pruning

► G' = G ∪ C (G with every candidate edge in C added)
    If G' is frequent: every other graph in the search space is a proper subgraph of G' and hence not maximal
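
A minimal sketch of the bottom-up test follows. It assumes unique node labels so that support counting is a subset test over edge sets, which sidesteps subgraph isomorphism entirely. The names `support` and `bottom_up_prune`, the toy database, and the thresholds are hypothetical.

```python
def support(pattern, database):
    """Number of database graphs that contain the pattern.

    Assumption: unique node labels, so containment is a subset test.
    """
    return sum(1 for g in database if pattern <= g)


def bottom_up_prune(graph_edges, candidate_edges, database, min_support):
    """Return G' (G plus all candidate edges) if G' is frequent, else None.

    When G' is returned, the rest of the search space of G can be skipped,
    because every other graph in it is a proper subgraph of G'.
    """
    full = frozenset(graph_edges) | frozenset(candidate_edges)
    if support(full, database) >= min_support:
        return full
    return None


# Hypothetical database of three graphs.
database = [
    frozenset({(1, 2), (2, 3), (1, 3)}),
    frozenset({(1, 2), (2, 3), (1, 3), (3, 4)}),
    frozenset({(1, 2), (2, 3)}),
]
G = {(1, 2), (2, 3)}
C = {(1, 3)}
pruned_to = bottom_up_prune(G, C, database, min_support=2)
```

If the check fails, nothing can be concluded and the search space still has to be explored.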

Page 13:

Tail Shrink

► An embedding of G in G' is a subgraph isomorphism f from G to G'
    Two embeddings of L in P
    l1->P1, l2->P2, l3->P3, l4->P4
    l1->P1, l2->P3, l3->P2, l4->P4
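
Embeddings can be enumerated by brute force on small graphs. The sketch below tries every injective, label-preserving node mapping and keeps those that send each pattern edge to a host edge. It ignores edge labels for brevity and is exponential in the pattern size. The figure for L and P is not preserved in this transcript, so the diamond-shaped graphs and their labels here are hypothetical, chosen only so that they yield the two mappings listed above.

```python
from itertools import permutations


def embeddings(p_labels, p_edges, g_labels, g_edges):
    """List label-preserving injective maps that send pattern edges to host edges."""
    p_nodes = sorted(p_labels)
    host_edges = {frozenset(e) for e in g_edges}
    found = []
    for image in permutations(sorted(g_labels), len(p_nodes)):
        f = dict(zip(p_nodes, image))
        # Node labels must agree under the mapping.
        if any(p_labels[n] != g_labels[f[n]] for n in p_nodes):
            continue
        # Every pattern edge must land on a host edge.
        if all(frozenset((f[u], f[v])) in host_edges for u, v in p_edges):
            found.append(f)
    return found


# Hypothetical L and P: a diamond whose two middle nodes share a label.
L_labels = {"l1": "a", "l2": "b", "l3": "b", "l4": "c"}
L_edges = [("l1", "l2"), ("l1", "l3"), ("l2", "l4"), ("l3", "l4")]
P_labels = {"P1": "a", "P2": "b", "P3": "b", "P4": "c"}
P_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P4")]
found = embeddings(L_labels, L_edges, P_labels, P_edges)
```

The two middle nodes are interchangeable, which is what produces two distinct embeddings of the same pattern in the same host graph.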


Page 14:

Tail Shrink

► A candidate edge (i, j, e_l) is associative to a graph G
    It appears in every embedding of G in the graph database
► If a tree T has a set of associative edges, any maximal frequent graph G that is a supergraph of T must contain all associative edges.
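
The associativity test can be sketched as a check over a precomputed list of embeddings. The data layout is an assumption of this sketch: each occurrence pairs an embedding f (pattern node to host node) with a dictionary of the host graph's labeled edges. The name `is_associative` and the toy data are hypothetical.

```python
def is_associative(candidate, occurrences):
    """Return True if the candidate edge is present in every embedding.

    candidate    is (i, j, edge_label) over pattern nodes i and j
    occurrences  is a list of (f, host_edges) pairs, one per embedding, where
                 host_edges maps frozenset({u, v}) to the label of that edge
    """
    i, j, label = candidate
    return all(
        host_edges.get(frozenset((f[i], f[j]))) == label
        for f, host_edges in occurrences
    )


# Hypothetical data: G is the path 0-1-2, embedded once in each of two hosts.
host1 = {frozenset(("A", "B")): "x", frozenset(("B", "C")): "x", frozenset(("A", "C")): "y"}
host2 = {frozenset(("P", "Q")): "x", frozenset(("Q", "R")): "x", frozenset(("P", "R")): "y"}
occurrences = [
    ({0: "A", 1: "B", 2: "C"}, host1),
    ({0: "P", 1: "Q", 2: "R"}, host2),
]
closing_edge_is_associative = is_associative((0, 2, "y"), occurrences)
```

An edge that passes this test can be added to the tree immediately, since leaving it out never yields a maximal graph.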

Page 15:

Tail Shrink

► Remove associative edges from candidate sets and add them to T without missing any maximal ones
    Reducing the search space
    Prune the entire equivalence class in certain cases
► A set of associative edges C of a tree T is lethal if
    G' = T ∪ C has a canonical spanning tree different from that of T


Page 16:

External-Edge Pruning

► Remove one equivalence class without any knowledge about its candidate edges
► External edge for a graph G: it connects a node in G and a node not in G
► (i, e_l, v_l) is associative to a graph G if
    For every embedding f of G in a graph G', G' has a node v with the label v_l
    v connects to the node f(i) with an edge label e_l in G'
    There is no node j ∈ V[G] such that v = f(j)
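
The three conditions for an associative external edge translate into one check per embedding. As before, the data layout is an assumption of this sketch: each occurrence carries the embedding f, the host adjacency lists, and the host node labels. The name `is_associative_external` and the example graphs are hypothetical.

```python
def is_associative_external(i, edge_label, node_label, occurrences):
    """Check the conditions for an external edge (i, e_l, v_l).

    occurrences: list of (f, adjacency, labels), one per embedding f of G.
      f          maps pattern nodes to host nodes
      adjacency  maps a host node to a list of (neighbor, edge_label) pairs
      labels     maps a host node to its node label
    """
    for f, adjacency, labels in occurrences:
        covered = set(f.values())  # host nodes that are images of nodes of G
        has_witness = any(
            lbl == edge_label and labels[v] == node_label and v not in covered
            for v, lbl in adjacency.get(f[i], [])
        )
        if not has_witness:
            return False
    return True


# Hypothetical data: G is the single edge 0-1, embedded in two hosts.
adj1 = {"A": [("B", "x")], "B": [("A", "x"), ("C", "y")], "C": [("B", "y")]}
lab1 = {"A": "a", "B": "b", "C": "c"}
adj2 = {"P": [("Q", "x")], "Q": [("P", "x"), ("R", "y")], "R": [("Q", "y")]}
lab2 = {"P": "a", "Q": "b", "R": "c"}
occurrences = [
    ({0: "A", 1: "B"}, adj1, lab1),
    ({0: "P", 1: "Q"}, adj2, lab2),
]
external_found = is_associative_external(1, "y", "c", occurrences)
```

When such an edge exists, every occurrence of G can be extended by one more node, so no graph with the same node set as G can be maximal.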

Page 17:

Associative external edges

Page 18:

Experiments

► 2.8GHz Pentium Xeon
► 512KB L2 cache, 2GB main memory
► Red Hat Linux 7.3
► C++ programming language

Page 19:

Synthetic Dataset

D10KT30L200I11V4E4

Page 20:

DTP CA data set

Page 21:

DTP CM data set