Mining Frequent Patterns in a Set of Graphs Using Subgraph Mining: gSpan with Closed Graphs
Ankita Sambhare
Advisor: Dr. Carlos Rivero
Rochester Institute of Technology

Goal
Extract all frequently occurring patterns from a set of graphs in order to study the most common, behaviorally significant patterns shared among the graphs. The mined patterns can then be used to analyze the set of graphs further on the basis of their similarities and to identify the significance of those similarities.

Background Research
Graph Mining Domains:
1. Frequent subgraph mining
2. Approximate graph pattern mining
3. Graph pattern summarization
4. Graph classification
5. Graph clustering
6. Graph indexing
7. Graph searching
8. Correlated graph pattern mining
9. Optimal graph pattern mining
10. Graph kernels
11. Link mining
12. Web structure mining
13. Workflow mining
14. Biological network mining

Approach
gSpan maps each graph to a DFS code, builds a lexicographic ordering on these codes, and constructs a search tree based on that ordering. The search tree is traversed on the basis of the number of edges in the graph.

Example
Figure 2: A simple example of patterns mined from 2 graphs

CONCLUSIONS
The results show that gSpan runs faster than the other branch-and-bound candidate graph generation algorithm, owing to the DFS codes it introduces. gSpan also mines more relevant subgraph patterns than Gaston, because it allows closed mining.

REFERENCES
1. X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. UIUC-CS Tech. Report: R-2002-2296, (a 4-page short version in

FUTURE WORK
Build approximate graph mining on top of frequent subgraph mining, adding approximation to the mined patterns, which is required because of the noise and diversity of the data.
Handle complex data such as program data, where each node is a complex structure.

Algorithm
Steps:
1. DFS subscripting with rightmost extension
2. DFS codes
3. Lexicographical ordering of DFS codes
4. Minimum DFS code
5. Perform DFS on the DFS code tree

RESULTS
[Figure: Output fragments per algorithm and minimum frequency, gSpan vs Gaston on 340 graphs (dense edges). Series: 5%, 10%, 15%. Data labels as extracted: 2770, 10027, 736, 1363, 401, 706.]
[Figure: Run time vs minimum frequency (5%, 10%, 15%), gSpan vs Gaston on 340 graphs (dense edges). Series: gSpan, Gaston.]
[Figure: Output fragments per algorithm and minimum frequency, gSpan vs Gaston on 10000 graphs (sparse edges). Series: 5%, 10%, 15%, 20%. Data labels as extracted: 1795, -1, 460, 460, 225, 225, 126, 126.]
[Figure: Run time vs minimum frequency (5%, 10%, 15%, 20%), gSpan vs Gaston on 10000 graphs (sparse edges). Series: gSpan, Gaston.]
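
Illustration: DFS codes and the minimum DFS code
The DFS code, lexicographic ordering, and minimum DFS code steps can be illustrated with a small brute-force sketch. It is not the poster's implementation: the graph encoding (a dict of vertex labels plus a list of labeled undirected edges) and the names all_dfs_codes, edge_key, and min_dfs_code are assumptions made for this example, and the graph is assumed to be connected. The sketch enumerates every DFS code of a tiny graph and takes the smallest one under the DFS lexicographic order, whereas gSpan itself avoids this exhaustive enumeration by growing codes through rightmost extension.

```python
from typing import Dict, Iterator, List, Tuple

# One DFS-code entry: (i, j, label_i, edge_label, label_j), where i and j are
# DFS discovery indices. i < j is a forward edge, i > j is a backward edge.
CodeEdge = Tuple[int, int, str, str, str]


def build_adjacency(edges):
    """Undirected adjacency map: vertex -> {neighbor: edge label}."""
    adj: Dict[int, Dict[int, str]] = {}
    for u, v, label in edges:
        adj.setdefault(u, {})[v] = label
        adj.setdefault(v, {})[u] = label
    return adj


def all_dfs_codes(vlabels, edges) -> Iterator[List[CodeEdge]]:
    """Yield every DFS code of a small connected labeled graph (brute force)."""
    adj = build_adjacency(edges)

    def extend(code, index, path):
        if not path:  # DFS finished: every vertex has been backtracked
            if len(index) == len(vlabels):
                yield list(code)
            return
        top = path[-1]  # rightmost vertex
        fresh = [w for w in adj.get(top, {}) if w not in index]
        if not fresh:  # nothing left to discover here, so backtrack
            yield from extend(code, index, path[:-1])
            return
        for w in fresh:
            k = len(index)  # discovery index of the new vertex
            new_index = {**index, w: k}
            new_code = code + [(index[top], k, vlabels[top], adj[top][w], vlabels[w])]
            # Backward edges from the new vertex to its ancestors follow
            # immediately, ordered by the ancestor's discovery index.
            back = sorted((index[u], u) for u in adj[w] if u in index and u != top)
            new_code += [(k, j, vlabels[w], adj[w][u], vlabels[u]) for j, u in back]
            yield from extend(new_code, new_index, path + [w])

    for start in vlabels:
        yield from extend([], {start: 0}, [start])


def edge_key(edge: CodeEdge):
    """Sort key for one entry. Two codes are compared at their first differing
    position, where the shared prefix fixes the rightmost vertex: backward
    edges come first (smaller target first), then forward edges (deeper source
    first), with ties broken by the labels."""
    i, j, li, le, lj = edge
    if j < i:
        return (0, j, li, le, lj)
    return (1, -i, li, le, lj)


def min_dfs_code(vlabels, edges) -> List[CodeEdge]:
    """Smallest DFS code of the graph, usable as a canonical label."""
    return min(all_dfs_codes(vlabels, edges),
               key=lambda code: [edge_key(e) for e in code])


# Hypothetical example: the same labeled triangle under two vertex numberings.
triangle = ({0: "a", 1: "a", 2: "b"}, [(0, 1, "x"), (1, 2, "y"), (0, 2, "y")])
renumbered = ({10: "b", 11: "a", 12: "a"}, [(11, 12, "x"), (12, 10, "y"), (10, 11, "y")])
print(min_dfs_code(*triangle) == min_dfs_code(*renumbered))
```

Because isomorphic graphs share the same set of DFS codes, they also share the same minimum, which is what lets the minimum DFS code serve as a canonical label and lets duplicate branches of the search tree be pruned.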
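
Illustration: minimum frequency as a support threshold
The minimum-frequency settings in the results (5% to 20%) correspond to a support threshold: a pattern counts as frequent when it occurs in at least that share of the graphs. The sketch below shows this only for the simplest case, single-edge patterns, under assumptions made for the example: the function name frequent_edges, the integer min_percent parameter, and the graph encoding (vertex-label dict plus labeled undirected edges) are hypothetical and not taken from the poster.

```python
from collections import Counter


def frequent_edges(graphs, min_percent):
    """Return single-edge patterns whose support meets the threshold.

    graphs: list of (vertex_labels, edges) pairs, with edges as (u, v, label).
    min_percent: minimum frequency as an integer percentage, e.g. 5 for 5%.
    """
    # Ceiling of min_percent% of the number of graphs, in integer arithmetic.
    min_support = (min_percent * len(graphs) + 99) // 100
    support = Counter()
    for vlabels, edges in graphs:
        seen = set()
        for u, v, edge_label in edges:
            # Undirected edge: sort the endpoint labels so that both
            # orientations map to the same pattern.
            a, b = sorted((vlabels[u], vlabels[v]))
            seen.add((a, edge_label, b))
        # Each graph contributes at most once to a pattern's support.
        support.update(seen)
    return {p: s for p, s in support.items() if s >= min_support}
```

Support here counts graphs, not occurrences, so a pattern repeated many times inside one graph still adds only one to its support.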