Graph Mining and Graph Kernels
Karsten Borgwardt & Chloé-Agathe Azencott | Data mining in Bioinformatics | 1
Data Mining in Bioinformatics Day 3: Graph Mining
August 24, 2008 | ACM SIGKDD, Las Vegas
Karsten Borgwardt & Chloé-Agathe Azencott
February 6 to February 17, 2012
Machine Learning and Computational Biology Research Group MPIs Tübingen
From Borgwardt & Yan, Graph Mining & Graph Kernels, KDD tutorial, 2008 – with permission from Xifeng Yan.
Graphs Are Everywhere
Co-expression Network (Magwene et al., Genome Biology 2004, 5:R100)
Program Flow
Social Network
Protein Structure
Chemical Compound
Mining Graph Patterns
Graph Pattern Mining
– Frequent graph patterns
– Pattern summarization
– Optimal graph patterns
– Graph patterns with constraints
– Approximate graph patterns
Graph Classification
– Pattern-based approach
– Decision tree
– Decision stumps
Graph Compression
Other important topics (graph model, laws, graph dynamics, social network analysis, visualization, summarization, graph clustering, link analysis, …)
Applications of Graph Pattern Mining
Mining biochemical structures
Finding biological conserved subnetworks
Finding functional modules
Program control flow analysis
Intrusion network analysis
Mining communication networks
Anomaly detection
Mining XML structures
Building blocks for graph classification, clustering, compression, comparison, correlation analysis, and indexing
Graph Pattern Mining
Graph pattern mining (single graph setting)
Graph classification (multiple graphs setting)
Closed and Maximal Patterns
Closed Frequent Graph
– A frequent graph G is closed if there exists no supergraph of G that carries the same support as G
– If some of G’s subgraphs have the same support, it is unnecessary to output these subgraphs (nonclosed graphs)
– Lossless compression: still ensures that the mining result is complete
Maximal Frequent Graph
– A frequent graph G is maximal if there exists no supergraph of G that is frequent
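The two definitions above can be illustrated with a small sketch. It is not taken from the slides and rests on a strong simplifying assumption: node labels are unique within each graph, so a graph can be stored as a set of labeled edges and "is a subgraph of" reduces to set inclusion (no isomorphism test). Connectivity of patterns is ignored for brevity, and the names `graph`, `support`, `frequent_patterns`, and `closed_and_maximal` are hypothetical helpers.

```python
from itertools import combinations

def graph(*edges):
    """Build a graph as a frozenset of undirected edges (endpoints sorted)."""
    return frozenset(tuple(sorted(e)) for e in edges)

def support(pattern, dataset):
    """Number of dataset graphs containing the pattern (edge-set inclusion)."""
    return sum(1 for g in dataset if pattern <= g)

def frequent_patterns(dataset, min_sup):
    """Brute-force enumeration of all non-empty edge sets with support >= min_sup."""
    edges = sorted(set().union(*dataset))
    found = []
    for k in range(1, len(edges) + 1):
        for combo in combinations(edges, k):
            p = frozenset(combo)
            if support(p, dataset) >= min_sup:
                found.append(p)
    return found

def closed_and_maximal(frequent, dataset):
    """Split a complete list of frequent patterns into closed and maximal ones."""
    closed, maximal = [], []
    for p in frequent:
        # A supergraph with the same support is itself frequent,
        # so searching only the frequent list is sufficient.
        supers = [q for q in frequent if p < q]
        if all(support(q, dataset) < support(p, dataset) for q in supers):
            closed.append(p)   # no supergraph carries the same support
        if not supers:
            maximal.append(p)  # no supergraph is frequent
    return closed, maximal

# Toy data (assumed for illustration): three small graphs.
dataset = [
    graph(("a", "b"), ("b", "c"), ("c", "d")),
    graph(("a", "b"), ("b", "c"), ("c", "d")),
    graph(("a", "b"), ("b", "c")),
]
frequent = frequent_patterns(dataset, min_sup=2)
closed, maximal = closed_and_maximal(frequent, dataset)
```

In this toy data every maximal pattern is also closed, but the pattern with edges a-b and b-c is closed without being maximal, which is why the set of closed patterns is lossless while the set of maximal patterns is not.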
Closed and Maximal Patterns – Examples
Data: three graphs A, B, C (figure not reproduced)
D is a subgraph of A, B, C, but so is its supergraph E, which has the same support (3). Therefore D is not closed.
No supergraph of E is also a subgraph of all 3 graphs, and therefore E is closed.
A further pattern (figure not reproduced) is a subgraph of A and B and is also closed: none of its supergraphs has support 2.
If θ = 70%, E is maximal:
– E is frequent
– None of its supergraphs is frequent
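The arithmetic behind the θ = 70% case can be checked with a few lines. This is a sketch under one assumption not stated on the slide: a pattern counts as frequent when its support is at least θ times the number of graphs (at least, rather than strictly more than). The function names are hypothetical.

```python
def min_support_count(theta_percent, n_graphs):
    """Smallest number of graphs a pattern must occur in to reach theta_percent."""
    # Integer ceiling of theta_percent * n_graphs / 100, avoiding float rounding.
    return -(-theta_percent * n_graphs // 100)

def is_frequent(support_count, theta_percent, n_graphs):
    """Assumed convention: frequent means support >= theta (not strictly greater)."""
    return support_count >= min_support_count(theta_percent, n_graphs)

# With 3 graphs and theta = 70%, a pattern must occur in all 3 graphs.
needed = min_support_count(70, 3)
e_is_frequent = is_frequent(3, 70, 3)          # support 3, like E
support_two_is_frequent = is_frequent(2, 70, 3)  # support 2 is only 66.7%
```

Under this convention a pattern with support 2 out of 3 falls below the threshold, so any supergraph of E that loses even one supporting graph is no longer frequent.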
Closed and Maximal Patterns – Sizes
Figure: number of patterns (y-axis) against minimum support (x-axis)
CloseGraph (Yan and Han, KDD’03)
Pattern-Growth Approach: a k-edge graph G is grown into (k+1)-edge graphs G1, G2, …, Gn
Under which condition can we stop searching supergraphs? (early termination)
If
– G and H are frequent,
– G is a subgraph of H, and
– in any part of the graphs in the dataset where G occurs, H also occurs,
then we need not grow G, since none of G’s supergraphs will be closed except those of H.
[Yan and Han KDD’03]
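The early-termination condition can be sketched as a simple check. This is not the CloseGraph implementation: it assumes node labels are unique within each graph, so a graph is a set of labeled edges, each graph holds at most one occurrence of a pattern, and "wherever G occurs, H also occurs" reduces to comparing the sets of supporting graphs. The slide's condition is about individual occurrences, which this toy version only approximates. The names `supporting_graphs` and `can_stop_growing` are hypothetical.

```python
def supporting_graphs(pattern, dataset):
    """Indices of the dataset graphs that contain the pattern (edge-set inclusion)."""
    return {i for i, g in enumerate(dataset) if pattern <= g}

def can_stop_growing(g, h, dataset, min_sup):
    """True if G need not be grown further because H covers every occurrence of G."""
    occ_g = supporting_graphs(g, dataset)
    occ_h = supporting_graphs(h, dataset)
    both_frequent = len(occ_g) >= min_sup and len(occ_h) >= min_sup
    # g < h: G is a proper subgraph of H (under the unique-label assumption).
    # occ_g == occ_h: every graph in which G occurs also contains H.
    return both_frequent and g < h and occ_g == occ_h

# Toy data (assumed for illustration): edges are (label, label) pairs.
ab, bc, cd = ("a", "b"), ("b", "c"), ("c", "d")
dataset = [
    frozenset({ab, bc, cd}),
    frozenset({ab, bc, cd}),
    frozenset({ab, bc}),
]

stop_ab = can_stop_growing(frozenset({ab}), frozenset({ab, bc}), dataset, min_sup=2)
stop_abbc = can_stop_growing(
    frozenset({ab, bc}), frozenset({ab, bc, cd}), dataset, min_sup=2
)
```

Here the single edge a-b never occurs without b-c, so growing it separately is redundant, while the two-edge pattern occurs in one graph that lacks c-d and therefore must still be kept and grown.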
References & Further Reading
B. McKay, Practical graph isomorphism, Congressus Numerantium, 30:45–87, 1981
M. Wörlein, T. Meinl, I. Fischer, and M. Philippsen, A quantitative comparison of the subgraph miners MoFa, gSpan, FFSM, and Gaston, PKDD 2005
X. Yan and J. Han, gSpan: graph-based substructure pattern mining, ICDM 2002
X. Yan and J. Han, CloseGraph: mining closed frequent graph patterns, KDD 2003
More References
C. Borgelt and M. R. Berthold, Mining molecular fragments: finding relevant substructures of molecules, ICDM 2002
C. Chen, C. X. Lin, X. Yan, and J. Han, On effective presentation of graph patterns: a structural representative approach, CIKM 2008
Y. Chi, Y. Xia, Y. Yang, and R. Muntz, Mining closed and maximal frequent subtrees from databases of labeled rooted trees, TKDE 2005
T. Horváth, J. Ramon, and S. Wrobel, Frequent subgraph mining in outerplanar graphs, KDD 2006
J. Huan, W. Wang, and J. Prins, Efficient mining of frequent subgraphs in the presence of isomorphism, ICDM 2003
J. Huan, W. Wang, J. Prins, and J. Yang, SPIN: Mining maximal frequent subgraphs from graph databases, KDD 2004
A. Inokuchi, T. Washio, and H. Motoda, An apriori-based algorithm for mining frequent substructures from graph data, PKDD 2000
R. King, A. Srinivasan, and L. Dehaspe, WARMR: a data mining tool for chemical data, J. Comput. Aided Mol. Des. 2001
M. Kuramochi and G. Karypis, Frequent subgraph discovery, ICDM 2001
S. Nijssen and J. Kok, A quickstart in frequent structure mining can make a difference, KDD 2004
N. Vanetik, E. Gudes, and S. E. Shimony, Computing frequent graph patterns from semistructured data, ICDM 2002
D. Xin, H. Cheng, X. Yan, and J. Han, Extracting redundancy-aware top-k patterns, KDD 2006
X. Yan, H. Cheng, J. Han, and P. S. Yu, Mining significant graph patterns by leap search, SIGMOD 2008