1
Seminar in Bioinformatics
An efficient algorithm for detecting
frequent subgraphs in biological
networks
Paper by: M. Koyuturk, A. Grama and W. Szpankowski
Appeared in: Bioinformatics, Vol. 20, Sup. 1, 2004, pages i200-i207.
Presented by: Royi Ronen
2
Abstract
• Motivation
– Network interaction data are abundant
– Analyzing these data is important
– The problems involved are closely related to the subgraph isomorphism problem, which is NP-hard
• Results
– An efficient algorithm for detecting frequently occurring patterns in biological networks
– The algorithm simplifies the subgraph isomorphism problem into a different, tractable problem that still has biological applications
– Mining the KEGG database yields positive empirical results
3
Outline
• Introduction
• Model
• Approach: Graph Mining
– Related Work
– Formalism for metabolic pathways
– The Algorithm
• Discussion and Empirical Results
• Conclusion
• Future Work
4
Introduction
• Experimental data on biological sequences, which are widely available and accessible, play an important role in tasks such as discovering common sequences and motifs
• Biomolecular interaction data are abstracted as graphs
– Example: a hypergraph can represent a metabolic pathway, with nodes representing compounds
– This hypergraph can be reduced to a directed graph whose nodes are enzymes and whose edges relate enzymes to one another
5
Introduction
• Key problems in this context:
– Aligning multiple graphs
– Finding frequently occurring subgraphs in a collection of graphs
• A solution can lead to a better understanding of
– Motifs of cellular interactions
– Evolutionary relationships
– Differences between networks in different organisms
– Patterns of gene regulation
6
Introduction
• In the paper
– Finding frequently occurring subgraphs in a collection of graphs, each representing a metabolic pathway
– Closely related to the NP-hard subgraph isomorphism problem
– End of story?
• No!
– The problem can be simplified and made tractable while still capturing the biological information
– Nodes will be “uniquely labeled”, according to the represented enzyme
– Experimental results: discovering “interesting” patterns from KEGG takes seconds
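The key point of the slide is that unique labels remove the matching problem: since each enzyme label occurs at most once per graph, a pattern occurs in a graph exactly when its edge set is a subset of the graph's edge set. The Python sketch below illustrates only this containment test and the resulting frequency count, not the paper's mining algorithm. Representing each pathway as a frozenset of (label, label) edges, and the names `occurs_in` and `frequency`, are assumptions made for illustration; patterns are given by their edges, which suffices for connected patterns with at least one edge.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: a pathway is a set of directed edges between enzyme
# labels. Each label occurs at most once per graph, so it identifies its node.
Edge = Tuple[str, str]
Graph = FrozenSet[Edge]


def occurs_in(pattern: Graph, graph: Graph) -> bool:
    """With unique node labels, the pattern occurs in the graph iff its edges are a subset."""
    return pattern <= graph


def frequency(pattern: Graph, collection: List[Graph]) -> int:
    """Number of graphs in the collection that contain the pattern."""
    return sum(1 for g in collection if occurs_in(pattern, g))


if __name__ == "__main__":
    g1 = frozenset({("a", "b"), ("b", "c")})
    g2 = frozenset({("a", "b")})
    g3 = frozenset({("c", "d")})
    pattern = frozenset({("a", "b")})
    print(frequency(pattern, [g1, g2, g3]))  # 2
```

Because the test is a set inclusion, checking one pattern against one graph takes time roughly linear in the pattern's size (on average, with hash sets), rather than the exponential worst case of general subgraph isomorphism.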
7
Outline
• Introduction ☺
• Model
• Approach: Graph Mining
– Related Work
– Formalism for metabolic pathways
– The Algorithm
• Discussion and Empirical Results
• Conclusion
• Future Work
8
Metabolic Pathways
• The oldest kind of biological network
• Group together the reactions that belong to a process
• Publicly available (e.g., KEGG)
• Chemical compounds are linked to each other by product-substrate relationships
• In a hypergraph
– Nodes are compounds
– A hyperedge is a reaction (or an enzyme)
– Hyperedge direction is important for distinguishing substrates from products
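To make the role of direction concrete, here is a minimal Python sketch of a directed hypergraph in which each hyperedge is one reaction, tagged with its enzyme and leading from a set of substrates to a set of products. The `Hyperedge` class, the helper names and the compound and enzyme labels are hypothetical. The point is that keeping substrates and products as separate sets is what lets us ask which reactions produce a compound and which consume it.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Hyperedge:
    """A directed hyperedge (hypothetical encoding): one reaction, catalysed by
    an enzyme, leading from a set of substrates to a set of products."""
    enzyme: str
    substrates: FrozenSet[str]
    products: FrozenSet[str]


def producers(compound: str, edges: List[Hyperedge]) -> List[str]:
    """Enzymes whose reactions have the compound among their products."""
    return [e.enzyme for e in edges if compound in e.products]


def consumers(compound: str, edges: List[Hyperedge]) -> List[str]:
    """Enzymes whose reactions take the compound as a substrate."""
    return [e.enzyme for e in edges if compound in e.substrates]


if __name__ == "__main__":
    pathway = [
        Hyperedge("E1", frozenset({"a"}), frozenset({"b", "c"})),
        Hyperedge("E2", frozenset({"b"}), frozenset({"d"})),
    ]
    print(producers("b", pathway))  # ['E1']
    print(consumers("b", pathway))  # ['E2']
```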
9
Metabolic Pathways
• Simplification:
– An ordinary directed graph (not a hypergraph): nodes represent enzymes, and an edge connects enzyme a to enzyme b iff a product of a is a substrate of b (more precisely, iff such a product-substrate relation exists)
– Edges may be labeled by the compound that relates a to b
– A specific enzyme may appear more than once in the same pathway; all its occurrences are merged into a single node, at the price of losing temporal information
• Various problems related to understanding molecular interactions in the cell can be addressed with graph-based frameworks, mostly as a means of investigating units with well-defined functionality
• Paper focus: mining pathways for frequent connected subgraphs, which matters because functional modules are expected to recur across several pathways, several organisms, or both
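The simplification above can be sketched in Python as follows, assuming each reaction is given as a hypothetical (enzyme, substrates, products) tuple. The function adds an edge a → b whenever some product of a is a substrate of b, and labels the edge with the linking compounds; merging all reactions of an enzyme into one node mirrors the loss of temporal information noted on the slide. The names are illustrative, not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical input format: (enzyme, substrates, products).
Reaction = Tuple[str, FrozenSet[str], FrozenSet[str]]


def enzyme_graph(reactions: List[Reaction]) -> Dict[Tuple[str, str], Set[str]]:
    """Directed edge a -> b iff some product of a is a substrate of b.

    Each edge maps to the set of compounds linking a to b (the edge labels).
    All occurrences of the same enzyme are merged into one node.
    """
    produced_by: Dict[str, Set[str]] = defaultdict(set)
    consumed_by: Dict[str, Set[str]] = defaultdict(set)
    for enzyme, substrates, products in reactions:
        produced_by[enzyme] |= products
        consumed_by[enzyme] |= substrates

    edges: Dict[Tuple[str, str], Set[str]] = {}
    for a, prods in produced_by.items():
        for b, subs in consumed_by.items():
            shared = prods & subs  # compounds made by a and used by b
            if shared:
                edges[(a, b)] = shared
    return edges
```

For reactions E1: x → y, E2: y → z and E3: {y, z} → w, the sketch yields the edges E1 → E2 (via y), E1 → E3 (via y) and E2 → E3 (via z). The double loop over enzymes is the simplest choice for a sketch; an index from each compound to the enzymes consuming it would avoid comparing every pair.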
10
Outline
• Introduction ☺
• Model ☺
• Approach: Graph Mining
– Related Work
– Formalism for metabolic pathways
– The Algorithm
• Discussion and Empirical Results
• Conclusion
• Future Work
11
Related Work
• Subgraph isomorphism
– Unlabeled version: its hardness is usually “tackled” by ordering nodes and edges for efficient processing
– Labeled version: easier, and suitable for biological networks
• Frequent itemset mining
– Given: multiple sets of items (transactions) over a domain D
– Itemset X implies itemset Y with confidence c if c% of the sets containing X also contain Y
– The rule X→Y has support s if s% of the sets contain both X and Y
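The two definitions above can be checked with a small Python sketch, assuming transactions are given as frozensets of items. Following the slide, both measures are returned as percentages, and the support of a rule X→Y is computed as the support of the combined itemset X ∪ Y. The function names are assumptions for illustration.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Percentage of transactions containing every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return 100.0 * hits / len(transactions)


def rule_support(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """Support of the rule X -> Y: percentage of transactions containing both X and Y."""
    return support(x | y, transactions)


def confidence(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """Percentage of the transactions containing X that also contain Y."""
    with_x = [t for t in transactions if x <= t]
    if not with_x:
        return 0.0
    return 100.0 * sum(1 for t in with_x if y <= t) / len(with_x)


if __name__ == "__main__":
    db = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
    x, y = frozenset("a"), frozenset("b")
    print(rule_support(x, y, db))  # 50.0
    print(confidence(x, y, db))
```

In this toy database, three transactions contain a and two of those also contain b, so the rule {a}→{b} has 50% support and roughly 66.7% confidence.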
12
Graph Formalism for Metabolic Pathways
• A metabolic pathway is a triplet P(M, Z, R)
– M, a set of metabolites
– Z, a set of enzymes
– R, a set of reactions, where each reaction r is associated with
• A set of enzymes Z(r) from Z
• A set of substrates S(r) from M
• A set of products T(r) from M
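The triplet definition maps naturally onto two record types. The Python sketch below is one possible encoding, assuming string identifiers for metabolites and enzymes; the class names `Reaction` and `Pathway` are hypothetical. The constructor check enforces the constraints from the slide: Z(r) must be drawn from Z, and S(r) and T(r) from M.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Reaction:
    """One reaction r of a pathway (hypothetical encoding)."""
    enzymes: FrozenSet[str]     # Z(r), drawn from Z
    substrates: FrozenSet[str]  # S(r), drawn from M
    products: FrozenSet[str]    # T(r), drawn from M


@dataclass(frozen=True)
class Pathway:
    """A metabolic pathway P(M, Z, R)."""
    metabolites: FrozenSet[str]      # M
    enzymes: FrozenSet[str]          # Z
    reactions: Tuple[Reaction, ...]  # R

    def __post_init__(self) -> None:
        # Every reaction may only refer to enzymes in Z and metabolites in M.
        for r in self.reactions:
            if not r.enzymes <= self.enzymes:
                raise ValueError(f"unknown enzymes: {set(r.enzymes - self.enzymes)}")
            unknown = (r.substrates | r.products) - self.metabolites
            if unknown:
                raise ValueError(f"unknown metabolites: {set(unknown)}")


if __name__ == "__main__":
    r1 = Reaction(frozenset({"E1"}), frozenset({"a"}), frozenset({"b"}))
    p = Pathway(frozenset({"a", "b"}), frozenset({"E1"}), (r1,))
    print(len(p.reactions))  # 1
```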
13
Graph Formalism for Metabolic Pathways