Comparative Analysis of Molecular Interaction Networks
Ananth Grama
Coordinated Systems Lab and Computer Science Department, Purdue University
http://www.cs.purdue.edu/people/ayg
Various parts of this work involve collaborations with Priya Vashishtha, Rajiv Kalia, Aiichiro Nakano (USC), William Goddard (CalTech), Yohan Kim and Shankar Subramaniam (UCSD), Vipin Kumar (UMN), Eric Jakobsson (UIUC), Larry Scott (IIT), Ahmed Sameh, Mete Sozen, Mehmet Koyuturk, Suresh Jagannathan (Purdue).
This work is supported by the National Institutes of Health, National Science Foundation, Department of Energy, and Intel.
• Algorithms for Analyzing Molecular Interaction Networks
– Analyzing biological networks for conserved molecular interaction
patterns
– Pairwise Alignment of protein-protein interaction networks
– Probabilistic models/analyses for assessing statistical significance
• Computational Synthesis of Interaction Networks
– Inferring function from domain co-evolution
• Ongoing Work
Ananth Grama 2001/01/27
Lab Overview
• Development of algorithmic and software substrates to solve fundamental problems in science and engineering.
• Research transcends software infrastructure (compilers, OS), algorithms (numerical and combinatorial), platforms (motes to petascale), and software (libraries to services).
• We focus on problems at the core of computing, but measure the value of our work in terms of its impact on science and engineering applications.
• All of our projects are in close collaboration with domain experts.
Lab Overview: Sample Projects
Model Reduction and Control of Large Structures
• Virtually all (new) large structures have some form of passive or semi-active control mechanism.
Pictured left is a passive fluid damper with bottom casing containing the bearings and oil used to absorb seismic energy. Pictured right is a semi-active damper with variable orifice damping (picture credits: Steven Williams).
The Dongting Lake Bridge is being retrofitted with MR dampers to control wind-induced vibration.
Objective: Develop the computational infrastructure to enable the next generation of civil infrastructure.
• Real-time sensing and actuation
• Model reduction and control
• In-network computation of control vectors
A critical component of embedded systems is the effort associated with application development. Our COSMOS environment fundamentally addresses this bottleneck.
• Cooperative tasks require all participating units
– Selective pressure on preserving interactions & interacting proteins
– Interacting proteins follow similar evolutionary trajectories [Pellegrini et al.,
PNAS, 1999]
• Orthologs of interacting proteins are likely to interact [Wagner,
Mol. Bio. Evol., 2001]
– Conservation of interactions may provide clues relating to conservation
of function
• Modular conservation and alignment hold the key to critical structural, functional, and evolutionary concepts in systems biology
Conserved Interaction Patterns
• Given a collection of interaction networks (belonging to different species), find sub-networks that are common to an interesting subset of these networks [Koyuturk, Grama, & Szpankowski, ISMB, 2004]
– A sub-network is a group of interactions that are tied to each other
(connected)
– Frequency, the number of networks that contain a sub-network, is a coarse measure of statistical significance
• Computational challenges
– How to relate molecules (proteins) in different organisms?
– Requires solution of the intractable subgraph isomorphism problem
– Must be scalable to potentially large number of networks
– Networks are large [in the range of 10K edges]
Graph Analysis
[Figure: a network database and the interaction patterns that are common to all networks]
Relating Proteins in Different Species
• Ortholog Databases
– PPI networks: COG, Homologene, Pfam, ADDA
– Metabolic pathways: Enzyme nomenclature
– Reliable, but conservative
– Domain families rely on domain information, but the underlying domains
for most interactions are unknown ⇒ Multiple node labels
• Sequence Clustering
– Cluster protein sequences and label proteins according to this clustering
– Flexible, but expensive and noisy
• Labels may span a large range of functional relationships, from protein families to ortholog groups
– Without loss of generality, we refer to identically labeled proteins as orthologs
Problem Statement
• Given a set of proteins V, a set of interactions E, and a many-to-many mapping from V to a set of ortholog groups L = {l1, l2, ..., ln}, the corresponding interaction network is a labeled graph G = (V, E, L).
– v ∈ V(G) is associated with a set of ortholog groups L(v) ⊆ L.
– uv ∈ E(G) represents an interaction between u and v.
• S is a sub-network of G, i.e., S ⊑ G, if there is an injective mapping φ : V(S) → V(G) such that for all v ∈ V(S), L(v) ⊆ L(φ(v)) and for all uv ∈ E(S), φ(u)φ(v) ∈ E(G).
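The sub-network definition above can be checked directly on small inputs. The following is a minimal brute-force sketch, assuming undirected interactions and a dictionary representation of the labeling; the function name `is_subnetwork` and the toy proteins `p1`..`p3` are hypothetical and do not come from the slides. It tries every injective mapping φ, so it is only practical for very small patterns.

```python
from itertools import permutations

def is_subnetwork(s_nodes, s_edges, g_nodes, g_edges):
    """Return True if S is a sub-network of G under the definition above.

    s_nodes / g_nodes: dict mapping node -> set of ortholog-group labels.
    s_edges / g_edges: iterable of (u, v) pairs, treated as undirected
    (an assumption of this sketch).
    """
    g_edge_set = {frozenset(e) for e in g_edges}
    s_list = list(s_nodes)
    # Try every injective assignment of S's nodes to distinct nodes of G.
    for image in permutations(g_nodes, len(s_list)):
        phi = dict(zip(s_list, image))
        # Label condition: L(v) must be a subset of L(phi(v)).
        labels_ok = all(s_nodes[v] <= g_nodes[phi[v]] for v in s_list)
        # Edge condition: every edge uv of S maps to an edge of G.
        edges_ok = all(frozenset((phi[u], phi[v])) in g_edge_set
                       for u, v in s_edges)
        if labels_ok and edges_ok:
            return True
    return False

# Hypothetical toy network: proteins p1..p3 with ortholog-group labels.
G_NODES = {"p1": {"l1"}, "p2": {"l2", "l3"}, "p3": {"l1"}}
G_EDGES = [("p1", "p2"), ("p2", "p3")]
# Pattern: a protein labeled l1 interacting with a protein labeled l2.
S_NODES = {"x": {"l1"}, "y": {"l2"}}
S_EDGES = [("x", "y")]
```

Calling `is_subnetwork(S_NODES, S_EDGES, G_NODES, G_EDGES)` tests whether the pattern occurs in the toy network. The exhaustive search over mappings is what makes the general problem expensive.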
Computational Problem
• Conserved sub-network discovery
– Instance: A set of interaction networks G = {G1 = (V1, E1, L), G2 = (V2, E2, L), ..., Gm = (Vm, Em, L)}, each belonging to a different organism, and a frequency threshold σ∗.
– Problem: Let H(S) = {Gi : S ⊑ Gi} be the occurrence set of graph S. Find all connected subgraphs S such that |H(S)| ≥ σ∗, i.e., S is a frequent subgraph in G, and for all S′ ⊐ S, H(S) ≠ H(S′), i.e., S is maximal.
Algorithmic Insight: Ortholog Contraction
• Contract orthologous nodes into a single node
• No subgraph isomorphism
– Graphs are uniquely identified by their edge sets
• Key observation: Frequent sub-networks are preserved ⇒ No information loss
– Sub-networks that are frequent in general graphs are also frequent in
their ortholog-contracted representation
– Ortholog contraction is a powerful pruning heuristic
• Discovered frequent sub-networks are still biologically interpretable!
– Interaction between proteins becomes interaction between ortholog groups
– Global labeling by enzyme nomenclature (EC numbers)
– A directed edge from one enzyme to the other implies that the second
consumes a product of the first
[Figure: example pathway with enzyme nodes labeled by EC number (5.1.3.3, 2.7.1.1, 2.7.1.2, 2.7.1.63)]
Ortholog Contraction in PPI Networks
• Interaction between proteins → Interaction between ortholog groups or protein families
[Figure: proteins Rrp4, Rrp43, Mtr3, Ski6, and Csr4 and the ortholog groups KOG3013, KOG1613, KOG1068, and KOG3409]
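Ortholog contraction can be sketched in a few lines. This is a minimal illustration, assuming undirected interactions and assuming that a protein carrying several labels contributes an edge for every pair of labels across an interaction; the function name `contract` and the toy proteins are hypothetical and are not taken from the figure.

```python
def contract(labels, edges):
    """Contract a labeled interaction network by ortholog group.

    labels: dict mapping protein -> set of ortholog-group labels.
    edges:  iterable of (protein, protein) interactions (undirected).
    Returns the contracted graph as a set of unordered label pairs.
    An interaction inside one group yields a one-element frozenset,
    i.e. a self-loop on that group.
    """
    contracted = set()
    for u, v in edges:
        for lu in labels[u]:
            for lv in labels[v]:
                contracted.add(frozenset((lu, lv)))
    return contracted

# Hypothetical toy network: A1 and A2 share group "a"; X1 has two labels.
LABELS = {"A1": {"a"}, "A2": {"a"}, "B1": {"b"}, "X1": {"b", "c"}}
EDGES = [("A1", "B1"), ("A2", "X1")]
CONTRACTED = contract(LABELS, EDGES)
```

Here the two interactions collapse onto the group-level edges a-b and a-c, and the duplicate a-b edge disappears because the result is a set. This is why a contracted graph can be treated as a plain edge set.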
Preservation of Sub-networks
Theorem: Let Ḡ be the ortholog-contracted graph obtained by contracting the orthologous nodes of network G. Then, if S is a subgraph of G, S̄ is a subgraph of Ḡ.
Corollary: The ortholog-contracted representation of any frequent sub-network is also frequent in the set of ortholog-contracted graphs.
[Figure: a network G and its ortholog-contracted graph Ḡ]
Simplifying the Graph Analysis Problem
• Observation: An ortholog-contracted graph is uniquely determined by the set of its edges.
– Conserved Sub-network Discovery Problem → Frequent Edge set
Discovery Problem
[Figure: four ortholog-contracted graphs G1, G2, G3, and G4 over nodes a, b, c, d, e; their edge sets are F1 to F4]
F1= {ab, ac, de}
F2= {ab, ac, bc, de, ea}
F3= {ab, ac, bc, ea}
F4= {ab, ce, de, ea}
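With graphs reduced to edge sets, containment becomes a subset test. The sketch below uses the edge sets F1 to F4 exactly as listed above and computes the occurrence set H of a candidate edge set; the names `EDGE_SETS` and `occurrence_set` are hypothetical, and edges are kept as the two-letter strings used on the slide (so "ea" is not normalized to "ae").

```python
# Edge sets F1..F4 copied from the example above, keyed by graph name.
EDGE_SETS = {
    "G1": {"ab", "ac", "de"},
    "G2": {"ab", "ac", "bc", "de", "ea"},
    "G3": {"ab", "ac", "bc", "ea"},
    "G4": {"ab", "ce", "de", "ea"},
}

def occurrence_set(pattern, edge_sets=EDGE_SETS):
    """Return the names of the graphs whose edge set contains the pattern."""
    pattern = set(pattern)
    return {name for name, edges in edge_sets.items() if pattern <= edges}

# Occurrence sets of two candidate sub-networks.
H_AB_AC = occurrence_set({"ab", "ac"})
H_BC_EA = occurrence_set({"bc", "ea"})
```

The frequency of a pattern is the size of its occurrence set, for example `len(H_AB_AC)`. No node mapping is searched, in contrast to the general subgraph isomorphism formulation.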
Extending Frequent Itemset Mining to Graph Analysis
• Given a set of transactions, find sets of items that are frequent in these transactions
– Extensively studied in data mining literature
• Algorithms exploit downward closure property
– An edge set is frequent only if all of its subsets are frequent
– Generate edge sets (sub-networks) from small to large, pruning supersets of infrequent sets
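The level-wise search with downward-closure pruning can be sketched as a generic Apriori-style loop over edge sets. This is an illustrative sketch rather than the authors' algorithm: it does not enforce connectivity or maximality of the reported sub-networks, and the function name `frequent_edge_sets` and its parameters are hypothetical.

```python
from itertools import combinations

def frequent_edge_sets(edge_sets, min_support):
    """Level-wise (Apriori-style) search for frequent edge sets.

    edge_sets:   list of sets of edges, one per ortholog-contracted graph.
    min_support: minimum number of graphs that must contain an edge set.
    Returns a dict mapping each frequent frozenset of edges to its support.
    """
    def support(candidate):
        return sum(1 for edges in edge_sets if candidate <= edges)

    # Level 1: frequent single edges.
    level = {}
    for edge in sorted(set().union(*edge_sets)):
        single = frozenset([edge])
        count = support(single)
        if count >= min_support:
            level[single] = count
    frequent = dict(level)

    k = 1
    while level:
        k += 1
        # Join step: unions of two frequent (k-1)-sets that have exactly k edges.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        next_level = {}
        for cand in candidates:
            # Prune step (downward closure): every (k-1)-subset must be frequent.
            if all(frozenset(sub) in level for sub in combinations(cand, k - 1)):
                count = support(cand)
                if count >= min_support:
                    next_level[cand] = count
        frequent.update(next_level)
        level = next_level
    return frequent

# Edge sets of four small contracted graphs (two-letter strings as edges).
GRAPHS = [
    {"ab", "ac", "de"},
    {"ab", "ac", "bc", "de", "ea"},
    {"ab", "ac", "bc", "ea"},
    {"ab", "ce", "de", "ea"},
]
RESULT = frequent_edge_sets(GRAPHS, min_support=3)
```

With a threshold of 3, no three-edge candidate survives the prune step, because each one contains an infrequent pair. A connectivity filter would then be applied to the surviving sets, since only connected edge sets correspond to sub-networks.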
• Mehmet Koyuturk, Wojciech Szpankowski, and Ananth Grama, Assessing Significance of Connectivity and Conservation in Protein Interaction Networks, Journal of Computational Biology (in press).
• Mehmet Koyuturk, Yohan Kim, Shankar Subramaniam, Wojciech Szpankowski, and Ananth Grama, Detecting Conserved Interaction Patterns in Biological Networks, Journal of Computational Biology, 13(7), pp. 1299-1322, 2006.
• Mehmet Koyuturk, Yohan Kim, Umut Topkara, Shankar Subramaniam, Wojciech Szpankowski, and Ananth Grama, Pairwise Alignment of Protein Interaction Networks, Journal of Computational Biology, 13(2), pp. 182-199, 2006.
• Yohan Kim, Mehmet Koyuturk, Umut Topkara, Ananth Grama, and Shankar Subramaniam, Inferring Functional Information from Domain Co-evolution, Bioinformatics, 22(1), pp. 40-49, 2006.
References
• Mehmet Koyuturk, Ananth Grama, and Wojciech Szpankowski, An Efficient Algorithm for Detecting Frequent Subgraphs in Biological Networks, Bioinformatics, Vol. 20, Suppl. 1, pp. i200-i207, 2004.
• Mehmet Koyuturk, Ananth Grama, and Wojciech Szpankowski, Assessing Significance of Connectivity and Conservation in Protein Interaction Networks, 10th International Conference on Research in Computational Molecular Biology (RECOMB), LNBI 3909, pp. 45-59, 2006.
• Mehmet Koyuturk, Ananth Grama, and Wojciech Szpankowski, Pairwise Local Alignment of Protein Interaction Networks Guided by Models of Evolution, RECOMB 2005.
• Can we derive rules in terms of GO terms, e.g., Pi → Pj ⊣ Pk?
– Statistical challenge: Such patterns have to be significantly abundant
– Computational challenge: When statistical significance is the basis
(as opposed to frequency), monotonicity properties (e.g., downward
closure) no longer hold!
– Our approach: conditional significance, i.e., evaluate significance of a
pattern based on the background constructed by its substructures
• Final goal: Database of (computationally derived) canonical modules and pathways
A network of GO terms [Tong et al., Science, 2004]
Cell as a State Machine
• Signaling pathways can be modeled as a series of transitions between states of protein or peptide molecules, non-protein molecules, (non-)protein complexes, and modules
– Signaling Gateway provides a database of network states for proteins; a mirror is available to our group via our collaboration with S. Subramaniam
• Constructing signaling pathways from state information for individual molecules
– Smallest common supergraph problem
– Identification of specified pathways
State diagram for Cdk8 protein (from Molecule Pages)
Modular Phylogenetics
[Figure: the module (CTF18, CTF8, RFC2, RFC3, RFC4, RFC5) and the phylogenetic profiles of its proteins; genome columns: PF TB EC SP SC DH KL CG YL AG CE HS PT MM RN GG DR FR TN AM AN DM TP AT OS]
Replication Factor C complex identified on the yeast PPI network by the MCODE algorithm [Bader & Hogue, BMC Bioinformatics, 2003], and the phylogenetic profiles of its proteins on 25 eukaryotic genomes
Conserved in all eukaryotic species!
[Figure: the module (YDR116C, MRPL19, YML025C, MRPL9, MHR1, MRP20, YDR115W, MRP8) and the phylogenetic profiles of its proteins; genome columns: PF TB EC SP SC DH KL CG YL AG CE HS PT MM RN GG DR FR TN AM AN DM TP AT OS]
A component of the mitochondrial ribosome identified on the yeast PPI network by the MCODE algorithm, and the phylogenetic profiles of its proteins on 25 eukaryotic genomes
Conserved in only yeast species!
• Models and algorithms for quantifying, analyzing, and evaluating modular conservation and divergence across species