10. Lecture WS 2004/05
Bioinformatics III 1
Protein complexes and their shared components
- Most cellular processes result from a cascade of events mediated by proteins
that act in a cooperative manner.
- Protein complexes can share components: proteins can be reused and
participate in several complexes.
Methods for analyzing high-throughput protein interaction data have mainly used
clustering techniques.
They have been applied to assign protein function by inference from the biological
context as given by their interactors, and to identify complexes as dense regions
of the network (see V9).
The logical organization into shared and specific components, and its
representation, remain elusive.
Gagneur et al. Genome Biology 5, R57 (2004)
Shared components
Shared components (proteins or groups of proteins occurring in different
complexes) are fairly common:
A shared component may be a small part of many complexes, acting as a unit that
is constantly reused for its function.
Alternatively, it may be the main part of a complex, e.g. in a family of variant
complexes that differ from each other by distinct proteins that provide functional specificity.
Aim: identify and properly represent the modularity of protein-protein interaction
networks by identifying the shared components and the way they are arranged to
Because the elements of a module have exactly the same neighbors outside the module,
one can substitute a single representative node for all of them.
In a quotient, all elements of the module are replaced by the representative node,
and the edges with the neighbors are replaced by edges to the representative.
Quotients can be iterated until the entire graph is merged into a final
representative node.
Iterated quotients can be captured in a tree, where each node represents a
module, which is a subset of its parent and the set of its descendant leaves.
Gagneur et al. Genome Biology 5, R57 (2004)
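The quotient operation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the adjacency-dictionary encoding, the function names `is_module` and `quotient`, and the small example graph are assumptions made here, not taken from the lecture or from Gagneur et al.

```python
def is_module(adj, module):
    """True if every vertex outside `module` sees all of it or none of it."""
    module = set(module)
    for x in set(adj) - module:
        seen = adj[x] & module
        if seen and seen != module:
            return False
    return True


def quotient(adj, module, rep):
    """Replace all vertices of `module` by the single representative `rep`."""
    module = set(module)
    if not module or not is_module(adj, module):
        raise ValueError("not a (non-empty) module")
    outside = set(adj) - module
    # Keep the edges among outside vertices, drop edges into the module.
    new_adj = {v: {u for u in adj[v] if u not in module} for v in outside}
    new_adj[rep] = set()
    # All module members share the same outside neighbors, so any one of
    # them tells us which outside vertices must be linked to `rep`.
    anchor = next(iter(module))
    for v in outside:
        if v in adj[anchor]:
            new_adj[v].add(rep)
            new_adj[rep].add(v)
    return new_adj


# Hypothetical example graph (not the one shown on the slides).
adj = {
    "a": {"b", "d"},
    "b": {"a", "d"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}
q = quotient(adj, {"a", "b"}, "ab")
```

Here `{a, b}` is a module because `d` sees both members and `e` sees neither, so the quotient keeps the edge `d`-`e` and adds a single edge `ab`-`d`.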
Modular decomposition
Modular decomposition of the
example graph shown before.
Modular decomposition gives a
labeled tree that represents iterations
of particular quotients, here the
successive quotients on the modules
{a,b,c} and {e,f}.
The modular decomposition is a
unique, canonical tree of iterated
quotients
(a formal proof exists: Möhring 1985).
Gagneur et al. Genome Biology 5, R57 (2004)
Modular decomposition
The nodes of the modular decomposition
are labeled in 3 ways:
As series when the direct descendants
are all neighbors of each other,
as parallel when the direct descendants
are all non-neighbors of each other,
and by the structure of the module
otherwise (prime module case).
Gagneur et al. Genome Biology 5, R57 (2004)
Series are labeled by an asterisk within a circle, parallel by two parallel lines within a circle,
and prime by a P within a circle. The prime is advantageously labeled by its structure.
The graph can be retrieved from the tree on the right by recursively expanding the modules
using the information in the labels. Therefore, the labeled tree can be seen as an exact
alternative representation of the graph.
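The recursive expansion of the labeled tree back into the graph can be sketched as follows. The tree encoding (a leaf is a string, an inner node is a `(label, children)` tuple) and the function names are assumptions made for this illustration. Only series and parallel nodes are handled; a prime node would additionally need its stored structure, which this sketch does not model.

```python
def leaves(tree):
    """All leaf names below a tree node."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [x for child in children for x in leaves(child)]


def expand(tree):
    """Return the edge set (as frozensets) of the graph encoded by the tree."""
    if isinstance(tree, str):
        return set()
    label, children = tree
    edges = set()
    for child in children:
        edges |= expand(child)
    if label == "series":
        # Series: every leaf of one child is adjacent to every leaf
        # of every other child.
        groups = [leaves(c) for c in children]
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                edges |= {frozenset((u, v))
                          for u in groups[i] for v in groups[j]}
    elif label != "parallel":
        # Parallel adds no edges; anything else is unsupported here.
        raise ValueError("only 'series' and 'parallel' are supported")
    return edges


# Hypothetical tree: (a parallel b) in series with c.
tree = ("series", [("parallel", ["a", "b"]), "c"])
edges = expand(tree)
```

For this tree the expansion yields the two edges `a`-`c` and `b`-`c`, and no edge between `a` and `b`.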
Results from protein complex purifications (PCP), e.g. TAP
Different types of data:
- Y2H: detects direct physical interactions between proteins
- PCP by tandem affinity purification with mass-spectrometric identification of the
protein components: identifies multi-protein complexes
Modular decomposition will have a different meaning for each data type because
such graphs have different semantics.
Here, the analysis focuses on PCP content.
PCP experiment: select a bait protein, to which the TAP tag is attached. The bait
co-purifies with those proteins that co-occur with it in at least one
complex.
In the future, an integrated view combining both types of data would be preferable.
Gagneur et al. Genome Biology 5, R57 (2004)
Clique and maximal clique
A clique is a fully connected sub-graph, that is, a set
of nodes that are all neighbors of each other.
In this example, the whole graph is a clique and
consequently any subset of it is also a clique, for
example {a,c,d,e} or {b,e}. A maximal clique is a
clique that is not contained in any larger clique. Here
only {a,b,c,d,e} is a maximal clique.
Gagneur et al. Genome Biology 5, R57 (2004)
Assuming complete datasets and ideal results, a permanent complex will appear
as a clique.
The converse is not true: not every clique in the network necessarily derives from
an existing complex. E.g. three mutually connected proteins can be the outcome of a single
trimer, three heterodimers, or combinations thereof.
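The clique definitions above can be checked by brute force on the five-node example, where every pair of nodes is connected. This is a minimal sketch for small graphs only; the function names and the adjacency-dictionary encoding are assumptions of this example, and the enumeration is exponential in the number of nodes.

```python
from itertools import combinations


def is_clique(adj, nodes):
    """True if all nodes in `nodes` are pairwise neighbors."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))


def maximal_cliques(adj):
    """Brute force: all cliques that are not contained in a larger clique."""
    vertices = sorted(adj)
    cliques = [set(c)
               for k in range(1, len(vertices) + 1)
               for c in combinations(vertices, k)
               if is_clique(adj, c)]
    return [c for c in cliques if not any(c < d for d in cliques)]


# Complete graph on a..e: the whole graph is a clique.
nodes = "abcde"
adj = {u: {v for v in nodes if v != u} for u in nodes}

sub_ok = is_clique(adj, {"a", "c", "d", "e"})   # a subset of a clique
maximal = maximal_cliques(adj)                  # only the full node set
```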
Results from protein complex purifications (PCP), e.g. TAP
Interpretation of graph and module labels
for systematic PCP experiments.
(a) Two neighbors in the network are
proteins occurring in the same complex.
(b) Several potential sets of complexes
can be the origin of the same observed
network. Restricting interpretation to the
simplest model (top right), the series
module reads as a logical AND between
its members.
(c) A module labeled 'parallel'
corresponds to proteins or modules
working as strict alternatives with respect
to their common neighbors.
(d) The 'prime' case is a structure where
neither of the two previous cases occurs.
Gagneur et al. Genome Biology 5, R57 (2004)
Obtain maximal cliques
Modular decomposition provides an instruction set to deliver all maximal cliques
of a graph.
In particular, when the decomposition has only series and parallels, the maximal
cliques are straightforwardly retrieved by traversing the tree recursively from top
to bottom.
A series module acts as a product: the maximal cliques are all the combinations
made up of one maximal clique from each "child" node.
A parallel module acts as a sum: the set of maximal cliques is the union of all
maximal cliques from the "child" nodes.
Gagneur et al. Genome Biology 5, R57 (2004)
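The product and sum rules can be written as a short recursion over the decomposition tree. The sketch below assumes a tree without prime nodes, encoded with leaves as strings and inner nodes as `(label, children)` tuples; this encoding and the name `maximal_cliques` are illustrative choices, not part of the original material.

```python
from itertools import product


def maximal_cliques(tree):
    """Maximal cliques of the graph encoded by a series/parallel tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]          # a leaf is its own clique
    label, children = tree
    child_cliques = [maximal_cliques(c) for c in children]
    if label == "parallel":
        # Sum: union of the children's maximal cliques.
        return [c for cliques in child_cliques for c in cliques]
    if label == "series":
        # Product: one maximal clique from each child, merged together.
        return [frozenset().union(*combo)
                for combo in product(*child_cliques)]
    raise ValueError("prime nodes are not handled in this sketch")


# Hypothetical tree: (a | b) AND c AND (d | e).
tree = ("series", [("parallel", ["a", "b"]), "c", ("parallel", ["d", "e"])])
cliques = maximal_cliques(tree)
```

In the PCP reading, this tree corresponds to four candidate complexes, each containing `c` together with one of `a`/`b` and one of `d`/`e`.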
Modular decomposition of graphs
The notion of module naturally arises from different combinatorial structures and
has been introduced several times in different fields:
Modules have been called:
- Autonomous sets
- Closed sets
- Stable sets
- Clumps
- Committees
- Externally related sets
- Nonsimplifiable subnetworks
- Partitive sets
Decompositions have also been called:
- substitution decomposition
- ordinal sum
- X-join
…
Muller&Spinrad, J. Ass. Comp Mach 36, 1 (1989)
Consider an undirected graph $G=(V,E)$ with $n=|V|$ vertices and $m=|E|$ edges.
The complement of a graph $G$ is denoted by $\overline{G}$.
If $X$ is a subset of vertices, then $G[X]$ is the subgraph of $G$ induced by $X$.
Let $x$ be an arbitrary vertex; then $N(x)$ and $\overline{N}(x)$ stand respectively for the
neighborhood of $x$ and its non-neighborhood.
A vertex $x$ distinguishes two vertices $u$ and $v$ if $(x,u) \in E$ and $(x,v) \notin E$.
A module $M$ of a graph $G$ is a set of vertices that is not distinguished by any
vertex outside $M$.
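These definitions translate almost directly into code. The following sketch uses a symmetric version of "distinguishes" (adjacent to exactly one of the two vertices), which is equivalent when testing unordered pairs. The function names and the three-vertex path used as an example are assumptions of this illustration, and the enumeration is exponential, so it is only meant for very small graphs.

```python
from itertools import combinations


def distinguishes(adj, x, u, v):
    """x distinguishes u and v if it is adjacent to exactly one of them."""
    return (u in adj[x]) != (v in adj[x])


def is_module(adj, M):
    """M is a module if no vertex outside M distinguishes two members of M."""
    M = set(M)
    return not any(distinguishes(adj, x, u, v)
                   for x in set(adj) - M
                   for u, v in combinations(M, 2))


def all_modules(adj):
    """Brute-force list of all non-empty modules."""
    vs = sorted(adj)
    return [set(c)
            for k in range(1, len(vs) + 1)
            for c in combinations(vs, k)
            if is_module(adj, c)]


# Path a - b - c: b sees both a and c, so {a, c} is a module.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
modules = all_modules(adj)
```

For the path, the modules are the three singletons, `{a, c}`, and the full vertex set; `{a, b}` is not a module because `c` is adjacent to `b` but not to `a`.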
A simple linear algorithm for modular decomposition
The modules of a graph form a potentially exponentially-sized family.
However, the sub-family of strong modules, the modules that overlap no other
module, has size $O(n)$.
$A$ overlaps $B$ if $A \cap B \neq \emptyset$, $A \setminus B \neq \emptyset$ and $B \setminus A \neq \emptyset$.
The inclusion order of this family defines the previously explained
modular tree decomposition, which is enough to store the module family of a
graph.
The root of this tree is the trivial module $V$ and its $n$ leaves are the trivial modules
$\{x\}$, $x \in V$.
Habib, de Montgolfier, Paul (2004)
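The overlap test and the selection of strong modules can be illustrated on a triangle, where every non-empty vertex subset is a module. This is a small sketch under that assumption; `overlaps` and `strong_modules` are names chosen here, and the module family is passed in explicitly rather than computed.

```python
from itertools import combinations


def overlaps(A, B):
    """A overlaps B if they intersect and neither contains the other."""
    return bool(A & B) and bool(A - B) and bool(B - A)


def strong_modules(modules):
    """Keep the modules that overlap no other module of the family."""
    return [M for M in modules
            if not any(overlaps(M, other) for other in modules)]


# In a triangle on {a, b, c}, every non-empty subset is a module.
vertices = "abc"
modules = [frozenset(c)
           for k in range(1, len(vertices) + 1)
           for c in combinations(vertices, k)]
strong = strong_modules(modules)
```

The three two-element modules overlap each other pairwise and are therefore weak, which leaves the singletons and the full vertex set as strong modules.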
Aim: a simple linear algorithm for modular decomposition
Any graph $G$ with at least 3 vertices is either not connected,
or its complement $\overline{G}$ is not connected,
or $G$ and $\overline{G}$ are both connected.
In the last case, the maximal modules define a partition of the vertex-set and are
said to be a prime composition.
The modular decomposition tree can be recursively built by a top-down approach.
At each step, the algorithm recurses on graphs induced by the maximal strong
modules. This technique gives an $O(n^4)$ complexity.
Here, derive a linear-time algorithm that computes a modular factorizing
permutation without computing the underlying decomposition tree.
This tree may be derived in a second step.
Habib, de Montgolfier, Paul (2004)
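The three-way case distinction determines the label of the root of the decomposition tree: a disconnected graph gives a parallel node, a disconnected complement gives a series node, and the remaining case is prime. The sketch below implements only this classification step, not the full top-down algorithm or the linear-time one; the function names and example graphs are assumptions, and the input is expected to have at least two vertices.

```python
def components(adj):
    """Connected components, found with an iterative depth-first search."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def complement(adj):
    """Adjacency dictionary of the complement graph."""
    vs = set(adj)
    return {v: vs - adj[v] - {v} for v in adj}


def root_label(adj):
    """Label of the root node of the modular decomposition tree."""
    if len(components(adj)) > 1:
        return "parallel"
    if len(components(complement(adj))) > 1:
        return "series"
    return "prime"


# Hypothetical examples: a path on 4 vertices, a triangle, 3 isolated vertices.
p4 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
k3 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
e3 = {"a": set(), "b": set(), "c": set()}
labels = [root_label(g) for g in (p4, k3, e3)]
```

The four-vertex path is the smallest graph for which both the graph and its complement are connected, so it is labeled prime.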
Modular decomposition of protein interaction graphs
A graph and its modular tree decomposition. The set {1,2} is a strong module.
The module {7,8} is weak: it is overlapped by the module {8,9}.
The permutation $\sigma = (1,2,3,4,5,6,7,8,9)$ is a modular factorizing permutation.
Habib, de Montgolfier, Paul (2004)
Module-factorizing orders
Let $G=(V,E)$ be a graph and let $O$ be a partial order on $V$.
For two comparable elements $x$ and $y$ with $x <_O y$, we say that $x$ precedes $y$ and $y$
follows $x$.
Two subsets $A$ and $B$ cross if $\exists\, a, a' \in A$ and $b, b' \in B$ such that $a <_O b$ and $a' >_O b'$.
A linear extension of a partially ordered set ('poset') is a completion of the poset
into a total order.
Definition 1. A partial order $O$ is a Module-Factorizing Partial Order (MFPO) of
$V(G)$ if no two non-intersecting strong modules of $G$ cross.
The modular factorizing permutations are exactly the module-factorizing total orders.
Proposition 1. A partial order O is an MFPO if and only if it can be completed into a
factorizing permutation.
Habib, de Montgolfier, Paul (2004)
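For a total order, given as a permutation of the vertices, the crossing condition of Definition 1 can be tested directly. The sketch below assumes the strong modules are already known and supplied as sets; the function names and the four-vertex example family are hypothetical.

```python
def cross(A, B, order):
    """A and B cross if some a precedes some b and some a' follows some b'."""
    pos = {v: i for i, v in enumerate(order)}
    return (any(pos[a] < pos[b] for a in A for b in B)
            and any(pos[a] > pos[b] for a in A for b in B))


def is_module_factorizing(order, strong_modules):
    """True if no two non-intersecting strong modules cross in `order`."""
    mods = [set(m) for m in strong_modules]
    return not any(cross(A, B, order)
                   for i, A in enumerate(mods)
                   for B in mods[i + 1:]
                   if not (A & B))


# Hypothetical family of strong modules on the vertices 1..4.
strong = [{1}, {2}, {3}, {4}, {1, 2}, {3, 4}, {1, 2, 3, 4}]
good = is_module_factorizing([1, 2, 3, 4], strong)
bad = is_module_factorizing([1, 3, 2, 4], strong)
```

In the second order, `{1, 2}` and `{3, 4}` cross because 1 precedes 3 while 2 follows 3, so that order is not module-factorizing.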
Module-factorizing orders
Definition 2. An ordered partition is a collection $\{P_1, \dots, P_k\}$ of pairwise disjoint
parts, with $V = P_1 \cup \dots \cup P_k$, and an order $O$ such that for all
$x \in P_i$ and $y \in P_j$, $x <_O y$ if $i < j$.
Start with the trivial partition (a single part equal to the vertex set) and iteratively
extend (or refine) it until every part is a singleton.
A center vertex $c \in V$ is distinguished, and two refining rules, preserving the MFPO
property, are used. They are defined in Lemma 1:
Habib, de Montgolfier, Paul (2004)
Defining rules
Lemma 1.
1. Center Rule: For any vertex c, the ordered partition
is module-factorizing.
Habib, de Montgolfier, Paul (2004)
The center rule picks a center and breaks a trivial partition to start the
algorithm.
Once launched, the process continues with the pivot rule, which splits each
part $P_j$ (except the part $P_i$ that contains the pivot) according to the neighborhood
of the pivot.
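The splitting step itself can be sketched as a generic partition-refinement operation. Note the assumptions: the function name `split_by_pivot` and the example data are invented here, and the relative order in which the two halves of a split part are placed (neighbors first) is an arbitrary placeholder. The placement prescribed by the lemma, which is what preserves the module-factorizing property, is not reproduced in this sketch.

```python
def split_by_pivot(parts, adj, pivot):
    """Split every part not containing `pivot` into pivot-neighbors and the rest.

    `parts` is an ordered list of lists of vertices. The order of the two
    halves (neighbors first) is an assumption of this sketch only.
    """
    refined = []
    for part in parts:
        if pivot in part:
            refined.append(part)        # the pivot's own part is left alone
            continue
        inside = [v for v in part if v in adj[pivot]]
        outside = [v for v in part if v not in adj[pivot]]
        # A part that does not overlap N(pivot) yields one empty half
        # and is therefore kept unchanged.
        refined.extend(half for half in (inside, outside) if half)
    return refined


# Hypothetical ordered partition and pivot neighborhood.
adj = {"p": {"a", "c"}}
parts = [["a", "b"], ["p"], ["c", "d"]]
refined = split_by_pivot(parts, adj, "p")
```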
Defining rules: pivot rule
Lemma 1 continued.
2. Pivot Rule: Let $O$ be an ordered partition with
center $c$, and let $p \in P_i$ be such that $P_j$, $i \neq j$, overlaps $N(p)$.
If $O$ is an MFPO, then the following refinements preserve the module-
factorizing property:
Habib, de Montgolfier, Paul (2004)
Preliminary algorithm
Partition refinement scheme that outputs a partition of V into the maximal
modules not containing c.
Habib, de Montgolfier, Paul (2004)
When this algorithm ends, every part is a module. To obtain a factorizing
permutation it has to be recursively relaunched on the non-singleton parts.
Habib, de Montgolfier, Paul (2004)
Execution example of algorithm
The resulting factorizing permutation is (a, s, v, w, u, y, x, z, t).