TORQUE: TOPOLOGY-FREE QUERYING OF PROTEIN INTERACTION NETWORKS Sharon Bruckner 1, Falk Hüffner 1, Richard M. Karp 2, Ron Shamir 1, and Roded Sharan 1 1 School of Computer Science, Tel Aviv University 2 Int. Computer Science Institute, Berkeley, CA

Jan 05, 2016

Page 1

TORQUE: TOPOLOGY-FREE QUERYING OF PROTEIN INTERACTION NETWORKS

Sharon Bruckner 1, Falk Hüffner 1, Richard M. Karp 2, Ron Shamir 1, and Roded Sharan 1

1 School of Computer Science, Tel Aviv University

2 Int. Computer Science Institute, Berkeley, CA

Page 2

OUR GOAL: NETWORK QUERYING

Start with a protein-protein interaction network of some species A.

We seek subnetworks that match complexes or pathways.

Network Querying: Given a protein complex from another species B, identify the subnetwork of A that is most similar to it.

Why network querying?

A match hints at an evolutionarily conserved region.

It lets us infer the functionality of the matched region.

Page 3

Previous Methods

Assume knowledge of the interactions within the query complex (the topology).

Look for a match in the network with the same topology.

Examples: QNet (Dost et al., 2008), GraphFind (Ferro et al., 2008).

Page 4

NO NEED FOR TOPOLOGY!

Interaction information is noisy and incomplete, and for some species it is not available at all.

Page 5

THE PROBLEM

Input:

Graph G = (V, E), |V| = n, |E| = m

Color set {1, 2, ..., k}

A coloring of the network vertices

Page 6

THE PROBLEM

We ask: is there a connected subgraph of G that has exactly one vertex of each color?

Call such a subgraph “colorful”.
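
The definition can be made concrete with a small checker. This is a minimal sketch, not part of the original slides: the function name `is_colorful_connected`, the adjacency-dict representation, and the example graph are all assumptions chosen for illustration. It only verifies a given candidate vertex set; it does not search for one.

```python
from collections import deque


def is_colorful_connected(adj, color, vertices, k):
    """Return True if `vertices` induces a connected subgraph of `adj`
    containing exactly one vertex of each color 1..k.

    adj:   dict mapping each vertex to a set of neighbours (assumed format)
    color: dict mapping each vertex to its color in 1..k
    """
    vertices = set(vertices)
    if not vertices:
        return False
    # Colorful: the multiset of colors is exactly {1, ..., k}.
    if sorted(color[v] for v in vertices) != list(range(1, k + 1)):
        return False
    # Connected: a BFS restricted to `vertices` must reach all of them.
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u in vertices and u not in seen:
                seen.add(u)
                queue.append(u)
    return seen == vertices


# Hypothetical toy network: a path 1-2-3 plus an isolated vertex 4.
adj = {1: {2}, 2: {1, 3}, 3: {2}, 4: set()}
color = {1: 1, 2: 2, 3: 3, 4: 1}
print(is_colorful_connected(adj, color, {1, 2, 3}, k=3))
print(is_colorful_connected(adj, color, {4, 2, 3}, k=3))
```

The first call accepts the path; the second rejects a set that has the right colors but is not connected.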

Page 7

ABOUT THE PROBLEM

NP-complete.

Hard even when the graph is a tree with maximum degree 3, via a reduction from 3SAT (Fellows et al., 2007).

Our Contributions:

A fixed-parameter dynamic programming algorithm.

An integer linear program (ILP).

Fast heuristics.

An implementation using a combination of the above.

Page 8

DEFINING THE BASIC DP ALGORITHM

Input: A graph where each vertex is colored by one of k colors.

Output: Find a colorful tree.

Every connected subgraph has a spanning tree

Every colorful connected subgraph will have a colorful spanning tree

Instead of looking for a colorful subgraph, look for a colorful tree

Input: A graph where each vertex is colored by one of k colors.

Output: Find the highest-scoring colorful tree.

Page 9

DYNAMIC PROGRAMMING ALGORITHM (FELLOWS ET AL., 2008)

One row for each vertex; one column for each subset of colors, in increasing size.

      S1     S2     S3     S4
v1    0      0      None   3.4
v2    0      None   2.3    2
v3    None   0      3.15   None
v4    None   None   13.5   7.42
v5    0      0      6.4    8.1

Entry (v3, S3): score of the best tree rooted in v3 that is colored exactly by S3.

IDEA: Instead of looking at all n^k possible subgraphs, look only at all 2^k color sets.

Page 10

DYNAMIC PROGRAMMING ALGORITHM

The last column contains, for every vertex v, the highest scoring tree rooted in v colored by all the colors of the query!

Running time: O(3^k |E|).

\[
T(v, S) = \max_{\substack{u \in N(v) \\ S_1 \uplus S_2 = S}} \bigl\{ T(v, S_1) + T(u, S_2) + w(u, v) \bigr\}
\]
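
The table-filling scheme can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: colors are numbered 0..k-1 and stored as bitmasks, every vertex has exactly one color, the base case T(v, {color(v)}) = 0 is assumed from the zero entries in the table, and the names `best_colorful_tree`, `adj`, and `color` are hypothetical.

```python
def best_colorful_tree(adj, color, k):
    """Score of the best colorful tree, or None if none exists.

    adj:   dict {v: {u: weight}} for an undirected weighted graph (assumed format)
    color: dict {v: c} with c in 0..k-1
    """
    full = (1 << k) - 1
    # T[v][S]: best score of a tree rooted at v colored exactly by bitmask S.
    T = {v: {1 << color[v]: 0.0} for v in adj}

    # Increasing numeric order guarantees every proper submask comes first.
    for S in range(1, full + 1):
        for v in adj:
            own = 1 << color[v]
            if not S & own or S == own:
                continue
            best = None
            S1 = (S - 1) & S  # iterate over proper non-empty submasks of S
            while S1:
                if S1 & own and S1 in T[v]:
                    S2 = S ^ S1  # colors handed to the subtree of neighbour u
                    for u, w in adj[v].items():
                        sub = T[u].get(S2)
                        if sub is not None:
                            cand = T[v][S1] + sub + w
                            if best is None or cand > best:
                                best = cand
                S1 = (S1 - 1) & S
            if best is not None:
                T[v][S] = best

    # The "last column": trees that use all k colors, over all roots.
    scores = [T[v][full] for v in adj if full in T[v]]
    return max(scores) if scores else None


# Hypothetical triangle with three colors.
adj = {
    "a": {"b": 1.0, "c": 5.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"a": 5.0, "b": 2.0},
}
color = {"a": 0, "b": 1, "c": 2}
print(best_colorful_tree(adj, color, k=3))  # 7.0 (edges a-c and b-c)
```

Because S1 and S2 are disjoint color sets, the two subtrees being joined can never share a vertex, which is why the table only needs to be indexed by color sets rather than by vertex sets.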

Page 11

EXAMPLE

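[Figure: example of computing T(v, S) for a vertex v and a neighbor u]

\[
T(v, S) = \max_{\substack{u \in N(v) \\ S_1 \uplus S_2 = S}} \bigl\{ T(v, S_1) + T(u, S_2) + w(u, v) \bigr\}
\]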

Page 12

EXTENSION 1: ALLOWING DELETIONS – MATCHING WITH FEWER COLORS

Page 13

EXTENSION 2: ALLOWING INSERTIONS: SPECIAL NON-COLORED VERTICES, ARBITRARY VERTICES

Page 14

ALLOWING NON-COLORED INSERTIONS

For j insertions, we would expect a running time of O(3^(k+j) m).

Can show: O(3^k m j).

Make j copies of each column, and recursively solve:

B(v, S, j') = highest score of a tree, rooted in v, colored by S, using exactly j' insertions

Page 15

FORMULA & EXAMPLE

\[
B(v, S, j') = \max_{\substack{u \in N(v) \\ S_1 \uplus S_2 = S \\ j_1 + j_2 = j'}} \bigl\{ B(v, S_1, j_1) + B(u, S_2, j_2) + w(u, v) \bigr\}
\]

[Figure: example graph on vertices a, b, c, d, e, f, g]

Running time: O(3^k m j)
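
The recurrence for non-colored insertions can be transcribed directly. The following is a rough sketch under assumptions, not the authors' code: non-colored vertices are marked with color `None`, the base cases are assumed to be B(v, {color(v)}, 0) = 0 for colored vertices and B(v, {}, 1) = 0 for non-colored ones, and the function name is hypothetical. Note one limitation of this plain transcription: non-colored vertices carry no color, so nothing stops the same non-colored vertex from being counted in two different branches. Here j is therefore the number of non-colored vertex occurrences, an upper bound on the number of distinct inserted vertices.

```python
def best_tree_with_insertions(adj, color, k, max_ins):
    """adj: {v: {u: weight}}; color[v] in 0..k-1, or None for a non-colored vertex."""
    full = (1 << k) - 1
    # B[v][(S, j)]: best score of a tree rooted at v that covers exactly the
    # colors in bitmask S and uses j non-colored vertex occurrences.
    B = {}
    for v in adj:
        own = (0, 1) if color[v] is None else (1 << color[v], 0)
        B[v] = {own: 0.0}

    # Process states by total size so both halves of a split are already final.
    states = [(S, j) for S in range(full + 1) for j in range(max_ins + 1)]
    states.sort(key=lambda s: bin(s[0]).count("1") + s[1])

    for S, j in states:
        for v in adj:
            best = B[v].get((S, j))
            S1 = S
            while True:  # S1 runs over all submasks of S, including S and 0
                S2 = S ^ S1
                for j1 in range(j + 1):
                    left = B[v].get((S1, j1))
                    if left is None:
                        continue
                    for u, w in adj[v].items():
                        right = B[u].get((S2, j - j1))
                        if right is not None:
                            cand = left + right + w
                            if best is None or cand > best:
                                best = cand
                if S1 == 0:
                    break
                S1 = (S1 - 1) & S
            if best is not None:
                B[v][(S, j)] = best

    scores = [B[v][(full, j)] for v in adj
              for j in range(max_ins + 1) if (full, j) in B[v]]
    return max(scores) if scores else None


# Hypothetical path a - x - b where x is non-colored.
adj = {"a": {"x": 1.0}, "x": {"a": 1.0, "b": 2.0}, "b": {"x": 2.0}}
color = {"a": 0, "x": None, "b": 1}
print(best_tree_with_insertions(adj, color, k=2, max_ins=1))  # 3.0
print(best_tree_with_insertions(adj, color, k=2, max_ins=0))  # None
```

With one insertion allowed, the two colored vertices are joined through x; with none allowed, no colorful tree exists.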

Page 16

Extension 3: ALLOWING MULTIPLE COLORS PER VERTEX

Page 17

PUTTING IT TOGETHER…

[Figure: network example annotated with numeric weights]

Page 18

A SECOND APPROACH

Formulate the problem as an integer linear program (ILP).

Use efficient ILP solvers.

Page 19

ILP at a glance

Want: a subset T of the vertices.

Formulate colorfulness:

Only vertices in T are colored.

Every vertex should get at most one color.

Every color should be given to at most one vertex.

Formulate connectivity:

Find a flow such that:

Only vertices in T can be involved in the flow.

There is a flow of k-1, with a single sink and k-1 sources.

Every source is connected to the sink via flow edges.
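
One way to write these requirements as linear constraints is sketched below. This is an illustrative formalization under assumptions, not the program shown in the talk: all variable names are introduced here, the exact-match case (every color used exactly once, so |T| = k) is assumed, and x_{v,c} is defined only for colors c that vertex v is allowed to take. Binary x_{v,c} says v is selected with color c, y_v says v is in T, binary r_v marks the sink, and f_{uv} >= 0 is the flow on the arc from u to v.

```latex
\begin{align*}
y_v = \sum_{c} x_{v,c} &\le 1 && \forall v \in V
  && \text{(at most one color per vertex)} \\
\sum_{v} x_{v,c} &= 1 && \forall c \in \{1, \dots, k\}
  && \text{(each color used once)} \\
\sum_{v} r_v = 1, \quad r_v &\le y_v && \forall v \in V
  && \text{(a single sink, chosen inside } T\text{)} \\
f_{uv} \le (k-1)\, y_u, \quad f_{uv} &\le (k-1)\, y_v && \forall \{u, v\} \in E,\ \text{both directions}
  && \text{(flow only inside } T\text{)} \\
\sum_{u \in N(v)} f_{vu} - \sum_{u \in N(v)} f_{uv} &= y_v - k\, r_v && \forall v \in V
  && \text{(sources emit 1, sink absorbs } k-1\text{)}
\end{align*}
```

Under these constraints each of the k-1 selected non-sink vertices emits one unit of flow that can only travel through selected vertices and must end at the sink, so every selected vertex is connected to the sink within T.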

Page 20

The Integer Linear Program

Page 21

Heuristic Speedups

First do data reduction:

Only 5% of the vertices are associated with one or more query colors.

Many non-colored vertices are too far from any colored vertex to be useful.

For each remaining connected component:

Try a shortest-paths-based heuristic that does not allow mismatches.

If this fails:

If there are few colors but the instance is large, use dynamic programming.

Otherwise, use the ILP.
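
The data-reduction step can be illustrated with a multi-source breadth-first search. This is a sketch under assumptions rather than the authors' procedure: the slides do not state the distance cutoff, so `max_dist` is a hypothetical parameter (for instance the number of allowed insertions), and the function names are invented for this example.

```python
from collections import deque


def reduce_network(adj, colored, max_dist):
    """Keep colored vertices plus non-colored vertices within `max_dist`
    hops of some colored vertex. adj: {v: set of neighbours} (assumed format)."""
    dist = {v: 0 for v in colored if v in adj}
    queue = deque(dist)  # multi-source BFS from all colored vertices
    while queue:
        v = queue.popleft()
        if dist[v] == max_dist:
            continue  # do not expand beyond the cutoff
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    # Induced subgraph on the surviving vertices.
    return {v: {u for u in adj[v] if u in dist} for v in dist}


def connected_components(adj):
    """Return the connected components of `adj` as a list of vertex sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = {start}, [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    component.add(u)
                    stack.append(u)
        components.append(component)
    return components


# Hypothetical network: path a-x-y-z-b and a separate edge c-w.
adj = {
    "a": {"x"}, "x": {"a", "y"}, "y": {"x", "z"}, "z": {"y", "b"}, "b": {"z"},
    "c": {"w"}, "w": {"c"},
}
reduced = reduce_network(adj, colored={"a", "b", "c"}, max_dist=1)
print(sorted(sorted(c) for c in connected_components(reduced)))
```

In this toy case vertex y is two hops from every colored vertex and is dropped, which splits the path into two components that can then be handled independently.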

Page 22

IMPLEMENTATION, EXPERIMENTS & RESULTS

Page 23

Experiments

We applied our method to query complexes within: yeast (5430 proteins, 39936 interactions), fly (6650 proteins, 21275 interactions), and human (7915 proteins, 28972 interactions).

Queries: yeast, fly, human, bovine, mouse, and rat.

Page 24

COMPARISON WITH OTHER METHODS

Most previous work tested queries with a known topology.

We compare our results with those of QNet (Dost et al., 2008), designed to tackle topology-based queries.

QNet uses color coding to tackle the subgraph homeomorphism problem, allowing insertions and deletions.

Page 25

Comparison with QNet

Page 26

Results Evaluation

Functional coherence:

Used GO TermFinder for functional enrichment in T.

Specificity:

Looked at overlap between T and known complexes in the target species.

Compared to overlap between random subgraphs and the known complexes.

Corrected for multiple testing using FDR (q<0.05).

Quality match: Functionally coherent and specific.
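
The multiple-testing correction can be sketched as follows. The slides only say "FDR (q<0.05)"; this example assumes the standard Benjamini-Hochberg step-up procedure, and the function name and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are significant
    at false discovery rate q (Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Find the largest rank whose p-value is below its rank-scaled threshold.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff = rank
    significant = [False] * m
    for i in order[:cutoff]:
        significant[i] = True
    return significant


# Hypothetical p-values for four query matches.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
# [True, False, False, False]
```

Only the first match survives here: the second-smallest p-value, 0.03, already exceeds its threshold of 2 * 0.05 / 4 = 0.025, and no larger rank passes either.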

Page 27

SELECTED RESULTS

Page 28

Thanks: Nir Yosef, the TAU Computational Genomics group, and the Computational System Biology group.

Israel Science Foundation, Edmond J. Safra Bioinformatics Program, Tel Aviv Univ.

SUMMARY

The PPI network querying problem motivates the colorful connected subgraph problem. A fixed-parameter dynamic programming algorithm, allowing insertions, deletions, and multiple colors per vertex, along with an ILP formulation and heuristics, obtains good results.

Page 29

REFERENCES

[FFHV07] M. R. Fellows, G. Fertin, D. Hermelin, and S. Vialette. Borderlines for finding connected motifs in vertex-colored graphs. In Proc. ICALP’07, volume 4596, pages 340–351. Springer-Verlag, 2007.

[N06] R. Niedermeier. Invitation to Fixed-Parameter Algorithms. Number 31 in Oxford Lecture Series in Mathematics and Its Applications. Oxford University Press, 2006.

[BFKN08] N. Betzler, M. R. Fellows, C. Komusiewicz, and R. Niedermeier. Parameterized algorithms and hardness results for some graph motif problems. In Proc. 19th CPM, volume 5029 of LNCS, pages 31–43. Springer, 2008.

[AYZ95] N. Alon, R. Yuster, and U. Zwick. Color coding. Journal of the ACM, 42:844–856, 1995.

[DSGRBS08] B. Dost, T. Shlomi, N. Gupta, E. Ruppin, V. Bafna, and R. Sharan. QNet: A tool for querying protein interaction networks. Journal of Computational Biology, 15(7):913–925, 2008.