Complementarity of network and sequence information in homologous proteins March, 2010 1 Department of Computing, Imperial College London, London, UK 2 Department of Computer Science, University of California, Irvine, USA International Symposium on Integrative Bioinformatics Vesna Memišević 2 , Tijana Milenković 2 , and Nataša Pržulj 1
Transcript
Page 1: Complementarity  of network and sequence information in homologous proteins

Complementarity of network and sequence information in homologous proteins

March, 2010

¹ Department of Computing, Imperial College London, London, UK
² Department of Computer Science, University of California, Irvine, USA

International Symposium on Integrative Bioinformatics

Vesna Memišević², Tijana Milenković², and Nataša Pržulj¹

Page 2: Complementarity  of network and sequence information in homologous proteins

Motivation

• Genetic sequences – revolutionized understanding of biology

• Non-sequence based data of importance, e.g.:

– Secondary & tertiary structure of RNA have the dominant role in RNA function (tRNA: Gautheret et al., Comput. Appl. Biosci., 1990; rRNA: Woese et al., Microbiological Reviews, 1983)

– Secondary structure-based approach – more effective at finding new functional RNAs than sequence-based alignments (Webb et al., Science, 2009)

• What about patterns of interconnections in PPI networks?

– Can they complement the knowledge learned from genomic sequence?

– Wiring patterns of duplicated proteins in the PPI network – insights into evolutionary distance?

– Does the information about homologues captured by PPI network topology differ from that captured by their sequence?

Nataša Pržulj [email protected].uk

Page 3: Complementarity  of network and sequence information in homologous proteins

Background

• Homologs – descend from a common ancestor:

1. Paralogs: in the same species, evolve through gene duplication events

2. Orthologs: in different species, evolve through speciation events


Page 4: Complementarity  of network and sequence information in homologous proteins

Background

• Sequence-based homology data from:

1. Clusters of Orthologous Groups – COG[1]

2. KEGG Orthology System[2]

[1] Tatusov et al., BMC Bioinformatics, 4(41), 2003.
[2] Kanehisa et al., Nucleic Acids Res., 28:27–30, 2000.

Page 11: Complementarity  of network and sequence information in homologous proteins


Background

• Sequence-based homology data from:

1. Clusters of Orthologous Groups – COG[1]

• Proteins in different genomes – sequence compared for the best hits (BeTs)

• The graph of BeTs constructed

• Triangles in it found

• Triangles sharing a side merged into the groups of orthologs and paralogs

• No dependence on the absolute level of similarity between compared proteins

2. KEGG Orthology System[2]
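The COG construction steps listed above (build the graph of best hits, find triangles, merge triangles that share a side) can be sketched roughly as follows. This is only an illustrative sketch: it treats the BeT graph as a plain undirected graph with no self-loops, and the protein names and the helper names `find_triangles` and `merge_triangles` are hypothetical, not part of the COG software.

```python
from itertools import combinations

def find_triangles(edges):
    """Return all triangles (frozensets of 3 nodes) in an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triangles = set()
    for u, v in edges:
        # Any common neighbour of u and v closes a triangle.
        for w in adj[u] & adj[v]:
            triangles.add(frozenset((u, v, w)))
    return triangles

def merge_triangles(triangles):
    """Merge triangles that share a side (two nodes) into groups (union-find)."""
    tris = list(triangles)
    parent = list(range(len(tris)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    side_owner = {}
    for i, tri in enumerate(tris):
        for side in combinations(sorted(tri), 2):
            if side in side_owner:
                parent[find(i)] = find(side_owner[side])
            else:
                side_owner[side] = i

    groups = {}
    for i, tri in enumerate(tris):
        groups.setdefault(find(i), set()).update(tri)
    return sorted(sorted(g) for g in groups.values())

# Toy BeT graph (hypothetical protein names): two triangles sharing side b1-c1,
# plus an edge x-y that is in no triangle and therefore joins no group.
bet_edges = [("a1", "b1"), ("b1", "c1"), ("a1", "c1"),
             ("b1", "d1"), ("c1", "d1"), ("x", "y")]
groups = merge_triangles(find_triangles(bet_edges))
print(groups)
```

Because grouping depends only on which best hits form triangles, no absolute similarity cutoff appears anywhere in the sketch, which mirrors the last bullet of the COG list.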

[Figure: example graph of best hits (BeTs); surviving node labels: 1, 1′, 2, 3, 4]

Page 15: Complementarity  of network and sequence information in homologous proteins


Background

• Sequence-based homology data from:

1. Clusters of Orthologous Groups – COG[1]

2. KEGG Orthology System[2]

• Sequences aligned

• If alignment score < 10⁻⁸ then 1 assigned as “similarity bit”

• Otherwise, 0 assigned as “similarity bit”

• “Bit vectors” constructed for a protein, over all proteins

• Graph constructed with nodes protein sequences and edges correlation coefficients of bit vectors of nodes

• Cliques found in the graph = orthology groups

• Again, no dependence on absolute level of similarity
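The KEGG-style steps above (similarity bits, bit vectors, a correlation graph, cliques) could be sketched as below. Several details are assumptions made only for illustration: a protein is given bit 1 against itself, an edge is kept when the Pearson correlation of two bit vectors reaches a hypothetical threshold `min_corr`, and maximal cliques are enumerated with a basic Bron–Kerbosch recursion. The protein names and scores are made up.

```python
import math

def bit_vectors(names, score, cutoff=1e-8):
    """One bit vector per protein: 1 where the alignment score is below cutoff."""
    return {
        p: [1 if p == q or score.get(frozenset((p, q)), 1.0) < cutoff else 0
            for q in names]
        for p in names
    }

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def maximal_cliques(adj):
    """Enumerate maximal cliques (Bron-Kerbosch without pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)

def orthology_groups(names, score, min_corr=0.9):
    """Cliques of size >= 2 in the graph of highly correlated bit vectors."""
    vec = bit_vectors(names, score)
    adj = {p: set() for p in names}
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if pearson(vec[p], vec[q]) >= min_corr:  # threshold is an assumption
                adj[p].add(q)
                adj[q].add(p)
    return [c for c in maximal_cliques(adj) if len(c) >= 2]

# Hypothetical alignment scores; missing pairs default to 1.0 (no similarity).
names = ["A", "B", "C", "D", "E"]
score = {frozenset(("A", "B")): 1e-20, frozenset(("A", "C")): 1e-15,
         frozenset(("B", "C")): 1e-12, frozenset(("D", "E")): 1e-30}
groups = orthology_groups(names, score)
print(groups)
```

Two proteins end up in the same group because they hit the same set of other proteins, not because their pairwise score is especially strong, which is one way to read the final bullet above.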

Page 16: Complementarity  of network and sequence information in homologous proteins

Background

• Sequence-based homology data from:

1. Clusters of Orthologous Groups – COG[1]

2. KEGG Orthology System[2]

• We examine yeast proteins only:

• Extract all possible pairs of them in COG and KEGG groups = “orthologous pairs”

• There are 9,643 unique such pairs

• What are their topological similarities within the PPI network?



Page 19: Complementarity  of network and sequence information in homologous proteins

Background

• Sequence-based homology data from:

1. Clusters of Orthologous Groups – COG[1]

2. KEGG Orthology System[2]

• Previous network-topology assisted approaches:

• Network-alignment-based (IsoRank)

• Yosef, Sharan & Noble, Bioinformatics, 2008 (hybrid Rankprop)

– Rely heavily on sequence information

– Use only limited amount of network topology

Page 20: Complementarity  of network and sequence information in homologous proteins

Our Method

• We examine yeast proteins only:

• Extract all possible pairs of them in COG and KEGG groups = “orthologous pairs”

• There are 9,643 unique such pairs

• What are their topological similarities within the PPI network?

• PPI networks are noisy

• We analyze the high-confidence part of the yeast PPI network by Collins et al.[3]: 9,074 edges amongst 1,621 proteins

• Focus on proteins with degree > 3 to avoid noisy PPIs

• There are 175 orthologous pairs amongst 181 proteins


[3] Collins et al., Molecular and Cellular Proteomics, 6(3):439–450, 2007
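The filtering described on this slide (all within-group pairs, then only pairs whose two proteins both have degree > 3 in the PPI network) might look like the following sketch. The group contents, protein names and helper names are hypothetical toy data, not the COG, KEGG or Collins et al. data, and it is assumed that degree is measured in the same network that is analysed.

```python
from itertools import combinations

def orthologous_pairs(groups):
    """All unique unordered pairs of proteins that share at least one group."""
    pairs = set()
    for members in groups:
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs

def degree_map(ppi_edges):
    """Node degree in an undirected PPI network (self-loops ignored)."""
    adj = {}
    for u, v in ppi_edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return {p: len(nbrs) for p, nbrs in adj.items()}

def filter_pairs(pairs, deg, min_degree=3):
    """Keep pairs in which both proteins have degree strictly above min_degree."""
    return {(a, b) for a, b in pairs
            if deg.get(a, 0) > min_degree and deg.get(b, 0) > min_degree}

# Toy homology groups: the pair (P2, P3) occurs twice but is counted once.
groups = [["P1", "P2", "P3"], ["P2", "P3"], ["P4", "P5"]]
pairs = orthologous_pairs(groups)

# Toy PPI network: each protein Pk is linked to a chosen number of partners n0, n1, ...
wanted_degree = {"P1": 4, "P2": 4, "P3": 2, "P4": 5, "P5": 4}
ppi_edges = [(p, f"n{i}") for p, k in wanted_degree.items() for i in range(k)]

kept = filter_pairs(pairs, degree_map(ppi_edges))
print(len(pairs), sorted(kept))
```

In the toy data P3 has degree 2, so every pair involving P3 is dropped, in the same way that the degree filter on the slide reduces the set of pairs to those among well-connected proteins.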

Page 21: Complementarity  of network and sequence information in homologous proteins

Our Method

• Does PPI network topology contain homology information? Are similarly wired proteins homologous?

• Does homology information obtained from network topology differ from that obtained from sequence?


Page 23: Complementarity  of network and sequence information in homologous proteins

Our Method

[Figure not preserved; slide annotations: “Induced”, “Of any frequency”]

N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, num. 18, pg. 3508-3515, 2004.

Page 24: Complementarity  of network and sequence information in homologous proteins

Our Method

Generalize node degree

N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” ECCB, Bioinformatics, vol. 23, pg. e177-e183, 2007.
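To illustrate what "generalizing node degree" can mean, the sketch below counts, for one node, how many times it touches each of the first four graphlet orbits: orbit 0 (an edge, i.e. the ordinary degree), orbit 1 (end of an induced 3-node path), orbit 2 (middle of an induced 3-node path) and orbit 3 (a triangle). The orbit numbering is assumed to follow the cited paper; the full vector there covers 73 orbits of 2- to 5-node graphlets, which this sketch does not attempt. The function name and the toy graph are hypothetical.

```python
def gd_vector_small(adj, v):
    """Counts for orbits 0-3 of node v; adj maps each node to its neighbour set."""
    nbrs = adj[v]
    deg = len(nbrs)
    # Orbit 3: triangles through v = adjacent pairs among v's neighbours.
    tri = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    # Orbit 2: v is the middle of an induced path = non-adjacent neighbour pairs.
    mid = deg * (deg - 1) // 2 - tri
    # Orbit 1: v is an end of an induced path v - a - b, with b not adjacent to v.
    end = sum(1 for a in nbrs for b in adj[a] if b != v and b not in nbrs)
    return [deg, end, mid, tri]

# Toy graph: triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
sig_a = gd_vector_small(adj, "a")
sig_c = gd_vector_small(adj, "c")
sig_d = gd_vector_small(adj, "d")
print(sig_a, sig_c, sig_d)
```

Nodes a and b have the same degree and the same small-orbit counts, whereas c and d differ from them in more than degree alone, which is the extra information such a vector adds over node degree.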


Page 27: Complementarity  of network and sequence information in homologous proteins

Our Method

Graphlet Degree (GD) vectors, or “node signatures”

T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures”, Cancer Informatics, vol. 4, pg. 257-273, 2008.

Page 28: Complementarity  of network and sequence information in homologous proteins

Our Method

Similarity measure between nodes’ Graphlet Degree vectors

T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures”, Cancer Informatics, vol. 4, pg. 257-273, 2008.

Page 29: Complementarity  of network and sequence information in homologous proteins


Our Method

T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures”, Cancer Informatics, vol. 4, pg. 257-273, 2008.

Signature Similarity Measure
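The slide image with the formula did not survive extraction. The sketch below follows the form of the signature similarity as it is usually stated for the cited paper, and should be read as an assumption rather than a transcription of the slide: for orbit i with weight w_i = 1 - log(o_i)/log(73), the distance is D_i = w_i * |log(u_i + 1) - log(v_i + 1)| / log(max(u_i, v_i) + 2), the total distance is sum(D_i)/sum(w_i), and the similarity is 1 minus that. The per-orbit dependency counts o_i are not reproduced here, so the default uses o_i = 1 (uniform weights), which is a simplification.

```python
import math

def signature_similarity(u, v, orbit_deps=None, n_orbits=73):
    """Similarity in [0, 1] between two graphlet degree vectors u and v.

    orbit_deps[i] is the assumed dependency count o_i of orbit i; the default
    o_i = 1 gives every orbit weight 1 (a simplifying assumption).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if orbit_deps is None:
        orbit_deps = [1] * len(u)
    total_w = 0.0
    total_d = 0.0
    for ui, vi, oi in zip(u, v, orbit_deps):
        w = 1.0 - math.log(oi) / math.log(n_orbits)
        d = w * abs(math.log(ui + 1) - math.log(vi + 1)) / math.log(max(ui, vi) + 2)
        total_w += w
        total_d += d
    return 1.0 - total_d / total_w

# Hypothetical short signatures (orbits 0-3 only) for two nodes.
sig_x = [3, 0, 2, 1]
sig_y = [1, 2, 0, 0]
same = signature_similarity(sig_x, sig_x)
diff = signature_similarity(sig_x, sig_y)
print(same, diff)
```

The logarithms damp the effect of very large counts, and the `+ 2` in the denominator keeps each per-orbit distance strictly below 1, so identical vectors score exactly 1 and any difference lowers the score.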


Page 31: Complementarity  of network and sequence information in homologous proteins

Results

Network Topology

• Orthologous pairs often perform the same or similar function.

• Does GD vector similarity (GDS) imply shared biological function?

• Note: most GO annotations were obtained from sequences

Similar topology ~ similar sequence ~ similar function


Page 34: Complementarity  of network and sequence information in homologous proteins

Results

Network Topology

• Orthologous proteins have high GD vector similarities

p-value < 0.05

More than 20% of orthologous pairs have GDS > 85%

Page 35: Complementarity  of network and sequence information in homologous proteins

Results

Network Topology – Robustness

• PPI networks are noisy

• Random edge additions, deletions and rewirings in the PPI network
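A robustness test of this kind perturbs the network at random and then recomputes the similarities. The slides do not say how the perturbations were generated, so the following is only one plausible sketch: a chosen fraction of edges is deleted, added, or "rewired" (here taken to mean deleting k edges and then adding k random new ones, which keeps the edge count but not the degrees). The function name, the `mode` strings and the ring-shaped toy network are assumptions.

```python
import random

def perturb(edges, nodes, fraction, mode, seed=0):
    """Return a perturbed edge set (frozensets of two nodes).

    mode: "delete", "add", or "rewire" (delete k edges, then add k random ones).
    """
    if mode not in ("delete", "add", "rewire"):
        raise ValueError("unknown mode")
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    node_list = sorted(nodes)
    k = int(round(fraction * len(edge_set)))
    if mode in ("delete", "rewire"):
        # Sort first so that the sample is reproducible for a given seed.
        for e in rng.sample(sorted(edge_set, key=sorted), k):
            edge_set.remove(e)
    if mode in ("add", "rewire"):
        target = len(edge_set) + k
        if target > len(node_list) * (len(node_list) - 1) // 2:
            raise ValueError("not enough free node pairs to add edges")
        while len(edge_set) < target:
            u, v = rng.sample(node_list, 2)
            edge_set.add(frozenset((u, v)))
    return edge_set

# Toy network: a ring of 10 nodes, perturbed at the 20% level (2 edges).
nodes = range(10)
ring = [(i, (i + 1) % 10) for i in range(10)]
original = {frozenset(e) for e in ring}
deleted = perturb(ring, nodes, 0.2, "delete")
added = perturb(ring, nodes, 0.2, "add")
rewired = perturb(ring, nodes, 0.2, "rewire")
print(len(deleted), len(added), len(rewired))
```

Repeating this for increasing values of `fraction` and recomputing the GD vector similarities each time would show how quickly the signal for orthologous pairs degrades under noise.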



Page 39: Complementarity  of network and sequence information in homologous proteins

Results

Sequence

• Sequence identities for the 175 orthologous pairs

~70% of orthologous pairs have sequence identity < 35%

Page 40: Complementarity  of network and sequence information in homologous proteins

Results

Sequence

• Sequence identities for the 175 orthologous pairs

~20% of orthologous pairs have sequence identity > 90%

Page 41: Complementarity  of network and sequence information in homologous proteins

Results

Sequence

• Sequence identities for the 175 orthologous pairs

“Twilight zone” for homology: 20-35% sequence identity

~70% of orthologous pairs have sequence identity < 35%

No dependence on the absolute similarity in COG & KEGG, but triangles in the graph of best matches

Page 42: Complementarity  of network and sequence information in homologous proteins

Results

Comparison:

~20% of orthologous pairs have signature similarities above 85% (35 pairs)

~30% of orthologous pairs have sequence identities above 35% (53 pairs)

Overlap: 22 pairs (~60% of the smaller set)

Sequence and network topology: somewhat complementary slices of homology information
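The percentages on this slide follow directly from the pair counts. The short check below recomputes them from the numbers quoted above (175 pairs in total, 35 above the signature-similarity threshold, 53 above the sequence-identity threshold, 22 in both). The derived counts of pairs found by only one criterion are simple set arithmetic and are not stated on the slide.

```python
total_pairs = 175      # orthologous pairs analysed
high_topology = 35     # signature similarity above 85%
high_sequence = 53     # sequence identity above 35%
overlap = 22           # pairs satisfying both criteria

frac_topology = high_topology / total_pairs               # about 20%
frac_sequence = high_sequence / total_pairs               # about 30%
frac_overlap = overlap / min(high_topology, high_sequence)  # share of the smaller set

only_topology = high_topology - overlap   # flagged by topology alone
only_sequence = high_sequence - overlap   # flagged by sequence alone
either = high_topology + high_sequence - overlap

print(f"{frac_topology:.0%} {frac_sequence:.0%} {frac_overlap:.0%}")
print(only_topology, only_sequence, either)
```

The exact overlap share is a little above the rounded ~60% on the slide, and more than a third of the high-topology pairs fall below the sequence threshold, which is the sense in which the two sources are complementary.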

Page 43: Complementarity  of network and sequence information in homologous proteins

Results

Examples

• 59 of the yeast ribosomal proteins – retained two genomic copies

• Are duplicated proteins functionally redundant?

• No: have different genetic requirements for their assembly and localization so are functionally distinct

• Also note: average sequence identity of structurally similar proteins ~8-10%

• Two pairs with identical sequence:

100% sequence identity, 50% signature similarity; degrees 25 and 5

Page 44: Complementarity  of network and sequence information in homologous proteins

Results

Examples

• Two pairs with identical sequence (second pair):

100% sequence identity, 65% signature similarity; degrees 54 and 9

Page 45: Complementarity  of network and sequence information in homologous proteins


Conclusions

• Homology information captured by PPI network topology differs from that captured by sequence

• Complementary sources for identifying homologs

Future work:

• Could topological similarity be used to identify orthologs from best-hits graph analysis as done for sequences?

Page 46: Complementarity  of network and sequence information in homologous proteins

Acknowledgements

This project was supported by the NSF CAREER IIS-0644424 grant.