Complementarity of network and sequence information in homologous proteins

Vesna Memišević², Tijana Milenković², and Nataša Pržulj¹

¹ Department of Computing, Imperial College London, London, UK
² Department of Computer Science, University of California, Irvine, USA

International Symposium on Integrative Bioinformatics, March 2010

Motivation

• Genetic sequences – revolutionized understanding of biology
• Non-sequence-based data of importance, e.g.:
  – Secondary & tertiary structure of RNA have the dominant role in RNA function (tRNA: Gautheret et al., Comput. Appl. Biosci., 1990; rRNA: Woese et al., Microbiological Reviews, 1983)
  – Secondary structure-based approach – more effective at finding new functional RNAs than sequence-based alignments (Webb et al., Science, 2009)
• What about patterns of interconnections in PPI networks?
  – Can they complement the knowledge learned from genomic sequence?
  – Wiring patterns of duplicated proteins in the PPI network – insights into evolutionary distance?
  – Does the information about homologues captured by PPI network topology differ from that captured by their sequence?

Nataša Pržulj, natasha@imperial.ac.uk

Background

• Homologs – descend from a common ancestor:

1. Paralogs: in the same species, evolve through gene duplication events

2. Orthologs: in different species, evolve through speciation events


Background

• Sequence-based homology data from:
  1. Clusters of Orthologous Groups – COG [1]
  2. KEGG Orthology System [2]

[1] Tatusov et al., BMC Bioinformatics, 4(41), 2003.
[2] Kanehisa et al., Nucleic Acids Res., 28:27–30, 2000.


Background

• Sequence-based homology data from:
  1. Clusters of Orthologous Groups – COG [1]
     • Proteins in different genomes – sequences compared for the best hits (BeTs)
     • The graph of BeTs constructed
     • Triangles in it found
     • Triangles sharing a side merged into the groups of orthologs and paralogs
     • No dependence on the absolute level of similarity between compared proteins
  2. KEGG Orthology System [2]
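
The COG steps listed above can be sketched in Python. This is only an illustration under stated assumptions, not the COG pipeline itself: the toy `best_hits` dictionary, the protein labels, and the function names are hypothetical, best hits are treated as undirected edges, and paralogs get no special handling.

```python
from itertools import combinations

def bet_graph(best_hits):
    """Undirected graph of best hits (BeTs): protein -> set of neighbours."""
    adj = {}
    for p, hits in best_hits.items():
        for q in hits:
            adj.setdefault(p, set()).add(q)
            adj.setdefault(q, set()).add(p)
    return adj

def triangles(adj):
    """All triangles, each as a frozenset of three mutually linked proteins."""
    found = set()
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                found.add(frozenset((u, v, w)))
    return found

def merge_triangles(tris):
    """Merge triangles that share a side (two proteins) into groups."""
    tris = [tuple(sorted(t)) for t in tris]
    parent = list(range(len(tris)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_with_side = {}
    for i, t in enumerate(tris):
        for side in combinations(t, 2):
            # Union this triangle with the first triangle seen on the same side.
            j = first_with_side.setdefault(side, i)
            parent[find(i)] = find(j)

    groups = {}
    for i, t in enumerate(tris):
        groups.setdefault(find(i), set()).update(t)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical toy data: two triangles share the side b1-c1, a third stands apart.
best_hits = {
    "a1": ["b1", "c1"],
    "b1": ["c1", "d1"],
    "c1": ["d1"],
    "d1": ["x1"],          # a lone best hit that closes no triangle
    "x1": ["y1", "z1"],
    "y1": ["z1"],
}
groups = merge_triangles(triangles(bet_graph(best_hits)))
print(groups)
```

Grouping depends only on which best-hit edges close triangles, never on a similarity score, which is the sense of "no dependence on the absolute level of similarity" in the list above.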


Background

• Sequence-based homology data from:
  1. Clusters of Orthologous Groups – COG [1]
  2. KEGG Orthology System [2]
     • Sequences aligned
     • If alignment score < 10⁻⁸, then 1 is assigned as the “similarity bit”
     • Otherwise, 0 is assigned as the “similarity bit”
     • “Bit vectors” constructed for each protein, over all proteins
     • Graph constructed: nodes are protein sequences, edges are correlation coefficients of the nodes’ bit vectors
     • Cliques found in the graph = orthology groups
     • Again, no dependence on the absolute level of similarity
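
The bit-vector steps above can be illustrated with a small Python sketch. It is a simplification under assumptions: the `scores` table and protein names are invented, a protein is treated as similar to itself, and the correlation cutoff `MIN_CORR` used to decide whether an edge exists is an assumption (the slides give no value).

```python
from itertools import combinations
from math import sqrt

THRESHOLD = 1e-8  # score cutoff quoted on the slide
MIN_CORR = 0.9    # assumed cutoff for drawing an edge; not given in the talk

def bit_vectors(scores, proteins):
    """One similarity bit per (protein, other protein) pair."""
    return {p: [1 if p == q or scores.get(frozenset((p, q)), 1.0) < THRESHOLD else 0
                for q in proteins]
            for p in proteins}

def pearson(x, y):
    """Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0

def maximal_cliques(adj):
    """Bron-Kerbosch enumeration of maximal cliques (no pivoting)."""
    out = []

    def expand(r, p, x):
        if not p and not x:
            out.append(sorted(r))
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(out)

def orthology_groups(scores, proteins):
    """Cliques of proteins whose bit vectors are strongly correlated."""
    bits = bit_vectors(scores, proteins)
    adj = {p: set() for p in proteins}
    for p, q in combinations(proteins, 2):
        if pearson(bits[p], bits[q]) >= MIN_CORR:
            adj[p].add(q)
            adj[q].add(p)
    return [c for c in maximal_cliques(adj) if len(c) > 1]

# Hypothetical alignment scores for five proteins.
proteins = ["p1", "p2", "p3", "p4", "p5"]
scores = {
    frozenset(("p1", "p2")): 1e-30,
    frozenset(("p1", "p3")): 1e-20,
    frozenset(("p2", "p3")): 1e-12,
    frozenset(("p4", "p5")): 1e-15,
    frozenset(("p3", "p4")): 1e-3,   # too weak: bit stays 0
}
kegg_groups = orthology_groups(scores, proteins)
print(kegg_groups)
```

Two proteins end up in the same group because they are similar to the same set of other proteins, not because their own pairwise score is especially strong.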


Background

• Previous network-topology-assisted approaches:
  – Network-alignment-based (ISORank)
  – Hybrid Rankprop (Yosef, Sharan & Noble, Bioinformatics, 2008)
• They rely heavily on sequence information and use only a limited amount of network topology

Our Method

• We examine yeast proteins only:
  – Extract all possible pairs of them in COG and KEGG groups = “orthologous pairs”
  – There are 9,643 unique such pairs
• What are their topological similarities within the PPI network?
• PPI networks are noisy
  – We analyze the high-confidence part of the yeast PPI network by Collins et al. [3]: 9,074 edges amongst 1,621 proteins
  – Focus on proteins with degree > 3 to avoid noisy PPIs
  – There are 175 orthologous pairs amongst 181 proteins


[3] Collins et al., Molecular & Cellular Proteomics, 6(3):439–450, 2007.
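
The pair extraction and degree filter described above can be sketched as follows. The group contents, protein identifiers, and function names are hypothetical, and the edge list is assumed to contain each interaction once, with no self-loops.

```python
from itertools import combinations

def orthologous_pairs(groups):
    """All unique unordered protein pairs that share at least one group."""
    pairs = set()
    for members in groups:
        pairs.update(frozenset(p) for p in combinations(sorted(set(members)), 2))
    return pairs

def degrees(edges):
    """Node degree in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def filter_by_degree(pairs, edges, min_degree=3):
    """Keep pairs whose two proteins both have degree > min_degree."""
    deg = degrees(edges)
    return {p for p in pairs if all(deg.get(x, 0) > min_degree for x in p)}

# Hypothetical yeast members of three COG/KEGG groups.
groups = [["YA", "YB", "YC"], ["YB", "YC"], ["YD", "YE"]]
pairs = orthologous_pairs(groups)

# Hypothetical PPI edges: YA and YB have degree 4, the rest have degree 1.
edges = ([("YA", f"n{i}") for i in range(4)]
         + [("YB", f"m{i}") for i in range(4)]
         + [("YC", "k0"), ("YD", "YE")])
kept = filter_by_degree(pairs, edges)
print(len(pairs), kept)
```

The pair YB–YC appears in two groups but is counted once, which is what "unique" means in the pair count above.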

Our Method

• Does PPI network topology contain homology information?
  – Are similarly wired proteins homologous?
• Does homology information obtained from network topology differ from that obtained from sequence?

Our Method

• Graphlets: induced subgraphs, of any frequency (figure not preserved)
  N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, num. 18, pg. 3508-3515, 2004.
• Graphlet degrees generalize node degree
  N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” ECCB, Bioinformatics, vol. 23, pg. e177-e183, 2007.
• Graphlet Degree (GD) vectors, or “node signatures”
• Signature Similarity Measure: a similarity measure between nodes’ Graphlet Degree vectors
  T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures”, Cancer Informatics, vol. 4, pg. 257-273, 2008.
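
To make the idea of a node signature concrete, the sketch below counts graphlet degrees for the four orbits of 2- and 3-node graphlets only and compares two signatures. It is a deliberately reduced illustration: the talk's measure uses many more orbits, the log-scaled distance is written here from memory of the cited signature-similarity paper and should be checked against it, and the uniform orbit weights, the toy graph, and the function names are assumptions.

```python
from itertools import combinations
from math import log

def gd_vector(adj, u):
    """Graphlet degrees of u for orbits 0-3 (graphlets on 2 and 3 nodes)."""
    nbrs = adj[u]
    orbit0 = len(nbrs)                                   # edges touching u
    orbit3 = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])  # triangles
    orbit2 = orbit0 * (orbit0 - 1) // 2 - orbit3         # u in the middle of a path
    orbit1 = sum(1 for v in nbrs for w in adj[v]
                 if w != u and w not in nbrs)            # u at the end of a path
    return [orbit0, orbit1, orbit2, orbit3]

def signature_similarity(a, b, weights=None):
    """1 minus the weighted, log-scaled distance between two signatures."""
    weights = weights or [1.0] * len(a)                  # assumed uniform weights
    dist = sum(w * abs(log(x + 1) - log(y + 1)) / log(max(x, y) + 2)
               for w, x, y in zip(weights, a, b))
    return 1.0 - dist / sum(weights)

# Hypothetical toy network: triangle a-b-c with a tail c-d-e.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
sig = {u: gd_vector(adj, u) for u in adj}
print(sig["a"], sig["c"], sig["e"])
print(signature_similarity(sig["a"], sig["b"]))
print(signature_similarity(sig["a"], sig["e"]))
```

Nodes a and b occupy symmetric positions in the toy network, so their signatures coincide and the similarity is 1; a and e are wired differently and score lower.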

Results

Network Topology

• Orthologous pairs often perform the same or similar function.
• Does GD vector similarity (GDS) imply shared biological function?
• Note: most GO annotations were obtained from sequences
  – Similar topology ~ similar sequence ~ similar function


Results

Network Topology

• Orthologous proteins have high GD vector similarities
• More than 20% of orthologous pairs have GDS > 85%
  – Threshold marked on the plot: GDS = 85%, p-value < 0.05

Results

Network Topology – Robustness

• PPI networks are noisy
• Random edge additions, deletions and rewirings in the PPI network
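
A robustness test of this kind needs a way to inject noise into the network. The sketch below is one possible way to do that, not the procedure used in the talk: the function name, the `fraction` parameter, and the reading of "rewire" as "delete some edges, then add the same number of new random ones" are all assumptions. It also assumes the graph is sparse enough that new edges can always be found.

```python
import random

def perturb(edges, nodes, fraction, mode, seed=0):
    """Randomly add, delete or rewire a fraction of the edges."""
    if mode not in ("add", "delete", "rewire"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    k = round(fraction * len(edge_set))
    nodes = sorted(nodes)

    def random_new_edge():
        # Draw node pairs until one is not already an edge.
        while True:
            e = frozenset(rng.sample(nodes, 2))
            if e not in edge_set:
                return e

    if mode in ("delete", "rewire"):
        for e in rng.sample(sorted(edge_set, key=sorted), k):
            edge_set.remove(e)
    if mode in ("add", "rewire"):
        for _ in range(k):
            edge_set.add(random_new_edge())
    return edge_set

# Hypothetical toy network: a path on six nodes.
nodes = list("abcdef")
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
added = perturb(edges, nodes, 0.4, "add")
deleted = perturb(edges, nodes, 0.4, "delete")
rewired = perturb(edges, nodes, 0.4, "rewire")
print(len(added), len(deleted), len(rewired))
```

Signature similarities of the orthologous pairs would then be recomputed on each perturbed network and compared with the values from the original one.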


Results

Sequence

• Sequence identities for the 175 orthologous pairs:
  – ~70% of orthologous pairs have sequence identity < 35%
  – ~20% of orthologous pairs have sequence identity > 90%
  – 20–35% sequence identity is the “twilight zone” for homology
• Low identities are possible because COG and KEGG do not depend on the absolute level of similarity, but on triangles in the graph of best matches

Results

Comparison:

• ~20% of orthologous pairs have signature similarities above 85% (35 pairs)
• ~30% of orthologous pairs have sequence identities above 35% (53 pairs)
• Overlap: 22 pairs (~60% of the smaller set)
• Sequence and network topology capture somewhat complementary slices of homology information
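
The percentages in this comparison follow directly from the pair counts quoted in the talk. The short check below only redoes that arithmetic; the variable names are illustrative.

```python
total_pairs = 175   # orthologous pairs among proteins with degree > 3
high_gds = 35       # pairs with signature similarity above 85%
high_seq = 53       # pairs with sequence identity above 35%
overlap = 22        # pairs in both sets

share_gds = high_gds / total_pairs
share_seq = high_seq / total_pairs
share_overlap = overlap / min(high_gds, high_seq)

only_topology = high_gds - overlap   # found by topology, missed by sequence
only_sequence = high_seq - overlap   # found by sequence, missed by topology

print(f"{share_gds:.0%} {share_seq:.0%} {share_overlap:.0%}")
print(only_topology, only_sequence)
```

The 13 pairs that pass only the topology threshold and the 31 that pass only the sequence threshold are what makes the two sources complementary rather than redundant.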

Results

Examples

• 59 of the yeast ribosomal proteins retained two genomic copies
• Are duplicated proteins functionally redundant?
  – No: they have different genetic requirements for their assembly and localization, so they are functionally distinct
• Also note: the average sequence identity of structurally similar proteins is ~8-10%
• Two pairs with identical sequence:
  – 100% sequence identity, 50% signature similarity, degrees 25 and 5
  – 100% sequence identity, 65% signature similarity, degrees 54 and 9

Conclusions

• Homology information captured by PPI network topology differs from that captured by sequence

• Complementary sources for identifying homologs

Future work:

• Could topological similarity be used to identify orthologs from best-hits graph analysis, as done for sequences?

Acknowledgements

This project was supported by the NSF CAREER IIS-0644424 grant.