Complementarity of network and sequence information in homologous proteins


March, 2010

1 Department of Computing, Imperial College London, London, UK

2 Department of Computer Science, University of California, Irvine, USA

International Symposium on Integrative Bioinformatics

Vesna Memišević2, Tijana Milenković2, and Nataša Pržulj1

Motivation

• Genetic sequences – revolutionized understanding of biology

• Non-sequence based data of importance, e.g.:

– Secondary & tertiary structure of RNA has the dominant role in RNA function (tRNA: Gautheret et al., Comput. Appl. Biosci., 1990; rRNA: Woese et al., Microbiological Reviews, 1983)

– Secondary structure-based approach – more effective at finding new functional RNAs than sequence-based alignments (Webb et al., Science, 2009)

• What about patterns of interconnections in PPI networks?

– Can they complement the knowledge learned from genomic sequence?

– Wiring patterns of duplicated proteins in the PPI network – insights into evolutionary distance?

– Does the information about homologues captured by PPI network topology differ from that captured by their sequence?

Nataša Pržulj, natasha@imperial.ac.uk

Background

• Homologs – descend from a common ancestor:

1. Paralogs: in the same species, evolve through gene duplication events

2. Orthologs: in different species, evolve through speciation events

Background

• Sequence-based homology data from:

1. Clusters of Orthologous Groups – COG[1]

2. KEGG Orthology System[2]

[1] Tatusov et al., BMC Bioinformatics, 4(41), 2003.

[2] Kanehisa et al., Nucleic Acids Res., 28:27–30, 2000.

• Sequence-based homology data from: 1. Clusters of Orthologous Groups – COG[1]

• Proteins in different genomes – sequence compared for the best hits (BeTs)

• The graph of BeTs constructed


• Triangles in the graph of BeTs are found


• Triangles sharing a side are merged into groups of orthologs and paralogs


• No dependence on the absolute level of similarity between compared proteins
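
The COG-style steps above (build the graph of best hits, find triangles, merge triangles that share a side) can be sketched as follows. This is a minimal illustration under stated assumptions, not the COG pipeline itself: best hits are treated as symmetric (undirected) edges, and the toy protein names and the helper names `find_triangles` and `merge_triangles` are hypothetical.

```python
from itertools import combinations

def find_triangles(edges):
    """Return all triangles (frozensets of 3 nodes) in an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triangles = set()
    for u, v in edges:
        # Any common neighbour of u and v closes a triangle on edge (u, v).
        for w in adj[u] & adj[v]:
            triangles.add(frozenset((u, v, w)))
    return triangles

def merge_triangles(triangles):
    """Merge triangles that share a side (two nodes) into groups of nodes."""
    triangles = list(triangles)
    parent = list(range(len(triangles)))  # union-find over triangle indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Index triangles by each of their three sides.
    by_side = {}
    for idx, tri in enumerate(triangles):
        for side in combinations(sorted(tri), 2):
            by_side.setdefault(side, []).append(idx)
    # Triangles listed under the same side belong to the same group.
    for idxs in by_side.values():
        for other in idxs[1:]:
            parent[find(other)] = find(idxs[0])

    groups = {}
    for idx, tri in enumerate(triangles):
        groups.setdefault(find(idx), set()).update(tri)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical best hits between proteins of four genomes a, b, c, d.
bets = [("a1", "b1"), ("b1", "c1"), ("a1", "c1"),  # first triangle
        ("b1", "d1"), ("c1", "d1"),                # second triangle, shares side b1-c1
        ("a2", "b2")]                              # lone edge: no triangle, no group
groups = merge_triangles(find_triangles(bets))
print(groups)
```

Note that no similarity threshold appears anywhere in the sketch: group membership depends only on the pattern of best hits, which is the point made in the last bullet.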


Background

• Sequence-based homology data from: 2. KEGG Orthology System[2]

• Sequences aligned

• If alignment score < 10^-8 then 1 assigned as “similarity bit”

• Otherwise, 0 assigned as “similarity bit”

• “Bit vectors” constructed for a protein, over all proteins

• Graph constructed with protein sequences as nodes and edges given by correlation coefficients of the nodes' bit vectors

• Cliques found in the graph = orthology groups
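
The bit-vector procedure above can be sketched as follows. This is a rough illustration only: the slide does not say which correlation values count as an edge, so the cutoff `MIN_CORR` is an assumption, as are the self-match convention, the toy scores, and all function names. Cliques are listed with a plain Bron-Kerbosch recursion.

```python
from itertools import combinations
from math import sqrt

THRESHOLD = 1e-8   # similarity-bit cutoff quoted on the slide
MIN_CORR = 0.9     # assumed edge cutoff; the slide gives no value

def bit_vectors(scores, proteins):
    """One 0/1 vector per protein, taken over all proteins."""
    def bit(p, q):
        if p == q:
            return 1  # assumption: a protein always matches itself
        s = scores.get((p, q), scores.get((q, p)))
        return 1 if s is not None and s < THRESHOLD else 0
    return {p: [bit(p, q) for q in proteins] for p in proteins}

def pearson(x, y):
    """Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0

def maximal_cliques(adj):
    """Bron-Kerbosch enumeration of maximal cliques (no pivoting)."""
    cliques = []
    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return sorted(cliques)

# Hypothetical pairwise alignment scores for five proteins.
proteins = ["A", "B", "C", "D", "E"]
scores = {("A", "B"): 1e-20, ("A", "C"): 1e-15, ("B", "C"): 1e-12,
          ("C", "D"): 1e-3, ("D", "E"): 1e-30}
vectors = bit_vectors(scores, proteins)
adj = {p: set() for p in proteins}
for p, q in combinations(proteins, 2):
    if pearson(vectors[p], vectors[q]) >= MIN_CORR:
        adj[p].add(q)
        adj[q].add(p)
groups = maximal_cliques(adj)
print(groups)
```

Here the weak C-D score (above the cutoff) contributes a 0 bit, so the two groups stay separate.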


• Again, no dependence on absolute level of similarity

Background

• We examine yeast proteins only:

– Extract all possible pairs of them in COG and KEGG groups = “orthologous pairs”

– There are 9,643 unique such pairs

• What are their topological similarities within the PPI network?
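
Extracting all unique within-group pairs can be sketched in a few lines. The group identifiers and protein names below are hypothetical placeholders, and `orthologous_pairs` is a name introduced here for illustration.

```python
from itertools import combinations

def orthologous_pairs(*groupings):
    """Collect every unordered pair of proteins that share a group.

    Each grouping maps a group id to its member proteins. Pairs are stored
    as sorted tuples in a set, so a pair found in several groups (or in
    both databases) is counted once.
    """
    pairs = set()
    for groups in groupings:
        for members in groups.values():
            for a, b in combinations(sorted(set(members)), 2):
                pairs.add((a, b))
    return pairs

# Hypothetical yeast-only groups from two sources.
cog = {"COG_x": ["P1", "P2", "P3"]}
kegg = {"K_y": ["P2", "P3"], "K_z": ["P4", "P5"]}
pairs = orthologous_pairs(cog, kegg)
print(sorted(pairs))
```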


Background

• Previous network-topology assisted approaches:

– Network-alignment-based (ISORank)

– Yosef, Sharan & Noble, Bioinformatics, 2008 (hybrid Rankprop)

• Rely heavily on sequence information

• Use only a limited amount of network topology


Our Method

• PPI networks are noisy

• We analyze the high-confidence part of yeast PPI network by Collins et al.[3]: 9,074 edges amongst 1,621 proteins

• Focus on proteins with degree > 3 to avoid noisy PPIs

• There are 175 orthologous pairs amongst 181 proteins

[3] Collins et al., Molecular and Cellular Proteomics, 6(3):439–450, 2007
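
The degree filter can be sketched as below. It assumes that both proteins of a pair must have degree > 3 in the PPI network and that each interaction is listed once; neither detail is spelled out on the slide, and `high_degree_pairs` and the toy data are hypothetical.

```python
def high_degree_pairs(edges, pairs, min_degree=3):
    """Keep the pairs whose two proteins both have degree > min_degree."""
    degree = {}
    for u, v in edges:
        if u != v:  # ignore self-interactions
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
    keep = {p for p, d in degree.items() if d > min_degree}
    return [(a, b) for a, b in pairs if a in keep and b in keep]

# Toy network: P1 and P2 have four partners each, P3 has one.
edges = [("P1", x) for x in ("x1", "x2", "x3", "x4")]
edges += [("P2", x) for x in ("x1", "x2", "x3", "x4")]
edges += [("P3", "x1")]
kept = high_degree_pairs(edges, [("P1", "P2"), ("P1", "P3")])
print(kept)
```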

Our Method

• Does PPI network topology contain homology information? Are similarly wired proteins homologous?

• Does homology information obtained from network topology differ from that obtained from sequence?

Our Method

N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, num. 18, pg. 3508-3515, 2004.


• Graphlets: induced subgraphs, of any frequency


Our Method

• Graphlet degrees generalize node degree

N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” ECCB, Bioinformatics, vol. 23, pg. e177-e183, 2007.


Our Method

T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures”, Cancer Informatics, vol. 4, pg. 257-273, 2008.

Graphlet Degree (GD) vectors, or “node signatures”
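
To make the idea of a node signature concrete, the sketch below computes only the first four entries of a GD vector, those for 2- and 3-node graphlets: orbit 0 (the degree), orbit 1 (end of an induced 3-node path), orbit 2 (middle of an induced 3-node path), and orbit 3 (triangle). The cited papers use 73 orbits over 2- to 5-node graphlets; this truncated version, its function name, and the toy graph are assumptions made for illustration.

```python
from itertools import combinations

def gd_vector_small(adj, v):
    """Graphlet degrees of node v for orbits 0-3 (2- and 3-node graphlets)."""
    nbrs = adj[v]
    deg = len(nbrs)
    # Triangles at v: pairs of neighbours that are themselves adjacent.
    tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    orbit0 = deg
    # Paths v-u-w with w not adjacent to v: walks of length 2 from v,
    # minus those that close a triangle (each triangle is walked twice).
    orbit1 = sum(len(adj[u]) - 1 for u in nbrs) - 2 * tri
    # Non-adjacent neighbour pairs: v is the middle of an induced path.
    orbit2 = deg * (deg - 1) // 2 - tri
    orbit3 = tri
    return [orbit0, orbit1, orbit2, orbit3]

# Toy graph: triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
signatures = {v: gd_vector_small(adj, v) for v in sorted(adj)}
print(signatures)
```

Nodes a and b get the same vector, reflecting that they are wired identically in this toy graph.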

Our Method

Similarity measure between nodes’ Graphlet Degree vectors


Our Method

Signature Similarity Measure
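
The formula on this slide is not preserved in the text. The sketch below follows the form of the measure as recalled from the cited Milenković and Pržulj (2008) paper: per-orbit distance D_i = w_i · |log(u_i + 1) − log(v_i + 1)| / log(max(u_i, v_i) + 2), total distance D = Σ D_i / Σ w_i, and similarity S = 1 − D. Treat it as an assumption about the slide's content; the uniform default weights in particular are a simplification, since the paper weights orbits by their dependencies.

```python
from math import log

def signature_similarity(u, v, weights=None):
    """Similarity in [0, 1] between two graphlet degree vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(u)  # assumption: uniform orbit weights
    total = 0.0
    for ui, vi, wi in zip(u, v, weights):
        # Log-scaled difference, normalised so each term stays below w_i.
        total += wi * abs(log(ui + 1) - log(vi + 1)) / log(max(ui, vi) + 2)
    return 1.0 - total / sum(weights)

# Two short, made-up vectors (orbits 0-3 only).
same = signature_similarity([3, 0, 2, 1], [3, 0, 2, 1])
diff = signature_similarity([3, 0, 2, 1], [1, 2, 0, 0])
print(same, diff)
```

Identical vectors score exactly 1; the log scaling keeps a difference of a few counts from dominating when both graphlet degrees are large.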


Results

• Orthologous pairs often perform the same or similar function.

• Does GD vector similarity (GDS) imply shared biological function?

• Note: most GO annotations were obtained from sequences

• Similar topology ~ similar sequence ~ similar function

Network Topology


Results

• Orthologous proteins have high GD vector similarities

p-value < 0.05

> 20% of orthologous pairs have GDS > 85%

Network Topology


Results

• PPI networks are noisy

• Random edge additions, deletions and rewirings in the PPI network

Network Topology – Robustness
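
One way to set up such a robustness test is sketched below. The slide gives no noise levels or procedure, so everything here is an assumption: the function name `perturb`, the `fraction` parameter, and the choice to model a rewiring as one random deletion followed by one random addition (degrees are not preserved).

```python
import random

def perturb(edges, nodes, fraction, mode, seed=0):
    """Return a noisy copy of an undirected edge list.

    mode is "add", "delete", or "rewire"; round(fraction * |E|) random
    operations are applied. Adding assumes the graph is not complete.
    """
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    k = int(round(fraction * len(edge_set)))
    nodes = sorted(nodes)

    def add_random_edge():
        while True:  # retry until a currently absent edge is drawn
            u, v = rng.sample(nodes, 2)
            e = frozenset((u, v))
            if e not in edge_set:
                edge_set.add(e)
                return

    def delete_random_edge():
        edge_set.remove(rng.choice(sorted(edge_set, key=sorted)))

    for _ in range(k):
        if mode == "add":
            add_random_edge()
        elif mode == "delete":
            delete_random_edge()
        elif mode == "rewire":
            delete_random_edge()
            add_random_edge()
        else:
            raise ValueError(mode)
    return {tuple(sorted(e)) for e in edge_set}

# Toy network: a ring of eight proteins, perturbed at a 25% noise level.
nodes = list("abcdefgh")
ring = [(nodes[i], nodes[(i + 1) % 8]) for i in range(8)]
sizes = {m: len(perturb(ring, nodes, 0.25, m)) for m in ("add", "delete", "rewire")}
print(sizes)
```

Signature similarities of the orthologous pairs would then be recomputed on each perturbed network and compared with those from the original.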


Results

• Sequence identities for the 175 orthologous pairs

Sequence


• ~70% of orthologous pairs have sequence identity < 35%


• ~20% of orthologous pairs have sequence identity > 90%


Results

• “Twilight zone” for homology: 20-35% sequence identity

• ~70% of orthologous pairs have sequence identity < 35%

• COG & KEGG: no dependence on the absolute similarity, but on triangles in the graph of best matches

Results

Comparison:

• ~20% of orthologous pairs have signature similarities above 85% (35 pairs)

• ~30% of orthologous pairs have sequence identities above 35% (53 pairs)

• Overlap: 22 pairs (~60% of the smaller set)

• Sequence and network topology: somewhat complementary slices of homology information
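
The arithmetic behind the comparison can be checked directly from the counts quoted in the slides (175 pairs in total, 35 and 53 in the two sets, 22 shared). The variable names are introduced here for illustration.

```python
total_pairs = 175      # orthologous pairs analysed
high_signature = 35    # signature similarity above 85%
high_identity = 53     # sequence identity above 35%
overlap = 22           # pairs in both sets

frac_signature = high_signature / total_pairs
frac_identity = high_identity / total_pairs
frac_overlap = overlap / min(high_signature, high_identity)

# Pairs picked up by one criterion but not the other.
only_signature = high_signature - overlap
only_identity = high_identity - overlap

print(f"{frac_signature:.0%} {frac_identity:.0%} {frac_overlap:.0%}")
print(only_signature, only_identity)
```

The pairs captured by only one of the two criteria are what makes the two sources complementary rather than redundant.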


Results

• 59 of the yeast ribosomal proteins – retained two genomic copies

• Are duplicated proteins functionally redundant?

– No: they have different genetic requirements for their assembly and localization, so they are functionally distinct

• Also note: average sequence identity of structurally similar proteins is ~8-10%

• Two pairs with identical sequence:

Examples

• 100% sequence identity, 50% signature similarity (degrees 25 and 5)


• 100% sequence identity, 65% signature similarity (degrees 54 and 9)

Conclusions

• Homology information captured by PPI network topology differs from that captured by sequence

• Complementary sources for identifying homologs

Future work:

• Could topological similarity be used to identify orthologs from best-hits graph analysis, as done for sequences?

Acknowledgements

This project was supported by the NSF CAREER IIS-0644424 grant
