Page 1: Dec 1, 2007

[email protected]

http://www.colorado.edu/che/research/faculty/gill/

http://compbio.uchsc.edu/Hunter

Anis Karimpour-Fard‡, Ryan T. Gill†, and Lawrence Hunter‡

‡ University of Colorado School of Medicine

† Department of Chemical and Biological Engineering, University of Colorado, Boulder

Investigation of factors affecting prediction of protein-protein interaction networks by phylogenetic profiling

Dec 1, 2007

Page 2: Dec 1, 2007

The meaning of protein function

Eisenberg, D. et al. Nature 2000

Biochemical view

The function of protein A is its action on a substrate (S) to form a product (P).

[Figure: protein A converting S to P]

Post-genomic view

The function of A is the context of its interactions with other proteins in the cell.

[Figure: interaction network of proteins A, B, C, D, M, N, X, Y, Z]

The problem ……

More than 500 microbial genomes have been fully sequenced, and a high percentage of their genes have unknown function.

For example:

E. coli K12 15%

P. aeruginosa 45%

http://www.genomesonline.org/

Page 3: Dec 1, 2007

Predicting protein function

• Homology-based methods (give only a partial understanding of a protein's role)
– Simple sequence similarity searches (BLAST)
– Profile searches (PSI-BLAST)
– Databases of conserved domains (Pfam, SMART)

• Prediction from genomic context
– Phylogenetic profile
– Gene cluster
– Gene neighbor
– Rosetta Stone

• Prediction from high-throughput experimental data
– Microarray gene expression data
– Protein-protein interaction screens
– ...

Page 4: Dec 1, 2007

Phylogenetic Profile

Pellegrini et al. PNAS 96, 4285 (1999)

Marcotte et al. PNAS 97, 12115 (2000)

1- Select a set of genomes as a reference set

2- Create a phylogenetic profile matrix for the target organism:

• Run a one-against-all BLAST search to identify homologs of all target genes in diverse reference organisms.

Does the selection of the reference genomes influence the prediction? If so, how?

How does the E-value threshold affect the prediction of protein-protein interactions?

[Figure: pipeline stages: reference selection, BLAST E-value threshold (present or absent), measure profile similarities]
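Step 2 can be sketched in a few lines. This is only an illustration under stated assumptions: the BLAST search is taken as already run, its output is reduced to a hypothetical table of best-hit E-values per target protein and reference genome, and the names `build_profiles`, `best_evalues`, and the genome labels are invented for the example. The 1e-5 default threshold is likewise an arbitrary choice, not a value from the slides.

```python
from typing import Dict, List


def build_profiles(
    best_evalues: Dict[str, Dict[str, float]],
    reference_genomes: List[str],
    evalue_threshold: float = 1e-5,  # assumed default, not from the slides
) -> Dict[str, List[int]]:
    """Turn best-hit E-values into present (1) / absent (0) profiles."""
    profiles = {}
    for protein, hits in best_evalues.items():
        # A genome with no reported hit is treated as "absent".
        profiles[protein] = [
            1 if hits.get(genome, float("inf")) <= evalue_threshold else 0
            for genome in reference_genomes
        ]
    return profiles


# Hypothetical data: best BLAST E-value of each target protein
# against each reference genome (missing key = no hit).
reference_genomes = ["genome_A", "genome_B", "genome_C", "genome_D"]
best_evalues = {
    "protX": {"genome_A": 1e-30, "genome_B": 1e-3, "genome_D": 1e-12},
    "protY": {"genome_A": 1e-25, "genome_C": 1e-8, "genome_D": 1e-40},
}

strict = build_profiles(best_evalues, reference_genomes, 1e-5)
loose = build_profiles(best_evalues, reference_genomes, 1e-2)
print(strict["protX"], loose["protX"])
```

Changing either the list of reference genomes or the threshold changes the resulting bit vectors, which is exactly the sensitivity the two questions above are asking about.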

Page 5: Dec 1, 2007

3- Measure profile similarities

Protein X: 110001111001001110001111

Protein Y: 111000111100000110001111

19 matching bits out of 24

4- Generate the protein-protein interaction network (two nodes are connected if the two proteins have similar profiles)

[Figure: Protein X connected to Protein Y]

5- Create clusters from the set of protein-protein interactions

6- Visualize network
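Steps 3 to 5 can be sketched as follows, using the two profiles from the slide. Several things here are assumptions made for illustration: similarity is the simple fraction of matching bits, the 0.75 cutoff for drawing an edge is arbitrary, clusters are taken to be connected components of the network, and protein Z is a made-up third profile (the bitwise complement of X) added so that the clustering has something to separate.

```python
from itertools import combinations


def matching_bits(a: str, b: str) -> int:
    """Count positions where two equal-length bit strings agree."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x == y for x, y in zip(a, b))


def build_network(profiles: dict, min_fraction: float) -> set:
    """Connect two proteins when enough of their profile bits match."""
    edges = set()
    for p, q in combinations(sorted(profiles), 2):
        a, b = profiles[p], profiles[q]
        if matching_bits(a, b) / len(a) >= min_fraction:
            edges.add((p, q))
    return edges


def clusters(nodes, edges) -> list:
    """Group nodes into connected components with a depth-first search."""
    neighbors = {n: set() for n in nodes}
    for p, q in edges:
        neighbors[p].add(q)
        neighbors[q].add(p)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        result.append(component)
    return result


x = "110001111001001110001111"  # Protein X, from the slide
y = "111000111100000110001111"  # Protein Y, from the slide
z = "".join("1" if c == "0" else "0" for c in x)  # hypothetical protein Z

profiles = {"X": x, "Y": y, "Z": z}
edges = build_network(profiles, min_fraction=0.75)  # assumed cutoff
groups = clusters(profiles, edges)
print(matching_bits(x, y), edges, groups)
```

The resulting edge list is what a network viewer would take as input for step 6.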

Page 6: Dec 1, 2007

Measure profile similarities

Protein X: 110001111001001110001111

Protein Y: 111000111100000110001111

[Figure: Protein X connected to Protein Y; two nodes are connected if the two proteins have similar profiles]

• Mutual information

$$MI(X, Y) = H(X) + H(Y) - H(X, Y)$$

$$H(Y) = -\sum_{i=0}^{1} p(i) \ln p(i)$$

$$H(X, Y) = -\sum_{i=0}^{1} \sum_{j=0}^{1} p(i, j) \ln p(i, j)$$

where p(i), i = 0, 1, is the fraction of genomes in which protein Y is in state i, and p(i, j) is the fraction of genomes in which protein X is in state i and protein Y is in state j.

• Pearson correlation coefficient

• Inverse homology

– Calculate the homology between two genomes: H_{i,j} is the ratio of the number of homologs in each reference organism j to the number of proteins in the target genome i.

– P_ij = 1/H_{i,j} where a homolog is present; otherwise P_ij = 0.

Karimpour-Fard et al. BMC Genomics. 2007;8(1):393
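As a rough illustration of the first two measures, the sketch below computes mutual information and the Pearson correlation coefficient for the Protein X and Protein Y profiles shown on the slide. The function names are invented for the example, natural logarithms are used to match the formulas above, and returning 0.0 for a constant profile (where the correlation is undefined) is an assumption of this sketch. The inverse homology weighting is not covered.

```python
import math
from collections import Counter


def entropy(counts, total: int) -> float:
    """Shannon entropy (natural log) from a collection of state counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def mutual_information(x: str, y: str) -> float:
    """MI(X, Y) = H(X) + H(Y) - H(X, Y) for two equal-length bit strings."""
    n = len(x)
    h_x = entropy(Counter(x).values(), n)
    h_y = entropy(Counter(y).values(), n)
    h_xy = entropy(Counter(zip(x, y)).values(), n)  # joint states (i, j)
    return h_x + h_y - h_xy


def pearson(x: str, y: str) -> float:
    """Pearson correlation coefficient between two bit strings."""
    a = [int(c) for c in x]
    b = [int(c) for c in y]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat an all-0 or all-1 profile as uncorrelated
    return cov / math.sqrt(var_a * var_b)


protein_x = "110001111001001110001111"
protein_y = "111000111100000110001111"

mi_xy = mutual_information(protein_x, protein_y)
r_xy = pearson(protein_x, protein_y)
print(f"MI = {mi_xy:.3f}, Pearson r = {r_xy:.3f}")
```

A useful sanity check for either measure is to compare a profile with itself: mutual information then equals the profile's own entropy, and the Pearson coefficient equals 1.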

Page 7: Dec 1, 2007

Comparison of different combinations of reference genomes and E-value thresholds using COG

• PPV = TP/(TP+FP)

– TP = number of predicted pairs in the same functional category

– FP = number of predicted pairs that were classified but were not in the same functional category

[Figure: comparison panels labeled Random sets, All, Low GC, Aerobic]

Karimpour-Fard et al. BMC Genomics. 2007;8(1):393
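The PPV definition translates almost directly into code. The sketch below is illustrative only: the protein names and category letters are made up, each protein is assumed to carry a single COG functional category, and pairs with an unclassified member are skipped, following the wording that FP counts only pairs that "were classified".

```python
def ppv(predicted_pairs, cog_category):
    """Positive predictive value, TP / (TP + FP), over classified pairs."""
    tp = fp = 0
    for a, b in predicted_pairs:
        if a not in cog_category or b not in cog_category:
            continue  # at least one protein is unclassified: not counted
        if cog_category[a] == cog_category[b]:
            tp += 1  # same functional category
        else:
            fp += 1  # both classified, different categories
    if tp + fp == 0:
        return None  # no classified pairs, PPV is undefined
    return tp / (tp + fp)


# Hypothetical COG assignments and predicted interaction pairs.
cog_category = {"p1": "E", "p2": "E", "p3": "K", "p4": "K"}
predicted_pairs = [("p1", "p2"), ("p3", "p4"), ("p1", "p3"), ("p2", "p5")]

score = ppv(predicted_pairs, cog_category)
print(score)
```

Running the same function on the networks produced by each reference set and E-value threshold gives one number per combination, which is the kind of comparison the figure summarizes.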

Page 8: Dec 1, 2007

Co-evolution can be used to assign function to unstudied genes

Hypothetical proteins YcgB, YeaH, YeaG are co-conserved across different species. Comparison of sub-graphs across species (CS-CCC) suggested that a previously unstudied S. typhimurium gene, ycgB, is functionally related to yeaH. Experimental data support the hypothesis that both genes are important for antimicrobial peptide resistance.

Edge color code:

• E. coli K12 (green)

• E. coli O157 (blue)

• Shigella flexneri (black)

• S. typhimurium LT2 (purple)

• P. aeruginosa (mustard)

Karimpour-Fard et al. Genome Biology 2007 8:R185