Investigation of factors affecting prediction of protein-protein interaction networks by phylogenetic profiling
Anis Karimpour-Fard ‡, Ryan T. Gill †, and Lawrence Hunter ‡
‡ University of Colorado School of Medicine (http://compbio.uchsc.edu/Hunter)
† Department of Chemical and Biological Engineering, University of Colorado, Boulder (http://www.colorado.edu/che/research/faculty)
Dec 1, 2007
The problem: more than 500 microbial genomes are fully sequenced, yet a high percentage of their genes have unknown function (for example, 15% in E. coli K12).
Pellegrini et al. PNAS 96, 4285 (1999)
Marcotte et al. PNAS 97, 12115 (2000)
1- Select a set of genomes as the reference set
2- Create phylogenetic profile matrix for target organism:
•Run a one-against-all BLAST search to identify homologs of each target gene in diverse reference organisms.
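The profile-construction step can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes the BLAST search has already been run and summarized as the best E-value of each target protein in each reference genome (`None` when there is no hit), and the function name `build_profiles`, the default threshold of 1e-5, and the toy E-values are all hypothetical. A gene is called present (1) when its best hit passes the E-value threshold, otherwise absent (0).

```python
from typing import Dict, List, Optional


def build_profiles(best_evalues: Dict[str, List[Optional[float]]],
                   evalue_threshold: float = 1e-5) -> Dict[str, List[int]]:
    """Return a 0/1 phylogenetic profile per target protein.

    Hypothetical input: for each protein, the best BLAST E-value of its top hit
    in each reference genome, in a fixed genome order (None = no hit).
    """
    profiles = {}
    for protein, evalues in best_evalues.items():
        # Present (1) if the best hit is at or below the threshold, else absent (0).
        profiles[protein] = [
            1 if e is not None and e <= evalue_threshold else 0
            for e in evalues
        ]
    return profiles


hits = {
    "protX": [1e-30, 0.5, None, 1e-8],
    "protY": [1e-12, 1e-6, 2e-3, None],
}
print(build_profiles(hits, 1e-5))   # {'protX': [1, 0, 0, 1], 'protY': [1, 1, 0, 0]}
print(build_profiles(hits, 1e-10))  # {'protX': [1, 0, 0, 0], 'protY': [1, 0, 0, 0]}
```

Tightening the threshold from 1e-5 to 1e-10 flips weaker hits to absent, which is why the E-value cutoff can change which profiles look similar.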
Does the selection of the reference genomes influence the prediction?
If so, how?
How does the E-value threshold affect protein-protein interaction prediction?
[Figure: pipeline factors examined: reference selection, BLAST E-value threshold (present or absent), and profile similarity measure]
Protein X: 110001111001001110001111
Protein Y: 111000111100000110001111
19 matching bits out of 24
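The bit-matching count in this example can be checked with a short Python sketch; `matching_bits` is a hypothetical helper name, and the inputs are the two 24-genome profiles of Protein X and Protein Y.

```python
def matching_bits(x: str, y: str) -> int:
    """Count positions (reference genomes) where two 0/1 profiles agree."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same reference genomes")
    return sum(a == b for a, b in zip(x, y))


protein_x = "110001111001001110001111"
protein_y = "111000111100000110001111"
print(matching_bits(protein_x, protein_y), "matching bits out of", len(protein_x))
# 19 matching bits out of 24
```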
3- Measure profile similarities
4- Predict protein-protein interactions
Generate the protein-protein interaction network
5- Create clusters from the set of protein-protein interactions
(Two nodes, such as Protein X and Protein Y, are connected if the two proteins have similar profiles.)
6- Visualize network
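Steps 4 and 5 can be sketched as follows. The slides do not specify the similarity cutoff or the clustering algorithm, so this sketch assumes a simple rule (connect two proteins when their profiles agree in at least `min_matches` positions) and treats each connected component of the resulting network as a cluster; the function names and toy profiles are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_network(profiles: Dict[str, str], min_matches: int) -> List[Tuple[str, str]]:
    """Return an edge for every protein pair whose 0/1 profiles agree in >= min_matches positions."""
    edges = []
    for (p, x), (q, y) in combinations(profiles.items(), 2):
        if sum(a == b for a, b in zip(x, y)) >= min_matches:
            edges.append((p, q))
    return edges


def connected_components(nodes: Iterable[str], edges: List[Tuple[str, str]]) -> List[Set[str]]:
    """Treat each connected component of the network as one cluster."""
    nodes = list(nodes)
    adjacency: Dict[str, Set[str]] = {n: set() for n in nodes}
    for p, q in edges:
        adjacency[p].add(q)
        adjacency[q].add(p)
    seen: Set[str] = set()
    clusters: List[Set[str]] = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search from an unvisited protein collects its whole component.
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


profiles = {
    "A": "11110000",
    "B": "11110001",
    "C": "00001111",
    "D": "00011111",
    "E": "10101010",
}
edges = build_network(profiles, min_matches=7)
print(edges)                                  # [('A', 'B'), ('C', 'D')]
print(connected_components(profiles, edges))  # A/B, C/D and E form three clusters
```

A stricter `min_matches` yields a sparser network and smaller clusters; other clustering methods could replace connected components without changing the network-building step.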
Measure profile similarities
•Mutual information
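$$MI(X, Y) = H(X) + H(Y) - H(X, Y)$$
$$H(Y) = -\sum_{i=0}^{1} p(i) \ln p(i)$$
where p(i) (i = 0, 1) is the fraction of genomes in which protein Y is in state i (H(X) is defined analogously), and
$$H(X, Y) = -\sum_{i=0}^{1} \sum_{j=0}^{1} p(i, j) \ln p(i, j)$$
where p(i, j) is the fraction of genomes in which protein X is in state i and protein Y is in state j.
•Pearson correlation coefficient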
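Both similarity measures can be computed directly from two binary profiles. This Python sketch follows the entropy definitions given for mutual information (natural log, probabilities estimated as fractions of reference genomes) and uses the standard Pearson formula; the function names are hypothetical, and the sketch raises an error for a constant profile (present or absent in every genome), where Pearson correlation is undefined.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def entropy(states: Iterable[Hashable]) -> float:
    """Shannon entropy (natural log); p = fraction of genomes in each observed state."""
    counts = Counter(states)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """MI(X, Y) = H(X) + H(Y) - H(X, Y) for two equal-length 0/1 profiles."""
    # zip(x, y) yields the joint state (i, j) of X and Y in each genome.
    return entropy(x) + entropy(y) - entropy(zip(x, y))


def pearson(x: Sequence[int], y: Sequence[int]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson correlation is undefined for a constant profile")
    return cov / (sx * sy)


protein_x = [int(c) for c in "110001111001001110001111"]
protein_y = [int(c) for c in "111000111100000110001111"]
print(mutual_information(protein_x, protein_y))
print(pearson(protein_x, protein_y))
```

For binary profiles, Pearson reduces to the phi coefficient of the 2×2 presence/absence table, while mutual information also rewards pairs whose states are consistently opposite.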
•Inverse homology
•Calculate the homology between two genomes:
• $H_{i,j}$: the ratio of the number of homologs in each reference organism j to the number of proteins in the target genome i.
•$P_{i,j} = 1/H_{i,j}$ if the protein has a homolog in reference genome j; otherwise $P_{i,j} = 0$.
Karimpour-Fard et al. BMC Genomics. 2007;8(1):393
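A minimal sketch of inverse-homology weighting for one target genome, assuming, as the slide suggests, that $H_{i,j}$ is the fraction of target proteins with at least one homolog in reference genome j, and that a present entry is replaced by $1/H_{i,j}$ while absent entries stay 0. The function names and the four-protein toy genome are hypothetical.

```python
from typing import Dict, List


def homology_ratios(profiles: Dict[str, List[int]]) -> List[float]:
    """H_j for one target genome: fraction of its proteins with a homolog in reference genome j.

    Assumes `profiles` holds a 0/1 profile for every protein of the target genome.
    """
    n_proteins = len(profiles)
    n_genomes = len(next(iter(profiles.values())))
    return [sum(bits[j] for bits in profiles.values()) / n_proteins
            for j in range(n_genomes)]


def inverse_homology_profiles(profiles: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Replace each present entry by 1/H_j; absent entries stay 0."""
    h = homology_ratios(profiles)
    # A present bit in column j implies h[j] > 0, so the division is safe.
    return {name: [1.0 / h[j] if bit else 0.0 for j, bit in enumerate(bits)]
            for name, bits in profiles.items()}


profiles = {
    "a": [1, 1, 0],
    "b": [1, 0, 0],
    "c": [1, 0, 1],
    "d": [1, 0, 0],
}
print(homology_ratios(profiles))                 # [1.0, 0.25, 0.25]
print(inverse_homology_profiles(profiles)["a"])  # [1.0, 4.0, 0.0]
```

Under this reading, a reference genome in which nearly every target protein has a homolog (a close relative) contributes weights near 1, while presence in a distantly related genome is weighted more heavily.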
Comparison of different combinations of reference genomes and E-value thresholds using COG
• Positive predictive value: PPV = TP/(TP+FP)
– TP = number of predicted pairs in which both proteins fall in the same functional category
– FP = number of predicted pairs in which both proteins are classified but fall in different functional categories
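The PPV calculation can be sketched as follows, assuming each protein maps to a single COG functional category (real COG assignments can list several category letters) and that pairs involving an unclassified protein are skipped, consistent with the FP definition above. `ppv`, the protein names, and the category letters are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def ppv(pairs: Iterable[Tuple[str, str]], category: Dict[str, str]) -> float:
    """PPV = TP / (TP + FP) over predicted pairs in which both proteins are classified."""
    tp = fp = 0
    for a, b in pairs:
        if a not in category or b not in category:
            continue  # unclassified protein: counted as neither TP nor FP
        if category[a] == category[b]:
            tp += 1
        else:
            fp += 1
    # Design choice for this sketch: report 0.0 when no predicted pair is classified.
    return tp / (tp + fp) if tp + fp else 0.0


category = {"p1": "E", "p2": "E", "p3": "J", "p4": "J"}
predicted = [("p1", "p2"), ("p3", "p4"), ("p1", "p3"), ("p2", "p5")]
print(ppv(predicted, category))  # TP = 2, FP = 1 -> 0.666...
```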
[Figure: comparison of reference genome sets: All, Low GC, Aerobic, and random sets]
Karimpour-Fard et al. BMC Genomics. 2007;8(1):393
Co-evolution can be used to assign function to unstudied genes
The hypothetical proteins YcgB, YeaH, and YeaG are co-conserved across different species. Comparison of sub-graphs across species (CS-CCC) suggested that a previously unstudied S. typhimurium gene, ycgB, is functionally related to yeaH. Experimental data support the hypothesis that both genes are important for antimicrobial peptide resistance.