GBCB 5874: Problem Solving in GBCB
Homology Modeling (Comparative Structure Modeling)
Aims of Structural Genomics
• High-throughput 3D structure determination and analysis
• Determine or predict the 3D structures of all the proteins encoded in the genome
• Up to 40% of the known protein sequences have at least one segment related to one or more known structures
=> Determine all of the folds
=> Use homology modeling to predict 3D structures
Growth in the PDB
What is Homology?
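• Homology: having a common evolutionary origin
• Cannot be partial: two proteins either share a common ancestor or they do not, so "percent homology" is a misnomer (use percent identity or percent similarity instead)
• Assertion of homology is a hypothesis
• The hypothesis is usually based on the extent of sequence similarity between proteins, though similar function should also be demonstrated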
Some Definitions
• Homologues (homologs): proteins that are evolutionarily related
• Orthologues (orthologs): homologues in different organisms that diverged through speciation
• Paralogues (paralogs): homologues that arose by gene duplication, typically found within the same organism
Basis of Homology Modeling
• 3D structures are conserved to a greater extent than primary structures (sequences)
• Develop models of protein structure based on the structures of homologues
• Using a known structure as a “template”, calculate a 3D model of a protein for which only the sequence is known (the “target”)
Steps in Homology Modeling
Template Selection
• Identify protein structures related to the target and select those to be used as templates
• Involves searching a database, e.g., with BLAST at NCBI
• Involves a certain amount of sequence alignment
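As a rough sketch of template selection, suppose the database search was run with BLAST+ and saved in its tabular format (`-outfmt 6`, whose 12 default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The Python below keeps hits that pass an E-value filter and a query-coverage filter, then picks the one with the highest identity. The cutoffs (`max_evalue`, `min_coverage`) and the ranking rule are illustrative assumptions, not established thresholds. The subject IDs in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Hit:
    subject: str    # sseqid: candidate template
    pident: float   # percent identity over the aligned region
    qstart: int     # first aligned target residue (1-based)
    qend: int       # last aligned target residue (1-based)
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST tabular output that uses the 12 default columns."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(subject=f[1], pident=float(f[2]),
                        qstart=int(f[6]), qend=int(f[7]),
                        evalue=float(f[10]), bitscore=float(f[11])))
    return hits


def select_template(hits: List[Hit], query_length: int,
                    max_evalue: float = 1e-5,
                    min_coverage: float = 0.7) -> Optional[Hit]:
    """Return the significant, well-covering hit with the highest identity.

    The default thresholds are assumptions for illustration only.
    """
    best, best_key = None, None
    for h in hits:
        coverage = (h.qend - h.qstart + 1) / query_length
        if h.evalue > max_evalue or coverage < min_coverage:
            continue
        key = (h.pident, coverage)  # identity first, coverage breaks ties
        if best_key is None or key > best_key:
            best, best_key = h, key
    return best


if __name__ == "__main__":
    # Hypothetical BLAST report for a 200-residue target.
    report = [
        "target\ttplA\t35.0\t180\t110\t3\t1\t180\t5\t184\t1e-30\t250",
        "target\ttplB\t60.0\t60\t24\t0\t10\t69\t1\t60\t1e-12\t120",
        "target\ttplC\t28.0\t190\t130\t4\t5\t194\t2\t191\t1e-25\t230",
    ]
    chosen = select_template(parse_outfmt6(report), query_length=200)
    print(chosen.subject if chosen else "no template")
```

In this example, tplB has the highest identity but covers only 30% of the target, so the coverage filter rejects it. Identity alone is a simplification. Other properties of the template structure, such as its experimental resolution, also matter in practice.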
Aligning Sequences
• Critical step in homology modeling
• Many options to consider
• Factors to consider
  – Which algorithm to use
  – Which scoring method to apply
  – Whether and how to assign gap penalties
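The three choices above (algorithm, scoring method, gap penalty) can be made concrete with one common algorithm: Needleman-Wunsch global alignment. The Python sketch below takes a pluggable scoring function and a linear gap penalty. The +1/-1 identity scores and the gap penalty of -1 are placeholder assumptions. Real protein alignments typically use a substitution matrix and affine (gap-open plus gap-extend) penalties instead.

```python
from typing import Callable, Tuple


def identity_score(a: str, b: str) -> int:
    """Placeholder scoring: +1 for identical residues, -1 otherwise."""
    return 1 if a == b else -1


def needleman_wunsch(x: str, y: str,
                     score: Callable[[str, str], int] = identity_score,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    # F[i][j] holds the best score for aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # (mis)match
                          F[i - 1][j] + gap,                             # gap in y
                          F[i][j - 1] + gap)                             # gap in x
    # Trace back from the bottom-right cell to recover one optimal alignment.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    s, a, b = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
    print(s)
    print(a)
    print(b)
```

Passing a different `score` function (for example, one that looks up a substitution matrix) changes the scoring method without touching the algorithm. Changing `gap` changes the gap penalty.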
Scoring Alignments
• Need some method of scoring to find the optimal alignment
• Four general types of scoring have been applied
  – Identity: considers only identical residues
  – Genetic code: considers the number of base changes in DNA or RNA needed to interconvert codons for the amino acids
  – Chemical similarity: considers physico-chemical properties
  – Observed substitutions: considers substitution frequencies observed in alignments of sequences (used the most)
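The first two scoring schemes can be computed directly. In the Python sketch below, identity scores 1 for identical residues and 0 otherwise. The genetic-code score finds the minimum number of nucleotide changes needed to turn any codon of one amino acid into any codon of the other, using the standard genetic code (NCBI translation table 1). The function names are hypothetical. Observed-substitution scores come from alignment statistics instead, as in the log-odds matrices.

```python
from itertools import product
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code, one letter per codon in the order TTT, TTC, TTA, TTG, TCT, ...
# ("*" marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codons_by_amino_acid() -> Dict[str, List[str]]:
    """Group the 64 codons by the amino acid (or stop) they encode."""
    table: Dict[str, List[str]] = {}
    codons = ("".join(c) for c in product(BASES, repeat=3))
    for codon, aa in zip(codons, AMINO_ACIDS):
        table.setdefault(aa, []).append(codon)
    return table


CODONS = codons_by_amino_acid()


def identity_score(a: str, b: str) -> int:
    """Identity scoring: only identical residues count."""
    return 1 if a == b else 0


def genetic_code_distance(a: str, b: str) -> int:
    """Minimum number of base changes converting a codon of `a` into a codon of `b`."""
    return min(sum(n1 != n2 for n1, n2 in zip(c1, c2))
               for c1 in CODONS[a] for c2 in CODONS[b])


if __name__ == "__main__":
    for pair in [("F", "L"), ("M", "K"), ("M", "W")]:
        print(pair, identity_score(*pair), genetic_code_distance(*pair))
```

A smaller distance means that fewer mutations separate the two amino acids. A scoring scheme built on this would reward such substitutions more than distant ones.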
Scoring Matrices
• PAM40 - short, highly similar sequences
• PAM160 - detecting members of a protein family
• PAM250 - longer, more divergent sequences
• BLOSUM90 - short, highly similar sequences
• BLOSUM80 - detecting members of a protein family
• BLOSUM62 - most effective in finding all potential similarities
• BLOSUM30 - longer, more divergent sequences
Log-Odds Matrix
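$$S_{i,j} = \log\left(\frac{q_{i,j}}{p_i \, p_j}\right)$$
• $q_{i,j}$ = observed frequency of substitution between residues i and j in aligned sequences
• $p_i p_j$ = product of the probabilities of occurrence of residues i and j in proteins, i.e., the frequency expected by chance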
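The log-odds formula can be worked through on a toy example. The Python sketch below counts aligned residue pairs, counting each pair in both orders so that the matrix comes out symmetric. It estimates $q_{i,j}$ from those counts and $p_i$ from the marginal frequencies, then computes $S_{i,j}$ in bits (log base 2). The aligned pairs in the example are made up. Real matrices such as BLOSUM62 are derived from large curated alignment sets and are scaled and rounded to integers.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def log_odds(q_ij: float, p_i: float, p_j: float, base: float = 2.0) -> float:
    """S_ij = log(q_ij / (p_i * p_j)); positive when a pair is seen more often than chance."""
    return math.log(q_ij / (p_i * p_j), base)


def log_odds_matrix(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Estimate q_ij and p_i from aligned residue pairs and return S_ij for observed pairs."""
    counts: Counter = Counter()
    for a, b in pairs:
        counts[(a, b)] += 1  # count both orders so q is symmetric
        counts[(b, a)] += 1
    total = sum(counts.values())
    q = {ij: n / total for ij, n in counts.items()}
    p: Dict[str, float] = {}
    for (i, _j), freq in q.items():
        p[i] = p.get(i, 0.0) + freq  # marginal frequency of residue i
    return {(i, j): log_odds(q_ij, p[i], p[j]) for (i, j), q_ij in q.items()}


if __name__ == "__main__":
    toy_columns = [("A", "A"), ("A", "S"), ("S", "S"), ("A", "A")]
    for pair, s in sorted(log_odds_matrix(toy_columns).items()):
        print(pair, round(s, 3))
```

In the toy data, identical pairs are over-represented relative to chance and get positive scores. The A/S pair is under-represented and gets a negative score.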
Building the 3D Model
• Rigid body assembly
  – Rigid bodies from aligned sequences
  – Core region, loops, and side chains
• Satisfaction of spatial restraints
  – Generate restraints from templates
  – Assume distances and angles between aligned template and target residues are similar
  – Minimize violations of all restraints using distance geometry or optimization techniques (e.g., a force field) to satisfy spatial restraints
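The satisfaction-of-spatial-restraints idea can be shown in miniature. The Python/NumPy sketch below is a toy, not how MODELLER or any other program implements it. It copies inter-residue distances from template Cα coordinates for aligned residue pairs and treats them as restraints on the target model. It then minimizes the summed squared violations by plain gradient descent. The distance cutoff, step size, and iteration count are arbitrary assumptions. Real tools use many more restraint types (angles, dihedrals, force-field stereochemistry) and better optimizers.

```python
from typing import List, Sequence, Tuple

import numpy as np

Restraint = Tuple[int, int, float]  # (target residue i, target residue j, distance)


def restraints_from_template(template_xyz: np.ndarray,
                             aligned: Sequence[Tuple[int, int]],
                             cutoff: float = 10.0) -> List[Restraint]:
    """`aligned` holds (target_index, template_index) pairs taken from the alignment."""
    restraints = []
    for a in range(len(aligned)):
        for b in range(a + 1, len(aligned)):
            ti, si = aligned[a]
            tj, sj = aligned[b]
            d = float(np.linalg.norm(template_xyz[si] - template_xyz[sj]))
            if d <= cutoff:  # keep only fairly local restraints
                restraints.append((ti, tj, d))
    return restraints


def violation(xyz: np.ndarray, restraints: List[Restraint]) -> float:
    """Sum of squared deviations from the restrained distances."""
    return float(sum((np.linalg.norm(xyz[i] - xyz[j]) - d) ** 2 for i, j, d in restraints))


def satisfy_restraints(xyz0: np.ndarray, restraints: List[Restraint],
                       steps: int = 2000, step_size: float = 0.01) -> np.ndarray:
    """Reduce restraint violations by gradient descent on the coordinates."""
    xyz = np.array(xyz0, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(xyz)
        for i, j, d in restraints:
            diff = xyz[i] - xyz[j]
            r = np.linalg.norm(diff)
            if r < 1e-9:
                continue
            g = 2.0 * (r - d) * diff / r  # gradient of (r - d)^2 with respect to xyz[i]
            grad[i] += g
            grad[j] -= g
        xyz -= step_size * grad
    return xyz


if __name__ == "__main__":
    k = np.arange(6)
    # Idealized helix-like C-alpha trace standing in for a template (hypothetical).
    template = np.stack([2.3 * np.cos(np.radians(100 * k)),
                         2.3 * np.sin(np.radians(100 * k)),
                         1.5 * k], axis=1)
    restraints = restraints_from_template(template, [(n, n) for n in range(6)])
    start = template + np.random.default_rng(0).normal(scale=0.3, size=template.shape)
    model = satisfy_restraints(start, restraints)
    print(violation(start, restraints), violation(model, restraints))
```

Unaligned target residues (for example, loops) get no template-derived restraints in this scheme. That is one reason loops are typically the least reliable part of a model.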
Evaluation of Model Quality
• Check for proper protein stereochemistry
  – ProCheck (http://biotech.ebi.ac.uk:8400/cgi-bin/sendquery)
    • Ramachandran plot, bond-length, …
  – Whatif (http://www.cmbi.kun.nl/gv/servers/WIWWWI)
• Packing quality
  – Both web-servers
• Fitness of sequence to structure
  – ProsaII (http://lore.came.sbg.ac.at/Services/prosa.html)
    • Program runs on Linux and Unix
  – Verify3D (http://www.doe-mbi.ucla.edu/Services/Verify_3D/)
    • Web-server
Evaluating the 3D Model: ProCheck
• Ramachandran plot
• Planar peptide bonds
• Side chain conformations that correspond to those in the rotamer library
• Hydrogen bonding
• No bad atom-atom contacts
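Two of these checks reduce to simple geometry. The first helper in the Python/NumPy sketch below computes a dihedral angle. The Ramachandran angles are dihedrals: φ of residue i is C(i-1)-N(i)-CA(i)-C(i), and ψ is N(i)-CA(i)-C(i)-N(i+1). Peptide-bond planarity is judged from the ω dihedral CA(i)-C(i)-N(i+1)-CA(i+1), which is near 180° for a trans peptide. The second helper flags non-bonded atom pairs closer than a distance cutoff. The function names and the 2.5 Å cutoff are illustrative assumptions. Real validation tools use more detailed, atom-type-dependent criteria.

```python
from typing import List, Set, Tuple

import numpy as np


def dihedral(p0, p1, p2, p3) -> float:
    """Dihedral angle in degrees (IUPAC sign convention) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b1 /= np.linalg.norm(b1)      # unit vector along the central bond
    b2 = p3 - p2
    v = b0 - np.dot(b0, b1) * b1  # part of b0 perpendicular to the central bond
    w = b2 - np.dot(b2, b1) * b1  # part of b2 perpendicular to the central bond
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def find_clashes(xyz: np.ndarray, excluded: Set[Tuple[int, int]],
                 cutoff: float = 2.5) -> List[Tuple[int, int, float]]:
    """Return (i, j, distance) for atom pairs closer than `cutoff`.

    `excluded` lists bonded pairs as (i, j) with i < j; they are skipped.
    """
    clashes = []
    for i in range(len(xyz)):
        for j in range(i + 1, len(xyz)):
            if (i, j) in excluded:
                continue
            d = float(np.linalg.norm(xyz[i] - xyz[j]))
            if d < cutoff:
                clashes.append((i, j, d))
    return clashes


if __name__ == "__main__":
    # A trans arrangement gives +/-180 degrees, the geometry expected for omega.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Computing φ and ψ for every residue and plotting them gives the Ramachandran plot. Residues falling outside the allowed regions are candidates for closer inspection.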
Evaluating the 3D Model: 3D-Profiler (Verify 3D)
• Based on statistical preferences of each of the 20 amino acids for particular environments within a protein
• Residue positions characterized by environment
• Preferred environments defined by three parameters
  – Area of each residue that is buried
  – Fraction of side-chain area that is covered by polar atoms (i.e., O and N)
  – Local secondary structure
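The profile idea can be sketched in a few lines. The Python below assigns each residue position an environment class from the three parameters above. It scores each residue by its preference for that environment, then averages the scores over a sliding window so that poorly fitting regions stand out. Everything specific here is an assumption for illustration: the burial and polarity cutoffs, the class names, the tiny preference table, and the default window of 21 residues. A real preference table is derived statistically from known structures and covers all 20 amino acids.

```python
from typing import Dict, List, Sequence, Tuple


def classify_environment(buried_fraction: float, polar_fraction: float, ss: str) -> str:
    """Map the three environment parameters to a class label (hypothetical cutoffs).

    buried_fraction: fraction of the residue's area that is buried
    polar_fraction: fraction of side-chain area covered by polar atoms (O, N)
    ss: local secondary structure, e.g. "H" (helix), "E" (strand), "C" (coil)
    """
    if buried_fraction < 0.4:
        burial = "exposed"
    elif polar_fraction >= 0.5:
        burial = "buried-polar"
    else:
        burial = "buried-apolar"
    return f"{burial}/{ss}"


# Hypothetical preference scores; unlisted (residue, environment) pairs score 0.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("L", "buried-apolar/H"): 1.0,
    ("L", "exposed/H"): -0.5,
    ("K", "buried-apolar/H"): -1.0,
    ("K", "exposed/H"): 0.5,
}


def profile_scores(sequence: str, environments: Sequence[str],
                   table: Dict[Tuple[str, str], float] = TOY_SCORES) -> List[float]:
    """Score each residue by its preference for the environment it occupies in the model."""
    return [table.get((aa, env), 0.0) for aa, env in zip(sequence, environments)]


def window_average(scores: Sequence[float], window: int = 21) -> List[float]:
    """Average each position's score with its neighbors (window truncated at the ends)."""
    half = window // 2
    averaged = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        averaged.append(sum(scores[lo:hi]) / (hi - lo))
    return averaged


if __name__ == "__main__":
    envs = [classify_environment(0.9, 0.1, "H"),
            classify_environment(0.1, 0.5, "H"),
            classify_environment(0.9, 0.1, "H")]
    print(window_average(profile_scores("LKL", envs), window=3))  # residues fit
    print(window_average(profile_scores("KLK", envs), window=3))  # residues misplaced
```

The same structural environments score well when occupied by suitable residues and poorly when the sequence is threaded onto them incorrectly. That contrast is what makes the profile useful as a check on sequence-structure fitness.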
Refining the 3D Model
• MD (molecular dynamics) and energy minimization
• Application of restraints based on experimental data (e.g., NMR, fluorescence)
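Experimental distance restraints are often applied as flat-bottomed penalty terms added to the force-field energy. There is no penalty while a distance lies within the experimentally allowed range, and a harmonic penalty outside it; this is a common form for NMR NOE-derived bounds. The Python sketch below shows such a term and its derivative. The function name, the force constant, and the bounds in the example are hypothetical.

```python
from typing import Tuple


def flat_bottom_energy(r: float, lower: float, upper: float,
                       k: float = 10.0) -> Tuple[float, float]:
    """Return (energy, dE/dr) for a flat-bottomed harmonic distance restraint.

    Zero inside [lower, upper]; k * (violation)^2 outside.
    """
    if r < lower:
        return k * (lower - r) ** 2, -2.0 * k * (lower - r)
    if r > upper:
        return k * (r - upper) ** 2, 2.0 * k * (r - upper)
    return 0.0, 0.0


if __name__ == "__main__":
    # Hypothetical NOE-style restraint: the distance should lie between 1.8 and 5.0 angstroms.
    for r in (1.0, 4.0, 6.0):
        print(r, flat_bottom_energy(r, lower=1.8, upper=5.0))
```

During refinement, terms like this would be summed over all experimental restraints and added to the force-field energy before minimization or MD. This pushes the model toward agreement with the data without distorting regions that already satisfy it.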
Applications of the Model