Modelling Genome Structure and Function Ram Samudrala University of Washington
Jan 03, 2016
Rationale for understanding protein structure and function
Protein sequence
- large numbers of sequences, including whole genomes
Protein function
- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution
?
structure determination
structure prediction
homology
rational mutagenesis
biochemical analysis
model studies
Protein structure
- three dimensional
- complicated
- mediates function
Comparative modelling of protein structure
KDHPFGFAVPTKNPDGTMNLMNWECAIP
KDPPAGIGAPQDN----QNIMLWNAVIP
** * *   *  *     * * *   **
… …
scan
align
refine
physical functions
build initial model
minimum perturbation
construct non-conserved side chains and main chains
graph theory, semfold
de novo simulation
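The sequence identity quoted for each modelling target can be computed from a pairwise alignment like the one on this slide. The following is a minimal sketch, assuming identity is counted over columns where neither row has a gap; other conventions (dividing by the shorter sequence or by the full alignment length) give different numbers. The function name `percent_identity` and the variable names are illustrative, not part of the original material.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Keep only columns in which both rows contain a residue.
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not aligned:
        return 0.0
    matches = sum(1 for a, b in aligned if a == b)
    return 100.0 * matches / len(aligned)


# The two aligned rows shown on the slide (28 columns, one 4-column gap).
row_a = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
row_b = "KDPPAGIGAPQDN----QNIMLWNAVIP"
identity = percent_identity(row_a, row_b)
print(f"{identity:.1f}% identity")  # 11 identical columns out of 24 aligned -> 45.8%
```

The 11 identical columns agree with the 11 asterisks under the alignment.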
CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50-10% sequence identity
**T112/dhso – 4.9 Å (348 residues; 24%)
**T92/yeco – 5.6 Å (104 residues; 12%)
**T128/sodm – 1.0 Å (198 residues; 50%)
**T125/sp18 – 4.4 Å (137 residues; 24%)
**T111/eno – 1.7 Å (430 residues; 51%)
**T122/trpa – 2.9 Å (241 residues; 33%)
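The Å values listed for each target are presumably root-mean-square deviations (RMSD) between model and experimental coordinates; the slide does not state the exact measure. As a rough sketch of how such a number is obtained, the code below computes RMSD over paired atoms, assuming the two structures are already superposed (a real evaluation would first find the optimal superposition, for example with the Kabsch algorithm). The coordinates are made up for illustration.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD in the input units over paired atoms; assumes prior superposition."""
    if len(model) != len(reference) or len(model) == 0:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
        for (mx, my, mz), (rx, ry, rz) in zip(model, reference)
    )
    return math.sqrt(squared / len(model))


# Hypothetical C-alpha positions (in Å); the model is the reference shifted 1 Å along x.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x + 1.0, y, z) for x, y, z in reference]
value = rmsd(model, reference)
print(f"RMSD = {value:.2f} Å")
```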
Comparative modelling at CASP
         alignment   side chain   short loops   longer loops
BC       excellent   ~80%         1.0 Å         2.0 Å
CASP1    poor        ~50%         ~3.0 Å        >5.0 Å
CASP2    fair        ~75%         ~1.0 Å        ~3.0 Å
CASP3    fair        ~75%         ~1.0 Å        ~2.5 Å
CASP4    fair        ~75%         ~1.0 Å        ~2.0 Å
Ab initio prediction of protein structure
sample conformational space such that native-like conformations are found
astronomically large number of conformations: 5 states/100 residues = 5^100 ≈ 10^70
select
hard to design functions that are not fooled by non-native conformations (“decoys”)
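The size of the search space quoted on this slide can be checked with a few lines of arithmetic. This sketch assumes, as the slide does, five discrete states per residue and a chain of 100 residues.

```python
import math

states_per_residue = 5
residues = 100

# Exact count of discrete conformations (Python integers are arbitrary precision).
conformations = states_per_residue ** residues

# The same quantity expressed as a power of ten.
exponent = residues * math.log10(states_per_residue)

print(f"5^100 has {len(str(conformations))} digits, i.e. about 10^{exponent:.1f}")
```

Exhaustive enumeration is therefore out of reach, which is why the sampling step has to be selective.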
Semi-exhaustive segment-based folding
EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK
generate: fragments from database, 14-state φ,ψ model
… …
minimise: Monte Carlo with simulated annealing, conformational space annealing, GA
… …
filter: all-atom pairwise interactions, bad contacts, compactness, secondary structure
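The minimisation step names Monte Carlo with simulated annealing. The sketch below illustrates that general technique on a toy problem and is not the folding protocol itself: each residue takes one of 14 discrete states, and the energy is a hypothetical stand-in (the number of residues whose state differs from a hidden target) rather than an all-atom scoring function. The chain length, cooling schedule, and all names are assumptions made for illustration.

```python
import math
import random

N_STATES = 14    # discrete states per residue, echoing the 14-state model
N_RESIDUES = 30  # hypothetical chain length for this toy example


def toy_energy(conf, target):
    """Hypothetical energy: count of residues not in the target state."""
    return sum(1 for s, t in zip(conf, target) if s != t)


def anneal(target, steps=20000, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    conf = [rng.randrange(N_STATES) for _ in target]
    energy = toy_energy(conf, target)
    start_energy = energy
    best_conf, best_energy = conf[:], energy
    for step in range(steps):
        # Temperature decreases geometrically from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(len(conf))
        old_state = conf[i]
        conf[i] = rng.randrange(N_STATES)  # propose a new state for one residue
        delta = toy_energy(conf, target) - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy += delta
            if energy < best_energy:
                best_conf, best_energy = conf[:], energy
        else:
            conf[i] = old_state  # reject the move
    return start_energy, best_energy, best_conf


target_rng = random.Random(1)
target = [target_rng.randrange(N_STATES) for _ in range(N_RESIDUES)]
start_energy, best_energy, best_conf = anneal(target)
print(f"energy: start {start_energy}, best {best_energy}")
```

Accepting some uphill moves at high temperature lets the search escape local minima; as the temperature falls the walk becomes increasingly greedy.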
Ab initio prediction at CASP
Before CASP (BC): “solved” (biased results)
CASP1: worse than random
CASP2: worse than random with one exception
CASP3: consistently predicted correct topology - ~6.0 Å for 60+ residues
CASP4: consistently predicted correct topology - ~4-6.0 Å for 60-80+ residues
**T110/rbfa – 4.0 Å (80 residues; 1-80)
*T114/afp1 – 6.5 Å (45 residues; 36-80)
**T97/er29 – 6.0 Å (80 residues; 18-97)
**T106/sfrp3 – 6.2 Å (70 residues; 6-75)
*T98/sp0a – 6.0 Å (60 residues; 37-105)
**T102/as48 – 5.3 Å (70 residues; 1-70)
Application of prediction methods to Invb
Computational aspects of structural genomics
A. sequence space
B. comparative modelling
C. fold recognition
D. ab initio prediction
E. target selection (targets)
F. analysis
(Figure idea by Steve Brenner.)
Computational aspects of functional genomics
structure based methods
- microenvironment analysis (zinc binding site?)
- structure comparison (homology function?)
sequence based methods
- sequence comparison
- motif searches
- phylogenetic profiles
- domain fusion analyses
+ experimental data
G. assign function
assign function to entire protein space
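Of the sequence based methods listed, phylogenetic profiles are easy to illustrate: each protein is described by its presence or absence across a set of genomes, and proteins with matching profiles are candidates for a shared function. The sketch below uses Hamming distance between profiles, which is one simple choice of similarity measure. The protein names, the six-genome profiles, and the distance cutoff are all hypothetical.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of each protein across six genomes.
profiles = {
    "protA": (1, 0, 1, 1, 0, 1),
    "protB": (1, 0, 1, 1, 0, 1),
    "protC": (0, 1, 0, 0, 1, 0),
    "protD": (1, 0, 1, 0, 0, 1),
}


def hamming(p, q):
    """Number of genomes in which the two profiles disagree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


def linked_pairs(profiles, max_distance=1):
    """Protein pairs whose profiles differ in at most max_distance genomes."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if hamming(profiles[a], profiles[b]) <= max_distance
    ]


pairs = linked_pairs(profiles)
for a, b in pairs:
    print(a, b, hamming(profiles[a], profiles[b]))
```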
Modelling structure and function of the Oryza sativa (rice) genome
Most common functions (from PROSITE)
- ATP/GTP-binding site motif A (P loop)
- Serine/Threonine protein kinase active site
- EF-hand (Calcium binding)
- Cytochrome C Heme binding site
Most common functions (from annotations)
- Reverse transcriptase
- Nucleotide Binding Site (NBS)
- Serine/Threonine protein kinase
- Chitinase
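The PROSITE entries in the first list are sequence patterns, so a motif search amounts to pattern matching along each protein sequence. As a small sketch, the ATP/GTP-binding site motif A (P loop) is documented in PROSITE as [AG]-x(4)-G-K-[ST], which translates directly into a regular expression. The test sequence below is a short illustrative fragment, not taken from the rice data, and the function name is an assumption.

```python
import re

# PROSITE pattern [AG]-x(4)-G-K-[ST] written as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(sequence: str):
    """Return (1-based start position, matched segment) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(sequence)]


# Illustrative sequence fragment containing one P-loop-like segment.
fragment = "MTEYKLVVVGAGGVGKSALTIQ"
hits = find_p_loops(fragment)
print(hits)
```

A short pattern like this matches by chance fairly often, so a hit is a hint of function rather than evidence on its own.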
~30% with known homologs in PDB
6813 coding sequences
3149 without a product annotation
816 classified as hypothetical protein
1187 with a hypothetical function
47% Annotation?
12% Protein?
17% Function?
24% Assigned
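The percentages on this slide can be related back to the counts with a short calculation. This sketch assumes the four percentages correspond, in order, to the three counted categories plus the remainder, which is taken to be the sequences with an assigned function; that mapping is inferred from the numbers rather than stated on the slide.

```python
total = 6813  # coding sequences

counts = {
    "without a product annotation": 3149,
    "classified as hypothetical protein": 816,
    "with a hypothetical function": 1187,
}
# Assumption: everything not in the three categories above has an assigned function.
counts["remaining (assumed assigned)"] = total - sum(counts.values())

percentages = {label: 100.0 * n / total for label, n in counts.items()}
for label, pct in percentages.items():
    print(f"{pct:5.1f}%  {label}")
```

Computed this way the first share comes to roughly 46% rather than the 47% shown, which may reflect rounding on the original chart; the other three agree with the slide.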
Bioverse webserver
sequence
structure summary
summary
function summary
see another variant
open/close subgroup
list links (or follow)
mapping to sequence
http://bioverse.compbio.washington.edu
Bioverse webserver
sequence
structure summary
secondary structure
tertiary structure
summary
sequence
evidence for sequence
evidence for tertiary structure
structural similarity to another protein
structural similarity to another protein
structural similarity to another protein
evidence for similarity
Bioverse webserver
sequence
structure summary
summary
function summary
function 1
function 2
evidence for function 2
functional similarity to another protein
functional similarity to another protein
functional similarity to another protein
evidence for similarity
Take home message
Prediction of protein structure and function can be used to model whole genomes to understand organismal function and evolution.
Acknowledgements
Jason McDermott
Yi-Ling Chen
Levitt and Moult groups