Some gory details of protein secondary structure prediction
Burkhard Rost (Columbia New York)
CUBIC, Columbia University
[email protected]
http://www.columbia.edu/~rost
http://cubic.bioc.columbia.edu/
Goal of secondary structure prediction
[Figure: a protein sequence divided into segments, each assigned to one of three states: helix, strand, loop. Segments shown: LEDKSPDHNPTGID, AKGKPMDRNFTGRNHPPKDSS, AAQVKDALTK, LEQWGTLAQL, RAIWEQELTDFPEFLTMMARQETWLGWLTI, LAVIGVLMKW, FVFLMIE, KIYHKLT, DIRVGLTYYIAQ, VNTFVGTFAAVAHAL.]
Secondary structure predictions of 1. and 2. generation
• single residues (1. generation)
– Chou-Fasman, GOR, 1957-70/80: 50-55% accuracy
• segments (2. generation)
– GORIII, 1986-92: 55-60% accuracy
• problems
– < 100% (they said: 65% max)
– < 40% (they said: strand non-local)
– short segments
Helix formation is local
residues i and i+3
Thyroid hormone receptor (2nll)
β-sheet formation is NOT local
Erabutoxin (3ebx)
Problems of secondary structure predictions (before 1994)
SEQ KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD
OBS EEEE E E E EEEEEE EEEEEE EEEEEEHHHEEEE
TYP EHHHH EE EEEE EE HHHEE EEEHH
Simple neural network with hidden layer
Neural Network for secondary structure
[Figure: neural network. Each input residue is coded over the 21 symbols ACDEFGHIKLMNPQRSTVWY.; the output units are H, E, L.]
Example sequence window with observed states:
D R Q G F V P A A Y V K K
L E E E E E E H H H E E E
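The input coding and the forward pass through one hidden layer can be sketched as below. This is a minimal illustration, not the network from the slides: the window of 13 residues follows the example window, while the 5 hidden units, the random untrained weights, the sigmoid units without bias, and the reading of "." as a spacer for positions outside the chain are all assumptions.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY."  # 21 input symbols; "." is assumed to be a spacer
STATES = "HEL"                      # output units: helix, strand, loop


def encode_window(seq, center, half_width=6):
    """One-hot encode the residues in a window centred on position `center`."""
    vec = []
    for pos in range(center - half_width, center + half_width + 1):
        # Positions outside the chain get the spacer symbol (assumption).
        symbol = seq[pos] if 0 <= pos < len(seq) else "."
        one_hot = [0.0] * len(ALPHABET)
        one_hot[ALPHABET.index(symbol)] = 1.0
        vec.extend(one_hot)
    return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Small random weights; a trained network would replace these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def layer(weights, inputs):
    """One fully connected layer of sigmoid units (no bias term)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def predict_state(seq, center, w_hidden, w_out, half_width=6):
    """Return the winning state for the central residue and the raw outputs."""
    hidden = layer(w_hidden, encode_window(seq, center, half_width))
    out = layer(w_out, hidden)
    return STATES[out.index(max(out))], out


rng = random.Random(0)
seq = "DRQGFVPAAYVKK"                   # the example window from the slide
n_in = 13 * len(ALPHABET)                # 13 residues x 21 symbols = 273 inputs
w_hidden = init_layer(n_in, 5, rng)      # 5 hidden units: arbitrary choice
w_out = init_layer(5, len(STATES), rng)
state, outputs = predict_state(seq, 6, w_hidden, w_out)
```

With untrained weights the predicted state is meaningless; the point is only the shape of the data flow: window, one-hot input, hidden layer, three output units, winner takes all.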
[Table: columns: method, helix, strand, other, overall accuracy]
unbalanced: 62%
comparison: databank distribution
comparison: 33:33:33
Balanced training
normal training:
$$E = \sum_{\mu=\alpha,\beta,L} \sum_i \left(o_i^{\mu} - d_i^{\mu}\right)^2$$
balanced training:
$$E^{\mu} = \sum_i \left(o_i^{\mu} - d_i^{\mu}\right)^2, \qquad \Delta J^{\mu} \propto -\frac{\partial E^{\mu}\{J\}}{\partial J}$$
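One way to realise the difference between the two training modes is in how examples are drawn. The sketch below assumes that "balanced" means each of the three states is presented equally often (33:33:33), whereas normal training draws examples at the frequency they have in the data. The function names and the (window, state) example format are hypothetical.

```python
import random


def normal_schedule(examples, n_steps, rng):
    """Normal training: draw examples at random, so each state appears
    at roughly its frequency in the data set."""
    return [rng.choice(examples) for _ in range(n_steps)]


def balanced_schedule(examples, n_steps, rng):
    """Balanced training (assumed meaning): cycle through the states so
    that each one is presented equally often, whatever its frequency."""
    by_state = {}
    for window, state in examples:
        by_state.setdefault(state, []).append((window, state))
    states = sorted(by_state)
    return [
        rng.choice(by_state[states[step % len(states)]])
        for step in range(n_steps)
    ]


# Toy data set dominated by loop, as real data sets tend to be.
examples = (
    [("w%d" % i, "L") for i in range(10)]
    + [("w%d" % i, "H") for i in range(10, 13)]
    + [("w%d" % i, "E") for i in range(13, 15)]
)
rng = random.Random(1)
schedule = balanced_schedule(examples, 300, rng)
counts = {s: sum(1 for _, state in schedule if state == s) for s in "HEL"}
```

Each drawn example would then drive one weight update using the error of that single pattern.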
1evw:A
          ...8....,....9....,....10...,....11...,....12...,....13...,....14...,....15...,....16.
1evwA     HTVPFLLEPDNINGKTCTASHLCHNTRCHNPLHLCWESADDNKGRNWCPGPNGGCVHAVVCLRQGPLYGPGATVAGPQQRGSHFVV
DSSP      HHH EE EEEEEEE E HHHEEEEEHHHHHHHHH E
JPred2    EEEEE EEEEE EEE EEEEEEEE EEE
PHD       EEEEEE EEEEE EEEEEEE EEEEEEEEE EEE EEEEE
PHDpsi    EEEE EEEEEE EEEEE EEEEEEEEEE EEE EEEEE
PROFsec   EEE EEEEEE EEEEEE EEEEEEEEE EE EEE
Prof_king EEEEEEE EEEEEEE EEEEE EEEEEEE EE EEEE
PSIPRED   EEE EE HHH HHHHHHHH HHHHHHHHH HHH
SAM T99sec EEEEEE E EEEEEEE E EE
SSpro     HHE H EEEE EEEEEEE EE EE
Stronger predictions more accurate!
[Figure: Q3 per protein (0-100) against reliability index averaged over protein (3-9), with linear fit Q3fit = 21 + 8.7 * (averaged reliability index).]
[Figure: number of protein chains (0-70) against per-residue accuracy Q3 (0-100); <Q3>=72.3% ; sigma=10.5%; labelled chains: 1spf, 1bct, 1stu, 3ifm, 1psm.]
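Per-residue accuracy Q3 is the percentage of residues whose predicted state (H, E or L) matches the observed one. A minimal sketch of the per-protein score and of the mean and sigma over a set of chains follows; whether the sigma on the slide is a population or a sample standard deviation is not stated, so the population form used here is an assumption.

```python
import statistics


def q3(observed, predicted):
    """Percentage of residues predicted in the correct one of three states."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be equal-length, non-empty")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def summarize(pairs):
    """Mean and standard deviation of Q3 over (observed, predicted) pairs."""
    scores = [q3(o, p) for o, p in pairs]
    # Population standard deviation: an assumption, see the note above.
    return statistics.mean(scores), statistics.pstdev(scores)


mean_q3, sigma_q3 = summarize([("HHEE", "HHEE"), ("HHEE", "HHLL")])
```

The two toy chains score 100% and 50%, so the summary is a mean of 75% with a sigma of 25%.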
Correct prediction of correctly predicted residues
[Figure: curves for PHDsec, PHDacc and PHDhtm; axis ticks 70-100 against percentage of residues predicted (0-100); points along the curves are labelled by reliability index, from RI=9 to RI=0, with RI=4 and 7 also marked.]
BAD errors are frequent!
[Figure: distribution of BAD error (H for E, or E for H) over protein chains; x-axis 0-40, y-axis ticks 0-350; <BAD>=4.0% ; sigma=5.9%.]
[Figure: axis ticks 0-20 against cumulative percentage of protein chains (0-100).]
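A BAD error, as defined on the slide, is a helix predicted as strand or a strand predicted as helix. A small sketch of the per-chain BAD percentage follows; taking all residues of the chain as the denominator is an assumption, since the slide does not say what the percentage is relative to.

```python
BAD_PAIRS = {("H", "E"), ("E", "H")}  # helix for strand, or strand for helix


def bad_error(observed, predicted):
    """Percentage of residues with helix and strand confused."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be equal-length, non-empty")
    bad = sum((o, p) in BAD_PAIRS for o, p in zip(observed, predicted))
    # Denominator: every residue in the chain (assumption).
    return 100.0 * bad / len(observed)


example_bad = bad_error("HHEELL", "EHEHLL")
```

Confusions with loop (H for L, L for E, and so on) lower Q3 but do not count as BAD errors here.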
False prediction for engineered proteins!
GB1: IgG-binding domain of protein G (CHAMELEON)
Kim & Berg, Nature, 366, 267-270, 1993
• database growth +3
• PSI-BLAST +0.5
• new training +1
• ‘clever method’ +1
• limit?
• max 88% -> 12% to go
• 1/5 of proteins with families of more than 100 proteins -> >80%
• and from there?
Prediction of protein secondary structure
• 1980: 55% simple
• 1990: 60% less simple
• 1993: 70% evolution
• 2000: 76% more evolution
• what is the limit?
• 88% for proteins of similar structure
• 80% for 1/5th of proteins with families > 100
• missing through:
– better definition of secondary structure
– including long-range interactions
[Figure: x-axis: cumulative percentage of proteins (0-100).]
Reliability correlates with accuracy!
[Figure: two panels with curves for JPred2, PHD, PROF and PSIPRED; axis ticks 70-100 against percentage of residues predicted (0-100).]
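Curves of this kind can be produced by keeping only residues predicted at or above a reliability threshold and recomputing accuracy on that subset. The sketch below assumes each residue already carries an integer reliability index from 0 to 9; how a given method derives that index from its outputs is not covered here.

```python
def accuracy_vs_coverage(observed, predicted, reliability):
    """For each threshold t in 0..9, return (t, coverage %, accuracy %)
    computed over the residues whose reliability index is >= t."""
    total = len(observed)
    rows = []
    for threshold in range(10):
        kept = [
            (o, p)
            for o, p, ri in zip(observed, predicted, reliability)
            if ri >= threshold
        ]
        if not kept:
            continue  # no residue predicted at this reliability
        coverage = 100.0 * len(kept) / total
        accuracy = 100.0 * sum(o == p for o, p in kept) / len(kept)
        rows.append((threshold, coverage, accuracy))
    return rows


rows = accuracy_vs_coverage("HHEELL", "HHELLL", [9, 8, 7, 2, 5, 9])
```

In the toy example the single wrong residue has a low index, so raising the threshold trades coverage for accuracy, which is the behaviour the slide title describes.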
Conclusion
• big gain through using evolutionary information
• are we going to reach above 80%? How high?
• continuous secondary structure
• better methods
• other features
• use secondary structure: ASP
Young M, Kirshenbaum K, Dill KA, Highsmith S: Predicting conformational switches in proteins. Protein Sci 1999, 8:1752-1764.
Availability of methods