1 Ram Samudrala, University of Washington Protein Structure Prediction

Protein Structure Prediction - Indiana University …predrag/classes/2008springi619/week14.pdfProtein structure - three dimensional ... Main difficulty – deciding which template

Mar 26, 2018


vankhuong
Page 1

Ram Samudrala, University of Washington

Protein Structure Prediction

Page 2

Rationale for Understanding Protein Structure and Function

Protein sequence

- large numbers of sequences, including whole genomes

Protein function

- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution

?

structure determination, structure prediction

homology, rational mutagenesis, biochemical analysis

model studies

Protein structure

- three dimensional
- complicated
- mediates function

Page 3

Protein Folding

…-L-K-E-G-V-S-K-D-…

…-CUA-AAA-GAA-GGU-GUU-AGC-AAG-GAU-…

one amino acid

DNA

protein sequence

unfolded protein

native state

spontaneous self-organization (~1 second)

not unique, mobile, inactive

expanded, irregular

Page 4

Protein Folding

…-L-K-E-G-V-S-K-D-…

…-CUA-AAA-GAA-GGU-GUU-AGC-AAG-GAU-…

one amino acid

DNA

protein sequence

unfolded protein

native state

spontaneous self-organization (~1 second)

unique shape, precisely ordered, stable/functional, globular/compact, helices and sheets

not unique, mobile, inactive

expanded, irregular

Page 5

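unfolded

Protein Folding Landscape

Large multi-dimensional space of changing conformations

[Figure: free energy (y-axis) along the folding reaction, from unfolded through the molten globule to the native state]

molten globule

τ = 10⁻⁸ s

native

τ = 10⁻³ s

$$\text{jump time } (\tau) \propto e^{\Delta G^{*}/RT}$$

ΔG*: barrier height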
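A rough numerical reading of this relation, as a sketch: assuming the jump time is proportional to e^(ΔG*/RT) with the same prefactor for both steps, and assuming T = 298 K (neither assumption is stated on the slide), the barrier difference implied by going from τ = 10⁻⁸ s to τ = 10⁻³ s can be estimated as follows. The function names are illustrative only.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def jump_time_ratio(barrier_difference_j_per_mol, temperature=298.0):
    """tau_2 / tau_1 for two barriers differing by the given amount (J/mol).

    Assumes tau is proportional to exp(dG*/RT) with an identical prefactor.
    """
    return math.exp(barrier_difference_j_per_mol / (R * temperature))


def barrier_difference(tau_ratio, temperature=298.0):
    """Inverse relation: barrier difference (J/mol) implied by a ratio of jump times."""
    return R * temperature * math.log(tau_ratio)


# From tau = 1e-8 s to tau = 1e-3 s is a factor of 1e5 in time.
dg = barrier_difference(1e5)
print(f"{dg / 1000:.1f} kJ/mol")  # prints "28.5 kJ/mol"
```

Under these assumptions, every additional RT·ln(10) (about 5.7 kJ/mol at 298 K) of barrier height slows the step by a factor of ten.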

Page 6

Protein Primary Structure

twenty types of amino acids

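[Figure: a single amino acid, with the amino group (N, H), the Cα carrying H and the side chain R, and the carboxyl group (C, O, OH)]

two amino acids join by forming a peptide bond

[Figure: a polypeptide chain; for each residue the backbone torsion angles φ and ψ and the side-chain torsion angle χ are marked]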

each residue in the amino acid main chain has two degrees of freedom (φ and ψ)

the amino acid side chains can have up to four degrees of freedom (χ1-4)
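The φ, ψ and χ angles are dihedral (torsion) angles, each defined by four consecutive atoms: φ by C(i−1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below computes a dihedral angle from four 3-D points in plain Python; the example coordinates are hypothetical, not taken from a real protein.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * x for x in b1])
    w = _sub(b2, [_dot(b2, b1) * x for x in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical points: central bond along z, outer atoms rotated 90 degrees apart.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
```

For φ or ψ, pass the four backbone atom coordinates listed above in order; the same function gives χ angles from the side-chain atoms.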

Page 7

Protein Secondary Structure

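[Figure: Ramachandran plot with φ on the x-axis and ψ on the y-axis, each from −180° to +180°, marking the allowed β, α and L regions]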

many φ,ψ combinations are not possible

[Figures: α helix; β sheet (anti-parallel); β sheet (parallel), with the N and C ends of the strands marked]

Page 8

Protein Tertiary and Quaternary Structures

Ribonuclease inhibitor (2bnh)

Haemoglobin (1hbh)

Hemagglutinin (1hgd)

Page 9

Methods for Determining Protein Structure

Protein sequence

- large numbers of sequences, including whole genomes

Protein function

- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution

?

X-ray crystallography, NMR spectroscopy

homology, rational mutagenesis, biochemical analysis

model studies

Protein structure

- three dimensional
- complicated
- mediates function

expensive and slow

Page 10

A Naïve Approach

• Use first principles to produce the native conformation of a protein
• not only the correct structure, but the entire energy landscape
• this would explain the dynamic behavior of a protein

Let’s see how this could work…

• there are only 5 atom types (C, H, O, N, S), so if we can accurately model the interactions between them, we could solve the folding problem

So, why is it then so complicated…

• atomic interactions cannot be modeled with sufficient accuracy (plus proteins are only marginally stable)

• some phenomena are highly non-linear (for example, van der Waals forces)

• a large number of degrees of freedom, plus the need to model water molecules

ab initio !!!

Page 11

Predictions Needed NOW!!!

• A pure ab initio approach will remain out of reach for a long time

• We must adopt a less purist approach

What should we do?

• use approximations

• use all available information:
  • vast number of sequences
  • large number of structures
  • functional site information

Page 12

Methods for Predicting Protein Structure

Protein sequence

- large numbers of sequences, including whole genomes

Protein function

- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution

?

comparative modeling, fold recognition

ab initio prediction

homology, rational mutagenesis, biochemical analysis

model studies

Protein structure

- three dimensional
- complicated
- mediates function

Page 13

Protein Sequence

Database Searching, Domain Assignment, Multiple Sequence Alignment

Homologue in PDB?

Comparative Modelling

Secondary Structure and Disorder Prediction

No

Yes

3-D Protein Model

Fold Recognition

Predicted Fold?

Sequence-Structure Alignment

Ab-initio Structure Prediction

No

Yes

Overall Approach

modified from http://bioinf.cs.ucl.ac.uk

Page 14

Comparative (Homology) Modeling of Protein Structure

• Aims to produce protein models with high accuracy

• Proteins that have similar sequences (i.e., related by evolution) have similar three-dimensional structures

• A model of a protein whose structure is not known can be constructed if the structure of a related protein has been determined by experimental methods

• Similarity must be obvious and significant for good models to be built

• Need ways to build regions that are not similar between the two related proteins

• Need ways to move model closer to the native structure

Page 15

Comparative Modeling of Protein Structure

KDHPFGFAVPTKNPDGTMNLMNWECAIP
KDPPAGIGAPQDN----QNIMLWNAVIP
** * *   *  *     * * *   **

… …

scan, align

build initial model, construct non-conserved side chains and main chains

refine

Page 16

Let’s Look Closer at Steps of Homology Modeling

1. Template recognition and initial alignment

2. Alignment correction

3. Backbone generation

4. Loop modeling

5. Side-chain modeling

6. Model optimization

7. Model validation


Page 19

Recognition of similarity between the target and template

Target – protein with unknown structure.

Template – protein with known structure.

Main difficulty: deciding which template to pick when there are multiple candidate template structures.

Template structure can be found by searching for structures in PDB using sequence-sequence alignment methods.

1. Template Recognition

Page 20

Two Zones of Sequence Alignment

[Figure: sequence identity (%, y-axis) vs. alignment length (residues, x-axis, up to 200); a threshold curve separates the safe homology modeling zone (above) from the twilight zone (below)]
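To make the y-axis concrete, here is one common way to compute sequence identity, shown on the example alignment from the comparative modeling slide. Note the assumption: identity is taken over aligned columns where neither sequence has a gap; other conventions divide by the length of the shorter sequence, which gives a different number. The zone boundary curve itself is not reproduced here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


target = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
template = "KDPPAGIGAPQDN----QNIMLWNAVIP"
print(round(percent_identity(target, template), 1))  # 45.8
```

For that alignment, 11 of the 24 gap-free columns are identical.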

Page 21

1. Once the alignment between target and template is available, copy the backbone coordinates of the template residues that are aligned to target residues.

2. If two aligned residues are the same, copy their side chain coordinates as well.

3. Backbone Generation
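The two copying rules above can be sketched as follows. The per-residue dictionary layout ('name', 'backbone', 'side_chain') and the function name are hypothetical; real modeling tools store full atom records. Target residues left with `None` still need loop or side-chain modeling in later steps.

```python
def build_initial_model(target_aln, template_aln, template_residues, gap="-"):
    """Copy template coordinates onto aligned target residues.

    target_aln, template_aln: gapped sequences of equal length.
    template_residues: one dict per (ungapped) template residue with keys
        'name', 'backbone', 'side_chain' (hypothetical layout).
    Returns one dict per target residue; None marks coordinates still to model.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("alignment rows must have equal length")
    model = []
    t_index = 0
    for a, b in zip(target_aln, template_aln):
        template_res = None
        if b != gap:
            template_res = template_residues[t_index]
            t_index += 1
        if a == gap:
            continue  # template residue with no target counterpart
        residue = {"name": a, "backbone": None, "side_chain": None}
        if template_res is not None:
            residue["backbone"] = template_res["backbone"]  # rule 1: aligned backbone
            if template_res["name"] == a:
                residue["side_chain"] = template_res["side_chain"]  # rule 2: identical residue
        model.append(residue)
    return model


template = [{"name": n, "backbone": f"bb{i}", "side_chain": f"sc{i}"}
            for i, n in enumerate("ARGL")]
for res in build_initial_model("AKG-L", "AR-GL", template):
    print(res)
```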

Page 22

insertion

AHYATPTTTAH---TPSS

deletion

Insertions and deletions occur mostly between secondary structure elements, in the loop regions. Loop conformations are difficult to predict.

Approaches to loop modeling:
- knowledge-based: search the PDB for loops with known structure
- energy-based: an energy function is used to evaluate the quality of a loop, which is then optimized by energy minimization or Monte Carlo sampling.

4. Loop Modeling

Page 23

Scan a database for protein fragments with the correct number of residues and the correct end-to-end distance

4. Loop Modeling – Database Approach
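A minimal sketch of the database filter described above, under stated assumptions: each fragment is a list of Cα coordinates, the required length includes the two anchor residues, and the 0.5 Å tolerance is an arbitrary illustrative value. Real loop-search methods also check anchor orientation and steric clashes, which this sketch ignores.

```python
import math


def find_loop_candidates(fragments, n_residues, target_distance, tolerance=0.5):
    """Return database fragments that could bridge a gap, best match first.

    fragments: lists of C-alpha coordinates (x, y, z), one list per fragment.
    n_residues: required fragment length (assumed to include both anchors).
    target_distance: distance (Å) between the anchor residues in the model.
    tolerance: hypothetical acceptance window (Å).
    """
    hits = []
    for frag in fragments:
        if len(frag) != n_residues:
            continue  # wrong number of residues
        error = abs(math.dist(frag[0], frag[-1]) - target_distance)
        if error <= tolerance:
            hits.append((error, frag))
    hits.sort(key=lambda hit: hit[0])  # smallest end-to-end error first
    return [frag for _, frag in hits]


straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
longer = [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 3.3, 0)]
short = [(0, 0, 0), (5, 0, 0), (10, 0, 0)]
print(find_loop_candidates([straight, longer, short], 4, 3.2))
```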

Page 24

Side chain conformations are described by rotamers. In similar proteins, side chains have similar conformations.

If the % identity is high, side chain conformations can be copied from template to target. If the % identity is not very high, side chains are modeled using rotamer libraries, and the candidate rotamers are scored with energy functions.

Problem: side chain conformations depend on the backbone conformation, which is itself predicted rather than experimentally determined.

[Figure: three candidate rotamers with energies E1, E2, E3; the chosen rotamer has E = min(E1, E2, E3)]

5. Side-Chain Modeling
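The rule E = min(E1, E2, E3) amounts to picking, for each residue, the lowest-energy rotamer. The sketch below does this independently per residue; the residue labels, rotamer values and energies are invented for illustration, and real methods also account for side chain/side chain interactions, which this greedy version ignores.

```python
def pick_rotamers(library, energy):
    """For each residue keep the lowest-energy rotamer: E = min(E1, E2, ...).

    library: {residue_id: [candidate rotamers]}
    energy: callable energy(residue_id, rotamer) -> float
    """
    chosen = {}
    for res_id, candidates in library.items():
        best = min(candidates, key=lambda rot: energy(res_id, rot))
        chosen[res_id] = (best, energy(res_id, best))
    return chosen


# Hypothetical chi1 rotamers (degrees) and made-up energies for two residues.
LIBRARY = {"Leu12": [-60, 180, 60], "Phe40": [-60, 180]}
ENERGIES = {("Leu12", -60): 1.2, ("Leu12", 180): -0.4, ("Leu12", 60): 3.5,
            ("Phe40", -60): -1.1, ("Phe40", 180): 0.3}

result = pick_rotamers(LIBRARY, lambda res, rot: ENERGIES[(res, rot)])
print(result)  # {'Leu12': (180, -0.4), 'Phe40': (-60, -1.1)}
```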

Page 25

• Energy optimization of the entire structure.

• Since the backbone conformation depends on the side chain conformations and vice versa, an iterative approach is used:

Predict rotamers → shift backbone → repeat

6. Model Optimization

Page 26

CASP5 assessors, homology modeling category:

“We are forced to draw the disappointing conclusion that, similarly to what observed in previous editions of the experiment, no model resulted to be closer to the target structure than the template to any significant extent.”

The consensus is not to refine the model, as refinement usually pulls the model away from the native structure!!

6. Model Optimization???

Page 27

Historical Perspective on Comparative Modeling

alignment | side chain | short loops | longer loops

BC: excellent | ~80% | 1.0 Å | 2.0 Å

Page 28

Historical Perspective on Comparative Modeling

alignment | side chain | short loops | longer loops

BC: excellent | ~80% | 1.0 Å | 2.0 Å
CASP1: poor | ~50% | ~3.0 Å | > 5.0 Å

Page 29

Prediction for CASP4 target T128/sodm

Cα RMSD of 1.0 Å for 198 residues (PID 50%)

Page 30

Prediction for CASP4 target T122/trpa

Cα RMSD of 2.9 Å for 241 residues (PID 33%)

Page 31

Prediction for CASP4 target T125/sp18

Cα RMSD of 4.4 Å for 137 residues (PID 24%)

Page 32

Prediction for CASP4 target T112/dhso

Cα RMSD of 4.9 Å for 348 residues (PID 24%)

Page 33

Prediction for CASP4 target T92/yeco

Cα RMSD of 5.6 Å for 104 residues (PID 12%)
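The accuracy figures above are Cα RMSD values. A minimal sketch of the RMSD formula, assuming the model and experimental Cα coordinates are already optimally superimposed (a superposition step such as the Kabsch algorithm would normally come first and is not shown); the coordinates in the example are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the two structures are already optimally superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 2.0)]
print(round(rmsd(model, native), 3))  # sqrt((0 + 4) / 2) = 1.414
```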

Page 34

CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50–10% sequence identity

T128/sodm: 1.0 Å (198 residues; 50%)
T111/eno: 1.7 Å (430 residues; 51%)
T122/trpa: 2.9 Å (241 residues; 33%)
T125/sp18: 4.4 Å (137 residues; 24%)
T112/dhso: 4.9 Å (348 residues; 24%)
T92/yeco: 5.6 Å (104 residues; 12%)

Comparative Modeling at CASP - conclusions

alignment | side chain | short loops | longer loops

BC: excellent | ~80% | 1.0 Å | 2.0 Å
CASP1: poor | ~50% | ~3.0 Å | > 5.0 Å
CASP2: fair | ~75% | ~1.0 Å | ~3.0 Å
CASP3: fair | ~75% | ~1.0 Å | ~2.5 Å
CASP4: fair | ~75% | ~1.0 Å | ~2.0 Å

Page 35

• Aim to solve the structure of all proteins: this is too much work experimentally!

• Solve enough structures so that the remaining structures can be inferred from those experimental structures

• The number of experimental structures needed depends on our ability to generate models.

Structural Genomics Project

Page 36

Proteins with known structures

Unknown proteins

Structural Genomics Project

Page 37

• Goal: to find the protein of known structure that best matches a given sequence

• Since the similarity between the target and even its closest template is low, sequence-sequence alignment methods fail

• Solution: threading, a sequence-structure alignment method

Fold Recognition

Page 38

Fold Recognition

• The number of possible protein structures/folds is limited (large number of sequences but few folds)

• Proteins that do not have similar sequences sometimes have similar three-dimensional structures

• A sequence whose structure is not known is fitted directly (or “threaded”) onto a known structure and the “goodness of fit” is evaluated using a discriminatory function

• Need ways to move model closer to the native structure

3.6 Å, 5% ID

NK-lysin (1nkl) Bacteriocin T102/as48 (1e68)

Page 39

Fold Recognition

KDHPFGFAVPTKNPDGTMNLMNWECAIP
KDPPAGIGAPQDN----QNIMLWNAVIP
** * *   *  *     * * *   **

… …

evaluate fit

build initial model, construct non-conserved side chains and main chains

refine

Page 40

• Step 1: Construction of Template Library
• Step 2: Design of Scoring Function
• Step 3: Sequence-Structure Alignment
• Step 4: Template Selection and Model Construction

Only step 1 is relatively easy!

Steps in Threading

Page 41

Target Sequence

α & β structure from template structure

Template

Steps in Threading

Page 42

• Sequence-structure alignment: the target sequence is compared to all structural templates from the database

Requires:
• Alignment method: dynamic programming, Monte Carlo, …
• Scoring function: yields a relative score for each alternative alignment

Threading – Method for Structure Prediction

Page 43

A representative set of protein structures extracted from the PDB database. It satisfies the following conditions:

1. The resolution of each representative structure should be good;
2. A good X-ray structure has higher priority than an NMR structure;
3. The sequence identity between any two representatives should be no more than 30%, in order to save computing time.

Examples:

• CATH: http://www.biochem.ucl.ac.uk/bsm/cath/

• SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/

• PDB_SELECT: http://www.cmbi.kun.nl/gv/pdbsel/

Template Database
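The three selection conditions can be turned into a simple greedy filter. This is a hypothetical sketch, not necessarily how CATH, SCOP or PDB_SELECT build their sets: entries are ranked X-ray first, then by resolution, and an entry is kept only if its identity to every entry already kept is at most 30%. The record layout and the `identity` callable are assumptions, and no absolute resolution cutoff is applied.

```python
def select_representatives(entries, identity, max_identity=30.0):
    """Greedy non-redundant template set.

    entries: dicts with 'id', 'method' ('X-ray' or 'NMR'), 'resolution'
        (Å, None if not applicable) and 'seq' (hypothetical record layout).
    identity: callable returning % sequence identity between two sequences.
    """
    def priority(entry):
        # X-ray before NMR, then better (smaller) resolution first.
        method_rank = 0 if entry["method"] == "X-ray" else 1
        resolution = entry["resolution"] if entry["resolution"] is not None else float("inf")
        return (method_rank, resolution)

    chosen = []
    for entry in sorted(entries, key=priority):
        if all(identity(entry["seq"], kept["seq"]) <= max_identity for kept in chosen):
            chosen.append(entry)
    return chosen


entries = [
    {"id": "A", "method": "X-ray", "resolution": 2.0, "seq": "AAA"},
    {"id": "B", "method": "X-ray", "resolution": 1.5, "seq": "AAA"},
    {"id": "C", "method": "NMR", "resolution": None, "seq": "CCC"},
]
toy_identity = lambda s, t: 100.0 if s == t else 0.0  # stand-in for a real alignment
print([e["id"] for e in select_representatives(entries, toy_identity)])  # ['B', 'C']
```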

Page 44

• A contact-based scoring function depends on the amino acid types of two residues and the distance between them.

• A sequence-sequence alignment scoring function does not depend on the distance between two residues.

• If the distance between two non-adjacent residues in the template is less than 8 Å, these residues make a contact.

Scoring Function for Threading
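The contact definition can be sketched as below. The slide does not say which atom represents a residue; this sketch assumes one Cα per residue and treats residues as non-adjacent when they are at least two positions apart in sequence. The example chain is hypothetical.

```python
import math


def contact_pairs(coords, cutoff=8.0, min_separation=2):
    """Pairs (i, j), i < j, of non-adjacent residues closer than `cutoff` Å.

    coords: one representative atom per residue (assumed C-alpha here).
    min_separation=2 skips residues that are neighbours in sequence.
    """
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.append((i, j))
    return pairs


# Hypothetical straight chain with consecutive C-alpha atoms 3.8 Å apart.
chain = [(3.8 * k, 0.0, 0.0) for k in range(4)]
print(contact_pairs(chain))  # [(0, 2), (1, 3)]
```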

Page 45

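$$S = \sum_{i,j=1}^{N} w(a_i, a_j); \qquad S = w(\mathrm{Ala}, \mathrm{Tyr}) + w(\mathrm{Ile}, \mathrm{Trp})$$

[Figure: template structure with residues Ala, Ile, Tyr and Trp at contacting positions]

w - calculated from the frequency of amino acid contacts in PDB

a_i - amino acid type of target sequence aligned with the position i of the template

N - number of contacts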

Scoring Function for Threading

Page 46

Class work: calculate the score for target sequence “ATPIIGGLPY” aligned to the template structure which is defined by the contact matrix.

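[Figure: 10 × 10 contact matrix over template positions 1 to 10, with asterisks marking contacting pairs]

Contact potential w (symmetric; lower triangle shown):

       L     G     I     Y     P     T     A
L    0.3
G    0.2   0.4
I    0.4   0.2   0.3
Y   -0.2  -0.1  -0.2  -0.4
P   -0.2   0.1  -0.1  -0.4  -0.2
T    0.0   0.1  -0.3  -0.2  -0.1   0.3
A    0.2  -0.2   0.5  -0.1   0.0  -0.1  -0.2

$$S = \sum_{i,j=1}^{N} w(a_i, a_j)$$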
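A sketch of the class-work score using the w table above, with a symmetric lookup. The contact positions in the slide's matrix cannot be recovered from this transcript, so the contact list here is hypothetical; substitute the pairs marked in the matrix. Each unordered contact is counted once, which is an assumption about how the sum is meant.

```python
# Contact potential w from the class-work slide (lower triangle; w is symmetric).
W = {
    ("L", "L"): 0.3,
    ("G", "L"): 0.2, ("G", "G"): 0.4,
    ("I", "L"): 0.4, ("I", "G"): 0.2, ("I", "I"): 0.3,
    ("Y", "L"): -0.2, ("Y", "G"): -0.1, ("Y", "I"): -0.2, ("Y", "Y"): -0.4,
    ("P", "L"): -0.2, ("P", "G"): 0.1, ("P", "I"): -0.1, ("P", "Y"): -0.4, ("P", "P"): -0.2,
    ("T", "L"): 0.0, ("T", "G"): 0.1, ("T", "I"): -0.3, ("T", "Y"): -0.2, ("T", "P"): -0.1,
    ("T", "T"): 0.3,
    ("A", "L"): 0.2, ("A", "G"): -0.2, ("A", "I"): 0.5, ("A", "Y"): -0.1, ("A", "P"): 0.0,
    ("A", "T"): -0.1, ("A", "A"): -0.2,
}


def w(x, y):
    """Symmetric lookup, so w(x, y) == w(y, x)."""
    return W[(x, y)] if (x, y) in W else W[(y, x)]


def threading_score(target, contacts):
    """S = sum of w(a_i, a_j) over template contacts (i, j), 1-based positions."""
    return sum(w(target[i - 1], target[j - 1]) for i, j in contacts)


# Hypothetical contacts; replace with the pairs marked in the contact matrix.
example_contacts = [(1, 4), (2, 9), (5, 8), (7, 10)]
print(round(threading_score("ATPIIGGLPY", example_contacts), 2))  # 0.7
```

Here the four hypothetical contacts pair A–I (0.5), T–P (−0.1), I–L (0.4) and G–Y (−0.1).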

Page 47

• Dynamic programming with the “frozen approximation”: traceback in the alignment matrix is not possible for interactions between two amino acids of the target, so the score is computed as:

$$S = \sum_{i,j=1}^{N} w(a_i, b_j)$$

b – amino acid type from the template, not from the target; now the score of every position does not depend on the alignment elsewhere in the sequence.

• Monte Carlo

Alignment Algorithms
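One way to implement dynamic programming under the frozen approximation, as a sketch: the score for placing target residue a at template position i is the sum of w(a, b_j) over the template contacts j of i, with b_j read from the template, so each cell of the alignment matrix is independent of the rest of the alignment. The toy potential `toy_w` and the gap penalty of −0.5 are hypothetical, and the alignment is global (Needleman-Wunsch style).

```python
HYDROPHOBIC = set("ILV")


def toy_w(x, y):
    """Hypothetical contact potential: reward hydrophobic-hydrophobic contacts."""
    return 1.0 if x in HYDROPHOBIC and y in HYDROPHOBIC else -0.1


def frozen_scorer(template_seq, contacts, w):
    """Position-specific score; contacts are 0-based template position pairs."""
    partners = {i: [] for i in range(len(template_seq))}
    for i, j in contacts:
        partners[i].append(j)
        partners[j].append(i)

    def score(pos, residue):
        # b_j comes from the template, so the score ignores the rest of the alignment.
        return sum(w(residue, template_seq[j]) for j in partners[pos])

    return score


def align_frozen(target, template_seq, contacts, w, gap=-0.5):
    """Global DP alignment of target to template; returns (score, aligned pairs)."""
    score = frozen_scorer(template_seq, contacts, w)
    n, m = len(target), len(template_seq)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(j - 1, target[i - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Traceback: recover which target residue sits on which template position.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + score(j - 1, target[i - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return F[n][m], pairs


best, aligned = align_frozen("VAV", "VAV", [(0, 2)], toy_w)
print(best, aligned)  # 2.0 [(0, 0), (1, 1), (2, 2)]
```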

Page 48

• Approximation Algorithm
  – Interaction-Frozen Algorithm (A. Godzik et al.)
  – Monte Carlo Sampling (S.H. Bryant et al.)
  – Double dynamic programming (D. Jones et al.)

• Exact Algorithm
  – Branch-and-bound (R.H. Lathrop and T.F. Smith)
  – PROSPECT-I uses Divide-and-conquer (Y. Xu et al.)
  – Linear programming by RAPTOR (J. Xu et al.)

Pairwise Threading Algorithms

Page 49

• Sequence-sequence alignment
• Sequence-profile alignment
• Sequence-HMM model alignment
  – e.g. SAMT02 (K. Karplus et al.)
• Profile-sequence alignment
  – e.g. PDB-Blast (A. Godzik et al.)
• Profile-profile alignment
  – e.g. PROSPECT-II (Y. Xu et al.)
• Combinations of several alignments
  – e.g. 3DPS (L.A. Kelley et al), SHGU (D. Fischer)

Non-Pairwise Threading Algorithms

Page 50

• Correct bond length and bond angles

• Correct placement of functionally important sites

• Prediction of global topology, not partial alignment (minimum number of gaps)

>> 3.8 Angstroms

Threading Model Validation

Page 51

Placement of functionally important sites in threading.

Prediction of the structure of methylglyoxal synthase based on the template of carbamoyl phosphate synthase

Page 52

GenThreader

1. Predicts secondary structures for target sequence

2. Makes sequence profiles (PSSMs) for each template sequence

3. Uses threading scoring function to find the best matching profile

http://bioinf.cs.ucl.ac.uk/psipred

Page 53

• Threading models are generally not suitable for things like drug design

• Function prediction is possible only if the fold family is associated with a single function

Threading - Conclusions

Page 54

Protein Sequence

Database Searching, Domain Assignment, Multiple Sequence Alignment

Homologue in PDB?

Comparative Modelling

Secondary Structure Prediction

Disorder Prediction

No

Yes

3-D Protein Model

Fold Recognition

Predicted Fold?

Sequence-Structure Alignment

Ab-initio Structure Prediction

No

Yes

Overall Approach

http://bioinf.cs.ucl.ac.uk

Page 55

Ab Initio Methods

Page 56

What is an atom?

• Classical mechanics: a solid object

• Defined by its position (x, y, z), its shape (usually a ball) and its mass

• May carry an electric charge (positive or negative), usually partial (smaller than the charge of one electron)

Page 57

Atomic interactions

Torsion angles are 4-body

Angles are 3-body

Bonds are 2-body

Non-bonded pair

Page 58

Forces between atoms

Strong bonded interactions

All chemical bonds (bond length b):

$$U = K (b - b_0)^2$$

Angle between chemical bonds (θ):

$$U = K (\theta - \theta_0)^2$$

Torsion angles (φ):

$$U = K \left(1 - \cos(n\phi)\right)$$

Preferred conformations for torsion angles:
- ω angle of the main chain
- χ angles of the side chains (aromatic, …)
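The three bonded terms translate directly into code. A sketch with hypothetical parameter values; note that many force fields write these terms with a factor of 1/2 or a phase offset in the torsion, which the slide's form omits.

```python
import math


def bond_energy(b, b0, k):
    """U = K (b - b0)^2 for a bond of length b with reference length b0."""
    return k * (b - b0) ** 2


def angle_energy(theta, theta0, k):
    """U = K (theta - theta0)^2; angles in radians."""
    return k * (theta - theta0) ** 2


def torsion_energy(phi, n, k):
    """U = K (1 - cos(n * phi)); phi in radians, n = periodicity."""
    return k * (1.0 - math.cos(n * phi))


# Hypothetical parameters, for illustration only.
print(round(bond_energy(1.60, 1.53, 300.0), 3))         # 300 * 0.07^2 = 1.47
print(round(torsion_energy(math.pi / 3, 3, 2.0), 3))    # 2 * (1 - cos(pi)) = 4.0
```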

Page 59

Forces between atoms: van der Waals interactions

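$$E^{LJ}_{ij}(r) = \varepsilon_{ij}\left[\left(\frac{R_{ij}}{r}\right)^{12} - 2\left(\frac{R_{ij}}{r}\right)^{6}\right]$$

$$R_{ij} = \frac{R_i + R_j}{2}; \qquad \varepsilon_{ij} = \sqrt{\varepsilon_i \varepsilon_j}$$

[Figure: Lennard-Jones potential as a function of r, built from a repulsive 1/r¹² term and an attractive 1/r⁶ term, with R_ij marked]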
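A sketch of the Lennard-Jones term with the combining rules above; the parameter values in the example are hypothetical. With this form the minimum lies at r = R_ij with depth −ε_ij.

```python
import math


def lj_energy(r, R_i, R_j, eps_i, eps_j):
    """E_LJ(r) = eps_ij [ (R_ij/r)^12 - 2 (R_ij/r)^6 ] with the slide's combining rules."""
    R_ij = (R_i + R_j) / 2.0            # R_ij = (R_i + R_j) / 2
    eps_ij = math.sqrt(eps_i * eps_j)   # eps_ij = sqrt(eps_i * eps_j)
    x = R_ij / r
    return eps_ij * (x ** 12 - 2.0 * x ** 6)


# Hypothetical atom parameters: R_ij = 3.5 Å, eps_ij = 0.2 (energy units).
for r in (3.0, 3.5, 5.0):
    print(r, round(lj_energy(r, 3.4, 3.6, 0.1, 0.4), 4))
```

At 3.0 Å the repulsive term dominates (positive energy), at 3.5 Å the energy equals −0.2, and at 5.0 Å it is a small negative attraction.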

Page 60

Forces between atoms: Electrostatic interactions

Coulomb potential

$$E(r) = \frac{1}{4\pi\varepsilon_0\varepsilon}\,\frac{q_i q_j}{r}$$

[Figure: two charges q_i and q_j separated by distance r]
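A sketch of the Coulomb term in common molecular-modeling units: with r in Å and charges in units of the elementary charge, 1/(4πε₀) is approximately 332.06 kcal·Å/(mol·e²). The dielectric value and charges in the example are hypothetical.

```python
COULOMB_CONSTANT = 332.06  # approx. 1/(4*pi*eps0) in kcal*Å/(mol*e^2)


def coulomb_energy(r, q_i, q_j, dielectric=1.0):
    """E(r) = q_i q_j / (4 pi eps0 eps r); r in Å, charges in e, result in kcal/mol."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)


# Hypothetical partial charges of +0.5 and -0.5 at 3 Å, with dielectric 4.
print(round(coulomb_energy(3.0, 0.5, -0.5, dielectric=4.0), 2))  # -6.92
```

Raising the dielectric constant is a crude way to mimic screening by solvent; explicit water models avoid that approximation at higher cost.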

Page 61

Some Common Force Fields in Computational Biology

ENCAD (Michael Levitt, Stanford)

AMBER (Peter Kollman, UCSF; David Case, Scripps)

CHARMM (Martin Karplus, Harvard)

OPLS (Bill Jorgensen, Yale)

MM2/MM3/MM4 (Norman Allinger, U. Georgia)

ECEPP (Harold Scheraga, Cornell)

GROMOS (Van Gunsteren, ETH, Zurich)

Michael Levitt. The birth of computational structural biology. Nature Structural Biology, 8, 392-393 (2001)

Page 62

Protein Structure Prediction

• One popular model for protein folding assumes a sequence of events:

– Hydrophobic collapse

– Local interactions stabilize secondary structures

– Secondary structures interact to form motifs

– Motifs aggregate to form tertiary structure

Page 63

Protein Structure Prediction

A physics-based approach:

- find the conformation of the protein corresponding to a thermodynamic minimum (free energy minimum)

- cannot minimize internal energy alone! Needs to include solvent

- simulate folding…a very long process!

Folding times are in the millisecond-to-second range; folding simulations at best run 1 ns in one day…
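To see the size of this gap, taking the 1 ns per day throughput quoted above at face value:

$$\frac{1\ \text{ms}}{1\ \text{ns/day}} = \frac{10^{-3}\ \text{s}}{10^{-9}\ \text{s/day}} = 10^{6}\ \text{days} \approx 2.7 \times 10^{3}\ \text{years}$$

A folding time of 1 s would correspondingly need about 10⁹ days, roughly 2.7 million years.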

Page 64

What is a molecular dynamics simulation?

• Simulation that shows how the atoms in the system move with time

• Typically on the nanosecond timescale

• Atoms are treated like hard balls, and their motions are described by Newton’s laws.
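Applying Newton's laws in an MD code is usually done with a time-stepping integrator. A minimal sketch, assuming the velocity Verlet scheme (one common choice, not named on the slide) for a single particle in one dimension on a hypothetical harmonic spring; real MD applies the same update to every atom in 3-D with force-field forces.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * a = F(x) for one particle in 1-D; returns (x, v, trajectory)."""
    a = force(x) / mass
    trajectory = [x]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # update position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # update velocity with averaged acceleration
        a = a_new
        trajectory.append(x)
    return x, v, trajectory


# Hypothetical example: harmonic "bond" with spring constant k = 1 and mass m = 1.
k = 1.0
x, v, traj = velocity_verlet(1.0, 0.0, lambda pos: -k * pos, 1.0, 0.01, 1000)
energy = 0.5 * 1.0 * v * v + 0.5 * k * x * x
print(round(energy, 4))  # total energy stays close to the initial value 0.5
```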

Page 65

Why MD simulations?

• Link physics, chemistry and biology

• Model phenomena that cannot be observed experimentally

• Understand protein folding…

• Access to thermodynamic quantities (free energies, binding energies,…)

Page 66

Characteristic protein motions

Type of motion | Timescale | Amplitude

Local: bond stretching, angle bending, methyl rotation | 0.01 ps, 0.1 ps, 1 ps | < 1 Å
Medium scale: loop motions, SSE formation | ns – μs | 1–5 Å
Global: protein tumbling (water tumbling), protein folding | 20 ns (20 ps), ms – hrs | > 5 Å

Periodic (harmonic) → Random (stochastic)

Page 67

The Ergodic Hypothesis

• Time averages = Ensemble Averages

$$\langle A \rangle_{\text{time}} = \langle A \rangle_{\text{ensemble}}$$

Page 68

The Folding @ Home initiative (Vijay Pande, Stanford University)

http://folding.stanford.edu/

Page 69

The Folding @ Home initiative

Page 70

Folding @ Home: Results

[Figure: predicted folding time (nanoseconds) plotted against the experimental measurement (nanoseconds), both axes on a log scale from 1 to 100,000 ns, for PPA, alpha helix, beta hairpin, villin and BBAW.]

Experiments:

- villin: Raleigh et al., SUNY Stony Brook

- BBAW: Gruebele et al., UIUC

- beta hairpin: Eaton et al., NIH

- alpha helix: Eaton et al., NIH

- PPA: Gruebele et al., UIUC

http://pande.stanford.edu/

Protein Structure Prediction

DECOYS: Generate a large number of possible shapes (need good decoy structures)

DISCRIMINATION: Select the correct, native-like fold (need a good energy function)

The CASP experiment

• CASP = Critical Assessment of Structure Prediction

• Started in 1994, based on an idea from John Moult (Moult, Pedersen, Judson, Fidelis, Proteins, 23:2-5 (1995))

• Now held every second year (CASP6 was held last December)

The CASP experiment: how it works

1) Sequences of target proteins are made available to CASP participants in June-July of a CASP year

- the structure of the target protein is known but not yet released in the PDB or otherwise accessible

2) CASP participants have between 2 weeks and 2 months over the summer of a CASP year to generate up to 5 models for each of the targets they are interested in.

3) Model structures are assessed against the experimental structures

4) CASP participants meet in December to discuss results

CASP Statistics

Experiment | # of Targets | # of predictors | # of 3D models
CASP6 | 87 | 166 | 28965
CASP5 | 67 | 175 | 22909
CASP4 | 43 | 111 | 5150
CASP3 | 43 | 61 | 1256
CASP2 | 42 | 72 | 947
CASP1 | 33 | 35 | 100

CASP

Three categories at CASP

- Homology (or comparative) modeling

- Fold recognition

- Ab initio prediction

CASP dynamics:

- Real deadlines; pressure: positive, or negative?

- Competition?

- Influence on science?

Venclovas, Zemla, Fidelis, Moult. Assessment of progress over the CASP experiments. Proteins, 53:585-595 (2003)

Ab initio prediction of protein structure – concept

• Go from sequence to structure by sampling the conformational space in a reasonable manner and selecting a native-like conformation using a good discrimination function

• Problems: conformational space is astronomical, and it is hard to design functions that are not fooled by non-native conformations (or “decoys”)

Ab initio prediction of protein structure

sample conformational space such that native-like conformations are found

astronomically large number of conformations: 5 states/100 residues = $5^{100} \approx 10^{70}$

select

hard to design functions that are not fooled by non-native conformations (“decoys”)
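A worked version of the $5^{100}$ estimate, assuming a hypothetical search able to score $10^{12}$ conformations per second (an illustrative rate, not a measured one):

```latex
\begin{aligned}
5^{100} &= 10^{100 \log_{10} 5} = 10^{100 \times 0.69897} \approx 10^{69.9} \approx 10^{70} \\
t_{\text{enumerate}} &\approx \frac{10^{70}\ \text{conformations}}{10^{12}\ \text{conformations/s}} = 10^{58}\ \text{s}
\end{aligned}
```

For comparison, the age of the universe is roughly $4 \times 10^{17}$ s, so exhaustive enumeration at this level of detail is out of reach.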

Sampling conformational space – continuous approaches

• Most work in the field:

- Molecular dynamics

- Continuous energy minimisation (follow a valley)

- Monte Carlo simulation

- Genetic Algorithms

• Similar to the real polypeptide folding process

• Cannot be sure if native-like conformations are sampled

Molecular dynamics

• Force = −dU/dx (the negative slope of the potential U); acceleration follows from m a(t) = force

• All atoms are moving, so the forces between atoms are complicated functions of time

• An analytical solution for x(t) and v(t) is impossible; a numerical solution is straightforward

• Atoms are moved over very short time steps of 10⁻¹⁵ seconds, or 0.001 picoseconds (ps)

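$x(t+\Delta t) = x(t) + v(t)\,\Delta t + \left[4a(t) - a(t-\Delta t)\right]\dfrac{\Delta t^2}{6}$

$v(t+\Delta t) = v(t) + \left[2a(t+\Delta t) + 5a(t) - a(t-\Delta t)\right]\dfrac{\Delta t}{6}$

$U_{\text{kinetic}} = \tfrac{1}{2}\sum_i m_i v_i(t)^2 = \tfrac{1}{2}\, n\, k_B T$

where x(t+Δt) and v(t+Δt) are the new position and velocity, x(t) and v(t) the old ones, a is the acceleration, and n is the number of coordinates (not atoms).

• Total energy (U_potential + U_kinetic) must not change with time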
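The position and velocity updates above are Beeman's integration scheme. A minimal sketch, assuming a single particle in a 1D harmonic potential U = ½kx² in reduced units (k = m = k_B = 1, all hypothetical), shows the update loop and lets you check that total energy stays essentially constant. Because there is no a(t − Δt) at the first step, the sketch assumes it equals a(t).

```python
import math


def beeman_harmonic(x0, v0, k=1.0, m=1.0, dt=0.01, n_steps=1000):
    """Integrate a 1D harmonic oscillator (U = 0.5*k*x**2) with Beeman's scheme.

    Returns lists of positions, velocities and total energies (U_potential + U_kinetic).
    """
    def accel(x):
        # Force = -dU/dx = -k*x; acceleration from m*a = force
        return -k * x / m

    def total_energy(x, v):
        return 0.5 * k * x**2 + 0.5 * m * v**2

    x, v = x0, v0
    a_curr = accel(x0)
    a_prev = a_curr  # assumption: no history before t = 0, so take a(-dt) = a(0)
    xs, vs, energies = [x], [v], [total_energy(x, v)]
    for _ in range(n_steps):
        # New position from old position, old velocity and two accelerations
        x_new = x + v * dt + (4.0 * a_curr - a_prev) * dt**2 / 6.0
        a_new = accel(x_new)
        # New velocity from old velocity and three accelerations
        v_new = v + (2.0 * a_new + 5.0 * a_curr - a_prev) * dt / 6.0
        x, v = x_new, v_new
        a_prev, a_curr = a_curr, a_new
        xs.append(x)
        vs.append(v)
        energies.append(total_energy(x, v))
    return xs, vs, energies


def temperature(velocities, masses, k_b=1.0):
    """Temperature from U_kinetic = 0.5 * sum(m_i v_i^2) = 0.5 * n * k_B * T,
    where n counts coordinates (one entry per velocity component), not atoms."""
    u_kin = 0.5 * sum(m * v**2 for m, v in zip(masses, velocities))
    return 2.0 * u_kin / (len(velocities) * k_b)


if __name__ == "__main__":
    xs, vs, es = beeman_harmonic(x0=1.0, v0=0.0, n_steps=628)  # about one period (2*pi)
    drift = max(abs(e - es[0]) for e in es)
    print(f"x after ~one period: {xs[-1]:.4f} (exact {math.cos(628 * 0.01):.4f})")
    print(f"max energy drift: {drift:.2e}")
```

The same bookkeeping explains the timescale problem: with a 10⁻¹⁵ s step, one nanosecond of simulated time already needs 10⁶ force evaluations for every atom.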

Energy minimisation

• For a given protein, the energy depends on thousands of x, y, z Cartesian atomic coordinates; reaching a deep minimum is not trivial

• With convergence, we have an accurate equilibrium conformation and a well-defined energy value

[Figures: (1) energy vs. number of steps from a starting conformation to a deep minimum, comparing steepest descent and conjugate gradient; (2) energy vs. number of steps, with runs that converge or give up, shown alongside RMSD.]
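The steepest descent vs. conjugate gradient contrast can be sketched on a hypothetical two-coordinate quadratic "energy" E(x) = ½xᵀAx − bᵀx with an elongated valley. This is an illustrative assumption: a real protein energy has thousands of coordinates and is not quadratic, which is why practical minimisers use nonlinear variants of conjugate gradients. On a quadratic in n dimensions, linear conjugate gradients reaches the minimum in at most n steps, while steepest descent zig-zags down the valley.

```python
import math


def _matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def _dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def steepest_descent(A, b, x0, tol=1e-8, max_steps=10_000):
    """Minimise 0.5 x^T A x - b^T x by steepest descent with an exact line search."""
    x = list(x0)
    for step in range(max_steps):
        g = [ax - b_i for ax, b_i in zip(_matvec(A, x), b)]  # gradient = A x - b
        gg = _dot(g, g)
        if math.sqrt(gg) < tol:
            return x, step  # converged
        alpha = gg / _dot(g, _matvec(A, g))  # exact step length for a quadratic
        x = [x_i - alpha * g_i for x_i, g_i in zip(x, g)]
    return x, max_steps  # gave up


def conjugate_gradient(A, b, x0, tol=1e-8, max_steps=10_000):
    """Minimise the same quadratic with linear conjugate gradients."""
    x = list(x0)
    r = [b_i - ax for ax, b_i in zip(_matvec(A, x), b)]  # residual = -gradient
    p = list(r)  # first search direction is the steepest-descent direction
    rr = _dot(r, r)
    for step in range(max_steps):
        if math.sqrt(rr) < tol:
            return x, step
        Ap = _matvec(A, p)
        alpha = rr / _dot(p, Ap)
        x = [x_i + alpha * p_i for x_i, p_i in zip(x, p)]
        r = [r_i - alpha * ap_i for r_i, ap_i in zip(r, Ap)]
        rr_new = _dot(r, r)
        # Next direction mixes the new residual with the old direction (A-conjugate)
        p = [r_i + (rr_new / rr) * p_i for r_i, p_i in zip(r, p)]
        rr = rr_new
    return x, max_steps


if __name__ == "__main__":
    A = [[10.0, 0.0], [0.0, 1.0]]  # hypothetical, elongated (ill-conditioned) valley
    b = [0.0, 0.0]                  # minimum at the origin
    for name, method in (("steepest descent", steepest_descent),
                         ("conjugate gradient", conjugate_gradient)):
        x, steps = method(A, b, [1.0, 1.0])
        print(f"{name}: {steps} steps -> {x}")
```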

Monte Carlo simulation

• Discrete moves in torsion or Cartesian conformational space

• Evaluate energy after every move and compare to previous energy (ΔE)

• Accept conformation based on Boltzmann probability: $P \propto \exp\left(-\dfrac{\Delta E}{kT}\right)$

• Many variations, including simulated annealing (starting with a high temperature so more moves are accepted initially and then cooling)

• If run for infinite time, simulation will produce a Boltzmann distribution
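A minimal sketch of the acceptance rule and of simulated annealing, assuming the common Metropolis form (downhill moves always accepted, uphill moves accepted with probability exp(−ΔE/kT)) and a hypothetical 1D double-well energy in place of a real conformational energy. The geometric cooling schedule and step size are illustrative choices; at a fixed temperature the same loop samples the Boltzmann distribution.

```python
import math
import random


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def toy_energy(x: float) -> float:
    """Hypothetical double-well energy standing in for a conformational energy surface."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def simulated_annealing(x0, t_start=2.0, t_end=0.01, n_steps=20_000, step_size=0.3, seed=0):
    """Metropolis Monte Carlo with geometric cooling; returns the lowest-energy state seen."""
    rng = random.Random(seed)
    x, e = x0, toy_energy(x0)
    best_x, best_e = x, e
    cooling = (t_end / t_start) ** (1.0 / n_steps)  # multiply kT by this after each move
    kT = t_start
    for _ in range(n_steps):
        x_trial = x + rng.uniform(-step_size, step_size)  # one random trial move
        e_trial = toy_energy(x_trial)
        if metropolis_accept(e_trial - e, kT, rng):
            x, e = x_trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
        kT *= cooling
    return best_x, best_e


if __name__ == "__main__":
    print(simulated_annealing(x0=1.0))  # start in the higher (local) well
```

Starting hot lets the walker cross the barrier between wells; as kT falls, uphill moves are rejected more often and the search settles into a minimum. Annealing does not guarantee that this is the global one.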

Genetic Algorithms

• Generate an initial pool of conformations

• Perform crossover and mutation operations on this set to generate a much larger pool of conformations

• Select a subset of the fittest conformations from this large pool

• Repeat above two steps until convergence
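The steps above can be sketched as follows, assuming conformations encoded as strings of 5 discrete states per residue and a hypothetical fitness (agreement with a hypothetical target conformation) in place of a real scoring function. Crossover is single-point and mutation reassigns residues at random; both are illustrative choices. Because the parents stay in the large pool, the best fitness can never decrease from one generation to the next.

```python
import random

N_STATES = 5  # discrete states per residue (illustrative)


def fitness(conf, target):
    """Hypothetical fitness: number of residues whose state matches a hypothetical target."""
    return sum(c == t for c, t in zip(conf, target))


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent joined to the tail of the other."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(conf, rng, rate=0.05):
    """Reassign each residue to a random state with probability `rate`."""
    return [rng.randrange(N_STATES) if rng.random() < rate else c for c in conf]


def genetic_algorithm(target, pool_size=20, expansion=5, generations=50, seed=0):
    rng = random.Random(seed)
    n = len(target)
    # Step 1: initial pool of random conformations
    pool = [[rng.randrange(N_STATES) for _ in range(n)] for _ in range(pool_size)]
    history = [max(fitness(c, target) for c in pool)]
    for _ in range(generations):
        # Step 2: crossover + mutation build a much larger pool (parents stay in it)
        large_pool = list(pool)
        for _ in range(pool_size * expansion):
            parent_a, parent_b = rng.sample(pool, 2)
            large_pool.append(mutate(crossover(parent_a, parent_b, rng), rng))
        # Step 3: keep the fittest subset
        large_pool.sort(key=lambda c: fitness(c, target), reverse=True)
        pool = large_pool[:pool_size]
        history.append(fitness(pool[0], target))
    return pool[0], history


if __name__ == "__main__":
    target = [i % N_STATES for i in range(30)]  # hypothetical target conformation
    best, history = genetic_algorithm(target)
    print(f"best fitness: {history[0]} -> {history[-1]} (max {len(target)})")
```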

Sampling conformational space – exhaustive approaches

enumerate all possible conformations

view entire space (perfect partition function)

computationally intractable: 5 states/100 residues = $5^{100} \approx 10^{70}$ possible conformations

select

must use discrete state models to minimise the number of conformations explored
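For a very short chain, exhaustive enumeration is easy and gives the exact global minimum and partition function Z. A minimal sketch, assuming 5 discrete states per residue and a hypothetical energy function (per-state energies plus a bonus when neighbouring residues share a state), none of which comes from a real protein potential:

```python
import itertools
import math

N_STATES = 5
STATE_ENERGY = [0.0, 0.5, 1.0, 1.5, 2.0]  # hypothetical energy of each residue state
SAME_NEIGHBOUR_BONUS = -1.0  # hypothetical reward when adjacent residues share a state


def energy(conf):
    """Hypothetical discrete-state energy function (not a real protein potential)."""
    e = sum(STATE_ENERGY[s] for s in conf)
    e += sum(SAME_NEIGHBOUR_BONUS for a, b in zip(conf, conf[1:]) if a == b)
    return e


def enumerate_all(n_residues, kT=1.0):
    """Visit every conformation once; return the count, the global minimum and Z."""
    count, best_conf, best_e, z = 0, None, math.inf, 0.0
    for conf in itertools.product(range(N_STATES), repeat=n_residues):
        e = energy(conf)
        count += 1
        z += math.exp(-e / kT)  # partition function: sum of Boltzmann weights
        if e < best_e:
            best_conf, best_e = conf, e
    return count, best_conf, best_e, z


if __name__ == "__main__":
    count, best_conf, best_e, z = enumerate_all(6)
    p_best = math.exp(-best_e) / z  # exact Boltzmann probability of the minimum (kT = 1)
    print(count, best_conf, best_e, f"{p_best:.3f}")
    # The same loop for 100 residues would need 5**100 iterations, a 70-digit number
    print(len(str(N_STATES ** 100)))
```

Six residues give 15,625 conformations; each added residue multiplies the count by 5, which is why realistic chains need far coarser discrete models or non-exhaustive sampling.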

Scoring/energy functions

• Need a way to select native-like conformations from non-native ones

• Physics-based functions: electrostatics, van der Waals, solvation, bond/angle terms

• Knowledge-based scoring functions: derive information about atomic properties from a database of experimentally determined conformations; common parameters include pairwise atomic distances and amino acid burial/exposure.
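A sketch of both kinds of term, under stated assumptions. The physics-based side shows only two textbook pair terms (Lennard-Jones van der Waals and Coulomb electrostatics, with an approximate 332 kcal·Å/(mol·e²) conversion factor); solvation and bond/angle terms are omitted. The knowledge-based side uses one common recipe, the inverse-Boltzmann relation E = −kT ln(f_observed / f_reference) over distance bins. All counts and parameters are hypothetical.

```python
import math

KT_KCAL = 0.593  # approx. kT in kcal/mol near room temperature (about 298 K)


def lennard_jones(r, epsilon, sigma):
    """Van der Waals term 4*eps*[(sigma/r)**12 - (sigma/r)**6]; minimum -eps at r = 2**(1/6)*sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q1, q2, dielectric=1.0):
    """Electrostatics in kcal/mol for charges in units of e and r in Angstrom (332.06 is approximate)."""
    return 332.06 * q1 * q2 / (dielectric * r)


def inverse_boltzmann(observed_counts, reference_counts, kT=KT_KCAL, pseudocount=1.0):
    """Knowledge-based potential per distance bin: E = -kT * ln(f_observed / f_reference).

    observed_counts: hypothetical pair counts per distance bin from a structure database
    reference_counts: hypothetical counts expected with no specific interaction
    A pseudocount keeps empty bins from producing log(0).
    """
    n_bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for n_obs, n_ref in zip(observed_counts, reference_counts):
        f_obs = (n_obs + pseudocount) / obs_total
        f_ref = (n_ref + pseudocount) / ref_total
        energies.append(-kT * math.log(f_obs / f_ref))
    return energies


if __name__ == "__main__":
    # Hypothetical counts for one atom-pair type in four distance bins
    print(inverse_boltzmann([5, 40, 30, 25], [25, 25, 25, 25]))
    # Hypothetical LJ parameters (epsilon in kcal/mol, sigma in Angstrom)
    print(lennard_jones(2 ** (1 / 6) * 3.4, 0.24, 3.4))
    print(coulomb(3.0, 1.0, -1.0))  # opposite charges attract (negative energy)
```

Distance bins seen more often than the reference predicts get negative (favourable) energies, and depleted bins get positive ones; the choice of reference state is the main design decision in such potentials.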

Requirements for sampling methods and scoring functions

• Sampling methods must produce good decoy sets that are comprehensive and include several native-like structures

• Scoring function scores must correlate well with RMSD of conformations (the better the score/energy, the lower the RMSD)
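The score/RMSD requirement can be checked numerically. A minimal sketch, assuming decoy coordinates are already optimally superimposed on the native structure (real comparisons first apply a superposition such as the Kabsch algorithm, omitted here) and using hypothetical decoy scores and RMSDs. A Pearson correlation close to +1 means that lower (better) scores go with lower RMSDs.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the two structures are already optimally superimposed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    scores = [-120.0, -110.0, -95.0, -80.0]  # hypothetical decoy energies (lower = better)
    rmsds = [1.2, 2.5, 4.0, 7.8]             # hypothetical RMSDs to the native, in Angstrom
    print(f"score/RMSD correlation: {pearson(scores, rmsds):.2f}")  # ≈ 0.98
```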

Protein Structure

Primary (Sequence)

Secondary (Helix/Strand/Coil) and lack of structure (disorder)

Domain and Tertiary (Fold)

Quaternary (Complexes)

IVGGYTCAANSIPYQVSLNSGSHFCGGSLINSQWVVSAAHCYKSRIQVRLGEHNIDVLEGNEQFINAAKIITHPNFNGNTL...

http://bioinf.cs.ucl.ac.uk

Computational Aspects of Structural Genomics

[Figure: A. sequence space; B. comparative modeling; C. fold recognition; D. ab initio prediction; E. target selection (targets); F. analysis]

(Figure idea by Steve Brenner.)