THEORETICAL METHODS TO STUDY PROTEIN FOLDING: EMPIRICAL FORCE FIELDS
Page 1: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

THEORETICAL METHODS TO STUDY PROTEIN FOLDING: EMPIRICAL FORCE FIELDS

Page 2: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

[Figure: levels of description of a biomolecular system.]

Individual components: fully detailed, atomistically detailed, and coarse-grained descriptions. Models, from most to least detailed: QM, QM/MM, all-atom, united-atom, residue level, molecule/domain level. Each step down is obtained by averaging over "less important" degrees of freedom.

System level (networks): PDEs to describe reaction/diffusion, and network graphs. This level is obtained by averaging over individual components.

Page 3: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.
Page 4: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Anfinsen’s thermodynamic hypothesis. “The studies on the renaturation of fully denatured ribonuclease required many supporting investigations to establish, finally, the generality which we have occasionally called the ‘thermodynamic hypothesis’. This hypothesis states that the three-dimensional structure of a native protein in its normal physiological milieu (solvent, pH, ionic strength, presence of other components such as metal ions or prosthetic groups, temperature and other) is the one in which the Gibbs free energy of the whole system is lowest; that is, the native conformation is determined by the totality of interatomic interactions and hence by the amino acid sequence in a given environment.”

C.B. Anfinsen, Science, 181, 223-230, 1973.

To facilitate the implementation of this hypothesis in protein-structure prediction, “free energy” was replaced with “potential energy”.

Page 5: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Nature (and a canonical simulation) finds the basin with the lowest free energy at a given temperature. This basin may, but does not have to, contain the conformation with the lowest potential energy.

Global-optimization methods are designed to find the structure with the lowest potential energy and thus ignore conformational entropy. Technically, this corresponds to a canonical simulation at 0 K.

“Potential energy” or “free energy”?
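
The distinction can be illustrated with a toy model. The sketch below is not from the slides: it assumes two hypothetical one-dimensional harmonic basins (the names `BASIN_A` and `BASIN_B` and all numerical values are made up for illustration) and evaluates the free energy of each basin as F = -kT ln Z by numerical integration. Basin A is deeper but narrow, basin B is shallower but wide.

```python
import numpy as np

K_B = 0.0019872  # Boltzmann constant, kcal/(mol K)


def basin_free_energy(e_min, k_force, temperature, half_width=20.0, n_grid=400001):
    """F = -kT ln Z for a 1D harmonic basin E(x) = e_min + 0.5 * k_force * x**2."""
    x = np.linspace(-half_width, half_width, n_grid)
    kt = K_B * temperature
    # The constant e_min is factored out of the integral and added back below.
    boltzmann = np.exp(-0.5 * k_force * x**2 / kt)
    z = np.sum(boltzmann) * (x[1] - x[0])
    return e_min - kt * np.log(z)


# Hypothetical basins (illustrative values only).
BASIN_A = {"e_min": -10.0, "k_force": 100.0}  # lowest potential energy, narrow
BASIN_B = {"e_min": -9.0, "k_force": 1.0}     # higher potential energy, wide

for temperature in (100.0, 300.0):
    f_a = basin_free_energy(temperature=temperature, **BASIN_A)
    f_b = basin_free_energy(temperature=temperature, **BASIN_B)
    favored = "A" if f_a < f_b else "B"
    print(f"T = {temperature:5.1f} K: F_A = {f_a:7.3f}, F_B = {f_b:7.3f}, favored: {favored}")
```

With these made-up parameters the deep, narrow basin A has the lower free energy at 100 K, while the wide basin B wins at 300 K because of its larger entropy. A global optimizer, which corresponds to the 0 K limit, would always report basin A.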

Page 6: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

The stability of the structures of biological macromolecules results from the special structure of their energy landscapes, which can be termed “minimal frustration” or “funnel-like structure”. A good analogy is the pit dug by an antlion larva.

Page 7: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Theoretical studies of protein structure and protein folding

• Need to express the energy of a system as a function of coordinates

• Need an algorithm to explore the conformational space

Page 8: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

From Schrödinger equation to analytical all-atom potentials

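$$\hat{H}\,\Psi(\mathbf{R}_1,\mathbf{R}_2,\ldots,\mathbf{R}_N;\mathbf{r}_1,\mathbf{r}_2,\ldots,\mathbf{r}_n) = E\,\Psi(\mathbf{R}_1,\mathbf{R}_2,\ldots,\mathbf{R}_N;\mathbf{r}_1,\mathbf{r}_2,\ldots,\mathbf{r}_n)$$

$$\hat{H} = -\sum_a \frac{1}{2m_a}\nabla_a^2 - \frac{1}{2}\sum_i \nabla_i^2 + \sum_{a<b}\frac{Z_a Z_b}{r_{ab}} - \sum_a\sum_i \frac{Z_a}{r_{ai}} + \sum_{i<j}\frac{1}{r_{ij}} = \hat{H}_N + \hat{H}_{el}$$

(Atomic units; a and b run over the N nuclei, i and j over the n electrons.)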

Page 9: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

The Born-Oppenheimer approximation

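$$\Psi(\mathbf{R}_1,\ldots,\mathbf{R}_N;\mathbf{r}_1,\ldots,\mathbf{r}_n) \approx \Psi_N(\mathbf{R}_1,\ldots,\mathbf{R}_N)\,\Psi_{el}(\mathbf{r}_1,\ldots,\mathbf{r}_n;\mathbf{R}_1,\ldots,\mathbf{R}_N)$$

$$\hat{H}_{el}\Psi_{el} = E_{el}\Psi_{el}, \qquad \hat{H}_{el} = -\frac{1}{2}\sum_i \nabla_i^2 - \sum_a\sum_i \frac{Z_a}{r_{ai}} + \sum_{i<j}\frac{1}{r_{ij}}$$

$$E(\mathbf{R}_1,\ldots,\mathbf{R}_N) = E_N + E_{el}, \qquad E_N = \sum_{a<b}\frac{Z_a Z_b}{r_{ab}}$$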

Page 10: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

What is a force field? A set of formulas (usually explicit) and parameters to express the conformational energy of a given class of molecules as a function of coordinates (Cartesian, internal, etc.) that define the geometry of a molecule or a molecular system.

Features:

• Cheap

• Fast

• Easy to program

• Restricted to conformational analysis

• Non-transferable

• Results sometimes unreliable

Page 11: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

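$$E = \sum_{bonds}\frac{1}{2}k_i^d\,(d_i-d_i^0)^2 + \sum_{angles}\frac{1}{2}k_i^\theta\,(\theta_i-\theta_i^0)^2 + \sum_{dihedral\ angles}\sum_n \frac{V_n}{2}\left[1+\cos(n\varphi)\right] + \sum_{i<j}\frac{q_i q_j}{r_{ij}} + \sum_{i<j}\varepsilon_{ij}\left[\left(\frac{r_{ij}^0}{r_{ij}}\right)^{12} - 2\left(\frac{r_{ij}^0}{r_{ij}}\right)^{6}\right]$$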

All-atom empirical force fields: a very simplified representation of the potential energy surfaces

Class I force fields

Page 12: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Multiplication of atom types in empirical force fields

Page 13: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Force fields commonly used for protein simulations

Name | Potential type | References
AMBER/OPLS | all-atom, united-atom | Weiner et al., 1984; 1986; Cornell et al., 1995; Jorgensen et al., 1996; http://ambermd.org/
CHARMm | all-atom | Brooks et al., 1983; MacKerell et al., 1998; 2001; http://www.charmm.org/
GROMOS | all-atom | van Gunsteren & Berendsen, 1987; Scott et al., 1999; http://www.gromos.net/
ECEPP/3 | all-atom; rigid valence geometry | Nemethy et al., 1995; Ripoll et al., 1995; http://cbsu.tc.cornell.edu/software/eceppak/; http://www.icm.edu.pl/kdm/ECEPPAK
DISCOVER (CVFF) | all-atom | Dauber-Osguthorpe, 1988; Maple et al., 1998

Page 14: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Bond distortion energy

$$E_s(d) = \frac{1}{2}k_d\,(d-d_0)^2$$

[Figure: plot of E_s(d) against the bond length d, with the minimum at d = d0.]

Page 15: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Typical values of d0 and kd

Bond | d0 [Å] | kd [kcal/(mol Å²)]
Csp3-Csp3 | 1.523 | 317
Csp3-Csp2 | 1.497 | 317
Csp2=Csp2 | 1.337 | 690
Csp2=O | 1.208 | 777
Csp2-Nsp3 | 1.438 | 367
C-N (amide) | 1.345 | 719
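
As a small worked example of the harmonic bond term, the sketch below evaluates E_s(d) = ½ kd (d − d0)² with the tabulated values. The 0.05 Å stretch and the names `BOND_PARAMS`, `bond_energy`, and `STRETCH` are illustrative choices, not part of the slides.

```python
# (d0 [Å], kd [kcal/(mol Å^2)]) copied from the table of typical values.
BOND_PARAMS = {
    "Csp3-Csp3": (1.523, 317.0),
    "Csp3-Csp2": (1.497, 317.0),
    "Csp2=Csp2": (1.337, 690.0),
    "Csp2=O": (1.208, 777.0),
    "Csp2-Nsp3": (1.438, 367.0),
    "C-N (amide)": (1.345, 719.0),
}


def bond_energy(bond_type, d):
    """Harmonic bond-stretching energy in kcal/mol for a bond of length d (Å)."""
    d0, kd = BOND_PARAMS[bond_type]
    return 0.5 * kd * (d - d0) ** 2


STRETCH = 0.05  # Å, an arbitrary illustrative distortion
for bond_type, (d0, _kd) in BOND_PARAMS.items():
    energy = bond_energy(bond_type, d0 + STRETCH)
    print(f"{bond_type:12s} stretched by {STRETCH} Å: {energy:.3f} kcal/mol")
```

Stretching a Csp3-Csp3 bond by 0.05 Å costs ½ · 317 · 0.05² ≈ 0.40 kcal/mol, while the same distortion of the stiffer amide C-N bond costs about 0.90 kcal/mol.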

Page 16: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Comparison of the actual bond-energy curve with that of the harmonic approximation

Page 17: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

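Potentials that take into account the asymmetry of the bond-energy curve

Anharmonic potential (k'_d denotes the cubic force constant):

$$E_s(d) = \frac{1}{2}k_d\,(d-d_0)^2 + \frac{1}{6}k'_d\,(d-d_0)^3$$

Morse potential (CVFF force field):

$$E_s(d) = D_e\left[1-e^{-b(d-d_e)}\right]^2$$

[Figure: E [kcal/mol] against d [Å] for the harmonic, anharmonic, and Morse potentials.]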
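
The difference between the harmonic and Morse forms can be checked numerically. The sketch below assumes the standard Morse expression D_e[1 − exp(−b(d − d_e))]² and chooses b so that both curves have the same curvature at the minimum (k = 2 D_e b²). The well depth `WELL_DEPTH` is a hypothetical value picked for illustration; the equilibrium length and force constant are the Csp3-Csp3 values quoted earlier.

```python
import math


def harmonic(d, k, d0):
    """Harmonic bond energy 0.5 * k * (d - d0)^2."""
    return 0.5 * k * (d - d0) ** 2


def morse(d, well_depth, b, d_e):
    """Morse bond energy D_e * [1 - exp(-b * (d - d_e))]^2."""
    return well_depth * (1.0 - math.exp(-b * (d - d_e))) ** 2


def morse_b_from_k(k, well_depth):
    """Width parameter b that reproduces the harmonic curvature k at the minimum."""
    return math.sqrt(k / (2.0 * well_depth))


D0 = 1.523          # Å, Csp3-Csp3 equilibrium length
K_D = 317.0         # kcal/(mol Å^2), Csp3-Csp3 force constant
WELL_DEPTH = 88.0   # kcal/mol, hypothetical dissociation energy
B = morse_b_from_k(K_D, WELL_DEPTH)

for d in (1.3, 1.45, 1.523, 1.6, 1.8, 2.5, 4.0):
    print(f"d = {d:5.3f} Å  harmonic = {harmonic(d, K_D, D0):8.2f}  "
          f"Morse = {morse(d, WELL_DEPTH, B, D0):8.2f} kcal/mol")
```

Near the minimum the two curves coincide. On stretching, the Morse energy levels off at the well depth while the harmonic energy keeps growing, and on compression the Morse curve is steeper than on stretching, which is the asymmetry the slide refers to.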

Page 18: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

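Energy of bond-angle distortion

$$E_b(\theta) = \frac{1}{2}k_\theta\,(\theta-\theta_0)^2$$

[Figure: plot of E_b(θ) against the bond angle θ, with the minimum at θ = θ0.]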

Page 19: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Typical values of θ0 and kθ

Angle | θ0 [degrees] | kθ [kcal/(mol degree²)]
Csp3-Csp3-Csp3 | 109.47 | 0.0099
Csp3-Csp3-H | 109.47 | 0.0079
H-Csp3-H | 109.47 | 0.0070
Csp3-Csp2-Csp3 | 117.2 | 0.0099
Csp3-Csp2=Csp2 | 121.4 | 0.0121
Csp3-Csp2=O | 122.5 | 0.0101
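
The angle force constants above are given per degree squared, which makes them look deceptively small. The sketch below evaluates the harmonic angle term for an illustrative 5° distortion and converts a force constant to the per-radian-squared units used by many programs. The function and dictionary names are illustrative, not from the slides.

```python
import math

# (theta0 [degrees], k_theta [kcal/(mol degree^2)]) copied from the table above.
ANGLE_PARAMS = {
    "Csp3-Csp3-Csp3": (109.47, 0.0099),
    "Csp3-Csp3-H": (109.47, 0.0079),
    "H-Csp3-H": (109.47, 0.0070),
    "Csp3-Csp2-Csp3": (117.2, 0.0099),
    "Csp3-Csp2=Csp2": (121.4, 0.0121),
    "Csp3-Csp2=O": (122.5, 0.0101),
}


def angle_energy(angle_type, theta_deg):
    """Harmonic angle-bending energy in kcal/mol for an angle given in degrees."""
    theta0, k_theta = ANGLE_PARAMS[angle_type]
    return 0.5 * k_theta * (theta_deg - theta0) ** 2


def k_per_rad2(k_per_deg2):
    """Convert a force constant from kcal/(mol degree^2) to kcal/(mol rad^2)."""
    return k_per_deg2 * (180.0 / math.pi) ** 2


print(angle_energy("Csp3-Csp3-Csp3", 109.47 + 5.0))  # energy of a 5 degree distortion
print(k_per_rad2(0.0099))                             # same constant per rad^2
```

A 5° distortion of a Csp3-Csp3-Csp3 angle costs ½ · 0.0099 · 25 ≈ 0.12 kcal/mol, and 0.0099 kcal/(mol degree²) corresponds to roughly 32.5 kcal/(mol rad²).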

Page 20: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Basic types of torsional potentials

Single bond between sp3 carbons or between an sp3 carbon and nitrogen. Example: C-C-C-C quadruplet.

$$E_{tor} = 1.6\,(1+\cos 3\varphi)$$

Double or partially double bonds. Example: C-C(carboxyl)-C(amide)-C quadruplet.

$$E_{tor} = 20\,(1-\cos 2\varphi)$$

Single bond between electronegative atoms (oxygens, sulfurs, etc.). Example: C-S-S-C quadruplet.

$$E_{tor} = 3.5\,(1+\cos 2\varphi) + 0.6\,(1+\cos 3\varphi)$$

[Figure: plots of E_tor [kcal/mol] against the dihedral angle [deg] for each type.]
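
To see where these potentials place their minima, the sketch below scans the dihedral angle in 1° steps. It assumes the general cosine-series form V[1 + cos(nφ − γ)] and the two example functions 1.6(1 + cos 3φ) and 20(1 − cos 2φ); the phase γ, the `(V, n, gamma)` tuple layout, and the helper names are illustrative conventions, not taken from the slides.

```python
import math


def torsion_energy(phi_deg, terms):
    """Sum of V * [1 + cos(n * phi - gamma)] over (V, n, gamma_deg) terms, in kcal/mol."""
    phi = math.radians(phi_deg)
    return sum(v * (1.0 + math.cos(n * phi - math.radians(gamma)))
               for v, n, gamma in terms)


def local_minima(terms, step=1):
    """Dihedral angles (degrees, 0..359) at which the energy is a strict local minimum."""
    angles = list(range(0, 360, step))
    energies = [torsion_energy(a, terms) for a in angles]
    count = len(angles)
    return [angles[i] for i in range(count)
            if energies[i] < energies[i - 1] and energies[i] < energies[(i + 1) % count]]


# 1.6 * (1 + cos 3*phi): single bond between sp3 carbons.
SP3_SP3 = [(1.6, 3, 0.0)]
# 20 * (1 - cos 2*phi), written with a phase of 180 degrees: double bond.
DOUBLE_BOND = [(20.0, 2, 180.0)]

print("sp3-sp3 minima:", local_minima(SP3_SP3))
print("double-bond minima:", local_minima(DOUBLE_BOND))
print("sp3-sp3 barrier:", torsion_energy(0.0, SP3_SP3), "kcal/mol")
```

The threefold term gives the familiar staggered minima at 60°, 180°, and 300° with a barrier of 3.2 kcal/mol, while the twofold term keeps the quadruplet planar, with minima at 0° and 180° separated by a 40 kcal/mol barrier.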

Page 21: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

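Potentials imposed on improper torsional angles

[Figure: improper torsional angle defined by the atoms A, B, X, and X.]

$$E_{tor} = V_2\,(1-\cos 2\varphi) \qquad \text{or} \qquad E_{tor} = V_3\,(1+\cos 3\varphi)$$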

Page 22: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

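Nonbonded Lennard-Jones (6-12) potential

$$E_{nb}(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right] = \varepsilon\left[\left(\frac{r^0}{r}\right)^{12} - 2\left(\frac{r^0}{r}\right)^{6}\right], \qquad r^0 = 2^{1/6}\sigma$$

[Figure: plot of E_nb [kcal/mol] against r [Å]; the minimum of depth −ε lies at r = r0.]

Lorentz-Berthelot combining rules

$$\varepsilon_{ij} = \sqrt{\varepsilon_i\,\varepsilon_j}, \qquad r_{ij}^0 = r_i^0 + r_j^0$$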

Page 23: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Sample values of εi and r0i

Atom type | r0 [Å] | ε [kcal/mol]
C(carbonyl) | 1.85 | 0.12
C(sp3) | 1.80 | 0.06
N(sp3) | 1.85 | 0.12
O(carbonyl) | 1.60 | 0.20
H(bonded with C) | 1.00 | 0.02
S | 2.00 | 0.20
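
The sample values can be combined into pair parameters and fed into the 6-12 potential. The sketch below assumes that the first column is r0 in Å and the second is ε in kcal/mol, and that the combining rules are ε_ij = sqrt(ε_i ε_j) and r0_ij = r0_i + r0_j (the tabulated r0 values are then atomic radii). The names `ATOM_PARAMS`, `combine`, and `lj_energy` are illustrative.

```python
import math

# (r0_i [Å], eps_i [kcal/mol]) copied from the sample values above (units assumed).
ATOM_PARAMS = {
    "C(carbonyl)": (1.85, 0.12),
    "C(sp3)": (1.80, 0.06),
    "N(sp3)": (1.85, 0.12),
    "O(carbonyl)": (1.60, 0.20),
    "H(bonded with C)": (1.00, 0.02),
    "S": (2.00, 0.20),
}


def combine(type_i, type_j):
    """Lorentz-Berthelot pair parameters: summed radii and geometric-mean well depth."""
    r0_i, eps_i = ATOM_PARAMS[type_i]
    r0_j, eps_j = ATOM_PARAMS[type_j]
    return r0_i + r0_j, math.sqrt(eps_i * eps_j)


def lj_energy(r, r0, eps):
    """E_nb(r) = eps * [(r0/r)^12 - 2 * (r0/r)^6], with its minimum -eps at r = r0."""
    x6 = (r0 / r) ** 6
    return eps * (x6 * x6 - 2.0 * x6)


R0_CO, EPS_CO = combine("C(carbonyl)", "O(carbonyl)")
SIGMA_CO = R0_CO / 2.0 ** (1.0 / 6.0)  # distance at which the energy crosses zero

for r in (3.0, SIGMA_CO, R0_CO, 4.0, 6.0):
    print(f"r = {r:5.3f} Å  E_nb = {lj_energy(r, R0_CO, EPS_CO):8.4f} kcal/mol")
```

For a carbonyl C...O pair this gives r0 = 3.45 Å and ε ≈ 0.155 kcal/mol; the energy is zero at σ = r0 / 2^(1/6) ≈ 3.07 Å and reaches its minimum of −ε at r0.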

Page 24: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Other nonbonded potentials

Buckingham potential:

$$E_{nb}(r) = A\exp(-Br) - \frac{C}{r^{6}}$$

10-12 potential used in some force fields (e.g., ECEPP) for hydrogen-bonded proton…proton-acceptor pairs:

$$E_{hb}(r) = \frac{C}{r^{12}} - \frac{D}{r^{10}}$$

Page 25: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Coulombic (electrostatic) potential

Page 26: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.
Page 27: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Charge determination

• Mulliken population charges (ECEPP/3, other early force fields).

• Fitting to molecular electrostatic potentials + subsequent adjustment to reproduce potential-energy surfaces or experimental association energies, etc.

• Based on atomic electronegativities with corrections to topology and geometry (No and coworkers, J. Phys. Chem. B, 105, 3624–3634, 2001; Koca and coworkers, J. Chem. Inf. Model., 53, 2548–2558, 2013).

Page 28: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Charge determination: fitting to molecular electrostatic potential (MEP) maps

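$$V^{ab\ initio}(\mathbf{R}) = \sum_a \frac{Z_a}{|\mathbf{R}-\mathbf{R}_a|} - \int \frac{\rho_{el}(\mathbf{r})}{|\mathbf{R}-\mathbf{r}|}\,dV$$

$$V^{Coulomb}(\mathbf{R};q_1,\ldots,q_N) = \sum_{a=1}^{N} \frac{q_a}{|\mathbf{R}-\mathbf{R}_a|}$$

$$F(q_1,\ldots,q_N) = \sum_i \left[V^{ab\ initio}(\mathbf{R}_i) - V^{Coulomb}(\mathbf{R}_i;q_1,\ldots,q_N)\right]^2 \rightarrow \min, \qquad \sum_{j=1}^{N} q_j = Q$$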

Page 29: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Charge determination: fitting to molecular electrostatic potential (MEP) maps

Ab initio calculations Fitted by using CHELP-SV

Francl et al., J. Comput. Chem., 17, 367-383 (1996)
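
The fitting step itself is a linear least-squares problem with one equality constraint, which can be solved with a Lagrange multiplier. The sketch below is a minimal illustration and is not the CHELP-SV procedure: the reference potential is synthetic (generated from known point charges instead of an ab initio calculation), the grid is two hypothetical spherical shells, and units are chosen so that the potential of a charge q at distance r is simply q/r.

```python
import numpy as np


def fibonacci_sphere(n_points, radius):
    """Roughly uniform points on a sphere of the given radius centered at the origin."""
    i = np.arange(n_points) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n_points)
    azimuth = np.pi * (1.0 + 5.0 ** 0.5) * i
    return radius * np.column_stack((np.cos(azimuth) * np.sin(polar),
                                     np.sin(azimuth) * np.sin(polar),
                                     np.cos(polar)))


def coulomb_matrix(grid, sites):
    """Matrix A with A[i, a] = 1 / |R_i - R_a|, so that V_Coulomb = A @ q."""
    diff = grid[:, None, :] - sites[None, :, :]
    return 1.0 / np.linalg.norm(diff, axis=2)


def fit_charges(grid, sites, v_ref, total_charge):
    """Minimize sum_i (v_ref_i - (A q)_i)^2 subject to sum(q) = total_charge."""
    a = coulomb_matrix(grid, sites)
    n = sites.shape[0]
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = 2.0 * a.T @ a   # gradient of the squared residual
    kkt[:n, n] = 1.0              # Lagrange-multiplier column
    kkt[n, :n] = 1.0              # constraint row: sum of charges
    rhs = np.concatenate((2.0 * a.T @ v_ref, [total_charge]))
    return np.linalg.solve(kkt, rhs)[:n]


# Hypothetical water-like molecule: O at the origin and two H atoms (coordinates in Å).
SITES = np.array([[0.0, 0.0, 0.0], [0.757, 0.586, 0.0], [-0.757, 0.586, 0.0]])
TRUE_CHARGES = np.array([-0.834, 0.417, 0.417])
GRID = np.vstack((fibonacci_sphere(100, 3.0), fibonacci_sphere(100, 4.0)))
V_REF = coulomb_matrix(GRID, SITES) @ TRUE_CHARGES  # stand-in for the ab initio MEP

FITTED = fit_charges(GRID, SITES, V_REF, total_charge=0.0)
print("fitted charges:", np.round(FITTED, 4))
```

Because the synthetic reference potential is generated by point charges on the same sites, the fit recovers them essentially exactly. With a real ab initio potential the residual F would be nonzero, and the fitted charges would depend on where the grid points are placed.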

Page 30: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

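Polarizable force fields

$$U_{pol} = -\frac{1}{2}\sum_i \boldsymbol{\mu}_i^{ind}\cdot\mathbf{E}_i^0$$

$$\boldsymbol{\mu}_i^{ind} = \alpha_i\left(\mathbf{E}_i^0 + \sum_{j\neq i}\mathbf{T}_{ij}\,\boldsymbol{\mu}_j^{ind}\right)$$

$$\mathbf{T}_{ij} = \frac{1}{r_{ij}^3}\left(3\,\hat{\mathbf{r}}_{ij}\hat{\mathbf{r}}_{ij}^T - \mathbf{I}\right)$$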
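
Because each induced dipole depends on all the others, the dipoles have to be solved for self-consistently. The sketch below assumes the usual point-dipole relations, μ_i = α_i (E_i⁰ + Σ_{j≠i} T_ij μ_j) with T_ij = (3 r̂ r̂ᵀ − I)/r³ and U_pol = −½ Σ μ_i·E_i⁰, and simply iterates to convergence. The two-site system, its polarizabilities, and the field are hypothetical, and the units are arbitrary but consistent.

```python
import numpy as np


def dipole_tensor(r_vec):
    """Dipole field tensor T = (3 r_hat r_hat^T - I) / r^3 for the vector r_vec."""
    r = np.linalg.norm(r_vec)
    r_hat = r_vec / r
    return (3.0 * np.outer(r_hat, r_hat) - np.eye(3)) / r ** 3


def induced_dipoles(positions, alphas, e0, tol=1e-12, max_iter=1000):
    """Iterate mu_i = alpha_i * (E0_i + sum_{j != i} T_ij mu_j) until self-consistent."""
    n = len(positions)
    mu = alphas[:, None] * e0  # first guess: response to the permanent field only
    for _ in range(max_iter):
        field = e0.copy()
        for i in range(n):
            for j in range(n):
                if i != j:
                    field[i] += dipole_tensor(positions[i] - positions[j]) @ mu[j]
        new_mu = alphas[:, None] * field
        if np.max(np.abs(new_mu - mu)) < tol:
            return new_mu
        mu = new_mu
    raise RuntimeError("induced dipoles did not converge")


def polarization_energy(mu, e0):
    """U_pol = -1/2 * sum_i mu_i . E0_i."""
    return -0.5 * float(np.sum(mu * e0))


# Hypothetical test system: two polarizable sites on the x axis in a field along x.
POSITIONS = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
ALPHAS = np.array([1.0, 1.0])
E0 = np.array([[0.1, 0.0, 0.0], [0.1, 0.0, 0.0]])

MU = induced_dipoles(POSITIONS, ALPHAS, E0)
print("induced dipoles:\n", MU)
print("U_pol =", polarization_energy(MU, E0))
```

For this symmetric arrangement the self-consistent answer is known analytically, μ = αE⁰ / (1 − 2α/r³) = 0.108, which the iteration reproduces; the mutual polarization enhances each dipole by 8% relative to the non-interacting value of 0.1. Simple iteration can diverge for closely spaced, highly polarizable sites, so production codes use damping or direct matrix solvers.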

Page 31: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Sources of parameters

Energy contribution | Source of parameters
Bond and bond angle distortion | Crystal and neutron-diffraction data, IR spectroscopy
Torsional | NMR and FTIR spectroscopy
Nonbonded interactions | Polarizabilities, crystal and neutron-diffraction data
Electrostatic energy | Molecular electrostatic potentials
All | Energy surfaces of model systems calculated with molecular quantum mechanics

Page 32: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.
Page 33: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Class II force fields (MM3, MMFF, UFF, CFF)

Maple et al., J. Comput. Chem., 15, 162-182 (1994)

Page 34: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Maple et al., J. Comput. Chem., 15, 162-182 (1994)

Page 35: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.
Page 36: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Parameterization of class II force fields

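$$F(p_1,p_2,\ldots,p_m) = \sum_n w_n\left[E_n^{QM} - E_n(\mathbf{p})\right]^2 + \sum_n w_n^f \sum_i \left[\frac{\partial E_n^{QM}}{\partial x_i} - \frac{\partial E_n(\mathbf{p})}{\partial x_i}\right]^2 + \sum_n w_n^h \sum_{i,j}\left[\frac{\partial^2 E_n^{QM}}{\partial x_i\,\partial x_j} - \frac{\partial^2 E_n(\mathbf{p})}{\partial x_i\,\partial x_j}\right]^2$$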

Page 37: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Solvent in simulations

Explicit water

• TIP3P

• TIP4P

• TIP5P

• SPC

Implicit water

• Solvent accessible surface area (SASA) models

• Molecular surface area models

• Poisson-Boltzmann approach

• Generalized Born surface area (GBSA) model

• Polarizable continuum model (PCM)

Page 38: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

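TIP3P and TIP4P water models

Parameter | TIP3P model | TIP4P model
q(H) | 0.417 e | 0.520 e
q(O) | -0.834 e | 0.00 e
q(M) | none | -1.040 e
O–M distance | none | 0.15 Å
σO | 3.1507 Å | 3.1535 Å
εO | 0.1521 kcal/mol | 0.1550 kcal/mol

Geometry: O–H bond length 0.9572 Å, H–O–H angle 104.52°.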
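
A quick consistency check on these parameters is the molecular dipole moment they imply. The sketch below assumes that both models share the geometry given on the slide (O–H = 0.9572 Å, H–O–H = 104.52°) and that the negative charge sits on the angle bisector, at the oxygen for TIP3P and 0.15 Å away from it (site M) for TIP4P. The function name and the conversion constant are supplied here and are not from the slides.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.8032  # 1 e*Å expressed in debye


def water_dipole(q_h, r_oh, angle_deg, d_om=0.0):
    """Dipole moment (debye) of a rigid water model.

    The two hydrogens carry +q_h each; the charge -2*q_h sits on the HOH bisector
    at a distance d_om from the oxygen (d_om = 0 puts it on the oxygen itself).
    """
    h_projection = r_oh * math.cos(math.radians(angle_deg / 2.0))
    return 2.0 * q_h * (h_projection - d_om) * E_ANGSTROM_TO_DEBYE


TIP3P_DIPOLE = water_dipole(q_h=0.417, r_oh=0.9572, angle_deg=104.52)
TIP4P_DIPOLE = water_dipole(q_h=0.520, r_oh=0.9572, angle_deg=104.52, d_om=0.15)

print(f"TIP3P dipole: {TIP3P_DIPOLE:.2f} D")
print(f"TIP4P dipole: {TIP4P_DIPOLE:.2f} D")
```

Both values (about 2.35 D and 2.18 D) are well above the gas-phase dipole moment of water, roughly 1.85 D, which reflects the fact that fixed-charge water models build the average polarization of the liquid into their charges.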

Page 39: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Solvent accessible surface area (SASA) models

$$F_{solw} = \sum_{i \in atoms} \sigma_i A_i$$

σi: free energy of solvation of atom i per unit area

Ai: solvent-accessible surface area of atom i
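
The areas A_i have to be computed numerically for every conformation. The sketch below uses a simple Shrake-Rupley style point-counting scheme, which is one common way to do this and is not necessarily the algorithm used by the models cited here; the probe radius of 1.4 Å, the atomic radii, and the σ values are illustrative assumptions.

```python
import numpy as np


def sphere_points(n_points):
    """Roughly uniform points on the unit sphere (Fibonacci lattice)."""
    i = np.arange(n_points) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n_points)
    azimuth = np.pi * (1.0 + 5.0 ** 0.5) * i
    return np.column_stack((np.cos(azimuth) * np.sin(polar),
                            np.sin(azimuth) * np.sin(polar),
                            np.cos(polar)))


def sasa_per_atom(coords, radii, probe=1.4, n_points=960):
    """Solvent-accessible surface area of each atom (Å^2) by point counting."""
    unit = sphere_points(n_points)
    expanded = radii + probe  # radius of the surface traced by the probe center
    areas = np.zeros(len(coords))
    for i in range(len(coords)):
        points = coords[i] + expanded[i] * unit
        accessible = np.ones(n_points, dtype=bool)
        for j in range(len(coords)):
            if j != i:
                dist = np.linalg.norm(points - coords[j], axis=1)
                accessible &= dist >= expanded[j]  # outside atom j's expanded sphere
        areas[i] = 4.0 * np.pi * expanded[i] ** 2 * accessible.sum() / n_points
    return areas


def solvation_free_energy(areas, sigmas):
    """F_solw = sum_i sigma_i * A_i."""
    return float(np.dot(sigmas, areas))


# Hypothetical two-atom example: equal radii, centers 2 Å apart along z.
COORDS = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
RADII = np.array([1.8, 1.8])
SIGMAS = np.array([0.012, -0.060])  # kcal/(mol Å^2), made-up values

AREAS = sasa_per_atom(COORDS, RADII)
print("areas:", AREAS)
print("F_solw:", solvation_free_energy(AREAS, SIGMAS))
```

For two equal spheres of expanded radius R = 3.2 Å whose centers are d = 2 Å apart, the exposed fraction of each sphere is (1 + d/2R)/2 ≈ 0.656, which gives a convenient check on the point-counting result.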

Page 40: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Vila et al., Proteins: Structure, Function, and Genetics, 1991, 10, 199-218.

Page 41: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Comparison of the lowest-energy conformations of [Met5]enkephalin (H-Tyr-Gly-Gly-Phe-Met-OH) obtained with the ECEPP/3 force field in vacuo and with the SRFOPT model

vacuum SRFOPT

Page 42: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

vacuum SRFOPT

Comparison of the molecular surfaces of the lowest-energy conformation of [Met5]enkephalin obtained without and with the SRFOPT model

Page 43: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Molecular surface area model

$$F_{cav} = \gamma A$$

γ: surface tension

A: molecular surface area

Page 44: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

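Generalized Born molecular surface (GBSA) model

$$F_{solw} = F_{cav} + E_{pol}^{GB}$$

$$E_{pol}^{GB} = -332\left(\frac{1}{\varepsilon_{in}} - \frac{1}{\varepsilon_{out}}\right)\sum_{i<j}\frac{q_i q_j}{f_{GB}(r_{ij})}$$

$$f_{GB}(r_{ij}) = \sqrt{r_{ij}^2 + R_i R_j \exp\left(-\frac{r_{ij}^2}{4 R_i R_j}\right)}$$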
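
The role of the interpolating function f_GB is easiest to see from its two limits. The sketch below assumes the usual form f_GB = sqrt(r² + R_i R_j exp(−r²/(4 R_i R_j))) and a pair term −332 (1/ε_in − 1/ε_out) q_i q_j / f_GB in kcal/mol, with charges in e and distances in Å. The Born radii, the dielectric constants (1 and 80), and the function names are illustrative assumptions.

```python
import math

COULOMB_CONSTANT = 332.0  # kcal Å/(mol e^2)


def f_gb(r_ij, born_i, born_j):
    """Generalized Born interpolating function for two atoms with Born radii R_i, R_j."""
    product = born_i * born_j
    return math.sqrt(r_ij ** 2 + product * math.exp(-r_ij ** 2 / (4.0 * product)))


def gb_pair_energy(q_i, q_j, r_ij, born_i, born_j, eps_in=1.0, eps_out=80.0):
    """Pairwise GB polarization term in kcal/mol."""
    prefactor = -COULOMB_CONSTANT * (1.0 / eps_in - 1.0 / eps_out)
    return prefactor * q_i * q_j / f_gb(r_ij, born_i, born_j)


def coulomb_energy(q_i, q_j, r_ij, eps=1.0):
    """Coulomb energy of two point charges in a uniform dielectric, kcal/mol."""
    return COULOMB_CONSTANT * q_i * q_j / (eps * r_ij)


BORN_RADIUS = 2.0  # Å, hypothetical
for r in (0.0, 1.0, 2.0, 5.0, 50.0):
    print(f"r = {r:5.1f} Å  f_GB = {f_gb(r, BORN_RADIUS, BORN_RADIUS):7.3f} Å")

# At large separation, Coulomb (eps_in) plus the GB term equals Coulomb screened by eps_out.
far = 50.0
total = coulomb_energy(1.0, -1.0, far) + gb_pair_energy(1.0, -1.0, far, BORN_RADIUS, BORN_RADIUS)
print(total, coulomb_energy(1.0, -1.0, far, eps=80.0))
```

At r = 0 the function reduces to the Born radius, and at large r it approaches r, so the pair term turns the vacuum Coulomb interaction into one screened by the solvent dielectric constant.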

Page 45: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Protein structure calculation/prediction and folding simulations

• Single energy minimization (wishful thinking at the early stage of force-field development).

• Global optimization of the PES (ignores conformational entropy).

• Molecular dynamics/Monte Carlo (take entropy into account but are slow and liable to non-convergence).

• Generalized ensemble sampling (MREMD).

Page 46: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Force field validation

Page 47: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Structure of gramicidin S predicted by using the build-up procedure with energy minimization with the ECEPP/3 force field (M. Dygert, N. Go, H.A. Scheraga, Macromolecules, 8, 750-761 (1975)). The structure turned out to be effectively identical with the NMR structure determined later.

Page 48: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Superposition of the native fold (cyan) and the conformation (red) with the lowest Cα RMSD (2.85 Å) from the native fold

Energy-RMSD diagram

Global optimization of the energy surface of the N-terminal portion of the B-domain of staphylococcal protein A with all-atom ECEPP/3 force field + SRFOPT mean-field solvation model (Vila et al., PNAS, 2003, 100, 14812–14816)

Page 49: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

First successful folding simulation of a globular protein by molecular dynamics

Duan and Kollman, Science, 282, 5389, 740-744 (1998)

Page 50: THEORETICAL METHODS TO STUDY PROTEIN FOLDING : EMPIRICAL FORCE FIELDS.

Folding proteins at x-ray resolution using the specially designed ANTON machine (x-ray structure: blue; last frame of the MD simulation: red): villin headpiece (left), 88 μs of simulation; WW domain (right), 58 μs of simulation. Good symplectic algorithm; up to 20 fs time step. D.E. Shaw et al., Science, 2010, 330, 341-346