THEORETICAL METHODS TO STUDY PROTEIN FOLDING: EMPIRICAL FORCE FIELDS
[Figure: description levels. Individual components: fully detailed, atomistically detailed, and coarse-grained descriptions (QM, QM/MM, all-atom, united-atom, residue level, molecule/domain level), obtained by averaging over "less important" degrees of freedom. System level (networks): PDEs to describe reaction/diffusion and network graphs, obtained by averaging over individual components.]
Anfinsen’s thermodynamic hypothesis. “The studies on the renaturation of fully denatured ribonuclease required many supporting investigations to establish, finally, the generality which we have occasionally called the ‘thermodynamic hypothesis’. This hypothesis states that the three-dimensional structure of a native protein in its normal physiological milieu (solvent, pH, ionic strength, presence of other components such as metal ions or prosthetic groups, temperature and other) is the one in which the Gibbs free energy of the whole system is lowest; that is, the native conformation is determined by the totality of interatomic interactions and hence by the amino acid sequence in a given environment.”
C.B. Anfinsen, Science, 181, 223-230, 1973.
To facilitate the implementation of this hypothesis in protein-structure prediction, “free energy” was replaced with “potential energy”.
Nature (and a canonical simulation) finds the basin with the lowest free energy at a given temperature; this basin may, but does not have to, contain the conformation with the lowest potential energy.
Global-optimization methods are designed to find the structure with the lowest potential energy and thus ignore conformational entropy. Technically, this corresponds to a canonical simulation at 0 K.
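The difference between the two criteria can be illustrated with a toy two-basin model. The sketch below is only an illustration: the basin energies and the numbers of conformations are hypothetical, and each basin is assumed to consist of g isoenergetic conformations, so that F = E - k_B T ln g.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)

# Hypothetical basins: (potential energy in kcal/mol, number of conformations)
BASINS = {"narrow": (-10.0, 1.0), "broad": (-8.0, 1000.0)}

def basin_free_energy(energy, degeneracy, temperature):
    """F = E - k_B T ln(g) for a basin of g isoenergetic conformations."""
    return energy - K_B * temperature * math.log(degeneracy)

def most_stable(temperature):
    """Name of the basin with the lowest free energy at the given temperature."""
    return min(BASINS, key=lambda name: basin_free_energy(*BASINS[name], temperature))

for t in (1.0, 300.0):
    print(t, most_stable(t))
```

Close to 0 K the narrow basin with the lowest potential energy wins; at 300 K the entropic term favors the broad basin, which is what a canonical simulation would populate.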
“Potential energy” or “free energy”?
The stability of the structures of biological macromolecules results from the special structure of their energy landscapes, which can be termed “minimally frustrated” or “funnel-like”. A good analogy is the pit dug by an antlion larva.
Theoretical studies of protein structure and protein folding
• Need to express the energy of a system as a function of its coordinates
• Need an algorithm to explore the conformational space
From Schrödinger equation to analytical all-atom potentials
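$$\hat{H}\,\Psi(\mathbf{R}_1,\mathbf{R}_2,\ldots,\mathbf{R}_N;\mathbf{r}_1,\mathbf{r}_2,\ldots,\mathbf{r}_n) = E\,\Psi(\mathbf{R}_1,\mathbf{R}_2,\ldots,\mathbf{R}_N;\mathbf{r}_1,\mathbf{r}_2,\ldots,\mathbf{r}_n)$$

$$\hat{H} = -\sum_a \frac{1}{2m_a}\nabla_a^2 + \sum_{a<b}\frac{Z_a Z_b}{r_{ab}} - \frac{1}{2}\sum_i \nabla_i^2 - \sum_a\sum_i\frac{Z_a}{r_{ai}} + \sum_{i<j}\frac{1}{r_{ij}} = \hat{H}_N + \hat{H}_{el}$$

(atomic units; a, b label nuclei and i, j label electrons)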
The Born-Oppenheimer approximation
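$$\Psi(\mathbf{R}_1,\ldots,\mathbf{R}_N;\mathbf{r}_1,\ldots,\mathbf{r}_n) = \Psi_N(\mathbf{R}_1,\ldots,\mathbf{R}_N)\,\Psi_{el}(\mathbf{r}_1,\ldots,\mathbf{r}_n;\mathbf{R}_1,\ldots,\mathbf{R}_N)$$

$$\hat{H}_{el}\Psi_{el} = E_{el}\Psi_{el}, \qquad \hat{H}_{el} = -\frac{1}{2}\sum_i \nabla_i^2 - \sum_a\sum_i\frac{Z_a}{r_{ai}} + \sum_{i<j}\frac{1}{r_{ij}}$$

$$E_{NN} = \sum_{a<b}\frac{Z_a Z_b}{r_{ab}}$$

$$E(\mathbf{R}_1,\ldots,\mathbf{R}_N) = E_{el}(\mathbf{R}_1,\ldots,\mathbf{R}_N) + E_{NN}(\mathbf{R}_1,\ldots,\mathbf{R}_N)$$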
What is a force field? A set of formulas (usually explicit) and parameters to express the conformational energy of a given class of molecules as a function of coordinates (Cartesian, internal, etc.) that define the geometry of a molecule or a molecular system.
Features:
• Cheap
• Fast
• Easy to program
• Restricted to conformational analysis
• Non-transferable
• Results sometimes unreliable
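$$E = \sum_{bonds}\frac{1}{2}k_i^d\left(d_i-d_i^0\right)^2 + \sum_{angles}\frac{1}{2}k_i^\theta\left(\theta_i-\theta_i^0\right)^2 + \sum_{dihedral\ angles}\sum_n \frac{V_n}{2}\left[1+\cos(n\phi)\right] + \sum_{i<j}\frac{q_i q_j}{r_{ij}} + \sum_{i<j}\varepsilon_{ij}\left[\left(\frac{r_{ij}^0}{r_{ij}}\right)^{12} - 2\left(\frac{r_{ij}^0}{r_{ij}}\right)^{6}\right]$$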
All-atom empirical force fields: a very simplified representation of the potential energy surfaces
Class I force fields
Multiplication of atom types in empirical force fields
Name | Potential type | References | URL
AMBER/OPLS | all-atom, united-atom | Weiner et al., 1984; 1986; Cornell et al., 1995; Jorgensen et al., 1996 | http://ambermd.org/
CHARMM | all-atom | Brooks et al., 1983; MacKerell et al., 1998; 2001 | http://www.charmm.org/
GROMOS | all-atom | van Gunsteren & Berendsen, 1987; Scott et al., 1999 | http://www.gromos.net/
ECEPP/3 | all-atom; rigid valence geometry | Nemethy et al., 1995; Ripoll et al., 1995 | http://cbsu.tc.cornell.edu/software/eceppak/ ; http://www.icm.edu.pl/kdm/ECEPPAK
DISCOVER (CVFF) | all-atom | Dauber-Osguthorpe, 1988; Maple et al., 1998 |
Force fields commonly used for protein simulations
$$E_s(d) = \frac{1}{2}k_d\left(d-d_0\right)^2$$

[Figure: E_s(d) plotted against the bond length d; the minimum is at d = d0.]
Bond distortion energy
Typical values of d0 and kd

Bond | d0 [Å] | kd [kcal/(mol Å²)]
Csp3-Csp3 | 1.523 | 317
Csp3-Csp2 | 1.497 | 317
Csp2=Csp2 | 1.337 | 690
Csp2=O | 1.208 | 777
Csp2-Nsp3 | 1.438 | 367
C-N (amide) | 1.345 | 719
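A minimal sketch of how the tabulated values enter the harmonic bond term E_s(d) = (1/2) kd (d - d0)^2. The dictionary and function names are illustrative and do not come from any force-field package.

```python
# (d0 in Å, kd in kcal/(mol Å^2)), copied from the table of typical values
BOND_PARAMS = {
    "Csp3-Csp3": (1.523, 317.0),
    "Csp3-Csp2": (1.497, 317.0),
    "Csp2=Csp2": (1.337, 690.0),
    "Csp2=O": (1.208, 777.0),
    "Csp2-Nsp3": (1.438, 367.0),
    "C-N (amide)": (1.345, 719.0),
}

def bond_energy(bond, d):
    """Harmonic bond-distortion energy in kcal/mol for bond length d in Å."""
    d0, kd = BOND_PARAMS[bond]
    return 0.5 * kd * (d - d0) ** 2

# Energy cost of stretching each bond by 0.1 Å
for bond, (d0, _) in BOND_PARAMS.items():
    print(f"{bond:12s} {bond_energy(bond, d0 + 0.1):6.2f} kcal/mol")
```

Stretching a Csp3-Csp3 bond by 0.1 Å costs about 1.6 kcal/mol with these values, while the same distortion of the carbonyl bond costs more than twice as much.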
Comparison of the actual bond-energy curve with that of the harmonic approximation
Anharmonic potential:

$$E_s(d) = \frac{1}{2}k_d\left(d-d_0\right)^2 + \frac{1}{6}k'_d\left(d-d_0\right)^3$$

Morse potential (CVFF force field):

$$E_s(d) = D_e\left[1-e^{-b(d-d_e)}\right]^2$$

Potentials that take into account the asymmetry of the bond-energy curve

[Figure: E [kcal/mol] versus d [Å] for the harmonic, anharmonic, and Morse potentials.]
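The asymmetry can be checked numerically. The sketch below compares the harmonic potential (1/2) kd (d - d0)^2 with a Morse potential De [1 - exp(-b (d - d0))]^2 whose curvature at the minimum is matched through kd = 2 De b^2. The well depth De is a hypothetical value chosen only for illustration; kd and d0 are the Csp3-Csp3 values from the table of typical values.

```python
import math

K_D = 317.0   # kcal/(mol Å^2), Csp3-Csp3
D_0 = 1.523   # Å, equilibrium bond length
D_E = 88.0    # kcal/mol, hypothetical well depth (assumption)

# Match the second derivatives at the minimum: k_d = 2 * D_e * b**2
B = math.sqrt(K_D / (2.0 * D_E))

def harmonic(d):
    """Harmonic bond energy in kcal/mol."""
    return 0.5 * K_D * (d - D_0) ** 2

def morse(d):
    """Morse bond energy in kcal/mol; tends to D_E at large d."""
    return D_E * (1.0 - math.exp(-B * (d - D_0))) ** 2

for delta in (-0.3, -0.1, 0.1, 0.3, 0.5):
    d = D_0 + delta
    print(f"{d:6.3f} {harmonic(d):8.2f} {morse(d):8.2f}")
```

Near d0 the two curves coincide; on stretching, the Morse energy stays below the harmonic one and levels off at De, whereas compression is penalized more strongly.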
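$$E_b(\theta) = \frac{1}{2}k_\theta\left(\theta-\theta_0\right)^2$$

[Figure: E_b(θ) plotted against the bond angle θ; the minimum is at θ = θ0.]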
Energy of bond-angle distortion
Typical values of θ0 and kθ

Angle | θ0 [degrees] | kθ [kcal/(mol degree²)]
Csp3-Csp3-Csp3 | 109.47 | 0.0099
Csp3-Csp3-H | 109.47 | 0.0079
H-Csp3-H | 109.47 | 0.0070
Csp3-Csp2-Csp3 | 117.2 | 0.0099
Csp3-Csp2=Csp2 | 121.4 | 0.0121
Csp3-Csp2=O | 122.5 | 0.0101
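Because the force constants above are given per degree squared, they look deceptively small. A short sketch of the harmonic angle term (1/2) kθ (θ - θ0)^2 with the Csp3-Csp3-Csp3 values, together with the conversion of kθ to kcal/(mol rad²); the function names are illustrative.

```python
import math

K_THETA = 0.0099   # kcal/(mol degree^2), Csp3-Csp3-Csp3
THETA_0 = 109.47   # degrees

def angle_energy(theta_deg, theta0_deg=THETA_0, k_theta=K_THETA):
    """Harmonic bond-angle distortion energy in kcal/mol (angles in degrees)."""
    return 0.5 * k_theta * (theta_deg - theta0_deg) ** 2

def k_per_rad2(k_theta_deg):
    """Convert a force constant from kcal/(mol degree^2) to kcal/(mol rad^2)."""
    return k_theta_deg * (180.0 / math.pi) ** 2

print(angle_energy(THETA_0 + 5.0))   # cost of a 5-degree distortion
print(k_per_rad2(K_THETA))           # about 32.5 kcal/(mol rad^2)
```

A 5-degree distortion costs roughly 0.12 kcal/mol, which shows that bond angles are much softer than bond lengths.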
Single bond between sp3 carbons or between sp3 carbon and nitrogen
Example: C-C-C-C quadruplet
[Figure: E_tor [kcal/mol] versus dihedral angle [deg].]

$$E_{tor} = 1.6\,(1+\cos 3\phi)$$
Double or partially double bonds
Example: C-C(carboxyl)-C(amide)-C quadruplet
$$E_{tor} = 20\,(1-\cos 2\phi)$$
Single bond between electronegative atoms (oxygens, sulfurs, etc.).
Example: C-S-S-C quadruplet
$$E_{tor} = 3.5\,(1+\cos 2\phi) + 0.6\,(1+\cos 3\phi)$$
Basic types of torsional potentials
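The three types of torsional potential can be compared with a short sketch. It assumes the forms E_tor = 1.6 (1 + cos 3φ) for the C-C-C-C single bond, E_tor = 20 (1 - cos 2φ) for the amide-type bond, and E_tor = 3.5 (1 + cos 2φ) + 0.6 (1 + cos 3φ) for C-S-S-C (all in kcal/mol). The signs and the multiplicity of the last term are assumptions based on where the minima are expected to lie.

```python
import math

def torsion_energy(phi_deg, terms):
    """Sum of V * (1 + sign * cos(n * phi)) over (V, n, sign) terms, in kcal/mol."""
    phi = math.radians(phi_deg)
    return sum(v * (1.0 + sign * math.cos(n * phi)) for v, n, sign in terms)

SINGLE_BOND = [(1.6, 3, +1)]                # C-C-C-C
AMIDE_BOND = [(20.0, 2, -1)]                # partially double bond
DISULFIDE = [(3.5, 2, +1), (0.6, 3, +1)]    # C-S-S-C

def scan_minimum(terms, step=1):
    """Dihedral angle (0..180 degrees) with the lowest energy on a regular grid."""
    return min(range(0, 181, step), key=lambda phi: torsion_energy(phi, terms))

print(torsion_energy(0.0, SINGLE_BOND))    # eclipsed barrier
print(torsion_energy(90.0, AMIDE_BOND))    # barrier to rotation about the amide bond
print(scan_minimum(DISULFIDE))             # minimum close to 90 degrees
```

With these forms the single-bond barrier is 3.2 kcal/mol with minima at 60° and 180°, the amide bond is planar with a 40 kcal/mol barrier, and the disulfide prefers a dihedral angle somewhat below 90°.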
Potentials imposed on improper torsional angles
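[Figure: definition of an improper torsional angle; atom labels A, B, X, X.]

$$E_{tor} = V_2\,(1 \pm \cos 2\phi) + V_3\,(1 \pm \cos 3\phi)$$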
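$$E_{nb}(r) = \varepsilon\left[\left(\frac{r^0}{r}\right)^{12} - 2\left(\frac{r^0}{r}\right)^{6}\right] = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$$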
Nonbonded Lennard-Jones (6-12) potential
[Figure: E_nb [kcal/mol] versus r [Å]; the minimum of depth -ε is located at r = r0.]
$$\varepsilon_{ij} = \sqrt{\varepsilon_i\,\varepsilon_j}, \qquad r^0_{ij} = r^0_i + r^0_j, \qquad r^0 = 2^{1/6}\,\sigma$$
Lorentz-Berthelot combining rules
Sample values of εi and r0i

Atom type | r0 [Å] | ε [kcal/mol]
C(carbonyl) | 1.85 | 0.12
C(sp3) | 1.80 | 0.06
N(sp3) | 1.85 | 0.12
O(carbonyl) | 1.60 | 0.20
H(bonded with C) | 1.00 | 0.02
S | 2.00 | 0.20
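A minimal sketch of how the sample values are combined and used. It assumes that the tabulated r0 are per-atom radii in Å that are added for a pair, that ε is in kcal/mol and is combined as a geometric mean, and that the 6-12 potential is written either as ε[(r0/r)^12 - 2(r0/r)^6] or as 4ε[(σ/r)^12 - (σ/r)^6] with σ = r0 / 2^(1/6). The names are illustrative only.

```python
import math

# (r0 in Å, epsilon in kcal/mol), sample values from the table (units assumed)
ATOM_TYPES = {
    "C(carbonyl)": (1.85, 0.12),
    "C(sp3)": (1.80, 0.06),
    "N(sp3)": (1.85, 0.12),
    "O(carbonyl)": (1.60, 0.20),
    "H(bonded with C)": (1.00, 0.02),
    "S": (2.00, 0.20),
}

def combine(type_i, type_j):
    """Pair parameters: sum of radii and geometric mean of well depths."""
    r0_i, eps_i = ATOM_TYPES[type_i]
    r0_j, eps_j = ATOM_TYPES[type_j]
    return r0_i + r0_j, math.sqrt(eps_i * eps_j)

def lj_r0_form(r, r0, eps):
    """6-12 potential written with the position of the minimum r0."""
    return eps * ((r0 / r) ** 12 - 2.0 * (r0 / r) ** 6)

def lj_sigma_form(r, sigma, eps):
    """6-12 potential written with the zero-crossing distance sigma."""
    return 4.0 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)

r0, eps = combine("C(carbonyl)", "O(carbonyl)")
sigma = r0 / 2.0 ** (1.0 / 6.0)
print(r0, eps, lj_r0_form(r0, r0, eps), lj_sigma_form(4.0, sigma, eps))
```

For the carbonyl C...O pair this gives r0 = 3.45 Å and ε of about 0.155 kcal/mol; the energy at r0 equals -ε, and both forms of the potential return the same value at any distance.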
$$E_{nb}(r) = A\exp(-Br) - \frac{C}{r^{6}}$$

$$E_{hb}(r) = \frac{C}{r^{12}} - \frac{D}{r^{10}}$$
Other nonbonded potentials
Buckingham potential
10-12 potential used in some force fields (e.g., ECEPP) for hydrogen-bonded proton…proton-acceptor pairs
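Both functional forms are easy to explore numerically. The sketch below assumes the Buckingham potential in the form A exp(-B r) - C/r^6 and the 10-12 potential in the form C/r^12 - D/r^10. The helper that expresses C and D through a well depth ε and a minimum position r0 (C = 5 ε r0^12, D = 6 ε r0^10) follows from setting the derivative to zero; the numerical parameters are hypothetical.

```python
import math

def buckingham(r, a, b, c):
    """Buckingham (exp-6) potential: exponential repulsion, r**-6 attraction."""
    return a * math.exp(-b * r) - c / r ** 6

def hbond_10_12(r, c, d):
    """10-12 potential: r**-12 repulsion, r**-10 attraction."""
    return c / r ** 12 - d / r ** 10

def coefficients_10_12(eps, r0):
    """C and D that place a minimum of depth eps at distance r0."""
    return 5.0 * eps * r0 ** 12, 6.0 * eps * r0 ** 10

# Hypothetical parameters: 5 kcal/mol well at 1.9 Å
c, d = coefficients_10_12(5.0, 1.9)
for r in (1.7, 1.9, 2.1, 2.5):
    print(f"{r:4.1f} {hbond_10_12(r, c, d):8.3f}")
```

The 10-12 well is narrower than a 6-12 well of the same depth, which is why it suits short, directional contacts. The Buckingham form should not be evaluated at very short distances, because the r^-6 term eventually outgrows the finite exponential repulsion.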
Coulombic (electrostatic) potential
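As a sketch, the Coulombic term for a pair of point charges can be written as E = 332 q_i q_j / (D r_ij), where the factor of about 332 converts e²/Å to kcal/mol. The dielectric constant D and the function name are illustrative assumptions.

```python
COULOMB_CONSTANT = 332.06  # kcal Å/(mol e^2), conversion factor (approximate)

def coulomb_energy(q_i, q_j, r, dielectric=1.0):
    """Electrostatic energy in kcal/mol for charges in e and distance in Å."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)

print(coulomb_energy(1.0, -1.0, 3.0))          # ion pair in vacuo
print(coulomb_energy(1.0, -1.0, 3.0, 80.0))    # same pair, water-like dielectric
```

An ion pair at 3 Å is stabilized by about 111 kcal/mol in vacuo but only by about 1.4 kcal/mol when screened with a dielectric constant of 80, which is why the treatment of solvent matters so much.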
Charge determination
• Mulliken population charges (ECEPP/3, other early force fields).
• Fitting to molecular electrostatic potentials + subsequent adjustment to reproduce potential-energy surfaces or experimental association energies, etc.
• Based on atomic electronegativities with corrections to topology and geometry (No and coworkers, J. Phys. Chem. B, 105, 3624–3634, 2001; Koca and coworkers, J. Chem. Inf. Model., 53, 2548–2558, 2013).
Charge determination: fitting to molecular electrostatic potential (MEP) maps
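$$V^{ab\ initio}(\mathbf{r}) = \sum_{a=1}^{N}\frac{Z_a}{|\mathbf{r}-\mathbf{R}_a|} - \int\frac{\rho_{el}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,dV'$$

$$V^{Coulomb}(\mathbf{r};q_1,\ldots,q_N) = \sum_{a=1}^{N}\frac{q_a}{|\mathbf{r}-\mathbf{R}_a|}$$

$$F(q_1,\ldots,q_N) = \sum_{i}\left[V^{ab\ initio}(\mathbf{r}_i) - V^{Coulomb}(\mathbf{r}_i;q_1,\ldots,q_N)\right]^2 \to \min$$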
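The fit is a linear least-squares problem in the charges. The sketch below minimizes the sum of squared differences between a reference potential on grid points and the point-charge potential Σ q_a / |r - R_a|. The total-charge constraint, imposed with a Lagrange multiplier, is an assumption added here because it is common practice; the geometry and grid in the usage example are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def coulomb_potential(atoms, charges, point):
    """Potential of point charges at a grid point."""
    return sum(q / math.dist(point, pos) for q, pos in zip(charges, atoms))

def fit_charges(atoms, grid, reference, total_charge):
    """Least-squares charges reproducing `reference` with sum(q) = total_charge."""
    n = len(atoms)
    inv_r = [[1.0 / math.dist(p, pos) for pos in atoms] for p in grid]
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for row, v in zip(inv_r, reference):      # normal equations
        for i in range(n):
            b[i] += row[i] * v
            for k in range(n):
                a[i][k] += row[i] * row[k]
    for i in range(n):                        # Lagrange-multiplier row and column
        a[i][n] = a[n][i] = 1.0
    b[n] = total_charge
    return solve_linear(a, b)[:n]

# Usage: recover known charges of a water-like molecule from its own potential
atoms = [(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)]
grid = [(x, y, z) for x in (-2.0, 2.0) for y in (-2.0, 2.0) for z in (-2.0, 2.0)]
grid += [(2.5, 0.0, 0.0), (-2.5, 0.0, 0.0), (0.0, 2.5, 0.0), (0.0, -2.5, 0.0)]
reference = [coulomb_potential(atoms, [-0.834, 0.417, 0.417], p) for p in grid]
print(fit_charges(atoms, grid, reference, 0.0))
```

In a real application the reference values would come from an ab initio calculation, and the grid points would be placed outside the van der Waals surface of the molecule.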
Charge determination: fitting to molecular electrostatic potential (MEP) maps
[Figure: MEP maps. Panels: ab initio calculations; fitted by using CHELP-SV.]
Francl et al., J. Comput. Chem., 17, 367-383 (1996)
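$$\boldsymbol{\mu}_i^{ind} = \alpha_i\left(\mathbf{E}_i^0 - \sum_{j\neq i}\mathbf{T}_{ij}\,\boldsymbol{\mu}_j^{ind}\right)$$

$$U_{pol} = -\frac{1}{2}\sum_i \boldsymbol{\mu}_i^{ind}\cdot\mathbf{E}_i^0$$

$$\mathbf{T}_{ij} = \frac{1}{r_{ij}^3}\left(\mathbf{I} - 3\,\hat{\mathbf{r}}_{ij}\hat{\mathbf{r}}_{ij}^{T}\right)$$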
Polarizable force fields
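A minimal sketch of the self-consistent induced-dipole scheme behind polarizable force fields. It assumes point polarizabilities α_i, the relation μ_i = α_i (E_i^0 - Σ_j T_ij μ_j) with T_ij = (I - 3 r̂ r̂ᵀ)/r^3, and the polarization energy U_pol = -(1/2) Σ_i μ_i · E_i^0. Units are arbitrary, no damping of short-range interactions is included, and the function names are illustrative.

```python
import math

def induced_dipoles(positions, alphas, fields, tol=1e-10, max_iter=200):
    """Iterate mu_i = alpha_i * (E0_i + field of the other induced dipoles)."""
    n = len(positions)
    mu = [[a * e for e in f] for a, f in zip(alphas, fields)]
    for _ in range(max_iter):
        new_mu = []
        for i in range(n):
            total = list(fields[i])
            for j in range(n):
                if j == i:
                    continue
                rij = [positions[i][k] - positions[j][k] for k in range(3)]
                r = math.sqrt(sum(c * c for c in rij))
                unit = [c / r for c in rij]
                proj = sum(u * m for u, m in zip(unit, mu[j]))
                # -T_ij mu_j = (3 r^ (r^ . mu_j) - mu_j) / r^3
                for k in range(3):
                    total[k] += (3.0 * unit[k] * proj - mu[j][k]) / r ** 3
            new_mu.append([alphas[i] * t for t in total])
        change = max(abs(a - b) for v, w in zip(new_mu, mu) for a, b in zip(v, w))
        mu = new_mu
        if change < tol:
            break
    return mu

def polarization_energy(mu, fields):
    """U_pol = -1/2 * sum_i mu_i . E0_i at self-consistency."""
    return -0.5 * sum(m * e for v, f in zip(mu, fields) for m, e in zip(v, f))

# Two polarizable sites on the x axis in a uniform field along x
sites = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
field = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
dipoles = induced_dipoles(sites, [1.0, 1.0], field)
print(dipoles, polarization_energy(dipoles, field))
```

For a field along the interatomic axis the dipoles reinforce each other (μ = αE/(1 - 2α/r^3)); for a perpendicular field they partially cancel (μ = αE/(1 + α/r^3)). This mutual dependence is what makes polarizable force fields more expensive than fixed-charge ones.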
Energy contribution | Source of parameters
Bond and bond-angle distortion | Crystal and neutron-diffraction data, IR spectroscopy
Torsional | NMR and FTIR spectroscopy
Nonbonded interactions | Polarizabilities, crystal and neutron-diffraction data
Electrostatic energy | Molecular electrostatic potentials
All | Energy surfaces of model systems calculated with molecular quantum mechanics
Sources of parameters
Class II force fields (MM3, MMFF, UFF, CFF)
Maple et al., J. Comput. Chem., 15, 162-182 (1994)
Parameterization of class II force fields
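$$F(p_1,p_2,\ldots,p_m) = \sum_n w_n^{E}\left(E_n - E_n^{QM}\right)^2 + \sum_n w_n^{f}\sum_i\left(\frac{\partial E_n}{\partial x_i} - \frac{\partial E_n^{QM}}{\partial x_i}\right)^2 + \sum_n w_n^{h}\sum_i\sum_{j\le i}\left(\frac{\partial^2 E_n}{\partial x_i\,\partial x_j} - \frac{\partial^2 E_n^{QM}}{\partial x_i\,\partial x_j}\right)^2$$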
Solvent in simulations
Explicit water
• TIP3P
• TIP4P
• TIP5P
• SPC
Implicit water
• Solvent accessible surface area (SASA) models
• Molecular surface area models
• Poisson-Boltzmann approach
• Generalized Born surface area (GBSA) model
• Polarizable continuum model (PCM)
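[Figure: geometry and partial charges of the TIP3P and TIP4P water models.]

Parameter | TIP3P | TIP4P
q(H) | 0.417 e | 0.520 e
q(O) | -0.834 e | 0.00 e
q(M) | (no M site) | -1.040 e
O-M distance | (no M site) | 0.15 Å
σO | 3.1507 Å | 3.1535 Å
εO | 0.1521 kcal/mol | 0.1550 kcal/mol

Geometry shown in the figure: O-H bond length 0.9572 Å, H-O-H angle 104.52°.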
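As a quick consistency check of the TIP3P parameters (q(H) = 0.417 e, q(O) = -0.834 e, O-H bond length 0.9572 Å, H-O-H angle 104.52°), the sketch below builds the three sites and evaluates the dipole moment of the model. The conversion factor 1 e·Å = 4.8032 D is a standard value; the function names are illustrative.

```python
import math

Q_H = 0.417          # e
R_OH = 0.9572        # Å
ANGLE_HOH = 104.52   # degrees
E_ANGSTROM_TO_DEBYE = 4.8032

def tip3p_sites():
    """(charge, position) of O, H1, H2 with O at the origin, molecule in the xy plane."""
    half = math.radians(ANGLE_HOH / 2.0)
    h1 = (R_OH * math.sin(half), R_OH * math.cos(half), 0.0)
    h2 = (-R_OH * math.sin(half), R_OH * math.cos(half), 0.0)
    return [(-2.0 * Q_H, (0.0, 0.0, 0.0)), (Q_H, h1), (Q_H, h2)]

def dipole_debye(sites):
    """Magnitude of the dipole moment of a set of point charges, in debye."""
    comps = [sum(q * pos[k] for q, pos in sites) for k in range(3)]
    return E_ANGSTROM_TO_DEBYE * math.sqrt(sum(c * c for c in comps))

print(dipole_debye(tip3p_sites()))
```

The result is about 2.35 D, noticeably larger than the gas-phase dipole moment of water (about 1.85 D): fixed-charge water models build the average polarization of the liquid into their charges.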
Solvent accessible surface area (SASA) models
$$F_{solw} = \sum_{i\,\in\,atoms}\sigma_i A_i$$

σi: free energy of solvation of atom i per unit area
Ai: solvent-accessible surface area of atom i
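A minimal sketch of a SASA-type solvation term F = Σ σ_i A_i. The atomic areas are estimated numerically by placing test points on each solvent-expanded sphere and counting those not buried by other atoms (a Shrake-Rupley-type scheme, used here as an illustration and not necessarily the algorithm of the cited work). The probe radius, the radii, and the σ values in the usage example are hypothetical.

```python
import math

def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        rho = math.sqrt(1.0 - z * z)
        points.append((rho * math.cos(golden * k), rho * math.sin(golden * k), z))
    return points

def atomic_sasa(coords, radii, probe=1.4, n_points=200):
    """Solvent-accessible surface area of every atom, in Å^2."""
    unit = sphere_points(n_points)
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, radii)):
        big_r = ri + probe
        exposed = 0
        for u in unit:
            p = tuple(ci[k] + big_r * u[k] for k in range(3))
            buried = any(
                math.dist(p, cj) < rj + probe
                for j, (cj, rj) in enumerate(zip(coords, radii)) if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas

def solvation_free_energy(areas, sigmas):
    """F = sum_i sigma_i * A_i."""
    return sum(s * a for s, a in zip(sigmas, areas))

# Two overlapping atoms; sigma values in kcal/(mol Å^2) are hypothetical
areas = atomic_sasa([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], [1.8, 1.8])
print(areas, solvation_free_energy(areas, [0.012, -0.060]))
```

Because each atom's area depends on the positions of its neighbors, the solvation term changes with conformation and can shift the location of the lowest-energy structure.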
Vila et al., Proteins: Structure, Function, and Genetics, 1991, 10, 199-218.
Comparison of the lowest-energy conformations of [Met5]enkephalin (H-Tyr-Gly-Gly-Phe-Met-OH) obtained with the ECEPP/3 force field in vacuo and with the SRFOPT model
[Figure panels: vacuum; SRFOPT]
[Figure panels: vacuum; SRFOPT]
Comparison of the molecular surfaces of the lowest-energy conformations of [Met5]enkephalin obtained without and with the SRFOPT model
Molecular surface area model

$$F_{cav} = \gamma A$$

γ: surface tension
A: molecular surface area
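$$F_{solw} = F_{cav} + E_{pol}^{GB}$$

$$E_{pol}^{GB} = -\frac{332}{2}\left(\frac{1}{\varepsilon_{in}} - \frac{1}{\varepsilon_{out}}\right)\sum_{i,j}\frac{q_i q_j}{f_{GB}(r_{ij})}$$

$$f_{GB}(r_{ij}) = \sqrt{r_{ij}^2 + R_i R_j \exp\left(-\frac{r_{ij}^2}{4 R_i R_j}\right)}$$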
Generalized Born surface area (GBSA) model
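A minimal sketch of the generalized Born polarization term. It assumes the interpolation function f_GB(r) = sqrt(r² + R_i R_j exp(-r² / (4 R_i R_j))) and the energy E = -(332/2) (1/ε_in - 1/ε_out) Σ_{i,j} q_i q_j / f_GB(r_ij), with the double sum including the i = j self terms. The Born radii R_i are taken as given inputs; computing them is the expensive part of a real implementation and is not shown.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal Å/(mol e^2), approximate

def f_gb(r_ij, born_i, born_j):
    """Interpolates between the Born radius (r = 0) and the distance (large r)."""
    product = born_i * born_j
    return math.sqrt(r_ij ** 2 + product * math.exp(-r_ij ** 2 / (4.0 * product)))

def gb_polarization_energy(coords, charges, born_radii, eps_in=1.0, eps_out=80.0):
    """Generalized Born polarization energy in kcal/mol (self terms included)."""
    prefactor = -0.5 * COULOMB_CONSTANT * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    for i in range(len(coords)):
        for j in range(len(coords)):
            r = math.dist(coords[i], coords[j])
            total += charges[i] * charges[j] / f_gb(r, born_radii[i], born_radii[j])
    return prefactor * total

# Single ion of radius 2 Å: the expression reduces to the Born formula
print(gb_polarization_energy([(0.0, 0.0, 0.0)], [1.0], [2.0]))
```

For a single ion the sum contains only the self term and gives the Born solvation energy; for well-separated charges f_GB tends to r_ij, so the pair terms become a screened Coulomb interaction.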
Protein structure calculation/prediction and folding simulations
• Single energy minimization (wishful thinking at the early stage of force-field development).
• Global optimization of the PES (ignores conformational entropy).
• Molecular dynamics/Monte Carlo (take entropy into account, but are slow and liable to non-convergence).
• Generalized ensemble sampling (MREMD).
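Generalized-ensemble methods of the replica-exchange family run copies of the system at several temperatures and periodically attempt to swap conformations between them. As a sketch, the standard Metropolis criterion for swapping replicas i and j is P = min(1, exp[(β_i - β_j)(E_i - E_j)]) with β = 1/(k_B T). The code below shows only this acceptance step of plain temperature replica exchange, not a full MREMD protocol; the energies and temperatures are hypothetical.

```python
import math

K_B = 0.0019872  # kcal/(mol K)

def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis probability of swapping the conformations of two replicas."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

# The cold replica holds the higher-energy conformation: always swap
print(exchange_probability(-90.0, 300.0, -100.0, 350.0))
# The cold replica already holds the lower-energy conformation: swap rarely
print(exchange_probability(-100.0, 300.0, -90.0, 350.0))
```

The swaps let conformations trapped in local minima at low temperature escape through the high-temperature replicas, which addresses the convergence problem of plain molecular dynamics or Monte Carlo.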
Force field validation
Structure of gramicidin S predicted by using the build-up procedure with energy minimization with the ECEPP/3 force field (M. Dygert, N. Go, H.A. Scheraga, Macromolecules, 8, 750-761 (1975)). The structure turned out to be effectively identical with the NMR structure determined later.
Superposition of the native fold (cyan) and the conformation (red) with the lowest Cα RMSD (2.85 Å) from the native fold
Energy-RMSD diagram
Global optimization of the energy surface of the N-terminal portion of the B-domain of staphylococcal protein A with all-atom ECEPP/3 force field + SRFOPT mean-field solvation model (Vila et al., PNAS, 2003, 100, 14812–14816)
First successful folding simulation of a globular protein by molecular dynamics
Duan and Kollman, Science, 282, 5389, 740-744 (1998)
Folding proteins at x-ray resolution using the specially designed ANTON machine (x-ray structure: blue; last frame of the MD simulation: red): villin headpiece (left), 88 ns of simulation; WW domain (right), 58 μs of simulation. Good symplectic algorithm; up to 20 fs time step.
D.E. Shaw et al., Science, 2010, 330, 341-346