Reaction coordinates and LSDMap
The challenges in molecular biophysics: a broad range of interconnected length and time scales
Quantum physics: ~1 atom, ~1 Å (atom)
Quantum chemistry: ~10^1 atoms, ~1-10 Å (molecule)
Mesoscale, multiscale: ~10^3-10^4 atoms, ~1-10 nm (biomolecule, i.e. macromolecule)
Mesoscale, multiscale: ~10^4-10^5 atoms, ~10-100 nm (system)
Thermodynamics, mesoscale: ~10^10 atoms, ~1-10 µm (cell, e.g. E. coli)
Thermodynamics, macroscale: ~10^20 atoms (organism)
What we study
[Figure: length scale (m) from 10^-10 to 10^-5 against number of atoms from 1 to 10^9, spanning atoms, molecules, macromolecules, systems, and cells.]
Dynamics
Function
Why do we care about dynamics?
…function requires dynamics!
Main goals and challenges
1. Force field: a set of parameters and equations describing the interactions between atoms
... ... ...
2. Sampling: can the simulations cover the rare events that we are interested in?
3. Data analysis: how do we understand the mechanism in the data?
• Trajectories in equilibrium distribution
• Observables to compare with the experiment
• Mechanism from the dynamics
• New predictions
Protein Representations
• Cartesian coordinate representation
  – x, y, z coordinates for each single atom
• Internal coordinate representation:
  – b: bond length
  – α: angle between two consecutive bonds
  – θ: dihedral (torsion) angle defined by three consecutive bonds
• Idealized geometry model
  – dihedral angles are the only degrees of freedom (DOFs)
  – 2 backbone dihedrals
  – 4 sidechain dihedrals
All-Atom Molecular Mechanics Force-Field
A biomolecule is considered a collection of masses (atoms) connected by “springs” (bonds).
The associated effective energy (including electronic effects) is parameterized in a classical force-field.
Generally, a force-field is in the form:
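A commonly used all-atom functional form is sketched below. It is an illustrative assumption, not necessarily the expression on the original slide: the force constants k_b, k_α, k_θ, the reference values b_0, α_0, the multiplicity n, the phase δ, and the Lennard-Jones and charge parameters are generic placeholders, while b, α, and θ are the internal coordinates listed above.

```latex
V(\mathbf{r}) =
    \sum_{\text{bonds}} k_b \,(b - b_0)^2
  + \sum_{\text{angles}} k_\alpha \,(\alpha - \alpha_0)^2
  + \sum_{\text{dihedrals}} k_\theta \,\bigl[ 1 + \cos(n\theta - \delta) \bigr]
  + \sum_{i<j} \left\{
      4\epsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12}
                          - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right]
      + \frac{q_i q_j}{4\pi\epsilon_0 \, r_{ij}}
    \right\}
```

The first three sums are the bonded ("spring") terms; the last sum collects the non-bonded van der Waals and electrostatic interactions over atom pairs at distance r_ij.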
What we are interested in…
One water molecule: quantum chemistry gives molecular orbitals
…in between… (large water clusters, wet/dry interfaces, interaction with solutes): what are the relevant variables? What is the intrinsic dimensionality?
Bulk water: thermodynamics describes the system
C. Clementi, Curr. Opin. Struct. Biol. 2008, vol. 18(1), 10-15
What is relevant and what is “noise”?
What is the minimal set of variables needed to describe realistically the dynamics of a macromolecule?
The problem of reaction coordinates
Reaction coordinates identify the minimum free energy path along which the reaction takes place, and allow one to locate reactants, products, and transition states.
Example: H + D2O
Figures from B. R. Strazisar et al., Science 290, 958 (2000)
How to choose good reaction coordinates
• Physically relevant variables: Q, Rg, z
  – automated ways: genetic neural network, maximum likelihood
  – a priori knowledge of the system
• Iso-committor
  – based on some other reaction coordinate
• Dimensionality reduction?
  – linear dimensionality reduction (PCA, MDS): usually around equilibrium
  – nonlinear dimensionality reduction: Isomap, Diffusion Map, Sketch-map
  – require correlation to the physically relevant variables
Reaction coordinates
• Gauge the progress of a reaction
• Cluster the (meta)stable states
• Preserve the barrier height
Physical (intuitive) collective variables
Iso-committor (Pfold)
Dimensionality reduction?
[Figure: free energy along the reaction coordinate, with states labeled C5, αP, αR, and αL.]
The problem of reaction coordinates
p-fold (“commitment probability”, “isocommittor”):
probability for a path starting in a particular point on the landscape to visit the folded state before the unfolded state
[Figure: a point on the landscape with 50% probability toward Folded and 50% toward Unfolded.]
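The definition of p-fold can be made concrete on a toy model. The sketch below is not taken from the slides: it assumes a particle hopping between neighboring sites of a one-dimensional double-well energy profile with Metropolis acceptance, and solves the linear equations that the committor obeys. The function name `committor_1d`, the hopping rule, and the energy profile are all illustrative assumptions.

```python
import numpy as np

def committor_1d(energy, beta=1.0):
    """Committor q[i]: probability of reaching the last site ("folded")
    before the first site ("unfolded") when starting from site i."""
    n = len(energy)

    def hop(i, j):
        # Assumed dynamics: propose left/right with probability 1/2,
        # accept with the Metropolis rule.
        return 0.5 * min(1.0, np.exp(-beta * (energy[j] - energy[i])))

    A = np.zeros((n, n))
    b = np.zeros(n)
    A[0, 0] = 1.0        # boundary condition: q = 0 in the unfolded state
    A[-1, -1] = 1.0      # boundary condition: q = 1 in the folded state
    b[-1] = 1.0
    for i in range(1, n - 1):
        right, left = hop(i, i + 1), hop(i, i - 1)
        # q_i = right*q_{i+1} + left*q_{i-1} + (1 - right - left)*q_i
        A[i, i] = right + left
        A[i, i + 1] = -right
        A[i, i - 1] = -left
    return np.linalg.solve(A, b)

x = np.linspace(-1.0, 1.0, 41)
energy = 5.0 * (x**2 - 1.0) ** 2      # symmetric double well, barrier at x = 0
q = committor_1d(energy)
```

For this symmetric profile the site at the top of the barrier has q = 0.5, which is the sense in which the transition state is the set of configurations that commit to either state with equal probability.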
Empirical reaction coordinates: how good are they?
C. Clementi, P.A. Jennings, J.N. Onuchic, J. Mol. Biol. 311, 879-890 (2001)
Folding trajectory of CI2 (Go-like model) at Tf
Is p-fold the “ultimate” reaction coordinate?
S.S. Cho, Y. Levy, P.G. Wolynes, “P versus Q: Structural reaction coordinates capture protein folding on smooth landscapes”, PNAS 103:3, 586–591 (2006)
Example: a set of points on a torus in 3d defines a 2d embedded “surface”
Mathematically this is a problem of “non-linear” dimensionality reduction
The problem consists in finding the best low dimensional description of a collection of macromolecular conformations
Reduction coordinates for macromolecular motions
[Figure: conformations projected onto two axes, Main motion 1 and Main motion 2.]
We need to introduce a measure of “similarity” between configurations
Similar problems arise in different research fields (computer science, engineering, applied math, statistics, biology, …)
Examples: classification of documents, image recognition
Courtesy of M. Maggioni
Linear dimensionality reduction: Principal component analysis (PCA)
Pearson, Philos. Mag. 2, 559 (1901)
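As a minimal sketch of what PCA does, the snippet below centers a data set and takes its singular value decomposition; the principal components are the right singular vectors. The function name `pca` and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def pca(X, n_components=2):
    """Project the rows of X onto the directions of largest variance."""
    Xc = X - X.mean(axis=0)                       # center each coordinate
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s**2 / (len(X) - 1)               # variance along each component
    components = Vt[:n_components]                # principal directions (rows)
    return Xc @ components.T, components, variances

rng = np.random.default_rng(0)
t = rng.normal(size=500)
# Synthetic data lying mostly along the direction (1, 2, 0), plus small noise.
X = np.column_stack([t, 2.0 * t, 0.1 * rng.normal(size=500)])
proj, components, variances = pca(X, n_components=1)
```

Because the projection is linear, PCA works best for fluctuations around a single equilibrium structure, which is the limitation the nonlinear methods in the following slides try to address.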
Idea: Use geodesics to define the “surface”
A geodesic is the shortest path between two points
If we know the geodesics between any pair of points, then we know everything about the geometry of the system
Isomap algorithm: Tenenbaum, de Silva, & Langford, Science 290, 2319–2323 (2000)
Basic ideas of ISOMAP
Define a low-dimensional hyper-surface preserving as best as possible geodesic distances between all pairs of data points in the sample
Geodesic distance and Euclidean distance
Network of nearest neighbors can serve to approximate geodesics between any pairs of points
Use geodesics to define the manifold
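The two ingredients above, a nearest-neighbor network to approximate geodesics and an embedding that preserves them, can be sketched in a few lines. This is a simplified illustration, not the reference Isomap or ScIMAP implementation: the function names, the choice of k, the Floyd-Warshall shortest-path step, and the half-circle test data are assumptions.

```python
import numpy as np

def geodesic_distances(X, k=4):
    """Approximate geodesics by shortest paths on a k-nearest-neighbor graph."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(d, axis=1)[:, 1:k + 1]      # column 0 is the point itself
    for i in range(n):
        G[i, nbrs[i]] = d[i, nbrs[i]]             # keep only edges to neighbors
        G[nbrs[i], i] = d[i, nbrs[i]]             # and make the graph symmetric
    for m in range(n):                            # Floyd-Warshall shortest paths
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

def classical_mds(D, n_components=1):
    """Embed points so that Euclidean distances approximate the matrix D."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D**2) @ J
    w, v = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:n_components]
    return v[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

theta = np.linspace(0.0, np.pi, 100)
X = np.column_stack([np.cos(theta), np.sin(theta)])   # half circle of radius 1
G = geodesic_distances(X, k=4)
Y = classical_mds(G, n_components=1)
```

For the two ends of the half circle the Euclidean distance is 2, while the graph distance `G[0, -1]` is close to the arc length π, and the one-dimensional embedding `Y` follows the angle along the arc.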
P. Das, M. Moll, H. Stamati, L.E. Kavraki, & C. Clementi, Proc. Natl. Acad. Sci. USA 103, 9885-9890 (2006)
ScIMAP
Application of nonlinear dimensionality reduction to SH3 folding dynamics
P. Das, M. Moll, H. Stamati, L.E. Kavraki, & C. Clementi, Proc. Natl. Acad. Sci. USA 103, 9885-9890 (2006)
Native structure of SH3 protein
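Free energy as a function of the first “collective” coordinate
[Figure: free energy profile with the unfolded state (U), transition state (TS), and native state (N) marked.]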
unfolded state
native state
transition state
The limits of Isomap
Huang and Makarov, J. Chem. Phys. 128 114903 (2008)
Example: Polymer reversal inside a narrow pore
Empirical reaction coordinate: distance between the first bead and the last bead projected on the z direction, z = z_N - z_1
Is z a good reaction coordinate to describe the polymer reversal dynamics?
How can we estimate the reversal rate?
1. Direct method (“experiment”)
Measuring the waiting time between reversal events:
p(t) dt ∝ e^(-kt) dt
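If the waiting times are exponentially distributed as above, the maximum-likelihood estimate of the rate is the inverse of the mean waiting time. The sketch below illustrates this on synthetic data; the "true" rate, the sample size, and the variable names are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
k_true = 0.25                                     # assumed rate, arbitrary units
# Synthetic waiting times drawn from p(t) = k exp(-k t).
waiting_times = rng.exponential(scale=1.0 / k_true, size=20000)

k_est = 1.0 / waiting_times.mean()                # maximum-likelihood estimate
k_err = k_est / np.sqrt(len(waiting_times))       # approximate standard error
```

In a real simulation the waiting times would be measured between successive reversal events in a long trajectory, which is why this direct estimate serves as the reference ("experiment") for the other methods.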
2. Transition state theory
If we have an accurate free energy profile, we can estimate the rate:
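A standard one-dimensional form of the TST rate is given below as a sketch; it may differ in notation from the expression used by Huang and Makarov. Here F(z) is the free energy profile along the reaction coordinate z, z‡ is the position of the barrier top, the integral runs over the reactant side of the barrier, and μ is an effective mass associated with z (an assumed symbol).

```latex
k_{\mathrm{TST}} =
  \sqrt{\frac{k_B T}{2\pi\mu}} \;
  \frac{e^{-F(z^{\ddagger})/k_B T}}
       {\displaystyle \int_{\text{reactant}} e^{-F(z)/k_B T}\, dz}
```

The rate is the equilibrium probability density at the barrier top multiplied by the mean thermal speed of crossing in one direction, so every crossing is counted as a successful reaction.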
Trajectories from simulations
Weighted Histogram Analysis Method (WHAM) to get the free energy profile as a function of the reaction coordinate
Transition State Theory (TST) to get the reversal rate
Transition state theory
Recrossing of the transition events
TST overestimates the rate constant k
Transmission factor κ
Langevin equation
Kramers theory
Comparison: TST and TST corrected with κ_Kramers both overestimate the rate constant k if the reaction coordinate is not well chosen
Is z a good reaction coordinate to describe the translocation of the polymer inside the pore?
No.
Rate constant as obtained by using the 1st ISOMAP coordinate
Still a huge gap between ISOMAP and statistical results
The geodesic distance is not the best way to describe the dynamics of polymer reversal inside the pore.
Sketch-map
Define a set of coordinates that best preserve the distances in the medium range.
Minimize the sum of differences between the sigmoid functions (F and f) of distances in high and low dimensional space.
Ceriotti, Tribello & Parrinello, Proc. Natl. Acad. Sci. USA 108, 13023 (2011)
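The quantity being minimized can be written as a stress function. The form below is a sketch under assumptions: it omits any per-point weights, and the symbols R_ij (distance in the high-dimensional space), r_ij (distance in the low-dimensional map), σ (switching distance), and the exponents A, B, a, b are labels introduced here for illustration.

```latex
\chi^2 = \sum_{i \neq j} \bigl[ F(R_{ij}) - f(r_{ij}) \bigr]^2 ,
\qquad
F(R) = 1 - \left[ 1 + \left( 2^{A/B} - 1 \right)
                 \left( \frac{R}{\sigma} \right)^{A} \right]^{-B/A} ,
\qquad
f(r) = 1 - \left[ 1 + \left( 2^{a/b} - 1 \right)
                 \left( \frac{r}{\sigma} \right)^{a} \right]^{-b/a}
```

Both sigmoids equal 1/2 at the distance σ, so pairs much closer than σ (dominated by noise) and pairs much farther than σ contribute little, and the fit concentrates on the medium range.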
RMSD distribution: the short range matches Gaussian noise and the long range matches uniformly distributed points for alanine-12.
Similarity measure
• Multidimensional scaling: Euclidean distance (black)
• Isomap: geodesic distance (purple)
• Diffusion map: diffusion distance
If the data {x} are obtained from the sampling of a diffusion process with a potential energy function E(x), the associated probability distribution p(x,t) is expected to satisfy the Fokker-Planck equation:
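For overdamped diffusion the equation takes the Smoluchowski form sketched below. A constant, isotropic diffusion coefficient D and the inverse temperature β = 1/k_B T are assumptions introduced here for simplicity.

```latex
\frac{\partial p(x,t)}{\partial t}
  = D \, \nabla \cdot \Bigl[ \nabla p(x,t) + \beta \, p(x,t) \, \nabla E(x) \Bigr] ,
\qquad
p_{\mathrm{eq}}(x) \propto e^{-\beta E(x)}
```

Setting the time derivative to zero shows that the Boltzmann distribution is the stationary solution, which corresponds to the zero eigenvalue of the operator on the right-hand side.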
A “natural” distance measure can be defined on the data
It measures “how easily” x0 and x1 transform into each other
Diffusion Map
R.R. Coifman, S. Lafon, A. Lee, M. Maggioni, B. Nadler, F.J. Warner, and S.W. Zucker, Proc. Natl. Acad. Sci. USA 102, 7426-7431 (2005)
Diffusion Map
The Fokker-Planck equation has a discrete eigenvalue spectrum
0 = λ_0 < λ_1 < λ_2 < λ_3 < …
If there is a separation of timescales: λ_k << λ_(k+1)
Diffusion distance
Eigenfunction with λ = 0: Boltzmann distribution (equilibrium)
Eigenfunctions with λ > 0 (slowest modes): GOOD REACTION COORDINATES
A discrete approximation of these eigenvalues and eigenvectors can be obtained by considering the kernel:
The eigenvalues and eigenfunctions of M are the discrete approximation of those of the Fokker-Planck operator.
R. R. Coifman, S. Lafon, A.B. Lee, M. Maggioni, B. Nadler, F. Warner, S.W. Zucker, “Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps”, Proc. Natl. Acad. Sci. USA 102(21), 7426-7431 (2005)
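The construction can be sketched as follows, assuming a Gaussian kernel with a single global scale ε and a density normalization with exponent 1/2; both choices, as well as the function name `diffusion_map` and the two-cluster test data, are assumptions for illustration and not the code behind the slides.

```python
import numpy as np

def diffusion_map(X, epsilon, n_coords=1):
    """Eigenvalues of the Markov matrix M and its leading nontrivial
    right eigenvectors (the diffusion coordinates)."""
    # Pairwise squared Euclidean distances between configurations.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2.0 * epsilon**2))          # Gaussian kernel
    p = K.sum(axis=1)                             # kernel density estimate
    K_tilde = K / np.sqrt(np.outer(p, p))         # assumed normalization (exponent 1/2)
    deg = K_tilde.sum(axis=1)
    # M = diag(deg)^-1 K_tilde is row-stochastic. Diagonalize its symmetric
    # conjugate S, which has the same eigenvalues.
    S = K_tilde / np.sqrt(np.outer(deg, deg))
    w, v = np.linalg.eigh(S)
    order = np.argsort(w)[::-1]
    w, v = w[order], v[:, order]
    psi = v / np.sqrt(deg)[:, None]               # right eigenvectors of M
    return w, psi[:, 1:n_coords + 1]              # skip the constant eigenvector

rng = np.random.default_rng(0)
left = rng.normal([-1.5, 0.0], 0.4, size=(100, 2))
right = rng.normal([1.5, 0.0], 0.4, size=(100, 2))
X = np.vstack([left, right])                      # two metastable "states"
evals, dc = diffusion_map(X, epsilon=0.7)
```

The largest eigenvalue of M is 1 and its eigenvector is constant, the discrete counterpart of the equilibrium solution; the next eigenvector changes sign between the two clusters, so it acts as a coordinate for the slowest transition.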
Diffusion Map
Idea of local scale
Within the local scale, the manifold should be approximately flat.
The spread of the Gaussian distribution differs from point to point, which indicates how flat the manifold is locally.
Only noise in the green circle; curvature in the blue circle.
Jung, Little and Maggioni, Proc. AAAI, 26-33 (2009)
Rohrdanz, Zheng, Maggioni and Clementi, J. Chem. Phys. 134(12), 124116 (2011)
Determination of the local scale
the k-th neighbor
Local intrinsic dimensionality
Noise
Find the smallest local scale above the noise in which PCA captures the dynamics reasonably well.
PCA on increasing scales: local Principal Component Analysis
Locally scaled diffusion map (LSDMap)
Point-specific local scale
RMSD
To extract the diffusion coordinates from a discrete data set (i.e. molecular dynamics data)
Rohrdanz, Zheng, Maggioni and Clementi, J. Chem. Phys. 134(12), 124116 (2011)
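A minimal sketch of a locally scaled kernel is given below. It assumes the kernel exp(-d_ij² / (2 ε_i ε_j)) and, as a simplified stand-in for the local-PCA procedure described above, takes ε_i to be the distance from point i to its k-th nearest neighbor. The function name `lsdmap`, the value of k, and the one-dimensional test data are illustrative assumptions; this is not the Fortran90/MPI LSDMap code.

```python
import numpy as np

def lsdmap(dist, k=20, n_coords=1):
    """Diffusion coordinates from a distance matrix (for example pairwise RMSD)
    using a point-specific local scale."""
    # Assumed local scale: distance to the k-th neighbor (column 0 is the point itself).
    eps = np.sort(dist, axis=1)[:, k]
    K = np.exp(-dist**2 / (2.0 * np.outer(eps, eps)))
    p = K.sum(axis=1)
    K_tilde = K / np.sqrt(np.outer(p, p))         # assumed density normalization
    deg = K_tilde.sum(axis=1)
    S = K_tilde / np.sqrt(np.outer(deg, deg))     # symmetric conjugate of M
    w, v = np.linalg.eigh(S)
    order = np.argsort(w)[::-1]
    w, v = w[order], v[:, order]
    psi = v / np.sqrt(deg)[:, None]               # right eigenvectors of M
    return w, psi[:, 1:n_coords + 1], eps

rng = np.random.default_rng(0)
# Samples from two overlapping wells centered at x = -1 and x = +1.
x = np.concatenate([rng.normal(-1.0, 0.35, 200), rng.normal(1.0, 0.35, 200)])
dist = np.abs(x[:, None] - x[None, :])            # stand-in for an RMSD matrix
evals, dc, eps = lsdmap(dist, k=20)
```

With this choice the local scale comes out small in the densely sampled minima and large in the sparsely sampled barrier region, so the kernel adapts to the sampling density instead of relying on one global ε.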
Diffusion eigenspectrum
large gap
timescale separation
Stationary solution Pore radius
Zheng, Rohrdanz, Maggioni and Clementi, J. Chem. Phys. 134, 144109 (2011)
Free energy landscape
Local PCA spectra
Minimum Barrier
Local scales
Different regions of the configuration space have different local scales.
Local heterogeneities
• Barrier: large spectral gap, small intrinsic dimension, large local scale
• Minimum: small spectral gap, large intrinsic dimension, small local scale
We test the “goodness” of the first diffusion coordinate as a reaction coordinate by estimating the rates.
From Kramers’ theory of escape rates we have:
[Figure: free energy as a function of the reaction coordinate x, with the escape rate indicated.]
D(x) is the diffusion coefficient; it is NOT a constant.
Reversal rate
Zheng, Rohrdanz, Maggioni and Clementi, J. Chem. Phys. 134, 144109 (2011)
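One way to evaluate a Kramers-type rate with a position-dependent diffusion coefficient is through the mean first-passage time of one-dimensional overdamped diffusion, τ = ∫ dx e^(βF(x)) / D(x) ∫ dy e^(-βF(y)), with the rate taken as 1/τ. The sketch below implements this numerically; the function name `kramers_rate`, the double-well profile, and the test form of D(x) are assumptions, and the expression may differ in detail from the one on the original slide.

```python
import numpy as np

def kramers_rate(x, F, D, x_start, x_end, beta=1.0):
    """Inverse mean first-passage time from x_start to x_end for overdamped
    diffusion on the free energy F(x) with diffusion coefficient D(x)."""
    boltz = np.exp(-beta * F)
    # Inner integral: cumulative trapezoid of exp(-beta F) from the left grid edge.
    inner = np.concatenate(
        [[0.0], np.cumsum(0.5 * (boltz[1:] + boltz[:-1]) * np.diff(x))]
    )
    mask = (x >= x_start) & (x <= x_end)
    integrand = np.exp(beta * F[mask]) / D[mask] * inner[mask]
    # Outer integral over the path from reactant to product (trapezoid rule).
    mfpt = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x[mask]))
    return 1.0 / mfpt

x = np.linspace(-2.5, 1.0, 7001)
F = 8.0 * (x**2 - 1.0) ** 2                       # double well, barrier of 8 kT at x = 0
D_const = np.ones_like(x)
D_var = 1.0 - 0.5 * np.exp(-x**2 / 0.1)           # diffusion slowed down at the barrier

k_const = kramers_rate(x, F, D_const, -1.0, 1.0)
k_var = kramers_rate(x, F, D_var, -1.0, 1.0)
# High-barrier Kramers estimate for constant D = 1 and beta = 1.
k_harmonic = np.sqrt(64.0 * 32.0) / (2.0 * np.pi) * np.exp(-8.0)
```

For constant D the numerical rate agrees with the harmonic high-barrier estimate to within several percent, while lowering D only near the barrier top reduces the rate noticeably, which is why treating D(x) as a constant can bias the result.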
z vs. 1st DC
[Figure: z plotted against the 1st DC.]
The 1st DC = 0 surface is different from the z = 0 surface.
Correlation to the contact probabilities
[Figure panels: z; 1st DC.]
http://sourceforge.net/projects/lsdmap
LSDMap code in Fortran90 and MPI
Clementi’s group
Dr. Mary Rohrdanz, Dr. Jordane Preto, Lorenzo Boninsegna, Wenwei Zheng, Fernando Yrazu, Alex Kluber, Amarda Shehu (now: GMU), Payel Das (now: IBM), Silvina Matysiak (now: U. Maryland), Brad Lambeth (now: Shell)
Collaborators: Prof. Mauro Maggioni (Duke, Math), Miles Crosskey
Funding: NSF CHE-0835824, CHE-1152344
Funding: Welch Foundation C-1570