Reaction coordinates and LSDMap

Source: cgl.uni-jena.de/pub/Workshops/WebHome/CGL_workshop1.pdf

The challenges in molecular biophysics: a broad range of interconnected length and time scales

Quantum physics: ~1 atom, ~1 Å (atom)

Quantum chemistry: ~10^1 atoms, ~1-10 Å (molecule)

Mesoscale, multiscale: ~10^3-10^4 atoms, ~1-10 nm (biomolecule, i.e. macromolecule)

Mesoscale, multiscale: ~10^4-10^5 atoms, ~10-100 nm (system)

Thermodynamics, mesoscale: ~10^10 atoms, ~1-10 µm (cell)

Thermodynamics, macroscale: ~10^20 atoms (organism)

[Figure label: E. coli]

What we study

[Figure: number of atoms (1, 10^3, 10^6, 10^9) versus length scale (10^-10 to 10^-5 m), spanning atoms, molecules, macromolecules, systems, and cells; annotations: Dynamics, Function]

Why do we care about dynamics?

…function requires dynamics!

from

Main goals and challenges

1. Force field: a set of parameters and equations describing the interactions between atoms

2. Sampling: can the simulations cover the rare events that we are interested in?

3. Data analysis: how do we understand the mechanism in the data?

Trajectories in the equilibrium distribution; observables to compare with experiment; mechanism from the dynamics; new predictions

Protein Representations

•  Cartesian coordinate representation
   –  x, y, z coordinates for every atom

•  Internal coordinate representation:
   –  b - bond length
   –  α - angle between two consecutive bonds
   –  θ - dihedral (torsion) angle defined by three consecutive bonds

•  Idealized geometry model
   –  dihedral angles are the only DOFs
   –  2 backbone dihedrals
   –  4 sidechain dihedrals
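To make the link between the Cartesian and internal representations concrete, here is a minimal sketch of how the dihedral angle θ can be computed from the Cartesian coordinates of four consecutive atoms. The function name `dihedral` and the test geometry are illustrative assumptions, not taken from the slides.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (three consecutive bonds)."""
    def sub(a, b):
        return [a[i] - b[i] for i in range(3)]

    def dot(a, b):
        return sum(a[i] * b[i] for i in range(3))

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)  # the three bond vectors
    n1, n2 = cross(b0, b1), cross(b1, b2)               # normals of the two planes
    x = dot(n1, n2)
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical geometry: central bond along z, first bond pointing along x.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

In this sketch the cis arrangement gives 0°, and the trans arrangement gives ±180°.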

All-Atom Molecular Mechanics Force-Field

A biomolecule is considered a collection of masses (atoms) connected by “springs” (bonds).

The associated effective energy (including electronic effects) is parameterized in a classical force-field.

Generally, a force-field is in the form:
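The equation shown on the slide did not survive extraction. As a hedged illustration, a commonly used functional form for all-atom force fields is written below using the internal coordinates b, α, θ defined earlier; the specific terms and parameter names (k_b, k_α, k_θ, n, δ, ε_ij, σ_ij, q_i) are assumptions and may differ from the form on the original slide.

```latex
V = \sum_{\text{bonds}} k_b \,(b - b_0)^2
  + \sum_{\text{angles}} k_\alpha \,(\alpha - \alpha_0)^2
  + \sum_{\text{dihedrals}} k_\theta \,\bigl[1 + \cos(n\theta - \delta)\bigr]
  + \sum_{i<j} \left\{ 4\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
  - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]
  + \frac{q_i q_j}{4\pi\epsilon_0\, r_{ij}} \right\}
```

The first three sums are the bonded “spring” terms; the last sum collects the non-bonded Lennard-Jones and Coulomb interactions between atom pairs at distance r_ij.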

What we are interested in….

Large water clusters

Wet/Dry interfaces

Interaction with solutes

One water molecule: quantum chemistry gives molecular orbitals

…in between…: what are the relevant variables? What is the intrinsic dimensionality?

Bulk water: thermodynamics describes the system

C. Clementi, Curr. Opin. Struct. Biol. 2008, vol. 18(1), 10-15

What is relevant and what is “noise”?

What is the minimal set of variables needed to realistically describe the dynamics of a macromolecule?

The problem of reaction coordinates

Reaction coordinates identify the minimum free energy path along which the reaction takes place, and allow us to locate reactants, products, and transition states

Example: H + D2O


Figures from B.R.Strazisar et al., Science 290, 958(2000)

How to choose good reaction coordinates

Physically relevant variables: Q, Rg, z
–  automated ways: genetic neural network, maximum likelihood
–  a priori knowledge of the system

Isocommittor
–  based on some other reaction coordinate

Dimensionality reduction?
–  linear dimensionality reduction (PCA, MDS): usually around equilibrium
–  nonlinear dimensionality reduction: Isomap, Diffusion Map, Sketch-map
–  require correlation to the physically relevant variables

Reaction coordinates

Gauge the progress of a reaction
Cluster the (meta)stable states
Preserve the barrier height

Physical (intuitive) collective variables

Isocommittor (p-fold)

Dimensionality reduction?

[Figure: free energy versus reaction coordinate, with states labeled C5, αP, αR, and αL]


p-fold (“commitment probability”, “isocommittor”):

probability for a path starting at a particular point on the landscape to visit the folded state before the unfolded state

[Figure labels: 50% Folded, 50% Unfolded]
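The definition of p-fold can be illustrated on a toy model. The sketch below assumes a nearest-neighbor random walk on a line of sites, with the first site standing in for the unfolded state and the last site for the folded state; the model and the function name `committor_1d` are illustrative assumptions, not part of the slides.

```python
def committor_1d(n_sites, p_right=0.5, tol=1e-12, max_iter=100_000):
    """Committor q[i] for a nearest-neighbor random walk on sites 0..n_sites-1.

    Site 0 plays the role of the unfolded state (q = 0) and the last site the
    folded state (q = 1). Interior values satisfy
    q[i] = p_right * q[i+1] + (1 - p_right) * q[i-1],
    which is solved here by simple Gauss-Seidel iteration.
    """
    q = [0.0] * n_sites
    q[-1] = 1.0
    for _ in range(max_iter):
        change = 0.0
        for i in range(1, n_sites - 1):
            new = p_right * q[i + 1] + (1.0 - p_right) * q[i - 1]
            change = max(change, abs(new - q[i]))
            q[i] = new
        if change < tol:
            break
    return q

q = committor_1d(11)  # unbiased walk on 11 sites
```

For the unbiased walk the committor grows linearly between the two end states, so the middle site has equal probability of reaching either state first, which is the 50%/50% situation sketched on the slide.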

Empirical reaction coordinates: how good are they?

C. Clementi, P.A. Jennings, J.N. Onuchic, J. Mol. Biol. 311, 879-890 (2001)

Folding trajectory of CI2 (Go-like model) at Tf

Is p-fold the “ultimate” reaction coordinate?

S.S. Cho, Y. Levy, P.G. Wolynes, “P versus Q: Structural reaction coordinates capture protein folding on smooth landscapes”, PNAS 103:3, 586–591 (2006)

Example: a set of points on a torus in 3D defines a 2D embedded “surface”

Mathematically this is a problem of “non-linear” dimensionality reduction

The problem consists in finding the best low-dimensional description of a collection of macromolecular conformations

Reduction coordinates for macromolecular motions

[Figure: axes labeled Main motion 1 and Main motion 2]

We need to introduce a measure of “similarity” between configurations

Similar problems arise in different research fields (computer science, engineering, applied math, statistics, biology, …)

Examples: classification of documents, image recognition

Courtesy of M. Maggioni

Linear dimensionality reduction: Principal component analysis (PCA)

Pearson, Philos. Mag, 2, 559, 1901
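As a minimal sketch of what PCA does, the example below diagonalizes the covariance matrix of a synthetic data set that lies close to a straight line in 3D. The data, the helper name `pca`, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

def pca(X, n_components=2):
    """Return the leading variances, principal directions, and projections."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = Xc.T @ Xc / (len(X) - 1)           # sample covariance matrix
    evals, evecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    return evals[order], evecs[:, order], Xc @ evecs[:, order]

# Synthetic data: points along the direction (1, 2, 0) plus small isotropic noise.
rng = np.random.default_rng(0)
t = rng.normal(size=500)
X = np.column_stack([t, 2.0 * t, 0.0 * t]) + 0.01 * rng.normal(size=(500, 3))
variances, components, projections = pca(X)
```

Here the first principal component recovers the line direction and carries almost all of the variance. Because the projection is linear, the same approach cannot unfold a curved manifold such as the torus example.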

Idea: Use geodesics to define the “surface”

A geodesic is the shortest path between two points

If we know the geodesics between every pair of points, then we know everything about the geometry of the system

Isomap algorithm - Tenenbaum, de Silva, & Langford, (2000) Science 290, 2319–2323.

Basic ideas of ISOMAP

Define a low-dimensional hyper-surface that preserves, as well as possible, the geodesic distances between all pairs of data points in the sample

Geodesic distance and Euclidean distance

A network of nearest neighbors can serve to approximate the geodesics between any pair of points
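The nearest-neighbor approximation of geodesics can be sketched on a toy data set. The example below assumes points sampled on a unit semicircle, builds a k-nearest-neighbor graph, and uses Floyd-Warshall shortest paths as the geodesic estimate; the function name and the choice of shortest-path algorithm are illustrative assumptions.

```python
import math

def geodesic_distances(points, k):
    """Approximate geodesic distances as shortest paths on a k-nearest-neighbor graph."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    inf = float("inf")
    g = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        # Index 0 of the sorted list is the point itself, so skip it.
        neighbors = sorted(range(n), key=lambda j: dist[i][j])[1:k + 1]
        for j in neighbors:
            g[i][j] = g[j][i] = dist[i][j]
    for m in range(n):  # Floyd-Warshall all-pairs shortest paths
        for i in range(n):
            for j in range(n):
                if g[i][m] + g[m][j] < g[i][j]:
                    g[i][j] = g[i][m] + g[m][j]
    return g

# 50 points on a unit semicircle: the end points are 2 apart in a straight line,
# but about pi apart along the curve.
points = [(math.cos(i * math.pi / 49), math.sin(i * math.pi / 49)) for i in range(50)]
geo = geodesic_distances(points, k=2)
euclidean_end_to_end = math.dist(points[0], points[-1])
geodesic_end_to_end = geo[0][-1]
```

The graph distance between the two ends follows the curve, while the Euclidean distance cuts straight across it.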

Use geodesics to define the manifold

P. Das, M. Moll, H. Stamati, L.E. Kavraki, & C. Clementi, Proc. Natl. Acad. Sci. USA 103, 9885-9890 (2006)

ScIMAP

Application of nonlinear dimensionality reduction to SH3 folding dynamics

Native structure of SH3 protein

Free energy as a function of the first “collective” coordinate

[Figure labels: U = unfolded state, TS = transition state, N = native state]

The limits of Isomap

Huang and Makarov, J. Chem. Phys. 128 114903 (2008)

Example: Polymer reversal inside a narrow pore

Empirical reaction coordinate: distance between the first bead and the last bead projected on the z direction, z = zN - z1

Is z a good reaction coordinate to describe the polymer reversal dynamics?

How can we estimate the reversal rate?

1. Direct method (“experiment”)

Measuring the waiting time between reversal events:

$p(t)\,dt \propto e^{-kt}\,dt$
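As a sketch of the direct method, the rate k can be estimated from a list of observed waiting times. If the waiting times are exponentially distributed, the maximum-likelihood estimate is the inverse of their mean. The synthetic waiting times and the value of `k_true` below are assumptions used only to exercise the estimator.

```python
import random

def estimate_rate(waiting_times):
    """Maximum-likelihood rate for exponential waiting times: k = 1 / mean(t)."""
    return len(waiting_times) / sum(waiting_times)

# Synthetic "experiment": waiting times drawn from p(t) ~ exp(-k t) with a known k.
random.seed(0)
k_true = 0.25
times = [random.expovariate(k_true) for _ in range(20000)]
k_est = estimate_rate(times)
```

With real trajectories, `times` would be the measured intervals between successive reversal events.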

2. Transition state theory

If we have an accurate free energy profile, we can estimate the rate:
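The rate expression on the slide was lost in extraction. A standard one-dimensional form of the transition state theory rate is given below as a hedged stand-in; the symbols (free energy profile F(x) along the reaction coordinate, dividing surface at x‡, effective mass m of the coordinate) are assumptions and may not match the notation of the original slide.

```latex
k_{\mathrm{TST}} = \sqrt{\frac{k_B T}{2\pi m}}\;
  \frac{e^{-F(x^{\ddagger})/k_B T}}{\displaystyle\int_{\text{reactant}} e^{-F(x)/k_B T}\,dx}
```

The first factor is the mean velocity of crossing in one direction, and the ratio is the equilibrium probability density of finding the system at the dividing surface relative to the reactant basin.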


Trajectories from simulations

Weighted Histogram Analysis Method (WHAM) to get the free energy profile as a function of the reaction coordinate

Transition State Theory (TST) to get the reversal rate

Transition state theory

Recrossing of the transition state

TST overestimates the rate constant k

Transmission factor

Langevin equation

Kramers theory

Comparison: TST and TST corrected by the Kramers transmission factor κ both overestimate the rate constant k if the reaction coordinate is not well chosen


Is z a good reaction coordinate to describe the translocation of the polymer inside the pore?

No.

Rate constant as obtained by using the 1st ISOMAP coordinate

Still a large gap between the ISOMAP and statistical results

The geodesic distance is not the best way to describe the dynamics of polymer reversal inside the pore.

Sketch-map

Define a set of coordinates that best preserves the distances in the medium range.

Minimize the sum of the differences between sigmoid functions (F and f) of the distances in the high- and low-dimensional spaces.
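The objective described above can be sketched as follows. The sigmoid below, which equals 1/2 at r = σ, is the form commonly quoted for sketch-map; the exponents, the unweighted sum, and the function names are assumptions for illustration and are not taken from the slide.

```python
import math

def sigmoid(r, sigma, a, b):
    """Sigmoid of a distance r: 0 at r = 0, 1/2 at r = sigma, approaching 1 at large r."""
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sigma) ** a) ** (-b / a)

def sketch_map_stress(R_high, r_low, sigma, high=(2, 6), low=(2, 2)):
    """Mean squared difference between F(R_ij) and f(r_ij) over all pairs i < j.

    R_high and r_low are square distance matrices in the high- and
    low-dimensional spaces; per-pair weights are omitted in this sketch.
    """
    n = len(R_high)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            F = sigmoid(R_high[i][j], sigma, *high)
            f = sigmoid(r_low[i][j], sigma, *low)
            total += (F - f) ** 2
            pairs += 1
    return total / pairs

# Hypothetical distance matrices for three points.
R = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.5], [2.0, 1.5, 0.0]]
same = sketch_map_stress(R, R, sigma=1.0, high=(2, 2), low=(2, 2))
different = sketch_map_stress(R, [[0.0, 3.0, 3.0], [3.0, 0.0, 3.0], [3.0, 3.0, 0.0]], sigma=1.0)
```

Because the sigmoid saturates at short and long distances, only mismatches at distances comparable to σ contribute strongly, which is how the medium range gets emphasized.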

Ceriotti, Tribello & Parrinello, Proc. Natl. Acad. Sci. USA, 108, 13023, 2011

RMSD distribution for alanine-12: the short range matches Gaussian noise and the long range matches uniformly distributed points.

Similarity measure

Multidimensional scaling: Euclidean distance (black)
Isomap: Geodesic distance (purple)
Diffusion map: Diffusion distance

If the data {x} are obtained by sampling a diffusion process with a potential energy function E(x), the associated probability distribution p(x,t) is expected to satisfy the Fokker-Planck equation:

A “natural” distance measure can be defined on the data

It measures “how easily” x0 and x1 transform into each other
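The two equations referred to above did not survive extraction. As a hedged reconstruction in standard notation, assuming a constant diffusion coefficient D and β = 1/k_BT, the Fokker-Planck (Smoluchowski) equation and the diffusion distance d_t can be written as:

```latex
\frac{\partial p(x,t)}{\partial t}
  = D\,\nabla\cdot\Bigl[e^{-\beta E(x)}\,\nabla\bigl(e^{\beta E(x)}\,p(x,t)\bigr)\Bigr]
  = D\,\nabla\cdot\bigl[\nabla p(x,t) + \beta\,p(x,t)\,\nabla E(x)\bigr]

d_t^{2}(x_0, x_1) = \sum_{k \ge 1} e^{-2\lambda_k t}\,\bigl[\psi_k(x_0) - \psi_k(x_1)\bigr]^{2}
```

Here λ_k and ψ_k denote the eigenvalues and eigenfunctions of the Fokker-Planck operator. Two configurations are close in diffusion distance when the slowly decaying eigenfunctions take similar values on them.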

Diffusion Map

RR Coifman, S Lafon, A Lee, M Maggioni, B Nadler, FJ Warner, and SW Zucker, Proc. Natl. Acad. Sci. USA, 102, 7426-7431, 2005

Diffusion Map

The Fokker-Planck equation has a discrete eigenvalue spectrum

$0 = \lambda_0 < \lambda_1 < \lambda_2 < \lambda_3 < \dots$

If there is a separation of timescales: $\lambda_k \ll \lambda_{k+1}$

Diffusion distance

$\lambda_0 = 0$: Boltzmann distribution (equilibrium)

$\lambda_k > 0$: eigenfunctions (GOOD REACTION COORDINATES)

A discrete approximation of these eigenvalues and eigenfunctions can be obtained by considering the kernel:

The eigenvalues and eigenvectors of M are the discrete approximation of those of the Fokker-Planck operator
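The kernel on the slide was lost in extraction. The sketch below assumes a Gaussian kernel with a single scale `eps`, a density normalization with exponent `alpha = 0.5`, and a row-normalized Markov matrix M; these choices, the function name, and the two-cluster test data are assumptions rather than the slide's exact construction.

```python
import numpy as np

def diffusion_map(X, eps, n_coords=2, alpha=0.5):
    """Leading eigenvalues and right eigenvectors of a diffusion-map Markov matrix M."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    K = np.exp(-d2 / (2.0 * eps ** 2))                           # Gaussian kernel
    q = K.sum(axis=1)
    K = K / np.outer(q, q) ** alpha                              # density normalization
    deg = K.sum(axis=1)
    # S = D^-1/2 K D^-1/2 is symmetric and has the same eigenvalues as M = D^-1 K.
    S = K / np.sqrt(np.outer(deg, deg))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(deg)[:, None]                          # right eigenvectors of M
    return evals[:n_coords + 1], psi[:, :n_coords + 1]

# Toy data: two well-separated clusters on a line, standing in for two metastable states.
X = np.array([[0.0], [0.2], [0.4], [0.6], [3.0], [3.2], [3.4], [3.6]])
evals, psi = diffusion_map(X, eps=0.5)
first_dc = psi[:, 1]
```

The top eigenvalue is 1 with a constant eigenvector. The next eigenvalue lies very close to 1, reflecting the slow exchange between the clusters, and its eigenvector (the first diffusion coordinate) takes opposite signs on the two clusters.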

R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, S. W. Zucker, “Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps”, Proc. Natl. Acad. Sci. USA 102(21), 7426-7431, 2005

Diffusion Map

Idea of local scale

Within the local scale, the manifold should be approximately flat.

The spread of the Gaussian differs from point to point, reflecting how flat the manifold is locally.

Only noise in the green circle. Curvature in the blue circle.

Jung, Little and Maggioni, Proc. AAAI, 26-33 (2009); Rohrdanz, Zheng, Maggioni and Clementi, J. Chem. Phys., 134(12), 124116 (2011)

Determination of the local scale

the k-th neighbor

Jung, Little and Maggioni, Proc. AAAI, 26-33 (2009); Rohrdanz, Zheng, Maggioni and Clementi, J. Chem. Phys., 134(12), 124116 (2011)

Local intrinsic dimensionality

Noise

Find the smallest local scale above the noise at which PCA captures the dynamics reasonably well.

PCA on increasing scales: Local Principal Component Analysis
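A minimal sketch of PCA on increasing scales is given below: for a chosen center, the covariance spectrum is computed from the neighbors within a growing radius. The synthetic data (a noisy line in 3D), the radii, and the helper name are assumptions for illustration.

```python
import numpy as np

def local_pca_spectrum(X, center, radius):
    """Covariance eigenvalues (descending) of the points within `radius` of `center`."""
    neighbors = X[np.linalg.norm(X - center, axis=1) <= radius]
    cov = np.cov(neighbors, rowvar=False)
    return np.sort(np.linalg.eigvalsh(cov))[::-1], len(neighbors)

# Synthetic data: a one-dimensional manifold (a line) with noise of amplitude 0.01.
rng = np.random.default_rng(1)
t = rng.uniform(-1.0, 1.0, size=2000)
X = np.column_stack([t, np.zeros_like(t), np.zeros_like(t)])
X = X + 0.01 * rng.normal(size=X.shape)

spectra = {}
for radius in (0.02, 0.1, 0.5):
    spectrum, count = local_pca_spectrum(X, np.zeros(3), radius)
    spectra[radius] = spectrum / spectrum[0]   # normalize by the largest eigenvalue
```

At radii comparable to the noise amplitude the three normalized eigenvalues are of similar size, while at larger radii one eigenvalue dominates, which signals a locally one-dimensional manifold in this toy case.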

Locally scaled diffusion map (LSDMap)

Point-specific local scale

RMSD

To extract

from a discrete data set (i.e. molecular dynamics data)

Rohrdanz, Zheng, Maggioni and Clementi, J. Chem. Phys., 134(12), 124116 (2011)
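The locally scaled kernel can be sketched as below. This is a simplification: each local scale is taken to be the distance to a fixed k-th nearest neighbor, whereas the slides describe choosing the scale through local PCA. The kernel form exp(-d_ij² / (2 ε_i ε_j)), the function name, and the toy distances are assumptions; with molecular dynamics data, `dist` would hold pairwise RMSD values.

```python
import numpy as np

def lsdmap_kernel(dist, k):
    """Locally scaled Gaussian kernel from a pairwise distance matrix.

    eps[i] is the distance from point i to its k-th nearest neighbor
    (index 0 of each sorted row is the point itself).
    """
    eps = np.sort(dist, axis=1)[:, k]
    K = np.exp(-dist ** 2 / (2.0 * np.outer(eps, eps)))
    return K, eps

# Toy data: a densely sampled region followed by a sparsely sampled one.
x = np.array([0.0, 0.1, 0.2, 0.3, 2.0, 3.0, 4.0, 5.0])
dist = np.abs(x[:, None] - x[None, :])
K, eps = lsdmap_kernel(dist, k=1)
```

Points in the sparse region receive a larger local scale, so they remain connected to their neighbors in the kernel instead of becoming isolated as they would with a single small global scale.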

Diffusion eigenspectrum

large gap

timescale separation

Stationary solution

Pore radius

Zheng, Rohrdanz, Maggioni and Clementi, J. Chem. Phys., 134, 144109 (2011)

Free energy landscape

Local PCA spectra

Minimum Barrier

Local scales

Different regions of the configuration space have different local scales.

Local heterogeneities

Barrier: large spectral gap, small intrinsic dimension, large local scale
Minimum: small spectral gap, large intrinsic dimension, small local scale

We test the “goodness” of the first diffusion coordinate as a reaction coordinate by estimating the rates

From Kramers’ theory of escape rates we have:

[Figure: free energy versus reaction coordinate x, with the escape rate indicated]

D(x) = diffusion coefficient; it is NOT a constant
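The escape-rate expression on the slide was lost in extraction. The sketch below assumes the standard overdamped result in which the rate is the inverse of the mean first-passage time, τ = ∫ dy e^{βF(y)}/D(y) ∫ dz e^{-βF(z)}, with the outer integral running from the starting point x0 to the target b and the inner one from a reflecting boundary a to y. The double-well profile, the boundaries, and the function name are illustrative assumptions.

```python
import math

def escape_rate(F, D, a, x0, b, beta=1.0, n=20000):
    """Rate 1/tau from the mean first-passage time x0 -> b with a reflecting wall at a.

    F(x) is the free energy profile and D(x) the position-dependent diffusion
    coefficient; both integrals use the trapezoidal rule on a uniform grid.
    """
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    w = [math.exp(-beta * F(x)) for x in xs]       # Boltzmann weights
    inner, tau, prev = 0.0, 0.0, None
    for i, x in enumerate(xs):
        if i > 0:
            inner += 0.5 * (w[i - 1] + w[i]) * h   # running integral of exp(-beta F)
        value = inner / (w[i] * D(x))              # exp(+beta F) / D times inner integral
        if x >= x0:
            if prev is not None:
                tau += 0.5 * (prev + value) * h
            prev = value
    return 1.0 / tau

# Hypothetical double well with a barrier of 5 kT and a constant D = 1.
F = lambda x: 5.0 * (x * x - 1.0) ** 2
k_numeric = escape_rate(F, lambda x: 1.0, a=-2.0, x0=-1.0, b=1.0)
# Harmonic (high-barrier) Kramers estimate for the same profile, for comparison.
k_harmonic = math.sqrt(40.0 * 20.0) / (2.0 * math.pi) * math.exp(-5.0)
```

Because `D` is passed as a function, a position-dependent diffusion coefficient estimated along the reaction coordinate can be substituted for the constant used in this toy case.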

Reversal rate

Zheng, Rohrdanz, Maggioni and Clementi, J. Chem. Phys., 134, 144109 (2011)

z vs. 1st DC

Zheng, Rohrdanz, Maggioni and Clementi, J. Chem. Phys., 134, 144109 (2011)

[Figure panels: z, 1st DC]

The 1st DC = 0 surface is different from the z = 0 surface.

Correlation to the contact probabilities

[Figure panels: z, 1st DC]

http://sourceforge.net/projects/lsdmap

LSDMap code in Fortran90 and MPI

Clementi’s group

Dr. Mary Rohrdanz, Dr. Jordane Preto, Lorenzo Boninsegna, Wenwei Zheng, Fernando Yrazu, Alex Kluber, Amarda Shehu (now: GMU), Payel Das (now: IBM), Silvina Matysiak (now: U. Maryland), Brad Lambeth (now: Shell)

Collaborators: Prof. Mauro Maggioni (Duke, Math), Miles Crosskey

Funding: NSF CHE-0835824, CHE-1152344

Funding: Welch Foundation C-1570
