MSI February 2012
Practical examples

Contents
1 Introduction
2 System preparation
3 Practical Session I: Molecular dynamics
  3.1 MD with NAMD
    3.1.1 Creating a PSF file for PDB 1i45
    3.1.2 Solvating the structure
    3.1.3 Running the simulations
  3.2 MD with MOLARIS
    3.2.1 Preparing the PDB
    3.2.2 Running an interactive MOLARIS session
  3.3 MD with ADUN
    3.3.1 Preparing the PDB
  3.4 Running simulations with ADUN
    3.4.1 Analyzing the results
4 Practical Session II: Solvation, pKa, FEP
  4.1 Running solvation and pKa simulations using PDLD/S-LRA
  4.2 LIE runs with ADUN
5 Practical Session III: enzymatic reactivity with EVB
  5.1 EVB for enzymatic reactivity analysis
A Running in the luke cluster
B Extra material for NAMD
  B.1 Set up
  B.2 VMD: wat_sphere.tcl
  B.3 VMD: sod2pot.tcl
  B.4 VMD: 1i45_ws_eq.conf
  B.5 VMD: 1i45_wb_eq.conf
C Extra material for ADUN
  C.1 Set up

© 2008-2012 Jordi Villà-Freixa, 2010 Nils Drechsel
1 Introduction

In this session we will practice Molecular Dynamics simulations using three different programs, with the help of VMD, Chimera and R to visualize and analyze the data obtained. We will demonstrate the methods on triosephosphate isomerase (TIM), an enzyme catalyzing the reversible interconversion of the triose phosphate isomers dihydroxyacetone phosphate (DHAP) and D-glyceraldehyde 3-phosphate (GAP). Apart from its obvious interest, the protein has some characteristics that make it a good example for this class (it does not contain disulfide bonds, it is an enzyme with multiple free energy barriers, it is a dimer...).
The programs we will learn to use are:
• NAMD [Phillips et al., 2005], a popular high-performance MD program that is tightly linked to the VMD [Humphrey et al., 1996] visualization program. It is useful for running standard MD simulations with periodic boundary conditions and popular force fields like AMBER or CHARMM.¹
• MOLARIS [Lee et al., 1993], a program containing advanced algorithms for spherical boundary conditions and the PDLD/S-LRA model for semimacroscopic solvation calculations.²
• ADUN [Johnston et al., 2005], a high-performance, framework-based MD program that includes a plugin system for adding complex algorithms. We will use it in Section 4 for LIE calculations.³
We will be running these programs on two different platforms. On the one hand, the use of NAMD will be demonstrated assuming a Mac OS X computer, although very similar commands can be used on a Unix machine. On the other hand, ADUN and MOLARIS will be run remotely on a Linux cluster running Fedora 8. Additionally, interested participants can download an experimental live CD for ADUN from http://susegallery.com/a/hvXWpn/adun-user (free registration required).
Throughout this document, code examples are either bash commands, typed at the shell prompt, or Tcl scripts for VMD; the text indicates which is which.
2 System preparation
First, we need to obtain the PDB files we will be using. TIM is known to explore two conformations that influence its ability to bind the substrate. We will be using the following PDB codes: 1I45 for the open [Rozovsky et al., 2001] and 1NEY for the closed [Jogl et al., 2003] conformation. We will first check that these two files correspond to exactly the same protein sequence. From the PDB we can get the sequences of the A chains (chain B is identical in this dimeric protein) by running:
¹ http://www.ks.uiuc.edu/Research/namd/
² http://futura.usc.edu/programs/index.html#molaris
³ adun.imim.es. It is highly recommended to subscribe to the adun-users mailing list (https://mail.gna.org/listinfo/adun-users) to be aware of new improvements and to report problems.
$ wget --output-document=1NEY.fasta \
    "http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=fastachain&structureId=1NEY&chainId=A"
$ wget --output-document=1I45.fasta \
    "http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=fastachain&structureId=1I45&chainId=A"
$ wget --output-document=1YPI.fasta \
    "http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=fastachain&structureId=1YPI&chainId=A"
where $ denotes the system prompt. From now on, we will omit the $ sign to simplify the writing. Notice that we have also downloaded, for comparison, the sequence for the PDB code 1YPI [Lolis et al., 1990], which corresponds to the wild-type protein. We can then put the three sequences in the same file:
cat 1NEY.fasta 1I45.fasta 1YPI.fasta > seqs.fasta
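The sequence check mentioned above can be automated. The following is a minimal sketch, assuming awk and FASTA files like the ones just downloaded, that prints every position where two sequences differ; the function names fasta_seq and seq_diff are hypothetical. Positions are indices in the FASTA sequence, which need not coincide with the residue numbers used in the PDB files.

```shell
# fasta_seq FILE: print the first sequence record of FILE on a single line
fasta_seq() {
  awk '/^>/ { n++; next } n == 1 { printf "%s", $0 } END { print "" }' "$1"
}

# seq_diff FILE1 FILE2: print "position residue1 residue2" for every mismatch
seq_diff() {
  local s1 s2
  s1=$(fasta_seq "$1")
  s2=$(fasta_seq "$2")
  [ "${#s1}" -eq "${#s2}" ] || echo "length differs: ${#s1} vs ${#s2}"
  awk -v a="$s1" -v b="$s2" 'BEGIN {
    n = (length(a) < length(b)) ? length(a) : length(b)
    for (i = 1; i <= n; i++) {
      ca = substr(a, i, 1); cb = substr(b, i, 1)
      if (ca != cb) printf "%d %s %s\n", i, ca, cb
    }
  }'
}
```

For example, seq_diff 1YPI.fasta 1I45.fasta lists the positions where the mutant differs from the wild type.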
Notice that the two PDBs we will work with are in fact mutated structures derived from the wild type 1YPI. This is acceptable, as they have been shown to have the same activity as the wild-type protein [Rozovsky et al., 2001, Jogl et al., 2003].
We can obtain the PDB files themselves in several ways. We can browse the entries on the PDB website⁴ and download the files from those pages or, more conveniently, type:
wget ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb1ney.ent.gz
wget ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb1i45.ent.gz
wget ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb1ypi.ent.gz
# the files are gzip-compressed; the steps below use the uncompressed .ent files
gunzip pdb1ney.ent.gz pdb1i45.ent.gz pdb1ypi.ent.gz
Upon inspection of the two PDB files, we see that 1NEY, which is in the closed form, contains the substrate DHAP, while 1I45 contains no substrate. In addition, in both cases the actual mutations are Trp90Tyr and Trp157Phe, with a 5'-fluorotryptophan (residue name FTR) at position 168 (see ??). We could keep working with the fluorinated version, but it is better to replace it with the original Trp, as the former was only used for the experimental monitoring of loop 6 and we do not need it here.
⁴ Try accessing http://www.rcsb.org/pdb/explore/explore.do?structureId=1I45 and http://www.rcsb.org/pdb/explore/explore.do?structureId=1NEY
3 Practical Session I: Molecular dynamics

3.1 MD with NAMD

The first task is to run regular MD simulations with NAMD with periodic boundary conditions, using the CHARMM force field with CMAP, an energy correction recently added to the CHARMM force field. The NAMD team has produced a series of excellent tutorials that can be found at http://www.ks.uiuc.edu/Training/Tutorials/. Here we will adapt the general NAMD tutorial to the simulation of our two proteins, 1I45 and 1NEY, to analyze their behavior. To run NAMD we need:
• a PDB file
• a protein structure file (PSF), which stores the topology of the protein structure
• a force field parameter file (for example, from the file toppar_c35b2_c36a2.tgz obtained from http://mackerell.umaryland.edu/CHARMM_ff_params.html)
• a configuration (input) file, specifying what we want the program to do
Figure 2 shows the way we will proceed. More details are given below and in the original NAMD tutorial.
3.1.1 Creating a PSF file for PDB 1i45
The first step is to split the PDB files into their two subunits, as this is needed by the psfgen program.
grep ' A ' pdb1i45.ent | grep -v 'HOH' > 1i45A.pdb
grep ' B ' pdb1i45.ent | grep -v 'HOH' > 1i45B.pdb
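The grep patterns above select any line that contains ' A ' or ' B ' anywhere, which works here but is fragile (REMARK records, for example, can match too). A stricter minimal sketch selects ATOM/HETATM records by the chain identifier in column 22 of the PDB format and skips waters by the residue name in columns 18-20; the function name split_chain is hypothetical. HETATM records are kept so that the FTR residues, handled by the pdbalias commands in the psfgen script, still reach psfgen.

```shell
# split_chain PDBFILE CHAIN: print the ATOM/HETATM records of one chain,
# selected by column 22 (chain ID), skipping waters (HOH in columns 18-20)
split_chain() {
  awk -v chain="$2" '
    (substr($0, 1, 6) == "ATOM  " || substr($0, 1, 6) == "HETATM") &&
    substr($0, 22, 1) == chain && substr($0, 18, 3) != "HOH"
  ' "$1"
}
```

Usage: split_chain pdb1i45.ent A > 1i45A.pdb and split_chain pdb1i45.ent B > 1i45B.pdb.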
We need to do this so that each monomer becomes a separate segment of the PSF file we will generate. In principle, psfgen does the rest for us. Thus, we simply run:
Figure 2: Flowchart of the process of running a NAMD simulation, specifying the different tools used to generate the output. Extracted from the NAMD tutorial at http://www.ks.uiuc.edu/Training/Tutorials/.
# quote the value so that the escaped space survives alias expansion
alias vmdrun='/Applications/VMD\ 1.8.7beta3.app/Contents/MacOS/startup.command'
vmdrun -dispdev text -eofexit < 1i45_pgn.tcl
or its equivalent on Windows or Unix, where the 1i45_pgn.tcl file contains:
package require psfgen
resetpsf
# loading the topology
topology top_all27_prot_lipid.rtf
# creating the segment for the first monomer
if {1} {
  # aliasing some names
  pdbalias residue HIS HSE
  pdbalias residue FTR TRP
  pdbalias atom ILE CD1 CD
  segment A {pdb 1i45A.pdb}
  coordpdb 1i45A.pdb A
}
# creating the segment for the second monomer
if {1} {
  # aliasing some names
  pdbalias residue HIS HSE
  pdbalias residue FTR TRP
  pdbalias atom ILE CD1 CD
  segment B {pdb 1i45B.pdb}
  coordpdb 1i45B.pdb B
}
# build missing atoms (e.g. hydrogens) and write the output files
guesscoord
writepdb 1i45.pdb
writepsf 1i45.psf
Notice the generation of the two segments. Equivalently, we can source the 1i45_pgn.tcl file from within VMD, by opening Extensions > Tk Console and typing there:
cd <where the pgn file is>
source 1i45_pgn.tcl
Inspection of the generated 1i45.pdb and 1i45.psf files shows that psfgen did a good job in assigning the patches capping the two chains:
PSF CMAP

       9 !NTITLE
 REMARKS original generated structure x-plor psf file
 REMARKS 4 patches were applied to the molecule.
 REMARKS topology ./toppar/top_all27_prot_lipid.rtf
 REMARKS segment A { first NTER; last CTER; auto angles dihedrals }
 REMARKS segment B { first NTER; last CTER; auto angles dihedrals }
 REMARKS defaultpatch NTER A:2
 REMARKS defaultpatch CTER A:248
 REMARKS defaultpatch NTER B:2
 REMARKS defaultpatch CTER B:248

    7542 !NATOM
       1 A        2        ALA      N        NH3   -0.300000       14.0070           0
       2 A        2        ALA      HT1      HC     0.330000        1.0080           0
       3 A        2        ALA      HT2      HC     0.330000        1.0080           0
(...)
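To double-check that psfgen produced the two expected segments, the atoms of each segment can be counted directly from the !NATOM section. A minimal awk sketch, assuming the whitespace-separated layout shown above (segment name in the second field of each atom line); the function name psf_segment_counts is hypothetical.

```shell
# psf_segment_counts PSFFILE: print "segname natoms" for each segment,
# reading the whitespace-separated !NATOM section of an X-PLOR PSF file
psf_segment_counts() {
  awk '
    /!NATOM/ { n = $1 + 0; next }   # header line gives the number of atoms
    n > 0    { count[$2]++; n-- }   # field 2 is the segment name
    END      { for (s in count) print s, count[s] }
  ' "$1" | sort
}
```

For 1i45.psf the two segments A and B should together account for the 7542 atoms reported in the header.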
Exercise 1. Prepare the PSF files for the 1NEY and 1YPI structures.
3.1.2 Solvating the structure
NAMD offers two alternatives for solvating the structure prior to the MD runs. One can choose a sphere to solvate the proteins and treat the solvent using spherical boundary conditions (SBC), or one can use periodic boundary conditions (PBC) with, e.g., a cube or a rectangular prism. We will demonstrate SBC with the SCAAS method [Warshel and King, 1985] later in MOLARIS but, for the sake of completeness, we will show here how to build both a sphere and a rectangular prism of water around our system, and run MD simulations in both.
We start by creating a sphere of water around the system to run SBC, using the wat_sphere.tcl file in Appendix B.2:
vmdrun -dispdev text -eofexit < wat_sphere.tcl
This generates the files 1i45_ws.pdb and 1i45_ws.psf, which can be displayed with VMD (see Figure 3a).
Afterwards, we use the file wat_box.tcl below in an analogous manner and obtain the solvated system in Figure 3b:
package require solvate
solvate 1i45.psf 1i45.pdb -t 5 -o 1i45_wb
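The -t 5 option pads the solute with 5 Å of water. The resulting box must be consistent with the periodic cell declared in the PBC configuration file (in NAMD, the cellBasisVector1-3 and cellOrigin keywords), so it helps to estimate it from the coordinates. A minimal sketch, assuming the box is the coordinate bounding box plus the padding on every side (our reading of solvate; check its documentation, or use measure minmax in VMD); the function name pdb_box is hypothetical.

```shell
# pdb_box PDBFILE PAD: print the box edge lengths (bounding box of all
# ATOM/HETATM records plus PAD Angstrom on every side) and the box center
pdb_box() {
  awk -v pad="$2" '
    /^(ATOM  |HETATM)/ {
      # fixed PDB columns: x 31-38, y 39-46, z 47-54
      x = substr($0, 31, 8) + 0; y = substr($0, 39, 8) + 0; z = substr($0, 47, 8) + 0
      if (!seen) { minx = maxx = x; miny = maxy = y; minz = maxz = z; seen = 1 }
      if (x < minx) minx = x; if (x > maxx) maxx = x
      if (y < miny) miny = y; if (y > maxy) maxy = y
      if (z < minz) minz = z; if (z > maxz) maxz = z
    }
    END {
      printf "size %.3f %.3f %.3f\n", maxx - minx + 2 * pad, maxy - miny + 2 * pad, maxz - minz + 2 * pad
      printf "center %.3f %.3f %.3f\n", (minx + maxx) / 2, (miny + maxy) / 2, (minz + maxz) / 2
    }
  ' "$1"
}
```

Usage: pdb_box 1i45.pdb 5.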
Figure 3: Solvated systems for the NAMD runs, with (a) spherical and (b) rectangular prism representations.
Some proteins may be sensitive to the ionic strength of the surrounding solvent. Even when that is not the case, in MD simulations with periodic boundary conditions the electrostatic energy is often computed using particle-mesh Ewald (PME) summation, which requires the system to be electrically neutral. The VMD autoionize plugin provides a quick way to make the net charge of the system zero by adding sodium and chloride ions at random positions in the solvent (subject to some minimum distances between ions). In our case, for the PBC-based simulations at 0.05 M NaCl, we can run VMD in text mode again with this Tcl script:
package require autoionize
autoionize -psf 1i45_wb.psf -pdb 1i45_wb.pdb -is 0.05 -o 1i45_wb_NaCl
source sod2pot.tcl
where the sod2pot.tcl script, used to replace the Na+ ions with K+ ions, is given in Appendix B.3.
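Before and after running autoionize, it is worth checking the net charge of the system. A minimal sketch that sums the partial charges (seventh field of each atom line in the !NATOM section, assuming the whitespace-separated PSF layout shown in Section 3.1.1); the function name psf_net_charge is hypothetical. A result close to zero, up to rounding, indicates a neutral system.

```shell
# psf_net_charge PSFFILE: sum the partial charges of the !NATOM section
psf_net_charge() {
  awk '
    /!NATOM/ { n = $1 + 0; next }   # number of atom lines that follow
    n > 0    { q += $7; n-- }       # field 7 is the partial charge
    END      { printf "%.3f\n", q }
  ' "$1"
}
```

Usage: psf_net_charge 1i45_wb.psf and psf_net_charge 1i45_wb_NaCl.psf.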
3.1.3 Running the simulations
Once the required files have been built, we simply run the simulation by typing:
namd2 1i45_ws_eq.conf > 1i45_ws_eq.log &
where an example configuration file, 1i45_ws_eq.conf, is given in Appendix B.4.
Similarly, an example configuration file for PBC is given in Appendix B.5.
More details on running NAMD simulations can be found in the official NAMD tutorial at http://www.ks.uiuc.edu/Training/Tutorials/. See also Appendix D for links to extra examples and useful resources.
Exercise 2. Run SBC and PBC relaxations and heating for 1NEY and 1YPI. Produce plots of the RMSD and the total energy in each case.
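For the total-energy plots of Exercise 2, the energies can be pulled out of the NAMD log. NAMD prints header lines starting with ETITLE: and data lines starting with ENERGY:; since the set of columns may vary with the NAMD version and the options used, the minimal sketch below looks the requested column up by name in the first header. The function name namd_energy is hypothetical.

```shell
# namd_energy LOGFILE COLUMN: print "timestep value" for one energy term
# (e.g. TOTAL, TEMP, POTENTIAL) taken from the ENERGY: lines of a NAMD log
namd_energy() {
  awk -v want="$2" '
    /^ETITLE:/ && !col { for (i = 2; i <= NF; i++) if ($i == want) col = i; next }
    /^ENERGY:/ && col  { print $2, $col }
  ' "$1"
}
```

Usage: namd_energy 1i45_ws_eq.log TOTAL > 1i45_ws_eq_total.dat, which can then be plotted with R or any other tool.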
3.2 MD with MOLARIS

In this quick-start guide we will show how to run molecular simulations using MOLARIS [Chu et al., 2003]. In particular, we will set up a system and run molecular dynamics in a given region with an explicit representation of the solvent.
3.2.1 Preparing the PDB
Using our preferred editor, we edit the *.ent files and replace the FTR entries with TRP entries. For example, we turn the pdb1i45.ent file into a file we will call 1i45mod.pdb by making the following substitution:
HETATM 1282  N   FTR A 168      38.662  51.541  42.102  1.00 15.78           N
HETATM 1283  CA  FTR A 168      37.687  51.997  43.087  1.00 16.01           C
HETATM 1284  CB  FTR A 168      37.016  53.291  42.612  1.00 16.50           C
HETATM 1285  CG  FTR A 168      36.457  53.215  41.211  1.00 18.36           C
HETATM 1286  CD2 FTR A 168      35.103  52.917  40.831  1.00 18.75           C
HETATM 1287  CE2 FTR A 168      35.046  52.962  39.419  1.00 18.66           C
HETATM 1288  CE3 FTR A 168      33.932  52.616  41.545  1.00 20.74           C
HETATM 1289  CD1 FTR A 168      37.142  53.419  40.045  1.00 18.66           C
HETATM 1290  NE1 FTR A 168      36.302  53.270  38.967  1.00 17.24           N
HETATM 1291  CZ2 FTR A 168      33.864  52.718  38.705  1.00 19.86           C
HETATM 1292  CZ3 FTR A 168      32.754  52.372  40.827  1.00 21.32           C
HETATM 1293  F   FTR A 168      31.644  52.083  41.514  0.57 24.59           F
HETATM 1294  CH2 FTR A 168      32.735  52.427  39.425  1.00 20.56           C
HETATM 1295  C   FTR A 168      36.600  50.963  43.385  1.00 16.95           C
HETATM 1296  O   FTR A 168      35.850  51.115  44.348  1.00 17.00           O
into
ATOM   1282  N   TRP A 168      38.662  51.541  42.102  1.00 15.78           N
ATOM   1283  CA  TRP A 168      37.687  51.997  43.087  1.00 16.01           C
ATOM   1284  CB  TRP A 168      37.016  53.291  42.612  1.00 16.50           C
ATOM   1285  CG  TRP A 168      36.457  53.215  41.211  1.00 18.36           C
ATOM   1286  CD2 TRP A 168      35.103  52.917  40.831  1.00 18.75           C
ATOM   1287  CE2 TRP A 168      35.046  52.962  39.419  1.00 18.66           C
ATOM   1288  CE3 TRP A 168      33.932  52.616  41.545  1.00 20.74           C
ATOM   1289  CD1 TRP A 168      37.142  53.419  40.045  1.00 18.66           C
ATOM   1290  NE1 TRP A 168      36.302  53.270  38.967  1.00 17.24           N
ATOM   1291  CZ2 TRP A 168      33.864  52.718  38.705  1.00 19.86           C
ATOM   1292  CZ3 TRP A 168      32.754  52.372  40.827  1.00 21.32           C
ATOM   1294  CH2 TRP A 168      32.735  52.427  39.425  1.00 20.56           C
ATOM   1295  C   TRP A 168      36.600  50.963  43.385  1.00 16.95           C
ATOM   1296  O   TRP A 168      35.850  51.115  44.348  1.00 17.00           O
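as well as the same change for chain B, of course. Alternatively, the whole substitution (both chains) can be done with sed, deleting the fluorine atom and turning the FTR HETATM records into TRP ATOM records, as in the manual edit:
# delete the F atom of FTR, convert the FTR HETATM records to ATOM, rename FTR to TRP
sed -e '/ F   FTR /d' -e '/FTR/s/^HETATM/ATOM  /' -e 's/FTR/TRP/g' pdb1i45.ent > 1i45mod.pdb
sed -e '/ F   FTR /d' -e '/FTR/s/^HETATM/ATOM  /' -e 's/FTR/TRP/g' pdb1ney.ent > 1neymod.pdb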
3.2.2 Running an interactive MOLARIS session

Sourced /cbbl/soft/molaris/bin/.molaris_rc
Sourced /home/jvilla/.molaris_rc
Usage:
For interactive run, please press the Enter key.
For using input file on command line, please press the Enter key,
type quit, then type on the command line:
molaris < input_file_name
or:
molaris input_file_name

you may use command line options to read in alternative libraries:
molaris [-a amino_lib_name] [-p parm_lib_name] [-e evb_lib_name]
        [-s solvent_opt_name] [-o output_directory_name]
This message informs the user about the different ways of running MOLARIS. The user can run the program interactively, as we will do now, or prepare an input file with all the appropriate commands for running the calculation in the background. We press <Enter> and, after some information is printed, we are prompted for a PDB name. At this point we write the name of the coordinates file of the system we are interested in. MOLARIS accepts PDB and Mol2 formats, or a combination of them. In this case we type 1i45mod.pdb.
Initially, the program checks the coordinates file and looks for possible errors in it. If the file is OK, the program proceeds by comparing the residues of the coordinates file with the residues in the topology library, which is provided with the program and called amino98.lib in the current version. If the file contains a residue that is not in the library, a new entry is automatically added to this library.
After checking the file and writing the topology to a special file called $OUT_DIR/1i45mod.top, the program asks the user which task he/she wants to perform. In this case we choose ENZYMIX and the program displays the following table:
Table of the Keywords for the Enzymix Level
...........................................
keyword        modifier  example
-------        --------  -------
pre_enz        no        pre_enz
relax          no        relax
ac             no        ac
evb            no        evb
evb2           no        evb2
evb_ab         no        evb_ab
adiab_pot      no        adiab_pot
adiab_tem      no        adiab_tem
end            no        end
help           yes       help <keyword1> <keyword2> ...
help           yes       help all
help           no        help
exit/quit      no        exit
------------------------------------------------------------------------
Here you start to see that the MOLARIS package works as nested tasks, where every keyword follows a hierarchy of execution. Every time we finish a particular task, we must type end if we want to save the changes made, or exit if we made a mistake and want to quit without saving. In this particular case we want to perform a relaxation of the protein, so we select relax. The following table appears:
Table of the Keywords RELAX Level
.................................
keyword        modifier  example
-------        --------  ------
md_parm        no        md_parm
rest_in        yes       rest_in rest.in
rest_out       yes       rest_out rest.1
energy_out     yes       energy_out gap.out
end            no        end
help           yes       help <keyword1> <keyword2> ...
help           yes       help all
help           no        help
exit/quit      no        exit
Here we have several choices to make. If we just leave the level with end, the program will perform a relaxation using the default parameters for the MD calculation. Let us change those parameters before leaving the relax level. Typing md_parm takes us to the next level of the hierarchy, where all the possible choices are listed in the following table:
Table of the Keywords MD_PARM Level
...................................
All the parameters have default values but, say, we want a shorter run and want to change the temperature and the step size. To do so we type:
md_parm> nsteps 300
md_parm> temperature 200.
md_parm> stepsize 0.0002
md_parm> end
Closing the relax level with an additional end keyword starts the run. At the beginning, the relevant information is printed (different radii, coordinates of the center, number of solvent molecules generated...) and then the actual MD calculation starts, printing the values of the energies every 10 steps:
system - epot : 6513.23 ekin : 1574.62 etot : 8087.85
________________________________________________________________________
Constraint energy on region I: 0.00
In dynamics: Istep= 51 Temp= 205.25 Target= 200.00
(...)
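To follow a MOLARIS run, it can be convenient to extract the temperature and the total energy from the output. The minimal sketch below is based only on the line formats shown in the excerpt above ("In dynamics: Istep= ... Temp= ..." and "system - epot : ... etot : ..."); the function names are hypothetical and the patterns may need adjusting for other versions of the output.

```shell
# molaris_temp OUTFILE: print "Istep Temp" from lines such as
#   In dynamics: Istep= 51 Temp= 205.25 Target= 200.00
molaris_temp() {
  awk '/^ *In dynamics:/ {
    step = ""; temp = ""
    for (i = 1; i < NF; i++) {
      if ($i == "Istep=") step = $(i + 1)
      if ($i == "Temp=")  temp = $(i + 1)
    }
    print step, temp
  }' "$1"
}

# molaris_etot OUTFILE: print the successive total energies from lines such as
#   system - epot : 6513.23 ekin : 1574.62 etot : 8087.85
molaris_etot() {
  awk '/^ *system - epot/ {
    for (i = 1; i < NF - 1; i++) if ($i == "etot" && $(i + 1) == ":") print $(i + 2)
  }' "$1"
}
```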
If the decrease in temperature and step size is not enough to obtain a stable run, we can use a simple steepest descent minimization by choosing steep_mini 1 in the md_parm table. Once
rotate_axis 2.0 3.5 6.7 12.0 23.1 -2.3
prot_prot      no        prot_prot
viewmovie      no        viewmovie
viewpot        no        viewpot
vdwsurf        no        vdwsurf
makepdb        no        makepdb
makelib1       no        makelib1
dock           no        dock
add_memgrid    yes       add_memgrid 1.0 3.2 0.5 Y 10.0 20.0 10.0 1 1.0
end            no        end
help           yes       help <keyword1> <keyword2> ...
help           yes       help all
help           no        help
exit/quit      no        exit
We choose makepdb:
Table of the Keywords makepdb Level
...................................
The program executes the requested commands and can then be left by typing end twice.
At this point it is important to note the non-interactive way of running the program, which allows one to redirect the output. Try, for example:
molaris 1i45mod_relax.inp 1i45mod_relax.out
which puts the output in the file 1i45mod_relax.out in $OUT_DIR/1i45mod_relax.
Obviously, several runs can be concatenated in the input file when, for example, one needs to heat the system in several stages. For example, one can create the configuration file:
Exercise 3. Check the PDB created by the above scripts. What do you need to do to run a relaxation calculation including all the residues in the TIM dimer interface? Run such a calculation for the three systems 1I45, 1NEY and 1YPI. Plot the behavior of the total energy and the RMSD.
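For the RMSD plots requested in the exercises, a minimal sketch computes the Cα RMSD between two PDB files, matching atoms by chain ID and residue number. It assumes the two structures are already in the same reference frame: no superposition is performed, so structures that have moved as a whole must be fitted first (for example with VMD's measure fit). The function name ca_rmsd is hypothetical.

```shell
# ca_rmsd REF.pdb MOBILE.pdb: print "rmsd n_matched" over CA atoms matched
# by chain ID + residue number; NO superposition is performed
ca_rmsd() {
  awk '
    function key() { return substr($0, 22, 1) ":" (substr($0, 23, 4) + 0) }
    substr($0, 1, 6) ~ /^(ATOM  |HETATM)$/ && substr($0, 13, 4) == " CA " {
      x = substr($0, 31, 8) + 0; y = substr($0, 39, 8) + 0; z = substr($0, 47, 8) + 0
      k = key()
      if (FNR == NR) { rx[k] = x; ry[k] = y; rz[k] = z }   # first file: reference
      else if (k in rx) {                                    # second file: compare
        d2 += (x - rx[k])^2 + (y - ry[k])^2 + (z - rz[k])^2; n++
      }
    }
    END { if (n) printf "%.3f %d\n", sqrt(d2 / n), n }
  ' "$1" "$2"
}
```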
3.3 MD with ADUN
ADUN is a program based on the Cocoa/NextStep frameworks. These provide excellent tools for building a graphical user interface, and you can download the latest version of the program from the ADUN GNA site: https://gna.org/projects/adun/. In this session we will use the command-line version of ADUN, as some of the calculations to be done are still experimental (in particular the LIE implementation in Section 4.2).
3.3.1 Preparing the PDB
Again, we need to clean the PDB file before using it with ADUN, analogously to what was done before:
# download pdbs
wget http://www.pdb.org/pdb/files/1I45.pdb
# transform FTR to TRP and delete what is not needed (waters)
sed 's/FTR/TRP/g' 1I45.pdb | grep -v HOH > temp.pdb
# HETATM -> ATOM, padded to six characters so that the PDB columns stay aligned
sed 's/^HETATM/ATOM  /' temp.pdb > 1i45mod.pdb
File 1i45mod.pdb has multiple models: delete all of them except the one you want to use, and strip the water as well. Then clean the PDB, renumber it, and add hydrogen atoms with reduce [Word et al., 1999]. Finally, clean it again, fixing the hydrogens, capping the protein and taking care of histidine naming:
/cbbl/soft/adun/sharedapps/repairPDB.py 1i45mod.pdb numb clean
reduce -BUILD 1i45mod_fixed.pdb > 1i45mod_reduced.pdb
/cbbl/soft/adun/sharedapps/repairPDB.py 1i45mod_reduced.pdb clean hyd cap his
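The first step above, keeping a single model, can be done with a short awk filter. A minimal sketch, assuming standard MODEL/ENDMDL records: records outside any MODEL block (headers, END) are kept, and the MODEL/ENDMDL lines themselves are dropped. The function name keep_model is hypothetical.

```shell
# keep_model PDBFILE N: print the records of MODEL N plus everything outside
# MODEL/ENDMDL blocks; the MODEL/ENDMDL lines themselves are not printed
keep_model() {
  awk -v want="$2" '
    /^MODEL/  { inmodel = 1; cur = $2 + 0; next }
    /^ENDMDL/ { inmodel = 0; next }
    !inmodel || cur == want + 0
  ' "$1"
}
```

Usage: keep_model 1i45mod.pdb 1 > tmp.pdb && mv tmp.pdb 1i45mod.pdb, before the repairPDB.py step.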
Now we can build the ADUN datasources for each of the systems⁵. The build may complain about two atoms (the two fluorines); this can be safely ignored. There is a known issue with the builder script: in around 10% of cases it mysteriously crashes with a segmentation fault. Just rerun it and it will work.
/cbbl/soft/adun/chile/script \
    Builder.st \
    1i45mod_reduced_fixed.pdb \
    Amber96
3.4 Running simulations with ADUN
Make a separate directory for the simulation and put the datasource there:
mkdir 1i45
cp 1i45mod_dimer.datasource 1i45/
⁵ Datasources, or systems, are the main objects in ADUN. Check http://lavandula.imim.es/adun-new/?page_id=294 for more details.
Prepare a template file by editing copies of /cbbl/soft/adun/resources/template.temp that you will place in each directory. The original file template.temp already has sensible settings. However, <DATASOURCE> must be replaced by 1i45mod_reduced_fixed.datasource, and <NUMBER_OF_STEPS> needs to be set to a sensible value, expressed in femtoseconds.
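The two placeholders can be filled in with sed instead of an editor. A minimal sketch; the function name fill_template is hypothetical, and the step count in the usage line is only an illustration. The | delimiter is used so that datasource names containing / still work (names containing | or & would need escaping).

```shell
# fill_template TEMPLATE DATASOURCE NSTEPS: print TEMPLATE with the
# <DATASOURCE> and <NUMBER_OF_STEPS> placeholders substituted
fill_template() {
  sed -e "s|<DATASOURCE>|$2|g" -e "s|<NUMBER_OF_STEPS>|$3|g" "$1"
}
```

Usage: fill_template /cbbl/soft/adun/resources/template.temp 1i45mod_reduced_fixed.datasource 100000 > 1i45/template.temp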
In order to run the simulations on the CBBL cluster, we provide a useful script that does most of the job for you:
# prepare a cluster file
cp /cbbl/soft/adun/resources/cluster.ini 1i45/
# go into all directories and edit the cluster.ini:
# put in a nice name and a queue (e.g. cbbl)
# start simulations
/cbbl/soft/adun/chile/cluster /cbbl/users/scratch/chile/1i45
3.4.1 Analyzing the results
One of the powerful characteristics of ADUN is its ability to extract results from the simulations tobe analyzed using diverse algorithms.
RMSD analysis. We can run the RMSD plugin using the alpha carbons only:
/cbbl/soft/adun/chile/script \
    RMSD.st \
    /cbbl/users/scratch/chile/1i45/simulation1/CHILE.simulation \
    1i45mod_reduced_fixed \
    @CA
Extract energies and trajectories. As ADUN is oriented towards free energy calculations, the analysis of the energetics of the system along the MD trajectory is critical. The following command extracts the energies from frame 0 up to frame 1000, taking every second frame:
/cbbl/soft/adun/chile/resultsConverter \
    -Mode Energy \
    -Simulation /cbbl/users/scratch/chile/1i45/simulation1/CHILE.simulation \
    -Start 0 \
    -Length 1000 \
    -StepSize 2
Using the resultsConverter tool, one can also extract a series of PDBs, so that the trajectory can be viewed in, e.g., VMD:
/cbbl/soft/adun/chile/resultsConverter \
    -Mode Configuration \
    -Simulation /cbbl/users/scratch/chile/1i45/simulation1/CHILE.simulation \
    -Start 0 \
    -Length 1000 \
    -StepSize 2
Essential dynamics. To obtain the essential modes of the system, using the alpha carbons only, we use the corresponding ED.st script:
/cbbl/soft/adun/chile/script \
    ED.st \
    /cbbl/users/scratch/chile/1i45/simulation1/CHILE.simulation \
    1i45mod_dimer \
    @CA
Exercise 4. Follow the same procedure to run an MD simulation for 1NEY. 1NEY contains a 13P ligand; strip it for now (we will come back to how to incorporate the ligand later). Create RMSD plots and a movie of the MD run for 1NEY.
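Stripping the 13P ligand (and the waters) for Exercise 4 amounts to dropping the records with a given residue name. A minimal sketch that works on the residue-name columns (18-20) of the PDB format rather than on free-text matches; the function name drop_resname and the output file name in the usage line are hypothetical.

```shell
# drop_resname PDBFILE NAME...: print PDBFILE without the ATOM/HETATM records
# whose residue name (columns 18-20) is one of NAME...
drop_resname() {
  local file=$1
  shift
  awk -v names="$*" '
    BEGIN { n = split(names, list, " "); for (i = 1; i <= n; i++) drop[list[i]] = 1 }
    (substr($0, 1, 6) == "ATOM  " || substr($0, 1, 6) == "HETATM") && (substr($0, 18, 3) in drop) { next }
    { print }
  ' "$file"
}
```

Usage: drop_resname 1NEY.pdb 13P HOH > 1ney_apo.pdb, followed by the same FTR/HETATM clean-up as for 1I45.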
4 Practical Session II: Solvation, pKa, FEP
4.1 Running solvation and pKa simulations using PDLD/S-LRA
The next task is the calculation of the pKa shifts of the residues in the TIM interface. To do so, we choose the polaris task, and the program displays a table of the options for polaris:
Table of the Keywords for the Polaris Level
...........................................
keyword        modifier  example
-------        --------  -------
pre_pol        no        pre_pol
solv_pdld      no        solv_pdld
solv_pdld_evb  no        solv_pdld_evb
solv_fep       no        solv_fep
ai_pdld        no        ai_pdld
bind_pdld      no        bind_pdld
bind_pdld_evb  no        bind_pdld_evb
bind_fep       no        bind_fep
pka_pdld       no        pka_pdld
pka_fep        no        pka_fep
redox_pdld     no        redox_pdld
redox_fep      no        redox_fep
logp           no        logp
titra_ph_0     no        titra_ph_0
titra_ph       no        titra_ph
pka_multi      no        pka_multi
evb_pdld       no        evb_pdld
prot_prot      no        prot_prot
end            no        end
help           yes       help <keyword1> <keyword2> ...
help           yes       help all
help           no        help
exit/quit      no        exit
We will choose pka_pdld, and the program will prompt:
Table of the Keywords for the pKa_pdld Level
............................................
keyword        modifier  example
-------        --------  -------
reg1_res       yes       reg1_res 2
pka_w          yes       pka_w 3.0
pdld_fn        yes       pdld_fn asp.pdld
reg1_atm       yes       reg1_atm 10 to 20
ab_crg         yes       ab_crg 10 0.50 0.0
regII_r        yes       regII_r 16.0
config         yes       config 0 5
use_restart    no        use_restart
md_parm_r      no        md_parm_r
md_parm_w      no        md_parm_w
md_parm_p      no        md_parm_p
help           yes       help <keyword1> <keyword2> ...
help           no        help
end            no        end
exit/quit      no        exit
Next, we choose residue 137 as our region I. We also set the number of configurations to run and the characteristics of the dynamics. Thus, we tell the program to run the calculation on the initial protein structure and on 2 more conformations, which will be generated automatically by MD runs.
To run the calculation, we simply end the level. The final result of the pKa calculation for this very simple (and, because of the short run, of course unreliable) test is:
PDLD SEMI-MACROSCOPIC ESTIMATE FOR pKa
......................................
where pKa_int is the intrinsic pKa, due to the self-energy of the system, while the estimated apparent pKa also includes the charge-charge contribution (see the course slides for details).
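The pKa shift follows from the thermodynamic cycle that transfers the protonated (AH) and ionized (A⁻) forms of the residue from water to the protein. For an acid, writing ΔG^{w→p}(X) for the free energy of moving form X from water into the protein (the proton is released to bulk water in both environments), one common form is

$$\Delta \mathrm{p}K_a = \mathrm{p}K_a^{p} - \mathrm{p}K_a^{w} = \frac{\Delta G^{w\rightarrow p}(\mathrm{A}^-) - \Delta G^{w\rightarrow p}(\mathrm{AH})}{2.303\,RT}$$

so that destabilizing the ionized form in the protein raises the pKa. Sign conventions in MOLARIS output may differ. At 298.15 K, 2.303 RT is about 1.364 kcal/mol, i.e. roughly one pKa unit per 1.364 kcal/mol. A minimal sketch of the conversion (the function name pka_shift is hypothetical):

```shell
# pka_shift DDG [T]: convert a free-energy difference DDG (kcal/mol) into
# pKa units, DDG / (R T ln 10), with R = 1.987204e-3 kcal/(mol K);
# T defaults to 298.15 K
pka_shift() {
  awk -v ddg="$1" -v t="${2:-298.15}" 'BEGIN {
    R = 1.987204e-3
    printf "%.2f\n", ddg / (R * t * log(10))
  }'
}
```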
Exercise 5. Find the pKa shifts for all residues in the interface of the TIM dimer. Use the prot_prot keyword at the analyze level as a guide.
Exercise 6. Recall that the solv_pdld keyword at the polaris level is in fact a simplified version of the thermodynamic cycle for pKa shift calculations. Based on this fact, check the stability of the loop 6 residues in the three structures 1I45, 1NEY and 1YPI and discuss the results. See, e.g., [Bonet et al., 2006, Scheper et al., 2009].
4.2 LIE runs with ADUN

Next we will evaluate the absolute binding free energy of a ligand to the two structures by means of the linear interaction energy (LIE) method of Åqvist and coworkers [Hansson et al., 1998]. We are going to run an LIE calculation using PGH as the ligand for the TIM structures.⁶
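In the LIE approach, the binding free energy is estimated from the average van der Waals and electrostatic interaction energies between the ligand (l) and its surroundings (s), collected in two simulations: the ligand bound to the protein and the free ligand in water (here, the TIM+PGH and PGH-alone runs):

$$\Delta G_{bind} = \alpha\left(\langle V^{vdW}_{l-s}\rangle_{bound} - \langle V^{vdW}_{l-s}\rangle_{free}\right) + \beta\left(\langle V^{el}_{l-s}\rangle_{bound} - \langle V^{el}_{l-s}\rangle_{free}\right) + \gamma$$

A minimal sketch of this final arithmetic; the function name lie_dg is hypothetical, and the defaults α = 0.18, β = 0.5, γ = 0 are only commonly quoted values (β in particular is usually chosen according to the chemistry of the ligand), so check the coefficients actually used by the ADUN LIE.st script.

```shell
# lie_dg VDW_BOUND ELE_BOUND VDW_FREE ELE_FREE [ALPHA] [BETA] [GAMMA]
# Inputs are average ligand-surroundings energies in kcal/mol.
lie_dg() {
  awk -v vb="$1" -v eb="$2" -v vf="$3" -v ef="$4" \
      -v a="${5:-0.18}" -v b="${6:-0.5}" -v g="${7:-0}" 'BEGIN {
    printf "%.3f\n", a * (vb - vf) + b * (eb - ef) + g
  }'
}
```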
First, we build PDB files from XXXXmod_reduced_fixed containing the protein plus the ligand (after docking with AutoDock, for example). We will call these files XXXXmod_complex.pdb.
Then, we build datasources for the TIM+PGH complexes and for PGH alone, as in Section 3.3.1. Make sure that the PGH moiety bears the same chain label (C, here) in all PDB files.
/cbbl/soft/adun/chile/script Builder.st 1neymod_dimere.pdb Amber96
/cbbl/soft/adun/chile/script Builder.st 1i45mod_dimere.pdb Amber96
/cbbl/soft/adun/chile/script Builder.st PGH.pdb Amber96
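Since the PGH moiety must carry chain label C in every file, a minimal sketch that assembles a complex file from the protein PDB and a docked ligand PDB, forcing the ligand's chain identifier (column 22) to C; the function names set_chain and make_complex are hypothetical.

```shell
# set_chain PDBFILE CHAIN: rewrite column 22 (chain ID) of ATOM/HETATM records
set_chain() {
  awk -v c="$2" '
    /^(ATOM  |HETATM)/ { $0 = substr($0, 1, 21) c substr($0, 23) }
    { print }
  ' "$1"
}

# make_complex PROTEIN LIGAND: protein records (without END), then the ligand
# coordinate records relabelled to chain C, then a single END record
make_complex() {
  awk '!/^END/' "$1"
  set_chain "$2" C | awk '/^(ATOM  |HETATM|TER)/'
  echo END
}
```

Usage: make_complex 1neymod_reduced_fixed.pdb PGH.pdb > 1neymod_complex.pdb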
Finally, the LIE runs are done by:
/cbbl/soft/adun/chile/script LIE.st \
    /cbbl/users/scratch/chile/1ney/simulation1/CHILE.simulation \
    /cbbl/users/scratch/chile/pgh/simulation1/CHILE.simulation \
    C
/cbbl/soft/adun/chile/script LIE.st \
    /cbbl/users/scratch/chile/1i45/simulation1/CHILE.simulation \
    /cbbl/users/scratch/chile/pgh/simulation1/CHILE.simulation \
    C
Exercise 7. Calculate the absolute binding free energy of DHAP in both proteins (the 13P HETATM structure in 1NEY).
5 Practical Session III: enzymatic reactivity with EVB
5.1 EVB for enzymatic reactivity analysis
In order to study an enzymatic reaction, we should compare the reaction mechanism in the protein with the corresponding reaction in water. Thus, most of the time we are interested in comparing the free energy profiles in both environments. We will see here how to run a simulation in the protein. To run it in water, you should modify the PDB file and the input files below.
First, we need to define the resonance states we are going to explore, following Åqvist [Åqvist and Fothergill, 1996].
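In a two-state EVB description, each resonance state i has a diabatic energy H_11 or H_22, the two states are coupled by an off-diagonal term H_12, and the ground-state (adiabatic) energy is the lowest eigenvalue of the 2×2 EVB Hamiltonian:

$$E_g = \frac{H_{11} + H_{22}}{2} - \sqrt{\left(\frac{H_{11} - H_{22}}{2}\right)^2 + H_{12}^2}$$

A minimal sketch of this formula, assuming a constant coupling H_12 supplied by the user (in practice H_12 and the gas-phase shift are calibrated on the reference reaction in water). The function name evb_ground is hypothetical, and this is not the MOLARIS implementation.

```shell
# evb_ground H11 H22 H12: lowest eigenvalue of the 2x2 matrix [[H11, H12], [H12, H22]]
evb_ground() {
  awk -v h11="$1" -v h22="$2" -v h12="$3" 'BEGIN {
    mean = (h11 + h22) / 2
    half = (h11 - h22) / 2
    printf "%.3f\n", mean - sqrt(half * half + h12 * h12)
  }'
}
```

With H_12 = 0 the result is simply the lower of the two diabatic energies; a non-zero coupling lowers the ground state most where the two diabatic curves cross.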
end
end
# now we check the numbering of the atoms for defining region I
analyze
resatom 1
resatom 2
resatom 3
end
After the end command, the program starts the FEP protocol according to the settings above. The final result of the program is a set of *.map files, one for each frame.
Exercise 8. Design the EVB run for the water system and execute both runs on the CBBL cluster.
A Running in the luke cluster
The luke cluster is going to be used for all expensive runs with NAMD as well as for all runs with MOLARIS and ADUN. To connect, use:
ssh <username>@arbutus.imim.es
This will bring you to your /home/<username> directory. To submit calculations, first change to the /homes/users/<username> directory, which is shared by all the nodes in the cluster. Otherwise, your calculations will fail.
The luke cluster uses the Sun Grid Engine (SGE) queuing system. In order to use it, first add this line to your $HOME/.bashrc file:
source /cbbl/soft/sge6.2u2_1/default/common/settings.sh
The most important SGE commands are qsub, qstat and qdel. Use the Unix man command to get help on each of them. You can also find information on several web sites (see, e.g., http://www.ats.ucla.edu/clusters/common/computing/batch/sge.htm).
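As an illustration of how qsub is typically used, the sketch below writes a minimal SGE job script for a serial NAMD run. The job name namd_1i45, the file name namd_1i45.sge and the input/output names are hypothetical placeholders. The #$ lines are standard SGE options (-N job name, -S shell, -cwd run in the submission directory, -j y merge stdout and stderr, -o output file); queue or parallel-environment settings specific to luke are not covered.

```shell
#!/usr/bin/env bash
# Writes a minimal SGE job script for a serial NAMD run.
# Hypothetical names: job "namd_1i45", script "namd_1i45.sge",
# input "1i45_ws_eq.conf", output "1i45_ws_eq.out"; adapt them to your run.
cat > namd_1i45.sge <<'EOF'
#!/bin/bash
#$ -N namd_1i45
#$ -S /bin/bash
#$ -cwd
#$ -j y
#$ -o namd_1i45.log
# Batch jobs do not necessarily read ~/.bashrc, so set PATH explicitly
export PATH=/cbbl/soft/NAMD_2.6_Linux-amd64/:$PATH
namd2 1i45_ws_eq.conf > 1i45_ws_eq.out
EOF
```

From /homes/users/<username>, submit the job with qsub namd_1i45.sge, follow it with qstat and, if needed, remove it with qdel <job_id>. The quoted 'EOF' keeps $PATH unexpanded, so it is evaluated on the compute node rather than when the file is written.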
Running ADUN on the luke cluster is done using special scripts, as described in the corresponding sections.
B Extra material for NAMD
B.1 Set up
In this hands-on class you are expected to use a local NAMD installation. If this is not possible, or you need to run on the cluster, add the following line to your $HOME/.bashrc file:
export PATH=/cbbl/soft/NAMD_2.6_Linux-amd64/:$PATH
B.2 VMD: wat_sphere.tcl
### Script to immerse TIM in a sphere of water just large enough
### to cover it. $max is the radius of the protein
### Adapted from the NAMD tutorial

set molname 1i45

mol new ${molname}.psf
mol addfile ${molname}.pdb

### Determine the center of mass of the molecule and store the coordinates
set cen [measure center [atomselect top all] weight mass]
set x1 [lindex $cen 0]
set y1 [lindex $cen 1]
set z1 [lindex $cen 2]

### Determine the distance of the farthest atom from the center of mass
### ($max must be initialized before the loop compares against it)
set max 0
foreach atom [[atomselect top all] get index] {
    set pos [lindex [[atomselect top "index $atom"] get {x y z}] 0]
    set x2 [lindex $pos 0]
    set y2 [lindex $pos 1]
    set z2 [lindex $pos 2]
    set dist [expr {pow(($x2-$x1)*($x2-$x1) + ($y2-$y1)*($y2-$y1) + ($z2-$z1)*($z2-$z1), 0.5)}]
    if {$dist > $max} {set max $dist}
}
mol delete top

### Solvate the molecule in a water box with enough padding (15 A).
### One could alternatively align the molecule such that the vector
### from the center of mass to the farthest atom is aligned with an axis,
### and then use no padding
package require solvate
solvate ${molname}.psf ${molname}.pdb -t 15 -o del_water

resetpsf
package require psfgen
mol new del_water.psf
mol addfile del_water.pdb
readpsf del_water.psf
coordpdb del_water.pdb

### Determine which water molecules need to be deleted and use a for loop
### to delete them
set wat [atomselect top "same residue as {water and ((x-$x1)*(x-$x1) + (y-$y1)*(y-$y1) + (z-$z1)*(z-$z1)) < ($max*$max)}"]
set del [atomselect top "water and not same residue as {water and ((x-$x1)*(x-$x1) + (y-$y1)*(y-$y1) + (z-$z1)*(z-$z1)) < ($max*$max)}"]
set seg [$del get segid]
set res [$del get resid]
set name [$del get name]
for {set i 0} {$i < [llength $seg]} {incr i} {
    delatom [lindex $seg $i] [lindex $res $i] [lindex $name $i]
}

writepsf ${molname}_ws.psf
writepdb ${molname}_ws.pdb

mol delete top

mol new ${molname}_ws.psf
mol addfile ${molname}_ws.pdb
puts "CENTER OF MASS OF SPHERE IS: [measure center [atomselect top all] weight mass]"
puts "RADIUS OF SPHERE IS: $max"
mol delete top
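To cross-check the radius printed by the script, the hypothetical shell helper pdb_extent below (not part of VMD or NAMD) reads the ATOM/HETATM coordinates of a PDB file with awk and reports the centre and the distance to the farthest atom. One assumption to keep in mind: it uses the unweighted geometric centre, whereas the script above weights by mass, so the two results will differ slightly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of VMD/NAMD): pdb_extent FILE.pdb
# Prints the unweighted geometric centre of all ATOM/HETATM records and the
# distance from that centre to the farthest atom (in the PDB units, i.e. A).
pdb_extent() {
  awk '
    /^(ATOM|HETATM)/ {
      n++
      x[n] = substr($0, 31, 8) + 0   # x: PDB columns 31-38
      y[n] = substr($0, 39, 8) + 0   # y: PDB columns 39-46
      z[n] = substr($0, 47, 8) + 0   # z: PDB columns 47-54
      sx += x[n]; sy += y[n]; sz += z[n]
    }
    END {
      if (n == 0) { print "pdb_extent: no atoms found" > "/dev/stderr"; exit 1 }
      cx = sx / n; cy = sy / n; cz = sz / n
      rmax = 0
      for (i = 1; i <= n; i++) {
        d = sqrt((x[i] - cx)^2 + (y[i] - cy)^2 + (z[i] - cz)^2)
        if (d > rmax) rmax = d
      }
      printf "center %.3f %.3f %.3f radius %.3f\n", cx, cy, cz, rmax
    }' "$1"
}
```

For example, pdb_extent 1i45.pdb prints one line with the centre coordinates and the radius, which should be close to the values reported by wat_sphere.tcl.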
B.3 VMD: sod2pot.tcl
#!/usr/local/bin/vmd -dispdev text
# replacing Na+ with K+ (or anything else with anything else)
# adapted from the original file from
# Ilya Balabin (ilya@ks.uiuc.edu), 2002-2003

# define input files here
set psffile "1i45_wb_NaCl.psf"
set pdbfile "1i45_wb_NaCl.pdb"
set prefix "1i45_wb_KCl"

# define what ions to replace with what ions
set ionfrom "SOD"
set ionto "POT"

# do not change anything below this line
package require psfgen
topology top_all27_prot_lipid.rtf

puts "\nSod2pot) Reading ${psffile}/${pdbfile}..."
resetpsf
readpsf $psffile
coordpdb $pdbfile
mol load psf $psffile pdb $pdbfile

set sel [atomselect top "name $ionfrom"]
set poslist [$sel get {x y z}]
set seglist [$sel get segid]
set reslist [$sel get resid]
set num [llength $reslist]
puts "Sod2pot) Found ${num} ${ionfrom} ions to replace..."
set num 0
foreach segid $seglist resid $reslist {
    delatom $segid $resid
    incr num
}
puts "Sod2pot) Deleted ${num} ${ionfrom} ions"
segment $ionto {
    first NONE
    last NONE
    foreach res $reslist {
        residue $res $ionto
    }
}
set num [llength $reslist]
puts "Sod2pot) Created ${num} topology entries for ${ionto} ions"
set num 0
foreach xyz $poslist res $reslist {
    coord $ionto $res $ionto $xyz
    incr num
}
puts "Sod2pot) Set coordinates for ${num} ${ionto} ions"
writepsf "${prefix}.psf"
writepdb "${prefix}.pdb"
puts "Sod2pot) Wrote ${prefix}.psf/${prefix}.pdb"
puts "Sod2pot) All done."
quit
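A quick sanity check after running the script (for instance with vmd -dispdev text -e sod2pot.tcl) is that the number of POT atoms in the output equals the number of SOD atoms in the input, and that no SOD atoms remain. The helper count_atom_name below is a hypothetical sketch, not part of VMD or psfgen; it counts ATOM/HETATM records whose atom name (PDB columns 13-16) matches the given name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count_atom_name FILE.pdb NAME
# Prints how many ATOM/HETATM records have atom name NAME (columns 13-16).
count_atom_name() {
  awk -v want="$2" '
    /^(ATOM|HETATM)/ {
      name = substr($0, 13, 4)   # atom name field
      gsub(/ /, "", name)        # strip the padding blanks
      if (name == want) n++
    }
    END { print n + 0 }' "$1"
}
```

Usage: count_atom_name 1i45_wb_NaCl.pdb SOD and count_atom_name 1i45_wb_KCl.pdb POT should print the same number, while count_atom_name 1i45_wb_KCl.pdb SOD should print 0.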
B.4 VMD: 1i45_ws_eq.conf
# Minimization and Equilibration of TIM in a water sphere

# Constant Temperature Control
langevin            on    ;# do langevin dynamics
langevinDamping     5     ;# damping coefficient (gamma) of 5/ps
langevinTemp        $temperature
langevinHydrogen    off   ;# don't couple langevin bath to hydrogens
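The langevin keywords above switch on NAMD's Langevin thermostat. As a sketch of what the parameters mean, assuming the standard Langevin equation of motion rather than NAMD's exact integrator, each coupled atom i feels a friction force and a random force whose strength is tied to langevinTemp through the fluctuation-dissipation relation:

```latex
m_i \dot{\mathbf{v}}_i = \mathbf{F}_i - \gamma\, m_i \mathbf{v}_i + \mathbf{R}_i(t),
\qquad
\langle R_{i\alpha}(t)\, R_{j\beta}(t') \rangle
  = 2 \gamma\, m_i k_B T\, \delta_{ij}\, \delta_{\alpha\beta}\, \delta(t - t')
```

With langevinDamping 5, γ = 5 ps⁻¹, i.e. a velocity relaxation time of 1/γ = 0.2 ps; with langevinHydrogen off, hydrogen atoms receive neither the friction nor the random force.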
B.5 VMD: 1i45_wb_eq.conf
# Constant Temperature Control
langevin            on    ;# do langevin dynamics
langevinDamping     5     ;# damping coefficient (gamma) of 5/ps
langevinTemp        $temperature
langevinHydrogen    off   ;# don't couple langevin bath to hydrogens
C Extra material for ADUN
C.1 Set up
Add these lines to your $HOME/.bashrc file:
export LD_LIBRARY_PATH=/cbbl/soft/adun/chile/GNUstep/Library/Libraries:\
/soft/lib:/cbbl/soft/GNUstep/System/Library/Libraries:\
/cbbl/soft/adun/lib/lib:/cbbl/soft/OMPI/lib
export HOMEPATH=$HOME
export PATH=/cbbl/soft/reduce-3.13/:$PATH
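If ADUN fails to start with missing-library errors, a first thing to check is whether every directory listed in these variables exists on the machine you are using. The helper check_dirs below is a hypothetical sketch (not an ADUN tool): it splits a colon-separated path list and reports entries that are not existing directories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_dirs "DIR1:DIR2:..."
# Prints "missing: DIR" for every entry that is not an existing directory.
# Returns 0 if all entries exist, 1 otherwise.
check_dirs() {
  local -a dirs
  local dir status=0
  IFS=: read -r -a dirs <<< "$1"   # split on colons only
  for dir in "${dirs[@]}"; do
    if [ -n "$dir" ] && [ ! -d "$dir" ]; then
      echo "missing: $dir"
      status=1
    fi
  done
  return "$status"
}
```

After sourcing your ~/.bashrc, run check_dirs "$LD_LIBRARY_PATH" and check_dirs "$PATH"; any reported entry points to a line that needs correcting.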
D Additional tools
NAMD
• A useful interface to NAMD runs can be found at http://mmtsb.org/workshops/mmtsb-ctbp_workshop_2009/Tutorials/MMTSB_NAMDSimulation/MMTSB_NAMDSimulation.html
• An extensive example of a standard protocol for minimization, heating and production simulations with NAMD is provided here: http://faculty.uml.edu/vbarsegov/teaching/bioinformatics/lectures/MDSimulationsModified.pdf
• Running replica exchange simulations with NAMD: http://www.ks.uiuc.edu/Research/namd/2.6/ug/node40.html
• NAMD case studies: http://www.ks.uiuc.edu/Training/CaseStudies/
MOLARIS
• The complete MOLARIS tutorials: http://cbbl.imim.es/?page_id=143#molaris
ADUN
• The Adun site: http://adun.imim.es
• The Adun install guide can be found here: http://lavandula.imim.es/adun-new/?page_id=103
• What to check if something goes wrong with Adun: http://lavandula.imim.es/adun-new/?page_id=308
• Experimental Live CD including an ADUN distribution: http://susegallery.com/a/hvXWpn/adun-user
References
[Åqvist and Fothergill, 1996] Åqvist, J. and Fothergill, M. (1996). Computer Simulation of the Triosephosphate Isomerase Catalyzed Reaction. 271(17):10010–10016.
[Bonet et al., 2006] Bonet, J., Caltabiano, G., Khan, A., Johnstons, M., Corbí, C., Gómez, A., Rovira, X., Teyra, J., and Villà-Freixa, J. (2006). The Role of Residue Stability in Transient Protein-Protein Interactions Involved in Enzymatic Phosphate Hydrolysis. A Computational Study. 63:65–77.
[Chu et al., 2003] Chu, Z. T., Villà-Freixa, J., Štrajbl, M., Schutz, C. N., Shurki, A., and Warshel, A. (2003). MOLARIS version alpha9.06.01.
[Hansson et al., 1998] Hansson, T., Marelius, J., and Åqvist, J. (1998). Ligand-binding affinity prediction by linear interaction energy methods. J. of Comput-Aided Mol. Design, 12(1):27–35.
[Humphrey et al., 1996] Humphrey, W., Dalke, A., and Schulten, K. (1996). VMD: visual molecular dynamics. J Mol Graph, 14(1):33–8, 27–8.
[Jogl et al., 2003] Jogl, G., Rozovsky, S., McDermott, A. E., and Tong, L. (2003). Optimal alignment for enzymatic proton transfer: Structure of the Michaelis complex of triosephosphate isomerase at 1.2-Å resolution. Proceedings of the National Academy of Sciences of the United States of America, 100(1):50–55.
[Johnston et al., 2005] Johnston, M. A., Galvan, I. F., and Villà-Freixa, J. (2005). Framework-based design of a new all-purpose molecular simulation application: the Adun simulator. J Comput Chem, 26(15):1647–1659.
[Lee et al., 1993] Lee, F., Chu, Z., and Warshel, A. (1993). Microscopic and semimicroscopic calculations of electrostatic energies in proteins by the POLARIS and ENZYMIX programs. Journal of Computational Chemistry, 14(2):161–185.
[Lolis et al., 1990] Lolis, E., Alber, T., Davenport, R. C., Rose, D., Hartman, F. C., and Petsko, G. A. (1990). Structure of yeast triosephosphate isomerase at 1.9-Å resolution. Biochemistry, 29(28):6609–6618.
[Phillips et al., 2005] Phillips, J. C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel, R. D., Kalé, L., and Schulten, K. (2005). Scalable molecular dynamics with NAMD. J Comput Chem, 26(16):1781–1802.
[Rozovsky et al., 2001] Rozovsky, S., Jogl, G., Tong, L., and McDermott, A. E. (2001). Solution-state NMR investigations of triosephosphate isomerase active site loop motion: ligand release in relation to active site loop dynamics. Journal of Molecular Biology, 310(1):271–280.
[Scheper et al., 2009] Scheper, J., Oliva, B., Villà-Freixa, J., and Thomson, T. M. (2009). Analysis of electrostatic contributions to the selectivity of interactions between ring-finger domains and ubiquitin-conjugating enzymes. Proteins, 74(1):92–103.
[Warshel and King, 1985] Warshel, A. and King, G. (1985). Polarization Constraints in Molecular Dynamics Simulation of Aqueous Solutions: The Surface Constraint All Atom Solvent (SCAAS) Model. 121:124–9.
[Word et al., 1999] Word, J., Lovell, S., Richardson, J., and Richardson, D. (1999). Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. Journal of Molecular Biology, 285(4):1735–1747.