Cellulose-Builder: A Toolkit for Building Crystalline Structures of Cellulose
Thiago C. F. Gomes[a] and Munir S. Skaf*[a]

Cellulose-builder is a user-friendly program that builds crystalline structures of cellulose of different sizes and geometries. The program generates Cartesian coordinates for all atoms of the specified structure in the Protein Data Bank format, suitable for use as starting configurations in molecular dynamics simulations and other calculations. Crystalline structures of cellulose polymorphs Iα, Iβ, II, and III_I of practically any size are readily constructed, including parallelepipeds, plant cell wall cellulose elementary fibrils of any length, and monolayers. Periodic boundary conditions along the crystallographic directions are easily imposed. The program also generates an atom connectivity file in PSF format, required by well-known simulation packages such as NAMD, CHARMM, and others. Cellulose-builder is based on the Bash programming language and should run on practically any Unix-like platform, demands very modest hardware, and is freely available for download from ftp://ftp.iqm.unicamp.br/pub/cellulose-builder. © 2012 Wiley Periodicals, Inc.
DOI: 10.1002/jcc.22959

Introduction
Cellulose has recently attracted a great deal of attention due to its potential to become a carbon-neutral feedstock for renewable biofuels and chemicals. As a major component of vegetable biomass, cellulose is the most abundant organic compound in Earth's biosphere. Much effort has been devoted to comprehending the structure and properties of cellulose itself,[1–8] and to understanding the microscopic nature of plant cell wall architecture and the molecular aspects associated with its structural strength.[9]
One of the major challenges in the development of a sustainable means of obtaining biofuels and other valuable chemicals from lignocellulosic biomass is the recalcitrance of cellulose to the action of degrading enzymes and chemicals.[10] The deconstruction of lignocellulose biopolymers into fermentable sugars by means of enzymatic saccharification is the most economically costly and scientifically challenging step of the currently available process for biochemical conversion of biomass into liquid fuels. Therefore, it is of fundamental importance to gain further understanding of the mechanisms by which enzymes and auxiliary proteins recognize, bind to, and disrupt crystalline cellulose for subsequent cleavage of the glycosidic bonds of the polysaccharide chains. To this aim, several molecular dynamics (MD) computer simulations have been reported recently which utilize three-dimensional crystalline cellulose structures or surfaces as model substrates, in addition to the proteins of interest.[7,11–14] Very recently, MD simulations of the interactions between cellulose and ionic liquids have also been reported[15] in an attempt to deepen our molecular-level understanding of how ionic liquids dissolve crystalline cellulose.[16,17]
These simulation studies share the need for the atomic coordinates of the cellulose crystal structures as initial configurations, which were independently constructed according to the structure of the desired substrate. As scientific activity in this area is rapidly increasing, it would be very useful to theoretical and computational chemists and physicists alike to be able to readily construct crystalline structures of cellulose of different sizes and shapes. In this work, we present cellulose-builder, an automated solution for generating atomic coordinate files in the Protein Data Bank (PDB) format that can be readily used as input to simulate systems containing structures of crystalline cellulose of practically any size, shape, and dimension. The code is freely available for download at ftp://ftp.iqm.unicamp.br/pub/cellulose-builder.
Cellulose-builder is written as a Bash script and relies on several well-established tools available on most Unix-like operating systems and on the VMD package[18] to provide an automated, straightforward, and user-friendly means of generating cellulose crystals of different shapes and sizes. Cellulose-builder only requires users to enter three integers (i, j, k) corresponding to the number of cellulose unit cells to be replicated in each crystallographic direction (a, b, c). The script then performs all operations needed to build a crystal of the chosen size and produces the initial configuration file in PDB format. In addition, cellulose-builder outputs the corresponding atom connectivity (topology) information as a PSF file, which contains molecule-specific information required by some of the most popular MD simulation packages, including NAMD[19] and CHARMM,[20] and can be easily converted into the AMBER[21] file type prmtop.

[a] T. C. F. Gomes, M. S. Skaf
Institute of Chemistry, State University of Campinas (UNICAMP), Cx. P. 6154, Campinas, SP 13083-970, Brazil
E-mail: [email protected]
Contract/grant sponsor: Fapesp; contract/grant number: 08/56255-9; Contract/grant sponsor: CNPq; contract/grant number: 140978/2009-7.
Journal of Computational Chemistry 2012, 33, 1338–1346. Software News and Updates. WWW.C-CHEM.ORG, WWW.CHEMISTRYVIEWS.COM
with respect to the three axes shown in Figure 3. The user
may wish to apply pbc along one, two or all three crystallographic
directions (a, b, c). This can be easily accomplished by
editing the variable PBC in the text file input.inp and running
./cellulose-builder at the prompt, as shown by examples 1 to 3
above, for instance. Let us discuss how to implement pbc
along directions a and b separately from direction c.
Figure 2. Cellulose-builder simplified workflow. Given the experimentally determined space group (P2₁ for
cellulose Iβ, II and III_I) and experimental fractional (atomic) coordinates, the program determines the symmetry
equivalent positions of all other atoms within the unit cell. The program then replicates unit cells according to
user input requirements, exploiting the convenience of working in fractional coordinates for this task. After
replication, fractional coordinates are transformed into Cartesian coordinates using experimental cell dimensions to
yield a file in XYZ format. Extensive editing is then performed to obtain the initial configuration file in PDB
format, suited for common MD simulation packages. The program also writes a script for psfgen and executes it,
yielding a connectivity information file, in PSF format, meant to model the cellulose crystal using the CHARMM
force field for carbohydrates.[26,27]
*A Linux-like environment for Windows that ports software running on POSIX systems (such as Linux, BSD, and Unix systems) onto Windows (http://www.cygwin.com).
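The workflow summarized in the Figure 2 caption (symmetry expansion, replication in fractional coordinates, conversion to Cartesian coordinates) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's Bash implementation: the cell parameters are literature values for cellulose Iβ that are not given in this text, the 2₁ screw axis is assumed to lie along c, and the atom site is hypothetical.

```python
import math

# Assumed cellulose I-beta cell parameters (literature values, not stated in
# this text); the program itself reads its parameters from its own files.
A, B, C = 7.784, 8.201, 10.380      # cell edges in angstroms
GAMMA = math.radians(96.5)          # monoclinic angle between a and b


def p21_equivalents(frac):
    """Return the two P2_1 positions (screw axis assumed along c) for one site."""
    x, y, z = frac
    return [(x, y, z), (-x, -y, z + 0.5)]


def replicate(cell_sites, i, j, k):
    """Copy the unit-cell sites i, j, k times along a, b, c by integer shifts."""
    return [(x + na, y + nb, z + nc)
            for na in range(i)
            for nb in range(j)
            for nc in range(k)
            for (x, y, z) in cell_sites]


def frac_to_cart(frac):
    """Fractional -> Cartesian with a along x and c along z (alpha = beta = 90)."""
    u, v, w = frac
    return (A * u + B * math.cos(GAMMA) * v,
            B * math.sin(GAMMA) * v,
            C * w)


# Hypothetical asymmetric-unit site, for illustration only.
site = (0.10, 0.20, 0.30)
cell = p21_equivalents(site)
crystal = [frac_to_cart(f) for f in replicate(cell, 2, 3, 4)]
print(len(crystal))  # number of sites per cell times number of cells
```

Replicating before converting keeps the copy step to integer shifts of the fractional coordinates, which is the convenience the caption refers to.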
For pbc along the crystallographic a direction only, one
must set PBC=A. For instance, with the PBC variable set to PBC=A,
the following command:
$ ./cellulose-builder 4 5 5 (example 4)
yields the crystallite shown in Figure 5A.
To obtain translational symmetry along the crystallographic
b direction, one must set PBC=B in file input.inp. With
PBC=B in file input.inp, the command
$ ./cellulose-builder 5 4 5 (example 5)
produces the crystallite shown in Figure 5B.
To obtain a structure periodically replicated along both
crystallographic a and b directions, the input.inp file must be
edited to set PBC=ALL. The crystallite shown in Figure 5C was
obtained from the command below with PBC=ALL:
$ ./cellulose-builder 4 4 5 (example 6)
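Editing the PBC line by hand is straightforward, but when many crystallites are generated it may be convenient to script the change. The sketch below rewrites one NAME=value line in a text such as input.inp. The helper name set_variable is hypothetical, and the sketch assumes each variable sits on its own line with no spaces around the equals sign; the full layout of input.inp is not described here.

```python
import re


def set_variable(text, name, value):
    """Replace the value on a NAME=value line, leaving all other lines untouched."""
    pattern = re.compile(rf"^{re.escape(name)}=.*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda match: f"{name}={value}", text)
    if count == 0:
        raise KeyError(f"{name} not found")
    return new_text


# Hypothetical stand-in for the contents of input.inp; the real file has more lines.
example = "PBC=NONE\n"
print(set_variable(example, "PBC", "A"), end="")
```

In practice the function would be applied to the text read from input.inp and the result written back before running ./cellulose-builder.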
Figure 3. Cellulose Iβ crystallite generated by cellulose-builder as seen from its nonreducing ends (left) and rotated by 90° (right). For the sake of
consistency with the notation used by other authors,[7] we have adopted the same viewpoint as those authors for showing the cellulose crystallite. Crystallographic
faces are indicated by their corresponding Miller indices. Origin and center chain layers, and unit cell axes are indicated as well. Cellulose chains are parallel
to the c direction.
Figure 4. Different surfaces exposed by two different cellulose Iβ blade-shaped crystallites. Left: predominantly hydrophilic (010) surfaces are exposed.
Right: predominantly hydrophobic (100) surfaces are exposed. Top and bottom images represent the same crystallite seen from different viewpoints on
VMD's X-window OpenGL display.
Imposing pbc along crystallographic directions a or b only
makes sense for allomorphs Iβ and II, since for Iα and III_I the
unit cell is such that crystallites automatically have translational
symmetry in all crystallographic directions.
Regarding the crystallographic c direction, no special action is
needed to endow crystallites with translational symmetry along
that direction, since any replication of the cellulose unit cell yields a
crystallite already possessing that property for any allomorph.
As a consequence, all cellulose crystallites generated by
cellulose-builder automatically possess translational symmetry in
the crystallographic c direction. Indeed, the default PBC value in
the input.inp file is PBC=NONE. The crystallites shown in Figures 3
and 4 were built with PBC=NONE. Setting the Bash variable
PBC to NONE does not mean that the resulting crystallite will
have no translational symmetry, but instead that no subsequent
procedure is necessary to confer further translational symmetry
on the crystallite after its construction. However, with this
option, a hydrogen atom (H) and a hydroxyl group (OH) will be
added, respectively, to the opposing end points of every chain,
ensuring there are no dangling bonds.†
Very often in computer simulations of crystalline cellulose,
one would like to work with truly infinitely periodic systems
along the c direction, which requires a perfectly matching
bond between reducing and nonreducing ends of replicated
chains along the c direction.‡ To control periodic covalent
bonding in the final connectivity information file (crystal.psf)
delivered by cellulose-builder, one must edit the third line of
the file input.inp, which reads:
PCB_c=FALSE
Setting the Bash variable PCB_c to FALSE in the input.inp
file causes no periodic covalent bonding along the c direction, and
is the default. To include periodic covalent bonding in the final
connectivity file one must set:
PCB_c=TRUE
In addition to parallelepiped crystallites, this option can
also be applied to the other two crystal types provided by
cellulose-builder, i.e. elementary fibrils and monolayers, as
described next.
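To make the effect of PCB_c concrete, the following sketch lists which residues of a single chain are joined by glycosidic links, with and without periodic covalent bonding. It is a conceptual illustration only: the function name and the residue numbering are hypothetical, and the actual connectivity is written to crystal.psf by the program.

```python
def glycosidic_links(n_residues, periodic):
    """List pairs of residue numbers joined by a glycosidic link in one chain.

    Residues are numbered 1..n_residues along the c direction. With
    periodic=True, one extra pair links the last residue to the first,
    which stands for the first residue of the next periodic image.
    """
    links = [(r, r + 1) for r in range(1, n_residues)]
    if periodic:
        links.append((n_residues, 1))
    return links


print(glycosidic_links(4, periodic=False))
print(glycosidic_links(4, periodic=True))
```

The only difference between the two cases is the single wrap-around link per chain, which is what makes the chain formally infinite under pbc along c.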
Elementary fibrils
Cellulose-builder can also build cellulose elementary fibrils of any
length from allomorphs Iα, Iβ, and II. Cellulose elementary fibrils
from allomorph Iβ possess the cross section depicted in Figure
1. The primary fibril with such a cross section was constructed
by carving out a larger Iβ parallelepiped crystallite (see Supporting
Information). This arrangement of chains corresponds to a
recently proposed model for the elementary cellulose fibrils of
maize cell wall, free of hemicelluloses, lignins, and pectins.[9] The
model is likely to be applicable to several other species of plants
since the terminal enzymatic complex that synthesizes cellulose
elementary fibrils at maize cell membranes is similar to that of
other angiosperms.[31] Depending on the source tissue and organism,
cellulose chains in the elementary fibrils may have from a
few tens to several hundreds of cellobiose residues.
To build an elementary fibril, the string fibril must be passed
as the first argument in the command line, whereas the number
of cellobiose units in the chains that compose the elementary
fibril (i.e. the degree of polymerization) is specified by an integer
k as the second argument:
$ ./cellulose-builder fibril k (example 7)
Cellulose Iβ elementary fibrils of several lengths are shown
in Figure 6, for k = 5, 25, 50, 100, and 500. If one wishes to
impose pbc along the chain direction, periodic covalent
bonding can be implemented by setting PCB_c=TRUE, as
discussed above. In the case of elementary fibrils, pbc are
supported in the c direction only. Fibrils with arbitrary cross
sections, different from the maize cell wall cellulose elementary
fibril shown in Figure 1, can also be readily constructed
(see Supporting Information). Elementary fibrils can be further
arranged to assemble complex hyperstructures as models
for plant cell walls or simply solvated by molecular solvents
(Supporting Information) using software such as
PACKMOL.[22]
Monolayers
The cellulose Iβ crystal structure consists of alternating layers of
origin and center cellulose chains, with no hydrogen bonds
between layers.[1] Recent experimental studies have shown cellulose
elementary fibrils from woody material to undergo
delamination (or peeling) along their (200) plane after
(2,2,6,6-tetramethylpiperidin-1-yl)oxyl-mediated oxidation and intensive
sonication.[32] Those findings motivated us to include an option
for generating monolayers. Thus, the command:
$ ./cellulose-builder origin j k (example 8)
will build a monolayer composed of j cellulose origin chains
containing k cellobiose residues each, whereas
$ ./cellulose-builder center j k (example 9)
yields a similar monolayer composed of center chains. Examples
8 and 9 are also valid for obtaining monolayers composed
of origin or center chains from allomorph II. Since allomorph
Iα has only one type of chain, the equivalent command
line for obtaining a monolayer of chains from cellulose Iα is
$ ./cellulose-builder monolayer j k (example 10)
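The invocation forms described so far (three integers for a parallelepiped, fibril k, and origin, center or monolayer followed by j k) can be collected in a small helper that assembles and checks the argument list before the program is run. This is a sketch only: the helper build_command and the shape label "parallelepiped" are hypothetical and not part of cellulose-builder.

```python
# Keyword passed on the command line and number of integers expected,
# following the invocation forms given in the text.
FORMS = {
    "parallelepiped": ((), 3),        # ./cellulose-builder i j k
    "fibril": (("fibril",), 1),       # ./cellulose-builder fibril k
    "origin": (("origin",), 2),       # ./cellulose-builder origin j k
    "center": (("center",), 2),       # ./cellulose-builder center j k
    "monolayer": (("monolayer",), 2),  # ./cellulose-builder monolayer j k
}


def build_command(shape, *sizes):
    """Return the argument list for one of the documented invocation forms."""
    keyword, n_sizes = FORMS[shape]
    if len(sizes) != n_sizes:
        raise ValueError(f"{shape} expects {n_sizes} integer(s)")
    if any(not isinstance(s, int) or s < 1 for s in sizes):
        raise ValueError("sizes must be positive integers")
    return ["./cellulose-builder", *keyword, *(str(s) for s in sizes)]


print(" ".join(build_command("parallelepiped", 4, 5, 5)))
print(" ".join(build_command("fibril", 25)))
print(" ".join(build_command("origin", 6, 10)))
```

The returned list could be handed to subprocess.run from the program's directory; that step is left out here because it requires the program and its dependencies to be installed.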
†Indeed, since for any allomorph the unit cell is composed of anhydroglucopyranose units, any replication of the cellulose unit cell yields a crystallite possessing translational symmetry along the c direction. Nevertheless, in the final steps, missing atoms at the extremities of cellulose chains are added: one hydrogen atom is added at one terminus, one OH group at the other terminus of each cellulose chain. These are the only atoms in the whole crystallite whose coordinates are guessed (except for allomorph II, whose hydrogen HO2, HO3, and HO6 positions have not been determined experimentally[3] due to lack of neutron diffraction data and so have to be guessed). Thus, translational symmetry along the c direction is actually conditional on the elimination of those added atoms (one water molecule per cellulose chain).
‡Covalent bonding between atoms of different chains is usually set up in the input files (topology) of molecular simulation program suites.
Cellulose III_I has only one type of chain as well, but
there is more than one way of producing monolayers
from its structure. Therefore, monolayers are not automatically
supported for allomorph III_I. Nevertheless, one can
always build crystallites of arbitrary shapes for any of the
allomorphs by a simple method provided in Supporting
Information.
Similarly, periodic boundary conditions can be implemented
for monolayers along the chain direction via periodic covalent
bonding by setting PCB_c=TRUE in file input.inp.
Figure 5. Cellulose Iβ crystallites suited for pbc in a (left), b (middle) and both a and b (right) crystallographic directions. Origin and center chain layers
are indicated, as well as crystallographic directions and Miller indices. The notation adopted for Miller indices is the same as that adopted by Matthews et al.,[7] so
faces where center chains are at the surface are indicated by (200) or (020) to reflect their positions halfway along the unit cell. Colored lines within crystallites
indicate the crystallographic directions along which the crystallite is endowed with translational symmetry.
Figure 6. Cellulose Iβ elementary fibrils of different lengths, possessing k = 5, 25, 50, 100, and 500 cellobiose units, generated with cellulose-builder. The
fibril with k = 5 is magnified. All fibrils have the cross section shown in Figure 1.
File Structure and Variables
Under the parent directory cellulose-builder, several files
should be present for each supported allomorph: a file
describing its asymmetric unit in fractional coordinates
(asy_I_alpha, asy_I_beta, asy_II, asy_III_I); a file describing the
unit cell parameters (dimensions_I_alpha, dimensions_I_beta,
dimensions_II, dimensions_III_I); a file listing atom labels
cache size: 1024 Kb), under GNU/Linux operating system (Ubuntu 8.10,
kernel 2.6.27-7-generic, i686). Elapsed real time (wall clock), amount of
CPU-time that the process used directly (in user mode), amount of
CPU-time used by system on behalf of the process (in kernel mode),
and percentage of the CPU usage by the code, as provided by /usr/bin/
time. All times are in seconds. Percentage of CPU is just user + system
times divided by the total running time. The resident memory usage
for the largest system was only 360 Mb.
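The CPU percentage defined in this caption is simple arithmetic on the three times reported by /usr/bin/time. The sketch below spells it out; the timings used are made-up numbers for illustration and are not taken from the table.

```python
def cpu_percent(user_s, system_s, elapsed_s):
    """Percentage of CPU: (user + system) time divided by elapsed wall-clock time."""
    return 100.0 * (user_s + system_s) / elapsed_s


# Hypothetical timings in seconds, for illustration only.
print(cpu_percent(user_s=3.0, system_s=1.0, elapsed_s=5.0))
```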
§In computing, a symbolic link (also symlink or soft link) is a special type of file supported by the POSIX operating-system standard that contains a reference to another file or directory in the form of an absolute or relative path.
The crystalline structures that can be built with cellulose-builder
may be further combined with other molecules, such
as solvent and proteins, using available codes for generating
initial configurations of molecular simulations.[22] Similarly, the
structures may be combined among themselves to create
more complex assemblies of cellulose and more elaborate
models of the plant cell wall. Further developments of the program
are under way. Extending cellulose-builder to generate
crystalline structures of other glycans of putative relevance to
the study and design of cellulose-degrading enzymes, such
as chitin,[33] is also under consideration.
Acknowledgments
The authors thank Rodrigo L. Silveira for discussions.