Adapting Visual-Analytical Tools for the Exploration of Structural and Dynamical Features of Polymer Conformations

Full Paper


Sidharth Thakur, Melissa A. Pasquinelli*

Conformational analysis of macromolecular structures reveals interesting higher-order spatial arrangements. Analyzing these features as a function of time provides insights into the dynamical behavior of these systems and the identification of relevant subdomains. We present some visual-analytic methods that we devised to explore the spatial-temporal properties from molecular dynamics simulation data. These methods automatically detect common features and connect them to properties of interest. These methods yield physical insights that are not easily obtainable with existing methods for particle simulation data, as illustrated for polyacetylene interacting with a carbon nanotube.

Introduction

In simulations of polymer-based nanomaterials, researchers often observe interesting structural features and higher-order arrangements of polymers. Some of the emergent structures include coils and loops in individual polymer chains[1–3] and alignments and entanglements in complexes.[4,5] These structures can be within one chain or among chains of the same or different chemical composition. Exploring the types of these substructures and their evolution in space and time can give researchers useful insight into important functional and physical properties of the underlying system.

M. A. Pasquinelli
Fiber and Polymer Science/TECS, North Carolina State University, Raleigh, North Carolina 27695, USA
E-mail: [email protected]
S. Thakur
Renaissance Computing Institute, Chapel Hill, North Carolina 27517, USA

a Supporting information for this article is available at Wiley Online Library or from the author.

Macromol. Theory Simul. 2011, 20, 286–298

© 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim, wileyonlinelibrary.com

The spatio-temporal behavior of polymeric systems can be explored using standard tools such as animations, interactive visualizations, and computational analytical methods. Some of these tools are available in standard molecular visualization applications such as Visual Molecular Dynamics (VMD).[6] However, the standard tools are sometimes not sufficient to conduct a detailed or customized analysis of some of the complicated structures observed in polymeric systems. For example, tools such as animations and interactive explorations are generally limited to providing information about basic spatial-temporal dynamics of a molecular simulation. Another challenge is that many existing sophisticated analytical methods are designed for specific domains; e.g., computational methods developed to analyze spatial structures of proteins are sometimes not directly applicable to polymeric systems. In addition, no existing tools can easily extract and quantify features that could relate to bulk phenomena such as entanglements and persistent substructures that may suggest crystal nucleation or unique characteristics in interfacial regions. Therefore, there is a strong motivation to develop analytical and

DOI: 10.1002/mats.201000086


www.mts-journal.de

exploratory methods that are customized to focus on the features of polymeric systems. The customized tools can be coupled with visualization techniques to facilitate additional exploratory tasks that cannot be supported by traditional tools.

The goal of this work is to develop visual-analytical tools that can be used to compare substructures of polymer conformations and to explore local attributes of atomic trajectories obtained from molecular dynamics (MD) simulations. We have developed a method for exploring polymer conformations that combines a number of existing computational techniques. For example, a curve-matching technique was adapted to compare and extract polymer substructures. A dimensionality reduction technique called multidimensional scaling (MDS) was used to cluster and visualize common substructures of polymer conformations. Finally, a visualization system was built that integrates the computational methods and interactive visualizations so that similarities of polymer conformations and their substructures can be explored. Although the focus is on atomistic MD simulation data, these tools are general enough to be adapted to any particle-based simulation data that provide spatio-temporal information. One of the challenges that we aim to address using the visual-analytical tools is to relate the spatio-temporal behavior of important substructures to the macroscopic properties of polymers and polymer-based nanocomposites, and this work is an initial step toward that goal.

A fundamental approach in this work is based on explorations of similarity relationships among polymer conformations. Similarity refers to proximity or a measure of nearness in some space,[7] and similarity computations involve standard metrics such as Euclidean distances ($d \in \mathbb{R}^N$). Similarity analysis allows us to address some of the challenges associated with understanding structure–property relationships of polymeric systems, and serves as a basis for the development of more complex visual-analytic tools. For example, similarity analyses and visualizations of local attributes of polymer conformations and atomic trajectories are employed to explore spatial and temporal dynamics of the polymer molecular system. Another technique is to identify and compare persistent and salient (footnote b) features of polymer conformations. These methods can be employed to study systems involving single or multichain polymers; to compare sets of related polymers, such as across a class like polyolefins or polyamides; or to

b In the context of molecular structure analysis, salient molecular structures are defined as those that differ significantly from others in a neighborhood; in previous work,[8] saliency was defined using differences of global structures of helices in protein complexes over distinct time steps. Persistent structures are those that are stable over some time interval during a molecular dynamics simulation.

www.MaterialsViews.com


investigate the evolution of distinct features in interfacial regions, whether the interface is with another polymer system or with a nanoparticle.

As a test case for this paper, we have chosen to use two simulations of a single chain of polyacetylene (PAC), one as it interacts with a single-walled carbon nanotube (CNT) and the other as an isolated system. We chose PAC because of its simple structure and also because the rigidity along the backbone due to the conjugated π system provides some conformational limitations. In addition, recent studies by Tallury and Pasquinelli[9] investigated the interactions of a single chain of PAC with a single-walled CNT, and reported that PAC forms a distinct folding-like structure along the CNT surface, where intrachain interactions and crossings are maximized. These observations for PAC (which can also be observed in Figure 4 in this paper) were in contrast to other rigid backbone polymers in the study that contained aromatic groups in the backbone, which formed more helical structures to maximize the π–π interactions with the CNT surface. These chain folds for PAC were observed to lead to the polymer “coating” the CNT surface rather than fully wrapping around its diameter. Therefore, PAC serves as an interesting system to use as a case study for testing the visual-analytic tools discussed in this report.

Generation of Data

The datasets used in this work were generated from atomistic MD simulations; specific details can be found in previous work,[9,10] but are briefly summarized here. The polymer chain was generated with the graphical user interface of DL_POLY version 2.19.[11] The polymers were all built in head-to-tail configuration, and the molecular structures were first equilibrated with an MD simulation for 100 steps using a time step of 1 fs. A (10,10) zigzag single-walled CNT was used with a diameter of 7.7 Å and a length of 125.0 Å, which was built with the DL_POLY[11] graphical user interface with the built-in Bucky-tube module. To simplify the computational analysis of the simulations, the atoms in the CNT were frozen spatially. Initial configurations of the CNT and polymer chain were created by aligning the CNT along the Z-axis, and then the relaxed polymer chain was placed such that the perpendicular distance from the CNT was around 40 Å, implying that the polymer chain was well outside of the cutoff radius of interaction at the initial stage.

Molecular dynamics (MD) simulations were performed with DL_POLY version 2.19.[11] We used a constant number of molecules, constant volume, and constant temperature (NVT) ensemble at 300 K with cubic periodic boundary conditions and the DREIDING[12] force field. We used a time step of 1 fs, and all cutoff radii were set to 10.0 Å. After a system equilibration of 0.5 ns, the dynamics of the polymer-CNT system were recorded for 3.0 ns. The data analysis tools described in this report were applied to this 3 ns production run.

Exploration of Conformational Similarities

Polymer molecules often form interesting three-dimensional spatial structures such as coils and regular bends or loops. Knowledge gained from analysis of the polymer structures can help in understanding how the physical and chemical properties of polymer molecules induce the formation of these spatial arrangements, such as the nucleation of polymer crystallization. In this section, we discuss a visual-analytical approach used to explore similarity relationships among polymer conformations and to expose interesting properties of the molecular dynamics of polymers. This approach involves visualization of similarity analysis and the development of atom-time-value (ATV) plots.

Similarity Analysis and Visualization

To determine similarity relationships among a set of polymer conformations, global comparisons were performed in earlier work[13,14] that considered the entire three-dimensional structure of conformations in spatial and temporal regimes. This goal was accomplished by adapting a computational technique proposed by Best and Hege[15] that employs a two-step computational procedure. First, feature vectors that describe molecular conformations are generated using numeric measures or indexes that are based on geometric and statistical properties of the conformations. A feature vector for a polymer conformation at the ith time step is an m-tuple array:

$$d_i = \left\{ d^1, d^2, \cdots, d^t \right\}, \quad t \in [0, m] \qquad (1)$$

where $d^t$ is a scalar value computed using a numeric metric and the size m of the feature vector is dependent on the selected metric. Table 1 lists standard metrics that were explored in the previous work.[13,14]

In the second step in the computational procedure, similarity scores for all unique pairs of conformations are obtained based on the root mean square deviation (RMSD) error between corresponding feature vectors. The following equation represents the computation of the similarity score between two distinct conformations at time steps (i, j):

$$D_{ij} = 1 - \sqrt{\frac{1}{m} \sum_{t=1}^{m} \left( d_i^t - d_j^t \right)^2}, \qquad (2)$$

where $d_i^t, d_j^t$ are corresponding elements of the feature vectors for the conformations at the time steps i and j; $D_{i=j} = 1$; and m is the size of the vectors. In their earlier work,[15] Best and Hege applied this method to analyze protein conformations and employed a two-dimensional graph-based representation to visualize clusters of similar conformations.
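As a minimal sketch of this second step, the following Python fragment evaluates the similarity score of Equation (2) for every pair of time steps and stores the results in a symmetric matrix. The function names and the toy feature vectors are hypothetical. The sketch also assumes that the feature vectors were already scaled to a common range such as [0, 1]; the normalization actually used is not stated here.

```python
import math

def similarity(d_i, d_j):
    """Equation (2): one minus the RMSD between two feature vectors."""
    if len(d_i) != len(d_j):
        raise ValueError("feature vectors must have the same size m")
    m = len(d_i)
    mean_sq = sum((a - b) ** 2 for a, b in zip(d_i, d_j)) / m
    return 1.0 - math.sqrt(mean_sq)

def similarity_matrix(features):
    """Pairwise scores for all time steps; the diagonal is 1 by definition."""
    n = len(features)
    D = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = similarity(features[i], features[j])
    return D

# Hypothetical, already normalized feature vectors for three time steps.
features = [[0.0, 0.5, 1.0], [0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]
D = similarity_matrix(features)
```

Only the upper triangle is computed because the score is symmetric in i and j, which halves the work for long trajectories.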

Correlation matrices were adapted in our earlier work[13,14] to display similarity relationships among whole conformations and among trajectories of backbone atoms. A correlation matrix is a standard technique to compactly display pairwise similarity (or dissimilarity) relationships among a large set of entities. In the matrix, the entities are arranged along rows and columns and the similarity scores are represented by color-coding the matrix cells. An example of such a correlation matrix is given on the right-hand side of Figure 1. The matrix representation provides a more detailed overview of the changes in local and global conformational similarity relationships than the 2D graphs used by Best and Hege.[15] Figure 1 depicts how a correlation matrix is constructed to display pairwise similarity relationships among a set of conformations through the use of feature vectors derived from MD trajectory data. To support interactive exploration of the similarity relationships, the correlation matrix technique was also supplemented with standard visualization tools such as linked matrix-molecular visualization displays and matrix filtering operations, which are described in more detail in our previous work.[13,14]

These visualizations of correlation matrices are useful for displaying global structural changes in entire polymer conformations and for displaying relationships among trajectories of backbone atoms. However, it is challenging to apply the matrix-based approach to display changes in local attributes associated with substructures of polymer conformations, such as those that may be relevant to interfacial properties or crystal nucleation. To address this limitation, we next describe an approach that extends the matrix visualization approach to the visualization of local properties of atomic trajectories.

Visualization of Local Properties

Local properties are visualized simultaneously in the spatial and temporal domains by plotting a scalar property in a matrix-based visualization relative to two dimensions, namely (i) the sequential order of atomic indexes (arranged vertically in the plot), and (ii) time steps (arranged horizontally). We refer to these plots as Atom-Time-Value (ATV) plots. Figure 2 illustrates an example of an ATV plot and depicts its relation to trajectories of backbone atoms and time. Normalized scalar values are represented using a bi-variate color-coding scheme and the values are typically aggregated into a few (five or six) intervals; the bi-variate color-coding scheme helps to expose emergent patterns corresponding to similarities among backbone atoms.


Table 1. Numeric metrics used for comparing polymer conformations. More details of these metrics can be found in refs. [13,14].

Metric: Interatomic distances
Description: Based on interatomic distances and defined as follows:

$$d = d_{ij}(x) = \left\{ |\vec{x}_i - \vec{x}_j| \right\}, \quad i, j \in 1, \cdots, N \qquad (3)$$

where $\vec{x}_1, \vec{x}_2, \cdots, \vec{x}_{N-1}$ are three-dimensional positions of N atoms or centroids of groups of atoms on the polymer backbone. Utilizing interatomic distances as a metric[37] is useful because it is invariant under rotation and translation.

Metric: Rotational moment of inertia
Description: Based on the rotational moment of inertia of a polymer that is computed relative to a region in space. This metric can provide information relative to another molecular species. For example, this quantity relative to the longitudinal axis of a carbon nanotube (CNT) can provide information on behavior such as polymer wrapping and thus help quantify the interfacial interactions.[9,10] The corresponding feature vector is computed as follows:

$$d_{RMI} = d(i) = \left\{ m(i) \, R_{ref}(i)^2 \right\}, \quad i \in 1, \cdots, N \qquad (4)$$

where $m(i)$ is the atomic mass of the ith atom and $R_{ref}(i)$ is the perpendicular distance of the atom relative to the region of reference, such as the longitudinal axis of a CNT.

Metric: Radius of gyration
Description: Based on the average squared distance from the center of gravity. This metric can be useful in exploring the clustering behavior of a polymer. Instead of averaging the distances as is typically done to obtain a scalar quantity, the squared distance ($R_g(i)^2$) of each atom in the polymer from the center of gravity of the polymer is stored in a feature vector, as indicated by the following:

$$d_{RoG} = d(i) = \left\{ R_g(i)^2 \right\}, \quad i \in 1, \cdots, N \qquad (5)$$

Metric: Bond vectors
Description: Based on vectors along bonds. This metric can be used to compare local orientations in a polymer backbone. This feature vector is computed as follows:

$$d_{BV} = v(i) = \left\{ \frac{\vec{x}_{i+1} - \vec{x}_i}{|\vec{x}_{i+1} - \vec{x}_i|} \right\}, \quad i \in 1, \cdots, N \qquad (6)$$

where $\vec{x}_i$ is the three-dimensional position of the ith backbone atom and $v(i)$ is a unit vector directed along the bond between adjacent atoms. Since each element of the feature vector is a three-dimensional vector, two feature vectors can be compared using dot products of the bond vectors at corresponding indices; the dot product values are summed to obtain a single scalar value that represents the global similarity of the conformation pair.

Metric: Bond-orientational order
Description: Bond-orientational order[1] represents the global alignment of a polymer chain with a given fixed axis. The measure is defined as:

$$d = d(i) = \frac{1}{N-3} \sum_{i=2}^{n-1} \left( \frac{3 \cos^2 \psi_i - 1}{2} \right), \quad i \in 1, \cdots, N \qquad (7)$$

where $\psi_i$ is the angle between the sub-bond vectors (the average vectors between pairs of every other backbone atom) and the z axis.
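The feature vectors of Table 1 are straightforward to compute from backbone coordinates. The following Python sketch covers three of them, Equations (3), (5), and (6). The function names are hypothetical, each atom is assumed to be an (x, y, z) tuple, the center of gravity is taken as the unweighted centroid (equal atomic masses are an assumption), and only unique pairs i < j are stored for the interatomic distances.

```python
import math

def interatomic_distances(xs):
    """Equation (3): distances |x_i - x_j| for all unique pairs i < j."""
    n = len(xs)
    return [math.dist(xs[i], xs[j]) for i in range(n) for j in range(i + 1, n)]

def squared_distances_from_center(xs):
    """Equation (5): squared distance of each atom from the centroid.

    Equal masses are assumed, so the centroid stands in for the
    center of gravity.
    """
    n = len(xs)
    center = [sum(p[k] for p in xs) / n for k in range(3)]
    return [sum((p[k] - center[k]) ** 2 for k in range(3)) for p in xs]

def bond_vectors(xs):
    """Equation (6): unit vectors along consecutive backbone bonds."""
    vectors = []
    for a, b in zip(xs, xs[1:]):
        length = math.dist(a, b)
        vectors.append(tuple((b[k] - a[k]) / length for k in range(3)))
    return vectors

# Hypothetical three-atom backbone with one right-angle bend.
backbone = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
distances = interatomic_distances(backbone)
rog_terms = squared_distances_from_center(backbone)
bonds = bond_vectors(backbone)
```

Averaging `rog_terms` would recover the usual scalar squared radius of gyration; keeping the individual terms preserves the per-atom detail described in the table.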



Several types of local quantities or “measures” can be used to generate the ATV plots. For example, the distances of backbone atoms from some origin, such as the center of mass of a CNT in a polymer-based nanocomposite, can be mapped in both spatial and temporal scales. Other measures include instantaneous velocities of backbone atoms and relative displacements of backbone atoms with respect to an initial set of locations (pseudo persistence length). Depending on the type of measure that is selected, either a single plot or related plots can be generated.
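To make the construction of an ATV plot concrete, the sketch below builds the atom-by-time matrix for one such measure, the displacement of each backbone atom from its initial location, and then aggregates the normalized values into a small number of intervals. All names are hypothetical, the trajectory is assumed to be indexed as `trajectory[t][a]` with (x, y, z) tuples, and the choice of six bins simply follows the "five or six" intervals mentioned in the text.

```python
import math

def displacement_from_start(trajectory, t, a):
    """Distance of atom a at step t from its position at step 0."""
    return math.dist(trajectory[t][a], trajectory[0][a])

def atv_matrix(trajectory, measure):
    """Rows are backbone atoms (vertical axis), columns are time steps."""
    n_steps = len(trajectory)
    n_atoms = len(trajectory[0])
    return [[measure(trajectory, t, a) for t in range(n_steps)]
            for a in range(n_atoms)]

def bin_values(matrix, n_bins=6):
    """Normalize by the largest value and map each cell to a bin 0..n_bins-1."""
    vmax = max(max(row) for row in matrix) or 1.0  # avoid dividing by zero
    return [[min(int(v / vmax * n_bins), n_bins - 1) for v in row]
            for row in matrix]

# Hypothetical trajectory: two atoms, three time steps.
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 3.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 6.0, 0.0)],
]
atv = atv_matrix(trajectory, displacement_from_start)
binned = bin_values(atv)
```

Because the measure is passed in as a function, other quantities such as instantaneous speeds could be plugged in without changing the matrix construction.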



Figure 2. Illustration of an ATV plot (right) and its relation to trajectories of backbone atoms of a polymer chain (left). The ATV plot conveys, in a compact visualization, an overview of changes in some local scalar quantity (e.g., instantaneous velocities) for each atom (vertical dimension) at each time step (horizontal dimension). A colorized version of this figure is also available as Supporting Information.

Figure 1. Illustration of the construction of a similarity matrix visualization. A snapshot from two different time steps of the MD trajectory for a polymer–CNT interaction is given on the left. From these time steps, feature vectors are defined and compared with the RMSD error to obtain a scalar similarity score. This value for each time step is then mapped onto a correlation matrix, depicted on the right. A colorized version of this figure is also available as Supporting Information.



Atom-time-value (ATV) plots have been used to investigate the conformational states of a single chain of PAC with and without a CNT. Snapshots from the MD simulations are given in Figure 4 along with projections of the snapshots in the xy, yz, and xz planes. The ATV plots in Figure 3 enable the visualization of some of the local and global spatial changes in the polymer conformations during these MD simulations. For the simulations with a CNT, where the CNT is aligned along the Z axis of the global frame, Figure 3(a) displays an ATV plot of the perpendicular distances of backbone atoms of PAC from the three coordinate axes of a global reference frame, whose origin is at the center of the CNT. A comparable plot for PAC without a CNT is given in Figure 3(b), where the origin of the frame is approximately at the center of mass of the polymer molecule. Note that due to the differences in the global frame of reference for the two scenarios in Figure 3, differences are expected in the general patterns in the ATV plots. These ATV plots provide details about the local structure in the polymer conformations. Note that regular patterns such as the horizontal bands in the plots represent persistent loops in the corresponding directions. In Figure 3(b), for the dynamics of an isolated PAC molecule, persistent bands are observed, particularly at later steps of the simulation. No distinctive difference is observed relative to the three frames of reference (the X, Y, and Z axes).
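The quantity plotted in these ATV plots, the perpendicular distance of each backbone atom from the X, Y, and Z axes, can be sketched as follows. The function names are hypothetical; coordinates are assumed to be expressed in the global reference frame already, and the three plots are normalized by their shared maximum, which is one reading of the normalization described for Figure 3.

```python
import math

def axis_distances(p):
    """Perpendicular distances of point p from the X, Y, and Z axes."""
    x, y, z = p
    return (math.hypot(y, z), math.hypot(x, z), math.hypot(x, y))

def axis_atv(trajectory):
    """Three atom-by-time matrices (for X, Y, Z), scaled by the shared maximum.

    trajectory[t][a] is assumed to hold the (x, y, z) position of atom a
    at time step t.
    """
    n_steps = len(trajectory)
    n_atoms = len(trajectory[0])
    plots = []
    for axis in range(3):
        plot = [[axis_distances(trajectory[t][a])[axis] for t in range(n_steps)]
                for a in range(n_atoms)]
        plots.append(plot)
    vmax = max(v for plot in plots for row in plot for v in row) or 1.0
    return [[[v / vmax for v in row] for row in plot] for plot in plots]

# Hypothetical single atom at a single time step.
plots = axis_atv([[(3.0, 4.0, 0.0)]])
```

With the CNT along Z, the third matrix corresponds to the radial distance from the tube axis, which is why it tracks adsorption onto the surface.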

A different behavior is observed in the presence of the CNT in Figure 3(a). Since the frame of reference is defined at the center of the CNT, the features about the Z axis, which extends along the longitudinal axis of the CNT, quantify how far away the atoms are from the CNT surface, whereas the X and Y directions correspond to how much the polymer molecule extends along the longitudinal axis of the CNT. An abrupt shift in the system patterns is observed in all three frames of reference at around 1 ns, which corresponds to the time step where the polymer finds the CNT surface and quickly adsorbs onto it. The lack of features in the Z direction after 1 ns indicates that the atoms reached an optimal minimum distance from the CNT surface, and that this adsorption is persistent with time. The gray bands in the Z direction after the 1 ns adsorption phenomenon are likely due to regions where chain loops are slightly lifted from the CNT surface and to geometric hindrances due to intrachain crossings, both of which can be observed in the MD snapshot in Figure 4(a). In some instances, these gray bands disappear with time, suggesting that the system was able to overcome the barrier to find a lower energy minimum in which its interaction with the CNT surface had increased.

Figure 3. Atom-time-value (ATV) plots for polymer PAC showing perpendicular distances of backbone atoms from axes of reference coordinate frames for the entire 3 ns of the MD trajectory (broken into 661 steps). The figure contains plots for two different scenarios: (a) interaction of PAC with a CNT, where the CNT lies along the Z axis; and (b) PAC in isolation. The perpendicular distances in each case are normalized based on the maximum distance in the set of distances from the three axes. The distance values are binned into six equal intervals and are represented by a univariate color scheme. A colorized version of this figure is also available as Supporting Information.

For the X and Y directions in Figure 3(a), persistent bands are also observed after the 1 ns adsorption time step. Note that if the molecule preferred to form a tight random coil rather than an extended structure, there would be fewer color variations in the distances relative to the X and Y axes than are observed in Figure 3(a). Therefore, the range of color bands indicates that the polymer formed an extended structure, as observed in Figure 4(a). As discussed by Tallury and Pasquinelli,[9] the polymer molecule, as an entity, tends to traverse the longitudinal axis of the CNT as a function of simulation time. Thus, the deviations (in other words, the “wavy appearance”) observed in these persistent bands in Figure 3(a) as a function of time capture the longitudinal motion of the molecule relative to the CNT surface, but the persistence of the bands suggests that the global three-dimensional structure of the polymer stays relatively intact during this longitudinal motion. Note that with this frame of reference, it is not possible to quantify the motion of the molecule in the xy plane, which is along the circumference of the CNT [refer to the inset in Figure 4(a)]. However, this motion could be quantified with the use of a spherical coordinate system.
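One possible way to capture this circumferential motion is to track the azimuthal angle of each atom about the Z axis, a coordinate that spherical and cylindrical systems share when Z is the polar axis. The sketch below is illustrative only: the function names are hypothetical, and it assumes that consecutive frames are close enough that an atom never moves more than half a turn between them.

```python
import math

def azimuth_deg(p):
    """Azimuthal angle of point p about the Z axis, in degrees in [0, 360)."""
    x, y, _ = p
    return math.degrees(math.atan2(y, x)) % 360.0

def angular_travel(track):
    """Accumulated signed rotation about Z (degrees) along one atom's track."""
    total = 0.0
    for p, q in zip(track, track[1:]):
        # Wrap each step into [-180, 180) so crossing 0/360 is not a jump.
        step = (azimuth_deg(q) - azimuth_deg(p) + 180.0) % 360.0 - 180.0
        total += step
    return total

# Hypothetical track: an atom moving half a turn around the tube axis
# while also sliding along Z.
track = [(1.0, 0.0, 0.0), (0.0, 1.0, 5.0), (-1.0, 0.0, 9.0)]
travel = angular_travel(track)
```

Plotting the azimuth per atom and time step would give an ATV plot in which horizontal bands indicate atoms that stay on one side of the tube.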

The ATV plots convey in a single, compact visualization the spatial-temporal distributions of local scalar quantities. To improve the usefulness of these plots, some standard visualization features and enhancements have been employed, such as the following:

(i) Linking and brushing: in computer visualization, linking and brushing are standard interactive data exploration techniques that allow an observer to examine one or more data points in multiple, coordinated data visualizations (linking) and to interactively select a set of data points for comparison (brushing). Linking and brushing can be used to explore relationships among atomic trajectories based on the exploration of patterns of similarities in the ATV plots. Figure 5 illustrates an example of linking and brushing; in the example, a set of atoms and time steps are selected interactively on the ATV plot and the corresponding atom trajectories are visualized using distinct colors in a molecular display window. This simple visualization approach allows exploration of interesting patterns on the ATV plot and the corresponding spatial-temporal dynamics of polymer backbone atoms. The atomic subset may also be colored by time step, which can be useful for exploring the evolution of the structure as a function of time.

(ii) Plot of moving averages: alternatively, moving averages of the geometric properties of the backbone atoms can be plotted over a small interval to display the most important spatio-temporal changes; this operation also reduces the noise due to small local perturbations in the conformational structures.

Figure 4. Molecular dynamics (MD) snapshots of low energy states of PAC (a) with a single CNT, where the CNT is drawn using a stylized representation; and (b) without a CNT. Projections of the conformations in three two-dimensional orthogonal planes are shown on the right in each figure. Each snapshot is taken at an interval of 5 ps from the MD trajectory. The two polymers are drawn at differing zoom levels due to the large size of the CNT. A movie of the corresponding full trajectory of (a) is given in ref.[9] and on the Pasquinelli lab website (http://www.te.ncsu.edu/mpasquinelli). A colorized version of this figure is also available as Supporting Information.

Figure 5. An example exhibiting the linking and brushing operations between an ATV plot (left) and a molecular display (right) of an MD simulation trajectory of PAC interacting with a CNT. A region on the ATV plot, indicated by the black rectangle, was sketched interactively to select a set of ten atoms and around five hundred fifty frames of the MD simulation. Trajectories corresponding to the selected atom-time sets are shown as the green curves in the display on the right. The conformation shown using thick lines corresponds to a time step during the onset of wrapping of the polymer around the CNT. A colorized version of this figure is also available as Supporting Information.

These matrix-based approaches have limitations. For example, a matrix visualization reveals mostly global and only some local information about structural changes in polymers during MD simulations. Another constraint is that many of the numeric measures based on global geometric properties are sensitive to the length of the polymer chains and are therefore not suitable for identifying macroscopic effects that are dependent on molecular weight or for comparing conformations among different polymer types. Therefore, the exploration of substructures is supplemented with local structural comparisons to find common substructures among a set of conformations. Local structural comparisons can be used to identify salient and persistent substructures of conformations of a single polymer during a particle-based simulation or among a family of related polymers (e.g., nylons with the same overall chemistry but varying sizes of aliphatic portions in the monomers).
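The moving-average enhancement listed as item (ii) above can be illustrated with a short Python sketch that smooths each atom's row of an ATV matrix over a sliding window of time steps. The function names and the window length are hypothetical choices, and the unweighted window used here is an assumption; other smoothing kernels would serve the same purpose.

```python
def moving_average(values, window):
    """Unweighted mean over each run of `window` consecutive time steps."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    averages = [total / window]
    for k in range(window, len(values)):
        # Slide the window one step: add the new value, drop the oldest.
        total += values[k] - values[k - window]
        averages.append(total / window)
    return averages

def smooth_atv(matrix, window):
    """Apply the moving average along the time axis of every atom row."""
    return [moving_average(row, window) for row in matrix]

# Hypothetical ATV matrix: two atoms, five time steps.
atv = [[1.0, 2.0, 3.0, 4.0, 5.0],
       [0.0, 10.0, 0.0, 10.0, 0.0]]
smoothed = smooth_atv(atv, window=3)
```

Each smoothed row is shorter than the input by `window - 1` columns, so the time axis of the resulting plot should be labeled accordingly.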

Development of Substructure Analysis and Visualization Tools

Polymer conformations often contain a variety of local and global structural features and three-dimensional arrangements. Identification and analysis of these structures reveal relevant information about properties of the polymer systems. For example, identification of persistent substructures in conformations corresponding to low energy or relaxed states in a family of related polymers (e.g., nylons) enables a comparison of the changes in spatial properties that result from changing the chemistry of the polymer, and thus could enable experimentalists to better tune the desired properties of the material. Structural analysis also provides insights into the spatio-temporal dynamics of the interfacial regime of nanocomposites.

In this section, we describe our method, which consists of the following main steps: (i) construct a proxy curve that represents the backbone of a polymer conformation; (ii) build feature vectors that uniquely describe curve segments on the proxy curve; (iii) compare the feature vectors to generate similarity relationships among a set of curve segments; and (iv) connect this substructure analysis to visualization tools. Since some exploratory and analytical methods have been developed by others to assist in the exploration of structures of macromolecules, such as those developed for chain-like macromolecules like proteins, the applicable work is first summarized in context, and then we give specifics on our approach.

Patro et al.[8] proposed a general framework to detect frames that correspond to salient conformations of proteins among a set of frames from MD simulations. The approach involves construction of an affinity matrix that contains relative orientations of amino acid units on a protein backbone. The authors applied this approach to extract salient conformations of α-helices and to identify major structural changes in protein complexes that form ion channels at boundaries of biological cells. A number of other researchers have used geometric information of macromolecules for similarity comparisons. A standard approach is to model backbones of chain-like molecules such as proteins using parametric spline curves. For example, Kim and Singh[16] have described an approach for comparing protein structures that is based on extraction and analysis of the geometry of protein backbones. Similarly, Can and Wang[17] have developed a method in which portions of curves representing molecular backbones are aligned based on local shape and curvature information. Other researchers have developed techniques that utilize a variety of local and global geometric information as metrics to compute “distances” among matched segments or whole conformations of chain-like molecules. Some applications of this metric-based approach include clustering protein conformations using interatomic distances;[15] clustering a set of proteins based on sequence alignment and geometric curve fitting;[18] a comparison based on angles between bonds and angles between tangents;[19–21] a comparison using attributes of protein secondary structures;[22,23] and a combination of shape, structural alignment, superposition, and biochemical attributes of residues.[21,24,25]

Construction of Proxy Curves

A standard approach to describe the three-dimensional

geometry of chain-like molecular structures such as

polymer conformations is to represent the backbone of

the molecular structure using a proxy parametric curve

such as Uniform B-Splines.[26] The rationale for adapting

this approach is that a parametric spline curve allows

sampling at regular intervals, which is useful for a curve-

matching technique described later in this section.
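As an illustration of the proxy-curve idea, the following minimal sketch evaluates a uniform cubic B-spline from a list of control points and samples it at regular intervals of the spline parameter. It is not the implementation used in this work; the function names `bspline_point` and `sample_bspline` and the parameter `samples_per_span` are hypothetical, and the control points are assumed to be three-dimensional tuples.

```python
def bspline_point(p0, p1, p2, p3, t):
    """Evaluate one uniform cubic B-spline span at parameter t in [0, 1]."""
    # Standard uniform cubic B-spline basis functions (they sum to 1).
    b0 = (1 - t) ** 3 / 6.0
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6.0
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0
    b3 = t**3 / 6.0
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def sample_bspline(control_points, samples_per_span=8):
    """Sample the spline at regular parameter intervals (not equal arc length)."""
    if len(control_points) < 4:
        raise ValueError("a cubic B-spline needs at least four control points")
    points = []
    for start in range(len(control_points) - 3):
        window = control_points[start:start + 4]
        for k in range(samples_per_span):
            points.append(bspline_point(*window, k / samples_per_span))
    # Close the curve with the end of the last span.
    points.append(bspline_point(*control_points[-4:], 1.0))
    return points


# Hypothetical control points, e.g., backbone atom coordinates.
backbone = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (2.0, -0.5, 0.2),
            (3.0, 0.5, 0.4), (4.0, 0.0, 0.6)]
proxy = sample_bspline(backbone, samples_per_span=4)
```

Note that a uniform B-spline of this kind approximates its control points rather than passing through them, which is consistent with the proxy curve being an approximation of the backbone geometry.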

To approximate a molecular backbone using a B-Spline

proxy curve, control points need to be specified on the

backbone. There are two choices for selecting control points:

(i) coordinates of backbone atoms, and (ii) a simplified

model of the geometry of the backbone. Both approaches for

selecting control points result in proxy curves that are

approximations of the geometry of the original backbone;

however, the latter approach based on simplification allows

control over the amount of details of the backbone that can

be retained in the proxy curve. Varying the level of detail
is sometimes useful for eliminating local ‘‘noise’’ in molecular
geometry (for example, zig-zag directions in a small
neighborhood), so that the simplification acts as a smoothing
operator prior to the subsequent curve-matching step.

We adapt a curve simplification method that is based on standard

geometric techniques for simplifying a three-dimensional

curve.[27,28] The level of simplification is controlled by a

scalar parameter, the error, that is based on the differences

in length of the original curve and the simplified

representation. Previously, Agarwal et al.[29] used such

an approach to compare the three-dimensional spatial

structures of proteins. Figure 6 gives examples of spline



Figure 6. Illustration of the parametric spline curves constructed to represent the polymer backbone curve. (a) The original backbone of the PAC polymer from an MD simulation, (b) standard spline constructed based on backbone atoms, (c) a simplified model of the polymer chain, and (d) a spline derived from the simplified model of the polymer in (a). A colorized version of this figure is also available as Supporting Information.



curves generated for a conformation of the PAC polymer

from a single time step of an MD simulation. Note that

although simplification generates a smoother spline with

some loss of information, major features of the conforma-

tion are still preserved, such as loop regimes.
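The effect of simplification on a noisy backbone can be sketched with the classical Douglas-Peucker scheme of ref. [27]. This is only an illustration under stated assumptions: the sketch uses a perpendicular-distance tolerance, whereas the error parameter described above is based on differences in curve length, and the names `simplify` and `tolerance` are hypothetical.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    if denom == 0.0:
        return math.dist(p, a)
    # Clamp the projection so that it stays on the segment.
    t = max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


def simplify(points, tolerance):
    """Recursively keep only points that deviate more than `tolerance`."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, worst = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > worst:
            index, worst = i, d
    if worst <= tolerance:
        return [first, last]
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point


# A zig-zag stretch followed by a sharp turn (hypothetical coordinates).
zigzag = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, -0.1, 0.0),
          (3.0, 0.1, 0.0), (4.0, 0.0, 0.0), (4.0, 3.0, 0.0)]
coarse = simplify(zigzag, 0.5)
```

With a tolerance larger than the zig-zag amplitude, the local noise is removed while the turn at the end of the stretch is retained; the simplified points can then serve as control points for the proxy spline.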

Another consideration is that the approach we use to

build invariant descriptions of curve segments (discussed

later in this section) requires a curve to be sampled at equal

intervals of arc length. In the standard construction of a B-

Spline, the parametric space is sampled at regular intervals

and basis functions are interpolated to generate a

continuous curve. However, sampling regularly in the

parametric space does not guarantee equal lengths

between points on the generated proxy curve. Therefore,

a technique called arc-length parameterization has been

adapted to re-sample the spline curve to obtain points on

the curve that have some fixed, equal distance between

them. A numerical reparameterization method is

employed that is available in a geometry processing

toolkit called GeometricTools.[26] Typical values of arc

lengths used in this work were in the range $l \in [0.01, 0.03]$, where higher values of arc length result in sparse

sampling of a proxy spline.
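The re-sampling step can be illustrated with a simple numerical scheme. The sketch below is not the GeometricTools routine; it assumes that the spline has already been sampled densely in parameter space, approximates arc length by accumulated chord lengths, and places new points by linear interpolation. The function name `resample_equal_arc_length` and the parameter `spacing` are hypothetical.

```python
import math


def resample_equal_arc_length(points, spacing):
    """Return points that are `spacing` apart along the polyline `points`."""
    resampled = [tuple(points[0])]
    carried = 0.0  # length travelled since the last emitted sample
    for a, b in zip(points, points[1:]):
        seg_len = math.dist(a, b)
        if seg_len == 0.0:
            continue
        pos = spacing - carried  # distance from a to the next sample
        while pos <= seg_len:
            t = pos / seg_len
            resampled.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
            pos += spacing
        # Remaining length between the last sample and the end point b.
        carried = seg_len - (pos - spacing)
    return resampled


# Hypothetical dense polyline with an L-shaped path of total length 7.
dense = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
even = resample_equal_arc_length(dense, 1.0)
```

The spacing is measured along the curve, so consecutive samples that straddle a bend are closer in straight-line distance than the nominal spacing. A larger `spacing` yields fewer samples, which corresponds to the sparser sampling noted above.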

Description of Molecular Conformations

There are a number of standard geometric properties that

can be used to obtain unique descriptions or signatures of

the three-dimensional geometry of the proxy curves.

Examples of the geometric properties are coordinates of

vertices on the curve, tangents, curvature, and torsion.

Additionally, domain specific information may also be

included to describe proxy curves associated with mole-

cular backbones; some common examples are bond



directions, dihedral angles, and attributes of residues

(e.g., the types of amino acid units in proteins).

We employ a general curve description technique called

similarity invariant coordinate system (SICS)[30] to build

feature vectors that uniquely describe curve segments on

polymer conformations. We chose the SICS-based approach

because it is relatively straightforward to implement and

provides feature descriptions that are invariant under

transformations such as rotations and translations.

Figure 7 illustrates the definition and construction of a local

reference frame for curve segments using the SICS method. In

the SICS method, a description of three-dimensional curve

segments is obtained by re-defining the geometric informa-

tion of a segment in terms of a local coordinate reference

frame. This local reference frame provides a useful descrip-

tion of curve segments and one that is invariant under

transformations such as rotation, scaling, and translations

(for details about this method please refer to ref. [30]).

Once a local frame is defined for a given curve segment, a

number of geometric quantities is used to define a feature

vector that uniquely represents the curve segment.

Examples of some standard geometric quantities that can

be used to build feature vectors include three-dimensional

coordinates of points sampled on the curve segment, three-

dimensional coordinates of tangents at sample points,

angles between sample points and the mid point of the line

joining the terminal points of the segment, and angles

between the tangents at sample points and the local frame.

Other geometric information that can be used includes

curve curvature, inter-point distances, ratio of distances

between points, and higher-order derivatives.
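To make the notion of an invariant description concrete, the following sketch computes two of the quantities mentioned above, the inter-point distances and the ratio of segment length to end-to-end distance, and checks numerically that they do not change when a segment is rotated and translated. It does not construct the SICS local reference frame of ref. [30]; all function names are hypothetical.

```python
import math
from itertools import combinations


def inter_point_distances(points):
    """All k(k-1)/2 pairwise distances between the k points of a segment."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def arc_chord_ratio(points):
    """Polyline length of the segment divided by its end-to-end distance."""
    arc = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    return arc / chord


def rotate_z_and_shift(points, angle, shift):
    """Rigid motion used only for the check: rotate about z, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in points]


segment = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 1.0)]
moved = rotate_z_and_shift(segment, 0.7, (5.0, -2.0, 3.0))

same_distances = all(
    abs(a - b) < 1e-9
    for a, b in zip(inter_point_distances(segment), inter_point_distances(moved))
)
same_ratio = abs(arc_chord_ratio(segment) - arc_chord_ratio(moved)) < 1e-9
```

Both quantities are preserved by rigid motions. The inter-point distances change under uniform scaling, whereas the ratio does not, so including the distances in a feature vector makes the description sensitive to scale.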

A particular feature of the SICS method is that it is

designed to perform scale invariant matching. However,

since the primary interest is matching substructures at



Figure 7. Illustration of the construction of a local reference frame for defining feature vectors for curve segments using the SICS method.[30]

A colorized version of this figure is also available as Supporting Information.



comparable scales, we provide an option to override scale

invariance; scale invariance is ignored by including inter-

point distances because relationships among a set of inter-

point distances are uniquely defined for a given curve

segment. Therefore, given a curve segment $s = \langle u_i \rangle$ having

k points at equal arc length, a feature vector can be defined

as follows:

$$\mathit{fvec} = \left\langle d_{xyz},\; d_{\theta},\; t_{xyz},\; t_{\phi},\; \kappa,\; \tau,\; \langle d_{ij} \rangle,\; \frac{\mathrm{arc\_len}(u_1, \ldots, u_N)}{\lvert u_N - u_1 \rvert} \right\rangle, \qquad (8)$$

where the various components of the feature vector are
defined in Table 2. During experimentation, we found that
including curvature and torsion information did not
greatly alter the matching results; we therefore excluded
them from the feature vectors to reduce the number of
calculations.

Table 2. Components of feature vectors described by Equation (8). The components are associated with a curve segment that is illustrated in Figure 7.

| Component | Description |
| --- | --- |
| $d_{xyz}$ | Three coordinates of the point $u_i$ on a segment of a piecewise linear curve |
| $d_{\theta}$ | Three angles between the vector joining the midpoint of line segment $v = u_N - u_1$ and the origin of the local reference frame |
| $t_{xyz}$ | Three coordinates of the tangent at point $u_i$ |
| $t_{\phi}$ | Three angles between the tangent and the basis vectors of the reference frame $\langle e_1, e_2, e_3 \rangle$ |
| $\kappa, \tau$ | Normalized values of curvature and torsion of the curve at $u_i$ |
| $\langle d_{ij} \rangle$ | A set of $k(k-1)/2$ inter-point distances between the $k$ points on the segment |
| $\mathrm{arc\_len}(u_1, \ldots, u_N) / \lvert u_N - u_1 \rvert$ | Ratio of the length of the segment and the length of the line joining its terminals, i.e., $\lvert u_N - u_1 \rvert$ |

Finally, to use feature vectors to compare segments of
proxy curves, we have adopted a simple scheme to uniquely
describe and identify each curve segment in a set. According
to the scheme, a segment is defined as follows:

$$\mathit{seg} = (\mathit{timeid},\ \mathit{startPos},\ \mathit{endPos},\ \mathit{fvec}), \qquad (9)$$

where timeid is the time stamp of the frame in an MD
simulation in which the segment occurs; startPos and
endPos are indexes of the terminal points of the segment on a
regularly-sampled proxy spline curve; and fvec is a feature
vector based on Equation (8) and computed using points on
the curve segment.

Matching of Substructures

Algorithm 1 outlines the main steps of an iterative
approach to perform substructure matching of data from
MD simulation trajectories. In order to perform the curve
matching, we divide the arc-length parameterized
spline curve representing a polymer conformation into
multiple segments. Segment definitions such as segment
size and relative offsets of adjacent segments are set by the
user prior to curve matching. The segments are compared
with one another using the RMSD error between
corresponding feature vectors. These comparisons are performed
for all possible unique pairs of curve segments. The RMSD
comparison for each pair of feature vectors produces a single
scalar value, $d \in [0, 1]$, that represents the similarity between
corresponding curve segments (where smaller values


indicate closely matching curve seg-

ments). An adjustable range slider is used

to set the maximum and minimum value

of RMSD to control the set of resulting

matches. Finally, we use a simple seg-

ment-matching procedure to compile sets

of similar segments. Each of these sets

represents a unique feature or substruc-

ture of the parameterized splines.
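The pairwise comparison described above can be sketched as follows. This is a simplified illustration rather than Algorithm 1 itself: the feature vector is assumed to contain only inter-point distances, the RMSD is not normalized to the range [0, 1], and the names `Segment`, `make_segments`, `rmsd`, and `match_segments` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, NamedTuple, Sequence, Tuple

Point = Tuple[float, float, float]


class Segment(NamedTuple):
    """Mirrors Equation (9): (timeid, startPos, endPos, fvec)."""
    time_id: int
    start_pos: int
    end_pos: int
    fvec: List[float]


def feature_vector(points: Sequence[Point]) -> List[float]:
    # Assumption: only the inter-point distances are used as features.
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def make_segments(curve: Sequence[Point], time_id: int,
                  size: int, offset: int) -> List[Segment]:
    """Cut a regularly sampled curve into windows of `size` points."""
    segments = []
    for start in range(0, len(curve) - size + 1, offset):
        window = curve[start:start + size]
        segments.append(
            Segment(time_id, start, start + size - 1, feature_vector(window)))
    return segments


def rmsd(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square deviation between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def match_segments(segments: Sequence[Segment], max_rmsd: float):
    """Compare all unique pairs and keep those within the RMSD threshold."""
    return [(s, t) for s, t in combinations(segments, 2)
            if rmsd(s.fvec, t.fvec) <= max_rmsd]


# Two hypothetical frames: a straight stretch and a staircase-like stretch.
straight = [(float(i), 0.0, 0.0) for i in range(5)]
staircase = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0),
             (2.0, 1.0, 0.0), (2.0, 2.0, 0.0)]
all_segments = (make_segments(straight, 0, size=3, offset=1)
                + make_segments(staircase, 1, size=3, offset=1))
pairs = match_segments(all_segments, max_rmsd=0.1)
```

Because every unique pair is compared, the cost grows quadratically with the number of segments, which motivates the performance optimizations discussed next.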

Performance optimizations can reduce the number of
comparisons; such techniques include hashing, binning,
and noise reduction, as well as the use of chemical and
biological alignment.[17,24,25] Another approach to matching
substructures, described by Li,[30] compares relationships
among substructures using a graph-based approach. In
ongoing work, we are focusing on improving the performance
of our substructure matching approach using some of
these optimization techniques.

The methodology of matching similar segments can be

used to perform substructure matching and comparisons.

The structural comparisons include:

(i) Identification of salient features and substructures:
determine the most commonly occurring features of

polymer conformations in molecular simulations or

among a set of related polymers.

(ii) Exploration of specific features: explore the spatial
and temporal distributions of specific interesting

features on polymer conformations. A specific feature

is selected in one time step on a conformation and is

searched on all conformations and over all time steps.

This information is used to explore spatial arrange-

ments such as alignment of multiple segments. An

example is the spatial-temporal alignment of sub-

structures during polymer crystallization.

(iii) Compare global structures of related polymers:
determine similarity relationships among related

polymers or among conformations of a polymer

under different scenarios, such as polymer conforma-

tions with and without nanoparticles (e.g., CNTs)

present. The comparison of related systems can

provide information on the types of substructures

that can persist near interfaces when polymers

interact with other systems, such as nanoparticles.

Substructure Visualization

Interactive visualization techniques are utilized to display

and explore spatial and temporal distributions of salient

substructures of polymer conformations. An example is



represented in Figure 8. In this example, an interactive

histogram is provided to support fast exploration of all of

the sets of common substructures found by the structure

matching system. The bars in the histogram represent the

different types of matching substructures found by the

system, and the height of the bars corresponds to the

number of substructures of each type. Moreover, the

histogram is linked to the molecular visualization screen

to show the matching substructures for each set. The spatial

distribution of the substructures in a single set can be

visualized by selecting the corresponding histogram bar

(shown highlighted in yellow). All segments corresponding

to the selected set of matching substructures are visualized

as curve segments; the segment of the molecular chain in

the currently selected time step is indicated as a thick pink

curve. Furthermore, by hovering over the bars in the

histogram, a user can quickly inspect various substructures

found by the system. A visualization of the temporal

distribution of the substructures in a set is provided by a

timeline (refer to the bottom of Figure 8) and markers above

the timeline to indicate the occurrence of a matched

structure at corresponding time steps. Note that the

timeline is over the entire MD trajectory, which in this

case is 3 ns but has been parsed into 661 steps in this

visualization.
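The data behind the linked histogram and timeline views can be summarized in a few lines. The sketch below assumes that each matched segment has already been assigned to a set of similar segments; the set labels used here ("loop", "coil") and the function name `summarize_sets` are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def summarize_sets(matches):
    """Summarize (set_id, time_id) records, one per matched segment.

    Returns the bar heights for the histogram and, for each set,
    the sorted time steps at which timeline markers would be drawn.
    """
    counts = defaultdict(int)
    timeline = defaultdict(set)
    for set_id, time_id in matches:
        counts[set_id] += 1
        timeline[set_id].add(time_id)
    return dict(counts), {k: sorted(v) for k, v in timeline.items()}


# Hypothetical matching results: (set label, time step of the frame).
records = [("loop", 0), ("loop", 5), ("loop", 5), ("coil", 2)]
bar_heights, markers = summarize_sets(records)
```

Selecting a histogram bar then amounts to looking up the corresponding entry in `markers` and drawing the associated segments and time steps.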

Another option for visualizing relationships among

matched features is to use standard graph layout techni-

ques to display similarity relationships among the features.

One example of such an approach is MDS,[7,31,32] which is a

technique to display proximity relationships among a large

number of items. MDS is a general dimensionality

reduction technique that produces an embedding of

high-dimensional data points (i.e., points in $\mathbb{R}^N$) in a low
dimensional space ($\mathbb{R}^d$, $d \ll N$) such that inter-point



Figure 8. A snapshot of our visualization system which illustrates a set of matched features of polymer conformations (left) that was generated using our substructure matching methodology. Each substructure in this example is an entire conformation of the PAC polymer. Two other panels in the visualization provide supplementary information using a histogram (right-top) of the number and distribution of the unique features of the polymer and using a layout (right-bottom) based on MDS to display similarity relationships among all features. In the MDS display, each dot is a single feature and dots close to one another represent similar features. The dots are color coded based on time stamp of the frame in which the corresponding feature occurs. A colorized version of this figure is also available as Supporting Information.



distance relationships are close to original distances among

the points in high dimensions. A stress function is used to

calculate error or lack of fit between dissimilarities in high-dimensional
space, $D_{ij}$, and distances in the low-dimensional embedding,
$\lVert x_i - x_j \rVert$.[31] A typical stress function is the following:

$$\mathrm{Stress} = \sqrt{\sum_{i \neq j = 1}^{N} \left( D_{ij} - \lVert x_i - x_j \rVert \right)^{2}} \qquad (10)$$

Multidimensional scaling (MDS) has been adapted to

visualize proximity relationships among clusters of similar

features of molecules. However, the standard MDS method

has quadratic complexity and is therefore usually not

suitable for exploring large data sets. Consequently, many

methods have been developed that combine MDS with

other feature-learning methods[33] or use modified forms of

MDS.[34] In a recent work, Rajan et al.[35] described a general,

non-metric MDS approach to cluster trajectories in protein

folding. Their method is able to discriminate small

differences between shapes of protein trajectories.

In our initial experiments, we have explored MDS to

generate visualizations of similarity relationships among

substructures of polymer conformations. Since the current

data sets used in our work are relatively small (typically

having up to a few thousand features depending on a

segment’s size set by a user) we chose to use a standard




implementation of MDS. An example application of this

technique is given in a plot in the bottom-right of Figure 8.

The procedure used in our work to visualize relationships

among polymer substructures is as follows: first, we

generate an $N \times N$ similarity matrix that specifies pairwise

similarity scores for all unique features of polymer

conformations. The comparison of polymer features is

based on the SICS approach described earlier in this section.

Next, a fast iterative MDS algorithm[36] is used to compute a

two-dimensional layout of points, where each point

represents a polymer substructure.
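The two-step procedure can be sketched in a few lines. The example below is not the fast iterative algorithm of ref. [36]; it swaps in the well-known SMACOF majorization update, which is guaranteed not to increase the stress of Equation (10), and it assumes that the pairwise scores have been converted to a symmetric dissimilarity matrix `D`. The function names and the starting layout `X0` are hypothetical.

```python
import math


def stress(D, X):
    """Stress of Equation (10), summed over all ordered pairs i != j."""
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += (D[i][j] - math.dist(X[i], X[j])) ** 2
    return math.sqrt(total)


def smacof_step(D, X):
    """One Guttman-transform update for unit weights."""
    n, dim = len(X), len(X[0])
    new_X = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(X[i], X[j])
            if d > 0.0:  # coincident points contribute nothing
                ratio = D[i][j] / d
                for k in range(dim):
                    new_X[i][k] += ratio * (X[i][k] - X[j][k])
    return [[v / n for v in row] for row in new_X]


def mds_layout(D, X0, iterations=200):
    """Iterate the update from the starting layout X0."""
    X = [list(p) for p in X0]
    for _ in range(iterations):
        X = smacof_step(D, X)
    return X


# Hypothetical dissimilarities: four features arranged like a unit square.
r2 = math.sqrt(2.0)
D = [[0.0, 1.0, r2, 1.0],
     [1.0, 0.0, 1.0, r2],
     [r2, 1.0, 0.0, 1.0],
     [1.0, r2, 1.0, 0.0]]
X0 = [[math.cos(i), math.sin(2 * i)] for i in range(4)]  # arbitrary start
layout = mds_layout(D, X0)
```

Each row of `layout` is the two-dimensional position of one feature. Every update costs on the order of N² distance evaluations, which illustrates why the standard method becomes expensive for large data sets.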

The MDS graph in Figure 8 for PAC suggests four distinct

groupings as a function of simulation time. The final

timesteps (dots in group A in the figure) indicate less

dissimilarity than the rest, suggesting the equilibrium

structure remains relatively unchanged; the differences among
the MDS points suggest some local structural fluctuations
within this equilibrium structure. The time frame prior to
that (dots in group B in the figure) has slightly more
dissimilarity than the final time steps, but overall the global

structure maintains some structural order. The conforma-

tions of PAC that have the most dissimilarity are those from

early-to-mid timesteps, indicated at the bottom of the

graph (group C in the figure). In this time window, the PAC

polymer finds the CNT and begins to adsorb onto the

surface, thus large structural deviations are expected as the

conformation changes from relatively linear/coiled within

itself to coiled and folded along the CNT surface. Early time



steps (dark gray, middle—group D in the figure) also show

some dissimilarity as the system is searching for energe-

tically-favorable conformations during the equilibration,

but not as much as for the adsorption process. For flexible

polymers with many more degrees of freedom with respect

to the polymer backbone,[10] the MDS plot is expected to

exhibit more dissimilarity with respect to polymer–CNT

interactions.

Conclusion

We adapted visual-analytic methods to develop a tool for

analyzing MD trajectories of polymer systems. This tool

combines computational methods such as feature match-

ing and MDS to determine persistent substructures of

polymer conformations. In addition, standard visualization

techniques such as interactive ATV plots and linked

displays were implemented to support exploratory analysis

of spatial-temporal relationships among matched features.

Although this method used atomistic MD simulations as a

test case, this tool can be generally applied to any particle-based
simulation method. We are currently using this tool to
examine interesting characteristics of polymer simulations;
the results will be the subject of future reports.

We plan to extend this work to develop methods for

investigating multi-chain polymer systems that provide

their own level of complexity but connect to some

experimentally-relevant information such as the degree

of entanglements and crystal nucleation. Connecting

molecular details to macroscopic properties is also a goal.

Acknowledgements: The authors thank Syamal Tallury for providing the datasets and for insightful discussions. This work was funded by the Renaissance Computing Institute (www.renci.org).

Received: November 12, 2010; Revised: February 12, 2011; Published online: April 5, 2011; DOI: 10.1002/mats.201000086

Keywords: conformational analysis; molecular dynamics; structure–property relations; supramolecular structures; visual analytics

[1] H. Yang, Y. Chen, Y. Liu, W. S. Cai, Z. S. Li, J. Chem. Phys. 2007, 127, 094902.

[2] H. Yang, S. Parthasarathy, D. Ucar, Alg. Mol. Biol. 2007, 2, 3.

[3] Q. Zheng, Q. Xue, K. Yan, L. Hao, Q. Li, X. Gao, J. Phys. Chem. C 2007, 111, 4628.

[4] T. He, R. S. Porter, Macromol. Theory Simul. 1992, 1, 119.

[5] W. Brostow, M. Drewniak, N. N. Medvedev, Macromol. Theory Simul. 1995, 4, 745.

[6] W. Humphrey, A. Dalke, K. Schulten, J. Mol. Graphics 1996, 14, 33.

[7] T. F. Cox, M. A. A. Cox, Multidimensional Scaling, 2nd edition, Chapman and Hall, New York, NY 2001.

[8] R. Patro, Y. Kim, C. Y. Ip, A. Anishkin, S. Sukharev, D. P. O’Leary, A. Varshney, Scientific Visualization: Advanced Concepts, Schloss Dagstuhl–Leibniz Center for Informatics, Dagstuhl, Germany 2010.

[9] S. S. Tallury, M. A. Pasquinelli, J. Phys. Chem. B 2010, 114, 9349.

[10] S. S. Tallury, M. A. Pasquinelli, J. Phys. Chem. B 2010, 114, 4122.

[11] W. Smith, Mol. Simul. 2006, 32, 933.

[12] S. L. Mayo, B. D. Olafson, W. A. Goddard, J. Phys. Chem. 1990, 94, 8897.

[13] S. Thakur, S. Tallury, M. Pasquinelli, T.-M. Rhyne, in: Visualization of the Molecular Dynamics of Polymers and Carbon Nanotubes, Springer-Verlag, Berlin, Heidelberg 2009, pp. 129–139.

[14] S. Thakur, S. Tallury, M. Pasquinelli, T.-M. Rhyne, Exploration of Polymer Conformational Similarities in Polymer-Carbon Nanotube Interfaces, IEEE Computer Society, 2010, pp. 320–323.

[15] C. Best, H.-C. Hege, Comput. Sci. Eng. 2002, 4, 68.

[16] J. Kim, R. Singh, Bioinformatics Research and Applications, Vol. 6053, Springer, Berlin, Heidelberg 2010, pp. 77–88.

[17] T. Can, Y.-F. Wang, Comput. Syst. Bioinf. Conf., Int. IEEE Comput. Soc. 2003, 169.

[18] G. Vriend, C. Sander, Proteins: Struct., Funct., Genetics 1991, 11, 52.

[19] H. Matsuda, F. Taniguchi, A. Hashimoto, Proc. Pacific Symp. Biocomputing 1997, 2, 280.

[20] K. Kedem, L. Chew, R. Elber, Proteins: Struct., Funct., Genetics 1999, 37, 554.

[21] L. P. Chew, K. Kedem, Algorithmica 2003, 38, 115.

[22] H. Sugeta, T. Miyazawa, Biopolymers 1967, 5, 673.

[23] P. Enkhbayar, S. Damdinsuren, M. Osaki, N. Matsushima, Comput. Biol. Chem. 2008, 32, 307.

[24] M. Coatney, S. Parthasarathy, Knowledge Inf. Syst. 2005, 7, 202.

[25] S. A. Aghili, D. Agrawal, A. E. Abbadi, Database Systems for Advanced Applications, Vol. 3453, Springer, Berlin, Heidelberg 2005, pp. 993–993.

[26] P. J. Schneider, D. Eberly, Geometric Tools for Computer Graphics, Elsevier Science Inc., New York, NY, USA 2002.

[27] D. Douglas, T. Peucker, Can. Cartographer 1973, 10, 112.

[28] M. A. Abam, M. de Berg, P. Hachenberger, A. Zarei, Discrete Comput. Geometry 2010, 497.

[29] P. K. Agarwal, S. Har-Peled, N. H. Mustafa, Y. Wang, Algorithmica 2005, 42, 203.

[30] S. Z. Li, Pattern Recognit. 1997, 30, 447.

[31] A. Buja, D. F. Swayne, M. L. Littman, N. Dean, H. Hofmann, L. Chen, J. Comput. Graph. Stat. 2008, 17, 444.

[32] S. Ingram, T. Munzner, M. Olano, IEEE Trans. Visual. Comput. Graph. 2009, 15, 249.

[33] D. K. Agrafiotis, D. N. Rassokhin, V. S. Lobanov, J. Comput. Chem. 2001, 22, 488.

[34] Y. Cao, T. Jiang, T. Girke, Bioinformatics 2010, 26, 953.

[35] A. Rajan, P. L. Freddolino, K. Schulten, PLoS One 2010, 5, e9890.

[36] C. Bentley, M. Ward, Proc. IEEE Symp. Inf. Visual. 1996, 72.

[37] G. Maggiora, V. Shanmugasundaram, Methods Mol. Biol. 2004, 275, 1.


