Structure
Article

Geometry-Based Sampling of Conformational Transitions in Proteins

Daniel Seeliger,1 Jürgen Haas,2 and Bert L. de Groot1,*
1 Computational Biomolecular Dynamics Group
2 Department of Theoretical and Computational Biophysics
Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
*Correspondence: [email protected]
DOI 10.1016/j.str.2007.09.017

SUMMARY

The fast and accurate prediction of protein flexibility is one of the major challenges in protein science. Enzyme activity, signal transduction, and ligand binding are dynamic processes involving essential conformational changes ranging from small side chain fluctuations to reorientations of entire domains. In the present work, we describe a reimplementation of the CONCOORD approach, termed tCONCOORD, which allows a computationally efficient sampling of conformational transitions of a protein based on geometrical considerations. Moreover, it allows for the extraction of the essential degrees of freedom, which, in general, are the biologically relevant ones. The method rests on a reliable estimate of the stability of interactions observed in a starting structure, in particular those interactions that change during a conformational transition. Applications to adenylate kinase, calmodulin, aldose reductase, T4-lysozyme, staphylococcal nuclease, and ubiquitin show that experimentally known conformational transitions are faithfully predicted.

INTRODUCTION

Regardless of whether a protein functions as an enzyme, molecular motor, transport protein, or receptor, its function is often coupled to motion. These motions range from side chain fluctuations to reorientations of domains and partial unfolding and refolding. An understanding of protein function is thus strongly coupled to insight into dynamics and flexibility. X-ray crystallography, which is still the major source of structural information of proteins, provides mainly static pictures of one conformation, even though a number of proteins have been resolved in different conformations, providing insights into protein flexibility directly from experimental data (Gerstein and Krebs, 1998). Structures resolved by NMR spectroscopy are usually published as an ensemble of conformations that fulfill the experimentally determined restraints and provide more information about protein flexibility. However, the method is still restricted to proteins of limited size.

Knowledge about protein structures in different conformational substates, either from experimental data or simulation, has been proven to enhance protein-protein docking (Bonvin, 2006; Mustard and Ritchie, 2005; Ehrlich et al., 2005) and structure-based drug design (SBDD) (Knegtel et al., 1997; Carlson, 2002; Meagher and Carlson, 2004; McGovern and Shoichet, 2003; Teague, 2003). However, proteins often undergo conformational changes upon ligand binding. Therefore, molecular docking or the derivation of pharmacophore models from a single receptor structure often leads to unsatisfying results, either by excluding known binders due to overdefinition of the binding site when using a holo structure, or by not identifying the correct binding pose when using an apo structure or protein model (McGovern and Shoichet, 2003).

Among the computational approaches used to tackle protein flexibility, molecular dynamics (MD) simulations are predominantly employed. However, despite the enormous increase in computer power and advances in algorithm techniques and parallelization, MD simulations are computationally expensive; moreover, high-energy barriers are often not overcome within accessible time. In order to alleviate the resulting sampling problem, several advanced simulation methods based on MD, including replica-exchange molecular dynamics (REMD) (Sugita and Okamoto, 1999), conformational flooding (Grubmueller, 1995; Lange et al., 2006), and targeted molecular dynamics (TMD) (Schlitter et al., 1994; van der Vaart and Karplus, 2005), have been developed and successfully applied to numerous problems within the field of protein research. However, even these methods are not routinely applicable to the efficient sampling of conformational transitions. Computationally more efficient, but less accurate, methods are based on Gaussian network models (Bahar et al., 1998; Haliloglu et al., 1997), normal mode analysis (Go et al., 1983; Brooks and Karplus, 1983; Krebs et al., 2002; Alexandrov et al., 2005), or graph theoretical approaches (Jacobs et al., 2001).

A different approach is the CONCOORD method (de Groot et al., 1997), which is based on geometrical considerations for the prediction of protein flexibility. A given input structure is analyzed and translated into a geometric description of the protein. Based on this description, the structure is rebuilt, commonly several hundreds of times,

1482 Structure 15, 1482–1492, November 2007 © 2007 Elsevier Ltd All rights reserved
represented by the cyan circles. The tCONCOORD sampling, however, is not affected by energy barriers and samples most of the space covered by the MD trajectories.
Figure 3. Aldose Reductase
The loops labeled A and C form parts of the Tolrestat-binding site.
Loop B interacts with the cofactor.
Although the tCONCOORD ensemble samples both open and closed conformations, it does not completely sample the conformational space sampled by the MD simulations that started from open conformations. This is because tCONCOORD defines constraints from a single input structure, in this particular case a closed conformation. If unstable interactions are not entirely detected in the constraint definition process, this can lead to an exclusion of regions of the conformational space.

The tCONCOORD ensemble furthermore samples regions of the conformational space that are visited neither by the MD simulations nor by the experimental structures. This could be due either to an energy barrier that is too high to be overcome by MD simulations within the accessible timescale, or to the energy of this region of the conformational space being too high for it to be part of the relevant conformational space.
Figure 4. Projection of tCONCOORD Ensembles of Aldose Reductase onto Eigenvectors 1 and 2 of a Principal Components Analysis
The structures on the right represent the predominant motions along these vectors. On the left, the two-dimensional projection of three different ensembles is shown. The green dots represent the ensemble of the entire complex, the red dots represent the holo form, and the black dots represent the apo form. The projection shows the reduced flexibility of the binding site in the presence of Tolrestat. Binding of NADP, however, has no effect on these modes.

Figure 5. Projection of tCONCOORD Ensembles of Aldose Reductase onto Eigenvectors 3 and 4 of a Principal Components Analysis
The structures on the right represent the predominant motions along these vectors. On the left, the two-dimensional projection of three different ensembles is shown. The green dots represent the ensemble of the entire complex, the red dots represent the holo form, and the black dots represent the apo form. The projection shows increased flexibility along eigenvector 3 if NADP is removed, because loop B is predominantly involved in this motion. Eigenvector 4 mainly represents a movement of loop C, which leads to decreased flexibility for the ensemble with Tolrestat bound.

Rigid and Flexible Regions in Proteins
Functional studies on protein structures benefit significantly from information about the flexibility and rigidity of protein parts. The calculation of root-mean-square fluctuations (rmsf) from tCONCOORD ensembles can provide valuable hints regarding these properties. To test the reliability of flexibility predictions, we chose two test cases with completely different structure and flexibility properties, which have been experimentally determined. As the first test case, we chose ubiquitin, a small 70 residue protein of which 46 X-ray structures are available in the PDB (see the Supplemental Data available with this article online). The rmsf determined from the X-ray structures (Figure 7, red curve) shows that the protein is relatively rigid, and that the only noteworthy flexibility is at the C terminus and a loop. The rmsf calculated from the tCONCOORD ensemble generated by using PDB code 1UBI (Love et al., 1997) as input (Figure 7, black curve) shows the same flexibility properties as the experimental data. Although the flexibility level of the tCONCOORD ensemble is consistently above that of the X-ray ensemble, the overall picture of a rigid protein with a flexible C terminus is reproduced (correlation coefficient of 0.95). For comparison, the rmsf of an ensemble generated with an elastic network model (Suhre and Sanejouand, 2004a, 2004b) is shown (Figure 7, green curve). This fast and efficient method is routinely employed to predict protein flexibility and reproduces the experimental fluctuations only slightly worse than tCONCOORD (correlation coefficient of 0.9). However, the structures from the tCONCOORD ensemble all have reasonable geometry (bond lengths, angles, dihedrals, and interatomic distances), which is not always the case for single structures derived from elastic network models.
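
The per-atom rmsf used in this comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the tCONCOORD code: the ensemble is assumed to be superimposed onto a common reference frame beforehand, and each structure is assumed to be a list of (x, y, z) coordinates (for example, one Cα atom per residue) in the same atom order. The function name `rmsf` and the toy coordinates are hypothetical.

```python
import math

def rmsf(ensemble):
    """Per-atom root-mean-square fluctuation of a pre-aligned ensemble.

    ensemble: list of structures; each structure is a list of (x, y, z)
    tuples with identical atom order. Superposition is assumed to have
    been done beforehand.
    """
    n_structures = len(ensemble)
    n_atoms = len(ensemble[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all structures.
        mean = [sum(s[i][k] for s in ensemble) / n_structures
                for k in range(3)]
        # Mean squared deviation of atom i from its mean position.
        msd = sum(
            sum((s[i][k] - mean[k]) ** 2 for k in range(3))
            for s in ensemble
        ) / n_structures
        result.append(math.sqrt(msd))
    return result

# Toy ensemble: atom 0 never moves, atom 1 sits at x = 1 or x = 3.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
profile = rmsf(toy)
```

In the toy case the rigid atom has an rmsf of 0 and the mobile atom an rmsf of 1, since it deviates by one unit from its mean position in both structures.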
As a second test case, we chose staphylococcal nuclease, of which an NMR ensemble (Wang et al., 1997) (PDB code: 1JOR) provides information on the flexibility of the protein. The rmsf calculated from the NMR ensemble (Figure 8, red curve) shows that mainly one loop around residue 42 is very flexible. Furthermore, the loops around residues 80 and 110 show increased flexibility. The rmsf calculated from a tCONCOORD ensemble (Figure 8, black curve), by using an X-ray structure (PDB code: 1EY4) (Chen et al., 2000) as input, qualitatively yields the same picture. The most flexible regions detected by the tCONCOORD ensemble are in good agreement with the experimental data (correlation coefficient of 0.8) and, again, are slightly better than those predicted by the elastic network model (green curve, correlation coefficient of 0.78). The tCONCOORD ensemble predicts higher flexibility for some parts of the protein than observed in the NMR ensemble. This might be due either to interactions that tCONCOORD underestimates, or to an overly tight representation of the NMR data, which is sometimes caused by imposing time- and ensemble-averaged experimental properties onto single structures during refinement (Spronk et al., 2003; Bonvin and Brunger, 1995; Cuniasse et al., 1997).
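
The correlation coefficients quoted above compare two per-residue rmsf profiles. The sketch below assumes that the Pearson correlation coefficient is meant, which the text does not state explicitly. The function name `pearson` and the example profiles are hypothetical and are not data from this work.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical rmsf profiles: the predicted one is uniformly 0.3 higher.
experimental = [0.2, 0.3, 1.2, 0.4]
predicted = [0.5, 0.6, 1.5, 0.7]
r = pearson(predicted, experimental)
```

Because the coefficient is insensitive to a constant offset, a predicted profile that lies uniformly above the experimental one can still correlate perfectly with it, as in this toy case.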
DISCUSSION

We report a novel, to our knowledge, approach to accurately predict large conformational transitions in proteins and its application to selected systems with biological relevance. The method rests on a thorough analysis of the interactions in proteins and their translation into constraints. In particular, hydrogen bonds are investigated, and their stability is estimated by analyzing their surroundings with respect to hydrophobic protection. Using