DOCTORAL THESIS OF THE UNIVERSITÉ PARIS 6
PIERRE & MARIE CURIE
Doctoral school: CHIMIE PHYSIQUE ET CHIMIE ANALYTIQUE DE PARIS CENTRE
Specialty:
CHEMISTRY (Condensed Matter)
Presented by:
Isabelle SOURY-LAVERGNE NAVIZET
To obtain the degree of DOCTOR of the UNIVERSITÉ PARIS 6
MODELING AND ANALYSIS OF THE MECHANICAL PROPERTIES
OF PROTEINS
Defended on 5 March 2004
before a jury composed of:
Richard LAVERY, Thesis supervisor
Monique GENEST, Referee
David PERAHIA, Referee
Christian AMATORE, President
Anne HOUDUSSE, Examiner
Jean-Marc VICTOR, Examiner
NOTICE
This version of the thesis is not the complete version of the thesis defended on 5 March 2004. I have removed the article of Chapter 7, which has not yet been published.
To Damien and Léonard.
ACKNOWLEDGEMENTS
This work was carried out at the Laboratory of Experimental and Computational Biology, at the National Cancer Institute of the NIH in Bethesda, United States, and at the Laboratoire de Biochimie Théorique of the Institut de Biologie Physico-Chimique in Paris. I particularly wish to thank Richard Lavery, who agreed to supervise this thesis and allowed me to spend a year working with Robert Jernigan. I would like to express my gratitude to him for welcoming me into his laboratory and for placing his trust in me to carry out this work. I thank him sincerely for having inspired this thesis with enthusiasm. I would like to thank Robert Jernigan for accepting to have me as his first PhD student and for his kindness. I wish to express my deep gratitude to Christian Amatore for the support he has shown me throughout this thesis and for the honor he does me in presiding over the thesis jury. I thank Monique Genest and David Perahia for kindly agreeing to act as referees for this thesis. Thanks also to Anne Houdusse and Jean-Marc Victor, who agreed to examine my work. Special thanks to Fabien Cailliez, who took over so quickly, and to Chantal Prévost for the long discussions and the programming advice. I also extend my warmest thanks to Marc Baaden, Philippe Derreumaux, Brigitte Hartmann, Anne Lebrun, Thérèse Malliavin, Alexey Mazur, Sophie Sacquin-Mora, Youri Timsit, Peter Varnai and Krystyna Zakrzewska, for their availability, their advice and the many discussions that helped me throughout these years. Thanks to Daniel Piazzola for his good humor and his technical assistance, which contributed to the success of this work. Thanks to Isabelle Lépine for her kindness and her availability in all circumstances.
Thanks to all the PhD students I met at the Laboratoire de Biochimie Théorique: Guillaume, Raphael, Ingrid, Emmanuel, Dragana, Guillaume, Cyril, Karine, Cyril and Fabien, for their friendship and the lunch breaks. I am also grateful to all other scientists at the National Cancer Institute for their help and interesting discussions: particularly Pemra Doruker, Peter Greif, Ozlem Keskin, Ruth Nussinov, Yinon Shafrir, Michael Tolstorukov and Victor Zhurkin. Thanks to Alain, Damien and Fabien for proofreading this manuscript. Thanks also to those I do not name here but who contributed to this thesis through their advice or their friendship. Finally, I wish to thank my family, and most especially my dear husband, for having always been at my side during these years, for having shared my doubts and my hopes, for having encouraged and supported me when I no longer believed in my work, and for having celebrated my small victories with me. Thanks also to Léonard, who certainly delayed the completion of this manuscript a little, but who allowed me to write it in the best possible conditions by sleeping through the night.
TABLE OF CONTENTS
Chapter 1 Introduction
Chapter 2 Proteins
I Protein structure
I.2.1 The chiral carbon
I.2.2 Acid-base properties
I.2.3 Classification by the nature of the side chains
I.3 The peptide bond
I.3.1 A planar bond
I.3.2 The angles of the peptide chain
I.3.3 Ramachandran diagram
I.4 The hierarchy in the description of a protein structure
I.4.1 Primary structure
I.4.2 Secondary structure
I.4.3 Tertiary structure
I.4.4 Quaternary structure
II Secondary structures
II.1 Helices
II.1.1 The α-helix
II.1.2 Other helical structures
II.2 The β-sheet
II.3 Turns and loops
Chapter 3 Folding, dynamics and stability of proteins
III Protein stability
III.1 Marginal stability
III.2 The various effects influencing the stability of the native structure
IV Protein folding
IV.1 Thermodynamic or kinetic control?
IV.2 The various folding models proposed
IV.3 Folding in vivo
IV.4 Theoretical models for studying folding
V Protein unfolding
V.1 Inducing unfolding in vitro
V.1.1 Global constraints
V.1.2 Nanomanipulations
V.2 Theoretical studies
V.2.1 Example of a study using molecular dynamics and experimental data
V.2.2 Theoretical studies of unfolding
VI Rigidity and flexibility of proteins
VI.1 Protein dynamics
VI.2 Temperature factors
VII Force field
VIII Minimization
VIII.1 Simple and conjugate gradient
VIII.2 Quasi-Newton
IX Molecular dynamics
IX.1 Solving the equation of motion
IX.2 NPT ensemble
IX.3 Periodic conditions
IX.4 LJ truncation
IX.5 Molecular dynamics protocol
X Representation of the solvent
X.1 Explicit solvent
X.2 Implicit solvent: generalized Born model
XI Internal coordinates
XI.1 The internal coordinate system
XI.2 Axis system: the pivot
XI.3 Minimization
XI.4 Force field and representation of the solvent
XI.5 Data preparation: the PCHEM program
XI.6 The various uses of LIGAND
XI.7 Limitation by protein size
XII Coarse-grained model
XII.1 The origin of the coarse-grained model
XII.2 The coarse-grained model applied to proteins
XII.3 GNM: Gaussian Network Model
XII.4 ANM: Anisotropic Network Model
XII.5 Applications
XII.5.1 Temperature factors
XII.5.2 Study of normal modes, correlation plots
XII.5.3 Description of the opening and closing of enzymatic sites
XII.6 Modification of coarse-grained models
XII.7 Minimization with a coarse-grained representation
Chapter 5 Original tools for understanding the mechanical properties of proteins
XIII Mechanical constraints applied to proteins
XIII.2 Global constraints: partial unfolding experiments
XIII.2.1 RMS distance constraint
XIII.2.2 Variant taking only the α carbons into account
XIII.2.3 Advantage of the constraint
XIII.2.4 Partial unfolding experiments in molecular dynamics
XIII.3 Local constraints
XIII.3.1 Constraint on the mean value of the distances
XIII.3.2 Why this constraint?
XIII.3.3 Calculation of the displacement force constants per residue
XIV Structural and mechanical domains
XIV.1 Automatic classification
XIV.1.1 Dissimilarity index
XIV.1.2 Formation of groups
XIV.2 Comparing two structures
XIV.2.1 Comparison of two structures
XIV.2.2 Classification of proteins by comparison
XIV.3 Identifying rigid domains from two distinct structures
XIV.3.1 Use of superposition between structures
XIV.3.2 Use of distance matrices
XIV.3.3 Example of an algorithm using the comparison of distance matrices
XIV.3.4 The problem of noise
XIV.4 Definition of structural blocks
XIV.5 Identification of domains without comparison
XIV.6 Mechanical domains
Chapter 6 Article: Flexibility of myosin: structural domains and
I Global constraints
II Local constraints
APPENDIX 2: Important Fluctuation Dynamics of Large Protein Structures are Preserved upon Coarse-Grained Renormalization
where $k_B$ is the Boltzmann constant, $Z$ is the configurational partition function, and
$\mathrm{tr}[H^{-1}]_{ij}$ is the trace of the $ij$th submatrix $[H^{-1}]_{ij}$ of $H^{-1}$.
$\langle \Delta R_i \cdot \Delta R_j \rangle$ can be expressed as a sum over the
contributions $[\Delta R_i \cdot \Delta R_j]_k$ of the $3N-6$ individual internal fluctuation modes as
$\langle \Delta R_i \cdot \Delta R_j \rangle = \sum_k [\Delta R_i \cdot \Delta R_j]_k$. The contribution of the $k$th mode is explicitly given by
$[\Delta R_i \cdot \Delta R_j]_k = k_B T \, \mathrm{tr}[\lambda_k^{-1} u_k u_k^T]_{ij}$
where $\lambda_k$ is the $k$th non-zero eigenvalue of $H$ and $u_k$ is the corresponding eigenvector. The
eigenvalues are related to the frequencies of individual modes, and the eigenvectors describe
the effect of each mode on the positions of the $N$ residues constituting the structure. The
eigenvalues are usually organized in ascending order (after removing the six zero eigenvalues
corresponding to overall translation and rotation), so that $\lambda_1$ denotes the lowest frequency and
$[\Delta R_i \cdot \Delta R_j]_1$ is the correlation for this mode of motion separately. Likewise, $[(\Delta R_i)^2]_1$ is the
mean-square fluctuation in the position of site $i$ for mode 1. The slowest vibrational modes
usually dominate the collective dynamics of the structure and are particularly relevant to
biological function.
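As a short worked step (a sketch added for illustration, not part of the original derivation), the mode sum can be read off from the spectral decomposition of the Hessian, assuming orthonormal eigenvectors. The symbol $u_k^{(i)}$, denoting the three components of $u_k$ that belong to residue $i$, is notation introduced here and does not appear in the text above.

```latex
H = \sum_{k=1}^{3N-6} \lambda_k \, u_k u_k^{T}
\quad\Longrightarrow\quad
H^{-1} = \sum_{k=1}^{3N-6} \lambda_k^{-1} \, u_k u_k^{T}
\quad \text{(pseudo-inverse over the non-zero modes)}

[\Delta R_i \cdot \Delta R_j]_k
  = k_B T \, \mathrm{tr}\left[ \lambda_k^{-1} u_k u_k^{T} \right]_{ij}
  = \frac{k_B T}{\lambda_k} \, u_k^{(i)} \cdot u_k^{(j)}

\langle (\Delta R_i)^2 \rangle
  = k_B T \sum_{k=1}^{3N-6} \frac{\left| u_k^{(i)} \right|^{2}}{\lambda_k}
```

The $1/\lambda_k$ weighting in the last line is what makes the modes with the smallest eigenvalues contribute most to the fluctuations.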
3. Determination of rigid blocks
Blocks of residues which move together in a coupled manner can be determined by the
comparison of two structures of the same protein. This analysis requires the construction of a
symmetric matrix termed $D$ whose elements $D_{ij}$ are equal to 1 if the difference $\Delta_{ij}$ of the
distances between two residues $i$ and $j$ in the two protein structures studied is below a
specified cutoff and are otherwise set to zero:
$\Delta_{ij} = |d_A(i,j) - d_B(i,j)|$
and $D_{ij} = h(r_d - \Delta_{ij})$
where $d_A(i,j)$ is the distance between residues $i$ and $j$ in structure A, $d_B(i,j)$ is the distance
between residues $i$ and $j$ in structure B and $h(x)$ is the Heaviside step function ($h(x) = 1$ if $x \geq 0$,
and zero otherwise). $D$ has dimensions $N \times N$ for an $N$-residue protein. The value of the cutoff,
$r_d$, is adjusted so that the analysis yields a reasonable number of blocks (see below).
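The construction of the binary matrix can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used for the study: each residue is represented by a single point (for example its Cα position), the function names are hypothetical, and the four-point coordinates are invented toy data.

```python
import math

def distance_difference_matrix(coords_a, coords_b):
    """Return Delta[i][j] = |d_A(i,j) - d_B(i,j)| for two conformations."""
    n = len(coords_a)
    if len(coords_b) != n:
        raise ValueError("both structures must list the same residues")
    delta = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d_a = math.dist(coords_a[i], coords_a[j])
            d_b = math.dist(coords_b[i], coords_b[j])
            delta[i][j] = delta[j][i] = abs(d_a - d_b)
    return delta

def binary_matrix(delta, r_d):
    """D[i][j] = h(r_d - Delta[i][j]), with h(x) = 1 if x >= 0 and 0 otherwise."""
    return [[1 if r_d - x >= 0 else 0 for x in row] for row in delta]

# Toy data (invented): four pseudo-residues on a line, one point per residue.
# In structure B the last residue has moved 2 Å away from the other three.
struct_a = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
struct_b = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (14.0, 0.0, 0.0)]

D = binary_matrix(distance_difference_matrix(struct_a, struct_b), r_d=0.5)
for row in D:
    print(row)
```

With the toy cutoff of 0.5 Å, the first three residues are mutually linked and the displaced fourth residue is linked only to itself.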
As the resulting matrix is still complicated, it has to be refined in order to clearly delimit the
underlying blocks. This procedure involves starting with the first residue and constituting a
block with all consecutive residues j, as long as D(1,j) is equal to 1. When D(1,i) is equal to 0
for some residue i, a new block is started at i with the criterion D(i,j)=1. Diagonal blocks are
created this way. Two
diagonal blocks A and B then become part of a single block if the matrix element D(iA,iB) is
equal to 1, where iA and iB are the central residues within blocks A and B respectively (see
figure 57). The final matrix D is again a binary matrix, with D(i,j)=1 if i and j belong to the
same block.
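The refinement just described can be sketched as follows. This is an illustrative reading of the procedure, not the original code: indices are 0-based, the central residue of a block is taken as the integer midpoint of its range, merging is applied transitively (if A joins B and B joins C, all three share a label), and the function names and the 6-residue matrix are hypothetical.

```python
def diagonal_blocks(D):
    """Split residues into consecutive runs (start, end) with D[start][j] == 1."""
    if not D:
        return []
    blocks, start = [], 0
    for j in range(1, len(D)):
        if D[start][j] == 0:          # run is broken: close it, start a new one at j
            blocks.append((start, j - 1))
            start = j
    blocks.append((start, len(D) - 1))
    return blocks

def merge_blocks(D, blocks):
    """Give two diagonal blocks the same label when their central residues are linked."""
    centers = [(s + e) // 2 for s, e in blocks]
    labels = list(range(len(blocks)))
    for a in range(len(blocks)):
        for b in range(a + 1, len(blocks)):
            if D[centers[a]][centers[b]] == 1:
                old, new = labels[b], labels[a]
                labels = [new if x == old else x for x in labels]
    return labels

def refined_matrix(D):
    """Return the final binary matrix: 1 if residues i and j share a block."""
    blocks = diagonal_blocks(D)
    labels = merge_blocks(D, blocks)
    residue_label = []
    for (s, e), lab in zip(blocks, labels):
        residue_label.extend([lab] * (e - s + 1))
    n = len(D)
    return [[1 if residue_label[i] == residue_label[j] else 0 for j in range(n)]
            for i in range(n)]

# Toy matrix (invented): residues 0-1 and 4-5 move together, residues 2-3 form
# a separate block, and one noisy off-diagonal element (1,4) has been set to 0.
noisy = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
]
print(diagonal_blocks(noisy))
for row in refined_matrix(noisy):
    print(row)
```

Because merging looks only at the central residues, the isolated noisy element does not prevent the first and last runs from being assigned to the same block.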
Results and Discussion
Flexible regions within the myosin head
Starting from our ANM analysis of the three available structures of the myosin head, it is
possible to calculate the overall fluctuations of each amino acid residue in the form of the
B-factors commonly used in analyzing crystallographic structures,
$B_i = \frac{8\pi^2}{3} \langle \Delta R_i \cdot \Delta R_i \rangle$
Figure 52 shows plots of these fluctuations for the DS, NR and TS structures. All calculations
of $\Delta R_i$ were performed with $r_c$ = 11 Å following the study of Atilgan et al. [Atilgan, et al.;
2001]. Excellent agreement between such B-factors and crystallographic
data has already been demonstrated for other proteins [Atilgan, et al.; 2001, Bahar, et al.;
1998, Doruker, et al.; 2002a, Doruker, et al.; 2002b, Keskin, et al.; 2002a, Keskin, et al.;
2002b]. We can only make such comparisons in the case of the better resolved DS structure,
where the experimental values are available. The comparison with the theoretical results is
presented in figure 52 and shows a good overall agreement, with the exception of residues
belonging to the lever arm (775-835) and the RLC. These exceptions are most probably due to
the interactions which exist between the myosin lever arms within the crystal lattice, but are
naturally absent in our calculations. Since the spring constant γ is the only remaining
parameter of our calculations, its value can be determined by matching the areas under the
experimental and theoretical B-factor curves. This has been done for the residues in the zone
1-800 and leads to a value of 1.3 kcal/(Å².mol). This value is comparable to the values found
for other proteins [Atilgan, et al.; 2001].
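The area-matching step can be illustrated with a small sketch. It relies on the fact that, in a single-parameter elastic network, the fluctuations (and hence the B-factors) scale as 1/γ, so a calculation done with γ = 1 can be rescaled afterwards. The function names and the four-point curves are invented for illustration, and gaps for missing residues are ignored here.

```python
import math

def b_factor(mean_square_fluctuation):
    """Convert <dR_i . dR_i> (Å^2) into an isotropic B-factor, B = (8 pi^2 / 3) <dR^2>."""
    return 8.0 * math.pi ** 2 / 3.0 * mean_square_fluctuation

def area(values):
    """Trapezoidal area under a curve sampled at consecutive residue numbers."""
    return sum(0.5 * (a + b) for a, b in zip(values, values[1:]))

def fit_spring_constant(b_theory_unit_gamma, b_experimental):
    """Because B scales as 1/gamma, matching the two areas gives gamma directly."""
    return area(b_theory_unit_gamma) / area(b_experimental)

# Toy curves (invented values): theoretical B-factors computed with gamma = 1
# and hypothetical experimental B-factors for the same residues.
b_unit = [10.0, 20.0, 30.0, 20.0]
b_exp = [5.0, 10.0, 15.0, 10.0]

gamma = fit_spring_constant(b_unit, b_exp)
print(gamma)
```

The fitted value is expressed in the units of the γ = 1 reference calculation; restricting the two lists to a residue range (such as 1-800 in the text) restricts the fit to that zone.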
We can now compare the DS, NR and TS states of myosin. All three structures show rather
similar overall fluctuations. Each indicates a significant difference between the motor domain
(residues 1 to 775), which is rigid, and the lever arm (residues 820 to 835), which is flexible.
The regulatory light chain, which is located at the end of the lever arm structure, is also very
flexible, in contrast to the essential light chain. It should be recalled that these results refer to
an isolated myosin head, truncated at residue 835, and do not take into account the effects of
interactions with the actin filament or between neighboring myosin motors.
Recall also that the myosin head structures we use are incomplete, and the absence of
residues in some domains is the cause of significant local differences between the three
states which can be seen in figure 52. This is notably the case for the peaks observed near
residue 410 in the DS and TS structures and near residue 320 in the NR structure. There are,
however, some mechanically significant differences between the three states, most notably for
the contact region between the lever arm and the motor domain, which is different in DS
compared to either the NR or TS structures. This change shows up in figure 52 as the peak in
fluctuations of residues 48-56 which is only seen for NR and TS, while only the DS structure
shows a peak for residues 508-510. The first peak can be easily explained by the fact that
residues 48-56, belonging to the SH3 β-barrel, are distant from the lever arm in the NR and
TS structures, but close in DS. The second peak is coupled to the fact that the
β-strand and α-helix elements of the so-called "relay" structure are more distant
from one another in DS than in either NR or TS (the elements of the relay are visible on the
left-hand side of the detailed views in the lower part of figure 53).
figure 52 : Calculated B-factors (solid curves) as a function of the residue numbers for three
structures of the myosin head composed of the main chain and the RLC and ELC light chains.
Calculations used a spring constant γ of 1.3 kcal/(Å².mol). Experimental B-factors are shown
for the DS structure (dashed curves). The curves are interrupted at points where residues are
missing in the experimental structures. The scale chosen leads to overlap of the curves for the
particularly flexible RLC domain, but makes the details more visible for the remainder of the
structure.
In order to link these results more easily to the 3-dimensional structure of myosin, we use
color-coded ribbon models (where increasing fluctuations are indicated with a blue to red
gradation). The results shown in the upper part of figure 53 again stress the overall similarity
of the fluctuations for the three myosin structures. They also emphasize the flexibility of the
loops which compose the actin binding domain at the top of the S1 domain and the probably
artifactual flexibility of the end of the lever arm, compared to the stiffer region near the
essential light chain. Fluctuations are also seen to be larger at the surface of the
motor domain and in the lever arm, whereas the buried ATP site is a relatively rigid zone.
Since it is not easy to see the changes occurring within the motor domain in the full structure,
we have added detailed views in the lower part of figure 53. In addition to the changes in the
relay discussed above, these views show that the most rigid region corresponds to switch II
(the strand linking the central β-sheet to the α-helix of the relay) in NR and TS, but rather to
the ATP binding site in DS. This is in agreement with the remarks of Houdusse et al.
indicating that there is a stronger interaction between the elements linked by switch II in the
former structures [Houdusse, et al.; 2000].
The RLC and ELC light chains are known to play an important role biologically, and they can
be expected to modify the flexibility of the long α-helices which constitute the lever arm.
Their effect can be tested theoretically by comparing ANM calculations on the full myosin
head with calculations on structures where the light chains have been removed. The results of
these calculations are shown in figure 54 and figure 55. Removing the light chains is seen to
have a dramatic effect. As might be expected, in the absence of these proteins, there is a
significant increase in the fluctuations within the lever arm. However, it is also interesting to
note that although the more flexible parts of the motor domain (colored in orange in figure 55)
are still located on the surface of the structure, they do not occur in the same zones. Notably,
in the absence of the light chains, the loops near the actin-binding site become less flexible,
although the reason for this long-range coupling is not obvious. Overall, for the same value
of the spring constant γ, the structure without the RLC and ELC becomes four times more
flexible.
figure 53 : Upper part: Ribbon diagrams of the DS, NR and TS myosin head structures, color-
coded on the basis of the calculated B-factors (the color range from blue to red corresponds to
increasing fluctuations). Lower part: Detailed view of the part of the motor domain showing
the relay structure on the left and the nucleotide binding site on the right. Note that the color
scale has been adapted to show up changes within this fragment of the overall myosin
structure.
figure 54 : Calculated B-factors for the DS main chain as a function of residue number either
with (solid curves) or without (dashed curves) the RLC and ELC light chains. The inset
shows an expanded view of the results for the lever arm (residues 775 to 835). The curves are
normalized to yield equal areas for the residues 1-775.
figure 55 : Ribbon diagram of the DS myosin head, color-coded on the basis of the calculated
B-factors (the color range from blue to red corresponds to increasing fluctuations). On the left
- in the presence of the RLC and ELC light chains. On the right - in the absence of the light
chains.
Structurally coherent blocks of residues
The crystallographic data available for the DS, NR and TS structures of the myosin head
enable us to study flexibility from another point of view, by asking which blocks of residues
move in a coherent, coupled manner as myosin undergoes the conformational changes linked
to its motor cycle. We have carried out the rigid block analysis described in the methodology
section for the three possible pairs of structures: DS-TS, DS-NR and TS-NR. The limit
distance $r_d$, which determines whether two residues are considered as part of the same block,
was chosen as 0.1 Å following the preliminary studies illustrated in figure 56. These show
three representations of the matrix $\Delta$, where $\Delta_{ij} = |d_A(i,j) - d_B(i,j)|$. The data shown refer to the
case A = TS and B = DS. The color of a point within the matrix is red if $\Delta_{ij} > r_d$ and graduated
from red to blue in terms of decreasing distance if $\Delta_{ij} < r_d$. If $r_d$ = 10 Å (figure 56a), we obtain
only two blocks which correspond, not surprisingly, to the myosin motor domain and the
lever arm. By decreasing $r_d$ (figure 56b and figure 56c), a finer distinction of movement is
obtained and more blocks appear. The selected limit of $r_d$ = 0.1 Å leads to roughly 20
structural blocks after the refinement procedure described in the methodology section and is
a reasonable limit given the limited resolution of the experimental data.
figure 56 : Representation of the matrix $\Delta_{ij}$ for the DS-TS structure comparison: (a) the values
of $|d_{TS}(i,j) - d_{DS}(i,j)|$ from 0 to 10 Å are colored from blue to red. All values beyond 10 Å are
shown in red. (b) all values of $|d_{TS}(i,j) - d_{DS}(i,j)|$ beyond 1 Å are shown in red. (c) all values
of $|d_{TS}(i,j) - d_{DS}(i,j)|$ beyond 0.1 Å are shown in red.
Figure 57 shows the D matrix with $r_d$ = 0.1 Å before and after refinement for the DS-TS,
DS-NR and TS-NR pairs. The resulting blocks can be linked to the 3-dimensional structure of
myosin, again using color-coded ribbon models (figure 58). Note that isolated residues and
two-residue blocks have been colored gray.
These results are in agreement with the division into four sub-domains connected by flexible
regions suggested by Houdusse et al. 6, although the subdivisions shown in figure 58 are
somewhat finer. The results for the three pairs of structures analyzed show overall similarity.
There are however some notable differences. In particular, the helix at the top of the motor
domain (colored tan in figure 58a, residues 416-446) belongs to a single block for the TS-DS
pair of structures, but is divided into three blocks (colored tan-yellow-orange in figure 58b
and figure 58c) when the structure NR is involved in the comparison. Given the position of
these residues, this change may well be related to the fact that the nucleotide binding pocket is
occupied in the structures DS and TS, but empty in NR.
figure 57 : Binary representation of the matrix Dij where 1's are colored in black and 0's in
white. Figures (a), (c) and (e) show the comparisons DS-TS, DS-NR and NR-TS before
refinement of the structural blocks (see methodology), while figures (b), (d) and (f) show the
same comparisons after refinement.
In fact, the presence of a nucleotide in the binding pocket seems to lead to larger structural
blocks in several regions. Thus, the zone formed by residues 231-243 (shown as ice blue in
figure 58a) forms a single block only when the nucleotide pocket is occupied and a similar
result is found for the residues 216-230 and 244-356 (shown in orange in figure 58a). A
similar distinction is found within the lever arm and light chains, where the three blocks
observed in the presence of a bound nucleotide (figure 58a), become four blocks when the
comparison involves an empty nucleotide pocket. It is also important to note that this analysis
clearly shows the "pliant point" within the region 775-780 (indicated by an arrow between the
yellow and red blocks in figure 58c) reported by Houdusse and Sweeney [Houdusse & Sweeney;
2001].
figure 58 : Ribbon diagram of the DS myosin head structure, color-coded on the basis of the
calculated structural blocks (the color range from blue to red corresponds to increasing
fluctuations). Figures a, b and c show the blocks obtained from the DS-TS, DS-NR and NR-
TS comparisons respectively. Residues belonging to blocks of less than three residues are
shown in gray. The arrow in figure c indicates the so-called pliant point.
Links between collective vibrations and structural blocks
In order to test whether the results obtained by our rigid block analysis are related to the ANM
collective vibration analysis, we have repeated the B-factor calculations using a modified
spring model of myosin. The modification involves using two different spring constants to
mimic the existence of structural blocks. While maintaining the usual spring constant between
residues belonging to different blocks, we increase the spring constant by a factor of 100 for
residue pairs within a single block. If the block analysis can be related to rigidity within
blocks and flexibility between blocks, the modified spring constants would not be expected to
significantly change the calculated B-factors. As a control, we have also carried out B-factor
calculations with modified spring constants based on artificially constructed blocks which
cross the block boundaries we have actually determined. Note that the cutoff distance for
forming inter-residue springs is kept at 11 Å for all these studies.
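The two-spring-constant test described above can be sketched as follows. This is a minimal illustration, assuming a standard ANM Hessian built from Cα coordinates; the function name `anm_bfactors`, the `blocks` label array and the convention that -1 marks a residue outside any block are choices made for this example, not the code used in this work. The B-factors are returned in relative units (the trace of each residue's 3x3 block of the pseudo-inverse Hessian), which is sufficient when curves are compared after normalization.

```python
import numpy as np

def anm_bfactors(coords, blocks=None, cutoff=11.0, k_inter=1.0, stiffening=100.0):
    """Relative B-factors from an ANM with stiffer springs inside blocks.

    coords : (N, 3) array of C-alpha positions in angstroms.
    blocks : length-N integer labels (hypothetical convention: -1 means
             "not in any block"). None gives the single-spring-constant model.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    labels = np.full(n, -1) if blocks is None else np.asarray(blocks)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue  # no spring beyond the cutoff distance
            same_block = labels[i] >= 0 and labels[i] == labels[j]
            k = k_inter * (stiffening if same_block else 1.0)
            sub = -k * np.outer(d, d) / r2  # off-diagonal 3x3 super-element
            hessian[3*i:3*i+3, 3*j:3*j+3] = sub
            hessian[3*j:3*j+3, 3*i:3*i+3] = sub
            hessian[3*i:3*i+3, 3*i:3*i+3] -= sub
            hessian[3*j:3*j+3, 3*j:3*j+3] -= sub
    # Pseudo-inverse: skip the six zero modes (rigid-body motions).
    # This assumes the spring network is connected and rigid.
    evals, evecs = np.linalg.eigh(hessian)
    inv = (evecs[:, 6:] / evals[6:]) @ evecs[:, 6:].T
    # Mean-square fluctuation of residue i = trace of its 3x3 diagonal block.
    return np.diag(inv).reshape(n, 3).sum(axis=1)
```

Calling the function once with `blocks=None` and once with the block labels from a given structure comparison gives the two curves to be compared.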
Figure 59: Calculated B-factors for the DS structure using two spring constants which take
into account the rigid blocks obtained from the DS-TS comparison (solid curve) or using a
single spring constant (dashed curve).
Figure 59 displays the modified B-factors calculated with two spring constants for the DS
structure, taking into account the structural blocks obtained from the DS-TS comparison. The
B-factors calculated with the standard spring constant of 1.3 kcal/(Å².mol) are shown for
comparison. Note that the total areas under the two curves have been made equal. It can be
seen that the modified B-factors are nearly identical to those calculated with a single spring
constant. Minor differences occur for residues 475-525 and residues 650-690 which do not
belong to structural blocks and are found to be a little more flexible than with the previous
calculation.
We have repeated this analysis for the three available myosin structures, using either of the
rigid block definitions involving the structure in question. This leads to a total of six different
B-factor curves which can be compared with the single spring constant result. In all cases, the
minor changes observed support the compatibility of the rigid block and the ANM analyses.
In contrast, if we use artificially constructed blocks bridging the principal boundaries between
the true rigid blocks, much more significant changes in the B-factor curves are found.
Compared to the reference B-factor curve, the mean relative error found with the artificial
blocks is 22%, compared to only 5% with the correctly formed blocks. We can therefore
conclude that there is indeed a close relation between the ANM calculations and the rigid
block analysis.
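The comparison of B-factor curves can be illustrated by the short sketch below. The exact definition of the mean relative error is not spelled out here, so the formula used (mean of |B_test - B_ref| / B_ref after rescaling both curves to the same area) is an assumption, and the function name is hypothetical.

```python
import numpy as np

def mean_relative_error(b_test, b_ref):
    """Mean relative deviation of b_test from b_ref after area matching."""
    b_test = np.asarray(b_test, dtype=float)
    b_ref = np.asarray(b_ref, dtype=float)
    # With one value per residue, equal areas means equal sums.
    scaled = b_test * b_ref.sum() / b_test.sum()
    return float(np.mean(np.abs(scaled - b_ref) / b_ref))
```

A value close to zero indicates that imposing rigid blocks leaves the fluctuation profile essentially unchanged.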
Conclusions
By combining coarse-grained methods with available crystallographic data, we have been
able to study the flexibility of the myosin motor protein, a system involving almost 1000 amino
acid residues. We have used two approaches to obtain information, first, calculating residue
fluctuations using the ANM elastic model and, second, defining rigid structural blocks by an
analysis of conformational changes. Good agreement is found with available experimental
data.
These two approaches, which have been shown to yield compatible results, enable us to
distinguish and to quantify the rigid and flexible domains within the myosin structure.
Although the basic mechanics of myosin seems to be preserved amongst its various known
conformations, changes have been detected in the flexibility at the motor domain-lever arm
interface and also linked to the presence or absence of a ligand within the nucleotide binding
pocket. We have also been able to show that the regulatory and essential light chains play a
significant role in determining the rigidity of the myosin lever arm.
Acknowledgment
I.N. acknowledges support from the Foundation for Advanced Education in the Sciences and
from the National Institutes of Health.
XVII Conclusion
This comparative study of three myosin II structures, captured at different stages of the
actomyosin cycle, provides some useful clues for understanding the mechanism of this
molecular motor.
First, the normal mode analysis of the coarse-grained representation of the structures shows
that the lever arm is much more mobile than the core of the head, and in particular than the
ATP binding site. This observation holds for isolated S1 fragment structures (as opposed to
fragments within a crystal lattice, where the necks interact with the other structures or are
connected to the myosin tail). Similarly, the surface loops, and especially those close to the
actin binding site (actin being absent from all the structures), are rather mobile. Differences
are nevertheless observed for the detached state, in which the lever arm lies close to an SH3
motif of the head and the distance separating the lever arm helix from the β strand in the
so-called "relay" region is larger than in the other structures. Moreover, in this same
structure, the most rigid region is located at the nucleotide binding site, rather than at the
so-called "switch II" link connecting the "relay" region to the nucleotide binding site, as in
the other structures. This confirms that the structural elements making up the detached state
are more decoupled than those of the other states.
The role of the light chains in modifying the flexibility of the lever arm is confirmed by
comparing the temperature factors of the lever arm calculated in the presence and in the
absence of the light chains. Likewise, the regions of the head showing high mobility differ
depending on whether or not the light chains are included in the calculation.
Second, the determination of structural domains by comparing distance matrices between
structures is compatible with the normal mode analysis. Taking these domains into account in
the ANM calculation indeed gives results similar to those presented previously. The domains
defined in our study are finer than those commonly used to describe myosin heads, but are
compatible with them. They make it possible to identify hinge points such as the so-called
"pliant point" or "kink" region [Houdusse & Sweeney; 2001, Xiao, et al.; 2003] of the lever
arm. Looking in more detail at the nucleotide binding region, we note that the α helix formed
by residues 416 to 446 is divided into two blocks when the nucleotide-free structure is
compared with the other structures, in which a nucleotide is bound to the enzymatic site.
This shows that this helix is not rigid and bends at its center depending on the absence or
presence of a nucleotide.
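The principle of comparing distance matrices between two structures can be sketched as follows. This only illustrates the underlying idea and is not the block-detection algorithm used in this work, which is not described in this section; the function names and the tolerance `tol` (in Å) are hypothetical.

```python
import numpy as np

def distance_matrix(coords):
    """All pairwise C-alpha distances for an (N, 3) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def difference_matrix(coords_a, coords_b):
    """Absolute change of every inter-residue distance between two structures."""
    return np.abs(distance_matrix(coords_a) - distance_matrix(coords_b))

def is_rigid_segment(delta, start, stop, tol=1.0):
    """True if no internal distance of residues start..stop-1 changes by tol or more."""
    return bool(delta[start:stop, start:stop].max() < tol)
```

Because only internal distances are compared, no superposition of the two structures is needed. A segment that bends at a hinge fails the test as a whole, while each of its two halves passes it.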
The study presented in this first article illustrates one approach to the mechanical
properties of proteins, based on structure comparison and normal mode calculations. In what
follows, we describe another approach to the mechanical problems of proteins.
Chapter 7 Article: Residue-level mechanical
properties of proteins and their use in
defining domain structures
XVIII Introduction
In this article, we present a theoretical method for probing the mechanical properties of
proteins at the residue level, and its use in defining structural domains based on these
properties.
The residues of a protein are probed one after another by increasing or decreasing the mean
distance between the α carbon of the probed residue and the other α carbons. The energy
surface along this coordinate is quadratic around the initial equilibrium position (see
section XIII.3, page 100). We therefore define a force constant that reflects the resistance
of the system to such a constraint (the larger the constant, the more resistant the system).
The order of magnitude of this force constant is the nN.Å⁻¹, but its values can vary by a
factor of 50 depending on the residue probed. The response of the protein to the constraint
also allows us to define mechanical domains based on the displacement of the α carbons
relative to the α carbon being probed (see section XIV.6, page 117). A more systematic study
of the location of residues with a large force constant further revealed that they lie at the
interfaces between the domains thus defined.
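To illustrate how a force constant follows from the quadratic shape of the energy surface, the sketch below computes the mean Cα distance for a probed residue and fits E = E0 + b·x + (k/2)·x² to a set of (displacement, energy) points. The function names and input arrays are hypothetical, and the fit is only one plausible way of extracting k; the actual constraint and minimization protocol is the one described in section XIII.3 and is not reproduced here.

```python
import numpy as np

def mean_distance(coords, i):
    """Mean distance from C-alpha i to all other C-alpha atoms."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords - coords[i], axis=1)
    return d.sum() / (len(coords) - 1)  # the zero self-distance is excluded

def force_constant(displacements, energies):
    """Force constant k from a quadratic fit of energy against displacement.

    displacements : changes in the mean distance relative to equilibrium.
    energies      : energies of the structures relaxed under each constraint.
    The linear term of the fit absorbs a small offset of the minimum.
    """
    a, _, _ = np.polyfit(displacements, energies, 2)
    return 2.0 * a  # E = a x^2 + ... with a = k / 2
```

The units of k follow from the inputs: energies in kcal/mol and displacements in Å give kcal/(mol.Å²).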
Two protein representations were tested: an all-atom model in a force field defined by the
AMBER parm99 parameters, working with internal variables (see the description of the LIGAND
program, section XI, page 73), and a coarse-grained model representing proteins as a network
of Gaussian springs between the α carbons (see the description of the GNMlig program, section
XII.7, page 90). The coarse-grained model has the advantage of being very fast, allowing both
systematic studies and the study of large systems.
Staphylococcal nuclease is studied with both approaches, and six other proteins, containing
between 140 and 750 residues, were studied with the coarse-grained representation.
XIX Probing protein mechanics: Residue-level
properties and their use in defining domain structures
Isabelle Navizet, Fabien Cailliez and Richard Lavery
Submitted in February 2004 to Biophysical Journal
Abstract
It is becoming clear that, in addition to structural properties, the mechanical properties of
proteins can play an important role in their biological activity. It nevertheless remains difficult
to probe these properties experimentally. While single molecule experiments give access to
overall mechanical behavior, notably the impact of end-to-end stretching, it is currently
impossible to directly obtain data on more local properties. We propose a theoretical method
for probing the mechanical properties of protein structures at the single amino acid level. This
approach can be applied to both all-atom and simplified protein representations. The probing
leads to force constants for local deformations and to deformation vectors indicating the paths
of least mechanical resistance and also defining the mechanical coupling which exists
between residues. Results obtained for a variety of proteins show that the calculated force
constants vary over a wide range. An analysis of the induced deformations provides
information which is distinct from that obtained with measures of atomic fluctuations and is
more easily linked to residue-level properties than normal mode analyses or dynamic
trajectories. It is also shown that the data obtained from residue-level probing makes it
possible to define domains using this mechanical information.
Keywords: Molecular modeling, molecular dynamics, protein deformation, coarse-grained
models, dynamical domains
Introduction
NOTICE
This version of the thesis is not the complete version defended on 5 March 2004. I have removed the Chapter 7 article, which has not yet been published.
XX Conclusion
This article describes a method that we developed to probe the mechanical properties of
protein structures at the residue level. When a constraint is applied to the mean distance
separating a given residue i from the other residues, the structure deforms, yielding one
piece of scalar information and one piece of vector information. The scalar information is a
force constant that indicates how easily residue i responds to such a constraint. The vector
information is the preferred direction of displacement adopted by residue i, corresponding to
the direction of least resistance. The changes in the distances between α carbons required to
satisfy the constraints make it possible to define structural domains. Combining these two
types of information further allowed us to observe that the most resistant residues are
located at the domain interfaces.
We used this method to define the mechanical domains of staphylococcal nuclease with an
all-atom representation, and those of six other proteins with a simplified representation
that takes only the α carbons into account.
It would be interesting to look in more detail at the various domains obtained. They may well
explain mechanical properties linked to structural information and to chemical mechanisms.
Likewise, comparing how they evolve along an unfolding pathway, and comparing their location
with the sequence of unfolding events, could be an interesting way to better understand
protein unfolding and folding.
Chapter 8 General conclusion
The thesis work presented here was carried out in two laboratories: the normal mode study and
the determination of the structural domains of myosin were developed at the Laboratory of
Experimental and Computational Biology, National Cancer Institute, NIH, Bethesda, Maryland
(United States), with Robert L. Jernigan, while the development of mechanical constraints and
their use to determine mechanical domains were carried out at the Laboratoire de Biochimie
Théorique, Institut de Biologie Physico-Chimique, Paris (France), under the supervision of
Richard Lavery.
The mechanical properties of proteins were studied at different levels of representation
(atomic or coarse-grained) and from several angles.
We showed that a highly simplified representation of the protein, as used in the GNM and
GNMlig programs, yields very interesting results with calculations that are fast and
applicable to large systems. The analysis of results obtained with this representation must,
however, be limited to rudimentary information about the properties. The coarse-grained model
erases information on the chemical interactions between residues and restricts the study to
the structure around its equilibrium state. Thus, studying the unfolding of a protein with
such a model is limited to conformations close to the native state, since the model does not
allow the structure to cross energy barriers. On the other hand, this model gives access to
temperature factors through a normal mode analysis, because it is the most global normal
modes that contribute most to their theoretical calculation. Temperature factors, which can
also be obtained experimentally if the resolution of the crystallographic study is good
enough, are related to the local packing density around the residues studied.
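The dominance of the most global modes in the calculated temperature factors can be illustrated with a minimal Gaussian network sketch. It assumes the usual Kirchhoff (connectivity) matrix formulation; the function name is hypothetical, and the 7 Å cutoff is a conventional choice assumed for this example rather than a value taken from this section.

```python
import numpy as np

def gnm_mode_shares(coords, cutoff=7.0):
    """Relative fluctuations and the share of each mode in their total.

    Returns (fluct, share): fluct[i] is the relative mean-square fluctuation
    of residue i, share[k] the fraction of the summed fluctuations carried by
    mode k, with modes ordered from slowest (most global) to fastest.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dist <= cutoff) & ~np.eye(n, dtype=bool)
    kirchhoff = np.diag(contact.sum(axis=1)).astype(float) - contact.astype(float)
    evals, evecs = np.linalg.eigh(kirchhoff)
    evals, evecs = evals[1:], evecs[:, 1:]  # drop the single zero mode
    per_mode = evecs ** 2 / evals           # rows: residues, columns: modes
    fluct = per_mode.sum(axis=1)
    share = per_mode.sum(axis=0) / per_mode.sum()
    return fluct, share
```

Each mode is weighted by the inverse of its eigenvalue, so the lowest-frequency modes dominate the sum. The sketch assumes a connected network, which has exactly one zero eigenvalue.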
To study the links between structure and mechanics in finer detail, we defined an index that
characterizes the elasticity of a polypeptide strand residue by residue. Such information is
not easy to obtain from the analysis of molecular dynamics trajectories or from normal mode
calculations. The response of a protein structure to a constraint on the mean of the
distances separating a carbon Cα,i from the other α carbons of the structure makes it
possible to calculate a force constant and reveals the direction of displacement offering the
least resistance. The location of the most resistant residues and the analysis of the
favorable deformations are characteristics of the protein chain studied. It would be
interesting to pursue this research in the context of studies on the biological role of the
residues in question.
Another approach to the mechanical problem that we addressed is the delimitation of domains
within biological structures. The first method presented is based on the comparison of
structures of the same protein. It applies naturally to our study of myosin, for which
several structures are available. This simple approach is, however, limited to very similar
structures. It can be used, for example, on structures obtained by molecular dynamics, by
normal mode analysis or by constrained unfolding experiments.
The second method derives from the response to local mechanical constraints. This original
approach to determining mechanical domains is interesting because, on the one hand, it is
intrinsic to a given structure and requires neither comparison nor superposition of
structures and, on the other hand, it draws on richer information than a simple observation
of the structure. It would be interesting to compare the domains thus obtained with those
given by other methods. A number of questions could also be addressed: Are the same
mechanical domains found when two different structures of the same protein are analyzed? Do
mechanical domains make it possible to predict the response to a global mechanical constraint
in protein unfolding studies? Does the analysis of their evolution during unfolding, together
with that of the force constants associated with their determination, provide information on
the most sensitive and the most resistant parts?
Thus, the algorithms that we have developed, and whose first applications we have presented,
may in the future contribute elements of an answer to some fundamental questions such as the
mechanism of protein folding. We also hope that they will provide indications on the
mechanical characteristics of enzymatic sites (notably by comparing enzymes whose catalytic
sites have diverged in function during evolution while keeping the same location in the
structure [Hasson, et al.; 1998]), on protein interaction surfaces (are differences in
mechanical properties observed at interaction sites? How are the mechanical properties of a
protein within a complex modified with respect to those of the same protein outside the
complex?) and on the mechanical reasons for thermostability (what differences are observed
between the mechanical properties of thermophilic proteins and those of their mesophilic
homologs?).
The research presented in this thesis corresponds essentially to the development of the
methods described. Only a few applications of these original methods have been addressed.
Their field of application is vast, because our understanding of protein behavior is still
very partial, and we have shown that molecular modeling makes it possible to go where
experiment cannot yet provide the necessary information.
BIBLIOGRAPHY
Alberts B., Bray D., Lewis J., Raff M., Roberts K. & Watson J. (1994) Molecular biology of the cell. Garland Science, New York.
Allemand J. F., Bensimon D., Lavery R. & Croquette V. (1998) Stretched and overwound DNA forms a Pauling-like structure with exposed bases. Proc Natl Acad Sci U S A. 95(24): 14152-7.
Allen M. & Tildesley D. (1987) Computer simulation of liquids. Clarendon Press, Oxford.
Alonso D. O. & Daggett V. (1995) Molecular dynamics simulations of protein unfolding and limited refolding: characterization of partially unfolded states of ubiquitin in 60% methanol and in water. J Mol Biol. 247(3): 501-20.
Anfinsen C. B. & Scheraga H. A. (1975) Experimental and theoretical aspects of protein folding. Adv Protein Chem. 29: 205-300.
Atilgan A. R., Durell S. R., Jernigan R. L., Demirel M. C., Keskin O. & Bahar I. (2001) Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys J. 80(1): 505-15.
Bahar I., Atilgan A. R., Demirel M. C. & Burack E. (1998) Vibrational Dynamics of Folded Proteins: Significance of Slow and Fast Motions in Relation to Function and Stability. Phys Rev Lett. 80: 2733-2736.
Bahar I., Atilgan A. R. & Erman B. (1997) Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Fold Des. 2(3): 173-81.
Bahar I., Erman B., Jernigan R. L., Atilgan A. R. & Covell D. G. (1999) Collective motions in HIV-1 reverse transcriptase: examination of flexibility and enzyme function. J Mol Biol. 285(3): 1023-37.
Bahar I. & Jernigan R. L. (1998) Vibrational dynamics of transfer RNAs: comparison of the free and synthetase-bound forms. J Mol Biol. 281(5): 871-84.
Bahar I. & Jernigan R. L. (1999) Cooperative fluctuations and subunit communication in tryptophan synthase. Biochemistry. 38(12): 3478-90.
Baker J. P. & Titus M. A. (1998) Myosins: matching functions with motors. Curr Opin Cell Biol. 10(1): 80-6.
Baldwin R. L. (1996) Why is protein folding so fast? Proc Natl Acad Sci U S A. 93(7): 2627-8.
Bashford D. & Case D. (2000) Generalized Born models of macromolecular solvation effects. Annu Rev Phys Chem. 51: 129-152.
Bastard K., Thureau A., Lavery R. & Prevost C. (2003) Docking macromolecules with flexible segments. J Comput Chem. 24(15): 1910-20.
Bensimon D. (1996) Force: a new structural control parameter? Structure. 4(8): 885-9.
Berendsen H. J. C., Postma J. P. M., van Gunsteren W. F., DiNola A. & Haak J. R. (1984) Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81: 3684-3690.
Berg J. S., Powell B. C. & Cheney R. E. (2001) A millennial myosin census. Mol Biol Cell. 12(4): 780-94.
Berman H. M., Battistuz T., Bhat T. N., Bluhm W. F., Bourne P. E., Burkhardt K., Feng Z., Gilliland G. L., Iype L., Jain S., Fagan P., Marvin J., Padilla D., Ravichandran V., Schneider B., Thanki N., Weissig H., Westbrook J. D. & Zardecki C. (2002) The Protein Data Bank. Acta Crystallogr D Biol Crystallogr. 58(Pt 6 No 1): 899-907.
Berman H. M., Westbrook J., Feng Z., Gilliland G., Bhat T. N., Weissig H., Shindyalov I. N. & Bourne P. E. (2000) The Protein Data Bank. Nucleic Acids Res. 28(1): 235-42.
Bertucat G., Lavery R. & Prevost C. (1999) A molecular model for RecA-promoted strand exchange via parallel triple-stranded helices. Biophys J 77: 1562-76.
Bjorkman A. J. & Mowbray S. L. (1998) Multiple open forms of ribose-binding protein trace the path of its conformational change. J Mol Biol. 279(3): 651-64.
Block S. M. (1996) Fifty ways to love your lever: myosin motors. Cell. 87(2): 151-7.
Bond C. J., Wong K. B., Clarke J., Fersht A. R. & Daggett V. (1997) Characterization of residual structure in the thermally denatured state of barnase by simulation and experiment: description of the folding pathway. Proc Natl Acad Sci U S A. 94(25): 13409-13.
Bork P. (1992) Mobile modules and motifs. Curr Opin Struct Biol. 2: 413-421.
Brockwell D. J., Beddard G. S., Clarkson J., Zinober R. C., Blake A. W., Trinick J., Olmsted P. D., Smith D. A. & Radford S. E. (2002) The effect of core destabilization on the mechanical resistance of I27. Biophys J. 83(1): 458-72.
Brockwell D. J., Paci E., Zinober R. C., Beddard G. S., Olmsted P. D., Smith D. A., Perham R. N. & Radford S. E. (2003) Pulling geometry defines the mechanical resistance of a beta-sheet protein. Nature Structural Biology. 10(9): 731-737.
Bryant Z., Pande V. S. & Rokhsar D. S. (2000) Mechanical unfolding of a beta-hairpin using molecular dynamics. Biophysical Journal. 78(2): 584-589.
Bryant Z., Stone M. D., Gore J., Smith S. B., Cozzarelli N. R. & Bustamante C. (2003) Structural transitions and elasticity from torque measurements on DNA. Nature. 424(6946): 338-41.
Bustamante C., Bryant Z. & Smith S. B. (2003) Ten years of tension: single-molecule DNA mechanics. Nature. 421(6921): 423-7.
Carrion-Vazquez M., Li H., Lu H., Marszalek P. E., Oberhauser A. F. & Fernandez J. M. (2003) The mechanical stability of ubiquitin is linkage dependent. Nat Struct Biol. 10(9): 738-43.
Carrion-Vazquez M., Oberhauser A. F., Fowler S. B., Marszalek P. E., Broedel S. E., Clarke J. & Fernandez J. M. (1999) Mechanical and chemical unfolding of a single protein: a comparison. Proc Natl Acad Sci U S A. 96(7): 3694-9.
Carugo O. & Pongor S. (2002) Protein fold similarity estimated by a probabilistic approach based on C(alpha)-C(alpha) distance comparison. J Mol Biol. 315(4): 887-98.
Case D. A., Pearlman D. A., Caldwell J. W., Cheatham III T. E., Wang J., Ross W. S., Simmerling C. L., Darden T. A., Mer K. M., Stanton R. V., Cheng A. L., Vincent J. J., Crowley M., Tsui V., Gohlke H., Radmer R. J., Duan Y., Pitera J., Massova I., Seibel G. L., Singh U. C., Weimer P. K. & Kollman P. A. (2002) AMBER7.
Chakravarty S. & Varadarajan R. (2002) Elucidation of factors responsible for enhanced thermal stability of proteins: a structural genomics based study. Biochemistry. 41(25): 8152-61.
Chan H. S. & Dill K. A. (1998) Protein folding in the landscape perspective: chevron plots and non-Arrhenius kinetics. Proteins: Struct. Funct. Genet. 30(1): 2-33.
Chandon J. L. & Pinson S. (1981) Analyse typologique : théories et applications. Masson, Paris New York.
Chattopadhyaya R., Meador W. E., Means A. R. & Quiocho F. A. (1992) Calmodulin structure refined at 1.7 A resolution. J Mol Biol. 228(4): 1177-92.
Cheatham III T. E., Miller J. L., Fox T., Darden T. A. & Kollman P. A. (1995) Molecular Dynamics Simulation on Solvated Biomolecular Systems: The Particle Mesh Ewald Method Leads to Stable Trajectories of DNA, RNA and Proteins. J. Am. Chem. Soc. 117(14): 4193-4194.
Cheatham T. E., Miller J. L., Fox T., Darden T. A. & Kollman P. A. (1995) Molecular-Dynamics Simulations on Solvated Biomolecular Systems - the Particle Mesh Ewald Method Leads to Stable Trajectories of DNA, RNA, and Proteins. Journal of the American Chemical Society. 117(14): 4193-4194.
Chen J., Lu Z., Sakon J. & Stites W. E. (2000) Increasing the thermostability of staphylococcal nuclease: implications for the origin of protein thermostability. J Mol Biol. 303(2): 125-30.
Chen J. & Stites W. E. (2001) Packing is a key selection factor in the evolution of protein hydrophobic cores. Biochemistry. 40(50): 15280-9.
Chothia C. (1976) The nature of the accessible and buried surfaces in proteins. J Mol Biol. 105(1): 1-12.
Cluzel P., Lebrun A., Heller C., Lavery R., Viovy J. L., Chatenay D. & Caron F. (1996) DNA: an extensible molecule. Science. 271(5250): 792-4.
Cooper J. B., Khan G., Taylor G., Tickle I. J. & Blundell T. L. (1990) X-ray analyses of aspartic proteinases. II. Three-dimensional structure of the hexagonal crystal form of porcine pepsin at 2.3 A resolution. J Mol Biol. 214(1): 199-222.
Corey R. B. & Pauling L. (1953) Fundamental dimensions of polypeptide chains. Proc R Soc Lond B Biol Sci. 141(902): 10-20.
Cornell W. D., Cieplak P., Bayly C. I., Gould I. R., Merz K. M. J., Ferguson D. M., Spellmeyer D. C., Fox T., W. C. J. & Kollman P. A. (1995) A second generation force field for the simulation of proteins and nucleic acids. J. Am. Chem. Soc. 117(19): 5179-5197.
Cornell W. D., Cieplak P., Bayly C. I., Gould I. R., Merz K. M. J., Ferguson D. M., Spellmeyer D. C., Fox T., W. C. J. & Kollman P. A. (1996) A second generation force field for the simulation of proteins and nucleic acids, Additions & Correction. J. Am. Chem. Soc. 118(9): 2309-2309.
Crippen G. M. (1978) The tree structural organization of proteins. J Mol Biol. 126(3): 315-32.
Daggett V. (2000) Long timescale simulations. Curr Opin Struct Biol. 10(2): 160-4.
Daggett V. (2001) Molecular dynamics simulations of protein unfolding/folding. In Protein Structure, Stability, and Folding. Ed. K. Murphy, in the series Methods in molecular biology by J. Walker, Humana Press, 168, Totowa.
Daggett V. & Fersht A. (2003a) The present view of the mechanism of protein folding. Nat Rev Mol Cell Biol. 4(6): 497-502.
Daggett V. & Fersht A. R. (2003b) Is there a unifying mechanism for protein folding? Trends Biochem Sci. 28(1): 18-25.
Daggett V. & Levitt M. (1992) Molecular dynamics simulations of helix denaturation. J Mol Biol. 223(4): 1121-38.
Daggett V., Li A., Itzhaki L. S., Otzen D. E. & Fersht A. R. (1996) Structure of the transition state for folding of a protein derived from experiment and simulation. J Mol Biol. 257(2): 430-40.
Darden T., York D. & Pedersen L. (1993) Particle Mesh Ewald - an N.Log(N) Method for Ewald Sums in Large Systems. Journal of Chemical Physics. 98(12): 10089-10092.
Demirel M. C., Atilgan A. R., Jernigan R. L., Erman B. & Bahar I. (1998) Identification of kinetically hot residues in proteins. Protein Sci. 7(12): 2522-32.
Diday E., Lemaire J., Pouget J. & Testu F. (1982) Eléments d'analyse de données. Dunod, Paris.
Dill K. A. (1990) Dominant forces in protein folding. Biochemistry. 29(31): 7133-55.
Dill K. A., Fiebig K. M. & Chan H. S. (1993) Cooperativity in protein-folding kinetics. Proc Natl Acad Sci U S A. 90(5): 1942-6.
Dohoney K. M. & Gelles J. (2001) Chi-sequence recognition and DNA translocation by single RecBCD helicase/nuclease molecules. Nature. 409(6818): 370-4.
Dominguez R., Freyzon Y., Trybus K. M. & Cohen C. (1998) Crystal structure of a vertebrate smooth muscle myosin motor domain and its complex with the essential light chain: visualization of the pre-power stroke state. Cell. 94(5): 559-71.
Doruker P., Atilgan A. R. & Bahar I. (2000) Dynamics of proteins predicted by molecular dynamics simulations and analytical approaches: application to alpha-amylase inhibitor. Proteins. 40(3): 512-24.
Doruker P., Jernigan R. L. & Bahar I. (2002a) Dynamics of large proteins through hierarchical levels of coarse-grained structures. J Comput Chem. 23(1): 119-27.
Doruker P., Jernigan R. L., Navizet I. & Hernandez R. (2002b) Important fluctuation dynamics of large protein structures are preserved upon coarse-grained renormalization. Int J of Quantum Chem. 90(2): 822-837.
Duan Y. & Kollman P. A. (1998) Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science. 282(5389): 740-4.
Eisenberg D. & McLachlan A. D. (1986) Solvation energy in protein folding and binding. Nature. 319(6050): 199-203.
Essevaz-Roulet B., Bockelmann U. & Heslot F. (1997) Mechanical separation of the complementary strands of DNA. Proc Natl Acad Sci U S A. 94(22): 11935-40.
Evans E. & Ritchie K. (1997) Dynamic strength of molecular adhesion bonds. Biophys J. 72(4): 1541-55.
Falicov A. & Cohen F. E. (1996) A surface of minimum area metric for the structural comparison of proteins. J Mol Biol. 258(5): 871-92.
Fersht A. R. & Daggett V. (2002) Protein folding and unfolding at atomic resolution. Cell. 108(4): 573-82.
Finkelstein A. V. (1997) Can protein unfolding simulate protein folding? Protein Eng. 10(8): 843-5.
Fisher T. E., Marszalek P. E. & Fernandez J. M. (2000) Stretching single molecules into novel conformations using the atomic force microscope. Nat Struct Biol. 7(9): 719-24.
Florin E. L., Moy V. T. & Gaub H. E. (1994) Adhesion forces between individual ligand-receptor pairs. Science. 264(5157): 415-7.
Flory P. J. (1969) Statistical mechanics of chain molecules. Interscience-Wiley Publishers, New York.
Freire E. (2001) The thermodynamic linkage between protein structure, stability and function. In Protein Structure, Stability, and Folding. Ed. K. Murphy, in the series Methods in molecular biology by J. Walker, Humana Press, 168, Totowa.
Frenkel D. & Smit B. (2002) Understanding molecular simulation, from algorithms to applications. Academic Press.
Frye K. J. & Royer C. A. (1998) Probing the contribution of internal cavities to the volume change of protein unfolding under pressure. Protein Sci. 7(10): 2217-22.
Gao M., Lu H. & Schulten K. (2001) Simulated refolding of stretched titin immunoglobulin domains. Biophys J. 81(4): 2268-77.
Gao M., Lu H. & Schulten K. (2002) Unfolding of titin domains studied by molecular dynamics simulations. J Muscle Res Cell Motil. 23(5-6): 513-21.
Geeves M. A. (1991) The dynamics of actin and myosin association and the crossbridge model of muscle contraction. Biochem J. 274 ( Pt 1): 1-14.
Gerstein M., Lesk A. M. & Chothia C. (1994) Structural mechanisms for domain movements in proteins. Biochemistry. 33(22): 6739-49.
Gilquin B., Guilbert C. & Perahia D. (2000) Unfolding of hen egg lysozyme by molecular dynamics simulations at 300K: insight into the role of the interdomain interface. Proteins. 41(1): 58-74.
Godzik A. (1996) The structural alignment between two proteins: is there a unique answer? Protein Sci. 5(7): 1325-38.
Grottesi A., Ceruso M. A., Colosimo A. & Di Nola A. (2002) Molecular dynamics study of a hyperthermophilic and a mesophilic rubredoxin. Proteins. 46(3): 287-94.
Gulick A. M., Bauer C. B., Thoden J. B., Pate E., Yount R. G. & Rayment I. (2000) X-ray structures of the Dictyostelium discoideum myosin motor domain with six non-nucleotide analogs. J Biol Chem. 275(1): 398-408.
Ha Duong T. & Zakrzewska K. (1997) Calculation and analysis of low frequency normal modes for DNA, Lab. de Biochimie Theor. Inst. de Biol. Physico-Chimique Paris France.
Haliloglu T. & Bahar I. (1999) Structure-based analysis of protein dynamics: comparison of theoretical results for hen lysozyme with X-ray diffraction and NMR relaxation data. Proteins. 37(4): 654-67.
Haliloglu T., Bahar I. & Erman B. (1997) Gaussian Dynamics of Folded Proteins. Phys Rev Lett. 79(16): 3090-3093.
Halle B. (2002) Flexibility and packing in proteins. Proc Natl Acad Sci U S A. 99(3): 1274-9.
Harrison S. C. & Durbin R. (1985) Is there a single pathway for the folding of a polypeptide chain? Proc Natl Acad Sci U S A. 82(12): 4028-30.
Hasson M. S., Schlichting I., Moulai J., Taylor K., Barrett W., Kenyon G. L., Babbitt P. C., Gerlt J. A., Petsko G. A. & Ringe D. (1998) Evolution of an enzyme active site: the structure of a new crystal form of muconate lactonizing enzyme compared with mandelate racemase and enolase. Proc Natl Acad Sci U S A. 95(18): 10396-401.
Hawkins G. D., Cramer C. J. & Truhlar D. G. (1995) Pairwise solute screening of solute charges from a dielectric medium. Chem. Phys. Lett. 246: 122-129.
Hawkins G. D., Cramer C. J. & Truhlar D. G. (1996) Parameterized models of aqueous free energies of solvation based on pairwise descreening of solute atomic charges from a dielectric medium. J. Phys. Chem. 100: 19824-19839.
Hayward S., Kitao A. & Berendsen H. J. (1997) Model-free methods of analyzing domain motions in proteins from simulation: a comparison of normal mode analysis and molecular dynamics simulation of lysozyme. Proteins. 27(3): 425-37.
Himmel D. M., Gourinath S., Reshetnikova L., Shen Y., Szent-Gyorgyi A. G. & Cohen C. (2002) Crystallographic findings on the internally uncoupled and near-rigor states of myosin: further insights into the mechanics of the motor. Proc Natl Acad Sci U S A. 99(20): 12645-50.
Hinsen K. (1998) Analysis of domain motions by approximate normal mode calculations. Proteins. 33(3): 417-29.
Hinsen K., Thomas A. & Field M. J. (1999) Analysis of domain motions in large proteins. Proteins. 34(3): 369-82.
Hirakawa H., Muta S. & Kuhara S. (1999) The hydrophobic cores of proteins predicted by wavelet analysis. Bioinformatics. 15(2): 141-8.
Hirano S., Mihara K., Yamazaki Y., Kamikubo H., Imamoto Y. & Kataoka M. (2002) Role of C-terminal region of Staphylococcal nuclease for foldability, stability, and activity. Proteins. 49(2): 255-65.
Hodge T. & Cope M. J. (2000) A myosin family tree. J Cell Sci. 113 Pt 19: 3353-4.
Holm L. & Sander C. (1993) Protein structure comparison by alignment of distance matrices. J Mol Biol. 233(1): 123-38.
Holm L. & Sander C. (1994) Parser for protein folding units. Proteins. 19(3): 256-68.
Holm L. & Sander C. (1997) Dali/FSSP classification of three-dimensional protein folds. Nucleic Acids Res. 25(1): 231-4.
Holmes K. C. & Geeves M. A. (2000) The structural basis of muscle contraction. Philos Trans R Soc Lond B Biol Sci. 355(1396): 419-31.
Honig B. (1999) Protein folding: from the levinthal paradox to structure prediction. J Mol Biol. 293(2): 283-93.
Houdusse A., Kalabokis V. N., Himmel D., Szent-Gyorgyi A. G. & Cohen C. (1999) Atomic structure of scallop myosin subfragment S1 complexed with MgADP: a novel conformation of the myosin head. Cell. 97(4): 459-70.
Houdusse A. & Sweeney H. L. (2001) Myosin motors: missing structures and hidden springs. Curr Opin Struct Biol. 11(2): 182-94.
Houdusse A., Szent-Gyorgyi A. G. & Cohen C. (2000) Three conformational states of scallop myosin S1. Proc Natl Acad Sci U S A. 97(21): 11238-43.
Hubbard T. J., Murzin A. G., Brenner S. E. & Chothia C. (1997) SCOP: a structural classification of proteins database. Nucleic Acids Res. 25(1): 236-9.
Humphrey W., Dalke A. & Schulten K. (1996) VMD: visual molecular dynamics. J Mol Graph. 14(1): 33-8, 27-8.
Hunenberger P. H., Mark A. E. & van Gunsteren W. F. (1995) Computational approaches to study protein unfolding: hen egg white lysozyme as a case study. Proteins. 21(3): 196-213.
Idiris A., Alam M. T. & Ikai A. (2000) Spring mechanics of alpha-helical polypeptide. Protein Eng. 13(11): 763-70.
Ikura T., Tsurupa G. P. & Kuwajima K. (1997) Kinetic folding and cis/trans prolyl isomerization of staphylococcal nuclease. A study by stopped-flow absorption, stopped-flow circular dichroism, and molecular dynamics simulations. Biochemistry. 36(21): 6529-38.
Irving M. & Goldman Y. E. (1999) Motor proteins. Another step ahead for myosin. Nature. 398(6727): 463, 465.
Isin B., Doruker P. & Bahar I. (2002) Functional motions of influenza virus hemagglutinin: a structure-based analytical approach. Biophys J. 82(2): 569-81.
Itzhaki L. S., Neira J. L., Ruiz-Sanz J., de Prat Gay G. & Fersht A. R. (1995a) Search for nucleation sites in smaller fragments of chymotrypsin inhibitor 2. J Mol Biol. 254(2): 289-304.
Itzhaki L. S., Otzen D. E. & Fersht A. R. (1995b) The structure of the transition state for folding of chymotrypsin inhibitor 2 analysed by protein engineering methods: evidence for a nucleation-condensation mechanism for protein folding. J Mol Biol. 254(2): 260-88.
Izrailev S., Stepaniants S., Balsera M., Oono Y. & Schulten K. (1997) Molecular dynamics study of unbinding of the avidin-biotin complex. Biophys J. 72(4): 1568-81.
Janin J. & Chothia C. (1985) Domains in proteins: definitions, location, and structural principles. Methods Enzymol. 115: 420-30.
Jarvis R. A. & Patrick E. A. (1973) Clustering using a similarity measure based on shared near neighbours. IEEE Transactions in Computers. C-22: 1025-1034.
Jorgensen W. L., Chandrasekhar J., Madura J. D., Impey R. W. & Klein M. L. (1983) Comparison of Simple Potential Functions for Simulating Liquid Water. Journal of Chemical Physics. 79(2): 926-935.
Karplus M. & Weaver D. L. (1994) Protein folding dynamics: the diffusion-collision model and experimental data. Protein Sci. 3(4): 650-68.
Karplus P. A. (1996) Experimentally observed conformation-dependent geometry and hidden strain in proteins. Protein Sci. 5(7): 1406-20.
Kazmirski S. L. & Daggett V. (1998) Simulations of the structural and dynamical properties of denatured proteins: the "molten coil" state of bovine pancreatic trypsin inhibitor. J Mol Biol. 277(2): 487-506.
Kellermayer M. S., Smith S. B., Granzier H. L. & Bustamante C. (1997) Folding-unfolding transitions in single titin molecules characterized with laser tweezers. Science. 276(5315): 1112-6.
Keskin O., Bahar I., Flatow D., Covell D. G. & Jernigan R. L. (2002a) Molecular mechanisms of chaperonin GroEL-GroES function. Biochemistry. 41(2): 491-501.
Keskin O., Durell S. R., Bahar I., Jernigan R. L. & Covell D. G. (2002b) Relating molecular flexibility to function: a case study of tubulin. Biophys J. 83(2): 663-80.
Keskin O., Jernigan R. L. & Bahar I. (2000) Proteins with similar architecture exhibit similar large-scale dynamic behavior. Biophys J. 78(4): 2093-106.
Kitamura K., Tokunaga M., Iwane A. H. & Yanagida T. (1999) A single myosin head moves along an actin filament with regular steps of 5.3 nanometres. Nature. 397(6715): 129-34.
Koehl P. (2001) Protein structure similarities. Curr Opin Struct Biol. 11(3): 348-53.
Korn E. D. (2000) Coevolution of head, neck, and tail domains of myosin heavy chains. Proc Natl Acad Sci U S A. 97(23): 12559-64.
Kundu S., Melton J. S., Sorensen D. C. & Phillips G. N., Jr. (2002) Dynamics of proteins in crystals: comparison of experiment with simple models. Biophys J. 83(2): 723-32.
Ladoux B., Quivy J. P., Doyle P. S., Almouzni G. & Viovy J. L. (2001) Direct imaging of single-molecules: from dynamics of a single DNA chain to the study of complex DNA-protein interactions. Sci Prog. 84(Pt 4): 267-90.
Lavery R. & Lebrun A. (1999) Modelling DNA stretching for physics and biology. Genetica. 106(1-2): 75-84.
Lavery R., Lebrun A., Allemand J.-F., Bensimon D. & Croquette V. (2002) Structure and mechanics of single biomolecules: experiment and simulation. Journal of Physics-Condensed Matter 14: R383-R414.
Lavery R., Parker I. & Kendrick J. (1986a) A general approach to the optimization of the conformation of ring molecules with an application to valinomycin. J Biomol Struct Dyn. 4(3): 443-62.
Lavery R., Sklenar H., Zakrzewska K. & Pullman B. (1986b) The flexibility of the nucleic acids: (II). The calculation of internal energy and applications to mononucleotide repeat DNA. J Biomol Struct Dyn. 3(5): 989-1014.
Lavery R., Zakrzewska K. & Sklenar H. (1995) JUMNA: Junction Minimisation of Nucleic Acids. Computer Physics Communications. 91: 135-158.
Leach A. (2001) Molecular modelling principles and applications. Prentice hall.,
Lebrun A. & Lavery R. (1996) Modelling extreme stretching of DNA. Nucleic Acids Res. 24(12): 2260-7.
Lebrun A. & Lavery R. (1998) Modeling the mechanics of a DNA oligomer. J Biomol Struct Dyn. 16(3): 593-604.
Lebrun A. & Lavery R. (1999) Modeling DNA deformations induced by minor groove binding proteins. Biopolymers. 49(5): 341-53.
Lebrun A., Shakked Z. & Lavery R. (1997) Local DNA stretching mimics the distortion caused by the TATA box-binding protein. Proc Natl Acad Sci U S A. 94(7): 2993-8.
Lesk A. M. (1998) Extraction of geometrically similar substructures: least-squares and Chebyshev fitting and the difference distance matrix. Proteins. 33(3): 320-8.
Lesk A. M. & Chothia C. (1984) Mechanisms of domain closure in proteins. J Mol Biol. 174(1): 175-91.
Levinthal C. (1968) Are there pathways for protein folding ? J. Chem. Phys. 65: 44-45.
Levitt M. & Gerstein M. (1998) A unified statistical framework for sequence comparison and structure comparison. Proc Natl Acad Sci U S A. 95(11): 5913-20.
Liphardt J., Onoa B., Smith S. B., Tinoco I. J. & Bustamante C. (2001) Reversible unfolding of single RNA molecules by mechanical force. Science. 292(5517): 733-7.
Lu H. & Schulten K. (2000) The key event in force-induced unfolding of Titin's immunoglobulin domains. Biophys J. 79(1): 51-65.
Marsh R. E. & Donohue J. (1967) Crystal structure studies of amino acids and peptides. Adv Protein Chem. 22: 235-56.
Masugata K., Ikai A. & Okazaki S. (2002) Molecular dynamics study of mechanical extension of polyalanine by AFM cantilever. Applied Surface Science. 188(3-4): 372-376.
Matouschek A. & Bustamante C. (2003) Finding a protein's Achilles heel. Nat Struct Biol. 10(9): 674-676.
Mayor U., Guydosh N. R., Johnson C. M., Grossmann J. G., Sato S., Jas G. S., Freund S. M., Alonso D. O., Daggett V. & Fersht A. R. (2003) The complete folding pathway of a protein from nanoseconds to microseconds. Nature. 421(6925): 863-7.
Mayor U., Johnson C. M., Daggett V. & Fersht A. R. (2000) Protein folding and unfolding in microseconds to nanoseconds by experiment and simulation. Proc Natl Acad Sci U S A. 97(25): 13518-22.
Mendelson R. & Morris E. P. (1997) The structure of the acto-myosin subfragment 1 complex: results of searches using data from electron microscopy and x-ray crystallography. Proc Natl Acad Sci U S A. 94(16): 8533-8.
Meyer E., Cole G., Radhakrishnan R. & Epp O. (1988) Structure of native porcine pancreatic elastase at 1.65 A resolutions. Acta Crystallogr B. 44 ( Pt 1): 26-38.
Milner-White E. J. (1997) The partial charge of the nitrogen atom in peptide bonds. Protein Sci. 6(11): 2477-82.
Murphy K. (2001) Stabilization of protein structure. in Protein Structure, Stability, and Folding. ed. K. Murphy, in the series Methods in molecular biology by J. Walker, Humana Press, 168, Totowa.
Myers J. K., Pace C. N. & Scholtz J. M. (1995) Denaturant m values and heat capacity changes: relation to changes in accessible surface areas of protein unfolding. Protein Sci. 4(10): 2138-48.
Navizet I., Lavery R. & Jernigan R. L. (2004) Myosin flexibility: Structural domains and collective vibrations. Proteins: Structure, Function and Bioinformatics 54: 384-393.
Nichols W. L., Rose G. D., Ten Eyck L. F. & Zimm B. H. (1995) Rigid domains in proteins: an algorithmic approach to their identification. Proteins. 23(1): 38-48.
Orengo C. A., Pearl F. M. & Thornton J. M. (2003) The CATH domain structure database. Methods Biochem Anal. 44: 249-71.
Paci E. & Karplus M. (1999) Forced unfolding of fibronectin type 3 modules: an analysis by biased molecular dynamics simulations. J Mol Biol. 288(3): 441-59.
Paci E., Smith L. J., Dobson C. M. & Karplus M. (2001) Exploration of partially unfolded states of human alpha-lactalbumin by molecular dynamics simulation. J Mol Biol. 306(2): 329-47.
Pande V. S., Grosberg A., Tanaka T. & Rokhsar D. S. (1998) Pathways for protein folding: is a new view needed? Curr Opin Struct Biol. 8(1): 68-79.
Pauling L. & Corey R. B. (1953) Stable configurations of polypeptide chains. Proc R Soc Lond B Biol Sci. 141(902): 21-33.
Pearl F. M., Bennett C. F., Bray J. E., Harrison A. P., Martin N., Shepherd A., Sillitoe I., Thornton J. & Orengo C. A. (2003) The CATH database: an extended protein family resource for structural and functional genomics. Nucleic Acids Res. 31(1): 452-5.
Pearlman D. A., Case D. A., Caldwell J. W., Ross W. S., Cheatham III T. E., DeBolt S., Ferguson D., Seibel G. L. & Kollman P. A. (1995) AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Comp. Phys. Commun. 91: 1-41.
Perrett S. & Zhou J. M. (2002) Expanding the pressure technique: insights into protein folding from combined use of pressure and chemical denaturants. Biochim Biophys Acta. 1595(1-2): 210-23.
Phelan P., Gorfe A. A., Jelesarov I., Marti D. N., Warwicker J. & Bosshard H. R. (2002) Salt bridges destabilize a leucine zipper designed for maximized ion pairing between helices. Biochemistry. 41(9): 2998-3008.
Plaxco K. W. & Dobson C. M. (1996) Time-resolved biophysical methods in the study of protein folding. Curr Opin Struct Biol. 6(5): 630-6.
Ptitsyn O. B. (1991) How does protein synthesis give rise to the 3D-structure? FEBS Lett. 285(2): 176-81.
Radford S. E. (2000) Protein folding: progress made and promises ahead. Trends Biochem Sci. 25(12): 611-8.
Ramachandran G. N. & Ramakrishnan C. (1963) Stereochemistry of polypeptide chain configurations. J Mol Biol. 7: 95-99.
Ramachandran G. N. & Sasisekharan V. (1968) Conformation of polypeptides and proteins. Adv Protein Chem. 23: 283-438.
Ramakrishnan C. (2001) In memoriam: Professor G.N. Ramachandran (1922-2001). Protein Sci. 10(8): 1689-91.
Rayment I. (1996) The structural basis of the myosin ATPase activity. J Biol Chem. 271(27): 15850-3.
Rayment I., Holden H. M., Whittaker M., Yohn C. B., Lorenz M., Holmes K. C. & Milligan R. A. (1993a) Structure of the actin-myosin complex and its implications for muscle contraction. Science. 261(5117): 58-65.
Rayment I., Rypniewski W. R., Schmidt-Base K., Smith R., Tomchick D. R., Benning M. M., Winkelmann D. A., Wesenberg G. & Holden H. M. (1993b) Three-dimensional structure of myosin subfragment-1: a molecular motor. Science. 261(5117): 50-8.
Richardson J. S. (1981) The anatomy and taxonomy of protein structure. Adv Protein Chem. 34: 167-339.
Rief M., Gautel M., Oesterhelt F., Fernandez J. M. & Gaub H. E. (1997a) Reversible unfolding of individual titin immunoglobulin domains by AFM. Science. 276(5315): 1109-12.
Rief M., Oesterhelt F., Heymann B. & Gaub H. E. (1997b) Single molecule force spectroscopy on polysaccharides by atomic force microscopy. Science. 275(5304): 1295-7.
Robbins A. H. & Stout C. D. (1989) Structure of activated aconitase: formation of the [4Fe-4S] cluster in the crystal. Proc Natl Acad Sci U S A. 86(10): 3639-43.
Rogen P. & Fain B. (2003) Automatic classification of protein structure by using Gauss integrals. Proc Natl Acad Sci U S A. 100(1): 119-24.
Rohs R., Etchebest C. & Lavery R. (1999) Unraveling proteins: a molecular mechanics study. Biophys J. 76(5): 2760-8.
Ryckaert J. P., Ciccotti G. & Berendsen H. J. C. (1977) Numerical Integration of the Cartesian equations of motion of a system with constraints: Molecular dynamics of n-alkanes. J. Comp. Phys. 23: 327-341.
Schliwa M. & Woehlke G. (2003) Molecular motors. Nature. 422(6933): 759-65.
Schneider T. R. (2000) Objective comparison of protein structures: error-scaled difference distance matrices. Acta Crystallogr D Biol Crystallogr. 56 ( Pt 6): 714-21.
Siddiqui A. S. & Barton G. J. (1995) Continuous and discontinuous domains: an algorithm for the automatic generation of reliable protein domain definitions. Protein Sci. 4(5): 872-84.
Siddiqui A. S., Dengler U. & Barton G. J. (2001) 3Dee: a database of protein structural domains. Bioinformatics. 17(2): 200-1.
Smith D. A., Brockwell D. J., Zinober R. C., Blake A. W., Beddard G. S., Olmsted P. D. & Radford S. E. (2003) Unfolding dynamics of proteins under applied force. Philos Transact Ser A Math Phys Eng Sci. 361(1805): 713-28; discussion 728-30.
Smith S. B., Cui Y. & Bustamante C. (1996) Overstretching B-DNA: the elastic response of individual double-stranded and single-stranded DNA molecules. Science. 271(5250): 795-9.
Socci N. D., Onuchic J. N. & Wolynes P. G. (1998) Protein folding mechanisms and the multidimensional folding funnel. Proteins Struct. Funct. Genet. 32(2): 136-58.
Sowdhamini R. & Blundell T. L. (1995) An automatic method involving cluster analysis of secondary structures for the identification of domains in proteins. Protein Sci. 4(3): 506-20.
Spudich J. A. (2001) The myosin swinging cross-bridge model. Nat Rev Mol Cell Biol. 2(5): 387-92.
Sundaralingam M. & Sekharudu Y. C. (1989) Water-inserted alpha-helical segments implicate reverse turns as folding intermediates. Science. 244(4910): 1333-7.
Swindells M. B. (1995) A procedure for the automatic determination of hydrophobic cores in protein structures. Protein Sci. 4(1): 93-102.
Tajkhorshid E., Aksimentiev A., Balabin I., Gao M., Isralewitz B., Phillips J. C., Zhu F. & Schulten K. (2003) Large scale simulation of protein mechanics and function. Adv Protein Chem. 66: 195-247.
Tama F., Gadea F. X., Marques O. & Sanejouand Y. H. (2000) Building-block approach for determining low-frequency normal modes of macromolecules. Proteins. 41(1): 1-7.
Tama F. & Sanejouand Y. H. (2001) Conformational change of proteins arising from normal mode calculations. Protein Eng. 14(1): 1-6.
Taylor W. R. & Orengo C. A. (1989) Protein structure alignment. J Mol Biol. 208(1): 1-22.
Thomas A., Hinsen K., Field M. J. & Perahia D. (1999) Tertiary and quaternary conformational changes in aspartate transcarbamylase: a normal mode study. Proteins. 34(1): 96-112.
Tirion M. M. (1996) Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Physical Review Letters. 77(9): 1905-1908.
Tirion M. M. & ben-Avraham D. (1993) Normal mode analysis of G-actin. J Mol Biol. 230(1): 186-95.
Tskhovrebova L., Trinick J., Sleep J. A. & Simmons R. M. (1997) Elasticity and unfolding of single molecules of the giant muscle protein titin. Nature. 387(6630): 308-12.
Tsui V. & Case D. A. (2000) Theory and applications of the generalized Born solvation model in macromolecular Simulations. Biopolymers. 56(4): 275-291.
Uyeda T. Q., Abramson P. D. & Spudich J. A. (1996) The neck region of the myosin motor domain acts as a lever arm to generate movement. Proc Natl Acad Sci U S A. 93(9): 4459-64.
van Meerssche M. & Feneau-Dupont J. (1984) Introduction à la cristallographie et à la chimie structurale. Peeters, Paris.
Verlet L. (1967) Computer experiments on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules. Phys. Rev. 159: 98-103.
Vinayagam A., Shi J., Pugalenthi G., Meenakshi B., Blundell T. L. & Sowdhamini R. (2003) DDBASE2.0: updated domain database with improved identification of structural domains. Bioinformatics. 19(14): 1760-4.
Volkmann N. & Hanein D. (2000) Actomyosin: law and order in motility. Curr Opin Cell Biol. 12(1): 26-34.
Wang J., Cieplak P. & Kollman P. A. (2000) How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J. Comput. Chem. 21(12): 1049-1074.
Wang J., Truckses D. M., Abildgaard F., Dzakula Z., Zolnai Z. & Markley J. L. (1997) Solution structures of staphylococcal nuclease from multidimensional, multinuclear NMR: nuclease-H124L and its ternary complex with Ca2+ and thymidine-3',5'-bisphosphate. J Biomol NMR. 10(2): 143-64.
Ward J. H. (1963) Hierarchical grouping to optimise an objective function. American Statistical Association Journal. 236-244.
Washizu M. (1990) Manipulation of DNA in Microfabricated Structures. IEEE Transactions on Industry Applications. 26: 1165-1172.
Wernisch L., Hunting M. & Wodak S. J. (1999) Identification of structural domains in proteins by a graph heuristic. Proteins. 35(3): 338-52.
Wetlaufer D. B. (1973) Nucleation, rapid folding, and globular intrachain regions in proteins. Proc Natl Acad Sci U S A. 70(3): 697-701.
Williams P. M., Fowler S. B., Best R. B., Toca-Herrera J. L., Scott K. A., Steward A. & Clarke J. (2003) Hidden complexity in the mechanical properties of titin. Nature. 422(6930): 446-9.
Wolynes P. G., Onuchic J. N. & Thirumalai D. (1995) Navigating the folding routes. Science. 267(5204): 1619-20.
Wriggers W. & Schulten K. (1997) Protein domain movements: detection of rigid domains and visualization of hinges in comparisons of atomic coordinates. Proteins. 29(1): 1-14.
Xia B., Tsui V., Case D. A., Dyson H. J. & Wright P. E. (2002) Comparison of protein solution structures refined by molecular dynamics simulation in vacuum, with a generalized Born model, and with explicit water. J Biomol NMR. 22(4): 317-31.
Xiao M., Reifenberger J. G., Wells A. L., Baldacchino C., Chen L. Q., Ge P., Sweeney H. L. & Selvin P. R. (2003) An actin-dependent conformational change in myosin. Nat Struct Biol. 10(5): 402-8.
Xu C., Tobi D. & Bahar I. (2003) Allosteric changes in protein structure computed by a simple mechanical model: hemoglobin T<-->R2 transition. J Mol Biol. 333(1): 153-68.
Xu Y., Xu D. & Gabow H. N. (2000) Protein domain decomposition using a graph-theoretic approach. Bioinformatics. 16(12): 1091-104.
Yanagida T., Esaki S., Iwane A. H., Inoue Y., Ishijima A., Kitamura K., Tanaka H. & Tokunaga M. (2000a) Single-motor mechanics and models of the myosin motor. Philos Trans R Soc Lond B Biol Sci. 355(1396): 441-7.
Yanagida T., Kitamura K., Tanaka H., Hikikoshi Iwane A. & Esaki S. (2000b) Single molecule analysis of the actomyosin motor. Curr Opin Cell Biol. 12(1): 20-5.
Yanagida T. & Iwane A. H. (2000c) A large step for myosin. Proc Natl Acad Sci U S A. 97(17): 9357-9.
Yang J., Dokurno P., Tonks N. K. & Barford D. (2001) Crystal structure of the M-fragment of alpha-catenin: implications for modulation of cell adhesion. Embo J. 20(14): 3645-56.
APPENDIX 1: Mechanical constraints
I Global constraints
We programmed other constraints in addition to the constraint on the distance RMS described in chapter XIII.2, page 93. This appendix presents only the equations that were used in the programs LIGAND and GNMlig.
Radius of gyration
The radius of gyration is defined as follows:
$$ rg^2 = \frac{\sum_{i,j} d_{ij}^2}{N_d} $$
The sum runs over the N_d pairs (i, j), where i and j are α carbons of the protein and d_ij is the distance between them.
The associated constraint energy is $E_{pen} = k \times (rg - rg^*)^2$, with rg* the target value of the radius of gyration.
The force exerted along a coordinate x_j of carbon Cα,j due to the constraint is the negative of the derivative of this energy with respect to that coordinate, and is given by the following formula:
$$ F(x_j) = -2 \times k \times (rg - rg^*) \times \frac{1}{N_d \times rg} \times \sum_{i \neq j}^{N} (x_j - x_i) $$
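The radius-of-gyration constraint above can be sketched numerically as follows. This is a minimal illustration, not the LIGAND or GNMlig code: the function names are hypothetical, NumPy is assumed, and each pair (i, j) is assumed to be counted once, so that N_d = N(N-1)/2.

```python
import numpy as np

def radius_of_gyration(coords):
    """Pair-based rg: rg^2 = (sum of d_ij^2 over distinct pairs) / N_d."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)      # matrix of squared distances
    upper = np.triu_indices(n, k=1)      # assumption: each pair counted once
    return np.sqrt(d2[upper].sum() / len(upper[0]))

def rg_penalty(coords, rg_target, k):
    """Constraint energy E_pen = k * (rg - rg*)^2."""
    return k * (radius_of_gyration(coords) - rg_target) ** 2

def rg_forces(coords, rg_target, k):
    """Force -dE_pen/dx_j on every Calpha, returned as an (N, 3) array."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    n_pairs = n * (n - 1) // 2
    rg = radius_of_gyration(coords)
    # sum over i != j of (x_j - x_i) equals N * x_j - sum_i x_i
    pair_sum = n * coords - coords.sum(axis=0)
    return -2.0 * k * (rg - rg_target) / (n_pairs * rg) * pair_sum
```

Because the energy depends only on interatomic distances, the forces returned by `rg_forces` sum to zero over all atoms, which gives a quick sanity check of an implementation.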
Angular RMS
Instead of constraining distances, one can constrain torsion angles. This kind of constraint is natural for a description of structures in internal coordinates, such as the one used in LIGAND.
The torsion RMS is defined as follows:
$$ rms = \sqrt{\frac{\sum_{i=1}^{N} f(\tau_i)}{N}} $$
with N the number of torsions and the function f equal to the square of the difference (between -180° and 180°) between the torsion angle τ_i and its value τ_i0 in the reference structure.
Working with angles requires care on two points. First, angles are defined modulo 360°, so all angle differences must lie in the interval [-180°, 180°] (we chose to take angular values between -180° and 180°). Second, the constraint energies must be continuous with continuous derivatives, in particular when the angle values jump from -180° to 180°. The function f is therefore defined piecewise so that both f and its derivative are continuous.
We take for d(τ_i) the value of the angle difference τ_i - τ_i0 lying between -180° and 180°:
f(τ_i) = d(τ_i)^2 if d(τ_i) ∈ [-dlim, dlim]
f(τ_i) = a d(τ_i)^2 + b d(τ_i) + c if d(τ_i) ∈ [dlim, 180]
f(τ_i) = a d(τ_i)^2 - b d(τ_i) + c if d(τ_i) ∈ [-180, -dlim]
$$ a = -\frac{d_{lim}}{180 - d_{lim}}, \quad b = \frac{360 \times d_{lim}}{180 - d_{lim}}, \quad c = -\frac{180 \times d_{lim}^2}{180 - d_{lim}} $$
with dlim a limiting angle value close to 180° (for example 179°), beyond which f is no longer equal to the square of the angle difference.
The associated constraint energy is $E_p = k \times (rms - rms^*)^2$, with rms* the desired value of the angular RMS function.
The negative of the derivative of this energy with respect to an angle τ_j gives the force exerted on this angle by the torsion constraint:
$$ F(\tau_j) = -\frac{k \times (rms - rms^*) \times f'(\tau_j)}{N \times rms} $$
The force on each atom is then obtained with a subroutine (subroutine deltor) included in LIGAND.
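The piecewise function f and the resulting torsional force can be sketched as below. The names `wrap_deg`, `f_and_df`, `torsion_rms` and `torsion_forces` are hypothetical and NumPy is assumed; angles are in degrees. The conversion of these forces to atomic forces (done by deltor in LIGAND) is not reproduced here.

```python
import numpy as np

def wrap_deg(angle):
    """Wrap an angle difference into the interval [-180, 180)."""
    return (np.asarray(angle, dtype=float) + 180.0) % 360.0 - 180.0

def f_and_df(d, dlim=179.0):
    """Piecewise f(d) and its derivative f'(d); d in degrees within [-180, 180]."""
    d = np.asarray(d, dtype=float)
    a = -dlim / (180.0 - dlim)
    b = 360.0 * dlim / (180.0 - dlim)
    c = -180.0 * dlim ** 2 / (180.0 - dlim)
    s = np.sign(d)                       # +b on the positive side, -b on the negative side
    inner = np.abs(d) <= dlim
    f = np.where(inner, d ** 2, a * d ** 2 + s * b * d + c)
    df = np.where(inner, 2.0 * d, 2.0 * a * d + s * b)
    return f, df

def torsion_rms(tau, tau0, dlim=179.0):
    """rms = sqrt(sum_i f(tau_i) / N)."""
    d = wrap_deg(np.asarray(tau, dtype=float) - np.asarray(tau0, dtype=float))
    f, _ = f_and_df(d, dlim)
    return np.sqrt(f.mean())

def torsion_forces(tau, tau0, rms_target, k, dlim=179.0):
    """F(tau_j) = -k (rms - rms*) f'(tau_j) / (N rms); undefined when rms == 0."""
    d = wrap_deg(np.asarray(tau, dtype=float) - np.asarray(tau0, dtype=float))
    f, df = f_and_df(d, dlim)
    rms = np.sqrt(f.mean())
    return -k * (rms - rms_target) * df / (len(d) * rms)
```

With these coefficients f'(±180°) = 0 and f(180°) = f(-180°), which is what keeps the energy smooth when an angle difference crosses the ±180° boundary.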
We also programmed another torsion constraint that takes only the α carbons into account. Torsion angles between α carbons are then defined as the angles formed by the two planes ABC and BCD of the α carbons of consecutive residues A, B, C and D. τ_i is then the torsion angle between four α carbons. This constraint makes it possible to compare the programs LIGAND and GNMlig.
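As an illustration of the Cα-only torsion, the angle between the planes ABC and BCD can be computed from four consecutive Cα positions. This is a generic sketch with hypothetical function names, assuming NumPy; the sign convention (0° for cis, ±180° for trans) is an assumption and is not taken from LIGAND or GNMlig.

```python
import numpy as np

def dihedral_deg(pa, pb, pc, pd):
    """Torsion angle (degrees) between planes ABC and BCD."""
    pa, pb, pc, pd = (np.asarray(p, dtype=float) for p in (pa, pb, pc, pd))
    b1, b2, b3 = pb - pa, pc - pb, pd - pc
    n1 = np.cross(b1, b2)                # normal to plane ABC
    n2 = np.cross(b2, b3)                # normal to plane BCD
    x = np.dot(n1, n2)
    y = np.linalg.norm(b2) * np.dot(b1, n2)
    return np.degrees(np.arctan2(y, x))

def ca_torsions(ca):
    """Torsions of all sets of four consecutive Calpha atoms (N - 3 values)."""
    ca = np.asarray(ca, dtype=float)
    return np.array([dihedral_deg(*ca[i:i + 4]) for i in range(len(ca) - 3)])
```

A chain of N residues yields N - 3 such angles, which can then be fed to an angular RMS of the kind defined above.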
II Local constraints
Local "pull-push" constraint relative to the center of mass
The carbon Cα,i of a particular residue i is forced to move along the line connecting it to the center of mass of all the α carbons of the protein.
The applied constraint energy is then given by the equation:
$$ E_p = k \times (r_i - r_i^*)^2 $$
where r_i is the distance between the center of mass and Cα,i, and r_i* is the target value of this distance.
The force deriving from this potential is written as follows for the component x_j of carbon Cα,j:
$$ F(x_j) = 2 \times k \times (r_i - r_i^*) \times \frac{1}{N \times r_i} \times (x_i - x_{cm}) \quad \text{if } j \neq i $$
and for Cα,i:
$$ F(x_i) = 2 \times k \times (r_i - r_i^*) \times \frac{1}{N \times r_i} \times (1 - N) \times (x_i - x_{cm}) $$
N is the number of residues of the protein and x_cm is the corresponding Cartesian component of the center of mass.
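The two force expressions above can be checked with a short sketch. The function names are hypothetical, NumPy is assumed, and the center of mass is taken as the unweighted mean of the Cα positions (identical masses).

```python
import numpy as np

def com_pull_energy(coords, i, r_target, k):
    """E_p = k * (r_i - r_i*)^2, r_i being the distance from Calpha i to the center of mass."""
    coords = np.asarray(coords, dtype=float)
    cm = coords.mean(axis=0)
    return k * (np.linalg.norm(coords[i] - cm) - r_target) ** 2

def com_pull_forces(coords, i, r_target, k):
    """Forces on all Calpha atoms, as an (N, 3) array."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    delta = coords[i] - coords.mean(axis=0)      # vector from center of mass to Calpha i
    r = np.linalg.norm(delta)
    pref = 2.0 * k * (r - r_target) / (n * r)
    forces = np.tile(pref * delta, (n, 1))       # case j != i
    forces[i] = pref * (1 - n) * delta           # case j == i
    return forces
```

The N - 1 forces on the other atoms exactly balance the force on Cα,i, so the constraint exerts no net force on the protein.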
Local "pull-push" constraint along the principal axes of the protein
Definition of the principal axes
The principal axes are the directions whose direction vectors are the eigenvectors of the central inertia tensor I_G:
$$ I_G = \begin{bmatrix} I_{xx} & -I_{xy} & -I_{xz} \\ -I_{xy} & I_{yy} & -I_{yz} \\ -I_{xz} & -I_{yz} & I_{zz} \end{bmatrix} \quad \text{with} \quad I_{xx} = \sum_i m_i (y_i^2 + z_i^2) \quad \text{and} \quad I_{xy} = \sum_i m_i x_i y_i $$
The sums are taken over the atoms considered (here the Cα), of mass m_i (m_i can be factored out when the atoms are identical) and of coordinates (x_i, y_i, z_i) in a Cartesian frame centered at G, the center of mass of these atoms.
For any set of points, one can therefore define three principal axes passing through the center of mass, with direction vectors defined as above.
Constraint
The constrained variable is the length of the projection, onto principal axis j, of the vector connecting the center of mass to the α carbon of residue i. For each residue, constraints can thus be imposed along three privileged directions of the protein.
The notation is explained in the following diagram (figure 67):
Figure 67: Definition of the constraint along the principal axes: u1, u2 and u3 are the three principal axes, CM is the center of mass, Cα,i is the α carbon of residue i and l is the length of the projection of the vector CM–Cα,i onto principal axis 1.
The penalty energy for a constraint along principal axis j is given by the following formula: $E_p = k \times (l_i - l_i^*)^2$, with l_i the length of the projection of the vector CM–Cα,i (denoted r_i) onto the principal axis j under study and l_i* the imposed length.
$$ l_i = \frac{\vec{u}_j \cdot \vec{r}_i}{\lVert \vec{u}_j \rVert} $$
The derivative of the penalty energy must therefore take into account the derivative of the unit vector of axis j as well as that of the position of the center of mass.
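A minimal sketch of the principal-axis construction and of the projected length l_i is given below. The function names are hypothetical and NumPy is assumed; `numpy.linalg.eigh` is used for the diagonalization, which is a choice made for this example and not necessarily the routine used in the thesis programs.

```python
import numpy as np

def principal_axes(coords, masses=None):
    """Center of mass, principal moments (ascending) and principal axes (columns)."""
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(len(coords))    # identical atoms, e.g. Calpha only
    masses = np.asarray(masses, dtype=float)
    cm = np.average(coords, axis=0, weights=masses)
    x = coords - cm
    r2 = np.sum(x ** 2, axis=1)
    # I = sum_i m_i (|x_i|^2 * Id - x_i x_i^T): diagonal I_xx, off-diagonal -I_xy
    inertia = np.sum(masses * r2) * np.eye(3) - np.einsum("i,ij,ik->jk", masses, x, x)
    moments, axes = np.linalg.eigh(inertia)
    return cm, moments, axes

def projection_length(coords, i, axis, cm):
    """l_i = (u_j . r_i) / |u_j| with r_i the vector from the center of mass to Calpha i."""
    coords = np.asarray(coords, dtype=float)
    axis = np.asarray(axis, dtype=float)
    return np.dot(coords[i] - cm, axis) / np.linalg.norm(axis)
```

The sign of each eigenvector is arbitrary, so the sign of l_i depends on the orientation chosen for the axis; an implementation has to fix this orientation consistently from one minimization cycle to the next.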
Local "pull-push" constraint along the structural axes
Definition of the structural axes
So-called structural axes are defined for each residue (figure 68). Axis 1 of residue i is the axis passing through the center of mass whose direction vector is the normalized vector connecting the α carbon of residue (i-1) to the α carbon of residue (i+1). The direction vector of axis 2 is the vector perpendicular to that of axis 1 in the plane of the carbons (Cαi, Cαi+1, Cαi-1). The third axis has as its direction vector the cross product of the first two, so as to form an orthonormal basis. With this definition, the structural axes of the residues at the ends of the chain are undefined, because these residues do not have two neighbors. This constraint therefore cannot be applied to the two ends of the protein.
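The construction of the three structural vectors can be sketched as follows. The function name is hypothetical and NumPy is assumed. The text does not specify the orientation of the second vector within the plane; pointing it toward Cα,i is an assumption of this example.

```python
import numpy as np

def structural_axes(ca, i):
    """Orthonormal structural vectors (u1, u2, u3) of residue i from Calpha coordinates."""
    ca = np.asarray(ca, dtype=float)
    if i < 1 or i > len(ca) - 2:
        raise ValueError("structural axes are undefined for the chain ends")
    u1 = ca[i + 1] - ca[i - 1]
    u1 /= np.linalg.norm(u1)
    v = ca[i] - ca[i - 1]                # in-plane vector toward Calpha i
    u2 = v - np.dot(v, u1) * u1          # remove the component along u1
    norm = np.linalg.norm(u2)
    if norm < 1e-12:
        raise ValueError("the three Calpha atoms are collinear")
    u2 /= norm                           # assumption: u2 points toward Calpha i
    u3 = np.cross(u1, u2)
    return u1, u2, u3
```

For three Cα atoms at (0,0,0), (1,1,0) and (2,0,0), the vectors obtained for the middle residue are the x, y and z unit vectors.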
Figure 68: Definition of the structural axes u1, u2 and u3 associated with Cα,i. r is the projection of the vector connecting the center of mass of the Cα (CM) to the carbon Cαi onto the structural vector u2.
Constraint
The norm r of the projection of the vector connecting the center of mass of the α carbons to the center of the α carbon of the residue i under study onto one of the structural vectors of i (or onto a combination of the three structural vectors of i) is constrained to a value r*.
The corresponding energy is given by the equation: $E_{pen} = k \times (r - r^*)^2$
At each cycle of the minimization, the structural vectors and the norm of the projection of the vector CM–Cα,i are recalculated. The derivative with respect to a coordinate of an α carbon is determined by calculating the derivatives of the position of the center of mass and of the structural vectors.
APPENDIX 2: Important Fluctuation Dynamics of Large Protein Structures are Preserved upon Coarse-Grained Renormalization
Introduction
The article below presents two alternatives to the ANM program.
The first is based on grouping consecutive α carbons in order to obtain a coarser-grained representation than in the classical ANM program. Unlike the classical ANM approach, in which each node of the network represents one α carbon, each node then represents a group of n consecutive α carbons forming a "segment". The cutoff used to decide where the springs are placed must be larger than the radius of gyration of each segment. By analogy with ANM, the cutoff rc must be taken equal to twice the mean radius of gyration of a segment plus an invariant contact distance R0, typically taken equal to 13 Å. By studying three very large proteins (β-galactosidase, xanthine dehydrogenase and hemagglutinin), we show that the behavior of the mean radius of gyration as a function of n is similar up to segments of 40 residues, and that the radius of gyration of the segments in globular proteins is markedly smaller than that of a model polypeptide of n residues.
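The cutoff rule rc = 2 × (mean segment radius of gyration) + R0 can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, NumPy is assumed, each node is placed at the centroid of its segment, and the segment radius of gyration is taken as the root-mean-square distance of the Cα atoms from that centroid. The article itself should be consulted for the exact choices made there.

```python
import numpy as np

def coarse_grain(ca, n):
    """Group n consecutive Calpha atoms per segment; return node positions and mean segment Rg."""
    ca = np.asarray(ca, dtype=float)
    segments = [ca[s:s + n] for s in range(0, len(ca), n)]   # last segment may be shorter
    nodes = np.array([seg.mean(axis=0) for seg in segments]) # assumption: node at segment centroid
    rgs = [np.sqrt(np.mean(np.sum((seg - seg.mean(axis=0)) ** 2, axis=1)))
           for seg in segments]
    return nodes, float(np.mean(rgs))

def segment_cutoff(mean_rg, r0=13.0):
    """Cutoff rc = 2 * <Rg> + R0, with R0 in angstroms."""
    return 2.0 * mean_rg + r0
```

For n = 1 every segment has a radius of gyration of zero and the rule reduces to rc = R0, the cutoff of the residue-level model.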
La comparaison des courbes donnant les facteurs de température obtenus avec des nœuds tous
les résidus ou tous les 10 résidus le long de la chaîne montre que plus le modèle est simplifié,
plus la courbe est lissée mais que l’allure de la courbe reste similaire. Il en est de même pour
les modes normaux de plus grande amplitude qui sont retrouvés avec des segment de 10
résidus.
The other approach is based on grouping α carbons into structural domains determined by comparing two structures, as presented in the first article on page 135. A first result on myosin is presented.
Whereas most protein studies focus on functional sites and ignore the rest of the protein, we show that functional motions involve the whole protein structure and that very high-resolution crystallographic data are not needed to obtain the most important global motions.
Important Fluctuation Dynamics of Large Protein Structures Are Preserved upon Coarse-Grained Renormalization
PEMRA DORUKER (1,2), ROBERT L. JERNIGAN (2), ISABELLE NAVIZET (2,3), RIGOBERTO HERNANDEZ (4)
(1) Chemical Engineering Department and Polymer Research Center, Bogazici University, Bebek 80815, Istanbul, Turkey
(2) Molecular Structure Section, Laboratory of Experimental and Computational Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892-5677
(3) Institut de Biologie Physico-Chimique, 75005 Paris, France
(4) Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332-0400
Received 2 October 2001; revised 14 January 2002; accepted 25 January 2002
Recently we and others have developed a mechanics approach for studying the motions of proteins [1–14] to obtain the equilibrium fluctuations near an initial structure. The initial structure has usually been determined by crystallography, but other experimental methods, or even modeled structures, could be utilized instead. The underlying assumption in the method is that the starting structure is the minimum energy structure in a local, if not global, minimum. All fluctuations about this form are presumed to be higher in energy, proportional to their mean-square displacements, i.e., the energy form is Gaussian. Within the structure, all close-lying residues (as defined by a cutoff radius) are restrained by an effective spring with a universal force constant and are said to be in contact. Residues nearest in sequence are not distinguished because they necessarily fall within the cutoff radius. The close-lying residue pairs are utilized to form a contact matrix that makes explicit reference to these restraining springs. Because of the simple Gaussian form of the energy, the dynamics can be integrated directly to obtain the mean-square fluctuations of positions, as well as the correlations of the displacements of residue pairs. The required computation is simply the inversion of the contact matrix. This method was initially developed to obtain scalar displacements, but it was readily apparent that the directions of displacement are also important. Recently a three-dimensional version [11] of this approach was developed, and it yields the correlations in the directions of the displacements, with the attendant computational cost from tripling each dimension of the contact matrix.
When structures are coarse-grained at the level of one point per residue, excellent agreement of this approach with experiments has been demonstrated for several proteins with respect to the crystallographic temperature factors [3, 4, 6, 8, 10, 13], as well as with nuclear magnetic resonance (NMR) order parameters [5] and hydrogen exchange data [1]. The computed results reveal that the most important motions are those typically involving large domains such as hinge motions. In addition many other large-scale motions are typically observed, e.g., rotation, stretching, shear, disintegration, and flap motions. Individual residue displacements are observed primarily as components of the motions of these subdomains. Moreover, the relative contributions of the modes involving the largest-scale motions to the observables are significantly larger than those of the modes at the other end of the spectrum, which involve only extremely local motions.
Interestingly, relatively few short-range contacts give rise to the large displacements of other residues by acting as the foci of the motions, such as the hinge foci. These largest-scale motions primarily reflect the shape of the protein rather than details of its internal structure. Some examples we have observed are: thin regions of structure that act as hinge sites, large interior cavities that undergo compression, and small numbers of contacts at subunit interfaces that support interfacial motions such as wobble and counterrotation of two subunits. Since the small numbers of residues involved in the most important motions do not involve the internal structure of the peptide chain, this suggests that coarse graining of the protein structures may readily be performed. We have recently applied this coarse graining, by retaining only 1 of every 40 residues, to haemagglutinin [12], where we have shown that it is possible to reproduce about 73% of the total protein motions. This initial coarse-grained application has raised many issues regarding this procedure. What is the optimal way to perform the coarse graining? In the model, there are only two adjustable parameters, a spring constant and a cutoff distance. How should these be modified or scaled for the coarse-graining renormalization? It is also important to understand how the distance cutoff, determining the spring contacts, scales with the coarse graining, as well as how the spring constant itself ought to be scaled. This work represents a first attempt at answering these questions.
PROTEINS
We have chosen three large proteins to consider in this study, namely β-galactosidase [15] (GAL), xanthine dehydrogenase [16] (XDH), and hemagglutinin [17, 18] (HA), with corresponding pdb file names 1DPO, 1FO4, and 2HMG. The number of residues and number of atoms in the crystal structures in each monomer are, respectively, 1011, 8125; 1299, 10077; and 503, 3957. See Figure 1 for views of these structures. The structural and functional details of these proteins are summarized below, although in this study we will not discuss the structure–function relationships of these proteins.
INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY 823
DORUKER ET AL.
FIGURE 1. Ribbon diagrams of β-galactosidase (right), xanthine dehydrogenase (middle), and influenza virus hemagglutinin (left).
The X-ray structure of Escherichia coli β-galactosidase determined by Juers and co-workers [15] at 1.7 Å resolution is shown in the left part of Figure 1. This enzyme hydrolyzes lactose and other β-galactosides into monosaccharides. The functional form is a tetramer having 4 identical subunits, with each monomer comprising 1023 residues. The subunits are assembled into a prolate ellipsoidal structure with approximate dimensions of 175 Å × 135 Å × 90 Å.
The crystal structure of the dimeric bovine milk xanthine dehydrogenase, displayed in the middle part of Figure 1, has been determined to 2.1 Å resolution [16]. The enzyme catalyzes the hydroxylation of hypoxanthine and xanthine, which are the last two steps in the formation of urate. Each monomer has 1332 residues conformed into a butterfly-shaped dimeric enzyme with overall dimensions of approximately 155 Å × 90 Å × 70 Å.
The influenza virus hemagglutinin is an integral membrane glycoprotein, which is involved in the binding of virus to target cells and in the fusion of viral and endosomal membranes at low pH. The X-ray structure of the neutral pH form of HA has been determined [17] and refined [18] by Wiley and co-workers to a resolution of 3 Å and is shown in the right part of Figure 1. HA, comprising 1509 residues, is a cylindrically shaped homotrimer about 135 Å long, varying between 35 and 70 Å in the radial directions. Each monomer consists of 2 polypeptide chains, HA1 (328 residues) and HA2 (175 residues), that are linked by 2 disulfide bonds. The 3 monomers are assembled into a central coiled coil that forms the stemlike domain, and the 3 globular heads containing the receptor binding sites. Each globular head folds into a jelly-roll motif of 8 antiparallel β-strands.
Methods
The coarse graining of structure involves replacing groups of individual points with single points to yield a less detailed structure. This operation resembles the development of an equivalent chain model for polymers, where multiple repeat units of a polymer are coarse-grained into a single unit so as to imitate the behavior of one link of a model chain.
824 VOL. 90, NO. 2
FLUCTUATION DYNAMICS OF LARGE PROTEIN STRUCTURES
For example, several real bonds of polyethylene, because of their additive flexibility, are equivalent to the enhanced flexibility of a single link in the freely jointed chain model [19]. Such equivalent representations have often been utilized in polymer studies [19]. Applying this concept to the single fixed configurations of segments of a protein is not quite the same physical situation as in a polymeric random coil, since the conformations of the individual segments vary from one to another and cannot uniformly benefit from averaging over conformations, as is the case with polymer models. This is why it is important to see how variable these segments’ conformations actually are. In what follows, we first outline the anisotropic network model developed earlier to capture the essential dynamics about the initial (equilibrium) structure and subsequently analyze the degree to which it is invariant to various coarse-graining strategies.
ANISOTROPIC NETWORK MODEL (ANM)
This is a model for protein motions developed as a three-dimensional extension of the Gaussian network model (GNM). It incorporates the anisotropy of fluctuations and yields the directions of each mode of motion, whereas the GNM assumes all fluctuations to be isotropic and gives only the magnitudes of the modes of motion. The potential energy of a structure having N interaction sites is expressed with ANM as a Gaussian form:
$$V = \frac{\gamma}{2}\,\Delta R^{T} H\,\Delta R, \qquad (1)$$
where $\Delta R$ is a 3N-dimensional vector of the fluctuations $\Delta R_i$ in the position vectors $R_i$ of all sites (1 ≤ i ≤ N), $\Delta R^{T}$ being its transpose, and H the Hessian matrix composed of the second derivatives of the potential:
$$V = \frac{\gamma}{2}\sum_{i}\sum_{j} h(r_c - R_{ij})\,(\Delta R_j - \Delta R_i)^{2}. \qquad (2)$$
The summations are performed over all interaction sites, h(x) is the Heaviside step function [h(x) = 1 if x ≥ 0, and zero otherwise], $R_{ij}$ is the distance between sites i and j, and $r_c$ is the cutoff distance defining the interactions; H is expressed as a function of $N^2$ submatrices $H_{ij}$ in the form
$$H_{ij} = \begin{bmatrix}
\partial^2 V/\partial X_i \partial X_j & \partial^2 V/\partial X_i \partial Y_j & \partial^2 V/\partial X_i \partial Z_j \\
\partial^2 V/\partial Y_i \partial X_j & \partial^2 V/\partial Y_i \partial Y_j & \partial^2 V/\partial Y_i \partial Z_j \\
\partial^2 V/\partial Z_i \partial X_j & \partial^2 V/\partial Z_i \partial Y_j & \partial^2 V/\partial Z_i \partial Z_j
\end{bmatrix}, \qquad (3)$$
with $X_i$, $Y_i$, and $Z_i$ being the components of $R_i$. Note that $\partial^2 V/\partial X_i \partial Y_j = \partial^2 V/\partial X_j \partial Y_i = -\gamma (X_j - X_i)(Y_j - Y_i)/R_{ij}^2$ for $i \neq j$, and $\partial^2 V/\partial X_i \partial Y_i = \gamma \sum_j (X_j - X_i)(Y_j - Y_i)/R_{ij}^2$.
In general the correlations between the fluctuations at sites i and j are given by
$$\langle \Delta R_i \cdot \Delta R_j \rangle = \frac{1}{Z}\int (\Delta R_i \cdot \Delta R_j)\,\exp\{-V/k_B T\}\,d\{\Delta R\} = \frac{3 k_B T}{\gamma}\,\mathrm{tr}\,[H^{-1}]_{ij}, \qquad (4)$$
where $k_B$ is the Boltzmann constant, Z is the configurational partition function, and $\mathrm{tr}\,[H^{-1}]_{ij}$ is the trace of the ijth submatrix $[H^{-1}]_{ij}$ of $H^{-1}$; $\langle \Delta R_i \cdot \Delta R_j \rangle$ can be expressed as a sum over the contributions $[\Delta R_i \cdot \Delta R_j]_k$ of the 3N − 6 individual internal fluctuation modes, as $\langle \Delta R_i \cdot \Delta R_j \rangle = \sum_k [\Delta R_i \cdot \Delta R_j]_k$. The contribution of the kth mode is explicitly given by
$$[\Delta R_i \cdot \Delta R_j]_k = \frac{3 k_B T}{\gamma}\,\mathrm{tr}\,[\lambda_k^{-1} u_k u_k^{T}]_{ij}, \qquad (5)$$
where $\lambda_k$ is the kth nonzero eigenvalue of H and $u_k$ is the corresponding eigenvector. The eigenvalues are related to the frequencies of individual modes, and each eigenvector describes the effect of its mode on the positions of the N points of the structure. The eigenvalues are usually organized in ascending order (after removing the six zero eigenvalues), so that $\lambda_1$ denotes the lowest frequency, also called the global, mode of motion, and $[\Delta R_i \cdot \Delta R_j]_1$ is the correlation for this mode of motion separately. Actually here we use only the individual residue mean-square (ms) fluctuations for the position at site i for mode k, $[(\Delta R_i)^2]_k$. Note that zero values can arise either from being uncorrelated or being perpendicular. The slowest modes usually dominate the collective dynamics of the structure and would be the only surviving modes at long times, thus they are particularly relevant to biological function, unless other effects such as anharmonicity interfere.
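The construction of Eqs. (3) to (5) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' program: the function names `anm_hessian` and `mode_fluctuations` are hypothetical, sites are assumed to be distinct points (for instance α-carbon positions), γ is set to 1 by default, and the constant prefactor of Eq. (5) is left out so that only relative per-mode fluctuations are returned.

```python
import numpy as np

def anm_hessian(coords, cutoff=13.0, gamma=1.0):
    """Build the 3N x 3N ANM Hessian from an (N, 3) array of site coordinates."""
    n_sites = coords.shape[0]
    hessian = np.zeros((3 * n_sites, 3 * n_sites))
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            diff = coords[j] - coords[i]
            dist2 = float(diff @ diff)
            if dist2 > cutoff ** 2:
                continue  # Heaviside factor h(r_c - R_ij) is zero: no spring
            # Off-diagonal submatrix H_ij = -gamma * (R_j - R_i)(R_j - R_i)^T / R_ij^2
            block = -gamma * np.outer(diff, diff) / dist2
            hessian[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hessian[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal submatrices accumulate minus the off-diagonal blocks
            hessian[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hessian[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hessian

def mode_fluctuations(hessian, tol=1e-8):
    """Return the nonzero eigenvalues (ascending) and tr[u_k u_k^T]_ii / lambda_k per site and mode."""
    eigvals, eigvecs = np.linalg.eigh(hessian)
    keep = eigvals > tol * eigvals.max()      # drop the zero (rigid-body) modes
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    n_sites = hessian.shape[0] // 3
    # Sum the squared x, y, z components of each eigenvector for every site
    squared = (eigvecs ** 2).reshape(n_sites, 3, -1).sum(axis=1)
    per_mode = squared / eigvals              # shape (N, number of kept modes)
    return eigvals, per_mode
```

Summing `per_mode` over its second axis gives, up to the omitted prefactor, the total mean-square fluctuation of each site; the first column alone corresponds to the slowest mode.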
COARSE GRAINING OF THE ANM
Here we take N to be the number of residues in the total structure (protein), s the number of coarse-grained segments, and n the number of residues in one coarse-grained segment, so that
$$N = sn. \qquad (6)$$
The cutoff distance $r_c$ defining interactions (springs) needs to be sufficiently large to include the n residues in each of the s segments. For this purpose we compute $R_G$, the radius of gyration, for each of the segments in the three proteins. See Figure 2(a) for segments up to 140 residues in length. Because of the finite size of the proteins, the values converge to a clear limit. This behavior is reminiscent of the behavior of flexible polymer chains of different lengths. Despite the heterogeneity in each of the segments (or links), the three proteins behave similarly up to the coarse-graining level of 40 residues.
FIGURE 2. (a) Radius of gyration of chain segments in the folded proteins GAL, XDH, and HA. (b) Comparison of the radius of gyration of chain segments in random coil polypeptides and folded proteins, where values given on the lower curve are average values for the three proteins, with the bars showing the standard deviations.
RADIUS OF GYRATION OF FOLDED CHAIN SEGMENTS
A point of comparison for the $R_G$ values of the protein segment size is found in the $R_G$ values of the random coil model for homopolymers consisting of N peptide units [20, 21]. The average dimension, expressed as the characteristic ratio, from an average of several experiments, for several different polypeptides having β carbons, is
$$\langle r^2 \rangle / N L^2 = 9, \qquad (7)$$
where r is the end-to-end distance, and L is the virtual bond length. For a long Gaussian chain, the radius of gyration is related to the mean square of the end-to-end distance by
$$\langle R_G^2 \rangle = \tfrac{1}{6}\langle r^2 \rangle. \qquad (8)$$
Thus
$$R_G / \sqrt{N L^2} = 1.225, \qquad (9)$$
where, as before, N is the number of residues and L is the virtual bond length.
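The arithmetic leading from Eqs. (7) and (8) to Eq. (9) can be spelled out as follows, reading $R_G$ in Eq. (9) as the root-mean-square radius of gyration (an interpretation, since the text does not state it explicitly):

```latex
\langle R_G^2 \rangle
  = \tfrac{1}{6}\,\langle r^2 \rangle
  = \tfrac{1}{6}\,\bigl(9\,N L^2\bigr)
  = 1.5\,N L^2
\quad\Longrightarrow\quad
\frac{\sqrt{\langle R_G^2 \rangle}}{\sqrt{N L^2}} = \sqrt{1.5} \approx 1.225
```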
In Figure 2(b), the random coil limit for $R_G$ appears as the smooth upper curve. As might be expected, all of the protein segments are more compact than the random coil peptide. The bars show the range of individual values for segments of different sizes, all of which are significantly more compact than the random polypeptide case.
It would be interesting to learn the origin of the variations in the $R_G$ values for a fixed size segment. Are the locally compact segments determined by their own sequences or by more global considerations? Do the segments with the lowest $R_G$ values include glycines, which could facilitate turns, or do they have more hydrophobic residues on average, which could contribute to collapsed forms? Or are there other composition effects?
In order to further coarse-grain folded proteins, it is helpful to know how the overall dimensions of the chain segments in folded proteins change as a function of segment length. This will indicate how the cutoff radius in the ANM calculations should be adjusted for further coarse graining along the backbone of the protein.
For the three proteins that are considered in this study, we calculate the mean-square radius of gyration, $\langle R_G^2 \rangle$, for segments of various lengths. This calculation is carried out separately for the 6, 2, and 4 chains that make up HA, XDH, and GAL, respectively. The average is calculated by moving the starting point of each segment along the chain backbone one residue at a time toward the end of the chain. Therefore, for a single chain composed of $N_c$ residues, the radius of gyration is averaged over ($N_c$ − n + 1) frames for a segment of length n.
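The sliding-window average over ($N_c$ − n + 1) frames can be sketched as below. The function names are illustrative, and the sketch assumes one point per residue with equal masses (for instance α-carbon coordinates), which the text does not specify.

```python
import numpy as np

def radius_of_gyration(points):
    """Root-mean-square distance of the points from their centroid (equal masses assumed)."""
    centered = points - points.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))

def mean_square_rg(chain_coords, n):
    """Average R_G^2 over all (Nc - n + 1) windows of n consecutive sites of one chain."""
    n_c = len(chain_coords)
    if not 1 <= n <= n_c:
        raise ValueError("segment length must be between 1 and the chain length")
    # Move the start of the segment along the backbone one site at a time
    values = [radius_of_gyration(chain_coords[start:start + n]) ** 2
              for start in range(n_c - n + 1)]
    return float(np.mean(values))
```

For a multi-chain protein, the function would be called once per chain and the window values pooled, following the per-chain treatment described above.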
In Figure 2(a), the radius of gyration, $R_G$, is plotted as a function of segment length for the three proteins. The behavior is similar up to n = 40, presumably reflecting the average behavior of peptides. For n > 40, differences begin to be manifested which occur because of the differences in the overall sizes and shapes of proteins.
For n < 40, the data can be fit with the form
$$R_G = a\,n^{b}. \qquad (10)$$
These parameter values are found to be a = 1.778 and b = 0.595 from a fit to the average over the three log–log plots of $R_G$ vs. n for HA, XDH, and GAL. The n = 1 limit of Eq. (10) corresponds to a single monomer whose radius of gyration must be a, suggesting that the average bond length is approximately equal to 2a (= 3.556 Å), which is close to the virtual bond length between sequential α-carbon atoms of 3.8 Å.
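A power law such as Eq. (10) is a straight line on a log–log plot, so its parameters can be estimated by linear regression of log $R_G$ on log n. The sketch below uses ordinary least squares via `numpy.polyfit`; the exact fitting procedure used in the article is not stated, so this is an assumption, and the name `fit_power_law` is illustrative.

```python
import numpy as np

def fit_power_law(n_values, rg_values):
    """Fit log(R_G) = log(a) + b * log(n) by least squares and return (a, b)."""
    log_n = np.log(np.asarray(n_values, dtype=float))
    log_rg = np.log(np.asarray(rg_values, dtype=float))
    slope, intercept = np.polyfit(log_n, log_rg, 1)  # highest degree first
    return float(np.exp(intercept)), float(slope)
```

Feeding it data generated exactly from Eq. (10) returns the input parameters, which is a convenient sanity check before applying it to averaged $R_G$ curves.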
In Figure 2(b), the lower curve gives the radius of gyration averaged over all segments of a given size in the three folded proteins (HA, XDH, and GAL), and the error bars are shown for some representative values of n. Here, the standard deviation for a specific value of n has been calculated over the frames of all possible segments in the three proteins. The dashed curve in the same figure gives the $R_G$ of unfolded segments of length n, as predicted by the model for polyalanine developed by Flory [21].
In earlier work, a cutoff radius of 13 Å was found to be suitable for ANM calculations, in which all α-carbon atoms in the protein structure were retained [11]. In the current study, as we further coarse-grain the structures, we recognize that the renormalized sites are interacting at longer ranges because their effective sizes have grown. The cutoff radius should thus equal the sum of the renormalized radii of each site plus the invariant contact distance $R_0$ between the sites, i.e.,
$$r_c = 2R_G + R_0, \qquad (11)$$
where $R_G$ is obtained according to Eq. (10) with the parameters found above. To be consistent with our earlier work, $R_0$ should be set to a value of (13 Å − 2a), but for simplicity, in what follows we have used the value of 13 Å instead. This choice leads to little change in the results since they are only modestly dependent on $R_0$, while being strongly dependent on the growth of $R_G$ with N. Results for the three illustrative proteins of this study are shown in Table I.
a Cutoff radius is calculated according to $r_c = 2R_G + 13$ Å, where $R_G$ is found from Eq. (10).
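Combining Eqs. (10) and (11) with the fitted parameters quoted in the text gives the cutoff as a function of the segment length n. The sketch below only restates those two formulas; the constant and function names are illustrative. For n = 1 it gives 2 × 1.778 + 13 = 16.556 Å.

```python
A_FIT = 1.778  # prefactor a of Eq. (10), in angstroms (value quoted in the text)
B_FIT = 0.595  # exponent b of Eq. (10) (value quoted in the text)
R0 = 13.0      # invariant contact distance R_0, in angstroms

def segment_rg(n, a=A_FIT, b=B_FIT):
    """Radius of gyration of an n-residue segment from Eq. (10): R_G = a * n**b."""
    return a * n ** b

def cutoff_radius(n, r0=R0):
    """Cutoff distance from Eq. (11): r_c = 2 * R_G + R_0."""
    return 2.0 * segment_rg(n) + r0
```

Because Eq. (10) was fitted for n < 40, values returned for longer segments are extrapolations.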
Results and Discussion
X-RAY CRYSTALLOGRAPHIC TEMPERATURE FACTORS
The relationship between an individual residue’s fluctuations and its temperature factor is
$$B_i = \left(8\pi^2/3\right)\langle \Delta R_i^2 \rangle. \qquad (12)$$
In Figure 3, these experimental temperature factors measured by X-ray crystallography (solid curves) are compared to those predicted by the ANM (dashed curves). For each of the three proteins, each monomer exhibits practically the same behavior both in experiment and calculation. Therefore, the fluctuations of residues are presented as averages over all monomers. The overall agreement is excellent as has often been observed with this model.
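Eq. (12) is a simple proportionality, so converting between mean-square fluctuations and temperature factors takes one line in each direction. The function names below are illustrative; fluctuations are assumed to be in Å² so that B is in Å².

```python
import math

def b_factor_from_msf(msf):
    """Eq. (12): B_i = (8 * pi**2 / 3) * <dR_i^2>."""
    return 8.0 * math.pi ** 2 / 3.0 * msf

def msf_from_b_factor(b_factor):
    """Inverse of Eq. (12): <dR_i^2> = 3 * B_i / (8 * pi**2)."""
    return 3.0 * b_factor / (8.0 * math.pi ** 2)
```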
FIGURE 3. Comparison of temperature factors from X-ray crystallography and those calculated with ANM calculationsfor (a) β-galactosidase, (b) xanthine dehydrogenase, and (c) hemagglutinin.
TABLE II. Force constants γ for coarse-grained ANM calculations.
Once the cutoff radius for the interactions is fixed, the force constant γ is the only remaining parameter in the calculations. In turn its value is fixed by requiring a match between the average values of the mean-square fluctuations predicted by ANM and the experimental B factors. In Figure 3, such adjustments were made in order to compare the experimental and theoretical results. The experimental B factor, $B_n$, of a coarse-grained segment composed of n residues is calculated as the average of the B factors of its n constituent residues. The force constant is then extracted by a comparison of the coarse-grained B factors with the mean-square fluctuations calculated with ANM. Table II gives the force constant values. As our previous experience with a large number of proteins has indicated, γ varies among proteins by no more than a factor of 2. However, as the coarse graining is applied, the force constants become stronger monotonically, upon passing from the scaling at n = 2 to n = 30.
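The two steps described above, averaging experimental B factors over segments and rescaling γ so that mean values match, can be sketched as follows. The function names are illustrative. Two assumptions are made that the text does not spell out: trailing residues that do not fill a complete segment are dropped, and the predicted B factors are first computed with γ = 1, which works because the predicted fluctuations scale as 1/γ.

```python
import numpy as np

def coarse_grain_b_factors(b_factors, n):
    """Average experimental B factors over consecutive segments of n residues."""
    b = np.asarray(b_factors, dtype=float)
    n_segments = len(b) // n  # assumption: an incomplete final segment is discarded
    return b[:n_segments * n].reshape(n_segments, n).mean(axis=1)

def fit_force_constant(b_experimental, b_predicted_unit_gamma):
    """Return gamma such that the mean predicted B factor equals the mean experimental one.

    b_predicted_unit_gamma holds B factors computed with gamma = 1; since the
    predicted values scale as 1/gamma, gamma is the ratio of the two means.
    """
    return float(np.mean(b_predicted_unit_gamma) / np.mean(b_experimental))
```

For example, if the γ = 1 prediction has a mean B factor twice the experimental mean, the fitted γ is 2 and the rescaled prediction matches the experimental average.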
Parenthetically, it should be noted that in the case of β-galactosidase [Fig. 3(a)], only an N/2 calculation was carried out instead of an all-residue calculation because of the large size of this protein (4044 residues in total). Although an n = 1 calculation is feasible, this has not been executed here. The experimental B factors, for comparison, were averaged over neighboring pairs of residues.
COMPARISON OF ANM RESULTS AT DIFFERENT LEVELS OF COARSE GRAINING
B Factors
Figure 4(a) compares the temperature factors from coarse-grained calculations N/2 and N/10 for GAL. Higher levels of coarse graining lead to smoother curves, but the basic structure of the peaks is readily apparent at the level of N/10 calculations. Figure 4(b) shows the calculated B factors at the same N/10 level for xanthine dehydrogenase. From these results it is clear that the essential structure of fluctuations is retained after the coarse graining.
First Mode
The slowest mode shapes obtained with N/2 and N/10 calculations are displayed in Figure 5(a) for GAL. There is a remarkable match between the curves, which have been normalized to match the scales. Figure 5(b) shows a comparison of the N and N/10 calculations for hemagglutinin. Clearly, the general features of the first mode shape are obtained. As a result of these comparisons, it is evident that the functionally important collective mode shapes can still be reproduced quite well at higher levels of coarse graining.
Eigenvalues
Figure 6 compares the weighted contribution of each mode to the mean-square fluctuations at the different levels of coarse graining employed for GAL, XDH, and HA. The modes are sorted and indexed starting from the slowest mode having the largest contribution and running up to higher frequencies. In order to capture the same collective modes at higher levels of coarse graining, the fractional contributions at the low-frequency end of the spectrum need to be similar. This is exactly what we observe in these logarithmic plots. In Table III, the cumulative contributions of the first three modes are listed. As the level of coarse graining increases, the cumulative contribution of slowest modes increases because there are fewer modes at the high-frequency end of the distribution. Yet the fractional contributions of the collective modes appear to be comparable after renormalization.
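Because the eigenvectors in Eq. (5) are normalized, the contribution of mode k to the mean-square fluctuations summed over all sites is proportional to 1/λ_k. Under that reading, which is an inference from Eq. (5) rather than a formula given in the text, the fractional and cumulative contributions can be sketched as follows; the function names are illustrative, and the input is assumed to contain only the nonzero eigenvalues.

```python
import numpy as np

def mode_contributions(eigenvalues):
    """Fractional contribution of each mode, slowest first: (1/lambda_k) / sum_j (1/lambda_j)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))  # ascending: slowest mode first
    weights = 1.0 / lam
    return weights / weights.sum()

def cumulative_contribution(eigenvalues, n_modes=3):
    """Summed fractional contribution of the n_modes slowest modes."""
    return float(mode_contributions(eigenvalues)[:n_modes].sum())
```

With fewer sites there are fewer high-frequency terms in the denominator, which is consistent with the observation above that the cumulative contribution of the slowest modes grows with the level of coarse graining.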
Mechanisms of Motion
In Figure 7 the two extreme positions for the first two slowest modes of β-galactosidase are shown at two different levels of coarse graining, N/2 and N/10. It is amply clear from these figures that the same motions occur, despite the coarse graining. The first mode is for bending at the “waist” of the protein, and the second is a stretching–compression type of motion that we have often observed in asymmetric elongated protein structures.
FIGURE 4. Comparison of temperature factors predicted by ANM at different levels of coarse graining for (a) β-galactosidase and (b) xanthine dehydrogenase.
The correlations computed between the motions with the coarser-grained models and with the single residue–single point results are high. For hemagglutinin (see Table IV) it can be seen that, whereas the total motions are not so well represented (at the 49% level for the model retaining 1 out of every 40 residues), the representation of the first, slowest mode remains above 90% even for that model. Thus the coarse-grained results are most viable for motions having the largest displacements.
Structure-Based Coarse Graining
Finally we consider a completely structure-based approach, which requires multiple structures to specify which parts of the structure are to be coarse-grained. The parts of the two structures having the smallest differences are identified directly to determine the blocks to be coarse-grained. Then, within these most constant blocks, the spring constants are increased to prevent intrablock motions. Another way of implementing this approach would be to treat these fixed blocks as “fat” rigid elements including many more than usual contacts with the other individual residues. This approach is applied here for demonstration purposes to two structures of myosin (pdb names 1B7T [22] and 1DFL [23]).
FIGURE 5. Slowest mode shapes predicted by ANM at different levels of coarse graining for (a) β-galactosidase and (b) hemagglutinin.
The blocks defined by this approach are shown in Figure 8, within which the changes in distances have been limited to a maximum of 0.1 Å. The invariant regions are identified in different colors in Figure 8, and the few remaining residues not included within the rigid blocks are shown in gray. Importantly, this approach yields nearly identical computed temperature factors to those computed with the individual one point per residue model (see Fig. 9). Consequently, this model represents an alternative coarse-grained model that has its basis in two different structures. It is noteworthy that the most rigid regions of the structure are clearly clustered within these local domains.
FIGURE 6. Contributions of the modes at different levels of coarse graining for (a) β-galactosidase, (b) xanthine dehydrogenase, and (c) hemagglutinin. All plots are log–log plots to emphasize that only the lowest indexed modes are significant contributors to the overall motions. Also notable is the extent of agreement in the dominant mode contributions between the models, regardless of the level of coarse graining.
TABLE III. Total fractional contribution of the slowest three modes to the mean-square fluctuations.
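One way to picture the stiffening of intrablock springs is a per-pair force-constant matrix built from block labels. The sketch below is only an illustration of that idea: the function name, the label convention (−1 for residues outside every block) and the stiffening factor of 100 are all hypothetical, since the article does not give the value by which the spring constants were increased.

```python
import numpy as np

def pair_force_constants(block_labels, gamma=1.0, stiffening=100.0):
    """Return an N x N matrix of spring constants with stiffer springs inside each block.

    block_labels : one integer per residue; residues sharing a label >= 0 belong
                   to the same rigid block, and -1 marks residues in no block.
    stiffening   : hypothetical multiplicative factor applied to intrablock springs.
    """
    labels = np.asarray(block_labels)
    in_block = labels[:, None] >= 0
    same_block = (labels[:, None] == labels[None, :]) & in_block
    gammas = np.where(same_block, gamma * stiffening, gamma)
    np.fill_diagonal(gammas, 0.0)  # no spring between a site and itself
    return gammas
```

Such a matrix would replace the single universal γ when the Hessian is assembled, with the cutoff still deciding which pairs are connected at all.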
Discussion
One of the most important findings from these types of computations is the occurrence of functional “local motions” not independently but within one of the slowest most important motions. Examples that we have previously observed include flaps opening and closing over small molecule binding sites. These motions do not occur locally and independently but rather together with a highly coordinated motion of the entire protein. This type of motion can be clearly seen in Figure 7(b) where the flaps at the top and bottom of the structure open upon compression and close upon stretching, whereas opposite behavior can be observed for surface flaps in the center of the structure.
FIGURE 7. First (a), (b) and second (c), (d) modes of motion for β-galactosidase at N/2 (a), (c) and N/10 (b), (d) levels of coarse graining. Note that in parts (a) and (c) only half of the α-carbon positions are shown (and used) and in parts (b) and (d) only 1 out of every 10 residues is shown (and used in the computations). The first mode is a bending of the molecule along its activating interface, and the second mode is a stretching–compression type of motion. Loops often are opened and closed during these large-scale motions. This can be seen most clearly at the top and bottom of the structure in the stretching–compression mode of motion.
TABLE IV. Correlations at different levels of coarse graining.
Two alternative approaches for coarse graining have been presented, one based on scaling the size of the cutoff distance based on the average dimensions of protein segments and the other more empirically based on actual changes between two experimental structures.
In many protein studies there has been a focus on functional sites while the remainder of the protein structure has been substantially ignored. The present work emphasizes that there is a truly important role for the entire protein in controlling these critical functional motions. In our view, the raison d’être for protein structure is that a fold pattern leads to its shape, which in turn controls the important functional motions of the protein. It is furthermore important that it be possible to substantially ignore the details of the structure in extracting these largest-scale motions. A secondary implication is that high-resolution structures may not be required in order to infer the important motions of proteins.
FIGURE 8. Ribbon diagram of the myosin head structure [22] 1B7T. Residues in the same block are shown in the same color. The few residues in gray are those not included in any blocks.
ACKNOWLEDGMENTS
R.H. is supported through the National Science Foundation (Grant No. NSF 97-03372) and is presently an Alfred P. Sloan Fellow and Research Corporation Cottrell Scholar. P.D. is partially supported by the Bogazici Research Fund (project 01HA501), and she thanks O.T. Turget for helpful occasions.
References
1. Bahar, I.; Wallqvist, A.; Covell, D. G.; Jernigan, R. L. Biochemistry 1998, 37, 1067–1075.
2. Demirel, M. C.; Atilgan, A. R.; Jernigan, R. L.; Erman, B.; Bahar, I. Protein Sci 1998, 7, 2522–2532.
3. Bahar, I.; Jernigan, R. L. J Mol Biol 1998, 281, 871–884; Bahar, I.; Atilgan, A. R.; Erman, B. Folding Des 1997, 2, 173–181.
4. Bahar, I.; Erman, B.; Jernigan, R. L.; Covell, D. G. J Mol Biol 1999, 285, 1023–1037.
5. Haliloglu, T.; Bahar, I. Proteins 1999, 37, 654–667.
6. Bahar, I.; Jernigan, R. L. Biochemistry 1999, 38, 3478–3490.
7. Jernigan, R. L.; Demirel, M. C.; Bahar, I. Int J Quantum Chem (B. Pullman Memorial Volume) 1999, 75, 301–312.
8. Keskin, O.; Jernigan, R. L.; Bahar, I. Biophys J 2000, 78, 2093–2106.
9. Jernigan, R. L.; Bahar, I.; Covell, D. G.; Atilgan, A. R.; Erman, B.; Flatow, D. T. J Biomol Struct Dyn, Conversation 11, Issue 1, 2000, 49–55.
10. Keskin, O.; Bahar, I.; Jernigan, R. L. Biochemistry, to appear.
11. Atilgan, A. R.; Durell, S. R.; Jernigan, R. L.; Demirel, M. C.; Keskin, O.; Bahar, I. Biophys J 2001, 80, 505–515.
12. Doruker, P.; Jernigan, R. L.; Bahar, I. J Comput Chem 2002, 23, 119–127.
FIGURE 9. Comparison of temperature factors of myosin predicted from calculations taking into accountthe blocks (solid) and the full non-coarse-grained single-residue calculations (dashed).
13. Doruker, P.; Atilgan, A. R.; Bahar, I. Proteins 2000, 40, 512–524.