Computational modelling of tropoelastin
modifications and interactions
Jazmin Ozsvar
School of Life and Environmental Sciences
Faculty of Science
University of Sydney
2020
A thesis in fulfilment for the completion of
Doctor of Philosophy
“Cell and tissue, shell and bone, leaf and flower, are so many portions of matter, and it is in obedience to the laws of physics that
their particles have been moved, moulded and conformed.”
D’Arcy Wentworth Thompson
Declaration
This is to certify that to the best of my knowledge, the content of this thesis
is my own work. This thesis has not been submitted for any degree or other
purposes.
I certify that the intellectual content of this thesis is the product of my own work
and that all the assistance received in preparing this thesis and sources have been
acknowledged.
Jazmin Ozsvar Date
Acknowledgements
My first thank you goes to my supervisor, Professor Tony Weiss, for his constant enthusiasm for research, his relentless stream of fascinating ideas, and remarkable depth of knowledge. I deeply appreciate his patience with me whilst I transitioned from biology to biophysics, and his encouragement to pursue studies in a field that had been unexplored within our laboratory. It has been my pleasure and privilege to have him as my supervisor throughout my doctoral studies.
I also wish to thank Dr Suzanne Mithieux and Dr Giselle Yeo for their encouragement, insights and feedback throughout my candidature. I greatly appreciate their help and advice, both within and outside the realm of science.
I could not have transitioned into molecular dynamics without the help of numerous people. My biggest thanks go to Assistant Professor Anna Tarakanova, who not only answered my many questions on protein modelling and molecular dynamics, but also provided me with the tropoelastin model and feedback on this thesis. I am incredibly grateful to have had the opportunity to learn from and collaborate with her. I would also like to acknowledge the advice I received from Professor Markus Buehler and his group, as well as advice from Associate Professor Serdar Kuyucak and Dr Jeffry Setiadi on several technical aspects of my project.
I would like to acknowledge the Cell Therapy Manufacturing Cooperative Research Centre for their meticulously planned ePhD program, their funding support, and for providing me with opportunities to present my work throughout my studies.
I am deeply indebted to the members of the Weiss laboratory, past and present, especially those who dedicated their work to uncovering the mysteries of tropoelastin structure, coacervation and cell interactions. In terms of laboratory members from my time, thank you to Behnaz, Matti, Kekini, Pearl, Ed, Avelyn, Lea, Karen, Ziyu, Aleen, Howard, Johnny and Sally for sharing your time and conversations with me. In particular, I would like to thank Richard, my partner in crime, for both emotional and engineering support during my doctoral studies.
A big thank you also goes to the Cordwell and Reeves labs for years of games nights, food outings, and lunch time “what grinds my gears” moments. Moreover, I would like to further thank my friends, both near and far, outside the scientific community for putting up with me during my doctoral studies and for giving me encouragement and sound advice when required.
Last, but certainly not least, I would like to thank my family. To my parents and my sister, words cannot describe how much it means to me that you have believed in me and supported my scientific endeavours throughout all these years. Sometimes a shoulder to lean on is better than all the scientific advice in the world.
Abstract
Despite their biological importance and prevalence, the elucidation of the
structures and dynamics of highly flexible proteins has presented a profound challenge
to the structural biology community. Improvements over the last decades in
computational hardware and the accuracy of computational chemistry software
have permitted the in-depth exploration of flexible proteins.
Here, I delve into the molecular dynamics and mechanisms of tropoelastin - the
building block of the elastin proteins - that are crucial to its functionality, and the
interplay between primary sequence, local structure and global structure. I lever-
age the recently derived full-atomistic structure of tropoelastin through a series of
computational molecular dynamics models to dissect three facets of tropoelastin’s
functionality in this thesis. Firstly, I examine the effect of natural modifications
on the global and local structure of tropoelastin, and their implications for the
self-assembly process through which elastin is formed. I find that the global struc-
tures deviate from the canonical wild type structure, indicating the formation of
heterogeneous aggregates and cross-linking. The implied heterogeneity of these
aggregates is further explored using dimers as representative nucleation events,
where I examine the influence of physical forces and initial tropoelastin structures
on early stage self-assembly. Dimers of tropoelastin result in surprisingly diverse
associations, indicating that elastin assembly is not as homogeneous as previously
thought. Finally, I probe the interaction between tropoelastin monomers and in-
tegrins, a class of cell receptors crucial for signalling and tissue integrity. I identify
tropoelastin as a fuzzy binding protein which is capable of binding to integrins in
a variety of conformations. Furthermore, I determine that tropoelastin exhibits
preferential binding, which is dependent on the initial starting conformation.
Author contribution statement
Chapters 1, 3 and 5 contain material that has been published in or submitted to
scientific journals. Where applicable, the relevant publication is cited at the start
of the chapter.
For Chapter 1, I wrote 50 % of the manuscript for the cited review.
For Chapter 3, I designed the study, conducted the experiments, performed the
majority of the data analysis, and wrote the manuscript for the cited publication. I
received assistance with data analysis and manuscript editing from the co-authors
of the publication.
For Chapter 5, I designed the study, conducted the experiments, performed the
majority of the data analysis, and wrote the manuscript for the submitted publi-
cation. I received assistance with data analysis and manuscript editing from the
co-authors of the publication.
In addition, in cases where I am not the corresponding author of a published item,
permission to include the published material has been granted by the correspond-
ing author.
Jazmin Ozsvar Date
As supervisor for the candidature upon which this thesis is based, I can confirm
that the authorship attribution statements above are correct.
Anthony S. Weiss Date
Disseminations Arising from this Work
Publications
Ozsvar, J., Wang, R., Tarakanova, A., Buehler, M. J. and Weiss, A.S., 2020.
Fuzzy binding model of molecular interactions between tropoelastin and integrin
αvβ3. Submitted to the Biophysical Journal.
Wang, R., Ozsvar, J., Yeo, G. C., Weiss, A. S., 2019. “Hierarchical assembly
of elastin materials”. Current Opinion in Chemical Engineering, Vol. 24, pp.
54-60.
Ozsvar, J., Tarakanova, A., Wang, R., Buehler, M. J. and Weiss, A.S., 2019.
Allysine modifications perturb tropoelastin structure and mobility on a local and
global scale. Matrix Biology Plus, 3(6), pp.800-809.
Book Chapters
Wang, R., Mithieux, S.M., Ozsvar, J. and Weiss, A.S., 2016. Synthetic-Elastin
Systems. Elastic Fiber Matrices: Biomimetic Approaches to Regeneration and
Repair, p.97-132. CRC Press.
Conference Presentations and Posters
Ozsvar, J., Wang, R., Weiss, A.S., 2019. Unravelling the interactions between
tropoelastin and integrins. Oral presentation at the Annual Matrix Biology Society
of Australia and New Zealand 2019, the Woolcock Institute, New South Wales,
Australia.
Ozsvar, J., Weiss, A.S., 2018. The Role of Allysines in Tropoelastin Dynamics
and Assembly. Oral presentation at the 3rd Matrix Biology Europe 2018, Univer-
sity of Manchester, Manchester, United Kingdom.
Ozsvar, J., Weiss, A.S., 2018. The Role of Allysines in Tropoelastin Dynamics
and Assembly. Oral presentation at the 10th European Elastin Meeting 2018,
Radboud University Medical Center, Radboud, Netherlands.
Ozsvar, J., Weiss, A.S., 2017. Dynamic Landscape of Tropoelastin Assembly.
Oral presentation at the Annual Matrix Biology Society of Australia and New
Zealand 2017, Royal Childrens Hospital of Melbourne, Victoria, Australia.
Ozsvar, J., Weiss, A.S., 2017. Dynamic Landscape of Tropoelastin Assembly.
Oral presentation at the 3rd Annual Charles Perkins Center Early to Mid Ca-
reer Researchers Symposium 2017, University of Sydney, New South Wales, Aus-
tralia.
Ozsvar, J., Weiss, A.S., 2017. Untangling the interactions between cells and
tropoelastin. Cell Therapy Manufacturing Cooperative Research Centre ImpaCT
Day, Adelaide.
Abbreviations
ALL allysine-aldol
ANM anisotropic network model
BSA buried surface area
BS3 bissulfosuccinimidyl suberate
cMD classical molecular dynamics
COM centre of mass
EAF exchange acceptance frequency
EBP elastin binding protein
ECM extracellular matrix
ELP elastin-like polypeptides
ENM elastic network model
GAG glycosaminoglycans
GB Generalised-Born
HADDOCK High Ambiguity Driven protein-protein Docking
LNL lysinonorleucine
LOX lysyl oxidase
MD molecular dynamics
NMA normal mode analysis
PC principal component
PCA principal component analysis
QM quantum mechanics
REMD replica exchange molecular dynamics
RGD arginine - glycine - aspartate
RMSD root mean square deviation
SANS small angle neutron scattering
SASA solvent accessible surface area
SAXS small angle x-ray scattering
WT wild type
Contents
1 Introduction
  1.1 Elastin
    1.1.1 Elastic fibres
  1.2 Tropoelastin
    1.2.1 The ELN gene
    1.2.2 Primary sequence
    1.2.3 Secondary structure
    1.2.4 Overall tertiary structure
    1.2.5 Computational model
  1.3 Elastogenesis
    1.3.1 Tropoelastin synthesis
    1.3.2 Coacervation
    1.3.3 Cross-linking
    1.3.4 Head-to-tail model of elastin assembly
    1.3.5 Deposition
  1.4 Tropoelastin-cell interactions
    1.4.1 Elastin binding protein
    1.4.2 Glycosaminoglycans
    1.4.3 Integrins
    1.4.4 Model of tropoelastin-cell interactions
  1.5 Elastin diseases
  1.6 Applications of tropoelastin
    1.6.1 Tropoelastin-only materials
    1.6.2 Blended biomaterials
    1.6.3 Surface coatings
  1.7 Thesis aims
2 Materials and Methodology
  2.1 Computational multiscale modelling
  2.2 Molecular dynamics
    2.2.1 Molecular dynamics workflow
    2.2.2 Modelling atomic movement
    2.2.3 Force fields
    2.2.4 Solvent models
  2.3 Replica exchange molecular dynamics
  2.4 Normal mode analysis
    2.4.1 Elastic and anisotropic network models
  2.5 Molecular docking
  2.6 Machine learning
    2.6.1 Logistic regression
    2.6.2 Elastic net regularisation
3 Allysine modifications perturb tropoelastin structure and mobility on a local and global scale
  3.1 Introduction
  3.2 Methods
    3.2.1 Allysine parameterisation
    3.2.2 Molecular dynamics input
  3.3 Results
    3.3.1 Structures of single allysine-modified tropoelastin
    3.3.2 Converting lysine to allysine perturbs the global structure and intrinsic dynamics of tropoelastin
    3.3.3 Allysines alter the conformational sampling of domains
    3.3.4 Allysines facilitate changes in salt bridges that contribute to structural variance and lead to local secondary structural changes
    3.3.5 Hydrophobic solvent accessible surface area decreases in the presence of allysines
    3.3.6 Distances between residues decrease upon allysine modification
  3.4 Discussion
4 Modelling of tropoelastin nucleation events
  4.1 Introduction
  4.2 Methods
    4.2.1 Selection of tropoelastin conformations
    4.2.2 Protein-protein docking
    4.2.3 Preparation of structural data
    4.2.4 Determination of head-to-tail association
    4.2.5 Assembly of docking data
    4.2.6 Correlation
    4.2.7 Machine learning
  4.3 Results
    4.3.1 Semi-automated annotation of head-to-tail association
    4.3.2 Overview of dimer associations by starting conformation and study type
    4.3.3 Overview of dimer associations by native or synthetic origin
    4.3.4 Structures arising from the canonical cross-link
    4.3.5 Electrostatic interactions of dimers
    4.3.6 Surface area and solvent accessibility of dimers is driven by tropoelastin conformation
    4.3.7 Correlation of dimer energies and features
    4.3.8 Machine learning model selection using energy and surface features
    4.3.9 Logistic regression of whole dimer data set
  4.4 Discussion
5 Interactions of tropoelastin with integrins
  5.1 Introduction
  5.2 Methods
    5.2.1 Preparation of integrin headpiece structure
    5.2.2 Preparation of tropoelastin structure
    5.2.3 Tropoelastin-integrin configuration preparation
    5.2.4 Molecular dynamics modelling
    5.2.5 Analysis
  5.3 Results
    5.3.1 Docking of tropoelastin to integrin αvβ3
    5.3.2 Integrin headpiece opening and associated structural changes with REMD
    5.3.3 Areas of tropoelastin-integrin interaction
    5.3.4 Principal component analysis
    5.3.5 Headpiece opening remains stable in explicit solvent
  5.4 Discussion
6 Discussion
  6.1 General discussion
  6.2 Allysine modifications and their implication for self-assembly
  6.3 Updating the head-to-tail model of assembly
  6.4 Fuzzy binding mechanisms of tropoelastin and integrins
  6.5 Future directions
A Code and scripts
  A.1 Code for implementing machine learning
Chapter 1
Introduction
Parts of this chapter have been published as:
Wang, R., Ozsvar, J., Yeo, G. C., Weiss, A. S., 2019. “Hierarchical assembly of elastin materials”. Current Opinion in Chemical Engineering, Vol. 24, pp. 54-60.
1.1 Elastin
1.1.1 Elastic fibres
Elastic fibres are a vital component of the extracellular matrix (ECM) of vertebrate
elastic tissues, which include the skin, lungs and cardiovascular system. Elastic
fibres are composed of approximately 90% elastin, an insoluble, multimeric protein
[1]. The remaining components of elastic fibres are primarily fibrillins, particularly
fibrillin-1, a class of insoluble glycoproteins [2]. As the major constituent of elastic
fibres, elastin is predominantly responsible for the key characteristic property of
these fibres, which is that of mechanical resilience. Elastin’s ability to return
to its resting state undeformed allows tissues to undergo repeated stretch and
recoil cycles that are required for the execution of appropriate functionality [3–5].
Elastin’s mechanical properties are made even more remarkable by its incredible
durability. Carbon dating methodologies have estimated elastin’s half-life to be
70-80 years [6] and, as such, it endures throughout an organism’s lifetime [7].
Elastin is composed solely of its soluble monomer subunit, tropoelastin.
1.2 Tropoelastin
1.2.1 The ELN gene
Tropoelastin is encoded by the 45 kb ELN gene, which is located on the long arm
of chromosome 7 at 7q11.2 [8]. The 34 exons of ELN are interspersed between lengthy
introns [9,10], the splicing of which produces a variety of mRNA variants. Alternative
splicing has been observed with exons 22, 23, 24, 26A, 32 and 33 [11, 12], and
combinations thereof give rise to 13 known human tropoelastin isoforms [13]. Out
of these exons, 26A has only been found in humans [14–16]. Exon 22 is spliced
out of human transcripts but has been noted in other mammalian isoforms, such
as murine mRNA [11, 16, 17]. A highly conserved 3’ untranslated region exists
directly downstream of domain 36 that is thought to play a role in the regulation
of tropoelastin expression [10,18].
Variations in the relative abundance of alternatively spliced ELN mRNA
transcripts have been observed between tissues (Figure 1.1). This diversity is thought
to be necessary for the fine tuning of the mechanical characteristics of tissues
to suit their diverse functional requirements [13]. Indeed, studies examining the
consequences of domain insertions and deletions note changes in the intrinsic func-
tionality of tropoelastin, corroborating the hypothesis that domain insertions and
deletions result in altered tissue mechanics [19–23]. Understanding how the
tropoelastin produced from each splice variant contributes mechanically to
tissue function would be of great benefit.
Figure 1.1: Human ELN mRNA splice variants obtained from elastic tissues. Relative abundance of the most highly expressed human mRNA isoforms isolated from aorta, coronary artery, lung, skin, uterus, and bladder. The loss and/or gain of exons is displayed. Image adapted from [13].
The isoform investigated in this thesis is tropoelastin containing domain 26A and
lacking domain 22, and is commonly found in elastic tissues [9, 24].
1.2.2 Primary sequence
Tropoelastin domains are encoded by single ELN exons, and can be categorised
as either “hydrophobic” or “cross-linking” based on amino acid content and
functionality.
As seen in Table 1.1, tropoelastin’s amino acid content is dominated by non-
polar residues such as glycine, alanine, valine and proline [14]. The hydrophobic
domains comprise variations of VPGVG repeating motifs [25–27], giving rise to
a low complexity primary sequence. The length of the hydrophobic domains is
variable, with the shorter (9 - 15 residues) hydrophobic domains occurring closer
to the N-terminus, whereas the longer (up to 55 residues) hydrophobic domains
are located within the central and C-terminal regions of the molecule [10]. The
hydrophobic domains are primarily responsible for facilitating tropoelastin self-
assembly [28–30], maintaining tropoelastin’s flexibility [31, 32], and the stretch-
recoil properties of elastin [31,33].
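
The compositional bias described above can be checked directly against the domain sequences. The following is a minimal sketch, assuming the sequence of domain 24 as listed in Table 1.1; the function name `nonpolar_fraction` and the choice of glycine, alanine, valine and proline as the "non-polar" set are illustrative assumptions and are not taken from the cited studies.

```python
from collections import Counter

# Residues named in the text as dominating tropoelastin's composition.
# Treating exactly these four as the "non-polar" set is an assumption
# made for illustration only.
NONPOLAR = set("GAVP")

# Hydrophobic domain 24, copied from Table 1.1.
DOMAIN_24 = "GLVPGVGVAPGVGVAPGVGVAPGVGLAPGVGVAPGVGVAPGVGVAPGI"


def nonpolar_fraction(sequence):
    """Return the fraction of residues in `sequence` that are G, A, V or P."""
    counts = Counter(sequence)
    return sum(counts[aa] for aa in NONPOLAR) / len(sequence)


if __name__ == "__main__":
    print(f"Domain 24 length: {len(DOMAIN_24)} residues")
    print(f"G/A/V/P fraction: {nonpolar_fraction(DOMAIN_24):.2f}")
```

Applying the same function to each row of Table 1.1 would give a per-domain view of how strongly the low complexity repeats dominate the hydrophobic domains.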
The cross-linking domains are distinguished by the presence of lysines, which form
cross-links within mature elastin. Cross-linking domains are termed either KP or
KA type domains, describing the amino acids (proline or alanine respectively),
which flank the lysines. KP domains exist closer toward the N-terminus, whereas
KA domains are found closer to the C-terminus. Cross-linking domains contain
between one and three lysines within their sequences [10].
There are two exceptions to these classifications. Domain 1 (not shown) forms a
signal peptide that is cleaved off to give rise to the mature form of the protein. The
second exception is domain 36, which contains lysines but does not participate in
cross-linking [34]. Moreover, the amino acid sequence of domain 36 is unique as
its lysines are interspersed between positively charged arginines, forming an RKRK
sequence that caps off tropoelastin’s C-terminus [29].
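
The domain categories described above can be expressed as a simple rule on the primary sequence. The sketch below is a simplified heuristic written for illustration, not the classification procedure used in the cited literature: a domain without lysines is labelled hydrophobic, a domain whose lysines neighbour a proline is labelled KP, and one whose lysines neighbour only alanines is labelled KA. The function name, the labels and the exception set are assumptions; the sequences are a subset of those in Table 1.1.

```python
# A subset of domain sequences copied from Table 1.1 (illustration only).
DOMAINS = {
    2: "GGVPGAIPGGVPGGVFYP",
    4: "GALGPGGKPLKP",
    15: "GVGPQAAAAAAAKAAAKF",
    36: "GGACLGKACGRKRK",
}

# Domain 36 contains lysines but does not participate in cross-linking,
# so it is handled explicitly rather than by the flanking-residue rule.
NON_CROSSLINKING_EXCEPTIONS = {36}


def classify_domain(number, sequence):
    """Label a domain as 'hydrophobic', 'KP', 'KA', 'exception' or 'unclassified'."""
    if number in NON_CROSSLINKING_EXCEPTIONS:
        return "exception"
    lysines = [i for i, aa in enumerate(sequence) if aa == "K"]
    if not lysines:
        return "hydrophobic"
    # Collect the residues immediately before and after each lysine.
    flanking = set()
    for i in lysines:
        if i > 0:
            flanking.add(sequence[i - 1])
        if i + 1 < len(sequence):
            flanking.add(sequence[i + 1])
    if "P" in flanking:
        return "KP"
    if "A" in flanking:
        return "KA"
    return "unclassified"


labels = {number: classify_domain(number, seq) for number, seq in DOMAINS.items()}

if __name__ == "__main__":
    for number, label in labels.items():
        print(f"Domain {number}: {label}")
```

Checking for proline before alanine is a deliberate simplification; domains in which lysines are flanked by both residue types would need a more careful rule.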
Other residues that differ from tropoelastin’s low complexity sequence are the
cysteines in domain 36, which form the single disulfide bond within the molecule
[35]. Additionally, tropoelastin contains three negatively charged residues, which
are crucial for maintaining tropoelastin’s tertiary structure [36,37].
Domain  Sequence                                                 Residue numbers
2       GGVPGAIPGGVPGGVFYP                                       1 – 18
3       GAGLGALGG                                                19 – 27
4       GALGPGGKPLKP                                             28 – 39
5       VPGGLAGAGLGA                                             40 – 51
6       GLGAFPAVTFPGALVPGGVADAAAAYKAAKA                          52 – 82
7       GAGLGGVPGVGGLGVS                                         83 – 98
8       AGAVVPQPGAGVKPGKVP                                       99 – 116
9       GVGLPGVYPGGVLPGA                                         117 – 132
10      RFPGVGVLPGVPTGAGVKPKAP                                   133 – 154
11      GVGGAFAGIP                                               155 – 165
12      GVGPFGGPQPGVPLGYPIKAPKLP                                 166 – 188
13      GGYGLPYTTGKLPY                                           189 – 202
14      GYGPGGVAGAAGKAGYPTGT                                     203 – 222
15      GVGPQAAAAAAAKAAAKF                                       223 – 240
16      GAGAAGVLPGVGGAGVPGVPGAIPGIGGIA                           241 – 270
17      GVGTPAAAAAAAAAAKAAKY                                     271 – 290
18      GAAAGLVPGGPGFGPGVVGVPGAGVPGVGVPGAGIPVVPGAGIPGAAVP        291 – 339
19      GVVSPEAAAKAAAKAAKY                                       340 – 357
20      GARPGVGVGGIPTYGVGAGGFPGFGVGVGGIPGVAGVPSVGGVPGVGGVPGVGIS  358 – 412
21      PEAQAAAAAKAAKY                                           413 – 426
23      GVGTPAAAAAKAAAKAAQF                                      427 – 445
24      GLVPGVGVAPGVGVAPGVGVAPGVGLAPGVGVAPGVGVAPGVGVAPGI         446 – 493
25      GPGGVAAAAKSAAKVAAKAQL                                    494 – 514
26      RAAAGLGAGIPGLGVGVGVPGLGVGAGVPGLGVGAGVPGFGA               515 – 556
27      VPGALAAAKAAKY                                            557 – 569
28      GAAVPGVLGGLGALGGVGIPGGVV                                 570 – 593
29      GAGPAAAAAAAKAAAKAAQF                                     594 – 613
30      GLVGAAGLGGLGVGGLGVPGVGGLG                                614 – 638
31      GIPPAAAAKAAKY                                            639 – 651
32      GAAGLGGVLGGAGQFPLG                                       652 – 669
33      GVAARPGFGLSPIFP                                          670 – 684
36      GGACLGKACGRKRK                                           685 – 698

Table 1.1: Summary of human tropoelastin’s domains and their respective sequences. The residue numbers of the sequences are indicated.
1.2.3 Secondary structure
Our understanding of tropoelastin’s structure has been hindered by the insolubil-
ity of elastin fibres, the repetitiveness of its primary sequence, and tropoelastin’s
inherent flexibility. Due to the lack of a full-atomistic x-ray crystal structure, a
number of elastin derivatives, including α-elastin, κ-elastin, isolated tropoelastin
domains, and synthetic elastin-like polypeptides (ELPs), were studied using
circular dichroism (CD) [38, 39], Raman spectroscopy [40, 41], Fourier transform
infrared spectroscopy (FTIR) [40,42], and nuclear magnetic resonance (NMR) [43,44].
Collectively, these studies have yielded much insight into tropoelastin’s secondary
structure.
Despite initial discrepancies between studies due to experimental context, such as
the solvent of choice [45–47], it is now generally agreed that the majority of
tropoelastin forms random coils and transient ordered secondary structures, which
include α-helices and β-structures [29, 48]. The majority of the random coil con-
tent is found in the hydrophobic domains, rendering them highly flexible in so-
lution [44, 49]. This flexibility is partly attributed to the numerous PG motifs
found in the primary sequence of the hydrophobic domains [50]. This is a unique
combination due to the peculiar pairing of the most and least flexible amino acids.
Glycine confers flexibility to local protein structure due to its small sidechain,
which consists of a single hydrogen atom. Proline, on the other hand, contains
a bulky ring that impedes local conformational sampling, thereby disrupting
secondary structure formation. Thus, this pairing results in flexible hydrophobic
domains that exhibit transient secondary structures [51], which are thought to
contribute to efficient conformational sampling during self-assembly and subsequent
cross-linking [32,52].
Tropoelastin’s cross-linking domains, particularly the KA domains, were tradi-
tionally presumed to form α-helices and poly-proline II helices (PPII) due to the
presence of desmosine cross-links [53]. Desmosine requires the specific alignment
of four lysines between two tropoelastin domains, which can be achieved via helical
configuration [53]. Further studies demonstrating that alanines are predisposed
to form α-helices within other proteins appear to support this argument [54, 55].
However, KA domains present high helical content when in trifluoroethanol, a
solvent that stabilises secondary structures [56, 57]. Studies examining the heli-
cal content of ELPs demonstrate that whilst α-helices are indeed present during
later stages of self-assembly, KA domains are primarily composed of random coil
content whilst in monomeric form, similar to the hydrophobic domains [58]. This
is consistent with other studies indicating that less than 10% of tropoelastin’s
structure is helical [29,48], despite almost half of its sequence consisting of cross-
linking domains. The KA domains of ELPs undergo a transition from random coil
to β-strands during early self-assembly, before forming α-helices that are stable
enough to detect via NMR [58].
1.2.4 Overall tertiary structure
Tropoelastin’s overall flexibility has greatly impeded studies of its tertiary structure;
thus, no crystal structure exists to date. The first experiments to successfully
determine the overall 3-D shape of tropoelastin utilised small-angle X-ray scattering (SAXS) and
small-angle neutron scattering (SANS), revealing tropoelastin to be an elongated,
asymmetric molecule with distinct regions [59]. The regions, and the domains
they comprise, were mapped by the superimposition of truncated versions of the
molecule. Overlapping of the structures revealed that the N-terminal region forms
an extended coil region that encompasses domains 2-18. The coil joins onto a
flexible hinge region comprising domains 20-24, which is adjacent to the bridge
region of domains 25-26. Subsequent to the bridge are the C-terminal domains,
also termed the “foot” of the molecule due to their spatial arrangement. Further
SAXS and SANS data revealed that tropoelastin’s tertiary structure is perturbed
by mutations of negatively charged residues within disparate regions [36,37]. These
differently shaped molecules present altered coacervation and elastic fibre
formation [36, 37], highlighting the tight interplay between tropoelastin’s
structure and function.
Figure 1.2: Schematic Representation of Tropoelastin. Full-length tropoelastin model exhibiting the notable and functionally significant structures. Image adapted from Wise et al. [60].
1.2.5 Computational model
The energy landscape of flexible proteins contains many shallow, local minima
that have smaller energy barriers in comparison to more ordered, folded pro-
teins, which have cone shaped energy landscapes dominated by a global minimum
(Figure 1.3). In order to understand the range of conformations tropoelastin oc-
cupies, it was examined using accelerated ab initio molecular dynamics (discussed
in Chapter 2, Methodology). Tropoelastin’s linear polypeptide chain was allowed
to fold over lengthy replica exchange molecular dynamics (REMD) simulations,
resulting in an ensemble of full-atomistic structures [61]. These were subsequently
clustered by the similarity of their Cartesian backbone coordinates, giving rise to
groups of structures that demonstrated the extent of tropoelastin’s conformational
sampling [62].
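
Grouping an ensemble by structural similarity can be sketched as follows. This is an illustrative example only: the clustering algorithm and cutoff used in the cited studies are not specified in this section, so a greedy "leader" scheme is assumed here, and the structures are assumed to have been superimposed beforehand. The function names, the toy coordinates and the 1.0 Å cutoff are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-aligned coordinate sets.

    Each argument is a list of (x, y, z) tuples for the backbone atoms.
    No superposition is performed here (an assumption of this sketch).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def leader_cluster(structures, cutoff):
    """Greedy clustering of structures by RMSD.

    A structure joins the first cluster whose representative (its first
    member) lies within `cutoff`; otherwise it founds a new cluster.
    Returns lists of structure indices, largest cluster first.
    """
    clusters = []
    for index, structure in enumerate(structures):
        for members in clusters:
            if rmsd(structures[members[0]], structure) <= cutoff:
                members.append(index)
                break
        else:
            clusters.append([index])
    return sorted(clusters, key=len, reverse=True)


# Toy ensemble of three two-atom "structures" (hypothetical coordinates, in Å).
toy_ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],
    [(0.0, 5.0, 0.0), (1.0, 5.0, 0.0)],
]
toy_clusters = leader_cluster(toy_ensemble, cutoff=1.0)

if __name__ == "__main__":
    print(toy_clusters)
```

In this scheme the first member of the most populated cluster plays the role of a representative structure; analysis packages typically offer more robust alternatives, such as hierarchical or centroid-based clustering.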
Figure 1.3: Energy landscape of disordered, flexible and ordered proteins. Top panel depicts the energy landscapes of disordered (left), flexible (middle) and ordered (right) proteins. The minima of proteins become deeper with greater order, reflecting the preference for a particular set of structures in flexible ensembles, and the emergence of a favoured conformation in ordered proteins. The bottom panel illustrates the conformational ensembles that result from the three energy landscapes. Disordered proteins (left) exhibit a large variety of conformations, with the amount of conformational sampling decreasing as the energy landscape shifts towards deeper minima. Image from [63].
Remarkably, the representative structure of the most populated structural cluster
obtained at 300 K overlapped with the SAXS envelope that had been previously
obtained [61], demonstrating the utility of MD in exploring the conformational
landscape of flexible proteins such as tropoelastin. Analysis of the biologically
relevant 310 K structural ensemble revealed that tropoelastin maintains its overall
structure, with the lowest energy structures having a discernible N-terminal
extended coil region and an evident C-terminal foot region (Figure
1.4) [62]. These studies have been pivotal in our understanding of tropoelastin as
they confirm that tropoelastin is a highly flexible protein rather than an intrinsically
disordered molecule.
Figure 1.4: Structural ensemble of tropoelastin at 310 K. The dominant structure of full-length tropoelastin (blue) derived via computational modelling is overlaid onto other structures from its ensemble. Displacements of domains deviating from the dominant structure are marked. Image adapted from [62].
The application of both coarse-grained and atomistic modelling has yielded
further discoveries regarding the intrinsic molecular motions that contribute to
tropoelastin’s functionality [22,61,62]. The most notable example of this is the distinctive
scissors-twist motion at the C-terminus “foot” region of the WT molecule, which
has been implicated in self-assembly and disease states [22, 61]. Introduction of
the short - and usually absent - domain 22 abolishes the scissors-twist motion
and results in a WT+22 molecule with heightened global molecular stiffness [22].
Hydrogels fabricated from the WT+22 mutant displayed markedly different char-
acteristics to the WT hydrogels, indicating that the presence of scissors-twist and
high flexibility are key elements of self-assembly. Likewise, the G685D cutis laxa
mutation results in a contracted foot region that is incapable of the scissors-twist
motion, which is proposed to result in aberrant elastic fibres that contribute to
the rough skin texture phenotype of patients [61].
1.3 Elastogenesis
Elastogenesis is the term that collectively describes the hierarchical process of
elastic fibre formation (Figure 1.5). Elastogenesis comprises four distinct phases:
tropoelastin synthesis, coacervation, cross-linking, and deposition.
Figure 1.5: Overview of elastogenesis. A) Tropoelastin (blue) is secreted as a monomer to the cell surface. B-C) Coacervate spherules consisting of tropoelastin n-mers grow in size whilst tethered to cell surface receptors. D) Spherules are released from cell receptors and deposited onto microfibril scaffolds (orange), and E) are enzymatically cross-linked by LOX/LOXL (cones), eventually giving rise to F) fibres consisting predominantly of elastin. Image from [64].
1.3.1 Tropoelastin synthesis
Elastogenic cells, such as fibroblasts, smooth muscle cells, endothelial cells, airway
epithelial cells, keratinocytes, and chondroblasts, synthesise and secrete
tropoelastin [65–71]. The majority of tropoelastin synthesis occurs during the perinatal
stages of development [72,73]; however, synthesis may be triggered in response to
tissue damage [74] or during diseases such as atherosclerosis [75].
As previously discussed, ELN mRNA is spliced, after which it is translated into
a 72 kDa polypeptide chain that includes domain 1, the 26 amino acid signal
sequence [76]. Cleavage of the signal sequence results in the transportation of the
60 kDa tropoelastin to the Golgi, where it folds into its tertiary structure before
secretion to the cell surface [77].
1.3.2 Coacervation
Coacervation is an endothermic, entropically favourable process through which
tropoelastin monomers self-assemble into higher order n-mer structures (Figure
1.6). Much of our current knowledge has been gleaned from extensive in vitro
studies that have utilised tropoelastin and a variety of its derivatives to explore
the intrinsic factors of this process.
Figure 1.6: Hierarchy of elements that contribute to coacervation. The representative sequences of hydrophobic and cross-linking domains are interspersed throughout the tropoelastin monomer. The properties of the amino acid sequence and the global shape of tropoelastin facilitate initial n-mer aggregation and larger coacervate spherule formation. Eventually, the spherules coalesce into elastin fibres, the mature form of the protein. Image from [78].
The initial stage of in vitro coacervation is characterised by the rapid aggregation
of tropoelastin into 1-2 µm spherules, which eventually grow and stabilise into
spherules 2-6 µm in diameter [79–82]. Tropoelastin spherules assemble at the cell
surface before deposition onto the microfibrillar scaffold in cell culture systems
at physiological temperature [80]. Coacervation is temperature dependent, with
an optimal temperature of 37 °C [24]. The process of tropoelastin aggregation is
initially reversible, as spherules dissipate if the temperature is lowered [79]; however,
maintenance of a physiological temperature results in maturation, which is
indicated by spherule coalescence and the irreversible formation of fibrillar struc-
tures [83–85]. The presence of tropoelastin spherules fusing to fibrils has been
noted in native tissue, demonstrating marked similarities between in vitro and in
vivo coacervation [80,86–88].
Tropoelastin’s hydrophobic domains are primarily responsible for facilitating
coacervation [89–91]. Non-polar residues are major contributors to protein folding, as
their unfavourable interactions with water drive them to bury into the protein
core; however, as tropoelastin comprises numerous hydrophobic domains, many of
these domains have been shown to be at least partially solvent exposed
[78, 92]. Thus, at lower temperatures, the water surrounding these
domains in vitro forms ordered, clathrate-like shells that prevent aggregation until
the appropriate temperature is reached [90, 93–95]. In comparison, higher tem-
peratures allow the breaking of the hydrogen bonds of the ordered water, dis-
sipating the clathrate shells and permitting the association of the hydrophobic
domains [96]. The prevention of early self-aggregation in vivo is thought to be
mediated by chaperone proteins [97,98].
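The thermodynamic argument for temperature-dependent coacervation can be summarised with the Gibbs relation. This is a minimal sketch that assumes the enthalpy and entropy changes of coacervation are approximately independent of temperature over the range of interest, a simplification introduced here for illustration; the symbol T_t is a label assumed here for the temperature above which association becomes favourable.

```latex
\[
\Delta G_{\mathrm{coac}} = \Delta H_{\mathrm{coac}} - T\,\Delta S_{\mathrm{coac}},
\qquad \Delta H_{\mathrm{coac}} > 0, \quad \Delta S_{\mathrm{coac}} > 0
\]
\[
\Delta G_{\mathrm{coac}} < 0
\iff
T > T_t = \frac{\Delta H_{\mathrm{coac}}}{\Delta S_{\mathrm{coac}}}
\]
```

Here the positive entropy term reflects the release of ordered, clathrate-like water as the hydrophobic domains associate, so an endothermic process can still proceed spontaneously once the temperature is high enough.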
The flexibility of the hydrophobic domains has been implicated in self-assembly.
Large-scale computational modelling of 27 ELP chains revealed that the aggre-
gate maintained a hydrated, disordered, liquid-like state due to the formation of
short-lived interchain bonds [32]. It is thought that the disorder of the ELPs is
partially due to the presence of PG repeats, the nature of which has been previously
described in this chapter. This is corroborated by other ELP studies noting
that increasing the spacing between the PG repeats or proline removal results in
heightened β-sheet formation and amyloid-like structures [50, 99]. Further evidence
for the sequence-dependent regulation of tropoelastin assembly arises from
coacervation studies of isolated lengthy central hydrophobic domains (18, 20, 24
and 26) demonstrating that the resultant structures vary between the domains at
the supramolecular level [100,101].
Although tropoelastin’s cross-linking domains are not the main drivers of coacer-
vation, they modulate coacervation in a context-dependent manner. For example,
tropoelastin isoforms containing domain 26A, a cross-linking domain that is nor-
mally spliced out in mature mRNA [102], exhibit diminished coacervation [19].
As the inclusion of cross-linking domains into ELPs decreases aggregation time in
comparison to peptides consisting of only hydrophobic domains, this suggests that
the relatively favourable interactions between the cross-linking domains and aque-
ous solvent are important for regulating coacervation [90]. As previously discussed,
tropoelastin’s cross-linking domains undergo secondary structural changes [58,96];
however, their impact on the conformational sampling of the hydrophobic domains
during coacervation has not been fully assessed. When considering the variety of
cross-linking domains within tropoelastin, it is possible that they also regulate
coacervation in a sequence- and spatially dependent manner. Thus, it is imperative
to transition away from a reductionist approach and examine coacervation
using tropoelastin in its entirety.
1.3.3 Cross-linking
Similarly to other ECM proteins, such as collagen, tropoelastin covalently cross-
links via its lysines. Cross-linking requires the modification of at least one of
the lysine participants by a member of the copper-containing lysyl oxidase (LOX)
or lysyl oxidase-like (LOXL) enzyme families. LOX and LOXL convert the ε-amino
group of lysine to α-aminoadipic acid-δ-semialdehyde (allysine) [103]. The
resultant allysine can then react with the ε-amino group of an unmodified lysine via
a Schiff base reaction to form lysinonorleucine, or undergo an aldol condensation to
give rise to allysine aldol [104,105] (Figure 1.7). Both reactions are spontaneous
and do not require further enzymatic action. The bifunctional cross-links are
capable of undergoing further condensation with other lysines and/or allysines to
form tetrafunctional desmosine or isodesmosine [106].
Figure 1.7: Schematic of cross-linking within elastin. Upon modification of lysine to allysine by LOX, the resultant allysine interacts with either an unmodified lysine or an allysine, forming lysinonorleucine and allysine aldol respectively. These bifunctional cross-links can eventually condense into more complex species such as desmosine and isodesmosine. Image from [107].
Approximately 90% of tropoelastin’s lysines undergo modification and/or partic-
ipate in cross-links, indicating that elastin is extensively cross-linked [34, 108].
Mapping the locations of these modifications and cross-links is crucial to
understanding the molecular structure of elastin; however, the precise locations of the
cross-links have been difficult to ascertain, largely due to tropoelastin’s repetitive
sequence. Attempts to elucidate these sites have resulted in a large number of
ambiguously assigned cross-linking sites [109], although the field has seen some
successes. The first study to unequivocally identify cross-linking participants used
elastin from copper-deficient pigs, which was only partially cross-linked
due to abrogated LOX functionality and, thus, more susceptible to the protein
cleavage required for MS analysis [53]. This study demonstrated the presence of
a desmosine involving two lysines from domains 19 and 25 each, and lysinonor-
leucines between domains 10-19 and 10-25. Further studies detailing in vitro
coacervation have been useful in assessing the solvent exposed lysines available for
cross-linking using yeast-derived LOX and chemical cross-linkers [85,110].
A number of recent studies have shed light onto the nature of cross-linking in
elastin. Almost all lysine residues are partially modified, resulting in residues
that can participate in either of the bifunctional bonds [111]. Furthermore, cross-
linking domains bond in a context dependent manner, whereby KA domains, such
as domain 14, have a propensity to form tetrafunctional cross-links whilst KP
domains tend to form bifunctional cross-links [34,103,111]. These preferences most likely arise
from restrictions imposed by the local secondary structures of these domains; for
example, KP domains are not known to form helices and, therefore, may not align
lysines in such a manner that promotes tetrafunctional cross-link formation. A
further step towards understanding the process of cross-linking is the discovery of
the presence of both intermolecular and intramolecular cross-links within mature
elastin [34]. Although intramolecular cross-links were originally thought of as
structural intermediates required for tetrafunctional cross-links, their existence
in mature elastin suggests that they are of functional importance [34]. Taken
together, these observations suggest that elastin is more heterogeneously
cross-linked than previously assumed [34, 111]; however, as not all cross-links can be
unambiguously assigned due to similarities in primary sequence, this suggestion
requires further investigation [34,103,111].
1.3.4 Head-to-tail model of elastin assembly
The head-to-tail model of tropoelastin assembly combines the global SAXS struc-
ture [59] with the approximate locations of domains 10, 19 and 25, the cross-linked
domains identified in porcine elastin [53]. It was presumed that tropoelastin would
assemble in a head-to-tail manner similar to that of other filament-forming proteins, including
collagen, actin and lamin, which assemble into fibrils that subsequently associate
laterally to form thicker fibres [112–114]. However, considering that tropoelastin
initially forms spherules rather than fibrils during elastogenesis, it was unclear as
to how head-to-tail associations could give rise to the globular structures observed
during early coacervation.
Figure 1.8: Head-to-tail assembly of tropoelastin. A) Approximate location of domains 10, 19 and 25 on the outline of tropoelastin’s SAXS envelope. B) Structure of cross-link reported in [53] with the domain locations indicated. C) Schematic of repeating head-to-tail assembly of tropoelastin monomers based on the aforementioned domains. Obtained from [59].
More recently, it has been proposed that tropoelastin associates via a range of in-
teractions during the initial stages of coacervation. Coarse-grained computational
modelling with 40 tropoelastin monomers demonstrated head-to-head, tail-to-tail,
head-to-tail and lateral interactions over a timeframe of 10 µs [78]. Interestingly,
both fibrils and globular clusters of monomers were observed, suggesting a high
level of conformational sampling during this phase of coacervation. Importantly,
the presence of fibrils indicates that the nanostructures formed during initial as-
sembly contribute to the supramolecular structure of elastin arising from later
stages of elastogenesis. However, a drawback of this model is that the represen-
tative structure of tropoelastin was utilised rather than its entire conformational
ensemble and, as such, may not have captured the full scope of interactions.
1.3.5 Deposition
Microfibrils form the scaffolding onto which tropoelastin aggregates are deposited
once they leave the cell surface. Microfibrils contain a variety of proteins, of
which fibrillin-1 is the most common. Tropoelastin interacts with microfibril
components including fibrillin-1, fibulin-4 and -5, and other associated molecules
such as latent transforming growth factor β [115–118]. In addition to interacting
with tropoelastin, fibulin-4 and -5 are capable of also binding LOX and fibrillin-1
and, thus, have key roles in facilitating elastogenesis [119]. Fibulins are crucial
to elastin fibre assembly and fibre directionality, as fibroblasts
with fibulin-4 and -5 knockdowns generate poorly formed elastin fibres [120]. This
has been further demonstrated in vivo, where fibulin-4 -/- mice display aberrant,
non-fibrous elastin, and a marked lack of desmosine cross-links [117].
1.4 Tropoelastin-cell interactions
Tropoelastin promotes the attachment, spreading and proliferation of multiple
cell types, including fibroblasts [121–123], endothelial cells [124, 125], and mes-
enchymal stem cells [122, 123, 126]. These cellular activities are facilitated by the
binding of tropoelastin to specific receptors on the surface of the cells, and trigger
a wide range of processes that include wound healing, elastogenesis and maintenance
of stemness [126]. Understanding the mechanisms that contribute to these
signalling pathways is of importance for discerning the physiological consequences
of tropoelastin-cell binding. Moreover, these interactions and pathways can be
used to inform future biomaterial and drug design. The cell surface receptors that
tropoelastin has been observed to interact with thus far are elastin-binding protein
(EBP), glycosaminoglycans (GAGs), and integrins.
1.4.1 Elastin binding protein
EBP is a 67 kDa inactive splice variant of β-galactosidase [97,127] that recognises
tropoelastin’s repetitive hydrophobic sequences [128]. EBP is thought to play two
roles in elastogenesis: firstly, operating as part of a protein complex that facilitates
the transportation of tropoelastin from the cell’s interior to exterior [129], and
secondly, acting as a chaperone to prevent premature self-aggregation and proteolysis
[97, 98]. EBP also participates in signalling via its interactions with peptides
derived from elastin degradation [130–132]. The recognition of elastin fragments
by EBP facilitates the activation of focal adhesion kinase [131], phosphorylation
events via the Ras-Raf pathway [130,131], and the differential regulation of matrix
metalloproteinases [132], which are associated with the body’s response to tissue
damage. Notably, full-length tropoelastin has not thus far been observed
to trigger these pathways through binding EBP, strongly suggesting that EBP’s
primary signalling pathway is that of wound recognition.
1.4.2 Glycosaminoglycans
GAGs are negatively charged, linear polysaccharides with a molecular mass of 10-100 kDa
that are categorised as either sulfated (heparan sulfate and chondroitin sulfate) or
non-sulfated (hyaluronic acid) [133]. GAGs are involved in the coagulation pathway
[134, 135] and inflammatory response [135], making them crucial to wound
healing and tissue regeneration. Due to their charge, GAGs predominantly in-
teract with proteins containing amino acid residues that are positively charged
at physiological pH [136]. Thus, similar to EBP, GAGs also potentially act as
chaperones in the context of elastogenesis by preventing premature tropoelastin
aggregation, cross-linking, and the formation of allysines through their interac-
tions with unmodified lysine residues [123, 137, 138]. The tropoelastin domains
identified to interact with GAGs thus far are domains 17-18 [123] and domain
36 [139]. Domains 17 and 36 contain lysines, and, in the case of domain 36, posi-
tively charged arginines. The removal of lysines from a peptide spanning domains
17-18 results in substantially decreased fibroblast adhesion [123], strongly suggesting
the involvement of these residues in tropoelastin-GAG interactions. Further
indication for the involvement of GAGs in elastogenesis arises
from their propensity to interact with fibrillin-1 and -2 [140, 141]. Combined
with their ability to bind tropoelastin, this makes it likely that GAGs are capable of
supporting multiple aspects of elastogenesis.
1.4.3 Integrins
Integrins are a ubiquitous, diverse family of heterodimeric cell receptors. In mammals,
integrins comprise combinations of 18 α and 8 β subunits. Multiple
distinct subunit pairings give rise to dimers with high ligand specificity
and, thus, mediate a number of signalling pathways in response to ligand binding
[142, 143], such as mechanotransduction [144, 145], differentiation [146, 147],
angiogenesis [148], wound repair [149], and tumour cell invasion [150].
The most commonly studied binding sequence is the fibronectin-derived Arg-
Gly-Asp (RGD) sequence [151], a sequence which binds to approximately half
of the known integrins, including the well-characterised integrin αvβ3 [151]. Crys-
tal structures of αvβ3 bound to either RGD or fibronectin fragments show that
ligand binding occurs at the interface between the β-propeller of the α subunit
and the βA domain of the β subunit [152, 153]. The mechanism by which RGD
causes αvβ3 to undergo conformational changes has been explored through crys-
tallography [153] and computational modelling [154], which have revealed that the
reorganisation of a number of α-helices within the βA domain promotes structural
changes that shift the conformation of the receptor from a bent (closed) state to
an open (active) state (Figure 1.9) [154,155]. The propagation of the signal from
the extracellular region and across the cell membrane allows for the recruitment
of other proteins at the integrin’s intracellular tail, such as focal adhesion kinase
and PKB/AKT [126].
Figure 1.9: Major integrin conformational changes associated with outside-in signalling. The integrin subunits are bent within the inactive conformation (left). Upon ligand (red) binding at the headpiece interface between the α (blue) and β (green) subunits, the integrin is able to straighten (right). This conformational change is accompanied by the separation of the bodies of the two subunits, which allows intracellular effector proteins (purple) to bind the integrin tail. Image from: https://pdb101.rcsb.org/motm/134.
Thus far, tropoelastin domains 17 and 36 are known to bind αv integrins, specifically
αvβ3 and αvβ5 [121–123, 126]. This is of significance because tropoelastin
does not contain the canonical RGD integrin binding sequence found in other
ECM proteins, such as fibronectin. Thus, the minimal binding sequence, which is
needed to understand the mechanisms of tropoelastin-integrin interactions, requires
further investigation.
1.4.4 Model of tropoelastin-cell interactions
In the current model, cell binding to tropoelastin is thought to proceed via a sequential
combination of GAGs and integrins. GAGs are of varying lengths that
can extend far beyond integrins within the extracellular space, and as such, are
thought to facilitate initial cell adhesion to tropoelastin [123]. Meanwhile, inte-
grins are required for subsequent cell spreading and eventual signalling [122,123],
through recruitment of intracellular proteins [126]. It has been postulated that
tropoelastin contains further integrin binding sites due to the similarity of lysine-
containing motifs between domain 17 and other cross-linking domains [123]. Their
discovery would be essential to understanding the mechanisms of cell binding to
both tropoelastin and elastin.
1.5 Elastin diseases
As previously discussed, alterations to tropoelastin’s sequence directly impact its
structure and, subsequently, its function. The majority of elastin diseases are
caused by various mutations within the ELN gene that impair the morphology
and functionality of the elastin fibre [156]. Two diseases that arise from the
altered structure of tropoelastin are autosomal dominant cutis laxa (ADCL) and
supravalvular aortic stenosis (SVAS).
ADCL is characterised by loose, inelastic skin, and may present other complications
such as hernia, or cardiovascular symptoms such as a bicuspid aortic valve.
The most common variants of ADCL arise from frameshift mutations toward the
3’ end of the ELN gene [156, 157]. These mutations result in the aberrant translation
of these exons, potentially including the 3’ untranslated region, and form an
unusually elongated protein [157]. The resultant tropoelastin monomers display
markedly altered self-assembly and deposition, whereby larger ADCL coacervates
display impaired binding to fibrillin [157]. This manifests as disorganised or frag-
mented elastin fibres [158] that result in the elevated tissue compliance [159, 160]
seen within the ADCL disease phenotype.
SVAS is a rare congenital cardiovascular disease where the patient is born with a
lesion at the sinotubular junction of the aorta [161]. It is further characterised by
a reduction in elastin content, disorganised elastin fibres, as well as an increase
in smooth muscle cells and collagen that result in the constriction of the aorta
[160,162]. Like ADCL, SVAS can arise from multiple alterations to the ELN gene,
which include missense mutations, premature stop codons, and frameshifts due to
base pair insertion and deletion events [163–166]. The result is a truncated form
of tropoelastin that is likely to display altered self-assembly and deposition into
the ECM [167].
These pathophysiologies highlight the potentially detrimental effects of alterations
to the sequence-structure-function hierarchy of elastin formation. However, sub-
stantial hurdles remain in fully characterising the functionalities of the disease-
associated states of tropoelastin, such as obtaining high quality structural data,
exploring the impact of altered self-assembly, and examining the effect of aberrant
protein sequences on downstream signalling pathways. Resolving these hurdles
will lead to a deeper understanding of the mechanisms that contribute to ADCL and
SVAS.
1.6 Applications of tropoelastin
Tropoelastin’s unique self-assembly properties and potent interactions with cells
poise it for biomaterial fabrication for tissue engineering and wound repair. Synthetic
tropoelastin can be produced at scale with an Escherichia coli bacterial
system utilising human cDNA [168, 169]. This has allowed for the feasible incor-
poration of tropoelastin into a wide variety of biomaterials, on its own or in a
material blend, for numerous applications (Figure 1.10).
1.6.1 Tropoelastin-only materials
By itself, tropoelastin forms biomaterials that are elastic, and thus, are appropriate
for use in dermal and cardiovascular tissue. Tropoelastin-only biomaterials have
the advantage of being simple to fabricate due to the self-assembly properties of
tropoelastin. A prime example of this is HeaTro (“heated tropoelastin”), which is
formed by heating freeze-dried tropoelastin to yield a highly porous scaffold that
softens on implantation [170]. Tropoelastin may also be modified, as in the case of
MeTro (“methacrylated tropoelastin”), where methacrylated lysine residues allow
for rapid light-mediated cross-linking, resulting in an elastic material appropriate
for use as a surgical sealant [171,172].
1.6.2 Blended biomaterials
An advantage of hybrid materials is the optimisation of the ratio of proteins and/or
other materials to suit the requirements of specific tissues. Consequently, this
has allowed the fabrication of a number of novel biomaterials with easily tunable
biological, physical and mechanical properties.
The ability to resist high pressure is important with respect to cardiovascular and
cartilage tissue engineering. The strength of tropoelastin scaffolds can be increased
by coating with polycaprolactone (PCL), where the increase in strength is proportional
to the thickness of the coating [173]. Porous tropoelastin-PCL blends
support chondrocyte adhesion and proliferation, showing promise for cartilage
repair [174].
Figure 1.10: Tropoelastin-based biomaterials. A-B) The variety of scaffolds that can be fabricated from HeaTro [170]. C) Cross-section showing the interface between MeTro sealant (pink) and porcine lung (purple) [172]. Comparison of fibre sizes of electrospun D) pure tropoelastin and E) 80:20 tropoelastin:collagen [175]. E) Optical clarity of tropoelastin-silk corneal replacement films [176]. F-G) Structure of electrospun tropoelastin-silk meshes for pelvic organ prolapse [177].
The incorporation of tropoelastin into Integra Dermal Regeneration Template, a
currently available bioactive scaffold, increases blood vessels after full thickness im-
plantation into a porcine model [178]. The addition of soluble tropoelastin to cell
culture containing Integra scaffolds enhances elastin fibre formation, demonstrat-
ing that tropoelastin does not necessarily have to be blended with a biomaterial
to have an effect [179].
Tropoelastin-silk biomaterials have been extensively explored in numerous appli-
cations. These hybrids promote mesenchymal stem cell proliferation [180], in-
fluence progenitor cell lineage [181], and have application in nerve repair and
guidance [182]. The optical clarity, refractive index, glucose permeability, and
interactions with corneal cells have made tropoelastin-silk films favourable can-
didates for corneal replacement [176]. Woven electrospun tropoelastin-silk blends
have demonstrated utility in alleviating pelvic organ prolapse due to their robust
mechanical properties [177].
1.6.3 Surface coatings
In addition to its direct incorporation into biomaterials, tropoelastin can also be
used as a surface coating to enhance the biocompatibility of existing biomedical
devices or materials. Plasma immersion ion implantation (PIII) allows the cova-
lent attachment of molecules to polymers and metals. The presence of tropoelastin
on PLLA-PLGA scaffolds enhances cell adhesion and proliferation, and promotes
angiogenesis [183], and may even aid in resisting thrombosis when applied to
metals [184], displaying its potential for use in implantable cardiovascular devices.
Tropoelastin has also demonstrated utility in orthopaedic implants, where its func-
tionalisation of polyether ether ketone surfaces heightened the expression of bone
markers of human osteoblast-like cells [185].
1.7 Thesis aims
This thesis leverages the recent full-atomistic model of tropoelastin [61] to explore
the structure and dynamics of the monomer. This thesis comprises three
parts, each of which explores unique facets of tropoelastin modifications and
interactions.
1. Consequences of allysine modifications on a local and global scale
I assess how allysine modifications, which are essential to cross-linking, contribute
to the dynamics and structural changes that occur in tropoelastin in the context
of elastin assembly. I use replica exchange molecular dynamics to generate structural
ensembles of allysine-containing tropoelastin. I conduct principal component
analysis on these ensembles and find that the molecule departs from the canonical
structural ensemble. Furthermore, I show that, while the canonical scissors-twist
is retained, new movements emerge that deviate from those of the wild type pro-
tein, providing evidence for the involvement of a variety of molecular motions in
elastin assembly. Additionally, I highlight secondary structural changes and link
these perturbations to the longevity of specific salt bridges. I propose a model
where allysines in tropoelastin contribute to hierarchical elastin assembly through
global and local perturbations to molecular structure and dynamics.
2. Factors that influence the initial association of tropoelastin molecules
Using the three tropoelastin molecules that predominantly constitute tropoelastin’s
canonical structural ensemble, I model early stage nucleation events in the
context of head-to-tail assembly. I utilise an assortment of dimers that are gener-
ated through docking and driven by experimentally determined sites of interaction.
I dissect the contacts that constitute each dimer interface according to the overall
style of association, to discover the propensity of particular molecules to form
head-to-tail associations based on their global structure and domain placement. I
then conduct elastic net regularised logistic regression to examine the factors that
are important for generating head-to-tail associations. I find that the domains
are predominantly responsible for driving the type of interaction, confirming our
previous results.
3. Fuzzy binding mechanisms of tropoelastin and αvβ3
I construct a molecular model of the interactions between tropoelastin and inte-
grin αvβ3 using molecular dynamics. Using two different candidate conformations
of tropoelastin, I create docked protein-protein structures as input for replica ex-
change molecular dynamics simulations, through which I generate two independent
ensembles of tropoelastin-integrin structures. I show that one ensemble contains
more conformational changes within αvβ3 that are associated with outside-in cell
signalling over the other. Importantly, I find that these conformational changes
occur more frequently when tropoelastin binds the integrin’s α1 helix rather than
the upstream canonical binding site, the β1-α1 loop. By dissecting the frequency
of contact between the two proteins, I demonstrate that a broad variety of tropoelastin
domains interact with αvβ3. In particular, I confirm the binding of two
domains, 17 and 36, previously explored in the context of cell attachment with
αvβ3. Furthermore, I demonstrate that a number of domains not previously as-
sociated with cellular interactions also contact αvβ3, including domain 20. Ad-
ditionally, I use principal component analysis to discover the molecular motions
of tropoelastin that contribute to integrin binding, and find that these motions
differ greatly between the two ensembles. I propose a model of fuzzy binding,
whereby multiple tropoelastin conformations are capable of interacting with
numerous αvβ3 sites, including both the canonical ligand binding site and other
non-canonical regions.
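The principal component analysis referred to in the first and third parts can be sketched as follows. This is an illustrative outline rather than the analysis pipeline used in this thesis: it assumes the structures have already been superimposed onto a common reference, it uses a synthetic toy ensemble in place of simulation output, and the function name `pca_of_ensemble` is hypothetical.

```python
import numpy as np

def pca_of_ensemble(coords):
    """PCA of an ensemble of pre-aligned structures.

    coords: array of shape (n_frames, n_atoms, 3), assumed to be already
    superimposed onto a common reference (no alignment is done here).
    Returns eigenvalues, eigenvectors and per-frame projections, sorted
    by decreasing variance.
    """
    n_frames = coords.shape[0]
    flat = coords.reshape(n_frames, -1)          # (n_frames, 3 * n_atoms)
    centred = flat - flat.mean(axis=0)           # subtract the mean structure
    cov = centred.T @ centred / (n_frames - 1)   # coordinate covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projections = centred @ eigvecs              # frames projected onto each PC
    return eigvals, eigvecs, projections

# Synthetic toy ensemble (hypothetical data): 200 frames of 5 atoms that
# fluctuate mostly along one collective direction, plus small noise.
rng = np.random.default_rng(0)
mean_structure = rng.normal(size=(5, 3))
mode = rng.normal(size=(5, 3))
mode /= np.linalg.norm(mode)
amplitudes = rng.normal(scale=2.0, size=200)
frames = (mean_structure
          + amplitudes[:, None, None] * mode
          + rng.normal(scale=0.1, size=(200, 5, 3)))

eigvals, eigvecs, proj = pca_of_ensemble(frames)
explained = eigvals / eigvals.sum()
print("fraction of variance along PC1:", round(float(explained[0]), 3))
```

In this toy case the first principal component recovers the collective direction that was built into the data. For a real ensemble, the leading components would instead describe the dominant concerted motions of the molecule.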
2.1 Computational multiscale modelling
Multiscale modelling refers to the mathematical and statistical description of sys-
tems, including proteins, across various levels of detail. Broadly speaking, mul-
tiscale modelling can be categorised into bottom-up, where quantum mechanics
and full atomistic modelling are used to infer the properties of large systems, and
top-down, which involves mesoscale and continuum modelling methodologies to
describe smaller scale properties. Quantum mechanics is the most granular level
at which a system can be described as it explicitly details the movement of the
subatomic particles within a system. Due to its extraordinary computational cost,
it cannot be applied to systems on the scale of proteins, in which case full-atomistic
or coarse-grained atomistic simulations become the method of choice. Similarly,
full atomistic simulations are too expensive to be applied to systems where the
bulk properties of a material need to be observed, in which case mesoscale and
continuum modelling methods are more appropriate. The choice of model is
therefore largely dependent on the scale of the system and the level of detail
required for analysis, as increases in system size and detail demand
corresponding increases in computational resources and time.
2.2 Molecular dynamics
As the name suggests, molecular dynamics (MD) is a bottom-up methodology
concerned with describing the motion of molecules. Since its conception in the
1940-50s, MD has emerged as a valuable technique to complement and expand
on experimental data. The advantage of MD over traditional structural biological
methods, such as X-ray crystallography and cryo electron microscopy, is that it
can provide detailed insight into molecular motions and mechanisms that are
otherwise not captured experimentally. A further
benefit of MD is that the system can be set up to suit the needs of the experiment,
allowing the exploration of thermodynamics under conditions that are otherwise
not achievable in the laboratory, but that are still worthwhile investigating.
Multiple software packages are available to conduct MD, including NAMD [186],
GROMACS [187], CHARMM [188] and AMBER [189]. These programs determine
the motion of each atom within the molecular system over a series of timesteps.
As the MD program of choice for this thesis is NAMD, particular methods of
integration and algorithms will be described within its context.
2.2.1 Molecular dynamics workflow
Prior to commencing a simulation, parameters such as atomic masses, atomic
coordinates and interaction potentials are first defined. These parameters are
fixed and do not change over time in classical MD simulations and, as such, the
formation and breaking of bonds are currently out of the scope of MD (Figure
2.1).
At the beginning of a simulation, each atom is assigned a random initial velocity
drawn from the Maxwell-Boltzmann distribution, and the initial coordinates of the
protein are provided by structural files. To begin MD, the intra- and interatomic
forces are calculated based on topology files from empirically derived force fields
(discussed in 2.2.3). The motions of each atom are calculated via Newton's second
law of motion; global parameters for the system (including temperature, pressure
and energy) and atomic details (such as velocities and positions) are output, and
the configuration is saved. The saved configuration serves as the input for the
next iteration of calculations. In this manner, multiple time steps and their
configurations build up a trajectory of molecular motion over time. This is
termed classical MD (cMD).
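The velocity initialisation step can be illustrated with a minimal sketch. It assumes SI units, a hypothetical system of identical carbon-mass particles and a target temperature of 310 K; the function names are illustrative and are not NAMD routines. Each velocity component is drawn from a Gaussian with variance kBT/m, which is the Maxwell-Boltzmann distribution for a single Cartesian component.

```python
import math
import random

K_B = 1.380649e-23            # Boltzmann constant, J/K
AMU = 1.66053906660e-27       # atomic mass unit, kg


def maxwell_boltzmann_velocities(masses_amu, temperature, rng):
    """Draw one 3D velocity (m/s) per atom from the Maxwell-Boltzmann distribution."""
    velocities = []
    for m_amu in masses_amu:
        # Standard deviation of each Cartesian velocity component
        sigma = math.sqrt(K_B * temperature / (m_amu * AMU))
        velocities.append(tuple(rng.gauss(0.0, sigma) for _ in range(3)))
    return velocities


def instantaneous_temperature(masses_amu, velocities):
    """Temperature from equipartition: sum(m * v^2) = 3 * N * k_B * T."""
    twice_ke = sum(m * AMU * (vx * vx + vy * vy + vz * vz)
                   for m, (vx, vy, vz) in zip(masses_amu, velocities))
    return twice_ke / (3 * len(masses_amu) * K_B)


rng = random.Random(0)                      # fixed seed so the sketch is reproducible
masses = [12.011] * 20000                   # hypothetical system of carbon-mass particles
vels = maxwell_boltzmann_velocities(masses, 310.0, rng)
print(round(instantaneous_temperature(masses, vels), 1))
```

The temperature recovered from the sampled velocities fluctuates around the target value, and the fluctuation shrinks as the number of atoms grows.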
Figure 2.1: Schematic of molecular dynamics simulation workflow. Items in red indicate input for molecular dynamics software, and items in blue indicate the steps carried out by the software itself.
2.2.2 Modelling atomic movement
As mentioned earlier, the most accurate approach to describing molecular motion
is through quantum mechanics (QM). QM methods are based on the
time-independent Schrödinger equation, which is powerful because it accurately
describes the state of both nuclei and electrons [190]. However, the major
disadvantage posed by QM is that the sheer volume of computational power required
to solve QM equations for all subatomic particles within a system renders it
unsuitable for models that contain anything more than tens of atoms, such as
proteins [191]. Thus, it is necessary to utilise other methodologies to derive
protein motion on a meaningful time scale.
As the Schrödinger equation cannot simultaneously model particle velocity and
position, full-atomistic protein modelling is carried out using the key assumption
of the Born-Oppenheimer approximation. The Born-Oppenheimer approximation
considers the contribution of electrons to be almost negligible due to the large
difference in mass between nuclei and electrons, and solves for the nuclear
and electronic energy components separately [192]. This allows the application of
Newton's second law of motion, which relates the mass m and position x of a
single atom i as
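$$m_i \ddot{\vec{x}}_i = -\frac{\partial}{\partial \vec{x}_i} E_{\text{total}}(\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n), \quad i = 1, \ldots, n \tag{2.1}$$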
where the potential energy Etotal factors in the positions of all n atoms. NAMD
employs the velocity Verlet method to numerically implement the above formula
[186,193].
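A minimal sketch of the velocity Verlet scheme is given below for a single particle in one dimension. The harmonic test potential, unit mass, force constant and timestep are assumptions chosen for illustration, and the function is not NAMD's implementation. The force is the negative gradient of the potential energy, as in eqn. 2.1.

```python
def velocity_verlet(x, v, mass, force, dt, n_steps):
    """Integrate m * d2x/dt2 = F(x) with the velocity Verlet scheme (1D for brevity)."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass      # half-step velocity update
        x = x + dt * v_half                   # full-step position update
        f = force(x)                          # force at the new position
        v = v_half + 0.5 * dt * f / mass      # complete the velocity update
        trajectory.append((x, v))
    return trajectory


# Hypothetical test system: harmonic oscillator, E = 0.5 * k * x^2, so F = -dE/dx = -k * x
k, mass = 1.0, 1.0
traj = velocity_verlet(x=1.0, v=0.0, mass=mass, force=lambda x: -k * x,
                       dt=0.01, n_steps=10000)


def total_energy(x, v):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


# Largest deviation of the total energy from its initial value along the trajectory
drift = max(abs(total_energy(x, v) - total_energy(*traj[0])) for x, v in traj)
print(drift)
```

The total energy oscillates within a narrow band rather than drifting, which is the property that makes this family of integrators attractive for long simulations.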
The major advantage of utilising classical Newtonian mechanics is the improve-
ment of the time scales on which protein models can be investigated due to the
decrease in computational power required, thus increasing the depth of conforma-
tional sampling. As an example, cMD can achieve simulations of up to microsec-
onds for systems containing over hundreds of thousands of atoms, whereas QM
achieves a picosecond simulation time for tens of atoms [191]. The trade-off that
stems from using classical mechanics rather than quantum methods is a loss in
accuracy; however, full atomistic classical models have been optimised such that
they yield sufficiently accurate results to mirror experimental work [186].
As MD programs iteratively calculate the positions and velocities of atoms within
a system over a series of timesteps, the timesteps need to be of a sufficient size
such that the fastest motions are treated appropriately without destabilising the
simulation [186]. In the case of full-atomistic MD, the fastest modes are the
vibrations of bonds involving hydrogen. The recommended timestep for
full-atomistic MD is up to 1 fs when these bonds are unconstrained, as this ensures
that forces and velocities are calculated at a frequency such that the simulation
will not destabilise. In this thesis, bonds involving hydrogen are constrained
using the SHAKE algorithm [194] where the timestep is greater than 1 fs.
2.2.3 Force fields
Whilst MD programs carry out the overall simulation of biomolecules, they are not
packaged with a description of the forces that govern the molecules themselves.
Force fields are a set of empirical potentials derived from a combination of exper-
imental work and density functional theory, neatly tying quantum mechanics to
full atomistic modelling [195]. Force fields provide MD programs with the physical
and chemical properties of a molecule, including bond lengths, angles, torsions, as
well as non-bonded interactions. Force fields exist for a number of biomolecular
systems, including proteins, carbohydrates and lipids. The most commonly used
force fields for simulating proteins include CHARMM [196], GROMOS [197],
OPLS [198] and AMBER [199]. This thesis primarily utilises CHARMM for con-
sistency due to its prior application in tropoelastin-based studies [61,62].
The total energy (Etotal) of a molecular system can be captured by the following
expression
$$E_{\text{total}} = E_{\text{bonded}} + E_{\text{non-bonded}} \tag{2.2}$$
where Ebonded describes all bonded interactions and Enon−bonded describes the non-
bonded interactions between atoms. In more detail, eqn. 2.2 can be expanded to
bonded and non-bonded components [196]. The bonded equation
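$$E_{\text{bonded}} = \sum_{\text{bonds}} k_b (b - b_0)^2 + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2 + \sum_{\text{torsions}} k_\phi [\cos(n\phi + \delta) + 1] + \sum_{\text{impropers}} k_\psi (\psi - \psi_0)^2 \tag{2.3}$$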
considers the interactions depicted in Figure 2.2. The bond stretch term applies
the bond force constant kb to the square of b − b0, the deviation of the bond
length from its equilibrium value [196]. Similarly, the angle term is given by the
product of the angle force constant, kθ, and the square of the angle's deviation
from its equilibrium value, θ − θ0. The dihedral term is given by the dihedral
force constant kφ, the multiplicity of the function n, the dihedral angle φ, and
the phase shift δ. The improper term is the product of the improper force constant
kψ and the square of the out-of-plane angle's deviation, ψ − ψ0.
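The four bonded contributions of eqn. 2.3 can be evaluated with a short sketch. All numerical parameters below are hypothetical and are not taken from any CHARMM parameter file; angles are in radians and the function names are illustrative only.

```python
import math


def harmonic(k, value, equilibrium):
    """Harmonic term k * (value - equilibrium)^2, used for bonds, angles and impropers."""
    return k * (value - equilibrium) ** 2


def dihedral(k_phi, n, phi, delta):
    """Periodic torsion term k_phi * [cos(n * phi + delta) + 1], angles in radians."""
    return k_phi * (math.cos(n * phi + delta) + 1.0)


def bonded_energy(bonds, angles, torsions, impropers):
    """Sum the four bonded contributions of eqn. 2.3 over lists of parameter tuples."""
    return (sum(harmonic(k, b, b0) for k, b, b0 in bonds)
            + sum(harmonic(k, t, t0) for k, t, t0 in angles)
            + sum(dihedral(k, n, phi, d) for k, n, phi, d in torsions)
            + sum(harmonic(k, p, p0) for k, p, p0 in impropers))


# Hypothetical parameters for one term of each type
energy = bonded_energy(
    bonds=[(300.0, 1.55, 1.53)],                                # k_b, b, b0
    angles=[(50.0, math.radians(112.0), math.radians(110.0))],  # k_theta, theta, theta0
    torsions=[(0.2, 3, math.radians(60.0), 0.0)],               # k_phi, n, phi, delta
    impropers=[(20.0, 0.0, 0.0)],                               # k_psi, psi, psi0
)
print(round(energy, 4))
```

In this example the torsion sits at a minimum of its periodic term and the improper is at its equilibrium value, so only the stretched bond and the opened angle contribute.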
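The second part of eqn. 2.2 describes the dominant non-bonded forces that govern the
behaviour of two non-bonded atoms, i and j, where i ≠ j, as
$$E_{\text{non-bonded}} = \sum_{\substack{\text{LJ} \\ i \neq j}} \epsilon_{ij} \left[ \left( \frac{R^{\min}_{ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{R^{\min}_{ij}}{r_{ij}} \right)^{6} \right] + \sum_{\text{coulomb}} \frac{q_i q_j}{\epsilon_i r_{ij}} \tag{2.4}$$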
which consists of the Lennard-Jones (LJ) approximation of van der Waals
interactions, and Coulomb's law, which describes attraction and repulsion relative
to charge and interatomic distance. $R^{\min}_{ij}$ is the distance at which the LJ
potential reaches its minimum, where its value is $-\epsilon_{ij}$ [196].
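The behaviour of the two non-bonded terms of eqn. 2.4 can be checked numerically with a minimal sketch. With this functional form the LJ term evaluates to −ε at r = Rmin and crosses zero at r = Rmin / 2^(1/6). The well depth, Rmin, charges and the unit-conversion constant below are assumptions for illustration (energies in kcal/mol, distances in Å, charges in units of e).

```python
COULOMB_CONSTANT = 332.0636  # kcal*A/(mol*e^2); assumed value for these units


def lennard_jones(r, epsilon, r_min):
    """LJ term of eqn. 2.4: epsilon * [(r_min/r)^12 - 2 * (r_min/r)^6]."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


def coulomb(r, q_i, q_j, dielectric=1.0):
    """Coulomb term of eqn. 2.4 for charges in units of e and r in Angstrom."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)


epsilon, r_min = 0.15, 3.8             # hypothetical well depth and R_min
at_minimum = lennard_jones(r_min, epsilon, r_min)             # bottom of the well
at_zero = lennard_jones(r_min / 2 ** (1 / 6), epsilon, r_min)  # zero crossing
print(at_minimum, at_zero, coulomb(3.0, 0.4, -0.4))
```

Opposite partial charges give a negative (attractive) Coulomb energy, whereas like charges give a positive (repulsive) one.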
Figure 2.2: Schematic of intra- and intermolecular interactions described by the CHARMM force field. The intramolecular interactions include bonds, angles, dihedral torsions and improper dihedral torsions between the given atoms. Non-bonded interactions include van der Waals and Coulomb forces.
2.2.4 Solvent models
The environments in which the majority of proteins carry out their functions are
solvated; however, the degree of solvation can vary. For example, ion channel proteins are partially
embedded within the hydrophobic lipid membranes of cells and only certain do-
mains are solvent exposed [200]. Meanwhile, other proteins exist primarily within
the cytoplasm or organelles, carrying out their roles in aqueous environments [201].
The choice of solvent in MD is an important consideration, as an inappropriate
solvent may induce unintended structural and functional changes within the molecule,
such that the simulation does not appropriately describe its motions [202].
The two main categories of solvent models used to simulate the physiological en-
vironment of biomolecules are implicit and explicit solvents. Implicit solvents
treat the solvent as a continuum or bulk material rather than modelling individual
solvent molecules. The Generalized Born (GB) implicit solvent model is an
approximation to the linearised form of the Poisson-Boltzmann equation, which
describes the electrostatic potential of the solvent and its effect on the solute
molecules [203]. As the Poisson-Boltzmann equation is computationally expensive to
solve, GB solvent is often used as a suitable alternative. The GB equation can be
expressed as
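$$\Delta G_{\text{elec}} = -\frac{1}{2} \left( 1 - \frac{1}{\epsilon_w} \right) \sum_{i,j=1}^{N} \frac{\epsilon_{ij} q_i q_j}{\sqrt{d_{ij}^2 + R_i R_j \exp\left( \frac{-d_{ij}^2}{4 R_i R_j} \right)}} \tag{2.5}$$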
where the electrostatic contribution to the solvation free energy, ∆Gelec, relies on
the dielectric constant of water, εw, the partial charges of atoms i and j, qi and qj,
the distance between the atoms, dij, and the Born radii of the atoms, Ri and Rj
[204].
GB models solve for the electrostatic forces based on the properties of the solute
where the solvent has a constant dielectric [205]. A solute atom is modelled as
a sphere with different internal and external dielectrics. The radii of these spheres
are termed Born radii, and represent the extent of exposure and interaction
between the solute and solvent. A large Born radius indicates little interaction,
whilst a small radius indicates high exposure to solvent. The significance of the
Born radius is that smaller radii undergo heavier short-range electrostatic screen-
ing, or in other words, more electrostatic dampening [204]. The calculations for
GB solvent are implemented in two steps: 1) calculating all Born radii of the
solute and then 2) calculating the electrostatics between solute and solvent.
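The second of these two steps can be sketched as a pairwise sum once the Born radii are known. The sketch below assumes the Born radii are supplied as input (step 1 is not implemented), uses only the pairwise charge term and the denominator of eqn. 2.5 without any additional pair-specific prefactor, and treats the unit-conversion constant and the water dielectric of 78.5 as assumptions. It is not the NAMD implementation.

```python
import math

COULOMB_CONSTANT = 332.0636  # kcal*A/(mol*e^2); assumed unit-conversion constant


def gb_energy(coords, charges, born_radii, eps_water=78.5):
    """Generalized Born electrostatic solvation energy as a pairwise double sum."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(n):                       # includes the i == j self terms
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            rr = born_radii[i] * born_radii[j]
            # Effective distance that interpolates between R (d = 0) and d (d >> R)
            f_gb = math.sqrt(d2 + rr * math.exp(-d2 / (4.0 * rr)))
            total += charges[i] * charges[j] / f_gb
    return -0.5 * (1.0 - 1.0 / eps_water) * COULOMB_CONSTANT * total


# Hypothetical single ion: the sum reduces to the Born self-energy, proportional to q^2 / R
single_ion = gb_energy([(0.0, 0.0, 0.0)], [1.0], [2.0])
# Hypothetical ion pair with opposite charges, 3 Angstrom apart
ion_pair = gb_energy([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [1.0, -1.0], [2.0, 2.0])
print(single_ion, ion_pair)
```

For an isolated charge the energy scales inversely with the Born radius, so doubling the radius halves the magnitude of the self term.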
It should be noted that results arising from implicit solvent simulations should be
treated with care as the full effects of water, such as viscosity and certain short-
range effects, may not be accurately captured. In some cases it has been noted that
implicit solvent can yield markedly different structural and thermodynamic effects
[206–208], however, these cases also indicate the need to assess the protein-solvent
models on a case-by-case basis. Since the majority of biomolecules are in contact
with aqueous solvent, it is imperative that at least a part of MD simulations are
conducted in explicit solvent to improve structural accuracy.
In contrast to implicit solvent, explicit solvent accounts for each solvent molecule
within a system, leading to less computationally efficient but more accurate simu-
lations. Rigid water models such as TIP3P [209,210] and TIP4P [211] (where the
number within the model name indicates the number of intermolecular interac-
tion points it contains) are commonly used as they are the least computationally
expensive out of the explicit solvent models. Furthermore, their use has resulted
in simulations that reproduce their biological counterparts with higher accuracy
relative to implicit solvent [61]. Thus, this thesis makes use of the initial speed
of GB solvent for simulating large systems to near equilibrium and uses explicit
solvent for structural refinement.
The introduction of explicit solvent in the place of a continuum requires periodic
boundary conditions to model an effectively infinite system and avoid hard-edge
effects. Hard-edge effects describe events where particles bounce off the inner
surface of the box containing the protein-water system, and thus, affect the
trajectories of other
atoms within the system [212]. By implementing periodic boundary conditions,
the original solvent box is considered to be a unit cell within a system consisting
of repeats of identical cells. Therefore, if a particle, solvent or otherwise, exits
the box on one side it will re-enter on the opposite side, effectively preserving the
mass and velocity of atoms that cross these boundaries (Figure 2.3). A compli-
cation that may arise is that if the periodic box is too small then the molecule of
interest may come into contact with itself in a manner that would not normally
occur. Thus, the boundaries of the box need to be larger than one might initially
suspect, leading to increased computational requirements to cope with the extra
number of water molecules in the system.
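Both ideas, re-entry on the opposite face and interaction with the nearest periodic image, can be written in a few lines. The sketch assumes an orthorhombic box with its origin at one corner; the box dimensions and coordinates are hypothetical and the function names are illustrative.

```python
def wrap(position, box):
    """Map a coordinate back into the primary cell [0, L) along each axis."""
    return tuple(x % length for x, length in zip(position, box))


def minimum_image(r_i, r_j, box):
    """Displacement from i to the nearest periodic image of j (orthorhombic box)."""
    return tuple((b - a) - length * round((b - a) / length)
                 for a, b, length in zip(r_i, r_j, box))


box = (50.0, 50.0, 50.0)                       # hypothetical box edge lengths in Angstrom
wrapped = wrap((51.0, -2.0, 25.0), box)        # a particle that left through two faces
nearest = minimum_image((1.0, 1.0, 1.0), (49.0, 1.0, 1.0), box)
print(wrapped, nearest)
```

Two atoms that sit near opposite faces of the box are therefore treated as close neighbours, which is why a box that is too small allows a molecule to interact with its own image.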
Figure 2.3: Representation of boundaries for simulating molecular systems. The simulation within the central cell is modelled such that if a particle exits the boundary (blue arrow out) it reappears on the other side (blue arrow in). Image from: http://isaacs.sourceforge.net/phys/pbc.html.
2.3 Replica exchange molecular dynamics
A potential pitfall of cMD is a lack of adequate conformational sampling. Molecular
dynamics simulations are theoretically ergodic; however, ergodicity is difficult
to realise in practice unless a simulation is run for an incredibly long
period of time. When resources and time are limited, a common occurrence is
that molecules become stuck in local minima due to their complex conformational
landscapes [213]. Thus, a simulation might sample many microstates within a
particular conformational basin, but perhaps fail to sample other biologically
relevant conformations. Crossing the energy barriers required to escape
conformational basins using cMD requires substantial increases in time and resources,
especially for large proteins, sometimes rendering it infeasible to efficiently explore
the conformational landscape of particular molecules in this manner.
Figure 2.4: Schematic of replica exchange molecular dynamics. Four discrete replicas at different temperatures undergo m steps of MD before an exchange is attempted between neighbouring replicas. After the success or rejection of an exchange, the process is repeated. Image by Christopher Rowley, 2016.
Replica exchange molecular dynamics (REMD) elegantly overcomes this problem.
During REMD, a number of replicas of the same protein are exponentially dis-
tributed over a range of temperatures and are run simultaneously (Figure 2.4).
More temperature bins are clustered around the lower end of the temperature
range to ensure adequate sampling at biologically relevant temperatures, however,
temperatures beyond the biological are included to facilitate sampling of rare con-
formations. In order to deeply sample the possible conformations of a protein,
exchanges between the replicas at neighbouring temperatures occur [214]. These
exchanges are evaluated by the Metropolis-Hastings algorithm to generate a
statistical ensemble of molecular conformations within a particular temperature or,
in other words, the probability of sampling that conformation at the specified
temperature. These ensemble methodologies are appropriate for considering the
structures of flexible proteins such as tropoelastin due to their ability to sample
a large number of conformations [61, 62]. In the case of REMD, the Metropolis-
Hastings algorithm evaluates the temperatures of neighbouring replicas at specified
time steps [215]. If the overlap between temperatures is sufficient then structures
are exchanged with each other and the simulations continue running at their new
temperatures. The probability P of an exchange transitioning from replica x to
x′ can be described by
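$$P(x \to x') = \begin{cases} 1, & \text{if } \Delta \leq 0 \\ \exp(-\Delta), & \text{if } \Delta > 0 \end{cases} \tag{2.6}$$
where
$$\Delta = [E_x - E_{x'}](\beta_i - \beta_j) \tag{2.7}$$
depends on the energies E of the two replicas, x and x′, and β is the reciprocal of
the thermodynamic temperature of the system, kBT, and can be written accordingly as
$$\beta = \frac{1}{k_B T} \tag{2.8}$$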
Generally, an exchange attempt frequency (EAF) on the order of 1 - 5 ps, with an
acceptance rate of 20 - 30 % is deemed optimal for REMD, as enough exchanges
have occurred for a number of structures to have sampled different temperatures
[216]. Therefore, it is imperative that the simulation is set up with enough replicas
for sufficient overlap to facilitate appropriate sampling.
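The temperature ladder and the exchange test can be sketched as follows. The sketch assumes a geometric (exponential) spacing of temperatures and the commonly used convention Δ = (βi − βj)(Ej − Ei), where Ei is the potential energy of the configuration currently held at temperature Ti; how this convention maps onto the indices of eqn. 2.7 is an assumption. The temperature range, replica count and energies are hypothetical.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def temperature_ladder(t_min, t_max, n_replicas):
    """Exponentially (geometrically) spaced temperatures between t_min and t_max."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


def exchange_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance probability for swapping the replicas at T_i and T_j.

    e_i and e_j are the potential energies of the configurations currently
    held at t_i and t_j (sign convention assumed, see the text above).
    """
    beta_i, beta_j = 1.0 / (K_B * t_i), 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_j - e_i)
    return 1.0 if delta <= 0 else math.exp(-delta)


def attempt_exchange(e_i, t_i, e_j, t_j, rng):
    """Return True if the swap between neighbouring replicas is accepted."""
    return rng.random() < exchange_probability(e_i, t_i, e_j, t_j)


ladder = temperature_ladder(280.0, 400.0, 8)     # hypothetical range and replica count
print([round(t, 1) for t in ladder])
print(exchange_probability(-1000.0, ladder[0], -1002.0, ladder[1]))
```

With geometric spacing the gaps between neighbouring temperatures are smallest at the low end of the range, and a swap that moves the lower-energy configuration to the colder replica is always accepted.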
Although powerful, REMD is computationally expensive as a large number of
replicas of the same molecule are simulated in parallel. In particular, the use of
explicit solvent greatly increases the number of replicas required for efficient ex-
changes between replicas. Therefore, extended implicit solvent REMD followed
by equilibration in explicit solvent using cMD, has been used to great effect when
modelling tropoelastin [61, 62], and is applied to multiple chapters of this the-
sis.
2.4 Normal mode analysis
A central dogma of biochemistry is that the structure of a protein dictates its func-
tion [217]. Normal mode analysis (NMA) aims to capture the collective motions of
molecules by using the global shape of the molecule, rather than the microstates
(e.g. local energy minima) explored by MD (Figure 2.5).
Figure 2.5: Normal mode analysis of p38 MAP kinase. The top schematic is an anisotropic network model of p38, showing the Cα network and regions of large (red) and small (red) relative displacement. The bottom panel depicts the directionality of the modes that contribute most to the collective motions of p38. Adapted from [217].
2.4.1 Elastic and anisotropic network models
Elastic network models (ENMs) can be coupled with NMA for computational
efficiency. A subset of ENMs are anisotropic network models (ANMs), which
model proteins as a highly interconnected network of Cα atoms or “nodes” of equal
masses coupled to springs [218]. A key assumption of ANMs is that the protein
oscillates around its equilibrium point - the lowest energy state - as increases in
energy will eventually drive it back to this conformation.
The harmonic potential of the spring between two nodes, i and j, is given by
$$V = \frac{\gamma}{2} (r_{ij} - r^0_{ij})^2 \tag{2.9}$$
where γ is the spring constant, and $r_{ij}$ and $r^0_{ij}$ represent the actual and
equilibrium distances respectively. The second order partial derivatives of 2.9 are
$$H_{ij} = \frac{\partial^2 V_{ij}}{\partial q_i \partial q_j} \tag{2.10}$$
where q represents the x, y and/or z coordinates of nodes i and j. The elements
Hij are the 3 × 3 submatrices of the 3N × 3N Hessian matrix, H
$$\mathbf{H} = \begin{pmatrix} H_{ii} & H_{ij} \\ H_{ji} & H_{jj} \end{pmatrix} \tag{2.11}$$
Expansion of the Hessian using the potential from 2.9 and accounting for the
distance between the nodes i and j yields 3N − 6 non-zero eigenvalues and 6 zero
eigenvalues. Eigenvalues of zero correspond to the rigid body motions of the
molecule, such as rotations and translations, which have no effect on the potential
energy of the molecule, whereas positive and negative eigenvalues indicate the
local energy minima and maxima respectively [218].
The kinetic energy of the system is not accounted for in the Hessian. It is
introduced through the equation of motion
$$\mathbf{M} \frac{d^2 \Delta\mathbf{q}}{dt^2} + \mathbf{H} \Delta\mathbf{q} = 0 \tag{2.12}$$
where M is the matrix containing the masses of all Cα, H is the Hessian, and ∆q
is the displacement from the equilibrium configuration. Solving 2.12 yields a
3N-dimensional vector, uk, containing the vector ak, which consists of the amplitude
and phase factor, and ωk, the frequency of the mode of motion:
$$\mathbf{u}_k(t) = \mathbf{a}_k \exp(-i \omega_k t) \tag{2.13}$$
which when substituted into equation 2.12 becomes the generalised eigenvalue
equation
$$\mathbf{H} \mathbf{u}_k = \omega_k^2 \mathbf{M} \mathbf{u}_k \tag{2.14}$$
where H is now weighted by M, such that its eigenvectors can be solved to give
the normal modes of the system. The energy of each mode, k, is proportional to
its eigenvalue (i.e. squared frequency), $\lambda_k = \omega_k^2$. The implication of
this is that the slowest modes represent larger displacements, and hence, the
motions that are the most likely to dominate global motions of a molecule [218].
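The construction of the Hessian can be illustrated with a minimal sketch that assembles the 3 × 3 blocks for every pair of nodes within a cutoff. The 15 Å cutoff, unit spring constant and the four-node coordinates are assumptions for illustration. The sketch stops at the Hessian; the modes themselves would be obtained by passing it to a symmetric eigensolver from a linear algebra library.

```python
def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N ANM Hessian from node coordinates (list of (x, y, z))."""
    n = len(coords)
    hessian = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][a] - coords[i][a] for a in range(3)]
            dist2 = sum(c * c for c in d)
            if dist2 > cutoff * cutoff:
                continue                          # nodes beyond the cutoff share no spring
            for a in range(3):
                for b in range(3):
                    block = -gamma * d[a] * d[b] / dist2     # off-diagonal 3 x 3 block
                    hessian[3 * i + a][3 * j + b] += block
                    hessian[3 * j + a][3 * i + b] += block
                    hessian[3 * i + a][3 * i + b] -= block   # diagonal blocks are minus
                    hessian[3 * j + a][3 * j + b] -= block   # the sum of off-diagonals
    return hessian


def apply(hessian, vector):
    """Matrix-vector product H * v."""
    return [sum(h * v for h, v in zip(row, vector)) for row in hessian]


# Hypothetical four-node "protein"; coordinates in Angstrom
nodes = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]
H = anm_hessian(nodes)
translation_x = [1.0, 0.0, 0.0] * len(nodes)     # rigid shift of every node along x
print(max(abs(v) for v in apply(H, translation_x)))
```

Applying the Hessian to a rigid translation returns a vector of zeros to within rounding error, which is the numerical counterpart of the zero eigenvalues associated with rigid body motions.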
As previously mentioned, an assumption of ANMs is that proteins oscillate around
their equilibrium points. Therefore, a requirement of ANM analysis is that an
equilibrium structure is used as input, rather than a structure from a local
minimum. This thesis utilises a structure of tropoelastin derived from multiple
rounds of REMD, as described in [61, 62]. Here, this structure is assumed to be
tropoelastin's equilibrium conformation, as it is the lowest energy structure of
the most populated cluster from the structural ensemble obtained at 310 K.
2.5 Molecular docking
The majority of proteins function in concert with other molecules to elicit bi-
ological effects. A substantial portion of modern biochemistry has focused on
elucidating the binding mechanisms between proteins and their partners to better
understand key molecules in biological processes. Traditionally, X-ray crystallog-
raphy, cryo electron microscopy and nuclear magnetic resonance have been used
to assess the conformation and binding partners of proteins [219]. Although these
techniques have been successful in many cases, they are not without their draw-
backs, as they are lengthy and often limited in the size of the proteins they can
analyse. Furthermore, all three techniques have had only mild success when exam-
ining highly dynamic intrinsically disordered proteins. Therefore, using computa-
tional methodologies to complement these technologies has increased in popularity
since the conception of MD.
The determination of a large number of structures through the aforementioned
techniques has allowed for the creation of molecular docking software. The ratio-
nale behind molecular docking is that once a protein structure has been acquired,
it can be fit against its binding partner based on a combination of geometry and
favourable interaction energetics. The simplest manner of docking one protein to
another is by restraining one protein in space and rotating its binding partner to
calculate the best fit in terms of energetics such as electrostatic and short-range
non-bonded interactions (Figure 2.6), however, this method does not generate a
wide variety of unique structures [220,221].
Figure 2.6: Schematic of protein-protein docking. Two proteins are docked to each other via rotation to obtain the best combination of energies, including the van der Waals and electrostatic energies. Image from BioExcel.
Two major limitations for current docking methodologies exist. As previously
hinted, many docking programs regard molecules as rigid entities when, in reality,
their structure varies on local and global levels. As fitting together two rigid pro-
teins would be highly forced in terms of minimising steric and electrostatic clashes,
rigid-body docking may be inconclusive or yield biologically irrelevant results.
The second limitation of docking programs is that they rarely factor in that the
sites of interaction between molecules may have arisen from ambiguous data, such
as mutagenesis, bioinformatic predictions, and NMR titrations, rather than from
high resolution structural analysis. Under normal circumstances, driving docking
purely with ambiguous data skews the software towards false positive binding and
can result in conformations that are biologically irrelevant similarly to rigid-body
docking.
Taking these limitations into consideration, this thesis utilises HADDOCK 2.2
(High Ambiguity Driven protein-protein DOCKing) [222]. The advantages of
HADDOCK are that it allows semi-flexible docking, incorporating small movements
in both the protein backbone and side chains to provide a better fit between
molecules. Furthermore, it randomises any user-defined ambiguous restraints to
decrease bias towards a particular set of restraints that might be ambiguous. The
docking rounds of HADDOCK are:
1) Thousands of models are generated by rigid-body docking within a relatively
small time frame using rotational and translational moves. A user-defined number
of the best models are taken to the next stage. The scoring function to
determine the best conformations is
$$\text{Score} = 0.01 E_{\text{air}} + 0.01 E_{\text{vdw}} + 1.0 E_{\text{elec}} + 1.0 E_{\text{dsolv}} - 0.01 \text{BSA} \tag{2.15}$$
where Eair is the energy of the ambiguous interaction restraints (such as those
derived via NMR), Evdw is the van der Waals energy, Eelec is the electrostatic
energy, and Edsolv is the desolvation energy.
2) The next stage involves semi-flexible refinement of the torsion angles of the
interacting regions. This can include the side-chains or the backbone, or
both. Torsion angles are sampled to allow for small (2 Å) conformational changes.
All models from this stage are utilised in the next stage. The scoring function is
$$\text{Score} = 0.1 E_{\text{air}} + 1.0 E_{\text{vdw}} + 1.0 E_{\text{elec}} + 1.0 E_{\text{dsolv}} - 0.01 \text{BSA} \tag{2.16}$$
where the energy terms are as described above, and BSA is the buried surface area.
3) Final refinement occurs in explicit solvent to establish electrostatics and
determine residue-residue contacts. Explicit solvent can be either the TIP3P water
model or DMSO to mimic a membrane. The scoring function is
$$\text{Score} = 0.1 E_{\text{air}} + 1.0 E_{\text{vdw}} + 0.2 E_{\text{elec}} + 1.0 E_{\text{dsolv}} \tag{2.17}$$
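The three stage-specific scoring functions differ only in their weights, which can be made explicit in a short sketch. The weights are transcribed from eqns. 2.15 to 2.17; the stage labels, function names and the energy values of the example model are hypothetical, and the code is not part of HADDOCK itself.

```python
# Weights transcribed from eqns. 2.15-2.17; stage names are labels chosen for this sketch
STAGE_WEIGHTS = {
    "rigid_body":    {"air": 0.01, "vdw": 0.01, "elec": 1.0, "dsolv": 1.0, "bsa": -0.01},
    "semi_flexible": {"air": 0.1,  "vdw": 1.0,  "elec": 1.0, "dsolv": 1.0, "bsa": -0.01},
    "water":         {"air": 0.1,  "vdw": 1.0,  "elec": 0.2, "dsolv": 1.0, "bsa": 0.0},
}


def stage_score(stage, terms):
    """Weighted sum of the energy terms (and buried surface area) for one docking stage."""
    weights = STAGE_WEIGHTS[stage]
    return sum(weights[name] * terms[name] for name in weights)


def best_models(models, stage, n_keep):
    """Rank models by score (lower is better) and keep the n_keep best."""
    return sorted(models, key=lambda m: stage_score(stage, m))[:n_keep]


# Hypothetical energy terms for a single model
model = {"air": 120.0, "vdw": -40.0, "elec": -300.0, "dsolv": 15.0, "bsa": 1500.0}
for stage in STAGE_WEIGHTS:
    print(stage, round(stage_score(stage, model), 2))
```

The same model is ranked very differently across stages: electrostatics dominate the rigid-body score, whereas van der Waals contacts carry far more weight once flexibility is introduced.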
2.6 Machine learning
Machine learning refers to the use of algorithms to predict events or recognise
patterns within data. It has been used in a wide range of scientific (and non-
scientific) applications, such as predicting protein disorder [223], recognition of
nuclear sequence elements [224], and diagnostic medical imaging [225]. Machine
learning can be broadly categorised into supervised and unsupervised learning.
The goal of supervised learning is to either conduct regression or classification
on data where the data is labelled with the outcome of interest. Unsupervised
learning is primarily used to detect patterns where the outcome may be undefined
and is achieved through clustering or dimensionality reduction.
When conducting supervised machine learning on a data set, the data is split into
train (“known cohort”) and test (“unknown cohort”) sets. The model is built up
on the training cohort and is subsequently applied to the test cohort. The train
set must be sufficiently large and variable enough such that the model robustly fits
the data, however, the test set cannot be too small, otherwise it will not accurately
capture the performance of the model. By measuring the fit of the model to the
test cohort, it can be ascertained whether the model can be considered sufficiently
predictive before applying it to completely fresh “real world” data. Care must
be taken to prevent the overfitting of the model to the train set, for example, by
applying regularisation, or else it may not be able to predict unseen data.
2.6.1 Logistic regression
Regression is a commonly used analysis that, in its simplest linear form, describes
the relationship between two variables, x and y, as:
$$y = \beta_0 + \beta_1 x + \varepsilon \tag{2.18}$$
where β0 is the y-intercept, β1 is the slope, and ε describes the random error
component. Linear regression involves solving for the coefficients, β0 and β1, that
provide the best model to fit the data. As this form of regression describes a
continuous linear curve, it is not appropriate for binary categorisation problems
such as those described in this thesis. On the other hand, logistic regression is a
heuristic for binary and multicategorical classification problems. At its simplest,
it takes the form:
$$P(y = 1 \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} \tag{2.19}$$
where P(y = 1|x) describes the probability that the outcome y equals 1 given x, and
β0 and β1 are analogous to the coefficients in linear regression. A probability
threshold is then used to classify the data into discrete classes, such that if
the threshold is 0.5, an observation with P > 0.5 will be classed as class B
rather than class A. The sigmoidal function described above
can be transformed into a log-odds output by applying the logit function:
$$\log \frac{P}{1 - P} \tag{2.20}$$
to determine the coefficients. By log transforming the probability of x, the coef-
ficients can be derived via the estimation of the maximum likelihood, thus fitting
the original sigmoidal curve to the data.
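Maximum likelihood estimation of the two coefficients can be sketched with plain gradient ascent on the log-likelihood. The data set, learning rate and iteration count below are hypothetical choices for illustration, and a statistics package would normally use a more efficient optimiser; no regularisation is applied here.

```python
import math


def sigmoid(z):
    """Logistic function of eqn. 2.19, written in the equivalent form 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, n_iter=5000):
    """Estimate beta0 and beta1 by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        # Residuals between observed labels and current predicted probabilities
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(errors) / len(xs)
        b1 += learning_rate * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    return b0, b1


def predict(b0, b1, x, threshold=0.5):
    """Assign class 1 when P(y = 1 | x) exceeds the chosen threshold."""
    return 1 if sigmoid(b0 + b1 * x) > threshold else 0


# Hypothetical one-variable data set with overlapping classes
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(round(b0, 2), round(b1, 2), predict(b0, b1, 1.0), predict(b0, b1, 3.5))
```

The fitted slope is the change in log-odds per unit increase in x, so a positive value means that larger x makes class 1 more likely.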
Figure 2.7: Comparison of linear and logistic regression. Linear regression describes data that fall out of the 0 - 1 range of a binary prediction, unlike the sigmoidal curve of the logistic function which asymptotically approaches 0 and 1.
The output of logistic regression is a log-odds ratio that indicates whether a vari-
able has an effect on predicting the outcome or not. The log-odds ratio can be
interpreted as how much the variable of interest increases or decreases the odds of
the outcome occurring. Importantly, the log-odds of each variable are in the same
units as the outcome, rendering logistic regression highly interpretable.
2.6.2 Elastic net regularisation
This thesis utilises elastic net, a type of regularisation that adds a penalty term to
the regression to prevent overfitting. Elastic net is a linear combination of Lasso
(L1) and ridge (L2) regularisation penalties, with the formula:
$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda_2 \beta_1^2 + \lambda_1 |\beta_1| \tag{2.21}$$
where the sum of the squared residuals (between y and ŷ) that determines the best fit of
the model is penalised by both the Lasso and ridge methods. Lasso regularisation
selectively shrinks the slopes of the variables using the absolute magnitude
of the slope, whereas ridge regression shrinks the variables equally by using the
square of the slope. The hyperparameter λ is solved for by a grid search in ma-
chine learning that provides the best model with respect to predictive power and
Chapter 3
Allysine modifications perturb tropoelastin structure and mobility on a local and global scale
This chapter has been published as:
Ozsvar, J., Tarakanova, A., Wang, R., Buehler, M. J., Weiss, A. S., “Allysine modifications perturb tropoelastin structure and mobility on a local and global scale”. Matrix Biology Plus, 3(6), pp.800-809.
3.1 Introduction
Elastin is the major elastic extracellular matrix (ECM) protein that is crucial for
the mechanical resilience of elastic vertebrate tissues, including the skin, lungs
and cardiovascular system [96]. The elastin polymer predominantly comprises its
soluble subunit, tropoelastin [156], which is secreted by elastogenic cells and un-
dergoes hierarchical self-assembly to form elastin fibers [226]. Assembly is initiated
after secretion to the cell surface, where tropoelastin molecules rapidly form small
spherules through a process termed coacervation [80, 227]. These spherules are
then deposited onto the microfibrillar scaffold within the ECM [79] where they
assemble into robust, insoluble, and extensively cross-linked fibers [96].
The cross-linking of elastin is facilitated by one or more members of the family of
lysyl oxidase (LOX) enzymes and commences prior to deposition onto the microfib-
rillar scaffold [228]. As an amine oxidase, LOX modifies the ε-amino side chain of
lysine to an α-aminoadipic-δ-semialdehyde, resulting in an allysine residue [5,229].
Allysines are capable of undergoing spontaneous condensation with either the ε-
amino groups of lysines or the semialdehydes of other allysines, forming linear
lysinonorleucine (LNL) or allysine-aldol (ALL) cross-links respectively [230]. LNL
and ALL are able to condense further, forming larger, more complex cross-links
such as desmosine or isodesmosine [231]. These four types of links are the most
abundant cross-linked species within the mature elastin fiber [106,232].
The contribution of allysines to elastin assembly, other than purely their ability to
form cross-links, is currently unknown. Elastin assembly is a finely tuned process
relying on the intrinsic properties of tropoelastin, including the association of its
hydrophobic domains and positioning of its cross-linking domains (Figure 3.1),
both of which are dependent on its molecular arrangement and flexibility [36, 37,
111,233]. The robust balance between molecular arrangement and function can be
perturbed by mutations that result in structural changes within both tropoelastin
molecules and elastin fibers [32,61,96,234]. Thus, it is probable that the presence
of allysines can also affect the conformation of tropoelastin, and in turn, influence
coacervation and subsequent higher order assembly processes. To explore this,
knowledge of allysine locations through cross-linking sites is needed to appreciate
the spatial arrangement of molecules during coacervation and the overall steps by
which elastin fibers are formed.
Figure 3.1: Schematic representation of tropoelastin domains and allysines explored in this study. The hydrophobic and cross-linking domains are represented by the black and white boxes respectively. The N- and C-termini are denoted on either side of the schematic. The orange circles mark the domains containing allysine modifications in this study. The corresponding allysine (in orange) and their flanking amino acid sequences are depicted. Adapted from [122].
The precise cross-linking patterns within native elastin are only partially under-
stood because the highly repetitive sequence of tropoelastin has hampered the
mapping of specific cross-linking sites. More recently, utilization of enzymatic
cleavage and mass spectrometry of native elastin has identified candidate regions
involved in cross-linking [232]. Corroborating evidence for the involvement of these
sites arises from further studies probing the in vitro cross-linking of synthetic recombinant
human tropoelastin [53,235]. Despite these advances in pinpointing cross-link
locations, tropoelastin remains a flexible molecule, even though it retains a canonical
shape [110]. This flexibility has made it difficult to map cross-linking sites accurately
onto the tertiary structure of the molecule and has impeded the use of traditional
high resolution techniques to resolve its entire global structure [85]. As such, the
only experimental shape data for tropoelastin are low resolution small angle x-ray
scattering and small angle neutron scattering structures that comprise ensembles
whose high resolution components were recently identified through molecular dy-
namics [22, 32,61].
Recently, the full atomistic structure of tropoelastin was detailed as an ensemble
using extensive replica exchange molecular dynamics (REMD) simulations [111].
The structure correlates remarkably well with the previous low-resolution struc-
tural data, and its secondary structural features are in accord with those indi-
cated by circular dichroism and molecular mutation studies. These highlight the
power of molecular dynamics (MD) in modeling flexible molecules. Examination
of the molecule through normal mode analysis (NMA) also gave insight into the
predominant global motions of tropoelastin that are likely to contribute to self-
assembly [111, 234], providing a molecular basis with which the effects of modifi-
cations within the scope of elastin assembly can be probed.
As the relatively flat energy landscape of flexible molecules, such as tropoelastin,
allows them to transition between energy minima and take on a multitude of con-
formations, it is most appropriate to analyze flexible molecules as a structural
ensemble. Principal component analysis (PCA) is being increasingly used to pin-
point and link structural variation to functionality within protein ensembles [60].
On this basis, PCA has been applied to wild type (WT) tropoelastin and has
highlighted that despite its nature as a flexible molecule, its overall architecture
fluctuates within a molecular ensemble that is biased toward its canonical
structure [110].
Here, it was investigated whether allysine modifications are capable of altering the
structure and dynamics of tropoelastin molecules. REMD was conducted to sam-
ple the conformational landscape of tropoelastin to understand the consequences
of single and multiple allysine modifications. Ensemble-based methods such as
PCA were used to describe changes in overall structural variance and flexibility,
local secondary structural changes and the mobility of specific residues. The
intrinsically accessible molecular motions of the allysine-containing molecules
and the contribution of salt bridges to the molecular changes were
also examined. These findings reveal that allysines do more than simply serve as
static precursors to cross-links in elastin assembly, by contributing to changes in
molecular structure and dynamics.
3.2 Methods
3.2.1 Allysine parameterisation
The methods for generating the WT model of tropoelastin have been previously
described [61]. For single allysine modifications, residues 353 and 507 were selected
because multiple studies support their involvement in cross-linking (Table 3.1).
The modified molecules ALK353 and ALK507 were used to probe for changes
tropoelastin may undergo subsequent to a single allysine modification. To
understand the effect of multiple allysine modifications, residues 353 and 507 were
modified simultaneously with residues 150, 199 and 239, which were selected based
on prior characterization of native and synthetic elastin (Table 3.1), to give 5ALK.
These changes were restricted to sites where multiple publications point to the
sites of allysines. The rationale behind using 5ALK was that elastin is extensively
cross-linked, and this allowed us to explore a representative construct where the
majority of tropoelastin molecules contain more than one modification.
Residue   Domain   References
150       10       [53,235]
199       13       [85,110,111,235]
239       15       [111,235]
353       19       [53,85,110,111,235]
507       25       [53,85,110,111,235]
Table 3.1: Summary of lysine residues converted to allysines in this study, their respective domains, and references to supporting studies.
The CHARMM22 force field [196] was selected due to its use in previous tropoe-
lastin simulations [61]. Allysine was applied as a patch using parameters from
the aldehyde functional groups of acetaldehyde and propionaldehyde from the
CHARMM General Force Field (CGenFF) [236] using Visual Molecular Dynamics
(VMD) software [237].
3.2.2 Molecular dynamics input
The modified molecules were first simulated with NAMD [186] using implicit sol-
vent replica exchange molecular dynamics (REMD). The implicit solvent step was
intended to accelerate sampling time, as the water molecules of explicit solvent are
a major limitation in REMD. Each molecule had a total of 48 replicas distributed
exponentially over a temperature range of 280 - 480 K, giving an exchange accep-
tance frequency between 0.2 - 0.3. Exchanges were attempted every 1 ps and were
accepted based on the Metropolis criterion described in previous REMD
studies [61]. Non-bonded forces were applied with a cut-off of 16 Å, a switch distance of
14 Å and a pair list distance of 18 Å. Implicit solvent was simulated using a
dielectric constant of 80, an ion concentration of 0.15 M, and an α cut-off of 15. A total
of 5.2 ns was simulated per tropoelastin molecule, with ∼ 240 ns total simulation
time for the entire ensemble across all temperatures of each molecule.
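The replica set-up described above can be illustrated with a minimal Python sketch. It assumes that "distributed exponentially" means a constant temperature ratio between neighbouring replicas, and it writes the Metropolis exchange criterion in its standard REMD form. The function names and the choice of kcal/mol energy units are assumptions made for illustration; this is not the NAMD implementation used in this study.

```python
import math

# Boltzmann constant in kcal/(mol K); the energy unit is an assumption.
K_B = 0.0019872041


def temperature_ladder(t_min, t_max, n_replicas):
    """Return n_replicas temperatures spaced with a constant ratio."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]


def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis probability of swapping the configurations held at t_i and t_j.

    e_i and e_j are the potential energies of the configurations currently
    simulated at temperatures t_i and t_j respectively.
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_j - e_i)
    return 1.0 if delta <= 0.0 else math.exp(-delta)


# 48 replicas between 280 K and 480 K, as in the set-up described above.
ladder = temperature_ladder(280.0, 480.0, 48)
```

A swap that moves the lower-energy configuration to the colder replica is always accepted; the reverse swap is accepted with a probability that falls as the temperature gap or the energy gap widens.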
The root mean square deviation (RMSD) of atomic fluctuation was used as an
indicator of structural convergence. Upon reaching convergence, 1000 structures
were extracted from the last 2 ns of the 310 K replica and clustered by k-means
analysis with the MMTSB toolkit [238] using an RMSD of 5 Å. MMTSB was
used to determine the distribution of clusters generated by k-means analysis and
the most representative structures of the most populated clusters. The ProDy
package [217] was utilized to construct anisotropic network models (ANM) and
principal component analysis (PCA) on the ensemble of each molecule. In-house
Tcl/Tk, R and Matlab scripts were used for all other analyses.
The average structure of the most populated cluster for each molecule was fur-
ther equilibrated in explicit aqueous solvent. A cubic box containing ∼ 100,000
water molecules was used to solvate tropoelastin, while a padding distance of 20
Å from the molecule’s edges was used to ensure the protein would not contact
itself through the periodic boundaries of the water box. The box was neutralized
with sodium and chloride ions at the physiological concentration of 0.15 M.
ized with sodium and chloride ions at the physiological concentration of 0.15 M.
After a brief minimization of the structure with the conjugate gradient method
and equilibration in a constant volume system, the molecules were simulated with
classical molecular dynamics in constant pressure systems using a time step of
2 fs. Equilibration was assessed using the RMSD of each structure, resulting in
equilibration times of 100 ns, 130 ns and 230 ns for ALK507, ALK353 and 5ALK
respectively. A temperature of 310 K was maintained by Langevin dynamics, with
a damping coefficient of 1/ps. A constant pressure of 1 atm was applied using the
Nosé-Hoover Langevin barostat with a period of 200 fs and a decay of 100 fs. For
simulating non-bonded parameters, a cutoff of 12 Å was used, with a switch
distance of 10 Å and a pair list distance of 13.5 Å. Electrostatics were regulated by a
Particle Mesh Ewald summation with a grid spacing of 1 Å. The resultant
structures were used to conduct NMA using elastic network models with ProDy [217]
and VMD [237] using the NMWiz plugin.
3.3 Results
3.3.1 Structures of single allysine-modified tropoelastin
Ensemble analysis was conducted on 1000 structures that were derived from the
last 2 ns of REMD simulation per molecule. The structural variance within the
ensembles was examined through the contribution of the top 20 principal
components (Figure 3.2, A-C). It had been previously noted that 42% of the variance
in WT is captured by the top two principal components [62]; here, the top two
principal components accounted for 31-41% of the structural variance of the modified
molecules. The sum of the variance of the top three principal components of these
molecules ranged between 40-52% in comparison to 53% of the top three princi-
pal components of the WT. Additional principal components would be required
for equating structural variance between allysine modified molecules compared to
WT, so this indicated that the allysine containing molecules exhibited a higher
degree of structural variability relative to WT when examined through these top
principal components.
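The comparison of variance captured by the leading principal components can be sketched in a few lines of Python. The eigenvalues below are hypothetical numbers chosen only to make the arithmetic easy to follow and are not taken from the ensembles in this study; the function names are likewise assumptions.

```python
def explained_variance_fractions(eigenvalues):
    """Fraction of the total variance carried by each principal component."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]


def components_needed(eigenvalues, target_fraction):
    """Smallest number of leading components reaching target_fraction."""
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    for count, fraction in enumerate(explained_variance_fractions(ordered), start=1):
        cumulative += fraction
        if cumulative >= target_fraction:
            return count
    return len(ordered)


# Hypothetical PCA eigenvalues (variances along each component).
example_eigenvalues = [4.0, 2.0, 1.0, 1.0]
example_fractions = explained_variance_fractions(example_eigenvalues)
```

An ensemble whose leading components carry a smaller share of the variance needs more components to reach the same cumulative fraction, which is the sense in which the modified molecules are described as more structurally variable.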
In addition to PCA, k-means clustering of proteins was conducted based on similar-
ity of the root mean square deviation (RMSD) of atomic coordinates in Cartesian
space. This type of clustering has previously demonstrated that WT favored a
specific configuration over other structures within its ensemble [62]. ALK353 and
ALK507 showed a similar tendency (Figure 3.2, D-E), but this trend changed with
multiple allysine modifications. 5ALK displayed a relatively even distribution of
structures throughout its top nine clusters. This was most evident in clusters 3 to 9,
which each comprised between 68 and 81 structures (Figure 3.2, F), consistent with
a model in which these structural clusters contribute to a comparable extent in 5ALK.
The relevance of the representative structures from the k-clusters was assessed by
overlaying onto 3D PCA plots that describe PC1-PC2-PC3 space (Figure 3.2,
G-I). The most representative structures from the four most populated k-clusters
(depicted in red) within each ensemble were located within dense regions of the
PCA plots. This was also observed with WT [62]. Representative structures from
the three least populated k-clusters were also overlaid onto the PCA plots, where
they were found to reside in less populated areas. Most of these clusters were
distinct, confirming that the top three principal components discretized structural
differences between molecular k-clusters.
Figure 3.2: Variance of the top 20 modes of principal component analysis of the structural ensembles for: A) ALK353, B) ALK507, and C) 5ALK. Distribution of structures arranged from most to least populated k-clusters using RMSD for: D) ALK353, E) ALK507, and F) 5ALK. The most representative structures from the k-clusters are overlaid onto PC1-PC2-PC3 space for G) ALK353, H) ALK507 and I) 5ALK. Representative structures are classed as either from most populated k-clusters (red squares) or least populated k-clusters (blue squares).
The 2D PCA plots showed less clustering resolution as evidenced by candidate
structures from the least sampled conformations that resided in areas similar to
the most accessible structures within PC1-PC2 space (data not shown). This
was expected because the amount of structural variance of the allysine containing
molecules accounted for by PC1-PC2 differed when compared to WT, as discussed
above. This pointed to the need to proceed with at least the 3 top PCA compo-
nents.
3.3.2 Converting lysine to allysine perturbs the global struc-
ture and intrinsic dynamics of tropoelastin
The sum of principal component modes was used to assess the mobility of allysine-
modified tropoelastin ensembles by calculating the square displacement in Carte-
sian space of all Cα carbons in the protein backbone (Figure 3.3, A-C). As only
the top six principal component modes dominated the structure [62], the sums
of the top 2, 3, 6 or 20 principal component modes were compared. A minimum
of three PCA modes was required to describe the allysine-containing ensembles.
Although the combination of the top 2 and 3 modes substantially overlapped, the
top 2 modes differed from the overall trend in combinations of higher modes in
some domains (Figure 3.3, A-C). This feature differed from the WT ensemble,
where there is good overlap of the top 2 and 3 modes [62].
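The per-residue square fluctuation obtained from a sum of modes can be written out explicitly. The sketch below assumes each mode is a flat eigenvector of length 3N (x, y, z for each of N Cα atoms) paired with its eigenvalue (variance); the function name and the toy two-atom input are hypothetical and serve only to show the arithmetic.

```python
def square_fluctuations(eigenvalues, eigenvectors, n_modes):
    """Sum the variance-weighted squared displacement of each atom
    over the first n_modes modes."""
    n_atoms = len(eigenvectors[0]) // 3
    fluctuations = [0.0] * n_atoms
    for variance, vector in list(zip(eigenvalues, eigenvectors))[:n_modes]:
        for atom in range(n_atoms):
            x, y, z = vector[3 * atom:3 * atom + 3]
            fluctuations[atom] += variance * (x * x + y * y + z * z)
    return fluctuations


# Toy example with two atoms and two modes (hypothetical values).
toy_eigenvalues = [2.0, 0.5]
toy_eigenvectors = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # mode 1 moves atom 0 along x
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # mode 2 moves atom 1 along y
]
top_one = square_fluctuations(toy_eigenvalues, toy_eigenvectors, 1)
top_two = square_fluctuations(toy_eigenvalues, toy_eigenvectors, 2)
```

In the toy example the second atom only becomes mobile once the second mode is included, which mirrors the way a profile built from too few modes can miss mobility in particular domains.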
Figure 3.3: Normalized square fluctuations of the backbones of: A) ALK353, B) ALK507, and C) 5ALK depicting patterns using 2, 3, 6 or 20 principal component modes. Arrows point to regions of marked disparity between fluctuations based on 2 or more modes. Heat map comparisons of the top 6 principal component modes are shown for: D) ALK353-ALK507, E) 5ALK-ALK353, and F) 5ALK-ALK507.
Based on this, structural similarities between WT and allysine modified tropoe-
lastin were examined by considering the overlap of the top 6 PCA modes between
the different ensembles. Only a mild correlation between WT and singly-modified
tropoelastin was noted (Figure 3.4, A-C). Principal components 1, 3 and 5 of
WT respectively correlated with the principal components 1, 3 and 1 of ALK353
(34 – 45%) (Figure 3.4, A), whereas principal components 1 and 2 of ALK507
correlated with principal components 2 and 3 of WT (29 - 40%) respectively (Fig-
ure 3.4, B). In contrast, principal components 3 and 5 from 5ALK weakly (>
22%) correlated with 2 and 6 from WT (Figure 3.4, C). As the PCA modes
relate to the overall architecture of a molecule, this indicated a potential shift
away from the canonical structure. Comparisons between the allysine-modified
molecules also revealed low similarities (< 45%) between principal components of
the ensembles, indicating that the locations of the allysine modifications led to
different structural consequences (Figure 3.3, D-F).
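The pairwise comparison of modes between two ensembles can be sketched as a table of overlaps, where the overlap is taken here to be the absolute normalized dot product of two mode vectors. This definition is an assumption about how the heat map values are obtained, and it presupposes that both mode sets are defined over the same atoms (for example Cα atoms) in a common reference frame. The function names are hypothetical.

```python
import math


def overlap(mode_a, mode_b):
    """Absolute cosine of the angle between two mode vectors (0 to 1)."""
    dot = sum(a * b for a, b in zip(mode_a, mode_b))
    norm_a = math.sqrt(sum(a * a for a in mode_a))
    norm_b = math.sqrt(sum(b * b for b in mode_b))
    return abs(dot) / (norm_a * norm_b)


def overlap_table(modes_a, modes_b):
    """Rows follow modes_a and columns follow modes_b."""
    return [[overlap(a, b) for b in modes_b] for a in modes_a]


# Toy three-dimensional "modes" (hypothetical values).
table = overlap_table(
    [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]],
    [[-2.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
)
```

Taking the absolute value makes the measure insensitive to the arbitrary sign of an eigenvector, so a value near 1 indicates the same direction of motion and a value near 0 indicates unrelated motions.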
Figure 3.4: Heatmap comparisons of the top 6 principal component modes for WT with A) ALK353, B) ALK507, and C) 5ALK. Normal mode analysis images that combine the 6 most accessible modes of the most representative structures for: D) ALK353, E) ALK507, and F) 5ALK. Directionality and magnitude of the modes are depicted in orange. The gradient bar depicts mobility, where red corresponds to the most mobile regions. Black arrows indicate domains that act as hinges.
This model was supported by the addition of allysines which shifted the structural
ensemble from the WT structure (Figure 3.5, A-C). The extent to which average
structures departed from WT depended on the location and extent of these mod-
ifications. For example, ALK353 was globular and displayed a C-terminal foot
region that pointed down from the protein’s center (Figure 3.5, A), whereas
ALK507 was slightly more compact along its vertical axis and displayed a
preference for a C-terminus that was raised toward the center of the protein (Figure
3.5, B). Relative to WT, 5ALK revealed a compacted C-terminus and an extended
molecular body (Figure 3.5, C).
Figure 3.5: Representative structures from the most and least sampled k-clusters for A) ALK353, B) ALK507, and C) 5ALK.
To explore how these changes in global molecular shape affected the motions intrin-
sically accessible to the molecule, NMA was employed using anisotropic network
models (ANMs). ANMs are useful in explaining global molecular motion as they
are reliant solely on the architecture of the molecule, rather than localized sec-
ondary structures, and encompass those motions most accessible to WT including
a twist in the N-terminus with a scissors motion in the C-terminus [61,62]. NMA
was used to describe representative solution structures from the most populated
k-cluster of each ensemble. On combining the lowest 6 normal modes of move-
ment, the scissors-twist motion was observed in ALK507 but not in ALK353 or
5ALK (Figure 3.4, D-F). I propose that the scissors-twist motion relies on the
C-terminus adopting a configuration with two protruding feet as seen in WT and
ALK507.
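Because an ANM depends only on the architecture of the molecule, its central object, the Hessian, can be built from Cα coordinates alone. The sketch below constructs that matrix for a toy set of coordinates. The 15 Å cutoff and unit spring constant are assumed illustrative values rather than parameters reported in this study, and the function name is hypothetical. Diagonalizing the matrix to obtain the modes is left to a numerical library and is not shown.

```python
def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N ANM Hessian for N nodes.

    Every pair of nodes closer than `cutoff` is joined by a spring of
    constant `gamma`. Coordinates are assumed to be distinct points.
    """
    n = len(coords)
    hessian = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][k] - coords[i][k] for k in range(3)]
            dist2 = sum(c * c for c in d)
            if dist2 > cutoff * cutoff:
                continue
            for a in range(3):
                for b in range(3):
                    block = -gamma * d[a] * d[b] / dist2
                    # Off-diagonal super-elements for the pair (i, j).
                    hessian[3 * i + a][3 * j + b] += block
                    hessian[3 * j + a][3 * i + b] += block
                    # Diagonal super-elements are minus the row sum.
                    hessian[3 * i + a][3 * i + b] -= block
                    hessian[3 * j + a][3 * j + b] -= block
    return hessian


# Three toy nodes spaced 3.8 Å apart (hypothetical coordinates).
toy_hessian = anm_hessian([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)])
```

The matrix is symmetric and each of its rows sums to zero, which reflects the fact that rigid translation of the whole molecule costs no energy; the lowest non-trivial eigenvectors are the global motions discussed in the text.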
ALK353 displayed N- and C-terminal flexibility about domain 20, which acted as
a hinge, with additional C-terminal pivot on the plane orthogonal to domain 20
(Figure 3.4, D). The N-terminus of ALK507 demonstrated flexibility about a hinge
formed by domain 11 and was the only modified molecule that presented a scissors
twist in its foot region (Figure 3.4, E). The N- and C-termini of 5ALK moved
about domain 19, which acted as a hinge in this molecule (Figure 3.4, F). Taking
into account that allysine-containing structures exist soon after LOX modification,
this suggests a combination of these movements contributes to assembly.
3.3.3 Allysines alter the conformational sampling of do-
mains
The fluctuations of the protein backbone in WT and allysine modified tropoelastin
for the 6 top principal component modes were compared. ALK353 and ALK507 de-
parted from WT with markedly differing regions of high and low mobility (Figure
3.6, A-C) [61] with decreases in the overall magnitude of fluctuation throughout,
accompanied by dampened mobility in domains 19 and 25 which comprised the
allysine modifications in ALK353 and ALK507 respectively.
In contrast, the magnitude of fluctuations in the 5ALK backbone had increased
over WT, meaning that the molecule was more flexible (Figure 3.6, C). The
mobility pattern within 5ALK was closer to that seen for WT than to either
ALK353 or ALK507, where high mobility domains within WT were also mobile
in 5ALK, specifically domains 2-5 (residues 1-51), domains 10-19 (residues 133-
357) and domains 21-23 (residues 413-445) (Figure 3.6, A-C). As tropoelastin
requires multiple allysines prior to forming elastin, 5ALK serves as a model of
functionally significant oxidized tropoelastin in elastogenesis, so its similarities in
mobility to WT were considered salient and are sequentially considered here.
Domains 2-5 are located at the head of both WT and 5ALK through domain 6
where they are accompanied by salt bridges [36, 61]. The importance of domain
6 in elastogenesis is demonstrated by the formation of markedly altered fiber
morphology when tropoelastin’s sole aspartate is mutated to alanine [36]. This
suggests that the role of domain 6 in elastin maturation is to hold domains 2-5
in place, potentially for head-to-tail assembly as previously proposed [59]. The
current data are consistent with these findings because domain 6 remained stable
relative to its flanking domains (Figure 3.6, A-C).
Figure 3.6: Protein backbone square fluctuations as a combination of the top 6 principal component modes for: A) ALK353, B) ALK507, and C) 5ALK. Lysines and allysines are depicted as red dots.
It was noted that domains 10-19 undergo high conformational sampling in both
WT and 5ALK, which is credited to contributions to entropy-based extensibility
by this part of the molecule [239]. In 5ALK these regions are mobile relative to
the N-terminal half of the molecule.
Domains 21-23 display high fluctuations within WT and 5ALK (Figure 3.6, C).
Their flexibility as seen in previous molecular dynamics studies is proposed to
facilitate cross-linking [235]. Experimentally these same domains were identified
as cross-linking hot spots in vitro [85,110] and found as cross-links in native elastin
[111,235]. The current data help to explain these findings, by identifying that the
mobility of domains 21-23 enhances LOX-mediated modification and subsequent
cross-linking.
Also relevant is that domain 36 in 5ALK (Figure 3.6, C) undergoes high con-
formational sampling similarly to WT [61,62]. Consistent with observations here,
this domain contains a cell-interactive region [121, 139] and has been established
as particularly flexible in previous elastic network models and by NMA [22, 240].
It has been proposed that the C-terminus plays a role in positioning the molecule
during aggregation and eventually cross-linking [241], features which are in accord
with higher regional mobility as seen here.
The number of lysines and allysines located in regions of high displacement (80 -
85%) is similar to that reported for WT (83%) (Figure 3.6, A-C), as expected
for a model where these residues continue to sample the conformational landscape
in order to facilitate further modifications and ensuing cross-links.
3.3.4 Allysines facilitate changes in salt bridges that con-
tribute to structural variance and lead to local sec-
ondary structural changes
To explore mechanisms behind the heightened flexibility and conformational changes
caused by allysine modification, the presence of salt bridges was investigated. It
is well accepted that tropoelastin’s three negatively charged residues (D72, E345
and E414) are involved in maintaining its overall structure [36, 61, 233].
Converting a lysine to an allysine removes its positive charge, rendering the residue
incapable of forming salt bridges. On this basis, changes in salt bridge binding were
noted that accompanied the structural changes identified through PCA and the
altered conformational sampling. WT was capable of forming multiple salt bridges
through all three negatively charged residues for a substantial proportion of the
time sampled (Figure 3.7, A). In contrast in 5ALK, not only did the salt bridge
patterns change, but salt bridge longevity decreased (Figure 3.7, B). Consid-
ering the higher magnitude of protein backbone fluctuation displayed by 5ALK
(Figure 3.6, C), it is likely that these conversions from lysines to allysines re-
leased tropoelastin from a more stable configuration and accordingly conferred
increased mobility. This model is consistent with the previous observation that
there was less overall dominance of a single cluster in the entire structural
ensemble (Figure 3.2, F), because fewer salt bridges would lead to a regional freeing of
the molecule and increase its ability to locally sample other states.
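Salt bridge presence and longevity can be quantified from a per-frame distance series between a charged pair. The sketch below assumes a simple distance criterion; the 3.2 Å cutoff is an illustrative assumption and is not a value reported in this study, and the function name and distance series are hypothetical.

```python
def salt_bridge_stats(distances, cutoff=3.2):
    """Return (occupancy, longest_run) for one charged pair.

    occupancy is the fraction of frames with the pair within `cutoff`;
    longest_run is the largest number of consecutive such frames.
    """
    formed = [d <= cutoff for d in distances]
    occupancy = sum(formed) / len(formed)
    longest = current = 0
    for is_formed in formed:
        current = current + 1 if is_formed else 0
        longest = max(longest, current)
    return occupancy, longest


# Hypothetical distances (in Å) over eight consecutive frames.
example = salt_bridge_stats([3.0, 3.1, 5.0, 2.9, 3.0, 3.1, 6.0, 7.0])
```

Occupancy corresponds to how often a bridge is present, whereas the longest uninterrupted run is one simple proxy for its longevity; a pair can have similar occupancy in two molecules while persisting for much shorter stretches in one of them.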
Figure 3.7: Salt bridge contact maps for: A) WT and B) 5ALK, where salt bridge presence and longevity are indicated by black bars. The percent transient α-helical content of WT and 5ALK is shown in C) specific domains and D) the entire molecules. E) Displays the solvent accessible surface areas of hydrophobic domains globally. Distance maps are shown for lysines and/or allysines in: F) WT and G) 5ALK, where the gradient depicts increasing time spent in close proximity (4.7 Å) to nearby lysines and/or allysines.
The local secondary structural effects of allysines were examined within their re-
spective domains. It has been previously noted that changes in α-helicity within
domains have a tendency to predispose them to stiffness and alter the collec-
tive motions of the molecule [61]. The α-helical content of domains 19 and 25
exhibited substantial changes (Figure 3.7, C), yet the overall α-helicity of the
molecules did not significantly differ (Figure 3.7, D). The lack of change at a
global secondary structural level was consistent with a requirement for flexibility
in self-assembly [32, 52]. Additionally, when considering the previously discussed
differences in overall tropoelastin mobility, the maintenance of global structure
highlighted potential differences between disease-associated mutations [36,61,233]
and natural functional modifications.
3.3.5 Hydrophobic solvent accessible surface area decreases
in the presence of allysines
The total solvent accessible surface area (SASA) of the hydrophobic domains was
calculated for all molecules. A decrease from the previously published SASA of
196.24 nm² for WT [61] to 168.02 - 171.20 nm² for the modified molecules was
observed (Figure 3.7, E). This change in the accessibility of hydrophobic regions
was on the same scale as those observed for two previously modeled tropoelastin
mutations, D72A and G685D [61]. The exposure of hydrophobic regions is known
to drive coacervation. The decrease observed here could be explained by fewer salt
bridges and increased mobility of the molecule, which allow hydrophobic regions
to bury further inside the modified molecules than seen in WT.
3.3.6 Distances between residues decrease upon allysine
modification
In addition to forming intermolecular cross-links, tropoelastin is also known to
form multiple intramolecular cross-links [111, 235]. Approaching positive charges
on juxtaposed lysines tend to repel, so it is logical that conversion to the neutral
allysine reduces the distance between Cε and Cδ groups of these residues [235]. The
current study is consistent with these findings, as it established that the presence
of allysine facilitates an increase in the proportion of time spent in proximity (4.7
Å) to its neighboring lysines when WT and 5ALK are compared (Figure 3.7,
F-G).
3.4 Discussion
Allysine formation is an essential step in making elastin from tropoelastin, yet its
molecular effects have not previously been considered. This study is the first to
demonstrate that structural changes arise from allysine modifications. Converting
lysine to allysine alters structural ensembles, changes the mobility and accessibility
of domains, and varies accessible molecular motions of tropoelastin.
It is well accepted that the structure and functionality of tropoelastin are substan-
tially affected by single point mutations [36, 61, 233]. Although deviations from
WT structure are generally linked to disease states [241], here, it was demonstrated
that naturally occurring modifications are also capable of altering WT structure.
I established that structures within the ensemble depart from the canonical WT
shape with progressing modifications. This departure is of biological relevance,
as the structural consequences of allysines had not been fully explored within the
context of elastogenesis. Furthermore, this study highlighted the decrease in dom-
inance of a single set of structures with progressing allysine modifications. These
findings are in accord with recent mass spectrometry data and help to explain the
heterogeneity of elastin cross-linking [111]; therefore, I posit that decreased struc-
tural dominance contributes to this heterogeneous cross-linking because 5ALK
more evenly samples a range of structures.
Tropoelastin’s mobility is crucial to its functionality and also plays a significant
role in self-association [32, 52]. Here, it was demonstrated that allysine-modified
tropoelastin displayed altered mobility relative to WT in key domains. This effect
was assisted by sparse, short-lived salt bridges that resulted in local and global
secondary structural changes. The high conformational sampling of WT most
likely facilitates rapid aggregation and LOX-mediated modification [62]; however,
I propose that the altered mobility patterns
within ALK353 and ALK507 could serve as a checkpoint required prior to further
assembly. Considering elastin’s known extensive cross-links and functionality, this
checkpoint limits participation by molecules lacking sufficient allysines and re-
duces the probability of their incorporation into the growing elastin chain where
they would form a weakly cross-linked fiber. This checkpoint model is supported
by the known presence of lysines in relatively mobile regions of tropoelastin that
are recognized as important in cross-linking [53,111,235,242]. Further support for
the checkpoint model arises when considering the time frame of elastin assembly.
Tropoelastin molecules cross-link subsequent to aggregation, which occurs after
LOX has completed modification and dissociated from tropoelastin. This study
benefits from the fact that tropoelastin structures organize on the order of nanosec-
onds, whereas coacervation occurs on the order of seconds [243], which means that
assembly into elastin occurs much later and is at least several orders of magnitude
slower than the time scales examined here. This indicates that allysine contain-
ing tropoelastin transitions away from the canonical tropoelastin shape prior to
aggregation and cross-linking. However, the contribution of allysines to mobility
is likely to change once tropoelastin is cross-linked due to restrictions imposed by
the resultant bond. Further molecular dynamics studies could be undertaken to
explore the effect of cross-linking on the mobility of allysine containing tropoe-
lastin.
The current head-to-tail model of elastin assembly is based on the mapping of a
handful of cross-links [53] onto the low resolution structure of WT [59]. ANMs
based on the global architecture of WT have implicated the C-terminal scissors
twist motion as being crucial to head-to-tail assembly [61]. The presence of the
scissors twist in ALK507 further verified its importance in self-association steps.
However, the ANMs of ALK353 and 5ALK displayed a loss of the C-terminal
twist, unexpectedly indicating that these previously unexplored motions are also
likely to contribute to higher order assembly.
The hydrophobic domains of tropoelastin dominate and drive tropoelastin asso-
ciation [96] and a decrease in SASA is associated with altered coacervation [61].
I propose that the lowered SASA of allysine-containing tropoelastin contributes
to the formation of aggregates that LOX can penetrate and further modify. This
type of aggregate would therefore be an experimentally unexplored component in
higher order elastin assembly that is testable in vivo.
A limitation of the current study is that only one lysine in each of the selected
domains was modified due to the required scale of computing resources. Prior
data indicate that domains may contain more than one modification, which raises
the question of the nature of the changes incurred by modifying a different nearby
lysine or more than one lysine within a single domain. It is difficult to predict
the precise consequences of this without further modeling. I hypothesize that the
modification of two lysines in a single domain, if they participate in salt bridge
formation, would impact on tropoelastin structure. To test this hypothesis, various
combinations of allysines could be incorporated into future molecular dynamics
studies.
Taken together, these data reveal that allysines can cause global changes in struc-
ture, domain mobility and overall molecular motions of tropoelastin, and so con-
tribute to irreversible cross-linked aggregates in hierarchical elastin assembly.
4.1 Introduction
The process of self-assembly is crucial for the formation of many higher order
biological structures [244]. In many cases, self-assembly is initiated by nucleation
events, whereby the stochastic association of the smallest repeating unit of a higher
order structure results in nuclei. Eventually, more molecules participate in nuclei
as self-assembly transitions to a growth phase. Elastin self-assembly is thought to
proceed along a similar pathway, with tropoelastin’s hydrophobic domains being
responsible for both the nucleation and the growth phases [78,96]. However, due to
the inherent difficulties of structurally analysing tropoelastin, the precise sites that
trigger nucleation and growth phases have not yet been determined. Furthermore,
as hydrophobic domains are highly repetitive and not cross-linked, it is difficult
to determine their interactions, and thus, understand the manner in which they
associate.
The head-to-tail model of tropoelastin assembly was proposed based on a combination
of the SAXS/SANS structures of recombinant human tropoelastin and the
location of a desmosine cross-link found within native porcine elastin [53, 59].
Mapping the approximate locations of the domains involved in the desmosine
(10, 19 and 25) onto the SAXS envelope indicated that the alignment of the N-terminal
head of tropoelastin with the foot region of a second molecule may be able to
form the bond [59]. However, the head-to-tail model has several inconsistencies.
Firstly, the SAXS envelope is a non-atomistic, low-resolution structure, and thus the
domain sites are not in full agreement with the MD-derived full-atomistic model
of tropoelastin [61]. Secondly, the primary porcine tropoelastin sequence differs
from its human counterpart [245]. As the tertiary structure of tropoelastin is
sensitive to perturbations [37, 61, 233], it is feasible that the structures of porcine
and human tropoelastin are sufficiently dissimilar such that the domains do not
precisely map between the two molecules. Furthermore, the head-to-tail model is
based on a single cross-link that is yet to be noted in human elastin [34, 111].
To date, the interplay between tropoelastin monomers during coacervation has
not been captured by traditional structural methods. A recent coarse-grained MD
study harnessing the MARTINI force field examined the early stages of the
coacervation of forty tropoelastin molecules for up to 10 µs of simulation time [78].
Nucleation events were found to occur via head-to-head, head-to-tail, tail-to-tail,
and lateral types of association, demonstrating that at least at this early stage, a
multitude of types of associations are possible. This variety was preserved through
to nascent fibril formation, where multiple types of associations contributed to fib-
rillar structures. However, the underlying reasons for the variety of conformations
throughout early stage coacervation were not examined, nor was the contribution
of head-to-tail structures investigated.
In this chapter, I employ a combination of docking and logistic regression to explore
the factors that are important for the head-to-tail formation of tropoelastin dimers.
I examine domains that have been confirmed to interact within native elastin
as well as those derived from synthetic cross-linking studies to examine a wide
array of dimers. I analyse the energy and surface area terms in the context of
association type and starting conformation, and build a logistic regression model
to identify the features of the data set that are key for predicting the head-to-tail
outcome.
4.2 Methods
4.2.1 Selection of tropoelastin conformations
As described in prior studies, the last 1000 frames of tropoelastin equilibrated
at 310 K via cMD were grouped through k-means clustering according to their
RMSD (Figure 4.1). The most average structures by RMSD (i.e. the structures
closest to the centre of the clusters) from the top 3 most populated clusters were
selected for docking, as these are likely to be the most representative of each cluster
(Figure 4.1). These were termed TE1, TE2 and TE3 respectively, according to
the ranking of their clusters.
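The selection of representative conformations described above can be illustrated with a minimal Python sketch. It is not the code used in this thesis: the function name, the toy feature vectors and the labels are hypothetical, and cluster labels are assumed to come from a prior clustering step. For each of the most populated clusters, the frame closest to the cluster centre is returned.

```python
from collections import Counter
from math import dist

def representative_frames(features, labels, n_top=3):
    """Return (cluster_label, frame_index) for the n_top most populated clusters.

    features: one numeric vector per frame (hypothetical, e.g. RMSD-derived).
    labels:   one cluster label per frame, from any prior clustering step.
    """
    top = [lab for lab, _ in Counter(labels).most_common(n_top)]
    reps = []
    for lab in top:
        members = [i for i, l in enumerate(labels) if l == lab]
        dim = len(features[members[0]])
        # Cluster centre = mean of the member vectors.
        centre = [sum(features[i][d] for i in members) / len(members) for d in range(dim)]
        # Representative = member frame nearest to the centre.
        closest = min(members, key=lambda i: dist(features[i], centre))
        reps.append((lab, closest))
    return reps

# Toy data: five frames in cluster "a" and three in cluster "b".
features = [[0.0], [1.0], [2.0], [3.0], [10.0], [20.0], [22.0], [23.0]]
labels = ["a", "a", "a", "a", "a", "b", "b", "b"]
reps = representative_frames(features, labels, n_top=2)
```

Ranking by population first and then taking the nearest member mirrors the TE1, TE2, TE3 naming, where the index follows the rank of the cluster.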
4.2.2 Protein-protein docking
Tropoelastin dimers were generated through protein-protein docking. To drive
the docking, sites from a variety of studies containing ambiguous and
non-ambiguous interactions were considered. The majority of studies examining
elastogenesis focus on lysine-lysine interactions, as these can be pinpointed
as cross-links in native elastin and can be synthetically generated during coac-
ervation of recombinant tropoelastin. Examples of non-ambiguous interactions
include studies utilising BS3-mediated cross-linking of tropoelastin during
self-assembly [110] and mass spectrometry analysis of native elastin [34, 53, 111] that
unequivocally identified cross-linking sites. Multiple studies note ambiguous in-
teractions, particularly in those examining fragments of native elastin [34, 111],
due to the repetitive nature of tropoelastin’s primary sequence. For example,
the GVKPG sequence of domain 8 is noted to be cross-linked to a KF peptide
sequence, which corresponds to any of domains 17, 19, 27 or 31 [111], thus ren-
dering it incredibly difficult to determine the specificity of cross-linking within
native elastin. Studies were selected where the lysine within at least one domain
was unequivocally identified, and all lysines from the other domains were included
as potential candidates. A full list of the studies and residues used to generate
the dimers can be found in Appendix 1. The HADDOCK 2.2 webserver was se-
lected for protein-protein docking [222], as HADDOCK is capable of dealing with
ambiguous sites of interaction, as previously detailed in Chapter 2.
2000 structures were considered for the initial docking phase and the 500 most
energetically favourable structures were allowed to progress to the water refine-
ment stage and included for subsequent analysis. The data that were considered
included structural data in the form of Cartesian coordinates, the energy of the
total system as well as its subsets (such as van der Waals and electrostatic energy),
and buried surface area. A complete list of the variables used in this chapter can be
found in Appendix 2. A total of 19,500 dimers were generated for analysis.
4.2.3 Preparation of structural data
As previously mentioned, HADDOCK generates structural data in the form of
PDB files containing the Cartesian coordinates of the atoms of each dimer. The
Cartesian coordinates were used to calculate the centres of mass of the head
(residues 1 – 180), middle (residues 330 – 420) and tail (residues 600 – 698) re-
gions of each individual tropoelastin molecule (Figure 4.1, B). The three centres
of mass of each region per molecule were then used to calculate the head-head,
head-middle, head-tail, middle-middle and tail-tail Euclidean distances across all
dimers (Figure 4.1, B).
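The distance calculation can be sketched as follows. This is an illustrative Python example rather than the scripts used here: the atom tuple layout (residue number, mass, x, y, z) is an assumption, PDB parsing is omitted, and all nine inter-molecular region pairs are computed, which may differ from the exact set of distances listed above. Only the residue ranges are taken from the text.

```python
from math import dist

# Residue ranges for each region, as given in the text.
REGIONS = {"head": (1, 180), "middle": (330, 420), "tail": (600, 698)}

def centre_of_mass(atoms, first, last):
    """Mass-weighted mean position of atoms with residue number in [first, last].

    atoms: iterable of (resid, mass, x, y, z) tuples (hypothetical layout).
    """
    selected = [a for a in atoms if first <= a[0] <= last]
    total_mass = sum(a[1] for a in selected)
    return tuple(sum(a[1] * a[axis] for a in selected) / total_mass for axis in (2, 3, 4))

def region_distances(mol_a, mol_b):
    """Euclidean distances between region centres of mass of two molecules."""
    coms_a = {name: centre_of_mass(mol_a, lo, hi) for name, (lo, hi) in REGIONS.items()}
    coms_b = {name: centre_of_mass(mol_b, lo, hi) for name, (lo, hi) in REGIONS.items()}
    return {f"{ra}-{rb}": dist(coms_a[ra], coms_b[rb]) for ra in REGIONS for rb in REGIONS}

# Toy molecules: the second is the first translated by 10 units along x.
mol_a = [(10, 12.0, 0.0, 0.0, 0.0), (20, 12.0, 2.0, 0.0, 0.0),
         (350, 12.0, 0.0, 5.0, 0.0), (650, 12.0, 0.0, 0.0, 9.0)]
mol_b = [(resid, mass, x + 10.0, y, z) for resid, mass, x, y, z in mol_a]
distances = region_distances(mol_a, mol_b)
```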
4.2.4 Determination of head-to-tail association
Large data sets, such as the one in this study, present with the problem of anno-
tation. As the current choice of docking software, HADDOCK, does not have the
capability to identify the manner of protein-protein association, each dimer must
be annotated outside the docking program. A semi-automated annotation process
was employed here, utilising a combination of principal component analysis (PCA)
and k-means clustering to determine the subpopulation of dimers that could be
classified as associating in a head-to-tail manner. Only the Euclidean distances
between the three centres of mass were factored into PCA at this stage, as it could
be hypothesised that this would be the most fundamental information required to
cluster structures according to their type of interaction. The k-means analysis was
conducted using various values of k to determine the optimal number of clusters
that would most accurately discretise head-to-tail associations along the PC axes.
The associations were validated using manual inspection of the dimers within each
cluster (Figure 4.2, A).
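The clustering step of this annotation process can be illustrated with a small Python sketch of k-means (Lloyd's algorithm). It is a generic textbook version under stated assumptions, not the R routine used in this work: the PCA step is omitted, initial centres are sampled from the data with a fixed seed, and the toy points stand in for vectors of inter-region distances.

```python
import random
from math import dist

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centres)."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda c: dist(p, centres[c])) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres

def within_ss(points, labels, centres):
    """Total within-cluster sum of squares, as used for an elbow plot."""
    return sum(dist(p, centres[l]) ** 2 for p, l in zip(points, labels))

# Toy distance vectors forming two well-separated groups.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
          [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]]
labels, centres = kmeans(points, k=2)
```

Repeating the call for a range of k values and inspecting each cluster by eye corresponds to the semi-automated procedure described above.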
4.2.5 Assembly of docking data
The majority of scripting was carried out using R. A full list of the packages
used for analyses can be found in Appendix 2. Global and domain-level solvent
accessible surface area (SASA) and hydrophobic SASA (H-SASA) were calculated
using VMD. Case statements were written to examine the effects of flanking amino
acid residues and cross-linking domain types (i.e. KA or KP).
4.2.6 Correlation
All numeric variables were examined with Spearman’s correlation prior to con-
ducting machine learning. Spearman’s correlation was selected over Pearson’s
correlation, as many variables examined here were ordinal rather than continu-
ous.
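For reference, Spearman's correlation is Pearson's correlation applied to ranks, with tied values sharing their average rank. The Python sketch below illustrates this definition; it is not the R code used for the analysis, and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, any monotonic relationship, linear or not, yields a coefficient of magnitude 1.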
4.2.7 Machine learning
The caret package [246] in R was used to carry out the majority of the machine
learning. The caret package is a library of machine learning algorithms,
from which logistic regression, boosted logistic regression, random forest, k-nearest
neighbours, and Naïve Bayes classifiers were selected for model comparison. The
recipes package was used for processing dimer data into a form that was suitable
for caret. Pre-processing steps included imputing missing values, and centring and
scaling all numeric variables. All categorical variables were one-hot
encoded, as this assists with decreasing multicollinearity.
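The pre-processing steps can be sketched in Python as follows. This is an illustration only and does not reproduce the recipes workflow: mean imputation is an assumption (the imputation method is not specified here), and the function names and example levels are hypothetical.

```python
from statistics import mean, stdev

def impute_centre_scale(column):
    """Fill None with the column mean, then centre to mean 0 and scale to unit SD."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)  # assumption: mean imputation
    filled = [fill if v is None else v for v in column]
    mu, sd = mean(filled), stdev(filled)
    return [(v - mu) / sd for v in filled]

def one_hot(column):
    """Expand a categorical column into one 0/1 indicator column per level."""
    return {level: [int(v == level) for v in column] for level in sorted(set(column))}

# Toy columns: one numeric feature with a missing value, one categorical feature.
scaled = impute_centre_scale([1.0, None, 3.0])
encoded = one_hot(["KA", "KP", "KA"])
```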
Machine learning was carried out with an 80:20 train/test split across all dimer
structures. This ensured that the train set would contain enough data to capture
the majority of the variance of the data, whilst the test set would contain enough
data to measure the validity of the trained model. The split was conducted in
a stratified manner to ensure that the distribution of the outcome (head-to-tail
association) was similar within the test and train sets. Five repeats of 10-fold
cross-validation were carried out to train the model.
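The splitting scheme can be illustrated with a short Python sketch. It is not the caret implementation: the function names are illustrative, the seed is arbitrary, and the cross-validation folds here are drawn by simple shuffling rather than being stratified.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split indices so each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

def repeated_kfold(indices, k=10, repeats=5, seed=0):
    """Yield (fit_indices, held_out_indices) for each fold of each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        for fold in range(k):
            held_out = shuffled[fold::k]
            held = set(held_out)
            yield [i for i in shuffled if i not in held], held_out

# Toy outcome: 10 head-to-tail dimers (1) among 100 structures.
labels = [1] * 10 + [0] * 90
train, test = stratified_split(labels)
folds = list(repeated_kfold(train, k=10, repeats=5))
```

With a minority outcome, stratifying the split keeps the proportion of head-to-tail dimers comparable between the two sets, which is the motivation given above.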
4.3 Results
4.3.1 Semi-automated annotation of head-to-tail association
The most average tropoelastin conformations from the three most populated struc-
tural clusters of wild type tropoelastin’s REMD ensemble were extracted for dock-
ing. These were labelled TE1, TE2 and TE3 after their respective clusters (Figure
4.1, A). TE1 was most similar to the global shape of tropoelastin that has been
previously observed in SAXS/SANS studies [59]. The N-terminus of TE2 was
displaced relative to TE1 and folded towards the spur region of the molecule.
The N-terminal displacement in TE3 was milder, and the molecule appeared longer
due to its extended C-terminal region. The three most populated
structural clusters cumulatively accounted for over 20% of the total structures of
the ensemble, and thus were likely to represent the dominant states occupied by
tropoelastin.
Figure 4.1: Conformations and regions of tropoelastin used in this study. A) Tropoelastin conformations, TE1, TE2 and TE3, from replica exchange molecular dynamics at 310 K [61] selected for docking analysis (top panel). The direction of the N and C termini are labelled. The conformations were derived from the three most populated clusters of the tropoelastin ensemble, the distribution of which is depicted below the conformations (bottom panel). B) The regions of tropoelastin selected for distance calculations for dimer studies, with the head (red), middle (pink) and tail (blue) regions highlighted on TE1.
A total of 19,500 dimers arising from the explicit solvent stage of docking were
considered for this study. These required annotation before logistic regression
analysis, as it is a supervised type of machine learning requiring data labels.
Annotation was carried out using a combination of k-means clustering and PCA of
the distances between the head, middle and tail regions of the dimers (Figure 4.1,
B), as well as manual inspection for final validation. Only distance measurements
were used as input for k-means clustering, as the energy terms would be later used
for machine learning and their inclusion may have, thus, contributed to some bias
during subsequent analyses. The initial number of k-clusters selected was 4 based
on prior coarse-grained tropoelastin coacervation studies that identified four broad
types of associations: head-to-head, head-to-tail, tail-to-tail and lateral [78].
Figure 4.2: Clustering and annotation of the dimers generated through tropoelastin-tropoelastin docking. A) PCA of the dimers derived from distance measurements between the head, middle and tail regions of each molecule. The distribution of structures according to k-means clustering within PC1 and PC2 are shown and the colours represent individual clusters arising from k = 4. The surrounding dimers flanking the PCA arose from the same clusters, and can be described as lateral inverted (top left), head-to-tail (bottom left), tail-to-tail (top right), tail-to-middle (bottom right). B) The amount of variance explained by each principal component. Examination of optimal k-cluster number via C) total within sum of squares, D) the silhouette method, and E) gap statistics.
PCA of the distances was conducted to visualise the projection of k-clusters in
principal component (PC) space (Figure 4.2, A). The arch shape of the PC1-
PC2 plot indicated that the dimers were non-linearly related to one another, and
had most likely arisen due to the subset of docking sites that were selected. PCs 1,
2 and 3 respectively accounted for 48%, 25% and 21% of the variance of the data
set, and cumulatively described > 90% of the variance of the measurements, and
thus, were used for subsequent analyses (Figure 4.2, B). Manual inspection of
the clusters arising from k = 4 and their corresponding structures yielded mixed
populations of structures, with a single cluster containing inverted lateral, head-
to-tail, tail-to-tail and tail-to-middle dimers (Figure 4.2, A). This indicated that
a k of 4 was insufficient to correctly discretise the dimers by association.
The optimal value of k was inspected using various statistical methodologies, in-
cluding the total within sum of squares, silhouette and gap statistics methods.
The total within sum of squares analysis generated an elbow plot, where the op-
timal number of clusters was indicated by the elbow (or “hinge”) of the curve. In
the case of the current data, however, the point of the elbow was unclear, appear-
ing to range between 3 and 5 (Figure 4.2, C). The silhouette and gap methods
recommended a k of 6 or 5 respectively (Figure 4.2, D – E); however, inspection
of the resultant clusters still yielded mixed populations of dimers.
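Of the methods above, the silhouette method is the simplest to state: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the silhouette width is (b - a) / max(a, b). The Python sketch below computes the mean silhouette width for a given labelling. It is a generic illustration rather than the R routine used here, and it assumes at least two clusters.

```python
from math import dist

def mean_silhouette(points, labels):
    """Mean silhouette width of a clustering (requires at least two clusters)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to other members of the same cluster (self distance is 0).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy example: a sensible labelling scores higher than a mixed one.
points = [[0.0], [1.0], [10.0], [11.0]]
good = mean_silhouette(points, [0, 0, 1, 1])
bad = mean_silhouette(points, [0, 1, 0, 1])
```

A high mean width indicates compact, well-separated clusters in distance space only, which is consistent with the observation that statistically recommended values of k still produced mixed association types on inspection.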
Through incremental increases in the value of k and manually checking the resul-
tant clusters, it was noted that a cluster number of 12 yielded discrete populations
of dimers that could be classified into a single type of association (Figure 4.3, A).
The types of dimers that arose could be broadly classified as either head-to-head,
head-to-middle, head-to-tail, inverted lateral, tail-to-middle or tail-to-tail.
Figure 4.3: Final clustering and annotation of dimers. A) Clusters from k-means analysis overlaid onto PCA of distance measurements, where k = 12. B) Assignment of dimer association and consolidation of clusters by association type. The directionality of the distance measurements’ contributions to the PCA are displayed. The data is displayed in PC1-PC2 (left) and PC2-PC3 (right) space.
Multiple clusters could be assigned the same type of interaction; thus, it is likely
that differences in their overall conformation resulted in their separation during
clustering (Figure 4.4). For example, clusters 7 and 8 were both classed as
head-to-tail; however, cluster 7 consisted of structures that had more contact between
the N- and C-termini relative to those of cluster 8. Clusters that fell within the
same broad classification were grouped for subsequent analyses and then projected
onto PC1-PC2 and PC2-PC3 space to verify that they formed discrete clusters
(Figure 4.3, B). After consolidating the clusters, it was evident that the largest
amount of PC1-PC2 and PC2-PC3 space was taken up by tail-to-middle clusters,
which suggested a large amount of structural variance. In comparison, head-
to-tail clusters comprised a small portion of PC space, indicating less variety
within the distance measurements of these structures. The dominant distance
measures that were responsible for PC1 and PC2 were examined to yield insights
as to which measurements were the most important for k-means clustering. PC1
was primarily described by the tail-to-tail and head-to-head distances, whilst the
largest contribution to PC2 arose from the head-to-middle distance. Examination
of the association clusters within PC space showed that head-to-tail dimers were
similar to lateral inverted dimers along PC1-PC2 space, which can be intuitively
explained by the requirement for both these groups that at least one head and one
tail of each tropoelastin molecule must be in contact with each other.
Of note is that the head-to-tail structures here do not strongly resemble those
previously proposed [59]. However, some structures that could be
classed as head-to-middle appeared to better resemble the head-to-tail model,
as the head of one tropoelastin neatly fit the groove within the middle of the
other (Figure 4.4). Furthermore, the tail-to-middle structures also appeared to
potentially be an altered version of the head-to-tail model, albeit with one
tropoelastin molecule almost orthogonal to the other.
Figure 4.4: Structures of dimers arising from k-means clustering. Examples of dimers from each k-cluster are shown, with their assigned association type labelled and their N and C termini labelled.
4.3.2 Overview of dimer associations by starting conformation and study type
The majority of structures generated via HADDOCK were categorised as
head-to-middle (29.08 %) or tail-to-middle (26.35 %). The next most common dimer
associations were head-to-head (17.29 %) and head-to-tail (10.51 %), followed by inverted
lateral associations (9.96 %). Tail-to-tail associations were the least common type
of interaction noted, comprising only 6.81 % of the data (Figure 4.5, A).
Figure 4.5: Dimer association by initial tropoelastin conformation. A) The number of structures per association type when in the context of the initial tropoelastin conformation used for docking. B) PCA of the distance measurements shown by tropoelastin conformation.
The frequency of association type was further dissected by the three starting
conformations of tropoelastin, TE1, TE2 and TE3, to assess their preferences
for particular types of dimer association (Figure 4.5, A). The most frequently
observed association mode for TE1 and TE2 was head-to-middle, whilst TE3 pre-
dominantly formed tail-to-middle associations. Approximately uniform numbers
of head-to-tail and head-to-head structures resulted between each tropoelastin
conformation. However, substantial differences in numbers due to tropoelastin
conformation were noted within the other types of dimers, with greatly differing
numbers of head-to-middle, inverted lateral and tail-to-tail dimers arising from the
three conformations of tropoelastin. When dimers arising from the three conformations
of tropoelastin were projected onto the PC1-PC2 space resulting from the distance
measurements, as previously described, the three conformations showed some overlap,
with higher densities of dimers occurring in the regions expected from their association types.
Taken together, this indicates a clear preference in association type on the basis
of the starting conformation. As TE1 represents the most prevalent structure in
solution at 310 K, and most frequently formed head-to-middle dimers, it is possible
that the majority of initial dimers during the early stage of elastin assembly
associate in a head-to-middle manner. Despite this, it is also evident that there is
a mix of dimers present, which is similar to previous findings [78].
4.3.3 Overview of dimer associations by native or synthetic origin
The cross-links used to drive dimer formation were derived from either native or
synthetic elastin studies. To understand whether these two cross-link sources could
affect the type of dimers formed, the distribution of structures stratified by
cross-link type was next examined. Approximately the same number of tail-to-tail and
inverted lateral dimers arose from native and synthetic studies; however, differences
existed between all other groups (Figure 4.6, A). In particular, the number
of head-to-tail structures resulting from native studies was small (< 100), indicating
that synthetic studies may be predisposed to forming head-to-tail interactions.
A similar trend was observed for tail-to-middle dimers, whilst head-to-middle asso-
ciations occurred predominantly when residues from native studies were supplied.
No dimers that could be classified as head-to-head were observed from synthetic
studies.
Examination of study type by PCA demonstrated areas of overlap that corre-
sponded to the inverted lateral, tail-to-middle, head-to-middle and head-to-tail
regions observed previously (Figure 4.3, B), as also reflected by Figure 4.6, B.
The synthetic studies in the lower right quarter of the PCA showed that they
largely overlapped with the tail-to-middle region previously described, whilst the
lower left corner shows that head-to-middle dimers mostly arose from native stud-
ies, showing overall disparity between the types of dimers generated between study
types.
Figure 4.6: Dimer association by native or synthetic elastin study. A) The number of structures per study type when in the context of the initial tropoelastin conformation used for docking. B) PCA of the distance measurements shown by study type.
4.3.4 Structures arising from the canonical cross-link
To more deeply explore the head-to-tail model of elastin assembly, dimers arising
from interactions between domains 10, 19 and 25 were specifically examined, as
these domains form the basis of the model. Unexpectedly, it was found that TE1
and TE3 did not form head-to-tail dimers through these domains, and only one
structure formed by TE3 involving domains 10 and 19 could be classed as such
(Figure 4.7). The dominant type of association was head-to-middle (59.75 %),
followed by tail-to-middle (15.18 %), inverted lateral (12.51 %) and tail-to-tail
(12.47 %) interactions. Inspection of the locations of domains 10, 19 and 25
revealed why this is the case. With respect to the geometry of tropoelastin, it would
be unlikely for domains 10, 19 and 25 to participate in head-to-tail style interactions
due to the positioning of their lysines; however, it is conceivable that subsequent
interactions after nucleation events may allow such interactions. Interestingly,
none of the dimers examined here supported the hypothesis that two tropoelastin
molecules alone could be responsible for the trifunctional cross-link amongst
these regions; however, it is possible that during coacervation some molecules may
shift to accommodate this bond. Furthermore, the exposure of the non-interacting
lysines positions them for further association, thereby allowing propagation of the
growing nascent elastin chain.
Figure 4.7: Dimer associations arising from the interactions between domains 10, 19 and 25. Association type is displayed with respect to initial tropoelastin conformation. The dimers represent interactions between domains 19 and 25 from TE1 (left), 10 and 19 from TE2 (middle), and 10 and 25 from TE3 (right). The N and C termini of each tropoelastin molecule are indicated. The lysines within domains 10 (purple), 19 (cyan) and 25 (yellow) are shown.
4.3.5 Electrostatic interactions of dimers
Electrostatic interactions play a key role in protein-protein interactions and asso-
ciation rates. Due to their long range of interaction, electrostatics are recognised
as a crucial factor in early stage protein-protein interactions, especially prior to
two molecules forming contact, as the contribution of other forces at this stage is
near-zero. As this study models tropoelastin interactions at the dimer stage, it
can be assumed that, despite the close proximity of tropoelastin molecules, the
distribution of positively charged lysines strongly influences the dimerisation process.
Indeed, tropoelastin contains 35 lysines that are positively charged at physiological
pH.
The electrostatic energy of the head-to-tail structures significantly differed from
head-to-middle, inverted-lateral and tail-to-tail structures, but bore similarities to
head-to-head and tail-to-middle structures (Figure 4.8, A).
Figure 4.8: Electrostatic interactions of tropoelastin dimers. A) Electrostatic energy by the type of dimer interaction. B) Distribution of electrostatic energy by initial tropoelastin configuration. C) Surface potential of the tropoelastin structures used in this study, displaying positively (blue) and negatively (red) charged areas. Wilcoxon’s rank sum test with Bonferroni post-hoc corrections are indicated with respect to head-to-tail conformations, where *** indicates p ≤ 0.001.
Curiously, dissecting electrostatic energy by tropoelastin conformation revealed
that the dimers formed by TE3 had overall lower energy relative to those formed
by TE1 and TE2 (Figure 4.8, B). This could partly be due to the relatively large
proportion of inverted lateral structures formed by TE3 in comparison to TE1 and
TE2 (Figure 4.5), which had the lowest electrostatic energy of the dimers (Figure
4.8, A). Inspection of the electrostatic potential of the surface of each initial
tropoelastin conformation revealed that TE3 contained fewer hotspots of positive
surface potential, which may contribute to fewer electrostatic clashes and, thus,
lower the electrostatic energy of these dimers.
4.3.6 Surface area and solvent accessibility of dimers are driven by tropoelastin conformation
Buried surface area (BSA) is an important consideration for protein-protein
interactions that comes into play after the initial electrostatic interaction has brought
the molecules together. Hydrophobic residues, in particular, contribute to
buried surface area due to their energetically unfavourable
interactions with water molecules, and play a key role during elastin assembly by
facilitating aggregation [96]. Examination of the total BSA of the dimers revealed
that head-to-tail dimers had the lowest BSA out of all the dimer types (1224.41
± 420.87 Å²), whereas tail-to-tail dimers had the highest BSA (1497.96 ±
463.08 Å²) (Figure 4.9, A). When probed in the context of starting tropoelastin
conformation, buried surface area appeared to be broadly similar across all three
conformations (Figure 4.9, B). Of note is that dimers formed with TE2 had a
higher average BSA (1452.19 ± 447.83 Å²) relative to TE1 (1370.25 ± 431.85 Å²)
and TE3 (1321.41 ± 421.16 Å²), which indicates a larger area of interaction that
may contribute to forming more stable dimers.
Figure 4.9: Buried surface area and hydrophobic solvent accessible surface area of the dimers. Buried surface area by A) association type and B) initial tropoelastin configuration, and hydrophobic solvent accessible surface area by C) association type and D) initial tropoelastin configuration. Wilcoxon’s rank sum test with Bonferroni post-hoc corrections are indicated with respect to head-to-tail conformations, where *** indicates p ≤ 0.001.
Tropoelastin aggregation and eventual self-assembly occur due to the association
of hydrophobic domains, and thus, the dimers were examined in the context of
hydrophobic solvent accessible surface area (SASA). Head-to-tail dimers had the
second highest hydrophobic SASA (45094.95 ± 323.03 Å²) and were exceeded only
by head-to-head interactions (45133.83 ± 341.85 Å²) (Figure 4.9, C). This is
an intriguing finding, as favourable association of hydrophobic domains is crucial
for elastin assembly. As such, the current findings indicate that assembly may
not initially proceed in a head-to-tail manner. However, when considering the
requirement of exposed hydrophobic domains for continued assembly, the
head-to-tail dimers may present with a large amount of hydrophobicity that favours
further interactions. Thus, it is difficult to predict with certainty whether
head-to-tail dimers are ideal for assembly.
When hydrophobic SASA was examined according to the initial tropoelastin
structures, it was noted that TE2 had a higher mean hydrophobic SASA (45237.74 ±
320.43 Å²) when compared to TE1 (45013.06 ± 294.56 Å²) and TE3 (44944.43 ±
301.05 Å²) (Figure 4.9, D). The low hydrophobic SASA of TE3 dimers indicates
that they are favourable for interactions with water as they have fewer hydrophobic
residues exposed to solvent. However, the need for the hydrophobic force to drive
elastin self-assembly is so crucial that it is possible that the dimers with lowered
hydrophobic SASA formed by TE3 may not aggregate as rapidly as those formed
by TE1 and TE2. Indeed, TE2’s larger hydrophobic SASA is likely to prime it
for further interactions during coacervation and facilitate more rapid aggregation
in comparison to TE1 and TE3. When taken together with the larger BSA of the
TE2 dimers in comparison to those from TE1 and TE3, it is probable that TE2
forms the most stable dimers out of the three conformations examined, and that these
are more amenable to further interactions with other tropoelastin molecules.
4.3.7 Correlation of dimer energies and features
The correlation between numeric variables was examined prior to conducting
machine learning, as collinearity of features leads to model bias via redundancy.
Spearman’s correlation was selected as the data presented previously, such as the
electrostatic energy, were not normally distributed (Figure 4.8, B).
Figure 4.10: Correlation analysis of the numeric features of the dimer data set. A) Heatmap correlation plot displaying the magnitude and strength of the correlation. Spearman’s correlation is shown in the cell at each intersection between the features, with values only being displayed where p ≤ 0.05.
The correlation between the energy terms and solvent accessible surface areas was
examined prior to conducting logistic regression, as variables with high collinear-
ity do not improve model accuracy and introduce redundancy into the algorithm.
It was observed that none of the variables were highly correlated, with the ex-
ception of total energy and electrostatic energy which had a correlation of +0.95
(Figure 4.10). This was not unexpected when considering the above analysis of
electrostatic energy and its contribution to the overall dimer systems considered
for this analysis. Before proceeding to logistic regression, total energy was re-
moved, as it is the sum of all energy terms in this analysis, and as such, was likely
to be redundant in the context of machine learning. Another correlation that
could be intuitively interpreted included the negative correlation between BSA
and desolvation energy, as desolvation involves the decoupling of bound proteins,
which would decrease BSA. No other features were removed before proceeding to
machine learning.
4.3.8 Machine learning model selection using energy and surface features
Machine learning was undertaken to understand whether the energy and surface
features of the data set would be able to make a model that could sufficiently
predict the head-to-tail outcome. A logistic regression model was first constructed using
elastic net regularisation. This model yielded a sensitivity of 0.28 and a specificity
of 0.97 that corresponded to the true positive and true negative rates respectively,
thus indicating that the model was a poor predictor of the head-to-tail outcome
(Figure 4.11, A). This finding was intriguing, as prior statistical inference hinted
that there were discernible differences between head-to-tail dimers and the other
types of dimers.
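For orientation, the elastic net penalty can be sketched as follows. This assumes the widely used parameterisation in which a regularisation strength λ scales a mixture of the ridge (squared L2) and lasso (L1) penalties, with a mixing parameter α of 0 giving pure ridge and 1 giving pure lasso. The function name and the coefficient values are hypothetical, and the exact form used by the fitting software may differ.

```python
def elastic_net_penalty(coefficients, lam, alpha):
    """Penalty = lam * ((1 - alpha) / 2 * sum(b^2) + alpha * sum(|b|)).

    Assumed convention: alpha = 0 is pure ridge, alpha = 1 is pure lasso.
    """
    l1 = sum(abs(b) for b in coefficients)
    l2_squared = sum(b * b for b in coefficients)
    return lam * ((1.0 - alpha) / 2.0 * l2_squared + alpha * l1)


# Hypothetical coefficients, for illustration only.
beta = [0.5, -2.0, 0.0]
ridge_only = elastic_net_penalty(beta, lam=1.0, alpha=0.0)
lasso_only = elastic_net_penalty(beta, lam=1.0, alpha=1.0)
mixed = elastic_net_penalty(beta, lam=1.0, alpha=0.5)
```

The penalty is added to the logistic loss during fitting, so that larger values of λ shrink the coefficients more strongly, while α controls how much of that shrinkage drives coefficients exactly to zero.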
The first alternative route that was explored was whether other models would fit
the data better than logistic regression. To test this, a number of classification
algorithms that are readily available through the caret package of R were applied,
including k-nearest neighbours (k-NN), Naïve Bayes, random forest, and boosted
logistic regression. The sensitivity yielded by all trained models was low (< 0.3).
Among the alternatives to the initial logistic regression model, k-NN yielded the
highest sensitivity (0.17), whilst the Naïve Bayes classifier yielded the lowest
(0.008) (Figure 4.11, A). Therefore, none of these models improved on the
sensitivity of the initial logistic regression.
Figure 4.11: Assessment of optimal machine learning model for the energy and surface area features from the dimer data set. A) Comparison of sensitivity and specificity of trained models using regularised logistic regression, naïve Bayes, random forest, boosted logistic regression and k-nearest neighbours classifiers. B) Projection of head-to-tail dimers onto PCA of data set.
The second possibility that could explain the poor sensitivity of the previously
described models was that despite the statistical significance between features
such as electrostatic energy and hydrophobic SASA, the overlap in energy and
surface area features previously described was too great for the model to be able
to conduct classification. This was examined in greater detail by projecting head-to-tail
structures onto PC1-PC2 and PC2-PC3 space generated by the features input
into the models (Figure 4.11, B). This demonstrated an approximately even
dispersion of head-to-tail and non-head-to-tail structures throughout, indicating
that constructing a model using energy terms alone was inadequate to conclusively
predict head-to-tail associations.
4.3.9 Logistic regression of whole dimer data set
To improve on the initial model, logistic regression was next conducted on data
containing all features other than total energy and the distance measurements
used previously in k-means clustering. The best trained model had a sensitivity
of 0.77 (6,288 out of 8,200 head-to-tail dimers were correctly identified), and a
specificity of 0.98 (68,257 out of 69,805 non-head-to-tail dimers were correctly
identified), with hyperparameters of λ = 1 and α = 0 (Figure 4.12, A and B).
This indicated that pure ridge regularisation was the best fit for the model, despite
the option for the model to utilise elastic net regularisation. The performance
of the model was assessed on the test set, which yielded an AUC of 0.971 and
overlapped very well with the train set (Figure 4.12, C). This indicated that the
model was neither overfit nor underfit to the data.
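The reported sensitivity and specificity follow directly from the confusion matrix counts quoted above, as the short check below shows. The false negative and false positive counts are derived here by subtraction from the stated class totals, and the function names are illustrative.

```python
def sensitivity(tp, fn):
    """True positive rate: fraction of actual positives that were identified."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: fraction of actual negatives that were identified."""
    return tn / (tn + fp)


# Counts quoted in the text for the best trained model.
head_to_tail_total = 8200
tp = 6288
fn = head_to_tail_total - tp      # derived by subtraction

other_total = 69805
tn = 68257
fp = other_total - tn             # derived by subtraction

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Because head-to-tail dimers make up only a small fraction of the data set, sensitivity is the more demanding of the two measures here; a classifier that labelled every dimer as not head-to-tail would still achieve a specificity of 1.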
Figure 4.12: Logistic regression model performance of extended dimer data set. A) Optimisation of model sensitivity using regularisation (λ) and mixing (α) hyperparameters between 0 - 1. B) Confusion matrix of best trained model, showing the amount of true positive (TP), false positive (FP), true negative (TN) and false negative (FN) outcomes. C) Model performance shown as area under the curve (AUC) of the fraction of true positive and false positive outcomes. Train and test data sets are indicated in comparison to the performance of a classifier that is equal to that of random chance (AUC = 0.5). D) Variable importance of the features used in the best model.
Variable importance was next examined to understand which features were
predominantly used by logistic regression to predict head-to-tail dimers. The five
features assigned the highest importance were all domain numbers; for example,
domains 8 and 12 were ranked as the two most important features (Figure
4.12, D). An intuitive interpretation of this would be that these domains are
located around the head region of tropoelastin, and thus, are essential for a head-
to-tail interaction. Similarly, domains from the lower regions, particularly the
spur and foot areas such as 23 and 29, were also ranked highly, as these make
up the “tail” component of the interaction. The TE2 conformation was the 7th
most important feature, which was logical as the TE2 conformation appeared to
form more head-to-tail structures relative to TE1 and TE3, as previously dis-
cussed (Figure 4.5, A). However, its importance value of 10 was considerably
smaller compared to those of the domains ranked above it, indicating that it was
not used extensively in the prediction. Desolvation energy and buried surface area
ranked 11th and 13th in terms of importance, most likely for similar reasons as to
those given for the TE2 conformation. Interestingly, whether the domains were
KA or KP type ranked lower in variable importance. KP domains predominantly
exist within tropoelastin’s N-terminal region rather than at its C-terminus; however,
KA domains are also known to exist at the N-terminus (for example, domain 13).
Therefore, it is possible that logistic regression did not consider the type of
domain to be important for this classification. The contribution of domains that
were not noted to be involved in head-to-tail interactions, such as 19 and 29, was
considered negligible by logistic regression, as were additional energy terms such
as van der Waals forces.
4.4 Discussion
The field of research investigating tropoelastin’s interactions during assembly and
its regions of interaction within mature elastin has been greatly hindered due to
tropoelastin’s repetitive sequence. Elastin peptides arising from protein fragmen-
tation during mass spectrometry often contain short sequences such as AK or
AAK which cannot be unequivocally pinpointed to a single site within
tropoelastin [34, 111]. For example, Hedtke and colleagues have identified domains 17,
19, 27 and 31 as potential binding partners for domain 8 based on short peptide
fragments, however, it is unknown whether domain 8 is cross-linked to all or a
subset of these domains, nor the frequency with which any of these interactions
may occur [34]. To date, only three studies have unambiguously determined cross-
linking sites within native elastin, resulting in a handful of cross-links that provide
the only clues as to tropoelastin-tropoelastin interactivity within native elastin.
Therefore, studies were taken into account if they at least pinpointed an interac-
tion between two domains to avoid the generation of what may be physiologically
irrelevant dimer conformations. Considering the dense nature of elastin and its
high degree of interconnectedness [96], it is likely that many cross-links between
multiple domains await discovery, and as such, the dimers examined here are only
a representation of what is currently known. Future studies could expand the
number of structures examined by using the ambiguous peptide sequences to
drive the docking; however, caution must be taken when interpreting their
biological relevance.
The model of elastin assembly has been an area of debate for a number of decades.
The initial model of assembly proposed was based on tropoelastin’s low resolution
SAXS/SANS envelope [59] and the only native cross-link that had been identi-
fied at the time [53]. As coacervation is a rapid process, it was postulated that
tropoelastin assembles in an ordered fashion, and indeed, since the SAXS/SANS
envelope presented with a single global shape, there was little evidence to suggest that
tropoelastin could assemble through multiple mechanisms. However, a paradigm
shift within the field has occurred over the past years through several parallel
studies. Extensive MD simulations revealed that tropoelastin is a highly flexible
protein that exists in a stochastic ensemble of conformations [61, 62]. Concur-
rently, mass spectrometry studies on a variety of organisms revealed that elastin
is heterogeneously cross-linked [34, 111], hinting that tropoelastin’s flexibility is
not only important for elastin’s mechanical resilience, but also plays a role during
coacervation and the positioning of molecules. Most recently, coarse-grained MD
revealed that tropoelastin initiates self-assembly by nucleation events, which occur
via a broad spectrum of interactions, including head-to-head, head-to-tail, tail-to-
tail and lateral interactions [78]. These interactions persist through to at least
the nascent fibril stage of elastogenesis. The current findings support the above
data, demonstrating that the previously identified cross-linking sites are capable
of interacting through six distinct types of association during the early stages of
assembly. This study unites the full-atomistic model with available data from a
variety of sources, and utilises three of tropoelastin’s most prevalent structures
from its conformational ensemble. Due to computational constraints this study was
restricted to examining dimers; however, it can be speculated that the geometries
presented in this study would also persist throughout early stage assembly,
as indicated by similarly diverse MD simulations [62].
The precise head-to-tail conformation described by Baldock and colleagues was
not noted here, most likely due to the differences between the domain positions
of the full-atomistic structure of tropoelastin [61, 62], and the SAXS/SANS envelope
and accompanying approximate domain locations [59]. However, structures resem-
bling the initial head-to-tail model were seen, which were termed head-to-middle
in this study, due to the N-terminus of the first tropoelastin nestling into the
groove in the second molecule’s middle N-terminal “coil” region. These indicate
a mechanism of tandem association similar to that proposed by the head-to-tail
model, and importantly, made up the majority of the interactions in this study,
suggesting that they are the dominant form of interaction within early stage as-
sembly. Nonetheless, the wide spectrum of conformations generated here indicates
that elastin assembly is more complex than previously appreciated. The variety
of associations seen here is likely to propagate fibril formation through a number
of different commitment paths, thereby leading to highly complex branching
structures made up of multiple types of interactions and resulting in the downstream
heterogeneity described by cross-linking studies [34, 111].
The disparity in tropoelastin association arising from native elastin and synthetic
coacervation studies has not yet been explored. Here, differences were noted
between the proportions of dimer types that arose between residues that were
selected from native elastin in comparison to synthetically cross-linked human
tropoelastin studies. Discrepancies in dimer associations between the two broad
categories of studies are likely to have arisen due to bias within the methodolo-
gies of the studies. Native elastin studies rely heavily on protein fragmentation
using proteases, and thus, it is possible that particular areas of elastin are cleaved
more readily, which could explain the high proportion of head-to-middle structures
observed from native studies. Meanwhile, the synthetic study from which
interactive sites were obtained used BS3, a mid-length cross-linker that is longer
than a bifunctional elastin cross-link, and could therefore have cross-linked
tropoelastin molecules that do not normally associate in vivo [110]. Moreover, as
synthetic studies are conducted using recombinant human tropoelastin, this un-
modified form of tropoelastin has positive charges on all lysines which may propel
it to associate in a predominantly tail-to-middle manner as indicated by the cur-
rent results. Interestingly, recent coarse-grained molecular dynamics simulations
of early stage tropoelastin aggregation also indicate an enrichment of association
between middle and tail regions of the molecule that was most likely facilitated
by interactions between the large hydrophobic domains in these regions, such as
domains 18 and 20 [78]. As this study was based on the sequence of recombinant
tropoelastin, it is possibly closer to the results of the synthetic studies than those
of the native studies, however, it is difficult to say as the dimer associations were
not quantified.
A further source of variance within the cross-linking studies discussed above is
that the native elastin on which the current protein docking was based was not
exclusively human in origin; interactions between domains 4-12, 6-14
and 12-27 were derived from bovine elastin [111], whilst the canonical domain
10-19-25 cross-link is of porcine origin [53]. When taking tropoelastin’s sensitivity to
mutations [22,36,37,233] and modifications [247] into account within the context
of both global structure and coacervation, it is not difficult to assume that the
domains within tropoelastin from non-human sources are differently positioned
to those within human elastin, and thus, cross-linked regions from animal elastin
may not accurately reflect those within humans. However, despite differences in
sequences, it must also be noted that tropoelastin molecules across organisms must
share a number of commonalities that allow them to assemble into elastin fibres of
similar morphologies, regardless of the animal. Thus, the results here are likely to
reflect a number of possibilities available to tropoelastin during coacervation.
Overall, the current data suggests that the geometry of the dimer conformation is
dictated by the conformation of the initial monomer. Thus, it was a fitting discov-
ery that machine learning could not accurately predict head-to-tail associations
based on energy and surface area terms alone. The introduction of domains into
the algorithm resulted in a marked increase in model sensitivity, indicating the
importance of domain positioning and overall molecular geometry during
coacervation. This validates the approach of this chapter to examine dimer formation
using multiple starting conformations and simplifies further modelling and
prediction of association types based on cross-linking data.
Chapter 5
Interactions of tropoelastin with integrins
This chapter has been submitted to the Biophysical Journal as:
Ozsvar, J., Wang, R., Tarakanova, A., Buehler, M. J., Weiss, A. S., “Fuzzy binding model of molecular interactions between tropoelastin and integrin αvβ3”.
5.1 Introduction
Elastin is a key component of the mammalian extracellular matrix (ECM) that
imparts tissues with the ability to resist deformation during repeated stretch and
recoil cycles over the duration of an organism’s lifetime. The elastin polymer con-
sists primarily of its monomeric subunit, tropoelastin, which contains alternating
hydrophobic and hydrophilic domains. The hydrophilic domains are rich in ly-
sine and alanine residues (of which lysines are crucial for cross-linking) whilst the
hydrophobic domains mostly consist of repeats of glycine, proline, alanine and
valine [96]. Tropoelastin is involved in numerous cellular activities across differ-
ent cell types, including attachment, proliferation, chemotaxis, and differentiation
of fibroblasts, endothelial cells, smooth muscle cells, and multipotent progenitor
cells [248]. These interactions are largely facilitated directly between tropoelastin
and cell surface receptors including elastin binding protein [249], glycosaminogly-
cans [139], and integrins [121].
Integrins are a major class of cell surface adhesion receptors; they are heterodimers
comprising non-covalently bound α and β subunits, of which there
are 24 known combinations in vertebrates [155]. Structurally, the integrin subunits
consist of extracellular, transmembrane, and intracellular domains. The extracel-
lular headpiece is responsible for ligand binding, where ligand-induced conforma-
tional changes result in the headpiece opening to facilitate an active ‘open’ confor-
mation that allows outside-in signaling via intracellular domains tethered to the
cell cytoskeleton [56]. When bound to their ligands, integrins regulate crucial cell
functions including cell proliferation, angiogenesis, wound repair, developmental
signaling, immune responses, and tumorigenesis [250]. The diversity of specialized
biological processes regulated by integrins arises from a variety of unique subunit
pairings. An example of this specificity is through the RGD binding motif, often
found in ECM proteins such as fibronectin, vitronectin and laminin, which fa-
cilitates headpiece opening and signal propagation through large conformational
changes of αvβ3 [251] and α5β1 integrins [252, 253]. RGD-facilitated integrin ac-
tivation occurs at the interface between the β-propeller of the α subunit and βA
domain of the β3 subunit [152,153].
Integrins are of particular interest as they facilitate tropoelastin-based cell inter-
actions. Thus far, two regions within tropoelastin have been confirmed to bind in-
tegrins; the GRKRK sequence at the C-terminus binding with integrin αvβ3 [121],
and a second interactive site spanning the interface between domains 17 and 18
binding integrins αvβ3 and αvβ5 [123]. As tropoelastin does not contain the clas-
sical RGD sequence, nor any of the currently known ECM motifs recognized by
integrins, the sites on the integrin headpiece that interact with tropoelastin re-
main elusive. Moreover, the mechanisms for tropoelastin-based integrin activation
remain largely unknown. It is crucial to understand the nature of the tropoelastin-
integrin interaction, as the effects of cell attachment on tropoelastin have direct
consequences for elastic fiber and cell organization in three-dimensional tissue ar-
chitectures [254–257].
Here, a computational approach was undertaken to further explore the nature of
the interaction between tropoelastin and integrin αvβ3. Ligand-induced integrin
activation has been previously studied using computational molecular dynamics
(MD) modelling, and has shed light on the mechanisms of αvβ3 spontaneous acti-
vation and ligand-induced strain propagation in signaling [154, 258]. The confor-
mational changes of the extracellular domains, particularly within the headpiece
of αvβ3, have been further characterized through MD simulations with RGD-
containing fibronectin [259] and RGD derivatives [260]. To build upon this and
characterize the interactions between integrins and tropoelastin, this study has
leveraged the full atomistic structure of tropoelastin that was previously modelled
using extensive replica exchange molecular dynamics (REMD) simulations [61].
Although tropoelastin is a highly flexible protein, the computational model cor-
relates well with existing low-resolution structural data and possesses secondary
structural features in agreement with those indicated by circular dichroism and
molecular mutation studies [61]. These highlight the utility of REMD to effec-
tively predict the molecular structure of proteins with a high degree of disorder.
In this study, protein-protein docking was conducted between tropoelastin and
αvβ3, accompanied by extended REMD and structural refinement using classical
molecular dynamics (cMD), to dissect the interactions and the mechanism of the
subsequent tropoelastin mediated integrin activation.
5.2 Methods
5.2.1 Preparation of integrin headpiece structure
This study utilized the headpiece from the bent closed conformation of integrin
αvβ3 (Figure 5.1, A-B), which was extracted from the RCSB protein data bank
(PDB ID 1L5G) [152]. The headpiece consists of the β-propeller from the αv
subunit and the βA and hybrid domains from the β3 subunit. Importantly, the
αvβ3 headpiece contains the known receptor site for RGD ligands [261] which is
the origin of the molecular motions that activate the integrin.
Two separate integrin headpieces were modelled to respectively reflect medium
and low affinity binding states for comparative analysis. The affinity of the in-
tegrin for its ligand is regulated by the presence of divalent cations in the MI-
DAS, ADMIDAS and SyMBS pockets that are found close to the ligand binding
site [262]. The medium affinity headpiece was modelled by replacing all cations
with Mg2+ as this configuration is conducive to binding tropoelastin [123,262]
and is capable of facilitating spontaneous headpiece opening in silico [154]. The
low affinity headpiece was modelled by replacing all cations with Ca2+ (other than
at MIDAS, which was coordinated to Mg2+) as this configuration impedes com-
putational headpiece opening [154,258]. The medium and low affinity headpieces
were termed αvβ3-Mg and αvβ3-Ca respectively. Disulfide bridges were added be-
tween residues previously validated by crystallography to further prepare the αvβ3
headpiece for computational analysis [152].
5.2.2 Preparation of tropoelastin structure
Tropoelastin is a highly flexible molecule that undergoes extensive conformational
sampling [61, 62]. To increase the chance of obtaining structures that bind αvβ3
during both the protein-protein docking and MD stages, two structures from the
final 2 ns of previous simulations [62] were selected based on the accessibility of
the lysines in domain 17 for subsequent docking, as prior studies have indicated
their importance in tropoelastin-integrin interactions [123].
5.2.3 Tropoelastin-integrin configuration preparation
Initial configurations of the tropoelastin-integrin complex were generated from
αvβ3-Mg and tropoelastin using the High Ambiguity Driven protein-protein DOCK-
ing (HADDOCK) 2.2 web server [222]. HADDOCK was selected for this step as it
factors in interaction data, such as those from mutagenesis experiments, in instances
where the precise sites of interaction are unknown. Such user defined data were
used to define active residues, which are residues that are thought to be primar-
ily responsible for the interaction. The active residues of tropoelastin that were
selected to drive the docking were K286 and K289 from domain 17. These were
chosen based on prior molecular truncation and peptide studies (including point
mutagenesis and scrambled sequences) that demonstrate their importance in bind-
ing to αv integrins [123]. The residues from the integrin that were selected were
Met118 from αv, and Thr136 and Thr183 from β3. These have been demonstrated
to stay in prolonged contact with a non-RGD coated surface in silico and flank
the RGD-binding region [263]. Thus, by manoeuvring tropoelastin close to the
integrin’s β1-α1 loop, I aimed to reduce the time and resources required for later
REMD. Tropoelastin was not docked directly to the RGD-binding region to al-
low tropoelastin to explore multiple conformations around the binding sites and
interact with the site during REMD.
HADDOCK defines passive residues as solvent accessible residues that are in the
vicinity of the active residues [222]. Their proximity to the active residues allows
them to contribute to the overall interaction by providing local
biophysical/biochemical context for the interaction [264]. This allows HADDOCK to dock
structures based on a surface rather than a handful of residues, and is thus more
representative of protein-protein interactions. HADDOCK was allowed to
automatically define passive residues as those within 6.5 Å of the residues mentioned
above. It is possible that this may have resulted in more noise during the docking
stage; however, not selecting passive residues or overly restricting the surface
would have led to high bias toward particular structures that may not have taken
the local surface context of the interactive sites into account.
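The distance criterion for passive residues can be sketched as a simple neighbour search. This is only an approximation of the idea: it assumes an atom list of residue identifiers and coordinates, applies the 6.5 Å cut-off stated above, and omits the solvent accessibility filter that HADDOCK also applies. The function name and the toy coordinates are hypothetical.

```python
from math import dist


def residues_within(atoms, active_ids, cutoff=6.5):
    """Return residues with any atom within `cutoff` (in angstroms) of an active residue.

    atoms: list of (residue_id, (x, y, z)) tuples.
    Solvent accessibility is NOT checked in this sketch.
    """
    active_coords = [xyz for res, xyz in atoms if res in active_ids]
    nearby = set()
    for res, xyz in atoms:
        if res in active_ids:
            continue
        if any(dist(xyz, a) <= cutoff for a in active_coords):
            nearby.add(res)
    return sorted(nearby)


# Toy coordinates for illustration only; not taken from the tropoelastin model.
toy_atoms = [
    (286, (0.0, 0.0, 0.0)),
    (289, (5.0, 0.0, 0.0)),
    (285, (0.0, 6.0, 0.0)),    # 6 angstroms from residue 286
    (290, (9.0, 0.0, 0.0)),    # 4 angstroms from residue 289
    (300, (20.0, 0.0, 0.0)),   # too far from either active residue
]
passive = residues_within(toy_atoms, active_ids={286, 289})
```

Widening the cut-off enlarges the docking surface and reduces bias toward the chosen active residues, at the cost of the additional noise noted above.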
2000 steps of initial rigid body docking were conducted, and the 200 most ener-
getically favorable structures were refined in aqueous solvent using brief MD runs.
The resulting tropoelastin-integrin structures were clustered based on their frac-
tion of common contacts [265]. Since no tropoelastin-integrin structure exists for
assessing the biological relevance of the docked structures, we relied on the HAD-
DOCK score to select the most energetically favourable structures for subsequent
REMD. The score is comprised of the weighted contributions of electrostatics,
buried surface area, restraints violation energy, van der Waals forces, desolvation
energy, and the root mean square deviation of atomic coordinate distances away
from the lowest energy structure.
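A weighted score of this kind is a linear combination of the individual terms. The sketch below shows the arithmetic only: the term names, weights and energies are placeholders chosen for illustration and should not be read as the values used by the HADDOCK server, which are given in its documentation.

```python
def weighted_score(terms, weights):
    """Sum weight * value over every term named in `weights`."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"terms missing from structure: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


# Placeholder weights and energies, for illustration only.
example_weights = {"vdw": 1.0, "elec": 0.2, "desolv": 1.0, "restraint": 0.1}
example_terms = {"vdw": -40.0, "elec": -200.0, "desolv": 10.0, "restraint": 50.0}

score = weighted_score(example_terms, example_weights)
```

Lower (more negative) scores indicate more favourable structures, so candidate complexes can be ranked by sorting on this value.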
5.2.4 Molecular dynamics modelling
The choice of MD simulation is non-trivial for large protein complexes and flexible
proteins alike. In practice, most MD simulations are not ergodic due to the time
requirements needed to escape minima within the energy landscape. Furthermore,
when accounting for tropoelastin’s high flexibility, it is necessary to sample a range
of tropoelastin-integrin structures to understand the underlying mechanisms of the
interaction. REMD [214], an accelerated sampling methodology, was implemented
here using NAMD 2.11 [186].
REMD is appropriate for exploring the conformational sampling of proteins as
it facilitates the crossing of energy barriers that are otherwise difficult to over-
come using cMD. REMD has been previously used to explore the ensemble of
tropoelastin structures to understand its conformational sampling [61, 62] as well
as examine the interactions between integrins and their ligands [263,266]. REMD
takes a single starting structure as its input and simulates it across a range of
temperatures in parallel, thereby giving rise to replicas of the structure at differ-
ent temperatures. Replicas may be exchanged with other replicas in neighboring
temperatures via Monte Carlo moves, permitting the structure to sample a greater
portion of its overall energy landscape [214]. Thus, the initial structure undergoes
a greater extent of conformational sampling with REMD than with cMD.
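The Monte Carlo exchange step can be illustrated with the standard Metropolis criterion for temperature replica exchange, in which a swap between replicas i and j is accepted with probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T). The sketch below is a generic illustration rather than the NAMD implementation; the function name and example energies are hypothetical, and energies are assumed to be in kcal/mol.

```python
from math import exp

BOLTZMANN_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for swapping replicas i and j."""
    beta_i = 1.0 / (BOLTZMANN_KCAL * temp_i)
    beta_j = 1.0 / (BOLTZMANN_KCAL * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    if delta >= 0.0:
        return 1.0          # favourable swaps are always accepted
    return exp(delta)


# Hypothetical potential energies (kcal/mol) for two neighbouring replicas.
always = exchange_probability(-1000.0, 300.0, -1005.0, 310.0)
sometimes = exchange_probability(-1005.0, 300.0, -1000.0, 310.0)
```

A swap is always accepted when the colder replica currently holds the higher energy; otherwise the acceptance probability falls off exponentially with the energy gap, which is why neighbouring replicas need overlapping energy distributions.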
To implement REMD, a temperature distribution ranging between 280 – 480 K
was used with an exchange acceptance frequency (EAF) of 0.2 and an exchange
step of 5 ps to ensure adequate sampling. 48 replicas were sufficient to achieve
the EAF for the integrin headpiece in isolation, whereas 56 replicas were required
for tropoelastin-integrin simulations due to the greater number of atoms within
the protein complex. Non-bonded interactions were applied using an interaction
cut-off of 16 Å, a switch distance of 14 Å, and a pair-list distance of 18 Å.
The addition of water as a solvent to REMD systems vastly increases the amount
of required computational resources to run the simulation. Therefore, implicit
solvent was selected for REMD simulations, with a dielectric constant of 80, 0.15
M ion concentration, and an alpha cut-off of 15. Each replica was sampled for
8 ns to ensure a sufficient sampling depth [263]. The integrin headpiece and
the tropoelastin-integrin systems were simulated for a total of 384 ns and 448
ns respectively in REMD. The CHARMM22 force field with the CMAP peptide
backbone correction was selected for consistency with previous tropoelastin and
integrin simulations [61,62,154].
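The replica counts and aggregate sampling times quoted above can be reproduced with a short calculation. The temperature spacing is not stated in the text, so the geometric ladder below is an assumption (a common choice for REMD), and the function names are illustrative.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures spaced by a constant ratio between t_min and t_max (assumed spacing)."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


def total_sampling_ns(n_replicas, ns_per_replica):
    """Aggregate simulated time across all replicas."""
    return n_replicas * ns_per_replica


headpiece_ladder = geometric_ladder(280.0, 480.0, 48)
complex_ladder = geometric_ladder(280.0, 480.0, 56)

headpiece_total = total_sampling_ns(48, 8)   # integrin headpiece alone
complex_total = total_sampling_ns(56, 8)     # tropoelastin-integrin complex
```

With 8 ns per replica, 48 and 56 replicas give 384 ns and 448 ns of aggregate sampling respectively, matching the totals reported. Adding replicas over the same temperature range narrows the gap between neighbours, which helps larger systems maintain the target exchange rate.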
Ensembles each containing 400 structures were obtained from the last 2 ns of the
310 K temperature replica for each initial starting structure. These structures
were sorted into clusters using the k-means functionality of the MMTSB Tool Set
using an RMSD of 5 Å [238]. The representative structures from the most populated
clusters were selected for subsequent explicit solvent analysis.
As water impacts the secondary structure of tropoelastin, structural refinement
in explicit solvent was conducted using the representative structures obtained
from the most populated cluster resulting from k-means analysis of each ensem-
ble. Systems were solvated using VMD [237] and ionized with 0.15 M NaCl to
mimic physiological conditions. The final solvent box per experiment contained >
200,000 water molecules with a padding distance of 20 Å to prevent the proteins
from contacting themselves under periodic boundary conditions. All structures
were minimized for 25 ps using the conjugate gradient method before applying
harmonic constraints to the heavy atoms of the protein backbone and sidechains
to prevent the destabilization of the atoms during the initial simulation. The ex-
periments were heated up to 310 K, after which the harmonic constraints of the
sidechains were released, followed by the release of the backbone atoms to relax
the proteins. Next, the proteins were equilibrated for 100 ps in an isobaric
simulation, where a constant pressure of 1 atm was maintained using the Nosé-Hoover
Langevin barostat with a period of 200 fs and decay of 100 fs. Langevin dynamics
were utilized to maintain a physiologically relevant temperature of 310 K, with a
damping coefficient of 1 ps⁻¹. Non-bonded parameters were modelled using an
interaction cut-off of 12 Å, a switch distance of 10 Å and a pair-list distance of 13.5
Å. Electrostatics were regulated by a Particle Mesh Ewald summation with a grid
spacing of 1 Å. The isobaric equilibrations continued for 100 ns until convergence,
assessed through RMSD, was reached.
5.2.5 Analysis
Structural analysis consisted of two parts. Examination of ensembles arising from
REMD was conducted on the previously described 400 structures that arose from
the final 2 ns of simulation. Analysis of trajectories modelled using cMD was
carried out on the final 50 ns of simulation. Principal component analysis (PCA)
and local residue fluctuation analysis were carried out with Bio3D [267], ProDy
[217] and VMD [237]. The opening of the integrin headpiece was quantified by
calculating the distance between the centers of mass of residues 250 – 438 of the
αv β-propeller domain, and residues 55 – 108 and 354 – 434 of the β3 hybrid
domain. Other analyses of the structures resulting from both REMD and cMD
were conducted using custom Tcl/Tk, R, and MATLAB scripts.
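The headpiece opening metric reduces to a distance between two mass-weighted centroids. The sketch below illustrates that calculation in Python for consistency with the other examples in this rewrite; it is not the custom Tcl/Tk, R or MATLAB code used in the study, and the masses and coordinates are toy values.

```python
from math import dist


def center_of_mass(atoms):
    """Mass-weighted mean position of a list of (mass, (x, y, z)) tuples."""
    total_mass = sum(mass for mass, _ in atoms)
    return tuple(
        sum(mass * xyz[axis] for mass, xyz in atoms) / total_mass
        for axis in range(3)
    )


def com_distance(group_a, group_b):
    """Distance between the centers of mass of two atom selections."""
    return dist(center_of_mass(group_a), center_of_mass(group_b))


# Toy selections standing in for the propeller and hybrid domain residues.
propeller = [(12.0, (0.0, 0.0, 0.0)), (12.0, (2.0, 0.0, 0.0))]
hybrid = [(14.0, (1.0, 2.0, 4.0)), (14.0, (1.0, 4.0, 4.0))]

opening = com_distance(propeller, hybrid)
```

Applied frame by frame to a trajectory, this yields a distance series in which larger values correspond to the hybrid domain swinging away from the β-propeller.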
5.3 Results
5.3.1 Docking of tropoelastin to integrin αvβ3
Protein-protein docking was conducted to generate tropoelastin-integrin complexes
for subsequent MD modelling. The bent closed configuration (Figure 5.1, A)
crystal structure of integrin αvβ3 was selected to examine whether the integrin
could change into the extended open configuration during MD, as this structural
change is important for its activation [259, 268]. The headpiece of αvβ3 (Figure
5.1, B) was separately docked to two conformations of tropoelastin, TE1 and
TE2 (Figure 5.1, C), that were derived from prior REMD simulations at 310
K [61, 62]. The structures that arose from the final refinement stage in water
were clustered and ranked according to the most favorable energies. The most
energetically favorable structures (Figure 5.1, D) from the top ranked clusters
(Supplementary Table 1) were selected for REMD. These starting structures were
termed αvβ3-TE1 and αvβ3-TE2, and had differing alignments of tropoelastin rel-
ative to the integrin (Figure 5.1, D).
Figure 5.1: Overview of the structures used in this study. A) The integrin activation pathway depicting the large-scale conformational changes involved in transitioning from the closed to the open conformation. The headpiece subunits are highlighted with darker shades of colors. B) The headpiece of αvβ3 used in this study, depicting the αv (red) and β3 (blue) subunits and the residues selected for facilitating docking (yellow). Divalent cations are displayed as pink spheres. C) The conformations of tropoelastin used for docking and the locations of K286 and K289 from domain 17 (teal). The locations of the N- and C-termini of the polypeptide chain are denoted with N and C. D) αvβ3-TE1 and αvβ3-TE2, which were ranked as the top output structures from docking, were used as the starting structures for REMD.
5.3.2 Integrin headpiece opening and associated structural
changes with REMD
REMD in implicit solvent was implemented to generate an ensemble of structures
for αvβ3-Mg, αvβ3-Ca, αvβ3-TE1 and αvβ3-TE2. The probability distribution of
the potential energy of each replica overlapped with its neighbor across the tem-
perature range of 280 – 480 K (Figure 5.2). The overlap is particularly crucial as
it allows replica exchange with its neighbor to efficiently facilitate conformational
sampling [214]. It was noted that each ensemble contained a wide variety of struc-
tures across the temperatures. For example, the tropoelastin-integrin ensembles
included instances where αvβ3 and tropoelastin were not interacting, as well as
denatured structures that arose at the higher temperatures. This indicated that
the initial structures had undergone extensive structural sampling, which is a key
requirement for ensemble analysis.
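The dependence of replica exchange on energy overlap can be illustrated with a minimal sketch. The 56 replicas and the 280 – 480 K range are taken from this chapter; the geometric spacing of the temperature ladder, the energy units and the function names are assumptions made purely for illustration and may differ from the protocol actually used.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); unit choice is an assumption


def geometric_ladder(t_min, t_max, n_replicas):
    """Return n_replicas temperatures spaced geometrically from t_min to t_max."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]


def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance probability for swapping replicas i and j.

    e_i and e_j are the potential energies (kcal/mol) of the configurations
    currently held at temperatures t_i and t_j (K).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


# Hypothetical ladder matching the replica count and range quoted in the text.
temperatures = geometric_ladder(280.0, 480.0, 56)
```

When the energy distributions of neighboring replicas barely overlap, the energy difference between them is typically large and the acceptance probability becomes vanishingly small, which is why overlap is needed for efficient exchange.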
Figure 5.2: Potential energy frequency distribution of the 56 replicas across the ensemble of αvβ3-TE2. The structures above the distribution represent the various observable conformations during REMD and the exact replica that gave rise to each structure.
The major structural consequence of ligand binding to the integrin headpiece is
the swinging out of the β3 hybrid domain into an open conformation (Figure
5.1, A). To quantify this movement, the distance between the centers of masses
(COM) of the αv and β3 subunits was calculated in the structures within the
ensembles [153,154,263]. As the mechanistic opening of the hybrid domain is con-
sistent across β3 integrins, the open αIIbβ3 structure (PDB ID 2VDR) was used
as a benchmark for identifying an open configuration of αvβ3. It was observed
that the low affinity structure (αvβ3-Ca) was as conducive to headpiece opening
as the medium affinity structure (αvβ3-Mg) using REMD, with 39% and 40% of
structures respectively observed to be fully open when compared to the αIIbβ3
benchmark (Figure 5.3, A). When tropoelastin was introduced into the simula-
tions, αvβ3-TE1 shifted toward unopen and partially open conformations and the
percentage of open structures decreased to 5% (Figure 5.3, B). In comparison,
αvβ3-TE2 shifted towards more open conformations and the percentage of open
structures increased to 60%, suggesting that the TE2 configuration of tropoelastin
was more conducive to headpiece opening than TE1.
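The opening metric described above can be sketched as follows. The 64 Å threshold is the αIIbβ3-derived value quoted in Figure 5.3; the array layout, the function names and the choice to count a structure as open when its COM distance is at or above the threshold are assumptions for illustration only.

```python
import numpy as np

OPEN_THRESHOLD = 64.0  # Å, threshold derived from the open αIIbβ3 benchmark


def center_of_mass(coords, masses):
    """Mass-weighted mean position of an (n_atoms, 3) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    return (coords * masses[:, None]).sum(axis=0) / masses.sum()


def com_distance(coords_a, masses_a, coords_b, masses_b):
    """Distance between the centers of mass of two atom groups."""
    delta = center_of_mass(coords_a, masses_a) - center_of_mass(coords_b, masses_b)
    return float(np.linalg.norm(delta))


def fraction_open(distances, threshold=OPEN_THRESHOLD):
    """Fraction of structures whose COM distance meets the threshold (assumed >=)."""
    distances = np.asarray(distances, dtype=float)
    return float(np.mean(distances >= threshold))
```

Applying `com_distance` to the αv and β3 atoms of every structure in an ensemble, then passing the resulting distances to `fraction_open`, gives a percentage comparable in spirit to those reported above.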
A structural change associated with headpiece opening is the merging of the α1
and α1’ helices into an elongated α1 helix, which has been demonstrated to main-
tain helicity in both the open αIIbβ3 crystal structure [153] and computational
modelling [154, 258]. Here, it was observed that the α1 helix maintained full he-
licity more frequently in αvβ3-Mg than in αvβ3-Ca (Figure 5.3, C), which was
expected due to the propensity of Mg2+ to promote headpiece opening of αvβ3 over
Ca2+ [262]. Interestingly, the frequency distribution of α1 helicity of αvβ3-Mg was
broader than that of αvβ3-Ca, suggesting that the α1 helix in the presence of Mg2+
was capable of sampling a wider structural range. A comparison of α1 helicity
within the two tropoelastin-integrin ensembles revealed that α1 maintained less
helicity relative to either of the integrin ensembles without tropoelastin, as < 1%
of structures arising from αvβ3-TE1 and αvβ3-TE2 respectively displayed a fully
helical character in either simulation. However, the α1 helix of αvβ3-TE2 main-
tained 75% helicity within 53.8% of the structural ensemble, compared to only
37.5% of αvβ3-TE1. The α1 helix also goes from a bent to a straight alignment,
as seen in αIIbβ3 (Figure 5.3, C). This was most closely resembled by αvβ3-Mg,
followed by αvβ3-Ca.
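As an illustration of how helicity percentages of this kind can be tallied, the sketch below assumes that each structure has been reduced to a DSSP-style secondary structure string for the α1 helix residues, with `H` marking α-helical residues. The input format and function names are hypothetical and are not taken from the thesis.

```python
def helicity(ss_string, helix_codes=("H",)):
    """Fraction of residues assigned to a helical state in one structure."""
    if not ss_string:
        raise ValueError("secondary structure string is empty")
    return sum(1 for code in ss_string if code in helix_codes) / len(ss_string)


def fraction_at_least(ss_strings, cutoff):
    """Fraction of structures whose helicity is at or above the cutoff."""
    values = [helicity(s) for s in ss_strings]
    return sum(1 for v in values if v >= cutoff) / len(values)
```

For example, `fraction_at_least(strings, 0.75)` would report the share of an ensemble that retains at least 75% helicity.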
Figure 5.3: Integrin headpiece opening from the last 2 ns of REMD. Frequency distribution of the opening of the αvβ3 headpiece A) in isolation and B) with tropoelastin. Headpiece opening is quantified by the COM distance between the two integrin subunits. The dotted line at 64 Å indicates the threshold for the open conformation as derived from αIIbβ3. Complexes from the tropoelastin-integrin ensembles, depicting the αv (red) and β3 (blue) subunits and tropoelastin (gray), are shown to illustrate the degree of headpiece opening. C) The frequency distribution of α1 helicity across the experiments. Representative snapshots of the α1 helix from αIIbβ3 and all four experiments are overlaid onto the starting configuration of the α1 helix in grey. Across all frequency distributions, αvβ3-Mg is red, αvβ3-Ca is blue, αvβ3-TE1 is yellow and αvβ3-TE2 is green.
5.3.3 Areas of tropoelastin-integrin interaction
Protein-protein contact maps are frequently used to pinpoint interactive sites
by assessing the proximity of the residues of one protein to the residues of an-
other. Contact frequency maps were constructed here to understand which areas
of tropoelastin were in close proximity to αvβ3. Contact was defined as the
Euclidean distance ≤ 12 Å between α carbons of tropoelastin and the integrin
headpiece [269]. As structural ensembles were generated via REMD, the frequency of
contact between each α carbon was also considered.
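A minimal sketch of a contact frequency map under the definition above (α carbon distance ≤ 12 Å, averaged over the ensemble) is given below. The array shapes and the function name are assumptions for illustration; the analysis in this chapter may have been implemented differently.

```python
import numpy as np


def contact_frequency(ca_a, ca_b, cutoff=12.0):
    """Per-residue-pair contact frequency across an ensemble.

    ca_a: (n_structures, n_res_a, 3) α carbon coordinates of protein A (Å)
    ca_b: (n_structures, n_res_b, 3) α carbon coordinates of protein B (Å)
    Returns an (n_res_a, n_res_b) array with values between 0 and 1.
    """
    ca_a = np.asarray(ca_a, dtype=float)
    ca_b = np.asarray(ca_b, dtype=float)
    # Pairwise displacement vectors for every structure via broadcasting.
    diff = ca_a[:, :, None, :] - ca_b[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist <= cutoff).mean(axis=0)
```

Residue pairs with values above 0.7 in the returned map would correspond to the high-frequency hotspots discussed in this section.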
Both TE1 and TE2 contacted the β3 subunit more frequently than the αv subunit;
however, the αv subunit was contacted only by TE1 and not by TE2 (Figure 5.4,
A-B). It was observed that the C-terminal domains of TE1 maintained close prox-
imity to αv, but did not contact D224, which is the primary RGD binding residue
of αv (Figure 5.4, A). I focused further on the interactions between tropoelastin
and β3, as it is primarily ligand binding to β3 that promotes headpiece opening.
I observed that high-frequency (>70%) contact hotspots occurred between TE1
and β3 through domains 13, 17 – 19 and 26 (Figure 5.4, B). Meanwhile, β3
was frequently contacted by domains 7, 8, 10, 12, 16, 17, 20, 26 and 36 of TE2
(Figure 5.4, B). Multiple domains were capable of contacting the integrin in a
single structure, usually in conjunction with domain 17 (Figure 5.4, C). More of
TE2’s domains contacted αvβ3 at any given time than did TE1, due to its position
with respect to the integrin.
Figure 5.4: Frequency contact maps between the integrin subunits and tropoelastin. The frequency of contact between tropoelastin and A) the αv subunit and B) the β3 subunit. Greater contact frequency throughout the ensemble is indicated by the transition from white to black. The schematic at the top represents the domains of tropoelastin, where the hydrophobic domains are depicted in black and the cross-linking domains are depicted in white. C) Structures from REMD ensembles of αvβ3-TE1 (left) and αvβ3-TE2 (right) highlight multiple tropoelastin domains (numbered) that contact the integrin.
To understand whether tropoelastin shared the same binding site within β3 as
RGD, the contacts between the cell-interactive domain 17 and the RGD binding
site of β3 were dissected in further detail. The RGD binding site is located at the
β1-α1 loop of β3 and two of its residues, Y122 and S123, are responsible for binding
RGD’s aspartate [153]. Whilst TE1 predominantly contacted the β1-α1 loop close
to the RGD binding site, TE2 almost exclusively contacted the downstream α1
helix (Figure 5.5, A). To pinpoint the nature of the interactions between these
domains, salt bridge formation was examined, which demonstrated that D126 from
the end of the β1-α1 loop was capable of interacting with K289 (6.5% of structures)
and K286 (3.3% of structures) from TE1 and TE2 respectively (Figure 5.5, B).
It was also noted that hydrogen bonds occurred in both ensembles, with fewer
occurring in αvβ3-TE2 than in αvβ3-TE1 (Figure 5.5, C-D). Within αvβ3-TE1,
the tropoelastin residues A285, K286 and 288 were responsible for the majority of
the hydrogen bonds with β3 (Figure 5.5, C). Although the RGD-binding Y122
stayed in close proximity to TE1’s domain 17 (Figure 5.5, A), the small number
of hydrogen bonds indicated that Y122 was not heavily involved in binding domain
17 (Figure 5.5, C). Similarly, S123 did not form hydrogen bonds with domain
17 of TE1 (Figure 5.5, C) despite maintaining close proximity (Figure 5.5, A).
Within αvβ3-TE2, tropoelastin residues V272 and K286 formed the most frequent
hydrogen bonds with β3 (Figure 5.5, C). A greater number of alanines from
TE2 participated in hydrogen bonds relative to those from TE1, and K289 was
not observed to form any bonds with β3.
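Percentages such as the salt bridge frequencies quoted above are occupancies, that is, the fraction of structures in which a geometric criterion is met. The sketch below assumes a simple distance criterion between the lysine side-chain nitrogen and the carboxylate oxygens of the acidic residue; the 4.0 Å cutoff, the input layout and the function name are illustrative assumptions and are not stated in this chapter.

```python
import numpy as np


def salt_bridge_occupancy(nz_coords, oxygen_coords, cutoff=4.0):
    """Fraction of structures in which a lysine-carboxylate salt bridge is present.

    nz_coords:     (n_structures, 3) lysine NZ coordinates (Å)
    oxygen_coords: (n_structures, n_oxygens, 3) carboxylate oxygen coordinates (Å)
    cutoff:        assumed N-O distance criterion (Å)
    """
    nz = np.asarray(nz_coords, dtype=float)
    oxy = np.asarray(oxygen_coords, dtype=float)
    dist = np.linalg.norm(oxy - nz[:, None, :], axis=-1)
    # A bridge is counted when the closest oxygen lies within the cutoff.
    return float((dist.min(axis=1) <= cutoff).mean())
```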
Figure 5.5: Interaction between domain 17 of tropoelastin and the β1-α1 loop/α1 helix of integrin αvβ3 within the αvβ3-TE1 and αvβ3-TE2 ensembles. A) Frequency contact maps between domain 17 and the β1-α1 loop/α1 helix. Greater contact frequency throughout the ensemble is indicated by the transition from purple to yellow. B) Salt bridge formation between the lysines from domain 17 with the β1-α1 loop/α1 helix. C) Number of hydrogen bonds between domain 17 of TE1 and TE2 and the β1-α1 loop/α1 helix. D) Snapshot of the tropoelastin-integrin complexes where domain 17 and the β1-α1 loop/α1 helix interact. Inset: detailed zoom of the molecular interactions between R697 and K696 from tropoelastin and E174 from αvβ3, where R697 forms a hydrogen bond whilst K696 is engaged in a salt bridge.
A second area of tropoelastin that interacts with αv integrins is the GRKRK se-
quence from domain 36 [121]. Here, domain 36 of TE1 primarily interacted with
the αv subunit (Figure 5.6, A) and did not remain in close proximity to β3
(Figure 5.6, C). In contrast, TE2’s domain 36 maintained a contact frequency >
40% with E174 from the α1-α2 loop of β3 (Figure 5.6, A) through salt bridges
involving both lysines and arginines in the GRKRK sequence (Figure 5.6, B).
The two dominant salt bridges formed with E174 were via K696 and R697, and
occurred within 16% and 25% of the structures respectively. Multiple hydrogen
bonds also occurred (Figure 5.6, C); the most frequently occurring hydrogen
bond, present within 24% of structures, was between E174 and R697 (Figure 5.6,
D).
It was also noted that domain 20 of TE2 contacted the α1 helix (Figure 5.7,
A), which was unexpected as domain 20 was neither docked to this region, nor
does it contain lysines. In particular, β3 residues 130-133 appeared to maintain
a contact frequency > 80%. A number of short-lived hydrogen bonds occurred
between these regions (Figure 5.7, B), predominantly through V363 and Q132,
and G366 and R143 (Figure 5.7, C). As Q132 and R143 also formed hydrogen
bonds with domain 17 (Figure 5.5, C), this suggested that they are of importance
for interaction with tropoelastin.
Figure 5.6: Interaction between domain 36 of tropoelastin and the α1-α2 loop of integrin αvβ3 within the αvβ3-TE2 ensemble. A) Frequency contact map between domain 36 and the α1-α2 loop. Greater contact frequency throughout the ensemble is indicated by the transition from purple to yellow. B) Salt bridge formation between the lysines from domain 36 and the α1-α2 loop. C) Number of hydrogen bonds between domain 36 and the α1-α2 loop. D) Snapshot of the tropoelastin-integrin complex where domain 36 and the α1-α2 loop interact. Inset: detailed zoom of the molecular interactions between R697 and K696 from tropoelastin and E174 from αvβ3, where R697 forms a hydrogen bond whilst K696 is engaged in a salt bridge.
Figure 5.7: Interaction between domain 20 of tropoelastin and the β1-α1 loop/α1 helix of integrin αvβ3 within the αvβ3-TE2 ensemble. A) Frequency contact map between tropoelastin and αvβ3. Greater contact frequency throughout the ensemble is indicated by the transition from purple to yellow. B) Number of hydrogen bonds between domain 20 and the α1 helix across all structures from αvβ3-TE2. C) Snapshot of the tropoelastin-integrin complex where domain 20 and the α1 helix interact. Inset: detailed zoom of the molecular interactions between the most frequently occurring hydrogen bonds from V363 and G366 from tropoelastin and Q132 and R143 from αvβ3.
5.3.4 Principal component analysis
Principal component analysis (PCA) was employed to understand the structural
variance of the tropoelastin-integrin ensembles. PCA transforms the Cartesian
coordinates of atoms into new sets of coordinates termed principal components
(PCs) that describe a proportion of the variance [217]. PCs are calculated such
that each subsequent PC describes less variance than the previous PC until all
the variance has been accounted for. Thus, the top PCs are the most useful in
describing the ensembles, as they account for the most variation.
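The transformation described above can be sketched with a singular value decomposition of the mean-centered coordinates. This is a generic illustration that assumes the structures have already been superposed onto a common reference; the function names are hypothetical and the sketch is not a reproduction of the software used in this thesis.

```python
import numpy as np


def pca(coords):
    """Principal component analysis of an ensemble of superposed structures.

    coords: (n_structures, n_atoms, 3) Cartesian coordinates.
    Returns eigenvalues (variance per PC), modes (one PC per row, length
    3 * n_atoms) and the fraction of total variance described by each PC.
    """
    x = np.asarray(coords, dtype=float)
    x = x.reshape(len(x), -1)
    x = x - x.mean(axis=0)
    _, singular_values, modes = np.linalg.svd(x, full_matrices=False)
    eigenvalues = singular_values ** 2 / (len(x) - 1)
    ratio = eigenvalues / eigenvalues.sum()
    return eigenvalues, modes, ratio


def n_components_for(ratio, target):
    """Smallest number of leading PCs whose cumulative variance reaches target."""
    count = int(np.searchsorted(np.cumsum(ratio), target)) + 1
    return min(count, len(ratio))
```

Because the singular values are returned in descending order, each subsequent PC describes no more variance than the previous one, and the `ratio` array corresponds to the values plotted in a scree plot.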
Figure 5.8: Principal component analysis (PCA) of the REMD tropoelastin-integrin ensembles. A) The variance of all PC modes of αvβ3-TE1 and αvβ3-TE2 and B) the correlations between the respective top six PC modes. C) Projections of the combined top six PCA modes of αvβ3-TE1 and αvβ3-TE2 onto the respective starting structures of tropoelastin TE1 (orange) and TE2 (green). Black arrows indicate the dominant modes of structural variation. D) The square fluctuations of tropoelastin and E) the β3 subunit within αvβ3-TE1 and αvβ3-TE2.
Here, PC1 accounted for 34% and 52% of the variance of αvβ3-TE1 and αvβ3-TE2
respectively (Figure 5.8, A). The scree plot of the PCs showed a milder
slope for the first three PCs of αvβ3-TE1 compared to αvβ3-TE2, which displayed
a large drop between PC1 and PC2 (Figure 5.8, A). This indicated a dominant
type of structural variation, or rather, a preferred set of structures within αvβ3-
TE2.
Figure 5.9: Cluster analysis of αvβ3-TE1 and αvβ3-TE2. A) Clusters resulting from k-means analysis of each of the 400 structures per REMD ensemble. Projection of the clusters resulting from B) αvβ3-TE1 and C) αvβ3-TE2 onto PC1-PC2-PC3 space. Each cluster is represented by a unique color.
The structural variation indicated by PCA was independently verified by k-means
analysis, which clustered the structures based on root mean square deviation
(RMSD) of atomic coordinates using a 5 Å cut-off. K-means analysis resulted
in different cluster distributions between the two tropoelastin-integrin ensembles
(Figure 5.9, A). The structures were more evenly distributed throughout the
clusters of αvβ3-TE1, whereas αvβ3-TE2 displayed one highly populated cluster
that contained > 40% of the structures analyzed (Figure 5.9, A). This distribution
shape was similar to the PC scree plot, where a larger amount of structural
variance was accounted for by PC1 of αvβ3-TE2 relative to αvβ3-TE1 (Figure
5.8, A), indicating consistency between PCA and k-means analysis.
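The RMSD on which such clustering relies can be computed after optimal superposition of two structures. The sketch below uses the Kabsch algorithm, which is a common choice but is an assumption here, as the superposition procedure is not specified in this section; it also assumes that both structures contain the same atoms in the same order.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove translation by centering both structures on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # Optimal rotation from the SVD of the covariance matrix.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p_c @ rotation.T
    return float(np.sqrt(((p_rot - q_c) ** 2).sum() / len(p)))
```

Under a 5 Å cut-off, two structures would be treated as similar when `kabsch_rmsd` returns a value at or below 5.0.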
The distribution of the clusters within the space described by the first three
PCs was next examined to understand how well the PCs described the k-clusters
(Figure 5.9, B-C). Although the majority of clusters were separated along the
PC1-PC2-PC3 axes, some of the sparsely populated clusters could not be
resolved as discrete groups. This most likely occurred because the top three PCs together did
not account for enough structural variation for separation. As the majority of
the structural variation was accounted for by the sum of the top six PCs of αvβ3-
TE1 (94%) and αvβ3-TE2 (96%), the top six PCs were used for subsequent PCA
derived analyses.
To determine whether any structural similarities existed between the ensembles,
the correlations between the top six PCs were inspected (Figure 5.8, B). The
highest correlation observed was 42%, which was between PC3 of αvβ3-TE1 and
PC1 of αvβ3-TE2. The PCs within tropoelastin corresponded to structural sam-
pling around the N-terminal region (Figure 5.10). Correlations between the other
PCs were even lower, indicating structural deviation between the ensembles, and
were not investigated in further detail.
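One common way to compare PCs from two ensembles is the overlap, defined as the absolute value of the dot product between normalized mode vectors. Whether the correlations reported here were computed in exactly this way is an assumption; the sketch also presumes that both sets of modes describe the same atoms in the same order.

```python
import numpy as np


def mode_overlap(modes_a, modes_b):
    """Pairwise overlap between two sets of PC modes.

    modes_a: (n_a, 3 * n_atoms) modes from the first ensemble, one per row
    modes_b: (n_b, 3 * n_atoms) modes from the second ensemble, one per row
    Returns an (n_a, n_b) matrix with values from 0 (orthogonal) to 1 (identical).
    """
    a = np.asarray(modes_a, dtype=float)
    b = np.asarray(modes_b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.abs(a @ b.T)
```

Passing the top six modes of each ensemble would yield a 6 × 6 matrix, in which an entry of 0.42 would correspond to a 42% correlation between a pair of PCs.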
Figure 5.10: PC modes of tropoelastin from αvβ3-TE1 and αvβ3-TE2. PC modes 3 and 1 are overlaid onto TE1 and TE2 respectively.
Tropoelastin’s flexibility has been implicated in a number of processes, includ-
ing receptor binding [62]. To investigate the type of structural variance that
tropoelastin underwent during REMD, the linear combination of the top six PCs
weighted by their contribution to the overall variance was considered. It was
observed that the variation of TE1’s N-terminus predominantly consisted of a
downward shift along the x-y axis towards the C-terminus (Figure 5.8, C). The
C-terminus of TE1 shifted upward in an x-y direction and was accompanied by a
twist along the y-z axis. The structural variance of TE2’s N-terminus consisted
of downward motions along the x-y axis toward the ‘spur’ region, whilst the C-
terminus mostly sampled a scissors-twist motion. As the motions of a protein and
its possible structural variants are highly dependent on its shape, the variations
between TE1 and TE2 are indicative that the shape of tropoelastin impacts its
ability to interact with integrins.
To pinpoint the precise domains of tropoelastin that underwent higher local struc-
tural variance, the square fluctuation profiles of tropoelastin’s amino acids were
constructed from the top six PCs (Figure 5.8, D). This revealed that, whilst
the fluctuation of domains 1-12 was similar, as depicted by the superimposed
fluctuation profiles, domain 13 and onwards displayed distinct patterns between the
ensembles (Figure 5.8, D) and corresponded to the regions that contacted αvβ3
(Figure 5.4, A-B). Fluctuation peaks occurred at domains 13, 17, 19 and 20
within αvβ3-TE1, where domains 13, 17 and 19 were previously noted to contact
the integrin headpiece (Figure 5.4, B). Domains 13, 17, 19 and 20 underwent
relatively less fluctuation in αvβ3-TE2, and formed troughs of minimal fluctuation
in the cases of domains 13, 19 and 20 (Figure 5.8, D). Considering the high
frequency of contact between these domains and the integrin headpiece (Figure
5.4, B), and the greater extent of headpiece opening seen within αvβ3-TE2
(Figure 5.3, B), this suggests that the lower fluctuation, or rather, the stability
of these domains is required for headpiece opening.
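A square fluctuation profile of this kind can be sketched as a sum over the selected PCs, with each mode contributing its variance multiplied by the squared displacement of each atom in that mode. The use of unit-length modes, the per-atom layout and the function name are assumptions for illustration.

```python
import numpy as np


def square_fluctuations(eigenvalues, modes, n_modes=6):
    """Per-atom square fluctuations from the leading PCs.

    eigenvalues: (n_pcs,) variance described by each PC
    modes:       (n_pcs, 3 * n_atoms) unit-length PC vectors, one per row
    n_modes:     number of leading PCs to include (six in this chapter)
    Returns an (n_atoms,) array.
    """
    lam = np.asarray(eigenvalues, dtype=float)[:n_modes]
    vec = np.asarray(modes, dtype=float)[:n_modes]
    # Squared displacement of each atom within each mode (sum over x, y, z).
    per_atom = (vec.reshape(len(vec), -1, 3) ** 2).sum(axis=2)
    return (lam[:, None] * per_atom).sum(axis=0)
```

When the modes are computed on α carbons only, the returned profile has one value per residue, so peaks and troughs can be mapped directly onto tropoelastin domains.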
Similarly, the square fluctuations of the β3 subunit within the tropoelastin-integrin
ensembles were also assessed (Figure 5.8, E). The areas that formed peaks within
β3 were similar between the two ensembles, and included the termini and the α1
and α7 helices. The elevated fluctuation at the termini corresponds to structural
variation at the hybrid domain. As the hybrid domain swings out during headpiece
opening (Figure 5.1, A), the greater fluctuation of αvβ3-TE2 compared to αvβ3-TE1
is in agreement with the greater number of open headpiece conformations
within αvβ3-TE2 (Figure 5.3, B). Similarly, the fluctuations of both the α1 and
α7 helices were also higher in αvβ3-TE2 (Figure 5.8, E).
5.3.5 Headpiece opening remains stable in explicit solvent
Next, cMD was conducted to examine whether the representative structures from
the ensembles maintained their structures within aqueous solvent. The plateau
in the RMSD of αvβ3 over time was used to denote structural equilibration. The
headpieces equilibrated within 100 ns, with the headpieces of αvβ3-Ca and αvβ3-TE1
proving more stable in comparison to αvβ3-Mg and αvβ3-TE2 (Figure 5.11, A).
Next examined was whether the structures maintained their degree of headpiece
opening during cMD. The COM distance used to quantify opening of the integrin
headpieces remained relatively stable during equilibration as shown by the standard
error (Figure 5.11, B), as was expected considering the structural equilibration
previously noted (Figure 5.11, A). αvβ3-Mg displayed the greatest amount of
opening, followed closely by αvβ3-TE2, which was almost as open as the αIIbβ3
reference structure (Figure 5.11, B). In comparison, αvβ3-Ca maintained an in-
termediately open structure, whilst the opening of αvβ3-TE1 was almost that of
the initial closed αvβ3 structure.
Figure 5.11: Classical molecular dynamics equilibration of integrin structures. A) RMSD of the integrin headpiece in each simulation. B) Integrin headpiece opening over the last 50 ns of simulation. Number of hydrogen bonds formed between domain 17 and the α1 helix/β1-α1 loop of C) αvβ3-TE1 and D) αvβ3-TE2. E) α-helical content of domain 17 from cMD and REMD. F) αvβ3-TE1 and αvβ3-TE2 structures arising from 100 ns of cMD equilibration.
The α-helical content of domain 17 was examined to establish whether tropoe-
lastin’s secondary structure played a role in its interaction with αvβ3, as the sec-
ondary structure of cross-linking domains impacts tropoelastin function [58]. The
α-helicity of domain 17 observed in TE1 and TE2 differed depending on the
sampling methodology (Figure 5.11, E). Low α-helicity (< 11%) was noted during
sampling with REMD, whereas cMD yielded higher (> 44%) helicity. The
increased α-helicity of domain 17 during cMD altered its interactions with αvβ3
relative to those previously derived from REMD. The hydrogen bond that formed
between A285 and K125 during REMD was prolonged during cMD, and a new
bond was formed between A287 and W129 that had not been observed during
REMD (Figure 5.11, C, F). The hydrogen bond duration of K286 and K289
was short (< 2%), which indicated that lysines were less likely to bind the integrin
when domain 17 was α-helical. As αvβ3 remained closed in the presence of TE1
(Figure 5.11, B) despite the formation of the new bonds, it is likely that the
α-helical configuration of domain 17 does not promote headpiece opening. The hy-
drogen bonds between TE2 and αvβ3 became sparser during cMD than in REMD
and were of short duration (< 2%) (Figure 5.11, D). Indeed, visual inspection
indicated that domain 17 of TE2 moved away from the α1 helix during cMD.
Interestingly, αvβ3 remained open despite the departure of domain 17 (Figure 5.11,
B, F). Thus, I propose that the binding of other domains stabilizes the headpiece
such that it does not immediately close once domain 17 no longer binds the active
site.
5.4 Discussion
The interaction between tissue ECM and cell surface receptors is mediated by the
recognition of various short sequences within ECM proteins. Since the sequences
within tropoelastin bear no resemblance to any known integrin-binding ECM
motifs, a series of previously undertaken peptide-based studies successfully
narrowed down the field of possibilities to two integrin-binding sites at domains
17 and 36 [121–123]. However, the full characterization of these interactions was
hampered because the precise interactive site within the integrin that mediates
the interactions was unknown, and a full atomistic tropoelastin-integrin complex
was unavailable for structural analysis. Thus, the current study leveraged the
recent full atomistic computational molecular model of tropoelastin [61, 62] and
the crystal structure of integrin αvβ3 to probe their interaction in silico.
Prior computational studies utilized short cMD simulations to examine the extent
to which fibronectin fragments opened the integrin headpiece [154, 258]. These
simulations resulted in partially open headpieces, strongly suggesting that cMD
does not sufficiently sample the integrin’s conformational landscape for opening
to be observed within the timeframe of the simulation. This is unsurprising for
large systems the size of the integrin headpiece (let alone when in the presence of
a protein as large and flexible as tropoelastin) as large domain motions, such as
full headpiece opening, are generally only observable on a microsecond timescale
using cMD [270]. To address this, headpiece opening was approached from a
probabilistic viewpoint. I successfully employed REMD to generate a variety of
possible structures and examined whether the distribution of headpiece opening
depended on the type of ions present and the inclusion or absence of tropoelastin
structures. To the best of my knowledge, the extent of headpiece opening here
was greater than that of prior cMD simulations [154,258,271], demonstrating the
utility of REMD for exploring domain-level conformational changes within large
protein-protein complexes within a feasible amount of time and resources. The
representative structures resulting from REMD were confirmed to stabilize over 100
ns of cMD. When factoring in the nature of cMD, this indicated that a local min-
imum energy structure was reached in each case, highlighting the need for REMD
for broad conformational exploration of large protein-protein complexes.
As tropoelastin is a highly flexible protein that is capable of inhabiting a variety of
conformations [62], I first asked whether its interaction with αvβ3 is conformation
dependent. The TE2 conformation of tropoelastin was found to promote a greater
proportion of open integrin headpiece structures compared to TE1 during REMD,
which indicated that TE2 was the preferred binding conformation. However, since
TE1 binding to αvβ3 did not abolish integrin headpiece opening and resulted in a
high population of partially open structures, it could not be ruled out that TE1
also induced headpiece opening. That two distinct conformations of tropoelastin
promoted varying degrees of headpiece opening via separate integrin sites indicates
unconventional binding mechanisms. As a highly flexible protein with domain-
dependent intrinsic local disorder [44,62], it is possible that tropoelastin binds its
receptors in a fuzzy manner, which is the way in which fully and partly disordered
proteins interact with their binding partners [272,273]. Since the fuzzy behavior of
both the hydrophobic domains [32, 51, 52, 274] and the cross-linking domains [58]
of tropoelastin during self-assembly have been observed, it is conceivable that
this fuzzy behavior is also present during receptor binding. The formation of
multiple transient electrostatic bonds and variable structural fluctuations within
tropoelastin during integrin binding observed here are also consistent with fuzzy
binding [272, 273]; however, further studies need to be undertaken to confirm
this.
In addition to the tropoelastin-integrin complexes, αvβ3 was examined in its
medium and low affinity binding structures. Little difference was found between
the proportion of open structures, which is in contrast to prior computational stud-
ies [154,258]. As previous studies utilized cMD rather than REMD, it is likely that
the lack of headpiece opening in prior studies was due to limited conformational
sampling. Indeed, the headpiece displayed less movement in the current cMD experiments
relative to the broader sampling of REMD. Further contributions to the similari-
ties in headpiece opening seen here between the integrin-only ensembles may have
also arisen from the lack of integrin thigh domains that normally restrain
headpiece movement. These were not included in order to conserve computational
resources; however, the lack of these constraints coupled with REMD may have assisted in
facilitating broader conformational sampling than that previously observed in
silico [154, 258]. This suggests that the αvβ3 headpiece is capable of opening in the
presence of both Mg2+ and Ca2+, and sheds light onto why cells are still able to
bind tropoelastin at a low level in the presence of calcium in vitro [121–123].
It has been hypothesized that tropoelastin binds integrins unconventionally due to
its lack of known interactive sequences and the inability of RGD peptides to block
tropoelastin-cell binding [121]. As such, the interactions between tropoelastin and
αvβ3 were examined at the residue level. In particular, I focused on domain 17 due
to its integrin binding capabilities [121–123]. It was observed that within the TE2
configuration, domain 17 bound the α1 helix, which is downstream from the RGD
binding site [153], rather than the β1-α1 loop. As TE2 facilitated the largest
amount of headpiece opening, this suggests that the preferred tropoelastin binding
site is the α1 helix. Conventionally, the binding of the β1-α1 loop to RGD causes
slight atomic shifts that result in subsequent conformational changes and head-
piece opening. Thus, if the preferred tropoelastin structure binds the α1 helix, this
raises the question of the nature of the allosteric pathway through which headpiece
opening is elicited. Both crystallographic and MD studies demonstrate that the
joining of the α1 and α1’ helices, and the subsequent movements of the α7 helix are
key components of the headpiece opening mechanism [153,154,253,258,268,275].
The current results are in alignment with this, as the REMD tropoelastin-integrin
ensemble that contained more open configurations, αvβ3-TE2, also maintained
greater α1 helicity relative to αvβ3-TE1. The greater fluctuation of the α7 helix
and its distance from the α1 helix across the ensembles suggest that domain 17
promotes headpiece opening through a similar pathway to that of RGD ligands,
albeit via another nearby site. This is the first atomistic-scale evidence to suggest
that tropoelastin’s primary binding site is not the RGD-binding site on αvβ3.
Lysines are thought to be key mediators of the tropoelastin-integrin interaction,
as both of tropoelastin’s currently known interactive sites contain lysines [121,
122] and cell binding is greatly reduced after lysine point mutations [123].
However, the nature of these interactions was not fully explored due to the lack
of a high-resolution structure of tropoelastin at the time. Using the full atomistic
model of tropoelastin, I demonstrated that the lysines of tropoelastin are involved
in a variety of interactions with multiple integrin residues. The lysines of domain
17 formed salt bridges with D126 from the β1-α1 loop of the integrin. The role
of D126 during ligand binding is not completely clear; however, it is capable of
losing and regaining coordination to ADMIDAS throughout the stages of headpiece
opening during RGD binding [153]. As the loss of the D126-ADMIDAS bond was
not observed here despite the interactions between D126 and the lysines of domain
17, K286 and K289, this hints that the role of the salt bridges is to anchor domain
17 to the integrin to form subsequent bonds. In addition to the D126 salt bridge,
the lysines of domain 17 formed hydrogen bonds with various β3 residues of the
β1-α1 loop and α1 helix depending on the initial tropoelastin structure. The
transiency of these bonds and interchangeability between the binding partners
is consistent with the numerous short-lived interactions that occur during fuzzy
binding [276,277].
In addition to confirming the involvement of lysines, the present findings corrob-
orate in vitro evidence that tyrosines are not crucial for tropoelastin’s interaction
with αv integrins [123]. The sole tyrosine within domain 17 did not contact the
interactive site within αvβ3-TE2 and accounted for only a small fraction of the
bonds within αvβ3-TE1. Regarding the involvement of other residues, a receptor
binding role was identified for the arginines of domain 36, as they made up a
substantial proportion of the hydrogen bonds observed between domain 36 and
αvβ3. This was interesting, as arginines have been primarily examined within the
context of tropoelastin’s structural stability [36]. Considering the results from this
study in light of the cell binding ability of the C-terminal GRKRK sequence [121]
and its high conservation across mammals [245], I propose that the arginines of
domain 36 are of functional significance in integrin binding.
Although lysines are important for the tropoelastin-integrin interaction [123], mature
elastin is extensively cross-linked via its lysines [34, 111] and, thus, contains
fewer unmodified lysines available to bind integrins. Although domain 17 of tropoelastin
binds integrins, its participation in elastin-integrin interactions is less clear
as it is capable of participating in cross-links [34, 111]. This study is the first to
provide evidence for the involvement of a non-cross-linking region, domain 20, in
tropoelastin-integrin interactions, and is further corroborated by contacts noted
here between other non-cross-linking areas such as domains 16 and 26. As tropoe-
lastin’s largest domain, domain 20 has a greater solvent exposed surface area rela-
tive to other non-cross-linking domains [78], which renders it readily available for
cell receptor contact. Additionally, domain 20 forms part of tropoelastin’s hinge
region [47, 59] which regulates molecular flexibility [22], and as such, is likely to
have a key role in facilitating the conformational sampling necessary for cellular
interactions. This flexibility may, in part, be preserved here by the short-lived
hydrogen bonding between the backbones of tropoelastin’s small non-polar residues
and the integrin. Indeed, it has previously been suggested that backbone hydrogen
bonding may be a mechanism for the extension and recoil of elastomeric peptides
of similar composition to domain 20 [278]. Furthermore, the interaction of domain
20 and αvβ3 is significant, as much emphasis has been previously placed on the
involvement of cross-linking domains in tropoelastin-integrin interactions. The
ability of tropoelastin’s hydrophobic domains to bind integrins may explain how
elastin can interact with cells if its lysines are unavailable for binding. Future
studies should test domain 20 in isolation to verify that it is capable of cell bind-
ing, as this may provide avenues for investigating the cell interactive properties of
hydrophobic domains.
The local and global structural sampling of tropoelastin have been of increasing
interest to extrapolate its behavior in the context of self-assembly [22,61,247].
Here, similar principles were applied to examine the structural sampling of tropoe-
lastin when bound to αvβ3. PCA demonstrated that TE1 and TE2 underwent
dissimilar modes of local structural sampling during REMD due to decreases in
atomic fluctuation at the majority of tropoelastin sites that bound, or were in
close proximity to, αvβ3. The local fluctuations of TE1 seen here deviated
from tropoelastin in isolation, whilst TE2 appeared to maintain an overall fluctu-
ation pattern that resembled that of isolated tropoelastin [62]. This indicates that
TE2 bears a greater resemblance to the most representative structure of tropoe-
lastin even when bound to the integrin. This likeness was further confirmed by
examining the global motions of the molecules, where the previously described
C-terminal scissors-twist motion [22, 61, 62] was noted within TE2, but not TE1.
In the context of receptor interactions, it is possible
that the scissors-twist facilitates the increased frequency of contact of C-terminal
domains 30-36 as observed here, which could be important for stabilizing this in-
teraction if tropoelastin is indeed a fuzzy binder. Its preservation here certainly
indicates that it is of importance for the interaction with αvβ3, as the intrinsic
motions accessible to ligands and their receptors are primary drivers of protein
binding [279].
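The comparison of local fluctuations between bound and isolated tropoelastin can be illustrated with a per-atom root mean square fluctuation (RMSF) calculation. The following is a minimal sketch that assumes the frames have already been aligned to a common reference. The coordinates are toy values, the function and variable names are hypothetical, and the sketch is not the analysis pipeline used in this study.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF for a list of frames, each a list of (x, y, z) tuples.

    Assumes every frame holds the same atoms in the same order and that the
    frames were superimposed beforehand.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(frame[i][d] for frame in trajectory) / n_frames for d in range(3)]
        # Mean squared displacement of atom i from its mean position.
        msd = sum(
            sum((frame[i][d] - mean[d]) ** 2 for d in range(3)) for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


# Toy two-atom, two-frame ensembles (hypothetical coordinates).
isolated = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (5.0, 2.0, 0.0)],
]
bound = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
]

rmsf_isolated = rmsf(isolated)
rmsf_bound = rmsf(bound)

# A negative change marks an atom whose fluctuation is damped in the bound state.
change = [b - i for b, i in zip(rmsf_bound, rmsf_isolated)]
print(change)
```

In this toy case the second atom stands in for a site that contacts the receptor: its fluctuation drops on binding while the first atom is unaffected, which mirrors the site-specific decreases described above.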
The tropoelastin monomer contains high random coil content, imbuing it with
flexibility to undergo the appropriate conformational sampling required for self-
assembly [32]. A notable structural change that occurs as a consequence of cross-
linking is the increased α-helicity of the cross-linking domains [52,58]. This change
is a requirement for the formation of desmosine cross-links, which only occur if
the lysines on the face of the α-helices are aligned correctly [53]. The formation of
α-helices during cMD here was not favorable for the interaction with αvβ3, which
is intuitive as the main purpose of α-helices within tropoelastin is to become
rigid and cross-link rather than interact with receptors. The sampling of multiple
bonds between tropoelastin and αvβ3 appears to be key for their interaction, as
evidenced by this study’s comparison between REMD and cMD, and is achievable
when the interactive regions are not rigid. This builds on prior simulations that
noted the high structural sampling of domain 17 [62], and strongly implies that
tropoelastin needs to continue sampling conformational space to elicit integrin
headpiece opening via its cross-linking domains.
Overall, both the REMD and cMD data support a model where tropoelastin re-
quires multiple contact sites throughout the integrin headpiece to elicit biological
functionality. Comparable use of multiple binding sites is seen for fibronectin-
integrin interactions; fibronectin comprises non-RGD synergy regions that coop-
erate to promote strong receptor binding [280, 281]. Further evidence for such
synergy sites in tropoelastin comes from tropoelastin peptide studies that found
progressively fewer cellular interactions with sequential truncations of the tropoelastin
sequence [122]. On this basis, I propose that no single domain is wholly
responsible for facilitating tropoelastin-cellular responses, and that instead mul-
tiple defined sites on tropoelastin cooperate to bind the integrin. Future studies
could incorporate a wider range of tropoelastin conformations to examine the full
array of domains capable of interacting with the integrin at its binding site as
well as at other regions to establish the importance of these additional points of
contact.
6.1 General discussion
This thesis builds up a series of computational models that explore tropoelastin’s
sequence, structure and mobility with respect to its functionality. This thesis
utilises tropoelastin’s recently derived full-atomistic structure to conduct these
studies, whilst additionally leveraging prior biological experiments to shed light
onto aspects of tropoelastin’s behaviour that have remained unexplored due to the
incompatibility of current structural methodologies with tropoelastin’s flexibil-
ity.
6.2 Allysine modifications and their implication for self-assembly
The structure of WT tropoelastin has been explored in detail using experimen-
tal [36,37,59,233] and MD methodologies [22,61,62], several of which noted that
its structure and intrinsic motions are easily perturbed by a variety of mutations.
Furthermore, such changes have repercussions on the macromolecular scale, ex-
hibiting altered coacervation and biomaterial formation [36,37,233]. As it is con-
ceivable that natural modifications may also alter tropoelastin’s structure and
molecular motions, I explored whether the naturally occurring allysine modifica-
tions that are required for cross-linking are also capable of causing domain dis-
placements and alteration of tropoelastin’s overall molecular motions. I selected
lysine sites for modification that prior experimental data had identified as
either cross-links in native elastin [53, 235] or cross-linking hotspots
in synthetic elastin studies [85,110].
I conducted REMD and cMD on three variants of modified tropoelastin, ALK353,
ALK507 and 5ALK. In all cases, I observed that both single and multiple ally-
sine modifications were capable of perturbing the canonical structural ensemble of
WT tropoelastin. I utilised PCA and NMA to demonstrate that these ensembles
differed significantly from not only WT [61], but also from one another. This
indicates a surprising diversity in the motions of natural tropoelastin that had
not been considered prior to this study [61, 62]. Importantly, the broad variety
of motions arising from the three tropoelastin molecules surveyed here provides
a mechanism for the heterogeneity seen in recent native elastin mass spectrom-
etry studies [34, 111]. In addition to global structural changes, I also identified
localised shifts in protein backbone fluctuation between tropoelastin containing
single allysines and 5ALK, tropoelastin containing five allysines. Interestingly, I
saw that ALK353 and ALK507 exhibited an approximately 3-fold decrease in the
overall local fluctuation relative to 5ALK, which suggests that tropoelastin’s flex-
ibility is enhanced due to multiple allysine modifications. Curiously, ALK353 and
ALK507 also presented with lowered backbone fluctuation relative to WT, simi-
lar to those displayed by tropoelastin mutants that exhibited altered functional-
ity [22, 36, 37, 61, 233]. As the elevated conformational sampling of tropoelastin
is thought to be crucial for its self-assembly into higher order structures [22, 61],
I hypothesise that the dampened flexibility of ALK353 and ALK507 provides a
checkpoint to prevent their aggregation into the growing elastin chain. In doing
so, tropoelastin with multiple allysines is preferentially incorporated into the chain
due to its high conformational sampling, permitting the formation of a stronger
chain due to more cross-linking, whilst tropoelastin with low mobility is less likely
to undergo assembly.
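To illustrate how PCA separates a dominant collective motion from minor fluctuations in an ensemble, the sketch below extracts the first principal component of a small set of coordinates by power iteration on the covariance matrix. It is a simplified, dependency-free illustration with hypothetical data and names, and it does not reproduce the PCA or NMA protocol used in this thesis.

```python
import math


def first_principal_component(samples, n_iterations=200):
    """Return (unit vector, explained variance fraction) of the leading component.

    `samples` is a list of equal-length coordinate vectors, one per frame.
    Power iteration is used in place of a full eigendecomposition.
    """
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(dim)]
    centred = [[s[j] - mean[j] for j in range(dim)] for s in samples]
    # Sample covariance matrix of the centred coordinates.
    cov = [
        [sum(row[i] * row[j] for row in centred) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    vec = [1.0] * dim  # arbitrary start; must not be orthogonal to the answer
    for _ in range(n_iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    eigenvalue = sum(
        vec[i] * sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)
    )
    total_variance = sum(cov[i][i] for i in range(dim))
    return vec, eigenvalue / total_variance


# Toy ensemble: large motion along the first coordinate, small noise in the second.
frames = [[-2.0, 0.1], [-1.0, -0.1], [0.0, 0.0], [1.0, 0.1], [2.0, -0.1]]
component, explained = first_principal_component(frames)
print(component, explained)
```

Two ensembles whose leading components point in different directions, or carry very different fractions of the total variance, sample different dominant motions, which is the sense in which the modified and WT ensembles are compared above.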
6.3 Updating the head-to-tail model of assembly
The head-to-tail model of elastin assembly was proposed based on SAXS/SANS
analysis that identified the global structural envelope of tropoelastin [59] and
the only elastin cross-link discovered at the time, which originated from porcine
elastin [53]. However, the model has a number of inconsistencies in light of recent
data. Technical improvements in mass spectrometry have allowed the discovery
of numerous cross-linking sites within native human elastin [34], none of which
have corresponded to those found in porcine elastin [53]; this is likely due to the
differences between porcine and human tropoelastin [245]. Moreover, the positions
of the cross-linking sites within the recently modelled full-atomistic structure of
tropoelastin are not in agreement with those postulated based on the SAXS/SANS
envelope [59]. Further to this, numerous models of early stage coacervation [32,
78] noted heterogeneity in the interactions between tropoelastin domains and, in
particular, a coarse-grained study of 40 tropoelastin monomers noted a variety of
interactions that formed between the monomers [78].
I explored whether the head-to-tail model would hold true at the earliest stage of
elastin assembly - nucleation. I examined the interactions between a variety of do-
mains from synthetic studies [110], recent native elastin publications [34,111], and
the canonical porcine cross-link that the head-to-tail model was partially based
on [53]. I generated almost 20,000 dimers via protein-protein docking using the
three most representative conformations from WT tropoelastin’s structural en-
semble, giving rise to the largest number of tropoelastin formations examined in
a single study to date. I did not observe the precise style of head-to-tail interaction
that has been previously postulated [59]; however, I observed a class of
dimer, which could be termed “head-to-middle”, that was most similar to the model
proposed by Baldock and colleagues and that also formed the majority of the dimers
noted in the current study. To better understand the factors involved in forming
head-to-tail dimers, I employed logistic regression through machine learning,
which indicated that geometric data, such as domains and initial conformation,
were the most crucial in determining the resultant dimer. Although I only exam-
ined early stage coacervation here, when considering that extended coacervation
studies indicate that a variety of associations persist until at least 10 fs [78], it
is likely that the nucleation configurations hold their structure for some time,
validating my current approach. Overall, this study presents an important step
in unifying the decade-old head-to-tail model with current data from experimen-
tal and computational sources, simultaneously demonstrating the application of
machine learning in untangling protein-protein interactions.
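The way in which logistic regression ranks candidate predictors of dimer class can be sketched with a small, dependency-free example. Everything in it is an assumption made for illustration: the two binary features, the toy labels, the function names and the training settings are hypothetical, and the sketch does not reproduce the features, data or software used for the dimer analysis in this thesis.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Predicted probability that a dimer with features x belongs to class 1."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(features, labels, learning_rate=0.5, n_iterations=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(n_iterations):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


# Hypothetical features per docked dimer:
#   column 0: a geometric descriptor (e.g. whether a terminal domain was docked)
#   column 1: an uninformative flag, balanced across both classes
feature_names = ["geometric_descriptor", "uninformative_flag"]
X = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [1, 1], [0, 0]]
y = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = dimer falls in the class of interest

weights, bias = fit_logistic(X, y)

# Rank features by the magnitude of their fitted weight.
ranking = sorted(zip(feature_names, weights), key=lambda item: abs(item[1]), reverse=True)
print(ranking)
```

Comparing weight magnitudes in this way is only meaningful when the features share a common scale, as the binary toy features do here; continuous descriptors would need to be standardised first.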
6.4 Fuzzy binding mechanisms of tropoelastin and integrins
Tropoelastin is known to bind a number of integrins [121–123, 126], facilitating
events that are crucial to wound repair and tissue regeneration. Nevertheless,
there is a distinct lack of fundamental knowledge pertaining to the nature of this
interaction. Firstly, the RGD sequences of other ECM proteins are responsible
for binding integrins; however, tropoelastin is devoid of this motif. Indeed, the
regions involved in these interactions have been pinpointed to be domains 17-18
and 36, which are similar only in that they contain lysines. Secondly, as
a tropoelastin-integrin crystal structure is unavailable due to tropoelastin’s inherent
flexibility, the integrin site that interacts
with tropoelastin remains a mystery. Based on prior studies detailing the conformational
changes that are crucial for assessing integrin activation [154, 258, 268] and inte-
grin residues that participate in non-RGD interactions with other surfaces [263],
I sought to construct a model of tropoelastin-integrin binding that could explain
the observations detailed above.
By comparing the REMD ensembles of two tropoelastin structures that had been
docked to integrin αvβ3, I inspected the binding mechanism from a stochastic per-
spective. Quantification of the range of tropoelastin structures and integrin head-
pieces in various states of opening demonstrated that tropoelastin has a preferred
binding conformation that strongly promotes integrin activation. More interestingly,
the fact that the second structure did not abolish headpiece opening strongly suggested
that tropoelastin interacts with integrins in a fuzzy manner [276], forming
many transient bonds that still allow tropoelastin to exhibit conformational sam-
pling. Indeed, I observed that multiple tropoelastin sites were capable of contact-
ing the integrin, involving domains 17 and 36, as previously observed [121–123], as
well as domain 20. Curiously, domain 20 is a hydrophobic domain that does not
contain the positively charged lysines that appeared to be responsible for domains
17 and 36 interacting with αvβ3 in the current study. Excitingly, this provides a
mechanism with which highly cross-linked elastin - which may present with limited
lysines on its surface - can continue to interact with integrins in mature tissues. Having
narrowed down tropoelastin’s preferred binding site to the α1 helix, I hypothesise
that this site has utility for future wound healing and tissue repair therapeutics if verified using
crystallography.
6.5 Future directions
This thesis enhances the current understanding of the delicate structure-function
relationship of tropoelastin with respect to the process of self-assembly. My find-
ing that allysines modify the structure and dynamics of tropoelastin indicates that
it is not sufficient to examine elastogenesis in the context of unmodified tropoe-
lastin. The impact of allysine modifications that I hypothesise here, including
the checkpoint theory presented in Chapter 3, can be further investigated using
large scale coarse-grained studies, similar to those discussed previously in this the-
sis. In that regard, a coarse-grained study utilising allysine modified tropoelastin
would also provide further insights into the nature of nucleation events and fibre
directionality, building upon the present dimer study.
An important avenue that this thesis has opened up is the exploration of tropoelastin-
receptor binding. I have demonstrated that it is feasible to model large scale
protein-protein interactions using ensemble methodologies. This study paves the
way for a deeper understanding of the mechanisms of interactions between tropoe-
lastin and other ECM receptors such as GAGs and EBP, provided that they have
available structures that have been derived from crystallography. The elucidation
of these interactive sites is crucial for furthering the field of tropoelastin-based
therapeutics, as detailed knowledge of the mechanisms through which tropoe-
lastin exerts its biological effects is required for optimising features such as the
presentation of its cell interactive sites.
A commonly criticised aspect of computational methodologies is that
they are unverified. Here, I have taken the approach of using experimental data to
inform my modelling rather than using simulations purely as discovery tools. Thus,
the models developed in this thesis unite prior experimental data with the full-
atomistic structure of tropoelastin to examine its functionality under an atomic
scale lens. This has enabled us to better explain prior findings whilst also al-
lowing the observation of new events. By understanding that the self-interactive
and cell receptor binding behaviours of tropoelastin can only partially be ex-
plained on a macroscopic level, I highlight the need for future work to incorporate
molecular data, whilst acknowledging the importance of experimentally robust
studies.
References
[1] Robert P Mecham and Elaine C Davis. Elastic fiber structure and assembly. Peter D Yurchenko, David E Birk, Robert P Mecham (eds), New York, Academic Press, 281-314, 1994.
[2] Russell Ross and Paul Bornstein. The elastic fiber: I. The separation and partial characterization of its macromolecular components. The Journal of Cell Biology, 40(2):366–381, 1969.
[3] Hiromi Yanagisawa and Jessica Wagenseil. Elastic fibers and biomechanics of the aorta: Insights from mouse studies. Matrix Biology, 85:160–172, 2019.
[4] Jaime Moore and Susan Thibeault. Insights into the role of elastin in vocal fold health and disease. Journal of Voice, 26(3):269–275, 2012.
[5] Robert P Mecham. Elastin in lung development and disease pathogenesis. Matrix Biology, 73:6–20, 2018.
[6] SD Shapiro, SK Endicott, MA Province, JA Pierce, EJ Campbell, et al. Marked longevity of human lung parenchymal elastic fibers deduced from prevalence of D-aspartate and nuclear weapons-related radiocarbon. The Journal of Clinical Investigation, 87(5):1828–1834, 1991.
[7] Janet T Powell, Nicholas Vine, and Margot Crossman. On the accumulation of D-aspartate in elastin and other proteins of the ageing aorta. Atherosclerosis, 97(2-3):201–208, 1992.
[8] Muhammad M Bashir, Zena Indik, Helena Yeh, Norma Ornstein-Goldstein, Joan C Rosenbloom, William Abrams, M Fazio, J Uitto, and J Rosenbloom. Characterization of the complete human elastin gene. Delineation of unusual features in the 5’-flanking region. Journal of Biological Chemistry, 264(15):8887–8891, 1989.
[9] Zena Indik, Helena Yeh, Norma Ornstein-Goldstein, Paul Sheppard, Noel Anderson, Joan C Rosenbloom, Leena Peltonen, and Joel Rosenbloom. Alternative splicing of human elastin mRNA indicated by sequence analysis of cloned genomic and complementary DNA. Proceedings of the National Academy of Sciences, 84(16):5680–5684, 1987.
[10] Zena Indik, Helena Yeh, Norma Ornstein-Goldstein, Umberto Kucich, William Abrams, Joan C Rosenbloom, and Joel Rosenbloom. Structure of the elastin gene and alternative splicing of elastin mRNA: implications for human disease. American Journal of Medical Genetics, 34(1):81–90, 1989.
[11] MJ Fazio, DR Olsen, H Kuivaniemi, ML Chu, JM Davidson, J Rosenbloom, and J Uitto. Isolation and characterization of human elastin cDNAs, and age-associated variation in elastin gene expression in cultured skin fibroblasts. Laboratory Investigation; a Journal of Technical Methods and Pathology, 58(3):270–277, 1988.
[12] William C Parks, Jill D Roby, Leeju C Wu, and Leonard E Gross. Cellular expression of tropoelastin mRNA splice variants. Matrix, 12(2):156–162, 1992.
[13] Sean E Reichheld, Lisa D Muiznieks, Robert Lu, Simon Sharpe, and Fred W Keeley. Sequence variants of human tropoelastin affecting assembly, structural characteristics and functional properties of polymeric elastin in health and disease. Matrix Biology, 84:68–80, 2019.
[14] Laurent Debelle and AM Tamburro. Elastin: molecular description and function. The International Journal of Biochemistry & Cell Biology, 31(2):261–272, 1999.
[15] Zhou Chen, Mi Hee Shin, Young Ji Moon, Se Rah Lee, Yeon Kyung Kim, Jo-Eun Seo, Ji Eun Kim, Kyu Han Kim, and Jin Ho Chung. Modulation of elastin exon 26A mRNA and protein expression in human skin in vivo. Experimental Dermatology, 18(4):378–386, 2009.
[16] Helen Piontkivska, Yi Zhang, Eric D Green, Laura Elnitski, NISC Comparative Sequencing Program, et al. Multi-species sequence comparison reveals dynamic evolution of the elastin gene that has involved purifying selection and lineage-specific insertions/deletions. BMC Genomics, 5(1):31, 2004.
[17] Eiichi Hirano, Russell H Knutsen, Hideki Sugitani, Christopher H Ciliberto, and Robert P Mecham. Functional rescue of elastin insufficiency in mice by the human elastin gene: implications for mouse models of human disease. Circulation Research, 101(5):523–531, 2007.
[18] William C Parks and Susan B Deak. Tropoelastin heterogeneity: implications for protein function and disease. Elastic, 1(399–406):2, 1990.
[19] Sacha A Jensen, Bernadette Vrhovski, and Anthony S Weiss. Domain 26 of tropoelastin plays a dominant role in association by coacervation. Journal of Biological Chemistry, 275(37):28449–28454, 2000.
[20] Beth A Kozel, Hiroshi Wachi, Elaine C Davis, and Robert P Mecham. Domains in tropoelastin that mediate elastin deposition in vitro and in vivo. Journal of Biological Chemistry, 278(20):18491–18498, 2003.
[21] Lisa D Muiznieks, Ming Miao, Eva E Sitarz, and Fred W Keeley. Contribution of domain 30 of tropoelastin to elastic fiber formation and material elasticity. Biopolymers, 105(5):267–275, 2016.
[22] Giselle C Yeo, Anna Tarakanova, Clair Baldock, Steven G Wise, Markus J Buehler, and Anthony S Weiss. Subtle balance of tropoelastin molecular shape and flexibility regulates dynamics and hierarchical assembly. Science Advances, 2(2):e1501145, 2016.
[23] Ming Miao, Sean E Reichheld, Lisa D Muiznieks, Eva E Sitarz, Simon Sharpe, and Fred W Keeley. Single nucleotide polymorphisms and domain/splice variants modulate assembly and elastomeric properties of human elastin. Implications for tissue specificity and durability of elastic tissue. Biopolymers, 107(5):e23007, 2017.
[24] Bernadette Vrhovski, Sacha Jensen, and Anthony S Weiss. Coacervation characteristics of recombinant human tropoelastin. European Journal of Biochemistry, 250(1):92–98, 1997.
[25] Judith Ann Foster, Eveline Bruenger, William R Gray, and Lawrence B Sandberg. Isolation and amino acid sequences of tropoelastin peptides. Journal of Biological Chemistry, 248(8):2876–2879, 1973.
[26] William R Gray, Lawrence B Sandberg, and Judit A Foster. Molecular model for elastin structure and function. Nature, 246(5434):461–466, 1973.
[27] Lawrence B Sandberg, Terril B Wolt, and John G Leslie. Quantitation of elastin through measurement of its pentapeptide content. Biochemical and Biophysical Research Communications, 136(2):672–678, 1986.
[28] Rao S Rapaka, K Okamoto, and DW Urry. Coacervation properties in sequential polypeptide models of elastin: Synthesis of H-(Ala-Pro-Gly-Gly)n-Val-OMe and H-(Ala-Pro-Gly-Val-Gly)n-Val-OMe. International Journal of Peptide and Protein Research, 12(2):81–92, 1978.
[29] Bernadette Vrhovski and Anthony S Weiss. Biochemistry of tropoelastin. European Journal of Biochemistry, 258(1):1–18, 1998.
[30] Prachumporn Toonkool, Sacha A Jensen, Adam L Maxwell, and Anthony S Weiss. Hydrophobic domains of human tropoelastin interact in a context-dependent manner. Journal of Biological Chemistry, 276(48):44575–44580, 2001.
[31] Lisa D Muiznieks, Anthony S Weiss, and Fred W Keeley. Structural disorder and dynamics of elastin. Biochemistry and Cell Biology, 88(2):239–250, 2010.
[32] Sarah Rauscher and Regis Pomes. The liquid structure of elastin. eLife, 6:e26526, 2017.
[33] Sarah Rauscher and Regis Pomes. Structural disorder and protein elasticity. In Fuzziness, pages 159–183. Springer, 2012.
[34] Tobias Hedtke, Christoph U Schrader, Andrea Heinz, Wolfgang Hoehenwarter, Jurgen Brinckmann, Thomas Groth, and Christian EH Schmelzer. A comprehensive map of human elastin cross-linking during elastogenesis. The FEBS Journal, 286(18):3594–3610, 2019.
[35] Patricia L Brown, Lisa Mecham, Clarina Tisdale, and Robert P Mecham. The cysteine residues in the carboxy terminal domain of tropoelastin form an intrachain disulfide bond that stabilizes a loop structure and positively charged pocket. Biochemical and Biophysical Research Communications, 186(1):549–555, 1992.
[36] Giselle C Yeo, Clair Baldock, Steven G Wise, and Anthony S Weiss. A negatively charged residue stabilizes the tropoelastin N-terminal region for elastic fiber assembly. Journal of Biological Chemistry, 289(50):34815–34826, 2014.
[37] Giselle C Yeo, Clair Baldock, Steven G Wise, and Anthony S Weiss. Targeted modulation of tropoelastin structure and assembly. ACS Biomaterials Science & Engineering, 3(11):2832–2844, 2017.
[38] Lisa D Muiznieks and Anthony S Weiss. Flexibility in the solution structure of human tropoelastin. Biochemistry, 46(27):8196–8205, 2007.
[39] David He, Ming Miao, Eva E Sitarz, Lisa D Muiznieks, Sean Reichheld, Richard J Stahl, Fred W Keeley, and John Parkinson. Polymorphisms in the human tropoelastin gene modify in vitro self-assembly and mechanical properties of elastin-like polypeptides. PloS One, 7(9):e46130, 2012.
[40] Laurent Debelle, Alain JP Alix, Marie-Paule Jacob, Jean-Pierre Huvenne, Maurice Berjot, Bernard Sombret, and Pierre Legrand. Bovine elastin and κ-elastin secondary structure determination by optical spectroscopies. Journal of Biological Chemistry, 270(44):26099–26103, 1995.
[41] Ellen Green, Richard Ellis, and Peter Winlove. The molecular structure and physical properties of elastin fibers as revealed by Raman microspectroscopy. Biopolymers: Original Research on Biomolecules, 89(11):931–940, 2008.
[42] Vesna Serrano, Wenge Liu, and Stefan Franzen. An infrared spectroscopic study of the conformational transition of elastin-like polypeptides. Biophysical Journal, 93(7):2429–2435, 2007.
[43] JR Lyerla Jr and DA Torchia. Molecular mobility and structure of elastin deduced from the solvent and temperature dependence of carbon-13 magnetic resonance relaxation data. Biochemistry, 14(23):5175–5183, 1975.
[44] Joel P Mackay, Lisa D Muiznieks, Prachumporn Toonkool, and Anthony S Weiss. The hydrophobic domain 26 of human tropoelastin is unstructured in solution. Journal of Structural Biology, 150(2):154–162, 2005.
[45] Alex Kentsis and Tobin R Sosnick. Trifluoroethanol promotes helix formation by destabilizing backbone exposure: desolvation rather than native hydrogen bonding defines the kinetic pathway of dimeric coiled coil folding. Biochemistry, 37(41):14613–14622, 1998.
[46] Herald Reiersen and Anthony R Rees. Trifluoroethanol may form a solvent matrix for assisted hydrophobic interactions between peptide side chains. Protein Engineering, 13(11):739–743, 2000.
[47] Ming Miao, Judith T Cirulis, Shaun Lee, and Fred W Keeley. Structural determinants of cross-linking and hydrophobic domains for self-assembly of elastin-like polypeptides. Biochemistry, 44(43):14367–14375, 2005.
[48] Laurent Debelle, Alain JP Alix, Shao M Wei, Marie-Paule Jacob, Jean-Pierre Huvenne, Maurice Berjot, and Pierre Legrand. The secondary structure and architecture of human elastin. European Journal of Biochemistry, 258(2):533–539, 1998.
[49] Brigida Bochicchio, Antonietta Pepe, and Antonio M Tamburro. Investigating by CD the molecular mechanism of elasticity of elastomeric proteins. Chirality: The Pharmacological, Biological, and Chemical Consequences of Molecular Asymmetry, 20(9):985–994, 2008.
[50] Sarah Rauscher, Stephanie Baud, Ming Miao, Fred W Keeley, and Regis Pomes. Proline and glycine control protein self-organization into elastomeric or amyloid fibrils. Structure, 14(11):1667–1676, 2006.
[51] Stefan Roberts, Michael Dzuricky, and Ashutosh Chilkoti. Elastin-like polypeptides as models of intrinsically disordered proteins. FEBS Letters, 589(19):2477–2486, 2015.
[52] Sean E Reichheld, Lisa D Muiznieks, Fred W Keeley, and Simon Sharpe. Direct observation of structure and dynamics during phase separation of an elastomeric protein. Proceedings of the National Academy of Sciences, 114(22):E4408–E4415, 2017.
[53] Patricia Brown-Augsburger, Clarina Tisdale, Thomas Broekelmann, Carolyn Sloan, and Robert P Mecham. Identification of an elastin cross-linking domain that joins three peptide chains. Possible role in nucleated assembly. Journal of Biological Chemistry, 270(30):17778–17783, 1995.
[54] An-Suei Yang and Barry Honig. Free energy determinants of secondary structure formation: I. α-helices. Journal of Molecular Biology, 252(3):351–365, 1995.
[55] Franc Avbelj. Amino acid conformational preferences and solvation of polar backbone atoms in peptides and proteins. Journal of Molecular Biology, 300(5):1335–1359, 2000.
[56] Peizhi Luo and Robert L Baldwin. Mechanism of helix induction by trifluoroethanol: a framework for extrapolating the helix-forming properties of peptides from trifluoroethanol/water mixtures back to water. Biochemistry, 36(27):8413–8421, 1997.
[57] Antonio Mario Tamburro, Antonietta Pepe, and Brigida Bochicchio. Localizing α-helices in human tropoelastin: assembly of the elastin “puzzle”. Biochemistry, 45(31):9518–9530, 2006.
[58] Sean E Reichheld, Lisa D Muiznieks, Richard Stahl, Karen Simonetti, Simon Sharpe, and Fred W Keeley. Conformational transitions of the cross-linking domains of elastin during self-assembly. Journal of Biological Chemistry, 289(14):10057–10068, 2014.
[59] Clair Baldock, Andres F Oberhauser, Liang Ma, Donna Lammie, Veronique Siegler, Suzanne M Mithieux, Yidong Tu, John Yuen Ho Chow, Farhana Suleman, Marc Malfois, et al. Shape of tropoelastin, the highly extensible protein that controls human tissue elasticity. Proceedings of the National Academy of Sciences, 108(11):4322–4327, 2011.
[60] Steven G Wise, Giselle C Yeo, Matti A Hiob, Jelena Rnjak-Kovacina, David L Kaplan, Martin KC Ng, and Anthony S Weiss. Tropoelastin: A versatile, bioactive assembly module. Acta Biomaterialia, 10(4):1532–1541, 2014.
[61] Anna Tarakanova, Giselle C Yeo, Clair Baldock, Anthony S Weiss, and Markus J Buehler. Molecular model of human tropoelastin and implications of associated mutations. Proceedings of the National Academy of Sciences, 115(28):7338–7343, 2018.
[62] Anna Tarakanova, Giselle C Yeo, Clair Baldock, Anthony S Weiss, and Markus J Buehler. Tropoelastin is a flexible molecule that retains its canonical shape. Macromolecular Bioscience, 19(3):1800250, 2019.
[63] Tilman Flock, Robert J Weatheritt, Natasha S Latysheva, and M Madan Babu. Controlling entropy to tune the functions of intrinsically disordered regions. Current Opinion in Structural Biology, 26:62–72, 2014.
[64] Howard Vindin, Suzanne M Mithieux, and Anthony S Weiss. Elastin architecture. Matrix Biology, 84:4–16, 2019.
[65] Patricia Brown-Augsburger, Thomas Broekelmann, Joel Rosenbloom, and Robert P Mecham. Functional domains on elastin and microfibril-associated glycoprotein involved in elastic fibre assembly. Biochemical Journal, 318(1):149–155, 1996.
[66] RP Mecham, BD Levy, SL Morris, JG Madaras, and DS Wrenn. Increased cyclic GMP levels lead to a stimulation of elastin production in ligament fibroblasts that is reversed by cyclic AMP. Journal of Biological Chemistry, 260(6):3255–3258, 1985.
[67] A Sampath Narayanan, Larry B Sandberg, Russell Ross, and Don L Layman. The smooth muscle cell. III. Elastin synthesis in arterial smooth muscle cell culture. The Journal of Cell Biology, 68(3):411–419, 1976.
[68] Hiroyoshi Kajiya, Nobuhiko Tanaka, Toyoko Inazumi, Yoshiyuki Seyama, Shingo Tajima, and Akira Ishibashi. Cultured human keratinocytes express tropoelastin. Journal of Investigative Dermatology, 109(5):641–644, 1997.
[69] Robert P Mecham, Judy Madaras, John A McDonald, and Una Ryan. Elastin production by cultured calf pulmonary artery endothelial cells. Journal of Cellular Physiology, 116(3):282–288, 1983.
[70] Thomas J Mariani, Sarah E Dunsmore, Qinglang Li, Xueming Ye, and Richard A Pierce. Regulation of lung fibroblast tropoelastin expression by alveolar epithelial cells. American Journal of Physiology-Lung Cellular and Molecular Physiology, 274(1):47–57, 1998.
[71] Peter Heeger and Joel Rosenbloom. Biosynthesis of tropoelastin by elastic cartilage. Connective Tissue Research, 8(1):21–25, 1980.
[72] Barbara Myers, Michael Dubick, Jerold A Last, and Robert B Rucker. Elastin synthesis during perinatal lung development in the rat. Biochimica et Biophysica Acta (BBA)-General Subjects, 761(1):17–22, 1983.
[73] Akihiko Noguchi, Kathryn Firsching, Jonathan D Kursar, and Rajkumar Reddy. Developmental changes of tropoelastin synthesis by rat pulmonary fibroblasts and effects of dexamethasone. Pediatric Research, 28(4):379–382, 1990.
[74] William C Parks. Posttranscriptional regulation of lung elastin production. American Journal of Respiratory Cell and Molecular Biology, 17(1):1–2, 1997.
[75] Alkystis Phinikaridou, Sara Lacerda, Begona Lavin, Marcelo E Andia, Alberto Smith, Prakash Saha, and Rene M Botnar. Tropoelastin: a novel marker for plaque progression and instability. Circulation: Cardiovascular Imaging, 11(8):e007303, 2018.
[76] Leonard E Grosso and Robert P Mecham. In vitro processing of tropoelastin: investigation of a possible transport function associated with the carboxy-terminal domain. Biochemical and Biophysical Research Communications, 153(2):545–551, 1988.
[77] Aleksander Hinek, Fred W Keeley, and John Callahan. Recycling of the 67-kDa elastin binding protein in arterial myocytes is imperative for secretion of tropoelastin. Experimental Cell Research, 220(2):312–324, 1995.
[78] Anna Tarakanova, Jazmin Ozsvar, Anthony S Weiss, and Markus J Buehler. Coarse-grained model of tropoelastin self-assembly into nascent fibrils. Materials Today Biology, 3:100016, 2019.
[79] Adam W Clarke, Eva C Arnspang, Suzanne M Mithieux, Emine Korkmaz, Filip Braet, and Anthony S Weiss. Tropoelastin massively associates during coacervation to form quantized protein spheres. Biochemistry, 45(33):9989–9996, 2006.
[80] Beth A Kozel, Brenda J Rongish, Andras Czirok, Julia Zach, Charles D Little, Elaine C Davis, Russell H Knutsen, Jessica E Wagenseil, Marilyn A Levy, and Robert P Mecham. Elastic fiber formation: a dynamic view of extracellular matrix assembly using timer reporters. Journal of Cellular Physiology, 207(1):87–96, 2006.
[81] Yidong Tu and Anthony S Weiss. Transient tropoelastin nanoparticles are early-stage intermediates in the coacervation of human tropoelastin whose aggregation is facilitated by heparan sulfate and heparin decasaccharides. Matrix Biology, 29(2):152–159, 2010.
[82] Yidong Tu, Steven G Wise, and Anthony S Weiss. Stages in tropoelastin coalescence during synthetic elastin hydrogel formation. Micron, 41(3):268–272, 2010.
[83] Betty A Cox, Barry C Starcher, and Dan W Urry. Coacervation of tropoelastin results in fiber formation. Journal of Biological Chemistry, 249(3):997–998, 1974.
[84] GM Bressan, I Castellani, MG Giro, D Volpin, C Fornieri, and I Pasquali Ronchetti. Banded fibers in tropoelastin coacervates at physiological temperatures. Journal of Ultrastructure Research, 82(3):335–340, 1983.
[85] Suzanne M Mithieux, Steven G Wise, Mark J Raftery, Barry Starcher, and Anthony S Weiss. A model two-component system for studying the architecture of elastin assembly in vitro. Journal of Structural Biology, 149(3):282–289, 2005.
[86] M Daria Haust, Robert H More, SA Bencosme, and John U Balis. Elastogenesis in human aorta: an electron microscopic study. Experimental and Molecular Pathology, 4(5):508–524, 1965.
[87] WH Fahrenbach, LB Sandberg, and EG Cleary. Ultrastructural studies on early elastogenesis. The Anatomical Record, 155(4):563–575, 1966.
[88] Ernest N Albert. Developing elastic tissue: an electron microscopic study. The American Journal of Pathology, 69(1):89, 1972.
[89] AM Tamburro, V Guantieri, and D Daga Gordini. Synthesis and structural studies of a pentapeptide sequence of elastin. Poly (Val-Gly-Gly-Leu-Gly). Journal of Biomolecular Structure and Dynamics, 10(3):441–454, 1992.
[90] Ming Miao, Catherine M Bellingham, Richard J Stahl, Eva E Sitarz, Christopher J Lane, and Fred W Keeley. Sequence and structure determinants for the self-aggregation of recombinant polypeptides modeled after human elastin. Journal of Biological Chemistry, 278(49):48553–48562, 2003.
[91] Lisa D Muiznieks, Sacha A Jensen, and Anthony S Weiss. Structural changes and facilitated association of tropoelastin. Archives of Biochemistry and Biophysics, 410(2):317–323, 2003.
[92] Leanne B Dyksterhuis, Clair Baldock, Donna Lammie, Tim J Wess, and Anthony S Weiss. Domains 17–27 of tropoelastin contain key regions of contact for coacervation and contain an unusual turn-containing crosslinking domain. Matrix Biology, 26(2):125–135, 2007.
[93] Wendy J Wu and Anthony S Weiss. Deficient coacervation of two forms of human tropoelastin associated with supravalvular aortic stenosis. European Journal of Biochemistry, 266(1):308–314, 1999.
[94] Jany Dandurand, Valerie Samouillan, Colette Lacabanne, Antonietta Pepe, and Brigida Bochicchio. Water structure and elastin-like peptide aggregation. Journal of Thermal Analysis and Calorimetry, 120(1):419–426, 2015.
[95] Daiki Tatsubo, Keitaro Suyama, Masaya Miyazaki, Iori Maeda, and Takeru Nose. Stepwise mechanism of temperature-dependent coacervation of the elastin-like peptide analogue dimer, (C(WPGVG)3)2. Biochemistry, 57(10):1582–1590, 2018.
[96] Giselle C Yeo, Fred W Keeley, and Anthony S Weiss. Coacervation of tropoelastin. Advances in Colloid and Interface Science, 167(1):94–103, 2011.
[97] Aleksander Hinek and Marlene Rabinovitch. 67-kD elastin-binding protein is a protective “companion” of extracellular insoluble elastin and intracellular tropoelastin. The Journal of Cell Biology, 126(2):563–574, 1994.
[98] Ming Miao, Sean E Reichheld, Lisa D Muiznieks, Yayi Huang, and Fred W Keeley. Elastin binding protein and FKBP65 modulate in vitro self-assembly of human tropoelastin. Biochemistry, 52(44):7731–7741, 2013.
[99] Lisa D Muiznieks and Fred W Keeley. Proline periodicity modulates the self-assembly properties of elastin-like polypeptides. Journal of Biological Chemistry, 285(51):39779–39789, 2010.
[100] Antonietta Pepe, Deanna Guerra, Brigida Bochicchio, Daniela Quaglino, Dealba Gheduzzi, Ivonne Pasquali Ronchetti, and Antonio M Tamburro. Dissection of human tropoelastin: supramolecular organization of polypeptide sequences coded by particular exons. Matrix Biology, 24(2):96–109, 2005.
[101] Antonietta Pepe, Roberta Flamia, Deanna Guerra, Daniela Quaglino, Brigida Bochicchio, Ivonne Pasquali Ronchetti, and Antonio M Tamburro. Exon 26-coded polypeptide: an isolated hydrophobic domain of human tropoelastin able to self-assemble in vitro. Matrix Biology, 27(5):441–450, 2008.
[102] Joel Rosenbloom, William R Abrams, Zena Indik, Helena Yeh, Norma Ornstein-Goldstein, and Muhammad M Bashir. Structure of the elastin gene. In Ciba Foundation Symposium 192: The Molecular Biology and Pathology of Elastic Tissues, pages 59–80. Wiley Online Library, 2007.
[103] Christian EH Schmelzer, Andrea Heinz, Helen Troilo, Michael P Lockhart-Cairns, Thomas A Jowitt, Marion F Marchand, Laurent Bidault, Marine Bignon, Tobias Hedtke, Alain Barret, et al. Lysyl oxidase–like 2 (LOXL2)–mediated cross-linking of tropoelastin. The FASEB Journal, 33(4):5468–5481, 2019.
[104] C Franzblau, B Faris, and R Papaioannou. Lysinonorleucine. A new amino acid from hydrolyzates of elastin. Biochemistry, 8(7):2833–2837, 1969.
[105] Richard W Lent, Barbara Smith, Lily L Salcedo, Barbara Faris, and Carl Franzblau. Reduction of elastin. II. Evidence for the presence of α-aminoadipic acid δ-semialdehyde and its aldol condensation product. Biochemistry, 8(7):2837–2845, 1969.
[106] SM Partridge, DF Elsden, and J Thomas. Constitution of the cross-linkages in elastin. Nature, 197(4874):1297–1298, 1963.
[107] Andrea Heinz, Christoph KH Ruttkies, Gunther Jahreis, Christoph U Schrader, Kanin Wichapong, Wolfgang Sippl, Fred W Keeley, Reinhard HH Neubert, and Christian EH Schmelzer. In vitro cross-linking of elastin peptides and molecular characterization of the resultant biomaterials. Biochimica et Biophysica Acta (BBA)-General Subjects, 1830(4):2994–3004, 2013.
[108] Beth A Kozel, Hiroshi Wachi, Elaine C Davis, and Robert P Mecham. Domains in tropoelastin that mediate elastin deposition in vitro and in vivo. Journal of Biological Chemistry, 278(20):18491–18498, 2003.
[109] Christian EH Schmelzer, Tobias Hedtke, and Andrea Heinz. Unique molecular networks: Formation and role of elastin cross-links. IUBMB Life, 72(5):842–854, 2020.
[110] Steven G Wise, Suzanne M Mithieux, Mark J Raftery, and Anthony S Weiss. Specificity in the coacervation of tropoelastin: solvent exposed lysines. Journal of Structural Biology, 149(3):273–281, 2005.
[111] Christoph U Schrader, Andrea Heinz, Petra Majovsky, Berin Karaman Mayack, Jurgen Brinckmann, Wolfgang Sippl, and Christian EH Schmelzer. Elastin is heterogeneously cross-linked. Journal of Biological Chemistry, 293(39):15107–15119, 2018.
[112] Karl E Kadler. Fell Muir lecture: Collagen fibril formation in vitro and in vivo. International Journal of Experimental Pathology, 98(1):4–16, 2017.
[113] GM Cooper. Structure and organization of actin filaments. The Cell: A Molecular Approach, 2, 2000.
[114] E Heitlinger, M Peter, A Lustig, W Villiger, EA Nigg, and U Aebi. The role of the head and tail domain in lamin structure and assembly: analysis of bacterially expressed chicken lamin A and truncated B2 lamins. Journal of Structural Biology, 108(1):74–91, 1992.
[115] Zsolt Urban, Vishwanathan Hucthagowder, Nura Schurmann, Vesna Todorovic, Lior Zilberberg, Jiwon Choi, Carla Sens, Chester W Brown, Robin D Clark, Kristen E Holland, et al. Mutations in LTBP4 cause a syndrome of impaired pulmonary, gastrointestinal, genitourinary, musculoskeletal, and dermal development. The American Journal of Human Genetics, 85(5):593–605, 2009.
[116] Insa Bultmann-Mellin, Anne Conradi, Alexandra C Maul, Katharina Dinger, Frank Wempe, Alexander P Wohl, Thomas Imhof, F Thomas Wunderlich, Alexander C Bunck, Tomoyuki Nakamura, et al. Modeling autosomal recessive cutis laxa type 1C in mice reveals distinct functions for LTBP-4 isoforms. Disease Models & Mechanisms, 8(4):403–415, 2015.
[117] Precious J McLaughlin, Qiuyun Chen, Masahito Horiguchi, Barry C Starcher, J Brett Stanton, Thomas J Broekelmann, Alan D Marmorstein, Brian McKay, Robert Mecham, Tomoyuki Nakamura, et al. Targeted disruption of fibulin-4 abolishes elastogenesis and causes perinatal lethality in mice. Molecular and Cellular Biology, 26(5):1700–1709, 2006.
[118] Kazuo Noda, Branka Dabovic, Kyoko Takagi, Tadashi Inoue, Masahito Horiguchi, Maretoshi Hirai, Yusuke Fujikawa, Tomoya O Akama, Kenji Kusumoto, Lior Zilberberg, et al. Latent TGF-β binding protein 4 promotes elastic fiber assembly by interacting with fibulin-5. Proceedings of the National Academy of Sciences, 110(8):2852–2857, 2013.
[119] Svenja Hinderer, Nian Shen, Lea-Jeanne Ringuette, Jan Hansmann, Dieter P Reinhardt, Sara Y Brucker, Elaine C Davis, and Katja Schenke-Layland. In vitro elastogenesis: instructing human vascular smooth muscle cells to generate an elastic fiber-containing extracellular matrix scaffold. Biomedical Materials, 10(3):034102, 2015.
[120] Yoshinori Yamauchi, Eichi Tsuruga, Kazuki Nakashima, Yoshihiko Sawa, and Hiroyuki Ishikawa. Fibulin-4 and -5, but not fibulin-2, are associated with tropoelastin deposition in elastin-producing cell culture. Acta Histochemica et Cytochemica, 43(6):131–138, 2010.
[121] Daniel V Bax, Ursula R Rodgers, Marcela MM Bilek, and Anthony S Weiss. Cell adhesion to tropoelastin is mediated via the C-terminal GRKRK motif and integrin alphaVbeta3. Journal of Biological Chemistry, pages jbc–M109, 2009.
[122] Pearl Lee, Daniel V Bax, Marcela MM Bilek, and Anthony S Weiss. A novel cell adhesion region in tropoelastin that mediates attachment to integrin alphaVbeta5. Journal of Biological Chemistry, pages jbc–M113, 2013.
[123] Pearl Lee, Giselle C Yeo, and Anthony S Weiss. A cell adhesive peptide from tropoelastin promotes sequential cell attachment and spreading via distinct receptors. The FEBS Journal, 284(14):2216–2230, 2017.
[124] Matti A Hiob, Steven G Wise, Alexey Kondyurin, Anna Waterhouse, Marcela M Bilek, Martin KC Ng, and Anthony S Weiss. The use of plasma-activated covalent attachment of early domains of tropoelastin to enhance vascular compatibility of surfaces. Biomaterials, 34(31):7584–7591, 2013.
[125] Young Yu, Steven G Wise, Praveesuda L Michael, Daniel V Bax, Gloria SC Yuen, Matti A Hiob, Giselle C Yeo, Elysse C Filipe, Louise L Dunn, Kim H Chan, et al. Characterization of endothelial progenitor cell interactions with human tropoelastin. PloS One, 10(6):e0131101, 2015.
[126] Giselle C Yeo and Anthony S Weiss. Soluble matrix protein is a potent modulator of mesenchymal stem cell performance. Proceedings of the National Academy of Sciences, 116(6):2042–2051, 2019.
[127] Aleksander Hinek, David S Wrenn, Robert P Mecham, and Samuel H Barondes. The elastin receptor: a galactoside-binding protein. Science, 239(4847):1539–1541, 1988.
[128] Shingo Tajima, Hiroshi Wachi, Yuko Uemura, and Kouji Okamoto. Modulation by elastin peptide VGVAPG of cell proliferation and elastin expression in human skin fibroblasts. Archives of Dermatological Research, 289(8):489–492, 1997.
[129] Aleksander Hinek, Kathy R Braun, Kela Liu, Yanting Wang, and Thomas N Wight. Retrovirally mediated overexpression of versican V3 reverses impaired elastogenesis and heightened proliferation exhibited by fibroblasts from Costello syndrome and Hurler disease patients. The American Journal of Pathology, 164(1):119–131, 2004.
[130] Laurent Duca, Laurent Debelle, Romain Debret, Frank Antonicelli, William Hornebeck, and Bernard Haye. The elastin peptides-mediated induction of pro-collagenase-1 production by human fibroblasts involves activation of MEK/ERK pathway via PKA- and PI3K-dependent signaling. FEBS Letters, 524(1-3):193–198, 2002.
[131] Satsuki Mochizuki, Bertrand Brassart, and Aleksander Hinek. Signaling pathways transduced through the elastin receptor facilitate proliferation of arterial smooth muscle cells. Journal of Biological Chemistry, 277(47):44854–44863, 2002.
[132] Bertrand Brassart, Patrick Fuchs, Eric Huet, Alain JP Alix, Jean Wallach, Antonio M Tamburro, Frederic Delacoux, Bernard Haye, Herve Emonard, William Hornebeck, et al. Conformational dependence of collagenase (matrix metalloproteinase-1) up-regulation by elastin peptides in cultured fibroblasts. Journal of Biological Chemistry, 276(7):5222–5227, 2001.
[133] Kristian Prydz. Determinants of glycosaminoglycan (GAG) structure. Biomolecules, 5(3):2003–2022, 2015.
[134] Marie-Claude Bourin and Ulf Lindahl. Glycosaminoglycans and the regulation of blood coagulation. Biochemical Journal, 289(Pt 2):313, 1993.
[135] Deirdre R Coombe. Biological implications of glycosaminoglycan interactions with haemopoietic cytokines. Immunology and Cell Biology, 86(7):598–607, 2008.
[136] Rahul Raman, V Sasisekharan, and Ram Sasisekharan. Structural insights into biological roles of protein-glycosaminoglycan interactions. Chemistry & Biology, 12(3):267–277, 2005.
[137] C Fornieri, M Baccarani-Contri, D Quaglino, and I Pasquali-Ronchetti. Lysyl oxidase activity and elastin/glycosaminoglycan interactions in growing chick and rat aortas. The Journal of Cell Biology, 105(3):1463–1469, 1987.
[138] Wendy J Wu, Bernadette Vrhovski, and Anthony S Weiss. Glycosaminoglycans mediate the coacervation of human tropoelastin through dominant charge interactions involving lysine side chains. Journal of Biological Chemistry, 274(31):21719–21724, 1999.
[139] Thomas J Broekelmann, Beth A Kozel, Hideaki Ishibashi, Claudio C Werneck, Fred W Keeley, Lijuan Zhang, and Robert P Mecham. Tropoelastin interacts with cell-surface glycosaminoglycans via its COOH-terminal domain. Journal of Biological Chemistry, 280(49):40939–40947, 2005.
[140] Kerstin Tiedemann, Boris Batge, Peter K Muller, and Dieter P Reinhardt. Interactions of fibrillin-1 with heparin/heparan sulfate, implications for microfibrillar assembly. Journal of Biological Chemistry, 276(38):36035–36042, 2001.
[141] Timothy M Ritty, Thomas J Broekelmann, Claudio C Werneck, and Robert P Mecham. Fibrillin-1 and -2 contain heparin-binding sites important for matrix deposition and that support cell attachment. Biochemical Journal, 375(2):425–432, 2003.
[142] Shailaja Seetharaman and Sandrine Etienne-Manneville. Integrin diversity brings specificity in mechanotransduction. Biology of the Cell, 110(3):49–64, 2018.
[143] Cedric Zeltz and Donald Gullberg. The integrin–collagen connection–a glue for tissue repair? Journal of Cell Science, 129(4):653–664, 2016.
[144] Antonios Chronopoulos, Stephen D Thorpe, Ernesto Cortes, Dariusz Lachowski, Alistair J Rice, Vasyl V Mykuliak, Tomasz Rog, David A Lee, Vesa P Hytonen, and Armando E del Río Hernández. Syndecan-4 tunes cell mechanics by activating the kindlin-integrin-RhoA pathway. Nature Materials, pages 1–10, 2020.
[145] Aban Shuaib, Daniyal Motan, Pinaki Bhattacharya, Alex McNabb, Timothy M Skerry, and Damien Lacroix. Heterogeneity in the mechanical properties of integrins determines mechanotransduction dynamics in bone osteoblasts. Scientific Reports, 9(1):1–14, 2019.
[146] Laura Tomasello, Antonina Coppola, Maria Pitrone, Valentina Failla, Salvatore Cillino, Giuseppe Pizzolanti, and Carla Giordano. PFN1 and integrin-β1/mTOR axis involvement in cornea differentiation of fibroblast limbal stem cells. Journal of Cellular and Molecular Medicine, 23(11):7210–7221, 2019.
[147] Gabriel Neiman, María Agustina Scarafía, Alejandro La Greca, Natalia L Santín Velazque, Ximena Garate, Ariel Waisman, Alan M Mobbs, Tais Hanae Kasai-Brunswick, Fernanda Mesquita, Daiana Martire-Greco, et al. Integrin alpha-5 subunit is critical for the early stages of human pluripotent stem cell cardiac differentiation. Scientific Reports, 9(1):1–10, 2019.
[148] Aroa Duro-Castano, Elena Gallon, Caitlin Decker, and María J Vicent. Modulating angiogenesis with integrin-targeted nanomedicines. Advanced Drug Delivery Reviews, 119:101–119, 2017.
[149] Kevin K Kim, Dean Sheppard, and Harold A Chapman. TGF-β1 signaling and tissue fibrosis. Cold Spring Harbor Perspectives in Biology, 10(4):a022293, 2018.
[150] Yun Deng, Quan Wan, and Wangxiang Yan. Integrin α5/ITGA5 promotes the proliferation, migration, invasion and progression of oral squamous carcinoma by epithelial–mesenchymal transition. Cancer Management and Research, 11:9609, 2019.
[151] Erkki Ruoslahti. RGD and other recognition sequences for integrins. Annual Review of Cell and Developmental Biology, 12(1):697–715, 1996.
[152] Jian-Ping Xiong, Thilo Stehle, Rongguang Zhang, Andrzej Joachimiak, Matthias Frech, Simon L Goodman, and M Amin Arnaout. Crystal structure of the extracellular segment of integrin αvβ3 in complex with an Arg-Gly-Asp ligand. Science, 296(5565):151–155, 2002.
[153] Jieqing Zhu, Jianghai Zhu, and Timothy A Springer. Complete integrin headpiece opening in eight steps. Journal of Cell Biology, 201(7):1053–1068, 2013.
[154] Eileen Puklin-Faucher, Mu Gao, Klaus Schulten, and Viola Vogel. How the headpiece hinge angle is opened: new insights into the dynamics of integrin activation. Journal of Cell Biology, 175(2):349–360, 2006.
[155] Richard O Hynes. Integrins: bidirectional, allosteric signaling machines. Cell, 110(6):673–687, 2002.
[156] Maria Laura Duque Lasio and Beth A Kozel. Elastin-driven genetic diseases. Matrix Biology, 71:144–160, 2018.
[157] Bert Callewaert, Marjolijn Renard, Vishwanathan Hucthagowder, Beate Albrecht, Ingrid Hausser, Edward Blair, Cristina Dias, Alice Albino, Hiroshi Wachi, Fumiaki Sato, et al. New insights into the pathogenesis of autosomal-dominant cutis laxa with report of five ELN mutations. Human Mutation, 32(4):445–455, 2011.
[158] Mayada Tassabehji, Kay Metcalfe, Jane Hurst, Gillian S Ashcroft, Cay Kielty, Carrie Wilmot, Dian Donnai, Andrew P Read, and Carolyn JP Jones. An elastin gene mutation producing abnormal tropoelastin and abnormal elastic fibres in a patient with autosomal dominant cutis laxa. Human Molecular Genetics, 7(6):1021–1028, 1998.
[159] Hideki Sugitani, Eiichi Hirano, Russell H Knutsen, Adrian Shifren, Jessica E Wagenseil, Christopher Ciliberto, Beth A Kozel, Zsolt Urban, Elaine C Davis, Thomas J Broekelmann, et al. Alternative splicing and tissue-specific elastin misassembly act as biological modifiers of human elastin gene frameshift mutations associated with dominant cutis laxa. Journal of Biological Chemistry, 287(26):22055–22067, 2012.
[160] Beth A Kozel, Chi-Ting Su, Joshua R Danback, Ryan L Minster, Suneeta Madan-Khetarpal, Juliann McConnell, Meghan K Mac Neal, Kara L Levine, Robert C Wilson, Frank C Sciurba, et al. Biomechanical properties of the skin in cutis laxa. The Journal of Investigative Dermatology, 134(11):2836, 2014.
[161] Austin J Cocciolone, Jie Z Hawes, Marius C Staiculescu, Elizabeth O Johnson, Monzur Murshed, and Jessica E Wagenseil. Elastin, arterial mechanics, and cardiovascular disease. American Journal of Physiology-Heart and Circulatory Physiology, 315(2):H189–H205, 2018.
[162] Andrew K Baldwin, Andreja Simpson, Ruth Steer, Stuart A Cain, and Cay M Kielty. Elastic fibres in health and disease. Expert Reviews in Molecular Medicine, 15, 2013.
[163] Amanda K Ewart, Weishan Jin, Donald Atkinson, Colleen A Morris, and Mark T Keating. Supravalvular aortic stenosis associated with a deletion disrupting the elastin gene. The Journal of Clinical Investigation, 93(3):1071–1077, 1994.
[164] Timothy M Olson, Virginia V Michels, Zsolt Urban, Katalin Csiszar, Angela M Christiano, David J Driscoll, Robert H Feldt, Charles D Boyd, and Stephen N Thibodeau. A 30 kb deletion within the elastin gene results in familial supravalvular aortic stenosis. Human Molecular Genetics, 4(9):1677–1679, 1995.
[165] Dean Y Li, Amanda E Toland, Beth B Boak, Donald L Atkinson, Gregory J Ensing, Colleen A Morris, and Mark T Keating. Elastin point mutations cause an obstructive vascular disease, supravalvular aortic stenosis. Human Molecular Genetics, 6(7):1021–1028, 1997.
[166] Seonmin Park, Eul-Ju Seo, Han-Wook Yoo, and Youngho Kim. Novel mutations in the human elastin gene (ELN) causing isolated supravalvular aortic stenosis. International Journal of Molecular Medicine, 18(2):329–332, 2006.
[167] Hiroshi Wachi, Fumiaki Sato, Junji Nakazawa, Risa Nonaka, Zoltan Szabo, Zsolt Urban, Takuo Yasunaga, Iori Maeda, Koji Okamoto, Barry C Starcher, et al. Domains 16 and 17 of tropoelastin in elastic fibre formation. Biochemical Journal, 402(1):63–70, 2007.
[168] Zena Indik, William R Abrams, Umberto Kucich, Carolyn W Gibson, Robert P Mecham, and Joel Rosenbloom. Production of recombinant human tropoelastin: characterization and demonstration of immunologic and chemotactic activity. Archives of Biochemistry and Biophysics, 280(1):80–86, 1990.
[169] Stephen L Martin, Bernadette Vrhovski, and Anthony S Weiss. Total synthesis and expression in Escherichia coli of a gene encoding human tropoelastin. Gene, 154(2):159–166, 1995.
[170] Suzanne M Mithieux, Behnaz Aghaei-Ghareh-Bolagh, Leping Yan, Kekini V Kuppan, Yiwei Wang, Francia Garces-Suarez, Zhe Li, Peter K Maitz, Elizabeth A Carter, Christina Limantoro, et al. Tropoelastin implants that accelerate wound repair. Advanced Healthcare Materials, 7(10):1701206, 2018.
[171] Nasim Annabi, Suzanne M Mithieux, Pinar Zorlutuna, Gulden Camci-Unal, Anthony S Weiss, and Ali Khademhosseini. Engineered cell-laden human protein-based elastomer. Biomaterials, 34(22):5496–5505, 2013.
[172] Nasim Annabi, Devyesh Rana, Ehsan Shirzaei Sani, Roberto Portillo-Lara, Jessie L Gifford, Mohammad M Fares, Suzanne M Mithieux, and Anthony S Weiss. Engineering a sprayable and elastic hydrogel adhesive with antimicrobial properties for wound healing. Biomaterials, 139:229–243, 2017.
[173] Richard Wang, Jazmin Ozsvar, Behnaz Aghaei-Ghareh-Bolagh, Matti A Hiob, Suzanne M Mithieux, and Anthony S Weiss. Freestanding hierarchical vascular structures engineered from ice. Biomaterials, 192:334–345, 2019.
[174] Nasim Annabi, Ali Fathi, Suzanne M Mithieux, Penny Martens, Anthony S Weiss, and Fariba Dehghani. The effect of elastin on chondrocyte adhesion and proliferation on poly-caprolactone/elastin composites. Biomaterials, 32(6):1517–1525, 2011.
[175] Jelena Rnjak-Kovacina, Steven G Wise, Zhe Li, Peter KM Maitz, Cara J Young, Yiwei Wang, and Anthony S Weiss. Electrospun synthetic human elastin: collagen composite scaffolds for dermal tissue engineering. Acta Biomaterialia, 8(10):3714–3722, 2012.
[176] Behnaz Aghaei-Ghareh-Bolagh, Juan Guan, Yiwei Wang, Adam D Martin, Rebecca Dawson, Suzanne M Mithieux, and Anthony S Weiss. Optically robust, highly permeable and elastic protein films that support dual cornea cell types. Biomaterials, 188:50–62, 2019.
[177] Behnaz Aghaei-Ghareh-Bolagh, Suzanne M Mithieux, Matti A Hiob, Yiwei Wang, Avelyn Chong, and Anthony S Weiss. Fabricated tropoelastin-silk yarns and woven textiles for diverse tissue engineering applications. Acta Biomaterialia, 91:112–122, 2019.
[178] Yiwei Wang, Suzanne M Mithieux, Yvonne Kong, Xue-Qing Wang, Cassandra Chong, Ali Fathi, Fariba Dehghani, Eleni Panas, John Kemnitzer, Robert Daniels, et al. Tropoelastin incorporation into a dermal regeneration template promotes wound angiogenesis. Advanced Healthcare Materials, 4(4):577–584, 2015.
[179] Suzanne M Mithieux and Anthony S Weiss. Design of an elastin-layered dermal regeneration template. Acta Biomaterialia, 52:33–40, 2017.
[180] Xiao Hu, Xiuli Wang, Jelena Rnjak, Anthony S Weiss, and David L Kaplan. Biomaterials derived from silk–tropoelastin protein systems. Biomaterials, 31(32):8121–8131, 2010.
[181] Xiao Hu, Sang-Hyug Park, Eun Seok Gil, Xiao-Xia Xia, Anthony S Weiss, and David L Kaplan. The influence of elasticity and surface roughness on myogenic and osteogenic-differentiation of cells on silk-elastin biomaterials. Biomaterials, 32(34):8979–8989, 2011.
[182] James D White, Siran Wang, Anthony S Weiss, and David L Kaplan. Silk–tropoelastin protein films for nerve guidance. Acta Biomaterialia, 14:1–10, 2015.
[183] Shira Landau, Ariel A Szklanny, Giselle C Yeo, Yulia Shandalov, Elena Kosobrodova, Anthony S Weiss, and Shulamit Levenberg. Tropoelastin coated PLLA-PLGA scaffolds promote vascular network formation. Biomaterials, 122:72–82, 2017.
[184] Matti A Hiob, Steven G Wise, Alexei Kondyurin, Anna Waterhouse, Marcela M Bilek, Martin K Ng, and Anthony S Weiss. The use of plasma-activated covalent attachment of early domains of tropoelastin to enhance vascular compatibility of surfaces. Biomaterials, 34(31):7584–7591, 2013.
[185] Edgar A Wakelin, Giselle C Yeo, David R McKenzie, Marcela MM Bilek, and Anthony S Weiss. Plasma ion implantation enabled bio-functionalization of PEEK improves osteoblastic activity. APL Bioengineering, 2(2):026109, 2018.
[186] James C Phillips, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D Skeel, Laxmikant Kale, and Klaus Schulten. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry, 26(16):1781–1802, 2005.
[187] David Van Der Spoel, Erik Lindahl, Berk Hess, Gerrit Groenhof, Alan E Mark, and Herman JC Berendsen. GROMACS: fast, flexible, and free. Journal of Computational Chemistry, 26(16):1701–1718, 2005.
[188] Bernard R Brooks, Robert E Bruccoleri, Barry D Olafson, David J States, S Swaminathan, and Martin Karplus. CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. Journal of Computational Chemistry, 4(2):187–217, 1983.
[189] David A Case, Thomas E Cheatham III, Tom Darden, Holger Gohlke, Ray Luo, Kenneth M Merz Jr, Alexey Onufriev, Carlos Simmerling, Bing Wang, and Robert J Woods. The Amber biomolecular simulation programs. Journal of Computational Chemistry, 26(16):1668–1688, 2005.
[190] Thomas S Hofer. From macromolecules to electrons—grand challenges in theoretical and computational chemistry. Frontiers in Chemistry, 1:6, 2013.
[191] Ron O Dror, Robert M Dirks, JP Grossman, Huafeng Xu, and David E Shaw. Biomolecular simulation: a computational microscope for molecular biology. Annual Review of Biophysics, 41:429–452, 2012.
[192] Jean-Michel Combes, Pierre Duclos, and Ruedi Seiler. The Born-Oppenheimer approximation. In Rigorous Atomic and Molecular Physics, pages 185–213. Springer, 1981.
[193] Michael P Allen and Dominic J Tildesley. Computer Simulation of Liquids. Oxford University Press, pages 100–110, 2017.
[194] Hans C Andersen. Rattle: A “velocity” version of the Shake algorithm for molecular dynamics calculations. Journal of Computational Physics, 52(1):24–34, 1983.
[195] Evelyn Mayaan, Adam Moser, Alexander D MacKerell Jr, and Darrin M York. CHARMM force field parameters for simulation of reactive intermediates in native and thio-substituted ribozymes. Journal of Computational Chemistry, 28(2):495–507, 2007.
[196] Alex D MacKerell Jr, Donald Bashford, MLDR Bellott, Roland Leslie Dunbrack Jr, Jeffrey D Evanseck, Martin J Field, Stefan Fischer, Jiali Gao, H Guo, Sookhee Ha, et al. All-atom empirical potential for molecular modeling and dynamics studies of proteins. The Journal of Physical Chemistry B, 102(18):3586–3616, 1998.
[197] Wilfred F van Gunsteren, SR Billeter, AA Eising, Philippe H Hunenberger, PKHC Kruger, Alan E Mark, WRP Scott, and Ilario G Tironi. Biomolecular simulation: the GROMOS96 manual and user guide. 1996.
[198] William L Jorgensen, David S Maxwell, and Julian Tirado-Rives. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. Journal of the American Chemical Society, 118(45):11225–11236, 1996.
[199] Wendy D Cornell, Piotr Cieplak, Christopher I Bayly, Ian R Gould, Kenneth M Merz, David M Ferguson, David C Spellmeyer, Thomas Fox, James W Caldwell, and Peter A Kollman. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. Journal of the American Chemical Society, 117(19):5179–5197, 1995.
[200] Qiang Shi, Sergei Izvekov, and Gregory A Voth. Mixed atomistic and coarse-grained molecular dynamics: simulation of a membrane-bound ion channel. The Journal of Physical Chemistry B, 110(31):15045–15048, 2006.
[201] Charles L Brooks III and Martin Karplus. Solvent effects on protein motion and protein effects on solvent motion: dynamics of the active site region of lysozyme. Journal of Molecular Biology, 208(1):159–181, 1989.
[202] Anna Rita Bizzarri and Salvatore Cannistraro. Molecular dynamics of water at the protein-solvent interface, 2002.
[203] Thomas Simonson. Macromolecular electrostatics: continuum models and their growing pains. Current Opinion in Structural Biology, 11(2):243–252, 2001.
[204] Alexey V Onufriev and David A Case. Generalized Born implicit solvent models for biomolecules. Annual Review of Biophysics, 48:275–296, 2019.
[205] Donald Bashford and David A Case. Generalized Born models of macromolecular solvation effects. Annual Review of Physical Chemistry, 51(1):129–152, 2000.
[206] Xiaohui Wang, Boming Deng, and Zhaoxi Sun. Thermodynamics of helix formation in small peptides of varying length in vacuo, in implicit solvent, and in explicit solvent. Journal of Molecular Modeling, 25(1):3, 2019.
[207] Ramu Anandakrishnan, Aleksander Drozdetski, Ross C Walker, and Alexey V Onufriev. Speed of conformational change: comparing explicit and implicit solvent molecular dynamics simulations. Biophysical Journal, 108(5):1153–1164, 2015.
[208] Thomas H Rod, Patrik Rydberg, and Ulf Ryde. Implicit versus explicit solvent in free energy calculations of enzyme catalysis: Methyl transfer catalyzed by catechol O-methyltransferase. The Journal of Chemical Physics, 124(17):174503, 2006.
[209] William L Jorgensen, Jayaraman Chandrasekhar, Jeffry D Madura, Roger W Impey, and Michael L Klein. Comparison of simple potential functions for simulating liquid water. The Journal of Chemical Physics, 79(2):926–935, 1983.
[210] Eyal Neria, Stefan Fischer, and Martin Karplus. Simulation of activation free energies in molecular systems. The Journal of Chemical Physics, 105(5):1902–1921, 1996.
[211] Hans W Horn, William C Swope, Jed W Pitera, Jeffry D Madura, Thomas J Dick, Greg L Hura, and Teresa Head-Gordon. Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew. The Journal of Chemical Physics, 120(20):9665–9678, 2004.
[212] Kota Kasahara, Shun Sakuraba, and Ikuo Fukuda. Enhanced sampling of molecular dynamics simulations of a polyalanine octapeptide: Effects of the periodic boundary conditions on peptide conformation. The Journal of Physical Chemistry B, 122(9):2495–2503, 2018. PMID: 29439570.
[213] Alessandro Laio and Michele Parrinello. Escaping free-energy minima. Proceedings of the National Academy of Sciences, 99(20):12562–12566, 2002.
[214] Yuji Sugita and Yuko Okamoto. Replica-exchange molecular dynamics method for protein folding. Chemical Physics Letters, 314(1-2):141–151, 1999.
[215] Yuji Sugita, Motoshi Kamiya, Hiraku Oshima, and Suyong Re. Replica-exchange methods for biomolecular simulations. In Biomolecular Simulations, pages 155–177. Springer, 2019.
[216] Daniel Sindhikara, Yilin Meng, and Adrian E Roitberg. Exchange frequency in replica exchange molecular dynamics. The Journal of Chemical Physics, 128(2):01B609, 2008.
[217] Ahmet Bakan and Ivet Bahar. Computational generation inhibitor-bound conformers of p38 MAP kinase and comparison with experiments. In Biocomputing 2011, pages 181–192. World Scientific, 2011.
[218] Ivet Bahar, Timothy R Lezon, Ahmet Bakan, and Indira H Shrivastava. Normal mode analysis of biomolecular structures: functional mechanisms of membrane proteins. Chemical Reviews, 110(3):1463–1497, 2010.
[219] Dana Reichmann, Ofer Rahat, Mati Cohen, Hani Neuvirth, and Gideon Schreiber. The molecular architecture of protein–protein binding sites. Current Opinion in Structural Biology, 17(1):67–76, 2007.
[220] Carlos J Camacho and Sandor Vajda. Protein–protein association kinetics and protein docking. Current Opinion in Structural Biology, 12(1):36–40, 2002.
[221] Graham R Smith and Michael JE Sternberg. Prediction of protein–protein interactions by docking methods. Current Opinion in Structural Biology, 12(1):28–35, 2002.
[222] Cyril Dominguez, Rolf Boelens, and Alexandre MJJ Bonvin. HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. Journal of the American Chemical Society, 125(7):1731–1737, 2003.
[223] Claire C Hsu, Markus J Buehler, and Anna Tarakanova. The order-disorder continuum: Linking predictions of protein structure and disorder through molecular simulation. Scientific Reports, 10(1):1–14, 2020.
[224] Maxwell W Libbrecht and William Stafford Noble. Machine learning applications in genetics and genomics. Nature Reviews Genetics, 16(6):321–332, 2015.
[225] Bradley J Erickson, Panagiotis Korfiatis, Zeynettin Akkus, and Timothy L Kline. Machine learning for medical imaging. Radiographics, 37(2):505–515, 2017.
[226] Nicholas A Saunders and Michael E Grant. Elastin biosynthesis in chick-embryo arteries. Studies on the intracellular site of synthesis of tropoelastin. Biochemical Journal, 221(2):393–400, 1984.
[227] I Pasquali-Ronchetti, M Baccarani-Contri, C Fornieri, G Mori, and D Quaglino Jr. Structure and composition of the elastin fibre in normal and pathological conditions. Micron, 24(1):75–89, 1993.
[228] Herbert M Kagan and Kathleen A Sullivan. [35] Lysyl oxidase: Preparation and role in elastin biosynthesis. In Methods in Enzymology, volume 82, pages 637–650. Elsevier, 1982.
[229] Fumiaki Sato, Hiroshi Wachi, Marie Ishida, Risa Nonaka, Satoshi Onoue, Zsolt Urban, Barry C Starcher, and Yoshiyuki Seyama. Distinct steps of cross-linking, self-association, and maturation of tropoelastin are necessary for elastic fiber formation. Journal of Molecular Biology, 369(3):841–851, 2007.
[230] Robert C Siegel, Sheldon R Pinnell, and George R Martin. Cross-linking of collagen and elastin. Properties of lysyl oxidase. Biochemistry, 9(23):4486–4492, 1970.
[231] Sheldon R Pinnell and George R Martin. The cross-linking of collagen and elastin: enzymatic conversion of lysine in peptide linkage to alpha-aminoadipic-delta-semialdehyde (allysine) by an extract from bone. Proceedings of the National Academy of Sciences of the United States of America, 61(2):708–716, 1968.
[232] SM Partridge, DF Elsden, J Thomas, A Dorfman, A Telser, and Pei-Lee Ho. Biosynthesis of the desmosine and isodesmosine cross-bridges in elastin. Biochemical Journal, 93(3):30–33, 1964.
[233] Giselle C Yeo, Clair Baldock, Anne Tuukkanen, Manfred Roessle, Leanne B Dyksterhuis, Steven G Wise, Jacqueline Matthews, Suzanne M Mithieux, and Anthony S Weiss. Tropoelastin bridge region positions the cell-interactive C terminus and contributes to elastic fiber assembly. Proceedings of the National Academy of Sciences, 109(8):2878–2883, 2012.
[234] Suzanne M Mithieux, Yidong Tu, Emine Korkmaz, Filip Braet, and Anthony S Weiss. In situ polymerization of tropoelastin in the absence of chemical cross-linking. Biomaterials, 30(4):431–435, 2009.
[235] Andrea Heinz, Christoph U Schrader, Stephanie Baud, Fred W Keeley, Suzanne M Mithieux, Anthony S Weiss, Reinhard HH Neubert, and Christian EH Schmelzer. Molecular-level characterization of elastin-like constructs and human aortic elastin. Matrix Biology, 38:12–21, 2014.
[236] Kenno Vanommeslaeghe, Elizabeth Hatcher, Chayan Acharya, Sibsankar Kundu, Shijun Zhong, Jihyun Shim, Eva Darian, Olgun Guvench, P Lopes, Igor Vorobyov, et al. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. Journal of Computational Chemistry, 31(4):671–690, 2010.
[237] William Humphrey, Andrew Dalke, Klaus Schulten, et al. VMD: visual molecular dynamics. Journal of Molecular Graphics, 14(1):33–38, 1996.
[238] Michael Feig, John Karanicolas, and Charles L Brooks III. MMTSB Tool Set: enhanced sampling and multiscale modeling methods for applications in structural biology. Journal of Molecular Graphics and Modelling, 22(5):377–395, 2004.
[239] Suzanne M Mithieux, Steven G Wise, and Anthony S Weiss. Tropoelastin – a multifaceted naturally smart material. Advanced Drug Delivery Reviews, 65(4):421–428, 2013.
[240] Anna Tarakanova, Wenwen Huang, Anthony S Weiss, David L Kaplan, and Markus J Buehler. Computational smart polymer design based on elastin protein mutability. Biomaterials, 127:49–60, 2017.
[241] Zsolt Urbán, Jun Zhang, Elaine C Davis, Gregg K Maeda, Anil Kumar, Heather Stalker, John W Belmont, Charles D Boyd, and Margaret R Wallace. Supravalvular aortic stenosis: genetic and molecular dissection of a complex mutation in the elastin gene. Human Genetics, 109(5):512–520, 2001.
[242] Leanne B Dyksterhuis and Anthony S Weiss. Homology models for domains 21–23 of human tropoelastin shed light on lysine crosslinking. Biochemical and Biophysical Research Communications, 396(4):870–873, 2010.
[243] Bernadette Vrhovski, Sacha Jensen, and Anthony S Weiss. Coacervation characteristics of recombinant human tropoelastin. European Journal of Biochemistry, 250(1):92–98, 1997.
[244] Yushi Bai, Quan Luo, and Junqiu Liu. Protein self-assembly via supramolecular strategies. Chemical Society Reviews, 45(10):2756–2767, 2016.
[245] Helen Piontkivska, Yi Zhang, Eric D Green, Laura Elnitski, et al. Multi-species sequence comparison reveals dynamic evolution of the elastin gene that has involved purifying selection and lineage-specific insertions/deletions. BMC Genomics, 5(1):31, 2004.
[246] Max Kuhn. Building predictive models in R using the caret package. Journal of Statistical Software, 28(5):1–26, 2008.
[247] Jazmin Ozsvar, Anna Tarakanova, Richard Wang, Markus J Buehler, and Anthony S Weiss. Allysine modifications perturb tropoelastin structure and mobility on a local and global scale. Matrix Biology Plus, 2:100002, 2019.
[248] Jessica F Almine, Daniel V Bax, Suzanne M Mithieux, Lisa Nivison-Smith, Jelena Rnjak, Anna Waterhouse, Steven G Wise, and Anthony S Weiss. Elastin-based materials. Chemical Society Reviews, 39(9):3371–3379, 2010.
[249] Ursula R Rodgers and Anthony S Weiss. Cellular interactions with elastin. Pathologie Biologie, 53(7):390–398, 2005.
[250] Stephan Huveneers, Hoa Truong, and Erik HJ Danen. Integrins: signaling, disease, and therapy. International Journal of Radiation Biology, 83(11-12):743–751, 2007.
[251] Junichi Takagi, Benjamin M Petre, Thomas Walz, and Timothy A Springer. Global conformational rearrangements in integrin extracellular domains in outside-in and inside-out signaling. Cell, 110(5):599–611, 2002.
[252] A Paul Mould, Emlyn JH Symonds, Patrick A Buckley, J Gunter Grossmann, Paul A McEwan, Stephanie J Barton, Janet A Askari, Susan E Craig, Jordi Bella, and Martin J Humphries. Structure of an integrin-ligand complex deduced from solution X-ray scattering and site-directed mutagenesis. Journal of Biological Chemistry, 278(41):39993–39999, 2003.
[253] Junichi Takagi, Konstantin Strokovich, Timothy A Springer, and Thomas Walz. Structure of integrin α5β1 in complex with fibronectin. The EMBO Journal, 22(18):4607–4615, 2003.
[254] Gregory C Sephel and Jeffrey M Davidson. Elastin production in human skin fibroblast cultures and its decline with age. Journal of Investigative Dermatology, 86(3):279–285, 1986.
[255] Robert P Mecham. Elastin synthesis and fiber assembly. Annals of the New York Academy of Sciences, 624(1):137–146, 1991.
[256] Caroline H Damsky and Zena Werb. Signal transduction by integrin receptors for extracellular matrix: cooperative processing of extracellular information. Current Opinion in Cell Biology, 4(5):772–781, 1992.
[257] Steven M Frisch and Erkki Ruoslahti. Integrins and anoikis. Current Opinion in Cell Biology, 9(5):701–706, 1997.
[258] Eileen Puklin-Faucher and Viola Vogel. Integrin activation dynamics between the RGD-binding site and the headpiece hinge. Journal of Biological Chemistry, 284(52):36557–36568, 2009.
[259] Lingyun Wang, Di Pan, Qi Yan, and Yuhua Song. Activation mechanisms of αvβ3 integrin by binding to fibronectin: a computational study. Protein Science, 26(6):1124–1137, 2017.
[260] Antonella Paladino, Monica Civera, Flavio Curnis, Mayra Paolillo, Cesare Gennari, Umberto Piarulli, Angelo Corti, Laura Belvisi, and Giorgio Colombo. The importance of detail: How differences in ligand structures determine distinct functional responses in integrin αvβ3. Chemistry–A European Journal, 25(23):5959–5970, 2019.
[261] Dror Yahalom, Angela Wittelsberger, Dale F Mierke, Michael Rosenblatt, Joseph M Alexander, and Michael Chorev. Identification of the principal binding site for RGD-containing ligands in the αvβ3 integrin: a photoaffinity cross-linking study. Biochemistry, 41(26):8321–8331, 2002.
[262] A Paul Mould, Steven K Akiyama, and Martin J Humphries. Regulation of integrin α5β1-fibronectin interactions by divalent cations. Evidence for distinct classes of binding sites for Mn2+, Mg2+, and Ca2+. Journal of Biological Chemistry, 270(44):26270–26277, 1995.
[263] Zaira Martín-Moldes, Davoud Ebrahimi, Robyn Plowright, Nina Dinjaski, Carole C Perry, Markus J Buehler, and David L Kaplan. Intracellular pathways involved in bone regeneration triggered by recombinant silk–silica chimeras. Advanced Functional Materials, 28(27):1702570, 2018.
[264] Srinivasan Jayashree, Pavalam Murugavel, Ramanathan Sowdhamini, and Narayanaswamy Srinivasan. Interface residues of transient protein-protein complexes have extensive intra-protein interactions apart from inter-protein interactions. Biology Direct, 14(1):1, 2019.
[265] Joao PGLM Rodrigues, Mikael Trellet, Christophe Schmitz, Panagiotis Kastritis, Ezgi Karaca, Adrien SJ Melquiond, and Alexandre MJJ Bonvin. Clustering biomolecular complexes by residue contacts similarity. Proteins: Structure, Function, and Bioinformatics, 80(7):1810–1817, 2012.
[266] Hironao Yamada, Sakiko Mori, Takeshi Miyakawa, Ryota Morikawa, Fumihiko Katagiri, Kentaro Hozumi, Yamato Kikkawa, Motoyoshi Nomizu, and Masako Takasu. Structural study of cell attachment peptide derived from laminin by molecular dynamics simulation. PloS One, 11(2):e0149474, 2016.
[267] Barry J Grant, Ana PC Rodrigues, Karim M ElSawy, J Andrew McCammon, and Leo SD Caves. Bio3d: an R package for the comparative analysis of protein structures. Bioinformatics, 22(21):2695–2696, 2006.
[268] Tsan Xiao, Junichi Takagi, Barry S Coller, Jia-Huai Wang, and Timothy A Springer. Structural basis for allostery in integrins and binding to fibrinogen-mimetic therapeutics. Nature, 432(7013):59–67, 2004.
[269] Marco Vassura, Pietro Di Lena, Luciano Margara, Maria Mirto, Giovanni Aloisio, Piero Fariselli, and Rita Casadio. Blurring contact maps of thousands of proteins: what we can learn by reconstructing 3D structure. BioData Mining, 4(1):1, 2011.
[270] Vishal C Nashine, Sharon Hammes-Schiffer, and Stephen J Benkovic. Coupled motions in enzyme catalysis. Current Opinion in Chemical Biology, 14(5):644–651, 2010.
[271] Ranjani K Paradise, Douglas A Lauffenburger, and Krystyn J Van Vliet. Acidic extracellular pH promotes activation of integrin αvβ3. PloS One, 6(1), 2011.
[272] Rashmi Sharma, Zsolt Raduly, Marton Miskei, and Monika Fuxreiter. Fuzzy complexes: Specific binding without complete folding. FEBS Letters, 589(19):2533–2542, 2015.
[273] Peter Tompa and Monika Fuxreiter. Fuzzy complexes: polymorphism and structural disorder in protein–protein interactions. Trends in Biochemical Sciences, 33(1):2–8, 2008.
[274] Davoud Mozhdehi, Kelli M Luginbuhl, Joseph R Simon, Michael Dzuricky, Rudiger Berger, H Samet Varol, Fred C Huang, Kristen L Buehne, Nicholas R Mayne, Isaac Weitzhandler, et al. Genetically encoded lipid–polypeptide hybrid biomaterials that exhibit temperature-triggered hierarchical self-assembly. Nature Chemistry, 10(5):496–505, 2018.
[275] Bing-Hao Luo, Christopher V Carman, and Timothy A Springer. Structural basis of integrin regulation and signaling. Annual Review of Immunology, 25:619–647, 2007.
[276] Monika Fuxreiter, Istvan Simon, and Sarah Bondos. Dynamic protein–DNA recognition: beyond what can be seen. Trends in Biochemical Sciences, 36(8):415–423, 2011.
[277] Bankala Krishnarjuna, Toshihiko Sugiki, Rodrigo AV Morales, Jeffrey Seow, Toshimichi Fujiwara, Karyn L Wilde, Raymond S Norton, and Christopher A MacRaild. Transient antibody-antigen interactions mediate the strain-specific recognition of a conserved malaria epitope. Communications Biology, 1(1):1–10, 2018.
[278] Isabella L Karle and Dan W Urry. Crystal structure of cyclic (APGVGV)2, an analog of elastin, and a suggested mechanism for elongation/contraction of the molecule. Biopolymers: Original Research on Biomolecules, 77(4):198–204, 2005.
[279] Dror Tobi and Ivet Bahar. Structural changes involved in protein binding correlate with intrinsic motions of proteins in the unbound state. Proceedings of the National Academy of Sciences, 102(52):18908–18913, 2005.
[280] Shin-ichi Aota, Motoyoshi Nomizu, and Kenneth M Yamada. The short amino acid sequence Pro-His-Ser-Arg-Asn in human fibronectin enhances cell-adhesive function. Journal of Biological Chemistry, 269(40):24756–24761, 1994.
[281] Diwakar Chada, Timothy Mather, and Matthias U Nollert. The synergy site of fibronectin is required for strong interaction with the platelet integrin αIIbβ3. Annals of Biomedical Engineering, 34(10):1542–1552, 2006.
Appendix A
Code and scripts
Appendix A collects the scripts used throughout this dissertation. To improve readability, code snippets are shown where possible to communicate the core function of each program.
A.1 Code for implementing machine learning
The following R packages were used for carrying out machine learning in Chapter 3.
library(caret)
library(glmnet)
library(mltools)
library(standardize)
library(pROC)
library(caTools)
library(Hmisc)
library(CaretMisc)
The following R code snippet was used to set up the train control for machine learning in Chapter 3.
# Set up 5 repeats of 10-fold CV

train_control_repeatedcv <- trainControl(
  method = "repeatedcv",
  number = 10,                        # k = 10
  repeats = 5,                        # repeat 5 times
  classProbs = TRUE,                  # compute class probabilities (needed for ROC-based summaries)
  summaryFunction = twoClassSummary,  # binary outcome
  savePredictions = "final"           # save predictions for the best hyperparameter set
)
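To make the resampling scheme concrete, the following base-R sketch shows one way repeated k-fold partitions can be generated. It is illustrative only: `make_repeated_folds` is a hypothetical helper, not part of caret, and unlike caret's own fold generation it does not stratify by outcome class.

```r
# Illustrative sketch of repeated k-fold partitioning in base R.
# make_repeated_folds() is a hypothetical helper, not a caret function.
make_repeated_folds <- function(n, k = 10, repeats = 5) {
  folds <- list()
  for (r in seq_len(repeats)) {
    # Shuffle fold labels so that each repeat uses a different partition
    fold_id <- sample(rep(seq_len(k), length.out = n))
    for (f in seq_len(k)) {
      # Record the held-out row indices for fold f of repeat r
      folds[[sprintf("Fold%02d.Rep%d", f, r)]] <- which(fold_id == f)
    }
  }
  folds
}

set.seed(123)
held_out <- make_repeated_folds(n = 200, k = 10, repeats = 5)

# 10 folds x 5 repeats gives 50 resamples in total
n_resamples <- length(held_out)
```

Within each repeat every row is held out exactly once, so each candidate model is scored on 50 held-out sets in total.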
The following R code snippet was used to execute the machine learning in Chapter 3.
# Set up search grid for hyperparameters
hyperparameter_search <- expand.grid(
  alpha = c(0, 0.4, 0.6, 0.8, 1),
  lambda = c(0.1, 1, 10, 100)
)

# Set seed to make results reproducible
set.seed(123)

# Execute glmnet using caret
data_elastic_net <- train(
  association_HT ~ .,                 # head-to-tail outcome
  data = data_train_engineered,       # training data set
  metric = "Sens",                    # select sensitivity as metric
  method = "glmnet",
  trControl = train_control_repeatedcv,
  tuneGrid = hyperparameter_search
)

# Examine the effect of the hyperparameters
plot(data_elastic_net)
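As a rough guide to the size of this search, the sketch below rebuilds the same grid in base R and counts the candidate settings. In glmnet, alpha = 0 corresponds to a ridge penalty and alpha = 1 to a lasso penalty, with intermediate values mixing the two. The variable names are illustrative, and the resample count assumes the 5 repeats of 10-fold cross-validation described in this appendix.

```r
# Rebuild the hyperparameter grid and count the candidate settings
grid <- expand.grid(
  alpha = c(0, 0.4, 0.6, 0.8, 1),
  lambda = c(0.1, 1, 10, 100)
)
n_candidates <- nrow(grid)                    # 5 alpha values x 4 lambda values

# Assumed resampling scheme: 10 folds x 5 repeats
n_resamples <- 10 * 5

# Number of held-out performance estimates behind the tuning plot
n_evaluations <- n_candidates * n_resamples

# Rows corresponding to the pure ridge and pure lasso ends of the grid
ridge_rows <- grid[grid$alpha == 0, ]
lasso_rows <- grid[grid$alpha == 1, ]
```

Each of the 20 candidate settings is therefore summarised over 50 held-out sets, and the setting with the highest mean sensitivity is retained because `metric = "Sens"` is used.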
The following R code snippet was used to evaluate the trained glmnet model in Chapter 3.
library(dplyr)  # provides %>% and bind_rows()

# Examine the effect of the hyperparameters
plot(data_elastic_net)

# Create model list
allModels <- list("elastic_net" = data_elastic_net)

# Read in test data set
test_data <- readRDS("data_test_engineered.rds")

# Examine test/train performance
# (eval_classifier() is a helper that is not defined in this snippet)
results <- lapply(allModels,
                  eval_classifier,
                  test_data = test_data) %>%
  bind_rows(.id = "modeltype")
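The definition of `eval_classifier` is not reproduced in this appendix. As a hedged illustration of the kind of summary such a helper might return, the base-R sketch below computes sensitivity, specificity and accuracy from observed and predicted class labels. The function `eval_classifier_sketch`, its arguments and the toy labels are all hypothetical and do not reproduce the helper used in Chapter 3.

```r
# Hypothetical stand-in for an evaluation helper; not the thesis function.
# The first factor level is treated as the positive class, which matches
# the convention used by caret's twoClassSummary.
eval_classifier_sketch <- function(observed, predicted,
                                   positive = levels(observed)[1]) {
  stopifnot(is.factor(observed), is.factor(predicted),
            length(observed) == length(predicted))
  tp <- sum(predicted == positive & observed == positive)  # true positives
  fn <- sum(predicted != positive & observed == positive)  # false negatives
  tn <- sum(predicted != positive & observed != positive)  # true negatives
  fp <- sum(predicted == positive & observed != positive)  # false positives
  data.frame(
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    accuracy    = (tp + tn) / length(observed)
  )
}

# Toy labels, invented purely for illustration
observed  <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "yes"),
                    levels = c("yes", "no"))
predicted <- factor(c("yes", "no", "yes", "no", "no", "yes", "no", "yes"),
                    levels = c("yes", "no"))
toy_result <- eval_classifier_sketch(observed, predicted)
```

Returning a one-row data frame per model keeps the output compatible with row-binding across a list of models, as in the snippet above.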