
Computational modelling of tropoelastin modifications and interactions

Jazmin Ozsvar

School of Life and Environmental Sciences

Faculty of Science

University of Sydney

2020

A thesis submitted in fulfilment of the requirements for the degree of

Doctor of Philosophy

“Cell and tissue, shell and bone, leaf and flower, are so many portions of matter, and it is in obedience to the laws of physics that their particles have been moved, moulded and conformed.”

D’Arcy Wentworth Thompson


Declaration

This is to certify that to the best of my knowledge, the content of this thesis is my own work. This thesis has not been submitted for any degree or other purposes.

I certify that the intellectual content of this thesis is the product of my own work and that all the assistance received in preparing this thesis and sources have been acknowledged.

Jazmin Ozsvar Date


Acknowledgements

My first thank you goes to my supervisor, Professor Tony Weiss, for his constant enthusiasm for research, his relentless stream of fascinating ideas, and his remarkable depth of knowledge. I deeply appreciate his patience with me whilst I transitioned from biology to biophysics, and his encouragement to pursue studies in a field that had been unexplored within our laboratory. It has been my pleasure and privilege to have him as my supervisor throughout my doctoral studies.

I also wish to thank Dr Suzanne Mithieux and Dr Giselle Yeo for their encouragement, insights and feedback throughout my candidature. I greatly appreciate their help and advice, both within and outside the realm of science.

I could not have transitioned into molecular dynamics without the help of numerous people. My biggest thanks go to Assistant Professor Anna Tarakanova, who not only answered my many questions on protein modelling and molecular dynamics, but also provided me with the tropoelastin model and gave feedback on this thesis. I am incredibly grateful to have had the opportunity to learn from and collaborate with her. I would also like to acknowledge the advice I received from Professor Markus Buehler and his group, as well as advice from Associate Professor Serdar Kuyucak and Dr Jeffry Setiadi on several technical aspects of my project.

I would like to acknowledge the Cell Therapy Manufacturing Cooperative Research Centre for their meticulously planned ePhD program, their funding support, and for providing me with opportunities to present my work throughout my studies.

I am deeply indebted to the members of the Weiss laboratory, past and present, especially those who dedicated their work to uncovering the mysteries of tropoelastin structure, coacervation and cell interactions. Of the laboratory members from my time, thank you to Behnaz, Matti, Kekini, Pearl, Ed, Avelyn, Lea, Karen, Ziyu, Aleen, Howard, Johnny and Sally for sharing your time and conversations with me. In particular, I would like to thank Richard, my partner in crime, for both emotional and engineering support during my doctoral studies.

A big thank you also goes to the Cordwell and Reeves labs for years of games nights, food outings, and lunchtime “what grinds my gears” moments. Moreover, I would like to thank my friends, both near and far, outside the scientific community for putting up with me during my doctoral studies and for giving me encouragement and sound advice when required.

Last, but certainly not least, I would like to thank my family. To my parents and my sister, words cannot describe how much it means to me that you have believed in me and supported my scientific endeavours throughout all these years. Sometimes a shoulder to lean on is better than all the scientific advice in the world.


Abstract

Highly flexible proteins are biologically important and prevalent, yet elucidating their structures and dynamics has presented a profound challenge to the structural biology community. Over recent decades, improvements in computational hardware and in the accuracy of computational chemistry software have permitted the in-depth exploration of flexible proteins.

Here, I investigate the molecular dynamics and mechanisms that underpin the functionality of tropoelastin, the building block of elastin, and the interplay between its primary sequence, local structure and global structure. I leverage the recently derived full-atomistic structure of tropoelastin through a series of computational molecular dynamics models to dissect three facets of tropoelastin’s functionality. First, I examine the effect of natural modifications on the global and local structure of tropoelastin, and their implications for the self-assembly process through which elastin is formed. I find that the global structures deviate from the canonical wild-type structure, indicating the formation of heterogeneous aggregates and cross-links. The implied heterogeneity of these aggregates is further explored using dimers as representative nucleation events, in which I examine the influence of physical forces and initial tropoelastin structures on early-stage self-assembly. Tropoelastin dimers form surprisingly diverse associations, indicating that elastin assembly is not as homogeneous as previously thought. Finally, I probe the interaction between tropoelastin monomers and integrins, a class of cell receptors crucial for signalling and tissue integrity. I identify tropoelastin as a fuzzy binding protein that can bind to integrins in a variety of conformations. Furthermore, I determine that tropoelastin exhibits preferential binding that depends on its initial conformation.


Author contribution statement

Chapters 1, 3 and 5 contain material that has been published in or submitted to scientific journals. Where applicable, the relevant publication is cited at the start of the chapter.

For Chapter 1, I wrote 50% of the manuscript for the cited review.

For Chapter 3, I designed the study, conducted the experiments, performed the majority of the data analysis, and wrote the manuscript for the cited publication. I received assistance with data analysis and manuscript editing from the co-authors of the publication.

For Chapter 5, I designed the study, conducted the experiments, performed the majority of the data analysis, and wrote the manuscript for the submitted publication. I received assistance with data analysis and manuscript editing from the co-authors of the publication.

In addition, in cases where I am not the corresponding author of a published item, permission to include the published material has been granted by the corresponding author.

Jazmin Ozsvar Date

As supervisor for the candidature upon which this thesis is based, I can confirm that the authorship attribution statements above are correct.

Anthony S. Weiss Date


Disseminations Arising from this Work

Publications

Ozsvar, J., Wang, R., Tarakanova, A., Buehler, M. J. and Weiss, A.S., 2020. Fuzzy binding model of molecular interactions between tropoelastin and integrin αvβ3. Submitted to the Biophysical Journal.

Wang, R., Ozsvar, J., Yeo, G. C., Weiss, A. S., 2019. “Hierarchical assembly of elastin materials”. Current Opinion in Chemical Engineering, Vol. 24, pp. 54-60.

Ozsvar, J., Tarakanova, A., Wang, R., Buehler, M. J. and Weiss, A.S., 2019. Allysine modifications perturb tropoelastin structure and mobility on a local and global scale. Matrix Biology Plus, 3(6), pp.800-809.

Book Chapters

Wang, R., Mithieux, S.M., Ozsvar, J. and Weiss, A.S., 2016. Synthetic-Elastin Systems. Elastic Fiber Matrices: Biomimetic Approaches to Regeneration and Repair, pp. 97-132. CRC Press.

Conference Presentations and Posters

Ozsvar, J., Wang, R., Weiss, A.S., 2019. Unravelling the interactions between tropoelastin and integrins. Oral presentation at the Annual Matrix Biology Society of Australia and New Zealand 2019, the Woolcock Institute, New South Wales, Australia.

Ozsvar, J., Weiss, A.S., 2018. The Role of Allysines in Tropoelastin Dynamics and Assembly. Oral presentation at the 3rd Matrix Biology Europe 2018, University of Manchester, Manchester, United Kingdom.


Ozsvar, J., Weiss, A.S., 2018. The Role of Allysines in Tropoelastin Dynamics and Assembly. Oral presentation at the 10th European Elastin Meeting 2018, Radboud University Medical Center, Nijmegen, Netherlands.

Ozsvar, J., Weiss, A.S., 2017. Dynamic Landscape of Tropoelastin Assembly. Oral presentation at the Annual Matrix Biology Society of Australia and New Zealand 2017, Royal Children’s Hospital, Melbourne, Victoria, Australia.

Ozsvar, J., Weiss, A.S., 2017. Dynamic Landscape of Tropoelastin Assembly. Oral presentation at the 3rd Annual Charles Perkins Centre Early to Mid Career Researchers Symposium 2017, University of Sydney, New South Wales, Australia.

Ozsvar, J., Weiss, A.S., 2017. Untangling the interactions between cells and tropoelastin. Cell Therapy Manufacturing Cooperative Research Centre ImpaCT Day, Adelaide.


Abbreviations

ALL allysine-aldol

ANM anisotropic network model

BSA buried surface area

BS3 bis(sulfosuccinimidyl) suberate

cMD classical molecular dynamics

COM centre of mass

EAF exchange acceptance frequency

EBP elastin binding protein

ECM extracellular matrix

ELP elastin-like polypeptides

ENM elastic network model

GAG glycosaminoglycans

GB Generalised Born

HADDOCK High Ambiguity Driven protein-protein Docking

LNL lysinonorleucine

LOX lysyl oxidase

MD molecular dynamics

NMA normal mode analysis

PC principal component

PCA principal component analysis

QM quantum mechanics

REMD replica exchange molecular dynamics

RGD arginine-glycine-aspartate

RMSD root mean square deviation

SANS small angle neutron scattering

SASA solvent accessible surface area

SAXS small angle x-ray scattering

WT wild type


Contents

1 Introduction
1.1 Elastin
1.1.1 Elastic fibres
1.2 Tropoelastin
1.2.1 The ELN gene
1.2.2 Primary sequence
1.2.3 Secondary structure
1.2.4 Overall tertiary structure
1.2.5 Computational model
1.3 Elastogenesis
1.3.1 Tropoelastin synthesis
1.3.2 Coacervation
1.3.3 Cross-linking
1.3.4 Head-to-tail model of elastin assembly
1.3.5 Deposition
1.4 Tropoelastin-cell interactions
1.4.1 Elastin binding protein
1.4.2 Glycosaminoglycans
1.4.3 Integrins
1.4.4 Model of tropoelastin-cell interactions
1.5 Elastin diseases
1.6 Applications of tropoelastin
1.6.1 Tropoelastin-only materials
1.6.2 Blended biomaterials
1.6.3 Surface coatings
1.7 Thesis aims
2 Materials and Methodology
2.1 Computational multiscale modelling
2.2 Molecular dynamics
2.2.1 Molecular dynamics workflow
2.2.2 Modelling atomic movement
2.2.3 Force fields
2.2.4 Solvent models
2.3 Replica exchange molecular dynamics
2.4 Normal mode analysis
2.4.1 Elastic and anisotropic network models
2.5 Molecular docking
2.6 Machine learning
2.6.1 Logistic regression
2.6.2 Elastic net regularisation
3 Allysine modifications perturb tropoelastin structure and mobility on a local and global scale
3.1 Introduction
3.2 Methods
3.2.1 Allysine parameterisation
3.2.2 Molecular dynamics input
3.3 Results
3.3.1 Structures of single allysine-modified tropoelastin
3.3.2 Converting lysine to allysine perturbs the global structure and intrinsic dynamics of tropoelastin
3.3.3 Allysines alter the conformational sampling of domains
3.3.4 Allysines facilitate changes in salt bridges that contribute to structural variance and lead to local secondary structural changes
3.3.5 Hydrophobic solvent accessible surface area decreases in the presence of allysines
3.3.6 Distances between residues decrease upon allysine modification
3.4 Discussion
4 Modelling of tropoelastin nucleation events
4.1 Introduction
4.2 Methods
4.2.1 Selection of tropoelastin conformations
4.2.2 Protein-protein docking
4.2.3 Preparation of structural data
4.2.4 Determination of head-to-tail association
4.2.5 Assembly of docking data
4.2.6 Correlation
4.2.7 Machine learning
4.3 Results
4.3.1 Semi-automated annotation of head-to-tail association
4.3.2 Overview of dimer associations by starting conformation and study type
4.3.3 Overview of dimer associations by native or synthetic origin
4.3.4 Structures arising from the canonical cross-link
4.3.5 Electrostatic interactions of dimers
4.3.6 Surface area and solvent accessibility of dimers is driven by tropoelastin conformation
4.3.7 Correlation of dimer energies and features
4.3.8 Machine learning model selection using energy and surface features
4.3.9 Logistic regression of whole dimer data set
4.4 Discussion
5 Interactions of tropoelastin with integrins
5.1 Introduction
5.2 Methods
5.2.1 Preparation of integrin headpiece structure
5.2.2 Preparation of tropoelastin structure
5.2.3 Tropoelastin-integrin configuration preparation
5.2.4 Molecular dynamics modelling
5.2.5 Analysis
5.3 Results
5.3.1 Docking of tropoelastin to integrin αvβ3
5.3.2 Integrin headpiece opening and associated structural changes with REMD
5.3.3 Areas of tropoelastin-integrin interaction
5.3.4 Principal component analysis
5.3.5 Headpiece opening remains stable in explicit solvent
5.4 Discussion
6 Discussion
6.1 General discussion
6.2 Allysine modifications and their implication for self-assembly
6.3 Updating the head-to-tail model of assembly
6.4 Fuzzy binding mechanisms of tropoelastin and integrins
6.5 Future directions
A Code and scripts
A.1 Code for implementing machine learning


Chapter 1

Introduction

Parts of this chapter have been published as:

Wang, R., Ozsvar, J., Yeo, G. C., Weiss, A. S., 2019. “Hierarchical assembly of elastin materials”. Current Opinion in Chemical Engineering, Vol. 24, pp. 54-60.


1.1 Elastin

1.1.1 Elastic fibres

Elastic fibres are a vital component of the extracellular matrix (ECM) of vertebrate elastic tissues, which include the skin, lungs and cardiovascular system. Elastic fibres are composed of approximately 90% elastin, an insoluble, multimeric protein [1]. The remaining components of elastic fibres are primarily fibrillins, a class of insoluble glycoproteins, particularly fibrillin-1 [2]. As the major constituent of elastic fibres, elastin is predominantly responsible for their key characteristic property: mechanical resilience. Elastin’s ability to return undeformed to its resting state allows tissues to undergo the repeated stretch and recoil cycles required for their function [3–5]. Elastin’s mechanical properties are made even more remarkable by its durability. Carbon dating methodologies have estimated elastin’s half-life to be 70–80 years [6] and, as such, it endures throughout an organism’s lifetime [7]. Elastin is composed solely of its soluble monomeric subunit, tropoelastin.

1.2 Tropoelastin

1.2.1 The ELN gene

Tropoelastin is encoded by the 45 kb ELN gene, located at 7q11.2 on the long arm of chromosome 7 [8]. The 34 exons of ELN are separated by lengthy introns [9, 10], and alternative splicing of the primary transcript gives rise to a variety of mRNA variants. Alternative splicing has been observed for exons 22, 23, 24, 26A, 32 and 33 [11, 12], and combinations thereof give rise to 13 known human tropoelastin isoforms [13]. Of these exons, 26A has only been found in humans [14–16]. Exon 22 is spliced out of human transcripts but has been detected in the transcripts of other mammals, such as mice [11, 16, 17]. A highly conserved 3’ untranslated region lies directly downstream of exon 36 and is thought to play a role in regulating tropoelastin expression [10, 18].

Variations in the relative abundance of alternatively spliced ELN mRNA transcripts have been observed between tissues (Figure 1.1). This diversity is thought to be necessary for fine-tuning the mechanical characteristics of tissues to suit their diverse functional requirements [13]. Indeed, studies of domain insertions and deletions report changes in the intrinsic functionality of tropoelastin, supporting the hypothesis that such changes alter tissue mechanics [19–23]. A better understanding of how the tropoelastin encoded by each splice variant contributes mechanically to tissue function would therefore be of great benefit.

Figure 1.1: Human ELN mRNA splice variants obtained from elastic tissues. Relative abundance of the most highly expressed human mRNA isoforms isolated from aorta, coronary artery, lung, skin, uterus, and bladder. The loss and/or gain of exons is displayed. Image adapted from [13].

The isoform investigated in this thesis contains domain 26A but lacks domain 22, and is commonly found in elastic tissues [9, 24].

1.2.2 Primary sequence

Tropoelastin domains are each encoded by a single ELN exon, and can be categorised as either “hydrophobic” or “cross-linking” based on their amino acid content and functionality.

As seen in Table 1.1, tropoelastin’s amino acid content is dominated by non-polar residues such as glycine, alanine, valine and proline [14]. The hydrophobic domains consist of variations of VPGVG repeating motifs [25–27], giving rise to a low-complexity primary sequence. The lengths of the hydrophobic domains vary: the shorter hydrophobic domains (9–15 residues) occur closer to the N-terminus, whereas the longer ones (up to 55 residues) are located within the central and C-terminal regions of the molecule [10]. The hydrophobic domains are primarily responsible for facilitating tropoelastin self-assembly [28–30], maintaining tropoelastin’s flexibility [31, 32], and conferring the stretch-recoil properties of elastin [31, 33].

The cross-linking domains are distinguished by the presence of lysines, which form cross-links within mature elastin. Cross-linking domains are termed KP- or KA-type domains after the amino acid (proline or alanine, respectively) that flanks their lysines. KP domains lie closer to the N-terminus, whereas KA domains are found closer to the C-terminus. Cross-linking domains contain between one and three lysines within their sequences [10].
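The KP/KA naming can be made concrete with a small sequence-scanning sketch. It is a minimal illustration under stated assumptions, not a published classification method: the helper names `lysine_flanks` and `classify_crosslinking_domain` are hypothetical, and the rule (compare prolines versus alanines immediately adjacent to each lysine) is a simplification. It would, for example, label the non-cross-linking domain 36 as "KA" unless that exception is handled separately.

```python
from collections import Counter


def lysine_flanks(sequence: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, left residue, right residue) for every lysine (K).

    '-' marks a chain end with no neighbouring residue.
    """
    flanks = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            left = sequence[i - 1] if i > 0 else "-"
            right = sequence[i + 1] if i + 1 < len(sequence) else "-"
            flanks.append((i + 1, left, right))
    return flanks


def classify_crosslinking_domain(sequence: str) -> str:
    """Hypothetical heuristic: 'KP' if prolines outnumber alanines next to lysines,
    'KA' if alanines outnumber prolines, otherwise 'unclassified'."""
    neighbours = Counter()
    for _, left, right in lysine_flanks(sequence):
        neighbours[left] += 1
        neighbours[right] += 1
    if not neighbours:
        return "no lysines"
    if neighbours["P"] > neighbours["A"]:
        return "KP"
    if neighbours["A"] > neighbours["P"]:
        return "KA"
    return "unclassified"


# Domain sequences copied from Table 1.1
examples = {4: "GALGPGGKPLKP", 15: "GVGPQAAAAAAAKAAAKF", 9: "GVGLPGVYPGGVLPGA"}
for domain, seq in examples.items():
    print(domain, classify_crosslinking_domain(seq), lysine_flanks(seq))
```

Running this labels domain 4 as KP, domain 15 as KA, and reports that the hydrophobic domain 9 contains no lysines.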

There are two exceptions to these classifications. Domain 1 (not shown) forms a signal peptide that is cleaved off to give rise to the mature form of the protein. The second exception is domain 36, which contains lysines but does not participate in cross-linking [34]. Moreover, the amino acid sequence of domain 36 is unique, as its lysines alternate with positively charged arginines, forming an RKRK sequence that caps tropoelastin’s C-terminus [29].

Other residues that depart from tropoelastin’s low-complexity sequence are the two cysteines in domain 36, which form the single disulfide bond within the molecule [35]. Additionally, tropoelastin contains three negatively charged residues, which are crucial for maintaining its tertiary structure [36, 37].


Domain  Residue numbers  Sequence
2       1–18             GGVPGAIPGGVPGGVFYP
3       19–27            GAGLGALGG
4       28–39            GALGPGGKPLKP
5       40–51            VPGGLAGAGLGA
6       52–82            GLGAFPAVTFPGALVPGGVADAAAAYKAAKA
7       83–98            GAGLGGVPGVGGLGVS
8       99–116           AGAVVPQPGAGVKPGKVP
9       117–132          GVGLPGVYPGGVLPGA
10      133–154          RFPGVGVLPGVPTGAGVKPKAP
11      155–165          GVGGAFAGIP
12      166–188          GVGPFGGPQPGVPLGYPIKAPKLP
13      189–202          GGYGLPYTTGKLPY
14      203–222          GYGPGGVAGAAGKAGYPTGT
15      223–240          GVGPQAAAAAAAKAAAKF
16      241–270          GAGAAGVLPGVGGAGVPGVPGAIPGIGGIA
17      271–290          GVGTPAAAAAAAAAAKAAKY
18      291–339          GAAAGLVPGGPGFGPGVVGVPGAGVPGVGVPGAGIPVVPGAGIPGAAVP
19      340–357          GVVSPEAAAKAAAKAAKY
20      358–412          GARPGVGVGGIPTYGVGAGGFPGFGVGVGGIPGVAGVPSVGGVPGVGGVPGVGIS
21      413–426          PEAQAAAAAKAAKY
23      427–445          GVGTPAAAAAKAAAKAAQF
24      446–493          GLVPGVGVAPGVGVAPGVGVAPGVGLAPGVGVAPGVGVAPGVGVAPGI
25      494–514          GPGGVAAAAKSAAKVAAKAQL
26      515–556          RAAAGLGAGIPGLGVGVGVPGLGVGAGVPGLGVGAGVPGFGA
27      557–569          VPGALAAAKAAKY
28      570–593          GAAVPGVLGGLGALGGVGIPGGVV
29      594–613          GAGPAAAAAAAKAAAKAAQF
30      614–638          GLVGAAGLGGLGVGGLGVPGVGGLG
31      639–651          GIPPAAAAKAAKY
32      652–669          GAAGLGGVLGGAGQFPLG
33      670–684          GVAARPGFGLSPIFP
36      685–698          GGACLGKACGRKRK

Table 1.1: Summary of human tropoelastin’s domains and their respective sequences. The residue numbers of the sequences are indicated.

1.2.3 Secondary structure

Our understanding of tropoelastin’s structure has been hindered by the insolubility of elastic fibres, the repetitiveness of its primary sequence, and tropoelastin’s inherent flexibility. In the absence of a full-atomistic X-ray crystal structure, a number of elastin derivatives, including α-elastin, κ-elastin, isolated tropoelastin domains, and synthetic elastin-like polypeptides (ELPs), have been studied using circular dichroism (CD) [38, 39], Raman spectroscopy [40, 41], Fourier transform infrared spectroscopy (FTIR) [40, 42], and nuclear magnetic resonance (NMR) [43, 44]. Collectively, these studies have yielded much insight into tropoelastin’s secondary structure.

Despite initial discrepancies between studies due to experimental context, such as the choice of solvent [45–47], it is now generally agreed that the majority of tropoelastin forms random coils and transient ordered secondary structures, which include α-helices and β-structures [29, 48]. Most of the random coil content is found in the hydrophobic domains, rendering them highly flexible in solution [44, 49]. This flexibility is partly attributed to the numerous PG motifs in the primary sequence of the hydrophobic domains [50]. This motif is an unusual combination, as it pairs the most and least flexible amino acids. Glycine confers flexibility on local protein structure because its side chain consists of a single hydrogen atom. Proline, on the other hand, contains a bulky ring that impedes local conformational sampling, thereby disrupting secondary structure formation. Thus, this pairing results in flexible hydrophobic domains that exhibit transient secondary structures [51], which are thought to contribute to efficient conformational sampling during self-assembly and subsequent cross-linking [32, 52].
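The prevalence of PG pairs in the hydrophobic domains can be checked directly from the sequences in Table 1.1. The sketch below is a simple composition count, not a measure of flexibility in itself; the helper names `count_dipeptide` and `fraction_of` are hypothetical. It counts Pro-Gly dipeptides and the combined glycine and proline fraction of domain 24.

```python
def count_dipeptide(sequence: str, dipeptide: str = "PG") -> int:
    """Count occurrences of a two-residue motif, scanning every position."""
    return sum(1 for i in range(len(sequence) - 1) if sequence[i:i + 2] == dipeptide)


def fraction_of(sequence: str, residues: str) -> float:
    """Fraction of positions occupied by any of the given one-letter residue codes."""
    return sum(sequence.count(r) for r in residues) / len(sequence)


# Domain 24, copied from Table 1.1
domain_24 = "GLVPGVGVAPGVGVAPGVGVAPGVGLAPGVGVAPGVGVAPGVGVAPGI"
print(len(domain_24), count_dipeptide(domain_24), fraction_of(domain_24, "GP"))
# → 48 8 0.5
```

In this 48-residue domain, eight PG pairs occur and glycine plus proline make up half of all positions.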

Tropoelastin’s cross-linking domains, particularly the KA domains, were traditionally presumed to form α-helices and polyproline II (PPII) helices due to the presence of desmosine cross-links [53]. Desmosine formation requires the specific alignment of four lysines between two tropoelastin domains, which can be achieved via a helical configuration [53]. Further studies demonstrating that alanines are predisposed to form α-helices within other proteins appear to support this argument [54, 55]. However, KA domains show high helical content in trifluoroethanol, a solvent that stabilises secondary structures [56, 57], so this helicity may not reflect their conformation in aqueous solution. Studies examining the helical content of ELPs demonstrate that whilst α-helices are indeed present during later stages of self-assembly, KA domains are primarily composed of random coil whilst in monomeric form, similar to the hydrophobic domains [58]. This is consistent with other studies indicating that less than 10% of tropoelastin’s structure is helical [29, 48], despite almost half of its sequence consisting of cross-linking domains. The KA domains of ELPs undergo a transition from random coil to β-strands during early self-assembly, before forming α-helices that are stable enough to detect via NMR [58].

1.2.4 Overall tertiary structure

Tropoelastin’s overall flexibility has greatly impeded studies of its tertiary structure; consequently, no crystal structure exists to date. The first experiments to successfully determine the overall 3-D shape of tropoelastin used small-angle X-ray scattering (SAXS) and small-angle neutron scattering (SANS), revealing tropoelastin to be an elongated, asymmetric molecule with distinct regions [59]. The regions, and the domains they comprise, were mapped by superimposing truncated versions of the molecule. Overlapping these structures revealed that the N-terminal region forms an extended coil encompassing domains 2–18. The coil joins onto a flexible hinge region comprising domains 20–24, which is adjacent to the bridge region of domains 25–26. Subsequent to the bridge are the C-terminal domains, also termed the “foot” of the molecule due to their spatial arrangement. Further SAXS and SANS data revealed that tropoelastin’s tertiary structure is perturbed by mutations of negatively charged residues within disparate regions [36, 37]. These differently shaped molecules show altered coacervation and elastic fibre formation [36, 37], highlighting the tight interplay between tropoelastin’s structure and function.


Figure 1.2: Schematic representation of tropoelastin. Full-length tropoelastin model exhibiting the notable and functionally significant structures. Image adapted from Wise et al. [60].

1.2.5 Computational model

The energy landscape of a flexible protein contains many shallow local minima separated by smaller energy barriers than those of more ordered, folded proteins, whose cone-shaped energy landscapes are dominated by a global minimum (Figure 1.3). To understand the range of conformations tropoelastin occupies, it was examined using accelerated ab initio molecular dynamics (discussed in Chapter 2, Methodology). Tropoelastin’s linear polypeptide chain was allowed to fold over lengthy replica exchange molecular dynamics (REMD) simulations, resulting in an ensemble of full-atomistic structures [61]. These structures were subsequently clustered by the similarity of their Cartesian backbone coordinates, giving rise to groups of structures that demonstrated the extent of tropoelastin’s conformational sampling [62].
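Clustering by backbone-coordinate similarity is commonly done by optimally superimposing structures and comparing their root mean square deviation (RMSD). The sketch below is a minimal illustration under stated assumptions, not the procedure used in [62]: it uses the Kabsch algorithm for superposition and a simple greedy "leader" clustering with an arbitrary RMSD cutoff, and the function names, cutoff and random test coordinates are all hypothetical. Production analyses usually rely on dedicated tools, for example `gmx cluster` in GROMACS.

```python
import numpy as np


def kabsch_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body superposition."""
    a = a - a.mean(axis=0)  # remove translation
    b = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)  # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against improper rotations (reflections)
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    a_fit = a @ rotation.T  # rotate every row vector of a onto b
    return float(np.sqrt(np.mean(np.sum((a_fit - b) ** 2, axis=1))))


def leader_cluster(frames: np.ndarray, cutoff: float) -> list[list[int]]:
    """Greedy clustering: each frame joins the first cluster whose representative
    (its first member) lies within `cutoff` RMSD; otherwise it founds a new cluster."""
    clusters: list[list[int]] = []
    for i, frame in enumerate(frames):
        for members in clusters:
            if kabsch_rmsd(frame, frames[members[0]]) <= cutoff:
                members.append(i)
                break
        else:
            clusters.append([i])
    return sorted(clusters, key=len, reverse=True)  # most populated cluster first


rng = np.random.default_rng(0)
reference = rng.normal(scale=5.0, size=(20, 3))  # stand-in for 20 backbone atoms
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
other = rng.normal(scale=5.0, size=(20, 3))  # an unrelated "conformation"
frames = np.stack([reference, reference @ rot_z.T, reference + 0.01,
                   other, other @ rot_z.T + 1.0])
print(leader_cluster(frames, cutoff=1.0))  # → [[0, 1, 2], [3, 4]]
```

Rotated and translated copies of the same coordinates fall into one cluster, because superposition removes rigid-body motion before the RMSD is computed. The greedy scheme was chosen for brevity; its result depends on frame order, which is one reason dedicated clustering algorithms are preferred for real trajectories.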


Figure 1.3: Energy landscape of disordered, flexible and ordered proteins. The top panel depicts the energy landscapes of disordered (left), flexible (middle) and ordered (right) proteins. The minima become deeper with greater order, reflecting the preference for a particular set of structures in flexible ensembles, and the emergence of a favoured conformation in ordered proteins. The bottom panel illustrates the conformational ensembles that result from the three energy landscapes. Disordered proteins (left) exhibit a large variety of conformations, with the amount of conformational sampling decreasing as the energy landscape shifts towards deeper minima. Image from [63].
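The link between the depth of a minimum and conformational preference in Figure 1.3 follows from Boltzmann statistics. The sketch below assumes, as a deliberate simplification, that each minimum can be treated as a single state with a free energy in kcal/mol. The function name and energy values are illustrative and are not taken from any tropoelastin simulation.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def boltzmann_populations(free_energies: np.ndarray, temperature: float = 310.0) -> np.ndarray:
    """Equilibrium population of each state, with p_i proportional to exp(-G_i / k_B T)."""
    beta = 1.0 / (K_B * temperature)
    shifted = free_energies - free_energies.min()  # subtract the minimum for numerical stability
    weights = np.exp(-beta * shifted)
    return weights / weights.sum()


# Illustrative (made-up) minima: three shallow, near-degenerate basins versus
# one deep basin with two higher-lying alternatives.
flexible = boltzmann_populations(np.array([0.0, 0.2, 0.4]))
ordered = boltzmann_populations(np.array([0.0, 3.0, 3.0]))
print(flexible.round(2), ordered.round(2))
```

At 310 K the shallow landscape spreads its population across all three basins, whereas the deep one places roughly 98% of the population in its lowest minimum.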

Remarkably, the representative structure of the most populated structural cluster obtained at 300 K overlapped with the previously obtained SAXS envelope [61], demonstrating the utility of MD in exploring the conformational landscape of flexible proteins such as tropoelastin. Analysis of the biologically relevant 310 K structural ensemble revealed that tropoelastin maintains its overall structure, with the lowest-energy structures having a discernible extended N-terminal coil region and an evident C-terminal foot region (Figure 1.4) [62]. These studies have been pivotal in our understanding of tropoelastin, as they confirm that tropoelastin is a highly flexible protein rather than an intrinsically disordered molecule.


Figure 1.4: Structural ensemble of tropoelastin at 310 K. The dominant structure of full-length tropoelastin (blue) derived via computational modelling is overlaid onto other structures from its ensemble. Displacements of domains deviating from the dominant structure are marked. Image adapted from [62].

The application of both coarse-grained and atomistic modelling has yielded further discoveries regarding the intrinsic molecular motions that contribute to tropoelastin’s functionality [22, 61, 62]. The most notable example is the distinctive scissors-twist motion at the C-terminal “foot” region of the WT molecule, which has been implicated in self-assembly and disease states [22, 61]. Introducing the short, usually absent, domain 22 abolishes the scissors-twist motion and yields a WT+22 molecule with heightened global molecular stiffness [22]. Hydrogels fabricated from the WT+22 mutant displayed markedly different characteristics from WT hydrogels, indicating that the scissors-twist motion and high flexibility are key elements of self-assembly. Likewise, the G685D cutis laxa mutation results in a contracted foot region that is incapable of the scissors-twist motion, which is proposed to produce aberrant elastic fibres that contribute to the rough skin texture phenotype of patients [61].

1.3 Elastogenesis

Elastogenesis is the term that collectively describes the hierarchical process of elastic fibre formation (Figure 1.5). It comprises distinct phases: tropoelastin synthesis, coacervation, cross-linking, and deposition.

Figure 1.5: Overview of elastogenesis. A) Tropoelastin (blue) is secreted as a monomer to the cell surface. B-C) Coacervate spherules consisting of tropoelastin n-mers grow in size whilst tethered to cell surface receptors. D) Spherules are released from cell receptors and deposited onto microfibril scaffolds (orange), and E) are enzymatically cross-linked by LOX/LOXL (cones), eventually giving rise to F) fibres consisting predominantly of elastin. Image from [64].

1.3.1 Tropoelastin synthesis

Elastogenic cells, such as fibroblasts, smooth muscle cells, endothelial cells, airway epithelial cells, keratinocytes, and chondroblasts, synthesise and secrete tropoelastin [65–71]. The majority of tropoelastin synthesis occurs during the perinatal stages of development [72,73]; however, synthesis may also be triggered in response to tissue damage [74] or during diseases such as atherosclerosis [75].

As previously discussed, the ELN pre-mRNA is spliced, and the mature mRNA is translated into a 72 kDa polypeptide chain that includes domain 1, the 26 amino acid signal sequence [76]. Following cleavage of the signal sequence, the 60 kDa tropoelastin is transported to the Golgi, where it folds into its tertiary structure before secretion to the cell surface [77].

1.3.2 Coacervation

Coacervation is an endothermic, entropically favourable process through which tropoelastin monomers self-assemble into higher order n-mer structures (Figure 1.6). Much of our current knowledge has been gleaned from extensive in vitro studies that have used tropoelastin and a variety of its derivatives to explore the intrinsic factors governing this process.

Figure 1.6: Hierarchy of elements that contribute to coacervation. The representative sequences of hydrophobic and cross-linking domains are interspersed throughout the tropoelastin monomer. The properties of the amino acid sequence and the global shape of tropoelastin facilitate initial n-mer aggregation and larger coacervate spherule formation. Eventually, the spherules coalesce into elastin fibres, the mature form of the protein. Image from [78].

The initial stage of in vitro coacervation is characterised by the rapid aggregation of tropoelastin into 1-2 µm spherules, which eventually grow and stabilise into spherules 2-6 µm in diameter [79–82]. In cell culture systems at physiological temperature, tropoelastin spherules assemble at the cell surface before deposition onto the microfibrillar scaffold [80]. Coacervation is temperature dependent, with an optimal temperature of 37 °C [24]. Tropoelastin aggregation is initially reversible, as spherules dissipate if the temperature is lowered [79]; however, maintenance of a physiological temperature results in maturation, indicated by spherule coalescence and the irreversible formation of fibrillar structures [83–85]. Tropoelastin spherules fusing to fibrils have also been observed in native tissue, demonstrating marked similarities between in vitro and in vivo coacervation [80,86–88].
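The temperature dependence described above is consistent with the endothermic but entropically favourable character of coacervation noted earlier. As a minimal illustration using only the Gibbs relation, and assuming for simplicity that the enthalpy and entropy changes are approximately constant over the temperature range of interest, assembly becomes spontaneous only above a threshold temperature:

```latex
\begin{aligned}
\Delta G &= \Delta H - T\,\Delta S, \qquad \Delta H > 0,\ \Delta S > 0,\\
\Delta G < 0 \;&\Longleftrightarrow\; T > T_{\mathrm{c}} \equiv \frac{\Delta H}{\Delta S}.
\end{aligned}
```

Below $T_{\mathrm{c}}$ the free energy change is positive, which fits the dissipation of spherules on cooling. This simple relation accounts only for the onset of coacervation, not for the optimum at 37 °C or the irreversible maturation that follows. $T_{\mathrm{c}}$ is illustrative notation rather than a measured quantity.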

Tropoelastin’s hydrophobic domains are primarily responsible for driving coacervation [89–91]. Non-polar residues are a major contributor to protein folding, as their unfavourable interactions with water drive them to bury within the protein core. However, because tropoelastin comprises numerous hydrophobic domains, many of these domains have been shown to be at least partially solvent exposed [78, 92]. Consequently, at lower temperatures the water surrounding these domains in vitro forms ordered, clathrate-like shells that prevent aggregation until the appropriate temperature is reached [90, 93–95]. In contrast, higher temperatures break the hydrogen bonds of the ordered water, dissipating the clathrate shells and permitting the association of the hydrophobic domains [96]. In vivo, premature self-aggregation is thought to be prevented by chaperone proteins [97,98].

The flexibility of the hydrophobic domains has been implicated in self-assembly. Large-scale computational modelling of 27 ELP chains revealed that the aggregate maintained a hydrated, disordered, liquid-like state due to the formation of short-lived interchain bonds [32]. The disorder of the ELPs is thought to be partially due to the presence of PG repeats, the nature of which has been described earlier in this chapter. This is corroborated by other ELP studies noting that increasing the spacing between PG repeats, or removing proline, results in heightened β-sheet formation and amyloid-like structures [50, 99]. Further evidence for the sequence-dependent regulation of tropoelastin assembly comes from coacervation studies of the long central hydrophobic domains 18, 20, 24 and 26 in isolation, which demonstrated that the resulting supramolecular structures differ between domains [100,101].

Although tropoelastin’s cross-linking domains are not the main drivers of coacervation, they modulate it in a context-dependent manner. For example, tropoelastin isoforms containing domain 26A, a cross-linking domain that is normally spliced out of mature mRNA [102], exhibit diminished coacervation [19]. The inclusion of cross-linking domains in ELPs decreases aggregation time relative to peptides consisting solely of hydrophobic domains, suggesting that the relatively favourable interactions between the cross-linking domains and aqueous solvent are important for regulating coacervation [90]. As previously discussed, tropoelastin’s cross-linking domains undergo secondary structural changes [58,96]; however, their impact on the conformational sampling of the hydrophobic domains during coacervation has not been fully assessed. Given the variety of cross-linking domains within tropoelastin, they may also regulate coacervation in a sequence- and position-dependent manner. It is therefore imperative to move beyond a reductionist approach and examine coacervation using full-length tropoelastin.

1.3.3 Cross-linking

Like other ECM proteins such as collagen, tropoelastin is covalently cross-linked via its lysine residues. Cross-linking requires the modification of at least one of the participating lysines by a member of the copper-containing lysyl oxidase (LOX) or lysyl oxidase-like (LOXL) enzyme families. LOX and LOXL oxidatively deaminate the ε-amino group of lysine, converting the residue to α-aminoadipic acid-δ-semialdehyde (allysine) [103]. The resultant allysine can then react with the ε-amino group of an unmodified lysine via a Schiff base reaction to form lysinonorleucine, or undergo an aldol condensation with another allysine to give rise to allysine aldol [104,105] (Figure 1.7). Both reactions are spontaneous and do not require further enzymatic action. These bifunctional cross-links can undergo further condensation with other lysines and/or allysines to form the tetrafunctional cross-links desmosine and isodesmosine [106].

Figure 1.7: Schematic of cross-linking within elastin. Upon modification of lysine to allysine by LOX, the resultant allysine interacts with either an unmodified lysine or an allysine, forming lysinonorleucine and allysine aldol, respectively. These bifunctional cross-links can eventually condense into more complex species such as desmosine and isodesmosine. Image from [107].

Approximately 90% of tropoelastin’s lysines undergo modification and/or participate in cross-links, indicating that elastin is extensively cross-linked [34, 108]. Mapping the locations of these modifications and cross-links is crucial to understanding the molecular structure of elastin; however, the precise cross-linking sites have been difficult to ascertain, largely due to tropoelastin’s repetitive sequence. Attempts to elucidate these sites have yielded a large number of ambiguously assigned cross-links [109], although the field has seen some successes. The first study to unequivocally identify cross-linking participants used elastin from copper-deficient pigs, which was only partially cross-linked due to abrogated LOX activity and was therefore more susceptible to the proteolytic cleavage required for MS analysis [53]. This study demonstrated a desmosine formed from two lysines from each of domains 19 and 25, as well as lysinonorleucines linking domains 10 and 19, and domains 10 and 25. Subsequent in vitro coacervation studies using yeast-derived LOX and chemical cross-linkers have been useful in identifying the solvent-exposed lysines available for cross-linking [85,110].

A number of recent studies have shed light on the nature of cross-linking in elastin. Almost all lysine residues are partially modified, resulting in residues that can participate in either type of bifunctional cross-link [111]. Furthermore, cross-linking domains bond in a context-dependent manner, whereby KA domains, such as domain 14, tend to form tetrafunctional cross-links, whilst KP domains tend to form bifunctional cross-links [34,103,111]. These preferences most likely arise from restrictions imposed by the local secondary structures of these domains; for example, KP domains are not known to form helices and therefore may not align lysines in a manner that promotes tetrafunctional cross-link formation. A further step towards understanding cross-linking is the discovery of both intermolecular and intramolecular cross-links within mature elastin [34]. Although intramolecular cross-links were originally thought of as structural intermediates required for tetrafunctional cross-links, their persistence in mature elastin suggests that they are of functional importance [34]. Taken together, these observations suggest that elastin is more heterogeneously cross-linked than previously assumed [34, 111]; however, because not all cross-links can be unambiguously assigned owing to similarities in primary sequence, this interpretation requires further confirmation [34,103,111].

1.3.4 Head-to-tail model of elastin assembly

The head-to-tail model of tropoelastin assembly combines the global SAXS structure [59] with the approximate locations of domains 10, 19 and 25, the cross-linked domains identified in porcine elastin [53]. It was presumed that tropoelastin would assemble in a head-to-tail manner similar to that of other fibre-forming proteins, including collagen, actin and lamin, which assemble into fibrils that subsequently associate laterally to form thicker fibres [112–114]. However, since tropoelastin initially forms spherules rather than fibrils during elastogenesis, it was unclear how head-to-tail associations could give rise to the globular structures observed during early coacervation.

Figure 1.8: Head-to-tail assembly of tropoelastin. A) Approximate locations of domains 10, 19 and 25 on the outline of tropoelastin’s SAXS envelope. B) Structure of the cross-link reported in [53] with the domain locations indicated. C) Schematic of repeating head-to-tail assembly of tropoelastin monomers based on the aforementioned domains. Image from [59].

More recently, it has been proposed that tropoelastin associates via a range of interactions during the initial stages of coacervation. Coarse-grained computational modelling of 40 tropoelastin monomers demonstrated head-to-head, tail-to-tail, head-to-tail and lateral interactions over a timeframe of 10 µs [78]. Interestingly, both fibrils and globular clusters of monomers were observed, suggesting a high level of conformational sampling during this phase of coacervation. Importantly, the presence of fibrils suggests that the nanostructures formed during initial assembly contribute to the supramolecular structure of elastin arising from later stages of elastogenesis. However, a drawback of this model is that a single representative structure of tropoelastin was used rather than its entire conformational ensemble, so the simulations may not have captured the full scope of interactions.

1.3.5 Deposition

Microfibrils form the scaffold onto which tropoelastin aggregates are deposited once they leave the cell surface. Microfibrils contain a variety of proteins, of which fibrillin-1 is the most abundant. Tropoelastin interacts with microfibril components including fibrillin-1, fibulin-4 and -5, and other associated molecules such as latent transforming growth factor β [115–118]. In addition to interacting with tropoelastin, fibulin-4 and -5 can also bind LOX and fibrillin-1 and thus have key roles in facilitating elastogenesis [119]. Fibulins are crucial for elastin fibre assembly and fibre directionality, as fibroblasts with fibulin-4 and -5 knockdowns generate poorly formed elastin fibres [120]. This has been further demonstrated in vivo, where fibulin-4−/− mice display aberrant, non-fibrous elastin and a marked lack of desmosine cross-links [117].

1.4 Tropoelastin-cell interactions

Tropoelastin promotes the attachment, spreading and proliferation of multiple cell types, including fibroblasts [121–123], endothelial cells [124, 125], and mesenchymal stem cells [122, 123, 126]. These cellular activities are mediated by the binding of tropoelastin to specific receptors on the cell surface, which triggers a wide range of processes, including wound healing, elastogenesis and maintenance of stemness [126]. Understanding the mechanisms that contribute to these signalling pathways is important for discerning the physiological consequences of tropoelastin-cell binding. Moreover, these interactions and pathways can inform future biomaterial and drug design. The cell surface receptors that tropoelastin has been observed to interact with thus far are elastin-binding protein (EBP), glycosaminoglycans (GAGs), and integrins.

1.4.1 Elastin binding protein

EBP is a 67 kDa inactive splice variant of β-galactosidase [97,127] that recognises tropoelastin’s repetitive hydrophobic sequences [128]. EBP is thought to play two roles in elastogenesis: operating as part of a protein complex that transports tropoelastin from the cell’s interior to its exterior [129], and acting as a chaperone to prevent premature self-aggregation and proteolysis [97, 98]. EBP also participates in signalling via its interactions with peptides derived from elastin degradation [130–132]. The recognition of elastin fragments by EBP facilitates the activation of focal adhesion kinase [131], phosphorylation events via the Ras-Raf pathway [130,131], and the differential regulation of matrix metalloproteinases [132], which are associated with the body’s response to tissue damage. Notably, full-length tropoelastin has not yet been shown to trigger these pathways through binding EBP, suggesting that EBP’s primary signalling role is in wound recognition.

1.4.2 Glycosaminoglycans

GAGs are negatively charged, linear polysaccharides with molecular weights of 10-100 kDa that are categorised as either sulfated (such as heparan sulfate and chondroitin sulfate) or non-sulfated (hyaluronic acid) [133]. GAGs are involved in the coagulation pathway [134, 135] and inflammatory response [135], making them crucial to wound healing and tissue regeneration. Owing to their charge, GAGs predominantly interact with proteins containing amino acid residues that are positively charged at physiological pH [136]. Thus, similar to EBP, GAGs may also act as chaperones in the context of elastogenesis, preventing premature tropoelastin aggregation, cross-linking, and allysine formation through their interactions with unmodified lysine residues [123, 137, 138]. The tropoelastin domains identified thus far to interact with GAGs are domains 17-18 [123] and domain 36 [139]. Domains 17 and 36 contain lysines and, in the case of domain 36, positively charged arginines. The removal of lysines from a peptide spanning domains 17-18 substantially decreases fibroblast adhesion [123], strongly suggesting the involvement of these residues in tropoelastin-GAG interactions. Further evidence for the involvement of GAGs in elastogenesis arises from their propensity to interact with fibrillin-1 and -2 [140, 141]. Combined with their ability to bind tropoelastin, this suggests that GAGs are capable of supporting multiple aspects of elastogenesis.

1.4.3 Integrins

Integrins are a ubiquitous, diverse family of heterodimeric cell receptors. In mammals, integrins are formed from combinations of 18 α and 8 β subunits. Distinct subunit pairings give rise to heterodimers with high ligand specificity and thus mediate a number of signalling pathways in response to ligand binding [142, 143], such as mechanotransduction [144, 145], differentiation [146, 147], angiogenesis [148], wound repair [149], and tumour cell invasion [150].

The most commonly studied binding sequence is the fibronectin-derived Arg-Gly-Asp (RGD) sequence [151], which is recognised by approximately half of the known integrins, including the well-characterised integrin αvβ3 [151]. Crystal structures of αvβ3 bound to either RGD or fibronectin fragments show that ligand binding occurs at the interface between the β-propeller of the α subunit and the βA domain of the β subunit [152, 153]. The mechanism by which RGD causes αvβ3 to undergo conformational changes has been explored through crystallography [153] and computational modelling [154], which have revealed that the reorganisation of a number of α-helices within the βA domain promotes structural changes that shift the receptor from a bent (closed) conformation to an open (active) conformation (Figure 1.9) [154,155]. Propagation of the signal from the extracellular region across the cell membrane allows the recruitment of other proteins, such as focal adhesion kinase and PKB/AKT, to the integrin’s intracellular tail [126].

Figure 1.9: Major integrin conformational changes associated with outside-in signalling. The integrin subunits are bent in the inactive conformation (left). Upon ligand (red) binding at the headpiece interface between the α (blue) and β (green) subunits, the integrin is able to straighten (right). This conformational change is accompanied by the separation of the bodies of the two subunits, which allows intracellular effector proteins (purple) to bind the integrin tail. Image from: https://pdb101.rcsb.org/motm/134.

Thus far, tropoelastin domains 17 and 36 are known to bind αv integrins, specifically αvβ3 and αvβ5 [121–123, 126]. This is significant because tropoelastin does not contain the canonical RGD integrin-binding sequence found in other ECM proteins, such as fibronectin. Identifying the minimal binding sequences involved is therefore necessary to understand the mechanisms of tropoelastin-integrin interactions, and requires further investigation.

1.4.4 Model of tropoelastin-cell interactions

In the current model, cell binding to tropoelastin proceeds via the sequential involvement of GAGs and integrins. GAGs vary in length and can extend far beyond integrins into the extracellular space; as such, they are thought to facilitate initial cell adhesion to tropoelastin [123]. Integrins are then required for subsequent cell spreading and eventual signalling [122,123] through the recruitment of intracellular proteins [126]. It has been postulated that tropoelastin contains further integrin binding sites, owing to the similarity of lysine-containing motifs between domain 17 and other cross-linking domains [123]. Identifying such sites would be essential to understanding the mechanisms of cell binding to both tropoelastin and elastin.

1.5 Elastin diseases

As previously discussed, alterations to tropoelastin’s sequence directly impact its

structure and, subsequently, its function. The majority of elastin diseases are

caused by various mutations within the ELN gene that impair the morphology

and functionality of the elastin fibre [156]. Two diseases that arise from the

altered structure of tropoelastin are autosomal dominant cutis laxa (ADCL) and

supravalvular aortic stenosis (SVAS).

ADCL is characterised by loose, inelastic skin, and may present with other complications such as hernia, or cardiovascular symptoms such as a bicuspid aortic valve. The most common variants of ADCL arise from frameshift mutations toward the 3’ end of the ELN gene [156, 157]. These mutations result in aberrant translation of the affected exons, potentially extending into the 3’ untranslated region, and produce an unusually elongated protein [157]. The resultant tropoelastin monomers display markedly altered self-assembly and deposition, whereby larger ADCL coacervates display impaired binding to fibrillin [157]. This manifests as disorganised or fragmented elastin fibres [158] that result in the elevated tissue compliance [159, 160] seen in the ADCL disease phenotype.

SVAS is a rare congenital cardiovascular disease in which patients are born with a lesion at the sinotubular junction of the aorta [161]. It is further characterised by reduced elastin content and disorganised elastin fibres, as well as an increase in smooth muscle cells and collagen that results in constriction of the aorta [160,162]. Like ADCL, SVAS can arise from multiple alterations to the ELN gene, including missense mutations, premature stop codons, and frameshifts due to base pair insertions and deletions [163–166]. These alterations can produce a truncated form of tropoelastin that is likely to display altered self-assembly and deposition into the ECM [167].

These pathophysiologies highlight the potentially detrimental effects of alterations to the sequence-structure-function hierarchy of elastin formation. However, substantial hurdles remain in fully characterising the disease-associated states of tropoelastin, such as obtaining high-quality structural data, exploring the impact of altered self-assembly, and examining the effect of aberrant protein sequences on downstream signalling pathways. Overcoming these hurdles will lead to a deeper understanding of the mechanisms that contribute to ADCL and SVAS.


1.6 Applications of tropoelastin

Tropoelastin’s unique self-assembly properties and potent interactions with cells make it well suited to biomaterial fabrication for tissue engineering and wound repair. Synthetic tropoelastin can be produced at scale in an Escherichia coli bacterial system utilising human cDNA [168, 169]. This has enabled the incorporation of tropoelastin into a wide variety of biomaterials, on its own or in a material blend, for numerous applications (Figure 1.10).

1.6.1 Tropoelastin-only materials

By itself, tropoelastin forms elastic biomaterials that are therefore appropriate for use in dermal and cardiovascular tissue. Tropoelastin-only biomaterials have the advantage of being simple to fabricate due to the self-assembly properties of tropoelastin. A prime example is HeaTro (“heated tropoelastin”), which is formed by heating freeze-dried tropoelastin to yield a highly porous scaffold that softens on implantation [170]. Tropoelastin may also be chemically modified, as in the case of MeTro (“methacrylated tropoelastin”), in which methacrylated lysine residues allow rapid light-mediated cross-linking, resulting in an elastic material suitable for use as a surgical sealant [171,172].

1.6.2 Blended biomaterials

An advantage of hybrid materials is that the ratio of proteins and/or other materials can be optimised to suit the requirements of specific tissues. This has enabled the fabrication of a number of novel biomaterials with easily tunable biological, physical and mechanical properties.

The ability to resist high pressure is important for cardiovascular and cartilage tissue engineering. The strength of tropoelastin scaffolds can be increased by coating with polycaprolactone (PCL), where the increase in strength is proportional to the thickness of the coating [173]. Porous tropoelastin-PCL blends support chondrocyte adhesion and proliferation, showing promise for cartilage repair [174].

Figure 1.10: Tropoelastin-based biomaterials. A-B) The variety of scaffolds that can be fabricated from HeaTro [170]. C) Cross-section showing the interface between MeTro sealant (pink) and porcine lung (purple) [172]. Comparison of fibre sizes of electrospun D) pure tropoelastin and E) 80:20 tropoelastin:collagen [175]. E) Optical clarity of tropoelastin-silk corneal replacement films [176]. F-G) Structure of electrospun tropoelastin-silk meshes for pelvic organ prolapse [177].

The incorporation of tropoelastin into Integra Dermal Regeneration Template, a currently available bioactive scaffold, increases blood vessel formation after full-thickness implantation in a porcine model [178]. The addition of soluble tropoelastin to cell cultures containing Integra scaffolds enhances elastin fibre formation, demonstrating that tropoelastin does not necessarily need to be blended with a biomaterial to have an effect [179].

Tropoelastin-silk biomaterials have been extensively explored in numerous applications. These hybrids promote mesenchymal stem cell proliferation [180], influence progenitor cell lineage [181], and have applications in nerve repair and guidance [182]. Their optical clarity, refractive index, glucose permeability, and interactions with corneal cells have made tropoelastin-silk films favourable candidates for corneal replacement [176]. Woven electrospun tropoelastin-silk blends have demonstrated utility in treating pelvic organ prolapse due to their robust mechanical properties [177].

1.6.3 Surface coatings

In addition to its direct incorporation into biomaterials, tropoelastin can be used as a surface coating to enhance the biocompatibility of existing biomedical devices or materials. Plasma immersion ion implantation (PIII) allows the covalent attachment of molecules, such as tropoelastin, to polymers and metals. Tropoelastin on PLLA-PLGA scaffolds enhances cell adhesion and proliferation and promotes angiogenesis [183], and may even help resist thrombosis when applied to metals [184], demonstrating its potential for use in implantable cardiovascular devices. Tropoelastin has also shown utility in orthopaedic implants, where functionalisation of polyether ether ketone surfaces with tropoelastin increased the expression of bone markers by human osteoblast-like cells [185].


1.7 Thesis aims

This thesis leverages the recent fully atomistic model of tropoelastin [61] to explore the structure and dynamics of the monomer. It comprises three parts, each exploring a distinct facet of tropoelastin modifications and interactions.

1. Consequences of allysine modifications on a local and global scale

I assess how allysine modifications, which are essential to cross-linking, contribute to the dynamics and structural changes that occur in tropoelastin in the context of elastin assembly. I use replica exchange molecular dynamics to generate structural ensembles of allysine-containing tropoelastin. I conduct principal component analysis on these ensembles and find that the molecule departs from the canonical structural ensemble. Furthermore, I show that, while the canonical scissors-twist is retained, new movements emerge that deviate from those of the wild-type protein, providing evidence for the involvement of a variety of molecular motions in elastin assembly. Additionally, I highlight secondary structural changes and link these perturbations to the longevity of specific salt bridges. I propose a model in which allysines in tropoelastin contribute to hierarchical elastin assembly through global and local perturbations to molecular structure and dynamics.
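Principal component analysis of an MD ensemble is commonly performed on the Cartesian coordinates of superimposed frames, so that the leading components describe the largest collective motions. The Python sketch below shows only the core computation with NumPy. It assumes the frames have already been aligned to a common reference, removing overall translation and rotation, and the function name `trajectory_pca` is hypothetical rather than the analysis code used in this thesis.

```python
import numpy as np


def trajectory_pca(coords, n_components=2):
    """Principal component analysis of an MD trajectory (illustrative sketch).

    coords: array of shape (n_frames, n_atoms, 3), assumed to be already
    superimposed onto a common reference structure.
    Returns (projections, components, explained_variance_ratio).
    """
    n_frames = coords.shape[0]
    X = coords.reshape(n_frames, -1)   # flatten each frame to a 3N-dimensional vector
    X = X - X.mean(axis=0)             # centre on the mean structure
    # SVD of the centred data: rows of vt are eigenvectors of the covariance matrix
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    variance = s**2 / (n_frames - 1)   # eigenvalues of the covariance matrix
    ratio = variance / variance.sum()  # fraction of total variance per component
    components = vt[:n_components]
    projections = X @ components.T     # position of each frame along each component
    return projections, components, ratio[:n_components]
```

Each row of `components` is a per-atom displacement direction, and plotting `projections` for the first two components gives a low-dimensional view in which collective motions, such as a scissors-twist, may be visualised and compared between ensembles.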

2. Factors that influence the initial association of tropoelastin molecules

Using the three tropoelastin structures that predominantly constitute tropoelastin’s canonical structural ensemble, I model early-stage nucleation events in the context of head-to-tail assembly. I use an assortment of dimers generated through docking guided by experimentally determined sites of interaction. I dissect the interactions within each dimer according to its overall mode of association to determine the propensity of particular structures to form head-to-tail associations based on their global structure and domain placement. I then conduct elastic net regularised logistic regression to examine the factors that are important for generating head-to-tail associations. I find that the domains involved are predominantly responsible for driving the type of interaction, confirming our previous results.
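Elastic net regularisation adds a weighted combination of L1 and L2 penalties to the logistic log-loss. The L1 part can shrink uninformative predictors to exactly zero, while the L2 part keeps correlated predictors together. The NumPy sketch below fits such a model by proximal gradient descent. The function name, penalty strength `lam`, mixing parameter `alpha`, learning rate and iteration count are all illustrative assumptions, not the settings or features used in this thesis. In practice, an established implementation such as scikit-learn's `LogisticRegression` with its elastic net option (`l1_ratio`), with hyperparameters chosen by cross-validation, would normally be preferred.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_elastic_net_logistic(X, y, lam=0.05, alpha=0.9, lr=0.1, n_iter=5000):
    """Logistic regression with an elastic net penalty (illustrative sketch).

    Minimises mean log-loss + lam * (alpha * ||w||_1 + 0.5 * (1 - alpha) * ||w||_2^2)
    by proximal gradient descent. X: (n_samples, n_features), ideally standardised;
    y: array of 0/1 labels (e.g. 1 = head-to-tail association). The intercept is not penalised.
    """
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        prob = sigmoid(X @ w + b)
        # gradient of the smooth part: log-loss plus the L2 penalty
        grad_w = X.T @ (prob - y) / n + lam * (1 - alpha) * w
        grad_b = np.mean(prob - y)
        w = w - lr * grad_w
        b = b - lr * grad_b
        # proximal step for the L1 penalty: soft-thresholding sets weak coefficients to zero
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam * alpha, 0.0)
    return w, b
```

Because the L1 term is handled by soft-thresholding, coefficients of predictors that do not help separate the classes can become exactly zero, which makes the fitted weights interpretable as a ranking of influential factors. Features should be standardised beforehand so that the penalty treats them comparably.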

3. Fuzzy binding mechanisms of tropoelastin and αvβ3

I construct a molecular model of the interactions between tropoelastin and integrin αvβ3 using molecular dynamics. Using two different candidate conformations of tropoelastin, I create docked protein-protein structures as input for replica exchange molecular dynamics simulations, through which I generate two independent ensembles of tropoelastin-integrin structures. I show that one ensemble contains more of the αvβ3 conformational changes associated with outside-in cell signalling than the other. Importantly, I find that these conformational changes occur more frequently when tropoelastin binds the integrin’s α1 helix rather than the upstream canonical binding site, the β1-α1 loop. By dissecting the frequency of contact between the two proteins, I demonstrate that a broad variety of tropoelastin domains interact with αvβ3. In particular, I confirm the binding of domains 17 and 36, previously explored in the context of cell attachment to αvβ3. Furthermore, I demonstrate that a number of domains not previously associated with cellular interactions, including domain 20, also contact αvβ3. Additionally, I use principal component analysis to identify the molecular motions of tropoelastin that contribute to integrin binding, and find that these motions differ greatly between the two ensembles. I propose a model of fuzzy binding, whereby multiple tropoelastin conformations are capable of interacting with numerous αvβ3 sites, including both the canonical ligand binding site and other non-canonical regions.
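In temperature replica exchange molecular dynamics, copies of the system run in parallel at different temperatures and periodically attempt to swap configurations. A swap between replicas i and j is accepted with the Metropolis probability min{1, exp[(β_i − β_j)(E_i − E_j)]}, where β = 1/(k_B T) and E is the potential energy. The Python sketch below implements this criterion together with a geometric temperature ladder, a common heuristic for spacing replicas. The function names, the kcal/mol energy units and the ladder scheme are assumptions for illustration, not the parameters used in this thesis. Production simulations would typically use the replica exchange implementation provided by the MD package.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); energies are assumed to be in kcal/mol


def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures spaced geometrically between t_min and t_max (inclusive)."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio**k for k in range(n_replicas)]


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for swapping the configurations of replicas i and j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_exchange(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the proposed swap is accepted."""
    return rng.random() < exchange_probability(energy_i, temp_i, energy_j, temp_j)
```

When the colder replica holds the higher-energy configuration, the swap is always accepted. This lets configurations trapped at low temperature pass to higher temperatures, where energy barriers are crossed more readily, which is what broadens conformational sampling.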


Chapter 2

Materials and Methodology


2.1 Computational multiscale modelling

Multiscale modelling refers to the mathematical and statistical description of sys-

tems, including proteins, across various levels of detail. Broadly speaking, mul-

tiscale modelling can be categorised into bottom-up, where quantum mechanics

and full atomistic modelling are used to infer the properties of large systems, and

top-down, which involves mesoscale and continuum modelling methodologies to

describe smaller scale properties. Quantum mechanics is the most granular level

at which a system can be described as it explicitly details the movement of the

subatomic particles within a system. Due to its extraordinary computational cost,

it cannot be applied to systems on the scale of proteins, in which case full-atomistic

or coarse-grained atomistic simulations become the method of choice. Similarly,

full atomistic simulations are too expensive to be applied to systems in which the bulk properties of a material need to be observed, and mesoscale and continuum modelling methods are more appropriate in these cases. The choice of model therefore depends largely on the scale of the system and the level of detail required for analysis, as increases in system size and detail come at the cost of greater computational resources and time.

2.2 Molecular dynamics

As the name suggests, molecular dynamics (MD) is a bottom-up methodology

concerned with describing the motion of molecules. Since its conception in the

1940-50s, MD has emerged as a valuable technique to complement and expand

on experimental data. The advantage of MD over traditional structural biological methods is that it can provide detailed insight into molecular motions and mechanisms that traditional experimental methods, such as X-ray crystallography and cryo-electron microscopy, do not capture. A further benefit of MD is that the system can be set up to suit the needs of the experiment, allowing thermodynamic behaviour to be explored under conditions that are difficult or impossible to achieve in the laboratory but are still worth investigating.

Multiple software packages are available to conduct MD, including NAMD [186],

GROMACS [187], CHARMM [188] and AMBER [189]. These programs determine

the motion of each atom within the molecular system over a series of timesteps.

As the MD program of choice for this thesis is NAMD, particular methods of

integration and algorithms will be described within its context.

2.2.1 Molecular dynamics workflow

Prior to commencing a simulation, parameters such as atomic masses, atomic

coordinates and interaction potentials are first defined. These parameters are

fixed and do not change over time in classical MD simulations and, as such, the

formation and breaking of bonds are currently out of the scope of MD (Figure

2.2).

At the beginning of a simulation, each atom is assigned an initial velocity drawn at random from the Maxwell-Boltzmann distribution at the target temperature, and the initial coordinates of the protein are provided by structural files. To begin MD, the intra- and interatomic forces are calculated from the topology and parameter files of empirically derived force fields (discussed in 2.2.3). The motion of each atom is then calculated via Newton's second law of motion, system-wide quantities (such as temperature, pressure and energy) and atomic details (such as velocities and positions) are output, and the configuration is saved. The saved configuration serves as the input for the next iteration of calculations. In this manner, successive time steps and their configurations build up a trajectory of molecular motion over time (Figure 2.1). This is termed classical MD (cMD).
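As a minimal sketch of the velocity-initialisation step, assuming SI units and a hypothetical helper named `maxwell_boltzmann_velocities` (this is not NAMD's internal code): sampling from the Maxwell-Boltzmann distribution amounts to drawing each Cartesian velocity component of an atom of mass m from a normal distribution with zero mean and variance k_B T/m. The instantaneous temperature recovered from the kinetic energy should then lie close to the target.

```python
import math
import random

K_B = 1.380649e-23          # Boltzmann constant, J/K
DALTON = 1.66053906660e-27  # atomic mass unit, kg


def maxwell_boltzmann_velocities(masses, temperature, seed=None):
    """Draw (vx, vy, vz) for each atom from the Maxwell-Boltzmann distribution.

    masses are per-atom masses in kg and temperature is in K. Each velocity
    component is Gaussian with mean 0 and standard deviation sqrt(k_B T / m).
    """
    rng = random.Random(seed)
    velocities = []
    for m in masses:
        sigma = math.sqrt(K_B * temperature / m)
        velocities.append(tuple(rng.gauss(0.0, sigma) for _ in range(3)))
    return velocities


def kinetic_temperature(masses, velocities):
    """Instantaneous temperature from equipartition: KE = (3/2) N k_B T."""
    kinetic = sum(0.5 * m * (vx * vx + vy * vy + vz * vz)
                  for m, (vx, vy, vz) in zip(masses, velocities))
    return 2.0 * kinetic / (3.0 * len(masses) * K_B)


# Illustrative system: 20,000 carbon-like atoms (12 Da) initialised at 310 K.
masses = [12.0 * DALTON] * 20000
v = maxwell_boltzmann_velocities(masses, 310.0, seed=1)
print(f"{kinetic_temperature(masses, v):.1f} K")  # prints a value near 310 K
```

Production codes typically also remove centre-of-mass motion after this step, which the sketch omits.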


Figure 2.1: Schematic of molecular dynamics simulation workflow. Items in red indicate input for molecular dynamics software, and items in blue indicate the steps carried out by the software itself.

2.2.2 Modelling atomic movement

As mentioned earlier, the most accurate approach to describing molecular motion is through quantum mechanics (QM). QM methods are based on the time-independent Schrödinger equation, which is powerful because it accurately describes the state of both nuclei and electrons [190]. However, the major disadvantage of QM is that the sheer computational power required to solve its equations for all subatomic particles within a system renders it unsuitable for systems larger than tens of atoms, such as proteins [191]. Other methodologies are therefore necessary to model protein motion on a meaningful time scale.

As the Schrödinger equation does not assign particles simultaneously well-defined positions and velocities, full-atomistic protein modelling is instead carried out using the Born-Oppenheimer approximation together with a classical treatment of the nuclei. The Born-Oppenheimer approximation exploits the large difference in mass between nuclei and electrons: because electrons move much faster than nuclei, the nuclear and electronic components of the energy can be solved for separately [192]. This allows the application of Newton's second law of motion to each atom i, of mass m_i and position x_i, as


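$$ m_i \ddot{\vec{x}}_i = -\frac{\partial}{\partial \vec{x}_i} E_{\text{total}}(\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n), \quad i = 1, \ldots, n \tag{2.1} $$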

where the potential energy E_total depends on the positions of all n atoms. NAMD integrates this equation numerically using the velocity Verlet method [186,193].
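To illustrate how the velocity Verlet method advances eqn. 2.1, the sketch below integrates a single one-dimensional harmonic "bond". The force, units and parameter values are illustrative assumptions, and NAMD's production integrator adds machinery (such as multiple timestepping) that is not shown here. The property worth noting is that the total energy stays close to its initial value over thousands of steps.

```python
def velocity_verlet(x, v, mass, force, dt, n_steps):
    """Integrate m x'' = F(x) with the velocity Verlet scheme.

    Each step applies a half-step velocity update, a full-step position
    update, recomputes the force at the new position, and then completes
    the velocity update. Returns the (x, v) pairs after every step.
    """
    f = force(x)
    trajectory = []
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity
        x = x + dt * v_half                # full-step position
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


# Hypothetical harmonic bond with energy k (x - x0)^2, the same form as the
# k_b (b - b_0)^2 bond term of the CHARMM energy function.
k, x0, mass = 1.0, 1.0, 1.0


def bond_force(x):
    return -2.0 * k * (x - x0)


def total_energy(x, v):
    return 0.5 * mass * v * v + k * (x - x0) ** 2


traj = velocity_verlet(x=1.2, v=0.0, mass=mass, force=bond_force,
                       dt=0.01, n_steps=5000)
e0 = total_energy(1.2, 0.0)
drift = max(abs(total_energy(x, v) - e0) for x, v in traj) / e0
print(f"max relative energy error: {drift:.2e}")
```

Rerunning with a much larger `dt` shows the energy error growing, which is the instability that motivates the timestep limits discussed below.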

The major advantage of using classical Newtonian mechanics is that the reduced computational cost extends the time scales on which protein models can be investigated, thus increasing the depth of conformational sampling. As an example, cMD can achieve simulations of up to microseconds for systems containing hundreds of thousands of atoms, whereas QM achieves picosecond simulation times for tens of atoms [191]. The trade-off of using classical rather than quantum mechanics is a loss in accuracy; however, full atomistic classical models have been optimised such that they yield results sufficiently accurate to mirror experimental work [186].

As MD programs iteratively calculate the positions and velocities of atoms within a system over a series of timesteps, the timestep must be small enough that the fastest motions are treated appropriately without destabilising the simulation [186]. In full-atomistic MD, the fastest motions are the vibrations of bonds involving hydrogen atoms. The recommended timestep for full-atomistic MD with unconstrained bonds to hydrogen is up to 1 fs, as this ensures that forces and velocities are calculated frequently enough for the simulation to remain stable. In this thesis, wherever the timestep exceeds 1 fs, bonds involving hydrogen are constrained using the SHAKE algorithm [194].
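The iterative position correction at the heart of SHAKE can be sketched as follows, assuming that the constraints were satisfied at the previous step and that an integrator has already produced the unconstrained positions. The function name, tolerance and the water-like geometry (0.9572 Å O-H bonds, 104.52° angle) are illustrative choices. A full implementation would also correct velocities, which this sketch omits.

```python
import math


def shake(old_pos, new_pos, masses, constraints, tol=1e-10, max_iter=500):
    """Correct unconstrained positions so that each bond (i, j, d) has length d.

    old_pos: positions from the previous step (constraints satisfied).
    new_pos: unconstrained positions after the integration step.
    Each correction is applied along the old bond vector and weighted by the
    inverse masses, as in SHAKE; constraints are cycled until all converge.
    """
    pos = [list(p) for p in new_pos]
    for _ in range(max_iter):
        converged = True
        for i, j, d in constraints:
            s = [pos[i][k] - pos[j][k] for k in range(3)]          # current bond vector
            r_old = [old_pos[i][k] - old_pos[j][k] for k in range(3)]
            diff = sum(c * c for c in s) - d * d                   # constraint violation
            if abs(diff) > tol * d * d:
                converged = False
                inv_mi, inv_mj = 1.0 / masses[i], 1.0 / masses[j]
                dot = sum(a * b for a, b in zip(s, r_old))
                g = diff / (2.0 * (inv_mi + inv_mj) * dot)         # multiplier step
                for k in range(3):
                    pos[i][k] -= g * inv_mi * r_old[k]
                    pos[j][k] += g * inv_mj * r_old[k]
        if converged:
            return pos
    raise RuntimeError("SHAKE did not converge")


theta = math.radians(104.52)
old = [(0.0, 0.0, 0.0), (0.9572, 0.0, 0.0),
       (0.9572 * math.cos(theta), 0.9572 * math.sin(theta), 0.0)]
# Pretend an unconstrained integration step distorted both O-H bonds.
new = [(0.01, 0.0, 0.0), (0.99, 0.02, 0.0),
       (0.9572 * math.cos(theta) - 0.03, 0.9572 * math.sin(theta) + 0.04, 0.01)]
masses = [15.999, 1.008, 1.008]
bonds = [(0, 1, 0.9572), (0, 2, 0.9572)]   # constrain only bonds to hydrogen
fixed = shake(old, new, masses, bonds)
print([round(math.dist(fixed[i], fixed[j]), 6) for i, j, _ in bonds])  # [0.9572, 0.9572]
```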

2.2.3 Force fields

Whilst MD programs carry out the overall simulation of biomolecules, they do not themselves define the forces that govern the molecules. Force fields are sets of empirical potentials derived from a combination of experimental data and quantum mechanical calculations, neatly tying quantum mechanics to full atomistic modelling [195]. Force fields provide MD programs with the physical and chemical properties of a molecule, including bond lengths, angles and torsions, as well as non-bonded interactions. Force fields exist for a number of biomolecular systems, including proteins, carbohydrates and lipids. The force fields most commonly used for simulating proteins include CHARMM [196], GROMOS [197], OPLS [198] and AMBER [199]. This thesis primarily utilises CHARMM, for consistency with its prior application in tropoelastin-based studies [61,62].

The total potential energy E_total of a molecular system can be captured by the following expression

$$ E_{\text{total}} = E_{\text{bonded}} + E_{\text{non-bonded}} \tag{2.2} $$

where E_bonded describes all bonded interactions and E_non-bonded describes the non-bonded interactions between atoms. Each component of eqn. 2.2 can be expanded further [196]. The bonded term

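$$ E_{\text{bonded}} = \sum_{\text{bonds}} k_b (b - b_0)^2 + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2 + \sum_{\text{torsions}} k_\phi \left[ \cos(n\phi + \delta) + 1 \right] + \sum_{\text{impropers}} k_\psi (\psi - \psi_0)^2 \tag{2.3} $$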

considers the interactions depicted in Figure 2.2. The bond stretch term applies the bond force constant k_b to the square of b − b_0, the deviation of the bond length from its equilibrium value [196]. Similarly, each angle term is the product of the angle force constant, k_θ, and the square of the angle's deviation from its equilibrium value, θ − θ_0. The dihedral term is given by the dihedral force constant k_φ, the multiplicity of the function n, the dihedral angle φ, and the phase shift δ. The improper term is the product of the improper force constant k_ψ and the square of the deviation of the out-of-plane angle, ψ − ψ_0.
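Eqn. 2.3 translates directly into code. The sketch below evaluates each bonded term from lists of internal coordinates and their parameters; the tuple layout, function name and example values are hypothetical and are not taken from the CHARMM parameter files.

```python
import math


def bonded_energy(bonds, angles, torsions, impropers):
    """Evaluate the bonded energy of eqn. 2.3 term by term.

    bonds:     (k_b, b, b0)               -> k_b (b - b0)^2
    angles:    (k_theta, theta, theta0)   -> k_theta (theta - theta0)^2
    torsions:  (k_phi, n, phi, delta)     -> k_phi [cos(n phi + delta) + 1]
    impropers: (k_psi, psi, psi0)         -> k_psi (psi - psi0)^2
    All angles are in radians.
    """
    e_bond = sum(kb * (b - b0) ** 2 for kb, b, b0 in bonds)
    e_angle = sum(kt * (t - t0) ** 2 for kt, t, t0 in angles)
    e_torsion = sum(kp * (math.cos(n * phi + delta) + 1.0)
                    for kp, n, phi, delta in torsions)
    e_improper = sum(kpsi * (psi - psi0) ** 2 for kpsi, psi, psi0 in impropers)
    return e_bond + e_angle + e_torsion + e_improper


# Illustrative values only (not real CHARMM parameters).
e = bonded_energy(
    bonds=[(300.0, 1.55, 1.53)],                                # stretched by 0.02
    angles=[(50.0, math.radians(112.0), math.radians(110.0))],  # bent by 2 degrees
    torsions=[(0.2, 3, math.pi, 0.0)],                          # cos(3 pi) + 1 = 0
    impropers=[(100.0, 0.0, 0.0)],                              # at equilibrium
)
print(e)
```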


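The second part of eqn. 2.2 describes the dominant non-bonded forces that govern the behaviour of two non-bonded atoms, i and j, where i ≠ j, as

$$ E_{\text{non-bonded}} = \sum_{\text{LJ},\, i \neq j} \varepsilon_{ij} \left[ \left( \frac{R_{\min,ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{R_{\min,ij}}{r_{ij}} \right)^{6} \right] + \sum_{\text{Coulomb}} \frac{q_i q_j}{\varepsilon_1 r_{ij}} \tag{2.4} $$

which consists of the Lennard-Jones (LJ) approximation of van der Waals interactions and Coulomb's law, which describes attraction and repulsion as a function of charge and interatomic distance. Here, ε_ij is the depth of the LJ potential well, r_ij is the distance between atoms i and j, q_i and q_j are their partial charges, and ε_1 is the effective dielectric constant. R_min,ij is the distance at which the LJ potential reaches its minimum value of −ε_ij; the potential is zero at R_min,ij/2^{1/6} [196].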
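A minimal sketch of the non-bonded energy of eqn. 2.4 for a single atom pair, which also shows numerically where R_min sits on the LJ curve: the LJ term equals −ε_ij at r_ij = R_min,ij and crosses zero at R_min,ij/2^{1/6}. Energies are assumed to be in kcal/mol, distances in Å and charges in units of e. The Coulomb conversion factor of about 332.06 kcal·Å/(mol·e²) is approximate (MD codes differ slightly in the last digits), ε_1 = 1 assumes a vacuum-like dielectric, and the well depth and R_min values are hypothetical. Cutoffs, switching functions and Ewald summation are omitted.

```python
COULOMB = 332.0636  # kcal*Angstrom/(mol*e^2); approximate, codes differ slightly


def lj_energy(r, eps_ij, rmin_ij):
    """CHARMM-style Lennard-Jones term: eps_ij [(Rmin/r)^12 - 2 (Rmin/r)^6]."""
    x6 = (rmin_ij / r) ** 6
    return eps_ij * (x6 * x6 - 2.0 * x6)


def coulomb_energy(r, q_i, q_j, eps_1=1.0):
    """Coulomb term q_i q_j / (eps_1 r), converted to kcal/mol."""
    return COULOMB * q_i * q_j / (eps_1 * r)


def pair_energy(r, eps_ij, rmin_ij, q_i, q_j, eps_1=1.0):
    """Total non-bonded energy of one atom pair (no cutoff or switching)."""
    return lj_energy(r, eps_ij, rmin_ij) + coulomb_energy(r, q_i, q_j, eps_1)


eps, rmin = 0.11, 4.0                               # hypothetical parameters
print(lj_energy(rmin, eps, rmin))                   # -0.11: well depth at Rmin
print(lj_energy(rmin / 2 ** (1 / 6), eps, rmin))    # approximately 0: zero crossing
print(pair_energy(3.5, eps, rmin, 0.4, -0.4))       # LJ plus attractive Coulomb
```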


Figure 2.2: Schematic of intra- and intermolecular interactions described by the CHARMM force field. The intramolecular interactions include bonds, angles, dihedral torsions and improper dihedral torsions between the given atoms. Non-bonded interactions include van der Waals and Coulomb forces.


2.2.4 Solvent models

Most proteins carry out their functions in solvated environments, although the nature of this environment varies. For example, ion channel proteins are partially embedded within the hydrophobic lipid membranes of cells, with only certain domains exposed to solvent [200]. Meanwhile, other proteins exist primarily within the cytoplasm or organelles, carrying out their roles in aqueous environments [201]. The choice of solvent in MD is therefore an important consideration, as an inappropriate solvent may induce unintended structural and functional changes within the molecule that do not reflect its native motions [202].

The two main categories of solvent models used to simulate the physiological environment of biomolecules are implicit and explicit solvents. Implicit solvents treat the solvent as a continuum or bulk material rather than modelling individual solvent molecules. The Generalized Born (GB) implicit solvent model is an analytical approximation to the Poisson-Boltzmann equation, which describes the electrostatic potential of the solvent and its effect on the solute molecules [203]. As the Poisson-Boltzmann equation is computationally expensive to solve, GB solvent is often used as a suitable alternative. The GB equation can be expressed as

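$$ \Delta G_{\text{elec}} = -\frac{1}{2} \left( 1 - \frac{1}{\varepsilon_w} \right) \sum_{i,j=1}^{N} \frac{q_i q_j}{\sqrt{d_{ij}^2 + R_i R_j \exp\left( -\frac{d_{ij}^2}{4 R_i R_j} \right)}} \tag{2.5} $$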

where the electrostatic solvation free energy of the system, ΔG_elec, depends on the dielectric constant of water, ε_w, the partial charges q_i and q_j of atoms i and j, the distance d_ij between the atoms, and the Born radii R_i and R_j of the atoms [204].
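Eqn. 2.5 can be evaluated directly once the Born radii are known. The sketch below assumes the radii are supplied (computing them is the harder part of a GB calculation and is omitted), works in reduced units in which the Coulomb constant is 1, and uses a hypothetical function name and a default ε_w of 78.5, roughly the dielectric constant of water at room temperature. Because the double sum runs over all i and j, including i = j, a single ion recovers the Born self-energy −½(1 − 1/ε_w)q²/R.

```python
import math


def gb_energy(charges, coords, born_radii, eps_w=78.5, coulomb_const=1.0):
    """Generalized Born electrostatic solvation energy following eqn. 2.5.

    f_GB = sqrt(d_ij^2 + R_i R_j exp(-d_ij^2 / (4 R_i R_j)))
    dG   = -1/2 (1 - 1/eps_w) * sum_{i,j} q_i q_j / f_GB
    The sum includes i == j, giving each atom's Born self-energy.
    """
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            rr = born_radii[i] * born_radii[j]
            f_gb = math.sqrt(d2 + rr * math.exp(-d2 / (4.0 * rr)))
            total += charges[i] * charges[j] / f_gb
    return -0.5 * (1.0 - 1.0 / eps_w) * coulomb_const * total


# A single unit charge with a Born radius of 2 (reduced units).
print(gb_energy([1.0], [(0.0, 0.0, 0.0)], [2.0], eps_w=80.0))
# An ion pair: the cross terms depend on both distance and Born radii.
print(gb_energy([1.0, -1.0], [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [1.5, 1.5]))
```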

GB models calculate the electrostatic energy from the properties of the solute, treating the solvent as a medium of constant dielectric [205]. Each solute atom is modelled as a sphere whose internal dielectric differs from that of the surrounding solvent. The radii of these spheres, termed Born radii, represent the extent of exposure and interaction between the solute and solvent: a large Born radius indicates that an atom is buried and interacts little with the solvent, whilst a small radius indicates high exposure to solvent. The significance of the Born radius is that atoms with smaller radii undergo stronger short-range electrostatic screening, or in other words, more electrostatic dampening [204]. GB solvent calculations are implemented in two steps: 1) calculating all Born radii of the solute and then 2) calculating the electrostatics between solute and solvent.

Results arising from implicit solvent simulations should be treated with care, as the full effects of water, such as viscosity and certain short-range effects, may not be accurately captured. In some cases, implicit solvent has been found to yield structural and thermodynamic properties markedly different from those obtained in explicit solvent [206–208]; these cases also indicate the need to assess protein-solvent models on a case-by-case basis. Since the majority of biomolecules are in contact with aqueous solvent, it is imperative that at least part of any MD protocol is conducted in explicit solvent to improve structural accuracy.

In contrast to implicit solvent, explicit solvent accounts for each solvent molecule within a system, making simulations less computationally efficient but more accurate. Rigid water models such as TIP3P [209,210] and TIP4P [211] (where the number within the model name indicates the number of interaction sites in each water molecule) are commonly used, as they are among the least computationally expensive explicit solvent models. Furthermore, their use has resulted in simulations that reproduce their biological counterparts more accurately than implicit solvent [61]. Thus, this thesis exploits the speed of GB solvent to bring large systems close to equilibrium, and uses explicit solvent for structural refinement.

Replacing a continuum with explicit solvent requires periodic boundary conditions, which approximate an infinite system and avoid hard-edge effects. Hard-edge effects are events in which particles bounce off the inner surface of the box containing the protein-water system and thus affect the trajectories of other atoms within the system [212]. Under periodic boundary conditions, the original solvent box is treated as a unit cell within a system of identical repeating cells. Therefore, if a particle, solvent or otherwise, exits the box on one side, it re-enters on the opposite side, preserving the total mass of the system and the velocity of atoms that cross these boundaries (Figure 2.3). A complication is that if the periodic box is too small, the molecule of interest may come into contact with its own periodic image in a manner that would not normally occur. The box must therefore be considerably larger than the molecule itself, which increases the computational requirements because of the extra water molecules in the system.
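The wrap-around rule, together with its usual companion for measuring distances across boundaries (the minimum image convention), can be sketched for an orthorhombic box as follows. The box dimensions and function names are hypothetical, and MD codes also support non-orthogonal cells, which this sketch does not handle.

```python
def wrap(position, box):
    """Map a coordinate back into the primary cell [0, L) along each axis."""
    return tuple(x % length for x, length in zip(position, box))


def minimum_image_vector(a, b, box):
    """Displacement from a to the nearest periodic image of b."""
    d = []
    for xa, xb, length in zip(a, b, box):
        dx = xb - xa
        dx -= length * round(dx / length)   # shift by whole box lengths
        d.append(dx)
    return tuple(d)


def minimum_image_distance(a, b, box):
    return sum(c * c for c in minimum_image_vector(a, b, box)) ** 0.5


box = (50.0, 50.0, 50.0)                      # hypothetical cubic box (Angstrom)
print(wrap((51.5, -2.0, 25.0), box))          # (1.5, 48.0, 25.0)
print(minimum_image_distance((1.0, 1.0, 1.0), (49.0, 1.0, 1.0), box))  # 2.0
```

The second example shows why a small box is a problem: two atoms 48 Å apart inside the box are only 2 Å apart across the boundary.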

Figure 2.3: Representation of boundaries for simulating molecular systems. The simulation within the central cell is modelled such that if a particle exits the boundary (blue arrow out) it reappears on the other side (blue arrow in). Image from: http://isaacs.sourceforge.net/phys/pbc.html.

2.3 Replica exchange molecular dynamics

A potential pitfall of cMD is inadequate conformational sampling. Molecular dynamics simulations are ergodic in theory; in practice, however, ergodicity is difficult to achieve unless a simulation is run for an extremely long time. When resources and time are limited, molecules commonly become trapped in local minima of their complex conformational landscapes [213]. Thus, a simulation might sample many microstates within a particular conformational basin but fail to sample other biologically relevant conformations. Crossing the energy barriers that separate conformational basins using cMD requires substantial increases in time and resources, especially for large proteins, sometimes making it infeasible to explore the conformational landscape of a molecule in this manner.

Figure 2.4: Schematic of replica exchange molecular dynamics. Four discrete replicas at different temperatures undergo m steps of MD before an exchange is attempted between neighbouring replicas. After the success or rejection of an exchange, the process is repeated. Image by Christopher Rowley, 2016.

Replica exchange molecular dynamics (REMD) elegantly overcomes this problem. During REMD, a number of replicas of the same protein are run simultaneously at temperatures distributed exponentially over a range (Figure 2.4). Temperatures are therefore more closely spaced at the lower end of the range to ensure adequate sampling at biologically relevant temperatures, whilst temperatures above the biological range are included to facilitate sampling of rare conformations. To sample the possible conformations of a protein more deeply, exchanges are attempted between replicas at neighbouring temperatures [214]. These exchanges are evaluated by the Metropolis-Hastings algorithm so that each temperature yields a statistical ensemble of molecular conformations, in which each conformation is sampled with the probability appropriate to that temperature. Such ensemble methodologies are appropriate for considering the structures of flexible proteins such as tropoelastin due to their ability to sample a large number of conformations [61, 62]. In the case of REMD, the Metropolis-Hastings criterion compares the energies and temperatures of neighbouring replicas at specified time intervals [215]. If the criterion is satisfied, which is more likely when the energy distributions at the two temperatures overlap sufficiently, the structures are exchanged and the simulations continue running at their new temperatures. The probability P of an exchange between replicas x and x′ can be described by

$$ P(x \rightarrow x') = \begin{cases} 1, & \text{if } \Delta \leq 0 \\ \exp(-\Delta), & \text{if } \Delta > 0 \end{cases} \tag{2.6} $$

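where

$$ \Delta = (\beta_i - \beta_j)(E_{x'} - E_x) \tag{2.7} $$

E_x and E_x′ are the potential energies of replicas x and x′, which are simulated at inverse temperatures β_i and β_j respectively, and β is the reciprocal of k_B T, where k_B is the Boltzmann constant and T the thermodynamic temperature:

$$ \beta = \frac{1}{k_B T} \tag{2.8} $$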
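A minimal sketch of the Metropolis exchange test, assuming energies in kcal/mol and the Boltzmann constant in matching units (about 0.0019872 kcal/(mol·K)). The sketch fixes its own labelling: replica x, with energy E_x, runs at temperature T_i, and replica x′, with energy E_x′, runs at T_j, so that Δ = (β_i − β_j)(E_x′ − E_x). With this convention a swap that hands the lower-energy structure to the colder replica is always accepted. Function names and energies are illustrative.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def acceptance_probability(e_x, t_i, e_xp, t_j):
    """P(x -> x') for swapping replica x (energy e_x at T_i) with x' (e_xp at T_j)."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_xp - e_x)
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def exchange_accepted(e_x, t_i, e_xp, t_j, rng=random):
    """Metropolis test: accept if a uniform random number falls below P."""
    return rng.random() < acceptance_probability(e_x, t_i, e_xp, t_j)


# Illustrative energies for neighbouring replicas at 300 K and 310 K.
print(acceptance_probability(-1000.0, 300.0, -1010.0, 310.0))  # 1.0: colder replica gains the lower energy
print(acceptance_probability(-1010.0, 300.0, -1000.0, 310.0))  # below 1: uphill swap
```

Averaging `exchange_accepted` over many attempts gives the acceptance rate that REMD set-ups are tuned against.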

Generally, an exchange attempt frequency (EAF) of one attempt every 1–5 ps, with an acceptance rate of 20–30%, is deemed optimal for REMD, as this ensures that enough exchanges occur for structures to sample different temperatures [216]. It is therefore imperative that the simulation is set up with enough replicas for neighbouring temperatures to overlap sufficiently to achieve appropriate sampling.
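Exponentially (geometrically) spaced temperatures can be generated as in the sketch below. Every neighbouring pair differs by the same ratio, so the absolute gaps are smallest at the low, biologically relevant end of the range. The temperature range and replica count are hypothetical; in practice they are adjusted, for example with short trial runs, until neighbouring replicas reach the target acceptance rate.

```python
def geometric_temperatures(t_min, t_max, n_replicas):
    """Temperatures spaced so that T[k+1] / T[k] is the same for every k."""
    if n_replicas < 2:
        raise ValueError("need at least two replicas")
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


temps = geometric_temperatures(300.0, 600.0, 8)   # hypothetical ladder
print([round(t, 1) for t in temps])
print([round(b - a, 1) for a, b in zip(temps, temps[1:])])  # gaps widen with T
```

Adding replicas over the same range shrinks every ratio, which increases energy overlap between neighbours at the cost of more parallel simulations.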

Although powerful, REMD is computationally expensive, as a large number of replicas of the same molecule are simulated in parallel. In particular, the use of explicit solvent greatly increases the number of replicas required for efficient exchanges, because the many additional solvent degrees of freedom reduce the overlap between the energy distributions of neighbouring replicas. Therefore, extended implicit solvent REMD followed by equilibration in explicit solvent using cMD has been used to great effect when modelling tropoelastin [61, 62], and this approach is applied in multiple chapters of this thesis.

2.4 Normal mode analysis

A central dogma of biochemistry is that the structure of a protein dictates its func-

tion [217]. Normal mode analysis (NMA) aims to capture the collective motions of

molecules by using the global shape of the molecule, rather than the microstates

(e.g. local energy minima) explored by MD (Figure 2.5).

Figure 2.5: Normal mode analysis of p38 MAP kinase. The top schematic is an anisotropic network model of p38, showing the Cα network and regions of large (red) and small (red) relative displacement. The bottom panel depicts the directionality of the modes that contribute most to the collective motions of p38. Adapted from [217].


2.4.1 Elastic and anisotropic network models

Elastic network models (ENMs) can be coupled with NMA for computational

efficiency. A subset of ENMs are anisotropic network models (ANMs), which

model proteins as a highly interconnected network of Cα atoms or “nodes” of equal

masses coupled to springs [218]. A key assumption of ANMs is that the protein oscillates around its equilibrium point, the lowest energy state, because any displacement from this conformation raises the energy and generates a restoring force back towards it.

The harmonic potential of the spring between two nodes, i and j, is given by

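$$ V = \frac{\gamma}{2} \left( r_{ij} - r_{ij}^0 \right)^2 \tag{2.9} $$

where γ is the spring constant, and r_ij and r^0_ij represent the actual and equilibrium distances respectively. The second-order partial derivatives of eqn. 2.9 are

$$ H_{ij} = \frac{\partial^2 V_{ij}}{\partial q_i \, \partial q_j} \tag{2.10} $$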

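where q represents the x, y and z coordinates of nodes i and j. The elements H_ij are the 3 × 3 submatrices (super-elements) of the 3N × 3N Hessian matrix, H, for a network of N nodes

$$ H = \begin{pmatrix} H_{11} & H_{12} & \cdots & H_{1N} \\ H_{21} & H_{22} & \cdots & H_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ H_{N1} & H_{N2} & \cdots & H_{NN} \end{pmatrix} \tag{2.11} $$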

Diagonalisation of the Hessian built from the potential in eqn. 2.9 yields 3N − 6 non-zero eigenvalues and 6 zero eigenvalues. The zero eigenvalues correspond to the rigid-body motions of the molecule (three translations and three rotations), which have no effect on its potential energy, whereas positive and negative eigenvalues indicate local energy minima and maxima respectively [218].
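To make the construction concrete, the sketch below builds an ANM Hessian from Cα-like coordinates with NumPy and checks that exactly six eigenvalues are numerically zero. It assumes the common ANM conventions that springs connect only nodes within a cutoff distance and that all springs share one constant γ; the 15 Å cutoff and the random coordinates are illustrative stand-ins, not values or structures used in this thesis. The coordinates are confined to an 8 Å cube so that every pair is connected and the network is rigid.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """3N x 3N ANM Hessian assembled from 3 x 3 super-elements H_ij."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = d @ d
            if dist2 > cutoff ** 2:
                continue                                  # no spring beyond the cutoff
            block = -gamma * np.outer(d, d) / dist2       # off-diagonal super-element
            hessian[3*i:3*i+3, 3*j:3*j+3] = block
            hessian[3*j:3*j+3, 3*i:3*i+3] = block
            hessian[3*i:3*i+3, 3*i:3*i+3] -= block        # diagonal = -(sum of off-diagonals)
            hessian[3*j:3*j+3, 3*j:3*j+3] -= block
    return hessian


rng = np.random.default_rng(0)
ca = rng.uniform(0.0, 8.0, size=(30, 3))        # stand-in for C-alpha coordinates
eigvals = np.linalg.eigvalsh(anm_hessian(ca))
print(int(np.sum(np.abs(eigvals) < 1e-8)))      # 6 rigid-body modes
```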


The kinetic energy of the system is not accounted for in the Hessian. Including it gives the equation of motion

$$ M \frac{d^2 \Delta q}{dt^2} + H \Delta q = 0 \tag{2.12} $$

where M is the diagonal matrix containing the masses of all Cα atoms, H is the Hessian, and Δq is the displacement from the equilibrium configuration. Solutions of eqn. 2.12 take the form of 3N-dimensional vectors u_k,

$$ u_k(t) = a_k \exp(-i \omega_k t) \tag{2.13} $$

where a_k contains the amplitude and phase factor and ω_k is the frequency of the mode of motion. Substituting eqn. 2.13 into eqn. 2.12 yields the generalised eigenvalue equation

$$ H u_k = \omega_k^2 M u_k \tag{2.14} $$

in which H is weighted by M, such that its eigenvectors give the normal modes of the system. The eigenvalue of each mode k is its squared frequency, λ_k = ω_k². The implication of this is that the slowest modes, with the smallest eigenvalues, involve the largest displacements and hence are the motions most likely to dominate the global motions of a molecule [218].
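Eqn. 2.14 is commonly solved by mass-weighting: forming M^{-1/2} H M^{-1/2}, diagonalising this symmetric matrix to obtain the eigenvalues ω_k², and mapping its eigenvectors back with M^{-1/2}. The sketch below does this with NumPy for the simplest possible system, two masses joined by one spring along a line, whose non-zero eigenvalue has the closed form γ(1/m_1 + 1/m_2). The function name, masses and spring constant are arbitrary illustrative choices.

```python
import numpy as np


def normal_modes(hessian, masses):
    """Solve H u = omega^2 M u for a diagonal mass matrix M.

    Returns (eigenvalues lambda_k = omega_k^2, mode vectors u_k as columns),
    ordered from the slowest to the fastest mode.
    """
    inv_sqrt_m = 1.0 / np.sqrt(np.asarray(masses, dtype=float))
    weighted = hessian * np.outer(inv_sqrt_m, inv_sqrt_m)   # M^-1/2 H M^-1/2
    eigvals, eigvecs = np.linalg.eigh(weighted)             # ascending eigenvalues
    modes = eigvecs * inv_sqrt_m[:, None]                   # back to Cartesian u_k
    return eigvals, modes


gamma, m1, m2 = 2.0, 1.0, 3.0
hessian = gamma * np.array([[1.0, -1.0], [-1.0, 1.0]])     # one spring, 1D
eigvals, modes = normal_modes(hessian, [m1, m2])
print(eigvals)   # approximately [0, gamma * (1/m1 + 1/m2)]
```

The zero eigenvalue is the rigid translation of both masses together, the one-dimensional analogue of the six rigid-body modes of a 3D network.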

As previously mentioned, an assumption of ANMs is that proteins oscillate around their equilibrium points. Therefore, ANM analysis requires an equilibrium structure as input, rather than a structure from a local minimum. This thesis utilises a structure of tropoelastin derived from multiple rounds of REMD, as described in [61, 62]. Here, this structure is assumed to represent tropoelastin's equilibrium conformation, as it is the lowest energy structure of the most populated cluster in the structural ensemble obtained at 310 K.


2.5 Molecular docking

The majority of proteins function in concert with other molecules to elicit bi-

ological effects. A substantial portion of modern biochemistry has focused on

elucidating the binding mechanisms between proteins and their partners to better

understand key molecules in biological processes. Traditionally, X-ray crystallography, cryo-electron microscopy and nuclear magnetic resonance have been used to assess the conformation and binding partners of proteins [219]. Although these techniques have been successful in many cases, they are not without their drawbacks, as they are time-consuming and often limited in the size of the proteins they can analyse. Furthermore, all three techniques have had only limited success when examining highly dynamic intrinsically disordered proteins. Therefore, using computational methodologies to complement these technologies has increased in popularity since the conception of MD.

The determination of a large number of structures through the aforementioned

techniques has allowed for the creation of molecular docking software. The ratio-

nale behind molecular docking is that once a protein structure has been acquired,

it can be fit against its binding partner based on a combination of geometry and

favourable interaction energetics. The simplest way of docking one protein to another is to hold one protein fixed in space and rotate its binding partner, calculating the best fit in terms of energetics such as electrostatic and short-range non-bonded interactions (Figure 2.6). However, this method does not generate a wide variety of unique structures [220,221].


Figure 2.6: Schematic of protein-protein docking. Two proteins are docked to each other via rotation to obtain the best combination of energies, including the van der Waals and electrostatic energies. Image from BioExcel.

Two major limitations of current docking methodologies exist. As previously hinted, many docking programs regard molecules as rigid entities when, in reality, their structure varies on local and global levels. As fitting together two rigid proteins would be highly forced in terms of minimising steric and electrostatic clashes, rigid-body docking may be inconclusive or yield biologically irrelevant results. The second limitation of docking programs is that they rarely account for the fact that the sites of interaction between molecules may have been identified from ambiguous data, such as mutagenesis, bioinformatic predictions, and NMR titrations, rather than from high-resolution structural analysis. Under normal circumstances, driving docking purely with ambiguous data skews the software towards false-positive binding and, as with rigid-body docking, can result in biologically irrelevant conformations.

Taking these limitations into consideration, this thesis utilises HADDOCK 2.2 (High Ambiguity Driven protein-protein DOCKing) [222]. An advantage of HADDOCK is that it allows semi-flexible docking, incorporating small movements in both the protein backbone and side chains to provide a better fit between molecules. Furthermore, it randomly excludes a fraction of any user-defined ambiguous restraints to decrease bias towards a particular set of restraints that might be unreliable. The docking rounds of HADDOCK are:

1) Thousands of models are generated by rigid-body docking within a relatively short time frame using rotational and translational moves. A user-defined number of the best models are taken to the next stage. The scoring function used to determine the best conformations is

$$ \text{Score} = 0.01 E_{\text{air}} + 0.01 E_{\text{vdw}} + 1.0 E_{\text{elec}} + 1.0 E_{\text{desolv}} - 0.01\,\text{BSA} \tag{2.15} $$

where E_air is the energy of the ambiguous interaction restraints (such as those derived via NMR), E_vdw is the van der Waals energy, E_elec is the electrostatic energy and E_desolv is the desolvation energy.

2) The next stage involves semi-flexible refinement, in torsion angle space, of the interface regions. This can include the side chains, the backbone, or both. Torsion angles are sampled to allow for small (about 2 Å) conformational changes. All models from this stage are carried forward to the next stage and are scored with

$$ \text{Score} = 0.1 E_{\text{air}} + 1.0 E_{\text{vdw}} + 1.0 E_{\text{elec}} + 1.0 E_{\text{desolv}} - 0.01\,\text{BSA} \tag{2.16} $$

where the energy terms are as described above, and BSA is the buried surface area.

3) Final refinement occurs in explicit solvent to refine electrostatics and determine residue-residue contacts. The explicit solvent can be either the TIP3P water model or DMSO to mimic a membrane environment. Models from this stage are scored with

$$ \text{Score} = 0.1 E_{\text{air}} + 1.0 E_{\text{vdw}} + 0.2 E_{\text{elec}} + 1.0 E_{\text{desolv}} \tag{2.17} $$
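Because the three scoring functions differ only in their weights, they can be expressed as one weighted sum with a weight table per stage, as in the sketch below. The weights are those of eqns. 2.15 to 2.17; the stage names, dictionary keys, function names and energy values are hypothetical placeholders, and this is not HADDOCK's own code. Lower (more negative) scores correspond to better-ranked models.

```python
# Weights from eqns. 2.15-2.17. BSA carries a negative weight, so a larger
# buried surface area lowers (improves) the score; the water stage omits it.
STAGE_WEIGHTS = {
    "rigid_body":    {"air": 0.01, "vdw": 0.01, "elec": 1.0, "desolv": 1.0, "bsa": -0.01},
    "semi_flexible": {"air": 0.1,  "vdw": 1.0,  "elec": 1.0, "desolv": 1.0, "bsa": -0.01},
    "water":         {"air": 0.1,  "vdw": 1.0,  "elec": 0.2, "desolv": 1.0, "bsa": 0.0},
}


def haddock_style_score(terms, stage):
    """Weighted sum of the energy terms of one docked model at a given stage."""
    weights = STAGE_WEIGHTS[stage]
    return sum(weight * terms.get(name, 0.0) for name, weight in weights.items())


def rank_models(models, stage):
    """Sort models (dicts of energy terms) from best (lowest) to worst score."""
    return sorted(models, key=lambda m: haddock_style_score(m, stage))


model_a = {"air": 50.0, "vdw": -40.0, "elec": -200.0, "desolv": -10.0, "bsa": 1500.0}
model_b = {"air": 10.0, "vdw": -55.0, "elec": -150.0, "desolv": -5.0, "bsa": 1200.0}
for stage in STAGE_WEIGHTS:
    print(stage, [round(haddock_style_score(m, stage), 2) for m in (model_a, model_b)])
```

The same pair of models can swap rank between stages, which is why the electrostatic down-weighting in the final water stage matters.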

2.6 Machine learning

Machine learning refers to the use of algorithms to predict events or recognise

patterns within data. It has been used in a wide range of scientific (and non-

scientific) applications, such as predicting protein disorder [223], recognition of

nuclear sequence elements [224], and diagnostic medical imaging [225]. Machine

learning can be broadly categorised into supervised and unsupervised learning.

The goal of supervised learning is to perform regression or classification on data labelled with the outcome of interest. Unsupervised learning is primarily used to detect patterns where the outcome may be undefined, and is achieved through clustering or dimensionality reduction.

When conducting supervised machine learning on a data set, the data are split into training (“known cohort”) and test (“unknown cohort”) sets. The model is built on the training cohort and is subsequently applied to the test cohort. The training set must be large and variable enough for the model to fit the data robustly, but the test set cannot be too small, otherwise it will not accurately capture the performance of the model. By measuring the fit of the model to the test cohort, it can be ascertained whether the model is sufficiently predictive before it is applied to completely new “real world” data. Care must be taken to prevent overfitting of the model to the training set, for example by applying regularisation; otherwise, the model may not be able to predict unseen data.
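A minimal sketch of a random train/test split, using only the Python standard library. The helper name (hypothetical, and not the scikit-learn function of the same name), the 80:20 ratio and the fixed seed are illustrative assumptions rather than the settings used in this thesis; libraries such as scikit-learn provide equivalent utilities, including stratified splits that preserve class proportions.

```python
import random


def train_test_split(samples, labels, test_fraction=0.2, seed=0):
    """Shuffle indices once and hold out a fraction as the test cohort.

    Returns (train_samples, train_labels, test_samples, test_labels).
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    n_test = max(1, round(test_fraction * len(indices)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(samples, train_idx), pick(labels, train_idx),
            pick(samples, test_idx), pick(labels, test_idx))


X = list(range(100))
y = [x % 2 for x in X]
X_train, y_train, X_test, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))   # 80 20
```

Keeping the split fixed (via the seed) means every candidate model is judged on the same held-out cohort.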

2.6.1 Logistic regression

Regression is a commonly used analysis that, in its simplest linear form, describes

the relationship between two variables, x and y, as:


$$ y = \beta_0 + \beta_1 x + \varepsilon \tag{2.18} $$

where β0 is the y-intercept, β1 is the slope, and ε describes the random error

component. Linear regression involves solving for the coefficients, β0 and β1, that

provide the best model to fit the data. As this form of regression describes a continuous, unbounded line, it is not appropriate for binary classification problems such as those described in this thesis. Logistic regression, on the other hand, is designed for binary and multicategorical classification problems. At its simplest, it takes the form:

$$ P(y = 1 \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} \tag{2.19} $$

where P(y = 1 | x) describes the probability that the outcome y equals 1 (class B) given x, and β0 and β1 are as in linear regression. A probability threshold is then used to classify the data into discrete classes; for example, with a threshold of 0.5, an observation with P > 0.5 is classed as class B rather than class A. The sigmoidal function described above can be transformed into a log-odds output by applying the logit function,

$$ \log \frac{P}{1 - P} = \beta_0 + \beta_1 x \tag{2.20} $$

which is linear in the coefficients. The coefficients are then estimated by maximum likelihood, thus fitting the original sigmoidal curve to the data.


Figure 2.7: Comparison of linear and logistic regression. Linear regression predicts values that fall outside the 0–1 range of a binary outcome, unlike the sigmoidal curve of the logistic function, which asymptotically approaches 0 and 1.

The coefficients of logistic regression are expressed on the log-odds scale and indicate whether a variable has an effect on predicting the outcome. Each coefficient can be interpreted as how much a one-unit increase in the variable of interest increases or decreases the log-odds of the outcome occurring. Importantly, exponentiating a coefficient gives an odds ratio, the multiplicative change in the odds of the outcome per unit increase in the variable, rendering logistic regression highly interpretable.
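To illustrate maximum likelihood fitting of eqn. 2.19, the sketch below fits β0 and β1 for a single predictor by gradient ascent on the log-likelihood in plain Python, then reports exp(β1) as an odds ratio and applies a 0.5 decision threshold. The learning rate, iteration count and toy data (chosen so that the classes overlap and the estimates stay finite) are illustrative assumptions; in practice libraries such as statsmodels or scikit-learn, with faster solvers, would normally be used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, n_iter=20000):
    """Maximum likelihood estimates of (beta0, beta1) by gradient ascent.

    Log-likelihood: sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)] with
    p_i = sigmoid(beta0 + beta1 x_i). Its gradient is sum_i (y_i - p_i) (1, x_i).
    """
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)
            g0 += residual
            g1 += residual * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


xs = list(range(10))
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]          # toy data with overlapping classes
b0, b1 = fit_logistic(xs, ys)
odds_ratio = math.exp(b1)                    # change in odds per unit increase in x
p_mid = sigmoid(b0 + b1 * 4.5)               # predicted P(y = 1 | x = 4.5)
predicted_class = 1 if p_mid > 0.5 else 0    # 0.5 decision threshold
print(f"beta0={b0:.3f}, beta1={b1:.3f}, odds ratio={odds_ratio:.3f}")
```

At the maximum likelihood estimate both gradient components vanish, which provides a simple convergence check.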

2.6.2 Elastic net regularisation

This thesis utilises elastic net, a type of regularisation that adds a penalty term to the regression to prevent overfitting. Elastic net is a linear combination of the Lasso (L1) and ridge (L2) regularisation penalties, and minimises

$$ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda_2 \beta_1^2 + \lambda_1 |\beta_1| \tag{2.21} $$

where the sum of the squared residuals (y_i − ŷ_i) that determines the best fit of the model is penalised by both the ridge and Lasso terms. Lasso regularisation penalises the absolute magnitude of the slope and can shrink some coefficients exactly to zero, thereby selecting variables, whereas ridge regularisation penalises the square of the slope, shrinking all coefficients towards zero without eliminating them. The hyperparameters λ1 and λ2 are chosen by a grid search that identifies the best model with respect to predictive power and overfitting.
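A minimal sketch of minimising the elastic net objective (squared-error loss with λ2 times the squared slopes plus λ1 times the absolute slopes, generalised to several slopes) by cyclic coordinate descent. Each slope update reduces to a soft-thresholding step, which is what allows the Lasso part to set coefficients exactly to zero. It assumes an unpenalised intercept and plain-Python data; library implementations such as scikit-learn's `ElasticNet` parameterise the penalty differently (an overall strength `alpha` and a mixing parameter `l1_ratio`) and scale the loss by the number of samples. Function names and data are illustrative.

```python
def soft_threshold(a, t):
    """S(a, t) = sign(a) * max(|a| - t, 0); returns exactly 0.0 inside [-t, t]."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def elastic_net(X, y, lam1, lam2, n_iter=1000):
    """Minimise sum_i (y_i - yhat_i)^2 + lam2 * sum_j b_j^2 + lam1 * sum_j |b_j|.

    X is a list of rows (n samples x p features) and y a list of targets.
    Returns (intercept, slopes).
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Unpenalised intercept: mean residual given the current slopes.
        b0 = sum(y[i] - sum(X[i][k] * b[k] for k in range(p)) for i in range(n)) / n
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            z = 0.0     # squared norm of feature j
            for i in range(n):
                pred_without_j = b0 + sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            b[j] = soft_threshold(rho, lam1 / 2.0) / (z + lam2)
    return b0, b


xs = list(range(10))
X = [[x, 1.0 if x % 2 == 0 else -1.0] for x in xs]   # second feature is irrelevant
y = [1.0 + 2.0 * x for x in xs]
print(elastic_net(X, y, lam1=0.0, lam2=0.0))   # close to (1.0, [2.0, 0.0])
print(elastic_net(X, y, lam1=1e6, lam2=0.0))   # slopes shrunk exactly to zero
```

Sweeping `lam1` and `lam2` over a grid and scoring each fit on held-out data is one way to carry out the hyperparameter search described above.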


Chapter 3

Allysine modifications perturb tropoelastin structure and mobility on a local and global scale

This chapter has been published as:

Ozsvar, J., Tarakanova, A., Wang, R., Buehler, M. J., Weiss, A. S., “Allysine modifications perturb tropoelastin structure and mobility on a local and global scale”. Matrix Biology Plus, 3(6), pp.800-809.


3.1 Introduction

Elastin is the major elastic extracellular matrix (ECM) protein that is crucial for

the mechanical resilience of elastic vertebrate tissues, including the skin, lungs

and cardiovascular system [96]. The elastin polymer predominantly comprises its

soluble subunit, tropoelastin [156], which is secreted by elastogenic cells and un-

dergoes hierarchical self-assembly to form elastin fibers [226]. Assembly is initiated

after secretion to the cell surface, where tropoelastin molecules rapidly form small

spherules through a process termed coacervation [80, 227]. These spherules are

then deposited onto the microfibrillar scaffold within the ECM [79] where they

assemble into robust, insoluble, and extensively cross-linked fibers [96].

The cross-linking of elastin is facilitated by one or more members of the family of

lysyl oxidase (LOX) enzymes and commences prior to deposition onto the microfib-

rillar scaffold [228]. As an amine oxidase, LOX modifies the ε-amino side chain of

lysine to an α-aminoadipic-δ-semialdehyde, resulting in an allysine residue [5,229].

Allysines are capable of undergoing spontaneous condensation with either the ε-

amino groups of lysines or the semialdehydes of other allysines, forming linear

lysinonorleucine (LNL) or allysine-aldol (ALL) cross-links respectively [230]. LNL

and ALL are able to condense further, forming larger, more complex cross-links

such as desmosine or isodesmosine [231]. These four types of links are the most

abundant cross-linked species within the mature elastin fiber [106,232].

The contribution of allysines to elastin assembly, other than purely their ability to

form cross-links, is currently unknown. Elastin assembly is a finely tuned process

relying on the intrinsic properties of tropoelastin, including the association of its

hydrophobic domains and positioning of its cross-linking domains (Figure 3.1),

both of which are dependent on its molecular arrangement and flexibility [36, 37,

111,233]. The robust balance between molecular arrangement and function can be

perturbed by mutations that result in structural changes within both tropoelastin

molecules and elastin fibers [32,61,96,234]. Thus, it is plausible that the presence of allysines can also affect the conformation of tropoelastin and, in turn, influence coacervation and subsequent higher order assembly processes. To explore this, the locations of allysines, as inferred from known cross-linking sites, are needed to appreciate the spatial arrangement of molecules during coacervation and the overall steps by which elastin fibers are formed.

Figure 3.1: Schematic representation of tropoelastin domains and allysines explored in this study. The hydrophobic and cross-linking domains are represented by the black and white boxes, respectively. The N- and C-termini are denoted on either side of the schematic. The orange circles mark the domains containing allysine modifications in this study. The corresponding allysines (in orange) and their flanking amino acid sequences are depicted. Adapted from [122].

The precise cross-linking patterns within native elastin are only partially understood because the highly repetitive sequence of tropoelastin has hampered the mapping of specific cross-linking sites. More recently, enzymatic cleavage and mass spectrometry of native elastin have identified candidate regions involved in cross-linking [232]. Corroborating evidence for the involvement of these sites arises from further studies probing the in vitro cross-linking of synthetic recombinant human tropoelastin [53,235]. Despite these advances in pinpointing cross-link locations, accurately mapping these sites onto the tertiary structure of tropoelastin remains difficult. Tropoelastin is a flexible molecule, albeit one that retains its canonical shape [110], and this flexibility has also impeded the use of traditional high resolution techniques to resolve its entire global structure [85]. As such, the only experimental shape data for tropoelastin are low resolution small angle X-ray scattering (SAXS) and small angle neutron scattering (SANS) structures. These represent ensembles whose high resolution components were recently identified through molecular dynamics [22,32,61].


Recently, the full atomistic structure of tropoelastin was detailed as an ensemble using extensive replica exchange molecular dynamics (REMD) simulations [111]. The structure correlates remarkably well with the previous low resolution structural data, and its secondary structural features are in accord with those indicated by circular dichroism and molecular mutation studies. These findings highlight the power of molecular dynamics (MD) in modeling flexible molecules. Examination of the molecule through normal mode analysis (NMA) also gave insight into the predominant global motions of tropoelastin that are likely to contribute to self-assembly [111, 234]. This provides a molecular basis with which the effects of modifications on elastin assembly can be probed.

As the relatively flat energy landscape of flexible molecules, such as tropoelastin,

allows them to transition between energy minima and take on a multitude of con-

formations, it is most appropriate to analyze flexible molecules as a structural

ensemble. Principal component analysis (PCA) is being increasingly used to pin-

point and link structural variation to functionality within protein ensembles [60].

On this basis, PCA has been applied to wild type (WT) tropoelastin. This analysis highlighted that, despite its flexibility, the overall architecture of tropoelastin fluctuates within a molecular ensemble that is biased toward its canonical structure [110].

Here, it was investigated whether allysine modifications are capable of altering the structure and dynamics of tropoelastin molecules. REMD was conducted to sample the conformational landscape of tropoelastin and to understand the consequences of single and multiple allysine modifications. Ensemble-based methods such as PCA were used to describe changes in overall structural variance and flexibility, local secondary structure and the mobility of specific residues. The intrinsically accessible molecular motions of the allysine-containing molecules were also examined, as was the contribution of salt bridges to the molecular changes. These findings reveal that allysines do more than simply serve as static precursors to cross-links in elastin assembly: they also contribute to changes in molecular structure and dynamics.

3.2 Methods

3.2.1 Allysine parameterisation

The methods for generating the WT model of tropoelastin have been previously described [61]. For single allysine modifications, residues 353 and 507 were selected because multiple studies report their involvement in cross-linking (Table 3.1). The modified molecules ALK353 and ALK507 were used to probe for changes that tropoelastin may undergo following a single allysine modification. To understand the effect of multiple allysine modifications, these two residues were modified simultaneously together with residues 150, 199 and 239 to give 5ALK. These additional sites were selected on the basis of prior characterization of native and synthetic elastin (Table 3.1). Modifications were restricted to sites identified as allysines by multiple publications. The rationale behind 5ALK was that elastin is extensively cross-linked, so this construct allowed us to explore a representative case in which tropoelastin molecules, like the majority of those in vivo, contain more than one modification.

Residue   Domain   References
150       10       [53,235]
199       13       [85,110,111,235]
239       15       [111,235]
353       19       [53,85,110,111,235]
507       25       [53,85,110,111,235]

Table 3.1: Summary of the lysine residues converted to allysines in this study, their respective domains, and references to supporting studies.

The CHARMM22 force field [196] was selected for consistency with previous tropoelastin simulations [61]. Allysine was applied as a patch using Visual Molecular Dynamics (VMD) software [237]. Parameters for its aldehyde functional group were taken from acetaldehyde and propionaldehyde in the CHARMM General Force Field (CGenFF) [236].

3.2.2 Molecular dynamics input

The modified molecules were first simulated with NAMD [186] using implicit solvent replica exchange molecular dynamics (REMD). The implicit solvent step was intended to accelerate sampling, as the large number of water molecules in explicit solvent is a major computational limitation in REMD. Each molecule had a total of 48 replicas distributed exponentially over a temperature range of 280-480 K, giving exchange acceptance ratios of 0.2-0.3. Exchanges were attempted every 1 ps and were accepted based on the Metropolis criterion described in previous REMD studies [61]. Non-bonded forces were applied with a cutoff of 16 Å, a switch distance of 14 Å and a pair list distance of 18 Å. Implicit solvent was simulated using a dielectric constant of 80, an ion concentration of 0.15 M, and an α cutoff of 15 Å. A total of 5.2 ns was simulated per tropoelastin molecule, with ∼240 ns total simulation time for the entire ensemble across all temperatures of each molecule.
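The temperature ladder and exchange test described above can be sketched as follows. This sketch assumes that "distributed exponentially" means a geometric ladder, with a constant ratio between neighbouring temperatures. It uses the standard parallel-tempering Metropolis criterion, accepting a swap with probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T). Energies are assumed to be in kcal/mol, and the function names are illustrative rather than NAMD's internal implementation. Whether a given ladder achieves acceptance ratios of 0.2-0.3 depends on the potential energy distributions of the replicas, which this sketch does not model.

```python
import math
import random

# Boltzmann constant in kcal/(mol K), matching CHARMM/NAMD energy units
KB_KCAL = 0.0019872041


def geometric_ladder(t_min, t_max, n):
    """n temperatures from t_min to t_max with a constant ratio between neighbours."""
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio ** i for i in range(n)]


def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis probability of swapping the configurations held at t_i and t_j."""
    beta_i = 1.0 / (KB_KCAL * t_i)
    beta_j = 1.0 / (KB_KCAL * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    # Swaps that move the lower-energy configuration to the colder replica are always accepted
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(e_i, e_j, t_i, t_j, rng=random):
    """Return True if a proposed swap between neighbouring replicas is accepted."""
    return rng.random() < exchange_probability(e_i, e_j, t_i, t_j)


if __name__ == "__main__":
    temps = geometric_ladder(280.0, 480.0, 48)
    print([round(t, 1) for t in temps[:3]], round(temps[-1], 1))
    # Hypothetical potential energies (kcal/mol) for the two coldest replicas
    print(exchange_probability(-25010.0, -25000.0, temps[0], temps[1]))
```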

The root mean square deviation (RMSD) of atomic coordinates was used as an indicator of structural convergence. Upon reaching convergence, 1000 structures were extracted from the last 2 ns of the 310 K replica and clustered by k-means analysis with the MMTSB toolkit [238] using an RMSD threshold of 5 Å. MMTSB was used to determine the distribution of clusters generated by k-means analysis and the most representative structures of the most populated clusters. The ProDy package [217] was used to construct anisotropic network models (ANMs) and to perform principal component analysis (PCA) on the ensemble of each molecule. In-house Tcl/Tk, R and Matlab scripts were used for all other analyses.
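RMSD underlies both the convergence check and the clustering described above. The following is a minimal NumPy sketch of RMSD after optimal rigid-body superposition using the Kabsch algorithm. It assumes two conformations with the same atoms in the same order, as (N, 3) arrays in Å. MMTSB and ProDy provide their own implementations; this is not their code.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body superposition."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean(axis=0)  # remove translation
    b_c = b - b.mean(axis=0)
    h = a_c.T @ b_c           # 3 x 3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a_c @ rot.T - b_c  # rotate a onto b, then compare
    return float(np.sqrt((diff ** 2).sum() / len(a)))


if __name__ == "__main__":
    # Placeholder coordinates, not tropoelastin structures
    rng = np.random.default_rng(0)
    x = rng.normal(scale=10.0, size=(50, 3))
    print(kabsch_rmsd(x, x + rng.normal(scale=0.5, size=x.shape)))
```

The reflection check matters because proteins are chiral: without it, a mirror image could be superposed with an artificially low RMSD.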

The average structure of the most populated cluster for each molecule was further equilibrated in explicit aqueous solvent. A cubic box containing ∼100,000 water molecules was used to solvate tropoelastin. A padding distance of 20 Å from the molecule's edges ensured that the protein would not contact its own periodic images. The box was neutralized with sodium and chloride ions at the physiological concentration of 0.15 M. The structure was briefly minimized with the conjugate gradient method and equilibrated in a constant volume system. The molecules were then simulated with classical molecular dynamics in constant pressure systems using a time step of 2 fs. Equilibration was assessed using the RMSD of each structure, resulting in equilibration times of 100 ns, 130 ns and 230 ns for ALK507, ALK353 and 5ALK, respectively. A temperature of 310 K was maintained by Langevin dynamics with a damping coefficient of 1 ps⁻¹. A constant pressure of 1 atm was applied using the Nosé-Hoover Langevin barostat with a period of 200 fs and a decay of 100 fs. For non-bonded interactions, a cutoff of 12 Å was used, with a switch distance of 10 Å and a pair list distance of 13.5 Å. Long-range electrostatics were treated by Particle Mesh Ewald summation with a grid spacing of 1 Å. The resultant structures were used to conduct NMA using elastic network models with ProDy [217] and the NMWiz plugin of VMD [237].

3.3 Results

3.3.1 Structures of single allysine-modified tropoelastin

Ensemble analysis was conducted on 1000 structures derived from the last 2 ns of REMD simulation of each molecule. The structural variance within the ensembles was examined through the contribution of the top 20 principal components (Figure 3.2, A-C). It had been previously noted that 42% of the variance in WT is captured by the top two principal components [62]. Here, the top two principal components accounted for 31-41% of the structural variance of the modified molecules. The summed variance of the top three principal components of these molecules ranged between 40% and 52%, compared with 53% for the top three principal components of WT. More principal components would therefore be required for the allysine-modified molecules to capture the same proportion of structural variance as WT. This indicated that the allysine-containing molecules exhibited a higher degree of structural variability relative to WT when examined through these top principal components.
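The variance fractions discussed above can be computed from a superposed ensemble of Cα coordinates. The sketch below assumes the frames have already been superposed onto a common reference, for example with the superposition routine of a structural toolkit. It obtains the principal components from a singular value decomposition of the mean-centred coordinates. It illustrates the technique and is not the ProDy routine used in this study.

```python
import numpy as np


def pca_variance_fractions(ensemble):
    """PCA of an ensemble of superposed structures.

    ensemble: array of shape (n_frames, n_atoms, 3), already superposed.
    Returns (fractions, modes, variances): the fraction of total variance per
    component, the modes as unit rows of length 3 * n_atoms, and their
    variances, all ordered from largest to smallest variance.
    """
    x = np.asarray(ensemble, dtype=float)
    x = x.reshape(x.shape[0], -1)  # one row of 3N coordinates per frame
    x = x - x.mean(axis=0)         # remove the mean structure
    # Right singular vectors of the centred data are the covariance eigenvectors
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    variances = s ** 2 / (x.shape[0] - 1)
    return variances / variances.sum(), vt, variances


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(1000, 50, 3))  # placeholder random ensemble, not real data
    fractions, modes, variances = pca_variance_fractions(frames)
    print("top 2:", fractions[:2].sum(), "top 3:", fractions[:3].sum())
```

Summing the leading entries of fractions gives the cumulative variance captured by the top two or three components, which is the quantity compared between WT and the modified molecules.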

In addition to PCA, k-means clustering of structures was conducted based on the root mean square deviation (RMSD) of atomic coordinates in Cartesian space. This type of clustering has previously demonstrated that WT favors a specific configuration over other structures within its ensemble [62]. ALK353 and ALK507 showed a similar tendency (Figure 3.2, D-E), but this trend changed with multiple allysine modifications. Specifically, 5ALK displayed a relatively even distribution of structures across its top nine clusters. This was most evident for clusters 3 to 9, which each comprised between 68 and 81 structures (Figure 3.2, F), consistent with a model in which these structural clusters contribute to a comparable extent in 5ALK.
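Given cluster assignments from k-means or any other clustering, the population distribution and a representative structure per cluster can be summarised as below. This NumPy sketch assumes superposed coordinates. It defines the representative as the member closest to its cluster's mean structure, which is a common choice but is not necessarily the definition used by MMTSB.

```python
import numpy as np


def cluster_summary(labels, ensemble):
    """Summarise clusters from most to least populated.

    labels: (n_frames,) cluster id for each frame.
    ensemble: (n_frames, n_atoms, 3) superposed coordinates.
    Returns a list of (cluster_id, population, representative_frame_index).
    """
    labels = np.asarray(labels)
    ensemble = np.asarray(ensemble, dtype=float)
    ids, counts = np.unique(labels, return_counts=True)
    order = np.argsort(-counts, kind="stable")  # most populated first
    summary = []
    for cid, count in zip(ids[order], counts[order]):
        members = np.flatnonzero(labels == cid)
        centroid = ensemble[members].mean(axis=0)
        # Squared deviation of each member from its cluster's mean structure
        dev = ((ensemble[members] - centroid) ** 2).sum(axis=(1, 2))
        summary.append((int(cid), int(count), int(members[np.argmin(dev)])))
    return summary
```

The populations returned in this order correspond to the kind of ranked distribution plotted in Figure 3.2 (D-F).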

The relevance of the representative structures from the k-clusters was assessed by overlaying them onto 3D PCA plots of PC1-PC2-PC3 space (Figure 3.2, G-I). The most representative structures from the four most populated k-clusters of each ensemble (red squares) were located within dense regions of the PCA plots, as was also observed for WT [62]. Representative structures from the three least populated k-clusters (blue squares) were also overlaid onto the PCA plots, where they were found to reside in less populated areas. Most of these clusters were distinct, confirming that the top three principal components cleanly separated structural differences between k-clusters.


Figure 3.2: Variance of the top 20 modes of principal component analysis of the structural ensembles for: A) ALK353, B) ALK507, and C) 5ALK. Distribution of structures arranged from most to least populated k-clusters using RMSD for: D) ALK353, E) ALK507, and F) 5ALK. The most representative structures from the k-clusters are overlaid onto PC1-PC2-PC3 space for G) ALK353, H) ALK507 and I) 5ALK. Representative structures are classed as either from most populated k-clusters (red squares) or least populated k-clusters (blue squares).

The 2D PCA plots showed lower clustering resolution. This was evidenced by representative structures from the least sampled conformations that resided in areas of PC1-PC2 space similar to those of the most accessible structures (data not shown). This was expected because the proportion of structural variance of the allysine-containing molecules accounted for by PC1-PC2 differed from that of WT, as discussed above. This pointed to the need to proceed with at least the top three principal components.


3.3.2 Converting lysine to allysine perturbs the global structure and intrinsic dynamics of tropoelastin

The sum of principal component modes was used to assess the mobility of allysine-modified tropoelastin ensembles by calculating the square displacement in Cartesian space of all Cα atoms in the protein backbone (Figure 3.3, A-C). As the top six principal component modes dominate the structural ensemble [62], the sums of the top 2, 3, 6 or 20 principal component modes were compared. A minimum of three PCA modes was found to be required to describe the allysine-containing ensembles. Although the combinations of the top 2 and 3 modes substantially overlapped, the top 2 modes departed from the overall trend of the higher-mode combinations in some domains (Figure 3.3, A-C). This feature differed from the WT ensemble, where the top 2 and 3 modes overlap well [62].
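The summed square fluctuation of atom i over the top k components can be written as Σ σ_k² |v_k,i|². Here σ_k² is the variance of principal component k and |v_k,i|² is the squared displacement of atom i in that component's unit mode vector. The sketch below computes this from modes and variances such as those returned by a PCA of the ensemble. Normalising by the maximum value is an assumption, as the normalisation used for Figure 3.3 is not stated here.

```python
import numpy as np


def pc_square_fluctuations(modes, variances, k):
    """Per-atom square fluctuations from the top-k principal components.

    modes: (n_modes, 3 * n_atoms) unit mode vectors as rows, ordered by variance.
    variances: (n_modes,) variance along each mode.
    Returns an (n_atoms,) array scaled so that the most mobile atom equals 1.
    """
    modes = np.asarray(modes, dtype=float)[:k]
    variances = np.asarray(variances, dtype=float)[:k]
    k_used, n_coords = modes.shape
    # Squared x, y, z components of each mode, summed per atom -> (k_used, n_atoms)
    per_atom = (modes ** 2).reshape(k_used, n_coords // 3, 3).sum(axis=2)
    sq = (variances[:, None] * per_atom).sum(axis=0)
    return sq / sq.max()
```

Calling this with k = 2, 3, 6 and 20 gives the four profiles compared for each molecule. Regions where the k = 2 profile departs from the others are those where two modes are insufficient.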

Figure 3.3: Normalized square fluctuations of the backbones of: A) ALK353, B) ALK507, and C) 5ALK depicting patterns using 2, 3, 6 or 20 principal component modes. Arrows point to regions of marked disparity between fluctuations based on 2 or more modes. Heat map comparisons of the top 6 principal component modes are shown for: D) ALK353-ALK507, E) 5ALK-ALK353, and F) 5ALK-ALK507.

Based on this, structural similarities between WT and allysine-modified tropoelastin were examined by considering the overlap of the top 6 PCA modes between the different ensembles. Only a mild correlation between WT and singly-modified tropoelastin was noted (Figure 3.4, A-C). Principal components 1, 3 and 5 of WT correlated with principal components 1, 3 and 1 of ALK353, respectively (34-45%) (Figure 3.4, A). Principal components 1 and 2 of ALK507 correlated with principal components 2 and 3 of WT, respectively (29-40%) (Figure 3.4, B). In contrast, principal components 3 and 5 of 5ALK correlated only weakly (> 22%) with principal components 2 and 6 of WT (Figure 3.4, C). As the PCA modes relate to the overall architecture of a molecule, this indicated a potential shift away from the canonical structure. Comparisons between the allysine-modified molecules also revealed low similarities (< 45%) between the principal components of the ensembles. This indicated that the locations of the allysine modifications led to different structural consequences (Figure 3.3, D-F).
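These mode comparisons rely on an overlap measure between principal component vectors. A common definition, sketched below, is the absolute value of the dot product between unit-normalised mode vectors of two ensembles, expressed here as a percentage. It requires that both ensembles contain the same atoms in the same order and share a common reference frame. It is an assumption that this is exactly the measure behind the heat maps in Figures 3.3 and 3.4.

```python
import numpy as np


def mode_overlap_matrix(modes_a, modes_b, k=6):
    """Absolute overlaps (as percentages) between the top-k modes of two ensembles.

    modes_a, modes_b: (n_modes, 3 * n_atoms) mode vectors as rows, for the same
    atoms in the same order and in a common reference frame.
    Entry [i, j] compares mode i of ensemble A with mode j of ensemble B.
    """
    a = np.asarray(modes_a, dtype=float)[:k]
    b = np.asarray(modes_b, dtype=float)[:k]
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    # The sign of a mode vector is arbitrary, so only the magnitude is meaningful
    return 100.0 * np.abs(a @ b.T)
```

A value near 100 means two modes describe essentially the same collective motion. Values in the range reported above (roughly 20-45%) indicate only partial correspondence.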

Figure 3.4: Heatmap comparisons of the top 6 principal component modes for WT with A) ALK353, B) ALK507, and C) 5ALK. Normal mode analysis images that combine the 6 most accessible modes of the most representative structures for: D) ALK353, E) ALK507, and F) 5ALK. Directionality and magnitude of the modes are depicted in orange. The gradient bar depicts mobility, where red corresponds to the most mobile regions. Black arrows indicate domains that act as hinges.

Consistent with this, the addition of allysines shifted the structural ensemble away from the WT structure (Figure 3.5, A-C). The extent to which average structures departed from WT depended on the location and number of these modifications. For example, ALK353 was globular and displayed a C-terminal foot region that pointed down from the protein's center (Figure 3.5, A). ALK507, in contrast, was slightly more compact along its vertical axis and displayed a preference for a C-terminus that was raised toward the center of the protein (Figure 3.5, B). Relative to WT, 5ALK revealed a compacted C-terminus and an extended molecular body (Figure 3.5, C).


Figure 3.5: Representative structures from the most and least sampled k-clusters for A) ALK353, B) ALK507, and C) 5ALK.


To explore how these changes in global molecular shape affected the motions intrinsically accessible to the molecule, NMA was employed using anisotropic network models (ANMs). ANMs are useful for explaining global molecular motion because they rely solely on the architecture of the molecule rather than on localized secondary structure. They also capture the motions most accessible to WT, including a twist in the N-terminus together with a scissors motion in the C-terminus [61,62]. NMA was used to describe representative solution structures from the most populated k-cluster of each ensemble. On combining the lowest 6 normal modes of movement, the scissors-twist motion was observed in ALK507 but not in ALK353 or 5ALK (Figure 3.4, D-F). I propose that the scissors-twist motion relies on the C-terminus adopting a configuration with two protruding feet, as seen in WT and ALK507.
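The ANM construction can be sketched in NumPy as follows. Each pair of nodes (typically Cα atoms) within a cutoff distance is joined by a spring of uniform force constant γ. This gives a 3N × 3N Hessian whose super-element for a connected pair i, j is −γ d dᵀ / |d|², where d is the vector between the two nodes, and whose diagonal blocks balance the off-diagonal ones. Its six zero eigenvalues correspond to rigid-body translation and rotation, so the lowest non-zero modes are the slowest internal motions. Each mode's contribution to mobility is weighted by the inverse of its eigenvalue. The 15 Å cutoff and γ = 1 are assumed values for illustration. This is not ProDy's code, and the parameters used for Figure 3.4 may differ.

```python
import numpy as np


def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """3N x 3N anisotropic network model Hessian for (N, 3) node coordinates."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            dist2 = d @ d
            if dist2 > cutoff ** 2:
                continue  # no spring beyond the cutoff
            block = -gamma * np.outer(d, d) / dist2
            si, sj = slice(3 * i, 3 * i + 3), slice(3 * j, 3 * j + 3)
            hessian[si, sj] = block
            hessian[sj, si] = block
            hessian[si, si] -= block  # diagonal blocks balance the off-diagonal ones
            hessian[sj, sj] -= block
    return hessian


def anm_modes(coords, n_modes=6, cutoff=15.0, gamma=1.0, zero_tol=1e-8):
    """Lowest n_modes non-trivial eigenvalues and eigenvectors (as columns)."""
    evals, evecs = np.linalg.eigh(anm_hessian(coords, cutoff, gamma))  # ascending order
    keep = evals > zero_tol * max(1.0, evals[-1])  # drop the rigid-body (zero) modes
    return evals[keep][:n_modes], evecs[:, keep][:, :n_modes]


def combined_square_fluctuations(evals, evecs):
    """Per-node mobility from several modes, each weighted by 1 / eigenvalue."""
    n = evecs.shape[0] // 3
    per_node = (evecs ** 2).reshape(n, 3, -1).sum(axis=1)  # (n_nodes, n_modes)
    return (per_node / evals).sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.normal(scale=10.0, size=(60, 3))  # placeholder nodes, not a real structure
    vals, vecs = anm_modes(coords)
    print(np.round(combined_square_fluctuations(vals, vecs)[:5], 3))
```

Combining the six lowest non-trivial modes in this way yields a per-residue mobility profile of the kind colour-coded in Figure 3.4 (D-F). Hinges appear as low-mobility nodes flanked by mobile regions.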

ALK353 displayed N- and C-terminal flexibility about domain 20, which acted as a hinge, with an additional C-terminal pivot in the plane orthogonal to domain 20 (Figure 3.4, D). The N-terminus of ALK507 demonstrated flexibility about a hinge formed by domain 11, and ALK507 was the only modified molecule that presented a scissors twist in its foot region (Figure 3.4, E). The N- and C-termini of 5ALK moved about domain 19, which acted as a hinge in this molecule (Figure 3.4, F). Given that allysine-containing structures exist soon after LOX modification, this suggests that a combination of these movements contributes to assembly.

3.3.3 Allysines alter the conformational sampling of domains

The fluctuations of the protein backbone in WT and allysine-modified tropoelastin for the top 6 principal component modes were compared. ALK353 and ALK507 departed from WT in three ways: markedly differing regions of high and low mobility (Figure 3.6, A-C) [61]; decreases in the overall magnitude of fluctuation throughout; and dampened mobility in domains 19 and 25, which contain the allysine modifications in ALK353 and ALK507, respectively.

In contrast, the magnitude of fluctuations in the 5ALK backbone increased relative to WT, indicating that the molecule was more flexible (Figure 3.6, C). The mobility pattern within 5ALK was closer to that of WT than to that of either ALK353 or ALK507. Domains that were highly mobile in WT were also mobile in 5ALK, specifically domains 2-5 (residues 1-51), domains 10-19 (residues 133-357) and domains 21-23 (residues 413-445) (Figure 3.6, A-C). As tropoelastin requires multiple allysines prior to forming elastin, 5ALK serves as a model of functionally significant oxidized tropoelastin in elastogenesis. Its similarities in mobility to WT were therefore considered salient and are discussed in turn below.

Domains 2-5 are located at the head of both WT and 5ALK, where they are positioned through domain 6, which participates in salt bridges [36, 61]. The importance of domain 6 in elastogenesis is demonstrated by the markedly altered fiber morphology that results when tropoelastin's sole aspartate is mutated to alanine [36]. This suggests that the role of domain 6 in elastin maturation is to hold domains 2-5 in place, potentially for head-to-tail assembly as previously proposed [59]. The current data are consistent with these findings because domain 6 remained stable relative to its flanking domains (Figure 3.6, A-C).


Figure 3.6: Protein backbone square fluctuations as a combination of the top 6 principal component modes for: A) ALK353, B) ALK507, and C) 5ALK. Lysines and allysines are depicted as red dots.

Domains 10-19 underwent high conformational sampling in both WT and 5ALK, consistent with the proposed contribution of this part of the molecule to entropy-based extensibility [239]. In 5ALK, these regions are mobile relative to the N-terminal half of the molecule.

Domains 21-23 display high fluctuations within WT and 5ALK (Figure 3.6, C). Their flexibility, as seen in previous molecular dynamics studies, has been proposed to facilitate cross-linking [235]. Experimentally, these same domains were identified as cross-linking hot spots in vitro [85,110] and found in cross-links in native elastin [111,235]. The current data help to explain these findings by suggesting that the mobility of domains 21-23 facilitates LOX-mediated modification and subsequent cross-linking.

Also relevant is that domain 36 in 5ALK undergoes high conformational sampling, similarly to WT [61,62] (Figure 3.6, C). Consistent with the observations here, this domain contains a cell-interactive region [121, 139] and has been established as particularly flexible in previous elastic network models and NMA [22, 240]. It has been proposed that the C-terminus plays a role in positioning the molecule during aggregation and, eventually, cross-linking [241]. These roles are in accord with the higher regional mobility seen here.

The proportion of lysines and allysines located in regions of high displacement (80-85%) is similar to that reported for WT (83%) (Figure 3.6, A-C). This is expected for a model in which these residues continue to sample the conformational landscape in order to facilitate further modifications and ensuing cross-links.

3.3.4 Allysines facilitate changes in salt bridges that contribute to structural variance and lead to local secondary structural changes

To explore the mechanisms behind the heightened flexibility and conformational changes caused by allysine modification, the presence of salt bridges was investigated. It is well accepted that tropoelastin's three negatively charged residues (D72, E345 and E414) are involved in maintaining its overall structure [36, 61, 233]. Converting lysine to allysine removes the positively charged ε-amino group, rendering the modified residue incapable of forming salt bridges. Accordingly, changes in salt bridge formation were observed that affected the structural changes identified through PCA and the altered conformational sampling. WT was capable of forming multiple salt bridges through all three negatively charged residues for a substantial proportion of the time sampled (Figure 3.7, A). In contrast, in 5ALK, not only did the salt bridge patterns change, but salt bridge longevity also decreased (Figure 3.7, B). Considering the higher magnitude of protein backbone fluctuation displayed by 5ALK (Figure 3.6, C), it is likely that these conversions from lysine to allysine released tropoelastin from a more stable configuration and accordingly conferred increased mobility. This model is consistent with the earlier observation that no single cluster dominated the structural ensemble of 5ALK (Figure 3.2, F). Fewer salt bridges would lead to a regional freeing of the molecule and increase its ability to locally sample other states.
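Salt bridge presence and longevity, as summarised in the contact maps, can be estimated from trajectory coordinates. The sketch below assumes a commonly used geometric criterion: any side-chain oxygen of the acidic residue within 3.2 Å of a side-chain nitrogen of the basic residue. The criterion used in this study is not stated here. Longevity is represented by the longest run of consecutive bridged frames, which is one of several possible definitions. Allysine lacks the charged ε-amino nitrogen, so modified residues simply drop out of the basic partners considered.

```python
import numpy as np


def salt_bridge_present(acid_o, base_n, cutoff=3.2):
    """Per-frame presence of a salt bridge between one acidic and one basic side chain.

    acid_o: (n_frames, n_o, 3) side-chain oxygen coordinates of the acidic residue.
    base_n: (n_frames, n_n, 3) side-chain nitrogen coordinates of the basic residue.
    A frame counts as bridged if any O-N distance is within the cutoff (in Å).
    """
    acid_o = np.asarray(acid_o, dtype=float)
    base_n = np.asarray(base_n, dtype=float)
    # Pairwise O-N distances for every frame: (n_frames, n_o, n_n)
    d = np.linalg.norm(acid_o[:, :, None, :] - base_n[:, None, :, :], axis=-1)
    return d.min(axis=(1, 2)) <= cutoff


def occupancy_and_longest_run(present):
    """Fraction of frames with the bridge, and its longest uninterrupted run in frames."""
    present = np.asarray(present, dtype=bool)
    longest = run = 0
    for p in present:
        run = run + 1 if p else 0
        longest = max(longest, run)
    return float(present.mean()), longest
```

Applied to every acidic/basic pair (for example D72, E345 and E414 against each lysine), these two numbers give the presence and longevity summarised in contact maps such as Figure 3.7 (A-B).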


Figure 3.7: Salt bridge contact maps for: A) WT and B) 5ALK, where salt bridge presence and longevity are indicated by black bars. The percent transient α-helical content of WT and 5ALK is shown in C) specific domains and D) the entire molecules. E) Total solvent accessible surface area of the hydrophobic domains. Distance maps are shown for lysines and/or allysines in: F) WT and G) 5ALK, where the gradient depicts increasing time spent in close proximity (4.7 Å) to nearby lysines and/or allysines.

The local secondary structural effects of allysines were examined within their respective domains. It has been previously noted that changes in α-helicity within domains tend to predispose them to stiffness and alter the collective motions of the molecule [61]. The α-helical content of domains 19 and 25 exhibited substantial changes (Figure 3.7, C), yet the overall α-helicity of the molecules did not significantly differ (Figure 3.7, D). The lack of change at the global secondary structural level was consistent with a requirement for flexibility in self-assembly [32, 52]. This maintained global secondary structure, considered alongside the differences in overall tropoelastin mobility discussed above, highlighted potential differences between disease-associated mutations [36,61,233] and natural functional modifications.

3.3.5 Hydrophobic solvent accessible surface area decreases in the presence of allysines

The total solvent accessible surface area (SASA) of the hydrophobic domains was calculated for all molecules. A decrease from the previously published SASA of 196.24 nm² for WT [61] to 168.02-171.20 nm² for the modified molecules was observed (Figure 3.7, E). This change in the accessibility of hydrophobic regions is comparable in scale to those observed for two previously modeled tropoelastin mutations, D72A and G685D [61]. The decrease may be explained by fewer salt bridges and increased mobility, which allow hydrophobic regions to bury further inside the modified molecules than in WT. As the exposure of hydrophobic regions is known to drive coacervation, this reduction is likely to bear on the coacervation of allysine-modified tropoelastin.

3.3.6 Distances between residues decrease upon allysine modification

In addition to forming intermolecular cross-links, tropoelastin is also known to form multiple intramolecular cross-links [111, 235]. The positive charges on juxtaposed lysines tend to repel each other, so it is logical that conversion to the neutral allysine reduces the distance between the Cε and Cδ groups of these residues [235]. The current study is consistent with these findings. Comparing WT and 5ALK showed that the presence of allysine increased the proportion of time spent in close proximity (4.7 Å) to neighboring lysines (Figure 3.7, F-G).
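The distance maps summarise the time each pair of lysines and/or allysines spends in close proximity. A minimal sketch of that calculation is shown below. It assumes one reference side-chain atom per residue; the choice of atom (for example the terminal side-chain carbon) is an assumption here. It uses the 4.7 Å threshold from the text.

```python
import numpy as np


def proximity_fraction_map(coords, cutoff=4.7):
    """Fraction of frames each residue pair spends within `cutoff` Å.

    coords: (n_frames, n_res, 3), one reference side-chain atom per lysine/allysine.
    Returns a symmetric (n_res, n_res) matrix with a zero diagonal.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise distances for every frame: (n_frames, n_res, n_res)
    d = np.linalg.norm(coords[:, :, None, :] - coords[:, None, :, :], axis=-1)
    frac = (d <= cutoff).mean(axis=0)
    np.fill_diagonal(frac, 0.0)  # a residue is trivially "close" to itself
    return frac
```

Subtracting the WT map from the 5ALK map, residue pair by residue pair, would highlight where allysine formation increases the time spent in close proximity.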


3.4 Discussion

Allysine formation is an essential step in making elastin from tropoelastin, yet its

molecular effects have not previously been considered. This study is the first to

demonstrate that structural changes arise from allysine modifications. Converting

lysine to allysine alters structural ensembles, changes the mobility and accessibility

of domains, and alters the molecular motions accessible to tropoelastin.

It is well accepted that the structure and functionality of tropoelastin are substantially affected by single point mutations [36, 61, 233]. Although deviations from WT structure are generally linked to disease states [241], here it was demonstrated that naturally occurring modifications are also capable of altering WT structure. I established that structures within the ensemble depart further from the canonical WT shape as modifications accumulate. This departure is of biological relevance because allysine formation is an essential step in elastogenesis, yet the structural consequences of allysines had not been fully explored in this context. Furthermore, this study highlighted the decreasing dominance of a single set of structures as allysine modifications accumulate. These findings are in accord with recent mass spectrometry data on the heterogeneity of elastin cross-linking [111]. I posit that this decreased structural dominance contributes to heterogeneous cross-linking because 5ALK samples a range of structures more evenly.

Tropoelastin's mobility is crucial to its functionality and also plays a significant role in self-association [32, 52]. Here, it was demonstrated that allysine-modified tropoelastin displayed altered mobility relative to WT in key domains. This effect was facilitated by sparse, short-lived salt bridges that resulted in local secondary structural changes and global conformational changes. The high conformational sampling of WT most likely facilitates rapid aggregation and LOX-mediated modification [62]. However, I propose that the altered mobility patterns within ALK353 and ALK507 could serve as a checkpoint required prior to further assembly. Given elastin's extensive cross-linking and its functionality, this checkpoint would limit participation by molecules lacking sufficient allysines. It would thereby reduce the probability of their incorporation into the growing elastin chain, where they would form a weakly cross-linked fiber. This checkpoint model is supported by the known presence of lysines in relatively mobile regions of tropoelastin that are recognized as important in cross-linking [53,111,235,242]. Further support for the checkpoint model arises when considering the time frame of elastin assembly. Tropoelastin molecules cross-link subsequent to aggregation, which occurs after LOX has completed modification and dissociated from tropoelastin. This study benefits from the separation of time scales: tropoelastin structures organize on the order of nanoseconds, whereas coacervation occurs on the order of seconds [243]. Assembly into elastin therefore occurs much later, and at least several orders of magnitude more slowly, than the time scales examined here. This indicates that allysine-containing tropoelastin transitions away from the canonical tropoelastin shape prior to aggregation and cross-linking. However, the contribution of allysines to mobility is likely to change once tropoelastin is cross-linked, owing to the restrictions imposed by the resultant bond. Further molecular dynamics studies could be undertaken to explore the effect of cross-linking on the mobility of allysine-containing tropoelastin.

The current head-to-tail model of elastin assembly is based on the mapping of a

handful of cross-links [53] onto the low resolution structure of WT [59]. ANMs

based on the global architecture of WT have implicated the C-terminal scissors

twist motion as being crucial to head-to-tail assembly [61]. The presence of the scissors twist in ALK507 is consistent with its importance in self-association steps. However, the ANMs of ALK353 and 5ALK displayed a loss of the C-terminal twist. This unexpectedly indicates that the previously unexplored motions of these molecules are also likely to contribute to higher order assembly.

The hydrophobic domains of tropoelastin dominate and drive tropoelastin association [96], and a decrease in SASA is associated with altered coacervation [61]. I propose that the lowered SASA of allysine-containing tropoelastin contributes to the formation of aggregates that LOX can penetrate and further modify. Such aggregates would therefore represent an experimentally unexplored component of higher-order elastin assembly that is testable in vivo.

A limitation of the current study is that, owing to the scale of computing resources required, only one lysine in each of the selected domains was modified to allysine. Prior data indicate that domains may contain more than one modification, which raises the question of what changes would be incurred by modifying a different nearby lysine, or more than one lysine, within a single domain. It is difficult to predict the precise consequences without further modeling. I hypothesize that the modification of two lysines in a single domain, if they participate in salt bridge formation, would affect tropoelastin structure. To test this hypothesis, various combinations of allysines could be incorporated into future molecular dynamics studies.

Taken together, these data reveal that allysines can cause global changes in struc-

ture, domain mobility and overall molecular motions of tropoelastin, and so con-

tribute to irreversible cross-linked aggregates in hierarchical elastin assembly.


Chapter 4

Modelling of tropoelastin nucleation events


4.1 Introduction

The process of self-assembly is crucial for the formation of many higher-order biological structures [244]. In many cases, self-assembly is initiated by nucleation events, whereby the stochastic association of the smallest repeating units of a higher-order structure forms nuclei. Additional molecules subsequently join these nuclei as self-assembly transitions to a growth phase. Elastin self-assembly is thought to proceed along a similar pathway, with tropoelastin's hydrophobic domains being responsible for both the nucleation and growth phases [78,96]. However, owing to the inherent difficulties of structurally analysing tropoelastin, the precise sites that trigger the nucleation and growth phases have not yet been determined. Furthermore, as the hydrophobic domains are highly repetitive and not cross-linked, it is difficult to determine their interactions and thus to understand how they associate.

The head-to-tail model of tropoelastin assembly was proposed on the basis of a combination of the SAXS/SANS structures of recombinant human tropoelastin and the location of a desmosine cross-link found within native porcine elastin [53, 59]. Mapping the approximate locations of the domains participating in this desmosine (10, 19 and 25) onto the SAXS envelope indicated that alignment of the N-terminal head of one tropoelastin molecule with the foot region of a second molecule may allow the bond to form [59]. However, the head-to-tail model has several inconsistencies. Firstly, the SAXS envelope is a low-resolution, non-atomistic structure, and thus the domain sites are not in full agreement with the MD-derived fully atomistic model of tropoelastin [61]. Secondly, the primary sequence of porcine tropoelastin differs from that of its human counterpart [245]. As the tertiary structure of tropoelastin is sensitive to perturbations [37, 61, 233], it is feasible that the structures of porcine and human tropoelastin are sufficiently dissimilar that the domains do not map precisely between the two molecules. Furthermore, the head-to-tail model is based on a single cross-link that has yet to be observed in human elastin [34,111].


To date, the interplay between tropoelastin monomers during coacervation has not been captured by traditional structural methods. A recent coarse-grained MD study using the MARTINI force field examined the early stages of coacervation of forty tropoelastin molecules over up to 10 µs of simulation time [78]. Nucleation events were found to occur via head-to-head, head-to-tail, tail-to-tail and lateral associations, demonstrating that, at least at this early stage, many types of association are possible. This variety was preserved through to nascent fibril formation, where multiple types of association contributed to fibrillar structures. However, the underlying reasons for the variety of conformations throughout early-stage coacervation were not examined, nor was the contribution of head-to-tail structures investigated.

In this chapter, I employ a combination of docking and logistic regression to explore the factors that are important for the head-to-tail formation of tropoelastin dimers. To generate a wide array of dimers, I use domains that have been confirmed to interact within native elastin as well as those identified in synthetic cross-linking studies. I analyse the energy and surface area terms in the context of association type and starting conformation, and build a logistic regression model to identify the features of the data set that are key for predicting the head-to-tail outcome.

4.2 Methods

4.2.1 Selection of tropoelastin conformations

As described in prior studies, the last 1000 frames of tropoelastin equilibrated at 310 K via cMD were grouped through k-means clustering according to their RMSD (Figure 4.1). The most representative structure of each of the three most populated clusters (i.e. the structure closest to the cluster centre by RMSD) was selected for docking (Figure 4.1). These structures were termed TE1, TE2 and TE3, respectively, according to the ranking of their clusters.
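The frame-selection step can be sketched as follows. This is a minimal NumPy illustration, not the workflow used in this thesis: it assumes the frames have already been superposed onto a common reference, so that the Euclidean distance between flattened coordinate vectors equals the RMSD multiplied by √N (N = number of atoms), and it uses a deterministic farthest-first seeding rather than the random starts used by R's `kmeans`. All function names are hypothetical.

```python
import numpy as np


def _assign(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for each row of X."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


def _farthest_first(X, k):
    """Deterministic seeding: start at row 0, then repeatedly add the row
    farthest from every centroid chosen so far."""
    chosen = [0]
    dist = np.linalg.norm(X - X[0], axis=1)
    for _ in range(1, k):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return X[chosen].astype(float)


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = _farthest_first(X, k)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(X, centroids), centroids


def representative_frames(coords, k, top=3):
    """coords: (n_frames, n_atoms, 3) array, already superposed on a reference.

    Returns, largest cluster first, the index of the frame with the lowest
    RMSD to the centroid of each of the `top` most populated clusters.
    """
    n_frames, n_atoms, _ = coords.shape
    X = coords.reshape(n_frames, -1)
    labels, centroids = kmeans(X, k)
    sizes = np.bincount(labels, minlength=k)
    reps = []
    for j in np.argsort(sizes)[::-1][:top]:
        members = np.flatnonzero(labels == j)
        # RMSD to the centroid = Euclidean distance / sqrt(number of atoms)
        rmsd = np.linalg.norm(X[members] - centroids[j], axis=1) / np.sqrt(n_atoms)
        reps.append(int(members[rmsd.argmin()]))
    return reps
```

For a real trajectory one would typically cluster a reduced atom set (for example, Cα atoms), since broadcasting over full-atom coordinates for 1000 frames is memory-intensive.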

4.2.2 Protein-protein docking

Tropoelastin dimers were generated through protein-protein docking. To drive the docking, interaction sites reported in a variety of studies, including both ambiguous and unambiguous interactions, were used. The majority of studies examining elastogenesis focus on lysine-lysine interactions, as these can be pinpointed as cross-links in native elastin and can be generated synthetically during coacervation of recombinant tropoelastin. Examples of unambiguous interactions include studies utilising BS3-mediated cross-linking of tropoelastin during self-assembly [110] and mass spectrometry analysis of native elastin [34, 53, 111] that unequivocally identified cross-linking sites. Multiple studies note ambiguous interactions, particularly those examining fragments of native elastin [34, 111], owing to the repetitive nature of tropoelastin's primary sequence. For example, the GVKPG sequence of domain 8 is noted to be cross-linked to a KF peptide sequence, which corresponds to any of domains 17, 19, 27 or 31 [111], making it very difficult to determine the specificity of cross-linking within native elastin. Studies were selected where the lysine within at least one domain was unequivocally identified, and all lysines from the other candidate domains were included as potential interaction partners. A full list of the studies and residues used to generate the dimers can be found in Appendix 1. The HADDOCK 2.2 webserver was selected for protein-protein docking [222], as HADDOCK is capable of dealing with ambiguous sites of interaction, as previously detailed in Chapter 2.
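HADDOCK handles ambiguity through ambiguous interaction restraints (AIRs), in which the distances between a residue and a set of candidate partner residues are combined into a single r⁻⁶-summed effective distance, so that the restraint is satisfied if any one candidate is close enough. The Python sketch below illustrates that idea only; the 2 Å upper bound is the value commonly used for HADDOCK AIRs and is treated here as an assumption, and the candidate distances in the example are hypothetical, standing in for a domain 8 lysine and candidate lysines in domains 17, 19, 27 and 31.

```python
# Minimal sketch of a HADDOCK-style ambiguous restraint (assumed form):
# d_eff = (sum_i d_i^-6)^(-1/6), penalised only above an upper bound.


def effective_distance(distances):
    """Combine candidate atom-atom distances (Å) into one effective distance.

    The r^-6 sum is dominated by the shortest distance, so one close
    candidate is enough to satisfy the restraint.
    """
    if not distances:
        raise ValueError("need at least one candidate distance")
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def restraint_violation(distances, upper=2.0):
    """How far (Å) the effective distance exceeds the upper bound; 0 if satisfied."""
    return max(0.0, effective_distance(distances) - upper)


# Hypothetical distances (Å) from a domain 8 lysine to candidate lysines in
# domains 17, 19, 27 and 31 in one docked pose.
candidates = [14.2, 1.8, 22.5, 30.1]
print(round(effective_distance(candidates), 3))  # close to 1.8
print(restraint_violation(candidates))           # 0.0: the restraint is satisfied
```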

A total of 2000 structures were sampled in the initial docking phase, and the 500 most energetically favourable structures were allowed to progress to the water refinement stage and were included in subsequent analyses. The data considered included structural data in the form of Cartesian coordinates, the energy of the total system as well as its components (such as van der Waals and electrostatic energy), and buried surface area. A complete list of the variables used in this chapter can be found in Appendix 2. A total of 19,500 dimers were generated for analysis.

4.2.3 Preparation of structural data

As previously mentioned, HADDOCK generates structural data in the form of PDB files containing the Cartesian coordinates of the atoms of each dimer. These coordinates were used to calculate the centres of mass of the head (residues 1 – 180), middle (residues 330 – 420) and tail (residues 600 – 698) regions of each tropoelastin molecule (Figure 4.1, B). The three centres of mass per molecule were then used to calculate the head-head, head-middle, head-tail, middle-middle and tail-tail Euclidean distances across all dimers (Figure 4.1, B).
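A minimal Python sketch of this distance calculation is given below. It assumes atoms have already been read from the HADDOCK PDB files into `(residue number, element, x, y, z)` tuples (parsing is not shown) and uses standard atomic masses. It returns all nine cross-molecule region pairs; which of these correspond to the five distances named above (for example, whether "head-middle" is taken from molecule A's head to molecule B's middle) is an assumption left to the caller. Function and constant names are hypothetical.

```python
import math

# Residue ranges (inclusive) from the text; standard atomic masses in daltons.
REGIONS = {"head": (1, 180), "middle": (330, 420), "tail": (600, 698)}
MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def centre_of_mass(atoms, first, last):
    """Mass-weighted centre of atoms whose residue number lies in [first, last].

    atoms: iterable of (resseq, element, x, y, z) tuples for one molecule.
    """
    total = sx = sy = sz = 0.0
    for resseq, element, x, y, z in atoms:
        if first <= resseq <= last:
            m = MASSES[element]
            total += m
            sx += m * x
            sy += m * y
            sz += m * z
    if total == 0.0:
        raise ValueError(f"no atoms found in residues {first}-{last}")
    return (sx / total, sy / total, sz / total)


def region_distances(mol_a, mol_b):
    """Euclidean distance between each region of molecule A and each region of B."""
    com_a = {name: centre_of_mass(mol_a, *span) for name, span in REGIONS.items()}
    com_b = {name: centre_of_mass(mol_b, *span) for name, span in REGIONS.items()}
    return {(ra, rb): math.dist(com_a[ra], com_b[rb])
            for ra in REGIONS for rb in REGIONS}
```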

4.2.4 Determination of head-to-tail association

Large data sets, such as the one in this study, present a challenge for annotation. As HADDOCK, the docking software used here, cannot identify the manner of protein-protein association, each dimer had to be annotated outside the docking program. A semi-automated annotation process was employed, utilising a combination of principal component analysis (PCA) and k-means clustering to determine the subpopulation of dimers that could be classified as associating in a head-to-tail manner. Only the Euclidean distances between the three centres of mass were included in the PCA at this stage, as these were hypothesised to be the most fundamental information required to cluster structures according to their type of interaction. The k-means analysis was conducted using various values of k to determine the optimal number of clusters that would most accurately discretise head-to-tail associations along the PC axes. The associations were validated by manual inspection of the dimers within each cluster (Figure 4.2, A).
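The PCA step on the distance matrix can be sketched with NumPy's singular value decomposition, as below. This is an illustration rather than the R code used here: standardising each distance to unit variance is an assumption (the text does not state whether the distances were scaled), and `pca` is a hypothetical helper. The resulting PC scores are what would then be passed to k-means with different values of k.

```python
import numpy as np


def pca(X, n_components=3):
    """PCA of an (n_dimers, n_distances) matrix via SVD.

    Columns are centred and scaled to unit variance (an assumption). Returns
    the PC scores, the fraction of variance explained by each PC, and the
    loadings (one row per PC).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Z @ Vt[:n_components].T
    return scores, explained, Vt[:n_components]
```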

4.2.5 Assembly of docking data

The majority of scripting was carried out in R. A full list of the packages used for analyses can be found in Appendix 2. Global and domain-level solvent accessible surface area (SASA) and hydrophobic SASA (H-SASA) were calculated using VMD. Case statements were written to examine the effects of flanking amino acid residues and cross-linking domain type (i.e. KA or KP).
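In R such case statements are typically written with `dplyr::case_when`; equivalent logic is sketched in Python below with a chain of `if` tests. The classification rule is a deliberately simplified assumption (lysine followed by proline is labelled KP, lysine followed by alanine is labelled KA), and the example sequences are made up; neither is taken from this thesis.

```python
def flanks(seq, pos, width=2):
    """Residues either side of the residue at 0-based index `pos`."""
    return seq[max(0, pos - width):pos], seq[pos + 1:pos + 1 + width]


def crosslink_domain_type(seq, pos):
    """Simplified, assumed rule for labelling the lysine at index `pos`:
    K followed by P -> "KP"; K followed by A -> "KA"; anything else -> "other".
    """
    if seq[pos] != "K":
        raise ValueError(f"residue {pos} is {seq[pos]!r}, not lysine")
    _, right = flanks(seq, pos, width=1)
    if right == "P":
        return "KP"
    if right == "A":
        return "KA"
    return "other"


# Made-up sequences for illustration.
for seq, pos in [("GAKPG", 2), ("AAAKAAA", 3), ("GGKFG", 2)]:
    print(seq, flanks(seq, pos), crosslink_domain_type(seq, pos))
```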

4.2.6 Correlation

All numeric variables were examined with Spearman’s correlation prior to con-

ducting machine learning. Spearman’s correlation was selected over Pearson’s

correlation, as many variables examined here were ordinal rather than continu-

ous.
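Spearman's coefficient is Pearson's correlation computed on ranks, with tied values assigned their average rank. The pure-Python sketch below illustrates this; in R the same quantity is returned by `cor(x, y, method = "spearman")`. The helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relationship still gives rho = 1.
print(spearman([1, 2, 3, 4], [1, 8, 27, 64]))  # 1.0
```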

4.2.7 Machine learning

The caret package [246] in R was used to carry out the majority of the machine learning. The caret package provides a unified interface to a large library of machine learning algorithms, from which logistic regression, boosted logistic regression, random forest, k-nearest neighbours and Naïve Bayes classifiers were selected for model comparison. The recipes package was used to process the dimer data into a form suitable for caret. Pre-processing steps included imputation of missing values, centring all numeric variables on their means, and scaling them. All categorical variables were one-hot encoded, as this assists with decreasing multicollinearity.
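The pre-processing steps can be sketched in plain Python as below. This mirrors the intent of recipes steps such as `step_impute_mean`, `step_center`, `step_scale` and `step_dummy`, but it is an illustration only: mean imputation is an assumed choice, the helper name is hypothetical, and in a real workflow the means and standard deviations would be estimated on the training set and then applied unchanged to the test set.

```python
import statistics


def preprocess(rows, numeric, categorical):
    """Mean-impute, centre and scale numeric columns; one-hot encode categoricals.

    rows: list of dicts, with None marking a missing numeric value.
    """
    stats = {}
    for col in numeric:
        observed = [r[col] for r in rows if r[col] is not None]
        stats[col] = (statistics.fmean(observed), statistics.stdev(observed))
    levels = {col: sorted({r[col] for r in rows}) for col in categorical}

    out = []
    for r in rows:
        new = {}
        for col in numeric:
            mean, sd = stats[col]
            value = mean if r[col] is None else r[col]   # impute with the mean
            new[col] = (value - mean) / sd                # centre, then scale
        for col in categorical:
            for level in levels[col]:                     # one indicator per level
                new[f"{col}_{level}"] = int(r[col] == level)
        out.append(new)
    return out
```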

Machine learning was carried out with an 80:20 train/test split across all dimer structures. This was intended to ensure that the training set contained enough data to capture the majority of the variance in the data, whilst the test set contained enough data to measure the validity of the trained model. The split was stratified to ensure that the distribution of the outcome (head-to-tail association) was similar within the training and test sets. Five repeats of 10-fold cross-validation were carried out to train the model.
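A stratified 80:20 split followed by repeated stratified k-fold cross-validation can be sketched as below; in R, caret's `createDataPartition` and `trainControl(method = "repeatedcv", number = 10, repeats = 5)` provide comparable functionality. The Python helpers and their seeding are hypothetical. Rounding the per-class cut point keeps each class's proportion as close as possible to 80:20, which matters for a minority outcome such as the roughly 10% head-to-tail fraction reported in Section 4.3.2.

```python
import random
from collections import defaultdict


def _by_class(labels):
    """Map each class label to the list of indices carrying it."""
    groups = defaultdict(list)
    for i, y in enumerate(labels):
        groups[y].append(i)
    return groups


def stratified_split(labels, train_frac=0.8, seed=0):
    """Split indices so each class keeps (roughly) the same proportion in train and test."""
    rng = random.Random(seed)
    train, test = [], []
    for idx in _by_class(labels).values():
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def repeated_stratified_kfold(labels, k=10, repeats=5, seed=0):
    """Yield (fit_idx, holdout_idx) pairs for `repeats` rounds of stratified k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        offset = 0
        for idx in _by_class(labels).values():
            rng.shuffle(idx)
            for j, i in enumerate(idx):
                folds[(j + offset) % k].append(i)   # deal each class across folds
            offset += len(idx)
        for held in folds:
            held_set = set(held)
            yield [i for i in range(len(labels)) if i not in held_set], sorted(held)
```

In practice the cross-validation folds would be drawn from the training indices only, leaving the 20% test set untouched until final evaluation.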

4.3 Results

4.3.1 Semi-automated annotation of head-to-tail association

The most representative tropoelastin conformations from the three most populated structural clusters of wild-type tropoelastin's REMD ensemble were extracted for docking. These were labelled TE1, TE2 and TE3 after their respective clusters (Figure 4.1, A). TE1 was most similar to the global shape of tropoelastin previously observed in SAXS/SANS studies [59]. The N-terminus of TE2 was displaced relative to TE1 and folded towards the spur region of the molecule. The N-terminal displacement in TE3 was milder, and TE3 appeared as a longer molecule owing to its extended C-terminal region. The three most populated structural clusters cumulatively accounted for over 20% of the structures in the ensemble, and thus likely represent the dominant states occupied by tropoelastin.

Figure 4.1: Conformations and regions of tropoelastin used in this study. A) Tropoelastin conformations, TE1, TE2 and TE3, from replica exchange molecular dynamics at 310 K [61] selected for docking analysis (top panel). The directions of the N and C termini are labelled. The conformations were derived from the three most populated clusters of the tropoelastin ensemble, the distribution of which is depicted below the conformations (bottom panel). B) The regions of tropoelastin selected for distance calculations for dimer studies, with the head (red), middle (pink) and tail (blue) regions highlighted on TE1.

A total of 19,500 dimers arising from the explicit solvent stage of docking were considered for this study. These required annotation before logistic regression analysis, as logistic regression is a supervised type of machine learning that requires labelled data. Annotation was carried out using a combination of k-means clustering and PCA of the distances between the head, middle and tail regions of the dimers (Figure 4.1, B), with manual inspection for final validation. Only distance measurements were used as input for k-means clustering, as the energy terms would later be used for machine learning, and their inclusion at this stage could have biased subsequent analyses. The initial number of clusters was set to k = 4, based on prior coarse-grained tropoelastin coacervation studies that identified four broad types of association: head-to-head, head-to-tail, tail-to-tail and lateral [78].

Figure 4.2: Clustering and annotation of the dimers generated through tropoelastin-tropoelastin docking. A) PCA of the dimers derived from distance measurements between the head, middle and tail regions of each molecule. The distribution of structures according to k-means clustering within PC1 and PC2 is shown, and the colours represent individual clusters arising from k = 4. The surrounding dimers flanking the PCA arose from the same clusters, and can be described as lateral inverted (top left), head-to-tail (bottom left), tail-to-tail (top right) and tail-to-middle (bottom right). B) The amount of variance explained by each principal component. Examination of optimal k-cluster number via C) total within sum of squares, D) the silhouette method, and E) gap statistics.


PCA of the distances was conducted to visualise the projection of the k-means clusters in principal component (PC) space (Figure 4.2, A). The arch shape of the PC1-PC2 plot suggested that the distance measurements of the dimers were non-linearly related to one another, most likely as a consequence of the subset of docking sites that were selected. PCs 1, 2 and 3 accounted for 48%, 25% and 21% of the variance of the data set, respectively, and cumulatively described approximately 94% of the variance of the measurements; they were therefore used for subsequent analyses (Figure 4.2, B). Manual inspection of the clusters arising from k = 4 and their corresponding structures revealed mixed populations of structures, with a single cluster containing inverted lateral, head-to-tail, tail-to-tail and tail-to-middle dimers (Figure 4.2, A). This indicated that k = 4 was insufficient to correctly discretise the dimers by association.

The optimal value of k was assessed using several statistical methods, including the total within-cluster sum of squares, silhouette and gap statistic methods. The total within-cluster sum of squares analysis generated an elbow plot, in which the optimal number of clusters is indicated by the elbow (or “hinge”) of the curve. For the current data, however, the elbow was unclear, appearing to lie between 3 and 5 (Figure 4.2, C). The silhouette and gap methods recommended k = 6 and k = 5, respectively (Figure 4.2, D – E); however, inspection of the resultant clusters still revealed mixed populations of dimers.
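The two simpler criteria can be sketched directly from their definitions, given dimer coordinates in PC space and a vector of cluster labels. In the NumPy sketch below, the total within-cluster sum of squares is the quantity plotted in an elbow curve, and the mean silhouette width averages s(i) = (b − a)/max(a, b), where a is the mean distance from point i to the other members of its own cluster and b is the smallest mean distance from i to any other cluster. The gap statistic is omitted. Function names are hypothetical, at least two clusters are assumed, and because the silhouette uses the full pairwise distance matrix (O(n²) memory), a random subsample would be needed for all 19,500 dimers.

```python
import numpy as np


def total_within_ss(X, labels):
    """Sum of squared distances of each point to its own cluster centroid."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    return float(sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                     for c in np.unique(labels)))


def mean_silhouette(X, labels):
    """Average silhouette width over all points (singleton clusters score 0)."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue
        a = D[i, own].sum() / (own.sum() - 1)   # D[i, i] = 0, so self is excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return float(s.mean())
```

Both functions take labels from any clustering run, so they can be evaluated for each candidate k and plotted side by side.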

By incrementally increasing the value of k and manually checking the resultant clusters, it was found that k = 12 yielded discrete populations of dimers, each of which could be classified into a single type of association (Figure 4.3, A). The dimers that arose could be broadly classified as head-to-head, head-to-middle, head-to-tail, inverted lateral, tail-to-middle or tail-to-tail.

Figure 4.3: Final clustering and annotation of dimers. A) Clusters from k-means analysis overlaid onto PCA of distance measurements, where k = 12. B) Assignment of dimer association and consolidation of clusters by association type. The directionality of the distance measurements' contributions to the PCA is displayed. The data are displayed in PC1-PC2 (left) and PC2-PC3 (right) space.

Multiple clusters were assigned the same type of interaction; differences in their overall conformation most likely resulted in their separation during clustering (Figure 4.4). For example, clusters 7 and 8 were both classed as head-to-tail; however, cluster 7 consisted of structures with more contact between the N- and C-termini than those of cluster 8. Clusters that fell within the same broad classification were grouped for subsequent analyses and then projected onto PC1-PC2 and PC2-PC3 space to verify that they formed discrete clusters (Figure 4.3, B). After consolidating the clusters, it was evident that tail-to-middle clusters occupied the largest area of PC1-PC2 and PC2-PC3 space, which suggested a large amount of structural variance. In comparison, head-to-tail clusters occupied a small portion of PC space, indicating less variety within the distance measurements of these structures. The dominant distance measures responsible for PC1 and PC2 were examined to identify which measurements were the most important for k-means clustering. PC1 was primarily described by the tail-to-tail and head-to-head distances, whilst the largest contribution to PC2 arose from the head-to-middle distance. Examination of the association clusters within PC space showed that head-to-tail dimers were similar to inverted lateral dimers in PC1-PC2 space, which can be intuitively explained by a requirement shared by both groups: the head of at least one tropoelastin molecule must be in contact with the tail of the other.

Of note is that the head-to-tail structures here do not strongly resemble those previously proposed [59]. However, some structures classed as head-to-middle appeared to resemble the head-to-tail model more closely, as the head of one tropoelastin molecule neatly fit the groove within the middle of the other (Figure 4.4). Furthermore, the tail-to-middle structures may also represent an altered version of the head-to-tail model, albeit with one tropoelastin molecule almost orthogonal to the other.


Figure 4.4: Structures of dimers arising from k-means clustering. Examples of dimers from each k-cluster are shown, with their assigned association type and their N and C termini labelled.


4.3.2 Overview of dimer associations by starting conformation and study type

The majority of structures generated via HADDOCK were categorised as head-to-middle (29.08 %) or tail-to-middle (26.35 %). The next most common dimer associations were head-to-head (17.29 %) and head-to-tail (10.51 %), followed by inverted lateral associations (9.96 %). Tail-to-tail associations were the least common type of interaction, comprising only 6.81 % of the data (Figure 4.5, A).

Figure 4.5: Dimer association by initial tropoelastin conformation. A) The number of structures per association type when in the context of the initial tropoelastin conformation used for docking. B) PCA of the distance measurements shown by tropoelastin conformation.

The frequency of each association type was further dissected by the three starting conformations of tropoelastin, TE1, TE2 and TE3, to assess their preferences for particular types of dimer association (Figure 4.5, A). The most frequently observed association mode for TE1 and TE2 was head-to-middle, whilst TE3 predominantly formed tail-to-middle associations. Each tropoelastin conformation produced approximately similar numbers of head-to-tail and head-to-head structures. However, the numbers of head-to-middle, inverted lateral and tail-to-tail dimers differed greatly between the three conformations. When the dimers arising from the three conformations were projected onto the PC1-PC2 space of the distance measurements described previously, the three conformations showed some overlap, with higher densities of dimers occurring in the regions expected from their preferred association types.

Taken together, this indicates a clear preference in association type on the basis of the starting conformation. As TE1 represents the most prevalent structure in solution at 310 K, it is possible that the majority of initial dimer conformations during the early stage of elastin assembly associate in a head-to-tail manner. Nevertheless, it is also evident that a mix of dimer types is present, which is similar to previous findings [78].

4.3.3 Overview of dimer associations by native or synthetic origin

The cross-links used to drive dimer formation were derived from either native or synthetic elastin studies. To understand whether these two cross-link sources could affect the type of dimers formed, the distribution of structures stratified by cross-link source was next examined. Approximately the same numbers of tail-to-tail and inverted lateral dimers arose from native and synthetic studies; however, differences existed between all other groups (Figure 4.6, A). In particular, the number of head-to-tail structures resulting from native studies was small (< 100), suggesting that restraints derived from synthetic studies may be predisposed to forming head-to-tail interactions. A similar trend was observed for tail-to-middle dimers, whilst head-to-middle associations occurred predominantly when residues from native studies were supplied. No dimers classified as head-to-head were observed from synthetic studies.

Examination of study type by PCA revealed areas of overlap corresponding to the inverted lateral, tail-to-middle, head-to-middle and head-to-tail regions observed previously (Figure 4.3, B; Figure 4.6, B). Dimers from synthetic studies in the lower right quadrant of the PCA largely overlapped with the tail-to-middle region described previously, whilst the lower left corner shows that head-to-middle dimers mostly arose from native studies, indicating an overall disparity between the types of dimers generated by the two study types.

Figure 4.6: Dimer association by native or synthetic elastin study. A) The number of structures per study type when in the context of the initial tropoelastin conformation used for docking. B) PCA of the distance measurements shown by study type.

4.3.4 Structures arising from the canonical cross-link

To explore the head-to-tail model of elastin assembly more deeply, dimers arising from interactions between domains 10, 19 and 25 were specifically examined, as these domains form the basis of the model. Unexpectedly, TE1 and TE3 did not form head-to-tail dimers through these domains, and only one structure, formed by TE2 and involving domains 10 and 19, could be classed as such (Figure 4.7). The dominant type of association was head-to-middle (59.75 %), followed by tail-to-middle (15.18 %), inverted lateral (12.51 %) and tail-to-tail (12.47 %) interactions. Inspection of the locations of domains 10, 19 and 25 revealed the reason. Given the geometry of tropoelastin, domains 10, 19 and 25 would be unlikely to participate in head-to-tail interactions because of the positioning of their lysines; however, it is conceivable that subsequent interactions after nucleation events may allow such interactions. Interestingly, none of the dimers examined here supported the hypothesis that two tropoelastin molecules alone could be responsible for the trifunctional cross-link amongst these regions, although it is possible that during coacervation some molecules may shift to accommodate this bond. Furthermore, the exposure of the non-interacting lysines may position them for further association, thereby allowing propagation of the growing nascent elastin chain.

Figure 4.7: Dimer associations arising from the interactions between domains 10, 19 and 25. Association type is displayed with respect to initial tropoelastin conformation. The dimers represent interactions between domains 19 and 25 from TE1 (left), 10 and 19 from TE2 (middle), and 10 and 25 from TE3 (right). The N and C termini of each tropoelastin molecule are indicated. The lysines within domains 10 (purple), 19 (cyan) and 25 (yellow) are shown.

4.3.5 Electrostatic interactions of dimers

Electrostatic interactions play a key role in protein-protein interactions and association rates. Owing to their long range, electrostatic interactions are recognised as a crucial factor in early-stage protein-protein association, especially before two molecules come into contact, when the contribution of other forces is near zero. As this study models tropoelastin interactions at the dimer stage, it can be assumed that, despite the close proximity of the tropoelastin molecules, the distribution of positively charged lysines strongly influences the dimerisation process; indeed, tropoelastin contains 35 lysines that are positively charged at physiological pH.
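To illustrate why the spatial distribution of these lysines matters, the pairwise Coulomb energy between point charges can be computed as below. This is a deliberately simplified sketch: HADDOCK's electrostatic term uses force-field partial charges with its own dielectric treatment and cut-offs, whereas this example places whole +1 e charges in a uniform dielectric. The conversion factor of roughly 332 kcal·Å/(mol·e²) and the charge positions are illustrative assumptions.

```python
import math

COULOMB_KCAL = 332.06  # approx. kcal·Å/(mol·e²); force fields use slightly different values


def coulomb_energy(charges, coords, dielectric=1.0):
    """Sum of pairwise Coulomb energies (kcal/mol) for charges in e and coordinates in Å."""
    energy = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            r = math.dist(coords[i], coords[j])
            energy += COULOMB_KCAL * charges[i] * charges[j] / (dielectric * r)
    return energy


# Two hypothetical lysine charges 10 Å apart: a positive energy means repulsion.
print(coulomb_energy([1.0, 1.0], [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]))
# A lysine facing a hypothetical negative charge gives a favourable (negative) energy.
print(coulomb_energy([1.0, -1.0], [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]))
```

Because every lysine carries the same sign, regions where several lysines from two molecules meet add up repulsive terms, which is one way to read the "electrostatic clashes" discussed below.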

The electrostatic energy of the head-to-tail structures significantly differed from

head-to-middle, inverted-lateral and tail-to-tail structures, but bore similarities to

head-to-head and tail-to-middle structures (Figure 4.8, A).

Figure 4.8: Electrostatic interactions of tropoelastin dimers. A) Electrostatic energy by the type of dimer interaction. B) Distribution of electrostatic energy by initial tropoelastin configuration. C) Surface potential of the tropoelastin structures used in this study, displaying positively (blue) and negatively (red) charged areas. Wilcoxon's rank sum test with Bonferroni post-hoc corrections are indicated with respect to head-to-tail conformations, where *** indicates p ≤ 0.001.

Curiously, dissecting electrostatic energy by tropoelastin conformation revealed that the dimers formed by TE3 had overall lower energy relative to those formed by TE1 and TE2 (Figure 4.8, B). This could partly be due to the relatively large proportion of inverted lateral structures formed by TE3 in comparison to TE1 and TE2 (Figure 4.5), as these had the lowest electrostatic energy of the dimers (Figure 4.8, A). Inspection of the electrostatic potential of the surface of each initial tropoelastin conformation revealed that TE3 contained fewer hotspots of positive surface potential, which may result in fewer electrostatic clashes and thus lower the electrostatic energy of these dimers.
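The significance markers in Figures 4.8 and 4.9 come from Wilcoxon rank sum tests with Bonferroni correction. A minimal pure-Python sketch of both is given below. It uses the large-sample normal approximation with a continuity correction but omits the tie correction that R's `wilcox.test` applies, so it shows the logic rather than reproducing R's p-values exactly. Function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test via the normal approximation.

    Returns (W, p), where W is the rank sum of x minus n_x(n_x + 1)/2
    (the Mann-Whitney U of x). Continuity-corrected, no tie correction.
    """
    nx, ny = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:nx]) - nx * (nx + 1) / 2
    mu = nx * ny / 2
    sigma = math.sqrt(nx * ny * (nx + ny + 1) / 12)
    z = max(abs(w - mu) - 0.5, 0.0) / sigma
    return w, math.erfc(z / math.sqrt(2))   # erfc(z/sqrt 2) = 2 * (1 - Phi(z))


def bonferroni(pvalues):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

In the setting of Figure 4.8, A, head-to-tail energies would be compared against each of the other five association types, giving five p-values to pass to `bonferroni`.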

4.3.6 Surface area and solvent accessibility of dimers are driven by tropoelastin conformation

Buried surface area (BSA) is an important consideration for protein-protein interactions that comes into play after the initial electrostatic interaction has brought the molecules together. Hydrophobic residues, in particular, promote the burial of surface area because their interactions with water molecules are energetically unfavourable, and they play a key role during elastin assembly by facilitating aggregation [96]. Examination of the total BSA of the dimers revealed that head-to-tail dimers had the lowest BSA of all the dimer types (1224.41 ± 420.87 Å²), whereas tail-to-tail dimers had the highest BSA (1497.96 ± 463.08 Å²) (Figure 4.9, A). When probed in the context of starting tropoelastin conformation, buried surface area appeared broadly similar across all three conformations (Figure 4.9, B). Nevertheless, dimers formed with TE2 had a higher average BSA (1452.19 ± 447.83 Å²) relative to TE1 (1370.25 ± 431.85 Å²) and TE3 (1321.41 ± 421.16 Å²), which indicates a larger area of interaction that may contribute to forming more stable dimers.
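Buried surface area is conventionally obtained from solvent accessible surface areas as SASA(A) + SASA(B) − SASA(AB). The short Python sketch below assumes that convention (some tools instead report half of this value as the interface area per partner), and the SASA values in the example are hypothetical rather than taken from the docking output.

```python
def buried_surface_area(sasa_a, sasa_b, sasa_complex):
    """Surface (Å²) removed from solvent contact when monomers A and B form a dimer."""
    bsa = sasa_a + sasa_b - sasa_complex
    if bsa < 0:
        raise ValueError("dimer SASA exceeds the sum of the monomer SASAs")
    return bsa


def interface_area(sasa_a, sasa_b, sasa_complex):
    """Alternative convention: half the buried area, i.e. the area per partner."""
    return buried_surface_area(sasa_a, sasa_b, sasa_complex) / 2


# Hypothetical SASA values (Å²) for two tropoelastin monomers and their dimer.
print(buried_surface_area(30000.0, 30000.0, 58700.0))  # 1300.0
print(interface_area(30000.0, 30000.0, 58700.0))       # 650.0
```

Knowing which convention a tool uses matters when comparing BSA values such as those above with figures reported elsewhere.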


Figure 4.9: Buried surface area and hydrophobic solvent accessible surface area of the dimers. Buried surface area by A) association type and B) initial tropoelastin configuration, and hydrophobic solvent accessible surface area by C) association type and D) initial tropoelastin configuration. Wilcoxon's rank sum test with Bonferroni post-hoc corrections are indicated with respect to head-to-tail conformations, where *** indicates p ≤ 0.001.

Tropoelastin aggregation and eventual self-assembly occur through the association of hydrophobic domains, and thus the dimers were examined in the context of hydrophobic solvent accessible surface area (SASA). Head-to-tail dimers had the second highest hydrophobic SASA (45094.95 ± 323.03 Å²), exceeded only by head-to-head dimers (45133.83 ± 341.85 Å²) (Figure 4.9, C). This is an intriguing finding, as favourable association of hydrophobic domains is crucial for elastin assembly; because a high hydrophobic SASA indicates that comparatively few hydrophobic residues are buried at the dimer interface, the current findings suggest that assembly may not initially proceed in a head-to-tail manner. However, given that exposed hydrophobic domains are required for continued assembly, the head-to-tail dimers may present a large hydrophobic surface that favours further interactions. Thus, it is difficult to predict with certainty whether head-to-tail dimers are ideal for assembly.

When hydrophobic SASA was examined according to the initial tropoelastin structures, TE2 dimers had a higher mean hydrophobic SASA (45237.74 ± 320.43 Å²) than TE1 (45013.06 ± 294.56 Å²) and TE3 (44944.43 ± 301.05 Å²) dimers (Figure 4.9, D). The low hydrophobic SASA of TE3 dimers indicates that they are favourable for interactions with water, as they have fewer hydrophobic residues exposed to solvent. However, because the hydrophobic effect is a crucial driving force for elastin self-assembly, it is possible that the dimers with lowered hydrophobic SASA formed by TE3 may not aggregate as rapidly as those formed by TE1 and TE2. Indeed, TE2's larger hydrophobic SASA is likely to prime it for further interactions during coacervation and facilitate more rapid aggregation in comparison to TE1 and TE3. When taken together with the larger BSA of the TE2 dimers in comparison to those from TE1 and TE3, it is plausible that TE2 forms the most stable dimers of the three conformations examined and that these dimers are more amenable to further interactions with other tropoelastin molecules.

4.3.7 Correlation of dimer energies and features

The correlation between numeric variables was examined prior to conducting machine learning, as collinearity of features leads to model bias via redundancy. Spearman's correlation was selected because the data presented previously, such as the electrostatic energy, were not normally distributed (Figure 4.8, B).

Figure 4.10: Correlation analysis of the numeric features of the dimer data set. A) Heatmap correlation plot displaying the magnitude and strength of the correlation. Spearman's correlation is shown in the cell at each intersection between the features, with values only being displayed where p ≤ 0.05.

The correlation between the energy terms and solvent accessible surface areas was examined prior to conducting logistic regression, as highly collinear variables do not improve model accuracy and introduce redundancy into the algorithm. None of the variables were highly correlated, with the exception of total energy and electrostatic energy, which had a correlation of +0.95 (Figure 4.10). This was not unexpected given the above analysis of electrostatic energy and its contribution to the total energy of the dimer systems. Total energy was therefore removed before proceeding to logistic regression: as the sum of all energy terms in this analysis, it was likely to be redundant in the context of machine learning. Another intuitively interpretable correlation was the negative correlation between BSA and desolvation energy, as desolvation involves the decoupling of bound proteins, which would decrease BSA. No other features were removed before proceeding to machine learning.
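The feature-pruning logic described above can be sketched as two steps: flag pairs whose absolute correlation exceeds a threshold, then drop the member of each pair that is known to be derived from the others (here, total energy). In the minimal sketch below, the 0.9 threshold, the feature names and the BSA-desolvation coefficient are hypothetical; only the +0.95 total-electrostatic value comes from Figure 4.10.

```python
def flag_collinear(features, rho, threshold=0.9):
    """List feature pairs whose |rho| exceeds `threshold`.

    `rho` maps frozenset({a, b}) to a correlation coefficient; pairs missing
    from the mapping are treated as uncorrelated.
    """
    flagged = []
    for i, a in enumerate(features):
        for b in features[i + 1:]:
            r = rho.get(frozenset((a, b)), 0.0)
            if abs(r) > threshold:
                flagged.append((a, b, r))
    return flagged


def drop_redundant(features, flagged, prefer_drop):
    """From each flagged pair, drop the member listed in `prefer_drop`
    (for example, a total that is the sum of other terms)."""
    to_drop = set()
    for a, b, _ in flagged:
        if a in prefer_drop:
            to_drop.add(a)
        elif b in prefer_drop:
            to_drop.add(b)
    return [f for f in features if f not in to_drop]


features = ["total_energy", "electrostatic", "vdw", "desolvation", "bsa"]
rho = {
    frozenset(("total_energy", "electrostatic")): 0.95,  # value reported in Figure 4.10
    frozenset(("bsa", "desolvation")): -0.40,            # hypothetical illustrative value
}
flagged = flag_collinear(features, rho)
kept = drop_redundant(features, flagged, prefer_drop={"total_energy"})
print(flagged)  # [('total_energy', 'electrostatic', 0.95)]
print(kept)     # ['electrostatic', 'vdw', 'desolvation', 'bsa']
```

Dropping by domain knowledge (a sum is redundant with its dominant term) rather than arbitrarily keeps the more interpretable feature in the model.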


4.3.8 Machine learning model selection using energy and surface features

Machine learning was undertaken to determine whether the energy and surface features of the data set were sufficient to build a model that could predict the head-to-tail outcome. A logistic regression model with elastic net regularisation was constructed first. This model yielded a sensitivity (true positive rate) of 0.28 and a specificity (true negative rate) of 0.97, indicating that it was a poor predictor of the head-to-tail outcome (Figure 4.11, A). This finding was intriguing, as prior statistical inference had indicated discernible differences between head-to-tail dimers and the other types of dimers.
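For reference, the elastic net objective can be written out as below, assuming the glmnet parameterisation (the usual backend when elastic net logistic regression is fitted through caret; the text does not name the backend). Here y_i = 1 for a head-to-tail dimer and 0 otherwise, the mixing parameter α interpolates between ridge (α = 0) and lasso (α = 1) penalties, and λ sets the overall penalty strength. Sensitivity and specificity are the true positive and true negative rates.

```latex
\min_{\beta_0,\,\boldsymbol{\beta}}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\,\eta_i-\log\!\big(1+e^{\eta_i}\big)\Big]
+\lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\boldsymbol{\beta}\rVert_2^2+\alpha\,\lVert\boldsymbol{\beta}\rVert_1\Big],
\qquad \eta_i=\beta_0+\mathbf{x}_i^{\top}\boldsymbol{\beta}

\text{sensitivity}=\frac{TP}{TP+FN},\qquad
\text{specificity}=\frac{TN}{TN+FP}
```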

The first alternative explored was whether other models would fit the data better than logistic regression. To test this, a number of classification algorithms readily available through the caret package of R were applied, including k-nearest neighbours (k-NN), Naïve Bayes, random forest and boosted logistic regression. The sensitivity of all trained models was low (< 0.3). Among these alternative classifiers, k-NN yielded the highest sensitivity (0.17) and the Naïve Bayes classifier the lowest (0.008) (Figure 4.11, A). Therefore, none of these models improved on the predictive capability of the initial logistic regression.


Figure 4.11: Assessment of optimal machine learning model for the energy and surface area features from the dimer data set. A) Comparison of sensitivity and specificity of trained models using regularised logistic regression, naive Bayes, random forest, boosted logistic regression and k-nearest neighbours classifiers. B) Projection of head-to-tail dimers onto the PCA of the data set.

The second possible explanation for the poor sensitivity of these models was that, despite statistically significant differences in features such as electrostatic energy and hydrophobic SASA, the overlap in the energy and surface area features was too great for the models to perform the classification. This was examined in greater detail by projecting head-to-tail structures onto the PC1-PC2 and PC2-PC3 spaces generated from the features input into the models (Figure 4.11, B). Head-to-tail and non-head-to-tail structures were dispersed approximately evenly throughout these spaces, indicating that a model constructed from energy and surface area terms alone was inadequate to conclusively predict head-to-tail associations.
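To make the projection step concrete, the sketch below standardises each feature, forms the covariance matrix, and extracts its leading eigenvectors, whose scores give the PC1-PC2 and PC2-PC3 coordinates. It is a stdlib-only stand-in written for illustration. A production analysis (such as R’s prcomp) would use a singular value decomposition rather than the power iteration used here, and all function names are hypothetical.

```python
import math
import random


def standardise(rows):
    """Z-score each feature (column) so energies and areas share one scale."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)
        sds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def covariance(z):
    """Sample covariance matrix of already-centred rows."""
    n, p = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]


def leading_components(cov, k=3, iters=1000, seed=0):
    """Top-k eigenvectors of a symmetric matrix by power iteration with deflation."""
    rng = random.Random(seed)
    p = len(cov)
    m = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining variance exhausted
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
        comps.append(v)
        # Deflate so the next pass converges to the next-largest component.
        m = [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(p)] for i in range(p)]
    return comps


def project(z, comps):
    """PC scores of each standardised row on the given components."""
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps] for row in z]
```

Given a feature matrix and a parallel list of head-to-tail labels, `project(z, leading_components(covariance(z)))` with `z = standardise(rows)` yields per-dimer scores that can be plotted and coloured by label. Heavy intermixing of the two groups in these coordinates is the pattern described for Figure 4.11, B.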

4.3.9 Logistic regression of whole dimer data set

To improve on the initial model, logistic regression was next conducted on data containing all features other than total energy and the distance measurements used previously in k-means clustering. The best trained model had a sensitivity of 0.77 (6288 out of 8200 head-to-tail dimers were correctly identified) and a specificity of 0.98 (68,257 out of 69,805 non-head-to-tail dimers were correctly identified), with hyperparameters of λ = 1 and α = 0 (Figure 4.12, A and B). This indicated that pure ridge regularisation gave the best fit, despite the option for the model to utilise elastic net regularisation. The performance of the model was assessed on the test set, which yielded an AUC of 0.971, and the test curve overlapped closely with that of the training set (Figure 4.12, C). This indicated that the model was neither overfit nor underfit to the data.
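The reported rates can be checked directly from the confusion-matrix counts, with sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The short Python sketch below performs this arithmetic using the counts given in the text; the function name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """True positive rate and true negative rate from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Counts reported for the best ridge model (Figure 4.12, B).
head_to_tail_total, other_total = 8200, 69805
tp, tn = 6288, 68257
fn = head_to_tail_total - tp  # head-to-tail dimers missed by the model
fp = other_total - tn         # other dimers wrongly called head-to-tail

sens, spec = sensitivity_specificity(tp, fn, tn, fp)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # sensitivity=0.77, specificity=0.98
```

The same counts give an overall accuracy of about 0.96 ((6288 + 68,257) / 78,005). This figure is dominated by the roughly 89% of dimers that are not head-to-tail, which is why sensitivity is the more informative measure for this classification.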


Figure 4.12: Logistic regression model performance of extended dimer data set. A) Optimisation of model sensitivity using regularisation (λ) and mixing (α) hyperparameters between 0 and 1. B) Confusion matrix of best trained model, showing the number of true positive (TP), false positive (FP), true negative (TN) and false negative (FN) outcomes. C) Model performance shown as the area under the curve (AUC) of the true positive rate against the false positive rate. Train and test data sets are indicated in comparison to the performance of a classifier that is equal to that of random chance (AUC = 0.5). D) Variable importance of the features used in the best model.

Variable importance was next examined to understand which features were predominantly used by logistic regression to predict head-to-tail dimers. The five features assigned the highest importance were all domain numbers; for example, domains 8 and 12 were ranked as the two most important features (Figure 4.12, D). An intuitive interpretation is that these domains are located around the head region of tropoelastin and are thus essential for a head-to-tail interaction. Similarly, domains from the lower regions, particularly the spur and foot areas such as 23 and 29, were also ranked highly, as these make up the “tail” component of the interaction. The TE2 conformation was the 7th most important feature, which was logical as the TE2 conformation appeared to form more head-to-tail structures than TE1 and TE3, as previously discussed (Figure 4.5, A). However, its importance value of 10 was considerably smaller than those of the domains ranked above it, indicating that it was not used extensively in the prediction. Desolvation energy and buried surface area ranked 11th and 13th in importance, most likely for reasons similar to those given for the TE2 conformation. Interestingly, whether the domains were of KA or KP type ranked lower in variable importance. KP domains occur predominantly in tropoelastin’s N-terminal region rather than at its C-terminus; however, KA domains are also found at the N-terminus (for example, domain 13). It is therefore possible that logistic regression did not consider the type of domain to be important for this classification. The contributions of domains that were not noted to be involved in head-to-tail interactions, such as 19 and 29, were considered negligible by logistic regression, as were those of additional energy terms such as van der Waals forces.
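Variable importance for a penalised logistic model is commonly derived from the absolute values of the fitted coefficients, rescaled to 0-100. This sketch assumes that convention, which matches caret’s default for glmnet fits, although the text does not state how the scores in Figure 4.12, D were computed. The minimal sketch below applies that rule to hypothetical coefficients, not to the fitted model.

```python
def scaled_importance(coefficients):
    """Rank features by |coefficient|, min-max rescaled to 0-100.

    Assumes caret-style scaling; the ranking is only comparable across
    features if the predictors were on a common scale when the model was fitted.
    """
    raw = {name: abs(beta) for name, beta in coefficients.items()}
    lo, hi = min(raw.values()), max(raw.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all |beta| are equal
    scaled = {name: 100.0 * ((value - lo) / span) for name, value in raw.items()}
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)


# Hypothetical coefficients for illustration only (not the fitted model).
coefficients = {"domain_8": 2.4, "domain_12": -2.0, "TE2": 0.4, "KP_type": 0.2, "vdw": 0.0}
for name, score in scaled_importance(coefficients):
    print(f"{name:10s} {score:6.1f}")
```

Because importance is based on magnitude, a strongly negative coefficient (such as the hypothetical `domain_12`) ranks as highly as a positive one. The score therefore reflects how much a feature is used, not the direction of its effect.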

4.4 Discussion

The field of research investigating tropoelastin’s interactions during assembly and its regions of interaction within mature elastin has been greatly hindered by tropoelastin’s repetitive sequence. Elastin peptides arising from protein fragmentation during mass spectrometry often contain short sequences such as AK or AAK, which cannot be unequivocally pinpointed to a single site within tropoelastin [34, 111]. For example, Hedtke and colleagues have identified domains 17, 19, 27 and 31 as potential binding partners for domain 8 based on short peptide fragments; however, it is unknown whether domain 8 is cross-linked to all or a subset of these domains, or how frequently any of these interactions may occur [34]. To date, only three studies have unambiguously determined cross-linking sites within native elastin, resulting in a handful of cross-links that provide the only clues as to tropoelastin-tropoelastin interactivity within native elastin. Therefore, to avoid generating dimer conformations that may be physiologically irrelevant, only studies that at least pinpointed an interaction between two domains were taken into account. Considering the dense nature of elastin and its high degree of interconnectedness [96], it is likely that many cross-links between multiple domains await discovery, and as such, the dimers examined here represent only what is currently known. Future studies could expand the number of structures examined by using the ambiguous peptide sequences to drive docking; however, caution must be taken in interpreting the biological relevance of such structures.

The model of elastin assembly has been an area of debate for a number of decades. The initial model of assembly proposed was based on tropoelastin’s low-resolution SAXS/SANS envelope [59] and the only native cross-link that had been identified at the time [53]. As coacervation is a rapid process, it was postulated that tropoelastin assembles in an ordered fashion, and indeed, since the SAXS/SANS envelope presented with a single global shape, there was little evidence to suggest that tropoelastin could assemble through multiple mechanisms. However, a paradigm shift has occurred within the field over the past years through several parallel studies. Extensive MD simulations revealed that tropoelastin is a highly flexible protein that exists in a stochastic ensemble of conformations [61, 62]. Concurrently, mass spectrometry studies on a variety of organisms revealed that elastin is heterogeneously cross-linked [34, 111], hinting that tropoelastin’s flexibility is not only important for elastin’s mechanical resilience, but also plays a role during coacervation and the positioning of molecules. Most recently, coarse-grained MD revealed that tropoelastin initiates self-assembly through nucleation events, which occur via a broad spectrum of interactions, including head-to-head, head-to-tail, tail-to-tail and lateral interactions [78]. These interactions persist through to at least the nascent fibril stage of elastogenesis. The current findings support these data, demonstrating that the previously identified cross-linking sites are capable of interacting through six distinct types of association during the early stages of assembly. This study unites the full-atomistic model with available data from a variety of sources, and utilises three of tropoelastin’s most prevalent structures from its conformational ensemble. Due to computational constraints, this study was restricted to examining dimers; however, it can be speculated that the geometries presented here would also persist throughout early-stage assembly, as indicated by similarly diverse MD simulations [62].

The precise head-to-tail conformation described by Baldock and colleagues was not observed here, most likely due to differences between the domain positions of the full-atomistic structure of tropoelastin [61, 62] and those of the SAXS/SANS envelope and its accompanying approximate domain locations [59]. However, structures resembling the initial head-to-tail model were seen, which were termed head-to-middle in this study because the N-terminus of the first tropoelastin nestles into the groove in the second molecule’s middle N-terminal “coil” region. These indicate a mechanism of tandem association similar to that proposed by the head-to-tail model and, importantly, made up the majority of the interactions in this study, suggesting that they are the dominant form of interaction within early-stage assembly. Nonetheless, the wide spectrum of conformations generated here indicates that elastin assembly is more complex than previously appreciated. The variety of associations seen here is likely to propagate fibril formation through a number of different commitment paths, thereby leading to highly complex branching structures made up of multiple types of interactions and resulting in the downstream heterogeneity described by cross-linking studies [34,111].

The disparity in tropoelastin association arising from native elastin and synthetic coacervation studies has not yet been explored. Here, differences were noted between the proportions of dimer types arising from residues selected from native elastin studies and those selected from studies of synthetically cross-linked human tropoelastin. Discrepancies in dimer associations between these two broad categories of studies are likely to have arisen from bias within their methodologies. Native elastin studies rely heavily on protein fragmentation using proteases, and thus, it is possible that particular areas of elastin are cleaved more readily, which could explain the high proportion of head-to-middle structures observed from native studies. Meanwhile, the synthetic study from which interactive sites were obtained used BS3, a mid-length cross-linker that is longer than a bifunctional elastin cross-link, and could therefore have cross-linked tropoelastin molecules that do not normally associate in vivo [110]. Moreover, as synthetic studies are conducted using recombinant human tropoelastin, this unmodified form of tropoelastin has positive charges on all lysines, which may drive it to associate predominantly in a tail-to-middle manner, as indicated by the current results. Interestingly, recent coarse-grained molecular dynamics simulations of early-stage tropoelastin aggregation also indicate an enrichment of association between the middle and tail regions of the molecule, which was most likely facilitated by interactions between the large hydrophobic domains in these regions, such as domains 18 and 20 [78]. As that study was based on the sequence of recombinant tropoelastin, its results may be closer to those of the synthetic studies than to those of the native studies; however, this is difficult to confirm, as its dimer associations were not quantified.

A further source of variance within the cross-linking studies discussed above is that the native elastin in the studies on which the current protein docking was based was not exclusively human in origin; interactions between domains 4-12, 6-14 and 12-27 were derived from bovine elastin [111], whilst the canonical domain 10-19-25 cross-link is of porcine origin [53]. Given tropoelastin’s sensitivity to mutations [22,36,37,233] and modifications [247] in the context of both global structure and coacervation, it is reasonable to assume that the domains within tropoelastin from non-human sources are positioned differently from those within human elastin, and thus, cross-linked regions from animal elastin may not accurately reflect those within humans. However, despite differences in sequence, tropoelastin molecules across organisms must share a number of commonalities that allow them to assemble into elastin fibres of similar morphologies, regardless of the animal. Thus, the results here are likely to reflect a number of possibilities available to tropoelastin during coacervation.

Overall, the current data suggest that the geometry of the dimer conformation is dictated by the conformation of the initial monomer. It is therefore consistent that machine learning could not accurately predict head-to-tail associations based on energy and surface area terms alone. The introduction of domains into the algorithm resulted in a marked increase in model sensitivity, indicating the importance of domain positioning and overall molecular geometry during coacervation. This validates the approach of this chapter to examine dimer formation using multiple starting conformations, and simplifies further modelling and prediction of association types based on cross-linking data.


Chapter 5

Interactions of tropoelastin with integrins

This chapter has been submitted to the Biophysical Journal as:

Ozsvar, J., Wang, R., Tarakanova, A., Buehler, M. J., Weiss, A. S., “Fuzzy binding model of molecular interactions between tropoelastin and integrin αvβ3”.


5.1 Introduction

Elastin is a key component of the mammalian extracellular matrix (ECM) that

imparts tissues with the ability to resist deformation during repeated stretch and

recoil cycles over the duration of an organism’s lifetime. The elastin polymer con-

sists primarily of its monomeric subunit, tropoelastin, which contains alternating

hydrophobic and hydrophilic domains. The hydrophilic domains are rich in ly-

sine and alanine residues (of which lysines are crucial for cross-linking) whilst the

hydrophobic domains mostly consist of repeats of glycine, proline, alanine and

valine [96]. Tropoelastin is involved in numerous cellular activities across differ-

ent cell types, including attachment, proliferation, chemotaxis, and differentiation

of fibroblasts, endothelial cells, smooth muscle cells, and multipotent progenitor

cells [248]. These interactions are largely mediated by direct binding between tropoelastin and cell surface receptors, including elastin binding protein [249], glycosaminoglycans [139], and integrins [121].

Integrins are a major class of cell surface adhesion receptors. They are heterodimers composed of non-covalently associated α and β subunits, of which there are 24 known combinations in vertebrates [155]. Structurally, the integrin subunits

consist of extracellular, transmembrane, and intracellular domains. The extracel-

lular headpiece is responsible for ligand binding, where ligand-induced conforma-

tional changes result in the headpiece opening to facilitate an active ‘open’ confor-

mation that allows outside-in signaling via intracellular domains tethered to the

cell cytoskeleton [56]. When bound to their ligands, integrins regulate crucial cell

functions including cell proliferation, angiogenesis, wound repair, developmental

signaling, immune responses, and tumorigenesis [250]. The diversity of specialized

biological processes regulated by integrins arises from a variety of unique subunit

pairings. An example of this specificity is the RGD binding motif, often found in ECM proteins such as fibronectin, vitronectin and laminin, which facilitates headpiece opening and signal propagation through large conformational changes of αvβ3 [251] and α5β1 integrins [252, 253]. RGD-facilitated integrin activation occurs at the interface between the β-propeller of the α subunit and the βA domain of the β3 subunit [152,153].

Integrins are of particular interest as they facilitate tropoelastin-based cell interactions. Thus far, two regions within tropoelastin have been confirmed to bind integrins: the GRKRK sequence at the C-terminus, which binds integrin αvβ3 [121], and a second interactive site spanning the interface between domains 17 and 18, which binds integrins αvβ3 and αvβ5 [123]. As tropoelastin does not contain the classical RGD sequence, nor any of the currently known ECM motifs recognized by

integrins, the sites on the integrin headpiece that interact with tropoelastin re-

main elusive. Moreover, the mechanisms for tropoelastin-based integrin activation

remain largely unknown. It is crucial to understand the nature of the tropoelastin-

integrin interaction, as the effects of cell attachment on tropoelastin have direct

consequences for elastic fiber and cell organization in three-dimensional tissue ar-

chitectures [254–257].

Here, a computational approach was undertaken to further explore the nature of

the interaction between tropoelastin and integrin αvβ3. Ligand-induced integrin

activation has been previously studied using computational molecular dynamics

(MD) modelling, and has shed light on the mechanisms of αvβ3 spontaneous acti-

vation and ligand-induced strain propagation in signaling [154, 258]. The confor-

mational changes of the extracellular domains, particularly within the headpiece

of αvβ3, have been further characterized through MD simulations with RGD-

containing fibronectin [259] and RGD derivatives [260]. To build upon this and

characterize the interactions between integrins and tropoelastin, this study has

leveraged the full atomistic structure of tropoelastin that was previously modelled

using extensive replica exchange molecular dynamics (REMD) simulations [61].

Although tropoelastin is a highly flexible protein, the computational model cor-

relates well with existing low-resolution structural data and possesses secondary

structural features in agreement with those indicated by circular dichroism and


molecular mutation studies [61]. These highlight the utility of REMD to effec-

tively predict the molecular structure of proteins with a high degree of disorder.

In this study, protein-protein docking was conducted between tropoelastin and

αvβ3, accompanied by extended REMD and structural refinement using classical

molecular dynamics (cMD), to dissect the interactions and the mechanism of the

subsequent tropoelastin-mediated integrin activation.

5.2 Methods

5.2.1 Preparation of integrin headpiece structure

This study utilized the headpiece from the bent closed conformation of integrin

αvβ3 (Figure 5.1, A-B), which was extracted from the RCSB protein data bank

(PDB ID 1L5G) [152]. The headpiece consists of the β-propeller from the αv

subunit and the βA and hybrid domains from the β3 subunit. Importantly, the

αvβ3 headpiece contains the known receptor site for RGD ligands [261], which is the origin of the molecular motions that activate the integrin.

Two separate integrin headpieces were modelled to respectively reflect medium

and low affinity binding states for comparative analysis. The affinity of the in-

tegrin for its ligand is regulated by the presence of divalent cations in the MI-

DAS, ADMIDAS and SyMBS pockets that are found close to the ligand binding

site [262]. The medium affinity headpiece was modelled by replacing all cations with Mg²⁺, as this configuration is conducive to binding tropoelastin [123,262] and is capable of facilitating spontaneous headpiece opening in silico [154]. The low affinity headpiece was modelled by replacing all cations with Ca²⁺ (other than at MIDAS, which was coordinated to Mg²⁺), as this configuration impedes computational headpiece opening [154,258]. The medium and low affinity headpieces

were termed αvβ3-Mg and αvβ3-Ca respectively. Disulfide bridges were added be-

tween residues previously validated by crystallography to further prepare the αvβ3


headpiece for computational analysis [152].

5.2.2 Preparation of tropoelastin structure

Tropoelastin is a highly flexible molecule that undergoes extensive conformational

sampling [61, 62]. To increase the chance of obtaining structures that bind αvβ3

during both the protein-protein docking and MD stages, two structures from the

final 2 ns of previous simulations [62] were selected based on the accessibility of

the lysines in domain 17 for subsequent docking, as prior studies have indicated

their importance in tropoelastin-integrin interactions [123].

5.2.3 Tropoelastin-integrin configuration preparation

Initial configurations of the tropoelastin-integrin complex were generated from

αvβ3-Mg and tropoelastin using the High Ambiguity Driven protein-protein DOCK-

ing (HADDOCK) 2.2 web server [222]. HADDOCK was selected for this step as it

factors in interaction data, such as those from mutagenesis experiments, in instances

where the precise sites of interaction are unknown. Such user defined data were

used to define active residues, which are residues that are thought to be primar-

ily responsible for the interaction. The active residues of tropoelastin that were

selected to drive the docking were K286 and K289 from domain 17. These were

chosen based on prior molecular truncation and peptide studies (including point

mutagenesis and scrambled sequences) that demonstrate their importance in bind-

ing to αv integrins [123]. The residues from the integrin that were selected were

Met118 from αv, and Thr136 and Thr183 from β3. These have been demonstrated

to stay in prolonged contact with a non-RGD coated surface in silico and flank

the RGD-binding region [263]. Thus, by manoeuvring tropoelastin close to the

integrin’s β1-α1 loop, I aimed to reduce the time and resources required for later

REMD. Tropoelastin was not docked directly to the RGD-binding region to al-

low tropoelastin to explore multiple conformations around the binding sites and


interact with the site during REMD.

HADDOCK defines passive residues as solvent accessible residues that are in the

vicinity of the active residues [222]. Their proximity to the active residues al-

lows them to contribute to the overall interaction by providing local biophysi-

cal/biochemical context for the interaction [264]. This allows HADDOCK to dock

structures based on a surface rather than a handful of residues, and is thus more representative of protein-protein interactions. HADDOCK was allowed to automatically define passive residues as those within 6.5 Å of the residues mentioned above. This may have introduced more noise during the docking stage; however, not selecting passive residues or overly restricting the surface would have led to a strong bias toward particular structures that may not have taken the local surface context of the interactive sites into account.
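The passive-residue rule described above can be sketched as a distance search: any residue other than the active residues with an atom within 6.5 Å of an active residue is labelled passive. This is a simplified stand-in, not HADDOCK’s own implementation. The requirement that passive residues be solvent accessible is represented by an optional, hypothetical `accessible` set, and the residue identifiers and coordinates in the usage below are toy values.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Smallest inter-atomic distance (Å) between two residues given as lists of (x, y, z)."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def passive_residues(residues, active, cutoff=6.5, accessible=None):
    """Non-active residues with any atom within `cutoff` Å of any active residue.

    residues:   dict residue_id -> list of (x, y, z) atom coordinates
    active:     iterable of residue_ids
    accessible: optional set of solvent-accessible residue_ids; others are skipped
    """
    active = set(active)
    passive = set()
    for rid, atoms in residues.items():
        if rid in active:
            continue
        if accessible is not None and rid not in accessible:
            continue
        if any(min_distance(atoms, residues[a]) <= cutoff for a in active):
            passive.add(rid)
    return passive


# Toy single-atom residues placed along the axes (hypothetical coordinates).
residues = {
    "K286": [(0.0, 0.0, 0.0)],
    "A287": [(3.0, 0.0, 0.0)],
    "G288": [(6.0, 0.0, 0.0)],
    "P310": [(0.0, 7.0, 0.0)],
    "V300": [(20.0, 0.0, 0.0)],
}
print(sorted(passive_residues(residues, {"K286"})))  # ['A287', 'G288']
```

Defining passive residues geometrically in this way is what turns a few active residues into a contact surface, which is the trade-off between noise and bias discussed above.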

2000 steps of initial rigid body docking were conducted, and the 200 most ener-

getically favorable structures were refined in aqueous solvent using brief MD runs.

The resulting tropoelastin-integrin structures were clustered based on their frac-

tion of common contacts [265]. Since no tropoelastin-integrin structure exists for

assessing the biological relevance of the docked structures, we relied on the HAD-

DOCK score to select the most energetically favourable structures for subsequent

REMD. The score comprises the weighted contributions of electrostatics,

buried surface area, restraints violation energy, van der Waals forces, desolvation

energy, and the root mean square deviation of atomic coordinate distances away

from the lowest energy structure.
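Clustering by fraction of common contacts (FCC) compares docked models through their interface contacts rather than their coordinates. FCC(A, B) is the share of model A’s residue-residue contacts that are also present in model B. The sketch below computes this measure and applies a simple greedy clustering. The 0.75 cut-off, the requirement that FCC pass in both directions, and the contact sets are assumptions for illustration. This is not HADDOCK’s own clustering code.

```python
def fraction_common_contacts(contacts_a, contacts_b):
    """FCC: share of A's residue-residue contacts also present in B (asymmetric)."""
    if not contacts_a:
        return 0.0
    return len(contacts_a & contacts_b) / len(contacts_a)


def cluster_by_fcc(models, cutoff=0.75):
    """Greedy clustering of docked models by FCC.

    models: dict model_id -> set of (receptor_residue, ligand_residue) contacts.
    Two models are neighbours if FCC >= cutoff in both directions. The unassigned
    model with the most unassigned neighbours becomes the next cluster centre.
    """
    ids = list(models)
    neighbours = {
        a: {
            b for b in ids
            if b != a
            and fraction_common_contacts(models[a], models[b]) >= cutoff
            and fraction_common_contacts(models[b], models[a]) >= cutoff
        }
        for a in ids
    }
    unassigned = set(ids)
    clusters = []
    while unassigned:
        # Ties are broken by input order, because max() returns the first maximum.
        centre = max((m for m in ids if m in unassigned),
                     key=lambda m: len(neighbours[m] & unassigned))
        members = {centre} | (neighbours[centre] & unassigned)
        clusters.append((centre, members))
        unassigned -= members
    return clusters


# Hypothetical contact sets for three docked models.
models = {
    "m1": {(1, 10), (2, 11), (3, 12), (4, 13)},
    "m2": {(1, 10), (2, 11), (3, 12), (4, 13), (5, 14)},
    "m3": {(7, 20), (8, 21)},
}
for centre, members in cluster_by_fcc(models):
    print(centre, sorted(members))
```

Because contacts are independent of how the two proteins are superimposed, FCC clustering avoids the fitting step that an RMSD-based comparison of complexes would need.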

5.2.4 Molecular dynamics modelling

The choice of MD simulation is non-trivial for large protein complexes and flexible

proteins alike. In practice, most MD simulations are not ergodic because of the time required to escape local minima within the energy landscape. Furthermore,

when accounting for tropoelastin’s high flexibility, it is necessary to sample a range


of tropoelastin-integrin structures to understand the underlying mechanisms of the

interaction. REMD [214], an accelerated sampling methodology, was implemented

here using NAMD 2.11 [186].

REMD is appropriate for exploring the conformational sampling of proteins as

it facilitates the crossing of energy barriers that are otherwise difficult to over-

come using cMD. REMD has been previously used to explore the ensemble of

tropoelastin structures to understand its conformational sampling [61, 62] as well

as examine the interactions between integrins and their ligands [263,266]. REMD

takes a single starting structure as its input and simulates it across a range of

temperatures in parallel, thereby giving rise to replicas of the structure at differ-

ent temperatures. Replicas may be exchanged with other replicas in neighboring

temperatures via Monte Carlo moves, permitting the structure to sample a greater

portion of its overall energy landscape [214]. Thus, the initial structure undergoes

a greater extent of conformational sampling with REMD than with cMD.
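The exchange step can be made concrete with the standard Metropolis criterion for swapping configurations between replicas at temperatures T_i and T_j with potential energies E_i and E_j. The swap is accepted with probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T). The Python sketch below implements only that criterion; the simulation engine performs this step internally, and the function names here are hypothetical.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for swapping replicas i and j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_exchange(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the swap is accepted on this attempt."""
    return rng.random() < exchange_probability(energy_i, temp_i, energy_j, temp_j)


# A colder replica (300 K) that already holds the lower-energy configuration
# gives it up to the 310 K replica with probability < 1.
p = exchange_probability(-110.0, 300.0, -100.0, 310.0)
print(f"acceptance probability: {p:.2f}")
```

Swaps that move a lower-energy configuration to the colder replica are always accepted. Uphill swaps are accepted only occasionally, which lets high-temperature replicas carry structures across barriers while preserving the Boltzmann distribution at each temperature.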

To implement REMD, a temperature distribution ranging between 280 – 480 K

was used with an exchange acceptance frequency (EAF) of 0.2 and an exchange

step of 5 ps to ensure adequate sampling. 48 replicas were sufficient to achieve

the EAF for the integrin headpiece in isolation, whereas 56 replicas were required

for tropoelastin-integrin simulations due to the greater number of atoms within

the protein complex. Non-bonded interactions were applied using an interaction cut-off of 16 Å, a switch distance of 14 Å, and a pair-list distance of 18 Å.

The addition of water as a solvent to REMD systems vastly increases the amount

of required computational resources to run the simulation. Therefore, implicit

solvent was selected for REMD simulations, with a dielectric constant of 80, 0.15

M ion concentration, and an alpha cut-off of 15 Å. Each replica was sampled for

8 ns to ensure a sufficient sampling depth [263]. The integrin headpiece and

the tropoelastin-integrin systems were simulated for a total of 384 ns and 448

ns respectively in REMD. The CHARMM22 force field with the CMAP peptide

backbone correction was selected for consistency with previous tropoelastin and


integrin simulations [61,62,154].
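The replica counts and total simulation times are linked by simple arithmetic (48 × 8 ns = 384 ns; 56 × 8 ns = 448 ns). The sketch below reproduces that bookkeeping. It also spaces the temperatures geometrically between 280 and 480 K, which is an assumption not stated in the text. Geometric spacing is a common choice because it gives roughly uniform exchange acceptance when the heat capacity varies slowly with temperature.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures spaced by a constant ratio between t_min and t_max (inclusive)."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


ns_per_replica = 8
for label, n in (("integrin headpiece", 48), ("tropoelastin-integrin", 56)):
    temps = geometric_ladder(280.0, 480.0, n)
    total_ns = n * ns_per_replica
    print(f"{label}: {n} replicas, {temps[0]:.0f}-{temps[-1]:.0f} K, {total_ns} ns total")
# integrin headpiece: 48 replicas, 280-480 K, 384 ns total
# tropoelastin-integrin: 56 replicas, 280-480 K, 448 ns total
```

More atoms means a larger energy fluctuation gap between neighbouring temperatures, so more, closer-spaced replicas are needed to keep the same exchange acceptance. This is consistent with the larger replica count used for the complex.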

Ensembles each containing 400 structures were obtained from the last 2 ns of the

310 K temperature replica for each initial starting structure. These structures

were sorted into clusters using the k-means functionality of the MMTSB Tool Set

using an RMSD threshold of 5 Å [238]. The representative structures from the most populated

clusters were selected for subsequent explicit solvent analysis.
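The clustering step can be illustrated with a simplified leader-style algorithm. Each structure joins the first cluster whose running centroid lies within 5 Å RMSD; otherwise it seeds a new cluster. The member closest to the centroid of the most populated cluster then serves as its representative. This is a hypothetical stand-in rather than the MMTSB kclust implementation. It also assumes the structures have already been superimposed (for example by a least-squares fit), which a real analysis must do before computing RMSD.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (Å) between two pre-aligned coordinate lists of equal length."""
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


def centroid(structures):
    """Per-atom mean coordinates of a list of structures."""
    n = len(structures)
    return [tuple(sum(s[k][d] for s in structures) / n for d in range(3))
            for k in range(len(structures[0]))]


def leader_cluster(structures, radius=5.0):
    """Assign each structure to the first cluster within `radius`; largest clusters first."""
    clusters = []  # each: {"centroid": coordinates, "members": [indices]}
    for idx, coords in enumerate(structures):
        for c in clusters:
            if rmsd(coords, c["centroid"]) <= radius:
                c["members"].append(idx)
                c["centroid"] = centroid([structures[i] for i in c["members"]])
                break
        else:
            clusters.append({"centroid": list(coords), "members": [idx]})
    clusters.sort(key=lambda c: len(c["members"]), reverse=True)
    return clusters


def representative(structures, cluster):
    """Index of the member closest to the cluster centroid."""
    return min(cluster["members"], key=lambda i: rmsd(structures[i], cluster["centroid"]))
```

Using the member nearest the centroid, rather than the centroid itself, guarantees that the structure carried forward to explicit-solvent refinement is a physically sampled conformation.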

As water impacts the secondary structure of tropoelastin, structural refinement

in explicit solvent was conducted using the representative structures obtained

from the most populated cluster resulting from k-means analysis of each ensem-

ble. Systems were solvated using VMD [237] and ionized with 0.15 M NaCl to

mimic physiological conditions. The final solvent box per experiment contained more than 200,000 water molecules, with a padding distance of 20 Å to prevent the proteins from contacting their own periodic images under periodic boundary conditions. All structures

were minimized for 25 ps using the conjugate gradient method before applying

harmonic constraints to the heavy atoms of the protein backbone and sidechains

to prevent destabilization of the atoms during the initial simulation. The systems were heated to 310 K, after which the harmonic constraints of the

sidechains were released, followed by the release of the backbone atoms to relax

the proteins. Next, the proteins were equilibrated for 100 ps in an isobaric simulation, where a constant pressure of 1 atm was maintained using the Nosé-Hoover Langevin piston barostat with a period of 200 fs and decay of 100 fs. Langevin dynamics were utilized to maintain a physiologically relevant temperature of 310 K, with a damping coefficient of 1 ps⁻¹. Non-bonded parameters were modelled using an interaction cut-off of 12 Å, a switch distance of 10 Å and a pair-list distance of 13.5 Å. Electrostatics were regulated by a Particle Mesh Ewald summation with a grid spacing of 1 Å. The isobaric equilibrations continued for 100 ns until convergence,

assessed through RMSD, was reached.
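For readers reproducing the explicit-solvent stage, the parameters above map onto standard NAMD configuration keywords. The Python helper below simply emits those keyword-value lines. It is a partial, illustrative fragment: structure and coordinate files, the periodic cell, minimisation, heating and run commands are omitted. The 1 atm target is expressed in bar, as NAMD expects, and this is not the input file actually used in this study.

```python
def namd_explicit_solvent_settings(temperature=310.0):
    """NAMD keywords corresponding to the explicit-solvent parameters described above."""
    return {
        # Non-bonded interactions (Å)
        "cutoff": 12.0,
        "switching": "on",
        "switchdist": 10.0,
        "pairlistdist": 13.5,
        # Long-range electrostatics
        "PME": "yes",
        "PMEGridSpacing": 1.0,
        # Langevin thermostat, damping coefficient in 1/ps
        "langevin": "on",
        "langevinDamping": 1.0,
        "langevinTemp": temperature,
        # Nose-Hoover Langevin piston at 1 atm = 1.01325 bar; period and decay in fs
        "langevinPiston": "on",
        "langevinPistonTarget": 1.01325,
        "langevinPistonPeriod": 200.0,
        "langevinPistonDecay": 100.0,
        "langevinPistonTemp": temperature,
    }


def to_namd_config(settings):
    """Render settings as 'keyword value' lines in insertion order."""
    return "\n".join(f"{key} {value}" for key, value in settings.items())


print(to_namd_config(namd_explicit_solvent_settings()))
```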


5.2.5 Analysis

Structural analysis consisted of two parts. The REMD ensembles were examined using the previously described 400 structures from the final 2 ns of simulation, whereas trajectories from cMD were analyzed over the final 50 ns of simulation. Principal component analysis (PCA) and local residue fluctuation analysis were carried out with Bio3D [267], ProDy [217] and VMD [237]. The opening of the integrin headpiece was quantified by calculating the distance between the centers of mass of residues 250 – 438 of the αv β-propeller domain and of residues 55 – 108 and 354 – 434 of the β3 hybrid domain. Other analyses of the structures resulting from both REMD and cMD were conducted using custom Tcl/Tk, R and MATLAB scripts.
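The headpiece-opening metric reduces to the distance between two mass-weighted centroids. The sketch below assumes that atoms have been loaded as NumPy arrays of coordinates, masses, chain identifiers and residue numbers; the chain labels "A" (αv) and "B" (β3) are hypothetical and depend on how the structure file is labelled, whereas the residue ranges are those stated above.

```python
import numpy as np

# Residue ranges from the text; chain labels are hypothetical placeholders.
PROPELLER = ("A", [(250, 438)])             # alpha-v beta-propeller
HYBRID = ("B", [(55, 108), (354, 434)])     # beta-3 hybrid domain

def select(chains, resids, chain, ranges):
    """Boolean mask of atoms in `chain` whose residue number lies in any range."""
    chains = np.asarray(chains)
    resids = np.asarray(resids)
    in_range = np.zeros(resids.shape, dtype=bool)
    for lo, hi in ranges:
        in_range |= (resids >= lo) & (resids <= hi)
    return (chains == chain) & in_range

def center_of_mass(coords, masses, mask):
    m = np.asarray(masses, dtype=float)[mask]
    x = np.asarray(coords, dtype=float)[mask]
    return (m[:, None] * x).sum(axis=0) / m.sum()

def headpiece_opening(coords, masses, chains, resids):
    """Distance between the COMs of the propeller and hybrid-domain selections."""
    a = select(chains, resids, *PROPELLER)
    b = select(chains, resids, *HYBRID)
    return float(np.linalg.norm(center_of_mass(coords, masses, a)
                                - center_of_mass(coords, masses, b)))
```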

5.3 Results

5.3.1 Docking of tropoelastin to integrin αvβ3

Protein-protein docking was conducted to generate tropoelastin-integrin complexes for subsequent MD modelling. The crystal structure of integrin αvβ3 in the bent closed configuration (Figure 5.1, A) was selected to examine whether the integrin could change into the extended open configuration during MD, as this structural change is important for its activation [259, 268]. The headpiece of αvβ3 (Figure 5.1, B) was separately docked to two conformations of tropoelastin, TE1 and TE2 (Figure 5.1, C), which were derived from prior REMD simulations at 310 K [61, 62]. The structures from the final refinement stage in water were clustered and ranked by energy. The most energetically favorable structures (Figure 5.1, D) from the top-ranked clusters (Supplementary Table 1) were selected for REMD. These starting structures were termed αvβ3-TE1 and αvβ3-TE2, and differed in the alignment of tropoelastin relative to the integrin (Figure 5.1, D).

Figure 5.1: Overview of the structures used in this study. A) The integrin activation pathway depicting the large-scale conformational changes involved in transitioning from the closed to the open conformation. The headpiece subunits are highlighted with darker shades of colors. B) The headpiece of αvβ3 used in this study, depicting the αv (red) and β3 (blue) subunits and the residues selected for facilitating docking (yellow). Divalent cations are displayed as pink spheres. C) The conformations of tropoelastin used for docking and the locations of K286 and K289 from domain 17 (teal). The locations of the N- and C-termini of the polypeptide chain are denoted with N and C. D) αvβ3-TE1 and αvβ3-TE2, which were ranked as the top output structures from docking, were used as the starting structures for REMD.

5.3.2 Integrin headpiece opening and associated structural changes with REMD

REMD in implicit solvent was implemented to generate an ensemble of structures for each of αvβ3-Mg, αvβ3-Ca, αvβ3-TE1 and αvβ3-TE2. The probability distribution of the potential energy of each replica overlapped with those of its neighbors across the temperature range of 280 – 480 K (Figure 5.2). This overlap is crucial, as it allows each replica to exchange efficiently with its neighbors and thereby facilitates conformational sampling [214]. Each ensemble contained a wide variety of structures across the temperature range. For example, the tropoelastin-integrin ensembles included instances in which αvβ3 and tropoelastin were not interacting, as well as denatured structures that arose at the higher temperatures. This indicated that the initial structures had undergone extensive structural sampling, which is a key requirement for ensemble analysis.
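The importance of overlapping potential energy distributions follows from the standard Metropolis criterion for temperature REMD: a swap between replicas i and j is accepted with probability min{1, exp[(β_i − β_j)(E_i − E_j)]}, where β = 1/(k_B T). When neighboring distributions overlap, the energy differences between adjacent replicas are frequently small or favorable, so swaps are accepted often enough for configurations to move across temperatures. The sketch below illustrates this criterion; the geometric spacing of 56 temperatures between 280 and 480 K is an assumption for illustration, as the temperature ladder is not specified in this section.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def exchange_probability(E_i, T_i, E_j, T_j):
    """Metropolis acceptance probability for swapping the configurations of
    replicas i and j in temperature REMD (energies in kcal/mol)."""
    delta = (1.0 / (K_B * T_i) - 1.0 / (K_B * T_j)) * (E_i - E_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def geometric_ladder(t_min, t_max, n):
    """Geometrically spaced temperatures (an assumed, commonly used choice)."""
    return [t_min * (t_max / t_min) ** (k / (n - 1)) for k in range(n)]
```

A colder replica that happens to hold a higher-energy configuration always swaps (probability 1), whereas the reverse case is accepted with a probability that shrinks as the energy gap grows; poor overlap therefore translates directly into rare exchanges.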

Figure 5.2: Potential energy frequency distribution of the 56 replicas across the ensemble of αvβ3-TE2. The structures above the distribution represent the various observable conformations during REMD and the exact replica that gave rise to each structure.

The major structural consequence of ligand binding to the integrin headpiece is the swinging out of the β3 hybrid domain into an open conformation (Figure 5.1, A). To quantify this movement, the distance between the centers of mass (COM) of the αv and β3 subunits was calculated for the structures within the ensembles [153,154,263]. As the mechanism of hybrid domain opening is consistent across β3 integrins, the open αIIbβ3 structure (PDB ID 2VDR) was used as a benchmark for identifying an open configuration of αvβ3. During REMD, the low-affinity structure (αvβ3-Ca) was as conducive to headpiece opening as the medium-affinity structure (αvβ3-Mg), with 39% and 40% of structures, respectively, observed to be fully open relative to the αIIbβ3 benchmark (Figure 5.3, A). When tropoelastin was introduced into the simulations, αvβ3-TE1 shifted toward closed and partially open conformations, and the percentage of open structures decreased to 5% (Figure 5.3, B). In comparison, αvβ3-TE2 shifted towards more open conformations and the percentage of open structures increased to 60%, suggesting that the TE2 configuration of tropoelastin was more conducive to headpiece opening than TE1.
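The percentages of open structures correspond to the fraction of ensemble members whose opening distance reaches the 64 Å threshold derived from αIIbβ3 (Figure 5.3). A minimal sketch, assuming the distances have already been computed for each structure and that the threshold is inclusive (the text does not say which):

```python
import numpy as np

OPEN_THRESHOLD = 64.0  # Å; open-state threshold derived from alphaIIb-beta3

def percent_open(distances, threshold=OPEN_THRESHOLD):
    """Percentage of structures whose headpiece-opening distance is at or
    above the threshold (inclusive comparison is an assumption)."""
    d = np.asarray(distances, dtype=float)
    return 100.0 * np.count_nonzero(d >= threshold) / d.size
```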

A structural change associated with headpiece opening is the merging of the α1 and α1’ helices into an elongated α1 helix, which has been shown to maintain its helicity in both the open αIIbβ3 crystal structure [153] and computational modelling [154, 258]. Here, the α1 helix maintained full helicity more frequently in αvβ3-Mg than in αvβ3-Ca (Figure 5.3, C), as expected given the greater propensity of Mg²⁺ than Ca²⁺ to promote headpiece opening of αvβ3 [262]. Interestingly, the frequency distribution of α1 helicity in αvβ3-Mg was broader than that in αvβ3-Ca, suggesting that the α1 helix sampled a wider structural range in the presence of Mg²⁺. A comparison of α1 helicity within the two tropoelastin-integrin ensembles revealed that α1 maintained less helicity than in either integrin ensemble without tropoelastin, as < 1% of structures from each of αvβ3-TE1 and αvβ3-TE2 displayed a fully helical character. However, the α1 helix of αvβ3-TE2 maintained 75% helicity within 53.8% of the structural ensemble, compared to only 37.5% of αvβ3-TE1. During headpiece opening, the α1 helix also straightens from a bent alignment, as seen in αIIbβ3 (Figure 5.3, C). This straightening was most closely reproduced by αvβ3-Mg, followed by αvβ3-Ca.
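The helicity values above can be expressed as the fraction of α1 residues assigned to an α-helix. This section does not state which secondary-structure assignment was used, so the sketch below assumes per-residue DSSP-style codes in which "H" denotes α-helix, and it reads "75% helicity" as at least 75%; both are assumptions for illustration.

```python
import numpy as np

def helicity(ss_codes, helix_codes="H"):
    """Fraction of residues in a segment (e.g. the beta3 alpha1 helix) that
    carry an alpha-helical code; ss_codes is a per-residue DSSP-style string."""
    return sum(code in helix_codes for code in ss_codes) / len(ss_codes)

def ensemble_helicity_summary(ss_per_structure, cutoff=0.75):
    """Percentages of structures that are fully helical and at least `cutoff` helical."""
    h = np.array([helicity(ss) for ss in ss_per_structure])
    return {
        "fully_helical_pct": 100.0 * np.mean(h == 1.0),
        "at_least_cutoff_pct": 100.0 * np.mean(h >= cutoff),
    }
```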


Figure 5.3: Integrin headpiece opening from the last 2 ns of REMD. Frequency distribution of the opening of the αvβ3 headpiece A) in isolation and B) with tropoelastin. Headpiece opening is quantified by the distance between the COMs of the two integrin subunits. The dotted line at 64 Å indicates the threshold for the open conformation as derived from αIIbβ3. Complexes from the tropoelastin-integrin ensembles, depicting the αv (red) and β3 (blue) subunits and tropoelastin (gray), are shown to illustrate the degree of headpiece opening. C) The frequency distribution of α1 helicity across the experiments. Representative snapshots of the α1 helix from αIIbβ3 and all four experiments are overlaid onto the starting configuration of the α1 helix in gray. Across all frequency distributions, αvβ3-Mg is red, αvβ3-Ca is blue, αvβ3-TE1 is yellow and αvβ3-TE2 is green.


5.3.3 Areas of tropoelastin-integrin interaction

Protein-protein contact maps are frequently used to pinpoint interaction sites by assessing the proximity of the residues of one protein to those of another. Contact frequency maps were constructed here to identify which areas of tropoelastin were in close proximity to αvβ3. Contact was defined as a Euclidean distance ≤ 12 Å between the α carbons of tropoelastin and those of the integrin headpiece [269]. As structural ensembles were generated via REMD, the frequency of contact for each pair of α carbons across the ensemble was also considered.
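A contact frequency map of this kind can be sketched as follows, assuming the Cα coordinates of tropoelastin and of the integrin headpiece are available for every structure in an ensemble as NumPy arrays; the 12 Å cut-off is the one stated above, and the function name is hypothetical. Looping over structures keeps memory use modest for a protein of tropoelastin's size.

```python
import numpy as np

def contact_frequency(ca_a, ca_b, cutoff=12.0):
    """Fraction of structures in which each Cα pair lies within `cutoff` Å.

    ca_a: (n_structures, n_res_a, 3) Cα coordinates of tropoelastin
    ca_b: (n_structures, n_res_b, 3) Cα coordinates of the integrin
    Returns an (n_res_a, n_res_b) array of frequencies in [0, 1].
    """
    ca_a = np.asarray(ca_a, dtype=float)
    ca_b = np.asarray(ca_b, dtype=float)
    counts = np.zeros((ca_a.shape[1], ca_b.shape[1]))
    for a, b in zip(ca_a, ca_b):                           # one structure at a time
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        counts += d <= cutoff                              # count pairs in contact
    return counts / len(ca_a)
```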

Both TE1 and TE2 contacted the β3 subunit more frequently than the αv subunit; however, the αv subunit was contacted only by TE1 and not by TE2 (Figure 5.4, A-B). The C-terminal domains of TE1 maintained close proximity to αv, but did not contact D224, which is the primary RGD-binding residue of αv (Figure 5.4, A). I focused further on the interactions between tropoelastin and β3, as it is primarily ligand binding to β3 that promotes headpiece opening. High-frequency (> 70%) contact hotspots occurred between β3 and domains 13, 17 – 19 and 26 of TE1 (Figure 5.4, B). Meanwhile, β3 was frequently contacted by domains 7, 8, 10, 12, 16, 17, 20, 26 and 36 of TE2 (Figure 5.4, B). Multiple domains were capable of contacting the integrin in a single structure, usually in conjunction with domain 17 (Figure 5.4, C). More of TE2’s domains than TE1’s contacted αvβ3 at any given time, owing to its position with respect to the integrin.

Figure 5.4: Frequency contact maps between the integrin subunits and tropoelastin. The frequency of contact between tropoelastin and A) the αv subunit and B) the β3 subunit. Greater contact frequency throughout the ensemble is indicated by the transition from white to black. The schematic at the top represents the domains of tropoelastin, where the hydrophobic domains are depicted in black and the cross-linking domains are depicted in white. C) Structures from the REMD ensembles of αvβ3-TE1 (left) and αvβ3-TE2 (right) highlight multiple tropoelastin domains (numbered) that contact the integrin.

To understand whether tropoelastin shares RGD’s binding site within β3, the contacts between the cell-interactive domain 17 and the RGD-binding site of β3 were dissected in further detail. The RGD-binding site is located at the β1-α1 loop of β3, and two of its residues, Y122 and S123, are responsible for binding RGD’s aspartate [153]. Whilst TE1 predominantly contacted the β1-α1 loop close to the RGD-binding site, TE2 almost exclusively contacted the downstream α1 helix (Figure 5.5, A). To pinpoint the nature of the interactions between these regions, salt bridge formation was examined, which showed that D126, at the end of the β1-α1 loop, was capable of interacting with K289 of TE1 (6.5% of structures) and K286 of TE2 (3.3% of structures) (Figure 5.5, B). Hydrogen bonds also occurred in both ensembles, with fewer occurring in αvβ3-TE2 than in αvβ3-TE1 (Figure 5.5, C-D). Within αvβ3-TE1, the tropoelastin residues A285, K286 and 288 were responsible for the majority of the hydrogen bonds with β3 (Figure 5.5, C). Although the RGD-binding Y122 stayed in close proximity to TE1’s domain 17 (Figure 5.5, A), the small number of hydrogen bonds indicated that Y122 was not heavily involved in binding domain 17 (Figure 5.5, C). Similarly, S123 did not form hydrogen bonds with domain 17 of TE1 (Figure 5.5, C) despite maintaining close proximity (Figure 5.5, A). Within αvβ3-TE2, tropoelastin residues V272 and K286 formed the most frequent hydrogen bonds with β3 (Figure 5.5, C). A greater number of alanines from TE2 than from TE1 participated in hydrogen bonds, and K289 was not observed to form any bonds with β3.
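Hydrogen-bond counts and occupancies of this kind are usually obtained by applying a geometric criterion to every structure in the ensemble. The criteria used here are not given in this section, so the sketch below assumes commonly used cut-offs (donor–acceptor distance ≤ 3.5 Å, donor–hydrogen···acceptor angle ≥ 150°); the function names are hypothetical.

```python
import numpy as np

def is_hbond(donor, hydrogen, acceptor, max_da=3.5, min_angle=150.0):
    """Geometric hydrogen-bond test: donor-acceptor distance <= max_da (Å) and
    D-H...A angle >= min_angle (degrees). Cut-offs are illustrative assumptions."""
    d, h, a = (np.asarray(x, dtype=float) for x in (donor, hydrogen, acceptor))
    if np.linalg.norm(a - d) > max_da:
        return False
    v1, v2 = d - h, a - h                    # vectors from H to D and from H to A
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(angle >= min_angle)

def hbond_occupancy(structures, **criteria):
    """Percentage of structures in which a donor/H/acceptor triple forms a
    hydrogen bond; `structures` yields (donor, hydrogen, acceptor) coordinates."""
    flags = [is_hbond(d, h, a, **criteria) for d, h, a in structures]
    return 100.0 * sum(flags) / len(flags)
```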


Figure 5.5: Interaction between domain 17 of tropoelastin and the β1-α1 loop/α1 helix of integrin αvβ3 within the αvβ3-TE1 and αvβ3-TE2 ensembles. A) Frequency contact maps between domain 17 and the β1-α1 loop/α1 helix. Greater contact frequency throughout the ensemble is indicated by the transition from purple to yellow. B) Salt bridge formation between the lysines from domain 17 and the β1-α1 loop/α1 helix. C) Number of hydrogen bonds between domain 17 of TE1 and TE2 and the β1-α1 loop/α1 helix. D) Snapshot of the tropoelastin-integrin complexes where domain 17 and the β1-α1 loop/α1 helix interact. Inset: detailed zoom of the molecular interactions between R697 and K696 from tropoelastin and E174 from αvβ3, where R697 forms a hydrogen bond whilst K696 is engaged in a salt bridge.

A second area of tropoelastin that interacts with αv integrins is the GRKRK sequence from domain 36 [121]. Here, domain 36 of TE1 primarily interacted with the αv subunit (Figure 5.6, A) and did not remain in close proximity to β3 (Figure 5.6, C). In contrast, TE2’s domain 36 maintained a contact frequency > 40% with E174 from the α1-α2 loop of β3 (Figure 5.6, A) through salt bridges involving both the lysines and the arginines of the GRKRK sequence (Figure 5.6, B). The two dominant salt bridges with E174 were formed via K696 and R697, and occurred within 16% and 25% of the structures, respectively. Multiple hydrogen bonds also occurred (Figure 5.6, C), the most frequent of which, present in 24% of structures, was between E174 and R697 (Figure 5.6, D).
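Salt-bridge percentages, such as the 16% and 25% reported for K696 and R697 with E174, can likewise be computed as the fraction of structures that satisfy a distance criterion. The cut-off is not stated in this section; the sketch below assumes a commonly used criterion in which any carboxylate oxygen of the acidic residue lies within 3.2 Å of any side-chain nitrogen of the basic residue. The function names are hypothetical.

```python
import numpy as np

def salt_bridge_present(acid_oxygens, base_nitrogens, cutoff=3.2):
    """True if any acidic side-chain oxygen (e.g. E174 OE1/OE2) lies within
    `cutoff` Å of any basic side-chain nitrogen (e.g. K696 NZ; R697 NE/NH1/NH2)."""
    o = np.asarray(acid_oxygens, dtype=float).reshape(-1, 3)
    n = np.asarray(base_nitrogens, dtype=float).reshape(-1, 3)
    d = np.linalg.norm(o[:, None, :] - n[None, :, :], axis=-1)  # all O-N distances
    return bool((d <= cutoff).any())

def salt_bridge_occupancy(structures, cutoff=3.2):
    """Percentage of structures with the salt bridge; `structures` yields
    (acid_oxygens, base_nitrogens) coordinate sets, one pair per structure."""
    flags = [salt_bridge_present(o, n, cutoff) for o, n in structures]
    return 100.0 * sum(flags) / len(flags)
```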

Domain 20 of TE2 also contacted the α1 helix (Figure 5.7, A), which was unexpected, as domain 20 was not docked to this region and contains no lysines. In particular, β3 residues 130 – 133 appeared to maintain a contact frequency > 80%. A number of short-lived hydrogen bonds occurred between these regions (Figure 5.7, B), predominantly between V363 and Q132, and between G366 and R143 (Figure 5.7, C). As Q132 and R143 also formed hydrogen bonds with domain 17 (Figure 5.5, C), this suggests that they are important for the interaction with tropoelastin.

Figure 5.6: Interaction between domain 36 of tropoelastin and the α1-α2 loop of integrin αvβ3 within the αvβ3-TE2 ensemble. A) Frequency contact map between domain 36 and the α1-α2 loop. Greater contact frequency throughout the ensemble is indicated by the transition from purple to yellow. B) Salt bridge formation between the lysines from domain 36 and the α1-α2 loop. C) Number of hydrogen bonds between domain 36 and the α1-α2 loop. D) Snapshot of the tropoelastin-integrin complex where domain 36 and the α1-α2 loop interact. Inset: detailed zoom of the molecular interactions between R697 and K696 from tropoelastin and E174 from αvβ3, where R697 forms a hydrogen bond whilst K696 is engaged in a salt bridge.


Figure 5.7: Interaction between domain 20 of tropoelastin and the β1-α1 loop/α1 helix of integrin αvβ3 within the αvβ3-TE2 ensemble. A) Frequency contact map between tropoelastin and αvβ3. Greater contact frequency throughout the ensemble is indicated by the transition from purple to yellow. B) Number of hydrogen bonds between domain 20 and the α1 helix across all structures from αvβ3-TE2. C) Snapshot of the tropoelastin-integrin complex where domain 20 and the α1 helix interact. Inset: detailed zoom of the most frequently occurring hydrogen bonds, between V363 and G366 from tropoelastin and Q132 and R143 from αvβ3.

5.3.4 Principal component analysis

Principal component analysis (PCA) was employed to understand the structural variance of the tropoelastin-integrin ensembles. PCA transforms the Cartesian coordinates of the atoms into a new set of orthogonal coordinates, termed principal components (PCs), each of which describes a proportion of the variance [217]. PCs are calculated such that each successive PC describes less variance than the previous one, until all of the variance has been accounted for. Thus, the top PCs are the most useful in describing the ensembles, as they account for the most variation.
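The PCA described above can be sketched as a singular value decomposition of the mean-centered coordinate matrix. This assumes that the structures have already been superposed onto a common reference; it is a generic illustration, not the Bio3D or ProDy implementation used in this work.

```python
import numpy as np

def pca(coords):
    """PCA of an ensemble of superposed structures.

    coords: (n_structures, n_atoms, 3) array aligned to a common reference.
    Returns (eigvals, modes, projections):
      eigvals[k]        variance along PC k, largest first
      modes[k]          unit eigenvector of PC k, shape (n_atoms * 3,)
      projections[:, k] coordinate of each structure along PC k
    """
    X = np.asarray(coords, dtype=float).reshape(len(coords), -1)
    Xc = X - X.mean(axis=0)                        # remove the mean structure
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = S ** 2 / (len(X) - 1)                # covariance eigenvalues
    projections = Xc @ Vt.T
    return eigvals, Vt, projections
```

The fraction of variance described by each PC, as plotted in a scree plot, is then `eigvals / eigvals.sum()`.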

Figure 5.8: Principal component analysis (PCA) of the REMD tropoelastin-integrin ensembles. A) The variance of all PC modes of αvβ3-TE1 and αvβ3-TE2 and B) the correlations between the respective top six PC modes. C) Projections of the combined top six PCA modes of αvβ3-TE1 and αvβ3-TE2 onto the respective starting structures of tropoelastin TE1 (orange) and TE2 (green). Black arrows indicate the dominant modes of structural variation. D) The square fluctuations of tropoelastin and E) the β3 subunit within αvβ3-TE1 and αvβ3-TE2.

Here, PC1 accounted for 34% and 52% of the variance of αvβ3-TE1 and αvβ3-TE2, respectively (Figure 5.8, A). The scree plot showed a milder slope across the first three PCs of αvβ3-TE1 than of αvβ3-TE2, which displayed a large drop between PC1 and PC2 (Figure 5.8, A). This indicated a dominant type of structural variation, that is, a preferred set of structures, within αvβ3-TE2.

Figure 5.9: Cluster analysis of αvβ3-TE1 and αvβ3-TE2. A) Clusters resulting from k-means analysis of each of the 400 structures per REMD ensemble. Projection of the clusters resulting from B) αvβ3-TE1 and C) αvβ3-TE2 onto PC1-PC2-PC3 space. Each cluster is represented by a unique color.

The structural variation indicated by PCA was independently verified by k-means analysis, which clustered the structures based on the root mean square deviation (RMSD) of atomic coordinates using a 5 Å cut-off. K-means analysis resulted in different cluster distributions for the two tropoelastin-integrin ensembles (Figure 5.9, A). The structures were more evenly distributed across the clusters of αvβ3-TE1, whereas αvβ3-TE2 displayed one highly populated cluster that contained > 40% of the structures analyzed (Figure 5.9, A). This distribution shape was similar to that of the PC scree plot, in which a larger amount of structural variance was accounted for by PC1 of αvβ3-TE2 than of αvβ3-TE1 (Figure 5.8, A), indicating consistency between PCA and k-means analysis.
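This section does not specify the number of clusters, the seeding scheme or exactly how the 5 Å RMSD cut-off entered the k-means analysis. The sketch below is therefore a generic Lloyd's k-means with deterministic farthest-point seeding, applied to flattened, superposed coordinates; the function names are hypothetical. For superposed structures of N atoms, the Euclidean distance between two such vectors equals √N × RMSD (without refitting), so clustering these vectors is closely related to RMSD-based clustering.

```python
import numpy as np

def _assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point seeding.
    X: (n_structures, n_features), e.g. flattened superposed coordinates."""
    X = np.asarray(X, dtype=float)
    centroids = [X[0]]
    for _ in range(1, k):
        # next seed: the structure farthest from every seed chosen so far
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):    # converged
            break
        centroids = new
    return _assign(X, centroids), centroids

def cluster_populations(labels, k):
    """Percentage of structures in each cluster, largest first."""
    counts = np.bincount(labels, minlength=k)
    return sorted(100.0 * counts / counts.sum(), reverse=True)
```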

The distribution of the clusters within the space described by the first three PCs was next examined to understand how well the PCs described the k-means clusters (Figure 5.9, B-C). Although the majority of clusters were separated along the PC1-PC2-PC3 axes, some of the sparsely populated clusters could not be resolved. This most likely occurred because the top three PCs together did not account for enough structural variation for separation. As the majority of the structural variation was accounted for by the top six PCs of αvβ3-TE1 (94%) and αvβ3-TE2 (96%), the top six PCs were used for subsequent PCA-derived analyses.
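Choosing how many PCs to retain amounts to summing the leading variance fractions until a target is reached. A minimal sketch, assuming the variance fractions are ordered from largest to smallest; the function names are hypothetical and no data from this work are reproduced.

```python
import numpy as np

def cumulative_variance(variance_fraction, n):
    """Fraction of the total variance captured by the top-n PCs."""
    return float(np.sum(np.asarray(variance_fraction, dtype=float)[:n]))

def n_pcs_for(variance_fraction, target):
    """Smallest number of leading PCs whose cumulative variance reaches `target`."""
    cum = np.cumsum(np.asarray(variance_fraction, dtype=float))
    return int(np.searchsorted(cum, target) + 1)
```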

To determine whether any structural similarities existed between the ensembles, the correlations between the top six PCs were inspected (Figure 5.8, B). The highest correlation observed was 42%, between PC3 of αvβ3-TE1 and PC1 of αvβ3-TE2. Within tropoelastin, these PCs corresponded to structural sampling around the N-terminal region (Figure 5.10). Correlations between the other PCs were even lower, indicating structural divergence between the ensembles, and were not investigated in further detail.
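How the correlations between PCs were computed is not stated in this section. A common choice for comparing modes from two ensembles is the absolute inner product between their unit eigenvectors, optionally summarized as the root mean square inner product (RMSIP) over the top n modes. The sketch below assumes both ensembles share the same atom ordering and reference frame; the function names are hypothetical.

```python
import numpy as np

def mode_overlap(modes_a, modes_b, n=6):
    """Absolute inner products between the top-n eigenvectors of two ensembles.
    Rows are modes (n_atoms*3 long); element [i, j] compares A_i with B_j."""
    A = np.asarray(modes_a, dtype=float)[:n]
    B = np.asarray(modes_b, dtype=float)[:n]
    A = A / np.linalg.norm(A, axis=1, keepdims=True)   # ensure unit length
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return np.abs(A @ B.T)

def rmsip(modes_a, modes_b, n=6):
    """Root mean square inner product of the top-n subspaces (1 = identical)."""
    O = mode_overlap(modes_a, modes_b, n)
    return float(np.sqrt((O ** 2).sum() / len(O)))
```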


Figure 5.10: PC modes of tropoelastin from αvβ3-TE1 and αvβ3-TE2. PC modes 3 and 1 are overlaid onto TE1 and TE2, respectively.

Tropoelastin’s flexibility has been implicated in a number of processes, including receptor binding [62]. To investigate the type of structural variance that tropoelastin underwent during REMD, the linear combination of the top six PCs, weighted by their contribution to the overall variance, was considered. The variation of TE1’s N-terminus predominantly consisted of a downward shift along the x-y axis towards the C-terminus (Figure 5.8, C). The C-terminus of TE1 shifted upward in an x-y direction, accompanied by a twist along the y-z axis. The structural variance of TE2’s N-terminus consisted of downward motions along the x-y axis toward the ‘spur’ region, whilst the C-terminus mostly sampled a scissors-twist motion. As the motions of a protein and its possible structural variants are highly dependent on its shape, the differences between TE1 and TE2 indicate that the shape of tropoelastin impacts its ability to interact with integrins.
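One reading of the variance-weighted linear combination of the top six PCs is sketched below: each eigenvector is scaled by its variance fraction and the results are summed to give a net displacement vector per atom, which can then be drawn as arrows on the starting structure. This interpretation and the function name are assumptions; note also that eigenvector signs are arbitrary, so the resulting directions depend on the sign convention adopted for each mode.

```python
import numpy as np

def weighted_mode_displacement(modes, variance_fraction, n=6):
    """Per-atom displacement vectors from the variance-weighted sum of the
    top-n PC eigenvectors (modes: (n_modes, n_atoms*3), largest variance first)."""
    V = np.asarray(modes, dtype=float)[:n]
    w = np.asarray(variance_fraction, dtype=float)[:len(V)]
    combined = (w[:, None] * V).sum(axis=0)   # weighted linear combination
    return combined.reshape(-1, 3)            # one (dx, dy, dz) row per atom
```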

To pinpoint the domains of tropoelastin that underwent higher local structural variance, square fluctuation profiles of tropoelastin’s amino acids were constructed from the top six PCs (Figure 5.8, D). Whilst the fluctuations of domains 1 – 12 were similar, as shown by the superimposed fluctuation profiles, domain 13 onwards displayed distinct patterns between the ensembles (Figure 5.8, D), corresponding to the regions that contacted αvβ3 (Figure 5.4, A-B). Fluctuation peaks occurred at domains 13, 17, 19 and 20 within αvβ3-TE1, of which domains 13, 17 and 19 were previously noted to contact the integrin headpiece (Figure 5.4, B). Domains 13, 17, 19 and 20 underwent relatively less fluctuation in αvβ3-TE2, with domains 13, 19 and 20 forming troughs of minimal fluctuation (Figure 5.8, D). Considering the high frequency of contact between these domains and the integrin headpiece (Figure 5.4, B), and the greater extent of headpiece opening within αvβ3-TE2 (Figure 5.3, B), this suggests that the lower fluctuation, that is, the stability, of these domains is required for headpiece opening.
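Square fluctuations reconstructed from the top PCs follow from the eigen-decomposition of the covariance matrix: for atom i, the value is the sum over the retained modes k of λ_k (v_{k,ix}² + v_{k,iy}² + v_{k,iz}²). A minimal sketch, assuming the eigenvalues and unit eigenvectors are ordered from largest to smallest variance; with a Cα-only PCA, each atom corresponds to one residue. The function name is hypothetical.

```python
import numpy as np

def square_fluctuations(eigvals, modes, n=6):
    """Per-atom square fluctuations reconstructed from the top-n PCs.

    eigvals: PC variances, largest first.
    modes:   (n_modes, n_atoms*3) unit eigenvectors in the same order.
    """
    V = np.asarray(modes, dtype=float)[:n]
    lam = np.asarray(eigvals, dtype=float)[:len(V)]
    V = V.reshape(len(V), -1, 3)                       # (modes, atoms, xyz)
    return (lam[:, None] * (V ** 2).sum(axis=2)).sum(axis=0)
```

When all modes are retained, the result equals the per-atom positional variance of the ensemble; truncating to the top six PCs keeps only the dominant motions.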

Similarly, the square fluctuations of the β3 subunit within the tropoelastin-integrin ensembles were assessed (Figure 5.8, E). The regions that formed peaks within β3 were similar between the two ensembles, and included the termini and the α1 and α7 helices. The elevated fluctuation at the termini corresponds to structural variation at the hybrid domain. As the hybrid domain swings out during headpiece opening (Figure 5.1, A), the greater fluctuation in αvβ3-TE2 compared to αvβ3-TE1 is in agreement with the greater number of open headpiece conformations within αvβ3-TE2 (Figure 5.3, B). The fluctuations of both the α1 and α7 helices were likewise higher in αvβ3-TE2 (Figure 5.8, E).

5.3.5 Headpiece opening remains stable in explicit solvent

Next, cMD was conducted to examine whether the representative structures from the ensembles retained their conformations in explicit solvent. A plateau in the RMSD of αvβ3 over time was used to denote structural equilibration. The headpieces equilibrated within 100 ns, with the headpieces of αvβ3-Ca and αvβ3-TE1 proving more stable than those of αvβ3-Mg and αvβ3-TE2 (Figure 5.11, A).

Whether the structures maintained their degree of headpiece opening during cMD was examined next. The headpiece opening, measured as the distance between the subunit centers of mass, remained relatively stable during equilibration, as shown by the standard error (Figure 5.11, B), as expected given the structural equilibration noted above (Figure 5.11, A). αvβ3-Mg displayed the greatest amount of opening, followed closely by αvβ3-TE2, which was almost as open as the αIIbβ3 reference structure (Figure 5.11, B). In comparison, αvβ3-Ca maintained an intermediately open structure, whilst αvβ3-TE1 remained almost as closed as the initial αvβ3 structure.

Figure 5.11: Classical molecular dynamics equilibration of integrin structures. A) RMSD of the integrin headpiece in each simulation. B) Integrin headpiece opening over the last 50 ns of simulation. Number of hydrogen bonds formed between domain 17 and the α1 helix/β1-α1 loop of C) αvβ3-TE1 and D) αvβ3-TE2. E) α-helical content of domain 17 from cMD and REMD. F) αvβ3-TE1 and αvβ3-TE2 structures arising from 100 ns of cMD equilibration.


The α-helical content of domain 17 was examined to establish whether tropoelastin’s secondary structure played a role in its interaction with αvβ3, as the secondary structure of the cross-linking domains impacts tropoelastin function [58]. The α-helicity of domain 17 in TE1 and TE2 differed depending on the sampling methodology (Figure 5.11, E). Low α-helicity (< 11%) was noted during sampling with REMD, whereas cMD yielded higher (> 44%) helicity. The increased α-helicity of domain 17 during cMD altered its interactions with αvβ3 relative to those previously derived from REMD. The hydrogen bond that formed between A285 and K125 during REMD was prolonged during cMD, and a new bond that had not been observed during REMD formed between A287 and W129 (Figure 5.11, C, F). The hydrogen bonds of K286 and K289 were short-lived (< 2%), which indicated that the lysines were less likely to bind the integrin when domain 17 was α-helical. As αvβ3 remained closed in the presence of TE1 (Figure 5.11, B) despite the formation of the new bonds, it is likely that the α-helical configuration of domain 17 does not promote headpiece opening. The hydrogen bonds between TE2 and αvβ3 became sparser during cMD than in REMD and were of short duration (< 2%) (Figure 5.11, D). Indeed, visual inspection indicated that domain 17 of TE2 moved away from the α1 helix during cMD. Interestingly, αvβ3 remained open despite the departure of domain 17 (Figure 5.11, B, F); thus, I propose that the binding of other domains stabilizes the headpiece such that it does not immediately close once domain 17 no longer binds the active site.

5.4 Discussion

The interaction between tissue ECM and cell surface receptors is mediated by the recognition of various short sequences within ECM proteins. Since the sequences within tropoelastin bear no resemblance to any known integrin-binding ECM motifs, a series of earlier peptide-based studies successfully narrowed down the possibilities to two integrin-binding sites, at domains 17 and 36 [121–123]. However, full characterization of these interactions was hampered because the precise site within the integrin that mediates them was unknown, and no full atomistic tropoelastin-integrin complex was available for structural analysis. Thus, the current study leveraged the recent full atomistic computational model of tropoelastin [61, 62] and the crystal structure of integrin αvβ3 to probe their interaction in silico.

Prior computational studies used short cMD simulations to examine the extent to which fibronectin fragments opened the integrin headpiece [154, 258]. These simulations resulted in partially open headpieces, strongly suggesting that cMD does not sample the integrin’s conformational landscape sufficiently for opening to be observed within the timeframe of the simulation. This is unsurprising for systems as large as the integrin headpiece (let alone one in complex with a protein as large and flexible as tropoelastin), as large domain motions such as full headpiece opening are generally only observable on a microsecond timescale using cMD [270]. To address this, headpiece opening was approached from a probabilistic viewpoint. I successfully employed REMD to generate a variety of possible structures and examined whether the distribution of headpiece opening depended on the type of ions present and on the presence or absence of tropoelastin. To the best of my knowledge, the extent of headpiece opening here was greater than that of prior cMD simulations [154, 258, 271], demonstrating the utility of REMD for exploring domain-level conformational changes within large protein-protein complexes with a feasible amount of time and resources. The representative structures resulting from REMD were confirmed to stabilise over 100 ns of cMD. Given the limited conformational sampling of cMD, this indicated that a local energy minimum was reached in each case, highlighting the need for REMD for broad conformational exploration of large protein-protein complexes.

As tropoelastin is a highly flexible protein that is capable of inhabiting a variety of conformations [62], I first asked whether its interaction with αvβ3 is conformation-dependent. The TE2 conformation of tropoelastin was found to promote a greater proportion of open integrin headpiece structures than TE1 during REMD, which indicated that TE2 was the preferred binding conformation. However, since TE1 binding to αvβ3 did not abolish integrin headpiece opening and resulted in a high population of partially open structures, the possibility that TE1 also induces headpiece opening could not be ruled out. That two distinct conformations of tropoelastin promoted varying degrees of headpiece opening via separate integrin sites indicates unconventional binding mechanisms. As a highly flexible protein with domain-dependent intrinsic local disorder [44, 62], tropoelastin may bind its receptors in a fuzzy manner, which is the way in which fully and partly disordered proteins interact with their binding partners [272, 273]. Since fuzzy behavior of both the hydrophobic domains [32, 51, 52, 274] and the cross-linking domains [58] of tropoelastin has been observed during self-assembly, it is conceivable that this fuzzy behavior is also present during receptor binding. The formation of multiple transient electrostatic bonds and the variable structural fluctuations within tropoelastin during integrin binding observed here are also consistent with fuzzy binding [272, 273]; however, further studies are needed to confirm this.

In addition to the tropoelastin-integrin complexes, αvβ3 was examined in its medium- and low-affinity binding structures. Little difference was found between their proportions of open structures, in contrast to prior computational studies [154, 258]. As the previous studies used cMD rather than REMD, the lack of headpiece opening in those studies was likely due to limited conformational sampling. Indeed, the headpiece displayed less movement in the current cMD experiments than in the broader sampling of REMD. The similarity in headpiece opening between the integrin-only ensembles may also have arisen from the absence of the integrin thigh domains, which normally restrain headpiece movement. These were omitted to conserve computational resources; however, the lack of these constraints, coupled with REMD, may have facilitated broader conformational sampling than previously observed in silico [154, 258]. This suggests that the αvβ3 headpiece is capable of opening in the presence of both Mg²⁺ and Ca²⁺, and sheds light on why cells are still able to bind tropoelastin at a low level in the presence of calcium in vitro [121–123].

It has been hypothesized that tropoelastin binds integrins unconventionally, owing to its lack of known interactive sequences and the inability of RGD peptides to block tropoelastin-cell binding [121]. As such, the interactions between tropoelastin and αvβ3 were examined at the residue level. In particular, I focused on domain 17 due to its integrin-binding capabilities [121–123]. Within the TE2 configuration, domain 17 bound the α1 helix, which lies downstream of the RGD-binding site [153], rather than the β1-α1 loop. As TE2 facilitated the largest amount of headpiece opening, this suggests that the preferred tropoelastin binding site is the α1 helix. Conventionally, RGD binding to the β1-α1 loop causes slight atomic shifts that result in subsequent conformational changes and headpiece opening. Thus, if the preferred tropoelastin structure binds the α1 helix, this raises the question of the allosteric pathway through which headpiece opening is elicited. Both crystallographic and MD studies demonstrate that the joining of the α1 and α1’ helices, and the subsequent movements of the α7 helix, are key components of the headpiece opening mechanism [153, 154, 253, 258, 268, 275]. The current results are in alignment with this, as the REMD tropoelastin-integrin ensemble that contained more open configurations, αvβ3-TE2, also maintained greater α1 helicity than αvβ3-TE1. The greater fluctuation of the α7 helix and its distance from the α1 helix across the ensembles suggest that domain 17 promotes headpiece opening through a pathway similar to that of RGD ligands, albeit via another nearby site. This is the first atomistic-scale evidence to suggest that tropoelastin’s primary binding site on αvβ3 is not the RGD-binding site.

Lysines are thought to be key mediators of the tropoelastin-integrin interaction,

as both of tropoelastin’s currently known interactive sites contain lysines [121,

122] and that cell binding is greatly reduced after lysine point mutations [123].

135

However, the nature of these interactions was not fully explored due to the lack

of a high-resolution structure of tropoelastin at the time. Using the full atomistic

model of tropoelastin, I demonstrated that the lysines of tropoelastin are involved

in a variety of interactions with multiple integrin residues. The lysines of domain

17 formed salt bridges with D126 from the β1-α1 loop of the integrin. The role

of D126 during ligand binding is not completely clear; however, it is capable of losing and regaining coordination to ADMIDAS throughout the stages of headpiece opening during RGD binding [153]. The D126-ADMIDAS bond was not lost here, despite the interactions between D126 and the lysines of domain 17, K286 and K289. This suggests that the role of these salt bridges may instead be to anchor domain 17 to the integrin so that subsequent bonds can form. In addition to the D126 salt bridge,

the lysines of domain 17 formed hydrogen bonds with various β3 residues of the

β1-α1 loop and α1 helix depending on the initial tropoelastin structure. The

transiency of these bonds and interchangeability between the binding partners

is consistent with the numerous short-lived interactions that occur during fuzzy

binding [276,277].
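The transience of salt bridges such as D126-K286 is often quantified as an occupancy: the fraction of frames in which a lysine NZ atom sits within a distance cut-off of either carboxylate oxygen of the aspartate. The sketch below assumes a 4.0 Å cut-off, which is a commonly used choice but not necessarily the one used in this study. The function name and the toy coordinates are hypothetical.

```python
import numpy as np


def salt_bridge_occupancy(lys_nz, asp_oxygens, cutoff=4.0):
    """Fraction of frames in which a Lys NZ atom lies within `cutoff`
    (Angstrom, assumed) of either carboxylate oxygen of an Asp side chain.

    lys_nz      : (n_frames, 3) coordinates of the lysine NZ atom
    asp_oxygens : (n_frames, 2, 3) coordinates of Asp OD1 and OD2
    """
    dists = np.linalg.norm(asp_oxygens - lys_nz[:, None, :], axis=2)  # (n_frames, 2)
    formed = dists.min(axis=1) < cutoff  # bridge present if either oxygen is close
    return formed.mean(), formed


# Hypothetical 4-frame example: NZ fixed at the origin, OD1 moving along x, OD2 far away
lys_nz = np.zeros((4, 3))
asp = np.zeros((4, 2, 3))
asp[:, 0, 0] = [3.0, 5.0, 3.5, 8.0]
asp[:, 1, 0] = 9.0

occupancy, per_frame = salt_bridge_occupancy(lys_nz, asp)
print(occupancy)  # 0.5
```

The per-frame boolean series returned alongside the occupancy can be reused to see how often the bridge breaks and re-forms, which is the behaviour expected of a fuzzy interaction.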

In addition to confirming the involvement of lysines, the present findings corrob-

orate in vitro evidence that tyrosines are not crucial for tropoelastin’s interaction

with αv integrins [123]. The sole tyrosine within domain 17 did not contact the

interactive site within αvβ3-TE2 and accounted for only a small fraction of the

bonds within αvβ3-TE1. Among the other residues, the arginines of domain 36 appear to have a receptor-binding role, as they made up a substantial proportion of the hydrogen bonds observed between domain 36 and αvβ3. This is notable, as arginines have primarily been examined in the context of tropoelastin’s structural stability [36]. Considering the results from this

study in light of the cell binding ability of the C-terminal GRKRK sequence [121]

and its high conservation across mammals [245], I propose that the arginines of

domain 36 are of functional significance in integrin binding.
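Statements such as "arginines made up a substantial proportion of the hydrogen bonds" reduce to a simple tally over the bonds detected across frames. The sketch below is a minimal illustration under two assumptions: each detected bond has already been labelled with the residue types of its two partners, and the bond list shown is hypothetical.

```python
from collections import Counter


def hbond_share_by_residue_type(hbonds):
    """Fraction of observed hydrogen bonds contributed by each residue type.

    hbonds : list of (tropoelastin_residue_type, integrin_residue_type) tuples,
             one entry per bond observed in each frame.
    """
    counts = Counter(te_res for te_res, _ in hbonds)
    total = sum(counts.values())
    return {res: n / total for res, n in counts.items()}


# Hypothetical bond records pooled over frames for one tropoelastin domain
hbonds = [("ARG", "ASP"), ("ARG", "GLU"), ("LYS", "ASP"), ("GLY", "SER")]
print(hbond_share_by_residue_type(hbonds))
# {'ARG': 0.5, 'LYS': 0.25, 'GLY': 0.25}
```

Because each bond is counted once per frame in which it is observed, long-lived bonds weigh more heavily than fleeting ones in this kind of summary.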

Although lysines are important for the tropoelastin-integrin interaction [123], mature elastin is extensively cross-linked via its lysines [34, 111] and thus contains fewer unmodified lysines available to bind integrins. While domain 17 of tropoelastin binds integrins, its participation in elastin-integrin interactions is less clear, as it is capable of participating in cross-links [34, 111]. This study is the first to provide evidence for the involvement of a non-cross-linking region, domain 20, in tropoelastin-integrin interactions. This finding is further supported by the contacts noted here between αvβ3 and other non-cross-linking regions, such as domains 16 and 26. As tropoelastin’s largest domain, domain 20 has a greater solvent-exposed surface area than other non-cross-linking domains [78], which renders it readily available for cell receptor contact. Additionally, domain 20 forms part of tropoelastin’s hinge region [47, 59], which regulates molecular flexibility [22]. As such, domain 20 is likely to have a key role in facilitating the conformational sampling necessary for cellular interactions. This flexibility may, in part, be preserved here by the short-lived hydrogen bonding between the backbones of tropoelastin’s small non-polar residues

and the integrin. Indeed, it has previously been suggested that backbone hydrogen

bonding may be a mechanism for the extension and recoil of elastomeric peptides

of similar composition to domain 20 [278]. Furthermore, the interaction of domain

20 and αvβ3 is significant, as much emphasis has been previously placed on the

involvement of cross-linking domains in tropoelastin-integrin interactions. The

ability of tropoelastin’s hydrophobic domains to bind integrins may explain how

elastin can interact with cells if its lysines are unavailable for binding. Future

studies should test domain 20 in isolation to verify that it is capable of cell bind-

ing, as this may provide avenues for investigating the cell interactive properties of

hydrophobic domains.
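The short-lived backbone hydrogen bonds described above are typically detected with a geometric criterion and characterised by their lifetimes. The sketch below illustrates both steps. It assumes the cut-offs 3.5 Å (donor-acceptor distance) and 150° (donor-H-acceptor angle), which are common choices rather than the values used in this study. The function names and the presence series are hypothetical.

```python
import numpy as np


def is_hbond(donor, hydrogen, acceptor, max_da=3.5, min_angle=150.0):
    """Geometric hydrogen-bond test (assumed cut-offs, in Angstrom and degrees):
    donor-acceptor distance below max_da and donor-H-acceptor angle above min_angle."""
    v_hd = donor - hydrogen
    v_ha = acceptor - hydrogen
    cos_angle = np.dot(v_hd, v_ha) / (np.linalg.norm(v_hd) * np.linalg.norm(v_ha))
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(np.linalg.norm(acceptor - donor) < max_da and angle > min_angle)


def mean_lifetime(present, dt=1.0):
    """Mean duration (in units of dt) of uninterrupted runs in which a bond is present."""
    runs, current = [], 0
    for formed in present:
        if formed:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return dt * sum(runs) / len(runs) if runs else 0.0


# Backbone N-H ... O=C arranged linearly, 2.9 Angstrom apart
n_atom, h_atom, o_atom = np.array([0.0, 0, 0]), np.array([1.0, 0, 0]), np.array([2.9, 0, 0])
print(is_hbond(n_atom, h_atom, o_atom))  # True

# Hypothetical per-frame presence of one bond, frames saved every 2 ps
series = [True, True, False, True, False, False, True, True, True]
print(mean_lifetime(series, dt=2.0))  # 4.0
```

Counting uninterrupted runs is the simplest lifetime definition. A more tolerant variant would ignore brief single-frame breaks before closing a run.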

The local and global structural sampling of tropoelastin has been of increasing interest for extrapolating its behaviour in the context of self-assembly [22, 61, 247].

Here, similar principles were applied to examine the structural sampling of tropoe-

lastin when bound to αvβ3. PCA demonstrated that TE1 and TE2 underwent

dissimilar modes of local structural sampling during REMD due to decreases in


atomic fluctuation at the majority of tropoelastin sites that bound, or were in

close proximity to, αvβ3. The local fluctuations of TE1 seen here deviated

from tropoelastin in isolation, whilst TE2 appeared to maintain an overall fluctu-

ation pattern that resembled that of isolated tropoelastin [62]. This indicates that

TE2 bears a greater resemblance to the most representative structure of tropoelastin even when bound to the integrin. This likeness was further confirmed by examining the global motions of the molecules: the previously described C-terminal scissors-twist motion [22, 61, 62] was present in TE2 but not in TE1. In the context of receptor interactions, it is possible

that the scissors-twist facilitates the increased frequency of contact of C-terminal

domains 30-36 observed here, which could help stabilize this interaction if tropoelastin is indeed a fuzzy binder. Its preservation in the bound state suggests that it is important for the interaction with αvβ3, as the intrinsic motions accessible to ligands and their receptors are primary drivers of protein binding [279].
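The PCA comparisons above rest on diagonalising the covariance matrix of atomic displacements. The sketch below shows that calculation and makes two assumptions. First, the trajectory has already been superposed onto a reference structure; the superposition step is omitted here. Second, the toy motion is synthetic; it is not tropoelastin data. The sketch does not identify motions such as the scissors-twist. Those are recognised by inspecting the motion along the leading components.

```python
import numpy as np


def trajectory_pca(coords):
    """PCA of Cartesian coordinates from a pre-aligned trajectory.

    coords : (n_frames, n_atoms, 3) array, already superposed onto a reference
    Returns eigenvalues (descending), eigenvectors (as columns) and per-frame projections.
    """
    n_frames = coords.shape[0]
    X = coords.reshape(n_frames, -1)       # flatten to (n_frames, 3 * n_atoms)
    X = X - X.mean(axis=0)                 # remove the average structure
    cov = X.T @ X / (n_frames - 1)         # covariance of atomic displacements
    evals, evecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    projections = X @ evecs                # principal component scores per frame
    return evals, evecs, projections


# Synthetic example: atom 0 oscillates along x on top of small isotropic noise
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
coords = np.zeros((200, 5, 3))
coords[:, 0, 0] = 3.0 * np.sin(t)
coords += rng.normal(scale=0.1, size=coords.shape)

evals, evecs, proj = trajectory_pca(coords)
frac = evals / evals.sum()
print(frac[0])  # fraction of total variance captured by PC1
```

Comparing the variance fractions and leading eigenvectors between two ensembles, such as TE1 and TE2, shows whether they sample similar collective motions.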

The tropoelastin monomer contains high random coil content, imbuing it with

flexibility to undergo the appropriate conformational sampling required for self-

assembly [32]. A notable structural change that occurs as a consequence of cross-

linking is the increased α-helicity of the cross-linking domains [52,58]. This change

is a requirement for the formation of desmosine cross-links, which only occur if

the lysines on the face of the α-helices are aligned correctly [53]. The formation of

α-helices during cMD here was not favorable for the interaction with αvβ3. This is consistent with the proposed role of α-helices in tropoelastin, which is to rigidly align lysines for cross-linking rather than to interact with receptors. The sampling of multiple

bonds between tropoelastin and αvβ3 appears to be key for their interaction, as

evidenced by this study’s comparison between REMD and cMD, and is achievable

when the interactive regions are not rigid. This builds on prior simulations that

noted the high structural sampling of domain 17 [62], and strongly implies that

tropoelastin needs to continue sampling conformational space to elicit integrin


headpiece opening via its cross-linking domains.
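The helicity comparisons between cMD and REMD reduce to counting helical assignments per residue and per frame. The sketch below assumes secondary structure has already been assigned as DSSP-style strings, one character per residue. It counts only the α-helix code "H" by default, although whether 3-10 ("G") and π ("I") helices should also be included is a modelling choice. The frames shown are hypothetical.

```python
def helicity(ss_per_frame, residues, helical_codes="H"):
    """Mean fraction of the selected residues assigned a helical code across frames.

    ss_per_frame  : list of DSSP-style strings, one character per residue per frame
    residues      : zero-based indices of the residues of interest (e.g. one domain)
    helical_codes : DSSP codes counted as helical; "H" is the alpha-helix code,
                    "HGI" would also count 3-10 and pi helices
    """
    fractions = []
    for ss in ss_per_frame:
        selected = [ss[i] for i in residues]
        fractions.append(sum(code in helical_codes for code in selected) / len(selected))
    return sum(fractions) / len(fractions)


# Hypothetical three-frame assignment for an 8-residue segment
frames = ["CCHHHHCC", "CCHHHCCC", "CCCCCCCC"]
print(helicity(frames, residues=range(2, 6)))  # (1.0 + 0.75 + 0.0) / 3
```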

Overall, both the REMD and cMD data support a model where tropoelastin re-

quires multiple contact sites throughout the integrin headpiece to elicit biological

functionality. Comparable use of multiple binding sites is seen for fibronectin-

integrin interactions; fibronectin comprises non-RGD synergy regions that coop-

erate to promote strong receptor binding [280, 281]. Further evidence for such

synergy sites in tropoelastin comes from tropoelastin peptide studies that found

progressively fewer cellular interactions with sequential truncations of the tropoelastin sequence [122]. On this basis, I propose that no single domain is wholly

responsible for facilitating tropoelastin-cellular responses, and that instead mul-

tiple defined sites on tropoelastin cooperate to bind the integrin. Future studies

could incorporate a wider range of tropoelastin conformations to examine the full

array of domains capable of interacting with the integrin at its binding site as

well as at other regions to establish the importance of these additional points of

contact.
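Establishing which domains contact the integrin, and how often, amounts to a per-domain contact frequency over the ensemble. The sketch below assumes a 4.0 Å heavy-atom cut-off, which is a common but assumed value. It uses a brute-force distance calculation that suits only small toy systems; for full tropoelastin-integrin complexes a neighbour search (for example a k-d tree) would be preferable. The domain labels and coordinates are hypothetical.

```python
import numpy as np


def domain_contact_frequency(te_coords, integrin_coords, domain_of_atom, cutoff=4.0):
    """Fraction of frames in which each tropoelastin domain has at least one
    atom within `cutoff` (Angstrom, assumed) of any integrin atom.

    te_coords       : (n_frames, n_te_atoms, 3)
    integrin_coords : (n_frames, n_int_atoms, 3)
    domain_of_atom  : (n_te_atoms,) domain number of each tropoelastin atom
    """
    # All pairwise distances per frame: (n_frames, n_te_atoms, n_int_atoms)
    diff = te_coords[:, :, None, :] - integrin_coords[:, None, :, :]
    close = np.linalg.norm(diff, axis=-1).min(axis=2) < cutoff  # (n_frames, n_te_atoms)
    freq = {}
    for dom in np.unique(domain_of_atom):
        freq[int(dom)] = float(close[:, domain_of_atom == dom].any(axis=1).mean())
    return freq


# Hypothetical 2-frame example: one integrin atom at the origin
te = np.zeros((2, 3, 3))
te[0, :, 0] = [3.0, 10.0, 20.0]
te[1, :, 0] = [3.5, 10.0, 2.0]
integ = np.zeros((2, 1, 3))
domains = np.array([17, 17, 20])
print(domain_contact_frequency(te, integ, domains))  # {17: 1.0, 20: 0.5}
```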


Chapter 6

Discussion


6.1 General discussion

This thesis builds a series of computational models that explore tropoelastin’s sequence, structure and mobility with respect to its functionality. It utilises tropoelastin’s recently derived full-atomistic structure to conduct these studies, whilst additionally leveraging prior biological experiments to shed light on aspects of tropoelastin’s behaviour that have remained unexplored due to the incompatibility of current structural methodologies with tropoelastin’s flexibility.

6.2 Allysine modifications and their implication for self-assembly

The structure of WT tropoelastin has been explored in detail using experimental [36,37,59,233] and MD methodologies [22,61,62], several of which noted that its structure and intrinsic motions are easily perturbed by a variety of mutations. Furthermore, such changes have repercussions on the macromolecular scale, altering coacervation and biomaterial formation [36,37,233]. As it is conceivable that natural modifications may also alter tropoelastin’s structure and molecular motions, I explored whether the naturally occurring allysine modifications that are required for cross-linking are also capable of causing domain displacements and altering tropoelastin’s overall molecular motions. I selected lysine sites for modification that prior experimental data had identified as either cross-links in native elastin [53, 235] or cross-linking hotspots in synthetic elastin studies [85,110].

I conducted REMD and cMD on three variants of modified tropoelastin, ALK353,

ALK507 and 5ALK. In all cases, I observed that both single and multiple ally-

sine modifications were capable of perturbing the canonical structural ensemble of


WT tropoelastin. I utilised PCA and NMA to demonstrate that these ensembles differed significantly not only from WT [61] but also from one another. This indicates a surprising diversity in the motions of natural tropoelastin that had not been considered prior to this study [61, 62]. Importantly, the broad variety of motions arising from the three tropoelastin molecules surveyed here provides a possible mechanism for the heterogeneity seen in recent mass spectrometry studies of native elastin [34, 111]. In addition to global structural changes, I also identified localised shifts in protein backbone fluctuation between tropoelastin containing single allysines and 5ALK, which contains five allysines. Interestingly, ALK353 and ALK507 exhibited an approximately 3-fold decrease in overall local fluctuation relative to 5ALK, which suggests that tropoelastin’s flexibility is enhanced by multiple allysine modifications. Curiously, ALK353 and ALK507 also presented with lowered backbone fluctuation relative to WT, similar to that displayed by tropoelastin mutants with altered functionality [22, 36, 37, 61, 233]. As the elevated conformational sampling of tropoelastin

is thought to be crucial for its self-assembly into higher order structures [22, 61],

I hypothesise that the dampened flexibility of ALK353 and ALK507 provides a checkpoint that hinders their incorporation into the growing elastin chain. Under this model, tropoelastin with multiple allysines is preferentially incorporated owing to its high conformational sampling. This permits the formation of a stronger, more extensively cross-linked chain, whilst tropoelastin with low mobility is less likely to undergo assembly.
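The fold-change comparisons of local fluctuation above are typically based on per-residue root-mean-square fluctuation (RMSF) about the mean structure. The sketch below compares two ensembles by the ratio of their mean RMSF. This is one possible summary statistic, assumed here for illustration; it is not necessarily the exact metric used in Chapter 3. The sketch also assumes that the coordinates are already superposed. The toy ensembles are hypothetical.

```python
import numpy as np


def rmsf(coords):
    """Per-atom root-mean-square fluctuation about the mean structure.

    coords : (n_frames, n_atoms, 3), assumed already superposed onto a reference
    """
    mean = coords.mean(axis=0)
    return np.sqrt(((coords - mean) ** 2).sum(axis=2).mean(axis=0))


def fold_change(rmsf_a, rmsf_b):
    """Ratio of the overall (mean) fluctuation of ensemble b to that of ensemble a."""
    return rmsf_b.mean() / rmsf_a.mean()


# Hypothetical single-atom ensembles oscillating by +/-1 and +/-3 Angstrom along x
a = np.zeros((2, 1, 3))
a[:, 0, 0] = [1.0, -1.0]
b = np.zeros((2, 1, 3))
b[:, 0, 0] = [3.0, -3.0]
print(fold_change(rmsf(a), rmsf(b)))  # 3.0
```

Averaging RMSF over residues gives a single number per ensemble. Comparing the full per-residue profiles instead reveals where along the sequence the fluctuations change.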

6.3 Updating the head-to-tail model of assembly

The head-to-tail model of elastin assembly was proposed based on SAXS/SANS

analysis that identified the global structural envelope of tropoelastin [59] and


the only elastin cross-link discovered at the time, which originated from porcine

elastin [53]. However, the model has a number of inconsistencies in light of recent data. Technical improvements in mass spectrometry have allowed the discovery of numerous cross-linking sites within native human elastin [34], none of which correspond to those found in porcine elastin [53]; this is likely due to the

differences between porcine and human tropoelastin [245]. Moreover, the positions

of the cross-linking sites within the recently modelled full-atomistic structure of

tropoelastin are not in agreement with those postulated based on the SAXS/SANS

envelope [59]. Further to this, models of early-stage coacervation [32, 78] have noted heterogeneity in the interactions between tropoelastin domains. In particular, a coarse-grained study of 40 tropoelastin monomers observed a variety of interactions forming between the monomers [78].

I explored whether the head-to-tail model would hold true at the earliest stage of elastin assembly: nucleation. I examined the interactions between a variety of domains identified in synthetic studies [110], recent native elastin publications [34,111], and

the canonical porcine cross-link that the head-to-tail model was partially based

on [53]. I generated almost 20,000 dimers via protein-protein docking using the three most representative conformations from WT tropoelastin’s structural ensemble, giving rise to the largest number of tropoelastin dimers examined in a single study to date. I did not observe the precise head-to-tail interaction that has previously been postulated [59]. However, I did observe a class of dimer that could be termed “head-to-middle”, which was most similar to the model proposed by Baldock and colleagues and formed the majority of the dimers in the current study. To better understand the factors involved in forming head-to-tail dimers, I employed logistic regression, a machine learning approach. This analysis indicated that geometric data, such as the interacting domains and the initial conformation, were the most important determinants of the resultant dimer. Although I examined only early-stage coacervation here, extended coacervation studies indicate that a variety of associations persist until at least 10 fs [78]. It is therefore likely that the nucleation configurations hold their structure for some time, which supports the current approach. Overall, this study presents an important step

in unifying the decade-old head-to-tail model with current data from experimen-

tal and computational sources, simultaneously demonstrating the application of

machine learning in untangling protein-protein interactions.
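The logistic regression analysis above relates dimer features to a binary outcome, such as whether a docked dimer is head-to-tail. The sketch below is a minimal illustration under stated assumptions: standardised numerical features, a small L2 penalty, and plain batch gradient descent. It is not the pipeline used in this thesis; in practice a library implementation such as scikit-learn’s `LogisticRegression` would normally be used. The synthetic dataset is hypothetical. It contains one informative feature and one noise feature, so that the fitted coefficients show how feature importance is read off.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=5000, l2=1e-3):
    """Binary logistic regression fitted by batch gradient descent.

    X : (n_samples, n_features) standardised features
    y : (n_samples,) labels in {0, 1}
    Returns weights with the intercept first.
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))          # predicted probabilities
        grad = Xb.T @ (p - y) / len(y)             # gradient of the mean log-loss
        grad[1:] += l2 * w[1:]                     # L2 penalty, intercept excluded
        w -= lr * grad
    return w


def predict_proba(w, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


# Hypothetical data: the label depends on feature 0 only; feature 1 is noise
rng = np.random.default_rng(1)
n = 400
x_informative = rng.normal(size=n)
x_noise = rng.normal(size=n)
y = (x_informative + 0.3 * rng.normal(size=n) > 0).astype(float)
X = np.column_stack([x_informative, x_noise])

w = fit_logistic(X, y)
print(w)  # intercept, informative coefficient, noise coefficient
```

Because the features are on a common scale, the magnitude of each coefficient gives a rough ranking of how strongly that feature influences the predicted outcome.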

6.4 Fuzzy binding mechanisms of tropoelastin and integrins

Tropoelastin is known to bind a number of integrins [121–123, 126], facilitating

events that are crucial to wound repair and tissue regeneration. Nevertheless,

there is a distinct lack of fundamental knowledge pertaining to the nature of this

interaction. Firstly, the RGD sequences of other ECM proteins are responsible for binding integrins; however, tropoelastin is devoid of this motif. Instead, the regions involved in these interactions have been pinpointed to domains 17-18 and 36, which are similar only in that they contain lysines. Secondly, because a tropoelastin-integrin crystal structure is unavailable owing to tropoelastin’s inherent flexibility, the integrin site that interacts with tropoelastin has remained unknown. Based on prior studies detailing the conformational

changes that are crucial for assessing integrin activation [154, 258, 268] and inte-

grin residues that participate in non-RGD interactions with other surfaces [263],

I sought to construct a model of tropoelastin-integrin binding that could explain

the observations detailed above.

By comparing the REMD ensembles of two tropoelastin structures that had been

docked to integrin αvβ3, I inspected the binding mechanism from a stochastic per-

spective. Quantification of the range of tropoelastin structures and integrin head-

pieces in various states of opening demonstrated that tropoelastin has a preferred

binding conformation that strongly promotes integrin activation. More interestingly, the fact that the second structure did not abolish headpiece opening strongly suggested that tropoelastin interacts with integrins in a fuzzy manner [276], forming

many transient bonds that still allow tropoelastin to exhibit conformational sam-

pling. Indeed, I observed that multiple tropoelastin sites were capable of contact-

ing the integrin, involving domains 17 and 36, as previously observed [121–123], as

well as domain 20. Curiously, domain 20 is a hydrophobic domain that does not

contain the positively charged lysines that appeared to be responsible for domains

17 and 36 interacting with αvβ3 in the current study. This provides a possible mechanism by which highly cross-linked elastin, which may present limited lysines on its surface, can continue to interact with integrins in mature tissues. I narrowed down tropoelastin’s preferred binding site on αvβ3 to the α1 helix. If verified using crystallography, this site could be exploited in future wound healing and tissue repair therapeutics.
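The REMD ensembles compared in this chapter are generated by periodically attempting to swap configurations between replicas at neighbouring temperatures. The standard Metropolis acceptance probability for such a swap is $P = \min\{1, \exp[(\beta_i - \beta_j)(E_i - E_j)]\}$. The sketch below reproduces this criterion for reference. It assumes potential energies in kcal/mol and uses the Boltzmann constant in those units. In practice the MD engine performs this step, and the exchange settings used in this thesis are not restated here.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance probability for swapping the configurations of
    replicas i and j in temperature REMD.

    e_i, e_j : potential energies of the configurations (kcal/mol)
    t_i, t_j : replica temperatures (K)
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_exchange(e_i, e_j, t_i, t_j, rng=random):
    """Accept or reject a proposed swap using a uniform random number."""
    return rng.random() < exchange_probability(e_i, e_j, t_i, t_j)


# Colder replica (300 K) holds the higher-energy configuration: swap always accepted
print(exchange_probability(-1000.0, -1050.0, 300.0, 310.0))  # 1.0
# Reverse situation: swap accepted only occasionally
print(exchange_probability(-1050.0, -1000.0, 300.0, 310.0))
```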

6.5 Future directions

This thesis enhances the current understanding of the delicate structure-function

relationship of tropoelastin with respect to the process of self-assembly. My find-

ing that allysines modify the structure and dynamics of tropoelastin indicates that

it is not sufficient to examine elastogenesis in the context of unmodified tropoe-

lastin. The impact of allysine modifications that I hypothesise here, including

the checkpoint theory presented in Chapter 3, can be further investigated using

large scale coarse-grained studies, similar to those discussed previously in this the-

sis. In that regard, a coarse-grained study utilising allysine modified tropoelastin

would also provide further insights into the nature of nucleation events and fibre

directionality, building upon the present dimer study.

An important avenue that this thesis has opened up is the exploration of tropoelastin-

receptor binding. I have demonstrated that it is feasible to model large scale

protein-protein interactions using ensemble methodologies. This study paves the way for a deeper understanding of the mechanisms of interaction between tropoelastin and its other ECM binding partners, such as GAGs and EBP, provided that crystallographically derived structures are available for them. The elucidation

of these interactive sites is crucial for furthering the field of tropoelastin-based

therapeutics, as detailed knowledge of the mechanisms through which tropoe-

lastin exerts its biological effects is required for optimising features such as the

presentation of its cell interactive sites.

A commonly criticised aspect of computational methodologies is that their findings are often unverified. Here, I have instead used experimental data to inform my modelling, rather than using simulations purely as discovery tools. Thus,

the models developed in this thesis unite prior experimental data with the full-

atomistic structure of tropoelastin to examine its functionality through an atomic-scale lens. This has made it possible to better explain prior findings whilst also revealing new events. Because the self-interactive and cell receptor binding behaviours of tropoelastin can only partially be explained at a macroscopic level, I highlight the need for future work to incorporate molecular data, whilst acknowledging the importance of experimentally robust studies.


References

[1] Robert P Mecham and Elaine C Davis. Elastic fiber structure and assembly. Peter D Yurchenko, David E Birk, Robert P Mecham (eds), New York, Academic Press, 281–314, 1994.

[2] Russell Ross and Paul Bornstein. The elastic fiber: I. The separation and partial characterization of its macromolecular components. The Journal of Cell Biology, 40(2):366–381, 1969.

[3] Hiromi Yanagisawa and Jessica Wagenseil. Elastic fibers and biomechanics of the aorta: Insights from mouse studies. Matrix Biology, 85:160–172, 2019.

[4] Jaime Moore and Susan Thibeault. Insights into the role of elastin in vocal fold health and disease. Journal of Voice, 26(3):269–275, 2012.

[5] Robert P Mecham. Elastin in lung development and disease pathogenesis. Matrix Biology, 73:6–20, 2018.

[6] SD Shapiro, SK Endicott, MA Province, JA Pierce, EJ Campbell, et al. Marked longevity of human lung parenchymal elastic fibers deduced from prevalence of D-aspartate and nuclear weapons-related radiocarbon. The Journal of Clinical Investigation, 87(5):1828–1834, 1991.

[7] Janet T Powell, Nicholas Vine, and Margot Crossman. On the accumulation of D-aspartate in elastin and other proteins of the ageing aorta. Atherosclerosis, 97(2-3):201–208, 1992.

[8] Muhammad M Bashir, Zena Indik, Helena Yeh, Norma Ornstein-Goldstein, Joan C Rosenbloom, William Abrams, M Fazio, J Uitto, and J Rosenbloom. Characterization of the complete human elastin gene. Delineation of unusual features in the 5’-flanking region. Journal of Biological Chemistry, 264(15):8887–8891, 1989.

[9] Zena Indik, Helena Yeh, Norma Ornstein-Goldstein, Paul Sheppard, Noel Anderson, Joan C Rosenbloom, Leena Peltonen, and Joel Rosenbloom. Alternative splicing of human elastin mRNA indicated by sequence analysis of cloned genomic and complementary DNA. Proceedings of the National Academy of Sciences, 84(16):5680–5684, 1987.

[10] Zena Indik, Helena Yeh, Norma Ornstein-Goldstein, Umberto Kucich, William Abrams, Joan C Rosenbloom, and Joel Rosenbloom. Structure of the elastin gene and alternative splicing of elastin mRNA: implications for human disease. American Journal of Medical Genetics, 34(1):81–90, 1989.


[11] MJ Fazio, DR Olsen, H Kuivaniemi, ML Chu, JM Davidson, J Rosenbloom, and J Uitto. Isolation and characterization of human elastin cDNAs, and age-associated variation in elastin gene expression in cultured skin fibroblasts. Laboratory Investigation; a Journal of Technical Methods and Pathology, 58(3):270–277, 1988.

[12] William C Parks, Jill D Roby, Leeju C Wu, and Leonard E Gross. Cellular expression of tropoelastin mRNA splice variants. Matrix, 12(2):156–162, 1992.

[13] Sean E Reichheld, Lisa D Muiznieks, Robert Lu, Simon Sharpe, and Fred W Keeley. Sequence variants of human tropoelastin affecting assembly, structural characteristics and functional properties of polymeric elastin in health and disease. Matrix Biology, 84:68–80, 2019.

[14] Laurent Debelle and AM Tamburro. Elastin: molecular description and function. The International Journal of Biochemistry & Cell Biology, 31(2):261–272, 1999.

[15] Zhou Chen, Mi Hee Shin, Young Ji Moon, Se Rah Lee, Yeon Kyung Kim, Jo-Eun Seo, Ji Eun Kim, Kyu Han Kim, and Jin Ho Chung. Modulation of elastin exon 26a mRNA and protein expression in human skin in vivo. Experimental Dermatology, 18(4):378–386, 2009.

[16] Helen Piontkivska, Yi Zhang, Eric D Green, Laura Elnitski, NISC Comparative Sequencing Program, et al. Multi-species sequence comparison reveals dynamic evolution of the elastin gene that has involved purifying selection and lineage-specific insertions/deletions. BMC Genomics, 5(1):31, 2004.

[17] Eiichi Hirano, Russell H Knutsen, Hideki Sugitani, Christopher H Ciliberto, and Robert P Mecham. Functional rescue of elastin insufficiency in mice by the human elastin gene: implications for mouse models of human disease. Circulation Research, 101(5):523–531, 2007.

[18] William C Parks and Susan B Deak. Tropoelastin heterogeneity: implications for protein function and disease. Elastic, 1(399–406):2, 1990.

[19] Sacha A Jensen, Bernadette Vrhovski, and Anthony S Weiss. Domain 26 of tropoelastin plays a dominant role in association by coacervation. Journal of Biological Chemistry, 275(37):28449–28454, 2000.

[20] Beth A Kozel, Hiroshi Wachi, Elaine C Davis, and Robert P Mecham. Domains in tropoelastin that mediate elastin deposition in vitro and in vivo. Journal of Biological Chemistry, 278(20):18491–18498, 2003.

[21] Lisa D Muiznieks, Ming Miao, Eva E Sitarz, and Fred W Keeley. Contribution of domain 30 of tropoelastin to elastic fiber formation and material elasticity. Biopolymers, 105(5):267–275, 2016.

[22] Giselle C Yeo, Anna Tarakanova, Clair Baldock, Steven G Wise, Markus J Buehler, and Anthony S Weiss. Subtle balance of tropoelastin molecular shape and flexibility regulates dynamics and hierarchical assembly. Science Advances, 2(2):e1501145, 2016.


[23] Ming Miao, Sean E Reichheld, Lisa D Muiznieks, Eva E Sitarz, Simon Sharpe, and Fred W Keeley. Single nucleotide polymorphisms and domain/splice variants modulate assembly and elastomeric properties of human elastin. Implications for tissue specificity and durability of elastic tissue. Biopolymers, 107(5):e23007, 2017.

[24] Bernadette Vrhovski, Sacha Jensen, and Anthony S Weiss. Coacervation characteristics of recombinant human tropoelastin. European Journal of Biochemistry, 250(1):92–98, 1997.

[25] Judith Ann Foster, Eveline Bruenger, William R Gray, and Lawrence B Sandberg. Isolation and amino acid sequences of tropoelastin peptides. Journal of Biological Chemistry, 248(8):2876–2879, 1973.

[26] William R Gray, Lawrence B Sandberg, and Judit A Foster. Molecular model for elastin structure and function. Nature, 246(5434):461–466, 1973.

[27] Lawrence B Sandberg, Terril B Wolt, and John G Leslie. Quantitation of elastin through measurement of its pentapeptide content. Biochemical and Biophysical Research Communications, 136(2):672–678, 1986.

[28] Rao S Rapaka, K Okamoto, and DW Urry. Coacervation properties in sequential polypeptide models of elastin: Synthesis of H-(Ala-Pro-Gly-Gly)n-Val-OMe and H-(Ala-Pro-Gly-Val-Gly)n-Val-OMe. International Journal of Peptide and Protein Research, 12(2):81–92, 1978.

[29] Bernadette Vrhovski and Anthony S Weiss. Biochemistry of tropoelastin. European Journal of Biochemistry, 258(1):1–18, 1998.

[30] Prachumporn Toonkool, Sacha A Jensen, Adam L Maxwell, and Anthony S Weiss. Hydrophobic domains of human tropoelastin interact in a context-dependent manner. Journal of Biological Chemistry, 276(48):44575–44580, 2001.

[31] Lisa D Muiznieks, Anthony S Weiss, and Fred W Keeley. Structural disorder and dynamics of elastin. Biochemistry and Cell Biology, 88(2):239–250, 2010.

[32] Sarah Rauscher and Regis Pomes. The liquid structure of elastin. eLife, 6:e26526, 2017.

[33] Sarah Rauscher and Regis Pomes. Structural disorder and protein elasticity. In Fuzziness, pages 159–183. Springer, 2012.

[34] Tobias Hedtke, Christoph U Schrader, Andrea Heinz, Wolfgang Hoehenwarter, Jurgen Brinckmann, Thomas Groth, and Christian EH Schmelzer. A comprehensive map of human elastin cross-linking during elastogenesis. The FEBS Journal, 286(18):3594–3610, 2019.

[35] Patricia L Brown, Lisa Mecham, Clarina Tisdale, and Robert P Mecham. The cysteine residues in the carboxy terminal domain of tropoelastin form an intrachain disulfide bond that stabilizes a loop structure and positively charged pocket. Biochemical and Biophysical Research Communications, 186(1):549–555, 1992.


[36] Giselle C Yeo, Clair Baldock, Steven G Wise, and Anthony S Weiss. A negatively charged residue stabilizes the tropoelastin N-terminal region for elastic fiber assembly. Journal of Biological Chemistry, 289(50):34815–34826, 2014.

[37] Giselle C Yeo, Clair Baldock, Steven G Wise, and Anthony S Weiss. Targeted modulation of tropoelastin structure and assembly. ACS Biomaterials Science & Engineering, 3(11):2832–2844, 2017.

[38] Lisa D Muiznieks and Anthony S Weiss. Flexibility in the solution structure of human tropoelastin. Biochemistry, 46(27):8196–8205, 2007.

[39] David He, Ming Miao, Eva E Sitarz, Lisa D Muiznieks, Sean Reichheld, Richard J Stahl, Fred W Keeley, and John Parkinson. Polymorphisms in the human tropoelastin gene modify in vitro self-assembly and mechanical properties of elastin-like polypeptides. PLoS One, 7(9):e46130, 2012.

[40] Laurent Debelle, Alain JP Alix, Marie-Paule Jacob, Jean-Pierre Huvenne, Maurice Berjot, Bernard Sombret, and Pierre Legrand. Bovine elastin and κ-elastin secondary structure determination by optical spectroscopies. Journal of Biological Chemistry, 270(44):26099–26103, 1995.

[41] Ellen Green, Richard Ellis, and Peter Winlove. The molecular structure and physical properties of elastin fibers as revealed by Raman microspectroscopy. Biopolymers: Original Research on Biomolecules, 89(11):931–940, 2008.

[42] Vesna Serrano, Wenge Liu, and Stefan Franzen. An infrared spectroscopic study of the conformational transition of elastin-like polypeptides. Biophysical Journal, 93(7):2429–2435, 2007.

[43] JR Lyerla Jr and DA Torchia. Molecular mobility and structure of elastin deduced from the solvent and temperature dependence of carbon-13 magnetic resonance relaxation data. Biochemistry, 14(23):5175–5183, 1975.

[44] Joel P Mackay, Lisa D Muiznieks, Prachumporn Toonkool, and Anthony S Weiss. The hydrophobic domain 26 of human tropoelastin is unstructured in solution. Journal of Structural Biology, 150(2):154–162, 2005.

[45] Alex Kentsis and Tobin R Sosnick. Trifluoroethanol promotes helix formation by destabilizing backbone exposure: desolvation rather than native hydrogen bonding defines the kinetic pathway of dimeric coiled coil folding. Biochemistry, 37(41):14613–14622, 1998.

[46] Herald Reiersen and Anthony R Rees. Trifluoroethanol may form a solvent matrix for assisted hydrophobic interactions between peptide side chains. Protein Engineering, 13(11):739–743, 2000.

[47] Ming Miao, Judith T Cirulis, Shaun Lee, and Fred W Keeley. Structural determinants of cross-linking and hydrophobic domains for self-assembly of elastin-like polypeptides. Biochemistry, 44(43):14367–14375, 2005.

[48] Laurent Debelle, Alain JP Alix, Shao M Wei, Marie-Paule Jacob, Jean-Pierre Huvenne, Maurice Berjot, and Pierre Legrand. The secondary structure and architecture of human elastin. European Journal of Biochemistry, 258(2):533–539, 1998.

[49] Brigida Bochicchio, Antonietta Pepe, and Antonio M Tamburro. Investigating by CD the molecular mechanism of elasticity of elastomeric proteins. Chirality: The Pharmacological, Biological, and Chemical Consequences of Molecular Asymmetry, 20(9):985–994, 2008.

[50] Sarah Rauscher, Stephanie Baud, Ming Miao, Fred W Keeley, and Regis Pomes. Proline and glycine control protein self-organization into elastomeric or amyloid fibrils. Structure, 14(11):1667–1676, 2006.

[51] Stefan Roberts, Michael Dzuricky, and Ashutosh Chilkoti. Elastin-like polypeptides as models of intrinsically disordered proteins. FEBS Letters, 589(19):2477–2486, 2015.

[52] Sean E Reichheld, Lisa D Muiznieks, Fred W Keeley, and Simon Sharpe. Direct observation of structure and dynamics during phase separation of an elastomeric protein. Proceedings of the National Academy of Sciences, 114(22):E4408–E4415, 2017.

[53] Patricia Brown-Augsburger, Clarina Tisdale, Thomas Broekelmann, Carolyn Sloan, and Robert P Mecham. Identification of an elastin cross-linking domain that joins three peptide chains. Possible role in nucleated assembly. Journal of Biological Chemistry, 270(30):17778–17783, 1995.

[54] An-Suei Yang and Barry Honig. Free energy determinants of secondary structure formation: I. α-helices. Journal of Molecular Biology, 252(3):351–365, 1995.

[55] Franc Avbelj. Amino acid conformational preferences and solvation of polar backbone atoms in peptides and proteins. Journal of Molecular Biology, 300(5):1335–1359, 2000.

[56] Peizhi Luo and Robert L Baldwin. Mechanism of helix induction by trifluoroethanol: a framework for extrapolating the helix-forming properties of peptides from trifluoroethanol/water mixtures back to water. Biochemistry, 36(27):8413–8421, 1997.

[57] Antonio Mario Tamburro, Antonietta Pepe, and Brigida Bochicchio. Localizing α-helices in human tropoelastin: assembly of the elastin “puzzle”. Biochemistry, 45(31):9518–9530, 2006.

[58] Sean E Reichheld, Lisa D Muiznieks, Richard Stahl, Karen Simonetti, Simon Sharpe, and Fred W Keeley. Conformational transitions of the cross-linking domains of elastin during self-assembly. Journal of Biological Chemistry, 289(14):10057–10068, 2014.

[59] Clair Baldock, Andres F Oberhauser, Liang Ma, Donna Lammie, Veronique Siegler, Suzanne M Mithieux, Yidong Tu, John Yuen Ho Chow, Farhana Suleman, Marc Malfois, et al. Shape of tropoelastin, the highly extensible protein that controls human tissue elasticity. Proceedings of the National Academy of Sciences, 108(11):4322–4327, 2011.

[60] Steven G Wise, Giselle C Yeo, Matti A Hiob, Jelena Rnjak-Kovacina, David L Kaplan, Martin KC Ng, and Anthony S Weiss. Tropoelastin: A versatile, bioactive assembly module. Acta Biomaterialia, 10(4):1532–1541, 2014.

[61] Anna Tarakanova, Giselle C Yeo, Clair Baldock, Anthony S Weiss, and Markus J Buehler. Molecular model of human tropoelastin and implications of associated mutations. Proceedings of the National Academy of Sciences, 115(28):7338–7343, 2018.

[62] Anna Tarakanova, Giselle C Yeo, Clair Baldock, Anthony S Weiss, and Markus J Buehler. Tropoelastin is a flexible molecule that retains its canonical shape. Macromolecular Bioscience, 19(3):1800250, 2019.

[63] Tilman Flock, Robert J Weatheritt, Natasha S Latysheva, and M Madan Babu. Controlling entropy to tune the functions of intrinsically disordered regions. Current Opinion in Structural Biology, 26:62–72, 2014.

[64] Howard Vindin, Suzanne M Mithieux, and Anthony S Weiss. Elastin architecture. Matrix Biology, 84:4–16, 2019.

[65] Patricia Brown-Augsburger, Thomas Broekelmann, Joel Rosenbloom, and Robert P Mecham. Functional domains on elastin and microfibril-associated glycoprotein involved in elastic fibre assembly. Biochemical Journal, 318(1):149–155, 1996.

[66] RP Mecham, BD Levy, SL Morris, JG Madaras, and DS Wrenn. Increased cyclic GMP levels lead to a stimulation of elastin production in ligament fibroblasts that is reversed by cyclic AMP. Journal of Biological Chemistry, 260(6):3255–3258, 1985.

[67] A Sampath Narayanan, Larry B Sandberg, Russell Ross, and Don L Layman. The smooth muscle cell. III. Elastin synthesis in arterial smooth muscle cell culture. The Journal of Cell Biology, 68(3):411–419, 1976.

[68] Hiroyoshi Kajiya, Nobuhiko Tanaka, Toyoko Inazumi, Yoshiyuki Seyama, Shingo Tajima, and Akira Ishibashi. Cultured human keratinocytes express tropoelastin. Journal of Investigative Dermatology, 109(5):641–644, 1997.

[69] Robert P Mecham, Judy Madaras, John A McDonald, and Una Ryan. Elastin production by cultured calf pulmonary artery endothelial cells. Journal of Cellular Physiology, 116(3):282–288, 1983.

[70] Thomas J Mariani, Sarah E Dunsmore, Qinglang Li, Xueming Ye, and Richard A Pierce. Regulation of lung fibroblast tropoelastin expression by alveolar epithelial cells. American Journal of Physiology-Lung Cellular and Molecular Physiology, 274(1):47–57, 1998.

[71] Peter Heeger and Joel Rosenbloom. Biosynthesis of tropoelastin by elastic cartilage. Connective Tissue Research, 8(1):21–25, 1980.

[72] Barbara Myers, Michael Dubick, Jerold A Last, and Robert B Rucker. Elastin synthesis during perinatal lung development in the rat. Biochimica et Biophysica Acta (BBA)-General Subjects, 761(1):17–22, 1983.

[73] Akihiko Noguchi, Kathryn Firsching, Jonathan D Kursar, and Rajkumar Reddy. Developmental changes of tropoelastin synthesis by rat pulmonary fibroblasts and effects of dexamethasone. Pediatric Research, 28(4):379–382, 1990.

[74] William C Parks. Posttranscriptional regulation of lung elastin production. American Journal of Respiratory Cell and Molecular Biology, 17(1):1–2, 1997.

[75] Alkystis Phinikaridou, Sara Lacerda, Begona Lavin, Marcelo E Andia, Alberto Smith, Prakash Saha, and Rene M Botnar. Tropoelastin: a novel marker for plaque progression and instability. Circulation: Cardiovascular Imaging, 11(8):e007303, 2018.

[76] Leonard E Grosso and Robert P Mecham. In vitro processing of tropoelastin: investigation of a possible transport function associated with the carboxy-terminal domain. Biochemical and Biophysical Research Communications, 153(2):545–551, 1988.

[77] Aleksander Hinek, Fred W Keeley, and John Callahan. Recycling of the 67-kDa elastin binding protein in arterial myocytes is imperative for secretion of tropoelastin. Experimental Cell Research, 220(2):312–324, 1995.

[78] Anna Tarakanova, Jazmin Ozsvar, Anthony S Weiss, and Markus J Buehler. Coarse-grained model of tropoelastin self-assembly into nascent fibrils. Materials Today Biology, 3:100016, 2019.

[79] Adam W Clarke, Eva C Arnspang, Suzanne M Mithieux, Emine Korkmaz, Filip Braet, and Anthony S Weiss. Tropoelastin massively associates during coacervation to form quantized protein spheres. Biochemistry, 45(33):9989–9996, 2006.

[80] Beth A Kozel, Brenda J Rongish, Andras Czirok, Julia Zach, Charles D Little, Elaine C Davis, Russell H Knutsen, Jessica E Wagenseil, Marilyn A Levy, and Robert P Mecham. Elastic fiber formation: a dynamic view of extracellular matrix assembly using timer reporters. Journal of Cellular Physiology, 207(1):87–96, 2006.

[81] Yidong Tu and Anthony S Weiss. Transient tropoelastin nanoparticles are early-stage intermediates in the coacervation of human tropoelastin whose aggregation is facilitated by heparan sulfate and heparin decasaccharides. Matrix Biology, 29(2):152–159, 2010.

[82] Yidong Tu, Steven G Wise, and Anthony S Weiss. Stages in tropoelastin coalescence during synthetic elastin hydrogel formation. Micron, 41(3):268–272, 2010.

[83] Betty A Cox, Barry C Starcher, and Dan W Urry. Coacervation of tropoelastin results in fiber formation. Journal of Biological Chemistry, 249(3):997–998, 1974.

[84] GM Bressan, I Castellani, MG Giro, D Volpin, C Fornieri, and I Pasquali Ronchetti. Banded fibers in tropoelastin coacervates at physiological temperatures. Journal of Ultrastructure Research, 82(3):335–340, 1983.

[85] Suzanne M Mithieux, Steven G Wise, Mark J Raftery, Barry Starcher, and Anthony S Weiss. A model two-component system for studying the architecture of elastin assembly in vitro. Journal of Structural Biology, 149(3):282–289, 2005.

[86] M Daria Haust, Robert H More, SA Bencosme, and John U Balis. Elastogenesis in human aorta: an electron microscopic study. Experimental and Molecular Pathology, 4(5):508–524, 1965.

[87] WH Fahrenbach, LB Sandberg, and EG Cleary. Ultrastructural studies on early elastogenesis. The Anatomical Record, 155(4):563–575, 1966.

[88] Ernest N Albert. Developing elastic tissue: an electron microscopic study. The American Journal of Pathology, 69(1):89, 1972.

[89] AM Tamburro, V Guantieri, and D Daga Gordini. Synthesis and structural studies of a pentapeptide sequence of elastin. Poly(Val-Gly-Gly-Leu-Gly). Journal of Biomolecular Structure and Dynamics, 10(3):441–454, 1992.

[90] Ming Miao, Catherine M Bellingham, Richard J Stahl, Eva E Sitarz, Christopher J Lane, and Fred W Keeley. Sequence and structure determinants for the self-aggregation of recombinant polypeptides modeled after human elastin. Journal of Biological Chemistry, 278(49):48553–48562, 2003.

[91] Lisa D Muiznieks, Sacha A Jensen, and Anthony S Weiss. Structural changes and facilitated association of tropoelastin. Archives of Biochemistry and Biophysics, 410(2):317–323, 2003.

[92] Leanne B Dyksterhuis, Clair Baldock, Donna Lammie, Tim J Wess, and Anthony S Weiss. Domains 17–27 of tropoelastin contain key regions of contact for coacervation and contain an unusual turn-containing crosslinking domain. Matrix Biology, 26(2):125–135, 2007.

[93] Wendy J Wu and Anthony S Weiss. Deficient coacervation of two forms of human tropoelastin associated with supravalvular aortic stenosis. European Journal of Biochemistry, 266(1):308–314, 1999.

[94] Jany Dandurand, Valerie Samouillan, Colette Lacabanne, Antonietta Pepe, and Brigida Bochicchio. Water structure and elastin-like peptide aggregation. Journal of Thermal Analysis and Calorimetry, 120(1):419–426, 2015.

[95] Daiki Tatsubo, Keitaro Suyama, Masaya Miyazaki, Iori Maeda, and Takeru Nose. Stepwise mechanism of temperature-dependent coacervation of the elastin-like peptide analogue dimer, (C(WPGVG)3)2. Biochemistry, 57(10):1582–1590, 2018.

[96] Giselle C Yeo, Fred W Keeley, and Anthony S Weiss. Coacervation of tropoelastin. Advances in Colloid and Interface Science, 167(1):94–103, 2011.

[97] Aleksander Hinek and Marlene Rabinovitch. 67-kD elastin-binding protein is a protective “companion” of extracellular insoluble elastin and intracellular tropoelastin. The Journal of Cell Biology, 126(2):563–574, 1994.

[98] Ming Miao, Sean E Reichheld, Lisa D Muiznieks, Yayi Huang, and Fred W Keeley. Elastin binding protein and FKBP65 modulate in vitro self-assembly of human tropoelastin. Biochemistry, 52(44):7731–7741, 2013.

[99] Lisa D Muiznieks and Fred W Keeley. Proline periodicity modulates the self-assembly properties of elastin-like polypeptides. Journal of Biological Chemistry, 285(51):39779–39789, 2010.

[100] Antonietta Pepe, Deanna Guerra, Brigida Bochicchio, Daniela Quaglino, Dealba Gheduzzi, Ivonne Pasquali Ronchetti, and Antonio M Tamburro. Dissection of human tropoelastin: supramolecular organization of polypeptide sequences coded by particular exons. Matrix Biology, 24(2):96–109, 2005.

[101] Antonietta Pepe, Roberta Flamia, Deanna Guerra, Daniela Quaglino, Brigida Bochicchio, Ivonne Pasquali Ronchetti, and Antonio M Tamburro. Exon 26-coded polypeptide: an isolated hydrophobic domain of human tropoelastin able to self-assemble in vitro. Matrix Biology, 27(5):441–450, 2008.

[102] Joel Rosenbloom, William R Abrams, Zena Indik, Helena Yeh, Norma Ornstein-Goldstein, and Muhammad M Bashir. Structure of the elastin gene. In Ciba Foundation Symposium 192: The Molecular Biology and Pathology of Elastic Tissues, pages 59–80. Wiley, 2007.

[103] Christian EH Schmelzer, Andrea Heinz, Helen Troilo, Michael P Lockhart-Cairns, Thomas A Jowitt, Marion F Marchand, Laurent Bidault, Marine Bignon, Tobias Hedtke, Alain Barret, et al. Lysyl oxidase–like 2 (LOXL2)–mediated cross-linking of tropoelastin. The FASEB Journal, 33(4):5468–5481, 2019.

[104] C Franzblau, B Faris, and R Papaioannou. Lysinonorleucine. A new amino acid from hydrolyzates of elastin. Biochemistry, 8(7):2833–2837, 1969.

[105] Richard W Lent, Barbara Smith, Lily L Salcedo, Barbara Faris, and Carl Franzblau. Reduction of elastin. II. Evidence for the presence of α-aminoadipic acid δ-semialdehyde and its aldol condensation product. Biochemistry, 8(7):2837–2845, 1969.

[106] SM Partridge, DF Elsden, and J Thomas. Constitution of the cross-linkages in elastin. Nature, 197(4874):1297–1298, 1963.

[107] Andrea Heinz, Christoph KH Ruttkies, Gunther Jahreis, Christoph U Schrader, Kanin Wichapong, Wolfgang Sippl, Fred W Keeley, Reinhard HH Neubert, and Christian EH Schmelzer. In vitro cross-linking of elastin peptides and molecular characterization of the resultant biomaterials. Biochimica et Biophysica Acta (BBA)-General Subjects, 1830(4):2994–3004, 2013.

[108] Beth A Kozel, Hiroshi Wachi, Elaine C Davis, and Robert P Mecham. Domains in tropoelastin that mediate elastin deposition in vitro and in vivo. Journal of Biological Chemistry, 278(20):18491–18498, 2003.

[109] Christian EH Schmelzer, Tobias Hedtke, and Andrea Heinz. Unique molecular networks: Formation and role of elastin cross-links. IUBMB Life, 72(5):842–854, 2020.

[110] Steven G Wise, Suzanne M Mithieux, Mark J Raftery, and Anthony S Weiss. Specificity in the coacervation of tropoelastin: solvent exposed lysines. Journal of Structural Biology, 149(3):273–281, 2005.

[111] Christoph U Schrader, Andrea Heinz, Petra Majovsky, Berin Karaman Mayack, Jurgen Brinckmann, Wolfgang Sippl, and Christian EH Schmelzer. Elastin is heterogeneously cross-linked. Journal of Biological Chemistry, 293(39):15107–15119, 2018.

[112] Karl E Kadler. Fell Muir Lecture: Collagen fibril formation in vitro and in vivo. International Journal of Experimental Pathology, 98(1):4–16, 2017.

[113] GM Cooper. Structure and organization of actin filaments. In The Cell: A Molecular Approach, 2nd edition. Sinauer Associates, 2000.

[114] E Heitlinger, M Peter, A Lustig, W Villiger, EA Nigg, and U Aebi. The role of the head and tail domain in lamin structure and assembly: analysis of bacterially expressed chicken lamin A and truncated B2 lamins. Journal of Structural Biology, 108(1):74–91, 1992.

[115] Zsolt Urban, Vishwanathan Hucthagowder, Nura Schurmann, Vesna Todorovic, Lior Zilberberg, Jiwon Choi, Carla Sens, Chester W Brown, Robin D Clark, Kristen E Holland, et al. Mutations in LTBP4 cause a syndrome of impaired pulmonary, gastrointestinal, genitourinary, musculoskeletal, and dermal development. The American Journal of Human Genetics, 85(5):593–605, 2009.

[116] Insa Bultmann-Mellin, Anne Conradi, Alexandra C Maul, Katharina Dinger, Frank Wempe, Alexander P Wohl, Thomas Imhof, F Thomas Wunderlich, Alexander C Bunck, Tomoyuki Nakamura, et al. Modeling autosomal recessive cutis laxa type 1C in mice reveals distinct functions for Ltbp-4 isoforms. Disease Models & Mechanisms, 8(4):403–415, 2015.

[117] Precious J McLaughlin, Qiuyun Chen, Masahito Horiguchi, Barry C Starcher, J Brett Stanton, Thomas J Broekelmann, Alan D Marmorstein, Brian McKay, Robert Mecham, Tomoyuki Nakamura, et al. Targeted disruption of fibulin-4 abolishes elastogenesis and causes perinatal lethality in mice. Molecular and Cellular Biology, 26(5):1700–1709, 2006.

[118] Kazuo Noda, Branka Dabovic, Kyoko Takagi, Tadashi Inoue, Masahito Horiguchi, Maretoshi Hirai, Yusuke Fujikawa, Tomoya O Akama, Kenji Kusumoto, Lior Zilberberg, et al. Latent TGF-β binding protein 4 promotes elastic fiber assembly by interacting with fibulin-5. Proceedings of the National Academy of Sciences, 110(8):2852–2857, 2013.

[119] Svenja Hinderer, Nian Shen, Lea-Jeanne Ringuette, Jan Hansmann, Dieter P Reinhardt, Sara Y Brucker, Elaine C Davis, and Katja Schenke-Layland. In vitro elastogenesis: instructing human vascular smooth muscle cells to generate an elastic fiber-containing extracellular matrix scaffold. Biomedical Materials, 10(3):034102, 2015.

[120] Yoshinori Yamauchi, Eichi Tsuruga, Kazuki Nakashima, Yoshihiko Sawa, and Hiroyuki Ishikawa. Fibulin-4 and -5, but not fibulin-2, are associated with tropoelastin deposition in elastin-producing cell culture. Acta Histochemica et Cytochemica, 43(6):131–138, 2010.

[121] Daniel V Bax, Ursula R Rodgers, Marcela MM Bilek, and Anthony S Weiss. Cell adhesion to tropoelastin is mediated via the C-terminal GRKRK motif and integrin alphaVbeta3. Journal of Biological Chemistry, pages jbc–M109, 2009.

[122] Pearl Lee, Daniel V Bax, Marcela MM Bilek, and Anthony S Weiss. A novel cell adhesion region in tropoelastin that mediates attachment to integrin alphaVbeta5. Journal of Biological Chemistry, pages jbc–M113, 2013.

[123] Pearl Lee, Giselle C Yeo, and Anthony S Weiss. A cell adhesive peptide from tropoelastin promotes sequential cell attachment and spreading via distinct receptors. The FEBS Journal, 284(14):2216–2230, 2017.

[124] Matti A Hiob, Steven G Wise, Alexey Kondyurin, Anna Waterhouse, Marcela M Bilek, Martin KC Ng, and Anthony S Weiss. The use of plasma-activated covalent attachment of early domains of tropoelastin to enhance vascular compatibility of surfaces. Biomaterials, 34(31):7584–7591, 2013.

[125] Young Yu, Steven G Wise, Praveesuda L Michael, Daniel V Bax, Gloria SC Yuen, Matti A Hiob, Giselle C Yeo, Elysse C Filipe, Louise L Dunn, Kim H Chan, et al. Characterization of endothelial progenitor cell interactions with human tropoelastin. PLoS One, 10(6):e0131101, 2015.

[126] Giselle C Yeo and Anthony S Weiss. Soluble matrix protein is a potent modulator of mesenchymal stem cell performance. Proceedings of the National Academy of Sciences, 116(6):2042–2051, 2019.

[127] Aleksander Hinek, David S Wrenn, Robert P Mecham, and Samuel H Barondes. The elastin receptor: a galactoside-binding protein. Science, 239(4847):1539–1541, 1988.

[128] Shingo Tajima, Hiroshi Wachi, Yuko Uemura, and Kouji Okamoto. Modulation by elastin peptide VGVAPG of cell proliferation and elastin expression in human skin fibroblasts. Archives of Dermatological Research, 289(8):489–492, 1997.

[129] Aleksander Hinek, Kathy R Braun, Kela Liu, Yanting Wang, and Thomas N Wight. Retrovirally mediated overexpression of versican V3 reverses impaired elastogenesis and heightened proliferation exhibited by fibroblasts from Costello syndrome and Hurler disease patients. The American Journal of Pathology, 164(1):119–131, 2004.

[130] Laurent Duca, Laurent Debelle, Romain Debret, Frank Antonicelli, William Hornebeck, and Bernard Haye. The elastin peptides-mediated induction of pro-collagenase-1 production by human fibroblasts involves activation of MEK/ERK pathway via PKA- and PI3K-dependent signaling. FEBS Letters, 524(1-3):193–198, 2002.

[131] Satsuki Mochizuki, Bertrand Brassart, and Aleksander Hinek. Signaling pathways transduced through the elastin receptor facilitate proliferation of arterial smooth muscle cells. Journal of Biological Chemistry, 277(47):44854–44863, 2002.

[132] Bertrand Brassart, Patrick Fuchs, Eric Huet, Alain JP Alix, Jean Wallach, Antonio M Tamburro, Frederic Delacoux, Bernard Haye, Herve Emonard, William Hornebeck, et al. Conformational dependence of collagenase (matrix metalloproteinase-1) up-regulation by elastin peptides in cultured fibroblasts. Journal of Biological Chemistry, 276(7):5222–5227, 2001.

[133] Kristian Prydz. Determinants of glycosaminoglycan (GAG) structure. Biomolecules, 5(3):2003–2022, 2015.

[134] Marie-Claude Bourin and Ulf Lindahl. Glycosaminoglycans and the regulation of blood coagulation. Biochemical Journal, 289(Pt 2):313, 1993.

[135] Deirdre R Coombe. Biological implications of glycosaminoglycan interactions with haemopoietic cytokines. Immunology and Cell Biology, 86(7):598–607, 2008.

[136] Rahul Raman, V Sasisekharan, and Ram Sasisekharan. Structural insights into biological roles of protein-glycosaminoglycan interactions. Chemistry & Biology, 12(3):267–277, 2005.

[137] C Fornieri, M Baccarani-Contri, D Quaglino, and I Pasquali-Ronchetti. Lysyl oxidase activity and elastin/glycosaminoglycan interactions in growing chick and rat aortas. The Journal of Cell Biology, 105(3):1463–1469, 1987.

[138] Wendy J Wu, Bernadette Vrhovski, and Anthony S Weiss. Glycosaminoglycans mediate the coacervation of human tropoelastin through dominant charge interactions involving lysine side chains. Journal of Biological Chemistry, 274(31):21719–21724, 1999.

[139] Thomas J Broekelmann, Beth A Kozel, Hideaki Ishibashi, Claudio C Werneck, Fred W Keeley, Lijuan Zhang, and Robert P Mecham. Tropoelastin interacts with cell-surface glycosaminoglycans via its COOH-terminal domain. Journal of Biological Chemistry, 280(49):40939–40947, 2005.

[140] Kerstin Tiedemann, Boris Batge, Peter K Muller, and Dieter P Reinhardt. Interactions of fibrillin-1 with heparin/heparan sulfate, implications for microfibrillar assembly. Journal of Biological Chemistry, 276(38):36035–36042, 2001.

[141] Timothy M Ritty, Thomas J Broekelmann, Claudio C Werneck, and Robert P Mecham. Fibrillin-1 and -2 contain heparin-binding sites important for matrix deposition and that support cell attachment. Biochemical Journal, 375(2):425–432, 2003.

[142] Shailaja Seetharaman and Sandrine Etienne-Manneville. Integrin diversity brings specificity in mechanotransduction. Biology of the Cell, 110(3):49–64, 2018.

[143] Cedric Zeltz and Donald Gullberg. The integrin–collagen connection–a glue for tissue repair? Journal of Cell Science, 129(4):653–664, 2016.

[144] Antonios Chronopoulos, Stephen D Thorpe, Ernesto Cortes, Dariusz Lachowski, Alistair J Rice, Vasyl V Mykuliak, Tomasz Rog, David A Lee, Vesa P Hytonen, and E Armando. Syndecan-4 tunes cell mechanics by activating the kindlin-integrin-RhoA pathway. Nature Materials, pages 1–10, 2020.

[145] Aban Shuaib, Daniyal Motan, Pinaki Bhattacharya, Alex McNabb, Timothy M Skerry, and Damien Lacroix. Heterogeneity in the mechanical properties of integrins determines mechanotransduction dynamics in bone osteoblasts. Scientific Reports, 9(1):1–14, 2019.

[146] Laura Tomasello, Antonina Coppola, Maria Pitrone, Valentina Failla, Salvatore Cillino, Giuseppe Pizzolanti, and Carla Giordano. PFN1 and integrin-β1/mTOR axis involvement in cornea differentiation of fibroblast limbal stem cells. Journal of Cellular and Molecular Medicine, 23(11):7210–7221, 2019.

[147] Gabriel Neiman, María Agustina Scarafía, Alejandro La Greca, Natalia L Santín Velazque, Ximena Garate, Ariel Waisman, Alan M Mobbs, Tais Hanae Kasai-Brunswick, Fernanda Mesquita, Daiana Martire-Greco, et al. Integrin alpha-5 subunit is critical for the early stages of human pluripotent stem cell cardiac differentiation. Scientific Reports, 9(1):1–10, 2019.

[148] Aroa Duro-Castano, Elena Gallon, Caitlin Decker, and María J Vicent. Modulating angiogenesis with integrin-targeted nanomedicines. Advanced Drug Delivery Reviews, 119:101–119, 2017.

[149] Kevin K Kim, Dean Sheppard, and Harold A Chapman. TGF-β1 signaling and tissue fibrosis. Cold Spring Harbor Perspectives in Biology, 10(4):a022293, 2018.

[150] Yun Deng, Quan Wan, and Wangxiang Yan. Integrin α5/ITGA5 promotes the proliferation, migration, invasion and progression of oral squamous carcinoma by epithelial–mesenchymal transition. Cancer Management and Research, 11:9609, 2019.

[151] Erkki Ruoslahti. RGD and other recognition sequences for integrins. Annual Review of Cell and Developmental Biology, 12(1):697–715, 1996.

[152] Jian-Ping Xiong, Thilo Stehle, Rongguang Zhang, Andrzej Joachimiak, Matthias Frech, Simon L Goodman, and M Amin Arnaout. Crystal structure of the extracellular segment of integrin αvβ3 in complex with an Arg-Gly-Asp ligand. Science, 296(5565):151–155, 2002.

[153] Jieqing Zhu, Jianghai Zhu, and Timothy A Springer. Complete integrin headpiece opening in eight steps. Journal of Cell Biology, 201(7):1053–1068, 2013.

[154] Eileen Puklin-Faucher, Mu Gao, Klaus Schulten, and Viola Vogel. How the headpiece hinge angle is opened: new insights into the dynamics of integrin activation. Journal of Cell Biology, 175(2):349–360, 2006.

[155] Richard O Hynes. Integrins: bidirectional, allosteric signaling machines. Cell, 110(6):673–687, 2002.

[156] Maria Laura Duque Lasio and Beth A Kozel. Elastin-driven genetic diseases. Matrix Biology, 71:144–160, 2018.

[157] Bert Callewaert, Marjolijn Renard, Vishwanathan Hucthagowder, Beate Albrecht, Ingrid Hausser, Edward Blair, Cristina Dias, Alice Albino, Hiroshi Wachi, Fumiaki Sato, et al. New insights into the pathogenesis of autosomal-dominant cutis laxa with report of five ELN mutations. Human Mutation, 32(4):445–455, 2011.

[158] Mayada Tassabehji, Kay Metcalfe, Jane Hurst, Gillian S Ashcroft, Cay Kielty, Carrie Wilmot, Dian Donnai, Andrew P Read, and Carolyn JP Jones. An elastin gene mutation producing abnormal tropoelastin and abnormal elastic fibres in a patient with autosomal dominant cutis laxa. Human Molecular Genetics, 7(6):1021–1028, 1998.

[159] Hideki Sugitani, Eiichi Hirano, Russell H Knutsen, Adrian Shifren, Jessica E Wagenseil, Christopher Ciliberto, Beth A Kozel, Zsolt Urban, Elaine C Davis, Thomas J Broekelmann, et al. Alternative splicing and tissue-specific elastin misassembly act as biological modifiers of human elastin gene frameshift mutations associated with dominant cutis laxa. Journal of Biological Chemistry, 287(26):22055–22067, 2012.

[160] Beth A Kozel, Chi-Ting Su, Joshua R Danback, Ryan L Minster, Suneeta Madan-Khetarpal, Juliann McConnell, Meghan K Mac Neal, Kara L Levine, Robert C Wilson, Frank C Sciurba, et al. Biomechanical properties of the skin in cutis laxa. The Journal of Investigative Dermatology, 134(11):2836, 2014.

[161] Austin J Cocciolone, Jie Z Hawes, Marius C Staiculescu, Elizabeth O Johnson, Monzur Murshed, and Jessica E Wagenseil. Elastin, arterial mechanics, and cardiovascular disease. American Journal of Physiology-Heart and Circulatory Physiology, 315(2):H189–H205, 2018.

[162] Andrew K Baldwin, Andreja Simpson, Ruth Steer, Stuart A Cain, and Cay M Kielty. Elastic fibres in health and disease. Expert Reviews in Molecular Medicine, 15, 2013.

[163] Amanda K Ewart, Weishan Jin, Donald Atkinson, Colleen A Morris, and Mark T Keating. Supravalvular aortic stenosis associated with a deletion disrupting the elastin gene. The Journal of Clinical Investigation, 93(3):1071–1077, 1994.

[164] Timothy M Olson, Virginia V Michels, Zsolt Urban, Katalin Csiszar, Angela M Christiano, David J Driscoll, Robert H Feldt, Charles D Boyd, and Stephen N Thibodeau. A 30 kb deletion within the elastin gene results in familial supravalvular aortic stenosis. Human Molecular Genetics, 4(9):1677–1679, 1995.

[165] Dean Y Li, Amanda E Toland, Beth B Boak, Donald L Atkinson, Gregory J Ensing, Colleen A Morris, and Mark T Keating. Elastin point mutations cause an obstructive vascular disease, supravalvular aortic stenosis. Human Molecular Genetics, 6(7):1021–1028, 1997.

[166] Seonmin Park, Eul-Ju Seo, Han-Wook Yoo, and Youngho Kim. Novel mutations in the human elastin gene (ELN) causing isolated supravalvular aortic stenosis. International Journal of Molecular Medicine, 18(2):329–332, 2006.

[167] Hiroshi Wachi, Fumiaki Sato, Junji Nakazawa, Risa Nonaka, Zoltan Szabo, Zsolt Urban, Takuo Yasunaga, Iori Maeda, Koji Okamoto, Barry C Starcher, et al. Domains 16 and 17 of tropoelastin in elastic fibre formation. Biochemical Journal, 402(1):63–70, 2007.

[168] Zena Indik, William R Abrams, Umberto Kucich, Carolyn W Gibson, Robert P Mecham, and Joel Rosenbloom. Production of recombinant human tropoelastin: characterization and demonstration of immunologic and chemotactic activity. Archives of Biochemistry and Biophysics, 280(1):80–86, 1990.

[169] Stephen L Martin, Bernadette Vrhovski, and Anthony S Weiss. Total synthesis and expression in Escherichia coli of a gene encoding human tropoelastin. Gene, 154(2):159–166, 1995.

[170] Suzanne M Mithieux, Behnaz Aghaei-Ghareh-Bolagh, Leping Yan, Kekini V Kuppan, Yiwei Wang, Francia Garces-Suarez, Zhe Li, Peter K Maitz, Elizabeth A Carter, Christina Limantoro, et al. Tropoelastin implants that accelerate wound repair. Advanced Healthcare Materials, 7(10):1701206, 2018.

[171] Nasim Annabi, Suzanne M Mithieux, Pinar Zorlutuna, Gulden Camci-Unal, Anthony S Weiss, and Ali Khademhosseini. Engineered cell-laden human protein-based elastomer. Biomaterials, 34(22):5496–5505, 2013.

[172] Nasim Annabi, Devyesh Rana, Ehsan Shirzaei Sani, Roberto Portillo-Lara, Jessie L Gifford, Mohammad M Fares, Suzanne M Mithieux, and Anthony S Weiss. Engineering a sprayable and elastic hydrogel adhesive with antimicrobial properties for wound healing. Biomaterials, 139:229–243, 2017.

[173] Richard Wang, Jazmin Ozsvar, Behnaz Aghaei-Ghareh-Bolagh, Matti A Hiob, Suzanne M Mithieux, and Anthony S Weiss. Freestanding hierarchical vascular structures engineered from ice. Biomaterials, 192:334–345, 2019.

[174] Nasim Annabi, Ali Fathi, Suzanne M Mithieux, Penny Martens, Anthony S Weiss, and Fariba Dehghani. The effect of elastin on chondrocyte adhesion and proliferation on poly(ε-caprolactone)/elastin composites. Biomaterials, 32(6):1517–1525, 2011.

[175] Jelena Rnjak-Kovacina, Steven G Wise, Zhe Li, Peter KM Maitz, Cara J Young, Yiwei Wang, and Anthony S Weiss. Electrospun synthetic human elastin: collagen composite scaffolds for dermal tissue engineering. Acta Biomaterialia, 8(10):3714–3722, 2012.

[176] Behnaz Aghaei-Ghareh-Bolagh, Juan Guan, Yiwei Wang, Adam D Martin, Rebecca Dawson, Suzanne M Mithieux, and Anthony S Weiss. Optically robust, highly permeable and elastic protein films that support dual cornea cell types. Biomaterials, 188:50–62, 2019.

[177] Behnaz Aghaei-Ghareh-Bolagh, Suzanne M Mithieux, Matti A Hiob, Yiwei Wang, Avelyn Chong, and Anthony S Weiss. Fabricated tropoelastin-silk yarns and woven textiles for diverse tissue engineering applications. Acta Biomaterialia, 91:112–122, 2019.

[178] Yiwei Wang, Suzanne M Mithieux, Yvonne Kong, Xue-Qing Wang, Cassandra Chong, Ali Fathi, Fariba Dehghani, Eleni Panas, John Kemnitzer, Robert Daniels, et al. Tropoelastin incorporation into a dermal regeneration template promotes wound angiogenesis. Advanced Healthcare Materials, 4(4):577–584, 2015.

[179] Suzanne M Mithieux and Anthony S Weiss. Design of an elastin-layered dermal regeneration template. Acta Biomaterialia, 52:33–40, 2017.

[180] Xiao Hu, Xiuli Wang, Jelena Rnjak, Anthony S Weiss, and David L Kaplan. Biomaterials derived from silk–tropoelastin protein systems. Biomaterials, 31(32):8121–8131, 2010.

[181] Xiao Hu, Sang-Hyug Park, Eun Seok Gil, Xiao-Xia Xia, Anthony S Weiss, and David L Kaplan. The influence of elasticity and surface roughness on myogenic and osteogenic-differentiation of cells on silk-elastin biomaterials. Biomaterials, 32(34):8979–8989, 2011.

[182] James D White, Siran Wang, Anthony S Weiss, and David L Kaplan. Silk–tropoelastin protein films for nerve guidance. Acta Biomaterialia, 14:1–10, 2015.

[183] Shira Landau, Ariel A Szklanny, Giselle C Yeo, Yulia Shandalov, Elena Kosobrodova, Anthony S Weiss, and Shulamit Levenberg. Tropoelastin coated PLLA-PLGA scaffolds promote vascular network formation. Biomaterials, 122:72–82, 2017.

[184] Matti A Hiob, Steven G Wise, Alexei Kondyurin, Anna Waterhouse, Marcela M Bilek, Martin K Ng, and Anthony S Weiss. The use of plasma-activated covalent attachment of early domains of tropoelastin to enhance vascular compatibility of surfaces. Biomaterials, 34(31):7584–7591, 2013.

[185] Edgar A Wakelin, Giselle C Yeo, David R McKenzie, Marcela MM Bilek, and Anthony S Weiss. Plasma ion implantation enabled bio-functionalization of PEEK improves osteoblastic activity. APL Bioengineering, 2(2):026109, 2018.

[186] James C Phillips, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D Skeel, Laxmikant Kale, and Klaus Schulten. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry, 26(16):1781–1802, 2005.

[187] David Van Der Spoel, Erik Lindahl, Berk Hess, Gerrit Groenhof, Alan E Mark, and Herman JC Berendsen. GROMACS: fast, flexible, and free. Journal of Computational Chemistry, 26(16):1701–1718, 2005.

[188] Bernard R Brooks, Robert E Bruccoleri, Barry D Olafson, David J States, S Swaminathan, and Martin Karplus. CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. Journal of Computational Chemistry, 4(2):187–217, 1983.

[189] David A Case, Thomas E Cheatham III, Tom Darden, Holger Gohlke, Ray Luo, Kenneth M Merz Jr, Alexey Onufriev, Carlos Simmerling, Bing Wang, and Robert J Woods. The Amber biomolecular simulation programs. Journal of Computational Chemistry, 26(16):1668–1688, 2005.

[190] Thomas S Hofer. From macromolecules to electrons – grand challenges in theoretical and computational chemistry. Frontiers in Chemistry, 1:6, 2013.

[191] Ron O Dror, Robert M Dirks, JP Grossman, Huafeng Xu, and David E Shaw. Biomolecular simulation: a computational microscope for molecular biology. Annual Review of Biophysics, 41:429–452, 2012.

[192] Jean-Michel Combes, Pierre Duclos, and Ruedi Seiler. The Born–Oppenheimer approximation. In Rigorous Atomic and Molecular Physics, pages 185–213. Springer, 1981.

[193] Michael P Allen and Dominic J Tildesley. Computer Simulation of Liquids, 2nd edition, pages 100–110. Oxford University Press, 2017.

[194] Hans C Andersen. RATTLE: A “velocity” version of the SHAKE algorithm for molecular dynamics calculations. Journal of Computational Physics, 52(1):24–34, 1983.

[195] Evelyn Mayaan, Adam Moser, Alexander D MacKerell Jr, and Darrin M York. CHARMM force field parameters for simulation of reactive intermediates in native and thio-substituted ribozymes. Journal of Computational Chemistry, 28(2):495–507, 2007.

[196] Alex D MacKerell Jr, Donald Bashford, M Bellott, Roland Leslie Dunbrack Jr, Jeffrey D Evanseck, Martin J Field, Stefan Fischer, Jiali Gao, H Guo, Sookhee Ha, et al. All-atom empirical potential for molecular modeling and dynamics studies of proteins. The Journal of Physical Chemistry B, 102(18):3586–3616, 1998.

[197] Wilfred F van Gunsteren, SR Billeter, AA Eising, Philippe H Hunenberger, PKHC Kruger, Alan E Mark, WRP Scott, and Ilario G Tironi. Biomolecular Simulation: The GROMOS96 Manual and User Guide, 1996.

[198] William L Jorgensen, David S Maxwell, and Julian Tirado-Rives. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. Journal of the American Chemical Society, 118(45):11225–11236, 1996.

[199] Wendy D Cornell, Piotr Cieplak, Christopher I Bayly, Ian R Gould, Kenneth M Merz, David M Ferguson, David C Spellmeyer, Thomas Fox, James W Caldwell, and Peter A Kollman. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. Journal of the American Chemical Society, 117(19):5179–5197, 1995.

[200] Qiang Shi, Sergei Izvekov, and Gregory A Voth. Mixed atomistic and coarse-grained molecular dynamics: simulation of a membrane-bound ion channel. The Journal of Physical Chemistry B, 110(31):15045–15048, 2006.

[201] Charles L Brooks III and Martin Karplus. Solvent effects on protein motion and protein effects on solvent motion: dynamics of the active site region of lysozyme. Journal of Molecular Biology, 208(1):159–181, 1989.

[202] Anna Rita Bizzarri and Salvatore Cannistraro. Molecular dynamics of water at the protein-solvent interface, 2002.

[203] Thomas Simonson. Macromolecular electrostatics: continuum models and their growing pains. Current Opinion in Structural Biology, 11(2):243–252, 2001.

[204] Alexey V Onufriev and David A Case. Generalized Born implicit solvent models for biomolecules. Annual Review of Biophysics, 48:275–296, 2019.

[205] Donald Bashford and David A Case. Generalized Born models of macromolecular solvation effects. Annual Review of Physical Chemistry, 51(1):129–152, 2000.

[206] Xiaohui Wang, Boming Deng, and Zhaoxi Sun. Thermodynamics of helix formation in small peptides of varying length in vacuo, in implicit solvent, and in explicit solvent. Journal of Molecular Modeling, 25(1):3, 2019.

[207] Ramu Anandakrishnan, Aleksander Drozdetski, Ross C Walker, and Alexey V Onufriev. Speed of conformational change: comparing explicit and implicit solvent molecular dynamics simulations. Biophysical Journal, 108(5):1153–1164, 2015.

[208] Thomas H Rod, Patrik Rydberg, and Ulf Ryde. Implicit versus explicit solvent in free energy calculations of enzyme catalysis: Methyl transfer catalyzed by catechol O-methyltransferase. The Journal of Chemical Physics, 124(17):174503, 2006.

[209] William L Jorgensen, Jayaraman Chandrasekhar, Jeffry D Madura, Roger W Impey, and Michael L Klein. Comparison of simple potential functions for simulating liquid water. The Journal of Chemical Physics, 79(2):926–935, 1983.

[210] Eyal Neria, Stefan Fischer, and Martin Karplus. Simulation of activation free energies in molecular systems. The Journal of Chemical Physics, 105(5):1902–1921, 1996.

[211] Hans W Horn, William C Swope, Jed W Pitera, Jeffry D Madura, Thomas J Dick, Greg L Hura, and Teresa Head-Gordon. Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew. The Journal of Chemical Physics, 120(20):9665–9678, 2004.

[212] Kota Kasahara, Shun Sakuraba, and Ikuo Fukuda. Enhanced sampling of molecular dynamics simulations of a polyalanine octapeptide: Effects of the periodic boundary conditions on peptide conformation. The Journal of Physical Chemistry B, 122(9):2495–2503, 2018. PMID: 29439570.

[213] Alessandro Laio and Michele Parrinello. Escaping free-energy minima. Proceedings of the National Academy of Sciences, 99(20):12562–12566, 2002.

[214] Yuji Sugita and Yuko Okamoto. Replica-exchange molecular dynamics method for protein folding. Chemical Physics Letters, 314(1-2):141–151, 1999.

[215] Yuji Sugita, Motoshi Kamiya, Hiraku Oshima, and Suyong Re. Replica-exchange methods for biomolecular simulations. In Biomolecular Simulations, pages 155–177. Springer, 2019.

[216] Daniel Sindhikara, Yilin Meng, and Adrian E Roitberg. Exchange frequency in replica exchange molecular dynamics. The Journal of Chemical Physics, 128(2):01B609, 2008.

[217] Ahmet Bakan and Ivet Bahar. Computational generation inhibitor-bound conformers of p38 MAP kinase and comparison with experiments. In Biocomputing 2011, pages 181–192. World Scientific, 2011.

[218] Ivet Bahar, Timothy R Lezon, Ahmet Bakan, and Indira H Shrivastava. Normal mode analysis of biomolecular structures: functional mechanisms of membrane proteins. Chemical Reviews, 110(3):1463–1497, 2010.

[219] Dana Reichmann, Ofer Rahat, Mati Cohen, Hani Neuvirth, and Gideon Schreiber. The molecular architecture of protein–protein binding sites. Current Opinion in Structural Biology, 17(1):67–76, 2007.

[220] Carlos J Camacho and Sandor Vajda. Protein–protein association kinetics and protein docking. Current Opinion in Structural Biology, 12(1):36–40, 2002.

[221] Graham R Smith and Michael JE Sternberg. Prediction of protein–protein interactions by docking methods. Current Opinion in Structural Biology, 12(1):28–35, 2002.

[222] Cyril Dominguez, Rolf Boelens, and Alexandre MJJ Bonvin. HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. Journal of the American Chemical Society, 125(7):1731–1737, 2003.

[223] Claire C Hsu, Markus J Buehler, and Anna Tarakanova. The order-disorder continuum: Linking predictions of protein structure and disorder through molecular simulation. Scientific Reports, 10(1):1–14, 2020.

[224] Maxwell W Libbrecht and William Stafford Noble. Machine learning applications in genetics and genomics. Nature Reviews Genetics, 16(6):321–332, 2015.

[225] Bradley J Erickson, Panagiotis Korfiatis, Zeynettin Akkus, and Timothy L Kline. Machine learning for medical imaging. Radiographics, 37(2):505–515, 2017.

[226] Nicholas A Saunders and Michael E Grant. Elastin biosynthesis in chick-embryo arteries. Studies on the intracellular site of synthesis of tropoelastin. Biochemical Journal, 221(2):393–400, 1984.

[227] I Pasquali-Ronchetti, M Baccarani-Contri, C Fornieri, G Mori, and D Quaglino Jr. Structure and composition of the elastin fibre in normal and pathological conditions. Micron, 24(1):75–89, 1993.

[228] Herbert M Kagan and Kathleen A Sullivan. [35] Lysyl oxidase: preparation and role in elastin biosynthesis. In Methods in Enzymology, volume 82, pages 637–650. Elsevier, 1982.

[229] Fumiaki Sato, Hiroshi Wachi, Marie Ishida, Risa Nonaka, Satoshi Onoue, Zsolt Urban, Barry C Starcher, and Yoshiyuki Seyama. Distinct steps of cross-linking, self-association, and maturation of tropoelastin are necessary for elastic fiber formation. Journal of Molecular Biology, 369(3):841–851, 2007.

[230] Robert C Siegel, Sheldon R Pinnell, and George R Martin. Cross-linking of collagen and elastin. Properties of lysyl oxidase. Biochemistry, 9(23):4486–4492, 1970.

[231] Sheldon R Pinnell and George R Martin. The cross-linking of collagen and elastin: enzymatic conversion of lysine in peptide linkage to alpha-aminoadipic-delta-semialdehyde (allysine) by an extract from bone. Proceedings of the National Academy of Sciences of the United States of America, 61(2):708–716, 1968.

[232] SM Partridge, DF Elsden, J Thomas, A Dorfman, A Telser, and Pei-Lee Ho. Biosynthesis of the desmosine and isodesmosine cross-bridges in elastin. Biochemical Journal, 93(3):30–33, 1964.

[233] Giselle C Yeo, Clair Baldock, Anne Tuukkanen, Manfred Roessle, Leanne B Dyksterhuis, Steven G Wise, Jacqueline Matthews, Suzanne M Mithieux, and Anthony S Weiss. Tropoelastin bridge region positions the cell-interactive C terminus and contributes to elastic fiber assembly. Proceedings of the National Academy of Sciences, 109(8):2878–2883, 2012.

[234] Suzanne M Mithieux, Yidong Tu, Emine Korkmaz, Filip Braet, and Anthony S Weiss. In situ polymerization of tropoelastin in the absence of chemical cross-linking. Biomaterials, 30(4):431–435, 2009.

[235] Andrea Heinz, Christoph U Schrader, Stephanie Baud, Fred W Keeley, Suzanne M Mithieux, Anthony S Weiss, Reinhard HH Neubert, and Christian EH Schmelzer. Molecular-level characterization of elastin-like constructs and human aortic elastin. Matrix Biology, 38:12–21, 2014.

[236] Kenno Vanommeslaeghe, Elizabeth Hatcher, Chayan Acharya, Sibsankar Kundu, Shijun Zhong, Jihyun Shim, Eva Darian, Olgun Guvench, P Lopes, Igor Vorobyov, et al. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. Journal of Computational Chemistry, 31(4):671–690, 2010.

[237] William Humphrey, Andrew Dalke, Klaus Schulten, et al. VMD: visual molecular dynamics. Journal of Molecular Graphics, 14(1):33–38, 1996.

[238] Michael Feig, John Karanicolas, and Charles L Brooks III. MMTSB Tool Set: enhanced sampling and multiscale modeling methods for applications in structural biology. Journal of Molecular Graphics and Modelling, 22(5):377–395, 2004.

[239] Suzanne M Mithieux, Steven G Wise, and Anthony S Weiss. Tropoelastin—a multifaceted naturally smart material. Advanced Drug Delivery Reviews, 65(4):421–428, 2013.

[240] Anna Tarakanova, Wenwen Huang, Anthony S Weiss, David L Kaplan, and Markus J Buehler. Computational smart polymer design based on elastin protein mutability. Biomaterials, 127:49–60, 2017.

[241] Zsolt Urbán, Jun Zhang, Elaine C Davis, Gregg K Maeda, Anil Kumar, Heather Stalker, John W Belmont, Charles D Boyd, and Margaret R Wallace. Supravalvular aortic stenosis: genetic and molecular dissection of a complex mutation in the elastin gene. Human Genetics, 109(5):512–520, 2001.

[242] Leanne B Dyksterhuis and Anthony S Weiss. Homology models for domains 21–23 of human tropoelastin shed light on lysine crosslinking. Biochemical and Biophysical Research Communications, 396(4):870–873, 2010.

[243] Bernadette Vrhovski, Sacha Jensen, and Anthony S Weiss. Coacervation characteristics of recombinant human tropoelastin. European Journal of Biochemistry, 250(1):92–98, 1997.

[244] Yushi Bai, Quan Luo, and Junqiu Liu. Protein self-assembly via supramolecular strategies. Chemical Society Reviews, 45(10):2756–2767, 2016.

[245] Helen Piontkivska, Yi Zhang, Eric D Green, Laura Elnitski, et al. Multi-species sequence comparison reveals dynamic evolution of the elastin gene that has involved purifying selection and lineage-specific insertions/deletions. BMC Genomics, 5(1):31, 2004.

[246] Max Kuhn. Building predictive models in R using the caret package. Journal of Statistical Software, 28(5):1–26, 2008.

[247] Jazmin Ozsvar, Anna Tarakanova, Richard Wang, Markus J Buehler, and Anthony S Weiss. Allysine modifications perturb tropoelastin structure and mobility on a local and global scale. Matrix Biology Plus, 2:100002, 2019.

[248] Jessica F Almine, Daniel V Bax, Suzanne M Mithieux, Lisa Nivison-Smith, Jelena Rnjak, Anna Waterhouse, Steven G Wise, and Anthony S Weiss. Elastin-based materials. Chemical Society Reviews, 39(9):3371–3379, 2010.

[249] Ursula R Rodgers and Anthony S Weiss. Cellular interactions with elastin. Pathologie Biologie, 53(7):390–398, 2005.

[250] Stephan Huveneers, Hoa Truong, and Erik HJ Danen. Integrins: signaling, disease, and therapy. International Journal of Radiation Biology, 83(11-12):743–751, 2007.

[251] Junichi Takagi, Benjamin M Petre, Thomas Walz, and Timothy A Springer. Global conformational rearrangements in integrin extracellular domains in outside-in and inside-out signaling. Cell, 110(5):599–611, 2002.

[252] A Paul Mould, Emlyn JH Symonds, Patrick A Buckley, J Gunter Grossmann, Paul A McEwan, Stephanie J Barton, Janet A Askari, Susan E Craig, Jordi Bella, and Martin J Humphries. Structure of an integrin-ligand complex deduced from solution X-ray scattering and site-directed mutagenesis. Journal of Biological Chemistry, 278(41):39993–39999, 2003.

[253] Junichi Takagi, Konstantin Strokovich, Timothy A Springer, and Thomas Walz. Structure of integrin α5β1 in complex with fibronectin. The EMBO Journal, 22(18):4607–4615, 2003.

[254] Gregory C Sephel and Jeffrey M Davidson. Elastin production in human skin fibroblast cultures and its decline with age. Journal of Investigative Dermatology, 86(3):279–285, 1986.

[255] Robert P Mecham. Elastin synthesis and fiber assembly. Annals of the New York Academy of Sciences, 624(1):137–146, 1991.

[256] Caroline H Damsky and Zena Werb. Signal transduction by integrin receptors for extracellular matrix: cooperative processing of extracellular information. Current Opinion in Cell Biology, 4(5):772–781, 1992.

[257] Steven M Frisch and Erkki Ruoslahti. Integrins and anoikis. Current Opinion in Cell Biology, 9(5):701–706, 1997.

[258] Eileen Puklin-Faucher and Viola Vogel. Integrin activation dynamics between the RGD-binding site and the headpiece hinge. Journal of Biological Chemistry, 284(52):36557–36568, 2009.

[259] Lingyun Wang, Di Pan, Qi Yan, and Yuhua Song. Activation mechanisms of αvβ3 integrin by binding to fibronectin: a computational study. Protein Science, 26(6):1124–1137, 2017.

[260] Antonella Paladino, Monica Civera, Flavio Curnis, Mayra Paolillo, Cesare Gennari, Umberto Piarulli, Angelo Corti, Laura Belvisi, and Giorgio Colombo. The importance of detail: How differences in ligand structures determine distinct functional responses in integrin αvβ3. Chemistry–A European Journal, 25(23):5959–5970, 2019.

[261] Dror Yahalom, Angela Wittelsberger, Dale F Mierke, Michael Rosenblatt, Joseph M Alexander, and Michael Chorev. Identification of the principal binding site for RGD-containing ligands in the αvβ3 integrin: a photoaffinity cross-linking study. Biochemistry, 41(26):8321–8331, 2002.

[262] A Paul Mould, Steven K Akiyama, and Martin J Humphries. Regulation of integrin α5β1-fibronectin interactions by divalent cations. Evidence for distinct classes of binding sites for Mn2+, Mg2+, and Ca2+. Journal of Biological Chemistry, 270(44):26270–26277, 1995.

[263] Zaira Martín-Moldes, Davoud Ebrahimi, Robyn Plowright, Nina Dinjaski, Carole C Perry, Markus J Buehler, and David L Kaplan. Intracellular pathways involved in bone regeneration triggered by recombinant silk–silica chimeras. Advanced Functional Materials, 28(27):1702570, 2018.

[264] Srinivasan Jayashree, Pavalam Murugavel, Ramanathan Sowdhamini, and Narayanaswamy Srinivasan. Interface residues of transient protein-protein complexes have extensive intra-protein interactions apart from inter-protein interactions. Biology Direct, 14(1):1, 2019.

[265] Joao PGLM Rodrigues, Mikael Trellet, Christophe Schmitz, Panagiotis Kastritis, Ezgi Karaca, Adrien SJ Melquiond, and Alexandre MJJ Bonvin. Clustering biomolecular complexes by residue contacts similarity. Proteins: Structure, Function, and Bioinformatics, 80(7):1810–1817, 2012.

[266] Hironao Yamada, Sakiko Mori, Takeshi Miyakawa, Ryota Morikawa, Fumihiko Katagiri, Kentaro Hozumi, Yamato Kikkawa, Motoyoshi Nomizu, and Masako Takasu. Structural study of cell attachment peptide derived from laminin by molecular dynamics simulation. PLoS One, 11(2):e0149474, 2016.

[267] Barry J Grant, Ana PC Rodrigues, Karim M ElSawy, J Andrew McCammon, and Leo SD Caves. Bio3d: an R package for the comparative analysis of protein structures. Bioinformatics, 22(21):2695–2696, 2006.

[268] Tsan Xiao, Junichi Takagi, Barry S Coller, Jia-Huai Wang, and Timothy A Springer. Structural basis for allostery in integrins and binding to fibrinogen-mimetic therapeutics. Nature, 432(7013):59–67, 2004.

[269] Marco Vassura, Pietro Di Lena, Luciano Margara, Maria Mirto, Giovanni Aloisio, Piero Fariselli, and Rita Casadio. Blurring contact maps of thousands of proteins: what we can learn by reconstructing 3D structure. BioData Mining, 4(1):1, 2011.

[270] Vishal C Nashine, Sharon Hammes-Schiffer, and Stephen J Benkovic. Coupled motions in enzyme catalysis. Current Opinion in Chemical Biology, 14(5):644–651, 2010.

[271] Ranjani K Paradise, Douglas A Lauffenburger, and Krystyn J Van Vliet. Acidic extracellular pH promotes activation of integrin αvβ3. PLoS One, 6(1), 2011.

[272] Rashmi Sharma, Zsolt Raduly, Marton Miskei, and Monika Fuxreiter. Fuzzy complexes: Specific binding without complete folding. FEBS Letters, 589(19):2533–2542, 2015.

[273] Peter Tompa and Monika Fuxreiter. Fuzzy complexes: polymorphism and structural disorder in protein–protein interactions. Trends in Biochemical Sciences, 33(1):2–8, 2008.

[274] Davoud Mozhdehi, Kelli M Luginbuhl, Joseph R Simon, Michael Dzuricky, Rudiger Berger, H Samet Varol, Fred C Huang, Kristen L Buehne, Nicholas R Mayne, Isaac Weitzhandler, et al. Genetically encoded lipid–polypeptide hybrid biomaterials that exhibit temperature-triggered hierarchical self-assembly. Nature Chemistry, 10(5):496–505, 2018.

[275] Bing-Hao Luo, Christopher V Carman, and Timothy A Springer. Structural basis of integrin regulation and signaling. Annual Review of Immunology, 25:619–647, 2007.

[276] Monika Fuxreiter, Istvan Simon, and Sarah Bondos. Dynamic protein–DNA recognition: beyond what can be seen. Trends in Biochemical Sciences, 36(8):415–423, 2011.

[277] Bankala Krishnarjuna, Toshihiko Sugiki, Rodrigo AV Morales, Jeffrey Seow, Toshimichi Fujiwara, Karyn L Wilde, Raymond S Norton, and Christopher A MacRaild. Transient antibody-antigen interactions mediate the strain-specific recognition of a conserved malaria epitope. Communications Biology, 1(1):1–10, 2018.

[278] Isabella L Karle and Dan W Urry. Crystal structure of cyclic (APGVGV)2, an analog of elastin, and a suggested mechanism for elongation/contraction of the molecule. Biopolymers: Original Research on Biomolecules, 77(4):198–204, 2005.

[279] Dror Tobi and Ivet Bahar. Structural changes involved in protein binding correlate with intrinsic motions of proteins in the unbound state. Proceedings of the National Academy of Sciences, 102(52):18908–18913, 2005.

[280] Shin-ichi Aota, Motoyoshi Nomizu, and Kenneth M Yamada. The short amino acid sequence Pro-His-Ser-Arg-Asn in human fibronectin enhances cell-adhesive function. Journal of Biological Chemistry, 269(40):24756–24761, 1994.

[281] Diwakar Chada, Timothy Mather, and Matthias U Nollert. The synergy site of fibronectin is required for strong interaction with the platelet integrin αIIbβ3. Annals of Biomedical Engineering, 34(10):1542–1552, 2006.

Appendix A

Code and scripts

Appendix A includes a collection of scripts used throughout this dissertation. To improve readability, code snippets were used where possible to communicate the core function of each script.

A.1 Code for implementing machine learning

The following R packages were used for carrying out machine learning in Chapter 3.

library(caret)
library(glmnet)
library(mltools)
library(standardize)
library(pROC)
library(caTools)
library(Hmisc)
library(CaretMisc)

The following R code snippet was used to set up the training control (five repeats of 10-fold cross-validation) for machine learning in Chapter 3.

# Set up 5 repeats of 10-fold CV
train_control_repeatedcv <- trainControl(
  method = "repeatedcv",
  number = 10,                        # k = 10 folds
  repeats = 5,                        # repeat the 10-fold split 5 times
  classProbs = TRUE,                  # compute class probabilities (needed for ROC)
  summaryFunction = twoClassSummary,  # binary outcome: reports ROC, Sens, Spec
  savePredictions = "final"           # save predictions for the best hyperparameter set
)
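To make the resampling scheme concrete, the following base-R sketch mimics what method = "repeatedcv" with number = 10 and repeats = 5 sets up: five independent 10-fold partitions of the training rows, giving 50 resamples in which roughly 90% of the rows are used for fitting and the rest are held out. The function make_repeated_folds() is hypothetical and not part of caret; unlike caret, which balances folds by outcome class, it assigns folds purely at random.

```r
# Hypothetical helper: conceptual version of repeated k-fold CV.
# Folds are assigned at random (caret additionally balances them by class).
make_repeated_folds <- function(n, k = 10, repeats = 5, seed = 123) {
  set.seed(seed)
  folds <- list()
  for (r in seq_len(repeats)) {
    # Shuffle balanced fold labels 1..k across the n rows for this repeat
    labels <- sample(rep(seq_len(k), length.out = n))
    for (f in seq_len(k)) {
      # Store the TRAINING row indices; rows labelled f are held out
      folds[[sprintf("Fold%02d.Rep%d", f, r)]] <- which(labels != f)
    }
  }
  folds
}

folds <- make_repeated_folds(n = 100)
length(folds)       # 50 resamples: 10 folds x 5 repeats
length(folds[[1]])  # 90 rows used for fitting; 10 held out
```

Within each repeat, every row is held out exactly once, so each candidate hyperparameter set is scored on 50 held-out folds rather than on a single split.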

The following R code snippet was used to tune and train the elastic net (glmnet) model in Chapter 3.

# Set up search grid for hyperparameters
hyperparameter_search <- expand.grid(
  alpha = c(0, 0.4, 0.6, 0.8, 1),
  lambda = c(0.1, 1, 10, 100)
)

# Set seed to make results reproducible
set.seed(123)

# Execute glmnet using caret
data_elastic_net <- train(
  association_HT ~ .,                    # head-to-tail outcome
  data = data_train_engineered,          # training data set
  metric = "Sens",                       # select sensitivity as metric
  method = "glmnet",
  trControl = train_control_repeatedcv,
  tuneGrid = hyperparameter_search
)

# Examine the effect of the hyperparameters
plot(data_elastic_net)
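The two tuning parameters in hyperparameter_search control the elastic net penalty that glmnet adds to the model's negative log-likelihood: lambda sets the overall strength of the penalty, and alpha mixes the lasso (L1, alpha = 1) and ridge (L2, alpha = 0) terms. The sketch below writes that penalty out in base R, following the form given in the glmnet documentation, and evaluates it over the same 20-point grid for a made-up coefficient vector. The names elastic_net_penalty(), penalty_grid and beta_example are illustrative only; glmnet computes the penalty internally and does not penalise the intercept.

```r
# Hypothetical helper: elastic net penalty in the form documented for glmnet,
#   lambda * ((1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1)
# beta excludes the intercept, which glmnet does not penalise.
elastic_net_penalty <- function(beta, alpha, lambda) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

# Same grid as used for tuning: 5 alpha values x 4 lambda values
penalty_grid <- expand.grid(
  alpha = c(0, 0.4, 0.6, 0.8, 1),
  lambda = c(0.1, 1, 10, 100)
)

beta_example <- c(0.5, -1.2, 0)  # made-up coefficients for illustration

# Penalty at every (alpha, lambda) pair for the same coefficients
penalty_grid$penalty <- mapply(
  elastic_net_penalty,
  alpha = penalty_grid$alpha,
  lambda = penalty_grid$lambda,
  MoreArgs = list(beta = beta_example)
)

head(penalty_grid)
nrow(penalty_grid)  # 20 candidate (alpha, lambda) pairs
```

Because each of the 20 grid points is scored on all 50 resamples of the 5 × 10-fold scheme, the grid yields 1000 held-out performance estimates before caret refits the selected model on the full training set.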

The following R code snippet was used to evaluate the trained glmnet model in Chapter 3.

library(dplyr)  # provides %>% and bind_rows(), used below

# Examine the effect of the hyperparameters
plot(data_elastic_net)

# Create model list
allModels <- list("elastic_net" = data_elastic_net)

# Read in test data set
test_data <- readRDS("data_test_engineered.rds")

# Examine test/train performance.
# eval_classifier() is not defined in this snippet; it is assumed to be a
# helper (defined elsewhere or in a loaded package) returning a data frame.
results <- lapply(allModels,
                  eval_classifier,
                  test_data = test_data) %>%
  bind_rows(.id = "modeltype")
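The helper eval_classifier() is not shown in this appendix. As a hedged illustration of the kind of metric it would report on the held-out test set, the base-R sketch below computes sensitivity (the metric used to select hyperparameters via metric = "Sens") and specificity from predicted and observed classes. The function sens_spec() and the "yes"/"no" labels are hypothetical; the positive-class convention follows caret's twoClassSummary, which treats the first factor level as the event of interest.

```r
# Hypothetical helper: sensitivity and specificity from predicted and
# observed class labels. The first factor level is treated as the positive
# class, matching the convention used by caret's twoClassSummary.
sens_spec <- function(pred, obs) {
  stopifnot(is.factor(pred), is.factor(obs),
            identical(levels(pred), levels(obs)))
  positive <- levels(obs)[1]
  tp <- sum(pred == positive & obs == positive)  # true positives
  fn <- sum(pred != positive & obs == positive)  # false negatives
  tn <- sum(pred != positive & obs != positive)  # true negatives
  fp <- sum(pred == positive & obs != positive)  # false positives
  c(Sens = tp / (tp + fn), Spec = tn / (tn + fp))
}

# Made-up labels: 4 positives ("yes") and 4 negatives ("no")
obs  <- factor(c("yes", "yes", "yes", "yes", "no", "no", "no", "no"),
               levels = c("yes", "no"))
pred <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no"),
               levels = c("yes", "no"))

sens_spec(pred, obs)  # Sens = 0.75 (3 of 4 positives found), Spec = 1
```

Because sensitivity depends on which level is treated as positive, the ordering of the outcome's factor levels determines which class the tuning metric rewards.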

Research is what I’m doing when I don’t know what I’m doing.
