The structure and RNA-binding of Poly (C) Binding Protein 1

THE UNIVERSITY OF WESTERN AUSTRALIA


Mahjooba Sidiqi

The School of Biomedical, Biomolecular and Chemical Sciences, School of Medicine and Pharmacology and Western Australian Institute for Medical Research
University of Western Australia
Perth, Australia

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

December 2006


Declaration

The work described in this thesis was performed between March 2003 and December 2006 in the School of Biomedical, Biomolecular and Chemical Sciences (formerly the Department of Biochemistry) and the School of Medicine and Pharmacology at the University of Western Australia. Some facilities at Monash University, Melbourne, were also used. Unless otherwise stated, the experiments described were performed by the author. This work constitutes an original body of research that has not been submitted, either in whole or in part, for the purpose of obtaining any other degree.

Mahjooba Sidiqi


Detailed statement of authorship

The work presented in this thesis was conducted primarily by myself, but it also involved experiments conducted by others in the Wilce Structural Biology group. I was responsible for all cloning procedures, expression of both labelled and unlabelled protein, and most oligonucleotide preparation. I was responsible for all protein and protein/oligonucleotide crystallization, and for the optimization and screening of crystals for data collection. In the initial stages of using the X-ray generator, Jason Schmidberger assisted me with mounting a crystal and collecting the data set. After this training I was able to set up my own experiments. Complete diffraction data sets were interpreted by Matthew Wilce. Then, with the help of Jackie Wilce, I analysed the structures and compared them with other similar structures. The first set of NMR experiments was conducted by me and the second set by Corrine Porter. I was solely responsible for all SPR experiments using Biacore and the subsequent analysis.

Co-supervisor: Professor Peter Leedman


Abstract

Regulation of mRNA stability is an important post-transcriptional mechanism involved in the control of gene expression. The rate of mRNA decay can differ greatly from one mRNA to another and may be regulated by RNA-protein interactions. Key determinants of mRNA decay are cis-acting instability elements, often located in the 3’ untranslated region (UTR) of mRNAs. The AU-rich elements (AREs) are among the best characterized of these elements. They are most commonly involved in promoting mRNA degradation, although the specific binding of proteins to these elements can lead to the stabilization of some mRNAs.

Other cis-elements have been described in mRNAs for which stability is a critical component of gene regulation. These include a UC-rich cis-element in the 3’UTR of the androgen receptor (AR) mRNA. The AR is a key therapeutic target in human prostate cancer, so understanding the mechanisms that regulate its expression is an important goal. The αCP1 protein, a KH domain-containing RNA-binding protein, has been found to bind this UC-rich region of AR mRNA. It is thought to play an important role in regulating AR mRNA expression.

αCP1 is a triple KH (hnRNP K homology) domain protein with specificity for C-rich tracts of RNA and single-stranded DNA (ssDNA). Relatively little is known about the structural basis of the interaction of αCP1 with target RNA cis-elements. The present study therefore aimed to better understand the interaction between αCP1 and a 30-nucleotide UC-rich segment of the AR mRNA 3’UTR using various biophysical techniques. The goal was to determine which αCP1 domain, or combination of domains, is involved in RNA binding. These studies could ultimately provide novel targets for drugs aimed at regulating AR mRNA expression in prostate cancer cells.

At the commencement of this study little was known about the structure of the αCP1-KH domains or the basis of their poly (C) binding specificity. The first aim of this study was therefore to determine the structure of each of the isolated αCP1-KH domains, both alone and in complex with target RNA or DNA sequences. Chapter 3 describes the purification of recombinant full-length αCP1 and of the αCP1-KH1, KH2 and KH3 domains. In Chapter 4, I describe the crystal structure of the αCP1-KH3 domain, determined to 2.10 Å resolution. The αCP1-KH3 domain adopts the classical type I KH domain fold, with a triple-stranded β-sheet held against a three-helix cluster in a βααββα configuration. A model of αCP1-KH3 bound to poly (C) RNA was generated by homology to the Nova-2-KH3-RNA structure. This model provides insight into the likely mode of poly (C) RNA binding by the αCP1-KH3 domain. Nuclear magnetic resonance (NMR) spectroscopy was used to analyse the interaction of αCP1-KH3 with an 11-mer RNA sequence representing part of the 3’UTR of AR mRNA. The results indicate that the domain is likely to be correctly folded in its secondary and tertiary structure. They also indicate that the protein is fully complexed with the RNA while maintaining good solution characteristics, with no evidence of aggregation or formation of large complexes.

Chapter 5 describes the crystal structure, solved to 3.0 Å resolution, of the first KH domain of αCP1 (αCP1-KH1) bound to a single-stranded C-rich DNA 11-mer. αCP1-KH1 assumes a classical type I KH domain fold, with a triple-stranded β-sheet held against a triple α-helix cluster. Together these form a narrow hydrophobic cleft that accommodates the oligonucleotide. Extensive hydrophobic and hydrogen-bond contacts are made with four core recognition nucleotides, including critical contacts that form the basis for cytosine specificity. The positioning of the oligonucleotide is similar to that in the closely related hnRNP K-KH3/DNA structure. The protein/DNA complex formed with 2:1 stoichiometry, demonstrating that KH domains may bind to immediately adjacent oligonucleotide target sites.

Additional studies used NMR spectroscopy to examine the interaction of αCP1-KH1 with a 20-mer RNA sequence, 5’-CUUUCUUUUUCUUCUUCCCU-3’, which represents the αCP1 target site in the 3’UTR of AR mRNA. Interestingly, this U-rich element contains a binding site for HuR, an RNA recognition motif (RRM)-containing RNA-binding protein involved in the regulation of AR mRNA expression. We were therefore interested in characterising interactions between HuR RRM1/2 and αCP1-KH1 bound to the RNA sequence containing both the C-rich site and the U-rich segment. My studies revealed no evidence for an interaction between the adjacently bound αCP1-KH1 and HuR RRM1/2. This does not, however, preclude an interaction with full-length αCP1 through either the αCP1-KH2 or the αCP1-KH3 domain.

An additional aim of this thesis was to characterise the kinetics and binding affinities of the αCP1-KH domains for the poly (C)-rich site in the 3’UTR of AR mRNA and for a number of other RNA and DNA sequences. This work is described in Chapter 6. The kinetics and affinity of the interactions were quantified using surface plasmon resonance (SPR) spectroscopy. I found that the isolated αCP1-KH domains prefer the DNA sequence over the RNA sequence of AR mRNA, with widely ranging affinities in the order αCP1-KH1 > KH3 > KH2. However, this pattern was not observed with a simple 10-mer sequence containing only one cytosine triplet site. This study highlights that each of the individual domains can function as a discrete and independent RNA- and DNA-binding unit, albeit with different levels of binding activity. Consistent with previous findings, αCP1-KH1 and KH3 were found to bind poly (C) RNA but not poly (A), poly (U) or poly (G) RNA. These studies are the most detailed analysis of αCP1-KH domain binding to date. Furthermore, the data showed that, in contrast with previous reports, αCP1-KH2 can also bind oligonucleotides. Taken together, these data have enabled the generation of a model of the full-length αCP1 molecule bound to a target oligonucleotide.
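The SPR kinetic analysis referred to above is commonly based on a 1:1 Langmuir interaction model (Section 6.4.1). The Python sketch below illustrates that model only. It is not the fitting procedure used in this work.

The sketch assumes a constant analyte concentration C during injection and no mass-transport limitation. Under those assumptions, the response rises as R(t) = Req(1 − exp(−(ka·C + kd)t)), where Req = Rmax·C/(C + KD) and KD = kd/ka. After the injection ends, the response decays as R0·exp(−kd·t). The function names, rate constants and Rmax below are hypothetical values chosen for illustration.

```python
import math


def langmuir_association(t, conc, ka, kd, rmax):
    """Response (RU) at time t (s) after the start of an injection of analyte at
    concentration conc (M), for a 1:1 Langmuir interaction without mass transport."""
    k_obs = ka * conc + kd                 # observed pseudo-first-order rate constant (1/s)
    r_eq = rmax * conc / (conc + kd / ka)  # steady-state response, using KD = kd / ka
    return r_eq * (1.0 - math.exp(-k_obs * t))


def langmuir_dissociation(t, r0, kd):
    """Response (RU) at time t (s) after the injection ends, starting from r0."""
    return r0 * math.exp(-kd * t)


# Hypothetical parameters for illustration only (not values from this thesis).
KA = 1.0e4      # association rate constant ka, 1/(M*s)
KD_RATE = 0.05  # dissociation rate constant kd, 1/s
RMAX = 100.0    # maximal response, RU

if __name__ == "__main__":
    kd_equilibrium = KD_RATE / KA  # equilibrium dissociation constant KD (M)
    print(f"KD = {kd_equilibrium * 1e6:.1f} uM")
    r_end = langmuir_association(120.0, 5e-6, KA, KD_RATE, RMAX)
    print(f"Response after 120 s at 5 uM: {r_end:.1f} RU")
    print(f"Response 30 s into dissociation: {langmuir_dissociation(30.0, r_end, KD_RATE):.1f} RU")
```

With these placeholder values, KD = kd/ka = 5 µM. An injection at 5 µM therefore approaches half of Rmax at steady state. Fitting measured sensorgrams works the other way round: it estimates ka and kd, and KD follows from their ratio.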

We also examined αCP1-KH domain binding to a simpler cis-element sequence consisting of a single C-triplet site, and compared RNA and DNA binding. αCP1-KH1 bound the C-triplet RNA target (Kd of 48 µM) with lower affinity than the corresponding DNA sequence (Kd of 4.5 µM). In contrast, αCP1-KH3 bound the RNA (Kd of 3.2 µM) and DNA (Kd of 2.2 µM) sequences comparably well. Additional studies addressed the significance of the four core recognition nucleotides (TCCC) using a series of cytosine-to-thymine mutants. The findings verified some of the predictions from the structural studies, in particular that maximal KH binding requires the core tetranucleotide recognition sequence. Mutation of the four core bases confirmed the importance of the cytosines at positions two and three, since no binding was observed when these were mutated. Some binding was retained when the fourth base was mutated.
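The Kd values reported above can be put in perspective with a simple 1:1 binding model. In that model, the equilibrium fraction of an oligonucleotide target that is bound is θ = C/(C + Kd), where C is the free protein concentration, assumed to be in excess. The sketch below applies this idealised model to the reported αCP1-KH1 and αCP1-KH3 values for the single C-triplet targets. The 5 µM protein concentration and the helper names are hypothetical choices for illustration, and real binding may deviate from a 1:1 model (Section 6.5).

```python
# Kd values (in uM) reported in the abstract for the single C-triplet targets.
KD_UM = {
    ("KH1", "RNA"): 48.0,
    ("KH1", "DNA"): 4.5,
    ("KH3", "RNA"): 3.2,
    ("KH3", "DNA"): 2.2,
}


def fraction_bound(conc_um, kd_um):
    """Equilibrium fraction of oligonucleotide bound (idealised 1:1 model, protein in excess)."""
    return conc_um / (conc_um + kd_um)


def rna_dna_ratio(domain):
    """How many-fold weaker RNA binding is than DNA binding, as Kd(RNA) / Kd(DNA)."""
    return KD_UM[(domain, "RNA")] / KD_UM[(domain, "DNA")]


if __name__ == "__main__":
    protein_um = 5.0  # hypothetical free protein concentration (uM)
    for (domain, target), kd in KD_UM.items():
        theta = fraction_bound(protein_um, kd)
        print(f"{domain}/{target}: Kd = {kd} uM, fraction bound = {theta:.2f}")
    for domain in ("KH1", "KH3"):
        print(f"{domain}: Kd(RNA)/Kd(DNA) = {rna_dna_ratio(domain):.1f}")
```

The Kd(RNA)/Kd(DNA) ratio is about 10.7 for αCP1-KH1 but only about 1.5 for αCP1-KH3. This is the quantitative sense in which KH3 binds the two targets comparably.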

In summary, the work presented in this thesis provides new, detailed insight into the molecular interactions between the αCP1-KH domains and AR mRNA. These studies also shed light on the nature of protein/mRNA interactions in general, as well as on the specific complex that forms on AR mRNA. They provide new understanding of the mode of αCP1 binding at a target oligonucleotide site, and a foundation for future studies to define the structure of multiprotein/oligonucleotide complexes involved in AR mRNA gene regulation. Understanding the detailed interaction between AR mRNA and αCP1 could provide targets for drug development aimed at reducing AR expression in prostate cancer cells by interfering with the interaction of αCP1 and AR mRNA.


Acknowledgements

I would firstly like to thank my supervisor, Dr Jackie Wilce, for her support, encouragement and motivation towards my work and myself. I have enjoyed working with her, and I truly admire her strong personality and yet calm nature, which helped me throughout my PhD, especially in the stressful times. I wish her happiness always and send my best regards. I would also like to thank my co-supervisor, A/Professor Matthew Wilce. I greatly appreciate the time and knowledge he has offered me.

I will always remain indebted to my co-supervisor, Professor Peter Leedman, who kindly accepted me into his lab for the last two years of my PhD. Without his lab, and without his encouragement, support and guidance, I don’t think I would have been able to complete my PhD successfully. I greatly appreciate the time he dedicated to my project and his encouragement of my ambition to pursue science and a medical degree.

I don’t think I would have survived my PhD without the help and support of all the members of the Structural Biology group. They have been absolutely wonderful to me. I have enjoyed working with them, and I sincerely appreciate the friendship, knowledge and wisdom that each one offered me in their own unique way. I also don’t think I would have coped well with the change of labs if it were not for the great, extremely friendly team in the Leedman lab. Each member of the lab made me feel at home and assisted me greatly, and I cannot thank them enough. Special thanks go to Christen Down and Esme Hatchell, whose friendship and support over the last year of my PhD were invaluable and will not be forgotten.

I would like to thank Dr Lindsay Byrne and Dr Corrine Porter, who provided technical expertise and advice in all areas involving NMR. A special thank you to Jason Schmidberger for maintaining the X-ray generator. I would also like to thank the Biacore team at WAIMR and Rick Filonzi for their help and advice on using the machine and on data analysis. I would like to thank Ke Nguyen and Richard Claudius for providing me with a computer and an office for writing my thesis. In addition, I thank Dr Ranjna Kapoor for proofreading my thesis.

Many other people in Pharmacology and WAIMR provided assistance and support over the past four years; their help is gratefully acknowledged.


I must also thank my dearest friends, without whom I would probably have been driven to insanity. A special thank you goes to Safia Al-Saeedy for her constant motivation, especially during the writing of my thesis, and for her concern for my well-being. Thank you to Susan Lo for listening to my complaints, for her lifts to the station and many other places, and for just being Susan. Thank you to Madhu Sharp for her help in protein concentration determination, for her beautiful lunches, and for the jokes and laughter that she shared with me. I would also like to thank all my other friends, both at work and outside work; they know who they are.

Last but not least, to my parents, my sister and my brothers: thank you for your constant love, encouragement and understanding. Without it, the last four years would have been so much harder. I hope to be able to help and care for you all in the future.

Above all, I would like to thank God for His help, blessing and love, and for granting me this great opportunity to seek knowledge and to develop intellectually and spiritually.


Table of Contents

Declaration
Detailed Statement of Authorship
Abstract
Acknowledgements
Table of Contents
Abbreviations
List of Tables and Figures
Three letter and one letter code for the common amino acids
Publications and conference presentations

Chapter 1: General Introduction
1.1 Levels of Gene Regulation
1.2 Regulation of Transcription in Eukaryotes
1.3 Pre-mRNA processing reactions: Capping, editing, splicing, 3’ end processing
1.4 Translation
1.5 mRNA Decay
1.5.1 mRNA half-life
1.5.2 Eukaryotic mRNA Decay Pathways
1.5.3 mRNA Stability
1.5.4 Adenosine Uridine (AU)-rich elements (AREs)
1.5.5 Non-ARE cis-elements and their binding proteins
1.6 Role of RNA-Protein interaction
1.6.1 RNA Binding Motifs
1.6.2 The RRM (RNA Recognition Motif)
1.6.3 The Arginine rich motif (ARM)
1.6.4 Double stranded RNA binding domain
1.6.5 KH Motif
1.7 αCP proteins
1.7.1 Protein-Protein interaction
1.7.2 αCP KH motifs-synergy
1.7.3 KH Structure
1.7.4 KH and oligonucleotide interaction
1.7.5 KH containing proteins and disease
1.8 Androgen receptor and prostate cancer
1.8.1 Androgen receptor mRNA stability and RNA binding proteins
1.9 Summary and Research aims
1.9.1 Hypotheses that formed the basis of this study

Chapter 2: Materials and Methods
2.1 Molecular Biology
2.1.1 Materials
2.1.2 Buffers and solutions
2.1.3 Culture media
2.2 Cloning of αCP1-KH domains
2.2.1 Polymerase chain reaction
2.2.2 Agarose gel electrophoresis of DNA
2.2.3 Restriction endonuclease digestion of DNA
2.2.4 pGEX-6P-2 vector digestion
2.2.5 Ligation reaction
2.2.6 Transformation of XL1-Blue competent cells
2.2.7 “Colony screening” and extraction of plasmid DNA from bacterial culture
2.2.8 Restriction enzyme digestion
2.3 Protein expression
2.3.1 Background
2.3.2 Expression of unlabelled Glutathione-S-Transferase fusion protein
2.3.3 Overexpression of labelled αCP1-KH1 and αCP1-KH3 domains
2.4 Protein purification
2.4.1 Cell Lysis
2.4.2 Glutathione-agarose bead adsorption and PreScission protease cleavage
2.4.3 Regeneration of GSH-agarose beads
2.4.4 PAGE analysis
2.4.5 Tris/glycine SDS-PAGE
2.4.6 Electrophoresis
2.4.7 Size-exclusion chromatography
2.4.8 Anion exchange chromatography
2.4.9 Cation exchange chromatography
2.5 Protein concentration
2.6 Mass Spectrometry
2.7 Circular Dichroism spectropolarimetry
2.7.1 Sample preparation and spectra acquisition
2.8 Oligonucleotide Preparation
2.8.1 Preparation of 11-nt αCP1 target site from AR mRNA
2.8.2 Preparation of 50-nt αCP1 target site from AR mRNA
2.9 Oligonucleotide-protein binding studies
2.9.1 Surface Plasmon Resonance Spectroscopy (Biacore)
2.9.2 Oligonucleotide-Protein binding measurements
2.9.3 Preparation of RNA for REMSA
2.9.4 REMSA
2.10 Nuclear Magnetic Resonance Spectroscopy (NMR)
2.10.1 Sample preparation
2.10.2 NMR experiments
2.11 Structural studies
2.11.1 Crystal growth for X-ray diffraction experiments
2.11.2 Crystallisation of αCP1-KH3
2.11.3 Preparation of αCP1-KH1/DNA complex
2.11.4 Preparation of other crystallisation experiments
2.11.5 X-ray data collection
2.11.6 Structure solution and refinement
2.12 Molecular dynamics simulation using NAMD
2.12.1 Molecular modelling of αCP1-KH3 bound to poly (C) oligonucleotide

Chapter 3: Protein Preparation
3.1 Chapter overview
3.2 αCP1-KH Domain Boundaries
3.3 Protein Expression
3.3.1 αCP1 expression and purification
3.3.2 αCP1-KH1 expression and purification
3.3.3 αCP1-KH2 plasmid preparation, overexpression and purification
3.3.4 αCP1-KH2 sequence analysis
3.3.5 Plasmid preparation
3.3.6 αCP1-KH2 Expression and purification
3.3.7 αCP1-KH3 expression and purification
3.3.8 Combined αCP1-KH domains 1 and 2, 2 and 3
3.4 Circular Dichroism and Confirmation of correct recombinant protein expression
3.5 Conclusions

Chapter 4: Structural and NMR studies of αCP1-KH3
4.1 Chapter overview
4.2 Why Crystallography?
4.2.1 Crystallography
4.2.2 Crystals and the unit cell
4.2.3 Crystal Growth
4.2.4 X-ray Diffraction
4.3 What is NMR?
4.3.1 Protein NMR
4.4 Results
4.4.1 Crystallization of αCP1-KH3
4.4.2 X-ray data collection
4.4.3 Structure solution and refinement
4.4.4 Structural overview
4.4.5 The oligonucleotide-binding cleft
4.4.6 Comparison with other KH domain structures
4.5 Model of αCP1-KH3 bound to poly (C) oligonucleotide
4.5.1 Poly (C) RNA structure may favor binding
4.6 NMR Studies of αCP1-KH3
4.6.1 Formation of an αCP1-KH3/11-nucleotide RNA complex
4.7 Conclusions

Chapter 5: Structural and NMR studies of αCP1-KH1/DNA
5.1 Chapter overview
5.2 Crystallization of αCP1-KH1/DNA
5.2.1 Structure determination of αCP1-KH1/DNA
5.2.2 Structural Overview
5.2.3 Oligonucleotide binding
5.2.4 Residues underlying cytosine specificity
5.3 Comparison of αCP1-KH1 with other KH domains
5.4 Comparison of αCP1-KH1 with other KH domain/oligonucleotide complexes
5.5 NMR Studies of αCP1-KH1 domain
5.6 Conclusions

Chapter 6: SPR analysis of αCP1-KH domains
6.1 Chapter Overview
6.2 Why use Surface Plasmon Resonance (SPR)?
6.3 Principles and applications of SPR
6.4 Kinetics
6.4.1 Langmuir binding model
6.4.2 Determination of Equilibrium Constants
6.4.3 The two compartment or Mass transfer model
6.4.4 Other binding models
6.5 Deviation from 1:1 model
6.5.1 Sample purity
6.5.2 Aggregation state
6.5.3 Mass-transport
6.5.4 Steric hindrance
6.5.5 Minimal non-specific binding
6.6 Results
6.6.1 Binding measurements of αCP1-KH domains to the 30 nucleotide AR mRNA
6.6.2 Binding measurements of αCP1-KH domains to DNA sequence representing the 30 nucleotide 3’UTR of AR mRNA
6.6.3 αCP1-KH interaction with homopolymers
6.6.4 αCP1-KH domain binding to single poly C site
6.6.5 αCP1-KH domains binding specificity
6.7 Conclusions

Chapter 7: General summary and discussion
7.1 Chapter Overview
7.2 Stability of αCP1-KH domains
7.3 Structural studies of αCP1
7.3.1 αCP1-KH1/DNA and other KH/nucleotide complexes
7.4 αCP1-KH binding Kinetics
7.5 Future directions

Chapter 8: References
Appendix A: Crystallization Screens
Appendix B: REMSA
Appendix C: Publications


List of Abbreviations

3-D  three-dimensional
A280  absorbance at 280 nm
A600  absorbance at 600 nm
ADAR  adenosine deaminases acting on RNA
αCP  poly (C) binding protein
ARE  AU-rich element
ARM  arginine rich motif
AR  androgen receptor
AUF1  ARE/poly (U) binding degradation factor
bp  base pair
BSA  bovine serum albumin
C  cytosine
CD  circular dichroism
COSY  correlation spectroscopy
DICE  differentiation control element
DEPC  diethyl pyrocarbonate
DHT  dihydrotestosterone
DNA  deoxyribonucleic acid
dNTP  deoxyribonucleotide 5’-triphosphate
DSRM  double-stranded RNA-binding motif
DTT  dithiothreitol
EDTA  ethylenediaminetetraacetic acid
EPO  erythropoietin
ELAV  embryonic lethal abnormal vision
EGFR  epidermal growth factor receptor
FBP  FUSE binding protein
FMR  fragile mental retardation
FSH  follicle stimulating hormone
GAPDH  glyceraldehyde-3-phosphate dehydrogenase
GM-CSF  granulocyte-macrophage colony-stimulating factor
GMP  guanosine monophosphate
GSH  glutathione
GST  glutathione-S-transferase
HCl  hydrochloric acid
HEPES  N-(2-hydroxyethyl)piperazine-N'-2-ethanesulfonic acid
HIV  human immunodeficiency virus
hnRNP  heterogeneous nuclear ribonucleoprotein
HSQC  heteronuclear single-quantum coherence
IPTG  isopropyl β-D-thiogalactopyranoside
IRES  internal ribosome entry site
JB  Jena Bioscience
kDa  kilodalton
KH  K homology
KSRP  K homology splicing regulatory protein
LB  Luria-Bertani medium
LH  luteinising hormone
LOX  lipoxygenase
M  molecular mass
MALDI-TOF  matrix assisted laser desorption ionisation-time of flight
MD  molecular dynamics
MES  2-(N-morpholino)ethanesulfonic acid
MOR  mouse opioid receptor
MPD  2-methyl-2,4-pentanediol
mRNA  messenger ribonucleic acid
MQW  Milli-Q® water
MW  molecular weight
MWCO  molecular weight cut off
NaOH  sodium hydroxide
NLS  nuclear localisation signal
NMR  nuclear magnetic resonance
NOESY  nuclear Overhauser enhancement spectroscopy
PAB  poly A polymerase
PABP  poly A binding protein
PARN  poly A specific ribonuclease
PBS  phosphate buffered saline
PCBP/hnRNP E  poly (C) binding protein
PCR  polymerase chain reaction
PEG  polyethylene glycol
PKR  RNA-activated protein kinase (DAI)
PMSF  phenylmethylsulphonyl fluoride
ppm  parts per million
PV  poliovirus
REMSA  RNA electrophoretic mobility shift assay
REN  renin
r.m.s.  root mean square
r.m.s.d.  root mean square deviation
RNA  ribonucleic acid
RNP  ribonucleoprotein
RRM  RNA recognition motif
RT  room temperature
SA  streptavidin
SDS  sodium dodecyl sulphate
SDS-PAGE  sodium dodecyl sulphate polyacrylamide gel electrophoresis
SELEX  systematic evolution of ligands by exponential enrichment
SLDE  stem-loop destabilizing element
SPR  surface plasmon resonance
STAR  signal transduction and activation of RNA
TAE  tris-acetate EDTA
TEMED  N,N,N’,N’-tetramethylethylenediamine
TFA  trifluoroacetic acid
TNFα  tumour necrosis factor α
TPA  12-O-tetradecanoylphorbol-13-acetate
Tris  tris(hydroxymethyl)aminomethane
U  units
UTR  untranslated region
UVXL  UV cross-linking assay
UVXL-IP  UV cross-linking assay immunoprecipitation
VEGF  vascular endothelial growth factor


LIST OF TABLES AND FIGURES

Chapter 1 Introduction
Figure 1.1 Schematic representation of the steps involved in gene expression
Figure 1.2 The ribosome-recycling concept
Figure 1.3 Eukaryotic mRNA degradation pathways
Figure 1.4 A schematic representation of different features of the mRNA
Figure 1.5 Cartoon representation of the NMR structure of the RRM domain of the SF2 protein (PDB code 1X4A) generated using VMD
Figure 1.6 Sequence alignment of a selection of RRM domains for which the structure has been solved (PDB codes are indicated in brackets)
Figure 1.7 Structure of the HuD RRM domain 1 and 2 c-fos-11 complex (PDB code: 1FXL) generated using VMD
Figure 1.8 NMR structure of the Jembrana disease virus (JDV) Tat arginine rich motif (ARM) – bovine immunodeficiency virus (BIV) TAR RNA complex (PDB code: 1ZBN)
Figure 1.9 The solution structure of Rnt1p dsRBD complexed to the 5' terminal hairpin of one of its small nucleolar RNA substrates, the snR47 precursor
Figure 1.10 Schematic representation of the αCP1 and KH domain arrangement
Figure 1.11 Sequence alignment of the KH domains from the known PCBP (αCP) proteins, PCBP1-4 and hnRNP K
Figure 1.12 The crystal structure of αCP2-KH1 (residues 11–82) solved to 1.7 Å resolution depicted in cartoon form
Figure 1.13 Crystal structure of the αCP2 KH1-human telomeric (ht) DNA complex
Figure 1.14 Analysis of KH domains with their target RNA or DNA
Figure 1.15 Androgen ablation kills prostate cells
Figure 1.16 The androgen receptor (AR)
Figure 1.17 The action of testosterone and androgen receptor (AR)
Figure 1.18 HuR/αCP1 and androgen receptor system

Chapter 2 Materials and Methods
Figure 2.1 Schematic of the sensor chip

Chapter 3 Protein preparation
Figure 3.1 Schematic representation of the cloned αCP1-KH domain boundaries, amino acid sequence alignment and cartoon representation of secondary structures
Figure 3.2 Overexpression and size-exclusion chromatography of αCP1
Figure 3.3 Size-exclusion chromatography of αCP1
Figure 3.4 Overexpression of GST-αCP1-KH1
Figure 3.5 Cation exchange chromatography of αCP1-KH1 and SDS-PAGE analysis
Figure 3.6 SDS-PAGE analysis of GST-αCP1-KH2 overexpression at 37°C (A) and 30°C
Figure 3.7 The initial and truncated domain boundary of αCP1-KH2
Figure 3.8 Sub-cloning of αCP1-KH2
Figure 3.9 Overexpression of GST-αCP1-KH2
Figure 3.10 Affinity purification of αCP1-KH2
Figure 3.11 SDS-PAGE analysis of GST-αCP1-KH3 overexpression
Figure 3.12 Size-exclusion chromatography of αCP1-KH3 and SDS-PAGE analysis
Figure 3.13 Circular dichroism spectra of the KH1, KH2 and KH3 domains in 50 mM Tris buffer pH 8.0, 150 mM NaCl, 1 mM EDTA and 1 mM DTT
Table 1 The expected percentage of α helix and β sheet for the three αCP1-KH domains

Chapter 4 Structural and NMR studies of αCP1-KH3

Figure 4.1 Crystals are used to diffract X-rays
Figure 4.2 Schematic of the hanging drop vapour diffusion method
Figure 4.3 Crystals of αCP1-KH3
Table 4.1 Data collection and refinement statistics
Figure 4.4 The crystal structure of αCP1-KH3 (residues 279–356) solved to 2.1 Å resolution
Figure 4.5 Comparison of KH domain structures
Figure 4.6 Structure-based sequence alignment of seven KH domains of high structural similarity to αCP1-KH3
Figure 4.7 Molecular surface of αCP1-KH3 showing the modelled position of poly (C) RNA (orange) based on the Nova-2-KH3-RNA structure
Figure 4.8 Summary of potential interactions occurring between the modelled αCP1-KH3 and poly (C) RNA
Figure 4.9 The 1H–15N heteronuclear single quantum correlation spectra recorded at 25°C for 15N-labelled αCP1-KH3 before and after the addition of the 11-nucleotide RNA of sequence 5’-UUCCCUCCCUA-3’

Chapter 5 Structural and NMR studies of αCP1-KH1/DNA
Figure 5.1 Crystals of αCP1-KH1/DNA
Table 5.1 Data collection and refinement statistics
Figure 5.2 Structures of the αCP1-KH1/DNA complexes in the asymmetric unit
Figure 5.3 αCP1-KH1 dimerisation
Figure 5.4 Electron density map of αCP1-KH1/DNA
Figure 5.5 The electrostatic potential emanating from the αCP1-KH1
Figure 5.6 Cartoon representation of αCP1-KH1/DNA complex
Figure 5.7 Summary of the contacts between αCP1-KH1 and bound DNA tetrad of sequence 5’-TCCC-3’

Figure 5.8 A structural comparison of several different KH domains
Figure 5.9 Comparison of αCP1-KH1/DNA (purple) and αCP2-KH1 (cyan) backbone traces
Figure 5.10 Overlay of bound oligonucleotides from αCP1-KH1 (purple), αCP2-KH1 (cyan), hnRNP K-KH3 NMR structure (yellow), hnRNP K-KH3 crystal structure (red) and Nova-2-KH3 (green) structures as obtained by structural superimposition of KH domains
Figure 5.11 The 1H–15N heteronuclear single quantum correlation spectra recorded at 25 °C for 15N-labelled αCP1-KH1 before and after the addition of the 20-nucleotide RNA of sequence 5’-CUUUCUUUUUCUUCUUCCCU-3’
Figure 5.12 The 1H–15N heteronuclear single-quantum correlation spectra recorded at 25°C for 15N-labelled αCP1-KH1/RNA sample before and after the addition of HuR RRM1/2
Figure 5.13 Model of full-length αCP1

Chapter 6 SPR analysis of αCP1-KH domains
Figure 6.1 Schematic depicting the streptavidin (SA) sensor chip
Figure 6.2 The basic principles of SPR using Biacore©
Figure 6.3 Langmuir binding of analyte A to immobilized ligand B, to form AB complex
Table 6.1 A brief explanation of the rate constants
Figure 6.4 Graph of Req against the analyte concentration
Figure 6.5 Mass transfer model
Figure 6.6 Heterogeneous model
Figure 6.7 Bivalent model
Figure 6.8 Conformational change model
Figure 6.9 Binding studies of αCP1-KH1, KH2 and KH3 with RNA sequence
Table 1 Equilibrium constants for αCP1-KH1, αCP1-KH2, αCP1-KH3

Figure 6.10 Binding studies of αCP1-KH1, KH2 and KH3 with DNA sequence
Table 2 Equilibrium constants for αCP1-KH1, αCP1-KH2, αCP1-KH3 interaction with DNA representing AR
Figure 6.11 Kinetic analysis of αCP1-KH2 and DNA representing the 30 nucleotides of the 3’UTR of AR mRNA
Figure 6.12 Binding studies of αCP1-KH1, KH2 and KH3 with RNA homopolymers, poly (C), (G) and (U)
Figure 6.13 Binding studies of HuR and HuR RRM1/RRM2 with RNA homopolymers, poly (C), (G) and (U)
Figure 6.14 Binding studies of αCP1-KH1 and KH3 to a 10mer poly (A) (adenine) and triplet CCC (cytosine) sequence
Figure 6.15 Kinetic analysis of αCP1-KH1 and αCP1-KH3 binding to triplet CCC RNA and DNA sequences
Table 3 Kinetic and affinity constants for KH domains
Figure 6.16 Steric hindrance
Figure 6.17 Binding studies of αCP1-KH1 to systematic mutation of the triplet CCC site to thymine DNA sequences
Figure 6.18 Binding studies of αCP1-KH3 to systematic mutation of the triplet CCC site to thymine DNA sequences
Figure 6.19 Schematic depicting the complex αCP1-KH1/DNA


Three letter and one letter code for the common amino acids

Residue          Three-letter code   One-letter code
Alanine          Ala                 A
Arginine         Arg                 R
Asparagine       Asn                 N
Aspartic acid    Asp                 D
Cysteine         Cys                 C
Glutamine        Gln                 Q
Glutamic acid    Glu                 E
Glycine          Gly                 G
Histidine        His                 H
Isoleucine       Ile                 I
Leucine          Leu                 L
Lysine           Lys                 K
Methionine       Met                 M
Phenylalanine    Phe                 F
Proline          Pro                 P
Serine           Ser                 S
Threonine        Thr                 T
Tryptophan       Trp                 W
Tyrosine         Tyr                 Y
Valine           Val                 V


Publications M. Sidiqi, J.A. Wilce, J.P.Vivian, C.J. Porter, P.J. Leedman and M.C.J. Wilce (2005)

Structure and RNA binding of the third domain of poly(C)-binding protein. Nucleic Acids Research 33,1213-1221.

M. Sidiqi, J.A Wilce, C.J., Porter, A. Barker, P.J. Leedman and M.C.J.Wilce (2005)

Formation of an alphaCP1-KH3 complexed with UC-rich RNA. Eur Biophys J. 34, 423-429. M. Sidiqi., Wilce J.A., Schmidberger J.W., Barker A., Leedman P.J. and Wilce J.A. Wilce M.C.J. (2005) Structure of alphaCP1-KH1 bound to C-rich DNA 11-mer: multiple KH binding mode revealed. (manuscript in preparation – Nucleic Acids Research)

Conference Poster presentations Porter, C.J., Sidiqi, M., Vivian, J.P., Leedman, P.J., Wilce, M.C.J. and Wilce, J.A. (2003) An Investigation of Protein-mRNA Interactions Affecting Androgen Receptor Expression. Proc. 28th Annual Lorne Conf. Protein Structure and Function, Lorne, VIC p149. Sidiqi, M., Vivian, J.P., Wilce, J.A. and Wilce, M.C.J. (2003) Structural studies of CP protein and its fragments with androgen receptor mRNA. AsCA’03/Crystal-23 Conference, Broome, WA. ATuP-35. Sidiqi M., Wilce J.A. Vivian J.P., Porter C.J. Leedman P.J. and Wilce M.C.J. (2004) Structure and RNA binding of the third domain of poly(C)-binding protein. Proc. 29th Annual Lorne Conf. Protein Structure and Function, Lorne, VIC. abs #136 (This was awarded the first poster prize award) Sidiqi, M.S., Wilce, J.A., Vivian, J.P., Porter, C.J., Leedman, P.J. and Wilce, M.C.J. (2004) An investigation of the poly(C)-binding protein KH domains Combined ASBMB, AZSCDB and ASS Annual Meeting. Pos-Wed-010. Sidiqi, M.S., Wilce, J.A., Vivian, J.P., Porter, C.J., Leedman, P.J. and Wilce, M.C.J. (2004) An investigation of the poly(C)-binding protein KH domains and androgen receptor mRNA interactions. 28th Annual Scientific Meeting of the Australian Society of Biophysics. P20. M.C.J. Wilce, M.Sidiqi, A. Barker, J.Schmidberger, P.J. Leedman and J.A.Wilce.(2005) Double Grip of alphaCP1-KH1 on C-rich DNA 11-mer. The Murnau Conference of Structural Biology of Molecular Recognition, Sept 15-17, Murnau, Germany P093B.

Wilce, J.A., Sidiqi, M., Barker, A., Schmidberger, J.W., Pattenden, L., Leedman, P.J. and Wilce, M.C.J. (2006) Structure of αCP1-KH1/DNA complex and comparative binding to RNA vs DNA: model of the triple-KH domain protein bound to target androgen receptor mRNA. Proc. 31st Annual Lorne Conf. Protein Structure and Function, Lorne, VIC. abs #145

Chapter 1 General Introduction


Figure 1.1: Schematic representation of the steps involved in gene expression. Removal of the DNA-protein complex exposes regions of DNA to which transcription factors bind, initiating transcription of specific genes. Once transcribed, the mRNA undergoes extensive processing, which involves capping of the 5’ terminal nucleotide, polyadenylation of the 3’ end, splicing of the introns and nucleocytoplasmic transport; the mRNA is then translated or degraded. Translation is the final determinant of whether a gene is expressed, because it is ultimately the protein that carries out most of the functions of the gene.

1.1 Levels of Gene Regulation

All living organisms, from the most basic unicellular organisms to complex multicellular ones, require a high level of gene regulation to function. The number of genes ranges from approximately 6000 in yeast to 30000 in humans, and the coordinated regulation of these genes requires the interaction of many complex processes. In response to extrinsic and intrinsic stimuli, each gene is expressed at a different level, and some are not expressed at all. Different genes are active or inactive at different stages in the development of the cell and the organism, and only half to three quarters of the genes are actively expressed at any one time (Claverie, 2001; Goffeau et al., 1996).

Gene expression is a multistep process that extends from transcription to protein synthesis. Each of these steps represents a potential control point for regulation; they are highlighted in Figure 1.1 (Hollams et al., 2002).



1.2 Regulation of transcription in Eukaryotes

Gene expression is initiated by transcription factors, which are necessary for the production of an RNA transcript from a DNA template. The factors that regulate transcription have been extensively studied and include chromatin structure, the methylation status of the gene, the basal transcriptional machinery, and the presence and amount of activator and repressor transcription factors.

In the cell, DNA exists in the form of chromatin, a highly compact structure in which the DNA is wrapped around proteins called histones. This protective packaging can influence the ability of transcriptional regulatory proteins and RNA polymerases to gain access to specific genes and activate their transcription (Collingwood et al., 1999; Narlikar et al., 2002). Chromatin is densely packaged in inactive regions of the chromosome, whereas it is less tightly packed at transcriptionally active sites (Collingwood et al., 1999; Narlikar et al., 2002). Because the packaging of DNA around histones blocks access to transcription initiation factors, the cell uses a number of mechanisms to remodel chromatin and expose the DNA. Trans-acting factors can disrupt the chromatin complex by displacing histones, allowing the pre-initiation complex to form. In addition, acetylation of the histones in the chromatin complex leads to their displacement, permitting binding of the initiation factors and the beginning of transcription (Collingwood et al., 1999).

The ability of factors to bind mammalian DNA is also affected by methylation of cytosines, a common mechanism of gene silencing. A number of studies have shown that genes with methylated cytosines in the promoter region are inactive, whereas cells carrying non-methylated forms of the same genes express them. Overall chromosome structure, histone modification and cytosine methylation therefore all influence gene regulation by controlling the accessibility of the DNA to transcription factors (Razin et al., 1980; Rezai-Zadeh et al., 2003).



Transcription factors, or transcription regulatory proteins, influence the frequency of transcription initiation. They assemble into multiple complexes alongside the basal transcription machinery, and a number of mechanisms regulate their activity and assembly (Beckett, 2001). These include small-ligand binding, which affects DNA-binding ability; post-translational modification, such as phosphorylation in response to an external signal; DNA binding, in which an initial factor bound to DNA acts as a platform for subsequent factors; and other protein-protein interactions that lead either to activation or to repression of transcription factor assembly (Beckett, 2001).

Transcriptional elongation is also highly regulated. After RNA polymerase has been recruited to a promoter and a pre-initiation complex has been established, the polymerase responds to different stimuli, including association with auxiliary elongation factors and phosphorylation of the carboxy-terminal domain (CTD) of its largest subunit (Payne et al., 1989; Shilatifard et al., 2003). Modifications such as these may obstruct elongation, leading to either transient pausing or full transcriptional arrest (Fish et al., 2002). A number of factors that enhance elongation have also been identified. One group suppresses transient pausing and stimulates the rate of transcript elongation, while another reactivates arrested RNA polymerase by stimulating its transcript cleavage activity (Fish et al., 2002).

1.3 Pre-mRNA processing reactions: Capping, editing, splicing, 3’ end processing

Once the mRNA is transcribed, it goes through extensive processing steps, which are also potential points of gene expression control. Pre-mRNA processing begins with capping of the 5’ end. After the synthesis of 20 to 30 nucleotides, an N7-methylguanosine monophosphate (GMP) is added to the first nucleotide through a 5’-5’ triphosphate linkage. The 5’ cap is then bound by the nuclear cap binding complex, which consists of 20 kDa and 80 kDa subunits (Izaurralde et al., 1994; Soller, 2006). The cap structure is written as m7G(5’)ppp(5’)N.



The cap structure has many essential functions, including stabilising the mRNA, promoting splicing of the first intron and 3’ end processing, and enhancing translation (Flaherty et al., 1997; Izaurralde et al., 1994; Lewis et al., 1996; Parker et al., 2004).

In addition to 5’ end processing, the pre-mRNA is also modified at the 3’ end: nearly all eukaryotic mRNAs are polyadenylated. Polyadenylation takes place in two steps (Colgan et al., 1997; Shatkin et al., 2000; Wahle et al., 1999). First, an endonucleolytic reaction cleaves the transcript 10-30 nucleotides downstream of the conserved AAUAAA sequence; second, the cleaved 3’ end is adenylated by poly (A) polymerase (PAP). The length of the synthesized poly (A) tail is species-dependent. For example, in humans the poly (A) tail can be as long as 250 nucleotides, while in yeast it is only up to 90 nucleotides. By the time the mRNA is transported to the cytoplasm, about 50 to 100 adenines of the poly (A) tail have been removed (Bergmann et al., 1980; Brawerman, 1981). Once the number of adenines falls below a critical length, the mRNA is degraded completely (Beelman et al., 1995). This 3’ end modification thus plays an essential role in regulating the gene product by protecting the transcript from exoribonucleases (Beelman et al., 1995).
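The cleavage rule described above can be made concrete with a short sequence scan. The Python sketch below is an illustration only and is not a method used in this work: it locates each canonical AAUAAA hexamer and reports the window 10-30 nucleotides downstream in which cleavage would be expected. The function name `cleavage_windows` is hypothetical, and the sketch assumes that distances are counted from the last nucleotide of the hexamer and that only the canonical signal is considered (no variant hexamers or other elements).

```python
import re

PAS = "AAUAAA"  # canonical polyadenylation signal named in the text


def cleavage_windows(seq, min_offset=10, max_offset=30):
    """Return (signal_start, window_start, window_end) tuples, 0-based and inclusive.

    Assumption (hypothetical convention): distances are counted from the last
    nucleotide of the hexamer, and the window is clipped to the end of the sequence.
    """
    seq = seq.upper().replace("T", "U")  # also accept DNA input
    windows = []
    # A lookahead pattern lets overlapping signals (e.g. AAUAAAUAAA) all be found.
    for m in re.finditer("(?=" + PAS + ")", seq):
        last = m.start() + len(PAS) - 1  # index of the final A of the hexamer
        lo = last + min_offset
        hi = min(last + max_offset, len(seq) - 1)
        if lo <= hi:  # skip signals too close to the 3' end of the input
            windows.append((m.start(), lo, hi))
    return windows


print(cleavage_windows("G" * 5 + "AAUAAA" + "G" * 40))
```

For a sequence of five G residues, the hexamer and 40 further residues, the scan prints a single window, `[(5, 20, 40)]`. In the cell, cleavage site selection also depends on additional sequence elements and protein factors that are not modelled here.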

Another pre-mRNA modification involves editing of certain single nucleotides. Two types of base change occur in mammalian pre-mRNA: deamination of adenosine to inosine (A to I) and of cytidine to uridine (C to U) (Anant et al., 2003; Barlati et al., 2005). Most of the genes found to undergo A to I editing are expressed in the nervous system, while an example of C to U editing in humans is the apolipoprotein B mRNA, which is edited in some tissues but not others (Anant et al., 2003). Editing may create an early stop codon which, upon translation, produces a truncated protein. RNA editing can therefore generate alternative protein products from a single gene and hence is a regulatory step in gene expression (Tauson, 2004).

Alternative protein products arise not only from mRNA editing but also from mRNA splicing. Genes in eukaryotes are interspersed with intervening sequences called introns, which are removed during gene expression. mRNA splicing is the process by which these sequences are specifically removed and the functional sequences, the exons, are joined together (Izquierdo et al., 2006). Pre-mRNAs may be spliced in several different ways, allowing a single gene to encode multiple proteins, a process called alternative splicing. RNA splicing is a significant part of pre-mRNA processing and one of the major steps in the control of gene expression in eukaryotes (Hastings et al., 2001).

1.4 Translation

The mRNA is translated into its gene product in the cytoplasm. Translation is mediated by specific interactions with proteins, including the initiation, elongation and termination factors. Translation initiation alone involves at least twelve different initiation factors (Kozak, 2005; Tarun et al., 1997). mRNA translation is stimulated upon binding of poly (A) binding protein (PABP) to the poly (A) tail. PABP interacts with the initiation factor eIF4G, which in turn associates with the cap-binding protein eIF4E at the 5’ cap of the mRNA. This synergism between the 5’ cap and the poly (A) tail leads to circularization of the mRNA, which stimulates the recruitment of the ribosomal machinery and allows translation to take place (Figure 1.2) (Kozak, 2005; Sachs et al., 1997; Tarun et al., 1997). Circularization of mRNA has been shown in vitro to promote translation and to stabilize the mRNA by protecting it from deadenylating and decapping enzymes (Preiss et al., 2003).



Figure 1.2: The ribosome-recycling concept. mRNA circularization could facilitate direct recycling of ribosomes or ribosomal subunits, after termination at the stop codon, back to the 5’ region of the same mRNA. This model is supported by the observation of circular polyribosomes, as well as by interactions linking PABP, the initiation factors eIF4G and eIF4E bound to the cap structure, and the translation termination factor eRF3 (Preiss et al., 2003).



1.5 mRNA Decay

1.5.1 mRNA half-life

Messenger RNA degradation plays an important role in the regulation of gene expression. The amount of mRNA available at any particular stage reflects the balance between its synthesis, processing and degradation. Each mRNA has a distinct half-life that is related to its general level of expression and closely correlated with the function of its protein product. For example, although the average half-life of eukaryotic mRNAs is 3 to 5 hours (Jacobson et al., 1996; Ross, 1995), regulatory proteins whose levels change rapidly in cells, such as growth factors, are encoded by mRNAs with a half-life of several minutes (Chkheidze et al., 1999). In contrast, the most stable mRNAs, such as those encoding β globin and the glycolytic enzyme glyceraldehyde-3-phosphate dehydrogenase (GAPDH), have a half-life of a day or more (Ross et al., 1985). mRNA half-lives can fluctuate in response to several stimuli, including environmental factors, growth factors, hormones and second messengers released by signaling cascades (Staton et al., 2000). For example, hypoxia increases the half-life of vascular endothelial growth factor (VEGF) mRNA from 43 min to ~106 min (Levy et al., 1996), epidermal growth factor (EGF) increases the half-life of the mRNA encoding its receptor, the epidermal growth factor receptor (EGFR) (Balmer et al., 2001), and dihydrotestosterone (DHT) regulates the stability of androgen receptor mRNA (Yeap et al., 1999). Furthermore, the mitogen 12-O-tetradecanoylphorbol-13-acetate (TPA) can have differential effects on mRNA fate, either stabilizing it or promoting its degradation (Guhaniyogi et al., 2001).
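The VEGF example can be put in quantitative terms if mRNA decay is treated as a first-order (exponential) process, so that the fraction remaining after time t is exp(-kt), with decay constant k = ln 2 / t½. This is a simplifying assumption made here for illustration only: decay that begins with deadenylation can deviate from a single exponential. The helper names below are hypothetical; only the two half-lives are taken from the text.

```python
import math


def decay_constant(half_life):
    """First-order decay constant k = ln 2 / t_half (same time units as t_half)."""
    return math.log(2) / half_life


def fraction_remaining(t, half_life):
    """Fraction of an mRNA population left after time t, assuming no new synthesis."""
    return math.exp(-decay_constant(half_life) * t)


# VEGF mRNA half-lives quoted in the text (Levy et al., 1996), in minutes.
NORMOXIA_T_HALF = 43.0
HYPOXIA_T_HALF = 106.0

for label, t_half in (("normoxia", NORMOXIA_T_HALF), ("hypoxia", HYPOXIA_T_HALF)):
    print(f"{label}: {fraction_remaining(60, t_half):.2f} remaining after 60 min")
```

Under this model, roughly 38% of the VEGF mRNA present at time zero remains after one hour under normoxia, compared with roughly 68% under hypoxia. If the synthesis rate were unchanged, the steady-state mRNA level (synthesis rate divided by k) would rise by the ratio of the half-lives, 106/43 or about 2.5-fold. This is an idealised calculation, not a measured result.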

1.5.2 Eukaryotic mRNA Decay Pathways

Degradation of mRNA can be achieved by a number of pathways, and decay is initiated by different cis-elements within the mRNA. A model for mRNA decay in both yeast and mammals proposes that the initial step is the removal of most or all of the poly (A) tail by 3’-5’ exonucleases (Hollams et al., 2002). The remainder of the mRNA can then be degraded through two major pathways. The first involves removal of the 5’ cap, followed by 5’ to 3’ exonucleolytic degradation from the 5’ end. The second also follows deadenylation but involves 3’-5’ exonucleolytic degradation of the rest of the RNA and does not require removal of the 5’ cap (Newbury, 2006).

The critical exoribonuclease in the 5’ to 3’ degradation pathway in yeast is Xrn1p, which hydrolyses the mRNA from the 5’ end, releasing single nucleotides. A related enzyme, Rat1p, has also been identified in other eukaryotic cells (Newbury, 2006). The main exoribonuclease activity in the 3’ to 5’ pathway is provided by the exosome, a large multiprotein complex containing several exoribonucleases (Newbury, 2006).

For the exoribonucleases to access the mRNA, it must first be deadenylated, decapped or endonucleolytically cleaved (Figure 1.3) (Newbury, 2006). One deadenylase that has been purified and cloned from vertebrates is PARN (poly (A)-specific ribonuclease) (Meyer et al., 2004). Deadenylation by PARN is initiated by recognition of the m7-guanosine cap on the RNA, and the cap binding protein eIF4E prevents deadenylation in vitro by competing for the same binding site (Gao et al., 2000). Decapping of the mRNA is achieved by the decapping proteins Dcp1p and Dcp2p (Decker et al., 2002) (Figure 1.3). It is not fully understood how these proteins decap the mRNA, but it appears that Dcp2p carries out the cleavage in a process initiated by Dcp1p (Coller et al., 2004). Finally, access to the mRNA can also be gained through endonucleolytic cleavage: endonucleases have been identified that are targeted to premature stop codons, leading to cleavage of the mRNA. Both the cap structure and the poly (A) tail therefore play critical roles in protecting the mRNA from degradation, and their removal affects the rate of mRNA decay (Hilleren et al., 1999; Palacios et al., 2004).

Although almost all mRNAs contain these elements, the half-lives of different mRNAs still vary greatly, from several minutes to several days. This implies that specific elements, distinct to some mRNAs, also contribute to their degradation or stability (Hollams et al., 2002).



Figure 1.3: Eukaryotic mRNA degradation pathways. The mRNA is first deadenylated, decapped or endonucleolytically cleaved. Exoribonucleases can then access the mRNA and direct its degradation, either by Xrn1p or by the exosome (Newbury, 2006).



1.5.3 mRNA Stability

In addition to the cap structure and the poly (A) tail, mRNA stability can also be determined by specific cis-acting sequences, known as “instability sequences”, in the 5’ untranslated region (5’UTR), coding sequence and/or 3’ untranslated region (3’UTR) (Figure 1.4).

Figure 1.4: Schematic representation of different features of the mRNA. Instability elements are located in the 5’UTR, coding sequence and 3’UTR.

For example, instability sequences have been identified in the 5’UTR of interleukin mRNA and in the coding region of c-fos and c-myc mRNAs (Chen et al., 1998a; Wisdom et al., 1991). There are also 3’UTR instability sequences in growth factor, cytokine and lymphokine messages (Wilson et al., 1999a). These sequences render the mRNA unstable, leading to its degradation. Their mode of action is thought to involve RNA-binding proteins that directly or indirectly affect the activity of ribonucleases.

1.5.4 Adenosine Uridine (AU)-rich elements (AREs)

Other determinants of mRNA decay are cis-acting elements located most often in the 3’UTR. In mammalian mRNA the best characterized instability element is the AU-rich element (ARE), which is generally found in the 3’UTR of a number of cytokine and proto-oncogene mRNAs (Bakheet et al., 2001; Chen et al., 1995). Shaw and Kamen demonstrated that AREs destabilise mRNA by showing that insertion of a 51 nucleotide AU-rich sequence from the 3’UTR of human granulocyte-macrophage colony stimulating factor (GM-CSF) mRNA into the 3’UTR of the relatively stable β-globin mRNA targeted the β-globin mRNA for degradation (Shaw et al., 1986). Computational analysis of 3’UTRs has shown that almost 8% of human mRNAs contain AREs, suggesting that AREs may be responsible for the degradation of a large number of unstable mRNAs (Bakheet et al., 2001; Bakheet et al., 2003).


AREs are generally 50 to 150 bases long and have been classified into three groups based on their sequence features and decay characteristics (Chen et al., 1995; Wilson et al., 1999b). Class I AREs contain one to three copies of the pentanucleotide AUUUA within a U-rich region, as found in c-fos and c-myc mRNAs. Decay of mRNAs with class I AREs begins with synchronous deadenylation, producing mRNAs with poly (A) tails of 30 to 60 nucleotides. Class II AREs are found only in cytokine mRNAs and consist of multiple overlapping copies of the nonamer UUAUUUA(U/A)(U/A) (Bakheet et al., 2001; Bakheet et al., 2003). Their deadenylation is asynchronous, giving rise to fully deadenylated intermediates. In contrast, class III AREs lack the classical AUUUA motif but contain U-rich sequences (Chen et al., 1994; Peng et al., 1996; Xu et al., 2001). Deadenylation of this class takes place in the same manner as for class I. To date, no true consensus sequence has been identified for any of the ARE classes, and the classification is not based on biological function (Barreau et al., 2006).
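Because the three ARE classes are defined largely by sequence features, the rules above can be expressed as a simple, deliberately crude classifier. The Python sketch below is a hypothetical illustration, not a published method. The U-richness threshold of 50% and the requirement of at least two nonamer copies for class II are assumptions, and, as noted above, no true consensus exists and decay kinetics are also part of the definition, so real AREs will not always fit these rules.

```python
import re

# Motifs named in the text; the thresholds below are illustrative assumptions.
PENTAMER = re.compile(r"(?=AUUUA)")             # AUUUA pentanucleotide
NONAMER = re.compile(r"(?=UUAUUUA[UA][UA])")    # class II nonamer UUAUUUA(U/A)(U/A)


def u_fraction(seq):
    """Fraction of U residues; a crude stand-in for 'U-rich'."""
    return seq.count("U") / len(seq) if seq else 0.0


def classify_are(seq, u_rich_threshold=0.5):
    """Toy, sequence-only assignment to the three ARE classes (hypothetical helper)."""
    seq = seq.upper().replace("T", "U")          # accept DNA input too
    n_pent = len(PENTAMER.findall(seq))          # lookahead counts overlapping copies
    n_non = len(NONAMER.findall(seq))
    u_rich = u_fraction(seq) >= u_rich_threshold
    if n_non >= 2:
        return "class II-like"                   # multiple (overlapping) nonamers
    if 1 <= n_pent <= 3 and u_rich:
        return "class I-like"                    # 1-3 AUUUA copies in a U-rich region
    if n_pent == 0 and u_rich:
        return "class III-like"                  # U-rich, no AUUUA
    return "unclassified"
```

For example, `classify_are("UUAUUUAUUUAUUUAUU")` finds three overlapping nonamers and returns "class II-like", whereas a U-rich sequence with no AUUUA, such as `"UUUUGUUUUCUUUUGUUU"`, returns "class III-like".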

AREs function through recognition by specific RNA-binding proteins (“trans-acting factors”). A whole range of ARE-associated proteins has been found, but the physiological significance of these RNA-protein interactions and their precise role in ARE-mediated mRNA decay remain to be elucidated. AUF1, a 37-45 kDa protein also known as ARE/poly (U) binding degradation factor or hnRNP D, binds AREs and is thought to recruit nucleases, leading to degradation of the message (Chen et al., 2001). In contrast, HuR, a 32 kDa member of the embryonic lethal abnormal vision (ELAV) family of RNA-binding proteins, has been shown to bind specifically to the ARE sequences of labile cytokine and growth factor mRNAs, leading to their stabilization (Zhao et al., 2000). HuR is believed to compete with AUF1 for the same binding site, thus preventing the recruitment of nucleases and degradation of the message. Although AUF1 generally enhances degradation, it has also been reported to increase the stability of several mRNAs, and this effect may be cell type specific (Zhao et al., 2000).

Despite extensive research, little is known about the mechanism by which ARE-binding proteins bring about mRNA degradation. However, in vitro studies have revealed that degradation is carried out in the 3’ to 5’ direction by the mammalian exosome (Chen et al., 2001; Mukherjee et al., 2002; Wang et al., 2001b). It has been suggested that AREs interact specifically with the exosome via ARE-binding proteins, causing shortening of the poly (A) tail followed by immediate decay of the mRNA body (Gherzi et al., 2004). One such protein is KSRP (K homology splicing regulatory protein), which has been shown to interact with the exosome both in vitro and in vivo, leading to mRNA degradation via the ARE-mediated decay pathway (Figure 1.3) (Chen et al., 2001; Shim et al., 2002).

1.5.5 Non-ARE cis-elements and their binding proteins

In addition to the classical AREs, non-classical destabilizing elements have been identified in several genes. Examples include the stem-loop destabilizing element (SLDE) in the 3’UTR of GM-CSF mRNA and a recently identified cis-element that influences TNFα mRNA degradation through an ARE-independent pathway (Brown et al., 1996). Another distinct destabilizing element contains a repeated GUUUG motif within a non-AUUUA, AU-containing sequence of the c-jun 3’UTR (Peng et al., 1996).

A commonly found non-ARE element in the 3’UTR of multiple mRNAs is the poly (C) motif. This C-rich element has the consensus sequence CCUCC or CCCUCCC and is targeted by poly (C) binding proteins such as the αCP and hnRNP K proteins (Wang et al., 1995). Binding of poly (C) binding proteins to poly (C) motifs in the 3’UTR has been shown to affect both mRNA stability and translation rate (Kong et al., 2003; Makeyev et al., 2002). The C-rich element is found in several mRNAs, including those of tyrosine hydroxylase, erythropoietin, lipoxygenase and α globin, and in the RNAs of several viruses (Makeyev et al., 2002). αCP has been identified as a component of a ribonucleoprotein complex that assembles on the poly (C)-rich regions of these mRNAs, altering mRNA stability or regulating the translation rate (Makeyev et al., 2002). These interactions have been shown to play an important role in many biological processes. A better understanding of the nature of these interactions will provide valuable insight into how they affect post-transcriptional gene regulation.
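The C-rich consensus motifs listed above can be located in a sequence by simple pattern matching. The sketch below, built around a hypothetical function `find_poly_c_motifs`, reports every occurrence of CCUCC and CCCUCCC, including overlapping ones. It is only a first-pass scan, assuming exact matches to the consensus are of interest; a match does not by itself show that a poly (C) binding protein will bind, which must be established experimentally.

```python
import re

# Consensus C-rich motifs quoted in the text (Wang et al., 1995).
MOTIFS = ("CCUCC", "CCCUCCC")


def find_poly_c_motifs(seq):
    """Return sorted (0-based position, motif) pairs, overlapping hits included."""
    seq = seq.upper().replace("T", "U")  # accept DNA input too
    hits = []
    for motif in MOTIFS:
        # The lookahead lets overlapping occurrences be reported separately.
        for m in re.finditer("(?=" + motif + ")", seq):
            hits.append((m.start(), motif))
    return sorted(hits)
```

Because CCCUCCC contains CCUCC, every hit of the longer motif is accompanied by a hit of the shorter motif one position further along: `find_poly_c_motifs("AACCCUCCCAA")` returns `[(2, 'CCCUCCC'), (3, 'CCUCC')]`.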



1.6 Role of RNA-protein interactions

RNA-protein interactions play an essential role in the regulation of gene expression, and disruption of RNA-protein complexes has in many instances resulted in disease (Faustino et al., 2003). RNA-protein interactions are mediated by one or more protein domains, through the recognition of either specific sequences or higher order RNA structures (Kumar et al., 1990; Messias et al., 2004). Primary sequence analysis of RNA-binding proteins has led to the identification of a number of RNA-binding domains. These include the RNA recognition motif (RRM, also called the ribonucleoprotein or RNP domain), the K homology (KH) domain and the double-stranded RNA-binding domain (dsRBD). Understanding these RNA-protein complexes at a molecular level is a major aim in structural biology. In particular, the goal is to provide insight into RNA recognition and the formation of multiple protein/RNA complexes that underlie key cellular processes, from post-transcriptional regulation to protein synthesis (Cusack, 1999; Messias et al., 2004).

The structures of several RNA-binding protein motifs have been determined in the absence of an RNA ligand. Such structures provide critical information on potential RNA-binding surfaces and also reveal flexible regions that may change shape upon RNA binding. A number of well-known RNA-binding motifs occur commonly (Keenan et al., 1998; Liu et al., 1997), and some of these are described below.

1.6.1 RNA-Binding motifs

RNA-binding motifs, often present in tandem along the amino acid sequence of a protein, mediate RNA interaction and are involved in both transcriptional and post-transcriptional regulation of gene expression. The four most common RNA-binding motifs or domains are the RNA recognition motif (RRM domain, also called the RNP or ribonucleoprotein motif), the arginine-rich motif (ARM), the double-stranded RNA-binding domain (dsRBD) and the KH (hnRNP K homology) domain.

1.6.2 The RRM (RNA Recognition Motif) motif

The RNA recognition motif (RRM) is the most widely found and the best studied RNA-binding motif (Maris et al., 2005). It is also referred to as the ribonucleoprotein motif (RNP), RNP consensus sequence or RNA-binding domain. The RRM is involved in different aspects of gene regulation. For example, PABP contains four RRM domains and binds to the poly (A) tails of eukaryotic mRNAs in the cytoplasm; this interaction is essential not only for the stability of the mRNA but also, through a number of molecular interactions, for the initiation of translation (Messias et al., 2004). The RNP motif is found in a variety of other RNA-binding proteins, including the Sex-lethal protein, HuR and U1A, a U1 small nuclear ribonucleoprotein-specific protein (Maris et al., 2005).

Several structures of the isolated RRM domain from various proteins have been determined. The RRM typically comprises four β-strands and two α-helices arranged in a βαββαβ topology (Burd et al., 1994; Park et al., 2000). The RRM fold is composed of one four-stranded antiparallel β-sheet, with the strands arranged in the order β4β1β3β2 from left to right when facing the sheet, and two α-helices (α1 and α2) packed against the β-sheet (Figure 1.5). The hydrophobic core of the domain contains most of the conserved residues, except for four conserved residues involved in RNA binding, which lie within two sequences known as RNP1 and RNP2 (Figure 1.6). These sequences are located on the two central strands of the β-sheet, β3 and β1 respectively, and are highly conserved among RRM domains (Park et al., 2000; Wang et al., 2001a).

Figure 1.5: Cartoon representation of the NMR structure of the RRM domain of the SF2 protein (PDB code 1X4A), generated using VMD. SF2 is an RNA-binding protein with an activity important for pre-mRNA splicing in vitro (Cáceres et al., 1993). SF2 contains several RNA-binding motifs, including an RRM domain. The NMR structure of the SF2 RRM domain shows the motif consisting of four β-strands (magenta) and two α-helices (green) arranged in the topology βαββαβ. The structure is yet to be published.



Figure 1.6: Sequence alignment of a selection of RRM domains for which the structure has been solved (PDB codes are indicated in brackets). The alignment was generated using the programs ClustalX and ESPript. The conserved RNP1 and RNP2 sequences are highlighted in blue boxes.

Since the first structure of an RRM in complex with RNA, that of U1A, was determined (Maris et al., 2005), several other complex structures have been solved by either NMR or X-ray crystallography (Figure 1.7). Analysis of the RRM-RNA interface in these complexes has revealed a common interaction code involving four conserved protein side chains, located in RNP1 and RNP2, and two nucleotides. The two bases of the dinucleotide are stacked on conserved aromatic rings, the two sugar moieties contact a hydrophobic side chain, and a positively charged side chain neutralizes the phosphodiester group. This small set of RRM-nucleic acid interactions illustrates how well the RRM is adapted to binding single-stranded nucleic acids of any sequence (Maris et al., 2005).



Figure 1.7: Structure of the complex formed by HuD RRM domains 1 and 2 with c-fos-11 RNA (PDB code: 1FXL) (Sungmin et al., 2003), generated using VMD. Hu proteins bind to AREs in the 3'UTRs of many short-lived mRNAs, thereby stabilizing them. In this cartoon diagram of the HuD1,2 c-fos-11 complex, the RNA is shown as a pink ribbon and the RRM domains are represented by green sheets and yellow helices. The RRM fold is composed of one four-stranded antiparallel β-sheet, with the strands arranged in the order β4β1β3β2 from left to right when facing the sheet, and two α-helices (α1 and α2) packed against the β-sheet. The hydrophobic core of the domain contains most of the conserved residues, except for four conserved residues that are involved in RNA binding. These residues lie within the sequences known as RNP1 and RNP2, which are located on the central strands of the β-sheet, β3 and β1.



1.6.3 The Arginine rich motif (ARM)

Another motif that mediates RNA interactions in viral, bacteriophage and ribosomal proteins is the arginine-rich motif (ARM), which consists of short arginine-rich sequences. Notable examples of RNA-binding proteins containing an ARM are Rev and Tat. Rev is a regulatory RNA-binding protein that mediates the export of unspliced HIV pre-mRNAs from the nucleus, while Tat activates HIV transcription by binding to the TAR RNA element (Calnan et al., 1991). Unbound ARMs are generally unfolded and can adopt a variety of conformations upon RNA binding (Figure 1.8 A, B), often with a concomitant change in RNA structure. In the case of HIV, Tat remains in an extended conformation when bound to HIV TAR RNA but causes a large conformational change in the RNA, inducing stacking between the two helical stems and formation of a U–A:U base triple (Calabro et al., 2005). The arginines in the ARM contribute two distinct types of interaction. First, their positive charge increases non-specific affinity for the RNA phosphate backbone. Second, they form specific hydrogen-bonding networks with the RNA bases (Calnan et al., 1991).

Figure 1.8 (A): NMR structure of the Jembrana disease virus (JDV) Tat arginine-rich motif (ARM) – bovine immunodeficiency virus (BIV) TAR RNA complex (PDB code: 1ZBN) (Calabro et al., 2005). The peptide (tube representation), which is unstructured in the absence of RNA, inserts deeply into the RNA major groove (orange, line representation of the RNA) and adopts a β-ribbon-like conformation, with two antiparallel strands, residues 71–73 (pink) and 77–79 (green), linked by a sharp turn formed by residues 74–76 (blue). The figure was generated using the program VMD. (B) Comparison of Tat ARM domains and TAR RNAs. The HIV-1, BIV and JDV Tat ARM domains are aligned based on homology between the N-terminal activation domains (partially shown, with conserved residues shaded). Analogous residues in the BIV and JDV Tat ARMs are shown in bold.


1.6.4 Double-stranded RNA-binding domain

Another common RNA-binding motif is the double-stranded RNA-binding domain (dsRBD). dsRBDs bind double-stranded RNA but not dsDNA or DNA-RNA hybrids. The dsRBD is found in proteins such as Xenopus rbpa, Drosophila Staufen, RNase III, the protein kinase PKR and the ADAR family of adenosine deaminases, which have diverse functions in transcription, RNA processing, mRNA localization and translation (Kim et al., 2006). Several dsRBD structures reveal that the dsRBD forms a compact protein domain with an α–β–β–β–α topology, in which two α-helices are packed against the same face of a three-stranded antiparallel β-sheet (Figure 1.9A) (Bycroft et al., 1995; Kharrat et al., 1995). Furthermore, the molecular basis for dsRNA recognition has been revealed in several structures of single dsRBD–dsRNA complexes (Ramos et al., 2000; Ryter et al., 1998; Wu et al., 2004). A single dsRBD recognizes two consecutive minor grooves and the intervening major groove on one face of the RNA helix. The first α-helix (α1) of the dsRBD interacts with a minor groove of the RNA helix or of a UUCG or AGNN tetraloop. The loop between β1 and β2 interacts with the successive minor groove, and the β3–α2 loop interacts with the intervening major groove (Figure 1.9B). These contacts mainly involve the 2′-hydroxyl groups of the ribose sugars, providing some insight into the preference of dsRBDs for RNA rather than DNA (Kim et al., 2006).


Figure 1.9: Solution structure of the Rnt1p dsRBD complexed to the 5' terminal hairpin of one of its small nucleolar RNA substrates, the snR47 precursor (PDB code: 1T4L). Rnt1p, a member of the RNase III family of dsRNA endonucleases, is a key component of the Saccharomyces cerevisiae RNA-processing machinery (Wu et al., 2004). The Rnt1p dsRBD has been implicated in targeting this endonuclease to its RNA substrates by recognizing hairpins closed by AGNN tetraloops. (A) Cartoon representation of the Rnt1p dsRBD, which forms a compact protein domain with an α–β–β–β–α topology in which two α-helices (red) are packed against the same face of a three-stranded antiparallel β-sheet (gold). (B) Cartoon representation of the Rnt1p dsRBD complexed to the 5’ terminal hairpin of the snR47 precursor. The RNA is shown as blue and cyan ribbons. The dsRBD contacts the RNA at successive minor, major and tetraloop minor grooves on one face of the helix. α-helix 1 is positioned in the minor groove of the RNA tetraloop, the loop between β1 and β2 also interacts with a minor groove, and the β3–α2 loop interacts with the intervening major groove. The figure was generated using the program VMD.


1.6.5 KH Motif

The K homology (KH) domain is another type of eukaryotic RNA-binding domain. The KH motif is found in many proteins that associate closely with RNA (Musco et al., 1996). It was originally detected in hnRNP K (Dejgaard et al., 1994; Matunis et al., 1992) and subsequently identified in a variety of nucleic acid-binding proteins from eukaryotes, eubacteria and archaea (Dejgaard et al., 1994; Gibson et al., 1993; Siomi et al., 1993). KH domain proteins whose biological functions and RNA targets have been identified include the Nova proteins, associated with the regulation of pre-mRNA splicing; zipcode-binding protein 1, implicated in mRNA subcellular localization; the fragile X mental retardation syndrome protein (FMRP), implicated in translational regulation; and the poly (C) binding proteins (also known as αCP and hnRNP E proteins) and hnRNP K, implicated in mRNA stabilization and translation (Kiledjian et al., 1995; Lewis et al., 1999; Musunuru et al., 2004). KH domains are also found in a number of other RNA-binding proteins, including vigilin, the transcription activator FBP and the bacterial NusA protein (Worbs et al., 2001).

The number and arrangement of KH domains vary between proteins, which can contain from 1 to as many as 14 KH domains. The STAR (signal transduction and activation of RNA) family of RNA-binding proteins contains a single KH domain; this family includes the cell signaling protein Sam68 (Lukong et al., 2003). FMRP contains two closely spaced KH domains. The three-KH-domain family includes hnRNP K (Siomi et al., 1993), αCP1 (also known as PCBP1 or hnRNP E1), αCP2, αCP3 and αCP4 (Leffers et al., 1995), as well as Nova-1 and Nova-2, the latter being involved in neuronal RNA metabolism (Buckanovich et al., 1997). In this group of proteins, the first two KH domains are located close together at the N-terminus and are linked by a variable segment to the third domain at the C-terminus (Figure 1.10) (Makeyev et al., 2000). The transcription activator FBP contains four regularly spaced KH domains. The final group, which contains 14 closely arranged KH domains, includes the lipoprotein-binding protein vigilin (Duncan et al., 1994; McKnight et al., 1992; Musunuru et al., 2004).



Figure 1.10: Schematic representation of the αCP1 and KH domain arrangement. Similar domain arrangements are observed in other members of the PCBP family. The numbers represent the cloned amino acid boundaries for the KH domains in this thesis. This naming system for the full-length protein and the KH domains will be used throughout the rest of the thesis.

The focus of the current study is the KH domain. This domain will be further discussed

below especially in the context of the αCP proteins.

1.7 αCP proteins

The α-globin poly (C) binding proteins (αCPs, also known as hnRNP E or PCBP) belong to an abundant and widely expressed family of RNA-binding proteins. αCP and hnRNP K are the two major poly (C) binding proteins in the cell. The conservation of αCPs across species, their abundant expression and their presence in a wide range of tissues suggest that they play roles in important cellular functions (Kong et al., 2003).

There are five major αCP isoforms in human and mouse tissues: αCP1, αCP2, αCP3, αCP4 and αCP2-KL, a splice variant of αCP2 (Chkheidze et al., 2003). The highest degree of homology is between αCP1 and αCP2, which share 89% amino acid sequence identity (Tommerup et al., 1996). αCP3 shows significant sequence divergence, while αCP4 has the most divergent amino acid sequence (Makeyev et al., 2000). Some αCP isoforms, including αCP1, αCP2 and αCP2-KL, are present in both the nucleus and the cytoplasm, while αCP3 and αCP4 are found only in the cytoplasm. Both αCP1 and αCP2 possess a nuclear localization signal (NLS I) located between the KH2 and KH3 domains, which is hypothesized to contribute to shuttling of these proteins between the nucleus and the cytoplasm (Chkheidze et al., 2003). Furthermore, αCP2 contains a second NLS (NLS II) within its KH3 domain, which together with NLS I is crucial for its nuclear accumulation; this suggests a possible link between RNA binding and subcellular transport. In one proposed mechanism, binding of αCP2 to its RNA target through KH3 in the nucleus masks NLS II and thereby promotes translocation to the cytoplasm, where the αCP2–RNA complex carries out its cytoplasmic function. Upon dissociation of the complex, NLS II would be exposed again, directing the protein back to the nucleus (Chkheidze et al., 2003).

Each αCP isoform contains three KH domains. The KH domain is present in a number of RNA-binding proteins and can interact with four to five contiguous bases in a target RNA. The interaction with RNA may be influenced by post-translational modification: phosphorylation of αCP1 and αCP2 has been shown to greatly reduce their RNA-binding activity (Makeyev et al., 2000). Isolated KH domains can bind RNA sequences independently, but in proteins containing several KH domains the role of each domain in interacting with the target RNA is not yet fully understood.

The target site of αCPs is usually located in the 3’UTR of the mRNA. This site is a single-stranded C-rich motif, and binding of αCPs to it has been correlated with a number of transcriptional and post-transcriptional regulatory processes, including apoptotic and developmental processes (Du et al., 2004). αCPs have been shown to bind C-rich patches in several mRNAs, including those encoding α-globin (Kiledjian et al., 1995), collagen-α1 (Lindquist et al., 2000), tyrosine hydroxylase and erythropoietin, affecting mRNA stability and/or translation (Czyzyk-Krzeska et al., 1999; Paulding et al., 1999).

The role of αCP in mRNA stabilization was first identified using α2-globin mRNA. The stability of globin mRNA is dictated by the formation of a complex at a C-rich sequence in the 3’UTR. This α complex consists of a number of proteins, including αCP1 and αCP2 (Ji et al., 2003). It is widely accepted that formation of the α complex is a general determinant of high-level mRNA stability (Holcik et al., 1997), and several lines of experimental evidence support this hypothesis. Reduced mRNA levels were observed in vivo and in vitro when mutations in the 3’UTR abolished the ability of αCP to form the α complex. Further studies suggest that the α complex protects the poly (A) tail from rapid degradation (Holcik et al., 1997). A specific endoribonuclease that cleaves within the C-rich region where αCP binds has also been identified, and it has been suggested that αCP blocks this endoribonuclease site and hence protects the mRNA from degradation (Ji et al., 2003).

αCP not only influences mRNA stability but also plays a role in translational control (Waggoner et al., 2003a; Waggoner et al., 2003b). Binding of αCP and hnRNP K to the differentiation control element (DICE) in the 3’UTR of the lipoxygenase (LOX) mRNA appears to keep the mRNA in a translationally silent state until the later stages of erythroid differentiation (Ostareck et al., 2001; Ostareck et al., 1997). Interestingly, αCP is also implicated in translational enhancement: it binds the 5’UTR cloverleaf structure and the stem-loop element of picornavirus RNA, influencing the efficiency of cap-independent translation (Parsley et al., 1997). In contrast to the 3’UTR binding sites of αCP in other mRNAs, these two viral C-rich sites lie within structured RNA elements. Further studies have implicated αCP in the function of a number of other viral RNAs (Blyn et al., 1997; Graff et al., 1998). In addition, αCP has recently been identified as a novel renin mRNA-binding protein that targets a cis-element in the 3’UTR and regulates renin production (Adams et al., 2003).

Furthermore, αCPs have been associated with the translational recruitment of inactive mRNAs during early development of the Xenopus embryo, through regulated extension of the poly (A) tail (Paillard et al., 2000). It is therefore evident that αCPs are involved in a number of post-transcriptional regulatory pathways directly affecting mRNA stability, modification and expression. The interactions of αCP proteins with target mRNAs appear to be sequence-specific, but the underlying mechanism of these interactions has only been partially elucidated.



The interactions of αCPs are not limited to RNA; αCPs can also bind single-stranded DNA (ssDNA), and such interactions have been shown to have a regulatory function in transcription. A recent study showed that αCP binding to a polypyrimidine region in the proximal promoter of the mouse μ-opioid receptor (MOR) gene leads to transcriptional activation (Kim et al., 2005). Specific binding of the closely related hnRNP K to a single-stranded pyrimidine sequence in the promoter region activates transcription of the human c-myc gene (Michelotti et al., 1996). In addition, Du et al. (2005) have shown high-affinity binding of hnRNP K and αCP to a C-rich stretch of human telomeric DNA. The functional importance of these interactions is not well understood; however, studies carried out to date imply that proteins of the PCBP family may participate in the regulation of telomere length and telomerase activity (Bandiera et al., 2003; Du et al., 2005; Lacroix et al., 2000).

αCP and hnRNP K are structurally similar. Primary sequence analysis shows that corresponding KH domains of the two proteins are more closely related to each other than to the other KH domains within the same protein. The conservation of KH domain number, sequence and organization, together with their shared binding to the DICE element of the lipoxygenase (LOX) mRNA, suggests that these proteins have a similar mode of action. However, the specific roles of αCPs in translational enhancement and in the stabilization of particular mRNAs suggest that the two proteins have distinct RNA-binding specificities (Makeyev et al., 2000; Thisted et al., 2001).

Although hnRNP K and αCP are the major poly (C) binding proteins in the cell, they have quite distinct optimal binding sites. αCP binds to the 3’UTR of α-globin mRNA, leading to its stabilization, whereas hnRNP K cannot form the α complex at this site (Chkheidze et al., 1999). The binding of αCPs is not simply a matter of recognizing poly (C) sequences: αCP2-KL binds the 3’UTR of globin mRNA with higher affinity than it binds poly (C) homoribopolymers (Wang et al., 1995). In addition, mutations outside the binding site have been shown experimentally to affect formation of the α complex, suggesting that not only the sequence but also structural motifs in the RNA influence binding preference and affinity (Wang et al., 1995).



1.7.1 Protein-Protein interaction

αCP proteins are also involved in protein–protein interactions. αCP2 can form homodimers, and in yeast two-hybrid experiments αCP2 was observed to interact with a number of proteins, including hnRNP L, hnRNP K and hnRNP I (Kim et al., 2000). The N-terminal half of αCP2, including the two exons following the second KH domain, is required for both homodimerization and interaction with these proteins (Makeyev et al., 2000). Further studies of these interactions are required to determine their functional significance.

The closely related hnRNP K protein can also dimerise and oligomerise with a number of proteins. It interacts with several signal transduction proteins, including Src family tyrosine kinases, the proto-oncogene product Vav (Bustelo et al., 1995) and protein kinase C (Schullery et al., 1999), suggesting a role for hnRNP K in cell signaling. Its interactions with the TATA-binding protein and with transcriptional repressors suggest a role in transcriptional regulation (Michelotti et al., 1996). In addition, hnRNP K and αCP2 can interact with each other and share a number of common binding partners, including Y-box binding protein, splicing factor 9G8 and hnRNP L (Kim et al., 2000; Shnyreva et al., 2000). The basis of these interactions is not well understood; however, the interactions of cell signaling proteins and transcriptional repressors with hnRNP K appear to be mediated by a proline-rich domain denoted KI. In contrast, interaction with αCP2 is mediated through the N-terminal KH domain of hnRNP K (Makeyev et al., 2002). More recently, hnRNP K has also been shown to interact with the neuron-specific RNA-binding protein HuB through its RGG box, which is located between the second and third KH domains (Yano et al., 2005).

In addition, the isolated third KH domain of the Nova-1 protein homodimerises in solution, in the absence of RNA and without the involvement of other parts of the full-length protein (Ramos et al., 2002). The residues at the dimer interface are conserved between KH1 and KH3 of Nova-1; as a result, protein interactions in vivo may occur through both KH1 and KH3, which could cooperatively increase the dimerisation affinity (Ramos et al., 2002). Such homo- and heterodimerisation of KH domains may prove very important to the functioning of αCP proteins. This mechanism is not novel, as it has been described for DNA-binding proteins (Ramos et al., 2003).

1.7.2 αCP KH motif-synergy

The presence of multiple KH domains in αCP proteins raises the question of which domain, or combination of domains, is responsible for binding. In vitro experiments show that single KH domains can bind, but whether each KH domain participates in binding within the full-length protein is not yet known (Makeyev et al., 2002). Previous studies of the closely related hnRNP K showed minimal binding of the first and/or second KH domains to a poly (CT) DNA sequence, while the isolated third KH domain bound with lower affinity than the protein as a whole (Braddock et al., 2002a). Ito et al. (1994), on the other hand, reported that hnRNP K binds dC-rich oligonucleotides through its KH3 domain. In contrast, Siomi et al. (1994) proposed that all three KH domains are capable of binding under stringent conditions (1 M salt). Moreover, SELEX studies revealed that the RNA target selected by hnRNP K consisted of a single 6-7 nt C-rich box, suggesting that only one of the three KH domains participates in RNA binding (Makeyev et al., 2002). More recent studies have shown cooperative binding of hnRNP K KH domains to mRNA targets (Paziewska et al., 2004). Using the yeast three-hybrid system, Paziewska et al. (2005) showed that the three KH domains bind synergistically and that a single KH domain binds RNA only weakly compared with full-length hnRNP K.

For αCP1 and αCP2, filter binding assays showed that αCP1-KH1 and KH3 bind with high affinity and specificity to a poly (rC) homopolymer, whereas αCP1-KH2 does not (Dejgaard et al., 1996). However, in the SELEX study described above for hnRNP K (Makeyev et al., 2002), the RNA target identified for αCP2-KL contained three C-rich patches, suggesting a three-pronged interaction between this protein and its RNA target.



Figure 1.11: Sequence alignment of the KH domains from the known PCBP (αCP) proteins, PCBP1-4, and hnRNP K. Alignments were carried out using the programs ClustalX and ESPript. The sequence shown for the domains corresponds to residues 11-82 in the full-length protein. Secondary structures were based on the crystal structure of the PCBP1 KH1 (PDB code: 2AXY).

In another study aimed at identifying the roles of the KH domains in αCP binding to the cloverleaf and stem-loop IV structures in the 5’UTR of poliovirus, purified recombinant KH1, KH2 and KH3 domains of both αCP1 and αCP2 were used in binding reactions with radiolabeled RNA probes (Dejgaard et al., 1996). The corresponding domains from αCP1 and αCP2 interacted with the RNA probes in a similar fashion, and the domains behaved as described previously. Although both the KH1 and KH3 domains bind poly (rC) homopolymers, only KH1 was capable of specifically interacting with the poliovirus RNA structures in RNA electrophoretic mobility shift assays (REMSA). In addition, mutation of KH1 within PCBP2 caused the greatest alteration in RNA binding. These data indicate that the KH1 domain is the major RNA-binding determinant for recognition of poliovirus-specific RNA targets by PCBPs. However, the KH2 and KH3 domains must also play an essential part in these interactions, because mutations of the highly conserved Gly-X-X-Gly tetrapeptide motif in these domains, which has been shown to contact the RNA target directly, have a profound effect on binding by the full-length protein (Musco et al., 1997; Musco et al., 1996). It was initially thought that mutations in the KH2 and KH3 domains of full-length PCBP might cause a structural change in the protein or promote its misfolding in E. coli. However, this appears very unlikely, as comparable expression and solubility levels were obtained for the mutant and wild-type recombinant αCP2 proteins. Moreover, the mutation was made in the flexible loop of the KH domain structure, and sequence alterations at this site would not be expected to change the overall structure of the protein. It is not known how KH2 and KH3 stabilize the interaction of αCP2 with the viral RNA, but REMSA data indicate that all three motifs must be linked within a single polypeptide for optimal affinity for the RNA (Silvera et al., 1999).

A more recent study revealed not only differences in the RNA-binding activities of the αCP2 KH domains but also their distinct functions in poliovirus translation and RNA replication (Walter et al., 2002). The integrity of the first KH domain of αCP2 was shown to be absolutely essential for translation initiation on the poliovirus (PV) internal ribosome entry site (IRES) and for replication of PV RNA, consistent with previously published data (Silvera et al., 1999; Walter et al., 2002). In contrast, an intact second KH domain was not required for translation initiation on the PV IRES or for PV RNA replication. An intact third KH domain was found to mediate efficient translation initiation on the PV IRES but was not essential for replication of PV RNA (Walter et al., 2002). Taken together, these studies suggest distinct roles for the KH domains of αCP2 in PV translation and RNA replication.

Based on these observations, it has been suggested that KH domains may collaborate and bind cooperatively within the full-length protein (Makeyev et al., 2002). The spatial arrangement of such multiple interactions awaits a detailed three-dimensional structure of the full-length protein bound to its target RNA. To fully appreciate the contribution of each domain to the function of the whole protein, both structural and binding studies of the intact protein are required.

Despite the absence of a three-dimensional structure of the full-length protein, a number of structures are available for isolated KH domains, both in the absence and in the presence of oligonucleotide (Braddock et al., 2002b; Du et al., 2005; Musco et al., 1996). These structures have provided a wealth of information on the determinants of KH domain oligonucleotide-binding strength and specificity, such as hydrogen-bonding interactions and the insertion of bases into hydrophobic pockets on the protein.

1.7.3 KH Structure

The first structure of a KH domain was solved by Musco and coworkers (Musco et al., 1996), who showed using NMR that the sixth KH domain of vigilin consists of a three-stranded antiparallel β-sheet packed against three α-helices. The KH domain was originally thought to comprise 45-55 amino acid residues, but structural studies subsequently redefined the domain boundaries to encompass 68-72 amino acids. This extension was found to be essential for the structural stability of the domain (Musco et al., 1996).

NMR and crystallographic studies revealed that the original 45-amino-acid motif corresponds to a βααβ unit, and more extensive structural studies led to the identification of two types of KH domain (Grishin, 2001). Type I KH domains contain a C-terminal βα extension (e.g., KH3 of αCP1), whereas type II KH domains contain an N-terminal αβ extension (e.g., ribosomal protein S3). Sequence alignment predicts that the three KH modules of αCPs belong to the type I KH family, with a βααββα topology (Figure 1.12). The structures of several independent KH motifs all consist of a similar three-stranded antiparallel β-sheet packed against three α-helices. A number of conserved hydrophobic residues are interspersed throughout the domain, some of which extend their side chains from α-helix 2 to contact the inner face of the β-sheet, creating a hydrophobic environment for oligonucleotide binding (Jensen et al., 2000). Furthermore, all KH modules contain two distinct loops, known as the GXXG loop and the variable loop, which play a significant role in the recognition and binding specificity of nucleic acids.



Figure 1.12: The crystal structure of αCP2-KH1 (residues 11–82) solved to 1.7 Å resolution, depicted in cartoon form (Du et al., 2005). The structure is shown from the beginning of β-strand 1 to the end of α-helix 3. The GXXG motif is colored green. The ‘variable loop’ region between β-strands 2 and 3 is colored red. These regions line the hydrophobic oligonucleotide-binding cleft that accommodates C-rich RNA and ssDNA. The figure was generated using VMD.

1.7.4 KH and oligonucleotide interaction

A number of physiological oligonucleotide target sites are known for KH domains, and these domains can bind a range of target structures. Some currently recognized oligonucleotide targets range in size from 7 to 75 nucleotides, but each individual domain recognizes only four core bases. The binding affinity of individual KH domains is in the micromolar range, indicative of weak binding, whereas dissociation constants for the full-length proteins range from 10⁻⁶ to 10⁻⁹ M (Backe et al., 2005; Makeyev et al., 2002; Paziewska et al., 2004).
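To make the difference between these affinity ranges concrete, the sketch below assumes the simplest possible case: a single 1:1 binding equilibrium with the protein in large excess over the RNA, so that the fraction of RNA bound is θ = [P]/(K_d + [P]). This is an illustrative model only; it ignores the cooperative, multi-domain binding discussed above, and the helper name and concentrations are hypothetical.

```python
def fraction_bound(protein_conc_m: float, kd_m: float) -> float:
    """Fraction of RNA bound at equilibrium for a simple 1:1 interaction.

    Hypothetical helper. Assumes the protein is in large excess over the RNA,
    so the free protein concentration is approximately the total protein
    concentration. Both arguments are in mol/L.
    """
    if protein_conc_m < 0 or kd_m <= 0:
        raise ValueError("protein concentration must be >= 0 and Kd must be > 0")
    return protein_conc_m / (kd_m + protein_conc_m)


# Illustrative values only: 1 uM protein with a weak, isolated-domain-like Kd
# (1 uM) versus a tight, full-length-protein-like Kd (1 nM).
protein = 1e-6
for label, kd in [("Kd = 1 uM", 1e-6), ("Kd = 1 nM", 1e-9)]:
    print(f"{label}: fraction bound = {fraction_bound(protein, kd):.3f}")
```

Under these assumptions, at 1 µM protein a K_d of 1 µM leaves the RNA half bound, whereas a K_d of 1 nM, at the tight end of the range reported for full-length proteins, gives almost complete binding.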

A number of structures of KH domains in complex with either RNA or ssDNA have been determined using NMR and X-ray crystallography. These include Nova-2-KH3/RNA, hnRNP K-KH3/DNA, αCP2-KH1/DNA (Figures 1.12 and 1.13) and FBP-KH3 and KH4/ssDNA (Braddock et al., 2002b; Du et al., 2005). Comparison of these structures led the authors to several conclusions. Each KH domain recognizes a core motif of four nucleotides: 5’-UCAY-3’ (RNA) for Nova-2-KH3, 5’-ACCC-3’ (DNA) for αCP2-KH1, 5’-TTTT-3’ (DNA) for FBP-KH3, 5’-ATTC-3’ for FBP-KH4 and 5’-T/CCCC-3’ for hnRNP K-KH3. In each of these structures only pyrimidines were found at the first and fourth positions, and these positions were not involved in highly specific interactions. Positions two and three of the core motif were involved in a number of base-specific interactions, and these interactions were mostly conserved. Upon oligonucleotide binding there were no large conformational changes, except in the flexible regions of the molecule. The structures agree on the position of the oligonucleotide-binding cleft: the oligonucleotide lies in a narrow groove between the invariant Gly-X-X-Gly motif and the variable loop, which accommodates pyrimidines more readily than purines owing to their smaller size (Figure 1.13). The nucleotides of the core sequence, together with a number of water molecules, participate in a dense network of hydrogen bonds, hydrophobic interactions and stacking interactions (Backe et al., 2005; Du et al., 2005; Jensen et al., 2000).
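The core motifs above are written with IUPAC ambiguity codes; for example, Y denotes a pyrimidine in 5’-UCAY-3’. As a minimal illustration of what such a motif specifies, the following Python sketch converts an IUPAC motif into a regular expression and lists every, possibly overlapping, match in a sequence. The function names and the example sequence are hypothetical, only a subset of IUPAC codes is handled, and a sequence match says nothing by itself about whether a KH domain actually binds there, since binding also depends on flanking sequence and RNA structure.

```python
import re

# Subset of IUPAC nucleotide ambiguity codes. T and U are treated as
# equivalent so the same motif can be searched in RNA or DNA.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "[TU]", "U": "[TU]",
    "Y": "[CTU]",   # pyrimidine
    "R": "[AG]",    # purine
    "N": "[ACGTU]", # any base
}


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'UCAY' into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_core_sites(sequence: str, motif: str) -> list:
    """Return (0-based start, matched bases) for every occurrence, including overlaps."""
    # A lookahead with a capturing group lets overlapping matches be reported.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical RNA fragment, used only to demonstrate the search.
rna = "GGUCAUAUCACAGUCAC"
print(find_core_sites(rna, "UCAY"))
```

For the example fragment this reports three candidate tetrads, at positions 2 (UCAU), 7 (UCAC) and 13 (UCAC).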

Figure 1.13: Crystal structure of the αCP2 KH1-human telomeric (ht) DNA complex (Du et al., 2005). The KH domain is shown in cartoon representation in tan. The htDNA is shown in stick representation colored by element. Secondary structure elements of the KH domain are labeled. The GXXG motif is colored red. The ‘variable loop’ region between β-strands 2 and 3 is colored pink. These regions line the hydrophobic oligonucleotide-binding cleft that accommodates C-rich RNA or ssDNA. The figure was generated using VMD.



These observations have shed light on the mode of binding of KH domains to their targets. Structural studies of KH domains in the presence of oligonucleotide have revealed a number of critical residues involved in specific interactions. For example, in the hnRNP K-KH3/DNA complex, specific recognition of the DNA tetrad is achieved by hydrogen bonds involving residues Ile29, Ile36, Ile49 and Arg59 (for example, the NH1 and NH2 atoms of Arg59 hydrogen bond to N3 and O2 of Cyt3). Electrostatic interactions with the DNA backbone are made by the backbone amide of Gly32 and the side chains of Lys31, Lys37 and Arg40 (Backe et al., 2005). A number of corresponding residues also contact the DNA in the PCBP2-KH1 complex structure, including Ile29, Lys31, Lys32, Val36, Lys37, Arg40, Ile49 and Arg57 (Du et al., 2005). Based on the structural data on the participation of key residues in oligonucleotide recognition, together with sequence alignments, all KH1 and KH3 domains should be able to bind poly (C) sequences in a similar manner. In addition, the KH2 domain should also be able to specifically recognize at least two cytosines at the second and third positions of the core sequence.

RNA and DNA recognition by KH domains is very similar. For example, a comparison of the complex structures of Nova-2-KH3 with RNA and PCBP2-KH1 with DNA reveals a number of similarities. Although their nucleic acid sequence specificities differ, the overall structures are similar and both adopt a common binding groove (Figure 1.14A, B, C). Moreover, the four core recognition bases adopt similar conformations and occupy the same location in very similar orientations (Du et al., 2005).



Figure 1.14: Analysis of KH domains with their target RNA or DNA. (A) Structure-based sequence alignment of the KH domains of αCP2-KH1, Nova-2-KH3, hnRNP K-KH3 and FBP-KH3 and KH4. Conserved residues are colored, and the GXXG and variable loops contacting the oligonucleotide are indicated. (B) Backbone superposition of the KH domains from the same proteins. (C) The α-carbon deviation of each KH domain residue from the corresponding aligned residue of αCP2-KH1, plotted against amino acid residue number. The Cα root-mean-square (r.m.s.) deviations show that these proteins are very similar except in the variable region. The figures were generated using ESPript and VMD.
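Panel C of Figure 1.14 summarizes structural similarity as per-residue Cα deviations and their root-mean-square value. For N structurally aligned residue pairs with superposed Cα coordinates x_i and y_i, the deviation of residue i is d_i = |x_i − y_i|, and RMSD = sqrt((1/N) Σ d_i²). The Python sketch below computes both quantities, assuming the structures have already been superposed and aligned residue by residue (the superposition step itself, for example a Kabsch fit, is not shown). The coordinates are hypothetical, and this is not the ESPript/VMD procedure used to produce the figure.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def per_residue_deviation(ref: Sequence[Coord], model: Sequence[Coord]) -> List[float]:
    """Distance (in Å) between each pair of aligned Cα atoms.

    Assumes both structures are already superposed and that ref[i] and
    model[i] are the Cα atoms of structurally aligned residues.
    """
    if len(ref) != len(model):
        raise ValueError("structures must have the same number of aligned residues")
    return [math.dist(a, b) for a, b in zip(ref, model)]


def rmsd(ref: Sequence[Coord], model: Sequence[Coord]) -> float:
    """Root-mean-square deviation over all aligned Cα pairs."""
    devs = per_residue_deviation(ref, model)
    if not devs:
        raise ValueError("at least one aligned residue is required")
    return math.sqrt(sum(d * d for d in devs) / len(devs))


# Hypothetical, already-superposed Cα coordinates for four aligned residues;
# the last residue mimics a divergent position such as the variable loop.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 4.0, 0.0)]
print(per_residue_deviation(ref, model))
print(rmsd(ref, model))
```

A single divergent residue dominates the RMSD here, which is why a per-residue plot such as panel C is more informative than one summary number when the differences are confined to a loop.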


1.7.5 KH domain-containing proteins and disease

There is substantial genetic evidence, from various species, supporting a physiological role for the KH domain. For example, in humans, gene lesions that interfere with expression of the KH protein FMRP, encoded by the FMR1 gene, lead to fragile X mental retardation syndrome (Di Fruscio et al., 1998; Pieretti et al., 1991; Verkerk et al., 1991). The clinical significance of the KH domain is illustrated by a point mutation that changes a conserved isoleucine (Ile304) to asparagine in the second KH domain of FMRP (De Boulle et al., 1993). This point mutation alters the structure of the KH domain (Musco et al., 1996) and impairs RNA-binding activity (Siomi et al., 1994). In C. elegans, the cytoplasmic protein GLD-1 is required for germ cell differentiation (Francis et al., 1995a; Francis et al., 1995b; Jones et al., 1996), and alteration of glycine 227 within its KH domain leads to a recessive tumorous germline phenotype (Jones et al., 1995). Interestingly, this conserved glycine forms part of the RNA-binding surface (Musco et al., 1996). Mutation of the corresponding residue also abolishes RNA binding in Sam68, an RNA-binding protein that associates with c-Src in mitosis (Chen et al., 1997; Fumagalli et al., 1994). In mice, oligodendrocyte differentiation and the subsequent formation of myelin require the Quaking gene, which encodes Qk1, a member of the highly conserved STAR/GSG family of RNA-binding proteins. Qk1 has been implicated in the regulation of alternative splicing, stability and translation of mRNAs that encode myelin structural components in glial cells. Mutation of the Quaking gene greatly impairs myelination, and as a consequence the mice develop a rapid tremor at postnatal day 10 (Ryder et al., 2004; Sidman et al., 1964). A missense mutation in the GSG domain of Qk1, which includes the KH domain, is embryonic lethal; this point mutation hinders homodimerization, which may account for the lethality observed in mice (Chen et al., 1998b; Ebersole et al., 1996). The Drosophila protein Bicaudal C (Bic-C) contains five KH domains, and gene lesions that truncate Bic-C, or a point mutation that replaces glycine 295 with arginine in its third KH domain, result in defects in RNA binding and oogenesis (Mahone et al., 1995; Saffman et al., 1998).



As RNA binding and recognition by KH domains are emerging as important factors in human disease, it is important to understand the underlying mechanism of these interactions. I have therefore taken a structural approach to understanding this interaction in a human disease paradigm.

1.8 Androgen receptor and prostate cancer

One disease in which mRNA stability plays a role is prostate cancer, a leading cause of male cancer mortality in Western societies. The prostate gland, approximately the size of a walnut and weighing about 20 grams, is located immediately below the bladder, surrounding the urethra. The normal prostate gland is highly androgen-dependent for growth and morphogenesis (Garnick et al., 1996), and androgen deprivation consequently leads to dramatic regression of the gland. Similarly, prostate cancer relies on androgen action to stimulate its initial development and progression. In the initial stages of prostate cancer, androgen depletion suppresses the proliferation of cancer cells (Figure 1.15), although in later stages the cells become insensitive to androgen (Kati et al., 2006; Koivisto et al., 1997). Although hormonal manipulation is an important step in the treatment of metastatic prostate cancer, androgen responsiveness is transient, and the disease ultimately relapses despite continued androgen blockade (Bubley et al., 1996).

Figure 1.15: Androgen ablation kills prostate cells. Prostate cells, along with prostate cancer cells, require the presence of androgens, so the removal of androgens kills the large majority of prostate cancer cells. Hormone therapy, or androgen ablation, is still common practice today. Hormones released from higher brain centers dictate the release of testosterone from the testes. Removing testosterone or blocking its release leads to atrophy of the prostate and death of the majority of prostate cancer cells. FSH, follicle-stimulating hormone; LH, luteinizing hormone.



The research group led by Professor Leedman has demonstrated that androgen receptor (AR) mRNA stability is a major determinant of AR gene expression in prostate cancer and that androgens regulate AR mRNA (Kati et al., 2006; Yeap et al., 1999). The AR mediates the primary action of androgens in androgen-sensitive tissues and is a member of the nuclear receptor superfamily, which regulates gene expression. Members of this superfamily are characterized by a central DNA-binding domain composed of two highly conserved zinc-finger motifs, which bind specific DNA sequences, or response elements, within target genes and regulate transcriptional activity (Figure 1.16). The carboxy-terminal portion of the receptor functions as the ligand-binding domain, and binding of specific ligands to their cognate receptors modulates transcriptional activation (Bubley et al., 1996).

Figure 1.16: The androgen receptor (AR). The AR is an intracellular hormone receptor present in tissues that respond to androgens. The AR protein has three domains, each with a unique function: the transcription regulation domain, the DNA-binding domain and the ligand-binding domain. The AR promotes the expression of genes that are hormonally controlled by androgens.

Androgens such as testosterone (T) and dihydrotestosterone (DHT) are steroid hormones

with a central role in male sexual differentiation and maintenance of body composition.

Androgens bind AR (Figure 1.17), leading to a cascade of responses including the

proliferation of specific cancer cells, as in prostate cancer.



Figure 1.17: The action of testosterone and androgen receptor (AR). Testosterone is converted to dihydrotestosterone (DHT). Upon binding DHT, the AR is activated and dimerizes; the dimer then enters the nucleus and binds to the androgen response element, leading to gene expression.

1.8.1 AR mRNA stability and RNA binding proteins

Studies conducted by the Leedman lab (Yeap et al., 2002) have shown that the proximal 3’UTR of AR mRNA contains a UC-rich region that acts as a cis element. Using luciferase (Luc) reporter transfection assays in LNCaP prostate cancer cells, they found that this region plays a role in AR mRNA turnover. They examined the change in basal Luc activity induced by the UC-rich region and showed that the presence of the AR UC-rich sequence reduced reporter activity by 30%. Because the AR UC-rich element was capable of regulating the Luc reporter, they also investigated whether this region was a target for RNA-binding proteins, using RNA electrophoretic mobility shift assays (REMSA). Multiple

RNA-protein complexes that bound to the 32P-labeled AR UC-rich transcript were identified in LNCaP cells. Competition studies with unlabeled RNA confirmed specificity for the AR UC-rich probe. UV cross-link (UVXL) analysis was performed to further define the proteins binding the UC-rich region. The 32P-UC transcript bound multiple distinct RNA-binding proteins from LNCaP cytoplasmic extract. Two specific proteins, with masses of ~43 and ~36 kDa, were identified. Their binding was significantly decreased by excess unlabelled poly (C) and poly (U), respectively. Addition of excess poly (A) had no effect.

Given the close similarity between the UC-rich region of AR and the reported RNA target sequences for Hel-N1 and HuD, Leedman’s group examined whether the 36 kDa RNA-protein complex contained HuR. HuR is the member of the elav/Hu family of RNA-binding proteins that is not restricted to the central nervous system. In REMSAs, a monoclonal antibody against HuR supershifted the 36 kDa complex, whereas no shift was observed with an unrelated antibody. To further investigate this interaction, they also used a recombinant GST-HuR fusion protein and observed a supershift with the HuR antibody. In addition, following UV cross-linking of prostate cancer cells, AR mRNA was immunoprecipitated with HuR antibody. This indicates a close association between HuR and AR mRNA in prostate cancer cells.

The 43 kDa protein was subsequently identified as αCP1 and/or αCP2: first, poly (C)

competition abolished the faster migrating RNA protein complex in REMSAs; second, poly

(C) competition in the UVXL assay specifically reduced binding of the major 43 kDa RNA-binding protein; third, analysis of the UC-rich region revealed a conserved CCCUCCC

sequence identified as a component of the αCP binding motif in erythropoietin (EPO)

mRNA (Yeap et al., 2002).

To further investigate whether the 43 kDa proteins were αCP1 and αCP2, the Leedman group conducted supershift experiments. αCP1 and αCP2 antibodies each produced a prominent supershift. In addition, UVXL-IP performed on LNCaP cytoplasmic extract immunoprecipitated a band at 43 kDa.

The group also conducted experiments with nuclear extracts and identified HuR, αCP1 and αCP2 in the nucleus of LNCaP cells. These data suggest that each of these proteins may have a role in binding AR mRNA in both the cytoplasm and the nucleus.

Taken together, these studies on AR mRNA-protein interactions formed a strong

foundation upon which to embark on structural studies to examine UC-rich element binding



to HuR and the αCPs. The current study was therefore designed to better understand the role of RNA-binding proteins in the regulation of AR mRNA stability. In particular, I wished to probe the molecular interactions of αCP1 and HuR with the UC-rich region in the 3’UTR of AR mRNA. I was specifically interested in characterizing

the structural attributes of this multi-protein/RNA complex. An understanding of the

structure of the complex involved in AR mRNA regulation could reveal ways to design

drugs that modulate AR expression and provide a platform for developing new

therapeutics.

1.9 Summary and Research aims

The AR is a key modulator of prostate cancer growth and proliferation, and a prime

therapeutic target. The data generated by the Leedman group established that HuR and the αCPs bind to AR mRNA and are likely to contribute to the regulation of AR mRNA stability. Given

the increasing importance of understanding the mechanisms underlying gene expression,

the current study was designed to explore the mechanism of binding of αCP1 to the target

UC-rich sequence at the 3’UTR of AR mRNA, as a starting point to understanding the

larger multiprotein HuR/αCP1-AR mRNA complex (Figure 1.18).

The major aims of this study were:

1) to determine the structural basis of αCP1 binding to the C-rich region in the 3’UTR of AR mRNA, with reference to its affinity and specificity; and

2) to characterize the kinetics and binding affinities of the isolated αCP1-KH domains 1, 2 and 3 for their target probe, and for a variety of other RNA and DNA probes, in order to determine the strength of binding and the preferred sequence.

1.9.1 Hypotheses that formed the basis of this study

1) The AR mRNA contains a UC-rich region that is the target of the αCP proteins, each of which contributes to overall regulation of AR expression in prostate cells.

2) Specific interference with the binding of these RNA binding proteins to the UC-rich

region could modulate the stability of AR mRNA.



3) αCP1 protein binds the C-rich patch of the 3’UTR via its three KH domains.

4) All of the αCP1-KH domains participate in binding the UC-rich element.

5) Each αCP1-KH domain has a different binding affinity for the oligonucleotide probe.

It was envisaged at the outset that these studies would provide a better understanding of the complex interactions between αCP1 and AR mRNA at a molecular level. It was also hoped they would shed light on the nature of protein/mRNA interactions in general. Understanding the detailed structural interactions with AR mRNA may provide valuable insight into the protein-mRNA interfaces. It may also identify possible targets for drugs that regulate AR expression in prostate cancer cells by interfering with the interaction of αCP1 and AR mRNA.

Figure 1.18: HuR/αCP1 and androgen receptor system. A molecular model of HuR RRM domains and αCP1-KH domains binding to the 3’UTR of AR mRNA (Wilce et al., 2002).


Chapter 2 Materials and Methods



2.1 Molecular Biology

2.1.1 Materials

The αCP1 coding sequence, cloned in a pGEX-6P-2 vector (Pharmacia), was available in the laboratory. E. coli XL1-Blue cells were used as plasmid hosts during the cloning and screening procedures for the αCP1-KH domains. E. coli BL21-CodonPlus cells (Stratagene) were used as the expression host for the final plasmid products. Primers, restriction enzymes and the various buffers for the cloning experiments were purchased from Promega and Sigma. Ampicillin and chloramphenicol were obtained from Sigma. Bacto-agar and yeast extract were from Oxoid Ltd (Basingstoke, England). Bactotryptone was obtained from Becton Dickinson (Cockeysville, USA). All other chemicals were of molecular biology or analytical reagent grade. The chemicals, reagents and consumables used in this project, and their suppliers, are listed in Table 2.1.

Table 2.1: Chemicals, reagents and consumables used throughout this work, with their suppliers.

Product | Supplier

General
Polyethylene glycol (PEG 4000, PEG 8000) | Fluka A.G. (Buchs, Switzerland)
Dithiothreitol | Astral (Sydney, NSW, Aus)
EDTA, IPTG | Progen (Darra, QLD, Aus)
BSA (2 mg/mL) | Bio-Rad Laboratories (Hercules, CA, USA)
PreScission Protease | Amersham Biosciences (Uppsala, Sweden)
PMSF | Roche (Mannheim, Germany)
Leupeptin | Roche (Mannheim, Germany)
Aprotinin | Roche (Mannheim, Germany)
D2O | Cambridge Isotope Laboratories, Inc. (Andover, MA, USA)

Electrophoresis
Acrylamide | Amersham Biosciences (Uppsala, Sweden)
N,N’-methylenebisacrylamide | Amersham Biosciences (Uppsala, Sweden)
Boric acid (electrophoresis reagent) | Bio-Rad Laboratories (Hercules, CA, USA)
Coomassie Brilliant Blue R | Bio-Rad Laboratories (Hercules, CA, USA)
Agarose | Promega Corporation (Madison, WI, USA)
Ethidium bromide | Bio-Rad Laboratories (Hercules, CA, USA)
Mark12 molecular weight marker | Novex (Terry Hills, NSW, Aus)

Culture media
Bacto Agar | Difco (Detroit, MI, USA)
Bacto Tryptone | Difco (Detroit, MI, USA)
Yeast Extract | Difco (Detroit, MI, USA)

Molecular biology reagents
100 bp ladder | Promega (Madison, WI, USA)
pGEM markers | Promega (Madison, WI, USA)
6x loading dye | Promega (Madison, WI, USA)
100 mM dNTPs | Promega (Madison, WI, USA)

Consumables
NMR tubes, high precision 528-PP, 5-mm O.D. | Wilmad (Buena, NJ, USA)
Wizard® Plus SV Minipreps DNA purification system | Promega (Madison, WI, USA)
Syringe filters | Pall Gelman (Northborough, USA); Millipore (North Ryde, NSW, Aus)
Glutathione agarose (Glutathione Sepharose 4 Fast Flow) | Amersham Biosciences (Uppsala, Sweden)
Vivaspin concentrators | Vivascience (Hanover, Germany)
Membrane filters | Millipore (North Ryde, NSW, Aus)
24-well culture plate with cover | Linbro (Australian Biosearch, WA, Aus)
Siliconised glass cover slips | Hampton Research (Aliso Viejo, CA, USA)
Desalting columns | Waters
Dialysis tubing (3.5 kDa MWCO) | Pierce (Rockford, IL, USA)
Slide-A-Lyzer cassettes (MWCO 3.5 kDa, 10 kDa) | Pierce (Rockford, IL, USA)

2.1.2 Buffers and solutions

All solutions/buffers were prepared using either MilliQ® water (MQW) or double

distilled water (DDI water) unless otherwise stated. The composition of the buffers and solutions used throughout this work is summarised in Table 2.2.

Table 2.2: The composition of buffers and solutions used throughout this project.

Buffer/Solution | Composition
TAE (Tris-acetate-EDTA) | 40 mM Tris (pH 8.2), 40 mM acetic acid, 2 mM EDTA
1 x PBS (pH 7.4) | 8 g/L NaCl, 0.2 g/L KCl, 1.44 g/L Na2HPO4, 0.24 g/L KH2PO4
2 x Loading buffer | 31.25 mM Tris-HCl (pH 6.8), 4% (w/v) SDS, 20% (v/v) glycerol, 10% (v/v) β-mercaptoethanol, 0.4% (w/v) bromophenol blue
Glycine gel running buffer | 0.025 M Tris (pH 8.8), 0.2 M glycine, 0.1% (w/v) SDS
Stain | 0.125% Coomassie Brilliant Blue, 40% methanol, 7% glacial acetic acid
Acrylamide stock solution | 29% (w/v) acrylamide, 1% (w/v) N,N’-methylenebisacrylamide
Destain | 40% methanol, 7% glacial acetic acid
Lysis buffer | 1 x PBS (pH 7.1), 2 mM EDTA, 1 mM PMSF, 2 mM DTT, 0.5% Triton X-100
Wash buffer | 1 x PBS (pH 7.1), 1 mM EDTA
PreScission™ protease buffer | 50 mM Tris-HCl (pH 7.5), 150 mM NaCl, 1 mM EDTA, 1 mM DTT
Buffer A | 50 mM Tris-HCl (pH 8.1), 1 mM DTT, 1 mM EDTA, 150 mM NaCl
Buffer B | 50 mM MES (morpholinoethanesulfonic acid; pH 6.2), 1 mM EDTA, 1 mM DTT
Buffer C | 50 mM MES (pH 6.2), 1 mM EDTA, 1 mM DTT, 1 M NaCl
Buffer D | 50 mM HEPES (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid; pH 7.0), 1 mM DTT, 1 mM EDTA
Buffer E | 50 mM HEPES (pH 7.2), 1 mM DTT, 1 M NaCl, 1 mM EDTA
Crystallisation buffer 1 | 25 mM KH2PO4/K2HPO4 (pH 6.2), 1 mM DTT, 1 mM EDTA, 150 mM NaCl, 0.1 M Na HEPES (pH 7.5), 1.5 M lithium sulfate
Crystallisation buffer 2 | 50 mM Tris-HCl (pH 8.1), 1 mM DTT, 1 mM EDTA, 150 mM NaCl, 0.1 M Na cacodylate (pH 6.5), 0.2 M magnesium acetate, 30% MPD
NMR buffer | 10 mM NaH2PO4/Na2HPO4 (pH 6.0), 100 mM NaCl, 2 mM EDTA, 2 mM DTT, 10% D2O
Biacore buffer | 10 mM Tris-HCl (pH 7.4), 150 mM NaCl, 2 mM DTT, 2 mM EDTA, 125 µg/mL tRNA, 62.5 µg/mL BSA, 0.5% Triton X-100
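The recipes in Table 2.2 are given as final concentrations. Converting one into masses to weigh out uses mass = concentration × volume × molar mass. The following is a minimal sketch for Buffer A, assuming standard molar masses. It also assumes, as a hypothetical choice not stated in the table, that EDTA is used as the disodium dihydrate salt.

```python
# Minimal sketch: grams of each solid needed for a buffer recipe given in mM.
# Molar masses (g/mol) are standard values; the EDTA salt form (disodium
# dihydrate) is an assumption, as Table 2.2 does not state which form was used.
MOLAR_MASS = {
    "Tris base": 121.14,
    "NaCl": 58.44,
    "DTT": 154.25,
    "Na2EDTA.2H2O": 372.24,
}


def grams_needed(conc_mM: float, volume_L: float, molar_mass: float) -> float:
    """mass (g) = concentration (mol/L) x volume (L) x molar mass (g/mol)."""
    return conc_mM / 1000.0 * volume_L * molar_mass


def weigh_out(recipe_mM: dict, volume_L: float) -> dict:
    """Return grams (rounded to 3 decimal places) for every component."""
    return {
        name: round(grams_needed(conc, volume_L, MOLAR_MASS[name]), 3)
        for name, conc in recipe_mM.items()
    }


# Buffer A (Table 2.2): 50 mM Tris-HCl pH 8.1, 150 mM NaCl, 1 mM DTT, 1 mM EDTA.
# Tris base is weighed out and then titrated to pH 8.1 with HCl.
BUFFER_A_mM = {"Tris base": 50, "NaCl": 150, "DTT": 1, "Na2EDTA.2H2O": 1}

if __name__ == "__main__":
    for name, grams in weigh_out(BUFFER_A_mM, 1.0).items():
        print(f"{name}: {grams} g per litre")
```

The same function applies to any other entry in the table once its molar masses are added to the dictionary.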

2.1.3 Culture Media

All culture media (Tables 2.3 and 2.5) were prepared using deionised water, autoclaved and stored at room temperature before use.

Table 2.3: Culture media composition.

Media | Composition
LB medium | 10 g/L Bacto tryptone, 10 g/L NaCl, 5 g/L yeast extract
LB agar | 10 g/L Bacto tryptone, 10 g/L NaCl, 5 g/L yeast extract, 15 g/L agar



After autoclaving, the molten LB agar was cooled to 45°C and the appropriate antibiotics were added to final concentrations of 100 µg/mL (ampicillin) and 25 µg/mL (chloramphenicol) before the plates were poured.
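The antibiotic additions follow from C1V1 = C2V2. The sketch below assumes the stock strengths listed in Table 2.5 (100 mg/mL ampicillin in water, 50 mg/mL chloramphenicol in ethanol). It returns the stock volume needed per litre of molten agar.

```python
def stock_volume_uL(stock_mg_per_mL: float, final_ug_per_mL: float,
                    final_volume_mL: float) -> float:
    """Solve C1*V1 = C2*V2 for V1 and return it in microlitres."""
    stock_ug_per_mL = stock_mg_per_mL * 1000.0
    return final_ug_per_mL * final_volume_mL / stock_ug_per_mL * 1000.0


# Stock strengths as in Table 2.5; final concentrations as in Section 2.1.3.
amp_uL = stock_volume_uL(100, 100, 1000)  # ampicillin, per litre of agar
cam_uL = stock_volume_uL(50, 25, 1000)    # chloramphenicol, per litre of agar
print(f"ampicillin: {amp_uL:.0f} uL/L, chloramphenicol: {cam_uL:.0f} uL/L")
```

These values (1 mL and 0.5 mL per litre) agree with the per-litre additions listed for the minimal medium in Table 2.5.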

2.2 Cloning of αCP1-KH domains

The individual αCP1-KH domains had been cloned into the pGEX-6P-2 plasmid as part of my Honours project. These constructs were used for protein overexpression experiments to obtain milligram quantities of pure protein for further biophysical studies. During my PhD, the only domain that was cloned again was αCP1-KH2. The following sections detail the method employed in the subcloning of αCP1-KH2.

2.2.1 Polymerase Chain Reaction

The DNA insert encoding residues 97-150 of αCP1 was amplified from the pGEX-6P-2-αCP1 plasmid by PCR. The primers contained the desired BamHI and EcoRI restriction endonuclease sites. The reaction components shown in Table 2.4 were used to make the master mix. Two different magnesium concentrations were tested to determine which gave the highest product yield. The DNA was amplified using an iCycler Thermal Cycler (Bio-Rad Laboratories) with the following program: (i) 95°C for 5 min; (ii) 55°C for 45 s and 68°C for 1 min 30 s, for a total of 30 cycles. Agarose gel electrophoresis was then carried out to determine the success of the PCR.

Table 2.4: PCR reaction master mix

Reactant | Mg [1 mM] | Mg [2 mM]
10 x reaction buffer (200 mM Tris-HCl (pH 8.8), 100 mM (NH4)2SO4, 20 mM MgSO4, 1 mg/mL nuclease-free bovine serum albumin (BSA)) | 5 μL | 5 μL
MgSO4 (50 mM) | 2 μL | 4 μL
DNA template (10 ng/μL) | 1 μL | 1 μL
dNTPs (25 mM) | 1 μL | 1 μL
Primers, forward and reverse (1 μg/μL) | 1 μL each | 1 μL each
Baxter water | 39 μL | 37 μL
Total volume | 50 μL | 50 μL
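The water volume in Table 2.4 is whatever tops each reaction up to 50 μL. The sketch below recomputes it. It reads the primer row as 1 μL of each primer, an interpretation under which both columns sum to exactly 50 μL. It also scales one reaction into a master mix for several reactions.

```python
def water_to_add(components_uL: dict, total_uL: float = 50.0) -> float:
    """Water needed to bring the listed components up to the reaction volume."""
    used = sum(components_uL.values())
    if used > total_uL:
        raise ValueError("components exceed the reaction volume")
    return total_uL - used


def reaction(mgso4_uL: float) -> dict:
    """Per-reaction volumes (uL) from Table 2.4, with 1 uL of each primer."""
    return {
        "10x buffer": 5.0,
        "MgSO4 (50 mM)": mgso4_uL,
        "template": 1.0,
        "dNTPs": 1.0,
        "forward primer": 1.0,
        "reverse primer": 1.0,
    }


def master_mix(components_uL: dict, n_reactions: int, total_uL: float = 50.0) -> dict:
    """Scale one reaction, water included, to n reactions."""
    mix = {name: vol * n_reactions for name, vol in components_uL.items()}
    mix["water"] = water_to_add(components_uL, total_uL) * n_reactions
    return mix


for mg_uL in (2.0, 4.0):
    print(mg_uL, water_to_add(reaction(mg_uL)))  # 39.0 and 37.0, as in Table 2.4
```

In practice the template is often added to each tube separately rather than to the shared mix. In that case, drop it from the scaled dictionary.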



2.2.2 Agarose Gel electrophoresis of DNA

Agarose gels were prepared by melting agarose in 1 x TAE buffer (1 g agarose per 100 mL TAE). 6 x DNA loading buffer was added to the DNA samples. The mixtures were loaded onto the agarose gel and electrophoresed at 90 V until the dye front was 3 cm from the end of the gel. The gel was then stained with 0.5 μg/mL ethidium bromide solution and visualised using a UV transilluminator. It was photographed using a Gel Doc™ EQ gel documentation system (Bio-Rad Laboratories), and the images were processed with QuantityOne-4.5.0 software (Bio-Rad Laboratories).
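Two small calculations underlie this step: the mass of agarose for a given gel percentage, and how much 6 x dye brings a sample to 1 x. For an N-fold dye, requiring N·V_dye / (V_dye + V_sample) = 1 gives V_dye = V_sample / (N − 1). The sample volumes below are hypothetical.

```python
def agarose_grams(percent_w_v: float, volume_mL: float) -> float:
    """1% (w/v) means 1 g of agarose per 100 mL of buffer."""
    return percent_w_v * volume_mL / 100.0


def loading_dye_uL(sample_uL: float, dye_fold: float = 6.0) -> float:
    """Volume of N-x dye that brings a sample to 1x: V_dye = V_sample / (N - 1)."""
    if dye_fold <= 1.0:
        raise ValueError("dye must be more concentrated than 1x")
    return sample_uL / (dye_fold - 1.0)


print(agarose_grams(1.0, 100))  # 1.0 g, as in Section 2.2.2
print(loading_dye_uL(10))       # 2.0 uL of 6x dye for a hypothetical 10 uL sample
```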

2.2.3 Restriction endonuclease digestion of DNA

The αCP1-KH2 PCR-derived construct was digested with the appropriate enzymes in Multi-Core™ 10 x reaction buffer (250 mM Tris-acetate (pH 7.8), 1 M potassium acetate, 100 mM magnesium acetate, 10 mM DTT). 40 μl of the amplified DNA (200 ng) was added to a reaction mixture containing 5.5 μl of reaction buffer, 5.5 μl of 10 x BSA (1 mg/mL) and 2 μl of the enzymes BamHI and EcoRI (~10 units/μl). The reactions were incubated at 37°C for 3 hours.
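A quick check on a digest is that the 10 x buffer and BSA end up at 1 x in the final volume. The text's "2 μl of the enzymes" can be read as 2 μl of each enzyme or 2 μl in total. The sketch below evaluates both readings. With 2 μl of each enzyme the total is 55 μl and the buffer is exactly 1 x. The 50% glycerol content of commercial enzyme stocks is an assumption, not stated in the text.

```python
def digest_summary(dna_uL, buffer_uL, bsa_uL, enzyme_uL, dna_ng,
                   stock_fold=10.0, enzyme_U_per_uL=10.0,
                   enzyme_glycerol_pct=50.0):
    """Summarise a restriction digest; enzyme_uL holds one volume per enzyme.

    Enzymes are assumed to be supplied in 50% glycerol (typical, but an
    assumption here).
    """
    total = dna_uL + buffer_uL + bsa_uL + sum(enzyme_uL)
    return {
        "total_uL": total,
        "buffer_fold": stock_fold * buffer_uL / total,
        "bsa_fold": stock_fold * bsa_uL / total,
        "glycerol_pct": enzyme_glycerol_pct * sum(enzyme_uL) / total,
        "units_per_ug_each": [v * enzyme_U_per_uL / (dna_ng / 1000.0)
                              for v in enzyme_uL],
    }


# Reading 1 (interpretation): 2 uL of EACH enzyme.
each = digest_summary(40, 5.5, 5.5, [2.0, 2.0], dna_ng=200)
# Reading 2 (interpretation): 2 uL of enzymes in total, 1 uL of each.
in_total = digest_summary(40, 5.5, 5.5, [1.0, 1.0], dna_ng=200)
print(each["total_uL"], round(each["buffer_fold"], 2))          # 55.0 1.0
print(in_total["total_uL"], round(in_total["buffer_fold"], 2))  # 53.0 1.04
```

Under either reading, the enzyme is in large excess over 200 ng of DNA, and the glycerol stays well below 5%.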

2.2.4 pGEX-6P-2 vector digestion

pGEX-6P-2 vector (1.4 mg/mL) was digested with BamHI and EcoRI. Each digestion reaction consisted of 10 μl of the vector, 2 μl of Multi-Core™ 10 x reaction buffer, 10 x BSA (1 mg/mL) and 2 μl of the appropriate enzymes (~10 U/μl). Reactions were incubated for 3 hours at 37°C.

2.2.5 Ligation reaction

Prior to the ligation reaction, the digested insert and vector were purified using the PCR purification kit (Promega, Madison, WI, USA) as described by the manufacturer. Ligation of the αCP1-KH2 domain into the vector was performed in a final volume of 20 μl. The reaction contained 14 μl of the insert (170 ng), 2 μl of T4 DNA ligase buffer (300 mM Tris-HCl (pH 7.8), 100 mM MgCl2, 100 mM DTT, 10 mM ATP) and 2 μl T4 DNA ligase (10 units/μl). An extra 3 mM fresh ATP was added in addition to that provided in the ligase buffer. The reaction mixture was incubated for 4 hours at room temperature.
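Ligations are usually planned in terms of the insert:vector molar ratio rather than mass. The sketch below uses the common approximation of 660 g/mol per base pair of dsDNA. The sizes are hypothetical: 162 bp for the 54 codons of residues 97-150 (ignoring primer overhangs) and ~4900 bp for pGEX-6P-2 (approximate). The vector mass is not stated in Section 2.2.5, so the 50 ng used here is purely illustrative.

```python
BP_MASS_G_PER_MOL = 660.0  # common average mass of one base pair of dsDNA


def pmol_dsDNA(ng: float, length_bp: float) -> float:
    """Convert a mass of dsDNA to picomoles: pmol = ng x 1000 / (bp x 660)."""
    return ng * 1000.0 / (length_bp * BP_MASS_G_PER_MOL)


def insert_ng_for_ratio(vector_ng: float, vector_bp: float,
                        insert_bp: float, ratio: float) -> float:
    """Insert mass giving the requested insert:vector molar ratio."""
    return vector_ng * (insert_bp / vector_bp) * ratio


# Hypothetical sizes: 162 bp insert (no overhangs), ~4900 bp vector (assumed).
insert_pmol = pmol_dsDNA(170, 162)
print(f"insert: {insert_pmol:.2f} pmol")  # ~1.59 pmol for 170 ng
needed = insert_ng_for_ratio(50, 4900, 162, 3)
print(f"insert for 3:1 with 50 ng vector: {needed:.1f} ng")
```

Because the insert is so short relative to the vector, even a few nanograms of insert give a molar excess.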



2.2.6 Transformation of XL1-Blue competent cells

XL1-Blue competent cells, prepared by the calcium chloride method as described by Sambrook et al. (1989) and stored at -80°C, were thawed on ice, and 100 μl was transferred into an Eppendorf tube. 2 μl of the αCP1-KH2 ligation mixture was gently mixed with a 50 μl aliquot of cells, and the transformation reaction was incubated on ice for 30 minutes. Immediately following incubation on ice, the cells were heat-shocked by placing the tube in a 42°C water bath for 1 min 30 s, then chilled on ice for 2 minutes. 0.5 mL of 2YT (0.4 g tryptone, 0.2 g yeast extract, 0.1 g NaCl) was then added to the transformation mixture, followed by incubation at 37°C for one hour in a shaker at 220 rpm. 200 μl of each transformation reaction was then spread onto LB-agar plates containing 100 μg/mL ampicillin to select for transformants, and the plates were incubated overnight at 37°C.

2.2.7 “Colony screening” and extraction of Plasmid DNA from bacterial culture

Three to five individual colonies from the αCP1-KH2 transformation that had grown on the LB (Luria-Bertani)-ampicillin agar plates were selected and used to inoculate 5 mL LB containing 100 μg/mL ampicillin. The bacterial cultures were grown overnight at 37°C with shaking at 180 rpm. Following overnight growth, plasmid DNA was extracted from the 5 mL cultures using the Wizard® Plus SV Minipreps kit according to the manufacturer’s instructions.

2.2.8 Restriction enzyme digestion

For each of the positive αCP1-KH2 colonies, 10 μl of purified plasmid DNA was

digested with the appropriate enzymes and buffers as described previously (Section

2.2.4). The digested DNA was analysed by 1% agarose gel electrophoresis. The resultant bands were visualised using a UV transilluminator and photographed using a Gel Doc™ EQ gel documentation system (Bio-Rad Laboratories). The images were processed using QuantityOne-4.5.0 software (Bio-Rad Laboratories). The sequence of plasmids yielding DNA fragments of the correct size was then confirmed by nucleotide sequencing at

the WA Genome Resource Centre, Royal Perth Hospital, WA, Australia.



2.3 Protein expression

2.3.1 Background

Bacterial overexpression systems provide a means of obtaining large amounts of a desired protein, which can be used for different biological and structural studies. One of the most common and extensively used systems for the expression and purification of recombinant proteins is the Glutathione-S-Transferase (GST) gene fusion system. In this system, the recombinant protein is overexpressed with GST attached at its N- or C-terminal end. GST binds strongly and specifically to glutathione (a tripeptide), so the GST-fusion protein can be readily purified by affinity chromatography using glutathione-bound beads. A protease recognition sequence lies between GST and the recombinant protein of interest. After the affinity chromatography step, GST can therefore be cleaved from the protein by a specific protease and separated from it. In pGEX-6P-2 vectors this protease is PreScission Protease, itself a GST fusion of human rhinovirus 3C protease. This permits simultaneous digestion of GST-fusion proteins and removal of both GST and the protease from the protein of interest.
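PreScission Protease (human rhinovirus 3C protease) recognises the sequence LEVLFQ/GP and cuts between the Gln and the Gly. As a result, the released protein carries at least Gly-Pro at its N-terminus. The toy sketch below splits a fusion sequence at that site. The "tag" and "target" sequences are hypothetical placeholders, not the real GST or αCP1 sequences.

```python
# PreScission Protease is GST-tagged human rhinovirus (HRV) 3C protease; it
# recognises LEVLFQ/GP and cuts between the Gln (Q) and Gly (G).
PRESCISSION_SITE = "LEVLFQGP"
CUT_OFFSET = 6  # cut after the sixth residue of the site (after Q)


def prescission_cleave(sequence: str):
    """Split a fusion sequence at the first PreScission site.

    Returns (GST-side fragment, released protein), or None if no site is found.
    """
    start = sequence.find(PRESCISSION_SITE)
    if start < 0:
        return None
    cut = start + CUT_OFFSET
    return sequence[:cut], sequence[cut:]


# Hypothetical toy fusion: placeholder "tag" residues, the site, then a target.
tag_side, released = prescission_cleave("TAGPART" + "LEVLFQGP" + "TARGET")
print(tag_side)   # TAGPARTLEVLFQ
print(released)   # GPTARGET
```

In the real construct, any vector-encoded linker residues between the site and the cloned insert also remain on the released protein.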

2.3.2 Expression of unlabeled Glutathione-S-Transferase Fusion protein

There are several factors that can affect the expression of a GST fusion protein. One such factor is the amount of the inducer, isopropyl β-D-1-thiogalactopyranoside (IPTG). The first step in the expression of αCP1 and the single KH domains was therefore to determine the optimum IPTG concentration for induction. This was done by the IPTG titration outlined below.

Each colony positive for an αCP1-KH construct was used to inoculate 5 mL of LB (1% tryptone, 0.5% yeast extract, 0.5% NaCl) broth containing 100 μg/mL ampicillin and 25 μg/mL chloramphenicol. All cultures were incubated overnight at 37°C with shaking at 180 rpm.

LB medium (400 mL) containing 100 μg/mL ampicillin and 25 μg/mL chloramphenicol was inoculated with the 5 mL overnight culture and then incubated at 37°C with shaking at 180 rpm. This culture was grown to mid-exponential phase (A600 ~0.8, i.e. an optical density of 0.8 measured at 600 nm). It was then dispensed in 20 mL portions into 50 mL Falcon tubes. Each tube was induced with a different final IPTG concentration between 0.2 mM and 0.8 mM. An IPTG titration was conducted for all the αCP1-KH domains. The induced cultures were incubated for a further 4 hours with shaking at 180 rpm, followed by centrifugation for 15 minutes at 5000 rpm and 4°C. The cell pellets were stored at -20°C until required. Samples (1 mL) were collected before and after induction and stored for analysis by SDS-PAGE. For large-scale protein expression, an IPTG concentration of 0.2 mM was used for all the proteins because this was found to be the optimum concentration.
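Setting up the titration is a C1V1 = C2V2 calculation for each 20 mL aliquot. The sketch below assumes a 1 M IPTG stock. It also assumes evenly spaced steps of 0.2, 0.4, 0.6 and 0.8 mM. Neither the stock concentration nor the exact steps are stated in the text, so both are hypothetical.

```python
def iptg_volume_uL(final_mM: float, culture_mL: float = 20.0,
                   stock_mM: float = 1000.0) -> float:
    """Volume of IPTG stock to add: V1 = C2 x V2 / C1, returned in uL."""
    return final_mM * culture_mL / stock_mM * 1000.0


# Hypothetical titration steps within the 0.2-0.8 mM range; 1 M stock assumed.
TITRATION_mM = [0.2, 0.4, 0.6, 0.8]

for conc in TITRATION_mM:
    print(f"{conc:.1f} mM: add {iptg_volume_uL(conc):.1f} uL of 1 M IPTG to 20 mL")
```

The added volumes (at most 16 μL in 20 mL) are small enough that dilution of the culture can be ignored.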

2.3.3 Overexpression of labelled αCP1-KH1 and αCP1-KH3 domains

15N-labelled protein was also prepared for αCP1-KH1 and αCP1-KH3. A fermentor was used, with a method adapted from Cai et al. (1998). This approach produces a higher cell mass per gram of 15NH4Cl than the conventional shaker flask method. Table 2.5 summarises the composition of the minimal medium. 2 L of the basic salt solution was prepared for each fermentor grow-up. Of this, 1.6 L was autoclaved in the reaction vessel and 400 mL in a baffled flask. A BIOFLO III Fermentor/Bioreactor (New Brunswick Scientific, NJ, USA) was used.

A single colony of the desired bacterial strain was used to inoculate 5 mL LB containing

100 μg/mL ampicillin and 25 μg/mL chloramphenicol. After 8 hours incubation at 37°C

with shaking (180 rpm), 1 mL of the day culture was used to inoculate the flask

containing 400 mL of basic salt solution to which the trace metal solution (Table 2.5),

glucose, yeast extract, MgCl2 solution, ampicillin and chloramphenicol had been added.

The culture was grown overnight at 37°C with shaking (180 rpm). The following

morning, trace metal solution, glucose, yeast extract, MgCl2 solution, ampicillin and

chloramphenicol were added to the 1.6 L of basic salt solution autoclaved in the

fermentor reaction vessel. The fermentor was set to maintain a constant temperature of

37°C, a constant pH of 6.8 (maintained by the addition of 5 M NaOH) and an agitation

rate of 500 rpm. The dissolved oxygen level was maintained at 75% by altering the ratio

of air/oxygen bubbled into the culture. The entire overnight culture was added to the fermentor, and the culture was grown until all the 14N-ammonium chloride had been depleted, as indicated by a sharp spike in the dissolved O2 level. 0.5 g 15NH4Cl was then added, and the culture was grown until the 15N-ammonium chloride was

again depleted. At this point a further 2.5 g 15NH4Cl was added and expression of the

αCP1-KH1 or αCP1-KH3 domain was induced by the addition of IPTG to a final



concentration of 0.2 mM. After the last batch of ammonium chloride was depleted (~3 h), the cells were harvested by centrifugation (5000 x g, 15 min, 4°C) and the supernatant was discarded. The cell pellet was resuspended in ice-cold PBS and transferred to 50 mL Falcon tubes. The cells were pelleted by centrifugation (5000 x g, 20 min, 4°C) and the supernatant was discarded. The cell pellet was snap frozen in liquid nitrogen and stored at -80°C until required. Pre-induction and pre-harvest samples (1 mL) were taken, and protein overexpression was analysed by SDS-PAGE as detailed in Section 2.4.4.
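Each ammonium chloride addition is timed from the dissolved-oxygen trace. When the nitrogen source runs out, growth and respiration slow, oxygen consumption falls and the dissolved O2 reading jumps. The sketch below flags the first such jump in a series of readings. The readings and the 10-percentage-point threshold are hypothetical. The fermentor's own controller software is not modelled.

```python
def first_do_spike(do_percent, min_rise=10.0):
    """Index of the first reading that rises by at least `min_rise` percentage
    points over the previous reading, or None if no such rise occurs."""
    for i in range(1, len(do_percent)):
        if do_percent[i] - do_percent[i - 1] >= min_rise:
            return i
    return None


# Hypothetical readings around the 75% set point; the jump at index 5 mimics
# the rise seen when the nitrogen source runs out and respiration slows.
trace = [75, 74, 75, 75, 76, 92, 95]
print(first_do_spike(trace))  # 5
```

A threshold on the reading-to-reading change ignores the small fluctuations that occur while the controller holds the set point.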

Table 2.5: The composition of basic salt solution.

Stock solution | Per litre | Stock composition
Basic salt solution (a) | 970 mL | per 970 mL: KH2PO4 13.0 g; K2HPO4 10.0 g; NaHPO4 9.0 g; K2SO4 2.4 g; 14NH4Cl 1.0 g
Trace metal solution (b, c) | 0.4 mL | per 100 mL: conc. HCl 8 mL; FeCl2.4H2O 5.0 g; CaCl2.2H2O 184 mg; MnCl2.4H2O 40 mg; CoCl2.6H2O 18 mg; ZnCl2 340 mg; CuCl2.2H2O 4 mg; H3BO3 64 mg; Na2MoO4.2H2O 605 mg
1 M MgCl2 (c) | 10 mL | MgCl2.6H2O, 10.16 g/50 mL MQW
100 mg/mL ampicillin (c) | 1 mL | ampicillin, 1 g/10 mL MQW
50 mg/mL chloramphenicol (c) | 0.5 mL | chloramphenicol, 0.5 g/10 mL ethanol
40% w/v D-glucose (c) | 30 mL | D-glucose, 40 g/100 mL MQW
10% w/v yeast extract (c) | 20 μl | yeast extract, 5 g/50 mL MQW

(a) Autoclaved prior to use. (b) Different composition to that described by Cai et al. (1998). (c) Sterile filtered prior to use.
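The stock and per-litre entries in Table 2.5 can be checked against each other. The sketch below recomputes the mass of MgCl2·6H2O for 50 mL of a 1 M stock, using a standard molar mass of 203.30 g/mol. It also gives the final MgCl2 and glucose concentrations produced by the listed per-litre additions. It assumes "per litre" means per litre of final medium.

```python
MGCL2_6H2O_G_PER_MOL = 203.30


def grams_for_stock(molarity_M: float, volume_mL: float, molar_mass: float) -> float:
    """mass (g) = molarity (mol/L) x volume (L) x molar mass (g/mol)."""
    return molarity_M * volume_mL / 1000.0 * molar_mass


def final_in_medium(stock_conc: float, added_mL: float,
                    medium_mL: float = 1000.0) -> float:
    """Concentration after dilution into the medium, in the stock's own units."""
    return stock_conc * added_mL / medium_mL


mg_grams = grams_for_stock(1.0, 50, MGCL2_6H2O_G_PER_MOL)
print(f"{mg_grams:.3f} g MgCl2.6H2O per 50 mL")  # 10.165 (Table 2.5: 10.16 g)
print(final_in_medium(1000.0, 10), "mM MgCl2")   # 10 mL of 1 M (1000 mM) per litre
print(final_in_medium(400.0, 30), "g/L glucose") # 30 mL of 40% (400 g/L) per litre
```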

2.4 Protein purification

2.4.1 Cell Lysis

The pellet from each small-scale 20 mL culture (prepared as outlined in Section 2.3.2) was resuspended in 800 μl of ice-cold PBS lysis buffer. The cells were lysed by six freeze-thaw cycles, each freeze consisting of immersing the tubes in liquid nitrogen for 1 min. The lysate was then centrifuged for 45 minutes at 20000 g and 4°C to pellet unbroken cells, large cellular debris and inclusion-body protein. At this stage, 20 μl each of the supernatant and pellet were analysed on a 15% Tris-glycine SDS-PAGE gel. This checked for the presence of the overexpressed protein in the soluble and insoluble fractions.

2.4.2 Glutathione-agarose bead adsorption and PreScission protease cleavage

Where soluble GST-fusion protein was obtained, large-scale protein expression was conducted. The procedure was the same as for small-scale expression, with two exceptions. Cells were lysed by French pressing, and the cell lysate was supplemented with the protease inhibitors PMSF, leupeptin and aprotinin to a final concentration of 0.5 mM. The GST-fusion protein was then separated from the rest of the E. coli proteins. Glutathione-agarose beads (300 mg) were hydrated at room temperature for 30 min before approximately 80 mL of cell lysate was added. The mixture was incubated with gentle rocking at 4°C for 16 hours. The beads were then washed six times with 50 mL of PBS containing 0.5% Triton X-100 to remove non-adherent protein, and once with PreScission protease buffer. The beads were resuspended in ~7 mL PreScission protease buffer and incubated with 5 U/mL PreScission protease for approximately 48 hours at 4°C with gentle rocking. Samples were collected at various stages for SDS-PAGE analysis. Following cleavage, the beads were spun down and the eluate was collected.



2.4.3 Regeneration of GSH-agarose beads

Used glutathione-agarose beads were transferred to 50 mL tubes and washed with DDI water (2 x 40 mL), then incubated with 8 M urea (40 mL) at room temperature. The beads were allowed to settle and the overlying solution was discarded. This procedure was repeated twice. The beads were then washed with MQW (5 x 40 mL) and stored as a 50% slurry in 20% ethanol at 4°C until required.

2.4.4 PAGE analysis

Tris/glycine gels were used to check the success of protein overexpression and purification protocols. SDS-PAGE was carried out using a Hoefer® Tall Mighty Small® apparatus (Amersham Biosciences) with 1.5 mm thick gels. Mark 12™ low molecular weight protein standards (10 μl) were run on each gel. Gels were stained with Coomassie blue and photographed using a Gel Doc™ EQ gel documentation system (Bio-Rad Laboratories). The images were processed using QuantityOne-4.5.0 software (Bio-Rad Laboratories).

2.4.5 Tris/glycine SDS-PAGE

The 1 mL pre- and post-induction cell pellets were resuspended in 1 x loading buffer. The volume of loading buffer was scaled to the final A600 of the sample (e.g. 80 µl of loading buffer was added to samples that had an undiluted A600 of 0.80). The samples were boiled for 10 min at 100°C, with brief vortexing in between, and 15 μl of each sample was loaded per well. For samples obtained during cell lysis and protein purification, an equal volume of 2 x loading buffer was added. All samples were then heated at 95°C for 5 min prior to loading. For each sample, 10–15 µl was loaded per well.
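The example above (80 µl for an A600 of 0.80) corresponds to 100 µl of loading buffer per A600 unit. Scaling linearly means each lane carries roughly the same number of cells. The sketch below applies that rule. The linear scaling is inferred from the single example given, and the other A600 values are hypothetical.

```python
# 80 uL for an A600 of 0.80 corresponds to 100 uL per A600 unit.
UL_PER_A600_UNIT = 80.0 / 0.80


def loading_buffer_uL(a600: float) -> float:
    """Loading-buffer volume for a 1 mL cell pellet, scaled linearly with A600
    so that each microlitre loaded carries roughly the same number of cells."""
    if a600 <= 0:
        raise ValueError("A600 must be positive")
    return a600 * UL_PER_A600_UNIT


for a600 in (0.80, 1.20, 2.50):
    print(f"A600 {a600:.2f}: resuspend in {loading_buffer_uL(a600):.0f} uL")
```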

2.4.6 Electrophoresis

Gels were prepared according to Sambrook et al. (1989). Briefly, the resolving gel

contained 0.375 M Tris (pH 8.8) and 0.1% SDS. The acrylamide concentration used in

the resolving gel was 15%. The stacking gel contained 4% acrylamide, 0.125 M Tris

(pH 6.8) and 0.1% SDS. Gels were electrophoresed at 200 V using glycine running

buffer.
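The gel compositions above translate into stock volumes by C1V1 = C2V2. The sketch below uses the 30% total acrylamide stock of Table 2.2 (29% acrylamide plus 1% bis). The Tris stocks (1.5 M for the resolving gel, 1.0 M for the stacking gel) and the 10% SDS stock are assumptions not stated in the text. Ammonium persulfate and TEMED, a few microlitres each, are left out.

```python
def gel_volumes_mL(total_mL, acrylamide_pct, tris_M, sds_pct=0.1,
                   acryl_stock_pct=30.0, tris_stock_M=1.5, sds_stock_pct=10.0):
    """Stock volumes (mL) for one gel mix by C1V1 = C2V2.

    The 30% acrylamide stock matches Table 2.2 (29% acrylamide + 1% bis);
    the Tris and 10% SDS stock strengths are assumptions. Ammonium persulfate
    and TEMED (a few microlitres) are not included.
    """
    acryl = total_mL * acrylamide_pct / acryl_stock_pct
    tris = total_mL * tris_M / tris_stock_M
    sds = total_mL * sds_pct / sds_stock_pct
    water = total_mL - acryl - tris - sds
    if water < 0:
        raise ValueError("stocks are too dilute for this recipe")
    return {"acrylamide stock": acryl, "Tris stock": tris,
            "10% SDS": sds, "water": water}


resolving = gel_volumes_mL(10.0, 15.0, 0.375)                 # 1.5 M Tris pH 8.8 assumed
stacking = gel_volumes_mL(5.0, 4.0, 0.125, tris_stock_M=1.0)  # 1.0 M Tris pH 6.8 assumed
for name, mix in (("resolving", resolving), ("stacking", stacking)):
    print(name, {k: round(v, 2) for k, v in mix.items()})
```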



2.4.7 Size-exclusion chromatography

Size-exclusion chromatography was used to separate full-length αCP1, αCP1-KH1, αCP1-KH2 and αCP1-KH3 from undigested fusion protein, GST and any remaining bacterial contaminants. Size-exclusion chromatography was carried out using a Superdex™ 75 10/300 GL column (Amersham Biosciences) connected to a BioLogic DuoFlow Chromatography System (Bio-Rad Laboratories). The cleavage reaction mixture was dialysed twice against 100 volumes of buffer A at 4°C. The dialysate was concentrated to 1-2 mL using a Vivaspin 5K concentrator. 500 μl aliquots of the concentrate were loaded onto the Superdex column, pre-equilibrated in buffer A, at a flow rate of 0.4 mL/min. Protein was eluted with the same buffer at the same flow rate. The absorbance of the eluate was monitored at 280 nm and the fractions of interest were collected. The purity of the collected fractions was determined by SDS-PAGE analysis. Fractions containing pure αCP1, αCP1-KH1, αCP1-KH2 or αCP1-KH3 were pooled, concentrated to 500 µl using a Vivaspin 5K concentrator and stored at -80°C until required.
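Two changes against 100 volumes of buffer remove small solutes very efficiently. Each step leaves 1/(1 + 100) of the original concentration, so two steps leave about 10⁻⁴. The sketch below shows this, assuming full equilibration at every step and no change in sample volume. The same arithmetic applies to the ion-exchange dialyses in Sections 2.4.8 and 2.4.9. There, removing the 150 mM NaCl of the PreScission buffer before loading onto a column in a salt-free buffer is what matters.

```python
def residual_fraction(buffer_volumes: float, changes: int) -> float:
    """Fraction of a freely diffusing small solute remaining after `changes`
    dialysis steps, each against `buffer_volumes` x the sample volume.
    Assumes full equilibration at every step and no change in sample volume."""
    return (1.0 / (1.0 + buffer_volumes)) ** changes


fraction = residual_fraction(100, 2)
print(f"remaining fraction: {fraction:.1e}")
# e.g. 150 mM NaCl from PreScission buffer dialysed into a salt-free buffer:
print(f"residual NaCl: {150 * fraction:.3f} mM")
```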

2.4.8 Anion exchange chromatography

Anion exchange chromatography was also employed to further purify full-length αCP1

from GST, GST-αCP1 and any remaining bacterial contaminants. Anion exchange

chromatography was carried out using a MonoQ™ HR 10/10 column (Amersham Biosciences)

connected to a BioLogic DuoFlow chromatography system (Bio-Rad Laboratories). All

buffers were 0.2 μm filtered (Millipore) and degassed prior to use. All samples were 0.2

μm filtered (Millipore) immediately prior to loading.

The cleavage reaction mixture was dialysed (with a 5 K or 10 K cut-off membrane, depending on the protein size) twice against 100 volumes of buffer B at 4°C. The filtered dialysate was loaded in 2 mL aliquots onto the MonoQ™ HR 10/10 column. Pure αCP1 was eluted by applying a linear gradient of 0 to 60% buffer C over 40 mL at a flow rate of 1 mL/min. The eluate was monitored at 280 nm, and the main peaks were collected and identified using 15% Tris-glycine SDS-PAGE. Fractions containing pure αCP1 were pooled, dialysed into Biacore buffer (50 mM Tris-HCl pH 7.4, 150 mM NaCl, 0.5% Triton X-100, 2 mM DTT, 2 mM EDTA, 62 μg/mL bovine serum albumin and 125 μg/mL tRNA) and stored at -80˚C.



2.4.9 Cation exchange chromatography

Cation exchange chromatography was also employed to separate αCP1-KH1 from GST, GST-αCP1-KH1 and any remaining bacterial contaminants. Cation exchange chromatography was carried out using a MonoS™ HR 10/10 column (Amersham Biosciences) connected to the BioLogic DuoFlow chromatography system (Bio-Rad Laboratories). All buffers were filtered through 0.2 μm filters (Millipore) and degassed prior to use. All samples were 0.2 μm filtered (Millipore) immediately prior to loading.

The cleavage reaction mixture was dialysed (with a 5 K cut-off membrane) twice against 100 volumes of buffer D at 4 °C. The filtered dialysate was loaded in 2 mL aliquots onto the MonoS™ HR 10/10 column. Pure αCP1-KH1 domain was eluted by applying a linear gradient of 0 to 60% buffer E over 40 mL at a flow rate of 2 mL/min. Elution was monitored at 280 nm and the desired fractions were collected. Collected fractions were analysed by SDS-PAGE, and fractions containing the αCP1-KH1 domain were pooled and stored at -80°C until required.
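Chapter 3 reports that the 0–60% buffer E gradient corresponds to 0–0.6 M NaCl, which implies that buffer E contains about 1 M NaCl and buffer D none; that inference, and the helper names below, are assumptions. Under them, the %B and NaCl concentration reached at any point of the 40 mL linear gradient can be estimated as follows (system dwell volume is ignored).

```python
def percent_b(volume_into_gradient_ml: float,
              gradient_volume_ml: float = 40.0,
              start_b: float = 0.0,
              end_b: float = 60.0) -> float:
    """%B delivered at a given volume into a linear gradient, clamped to its end points."""
    frac = min(max(volume_into_gradient_ml / gradient_volume_ml, 0.0), 1.0)
    return start_b + (end_b - start_b) * frac


def nacl_molar(pct_b: float, nacl_in_b_molar: float = 1.0) -> float:
    """NaCl concentration at a given %B, assuming the low-salt buffer has no NaCl."""
    return nacl_in_b_molar * pct_b / 100.0


flow_ml_per_min = 2.0          # MonoS run described in this section
minutes_into_gradient = 10.0   # time since the gradient started (not since injection)
b = percent_b(flow_ml_per_min * minutes_into_gradient)
print(b, nacl_molar(b))  # 30.0 0.3
```

Elution times read from a chromatogram also include sample loading and any wash before the gradient starts, so they need to be converted to volume into the gradient before using this estimate.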

2.5 Protein concentration

Protein concentrations were determined using the detergent-compatible (DC) protein assay (Bio-Rad), which was conducted as instructed in the manual and read spectrophotometrically at 750 nm. The assay is based on the reaction of protein with an alkaline copper tartrate solution (Bradford, 1976). In addition, protein concentration was determined spectrophotometrically from the absorbance of the sample at 280 nm, using the Beer-Lambert law:

A280 = ε × c × l

where: A280 is the absorbance at 280 nm

ε is the theoretical molar extinction coefficient at 280 nm (M⁻¹ cm⁻¹)

c is the concentration (M)

l is the path length (cm)

The molar extinction coefficients used were calculated from the primary sequence of each protein using the ExPASy server and are shown in Table 2.6. The protein concentration was

further confirmed using SDS-PAGE analysis and bovine serum albumin as a standard.
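As a worked example of the Beer-Lambert calculation, the sketch below converts an A280 reading into molar and mass concentration. The absorbance readings are hypothetical, the extinction coefficients are those in Table 2.6, and the 37.5 kDa mass of αCP1 is the expected value given in Chapter 3. With ε = 0 (αCP1-KH1) the equation cannot be solved, so that domain has to be quantified by another method such as the DC assay.

```python
def molar_concentration(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law rearranged: c = A / (epsilon * l), in mol/L."""
    if epsilon <= 0:
        # e.g. a protein with no Trp, Tyr or cystine gives no usable A280 signal
        raise ValueError("epsilon must be positive to use A280")
    return a280 / (epsilon * path_cm)


def mg_per_ml(molar: float, mw_da: float) -> float:
    """mol/L multiplied by g/mol gives g/L, which equals mg/mL."""
    return molar * mw_da


# Hypothetical reading for full-length aCP1 (epsilon 13450 M^-1 cm^-1, 37.5 kDa)
c = molar_concentration(0.5, 13450)
print(f"{c * 1e6:.1f} uM, {mg_per_ml(c, 37500):.2f} mg/mL")  # 37.2 uM, 1.39 mg/mL
```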



Table 2.6: Protein molar extinction coefficients

Protein      Extinction coefficient at 280 nm (M⁻¹ cm⁻¹)
αCP1         13450
αCP1-KH1     0 (no Trp, Tyr or Cys residues)
αCP1-KH2     125
αCP1-KH3     1615
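The values in Table 2.6 follow the sequence-based estimate used by ExPASy, in which only Trp, Tyr and disulfide-bonded Cys (cystine) absorb appreciably at 280 nm, with commonly used per-residue values of 5500, 1490 and 125 M⁻¹ cm⁻¹. The sketch below reproduces that arithmetic for any sequence. The αCP1 construct sequences are not reproduced here, so only toy sequences are used, and pairing every two Cys into a cystine is an assumption that should be switched off for reduced protein.

```python
def epsilon_280(sequence: str, assume_cystines: bool = True) -> int:
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1) from sequence.

    Uses per-residue contributions of 5500 (Trp), 1490 (Tyr) and 125 per
    cystine. If assume_cystines is False, all Cys are treated as reduced
    and contribute nothing.
    """
    seq = sequence.upper()
    n_trp = seq.count("W")
    n_tyr = seq.count("Y")
    n_cystine = seq.count("C") // 2 if assume_cystines else 0
    return 5500 * n_trp + 1490 * n_tyr + 125 * n_cystine


print(epsilon_280("GSAKLVE"))       # 0: no Trp, Tyr or Cys
print(epsilon_280("WYCC"))          # 5500 + 1490 + 125 = 7115
print(epsilon_280("WYCC", False))   # 6990 with all Cys reduced
```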

2.6 Mass spectrometry

Mass spectral analysis using matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-TOF MS) was performed by Proteomics International (East Perth, WA, Aus) on the αCP1-KH1, αCP1-KH2 and αCP1-KH3 domains to confirm the identity of the purified proteins.

2.7 Circular dichroism spectropolarimetry

Circular dichroism spectra were obtained for αCP1-KH1, αCP1-KH2 and αCP1-KH3 to confirm that the proteins were correctly folded. Far-ultraviolet circular dichroism (CD) spectra were collected on a J-810 spectropolarimeter (Jasco, Easton, MD, USA) equipped with a Pharmacia LKB MultiTemp II temperature controller (Amersham Biosciences). Quartz cuvettes of 1 cm path length were obtained from Starna Pty Ltd (Thornleigh, NSW, Aus). Data were recorded on a personal computer and visualised using the supplied software, Spectra Manager (Jasco).

2.7.1 Sample preparation and spectra acquisition

Samples were dialysed into 50 mM Tris buffer pH 8.2, 150 mM NaCl, 1 mM EDTA and DTT using 3 kDa cut-off dialysis units (Slide-A-Lyzer). The samples were then diluted in the same buffer to a final concentration of 20 µg/200 µl (0.1 mg/mL). All spectra were collected at 25 °C under a constant nitrogen flush (>5 L/min). Data were collected over the wavelength range 200 to 300 nm with a data pitch of 0.2 nm and a bandwidth of 1 nm. The final spectra represent the average of 30 scans collected at a speed of 100 nm/min with a response time of 2 s.
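To compare spectra of the three KH domains on a common scale, CD data are often converted from observed ellipticity to mean residue ellipticity. This step is not described in the text above, so the sketch below is offered only as an illustration. It assumes the observed signal is in millidegrees and uses the 20 µg/200 µl (0.1 mg/mL) sample concentration; the molecular weight and residue count in the example are hypothetical placeholders.

```python
def mean_residue_weight(mw_da: float, n_residues: int) -> float:
    """Mean residue weight: molecular mass divided by the number of peptide bonds."""
    return mw_da / (n_residues - 1)


def mean_residue_ellipticity(theta_mdeg: float, mw_da: float, n_residues: int,
                             path_cm: float, conc_mg_per_ml: float) -> float:
    """[theta]_MRW in deg cm^2 dmol^-1 from observed ellipticity in millidegrees."""
    mrw = mean_residue_weight(mw_da, n_residues)
    return theta_mdeg * mrw / (10.0 * path_cm * conc_mg_per_ml)


conc = 20.0 / 200.0   # 20 ug in 200 ul; ug per ul equals mg per mL
# Hypothetical protein: 11 kDa, 101 residues, 1 cm cell, -10 mdeg observed
print(round(mean_residue_ellipticity(-10.0, 11000.0, 101, 1.0, conc), 1))
```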



2.8 Oligonucleotide Preparation

2.8.1 Preparation of 11-nt αCP1 target site from AR mRNA

The DNA and RNA oligonucleotides corresponding to nucleotides 3315–3325 of AR mRNA (5′-UUCCCUCCCUA-3′; TTCCCTCCCTA in the DNA oligonucleotide) were purchased from Dharmacon in a protected crude form and further purified by denaturing PAGE. The sample was separated on a 20% polyacrylamide gel, and the band was visualised by UV shadowing once the oligonucleotide had migrated approximately halfway down the gel. The sample was recovered by excising the appropriate band, which was crushed and eluted overnight in sterile 0.3 M sodium acetate at 37°C. The eluate was filtered and desalted using a reverse-phase solid-phase extraction cartridge (C18 Sep-Pak, Waters). The eluted fractions were lyophilised and deprotected. Deprotection involved dissolving the oligonucleotide pellet in 400 µl of deprotection buffer supplied by Dharmacon and incubating for 30 min at 60 °C; the solution was then lyophilised and the pellet dissolved in distilled water for quantification by UV spectroscopy. Oligonucleotide concentrations were determined by measuring the absorbance at 260 nm and assuming one absorbance unit to be equivalent to 34 μg/mL.
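The A260 conversion above can be written as a short calculation. The sketch below applies the 34 μg/mL per absorbance unit factor used in this section and converts the result to micromolar. The dilution factor and the oligonucleotide molecular weight in the example are hypothetical values; for short oligonucleotides a sequence-specific extinction coefficient usually gives a more accurate molar concentration than a generic factor.

```python
def oligo_ug_per_ml(a260: float, factor_ug_per_ml: float = 34.0,
                    dilution: float = 1.0) -> float:
    """Mass concentration of the undiluted stock from a (diluted) 1 cm A260 reading."""
    return a260 * factor_ug_per_ml * dilution


def ug_per_ml_to_um(ug_per_ml: float, mw_g_per_mol: float) -> float:
    """ug/mL equals mg/L; dividing by g/mol gives mmol/L, times 1000 gives umol/L."""
    return ug_per_ml / mw_g_per_mol * 1000.0


stock = oligo_ug_per_ml(0.5, dilution=20)                # hypothetical 1:20 dilution reading
print(stock, round(ug_per_ml_to_um(stock, 3400.0), 1))   # 340.0 100.0 (MW assumed)
```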

2.8.2 Preparation of 30-nt oligonucleotide containing the αCP1 target site from AR mRNA

The target 5′-biotinylated RNA (5′-CUCUCCUUUCUUUUUCUUCUUCCCUCCCUA-3′), representing nt 3296–3325 of androgen receptor (AR) mRNA, was obtained from Dharmacon, and the corresponding DNA sequence was purchased from Geneworks. The RNA was deprotected, and both DNA and RNA were quantified, as described in section 2.8.1.
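The sequence relationships used in sections 2.8.1, 2.8.2 and 2.11.3 can be checked directly. The sketch below (helper names hypothetical) confirms that the 30-nt probe spans nt 3296–3325, that the 11-nt site occupies its 3′ end (nt 3315–3325), and that the DNA oligonucleotide used for crystallization (TTCCCTCCCTA) is the same-sense DNA analogue of the RNA site rather than its G-rich complement.

```python
from typing import Tuple

PROBE_30NT = "CUCUCCUUUCUUUUUCUUCUUCCCUCCCUA"  # AR mRNA nt 3296-3325 (section 2.8.2)
SITE_11NT = "UUCCCUCCCUA"                     # AR mRNA nt 3315-3325 (section 2.8.1)
PROBE_START = 3296                            # mRNA coordinate of the first probe base


def to_dna(rna: str) -> str:
    """Same-sense DNA analogue of an RNA sequence (U -> T)."""
    return rna.upper().replace("U", "T")


def reverse_complement_dna(dna: str) -> str:
    """Reverse complement of a DNA sequence, written 5' -> 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(dna.upper()))


def locate(site: str, probe: str, probe_start: int) -> Tuple[int, int]:
    """mRNA coordinates (first, last) of the first occurrence of site within probe."""
    offset = probe.find(site)
    if offset < 0:
        raise ValueError("site not found in probe")
    first = probe_start + offset
    return first, first + len(site) - 1


print(len(PROBE_30NT), locate(SITE_11NT, PROBE_30NT, PROBE_START))  # 30 (3315, 3325)
print(to_dna(SITE_11NT), reverse_complement_dna(to_dna(SITE_11NT)))  # TTCCCTCCCTA TAGGGAGGGAA
```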

2.9 Oligonucleotide-protein binding studies

2.9.1 Surface plasmon resonance spectroscopy (SPR) using Biacore 2000

BIACORE experiments were conducted for αCP1, αCP1-KH1, αCP1-KH2 and αCP1-KH3. BIACORE is a surface plasmon resonance based instrument that optically measures the refractive index near a sensor surface, which is a gold-coated glass chip. The gold surface is covered on the optical side with a thin layer of glass and on the other side with a carboxymethylated dextran matrix, to which streptavidin is attached. Because streptavidin binds biotin with high affinity, biotinylated single-stranded oligonucleotides can be captured on this surface. To detect the interaction of protein and oligonucleotide, 50 μl of protein solution is injected through a flow cell under a constant flow of the buffer of interest, so that the protein solution passes over the surface of the chip. Binding of protein to the immobilised oligonucleotide changes the refractive index near the surface, which is detected in real time by an optical device on the other side of the chip and plotted as a sensorgram of response units (RU) versus time.

Figure 2.1: Schematic of the sensor chip. Biotinylated RNA is immobilised on a streptavidin-coated carboxymethylated dextran matrix, and protein solution is passed over the sensor surface; protein binding to the target RNA changes the refractive index near the sensor surface.

2.9.2 Oligonucleotide-protein binding measurements

Surface plasmon resonance (using a BIAcore 2000 instrument) was employed to characterise the interaction of αCP1-KH1, αCP1-KH2 and αCP1-KH3 with RNA and DNA sequences of interest. A research-grade chip coated with streptavidin was purchased from Biacore. The target 5′-biotinylated RNA (5′-CUCUCCUUUCUUUUUCUUCUUCCCUCCCUA-3′), representing nt 3296–3325 of the androgen receptor mRNA, was obtained from Dharmacon, and the corresponding DNA sequence was purchased from Geneworks. The RNA and DNA were captured on the second and third flow cells, respectively. The first flow cell, coated with streptavidin only, was used as the reference surface. The streptavidin chip was first washed with three injections of 1 M NaCl/50 mM NaOH at a flow rate of 50 μl/min to remove excess streptavidin from the surface of the chip. The chip was then equilibrated with running buffer for 30 min before immobilisation of the oligonucleotides. Immobilisation was carried out at a flow rate of 10 μl/min in Biacore running buffer (10 mM Tris-HCl (pH 7.4), 150 mM NaCl, 0.5% Triton X-100, 2 mM dithiothreitol, 2 mM EDTA, 125 μg/mL tRNA and 62.5 μg/mL bovine serum albumin). An average of 30 RU of RNA and of DNA was immobilised on flow cells 2 and 3, respectively. The αCP1-KH domains were injected over flow cells 1, 2 and 3 at concentrations of 0.625, 1.25, 2.5, 5 and 10 μM (from lowest to highest) at a flow rate of 50 μl/min. All experiments were performed at least in duplicate to assess the reproducibility of the signal. The surface was regenerated by removing bound protein with a 2 min wash of 2 M NaCl at 20 μl/min. The data were analysed with the BIAevaluation software to obtain binding constants using steady-state and 1:1 binding models.
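Two quantities usually checked in this kind of Biacore experiment, the theoretical maximum response of the capture surface and the steady-state affinity, both follow from the 1:1 model Req = Rmax·C/(KD + C). The sketch below illustrates that arithmetic; it is not the BIAevaluation fitting algorithm. It assumes the 30 RU capture level stated above, a ligand mass of roughly 10 kDa for the biotinylated 30-nt RNA (an approximation), the expected αCP1-KH1 mass from Chapter 3, and noise-free synthetic responses generated with an arbitrary KD of 2 μM purely to demonstrate the fit. These are not experimental results.

```python
import math
from typing import List, Sequence, Tuple


def theoretical_rmax(analyte_mw: float, ligand_mw: float, ligand_ru: float,
                     stoichiometry: float = 1.0) -> float:
    """Maximum response if every captured ligand binds `stoichiometry` analytes."""
    return analyte_mw / ligand_mw * ligand_ru * stoichiometry


def steady_state_1to1(conc: float, kd: float, rmax: float) -> float:
    """Equilibrium response for a 1:1 interaction."""
    return rmax * conc / (kd + conc)


def fit_steady_state(concs: Sequence[float], responses: Sequence[float],
                     kd_lo: float = 1e-3, kd_hi: float = 1e3,
                     n_grid: int = 4001) -> Tuple[float, float]:
    """Least-squares fit of Req = Rmax*C/(KD + C).

    For a fixed KD the model is linear in Rmax, so the best Rmax has a closed
    form; KD itself is found by scanning a log-spaced grid (same units as concs).
    """
    best = None
    lo, hi = math.log10(kd_lo), math.log10(kd_hi)
    for i in range(n_grid):
        kd = 10 ** (lo + (hi - lo) * i / (n_grid - 1))
        f = [c / (kd + c) for c in concs]
        rmax = sum(r * fi for r, fi in zip(responses, f)) / sum(fi * fi for fi in f)
        sse = sum((r - rmax * fi) ** 2 for r, fi in zip(responses, f))
        if best is None or sse < best[0]:
            best = (sse, kd, rmax)
    return best[1], best[2]


rmax_theory = theoretical_rmax(analyte_mw=8690, ligand_mw=10000, ligand_ru=30)
concs_um: List[float] = [0.625, 1.25, 2.5, 5.0, 10.0]   # injection series used above
synthetic = [steady_state_1to1(c, kd=2.0, rmax=rmax_theory) for c in concs_um]
kd_fit, rmax_fit = fit_steady_state(concs_um, synthetic)
print(round(rmax_theory, 2), round(kd_fit, 2), round(rmax_fit, 1))
```

In general, a steady-state fit is well constrained only when the concentration series extends well above KD; with a top concentration of 10 μM, KD values approaching or exceeding 10 μM would be poorly defined.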

A number of other 5′-biotinylated RNA and DNA sequences, listed below, were also tested for interaction with αCP1, αCP1-KH1, αCP1-KH2 and αCP1-KH3 using the same procedure.

5’AAA AAA AAA A 3’

5’CCC CCC CCC C 3’

5’UUU UUU UUU U 3’

5’GGG GGG GGG G 3’

5’AAA AAA CCC A 3’

5’AAA AAA TTT A 3’

5’AAA AAA TCC A 3’

5’AAA AAA CTC A 3’

5’AAA AAA CCT A 3’

2.9.3 Preparation of RNA transcript for REMSA (This was prepared by Dr Andrew Barker)

A 51-nucleotide RNA corresponding to nucleotides 3275–3325 of the AR mRNA 3′ untranslated region (UTR) was generated by in vitro transcription of the relevant DNA cloned into pBLUESCRIPT II KS+. The pBLUESCRIPT vector alone was used as a control. Linearised plasmid DNA was transcribed in a 20 μl reaction volume with 20 U T7 RNA polymerase (Promega, Madison, WI, USA) at 37°C for 60 min in the presence of 100 μCi [32P]UTP (Amersham Pharmacia Biotech., Chalfont, UK), 2.5 mM each of ATP, CTP and GTP (Amersham Pharmacia Biotech.) and 20 mM DTT. One unit of RNase-free DNase I (Promega) was then added and the mixture incubated for 10 min at 37°C, followed by 5 min at 65°C. Loading dye [12 μl of 95% formamide, 20 mM EDTA, 0.3% bromophenol blue and xylene cyanol (wt/vol)] was added to the reaction mixture, which was heated to 80°C for 3 min before resolving the labelled RNAs on a 7 M urea/6% polyacrylamide gel pre-electrophoresed in 1× 90 mM Tris borate/0.2 mM EDTA at 200 V for 20 min. Radiolabelled transcripts were visualised by a 2 min exposure of the gel to X-ray film. The full-length transcript was excised and eluted from the gel slice by shaking at 1,500 rpm for 4 h at 22°C in sterile 0.5 M ammonium acetate and 1 mM EDTA. The RNA transcripts were recovered by ethanol precipitation, giving a specific activity of 1–4 × 10¹⁰ cpm/μg RNA.

2.9.4 REMSA

REMSA (RNA electrophoretic mobility shift assay) is an alternative technique for monitoring RNA-protein associations in vitro. It takes advantage of the decrease in the electrophoretic mobility of RNA when complexed with protein. Purified full-length αCP1, αCP1-KH1, αCP1-KH2 or αCP1-KH3 was thawed on ice. A 1 μl aliquot of the protein (100 ng/μl) was transferred to a vial on ice containing cytoplasmic extraction buffer [10 mM N-(2-hydroxyethyl)piperazine-N′-ethanesulfonic acid (HEPES) pH 7.5, 3 mM MgCl2, 14 mM KCl, 5% glycerol, 0.2% Nonidet P-40, 1 mM DTT and a cocktail of protease inhibitors (0.5 mM PMSF, 0.2 mM leupeptin, 0.2 mM aprotinin)]. A 1 μl aliquot of transfer RNA (2 μg/mL) was added to prevent nonspecific binding, and the mixture was made up to a final volume of 10 μl with 1 μl of 32P-labelled RNA (10⁴ cpm). The mixtures were incubated on ice for 30 min; immediately after incubation, 1 μl of loading dye was added and mixed gently. The reaction mixtures were loaded on a 6% nondenaturing acrylamide gel and run at 125 V in the cold room for 2 h. The gel was dried in a gel drier for 20 min at 80°C and then exposed to a phosphorimager plate overnight to detect protein-RNA complexes (Appendix B).
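The relative amounts of probe and protein in each 10 μl REMSA reaction can be estimated from the figures above. The sketch assumes the lower specific activity quoted in section 2.9.3 (1 × 10¹⁰ cpm/μg, which gives the largest RNA amount), an approximate molecular weight of 51 × 330 g/mol for the 51-nt transcript (the 330 g/mol per nucleotide average is an assumption) and the 8690 Da expected mass of αCP1-KH1. The helper names are hypothetical.

```python
def rna_mass_g(cpm: float, specific_activity_cpm_per_ug: float) -> float:
    """Mass of labelled RNA (g) represented by a given number of counts."""
    return cpm / specific_activity_cpm_per_ug * 1e-6   # ug -> g


def molar_in_reaction(mass_g: float, mw_g_per_mol: float, volume_l: float) -> float:
    """Molar concentration of a component in the reaction volume."""
    return mass_g / mw_g_per_mol / volume_l


volume_l = 10e-6                                           # 10 ul reaction
rna_m = molar_in_reaction(rna_mass_g(1e4, 1e10), 51 * 330.0, volume_l)
kh1_m = molar_in_reaction(100e-9, 8690.0, volume_l)        # 100 ng aCP1-KH1
print(f"RNA ~{rna_m * 1e12:.1f} pM, protein ~{kh1_m * 1e6:.2f} uM, "
      f"ratio ~{kh1_m / rna_m:.0e}")
```

Under these assumptions the reaction contains roughly 6 pM probe and about 1 μM αCP1-KH1, so the protein is in a very large molar excess over the labelled RNA.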

2.10 Nuclear magnetic resonance (NMR)

2.10.1 Sample preparation

The 15N-labelled αCP1-KH1 and αCP1-KH3 were prepared at 200–500 μM protein in 10 mM sodium phosphate buffer pH 6.0, containing 100 mM NaCl, 2 mM EDTA, 2 mM DTT and 10% D2O. These samples were used to collect 1H–15N heteronuclear single quantum coherence (HSQC) spectra of apo-αCP1-KH1 and apo-αCP1-KH3. Aliquots of RNA were then added in a stepwise manner to the 15N-αCP1-KH1 and 15N-αCP1-KH3. Desired ratios of protein and RNA were combined in volumes of 3–4 mL such that, upon concentration to 500 μl, the protein concentration was 300 μM and the RNA concentration was 75 μM (a 4:1 protein:RNA ratio). The final protein to RNA ratio reached was approximately 1:1. The solutions were concentrated to a final volume of 500 μl using 1 kDa molecular weight cut-off concentrators, and D2O was added to 10% in all samples. The titrations were monitored by collecting 1H–15N HSQC spectra at each titration point.

2.10.2 NMR

(These experiments were conducted with the help of Dr Jackie Wilce and Dr Corrine Porter.) Two-dimensional 1H–15N HSQC spectra were acquired at 25°C on Bruker Avance 600 MHz NMR spectrometers, using adapted versions of the published pulse sequences (thank you to Paul Gooley and Lindsy Bryne for use of facilities at Monash University and UWA, respectively). Water suppression was achieved by replacing the final 90° pulse with a WATERGATE sequence. Relaxation effects were minimised by setting the evolution period for 15N–1H one-bond couplings to 2.3 ms, slightly shorter than 1/(4JNH). 15N decoupling during acquisition was achieved using the GARP decoupling scheme. The 1H carrier frequency was set to that of the water resonance, and the 15N carrier frequency was set between the Arg side-chain N and backbone amide resonances. Time-proportional phase incrementation was used to achieve quadrature detection over spectral widths of 12.25 ppm (7.3 kHz) in F2 and 40.0 ppm (2.4 kHz) in F1. A total of 32 scans per increment were collected over 256 t1 increments of 2,048 complex data points. Spectra were processed using XWINNMR or NMRPipe. Strip transformation of the regions between 6.3 and 9.8 ppm (F2) and 103.5 and 132.0 ppm (F1) and zero filling to 2,048 × 2,048 real data points were used to increase digital resolution, while spectral resolution was enhanced by apodisation with a Lorentz–Gauss function in both dimensions prior to Fourier transformation. After phase correction, baselines were corrected by subtracting a third-order polynomial fitted to the baseline (Wider, 2000; Wuthrich, 1990).
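The spectral widths and coupling delay quoted above are related by simple arithmetic, sketched below. The sketch assumes a 1H frequency of 600.13 MHz, the standard 15N/1H indirect-referencing ratio, and a one-bond 1H–15N coupling of 92 Hz; these are typical values rather than figures recorded in the text.

```python
XI_15N = 0.101329118  # 15N/1H frequency ratio used for indirect referencing (liquid NH3 scale)


def ppm_to_hz(width_ppm: float, observe_mhz: float) -> float:
    """1 ppm of a nucleus observed at f MHz spans f Hz."""
    return width_ppm * observe_mhz


def inept_delay_ms(j_hz: float) -> float:
    """Transfer delay 1/(4J) in milliseconds."""
    return 1000.0 / (4.0 * j_hz)


h1_mhz = 600.13            # assumed nominal 1H frequency of a 600 MHz magnet
n15_mhz = h1_mhz * XI_15N  # about 60.8 MHz
print(round(ppm_to_hz(12.25, h1_mhz)))  # 7352 Hz in F2 (quoted above as 7.3 kHz)
print(round(ppm_to_hz(40.0, n15_mhz)))  # 2432 Hz in F1 (quoted above as 2.4 kHz)
print(round(inept_delay_ms(92.0), 2))   # 2.72 ms for an assumed 1J(NH) of 92 Hz
```

The 2.3 ms delay used in the experiments is therefore somewhat shorter than the nominal 1/(4J) value, trading a little transfer efficiency for reduced relaxation losses.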



2.11 Structural Studies

2.11.1 Crystal growth for X-ray diffraction experiments (The X-ray generator was operated and maintained with the assistance of A/Prof Matthew Wilce and Mr Jason Schimidberger.)

Crystallization experiments were conducted using the hanging drop vapor diffusion method. In this method, a drop of 1 μl of protein and 1 μl of buffer containing a particular precipitant is placed on a glass cover slip, which is inverted over a well containing approximately 1 mL of the same precipitant solution and sealed with a ring of Vaseline. Because the precipitant concentration in the well is higher than in the drop, water evaporates from the drop toward the reservoir; as a result, the protein concentration in the drop slowly increases and crystals may form.

Crystallization trials used commercially designed screens: Hampton Research Crystal Screen I, the JB Screens (Jena Bioscience, Jena, Germany), the Sigma screen (Sigma) and the Natrix screen (Hampton Research). Each screen contains from 24 to about 100 conditions with different buffer types and pH values, precipitant types and concentrations, and additives.

The JB screens and Natrix screen (Appendix A) were prepared from laboratory stock solutions by Jamie Tan, Dr Andrew Barker and myself. All screen solutions were filtered through a 0.2 µm membrane and stored at 4°C. The composition of the JB screens can be found at http://www.jenabioscience.com. The compositions of the Hampton Research screens and the Sigma screen can be found at http://www.hamptonresearch.com and www.sigmaaldrich.com/Brands/Fluka_Riedel_Home/Bioscience/Peptide_Analysis/Crystallization.html, respectively.

These kits were used for initial large scale screening, which involved setting up crystal

trays with protein concentrations of approximately 5 mg/mL both at 20˚C and at 4˚C

using the vapor diffusion hanging drop method.



2.11.2 Crystallization of αCP1-KH3

Initially, αCP1-KH3 crystals grew in 2 μl hanging drops containing 1:1 mixtures of protein and reservoir solutions. The protein solution contained 5 mg/mL protein in 25 mM potassium phosphate pH 6.0, 1 mM DTT, 1 mM EDTA, 150 mM NaCl, and the reservoir solution was composed of 0.1 M sodium HEPES pH 7.5 and 1.5 M lithium sulfate (Hampton Crystal Screen reagent formulation number 16; Hampton Research, CA). These initial crystals diffracted poorly. A number of factors could be modified to optimise the growth and quality of the crystals, including the protein concentration, temperature, buffer pH, additives (divalent cations, glycerol, 2-methyl-2,4-pentanediol (MPD)), salt and precipitant concentration, and the protein:reservoir buffer ratio. For αCP1-KH3, changing the protein:reservoir buffer ratio to 2:1 at room temperature produced crystals within 2 days, with dimensions of ~0.3 × 0.2 × 0.02 mm and the outline of a rugby football.
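Why changing the protein:reservoir ratio matters can be seen from an idealised end point of vapour diffusion: if the precipitant is non-volatile and the drop shrinks until its precipitant concentration matches the reservoir, the final drop volume equals the volume of reservoir solution originally added. The sketch below applies this simplification to 5 mg/mL αCP1-KH3 drops (the 2 μl + 1 μl volumes for the 2:1 drop are hypothetical). It ignores the contribution of the protein buffer to vapour pressure and does not hold for volatile precipitants such as MPD; the function name is hypothetical.

```python
from typing import NamedTuple


class DropState(NamedTuple):
    initial_protein_mg_ml: float
    initial_precipitant_fraction: float  # relative to the reservoir concentration
    final_volume_ul: float
    final_protein_mg_ml: float


def vapour_diffusion_endpoint(protein_mg_ml: float, protein_ul: float,
                              reservoir_ul: float) -> DropState:
    """Idealised hanging-drop equilibrium (non-volatile precipitant only)."""
    total = protein_ul + reservoir_ul
    protein_mass = protein_mg_ml * protein_ul   # ug, since mg/mL equals ug/ul
    final_volume = reservoir_ul                 # precipitant back at reservoir strength
    return DropState(
        initial_protein_mg_ml=protein_mass / total,
        initial_precipitant_fraction=reservoir_ul / total,
        final_volume_ul=final_volume,
        final_protein_mg_ml=protein_mass / final_volume,
    )


print(vapour_diffusion_endpoint(5.0, 1.0, 1.0))  # 1:1 drop: ends near 5 mg/mL
print(vapour_diffusion_endpoint(5.0, 2.0, 1.0))  # 2:1 drop: ends near 10 mg/mL
```

In this simplified picture the 2:1 drop starts further from supersaturation (one third of the reservoir precipitant concentration) but finishes at twice the protein concentration of the 1:1 drop.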

2.11.3 Preparation of αCP1-KH1/DNA complex

The 11-nt DNA sequence (TTCCCTCCCTA) and αCP1-KH1 complexes were prepared by dissolving the lyophilised DNA in the protein solution to a final protein:DNA ratio of 1:1. The mixture was left on ice for 30 min prior to setting up crystal drops. Crystal trays were set up using Hampton Crystal Screen I, the Natrix screen and the Sigma screen. Crystals of αCP1-KH1/DNA were grown by vapor diffusion in 1 μl hanging drops containing 1:1 mixtures of protein and reservoir solutions. The complex solution contained 309 μM each of protein and DNA in 50 mM Tris-HCl pH 8.0, 1 mM DTT, 1 mM EDTA, 150 mM NaCl, and the reservoir solution was composed of 0.1 M sodium cacodylate pH 6.5, 0.2 M magnesium acetate and 30% MPD (Sigma Crystal Screen reagent formulation number 21; Hampton Research, California, USA). Crystals typically grew in eight weeks to dimensions of ~0.2 × 0.2 × 0.04 mm, with the outline of a diamond.

2.11.4 Preparation of other crystallisation experiments

Crystal trials were also conducted for the isolated αCP1-KH1 and αCP1-KH2 domains using both the Sigma and Hampton screens. Once promising conditions were identified, a number of narrow screens around those conditions were conducted. The optimisation screens were prepared from laboratory stocks made with double-deionised (DDI) water and analytical grade reagents; the stock solutions were filtered through a 0.2 μm filter and stored at 4˚C until required. Initial optimisation screens involved changing the precipitant concentration, the pH, or both. In general, the precipitant concentration was varied in 2% increments around the successful concentration and the pH in increments of 0.5 pH units; smaller increments were used only where necessary. Screening also involved changing the crystallization buffer and adding additives such as 5 to 10% MPD or glycerol. Additional screens involved changing the protein concentration or the ratio of reservoir buffer to protein in the drop.

In addition, both macro-seeding and streak seeding using a cat’s whisker were

attempted to produce crystals of suitable size and quality for diffraction experiments.

2.11.5 X-ray data collection

All diffraction experiments were carried out at 100 K. Crystals were mounted in nylon cryo-loops and, in each case, passed through reservoir buffer modified to include 15% glycerol as a cryoprotectant before being flash-frozen in a nitrogen stream. X-ray diffraction data were collected at The University of Western Australia using a Rigaku RU-200 rotating anode Cu Kα source (40 kV, 100 mA; Rigaku/MSC, TX, USA) equipped with Osmic mirrors (Osmic, MI, USA) and a Mar345 image plate detector (Mar Research, Hamburg, Germany). Crystals were cryo-cooled using a nitrogen cryostream (Oxford Cryosystems, Oxford, United Kingdom). Data were processed, including integration and scaling, by Matthew Wilce using DENZO and SCALEPACK (Otwinowski and Minor, 1997). Structure factor amplitudes were calculated using TRUNCATE (Collaborative, 1994).

2.11.6 Structure solution and refinement

Structures were solved by molecular replacement using AMORE (Collaborative, 1994).

Cycles of manual model building and refinement were carried out with REFMAC

(Collaborative, 1994).



2.12 Molecular dynamics simulations using NAMD

2.12.1 Modelling of αCP1-KH3 bound to poly (C) oligonucleotide

The αCP1-KH3 structure was superposed on the structure of Nova-2-KH3 bound to RNA (PDB accession code 1EC6) using LSQMAN. In this way, the coordinates of the oligonucleotide could be extracted and used to generate an 8-nt poly(C) RNA docked to the αCP1-KH3 structure (using the Insight II software package to change the bases to cytosine). The complex was subjected to molecular dynamics simulations using NAMD in a fully solvated box with an overall neutral charge (through the addition of randomly placed sodium ions). The complex was allowed to equilibrate over 10⁶ fs (1 ns) using the CHARMM27 force field at 310 K and 1 atm with periodic boundary conditions. This ensured that there were no steric clashes in the final model and allowed the full set of possible intermolecular interactions to be examined. The stereochemistry of the oligonucleotide and the formation of intermolecular hydrogen bonds during the simulation were recorded at picosecond intervals for analysis.
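A hydrogen-bond record of this kind depends on a geometric criterion. The sketch below shows one common form of such a test, a donor–acceptor distance cut-off plus a minimum donor–H···acceptor angle, together with an occupancy over frames. The 3.5 Å and 120° thresholds are assumptions, not the settings used in the original analysis, and in practice the coordinates would be read from the NAMD trajectory rather than typed in.

```python
import math
from typing import Iterable, Sequence

Vec = Sequence[float]


def angle_deg(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c in degrees, measured at point b."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def is_hbond(donor: Vec, hydrogen: Vec, acceptor: Vec,
             max_da: float = 3.5, min_dha: float = 120.0) -> bool:
    """Geometric H-bond test: D...A distance (angstrom) and D-H...A angle (degrees)."""
    return (math.dist(donor, acceptor) <= max_da
            and angle_deg(donor, hydrogen, acceptor) >= min_dha)


def occupancy(flags: Iterable[bool]) -> float:
    """Fraction of frames in which the H-bond is present."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


frames = [
    ((0, 0, 0), (1.0, 0, 0), (2.9, 0, 0)),    # linear, 2.9 A: counted
    ((0, 0, 0), (1.0, 0, 0), (1.0, 1.9, 0)),  # 90 degree D-H...A angle: rejected
    ((0, 0, 0), (1.0, 0, 0), (4.0, 0, 0)),    # 4.0 A apart: rejected
]
print(occupancy(is_hbond(*f) for f in frames))  # one of three frames
```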

Chapter 3 Protein Preparation


3.1 Chapter overview

We employed molecular biology techniques to overexpress full-length αCP1 as well as αCP1-KH1, KH2 and KH3, all as GST fusion proteins. To obtain sufficient amounts of protein for biophysical studies, the overexpression and purification of full-length αCP1 and the KH domains were optimized as described in Methods Section 2.3.2. The following sections detail the procedures used to optimize the quality and quantity of the final products.

In this chapter we aimed:

1) to obtain sufficient amounts of protein for biophysical studies using an E. coli overexpression system,

2) to purify protein using affinity chromatography and

3) to use these purified proteins in structural and mRNA binding studies, in order to

better characterize and understand the interactions between αCP1, as well as its

isolated KH domains, and the respective poly (C) target sites (Chapters 5 and 6).

3.2 αCP1-KH Domain Boundaries

We had already successfully cloned the individual αCP1-KH domains into the pGEX-6P-2

plasmid as part of my Honours project. The domain boundaries were based on structural

sequence alignment of the αCP1 fragments with the solved KH3 domain of Nova-2 protein.

These constructs were used for protein overexpression experiments in order to obtain

milligrams of pure protein for further biophysical studies. The regions cloned are illustrated

schematically below (Figure 3.1).



Figure 3.1: Schematic representation of the cloned αCP1-KH domain boundaries, amino acid sequence alignment and cartoon representation of secondary structures. (A) A schematic diagram of the domain structure of human αCP1 and the regions cloned for biophysical analysis. The domain boundaries were based on structural alignment with the Nova-2-KH3 domain. The numbers indicate the first and last amino acids of each KH domain. Note that the domain boundary of the cloned αCP1-KH2 was reduced from 97–182 to 97–150 for solubility reasons. (B) Amino acid sequence alignment of the cloned regions of the αCP1-KH domains. Conserved residues are highlighted in red and yellow. The GXXG loop and the variable loop are the regions that vary most between KH domains. (C) The αCP1-KH domain adopts a fold with a triple-stranded β-sheet held against a three-helix cluster in a βααββα configuration. The variable loop and the GXXG loop are colored purple and blue, respectively.


3.3 Protein Expression

To obtain sufficient protein for biophysical studies (i.e., full-length αCP1 and the GST-αCP1-KH1, KH2 and KH3 fragments), two to four liters of culture were used for bacterial overexpression (see Methods Section 2.3.2). Soluble GST-fusion protein was obtained after cell lysis, purified by affinity chromatography and cleaved from the GST moiety with PreScission protease. The protein obtained for each construct was typically ~90% pure, but further purification was required for biophysical studies. This was achieved using ion exchange or size-exclusion chromatography. The following sections describe the optimized purification protocol adapted for the αCP1 constructs.

3.3.1 αCP1 expression and purification

Recombinant αCP1 full-length protein was prepared as a GST fusion protein using BL21

(Codon plus) cells containing the pGEX-6P-2 plasmid. Reasonable levels of protein

overexpression (Figure 3.2) and solubility were obtained by growing the culture at 37°C

(see Methods Section 2.3.3). The cultures grew to a final optical density (A600) of 3.5, yielding a cell pellet of 5 g per liter of culture.

To obtain maximum levels of soluble protein, it was necessary to include 0.5% Triton X-100 in the lysis buffer (PBS). Furthermore, αCP1 was found to be very unstable. A degradation product was always present upon cleavage, regardless of the time between the purification step and storage at –80°C. Protein stored in 50% glycerol at –80°C not only contained more degradation product than freshly prepared protein, but also proved inactive after approximately 3 months. The increase in degradation product was apparent on SDS-PAGE analysis, and the inactivity of the protein was established by REMSA. Therefore, in an attempt to minimize this degradation, a cocktail of protease inhibitors, DTT and EDTA was always included in the lysis buffer (see Methods Section 2.4.2); the protease inhibitors were, however, excluded at the size-exclusion purification step. Even with these precautions, there was no significant reduction in the amount of the degradation product (Figure 3.2B).



A further step to minimize degradation was the addition of 5% glucose and 1% sodium azide to the lysis buffer. This significantly reduced the degradation product, to less than approximately 10% (Figure 3.2A, Lane 5).

Following cell lysis, the soluble fraction was subjected to glutathione agarose batch

purification. The fusion protein was then cleaved from the GST moiety using PreScission protease, and the released αCP1 was eluted from the column; approximately 90% of the fusion protein was cleaved.

Figure 3.2: Overexpression and size-exclusion chromatography of αCP1. (A) SDS-PAGE analysis of GST-αCP1 overexpression, glutathione agarose batch purification and PreScission cleavage. Lane MW: Molecular weight markers, Lane 1: Whole cell lysate before induction, Lane 2: Whole cell lysate after induction, Lane 3: insoluble fraction after cell lysis, Lane 4: soluble fraction after cell lysis, Lane 5: pure αCP1 after size-exclusion chromatography (with 5% glucose and 1% sodium azide included in the lysis buffer), Lane 6: cleaved protein after exposure to PreScission protease. (B) SDS-PAGE analysis of αCP1 after size-exclusion chromatography. Lane MW: Molecular weight markers. Lane 1: αCP1 and the accompanying degradation product.

αCP1 was separated from the contaminating GST, degradation product and any uncleaved fusion protein using either anion exchange or size-exclusion chromatography (Figure 3.3). Both methods gave similar levels of purity. Fractions containing αCP1 were pooled and dialysed into 50 mM Tris (pH 8.2), 150 mM NaCl, 2 mM DTT and 2 mM EDTA, the buffer used in subsequent REMSA studies (Appendix B). The molecular mass of the purified αCP1 could not be confirmed by MALDI-TOF mass spectrometry, as the purity level of the protein posed technical problems. The expected mass from sequence analysis and SDS-PAGE analysis is 37.5 kDa. Approximately 1 mL of purified αCP1 at 1 mg/mL was obtained from 4 L of culture.

Figure 3.3: Size-exclusion chromatography of αCP1. A typical size-exclusion chromatogram obtained during purification of αCP1. After PreScission protease cleavage, αCP1 was purified from GST, uncleaved fusion protein and any remaining bacterial contaminants by size-exclusion chromatography. αCP1 eluted at approximately 30 min; fractions were collected from the regions indicated by the arrows and analysed by SDS-PAGE (Figure 3.2A, lane 5).

3.3.2 αCP1-KH1 expression and purification

The αCP1-KH1 domain encoding residues 13-86 was expressed as a GST fusion protein in

BL21 (Codon plus) cells (see Methods Section 2.3.2). Overexpression at 37°C produced

high levels of soluble fusion protein (Figure 3.4A). Following cell lysis, the soluble

fraction was subjected to glutathione agarose batch purification. The αCP1-KH1 fusion protein was then cleaved from the GST moiety using PreScission protease, and the released αCP1-KH1 was eluted from the column; approximately 90% of the fusion protein was cleaved (Figure 3.4B).



Figure 3.4: Overexpression of GST-αCP1-KH1. (A) SDS-PAGE analysis of GST-αCP1-KH1 overexpression, glutathione agarose batch purification and PreScission cleavage. Lane 1: Whole cell lysate before induction, Lane 2: Whole cell lysate after induction, Lane 3: soluble fraction after cell lysis, Lane 4: insoluble fraction after cell lysis, Lane 5: GST-αCP1-KH1 bound to glutathione beads. (B) αCP1-KH1 cleaved from GST. Lane 1: Protein fraction and GST beads after elution from the glutathione agarose and PreScission cleavage. Lane 2: The supernatant from the PreScission cleavage sample. Lane MW: Molecular weight markers.

The αCP1-KH1 protein was separated from GST, GST-αCP1-KH1 and other contaminants using cation exchange chromatography. Size-exclusion chromatography, used as an alternative to cation exchange chromatography, gave similar purity levels (data not shown). In cation exchange chromatography, αCP1-KH1 eluted from the column at 25 min in a 0–0.6 M sodium chloride gradient in 50 mM HEPES (pH 7.00), 2 mM DTT and 2 mM EDTA (Figure 3.5A).


Figure 3.5: Cation exchange chromatography of αCP1-KH1 and SDS-PAGE analysis. After PreScission protease cleavage, αCP1-KH1 was also purified from GST, uncleaved fusion protein and any remaining bacterial contaminants using cation exchange chromatography. (A) A typical cation exchange chromatogram obtained during purification of αCP1-KH1. αCP1-KH1 eluted at approximately 25 min, at 40% salt concentration. Fractions were collected from the regions indicated with bold dashes (peak 2). GST and GST-KH1 eluted at 10 min in the flow-through, without binding to the column (peak 1). (B) SDS-PAGE analysis of fractions collected during cation exchange chromatography. Lane MW: Molecular weight markers. Lane 1: flow-through (unbound) fraction (peak 1). Lanes 2–4: αCP1-KH1 fractions collected from peak 2.

SDS-PAGE analysis (Figure 3.5B) showed that both GST and the GST fusion protein eluted in the unbound fraction (peak 1), while the αCP1-KH1 domain eluted as a single peak with broad leading and trailing edges (peak 2). Fractions collected from the sharpest part of peak 2 were pooled and dialysed into 50 mM Tris (pH 8.2), 150 mM NaCl, 2 mM DTT and 2 mM EDTA for use in both biophysical and structural studies. The broad edges of this peak were discarded, as they contained traces of contaminants not visible on SDS-PAGE (data not shown). Approximately 1 mL of protein at 5 mg/mL was obtained from a 4 L culture. The molecular mass of αCP1-KH1 was verified using MALDI-TOF mass spectrometry (measured MW 8681 Da; expected MW 8690 Da).
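The expected masses quoted in this chapter are sequence-based values, and the agreement with the MALDI-TOF measurements can be expressed as a percentage deviation. This Python sketch, offered as an illustration rather than the calculation actually used, sums standard average residue masses plus one water for an unmodified linear polypeptide (no disulfides, free termini). The construct sequences are not reproduced in this section, so only the deviation calculation is applied to the values from the text.

```python
# Average residue masses (Da): amino-acid average mass minus one water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVERAGE_MASS = 18.01528


def average_mass(sequence):
    """Average mass (Da) of an unmodified linear polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_AVERAGE_MASS


def percent_deviation(measured, expected):
    """Signed deviation of a measured mass from the expected mass, in percent."""
    return 100.0 * (measured - expected) / expected


# Measured and expected masses quoted in this chapter.
for name, measured, expected in [("αCP1-KH1", 8681, 8690),
                                 ("αCP1-KH2", 5947, 5944),
                                 ("αCP1-KH3", 8525, 8552)]:
    print(f"{name}: {percent_deviation(measured, expected):+.2f}%")
```

For the three domains this gives deviations of about -0.10%, +0.05% and -0.32% respectively.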


3.3.3 αCP1-KH2 plasmid preparation, overexpression and purification

The αCP1-KH2 domain (residues 97 to 182) was expressed in BL21 (Codon Plus) cells as a GST-fusion protein. Initial expression and solubility tests at 37°C showed that the fusion protein overexpressed well, with a band present in the induced whole cell lysate at the expected molecular weight (~32 kDa; Figure 3.6A). However, the protein was found almost exclusively in the insoluble fraction, even with 0.5% Triton X-100 in the lysis buffer (Figure 3.6A). In our laboratory, 0.5% sodium cholate has been found to improve the solubility of other poorly soluble proteins; however, adding 0.5% sodium cholate to the αCP1-KH2 lysis buffer did not improve its solubility.

Furthermore, neither varying the IPTG concentration (0.02 to 1 mM) nor lowering the post-induction temperature to 30°C (Figure 3.6B) or 23°C improved solubility.

Figure 3.6: SDS-PAGE analysis of GST-αCP1-KH2 overexpression at 37°C (A) and 30°C (B). (A) Lane 1: whole cell lysate before induction; Lane 2: whole cell lysate after induction; Lane 3: soluble fraction after cell lysis; Lane 4: insoluble fraction after cell lysis. (B) Lanes 1 and 2: whole cell lysate before induction; Lanes 3 and 4: whole cell lysate after induction; Lane 5: soluble fraction after cell lysis; Lane 6: insoluble fraction after cell lysis. Lane MW: molecular weight markers.

3.3.4 αCP1-KH2 sequence analysis

The amino acid sequences of the αCP1-KH domains are similar, with a number of conserved residues (Figure 3.1B). The poor solubility of αCP1-KH2 was therefore unexpected, especially as both αCP1-KH1 and αCP1-KH3 are highly soluble. Analysis of each αCP1-KH domain with the ProtParam sequence analysis tool on the ExPASy server (www.expasy.com) gave instability index values of 35.42 and 34.47 for αCP1-KH1 and αCP1-KH3 respectively, compared with 56 for αCP1-KH2. On the basis of these scores, the program predicted αCP1-KH1 and αCP1-KH3 to be stable and αCP1-KH2 to be unstable. The instability index is calculated by summing a weight for each dipeptide in the sequence, where the weights reflect how differently each dipeptide occurs in unstable and stable proteins; a value above 40 predicts that a protein is unstable. In an attempt to obtain a more stable αCP1-KH2 domain, approximately 30 residues were removed from the C terminus, corresponding to the third α-helix of the domain (Figure 3.1B). The truncated sequence gave an instability index of 34.47 and was thus predicted to be stable. Consequently, the domain boundary of αCP1-KH2 was revised to residues 97 to 150, instead of 97 to 180 (Figure 3.7).
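For reference, the instability index reported by ProtParam can be written as below, where L is the number of residues in the sequence and DIWV(x_i, x_{i+1}) is the tabulated dipeptide instability weight value for the dipeptide formed by residues i and i+1. The table of 400 dipeptide weights is not reproduced here.

```latex
II = \frac{10}{L} \sum_{i=1}^{L-1} \mathrm{DIWV}\left(x_i, x_{i+1}\right)
```

Because the index depends only on consecutive residue pairs, removing a C-terminal segment such as the third helix of αCP1-KH2 changes the score simply by deleting that segment's dipeptide terms and renormalising by the new length.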

Figure 3.7: The initial and truncated domain boundaries of αCP1-KH2. (A) The initial αCP1-KH2 domain boundary, based on structural sequence alignment, spans amino acid residues 97 to 180. The sequence highlighted in red is derived from the cloning vector. The truncated αCP1-KH2 domain boundary, based on sequence stability comparisons, spans residues 97 to 150 and is indicated with an arrow.

3.3.5 Plasmid preparation

Primers were designed to amplify the DNA encoding residues 97–150 of αCP1-KH2. PCR amplification using the pGEX-6P-2 αCP1 plasmid as template produced a single DNA fragment (Figure 3.8A). Comparison with the DNA standard showed that the αCP1-KH2 fragment was of the expected size and free of contaminating products. The PCR product was then sub-cloned into the BamHI and EcoRI restriction sites of the E. coli expression vector pGEX-6P-2 and transformed into XL1-Blue cells. Diagnostic restriction endonuclease digests of plasmid DNA isolated from several transformed colonies confirmed the presence of an insert of the correct size (Figure 3.8B), and the presence and integrity of the insert were confirmed by DNA sequencing.

Figure 3.8: Sub-cloning of αCP1-KH2. (A) A DNA fragment encoding residues 97–150 of αCP1 (~159 bp) was amplified from the pGEX-6P-2 αCP1 plasmid. The yield of DNA was optimised by varying the MgCl2 concentration. Lanes 1 and 2: 1.0 mM and 2.0 mM MgCl2, respectively. The location of the desired product is indicated by the arrow. (B) Diagnostic restriction endonuclease digests of plasmid DNA isolated from a transformed colony confirmed the presence of an insert of the correct size. Lane M: pGem markers; Lane 1: representative restriction digest of pGEX-6P-2 using BamHI and EcoRI; Lane 2: PCR product.

3.3.6 αCP1-KH2 expression and purification

The pGEX-6P-2/αCP1-KH2 plasmid was transformed into BL21 (Codon Plus) cells, and a small-scale expression and solubility trial was undertaken at 37°C. SDS-PAGE analysis showed that GST-αCP1-KH2 was successfully overexpressed, with a band appearing at the expected molecular weight of GST-αCP1-KH2 (~30 kDa) in the post-induction whole cell extract (Figure 3.9). In addition, using PBS (pH 7.4), 2 mM EDTA, 2 mM DTT, 0.5% Triton X-100 and 5 mM PMSF as the lysis buffer, visual inspection of the gel indicated that more than 50% of the protein partitioned into the soluble fraction (Figure 3.9).

Figure 3.9: Overexpression of GST-αCP1-KH2. (A) SDS-PAGE analysis of GST-αCP1-KH2 overexpression, glutathione agarose batch purification and PreScission cleavage. Lane MW: molecular weight markers; Lane 1: whole cell lysate before induction; Lane 2: whole cell lysate after induction; Lane 3: insoluble fraction after cell lysis; Lane 4: soluble fraction after cell lysis.

Following cell lysis, the soluble fraction was subjected to glutathione agarose batch purification, and αCP1-KH2 was cleaved from the GST moiety using PreScission protease. The αCP1-KH2 was then eluted from the column; approximately 90% of the fusion protein was cleaved (Figure 3.10A, lane 2).

The cleaved αCP1-KH2 was then further purified from GST and other contaminants by size-exclusion chromatography. The peak corresponding to αCP1-KH2 eluted at 40 min at a flow rate of 0.4 mL/min and was collected in 0.2 mL fractions (Figure 3.10B). The fractions corresponding to αCP1-KH2 were pooled and dialysed into 50 mM Tris (pH 8.2), 150 mM NaCl, 2 mM DTT and 2 mM EDTA. Protein at a concentration of approximately 0.3 mg/mL was obtained from a 4 L culture. The molecular mass of the purified protein was confirmed using MALDI-TOF mass spectrometry (measured MW 5947 Da; expected MW 5944 Da).

Figure 3.10: Purification of αCP1-KH2. (A) Cleavage of GST from GST-αCP1-KH2. Lane MW: molecular weight markers; Lane 1: cleaved, size-exclusion-purified αCP1-KH2; Lane 2: protein fraction and GST beads after elution from the glutathione agarose and PreScission cleavage. (B) Size-exclusion chromatography of αCP1-KH2. After PreScission protease cleavage, αCP1-KH2 was purified from GST, uncleaved fusion protein and any remaining bacterial contaminants using size-exclusion chromatography. A typical size-exclusion chromatogram obtained during purification of αCP1-KH2 is shown. αCP1-KH2 eluted at approximately 40 min (peak 2), and fractions were collected from the regions indicated with dashes. Peak 1, at ~20 min, corresponded to GST and uncleaved fusion protein.

3.3.7 αCP1-KH3 expression and purification

The αCP1-KH3 domain (residues 279–356) was expressed as a GST fusion protein in BL21 (Codon Plus) cells (Methods Section 2.3.3). Overexpression at 37°C produced high levels of soluble fusion protein (Figure 3.11A). Following cell lysis, the soluble fraction was subjected to glutathione agarose batch purification, and αCP1-KH3 was cleaved from the GST moiety using PreScission protease. The αCP1-KH3 was then eluted from the column; approximately 90% of the fusion protein was cleaved (Figure 3.11B).

Figure 3.11: SDS-PAGE analysis of GST-αCP1-KH3 overexpression. (A) Lanes 1 and 2: whole cell lysate after induction; Lane 3: whole cell lysate before induction. (B) Cleaved αCP1-KH3. Lane 1: protein fraction after elution from the glutathione agarose and PreScission cleavage; Lane MW: molecular weight markers.

The αCP1-KH3 domain was separated from GST, GST-αCP1-KH3 and other contaminants using size-exclusion chromatography. The peak corresponding to αCP1-KH3 (peak 2) eluted between 30 min and approximately 40 min at a flow rate of 0.4 mL/min and was collected in 0.2 mL fractions (Figure 3.12). The fractions from the sharpest part of the peak were pooled and dialysed into 50 mM Tris (pH 8.2), 150 mM NaCl, 2 mM DTT and 2 mM EDTA. The broad leading edge of the peak contained GST contaminant (data not shown) and was discarded. Protein at approximately 5 mg/mL was obtained from a 4 L culture. The molecular mass of the purified protein was confirmed using MALDI-TOF mass spectrometry (measured MW 8525 Da; expected MW 8552 Da). The flexible regions of the protein, especially the N- and C-terminal ends, are often not observed in crystal structures, and this flexibility may be the reason for the molecular weight discrepancy.
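Elution times from the size-exclusion runs can be converted into elution volumes, which do not depend on the flow rate. This Python sketch assumes a constant flow of 0.4 mL/min with time measured from injection; the column type and its calibration are not given here, so the volumes are not converted into molecular size estimates. Function names are illustrative.

```python
def elution_volume_ml(time_min, flow_ml_per_min):
    """Volume (mL) delivered since injection, assuming a constant flow rate."""
    return time_min * flow_ml_per_min


def fractions_spanned(start_min, end_min, flow_ml_per_min, fraction_ml):
    """Approximate number of fixed-volume fractions collected across a peak window."""
    window_ml = elution_volume_ml(end_min - start_min, flow_ml_per_min)
    return round(window_ml / fraction_ml)


flow, fraction = 0.4, 0.2  # mL/min and mL, from the text
print(f"αCP1-KH2 peak: {elution_volume_ml(40, flow):.1f} mL")
print(f"αCP1-KH3 peak: {elution_volume_ml(30, flow):.1f}-{elution_volume_ml(40, flow):.1f} mL, "
      f"~{fractions_spanned(30, 40, flow, fraction)} fractions")
```

On this basis αCP1-KH2 elutes at about 16 mL, and the αCP1-KH3 peak spans roughly 12–16 mL, or about 20 fractions of 0.2 mL.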


Figure 3.12: Size-exclusion chromatography of αCP1-KH3 and SDS-PAGE analysis. (A) After PreScission protease cleavage, αCP1-KH3 was purified from GST, uncleaved fusion protein and any remaining bacterial contaminants using size-exclusion chromatography. A typical size-exclusion chromatogram obtained during purification of αCP1-KH3 is shown; αCP1-KH3 eluted at approximately 35 to 40 min (peak 2), and peak 1 corresponds to GST and other contaminants. (B) SDS-PAGE analysis of αCP1-KH3. Lane 1: molecular weight markers; Lane 2: peak 2 from size-exclusion chromatography, corresponding to pure αCP1-KH3.

3.3.8 Combined αCP1-KH1/KH2 and αCP1-KH2/KH3 domains

Studying isolated αCP1-KH domains can provide valuable information on the role of each individual domain. However, their role in the context of the full-length protein cannot be fully understood from studies of the isolated KH domains alone. To examine the combined domains, we also successfully cloned αCP1-KH1/KH2 and αCP1-KH2/KH3. Both constructs expressed at high levels but, unfortunately, upon cell lysis the protein partitioned into the insoluble fraction (data not shown).

Primary sequence analysis of these combined domains using the ExPASy ProtParam tool predicted both constructs to be unstable, with instability indices of 47.72 and 45.75 for αCP1-KH1/KH2 and αCP1-KH2/KH3 respectively. However, removing ~30 amino acid residues from the C terminus of the αCP1-KH2 domain in these constructs gave instability indices of 34.18 and 32.13 for αCP1-KH1/KH2 and αCP1-KH2/KH3 respectively, predicting stable proteins.

Cloning of the truncated constructs was initially attempted only for αCP1-KH1/KH2, but the PCR amplification step was not successful. Optimization of the PCR reaction may lead to successful amplification in the future; however, owing to time constraints, cloning and expression of these constructs were not pursued further.

3.4 Circular dichroism and confirmation of correct recombinant protein folding

When a recombinant protein is produced, it is important to determine whether it is correctly folded. Circular dichroism (CD) spectropolarimetry is a useful biophysical technique for this purpose. The two informative regions of the CD spectrum are the far UV (below 250 nm), where the peptide bond contribution dominates, and the near UV (250–300 nm), where the aromatic side chains dominate. α-Helices and β-sheets are the two most common secondary structures in proteins. The α-helical spectrum is characterised by two negative bands at 208 and 222 nm and a positive band at 192 nm. The CD spectrum of a typical β-sheet has a negative band at 215 nm and a positive band near 198 nm.

The three αCP1-KH constructs were each subjected to CD spectropolarimetric analysis as described in Methods Section 2.7. Each domain produced a spectrum typical of a protein containing both α-helical and β-sheet secondary structure, as indicated by a negative peak in the 208 to 222 nm region (Figure 3.13A).

Figure 3.13: Circular dichroism spectra of KH1, KH2 and KH3 in 50 mM Tris (pH 8.0), 150 mM NaCl, 1 mM EDTA and 1 mM DTT. (A) Measured spectra, recorded at 20 °C with the following parameters: bandwidth, 1 nm; response, 2 s; sensitivity, standard; measurement range, 300–200 nm; data pitch, 0.2 nm; scanning speed, 50 nm/min; accumulation, 100 scans. The spectra show a negative peak at around 210 to 222 nm, indicative of a protein with α-helical and β-sheet secondary structure. (B) Circular dichroism spectra predicted using the K2D site: the αCP1-KH1 spectrum is based on the predicted α-helix and β-sheet percentages in the domain, and the αCP1-KH2 spectrum is based on the predicted α-helix and β-sheet percentages in the domain lacking its last helix. Note that the missing C-terminal helix of the αCP1-KH2 domain does not affect the fold of the protein, as both domains generate similar profiles.

The expected proportions of α-helix and β-sheet for the three αCP1-KH domains are given in Table 1. These were calculated by counting the residues forming α-helices or β-sheets and dividing by the total number of amino acid residues in the domain (Figure 3.1B).
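The counting described above can be sketched in Python, assuming the secondary structure is available as a per-residue string (H for α-helix, E for β-strand, any other character for neither, as in DSSP-style assignments). The 20-residue string used here is hypothetical and is not the αCP1-KH assignment of Figure 3.1B.

```python
def secondary_structure_fractions(ss):
    """Fractions of helix ('H') and strand ('E') positions in a per-residue
    secondary-structure string; any other character counts as neither."""
    if not ss:
        raise ValueError("empty secondary-structure string")
    n = len(ss)
    return ss.count("H") / n, ss.count("E") / n


# Hypothetical 20-residue assignment, for illustration only.
helix, sheet = secondary_structure_fractions("--EEEE--HHHHHHH--EE-")
print(f"alpha-helix {helix:.2f}, beta-sheet {sheet:.2f}")
```

For the example string (7 helix and 6 strand positions out of 20) this gives 0.35 and 0.30, the same kind of fraction reported in Table 1.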

The α-helix and β-sheet proportions were used to generate a predicted CD spectrum for a correctly folded αCP1-KH domain using the K2D site (http://www.embl-heidelberg.de/~andrade/k2d/). This was done to assess the fold of αCP1-KH2, because the αCP1-KH2 construct that yielded soluble protein lacks the third helix (Figure 3.1B) present in a complete αCP1-KH domain. The predicted spectra are shown in Figure 3.13B. The two spectra have the same shape but differ in intensity, indicating that αCP1-KH2, although missing a helix, is folded correctly.

αCP1-KH domain   α-helix (fraction)   β-sheet (fraction)
αCP1-KH1         0.49                 0.28
αCP1-KH2         0.32                 0.36
αCP1-KH3         0.45                 0.26

Table 1: The expected proportions of α-helix and β-sheet in the three αCP1-KH domains, expressed as fractions of the total number of residues in each domain.

3.5 Conclusions

Recombinant proteins are widely used in the laboratory to study biological problems. However, obtaining sufficient quantities of soluble protein is not always straightforward. In the current study, the main difficulties encountered were protein aggregation, leading to insolubility, and protein instability. The insolubility of αCP1-KH2 was overcome by truncating the domain. This was based on amino acid sequence analysis, which predicted that the third helix at the C terminus of the domain made the domain unstable. Upon removal of this helix and re-cloning of the domain, soluble protein was readily obtained. Other approaches were also tried to increase protein solubility, including overexpression at temperatures below the usual 37°C and variation of the IPTG concentration. Instability of full-length αCP1 was reduced to some extent by the addition of glucose and sodium azide. Sodium azide is often used in protein purification to prevent bacterial growth and protein degradation. N-terminal sequencing of αCP1 was carried out to identify the degradation product and indicated that the degradation site may reside in the KH2 domain. This could be tested in future by mutating the appropriate residues. Successful overexpression and purification of the various constructs made possible the biophysical and functional characterization of αCP1 and its KH domains.

Chapter 4: Structural and NMR studies of αCP1-KH3

4.1 Chapter overview

After the successful cloning, expression and purification of the αCP1-KH domains, we aimed to use milligram quantities of pure protein in biophysical studies, beginning with structural methods. In this chapter, I used X-ray crystallography and NMR to examine the structure and dynamics of αCP1-KH3 with its target oligonucleotide.

In this way, I aimed to:

1) characterize the structural features underlying the poly (C) binding specificity of αCP1-KH3;
2) visualize the three-dimensional (3-D) structure of αCP1-KH3 and gain insight into how RNA and DNA targets fit into the protein binding site;
3) use molecular dynamics simulation to compare the RNA binding specificity of αCP1-KH3, using the structure of the Nova-2-KH3/RNA complex as a comparative model; and
4) better characterize the solution properties of the αCP1-KH3 domain using NMR spectra recorded in the absence and presence of oligonucleotide.

4.2 Why Crystallography?

The 3-D structure of a molecule can greatly assist our understanding of its biological function at the molecular level. Structures can reveal how molecules associate and interact, and how enzymatic reactions occur. Importantly, molecular structures can be exploited in the development of new therapeutic agents and drugs. Both the fold and the atomic detail of a molecule can be obtained from structural data. A major advantage of crystallography is that molecules of essentially any size can be studied, unlike NMR, which is generally limited to molecules smaller than ~100 kDa and is most readily applied to molecules smaller than ~30 kDa. However, crystallography requires a crystal of good diffraction quality, and obtaining one can be challenging. In addition, a crystal structure provides only a snapshot of the molecule. Combining data from NMR and X-ray studies can therefore be highly informative about the core regions of the molecule.

4.2.1 Crystallography

Visible light has wavelengths of hundreds of nanometres, whereas interatomic distances are of the order of 0.1 nm (1 Å). X-rays have wavelengths in this range. Unfortunately, an X-ray microscope capable of imaging atoms cannot be built because, unlike visible light, X-rays cannot be focused by lenses to form such an image. Instead, a crystal is exposed to a narrow X-ray beam; the atoms scatter the X-rays, producing a diffraction pattern from which structural information is obtained (George and Lyle, 1989).

Figure 4.1: Crystals are used to diffract X-rays, resulting in a diffraction pattern. The diffraction pattern is processed using computer programs to solve the three-dimensional structure of the protein.

4.2.2 Crystals and the unit cell

The first and most important step in X-ray crystallography is growing high-quality crystals. Crystals are ordered three-dimensional arrays of identical repeating unit cells. The unit cell is the smallest repeating unit of the crystal, and a repeated array of unit cells makes up the complete crystal. This repetition is central to X-ray diffraction: the scattering from a single unit cell is far too weak to be measured, but the many ordered unit cells amplify the diffracted signal, which can then be used for data analysis. A unit cell is described by three edge lengths, a, b and c, and three angles, α, β and γ. Within the unit cell, the positions of the atoms are given by their x, y, z coordinates (George and Lyle, 1989).
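The six cell parameters also define the volume of the unit cell through the standard general (triclinic) formula V = abc√(1 − cos²α − cos²β − cos²γ + 2 cosα cosβ cosγ), which reduces to abc when all three angles are 90°. A short Python sketch follows; the cell dimensions in the example are hypothetical.

```python
import math


def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a general (triclinic) unit cell; edges in Å, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    term = 1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg
    if term <= 0.0:
        raise ValueError("these angles do not describe a valid unit cell")
    return a * b * c * math.sqrt(term)


# Hypothetical orthorhombic cell, for illustration only.
print(f"{unit_cell_volume(40.0, 50.0, 60.0, 90.0, 90.0, 90.0):.0f} Å^3")
```

For the example cell this prints 120000 Å^3; for a hexagonal cell (γ = 120°) the same function gives abc·sin120°.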



4.2.3 Crystal Growth

Protein crystallization involves controlled precipitation of the protein sample. However, precipitation does not always give crystals, because it usually does not involve the regular arrangement of protein molecules into a crystal lattice. Unfortunately, there is no way of confidently predicting whether a protein will crystallize; the process is essentially one of trial and error. Nevertheless, a number of parameters need to be considered. Anything likely to denature the protein, for example extremes of pH or very low salt concentrations, and any other conditions known to disrupt a complex or cause aggregation, must be avoided (Nick, 1970).

The solubility of a protein depends strongly on the ionic strength of the solution, which in turn depends on the concentration and nature of the salt. Protein solubility is also strongly influenced by pH: proteins are least soluble near their isoelectric point (pI), where their net charge is zero and electrostatic repulsion between molecules is minimal. At low ionic strength, adding salt increases protein solubility through favourable interactions with charged residues such as Arg, Lys, Asp and His; this process is known as "salting in". Reversing this process can therefore promote protein precipitation. At high ionic strength, further increases in salt concentration decrease solubility ("salting out"), essentially through competition for water. Surface charges are also screened, reducing intermolecular repulsion. Many proteins are crystallized by this method (Alan and Phylis, 1960).
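The salting-out behaviour described above is often summarized by the empirical Cohn relation below, in which S is the protein solubility, I is the ionic strength (with c_j and z_j the concentration and charge of ion j), β is a constant for a given protein at fixed pH and temperature, and K_s is a salting-out constant that depends on both the protein and the salt. The relation describes only the high-salt regime, not the salting-in region.

```latex
\log_{10} S = \beta - K_s I, \qquad I = \tfrac{1}{2} \sum_j c_j z_j^{2}
```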

A number of compounds can also be added to the crystallization solution to alter protein solubility and induce crystallization. Polyethylene glycol (PEG) is widely used as a precipitant; it precipitates proteins through excluded-volume effects and competition for water. PEGs of molecular weight 400–20,000 are often used, typically at 10–20% (w/v). Organic solvents such as ethanol, isopropanol, tert-butanol and 2-methyl-2,4-pentanediol (MPD) precipitate proteins by lowering the dielectric constant of the solution and reducing ionic shielding. Temperature is also a determinant of protein solubility, and crystallization behaviour often differs markedly between, for example, 22°C and 4°C.

Crystallization can take anywhere from a day to months or even years. For small, simple molecules such as salts, crystals are often obtained easily by slowly altering the solution conditions. Protein crystals, however, are often very hard to obtain because of the larger size and complexity of proteins. Methods for growing protein crystals include vapor diffusion (hanging drop), batch crystallization, microbatch crystallization, microseeding and macroseeding. I used vapor diffusion, which is probably the most common method. A drop of protein solution is suspended over a reservoir containing buffer and precipitant. Water diffuses from the drop to the reservoir; as the drop loses water, the protein becomes supersaturated, crystal nuclei form and crystals grow. Typically, hundreds or thousands of conditions are screened before one that yields high-quality crystals is found, and many protein samples never crystallise successfully. Imperfections in the crystal, caused for example by impurities in the protein sample, can hinder the acquisition of high-resolution data.
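The concentrating effect of vapor diffusion can be estimated with a simple idealized model. This Python sketch assumes that the drop is prepared by mixing protein solution with reservoir solution, that only water leaves the drop, and that the reservoir is large enough for its precipitant concentration to stay constant; the drop then shrinks until its precipitant concentration matches that of the reservoir. The function name, volumes and concentrations are hypothetical.

```python
def hanging_drop(protein_mg_per_ml, protein_ul, reservoir_ul):
    """Idealized hanging-drop equilibration.

    Returns (initial protein mg/mL, final drop volume in uL, final protein mg/mL).
    Assumes only water leaves the drop and the reservoir concentration is constant.
    """
    drop_ul = protein_ul + reservoir_ul
    protein_ug = protein_mg_per_ml * protein_ul  # mg/mL x uL = ug
    initial_mg_per_ml = protein_ug / drop_ul
    # Precipitant in the drop starts diluted by reservoir_ul / drop_ul, so it
    # matches the reservoir again once the drop has shrunk to reservoir_ul.
    final_ul = reservoir_ul
    return initial_mg_per_ml, final_ul, protein_ug / final_ul


print(hanging_drop(10.0, 1.0, 1.0))
```

For a 1 μL + 1 μL drop of 10 mg/mL protein, the protein is first diluted to 5 mg/mL and, at equilibrium, the drop returns to about 1 μL at 10 mg/mL. Real drops only approach this limit, and volatile precipitants behave differently.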

Because proteins are crystallized on such a small scale, it is sometimes difficult to reproduce conditions accurately. This makes protein crystallization almost more of an art than a science, and several methods are sometimes tried before crystals of the required size are grown. Crystals can form in many different shapes, from perfect diamonds to sharp needles.

Figure 4.2: (A) Schematic of hanging drop vapour diffusion. This method involves suspending a drop containing a mixture of protein solution and precipitant solution over a well containing precipitant solution. The cover slip is sealed over the well with a ring of vaseline. Volatile components (e.g. water, alcohols, ammonia, acetate) evaporate from the drop towards the reservoir because the precipitant concentration in the well is higher; as a result, the protein concentration in the drop increases and crystals may slowly form. (B) Crystal growth pathway: nucleation and crystal growth occur beyond the saturation point. In a hanging drop experiment, nucleation first produces a few crystal nuclei. Once these nuclei have formed, the protein concentration drops into the crystal growth region, where crystals grow. The time spent in the nucleation region is very important: too much time may result in precipitation or in too many small crystals, while too little or no time will result in no crystals.

4.2.4 X-ray Diffraction

Once a crystal has been grown, it is exposed to a narrow beam of X-rays. Before X-ray diffraction analysis, crystals are often cryocooled with liquid nitrogen. Cryocooling reduces radiation damage to the crystal during data collection and decreases thermal motion within the crystal, giving better diffraction limits and higher quality data. The X-rays are diffracted by the electron clouds of the atoms in the crystal. The diffraction pattern contains information on how the protein molecules are arranged inside the crystal and on the structure of each molecule; this information is extracted from the directions and intensities of the scattered rays. The diffraction data are then converted into an electron density map by Fourier transformation. These maps show contour lines of electron density and, since electrons surround atoms, they reveal where the atoms are located. To obtain three-dimensional information, the crystal is rotated while exposed to X-rays, and a detector records a two-dimensional diffraction image at each angle of rotation. The third dimension comes from relating each image to the rotation angle at which it was recorded, and computer programs use this to generate three-dimensional spatial coordinates.

The location of each spot in the diffraction pattern is determined by the size and shape of the unit cell and by the symmetry of the crystal. The intensity of each diffraction spot is proportional to the square of the structure factor amplitude. Each structure factor is a wave with both an amplitude and a phase, but only the amplitudes can be derived from the measured intensities; the phases are not recorded in the diffraction pattern. Because the phases are needed to calculate an interpretable electron density map, they must be determined by other means. This is known as the phase problem. Several methods can be used to obtain phase information. One is molecular replacement, which can be used if the structure of a related (homologous) protein is known. The related structure is used as a search model, and molecular replacement identifies the orientation and position of the protein of interest within the unit cell. The phases determined in this way are then used to generate an electron density map into which an initial model can be built.
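The phase problem can be illustrated with a one-dimensional toy model in Python. A hypothetical density sampled at eight points is transformed into structure factors using the crystallographic sign convention, F(h) = Σ ρ(x) exp(2πihx/N). The amplitudes (the quantities recoverable from measured intensities, I ∝ |F|²) are then recombined by Fourier synthesis either with the true phases or with all phases set to zero. This is a teaching sketch, not a crystallographic program.

```python
import cmath
import math


def structure_factors(density):
    """1-D analogue of F(h) = sum_x rho(x) exp(+2*pi*i*h*x/N)."""
    n = len(density)
    return [sum(rho * cmath.exp(2j * math.pi * h * x / n) for x, rho in enumerate(density))
            for h in range(n)]


def density_from(amplitudes, phases):
    """Fourier synthesis rho(x) = (1/N) sum_h |F(h)| exp(i*phi(h)) exp(-2*pi*i*h*x/N)."""
    n = len(amplitudes)
    return [sum(amp * cmath.exp(1j * phi) * cmath.exp(-2j * math.pi * h * x / n)
                for h, (amp, phi) in enumerate(zip(amplitudes, phases))).real / n
            for x in range(n)]


# Hypothetical 1-D "electron density" sampled at 8 points across a unit cell.
rho = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]
F = structure_factors(rho)
amplitudes = [abs(f) for f in F]           # recoverable from intensities |F|^2
true_phases = [cmath.phase(f) for f in F]  # not recorded in the experiment

with_phases = density_from(amplitudes, true_phases)
without_phases = density_from(amplitudes, [0.0] * len(F))  # naive guess: all zero

print("with true phases:", [round(v, 2) for v in with_phases])
print("phases set to 0: ", [round(v, 2) for v in without_phases])
```

With the true phases the original density is recovered; the same amplitudes with zero phases give a different, symmetric density. This is why phases must be obtained separately, for example by molecular replacement.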

The electron density is a blurred representation of where the atoms are inside the protein. Since the amino acid sequence of the protein is known, the density can be interpreted using 3-D computer graphics programs such as O. The model is built in stages: first the backbone, defining the overall fold of the protein, and then the amino acid side chains, producing an atomic model. The model is then refined to generate improved Cartesian coordinates and B factors for the atoms; B factors relate to the thermal motion of each atom. Refinement is repeated a number of times to obtain the best fit to the diffraction data, with each cycle aimed at producing a more accurate electron density map, and the model is revised until there is close agreement between the diffraction data and the model. The standard crystallographic R-factor is a measure of the quality of the atomic model: it is the average fractional error in the calculated structure factor amplitudes compared with the observed amplitudes. Although a number of other factors are involved, a good structure typically has an R value in the range of 15 to 25%.
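The R-factor described above is conventionally defined as R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, summed over all reflections. This Python sketch assumes observed and calculated amplitudes that are already on a common scale and listed in the same reflection order; the amplitudes in the example are hypothetical. The same formula applied to a small set of reflections excluded from refinement gives R_free.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes matched, already-scaled amplitudes for the same reflections, in order.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need equal-length, non-empty amplitude lists")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)


# Hypothetical amplitudes for four reflections, for illustration only.
obs = [120.0, 80.0, 45.0, 210.0]
calc = [110.0, 86.0, 40.0, 200.0]
print(f"R = {100 * r_factor(obs, calc):.1f}%")
```

For these values the summed absolute difference is 31 against a summed observed amplitude of 455, giving R of about 6.8%.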



4.3 What is NMR?

NMR is another technique used for solving molecular structures. NMR also permits the observation of protein flexibility and of the dynamics of protein interactions with other molecules. NMR experiments detect signals from atomic nuclei rather than from electrons. In the magnetic field of the NMR spectrometer, nuclei with spin behave as small magnets that align with or against the field of the larger magnet. These small magnets have a resonance frequency that can be detected after they are perturbed by radio waves at that frequency. The radio waves cause the nuclear magnetic moments to precess ("wobble") coherently and shift their overall magnetic moment out of alignment with the main magnetic field of the spectrometer, giving rise to the NMR signal. Each nucleus produces a signal at a frequency that depends on its unique electronic environment, allowing individual signals representing particular nuclei to be observed.
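The resonance frequency referred to above scales linearly with the magnetic field, ν = (γ/2π)B0, where γ is the gyromagnetic ratio of the nucleus (the Larmor relation). This Python sketch uses approximate values of γ/2π for 1H, 13C and 15N and an example field of 14.1 T, an assumption corresponding to a spectrometer usually described as 600 MHz.

```python
# Approximate gyromagnetic ratios divided by 2*pi, in MHz per tesla.
GAMMA_MHZ_PER_T = {"1H": 42.577, "13C": 10.708, "15N": -4.316}


def larmor_frequency_mhz(nucleus, field_tesla):
    """Resonance (Larmor) frequency nu = (gamma / 2*pi) * B0, in MHz.

    The sign only indicates the sense of precession (15N has a negative gamma).
    """
    return GAMMA_MHZ_PER_T[nucleus] * field_tesla


for nucleus in ("1H", "13C", "15N"):
    print(nucleus, round(abs(larmor_frequency_mhz(nucleus, 14.1)), 1), "MHz")
```

At 14.1 T this gives about 600.3 MHz for 1H, 151.0 MHz for 13C and 60.9 MHz for 15N, which is why spectrometers are named by their proton frequency.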

4.3.1 Protein NMR

Protein NMR spectroscopy is carried out on purified aqueous samples of the protein of interest, typically ~300 to 500 μL of protein at a concentration of 0.1 to 0.3 mM. In the current study, the protein was recombinantly expressed; recombinant protein is often easier to obtain in sufficient quantities and, in addition, can be isotopically labelled.
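These sample requirements translate directly into an amount of protein. This Python sketch multiplies volume, molar concentration and molecular weight; the example uses the expected αCP1-KH3 mass from Chapter 3 (8552 Da, unlabelled) and the upper end of the quoted range (500 μL at 0.3 mM), an illustrative choice rather than the sample actually used.

```python
def protein_mass_mg(volume_ul, concentration_mm, molecular_weight_da):
    """Mass of protein (mg) in a sample: volume x molar concentration x MW."""
    moles = (volume_ul * 1e-6) * (concentration_mm * 1e-3)  # L x mol/L
    return moles * molecular_weight_da * 1e3                 # g -> mg


print(f"{protein_mass_mg(500, 0.3, 8552):.2f} mg")
```

This gives about 1.28 mg of protein for one such sample, a useful check against the purification yields reported in Chapter 3.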

In protein molecules, the most abundant isotopes of carbon and oxygen are 12C and 16O respectively. These are not useful for NMR because they have no nuclear spin, the physical property necessary for the NMR signal. Although 14N, the most abundant isotope of nitrogen, does have nuclear spin, it has a large quadrupolar moment, which prevents high-resolution information from being obtained. As a consequence, for proteins prepared from natural sources, NMR data can be obtained essentially only from their protons. However, recombinant proteins can be isotopically labelled with the less abundant isotopes 13C and 15N, which are preferred for NMR experiments. To achieve this, the protein is expressed in medium containing 13C-glucose and 15N-ammonium chloride.

Protein structural studies involve the use of multidimensional NMR experiments. Each

distinct nucleus in the molecule ideally is in a distinct environment and hence gives rise to a

distinct NMR signal. The multidimensionality of the experiment allows correlations between pairs of connected nuclei to be detected. Cross peaks in a protein spectrum arise either from transfer of magnetization through chemical bonds (COSY-type experiments) or through space, independently of the bonding structure (NOESY-type experiments). The latter type of experiment has provided the traditional method of obtaining structural information from NMR. A single multidimensional NMR experiment on a protein sample may take several hours or even days, depending on the concentration of the sample, the magnetic field strength of the spectrometer and the type of experiment (Shuker et al., 1996).
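NOESY spectra carry structural information because the cross-peak intensity falls off steeply with interproton distance, approximately as r^-6 for an isolated pair of protons. The minimal sketch below assumes this isolated spin-pair approximation and a reference proton pair of known separation (both assumptions; spin diffusion and internal motion are ignored) and converts a cross-peak volume into an estimated distance. The function name and numbers are hypothetical.

```python
def noe_distance(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Estimate an interproton distance (Å) from a NOESY cross-peak volume.

    Isolated spin-pair approximation: NOE is proportional to r**-6, so
    r = r_ref * (I_ref / I) ** (1/6), calibrated on a pair of known distance.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Hypothetical example: a reference pair at 2.5 Å gives a volume of 100;
# a cross peak 64 times weaker corresponds to twice the distance.
print(round(noe_distance(100 / 64, 100.0, 2.5), 2))  # 5.0
```

The sixth-power dependence is why NOE restraints are informative only over short distances (roughly 5 to 6 Å at most).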

In protein NMR spectroscopy a particularly useful spectrum is the HSQC, which stands for

Heteronuclear Single Quantum Correlation. One axis of the spectrum corresponds to 1H and the other to a heteronucleus, most often 13C or 15N. A peak arises for each 1H covalently attached to the heteronucleus. The 15N HSQC experiment is probably the most frequently performed experiment in protein NMR. Every amino acid residue except proline and the N-terminal residue has an amide proton attached to the nitrogen of its peptide bond. A correctly folded protein gives rise to well-dispersed peaks, most of which can be individually distinguished. The number of peaks should correspond to the number of residues minus the prolines and the N-terminal residue, plus the signals from amino acid side chains that contain nitrogen-bound protons. Assignment of the peaks to individual residues is not possible from a single HSQC spectrum; a number of other experiments are required, which will not be discussed here. However, the HSQC experiment is particularly useful for observing interactions with ligands, such as the binding of oligonucleotides to proteins. Upon interaction of the ligand with the protein, the signals corresponding to the protein may move, broaden or disappear. By comparing the HSQC spectrum of the free protein with that of the ligand-bound protein, peak perturbations can reveal the residues at the binding interface or those restructured upon binding. In the current study, this method was used to analyse the interactions of αCP1-KH domains with RNA.
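Peak perturbations are often quantified as a single weighted 1H/15N chemical shift change per residue. The sketch below shows one common form of this calculation; the 15N weighting factor of 0.14, the peak-list format and the function names are assumptions for illustration, not the procedure used in this study. It also assumes each peak in the bound spectrum can be matched to its free-state peak, which in practice requires the assignment experiments mentioned above.

```python
import math


def combined_shift(d_h: float, d_n: float, alpha: float = 0.14) -> float:
    """Weighted combined 1H/15N chemical shift perturbation (ppm).

    alpha scales the 15N change into the 1H range; 0.14 is a commonly
    used value, but it is an assumption here.
    """
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def perturbations(free: dict, bound: dict, alpha: float = 0.14) -> dict:
    """Compare peak lists {label: (H_ppm, N_ppm)} of free and bound protein.

    Peaks absent from the bound spectrum (disappeared or broadened beyond
    detection) are reported as None.
    """
    result = {}
    for label, (h0, n0) in free.items():
        if label not in bound:
            result[label] = None
            continue
        h1, n1 = bound[label]
        result[label] = combined_shift(h1 - h0, n1 - n0, alpha)
    return result


# Hypothetical, pre-assigned peak lists
free = {"res A": (8.00, 120.0), "res B": (7.50, 115.0)}
bound = {"res A": (8.30, 124.0)}  # "res B" has disappeared
print(perturbations(free, bound))
```

Reporting vanished peaks separately, rather than as zero, preserves the distinction between moved and broadened signals described in the text.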



4.4 Results

4.4.1 Crystallization of αCP1-KH3

Crystals of αCP1-KH3 were grown using vapor diffusion as described in the Methods

section 2.11.2. Crystals typically grew in 2 days to dimensions of ~ 0.3 × 0.2 × 0.02 mm

with the outline of a rugby football, and diffraction data were collected to 2.1 Å resolution

(Figure 4.3).

Figure 4.3: Crystals of αCP1-KH3. αCP1-KH3 crystals were grown in 0.1 M HEPES pH 7.5 and 1.5 M lithium sulfate by the hanging-drop vapour diffusion method.

4.4.2 X-ray data collection

Data were recorded with a Rigaku R-Axis V imaging plate detector as described in the

Methods section 2.11.5. Data were integrated and scaled with DENZO and SCALEPACK

(Otwinowski and Minor, 1997). Structure factor amplitudes were calculated using

TRUNCATE (Collaborative, 1994). The data collection statistics are given in Table 4.1.



Table 4.1: Data collection and refinement statistics

Data collection

Symmetry P21212

Unit cell (Å) a = 33.4

b = 71.0

c = 29.1

Measured reflections 14 721

Unique reflections 4310

Completeness (%) 98.1 (87.2)

Rmerge (%) 5.7 (25.8)

Wilson B (Å2) 28.2

Refinement

Resolution range (Å) 35.0–2.1

Rcryst (%) 21.4

Rfree (%) 25.4

r.m.s. deviation from ideal values

Bond length (Å) 0.012

Bond angle (°) 1.3

Average temperature factor (Å2) 26.1

Number of water molecules 55

Values in parentheses are for the last resolution shell (2.16–2.1 Å).

$R_{\mathrm{merge}} = \sum |I - \langle I \rangle| / \sum I$, where $I$ is the observed diffraction intensity and $\langle I \rangle$ is the average diffraction intensity from several measurements of one reflection.

$R_{\mathrm{cryst}} = \sum \big| |F_o| - |F_c| \big| / \sum |F_o|$, where $|F_o|$ and $|F_c|$ are the observed and calculated structure factor amplitudes, respectively.
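The two R values defined beneath Table 4.1 are straightforward sums. The Python sketch below computes them from plain lists; it assumes the calculated amplitudes are already on the same scale as the observed ones, and the function names are hypothetical. The same `r_factor` expression gives Rcryst when evaluated on the working reflections and Rfree when evaluated on the reflections set aside for cross-validation.

```python
from collections import defaultdict


def r_merge(observations):
    """R_merge = sum |I - <I>| / sum I over all measurements.

    observations: iterable of (hkl, intensity), where hkl identifies the
    unique (symmetry-reduced) reflection each measurement belongs to.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    num = den = 0.0
    for intensities in groups.values():
        mean = sum(intensities) / len(intensities)
        num += sum(abs(i - mean) for i in intensities)
        den += sum(intensities)
    return num / den


def r_factor(f_obs, f_calc):
    """R = sum ||Fo| - |Fc|| / sum |Fo| (Fc assumed already scaled to Fo)."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


# Hypothetical data
print(r_merge([((1, 0, 0), 100.0), ((1, 0, 0), 110.0), ((0, 1, 0), 50.0)]))
print(r_factor([100.0, 200.0], [90.0, 210.0]))
```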

4.4.3 Structure solution and refinement

The structure of αCP1-KH3 was solved by molecular replacement using the coordinates of the Nova-2 KH3 RNA-binding domain (accession no. 1EC6) as the search model, as implemented in AMoRe (Collaborative, 1994). With one molecule of αCP1-KH3 per asymmetric unit, the Matthews coefficient (VM) was calculated as 2.0 Å3/Da, corresponding to an estimated solvent content of 39%, which is within the normal range for protein crystals (Matthews, 1977). Molecular replacement was successful in space group P21212, which was also consistent with the observed systematic absences; molecular replacement in the other primitive orthorhombic space groups was not successful. Cycles of manual model building and refinement with REFMAC (Collaborative, 1994) were carried out, with 10% of the reflections set aside for calculation of Rfree. The final model, containing 74 amino acid residues of the αCP1-KH3 construct and 55 water molecules, has an Rcryst of 21.4% (Rfree = 25.4%), with 96% of all residues in the most favored region of the Ramachandran plot. All residues were visible in the electron density map except the N-terminal glycine and the eight C-terminal residues (SEKGMGCS) of the construct, whose presence in the protein was confirmed by mass spectrometry (measured MW 8525; expected MW 8552.72). The final model has been deposited in the Worldwide Protein Data Bank (accession no. 1WVN).
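The Matthews coefficient and solvent content quoted above can be reproduced from the unit cell in Table 4.1. The sketch below assumes four asymmetric units per cell (the number of general positions in P21212), one molecule per asymmetric unit, the expected construct mass of 8552.72 Da, and the standard relation V_solv = 1 − 1.23/V_M, which corresponds to a protein partial specific volume of about 0.74 cm3/g. The function name is hypothetical.

```python
def matthews(a, b, c, mw_da, n_asu, mol_per_asu=1):
    """Matthews coefficient V_M (Å^3/Da) and fractional solvent content.

    Valid for cells with all angles 90° (e.g. orthorhombic), where the
    cell volume is simply a * b * c.
    """
    cell_volume = a * b * c  # Å^3
    vm = cell_volume / (mw_da * n_asu * mol_per_asu)
    solvent = 1.0 - 1.23 / vm  # assumes partial specific volume ~0.74 cm^3/g
    return vm, solvent


# Table 4.1 cell, P21212 (4 asymmetric units per cell), one molecule per ASU
vm, solvent = matthews(33.4, 71.0, 29.1, mw_da=8552.72, n_asu=4)
print(f"V_M = {vm:.2f} Å^3/Da, solvent = {solvent:.0%}")  # V_M = 2.02 Å^3/Da, solvent = 39%
```

Trying two molecules per asymmetric unit instead would halve V_M to about 1.0 Å3/Da, below the range observed for protein crystals, which supports the single-molecule assignment.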

4.4.4 Structural overview

αCP1-KH3 adopts a classic type I KH domain fold (Grishin, 2001), with a triple-stranded β-sheet held against a three-helix cluster in a βααββα configuration (Figure 4.4A).

The β-sheet is anti-parallel and displays the usual left-handed twist. From its inner surface

emanate numerous hydrophobic residues, which contribute both to the hydrophobic core

and the oligonucleotide binding cleft. The bundle of three amphipathic helices provides the

complementary hydrophobic surface within this compact motif. The N-terminal four

residues in the model (PLGS) are not shown. These residues, which are not part of the

αCP1 sequence but present due to cloning procedures, adopt a random coil structure.



Figure 4.4: The crystal structure of αCP1-KH3 (residues 279–356) solved to 2.1 Å resolution, depicted in (A) cartoon form and (B) as a molecular surface in the same orientation. The structure is shown from the beginning of β-strand 1 to the end of α-helix 3, since the regions outside these bounds were random coil or not visible in the density. The GXXG motif, characteristic of this oligonucleotide-binding domain, is colored blue. The ‘variable loop’ region between β-strands 2 and 3 is colored pink. These regions border the hydrophobic oligonucleotide-binding cleft that accommodates C-rich RNA or ssDNA. (C) The electrostatic potential emanating from the αCP1-KH3 structure, calculated using the APBS software package (http://agave.wustl.edu/apbs/). Potential contours are shown at +1 kT/e (blue) and −1 kT/e (red) and were obtained by solving the linearized Poisson–Boltzmann equation at 150 mM ionic strength with a solute dielectric of 2 and a solvent dielectric of 78.5. The blue contour reveals a strong positive potential that would direct oligonucleotides to the binding cleft. (D) Stereo views of KH3.




β-strand 1 commences with the first native amino acid residue, Gln5. This strand extends

the length of the molecule and projects residues Leu10 and Ile12 into the hydrophobic core,

before breaking into a turn at Pro13. α-helix 1 is held in position through its hydrophobic

face (including Leu16, Ile17, Ile20 and Ile21) before its structure is interrupted by the

invariant GXXG sequence (Figure 4.4B, blue) that is essential to the KH domain

oligonucleotide-binding site. In the case of αCP1-KH3, where RQ fills the XX positions,

these side chains are projected outward and provide a hydrophobic edge of the

oligonucleotide-binding cleft. Numerous hydrophobic side chains also emanate from α-

helix 2 to form contacts with the inner face of the β-sheet, and to provide a hydrophobic

environment for oligonucleotide binding (Ile28, Ile31 and the aliphatic chain of Arg32).

Gly36 facilitates a break from helical secondary structure and the remaining two strands of

the β-sheet follow. They provide hydrophobic core residues Ile39, Ile41, Arg51, Val53 and

Ile55.

β-strands 2 and 3 are separated by the ‘variable loop’ (Figure 4.4B, purple), which bulges

slightly away from the β-sheet and forms the opposing edge of the narrow oligonucleotide-

binding cleft. This is the region of the greatest sequence variability between KH domains.

The C-terminal helix extends the length of the main body of the molecule with residues

Ile62, Leu68, Ile69, Arg72 and Leu73 projected into the hydrophobic core or towards

adjacent α-helix 2. α-helix 3 is not visible over the last six residues, due to high mobility.



4.4.5 The oligonucleotide-binding cleft

The oligonucleotide-binding site has long been thought to involve the GXXG motif. This has been confirmed by recent structural analyses of KH domains in the presence of oligonucleotide. These include Nova-2-KH3 in the presence of a 20-base RNA loop (Lewis HA, 1999), hnRNP K-KH3 solved with a 10-base stretch of ssDNA (Braddock et al., 2002a), the KH3/4 domains of FBP solved in the presence of a 29-base ssDNA (Braddock et al., 2002b), hnRNP K-KH3/DNA (Backe et al., 2005) and αCP2-KH1/DNA (Du et al., 2005). In each of these cases, the main oligonucleotide contacts are made with the narrow hydrophobic cleft that runs between α-helix 2 and β-strand 2 and across the GXXG motif. The narrowness of the cleft is thought to confer the specificity of these KH domains for pyrimidines. Likewise, αCP1-KH3 possesses a narrow hydrophobic cleft that would be expected to accommodate pyrimidine-rich RNA or ssDNA.

The edges of the cleft are polar, with basic side chains (Arg23, Arg32, Lys40 and Arg51) providing attractive electrostatic forces both for docking the oligonucleotide and for making specific contacts with it (see further discussion below). The electrostatic potential emanating from αCP1-KH3 was calculated using the Adaptive Poisson–Boltzmann Solver (APBS) software package (http://agave.wustl.edu/apbs/) (Baker et al., 2001; Bank and Holst, 2003; Holst, 2001; Holst and Saied, 1993; Holst and Saied, 1995) and is shown in Figure 4.4C. The contours represent a numerical solution to the Poisson–Boltzmann equation (Davis and McCammon, 1990; Honig and Nicholls, 1995) and model the total electrostatic potential of the molecule in aqueous salt solution. The outstanding feature of the calculation is the positive potential arising precisely from the oligonucleotide-binding cleft (blue contour). This positive potential would provide an attractive force for the approaching oligonucleotide, whose potential is dominated by its negatively charged phosphate backbone.
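The Figure 4.4 legend specifies a linearized Poisson–Boltzmann calculation at 150 mM ionic strength with a solvent dielectric of 78.5. In that approximation, mobile ions screen the potential, which decays roughly exponentially with distance on the scale of the Debye length. The Python sketch below computes this length for a 1:1 salt; the temperature of 25°C is an assumption, as the legend does not state it. At 150 mM it comes out at roughly 8 Å. It is a back-of-the-envelope check, not a substitute for the grid-based solution computed by APBS, and the function name is hypothetical.

```python
import math


def debye_length_angstrom(ionic_strength_m, eps_r=78.5, temp_k=298.15):
    """Debye screening length (Å) for a 1:1 electrolyte.

    kappa^-1 = sqrt(eps_r * eps0 * kB * T / (2 * NA * e^2 * I)), I in mol/m^3.
    """
    eps0 = 8.8541878128e-12  # vacuum permittivity, F/m
    kb = 1.380649e-23        # Boltzmann constant, J/K
    na = 6.02214076e23       # Avogadro constant, 1/mol
    e = 1.602176634e-19      # elementary charge, C
    i_si = ionic_strength_m * 1000.0  # mol/L -> mol/m^3
    length_m = math.sqrt(eps_r * eps0 * kb * temp_k / (2 * na * e ** 2 * i_si))
    return length_m * 1e10  # m -> Å


print(f"{debye_length_angstrom(0.150):.1f} Å")
```

Because the screening length scales as the inverse square root of ionic strength, lowering the salt concentration fourfold doubles the range over which the positive cleft potential is felt.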

4.4.6 Comparison with other KH domain structures

αCP1-KH3 shows high structural similarity to other type I KH domains. The seven most similar KH structures, including hnRNP K (Baber et al., 1999; Braddock et al., 2002a), Nova-2-KH3 and Nova-1-KH3 (Lewis HA, 1999; Lewis et al., 2000), FBP-KH3 and FBP-KH4 (Braddock et al., 2002b), vigilin-KH6 (Musco et al., 1996) and FMR-KH1 (Musco et al., 1997), are shown superimposed in Figure 4.5A (for NMR-derived structures, the first model in the PDB file is depicted). Their backbone traces are highly convergent, with pairwise root-mean-square deviations (RMSDs) from αCP1-KH3 over the matched regions (calculated with LSQMAN) of <1.8 Å. Vigilin-KH6 and FMR-KH1 show the greatest deviations, with several stretches of backbone unmatched to regions within αCP1-KH3 (>3.5 Å away). These include the variable loop and the region around the GXXG motif, which are also the least well-defined regions in the NMR-derived structures.
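The RMSD values above are computed after optimally superimposing matched α-carbon positions. The sketch below implements the standard Kabsch (SVD-based) superposition with NumPy for a fixed, one-to-one set of matched atoms. LSQMAN additionally iterates the residue matching and discards pairs beyond a distance cutoff, which this sketch does not attempt; the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    p_c = p - p.mean(axis=0)  # remove translation by centring both sets
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_fit = p_c @ rotation.T  # rotate p onto q
    return float(np.sqrt(np.mean(np.sum((p_fit - q_c) ** 2, axis=1))))
```

A rotated and translated copy of a coordinate set should give an RMSD of essentially zero, which is a convenient sanity check before comparing real structures.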

Figure 4.5B shows the deviations numerically, with α-carbon distances from matched

αCP1-KH3 residues plotted against the αCP1-KH3 residue number. The divergent regions

are shown as off-scale in this plot. The KH structures are superimposed with most α-carbon

atoms within 2 Å of the corresponding αCP1-KH3 atom. Apart from Vigilin-KH6 and FMR-KH1, greater deviations occur only at the termini and in the variable loop region between β-strands 2 and 3. A subtle variation also occurs at the GXXG motif, possibly reflecting the inherent flexibility of the glycines. It is remarkable that these KH domains retain such high structural similarity and yet possess distinct oligonucleotide-binding preferences.
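The plotting convention of Figure 4.5B, in which residues deviating by more than 3.5 Å or lacking a structural match are shown off-scale, can be expressed as follows. This sketch assumes the coordinates have already been superimposed and aligned residue by residue, with NaN rows marking unmatched residues, and plots off-scale residues at a hypothetical value of 5 Å; the representation and names are assumptions.

```python
import numpy as np

OFF_SCALE = 5.0  # hypothetical plotted value (Å) for divergent or unmatched residues


def per_residue_deviation(ref_ca, other_ca, cutoff=3.5):
    """Cα distance for each reference residue, with off-scale values flagged.

    ref_ca, other_ca: (N, 3) arrays of superimposed, residue-aligned Cα
    coordinates; NaN rows in other_ca mark residues with no match.
    """
    d = np.linalg.norm(np.asarray(ref_ca, float) - np.asarray(other_ca, float), axis=1)
    d = np.where(np.isnan(d), np.inf, d)  # unmatched residues count as divergent
    return np.where(d > cutoff, OFF_SCALE, d)
```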



Figure 4.5: Comparison of KH domain structures. (A) Backbone trace of αCP1-KH3 (grey) shown in stereo superimposed with those of other KH domain structures as listed. These include KH domain structures both in the absence and presence of bound oligonucleotide. (B) The α-carbon deviation for each KH domain residue from the corresponding aligned residue of αCP1-KH3 is plotted versus the αCP1-KH3 residue number. Amino acids >3.5 Å away, or with no corresponding aligned residue, are indicated with an off-scale score (>5 Å).

αCP1-KH3 shows the greatest structural similarity to its fellow poly (C) binding family

member, hnRNP K, with an RMSD of 0.63 Å. A structure-based sequence alignment of

these KH domains with the others serves to highlight the conservation of residues

reportedly underlying oligonucleotide binding (Figure 4.6). In particular, residues around the GXXG motif as well as those in β-strand 2 provide the main contact surface. Of these, Ile20, Ile21, Ile28 and Ile41 are highly conserved as bulky hydrophobic residues, and Gly18, Gly22 and Gly25 are integral to the oligonucleotide-binding motif. Basic residues

Arg23 and Arg51 have also been shown to be involved in the oligonucleotide-binding

interaction and basic residues are retained at these positions except in Vigilin-KH6 and

FMR-KH1.



Figure 4.6: Structure-based sequence alignment of seven KH domains of high structural similarity to αCP1-KH3. Each KH domain was structurally aligned against αCP1-KH3 using LSQMAN. Amino acid residues with α-carbon positions within 3.5 Å of a corresponding αCP1-KH3 residue are shown in black; residues that do not align well with αCP1-KH3 are highlighted in purple. Secondary structural elements, as defined in Lewis et al., are shown above the corresponding sequence in cartoon form. Parenthesized numbers give the first and last amino acids of the superimposed core region for each structure and indicate the extent of the structure used to calculate sequence identity with αCP1-KH3 (final column). The GXXG motif and the variable loop regions are shaded grey. Residues reported to contact the oligonucleotide in structures determined in complex with RNA or ssDNA are highlighted in red, and αCP1-KH3 residues predicted to contact the oligonucleotide in the current study are highlighted in tan. NMR structures were aligned on the basis of the first model in the deposited PDB coordinate file, which was deemed representative of the ensemble.

4.5 Model of αCP1-KH3 bound to poly (C) oligonucleotide

The high degree of similarity of αCP1-KH3 to Nova-2-KH3 has allowed its interaction with poly (C) RNA to be modeled. Nova-2-KH3 has been structurally characterized both in complex with a 20-base stem–loop RNA (Lewis et al., 2000) and in its uncomplexed form (Lewis HA, 1999). Oligonucleotide binding caused no significant change in its backbone conformation, suggesting that the αCP1-KH3 structure may also closely approximate its oligonucleotide-bound form.

Poly (C) RNA was therefore positioned in the binding cleft of αCP1-KH3 by analogy to

this structure to help predict interactions that may underlie its poly (C) binding specificity.



The poly (C) RNA is positioned along the hydrophobic cleft and across the GXXG motif

with four bases making most of the contacts with the binding site. The oligonucleotide is oriented with its sugar–phosphate backbone directed towards the helical edge of the cleft and its bases lying parallel to the protein surface, pointing towards the centre of the cleft and β-strand 2 (Figure 4.7).

Figure 4.7: Molecular surface of αCP1-KH3 showing the modeled position of poly (C) RNA (orange) based on the Nova-2-KH3–RNA structure (accession no. 1EC6). The poly (C) tetrad is viewed from above the GXXG and variable loops, highlighting their positions on either side of the hydrophobic binding cleft.

The possible electrostatic and hydrophobic contacts between αCP1-KH3 and RNA are

summarized in Figure 4.8A. These were determined with allowance for some molecular flexibility (assessed by molecular dynamics simulations using the CHARMM27 force field). They include non-specific hydrophobic interactions with Ile17, Gly18, Cys19, Ile21, Ile28 and Ile41, which form the surface of the binding cleft, as well as numerous electrostatic contacts to the sugar–phosphate backbone involving the backbone atoms of Gly22, Arg23, Gln24 and Gly25 (the GXXG tetrad), and a contact between the Lys40 side-chain amino group and the Cyt4 sugar hydroxyl. Interactions that may help to favour pyrimidine binding include the Arg32 and Arg51 guanidino groups positioned in close proximity to pyrimidine carbonyls (the C2 carbonyls of Cyt3 and Cyt2, respectively; Figure 4.8B).
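Contacts such as those listed above are typically identified by screening interatomic distances in the model. A minimal distance-only screen is sketched below; the 3.5 Å cutoff, the absence of any angular criterion, and the atom labels and coordinates in the usage example are assumptions for illustration, not the criteria used to build Figure 4.8.

```python
import math


def close_contacts(protein_atoms, rna_atoms, cutoff=3.5):
    """Return (protein atom, RNA atom, distance) for pairs within cutoff (Å).

    Atoms are (label, (x, y, z)) tuples; results are sorted by distance.
    Distance-only criterion: no hydrogen positions or angles are checked.
    """
    pairs = []
    for p_label, p_xyz in protein_atoms:
        for r_label, r_xyz in rna_atoms:
            d = math.dist(p_xyz, r_xyz)
            if d <= cutoff:
                pairs.append((p_label, r_label, round(d, 2)))
    return sorted(pairs, key=lambda pair: pair[2])


# Hypothetical coordinates, for illustration only
protein = [("Lys40 NZ", (0.0, 0.0, 0.0)), ("Ile28 CD1", (10.0, 0.0, 0.0))]
rna = [("Cyt4 O2'", (3.0, 0.0, 0.0))]
print(close_contacts(protein, rna))
```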



Interactions that could underlie cytosine specificity include potential hydrogen bonds

between Ile28 and Ile41 side chains and the central two cytosine bases (via their O2, N3

and N4 atoms). These isoleucines are conserved in hnRNP K and form an extensive

methyl–oxygen and methyl–nitrogen hydrogen bond network with the equivalent bases in

ssDNA (Braddock et al., 2002a). In addition, several water-mediated hydrogen bonds

between the protein and RNA occur fleetingly during the simulation. In particular, the Ile41 carbonyl oxygen alternates between hydrogen bonding to Cyt4 carbonyl and sugar hydroxyl groups, and may thus contribute to the preference for ribopyrimidine oligonucleotides.
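Fleeting, alternating hydrogen bonds of this kind are commonly summarized as an occupancy: the fraction of simulation frames in which a donor–acceptor distance falls below a cutoff. The sketch below computes this from a list of frames; the data layout, the 3.5 Å cutoff, the hypothetical partner labels and the function name are assumptions, not the analysis performed with CHARMM27 in this study.

```python
import math


def hbond_occupancy(frames, atom_a, atom_b, cutoff=3.5):
    """Fraction of frames in which two atoms lie within cutoff (Å).

    frames: list of dicts mapping atom labels to (x, y, z) coordinates.
    """
    if not frames:
        raise ValueError("no frames supplied")
    hits = sum(1 for f in frames if math.dist(f[atom_a], f[atom_b]) <= cutoff)
    return hits / len(frames)


# Hypothetical two-frame trajectory in which Ile41 O switches partners
frames = [
    {"Ile41 O": (0.0, 0.0, 0.0), "partner 1": (2.9, 0.0, 0.0), "partner 2": (4.5, 0.0, 0.0)},
    {"Ile41 O": (0.0, 0.0, 0.0), "partner 1": (4.2, 0.0, 0.0), "partner 2": (3.0, 0.0, 0.0)},
]
print(hbond_occupancy(frames, "Ile41 O", "partner 1"),
      hbond_occupancy(frames, "Ile41 O", "partner 2"))
```

Two partial occupancies that sum to roughly one, as in this toy example, are the numerical signature of the alternation described above.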

Figure 4.8: (A) Summary of potential interactions occurring between the modeled αCP1-KH3 and poly (C) RNA. The poly (C) RNA tetrad is represented schematically. Potential hydrogen bond interactions are indicated by dotted lines; those from specific residue atoms to the RNA backbone are listed on the right, and those to the cytosine bases are listed on the left. The red dotted lines represent intramolecular hydrogen bonds that may stabilize the RNA in its binding mode to the KH domain. Solid lines indicate hydrophobic or van der Waals contacts to the cytosine bases. (B) The positions of the Arg32 and Arg51 side chains are highlighted beneath the molecular surface of αCP1-KH3. Potential hydrogen bonds to the poly (C) RNA are shown as dotted lines.




4.5.1 Poly (C) RNA structure may favor binding

Many of the αCP1-KH3-oligonucleotide contacts would be predicted to occur upon either

RNA or ssDNA binding, such as the hydrophobic contacts listed above and electrostatic

interactions with Gly25, Arg51 and Lys40. Other contacts cannot occur in the case of ssDNA, owing to the absence of 2’-hydroxyl groups. These include potential hydrogen bonds between sugar hydroxyls and Gly25, Arg32, Arg51 and Lys40, as well as the water-mediated hydrogen bonds mentioned above.

Intramolecular hydrogen bonds involving the phosphates may also influence the RNA structure and its potential interactions with αCP1-KH3. The phosphates of nt 2 and 4 can hydrogen bond to the sugar hydroxyls of nt 2 and 3, respectively. The phosphates of nt 1 and 3, on the other hand, may hydrogen bond to the Cyt1 and Cyt4 amino groups. The former interactions are unique to RNA, and the latter are specific to cytosine. Thus, the uniquely stable conformation of RNA, and of poly (C) RNA in particular, in this binding cleft may favor binding to the KH domain.

αCP1-KH3 is reported to preferentially bind poly (C) RNA over other bases and over

ssDNA (Dejgaard and Leffers, 1996), though the ssDNA sequence is not clearly specified

in this study. The crystal structure of this domain confirms its adoption of the classical type I KH fold and has allowed a detailed model of its interactions with poly (C) RNA to be examined.

Specificity for pyrimidines can be understood in terms of the narrow binding cleft, which would readily accommodate only the smaller bases. Specificity for cytosine over uracil or thymine can also be rationalized on the basis of specific hydrogen bond interactions with the cytosine C2 carbonyl, N3 and N4 amino functionalities. Preferential binding to RNA over ssDNA would be explained in part by intermolecular hydrogen bonds to the sugar hydroxyls. It may also be that a poly (C) RNA oligonucleotide is able to conform closely to the binding cleft, with inter-nucleotide hydrogen bonds from sugar hydroxyls stabilizing this conformation. On the other hand, C-rich ssDNA has been shown to make very similar interactions with hnRNP K, and is reported to bind this closely related KH domain as well as, if not better than, RNA (Braddock et al., 2002a).



4.6 NMR Studies of αCP1-KH3

In addition to our mRNA binding measurements using REMSA and SPR (see chapter 4), we also employed NMR to observe the interaction of αCP1-KH3 with an 11-mer RNA, 5’-UUCCCUCCCUA-3’, which represents the αCP1 target site in the 3’UTR of androgen receptor (AR) mRNA, and to form a complex suitable for crystallization trials. The samples were prepared as described in Methods Section 2.10.

4.6.1 Formation of an αCP1-KH3/11-nucleotide RNA complex

Mutational analysis of the UC-rich 51-nucleotide sequence in the 3’UTR of AR mRNA has previously shown that binding of αCP1 or αCP2 to this region depends on the presence of the two cytosine triplets at the end of the sequence (Yeap et al., 2002). The last 11 nucleotides (5’-UUCCCUCCCUA-3’) were therefore prepared as the target for αCP1-KH3 binding, and 15N-labelled αCP1-KH3 was monitored by NMR spectroscopy to observe the effects of adding the oligonucleotide to the sample.
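Whether the protein is fully complexed at a given titration point depends on the dissociation constant and on the concentrations of both partners. For a simple 1:1 interaction, the bound fraction follows from the exact quadratic solution sketched below. The dissociation constants and concentrations in the example are hypothetical (0.2 mM protein lies within the 0.1 to 0.3 mM range quoted for NMR samples); no Kd was determined in these NMR experiments.

```python
import math


def fraction_bound(p_total, l_total, kd):
    """Fraction of protein in a 1:1 complex (all concentrations in M).

    Exact solution of Kd = [P][L]/[PL] with conservation of total protein
    and total ligand (no assumption that the ligand is in excess).
    """
    b = p_total + l_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0
    return complex_conc / p_total


# Hypothetical titration of 0.2 mM protein: tight (10 nM) vs weak (1 mM) binder
for ratio in (0.5, 1.0, 1.2):
    rna = ratio * 2e-4
    print(ratio, round(fraction_bound(2e-4, rna, 1e-8), 3),
          round(fraction_bound(2e-4, rna, 1e-3), 3))
```

For a tight binder the bound fraction tracks the RNA:protein ratio almost stoichiometrically, so a modest excess of RNA is enough to saturate the protein.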

The 15N-labelled αCP1-KH3 gave rise to a well-resolved HSQC spectrum (Figure 4.9A), with excellent dispersion in both the 1H and 15N dimensions and narrow line widths. The dispersion, particularly in the 1H dimension, indicates that the domain is likely to be correctly folded, and the narrow lines are consistent with the construct behaving as a monomer in solution. Seventy-nine single 1H–15N amide cross peaks are observed, as expected for this 83-residue construct containing three prolines: prolines lack an amide proton, and the N-terminal amine does not give rise to an amide cross peak. In addition, 12 pairs of 1H–15N cross peaks sharing a common 15N shift are distinguishable (between 108 and 114 ppm in the 15N dimension); these represent the side-chain NH2 groups of the six glutamine and six asparagine residues.
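The peak-counting argument used above can be written as a short function. The sketch below is a simplification: it counts backbone amides (excluding prolines and the N-terminal residue) and Asn/Gln side-chain NH2 pairs, and ignores Trp indole and Arg side-chain NH signals. The sequence in the usage example is hypothetical, built only to match the composition stated in the text (83 residues, three prolines, six Gln and six Asn); it is not the αCP1-KH3 construct sequence.

```python
def expected_hsqc_peaks(sequence: str) -> dict:
    """Predict 1H-15N HSQC peak counts from a one-letter protein sequence.

    Backbone: one amide NH per residue, excluding prolines (no amide H)
    and the N-terminal residue (its free amine is not observed).
    Side chains: each Asn/Gln NH2 gives a pair of peaks at one 15N shift.
    """
    seq = sequence.upper()
    backbone = sum(1 for aa in seq[1:] if aa != "P")
    nh2_groups = seq.count("N") + seq.count("Q")
    return {"backbone": backbone, "side_chain_pairs": nh2_groups,
            "side_chain_peaks": 2 * nh2_groups}


# Hypothetical composition-matched sequence (not the real construct)
seq = "G" + "P" * 3 + "N" * 6 + "Q" * 6 + "A" * 67
print(len(seq), expected_hsqc_peaks(seq))
```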



Figure 4.9: The 1H–15N heteronuclear single quantum correlation spectra recorded at 25°C for 15N-labelled αCP1-KH3 before and after the addition of the 11-nucleotide RNA of sequence 5’-UUCCCUCCCUA-3’. (A) The uncomplexed spectrum and (B) the final titration point, with αCP1-KH3 fully complexed with RNA. The cross peaks in both spectra account for all of the expected resonances in the protein, have narrow linewidths and are well dispersed. The movement of almost half of the peaks upon complex formation with RNA is consistent with a tight protein/RNA binding interaction.

Upon the addition of RNA to the 15N-labelled αCP1-KH3 sample, the positions of the

crosspeaks changed, reflecting altered electronic environments for many backbone NH

groups in the protein. The final titration point is shown in Figure 4.9B. The 1H–15N HSQC spectrum remains well dispersed and has 79 well-resolved cross peaks. However, at least 35 of the cross peaks representing backbone NH correlations have changed position. The presence of a single set of cross peaks demonstrates that the protein is fully complexed with the RNA (i.e. there is no evidence of heterogeneity) and that the complex retains good solution characteristics, with no evidence of aggregation or the formation of larger complexes. In addition, the cross-peak movement shows that almost half of the backbone NH groups experience an altered electronic environment upon interaction with RNA, more than would be expected to lie directly at the protein/RNA

interface. This is unsurprising considering the long-range electrostatic effects that would be

expected to arise from the oligonucleotide’s phosphate backbone. Spectra acquired at intermediate stages of the titration showed no evidence of gradual movement of the cross peaks from their starting to finishing positions. Such gradual movement would be typical of a weak interaction in the fast-exchange regime, in which the observed chemical shifts are population-weighted averages of the free and bound states. Instead, the peaks disappeared and reappeared in new positions, indicating slow exchange on the NMR timescale, consistent with a tight binding interaction.
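The distinction drawn here between fast and slow exchange depends on how the exchange rate compares with the frequency separation between the free and bound signals. The sketch below encodes this rule of thumb; the factor-of-10 boundaries, the function names and the example numbers are assumptions for illustration, and no exchange rate was measured in this study. The frequency separation must be computed with the Larmor frequency of the nucleus concerned (for example, about 61 MHz for 15N on a 600 MHz spectrometer).

```python
import math


def exchange_regime(delta_ppm, larmor_mhz, kex_per_s):
    """Classify chemical exchange relative to the free/bound shift difference.

    delta_ppm: shift difference (ppm) for the observed nucleus;
    larmor_mhz: that nucleus's Larmor frequency (MHz), so ppm x MHz = Hz.
    The factor-of-10 boundaries are a rough rule of thumb (assumption).
    """
    delta_omega = 2 * math.pi * delta_ppm * larmor_mhz  # rad/s
    if kex_per_s < delta_omega / 10:
        return "slow"  # separate free and bound peaks
    if kex_per_s > 10 * delta_omega:
        return "fast"  # one population-averaged peak that moves
    return "intermediate"  # broadened, often vanishing peaks


def fast_exchange_shift(delta_free, delta_bound, fraction_bound):
    """Population-weighted average shift observed in fast exchange."""
    return (1 - fraction_bound) * delta_free + fraction_bound * delta_bound


# Hypothetical 0.5 ppm 1H shift change on a 600 MHz spectrometer
print(exchange_regime(0.5, 600.0, 10.0))  # slow
```

Because a tightly bound complex dissociates slowly, a small exchange rate, and hence the disappearance and reappearance of peaks, is the expected outcome for high-affinity binding.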

The resulting αCP1-KH3/RNA sample was therefore considered for crystallization trials: the complex appeared homogeneous, the interaction appeared to be tight, and the complex retained good solution properties. The complex was subjected to crystallization trials using Hampton Screen 1 and the Natrix screen (Appendix A). Unfortunately, no crystals were obtained.



4.7 Conclusions

This study has shown that oligonucleotide binding by αCP1-KH3 is likely to involve

extensive interactions with only four bases. The question remains as to how adjacent KH domains are arranged when full-length αCP1 binds to RNA. It may be that the KH domains are able to bind in relatively close proximity. Indeed, two adjacent KH domains (KH3 and KH4 of FBP) were shown to contact stretches of 6 and 7 bases, respectively, with only 5 bases in between (Braddock et al., 2002b). In addition, the consensus binding sequence for the αCP-2KL isoform involves three C-rich stretches (of 3–5 bases) separated by A/U stretches of 2–6 bases (Thisted et al., 2001). Thus, αCP binding may well involve participation by all three KH domains.

NMR spectroscopy was also used to confirm the ability of αCP1-KH3 to bind an AR mRNA sequence and to demonstrate its complete complex formation with the 11-nucleotide sequence at the final titration point. It also showed that almost half of the αCP1-KH3 backbone NH resonances are affected by RNA binding; assignment of the cross peaks would be required to determine which residues these are. The current studies show that this protein and this protein/RNA complex would be highly amenable to further NMR studies. This solution study also showed that the protein/RNA complex remained stable in solution over several weeks, including periods at 25°C. Future efforts will therefore focus on maximizing the purity of the sample, since insufficient purity is a common reason for the failure of crystals to form.

The work conducted in this chapter represents the beginning of a structural and biophysical

examination of all three KH domains of αCP1. I was able to achieve the goals outlined at the beginning of the chapter, and these studies provide a solid foundation for the remaining chapters

of this thesis. Understanding the basis for RNA-binding affinity and specificity of the three

KH domains will allow us to predict the occurrence of αCP1 interactions with mRNA and

better understand the multi-KH domain binding complex. Structural studies of αCP1-

KH3/AR mRNA will be a step towards describing the multiprotein AR mRNA complex

that influences its stability and possibly its translational efficiency in vivo. Structural insight into such a complex may pave the way for the development of novel therapeutics aimed at disrupting the complex, which could destabilize AR mRNA and reduce AR levels in prostate cancer cells.

Chapter 5 Structural and NMR studies of αCP1-KH1

Chapter 5: Structural and NMR studies of αCP1-KH1/DNA


5.1 Chapter overview

Having solved and analysed the structure of the isolated αCP1-KH3 domain as detailed

in the previous chapter, the next challenge was to investigate the structural features of

an αCP1 KH domain in the presence of RNA or DNA target probes. To do this, we again used X-ray crystallography and NMR to examine the structure and dynamics of αCP1-KH1 with its target oligonucleotide, a specific UC-rich region of the 3’UTR of AR mRNA. In addition, possible cooperative binding of αCP1-KH1 and the RRM domains 1 and 2 of HuR was analysed. αCP1 and HuR are considered part of the post-transcriptional control mechanism for AR expression. αCP1 has been shown to bind a specific 5’-CCCUCCC-3’ motif immediately adjacent to a U-rich sequence (AR mRNA nt 3275 to 3325), which is the target for HuR binding.

In this chapter I aimed to:

1) obtain the 3-D structure of the αCP1-KH1 domain with the target DNA or RNA sequence,

2) visualize the 3-D structure of αCP1-KH1/DNA and develop insight into the nature of the interactions between the DNA sequence (5’-TTCCCTCCCTA-3’) and the binding site of the protein,

3) characterize the solution properties of αCP1-KH1 using NMR spectra, both in the absence and presence of a 20-mer RNA sequence 5’-CUUUCUUUUUCUUCUUCCCU-3’, and

4) determine whether NMR could be used to detect interactions of HuR RRM 1

and 2 with αCP1-KH1 bound to an RNA sequence with both a C-rich and U-

rich site.



5.2 Crystallization of αCP1-KH1/DNA

Crystals of αCP1-KH1/DNA were grown using vapor diffusion in 1 μl hanging drops

containing 1:1 mixtures of protein and reservoir solutions as described in Methods

Section 2.11.3. Crystals typically grew in two months to dimensions of ~0.2 × 0.2 × 0.04 mm with the outline of a diamond (Figure 5.1), and diffraction data were collected to 3 Å resolution. Crystallization trials were also conducted with other screens and in the presence of heavy metals; however, these did not yield better-diffracting crystals.

Figure 5.1: Crystals of αCP1-KH1/DNA. αCP1-KH1/DNA crystals were grown in 0.1 M sodium cacodylate pH 6.5, 0.2 M magnesium acetate, 30% MPD by the hanging-drop vapour diffusion method.

5.2.1 Structure determination of αCP1-KH1/DNA

The DNA sequence 5’-TTCCCTCCCTA-3’, analogous to nucleotides 3315–3325 of AR mRNA, plus αCP1-KH1 (residues 14–86 preceded by the sequence GPLGSPGI present due to cloning procedures) yielded crystals containing two crystallographically independent copies of a 2:1 protein–DNA complex in the asymmetric unit. Equivalent crystallisation experiments utilising RNA did not produce crystals suitable for structure determination. Phases were obtained by molecular replacement using coordinates from the Nova-2-KH3 structure (PDB code 1EC6) with the oligonucleotide removed. The current refinement model has a working R factor of 24.7% and a free R value of 30.7% at 3.0 Å resolution (Table 5.1), with good stereochemistry (95% of residues in the allowed regions of a Ramachandran plot).

Table 5.1: Data collection and refinement statistics

Data collection

Symmetry P21

Unit cell (Å) a = 45.6

b = 76.8

c = 61.4

Measured reflections 381063

Unique reflections 7993

Completeness (%) 99.8 (99.9)

Rmerge(%) 5.7 (65.8)

Wilson B (Å2) 84.2

Refinement

Resolution range (Å) 30.0 – 3.0

Rcryst (%) 24.7

Rfree (%) 30.7

r.m.s. deviation from ideal values

Bond length (Å) 0.010

Bond angle (°) 1.7

Average temperature factor (Å 2) 85.0

Number of water molecules 0

Values in parentheses are for the last resolution shell (3.11–3.0 Å).

$R_{\mathrm{merge}} = \sum |I - \langle I \rangle| / \sum I$, where $I$ is the observed diffraction intensity and $\langle I \rangle$ is the average diffraction intensity from several measurements of one reflection.

$R_{\mathrm{cryst}} = \sum \big| |F_o| - |F_c| \big| / \sum |F_o|$, where $|F_o|$ and $|F_c|$ are the observed and calculated structure factor amplitudes, respectively.



5.2.2 Structural Overview

The αCP1-KH1/DNA structure solved to 3.0 Å resolution reveals two αCP1-KH1

domains bound at adjacent cytosine triads. The positions of eight out of eleven

nucleotides could clearly be seen in the electron density. Four KH1 monomers (named

A-D) occur within the asymmetric unit, existing as dimers as previously observed for

other KH domain structures (Figure 5.2) (Lewis et al., 2000; Sidiqi et al., 2005b).

The αCP1-KH1/DNA complex forms with a 2:1 stoichiometry: monomer A (and B) within the asymmetric unit is clasped to the 5' end of the oligonucleotide, and monomer D (and C) is bound at the 3' end. Although the two KH domains bound to the same oligonucleotide are held very close together, they do not contact one another. As in a recent structure of hnRNP K-KH3 bound to a DNA 15-mer (Backe et al., 2005), this reveals how two KH domains may be closely juxtaposed when bound at adjacent C-rich binding sites.

Figure 5.2: Structures of the αCP1-KH1/DNA complexes in the asymmetric unit. The four αCP1-KH1 monomers in the asymmetric unit are colored green, blue, yellow and orange for A, B, C and D, respectively.



In addition, dimerisation of αCP1-KH1 produces a continuous chain of αCP1-KH1 domains throughout the crystal lattice, linked through the dimerisation interface and through oligonucleotide binding. Furthermore, these continuous chains appear to be crosslinked via disulphide bonds (S–S distance = 2.05 Å) between Cys54 residues (of chains C and D within the asymmetric unit). SDS-PAGE analysis of αCP1-KH1 confirmed the predominance of disulphide-linked dimers in the sample, despite its initial preparation under reducing conditions (Figure 5.3).

Figure 5.3: αCP1-KH1 dimerisation. (A) SDS-PAGE analysis of αCP1-KH1. Lane 1: molecular weight marker. Lane 2: αCP1-KH1 sample in the absence of a reducing agent. Lane 3: αCP1-KH1 sample in the presence of a reducing agent (DTT). In the absence of a reducing agent αCP1-KH1 dimerises, forming a species of ~16 kDa. The positions of the dimer and monomer are indicated by arrows. (B) Disulphide bond between the Cys54 residues of chains C and D in the asymmetric unit.

The protein conforms to the classical type I KH domain structure, with a three-stranded antiparallel β-sheet packed against three α-helices in a βααββα topology (Figure 5.4B). The structure is consistent with that reported for the αCP2-KH1 homologue bound to a telomeric DNA sequence, solved to 1.7 Å resolution (Figure 4A) (pdb code 2AXY; identity 97%, Cα pairwise RMSD 1.1 Å) (Du et al., 2005). Briefly, a hydrophobic core provides the structure's stability, with hydrophobic residues emanating from the inner face of the β-sheet (including Leu14, Ile16, Leu18, Met20, Ala45, Ile47, Ile49, Ile59 and Leu61) and from all three helices (including Glu24, Val25, Ile28, Val36, Ile39, Arg40, Ala67, Ile68, Ala71, Ile75, Lys78 and Leu79). This core is partly exposed to create the base of the hydrophobic oligonucleotide-binding cleft. The seven N-terminal and three C-terminal residues of the cloned sequence were not visible in the electron density and were therefore excluded from the model. The model thus comprises one residue remaining from the cloning procedure (dependent upon the PreScission protease cleavage site) and residues 14–83 of αCP1 (Swiss-Prot entry Q15365).



Figure 5.4: (A) Electron density map of αCP1-KH1/DNA. (B) Overall structure of the αCP1-KH1/DNA complex. The KH domain is shown in cartoon representation in green (β-strands) and pink (α-helices). The 5' tetrad of the target DNA, which forms contacts with the first KH domain, is shown, illustrating the positioning of the critical bases about α-helix 1 and between the GXXG and variable loops.


5.2.3 Oligonucleotide binding

The oligonucleotide is accommodated in a hydrophobic cleft formed across the top of α-helix 1 and bounded by the GXXG and variable loops (Figure 5.4). Basic residues surrounding the binding site, including Lys23, Lys31 and Lys32 (the latter two at the XX positions of the GXXG motif), Lys37, and Arg40, Arg46 and Arg57, create a positive potential along the length of the cleft (Figure 5.5).

Such a potential, similarly observed for αCP1-KH3 (Sidiqi et al., 2005b), could provide

a driving force for the docking of the oligonucleotide to the site, as well as provide

specific electrostatic contacts to the bound oligonucleotide.

Figure 5.5: The electrostatic potential emanating from αCP1-KH1 (in the same orientation as the cartoon representation in Figure 5.4). The potential was calculated using the APBS software package (http://agave.wustl.edu/apbs/) (Baker et al., 2001; Bank and Holst, 2003; Holst, 2001; Holst and Saied, 1993; Holst and Saied, 1995). Potential contours are shown at +1 kT/e (blue) and −1 kT/e (red) and were obtained by solving the linearized Poisson–Boltzmann equation at 150 mM ionic strength with a solute dielectric of 2 and a solvent dielectric of 78.5. The blue contour represents a positive potential that directs oligonucleotides to the binding cleft.
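The contour level and ionic strength quoted in the Figure 5.5 caption can be translated into physical scales. The sketch below, which is not part of the original analysis, computes the potential corresponding to 1 kT/e and the Debye screening length implied by the linearized Poisson–Boltzmann model at 150 mM. It assumes a temperature of 298.15 K (the temperature used in the APBS run is not stated here) and a 1:1 salt, so that ionic strength equals salt concentration.

```python
import math

K_B = 1.380649e-23           # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def kt_over_e_mv(temp_k=298.15):
    """Electrostatic potential equivalent to 1 kT/e, in millivolts."""
    return 1e3 * K_B * temp_k / E_CHARGE


def debye_length_nm(ionic_strength_m, eps_solvent=78.5, temp_k=298.15):
    """Debye screening length of the linearized Poisson-Boltzmann equation, in nm.

    kappa^2 = 2 N_A e^2 I / (eps_r eps_0 k_B T), with I converted to mol/m^3.
    """
    ionic_strength_si = ionic_strength_m * 1e3  # mol/L -> mol/m^3
    kappa_sq = (2 * N_A * E_CHARGE ** 2 * ionic_strength_si
                / (eps_solvent * EPS_0 * K_B * temp_k))
    return 1e9 / math.sqrt(kappa_sq)


print(f"1 kT/e = {kt_over_e_mv():.1f} mV")                            # 25.7 mV
print(f"Debye length at 150 mM = {debye_length_nm(0.150):.2f} nm")    # 0.79 nm
```

Under these assumptions, the ±1 kT/e contours correspond to roughly ±26 mV, and electrostatic interactions are screened over a length of under 1 nm at 150 mM.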

The resolution of the data did not permit a clear distinction between cytosine and thymine bases in the nucleic acid sequence. The oligonucleotide was therefore built from nucleotide 2 to 9, which placed “TCCC” sequences in equivalent positions in the two αCP1-KH1 binding sites. This positioning of a cytosine triplet is consistent with other structural studies of KH domains bound to C-rich DNA sequences (Backe et al., 2005; Du et al., 2005) and with the binding data reported in the following section.
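The register chosen for model building can be checked against the crystallised sequence. This small sketch (an illustration, not the procedure used to build the model) lists the 1-based positions of every TCCC occurrence in the 11-mer, reproducing the two tetrads at bases 2–5 and 6–9 referred to below.

```python
import re

DNA_11MER = "TTCCCTCCCTA"  # 5'-TTCCCTCCCTA-3' used for crystallisation


def motif_positions(seq, motif="TCCC"):
    """1-based (first, last) base numbers of every occurrence of `motif` in `seq`."""
    pattern = f"(?={re.escape(motif)})"  # zero-width lookahead also finds overlapping hits
    return [(m.start() + 1, m.start() + len(motif)) for m in re.finditer(pattern, seq)]


print(motif_positions(DNA_11MER))  # [(2, 5), (6, 9)]
```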

The oligonucleotide contacts αCP1-KH1 monomer A (and B) primarily through bases 2–5, corresponding to the TCCC sequence. The cleft is narrow, with the sugar-phosphate backbone of the oligonucleotide pressed against the bounding GXXG loop. The pyrimidine rings lie towards the variable loop, which defines the opposite edge of the cleft. Bases 2–4 are positioned with their rings parallel to the hydrophobic floor of the cleft, whereas Cyt-5 is somewhat raised away from the protein surface and positioned to make base-stacking interactions with the ring of Cyt-4. KH1 monomer D (and C) makes equivalent contacts with bases 6–9, corresponding to the 3' TCCC sequence of the target oligonucleotide, so that the two monomers are arranged tail-to-head on the DNA. The oligonucleotide twists about the phosphate bond of Cyt-4, allowing the second KH1 monomer to bind downstream of the first site, rotated approximately 180° about the oligonucleotide axis (Figure 5.6). The second KH1 monomer makes no contact with the first TCCC sequence.

Figure 5.6: Cartoon representation of the αCP1-KH1/DNA complex. The complex forms with two protein molecules bound to a single 11-nt strand of DNA. The two protein molecules are shown in orange and the oligonucleotide in green.

The αCP1-KH1/oligonucleotide interactions are summarised in Figure 5.7. Although the data are of low resolution, the van der Waals and potential hydrogen-bonding interactions underlying the complex could be identified. Interestingly, analogous interactions with the KH domains were observed for both the first and the second TCCC tetrad, reinforcing this mode of interaction. The following discussion refers to the 5' tetrad (bases 2–5) but applies equally to the 3' tetrad (bases 6–9).

Residues making van der Waals contacts with the sugar-phosphate backbone are listed on the left of Figure 5.7. They include the GXXG sequence, which comprises Gly30, Lys31, Lys32 and Gly33. These glycines, which are strictly conserved and form the classical sequence signature of KH domains, are positioned beneath the sugar atoms of bases 2 and 4 as well as the phosphate group of Cyt-3. The side chains of Lys31 and Lys32 (the XX of the GXXG motif) extend into the solvent, their aliphatic chains forming part of the hydrophobic edge of the binding cleft. At least one arginine or lysine is common at these positions amongst KH domains. Both backbone and side-chain atoms of these residues contact the sugar and phosphate backbone atoms of bases 3 and 4. In addition, Ile29 is positioned beneath both bases 3 and 4 and forms van der Waals contacts with their backbone atoms.
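Contact lists such as Figure 5.7 are typically derived from interatomic distances. The sketch below shows one simple way of doing this, as an illustration rather than the criteria actually used for Figure 5.7: N/O pairs within 3.5 Å are flagged as potential hydrogen bonds and any other pair within 4.0 Å as a van der Waals contact. Both cutoffs are assumed conventions, and the atom coordinates are invented.

```python
import math
from dataclasses import dataclass

HBOND_MAX = 3.5  # Å, assumed donor-acceptor cutoff for a potential hydrogen bond
VDW_MAX = 4.0    # Å, assumed cutoff for a van der Waals contact
POLAR = {"N", "O"}


@dataclass
class Atom:
    label: str     # e.g. "Arg57 NH1" or "Cyt3 N3"
    element: str   # element symbol, e.g. "N", "O", "C"
    xyz: tuple     # Cartesian coordinates in Å


def classify_contacts(protein_atoms, dna_atoms):
    """Return (protein atom, DNA atom, distance, contact type) for close atom pairs."""
    contacts = []
    for p in protein_atoms:
        for d in dna_atoms:
            dist = math.dist(p.xyz, d.xyz)
            if dist <= HBOND_MAX and p.element in POLAR and d.element in POLAR:
                contacts.append((p.label, d.label, round(dist, 2), "potential H-bond"))
            elif dist <= VDW_MAX:
                contacts.append((p.label, d.label, round(dist, 2), "van der Waals"))
    return contacts


# Invented coordinates, for illustration only.
protein = [Atom("Arg57 NH1", "N", (0.0, 0.0, 0.0)), Atom("Ile29 CD1", "C", (6.5, 0.0, 0.0))]
dna = [Atom("Cyt3 N3", "N", (2.9, 0.0, 0.0)), Atom("Cyt3 C5", "C", (6.5, 3.8, 0.0))]
for contact in classify_contacts(protein, dna):
    print(contact)
```

Real analyses usually also check donor/acceptor identity and hydrogen-bond angles; distance alone only identifies *potential* hydrogen bonds, which matches the cautious wording used for this 3.0 Å structure.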

Figure 5.7: Summary of the contacts between αCP1-KH1 and the bound DNA tetrad 5'-TCCC-3'. Van der Waals contacts are coloured orange, and potential hydrogen-bond interactions are coloured blue. The residues making important contacts with the oligonucleotide sugar-phosphate backbone are listed on the left, and the residues making contacts with the pyrimidine rings, and thus underlying base specificity, are listed on the right. The interactions are representative of both KH domains in the αCP1-KH1/DNA complex and would also be expected to occur within an αCP1-KH1/RNA complex.



5.2.4 Residues underlying cytosine specificity

Of particular interest in this study was the determination of the intermolecular contacts underlying cytosine specificity, the property for which these proteins are named. The amino acid residues that contact the nucleotide bases are listed on the right of Figure 5.7. The first base (Thy-2) is contacted by Gly26 and Ser27, whose connecting peptide bond lies parallel to the Thy-2 pyrimidine ring. No nucleotide-specific interactions are observed at this position, and it has been shown that an adenosine may also be accommodated at this site (Du et al., 2005). Cytosine recognition at the second base position (Cyt-3) is dominated by interactions with Arg57 (Figure 5.9). Arg57 is completely conserved amongst poly (C)-binding protein KH domains. Its side chain projects from the variable loop region towards the key functional groups of the cytosine base, so that bipartite hydrogen bonds can form from NH1 and NH2 to the cytosine O2 and N3 atoms. Ile29 lies directly beneath cytosine bases 3 and 4 and contributes to both sugar-ring and pyrimidine-ring hydrogen-bond contacts. Specificity for the third base (Cyt-4) is conferred by Ile49 and Arg40, which are conservatively substituted and conserved, respectively, in poly (C)-binding proteins. The backbone carbonyl of Ile49 is ideally positioned to form a hydrogen bond with Cyt-4 N4, and Arg40, extending from the C-terminal end of α-helix 2, can make a hydrogen-bond contact with the Cyt-4 O2 atom. These observations are consistent with those of Du et al. (2005) in their analysis of αCP2-KH1 bound to DNA. A slight difference occurs, however, in the positioning of Cyt-5, which was reported to contact Glu51 via its N4 and N3 groups. In the current αCP1-KH1 structure the Cyt-5 sugar-phosphate backbone and base are positioned slightly differently: instead of contacting Glu51, Cyt-5 N4 forms a hydrogen bond with the phosphate of Cyt-4. This difference may result from steric effects of the adjacently bound KH domain on the DNA 11-mer, but it also suggests that the third base of the cytosine triad contributes less to binding. In other studies of poly (C)-binding KH domains bound to DNA, this third cytosine makes the fewest contacts with the protein (Backe et al., 2005; Du et al., 2005), and our SPR data (Chapter 6) show that mutation of the third cytosine to a thymine is tolerated, whereas mutation of the first and second cytosines is not.

5.3 Comparison of αCP1-KH1 with other KH domains

The αCP1-KH1 domain retains high structural similarity to other type I KH domain structures that have been solved both in the presence and in the absence of oligonucleotide (Figure 5.8). These include the third KH domain of αCP1 (pdb code: 1WVN), the first KH domain of αCP2 (Du et al., 2005), the third KH domain of the prototypic poly (C)-binding protein hnRNP K (pdb code: 1ZZJ), Nova-2-KH3 (pdb codes: 1DTJ, 1EC6), and FBP KH domains 3 and 4 (pdb code: 1J4W) (Baber et al., 1999; Braddock et al., 2002a; Braddock et al., 2002b; Du et al., 2005; Lewis HA, 1999; Lewis et al., 2000; Sidiqi et al., 2005b). Despite sequence identities of only 31% to 37%, they retain a very high degree of structural similarity, with pairwise r.m.s. deviations of between 1.1 and 1.5 Å (excluding the variable loop region, where their sequences and structures diverge) (Figure 5.8A). Notably, oligonucleotide binding has little impact on the molecular structure of hnRNP K and Nova-2-KH3, for which both the unliganded and the oligonucleotide-bound structures have been determined (Braddock et al., 2002a; Lewis et al., 2000).
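The pairwise r.m.s. deviations quoted here are obtained after optimal superposition of matched Cα atoms, with the variable loop excluded. The sketch below implements the standard Kabsch superposition with NumPy as a generic illustration; it is not the program used to obtain the values in the text. It assumes the two coordinate sets are already paired residue by residue (for example, from a structure-based sequence alignment), and it uses a boolean mask to drop residues such as the variable loop.

```python
import numpy as np


def kabsch_rmsd(p, q, keep=None):
    """RMSD between paired (N, 3) coordinate arrays after optimal rigid-body superposition.

    `keep` is an optional boolean mask selecting the residues to include
    (e.g. False for variable-loop residues).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if keep is not None:
        p, q = p[keep], q[keep]
    p = p - p.mean(axis=0)                  # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)       # SVD of the 3x3 covariance matrix
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0  # avoid an improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T          # rotation mapping p onto q
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Synthetic example: a rotated, translated copy whose hypothetical "loop"
# (residues 40-47) has also been displaced.
rng = np.random.default_rng(0)
ca = rng.normal(scale=10.0, size=(70, 3))
theta = np.deg2rad(40.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = ca @ rz.T + np.array([5.0, -3.0, 2.0])
moved[40:48] += 4.0
keep = np.ones(70, dtype=bool)
keep[40:48] = False
print(f"{kabsch_rmsd(ca, moved, keep):.3f}")  # 0.000 with the loop excluded
print(kabsch_rmsd(ca, moved) > 0.1)           # True with the loop included
```

This illustrates why the choice of excluded region matters: a locally divergent loop can dominate the RMSD even when the rest of the fold is identical.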



Figure 5.8: A structural comparison of several different KH domains. (A) Backbone superposition of several KH domains from different proteins. The structures overlay very well and are very similar except in the variable region. (B) A structural sequence alignment of closely related KH domains. The sequences are limited to the regions that have been structurally characterized in all determined structures and the numbering scheme for the first and last residue is shown in parentheses. The sequence identity and Cα pairwise RMSD of each protein structure compared with αCP1-KH1 is shown. Conserved amino acids are colored red and boxed. Amino acids that do not show any structural similarity occur predominantly in the variable loop region. Amino acids highlighted red are those shown, through structural studies, to be involved directly in oligonucleotide binding.
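Figure 5.8B reports a sequence identity for each pair of domains, but the normalisation behind these percentages is not stated. The sketch below computes one common definition, identical residues divided by the number of aligned positions at which neither sequence has a gap. This choice is an assumption, and other conventions (for example, dividing by the length of the shorter sequence) give different values. The sequences are placeholders, not the KH-domain sequences in the figure.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped aligned positions of two equal-length aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


# Placeholder alignment: 5 identities over 7 ungapped positions.
print(f"{percent_identity('GKKGESIK', 'GKRG-SLK'):.1f}")  # 71.4
```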


It is therefore of interest that the unliganded αCP2-KH1 structure (coordinates kindly provided by the authors; Du et al., 2005), which shares 97% sequence identity with αCP1-KH1, shows relatively low structural similarity to αCP1-KH1/DNA, with a pairwise r.m.s. deviation of 0.6 Å (excluding the variable loop). The αCP2-KH1 backbone differs significantly from that of αCP1-KH1 within the first α-helix and at the GXXG and variable loop regions, even though the sequences are identical at these positions. This may be an artifact of the methodologies used to determine the structures, or it may reflect a degree of conformational change upon oligonucleotide binding. In particular, the variable loop moves towards the oligonucleotide relative to its unliganded position, better positioning Arg57 to form cytosine-specific hydrogen-bond contacts (Figure 5.9). This would represent the first reported case of a conformational change of a KH domain upon ligand binding.

Figure 5.9: Comparison of αCP1-KH1/DNA (purple) and αCP2-KH1 (cyan) backbone traces (C-terminal helix omitted for clarity). The traces deviate particularly along the first helix and GXXG loop, as well as at the variable loop. In the oligonucleotide bound structure the variable loop is positioned towards the oligonucleotide, able to make contacts via Arg57 (R57).



5.4 Comparison of αCP1-KH1 with other KH domain/oligonucleotide complexes

To determine whether the DNA-binding mode of αCP1-KH1 is similar to that of other KH domains bound to RNA or ssDNA, we superimposed the current structure on the Nova-2-KH3/RNA, hnRNP K/DNA and αCP2-KH1 structures (Figure 5.10A). The four contacting bases are shown on the molecular surface of αCP1-KH1. The oligonucleotide-binding modes are similar: each oligonucleotide is positioned within the hydrophobic cleft, with its phosphate backbone towards the GXXG edge of the cleft and its nucleotide rings tending towards the variable loop edge. Strikingly, however, the base positions vary considerably, even in the region of the conserved α-helix 1 and GXXG loop. In particular, the base positions reported for the complex between hnRNP K and a d(TCCC) tetrad (Braddock et al., 2002a) are displaced by approximately half a base relative to the oligonucleotide bound to αCP1-KH1 in the current study (Figure 5.10B). This is unexpected, since these KH domains are relatively closely related and both possess poly (C)-binding specificity. The difference may, however, reflect the methodologies used to solve the structures: the structure whose base positions did not match ours well was a solution NMR structure. The recent crystal structure of hnRNP K with a DNA 15-mer (5'-TTCCCCTCCCCATTT-3') (Backe et al., 2005) is very similar to our αCP1-KH1/DNA structure. The 5' core recognition bases of hnRNP K-KH3 overlay very closely with the αCP1-KH1 core recognition bases; in particular, the bases in positions two and three occupy the same positions (Figure 5.10C).



Figure 5.10: Overlay of bound oligonucleotides from the αCP1-KH1 (purple), αCP2-KH1 (cyan), hnRNP K-KH3 NMR (yellow), hnRNP K-KH3 crystal (red) and Nova-2-KH3 (green) structures, as obtained by structural superimposition of the KH domains. The surface of the αCP1-KH1 domain is shown. All of the oligonucleotides bind in a similar fashion, but with unexpectedly large differences in base positions. The position of the αCP1-KH1 oligonucleotide is more similar to that of Nova-2-KH3 than to that of the solution structure of hnRNP K-KH3, and very similar to that of the crystal structure of hnRNP K-KH3.


The oligonucleotide positioning within the αCP1-KH1/DNA structure is also similar to that of the looped RNA 20-mer bound in the cleft of Nova-2-KH3 (Figure 5.10D) (Lewis et al., 2000). This is not due to phase bias from the molecular replacement model, as the oligonucleotide was removed from the search model. Thus, although Nova-2-KH3 shares lower sequence identity with αCP1-KH1 (29%) than hnRNP K-KH3 does (37%), its mode of oligonucleotide interaction is much more conserved. Nevertheless, across all KH domain/oligonucleotide structures reported to date, most of the analogous amino acid residues are involved in oligonucleotide binding, albeit with differing contacts.

5.5 NMR Studies of αCP1-KH1 domain

In addition to our mRNA binding measurements using REMSA and SPR (Chapter 6 and Appendix B), we also employed NMR to detect the binding of αCP1-KH1 to a 20-mer RNA of sequence 5'-CUUUCUUUUUCUUCUUCCCU-3', representing the αCP1 target site in the 3'UTR of AR mRNA. This sample could then be used for crystal trials. The samples were prepared as described in Methods Section 2.10.

Furthermore, we wanted to determine whether NMR could be used to detect interactions of HuR RRM1 and RRM2 (RRM1/2) with αCP1-KH1 bound to an RNA sequence containing both a C-rich site and a U-rich segment. Again, the resulting multiprotein/RNA complex could be used for crystal trials. The samples were prepared as described in Methods Section 2.10.1.

Mutational analysis of the UC-rich 51-nucleotide sequence in the 3'UTR of AR mRNA has previously shown that the binding of αCP1 or αCP2 to this region depends on the presence of the two cytosine triplets, and that the binding of HuR depends on the presence of one uridine triplet and one uridine quintet at the end of the sequence (Ostareck et al., 1997). The last 20 nucleotides (5'-CUUUCUUUUUCUUCUUCCCU-3') were therefore prepared as the target for αCP1-KH1 and HuR RRM1/2 binding. The 15N-labelled αCP1-KH1 was then monitored by NMR spectroscopy to observe the effects of adding the oligonucleotide to the sample, and then of adding HuR RRM1/2 to the complex.

The 15N-labelled αCP1-KH1 gave rise to well-resolved HSQC spectra (Figure 5.11A), showing excellent dispersion in both the 1H and 15N dimensions as well as sharp signals. The dispersion, particularly in the 1H dimension, indicates that the domain is likely to adopt its correct secondary and tertiary fold, and the sharpness of the signals is consistent with the construct tumbling freely in solution, most likely as a monomer. Seventy single 1H–15N amide cross peaks are observed, as would be expected for this 73-residue construct containing two prolines: the two proline residues and the N-terminal amine do not give rise to amide cross peaks, nor do the side chains of the glutamate and aspartate residues (6 Glu and 6 Asp).

Figure 5.11: 1H–15N heteronuclear single quantum correlation spectra recorded at 25 °C for 15N-labelled αCP1-KH1 before and after the addition of the 20-nucleotide RNA of sequence 5'-CUUUCUUUUUCUUCUUCCCU-3'. (A) The uncomplexed spectrum and (B) the final titration point, with αCP1-KH1 fully complexed with RNA. The crosspeaks in both spectra account for all of the expected resonances in the protein. The movement of almost half of the peaks upon complex formation with RNA is consistent with a tight protein/RNA binding interaction.


Upon addition of RNA to the 15N-labelled αCP1-KH1 sample, the positions of specific crosspeaks changed, reflecting altered electronic environments for many backbone NH groups in the protein. The final titration point is shown in Figure 5.11B. The 1H–15N HSQC spectrum remains well dispersed, with 70 well-resolved crosspeaks; however, at least 23 of the crosspeaks representing backbone NH correlations have changed position. This indicates that the protein is fully complexed with the RNA (i.e. there is no evidence of heterogeneity) and that the complex retains good solution characteristics, with no evidence of aggregation or of the formation of larger complexes. In addition, the crosspeak movements show that almost half of the backbone NH groups experience an altered electronic environment upon interaction with RNA, more than would be expected to lie directly at the protein/RNA interface. This is unsurprising given the long-range electrostatic effects expected to arise from the oligonucleotide's phosphate backbone. Spectra acquired at intermediate stages of the titration showed no evidence of gradual movement of the crosspeaks from their starting to their final positions. Such gradual movement would be typical of a weak interaction in the fast-exchange regime, in which the observed chemical shifts are population-weighted averages of the free and bound states. Rather, the peaks disappeared and reappeared in new positions, suggesting a tight binding interaction in slow exchange on the NMR timescale.
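The two observations above, the number of shifted crosspeaks and the disappear-and-reappear behaviour, are commonly quantified as follows. This sketch is not the analysis performed in this study. It computes a weighted combined amide shift change, Δδ = √(ΔδH² + (αΔδN)²), using α = 0.14 (one commonly used weighting; other values are also in use). It then classifies the exchange regime by comparing an exchange rate k_ex with the free/bound frequency separation Δω. The spectrometer frequency and rate constants are hypothetical.

```python
import math

ALPHA_N = 0.14  # assumed 15N weighting factor; conventions vary between studies


def combined_shift_ppm(d_h, d_n, alpha=ALPHA_N):
    """Weighted combined amide chemical shift change, in ppm."""
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def exchange_regime(delta_ppm, larmor_mhz, k_ex):
    """Classify the exchange regime from the free/bound shift difference.

    delta_ppm  : shift difference between free and bound states (ppm)
    larmor_mhz : Larmor frequency of the observed nucleus (MHz), e.g. 600 for 1H
                 or ~60.8 for 15N on a 600 MHz spectrometer
    k_ex       : exchange rate constant (s^-1)
    The factor-of-ten margins are a rough heuristic, not sharp boundaries.
    """
    delta_omega = 2 * math.pi * delta_ppm * larmor_mhz  # ppm x MHz = Hz; x 2*pi -> rad/s
    if k_ex < 0.1 * delta_omega:
        return "slow"
    if k_ex > 10 * delta_omega:
        return "fast"
    return "intermediate"


print(round(combined_shift_ppm(0.10, 1.0), 3))  # 0.172
print(exchange_regime(0.2, 600.0, 10.0))        # slow: separate free and bound peaks
print(exchange_regime(0.2, 600.0, 1e5))         # fast: a single averaged peak that moves
```

In slow exchange, intermediate titration points show the free and bound peaks simultaneously, with intensities reflecting the populations. This matches the behaviour described above and contrasts with the progressive peak movement expected for a weak, fast-exchanging interaction.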

To test for an interaction between HuR RRM1/2 and oligonucleotide-bound αCP1-KH1, NMR spectra were collected as the sample was slowly titrated with increasing concentrations of HuR RRM1/2. Upon addition of HuR RRM1/2 to the 15N-labelled αCP1-KH1/RNA sample, the positions of the crosspeaks did not change, indicating no apparent interaction between these proteins. The titration points with the lowest and highest concentrations of HuR RRM1/2 are shown in Figure 5.12A and B, respectively. Thus, assuming that HuR RRM1/2 bound to the poly U sequence (separate SPR experiments have shown that HuR RRM1/2 binds poly U with high affinity, whilst αCP1-KH1 does not; Chapter 6), there was no evidence for interaction between the adjacently bound proteins. This does not, however, preclude interaction of full-length αCP1 through its KH2 and/or KH3 domains. Previous studies have shown interactions between HuB and hnRNP K-KH2 (Yano et al., 2005), which are homologues of HuR and αCP1, respectively. This experiment therefore represents the first of a series of experiments that could be used to test for interactions between RNA-bound proteins.

Figure 5.12: 1H–15N heteronuclear single quantum correlation spectra recorded at 25 °C for the 15N-labelled αCP1-KH1/RNA sample before and after the addition of HuR RRM1/2. (A) The complex spectrum at the lowest HuR RRM1/2 concentration and (B) the final titration point, with αCP1-KH1/RNA titrated with the highest concentration of HuR RRM1/2. The crosspeaks neither shift nor disappear and reappear. The signals are broadened due to protein dilution. There is no apparent interaction of HuR RRM1/2 with αCP1-KH1.


5.6 Conclusions

The formation of a ternary complex involving two αCP1-KH1 domains bound to a single strand of DNA reveals not only the important contacts underlying poly (C) specificity, but also the possible juxtaposition of poly (C)-binding KH domains at a target oligonucleotide binding site. An optimised RNA target has been determined for the αCP-2KL isoform, which is derived from an alternatively spliced αCP2 transcript lacking 31 residues of the linker region between KH domains 2 and 3 (Thisted et al., 2001). That study revealed that the optimal target sequence for the αCP protein encompasses three C-rich patches of 3–5 bases, each separated by 2 to 6 bases. Our study has confirmed that a single KH domain contacts 4 bases and that there is no steric hindrance to the binding of two KH domains to adjacent oligonucleotide stretches.
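The patch-and-spacing description of the optimal αCP target lends itself to a simple sequence scan. The sketch below is an illustration only, not a method from this thesis or from Thisted et al. (2001). It treats a "patch" as a run of at least three consecutive C residues, which simplifies the published 3–5 base C-rich patches, and it reports the number of bases separating consecutive patches. It is applied to the crystallised 11-mer and to the AR mRNA 20-mer used for NMR.

```python
import re


def c_patches(seq, min_run=3):
    """0-based, half-open (start, end) spans of runs of at least `min_run` C residues."""
    return [m.span() for m in re.finditer(f"C{{{min_run},}}", seq.upper())]


def spacings(patches):
    """Number of bases separating consecutive patches."""
    return [b[0] - a[1] for a, b in zip(patches, patches[1:])]


dna = "TTCCCTCCCTA"            # crystallised DNA 11-mer
rna = "CUUUCUUUUUCUUCUUCCCU"   # AR mRNA 20-mer used for NMR
print(c_patches(dna), spacings(c_patches(dna)))  # [(2, 5), (6, 9)] [1]
print(c_patches(rna))                            # [(16, 19)]
```

A stricter scan could also require the 2–6 base spacing and allow interrupted C-rich patches; both are left out here to keep the definition explicit.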

This, together with our demonstration that αCP1-KH2 does, in fact, bind to oligonucleotide (Chapter 6), suggests that all three KH domains of full-length αCP proteins are involved in oligonucleotide binding. The arrangement of such an αCP1/oligonucleotide complex has been modelled on the basis of the arrangement of KH domains seen in the αCP1-KH1/DNA crystal structure (Figure 5.13A/B). The KH domains are bound to three adjacent poly (C) patches. Whilst domains 1 and 2, which are separated by a linker of only 13 residues, are most likely to bind alongside one another (the linker is the ideal length for this arrangement), the third KH domain (separated by 112 residues) could bind on either side of this pair. In the case of the AR mRNA sequence, two poly (C) patches exist in the 51-nt region shown to bind αCP1 and αCP2 and to stabilise the mRNA (Yeap et al., 2002). These could be the binding sites for KH1 and KH2. A third C-rich patch, however, exists 7 bases downstream and could readily be targeted by KH3. The structure of the linker region between domains 2 and 3 is unknown. It is known, however, that a nuclear localisation signal (NLS) exists in this region (Wang et al., 1995), and it is therefore likely that this sequence lies within a structure that appropriately presents the NLS to its recognition protein.
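The argument about linker lengths can be given rough numbers. The sketch below, an order-of-magnitude estimate and not an analysis from this thesis, assumes ~3.8 Å between consecutive Cα atoms. It reports two quantities for the 13- and 112-residue linkers: the fully extended contour length, which is an upper bound on the span, and a crude freely-jointed-chain rms end-to-end distance. Real unstructured chains are stiffer than a freely jointed chain, so the second value is likely an underestimate.

```python
import math

CA_CA = 3.8  # Å, approximate distance between consecutive C-alpha atoms


def contour_length(n_residues):
    """Upper bound on the span of a linker: fully extended chain, ~3.8 Å per residue."""
    return n_residues * CA_CA


def freely_jointed_rms(n_residues):
    """Crude rms end-to-end distance treating each residue as a freely jointed segment."""
    return CA_CA * math.sqrt(n_residues)


for name, n in [("KH1-KH2 linker", 13), ("KH2-KH3 linker", 112)]:
    print(f"{name}: max ~{contour_length(n):.0f} Å, rms ~{freely_jointed_rms(n):.0f} Å")
# KH1-KH2 linker: max ~49 Å, rms ~14 Å
# KH2-KH3 linker: max ~426 Å, rms ~40 Å
```

On these crude numbers, the short KH1–KH2 linker restricts the two domains to neighbouring sites, whereas the long KH2–KH3 linker leaves KH3 free to reach a patch on either side of the pair, consistent with the model described above.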



Figure 5.13: Model of full-length αCP1. (A) Model of the arrangement of the three KH domains of αCP1 contacting oligonucleotide, based on the arrangement of KH domains seen in (B) the αCP1-KH1/DNA crystal structure.

Whether dimer formation, as demonstrated within the crystallographic arrangement of αCP1-KH1/DNA and of other KH domains (Lewis HA, 1999; Sidiqi et al., 2005b), occurs in vivo remains to be investigated. However, previous studies have shown self-association of KH modules. Nova-1-KH3 homodimerizes in solution in the absence of RNA and without the contribution of other regions of the full-length protein. Self-association has also been observed for hnRNP K and for αCP proteins, and has been suggested to dictate their biological function by mediating association with other effector proteins (Ramos et al., 2002). It is unlikely, however, that a dimer would form between KH domains within a single αCP/oligonucleotide complex. The dimer is formed through interactions between the C-terminal helices and N-terminal strands, and does not interfere with oligonucleotide binding. The oligonucleotide-binding clefts, however, are positioned too far apart to be reached by adjacent C-rich sequences separated by only 2–6 nucleotides (Figure 5.13B). Dimerisation may play a role when one C-rich patch is distal to the others, or it may provide a means by which several αCP molecules can interact in a multiprotein/oligonucleotide complex. It is important to note, however, that in vivo dimerisation and RNA binding will almost certainly be dictated by the presence of the other KH domains, which modulate the affinity and cooperativity of binding and modify its kinetics.



NMR spectroscopy was also used to confirm the ability of αCP1-KH1 to bind to an AR mRNA sequence, and to demonstrate complete complex formation with the 20-nucleotide sequence at the final titration point. It showed that almost half of the αCP1-KH1 backbone NH resonances are affected by RNA binding. As for interactions between αCP1-KH1 and HuR RRM1/2, our NMR studies did not show any sign of interaction between these proteins, but this does not exclude the possibility of interaction through αCP1-KH2 and/or αCP1-KH3.

This study thus reveals a way in which full-length αCP molecules may interact with their target oligonucleotides. The protein/oligonucleotide complex may not involve any intramolecular interactions between RNA-binding domains, but may nevertheless effectively protect the oligonucleotide against nucleases, thus enhancing RNA stability. The exposed surface of the complex may also provide clues as to how the αCP/oligonucleotide complex docks with putative αCP binding partners such as PABP, HuR and hnRNP D (Kiledjian et al., 1995; Wang et al., 1999; Wang and Kiledjian, 2000; Yeap et al., 2002). This set of interactions may involve the interface adjacent to the HuR protein bound immediately 5' in the AR mRNA regulatory sequence, raising the possibility that these interactions underlie cooperative binding to RNA (Kiledjian et al., 1995). The study thus provides insight into the mode of αCP1 binding at a target oligonucleotide binding site and is a first step towards the structural definition of the multiprotein/oligonucleotide complexes involved in the regulation of AR gene expression.

Chapter 6: SPR analysis of αCP1-KH domains

6.1 Chapter Overview

Our structural studies of αCP1-KH domains, both isolated and in the presence of DNA, identified a number of key residues as crucial for binding and specificity. To further study the poly (C)-binding specificity of αCP1-KH domains, we wished to examine in detail the binding of these domains to a number of different RNA and DNA sequences, in particular a 30-nucleotide sequence from the 3'UTR of AR mRNA that contains the C-rich site. It has previously been shown that interaction with the oligonucleotide target is mediated through the αCP1-KH1 and KH3 domains. However, a target containing three poly (C) patches has been shown to be optimal for binding, suggesting that αCP1-KH2 is also likely to bind to a poly (C) sequence (Makeyev et al., 2002). Binding of αCP1-KH2 had not been demonstrated prior to this study.

Therefore, in this chapter I aimed:

1) to measure the binding kinetics of the interaction between αCP1-KH domains and the target oligonucleotides,

2) to investigate whether αCP1-KH domains preferentially bind RNA over ssDNA,

3) to better understand the relative contribution of each αCP1-KH domain to the overall affinity of αCP1,

4) to determine the basis of the specificity of αCP1-KH domains for oligonucleotides, and

5) to investigate the role of each cytosine in the four core recognition nucleotides.

6.2 Why use surface plasmon resonance (SPR)?

We chose SPR for studying αCP1/oligonucleotide interactions for a number of reasons. First, SPR does not require labelling of the interacting molecules for detection. Second, it monitors the binding process in real time and can provide kinetic data for bimolecular interactions. This allows the binding of compounds to their targets to be quantified in terms of affinity, specificity, and association and dissociation rates, rather than only the equilibrium constant obtained from gel shift assays. SPR experiments are also relatively rapid to conduct. Lastly, SPR has been used previously for kinetic analysis of a number of different RNA/DNA and protein systems (Schuck, 1997b).

6.3 Principles and applications of Surface Plasmon Resonance

Surface plasmon resonance is an optical sensing technique that detects changes in refractive index within ~300 nm of the sensor surface (Figure 6.2). In the Biacore© SPR system the sensor surface is provided by a sensor chip. We used an SA (streptavidin) chip, which irreversibly captures biotinylated ligands (Figure 6.1). This sensor chip consists of a gold layer on a glass surface, to which streptavidin-coupled carboxymethyl-dextran is attached, forming the interaction layer (~100 nm thick). One of the interacting molecules, referred to as the ligand (in our case a biotinylated oligonucleotide), is attached to this layer to provide a biospecific recognition surface for the other molecule, referred to as the analyte (in our case the αCP1-KH domains). This gold-dextran surface forms one wall of a flow cell through which the analyte solution flows (Hahnefeld et al., 2004; Torreri et al., 2005).

Figure 6.1: Schematic depicting the streptavidin (SA) sensor chip. A carboxymethyl-dextran layer is attached to the gold surface, and streptavidin is attached to the dextran. Because biotin binds streptavidin with very high affinity, the biotinylated ligand is captured on the surface, and the interaction of the protein in solution with the ligand is monitored.



Figure 6.2: (A) The basic principles of SPR using Biacore©. L: light source, D: detector, P: prism, CD: carboxymethyl-dextran, SA: streptavidin. The two dark lines in the reflected beam projected on to the detector symbolise the drop in light intensity caused by the resonance phenomenon at times t1 and t2. Laser light is directed through the glass prism onto the gold layer under conditions of total internal reflection, and is reflected at all angles except a particular resonance angle. At this angle a portion of the light energy excites surface plasmons (collective oscillations of free electrons) in the metal film, generating an evanescent wave and reducing the intensity of the reflected light. The resonance angle is sensitive to refractive index changes close to the opposite side of the gold film, which forms the sensor surface. In our experiments nucleic acids are immobilised on this surface, and protein molecules are in the mobile phase flowing through the flow cell. The line projected at t1 corresponds to the situation before binding of the protein to the nucleic acid, and t2 to the position of resonance after binding. The change in angle is recorded by the detector and displayed as a sensorgram in response units (RU); 1000 RU represents a change of ~0.1°. For most proteins, binding of ~1 ng/mm² at the dextran surface corresponds to a change of 1000 RU. (B) Schematic of a sensorgram. Initially no protein is bound and the signal is flat (buffer flow). As protein is injected, it interacts with the nucleic acid, giving information on the association rate. If the injection is long enough, the response reaches steady state. At the end of the injection, buffer flows over the surface, giving information on the dissociation rate of the complex. Regeneration removes any remaining bound protein (Katsamba et al., 2002).




6.4 Kinetics

Kinetic characterisation of a bimolecular system reveals the rates at which the complex forms and dissociates. Biosensor technology such as the Biacore SPR system allows such bimolecular interactions to be visualised in real time (Figure 6.2).

In a sensorgram, the association phase is the period during which the analyte is being injected, and the dissociation phase is the period following the end of the injection. During the association phase, association and dissociation occur simultaneously. Equilibrium is reached when the rates of association and dissociation are equal. This is also termed steady state and corresponds to the flattest region of the sensorgram (Figure 6.2B). Under ideal experimental conditions, only dissociation takes place during the dissociation phase; in practice, some rebinding often occurs.

The main factors influencing the association rate are the concentration of the analyte near the ligand (CA), the concentration of the ligand (CL) and the association rate constant (kon, also denoted ka). At high surface density, the rate of analyte binding can exceed the rate at which the analyte is delivered to the surface. This is termed the “mass-transport phenomenon” and is discussed in more detail in Section 6.5.3. Under such circumstances the binding rate is “mass-transport limited”, and analysis of the association phase yields an apparent kon that is slower than the true kon, making the true value difficult to obtain. A number of experimental conditions can be altered to minimise mass transport (Schuck, 1997a); these are discussed further below.

Data analysis to quantify binding affinities is an important step in SPR studies. Quantification involves fitting the data with a model that reflects the binding reaction and takes into account any experimental limitations. Kinetic data analysis can be performed using the BIAevaluation software supplied by Biacore©, in which a number of binding models can be tested for their ability to fit the experimental data. These models are briefly described below.



6.4.1 The Langmuir binding model

If the interaction being examined is expected to occur with 1:1 stoichiometry, the simplest model should be examined first. The 1:1 binding model, also known as the one-site model or the 1:1 Langmuir binding model, describes the simplest bimolecular interaction. It assumes that the analyte concentration does not change during association, because the continuous flow of analyte prevents its depletion or accumulation in solution. It also assumes that the analyte concentration is zero during the dissociation phase and that the analyte binds to the ligand with 1:1 stoichiometry. A schematic of the model is shown in Figure 6.3.

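Figure 6.3: 1:1 Langmuir binding of analyte A to immobilized ligand B to form the AB complex. ka is the association rate constant and kd is the dissociation rate constant.

The chemical equation for the model is:

$$\mathrm{A} + \mathrm{B} \underset{k_d}{\overset{k_a}{\rightleftharpoons}} \mathrm{AB} \qquad \text{(Eq. 1)}$$

where A represents the free analyte in solution, B is the ligand immobilized on the sensor surface, AB is the complex of the analyte bound to the ligand, ka is the association rate constant and kd is the dissociation rate constant (Table 6.1). The thermodynamic equilibrium dissociation constant is then equal to

$$K_D = \frac{k_d}{k_a} \qquad \text{(Eq. 2)}$$

Table 6.1: A brief explanation of the rate constants

The differential rate equation for the 1:1 model is:

$$\frac{d[\mathrm{AB}]}{dt} = k_a[\mathrm{A}]\left([\mathrm{B}]_{tot} - [\mathrm{AB}]\right) - k_d[\mathrm{AB}] \qquad \text{(Eq. 3)}$$

where d[AB]/dt is the rate of change of the concentration of the complex AB at time t and [B]tot is the total concentration of ligand sites. This equation may be combined with the assumption that the analyte concentration remains at its initial value [A]i to give

$$\frac{d[\mathrm{AB}]}{dt} = k_a[\mathrm{A}]_i\left([\mathrm{B}]_{tot} - [\mathrm{AB}]\right) - k_d[\mathrm{AB}] \qquad \text{(Eq. 4)}$$

Because the biosensor response is directly proportional to [AB]t, Eq. 4 may be rewritten as

$$\frac{dR_t}{dt} = k_a[\mathrm{A}]_i\left(R_{max} - R_t\right) - k_d R_t \qquad \text{(Eq. 5)}$$

where Rt denotes the response at time t and Rmax is the maximal response obtained if all the available ligand-binding sites are occupied. An integrated form of this equation,

$$R_t = \frac{k_a[\mathrm{A}]_i R_{max}}{k_a[\mathrm{A}]_i + k_d}\left(1 - e^{-(k_a[\mathrm{A}]_i + k_d)t}\right) \qquad \text{(Eq. 6)}$$

is used to determine Rmax and the rate constants ka and kd. An alternative form of the equation is

$$R_t = R_{eq}\left(1 - e^{-k_{obs}t}\right), \qquad k_{obs} = k_a[\mathrm{A}]_i + k_d \qquad \text{(Eq. 7)}$$

which expresses the time dependence of the biosensor response in terms of the first-order rate constant kobs and Req, the response at equilibrium. From equations 6 and 7, equation 8 is derived:

$$R_{eq} = \frac{k_a[\mathrm{A}]_i R_{max}}{k_a[\mathrm{A}]_i + k_d} = \frac{K_A[\mathrm{A}]_i R_{max}}{1 + K_A[\mathrm{A}]_i} \qquad \text{(Eq. 8)}$$

where the second form follows from the definition of the association equilibrium constant,

$$K_A = \frac{k_a}{k_d} = \frac{1}{K_D} \qquad \text{(Eq. 9)}$$

Therefore, analysis of SPR data in this manner can also be used to determine the response at equilibrium.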
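To make the integrated rate law (Eq. 6) and the first-order dissociation concrete, the Python sketch below simulates one complete sensorgram. It assumes ideal 1:1 Langmuir behaviour (constant analyte concentration during injection, zero analyte during dissociation, no mass transport or rebinding); the function names and the rate constants in the usage section are hypothetical and are not values fitted to the αCP1-KH data.

```python
import math


def association_response(t, conc_m, ka, kd, rmax):
    """Response (RU) t seconds after the start of injection (integrated 1:1 model)."""
    kobs = ka * conc_m + kd            # observed first-order rate constant, s^-1
    req = ka * conc_m * rmax / kobs    # response at equilibrium, RU
    return req * (1.0 - math.exp(-kobs * t))


def dissociation_response(t, r0, kd):
    """Response (RU) t seconds after the end of injection, assuming [A] = 0."""
    return r0 * math.exp(-kd * t)


def simulate_sensorgram(conc_m, ka, kd, rmax, t_inject, t_dissoc, dt=1.0):
    """(time, response) pairs for one association phase followed by dissociation."""
    points = []
    for i in range(int(round(t_inject / dt)) + 1):
        t = i * dt
        points.append((t, association_response(t, conc_m, ka, kd, rmax)))
    r_end = points[-1][1]              # response when the injection stops
    for i in range(1, int(round(t_dissoc / dt)) + 1):
        t = i * dt
        points.append((t_inject + t, dissociation_response(t, r_end, kd)))
    return points


if __name__ == "__main__":
    # Hypothetical values: ka = 1e4 M^-1 s^-1, kd = 1e-2 s^-1 (KD = 1 uM),
    # 1 uM analyte, Rmax = 50 RU, 120 s injection, 60 s dissociation.
    for t, r in simulate_sensorgram(1e-6, 1e4, 1e-2, 50.0, 120.0, 60.0, dt=30.0):
        print(f"{t:6.0f} s  {r:6.2f} RU")
```

With these illustrative constants the association curve approaches Req = 25 RU (half of Rmax, since the analyte concentration equals KD), which is a quick check of Eq. 8.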

6.4.2 Determination of Equilibrium Constants

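There are a number of ways to represent the affinity of an interaction, as shown below.

$$K_A = \frac{[\mathrm{AB}]}{[\mathrm{A}][\mathrm{B}]} = \frac{k_a}{k_d}, \qquad K_D = \frac{[\mathrm{A}][\mathrm{B}]}{[\mathrm{AB}]} = \frac{k_d}{k_a} = \frac{1}{K_A}$$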

The affinity constant can be measured directly by equilibrium binding analysis, or indirectly by measuring the kinetic rate constants and applying equation 2. Equilibrium analysis involves sequential injections of the analyte at a range of concentrations and measurement of the level of binding at equilibrium. The analyte concentration should ideally be varied from 0.01 × KD to 100 × KD. These affinity measurements assume that the amount of active immobilized ligand remains constant. The time taken to reach equilibrium is governed by the observed rate constant (ka[A] + kd); at analyte concentrations well below KD this is dominated by the dissociation rate constant, so slowly dissociating complexes require long injections.
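The injection length needed for equilibrium analysis can be estimated from the 1:1 model: the time to reach a chosen fraction of Req is −ln(1 − fraction)/kobs. This is a sketch assuming ideal 1:1 kinetics; the helper name and the rate constants are illustrative assumptions.

```python
import math


def time_to_fraction_of_equilibrium(conc_m, ka, kd, fraction=0.95):
    """Injection time (s) for a 1:1 interaction to reach `fraction` of Req.

    From Rt = Req * (1 - exp(-kobs * t)) with kobs = ka * C + kd.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    kobs = ka * conc_m + kd
    return -math.log(1.0 - fraction) / kobs


if __name__ == "__main__":
    # Hypothetical ka = 1e4 M^-1 s^-1, kd = 1e-3 s^-1 (KD = 0.1 uM).
    for conc in (1e-8, 1e-5):
        print(conc, round(time_to_fraction_of_equilibrium(conc, 1e4, 1e-3)), "s")
```

With these hypothetical constants, reaching 95% of Req takes roughly 45 min at 0.01 µM but only about 30 s at 10 µM, which illustrates why the lowest concentrations of slowly dissociating complexes are the hardest to bring to equilibrium.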

The affinity constant is obtained from the data using the steady-state (Langmuir binding isotherm) model:

$$\text{Bound} = \frac{\text{Max} \cdot C_A}{K_D + C_A} \qquad \text{(Eq. 10)}$$

where “Bound” is the equilibrium response in response units (RU), Max is the maximum response (RU) and CA is the concentration of the analyte.

The KD and Max values are obtained by fitting the above equation to the data using the BIAevaluation software. In a graph of equilibrium response against analyte concentration, KD is the concentration of analyte at which 50% of the binding sites are occupied (Figure 6.4).

Figure 6.4: Graph of Req against the analyte concentration. This graph is used to obtain the KD, which is equivalent to the concentration of the analyte at which 50% of the binding sites are occupied.
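BIAevaluation performs the steady-state fit internally. As an independent illustration only, the sketch below fits Eq. 10 by least squares in plain Python: for a fixed KD the model is linear in Max, so the best Max has a closed form, and KD is found by scanning a logarithmic grid. This is not the BIAevaluation algorithm; the grid range and the synthetic data in the usage section are assumptions.

```python
def fit_steady_state(concs_m, req_ru, kd_grid=None):
    """Least-squares fit of Req = Max * C / (KD + C) (Eq. 10).

    Returns (KD in M, Max in RU, residual sum of squares). Concentrations
    must be positive so that the occupancy terms are non-zero.
    """
    if kd_grid is None:
        # 1 nM to 1 mM with 50 points per decade (an arbitrary choice for this sketch)
        kd_grid = [10 ** (-9 + i / 50) for i in range(6 * 50 + 1)]
    best = None
    for kd in kd_grid:
        occupancy = [c / (kd + c) for c in concs_m]
        # For fixed KD the model is linear in Max, so the best Max is closed-form.
        rmax = (sum(y * f for y, f in zip(req_ru, occupancy))
                / sum(f * f for f in occupancy))
        sse = sum((y - rmax * f) ** 2 for y, f in zip(req_ru, occupancy))
        if best is None or sse < best[2]:
            best = (kd, rmax, sse)
    return best


if __name__ == "__main__":
    # Synthetic, noise-free responses over the 10 to 0.312 uM series (KD = 5 uM, Max = 40 RU).
    concs = [c * 1e-6 for c in (10, 5, 2.5, 1.25, 0.625, 0.312)]
    req = [40.0 * c / (5e-6 + c) for c in concs]
    print(fit_steady_state(concs, req))
```

A grid search is used here simply to keep the example dependency-free; a gradient-based optimiser would give the same minimum with finer resolution.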



6.4.3 The Two compartment or Mass Transfer model

This model is similar to the 1:1 model in that it assumes one-to-one binding between the analyte and the immobilized ligand (Figure 6.5). The major difference is that the analyte concentration at the sensor surface is not assumed to equal the injected concentration. This is because the model divides the flow cell into two compartments, each with its own analyte concentration (Glaser, 1993). The bulk compartment has the same analyte concentration as the injected sample, whereas the analyte concentration in the surface compartment depends on both the rate of mass transport of analyte from the bulk compartment to the sensor surface and the rate at which the analyte binds to the immobilized ligand. Each of these processes has its own rate constant, and both are incorporated into the rate equations for this model (Myszka et al., 1998).
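The effect of slow analyte delivery can be illustrated with a commonly used quasi-steady-state approximation of the two-compartment model, in which binding is described by effective rate constants: dR/dt = (ka·C·(Rmax − R) − kd·R)/(1 + ka·(Rmax − R)/kt). The sketch below assumes this approximation rather than the full two-compartment equations used in BIAevaluation, expresses kt in RU M⁻¹ s⁻¹ so that the units cancel, and uses arbitrary illustrative parameter values.

```python
def simulate_two_compartment(conc_m, ka, kd, rmax, kt, t_inject, dt=0.01):
    """Response (RU) at the end of an injection, two-compartment model.

    Uses the quasi-steady-state (effective rate constant) approximation
        dR/dt = (ka*C*(Rmax - R) - kd*R) / (1 + ka*(Rmax - R)/kt)
    with ka in M^-1 s^-1, kd in s^-1, C in M, R and Rmax in RU and
    kt in RU M^-1 s^-1. Integrated with simple explicit Euler steps.
    """
    r = 0.0
    for _ in range(int(round(t_inject / dt))):
        free_sites = rmax - r
        net_binding = ka * conc_m * free_sites - kd * r
        r += net_binding / (1.0 + ka * free_sites / kt) * dt
    return r


if __name__ == "__main__":
    # Illustrative values: KD = 1 uM, 1 uM analyte, Rmax = 50 RU, 120 s injection.
    for kt in (1e20, 1e6, 1e5):  # effectively unlimited, moderate, severe limitation
        print(kt, round(simulate_two_compartment(1e-6, 1e4, 1e-2, 50.0, kt, 120.0), 2))
```

A very large kt recovers the 1:1 Langmuir curve, while a small kt slows the approach to equilibrium without changing the equilibrium response itself, which is why mass transport distorts kinetic but not steady-state estimates.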

6.4.4 Other binding models

A number of other models have also been used in SPR studies. These include the two-site (heterogeneous) model, the avidity model and the conformational change model.

Figure 6.5: Mass transfer model. Transport of analyte A from the bulk solution to the surface compartment, followed by binding of the analyte to the ligand B to form the complex AB. kt is the mass transfer coefficient.



Heterogeneity can arise from either the ligand or the analyte (Morton et al., 1995). Ligand heterogeneity can result from the immobilization chemistry used, such as amine coupling (Figure 6.6), and can also have natural sources such as polyclonal antibodies, post-translational modifications and impurities. The major assumptions of the heterogeneous ligand model are that two different forms of the ligand are immobilized, that each presents an equally accessible binding site for the analyte, and that the analyte interacts with each in a simple bimolecular interaction. Each form of the immobilized ligand binds the analyte with a different affinity and different rate constants, and the sensorgram reflects the sum of the two interactions. It is also assumed that the concentration of the analyte is constant throughout the flow cell and does not change with time (Morton et al., 1995).
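Under the assumptions of the heterogeneous ligand model, the observed response is simply the sum of two independent 1:1 interactions. The sketch below shows this for the association and dissociation phases; the function names and the parameters in the test are hypothetical.

```python
import math


def langmuir_association(t, conc_m, ka, kd, rmax):
    """1:1 association-phase response (RU) for one class of ligand sites."""
    kobs = ka * conc_m + kd
    return ka * conc_m * rmax / kobs * (1.0 - math.exp(-kobs * t))


def heterogeneous_ligand_association(t, conc_m, site1, site2):
    """Sum of two independent 1:1 interactions; each site is (ka, kd, rmax)."""
    return (langmuir_association(t, conc_m, *site1)
            + langmuir_association(t, conc_m, *site2))


def heterogeneous_ligand_dissociation(t, r1_0, r2_0, kd1, kd2):
    """Biphasic decay: each complex dissociates with its own kd."""
    return r1_0 * math.exp(-kd1 * t) + r2_0 * math.exp(-kd2 * t)
```

The biphasic dissociation produced by the second function, a fast component followed by a slow one, is the typical signature that distinguishes this model from a single 1:1 interaction.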

Heterogeneity of the analyte can occur naturally or through enzymatic degradation. This can lead to two possible situations: first, the analyte species may bind to two different ligand sites; second, the analyte may have two different affinities for one ligand site. In the first case, the sensorgram presents the sum of two independent reactions, since each analyte binds to its own independent binding site. In the second case, a competitive reaction occurs, in which there is only one binding site and either two different analytes or one analyte with two different affinities.

Figure 6.6: Heterogeneous model. This model can arise from a heterogeneous ligand surface, resulting in the formation of two complexes, each with its own association and dissociation rate constants.

The bivalent analyte model describes an analyte with two identical binding sites, only one of which initially binds to the ligand (Baumann, 1998). The second, free site can stabilise the ligand-analyte complex; this produces no additional response, but it changes the apparent equilibrium constant (Figure 6.7). The model usually fits data collected for antibody interactions well.

Figure 6.7: Bivalent analyte model. The analyte has two identical binding sites, but only one site binds to the ligand. Analyte A binds to ligand B on the surface to form the complex AB, resulting in one association and one dissociation rate constant.

The conformational change model describes a situation in which the analyte-ligand complex changes conformation after binding (Figure 6.8). Although the conformational change does not alter the bound mass, it still affects the response, because it shifts the equilibrium between the bound and free forms of the analyte. Such behaviour is seen in some receptor-hormone and antibody-antigen interactions (Jonsson, 1991).
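One common way to write the conformational change model is as a two-state reaction, A + B ⇌ AB (ka1, kd1) followed by AB ⇌ AB* (ka2, kd2). The sketch below integrates this scheme numerically, assuming that AB and AB* give identical responses because the bound mass is unchanged; the parameter values are illustrative. It shows that the conformational step shifts the overall equilibrium, giving an apparent KD of (kd1/ka1)/(1 + ka2/kd2).

```python
def simulate_two_state(conc_m, ka1, kd1, ka2, kd2, rmax, t_inject, dt=0.01):
    """Association phase of a two-state (conformational change) model.

    A + B <-> AB (ka1, kd1) followed by AB <-> AB* (ka2, kd2). Both AB and AB*
    contribute equally to the response because the bound mass is unchanged.
    Returns (AB, AB*) in RU after explicit Euler integration.
    """
    ab, ab_star = 0.0, 0.0
    for _ in range(int(round(t_inject / dt))):
        free_sites = rmax - ab - ab_star
        d_ab = ka1 * conc_m * free_sites - kd1 * ab - ka2 * ab + kd2 * ab_star
        d_ab_star = ka2 * ab - kd2 * ab_star
        ab += d_ab * dt
        ab_star += d_ab_star * dt
    return ab, ab_star


def apparent_kd(ka1, kd1, ka2, kd2):
    """Overall equilibrium dissociation constant of the two-state scheme."""
    return (kd1 / ka1) / (1.0 + ka2 / kd2)


if __name__ == "__main__":
    ab, ab_star = simulate_two_state(1e-6, 1e4, 1e-2, 1e-2, 1e-2, 50.0, 2000.0, dt=0.05)
    print(round(ab, 2), round(ab_star, 2), apparent_kd(1e4, 1e-2, 1e-2, 1e-2))
```

In this example the first step alone has KD1 = 1 µM, but because the conformational step has an equilibrium constant of 1, the apparent KD halves and the total response at 1 µM analyte rises from 25 to about 33 RU.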



Figure 6.8: Conformational change model. The analyte-ligand complex changes conformation after binding, forming the complex AB and resulting in one association and one dissociation constant.

6.5 Deviation from 1:1 model

A simple bimolecular interaction model does not always fit the data. Experimental artifacts such as surface-imposed heterogeneity, mass transport, aggregation, crowding, matrix effects and non-specific binding can complicate binding responses. However, careful experimental design can avoid many of these unwanted effects (Svitel et al., 2003). Some of these are discussed briefly below.

6.5.1 Sample purity

Because biosensors detect only molecules that interact with the sensor surface, experiments can in principle be conducted with impure samples. However, using impure samples for kinetic analysis has several drawbacks. First, impurities may interact non-specifically with the surface of the chip. To avoid this, pure samples should be used, and these should be tested for non-specific binding by injecting the sample, at the highest analyte concentration, over a reference surface (a surface without immobilised ligand). If there is still considerable background response, the experimental conditions should be modified. For example, basic proteins interact electrostatically with the negatively charged carboxymethyl-dextran matrix. This effect can be reduced by increasing the ionic strength of the buffer or by changing the charge of the matrix by blocking the surface with amines. Other surfaces that can minimise non-specific binding include a dextran matrix with lower charge, a flat carboxylated surface with no dextran and a plain gold surface. Second, impurities may contribute to the bulk refractive index change during association, which can be subtracted using a reference cell. Artefacts such as bulk refractive index changes, matrix effects, non-specific binding, injection noise and baseline drift due to temperature variation can be corrected for by using a reference surface. Ideally, the reference surface should be treated with the same chemicals used to immobilise the ligand, so that the environments within the two matrices are similar. Third, the active concentration of the analyte, which is needed to determine rate constants, may not be quantified accurately.

6.5.2 Aggregation state

It is very important to determine the aggregation state of the sample, because samples that self-associate can complicate the binding response. The concentration of analyte aggregates in solution may be low, but the local analyte concentration can become very high once it is bound to the surface. For example, a 1000 RU signal corresponds to a protein surface concentration of about 1 ng/mm² on the biosensor surface (Davis et al., 1998). It is therefore crucial to confirm that the sample does not aggregate even at such high concentrations, which can be checked on a non-reducing gel or by size exclusion chromatography or analytical ultracentrifugation.
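The 1000 RU ≈ 1 ng/mm² rule of thumb can be converted into an approximate local concentration, which shows why aggregation at the surface is a concern. The sketch assumes that the bound protein is spread evenly through the ~100 nm dextran interaction layer described in Section 6.3, and uses a hypothetical 10 kDa protein.

```python
def local_concentration_molar(response_ru, mw_g_per_mol, layer_thickness_nm=100.0):
    """Approximate molar concentration of bound protein within the dextran layer.

    Assumes 1000 RU ~ 1 ng/mm^2 and that the bound protein is spread evenly
    through a layer of the given thickness.
    """
    mass_g_per_mm2 = response_ru / 1000.0 * 1e-9          # ng/mm^2 -> g/mm^2
    volume_l_per_mm2 = layer_thickness_nm * 1e-6 * 1e-6   # nm -> mm, then mm^3 -> L
    grams_per_litre = mass_g_per_mm2 / volume_l_per_mm2
    return grams_per_litre / mw_g_per_mol


if __name__ == "__main__":
    # Hypothetical 10 kDa protein giving a 1000 RU response.
    print(local_concentration_molar(1000.0, 10_000.0))
```

For 1000 RU of a 10 kDa protein this gives about 10 g/L, or roughly 1 mM, far above the 0.312 to 10 µM concentrations injected in this study.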

6.5.3 Mass-transport

The occurrence of mass-transport effects can be determined by examining the effect of changing the flow rate on the observed association rate. This is readily achieved by injecting the same concentration of analyte over the immobilised surface at three different flow rates. If the binding response differs between flow rates, the reaction is influenced by mass transport. Mass-transport limitation essentially reflects insufficient delivery of the mobile analyte to the sensor surface, and as a consequence hinders analysis of the chemical kinetics of bimolecular reactions (Myszka et al., 1998).

To limit the effect of mass transport, experiments should be conducted at high flow rates, which help to maintain a constant analyte concentration at the sensor surface. Mass-transport problems can also be minimised by using a low surface density of ligand. In addition, the Biacore BIAevaluation software includes models that take mass transport into account when it has been shown to occur.
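Both the three-flow-rate test and the advice to use low ligand densities can be rationalised with the initial binding rate of a quasi-steady-state two-compartment approximation, ka·C·Rmax/(1 + ka·Rmax/kt). The sketch below assumes this approximation and the commonly cited scaling of the mass-transport coefficient kt with the cube root of the flow rate; the function names and the kt, ka and Rmax values are illustrative assumptions.

```python
def kt_at_flow(kt_ref, flow_ref, flow):
    """Scale a mass-transport coefficient with the cube root of the flow rate."""
    return kt_ref * (flow / flow_ref) ** (1.0 / 3.0)


def initial_binding_rate(conc_m, ka, rmax, kt):
    """Initial slope dR/dt (RU/s) at R = 0 in the two-compartment approximation."""
    return ka * conc_m * rmax / (1.0 + ka * rmax / kt)


def flow_rate_ratios(conc_m, ka, rmax, kt_ref, flow_ref, flows):
    """Initial slope at each flow rate relative to the first flow rate in `flows`.

    Ratios close to 1.0 suggest the association phase is not transport limited.
    """
    rates = [initial_binding_rate(conc_m, ka, rmax, kt_at_flow(kt_ref, flow_ref, q))
             for q in flows]
    return [r / rates[0] for r in rates]


if __name__ == "__main__":
    flows = [5.0, 15.0, 50.0]  # ul/min, three flow rates as in the test described above
    # Illustrative values: ka = 1e6 M^-1 s^-1, kt = 1e8 RU M^-1 s^-1 at 50 ul/min.
    for rmax in (1000.0, 30.0):  # high versus low ligand density
        ratios = flow_rate_ratios(1e-6, 1e6, rmax, 1e8, 50.0, flows)
        print(rmax, [round(x, 2) for x in ratios])
```

With these values the high-density surface binds about twice as fast at 50 µl/min as at 5 µl/min, while the low-density surface changes by only about a quarter, illustrating why a low Rmax and a high flow rate together reduce mass-transport artefacts.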



6.5.4 Steric Hindrance

A number of studies have shown that high local concentrations of immobilised ligand can lead to steric hindrance, which restricts access of the analyte to the ligand binding site (Schuck, 1996; Schuck, 1997b). The amount of ligand immobilised on the surface is therefore another important experimental parameter, because ligand density determines the binding capacity. For kinetic experiments, it is recommended to use the lowest-capacity surface possible, as this minimises artifacts such as mass transport, steric hindrance, crowding and aggregation. Binding curves with maximum responses as low as 50 RU can still be measured readily.

6.5.5 Minimal non-specific binding

It is always very important to establish experimental conditions that produce minimal non-specific binding (David, 1999; Morton et al., 1998). Non-specific interactions can be measured by testing binding of the analyte at high concentrations over a non-derivatized or reference cell of the sensor chip. One approach to reducing non-specific binding is to change the buffer conditions. In our experiments, as in other nucleic acid-protein interaction systems (Katsamba et al., 2001), the standard buffer (10 mM Tris-HCl, pH 7.4, 150 mM NaCl, 0.5% Triton X-100, 2 mM DTT and EDTA) included 62.5 μg/ml bovine serum albumin (BSA) and 125 μg/ml tRNA to prevent non-specific binding. The tRNA acts as a non-specific competitor. Even with sequence-specific binding, intermediate complexes may form in which the protein is bound to the oligonucleotide non-specifically and with lower affinity. When excess competitor nucleic acid is present, protein that is not bound in a sequence-specific manner binds preferentially to the more abundant competitor rather than to the probe, thus avoiding the formation of non-specific complexes. The tRNA therefore provides an alternative target for non-specific binding of the analyte, reducing the non-specific contribution to the measured response. BSA is also added to block non-specific protein binding, by occupying sites on the ligand to which the analyte might otherwise bind non-specifically. In addition, the first runs after a desorb or cleaning procedure on the Biacore instrument can suffer from adsorption of the analyte to the tubing and the walls of the integrated microfluidic cartridge (IFC); a pre-run with a high concentration of a protein such as BSA can reduce this effect (BIACORE, 1997). BSA also helps to stabilise some proteins and is therefore used as a carrier protein.

6.6 Results

6.6.1 Binding measurements of αCP1-KH domains to the 30 nucleotide 3’UTR of AR mRNA

I first set out to determine the binding affinities of the single KH domains of αCP1 for their binding site within the 30-nucleotide 3’UTR of AR mRNA, nt 3275 to 3325 (5’-CUCUCCUUUCUUUUUCUUCUUCCCUCCCUA-3’), identified by Yeap et al. (2002). The nucleotide sequence of AR mRNA used in this study contains two cytosine triplets at the 3’ end of the sequence. We used an SA chip for the analysis. The first flow cell served as the reference flow cell, and 5’-biotinylated RNA representing the 30-nt target sequence from the 3’UTR of AR mRNA was immobilised in the second flow cell (Methods Section 2.9.2). This produced a stable and homogeneous recognition surface, which is important for detailed kinetic analysis. A high flow rate (50 μl/min) and a low-capacity surface (30 RU) were used to minimise mass transport and steric hindrance. After immobilisation of the ligand, the sensor surface was subjected to several injections of 2 M NaCl to test its integrity. To collect binding kinetic data, a protein concentration series from 10 μM down to 0.625 μM or 0.312 μM was injected simultaneously over the immobilised surface and the reference surface. The protein concentrations were determined as described in Methods Section 2.5. Responses from the reference surface were used to correct for refractive index changes and instrument noise, giving high-quality sensor data. Each experiment was conducted multiple times, and the data from replicate experiments overlapped closely. Representative sensorgrams are shown in Figure 6.9.



Figure 6.9: Binding studies of αCP1-KH1, KH2 and KH3 with the RNA sequence 5’-CUGGGUUUUUUUUUCUCUUUCUCUCCUUUCUUUUUCUUCUUCCCUCCC-3’, representing the 30 nucleotide sequence in the 3’UTR of AR. 30 RU of biotinylated RNA was immobilized on an SA chip. Binding interactions were measured for a dilution series of the αCP1-KH1, KH2 and KH3 domains from 10 to 0.312 µM (10, 5, 2.5, 1.25, 0.625 and 0.312 µM), injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min.

Visual inspection of the binding responses for the three KH domains of αCP1 shows that each KH domain interacts with the target sequence to a different degree. αCP1-KH1 displays a fast on-rate and a relatively slow dissociation rate. αCP1-KH3 displays very fast association and dissociation rates, apparent from the steep return of the response curves to baseline. αCP1-KH2 shows no detectable binding.

The maximum response (Rmax) of each αCP1-KH domain to the RNA surface also differs. αCP1-KH1 has the highest maximum response, followed by αCP1-KH3, while αCP1-KH2 shows no apparent binding. From the αCP1-KH1 and KH3 maximum responses, the stoichiometry of the surface molecular complex can be calculated using the equation below:

$$\text{Stoichiometry} = \frac{R_{max}}{\text{ligand level (RU)}} \times \frac{MW_{ligand}}{MW_{analyte}} \qquad \text{(Eq. 11)}$$

where Rmax is the analyte binding capacity, which can be extrapolated from the experimental data, the ligand level (RU) is the amount of ligand immobilized on the sensor surface, and MW is the molecular weight.
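Eq. 11 and its rearrangement for the theoretical Rmax are simple to evaluate by hand, but a small helper avoids unit slips. The molecular weights in the usage section below are placeholders, not the measured masses of the αCP1-KH constructs or the AR oligonucleotides, and the helper names are hypothetical.

```python
def binding_stoichiometry(rmax_ru, ligand_ru, mw_analyte, mw_ligand):
    """Observed analyte:ligand ratio (Eq. 11)."""
    return (rmax_ru / ligand_ru) * (mw_ligand / mw_analyte)


def theoretical_rmax(ligand_ru, mw_analyte, mw_ligand, sites_per_ligand):
    """Rmax expected if every one of `sites_per_ligand` sites is occupied."""
    return ligand_ru * (mw_analyte / mw_ligand) * sites_per_ligand


if __name__ == "__main__":
    # Placeholder molecular weights (g/mol), not the real constructs.
    mw_kh, mw_oligo = 9_000.0, 10_000.0
    print(theoretical_rmax(30.0, mw_kh, mw_oligo, 2))       # two C-triplet sites
    print(binding_stoichiometry(27.0, 30.0, mw_kh, mw_oligo))
```

Comparing the observed Rmax with the theoretical Rmax for two sites per oligonucleotide gives the same information as the stoichiometry, expressed as a fraction of full occupancy.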

The nucleotide sequence of AR mRNA used in this study contains two cytosine triplets at the 3’ end, which constitute two αCP-KH domain target sites. The KH domains can therefore potentially bind to the surface-immobilised oligonucleotide with 2:1 stoichiometry. The stoichiometries observed for αCP1-KH1 and αCP1-KH3 using equation 11 were 0.91 and 1.1 respectively, indicating substoichiometric binding of the protein to the RNA surface. This is likely to result from some of the RNA binding sites being unavailable, either because the RNA adheres to the chip in a way that blocks them or because the RNA forms secondary structures.

The data for αCP1-KH1 and αCP1-KH3 were therefore analysed using the steady-state method. The dissociation equilibrium constants (KD) are shown in Table 1. The KD values for both αCP1-KH domains indicate moderate binding affinity. However, αCP1-KH1 had a higher affinity for the RNA target than αCP1-KH3. This is reflected in the much slower dissociation of αCP1-KH1 from the RNA target, demonstrating the stability of the αCP1-KH1/RNA complex over time. In contrast, the αCP1-KH3/RNA complex had a much shorter half-life, as is apparent from its rapid dissociation.
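For first-order dissociation, the half-life of a complex follows directly from kd as t½ = ln 2/kd. The sketch below uses hypothetical kd values to show the scale of difference between a slowly and a rapidly dissociating complex; they are not fitted values from Figure 6.9, and the helper names are illustrative.

```python
import math


def complex_half_life(kd_per_s):
    """Half-life (s) of a complex with first-order dissociation rate constant kd."""
    return math.log(2.0) / kd_per_s


def fraction_remaining(t_s, kd_per_s):
    """Fraction of complex still bound t seconds into the dissociation phase."""
    return math.exp(-kd_per_s * t_s)


if __name__ == "__main__":
    for kd in (1e-3, 1e-1):  # hypothetical slow and fast off-rates, s^-1
        print(kd, round(complex_half_life(kd), 1), "s")
```

A kd of 1 × 10⁻³ s⁻¹ corresponds to a half-life of about 11.6 min, whereas 0.1 s⁻¹ gives about 7 s, so a complex of the second kind is essentially gone within the first few tens of seconds of the dissociation phase.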



To study the kinetics of the αCP1-KH/RNA interactions, the kinetic data in Figure 6.9 were fitted to a simple 1:1 Langmuir model and to the other models available in the BIAevaluation software. None of these models fitted the association and dissociation curves well, indicating a complex interaction that may involve both core and secondary binding interactions.

Consistent with previous reports, no binding interaction was detectable between αCP1-KH2 and RNA, even at elevated protein concentrations (Dejgaard and Leffers, 1996; Sidiqi et al., 2005a). Although CD spectropolarimetry was consistent with a folded αCP1-KH2 domain, and the missing final α-helix is not on the oligonucleotide-binding face of the KH domain (Chapter 3, Section 3.1), it is possible that the truncation affected the ability of αCP1-KH2 to bind RNA. The ability of αCP1-KH2 to participate in oligonucleotide binding therefore remained uncertain from this experiment.

6.6.2 Binding measurements of αCP1-KH domains to a DNA sequence representing the 30 nucleotide 3’UTR of AR mRNA

This experiment aimed to determine whether the binding affinities of the αCP1-KH domains for a DNA sequence corresponding to the 30-nucleotide 3’UTR of AR differ from those for the RNA sequence. To study the kinetics of the αCP1-KH/DNA interaction on the biosensor, chemically synthesised 5’-biotinylated DNA was captured on one SA chip flow cell, while a second, unmodified flow cell served as the reference surface. Responses from the reference surface were used to correct for refractive index changes and instrument noise, producing high-quality sensor data. A representative set of sensorgrams for the αCP1-KH1, KH2 and KH3/DNA interactions is shown in Figure 6.10.



Figure 6.10: Binding studies of αCP1-KH1, KH2 and KH3 with the DNA sequence 5’-CTGGGTTTTTTTTTCTCTTTCTCTCCTTTCTTTTTCTTCTTCCCTCCC-3’, representing the 30 nucleotides in the 3’UTR of AR. 30 RU of biotinylated DNA was immobilized on an SA chip. Binding interactions were measured for a dilution series of the αCP1-KH1, KH3 and KH2 domains from 10 to 0.312 µM (10, 5, 2.5, 1.25, 0.625 and 0.312 µM), injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min.

In contrast to RNA binding, all three αCP1-KH domains bound to the DNA sequence. Visual inspection of the binding responses shows that αCP1-KH1 and αCP1-KH2 both have relatively slow on- and off-rates, while αCP1-KH3 shows both a fast on-rate and a fast off-rate. The response curves also show that the maximum response of each αCP1-KH domain to the DNA surface differs: αCP1-KH1 has the highest maximum response, followed by αCP1-KH3 and KH2. The stoichiometry of the molecular complex was calculated for each αCP1-KH/DNA complex using equation 11. For αCP1-KH1 and αCP1-KH3 binding to DNA, the calculated stoichiometry was ~2.3, indicating that about two molecules of αCP1-KH1 or KH3 bind to each target DNA molecule. This agrees with the expected 2:1 theoretical response, as there are two cytosine triplets at the 3’ end of the target DNA sequence. However, for αCP1-KH2 the derived stoichiometry was 0.5, indicating a substoichiometric interaction of the protein with the DNA surface. The low stoichiometry may indicate that a fraction of the protein was either inactive or unable to access the ligand.

The data for the αCP1-KH domains were again analysed using the steady-state method. The dissociation equilibrium constants (KD) are shown in Table 2. The KD value for the αCP1-KH1 domain indicates moderate binding affinity, while the KD values of αCP1-KH2 and KH3 indicate low affinity. In both cases the half-lives of the complexes are clearly shorter than for αCP1-KH1. However, this is the first time that αCP1-KH2 has been reported to show some binding, albeit with a very low response.

The kinetic data in Figure 6.10 were fitted with a simple 1:1 Langmuir interaction model. The 1:1 model did not fit the association and dissociation curves of αCP1-KH1 and KH3 well, which may reflect a complex binding mechanism involving both core and secondary binding interactions. The data are nevertheless consistent with a slower off-rate and a faster on-rate underlying both the higher affinity of αCP1-KH1 relative to αCP1-KH2 and αCP1-KH3, and the higher affinity for AR DNA than for RNA. Other models, such as the bivalent analyte, heterogeneous ligand and heterogeneous analyte models, also did not produce good fits. The bivalent and heterogeneous analyte models were not expected to fit, as the αCP1-KH domains are monovalent and the αCP1-KH protein samples were pure and homogeneous. The heterogeneous ligand model also did not result in a good fit. This model assumes that one analyte binds independently to two ligand sites, which again is not possible for the αCP1-KH domains, as structural studies indicate that only one KH domain can bind to one triplet poly (C) site.

The only KH domain data that could be fitted using a simple Langmuir model were those for the interaction of αCP1-KH2 with DNA (Figure 6.11), which indicated a relatively fast on-rate and slow off-rate, giving a KD of 6.49 μM. This agreed closely with the steady-state KD of 5.46 ± 1.31 μM.

Figure 6.11: Kinetic analysis of αCP1-KH2 binding to DNA representing the 30 nucleotides of the 3’UTR of AR mRNA. 30 RU of biotinylated DNA was immobilized on an SA chip. Binding interactions were measured for a dilution series of the αCP1-KH2 domain (10, 5, 2.5, 1.25 and 0.625 µM), injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min. The black dotted lines represent the experimental data and the purple lines represent the best fit to a simple bimolecular reaction model, from which the binding constants were obtained. Steady-state KD represents the equilibrium constant from steady-state analysis. A KD of 5.46 µM is indicative of weak affinity.

Overall, the αCP1-KH domain RNA and DNA binding studies demonstrated that the three KH domains prefer the DNA form of the AR 3’UTR sequence over the RNA form. However, it is not known to what extent this reflects RNA secondary structure blocking binding, or a genuine difference in the binding preferences of the KH domains for RNA versus DNA. Each isolated domain is capable of binding oligonucleotides independently, with widely differing affinities in the order αCP1-KH1 > KH3 > KH2.

This study revealed for the first time that αCP1-KH2 is capable of specific binding to DNA. The αCP1-KH2 construct used in the current study is truncated at the C-terminus compared with the other αCP-KH domains. The truncated C-terminal region was predicted to be unstable (Chapter 3, Section 3.1), and its removal proved necessary for protein production. Despite this, circular dichroism spectra and the improved expression and stability of the protein suggested that it had folded correctly. In addition, since the C-terminus lies away from the oligonucleotide-binding site, the truncation is unlikely to impinge on oligonucleotide binding. It is therefore likely that, when held in proximity to the oligonucleotide in the context of full-length αCP1, αCP1-KH2 participates in the oligonucleotide binding interaction, even in the case of RNA binding. A similar RNA-protein interaction has previously been described for the triple-RRM protein HuD (Park et al., 2000). Although HuD-RRM3 was not observed to bind RNA independently, it clearly contributed to the binding interaction in the context of the full-length molecule, primarily by reducing the off-rate of the protein-oligonucleotide interaction.

More detailed insight into the role of the αCP1-KH2 domain and its recognition of nucleic acid awaits structural information on the complex. However, these data are the first to demonstrate binding of αCP1-KH2 to DNA, albeit with low affinity, while binding to RNA remains undetectable. It will be of great interest to investigate the binding of a longer αCP1-KH2 construct, if it can be successfully prepared, to DNA and RNA in order to examine the role of the C-terminal helix.

6.6.3 αCP1-KH interaction with homopolymers

To understand the binding specificity of the αCP1-KH domains for their target site, a number of SPR experiments were conducted with the RNA homopolymers poly (C), poly (G) and poly (U). Biotinylated poly (C), poly (G) and poly (U) RNAs (30 RU each) were immobilised on flow cells 2, 3 and 4, respectively, and flow cell 1 was used as the reference cell. Protein samples were injected over the surface and the binding responses of αCP1-KH1, KH2 and KH3 at various concentrations were monitored. Representative sensorgrams are shown in Figure 6.12. Both αCP1-KH1 and KH3 bound to poly (C), but none of the KH domains bound to poly (G) or poly (U). In addition, αCP1-KH1 appears to form a more stable complex than αCP1-KH3, as indicated by its much slower dissociation, in direct contrast to the very fast off-rate of αCP1-KH3. αCP1-KH2 did not show binding to any of the RNA homopolymers.



A large negative bulk refractive index shift was present in the αCP1-KH2 response. This could be attributed to a number of factors. Ober and Ward (1999) showed that bulk shifts can vary considerably between the sensorgrams of the four channels; even when the running buffer is matched to the injected buffer, bulk shifts of up to about 100 RU have been observed. This variability is even more marked when larger bulk shifts are introduced, for example by using a sample buffer that is more dilute than the running buffer, which essentially occurs when the analyte is introduced in its own buffer. Ober and Ward also showed that these effects are further enhanced at low signal levels, which is likely in the case of αCP1-KH2, as it did not bind at all. Even after subtraction of the bulk shift from the data, large perturbations can remain in the flow cells (Ober and Ward, 1999). Extensive equilibration of the chip can minimise some of these effects; however, there is no evidence that equilibration removes them completely.
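The reference subtraction discussed above can be made concrete. A common refinement, double referencing, subtracts the reference flow cell signal (here flow cell 1) from the active cell and then subtracts a buffer-only (blank) injection processed the same way, removing most of the bulk shift and systematic offsets between channels. Whether double referencing was applied to these data is not stated here; the sketch below, with the hypothetical function `double_reference` and made-up toy traces, only illustrates the point-by-point arithmetic, assuming all traces are sampled at the same time points.

```python
def double_reference(active, reference, blank_active, blank_reference):
    """Return the double-referenced response (RU) at each time point.

    active / reference: analyte injection on the active and reference cells.
    blank_active / blank_reference: a buffer-only injection on the same cells.
    """
    n = len(active)
    if not (len(reference) == len(blank_active) == len(blank_reference) == n):
        raise ValueError("all traces must have the same number of points")
    return [
        # first remove the bulk shift, then the residual cell-to-cell offset
        (a - r) - (ba - br)
        for a, r, ba, br in zip(active, reference, blank_active, blank_reference)
    ]


# Toy traces (hypothetical): baseline, during injection, during dissociation.
# A 100 RU bulk shift appears on both cells during the injection.
corrected = double_reference([0, 112, 5], [0, 100, 0], [0, 3, 0], [0, 1, 0])
```

Referencing removes what is common to the channels; as noted above, channel-specific perturbations that are large relative to a weak or absent binding signal can still survive it.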

To verify that the αCP1-KH domains do not bind to poly (G) and poly (U), and that the lack of binding was not due to incorrect immobilisation of the homopolymers, we injected full-length HuR and the HuR RRM1/RRM2 domains and monitored the binding responses. HuR is an RRM-containing RNA-binding protein that binds to U-rich sites. The results are shown in Figure 6.13. As predicted, both full-length HuR and HuR RRM1/RRM2 bound to poly (U) but not to poly (C) or poly (G), confirming that the poly (U) surface was available for binding.

A number of models were fitted to the αCP1-KH1 and KH3 poly (C) homopolymer responses but, not surprisingly, none fitted very well. The homopolymer contains eight possible overlapping trinucleotide binding sites (CCCC CCCC CC), so more than one molecule of αCP1-KH domain could readily bind, making the binding response very complex. The stoichiometries of αCP1-KH1 and KH3 calculated using equation 11 were 1.1 and 1.3, respectively. Given that several domains could potentially bind each homopolymer, these values indicate a substoichiometric interaction of the protein with the RNA surface, suggesting that either a fraction of the protein was inactive or the RNA ligand was inaccessible for interaction with the protein.
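The site counting behind this argument can be illustrated with a short sketch. It counts overlapping C-triplet sites and, assuming that each bound KH domain occupies a four-nucleotide footprint (as seen in the αCP1-KH1/DNA structure) and that adjacent domains pack without overlapping, gives a crude upper bound on how many domains could be bound at once. The function names are hypothetical, and real occupancy on the chip may be limited further by steric crowding.

```python
def count_sites(seq, motif="CCC"):
    """Count overlapping occurrences of motif in seq (spaces are ignored)."""
    s = seq.replace(" ", "").upper()
    k = len(motif)
    return sum(1 for i in range(len(s) - k + 1) if s[i:i + k] == motif)


def max_simultaneous(seq, footprint=4):
    """Upper bound on domains bound at once, assuming non-overlapping footprints."""
    return len(seq.replace(" ", "")) // footprint


homopolymer = "CCCC CCCC CC"   # 10-mer written as in the text
single_site = "AAA AAA CCC A"  # single C-triplet probe (Section 6.6.4)
print(count_sites(homopolymer), max_simultaneous(homopolymer))  # 8 2
print(count_sites(single_site))                                  # 1
```

Many partly overlapping sites, of which only a subset can be filled at once, produce a mixture of binding modes; this is why a single 1:1 model is not expected to describe the homopolymer data, whereas the single-site probe should.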

The findings in this study are consistent with the results of previous studies using filter binding assays and SELEX methods (Dejgaard and Leffers, 1996), in which KH domains were shown to bind only poly (C) homopolymers and not poly (G) or poly (U). However, the importance of each of the three cytosines for αCP1-KH domain binding has never been investigated in detail. This is what I next aimed to do, as outlined in the following sections.

Figure 6.12: Binding studies of αCP1-KH1, KH2 and KH3 with the RNA homopolymers poly (C), (G) and (U). Biotinylated RNA (30 RU) was immobilized on a SA chip. Binding was measured for a dilution series (10 to 0.625 µM) of the αCP1-KH1, KH3 and KH2 domains, injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min.



Figure 6.13: Binding studies of HuR and HuR RRM1/RRM2 with the RNA homopolymers poly (C), (G) and (U). Biotinylated RNA (30 RU) was immobilized on a SA chip. Binding was measured for a dilution series of HuR and the RRM1/RRM2 domains (100, 50, 25, 12.5 and 6.25 nM), injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min.

6.6.4 αCP1-KH domain binding to single poly (C) site

My next experiment examined αCP1-KH domain binding to a simpler system. A single-site oligonucleotide probe was designed comprising a C-triplet embedded in a poly (A) sequence. This probe was anticipated to interact with the αCP1-KH domains with the simplest possible kinetics and a 1:1 stoichiometry, since adenine is reportedly unable to fit into the αCP1-KH domain oligonucleotide-binding cleft, as our structural studies also indicate.

I again wished to compare RNA and DNA binding and therefore the following probes were

designed:

1) 5’-biotin-AAA AAA AAA A-3’ (RNA, as a control)

2) 5’-biotin-AAA AAA CCC A-3’ (RNA)

3) 5’-biotin-AAA AAA CCC A-3’ (DNA)



To study the kinetics of αCP1-KH1 and KH3 interactions with the above probes on the biosensor (αCP1-KH2 was not included because of its already established low affinity for DNA and RNA), the RNA control probe, the RNA probe and the DNA probe were captured on SA chip flow cells 1 to 3, respectively. A representative data set for the binding reaction is shown in Figure 6.14. No binding was detected to the poly (A) RNA sequence. Furthermore, as predicted, both αCP1-KH1 and KH3 bound to the C-triplet RNA and DNA motifs. To analyse the binding responses quantitatively, a simple 1:1 Langmuir model was fitted to the curves and gave good fits, as shown in Figure 6.15. The rate constants from analysis of the kinetic data are shown in Table 3, along with the steady state equilibrium constant (SS KD). These data showed several interesting features.
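The simple 1:1 Langmuir model assumes A + L ⇌ AL, so the response obeys dR/dt = k_a·C·(R_max − R) − k_d·R. During the injection this integrates to R(t) = R_eq[1 − exp(−(k_a·C + k_d)t)], with R_eq = R_max·C/(C + K_D); once buffer replaces analyte, the complex decays as R_0·exp(−k_d·t), and K_D = k_d/k_a. The sketch below simulates one injection cycle under these assumptions (no mass-transport limitation, no bulk-shift artefacts). The function name and the rate constants in the example are hypothetical and are not the fitted values in Table 3.

```python
import math


def langmuir_cycle(ka, kd, rmax, conc, t_on, t_off, dt=1.0):
    """Simulate a 1:1 Langmuir sensorgram: association then dissociation.

    ka in M^-1 s^-1, kd in s^-1, conc in M, times in s, rmax in RU.
    Returns a list of (time_s, response_RU) pairs.
    """
    kobs = ka * conc + kd            # observed association rate constant (s^-1)
    req = rmax * ka * conc / kobs    # plateau, equal to rmax * C / (C + KD)
    trace = []
    n_on = int(round(t_on / dt))
    for i in range(n_on + 1):        # association phase, t = 0 .. t_on
        t = i * dt
        trace.append((t, req * (1.0 - math.exp(-kobs * t))))
    r0 = trace[-1][1]                # response when the injection ends
    n_off = int(round(t_off / dt))
    for i in range(1, n_off + 1):    # dissociation phase (buffer only)
        t = i * dt
        trace.append((t_on + t, r0 * math.exp(-kd * t)))
    return trace


# Hypothetical constants: ka = 1e4 M^-1 s^-1, kd = 0.05 s^-1, so KD = 5 uM
ka, kd = 1.0e4, 0.05
KD = kd / ka
# 10 uM analyte, 2 min contact time as in the figure captions, 5 min dissociation
trace = langmuir_cycle(ka, kd, rmax=100.0, conc=10e-6, t_on=120, t_off=300)
```

Simulating a dilution series this way shows why a fast-on, fast-off binder such as αCP1-KH3 gives almost square sensorgrams, whereas slower rate constants give the curved association and gradual dissociation seen for αCP1-KH1.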

Figure 6.14: Binding studies of αCP1-KH1 and KH3 to a 10-mer poly (A) (adenine) sequence and to a sequence containing a CCC (cytosine) triplet. Biotinylated RNA and DNA (30 RU) were immobilized on a SA chip. Binding was measured for a twofold dilution series of the αCP1-KH1 and KH3 domains (10, 5, 2.5, 1.25, 0.625 and 0.312 µM), injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min. αCP1-KH1 is characterized by slower association and dissociation rates, whereas αCP1-KH3 is characterized by fast association and dissociation rates. A similar pattern is observed for their interaction with DNA; however, αCP1-KH1 appears to prefer DNA to RNA, as evident from its higher response, while αCP1-KH3 seems to bind RNA and DNA equally well.



A visual inspection of the binding curves shows no significant difference between the RNA and DNA responses of the αCP1-KH domains. However, the maximum response of each αCP1-KH domain to the RNA and DNA surfaces differs. αCP1-KH1 gives rise to a lower maximum response than αCP1-KH3 and, in addition, has a lower response to RNA than to DNA. In contrast, αCP1-KH3 gives rise to the highest maximum response, with very similar responses for both RNA and DNA. The stoichiometry of each αCP1-KH/RNA and αCP1-KH/DNA complex was calculated using equation 11.
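Equation 11 is not reproduced in this section. The sketch below assumes it takes the widely used form R_max,theor = (MW_analyte / MW_ligand) × R_ligand × n, where R_ligand is the immobilised ligand level (here 30 RU) and n is the expected number of binding sites per ligand, with the stoichiometry taken as the observed R_max divided by R_max,theor. The function names are hypothetical and the molecular weights are placeholders, not the actual masses of the constructs or probes.

```python
def theoretical_rmax(mw_analyte, mw_ligand, ligand_ru, sites=1):
    """Maximum response (RU) if every immobilised ligand site were occupied."""
    return (mw_analyte / mw_ligand) * ligand_ru * sites


def stoichiometry(rmax_observed, rmax_theoretical):
    """Fraction of the theoretical capacity actually reached (1.0 = full 1:1)."""
    return rmax_observed / rmax_theoretical


# Placeholder masses in Da (hypothetical): ~8 kDa KH domain, ~3.2 kDa 10-mer probe
rmax_th = theoretical_rmax(8000, 3200, ligand_ru=30)  # 75.0 RU
ratio = stoichiometry(45.0, rmax_th)                   # 0.6
```

Because the SPR response scales roughly with bound mass, a value well below 1, such as 0.12, would mean that only about one immobilised probe in eight appears to carry a bound domain at saturation.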

In the case of αCP1-KH1, the calculated stoichiometry was 0.12 and 0.23 for RNA and DNA, respectively, describing a substoichiometric interaction of the protein with both the RNA and DNA surfaces. αCP1-KH3 also showed a substoichiometric interaction, with a value of 0.6 for both RNA and DNA. This low stoichiometry could be due to steric hindrance or a crowding effect of the ligand (Figure 6.16), which would block the analyte from gaining access to the ligand. The ligand could also have interacted with itself, forming secondary structures, or with the chip surface. Another possible reason for the low stoichiometry is impure ligand, in which only a small fraction of the immobilized material represents ligand molecules, although both RNA and DNA were purchased in purified form. Some of the ligand could also have been inactivated by the immobilization conditions.



Figure 6.15: Kinetic analysis of αCP1-KH1 and αCP1-KH3 binding to the triplet CCC RNA and DNA sequences. Biotinylated RNA or DNA (30 RU) was immobilized on a SA chip. Binding was measured for a twofold dilution series of the αCP1-KH1 and KH3 domains (10, 5, 2.5, 1.25, 0.625 and 0.312 µM), injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min. The black dotted lines represent the experimental data and the purple solid lines represent the best fit to a simple bimolecular reaction model.

Figure 6.16: Steric hindrance. Limited availability of the binding site, leading to substoichiometric binding, could result from steric hindrance. (1) The analyte (A) has easy access to the ligand (L); (2) the ligand is inaccessible; (3) binding of one analyte molecule may prevent access of another analyte molecule to the ligand.

Further analysis of each domain reveals that αCP1-KH1 binds more slowly to RNA than to DNA. The DNA response shows a faster association, indicated by the steep rise of the curve, which is not observed in the RNA response. Moreover, αCP1-KH1 forms a much more stable complex with DNA. This is apparent from the binding constants obtained from the 1:1 binding model (Table 3; Figure 6.15). The association rate for αCP1-KH1/DNA is ten times faster than that for αCP1-KH1/RNA, resulting in an equilibrium dissociation constant for αCP1-KH1/DNA about ten-fold lower than that for αCP1-KH1/RNA (4.5 versus 48 μM). The same trend is seen in the KD values obtained from steady-state analysis, although the kinetic and steady-state values differ by about ten-fold. These results suggest that αCP1-KH1 preferentially binds DNA.
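The relationship between the kinetic constants and KD shows why a ten-fold difference in association rate translates directly into a ten-fold difference in affinity when, as the comparison above implies, the dissociation rate constants for RNA and DNA are similar:

```latex
$$K_D = \frac{k_d}{k_a}, \qquad
\frac{K_D^{\mathrm{RNA}}}{K_D^{\mathrm{DNA}}}
= \frac{k_d^{\mathrm{RNA}}}{k_d^{\mathrm{DNA}}}\cdot\frac{k_a^{\mathrm{DNA}}}{k_a^{\mathrm{RNA}}}
\approx 1 \times 10 = 10$$
```

The measured constants give 48/4.5 ≈ 10.7, in line with this estimate.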

In contrast to αCP1-KH1, the association and dissociation rate constants of αCP1-KH3 did not differ significantly between RNA and DNA, consistent with the shape of the binding curves. This suggests that αCP1-KH3 binds the RNA and DNA sequences equally well. Similar binding to RNA and DNA has been described for the KH3 domain of the closely related protein hnRNP K. Based on NMR titrations of hnRNP K-KH3, the equilibrium dissociation constants obtained for the RNA sequence UCCC and the DNA sequence TCCC were 1.8 and 2.2 μM, respectively. Consistent with this, no close contacts or steric clashes were observed when RNA was modelled into the hnRNP K-KH3/DNA complex by replacing the H2’ hydrogen of the DNA deoxyribose with the 2’-hydroxyl group of the RNA ribose and replacing Thy2 with a uridine (Backe et al., 2005).

αCP1-KH2 was also excluded from this experiment because the protein had degraded, but we predict that it would also bind the DNA sequence. It will be of great interest to conduct these experiments with αCP1-KH2 in the future.

6.6.5 αCP1-KH domain binding specificity

Our earlier experiments revealed that the KH domains of αCP1 are capable of binding to both RNA and DNA sequences representing the 30-nucleotide region of the AR 3’UTR. In addition, they are able to bind a 10-mer RNA or DNA sequence containing only a single C-rich site (AAA AAA CCC A). Next, we investigated the significance of each cytosine in the C-rich site by testing whether the KH domains can tolerate mutation of any of the cytosines. We systematically mutated each cytosine (C) to thymine (T) since, being a pyrimidine, thymine should not be excluded from the αCP1-KH domain binding cleft by steric hindrance. Any loss of binding observed would therefore signify the loss of a specific interaction. DNA sequences were used for these experiments, as DNA was much easier to handle than RNA. The sequences were as follows:



1) AAA AAA TTT A (DNA control sequence known as TTT)

2) AAA AAA TCC A (DNA sequence known as ATCC)

3) AAA AAA CTC A (DNA sequence known as ACTC)

4) AAA AAA CCT A (DNA sequence known as ACCT)
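This probe set is a single-position substitution scan of the C-triplet. As an illustration only (the helper names `thymine_scan` and `in_triplets` are hypothetical), the mutants and their four-letter labels can be generated from the parent probe by replacing each cytosine of the triplet with thymine in turn, labelling each variant by the preceding adenine plus the mutated triplet, and adding the all-thymine control:

```python
def thymine_scan(parent="AAAAAACCCA", site="CCC"):
    """Return {label: sequence} for a C-to-T scan of one C-triplet site.

    Assumes the site is preceded by at least one base, which supplies the
    first letter of each four-letter label (e.g. ATCC).
    """
    start = parent.index(site)
    end = start + len(site)
    # all-thymine control
    variants = {"TTT": parent[:start] + "T" * len(site) + parent[end:]}
    for pos in range(start, end):
        mutant = parent[:pos] + "T" + parent[pos + 1:]
        label = mutant[start - 1:end]  # preceding base + mutated triplet
        variants[label] = mutant
    return variants


def in_triplets(seq):
    """Format a sequence in groups of three, as written in the text."""
    return " ".join(seq[i:i + 3] for i in range(0, len(seq), 3))


for label, seq in thymine_scan().items():
    print(label, in_triplets(seq))
# TTT AAA AAA TTT A
# ATCC AAA AAA TCC A
# ACTC AAA AAA CTC A
# ACCT AAA AAA CCT A
```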

To study the kinetics of αCP1-KH1 and KH3 interaction with the above probes on the

biosensor, the DNA control probe and the target DNA probes were captured on the SA chip

flow cells 1 to 4 in the same order as listed above. A representative data set for the binding

reaction is shown in Figures 6.17 and 6.18.

Figure 6.17: Binding studies of αCP1-KH1 to DNA sequences in which the triplet CCC site was systematically mutated to thymine. Biotinylated DNA (30 RU) was immobilized on a SA chip. Binding was measured for a twofold dilution series of the αCP1-KH1 domain (10, 5, 2.5, 1.25 and 0.625 µM), injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min. The triplet TTT sequence was used as a control, and αCP1-KH1 does not bind to it. αCP1-KH1 also does not bind to the TCC and CTC mutant sequences, but binds weakly to the CCT mutant sequence.



Figure 6.18: Binding studies of αCP1-KH3 to DNA sequences in which the triplet CCC site was systematically mutated to thymine. (A) Biotinylated DNA (30 RU) was immobilized on a SA chip. Binding was measured for a twofold dilution series of the αCP1-KH3 domain (10, 5, 2.5, 1.25 and 0.625 µM, from top to bottom), injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min. The triplet TTT sequence was used as a control, and αCP1-KH3 does not bind to it. αCP1-KH3 also does not bind to the TCC and CTC mutant sequences, but binds weakly to the CCT mutant sequence. (B) The chemical structures of thymine and cytosine.

No binding was detected to the TTT DNA sequence, as anticipated. Neither αCP1-KH1 nor αCP1-KH3 showed binding to the ATCC DNA sequence, showing that the KH domains do not tolerate another base in the second position. This matches well with the data obtained from our structural studies of αCP1-KH1 with DNA and with other structures such as αCP2-KH1/DNA, hnRNP K-KH3/DNA and Nova-2-KH3/RNA. In each case, a core motif of four nucleotides can be identified that is recognized by the KH domain, and each of these bases plays an essential role in binding. The four core bases essential for KH binding are TCCC (DNA), ACCC (DNA), T/CCCC (DNA) and UCAY (RNA, where Y is a pyrimidine) for αCP1-KH1, αCP2-KH1, hnRNP K-KH3 and Nova-2-KH3, respectively (Backe et al., 2005; Du et al., 2005; Lewis et al., 2000). Analysis of these structures reveals that the identity of the base in position 1 is very variable, and it is not involved in base-specific interactions. The bases in positions 1 and 2 act as a molecular prong holding onto helix 1 of the KH domain. However, recognition of the second nucleotide in the core sequence (a cytosine in all four complexes) is extremely specific. The cytosine in position 2 (Cyt2) is specifically recognised by a number of hydrogen bonds in all the complexes. In particular, a conserved arginine forms hydrogen bonds from NH1 and NH2 to the O2, C2, N3 and C4 atoms of the cytosine pyrimidine ring. In addition, the Gly22 backbone carbonyl is positioned to make a specific hydrogen bond contact with the Cyt N4. This recognition mode is present in all the complexes, and this unique set of hydrogen bonds is only compatible with a cytosine in this position. Consistent with our SPR data, the equivalent interactions are not possible when the base in this position is mutated to thymine. Furthermore, van der Waals contacts are mediated through a set of conserved hydrophobic residues, and these appear to be pyrimidine-specific rather than cytosine-specific. However, our SPR results show that interactions made with the conserved hydrophobic residues alone are not sufficient for KH binding. Binding requires the complete set of hydrogen bonds as well as the hydrophobic interactions, which can only be formed if a cytosine is present in position 2 of the core sequence.

αCP1-KH1 and KH3 were also intolerant of mutation of the cytosine in position 3. When this cytosine was mutated to thymine (ACTC), binding was again completely abolished, as shown in Figures 6.17 and 6.18. This is also consistent with structural studies. In the case of poly (C) binding proteins, Cyt3 is also involved in a specific network of hydrogen bonds. Here, a conserved arginine and isoleucine make extensive hydrogen bonds to the N3, N4, O2 and C2 of the cytosine (Figure 6.19). Again, it is evident that this unique set of hydrogen bonds is only possible with a cytosine in this position. Interestingly, the third residue in the core tetranucleotide for Nova-2-KH3 is an adenine, which forms the same hydrogen bonding partners as a cytosine in this position: N6 of the adenine and N4 of the cytosine each hydrogen bond to the backbone carbonyl oxygen of a conserved isoleucine residue. It is not known whether αCP1-KH domains would similarly tolerate adenine.

Figure 6.19: Schematic depicting the αCP1-KH1/DNA complex. The four core bases TCCC are shown in orange, making contact with a number of amino acids shown in element colours. These contacts are essential for KH binding.

Lastly, we mutated the fourth cytosine (ACCT), and here, interestingly, the domains partially tolerated thymine, as evident from the small response in the binding curves. The fact that some binding was observed for ACCT but not for ATCC suggests that the αCP1-KH domains bind in the intended register, rather than simply accommodating a CC in the second and third positions. In addition, αCP1-KH3 gave a higher response than αCP1-KH1. Recognition of the last residue of the tetranucleotide is less specific in all the complexes. In hnRNP K-KH3, Cyt4 is recognised through a water-mediated hydrogen bond to the protein, while in αCP2-KH1 and Nova-2-KH3 the fourth base is stabilised by stacking with the bases on either side, with van der Waals contacts maintained by conserved amino acid residues and a hydrogen bond to Glu51. In our αCP1-KH1/DNA structure, this residue is a cytosine, and the structure reveals that it is not as extensively contacted by the KH domain as the first three bases. It also shows that conserved amino acid residues are positioned close by and are able to make contacts with the O2 and N3 atoms of the pyrimidine ring. Therefore, irrespective of whether this position contains a cytosine, uracil or thymine, some contact is still possible.

6.7 Conclusions

There are several structural studies of single KH domains complexed with DNA and RNA, which have provided great insight into the molecular basis of their sequence-specific binding. However, there is little in the literature regarding the binding affinities and kinetics of KH domains, in particular of αCP1-KH domains. This study represents the first detailed analysis of the binding of αCP1-KH domains to RNA and DNA sequences. The findings verify some of the results predicted from structural studies, especially the requirement for a core tetranucleotide recognition sequence for maximal KH binding. Our mutational analysis of the core bases confirmed the importance of cytosine in positions two and three. Furthermore, this is the first time that binding of αCP1-KH2 to DNA has been observed. The binding is consistent with the conservation of amino acid residues shown to be important for αCP-KH domain architecture and oligonucleotide binding. If such an interaction also occurs in vivo, then it is likely that, when held in proximity to oligonucleotide in the context of full-length αCP1, αCP1-KH2 participates in the oligonucleotide binding interaction, even in the case of RNA.

This study showed that isolated αCP1-KH domains prefer the DNA sequence over the RNA sequence of AR mRNA, but this preference was not consistently observed with a simple 10-mer sequence containing only one cytosine triplet: αCP1-KH1 exhibited a stronger affinity for the 10-mer DNA, whereas αCP1-KH3 did not show any significant difference in affinity between the 10-mer RNA and DNA. What can be concluded from this study is that each of the individual domains can function as a discrete and independent RNA and DNA binding unit, albeit with different levels of binding activity. These results suggest that differences in αCP1 binding activity towards RNA and DNA sequences arise from binding by different KH domains. Differential αCP1-KH domain behavior has also been shown in previous studies. For example, αCP1 participates in the regulation of the mouse μ-opioid receptor (MOR) gene. EMSA studies have shown that MOR DNA binding activity is mainly due to the KH1 domain, with some activity from the KH2 and KH3 domains of αCP1 (Malik et al., 2006). Another study has shown that the KH3 domain of hnRNP K, another poly (C) binding protein, is the main domain recognizing DNA sequences (Braddock et al., 2002a; Paziewska et al., 2004). These studies highlight structural differences between αCP1 and hnRNP K: both can bind C-rich sequences, but primarily through different domains.

Each KH domain of αCP1 may behave differently in the context of the full-length protein.

For example, αCP1 can bind to the α-globin 3'-UTR effectively, leading to stabilization of

the mRNA, but hnRNP K cannot (Waggoner and Liebhaber, 2003b). In addition, αCP1 can

bind the MOR DNA sequence (which contains a poly (C) sequence), whereas hnRNP K cannot. These studies

raise an interesting question of whether the differences in binding of these two proteins

result from protein topology differences or structural cooperativity between the different

domains.

To study more stringently the binding behavior of αCP1-KH domains and their preferences for RNA or DNA sequences, it will also be important to consider combinations of two sequential domains. This may reveal additive or synergistic effects of the combined domains on overall binding and may shed light on sequence preference. However, it is important to note that, in biological systems, binding of αCP1-KH domains to either RNA or DNA sequences may lead to modulation of gene expression.

Chapter 7: General Discussion and Future Work

7.1 Chapter Overview

We now appreciate that mRNA stability represents a key point of regulation in the control of gene expression for a vast array of molecules. The length of time an mRNA transcript spends in the cytoplasm or nucleus prior to degradation can have a profound influence on the amount of the final protein product in the cell, which in turn can modulate the biological activity of the cell. Some mRNAs have half-lives of only minutes, whereas others persist for several hours (Chkheidze et al., 1999; Jacobson and Peltz, 1996). Given that a large range of proteins involved in key biological processes are regulated at the level of mRNA decay, the field has focused on understanding the functional and structural biology associated with basic protein-mRNA interactions.
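The link between mRNA half-life and transcript abundance can be made explicit with the simplest first-order model of mRNA turnover. This is a textbook simplification, not a model developed in this thesis, and it assumes a constant synthesis rate k_s and a first-order decay rate constant k_deg:

```latex
$$\frac{dm}{dt} = k_s - k_{\mathrm{deg}}\, m, \qquad
k_{\mathrm{deg}} = \frac{\ln 2}{t_{1/2}}, \qquad
m_{\mathrm{ss}} = \frac{k_s}{k_{\mathrm{deg}}} = \frac{k_s\, t_{1/2}}{\ln 2}$$
```

At equal synthesis rates, the steady-state transcript level scales linearly with half-life: with illustrative half-lives of 10 min and 10 h, the longer-lived transcript would accumulate to 60 times the level of the shorter-lived one (600 min / 10 min = 60).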

The identification of mRNA cis-elements that interact with trans-acting RNA-binding proteins has enabled detailed characterisation of the molecular mechanisms involved in regulating mRNA decay. The cis-element that I focused on in this thesis resides in the 3’UTR of AR mRNA: a highly conserved, UC-rich 50-nt sequence.

The AR plays a critical role in the growth of prostate cancer. Prostate cancer constitutes a major health issue in Western countries, where it is now the second leading cause of cancer deaths in men (Heinlein and Chang, 2004). Of both scientific and clinical interest is defining the mechanisms that modulate AR gene expression, which would be of great value for the development of novel prostate cancer therapies. AR expression is maintained throughout prostate cancer progression, and the majority of androgen-independent or hormone-refractory prostate cancers express AR. Mutations of AR, which particularly affect AR ligand specificity, may contribute to the progression of prostate cancer and the failure of endocrine therapy by permitting AR transcriptional activation in response to antiandrogens. In

addition, differences in the relative expression of AR coregulators have been found to occur

with prostate cancer progression and may contribute to differences in AR ligand specificity

or transcriptional activity (Heinlein and Chang, 2004).

The UC-rich cis-element of AR is thus an ideal target for investigation, as the AR mRNA is

regulated significantly at the level of stability. The element is a target for at least two



families of RNA-binding proteins and reporter assays indicate the importance of the

element in regulating AR mRNA turnover.

The present study was designed to explore the mechanism of binding of αCP1 to the target

AR UC-rich sequence at the 3’UTR of AR mRNA, in order to begin to understand how the

larger HuR/αCP1 complex might regulate the stability of AR mRNA. My aims were to

determine the structural basis of αCP1 and its binding to the C-rich region of AR mRNA,

with reference to its affinity and specificity. In addition, I aimed to characterise the kinetics

and binding affinities of the isolated αCP1-KH domains 1, 2 and 3 with target probes, as well

as a variety of other RNA and DNA probes.

7.2 Stability of αCP1-KH domains

Previous studies have suggested that the minimum size of a KH domain is 68-72 amino acids, with the following domain boundaries: αCP-KH1 15-80, αCP-KH2 100-167 and αCP-KH3 282-348 (Ito et al., 1994). Our αCP1-KH domain constructs of 62-65 amino acids proved unstable, and the protein aggregated in the insoluble fraction. Each domain was missing approximately 10 amino acid residues from the C-terminus when compared with the domain boundaries of previous studies (Dejgaard and Leffers, 1996), suggesting that the aggregation of the protein may have been due to the missing residues. It appeared that the C-terminus conferred a stabilizing effect upon the domain. The missing C-terminal residues corresponded to almost half of the third α-helix. The absence of this helix could expose some of the hydrophobic core residues to a hydrophilic environment and hence cause aggregation of the protein. This stabilizing effect of the C-terminus was supported by the absence of aggregation when the αCP1-KH1 and KH3 domains were extended at the C-terminus. However, the same extension did not produce soluble protein for KH2. Only once were we able to obtain a small amount of the extended KH2 protein, and it was not functional. Interestingly, a soluble αCP1-KH2 domain was only obtained when the third helix was excluded. Based upon the presence of peptide sequences commonly found in unstable proteins, the C-terminal region of αCP1-KH2 in particular is predicted to render this domain unstable. Our truncated αCP1-KH2 not only proved to be stable but was also functional. Circular dichroism data confirmed that the protein was correctly folded, and it bound oligonucleotides, contrary to data from previous studies (Dejgaard and Leffers, 1996).

7.3 Structural studies of αCP1

It was assumed that αCP interacts with AR mRNA through its KH domains, based on structural studies of single KH motifs in the presence of RNA. Our structural and molecular dynamics studies of αCP1-KH3 not only reveal that αCP1-KH3 adopts a classical type I KH domain fold, with a triple-stranded β-sheet held against a three-helix cluster in a βααββα configuration, but our homology model of αCP1-KH3 with poly (C) RNA also provided insight into the molecular basis for oligonucleotide binding and poly (C) RNA specificity, as an initial step towards characterising the full-length protein.

Structural analysis of several KH domains in the presence of oligonucleotide has revealed that the main oligonucleotide contacts involve the narrow hydrophobic cleft that runs between α-helix 2 and β-strand 2 and across the GXXG motif. It is thought that the narrowness of the cleft confers the specificity of these KH domains for pyrimidines. Likewise, αCP1-KH3 possesses a narrow hydrophobic cleft that would be expected to accommodate pyrimidine-rich RNA or ssDNA, rather than the larger purine bases (Chapter 4). Specificity for cytosine over uracil or thymine can also be rationalized on the basis of specific hydrogen bond interactions with the cytosine C2 carbonyl, N3 and C4 functionalities. Preferential binding to RNA over ssDNA could be explained in part by intermolecular hydrogen bonding of the sugar hydroxyls. It may also be that a poly (C) RNA oligonucleotide is able to contour perfectly in the binding cleft, with inter-nucleotide hydrogen bonds from sugar hydroxyls stabilizing this conformation. On the other hand, C-rich ssDNA has been shown to adopt very similar interactions with hnRNP K-KH3, a closely related KH domain, and is reported to bind as well as, if not better than, RNA (Braddock et al., 2002a).

Our study of a single isolated KH domain provides insight into the way in which single KH

domains recognize RNA, but does not reveal the way in which tandem repeats interact with

RNA. The presence of multiple KH domains in αCP proteins raises the question of which

domain or combination of domains dictates RNA-binding specificity and affinity.



Previous studies of αCP isoforms have revealed that the optimal target sequence for αCP2 encompasses three short C-stretches within the RNA target, suggesting that each of the three KH domains may play a role in binding to the RNA (Chkheidze et al., 1999). This mode of nucleic acid/αCP interaction would increase the sequence specificity and affinity of the interaction, potentially by maintaining the interacting partners in a biologically significant conformation. In another study (Thisted et al., 2001), the interaction of αCP1 or αCP2 with the C-rich sequence of the 5’UTR of poliovirus RNA was shown to be mediated via KH1. Although all three KH domains are capable of binding nucleic acids, here they differ functionally. In addition, for hnRNP K, which is closely related to the αCP proteins in both the number and organization of its KH domains, the optimal target sequence has been shown to be a single short C-stretch (Thisted et al., 2001). These data suggest that whereas a single KH domain in hnRNP K mediates a high-affinity interaction, a tandem array of three C-patches maximises αCP binding to its RNA target. Therefore, the binding of αCP to its optimised target might reflect individual interactions by each of the three KH domains. The questions of how these multiple contacts are organised and why the closely related αCP and hnRNP K proteins differ in their RNA binding await further structural studies of the respective RNA-protein complexes.

It is likely that the three KH domains of αCPs act synergistically within the protein and

thereby modulate the overall affinity of the individual domains. An example of KH-domain

collaboration is seen for FUSE binding protein in which KH domains 3 and 4 are both

necessary and sufficient for the binding of the protein to the DNA promoter region

upstream of c-myc (Braddock et al., 2002b).

The tandem arrangement of poly (C) stretches in the human AR C-rich motif is very similar to that of other well-characterized nucleic acid targets of the αCP1/2 proteins, such as the 3’UTR of α-globin mRNA. Three C-rich stretches are present in the 3’UTR of AR mRNA (Yeap et al., 2002), spaced in a way that could allow all three KH domains to bind. Two of these poly (C) regions lie within the 51-nt cis element shown to bind αCP1 and αCP2 and to stabilise the mRNA (Yeap et al., 2002), and these could be the binding sites for αCP1-KH1 and KH2. A third C-rich stretch, however, lies 7 bases downstream and could readily be targeted by αCP1-KH3. This arrangement was partially examined in our αCP1-KH1/DNA structure (Chapter 5). In complex with the 11-nucleotide DNA sequence 5’-TTCCCTCCCTA-3’, which contains the two C-rich target elements from the AR 3’UTR, αCP1-KH1 formed a dimer, with one monomer contacting the 5’ TCCC and the other contacting the 3’ TCCC. Although the two KH domains bound to the same oligonucleotide are positioned very closely, they do not contact one another. This reveals how two KH domains may be closely juxtaposed when bound at adjacent C-rich binding sites. Our study has confirmed that a single KH domain contacts 4 bases and that there is no steric hindrance to the binding of two KH domains to adjacent oligonucleotide stretches. This, together with our demonstration that αCP1-KH2 does, in fact, bind oligonucleotide (Chapter 6), suggests that all three KH domains of full-length αCP proteins are involved in oligonucleotide binding.
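To make the spacing argument concrete, the short Python sketch below scans the 11-nucleotide AR DNA used for crystallisation (5’-TTCCCTCCCTA-3’) for four-base core windows of the form XCCC and keeps hits that do not share bases. The XCCC pattern follows the core recognition sequence described in Section 7.4; the function names are hypothetical, and the scan is a simple string search under the assumption that each base is contacted by only one KH domain, not a model of binding.

```python
"""Locate adjacent XCCC core sites in a C-rich oligonucleotide (illustrative sketch)."""


def core_sites(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, window) for every 4-nt window of the form XCCC."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        if window[1:] == "CCC":  # positions 2-4 are cytosine, position 1 is any base
            hits.append((i + 1, window))
    return hits


def non_overlapping(hits: list[tuple[int, str]], width: int = 4) -> list[tuple[int, str]]:
    """Greedily keep sites that share no bases (assumes one KH domain per base)."""
    chosen = []
    last_end = 0  # 1-based index of the last base already assigned to a site
    for start, window in hits:
        if start > last_end:
            chosen.append((start, window))
            last_end = start + width - 1
    return chosen


if __name__ == "__main__":
    sites = non_overlapping(core_sites("TTCCCTCCCTA"))
    for start, window in sites:
        print(f"site {window} at bases {start}-{start + 3}")
```

Run on the 11-mer, this reports two TCCC sites at bases 2-5 and 6-9, which abut with no spacer, consistent with the two adjacent sites occupied by the KH1 dimer in the crystal structure.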

7.3.1 αCP1-KH1/DNA and other KH/nucleotide complexes

Previous studies had focused primarily on the biological role of αCP-KH domains in RNA binding, the exception being the recent report of αCP2-KH1 in complex with telomeric DNA (Du et al., 2005). In this study we have established the ability of αCP1-KH1 to recognise the 11-mer poly (C) DNA sequence representing the poly (C)-rich site in the 3’UTR of AR mRNA. Comparison of the αCP1-KH1/DNA structure with previously solved structures of KH domains bound to RNA reveals the same nucleic acid-binding cleft and a very similar mode of interaction, suggesting that αCP1-KH1 will bind RNA through very similar interactions. This differs from some other dual-specificity nucleic acid-binding proteins, such as the Xenopus transcription factor TFIIIA, which recognises RNA and DNA targets with different protein motifs and where the recognised RNA and DNA structures also differ (Cassiday and Maher, 2002).

Analysis of structures of various KH domains in complex with RNA and DNA has revealed a number of common features, despite the different nucleic acid targets and the detailed interactions dictating specificity. All of the KH domains maintain the same overall topology and a common binding groove formed by the variable loop and the GXXG loop, the two α-helices and the second β-strand (Chapters 4 and 5). The floor of this binding groove is hydrophobic and the edges on both sides are charged and hydrophobic. The core recognition sequence is single-stranded and extended, and is positioned in the binding groove with its 5’ end at the top of the groove; the bases point inward towards the right of the groove and the sugar-phosphate backbone faces the left side, neutralizing the positively charged residues located on this left side of the binding site. Of general importance in all of the complexes is the hydrophobic interaction between the nucleic acid bases and the hydrophobic floor, whereas specificity is dictated by hydrogen bonds between protein side chains and the functional groups of the bases. In all of these structures the core recognition sequence is four nucleotides long. These observations also hold for our αCP1-KH1/DNA structure, in which a number of specific hydrogen bonds define the poly (C) specificity.

7.4 αCP1-KH binding kinetics

Previous studies have reported that αCP not only binds poly (C)-rich RNA but also binds single- and double-stranded DNA (Dejgaard and Leffers, 1996). In our study, all three αCP1-KH domains were shown to bind the poly (C)-containing AR DNA, with a binding affinity order of αCP1-KH1 > KH3 > KH2. The binding affinities of αCP1-KH1 and αCP1-KH3 for AR RNA were considerably lower than for the corresponding DNA. However, this was only partly the case when binding was monitored using a simpler system (Chapter 6, Section 6.6.4): αCP1-KH1 preferred the single poly (C) DNA sequence over the RNA sequence, while αCP1-KH3 did not show any significant difference between its RNA- and DNA-binding affinities. Furthermore, αCP1-KH2 did not exhibit any binding to RNA, consistent with previous studies.
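SPR affinities are usually reported as equilibrium dissociation constants (KD); a lower KD means tighter binding, so an order such as KH1 > KH3 > KH2 corresponds to increasing KD. As an illustrative sketch only, assuming a simple 1:1 (Langmuir) interaction model, which is a common choice for BIAcore data but is not stated here as the model used in Chapter 6, the association and dissociation phases and KD = kd/ka can be computed as below. The rate constants in the usage lines are hypothetical round numbers, not measured values for any αCP1-KH domain.

```python
"""1:1 Langmuir model for an SPR sensorgram (illustrative sketch, hypothetical constants)."""
import math


def equilibrium_kd(ka: float, kd: float) -> float:
    """Equilibrium dissociation constant KD (M) from ka (1/(M*s)) and kd (1/s)."""
    return kd / ka


def association(t: float, conc: float, ka: float, kd: float, rmax: float) -> float:
    """Response (RU) at time t (s) after injecting analyte at concentration conc (M)."""
    kobs = ka * conc + kd            # observed pseudo-first-order rate constant
    r_eq = rmax * ka * conc / kobs   # steady-state response, equal to rmax*C/(C + KD)
    return r_eq * (1.0 - math.exp(-kobs * t))


def dissociation(t: float, r0: float, kd: float) -> float:
    """Response (RU) at time t (s) after the end of injection, starting from r0."""
    return r0 * math.exp(-kd * t)


if __name__ == "__main__":
    ka, kd, rmax = 1e5, 1e-3, 100.0   # hypothetical values, not thesis data
    K_D = equilibrium_kd(ka, kd)      # 1e-8 M, i.e. 10 nM
    for conc in (0.1 * K_D, K_D, 10 * K_D):
        r_end = association(300.0, conc, ka, kd, rmax)
        r_wash = dissociation(300.0, r_end, kd)
        print(f"C = {conc:.1e} M: R(300 s) = {r_end:.1f} RU, after 300 s wash = {r_wash:.1f} RU")
```

Mass-transport limitation and heterogeneous binding can make real sensorgrams deviate from this simple model (Myszka et al., 1998), so fitted constants should be checked against the residuals.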

Our kinetic studies also revealed the preferred and minimum sequence required for αCP1-KH interaction with oligonucleotides. These domains did not bind any sequence other than a C-rich sequence. Binding to the C-rich sequence is primarily mediated by four core recognition bases (XCCC, where X varies between complex systems). The KH domains did not tolerate any base other than cytosine at positions 2 and 3 of the core recognition sequence; such substitutions completely abolished binding. This is consistent with structural studies of several KH domains with oligonucleotides, which reveal that the bases at positions 2 and 3 form a number of specific hydrogen bonds with the protein that can only be achieved with cytosine at these positions. Position 4 tolerated a different base to a limited extent, in agreement with the structural observation of fewer specific hydrogen bonds to this base.
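The substitution trends above can be summarised as a simple qualitative rule. The Python function below is a hypothetical encoding of those observations only (cytosine required at positions 2 and 3, limited tolerance at position 4, a variable base at position 1); it is not a fitted model and does not predict affinities.

```python
"""Qualitative rule for a 4-nt αCP1-KH core site, encoding the observed substitution trends."""


def core_binding_class(core: str) -> str:
    """Classify a 4-nt core (DNA or RNA) as 'bound', 'reduced' or 'abolished'."""
    core = core.upper()
    if len(core) != 4 or any(base not in "ACGTU" for base in core):
        raise ValueError("core must be four nucleotides (A, C, G, T or U)")
    # Positions 2 and 3 (indices 1 and 2) must be cytosine; substitution abolished binding.
    if core[1] != "C" or core[2] != "C":
        return "abolished"
    # Position 4 (index 3) tolerated a non-C base only to a limited extent.
    if core[3] != "C":
        return "reduced"
    # Position 1 (index 0) varies between complexes, so it is not constrained here.
    return "bound"


if __name__ == "__main__":
    for core in ("TCCC", "ACCC", "TCCA", "TACC", "TCGC"):
        print(core, core_binding_class(core))
```

Keeping position 1 unconstrained reflects the statement that X differs between complex systems; a quantitative version would need the measured affinities for each substituted core.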

The interaction of αCP1-KH domains with single-stranded DNA, observed both on the sensor chip in our SPR study and in our crystal structure, is consistent with previous studies. The KH3 domain of the closely related hnRNP K binds specifically to a single-stranded C-rich sequence in the promoter of the human c-myc gene, activating transcription. It has also been shown in vitro that both hnRNP K and αCP1 bind a C-rich strand of human telomeric DNA; whether such interactions are biologically significant awaits further study.

7.5 Future directions

In summary, these studies have provided new insight into the structural and biophysical features of the individual αCP1-KH domains and, together with our studies of their interactions with nucleic acids, have enabled the determination of binding affinities that highlight the basis of poly (C) specificity. However, in order to understand the mode of interaction of the full-length protein, further biophysical and structural studies are required, including studies in the presence of both the target RNA and ssDNA. These should reveal the three-dimensional arrangement of the protein in complex with the nucleic acid and hence the role of each αCP1-KH domain in the interaction. Other systems that should be investigated are the tandem constructs αCP1-KH1/2 and αCP1-KH2/3. These would elucidate not only the role of αCP1-KH2 but also that of the linker regions between the domains. Furthermore, the binding affinities of these constructs will reveal whether KH2 has any cooperative effect on the binding of the KH1 and KH3 domains.

The αCP/AR mRNA complex will not be the only potential drug target, because the biological functions arising from αCP interactions with their target RNAs involve multicomponent protein-RNA complexes. In the case of AR mRNA, HuR may also be closely associated with the complex and required for mRNA stability. Future experiments should also examine the structure and binding affinity of each RRM domain of HuR and, ultimately, how these components associate to form a multiprotein/RNA complex.

There are other examples of multiprotein/RNA complexes, such as that on α-globin mRNA, which is stabilised by the formation of the α-complex, comprising PABP, a number of unidentified proteins and αCP (Kiledjian et al., 1995; Wang et al., 1995). Interestingly, the whole complex is required for stability. Thus, in each system in which αCP is involved, other proteins play important roles in executing a particular biological function. Similarly, the HuR/αCP/AR mRNA complex may require all of its protein components to be stable and functional. In the long term, this could present multiple targets for novel therapeutics aimed at specifically disrupting the αCP/AR mRNA interaction, downregulating AR expression and reducing the growth of prostate cancer cells.

Chapter 8 References


Adams, D. J., Beveridge, D. J., van der Weyden, L., Mangs, H., Leedman, P. J., and Morris, B. J. (2003). HADHB, HuR, and CP1 Bind to the Distal 3'-Untranslated Region of Human Renin mRNA and Differentially Modulate Renin Expression. J Biol Chem 278, 44894-44903.
Alan, H., and Phylis, S. (1960). Crystals and Crystal Growing (New York: Anchor Books-Doubleday).
Anant, S., Blanc, V., and Davidson, N. O. (2003). Molecular regulation, evolutionary, and functional adaptations associated with C to U editing of mammalian apolipoproteinB mRNA. Prog Nucleic Acid Res Mol Biol 75, 1-41.
Baber, J. L., Libutti, D., Levens, D., and Tjandra, N. (1999). High Precision Solution Structure of the C-terminal KH Domain of Heterogeneous Nuclear Ribonucleoprotein K, a c-myc Transcription Factor. Journal of Molecular Biology 289, 949-962.
Backe, P. H., Messias, A. C., Ravelli, R. B. G., Sattler, M., and Cusack, S. (2005). X-Ray Crystallographic and NMR Studies of the Third KH Domain of hnRNP K in Complex with Single-Stranded Nucleic Acids. Structure 13, 1055-1067.
Baker, N. A., Sept, D., Joseph, S., Holst, M. J., and McCammon, J. A. (2001). Electrostatics of nanosystems: Application to microtubules and the ribosome. PNAS 98, 10037-10041.
Bakheet, T., Frevel, M., Williams, B. R. G., Greer, W., and Khabar, K. S. A. (2001). ARED: human AU-rich element-containing mRNA database reveals an unexpectedly diverse functional repertoire of encoded proteins. Nucl Acids Res 29, 246-254.
Bakheet, T., Williams, B. R. G., and Khabar, K. S. A. (2003). ARED 2.0: an update of AU-rich element mRNA database. Nucl Acids Res 31, 421-423.
Balmer, L. A., Beveridge, D. J., Jazayeri, J. A., Thomson, A. M., Walker, C. E., and Leedman, P. J. (2001). Identification of a Novel AU-Rich Element in the 3' Untranslated Region of Epidermal Growth Factor Receptor mRNA That Is the Target for Regulated RNA-Binding Proteins. Mol Cell Biol 21, 2070-2084.
Bandiera, A., Tell, G., Marsich, E., Scaloni, A., Pocsfalvi, G., Akintunde Akindahunsi, A., Cesaratto, L., and Manzini, G. (2003). Cytosine-block telomeric type DNA-binding activity of hnRNP proteins from human cell lines. Archives of Biochemistry and Biophysics 409, 305-314.
Bank, R., and Holst, M. (2003). A new paradigm for parallel adaptive meshing algorithms. SIAM Rev 45, 291-323.
Barlati, S., and Barbon, A. (2005). RNA editing: a molecular mechanism for the fine modulation of neuronal transmission. Acta Neurochir Suppl 93, 53-57.



Barreau, C., Paillard, L., and Osborne, H. B. (2006). AU-rich elements and associated factors: are there unifying principles? Nucl Acids Res 33, 7138-7150.
Baumann, S. (1998). Indirect immobilization of recombinant proteins to a solid phase using the albumin binding domain of streptococcal protein G and immobilized albumin. J Immunol Methods 221, 95-106.
Beckett, D. (2001). Regulated assembly of transcription factors and control of transcription initiation. Journal of Molecular Biology 314, 335-352.
Beelman, C. A., and Parker, R. (1995). Degradation of mRNA in eukaryotes. Cell 81, 179-183.
Bergmann, I. E., and Brawerman, G. (1980). Loss of the polyadenylate segment from mammalian messenger RNA: Selective cleavage of this sequence from polyribosomes. Journal of Molecular Biology 139, 439-454.
BIACORE AB (1997). Kinetic and affinity analysis using BIA - Level 1.
Blyn, L. B., Towner, J. S., Semler, B. L., and Ehrenfeld, E. (1997). Requirement of poly(rC) binding protein 2 for translation of poliovirus RNA. J Virol 71, 6243-6246.
Braddock, D. T., Baber, J. L., Levens, D., and Clore, G. M. (2002a). Molecular basis of sequence-specific single-stranded DNA recognition by KH domains: solution structure of a complex between hnRNP K KH3 and single-stranded DNA. EMBO J 21, 3476-3485.
Braddock, D. T., Louis, J. M., Baber, J. L., Levens, D., and Clore, G. M. (2002b). Structure and dynamics of KH domains from FBP bound to single-stranded DNA. Nature 415, 1051-1056.
Bradford, M. M. (1976). A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Analytical Biochemistry 72, 248-254.
Brawerman, G. (1981). The Role of the poly(A) sequence in mammalian messenger RNA. CRC Crit Rev Biochem 10, 1-38.
Brown, C. Y., Lagnado, C. A., and Goodall, G. J. (1996). A cytokine mRNA-destabilizing element that is structurally and functionally distinct from A+U-rich elements. PNAS 93, 13721-13725.
Bubley, G., and Balk, S. (1996). Treatment of metastatic prostate cancer. Lessons from the androgen receptor. Hematol Oncol Clin North Am 10, 713-725.
Buckanovich, R. J., and Darnell, R. B. (1997). The neuronal RNA binding protein Nova-1 recognizes specific RNA targets in vitro and in vivo. Mol Cell Biol 17, 3194-3201.



Burd, C., and Dreyfuss, G. (1994). Conserved structures and diversity of functions of RNA-binding proteins. Science 265, 615-621.
Bustelo, X. R., Suen, K. L., Michael, W. M., Dreyfuss, G., and Barbacid, M. (1995). Association of the vav proto-oncogene product with poly(rC)-specific RNA-binding proteins. Mol Cell Biol 15, 1324-1332.
Bycroft, M., Grunert, S., Murzin, A. G., Proctor, M., and St Johnston, D. (1995). NMR solution structure of a dsRNA binding domain from Drosophila staufen protein reveals homology to the N-terminal domain of ribosomal protein S5. The EMBO Journal 14, 3563-3571.
Cáceres, J. F., and Krainer, A. R. (1993). Functional analysis of pre-mRNA splicing factor SF2/ASF structural domains. The EMBO Journal 12, 4715-4726.
Cai, M., Huang, Y., Sakaguchi, K., Gronenborn, A. M., and Craigie, R. (1998). An efficient and cost-effective isotope labeling protocol for proteins expressed in Escherichia coli. Journal of Biomolecular NMR 11, 97-102.
Calabro, V., Daugherty, M. D., and Frankel, A. D. (2005). A single intermolecular contact mediates intramolecular stabilization of both RNA and protein. PNAS 102, 6849-6854.
Calnan, B., Tidor, B., Biancalana, S., Hudson, D., and Frankel, A. (1991). Arginine-mediated RNA recognition: the arginine fork. Science 252, 1167-1171.
Cassiday, L. A., and Maher III, L. J. (2002). Having it both ways: transcription factors that bind DNA and RNA. Nucl Acids Res 30, 4118-4126.
Chen, C.-Y., Del Gatto-Konczak, F., Wu, Z., and Karin, M. (1998). Stabilization of Interleukin-2 mRNA by the c-Jun NH2-Terminal Kinase Pathway. Science 280, 1945-1949.
Chen, C.-Y., Gherzi, R., Ong, S.-E., Chan, E. L., Raijmakers, R., Pruijn, G. J. M., Stoecklin, G., Moroni, C., Mann, M., and Karin, M. (2001). AU Binding Proteins Recruit the Exosome to Degrade ARE-Containing mRNAs. Cell 107, 451-464.
Chen, C.-Y. A., and Shyu, A.-B. (1995). AU-rich elements: characterization and importance in mRNA degradation. Trends in Biochemical Sciences 20, 465-470.
Chen, C. Y., and Shyu, A. B. (1994). Selective degradation of early-response-gene mRNAs: functional analyses of sequence features of the AU-rich elements. Mol Cell Biol 14, 8471-8482.
Chen, T., Damaj, B. B., Herrera, C., Lasko, P., and Richard, S. (1997). Self-association of the single-KH-domain family members Sam68, GRP33, GLD-1, and Qk1: role of the KH domain. Mol Cell Biol 17, 5707-5718.
Chen, T., and Richard, S. (1998). Structure-Function Analysis of Qk1: a Lethal Point Mutation in Mouse quaking Prevents Homodimerization. Mol Cell Biol 18, 4863-4871.



Chkheidze, A. N., and Liebhaber, S. A. (2003). A Novel Set of Nuclear Localization Signals Determine Distributions of the αCP RNA-Binding Proteins. Mol Cell Biol 23, 8405-8415.
Chkheidze, A. N., Lyakhov, D. L., Makeyev, A. V., Morales, J., Kong, J., and Liebhaber, S. A. (1999). Assembly of the α-Globin mRNA Stability Complex Reflects Binary Interaction between the Pyrimidine-Rich 3' Untranslated Region Determinant and Poly(C) Binding Protein αCP. Mol Cell Biol 19, 4572-4581.
Claverie, J.-M. (2001). Gene Number: What If There Are Only 30,000 Human Genes? Science 291, 1255-1257.
Colgan, D. F., and Manley, J. L. (1997). Mechanism and regulation of mRNA polyadenylation. Genes Dev 11, 2755-2766.
Collaborative Computational Project, Number 4 (1994). The CCP4 suite: programs for protein crystallography. Acta Crystallogr D Biol Crystallogr 50, 760-763.
Coller, J., and Parker, R. (2004). Eukaryotic mRNA decapping. Annual Review of Biochemistry 73, 861-890.
Collingwood, T. N., Urnov, F. D., and Wolffe, A. P. (1999). Nuclear receptors: coactivators, corepressors and chromatin remodeling in the control of transcription. J Mol Endocrinol 23, 255-275.
Cusack, S. (1999). RNA-protein complexes. Current Opinion in Structural Biology 9, 66-73.
Czyzyk-Krzeska, M. F., and Bendixen, A. C. (1999). Identification of the Poly(C) Binding Protein in the Complex Associated With the 3' Untranslated Region of Erythropoietin Messenger RNA. Blood 93, 2111-2120.
Davis, M. E., and McCammon, J. A. (1990). Electrostatics in biomolecular structure and dynamics. Chem Rev 94, 7684-7692.
Davis, S. J., Ikemizu, S., Wild, M. K., and van der Merwe, P. A. (1998). CD2 and the nature of protein interactions mediating cell-cell recognition. Immunol Rev 163, 217-236.
De Boulle, K., Verkerk, A. J. M. H., Reyniers, E., Vits, L., Hendrickx, J., Van Roy, B., Van Den Bos, F., de Graaff, E., Oostra, B. A., and Willems, P. J. (1993). A point mutation in the FMR-1 gene associated with fragile X mental retardation. Nat Genet 3, 31-35.
Decker, C. J., and Parker, R. (2002). mRNA decay enzymes: Decappers conserved between yeast and mammals. PNAS 99, 12512-12514.
Dejgaard, K., and Leffers, H. (1996). Characterisation of the nucleic-acid-binding activity of KH domains. Different properties of different domains. Eur J Biochem 241, 425-431.



Dejgaard, K., Leffers, H., Rasmussen, H. H., Madsen, P., Kruse, T. A., Gesser, B., Nielsen, H., and Celis, J. E. (1994). Identification, Molecular Cloning, Expression and Chromosome Mapping of a Family of Transformation Upregulated hnRNP-K Proteins Derived by Alternative Splicing. Journal of Molecular Biology 236, 33-48.
Di Fruscio, M., Chen, T., Bonyadi, S., Lasko, P., and Richard, S. (1998). The Identification of Two Drosophila K Homology Domain Proteins. KEP1 and SAM are members of the Sam68 family of GSG domain proteins. J Biol Chem 273, 30122-30130.
Du, Z., Lee, J. K., Tjhen, R., Li, S., Pan, H., Stroud, R. M., and James, T. L. (2005). Crystal Structure of the First KH Domain of Human Poly(C)-binding Protein-2 in Complex with a C-rich Strand of Human Telomeric DNA at 1.7 Å. J Biol Chem 280, 38823-38830.
Du, Z., Yu, J., Chen, Y., Andino, R., and James, T. L. (2004). Specific Recognition of the C-rich Strand of Human Telomeric DNA and the RNA Template of Human Telomerase by the First KH Domain of Human Poly(C)-binding Protein-2. J Biol Chem 279, 48126-48134.
Duncan, R., Bazar, L., Michelotti, G., Tomonaga, T., Krutzsch, H., Avigan, M., and Levens, D. (1994). A sequence-specific, single-strand binding protein activates the far upstream element of c-myc and defines a new DNA-binding motif. Genes Dev 8, 465-480.
Ebersole, T. A., Chen, Q., Justice, M. J., and Artzt, K. (1996). The quaking gene product necessary in embryogenesis and myelination combines features of RNA binding and signal transduction proteins. Nat Genet 12, 260-265.
Faustino, N. A., and Cooper, T. A. (2003). Pre-mRNA splicing and human disease. Genes Dev 17, 419-437.
Fish, R. N., and Kane, C. M. (2002). Promoting elongation with transcript cleavage stimulatory factors. Biochimica et Biophysica Acta (BBA) - Gene Structure and Expression 1577, 287-307.
Flaherty, S. M., Fortes, P., Izaurralde, E., Mattaj, I. W., and Gilmartin, G. M. (1997). Participation of the nuclear cap binding complex in pre-mRNA 3' processing. PNAS 94, 11893-11898.
Francis, R., Barton, M. K., Kimble, J., and Schedl, T. (1995a). gld-1, a Tumor Suppressor Gene Required for Oocyte Development in Caenorhabditis elegans. Genetics 139, 579-606.
Francis, R., Maine, E., and Schedl, T. (1995b). Analysis of the Multiple Roles of gld-1 in Germline Development: Interactions With the Sex Determination Cascade and the glp-1 Signaling Pathway. Genetics 139, 607-630.
Fumagalli, S., Totty, N. F., Hsuan, J. J., and Courtneidge, S. A. (1994). A target for SRC in mitosis. Nature 368, 871-874.



Gao, M., Fritz, D. T., Ford, L. P., and Wilusz, J. (2000). Interaction between a Poly(A)-Specific Ribonuclease and the 5' Cap Influences mRNA Deadenylation Rates In Vitro. Molecular Cell 5, 479-488.
Garnick, M. B., and Fair, W. R. (1996). Prostate Cancer: Emerging Concepts: Part II. Ann Intern Med 125, 205-212.
George, H. S., and Lyle, H. J. (1989). X-ray Structure Determination: A Practical Guide (New York: John Wiley & Sons).
Gherzi, R., Lee, K.-Y., Briata, P., Wegmuller, D., Moroni, C., Karin, M., and Chen, C.-Y. (2004). A KH Domain RNA Binding Protein, KSRP, Promotes ARE-Directed mRNA Turnover by Recruiting the Degradation Machinery. Molecular Cell 14, 571-583.
Gibson, T. J., Rice, P. M., Thompson, J. D., and Heringa, J. (1993). KH domains within the FMR1 sequence suggest that fragile X syndrome stems from a defect in RNA metabolism. Trends in Biochemical Sciences 18, 331-333.
Glaser, R. W. (1993). Antigen-antibody binding and mass transport by convection and diffusion to a surface: a two-dimensional computer model of binding and dissociation kinetics. Analytical Biochemistry 213, 152-161.
Goffeau, A., Barrell, B. G., Bussey, H., Davis, R. W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J. D., Jacq, C., Johnston, M., et al. (1996). Life with 6000 Genes. Science 274, 546-567.
Graff, J., Cha, J., Blyn, L. B., and Ehrenfeld, E. (1998). Interaction of Poly(rC) Binding Protein 2 with the 5' Noncoding Region of Hepatitis A Virus RNA and Its Effects on Translation. J Virol 72, 9668-9675.
Grishin, N. V. (2001). KH domain: one motif, two folds. Nucl Acids Res 29, 638-643.
Guhaniyogi, J., and Brewer, G. (2001). Regulation of mRNA stability in mammalian cells. Gene 265, 11-23.
Hahnefeld, C., Drewianka, S., and Herberg, F. (2004). Determination of kinetic data using surface plasmon resonance biosensors. Methods Mol Med 94, 299-320.
Hastings, M. L., and Krainer, A. R. (2001). Pre-mRNA splicing in the new millennium. Current Opinion in Cell Biology 13, 302-309.
Heinlein, C. A., and Chang, C. (2004). Androgen Receptor in Prostate Cancer. Endocr Rev 25, 276-308.
Hilleren, P., and Parker, R. (1999). Mechanisms of mRNA surveillance in eukaryotes. Annual Review of Genetics 33, 229-260.
Holcik, M., and Liebhaber, S. A. (1997). Four highly stable eukaryotic mRNAs assemble 3' untranslated region RNA-protein complexes sharing cis and trans components. PNAS 94, 2410-2414.



Hollams, E. M., Giles, K. M., Thomson, A. M., and Leedman, P. J. (2002). mRNA stability and the control of gene expression: implications for human disease. Neurochem Res 27, 957-980.
Holst, M. (2001). Adaptive numerical treatment of elliptic systems on manifolds. Adv Comput Math 15, 139-191.
Holst, M., and Saied, F. (1993). Multigrid solution of the Poisson–Boltzmann equation. J Comput Chem 14, 105-113.
Holst, M., and Saied, F. (1995). Numerical solution of the nonlinear Poisson–Boltzmann equation: developing more robust and efficient methods. J Comput Chem 16, 337-364.
Honig, B., and Nicholls, A. (1995). Classical electrostatics in biology and chemistry. Science 268, 1144-1149.
Ito, K., Sato, K., and Endo, H. (1994). Cloning and characterization of a single-stranded DNA binding protein that specifically recognizes deoxycytidine stretch. Nucl Acids Res 22, 53-58.
Izaurralde, E., Lewis, J., McGuigan, C., Jankowska, M., Darzynkiewicz, E., and Mattaj, I. W. (1994). A nuclear cap binding protein complex involved in pre-mRNA splicing. Cell 78, 657-668.
Izquierdo, J.-M., and Valcarcel, J. (2006). A simple principle to explain the evolution of pre-mRNA splicing. Genes Dev 20, 1679-1684.
Jacobson, A., and Peltz, S. W. (1996). Interrelationships of the Pathways of mRNA Decay and Translation in Eukaryotic Cells. Annual Review of Biochemistry 65, 693-739.
Jensen, K. B., Musunuru, K., Lewis, H. A., Burley, S. K., and Darnell, R. B. (2000). The tetranucleotide UCAY directs the specific recognition of RNA by the Nova K-homology 3 domain. PNAS 97, 5740-5745.
Ji, X., Kong, J., and Liebhaber, S. A. (2003). In Vivo Association of the Stability Control Protein αCP with Actively Translating mRNAs. Mol Cell Biol 23, 899-907.
Jones, A. R., Francis, R., and Schedl, T. (1996). GLD-1, a Cytoplasmic Protein Essential for Oocyte Differentiation, Shows Stage- and Sex-Specific Expression during Caenorhabditis elegans Germline Development. Developmental Biology 180, 165-183.
Jones, A. R., and Schedl, T. (1995). Mutations in gld-1, a female germ cell-specific tumor suppressor gene in Caenorhabditis elegans, affect a conserved domain also found in Src-associated protein Sam68. Genes Dev 9, 1491-1504.



Jonsson, U. (1991). Real-time biospecific interaction analysis using surface plasmon resonance and a sensor chip technology. Biotechniques 11, 620-627.
Kati, K. W., Mika, J., Wallén, L. J., Tammela, R. L., and Vessella, T. V. (2006). Mutation screening of the androgen receptor promoter and untranslated regions in prostate cancer. The Prostate 9999, n/a.
Katsamba, P. S., Park, S., and Laird-Offringa, I. A. (2002). Kinetic studies of RNA-protein interactions using surface plasmon resonance. Methods 26, 95-104.
Keenan, R. J., Freymann, D. M., Walter, P., and Stroud, R. M. (1998). Crystal Structure of the Signal Sequence Binding Subunit of the Signal Recognition Particle. Cell 94, 181-191.
Kharrat, A., Macias, M. J., Gibson, T. J., Nilges, M., and Pastore, A. (1995). Structure of the dsRNA binding domain of E. coli RNase III. The EMBO Journal 14, 3572-3584.
Kiledjian, M., Wang, X., and Liebhaber, S. A. (1995). Identification of two KH domain proteins in the α-globin mRNP stability complex. EMBO J 14, 4357-4364.
Kim, I., Liu, C. W., and Puglisi, J. D. (2006). Specific Recognition of HIV TAR RNA by the dsRNA Binding Domains (dsRBD1-dsRBD2) of PKR. Journal of Molecular Biology 358, 430-442.
Kim, J. H., Hahm, B., Kim, Y. K., Choi, M., and Jang, S. K. (2000). Protein-protein interaction among hnRNPs shuttling between nucleus and cytoplasm. Journal of Molecular Biology 298, 395-405.
Kim, S.-S., Pandey, K. K., Choi, H. S., Kim, S.-Y., Law, P.-Y., Wei, L.-N., and Loh, H. H. (2005). Poly(C) Binding Protein Family Is a Transcription Factor in μ-Opioid Receptor Gene Expression. Mol Pharmacol 68, 729-736.
Koivisto, P., Kononen, J., Palmberg, C., Tammela, T., Hyytinen, E., Isola, J., Trapman, J., Cleutjens, K., Noordzij, A., Visakorpi, T., and Kallioniemi, O. P. (1997). Androgen receptor gene amplification: a possible molecular mechanism for androgen deprivation therapy failure in prostate cancer. Cancer Res 57, 314-319.
Kong, J., Ji, X., and Liebhaber, S. A. (2003). The KH-Domain Protein αCP Has a Direct Role in mRNA Stabilization Independent of Its Cognate Binding Site. Mol Cell Biol 23, 1125-1134.
Kozak, M. (2005). Regulation of translation via mRNA structure in prokaryotes and eukaryotes. Gene 361, 13-37.
Kumar, A., and Wilson, S. (1990). Studies of the strand-annealing activity of mammalian hnRNP complex protein A1. Biochemistry 29, 10717-10722.
Lacroix, L., Lienard, H., Labourier, E., Djavaheri-Mergny, M., Lacoste, J., Leffers, H., Tazi, J., Helene, C., and Mergny, J.-L. (2000). Identification of two human nuclear proteins that recognise the cytosine-rich strand of human telomeres in vitro. Nucl Acids Res 28, 1564-1575.



Leffers, H., Dejgaard, K., and Celis, J. E. (1995). Characterisation of two major cellular poly(rC)-binding human proteins, each containing three K-homologous (KH) domains. Eur J Biochem 230, 447-453.
Levy, A. P., Levy, N. S., and Goldberg, M. A. (1996). Post-transcriptional Regulation of Vascular Endothelial Growth Factor by Hypoxia. J Biol Chem 271, 2746-2753.
Lewis, H. A., Chen, H., Edo, C., Buckanovich, R. J., Yang, Y. Y., Musunuru, K., Zhong, R., Darnell, R. B., and Burley, S. K. (1999). Crystal structures of Nova-1 and Nova-2 K-homology RNA-binding domains. Structure 7, 191-203.
Lewis, H. A., Musunuru, K., Jensen, K. B., Edo, C., Chen, H., Darnell, R. B., and Burley, S. K. (2000). Sequence-Specific RNA Binding by a Nova KH Domain: Implications for Paraneoplastic Disease and the Fragile X Syndrome. Cell 100, 323-332.
Lewis, J. D., Izaurralde, E., Jarmolowski, A., McGuigan, C., and Mattaj, I. (1996). A nuclear cap-binding complex facilitates association of U1 snRNP with the cap-proximal 5' splice site. Genes Dev 10, 1683-1698.
Lindquist, J. N., Kauschke, S. G., Stefanovic, B., Burchardt, E. R., and Brenner, D. A. (2000). Characterization of the interaction between αCP2 and the 3'-untranslated region of collagen α1(I) mRNA. Nucl Acids Res 28, 4306-4316.
Liu, J., Lynch, P., Chien, C., Montelione, G., Krug, R., and Berman, H. (1997). Crystal structure of the unique RNA-binding domain of the influenza virus NS1 protein. Nat Struct Biol 4, 896-899.
Lukong, K. E., and Richard, S. (2003). Sam68, the KH domain-containing superSTAR. Biochimica et Biophysica Acta (BBA) - Reviews on Cancer 1653, 73-86.
Mahone, M., Saffman, E. E., and Lasko, P. F. (1995). Localized Bicaudal-C RNA encodes a protein containing a KH domain, the RNA binding motif of FMR1. EMBO J 14, 2043-2055.
Makeyev, A. V., and Liebhaber, S. A. (2000). Identification of Two Novel Mammalian Genes Establishes a Subfamily of KH-Domain RNA-Binding Proteins. Genomics 67, 301-316.
Makeyev, A. V., and Liebhaber, S. A. (2002). The poly(C)-binding proteins: a multiplicity of functions and a search for mechanisms. RNA 8, 265-278.
Malik, A. K., Flock, K. E., Godavarthi, C. L., Loh, H. H., and Ko, J. L. (2006). Molecular basis underlying the poly C binding protein 1 as a regulator of the proximal promoter of mouse μ-opioid receptor gene. Brain Research 1112, 33-45.
Maris, C., Dominguez, C., and Allain, F. H. T. (2005). The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression. FEBS Journal 272, 2118-2131.



Matthews, B. W. (1977). X-ray Structure of Proteins (New York: Academic Press).
Matunis, M. J., Michael, W. M., and Dreyfuss, G. (1992). Characterization and primary structure of the poly(C)-binding heterogeneous nuclear ribonucleoprotein complex K protein. Mol Cell Biol 12, 164-171.
McKnight, G. L., Reasoner, J., Gilbert, T., Sundquist, K. O., Hokland, B., McKernan, P. A., Champagne, J., Johnson, C. J., Bailey, M. C., and Holly, R. (1992). Cloning and expression of a cellular high density lipoprotein-binding protein that is up-regulated by cholesterol loading of cells. J Biol Chem 267, 12131-12141.
Messias, A., and Sattler, M. (2004). Structural basis of single-stranded RNA recognition. Acc Chem Res 37, 279-287.
Meyer, S., Temme, C., and Wahle, E. (2004). Messenger RNA Turnover in Eukaryotes: Pathways and Enzymes. Critical Reviews in Biochemistry and Molecular Biology 39, 197-216.
Michelotti, E. F., Michelotti, G. A., Aronsohn, A. I., and Levens, D. (1996). Heterogeneous nuclear ribonucleoprotein K is a transcription factor. Mol Cell Biol 16, 2350-2360.
Morton, T. A., Myszka, D. G., and Chaiken, I. M. (1995). Interpreting complex binding kinetics from optical biosensors: a comparison of analysis by linearization, the integrated rate equation, and numerical integration. Analytical Biochemistry 227, 176-185.
Mukherjee, D., Gao, M., O’Connor, J. P., Raijmakers, R., Pruijn, G., Lutz, C. S., and Wilusz, J. (2002). The mammalian exosome mediates the efficient degradation of mRNAs that contain AU-rich elements. EMBO J 21, 165-174.
Musco, G., Kharrat, A., Stier, G., Fraternali, F., Gibson, T. J., Nilges, M., and Pastore, A. (1997). The solution structure of the first KH domain of FMR1, the protein responsible for the fragile X syndrome. Nat Struct Biol 4, 712-716.
Musco, G., Stier, G., Joseph, C., Castiglione, M. M., Nilges, M., Gibson, T., and Pastore, A. (1996). Three-dimensional structure and stability of the KH domain: molecular insights into the fragile X syndrome. Cell 85, 237-245.
Musunuru, K., and Darnell, R. B. (2004). Determination and augmentation of RNA sequence specificity of the Nova K-homology domains. Nucl Acids Res 32, 4852-4861.
Myszka, D. G., He, X., Dembo, M., Morton, T. A., and Goldstein, B. (1998). Extending the Range of Rate Constants Available from BIACORE: Interpreting Mass Transport-Influenced Binding Data. Biophys J 75, 583-594.
Narlikar, G. J., Fan, H.-Y., and Kingston, R. E. (2002). Cooperation between Complexes that Regulate Chromatin Structure and Transcription. Cell 108, 475-487.



Newbury, S. F. (2006). Control of mRNA stability in eukaryotes. Biochem Soc Trans 34, 30-34. Nick, H. (1970). The Growth of Single Crystals. Ober, R. J., and Ward, E. S. (1999). The Choice of Reference Cell in the Analysis of Kinetic Data Using BIAcore. Analytical Biochemistry 271, 70-80. Ostareck, D. H., Ostareck-Lederer, A., Shatsky, I. N., and Hentze, M. W. (2001). Lipoxygenase mRNA Silencing in Erythroid Differentiation: The 3′UTR Regulatory Complex Controls 60S Ribosomal Subunit Joining. Cell 104, 281-290. Ostareck, D. H., Ostareck-Lederer, A., Wilm, M., Thiele, B. J., Mann, M., and Hentze, M. W. (1997). mRNA Silencing in Erythroid Differentiation: hnRNP K and hnRNP E1 Regulate 15-Lipoxygenase Translation from the 3′ End. Cell 89, 597-606. Otwinowski, Z., and Minor, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol 276, 307-326. Paillard, L., Maniey, D., Lachaume, P., Legagneux, V., and Osborne, H. B. (2000). Identification of a C-rich element as a novel cytoplasmic polyadenylation element in Xenopus embryos. Mechanisms of Development 93, 117-125. Palacios, I. M., Gatfield, D., St Johnston, D., and Izaurralde, E. (2004). An eIF4AIII-containing complex required for mRNA localization and nonsense-mediated mRNA decay. Nature 427, 753-757. Park, S., Myszka, D. G., Yu, M., Littler, S. J., and Laird-Offringa, I. A. (2000). HuD RNA Recognition Motifs Play Distinct Roles in the Formation of a Stable Complex with AU-Rich RNA. Mol Cell Biol 20, 4765-4772. Parker, R., and Song, H. (2004). The enzymes and control of eukaryotic mRNA turnover. Nature Structural & Molecular Biology 11, 121-127. Parsley, T. B., Towner, J. S., Blyn, L. B., Ehrenfeld, E., and Semler, B. L. (1997). Poly (rC) binding protein 2 forms a ternary complex with the 5'-terminal sequences of poliovirus RNA and the viral 3CD proteinase. RNA 3, 1124-1134. Paulding, W. R., and Czyzyk-Krzeska, M. F. (1999). Regulation of Tyrosine Hydroxylase mRNA Stability by Protein-binding, Pyrimidine-rich Sequence in the 3'-Untranslated Region. J Biol Chem 274, 2532-2538. Payne, J. M., Laybourn, P. J., and Dahmus, M. E. (1989). The transition of RNA polymerase II from initiation to elongation is associated with phosphorylation of the carboxyl-terminal domain of subunit IIa. J Biol Chem 264, 19621-19629. Paziewska, A., Wyrwicz, L., and Ostrowski, J. (2005). The binding activity of yeast RNAs to yeast Hek2p and mammalian hnRNP K proteins, determined using the three-hybrid system. Cell Mol Biol Lett 10, 227-235.



Paziewska, A., Wyrwicz, L. S., Bujnicki, J. M., Bomsztyk, K., and Ostrowski, J. (2004). Cooperative binding of the hnRNP K three KH domains to mRNA targets. FEBS Letters 577, 134-140. Peng, S. S., Chen, C. Y., and Shyu, A. B. (1996). Functional characterization of a non-AUUUA AU-rich element from the c-jun proto-oncogene mRNA: evidence for a novel class of AU-rich elements. Mol Cell Biol 16, 1490-1499. Pieretti, M., Zhang, F. P., Fu, Y. H., Warren, S. T., Oostra, B. A., Caskey, C. T., and Nelson, D. L. (1991). Absence of expression of the FMR-1 gene in fragile X syndrome. Cell 66, 817-822. Preiss, T., and Hentze, M. W. (2003). Starting the protein synthesis machine: eukaryotic translation initiation. BioEssays 25, 1201-1211. Ramos, A., Grunert, S., Adams, J., Micklem, D. R., Proctor, M. R., Freund, S., Bycroft, M., St Johnston, D., and Varani, G. (2000). RNA recognition by a Staufen double-stranded RNA-binding domain. The EMBO Journal 19, 997-1009. Ramos, A., Hollingworth, D., Major, S. A., Adinolfi, S., Kelly, G., Muskett, F. W., and Pastore, A. (2002). Role of Dimerization in KH/RNA Complexes: The Example of Nova KH3. Biochemistry 41, 4193-4201. Ramos, A., Hollingworth, D., and Pastore, A. (2003). The role of a clinically important mutation in the fold and RNA-binding properties of KH motifs. RNA 9, 293-298. Razin, A., and Riggs, A. D. (1980). DNA methylation and gene function. Science 210, 604-610. Rezai-Zadeh, N., Zhang, X., Namour, F., Fejer, G., Wen, Y.-D., Yao, Y.-L., Gyory, I., Wright, K., and Seto, E. (2003). Targeted recruitment of a histone H4-specific methyltransferase by the transcription factor YY1. Genes Dev 17, 1019-1029. Ross, J. (1995). mRNA stability in mammalian cells. Microbiol Rev 59, 425-450. Ross, J., and Sullivan, T. D. (1985). Half-lives of beta and gamma globin messenger RNAs and of protein synthetic capacity in cultured human reticulocytes. Blood 66, 1149-1154. Ryder, S. P., and Williamson, J. R. (2004). Specificity of the STAR/GSG domain protein Qk1: Implications for the regulation of myelination. RNA 10, 1449-1458. Ryter, J. M., and Schultz, S. C. (1998). Molecular basis of double-stranded RNA-protein interactions: structure of a dsRNA-binding domain complexed with dsRNA. The EMBO Journal 17, 7505-7513. Sachs, A. B., Sarnow, P., and Hentze, M. W. (1997). Starting at the Beginning, Middle, and End: Translation Initiation in Eukaryotes. Cell 89, 831-838.



Saffman, E. E., Styhler, S., Rother, K., Li, W., Richard, S., and Lasko, P. (1998). Premature Translation of oskar in Oocytes Lacking the RNA-Binding Protein Bicaudal-C. Mol Cell Biol 18, 4855-4862. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989). Molecular Cloning: A Laboratory Manual, Vol 2. Schuck, P. (1996). Kinetics of ligand binding to receptor immobilized in a polymer matrix, as detected with an evanescent wave biosensor. I. A computer simulation of the influence of mass transport. Biophys J 70, 1230-1249. Schuck, P. (1997a). Reliable determination of binding affinity and kinetics using surface plasmon resonance biosensors. Current Opinion in Biotechnology 8, 498-502. Schuck, P. (1997b). Use of Surface Plasmon Resonance to Probe the Equilibrium and Dynamic Aspects of Interactions between Biological Macromolecules. Annual Review of Biophysics and Biomolecular Structure 26, 541-566. Schullery, D. S., Ostrowski, J., Denisenko, O. N., Stempka, L., Shnyreva, M., Suzuki, H., Gschwendt, M., and Bomsztyk, K. (1999). Regulated Interaction of Protein Kinase Cdelta with the Heterogeneous Nuclear Ribonucleoprotein K Protein. J Biol Chem 274, 15101-15109. Shatkin, A. J., and Manley, J. L. (2000). The ends of the affair: Capping and polyadenylation. Nature Structural Biology 7, 838-842. Shaw, G., and Kamen, R. (1986). A conserved AU sequence from the 3′ untranslated region of GM-CSF mRNA mediates selective mRNA degradation. Cell 46, 659-667. Shilatifard, A., Conaway, R. C., and Conaway, J. W. (2003). The RNA Polymerase II Elongation Complex. Annual Review of Biochemistry 72, 693-715. Shim, J., and Karin, M. (2002). The Control of mRNA Stability in Response to Extracellular Stimuli. Mol Cells 14, 323-331. Shnyreva, M., Schullery, D. S., Suzuki, H., Higaki, Y., and Bomsztyk, K. (2000). Interaction of Two Multifunctional Proteins. Heterogeneous Nuclear Ribonucleoprotein K and Y-box-binding Protein. J Biol Chem 275, 15498-15503. Shuker, S. B., Hajduk, P. J., Meadows, R. P., and Fesik, S. W. (1996). Discovering high-affinity ligands for proteins: SAR by NMR. Science 274, 1531-1534. Sidiqi, M., Wilce, J. A., Porter, C. J., Barker, A., Leedman, P. J., and Wilce, M. C. (2005a). Formation of an αCP1-KH3 complex with UC-rich RNA. Eur Biophys J 34, 423-429.



Sidiqi, M., Wilce, J. A., Vivian, J. P., Porter, C. J., Barker, A., Leedman, P. J., and Wilce, M. C. J. (2005b). Structure and RNA binding of the third KH domain of poly(C)-binding protein 1. Nucl Acids Res 33, 1213-1221. Sidman, R. L., Dickie, M. M., and Appel, S. H. (1964). Mutant Mice (Quaking and Jimpy) with Deficient Myelination in the Central Nervous System. Science 144, 309-311. Silvera, D., Gamarnik, A. V., and Andino, R. (1999). The N-terminal K Homology Domain of the Poly(rC)-binding Protein Is a Major Determinant for Binding to the Poliovirus 5'-Untranslated Region and Acts as an Inhibitor of Viral Translation. J Biol Chem 274, 38163-38170. Siomi, H., Choi, M., Siomi, M. C., Nussbaum, R. L., and Dreyfuss, G. (1994). Essential role for KH domains in RNA binding: Impaired RNA binding by a mutation in the KH domain of FMR1 that causes fragile X syndrome. Cell 77, 33-39. Siomi, H., Matunis, M., Michael, W., and Dreyfuss, G. (1993). The pre-mRNA binding K protein contains a novel evolutionarily conserved motif. Nucl Acids Res 21, 1193-1198. Soller, M. (2006). Pre-messenger RNA processing and its regulation: a genomic perspective. Cell Mol Life Sci 63, 796-819. Staton, J. M., Thomson, A. M., and Leedman, P. J. (2000). Hormonal regulation of mRNA stability and RNA-protein interactions in the pituitary. J Mol Endocrinol 25, 17-34. Svitel, J., Balbo, A., Mariuzza, R. A., Gonzales, N. R., and Schuck, P. (2003). Combined Affinity and Rate Constant Distributions of Ligand Populations from Experimental Surface Binding Kinetics and Equilibria. Biophys J 84, 4062-4077. Tarun, S. Z., Wells, S. E., Deardorff, J. A., and Sachs, A. B. (1997). Translation initiation factor eIF4G mediates in vitro poly(A) tail-dependent translation. Proc Natl Acad Sci USA 94, 9046-9051. Tauson, E. L. (2004). RNA editing in different genetic systems. Zh Obshch Biol 65, 52-73. Thisted, T., Lyakhov, D. L., and Liebhaber, S. A. (2001). Optimized RNA Targets of Two Closely Related Triple KH Domain Proteins, Heterogeneous Nuclear Ribonucleoprotein K and αCP-2KL, Suggest Distinct Modes of RNA Recognition. J Biol Chem 276, 17484-17496. Tommerup, N., and Leffers, H. (1996). Assignment of Human KH-Box-Containing Genes by in Situ Hybridization: HNRNPK Maps to 9q21.32-q21.33, PCBP1 to 2p12-p13, and PCBP2 to 12q13.12-q13.13, Distal to FRA12A. Genomics 32, 297-298. Torreri, P., Ceccarini, M., Macioce, P., and Petrucci, T. (2005). Biomolecular interactions by Surface Plasmon Resonance technology. Ann Ist Super Sanita 41, 437-441.



Verkerk, A. J., Pieretti, M., Sutcliffe, J. S., Fu, Y. H., Kuhl, D. P., Pizzuti, A., Reiner, O., Richards, S., Victoria, M. F., and Zhang, F. P. (1991). Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell 65, 905-914. Waggoner, S. A., and Liebhaber, S. A. (2003a). Identification of mRNAs Associated with αCP2-Containing RNP Complexes. Mol Cell Biol 23, 7055-7067. Waggoner, S. A., and Liebhaber, S. A. (2003b). Regulation of α-Globin mRNA Stability. Experimental Biology and Medicine 228, 387-395. Wahle, E., and Ruegsegger, U. (1999). 3′-End processing of pre-mRNA in eukaryotes. FEMS Microbiology Reviews 23, 277-295. Walter, B. L., Parsley, T. B., Ehrenfeld, E., and Semler, B. L. (2002). Distinct Poly(rC) Binding Protein KH Domain Determinants for Poliovirus Translation Initiation and Viral RNA Replication. J Virol 76, 12008-12022. Wang, X., and Tanaka Hall, T. M. (2001). Structural basis for recognition of AU-rich element RNA by the HuD protein. Nat Struct Biol 8, 141-145. Wang, X., Kiledjian, M., Weiss, I. M., and Liebhaber, S. A. (1995). Detection and characterization of a 3' untranslated region ribonucleoprotein complex associated with human alpha-globin mRNA stability [published erratum appears in Mol Cell Biol 1995 Apr;15(4):2331]. Mol Cell Biol 15, 1769-1777. Wang, Z., Day, N., Trifillis, P., and Kiledjian, M. (1999). An mRNA Stability Complex Functions with Poly(A)-Binding Protein To Stabilize mRNA In Vitro. Mol Cell Biol 19, 4552-4560. Wang, Z., and Kiledjian, M. (2000). The Poly(A)-Binding Protein and an mRNA Stability Protein Jointly Regulate an Endoribonuclease Activity. Mol Cell Biol 20, 6334-6341. Wang, Z., and Kiledjian, M. (2001). Functional Link between the Mammalian Exosome and mRNA Decapping. Cell 107, 751-762. Wider, G. (2000). Structure determination of biological macromolecules in solution using nuclear magnetic resonance spectroscopy. Biotechniques 29, 1278-1282. Wilce, J. A., Leedman, P. J., and Wilce, M. C. J. (2002). RNA-Binding Proteins That Target the Androgen Receptor mRNA. IUBMB Life 54, 345-349. Wilson, G. M., and Brewer, G. (1999a). Identification and Characterization of Proteins Binding A + U-Rich Elements. Methods 17, 74-83. Wilson, G. M., and Brewer, G. (1999b). The search for trans-acting factors controlling messenger RNA decay. Prog Nucleic Acid Res Mol Biol 62, 257-291.



Wisdom, R., and Lee, W. (1991). The protein-coding region of c-myc mRNA contains a sequence that specifies rapid mRNA turnover and induction by protein synthesis inhibitors. Genes Dev 5, 232-243. Worbs, M., Bourenkov, G. P., Bartunik, H. D., Huber, R., and Wahl, M. C. (2001). An Extended RNA Binding Surface through Arrayed S1 and KH Domains in Transcription Factor NusA. Molecular Cell 7, 1177-1189. Wu, H., Henras, A., Chanfreau, G., and Feigon, J. (2004). Structural basis for recognition of the AGNN tetraloop RNA fold by the double-stranded RNA-binding domain of Rnt1p RNase III. Proc Natl Acad Sci USA 101, 8307-8312. Wüthrich, K. (1990). Protein structure determination in solution by NMR spectroscopy. J Biol Chem 265, 22059-22062. Xu, N., Chen, C.-Y. A., and Shyu, A.-B. (2001). Versatile Role for hnRNP D Isoforms in the Differential Regulation of Cytoplasmic mRNA Turnover. Mol Cell Biol 21, 6960-6971. Yano, M., Okano, H. J., and Okano, H. (2005). Involvement of Hu and Heterogeneous Nuclear Ribonucleoprotein K in Neuronal Differentiation through p21 mRNA Post-transcriptional Regulation. J Biol Chem 280, 12690-12699. Yeap, B. B., Krueger, R. G., and Leedman, P. J. (1999). Differential Posttranscriptional Regulation of Androgen Receptor Gene Expression by Androgen in Prostate and Breast Cancer Cells. Endocrinology 140, 3282-3291. Yeap, B. B., Voon, D. C., Vivian, J. P., McCulloch, R. K., Thomson, A. M., Giles, K. M., Czyzyk-Krzeska, M. F., Furneaux, H., Wilce, M. C. J., Wilce, J. A., and Leedman, P. J. (2002). Novel Binding of HuR and Poly(C)-binding Protein to a Conserved UC-rich Motif within the 3'-Untranslated Region of the Androgen Receptor Messenger RNA. J Biol Chem 277, 27183-27192. Zhao, Z., Chang, F.-C., and Furneaux, H. M. (2000). The identification of an endonuclease that cleaves within an HuR binding site in mRNA. Nucl Acids Res 28, 2695-2701.

Appendix A: Crystallization Screens


Hampton Crystal Screen™ composition



Hampton Crystal Screen 2™ composition



Hampton Natrix™ screen



Sigma® Crystallization Basic kit for proteins

Appendix B: REMSA


RNA-binding studies

An RNA electrophoretic mobility shift assay (REMSA) was used to examine the ability of full-length αCP1, αCP1-KH2 and αCP1-KH3 to bind a 51-nucleotide UC-rich sequence from the 3′UTR of AR mRNA (nucleotides 3275–3325). Binding of full-length αCP1 to this sequence has previously been demonstrated (Yeap et al., 2002), but whether the individual KH domains bind it had not been tested. The results are shown in Figure 1. As expected, binding of full-length αCP1 to the target RNA is indicated by a substantial and quantitative shift in the mobility of the RNA (lane 4). A quantitative shift of the RNA target by αCP1-KH3 is also clearly discernible (lane 3), although the change in mobility is less marked than that seen with full-length αCP1. This difference in relative mobility shift is consistent with the greater size of the full-length protein (37.5 kDa) compared with αCP1-KH3 (8 kDa). In contrast, αCP1-KH2 shows no binding (lane 2), even though the protein is present in excess over the RNA. Neither full-length αCP1 nor αCP1-KH3 bound pBluescript RNA alone (results not shown), demonstrating that binding is specific to the AR target sequence. This extends the finding of Dejgaard and Leffers (Dejgaard et al., 1996), who observed at best weak binding of isolated αCP1-KH2 to poly (C) RNA in a dot-blot assay. The absence of binding by αCP1-KH2 also indicates that the binding observed with the other proteins is not due to nonspecific protein/RNA interactions under the buffer conditions used here. Thus, αCP1-KH3 is capable of binding independently and specifically to sequences within the region of the AR mRNA 3′UTR that is also bound by full-length αCP1.

Figure 1: Binding of αCP1 and individual KH domains to the 3′UTR region of AR mRNA. A typical REMSA is shown, in which binding of αCP1 or isolated αCP1 domains to the radioactively labeled RNA 5′-CUGGGUUUUUUUUUCUCUUUCUCUCCUUUCUUUUUCUUCUUCCCUCCCUA-3′ was examined in the presence of excess tRNA as a nonspecific competitor. Lane 1, probe only; lane 2, also contains 100 ng αCP1-KH2; lane 3, also contains 100 ng αCP1-KH3; lane 4, also contains 100 ng αCP1.
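Because each lane in Figure 1 received the same mass of protein (100 ng) rather than the same molar amount, the smaller αCP1-KH3 domain is present in roughly 4.7-fold molar excess over full-length αCP1. The short Python sketch below illustrates this mass-to-moles conversion. It is an illustrative calculation, not part of the original analysis, and it assumes the approximate molecular masses quoted in the text (37.5 kDa and 8 kDa); the function name `ng_to_pmol` is hypothetical.

```python
# Illustrative sketch (not from the original analysis): convert the mass of
# protein loaded per lane into picomoles. Assumes the approximate molecular
# masses quoted in the text: 37.5 kDa (full-length αCP1), 8 kDa (αCP1-KH3).


def ng_to_pmol(mass_ng: float, mw_kda: float) -> float:
    """Return picomoles for a mass in ng and a molecular mass in kDa.

    mol = (ng * 1e-9 g) / (kDa * 1e3 g/mol) = (ng / kDa) * 1e-12 mol,
    so the amount in pmol is simply ng / kDa.
    """
    return mass_ng / mw_kda


LOAD_NG = 100.0  # ng of each protein per lane (Figure 1 legend)

PROTEINS_KDA = {
    "full-length αCP1": 37.5,
    "αCP1-KH3": 8.0,
}

for name, mw in PROTEINS_KDA.items():
    print(f"{name}: {ng_to_pmol(LOAD_NG, mw):.2f} pmol")

ratio = ng_to_pmol(LOAD_NG, 8.0) / ng_to_pmol(LOAD_NG, 37.5)
print(f"molar ratio KH3 : full-length = {ratio:.1f}")
```

The molecular mass of αCP1-KH2 is not given in this section, so it is left out of the sketch rather than guessed.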



We also compared the binding of full-length αCP1 and αCP1-KH1 to the target 51-nt AR mRNA sequence (nt 3275–3325) and examined their binding to RNA versus DNA using REMSA. Figure 2 shows the binding of full-length αCP1 to the target 51-nt AR mRNA sequence. The probe (10 nM RNA) is shifted upon the addition of increasing concentrations of αCP1 (10 nM–1 µM), and its progressive shift to still higher positions in the gel suggests that multiple binding interactions occur. Figure 2 also demonstrates that αCP1-KH1 binds the target RNA with high affinity. Its shift to a relatively constant position indicates a single binding mode, as would be expected for a protein containing a single KH domain.
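A single, constant shifted band is what a simple 1:1 binding model predicts, and band intensities from such a gel could in principle be fitted to that model to estimate Kd. Because the probe concentration (10 nM) is comparable to the lowest protein concentrations used, the simple hyperbola [P]/(Kd + [P]) would be inaccurate at the low end of the titration; the exact quadratic solution of the mass balance accounts for depletion of free protein. The Python sketch below evaluates that model at the concentrations listed in the Figure 2 legend. The Kd used is a hypothetical placeholder, not a measured value, and no fit to the actual gel data is implied.

```python
import math


def fraction_bound(p_total: float, r_total: float, kd: float) -> float:
    """Fraction of probe bound for simple 1:1 binding, P + R <-> PR.

    Solves the mass-balance quadratic exactly, so it remains valid when the
    protein is not in large excess over the probe (ligand depletion).
    All concentrations must be in the same units (molar here).
    """
    b = p_total + r_total + kd
    disc = math.sqrt(b * b - 4.0 * p_total * r_total)
    # Numerically stable form of the root (b - disc) / 2: 2*P*R / (b + disc)
    complex_conc = 2.0 * p_total * r_total / (b + disc)
    return complex_conc / r_total


RNA_TOTAL = 1e-8  # 10 nM probe, as in the Figure 2 legend
PROTEIN_SERIES = [1e-8, 2e-8, 5e-8, 1e-7, 2e-7, 1e-6]  # M, Figure 2 legend
KD_HYPOTHETICAL = 5e-8  # placeholder Kd (50 nM); NOT a measured value

for p in PROTEIN_SERIES:
    theta = fraction_bound(p, RNA_TOTAL, KD_HYPOTHETICAL)
    print(f"[P] = {p:.0e} M -> fraction of probe bound = {theta:.2f}")
```

When protein is in large excess over probe, this expression reduces to the familiar [P]/(Kd + [P]); multiple bound species, as seen with full-length αCP1, would require a more elaborate model.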

Mutational analysis of the two poly (C) triads has shown that the ‘CCCUCCC’ motif at the 3′ end of the target 51-nt AR mRNA sequence is the binding site of αCP proteins (Yeap et al., 2002). To verify whether full-length αCP1 and αCP1-KH1 bind this motif in vitro, we conducted gel shift assays using an 11-nt probe corresponding to nucleotides 3315–3325 of AR mRNA (5′-UUCCCUCCCUA-3′). Figure 3 shows that the probe is shifted by the full-length protein to a constant position, indicating good binding via a single binding interaction. αCP1-KH1 also binds, but only marginally shifts the probe under the conditions of this experiment, indicating a weaker interaction. The binding profiles of αCP1 and αCP1-KH1 to an 11-nt DNA probe analogous to the AR target sequence (5′-TTCCCTCCCTA-3′) are very similar: full-length αCP1 bound the DNA well and αCP1-KH1 bound it weakly. Notably, full-length αCP1 appeared to bind DNA with slightly higher affinity than RNA, in contrast to a previous report that RNA binding is preferential (Dejgaard et al., 1996).
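The probes used in these assays are related in a simple way: the 11-nt probe is the 3′-terminal segment of the 51-nt probe and contains the CCCUCCC motif, and the DNA probe is the same sequence written with T in place of U. The Python sketch below checks these relationships using the sequences given in the figure legends. It is an illustrative consistency check, not part of the original probe design, and the helper `rna_to_dna` is hypothetical.

```python
# Illustrative consistency check of the REMSA probes described in the text.
# PROBE_51 is copied verbatim from the Figure 1 legend.

PROBE_51 = "CUGGGUUUUUUUUUCUCUUUCUCUCCUUUCUUUUUCUUCUUCCCUCCCUA"
PROBE_11_RNA = "UUCCCUCCCUA"  # AR mRNA nt 3315-3325
MOTIF = "CCCUCCC"             # αCP binding motif (Yeap et al., 2002)


def rna_to_dna(seq: str) -> str:
    """Return the DNA analogue of an RNA sequence (U -> T), same 5'->3' sense."""
    return seq.replace("U", "T")


# The 11-nt probe is the 3'-terminal segment of the longer probe...
assert PROBE_51.endswith(PROBE_11_RNA)

# ...and contains the poly (C) motif (0-based index from str.find).
motif_start = PROBE_11_RNA.find(MOTIF)

print(rna_to_dna(PROBE_11_RNA))  # DNA probe used in Figure 3
print(f"motif begins at position {motif_start + 1} of the 11-nt probe")
```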



Figure 2: Full-length αCP1 and the αCP1-KH1 domain bind with high affinity to the 51-nt AR mRNA 3′UTR sequence. For each protein (full-length αCP1 and αCP1-KH1, respectively), the binding reactions in lanes from left to right contained no protein or 1 × 10⁻⁸, 2 × 10⁻⁸, 5 × 10⁻⁸, 1 × 10⁻⁷, 2 × 10⁻⁷ or 1 × 10⁻⁶ M protein. All binding reactions contained 1 × 10⁻⁸ M of the relevant target RNA. Lanes without protein are indicated above the gel by a minus sign in parentheses, and the wedges indicate increasing concentrations of each protein.

Figure 3: Full-length αCP1 and the αCP1-KH1 domain bind to the DNA sequence 5′-TTCCCTCCCTA-3′ and the corresponding RNA sequence 5′-UUCCCUCCCUA-3′, with good and weak binding respectively. For each protein (full-length αCP1 and αCP1-KH1, respectively), the binding reactions in lanes from left to right contained no protein or 1 × 10⁻⁷, 3 × 10⁻⁷, 1 × 10⁻⁶, 1 × 10⁻⁶, 3 × 10⁻⁶ or 1 × 10⁻⁵ M protein. All binding reactions contained 1 × 10⁻⁷ M of the relevant DNA or RNA probe. Lanes without protein are indicated above the gel by a minus sign in parentheses, and the wedges indicate increasing concentrations of each protein.

Appendix C: Publications


