THE UNIVERSITY OF WESTERN AUSTRALIA
The structure and RNA-binding of Poly (C) Binding Protein 1
Mahjooba Sidiqi
The School of Biomedical, Biomolecular and Chemical Sciences, School of Medicine and Pharmacology and Western Australian Institute for Medical Research
University of Western Australia
Perth, Australia
A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy
December 2006
Declaration
The work described in this thesis was performed between March 2003 and December
2006 in the School of Biomedical, Biomolecular and Chemical Sciences (formerly the
Department of Biochemistry) and the School of Medicine and Pharmacology at the
University of Western Australia. Some facilities at Monash University, Melbourne, were
also used. Unless otherwise stated, the experiments described were performed by the
author. This work constitutes an original body of research that has not been submitted,
either in whole or in part, for the purpose of obtaining any other degree.
Mahjooba Sidiqi
Detailed statement of authorship
The work presented in this thesis was conducted primarily by myself, but also involved
experiments conducted by others in the Wilce Structural Biology group. I was
responsible for all cloning procedures, expression of both labelled and unlabelled
protein, and most oligonucleotide preparation. I was responsible for all protein and
protein/oligonucleotide crystallization, optimization and screening of crystals for data
collection. In the initial stages of using the X-ray generator I was assisted by Jason
Schmidberger with crystal mounting and data collection, but after his training I was
able to set up my own experiments. Complete diffraction data sets were interpreted by
Matthew Wilce. Then, with the help of Jackie Wilce, I analysed
the structures and compared them with other similar structures. The first set of NMR
experiments was conducted by me and the second set was conducted by Corrine
Porter. I was solely responsible for all SPR experiments using Biacore and the
subsequent analysis.
Co-supervisor: Professor Peter Leedman
Abstract
Regulation of mRNA stability is an important posttranscriptional mechanism involved in
the control of gene expression. The rate of mRNA decay can differ greatly from one
mRNA to another and may be regulated by RNA-protein interactions. Key determinants
of mRNA decay are sequence instability (cis) elements, often located in the 3’
untranslated region (UTR). The AU-rich elements (AREs) are well-characterised
examples: they are most commonly involved in promoting mRNA degradation, although
the specific binding of proteins to these elements can also lead to the stabilization of
some mRNAs.
Other cis-elements have been described in mRNAs for which stability is a critical
component of gene regulation. These include the UC-rich cis-element in the 3’UTR of
androgen receptor (AR) mRNA. The AR is a key therapeutic target in human prostate
cancer, and thus understanding the mechanisms that regulate its expression is an
important goal. The αCP1 protein, a KH domain-containing RNA-binding protein, has
been found to bind this UC-rich region of AR mRNA and is thought to play an important
role in regulating AR mRNA expression.
αCP1 is a triple KH (hnRNP K homology) domain protein with specificity for C-rich
tracts of RNA and ssDNA (single-stranded DNA). Relatively little is known about the
structural interaction of αCP1 with target RNA cis-elements; the present study
therefore aimed to better understand the interaction between αCP1 and a 30-nt UC-rich
sequence from the 3’UTR of AR mRNA using various biophysical techniques, in an
attempt to determine which αCP1 domain, or combination of domains, is involved in
RNA binding. These studies could ultimately provide novel targets for drugs aimed at
regulating AR mRNA expression in prostate cancer cells.
At the commencement of this study little was known about the structure of the αCP1-KH
domains or the basis of their poly (C) binding specificity. Therefore, the first aim of
this study was to determine the structure of each of the isolated αCP1-KH domains,
both alone and in complex with target RNA or DNA sequences. Chapter 3 describes the
purification of recombinant full-length αCP1 and the αCP1-KH1, KH2 and KH3 domains. In
Chapter 4, I describe the crystal structure of the αCP1-KH3 domain, determined to
2.10 Å resolution. The αCP1-KH3 domain adopts the classical type I KH domain fold,
with a triple-stranded β-sheet held against a three-helix cluster in a βααββα
configuration. A model of αCP1-KH3 bound to poly (C) RNA was generated by
homology to the Nova-2-KH3-RNA structure, providing insight into the likely mode of
poly (C) RNA binding displayed by the αCP1-KH3 domain. Nuclear magnetic
resonance (NMR) spectroscopy was used to analyse the interaction of αCP1-KH3 with
an 11-mer RNA sequence representing part of the 3’UTR of AR mRNA. The results
indicate that the domain is likely to be correctly folded, with its proper secondary and
tertiary structure, and that the protein is fully complexed with the RNA while
maintaining good solution characteristics, with no evidence of aggregation or formation
of large complexes.
Chapter 5 describes the crystal structure of the first KH domain of αCP1 bound to a
single C-rich DNA 11-mer, solved to 3.0 Å resolution. αCP1-KH1 assumes a classical
type I KH domain fold, with a triple-stranded β-sheet held against a triple α-helix cluster,
forming a narrow hydrophobic cleft that accommodates the oligonucleotide. Extensive
hydrophobic and hydrogen bond contacts are made with four core recognition
nucleotides, including critical contacts that form the basis for cytosine specificity. The
oligonucleotide positioning is similar to that in the closely related hnRNP K-KH3/DNA
structure. The protein/DNA complex formed with a 2:1 stoichiometry, demonstrating that
KH domains may bind to immediately adjacent oligonucleotide target sites. Additional
studies used NMR spectroscopy to examine the interaction of αCP1-KH1 with a 20-mer
RNA sequence, 5’-CUUUCUUUUUCUUCUUCCCU-3’, representing the αCP1 target site
in the 3’UTR of AR mRNA. Interestingly, this U-rich element also contains a binding site
for HuR, an RNA recognition motif (RRM)-containing RNA-binding protein involved in the
regulation of AR mRNA expression. We were therefore interested in characterising
interactions between HuR RRM1/2 and αCP1-KH1 bound to an RNA sequence
containing both the C-rich site and the U-rich segment. My studies revealed no evidence
of interaction between the adjacently bound αCP1-KH1 and HuR RRM1/2. This, however,
does not preclude an interaction with full-length αCP1 through either its KH2 or KH3
domain.
An additional aim of this thesis was to characterise the kinetics and binding affinities of
the αCP1-KH domains for the poly (C)-rich site in the 3’UTR of AR mRNA, and for a
number of other RNA and DNA sequences. This work is described in Chapter 6. The
kinetics and affinity of the interactions were quantified using surface plasmon
resonance (SPR) spectroscopy. I found that the isolated αCP1-KH domains prefer the
DNA sequence over the corresponding RNA sequence of AR mRNA, with widely ranging
affinities in the order αCP1-KH1 > KH3 > KH2; this was not observed, however, with a
simple 10-mer sequence containing only one cytosine triplet. This study highlights that
each of the individual domains can function as a discrete and independent RNA- and
DNA-binding unit, albeit with different levels of binding activity. αCP1-KH1 and KH3
were found to bind poly (C) but not poly (A), poly (U) or poly (G) RNA homopolymers,
consistent with previous findings. My studies are the most detailed analysis of αCP1-KH
domain binding to date. Furthermore, the data showed that, in contrast with previous
reports, αCP1-KH2 also has the capacity to bind oligonucleotides. Taken together, these
data have enabled the generation of a model of the full-length αCP1 molecule bound to
a target oligonucleotide.
We also examined αCP1-KH domain binding to a simpler cis-element sequence
consisting of a single C-triplet site, and compared RNA and DNA binding. αCP1-KH1
bound the C-triplet RNA target with lower affinity (Kd of 48 µM) than the corresponding
DNA sequence (Kd of 4.5 µM). In contrast, αCP1-KH3 bound the RNA (Kd of 3.2 µM)
and DNA (Kd of 2.2 µM) sequences similarly well. Additional studies addressed the
significance of the four core recognition nucleotides (TCCC) using a series of
cytosine-to-thymine mutants. The findings verified some of the predictions made from
the structural studies, especially the requirement of a core tetranucleotide recognition
sequence for maximal KH binding. Mutation of the four core bases confirmed the
importance of the cytosines at positions two and three, since no binding was observed
when either was mutated, whereas some binding was retained when the fourth base
was mutated.
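As a rough illustration of what these dissociation constants imply, the Python sketch below assumes a simple 1:1 (Langmuir) interaction with the analyte concentration C held constant, as in an SPR flow cell. Under that assumption, the equilibrium fraction of immobilised ligand that is occupied is C/(C + Kd). The sketch then compares the reported Kd values at a hypothetical 10 µM analyte concentration. The function names and the chosen concentration are illustrative assumptions and are not part of the SPR analysis itself.

```python
"""Equilibrium occupancy under a simple 1:1 (Langmuir) binding model.

Kd values are those reported for the single C-triplet targets; the
10 µM analyte concentration used below is a hypothetical example value.
"""

# Reported dissociation constants (µM) for the C-triplet RNA/DNA targets
KD_UM = {
    ("KH1", "RNA"): 48.0,
    ("KH1", "DNA"): 4.5,
    ("KH3", "RNA"): 3.2,
    ("KH3", "DNA"): 2.2,
}


def fraction_bound(conc_um: float, kd_um: float) -> float:
    """Fraction of immobilised ligand occupied at equilibrium: C / (C + Kd)."""
    if conc_um < 0 or kd_um <= 0:
        raise ValueError("need concentration >= 0 and Kd > 0")
    return conc_um / (conc_um + kd_um)


def fold_preference(kd_weaker_um: float, kd_tighter_um: float) -> float:
    """Ratio of two Kd values; > 1 means the second target binds more tightly."""
    return kd_weaker_um / kd_tighter_um


if __name__ == "__main__":
    conc_um = 10.0  # hypothetical analyte concentration, for illustration only
    for (domain, target), kd in KD_UM.items():
        theta = fraction_bound(conc_um, kd)
        print(f"{domain}/{target}: Kd = {kd} uM, fraction bound at {conc_um} uM = {theta:.2f}")
    kh1 = fold_preference(KD_UM[("KH1", "RNA")], KD_UM[("KH1", "DNA")])
    kh3 = fold_preference(KD_UM[("KH3", "RNA")], KD_UM[("KH3", "DNA")])
    print(f"DNA over RNA preference: KH1 {kh1:.1f}-fold, KH3 {kh3:.1f}-fold")
```

Occupancy depends only on the ratio C/Kd. The roughly tenfold difference between the KH1 RNA and DNA Kd values therefore translates into markedly different occupancies at the same concentration. The KH3 values, about 1.5-fold apart, give similar occupancies.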
In summary, the work presented in this thesis provides new, detailed insight into the
molecular interactions between the αCP1-KH domains and AR mRNA. Furthermore,
these studies shed light on the nature of protein/mRNA interactions in general, as well
as on the specific complex that forms on AR mRNA. They provide a new understanding
of the mode of αCP1 binding at a target oligonucleotide binding site, and a foundation
for future studies to define the structure of multiprotein/oligonucleotide complexes
involved in AR mRNA gene regulation. Understanding the detailed interaction between
AR mRNA and αCP1 could reveal possible targets for drug development aimed at
reducing AR expression in prostate cancer cells by interfering with the interaction of
αCP1 and AR mRNA.
Acknowledgements
I would like to firstly thank my supervisor, Dr Jackie Wilce, for her support,
encouragement and motivation towards my work and myself. I have enjoyed working
with her and I truly admire her for her strong personality and yet calm nature, which has
helped me all throughout my PhD, especially those stressful times. I wish her always
happiness and my best regards. I would also like to thank my co-supervisor
A/Professor Matthew Wilce. I greatly appreciate his time and knowledge that he has
offered me.
I will always remain indebted to my co-supervisor Professor Peter Leedman, who kindly
accepted me into his lab for the last two years of my PhD. Without his lab, his
encouragement, support and guidance, I don’t think I would have been able to
successfully complete my PhD. I greatly appreciated his time dedicated towards my
project and for encouraging my ambition of pursuing science and a medical degree.
I don’t think I would have survived my PhD without the help and support of all the members
of the Structural Biology group. They have been absolutely great and wonderful to me.
I have enjoyed working with them and I sincerely appreciate their friendship, knowledge
and wisdom that each one offered me in their own unique way. Furthermore, I don’t
think I would have coped well with the change of labs, if it were not for the great,
extremely friendly and wonderful team in the Leedman lab. Each member of the lab
made me feel at home and assisted me greatly. I cannot thank them enough. A special
thanks goes to Christen Down and Esme Hatchell, whose friendship and support over
the last year of my PhD was invaluable and will not be forgotten.
I would like to thank Dr Lindsay Byrne and Dr Corrine Porter who provided technical
expertise and advice in all areas involving NMR. A special thank you to Jason
Schmidberger for maintaining the X-ray generator. I would also like to thank the
Biacore team at WAIMR and Rick Filonzi, for their help and advice on using the
machine and data analysis. I would like to thank Ke Nguyen and Richard Claudius for
providing me with a computer and an office for writing my thesis. In addition, I thank Dr
Ranjna Kapoor for proofreading my thesis.
There are many other people in Pharmacology and WAIMR who provided assistance
and support over the past four years; their help is gratefully acknowledged.
I must also thank my dearest friends. Without them I would probably have been driven to
insanity. So a special thank you goes to Safia Al-Saeedy for her constant motivation
especially during the writing of my thesis and her concern for my well being; to Susan
Lo for listening to my complaints, for her lifts to the station and many other places and
for just being Susan; to Madhu Sharp for her help in protein concentration
determination, for her beautiful lunches and for the jokes and the laughter that she
shared with me. In addition, I would like to thank all my other friends, both at work and
outside work; they know who they are.
Last but not least, my parents, sister and my brothers, thank you for your constant love,
encouragement and understanding. Without it, the last four years would have been so
much harder. I hope to be able to help and care for you all in the future.
Above all I would like to thank God for His help, blessing, love and for granting me this
great opportunity to seek knowledge and to develop intellectually and spiritually.
Table of Contents
Declaration
Detailed Statement of Authorship
Abstract
Acknowledgements
Table of Contents
Abbreviations
List of Tables and Figures
Three letter and one letter code for the common amino acids
Publications and conference presentations
Chapter 1: General Introduction
1.1 Levels of Gene Regulation
1.2 Regulation of Transcription in Eukaryotes
1.3 Pre-mRNA processing reactions: Capping, editing, splicing, 3’ end processing
1.4 Translation
1.5 mRNA Decay
1.5.1 mRNA half-life
1.5.2 Eukaryotic mRNA Decay Pathways
1.5.3 mRNA Stability
1.5.4 Adenosine Uridine (AU)-rich elements (AREs)
1.5.5 Non-ARE cis-elements and their binding proteins
1.6 Role of RNA-Protein interaction
1.6.1 RNA Binding Motifs
1.6.2 The RRM (RNA Recognition Motif) motif
1.6.3 The Arginine rich motif (ARM)
1.6.4 Double stranded RNA binding domain
1.6.5 KH Motif
1.7 αCP proteins
1.7.1 Protein-Protein interaction
1.7.2 αCP KH motifs-synergy
1.7.3 KH Structure
1.7.4 KH and oligonucleotide interaction
1.7.5 KH containing protein and disease
1.8 Androgen receptor and prostate cancer
1.8.1 Androgen receptor mRNA stability and RNA binding proteins
1.9 Summary and Research aims
1.9.1 Hypotheses that formed the basis of this study
Chapter 2: Materials and Methods
2.1 Molecular Biology
2.1.1 Materials
2.1.2 Buffers and solutions
2.1.3 Culture media
2.2 Cloning of αCP1-KH domains
2.2.1 Polymerase chain reaction
2.2.2 Agarose gel electrophoresis of DNA
2.2.3 Restriction endonuclease digestion of DNA
2.2.4 pGEX-6P-2 vector digestion
2.2.5 Ligation reaction
2.2.6 Transformation of XL1-Blue competent cells
2.2.7 “Colony screening” and extraction of Plasmid DNA from bacterial culture
Abbreviations
3-D three-dimensional
A280 absorbance at 280 nm
A600 absorbance at 600 nm
ADAR adenosine deaminases acting on RNA
αCP poly (C) binding protein
ARE AU-rich element
ARM arginine rich motif
AR androgen receptor
AUF1 ARE/poly (U) binding degradation factor
bp base pair
BSA bovine serum albumin
C cytosine
CD circular dichroism
COSY correlation spectroscopy
DICE differentiation control element
DEPC diethyl pyrocarbonate
DHT dihydrotestosterone
DNA deoxyribonucleic acid
dNTP deoxyribonucleoside 5’-triphosphate
JB Jena Bioscience
kDa kilodalton
KH hnRNP K homology
KSRP KH-type splicing regulatory protein
LB Luria-Bertani medium
LH luteinising hormone
LOX lipoxygenase
M molecular mass
MALDI-TOF matrix assisted laser desorption ionisation-time of flight
MD molecular dynamics
MES 2-(N-morpholino)ethanesulfonic acid
MOR mouse opioid receptor
MPD 2-methyl-2,4-pentanediol
mRNA messenger ribonucleic acid
MQW Milli-Q® water
MW molecular weight
MWCO molecular weight cut off
NaOH sodium hydroxide
NLS nuclear localisation signal
NMR nuclear magnetic resonance
NOESY nuclear Overhauser enhancement spectroscopy
PAP poly (A) polymerase
PABP poly (A) binding protein
PARN poly (A) specific ribonuclease
PBS phosphate buffered saline
PCBP/hnRNP E poly (C) binding protein
PCR polymerase chain reaction
PEG polyethylene glycol
PKR double-stranded RNA-activated protein kinase (DAI)
PMSF phenylmethylsulphonyl fluoride
ppm parts per million
PV poliovirus
REMSA RNA electrophoretic mobility shift assay
REN renin
r.m.s. root mean square
r.m.s.d. root mean square deviation
RNA ribonucleic acid
RNP ribonucleoprotein
RRM RNA recognition motif
RT room temperature
SA streptavidin
SDS sodium dodecyl sulphate
SDS-PAGE sodium dodecyl sulphate polyacrylamide gel electrophoresis
SELEX systematic evolution of ligands by exponential enrichment
SLDE stem-loop destabilizing element
SPR surface plasmon resonance
STAR signal transduction and activation of RNA
TAE tris-acetate EDTA
TEMED N,N,N’,N’-tetramethylethylenediamine
TFA trifluoroacetic acid
TNFα tumour necrosis factor α
TPA 12-O-tetradecanoylphorbol-13-acetate
Tris tris(hydroxymethyl)aminomethane
U units
UTR untranslated region
UVXL UV cross-linking assay
UVXL-IP UV cross-linking assay immunoprecipitation
VEGF vascular endothelial growth factor
LIST OF TABLES AND FIGURES
Chapter 1 Introduction
Figure 1.1 Schematic representation of the steps involved in gene expression
Figure 1.2 The ribosome-recycling concept
Figure 1.3 Eukaryotic mRNA degradation pathways
Figure 1.4 A schematic representation of different features of the mRNA
Figure 1.5 Cartoon representation of NMR structure of the RRM domain of the SF2 protein (PDB code 1X4A) generated using VMD
Figure 1.6 Sequence alignment of a selection of RRM domains for which the structure has been solved (PDB codes are indicated in brackets)
Figure 1.7 Structure of the HuD RRM domain 1 and 2 c-fos-11 complex (PDB code: 1FXL) generated using VMD
Figure 1.8 NMR structure of the Jembrana disease virus (JDV) Tat
Figure 6.3 Langmuir binding of analyte A to immobilized ligand B, to form the AB complex
Table 6.1 A brief explanation of the rate constants
Figure 6.4 Graph of Req against the analyte concentration
Figure 6.5 Mass transfer model
Figure 6.6 Heterogeneous model
Figure 6.7 Bivalent model
Figure 6.8 Conformational change model
Figure 6.9 Binding studies of αCP1-KH1, KH2 and KH3 with RNA sequence
Table 1 Equilibrium constants for αCP1-KH1, αCP1-KH2, αCP1-KH3
Figure 6.10 Binding studies of αCP1-KH1, KH2 and KH3 with DNA sequence
Table 2 Equilibrium constants for αCP1-KH1, αCP1-KH2, αCP1-KH3 interaction with DNA representing AR
Figure 6.11 Kinetic analysis of αCP1-KH2 and DNA representing the 30 nucleotides of the 3’UTR of AR mRNA
Figure 6.12 Binding studies of αCP1-KH1, KH2 and KH3 with RNA homopolymers, poly (C), (G) and (U)
Figure 6.13 Binding studies of HuR and HuR RRM1/RRM2 with RNA homopolymers, poly (C), (G) and (U)
Figure 6.14 Binding studies of αCP1-KH1 and KH3 to a 10mer poly (A) (adenine) and triplet CCC (cytosine) sequence
Figure 6.15 Kinetic analysis of αCP1-KH1 and αCP1-KH3 to triplet CCC RNA and DNA sequence
Table 3 Kinetic and affinity constants for KH domains
Figure 6.16 Steric hindrance
Figure 6.17 Binding studies of αCP1-KH1 to systematic mutation of the triplet CCC site to thymine DNA sequences
Figure 6.18 Binding studies of αCP1-KH3 to systematic mutation of the triplet CCC site to thymine DNA sequences
Figure 6.19 Schematic depicting the complex αCP1-KH1/DNA
Three letter and one letter code for the common amino acids
Residue three letter code one letter code
Alanine Ala A
Arginine Arg R
Asparagine Asn N
Aspartic acid Asp D
Cysteine Cys C
Glutamine Gln Q
Glutamic acid Glu E
Glycine Gly G
Histidine His H
Isoleucine Ile I
Leucine Leu L
Lysine Lys K
Methionine Met M
Phenylalanine Phe F
Proline Pro P
Serine Ser S
Threonine Thr T
Tryptophan Trp W
Tyrosine Tyr Y
Valine Val V
Publications
M. Sidiqi, J.A. Wilce, J.P. Vivian, C.J. Porter, P.J. Leedman and M.C.J. Wilce (2005) Structure and RNA binding of the third domain of poly(C)-binding protein. Nucleic Acids Research 33, 1213-1221.
M. Sidiqi, J.A. Wilce, C.J. Porter, A. Barker, P.J. Leedman and M.C.J. Wilce (2005) Formation of an alphaCP1-KH3 complex with UC-rich RNA. Eur Biophys J 34, 423-429.
M. Sidiqi, J.A. Wilce, J.W. Schmidberger, A. Barker, P.J. Leedman and M.C.J. Wilce (2005) Structure of alphaCP1-KH1 bound to C-rich DNA 11-mer: multiple KH binding mode revealed. (manuscript in preparation – Nucleic Acids Research)
Conference poster presentations
Porter, C.J., Sidiqi, M., Vivian, J.P., Leedman, P.J., Wilce, M.C.J. and Wilce, J.A. (2003) An Investigation of Protein-mRNA Interactions Affecting Androgen Receptor Expression. Proc. 28th Annual Lorne Conf. Protein Structure and Function, Lorne, VIC. p149.
Sidiqi, M., Vivian, J.P., Wilce, J.A. and Wilce, M.C.J. (2003) Structural studies of CP protein and its fragments with androgen receptor mRNA. AsCA’03/Crystal-23 Conference, Broome, WA. ATuP-35.
Sidiqi, M., Wilce, J.A., Vivian, J.P., Porter, C.J., Leedman, P.J. and Wilce, M.C.J. (2004) Structure and RNA binding of the third domain of poly(C)-binding protein. Proc. 29th Annual Lorne Conf. Protein Structure and Function, Lorne, VIC. abs #136. (Awarded first poster prize.)
Sidiqi, M.S., Wilce, J.A., Vivian, J.P., Porter, C.J., Leedman, P.J. and Wilce, M.C.J. (2004) An investigation of the poly(C)-binding protein KH domains. Combined ASBMB, AZSCDB and ASS Annual Meeting. Pos-Wed-010.
Sidiqi, M.S., Wilce, J.A., Vivian, J.P., Porter, C.J., Leedman, P.J. and Wilce, M.C.J. (2004) An investigation of the poly(C)-binding protein KH domains and androgen receptor mRNA interactions. 28th Annual Scientific Meeting of the Australian Society of Biophysics. P20.
Wilce, M.C.J., Sidiqi, M., Barker, A., Schmidberger, J., Leedman, P.J. and Wilce, J.A. (2005) Double Grip of alphaCP1-KH1 on C-rich DNA 11-mer. The Murnau Conference on Structural Biology of Molecular Recognition, Sept 15-17, Murnau, Germany. P093B.
Wilce, J.A., Sidiqi, M., Barker, A., Schmidberger, J.W., Pattenden, L., Leedman, P.J. and Wilce, M.C.J. (2006) Structure of αCP1-KH1/DNA complex and comparative binding to RNA vs DNA: model of the triple-KH domain protein bound to target androgen receptor mRNA. Proc. 31st Annual Lorne Conf. Protein Structure and Function, Lorne, VIC. abs #145.
Chapter 1 General Introduction
Figure 1.1: Schematic representation of the steps involved in gene expression. Transcription factors initiate transcription of specific genes at exposed regions of DNA: the DNA-protein (chromatin) complex is remodelled, exposing the DNA and allowing transcription factors to bind. Once transcribed, the pre-mRNA undergoes extensive processing, including capping of the 5’ terminal nucleotide, polyadenylation of the 3’ end and splicing of the introns, followed by nucleocytoplasmic transport and, in the cytoplasm, translation and mRNA degradation. Translation is the final determinant of whether a gene is expressed, because it is ultimately the protein product that carries out most of the functions of the gene.
1.1 Levels of Gene Regulation
All living organisms, from the most basic unicellular organisms to complex multicellular
ones, require a high level of gene regulation in order to function. The number of genes
ranges from approximately 6000 in yeast to 30000 in humans, and the coordinated
regulation of these genes requires the interplay of many complex processes. In response
to extrinsic and intrinsic stimuli, each gene is expressed at its own level, and some are
not expressed at all. Different genes are switched on or off during the development of the
cell and of the organism, and only half to three quarters of the genes are actively
expressed at any one time (Claverie, 2001; Goffeau et al., 1996).
Gene expression encompasses the whole process from transcription to protein synthesis.
There are a number of steps involved in gene expression, each representing a potential
control point for regulation. These are highlighted in Figure 1.1 (Hollams et al., 2002).
1.2 Regulation of transcription in Eukaryotes
Gene expression is initiated by transcription factors, which are necessary for the production
of an RNA transcript from a DNA template. The factors that regulate transcription have
been extensively studied and include chromatin structure, the methylation status of the
gene, basal transcriptional machinery and the presence and amount of activator and
repressor transcription factors.
DNA exists in the form of chromatin in the cell. Chromatin is a highly compact structure in
which the DNA is wrapped around proteins called histones. This tightly packaged physical
structure can influence the ability of transcriptional regulatory proteins and RNA
polymerases to gain access to specific genes and activate their transcription (Collingwood
et al., 1999; Narlikar et al., 2002). Chromatin is densely packaged in inactive regions of the
chromosome, whereas it is less tightly packed at transcriptionally active sites (Collingwood
et al., 1999; Narlikar et al., 2002). Because the packaging of DNA around histones blocks
access to transcription initiation factors, the cell uses a number of mechanisms to remodel
chromatin and expose the DNA. Firstly, trans-acting factors can disrupt the chromatin
complex by displacing histones, allowing the pre-initiation complex to form. In addition,
histones within the chromatin complex can be acetylated, which leads to their displacement
and permits the binding of initiation factors and the beginning of transcription
(Collingwood et al., 1999).
The ability of factors to bind to mammalian DNA is also affected by methylation of
cytosines in the DNA, a common mechanism of gene silencing. A number of studies have
shown that genes with methylated cytosines in the promoter region are inactive, whereas
cells carrying non-methylated forms of these genes express them. Overall chromosome
structure, histone modification and cytosine methylation therefore influence gene
regulation by controlling the accessibility of the DNA to transcription factors (Razin et al.,
1980; Rezai-Zadeh et al., 2003).
Transcription factors, or transcription regulatory proteins, influence the frequency of
transcription initiation. They assemble as multiple complexes alongside the basal
transcription machinery. A number of mechanisms regulate the activity and assembly of
transcription factors (Beckett, 2001). These include small-ligand binding, which affects
DNA-binding ability; post-translational modification of proteins, such as phosphorylation
in response to an external signal; DNA binding, in which an initial factor bound to DNA
acts as a platform for subsequent factors; and other protein-protein interactions, which
either activate or repress transcription factor assembly (Beckett, 2001).
A high level of regulation is also present at the transcriptional elongation step. After
recruitment to a promoter and establishment of a pre-initiation complex, RNA polymerases
respond to different stimuli, including association with auxiliary elongation factors and
phosphorylation of the carboxy-terminal domain (CTD) of the large subunit of RNA
polymerase II (Payne et al., 1989; Shilatifard et al., 2003). It is possible that modifications
such as these effectively obstruct elongation, leading to either transient pausing or full
transcriptional arrest (Fish et al., 2002). A number of factors that enhance elongation have
also been identified. One group suppresses transient pausing and stimulates the rate of
transcript elongation, while another reactivates arrested RNA polymerase by stimulating its
transcript cleavage activity (Fish et al., 2002).
1.3 Pre-mRNA processing reactions: Capping, editing, splicing, 3’ end processing
As the pre-mRNA is transcribed, it undergoes extensive processing, and each processing
step is also a potential point of gene expression control. Pre-mRNA processing begins with
capping of the 5’ end. After the synthesis of 20 to 30 nucleotides, an N7-methylguanosine
monophosphate (GMP) is added to the first nucleotide through a 5’-5’ triphosphate linkage.
The 5’ cap is then bound by the nuclear cap-binding complex, which consists of 20 kDa and
80 kDa proteins (Izaurralde et al., 1994; Soller, 2006). The resulting cap structure is written
m7G(5’)ppp(5’)N.
The cap structure has many essential functions, including stabilising the mRNA and
promoting splicing of the first intron, 3’ end processing and translation (Flaherty et al.,
1997; Izaurralde et al., 1994; Lewis et al., 1996; Parker et al., 2004).
In addition to 5’ end processing, the pre-mRNA is also modified at the 3’ end. Almost all
eukaryotic mRNAs are polyadenylated. Polyadenylation takes place in two steps (Colgan
et al., 1997; Shatkin et al., 2000; Wahle et al., 1999). Firstly, an endonucleolytic reaction
cleaves the transcript 10-30 nucleotides downstream of the conserved AAUAAA sequence;
secondly, the poly (A) tail is added by poly (A) polymerase (PAP). The synthesized length
of the poly (A) tail is species-dependent. For example, in humans the poly (A) tail can be
as long as 250 nucleotides, while in yeast it is only up to 90 nucleotides. By the time the
mRNA is transported to the cytoplasm, about 50 to 100 adenines of the poly (A) tail have
been degraded (Bergmann et al., 1980; Brawerman, 1981). Once the number of adenines
falls below a critical length, the mRNA is degraded completely (Beelman et al., 1995). This
3’ end modification thus plays an essential role in regulating the gene product by
protecting the transcript from exoribonucleases (Beelman et al., 1995).
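The "10-30 nucleotides downstream of AAUAAA" rule can be made concrete with a short Python sketch. It scans a sequence for the hexamer and reports the region in which cleavage would be expected. This is an illustration only. The function name, the toy sequence and the counting convention (the nucleotide immediately 3’ of the hexamer counts as 1 nt downstream) are assumptions. Real cleavage-site selection also depends on additional sequence elements and protein factors that are not modelled here.

```python
import re

# Canonical polyadenylation signal named in the text
PAS = "AAUAAA"


def candidate_cleavage_windows(seq: str, min_nt: int = 10, max_nt: int = 30):
    """Locate each AAUAAA hexamer and the region min_nt..max_nt nt downstream of it.

    Assumption: the nucleotide immediately 3' of the hexamer is 1 nt downstream.
    Returns (hexamer_start, window_start, window_end) tuples with 0-based,
    end-exclusive indices; windows are clipped to the end of the sequence.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA-style input as well
    windows = []
    # A zero-width lookahead also reports overlapping hexamer matches
    for match in re.finditer(f"(?={PAS})", rna):
        hexamer_end = match.start() + len(PAS)  # index of the first downstream nt
        start = hexamer_end + min_nt - 1
        end = min(hexamer_end + max_nt, len(rna))
        if start < len(rna):  # skip signals whose window lies beyond the sequence
            windows.append((match.start(), start, end))
    return windows


if __name__ == "__main__":
    # Hypothetical toy transcript: short 5' segment, the hexamer, then 40 nt downstream
    toy = "GCAUCG" + PAS + "C" * 40
    for hexamer_start, start, end in candidate_cleavage_windows(toy):
        print(f"AAUAAA at index {hexamer_start}; candidate cleavage region {start}-{end - 1}")
```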
Another pre-mRNA modification involves editing of certain single nucleotides in the
pre-mRNA. Two types of base change occur in mammalian pre-mRNA: deamination of
adenosine to inosine (A to I) and of cytidine to uridine (C to U) (Anant et al., 2003; Barlati
et al., 2005). Most of the genes found to undergo A-to-I editing are expressed in the
nervous system, while an example of C-to-U editing in humans is the apolipoprotein B
mRNA, which is edited in some tissues but not others (Anant et al., 2003). Editing may
create a premature stop codon, so that translation produces a truncated protein. RNA
editing can therefore generate alternative protein products from a single gene and is
hence a regulatory step in gene expression (Tauson, 2004).
Alternative protein products arise not only from mRNA editing but also from mRNA
splicing, another alteration that the pre-mRNA undergoes. Eukaryotic genes are
interrupted by intervening sequences, called introns, that are removed during gene
expression. mRNA splicing is the process by which these sequences are specifically
removed and the functional sequences (exons) are joined together (Izquierdo et al., 2006).
Pre-mRNA molecules may be spliced in several different ways, a process called alternative
splicing, which allows a single gene to encode multiple proteins. RNA splicing is a
significant part of pre-mRNA processing and one of the major steps in the control of gene
expression in eukaryotes (Hastings et al., 2001).
1.4 Translation
The mRNA is translated into its gene product in the cytoplasm. Translation is mediated by specific interactions with proteins, including initiation, elongation and termination factors. Translation initiation alone involves at least twelve different initiation factors (Kozak, 2005; Tarun et al., 1997). Translation is stimulated when poly (A) binding protein (PABP) binds to the poly (A) tail. PABP interacts with the initiation factor eIF4G, which in turn associates with the cap-binding protein eIF4E at the 5’ cap of the mRNA. This synergism between the 5’ cap and the poly (A) tail circularizes the mRNA, which promotes recruitment of the ribosomal machinery and allows translation to take place (Figure 1.2) (Kozak, 2005; Sachs et al., 1997; Tarun et al., 1997). Circularization of the mRNA has been shown in vitro to promote translation and to stabilize the mRNA by protecting it from deadenylating and decapping enzymes (Preiss et al., 2003).
Figure 1.2: The ribosome-recycling concept. One possible function of mRNA circularization is to facilitate direct recycling of ribosomes or ribosomal subunits, after termination at the stop codon, back to the 5’ region of the same mRNA. This model is supported by the observation of circular polyribosomes and by interactions linking PABP with the cap-bound initiation factors (eIF4G and eIF4E) and with the translation termination factor eRF3 (Preiss et al., 2003).
1.5 mRNA Decay
1.5.1 mRNA half-life
Messenger RNA degradation plays an important role in the regulation of gene expression. The amount of mRNA available at any particular stage reflects the balance between its synthesis, processing and degradation. Each mRNA has a distinct half-life that is related to its general level of expression and closely correlated with the function of its protein product. For example, although the average half-life of eukaryotic mRNAs is 3 to 5 hours (Jacobson et al., 1996; Ross, 1995), mRNAs encoding regulatory proteins such as growth factors, whose levels change rapidly in cells, have half-lives of several minutes (Chkheidze et al., 1999). In contrast, the most stable mRNAs, such as those encoding β globin and the glycolytic enzyme glyceraldehyde-3-phosphate dehydrogenase (GAPDH), have half-lives of a day or more (Ross et al., 1985). mRNA half-lives can fluctuate in response to several stimuli, including environmental factors, growth factors, hormones and second messengers released by signaling cascades (Staton et al., 2000). For example, hypoxia increases the half-life of vascular endothelial growth factor (VEGF) mRNA from 43 min to ~106 min (Levy et al., 1996), epidermal growth factor (EGF) increases the half-life of the mRNA encoding its receptor, the epidermal growth factor receptor (EGFR) (Balmer et al., 2001), and dihydrotestosterone (DHT) regulates the stability of androgen receptor mRNA (Yeap et al., 1999). Furthermore, the mitogen 12-O-tetradecanoylphorbol-13-acetate (TPA) can have differential effects on mRNA fate, either stabilizing it or promoting its degradation (Guhaniyogi et al., 2001).
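Half-life values such as those above are easier to compare when converted into the fraction of transcript remaining after a given time. The sketch below assumes simple first-order (exponential) decay with no new transcription, which is an idealization of the kinetics in real cells; the function names are hypothetical, and the 43 min and 106 min values are the VEGF half-lives quoted above.

```python
import math


def decay_constant(half_life_min: float) -> float:
    """First-order rate constant k (per minute) from a half-life: k = ln 2 / t_half."""
    return math.log(2) / half_life_min


def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of an mRNA population left after t minutes.

    Assumes exponential decay, N(t)/N0 = exp(-k t), and no new synthesis.
    """
    return math.exp(-decay_constant(half_life_min) * t_min)


if __name__ == "__main__":
    # VEGF mRNA half-lives quoted in the text (Levy et al., 1996)
    for label, t_half in [("baseline", 43.0), ("hypoxia", 106.0)]:
        print(f"{label}: {fraction_remaining(120, t_half):.2f} remaining after 2 h")
```

Under this assumption, about 14% of the transcript remains after two hours with a 43 min half-life, compared with about 46% with a 106 min half-life, which shows how a roughly 2.5-fold change in half-life translates into a much larger difference in steady transcript availability.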
1.5.2 Eukaryotic mRNA Decay Pathways
Degradation of mRNA can proceed through a number of pathways, and decay is initiated by different cis-elements within the mRNA. A model for mRNA decay in both yeast and mammals proposes that the initial step is removal of most or all of the poly (A) tail by 3’-5’ exonucleases (Hollams et al., 2002). The remainder of the mRNA can then be degraded through two major pathways. The first involves removal of the 5’ cap, followed by 5’ to 3’ exonucleolytic degradation from the 5’ end. The second also follows deadenylation but proceeds by 3’-5’ exonucleolytic degradation of the rest of the RNA, without removal of the 5’ cap (Newbury, 2006). The critical exoribonuclease in the 5’ to 3’ degradation pathway in yeast is Xrn1p, which hydrolyses the mRNA from the 5’ end, releasing single nucleotides. A related enzyme, Rat1p, has also been identified in other eukaryotic cells (Newbury, 2006), while the main exoribonuclease activity in the 3’ to 5’ pathway is provided by the exosome, a large multiprotein complex containing several exoribonucleases (Newbury, 2006).
In order for exoribonucleases to access the mRNA, it must first be deadenylated, decapped or endonucleolytically cleaved (Figure 1.3) (Newbury, 2006). One deadenylase that has been purified and cloned from vertebrates is PARN (poly (A)-specific ribonuclease) (Meyer et al., 2004). Deadenylation by PARN is initiated by recognition of the m7-guanosine cap on the RNA, and the cap-binding protein eIF4E prevents deadenylation in vitro by competing for the same binding site (Gao et al., 2000). Decapping of the mRNA is carried out by the decapping proteins Dcp1p and Dcp2p (Decker et al., 2002) (Figure 1.3). How these proteins act is not fully understood, but it appears that Dcp2p removes the cap in a process initiated by Dcp1p (Coller et al., 2004). Finally, the mRNA can also be made accessible through endonucleolytic cleavage: endonucleases have been identified that are associated with premature stop codons and lead to cleavage of the mRNA. Thus both the cap structure and the poly (A) tail play critical roles in protecting the mRNA from degradation, and their removal affects the rate of mRNA decay (Hilleren et al., 1999; Palacios et al., 2004).
Although almost all mRNAs contain these elements, their half-lives still differ widely, ranging from several minutes to several days. This emphasises that specific elements present only in some mRNAs must also contribute to their degradation or stability (Hollams et al., 2002).
Figure 1.3: Eukaryotic mRNA degradation pathways. The mRNA is first deadenylated, decapped or endonucleolytically cleaved. Exoribonucleases, either Xrn1p or the exosome, can then access the mRNA and direct its degradation (Newbury, 2006).
1.5.3 mRNA Stability
In addition to the cap structure and the poly (A) tail, mRNA stability can also be determined by specific cis-acting sequences known as “instability sequences” in the 5’ untranslated region (5’UTR), the coding sequence and/or the 3’ untranslated region (3’UTR) (Figure 1.4).
Figure 1.4: Schematic representation of different features of the mRNA. Instability elements can be located in the 5’UTR, the coding sequence and the 3’UTR.
For example, instability sequences have been identified in the 5’UTR of interleukin mRNA and in the coding regions of c-fos and c-myc mRNAs (Chen et al., 1998a; Wisdom et al., 1991). There are also 3’UTR instability sequences in growth factor, cytokine and lymphokine messages (Wilson et al., 1999a). These sequences render the mRNA unstable and promote its degradation. They are thought to act through RNA-binding proteins that directly or indirectly affect the activity of ribonucleases.
1.5.4 Adenosine Uridine (AU)-rich elements (AREs)
Other cis-acting determinants of mRNA decay are most often located in the 3’UTR. In mammalian mRNA the best-characterized instability element is the AU-rich element (ARE), which is generally found in the 3’UTR of a number of cytokine and proto-oncogene mRNAs (Bakheet et al., 2001; Chen et al., 1995). Shaw and Kamen demonstrated that AREs destabilise mRNA by showing that inserting a 51-nucleotide AU-rich sequence from the 3’UTR of human granulocyte-macrophage colony-stimulating factor (GM-CSF) mRNA into the 3’UTR of the relatively stable β-globin mRNA targeted that mRNA for degradation (Shaw et al., 1986). Computational analysis of 3’UTR sequences has shown that almost 8% of human mRNAs contain AREs, suggesting that AREs may be responsible for the degradation of a large number of unstable mRNAs (Bakheet et al., 2001; Bakheet et al., 2003).
AREs are generally 50 to 150 bases long and have been classified into three groups based on their sequence features and decay characteristics (Chen et al., 1995; Wilson et al., 1999b). Class I AREs contain one to three copies of the pentanucleotide AUUUA within a U-rich region, as found in c-fos and c-myc mRNAs. Decay of mRNAs with class I AREs begins with synchronous deadenylation, yielding mRNAs with 30 to 60 nucleotide poly (A) tails. Class II AREs are found only in cytokine mRNAs and consist of multiple overlapping copies of the nonamer UUAUUUA(U/A)(U/A) (Bakheet et al., 2001; Bakheet et al., 2003). Their decay occurs asynchronously, giving rise to fully deadenylated intermediates. Class III AREs, in contrast, lack the classical AUUUA motif but contain a U-rich sequence (Chen et al., 1994; Peng et al., 1996; Xu et al., 2001). Deadenylation of this class proceeds in the same manner as for class I. To date, no true consensus sequence has been identified for any of the ARE classes, and the classification is not based on biological function (Barreau et al., 2006).
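The sequence features used to distinguish the ARE classes can be made concrete with a simple motif counter. This is a minimal sketch, not the classification procedure used by Chen et al. or Bakheet et al.; since no true consensus exists, counts of AUUUA pentamers and UUAUUUA(U/A)(U/A) nonamers are only a rough first pass. The function names and the toy sequence are hypothetical.

```python
import re

# Motifs as described in the text; the (U/A) positions become [UA] character classes.
PENTAMER = "AUUUA"
NONAMER = "UUAUUUA[UA][UA]"


def count_overlapping(seq: str, pattern: str) -> int:
    """Count matches of a regex pattern, allowing matches to share bases."""
    # A zero-width lookahead makes re.finditer report every start position,
    # so overlapping copies (as in class II AREs) are all counted.
    return sum(1 for _ in re.finditer(f"(?={pattern})", seq))


def describe_are(seq: str) -> dict:
    """Report AUUUA and nonamer counts plus U content for a candidate 3'UTR region."""
    rna = seq.upper().replace("T", "U")  # accept DNA-style input
    return {
        "AUUUA": count_overlapping(rna, PENTAMER),
        "nonamer": count_overlapping(rna, NONAMER),
        "U_fraction": rna.count("U") / len(rna) if rna else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any real mRNA
    print(describe_are("UUAUUUAUUUAUUUAUU"))
```

For the toy sequence the counter reports three AUUUA pentamers and three overlapping nonamers, the overlapping-nonamer pattern that the text describes for class II AREs. A plain string search like this ignores RNA structure and flanking context, which is one reason sequence alone has not yielded a reliable consensus.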
AREs function through their recognition by specific RNA-binding proteins (“trans-acting factors”). A wide range of ARE-associated proteins has been found, but the physiological significance of these RNA-protein interactions and their precise role in ARE-mediated mRNA decay remain to be elucidated. AUF1, a 37-45 kDa protein also known as ARE/poly (U) binding degradation factor or hnRNP D, binds AREs and is thought to recruit nucleases, leading to degradation of the message (Chen et al., 2001). In contrast, HuR, a 32 kDa member of the embryonic lethal abnormal vision (ELAV) class of RNA-binding proteins, has been shown to bind specifically to the ARE sequences of labile cytokine and growth factor mRNAs, leading to their stabilization (Zhao et al., 2000). HuR is believed to compete with AUF1 for its binding site, thus preventing the recruitment of nucleases and degradation of the message. Although AUF1 enhances degradation, it has also been reported to increase the mRNA stability of several genes, and this effect may be cell type-specific (Zhao et al., 2000).
Despite extensive research, little is known about the mechanism by which ARE-binding proteins bring about mRNA degradation. However, in vitro studies have revealed that degradation is carried out by the mammalian exosome in the 3’ to 5’ direction (Chen et al., 2001; Mukherjee et al., 2002; Wang et al., 2001b). It has been suggested that AREs interact specifically with the exosome via ARE-binding proteins, causing shortening of the poly (A) tail followed by rapid decay of the mRNA body (Gherzi et al., 2004). One such protein is KSRP (K homology splicing regulatory protein), which has been shown to interact with the exosome both in vitro and in vivo, leading to mRNA degradation via the ARE-mediated decay pathway (Figure 1.3) (Chen et al., 2001; Shim et al., 2002).
1.5.5 Non-ARE cis-elements and their binding proteins
In addition to the classical ARE elements, non-classical ARE elements have been identified
in several genes. Examples of these include the stem-loop destabilizing element (SLDE) in
the 3’UTR of GM-CSF mRNA and also the recently identified cis element which
influences TNFα mRNA degradation through an ARE independent pathway (Brown et al.,
1996). Another distinct destabilizing element contains a repeated GUUUG motif within a
non-AUUUA AU-containing sequence of the c-jun 3’UTR (Peng et al., 1996).
A commonly found non-ARE element in the 3’UTR of multiple mRNAs is the poly (C) motif. This C-rich element has the consensus sequence CCUCC or CCCUCCC and is targeted by poly (C) binding proteins such as αCP and hnRNP K (Wang et al., 1995). Binding of poly (C) binding proteins to poly (C) motifs in the 3’UTR has been shown to affect both mRNA stability and translation rate (Kong et al., 2003; Makeyev et al., 2002). C-rich elements are found in several mRNAs, including those of tyrosine hydroxylase, erythropoietin, lipoxygenase and α globin, as well as in several viral RNAs (Makeyev et al., 2002). αCP has been identified as a component of a ribonucleoprotein complex that assembles on the poly (C)-rich regions of these mRNAs, altering mRNA stability or regulating the translation rate (Makeyev et al., 2002). These interactions have been shown to play an important role in many biological processes, and a better understanding of their nature will provide valuable insight into how they affect post-transcriptional gene regulation.
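As a minimal sketch of how candidate poly (C) binding sites might be located in a 3’UTR, the snippet below reports every start position of the two consensus sequences named above. The function names and the toy fragment are hypothetical, and a plain string match is only a first pass: it ignores RNA secondary structure, which matters for sites such as the structured viral C-rich elements discussed later.

```python
import re

# C-rich consensus motifs named in the text (Wang et al., 1995)
POLY_C_MOTIFS = ("CCUCC", "CCCUCCC")


def find_motif(seq: str, motif: str) -> list:
    """Return 0-based start positions of every occurrence of motif, overlaps included."""
    rna = seq.upper().replace("T", "U")  # accept DNA-style input
    # Lookahead lets overlapping occurrences (e.g. in CCUCCUCC) both be found.
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", rna)]


def scan_poly_c(seq: str) -> dict:
    """Map each consensus motif to its start positions in seq."""
    return {motif: find_motif(seq, motif) for motif in POLY_C_MOTIFS}


if __name__ == "__main__":
    # Hypothetical toy fragment; CCCUCCC contains CCUCC, so both are reported.
    print(scan_poly_c("AACCCUCCCAA"))
```

Because CCCUCCC contains CCUCC, a single site can be reported under both motifs; overlapping reports of this kind are expected rather than an error.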
1.6 Role of RNA-protein interactions
RNA-protein interactions play an essential role in the regulation of gene expression, and disruption of RNA-protein complexes has in many instances resulted in disease (Faustino et al., 2003). These interactions are mediated by one or more protein domains, through the recognition of either specific sequences or higher-order RNA structures (Kumar et al., 1990; Messias et al., 2004). Primary sequence analysis of RNA-binding proteins has led to the identification of a number of RNA-binding domains, including the RNA recognition motif (RRM), the K homology (KH) domain and the double-stranded RNA-binding domain. Understanding these RNA-protein complexes at the molecular level is a major aim of structural biology. In particular, the goal is to provide insight into RNA recognition and the formation of the multiple protein/RNA complexes that underlie key cellular processes, from posttranscriptional regulation to protein synthesis (Cusack, 1999; Messias et al., 2004).
Several RNA-binding protein motifs have been structurally elucidated in the absence of an RNA ligand. Such structures provide critical information on potential RNA-binding surfaces and also reveal flexible regions that may change shape upon RNA binding. A number of well-known RNA-binding motifs recur across many proteins (Keenan et al., 1998; Liu et al., 1997), and some of these are described below.
1.6.1 RNA-Binding motifs
RNA-binding motifs, often present in tandem along the amino acid sequence of a protein, mediate RNA interaction and are involved in both the transcriptional and posttranscriptional regulation of gene expression. The four most common RNA-binding motifs or domains are the RNA recognition motif (RRM, also called the RNP or ribonucleoprotein motif), the arginine-rich motif (ARM), the double-stranded RNA-binding domain (dsRBD) and the KH (hnRNP K homology) domain.
1.6.2 The RRM (RNA Recognition Motif) motif
The RNA recognition motif (RRM) is the most widely found and best studied RNA-binding motif (Maris et al., 2005). It is also referred to as the ribonucleoprotein (RNP) motif, the RNP consensus sequence or the RNA-binding domain. The RRM is involved in different aspects of gene regulation; for example, PABP contains four RRM domains and binds to the poly (A) tails of eukaryotic mRNAs in the cytoplasm. This interaction is essential not only for the stability of the mRNA but also for the initiation of translation through a number of molecular interactions (Messias et al., 2004). The RNP motif is found in a variety of other RNA-binding proteins, including the Sex-lethal protein, HuR and U1A, a protein specific to the U1 small nuclear ribonucleoprotein (Maris et al., 2005).
Several structures of isolated RRM domains from various proteins have been elucidated. The RRM typically consists of four β-strands and two α-helices arranged in a βαββαβ topology (Burd et al., 1994; Park et al., 2000). The fold is composed of one four-stranded antiparallel β-sheet, with the strands spatially arranged in the order β4β1β3β2 from left to right when facing the sheet, and two α-helices (α1 and α2) packed against the β-sheet (Figure 1.5). The hydrophobic core of the domain contains most of the conserved residues; the exceptions are four conserved residues involved in RNA binding, which lie in the sequences known as RNP1 and RNP2 (Figure 1.6). These sequences are located on the two central strands of the β-sheet, β3 and β1, and are highly conserved among RRM domains (Park et al., 2000; Wang et al., 2001a).
Figure 1.5: Cartoon representation of the NMR structure of the RRM domain of the SF2 protein (PDB code 1X4A), generated using VMD. SF2 is an RNA-binding protein with an activity important for pre-mRNA splicing in vitro (Cáceres et al., 1993). SF2 contains several RNA-binding motifs, including an RRM domain. The NMR structure of the SF2 RRM domain shows the motif consisting of four β-strands (magenta) and two α-helices (green) arranged in a βαββαβ topology. The structure is yet to be published.
Figure 1.6: Sequence alignment of a selection of RRM domains for which the structure has been solved (PDB codes are indicated in brackets). The alignment was generated using the programs ClustalX and ESPript. The conserved RNP1 and RNP2 sequences are highlighted in blue boxes.
Since the first structure of an RRM in complex with RNA, that of U1A, was determined (Maris et al., 2005), several other complex structures have been solved by either NMR or X-ray crystallography (Figure 1.7). Analysis of the RRM-RNA interface in these complexes has revealed a common interaction code involving four conserved protein side chains, located in RNP1 and RNP2, and two nucleotides. The two bases of the dinucleotide are stacked on conserved aromatic rings, the two sugar moieties contact a hydrophobic side chain, and a positively charged side chain neutralizes the phosphodiester group. This small set of RRM-nucleic acid interactions illustrates how well the RRM is adapted to binding single-stranded nucleic acids of any sequence (Maris et al., 2005).
Figure 1.7: Structure of HuD RRM domains 1 and 2 in complex with c-fos-11 RNA (PDB code: 1FXL) (Sungmin et al., 2003), generated using VMD. Hu proteins bind to AREs in the 3'UTRs of many short-lived mRNAs, thereby stabilizing them. In this cartoon diagram of the HuD1,2 c-fos-11 complex, the RNA is shown as a pink ribbon and the RRM domains are represented by green sheets and yellow helices. The RRM fold is composed of one four-stranded antiparallel β-sheet, with the strands arranged in the order β4β1β3β2 from left to right when facing the sheet, and two α-helices (α1 and α2) packed against the β-sheet. The hydrophobic core of the domain contains most of the conserved residues, except for four conserved residues involved in RNA binding. These residues lie in the sequences known as RNP1 and RNP2, which are positioned on the central β-strands β3 and β1.
1.6.3 The Arginine rich motif (ARM)
Another motif that mediates RNA interactions, found in viral, bacteriophage and ribosomal proteins, is the arginine-rich motif (ARM), which consists of short arginine-rich sequences. Notable examples of RNA-binding proteins with an arginine-rich motif are Rev and Tat. Rev is a regulatory RNA-binding protein that mediates the export of unspliced HIV pre-mRNAs from the nucleus, while Tat plays a role in HIV transcription (Calnan et al., 1991). Unbound ARMs are generally unfolded and can adopt a variety of conformations upon RNA binding (Figure 1.8 A, B), often with a concomitant change in RNA structure. In the case of HIV, Tat remains in an extended conformation when bound to HIV TAR RNA but causes a large conformational change in the RNA, inducing stacking between the two helical stems and formation of a U–A:U base triple (Calabro et al., 2005). The arginines in the ARM facilitate two distinct interactions. First, their positive charge increases non-specific affinity for the RNA phosphate backbone. Second, they form specific hydrogen-bonding networks with the RNA bases (Calnan et al., 1991).
Figure 1.8 (A): NMR structure of the Jembrana disease virus (JDV) Tat arginine-rich motif (ARM) in complex with bovine immunodeficiency virus (BIV) TAR RNA (PDB code: 1ZBN) (Calabro et al., 2005). The peptide (tube representation), which is unstructured in the absence of RNA, inserts deeply into the RNA major groove (orange, line representation of the RNA) and adopts a β-ribbon-like conformation, with two antiparallel strands, residues 71–73 (pink) and 77–79 (green), linked by a sharp turn formed by residues 74–76 (blue). The figure was generated using the program VMD. (B) Comparison of Tat ARM domains and TAR RNAs. The HIV-1, BIV and JDV Tat ARM domains are aligned based on homology between the N-terminal activation domains (partially shown, with conserved residues shaded). Analogous residues in the BIV and JDV Tat ARMs are shown in bold.
1.6.4 Double-stranded RNA-binding domain
Another common RNA-binding motif is the double-stranded RNA-binding domain (dsRBD). dsRBDs bind double-stranded RNA but not dsDNA or DNA-RNA hybrids. The dsRBD is found in proteins such as Xenopus rbpa, Drosophila Staufen, RNase III, the protein kinase PKR and the ADAR family of adenosine deaminases, which have diverse functions in transcription, RNA processing, mRNA localization and translation (Kim et al., 2006). Structures of several dsRBDs reveal that the domain forms a compact fold with an α–β–β–β–α topology, in which two α-helices are packed against the same face of a three-stranded antiparallel β-sheet (Figure 1.9A) (Bycroft et al., 1995; Kharrat et al., 1995). Furthermore, the molecular basis of RNA recognition has been revealed in several structures of single dsRBD–dsRNA complexes (Ramos et al., 2000; Ryter et al., 1998; Wu et al., 2004). A single dsRBD recognizes two consecutive minor grooves and the intervening major groove on one face of the RNA helix. The first α-helix (α1) of the dsRBD interacts with a minor groove of the RNA helix or of a UUCG or AGNN tetraloop, the loop between β1 and β2 interacts with the successive minor groove, and the β3–α2 loop interacts with the intervening major groove (Figure 1.9B). These contacts mainly involve the 2′-hydroxyl groups of the ribose sugars, providing some insight into the preference of dsRBDs for RNA rather than DNA (Kim et al., 2006).
Figure 1.9: Solution structure of the Rnt1p dsRBD complexed to the 5' terminal hairpin of one of its small nucleolar RNA substrates, the snR47 precursor (PDB code: 1T4L). Rnt1p, a member of the RNase III family of dsRNA endonucleases, is a key component of the Saccharomyces cerevisiae RNA-processing machinery (Wu et al., 2004). The Rnt1p dsRBD has been implicated in targeting this endonuclease to its RNA substrates by recognizing hairpins closed by AGNN tetraloops. (A) Cartoon representation of the Rnt1p dsRBD, which forms a compact protein domain with an α–β–β–β–α topology in which two α-helices (red) are packed against the same face of a three-stranded antiparallel β-sheet (gold). (B) Cartoon representation of the Rnt1p dsRBD complexed to the 5' terminal hairpin of the snR47 precursor; the RNA is shown as blue and cyan ribbons. The dsRBD contacts the RNA at successive minor, major and tetraloop minor grooves on one face of the helix. Helix α1 is positioned in the minor groove of the RNA tetraloop, the loop between β1 and β2 also interacts with a minor groove, and the β3–α2 loop interacts with the intervening major groove. The figure was generated using the program VMD.
1.6.5 KH Motif
The K homology (KH) domain is another type of eukaryotic RNA-binding domain. The KH motif is found in many proteins that associate closely with RNA (Musco et al., 1996). It was originally detected in hnRNP K (Dejgaard et al., 1994; Matunis et al., 1992) and subsequently identified in a variety of nucleic acid-binding proteins from eukaryotes, eubacteria and archaea (Dejgaard et al., 1994; Gibson et al., 1993; Siomi et al., 1993). KH domain proteins whose biological functions and RNA targets have been identified include the Nova proteins, associated with the regulation of pre-mRNA splicing; zipcode-binding protein 1, implicated in mRNA subcellular localization; the fragile X mental retardation syndrome protein (FMRP), implicated in translational regulation; and the poly (C) binding proteins (also known as αCP and hnRNP E proteins) and hnRNP K, implicated in mRNA stabilization and translation (Kiledjian et al., 1995; Lewis HA, 1999; Musunuru et al., 2004). KH domains are also found in a number of other RNA-binding proteins, including vigilin, the transcription activator FBP and the bacterial protein NusA (Worbs et al., 2001).
The number and arrangement of KH domains vary between proteins, which can contain from one to as many as 14 KH domains. The STAR (signal transduction and activation of RNA) family of RNA-binding proteins contains a single KH domain; this family includes the cell signaling protein Sam68 (Lukong et al., 2003). FMRP contains two closely spaced KH domains. The three-KH-domain family includes hnRNP K (Siomi et al., 1993), αCP1 (PCBP-1 and hnRNP-E1), αCP2, αCP3 and αCP4 (Leffers et al., 1995), as well as Nova-1 and Nova-2, the latter being involved in neuronal RNA metabolism (Buckanovich et al., 1997). In this group of proteins, the first two KH domains are located close together at the N-terminus and are linked by a variable segment to the third domain at the C-terminus (Figure 1.10) (Makeyev et al., 2000). The transcription activator FBP contains four regularly spaced KH domains. The final group, which contains 14 closely arranged KH domains, includes the lipoprotein-binding protein vigilin (Duncan et al., 1994; McKnight et al., 1992; Musunuru et al., 2004).
Figure 1.10: Schematic representation of αCP1 and its KH domain arrangement. Similar domain arrangements are observed in other members of the PCBP family. The numbers indicate the amino acid boundaries of the KH domains cloned in this thesis. This naming system for the full-length protein and the KH domains is used throughout the rest of the thesis.
The focus of the current study is the KH domain. This domain is discussed further below, especially in the context of the αCP proteins.
1.7 αCP proteins
The α globin poly (C) binding proteins (αCPs, also known as hnRNP E or PCBP) belong to an abundant and widely expressed family of RNA-binding proteins. αCP and hnRNP K are the two major poly (C) binding proteins in the cell. The conservation of αCPs across species, their abundant expression and their presence in a large number of tissues suggest that they play a role in important cellular functions (Kong et al., 2003).
Five major αCP isoforms are present in human and mouse tissues: αCP1, αCP2, αCP3, αCP4 and a splice variant of αCP2, αCP2-KL (Chkheidze et al., 2003). The highest degree of homology is between αCP1 and αCP2, which share 89% amino acid sequence identity (Tommerup et al., 1996). αCP3 deviates significantly in amino acid sequence, while αCP4 has the most divergent sequence (Makeyev et al., 2000). Some αCP isoforms, including αCP1, αCP2 and αCP2-KL, are present in both the nucleus and the cytoplasm, whereas αCP3 and αCP4 are found only in the cytoplasm. Both αCP1 and αCP2 possess a nuclear localization signal (NLS I), located between the KH2 and KH3 segments, which is hypothesized to contribute to shuttling between the nucleus and the cytoplasm (Chkheidze et al., 2003). Furthermore, the presence of a second NLS (NLS II) within the KH3 segment of αCP2, which together with NLS I is crucial for its nuclear accumulation, suggests a possible relationship between the RNA-binding domain and sub-cellular transport. In one proposed mechanism, initial binding of αCP2 to its RNA target through KH3 in the nucleus blocks access to NLS II and hence promotes translocation to the cytoplasm, where the αCP2-RNA complex carries out its cytoplasmic function. Upon dissociation of the complex, NLS II would be exposed again and would direct the protein back to the nucleus (Chkheidze et al., 2003).
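Percent identity figures such as the 89% quoted for αCP1 and αCP2 are computed from a pairwise alignment. Conventions differ in the denominator (full alignment length, length of the shorter sequence, or gap-free columns), and the cited value does not state which was used. The sketch below assumes the alignment is already given and counts identities over gap-free columns only; the function name and the toy fragments are hypothetical, not real αCP sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes the two strings are already aligned (equal length, gaps marked by `gap`).
    Other conventions divide by the full alignment length or the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Hypothetical toy fragments, not real αCP sequences
    print(percent_identity("MKL-VE", "MRLAVE"))
```

Here four of the five gap-free columns match, giving 80%; dividing by the full alignment length of six columns would instead give about 67%, which illustrates why identity values should always be read together with the convention used.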
Each αCP isoform contains three KH domains. The KH domain is present in a number of RNA-binding proteins and can interact with four to five contiguous bases in a target RNA. RNA binding may be influenced by posttranslational modification, as phosphorylation of αCP1 and αCP2 has been shown to greatly reduce their RNA-binding activity (Makeyev et al., 2000). Isolated KH domains are capable of binding RNA sequences independently; however, in proteins containing several KH domains, the role of each domain in interacting with the target RNA is not yet fully understood.
The target site of αCPs is usually located in the 3’UTR of the mRNA. This site is a single-stranded C-rich motif, and binding of αCPs to it has been correlated with a number of transcriptional and posttranscriptional regulatory processes, including apoptotic and developmental processes (Du et al., 2004). αCPs have been shown to bind C-rich patches of several mRNAs, including those of α globin (Kiledjian et al., 1995), collagen-α1 (Lindquist et al., 2000), tyrosine hydroxylase and erythropoietin, affecting mRNA stability and/or translation (Czyzyk-Krzeska et al., 1999; Paulding et al., 1999).
The role of αCP in mRNA stabilization was first identified for α2 globin mRNA. The stability of globin mRNA is dictated by the formation of a complex at a C-rich sequence in the 3’UTR. This α complex consists of a number of proteins, including αCP1 and αCP2 (Ji et al., 2003). It is widely accepted that the presence of the α complex is a general determinant of high mRNA stability (Holcik et al., 1997), and several lines of experimental evidence support this hypothesis. Reduced mRNA levels were observed in vivo and in vitro when the ability of αCP to form the α complex was abolished by mutations in the 3’UTR. Further studies suggest that the α complex protects the poly (A) tail from rapid degradation (Holcik et al., 1997). A specific endoribonuclease that cleaves within the C-rich region where αCP binds has also been identified, and it has been suggested that αCP blocks this endoribonuclease site and hence protects the mRNA from degradation (Ji et al., 2003).
αCP not only influences mRNA stabilization but also plays a role in translational control (Waggoner et al., 2003a; Waggoner et al., 2003b). Binding of αCP and hnRNP K to the 3’UTR differentiation control element (DICE) of the lipoxygenase (LOX) mRNA appears to keep the RNA in a translationally silent state until the later stages of erythroid differentiation (Ostareck et al., 2001; Ostareck et al., 1997). Interestingly, αCP is also implicated in translational enhancement: it binds the 5’UTR cloverleaf structure and a stem-loop element of picornavirus RNA, influencing the efficiency of cap-independent translation (Parsley et al., 1997). In contrast to the 3’UTR binding sites of αCP in other mRNAs, these two viral C-rich sites lie within structured RNA. Additional studies implicate αCP in the function of a number of other viral RNAs (Blyn et al., 1997; Graff et al., 1998). In addition, αCP has recently been identified as a novel renin mRNA-binding protein that targets a cis-element in the 3'-UTR and regulates renin production (Adams et al., 2003).
Furthermore, αCPs have also been associated with the translational recruitment of inactive
mRNAs during early development of the Xenopus embryo, through regulated extension of the
poly (A) tail (Paillard et al., 2000). It is therefore evident that αCPs are involved in a
number of post-transcriptional regulatory pathways directly affecting mRNA stability,
modification and expression. The interactions of αCP proteins with target mRNAs appear
sequence specific, but the underlying mechanisms of these interactions have only been
partially elucidated.
The interaction of αCPs is not limited to RNA. αCPs are also capable of binding to
single-stranded DNA (ssDNA), and such interactions have been shown to have a regulatory
function in transcription. A recent study showed that αCP binds a polypyrimidine region in
the proximal promoter of the mouse μ-opioid receptor (MOR) gene, leading to transcriptional
activation (Kim et al., 2005). Specific binding of the closely related hnRNP K to a single-stranded
pyrimidine sequence in the promoter region activates transcription of the human c-myc
gene (Michelotti et al., 1996). In addition, Du et al. (2005) have shown high-affinity
binding of hnRNP K and αCP to a C-rich stretch of human telomeric DNA. The functional
importance of these interactions is not well understood. However, studies carried out to
date imply that proteins of the PCBP family may participate in the regulation of telomere
length and telomerase activity (Bandiera et al., 2003; Du et al., 2005; Lacroix et al.,
2000).
αCP and hnRNP K are structurally similar. Primary sequence analysis shows that the
corresponding KH domains of these two proteins are more closely related to each other than
are the different KH domains within the same protein. The conservation of KH domain number,
sequence and organization, together with the binding of both proteins to the DICE element of
the lipoxygenase (LOX) mRNA, suggests that these proteins share a similar mode of action.
However, the specific roles of αCPs in translational enhancement and in the stabilization of
specific mRNAs suggest that these proteins have distinct RNA-binding specificities (Makeyev et al., 2000;
Thisted et al., 2001).
Despite being the major poly (C) binding proteins in the cell, hnRNP K and αCP have
quite distinct optimal binding sites. αCP binds to the 3’UTR of α globin mRNA, leading to
its stabilization, whereas hnRNP K cannot form the α complex at this site (Chkheidze et al.,
1999). Binding by αCPs is not simply a matter of recognizing a poly (C) sequence, as it
has been shown that αCP-2KL binds to the 3’UTR of globin mRNA with higher affinity than
to poly (C) homoribopolymers (Wang et al., 1995). In addition, it has been experimentally
shown that mutations outside the binding site affect the formation of the α complex,
suggesting that not only the sequence but also the structural motifs present in the RNA
influence binding preference and affinity (Wang et al., 1995).
1.7.1 Protein-Protein interaction
αCP proteins are also involved in protein-protein interactions. αCP2 can form homodimers,
and in yeast two-hybrid experiments αCP2 was observed to interact with a number
of proteins including hnRNP L, hnRNP K and hnRNP I (Kim et al., 2000). The N-terminal
half of αCP2, including the two exons after the second KH domain, is required for both
homodimerization and interaction with these proteins (Makeyev et al., 2000). Further
studies of these interactions are required to determine their
functional significance.
The closely related hnRNP K protein can also dimerise and oligomerise, and it associates with a number of
proteins. It interacts with a number of signal transduction proteins including the Src family
tyrosine kinases, the proto-oncogene product Vav (Bustelo et al., 1995) and protein kinase C (Schullery
et al., 1999). These observations suggest a role for hnRNP K in cell signaling. Its
interaction with TATA-binding protein and with transcriptional repressors suggests a role
in transcriptional regulation (Michelotti et al., 1996). In addition, hnRNP K and αCP2 can
interact with each other and share a number of common binding partners including Y-box
binding protein, splicing factor 9G8 and hnRNP L (Kim et al., 2000; Shnyreva et al.,
2000). The basis of such interactions is not well understood; however, interaction of cell
signaling proteins and transcriptional repressors with hnRNP K appears to occur through a
proline-rich domain denoted KI. In contrast, interaction with αCP2 is mediated through the
N-terminal KH domain of hnRNP K (Makeyev et al., 2002). More recently, it has also
been shown that hnRNP K interacts with the neuron-specific RNA-binding protein HuB
through its RGG box, which is located between the second and third KH domains (Yano et
al., 2005).
In addition, the isolated third KH domain of the Nova-1 protein homodimerises in solution, in
the absence of RNA and without the involvement of other parts of the full-length protein
(Ramos et al., 2002). The residues involved in the dimer interface are conserved between
KH1 and KH3 of Nova-1. As a result, in vivo protein interactions may occur
through both KH1 and KH3, which could cooperatively increase the dimerisation affinity (Ramos
et al., 2002). Such homo- and hetero-dimerisation of KH domains may prove very important
to the functioning of αCP proteins. Interestingly, this mechanism is not novel, as it has also
been described for DNA-binding proteins (Ramos et al., 2003).
1.7.2 αCP KH motif-synergy
The availability of multiple KH domains in αCP proteins raises the question of which
domain, or combination of domains, is responsible for binding. In vitro experiments show
that single KH domains can bind, but whether each KH domain participates in binding within
the full-length protein is not yet known (Makeyev et al., 2002). Previous
studies of the closely related hnRNP K observed minimal binding of the first and/or second
KH domains to a poly(CT) DNA sequence, while the isolated third KH
domain exhibited a lower affinity than the protein as a whole (Braddock et al.,
2002a). On the other hand, Ito et al. (1994) reported that the binding of hnRNP K
to dC-rich oligonucleotides occurs through the KH3 domain. In contrast, Siomi et al. (1994)
proposed that all three KH domains are capable of binding under stringent
conditions (1 M salt). Moreover, SELEX studies revealed that the RNA target that bound
hnRNP K consisted of a single 6-7 nt long C-rich box, suggesting that only one of the three
KH domains participated in RNA binding (Makeyev et al., 2002). Recent studies
have shown cooperative binding of hnRNP K KH domains to mRNA targets (Paziewska et
al., 2004). Using the yeast three-hybrid system, Paziewska et al. (2005) showed
that the three KH domains bind synergistically and that a single KH domain binds
RNA weakly compared with full-length hnRNP K.
For αCP1 and αCP2, filter binding assays showed that αCP1-KH1 and KH3 are capable of
binding with high affinity and specificity to a poly (rC) homopolymer, while αCP1-KH2 did
not exhibit such activity (Dejgaard et al., 1996). However, the SELEX study of hnRNP K
mentioned above also showed that the RNA target
identified for αCP-2KL contained three C-rich patches, suggesting a three-pronged interaction
between this protein and its RNA target (Makeyev et al., 2002).
Figure 1.11: Sequence alignment of the KH domains from the known PCBP (αCP) proteins, PCBP1-4 and hnRNP K. Alignments were carried out using the programs ClustalX and ESPript. The sequence shown for the domains corresponds to residues 11-82 in the full-length protein. Secondary structures were based on the crystal structure of the PCBP1 KH1 (PDB code: 2AXY).
In another study aimed at identifying the role of the KH domains in αCP binding to the
cloverleaf and stem-loop IV structures in the 5’UTR of poliovirus, purified recombinant
KH1, KH2 and KH3 domains of both αCP1 and αCP2 were used in binding reactions with
radiolabeled RNA probes (Dejgaard et al., 1996). The results showed that the
corresponding domains from αCP1 and αCP2 interact with the RNA probes in a similar
fashion, and that the domains behaved as described previously. Although both the KH1 and
KH3 domains bind to poly (rC) homopolymers, only KH1 was capable of specifically
interacting with the poliovirus RNA structures in RNA electrophoretic mobility shift assays
(REMSA). In addition, mutation of KH1 within PCBP2 led to the greatest alteration in RNA
binding. These data indicate that the KH1 domain is the major RNA-binding determinant
for recognition of poliovirus-specific RNA targets by PCBPs. However, the KH2 and KH3
domains must also play an essential part in these interactions, because mutations of the
highly conserved tetrapeptide motif (Gly-X-X-Gly) in these domains, which has been shown to
directly contact the RNA target, have a profound effect on binding by the full-length protein
(Musco et al., 1997; Musco et al., 1996). It was initially thought that
mutations in the KH2 and KH3 domains of full-length PCBP cause a structural change in
the protein or promote misfolding of the protein in E. coli. However, this appears very
unlikely, as comparable expression and solubility levels were obtained for the mutant and
wild-type recombinant αCP2 proteins. Moreover, the mutation was made in the flexible
loop of the KH domain, and sequence alterations at this site would not be expected
to change the overall structure of the protein. It is not known how KH2 and KH3 stabilize
the interaction of αCP2 with the viral RNA. REMSA experiments also indicate that all three
KH domains must be linked within a single polypeptide for optimal affinity for the RNA
(Silvera et al., 1999).
A more recent study revealed not only differences in the RNA-binding activities of the αCP2 KH
domains but also distinct functions in poliovirus translation and RNA replication (Walter et
al., 2002). The integrity of the first KH domain of αCP2 was shown to be absolutely
essential for translation initiation on the poliovirus (PV) internal ribosome entry site (IRES)
element and for replication of PV RNA, consistent with previously published data (Silvera
et al., 1999; Walter et al., 2002). In contrast, an intact second KH domain of αCP2
was not required for translation initiation on the PV IRES element or for PV RNA
replication. An intact third KH domain was shown to mediate
efficient translation initiation on the PV IRES element, but was not essential for replication of
PV RNA (Walter et al., 2002). Taken together, these studies suggest distinct roles for the KH
domains of αCP2 in PV translation and RNA replication.
Based on these observations, it has been suggested that KH domains may collaborate and
bind cooperatively within the full-length protein (Makeyev et al., 2002). The spatial
arrangement of such multiple interactions awaits a detailed three-dimensional structure of
the full-length protein bound to its target RNA. To fully appreciate the
contribution of each domain within the whole protein, both structural and binding studies
of the intact protein are required.
Despite the absence of a three-dimensional structure of the full-length protein, a
number of structures are available for isolated KH domains, both in the absence and in the
presence of oligonucleotide (Braddock et al., 2002b; Du et al., 2005; Musco et al., 1996).
These structures have provided considerable insight into the determinants of KH domain
oligonucleotide binding strength and specificity, such as hydrogen-bonding interactions and
the insertion of bases into hydrophobic pockets on the protein surface.
1.7.3 KH Structure
The first structure of a KH domain was solved by Musco and coworkers (Musco et al.,
1996). Using NMR, they showed that KH domain 6 of vigilin consists of a three-stranded
antiparallel β-sheet packed against three α-helices. The KH domain was originally thought
to comprise 45-55 amino acid residues but, based on subsequent structural studies, the
domain boundaries were redefined to span 68-72 amino acids. This extension was
found to be essential for the structural stability of the domain (Musco et al., 1996).
NMR and crystallographic studies revealed that the 45 amino acid motif corresponds to a
βααβ core, and more extensive structural studies led to the identification of two types of KH
domains (Grishin, 2001). The type I KH domain contains a C-terminal βα extension (e.g. KH3
of αCP1), while type II includes an N-terminal αβ extension (e.g. ribosomal protein S3). The
three KH modules of αCPs are predicted by sequence alignment to belong to the type I KH
family, with a βααββα topology (Figure 1.12). The structures of several
independent KH motifs all consist of a similar three-stranded antiparallel β-sheet packed
against three α-helices. A number of conserved hydrophobic residues are interspersed
throughout the domain, some of which extend their side chains from α-helix 2 to form
contacts with the inner face of the β-sheet, presenting a hydrophobic environment for
oligonucleotide binding (Jensen et al., 2000). Furthermore, all KH modules contain two
distinct loops, known as the GXXG loop and the variable loop. These loops play a
significant role in the recognition and binding specificity of nucleic acids.
Figure 1.12: The crystal structure of αCP2-KH1 (residues 11–82) solved to 1.7 Å resolution depicted in cartoon form (Du et al., 2005). The structure is shown from the beginning of β-strand 1 to the end of α-helix 3. The GXXG motif is colored green. The ‘variable loop’ region between β-strands 2 and 3 is colored red. These regions flank the hydrophobic oligonucleotide-binding cleft that accommodates C-rich RNA and ssDNA. The figure was generated using VMD.
1.7.4 KH and oligonucleotide interaction
A number of physiological oligonucleotide targets are known for KH domains, and these
domains can bind a range of target structures. Some currently recognized oligonucleotide
targets range in size from 7 to 75 nucleotides, but each
individual domain recognizes a core of only four bases. Individual KH domains bind with
affinities in the micromolar range, indicative of weak binding, whereas dissociation constants for
the full-length protein range from 10⁻⁶ to 10⁻⁹ M (Backe et al., 2005; Makeyev et al.,
2002; Paziewska et al., 2004).
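To put these affinity ranges in energetic terms, the following Python sketch converts a dissociation constant into a standard binding free energy, ΔG° = RT ln K_d (with K_d in mol/L relative to a 1 M standard state), and estimates the fraction of oligonucleotide bound under a simple 1:1 model in which the protein is in large excess, θ = [P]/(K_d + [P]). The temperature of 25 °C, the 100 nM protein concentration and the example K_d values are illustrative assumptions chosen to span the ranges quoted above; they are not measurements from this work.

```python
import math

R = 8.314    # gas constant, J mol^-1 K^-1
T = 298.15   # assumed temperature, K (25 degrees C)


def delta_g_kj(kd_molar: float, temp_k: float = T) -> float:
    """Standard binding free energy in kJ/mol, dG = RT ln(Kd), 1 M standard state."""
    return R * temp_k * math.log(kd_molar) / 1000.0


def fraction_bound(protein_molar: float, kd_molar: float) -> float:
    """Fraction of oligonucleotide bound in a 1:1 model, assuming the protein is
    in large excess so that free [P] is approximately total [P]."""
    return protein_molar / (kd_molar + protein_molar)


# Illustrative (assumed) Kd values spanning the ranges quoted in the text
examples = [("single KH domain", 1e-5),
            ("full-length, weak", 1e-6),
            ("full-length, tight", 1e-9)]

for label, kd in examples:
    print(f"{label:20s} Kd={kd:.0e} M  dG={delta_g_kj(kd):6.1f} kJ/mol  "
          f"bound at 100 nM protein={fraction_bound(1e-7, kd):.3f}")
```

Under these assumptions, each 1000-fold decrease in K_d corresponds to roughly 17 kJ/mol of additional binding free energy (RT ln 1000 ≈ 17.1 kJ/mol at 25 °C), which is the size of the gap between the weakest and tightest full-length affinities quoted above.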
A number of structures of KH domains in complex with RNA or ssDNA have been determined by
NMR and X-ray crystallography. These include Nova-2-KH3/RNA, hnRNP K-KH3/DNA,
αCP2-KH1/DNA (Figures 1.12 and 1.13) and FBP-KH3 and KH4/ssDNA
(Braddock et al., 2002b; Du et al., 2005). Comparison of these structures led the authors to
several conclusions. Each KH domain recognized a core motif of four
nucleotides: 5’-UCAY-3’ (RNA) for Nova-2-KH3, 5’-ACCC-3’ (DNA) for αCP2 KH1,
5’-TTTT-3’ (DNA) for FBP KH3, 5’-ATTC-3’ for FBP-KH4 and
5’-(T/C)CCC-3’ for hnRNP K-KH3. In each of these structures only pyrimidines were found
at the first and fourth positions. Furthermore, the first and fourth positions were not
involved in highly specific interactions. Positions two and three of the core motif were
involved in a number of base-specific interactions, and these interactions were mostly
conserved. Upon oligonucleotide binding, there were no large conformational changes
except in the flexible regions of the molecule. The structures agree on the location of
the oligonucleotide-binding cleft. The oligonucleotide lies in a narrow groove between the
invariant Gly-X-X-Gly motif and the variable loop, which accommodates
pyrimidines more readily than purines owing to their smaller size (Figure 1.13). The core
nucleotides, together with a number of water molecules, participate in a dense network of
hydrogen bonds, hydrophobic interactions and stacking interactions (Backe et al., 2005; Du
et al., 2005; Jensen et al., 2000).
Figure 1.13: Crystal structure of the αCP2 KH1-human telomeric (ht) DNA complex (Du et al., 2005). The KH domain is shown in cartoon representation in tan. The htDNA is shown using a sticks representation colored by element. Secondary structure elements of the KH domain are labeled. The GXXG motif is colored red. The ‘variable loop’ region between β-strands 2 and 3 is colored pink. These regions flank the hydrophobic oligonucleotide-binding cleft that accommodates C-rich RNA or ssDNA. The figure was generated using VMD.
These observations have shed light on the binding mode of KH domains and
their targets. Structural studies of KH domains in the presence of oligonucleotide have
revealed a number of critical residues involved in specific interactions. For example, in the
hnRNP K-KH3/DNA complex, specific recognition of the core DNA tetranucleotide is
achieved by a number of hydrogen bonds involving residues Ile29, Ile36, Ile49 and Arg59
(for example, the Arg59 NH1 and NH2 atoms hydrogen bond to Cyt3 N3 and O2). Electrostatic
interactions with the DNA backbone involve the backbone amide of Gly32 and the
side chains of Lys31, Lys37 and Arg40 (Backe et al., 2005). A number of corresponding
residues also contact the DNA in the PCBP2-KH1/DNA complex,
including Ile29, Lys31, Lys32, Val36, Lys37, Arg40, Ile49 and
Arg57 (Du et al., 2005). Based on the structural data on key residues involved in
oligonucleotide recognition, together with sequence alignments, all KH1 and KH3
domains are expected to bind poly (C) sequences in a similar manner. In addition, the
KH2 domain is expected to specifically recognize at least two cytosines at the
second and third positions of the core sequence.
RNA and DNA recognition by KH domains is very similar. For example, a comparison of
the complex structures of Nova-2-KH3 with RNA and PCBP2 KH1 with DNA reveals a
number of similarities. Although their sequence specificities for nucleic acid recognition
differ, the overall structures are similar and both use a common binding groove
(Figure 1.14A, B, C). Moreover, the four core recognition bases adopt similar conformations
and occupy the same location with very similar orientations (Du et al., 2005).
Figure 1.14: Analysis of KH domains with their target RNA or DNA. (A) Structure-based sequence alignment of the KH domains of αCP2-KH1, Nova-2-KH3, hnRNP K-KH3 and FBP-KH3 and KH4. Conserved residues are colored and the GXXG and variable loop contacting oligonucleotide are indicated. (B) Backbone superposition of the KH domains from the same proteins. (C) The α-carbon deviation for each KH domain residue from the corresponding aligned residue of αCP2-KH1 is plotted versus amino acid residue number. The Cα root mean square deviations (r.m.s.d.) show that these structures are very similar except in the variable region. The figures were generated using ESPript and VMD.
1.7.5 KH domain-containing proteins and disease
There is substantial genetic evidence, from various species, supporting a physiological role
for the KH domain. For example, in humans, gene lesions that interfere with the expression
of the KH protein FMR1 (fragile X mental retardation 1) lead to fragile X mental
retardation syndrome (Di Fruscio et al., 1998; Pieretti et al., 1991; Verkerk et al., 1991).
The clinical significance of the KH domain was illustrated by a point mutation that changes
the conserved isoleucine 304 to asparagine in the second KH domain of FMR1 (De
Boulle et al., 1993). This point mutation alters the structure of the KH domain
(Musco et al., 1996) and impairs RNA-binding activity (Siomi et al., 1994). In C. elegans,
alteration of glycine 227 within the KH domain of GLD-1, a cytoplasmic protein required
for germ cell differentiation (Francis et al., 1995a; Francis et al., 1995b; Jones et al., 1996),
leads to a recessive tumorous germ line phenotype (Jones et al., 1995).
Interestingly, this conserved glycine forms part of the RNA-binding surface (Musco et al.,
1996). Mutation of the corresponding residue also abolishes RNA binding in Sam68,
an RNA-binding protein that associates with c-Src in mitosis (Chen et al., 1997;
Fumagalli et al., 1994). In mice, oligodendrocyte differentiation and the subsequent formation
of myelin require the Quaking gene. Quaking encodes Qk1, a member of the highly
conserved STAR/GSG family of RNA-binding proteins. Qk1 has been implicated in the
regulation of alternative splicing, stability and translation of mRNAs that code for
myelin structural components in glial cells. In mice, mutation of the Quaking gene severely
impairs myelination, and as a consequence the mice develop a rapid tremor at postnatal day
10 (Ryder et al., 2004; Sidman et al., 1964). A missense mutation within the GSG domain
of Qk1, which encompasses the KH domain, is embryonic lethal. This point mutation has been
observed to hinder homodimerization, which may account for the lethality observed in mice (Chen et
al., 1998b; Ebersole et al., 1996). The Drosophila protein Bicaudal C (Bic-C) contains five KH
domains, and gene lesions that truncate Bic-C, or a point mutation that replaces
glycine 295 with arginine in the third KH domain, result in defects in RNA binding and
oogenesis (Mahone et al., 1995; Saffman et al., 1998).
As RNA binding and recognition by KH domains emerge as important factors in
human disease, it is important to understand the underlying mechanism of such
interactions. I have therefore taken a structural approach to understanding these interactions
in the context of a human disease.
1.8 Androgen receptor and prostate cancer
A disease in which mRNA stability plays a role is prostate cancer. Prostate cancer is a
leading cause of male cancer mortality in Western societies. The prostate gland,
approximately the size of a walnut and weighing about 20 grams, is located immediately
below the bladder, surrounding the urethra. The normal prostate gland is highly androgen
dependent for growth and morphogenesis (Garnick et al., 1996). Consequently, androgen
deprivation leads to a dramatic regression of the gland. Similarly, prostate cancer relies on
androgen action to stimulate its initial development and progression. In the
initial stages of prostate cancer, androgen depletion suppresses the proliferation of cancer
cells (Figure 1.15), although in later stages the cells become insensitive to androgen
(Kati et al., 2006; Koivisto et al., 1997). Although hormonal manipulation is an important
step in the treatment of metastatic prostate cancer, androgen responsiveness is transient,
and the disease ultimately relapses despite continued androgen blockade (Bubley et al., 1996).
Figure 1.15: Androgen ablation kills prostate cells. Prostate cells, along with prostate cancer cells, require the presence of androgens. Thus, the removal of androgens kills the large majority of prostate cancer cells. Hormone therapy or androgen ablation is still common practice today. Hormones released from the higher brain centers dictate the release of testosterone from the testes. Removing testosterone or blocking its release leads to atrophy of the prostate and death of the majority of prostate cancer cells. FSH is follicle stimulating hormone and LH is luteinizing hormone.
The research group led by Professor Leedman has demonstrated that
androgen receptor (AR) mRNA stability is a major determinant of androgen receptor gene
expression in prostate cancer and that androgens regulate AR mRNA (Kati et al., 2006;
Yeap et al., 1999). The AR mediates the primary action of androgens in androgen-sensitive
tissues and is a member of the nuclear receptor superfamily of gene expression regulators.
Members of the superfamily are characterized by a central DNA-binding domain composed
of two highly conserved zinc finger motifs, which bind specific DNA sequences or
response elements within target genes and regulate transcriptional activity (Figure 1.16).
The carboxy-terminal portion of the receptor functions as the ligand-binding domain, and
binding of specific ligands to their cognate receptor modulates transcriptional activation
(Bubley et al., 1996).
Figure 1.16: The androgen receptor (AR). The AR is an intracellular hormone receptor present in tissues that respond to androgens. The AR protein has three domains, the transcription regulation domain, the DNA-binding domain and the ligand-binding domain, each with a unique function. The AR promotes the expression of genes that are hormonally controlled by androgens.
Androgens such as testosterone (T) and dihydrotestosterone (DHT) are steroid hormones
with a central role in male sexual differentiation and maintenance of body composition.
Androgens bind the AR (Figure 1.17), triggering a cascade of responses that, in prostate
cancer, include proliferation of the cancer cells.
Figure 1.17: The action of testosterone and the androgen receptor (AR). Testosterone is converted to dihydrotestosterone (DHT). Upon binding DHT, the AR is activated and dimerizes, then enters the nucleus and binds to the androgen response element, leading to gene expression.
1.8.1 AR mRNA stability and RNA binding proteins
Studies conducted by the Leedman lab (Yeap et al., 2002) have shown that the proximal
3’UTR of AR mRNA contains a UC-rich region that acts as a cis element. Using luciferase (Luc)
reporter transfection assays in LNCaP prostate cancer cells, they found evidence that this
region plays a role in AR mRNA turnover. They examined the change in
basal Luc activity induced by the UC-rich region and showed that the presence of the AR
UC-rich sequence reduced reporter activity by 30%. As the AR UC-rich element was
capable of regulating the Luc reporter, they also investigated whether this region was a target for
RNA-binding proteins using RNA electrophoretic mobility shift assays (REMSA). Multiple
RNA-protein complexes that bound to the ³²P-labeled AR UC-rich transcript were
identified in LNCaP cells. Competition studies with unlabeled RNA confirmed specificity
for the AR UC-rich probe. UV cross-link (UVXL) analysis was performed to further define
the proteins binding the UC-rich region. The ³²P-UC transcript bound multiple distinct
RNA-binding proteins from LNCaP cytoplasmic extract. Two specific proteins with masses of ~
43 and 36 kDa were identified, whose binding was significantly decreased by excess
unlabeled poly (C) and poly (U), respectively. Addition of excess poly (A) had no effect.
Given the close similarity between the UC-rich region of AR and the reported RNA target
sequences for Hel-N1 and HuD, Leedman’s group examined whether the 36 kDa RNA-protein
complex contained HuR, the member of the elav/Hu family of RNA-binding
proteins that is not restricted to the central nervous system. In REMSAs, a monoclonal
antibody against HuR supershifted the 36 kDa complex, whereas no shift was observed with an
unrelated antibody. To further investigate this interaction, they also used a recombinant
GST-HuR fusion protein and observed a supershift with the HuR antibody. In addition, after UV
cross-linking of prostate cancer cells, AR mRNA co-immunoprecipitated with HuR antibody,
indicating a close association of HuR and AR mRNA in prostate cancer cells.
The 43 kDa protein was subsequently identified as αCP1 and/or αCP2: first, poly (C)
competition abolished the faster-migrating RNA-protein complex in REMSAs; second, poly
(C) competition in UVXL assays specifically reduced binding of the major 43 kDa
RNA-binding protein; third, analysis of the UC-rich region revealed a conserved CCCUCCC
sequence previously identified as a component of the αCP binding motif in erythropoietin (EPO)
mRNA (Yeap et al., 2002).
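The third line of evidence rests on finding the CCCUCCC element within the UC-rich region. As an illustration of this kind of sequence inspection, the Python sketch below reports exact (including overlapping) matches to the heptamer and lists runs of consecutive cytosines. The function names and the example sequence are hypothetical placeholders, not the AR 3’UTR sequence, and exact string matching is a simplification: it ignores the RNA structural context that, as discussed above, also influences αCP binding.

```python
import re


def find_motif(seq: str, motif: str = "CCCUCCC") -> list[int]:
    """Return 0-based start positions of exact, possibly overlapping, matches."""
    seq = seq.upper().replace("T", "U")  # accept DNA input by converting T to U
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]


def c_runs(seq: str, min_len: int = 3) -> list[tuple[int, int]]:
    """Return (start, length) for each run of at least min_len consecutive Cs."""
    seq = seq.upper()
    return [(m.start(), len(m.group()))
            for m in re.finditer(f"C{{{min_len},}}", seq)]


# Hypothetical UC-rich fragment, for illustration only
example = "AUUCCCUCCCUUUCCCUCCCAG"
print(find_motif(example))
print(c_runs(example))
```

For the placeholder sequence, the heptamer is found at positions 3 and 13, and four C-triplets are reported, two from each CCCUCCC copy.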
To further test whether the 43 kDa protein was αCP1 and/or αCP2, the
Leedman group conducted supershift experiments. Antibodies against αCP1 and αCP2 each
produced a prominent supershift, and UVXL-IP performed on LNCaP cytoplasmic extract
immunoprecipitated a band at 43 kDa.
Experiments with nuclear extracts also identified HuR, αCP1 and αCP2 in the
nucleus of LNCaP cells. These data
suggest that each of these proteins may have a role in binding AR mRNA in both the
cytoplasm and nucleus.
Taken together, these studies on AR mRNA-protein interactions formed a strong
foundation upon which to embark on structural studies to examine UC-rich element binding
to HuR and the αCPs. The current study was therefore designed to better
understand the role of RNA-binding proteins in the regulation of AR mRNA stability. In
particular, I wished to probe the molecular interactions of αCP1 and HuR with the
UC-rich region in the 3’UTR of AR mRNA. I was specifically interested in characterizing
the structural attributes of this multi-protein/RNA complex. An understanding of the
structure of the complex involved in AR mRNA regulation could reveal ways to design
drugs that modulate AR expression and provide a platform for developing new
therapeutics.
1.9 Summary and Research aims
The AR is a key modulator of prostate cancer growth and proliferation, and a prime
therapeutic target. The data generated by the Leedman group established that HuR and the
αCPs bind to AR mRNA and are likely to contribute to the regulation of AR mRNA stability. Given
the increasing importance of understanding the mechanisms underlying gene expression,
the current study was designed to explore the mechanism of binding of αCP1 to the target
UC-rich sequence in the 3’UTR of AR mRNA, as a starting point for understanding the
regulation of AR mRNA stability. The specific aims were:
1) to determine the structural basis of αCP1 binding to the C-rich region in the
3’UTR of AR mRNA, with reference to its affinity and specificity; and
2) to characterize the kinetics and binding affinities of the isolated αCP1-KH domains 1, 2
and 3 with their target probe, as well as a variety of other RNA and DNA probes, in order
to determine the strength of binding and the preferred sequence.
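Aim 2 involves deriving kinetic constants and affinities. As a sketch of the relationships involved, assuming a simple 1:1 (Langmuir) interaction, which is a common starting model but not necessarily the one that best describes KH domain binding, the equilibrium dissociation constant follows from the rate constants as K_D = k_off/k_on, and the association-phase signal approaches its plateau exponentially with an observed rate k_obs = k_on·C + k_off. The function names, rate constants, concentration and R_max below are hypothetical values for illustration only.

```python
import math


def kd_from_rates(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant K_D (M) = k_off (s^-1) / k_on (M^-1 s^-1)."""
    return k_off / k_on


def association_response(t: float, conc: float, k_on: float, k_off: float,
                         r_max: float) -> float:
    """1:1 Langmuir association phase: signal at time t (s) for analyte at conc (M)."""
    k_obs = k_on * conc + k_off                              # observed rate constant
    r_eq = r_max * conc / (conc + kd_from_rates(k_on, k_off))  # plateau signal
    return r_eq * (1.0 - math.exp(-k_obs * t))


# Hypothetical rate constants, for illustration only
k_on, k_off = 1e5, 1e-3  # M^-1 s^-1, s^-1
print(kd_from_rates(k_on, k_off))  # K_D in M
print(association_response(600.0, 1e-8, k_on, k_off, r_max=100.0))
```

With these hypothetical values K_D = 10 nM, and because the analyte concentration equals K_D, the plateau signal is half of R_max; after 600 s the curve has reached about 70% of that plateau (1 - e^-1.2 ≈ 0.70).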
1.9.1 Hypotheses that formed the basis of this study
1) The AR mRNA contains a UC-rich region that is the target of the αCP proteins, each of
which contributes to the overall regulation of AR expression in prostate cells.
2) Specific interference with the binding of these RNA-binding proteins to the UC-rich
region could modulate the stability of AR mRNA.
3) The αCP1 protein binds the C-rich patch of the 3’UTR via its three KH domains.
4) All of the αCP1-KH domains participate in binding the UC-rich element.
5) Each αCP1-KH domain has a different binding affinity for the oligonucleotide probe.
It was envisaged at the outset that these studies would provide a better understanding of the
complex interactions between αCP1 and AR mRNA at a molecular level, and shed light on
the nature of protein/mRNA interactions in general. Understanding the detailed structural
interactions with the AR mRNA may provide valuable insight into the protein-mRNA
interfaces and identify possible drug targets for regulating AR expression in
prostate cancer cells by interfering with the interaction of αCP1 and AR mRNA.
Figure 1.18: HuR/αCP1 and androgen receptor system. A molecular model of the HuR RRM domains and αCP1-KH domains binding to the 3'UTR of AR mRNA (Wilce et al., 2002).
Chapter 2 Materials and Methods
2.1 Molecular Biology
2.1.1 Materials
The αCP1 coding sequence, in a pGEX-6P-2 vector (Pharmacia), was available in the laboratory. E. coli XL1-Blue cells were used as plasmid hosts during cloning and screening of the αCP1-KH domains, and E. coli BL21-CodonPlus cells (Stratagene) were used as the expression host for the final plasmid products. Primers, restriction enzymes and the various buffers for the cloning experiments were purchased from Promega and Sigma. Ampicillin and chloramphenicol were obtained from Sigma. Bacto-agar and yeast extract were from Oxoid Ltd (Basingstoke, England), and Bacto-tryptone was from Becton Dickinson (Cockeysville, USA). All other chemicals were of molecular biology or analytical reagent grade. A complete list of the chemicals used in this project and their suppliers is given in Table 2.1.
Table 2.1: Chemicals, reagents and consumables used throughout this work, with their suppliers.
μg/mL chloramphenicol. All cultures were incubated overnight at 37 °C with shaking at 180 rpm.
LB medium (400 mL) containing 100 μg/mL ampicillin and 25 μg/mL chloramphenicol was inoculated with 5 mL of the overnight culture and incubated at 37 °C with shaking at 180 rpm. The culture was grown to mid-exponential phase (A600 ≈ 0.8, i.e. an optical density of 0.8 measured at 600 nm) and then divided into 20 mL aliquots in 50 mL Falcon tubes. Each tube was induced with a different final IPTG concentration between 0.2 mM and 0.8 mM; this IPTG titration was conducted for all the αCP1-KH domains. The induced cultures were incubated for a further 4 hours with shaking at 180 rpm and then centrifuged for 15 minutes at 5000 rpm, 4 °C. The cell pellets were stored at -20 °C until required. Samples (1 mL) were collected before and after induction and stored for SDS-PAGE analysis. For large-scale protein expression an IPTG concentration of 0.2 mM was used for all the proteins, as this was found to be the optimum concentration.
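Setting up the titration is a C1V1 = C2V2 dilution into each 20 mL aliquot. The sketch below assumes a hypothetical 1 M IPTG stock and 0.1 mM steps between 0.2 and 0.8 mM; neither the stock concentration nor the step size is stated above.

```python
def iptg_volume_ul(final_mm: float, culture_ml: float = 20.0, stock_mm: float = 1000.0) -> float:
    """Volume of IPTG stock (µl) needed to reach final_mm in the culture.

    Uses C1*V1 = C2*V2 and ignores the small volume the stock itself adds.
    stock_mm defaults to a hypothetical 1 M (1000 mM) stock.
    """
    return final_mm * culture_ml / stock_mm * 1000.0  # mL -> µl


def titration_series(start_mm=0.2, stop_mm=0.8, step_mm=0.1, culture_ml=20.0, stock_mm=1000.0):
    """List of (final IPTG in mM, µl of stock to add) for each aliquot."""
    n = round((stop_mm - start_mm) / step_mm) + 1
    series = []
    for i in range(n):
        conc = round(start_mm + i * step_mm, 3)
        series.append((conc, round(iptg_volume_ul(conc, culture_ml, stock_mm), 2)))
    return series


for conc, vol in titration_series():
    print(f"{conc:.1f} mM IPTG -> add {vol:.1f} µl of 1 M stock to 20 mL")
```

With a different stock, only `stock_mm` changes; the 0.2 mM condition chosen for large-scale work is the first entry of the series.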
2.3.3 Overexpression of labelled αCP1-KH1 and αCP1-KH3 domains
¹⁵N-labelled protein was also prepared for αCP1-KH1 and αCP1-KH3. Expression was carried out in a fermentor using a method adapted from Cai et al. (1998), which produces a higher cell mass per gram of ¹⁵NH4Cl than the conventional shaker-flask method. Table 2.5 summarises the composition of the minimal medium. For each fermentor grow-up, 2 L of basic salt solution was prepared: 1.6 L was autoclaved in the reaction vessel and 400 mL in a baffled flask. A BIOFLO III Fermentor/Bioreactor (New Brunswick Scientific, NJ, USA) was used.
A single colony of the desired bacterial strain was used to inoculate 5 mL LB containing 100 μg/mL ampicillin and 25 μg/mL chloramphenicol. After 8 hours of incubation at 37 °C with shaking (180 rpm), 1 mL of this day culture was used to inoculate the flask containing 400 mL of basic salt solution, to which the trace metal solution (Table 2.5), glucose, yeast extract, MgCl2 solution, ampicillin and chloramphenicol had been added. The culture was grown overnight at 37 °C with shaking (180 rpm).
The following morning, trace metal solution, glucose, yeast extract, MgCl2 solution, ampicillin and chloramphenicol were added to the 1.6 L of basic salt solution autoclaved in the fermentor reaction vessel. The fermentor was set to maintain a constant temperature of 37 °C, a constant pH of 6.8 (maintained by the addition of 5 M NaOH) and an agitation rate of 500 rpm. The dissolved oxygen level was maintained at 75% by altering the ratio of air/oxygen bubbled into the culture. The entire overnight culture was added to the fermentor and grown until all the ¹⁴N-ammonium chloride had been depleted, as indicated by a sharp spike in the dissolved O2 level. Then 0.5 g ¹⁵NH4Cl was added and the culture was grown until the ¹⁵N-ammonium chloride was again depleted. At this point a further 2.5 g ¹⁵NH4Cl was added and expression of the αCP1-KH1 or αCP1-KH3 domain was induced by the addition of IPTG to a final concentration of 0.2 mM. After the depletion of the last batch of ammonium chloride (~3 h), the cells were harvested by centrifugation (5000 × g, 15 min, 4 °C) and the supernatant discarded. The cell pellet was resuspended in ice-cold PBS and transferred to 50 mL Falcon tubes, the cells were pelleted by centrifugation (5000 × g, 20 min, 4 °C) and the supernatant was discarded. The cell pellet was snap-frozen in liquid nitrogen and stored at -80 °C until required. Pre-induction and pre-harvest samples (1 mL) were taken and protein overexpression was analysed by SDS-PAGE as detailed in Section 2.4.4.
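Because the controller holds dissolved oxygen at 75%, exhaustion of the nitrogen source shows up as an abrupt rise above the setpoint once growth and oxygen uptake slow. If the DO readings were logged, the depletion point could be flagged automatically. The following is a minimal sketch only; the rise threshold, window and example trace are hypothetical.

```python
from typing import Optional, Sequence


def find_do_spike(times_min: Sequence[float], do_percent: Sequence[float],
                  rise_threshold: float = 15.0, window: int = 3) -> Optional[float]:
    """Return the time of the first sharp rise in dissolved O2, or None.

    A spike is flagged when DO rises by at least `rise_threshold`
    percentage points relative to the reading `window` samples earlier.
    Both defaults are hypothetical and would need tuning to the real
    logging interval and controller behaviour.
    """
    if len(times_min) != len(do_percent):
        raise ValueError("times and DO readings must have the same length")
    for i in range(window, len(do_percent)):
        if do_percent[i] - do_percent[i - window] >= rise_threshold:
            return times_min[i]
    return None


# Hypothetical trace: DO held near the 75% setpoint, then a jump once NH4Cl runs out.
times = list(range(0, 100, 10))
do_trace = [75, 74, 76, 75, 75, 74, 75, 92, 96, 97]
print(find_do_spike(times, do_trace))
```

The same check could be rerun on the readings logged after each NH4Cl addition to time the next step (the second ¹⁵NH4Cl addition with IPTG, then harvest).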
Table 2.5: The composition of basic salt solution.
Stock solution | Amount per litre | Stock composition
Basic salt solution (a) | 970 mL | per 970 mL: KH2PO4 13.0 g; K2HPO4 10.0 g; Na2HPO4 9.0 g; K2SO4 2.4 g; ¹⁴NH4Cl 1.0 g
Trace metal solution (b, c) | 0.4 mL | per 100 mL: conc. HCl 8 mL; FeCl2·4H2O 5.0 g; CaCl2·2H2O 184 mg; MnCl2·4H2O 40 mg; CoCl2·6H2O 18 mg; ZnCl2 340 mg; CuCl2·2H2O 4 mg; H3BO3 64 mg; Na2MoO4·2H2O 605 mg
1 M MgCl2 (c) | 10 mL | MgCl2·6H2O 10.16 g/50 mL MQW
100 mg/mL ampicillin (c) | 1 mL | ampicillin 1 g/10 mL MQW
50 mg/mL chloramphenicol (c) | 0.5 mL | chloramphenicol 0.5 g/10 mL ethanol
40% w/v D-glucose (c) | 30 mL | D-glucose 40 g/100 mL MQW
10% w/v yeast extract (c) | 20 μl | yeast extract 5 g/50 mL MQW
(a) Autoclaved prior to use. (b) Different composition to that described by Cai et al. (1998). (c) Sterile filtered prior to use.
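Because the table lists additions per litre, the volumes for the 400 mL flask and the 1.6 L vessel follow by simple scaling. A minimal sketch, assuming the "per litre" column refers to litres of final medium; this assumption is consistent with the 100 μg/mL ampicillin and 25 μg/mL chloramphenicol used elsewhere in the culture protocol.

```python
# Per-litre additions of each sterile supplement stock, in mL (from Table 2.5).
PER_LITRE_ML = {
    "trace metal solution": 0.4,
    "1 M MgCl2": 10.0,
    "100 mg/mL ampicillin": 1.0,
    "50 mg/mL chloramphenicol": 0.5,
    "40% w/v D-glucose": 30.0,
    "10% w/v yeast extract": 0.020,  # 20 µl per litre, as listed
}


def additions_for(volume_l: float) -> dict:
    """Scale each per-litre addition (mL) to the given medium volume."""
    return {name: round(ml * volume_l, 4) for name, ml in PER_LITRE_ML.items()}


def final_ug_per_ml(stock_mg_per_ml: float, added_ml: float, volume_l: float) -> float:
    """Final antibiotic concentration; mg per litre equals µg per mL."""
    return stock_mg_per_ml * added_ml / volume_l


flask = additions_for(0.4)   # 400 mL pre-culture flask
vessel = additions_for(1.6)  # 1.6 L fermentor vessel
print(flask)
print(vessel)
print(final_ug_per_ml(100, PER_LITRE_ML["100 mg/mL ampicillin"], 1.0))     # ampicillin
print(final_ug_per_ml(50, PER_LITRE_ML["50 mg/mL chloramphenicol"], 1.0))  # chloramphenicol
```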
2.4 Protein purification
2.4.1 Cell lysis
The pellet from each small-scale 20 mL culture (prepared as outlined in Section 2.3.2) was resuspended in 800 μl of ice-cold PBS lysis buffer. The cells were lysed by six freeze-thaw cycles, in which the tubes containing the cells were immersed in liquid nitrogen for 1 min, and the lysate was then centrifuged for 45 minutes at 20,000 × g, 4 °C to pellet unbroken cells, large cellular debris and inclusion-body protein. At this stage, 20 μl each of the supernatant and the pellet were analysed on a 15% Tris-glycine SDS-PAGE gel to check for the presence of the overexpressed protein in the soluble and insoluble fractions.
2.4.2 Glutathione-agarose bead adsorption and PreScission protease cleavage
Where soluble GST-fusion protein was obtained, large-scale protein expression was conducted. Expression was performed as for the small-scale cultures, except that the cells were lysed using a French press and the lysate was supplemented with PMSF, leupeptin and aprotinin to a final concentration of 0.5 mM to inhibit proteases. The GST-fusion protein was then separated from the other E. coli proteins. Glutathione-agarose beads (300 mg) were hydrated at room temperature for 30 min before approximately 80 mL of cell lysate was added, and the mixture was incubated with gentle rocking at 4 °C for 16 hours. The beads were then washed six times with 50 mL of PBS containing 0.5% Triton X-100 to remove non-adherent protein, and once with PreScission protease buffer. The beads were resuspended in ~7 mL of PreScission protease buffer and incubated with 5 U/mL PreScission protease for approximately 48 hours at 4 °C with gentle rocking. Samples were collected at various stages for SDS-PAGE analysis. Following cleavage, the beads were spun down and the eluate collected.
2.4.3 Regeneration of GSH-agarose beads
Used glutathione-agarose beads were transferred to 50 mL tubes, washed with DDI water (2 × 40 mL) and incubated with 8 M urea (40 mL) at room temperature. The beads were allowed to settle and the overlying solution was discarded. This procedure was repeated twice. The beads were then washed with MQW (5 × 40 mL) and stored as a 50% slurry in 20% ethanol at 4 °C until required.
2.4.4 PAGE analysis
Tris/glycine gels were used to check the success of the protein overexpression and purification protocols. SDS-PAGE was carried out using a Hoefer® Tall Mighty Small® apparatus (Amersham Biosciences) with 1.5 mm thick gels. Mark 12™ low molecular weight protein standards (10 μl) were run on each gel. Gels were stained with Coomassie blue, photographed using a Gel Doc™ EQ gel documentation system (Bio-Rad Laboratories) and processed using QuantityOne-4.5.0 software (Bio-Rad Laboratories).
2.4.5 Tris/glycine SDS-PAGE
The 1 mL pre- and post-induction cell pellets were resuspended in 1× loading buffer. The volume of loading buffer depended on the final A600 of the sample (e.g. 80 µl of loading buffer was added to samples with an undiluted A600 of 0.80). The samples were boiled for 10 min at 100 °C with brief vortexing in between, and 15 μl of each sample was loaded per well. For samples obtained during cell lysis and protein purification, an equal volume of 2× loading buffer was added. All samples were then heated at 95 °C for 5 min prior to loading, and 10–15 µl of each sample was loaded per well.
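The worked example above (80 µl for an A600 of 0.80) implies roughly 100 µl of loading buffer per A600 unit for a 1 mL pellet, which keeps the amount of cell material per lane approximately constant. A minimal sketch of that scaling; the linear rule is inferred from the single example and is an assumption.

```python
def loading_buffer_ul(a600: float, pellet_from_ml: float = 1.0, ul_per_od_ml: float = 100.0) -> float:
    """Loading buffer volume (µl) to resuspend a whole-cell pellet.

    Scales linearly with A600 and with the culture volume pelleted.
    The 100 µl per A600 unit per mL ratio is inferred from the
    example in the text (80 µl for an A600 of 0.80).
    """
    if a600 <= 0 or pellet_from_ml <= 0:
        raise ValueError("A600 and culture volume must be positive")
    return a600 * pellet_from_ml * ul_per_od_ml


print(loading_buffer_ul(0.80))  # pre-induction sample
print(loading_buffer_ul(3.5))   # a dense post-induction sample
```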
2.4.6 Electrophoresis
Gels were prepared according to Sambrook et al. (1989). Briefly, the resolving gel contained 15% acrylamide, 0.375 M Tris (pH 8.8) and 0.1% SDS, and the stacking gel contained 4% acrylamide, 0.125 M Tris (pH 6.8) and 0.1% SDS. Gels were electrophoresed at 200 V in glycine running buffer.
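The gel compositions above translate into pipetting volumes by dilution from stock solutions. The stock concentrations used here (30% acrylamide/bis-acrylamide, 1.5 M Tris pH 8.8, 0.5 M Tris pH 6.8, 10% SDS) are common laboratory stocks assumed for illustration and are not stated in this section. APS and TEMED are omitted; in practice their small volumes are taken out of the water.

```python
def gel_volumes(total_ml: float, components: list) -> dict:
    """Volumes (mL) of each stock plus water for one gel mix.

    `components` is a list of (name, stock_conc, final_conc) tuples, with
    stock and final concentration given in the same units.
    """
    volumes = {}
    for name, stock, final in components:
        if final > stock:
            raise ValueError(f"{name}: final concentration exceeds the stock")
        volumes[name] = round(final / stock * total_ml, 3)
    water = round(total_ml - sum(volumes.values()), 3)
    if water < 0:
        raise ValueError("stocks alone exceed the total volume")
    volumes["water"] = water
    return volumes


# Hypothetical stock concentrations; final concentrations from Section 2.4.6.
RESOLVING_15 = [("30% acrylamide", 30.0, 15.0),
                ("1.5 M Tris pH 8.8", 1.5, 0.375),
                ("10% SDS", 10.0, 0.1)]
STACKING_4 = [("30% acrylamide", 30.0, 4.0),
              ("0.5 M Tris pH 6.8", 0.5, 0.125),
              ("10% SDS", 10.0, 0.1)]

print(gel_volumes(10.0, RESOLVING_15))
print(gel_volumes(5.0, STACKING_4))
```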
2.4.7 Size-exclusion chromatography
Size-exclusion chromatography was used to separate full-length αCP1, αCP1-KH1, αCP1-KH2 and αCP1-KH3 from undigested fusion protein, GST and any remaining bacterial contaminants. It was carried out using a Superdex™ 75 10/300 GL column (Amersham Biosciences) connected to a BioLogic DuoFlow chromatography system (Bio-Rad Laboratories). The cleavage reaction mixture was dialysed twice against 100 volumes of buffer A at 4 °C, and the dialysate was concentrated to 1–2 mL using a Vivaspin 5K concentrator. Aliquots (500 μl) of the concentrate were loaded onto the Superdex column, pre-equilibrated in buffer A, at a flow rate of 0.4 mL/min, and protein was eluted with the same buffer at 0.4 mL/min. The absorbance of the eluate was monitored at 280 nm and the fractions of interest were collected. The purity of the collected fractions was determined by SDS-PAGE. Fractions containing pure αCP1, αCP1-KH1, αCP1-KH2 or αCP1-KH3 were pooled, concentrated to 500 µl using a Vivaspin 5K concentrator and stored at -80 °C until required.
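At a constant 0.4 mL/min, a retention time converts directly to an elution volume, which can be expressed as the partition coefficient Kav = (Ve - V0)/(Vt - V0). A minimal sketch; the void volume V0 and total volume Vt used here are hypothetical placeholders and would need to be measured for the actual column by calibration.

```python
def elution_volume_ml(time_min: float, flow_ml_per_min: float = 0.4) -> float:
    """Convert a retention time at constant flow into an elution volume."""
    return time_min * flow_ml_per_min


def partition_coefficient(ve_ml: float, v0_ml: float, vt_ml: float) -> float:
    """Kav = (Ve - V0) / (Vt - V0); 0 at the void volume, 1 at the total volume."""
    if not 0 < v0_ml < vt_ml:
        raise ValueError("require 0 < V0 < Vt")
    return (ve_ml - v0_ml) / (vt_ml - v0_ml)


# Hypothetical column volumes; determine V0 and Vt by calibration for the real column.
V0_ML, VT_ML = 8.0, 24.0
ve = elution_volume_ml(30.0)  # e.g. a peak at 30 min and 0.4 mL/min
print(ve, partition_coefficient(ve, V0_ML, VT_ML))
```

Kav values of calibration standards plotted against log(molecular weight) could then give an apparent size for each eluted protein.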
2.4.8 Anion exchange chromatography
Anion exchange chromatography was also employed to separate full-length αCP1 from GST, GST-αCP1 and any remaining bacterial contaminants. It was carried out using a MonoQ™ HR 10/10 column (Amersham Biosciences) connected to a BioLogic DuoFlow chromatography system (Bio-Rad Laboratories). All buffers were 0.2 μm filtered (Millipore) and degassed prior to use, and all samples were 0.2 μm filtered (Millipore) immediately prior to loading.
The cleavage reaction mixture was dialysed twice against 100 volumes of buffer B at 4 °C, using a 5 kDa or 10 kDa cut-off membrane depending on the protein size. The filtered dialysate was loaded in 2 mL aliquots onto the MonoQ™ HR 10/10 column. Pure αCP1 was eluted by applying a linear gradient of 0 to 60% buffer C over 40 mL at a flow rate of 1 mL/min. Elution was monitored at 280 nm, and the main peaks were collected and identified using 15% Tris-glycine SDS-PAGE. Fractions containing pure αCP1 were pooled, dialysed into Biacore buffer (50 mM Tris-HCl pH 7.4, containing 150 mM NaCl, 0.5% Triton X-100 and 2 mM DTT, EDTA, 62 μg/mL, 125 μg/mL) and stored at -80 °C.
2.4.9 Cation exchange chromatography
Cation exchange chromatography was also employed to separate αCP1-KH1 from GST, GST-αCP1-KH1 and any remaining bacterial contaminants. It was carried out using a MonoS™ HR 10/10 column (Amersham Biosciences) connected to the BioLogic DuoFlow chromatography system (Bio-Rad Laboratories). All buffers were filtered through 0.2 μm filters (Millipore) and degassed prior to use, and all samples were 0.2 μm filtered (Millipore) immediately prior to loading.
The cleavage reaction mixture was dialysed twice (5 kDa cut-off membrane) against 100 volumes of buffer D at 4 °C. The filtered dialysate was loaded in 2 mL aliquots onto the MonoS™ HR 10/10 column. Pure αCP1-KH1 was eluted by applying a linear gradient of 0 to 60% buffer E over 40 mL at a flow rate of 2 mL/min. Elution was monitored at 280 nm, and the desired fractions were collected and analysed by SDS-PAGE. Fractions containing αCP1-KH1 were pooled and stored at -80 °C until required.
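For either linear gradient (MonoQ at 1 mL/min or MonoS at 2 mL/min, both 0 to 60% over 40 mL), the salt concentration reached at a given run time follows from the volume delivered. A minimal sketch, assuming the high-salt buffer contains 1 M NaCl (consistent with the 0–0.6 M NaCl range reported for αCP1-KH1, though buffer compositions are not given in this section) and a hypothetical gradient start time.

```python
def gradient_position(elapsed_min: float, gradient_start_min: float,
                      flow_ml_per_min: float, gradient_volume_ml: float = 40.0,
                      start_pct: float = 0.0, end_pct: float = 60.0,
                      buffer_b_nacl_m: float = 1.0) -> tuple:
    """Return (% high-salt buffer, NaCl in M) delivered at a given run time.

    Assumes a linear gradient that starts at `gradient_start_min` and
    holds at `end_pct` once `gradient_volume_ml` has been delivered.
    `buffer_b_nacl_m` (NaCl in the high-salt buffer) is an assumption.
    """
    delivered_ml = (elapsed_min - gradient_start_min) * flow_ml_per_min
    fraction = min(max(delivered_ml / gradient_volume_ml, 0.0), 1.0)
    pct = start_pct + fraction * (end_pct - start_pct)
    return pct, pct / 100.0 * buffer_b_nacl_m


# MonoS run: 0-60% buffer E over 40 mL at 2 mL/min, gradient assumed to start at t = 0.
print(gradient_position(10.0, 0.0, 2.0))
```

At 2 mL/min the 40 mL MonoS gradient lasts 20 min, twice as fast as the MonoQ gradient at 1 mL/min, so the same %B is reached in half the time.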
2.5 Protein concentration
Protein concentrations were determined using the detergent-compatible (DC) protein assay (Bio-Rad), conducted as instructed in the manual, with spectrophotometric detection at 750 nm. The assay is based on the reaction of protein with an alkaline copper tartrate solution (Bradford, 1976). In addition, protein concentration was determined spectrophotometrically from the absorbance of the sample at 280 nm, using the Beer-Lambert law:
A280 = ε × c × l
where: A280 is the absorbance at 280 nm
ε is the theoretical molar extinction coefficient at 280 nm (M⁻¹ cm⁻¹)
c is the concentration (M)
l is the path length (cm)
The molar extinction coefficients were calculated from the primary sequence of each protein using the ExPASy server and are shown in Table 2.6. Protein concentrations were further confirmed by SDS-PAGE analysis using bovine serum albumin as a standard.
Table 2.6: Protein molar extinction coefficients
Protein | Extinction coefficient at 280 nm (M⁻¹ cm⁻¹)
αCP1 | 13450
αCP1-KH1 | 0 (no cysteine or aromatic residues)
αCP1-KH2 | 125
αCP1-KH3 | 1615
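Rearranging the Beer-Lambert law gives c = A280/(ε l). The sketch below applies it with the coefficients in Table 2.6 and converts to mg/mL using the 37.5 kDa expected mass of αCP1; the A280 reading is hypothetical. It also makes explicit why A280 cannot be used for αCP1-KH1 (ε = 0), and why it will be imprecise for αCP1-KH2, whose coefficient is very low.

```python
EPSILON_280 = {  # M^-1 cm^-1, from Table 2.6
    "αCP1": 13450.0,
    "αCP1-KH1": 0.0,
    "αCP1-KH2": 125.0,
    "αCP1-KH3": 1615.0,
}


def concentration_um(a280: float, epsilon: float, path_cm: float = 1.0,
                     dilution: float = 1.0) -> float:
    """c = A / (epsilon * l), returned in µM and corrected for dilution."""
    if epsilon <= 0:
        raise ValueError("no A280 chromophores; use a colorimetric assay instead")
    return a280 / (epsilon * path_cm) * dilution * 1e6


def to_mg_per_ml(conc_um: float, mw_da: float) -> float:
    """µM x g/mol x 1e-6 gives g/L, which equals mg/mL."""
    return conc_um * 1e-6 * mw_da


c = concentration_um(0.5, EPSILON_280["αCP1"])  # hypothetical A280 reading
print(round(c, 2), "µM", round(to_mg_per_ml(c, 37500), 3), "mg/mL")
```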
2.6 Mass spectrometry
Mass spectral analysis using matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-TOF MS) was performed by Proteomics International (East Perth, WA, Aus) on the αCP1-KH1, αCP1-KH2 and αCP1-KH3 domains to confirm the identity of the purified proteins.
2.7 Circular dichroism spectropolarimetry
Circular dichroism spectra were obtained for αCP1-KH1, αCP1-KH2 and αCP1-KH3 to confirm that the proteins were correctly folded. Far-ultraviolet circular dichroism (CD) spectra were collected on a J-810 spectropolarimeter (Jasco, Easton, MD, USA) equipped with a Pharmacia LKB MultiTemp II temperature controller (Amersham Biosciences). Quartz cuvettes of 1 cm path length were obtained from Starna Pty Ltd (Thornleigh, NSW, Aus). Data were collected on a personal computer and visualised using the supplied software, Spectra Manager (Jasco).
2.7.1 Sample preparation and spectra acquisition
Samples were dialysed into 50 mM Tris buffer pH 8.2, 150 mM NaCl, 1 mM EDTA and DTT using 3 kDa cut-off dialysis cassettes (Slide-A-Lyzer), then diluted in the same buffer to a final concentration of 20 µg/200 µl (0.1 mg/mL). All spectra were collected at 25 °C under a constant nitrogen flush (>5 L/min), over the wavelength range 200–300 nm with a data pitch of 0.2 nm and a bandwidth of 1 nm. Each final spectrum was the average of 30 scans collected at a speed of 100 nm/min with a response time of 2 s.
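The section does not state how the averaged spectra were normalised. A common step is conversion to mean residue ellipticity, [θ]MRW = θobs × MRW / (10 × l × c), so that spectra of domains of different size can be compared. A minimal sketch, assuming this standard normalisation; the molecular weight and residue count are hypothetical placeholders, while the 0.1 mg/mL concentration and 1 cm path length come from the text.

```python
def average_scans(scans):
    """Point-by-point mean of repeated scans (lists of equal length)."""
    n = len(scans)
    if n == 0 or any(len(s) != len(scans[0]) for s in scans):
        raise ValueError("need at least one scan, all of equal length")
    return [sum(values) / n for values in zip(*scans)]


def mean_residue_ellipticity(theta_mdeg, mw_da, n_residues, conc_mg_per_ml, path_cm):
    """Convert observed ellipticity (mdeg) to [θ]MRW in deg cm^2 dmol^-1."""
    mrw = mw_da / (n_residues - 1)  # mean residue weight
    return theta_mdeg * mrw / (10.0 * path_cm * conc_mg_per_ml)


# Hypothetical domain of 81 residues and 8800 Da, measured at 0.1 mg/mL in a 1 cm cell.
spectrum = average_scans([[-9.0, -11.0], [-11.0, -9.0]])
print([mean_residue_ellipticity(t, 8800.0, 81, 0.1, 1.0) for t in spectrum])
```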
2.8 Oligonucleotide Preparation
2.8.1 Preparation of 11-nt αCP1 target site from AR mRNA
The DNA and RNA oligonucleotides corresponding to nucleotides 3315–3325 of AR mRNA (5'-UUCCCUCCCUA-3') were purchased from Dharmacon in protected crude form and purified by denaturing PAGE. The sample was separated on a 20% gel, and the band was visualised by UV shadowing once the oligonucleotide had run approximately halfway down the gel. The sample was recovered by excising the band, which was crushed and eluted overnight in sterile 0.3 M sodium acetate at 37 °C. The eluate was filtered and desalted using a reverse-phase solid-phase extraction cartridge (C18 Sep-Pak, Waters), and the eluted fractions were lyophilised and deprotected. For deprotection, the oligonucleotide pellet was dissolved in 400 µl of the deprotection buffer supplied by Dharmacon and incubated for 30 min at 60 °C; the sample was then lyophilised and dissolved in distilled water for quantification by UV spectroscopy. Oligonucleotide concentrations were determined by measuring the absorbance at 260 nm, assuming one absorbance unit to be equivalent to 34 μg/mL.
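The quantification step can be written out as two conversions: A260 to µg/mL using the 34 μg/mL per absorbance unit stated above, then µg/mL to µM using the oligonucleotide's molecular weight. In the sketch, the absorbance reading, the 1:100 dilution and the 3400 g/mol molecular weight are hypothetical values for illustration.

```python
def oligo_ug_per_ml(a260: float, dilution: float = 1.0, ug_per_a260: float = 34.0) -> float:
    """Mass concentration from A260, using 34 µg/mL per absorbance unit."""
    return a260 * dilution * ug_per_a260


def oligo_um(ug_per_ml: float, mw_g_per_mol: float) -> float:
    """µg/mL equals mg/L; dividing by g/mol gives mmol/L, x1000 gives µM."""
    return ug_per_ml / mw_g_per_mol * 1000.0


mass = oligo_ug_per_ml(0.25, dilution=100)  # hypothetical reading of a 1:100 dilution
print(mass, oligo_um(mass, 3400.0))         # 3400 g/mol: hypothetical 11-mer MW
```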
2.8.2 Preparation of 30-nt sequence containing the αCP1 target site from AR mRNA
The 5'-biotinylated target RNA (5'-CUCUCCUUUCUUUUUCUUCUUCCCUCCCUA-3'), representing nt 3296–3325 of androgen receptor (AR) mRNA, was obtained from Dharmacon, and the complementary DNA sequence was purchased from Geneworks. The RNA was deprotected, and both DNA and RNA were quantified, as described in Section 2.8.1.
BIACORE experiments were conducted for αCP1, αCP1-KH1, αCP1-KH2 and αCP1-KH3. BIACORE is a surface plasmon resonance (SPR) based instrument that uses an optical method to measure the refractive index near a sensor surface, which is a gold-coated glass chip. The gold surface is covered on the optical side with a thin layer of glass and on the other side with a carboxymethylated dextran matrix to which streptavidin is attached. Because streptavidin has a high affinity for biotin, biotinylated single-stranded oligonucleotide can be immobilised on this surface. To detect the interaction of protein and oligonucleotide, 50 μl of protein solution is injected through a flow cell under a constant flow of the buffer of interest, so that the protein solution passes over the surface of the chip. Binding of protein to the oligonucleotide changes the refractive index near the surface; this is detected in real time by an optical device on the other side of the chip and plotted as a sensorgram of response units (RU) versus time.
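Kinetic constants are obtained by fitting the sensorgram to a binding model; this section does not say which model was used. As an illustration only, the simplest 1:1 (Langmuir) model gives exponential association during the injection and exponential dissociation afterwards, with KD = kd/ka. The rate constants, protein concentration and injection length below are hypothetical (the injection length depends on the flow rate, which is not given).

```python
import math


def sensorgram(t_s: float, conc_m: float, ka: float, kd: float,
               rmax: float, t_end_s: float) -> float:
    """Response (RU) for a 1:1 (Langmuir) interaction.

    Association during the injection (0 <= t <= t_end_s), then
    single-exponential dissociation once buffer flow resumes.
    ka in M^-1 s^-1, kd in s^-1; KD = kd / ka.
    """
    kobs = ka * conc_m + kd
    r_eq = rmax * ka * conc_m / kobs  # equals rmax * C / (C + KD)
    if t_s <= t_end_s:
        return r_eq * (1.0 - math.exp(-kobs * t_s))
    r_end = r_eq * (1.0 - math.exp(-kobs * t_end_s))
    return r_end * math.exp(-kd * (t_s - t_end_s))


# Hypothetical parameters: 1 µM protein, KD = 10 nM, 150 s injection.
ka, kd = 1e5, 1e-3
for t in (0, 30, 150, 300, 900):
    print(t, round(sensorgram(t, 1e-6, ka, kd, 100.0, 150.0), 1))
```

Fitting real data would mean adjusting ka, kd and rmax to minimise the difference between this curve and the measured responses across several protein concentrations.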
Figure 2.1: Schematic of the sensor chip. Biotinylated RNA is immobilised on the streptavidin-coated carboxymethylated dextran matrix, and protein solution is passed over the sensor surface; protein binding to the target RNA changes the refractive index near the sensor surface.
These kits were used for initial large-scale screening, which involved setting up crystal trays with protein concentrations of approximately 5 mg/mL at both 20 °C and 4 °C using the hanging-drop vapor diffusion method.
2.11.2 Crystallization of αCP1-KH3
Initial αCP1-KH3 crystals grew in 2 μl hanging drops containing 1:1 mixtures of protein and reservoir solutions. The protein solution contained 5 mg/mL αCP1-KH3 in 25 mM potassium phosphate pH 6.0, 1 mM DTT, 1 mM EDTA and 150 mM NaCl, and the reservoir solution was 1.5 M lithium sulfate, 0.1 M Na HEPES pH 7.5 (Hampton Crystal Screen reagent formulation 16; Hampton Research, CA). These initial crystals diffracted poorly. A number of factors could be modified to optimise the growth and quality of the crystals, including the protein concentration, temperature, buffer pH, additives (divalent cations, glycerol, 2-methyl-2,4-pentanediol (MPD)), salt and precipitant concentration, and the protein:reservoir buffer ratio. For αCP1-KH3, changing the protein:reservoir buffer ratio to 2:1 at room temperature resulted in crystals that grew within 2 days to dimensions of ~0.3 × 0.2 × 0.02 mm, with the outline of a rugby football.
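One possible reason the drop ratio matters can be seen in an idealised vapor diffusion model: water leaves the drop until its precipitant matches the reservoir, so a drop containing less reservoir solution shrinks more and ends at a higher protein concentration, while starting from a lower precipitant concentration. This is a rough sketch under that ideal assumption only (it ignores the protein's own contribution and non-ideal equilibration); the 5 mg/mL protein and 1.5 M lithium sulfate values come from the conditions above.

```python
def drop_equilibrium(protein_mg_ml: float, reservoir_precipitant_m: float,
                     protein_parts: float, reservoir_parts: float):
    """Idealised hanging-drop concentrations before and after equilibration.

    Assumes the drop loses only water until its precipitant concentration
    matches the reservoir, so every drop component is concentrated by the
    same factor. Returns (protein start, precipitant start, protein end).
    """
    total = protein_parts + reservoir_parts
    protein_start = protein_mg_ml * protein_parts / total
    precipitant_start = reservoir_precipitant_m * reservoir_parts / total
    factor = reservoir_precipitant_m / precipitant_start  # equals total / reservoir_parts
    return protein_start, precipitant_start, protein_start * factor


for ratio in ((1, 1), (2, 1)):
    start, salt, end = drop_equilibrium(5.0, 1.5, *ratio)
    print(ratio, round(start, 2), round(salt, 2), round(end, 2))
```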
2.11.3 Preparation of αCP1-KH1/DNA complex
Complexes of αCP1-KH1 with the 11-nt DNA sequence (TTCCCTCCCTA) were prepared by dissolving the lyophilised DNA in the protein solution at a 1:1 ratio. The mixture was left on ice for 30 min before the crystal drops were set. Crystal trays were set up using Hampton Screen I, Natrix screen and Sigma screen. Crystals of αCP1-KH1/DNA were grown by vapor diffusion in 1 μl hanging drops containing 1:1 mixtures of protein and reservoir solutions. The complex solution contained 309 μM each of protein and DNA in 50 mM Tris-HCl pH 8.0, 1 mM DTT, 1 mM EDTA and 150 mM NaCl, and the reservoir solution was 0.2 M magnesium acetate, 0.1 M sodium cacodylate pH 6.5, 30% MPD (Sigma Crystal Screen reagent formulation number 21; Hampton Research, California, USA). Crystals typically grew within eight weeks to dimensions of ~0.2 × 0.2 × 0.04 mm, with the outline of a diamond.
2.11.4 Preparation of other crystallisation experiments
Crystal trials were also conducted for the isolated αCP1-KH1 and αCP1-KH2 domains using both Sigma and Hampton screens. Once promising conditions were identified, a number of narrow screens around those conditions were conducted. The optimisation screens were prepared from laboratory stocks made with double-deionised (DDI) water and analytical grade reagents; after preparation, the stock solutions were filtered through a 0.2 μm filter and stored at 4 °C until required.
Initial optimisation screens involved changing the precipitant concentration, the pH, or both. In general, the precipitant concentration was varied in 2% increments around the successful concentration and the pH was varied in increments of 0.5 pH units; smaller increments were used only when necessary. Screening also involved changing the crystallization buffer and adding additives such as 5 to 10% MPD and glycerol. Further screens varied the protein concentration or the ratio of reservoir buffer to protein in the drop.
In addition, both macro-seeding and streak seeding using a cat's whisker were attempted to produce crystals of suitable size and quality for diffraction experiments.
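The narrow optimisation screens described above (2% precipitant steps and 0.5 pH-unit steps around a hit) can be laid out systematically. A minimal sketch; the grid size (five precipitant by three pH conditions) is an illustrative choice, and the centre values simply reuse the αCP1-KH1/DNA reservoir condition as an example.

```python
def optimisation_grid(precip_centre: float, ph_centre: float,
                      precip_step: float = 2.0, ph_step: float = 0.5,
                      n_precip: int = 2, n_ph: int = 1):
    """List (pH, precipitant %) conditions on a grid around a hit."""
    precips = [round(precip_centre + i * precip_step, 2)
               for i in range(-n_precip, n_precip + 1)]
    phs = [round(ph_centre + j * ph_step, 2) for j in range(-n_ph, n_ph + 1)]
    return [(ph, p) for ph in phs for p in precips]


# Illustrative centre: 30% MPD at pH 6.5 (the αCP1-KH1/DNA reservoir condition).
grid = optimisation_grid(30.0, 6.5)
for ph in sorted({ph for ph, _ in grid}):
    print(ph, [p for q, p in grid if q == ph])
```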
2.11.5 X-ray data collection
All diffraction experiments were carried out at 100 K. Crystals were mounted in nylon cryo-loops and, in each case, passed through reservoir buffer supplemented with 15% glycerol as a cryoprotectant before being flash-cooled in a nitrogen stream. X-ray diffraction data were collected at The University of Western Australia using a Rigaku RU-200 rotating anode Cu Kα source (40 kV, 100 mA; Rigaku/MSC, TX, USA) equipped with Osmic mirrors (Osmic, MI, USA) and a Mar345 image plate detector (Mar Research, Hamburg, Germany). Crystals were cryo-cooled using a nitrogen cryostream (Oxford Cryosystems, Oxford, United Kingdom). Data were integrated and scaled by Matthew Wilce using DENZO and SCALEPACK (Otwinowski and Minor, 1997), and structure factor amplitudes were calculated using TRUNCATE (Collaborative, 1994).
2.11.6 Structure solution and refinement
Structures were solved by molecular replacement using AMoRe (Collaborative, 1994). Cycles of manual model building and refinement were carried out with REFMAC (Collaborative, 1994).
2.12 Molecular dynamics simulations using NAMD
2.12.1 Modelling of αCP1-KH3 bound to poly (C) oligonucleotide
The αCP1-KH3 structure was superposed with the structure of Nova-2-KH3 bound to RNA (PDB accession code 1EC6) using LSQMAN. In this way the oligonucleotide coordinates could be extracted and used to generate an 8-nt poly (C) RNA docked to the αCP1-KH3 structure (the Insight II software package was used to change the bases to cytosine). The complex was subjected to molecular dynamics simulations using NAMD in a fully solvated box with an overall neutral charge (achieved through the addition of randomly placed sodium ions). The complex was allowed to equilibrate for 10⁶ fs using the CHARMM27 force field at 310 K and 1 atm with periodic boundary conditions. This removed steric clashes from the final model and allowed the full set of possible intermolecular interactions to be examined. The stereochemistry of the oligonucleotide and the formation of intermolecular hydrogen bonds during the simulation were recorded at picosecond intervals for analysis.
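The superposition step rests on a least-squares fit of matched atom pairs. The sketch below illustrates the idea with the Kabsch algorithm in NumPy; it is a generic illustration, not LSQMAN's own implementation, and it assumes the Cα pairs have already been matched (for example from the structure-based alignment). Fitting Nova-2-KH3 Cα atoms (mobile) onto the matched αCP1-KH3 Cα atoms (target) and applying the same transform to the template RNA would place the RNA in the αCP1-KH3 frame.

```python
import numpy as np


def superpose(mobile, target):
    """Least-squares (Kabsch) fit of `mobile` onto `target`.

    Both inputs are (N, 3) coordinates of matched atoms. Returns (R, t, rmsd)
    such that mobile @ R.T + t best overlays target.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two matching (N, 3) coordinate arrays")
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)               # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q0 - p0 @ R.T
    fitted = P @ R.T + t
    rmsd = float(np.sqrt(((fitted - Q) ** 2).sum(axis=1).mean()))
    return R, t, rmsd


def apply_transform(R, t, coords):
    """Apply the fitted rotation and translation to any other coordinates."""
    return np.asarray(coords, dtype=float) @ R.T + t
```

Usage would be `R, t, rmsd = superpose(nova_ca, acp1_ca)` followed by `apply_transform(R, t, rna_coords)`, where the three arrays are hypothetical names for the matched Cα sets and the template RNA atoms.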
Chapter 3 Protein Preparation
3.1 Chapter overview
We employed molecular biology techniques to overexpress full-length αCP1 as well as αCP1-KH1, KH2 and KH3, all as GST fusion proteins. To obtain sufficient amounts of protein for biophysical studies, the overexpression and purification of full-length αCP1 and the KH domains were optimized as described in Methods Section 2.3.2. The following sections detail the procedures used to optimize the quality and quantity of the final products.
In this chapter we aimed:
1) to obtain sufficient amounts of protein for biophysical studies using an E. coli overexpression system,
2) to purify the proteins using affinity chromatography, and
3) to use these purified proteins in structural and mRNA-binding studies, in order to better characterize and understand the interactions between αCP1, as well as its isolated KH domains, and the respective poly (C) target sites (Chapters 5 and 6).
3.2 αCP1-KH Domain Boundaries
The individual αCP1-KH domains had already been cloned into the pGEX-6P-2 plasmid as part of my Honours project. The domain boundaries were based on a structure-based sequence alignment of the αCP1 fragments with the solved structure of the Nova-2 KH3 domain. These constructs were used for protein overexpression experiments in order to obtain milligram quantities of pure protein for further biophysical studies. The cloned regions are illustrated schematically in Figure 3.1.
Figure 3.1: Schematic representation of the cloned αCP1-KH domain boundaries, amino acid sequence alignment and cartoon representation of secondary structures. (A) A schematic diagram of the domain structure of human αCP1 and the regions cloned for biophysical analysis. The domain boundaries were based on structural alignment with the Nova-2-KH3 domain. The numbers represent the amino acids at the start and end of each KH domain. Note that the domain boundary of the cloned αCP1-KH2 was reduced from 97–182 to 97–150 for solubility reasons. (B) Amino acid sequence alignment of the cloned regions of the αCP1-KH domains. Conserved residues are highlighted in red and yellow. The GXXG loop and the variable loop differ most between the KH domains. (C) The αCP1-KH domain adopts a fold in which a triple-stranded β-sheet is held against a three-helix cluster in a βααββα configuration. The variable loop and the GXXG loop are colored purple and blue, respectively.
3.3 Protein Expression
To obtain sufficient protein for biophysical studies (i.e. full-length αCP1 and the GST-αCP1-KH1, KH2 and KH3 fragments), two to four liters of culture were used for bacterial overexpression (see Methods Section 2.3.2). Soluble GST-fusion protein was obtained after cell lysis, purified by affinity chromatography and cleaved from the GST moiety with PreScission protease. The protein obtained for each construct was typically ~90% pure, but further purification was required for biophysical studies. This was achieved using ion exchange or size-exclusion chromatography. The following sections describe the optimized purification protocols adapted for the αCP1 constructs.
3.3.1 αCP1 expression and purification
Recombinant full-length αCP1 was prepared as a GST fusion protein using BL21 (Codon plus) cells containing the pGEX-6P-2 plasmid. Reasonable levels of protein overexpression (Figure 3.2) and solubility were obtained by growing the culture at 37 °C (see Methods Section 2.3.2). The cultures grew to a final optical density (A600) of 3.5, giving a cell pellet of 5 g/L.
To obtain maximum levels of soluble protein it was necessary to include 0.5% Triton X-100 in the lysis buffer (PBS). Furthermore, αCP1 was found to be very unstable. A degradation product was always present upon cleavage, regardless of how little time elapsed between purification and storage at -80 °C. Protein stored in 50% glycerol at -80 °C not only contained more degradation product than freshly prepared protein but also became inactive after approximately 3 months. The increase in degradation product was apparent on SDS-PAGE analysis, and the loss of activity was ascertained by REMSA. Therefore, in an attempt to minimize this degradation, a cocktail of protease inhibitors, DTT and EDTA was always included in the lysis buffer (see Methods Section 2.4.2); however, the protease inhibitors were omitted at the size-exclusion purification step. Even with these precautions, there was no significant reduction in the amount of the degradation product (Figure 3.2B).
A further step to minimize degradation was the addition of 5% glucose and 1% sodium azide to the lysis buffer. This significantly reduced the degradation product, to less than approximately 10% as judged on the gel (Figure 3.2A, Lane 5).
Following cell lysis, the soluble fraction was subjected to glutathione-agarose batch purification, and the αCP1 fusion protein was cleaved from the GST moiety using PreScission protease, with approximately 90% of the fusion protein cleaved. The cleaved αCP1 was then eluted.
Figure 3.2: Overexpression and size-exclusion chromatography of αCP1. (A) SDS-PAGE analysis of GST-αCP1 overexpression, glutathione-agarose batch purification and PreScission cleavage. Lane MW: molecular weight markers; Lane 1: whole cell lysate before induction; Lane 2: whole cell lysate after induction; Lane 3: insoluble fraction after cell lysis; Lane 4: soluble fraction after cell lysis; Lane 5: pure αCP1 after size-exclusion chromatography (with 5% glucose and 1% sodium azide included in the lysis buffer); Lane 6: cleaved protein after exposure to PreScission protease. (B) SDS-PAGE analysis of αCP1 after size-exclusion chromatography. Lane MW: molecular weight markers; Lane 1: αCP1, showing the degradation product.
αCP1 was separated from the contaminating GST, degradation product and any uncleaved fusion protein using either anion exchange or size-exclusion chromatography (Figure 3.3); both methods gave a similar level of purity. Fractions containing αCP1 were pooled and dialysed into 50 mM Tris (pH 8.2), 150 mM NaCl, 2 mM DTT and 2 mM EDTA, the buffer used in subsequent REMSA studies (Appendix B). The molecular mass of the purified αCP1 could not be confirmed by MALDI-TOF mass spectrometry because the purity level of the protein posed technical problems. The mass expected from sequence analysis and SDS-PAGE analysis is 37.5 kDa. Approximately 1 mL of purified αCP1 at 1 mg/mL (about 1 mg) was obtained from 4 L of culture.
Figure 3.3: Size-exclusion chromatography of αCP1. A typical size-exclusion chromatogram obtained during purification of αCP1. After PreScission protease cleavage, αCP1 was purified from GST, uncleaved fusion protein and any remaining bacterial contaminants using size-exclusion chromatography. αCP1 eluted at approximately 30 min; fractions were collected from the regions indicated by the arrows and confirmed by SDS-PAGE (Figure 3.2A, lane 5).
3.3.2 αCP1-KH1 expression and purification
The αCP1-KH1 domain encoding residues 13-86 was expressed as a GST fusion protein in
BL21 (Codon plus) cells (see Methods Section 2.3.2). Overexpression at 37°C produced
high levels of soluble fusion protein (Figure 3.4A). Following cell lysis, the soluble
fraction was subjected to glutathione agarose batch purification. The αCP1-KH1 domain was
then cleaved from the GST moiety using PreScission protease and eluted from the column;
approximately 90% of the fusion protein was cleaved
(Figure 3.4B).
Figure 3.4: Overexpression of GST-αCP1-KH1. (A) SDS-PAGE analysis of GST-αCP1-KH1 overexpression, glutathione agarose batch purification and PreScission cleavage. Lane 1: Whole cell lysate before induction, Lane 2: Whole cell lysate after induction, Lane 3: Soluble fraction after cell lysis, Lane 4: Insoluble fraction after cell lysis, Lane 5: GST-αCP1-KH1 bound to glutathione beads. (B) αCP1-KH1 cleaved from GST. Lane 1: Protein fraction and GST beads after elution from the glutathione agarose and PreScission cleavage. Lane 2: Supernatant from the PreScission cleavage sample. Lane MW: Molecular weight markers.
The αCP1-KH1 protein was separated from GST, GST-αCP1-KH1 and other contaminants
using cation exchange chromatography. Size-exclusion chromatography, used as an
alternative, gave similar purity (data not shown). On the cation exchange column,
αCP1-KH1 eluted at 25 minutes in a 0–0.6 M sodium chloride gradient in 50 mM HEPES (pH 7.0),
2 mM DTT and 2 mM EDTA (Figure 3.5A).
Figure 3.5: Cation exchange chromatography of αCP1-KH1 and SDS-PAGE analysis. After PreScission protease cleavage, αCP1-KH1 was also purified from GST, uncleaved fusion protein and any remaining bacterial contaminants using cation exchange chromatography. (A) A typical cation exchange chromatogram obtained during purification of αCP1-KH1. αCP1-KH1 eluted at approximately 25 min, at 40% salt concentration. Fractions were collected from the regions indicated with bold dashes (peak 2). GST and GST-KH1 did not bind to the column and eluted in the flow-through at 10 min (peak 1). (B) SDS-PAGE analysis of fractions collected during cation exchange chromatography. Lane MW: Molecular weight markers. Lane 1: Flow-through (unbound) fraction (peak 1). Lanes 2-4: αCP1-KH1 fractions collected from peak 2.
SDS-PAGE analysis (Figure 3.5B) showed that both GST and the GST fusion protein
eluted in the unbound fraction (peak 1), while the αCP1-KH1 domain eluted as a single peak
with broad leading and trailing edges (peak 2). Fractions collected from the sharpest part of peak
2 were pooled and dialysed into 50 mM Tris (pH 8.2), 150 mM NaCl, 2 mM DTT and 2
mM EDTA for use in both biophysical and structural studies. The broad edges of this peak
were discarded because they contained traces of contaminants not visible on SDS-PAGE (data
not shown). Approximately 1 mL of protein at 5 mg/mL was obtained from a 4 L culture.
The molecular mass of αCP1-KH1 was verified using MALDI-TOF mass spectrometry
(measured MW 8681; expected MW 8690).
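Expected masses such as the 8690 Da quoted for αCP1-KH1 are presumably average masses calculated from the construct sequence. The sketch below shows that calculation, assuming commonly tabulated average residue masses, an unmodified chain and one water for the free termini; the short peptide in the example is hypothetical and is not the αCP1-KH1 construct. It also expresses the measured-versus-expected difference as a percentage.

```python
# Commonly tabulated average residue masses (Da), i.e. amino acid minus one water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_DA = 18.0153  # added once for the free N- and C-termini


def average_mass(sequence: str) -> float:
    """Average mass (Da) of an unmodified linear polypeptide."""
    residues = sequence.upper()
    unknown = set(residues) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unrecognized residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER_DA


def relative_error_percent(measured: float, expected: float) -> float:
    """Signed difference between measured and expected mass, in percent."""
    return 100.0 * (measured - expected) / expected


if __name__ == "__main__":
    print(round(average_mass("GA"), 3))  # hypothetical Gly-Ala dipeptide
    # αCP1-KH1 values from the text: measured 8681 Da, expected 8690 Da
    print(round(relative_error_percent(8681, 8690), 2))
```

For αCP1-KH1 the difference is about 0.1% of the expected mass.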
3.3.3 αCP1-KH2 plasmid preparation, overexpression and purification
The αCP1-KH2 domain encoding residues 97 to 182 was expressed in BL21 (codon plus)
cells as a GST-fusion protein. Initial expression and solubility tests at 37°C showed that the
fusion protein was well overexpressed, with a band present in the induced whole cell
lysate at the expected molecular weight (~32 kDa; Figure 3.6). However, the protein was
found almost exclusively in the insoluble fraction, even with 0.5%
Triton X-100 in the lysis buffer (Figure 3.6A). In our laboratory, 0.5% sodium cholate has
improved the solubility of other poorly soluble proteins. However,
adding 0.5% sodium cholate to the αCP1-KH2 lysis buffer did not improve the
solubility of αCP1-KH2.
Furthermore, neither varying the IPTG concentration (from 0.02 to 1 mM) nor reducing the
post-induction temperature to 30°C (Figure 3.6B) or lower (23°C) enhanced
solubility.
Figure 3.6: SDS-PAGE analysis of GST-αCP1-KH2 overexpression at 37°C (A) and 30°C (B). (A) Lane 1: Whole cell lysate before induction. Lane 2: Whole cell lysate after induction. Lane 3: Soluble fraction after cell lysis. Lane 4: Insoluble fraction after cell lysis. (B) Lanes 1 and 2: Whole cell lysate before induction. Lanes 3 and 4: Whole cell lysate after induction. Lane 5: Soluble fraction after cell lysis. Lane 6: Insoluble fraction after cell lysis. Lane MW: Molecular weight markers.
3.3.4 αCP1-KH2 sequence analysis
The amino acid sequences of the αCP1-KH domains are similar, with a number of conserved
residues (Figure 3.1B). The poor solubility of αCP1-KH2 was therefore unexpected,
especially as both αCP1-KH1 and KH3 are highly soluble. Analysis of each
αCP1-KH domain with the ProtParam sequence analysis tool on the ExPASy server
(www.expasy.com) gave instability indices of 35.42 and 34.47 for αCP1-KH1
and αCP1-KH3 respectively, compared with 56 for αCP1-KH2. On the basis of these
scores, ProtParam predicted αCP1-KH1 and KH3 to be stable and αCP1-KH2 to be unstable,
since proteins with an instability index below 40 are classed as stable. The index is derived
from dipeptides whose frequencies differ between stable and unstable proteins. In an attempt
to obtain a more stable αCP1-KH2 domain, approximately 30 residues were removed from the
C terminus of the domain; this corresponded to removal of the third α helix (Figure 3.1B). The
truncated sequence gave an instability index of 34.47 and was thus predicted to be
stable. Consequently, the domain boundary of αCP1-KH2 was revised to amino acid
residues 97 to 150, instead of 97 to 180 (Figure 3.7).
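For reference, the instability index reported by ProtParam is computed from the dipeptide composition of the sequence. For a sequence x_1 … x_L of length L it takes the form below, where DIWV is an empirical instability weight assigned to each of the 400 possible dipeptides (the weight table itself is not reproduced here):

```latex
\mathrm{II} = \frac{10}{L} \sum_{i=1}^{L-1} \mathrm{DIWV}\left(x_{i}\,x_{i+1}\right)
```

Because the index depends on neighbouring-residue pairs, removing a stretch rich in destabilizing dipeptides, such as the C-terminal helix of αCP1-KH2, can lower it substantially.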
Figure 3.7: The initial and truncated domain boundaries of αCP1-KH2. (A) The initial αCP1-KH2 domain boundary, based on structural sequence alignment, spans amino acid residues 97 to 180. The sequence highlighted in red is derived from the cloning vector. The truncated αCP1-KH2 domain boundary, based on sequence stability comparisons, spans residues 97 to 150 and is indicated with an arrow.
3.3.5 Plasmid preparation
Primers were designed to amplify the DNA encoding residues 97-150 of
αCP1-KH2. PCR amplification from the pGEX-6P-2 αCP1 plasmid produced a single
DNA fragment (Figure 3.8A). Comparison with the DNA standard showed
that the αCP1-KH2 fragment was of the expected size, with no additional products. The PCR
product was then sub-cloned into the BamHI and EcoRI restriction
sites of the E. coli expression vector pGEX-6P-2 and transformed into XL1 Blue cells.
Diagnostic restriction endonuclease digests of plasmid DNA isolated from several of the
transformed colonies confirmed the presence of an insert of the correct size (Figure 3.8B).
The integrity of the insert was confirmed by DNA sequencing.
Figure 3.8: Sub-cloning of αCP1-KH2. (A) A DNA fragment encoding residues 97–150 of αCP1 (~159 bp) was amplified from the pGEX-6P-2 αCP1 plasmid. The yield of DNA was optimised by varying the MgCl2 concentration. Lanes 1 and 2: 1.0 mM and 2.0 mM MgCl2. The location of the desired product is indicated by the arrow. (B) Diagnostic restriction endonuclease digests of plasmid DNA isolated from a transformed colony confirmed the presence of an insert of the correct size. Lane M: pGem markers. Lane 1: Representative restriction digest of pGEX-6P-2 using BamHI and EcoRI. Lane 2: PCR product.
3.3.6 αCP1-KH2 Expression and Purification
The pGEX-6P-2/αCP1-KH2 plasmid was transformed into BL21 (codon plus) cells, and a
small-scale expression and solubility trial was undertaken at 37°C. GST-αCP1-KH2 was
successfully overexpressed, as shown by SDS-PAGE analysis (Figure 3.9), with a protein band
appearing at the expected molecular weight of GST-αCP1-KH2 (~30 kDa) in the post-induction
whole cell extract. In addition, using PBS (pH 7.4), 2 mM EDTA, 2 mM DTT, 0.5% Triton X-100
and 5 mM PMSF as the lysis buffer, more than 50% of the protein partitioned into the soluble
fraction, as judged by visual inspection of the gel (Figure 3.9).
Figure 3.9: Overexpression of GST-αCP1-KH2. (A) SDS-PAGE analysis of GST-αCP1-KH2 overexpression, glutathione agarose batch purification and PreScission cleavage. Lane MW: Molecular weight markers, Lane 1: Whole cell lysate before induction, Lane 2: Whole cell lysate after induction, Lane 3: Insoluble fraction after cell lysis, Lane 4: Soluble fraction after cell lysis.
Following cell lysis, the soluble fraction was subjected to glutathione agarose batch
purification. The αCP1-KH2 domain was then cleaved from the GST moiety using
PreScission protease and eluted from the column; approximately 90% of the fusion protein
was cleaved (Figure 3.10A, lane 2).
The cleaved αCP1-KH2 was then further purified from GST and other contaminants by
size-exclusion chromatography. The αCP1-KH2 peak eluted at 40 min at a flow rate of
0.4 mL/min and was collected in 0.2 mL fractions (Figure 3.10B). The fractions corresponding
to αCP1-KH2 were pooled and dialysed into 50 mM Tris (pH 8.2), 150 mM NaCl and 2 mM each
of DTT and EDTA. Approximately 0.3 mg/mL of protein was obtained from 4 L of culture. The
molecular mass of the purified protein was confirmed using MALDI-TOF mass spectrometry
(measured MW 5947; expected MW 5944).
Figure 3.10: Affinity purification of αCP1-KH2. (A) Cleavage of GST from GST-αCP1-KH2. Lane MW: Molecular weight markers. Lane 1: Cleaved, size-exclusion purified αCP1-KH2. Lane 2: Protein fraction and GST beads after elution from the glutathione agarose and PreScission cleavage. (B) Size-exclusion chromatography of αCP1-KH2. After PreScission protease cleavage, αCP1-KH2 was purified from GST, uncleaved fusion protein and any remaining bacterial contaminants using size-exclusion chromatography. The panel shows a typical size-exclusion chromatogram obtained during purification of αCP1-KH2. αCP1-KH2 eluted at approximately 40 min (peak 2), and fractions were collected from the regions indicated with the dashes. Peak 1, at ~20 min, corresponded to GST and uncleaved fusion protein.
3.3.7 αCP1-KH3 expression and purification
The αCP1-KH3 domain encoding residues 279-356 was expressed as a GST fusion protein
in BL21 (Codon plus) cells (Methods Section 2.3.3). Overexpression at 37°C produced
high levels of soluble fusion protein (Figure 3.11A). Following cell lysis, the soluble
fraction was subjected to glutathione agarose batch purification. The αCP1-KH3 domain was
then cleaved from the GST moiety using PreScission protease and eluted from the column;
approximately 90% of the fusion protein was cleaved
(Figure 3.11B).
Figure 3.11: SDS-PAGE analysis of GST-αCP1-KH3 overexpression. (A) Lanes 1 and 2: Whole cell lysate after induction, Lane 3: Whole cell lysate before induction. (B) Cleaved αCP1-KH3. Lane 1: Protein fraction after elution from the glutathione agarose and PreScission cleavage. Lane MW: Molecular weight markers.
The αCP1-KH3 domain was separated from GST, GST-αCP1-KH3 and other contaminants
using size-exclusion chromatography. The αCP1-KH3 peak (peak 2) eluted between approximately
30 and 40 min at a flow rate of 0.4 mL/min and was collected in 0.2 mL fractions
(Figure 3.12). Fractions from the sharpest part of the peak were pooled and dialysed into
50 mM Tris (pH 8.2), 150 mM NaCl and 2 mM each of DTT and EDTA. The broad leading edge
of the peak contained GST contaminant (data not shown) and was discarded. Approximately
5 mg/mL of protein was obtained from 4 L of culture. The molecular mass of the purified
protein was confirmed using MALDI-TOF mass spectrometry (measured MW 8525; expected MW
8552). The small (27 Da) discrepancy may be related to the flexible N- and C-terminal ends
of the domain, regions that are often disordered and are frequently not observed in crystal
structures.
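The elution times reported for the size-exclusion runs convert directly to volumes and fraction counts, given the stated flow rate (0.4 mL/min) and fraction size (0.2 mL). A minimal sketch, assuming a constant flow rate throughout the run; the function names are illustrative only.

```python
def elution_volume_ml(time_min: float, flow_ml_per_min: float) -> float:
    """Volume passed through the column after time_min at constant flow."""
    return time_min * flow_ml_per_min


def fractions_in_window(start_min: float, end_min: float,
                        flow_ml_per_min: float, fraction_ml: float) -> int:
    """Number of fractions collected between two elution times."""
    window_ml = elution_volume_ml(end_min - start_min, flow_ml_per_min)
    # round() guards against floating-point results such as 19.999999999
    return round(window_ml / fraction_ml)


if __name__ == "__main__":
    print(elution_volume_ml(40, 0.4))             # αCP1-KH2 peak at ~40 min
    print(fractions_in_window(30, 40, 0.4, 0.2))  # αCP1-KH3 peak, 30-40 min
```

On these numbers the αCP1-KH2 peak elutes at about 16 mL, and the 30-40 min αCP1-KH3 window spans about 20 fractions.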
Figure 3.12: Size-exclusion chromatography of αCP1-KH3 and SDS-PAGE analysis. (A) After PreScission protease cleavage, αCP1-KH3 was purified from GST, uncleaved fusion protein and any remaining bacterial contaminants using size-exclusion chromatography. The panel shows a typical size-exclusion chromatogram obtained during purification of αCP1-KH3. αCP1-KH3 eluted at approximately 35 to 40 min (peak 2). Peak 1 corresponds to GST and other contaminants. (B) SDS-PAGE analysis of αCP1-KH3. Lane 1: Molecular weight markers. Lane 2: Peak 2 from size-exclusion chromatography, corresponding to pure αCP1-KH3.
3.3.8 Combined αCP1-KH domains 1 and 2, 2 and 3
Studying isolated αCP1-KH domains can provide valuable information on the role of each
individual domain. However, their role in the context of the full-length protein cannot be
fully understood from studies of isolated KH domains alone. To examine the role of combined
domains, we also cloned αCP1-KH1/KH2 and αCP1-KH2/KH3. Both constructs expressed at high
levels but, unfortunately, upon cell lysis the protein partitioned into the insoluble
fraction (data not shown).
Primary sequence analysis of these combined domains using the ExPASy ProtParam tool
predicted both constructs to be unstable, with instability indices of 47.72 and 45.75 for
αCP1-KH1/KH2 and αCP1-KH2/KH3 respectively. However, removing ~30 amino acid residues from
the C terminus of the αCP1-KH2 domain in these constructs was predicted to give stable
proteins, with instability indices of 34.18 and 32.13 for αCP1-KH1/KH2 and αCP1-KH2/KH3
respectively.
Cloning of the truncated constructs was initially attempted only for αCP1-KH1/KH2, but
the PCR amplification step was not successful. Optimization of the PCR reaction may allow
successful amplification in the future. Due to time constraints, cloning and expression of
these constructs were not pursued further.
3.4 Circular dichroism and confirmation of correct recombinant protein folding
When a recombinant protein is produced, it is important to determine whether it has been
correctly folded. Circular dichroism (CD) spectropolarimetry is a useful biophysical technique
for this purpose, because the CD spectrum is sensitive to protein secondary structure.
The two informative regions of the CD spectrum are the far UV (below 250 nm), where
peptide bond contributions dominate, and the near UV (250-300 nm), where aromatic side
chains dominate. α-helices and β-sheets are the two most common secondary structures in
proteins. The α-helical spectrum is characterised by two negative bands at 208 and 222
nm and a positive band at 192 nm. The CD spectrum of a typical β-sheet has a negative
band at 215 nm and a positive band near 198 nm.
The three αCP1-KH constructs were each subjected to CD spectropolarimetric analysis as
described in Methods Section 2.7. Each domain produced a spectrum typical of a protein
containing mainly α-helical and β-sheet secondary structure, as indicated by
a negative peak in the 208 to 222 nm region (Figure 3.13A).
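When CD spectra of constructs with different lengths and concentrations are compared, raw ellipticity is commonly normalized to mean residue ellipticity. The text does not specify how the spectra in Figure 3.13 were normalized, so the following is only a sketch of the standard conversion, [θ] = MRW × θ / (10 × d × c). It assumes the observed ellipticity θ is in millidegrees, the path length d in cm and the protein concentration c in mg/mL; the numbers in the example are hypothetical.

```python
def mean_residue_weight(molecular_mass_da: float, n_residues: int) -> float:
    """Mean residue weight: mass divided by the number of peptide bonds (N - 1)."""
    if n_residues < 2:
        raise ValueError("need at least two residues")
    return molecular_mass_da / (n_residues - 1)


def mean_residue_ellipticity(theta_mdeg: float, mrw: float,
                             path_cm: float, conc_mg_per_ml: float) -> float:
    """Mean residue ellipticity in deg cm^2 dmol^-1.

    With theta in mdeg and concentration in mg/mL, the two factors of
    1000 (mdeg -> deg, mg/mL -> g/mL) cancel out.
    """
    return mrw * theta_mdeg / (10.0 * path_cm * conc_mg_per_ml)


if __name__ == "__main__":
    mrw = mean_residue_weight(11000.0, 101)  # hypothetical protein
    print(mean_residue_ellipticity(-10.0, mrw, 0.1, 0.2))
```

Normalizing per residue removes most of the dependence on construct length and concentration, which matters when domains of different sizes, such as the truncated αCP1-KH2, are compared.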
Figure 3.13: Circular dichroism spectra of KH1, KH2 and KH3 in 50 mM Tris (pH 8.0), 150 mM NaCl, 1 mM EDTA and 1 mM DTT. (A) The parameters used were as follows: bandwidth, 1 nm; response, 2 seconds; sensitivity, standard; measurement range, 300-200 nm; data pitch, 0.2 nm; scanning speed, 50 nm/min; accumulation, 100 scans; temperature, 20 °C. The spectra show a negative peak at around 210 to 222 nm, indicative of a protein containing α-helical and β-sheet secondary structure. (B) Circular dichroism spectra of the KH domains predicted using the K2D site. The αCP1-KH1 spectrum is based on the predicted α-helix and β-sheet percentages in the domain; the αCP1-KH2 spectrum is based on the predicted α-helix and β-sheet percentages in the domain lacking its last helix. Note that the missing C-terminal helix of the αCP1-KH2 domain does not affect the fold of the protein, as both domains generate similar profiles.
The expected percentages of α-helix and β-sheet for the three αCP1-KH domains are
listed in Table 1. These were calculated by counting the residues assigned to α-helix
or β-sheet and dividing each count by the total number of amino acid residues in
the domain (Figure 3.1B).
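The counting procedure described above can be sketched as follows. The per-residue assignment string is hypothetical (H = α-helix, E = β-strand, any other letter = neither) and stands in for the assignments shown in Figure 3.1B; it is not one of the αCP1-KH domains.

```python
def secondary_structure_fractions(assignment: str) -> dict:
    """Fraction of residues assigned to helix (H) and strand (E)."""
    n = len(assignment)
    if n == 0:
        raise ValueError("empty assignment")
    return {
        "helix": assignment.count("H") / n,
        "sheet": assignment.count("E") / n,
    }


if __name__ == "__main__":
    # Hypothetical 20-residue assignment used only for illustration
    demo = "CEEEECCHHHHHHCCEEECC"
    fractions = secondary_structure_fractions(demo)
    print({name: f"{100 * value:.0f}%" for name, value in fractions.items()})
```

Multiplying the fractions by 100 gives the percentages reported in Table 1.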
The α-helix and β-sheet percentages were used to generate a predicted CD spectrum for a
correctly folded αCP1-KH domain using the K2D site (http://www.embl-heidelberg.de/~andrade/k2d/).
This allowed the fold of αCP1-KH2 to be assessed, because the αCP1-KH2 construct that yielded
soluble protein lacks the third helix of a complete αCP1-KH domain (Figure 3.1B). The spectra
generated are shown in Figure 3.13B. The two spectra have the same shape but differ in
intensity, indicating that αCP1-KH2, although missing a helix, is correctly folded.
αCP1-KH domain α-helix (%) β-sheet (%)
αCP1-KH1 49 28
αCP1-KH2 32 36
αCP1-KH3 45 26
Table 1: The expected percentages of α-helix and β-sheet for the three αCP1-KH domains
3.5 Conclusions
Recombinant proteins are widely used in laboratories to study biological problems.
However, obtaining sufficient quantities of soluble recombinant protein is not always
straightforward. In the current study, the main difficulties encountered were protein
aggregation, leading to insolubility, and protein instability. The insolubility of
αCP1-KH2 was overcome by truncating the domain. This approach was based on amino acid
sequence analysis, which predicted that the third helix at the C-terminus of the domain made
the domain unstable. Upon removal of this helix and re-cloning of the domain, soluble protein
was readily obtained. Other approaches to increasing solubility were also tried without
success, namely overexpression at temperatures lower than the usual 37°C and variation of the
IPTG concentration. Instability of full-length αCP1 was reduced to some extent by the
addition of glucose and sodium azide. Sodium azide is often used in protein purification to
prevent bacterial growth and protein degradation. N-terminal sequencing was conducted on
αCP1 to identify the degradation product, and suggested that the degradation site may reside
in the KH2 domain. This could be tested in the future by mutating the appropriate residues.
Successful overexpression and purification of the various constructs made possible the
biophysical and functional characterization of αCP1 and its KH domains.
Chapter 4: Structural and NMR studies of αCP1-KH3
4.1 Chapter overview
After the successful cloning, expression and purification of the αCP1-KH domains, we aimed
to use milligram quantities of pure protein in biophysical studies. One of our initial
approaches was structural. In this chapter, I used X-ray crystallography and NMR to examine
the structure and dynamics of αCP1-KH3 with its target oligonucleotide.
In this way, I aimed to:
1) characterize the structural features underlying the poly (C) binding specificity of
αCP1-KH3,
2) visualize the three-dimensional (3-D) structure of αCP1-KH3 and develop insight into
how RNA and DNA targets fit in the protein binding site,
3) use molecular dynamics simulation to examine the RNA-binding specificity of αCP1-KH3,
using the structure of Nova-2-KH3 bound to RNA as a comparative model, and
4) better characterize the solution properties of the αCP1-KH3 domain, using NMR
spectra recorded in both the absence and presence of oligonucleotide.
4.2 Why Crystallography?
The 3-D structure of molecules can greatly assist our understanding of their biological
function at the molecular level. A structure can show how molecules associate and interact,
and how enzymatic reactions occur. Importantly, molecular structures can be exploited in the
development of new therapeutic agents and drugs. Structural data reveal both the fold of a
molecule and the details of its atomic interactions. A major advantage of crystallography is
that molecules of almost any size can be studied, whereas NMR is generally limited to
molecules smaller than ~100 kDa and is most readily applied to molecules smaller than
~30 kDa. However, crystallography requires well-diffracting crystals, and obtaining these
can be challenging. In addition, the structural information provides only a static snapshot
of the molecule. Combining data obtained from both NMR and X-ray studies can therefore be
highly informative about the core regions of the molecule.
4.2.1 Crystallography
Visible light has wavelengths of hundreds of nanometers, whereas atomic distances are on the
order of 0.1 nm (1 Å). X-rays fall in the appropriate wavelength range. However, an X-ray
microscope cannot be built because, unlike visible light, X-rays cannot practically be
focused with a lens to form an image. Instead, a crystal is exposed to a narrow X-ray beam;
the atoms scatter the X-rays, producing a diffraction pattern from which structural
information is obtained (George and Lyle, 1989).
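The link between wavelength and the spacing that can be resolved is captured by Bragg's law, nλ = 2d sin θ, where 2θ is the scattering angle. A minimal sketch, assuming first-order reflections (n = 1) and using the Cu Kα wavelength (1.5418 Å) of a typical laboratory X-ray source purely as an example; the text does not specify the source used.

```python
import math

CU_K_ALPHA_ANGSTROM = 1.5418  # example wavelength, not taken from this work


def d_spacing(two_theta_deg: float, wavelength: float = CU_K_ALPHA_ANGSTROM,
              order: int = 1) -> float:
    """Lattice spacing d (same units as wavelength) from Bragg's law."""
    theta = math.radians(two_theta_deg / 2.0)
    return order * wavelength / (2.0 * math.sin(theta))


def two_theta_for_spacing(d: float,
                          wavelength: float = CU_K_ALPHA_ANGSTROM) -> float:
    """Scattering angle 2θ (degrees) at which spacing d diffracts (n = 1)."""
    ratio = wavelength / (2.0 * d)
    if ratio > 1.0:
        raise ValueError("spacing too small to diffract at this wavelength")
    return 2.0 * math.degrees(math.asin(ratio))


if __name__ == "__main__":
    angle = two_theta_for_spacing(2.0)  # a 2.0 Å spacing
    print(round(angle, 1), round(d_spacing(angle), 3))
```

Smaller spacings, and hence finer detail, diffract at larger angles, and no spacing shorter than λ/2 can be recorded, which is why wavelengths of about 1 Å are needed.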
Figure 4.1: Crystals are used to diffract X-rays, resulting in a diffraction pattern. The diffraction pattern is processed using computer programs to solve the three dimensional structure of the protein.
4.2.2 Crystals and the unit cell
The first and most important step in X-ray crystallography is growing high-quality crystals.
Crystals are ordered three-dimensional arrays of repeating, identical unit cells. The unit
cell is the smallest repeating unit of the crystal, and repeating it in three dimensions
generates the complete crystal. This repetition is important in X-ray diffraction: the
diffraction from a single unit cell is far too weak to measure, but the many repeated unit
cells amplify the diffraction signal so that it can be used for data analysis. A unit cell
is defined by three edge lengths (a, b, c) and three angles (α, β, γ). Within the unit cell,
the positions of the atoms are given as x, y, z Cartesian coordinates
(George and Lyle, 1989).
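The six unit-cell parameters fully determine the cell volume, which is used, for example, to estimate how many molecules a cell can hold. A minimal sketch of the general (triclinic) volume formula; for a cell with α = β = γ = 90° it reduces to V = abc. The cell dimensions in the example are hypothetical.

```python
import math


def unit_cell_volume(a: float, b: float, c: float,
                     alpha_deg: float, beta_deg: float, gamma_deg: float) -> float:
    """Volume of a general unit cell (edges in Å, angles in degrees) in Å^3."""
    ca, cb, cg = (math.cos(math.radians(angle))
                  for angle in (alpha_deg, beta_deg, gamma_deg))
    factor = 1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg
    if factor <= 0.0:
        raise ValueError("angles do not describe a valid unit cell")
    return a * b * c * math.sqrt(factor)


if __name__ == "__main__":
    # Hypothetical hexagonal cell: a = b = 10 Å, c = 20 Å, gamma = 120°
    print(round(unit_cell_volume(10.0, 10.0, 20.0, 90.0, 90.0, 120.0), 1))
```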
4.2.3 Crystal Growth
Crystallization of proteins involves controlled precipitation of the protein sample.
However, precipitation does not always give crystals, because in most cases precipitation
does not involve the regular arrangement of protein molecules into a crystal. There is no way
of confidently predicting protein crystal formation; it is essentially a trial and error
approach. Nevertheless, a number of parameters need to be considered. Anything that is
likely to denature the protein (e.g., extremes of pH or very low salt concentrations), or
any other conditions that may disrupt a complex or lead to aggregation, must be avoided
(Nick, 1970).
The solubility of a protein sample depends strongly on its ionic strength, which in turn
depends on the concentration and nature of the salt. Protein solubility is also highly
influenced by the pH of the solution. Proteins are least soluble near their isoelectric
point (pI), where the net charge is close to zero and electrostatic repulsion between
molecules is minimal. At low ionic strength, the addition of salt promotes protein
solubility through favorable interactions with charged amino acid residues such as Arg, Lys,
Asp and His. This process is known as “salting in”. Conversely, removing salt can promote
protein precipitation. At high ionic strength, further increases in salt concentration
decrease solubility (“salting out”), essentially through competition for water. Surface
charges are also modified, reducing intermolecular repulsion. Many proteins are
crystallized by this method (Alan and Phylis, 1960).
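Because salting in and salting out depend on ionic strength rather than on salt concentration alone, it can help to compute I = ½ Σ cᵢzᵢ² for a given solution, where cᵢ is the molar concentration and zᵢ the charge of each ion. A minimal sketch, assuming complete dissociation and ignoring activity effects; the ammonium sulfate example is illustrative and is not a condition from this work.

```python
def ionic_strength(ions):
    """Ionic strength (M) from (concentration in M, charge) pairs.

    Assumes complete dissociation and ideal behaviour.
    """
    return 0.5 * sum(conc * charge ** 2 for conc, charge in ions)


if __name__ == "__main__":
    # 150 mM NaCl, as in the protein buffers: Na+ and Cl- at 0.15 M each
    print(ionic_strength([(0.15, +1), (0.15, -1)]))
    # 1 M (NH4)2SO4: 2 M NH4+ and 1 M SO4(2-)
    print(ionic_strength([(2.0, +1), (1.0, -2)]))
```

The divalent sulfate ion contributes four times as much per mole as a monovalent ion, which is one reason why multivalent salts such as ammonium sulfate are effective at salting out.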
A number of compounds can also be added to the protein crystallization solution to alter
protein solubility and promote crystallization. Polyethylene glycol (PEG) is widely used as
a precipitant. PEG precipitates proteins through volume exclusion and competition for water.
PEGs with molecular weights of 400-20,000 are often used, typically at 10-20% (w/v). Organic
solvents such as ethanol, isopropanol, tert-butanol and 2-methyl-2,4-pentanediol (MPD)
precipitate proteins by lowering the dielectric constant of the solution and reducing ionic
shielding. Temperature is also a determinant of protein solubility, and changing the
temperature (e.g., from 22°C to 4°C) can markedly alter crystallization behavior.
Crystallization can take anywhere from a day to months or even years. For small, simple
molecules such as salts, crystals are often obtained easily by slowly altering the solution
conditions. Protein crystals, however, are often very hard to obtain because of the larger
size and complexity of proteins. There are a number of methods for growing protein crystals,
including microseeding and macroseeding. I used vapor diffusion, which is probably the most
common method of crystal growth. A drop of protein solution mixed with precipitant is
suspended over a reservoir containing buffer and precipitant. Water diffuses from the drop to
the reservoir; as the drop loses water, the protein becomes supersaturated and crystal nuclei
form, leading to crystal growth. Typically, hundreds or thousands of conditions are screened
before a condition is found that produces high-quality crystals, and many protein samples do
not crystallize at all. Imperfections in the crystal lattice, caused for example by
impurities in the protein sample, can hinder the acquisition of high-resolution data.
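The concentrating effect of vapor diffusion can be estimated with a simple mass balance. This sketch is idealized and all of its assumptions are illustrative, not measured in this work: the drop is protein solution mixed with reservoir solution, only water moves through the vapor phase, the reservoir is large enough that its concentration does not change, and equilibrium is reached when the drop's precipitant concentration matches the reservoir. The 5 mg/mL protein and 20% PEG values in the example are hypothetical.

```python
def equilibrated_drop(protein_ul: float, reservoir_ul: float,
                      protein_mg_ml: float, reservoir_precipitant: float) -> dict:
    """Idealized drop composition at the start and end of vapor diffusion.

    reservoir_precipitant may be in any unit (e.g. % w/v PEG); returned
    precipitant concentrations use the same unit.
    """
    if reservoir_precipitant <= 0:
        raise ValueError("reservoir must contain precipitant")
    start_ul = protein_ul + reservoir_ul
    precipitant_amount = reservoir_precipitant * reservoir_ul  # non-volatile
    protein_amount = protein_mg_ml * protein_ul                # non-volatile
    # Water leaves the drop until its precipitant matches the reservoir.
    final_ul = precipitant_amount / reservoir_precipitant
    return {
        "start_volume_ul": start_ul,
        "start_precipitant": precipitant_amount / start_ul,
        "start_protein_mg_ml": protein_amount / start_ul,
        "final_volume_ul": final_ul,
        "final_protein_mg_ml": protein_amount / final_ul,
    }


if __name__ == "__main__":
    # 1 µL protein (5 mg/mL) + 1 µL reservoir (20% PEG), hypothetical
    for key, value in equilibrated_drop(1.0, 1.0, 5.0, 20.0).items():
        print(key, round(value, 2))
```

With a 1:1 drop, the drop volume halves during equilibration and the protein concentration doubles relative to the freshly mixed drop. This gradual rise drives the drop toward supersaturation.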
Proteins are crystallized on such a small scale that it is sometimes difficult to reproduce
the conditions accurately. This makes protein crystallization almost more of an art than a
science, and multiple methods are sometimes tried before crystals of the required size are
grown. Crystals can form in many different shapes, from perfect diamonds to sharp needles.
Figure 4.2: (A) Schematic of hanging drop vapour diffusion. A drop containing a mixture of protein solution and precipitant solution is suspended over a well that contains precipitant solution, and the cover slip is sealed on the well with a ring of vaseline. Because the precipitant concentration is higher in the well, volatile components (e.g. water, alcohols, ammonia, acetate) evaporate from the drop toward the buffer reservoir; as a result, the protein concentration in the drop increases and crystals may very slowly form. (B) Crystal growth pathway: nucleation and crystal growth occur beyond the saturation point. In a hanging drop experiment, nucleation first produces a few crystal nuclei. Once these have formed, the protein concentration drops into the crystal growth region, where crystal growth takes place. The time spent in the nucleation region is very important: too long may result in precipitation or too many small crystals, while too little or no time will result in no crystals.
4.2.4 X-ray Diffraction
Once a crystal is grown, it is exposed to a narrow beam of X-rays. Prior to X-ray
diffraction analysis, crystals are often cryocooled with liquid nitrogen. Cryocooling
reduces radiation damage to the crystal during data collection and decreases thermal motion
within the crystal, giving better diffraction limits and higher quality data. The X-rays are
scattered by the electron clouds of the atoms in the crystal. The diffraction pattern
provides information on how the protein molecules are arranged inside the crystal and on the
structure of each molecule. This information is extracted from the directions and
intensities of the scattered rays. Once phases have been determined (see below), the
diffraction data are converted into an electron density map by a mathematical Fourier
transform. These maps are displayed as contour lines of electron density. Because electrons
surround atoms, the map shows where atoms are located. To obtain three-dimensional
information, the crystal is rotated while exposed to X-rays, and a computerized detector
records a two-dimensional diffraction image for each angle of rotation. The third dimension
comes from relating the rotation of the crystal to the series of images. Computer programs
use this information to generate three-dimensional spatial coordinates.
The location of each spot in the diffraction pattern is determined by the size and shape of
the unit cell and the symmetry of the crystal. The intensity of each diffraction spot is
proportional to the square of the structure factor amplitude. The structure factor of each
diffraction spot describes a wave with both an amplitude and a phase, but only the
intensities, and hence the amplitudes, are recorded; the phases are not given by the
diffraction pattern. The phases must be determined in order to obtain an interpretable
electron density map. This is known as the phase problem. Phase information can be obtained
in a number of ways. One is molecular replacement, which can be used if the structure of a
related (homologous) protein exists. The related structure serves as a search model, and
molecular replacement is used to identify its orientation and position within the unit cell
of the protein of interest. The phases calculated from the placed model are then used to
generate an electron density map into which an initial model can be built.
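The phase problem can be illustrated with a toy one-dimensional calculation. The sketch below uses two hypothetical point atoms rather than any real structure. It computes structure factors F(h) = Σ f exp(2πihx), then rebuilds the density by Fourier synthesis twice: once with the true phases, and once with every phase set to zero, which keeps only the amplitudes that a diffraction experiment measures. Only the first map places its highest peak on an atom.

```python
import cmath
import math

# Hypothetical 1-D "unit cell": (fractional position x, scattering power f)
ATOMS = [(0.20, 8.0), (0.55, 6.0)]
H_MAX = 20  # reflections -H_MAX..H_MAX limit the resolution of the map


def structure_factor(h: int) -> complex:
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j)."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in ATOMS)


def density(x: float, use_phases: bool = True) -> float:
    """Unscaled Fourier synthesis rho(x) = sum_h F(h) * exp(-2*pi*i*h*x).

    With use_phases=False each F(h) is replaced by its amplitude |F(h)|,
    i.e. all phases are set to zero.
    """
    total = 0j
    for h in range(-H_MAX, H_MAX + 1):
        f_h = structure_factor(h)
        coefficient = f_h if use_phases else abs(f_h)
        total += coefficient * cmath.exp(-2j * math.pi * h * x)
    return total.real


def peak_position(use_phases: bool, n_grid: int = 100) -> float:
    """Grid point with the highest density."""
    grid = [k / n_grid for k in range(n_grid)]
    return max(grid, key=lambda x: density(x, use_phases))


if __name__ == "__main__":
    print("true phases:", peak_position(True))       # on the heavier atom
    print("amplitudes only:", peak_position(False))  # at the origin
```

The amplitude-only map always peaks at the origin, wherever the atoms are, which is why phases must be obtained separately, for example by molecular replacement.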
The electron density is a blurred representation of where the atoms are inside the protein.
The order of the amino acids is known from the protein sequence, and the density is
interpreted using 3-D computer graphics programs such as O. The model is built in stages:
first the backbone, giving the overall fold of the protein, and then the amino acid side
chains, producing an atomic structure. The model is then refined to generate improved
Cartesian coordinates of the atoms and B factors, which relate to the thermal motion of each
atom. Refinement is repeated a number of times to obtain the best fit to the diffraction
data, with each round aiming to generate a more accurate electron density map. The model is
revised until there is close agreement between the diffraction data and the model. The
standard crystallographic R-factor measures this agreement and hence the quality of the
atomic model. The R value is the average fractional error in the calculated amplitudes
compared with the observed amplitudes. Although other criteria are also important, a good
structure typically has an R value in the range of 15 to 25%.
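The R value described above is R = Σ| |F_obs| - |F_calc| | / Σ|F_obs|, summed over all measured reflections. A minimal sketch, assuming the calculated amplitudes have already been placed on the same scale as the observed ones; the amplitudes in the example are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R = sum| |Fobs| - |Fcalc| | / sum |Fobs|.

    Assumes f_calc is already scaled to f_obs.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


if __name__ == "__main__":
    observed = [120.0, 85.0, 40.0, 210.0]    # hypothetical amplitudes
    calculated = [110.0, 90.0, 46.0, 200.0]
    print(f"R = {100 * r_factor(observed, calculated):.1f}%")
```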
4.3 What is NMR?
NMR is another technique used for solving molecular structures. NMR also permits the observation of the physical flexibility of proteins and the dynamics of their interactions with other molecules. NMR experiments detect signals from atomic nuclei rather than from electrons. In the magnetic field of the NMR spectrometer, nuclei that possess nuclear spin behave as small magnets, which align with (or against) the much larger field of the spectrometer magnet. These small magnets have a characteristic resonance frequency, which can be detected after perturbation by radio waves at that frequency. The radio waves make the nuclear magnetic moments precess ("wobble") coherently and tip their net magnetic moment out of alignment with the field of the spectrometer magnet, which gives rise to the NMR signal. Each nucleus resonates at a slightly different frequency depending on its unique electronic environment, so that individual signals representing particular nuclei can be observed.
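The resonance (Larmor) frequency referred to above is proportional to the field strength, ν = (γ/2π)·B0, where γ is the gyromagnetic ratio of the nucleus. The short sketch below assumes rounded literature values of γ/2π and a hypothetical 14.1 T magnet (a "600 MHz" spectrometer). It also shows why chemical shifts are quoted in ppm: a shift difference in ppm multiplied by the observe frequency in MHz gives the difference in Hz, independent of the instrument.

```python
# Approximate gyromagnetic ratios gamma/(2*pi) in MHz per tesla (rounded values).
# The negative sign for 15N indicates precession in the opposite sense.
GAMMA_MHZ_PER_T = {"1H": 42.577, "13C": 10.708, "15N": -4.316}

def larmor_mhz(nucleus: str, b0_tesla: float) -> float:
    """Resonance frequency nu = (gamma / 2 pi) * B0, in MHz."""
    return GAMMA_MHZ_PER_T[nucleus] * b0_tesla

def shift_difference_hz(delta_ppm: float, nucleus: str, b0_tesla: float) -> float:
    """Convert a chemical-shift difference in ppm to Hz at a given field."""
    return delta_ppm * abs(larmor_mhz(nucleus, b0_tesla))

B0 = 14.1  # tesla; hypothetical field of a "600 MHz" instrument
for nuc in ("1H", "13C", "15N"):
    print(nuc, round(abs(larmor_mhz(nuc, B0)), 1), "MHz")
```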
4.3.1 Protein NMR
Protein NMR spectroscopy is conducted on purified aqueous samples of the protein of interest. A typical sample consists of ~300 to 500 μl of protein solution at a concentration of 0.1 to 0.3 mM. In the current study the protein was expressed recombinantly, which often makes it easier to obtain in sufficient quantities and, in addition, allows isotopically labeled protein to be produced.
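To put these sample requirements in perspective, the amount of protein needed follows directly from concentration × volume × molecular mass. A minimal sketch, assuming for illustration a mass of 8525 Da (the measured mass of the αCP1-KH3 crystallization construct reported in Section 4.4.3) and ignoring the small mass increase from isotope labelling:

```python
def protein_mass_mg(conc_mM: float, volume_uL: float, mw_da: float) -> float:
    """Mass of protein (mg) needed for a sample of given concentration and volume."""
    moles = (conc_mM * 1e-3) * (volume_uL * 1e-6)  # mol/L * L = mol
    return moles * mw_da * 1e3                      # g/mol * mol = g, then g -> mg

MW_DA = 8525.0  # illustrative: measured mass of the alpha-CP1-KH3 construct
for conc, vol in [(0.1, 300), (0.3, 500)]:
    print(f"{conc} mM x {vol} uL -> {protein_mass_mg(conc, vol, MW_DA):.2f} mg")
```

For a domain of this size the range quoted above therefore corresponds to roughly a quarter of a milligram up to just over a milligram of pure protein per sample.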
In protein molecules, the most abundant isotopes of carbon and oxygen are 12C and 16O, respectively. These are not useful for NMR because they do not possess nuclear spin, the physical property necessary for the NMR signal. 14N, the most abundant isotope of nitrogen, does possess nuclear spin, but it has a large quadrupolar moment, which hinders the acquisition of high-resolution data. As a consequence, for proteins prepared from natural sources, we are limited to obtaining NMR data only from their protons. However, recombinant proteins can be isotopically labeled with the less abundant isotopes 13C and 15N, which are preferred for NMR experiments. To achieve this, the protein is expressed in media containing 13C-glucose and 15N-ammonium chloride.
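The reasoning above can be summarized by classifying each isotope by its nuclear spin quantum number I: I = 0 gives no signal, I = 1/2 gives sharp lines, and I > 1/2 gives quadrupolar broadening. A small lookup sketch, using approximate natural abundances (rounded values, for orientation only):

```python
from fractions import Fraction

# Nuclear spin quantum number I and approximate natural abundance (%)
NUCLEI = {
    "1H": (Fraction(1, 2), 99.99),
    "12C": (Fraction(0), 98.9),
    "13C": (Fraction(1, 2), 1.1),
    "14N": (Fraction(1), 99.6),
    "15N": (Fraction(1, 2), 0.4),
    "16O": (Fraction(0), 99.8),
}

def nmr_category(isotope: str) -> str:
    """Classify an isotope by its usefulness for high-resolution protein NMR."""
    spin, _abundance = NUCLEI[isotope]
    if spin == 0:
        return "silent"        # no nuclear spin, no NMR signal
    if spin == Fraction(1, 2):
        return "spin-1/2"      # no quadrupole moment, sharp lines
    return "quadrupolar"       # I > 1/2: quadrupolar relaxation broadens lines

for iso, (spin, abundance) in NUCLEI.items():
    print(f"{iso:>4}  I = {spin}  {abundance:5.2f}%  {nmr_category(iso)}")
```

The table makes the labelling strategy explicit: the useful spin-1/2 isotopes of carbon and nitrogen are precisely the rare ones, which is why they must be introduced through the growth medium.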
Protein structural studies involve the use of multidimensional NMR experiments. Ideally, each nucleus in the molecule is in a distinct environment and hence gives rise to a distinct NMR signal. The multidimensionality of the experiment allows the detection of correlations between two different, connected nuclei. These correlations arise either from transfer of magnetization through chemical bonds (COSY-type experiments) or through space, independent of the bonding structure (NOESY-type experiments). The latter type of experiment has provided the traditional method of obtaining structural information from NMR. A single multidimensional NMR experiment on a protein sample may take several hours or even days, depending on the concentration of the sample, the magnetic field of the spectrometer and the type of experiment (Shuker et al., 1996).
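NOESY-type experiments yield distance information because, in the isolated spin-pair approximation, the NOE cross-peak intensity falls off approximately as r⁻⁶. Unknown distances can therefore be estimated by calibrating against a proton pair of known separation. A minimal sketch, in which the 2.5 Å reference distance and the intensities are hypothetical:

```python
def noe_distance(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Isolated spin-pair approximation: NOE intensity is proportional to r**-6,
    so r = r_ref * (I_ref / I) ** (1/6)."""
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)

# Hypothetical calibration: a proton pair of known 2.5 angstrom separation
# gives the reference cross-peak intensity 1.0.
for rel in (1.0, 0.1, 1 / 64):
    print(f"relative intensity {rel:.4f} -> {noe_distance(rel, 1.0, 2.5):.2f} angstrom")
```

The sixth-root dependence means that a 64-fold weaker peak corresponds to only a doubling of distance, which is why NOEs are usually detectable only for protons within roughly 5 to 6 Å of each other.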
In protein NMR spectroscopy a particularly useful spectrum is the HSQC (Heteronuclear Single Quantum Correlation). The two axes of the spectrum are a proton axis and a heteronucleus axis, the heteronucleus most often being 13C or 15N. A peak appears for each 1H covalently attached to the heteronucleus. The 15N HSQC is probably the most frequently performed experiment in protein NMR. Every residue except proline and the N-terminal residue has an amide proton attached to the nitrogen of the peptide bond. A correctly folded protein gives rise to well-dispersed peaks, most of which can be individually distinguished. The number of peaks should correspond to the number of residues minus the prolines and the N-terminal residue, plus the signals from amino acid side chains that contain nitrogen-bound protons. The peaks cannot be assigned to individual residues from a single HSQC spectrum; a number of other experiments are required for assignment, which will not be discussed here. However, the HSQC experiment is particularly useful for observing interactions with ligands, as in the case of proteins binding oligonucleotides. Upon interaction of the ligand with the protein, the signals corresponding to the protein may move, broaden or disappear. By comparing the HSQC spectrum of the free protein with that of the ligand-bound protein, peak perturbations can reveal the residues at the binding interface or those restructured upon binding. In the current study, this method was used to analyse the interactions between the αCP1-KH domain and RNA.
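Peak perturbations between free and bound HSQC spectra are often summarized as a single weighted "combined" shift change per residue, √(ΔδH² + (α·ΔδN)²), where α scales down the wider 15N shift range. A minimal sketch, assuming the peaks have already been assigned and matched between spectra (which requires the additional experiments mentioned above), using α = 0.14 as one commonly quoted choice and flagging residues more than one standard deviation above the mean. These are illustrative assumptions, not the procedure used in this study, and the peak positions are hypothetical.

```python
import numpy as np

def combined_shift_perturbation(h_free, n_free, h_bound, n_bound, n_weight=0.14):
    """Combined 1H/15N chemical shift perturbation (ppm) per assigned peak:
    sqrt(dH**2 + (n_weight * dN)**2); n_weight scales the larger 15N range."""
    d_h = np.asarray(h_bound, dtype=float) - np.asarray(h_free, dtype=float)
    d_n = np.asarray(n_bound, dtype=float) - np.asarray(n_free, dtype=float)
    return np.sqrt(d_h ** 2 + (n_weight * d_n) ** 2)

def perturbed(csp, n_sd=1.0):
    """Flag peaks whose perturbation exceeds the mean by n_sd standard deviations."""
    csp = np.asarray(csp, dtype=float)
    return csp > csp.mean() + n_sd * csp.std()

# Hypothetical peak positions (ppm) for three residues, free vs bound
csp = combined_shift_perturbation([8.0, 7.5, 8.9], [120.0, 115.0, 126.0],
                                  [8.3, 7.5, 8.92], [124.0, 115.0, 126.1])
print(np.round(csp, 3), perturbed(csp))
```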
4.4 Results
4.4.1 Crystallization of αCP1-KH3
Crystals of αCP1-KH3 were grown by vapor diffusion as described in Methods Section 2.11.2. Crystals typically grew in 2 days to dimensions of ~0.3 × 0.2 × 0.02 mm, with the outline of a rugby football, and diffraction data were collected to 2.1 Å resolution (Figure 4.3).
Figure 4.3: Crystals of αCP1-KH3. αCP1-KH3 crystals were grown in 0.1 M HEPES pH 7.5 and 1.5 mM lithium sulfate using the vapour diffusion hanging drop method.
4.4.2 X-ray data collection
Data were recorded with a Rigaku R-Axis V imaging plate detector as described in Methods Section 2.11.5. Data were integrated and scaled with DENZO and SCALEPACK (Otwinowski and Minor, 1997). Structure factor amplitudes were calculated using TRUNCATE (Collaborative, 1994). The data collection statistics are given in Table 4.1.
Table 4.1: Data collection and refinement statistics
Data collection
  Symmetry: P2₁2₁2
  Unit cell (Å): a = 33.4
  Resolution range (Å): 35.0–2.1
Refinement
  Rcryst (%): 21.4
  Rfree (%): 25.4
  r.m.s. deviation from ideal values
    Bond length (Å): 0.012
    Bond angle (°): 1.3
  Average temperature factor (Å²): 26.1
  Number of water molecules: 55
Values in parentheses are for the last resolution shell (2.16–2.1 Å).
$R_{\text{merge}} = \sum_{hkl}\sum_{i} \lvert I_i(hkl) - \langle I(hkl)\rangle \rvert \,/\, \sum_{hkl}\sum_{i} I_i(hkl)$, where $I_i$ is the observed diffraction intensity and $\langle I\rangle$ is the average diffraction intensity from several measurements of one reflection.
$R_{\text{cryst}} = \sum_{hkl} \big\lvert \lvert F_o\rvert - \lvert F_c\rvert \big\rvert \,/\, \sum_{hkl} \lvert F_o\rvert$, where $\lvert F_o\rvert$ and $\lvert F_c\rvert$ are the observed and calculated structure factor amplitudes, respectively.
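The Rmerge definition above can be evaluated directly from grouped repeated measurements. A minimal sketch with hypothetical intensities; in practice the value comes from SCALEPACK, which also scales the images before merging. Skipping singly measured reflections is one common convention and is an assumption here.

```python
import numpy as np

def r_merge(observations):
    """Rmerge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i.

    observations maps a Miller index (h, k, l) to the list of intensity
    measurements of that unique reflection. Reflections measured only once
    are skipped (one common convention)."""
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        i = np.asarray(intensities, dtype=float)
        if i.size < 2:
            continue
        numerator += np.abs(i - i.mean()).sum()
        denominator += i.sum()
    return numerator / denominator

# Hypothetical symmetry-equivalent measurements
obs = {(1, 0, 0): [100.0, 110.0, 90.0], (0, 1, 0): [50.0, 50.0], (0, 0, 2): [75.0]}
print(f"Rmerge = {r_merge(obs):.3f}")
```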
4.4.3 Structure solution and refinement
The structure of αCP1-KH3 was solved by molecular replacement, as implemented in AMoRe (Collaborative, 1994), using the coordinates of the Nova-2 KH3 RNA-binding domain (accession no. 1EC6) as the search model. With one molecule of αCP1-KH3 per asymmetric unit, the estimated solvent content of the crystals is 39%. The Matthews coefficient was calculated as 2.0 Å³/Da, which is within the normal range for proteins (Matthews, 1977). Molecular replacement succeeded in space group P2₁2₁2, which was also consistent with the observed systematic absences; molecular replacement in the other primitive orthorhombic space groups was not successful. Cycles of manual model building and refinement were carried out using REFMAC (Collaborative, 1994). A total of 10% of the reflections were used for Rfree calculations. The final model, containing 74 amino acid residues of the αCP1-KH3 construct and 55 water molecules, has a crystallographic Rcryst of 21.4% (Rfree = 25.4%), with 96% of all amino acids within the most favorable region of a Ramachandran plot. All residues were visible in the electron density map except the N-terminal glycine and the C-terminal eight residues (SEKGMGCS); the presence of these residues in the construct was confirmed by mass spectrometry (measured MW 8525; expected MW 8552.72). The final model has been deposited with the Worldwide Protein Data Bank (accession no. 1WVN).
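The solvent content quoted above follows from the Matthews coefficient, V_M = V_cell/(Z · n · M), where Z is the number of asymmetric units per cell (four for P2₁2₁2), n the number of molecules per asymmetric unit and M the molecular mass; the protein volume fraction is then approximately 1.23/V_M, assuming a protein partial specific volume of about 0.74 cm³/g. In the sketch below the cell volume is hypothetical, back-calculated to give V_M = 2.0 with the measured construct mass of 8525 Da, because only one cell edge is listed in Table 4.1.

```python
def matthews_coefficient(cell_volume_a3: float, mw_da: float,
                         n_asym_units: int, n_mol_per_asu: int = 1) -> float:
    """V_M in cubic angstroms per dalton."""
    return cell_volume_a3 / (mw_da * n_asym_units * n_mol_per_asu)

def solvent_fraction(vm: float) -> float:
    """Solvent fraction = 1 - 1.23 / V_M. The constant 1.23 is approximately
    0.74 cm^3/g (protein partial specific volume) times 1.66 (1e24 / Avogadro)."""
    return 1.0 - 1.23 / vm

# P2(1)2(1)2 has four asymmetric units per cell; the cell volume is hypothetical
vm = matthews_coefficient(cell_volume_a3=68200.0, mw_da=8525.0, n_asym_units=4)
print(f"V_M = {vm:.2f} A^3/Da, solvent = {100 * solvent_fraction(vm):.1f}%")
```

A V_M of 2.0 Å³/Da gives a solvent fraction of about 38.5%, consistent with the 39% quoted above.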
4.4.4 Structural overview
αCP1-KH3 adopts a classic KH type I domain fold (Grishin, 2001), with a triple-stranded β-sheet held against a three-helix cluster in a βααββα configuration (Figure 4.4A).
The β-sheet is anti-parallel and displays the usual left-handed twist. From its inner surface
emanate numerous hydrophobic residues, which contribute both to the hydrophobic core
and the oligonucleotide binding cleft. The bundle of three amphipathic helices provides the
complementary hydrophobic surface within this compact motif. The N-terminal four
residues in the model (PLGS) are not shown. These residues, which are not part of the
αCP1 sequence but present due to cloning procedures, adopt a random coil structure.
Figure 4.4: The crystal structure of αCP1-KH3 (residues 279–356) solved to 2.1 Å resolution depicted in (A) cartoon form and (B) as a molecular surface in the same orientation. The structure is shown from the beginning of β-strand 1 to the end of α-helix 3, since the regions outside these bounds were random coil or not visible in the density. The GXXG motif, common to this oligonucleotide-binding motif, is colored blue. The ‘variable loop’ region between β-strands 2 and 3 is colored pink. These regions bound the hydrophobic oligonucleotide-binding cleft that accommodates C-rich RNA or ssDNA. (C) The electrostatic potential emanating from the αCP1-KH3 structure calculated using the APBS software package (http://agave.wustl.edu/apbs/). Potential contours are shown at +1 kT/e (blue) and −1 kT/e (red) and obtained by solution of the linearized Poisson–Boltzmann equation at 150 mM ionic strength with a solute dielectric of 2 and a solvent dielectric of 78.5. The blue contour represents striking positive potential directing oligonucleotides to the binding cleft. (D) Stereo views of KH3.
β-strand 1 commences with the first native amino acid residue, Gln5. This strand extends
the length of the molecule and projects residues Leu10 and Ile12 into the hydrophobic core,
before breaking into a turn at Pro13. α-helix 1 is held in position through its hydrophobic
face (including Leu16, Ile17, Ile20 and Ile21) before its structure is interrupted by the
invariant GXXG sequence (Figure 4.4B, blue) that is essential to the KH domain
oligonucleotide-binding site. In the case of αCP1-KH3, where RQ fills the XX positions,
these side chains are projected outward and provide a hydrophobic edge of the
oligonucleotide-binding cleft. Numerous hydrophobic side chains also emanate from α-
helix 2 to form contacts with the inner face of the β-sheet, and to provide a hydrophobic
environment for oligonucleotide binding (Ile28, Ile31 and the aliphatic chain of Arg32).
Gly36 facilitates a break from helical secondary structure and the remaining two strands of
the β-sheet follow. They provide hydrophobic core residues Ile39, Ile41, Arg51, Val53 and
Ile55.
β-strands 2 and 3 are separated by the ‘variable loop’ (Figure 4.4B, purple), which bulges
slightly away from the β-sheet and forms the opposing edge of the narrow oligonucleotide-
binding cleft. This is the region of the greatest sequence variability between KH domains.
The C-terminal helix extends the length of the main body of the molecule with residues
Ile62, Leu68, Ile69, Arg72 and Leu73 projected into the hydrophobic core or towards
adjacent α-helix 2. α-helix 3 is not visible over the last six residues, due to high mobility.
4.4.5 The oligonucleotide-binding cleft
The oligonucleotide-binding site has long been thought to involve the GXXG motif. This has been confirmed by recent structural analyses of several KH domains in the presence of oligonucleotide. These include Nova-2-KH3 in the presence of a 20-base stem–loop of RNA (Lewis et al., 2000), hnRNP K-KH3 solved with a 10-base stretch of ssDNA (Braddock et al., 2002a), the KH3/KH4 domains of FBP solved in the presence of a 29-base ssDNA (Braddock et al., 2002b), hnRNP K-KH3/DNA (Backe et al., 2005) and αCP2-KH1/DNA (Du et al., 2005). In each of these cases, the main oligonucleotide contacts are made with the narrow hydrophobic cleft that runs between α-helix 2 and β-strand 2 and across the GXXG motif. The narrowness of the cleft is thought to confer the specificity of these KH domains for pyrimidines. Likewise, αCP1-KH3 possesses a narrow hydrophobic cleft that would be expected to accommodate pyrimidine-rich RNA or ssDNA. The edges of the cleft are polar and carry basic side chains (Arg23, Arg32, Lys40 and Arg51), which provide attractive electrostatic forces for docking of the oligonucleotide as well as making specific contacts with it (see further discussion below). The electrostatic potential emanating from αCP1-KH3 was calculated using the Adaptive Poisson–Boltzmann Solver (APBS) software package (http://agave.wustl.edu/apbs/) (Baker et al., 2001; Bank and Holst, 2003; Holst, 2001; Holst and Saied, 1993; Holst and Saied, 1995) and is shown in Figure 4.4C. The contours represent a numerical solution to the Poisson–Boltzmann equation (Davis and McCammon, 1990; Honig and Nicholls, 1995) and model the total electrostatic potential of the molecule in an aqueous salt solution. The outstanding feature of the calculation is the positive potential arising precisely from the oligonucleotide-binding cleft (blue contour). This positive potential would provide an attractive force for the approaching oligonucleotide, whose own potential is dominated by its negatively charged phosphate backbone.
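The effect of the 150 mM ionic strength used in this calculation can be illustrated with the Debye–Hückel limit of the linearized Poisson–Boltzmann equation for a single point charge in a uniform solvent. This is a deliberate simplification (APBS solves the full problem, including the protein's shape and low interior dielectric), and the 298.15 K temperature is an assumption since none is stated above. The sketch shows that at this ionic strength the Debye screening length is under 1 nm and that a lone unit charge reaches +1 kT/e only within about half a nanometre. This suggests that the extended positive contour over the cleft reflects several basic side chains acting together.

```python
import math

E_CHARGE = 1.602176634e-19   # C
AVOGADRO = 6.02214076e23     # 1/mol
EPS0 = 8.8541878128e-12      # F/m
K_B = 1.380649e-23           # J/K

def debye_length_nm(ionic_strength_molar: float, eps_r: float = 78.5,
                    temp_k: float = 298.15) -> float:
    """Debye length 1/kappa, with kappa^2 = 2 N_A e^2 I / (eps0 eps_r k_B T)."""
    ionic_strength_per_m3 = ionic_strength_molar * 1000.0 * AVOGADRO
    kappa_sq = 2.0 * ionic_strength_per_m3 * E_CHARGE**2 / (EPS0 * eps_r * K_B * temp_k)
    return 1e9 / math.sqrt(kappa_sq)

def bjerrum_length_nm(eps_r: float = 78.5, temp_k: float = 298.15) -> float:
    """Distance at which two unit charges interact with energy k_B T."""
    return 1e9 * E_CHARGE**2 / (4.0 * math.pi * EPS0 * eps_r * K_B * temp_k)

def screened_potential_kt_per_e(r_nm: float, charge: int = 1,
                                ionic_strength_molar: float = 0.15) -> float:
    """Debye-Hueckel potential of a point charge, in units of kT/e."""
    lb = bjerrum_length_nm()
    ld = debye_length_nm(ionic_strength_molar)
    return charge * lb / r_nm * math.exp(-r_nm / ld)

print(f"Debye length at 150 mM: {debye_length_nm(0.15):.2f} nm")
for r in (0.5, 1.0, 2.0):
    print(f"r = {r} nm: {screened_potential_kt_per_e(r):.3f} kT/e")
```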
4.4.6 Comparison with other KH domain structures
αCP1-KH3 shows high structural similarity to other type I KH domains. The seven most similar KH structures, hnRNP K (Baber et al., 1999; Braddock et al., 2002a), Nova-2-KH3 and Nova-1-KH3 (Lewis et al., 1999; Lewis et al., 2000), FBP-KH3 and FBP-KH4 (Braddock et al., 2002b), vigilin-KH6 (Musco et al., 1996) and FMR-KH1 (Musco et al., 1997), are shown superimposed in Figure 4.5A (for NMR-derived structures, the first chain in the PDB file is depicted). Their backbone traces are highly convergent, with pairwise root-mean-square deviations (RMSD) from αCP1-KH3 over the matched regions (according to LSQMAN) of <1.8 Å. Vigilin-KH6 and FMR-KH1 show the greatest deviations, with several stretches of backbone unmatched to regions within αCP1-KH3 (>3.5 Å away). These include the variable loop and the region around the GXXG motif, which are also the least well-defined regions in the NMR-derived structures.
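The RMSD values above were obtained with LSQMAN, which also identifies the matched residues. As a minimal sketch of the underlying least-squares superposition only, the Kabsch algorithm below assumes the residue correspondence is already known and that the inputs are matched Cα coordinate arrays of equal length; it is not LSQMAN's implementation, and the coordinates in the usage example are synthetic.

```python
import numpy as np

def kabsch_superpose(mobile, target):
    """Optimally rotate and translate `mobile` (N x 3) onto `target` (N x 3).

    Returns (rmsd, fitted), where fitted are the superposed mobile coordinates."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    # Singular value decomposition of the 3 x 3 covariance matrix
    u, _s, vt = np.linalg.svd(mob_c.T @ tgt_c)
    # Correct for a possible reflection so that the result is a proper rotation
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    fitted = mob_c @ rotation.T + target.mean(axis=0)
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - target) ** 2, axis=1))))
    return rmsd, fitted

# Usage with synthetic coordinates: a rotated, shifted copy superposes exactly
rng = np.random.default_rng(0)
target = rng.normal(scale=10.0, size=(70, 3))
angle = np.deg2rad(40.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle),  np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
mobile = target @ rot_z.T + np.array([5.0, -3.0, 2.0])
rmsd, fitted = kabsch_superpose(mobile, target)
print(f"RMSD = {rmsd:.6f} angstrom")
```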
Figure 4.5B shows the deviations numerically, with α-carbon distances from matched αCP1-KH3 residues plotted against the αCP1-KH3 residue number; divergent regions are shown off-scale. After superposition, most α-carbon atoms of the KH structures lie within 2 Å of the corresponding αCP1-KH3 atom. Apart from vigilin-KH6 and FMR-KH1, greater deviations occur only at the termini and in the variable loop region between β-strands 2 and 3. A subtle variation also occurs at the GXXG motif, possibly reflecting the inherent flexibility of the glycines. It is remarkable that these KH domains retain such high structural similarity and yet possess distinct oligonucleotide-binding preferences.
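Given Cα coordinates that have already been superposed and paired residue by residue, the per-residue deviations plotted in Figure 4.5B, and the 3.5 Å cutoff used to decide whether a residue is matched, can be sketched as follows. Rows of NaN mark residues with no aligned partner, and the off-scale plotting value of 5.5 Å is a hypothetical stand-in for the figure's ">5 Å" marker.

```python
import numpy as np

MATCH_CUTOFF_A = 3.5   # residues further apart than this are treated as unmatched
OFF_SCALE_A = 5.5      # hypothetical plotting value for unmatched residues (>5 A)

def per_residue_deviation(reference_ca, other_ca):
    """Return (plotted_deviation, unmatched_mask) for superposed, paired C-alpha atoms.

    Rows of NaN in other_ca mark reference residues with no aligned partner."""
    ref = np.asarray(reference_ca, dtype=float)
    other = np.asarray(other_ca, dtype=float)
    dist = np.linalg.norm(ref - other, axis=1)   # NaN where unaligned
    unmatched = ~(dist <= MATCH_CUTOFF_A)        # NaN compares False, so counts as unmatched
    plotted = np.where(unmatched, OFF_SCALE_A, dist)
    return plotted, unmatched

def fraction_within(plotted, unmatched, threshold_a=2.0):
    """Fraction of all reference residues whose partner lies within threshold_a."""
    return float(np.mean(~unmatched & (plotted <= threshold_a)))

# Synthetic example: four reference residues, the last one unaligned
ref = np.zeros((4, 3))
other = np.array([[0, 0, 1.0], [0, 0, 3.0], [0, 0, 4.0], [np.nan] * 3])
plotted, unmatched = per_residue_deviation(ref, other)
print(plotted, unmatched, fraction_within(plotted, unmatched))
```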
Figure 4.5: Comparison of KH domain structures. (A) Backbone trace of αCP1-KH3 (grey) shown in stereo superimposed with those of other KH domain structures as listed. These include KH domain structures both in the absence and presence of bound oligonucleotide. (B) The α-carbon deviation for each KH domain residue from the corresponding aligned residue of αCP1-KH3 is plotted versus the αCP1-KH3 residue number. Amino acids >3.5 Å away, or with no corresponding aligned residue, are indicated with an off-scale score (>5 Å).
αCP1-KH3 shows the greatest structural similarity to its fellow poly (C)-binding family member, hnRNP K, with an RMSD of 0.63 Å. A structure-based sequence alignment of these KH domains serves to highlight the conservation of residues reportedly underlying oligonucleotide binding (Figure 4.6). In particular, residues around the GXXG motif as well as those in β-strand 2 provide the main contact surface. Of these, Ile20, Ile21, Ile28 and Ile41 are highly conserved as bulky hydrophobic residues, and Gly18, Gly22 and Gly25 are integral to the oligonucleotide-binding motif. Basic residues Arg23 and Arg51 have also been shown to be involved in the oligonucleotide-binding interaction, and basic residues are retained at these positions except in vigilin-KH6 and FMR-KH1.
Figure 4.6: Structure-based sequence alignment of seven KH domains of high structural similarity to αCP1-KH3. Each KH domain was structurally aligned using LSQMAN against αCP1-KH3. Amino acid residues with α-carbon positions within 3.5 Å of a corresponding αCP1-KH3 residue are shown in black. Highlighted in purple are the amino acid residues that do not align well with residues of αCP1-KH3. Secondary structural elements, as defined in Lewis et al., are shown above the corresponding sequence in cartoon form. Parenthesized numbers represent the amino acid numbers at the start and finish of the superimposed core region for each structure, and indicate the extent of the structure used to calculate sequence identity with αCP1-KH3 (final column). The GXXG motif and the variable loop regions are blocked with grey. Amino acid residues reported to make contact with the oligonucleotide, in the structures determined in complex with either RNA or ssDNA, are highlighted in red, and the αCP1-KH3 residues predicted to make contact with oligonucleotide in the current study are highlighted in tan. NMR structures were structurally aligned on the basis of the first chain in the deposited PDB coordinate file, and all were deemed to be representative of the set of structures.
4.5 Model of αCP1-KH3 bound to poly (C) oligonucleotide
The high degree of similarity of αCP1-KH3 to Nova-2-KH3 has permitted its interaction with poly (C) RNA to be modeled. Nova-2-KH3 has been structurally characterized both complexed with a 20-base stem–loop RNA (Lewis et al., 2000) and in its uncomplexed forms (Lewis et al., 1999). Oligonucleotide binding caused no significant change in the backbone conformation, suggesting that the αCP1-KH3 structure may also closely approximate its oligonucleotide-bound form. Poly (C) RNA was therefore positioned in the binding cleft of αCP1-KH3 by analogy to this structure, to help predict interactions that may underlie its poly (C)-binding specificity.
The poly (C) RNA is positioned along the hydrophobic cleft and across the GXXG motif, with four bases making most of the contacts with the binding site. The oligonucleotide is oriented with its sugar–phosphate backbone directed towards the helix edge of the cleft and its bases lying parallel to the protein surface, pointing towards the centre of the cleft and β-strand 2 (Figure 4.7).
Figure 4.7: Molecular surface of αCP1-KH3 showing the modeled position of poly (C) RNA (orange) based on the Nova-2-KH3–RNA structure (accession no. 1EC6). The poly (C) tetrad is viewed from above the GXXG and variable loops, highlighting their positions on either side of the hydrophobic binding cleft.
The possible electrostatic and hydrophobic contacts between αCP1-KH3 and RNA are summarized in Figure 4.8A. These were determined with allowance for some molecular flexibility (assessed by molecular dynamics simulations using the CHARMM27 force field). They include non-specific hydrophobic interactions with Ile17, Gly18, Cys19, Ile21, Ile28 and Ile41, which form the surface of the binding cleft, as well as numerous electrostatic contacts to the sugar–phosphate backbone involving the backbone atoms of Gly22, Arg23, Gln24 and Gly25 (the GXXG tetrad), and a contact between the Lys40 side-chain amino group and the Cyt4 sugar hydroxyl. Interactions that may help to favour pyrimidine binding include the Arg32 and Arg51 guanidino groups, which are positioned close to pyrimidine carbonyls (the C2 carbonyls of Cyt3 and Cyt2, respectively; Figure 4.8B).
Interactions that could underlie cytosine specificity include potential hydrogen bonds between the Ile28 and Ile41 side chains and the central two cytosine bases (via their O2, N3 and N4 atoms). These isoleucines are conserved in hnRNP K and form an extensive methyl–oxygen and methyl–nitrogen hydrogen bond network with the equivalent bases in ssDNA (Braddock et al., 2002a). In addition, several water-mediated hydrogen bonds between the protein and RNA occur transiently during the simulation. In particular, the Ile41 carbonyl oxygen alternates between being hydrogen bonded to the Cyt4 carbonyl and sugar hydroxyl groups, and may thus contribute to the preference for ribopyrimidine oligonucleotides.
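Contacts that form only transiently during a simulation are often quantified as an occupancy, the fraction of trajectory frames in which the contact is present. A minimal sketch, assuming the coordinates of the two atoms of interest have already been extracted from the CHARMM trajectory (reading trajectory files is not shown) and using a simple 3.5 Å donor–acceptor distance cutoff with no angle test; both choices are assumptions. For a water-mediated bond, the protein–water and water–RNA distances would both have to satisfy the cutoff in the same frame.

```python
import numpy as np

def contact_occupancy(donor_xyz, acceptor_xyz, cutoff_a=3.5):
    """Fraction of trajectory frames in which a donor-acceptor pair is within cutoff_a.

    donor_xyz, acceptor_xyz: (n_frames, 3) coordinates of one atom each, in angstrom.
    Only a distance criterion is applied (no hydrogen-bond angle check)."""
    donor = np.asarray(donor_xyz, dtype=float)
    acceptor = np.asarray(acceptor_xyz, dtype=float)
    distances = np.linalg.norm(donor - acceptor, axis=1)
    return float(np.mean(distances <= cutoff_a))

# Hypothetical four-frame trajectory: the pair is in contact in two of four frames
donor = np.zeros((4, 3))
acceptor = np.array([[0, 0, 2.9], [0, 0, 3.6], [0, 0, 3.0], [0, 0, 5.0]])
print(contact_occupancy(donor, acceptor))
```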
Figure 4.8: (A) Summary of potential interactions occurring between the modeled αCP1-KH3 and poly (C) RNA. The poly (C) RNA tetrad is represented schematically. Potential hydrogen bond interactions are indicated by dotted lines; those from specific residue atoms to the RNA backbone are listed on the right, and those to the cytosine bases are listed on the left. The red dotted lines represent intramolecular hydrogen bonds that may stabilize the RNA in its binding mode to the KH domain. Solid lines indicate hydrophobic or van der Waals contacts to the cytosine bases. (B) The positions of the Arg32 and Arg51 side chains are highlighted beneath the molecular surface of αCP1-KH3. Potential hydrogen bonds to the poly (C) RNA are shown as dotted lines.
4.5.1 Poly (C) RNA structure may favor binding
Many of the αCP1-KH3–oligonucleotide contacts would be predicted to occur upon either RNA or ssDNA binding, such as the hydrophobic contacts listed above and the electrostatic interactions with Gly25, Arg51 and Lys40. Other contacts cannot occur with ssDNA because it lacks sugar hydroxyl groups. These include potential hydrogen bonds between sugar hydroxyls and Gly25, Arg32, Arg51 and Lys40, as well as the water-mediated hydrogen bonds mentioned above.
Inter-nucleotide phosphate hydrogen bonds may also affect the RNA structure and its potential interactions with αCP1-KH3. The phosphates of nt 2 and 4 can hydrogen bond to the sugar hydroxyls of nt 2 and 3, respectively. The phosphates of nt 1 and 3, on the other hand, may hydrogen bond to the Cyt1 and Cyt4 amino groups. The former interactions are unique to RNA, and the latter are specific to cytosine. Thus, the uniquely stable conformation of RNA in this binding cleft, and of poly (C) RNA in particular, may favor binding to the KH domain.
αCP1-KH3 is reported to bind poly (C) RNA preferentially over other bases and over ssDNA (Dejgaard and Leffers, 1996), although the ssDNA sequence is not clearly specified in that study. The crystal structure of this domain confirms that it adopts the classical type I KH fold and has allowed a detailed model of its interactions with poly (C) RNA to be examined.
Specificity for pyrimidines can be understood in terms of the narrow binding cleft, which would readily accommodate only the smaller bases. Specificity for cytosine over uracil or thymine can also be rationalized on the basis of specific hydrogen bond interactions with the cytosine C2 carbonyl, N3 and C4 amino groups. Preferential binding to RNA over ssDNA would be explained in part by intermolecular hydrogen bonding of the sugar hydroxyls. It may also be that a poly (C) RNA oligonucleotide is able to fit closely into the binding cleft, with inter-nucleotide hydrogen bonds from sugar hydroxyls stabilizing this conformation. On the other hand, C-rich ssDNA has been shown to make very similar interactions with hnRNP K, and is reported to bind this closely related KH domain as well as, if not better than, RNA (Braddock et al., 2002a).
4.6 NMR Studies of αCP1-KH3
In addition to our mRNA binding measurements using REMSA and SPR (see chapter 4), we also employed NMR to observe the interaction of αCP1-KH3 with an 11-mer RNA, 5’-UUCCCUCCCUA-3’, representing the αCP1 target site in the 3’UTR of androgen receptor (AR) mRNA, and to form a complex suitable for crystallization trials. The samples were prepared as described in Methods Section 2.10.
4.6.1 Formation of a αCP1-KH3/11-nucleotide RNA complex
Mutational analysis of the UC-rich 51-nucleotide sequence in the 3’UTR of AR mRNA has previously shown that the binding of αCP1 or αCP2 to this region depends on the presence of the two cytosine triplets at the end of the sequence (Yeap et al., 2002). The last 11 nucleotides (5’-UUCCCUCCCUA-3’) were therefore prepared as the target for αCP1-KH3 binding, and 15N-labelled αCP1-KH3 was monitored by NMR spectroscopy to observe the effects of adding the oligonucleotide to the sample.
The 15N-labelled αCP1-KH3 gave rise to well-resolved HSQC spectra (Figure 4.9A), showing excellent dispersion in both the 1H and 15N dimensions as well as narrow line widths. The dispersion, particularly in the 1H dimension, indicates that the domain is likely to adopt its correct secondary and tertiary structure, and the narrow lines are consistent with this construct behaving as a monomer in solution. Seventy-nine single 1H–15N amide cross peaks are observed, as would be expected for this 83-residue construct containing three prolines; the prolines and the N-terminal amine do not give rise to amide cross peaks. In addition, 12 doublet 1H–15N cross peaks are distinguishable (between 108 and 114 ppm in the 15N dimension), which represent signals from the side-chain NH2 groups of the six glutamine and six asparagine residues.
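The peak count above can be checked with simple bookkeeping. The sketch uses the numbers given in the text (83 residues, three prolines, six Gln and six Asn) and assumes the N-terminal residue is not a proline; other side-chain NH signals (for example Trp indole or Arg Nε–H, which may fall outside the spectral window) are deliberately ignored.

```python
def expected_hsqc_peaks(n_residues: int, n_pro: int, n_gln: int, n_asn: int,
                        n_term_is_pro: bool = False) -> dict:
    """Expected 1H-15N HSQC peak counts for a protein construct."""
    # Prolines lack an amide proton; the free N-terminal amine is not observed.
    backbone = n_residues - n_pro - (0 if n_term_is_pro else 1)
    # Each Gln/Asn side-chain NH2 gives two 1H peaks sharing one 15N shift.
    nh2_groups = n_gln + n_asn
    return {"backbone_amides": backbone,
            "side_chain_nh2_groups": nh2_groups,
            "side_chain_nh2_peaks": 2 * nh2_groups}

print(expected_hsqc_peaks(n_residues=83, n_pro=3, n_gln=6, n_asn=6))
```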
Figure 4.9: The 1H–15N heteronuclear single quantum correlation spectra recorded at 25°C for 15N-labelled αCP1-KH3 before and after the addition of the 11-nucleotide RNA of sequence 5’-UUCCCUCCCUA-3’. (A) The uncomplexed spectrum and (B) the final titration point, with αCP1-KH3 fully complexed with RNA. The crosspeaks in both spectra account for all of the expected resonances in the protein, possess narrow linewidths and are well dispersed. The movement of almost half of the peaks upon complex formation with RNA is consistent with a tight protein/RNA binding interaction.
Upon the addition of RNA to the 15N-labelled αCP1-KH3 sample, the positions of the crosspeaks changed, reflecting altered electronic environments for many backbone NH groups in the protein. The final titration point is shown in Figure 4.9B. The 1H–15N HSQC spectrum remains well dispersed and has 79 well-resolved crosspeaks; however, at least 35 of the crosspeaks representing backbone NH correlations have changed position. The presence of a single set of peaks demonstrates that the protein is fully complexed with the RNA (i.e. there is no evidence of heterogeneity) and that the complex retains good solution characteristics, with no evidence of aggregation or of the formation of larger complexes. In addition, the crosspeak movement shows that almost half of the backbone NH groups experience an altered electronic environment upon interaction with RNA, more than would be expected to lie directly at the protein/RNA interface. This is unsurprising considering the long-range electrostatic effects expected to arise from the oligonucleotide’s phosphate backbone. Spectra acquired at intermediate stages of the titration showed no evidence of gradual movement of the crosspeaks from their starting to their final positions. Such gradual movement would be typical of a weak interaction in the fast-exchange regime, in which the observed chemical shifts are averages of the free and bound positions. Rather, the peaks disappeared and reappeared in new positions, suggesting a tight binding interaction and slow exchange relative to the NMR timescale.
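The distinction drawn above between fast and slow exchange rests on comparing the exchange rate constant k_ex (for a simple binding equilibrium, k_on[L] + k_off) with the frequency separation Δω between the free and bound peaks. A minimal sketch, assuming a hypothetical 0.5 ppm 1H shift change on a 600 MHz instrument and a tenfold margin as a rule of thumb for the regime boundaries:

```python
import math

def delta_omega_rad_s(delta_ppm: float, observe_mhz: float) -> float:
    """Frequency separation (rad/s) of free and bound peaks; ppm x MHz = Hz."""
    return 2.0 * math.pi * delta_ppm * observe_mhz

def exchange_regime(k_ex: float, delta_ppm: float, observe_mhz: float,
                    margin: float = 10.0) -> str:
    """Classify exchange by comparing k_ex (1/s) with the shift difference.

    The tenfold margin on either side is a rule-of-thumb assumption."""
    d_omega = delta_omega_rad_s(delta_ppm, observe_mhz)
    if k_ex * margin < d_omega:
        return "slow"          # separate free and bound peaks; peaks 'jump'
    if k_ex > margin * d_omega:
        return "fast"          # one population-averaged peak that moves gradually
    return "intermediate"      # broadened, often vanishing peaks

# Hypothetical 0.5 ppm 1H shift change on a 600 MHz spectrometer
for k_ex in (10.0, 1.0e3, 1.0e5):
    print(k_ex, exchange_regime(k_ex, delta_ppm=0.5, observe_mhz=600.0))
```

Because tight binding implies a small k_off, a high-affinity complex such as this one tends to fall on the slow side of the comparison, consistent with the peaks disappearing and reappearing rather than shifting.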
The resulting αCP1-KH3/RNA sample was thus considered for crystallization trials: the complex that formed appeared homogeneous, the interaction appeared to be tight, and the complex retained good solution properties. The complex was therefore subjected to crystallization trials using the Hampton screen 1 and the Natrix screen (Appendix A). Unfortunately, no crystals appeared.
4.7 Conclusions
This study has shown that oligonucleotide binding by αCP1-KH3 is likely to involve extensive interactions with only four bases. The question remains as to how adjacent KH domains are arranged when full-length αCP1 binds to RNA. It may be that the KH domains are able to bind in relatively close proximity. Indeed, two adjacent KH domains (KH3 and KH4 of FBP) were shown to contact stretches of six and seven bases, respectively, with only five bases in between (Braddock et al., 2002b). In addition, the consensus binding sequence for the αCP-2KL isoform involves three C-rich stretches (of 3–5 bases) separated by 2–6 A/U stretches (Thisted et al., 2001). Thus, αCP binding may well involve participation by all three KH domains.
NMR spectroscopy was also used to confirm the ability of αCP1-KH3 to bind an AR mRNA sequence, and to demonstrate complete complex formation with the 11-nucleotide sequence at the final titration point. It also showed that almost half of the αCP1-KH3 backbone NH resonances are affected by RNA binding; assignment of the crosspeaks would be required to determine which residues these are. The current studies show that this protein and this protein/RNA complex would be highly amenable to further NMR studies. This solution study also showed that the protein/RNA complex remained stable in solution over several weeks, including periods at 25°C. Future efforts will therefore focus on ensuring the utmost purity of the sample, since impurity is a common reason for the absence of crystal formation.
The work conducted in this chapter represents the beginning of a structural and biophysical examination of all three KH domains of αCP1. I was able to achieve the goals outlined at the beginning, and these studies provide a solid foundation for the remaining chapters of this thesis. Understanding the basis for the RNA-binding affinity and specificity of the three KH domains will allow us to predict the occurrence of αCP1 interactions with mRNA and to better understand the multi-KH domain binding complex. Structural studies of the αCP1-KH3/AR mRNA complex will be a step towards describing the multiprotein AR mRNA complex that influences the stability, and possibly the translational efficiency, of AR mRNA in vivo. Structural insight into such a complex may pave the way for the development of novel therapeutics aimed at disrupting the complex, which could lead to AR mRNA instability and reduce the amount of AR in prostate cancer cells.
Chapter 5 Structural and NMR studies of αCP1-KH1
Chapter 5: Structural and NMR studies of αCP1-KH1/DNA
110
5.1 Chapter overview
Having solved and analysed the structure of the isolated αCP1-KH3 domain, as detailed in the previous chapter, the next challenge was to investigate the structural features of an αCP1 KH domain in the presence of RNA or DNA target probes. To do this we again used X-ray crystallography and NMR to examine the structure and dynamics of αCP1-KH1 with its target oligonucleotide, a specific UC-rich region of the 3’UTR of AR mRNA. In addition, the possible cooperative binding of αCP1-KH1 and the RRM domains 1 and 2 of HuR was analysed. αCP1 and HuR are considered part of the post-transcriptional control mechanism for AR expression. αCP1 has been shown to bind a specific 5’-CCCUCCC-3’ motif immediately adjacent to a U-rich sequence (AR mRNA nt 3275 to 3325), which is the target for HuR binding.
In this chapter I aimed to:
1) obtain the 3-D structure of the αCP1-KH1 domain with the target DNA or RNA sequence,
2) visualize the 3-D structure of αCP1-KH1/DNA and develop insight into the nature of the interactions between the DNA sequence (5’-TTCCCTCCCTA-3’) and the binding site of the protein,
3) characterize the solution properties of αCP1-KH1 using NMR spectra recorded both in the absence and presence of a 20-mer RNA sequence, 5’-CUUUCUUUUUCUUCUUCCCU-3’, and
4) determine whether NMR could be used to detect interactions of HuR RRM 1 and 2 with αCP1-KH1 bound to an RNA sequence containing both a C-rich and a U-rich site.
5.2 Crystallization of αCP1-KH1/DNA
Crystals of αCP1-KH1/DNA were grown by vapour diffusion in 1 μl hanging drops
containing 1:1 mixtures of protein and reservoir solutions, as described in Methods
Section 2.11.3. Crystals typically grew within two months to dimensions of ~0.2 × 0.2 ×
0.04 mm with a diamond-shaped outline (Figure 5.1), and diffraction data were collected
to 3 Å resolution. Crystallisation trials using other screens and in the presence of heavy
metals did not yield better-diffracting crystals.
Figure 5.1: Crystals of αCP1-KH1/DNA. αCP1-KH1/DNA crystals were grown in 0.1 M sodium cacodylate pH 6.5, 0.2 M magnesium acetate, 30% MPD using vapour diffusion hanging drop method.
5.2.1 Structure determination of αCP1-KH1/DNA
The DNA sequence 5’-TTCCCTCCCTA-3’, analogous to nucleotides 3315–3325 of AR
mRNA, together with αCP1-KH1 (residues 14-86 preceded by the sequence GPLGSPGI,
present due to cloning procedures) yielded crystals containing two crystallographically
independent copies of a 2:1 protein-DNA complex in the asymmetric unit. Equivalent
crystallisation experiments using RNA did not produce crystals suitable for structure
determination. Phases were obtained by molecular replacement using coordinates from
the Nova-2-KH3 structure (pdb code: 1EC6) with the oligonucleotide removed. The
current refinement model has a working R factor of 24.7% and a free R value of 30.7% at
3.0 Å resolution (Table 5.1), with good stereochemistry (95% of residues in the allowed
regions of the Ramachandran plot).
Table 5.1: Data collection and refinement statistics
Data collection
  Space group                              P2₁
  Unit cell (Å)                            a = 45.6, b = 76.8, c = 61.4
  Measured reflections                     381063
  Unique reflections                       7993
  Completeness (%)                         99.8 (99.9)
  Rmerge (%)                               5.7 (65.8)
  Wilson B (Å²)                            84.2
Refinement
  Resolution range (Å)                     30.0 – 3.0
  Rcryst (%)                               24.7
  Rfree (%)                                30.7
  r.m.s. deviation from ideal values
    Bond length (Å)                        0.010
    Bond angle (°)                         1.7
  Average temperature factor (Å²)          85.0
  Number of water molecules                0
Values in parentheses are for the last resolution shell (3.11–3.0 Å). $R_{merge} = \sum_{hkl}\sum_i |I_i(hkl) - \langle I(hkl)\rangle| / \sum_{hkl}\sum_i I_i(hkl)$, where $I_i(hkl)$ is an observed diffraction intensity and $\langle I(hkl)\rangle$ is the mean intensity of repeated measurements of the same reflection. $R_{cryst} = \sum ||F_o| - |F_c|| / \sum |F_o|$, where $|F_o|$ and $|F_c|$ are the observed and calculated structure-factor amplitudes, respectively. $R_{free}$ is calculated in the same way as $R_{cryst}$ but using a subset of reflections excluded from refinement.
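The R values in Table 5.1 follow the standard crystallographic definitions. As an illustration only, the minimal Python sketch below computes Rmerge from repeated intensity measurements and Rcryst/Rfree from lists of amplitudes. It assumes that the calculated amplitudes have already been scaled to the observed data and that each reflection carries a free-set flag; the function names and data layout are hypothetical and are not those of the refinement software used for this structure.

```python
from collections import defaultdict


def r_merge(observations):
    """Rmerge = sum_hkl sum_i |I_i - <I>| / sum_hkl sum_i I_i.

    observations: iterable of (hkl, intensity); repeated hkl values are
    repeated measurements of the same reflection.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_factor(f_obs, f_calc):
    """R = sum ||Fo| - |Fc|| / sum |Fo| (Fc assumed already scaled to Fo)."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, is_free):
    """Split reflections by a free-set flag and return (Rcryst, Rfree)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


if __name__ == "__main__":
    # Toy data: two measurements of (1,0,0) and one of (0,1,0).
    obs = [((1, 0, 0), 10.0), ((1, 0, 0), 12.0), ((0, 1, 0), 5.0)]
    print(round(r_merge(obs), 4))  # 0.0741 (= 2/27)
    print(r_work_and_free([100.0, 50.0, 80.0], [90.0, 55.0, 80.0], [False, True, False]))
```

Because Rfree is computed only from reflections that never enter refinement, a gap between Rcryst and Rfree of the size seen here (24.7% versus 30.7%) is commonly used as a check against over-fitting.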
5.2.2 Structural Overview
The αCP1-KH1/DNA structure solved to 3.0 Å resolution reveals two αCP1-KH1
domains bound at adjacent cytosine triads. The positions of eight out of eleven
nucleotides could clearly be seen in the electron density. Four KH1 monomers (named
A-D) occur within the asymmetric unit, existing as dimers as previously observed for
other KH domain structures (Figure 5.2) (Lewis et al., 2000; Sidiqi et al., 2005b).
The αCP1-KH1/DNA complex forms with a 2:1 stoichiometry, with monomer A (and B)
within the asymmetric unit clasped to the 5’ end of the oligonucleotide and monomer D
(and C) bound at the 3’ end. Although the two KH domains bound to the same
oligonucleotide are held very close together, they do not contact one another. As in a
recent structure of hnRNP K-KH3 bound to a DNA 15-mer (Backe et al., 2005), this
reveals how two KH domains may be closely juxtaposed when bound at adjacent C-rich
binding sites.
Figure 5.2: Structures of the αCP1-KH1/DNA complexes in the asymmetric unit. The asymmetric unit contains four KH1 monomers, A, B, C and D, coloured green, blue, yellow and orange, respectively.
In addition, αCP1-KH1 dimer formation creates a continuous chain of αCP1-KH1
domains throughout the crystal lattice, linked both through the dimerisation interface and
through oligonucleotide binding. Furthermore, these continuous chains appear to be
crosslinked via disulphide bonds (S-S distance = 2.05 Å) between Cys54 residues (of
chains C and D within the asymmetric unit). Non-reducing SDS-PAGE of αCP1-KH1
confirmed the predominance of disulphide-linked dimers in the sample, despite its initial
preparation under reducing conditions (Figure 5.3).
Figure 5.3: αCP1-KH1 dimerisation. (A) SDS-PAGE analysis of αCP1-KH1. Lane 1: Molecular weight marker. Lane 2: αCP1-KH1 sample in the absence of a reducing agent. Lane 3: αCP1-KH1 sample in the presence of a reducing agent (DTT). αCP1-KH1 dimerises in the absence of a reducing agent, forming a complex at ~16 kDa. The positions of the dimer and monomer are indicated by arrows. (B) Disulphide bond between Cysteine 54 of complexes C and D in the asymmetric unit.
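An S–S distance such as the one quoted above can be read directly from refined coordinates. The Python sketch below is offered as an illustration, not as the procedure used in this work: it parses cysteine SG atoms from fixed-column PDB ATOM/HETATM records (ignoring alternate conformations) and reports SG–SG pairs closer than a cutoff. The 2.5 Å cutoff is an assumption chosen to be tolerant of a 3 Å model (an ideal disulphide S–S bond is about 2.05 Å), and the function names and demo coordinates are hypothetical.

```python
import math
from itertools import combinations


def parse_cys_sg(pdb_lines):
    """Return {(chain, resseq): (x, y, z)} for cysteine SG atoms.

    Relies on fixed PDB columns: atom name 13-16, residue name 18-20,
    chain ID 22, residue number 23-26, x/y/z 31-38/39-46/47-54.
    """
    atoms = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[12:16].strip() != "SG" or line[17:20] != "CYS":
            continue
        key = (line[21], int(line[22:26]))
        atoms[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return atoms


def find_disulphides(sg_atoms, cutoff=2.5):
    """List (residue_a, residue_b, distance in Å) for SG pairs within cutoff."""
    bonds = []
    for (res_a, xyz_a), (res_b, xyz_b) in combinations(sorted(sg_atoms.items()), 2):
        distance = math.dist(xyz_a, xyz_b)
        if distance <= cutoff:
            bonds.append((res_a, res_b, round(distance, 2)))
    return bonds


def sg_line(serial, chain, resseq, x, y, z):
    """Build a minimal fixed-column ATOM record (demo helper only)."""
    return f"ATOM  {serial:5d}  SG  CYS {chain}{resseq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"


if __name__ == "__main__":
    # Hypothetical coordinates chosen for illustration, not from the deposited model.
    demo = [
        sg_line(1, "C", 54, 10.0, 10.0, 10.0),
        sg_line(2, "D", 54, 12.05, 10.0, 10.0),
        sg_line(3, "A", 54, 30.0, 10.0, 10.0),
    ]
    print(find_disulphides(parse_cys_sg(demo)))  # [(('C', 54), ('D', 54), 2.05)]
```

In practice the same check would be run over all symmetry-related copies as well, since an intermolecular disulphide may link a chain to a neighbour outside the asymmetric unit.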
The protein conforms to the classical type I KH domain structure, with a three-stranded
antiparallel β-sheet packed against three α-helices in a βααββα topological
arrangement (Figure 5.4B). The structure is consistent with that reported for the
αCP2-KH1 homologue bound to a telomeric DNA sequence, solved to 1.7 Å resolution
(Figure 5.4A) (pdb code 2AXY: identity 97% and Cα pairwise RMSD 1.1 Å) (Du et al.,
2005). Briefly, a hydrophobic core provides the structure’s stability, with hydrophobic
residues emanating from the inner face of the β-sheet (including Leu14, Ile16, Leu18,
Met20, Ala45, Ile47, Ile49, Ile59 and Leu61) and all three helices (including Glu24,
Val25, Ile28, Val36, Ile39, Arg40, Ala67, Ile68, Ala71, Ile75, Lys78 and Leu79). This
core is partly exposed to create the base of the hydrophobic oligonucleotide-binding
cleft. The seven N-terminal and three C-terminal residues of the cloned sequence were
not visible in the electron density and were therefore excluded from the model. The
model thus comprises one vector-derived residue remaining after PreScission cleavage,
followed by residues 14-83 of αCP1 (Swiss-Prot entry Q15365).
Figure 5.4: (A) Electron density map of αCP1-KH1/DNA. (B) Overall structure of the αCP1-KH1/DNA complex. The KH domain is rendered in cartoon representation in green (β sheets) and pink (α helices). The 5’-tetrad of the target DNA, which forms contacts with the first KH domain, is shown, illustrating the positioning of the critical bases about α-helix 1 and between the GXXG and variable loops.
5.2.3 Oligonucleotide binding
The oligonucleotide is accommodated in a hydrophobic cleft formed across the top of
α-helix 1 and bounded by the GXXG and variable loops (Figure 5.4). Basic residues
surrounding the binding site, including Lys23, Lys31 and Lys32 (the latter two at the XX
positions of the GXXG motif), Lys37, and Arg40, Arg46 and Arg57, create a positive
potential along the length of the cleft (Figure 5.5).
Such a potential, also observed for αCP1-KH3 (Sidiqi et al., 2005b), could provide a
driving force for docking of the oligonucleotide to the site, as well as specific
electrostatic contacts with the bound oligonucleotide.
Figure 5.5: The electrostatic potential emanating from αCP1-KH1 (in the same orientation as the cartoon representation in Figure 5.4B). The potential was calculated using the APBS software package (http://agave.wustl.edu/apbs/) (Baker et al., 2001; Bank and Holst, 2003; Holst, 2001; Holst and Saied, 1993; Holst and Saied, 1995). Potential contours are shown at +1 kT/e (blue) and -1 kT/e (red) and were obtained by solution of the linearized Poisson-Boltzmann equation at 150 mM ionic strength with a solute dielectric of 2 and a solvent dielectric of 78.5. The blue contour represents a positive potential directing oligonucleotides to the binding cleft.
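To give a sense of scale for the APBS parameters in the caption above, the short calculation below (a back-of-the-envelope sketch, not part of the APBS calculation itself) converts the ±1 kT/e contour level into millivolts and estimates the Debye screening length at 150 mM ionic strength with a solvent dielectric of 78.5. A temperature of 298.15 K is assumed because the caption does not state one, and the function names are illustrative.

```python
import math

# CODATA 2018 SI constants (exact values except the vacuum permittivity).
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C
N_A = 6.02214076e23         # Avogadro constant, 1/mol
EPS_0 = 8.8541878128e-12    # vacuum permittivity, F/m


def kt_over_e_mv(temperature_k=298.15):
    """Thermal voltage kT/e in millivolts (the unit of the APBS contours)."""
    return 1e3 * K_B * temperature_k / E_CHARGE


def debye_length_angstrom(ionic_strength_molar, eps_r=78.5, temperature_k=298.15):
    """Debye length (Å) from linearised Poisson-Boltzmann theory.

    lambda_D = sqrt(eps_r * eps_0 * k_B * T / (2 * N_A * e^2 * I)),
    with the ionic strength I converted from mol/L to mol/m^3.
    """
    ionic_strength_si = ionic_strength_molar * 1e3
    lam = math.sqrt(eps_r * EPS_0 * K_B * temperature_k
                    / (2.0 * N_A * E_CHARGE ** 2 * ionic_strength_si))
    return lam * 1e10


if __name__ == "__main__":
    print(f"kT/e at 298.15 K: {kt_over_e_mv():.1f} mV")                    # 25.7 mV
    print(f"Debye length at 150 mM: {debye_length_angstrom(0.150):.1f} Å")  # 7.9 Å
```

A screening length of roughly 8 Å at physiological ionic strength means the positive contour mainly reports on the local arrangement of the basic residues lining the cleft rather than on the net charge of the whole domain.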
The resolution of the data did not permit a clear distinction between cytosine and
thymine bases in the nucleic acid sequence. The oligonucleotide was therefore built
from nucleotides 2 to 9, placing the two “TCCC” sequences in equivalent positions in the
two αCP1-KH1 binding sites. This positioning of a cytosine triplet is consistent with
other structural studies of KH domains bound to C-rich DNA sequences (Backe et al.,
2005; Du et al., 2005) and with the binding data reported in Chapter 6.
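The numbering used in this section can be made explicit with a few lines of code. The sketch below assumes, as stated in Section 5.2.1, that base 1 of the crystallised 11-mer corresponds to AR mRNA nucleotide 3315. It locates every TCCC tetrad in the DNA, reports the base numbers and corresponding mRNA positions, and confirms that the RNA equivalent of the DNA contains the CCCUCCC motif. The helper names are illustrative.

```python
DNA_11MER = "TTCCCTCCCTA"  # crystallised sequence, 5' to 3'
FIRST_MRNA_NT = 3315       # AR mRNA position of base 1 (Section 5.2.1)


def find_tetrads(seq, tetrad="TCCC"):
    """Return 1-based start positions of every (possibly overlapping) match."""
    k = len(tetrad)
    return [i + 1 for i in range(len(seq) - k + 1) if seq[i:i + k] == tetrad]


def to_rna(dna):
    """RNA equivalent of a DNA sequence (T -> U)."""
    return dna.replace("T", "U")


if __name__ == "__main__":
    for start in find_tetrads(DNA_11MER):
        bases = f"{start}-{start + 3}"
        mrna = f"{FIRST_MRNA_NT + start - 1}-{FIRST_MRNA_NT + start + 2}"
        print(f"TCCC tetrad at bases {bases} (AR mRNA nt {mrna})")
    print("CCCUCCC" in to_rna(DNA_11MER))
```

Run as a script, this prints tetrads at bases 2-5 (nt 3316-3319) and 6-9 (nt 3320-3323), followed by True, matching the two KH1 binding sites described above.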
The oligonucleotide contacts αCP1-KH1 monomer A (and B) primarily through bases 2-5,
corresponding to the TCCC sequence. The cleft is narrow, with the sugar-phosphate
backbone of the oligonucleotide pressed against the bounding GXXG loop. The
pyrimidine rings lie towards the variable loop, which defines the opposite edge of the
cleft. Bases 2-4 are positioned with their rings parallel to the hydrophobic floor of the
cleft, whereas Cyt-5 is somewhat raised away from the surface of the protein and
positioned to stack on the ring of Cyt-4. KH1 monomer D (and C) makes equivalent
contacts with bases 6-9, corresponding to the 3’ TCCC sequence of the target
oligonucleotide, so that the two monomers are arranged tail-to-head on the DNA. The
oligonucleotide twists about the phosphate bond of Cyt-4, allowing the second KH1
monomer to bind downstream of the first site, rotated approximately 180 degrees about
the oligonucleotide axis (Figure 5.6). The second KH1 monomer makes no contact with
the first TCCC sequence.
Figure 5.6: Cartoon representation of the αCP1-KH1/DNA complex. The complex forms with two protein molecules bound to a single 11-nt strand of DNA. The two protein molecules are shown in orange and the oligonucleotide in green.
The αCP1-KH1/oligonucleotide interactions are summarised in Figure 5.7. Despite the
low resolution, the Van der Waals and potential hydrogen-bonding interactions
underlying the complex could be ascertained from the data. Notably, analogous
interactions with the KH domains were observed for both the first and the second TCCC
tetrad, reinforcing this mode of interaction. The following discussion refers to the 5’
tetrad (bases 2-5) but applies equally to the 3’ tetrad (bases 6-9).
Residues making Van der Waals contacts with the sugar-phosphate backbone are listed
on the left of Figure 5.7. They include the GXXG sequence, which comprises Gly30,
Lys31, Lys32 and Gly33. These glycines, which are absolutely conserved and form the
classical sequence marker of KH domains, are positioned beneath the sugar atoms of
bases 2 and 4 as well as the phosphate group of Cyt-3. The sidechains of Lys31 and
Lys32 (representing XX in the GXXG motif) extend into the solvent, their aliphatic
chains forming part of the hydrophobic edge of the binding cleft. At least one arginine or
lysine is common at these positions amongst KH domains. Both backbone and sidechain
atoms of these lysines contact sugar and phosphate backbone atoms of bases 3 and 4. In
addition, Ile29 is positioned beneath both bases 3 and 4 and forms Van der Waals
contacts with their backbone atoms.
Figure 5.7: Summary of the contacts between αCP1-KH1 and the bound DNA tetrad of sequence 5’-TCCC-3’. Van der Waals contacts are coloured orange, and potential hydrogen bond interactions are coloured blue. The residues making important contacts with the oligonucleotide sugar-phosphate backbone are listed on the left, and the residues making contacts with the pyrimidine rings, and thus underlying base specificity, are listed on the right. The interactions are representative of both KH domains in the αCP1-KH1/DNA complex and would also be expected to occur within an αCP1-KH1/RNA complex.
5.2.4 Residues underlying cytosine specificity
Of particular interest in this study was the determination of the intermolecular contacts
underlying cytosine specificity, for which these proteins are named. The amino acid
residues that contact the nucleotide bases are listed on the right of Figure 5.7.
The first base (Thy-2) is contacted by Gly26 and Ser27, whose peptide bond lies parallel
to the Thy-2 pyrimidine ring. No nucleotide-specific interactions are observed at this
position, and it has been shown that an adenosine may also be accommodated at this
site (Du et al., 2005). The basis for cytosine binding at the second base position (Cyt-3)
is dominated by interactions with Arg57 (Figure 5.9). Arg57 is completely conserved
amongst poly (C)-binding protein KH domains. Its sidechain projects from the variable
loop region towards the key functionalities of the cytosine base, and bipartite hydrogen
bonds can form from NH1 and NH2 to the cytosine O2 and N3 atoms.
Ile29 lies directly beneath cytosine bases 3 and 4 and contributes to both sugar ring and
pyrimidine ring hydrogen bond contacts. Binding specificity for the third base (Cyt-4) is
conferred by Ile49 and Arg40, which are conservatively substituted and conserved,
respectively, in poly (C)-binding proteins. The backbone carbonyl of Ile49 is ideally
positioned to form a hydrogen bond with Cyt-4 N4. Arg40 extends from the C-terminal
end of α-helix 2 and is able to make a hydrogen-bond contact with the Cyt-4 O2 atom.
These observations are consistent with those of Du et al. (2005) for αCP2-KH1 bound
to DNA. A slight difference occurs, however, in the positioning of Cyt-5, which was
reported to contact Glu51 via its N4 and N3 groups. In the current αCP1-KH1 structure,
the Cyt-5 sugar-phosphate backbone and base are positioned slightly differently:
instead of contacting Glu51, the N4 forms a hydrogen bond with the phosphate of Cyt-4.
This difference may result from a steric effect of the adjacently bound KH domain on the
DNA 11-mer, but it also suggests that the third base of the cytosine triad contributes less
to binding. In other studies of poly (C)-binding KH domains bound to DNA, this third
cytosine makes the fewest contacts with the protein (Backe et al., 2005; Du et al., 2005),
and our SPR data (Chapter 6) show that mutation of the third cytosine to a thymine is
tolerated, whereas mutation of the first and second cytosines is not.
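The specificity argument above rests on hydrogen-bond donor/acceptor complementarity. The Python sketch below is a deliberately simplified illustration that ignores geometry, water-mediated contacts and tautomers. It records the donor/acceptor character of the relevant pyrimidine atoms, encodes the base contacts described in this section, and checks whether those contacts would remain complementary if cytosine were replaced by thymine or uracil. The pairing of Arg57 NH1 with O2 and NH2 with N3, and the treatment of Arg40 as a donor, are assumptions made for the illustration; the names are hypothetical.

```python
# Donor/acceptor character of pyrimidine edge atoms. The exocyclic group on
# C4 is an amino donor (N4) in cytosine but a carbonyl acceptor (O4) in T/U,
# and ring N3 is an acceptor in cytosine but an N3-H donor in T/U.
BASE_EDGE = {
    "C": {"O2": "acceptor", "N3": "acceptor", "C4 substituent": "donor"},
    "T": {"O2": "acceptor", "N3": "donor", "C4 substituent": "acceptor"},
}
BASE_EDGE["U"] = BASE_EDGE["T"]  # uracil presents the same edge as thymine

# Base contacts from Section 5.2.4 as (protein group, its role, base atom).
CONTACTS = {
    "Cyt-3": [("Arg57 NH1", "donor", "O2"), ("Arg57 NH2", "donor", "N3")],
    "Cyt-4": [("Ile49 backbone O", "acceptor", "C4 substituent"),
              ("Arg40 guanidinium", "donor", "O2")],
    "Cyt-5": [],  # N4 bonds to the Cyt-4 phosphate, not to the protein
}


def complementary(contacts, base):
    """True if every contact pairs a donor with an acceptor for this base."""
    edge = BASE_EDGE[base]
    return all(edge[atom] != role for _, role, atom in contacts)


if __name__ == "__main__":
    for position, contacts in CONTACTS.items():
        verdict = {b: complementary(contacts, b) for b in ("C", "T")}
        print(position, verdict)
```

In this simplified picture, C to T substitution breaks complementarity at Cyt-3 (donor facing the N3-H donor) and at Cyt-4 (the Ile49 carbonyl facing an O4 acceptor) but not at Cyt-5, which parallels the SPR observation that only the third cytosine tolerates mutation.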
5.3 Comparison of αCP1-KH1 with other KH domains
The αCP1-KH1 domain retains high structural similarity to other type I KH domain
structures that have been solved both in the presence and absence of oligonucleotide
(Figure 5.8). These include the third KH domain of αCP1 (pdb code: 1WVN), the first
KH domain of αCP2 (Du et al., 2005), the third KH domain of the prototypical
poly(C)-binding protein hnRNP K, and FBP KH domains 3 and 4 (pdb code: 1J4W)
(Baber et al., 1999; Braddock et al., 2002a; Braddock et al., 2002b; Du et al., 2005;
Lewis HA, 1999; Lewis et al., 2000; Sidiqi et al., 2005b). With sequence identities of
31% to 37%, they retain a very high degree of structural similarity, with pairwise r.m.s.
deviations of between 1.1 and 1.5 Å (excluding the variable loop region, where their
sequences and structures diverge) (Figure 5.8A). Notably, oligonucleotide binding has
little impact on the molecular structure of hnRNP K and Nova-2-KH3, for which both
unliganded and oligonucleotide-bound structures have been determined (Braddock et
al., 2002a; Lewis et al., 2000).
Figure 5.8: A structural comparison of several different KH domains. (A) Backbone superposition of several KH domains from different proteins. The structures overlay very well and are very similar except in the variable region. (B) A structural sequence alignment of closely related KH domains. The sequences are limited to the regions that have been structurally characterized in all determined structures and the numbering scheme for the first and last residue is shown in parentheses. The sequence identity and Cα pairwise RMSD of each protein structure compared with αCP1-KH1 is shown. Conserved amino acids are colored red and boxed. Amino acids that do not show any structural similarity occur predominantly in the variable loop region. Amino acids highlighted red are those shown, through structural studies, to be involved directly in oligonucleotide binding.
It is therefore of interest that the unliganded αCP2-KH1 structure (coordinates kindly
provided by Du et al., 2005), which shares 97% sequence identity with αCP1-KH1,
shows relatively low structural similarity to αCP1-KH1/DNA, with a pairwise r.m.s.
deviation of 0.6 Å (excluding the variable loop). The αCP2-KH1 backbone differs
significantly from that of αCP1-KH1 within the first α-helix and at the GXXG and
variable loop regions, despite the sequences being identical at these positions. This may
be an artefact of the methodologies used to determine the structures, or it may reflect a
degree of conformational change upon oligonucleotide binding. In particular, the
variable loop approaches the oligonucleotide relative to its unliganded position, better
positioning Arg57 to form cytosine-specific hydrogen-bond contacts (Figure 5.9). This
would represent the first reported case of a conformational change of a KH domain upon
ligand binding.
Figure 5.9: Comparison of αCP1-KH1/DNA (purple) and αCP2-KH1 (cyan) backbone traces (C-terminal helix omitted for clarity). The traces deviate particularly along the first helix and GXXG loop, as well as at the variable loop. In the oligonucleotide bound structure the variable loop is positioned towards the oligonucleotide, able to make contacts via Arg57 (R57).
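The sequence identities and Cα r.m.s. deviations quoted in this section depend on conventions such as how alignment gaps are counted and which residues are excluded. The stdlib-only Python sketch below makes one such set of conventions explicit: identity is computed over aligned columns where neither sequence has a gap, and the RMSD is computed over matched Cα atoms that are assumed to have already been optimally superposed (for example with the Kabsch algorithm, which is not shown), with an optional exclusion list such as the variable loop. These choices are assumptions and may differ from those of the programs used to obtain the reported values; the toy coordinates are hypothetical.

```python
import math


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be rows of the same alignment")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if gap not in (a, b)]
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def ca_rmsd(ca_a, ca_b, exclude=()):
    """RMSD (Å) over residues present in both superposed models.

    ca_a, ca_b: {residue_number: (x, y, z)} for CA atoms, already superposed.
    exclude: residue numbers to omit, e.g. a variable loop.
    """
    common = sorted((ca_a.keys() & ca_b.keys()) - set(exclude))
    squared = sum(math.dist(ca_a[r], ca_b[r]) ** 2 for r in common)
    return math.sqrt(squared / len(common))


if __name__ == "__main__":
    print(percent_identity("ACDE-G", "ACDFHG"))  # 80.0
    model_a = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (5.0, 5.0, 5.0)}
    model_b = {1: (0.0, 0.0, 1.0), 2: (1.0, 0.0, 1.0), 3: (0.0, 0.0, 0.0)}
    print(ca_rmsd(model_a, model_b, exclude=[3]))  # 1.0 once residue 3 is excluded
```

The toy example shows why excluding a divergent loop matters: a single poorly matched residue raises the RMSD from 1.0 Å to about 5.1 Å.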
5.4 Comparison of αCP1-KH1 with other KH domain/oligonucleotide complexes
To determine whether the DNA-binding mode of αCP1-KH1 is similar to that of other KH
domains bound to RNA or ssDNA, we superimposed the current structure on the
Nova-2-KH3/RNA, hnRNP K-KH3/DNA and αCP2-KH1 structures (Figure 5.10A). The
four contacting bases are shown on the molecular surface of αCP1-KH1. The
oligonucleotide binding modes are similar: each oligonucleotide is positioned within the
hydrophobic cleft, with its phosphate backbone towards the GXXG edge of the cleft and
its nucleotide rings towards the variable loop edge. Strikingly, however, there is
significant variation in the positioning of the bases, even in the region of the conserved
α-helix 1 and GXXG loop. In particular, the base positions reported for the complex
between hnRNP K and a 5’-d(TCCC) tetrad (Braddock et al., 2002a) are displaced by
approximately half a base compared with the oligonucleotide bound to αCP1-KH1 in the
current study (Figure 5.10B). This is unexpected, since these KH domains are relatively
closely related and both possess poly (C)-binding specificity. However, the difference
may be due to the methodologies used to solve the structures: the structure whose base
positions did not match ours well was determined by solution NMR. The recent crystal
structure of hnRNP K with a 15-mer DNA (5’-TTCCCCTCCCCATTT-3’) (Backe et al.,
2005) is very similar to our αCP1-KH1/DNA structure. The 5’ core recognition bases of
hnRNP K-KH3 overlay very closely with the αCP1-KH1 core recognition bases; in
particular, the bases at positions two and three occupy the same positions (Figure
5.10C).
Figure 5.10: Overlay of bound oligonucleotides from the αCP1-KH1 (purple), αCP2-KH1 (cyan), hnRNP K-KH3 NMR (yellow), hnRNP K-KH3 crystal (red) and Nova-2-KH3 (green) structures, obtained by structural superimposition of the KH domains. The surface of the αCP1-KH1 domain is shown. All of the oligonucleotides bind in a similar fashion, but with unexpectedly large differences in base positions. The position of the αCP1-KH1 oligonucleotide is more similar to that of Nova-2-KH3 than to that of the solution structure of hnRNP K-KH3, and is very similar to that of the crystal structure of hnRNP K-KH3.
The oligonucleotide positioning within the αCP1-KH1/DNA structure is also similar to
that of the looped RNA 20-mer bound in the cleft of Nova-2-KH3 (Figure 5.10D)
(Lewis et al., 2000). This is not due to phase bias from the molecular replacement
model, as the oligonucleotide was removed from the search model. Thus, despite the
lower sequence identity of Nova-2-KH3 (29%) with αCP1-KH1 compared with hnRNP
K-KH3 (37%), its mode of interaction with the oligonucleotide is much more conserved
than that seen in the hnRNP K-KH3 solution structure. Nevertheless, across all KH
domain/oligonucleotide structures reported to date, most of the analogous amino acid
residues are involved in oligonucleotide binding, albeit with differing contacts.
5.5 NMR Studies of αCP1-KH1 domain
In addition to our mRNA binding measurements using REMSA and SPR (Chapter 6 and
Appendix B), we also employed NMR to detect the binding of αCP1-KH1 to a 20-mer
RNA sequence, 5’-CUUUCUUUUUCUUCUUCCCU-3’, representing the αCP1 target
site in the 3’UTR of AR mRNA. This sample could then be used for crystallisation trials.
The samples were prepared as described in Methods Section 2.10.
Furthermore, we wanted to determine whether NMR could be used to detect interactions
of HuR RRM 1 and 2 (RRM1/2) with αCP1-KH1 bound to an RNA sequence with both
a C-rich site and a U-rich segment. Again, the resulting multiprotein/RNA complex could
be used for crystallisation trials. The samples were prepared as described in Methods
Section 2.10.1.
Mutational analysis of the UC-rich 51-nucleotide sequence in the 3’UTR of AR mRNA
has previously shown that the binding of αCP1 or αCP2 to this region depends on the
presence of the two cytosine triplets, and that the binding of HuR depends on the
presence of one uridine triplet and one uridine quintet at the end of the sequence
(Ostareck et al., 1997). The last 20 nucleotides (5’-CUUUCUUUUUCUUCUUCCCU-3’)
were therefore prepared as the target for αCP1-KH1 and HuR RRM1/2 binding. The
15N-labelled αCP1-KH1 was then monitored by NMR spectroscopy to observe the
effect of adding the oligonucleotide to the sample, and then the effect of adding HuR
RRM1/2 to the complex.
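The sequence features referred to above can be located mechanically. As a small illustrative sketch (the function name is hypothetical, and treating only pure homopolymer runs as "triplets" or "quintets" is a simplifying assumption), the code below lists runs of at least three identical bases in the 20-mer, giving 1-based start positions and run lengths.

```python
from itertools import groupby

TARGET_20MER = "CUUUCUUUUUCUUCUUCCCU"


def homopolymer_runs(seq, base, min_len=3):
    """Return (1-based start, length) for runs of `base` at least min_len long."""
    runs, position = [], 1
    for symbol, group in groupby(seq):
        length = len(list(group))
        if symbol == base and length >= min_len:
            runs.append((position, length))
        position += length
    return runs


if __name__ == "__main__":
    print("U runs:", homopolymer_runs(TARGET_20MER, "U"))  # [(2, 3), (6, 5)]
    print("C runs:", homopolymer_runs(TARGET_20MER, "C"))  # [(17, 3)]
```

For this 20-mer the sketch recovers one uridine triplet (nt 2-4), one uridine quintet (nt 6-10) and a cytosine triplet (nt 17-19) near the 3’ end.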
The 15N-labelled αCP1-KH1 gave rise to well-resolved HSQC spectra (Figure 5.11A),
showing excellent dispersion in both 1H and 15N dimensions as well as sharp signals.
The dispersion, particularly in the 1H dimension, indicates that the domain is likely to
adopt its correct secondary and tertiary structure, and the sharpness of the signals is
consistent with the construct tumbling freely in solution, most likely as a monomer.
Seventy single 1H–15N amide crosspeaks are observed, as would be expected for this
73-residue construct containing two prolines. These proline residues and the N-terminal
amine, as well as signals from 6glu and 6asp side chains, do not give rise to an amide
crosspeak.
Figure 5.11: The 1H–15N heteronuclear single quantum correlation spectra recorded at 25 °C for 15N-labelled αCP1-KH1 before and after the addition of the 20-nucleotide RNA of sequence 5’-CUUUCUUUUUCUUCUUCCCU-3’. (A) The uncomplexed spectrum and (B) the final titration point, with αCP1-KH1 fully complexed with RNA. The crosspeaks in both spectra account for all of the expected resonances in the protein. The movement of almost half of the peaks upon complex formation with RNA is consistent with a tight protein/RNA binding interaction.
Upon the addition of RNA to the 15N-labelled αCP1-KH1 sample, the positions of
specific crosspeaks changed, reflecting altered electronic environments for many
backbone NH groups in the protein. The final titration point is shown in Figure 5.11B.
The 1H–15N HSQC spectrum remains well dispersed and has 70 well-resolved
crosspeaks. However, at least 23 of the crosspeaks representing backbone NH
correlations have changed position. This demonstrates that the protein is fully
complexed with the RNA (i.e. there is no evidence of heterogeneity) and that the
complex retains good solution characteristics, with no evidence of aggregation or the
formation of larger complexes. In addition, the crosspeak movement shows that almost
half of the backbone NH groups experience an altered electronic environment upon
interaction with RNA, more than would lie directly at the protein/RNA interface. This is
unsurprising considering the long-range electrostatic effects expected to arise from the
oligonucleotide’s phosphate backbone. Spectra acquired at intermediate stages of the
titration showed no evidence of gradual movement of the crosspeaks from their starting
to finishing positions, which would be typical of a weak interaction in the fast-exchange
regime, where the chemical shifts represent population-averaged positions. Rather, the
peaks disappeared and reappeared in new positions, suggesting a tight binding
interaction and slow exchange on the NMR timescale.
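The exchange argument above can be made semi-quantitative. The sketch below is illustrative only. It (i) combines 1H and 15N shift changes into a single weighted perturbation, using a 15N weighting factor of 0.14 that is a commonly used convention rather than a value from this study, and (ii) compares an estimated dissociation rate with the frequency separation of free and bound peaks. It assumes a diffusion-limited association rate of 10^8 M^-1 s^-1, a 600 MHz 1H frequency and arbitrary factor-of-ten regime boundaries purely for the example; none of these numbers were measured here, and the function names are hypothetical.

```python
import math


def combined_shift_change(d_h_ppm, d_n_ppm, n_weight=0.14):
    """Weighted 1H/15N chemical shift perturbation (ppm)."""
    return math.sqrt(d_h_ppm ** 2 + (n_weight * d_n_ppm) ** 2)


def exchange_regime(delta_ppm, nucleus_mhz, kd_molar, kon=1e8):
    """Classify two-site exchange by comparing k_off with delta-omega.

    delta_ppm: free/bound shift difference for one nucleus.
    nucleus_mhz: Larmor frequency of that nucleus (MHz), so ppm * MHz = Hz.
    kd_molar: dissociation constant (M); k_off = kon * Kd (kon in M^-1 s^-1).
    Using k_off alone is a simplification: the full exchange rate also
    includes kon * [free ligand].
    """
    delta_omega = 2 * math.pi * delta_ppm * nucleus_mhz  # rad/s
    k_off = kon * kd_molar                                # 1/s
    ratio = k_off / delta_omega
    if ratio < 0.1:      # arbitrary rule-of-thumb boundary
        regime = "slow"
    elif ratio > 10:     # arbitrary rule-of-thumb boundary
        regime = "fast"
    else:
        regime = "intermediate"
    return k_off, delta_omega, regime


if __name__ == "__main__":
    print(round(combined_shift_change(0.10, 1.0), 3))  # 0.172
    for kd in (1e-7, 1e-4):
        k_off, d_omega, regime = exchange_regime(0.2, 600.0, kd)
        print(f"Kd = {kd:.0e} M: k_off = {k_off:.0f} s^-1, "
              f"delta-omega = {d_omega:.0f} rad/s -> {regime} exchange")
```

Under these assumptions a 0.2 ppm 1H shift gives a separation of about 754 rad/s; a 100 nM complex (k_off about 10 s^-1) falls in slow exchange, with peaks disappearing and reappearing, whereas a 100 μM complex would show gradually moving, averaged peaks.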
To test for an interaction between HuR RRM1/2 and oligonucleotide-bound αCP1-KH1,
NMR spectra were collected as the sample was slowly titrated with increasing
concentrations of HuR RRM1/2. Upon the addition of HuR RRM1/2 to the 15N-labelled
αCP1-KH1/RNA sample, the positions of the crosspeaks did not change, indicating no
apparent interaction between these proteins. The spectra at the initial titration point
(lowest HuR RRM1/2 concentration) and at the highest concentration of HuR RRM1/2
are shown in Figure 5.12A and B, respectively. Thus, assuming that HuR RRM1/2
bound to the poly U sequence (separate SPR experiments have shown that HuR
RRM1/2 binds poly U with high affinity whilst αCP1-KH1 does not; Chapter 6), there
was no evidence for interaction between the adjacently bound proteins. This does not,
however, preclude interaction of full-length αCP1 through its KH2 and/or KH3 domains.
Previous studies have shown interactions between HuB and hnRNP K-KH2 (Yano et
al., 2005), homologues of HuR and αCP1, respectively. This experiment therefore
represents the first of a series of experiments that could be used to test for interactions
between RNA-bound proteins.
Figure 5.12: The 1H–15N heteronuclear single-quantum correlation spectra recorded at 25°C for the 15N-labelled αCP1-KH1/RNA sample before and after the addition of HuR RRM1/2. (A) The complexed spectrum with the lowest HuR RRM1/2 concentration and (B) the final titration point, with αCP1-KH1/RNA titrated with the highest concentration of HuR RRM1/2. The crosspeaks neither change position nor disappear and reappear. The signals are broadened due to protein dilution. There is no apparent interaction of HuR RRM1/2 with αCP1-KH1.
5.6 Conclusions
The formation of a ternary complex involving two αCP1-KH1 domains bound to a single
strand of DNA reveals not only the important contacts underlying poly (C) specificity,
but also the possible juxtaposition of poly (C)-binding KH domains at a target
oligonucleotide binding site. An optimised RNA target has been determined for the
αCP-2KL isoform (which derives from an alternatively spliced αCP2 transcript, resulting
in the deletion of 31 residues from the linker region between KH domains 2 and 3)
(Thisted et al., 2001). That study revealed that the optimal target sequence for the αCP
protein encompasses three C-rich patches of 3-5 bases, each displaced by between 2
and 6 bases. Our study has confirmed that a single KH domain contacts 4 bases and that
there is no steric hindrance to the binding of two KH domains to adjacent oligonucleotide
stretches.
This, together with our demonstration that αCP1-KH2 does, in fact, bind to
oligonucleotide (Chapter 6), suggests that all three KH domains of full-length αCP
proteins are involved in oligonucleotide binding. The arrangement of such an
αCP1/oligonucleotide complex has been modelled on the basis of the arrangement of
KH domains seen in the αCP1-KH1/DNA crystal structure (Figure 5.13A/B). The KH
domains are bound to three adjacent poly (C) patches. Whilst domains 1 and 2, which
are separated by only a 13-residue linker, are most likely to bind alongside one another
(the linker is the ideal length for this arrangement), the third KH domain (separated by
112 residues) could bind on either side of this pair. In the case of the AR mRNA
sequence, two poly (C) patches exist in the 51-nt region shown to bind αCP1 and αCP2
and to stabilise the mRNA (Yeap et al., 2002). These could be the binding sites for KH1
and KH2. A third C-rich patch, however, exists 7 bases downstream and could readily be
targeted by KH3. The structure of the linker region between domains 2 and 3 is
unknown. It is known, however, that a nuclear localisation signal (NLS) exists in this
region (Wang et al., 1995), and it is therefore likely that this sequence lies within a
structure that appropriately presents the NLS to its recognition protein.
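The argument about linker lengths can be checked against a simple upper bound. Assuming a fully extended chain with the standard Cα–Cα virtual-bond length of about 3.8 Å, and counting n + 1 such bonds between the last residue of one domain and the first residue of the next for an n-residue linker, the sketch below estimates the maximum distance each αCP1 linker could span. Real linkers are rarely fully extended, so these figures are ceilings rather than predictions; the function name is illustrative.

```python
CA_CA_DISTANCE = 3.8  # Å, virtual bond between consecutive trans-peptide Cα atoms


def max_linker_span(n_linker_residues, ca_ca=CA_CA_DISTANCE):
    """Upper bound (Å) on the distance between the flanking domain termini."""
    return (n_linker_residues + 1) * ca_ca


if __name__ == "__main__":
    for name, n in (("KH1-KH2", 13), ("KH2-KH3", 112)):
        print(f"{name} linker ({n} residues): at most ~{max_linker_span(n):.0f} Å")
```

The roughly eightfold difference in maximum reach (about 53 Å versus about 429 Å) illustrates why KH3 is far less constrained than KH2 in where it can bind relative to KH1.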
Figure 5.13: Model of full-length αCP1. (A) Model of the arrangement of the three KH domains of αCP1 contacting oligonucleotide, based on (B) the arrangement of KH domains seen in the αCP1-KH1/DNA crystal structure.
The question of whether dimer formation, as seen in the crystallographic arrangement of
αCP1-KH1/DNA and of other KH domains (Lewis HA, 1999; Sidiqi et al., 2005b), occurs
in vivo remains to be investigated. However, previous studies have shown
self-association of KH modules. Nova-1-KH3 homodimerises in solution in the absence
of RNA and without the contribution of other regions of the full-length protein.
Self-association has also been observed for hnRNP K and for αCP proteins and has
been suggested to dictate their biological function by associating with other effector
proteins (Ramos et al., 2002). It is unlikely, however, that a dimer would form between
KH domains within a single αCP/oligonucleotide complex. The dimer is formed by
interactions between the C-terminal helices and N-terminal strands and does not
interfere with oligonucleotide binding. The oligonucleotide-binding clefts, however, are
positioned too far apart for adjacent C-rich sequences (separated by only 2-6
nucleotides) to reach (Figure 5.13B). Dimerisation may play a role when one C-rich
patch is distal to the others, or it may represent a means by which several αCP
molecules can interact in a multiprotein/oligonucleotide complex. It is important to note,
however, that in vivo dimerisation and RNA binding will almost certainly be influenced
by the presence of the other KH domains, which modulate the affinity and cooperativity
of binding and modify its kinetics.
NMR spectroscopy was also used to confirm the ability of αCP1-KH1 to bind to an AR
mRNA sequence, and to demonstrate its complete complex formation with the
20-nucleotide sequence at the final titration point. It showed that almost half of the
αCP1-KH1 backbone NH resonances are affected by RNA binding. As for interactions
between αCP1-KH1 and HuR RRM1/2, our NMR studies did not show any sign of
interaction between these proteins, but this does not exclude the possibility of
interaction through αCP1-KH2 and/or αCP1-KH3.
This study thus reveals a way in which full-length αCP molecules may interact with
their target oligonucleotides. The protein-bound oligonucleotide complex may not
involve any intramolecular interactions between RNA-binding domains, but may
nevertheless effectively protect the oligonucleotide against nucleases, thus enhancing
RNA stability. The exposed surface of the complex may also provide clues as to how
the αCP/oligonucleotide complex docks with putative αCP binding partners such as
PABP, HuR and hnRNP D (Kiledjian et al., 1995; Wang et al., 1999; Wang and
Kiledjian, 2000; Yeap et al., 2002). This set of interactions may involve the interface
adjacent to the HuR protein bound immediately 5’ in the AR mRNA regulatory
sequence, raising the possibility that these interactions could underlie cooperative
binding to RNA (Kiledjian et al., 1995). The study thus provides insight into the mode of
αCP1 binding at a target oligonucleotide binding site and is the first step towards the
structural definition of multiprotein/oligonucleotide complexes involved in the regulation
of AR gene expression.
Chapter 6: SPR analysis of αCP1-KH domains
6.1 Chapter Overview
Having examined the structural features of the αCP1-KH domains, both in isolation and in the
presence of DNA, we identified a number of key residues as crucial for binding and
specificity. To study the poly (C) binding specificity of the αCP1-KH domains further, we
set out to examine in detail the binding of these domains to a number of different RNA and
DNA sequences, in particular the 30-nucleotide sequence in the 3’UTR of AR mRNA that
contains the C-rich site. It has previously been shown that interaction with the
oligonucleotide target is mediated through the αCP1-KH1 and KH3 domains. However, a
triplet poly (C) sequence has been shown to be optimal for binding, suggesting that
αCP1-KH2 is also likely to bind to a poly (C) sequence (Makeyev et al., 2002). Prior to
this study, binding by αCP1-KH2 had not been demonstrated.
Therefore, in this chapter I aimed:
1) to measure the binding kinetics of the interactions between the αCP1-KH domains and the target oligonucleotides,
2) to investigate whether the αCP1-KH domains show a preference for binding RNA over ssDNA,
3) to better understand the contribution of each αCP1-KH domain to the overall affinity of αCP1,
4) to determine the basis of the specificity of the αCP1-KH domains for oligonucleotides, and
5) to investigate the role of each cytosine in the four core recognition nucleotides.
6.2 Why use surface plasmon resonance (SPR)?
We chose SPR to study the interactions of αCP1 with oligonucleotides for a number of
reasons. First, SPR does not require any labelling of the compounds for detection. Second,
it monitors the binding process in real time and can provide kinetic data for bimolecular
interactions. This allows the binding of compounds to their targets to be quantified in terms
of affinity, specificity and association/dissociation rates, rather than only the equilibrium
constant, as in gel shift assays. SPR experiments are also relatively rapid to conduct.
Lastly, SPR has been used previously for kinetic analysis of a number of different RNA/DNA
and protein systems (Schuck, 1997b).
6.3 Principles and applications of Surface Plasmon Resonance
Surface plasmon resonance is an optical sensing technique. This system detects changes in
refractive index within the vicinity (~300 nm) of the sensor surface (Figure 6.2). In the
models are then examined for their capacity to fit real data. These are briefly described
below.
6.4.1 The Langmuir binding model
If the interaction being examined is expected to occur with 1:1 stoichiometry, the simplest
model should be examined first. The 1:1 binding model, also known as the one-site model or
the 1:1 Langmuir binding model, describes the simplest bimolecular interaction. This model
assumes that the analyte concentration does not change during association, because the
constant flow of analyte prevents its depletion or accumulation in the solution. It also
assumes that the analyte concentration is zero during the dissociation phase and that the
analyte binds to the ligand with 1:1 stoichiometry. A schematic of the model is shown below
in Figure 6.3.
Figure 6.3: 1:1 Langmuir binding of analyte A to immobilized ligand B, to form the AB complex. ka is the association rate constant and kd is the dissociation rate constant.
The chemical equation for the model is:
$$A + B \underset{k_d}{\overset{k_a}{\rightleftharpoons}} AB \qquad \text{(Eq. 1)}$$
where A represents the free analyte in solution, B is the ligand immobilised on the sensor
surface, AB is the complex of the analyte bound to the ligand, ka is the association rate
constant and kd is the dissociation rate constant (Table 6.1). The thermodynamic
equilibrium dissociation constant is then equal to
$$K_D = \frac{k_d}{k_a} \qquad \text{(Eq. 2)}$$
Table 6.1: A brief explanation of the rate constants
The differential rate equation for the 1:1 model is:
$$\frac{d[AB]}{dt} = k_a[A]\left([B]_{tot} - [AB]\right) - k_d[AB] \qquad \text{(Eq. 3)}$$
where d[AB]/dt is the rate of change of the concentration of the complex AB at time t, and
[B]tot is the total concentration of ligand sites. This equation may be combined with the
approximation that the analyte concentration remains at its initial value, [A]i, to give
$$\frac{d[AB]}{dt} = k_a[A]_i\left([B]_{tot} - [AB]\right) - k_d[AB] \qquad \text{(Eq. 4)}$$
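Because the biosensor response is directly proportional to [AB]t, Eq. 4 may be rewritten as
$$\frac{dR_t}{dt} = k_a[A]_i\left(R_{max} - R_t\right) - k_d R_t \qquad \text{(Eq. 5)}$$
where Rt denotes the response at time t and Rmax is the maximal response obtained when all
the available ligand-binding sites are occupied. An integrated form of this equation,
$$R_t = \frac{k_a[A]_i R_{max}}{k_a[A]_i + k_d}\left(1 - e^{-(k_a[A]_i + k_d)t}\right) \qquad \text{(Eq. 6)}$$
is used to determine Rmax and the rate constants ka and kd. An alternative form of the
equation is
$$R_t = R_{eq}\left(1 - e^{-k_{obs}t}\right) \qquad \text{(Eq. 7)}$$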
This equation expresses the time dependence of the biosensor response in terms of the
first-order rate constant kobs and Req, the response at equilibrium. Comparing equations 6
and 7 gives equation 8:
$$k_{obs} = k_a[A]_i + k_d, \qquad R_{eq} = \frac{K_A[A]_i R_{max}}{1 + K_A[A]_i} \qquad \text{(Eq. 8)}$$
where the second equality arises from the definition of the association equilibrium constant,
$$K_A = \frac{k_a}{k_d} = \frac{1}{K_D} \qquad \text{(Eq. 9)}$$
Analysis of SPR data in this manner can therefore also be used to determine the response
at equilibrium.
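The relationships in Eqs. 5 to 8 can be checked numerically. The sketch below, assuming illustrative (hypothetical) values of ka, kd, Rmax and analyte concentration rather than any values measured in this study, evaluates the integrated association curve (Eqs. 6 to 8), compares it with a direct numerical integration of the rate equation (Eq. 5), and evaluates the first-order decay expected in the dissociation phase, where the analyte concentration is taken to be zero.

```python
import math


def langmuir_association(t, conc, ka, kd, rmax):
    """Response R_t (RU) at time t (s) during association, starting from R = 0 (Eqs. 6-8)."""
    kobs = ka * conc + kd                # observed first-order rate constant (Eq. 8)
    req = ka * conc * rmax / kobs        # response at equilibrium (Eq. 8)
    return req * (1.0 - math.exp(-kobs * t))  # Eq. 7


def integrate_rate_equation(t_end, conc, ka, kd, rmax, dt=0.01):
    """Forward-Euler integration of dR/dt = ka*C*(Rmax - R) - kd*R (Eq. 5)."""
    r = 0.0
    for _ in range(int(round(t_end / dt))):
        r += dt * (ka * conc * (rmax - r) - kd * r)
    return r


def langmuir_dissociation(t, r0, kd):
    """Response t seconds into the dissociation phase (analyte concentration = 0)."""
    return r0 * math.exp(-kd * t)


# Hypothetical parameters: ka = 1e4 M^-1 s^-1, kd = 1e-2 s^-1 (so KD = 1 uM),
# Rmax = 100 RU and an analyte concentration equal to KD (1e-6 M).
ka, kd, rmax, conc = 1e4, 1e-2, 100.0, 1e-6
for t in (10.0, 60.0, 300.0):
    print(t, langmuir_association(t, conc, ka, kd, rmax),
          integrate_rate_equation(t, conc, ka, kd, rmax))
```

With the analyte concentration equal to KD, the curve levels off at half of Rmax, as Eq. 8 requires, and the analytical and numerical curves agree closely.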
6.4.2 Determination of Equilibrium Constants
There are a number of ways to represent the affinity of an interaction, as shown below.
The affinity constant can be measured directly by equilibrium binding analysis or obtained
from the kinetic rate constants using equation 2. Equilibrium analysis involves multiple
sequential injections of the analyte at various concentrations and measurement of the level
of binding at equilibrium. Ideally, the analyte concentration should range from 0.01 × KD
to 100 × KD. An assumption in these affinity measurements is that the amount of active
immobilised ligand remains constant. The time taken to reach equilibrium is determined
largely by the dissociation rate constant, particularly at low analyte concentrations.
The affinity constant is obtained from the data using the steady-state (Langmuir binding
isotherm) model:
$$\text{Bound} = \frac{\text{Max} \cdot C_A}{K_D + C_A} \qquad \text{(Eq. 10)}$$
where “Bound” is the equilibrium response in RU (response units), Max is the maximum
response (RU) and CA is the concentration of the analyte.
The KD and Max values are obtained by fitting the above equation to the data using the
BIAevaluation software. A graph of equilibrium response against analyte concentration gives
the KD, which is equivalent to the concentration of the analyte at which 50% of the binding
sites are occupied (Figure 6.4).
Figure 6.4: Graph of Req against the analyte concentration. This graph is used to obtain the KD, which is equivalent to the concentration of the analyte at which 50 % of the binding sites are occupied.
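The steady-state fitting in this chapter was performed in BIAevaluation. Purely as an illustration, the sketch below fits Eq. 10 to equilibrium responses without any fitting library. It assumes a single class of binding site, uses the fact that for a fixed KD Eq. 10 is linear in Max (so Max has a closed-form least-squares value), and searches log10 KD by golden-section search, assuming a single minimum within the search bounds. The function names and the noise-free synthetic data are hypothetical, and this is not the algorithm used by BIAevaluation.

```python
import math


def isotherm(conc, kd_eq, bmax):
    """Steady-state response for analyte concentration conc (Eq. 10)."""
    return bmax * conc / (kd_eq + conc)


def best_bmax(concs, responses, kd_eq):
    # For a fixed KD the model is Bound = Max * x with x = C/(KD + C): linear least squares.
    x = [c / (kd_eq + c) for c in concs]
    return sum(xi * yi for xi, yi in zip(x, responses)) / sum(xi * xi for xi in x)


def sse(concs, responses, kd_eq):
    """Sum of squared residuals at the best Max for this KD."""
    bmax = best_bmax(concs, responses, kd_eq)
    return sum((y - isotherm(c, kd_eq, bmax)) ** 2 for c, y in zip(concs, responses))


def fit_steady_state(concs, responses, lo=1e-3, hi=1e3, iters=200):
    """Golden-section search on log10(KD) between lo and hi (same units as concs)."""
    a, b = math.log10(lo), math.log10(hi)
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = sse(concs, responses, 10 ** c), sse(concs, responses, 10 ** d)
    for _ in range(iters):
        if fc < fd:                      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = sse(concs, responses, 10 ** c)
        else:                            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = sse(concs, responses, 10 ** d)
    kd_best = 10 ** ((a + b) / 2)
    return kd_best, best_bmax(concs, responses, kd_best)


# Dilution series like that used in this chapter; responses generated from hypothetical
# KD = 2 uM and Max = 80 RU.
concs = [0.3125, 0.625, 1.25, 2.5, 5.0, 10.0]        # uM
responses = [isotherm(c, 2.0, 80.0) for c in concs]
kd_fit, max_fit = fit_steady_state(concs, responses)
print(f"KD = {kd_fit:.3f} uM, Max = {max_fit:.1f} RU")
```

Eliminating Max analytically reduces the fit to a one-dimensional search, which keeps the sketch short; with real, noisy data the recovered KD will of course carry an uncertainty.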
6.4.3 The Two compartment or Mass Transfer model
This model is similar to the 1:1 model in that it assumes one-to-one binding between the
analyte and the immobilised ligand (Figure 6.5). The major difference is that the
concentration of the analyte at the sensor surface is variable. This is because binding takes
place in two steps, or compartments, and each compartment has a different analyte
concentration (Glaser, 1993). The bulk compartment has the same analyte concentration as the
injected sample, while the analyte concentration in the surface compartment depends both on
the rate of mass transport of the analyte from the bulk compartment to the sensor chip surface
and on the binding of the analyte to the immobilised ligand. Each process has its own
independent rate constant, and both are incorporated into the rate equations for this model
(Myszka et al., 1998).
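A common way to express the two-compartment model, assumed here rather than taken from this thesis, is to treat the surface compartment as being at quasi-steady state, so that the flux of analyte delivered by mass transport, kt(C − Cs), balances the net binding flux. Solving that balance for the surface concentration Cs and substituting it into the 1:1 rate law gives the sketch below. The parameter values, and the units assumed for kt (RU M⁻¹ s⁻¹, so that ka(Rmax − R)/kt is dimensionless), are illustrative assumptions.

```python
import math


def simulate_two_compartment(conc, ka, kd, rmax, kt, t_end, dt=0.01):
    """Association phase (RU) under a quasi-steady-state two-compartment approximation.

    conc : analyte concentration in the bulk compartment (M), held constant by the flow
    kt   : mass-transport coefficient, assumed in RU M^-1 s^-1 so that ka*(rmax - r)/kt
           is dimensionless
    """
    r = 0.0
    for _ in range(int(round(t_end / dt))):
        free_sites = rmax - r
        # Surface concentration from the flux balance kt*(conc - cs) = ka*cs*free_sites - kd*r
        cs = (kt * conc + kd * r) / (kt + ka * free_sites)
        # 1:1 binding driven by the surface (not the bulk) concentration; forward-Euler step
        r += dt * (ka * cs * free_sites - kd * r)
    return r


def langmuir_reference(conc, ka, kd, rmax, t):
    """Plain 1:1 Langmuir association curve, i.e. no transport limitation (Eqs. 6-8)."""
    kobs = ka * conc + kd
    return ka * conc * rmax / kobs * (1.0 - math.exp(-kobs * t))


conc, ka, kd, rmax = 1e-6, 1e5, 1e-2, 100.0     # hypothetical values
for kt in (1e20, 1e8, 1e7):
    r60 = simulate_two_compartment(conc, ka, kd, rmax, kt, 60.0)
    print(f"kt = {kt:.0e}: R(60 s) = {r60:.1f} RU")
print(f"1:1 Langmuir: R(60 s) = {langmuir_reference(conc, ka, kd, rmax, 60.0):.1f} RU")
```

With a very large kt the result coincides with the 1:1 Langmuir curve; smaller kt values slow the approach to equilibrium without changing the equilibrium response itself, which is why mass transport distorts apparent rate constants more than steady-state affinities.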
6.4.4 Other binding models
A number of other models have also been used in SPR studies. These include the two-site
(heterogeneous) model, the avidity (bivalent analyte) model and the conformational change
model.
Figure 6.5: Mass transfer model: transport of analyte A to the surface (A) and binding of the analyte to the ligand B to form the complex AB. kt is the mass transfer coefficient.
A heterogeneous model can arise from either a heterogeneous ligand or a heterogeneous
analyte (Morton et al., 1995). Heterogeneity of the ligand can result from the immobilisation
chemistry adopted, such as amine coupling (Figure 6.6). Natural sources such as polyclonal
antibodies, post-translational modifications and impurities can also cause heterogeneity. The
major assumptions of this model are that two different forms of the ligand are immobilised
and that each presents an equally accessible binding site for the analyte, which can interact
in a simple bimolecular interaction. Each form of the immobilised ligand presents binding
sites with different affinities and different rate constants for the analyte, and the
sensorgram reflects the sum of these. It is also assumed that the concentration of the analyte
is constant throughout the flow cell and does not change with time (Morton et al., 1995).
Heterogeneity of the analyte can occur naturally or through enzymatic degradation. This can
lead to two possible situations: in the first, the analyte forms bind to two different ligand
sites; in the second, the analyte forms have different affinities for a single ligand site. In
the first case, the sensorgram presents the sum of two independent reactions, as the two
analytes bind to their own independent binding sites. In the second case, a competitive
reaction occurs, because there is only one binding site and there are either two different
analytes or one analyte with different affinities.
Figure 6.6: Heterogeneous model: This model can arise from a heterogeneous ligand surface, resulting in the formation of two complexes, each with separate association and dissociation rates.
The bivalent analyte model describes an analyte with two identical binding sites, of which
only one binds to the ligand (Baumann, 1998). The second, free site can stabilise the
ligand-analyte complex without producing an additional response, but with a change in the
equilibrium constant (Figure 6.7). This model usually fits data collected for antibody
interactions well.
Figure 6.7: Bivalent analyte model: The analyte has two identical binding sites but only one site binds to the ligand. Analyte A binds to ligand B on the surface to form the complex AB, resulting in one association and one dissociation rate constant.
The conformational change model describes a situation in which the analyte-ligand complex
changes conformation after binding (Figure 6.8). Although the change is not mass-based, it
still alters the response, because it modifies the equilibrium between the bound and free
forms of the analyte. Such behaviour is seen in some receptor-hormone and antibody-antigen
interactions (Jonsson, 1991).
Figure 6.8: Conformational change model: The analyte-ligand complex changes conformation after binding, forming the complex AB and resulting in one association and one dissociation constant.
6.5 Deviation from 1:1 model
A simple bimolecular interaction model does not always fit the data. Experimental artifacts
such as surface-imposed heterogeneity, mass transport, aggregation, crowding, matrix
effects and non-specific binding can complicate binding responses. However, careful
experimental design can help avoid a number of these unwanted effects (Svitel et al.,
2003). Some of these are discussed briefly below.
6.5.1 Sample purity
Because biosensors measure only the interaction of molecules with the sensor surface, impure
samples can in principle be used in experiments. However, using impure samples for kinetic
analysis has a number of drawbacks. First, impurities may interact non-specifically with the
surface of the chip. To avoid this, pure samples should be used, and these should be tested
for non-specific binding by injecting the sample, at the highest analyte concentration, over a
reference surface, that is, a surface without immobilised ligand. If there is still
considerable background response, the experimental conditions should be modified. For
example, basic proteins interact electrostatically with the carboxymethyl dextran matrix. This
effect can be reduced by increasing the ionic strength of the buffer and by changing the
charge of the matrix by blocking the surface with amines. Other available surfaces that can
minimise non-specific binding include a dextran matrix with lower charge, a flat carboxyl
surface with no dextran and a plain gold surface. Second, impurities may increase the bulk
refractive index change during association, which can be subtracted using a reference cell.
Artefacts such as bulk refractive index changes, matrix effects, non-specific binding,
injection noise and baseline drift due to temperature variation can be corrected for by using
a reference surface. Ideally, the reference surface should be treated with the same chemicals
used to immobilise the ligand, so that the environments within the matrix remain similar.
Third, with impure samples the active concentration of the analyte may not be known
accurately, and an accurate value is important for determining rate constants.
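Reference subtraction itself is a point-by-point difference between the active and reference flow cells recorded at the same time points. The minimal sketch below, using hypothetical sensorgram values, shows how a bulk refractive index shift that appears equally in both flow cells cancels, leaving only the binding response. In practice this is done within the Biacore software; the function here is illustrative only.

```python
def reference_subtract(active, reference):
    """Subtract the reference flow cell from the active flow cell, point by point (RU)."""
    if len(active) != len(reference):
        raise ValueError("sensorgrams must be sampled at the same time points")
    return [a - r for a, r in zip(active, reference)]


# Hypothetical sensorgrams (RU): a 50 RU bulk shift appears in both flow cells while the
# sample is injected (points 1-3) and disappears when buffer flow resumes (points 4-5).
active = [0.0, 62.0, 70.0, 74.0, 20.0, 12.0]
reference = [0.0, 50.0, 50.0, 50.0, 0.0, 0.0]
corrected = reference_subtract(active, reference)
print(corrected)
```

The corrected trace rises during the injection and decays afterwards, as expected for binding and dissociation, while the square-wave bulk shift is removed.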
6.5.2 Aggregation state
It is very important to determine the aggregation state of the sample, because samples that
self-associate can complicate the binding response. The concentration of analyte aggregates
in solution may be low, but it can reach high levels when bound to the surface. For example,
a 1000 RU signal corresponds to a protein concentration of approximately 1 ng/mm2 on the
biosensor surface (Davis et al., 1998). It is therefore crucial to confirm that the sample
does not aggregate even at such high concentrations, which can be checked on a non-reducing
gel, by size-exclusion chromatography or by analytical ultracentrifugation.
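To illustrate why surface-bound material can reach high local concentrations, the sketch below converts a response in RU into an approximate local molar concentration, using the 1000 RU ≈ 1 ng/mm2 relationship quoted above. The molecular weight (10 kDa) and the thickness of the layer through which bound material is assumed to be evenly distributed (100 nm, taken here as typical of a dextran matrix) are illustrative assumptions, not measurements from this study.

```python
def local_concentration_molar(response_ru, mw_da, layer_thickness_nm=100.0):
    """Approximate local concentration (M) of bound protein in the surface layer.

    Assumes 1000 RU ~ 1 ng/mm^2 and that the bound material is spread evenly through a
    layer of the given thickness (100 nm is an assumed, illustrative value).
    """
    ng_per_mm2 = response_ru / 1000.0
    mol_per_mm2 = ng_per_mm2 * 1e-9 / mw_da                 # ng -> g, then g -> mol
    # Volume above 1 mm^2: 1e-6 m^2 * thickness (m), converted from m^3 to litres (x 1e3)
    litres_per_mm2 = 1e-6 * layer_thickness_nm * 1e-9 * 1e3
    return mol_per_mm2 / litres_per_mm2


# Hypothetical 10 kDa protein giving a 1000 RU response
print(local_concentration_molar(1000.0, 10000.0))
```

Under these assumptions a 1000 RU response of a 10 kDa protein corresponds to about 1 mM locally, far above the micromolar analyte concentrations injected, which is why even a small aggregated fraction can matter once it accumulates on the surface.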
6.5.3 Mass-transport
The occurrence of mass-transport effects can be determined by examining how changing the
flow rate affects the observed association rate. This is readily achieved by injecting the
same concentration of analyte over the immobilised surface at three different flow rates. If
the binding responses differ, the reaction is influenced by mass transport. Mass-transport
limitation essentially represents insufficient transport of the mobile analyte to the sensor
surface and, as a consequence, hinders analysis of the chemical kinetics of bimolecular
reactions (Myszka et al., 1998).
To limit the effect of mass transport, experiments should be conducted at high flow rates,
which help to ensure that a constant concentration of analyte is delivered to the sensor
surface. Mass-transport problems can also be avoided by using a low surface density of
ligand. In addition, models in the Biacore BIAevaluation software take mass transport into
account when it has been shown to occur.
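The two remedies described above, higher flow rates and lower ligand density, can be compared with a rough index of transport limitation, ka·Rmax/kt, which appears in the quasi-steady-state form of the two-compartment model: values well below 1 suggest reaction-limited binding. The sketch assumes that kt scales with the cube root of the flow rate, as is commonly assumed for laminar flow cells, and uses a hypothetical reference value of kt; none of the numbers are from this study.

```python
def kt_at_flow(kt_ref, flow_ref, flow):
    """Mass-transport coefficient at a new flow rate, assuming kt is proportional to flow**(1/3)."""
    return kt_ref * (flow / flow_ref) ** (1.0 / 3.0)


def transport_index(ka, rmax, kt):
    """ka*Rmax/kt: well below 1 suggests binding is reaction-limited rather than transport-limited."""
    return ka * rmax / kt


# Hypothetical values: kt = 1e8 RU M^-1 s^-1 at 50 ul/min and ka = 1e5 M^-1 s^-1.
KT_REF, FLOW_REF, KA = 1e8, 50.0, 1e5
for rmax in (300.0, 30.0):                 # high- versus low-capacity surface (RU)
    for flow in (5.0, 15.0, 50.0):         # ul/min
        kt = kt_at_flow(KT_REF, FLOW_REF, flow)
        print(f"Rmax = {rmax:5.0f} RU, flow = {flow:4.0f} ul/min: "
              f"index = {transport_index(KA, rmax, kt):.3f}")
```

Because of the cube-root dependence, a tenfold increase in flow rate raises kt only about 2.2-fold, whereas a tenfold reduction in ligand density lowers the index tenfold, which is consistent with the emphasis above on low surface densities.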
6.5.4 Steric Hindrance
A number of studies have shown that high local concentrations of immobilised ligand can
lead to steric hindrance, which restricts access of the analyte to the ligand binding site
(Schuck, 1996; Schuck, 1997b). The amount of ligand immobilised on the surface is therefore
another important experimental condition to consider, because ligand density determines the
binding capacity. Using the lowest-capacity surface possible is highly recommended for
kinetic experiments, because it minimises artifacts such as mass transport, steric hindrance,
crowding and aggregation. Binding curves with maximum responses as low as 50 RU can still be
measured readily.
6.5.5 Minimal non-specific binding
It is always very important to establish experimental conditions that produce minimal
non-specific binding (David, 1999; Morton et al., 1998). Non-specific interactions can be
measured by testing binding of the analyte at high concentrations over a non-derivatised or
reference cell of the sensor chip. One approach to reducing non-specific binding is to change
the buffer conditions. In our experiments, as in other nucleic acid-protein interaction
systems (Katsamba et al., 2001), the standard buffer (10 mM Tris–HCl, pH 7.4, 150 mM NaCl,
0.5% Triton X-100, 62.5 μg/ml bovine serum albumin (BSA), 125 μg/ml tRNA, 2 mM DTT and
EDTA) included BSA and tRNA to prevent non-specific binding. tRNA acts as a non-specific
competitor. With sequence-specific binding, an intermediate complex may form in which the
protein is bound to the oligonucleotide non-specifically and with lower affinity. When an
excess of competitor oligonucleotide is present, protein that is not bound in a
sequence-specific manner binds to the more abundant competitor rather than to the probe,
thus avoiding the formation of non-specific complexes. The tRNA therefore presents a
potential site for the analyte to bind non-specifically, which can then be subtracted from
the specific binding data. BSA is also added to block non-specific protein binding: it
occupies sites on the ligand to which the analyte might otherwise bind non-specifically. In
addition, the first runs after a desorb or cleaning procedure on the Biacore instrument can
suffer from adsorption of the analyte to the tubing and the walls of the integrated
microfluidic cartridge (IFC). A pre-run with a high protein concentration, for example BSA,
can reduce this effect (BIACORE, 1997). BSA also helps to stabilise some proteins and is
hence used as a carrier protein.
6.6 Results
6.6.1 Binding measurements of αCP1-KH domains to the 30 nucleotide 3’UTR of AR mRNA
I first set out to determine the binding affinities of the single KH domains of αCP1 for
their binding site within the 30-nucleotide 3’UTR of AR mRNA (nt 3275 to 3325;
5’-CUCUCCUUUCUUUUUCUUCUUCCCUCCCUA-3’) identified by Yeap et al. (2002). The nucleotide
sequence of AR mRNA used in this study contains two cytosine triplets at the 3’ end of the
sequence. We used a streptavidin (SA) sensor chip for the analysis. The first flow cell
served as the reference flow cell, while 5’-biotinylated RNA representing the 30-nt target
sequence from the 3’UTR of AR mRNA was immobilised in the second flow cell (Methods Section
2.9.2). This produced a stable and homogeneous recognition surface, which is important for
detailed kinetic analysis. A high flow rate (50 μl/min) and a low-capacity surface (30 RU)
were used to minimise mass transport and steric hindrance. After immobilisation of the
ligand, the sensor surface was subjected to several injections of 2 M NaCl to test its
integrity. To collect kinetic binding data, a protein concentration series from 10 μM down
to 0.625 μM or 0.312 μM was injected simultaneously over the immobilised surface and a
reference surface. The protein concentrations were determined as described in Methods
Section 2.5. Responses from the reference surface were used to correct for refractive index
changes and instrument noise, giving high-quality sensor data. Each experiment was conducted
multiple times, and the data from these experiments overlapped closely. Representative
sensorgrams are shown in Figure 6.9.
Figure 6.9: Binding studies of αCP1-KH1, KH2 and KH3 with the RNA sequence 5’-CUGGGUUUUUUUUUCUCUUUCUCUCCUUUCUUUUUCUUCUUCCCUCCC-3’, representing the 30-nucleotide sequence in the 3’UTR of AR. 30 RU of biotinylated RNA was immobilized on an SA chip. Binding interactions were measured for a dilution series of the αCP1-KH1, KH2 and KH3 domains from 10 to 0.312 µM (10, 5, 2.5, 1.25, 0.625 and 0.312 µM), injected at increasing concentrations for 2 min at a flow rate of 50 µl/min.
Visual inspection of the binding responses for the three KH domains of αCP1 shows that
each KH domain interacts with the target sequence to a different degree. αCP1-KH1
displays a fast on-rate and a relatively slow dissociation rate. αCP1-KH3 displays very fast
association and dissociation rates, apparent from the steep drop to zero in the response
curves. αCP1-KH2 shows no detectable binding.
The maximum response (Rmax) of each αCP1-KH domain to the RNA surface also differs:
αCP1-KH1 has the highest maximum response, followed by αCP1-KH3, with no apparent binding by
αCP1-KH2. From the αCP1-KH1 and KH3 maximum responses, the stoichiometry of the surface
molecular complex can be calculated using the equation below:
$$\text{Stoichiometry} = \frac{R_{max}}{\text{ligand level (RU)}} \times \frac{MW_{ligand}}{MW_{analyte}} \qquad \text{(Eq. 11)}$$
where Rmax is the analyte binding capacity, which can be extrapolated from experimental
data, the ligand level in response units (RU) is the amount of ligand immobilized on the
sensor surface, and MW stands for molecular weight.
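Eq. 11 is simple arithmetic, but the molecular-weight ratio is easy to invert by mistake. The sketch below, using hypothetical values (a 30 RU ligand surface and ligand and analyte molecular weights of 10 and 9 kDa) rather than those of the constructs used here, computes the theoretical Rmax for a given number of binding sites per ligand and then recovers that number from Rmax with Eq. 11.

```python
def theoretical_rmax(ligand_ru, mw_ligand, mw_analyte, sites=1):
    """Expected maximal response (RU) if every ligand molecule binds `sites` analyte molecules."""
    return ligand_ru * (mw_analyte / mw_ligand) * sites


def surface_stoichiometry(rmax_ru, ligand_ru, mw_ligand, mw_analyte):
    """Eq. 11: analyte molecules bound per immobilised ligand molecule."""
    return (rmax_ru / ligand_ru) * (mw_ligand / mw_analyte)


ligand_ru, mw_ligand, mw_analyte = 30.0, 10000.0, 9000.0   # hypothetical values
rmax_two_sites = theoretical_rmax(ligand_ru, mw_ligand, mw_analyte, sites=2)
print(rmax_two_sites, surface_stoichiometry(rmax_two_sites, ligand_ru, mw_ligand, mw_analyte))
```

A stoichiometry close to 2 is what would be expected if both cytosine triplets on each oligonucleotide were occupied, while a value near 1 indicates that only about half of the potential sites are occupied.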
The two cytosine triplets at the 3’ end of the AR mRNA sequence used in this study constitute
two αCP-KH domain target sites, so the KH domains can potentially bind to the
surface-immobilised oligonucleotide with a 2:1 stoichiometry. The observed stoichiometries
measured for αCP1-KH1 and αCP1-KH3 using equation 11 were 0.91 and 1.1 respectively,
representing a substoichiometric interaction of the protein with the RNA surface. This is
likely to result from some of the RNA binding sites being unavailable, either because the RNA
adheres to the chip in a way that blocks the binding sites or because the RNA forms
secondary structures.
The data for αCP1-KH1 and αCP1-KH3 were analysed using the steady-state method. The
dissociation equilibrium constants (KD) are shown in Table 1. The KD values for both
αCP1-KH domains indicate moderate binding affinity, with αCP1-KH1 having a higher affinity
for the RNA target than αCP1-KH3. This is reflected in the much slower dissociation of
αCP1-KH1 from the RNA target, demonstrating the stability of the αCP1-KH1/RNA complex over
time. In contrast, the αCP1-KH3 complex had a much shorter half-life, as is apparent from its
rapid dissociation.
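The half-lives compared above follow from first-order dissociation: a complex with dissociation rate constant kd has a half-life of ln 2/kd. The sketch below uses two hypothetical kd values, not fitted values from this study, chosen only to contrast a slowly and a rapidly dissociating complex of the kind seen for αCP1-KH1 and αCP1-KH3.

```python
import math


def half_life(kd):
    """Half-life (s) of a complex dissociating with first-order rate constant kd (s^-1)."""
    return math.log(2) / kd


def fraction_remaining(kd, t):
    """Fraction of complex still bound t seconds into the dissociation phase."""
    return math.exp(-kd * t)


for label, kd in (("slow (hypothetical)", 1e-3), ("fast (hypothetical)", 1e-1)):
    print(f"{label}: t1/2 = {half_life(kd):.1f} s, "
          f"bound after 120 s = {fraction_remaining(kd, 120.0):.3f}")
```

A hundredfold difference in kd translates directly into a hundredfold difference in half-life, which is why the two domains can look so different over the same dissociation period.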
To study the kinetics of the αCP1-KH/RNA interactions, the kinetic data in Figure 6.9 were
fitted to a simple 1:1 Langmuir interaction model and to the other models available in the
BIAevaluation software. None of these models fitted the association and dissociation curves
well, indicating a complex interaction that may be due to the presence of both core and
secondary binding interactions.
Consistent with previous reports, no binding was detectable between αCP1-KH2 and RNA, even
at elevated protein concentrations (Dejgaard and Leffers, 1996; Sidiqi et al., 2005a).
Although CD spectropolarimetry was consistent with the formation of a folded αCP1-KH2
domain, and the missing final α-helix is not on the oligonucleotide-binding face of the KH
domain (Chapter 3, Section 3.1), it is possible that the truncation impaired the ability of
αCP1-KH2 to bind RNA. Thus, whether αCP1-KH2 participates in oligonucleotide binding
remained uncertain from this experiment.
6.6.2 Binding measurements of αCP1-KH domains to DNA sequence representing the 30 nucleotide 3’UTR of AR mRNA
This experiment aimed to determine whether the binding affinities of the αCP1-KH domains
differ for a DNA sequence comprising the 30-nucleotide 3’UTR sequence of AR. To study the
kinetics of the αCP1-KH/DNA interaction on the biosensor, chemically synthesised
5’-biotinylated DNA was captured on one SA chip flow cell, whereas a second, unmodified flow
cell served as the reference surface. Responses from the reference surface were used to
correct for refractive index changes and instrument noise, producing high-quality sensor
data. A representative set of sensorgrams for the interactions of αCP1-KH1, KH2 and KH3 with
DNA is shown in Figure 6.10.
Figure 6.10: Binding studies of αCP1-KH1, KH2 and KH3 with the DNA sequence 5’-CTGGGTTTTTTTTTCTCTTTCTCTCCTTTCTTTTTCTTCTTCCCTCCC-3’, representing the 30-nucleotide sequence in the 3’UTR of AR. 30 RU of biotinylated DNA was immobilized on an SA chip. Binding interactions were measured for a dilution series of the αCP1-KH1, KH3 and KH2 domains from 10 to 0.312 µM (10, 5, 2.5, 1.25, 0.625 and 0.312 µM), injected at increasing concentrations for 2 min at a flow rate of 50 µl/min.
In contrast to RNA binding, all three αCP1-KH domains bound to the DNA sequence. Visual
inspection of the binding responses shows that αCP1-KH1 and αCP1-KH2 both have relatively
slow on- and off-rates, while αCP1-KH3 shows both a fast on-rate and a fast off-rate. It is
also evident from the response curves that the maximum response of each αCP1-KH domain to
the DNA surface differs: αCP1-KH1 has the highest maximum response, followed by αCP1-KH3 and
KH2. The stoichiometry of each αCP1-KH/DNA complex was calculated using equation 11. For
αCP1-KH1 and αCP1-KH3 binding to DNA, the calculated stoichiometry was ~2.3, indicating that
about two molecules of αCP1-KH1 or KH3 bind to the target DNA sequence. This agrees with the
expected 2:1 theoretical response, as there are two cytosine triplets at the 3’ end of the
target DNA sequence. For αCP1-KH2, however, the derived stoichiometry was 0.5, describing a
substoichiometric interaction of the protein with the DNA surface. This low stoichiometry may
indicate that a fraction of the protein was inactive or unable to access the ligand.
The data for the αCP1-KH domains were again analysed using the steady-state method. The
dissociation equilibrium constants (KD) are shown in Table 2. The KD value for the αCP1-KH1
domain indicates moderate binding affinity, while the KD values of αCP1-KH2 and KH3 indicate
low affinity. In both of these cases the half-lives of the complexes are clearly shorter than
for αCP1-KH1. Nevertheless, this is the first report of binding by αCP1-KH2, albeit with a
very low response.
The kinetic data in Figure 6.10 were fitted with a simple 1:1 Langmuir interaction model.
This model did not fit the association and dissociation curves of αCP1-KH1 and KH3 well,
which may reflect a complex binding mechanism involving both core and secondary binding
interactions. The data are consistent, however, with slower off-rate and faster on-rate
kinetics underlying the higher affinity of αCP1-KH1 relative to αCP1-KH2 and αCP1-KH3, and
the higher affinity for AR DNA than for RNA. Other models, such as the bivalent analyte,
heterogeneous ligand and heterogeneous analyte models, did not produce good fits either. The
bivalent and heterogeneous analyte models were not expected to fit, because the αCP1-KH
domains are monovalent and the αCP1-KH protein samples were pure and homogeneous. The
heterogeneous ligand model also did not give a good fit. This model assumes that one analyte
binds independently to two ligand sites, which again is not possible for the αCP1-KH
domains, as structural studies indicate that only one KH domain can bind to one triplet
poly (C) site.
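One simple diagnostic for the deviation from 1:1 behaviour described above, offered here as an assumption rather than the procedure used in BIAevaluation (which fits association and dissociation phases by nonlinear regression), is to regress ln(R) against time over the dissociation phase. A single exponential gives a straight line with slope −kd and small, patternless residuals; a mixture of fast and slow components, as might arise from core plus secondary interactions, leaves curved, systematic residuals. The synthetic curves in the example are hypothetical.

```python
import math


def fit_kd_dissociation(times, responses):
    """Estimate kd by linear regression of ln(R) on t over a dissociation phase.

    Assumes single-exponential (1:1) dissociation and positive, reference-subtracted
    responses. Returns (kd, residuals of ln(R) about the fitted line).
    """
    if any(r <= 0 for r in responses):
        raise ValueError("log-linear fit requires positive responses")
    n = len(times)
    y = [math.log(r) for r in responses]
    t_mean, y_mean = sum(times) / n, sum(y) / n
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times, y))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx                      # equals -kd for a single exponential
    residuals = [yi - (y_mean + slope * (t - t_mean)) for t, yi in zip(times, y)]
    return -slope, residuals


times = [5.0 * i for i in range(21)]                                   # 0-100 s
single = [80.0 * math.exp(-0.05 * t) for t in times]                    # one component
biphasic = [50.0 * math.exp(-0.2 * t) + 50.0 * math.exp(-0.01 * t) for t in times]
for name, data in (("single", single), ("biphasic", biphasic)):
    kd_est, res = fit_kd_dissociation(times, data)
    print(f"{name}: kd = {kd_est:.4f} s^-1, max |residual| = {max(abs(r) for r in res):.3f}")
```

For the single-exponential curve the fitted kd matches the input exactly and the residuals vanish, whereas the biphasic curve returns a single compromise kd with large, structured residuals, the signature of a mechanism more complex than 1:1 binding.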
The only KH domain data that could be fitted using a simple Langmuir model were those for the
interaction of αCP1-KH2 with DNA (Figure 6.11). The fit indicated a relatively fast on-rate
and slow off-rate, giving a KD of 6.49 μM, which agreed closely with the steady-state KD of
5.46 ± 1.31 μM.
Figure 6.11: Kinetic analysis of αCP1-KH2 binding to DNA representing the 30-nucleotide sequence of the 3’UTR of AR mRNA. 30 RU of biotinylated DNA was immobilized on an SA chip. Binding interactions were measured for a dilution series of the αCP1-KH2 domain (10, 5, 2.5, 1.25 and 0.625 µM), injected at increasing concentrations for 2 min at a flow rate of 50 µl/min. The black dotted lines represent the experimental data and the purple lines represent the best fit to a simple bimolecular reaction model, from which the binding constants were obtained. Steady state KD represents the equilibrium constant from steady-state analysis. A KD of 5.46 µM is indicative of weak affinity.
Overall, the RNA and DNA binding studies of the αCP1-KH domains demonstrated that the three
KH domains prefer DNA over RNA for the 3’UTR of AR sequence. However, it is not known to what
extent this is due to RNA secondary structure blocking binding, or to a genuine difference in
the preference of the KH domains for RNA versus DNA. Each isolated domain is capable of
binding oligonucleotides independently, with widely ranging affinities in the order
αCP1-KH1 > KH3 > KH2.
This study revealed for the first time that αCP1-KH2 is capable of specific binding to
DNA. The αCP1-KH2 construct used in the current study is truncated at the C-terminus
compared with the other αCP-KH domains. The truncated C-terminal region was predicted to be
unstable (Chapter 3, Section 3.1), and its removal proved necessary for protein production.
Despite this, circular dichroism spectra and the improved expression and stability of the
protein suggested that it had folded correctly. In addition, since the C-terminus lies away
from the oligonucleotide-binding site, the truncation is unlikely to impinge on
oligonucleotide binding. It is therefore likely that, when held in proximity to the
oligonucleotide in the context of full-length αCP1, αCP1-KH2 participates in the
oligonucleotide binding interaction, even in the case of RNA binding. A similar RNA-protein
interaction has previously been described for the triple-RRM protein HuD (Park et al., 2000).
Although HuD-RRM3 was not observed to bind RNA independently, it clearly contributed to the
binding interaction in the context of the full-length molecule, primarily by reducing the
off-rate of the protein-oligonucleotide interaction.
More detailed insight into the role of the αCP1-KH2 domain and its recognition of nucleic acid
awaits structural information on the complex. Nevertheless, these data are the first to
demonstrate αCP1-KH2 binding to DNA, albeit with low affinity and still with no detectable
binding to RNA. It will be of great interest to investigate the binding of a longer αCP1-KH2
construct to DNA and RNA, if it can be successfully prepared, in order to examine the role of
the C-terminal helix.
6.6.3 αCP1-KH interaction with homopolymers
In order to understand the binding specificity of αCP1-KH domains for their target site, a
number of SPR experiments were conducted with the RNA homopolymers poly (C), poly (G)
and poly (U). Biotinylated poly (C), poly (G) and poly (U) RNAs (30 RU each) were immobilised
on flow cells 2, 3 and 4, respectively, with flow cell 1 used as a reference cell. Protein
samples were injected over the surface and the binding responses of αCP1-KH1, KH2 and
KH3 at various concentrations were monitored. Representative sensorgrams for these results
are shown in Figure 6.12. Both αCP1-KH1 and KH3 bound to poly (C), but
none of the KH domains bound to poly (G) or poly (U). In addition, αCP1-KH1 appears to
form a more stable complex than αCP1-KH3, as shown by its much slower dissociation, in direct
contrast to the very fast off-rate of αCP1-KH3. αCP1-KH2 did not show binding to any of
the RNA homopolymers.
A large negative refractive index change was present in the αCP1-KH2 response. This
could be attributed to a number of factors. Ober and Ward (1999) showed that there can be
considerable variability in the bulk shifts recorded in the four channels: even when the
running buffer is matched with the injected buffer, bulk shifts of up to about 100 RU have
been observed. Furthermore, this variability is more marked when larger bulk shifts are
introduced, e.g., by using a buffer that is more dilute than the running buffer, which is
effectively what happens when the analyte is introduced in a buffer that differs from the
running buffer. These effects are further enhanced at low signal levels, as is the case for
αCP1-KH2, which did not bind at all. Even after subtraction of the bulk shift from the data,
large perturbations can remain in the flow cells (Ober and Ward, 1999). Extensive
equilibration of the chip can minimise some of these effects; however, there is no evidence
that equilibration removes them completely.
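Reference subtraction is the usual first correction for bulk refractive index shifts. The sketch below assumes each sensorgram is a list of responses (RU) sampled at identical time points; it subtracts the reference flow cell (flow cell 1 in these experiments) and then, following the commonly used double-referencing approach, also subtracts a reference-subtracted buffer-only (blank) injection. The function names and example numbers are hypothetical, and this is not presented as the processing pipeline used in this study.

```python
from typing import List, Sequence


def reference_subtract(active: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Subtract the reference flow cell response from an active flow cell, point by point."""
    if len(active) != len(reference):
        raise ValueError("sensorgrams must be sampled at the same time points")
    return [a - r for a, r in zip(active, reference)]


def double_reference(active: Sequence[float], reference: Sequence[float],
                     blank_active: Sequence[float],
                     blank_reference: Sequence[float]) -> List[float]:
    """Reference-subtract the analyte injection, then remove a reference-subtracted
    buffer-only injection to cancel systematic, flow-cell-specific artefacts."""
    sample = reference_subtract(active, reference)
    blank = reference_subtract(blank_active, blank_reference)
    if len(sample) != len(blank):
        raise ValueError("analyte and blank runs must have the same length")
    return [s - b for s, b in zip(sample, blank)]


# Hypothetical responses (RU) at four time points
active = [0.0, 120.0, 125.0, 10.0]
reference = [0.0, 100.0, 102.0, 2.0]
blank_active = [0.0, 5.0, 5.0, 1.0]
blank_reference = [0.0, 2.0, 2.0, 0.0]
corrected = double_reference(active, reference, blank_active, blank_reference)
print(corrected)
```

Even with both subtractions, residual perturbations of the kind described by Ober and Ward (1999) are not guaranteed to cancel.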
To verify that αCP1-KH domains do not bind to poly (G) and poly (U), and that the lack of
binding was not due to incorrect immobilisation of the homopolymers, we injected full-length
HuR and a HuR RRM1/RRM2 construct and monitored the binding responses. HuR is an
RRM-containing RNA-binding protein that binds to U-rich sites. The results are
shown in Figure 6.13. As predicted, full-length HuR and the HuR RRMs bound to poly (U)
but not to poly (C) or poly (G), confirming that the RNA surface was available.
A number of kinetic models were fitted to the αCP1-KH1 and KH3 poly (C) homopolymer
responses but, not surprisingly, none fitted very well. The homopolymer contains eight
possible overlapping trinucleotide binding sites (CCCC CCCC CC), which may make the
binding response very complex, as more than one molecule of αCP1-KH domain could
readily bind. The stoichiometries of αCP1-KH1 and KH3 calculated using equation 11 were 1.1
and 1.3, respectively; given the multiple potential sites on each homopolymer, this indicates
a substoichiometric interaction of the protein with the RNA surface, suggesting that either a
fraction of the protein was inactive or the RNA ligand was inaccessible for interaction with
the protein.
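The figure of eight candidate sites follows from sliding a three-nucleotide window along the homopolymer (10 - 3 + 1 = 8). Because these windows overlap, far fewer domains can bind simultaneously. The sketch below makes both counts explicit. It assumes, as a simplification, that each bound KH domain covers four contiguous nucleotides (the footprint seen in the αCP1-KH1/DNA structure) and that neighbouring domains do not clash; the helper names are hypothetical.

```python
def clean(seq: str) -> str:
    """Remove the spaces used for readability and normalise case."""
    return seq.replace(" ", "").upper()


def count_windows(seq: str, motif: str) -> int:
    """Count overlapping occurrences of motif (e.g. every CCC window)."""
    s = clean(seq)
    k = len(motif)
    return sum(1 for i in range(len(s) - k + 1) if s[i:i + k] == motif)


def max_nonoverlapping(seq: str, motif: str, footprint: int) -> int:
    """Greedy count of motif sites that can be occupied at once, assuming each
    bound domain covers `footprint` contiguous nucleotides starting at the motif."""
    if footprint < len(motif):
        raise ValueError("footprint must be at least as long as the motif")
    s = clean(seq)
    i = count = 0
    while i + footprint <= len(s):
        if s[i:i + len(motif)] == motif:
            count += 1
            i += footprint  # skip the nucleotides covered by the bound domain
        else:
            i += 1
    return count


poly_c = "CCCC CCCC CC"
single_site = "AAA AAA CCC A"
print(count_windows(poly_c, "CCC"))          # 8 overlapping CCC windows
print(max_nonoverlapping(poly_c, "CCC", 4))  # 2 domains at most, under the 4-nt assumption
print(count_windows(single_site, "CCC"))     # 1
```

The same count confirms that the single-site probe designed in the next section offers exactly one CCC window.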
The findings in this study are consistent with the results of previous studies using filter
binding assays and SELEX methods (Dejgaard and Leffers, 1996), where KH domains
were shown to bind only poly (C) homopolymers and not poly (G) and (U). However, a
detailed analysis of the importance of each of the three cytosines for αCP1-KH domain
binding has never been carried out. This is what I aimed to do next, as outlined in the
following sections.
Figure 6.12: Binding studies of αCP1-KH1, KH2 and KH3 with the RNA homopolymers poly (C), poly (G) and poly (U). 30 RU of biotinylated RNA was immobilized on an SA chip. Binding interactions were measured for a series of dilutions of the αCP1-KH1, KH3 and KH2 domains from 10 to 0.625 µM, injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min.
Figure 6.13: Binding studies of HuR and HuR RRM1/RRM2 with the RNA homopolymers poly (C), poly (G) and poly (U). 30 RU of biotinylated RNA was immobilized on an SA chip. Binding interactions were measured for a series of dilutions of HuR and the RRM1/RRM2 domains from 100 to 6.25 nM (100, 50, 25, 12.5 and 6.25 nM), injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min.
6.6.4 αCP1-KH domain binding to single poly (C) site
My next experiment aimed to examine αCP1-KH domain binding to a simpler system. A
single-site oligonucleotide probe was designed comprising a C-triplet embedded in a poly
(A) sequence. This probe was anticipated to interact with the αCP1-KH domains with the
simplest possible kinetics, since adenine reportedly cannot fit into the αCP1-KH domain
oligonucleotide-binding cleft (consistent with our structural studies), so a 1:1
stoichiometric ratio is expected.
I again wished to compare RNA and DNA binding and therefore the following probes were
designed:
1) 5’-biotin-AAA AAA AAA A-3’ (RNA, as a control)
2) 5’-biotin-AAA AAA CCC A-3’ (RNA)
3) 5’-biotin-AAA AAA CCC A-3’ (DNA)
To study the kinetics of the αCP1-KH1 and KH3 interactions with the above probes on the
biosensor (αCP1-KH2 was not included, given its generally low affinity for DNA and RNA
established above), the RNA control probe, the RNA probe and the DNA probe were captured
on flow cells 1 to 3 of an SA chip, respectively. A representative data set for the binding
reaction is shown in Figure 6.14. No binding was detected to the poly (A) RNA sequence.
Furthermore, as predicted, both αCP1-KH1 and KH3 bound to the C-triplet RNA and DNA
motifs. To analyse the binding responses quantitatively, a simple 1:1 Langmuir model was
fitted to the curves and described them well (Figure 6.15). The rate constants from
analysis of the kinetic data are shown in Table 3, along with the steady-state equilibrium
constant (SS KD). These data showed several interesting features.
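For reference, the simple 1:1 (Langmuir) model relates the association rate constant ka, the dissociation rate constant kd and the analyte concentration C to the response R, with KD = kd/ka. The sketch below writes out the standard closed-form association and dissociation phases. The rate constants are hypothetical values chosen only so that KD = 4.5 µM; this is not the fitting software used to generate Table 3.

```python
import math


def kd_from_rates(ka: float, kd: float) -> float:
    """Equilibrium dissociation constant K_D = kd / ka (M)."""
    return kd / ka


def association_phase(t: float, conc: float, ka: float, kd: float, rmax: float) -> float:
    """1:1 Langmuir response during injection: R(t) = R_eq * (1 - exp(-(ka*C + kd) * t))."""
    k_obs = ka * conc + kd
    r_eq = rmax * ka * conc / k_obs  # equivalent to rmax * C / (C + K_D)
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_phase(t: float, r0: float, kd: float) -> float:
    """1:1 Langmuir response after the injection ends: R(t) = R0 * exp(-kd * t)."""
    return r0 * math.exp(-kd * t)


def half_life(kd: float) -> float:
    """Half-life (s) of the complex implied by a first-order off-rate."""
    return math.log(2) / kd


# Hypothetical rate constants, chosen only to give K_D = 4.5 uM
ka, kd = 1.0e4, 0.045          # M^-1 s^-1, s^-1
conc = 10e-6                   # 10 uM analyte
print(kd_from_rates(ka, kd))   # about 4.5e-06 M
end_of_injection = association_phase(120, conc, ka, kd, rmax=50.0)  # 2 min injection
print(dissociation_phase(half_life(kd), end_of_injection, kd) / end_of_injection)  # about 0.5
```

Because KD is a ratio, a tenfold faster association with an unchanged off-rate lowers KD tenfold, which is the pattern described for αCP1-KH1 below.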
Figure 6.14: Binding studies of αCP1-KH1 and KH3 to a 10-mer poly (A) (adenine) sequence and to a sequence containing a CCC (cytosine) triplet. 30 RU of biotinylated RNA and DNA were immobilized on an SA chip. Binding interactions were measured for a series of dilutions of the αCP1-KH1 and KH3 domains (10, 5, 2.5, 1.25, 0.625 and 0.312 µM), injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min. αCP1-KH1 is characterised by slower association and dissociation rates, whereas αCP1-KH3 is characterised by fast association and dissociation rates. A similar pattern is observed for their interaction with DNA; however, αCP1-KH1 appears to prefer DNA to RNA, as evident from its higher response, while αCP1-KH3 seems to bind equally well to RNA and DNA.
Visual inspection of the binding curves shows no significant overall difference
between the RNA and DNA responses of the αCP1-KH domains. However, the maximum
response of each αCP1-KH domain to the RNA and DNA surfaces is different. αCP1-KH1
gives rise to a lower maximum response than αCP1-KH3 and, in
addition, has a lower response to RNA than to DNA. In contrast, αCP1-KH3 gives rise to
the higher maximum response, with very similar responses for both RNA
and DNA. The stoichiometry of the molecular complex was calculated for each
αCP1-KH/RNA and αCP1-KH/DNA complex using equation 11.
In the case of αCP1-KH1, the calculated stoichiometry was 0.12 and 0.23 for RNA and DNA,
respectively, indicating a substoichiometric interaction of the protein with both the RNA and
DNA surfaces. The αCP1-KH3 interaction was also substoichiometric, with a value of 0.6
for both RNA and DNA. This low stoichiometry could be due to steric hindrance or a
crowding effect of the ligand (Figure 6.16), which would block the analyte from gaining
access to the ligand. The ligand could also have interacted with itself, forming secondary
structures, or with the chip surface. Another possible reason for this low stoichiometry is
impure ligand, in which only a small fraction of the immobilized material represents ligand
molecules, although both the RNA and DNA were purchased in purified form. Some of the
ligand could also have been inactivated by the immobilization conditions.
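Equation 11 is not reproduced in this section. Assuming it takes the standard SPR form, the theoretical maximum response is the immobilised ligand level scaled by the analyte-to-ligand molecular weight ratio (and by the number of sites per ligand), and the stoichiometry is the observed Rmax divided by that value. The sketch below uses the 30 RU immobilisation level from these experiments; the molecular weights and observed Rmax are hypothetical placeholders, not the values behind 0.12, 0.23 or 0.6.

```python
def theoretical_rmax(ligand_ru: float, mw_analyte: float, mw_ligand: float,
                     sites_per_ligand: int = 1) -> float:
    """Response expected if every immobilised ligand binds `sites_per_ligand` analytes."""
    return ligand_ru * (mw_analyte / mw_ligand) * sites_per_ligand


def stoichiometry(rmax_observed: float, ligand_ru: float,
                  mw_analyte: float, mw_ligand: float) -> float:
    """Observed analyte molecules bound per immobilised ligand molecule."""
    return rmax_observed / theoretical_rmax(ligand_ru, mw_analyte, mw_ligand)


# Hypothetical values: 30 RU of a ~3.5 kDa oligonucleotide and a ~8 kDa KH domain
rmax_theory = theoretical_rmax(30.0, 8000.0, 3500.0)
n = stoichiometry(20.0, 30.0, 8000.0, 3500.0)
print(round(rmax_theory, 1), round(n, 2))  # 68.6 0.29
```

This mass-ratio form assumes protein and oligonucleotide give a similar response per unit mass, which holds only approximately, so small stoichiometry values should be interpreted with some caution.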
Figure 6.15: Kinetic analysis of αCP1-KH1 and αCP1-KH3 binding to the CCC triplet RNA and DNA sequences. 30 RU of biotinylated RNA and DNA were immobilized on an SA chip. Binding interactions were measured for a series of dilutions of the αCP1-KH1 and KH3 domains (10, 5, 2.5, 1.25, 0.625 and 0.312 µM), injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min. The black dotted lines represent the experimental data and the purple solid lines represent the best fit to a simple bimolecular reaction model.
Figure 6.16: Steric hindrance. The limited availability of the binding site leading to substoichiometric binding could result from steric hindrance: (1) the analyte (A) has easy access to the ligand (L); (2) the ligand is inaccessible; (3) binding of one analyte molecule may prevent access of another analyte molecule to the ligand.
Further analysis of each domain reveals that αCP1-KH1 binds more slowly to RNA than to
DNA. The DNA response shows a faster association, indicated by the steep initial rise,
which is not observed in the RNA response. Moreover, αCP1-KH1 forms a much more
stable complex with the DNA. This is apparent from the binding constants obtained from the
1:1 binding model (Table 3; Figure 6.15). The association rate for αCP1-KH1/DNA is ten
times faster than that for αCP1-KH1/RNA, resulting in an equilibrium dissociation constant
for αCP1-KH1/DNA tenfold lower than that for αCP1-KH1/RNA (4.5 versus 48 μM). This
trend agrees with the KD values obtained from steady-state analysis, although the absolute
values differ by about tenfold. These results suggest that αCP1-KH1 shows a preference for
binding to DNA.
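The steady-state KD is obtained independently of the rate constants, by fitting the plateau (equilibrium) responses against analyte concentration to a Langmuir isotherm, Req = Rmax C / (KD + C). The sketch below shows one simple, standard-library way to do that fit: for each trial KD on a logarithmic grid the model is linear in Rmax, so the least-squares Rmax has a closed form. The grid bounds and the synthetic data (generated from KD = 4.5 µM and Rmax = 50 RU at the concentration series used here) are assumptions for illustration only.

```python
from typing import Sequence, Tuple


def fit_steady_state(conc: Sequence[float], req: Sequence[float],
                     kd_min: float = 1e-8, kd_max: float = 1e-3,
                     steps: int = 2000) -> Tuple[float, float]:
    """Fit R_eq = Rmax * C / (K_D + C) by scanning K_D on a log grid.

    For each trial K_D, x = C / (K_D + C) makes the model linear in Rmax, so the
    least-squares Rmax is sum(x*y) / sum(x*x); the best (K_D, Rmax) pair is returned.
    """
    best_sse, best_kd, best_rmax = float("inf"), kd_min, 0.0
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        x = [c / (kd + c) for c in conc]
        rmax = sum(xi * yi for xi, yi in zip(x, req)) / sum(xi * xi for xi in x)
        sse = sum((rmax * xi - yi) ** 2 for xi, yi in zip(x, req))
        if sse < best_sse:
            best_sse, best_kd, best_rmax = sse, kd, rmax
    return best_kd, best_rmax


# Synthetic plateau responses generated from K_D = 4.5 uM and Rmax = 50 RU
concs = [10e-6, 5e-6, 2.5e-6, 1.25e-6, 0.625e-6, 0.3125e-6]
plateaus = [50.0 * c / (4.5e-6 + c) for c in concs]
kd_fit, rmax_fit = fit_steady_state(concs, plateaus)
print(f"K_D = {kd_fit * 1e6:.2f} uM, Rmax = {rmax_fit:.1f} RU")
```

Kinetic and steady-state estimates can diverge, for example when responses do not reach a true plateau within a 2 min injection, which is one possible reason for the tenfold difference noted above.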
In contrast to αCP1-KH1, the association and dissociation rate constants of αCP1-KH3 did
not differ significantly between RNA and DNA, consistent with the
shape of the binding curves. This suggests that αCP1-KH3 binds RNA and DNA sequences
equally well. Similar binding to RNA and DNA has been described for the KH3
domain of the closely related protein hnRNP K. Based on NMR titrations of the
hnRNP K-KH3 domain, the dissociation constants obtained for the RNA sequence UCCC and
the DNA sequence TCCC were 1.8 and 2.2 μM, respectively. This is consistent with the
observation in the same study that no close contacts or steric clashes arose when RNA was
modelled into the hnRNP K-KH3/DNA complex by replacing the H2’ hydrogen of the
deoxyribose with the 2’-hydroxyl group of ribose and replacing Thy2 with a uridine
(Backe et al., 2005).
αCP1-KH2 was excluded from this experiment, as the protein had degraded, but we
predicted that it should also bind the DNA sequence. It will be of great interest to conduct
these experiments with αCP1-KH2 in the future.
6.6.5 αCP1-KH domain binding specificity
Our earlier experiments revealed that the KH domains of αCP1 are capable of binding to
both RNA and DNA sequences representing the 30-nucleotide 3’UTR sequence of AR. In
addition, they are able to bind a 10-mer RNA or DNA sequence containing only a single
C-rich site (AAA AAA CCC A). Next, we were interested in investigating the significance of
each cytosine in the C-rich site by testing whether KH domains can tolerate mutation of
any of the cytosines. We systematically mutated each cytosine (C) to thymine (T), since
this pyrimidine should not preclude binding by the αCP1-KH domains through steric
hindrance. Any loss of binding observed would therefore signify the loss of a specific
interaction. DNA sequences were used for these experiments, as DNA is much easier to
handle than RNA. The sequences used are listed below:
1) AAA AAA TTT A (DNA control sequence known as TTT)
2) AAA AAA TCC A (DNA sequence known as ATCC)
3) AAA AAA CTC A (DNA sequence known as ACTC)
4) AAA AAA CCT A (DNA sequence known as ACCT)
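The probe set above is a single-substitution scan of the cytosine triplet in the parent probe (AAA AAA CCC A), together with an all-thymine control. The short sketch below generates the same sequences programmatically to make the design logic explicit; the function name is hypothetical. The names used in the text (ATCC, ACTC, ACCT) include the adenine immediately 5’ of the triplet.

```python
def c_to_t_scan(seq: str, core_start: int, core_len: int = 3) -> list:
    """Return the single C->T mutants of a cytosine run, ordered 5' to 3'."""
    mutants = []
    for offset in range(core_len):
        i = core_start + offset
        if seq[i] != "C":
            raise ValueError(f"position {i} is {seq[i]!r}, expected 'C'")
        mutants.append(seq[:i] + "T" + seq[i + 1:])
    return mutants


parent = "AAAAAACCCA"               # single-site probe with spaces removed
control = parent.replace("C", "T")  # the TTT control
for name, probe in zip(["ATCC", "ACTC", "ACCT"], c_to_t_scan(parent, core_start=6)):
    print(name, probe)
```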
To study the kinetics of αCP1-KH1 and KH3 interaction with the above probes on the
biosensor, the DNA control probe and the target DNA probes were captured on the SA chip
flow cells 1 to 4 in the same order as listed above. A representative data set for the binding
reaction is shown in Figures 6.17 and 6.18.
Figure 6.17: Binding studies of αCP1-KH1 to DNA sequences in which each position of the CCC triplet was systematically mutated to thymine. 30 RU of biotinylated DNA was immobilized on an SA chip. Binding interactions were measured for a series of dilutions of the αCP1-KH1 domain (10, 5, 2.5, 1.25 and 0.625 µM), injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min. The TTT triplet sequence was used as a control, and αCP1-KH1 does not bind to it. αCP1-KH1 also does not bind to the TCC or CTC mutant sequences, but binds slightly to the CCT mutant sequence.
Figure 6.18: Binding studies of αCP1-KH3 to DNA sequences in which each position of the CCC triplet was systematically mutated to thymine. (A) 30 RU of biotinylated DNA was immobilized on an SA chip. Binding interactions were measured for a series of dilutions of the αCP1-KH3 domain (10, 5, 2.5, 1.25 and 0.625 µM, from top to bottom), injected in order of increasing concentration for 2 min at a flow rate of 50 µl/min. The TTT triplet sequence was used as a control, and αCP1-KH3 does not bind to it. αCP1-KH3 also does not bind to the TCC or CTC mutant sequences, but binds slightly to the CCT mutant sequence. (B) The chemical structures of thymine and cytosine.
No binding was detected to the TTT DNA sequence, as anticipated. Neither αCP1-KH1
nor αCP1-KH3 showed binding to the target DNA ATCC sequence. This showed that the
KH domains do not tolerate another base in the second position of the core sequence. This
matches well with the data obtained from our structural studies of αCP1-KH1 with DNA and
also with other structures such as αCP2-KH1/DNA, hnRNP K-KH3/DNA and Nova-2-KH3/RNA.
In each case, a core motif of four nucleotides can be identified that is recognized by the
KH domain, and each of these bases plays an essential role in binding. The four core bases
essential for KH binding are TCCC (DNA), ACCC (DNA), (T/C)CCC (DNA) and UCAY (RNA,
where Y is a pyrimidine) for αCP1-KH1, αCP2-KH1, hnRNP K-KH3 and Nova-2-KH3,
respectively (Backe et al., 2005; Du et al., 2005; Lewis et al., 2000). Analysis of these
structures reveals that the identity of the base in position 1 is highly variable; it is not
involved in base-specific interactions. The bases in positions 1 and 2 act as a molecular
prong holding onto helix 1 of the KH domain. However, recognition of the second nucleotide
(cytosine in all four complexes) in the core sequence is extremely specific. The cytosine in
position 2 (Cyt2) is specifically recognised by a number of hydrogen bonds in all the
complexes. In particular, a conserved arginine hydrogen bonds through NH1 and NH2 to the
O2, C2, N3 and C4 atoms of the cytosine pyrimidine ring. In addition, the backbone carbonyl
of Gly22 is positioned to make a specific hydrogen bond contact to Cyt N4. This recognition
mode is present in all the complexes, and this unique set of hydrogen bonds is only
compatible with a cytosine in this position. As confirmed by our SPR data, the equivalent
interactions are not possible with a thymine in this position. Furthermore, van der Waals
contacts are mediated through a set of conserved hydrophobic residues, and these appear to
be pyrimidine-specific rather than cytosine-specific. However, our SPR results show that
interactions made with the conserved hydrophobic residues alone are not sufficient for KH
binding. Binding requires the complete set of hydrogen bonds as well as the hydrophobic
interactions, which can only be formed if a cytosine base is present in position 2 of the
core sequence.
αCP1-KH1 and KH3 were also intolerant of mutation of the cytosine in position 3. When the
cytosine in position 3 was mutated to thymine (ACTC), binding was also completely
abolished, as shown in Figures 6.17 and 6.18. This is also consistent with structural studies.
In the case of poly (C) binding proteins, Cyt3 is also involved in a specific network of
hydrogen bonds. Here, a conserved arginine and isoleucine make extensive hydrogen
bonds to the N3, N4, O2 and C2 atoms of the cytosine (Figure 6.19). Again, it is evident that
this unique set of hydrogen bonds is only possible with a cytosine in this position.
Interestingly, the third residue of the core tetranucleotide for Nova-2-KH3 is an adenine,
which forms the same hydrogen bonding partners as a cytosine in this position: N6 of
adenine and N4 of cytosine each hydrogen bond to the backbone carbonyl oxygen of a
conserved isoleucine residue. It is not known whether αCP1-KH domains would similarly
tolerate adenine.
Figure 6.19: Schematic depicting the complex αCP1-KH1/DNA. The four core bases TCCC are shown in orange, making contact with a number of amino acids shown with element colours. These contacts are essential for KH binding.
Lastly, we mutated the fourth cytosine (ACCT) and here, interestingly, the domains did
tolerate thymine to a small extent, as evident from the small response in the binding curves.
The fact that some binding was observed for ACCT but not ATCC suggests that the αCP1-KH
domains are binding in the intended register, rather than simply accommodating a CC in the
second and third positions. In addition, αCP1-KH3 gave a higher response than αCP1-KH1.
Recognition of the last residue of the tetranucleotide is less specific in all the complexes.
In hnRNP K-KH3, recognition of Cyt4 is made through a water-mediated hydrogen bond to the
protein, while in αCP2-KH1 and Nova-2-KH3 this fourth base is stabilised by stacking with
the bases on either side, and van der Waals contacts are maintained by conserved amino acid
residues and a hydrogen bond to Glu51. In our αCP1-KH1/DNA structure, this residue is a
cytosine, and the structure reveals that it is not as extensively contacted by the KH domain
as the first three bases. It also shows that conserved amino acid residues are positioned
close by and are able to make contacts with the O2 and N3 atoms of the pyrimidine ring.
Therefore, irrespective of whether this position contains a cytosine, uracil or thymine,
some contact is still possible.
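The register argument can be made concrete with a toy rule distilled from the structural comparison above: within a four-nucleotide core, position 1 is unconstrained, positions 2 and 3 must be cytosine, and position 4 must be a pyrimidine, with cytosine favoured. The sketch below is a hypothetical scoring scheme, not a fitted model. It scans every four-nucleotide window of each probe and contrasts the result with a naive rule that asks only whether a CC dinucleotide is present. Only the position-specific rule reproduces the observed pattern, including the absence of binding to ATCC.

```python
PYRIMIDINES = {"C", "T", "U"}


def window_class(core: str) -> str:
    """Classify a 4-nt window under the toy rule N-C-C-Y (C preferred at position 4)."""
    if core[1] != "C" or core[2] != "C":
        return "none"    # positions 2 and 3 require cytosine
    if core[3] == "C":
        return "strong"
    if core[3] in PYRIMIDINES:
        return "weak"    # T/U at position 4 still allows partial contacts
    return "none"        # a purine at position 4 is assumed not to be accommodated


def predict(seq: str) -> str:
    """Best class over all 4-nt windows of a probe."""
    rank = {"none": 0, "weak": 1, "strong": 2}
    classes = [window_class(seq[i:i + 4]) for i in range(len(seq) - 3)]
    return max(classes, key=rank.__getitem__, default="none")


probes = {
    "CCC": "AAAAAACCCA", "TTT": "AAAAAATTTA", "ATCC": "AAAAAATCCA",
    "ACTC": "AAAAAACTCA", "ACCT": "AAAAAACCTA",
}
for name, seq in probes.items():
    print(f"{name:5s} toy rule: {predict(seq):6s} naive CC rule: {'CC' in seq}")
```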
6.7 Conclusions
There are several structural studies of single KH domains complexed with DNA and RNA,
which have provided great insight into the molecular basis of their sequence-specific
binding. However, little has been reported regarding the binding affinities and
kinetics of KH domains, in particular of the αCP1-KH domains. This study represents the
first detailed analysis of the binding of αCP1-KH domains to RNA and DNA sequences. The
findings verify some of the predictions from structural studies, especially the requirement
for a core tetranucleotide recognition sequence for maximal KH binding. Our mutational
studies of the four core bases confirmed the importance of cytosine in positions 2 and 3.
Furthermore, this is the first time that binding of αCP1-KH2 to DNA has been observed. The
binding is consistent with the conservation of amino acid residues shown to be important
for αCP-KH domain architecture and oligonucleotide binding. If such an interaction also
occurs in vivo, then it is likely that, when held in the proximity of oligonucleotide in the
context of full-length αCP1, αCP1-KH2 participates in the oligonucleotide binding
interaction, even in the case of RNA.
This study showed that isolated αCP1-KH domains prefer the DNA sequence over the RNA
sequence of AR mRNA, but this was not consistently observed with a simple 10-mer sequence
containing only one cytosine triplet: αCP1-KH1 exhibited a stronger affinity for the 10-mer
DNA, whereas αCP1-KH3 did not show any significant difference in affinity for the 10-mer
RNA or DNA. What can be concluded from this study is that each of the individual domains
can function as a discrete and independent RNA- and DNA-binding unit, albeit with different
levels of binding activity. These results indicate that differences in αCP1 binding activity
towards RNA and DNA sequences arise from binding by different KH domains. Such
differential αCP1-KH domain behavior has also been shown in previous studies. For example,
αCP1 participates in the regulation of the mouse μ-opioid receptor (MOR) gene. EMSA
studies have shown that MOR DNA-binding activity is mainly due to the KH1 domain, with
some activity from the KH2 and KH3 domains of αCP1 (Malik et al., 2006). Another study
has shown that the KH3 domain of hnRNP K, another poly (C) binding protein, is the main
domain recognizing DNA sequences (Braddock et al., 2002a; Paziewska et al., 2004). These
studies highlight structural differences between αCP1 and hnRNP K: both can bind C-rich
sequences, but primarily through different domains.
Each KH domain of αCP1 may also behave differently in the context of the full-length protein.
For example, αCP1 can bind to the α-globin 3'-UTR effectively, leading to stabilization of
the mRNA, but hnRNP K cannot (Waggoner and Liebhaber, 2003b). In addition, αCP1 can
bind the MOR DNA sequence (which contains a poly (C) stretch), but hnRNP K cannot. These
studies raise the interesting question of whether the differences in binding between these
two proteins result from differences in protein topology or from structural cooperativity
between the different domains.
To study rigorously the binding behavior of αCP1-KH domains and their preferences for
RNA or DNA sequences, it will also be important to examine combinations of two sequential
domains. This may reveal either additive or synergistic effects of the combined domains
on overall binding, and may in turn shed light on sequence preference. However, it is
important to note that, in biological systems, binding of αCP1-KH domains to either RNA or
DNA sequences may lead to modulation of gene expression.
Chapter 7 General Discussion and Future Work
Chapter 7: General Discussion and Future Work
169
7.1 Chapter Overview
We now appreciate that mRNA stability represents a key point of regulation in the control
of gene expression of a vast array of molecules. The length of time an mRNA transcript
spends in the cytoplasm or nucleus prior to degradation can have a profound influence on the
amount of the final protein product in the cell, which in turn can modulate the biological
activity of the cell. Some mRNAs have half-lives of minutes, whereas for others the half-life
is several hours (Chkheidze et al., 1999; Jacobson and Peltz, 1996). Given that a large
range of proteins involved in key biological processes are regulated at the level of mRNA
decay, the field has focused on understanding the functional and structural biology
associated with basic protein-mRNA interactions.
The identification of mRNA cis-elements that interact with trans-acting RNA-binding
proteins has enabled detailed characterisation of the molecular mechanisms involved in
regulating mRNA decay. The cis-element that I focused on in this thesis is a highly
conserved, UC-rich 50 nt sequence in the 3’UTR of AR mRNA.
The AR plays a critical role in the growth of prostate cancer. Prostate cancer constitutes a
major health issue in Western countries, where it is now the second leading cause of cancer
deaths (Heinlein and Chang, 2004). Defining the mechanisms that modulate AR gene
expression is of both scientific and clinical interest, and would be of great value for the
development of novel prostate cancer therapies. AR expression is maintained throughout
prostate cancer progression, and the majority of androgen-independent or
hormone-refractory prostate cancers express AR. AR mutations, particularly those affecting
ligand specificity, may contribute to the progression of prostate cancer and the failure of
endocrine therapy by permitting AR transcriptional activation in response to antiandrogens.
In addition, differences in the relative expression of AR coregulators have been found to
occur with prostate cancer progression and may contribute to differences in AR ligand
specificity or transcriptional activity (Heinlein and Chang, 2004).
The UC-rich cis-element of AR is thus an ideal target for investigation, as the AR mRNA is
regulated significantly at the level of stability. The element is a target for at least two
families of RNA-binding proteins and reporter assays indicate the importance of the
element in regulating AR mRNA turnover.
The present study was designed to explore the mechanism of binding of αCP1 to the target
UC-rich sequence in the 3’UTR of AR mRNA, in order to begin to understand how the
larger HuR/αCP1 complex might regulate the stability of AR mRNA. My aims were to
determine the structural basis of αCP1 binding to the C-rich region of AR mRNA,
with reference to its affinity and specificity. In addition, I aimed to characterise the kinetics
and binding affinities of the isolated αCP1-KH domains 1, 2 and 3 with target probes, as well
as with a variety of other RNA and DNA probes.
7.2 Stability of αCP1-KH domains
Previous studies have suggested that the minimum size of a KH domain is 68-72 amino
acids, with the following domain boundaries: αCP-KH1 15-80, αCP-KH2
100-167 and αCP-KH3 282-348 (Ito et al., 1994). Our initial αCP1-KH domain constructs of
62-65 amino acids proved unstable, resulting in aggregation of the protein in the insoluble
fraction. Each domain lacked approximately 10 amino acid residues at the C-terminus
compared with the domain boundaries of previous studies (Dejgaard and
Leffers, 1996), suggesting that the aggregation of the protein may have been due to the
missing residues. The C-terminus therefore appeared to confer a stabilizing effect upon the
domain. The missing C-terminal residues corresponded to almost half of the third α-helix.
The absence of this helix could expose some of the hydrophobic core residues to a
hydrophilic environment and hence cause aggregation of the protein. This stabilizing effect
of the C-terminus was supported when protein aggregation did not occur upon extension of
the αCP1-KH domains at the C-terminus. However, this extension produced soluble protein
only for αCP1-KH1 and KH3, not for KH2. We were able to obtain only once a minimal
amount of the extended KH2 protein, which was not functional. Interestingly, for αCP1-KH2,
a soluble domain was only obtained with the exclusion of the third helix. Based upon
peptide sequences commonly found in unstable proteins, the C-terminal region of αCP1-KH2
in particular was predicted to render this domain unstable. Our truncated αCP1-KH2 not only
proved to be stable but was also functional. The protein was correctly folded, as confirmed by our
circular dichroism data, and in addition, it bound oligonucleotides, contrary to previous
reports (Dejgaard and Leffers, 1996).
7.3 Structural studies of αCP1
It was assumed that αCP interacts with AR mRNA through its KH domains, based on
structural studies of single KH motifs in the presence of RNA. Our structural and molecular
dynamics studies of αCP1-KH3 not only reveal that αCP1-KH3 adopts a classical type I KH
domain fold, with a triple-stranded β-sheet held against a three-helix cluster in a βααββα
configuration, but also, through our homology model of αCP1-KH3 with poly (C) RNA,
provide insight into the molecular basis for oligonucleotide binding and poly (C) RNA
specificity as an initial step towards characterising the full-length protein.
Structural analysis of several KH domains in the presence of oligonucleotide has revealed
that the main oligonucleotide contacts involve the narrow hydrophobic cleft that runs
between α-helix 2 and β-strand 2 and across the GXXG motif. It is thought that the
narrowness of the cleft confers the specificity of these KH domains for pyrimidines.
Likewise, αCP1-KH3 possesses a narrow hydrophobic cleft that would be expected to
accommodate pyrimidine-rich RNA or ssDNA, rather than the larger purine bases (Chapter
4). Specificity for cytosine over uracil or thymine can also be rationalized on the basis of
specific hydrogen bond interactions with the cytosine C2 carbonyl, N3 and C4 functionalities.
Preferential binding to RNA over ssDNA would be explained in part by intermolecular
hydrogen bonding of the sugar hydroxyls. It may also be that a poly (C) RNA oligonucleotide
is able to conform closely to the binding cleft, with inter-nucleotide hydrogen bonds from
sugar hydroxyls stabilizing this conformation. On the other hand, C-rich ssDNA has been
shown to make very similar interactions with hnRNP K-KH3, a closely related KH domain,
and is reported to bind just as well as, if not better than, RNA (Braddock et al., 2002a).
Our study of a single isolated KH domain provides insight into the way in which single KH
domains recognize RNA, but does not reveal the way in which tandem repeats interact with
RNA. The presence of multiple KH domains in the αCP proteins raises the question of which
domain or combination of domains dictates RNA-binding specificity and affinity.
Previous studies of αCP isoforms have revealed that the optimal target sequence for αCP2
encompasses three short C-stretches within the RNA target, suggesting that each of the
three KH domains may play a role in binding to the RNA (Chkheidze et al., 1999). This
mode of nucleic acid/αCP interaction would increase the sequence specificity and affinity
of the interaction, potentially by maintaining the interacting partners in a biologically
significant conformation. In another study (Thisted et al., 2001), the interaction of αCP1 or
αCP2 with the C-rich sequence of the poliovirus 5’UTR was shown to be mediated via KH1.
Although all three KH domains are capable of binding nucleic acids, here they differ
functionally. In addition, for hnRNP K, which is closely related to the αCP proteins in both
the number and organization of its KH domains, the optimal target sequence has been shown
to be a single short C-stretch (Thisted et al., 2001). These data suggest that, whereas a single
KH domain in hnRNP K mediates a high-affinity interaction, a tandem array of three C-patches
maximises αCP binding to its RNA target. Therefore, the binding of αCP to its optimised
target might reflect individual interactions by each of the three KH domains. The questions
of how these multiple contacts are organised and why the closely related αCP and hnRNP K
proteins differ in their RNA binding await further structural studies of the respective
RNA-protein complexes.
It is likely that the three KH domains of αCPs act synergistically within the protein and
thereby modulate the overall affinity of the individual domains. An example of KH-domain
collaboration is seen for FUSE binding protein in which KH domains 3 and 4 are both
necessary and sufficient for the binding of the protein to the DNA promoter region
upstream of c-myc (Braddock et al., 2002b).
The tandem arrangement of poly (C) stretches in the human AR C-rich motif is very
similar to that of other well-characterized nucleic acid targets of the αCP1/2 proteins,
such as the 3’UTR of α-globin mRNA. Three C-rich stretches are present in the 3’UTR of
AR mRNA (Yeap et al., 2002), spaced in a manner that would allow all three KH domains to
bind. Two of these poly (C) regions lie within the 51-nt cis element shown to bind αCP1 and
αCP2 and to stabilise the mRNA (Yeap et al., 2002), and could be the binding sites for
αCP1-KH1 and KH2. A third C-rich stretch,
however, lies 7 bases downstream and could readily be targeted by αCP1-KH3. Our
αCP1-KH1/DNA structure (Chapter 5) provides partial support for this model. In that
structure, αCP1-KH1 bound as a dimer to the 11-nucleotide DNA sequence 5’-TTCCCTCCCTA-3’,
which contains the two C-rich target elements from the AR 3’UTR. One monomer contacts the
5’-proximal TCCC and the other the 3’-proximal TCCC. Although the two KH domains bound to
the same oligonucleotide are positioned very close together, they do not contact one
another. This reveals how two KH domains may be closely juxtaposed when bound at adjacent
C-rich sites. Our study has confirmed that a single KH domain contacts four bases and that
there is no steric hindrance to the binding of two KH domains to adjacent stretches of the
same oligonucleotide. This, together with our demonstration that αCP1-KH2 does, in fact,
bind oligonucleotides (Chapter 6), suggests that all three KH domains of full-length αCP
proteins are involved in oligonucleotide binding.
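The two adjacent four-base footprints on the 11-mer can be illustrated with a simple scan for XCCC tetranucleotides, the core recognition unit described in Section 7.4. The helper below is hypothetical and was not software used in this work. It assumes that any base followed by CCC is a candidate site and places footprints greedily, 5’ to 3’, without overlap. It ignores flanking context and affinity entirely.

```python
"""Locate candidate KH-domain footprints (XCCC tetranucleotides) in a sequence.

Hypothetical, deliberately simplified sketch: any base followed by CCC is
treated as a potential four-base KH footprint; affinity is not modelled.
"""
from __future__ import annotations


def find_xccc_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, site) for each non-overlapping XCCC site, 5'->3'."""
    seq = seq.upper().replace("U", "T")  # assumption: treat RNA and DNA alike
    sites = []
    i = 0
    while i + 4 <= len(seq):
        window = seq[i:i + 4]
        if window[1:] == "CCC":
            sites.append((i, window))
            i += 4  # a bound KH domain occupies all four bases
        else:
            i += 1
    return sites


def gaps_between(sites: list[tuple[int, str]]) -> list[int]:
    """Number of unoccupied bases between consecutive four-base footprints."""
    return [b[0] - (a[0] + 4) for a, b in zip(sites, sites[1:])]


if __name__ == "__main__":
    probe = "TTCCCTCCCTA"  # 11-mer used in the alphaCP1-KH1/DNA structure
    sites = find_xccc_sites(probe)
    print(sites)                # [(1, 'TCCC'), (5, 'TCCC')]
    print(gaps_between(sites))  # [0]
```

A gap of zero matches the structural observation that two KH domains can occupy directly adjacent TCCC elements without steric clash.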
7.3.1 αCP1-KH1/DNA and other KH/nucleotide complexes
Previous studies focused primarily on the biological role of αCP1-KH domains in binding RNA,
the exception being the recent report of αCP2-KH1 in complex with telomeric DNA (Du et al.,
2005). In this study we have established the ability of αCP1-KH1 to recognise the 11-mer
poly (C) DNA sequence representing the poly (C)-rich site in the 3’UTR of AR mRNA.
Comparison of the αCP1-KH1/DNA complex with previously solved structures of KH domains
bound to RNA reveals the same nucleic acid-binding cleft and a very similar mode of
interaction. This suggests that αCP1-KH domains bind RNA through very similar interactions.
This differs from some other dual-specificity nucleic acid-binding proteins, such as the
Xenopus transcription factor TFIIIA, which recognises structurally different RNA and DNA
targets using different protein motifs (Cassiday and Maher, 2002).
Analysis of the structures of various KH domains in complex with RNA and DNA has
revealed a number of common features, despite differences in the nucleic acid targets and
in the detailed interactions dictating specificity. All of the KH domains share the same
overall topology and a common binding groove formed by the variable loop, the GXXG loop, the
two α helices and the second β strand (Chapters 4 and 5). The floor of this binding groove
is hydrophobic, and its edges on both sides are charged and hydrophobic. The core
recognition sequence is single-stranded and extended and lies in the binding groove with
its 5’ end at the top of the groove. The bases point inward towards the right side of the
groove, while the sugar-phosphate backbone faces the left side. There it neutralises the
positively charged residues lining that side of the binding site. Common to all of the
complexes is the hydrophobic interaction between the nucleic acid bases and the hydrophobic
floor, whereas specificity is dictated by hydrogen bonds between protein side chains and the
functional groups of the bases. In all of these structures, the core recognition sequence
is four nucleotides long. The same features are seen in our αCP1-KH1/DNA structure, in which
a number of specific hydrogen bonds define poly (C) specificity.
7.4 αCP1-KH binding kinetics
Previous studies have reported that αCP binds not only poly (C)-rich RNA but also single-stranded and double-stranded DNA (Dejgaard and Leffers, 1996). In our study, all three αCP1-KH
domains bound the poly (C)-containing AR DNA, with affinities in the order
αCP1-KH1 > KH3 > KH2. The affinities of αCP1-KH1 and αCP1-KH3 for AR RNA were
considerably lower than for AR DNA. However, a different pattern emerged when binding was
monitored in a simpler system (Chapter 6, Section 6.6.4). Here αCP1-KH1 preferred the single
poly (C) DNA sequence over the RNA sequence, whereas αCP1-KH3 showed no significant
difference between its RNA and DNA binding affinities. Furthermore, αCP1-KH2 did not exhibit
any binding to RNA, consistent with previous studies.
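SPR affinities such as these are commonly obtained by fitting sensorgrams to a 1:1 (Langmuir) interaction model, in which KD = kd/ka. The sketch below is a minimal illustration of that model only. It assumes simple 1:1 binding with no mass-transport or rebinding effects, and it is not the fitting procedure or software used in Chapter 6. The rate constants in the usage section are hypothetical placeholders, chosen solely to reproduce the qualitative order KH1 > KH3 > KH2. They are not measured values.

```python
"""Minimal 1:1 Langmuir model of an SPR interaction (illustrative sketch)."""
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Langmuir1to1:
    """Idealised A + B <-> AB interaction as commonly used in SPR fitting."""

    ka: float    # association rate constant (M^-1 s^-1)
    kd: float    # dissociation rate constant (s^-1)
    rmax: float  # response at surface saturation (RU)

    @property
    def KD(self) -> float:
        """Equilibrium dissociation constant (M); smaller means tighter binding."""
        return self.kd / self.ka

    def req(self, conc: float) -> float:
        """Steady-state response for analyte concentration conc (M)."""
        return self.rmax * conc / (conc + self.KD)

    def association(self, conc: float, t: float) -> float:
        """Response t seconds into an injection that starts from zero response."""
        k_obs = self.ka * conc + self.kd  # pseudo-first-order observed rate
        return self.req(conc) * (1.0 - math.exp(-k_obs * t))

    def dissociation(self, r0: float, t: float) -> float:
        """Response t seconds after switching to buffer, starting from r0."""
        return r0 * math.exp(-self.kd * t)


def rank_by_affinity(models: dict[str, Langmuir1to1]) -> list[str]:
    """Return names ordered from tightest (smallest KD) to weakest binder."""
    return sorted(models, key=lambda name: models[name].KD)


if __name__ == "__main__":
    # Hypothetical rate constants, chosen only to reproduce the qualitative
    # order KH1 > KH3 > KH2; these are NOT measured values from this work.
    models = {
        "KH1": Langmuir1to1(ka=1e5, kd=1e-3, rmax=100.0),  # KD = 10 nM
        "KH2": Langmuir1to1(ka=1e4, kd=1e-1, rmax=100.0),  # KD = 10 uM
        "KH3": Langmuir1to1(ka=1e5, kd=1e-2, rmax=100.0),  # KD = 100 nM
    }
    print(rank_by_affinity(models))  # ['KH1', 'KH3', 'KH2']
```

Separating ka and kd in this way shows that two domains with similar KD values can still differ markedly in on- and off-rates. This is one reason kinetic SPR data carry more information than equilibrium affinities alone.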
Our kinetic studies also revealed the preferred and minimum sequence required for the
interaction of αCP1-KH domains with oligonucleotides. Of the sequences tested, these domains
bound only C-rich sequences. Binding to the C-rich sequence is primarily mediated by four
core recognition bases (XCCC, where X differs between complexes). The KH domains did not
tolerate any base other than cytosine at positions 2 and 3 of the core recognition sequence;
such substitutions abolished binding completely. This is consistent with data from structural
studies of several KH domains with oligonucleotides. These revealed that the bases at
positions 2 and 3 form a number of specific hydrogen bonds with the protein that can only be
achieved with cytosine at these positions. Position 4 tolerated a different base to a limited
extent, consistent with structural data showing fewer specific hydrogen bonds to the base at
this position.
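The position rules summarised above can be restated as a small lookup function. This is a hypothetical, purely qualitative encoding of the SPR observations and not a predictive model. It encodes three rules: cytosine is required at positions 2 and 3, position 4 is partially tolerated, and position 1 (X) is variable. The "reduced binding" label for a non-C base at position 4 is an assumption standing in for the limited tolerance described in the text. Treating U as T, so that the same rules apply to RNA and DNA, is also an assumption.

```python
"""Qualitative position rules for the four-base KH core recognition unit.

Hypothetical helper restating the observations in Section 7.4; it does not
predict affinities, and the REDUCED category is an assumed simplification.
"""
from enum import Enum


class Outcome(Enum):
    BINDS = "binds"
    REDUCED = "reduced binding"
    ABOLISHED = "no binding"


def classify_core(core: str) -> Outcome:
    """Classify a 4-base core (positions 1-4, 5'->3'); position 1 (X) is free."""
    core = core.upper().replace("U", "T")  # assumption: same rules for RNA and DNA
    if len(core) != 4:
        raise ValueError("core recognition unit must be exactly four bases")
    if core[1] != "C" or core[2] != "C":  # positions 2 and 3: cytosine required
        return Outcome.ABOLISHED
    if core[3] != "C":                    # position 4: only partially tolerated
        return Outcome.REDUCED
    return Outcome.BINDS


if __name__ == "__main__":
    for core in ("TCCC", "ACCC", "TACC", "TCAC", "TCCT"):
        print(core, classify_core(core).value)
```

Keeping position 1 unconstrained reflects the observation that X differs between complexes. Position-specific scoring matrices would be the natural next step if quantitative affinity data for each substitution were available.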
The interaction of αCP1 with single-stranded DNA, observed both on the sensor chip in our SPR
study and in our crystal structure, is consistent with previous studies. The closely related
hnRNP K-KH3 binds specifically to a single-stranded C-rich sequence in the promoter of the
human c-myc gene, activating transcription. In vitro experiments have also shown that both
hnRNP K and αCP1 bind the C-rich strand of human telomeric DNA; whether this interaction is
biologically significant awaits further study.
7.5 Future directions
In summary, these studies have provided new insight into the structural and biophysical
features of the individual αCP1-KH domains. Together with our studies of their interactions
with nucleic acids, they have defined binding modes and affinities that highlight the basis
of poly (C) specificity. However, to understand how the full-length protein interacts with
nucleic acids, further biophysical and structural studies of the full-length protein bound
to both target RNA and ssDNA are required. Such studies would reveal the three-dimensional
arrangement of the protein-nucleic acid complex and clarify the role of each αCP1-KH domain
in binding the target. Another system that should be investigated is the tandem constructs
αCP1-KH1/2 and αCP1-KH2/3. These would elucidate the roles of both αCP1-KH2 and the linker
regions between the domains. Furthermore, the binding affinities of these constructs would
reveal whether KH2 has any cooperative effect on the binding of the KH1 and KH3 domains.
The αCP/AR mRNA complex is unlikely to be the only potential drug target, because the
biological functions imparted by αCP interactions with their target RNAs involve
multicomponent protein-RNA complexes. The AR mRNA is also associated with HuR, which may be
closely linked to the complex and required for mRNA stability. Future experiments should also examine the
structures and binding affinities of each RRM domain of HuR and, ultimately, how these
components associate to form a multiprotein/RNA complex.
There are other examples of multiprotein/RNA complexes, such as α-globin mRNA,
which is stabilised by the formation of the α-complex, comprising PABP, a number of
unidentified proteins and αCP (Kiledjian et al., 1995; Wang et al.,
system that αCP is involved in, there are other proteins that play important roles in
executing a particular biological function. Similarly, the HuR/αCP/AR mRNA complex
may require all of its protein components to be stable and functional. In the long term, this
could present multiple targets for novel therapeutics aimed at specifically disrupting the
αCP/AR mRNA interaction, downregulating AR expression and reducing the growth of
prostate cancer cells.
Chapter 8 References
References
Adams, D. J., Beveridge, D. J., van der Weyden, L., Mangs, H., Leedman, P. J., and Morris, B. J. (2003). HADHB, HuR, and CP1 Bind to the Distal 3'-Untranslated Region of Human Renin mRNA and Differentially Modulate Renin Expression. J Biol Chem 278, 44894-44903. Alan, H., and Phylis, S. (1960). Crystals and Crystal Growing (New York: Anchor Books-Doubleday). Anant, S., Blanc, V., and Davidson, N. O. (2003). Molecular regulation, evolutionary, and functional adaptations associated with C to U editing of mammalian apolipoprotein B mRNA. Prog Nucleic Acid Res Mol Biol 75, 1-41. Baber, J. L., Libutti, D., Levens, D., and Tjandra, N. (1999). High Precision Solution Structure of the C-terminal KH Domain of Heterogeneous Nuclear Ribonucleoprotein K, a c-myc Transcription Factor. Journal of Molecular Biology 289, 949-962. Backe, P. H., Messias, A. C., Ravelli, R. B. G., Sattler, M., and Cusack, S. (2005). X-Ray Crystallographic and NMR Studies of the Third KH Domain of hnRNP K in Complex with Single-Stranded Nucleic Acids. Structure 13, 1055-1067. Baker, N. A., Sept, D., Joseph, S., Holst, M. J., and McCammon, J. A. (2001). Electrostatics of nanosystems: Application to microtubules and the ribosome. PNAS 98, 10037-10041. Bakheet, T., Frevel, M., Williams, B. R. G., Greer, W., and Khabar, K. S. A. (2001). ARED: human AU-rich element-containing mRNA database reveals an unexpectedly diverse functional repertoire of encoded proteins. Nucl Acids Res 29, 246-254. Bakheet, T., Williams, B. R. G., and Khabar, K. S. A. (2003). ARED 2.0: an update of AU-rich element mRNA database. Nucl Acids Res 31, 421-423. Balmer, L. A., Beveridge, D. J., Jazayeri, J. A., Thomson, A. M., Walker, C. E., and Leedman, P. J. (2001). Identification of a Novel AU-Rich Element in the 3' Untranslated Region of Epidermal Growth Factor Receptor mRNA That Is the Target for Regulated RNA-Binding Proteins. Mol Cell Biol 21, 2070-2084. 
Bandiera, A., Tell, G., Marsich, E., Scaloni, A., Pocsfalvi, G., Akintunde Akindahunsi, A., Cesaratto, L., and Manzini, G. (2003). Cytosine-block telomeric type DNA-binding activity of hnRNP proteins from human cell lines. Archives of Biochemistry and Biophysics 409, 305-314. Bank, R., and Holst, M. (2003). A new paradigm for parallel adaptive meshing algorithms. SIAM Rev 45, 291-323. Barlati, S., and Barbon, A. (2005). RNA editing: a molecular mechanism for the fine modulation of neuronal transmission. Acta Neurochir Suppl 93, 53-57.
Barreau, C., Paillard, L., and Osborne, H. B. (2006). AU-rich elements and associated factors: are there unifying principles? Nucl Acids Res 33, 7138-7150. Baumann, S. (1998). Indirect immobilization of recombinant proteins to a solid phase using the albumin binding domain of streptococcal protein G and immobilized albumin. J Immunol Methods 221, 95-106. Beckett, D. (2001). Regulated assembly of transcription factors and control of transcription initiation. Journal of Molecular Biology 314, 335-352. Beelman, C. A., and Parker, R. (1995). Degradation of mRNA in eukaryotes. Cell 81, 179-183. Bergmann, I. E., and Brawerman, G. (1980). Loss of the polyadenylate segment from mammalian messenger RNA: Selective cleavage of this sequence from polyribosomes. Journal of Molecular Biology 139, 439-454. BIACORE AB (1997). Kinetic and affinity analysis using BIA - Level 1. Blyn, L. B., Towner, J. S., Semler, B. L., and Ehrenfeld, E. (1997). Requirement of poly(rC) binding protein 2 for translation of poliovirus RNA. J Virol 71, 6243-6246. Braddock, D. T., Baber, J. L., Levens, D., and Clore, G. M. (2002a). Molecular basis of sequence-specific single-stranded DNA recognition by KH domains: solution structure of a complex between hnRNP K KH3 and single-stranded DNA. EMBO J 21, 3476-3485. Braddock, D. T., Louis, J. M., Baber, J. L., Levens, D., and Clore, G. M. (2002b). Structure and dynamics of KH domains from FBP bound to single-stranded DNA. Nature 415, 1051-1056. Bradford, M. M. (1976). A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Analytical Biochemistry 72, 248-254. Brawerman, G. (1981). The Role of the poly(A) sequence in mammalian messenger RNA. CRC Crit Rev Biochem 10, 1-38. Brown, C. Y., Lagnado, C. A., and Goodall, G. J. (1996). A cytokine mRNA-destabilizing element that is structurally and functionally distinct from A+U-rich elements. PNAS 93, 13721-13725. 
Bubley, G., and Balk, S. (1996). Treatment of metastatic prostate cancer. Lessons from the androgen receptor. Hematol Oncol Clin North Am 10, 713-725. Buckanovich, R. J., and Darnell, R. B. (1997). The neuronal RNA binding protein Nova-1 recognizes specific RNA targets in vitro and in vivo. Mol Cell Biol 17, 3194-3201.
Burd, C., and Dreyfuss, G. (1994). Conserved structures and diversity of functions of RNA-binding proteins. Science 265, 615-621. Bustelo, X. R., Suen, K. L., Michael, W. M., Dreyfuss, G., and Barbacid, M. (1995). Association of the vav proto-oncogene product with poly(rC)-specific RNA-binding proteins. Mol Cell Biol 15, 1324-1332. Bycroft, M., Grunert, S., Murzin, A. G., Proctor, M., and St Johnston, D. (1995). NMR solution structure of a dsRNA binding domain from Drosophila staufen protein reveals homology to the N-terminal domain of ribosomal protein S5. The EMBO Journal 14, 3563-3571. Cáceres, J. F., and Krainer, A. R. (1993). Functional analysis of pre-mRNA splicing factor SF2/ASF structural domains. The EMBO Journal 12, 4715-4726. Cai, M., Huang, Y., Sakaguchi, K., Gronenborn, A. M., and Craigie, R. (1998). An efficient and cost-effective isotope labeling protocol for proteins expressed in Escherichia coli. Journal of Biomolecular NMR 11, 97-102. Calabro, V., Daugherty, M. D., and Frankel, A. D. (2005). A single intermolecular contact mediates intramolecular stabilization of both RNA and protein. PNAS 102, 6849-6854. Calnan, B., Tidor, B., Biancalana, S., Hudson, D., and Frankel, A. (1991). Arginine-mediated RNA recognition: the arginine fork. Science 252, 1167-1171. Cassiday, L. A., and Maher III, L. J. (2002). Having it both ways: transcription factors that bind DNA and RNA. Nucl Acids Res 30, 4118-4126. Chen, C.-Y., Del Gatto-Konczak, F., Wu, Z., and Karin, M. (1998). Stabilization of Interleukin-2 mRNA by the c-Jun NH2-Terminal Kinase Pathway. Science 280, 1945-1949. Chen, C.-Y., Gherzi, R., Ong, S.-E., Chan, E. L., Raijmakers, R., Pruijn, G. J. M., Stoecklin, G., Moroni, C., Mann, M., and Karin, M. (2001). AU Binding Proteins Recruit the Exosome to Degrade ARE-Containing mRNAs. Cell 107, 451-464. Chen, C.-Y. A., and Shyu, A.-B. (1995). AU-rich elements: characterization and importance in mRNA degradation. 
Trends in Biochemical Sciences 20, 465-470. Chen, C. Y., and Shyu, A. B. (1994). Selective degradation of early-response-gene mRNAs: functional analyses of sequence features of the AU-rich elements. Mol Cell Biol 14, 8471-8482. Chen, T., Damaj, B. B., Herrera, C., Lasko, P., and Richard, S. (1997). Self-association of the single-KH-domain family members Sam68, GRP33, GLD-1, and Qk1: role of the KH domain. Mol Cell Biol 17, 5707-5718. Chen, T., and Richard, S. (1998). Structure-Function Analysis of Qk1: a Lethal Point Mutation in Mouse quaking Prevents Homodimerization. Mol Cell Biol 18, 4863-4871.
Chkheidze, A. N., and Liebhaber, S. A. (2003). A Novel Set of Nuclear Localization Signals Determine Distributions of the αCP RNA-Binding Proteins. Mol Cell Biol 23, 8405-8415. Chkheidze, A. N., Lyakhov, D. L., Makeyev, A. V., Morales, J., Kong, J., and Liebhaber, S. A. (1999). Assembly of the α-Globin mRNA Stability Complex Reflects Binary Interaction between the Pyrimidine-Rich 3' Untranslated Region Determinant and Poly(C) Binding Protein αCP. Mol Cell Biol 19, 4572-4581. Claverie, J.-M. (2001). GENE NUMBER: What If There Are Only 30,000 Human Genes? Science 291, 1255-1257. Colgan, D. F., and Manley, J. L. (1997). Mechanism and regulation of mRNA polyadenylation. Genes Dev 11, 2755-2766. Collaborative Computational Project, Number 4 (1994). The CCP4 suite: programs for protein crystallography. Acta Crystallogr D Biol Crystallogr 50, 760-763. Coller, J., and Parker, R. (2004). Eukaryotic mRNA decapping. Annual Review of Biochemistry 73, 861-890. Collingwood, T. N., Urnov, F. D., and Wolffe, A. P. (1999). Nuclear receptors: coactivators, corepressors and chromatin remodeling in the control of transcription. J Mol Endocrinol 23, 255-275. Cusack, S. (1999). RNA-protein complexes. Current Opinion in Structural Biology 9, 66-73. Czyzyk-Krzeska, M. F., and Bendixen, A. C. (1999). Identification of the Poly(C) Binding Protein in the Complex Associated With the 3' Untranslated Region of Erythropoietin Messenger RNA. Blood 93, 2111-2120. Davis, M. E., and McCammon, J. A. (1990). Electrostatics in biomolecular structure and dynamics. Chem Rev 90, 509-521. Davis, S. J., Ikemizu, S., Wild, M. K., and van der Merwe, P. A. (1998). CD2 and the nature of protein interactions mediating cell-cell recognition. Immunol Rev 163, 217-236. De Boulle, K., Verkerk, A. J. M. H., Reyniers, E., Vits, L., Hendrickx, J., Van Roy, B., Van Den Bos, F., de Graaff, E., Oostra, B. A., and Willems, P. J. (1993). A point mutation in the FMR-1 gene associated with fragile X mental retardation. Nat Genet 3, 31-35. 
Decker, C. J., and Parker, R. (2002). mRNA decay enzymes: Decappers conserved between yeast and mammals. PNAS 99, 12512-12514. Dejgaard, K., and Leffers, H. (1996). Characterisation of the nucleic-acid-binding activity of KH domains. Different properties of different domains. Eur J Biochem 241, 425-431.
Dejgaard, K., Leffers, H., Rasmussen, H. H., Madsen, P., Kruse, T. A., Gesser, B., Nielsen, H., and Celis, J. E. (1994). Identification, Molecular Cloning, Expression and Chromosome Mapping of a Family of Transformation Upregulated hnRNP-K Proteins Derived by Alternative Splicing. Journal of Molecular Biology 236, 33-48. Di Fruscio, M., Chen, T., Bonyadi, S., Lasko, P., and Richard, S. (1998). The Identification of Two Drosophila K Homology Domain Proteins. KEP1 and SAM Are Members of the Sam68 Family of GSG Domain Proteins. J Biol Chem 273, 30122-30130. Du, Z., Lee, J. K., Tjhen, R., Li, S., Pan, H., Stroud, R. M., and James, T. L. (2005). Crystal Structure of the First KH Domain of Human Poly(C)-binding Protein-2 in Complex with a C-rich Strand of Human Telomeric DNA at 1.7 Å. J Biol Chem 280, 38823-38830. Du, Z., Yu, J., Chen, Y., Andino, R., and James, T. L. (2004). Specific Recognition of the C-rich Strand of Human Telomeric DNA and the RNA Template of Human Telomerase by the First KH Domain of Human Poly(C)-binding Protein-2. J Biol Chem 279, 48126-48134. Duncan, R., Bazar, L., Michelotti, G., Tomonaga, T., Krutzsch, H., Avigan, M., and Levens, D. (1994). A sequence-specific, single-strand binding protein activates the far upstream element of c-myc and defines a new DNA-binding motif. Genes Dev 8, 465-480. Ebersole, T. A., Chen, Q., Justice, M. J., and Artzt, K. (1996). The quaking gene product necessary in embryogenesis and myelination combines features of RNA binding and signal transduction proteins. Nat Genet 12, 260-265. Faustino, N. A., and Cooper, T. A. (2003). Pre-mRNA splicing and human disease. Genes Dev 17, 419-437. Fish, R. N., and Kane, C. M. (2002). Promoting elongation with transcript cleavage stimulatory factors. Biochimica et Biophysica Acta (BBA) - Gene Structure and Expression 1577, 287-307. Flaherty, S. M., Fortes, P., Izaurralde, E., Mattaj, I. W., and Gilmartin, G. M. (1997). 
Participation of the nuclear cap binding complex in pre-mRNA 3' processing. PNAS 94, 11893-11898. Francis, R., Barton, M. K., Kimble, J., and Schedl, T. (1995a). gld-1, a Tumor Suppressor Gene Required for Oocyte Development in Caenorhabditis elegans. Genetics 139, 579-606. Francis, R., Maine, E., and Schedl, T. (1995b). Analysis of the Multiple Roles of gld-1 in Germline Development: Interactions With the Sex Determination Cascade and the glp-1 Signaling Pathway. Genetics 139, 607-630. Fumagalli, S., Totty, N. F., Hsuan, J. J., and Courtneidge, S. A. (1994). A target for SRC in mitosis. Nature 368, 871-874.
Gao, M., Fritz, D. T., Ford, L. P., and Wilusz, J. (2000). Interaction between a Poly(A)-Specific Ribonuclease and the 5' Cap Influences mRNA Deadenylation Rates In Vitro. Molecular Cell 5, 479-488. Garnick, M. B., and Fair, W. R. (1996). Prostate Cancer: Emerging Concepts: Part II. Ann Intern Med 125, 205-212. George, H. S., and Lyle, H. J. (1989). X-ray Structure Determination: A Practical Guide (New York: John Wiley & Sons). Gherzi, R., Lee, K.-Y., Briata, P., Wegmuller, D., Moroni, C., Karin, M., and Chen, C.-Y. (2004). A KH Domain RNA Binding Protein, KSRP, Promotes ARE-Directed mRNA Turnover by Recruiting the Degradation Machinery. Molecular Cell 14, 571-583. Gibson, T. J., Rice, P. M., Thompson, J. D., and Heringa, J. (1993). KH domains within the FMR1 sequence suggest that fragile X syndrome stems from a defect in RNA metabolism. Trends in Biochemical Sciences 18, 331-333. Glaser, R. W. (1993). Antigen-antibody binding and mass transport by convection and diffusion to a surface: a two-dimensional computer model of binding and dissociation kinetics. Analytical Biochemistry 213, 152-161. Goffeau, A., Barrell, B. G., Bussey, H., Davis, R. W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J. D., Jacq, C., Johnston, M., et al. (1996). Life with 6000 Genes. Science 274, 546-567. Graff, J., Cha, J., Blyn, L. B., and Ehrenfeld, E. (1998). Interaction of Poly(rC) Binding Protein 2 with the 5' Noncoding Region of Hepatitis A Virus RNA and Its Effects on Translation. J Virol 72, 9668-9675. Grishin, N. V. (2001). KH domain: one motif, two folds. Nucl Acids Res 29, 638-643. Guhaniyogi, J., and Brewer, G. (2001). Regulation of mRNA stability in mammalian cells. Gene 265, 11-23. Hahnefeld, C., Drewianka, S., and Herberg, F. (2004). Determination of kinetic data using surface plasmon resonance biosensors. Methods Mol Med 94, 299-320. Hastings, M. L., and Krainer, A. R. (2001). Pre-mRNA splicing in the new millennium. Current Opinion in Cell Biology 13, 302-309. 
Heinlein, C. A., and Chang, C. (2004). Androgen Receptor in Prostate Cancer. Endocr Rev 25, 276-308. Hilleren, P., and Parker, R. (1999). Mechanisms of mRNA surveillance in eukaryotes. Annual Review of Genetics 33, 229-260. Holcik, M., and Liebhaber, S. A. (1997). Four highly stable eukaryotic mRNAs assemble 3' untranslated region RNA-protein complexes sharing cis and trans components. PNAS 94, 2410-2414.
Hollams, E. M., Giles, K. M., Thomson, A. M., and Leedman, P. J. (2002). mRNA stability and the control of gene expression: implications for human disease. Neurochem Res 27, 957-980. Holst, M. (2001). Adaptive numerical treatment of elliptic systems on manifolds. Adv Comput Math 15, 139-191. Holst, M., and Saied, F. (1993). Multigrid solution of the Poisson–Boltzmann equation. J Comput Chem 14, 105-113. Holst, M., and Saied, F. (1995). Numerical solution of the nonlinear Poisson–Boltzmann equation: developing more robust and efficient methods. J Comput Chem 16, 337-364. Honig, B., and Nicholls, A. (1995). Classical electrostatics in biology and chemistry. Science 268, 1144-1149. Ito, K., Sato, K., and Endo, H. (1994). Cloning and characterization of a single-stranded DNA binding protein that specifically recognizes deoxycytidine stretch. Nucl Acids Res 22, 53-58. Izaurralde, E., Lewis, J., McGuigan, C., Jankowska, M., Darzynkiewicz, E., and Mattaj, I. W. (1994). A nuclear cap binding protein complex involved in pre-mRNA splicing. Cell 78, 657-668. Izquierdo, J.-M., and Valcarcel, J. (2006). A simple principle to explain the evolution of pre-mRNA splicing. Genes Dev 20, 1679-1684. Jacobson, A., and Peltz, S. W. (1996). Interrelationships of the Pathways of mRNA Decay and Translation in Eukaryotic Cells. Annual Review of Biochemistry 65, 693-739. Jensen, K. B., Musunuru, K., Lewis, H. A., Burley, S. K., and Darnell, R. B. (2000). The tetranucleotide UCAY directs the specific recognition of RNA by the Nova K-homology 3 domain. PNAS 97, 5740-5745. Ji, X., Kong, J., and Liebhaber, S. A. (2003). In Vivo Association of the Stability Control Protein αCP with Actively Translating mRNAs. Mol Cell Biol 23, 899-907. Jones, A. R., Francis, R., and Schedl, T. (1996). GLD-1, a Cytoplasmic Protein Essential for Oocyte Differentiation, Shows Stage- and Sex-Specific Expression during Caenorhabditis elegans Germline Development. Developmental Biology 180, 165-183. Jones, A. R., and Schedl, T. (1995). Mutations in gld-1, a female germ cell-specific tumor suppressor gene in Caenorhabditis elegans, affect a conserved domain also found in Src-associated protein Sam68. Genes Dev 9, 1491-1504.
Jonsson, U. (1991). Real-time biospecific interaction analysis using surface plasmon resonance and a sensor chip technology. Biotechniques 11, 620-627. Kati, K. W., Mika, J., Wallén, L. J., Tammela, R. L., and Vessella, T. V. (2006). Mutation screening of the androgen receptor promoter and untranslated regions in prostate cancer. The Prostate 9999, n/a. Katsamba, P. S., Park, S., and Laird-Offringa, I. A. (2002). Kinetic studies of RNA-protein interactions using surface plasmon resonance. Methods 26, 95-104. Keenan, R. J., Freymann, D. M., Walter, P., and Stroud, R. M. (1998). Crystal Structure of the Signal Sequence Binding Subunit of the Signal Recognition Particle. Cell 94, 181-191. Kharrat, A., Macias, M. J., Gibson, T. J., Nilges, M., and Pastore, A. (1995). Structure of the dsRNA binding domain of E. coli RNase III. The EMBO Journal 14, 3572-3584. Kiledjian, M., Wang, X., and Liebhaber, S. A. (1995). Identification of two KH domain proteins in the α-globin mRNP stability complex. EMBO J 14, 4357-4364. Kim, I., Liu, C. W., and Puglisi, J. D. (2006). Specific Recognition of HIV TAR RNA by the dsRNA Binding Domains (dsRBD1-dsRBD2) of PKR. Journal of Molecular Biology 358, 430-442. Kim, J. H., Hahm, B., Kim, Y. K., Choi, M., and Jang, S. K. (2000). Protein-protein interaction among hnRNPs shuttling between nucleus and cytoplasm. Journal of Molecular Biology 298, 395-405. Kim, S.-S., Pandey, K. K., Choi, H. S., Kim, S.-Y., Law, P.-Y., Wei, L.-N., and Loh, H. H. (2005). Poly(C) Binding Protein Family Is a Transcription Factor in μ-Opioid Receptor Gene Expression. Mol Pharmacol 68, 729-736. Koivisto, P., Kononen, J., Palmberg, C., Tammela, T., Hyytinen, E., Isola, J., Trapman, J., Cleutjens, K., Noordzij, A., Visakorpi, T., and Kallioniemi, O. P. (1997). Androgen receptor gene amplification: a possible molecular mechanism for androgen deprivation therapy failure in prostate cancer. Cancer Res 57, 314-319. Kong, J., Ji, X., and Liebhaber, S. A. (2003). 
The KH-Domain Protein αCP Has a Direct Role in mRNA Stabilization Independent of Its Cognate Binding Site. Mol Cell Biol 23, 1125-1134. Kozak, M. (2005). Regulation of translation via mRNA structure in prokaryotes and eukaryotes. Gene 361, 13-37. Kumar, A., and Wilson, S. (1990). Studies of the strand-annealing activity of mammalian hnRNP complex protein A1. Biochemistry 29, 10717-10722. Lacroix, L., Lienard, H., Labourier, E., Djavaheri-Mergny, M., Lacoste, J., Leffers, H., Tazi, J., Helene, C., and Mergny, J.-L. (2000). Identification of two human nuclear proteins that recognise the cytosine-rich strand of human telomeres in vitro. Nucl Acids Res 28, 1564-1575.
Leffers, H., Dejgaard, K., and Celis, J. E. (1995). Characterisation of two major cellular poly(rC)-binding human proteins, each containing three K-homologous (KH) domains. Eur J Biochem 230, 447-453. Levy, A. P., Levy, N. S., and Goldberg, M. A. (1996). Post-transcriptional Regulation of Vascular Endothelial Growth Factor by Hypoxia. J Biol Chem 271, 2746-2753. Lewis, H. A., Chen, H., Edo, C., Buckanovich, R. J., Yang, Y. Y., Musunuru, K., Zhong, R., Darnell, R. B., and Burley, S. K. (1999). Crystal structures of Nova-1 and Nova-2 K-homology RNA-binding domains. Structure 7, 191-203. Lewis, H. A., Musunuru, K., Jensen, K. B., Edo, C., Chen, H., Darnell, R. B., and Burley, S. K. (2000). Sequence-Specific RNA Binding by a Nova KH Domain: Implications for Paraneoplastic Disease and the Fragile X Syndrome. Cell 100, 323-332. Lewis, J. D., Izaurralde, E., Jarmolowski, A., McGuigan, C., and Mattaj, I. (1996). A nuclear cap-binding complex facilitates association of U1 snRNP with the cap-proximal 5' splice site. Genes Dev 10, 1683-1698. Lindquist, J. N., Kauschke, S. G., Stefanovic, B., Burchardt, E. R., and Brenner, D. A. (2000). Characterization of the interaction between αCP2 and the 3'-untranslated region of collagen α1(I) mRNA. Nucl Acids Res 28, 4306-4316. Liu, J., Lynch, P., Chien, C., Montelione, G., Krug, R., and Berman, H. (1997). Crystal structure of the unique RNA-binding domain of the influenza virus NS1 protein. Nat Struct Biol 4, 896-899. Lukong, K. E., and Richard, S. (2003). Sam68, the KH domain-containing superSTAR. Biochimica et Biophysica Acta (BBA) - Reviews on Cancer 1653, 73-86. Mahone, M., Saffman, E. E., and Lasko, P. F. (1995). Localized Bicaudal-C RNA encodes a protein containing a KH domain, the RNA binding motif of FMR1. EMBO J 14, 2043-2055. Makeyev, A. V., and Liebhaber, S. A. (2000). Identification of Two Novel Mammalian Genes Establishes a Subfamily of KH-Domain RNA-Binding Proteins. Genomics 67, 301-316. Makeyev, A. V., and Liebhaber, S. A. (2002). 
The poly (C)-binding proteins: A multiplicity of functions and a search for mechanisms. RNA 8, 265-278. Malik, A. K., Flock, K. E., Godavarthi, C. L., Loh, H. H., and Ko, J. L. (2006). Molecular basis underlying the poly C binding protein 1 as a regulator of the proximal promoter of mouse μ-opioid receptor gene. Brain Research 1112, 33-45. Maris, C., Dominguez, C., and Allain, F. H. T. (2005). The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression. FEBS Journal 272, 2118-2131.
Matthews, B. W. (1977). X-ray Structure of Proteins (New York: Academic Press). Matunis, M. J., Michael, W. M., and Dreyfuss, G. (1992). Characterization and primary structure of the poly(C)-binding heterogeneous nuclear ribonucleoprotein complex K protein. Mol Cell Biol 12, 164-171. McKnight, G. L., Reasoner, J., Gilbert, T., Sundquist, K. O., Hokland, B., McKernan, P. A., Champagne, J., Johnson, C. J., Bailey, M. C., and Holly, R. (1992). Cloning and expression of a cellular high density lipoprotein-binding protein that is up-regulated by cholesterol loading of cells. J Biol Chem 267, 12131-12141. Messias, A., and Sattler, M. (2004). Structural basis of single-stranded RNA recognition. Acc Chem Res 37, 279-287. Meyer, S., Temme, C., and Wahle, E. (2004). Messenger RNA Turnover in Eukaryotes: Pathways and Enzymes. Critical Reviews in Biochemistry and Molecular Biology 39, 197-216. Michelotti, E. F., Michelotti, G. A., Aronsohn, A. I., and Levens, D. (1996). Heterogeneous nuclear ribonucleoprotein K is a transcription factor. Mol Cell Biol 16, 2350-2360. Morton, T. A., Myszka, D. G., and Chaiken, I. M. (1995). Interpreting complex binding kinetics from optical biosensors: a comparison of analysis by linearization, the integrated rate equation, and numerical integration. Analytical Biochemistry 227, 176-185. Mukherjee, D., Gao, M., O’Connor, J. P., Raijmakers, R., Pruijn, G., Lutz, C. S., and Wilusz, J. (2002). The mammalian exosome mediates the efficient degradation of mRNAs that contain AU-rich elements. EMBO J 21, 165-174. Musco, G., Kharrat, A., Stier, G., Fraternali, F., Gibson, T. J., Nilges, M., and Pastore, A. (1997). The solution structure of the first KH domain of FMR1, the protein responsible for the fragile X syndrome. Nat Struct Biol 4, 712-716. Musco, G., Stier, G., Joseph, C., Castiglione, M. M., Nilges, M., Gibson, T., and Pastore, A. (1996). Three-dimensional structure and stability of the KH domain: molecular insights into the fragile X syndrome. 
Cell 85, 237-245. Musunuru, K., and Darnell, R. B. (2004). Determination and augmentation of RNA sequence specificity of the Nova K-homology domains. Nucl Acids Res 32, 4852-4861. Myszka, D. G., He, X., Dembo, M., Morton, T. A., and Goldstein, B. (1998). Extending the Range of Rate Constants Available from BIACORE: Interpreting Mass Transport-Influenced Binding Data. Biophys J 75, 583-594. Narlikar, G. J., Fan, H.-Y., and Kingston, R. E. (2002). Cooperation between Complexes that Regulate Chromatin Structure and Transcription. Cell 108, 475-487.
Newbury, S. F. (2006). Control of mRNA stability in eukaryotes. Biochem Soc Trans 34, 30-34.
Nick, H. (1970). The Growth of Single Crystals.
Ober, R. J., and Ward, E. S. (1999). The Choice of Reference Cell in the Analysis of Kinetic Data Using BIAcore. Analytical Biochemistry 271, 70-80.
Ostareck, D. H., Ostareck-Lederer, A., Shatsky, I. N., and Hentze, M. W. (2001). Lipoxygenase mRNA Silencing in Erythroid Differentiation: The 3′UTR Regulatory Complex Controls 60S Ribosomal Subunit Joining. Cell 104, 281-290.
Ostareck, D. H., Ostareck-Lederer, A., Wilm, M., Thiele, B. J., Mann, M., and Hentze, M. W. (1997). mRNA Silencing in Erythroid Differentiation: hnRNP K and hnRNP E1 Regulate 15-Lipoxygenase Translation from the 3′ End. Cell 89, 597-606.
Otwinowski, Z., and Minor, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol 276, 307-326.
Paillard, L., Maniey, D., Lachaume, P., Legagneux, V., and Osborne, H. B. (2000). Identification of a C-rich element as a novel cytoplasmic polyadenylation element in Xenopus embryos. Mechanisms of Development 93, 117-125.
Palacios, I. M., Gatfield, D., St Johnston, D., and Izaurralde, E. (2004). An eIF4AIII-containing complex required for mRNA localization and nonsense-mediated mRNA decay. Nature 427, 753-757.
Park, S., Myszka, D. G., Yu, M., Littler, S. J., and Laird-Offringa, I. A. (2000). HuD RNA Recognition Motifs Play Distinct Roles in the Formation of a Stable Complex with AU-Rich RNA. Mol Cell Biol 20, 4765-4772.
Parker, R., and Song, H. (2004). The enzymes and control of eukaryotic mRNA turnover. Nature Structural & Molecular Biology 11, 121-127.
Parsley, T. B., Towner, J. S., Blyn, L. B., Ehrenfeld, E., and Semler, B. L. (1997). Poly(rC) binding protein 2 forms a ternary complex with the 5'-terminal sequences of poliovirus RNA and the viral 3CD proteinase. RNA 3, 1124-1134.
Paulding, W. R., and Czyzyk-Krzeska, M. F. (1999). Regulation of Tyrosine Hydroxylase mRNA Stability by Protein-binding, Pyrimidine-rich Sequence in the 3'-Untranslated Region. J Biol Chem 274, 2532-2538.
Payne, J. M., Laybourn, P. J., and Dahmus, M. E. (1989). The transition of RNA polymerase II from initiation to elongation is associated with phosphorylation of the carboxyl-terminal domain of subunit IIa. J Biol Chem 264, 19621-19629.
Paziewska, A., Wyrwicz, L., and Ostrowski, J. (2005). The binding activity of yeast RNAs to yeast Hek2p and mammalian hnRNP K proteins, determined using the three-hybrid system. Cell Mol Biol Lett 10, 227-235.
Paziewska, A., Wyrwicz, L. S., Bujnicki, J. M., Bomsztyk, K., and Ostrowski, J. (2004). Cooperative binding of the hnRNP K three KH domains to mRNA targets. FEBS Letters 577, 134-140.
Peng, S. S., Chen, C. Y., and Shyu, A. B. (1996). Functional characterization of a non-AUUUA AU-rich element from the c-jun proto-oncogene mRNA: evidence for a novel class of AU-rich elements. Mol Cell Biol 16, 1490-1499.
Pieretti, M., Zhang, F. P., Fu, Y. H., Warren, S. T., Oostra, B. A., Caskey, C. T., and Nelson, D. L. (1991). Absence of expression of the FMR-1 gene in fragile X syndrome. Cell 66, 817-822.
Preiss, T., and Hentze, M. W. (2003). Starting the protein synthesis machine: eukaryotic translation initiation. BioEssays 25, 1201-1211.
Ramos, A., Grunert, S., Adams, J., Micklem, D. R., Proctor, M. R., Freund, S., Bycroft, M., St Johnston, D., and Varani, G. (2000). RNA recognition by a Staufen double-stranded RNA-binding domain. The EMBO Journal 19, 997-1009.
Ramos, A., Hollingworth, D., Major, S. A., Adinolfi, S., Kelly, G., Muskett, F. W., and Pastore, A. (2002). Role of Dimerization in KH/RNA Complexes: The Example of Nova KH3. Biochemistry 41, 4193-4201.
Ramos, A., Hollingworth, D., and Pastore, A. (2003). The role of a clinically important mutation in the fold and RNA-binding properties of KH motifs. RNA 9, 293-298.
Razin, A., and Riggs, A. D. (1980). DNA methylation and gene function. Science 210, 604-610.
Rezai-Zadeh, N., Zhang, X., Namour, F., Fejer, G., Wen, Y.-D., Yao, Y.-L., Gyory, I., Wright, K., and Seto, E. (2003). Targeted recruitment of a histone H4-specific methyltransferase by the transcription factor YY1. Genes Dev 17, 1019-1029.
Ross, J. (1995). mRNA stability in mammalian cells. Microbiol Rev 59, 425-450.
Ross, J., and Sullivan, T. D. (1985). Half-lives of beta and gamma globin messenger RNAs and of protein synthetic capacity in cultured human reticulocytes. Blood 66, 1149-1154.
Ryder, S. P., and Williamson, J. R. (2004). Specificity of the STAR/GSG domain protein Qk1: Implications for the regulation of myelination. RNA 10, 1449-1458.
Ryter, J. M., and Schultz, S. C. (1998). Molecular basis of double-stranded RNA-protein interactions: structure of a dsRNA-binding domain complexed with dsRNA. The EMBO Journal 17, 7505-7513.
Sachs, A. B., Sarnow, P., and Hentze, M. W. (1997). Starting at the Beginning, Middle, and End: Translation Initiation in Eukaryotes. Cell 89, 831-838.
Saffman, E. E., Styhler, S., Rother, K., Li, W., Richard, S., and Lasko, P. (1998). Premature Translation of oskar in Oocytes Lacking the RNA-Binding Protein Bicaudal-C. Mol Cell Biol 18, 4855-4862.
Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989). Molecular Cloning: A Laboratory Manual, Vol. 2.
Schuck, P. (1996). Kinetics of ligand binding to receptor immobilized in a polymer matrix, as detected with an evanescent wave biosensor. I. A computer simulation of the influence of mass transport. Biophys J 70, 1230-1249.
Schuck, P. (1997a). Reliable determination of binding affinity and kinetics using surface plasmon resonance biosensors. Current Opinion in Biotechnology 8, 498-502.
Schuck, P. (1997b). Use of Surface Plasmon Resonance to Probe the Equilibrium and Dynamic Aspects of Interactions between Biological Macromolecules. Annual Review of Biophysics and Biomolecular Structure 26, 541-566.
Schullery, D. S., Ostrowski, J., Denisenko, O. N., Stempka, L., Shnyreva, M., Suzuki, H., Gschwendt, M., and Bomsztyk, K. (1999). Regulated Interaction of Protein Kinase Cδ with the Heterogeneous Nuclear Ribonucleoprotein K Protein. J Biol Chem 274, 15101-15109.
Shatkin, A. J., and Manley, J. L. (2000). The ends of the affair: Capping and polyadenylation. Nature Structural Biology 7, 838-842.
Shaw, G., and Kamen, R. (1986). A conserved AU sequence from the 3′ untranslated region of GM-CSF mRNA mediates selective mRNA degradation. Cell 46, 659-667.
Shilatifard, A., Conaway, R. C., and Conaway, J. W. (2003). The RNA Polymerase II Elongation Complex. Annual Review of Biochemistry 72, 693-715.
Shim, J., and Karin, M. (2002). The Control of mRNA Stability in Response to Extracellular Stimuli. Mol Cells 14, 323-331.
Shnyreva, M., Schullery, D. S., Suzuki, H., Higaki, Y., and Bomsztyk, K. (2000). Interaction of Two Multifunctional Proteins: Heterogeneous Nuclear Ribonucleoprotein K and Y-box-binding Protein. J Biol Chem 275, 15498-15503.
Shuker, S. B., Hajduk, P. J., Meadows, R. P., and Fesik, S. W. (1996). Discovering high-affinity ligands for proteins: SAR by NMR. Science 274, 1531-1534.
Sidiqi, M., Wilce, J. A., Porter, C. J., Barker, A., Leedman, P. J., and Wilce, M. C. (2005a). Formation of an αCP1-KH3 complex with UC-rich RNA. Eur Biophys J 34, 423-429.
Sidiqi, M., Wilce, J. A., Vivian, J. P., Porter, C. J., Barker, A., Leedman, P. J., and Wilce, M. C. J. (2005b). Structure and RNA binding of the third KH domain of poly(C)-binding protein 1. Nucl Acids Res 33, 1213-1221.
Sidman, R. L., Dickie, M. M., and Appel, S. H. (1964). Mutant Mice (Quaking and Jimpy) with Deficient Myelination in the Central Nervous System. Science 144, 309-311.
Silvera, D., Gamarnik, A. V., and Andino, R. (1999). The N-terminal K Homology Domain of the Poly(rC)-binding Protein Is a Major Determinant for Binding to the Poliovirus 5'-Untranslated Region and Acts as an Inhibitor of Viral Translation. J Biol Chem 274, 38163-38170.
Siomi, H., Choi, M., Siomi, M. C., Nussbaum, R. L., and Dreyfuss, G. (1994). Essential role for KH domains in RNA binding: Impaired RNA binding by a mutation in the KH domain of FMR1 that causes fragile X syndrome. Cell 77, 33-39.
Siomi, H., Matunis, M., Michael, W., and Dreyfuss, G. (1993). The pre-mRNA binding K protein contains a novel evolutionarily conserved motif. Nucl Acids Res 21, 1193-1198.
Soller, M. (2006). Pre-messenger RNA processing and its regulation: a genomic perspective. Cell Mol Life Sci 63, 796-819.
Staton, J. M., Thomson, A. M., and Leedman, P. J. (2000). Hormonal regulation of mRNA stability and RNA-protein interactions in the pituitary. J Mol Endocrinol 25, 17-34.
Svitel, J., Balbo, A., Mariuzza, R. A., Gonzales, N. R., and Schuck, P. (2003). Combined Affinity and Rate Constant Distributions of Ligand Populations from Experimental Surface Binding Kinetics and Equilibria. Biophys J 84, 4062-4077.
Tarun, S. Z., Wells, S. E., Deardorff, J. A., and Sachs, A. B. (1997). Translation initiation factor eIF4G mediates in vitro poly(A) tail-dependent translation. Proc Natl Acad Sci USA 94, 9046-9051.
Tauson, E. L. (2004). RNA editing in different genetic systems. Zh Obshch Biol 65, 52-73.
Thisted, T., Lyakhov, D. L., and Liebhaber, S. A. (2001). Optimized RNA Targets of Two Closely Related Triple KH Domain Proteins, Heterogeneous Nuclear Ribonucleoprotein K and αCP-2KL, Suggest Distinct Modes of RNA Recognition. J Biol Chem 276, 17484-17496.
Tommerup, N., and Leffers, H. (1996). Assignment of Human KH-Box-Containing Genes by in Situ Hybridization: HNRNPK Maps to 9q21.32-q21.33, PCBP1 to 2p12-p13, and PCBP2 to 12q13.12-q13.13, Distal to FRA12A. Genomics 32, 297-298.
Torreri, P., Ceccarini, M., Macioce, P., and Petrucci, T. (2005). Biomolecular interactions by Surface Plasmon Resonance technology. Ann Ist Super Sanita 41, 437-441.
Verkerk, A. J., Pieretti, M., Sutcliffe, J. S., Fu, Y. H., Kuhl, D. P., Pizzuti, A., Reiner, O., Richards, S., Victoria, M. F., and Zhang, F. P. (1991). Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell 65, 905-914.
Waggoner, S. A., and Liebhaber, S. A. (2003a). Identification of mRNAs Associated with αCP2-Containing RNP Complexes. Mol Cell Biol 23, 7055-7067.
Waggoner, S. A., and Liebhaber, S. A. (2003b). Regulation of α-Globin mRNA Stability. Experimental Biology and Medicine 228, 387-395.
Wahle, E., and Ruegsegger, U. (1999). 3′-End processing of pre-mRNA in eukaryotes. FEMS Microbiology Reviews 23, 277-295.
Walter, B. L., Parsley, T. B., Ehrenfeld, E., and Semler, B. L. (2002). Distinct Poly(rC) Binding Protein KH Domain Determinants for Poliovirus Translation Initiation and Viral RNA Replication. J Virol 76, 12008-12022.
Wang, X., and Tanaka Hall, T. M. (2001). Structural basis for recognition of AU-rich element RNA by the HuD protein. Nat Struct Biol 8, 141-145.
Wang, X., Kiledjian, M., Weiss, I. M., and Liebhaber, S. A. (1995). Detection and characterization of a 3' untranslated region ribonucleoprotein complex associated with human alpha-globin mRNA stability [published erratum appears in Mol Cell Biol 1995 Apr;15(4):2331]. Mol Cell Biol 15, 1769-1777.
Wang, Z., Day, N., Trifillis, P., and Kiledjian, M. (1999). An mRNA Stability Complex Functions with Poly(A)-Binding Protein To Stabilize mRNA In Vitro. Mol Cell Biol 19, 4552-4560.
Wang, Z., and Kiledjian, M. (2000). The Poly(A)-Binding Protein and an mRNA Stability Protein Jointly Regulate an Endoribonuclease Activity. Mol Cell Biol 20, 6334-6341.
Wang, Z., and Kiledjian, M. (2001). Functional Link between the Mammalian Exosome and mRNA Decapping. Cell 107, 751-762.
Wider, G. (2000). Structure determination of biological macromolecules in solution using nuclear magnetic resonance spectroscopy. Biotechniques 29, 1278-1282.
Wilce, J. A., Leedman, P. J., and Wilce, M. C. J. (2002). RNA-Binding Proteins That Target the Androgen Receptor mRNA. IUBMB Life 54, 345-349.
Wilson, G. M., and Brewer, G. (1999a). Identification and Characterization of Proteins Binding A + U-Rich Elements. Methods 17, 74-83.
Wilson, G. M., and Brewer, G. (1999b). The search for trans-acting factors controlling messenger RNA decay. Prog Nucleic Acid Res Mol Biol 62, 257-291.
Wisdom, R., and Lee, W. (1991). The protein-coding region of c-myc mRNA contains a sequence that specifies rapid mRNA turnover and induction by protein synthesis inhibitors. Genes Dev 5, 232-243.
Worbs, M., Bourenkov, G. P., Bartunik, H. D., Huber, R., and Wahl, M. C. (2001). An Extended RNA Binding Surface through Arrayed S1 and KH Domains in Transcription Factor NusA. Molecular Cell 7, 1177-1189.
Wu, H., Henras, A., Chanfreau, G., and Feigon, J. (2004). Structural basis for recognition of the AGNN tetraloop RNA fold by the double-stranded RNA-binding domain of Rnt1p RNase III. Proc Natl Acad Sci USA 101, 8307-8312.
Wüthrich, K. (1990). Protein structure determination in solution by NMR spectroscopy. J Biol Chem 265, 22059-22062.
Xu, N., Chen, C.-Y. A., and Shyu, A.-B. (2001). Versatile Role for hnRNP D Isoforms in the Differential Regulation of Cytoplasmic mRNA Turnover. Mol Cell Biol 21, 6960-6971.
Yano, M., Okano, H. J., and Okano, H. (2005). Involvement of Hu and Heterogeneous Nuclear Ribonucleoprotein K in Neuronal Differentiation through p21 mRNA Post-transcriptional Regulation. J Biol Chem 280, 12690-12699.
Yeap, B. B., Krueger, R. G., and Leedman, P. J. (1999). Differential Posttranscriptional Regulation of Androgen Receptor Gene Expression by Androgen in Prostate and Breast Cancer Cells. Endocrinology 140, 3282-3291.
Yeap, B. B., Voon, D. C., Vivian, J. P., McCulloch, R. K., Thomson, A. M., Giles, K. M., Czyzyk-Krzeska, M. F., Furneaux, H., Wilce, M. C. J., Wilce, J. A., and Leedman, P. J. (2002). Novel Binding of HuR and Poly(C)-binding Protein to a Conserved UC-rich Motif within the 3'-Untranslated Region of the Androgen Receptor Messenger RNA. J Biol Chem 277, 27183-27192.
Zhao, Z., Chang, F.-C., and Furneaux, H. M. (2000). The identification of an endonuclease that cleaves within an HuR binding site in mRNA. Nucl Acids Res 28, 2695-2701.
Appendix A: Crystallization Screens
Hampton Crystal Screen™ composition
Hampton Crystal Screen 2™ composition
Hampton Natrix™ screen
Sigma® Crystallization Basic kit for proteins
Appendix B: REMSA
RNA-binding studies
REMSA (RNA electrophoretic mobility shift assay) was used to examine the ability of full-length αCP1, αCP1-KH2 and αCP1-KH3 to bind to a 51-nucleotide UC-rich sequence from the 3′UTR of AR mRNA (nucleotides 3275–3325). Binding by full-length αCP1 has previously been demonstrated (Yeap et al., 2002), but whether the individual KH domains bind to this sequence had not yet been tested. The results are shown in Figure 1. As expected, binding of full-length αCP1 to the target RNA is indicated by a substantial and quantitative shift in mobility (lane 4). A quantitative shift of this RNA target by αCP1-KH3 is also clearly discernible (lane 3), although the change in mobility is less marked than that seen with full-length αCP1. This difference in the relative mobility shift reflects the greater size of the full-length protein (37.5 kDa) compared with αCP1-KH3 (8 kDa). In contrast, αCP1-KH2 exhibits no binding (lane 2), even though the protein is present in excess over RNA. Neither full-length αCP1 nor αCP1-KH3 showed any binding to pBLUESCRIPT RNA alone (results not shown), demonstrating that the binding interaction occurs with the target RNA sequence. This extends the finding of Dejgaard and Leffers (Dejgaard et al., 1996), who observed at best weak binding to poly(C) RNA by isolated αCP1-KH2 using a dot-blot assay. The absence of binding by αCP1-KH2 also indicates that the binding observed for the other proteins is not due to nonspecific protein/RNA interactions under the buffer conditions used here. Thus, αCP1-KH3 is capable of binding independently and specifically to sequences within the AR mRNA 3′UTR that are also contacted by full-length αCP1.
Figure 1: Binding of αCP1 and individual KH domains to the 3′UTR region of AR mRNA. A typical REMSA is shown, in which binding by αCP1 or isolated domains of αCP1 to the radioactively labeled RNA 5′-CUGGGUUUUUUUUUCUCUUUCUCUCCUUUCUUUUUCUUCUUCCCUCCCUA-3′ was examined in the presence of excess tRNA as a nonspecific competitor. Lane 1, probe only; lane 2, also contains 100 ng αCP1-KH2; lane 3, also contains 100 ng αCP1-KH3; lane 4, also contains 100 ng αCP1.
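Each protein lane in Figure 1 is loaded with an equal mass (100 ng) rather than an equal molar amount, so converting mass to moles with the molecular masses quoted above (37.5 kDa for full-length αCP1, 8 kDa for αCP1-KH3) helps when comparing lanes 3 and 4. The sketch below does only that arithmetic; it assumes the quoted masses are accurate and ignores any inactive fraction of protein. No molecular mass for αCP1-KH2 is given in this section, so it is left out rather than guessed.

```python
# Molecular masses quoted in the text (kDa). KH2 is omitted: no mass is given here.
MASSES_KDA = {
    "αCP1 (full length)": 37.5,
    "αCP1-KH3": 8.0,
}

LOADED_NG = 100.0  # protein mass added per lane in Figure 1


def ng_to_pmol(mass_ng: float, mw_kda: float) -> float:
    """Convert a mass in ng to an amount in pmol.

    1 ng / 1 kDa = 1e-9 g / (1e3 g/mol) = 1e-12 mol = 1 pmol,
    so the conversion reduces to a simple ratio.
    """
    return mass_ng / mw_kda


amounts = {name: ng_to_pmol(LOADED_NG, mw) for name, mw in MASSES_KDA.items()}
for name, pmol in amounts.items():
    print(f"{name}: {pmol:.2f} pmol")

# Number of KH3 molecules per full-length molecule at equal mass loading.
ratio = amounts["αCP1-KH3"] / amounts["αCP1 (full length)"]
print(f"KH3 : full-length molar ratio = {ratio:.2f}")
```

At equal mass loading, lane 3 therefore contains roughly 4.7 times as many αCP1-KH3 molecules (about 12.5 pmol) as lane 4 contains full-length αCP1 molecules (about 2.7 pmol).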
We also compared the binding of full-length αCP1 and αCP1-KH1 to the target 51-nt AR mRNA sequence (nt 3275–3325) and, using REMSA, examined their binding to RNA versus DNA. Figure 2 shows the binding of full-length αCP1 to the target 51-nt AR mRNA sequence. The probe (10 nM RNA) is shifted upon the addition of increasing concentrations of αCP1 (10 nM–1 µM), and its shift to progressively higher positions in the gel indicates that multiple binding interactions occur. Figure 2 also demonstrates that αCP1-KH1 binds the target RNA with high affinity. In this case the probe shifts to a relatively constant position, indicating a single binding mode, as would be expected for a protein containing a single KH domain.
The ‘CCCUCCC’ motif at the 3′ end of the target 51-nt AR mRNA sequence has been shown, by mutational analysis of its two poly(C) triads, to be the binding site of αCP proteins (Yeap et al., 2002). To verify whether full-length αCP1 and αCP1-KH1 bind to this motif in vitro, we conducted gel shift assays using an 11-nt probe corresponding to nucleotides 3315–3325 of AR mRNA (5′-UUCCCUCCCUA-3′). Figure 3 shows that the probe is shifted by the full-length protein to a constant position, indicating good binding to the probe via a single binding interaction. αCP1-KH1 also binds, but only marginally shifts the probe under the conditions of this experiment, indicating a weaker interaction. The binding profiles of αCP1 and αCP1-KH1 to an 11-nt DNA probe analogous to this AR target sequence (5′-TTCCCTCCCTA-3′) are very similar: full-length αCP1 binds the DNA well, and αCP1-KH1 binds it weakly. Interestingly, full-length αCP1 appeared to bind DNA with slightly higher affinity than RNA, in contrast to the previous report that RNA binding is preferential (Dejgaard et al., 1996).
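The relationship between the three probes can be checked directly from the sequences given in this appendix. The sketch below assumes all sequences are written 5′→3′ exactly as printed, and that the 11-nt probe (nt 3315–3325) is the 3′-terminal 11 nt of the target fragment (nt 3275–3325), as the coordinates imply. It extracts that 3′ segment, locates the ‘CCCUCCC’ motif and its two C triads, and derives the DNA analogue by replacing U with T. The helper function names are illustrative only.

```python
# AR mRNA 3'UTR target (nt 3275-3325), 5'->3', as printed in the Figure 1 legend.
TARGET = "CUGGGUUUUUUUUUCUCUUUCUCUCCUUUCUUUUUCUUCUUCCCUCCCUA"


def three_prime_segment(seq: str, length: int) -> str:
    """Return the 3'-terminal `length` nucleotides of a 5'->3' sequence."""
    return seq[-length:]


def c_triads(seq: str) -> list[int]:
    """0-based start positions of non-overlapping 'CCC' runs."""
    positions, i = [], 0
    while (i := seq.find("CCC", i)) != -1:
        positions.append(i)
        i += 3  # skip past this triad so runs are not double-counted
    return positions


def rna_to_dna(seq: str) -> str:
    """DNA analogue of an RNA sequence (U -> T)."""
    return seq.replace("U", "T")


probe_rna = three_prime_segment(TARGET, 11)
motif_start = probe_rna.find("CCCUCCC")

print(probe_rna)              # UUCCCUCCCUA
print(motif_start)            # 2
print(c_triads(probe_rna))    # [2, 6]
print(rna_to_dna(probe_rna))  # TTCCCTCCCTA
```

The derived RNA and DNA strings match the 11-nt probes used for Figure 3, with the two C triads of the motif separated by a single U.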
Figure 2: Full-length αCP1 and the αCP1-KH1 domain bind with high affinity to the 51-nt AR mRNA 3′UTR target (nt 3275–3325). From left to right, the binding reactions contained no protein or 1 × 10⁻⁸, 2 × 10⁻⁸, 5 × 10⁻⁸, 1 × 10⁻⁷, 2 × 10⁻⁷ or 1 × 10⁻⁶ M of full-length αCP1 or the αCP1-KH1 domain, respectively. All binding reactions contained 1 × 10⁻⁸ M of the target RNA. The absence of protein is indicated above by a minus sign in parentheses, and the wedges indicate increasing concentrations of each protein.
Figure 3: Full-length αCP1 binds well, and the αCP1-KH1 domain weakly, to the DNA sequence 5′-TTCCCTCCCTA-3′ and the corresponding RNA sequence 5′-UUCCCUCCCUA-3′. From left to right, the binding reactions contained no protein or 1 × 10⁻⁷, 3 × 10⁻⁷, 1 × 10⁻⁶, 1 × 10⁻⁶, 3 × 10⁻⁶ and 1 × 10⁻⁵ M of full-length αCP1 or the αCP1-KH1 domain, respectively. All binding reactions contained 1 × 10⁻⁷ M of the DNA or RNA probe. The absence of protein is indicated above by a minus sign in parentheses, and the wedges indicate increasing concentrations of each protein.
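The titrations in the Figure 2 and Figure 3 legends are easier to compare as protein:probe molar ratios. The sketch below computes these ratios from the nominal concentrations listed in the legends (probe at 1 × 10⁻⁸ M for Figure 2 and 1 × 10⁻⁷ M for Figure 3). It assumes nominal concentrations with no correction for inactive protein, and it reproduces the repeated 1 × 10⁻⁶ M entry of the Figure 3 legend as printed.

```python
# Nominal concentrations (M) taken from the Figure 2 and Figure 3 legends.
FIG2_PROBE_M = 1e-8
FIG2_PROTEIN_M = [1e-8, 2e-8, 5e-8, 1e-7, 2e-7, 1e-6]

FIG3_PROBE_M = 1e-7
FIG3_PROTEIN_M = [1e-7, 3e-7, 1e-6, 1e-6, 3e-6, 1e-5]


def molar_ratios(protein_m: list[float], probe_m: float) -> list[float]:
    """Protein:probe molar ratio per lane, rounded to hide floating-point noise."""
    return [round(p / probe_m, 6) for p in protein_m]


print("Figure 2:", molar_ratios(FIG2_PROTEIN_M, FIG2_PROBE_M))
print("Figure 3:", molar_ratios(FIG3_PROTEIN_M, FIG3_PROBE_M))
```

Both titrations therefore span a 100-fold range, from equimolar protein and probe up to a 100-fold molar excess of protein.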