Regulation of Core Splicing Factors by Alternative Splicing and
Nonsense-mediated mRNA Decay
by
Arneet L. Saltzman
A thesis submitted in conformity with the requirements for the
degree of Doctor of Philosophy
Department of Molecular Genetics
University of Toronto
© Copyright by Arneet L. Saltzman 2011
Abstract
The majority of human genes are transcribed into a precursor
messenger RNA (pre-mRNA) that
is processed to produce multiple mRNA variants through
alternative splicing. Although
alternative splicing is known for its role in generating
proteomic diversity, it can also regulate
gene expression by introducing premature termination codons that
target the spliced transcript
for nonsense-mediated mRNA decay (AS-NMD). To understand the impact of
AS-NMD on gene expression, I performed quantitative AS microarray
profiling of NMD-inhibited human cells. Using this system, I addressed
the prevalence, trans-acting factor requirements and range of cellular
functions regulated by AS-NMD. While this pathway
had been implicated in
homeostatic feedback regulation of genes encoding
splicing-regulatory proteins, my results
revealed highly conserved alternative exons regulated by AS-NMD
in genes encoding basal or
‘core’ splicing factors. I further characterized one of these
exons in the gene encoding SmB/B′,
and demonstrated that SmB/B′ autoregulates its expression
through AS-NMD. Furthermore, AS
profiling revealed that knockdown of this core splicing factor
affects the inclusion levels of
additional alternative exons enriched in genes with functions in
RNA processing and RNA
binding. In summary, my results reveal a role for AS-NMD in
regulating the expression of core
splicing factors, as well as a role for the core spliceosomal
machinery in coordinating a network
of alternative exons in RNA processing factor genes.
Acknowledgments
I am grateful to many people for their support during my
graduate work. My supervisor and
mentor Dr. Ben Blencowe has never wavered from the full support
that he offered me right from
the time I joined the lab as a naïve and “generally keen”
student. He fostered my scientific
development by providing me with the opportunity to do exciting
research, present at
conferences and publish my work. Ben’s guidance and
encouragement have been essential for
my progress as a graduate student and beyond. I would also like
to thank my supervisory
committee members, Dr. Howard Lipshitz, Dr. Tim Hughes and Dr.
Quaid Morris, whose
support and advice have helped me to develop as a scientist.
For their direct contributions to the work in this thesis, I
would like to thank Matthew Fagnani
and my collaborators Dr. Yoon Ki Kim, Dr. Lynne Maquat, Dr. Ofer
(“the data are the data”)
Shai and Dr. Brendan Frey. For their indirect contributions, I
would like to thank the people
behind the UCSC genome browser and Galaxy, who have made the
genome accessible to the
masses. For financial support, I am grateful to NSERC and the
Jennifer Dorrington Graduate
Student Endowment Fund.
I’ve had the privilege to work with many great people during my
time in the Blencowe lab. I
wish to sincerely thank all my current and former lab-mates,
whose friendship, support, helpful
advice, and love of fine beverages have been invaluable. I am
particularly indebted to our Lab
“Sages” Mr. Dave O’Hanlon and Dr. Susan McCracken. For
supporting me along my path to
graduate school, I also thank my past teachers and mentors, Mr.
Flemming Kress, Dr. Shelagh
Mirski, Ms. Kathy Sparks, Dr. Peter Davies and Dr. Igor
Bendik.
Of course, words cannot express my gratitude to my family and my
‘life partner’ for their
unflappable love and support.
Table of Contents
Acknowledgments
List of Tables
List of Figures
List of Appendices
Abbreviations Used
Chapter 1
1 Introduction
1.1 Coordination of the gene expression machinery
1.1.1 Interdependence among transcription, mRNA processing and chromatin
1.1.2 mRNA processing remodels the messenger RNP
1.2 Pre-mRNA splicing
1.2.1 Core and auxiliary splicing signals
1.2.2 Spliceosome assembly
1.2.3 Exon definition
1.2.4 Spliceosomal snRNPs and Sm proteins
1.3 Regulation of alternative splicing
1.3.1 Roles of alternative splicing
1.3.2 Mechanisms of alternative splicing regulation
1.3.3 Families of alternative splicing regulatory factors
1.3.4 Regulation of splice site recognition
1.3.4.1 SR and SR-related proteins
1.3.4.2 hnRNPs
1.3.5 Regulation of splice site pairing and catalysis
1.3.6 Roles of basal splicing factors in alternative splicing regulation
1.3.7 Breaking the ‘code’ of cis-acting alternative splicing regulatory sequences
1.3.8 Large-scale analysis of alternative splicing regulation
1.3.9 Overview of large-scale alternative splicing detection methods used in this thesis
1.3.9.1 Alternative splicing microarray profiling
1.3.9.2 AS profiling by high throughput RNA sequencing (RNA-Seq)
1.4 Nonsense-mediated mRNA decay (NMD)
1.4.1 Features targeting transcripts for NMD
1.4.2 NMD trans-acting factors and mechanisms of decay
1.4.3 Discriminating between normal and premature nonsense codons: integrating the EJC-dependent and faux-3′UTR models
1.5 Feedback regulation of gene expression
1.5.1 Post-transcriptional autoregulation
1.5.1.1 Splicing regulatory factors
1.5.1.2 Ribosomal proteins, translation factors and other examples
1.5.2 Roles of post-transcriptional autoregulation
1.5.2.1 Developmentally-regulated AS programs
1.5.2.2 Plant circadian oscillations
1.5.2.3 Coordinating gene expression
1.5.3 Sequence and functional conservation
1.6 Rationale and outline
Chapter 2
2 Impact of nonsense-mediated mRNA decay (NMD) factors on alternative splicing (AS)
2.1 Introduction
2.1.1 Prevalence of AS-NMD
2.1.2 Differential requirements for UPF factors in NMD
2.1.3 Summary
2.2 Materials and Methods
2.2.1 Cell culture, siRNA and plasmid transfection
2.2.2 RT-PCR assays and Western blotting
2.2.3 Microarray design and hybridization
2.2.4 Microarray data analysis
2.2.5 Annotation of PTC-introducing AS events
2.2.6 Categorization of conserved and species-specific alternative exons
2.3 Results
2.3.1 Predicted PTC-containing splice variants represent minor isoforms across ten mouse tissues
2.3.2 Most predicted PTC-introducing AS events are not conserved between human and mouse
2.3.3 Alternative splicing microarray profiling following knockdown of the essential NMD factor UPF1 in HeLa cells
2.3.4 A subset of PTC-introducing AS events are regulated by NMD
2.3.5 Effect of UPF1 knockdown on the expression of genes containing PTC-introducing AS events
2.3.6 Alternative splicing microarray profiling following individual knockdowns of NMD factors UPF1, UPF2 or UPF3X
2.3.7 Overlapping but distinct effects of UPF1, UPF2 and UPF3X knockdowns on PTC-introducing AS events
2.4 Discussion
2.4.1 Function versus ‘noise’ in PTC-introducing AS events
2.4.2 Alternative branches of the mammalian NMD pathway
Chapter 3
3 Conserved AS-NMD in genes encoding core splicing factors
3.1 Introduction
3.1.1 Cellular functions regulated by AS-NMD
3.1.2 Summary
3.2 Materials and Methods
3.2.1 RT-PCR and Western blotting
3.2.2 Analysis of conservation of flanking intron sequence and conserved AS
3.2.3 Identification of AS events in spliceosomal and control gene sets
3.2.4 Statistical Analysis
3.3 Results
3.3.1 PTC-introducing AS events affected by UPF knockdowns are flanked by highly conserved sequences
3.3.2 Core spliceosomal proteins are new regulatory targets of AS-NMD
3.3.3 Conserved AS in genes encoding spliceosomal factors enriched in PTC-introducing events
3.3.4 Autoregulation of core splicing factors by AS-NMD
3.4 Discussion
3.4.1 AS-NMD and the regulation of core spliceosomal proteins
Chapter 4
4 Auto-regulation of the core splicing factor SmB/B′ via AS-NMD
4.1 Introduction
4.1.1 AS-NMD of SNRPB, encoding SmB/B′
4.1.2 Summary
4.2 Materials and Methods
4.2.1 Cell culture, siRNA and plasmid transfection
4.2.2 Estimation of mRNA half-lives
4.2.3 RNA and protein isolation, RT-PCR and Western blotting
4.2.4 Plasmid Construction
4.3 Results
4.3.1 Inclusion of a highly conserved premature termination codon (PTC)-introducing alternative exon in SNRPB pre-mRNA is affected by SmB/B′ protein levels
4.3.2 Knockdown of the core snRNP protein SmD1 affects the inclusion of the conserved SNRPB alternative exon
4.3.3 Knockdown of SmB/B′ or SmD1 affects the levels of Sm-class snRNAs
4.3.4 Cis-acting elements regulating inclusion of the SNRPB alternative exon
4.3.5 Mutations that strengthen the 5′ss reduce the effects of SmB/B′ knockdown
4.4 Discussion
4.4.1 Feedback and cross-regulation of splicing factors
Chapter 5
5 Regulation of alternative splicing by the core spliceosomal machinery
5.1 Introduction
5.1.1 Summary
5.2 Materials and Methods
5.2.1 Analysis of AS and transcript levels by RNA-Seq
5.2.2 Calculation of Splice Site Strength
5.2.3 Gene ontology (GO) analysis
5.2.4 Statistical Analysis
5.3 Results
5.3.1 A widespread role for core splicing factors in promoting the inclusion of alternative exons
5.3.2 Characteristics of SmB/B′ knockdown-dependent alternative exons
5.3.3 Changes in transcript levels associated with SmB/B′ knockdown-dependent PTC-introducing alternative exons
5.3.4 SmB/B′ knockdown affects AS events in RNA-processing factor genes
5.4 Discussion
5.4.1 Mechanisms of AS regulation by core splicing factors
5.4.2 Physiological roles of AS regulation by general splicing factors
Chapter 6
6 Conclusions
6.1 Future Directions
6.1.1 What features underlie the differential dependencies of NMD substrates on UPF2 and UPF3/UPF3X?
6.1.2 Mechanisms of core splicing factor-dependent AS regulation
6.1.3 Origins of ultra- and highly-conserved nonsense exons
6.1.4 Networks of auto- and cross-regulation among RNA processing factors
References
Appendices
List of Tables
Table 1-1. Post-transcriptional auto- and cross-regulation of proteins with roles in RNA biogenesis and metabolism.
Table 3-1. Selected microarray PTC-introducing AS events in genes with functions related to RNA processing.
Table 3-2. Conserved, PTC-introducing AS events identified in transcripts from spliceosome-associated proteins.
List of Figures
Figure 1-1. Coordination of transcription and pre-mRNA processing machineries.
Figure 1-2. Overview of core splicing signals and early stages of spliceosome assembly.
Figure 1-3. Outline of microarray and RNA-Seq AS profiling methods used in this work.
Figure 1-4. Alternative splicing of cassette-type exons can lead to introduction of a premature termination codon (PTC) in the included or skipped splice variant (AS-NMD).
Figure 1-5. An integrated model for discrimination between premature and normal stop codons.
Figure 1-6. Simplified model for autoregulation of a splicing-regulatory factor through AS-NMD.
Figure 2-1. Overview of Chapter 2.
Figure 2-2. Alternative splicing microarray data reveal that predicted PTC-introducing splice variants represent minor forms across ten mouse tissues.
Figure 2-3. Representative RT-PCRs of PTC upon inclusion and PTC upon skipping AS events in ten mouse tissues.
Figure 2-4. Predicted PTC-introducing AS events are more often species-specific than conserved between human and mouse.
Figure 2-5. Knockdown of the essential NMD factor UPF1 leads to an increase in a subset of PTC-containing splice variants.
Figure 2-6. Changes in % exon inclusion and transcript levels upon UPF1 knockdown predicted by the AS microarray are confirmed by RT-PCR.
Figure 2-7. Overlapping but distinct effects of UPF protein knockdowns on PTC-introducing AS events.
Figure 2-8. Representative RT-PCR assays showing effects of UPF protein knockdowns on levels of PTC-introducing alternative exons.
Figure 3-1. Overview of Chapter 3.
Figure 3-2. Conservation of intron sequences flanking PTC-introducing exons affected by UPF factor knockdowns.
Figure 3-3. PTC upon inclusion alternative exons that show UPF1- or UPF2-dependent changes in inclusion level are often flanked by highly conserved intronic sequences.
Figure 3-4. Conserved PTC-introducing AS events in genes encoding spliceosomal proteins.
Figure 3-5. SNRPB (also known as SmB/B′) or SMNDC1 (also known as SPF30) over-expression leads to increased levels of the respective PTC-containing (PTC+) alternative transcript.
Figure 4-1. Overview of Chapter 4.
Figure 4-2. The inclusion of a highly conserved PTC-introducing alternative exon in SNRPB is affected by SmB/B′ knockdown.
Figure 4-3. The half-life of the endogenous SNRPB PTC-containing included splice variant (A) but not that of the exon-included variant from the SNRPB reporter ‘miniSmB’ (B) is increased upon treatment with cycloheximide (CHX) to inhibit NMD.
Figure 4-4. Knockdown of SmD1 leads to more skipping of the SNRPB alternative exon in miniSmB (A), and knockdown of SmB/B′ (B) or SmD1 (C) affects snRNA levels.
Figure 4-5. Auxiliary cis-acting elements regulating inclusion of the SNRPB alternative exon in miniSmB are proximal to the splice sites.
Figure 4-6. Mutations that strengthen the 5′ss (splice site), but not mutations that strengthen the 3′ss, reduce the effects of SmB/B′ knockdown on miniSmB AS.
Figure 5-1. Overview of Chapter 5.
Figure 5-2. Quantitative analysis of alternative splicing by RNA-Seq reveals that knockdown of SmB/B′ leads to increased skipping of alternative exons.
Figure 5-3. Changes in alternative exon inclusion levels measured by RNA-Seq are confirmed by RT-PCR assays.
Figure 5-4. Confirmation of the effects of SmB/B′ knockdown on alternative exon inclusion in two independent knockdowns with different siRNAs.
Figure 5-5. Characteristics of alternative exons affected by knockdown of SmB/B′.
List of Appendices
Appendices to Chapter 2: Impact of nonsense-mediated mRNA decay (NMD) factors on alternative splicing (AS)
Appendix 1. Reprint: Pan Q, Saltzman AL, Kim YK, Misquitta C, Shai O, Maquat LE, Frey BJ, Blencowe BJ. 2006. Quantitative microarray profiling provides evidence against widespread coupling of alternative splicing with nonsense-mediated mRNA decay to control gene expression. Genes Dev 20 (2): 153-158.
Appendix 2. Reprint: Saltzman AL, Kim YK, Pan Q, Fagnani MM, Maquat LE, Blencowe BJ. 2008. Regulation of multiple core spliceosomal proteins by alternative splicing-coupled nonsense-mediated mRNA decay. Mol Cell Biol 28 (13): 4320-4330.
Appendix 3. Correlation of probe intensities (A) or % exon inclusion (B) between Cy3 and Cy5 fluor reversals for six samples.
Appendix 4. Correlation of % inclusion between pairs of AS events with duplicate probes on the AS microarray.
Appendix 5. Correlation between % exon skipping (A) or knockdown-dependent difference in % exon skipping (B) measurements by AS microarray or RT-PCR.
Appendix 6. Microarray data for 1704 AS events that met our detection criteria.
Appendix 7. Annotation for 1704 microarray-monitored AS events that met our detection criteria.
Appendix 8. Significant overlaps in AS events with a consistent change in exon inclusion levels when comparing any two UPF KDs.
Appendix 9. Effects of each UPF factor knockdown on PTC-introducing AS events.
Appendix 10. Frequency of changes in exon inclusion level upon knockdown of UPF1, UPF2, or UPF3X for all detectable AS events (A) or for specific categories (B-D).
Appendices to Chapter 3: Conserved AS-NMD in genes encoding core splicing factors
Appendix 11. Cumulative distribution function (CDF) plots of flanking intron sequence overlap with phastCons elements for the ‘No PTC’ group.
Appendix 12. Annotation for microarray-monitored PTC-introducing AS events with conserved flanking intron sequences.
Appendix 13. Annotation of cassette AS events identified in spliceosome-associated genes.
Appendix 14. Annotation of cassette AS events identified in the control gene set.
Appendices to Chapter 4: Auto-regulation of the core splicing factor SmB/B′ via AS-NMD
Appendix 15. Reprint: Saltzman AL, Pan Q, Blencowe BJ. 2011. Regulation of alternative splicing by the core spliceosomal machinery. Genes Dev 25 (4): 373-384.
Appendix 16. Comparison of SmB/B′ and SmN amino acid sequences (A) and mRNA expression patterns across 84 tissue and cell types (B).
Appendix 17. Abrogation of NMD by treatment of HeLa cells with the translation inhibitor cycloheximide (CHX) leads to an increase in the steady-state level of the endogenous exon-included PTC-containing SNRPB variant (A), but not the exon-included variant from the SNRPB reporter ‘miniSmB’ (B).
Appendix 18. A deletion adjacent to the 5′ss that strengthens potential base-pairing to U1 snRNA abrogates SmB/B′ knockdown-dependent skipping.
Appendix 19. Mutations that strengthen the 3′ss do not abrogate SmB/B′ knockdown-dependent skipping.
Appendices to Chapter 5: Regulation of alternative splicing by the core spliceosomal machinery
Appendix 20. Data and annotations for 5752 AS events monitored by RNA-Seq that passed our filtering criteria.
Appendix 21. Data and annotations for 8626 triplets of consecutive ‘constitutive’ exons monitored by RNA-Seq that passed our filtering criteria.
Appendix 22. Gene Ontology (GO) and Pathway Commons enrichment analysis for 235 genes containing AS events with a ≥30% change in % exon inclusion upon SmB/B′ knockdown.
Appendix 23. Exon inclusion levels and knockdown-dependent changes for all 27 assayed alternative exons agree well with RNA-Seq predictions.
Abbreviations Used
AS alternative splicing
cDNA complementary DNA
CTD carboxy-terminal domain
EJC exon junction complex
EST expressed sequence tag
GO gene ontology
mRNP messenger ribonucleoprotein
NMD nonsense-mediated mRNA decay
NT/siNT non-targeting siRNA
pol II RNA polymerase II
PTC premature termination codon
RNA ribonucleic acid
RNA-Seq high throughput RNA sequencing
RT-PCR reverse transcription-polymerase chain reaction
snRNA small nuclear RNA
snRNP small nuclear ribonucleoprotein
siRNA short interfering RNA
UTR untranslated region
Chapter 1
1 Introduction
A major theme of my thesis research is how gene expression can
be regulated by coordination
between different steps, particularly alternative splicing (AS)
and nonsense-mediated mRNA
decay (NMD). I will therefore begin with an overview of the
coordination among different steps
in mammalian gene expression (Section 1.1). Next, I will discuss
AS and its regulation, focusing
on the relatively uncharacterized roles of the basal splicing
machinery and on insights from high-
throughput analysis methods (Sections 1.2 and 1.3). This is
followed by an introduction to the
NMD pathway, focusing on the recognition of premature stop
codons and on AS-NMD (Section
1.4). Finally, I will discuss how genes with diverse roles in
RNA biogenesis and metabolism take
advantage of their own cellular functions to autoregulate their
expression (Section 1.5).
1.1 Coordination of the gene expression machinery
Almost all human protein-coding genes are transcribed into a
precursor messenger RNA (pre-
mRNA) that must be extensively processed before it is exported
to the cytoplasm and recognized
by the translation machinery. In the nucleus, pre-mRNA undergoes
capping, splicing, cleavage
and polyadenylation. These processes are integrated with
transcription by RNA polymerase II
(pol II) and also result in the association of proteins with the
mRNA to form a messenger
ribonucleoprotein (mRNP). Extensive crosstalk among these
processes plays important roles in
the fidelity, efficiency and regulation of gene expression
(Figure 1-1) (reviewed in Maniatis and
Reed 2002; Komili and Silver 2008; Pandit et al. 2008; Moore and
Proudfoot 2009).
1.1.1 Interdependence among transcription, mRNA processing and
chromatin
The carboxy-terminal domain (CTD) of the largest subunit of pol
II plays a central role in the
crosstalk between transcription and pre-mRNA processing
(reviewed in Perales and Bentley
2009; Munoz et al. 2010). The mammalian CTD contains 52 heptamer
repeats with the
consensus sequence Y-S2-P-T-S5-P-S (serines numbered by position in the repeat), and it is required for efficient
mRNA processing (McCracken
et al. 1997b). During the transcription cycle, changes in the
phosphorylation pattern of the CTD
serine residues allow the recruitment of pre-mRNA processing,
elongation, and histone-
modifying factors (reviewed in Buratowski 2009). Early in
transcription, the CTD is
phosphorylated on Ser-5 by TFIIH, and the Ser-5-phosphorylated
CTD recruits and activates the
mRNA capping enzymes (Cho et al. 1997; McCracken et al. 1997a;
Cho et al. 1998; Ho and
Shuman 1999) (Figure 1-1A). The nuclear cap-binding complex
(CBC, CBP80/20 heterodimer)
recognizes the capped mRNA 5′-end and promotes efficient
splicing, export and translation
initiation (Section 1.1.2). As elongation proceeds, the CTD
becomes highly phosphorylated on
Ser-2 residues by the positive transcription elongation factor b
(P-TEFb) and pol II enters into
the productive elongation phase (Figure 1-1B) (Marshall et al.
1996) (reviewed in Bres et al.
2008). The Ser-2-phosphorylated CTD recruits the cleavage and
polyadenylation machinery,
which then stimulates 3′end formation once the poly(A) site is
transcribed (Figure 1-1D)
(Licatalosi et al. 2002; Ahn et al. 2004; Meinhart and Cramer
2004; Ni et al. 2004; Rosonina and
Blencowe 2004).
Figure 1-1. Coordination of transcription and pre-mRNA
processing machineries.
See text for details.
Crosstalk between transcription elongation and splicing is
mediated by the CTD as well as by
factors that associate with the nascent transcript and the
chromatin template (Figure 1-1B)
(reviewed in Perales and Bentley 2009). This ‘coupling’ is
functionally important for the
efficiency of splicing (Das et al. 2006; Hicks et al. 2006) and
for regulating the differential use of
splice sites through alternative splicing (AS) (Cramer et al.
1997; Auboeuf et al. 2002; Kadener
et al. 2002; Nogues et al. 2002; de la Mata et al. 2003; Pagani
et al. 2003; Ip et al. 2011).
Transcription can influence AS through both ‘recruitment’ and
‘kinetic’ coupling (reviewed in
Munoz et al. 2010). In recruitment coupling, specific splicing
regulatory proteins as well as
factors with dual roles in transcription and splicing regulation
(see below for examples) are
recruited to the transcribing polymerase, often via association
with the CTD. In kinetic coupling,
changes in the pol II elongation rate affect splice site choice
by influencing the timing of
presentation of splicing signals in the pre-mRNA to the splicing
machinery. The pol II
elongation rate may be influenced by promoter identity and
associated transcriptional activators
or co-activators and by elongation factors associated with the
CTD.
Studies of the Ser/Arg-rich (SR) proteins, a class of
sequence-specific RNA-binding factors that
bind pre-mRNA to regulate AS (Section 1.3.4), illustrate the
interdependence between
transcription and AS regulation. The activities of several SR
proteins are modulated in a
promoter- and CTD-dependent manner (Cramer et al. 1999; de la
Mata and Kornblihtt 2006). In
addition, several factors involved in splicing, including the SR
protein SRSF2 (also known as
SC35), enhance transcription through the recruitment or
stimulation of elongation factors such as
the CTD kinase P-TEFb (Figure 1-1B) (Fong and Zhou 2001; Bres et
al. 2005; Lin et al. 2008).
Thus, transcription can affect splicing and, reciprocally,
splicing can affect transcription.
During the transcription cycle, Ser-2 or Ser-5-phosphorylated
pol II and associated elongation
factors recruit chromatin modifying complexes that establish or
maintain characteristic patterns
of histone modifications on active genes (reviewed in Buratowski
2009). In human cells, the 5′-
ends of active genes are typically marked by histone H3 lysine 4
trimethylation (H3K4me3)
(Bernstein et al. 2005). This chromatin mark is recognized by
CHD1 (chromodomain helicase
DNA binding protein 1), which can recruit the splicing machinery
(U2 snRNP; see Section 1.2.2)
to facilitate efficient pre-mRNA splicing (Sims et al. 2007). In
addition, the histone modification
H3K36me3 is enriched in gene regions encoding alternative exons
regulated by the splicing
factor PTB (polypyrimidine tract binding protein; see Section
1.3.3). These modified histone
tails are recognized by MRG15 (MORF-related gene 15), which
enhances the recruitment of
PTB to the pre-mRNA (Luco et al. 2010). Thus, physical crosstalk
between chromatin and the
splicing machinery represents an additional layer of gene
regulation (Figure 1-1C) (reviewed in
Allemand et al. 2008; Luco et al. 2011).
1.1.2 mRNA processing remodels the messenger RNP
Capping, splicing and polyadenylation in the nucleus result in
the association of protein
complexes with the mRNA, which in turn influence mRNA export,
localization, translation and
stability. The transcription and export (TREX) complex is
recruited to the 5′ end of mRNAs in a
cap- and splicing-dependent manner in human cells (Figure 1-1B)
(Masuda et al. 2005; Cheng et
al. 2006). The TREX subunit ALY (also known as REF, THOC4 or
Yra1 in yeast) directly binds
the mRNA as well as the CBP80 subunit of the cap-binding
complex. ALY functions as an
mRNA export adapter by transferring the mRNA to TAP
(TIP-associated protein; also known as
NXF1, nuclear RNA export factor 1, or Mex67 in yeast). Together
with its partner p15, TAP
interacts with the nuclear pore complex to mediate mRNA export
(Hautbergue et al. 2008).
Additional RNA-binding proteins, including some SR proteins, can
also act as TAP-dependent
export adapters (Huang et al. 2003).
Splicing results in the deposition of a multi-protein exon
junction complex (EJC) approximately
20 nt upstream of exon-exon junctions (Figure 1-1B) (reviewed in
Le Hir and Andersen 2008).
The four core factors of the EJC are eIF4A3, Y14, MAGOH
(mago-nashi homologue), and
MLN51 (metastatic lymph node gene 51; also known as Barentz,
Btz, CASC3). These four
proteins along with RNPS1 and UPF3 remain associated with the
mRNA during export, until
they are removed during the first round of translation (Dostie
and Dreyfuss 2002; Lejeune et al.
2002). Additional splicing-related proteins are peripherally
associated with the EJC in the
nucleus but do not remain bound during export. While it was
initially believed that all splice
junctions are marked by the EJC, recent evidence in fly cells
suggests that EJC deposition may
be a regulated process (Sauliere et al. 2010). The EJC factors
have multiple roles in RNA
metabolism, including in mRNA localization (Palacios et al.
2004) and translation (Wiegand et
al. 2003; Nott et al. 2004). The EJC also communicates the
positions of splice junctions to
cytoplasmic factors involved in nonsense-mediated mRNA decay
(NMD), a pathway that
degrades mRNAs containing premature termination codons (PTC).
Specifically, the presence of
an EJC downstream of a PTC strongly stimulates mammalian NMD
(see Section 1.4). In
addition to these post-splicing roles, new findings in
Drosophila show that the EJC functions in
the splicing of exons flanked by long introns (Ashton-Beaucage
et al. 2010; Roignant and
Treisman 2010).
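The EJC-dependent rule described above (a termination codon followed by a downstream EJC marks the transcript for NMD) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not an analysis method from this thesis. The function names, the 0-based coordinate convention and the example exon lengths are hypothetical. The fixed 20 nt offset encodes the approximate EJC position stated in the text.

```python
EJC_OFFSET_NT = 20  # approximate EJC position upstream of each exon-exon junction


def ejc_positions(exon_lengths):
    """Return approximate EJC positions in 0-based mRNA coordinates.

    exon_lengths: lengths of the exons in the spliced mRNA, 5' to 3'.
    Each exon-exon junction lies at the cumulative end of an exon, and
    the EJC is placed EJC_OFFSET_NT upstream of that junction.
    """
    positions = []
    junction = 0
    for length in exon_lengths[:-1]:  # the last exon has no downstream junction
        junction += length
        positions.append(max(junction - EJC_OFFSET_NT, 0))
    return positions


def has_downstream_ejc(stop_codon_end, exon_lengths):
    """True if at least one EJC lies at or beyond the end of the stop codon."""
    return any(pos >= stop_codon_end for pos in ejc_positions(exon_lengths))


# Hypothetical three-exon mRNA: junctions at 150 and 250, EJCs near 130 and 230.
exons = [150, 100, 200]
print(has_downstream_ejc(180, exons))  # stop codon in exon 2, EJC remains downstream
print(has_downstream_ejc(400, exons))  # stop codon in the last exon, no EJC downstream
```

In this simplified model a stop codon in the last exon is never flagged, because no junction, and therefore no EJC, lies downstream of it.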
1.2 Pre-mRNA splicing
Approximately 92% of human protein-coding genes are interrupted
by introns, and on average
each gene contains 8-9 introns (Fedorova and Fedorov 2005). The
excision of introns from pre-
mRNA, or splicing, is catalyzed by the spliceosome, a large
ribonucleoprotein (RNP) complex
comprising the U1, U2, U4/6 and U5 small nuclear (sn)RNPs and a
few hundred protein factors
(reviewed in Wahl et al. 2009). Both RNA and protein components
of the spliceosome play
important roles in recognition of the core splicing signals and
in catalysis. This section outlines
the recognition of the core splicing signals and subsequent
assembly of the spliceosome on the
pre-mRNA. I will also focus on the core snRNP Sm proteins, which
will be relevant in the later
chapters of my thesis.
1.2.1 Core and auxiliary splicing signals
The core splicing signals in the pre-mRNA are short motifs with
considerable sequence
flexibility. The 5′ (donor) and 3′ (acceptor) splice sites (ss)
are located at the 5′ and 3′ boundaries
of the intron, respectively, and the branch point is located
upstream of the 3′ss. Consensus
sequences for the mammalian core splicing signals are shown in
Figure 1-2A. The splicing
reaction involves two successive trans-esterifications. In the
first step, the 2′ hydroxyl of the
branch point adenosine attacks the phosphodiester bond at the
5′ss, generating a free 3′ hydroxyl
on the 5′ exon and a branched intron-lariat-3′exon as
intermediates. In the second step, the free 3′
hydroxyl of the 5′ exon attacks the phosphodiester bond at the
3′ss, resulting in ligation of the
exons and release of the intron lariat.
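The two trans-esterification steps can be followed on a toy sequence. The sketch below treats the pre-mRNA as a plain string and only tracks which segments are joined or released at each step. The function name, the index conventions and the example sequence are hypothetical, and no chemistry or sequence recognition is modelled.

```python
def splice(pre_mrna, five_ss, branch_point, three_ss):
    """Follow the two trans-esterification steps on a sequence string.

    five_ss:      index of the first intron nucleotide
    branch_point: index of the branch point adenosine (inside the intron)
    three_ss:     index of the first nucleotide of the 3' exon
    """
    if not five_ss < branch_point < three_ss:
        raise ValueError("expected 5'ss < branch point < 3'ss")
    if pre_mrna[branch_point] != "A":
        raise ValueError("branch point nucleotide must be an adenosine")

    # Step 1: the branch point 2' hydroxyl attacks the 5'ss. The 5' exon is
    # released with a free 3' hydroxyl; the intron stays joined to the 3' exon
    # as a branched lariat intermediate.
    five_exon = pre_mrna[:five_ss]
    lariat_intermediate = pre_mrna[five_ss:]

    # Step 2: the 3' hydroxyl of the 5' exon attacks the 3'ss. The exons are
    # ligated and the intron lariat is released.
    intron_lariat = pre_mrna[five_ss:three_ss]
    three_exon = pre_mrna[three_ss:]
    return {
        "step1_intermediates": (five_exon, lariat_intermediate),
        "mrna": five_exon + three_exon,
        "intron_lariat": intron_lariat,
    }


# Hypothetical example sequence.
exon1, intron, exon2 = "AUGCAG", "GUAAGUACUAACUUUUUUCAG", "GUCUAA"
five_ss = len(exon1)
three_ss = len(exon1) + len(intron)
branch_point = five_ss + intron.index("CUAAC") + 3  # second A of CUAAC
result = splice(exon1 + intron + exon2, five_ss, branch_point, three_ss)
print(result["mrna"])
```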
The short, degenerate core splicing signals that mark the
boundaries of introns do not contain
sufficient information to accurately define the exons in human
transcripts (Lim and Burge 2001).
Introns often contain many ‘pseudoexons’ – intronic sequences
flanked by ‘decoy’ consensus ss
sequences that are not normally recognized by the splicing
machinery. Thus additional cis-acting
regulatory sequences are necessary to distinguish introns and
exons (reviewed in Chasin 2007).
These auxiliary sequences are known as exonic or intronic
splicing enhancers when they
promote splicing (ESE/ISEs), or as splicing silencers when they
inhibit splicing (ESS/ISS).
Splicing enhancers and silencers are usually short, degenerate
sequence motifs (5-10 nt) and they
play roles in the recognition of constitutive exons (Section
1.2.3) as well as in the regulation of
inclusion of alternative exons (Section 1.3.2).
Figure 1-2. Overview of core splicing signals and early stages
of spliceosome assembly.
(A) Consensus sequences of the mammalian core splicing signals.
PPT, polypyrimidine tract; ss,
splice site.
(B) Early stages of spliceosome assembly are shown. The U1, U2,
and U4/6.U5 snRNPs contain
the indicated snRNA(s) and associated proteins. Sequences of U1
and U2 snRNAs that base-pair
with the 5′ss and branch site, respectively, are shown in white
text. Ψ, pseudouridine; R, A/G; Y,
U/C.
1.2.2 Spliceosome assembly
The consensus model of spliceosome assembly has been mostly
characterized using in vitro
approaches (reviewed in Matlin and Moore 2007; Wahl et al.
2009). Spliceosome assembly is a
step-wise process involving recruitment of snRNPs and proteins
to the pre-mRNA and dynamic
rearrangements of RNA–RNA, RNA–protein and protein–protein
interactions (Figure 1-2B). In
the early (E) complex, also known in yeast as the ‘commitment
complex’, the 5′ss is recognized
by U1 snRNP, the branch point is recognized by SF1 (Splicing
Factor 1; also known in yeast as
BBP, branchpoint binding protein), and the PPT and 3′ss are
recognized by the subunits of the
U2 accessory factor (U2AF) heterodimer (U2AF65 and U2AF35,
respectively). Recognition of
the 5′ss involves base-pairing between the 5′-end of U1 snRNA
and the pre-mRNA, which is
stabilized by proteins in the U1 snRNP (Zhang and Rosbash
1999).
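As a rough illustration of 5′ss recognition by base-pairing, the following sketch counts Watson-Crick pairs between a 9-nt 5′ss and the 5′ end of U1 snRNA. It rests on several assumptions that are not taken from the text: the U1 region is written as ACUUACCUG with pseudouridines represented as U, the 5′ss spans the last 3 exonic and first 6 intronic nucleotides, and G–U wobble pairs and protein contributions are ignored. The function name is hypothetical.

```python
# Region of the U1 snRNA 5' end assumed to pair with the 5'ss, written 5'->3'.
# Assumption: pseudouridines are written as U.
U1_5P_REGION = "ACUUACCUG"

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def u1_pairing(five_ss):
    """Count Watson-Crick pairs between a 9-nt 5'ss and U1 snRNA.

    five_ss: 9 nt covering the last 3 exonic and first 6 intronic
    nucleotides, written 5'->3' (DNA or RNA alphabet).
    """
    five_ss = five_ss.upper().replace("T", "U")
    if len(five_ss) != len(U1_5P_REGION):
        raise ValueError("expected a 9-nt 5'ss sequence")
    # The duplex is antiparallel, so U1 is read 3'->5' against the 5'ss.
    u1_antiparallel = U1_5P_REGION[::-1]
    return sum((a, b) in WATSON_CRICK for a, b in zip(five_ss, u1_antiparallel))


print(u1_pairing("CAGGUAAGU"))  # fully complementary under these assumptions
print(u1_pairing("AAGGUGUGU"))  # a weaker, hypothetical 5'ss
```

A count of this kind is only a crude proxy for 5′ss strength, but it shows why a sequence change next to the 5′ss can strengthen or weaken potential pairing to U1 snRNA.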
The U2 snRNP then replaces SF1 at the branchpoint, forming the A
complex (also referred to as
the pre-spliceosome) (Figure 1-2B). Formation of the A complex
is ATP-dependent and involves
base-pairing of U2 snRNA at the branch site region, which is
stabilized by components of the U1
and U2 snRNPs and by U2AF65 (Barabino et al. 1990; Valcarcel et
al. 1996; Gozani et al.
1998). A bulged duplex formed between the U2 snRNA and the
branch site region specifies the
protruding adenosine as the nucleophile for the first
trans-esterification reaction of splicing
(Query et al. 1994). The bulged adenosine is also recognized by
the U2 snRNP protein p14
(SF3B14) (MacMillan et al. 1994; Schellenberg et al. 2011).
While the splice sites are
recognized in E complex, the pairing of splice sites for
catalysis occurs at, or subsequent to, A
complex formation (Chiara and Reed 1995; Lim and Hertel 2004;
Kotlajich et al. 2009).
The U4/6.U5 tri-snRNP then joins the spliceosome, forming the B
complex. This complex
undergoes extensive remodelling to form the catalytically active
spliceosome. The multiprotein
PRP19/CDC5L complex (also known in yeast as the NineTeen Complex
or NTC) and additional
RNA helicases also associate with the spliceosome and function
in spliceosome activation and
splicing fidelity (reviewed in Valadkhan 2007; Hogg et al.
2010). The remodelling of RNA–
RNA interactions during spliceosome activation includes
disruption of U4–U6 snRNA base-
pairing to allow base-pairing of U6 snRNA with intronic
nucleotides at the 5′ss, release or
destabilization of the U1 and U4 snRNPs, and rearrangement of
interactions between U2 and U6
snRNA and within U6 snRNA. Following the two
trans-esterification reactions of splicing, the
products are released and the components of the spliceosome are
recycled.
In contrast to the step-wise model of spliceosome assembly
characterized in vitro, the isolation
of a ‘penta-snRNP’ from yeast cells led to the hypothesis that
the spliceosome may encounter the
pre-mRNA in a pre-assembled form in vivo (Stevens et al. 2002).
However, it has been suggested
that the two models might be reconciled if the step-wise
assembly characterized in vitro could be
viewed instead as step-wise rearrangement and activation of the
penta-snRNP (reviewed in Brow
2002; Nilsen 2002). Recent work also supports the relevance of
the step-wise assembly model in
vivo. Several groups used chromatin immunoprecipitation to
monitor the co-transcriptional
recruitment of snRNP components and other splicing factors to
nascent transcripts of yeast
intron-containing genes. These studies showed a sequential
pattern of snRNP or splicing factor
recruitment that was consistent with step-wise spliceosome
assembly (Gornemann et al. 2005;
Lacadie and Rosbash 2005; Tardiff and Rosbash 2006). In
addition, live imaging of snRNP
components tagged with fluorescent proteins revealed distinct
interaction dynamics of individual
snRNPs with pre-mRNA, in support of a step-wise recruitment
model in human cells (Huranova
et al. 2010).
1.2.3 Exon definition
The splicing reaction takes place between 5′ and 3′ splice sites
paired across an intron. However,
in metazoan genes, where introns are often longer than exons by
an order of magnitude or more,
it is likely that splicing is facilitated by a process termed
‘exon definition’ (Berget 1995). In the
exon definition model, the factors bound to the splice sites on
either side of internal exons
initially interact and are stabilized across the exon (Figure
1-2B). Early evidence for exon
definition included the finding that the presence and strength
of a 5′ss downstream of an exon
affects the recognition and splicing of the upstream intron
(Nasim et al. 1990; Robberson et al.
1990; Talerico and Berget 1990; Kuo et al. 1991). In addition,
using a reporter containing an
isolated exon flanked by splice sites, it was found that the
5′ss sequence and U1 snRNP
promoted UV crosslinking of U2AF65 at the PPT/3′ss, thus
providing further evidence for the
importance of cross-exon interactions (Hoffman and Grabowski
1992). Key mediators of this
cross-exon bridging activity include proteins in the SR family
(Section 1.3.4). These proteins
interact with exonic splicing enhancers (ESEs) and promote
binding of U1 snRNP and U2AF to
the pre-mRNA through direct interactions as well as through
interactions with splicing co-
activator proteins (reviewed in Blencowe 2000). The RS domains
of SR proteins also promote or
stabilize RNA–RNA contacts between the core splicing signals and
the U-snRNAs (Shen and
Green 2006). Computational analyses of splicing signals in human
and mouse also support the
exon definition model. Compensatory changes in the strength of
5′ and 3′ splice sites are
observed across exons, but not across introns (Xiao et al.
2007). Furthermore, splice sites, ESEs,
and ESSs coevolve to preserve the overall exon strength (Xiao et
al. 2007).
Our understanding of exon definition complexes is incomplete,
since the majority of in vitro
spliceosome assembly assays have used reporter pre-mRNAs
containing two exons separated by
a single short intron. However, several recent studies have shed
light on exon-defined
complexes. Assembly of spliceosome complexes in vitro on a
three-exon pre-mRNA reporter
revealed that an exon-defined E complex can be chased into an
exon-defined A complex in the
presence of ATP (Sharma et al. 2008). In addition, proteomics
analysis indicated that these exon-
defined complexes were similar in composition to previously
characterized intron-defined
complexes (Sharma et al. 2008). The mechanism for conversion of
cross-exon interactions into
cross-intron interactions is also an area under active
investigation. Recently, using an in vitro
trans-splicing assay, it was shown that the U4/6.U5 tri-snRNP
can associate with an exon-
defined A complex, without requiring prior establishment of
cross-intron interactions between
U1 and U2 snRNP (Schneider et al. 2010). In addition, the
establishment of cross-intron
interactions upstream of an exon did not require disruption of
the interactions formed across that
exon (Schneider et al. 2010). In a related study, conversion of
cross-intron to cross-exon
interactions was investigated using pre-mRNA reporters with
multiple introns. Following
splicing of one intron, U1 snRNP previously engaged in
cross-exon interactions on the 3′exon
remains associated with the mRNA and promotes efficient splicing
of the neighbouring intron
(Crabb et al. 2010).
1.2.4 Spliceosomal snRNPs and Sm proteins
The snRNPs are major components of the spliceosome. Each snRNP
contains a uridine-rich
snRNA (U1, U2, U4, U5 or U6) and associated proteins; however, U4
and U6 are base-paired in
a U4/6 di-snRNP (Bringmann et al. 1984; Hashimoto and Steitz
1984) which is also found
associated with U5 snRNP in a U4/6.U5 tri-snRNP complex
(Konarska and Sharp 1987). The
purification of snRNP components from mammalian cells was
fortuitously accomplished using
serum from a patient with the autoimmune disease systemic lupus
erythematosus (SLE) (Lerner
and Steitz 1979). This SLE serum was known to contain antibodies
that react with a nuclear
antigen present in many mammalian tissues (Tan and Kunkel 1966).
The nuclear antigen was
designated ‘Sm’, for ‘Smith’, in honour of Stephanie Smith, the
SLE patient from whom the
serum was isolated (Tan and Kunkel 1966) (reviewed in Reeves et
al. 2003). Using the anti-Sm
serum, RNPs containing the U-snRNAs and 7 small (12-35 kDa)
proteins designated A-G were
immunoprecipitated from mammalian cell extracts (Lerner and
Steitz 1979). A subset of these
proteins that are common to the U1, U2, U4 and U5 snRNPs became
known as the Sm proteins.
The snRNPs contain both common and unique proteins. The seven
common Sm proteins (B/B′
(see Chapter 4), D1, D2, D3, E, F and G) are assembled onto the
snRNAs by the SMN complex
(survival of motor neuron) (reviewed in Neuenkirchen et al.
2008). Formation of this snRNP
‘core’ is essential for subsequent steps in the biogenesis of
mature snRNP particles. The Sm
proteins bind a conserved single-stranded ‘Sm site’ with
consensus sequence PuA[U3-6]GPu,
located between two stem-loops near the 3′ end of the ‘Sm-class’
snRNAs (U1, U2, U4 and U5)
(Branlant et al. 1982; Liautard et al. 1982). Based on crystal
structures of the B-D3 and D1-D2
Sm protein dimers along with previous biochemical data, a model
was proposed in which the Sm
site RNA passes through the central cavity formed by a
hetero-heptameric Sm protein ring
(Kambach et al. 1999). This model was recently confirmed by two
crystal structures of the U1
snRNP assembled from recombinant components (Pomeranz Krummel et
al. 2009) or generated
by limited proteolysis of native snRNPs isolated from HeLa cells
(Weber et al. 2010).
The Sm proteins are essential for the assembly and stability of
snRNPs. However, their role in
the splicing process is not well characterized. In yeast, Sm
proteins B, D1 and D3 contact the
pre-mRNA near the 5′ss in the commitment/E complex (Zhang and
Rosbash 1999). These three
Sm proteins have extensions or ‘tails’ located C-terminal to
their conserved Sm domains.
Splicing assays in yeast strains harboring tail-truncated Sm
proteins suggested that the tails of
Sm B, D1 and D3 contribute to the stability of the U1
snRNA–pre-mRNA interaction, perhaps
through basic arginine and lysine residues in the yeast Sm
protein tails (Zhang et al. 2001). The
mammalian C-terminal tails are also rich in positively charged
residues. The D1 and D3 tails
contain glycine-arginine (GR) repeats. In contrast, the SmB tail
in mammals is quite divergent
from that of yeast and contains a striking stretch of repeats of
3-4 prolines interspersed with
glycine, methionine and arginine residues (e.g.
GMPPPGMRPPPPGMR). These ‘PGM’ motifs
in the tail interact with the WW domain of FBP21 (formin-binding
protein 21), a spliceosome-
associated protein implicated in cross-intron bridging
interactions (Bedford et al. 1998).
However, the function of this interaction in splicing has not
been studied. In addition, while the
U1 snRNP crystal structures mentioned above provided insight
into recognition of the snRNA by
the Sm ring, they were less informative regarding the function
of the C-terminal Sm tails, since
these repetitive regions were either omitted from recombinant
proteins or found to be disordered
(Pomeranz Krummel et al. 2009; Weber et al. 2010). Overall,
while the C-terminal tails of the
Sm B/B′, D1 and D3 proteins play a role in nuclear localization
of the snRNPs (Bordonne 2000;
Girard et al. 2004), the roles of the mammalian tails in U1
snRNA–pre-mRNA interaction or
other steps in splicing remain to be studied.
Additional insights into the function of Sm proteins in splicing
might be inferred by analogy to
the functions of these proteins in other RNA–protein complexes. In
addition to the spliceosomal
snRNPs, Sm proteins form a related but distinct heptamer on U7
snRNA. The U7 heptamer
contains five Sm proteins (B/B′, D3, E, F, and G), along with
two Sm-like (LSm) proteins
LSm10 and LSm11, which replace Sm proteins D1 and D2,
respectively. The U7 snRNP
functions in histone 3′end processing (reviewed in Dominski and
Marzluff 2007). A recent study
found that the U7 snRNP components SmB, SmD3 and LSm10
UV-crosslinked to the histone
mRNA (Yang et al. 2009). A model was proposed in which these
proteins might function as a
‘molecular ruler’ to specify the histone mRNA cleavage site at a
fixed distance upstream of an
RNA sequence (the ‘histone downstream element’) that is
recognized by base-pairing to U7
snRNA (Yang et al. 2009). In this model, Sm proteins B and D3
function as part of the heptamer
to mediate RNA–RNA interactions between the U7 snRNA and the
histone mRNA. This
function is reminiscent of the proposed role of the yeast Sm
complex in U1 snRNA–pre-mRNA
interaction discussed above (Zhang et al. 2001). However, it is
not known if the Sm protein–
RNA interaction occurs via the C-terminal tails, as suggested in
the yeast model, or another
region of the Sm proteins.
1.3 Regulation of alternative splicing
Alternative splicing (AS) is the process of differential splice
site usage to generate multiple
mRNA variants from a single pre-mRNA. Upon release of the draft
human genome, it was
estimated that at least 59% of genes undergo AS, based on
aligning expressed sequence tags
(ESTs) and cDNAs to coding genes on chromosome 22 (International
Human Genome
Sequencing Consortium 2001). A higher frequency of AS, affecting
74% of multi-exon genes,
was then estimated based on data from tissue profiling on exon
junction microarrays and
EST/cDNA evidence (Johnson et al. 2003). More recently, the use
of high-throughput RNA
sequencing (RNA-Seq) has led to an estimate that transcripts
from 95% of human multi-exon
genes undergo AS (Pan et al. 2008; Wang et al. 2008).
Alternative splicing affects transcript
diversity in several ways, including cassette-type exons,
mutually exclusive exons, alternative 5′
or 3′ss selection, alternative promoters, alternative
polyadenylation, and intron retention. In my
work, I will focus on cassette-type exons, which are either
included or skipped in the spliced
mRNA, and which represent the most common type of AS (Castle et
al. 2008; Wang et al. 2008).
Although AS is widespread, the functional importance of most
splice variants remains to be
investigated.
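Because a cassette exon is either included or skipped, its usage is commonly summarized as a percent inclusion value. The sketch below shows one simple way to estimate this value from splice junction read counts. It is not the quantification procedure used in this thesis: the function and parameter names are hypothetical, and averaging the two inclusion junctions is one of several possible normalizations.

```python
def percent_inclusion(upstream_jxn, downstream_jxn, skipping_jxn):
    """Estimate % inclusion of a cassette exon from junction read counts.

    upstream_jxn, downstream_jxn: reads spanning the two junctions formed
    when the exon is included.
    skipping_jxn: reads spanning the junction formed when it is skipped.
    """
    # Inclusion is supported by two junctions and skipping by one,
    # so the inclusion counts are averaged before comparison.
    inclusion = (upstream_jxn + downstream_jxn) / 2
    total = inclusion + skipping_jxn
    if total == 0:
        return None  # no junction evidence for this event
    return 100 * inclusion / total


print(percent_inclusion(30, 30, 10))
```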
1.3.1 Roles of alternative splicing
Very soon after the discovery that genes are interrupted by
introns, it was proposed that exons
might be joined in different combinations to generate multiple
polypeptides from a single gene
(Gilbert 1978). This role of AS in expansion of the proteome has
been particularly emphasized
following the sequencing of the human genome (International
Human Genome Sequencing
Consortium 2001), which was found to encode fewer protein-coding
genes than anticipated by
many (reviewed in Aparicio 2000; Pennisi 2003). A primary
outcome of AS is the expansion of
transcriptome complexity. An important consequence is an
increase in the diversity of the
encoded proteome (reviewed in Maniatis and Tasic 2002; Nilsen
and Graveley 2010). However,
an additional outcome of transcriptome expansion by AS is an
increase in post-transcriptional
regulatory potential. For example, differences in the coding
region, 5′UTR or 3′UTR between
mRNA variants produced from the same pre-mRNA can affect
translation (e.g. upstream ORFs),
stability (e.g. microRNA binding sites, AU-rich elements,
premature stop codons), and mRNA
localization, and thus have important consequences for the
regulation of gene expression
(Majoros and Ohler 2007; Tan et al. 2007; Mayr and Bartel 2009;
Resch et al. 2009; Bell et al.
2010; Salomonis et al. 2010) (reviewed in Smith et al. 1989;
Hughes 2006). The roles of AS in
regulating gene expression will be discussed further in Section
1.5 below.
1.3.2 Mechanisms of alternative splicing regulation
Alternative splicing can be controlled in a developmental stage-
and cell type-specific manner, as
well as in response to signaling or environmental cues (reviewed
in Chen and Manley 2009).
This AS regulation is achieved through multiple levels of
control. For example, transcription
elongation rate, chromatin modification, EJC deposition (see
Section 1.1) and pre-mRNA
secondary structure (reviewed in Warf and Berglund 2010) can
influence splice site choice.
However, the best-characterized mechanism of AS regulation is
through the recognition of short
cis-acting RNA sequence motifs (ESE/S, ISE/S) by
splicing-regulatory proteins. Initial studies of
AS regulation focused on the enhancement or repression of splice
site recognition at the early
stages of spliceosome assembly (Section 1.3.4). In contrast,
some regulatory mechanisms affect
splice site pairing, rather than recognition, or recruitment of
the U4/6.U5 tri-snRNP. These
diverse mechanisms allow regulation of splice site choice at
later stages of spliceosome assembly
or even during splicing catalysis (Section 1.3.5).
1.3.3 Families of alternative splicing regulatory factors
The most extensively studied groups of splicing-regulatory
factors are the SR (Ser/Arg-rich),
SR-related and hnRNP (heterogeneous ribonucleoprotein) families,
which I will discuss in the
next section (1.3.4). Many of these proteins are widely
expressed and thought to affect AS
regulation in a concentration-dependent manner (Mayeda et al.
1993; Caceres et al. 1994;
Hanamura et al. 1998) (reviewed in Chen and Manley 2009).
However, some members of these
families have tissue-restricted expression patterns. For
example, our lab recently identified and
characterized the first example of a nervous system-specific
SR-related protein, nSR100 (also
known as SRRM4, serine/arginine repetitive matrix 4) (Calarco et
al. 2009). In addition, the
hnRNP family member PTBP1 (polypyrimidine tract binding protein
P1; also known as PTB,
hnRNPI) is widely expressed, while two PTBP1 paralogues, PTBP2
(also known as nPTB,
brPTB, neural/brain PTB) and ROD1 (regulator of differentiation
1) are expressed in specific
cell types. Interestingly, regulation of the AS of the genes
encoding these proteins plays a role in
establishing their expression patterns (see Section 1.5)
(Wollerton et al. 2004; Boutz et al. 2007b;
Makeyev et al. 2007; Spellman et al. 2007).
Several other AS factors with tissue-restricted expression have
also been characterized. Members
of the NOVA (neuron-oncological ventral antigen) and ELAV-like
(embryonic lethal, abnormal
vision-like; also known as paraneoplastic encephalomyelitis
antigen Hu) families are expressed
in neurons; FOX (Feminizing gene On X homolog) and CELF
(CUG-binding protein and ETR3-
like family, also known as Bruno-like) proteins are expressed in
the brain, heart or muscle
(reviewed in Li et al. 2007); and ESRPs (epithelial splicing
regulatory proteins) are expressed in
epithelial cells (Warzecha et al. 2009). Like many of the SR
proteins and hnRNPs, these factors
bind short RNA motifs in a sequence-specific manner, through RNA
recognition motifs (RRMs)
or hnRNP-K homology (KH) domains (Cook et al. 2011).
1.3.4 Regulation of splice site recognition
1.3.4.1 SR and SR-related proteins
The SR proteins contain 1-2 N-terminal RNA recognition motifs
(RRMs) and a C-terminal RS
domain that is rich in alternating serine and arginine
dipeptides (reviewed in Lin and Fu 2007;
Long and Caceres 2009). The prototypical SR proteins function in
both constitutive and
alternative splicing. Based on in vitro splicing assays, these
SR proteins appear to be functionally
redundant in their ability to complement splicing-deficient HeLa
S100 extract (Fu et al. 1992;
Mayeda et al. 1992). However, additional studies indicate that
SR proteins bind distinct RNA
sequences and that they have non-redundant AS functions in vivo
(reviewed in Long and Caceres
2009). For example, depletion of the prototypical SR protein
SRSF1 (also known as SF2, ASF)
in C. elegans by RNAi results in late embryonic lethality
(Longman et al. 2000). Similarly, loss
of SRSF1 in chicken DT-40 cells or in mouse embryos is lethal
(Wang et al. 1996; Xu et al.
2005). Moreover, tissue-specific ablation of SRSF1 in the mouse
heart resulted in misregulation
of an SRSF1-dependent AS event in Ca2+/calmodulin-dependent
kinase IIδ (CaMKIIδ) and a
defect in postnatal heart remodelling (Xu et al. 2005). Thus, SR
proteins have specific, non-
redundant functions in the regulation of AS.
In addition to the prototypical SR proteins, many other
‘SR-related’ proteins also function as
regulators of splicing and AS. These proteins often contain RS
and RRM domains, but in a
different configuration than the classical SR proteins. Examples
of such SR-related proteins
include TRA2A and TRA2B, which are homologues of transformer-2,
an AS regulator involved
in Drosophila sex determination. Other SR-related proteins
contain RS domains alone or in
combination with other RNA-binding domains (reviewed in Blencowe
et al. 1999).
Though best known as positive regulators of AS, SR proteins can
both promote and inhibit the
inclusion of alternative exons (reviewed in Lin and Fu 2007;
Long and Caceres 2009). SR
proteins function in ESE-dependent splicing in several ways
(reviewed in Blencowe 2000;
Graveley 2000). SR proteins can bind specific ESE sequences and
recruit the splicing machinery
via interactions of their RS domains with snRNP components (e.g.
U2AF35 and U1-70K)
(Lavigueur et al. 1993; Wu and Maniatis 1993; Wang et al. 1995;
Zuo and Maniatis 1996;
Graveley et al. 2001). Alternatively, some SR-related proteins
can function in ESE-dependent
splicing by acting as splicing co-activators that bridge
interactions between ESE-bound SR/SR-
related proteins and snRNPs (Blencowe et al. 1998; Eldridge et
al. 1999; Blencowe et al. 2000).
Binding of SR proteins can also enhance exon inclusion by
antagonizing the activity of negative
regulators bound at nearby silencer elements (Kan and Green
1999). Recent results also show
that inclusion of an alternative exon can be repressed by strong
interactions of SR proteins with
the flanking constitutive exons (Han et al. 2011). In addition
to roles in AS regulation, some SR
and SR-related proteins function in transcription, 3′-end
formation, mRNA export and translation
(reviewed in Blencowe et al. 1999; Long and Caceres 2009).
1.3.4.2 hnRNPs
The heterogeneous nuclear ribonucleoproteins (hnRNPs) are a diverse
group of proteins functionally
defined by their association with nascent hnRNA (pre-mRNA). The
hnRNPs typically contain
one to four RNA-binding domains (RRMs, quasiRRMs or KH domains),
as well as other
auxiliary domains such as RGG boxes (Arg-Gly-Gly) or Gly-rich
domains (reviewed in
Martinez-Contreras et al. 2007). Many of the hnRNPs that have
been implicated in AS regulation
can inhibit splice site recognition through binding to specific
silencer sequences (Caputi et al.
1999; Chen et al. 1999; Del Gatto-Konczak et al. 1999). Some
hnRNPs such as hnRNPA1 may
also cooperatively multimerize on the pre-mRNA to block the
association of other factors at a
distance (Zhu et al. 2001). The recognition of silencers by
hnRNPs can thus block or compete
with the recognition of either nearby or distal enhancer
sequences by positive regulatory factors.
Alternatively, hnRNPs may block or compete with the binding of
snRNP-associated factors such
as U2AF to the core splicing signals (Lin and Patton 1995; Singh
et al. 1995). Some hnRNPs
also stimulate intron definition through interactions between
multiple proteins recognizing sites
at the boundaries of long introns (Martinez-Contreras et al.
2006). In addition, when intronic
hnRNP binding sites flank an alternative exon, interaction
between the hnRNPs can lead to exon
silencing by ‘looping out’ the alternative exon and bringing the
splice sites of the flanking exons
into close proximity (Chabot et al. 1997; Blanchette and Chabot
1999). However, at least in one
case of such a looping mechanism, the binding of U1 snRNP to the
5′ss of the silenced exon was
not inhibited (Chabot et al. 1997; Blanchette and Chabot 1999).
Therefore, this mechanism may
involve inhibition of splice site pairing rather than
recognition, as described in the next section.
1.3.5 Regulation of splice site pairing and catalysis
In addition to the regulation of splice site recognition at the
earliest stages of spliceosome
assembly, a number of recent studies have revealed that AS can
be regulated at later stages,
including the subsequent steps involved in the pairing of splice
sites or the recruitment of the tri-
snRNP (reviewed in House and Lynch 2008). Moreover, some
trans-acting splicing factors can
regulate AS at both early and late stages of spliceosome
assembly. For example, the hnRNP
PTBP1 can repress alternative exon inclusion by inhibiting early
steps leading to exon definition
(Izquierdo et al. 2005; Sharma et al. 2005). However, in another
mechanism, PTBP1 can act after
exon definition, by binding in an intron and blocking the
functional cross-intron pairing of U1
and U2 snRNPs already associated with the splice sites (Sharma
et al. 2008). Repression of
alternative exon inclusion by hnRNP-L and hnRNP-E2 can also
occur through a post–exon
definition mechanism. In this case, the binding of the hnRNPs to
an exon prevents the U1 and
U2 snRNPs bound at its splice sites from forming productive
cross-intron interactions with
snRNPs at the flanking exons (House and Lynch 2006). Post–exon
definition mechanisms are
also not limited to hnRNPs. The SR-related tumor suppressor RBM5
can repress exon inclusion
by a dual mechanism that both blocks the transition to
intron definition of the snRNP-
recognized splice sites flanking a repressed alternative exon
and facilitates the pairing of
the splice sites of the flanking constitutive exons (Bonnal et
al. 2008). Splice site choice can also
be regulated during catalysis. In the Drosophila melanogaster
sex determination gene Sex-lethal,
the Sex-lethal protein causes skipping of an alternative exon in
its own transcript through an
interaction with the splicing factor SPF45 that blocks splicing
at the second catalytic step
(Lallena et al. 2002). Together, these studies reveal the
diversity of splicing regulatory
mechanisms.
1.3.6 Roles of basal splicing factors in alternative splicing
regulation
Studies in yeast and metazoans have shown that the levels of
some basal or ‘core’ components of
the splicing machinery can affect splice site choice. Microarray
profiling revealed transcript-
specific splicing effects in yeast strains harboring mutations
in or deletions of core splicing
components (Clark et al. 2002; Pleiss et al. 2007; Kawashima et
al. 2009). In addition, an RNAi
screen in Drosophila cells identified transcript-specific
effects on AS upon depletion of general
spliceosome factors, including U2AF and components of U1, U2 and
U4/U6 snRNPs (Park et al.
2004). Studies in C. elegans and mammalian cells also suggested
that the U2AF subunits and the
U2 snRNP component SAP155 can affect splice site choice
(Massiello et al. 2006; Pacheco et al.
2006; Hastings et al. 2007; Ma and Horvitz 2009). Two very
recent studies implicate additional
core splicing factors in AS regulation and identify associated
target sequence features. The
branchpoint recognition factor SF1 may regulate AS of some
transcripts by binding to branch
site-like sequences (Corioni et al. 2011). Also, transcriptome
profiling in zebrafish embryos
deficient in the U1 snRNP-specific protein U1C revealed altered
splice site choice in targets with
intronic U-rich sequences (Rosel et al. 2011). In a mouse model
of spinal muscular atrophy
(SMA), deficiency of the snRNP assembly factor SMN (Survival of
Motor Neuron) resulted in
tissue-specific perturbations in snRNP levels and splicing
defects (Gabanella et al. 2007; Zhang
et al. 2008; Baumer et al. 2009). Tiling microarray profiling
analysis of fission yeast RNA also
revealed transcript-specific splicing defects associated with a
temperature-degron allele of SMN, and showed that
some of these defects could be alleviated by strengthening the
pyrimidine tract upstream of the
branchpoint (Campion et al. 2010). In addition to these
studies, the work in my thesis will
provide new evidence for the role of core splicing factors in AS
regulation (Saltzman et al.
2011).
In summary, the features that underlie the differential
sensitivity of introns or alternative exons
to particular defects in the core splicing machinery are only
beginning to be explored. Moreover,
in contrast to the AS regulatory factors described above, the
mechanisms of these effects are
poorly understood. Some clues may be provided by analogy to the
kinetic proofreading model of
splicing fidelity in yeast. This model broadly predicts that any
changes that alter the kinetics of
transitions in the splicing pathway, including the availability
or activity of core splicing factors,
can alter splice site choice (Yu et al. 2008) (reviewed in Smith
et al. 2008).
1.3.7 Breaking the ‘code’ of cis-acting alternative splicing
regulatory sequences
A goal of the study of AS is to build predictive models for AS
regulation, or a splicing regulatory
‘code’ (reviewed in Matlin et al. 2005; Blencowe 2006; Wang and
Burge 2008). Deciphering the
rules that control AS will be important for understanding gene
expression on a genome-wide
scale, and for the ability to predict how mutations affect this
regulation. However, the nature of
splicing regulation complicates the path from genomic sequence
to AS predictions. For example,
a particular cis-regulatory sequence can have opposite effects
on AS regulation depending on its
position within an intron or exon, even when the sequence is
recognized by the same trans-acting
regulator (reviewed in Chen and Manley 2009). The activity of an
AS regulator can also depend
on local sequence context (Xiao et al. 2009; Motta-Mena et al.
2010) or on its post-translational
modification state (Feng et al. 2008). Many regulated
alternative exons and their flanking introns
also have binding sites for multiple factors, suggesting they
are controlled in a combinatorial
manner. Nevertheless, significant advances have been made
recently in identifying sequence
features that predict tissue-regulated AS as well as regulation
by specific trans-acting factors
(Barash et al. 2010; Zhang et al. 2010). This progress has been
accelerated by integrating
information from multiple sources, especially sequence
conservation across species, splicing
regulatory motifs identified through bioinformatic and
experimental screening approaches, RNA
target binding data for AS regulators, RNA structural features,
and splice variant profiling data
from microarrays or high-throughput RNA sequencing
(RNA-Seq).
1.3.8 Large-scale analysis of alternative splicing
regulation
Many insights into AS and its regulation have been made possible
using high-throughput
methods to study the transcriptome. Technologies used to detect
and quantify the levels of splice
variants in an mRNA sample include microarrays (tiling, exon,
exon-junction and exon/exon-
junction combinations) (Shoemaker et al. 2001; Johnson et al.
2003; Pan et al. 2004) (reviewed
in Calarco et al. 2007; Hallegger et al. 2010), fibre-optic bead
arrays (Yeakley et al. 2002), high-
throughput RT-PCR (Klinck et al. 2008), and RNA-Seq (Cloonan et
al. 2008; Mortazavi et al.
2008; Pan et al. 2008; Sultan et al. 2008; Wang et al. 2008)
(reviewed in Blencowe et al. 2009).
These methods have been used to profile differences in the
mammalian splice variant repertoire
among tissues, individuals, developmental stages and cell
culture models of developmental
transitions, as well as in cancer versus normal tissues
(reviewed in Calarco et al. 2007; Hartmann
and Valcarcel 2009; Hallegger et al. 2010). High-throughput
methods have also been used to
identify functional targets of specific AS regulators by
profiling AS following knockdown or
loss of a particular protein (Blanchette et al. 2005; Ule et al.
2005) (reviewed in Calarco et al.
2007; Hallegger et al. 2010). Combining this profiling data with
factor binding site preferences
determined by methods such as SELEX (Tuerk and Gold 1990) or
RNAcompete (Ray et al.
2009) can then provide insights into the biological function of
an AS regulator. Furthermore, to
distinguish direct from indirect targets, methods such as UV
Cross-linking and
Immunoprecipitation coupled with high-throughput sequencing
(CLIP-Seq; also known as high-throughput
sequencing of RNA isolated by CLIP, HITS-CLIP) allow
the isolation of RNA
targets directly bound by a protein of interest on a genome-wide
scale (Ule et al. 2003) (reviewed
in Witten and Ule 2011).
In addition to cataloguing transcriptome complexity, the
approaches mentioned above have
revealed sequence features associated with AS regulation and
allowed construction of ‘RNA
splicing maps’ of the position-dependent effects of AS
regulators (reviewed in Witten and Ule
2011). More generally, while mRNA expression profiling
microarrays showed that functionally
related genes are often co-expressed in mammalian cells and
tissues (Eisen et al. 1998; Su et al.
2004; Zhang et al. 2004), AS microarray profiling studies
revealed that functionally related
genes are also coordinately regulated by AS. These ‘AS networks’
or ‘exon networks’ have
functional properties reflecting tissue identity, but the groups
of genes are often distinct from
those co-regulated at the transcriptional level (Le et al. 2004;
Pan et al. 2004; Fagnani et al.
2007; Castle et al. 2008). In addition, functionally related
genes are often co-regulated by tissue-
restricted AS factors such as NOVA, nSR100, ESRP and CELF/MBNL
(reviewed in Licatalosi
and Darnell 2010; Calarco et al. 2011). The coordination of gene
expression through AS
networks extends previous models proposing that mRNPs represent
“post-transcriptional
operons” in eukaryotes (Keene and Tenenbaum 2002).
1.3.9 Overview of large-scale alternative splicing detection
methods used in this thesis
In my thesis work, I used both microarray- and RNA-Seq-based
methods to quantify the relative
abundance of mRNA splice variants. An overview comparing and
contrasting these approaches
is presented in Figure 1-3. In both cases, the experimental
workflow begins with isolation of
polyadenylated (polyA+) RNA from cells or tissues, which is then
reverse-transcribed to cDNA
(Figure 1-3A). Fluor-labeled single-stranded cDNA is generated
for hybridization to AS
microarrays (Hughes et al. 2006), whereas fragmented,
double-stranded cDNA flanked by
adapters is generated for RNA-Seq following the Illumina
mRNA-Seq protocol. In parallel to
these steps, a database of cassette-type AS events is generated
by identifying such
events in cDNA and EST sequences that have been aligned to the
genome (Figure 1-3B)
(performed by Sandy Pan) (Pan et al. 2004; Pan et al. 2005).
This AS database is used to design
oligonucleotide probes for the AS microarray, or as a set of
exon-exon junction sequences onto
which RNA-Seq reads are bioinformatically aligned (Figure 1-3A).
The % alternative exon
inclusion measurements (‘% inclusion’, i.e. the percentage of
transcripts in which the alternative
exon is included) calculated using the AS microarray platform or
the RNA-Seq method are then
quality-filtered using simple criteria. The resulting AS
predictions correlate well with
measurements made by independent methods such as RT-PCR (Chapter
2, Chapter 5).
Figure 1-3. Outline of microarray and RNA-Seq AS profiling
methods used in this work.
(A) Left: For AS microarray profiling, fluor-labeled cDNAs are
hybridized to the AS microarray.
The GenASAP algorithm is then used to estimate the % exon
inclusion levels and confidence
ranks from the signal intensities of the scanned microarray
images.
Right: For RNA-Seq AS profiling, 50-nt high-throughput short
read sequencing is performed on
cDNA libraries using the Illumina Genome Analyzer II. The % exon
inclusion levels are
calculated by counting the number of sequence reads that align
to the included or skipped
junctions in the AS database.
(B) Construction of a database of cassette-type AS events mined
from ESTs/cDNAs. These AS
events are used to design exon and exon-exon junction microarray
probes or to align RNA-Seq
reads to exon-exon junction sequences.
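
The read-counting step described for RNA-Seq AS profiling in panel (A) can be illustrated with a minimal sketch. The exact formula and filtering criteria are not specified in this section, so the following are assumptions made for illustration only: the two inclusion junctions (C1-A and A-C2) are averaged so that the included variant is not counted twice relative to the single skipped junction (C1-C2), and a hypothetical minimum total read count (`min_reads`) stands in for the 'simple criteria' used for quality filtering. The function and parameter names are likewise hypothetical.

```python
from typing import Optional


def percent_inclusion(c1_a: int, a_c2: int, c1_c2: int,
                      min_reads: int = 20) -> Optional[float]:
    """Estimate % inclusion of a cassette exon from junction read counts.

    c1_a, a_c2 : reads aligned to the two inclusion junctions (C1-A, A-C2)
    c1_c2      : reads aligned to the skipped junction (C1-C2)
    min_reads  : hypothetical coverage threshold (an assumption, not a
                 value taken from the text)

    Returns None when the event does not pass the coverage filter.
    """
    if min(c1_a, a_c2, c1_c2) < 0:
        raise ValueError("read counts must be non-negative")

    # Quality filter: require a minimum number of junction reads in total.
    if c1_a + a_c2 + c1_c2 < min_reads:
        return None

    # Assumption: average the two inclusion junctions so that included and
    # skipped variants are each represented by one junction-equivalent.
    included = (c1_a + a_c2) / 2.0
    skipped = float(c1_c2)
    if included + skipped == 0:
        return None

    return 100.0 * included / (included + skipped)


if __name__ == "__main__":
    # 30 reads on each inclusion junction, 10 on the skipped junction.
    print(percent_inclusion(30, 30, 10))
```

Under these assumptions, an event with equal support on both inclusion junctions and one third as many skipped-junction reads yields 75% inclusion, whereas an event with too few junction reads is left without an estimate rather than being assigned an unreliable value.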
1.3.9.1 Alternative splicing microarray profiling
The AS microarray platform developed by the Blencowe and Frey
labs contains sets of six
probes for ~3000 AS events (three exon probes: C1, A and C2;
three junction probes: C1-A, A-C2
and C1-C2) (Figure 1-3A) (Pan et al. 2004). Ideally, both splice
variants should hybridize to the
C1 and C2 exon probes, whereas the included variant should
hybridize specifically to the C1-A,
A, and A-C2 probes, and the skipped variant should hybridize
specifically to the C1-C2 junction
probe. Although the probes are designed for optimal specificity,
in practice the probe signals do
not correspond to this ‘ideal hybridization profile’, especially
as a result of cross-hybridization of
the splice variants to the junction probes. In addition,
accurate prediction of relative splice
variant levels for some AS events is complicated by outlier
probes, whose signals are not
consistent with the other five probes for the AS event, as well
as by other sources of noise.
Therefore, a Bayesian learning algorithm called the Generative
model for the Alternative
Splicing Array Platform (GenASAP) is used to accurately predict
the AS levels (% inclusion)
from the microarray data (Shai et al. 2006) (Figure 1-3A).
GenASAP uses the microarray data to
model the hybridization of the included and skipped splice
variants to the six probes. This
significantly improves the accuracy of the % inclusion
predictions in comparison to using the
‘ideal’ hybridization profile described above. In addition,
Ge