
in the yeast genome reveals an RNA thermosensor that mediates alternative splicing

Markus Meyer1,4, Mireya Plass2,&, Jorge Pérez-Valle4,&, Eduardo Eyras2,3, and Josep

Vilardell1,3,4,*

Running title: An RNA thermosensor controls APE2 3'ss selection.

1- Centre de Regulació Genòmica, Dr. Aiguader 88, 08003 Barcelona, Spain

2- Computational Genomics Group, Universitat Pompeu Fabra, Dr. Aiguader 88,

08003 Barcelona, Spain

3- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain (ICREA).

4- (present address) Molecular Biology Institute of Barcelona (IBMB), Baldiri

Reixach 10-12, 08028 Barcelona, Spain. Tel: +34.93.4020549, fax: +34.93.4034979.

&- Both authors contributed equally to this work

* corresponding author

e-mail: [email protected]


Highlights

Yeast introns have a diverse range of branch-

RNA structure in the BS- The APE2 choice

SUMMARY

Poor understanding of the spliceosomal mechanisms to select intronic 3' ends (3'ss) is

a major obstacle to deciphering eukaryotic genomes. Here we discern the rules for

intronic branch site (BS), and that

mediated by pre-mRNA structures. Our results reveal that one of these RNA folds

acts as an RNA thermosensor, modulating alternative splicing in response to heat

lity. Thus, our data point to a deeper role

for the pre-mRNA in the control of its own fate, and to a simple mechanism for some

alternative splicing.

INTRODUCTION

Precise identification of intron boundaries (splice sites [ss]) in eukaryotic pre-mRNAs

is critical. Errors result in incoherent and possibly deleterious transcripts and are

conducive to disease (Cooper et al., 2009)

interactions with U1 snRNP (Wahl et al., 2009). In contrast, the identification of the


of its multiple occurrence and the lack of a clear selector comparable to U1. Several

intron branch site (BS) or from a downstream pyrimidine-rich track (Patterson and

Guthrie, 1991; Smith et al., 1993) (Goguel and Rosbash,

1993), or with the downstream exon (Crotti and Horowitz, 2009); distance from the

BS (Cellini et al., 1986; Luukkonen and Seraphin, 1997); and interactions with

splicing factors (Smith et al., 2008; Umen and Guthrie, 1995). But despite these

required to fully decode eukaryotic genomes.

The relative simplicity of the Saccharomyces cerevisiae genome, with 5% of genes

containing an intron (Spingola et al., 1999) and readily identifiable BS, provides a

splicing is underrepresented (Yassour et al., 2009), indicating a predominantly

additional 3'ss choices (sequence HAG, H : [UCA]) between the BS and the annotated

Here we undertook bioinformatics and molecular approaches to document a set of

model genome. Remarkably, a

balance between pre-mRNA folding and spliceosome flexibility is required,

suggesting important evolutionary implications and alternative ways of splicing

regulation. We verified this possibility by showing that in one intron the pre-mRNA

structure modulates splicing in response to temperature changes.

RESULTS AND DISCUSSION


Intronic features downstream from the branch site

Yeast introns show a variable distance between the BS adenosine (BS-A) and the

5 nucleotides (Figure S1). This is surprising as previous reports

indicate that such a large distance drastically hinders splicing efficiency (Cellini et al.,

1986), but this must be overcome as most yeast transcripts are efficiently spliced

(Clark et al., 2002). Interestingly, the number of potentially cryptic 3'ss between the

BS and the annotated 3'ss is significantly lower than expected by chance ($\chi^2$ test,

$p = 1.71 \times 10^{-17}$). This suggests a selective pressure against HAGs upstream of the annotated

reduces VMA10 splicing (Figure S1).

have a negative impact. Interestingly, this mutation destabilizes a putative secondary

structure that would occlude this HAG, suggesting the possibility that RNA folding

would have a dual role by preventing spliceosomal targeting of occluded HAGs and

promoting the use of one located downstream. To test the generality of this hypothesis

we used RNAfold (Hofacker, 2009) to predict an optimal secondary structure (i.e. the

most likely one) between the BS and the annotated 3'ss using a dataset of 282 S.

cerevisiae introns (SGD 2009) (see Methods and Supplementary Material). The

results of these predictions, summarized in Figures 1 and S1, are available at

http://regulatorygenomics.upf.edu/Yeast_Introns. Their analysis reveals several

First, 40% (113) of S. cerevisiae introns can potentially form a structure in the BS-

-

nucleotides [nt] (Figure 1A, black line). Second, the effective BS-3'ss distance

distribution of those introns, calculated by subtracting the number of nt contained in


the structure (Figure 1B), is not significantly distinct from that of introns without

structures (Figure 1A, light grey line). Notably, the effective BS-

introns is predicted to be never larger than ~45 nt. This may represent the maximum

distance for efficient splicing in S. cerevisiae, or indeed in yeasts, since a similar

result is also observed for other species (Figure S1).
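The effective distance described above can be illustrated with a minimal sketch. It assumes the predicted fold is available as a dot-bracket string (the notation RNAfold reports) and that "the number of nt contained in the structure" means every nucleotide from the first to the last base of each outermost helix. That reading, and the function names, are assumptions made for illustration, not the authors' implementation.

```python
def structured_span(dot_bracket: str) -> int:
    """Count nucleotides enclosed by outermost helices, closing pairs included.

    Assumption: a 'structure' is everything between an opening bracket at
    nesting depth 0 and its matching closing bracket.
    """
    depth = 0
    start = 0
    span = 0
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            if depth == 0:
                start = i  # a new outermost helix begins here
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced dot-bracket string")
            if depth == 0:
                span += i - start + 1  # the outermost helix closes here
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if depth != 0:
        raise ValueError("unbalanced dot-bracket string")
    return span


def effective_distance(dot_bracket: str) -> int:
    """Linear length of the region minus the nucleotides held in structures."""
    return len(dot_bracket) - structured_span(dot_bracket)


# A 12-nt region with one 7-nt hairpin leaves 5 unstructured nucleotides.
print(effective_distance("..((...))..."))
```

Under this reading, a long BS-3'ss region that folds into a large stem loop can have an effective distance comparable to that of a short, unstructured region.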

It has been shown that suboptimal RNA structure predictions can be relevant as well

to determine the biological effects of RNA folds (Rogic et al., 2008). Thus, to assess

the consistency of our results, we evaluated 1000 suboptimal structure predictions for

each BS-

they introduce. The results by this approach (Figure S1) are consistent with our

previous analysis, supporting our conclusions.
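One way to summarize the effective distances obtained from many suboptimal structures of a single intron is sketched below. The legend of Supplemental Figure S1 (panels i, j) reports the mode and the range covering 95% of cases after eliminating the top and bottom 2.5% of values. The function name, the tie-breaking rule for the mode, and the integer trimming are assumptions made for illustration.

```python
from statistics import multimode
from typing import Sequence, Tuple


def summarize_effective_distances(
    distances: Sequence[int], trim_fraction: float = 0.025
) -> Tuple[int, int, int]:
    """Return (mode, lower, upper) for a set of effective distances.

    lower and upper bound the values that remain after dropping the
    trim_fraction most extreme values at each end.
    """
    if not distances:
        raise ValueError("no distances supplied")
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(distances)
    # Number of values to drop at each end; the small epsilon guards
    # against floating-point products such as 0.999999...
    k = int(len(ordered) * trim_fraction + 1e-9)
    trimmed = ordered[k:len(ordered) - k]
    # Assumption: when several values tie for the mode, report the smallest.
    mode = min(multimode(ordered))
    return mode, trimmed[0], trimmed[-1]


# Hypothetical effective distances for 40 suboptimal structures.
example = [30] * 10 + list(range(1, 31))
print(summarize_effective_distances(example))
```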

In addition, to test whether the properties of the effective distance are specific to real

introns, we examined sets of randomized intron sequences (Supplemental Material)

with the same BS-

observed that in randomized sequences RNA structures also shorten the BS-

effective distance to values similar to those found in sequences without a predicted

secondary structure. However, the maximum effective distance observed is

significantly longer (~60 nt) than in real introns (Figure S1).
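The randomization test can be sketched as follows, assuming that each randomized sequence is a shuffle preserving the nucleotide content of the original and that significance is the fraction of randomized sets whose maximum effective distance is at most the observed one. Folding of the shuffled sequences is left out; the function names and the p-value definition are assumptions made for illustration.

```python
import random
from collections import Counter
from typing import Sequence


def shuffle_sequence(sequence: str, rng: random.Random) -> str:
    """Return a random permutation of the sequence (same nucleotide content)."""
    nucleotides = list(sequence)
    rng.shuffle(nucleotides)
    return "".join(nucleotides)


def empirical_p_value(observed_max: float, randomized_maxima: Sequence[float]) -> float:
    """Fraction of randomized sets with a maximum no larger than the observed one."""
    if not randomized_maxima:
        raise ValueError("no randomized values supplied")
    hits = sum(1 for value in randomized_maxima if value <= observed_max)
    return hits / len(randomized_maxima)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
original = "GUAUGUUACUAACAAUUUAG"  # hypothetical sequence, for illustration only
shuffled = shuffle_sequence(original, rng)
print(Counter(shuffled) == Counter(original))

# Hypothetical maxima from four randomized sets, compared with ~45 nt.
print(empirical_p_value(45, [60, 58, 44, 62]))
```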

(accessibility) in the BS- the downstream exon

(exonic HAG). We found that accessibility values are significantly higher for real

accessible than intronic HAGs, which is not expected in randomized sequences as

longer segments are more likely to fold, increasing the probability of occluding


HAGs. These results, replicated in other yeast species analyzed (Figure S1), are

ted by RNA folding

mutations expected to disrupt the BS-

restoring the original structure by additional mutations would re-establ

selection. This strategy was tested in the RPS23B

an RNA structure, and that disruption of this structure should activate it. Thus, the

RPS23B intron was inserted into a CUP1 reporter (Lesser and Guthrie, 1993) and

splicing was monitored by primer extension analyses (Figure 2). The results show that

in the wt RPS23B only the annotated splice site is used (RPS23B-1; Figure 2B, lane

1). This pattern switches when the structure is disrupted and the alternative AG

becomes the sole target of the spliceosome (RPS23B-2; Figure 2B, lane 2).

Importantly, the wt pattern is restored when the structure is reinstated by

complementary mutations (RPS23B-3; Figure 2B, lane 3). In this case the splicing

pattern is indistinguishable from that of the wt, in agreement with our predictions.

The loss of splicing to the annotated 3'ss when the structure is disrupted could be

attributed to a preference by the spliceosome for the first AG from the BS (Smith et

al., 1993), or to an excessive (>45 nt) distance of the second AG from the BS, as

suggested by our bioinformatics analyses. To distinguish between these two

RPS23B-2. This mutation

blocks the second step of splicing as shown by the lack of accumulation of pre-

mRNA and of mRNA, and by the build-up of first step lariat intermediate in a dbr1


strain (RPS23B-4; Figure 2B, lane 6) (dbr1 cells lack debranching enzyme, required

for the proper turn-over of first step lariat intermediates). These observations are

consistent with the possibility that the annota

spliceosome without a structure that brings it closer to the BS. Thus, we conclude that

the predicted RPS23B structure forms in vivo and allows for the proper selection of

VMA10 intron (Figure S2). Hence we used

closer to the BS. We found that distances greater than ~45 nt diminish splicing

efficiency (Figure 2C-D), in agreement with our in silico findings.

While the existence of an optimal range for 3'ss selection has been previously shown

(Chua and Reed, 2001, and references therein), the suggested scope (20-30 nt from

the BS) does not explain many naturally occurring introns. Our conclusions are

consistent with an alternate spliceosomal range for the second step of catalysis and

help to explain S. cerevisiae global 3'ss usage. We envision two compatible

explanations for the spliceosomal range: it could reflect a balance between a search

for a splice site and the activation of a proofreading mechanism, or it could represent

the physical distance that can be reached by the spliceosome active center. While

further research is necessary to evaluate these two models, we speculate that a purely

kinetic model could allow for changes in scope, for example by stabilizing the

conformation of the spliceosome required for the second step (Konarska et al., 2006).

According to our predictions, 81 out of the 134 HAGs located between the BS and the

(http://regulatorygenomics.upf.edu/Yeast_Introns). Out of the remaining 53 HAGs, 32


are located at a distance of 9 nt or less from the BS, whereas the shortest naturally

occurring distance is 10 nt. As for the rest (21), they are all AAGs. Therefore, we made

two additional predictions. First, that AGs located closer than 10 nt to the BS cannot

be used by the spliceosome, as was indicated for the ACT1 intron (Patterson and

Guthrie, 1991); second, that the yeast spliceosome selects C/UAG over AAG, as

reported in other systems (Smith et al., 1993). These predictions were tested in vivo

with constructs based on the DMC1 intron with an AAG and a UAG at 16 and 28 nt

from the BS, respectively. In the wt construct the spliceosome mostly selects the

downstream 3'ss (DMC1-1; Figure 3B, lane 1). This preference is not due to the

are equally used when they are both UAG (Figure 3B, lanes 2-3). However, to be

selected an AG must be more than 9 nt from the BS (Figure 3B, lane 4). To

determine whether the presence of an RNA fold between two available HAGs

interferes with their selection we introduced the RPS23B stem into DMC1-2, with two

active UAGs. Remarkably, there is no change in the ratio of usage of either 3'ss

(DMC1-5; Figure 3B, lane 5; and Figure S3), indicating that the spliceosome is able

to probe for potential 3'ss upstream as well as downstream from the stem (but not

inside). Consistent with our model, disrupting the stem (construct DMC1-6) renders

the downstream AG inaccessible, now too far from the BS (Figure 3B, lane 6). Taken

together these results enable us to describe how the spliceosome selects a 3'ss. To be

linearly or with the help of an RNA fold. Moreover, if there are two splice sites within

this window, but outside a structure, both will be equally used, unless one is stronger

than the other (PyAG > AAG).
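The selection rules summarized above can be expressed as a small classifier. This is a sketch under stated assumptions: the input is the sequence downstream of the BS-A with its dot-bracket fold, distance is counted from the first nucleotide after the BS-A to the G of the HAG, a HAG is occluded when it overlaps an outermost helix, and the thresholds (10 nt minimum, ~45 nt maximum effective distance) are taken from the text. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_DISTANCE = 10   # AGs closer than 10 nt to the BS are not used
MAX_EFFECTIVE = 45  # effective distances above ~45 nt reduce splicing


@dataclass
class Candidate:
    triplet: str
    distance: int            # linear distance from the BS-A (assumed convention)
    effective_distance: int  # distance minus nt folded upstream of the HAG
    occluded: bool           # HAG overlaps an outermost helix
    strong: bool             # PyAG (CAG/UAG) rather than AAG
    usable: bool


def top_level_domains(dot_bracket: str) -> List[Tuple[int, int]]:
    """Return (start, end) indices of each outermost helix."""
    domains, depth, start = [], 0, 0
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            if depth == 0:
                start = i
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth == 0:
                domains.append((start, i))
    return domains


def candidate_sites(downstream: str, dot_bracket: str) -> List[Candidate]:
    """List every HAG (H = U, C or A) with the rule-based verdict."""
    seq = downstream.upper().replace("T", "U")
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    domains = top_level_domains(dot_bracket)
    sites = []
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if triplet[1:] != "AG" or triplet[0] not in "ACU":
            continue
        distance = i + 3  # 1-based position of the G
        occluded = any(a <= i + 2 and i <= b for a, b in domains)
        folded_upstream = sum(b - a + 1 for a, b in domains if b < i)
        effective = distance - folded_upstream
        usable = (not occluded and distance >= MIN_DISTANCE
                  and effective <= MAX_EFFECTIVE)
        sites.append(Candidate(triplet, distance, effective, occluded,
                               triplet[0] in "CU", usable))
    return sites


# Hypothetical unstructured region: an AAG too close to the BS, then a CAG.
region = "UUUUUAAGUUUUUUUUUCAGUU"
for site in candidate_sites(region, "." * len(region)):
    print(site)
```

When two candidates are both usable, the text indicates they are used equally unless one is stronger (PyAG > AAG); the `strong` flag records that distinction without attempting to predict usage ratios.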


The spliceosome will open a stem located too close to the branch site

We next asked whether a structure can occur anywhere between the BS and the

decreased the distance between the BS and the stem loop of

the VMA10 intron, from 20 to 5 nt. Our data show that a stem is not fully formed

when predicted to be close to the BS. In fact, positions closer than 10 nt from the BS

will remain unpaired. Thus, an HAG that would otherwise be occluded becomes

available if a stem is moved closer to the BS (Figure 3D, lane 3). Moreover, moving

the stem even closer can in fact put out of spliceosomal reach a downstream HAG

previously available, since the lack of folding leads to an increased effective distance

to the BS (Figure 3D, lane 4). This, taken together with our in silico predictions,

indicates that while the spliceosome shows a remarkable tolerance to a diversity of

RNA folds, a region of ~9 nt downstream of the BS will not harbor any structure, and

likewise AGs located here will be ignored. Thus it appears that the spliceosome needs

to have access to this region before catalyzing exon ligation. This is consistent with

both substrate requirements for in vitro spliceosome assembly (Fabrizio et al., 2009),

and second-step blocking by a strong hairpin downstream of the BS (Liu et al., 1997).

and we show that the spliceosome use

Consistent with this, scanning does not explain splicing patterns of some human

introns (Chen et al., 2000). Interestingly, the minor U12-dependent spliceosome

removes introns with a small BS-3'ss region, similar to that of yeast (Will and

Luhrmann, 2005).

Our data indicate that RNA folds are more than a particularity of some transcripts.

They are critical for splicing and widespread in several fungal species (this work and


(Deshler and Rossi, 1991)

conserved in homologous genes (http://regulatorygenomics.upf.edu/Yeast_Introns),

arguing for a selective pressure to keep them. Notably, our defined set of rules for

re S3) explains the splice site choice of all but one

(REC102, Figure S3) yeast introns.

Comparison of sequence elements inside and outside secondary structures shows an

enrichment of U-rich motifs outside structures, which reflects the higher propensity of

sequences containing purines to be in a fold (Tables S2 and S3). Thus, if a factor were

to be required for the function of these diverse structures, it would be unlikely that

this is mediated by a particular sequence motif. Likewise, there is no apparent

correlation between the presence of an RNA structure and the function of factors

prp18 (Umen and Guthrie, 1995) and slu7 (Frank

et al., 1992) (Figure S3). Taken together these results suggest that the RNA structures

in the BS-

factors that could modulate this role

The APE2 BS-

Given that 40% of S. cerevisiae introns potentially harbor a structure between the BS

selection. This could be mediated by binding regulatory factors, by working as a

riboswitch (Wachter, 2010), or by acting directly as a thermosensor (Kaempfer, 2003;

Neupert and Bock, 2009). In prokaryotes it has been shown that RNA structures can

block ribosome access in response to temperature changes (Neupert and Bock, 2009).


Thus, in a search for yeast

by temperature change (heat-shock), we identified APE2, encoding an

aminopeptidase. Splicing of APE2 pre- k (Yassour et

al., 2009), and our predictions show that their the BS-

stem loop occluding two AAGs and bringing the annotated CAG into spliceosomal

range (Figure 4A and S4). Thus, according to our model, formation of this stem will

trigger CAG selection. Our data are consistent with this and the previous report

(Yassour et al., 2009), although the dual AAG/CAG usage suggests that the stem only

forms completely in a fraction of transcripts. Upon heat shock the stem becomes less

stable and selection of the distant CAG is diminished (Figure 4B, lane 2).

Accordingly, splicing to the upstream AAG is promoted, leading to the addition of 6

residues to the Ape2 N-terminal region. We predicted that mutations weakening the

APE2 stem will mimic the heat-shock response at low temperatures, possibly up to

We tested this by

sequentially opening the stem. Notably, a single-nucleotide mutation at the bottom of

leaving a residual heat sensitivity (Figure 4B, lanes 3-4). Further disruption of the

stem leads to an APE2 splicing pattern that favours mostly AAG selection, and is

insensitive to heat (Figure 4C, lanes 5-6). Importantly, reinstating the stem by

compensatory mutations restores the splicing switch induced by heat (Figure 4C,

lanes 7-8). We argue that the increased AAG/CAG ratio in this construct is caused by

the weaker reconstituted stem (AU vs GC base pair). Altogether, our data are

consistent with the APE2 intron folding into a small stem loop that acts as an RNA

thermosensor


accordingly. While in prokaryotes thermosensors and riboswitches commonly

control transcription and translation, in eukaryotes only one class of

riboswitches has been described. They modulate pre-mRNA splicing by sophisticated

conformational changes induced upon thiamine pyrophosphate (TPP) binding

(Neupert and Bock, 2009; Wachter, 2010). Our findings for the APE2 thermosensor

indicate that simpler switching strategies, analogous to the prokaryotic thermosensors

that control ribosome binding, are also present in eukaryotes to control spliceosome

access. This presents additional possibilities of splicing control, and it will need to be

taken into account when deciphering eukaryotic genomes.

EXPERIMENTAL PROCEDURES

Bioinformatics analyses

A total of 282 introns, defined as unambiguous sequences with canonical splice sites

in chromosomal genes, were extracted from the S. cerevisiae genome (SGD July

2009). To predict branch sites (BS), introns were scanned for NNNTRACNN motifs

TACTRACNN motif were predicted as BS. When several motifs with identical

Hamming distances were found, an additional selection based on the potential base

pairing to U2 snRNA was applied using RNAcofold (Hofacker, 2009), forcing the

BS-A to be unpaired. When several motifs had the same potential, the closer to the

s selected.
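The first step of the branch-site prediction described above can be sketched as follows. The sketch assumes that candidates are all 9-mers matching NNNTRACNN (R = A or G), that they are ranked by Hamming distance to TACTRACNN (so only the first three positions can differ), and that the BS-A is the A at the sixth position of the motif. Tie-breaking by predicted base pairing to U2 snRNA is not implemented; tied candidates are simply all returned. The names are hypothetical.

```python
from typing import List, NamedTuple

CONSENSUS_PREFIX = "TAC"  # first three positions of TACTRACNN


class BranchSite(NamedTuple):
    start: int        # 0-based index of the first motif nucleotide
    motif: str        # the 9-nt match
    distance: int     # Hamming distance to TACTRACNN
    bs_a_index: int   # 0-based index of the branch-site adenosine


def find_branch_sites(intron: str) -> List[BranchSite]:
    """Return the NNNTRACNN matches closest to the TACTRACNN consensus."""
    seq = intron.upper().replace("U", "T")
    hits = []
    for i in range(len(seq) - 8):
        motif = seq[i:i + 9]
        # Positions 4-7 of the motif must read T, R (A or G), A, C.
        if motif[3] == "T" and motif[4] in "AG" and motif[5:7] == "AC":
            distance = sum(a != b for a, b in zip(motif[:3], CONSENSUS_PREFIX))
            hits.append(BranchSite(i, motif, distance, i + 5))
    if not hits:
        return []
    best = min(hit.distance for hit in hits)
    # Several hits may share the best distance; the text resolves such ties
    # with an additional base-pairing criterion that is not modelled here.
    return [hit for hit in hits if hit.distance == best]


# Hypothetical toy intron containing a single TACTAAC box.
for site in find_branch_sites("GTATGTAAATACTAACAATTTTAG"):
    print(site)
```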

For secondary structure predictions we used RNAfold (Hofacker, 2009) with default

parameters, using the sequence region between 8 nt downstream from the BS-A and

in introns we


compared the observed and expected frequencies, calculated assuming that nucleotide

positions are independent. Individual nucleotide frequencies were calculated in the

-A. The

expected probability of each triplet was then calculated as the product of the

individual observed frequencies. Accessibility of a nucleotide was defined as one

minus the probability of it being paired. Pairing probabilities were calculated using

RNAfold (Hofacker, 2009). Given the same sequence and absence of bias, a larger

window is expected to give lower accessibility values, as the folding probability

increases.
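The accessibility definition above (one minus the probability of being paired) can be written out as a minimal sketch. It assumes the base-pair probabilities have already been computed (for instance from the partition function calculated by RNAfold) and are supplied as a dictionary keyed by 0-based position pairs. The input format, the function names, and the use of the mean over the three nucleotides of a HAG are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def accessibility(length: int,
                  pair_probabilities: Dict[Tuple[int, int], float]) -> List[float]:
    """Per-nucleotide accessibility: 1 - P(nucleotide is paired).

    The probability that a nucleotide is paired is the sum of the
    probabilities of all base pairs in which it takes part.
    """
    paired = [0.0] * length
    for (i, j), probability in pair_probabilities.items():
        if not 0 <= i < j < length:
            raise ValueError(f"invalid base pair ({i}, {j})")
        paired[i] += probability
        paired[j] += probability
    # Clamp to guard against rounding slightly above 1.
    return [1.0 - min(p, 1.0) for p in paired]


def triplet_accessibility(values: List[float], start: int) -> float:
    """Mean accessibility of a triplet (assumption: a simple average)."""
    triplet = values[start:start + 3]
    if len(triplet) != 3:
        raise ValueError("triplet runs past the end of the sequence")
    return sum(triplet) / 3


# Hypothetical pairing probabilities for a 5-nt region.
values = accessibility(5, {(0, 4): 0.9, (1, 3): 0.5})
print(values)
print(triplet_accessibility(values, 1))
```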

The predictions for all yeast species analyzed (Table S1), pair probabilities for the

structures and accessibility values can be found at

http://regulatorygenomics.upf.edu/Yeast_Introns/. Further details on the

bioinformatics analyses can be found in the Supplemental Material.
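The comparison of observed and expected triplet frequencies described in this section can be sketched as below, assuming independent nucleotide positions so that the expected probability of a triplet is the product of the individual observed frequencies. The function names and the toy input are hypothetical, and the statistical test applied to the resulting counts is not reproduced.

```python
from collections import Counter
from typing import Dict, Iterable, List


def nucleotide_frequencies(sequences: Iterable[str]) -> Dict[str, float]:
    """Observed frequency of each nucleotide across all regions."""
    counts = Counter()
    for sequence in sequences:
        counts.update(sequence.upper().replace("T", "U"))
    total = sum(counts.values())
    return {nt: counts[nt] / total for nt in "ACGU"}


def expected_hag_probability(freqs: Dict[str, float]) -> float:
    """P(HAG) under independence, with H = U, C or A."""
    return (freqs["A"] + freqs["C"] + freqs["U"]) * freqs["A"] * freqs["G"]


def observed_hag_count(sequences: Iterable[str]) -> int:
    """Number of HAG triplets, counting overlapping positions."""
    total = 0
    for sequence in sequences:
        seq = sequence.upper().replace("T", "U")
        total += sum(1 for i in range(len(seq) - 2)
                     if seq[i] in "ACU" and seq[i + 1:i + 3] == "AG")
    return total


def expected_hag_count(sequences: List[str]) -> float:
    """Expected number of HAGs given the observed nucleotide frequencies."""
    probability = expected_hag_probability(nucleotide_frequencies(sequences))
    positions = sum(max(len(s) - 2, 0) for s in sequences)
    return positions * probability


regions = ["ACGU" * 5]  # hypothetical region with uniform composition
print(observed_hag_count(regions), expected_hag_count(regions))
```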

Strains and reporter plasmids

S. cerevisiae strains are derived from BY4741. The reporter plasmids are based on

pCC71 (Collins and Guthrie, 1999) where the ACT1 intron has been replaced by the

various versions of the different introns studied. As a result, the introns are flanked by

50 nt of ACT1 leader, and the CUP1 gene. Cloning strategies and oligonucleotides are

available in Supplemental Material.

Primer extension

Primer extension was performed as described (Siatecka et al., 1999), on RNA from strain BY4741

carrying the reporter plasmid unless stated otherwise. Primers used are


complementary to CUP1 and to U6, used as loading control. Quantifications are of

three independent experiments +/- SEM.

ACKNOWLEDGMENTS

We thank C. Query, J. Valcárcel, and J. Warner for helpful discussions and comments

on the manuscript. This research has been supported by the Spanish Ministry of

Science (BIO2008-01091 and 363), MC #510183, EURASNET (LSHG-CT-2005-

518238), CSIC (200920I195), and Agaur. M. M. is supported in part by a Training of

Researchers (FI) fellowship (Agaur) and M. P. by a Carlos III PhD fellowship.

REFERENCES

Cellini, A., Felder, E., and Rossi, J.J. (1986). Yeast pre-messenger RNA splicing efficiency depends on critical spacing requirements between the branch point and 3' splice site. EMBO J 5, 1023-1030.

Chen, S., Anderson, K., and Moore, M.J. (2000). Evidence for a linear search in bimolecular 3' splice site AG selection. Proc Natl Acad Sci U S A 97, 593-598.

Chua, K., and Reed, R. (2001). An upstream AG determines whether a downstream AG is selected during catalytic step II of splicing. Mol Cell Biol 21, 1509-1514.

Clark, T.A., Sugnet, C.W., and Ares, M., Jr. (2002). Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science 296, 907-910.

Collins, C.A., and Guthrie, C. (1999). Allele-specific genetic interactions between Prp8 and RNA active site residues suggest a function for Prp8 at the catalytic core of the spliceosome. Genes Dev 13, 1970-1982.

Cooper, T.A., Wan, L., and Dreyfuss, G. (2009). RNA and disease. Cell 136, 777-793.

Crotti, L.B., and Horowitz, D.S. (2009). Exon sequences at the splice junctions affect splicing fidelity and alternative splicing. Proc Natl Acad Sci U S A 106, 18954-18959.

Deshler, J.O., and Rossi, J.J. (1991). Unexpected point mutations activate cryptic 3' splice sites by perturbing a natural secondary structure within a yeast intron. Genes Dev 5, 1252-1263.

Fabrizio, P., Dannenberg, J., Dube, P., Kastner, B., Stark, H., Urlaub, H., and Luhrmann, R. (2009). The evolutionarily conserved core design of the catalytic activation step of the yeast spliceosome. Mol Cell 36, 593-608.

Frank, D., Patterson, B., and Guthrie, C. (1992). Synthetic lethal mutations suggest interactions between U5 small nuclear RNA and four proteins required for the second step of splicing. Mol Cell Biol 12, 5197-5205.

Goguel, V., and Rosbash, M. (1993). Splice site choice and splicing efficiency are positively influenced by pre-mRNA intramolecular base pairing in yeast. Cell 72, 893-901.

Hofacker, I.L. (2009). RNA secondary structure analysis using the Vienna RNA package. Curr Protoc Bioinformatics Chapter 12, Unit12 12.

Kaempfer, R. (2003). RNA sensors: novel regulators of gene expression. EMBO Rep 4, 1043-1047.

Konarska, M.M., Vilardell, J., and Query, C.C. (2006). Repositioning of the reaction intermediate within the catalytic center of the spliceosome. Mol Cell 21, 543-553.

Lesser, C.F., and Guthrie, C. (1993). Mutational analysis of pre-mRNA splicing in Saccharomyces cerevisiae using a sensitive new reporter gene, CUP1. Genetics 133, 851-863.

Liu, Z.R., Laggerbauer, B., Luhrmann, R., and Smith, C.W. (1997). Crosslinking of the U5 snRNP-specific 116-kDa protein to RNA hairpins that block step 2 of splicing. RNA 3, 1207-1219.

Luukkonen, B.G., and Seraphin, B. (1997). The role of branchpoint-3' splice site spacing and interaction between intron terminal nucleotides in 3' splice site selection in Saccharomyces cerevisiae. EMBO J 16, 779-792.

Neupert, J., and Bock, R. (2009). Designing and using synthetic RNA thermometers for temperature-controlled gene expression in bacteria. Nat Protoc 4, 1262-1273.

Patterson, B., and Guthrie, C. (1991). A U-rich tract enhances usage of an alternative 3' splice site in yeast. Cell 64, 181-187.

Rogic, S., Montpetit, B., Hoos, H.H., Mackworth, A.K., Ouellette, B.F., and Hieter, P. (2008). Correlation between the secondary structure of pre-mRNA introns and the efficiency of splicing in Saccharomyces cerevisiae. BMC Genomics 9, 355-373.

Siatecka, M., Reyes, J.L., and Konarska, M.M. (1999). Functional interactions of Prp8 with both splice sites at the spliceosomal catalytic center. Genes Dev 13, 1983-1993.

Smith, C.W., Chu, T.T., and Nadal-Ginard, B. (1993). Scanning and competition between AGs are involved in 3' splice site selection in mammalian introns. Mol Cell Biol 13, 4939-4952.

Smith, D.J., Query, C.C., and Konarska, M.M. (2008). "Nought may endure but mutability": spliceosome dynamics and the regulation of splicing. Mol Cell 30, 657-666.

Spingola, M., Grate, L., Haussler, D., and Ares, M., Jr. (1999). Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae. RNA 5, 221-234.

Umen, J.G., and Guthrie, C. (1995). The second catalytic step of pre-mRNA splicing. RNA 1, 869-885.

Wachter, A. (2010). Riboswitch-mediated control of gene expression in eukaryotes. RNA Biol 7, 67-76.

Wahl, M.C., Will, C.L., and Luhrmann, R. (2009). The spliceosome: design principles of a dynamic RNP machine. Cell 136, 701-718.

Will, C.L., and Luhrmann, R. (2005). Splicing of a rare class of introns by the U12-dependent spliceosome. Biol Chem 386, 713-724.

Yassour, M., Kaplan, T., Fraser, H.B., Levin, J.Z., Pfiffner, J., Adiconis, X., Schroth, G., Luo, S., Khrebtukova, I., Gnirke, A., et al. (2009). Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing. Proc Natl Acad Sci U S A 106, 3264-3269.

FIGURE LEGENDS

Figure 1. BS- trons. (A) Introns are separated in two

categories, those that have potential for secondary structure in this region (black); and

those that do not (light gray). When the number of nucleotides in the secondary

structures of the former is removed from the actual distance (schematized in (B)), the

distribution of these effective distances (dark gray) does not exceed 45 nt, as with the

introns without structure (Wilcoxon signed-rank $p = 3.4 \times 10^{-1}$). Density expresses the

proportion of cases at a given distance f

for details on calculations). (C) Boxplot diagram showing accessibility values for

from bottom to top, Q1 (25% of the datapoints), Q2 (the median indicated by a thick

line) and Q3 (75% of the datapoints). The dashed lines, which are limited by the thin

lines, indicate the distribution of values outside the Q1 and Q3 values. Outliers, which

are shown as open circles, correspond to those values that lie outside the interval

[Q1 - 1.5(Q3-Q1), Q3 + 1.5(Q3-Q1)].

HAGs (Wilcoxon signed-rank $p$ [ann vs intronic] $= 5.5 \times 10^{-5}$, $p$ [ann vs exonic] $= 2.2 \times 10^{-16}$).

Figure 2. mediated by RNA structures. (A, B) Validation of the RPS23B

fold (Figure S2) (A) Diagrams of the tested RPS23B


numbered, and the used one is shown in black. Paired and mutated regions are

indicated by hyphens and small marks, respectively. (B) Primer extension analyses of

RNA from cells containing the constructs shown in (A), as indicated. Extension

5-6 indicate samples from a dbr1 strain, lacking debranching enzyme. (C, D) The

VMA10 intron (Figure S2) was used to assess the BS-

efficiency, measured by primer extension. (C) Diagrams of the VMA10 constructs

used, formatted as in (A). The distance (in nt) between the BS-

shown. (D) Primer extension analyses of RNA from cells containing the constructs

shown in (C), as indicated, formatted as in (B). Lane 1, VMA10-1 (wt); lane 2,

VMA10-2 with mutated stem; lanes 3-4, VMA10-5 and 6 (respectively) constructs,

based on VMA10-2 but with AG-1 increasingly closer to the BS, as indicated. The *

indicates an uncharacterized band.

Figure 3. A)(B) Splice site strength and

minimal distance between BS (A) Diagrams of the different DMC1

constructs (see also Figure S3) monitored by primer extension analyses shown in (B).

Constructs DMC1-5 and 6 include the RPS23B stem loop (wt and open, respectively)

between the UAGs in DMC1-2. -A. (C)(D) Requirement for a

minimal distance between the BS and the beginning of a structure. Splicing patterns

of constructs depicted in (C), based on VMA10 but with a decreasing distance

between the BS-A and the base of the stem, are shown in (D), as analyzed by primer

extension. Top row numbers indicate the VMA10 construct (1 is wt), showing below

the distance to the stem from the BS-A. Primer extension products are indicated on

the right of panels (B) and (D), with numbers indicating the selected AG.


Figure 4. The APE2 RNA thermosensor. (A) RNA structure that controls selection of

-A, and the effective distance is shown in

italics. Two possible alternate thermosensor states are depicted in balance, indicating

factors promoting either state. Mutations (in circles) introduced to disrupt (grey) and

C23, as the AAG30 only gets activated by mutating the stem up to A22 or to C23

(Figure S4). (B) Primer extension analyses monitoring splicing of APE2 constructs at

independent experiments is shown +/- SEM. Bars indicate the AAG (white) and CAG

(black) fraction of transcripts.

[Figure 1 (Meyer). Panel A: density of the distance from BS to 3'ss (nt) for introns with structure, their effective distance, and introns with no structure. Panel B: schematic of an intron (5'SS, BS, 3'SS) showing the distance and the effective distance. Panel C: accessibility of annotated (ann.) and cryptic sites, grouped as 3'ss, intron, and exon.]

[Figure 2 (Meyer). Panel A: diagrams of constructs RPS23B-1 (wt), RPS23B-2, RPS23B-3, and RPS23B-4. Panel B: primer extension of the RPS23B constructs, including samples from a dbr1 strain, with U6 as control. Panel C: diagrams of constructs VMA10-1 (wt, d = 57 nt), VMA10-2 (d = 57 nt), VMA10-5 (d = 51 nt), and VMA10-6 (d = 45 nt). Panel D: primer extension of the VMA10 constructs, with d (nt) indicated and U6 as control.]

[Figure 3 (Meyer). Panel A: diagrams of constructs DMC1-1 to DMC1-6. Panel B: primer extension of the DMC1 constructs, with U6 as control. Panel C: diagrams of constructs VMA10-1 and VMA10-7 to 9. Panel D: primer extension of the VMA10 constructs, with d (nt) indicated and U6 as control.]

[Figure 4 (Meyer). Panel A: sequence and fold of the APE2 stem loop (wt) in two alternate states, with the mutants mut A, mut B, and mut C. Panel B: primer extension of wt, mut A, mut B, and mut C at 23 and 37 (T ºC), lanes 1-8, with U6 as control; bar chart of isoform fraction for AAG and CAG.]

SUPPLEMENTAL MATERIAL



Markus Meyer, Mireya Plass, Jorge Pérez-Valle, Eduardo Eyras, and Josep Vilardell*

*To whom correspondence should be addressed: [email protected]

File contents:

Supplemental Figures S1-S4

Supplemental Tables S1-S3

Supplemental Experimental Procedures

Supplemental References

Inventory of Supplemental Information


Supplemental Text and Figures

Supplemental Figure S1 (related to main Figure 1)

Saccharomyces cerevisiae introns. Density expresses the proportion of cases with a given BS-

(b, c) A mutation 48 nt upstream of the VMA10

likely weakens a predicted stem- s (boxes). Numbers refer to the first nucleotide downstream of the branch-site A. The effect

extension analysis, shown in (c). An impairment of the second step of splicing is evidenced by the 1st step lariat accumulation in a dbr1debranching enzyme). Primer extension intermediates are depicted on the right. U6 is the internal control.

Introns are

separated in two categories, those predicted to have a secondary structure between

corresponds to the effective distance distribution of introns with a predicted secondary structure. For each species, the number of introns in each category is


given. The p-value of the comparison of their effective length distributions is given, indicating that they do not statistically differ. Only yeast species with at least 30 sequenced introns in each group were considered. Density expresses the proportion of cases with a given BS-

(e - h) Comparison of BS-

minimum free energy (MFE) structures or by calculating 1000 suboptimal structures for each BS-(e) shows the effective BS-predictions, with a secondary structure (dark grey line), without a secondary structure (light grey line), and the whole set (black line). Each of these MFE-based distributions is compared to the corresponding distribution obtained by calculating 1000 suboptimal structures for the same region. Thus, panel (f) shows the comparison of distributions for the whole set of introns; panel (g) shows those for introns with MFE structure; and panel (h) for introns without MFE structure. There is no statistically significant difference present in any case. Density expresses the proport

(i, j) Summary of the effective distance distributions of S. cerevisiae introns, grouped

by having a predicted MFE secondary structure (i) or without it (j), and plotted according to the number of nucleotideeach intron 1000 suboptimal structures, with their corresponding effective distances, have been calculated (see panels e-h). The mode (most common value) is shown by a grey dot. Vertical bars show the range of effective distance values considering 95% of cases (we eliminate the top and the bottom 2.5% extreme values). For comparison, the effective distance based on MFE predictions (see Figure 1 of the main manuscript) is shown by a red dot. There are no significant differences (Wilcoxon signed-ranked test introns with a predicted optimal secondary structure p-value= 0.036; without a predicted optimal secondary structure p-value = 0.343). For clarity purposes each panel is also shown below with the introns sorted by increasing BS-effective distances calculated according to MFE predictions.

(k) Distribution of maximum effective distances BS-3'ss for randomized datasets with

st intron set. The randomized datasets were constructed using either the same intronic nucleotide content (grey) or that of the whole yeast genome (black). The analysis shown in Figure 1 was repeated on 1000 of these sets. The distribution of the resulting 1000 maximum effective distances is shown. The maximum effective BS-3'ss distance in real introns (~45 nt, indicated by a red dot) is significantly shorter than that prevalent in randomized datasets with the same nucleotide content (p-value = 0.002) or with a nucleotide content based on that of the whole yeast genome (p-value = 0.008). Density expresses the proportion of sets with a given maximum effective distance.

(l) Boxplot diagrams showing accessibility values for annotated (green) and cryptic

(intcerevisiae is shown also in Figure 1c). Dashed lines indicate the value distribution between the maximum and minimum (thin horizontal lines). Boxes include 50% of the values. Thick lines indicate median values and outliers are shown as open


be less accessible even though the analyzed sequence is shorter, and therefore it would be less likely to adopt many structures in absence of bias.

(m, n) Histogram showing the distribution of PhastCons conservation scores of the

nucleotides in the BS- both signals). The BS-was divided into three groups as shown in (m). (n) Nucleotides inside stems have higher PhastCons conservation scores than nucleotides outside the secondary structure or in loops/bulges (Wilcoxon single-rank test p-value STEM vs OUT = 2.823E-07; STEM vs LOOP/BULGE = 0.01329). There is no significant difference between PhastCons scores of nucleotides outside the secondary structure and those in loops or bulges (Wilcoxon single-rank test p-value = 0.3136). Density expresses the proportion of cases with a given score. See also Tables S3 and S4 for a list of conserved motifs.



Supplemental Figure S2 (related to main Figure 2) (a, b) RPS23B BS- -

site a RPS23B, indicating mutations introduced to disrupt (left arrows) and to restore (right arrows) the structure (see Figure 2 of the main

relative to the first position after the BS-A. (b) Multiple alignment of the BS-region of RPS23B from several Saccharomyces species, as indicated. The bracket notation of the secondary structure predicted in S. cerevisiae region is shown. S. kluyveri and castellii have a shorter BS-


(c - g) VMA10 VMA10 is depicte

Nucleotide numbers are relative to the first position after the BS-A. Mutations to disrupt (left arrows) and to restore (right arrows) the stem loop are shown. Nucleotides deleted in VMA10 5-9 constructs (Figures 2 (delta) followed by the construct number. The splicing pattern of constructs (d), as indicated, was analyzed by primer extension (e). Mutations predicted to disrupt the structure of VMA10 (VMA10-2 construct) lead to splicing switching from the annotated splice site (AG-3 in VMA10-1) to AG-1, which is a weak second-step substrate (VMA10-2). The wt splicing pattern is restored when complementary mutations are introduced (VMA10-3). When the 63 nucleotides including the structure are deleted from the construct (VMA10-4), there is no apparent difference between splicing of this construct and the wt. Primer extension products are depicted on the right, including lariat intermediates from the VMA10 constructs 1-3 (marked with *) and 4 (marked with **). (f) Constructs used in Figure 2 are included to help identifying the mutations that they include. (g) Multiple alignment of the VMA10 BS- Saccharomyces species, as indicated. The bracket notation of the predicted secondary structure of the cerevisiae region is shown.

Alignments in panels (b) and (g) were edited with Jalview (Waterhouse et al., 2009).



Supplemental Figure S3 (related to main Figures 2 and 3) (a) Sequence of the BS- DMC1 constructs (in bold) in Figure 3. The

DMC1-4, and the black triangle the insertion point of the RPS23B stem loops (wt and mutant, with the indicated substitutions).

situated inside a window of 10 to 45 nucleotides downstream of the BS. This distance can be the actual number of nucleotides between the BS and the HAG, or the effective distance that results from the formation of a secondary structure. HAGs contained in such a structure are occluded from the spliceosome. If two suitable HAGs are present, both are used equally, unless one is stronger (CAG=UAG>AAG). A indicates the BS adenosine, unsuitable HAGs are shown in grey and those recognized by the spliceosome are depicted in black.

(c - e) The possibility of a genetic interaction between RNA structures and splicing

of RNA extracted from wt, , and slu7 strains. Cells were transformed with the constructs VMA10-1 (wt VMA10 intron), VMA10-2 (VMA10 intron with the stem loop disrupted), VMA10-4 (VMA10 intron with the stem loop removed, see Figure S2), DMC1-2 (DMC1 intron with two UAG), and DMC1-5 (DMC1 intron with the RPS23B stem loop in between two UAG, see Figure 3 of the main manuscript and panel (a) here). Prior to RNA extraction slu7 cells were subjected


to temperature shift as originally described in Frank et al. (1992). (c) shows the efficiencies, relative to wt (lane 1), of the first and second steps of splicing in the VMA10 constructs. Both mutants display a lower efficiency for the second step, as expected. The effect of the stem loop is minor (compare lanes 2-3 and 8-9). Interestingly, slu7 and yield an improved 1st step in the VMA10-2 construct (which has the stem loop disrupted and is a poor second step substrate). For each reaction, the first step efficiency was calculated as (Mature + Lariat) / (Precursor + Mature + Lariat). The second step efficiency was calculated as Mature / (Mature + Lariat). Panels (d) and (e) show the relative selection of two UAGs and how that is affected by the introduction of a stem loop in between. Interestingly, in this intron slu7 and have opposite effects on relative UAG selection. This is not changed by the introduction of the RNA structure. Relative UAG selection was calculated as the ratio of the upstream UAG (AG-1) over the downstream (AG-2), and normalized to the value in wt cells.

(f - i) Th REC102/

YLR329W has a predicted stem that we have verified. However, one of the two accessible HAGs is preferentially used. The secondary structure predicted between

REC102

that is not contained in a structure (AG3) are also indicated (nucleotide numbers are relative to the first position after the BS-A). (g) shows the constructs analyzed by primer extension in (h). The stem-loop region of each construct is shown in (i), indicating the mutations (red).

In the wt construct REC102-1, splicing is predicted to go to AG3 and AG4; but instead AG3 is not used. When the secondary structure is disrupted by different mutations in constructs REC102-2 and 4, the splicing pattern is altered and the strong splice sites, no longer sequestered in a structure, are used; unless they have been mutated to open the secondary structure (REC102-4). When the secondary structure is restored by introduction of complementary mutations, the wt splicing pattern is restored (REC102-3 and 5). The alternative splice site AG3 is never used, revealing an exception to the rules of binding of an additional factor, or an alternative folding, blocks splicing to the proximal AG3.



Supplemental Figure S4 (related to main Figure 4) (a - c) Validation of the APE2 selection. (a)

Predicted secondary structure in the BS - APE2 intron. Only the region with structure is shown. The small stem is included although it is not consistent across predictions. Numbers refer to the first position after BS-A. The two intronic AAGs are indicated (numbers in circles). The effect on splicing of mutations in the two stems was tested by primer extension analysis (mutations in red, including the predicted fold of the mutant), shown in (b). The wt splicing pattern (lane 1), showing splicing to both AAG-1 and CAG, suggests that the large stem fails to form completely in all transcripts, allowing the weak AAG to compete with a CAG that is impaired by a large distance to the BS. This pattern also suggests that the small stem is not formed in most molecules, since it would block recognition of both AAG-1 and CAG. When both stems are disrupted (mut I) splicing goes to both AAG-1 and AAG-2, whereas the annotated CAG is out of spliceosomal range (50 nt from BS; lane 2). This indicates that in the wt AAG-2 is mostly occluded by the stem-loop. In mut II only the large stem is disrupted, and formation of the small stem is expected to block splicing to CAG and AAG-1. However, the result (lane 3) is more consistent with its absence, since AAG-1 is used and CAG is not (indicating that it is out of range). The inability to use CAG leads to lariat accumulation (lanes 2-the second step of splicing. We conclude that the large stem is the main modulator of APE2 splicing, and that the small stem is not formed under the conditions tested. (c) Multiple alignment of the APE2 BS-Saccharomyces species, as indicated. The bracket notation of the predicted secondary structure of the cerevisiae region is shown. In agreement with our data, the long stem sequence is more conserved. Jalview (Waterhouse et al., 2009) was used for editing.

(d, e) Effect of increasingly disrupting the APE2

selection. (d) Depicts the thermosensor in its two states, with the mutants tested by primer extension (see Figure 4 of the main text) shown in (e). The two AAGs included in the stem loop are numbered. Primer extension was performed with samples from cells before (23C) and after the heat shock (37C), as indicated. AAG- -pair is required to keep the loop (opening the bottom four positions in the stem is indistinguishable from mut B [data not shown]). The upper band (*) apparent at

-PCR and sequencing mut B, while

in mut D it is the CAU in the stem.


Table S1. Homologous introns identified in other yeast species

Species name        Number of introns
S. cerevisiae       282
S. paradoxus        259
S. mikatae          262
S. kudriavzevii     255
S. bayanus          258
S. castellii        150


Table S2. Enrichment, in/out of secondary structures, of k-mers in the region between the BS and the 3'ss

1PhastCons score (Figure S1)


Table S3. Top ten k-mers present in the BS - 3' ss region

1Calculated as the average of PhastCons scores (Figure S1)


SUPPLEMENTAL EXPERIMENTAL PROCEDURES

Strains

Deletion strains are from the YKO MATa Strain Collection (Open Biosystems).

Plasmids and primer extension analyses

All constructs used for this study were made by a BamHI/PacI digestion of a

GPD::ACT1-CUP1 2-micron reporter plasmid (Lesser and Guthrie, 1993) and a PCR

product that contains a BamHI site followed by 50 nt of the actin exon 1, ATG, an

intron followed by 14 to 21 nt of exon 2 and a PacI site. The oligonucleotides used for

every construct are detailed below. All PCRs were made with W303a genomic DNA

as template unless stated otherwise, and upon cloning all constructs were sequenced.

Primer extension analyses were performed as described in (Siatecka et al., 1999), on

RNA from strain BY4741 upf1

otherwise. Primers used are complementary to CUP1 and to U6 snRNA, used as

loading control. Heat shocks were performed by resuspending cells in warm media.

Cells were flash-frozen before RNA extraction (Kos and Tollervey, 2010).

Quantifications, as in Fig. 4, are represented as the mean of three independent

experiments ± SEM.
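
As an illustration of this summary statistic, the following minimal sketch computes the mean and SEM of replicate measurements. The function name `mean_sem` and the example values are hypothetical, and the sketch assumes that the SEM is derived from the sample standard deviation (n - 1 denominator), which the text does not specify.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, SEM) for a list of replicate measurements.

    Assumption: SEM = sample standard deviation / sqrt(n).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed to estimate the SEM")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical quantifications from three independent experiments.
mean, sem = mean_sem([1.0, 2.0, 3.0])
print(f"{mean:.2f} +/- {sem:.2f}")
```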

Primers used in this work

CUP1 -GGCACTCATGACCTTC-

U6 snRNA -GAACTGCTGATCATCTCTG-

RPS23B-1 (pMM42): -

ctaggatccccggcgactcttttagatttttcacgcttcactgcttttttcttcccaaatgACAAGTATGTACAATT

ATAGAAGATTG- - -

tacgttaattaaCTCTTCTTATAGTTGTTTTCGGCCCAACGGCTGTTT-

RPS23B-2 (pMM43): JV1358- -

tacgttaattaaCTCTTCTTATAGTTGTTTTCGGCCCAACGGCTGTTTTAAACGAaT

AtCTTTacACAAtACATAGTTTCATCAAAAAGATT-

RPS23B-3 (pMM44): JV1358- -


AtCTTTacACAAtACATAGTaTCATgtAAAAGAaTAtTAAATTTGTTAGTAAAT

G-


RPS23B-4 (pMM52): JV1358- -


AtaTTTacACAAtACATAGTTTCATCAAAAAGATT-

VMA10-1 (pMM18): -

ctaggatccccggcgactcttttagatttttcacgcttcactgcttttttcttcccaaatgATGGTATGTGCCATTA

CATTAC- - -tacgttaattaaTCCGTTTTTTTGGGACTAAAG-

VMA10-2 (pMM29): JV1128- -

tacgttaattaaTCCGTTTTTTTGGGACTAAAGAGAATATTACATAGTTCTGAGCA

ACAATGAAAAAACCAATACCTTTGTCACGAGcaACCAcTCCAAgAAGAGAG

gACAGTTTTAAAA-

VMA10-3 (pMM30): JV1128- -

tacgttaattaaTCCGTTTTTTTGGGACTAAAGAGAATATTACATAGTTCTGAcCA

ACAATcAAAAAAgCAATtgCTTTGTCACGAGcaACCAcTCCAAgAAGAGAGgA

CAGTTTTAAAA-

VMA10-4 (pMM31): JV1128- -

tacgttaattaatccgtttttttgggactaaagagaatattacatagttTTTTAAAAAAGTTTTTCATGTTA

G-

VMA10-5 (pMM47): JV1128- -


ACAATGAAAAAACCAATACCTTTGTCACGAGcaACCAcTCCAAgAAGAGAG

gACAG- - -

caaccactccaagaagagaggacagAAAAGTTTTTCATGTTAGTAAGAAC-

VMA10-6 (pMM48): JV1128-JV1380 using as template PCR JV1128- -

caaccactccaagaagagaggacagTTTTCATGTTAGTAAGAACGC-

VMA10-7 (pMM60): JV1128- -


ACAATGAAAAAACCAATACCTTTGTCACGAGGTACCAGTCCAACAAGAGA

GCACAG- - -

gtaccagtccaacaagagagcacagAAAAGTTTTTCATGTTAGTAAGAAC-


gtaccagtccaacaagagagcacagTTTTCATGTTAGTAAGAACGC-


gtaccagtccaacaagagagcacagtttTGTTAGTAAGAACGCTTGTTAG-


VMA10-10 (pMM21): JV1128- -


ACAATGAAAAAACCAATACaTTTGTCACGAGGTAC-

DMC1-1 (pMM63): JV1406 -

ctaggatccccggcgactcttttagatttttcacgcttcactgcttttttcttcccaaatgGTATGTTATAATAACA

TTTTAAAACC- - -tacgttaattaaTCTTGTTGTTGACAAAACGGT-

DMC1-2 (pMM65): JV1406- -

tacgttaattaaTCTTGTTGTTGACAAAACGGTCTATAAAAGTTCCTaTCCAAATTA

TTAGTTAGTAAAAG- )

DMC1-3 (pMM66): JV1406- -

tacgttaattaaTCTTGTTGTTGACAAAACGGTCTtTAAAAGTTCCTaTCCAAATTA

TTAGTTAGTAAAAG-

DMC1-4 (pMM76): JV1406- -

tacgttaattaatcttgttgttgacaaaacggtctttaaaagttcctaTATTAGTTAGTAAAAGAAAGGGG

-

DMC1-5 (pMM71): JV1406- -

tacgttaattaatcttgttgttgacaaaacggtctataaaaattaacttttgacaaaacatagtttcatcaaaaagatta

atGTTCCTaTCCAAATTATTAGTTAGTAAAAG-

DMC1-6 (pMM72): JV1406- -

tacgttaattaatcttgttgttgacaaaacggtctataaaaaatatctttacacaatacatagtttcatcaaaaagatta

atGTTCCTaTCCAAATTATTAGTTAGTAAAAG-

REC102-1 (pMM102): JV1177 (5'-

ctaggatccccggcgactcttttagatttttcacgcttcactgcttttttcttcccaaaTGAAAGTATGTTTCTCC

TAGCA-3') - JV1178 (5'-tacgttaattaaAAAATTCAGTTTAATCTTCACTTTTCCT)

REC102-2 (pMM103): JV1177 - JV1179 (5'-

tacgttaattaaAAAATTCAGTTTAATCTTCACTTTTCCTGGTACGAGTGACAAAG

CTAggtGTTATTATAGTTAGTA-3')

REC102-3 (pMM104): JV1177 - JV1180 (5'-

tacgttaattaaAAAATTCAGTTTAATCTTCACTTTTCCaccTACGAGTGACAAAGC

TAggtGTTATTATAGTTAGTA-3')

REC102-4 (pMM105): JV1177 - JV1181 (5'-

tacgttaattaaAAAATTCAGTTTAATCTTCACTTTTCCTGGTACGAGTGACAAAG

CatgCAGTTA-3')


REC102-5 (pMM106): JV1177-JV1182 (5'-

tacgttaattaaAAAATTCAGTTTAATCTTCACTTTTCCTGcatCGAGTGACAAAGCa

tgCAGTTA-3')

APE2 WT (pJPV8): -

ctaggatccccggcgactcttttagatttttcacgcttcactgcttttttcttcccaaatgTGCCAAAAAACGTCA

CAATTAC- -J -ggaagttaattaaAGGAACGACATTATCAGGAAG-

APE2 mutA (pJPV12): JV1517- -

ggaagttaattaaAGGAACGACATTATCAGGAAGAATTTCACGATTTGGGGTTTT

ACTGGTCATTTTAGTTGTCCTTCGTACTTTTGGGTACAtTTAATAGAGAATG

TAAG-

APE2 mutB (pJPV14): JV1517- -


ACTGGTCATTTTAGTTGTCCTTCGTACTTTTGGGTAtttTTAATAGAGAATGT

AAG-

APE2 mutC (pJPV18): JV1517- -


ACTGGTCATTTTAGTTGTCCTTaaTACTTTTGGGTAttATTAATAGAGAATGT

AAG- )

APE2 mutD (pJPV16): JV1517-JV1213 ( -


ACTGGTCATTTTAGTTGTCCTTacatCTTTTGGGatgtaTTAATAGAGAATGTAA

G-

APE2 mut I (pJPV9): JV1517- -

GAAGTTAattaaAGGAACGACATTATCAGGAAGAATTTCACGATTTGGGGTT

TTACTGATTTTAGTTGTCCTTCGTACTTTTGGtTtgAtTTAATAGAGAATGTA

AG-

APE2 mut II (pJPV10): JV1517- -

GAAGTTAattaaAGGAACGACATTATCAGGAAGAATTTCACGATTTGGGGTT

TTACTGGTCATTTTAGTTGTCCTTCGTACTTTTGGtTtgAATTAATAGAGAAT

GTAAG-


S. cerevisiae intron dataset

We downloaded the annotation and genomic sequence of S. cerevisiae from the

Saccharomyces Genome Database (SGD July 2009) (Hong et al., 2008). We then

extracted all introns from chromosomal genes (327) and kept only those that had

length > 0 nt, canonical splice sites (

not have any ambiguous nucleotide (N) in the sequence, obtaining a final set of 282

introns.

Branch Site Prediction

To predict branch sites introns were scanned for NNNTRACNN motifs up to 200 nt

upstream f

TACTRACNN sequence were predicted as BS. When several motifs with identical

Hamming distances were found, an additional selection based on potential base

pairing to U2 snRNA was applied (RNAcofold (Hofacker, 2009)), forcing the

unpairing of the branching A. If several motifs had the same potential, the closer to
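
The motif scan and Hamming-distance ranking can be sketched as below. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the scanned window is taken as the last 200 nt of the intron, and the tie-breaking step based on U2 snRNA pairing (RNAcofold) is not implemented, so all equally ranked candidates are returned.

```python
# IUPAC codes needed for the two patterns quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "N": "ACGT"}


def matches(pattern, site):
    """True if every base of `site` is allowed by the IUPAC `pattern`."""
    return len(pattern) == len(site) and all(
        base in IUPAC[code] for code, base in zip(pattern, site)
    )


def mismatches(pattern, site):
    """Hamming distance between `site` and an IUPAC `pattern`."""
    return sum(base not in IUPAC[code] for code, base in zip(pattern, site))


def candidate_branch_sites(intron, scan_pattern="NNNTRACNN",
                           consensus="TACTRACNN", max_upstream=200):
    """Return the (start, site, distance) hits closest to the consensus."""
    intron = intron.upper().replace("U", "T")
    k = len(scan_pattern)
    first = max(0, len(intron) - max_upstream)  # assumed scan window
    hits = []
    for start in range(first, len(intron) - k + 1):
        site = intron[start:start + k]
        if matches(scan_pattern, site):
            hits.append((start, site, mismatches(consensus, site)))
    if not hits:
        return []
    best = min(distance for _, _, distance in hits)
    return [hit for hit in hits if hit[2] == best]


# Hypothetical toy intron containing a single consensus branch site.
print(candidate_branch_sites("GTATGTAAAATACTAACATTTTTTTTTAG"))
```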

Secondary structure prediction

For each intron, we recovered the sequence between the BS and

both signals. From this region we further removed the first eight nucleotides after the

BS A, as previous experiments show that these nucleotides cannot belong to a

secondary structure. In the selected region we did the secondary structure prediction

using the program RNAfold from the Vienna package (Hofacker, 2009) with default

parameters.

Homologous introns dataset

We used Galaxy (Giardine et al., 2005) to extract the homologous regions to the S.

cerevisiae introns in 5 Saccharomyces species (S. paradoxus, S. mikatae, S.

kudriavzevii, S. bayanus, S. castellii). We extracted the genomic alignments for the

yeast species provided by UCSC (Kuhn et al., 2009) and kept only those containing

canonical splice sites and no ambiguous nucleotides in the sequence (Supplementary

Table S1). For each of the homologous introns obtained we did independent BS

predictions applying the same method used for S. cerevisiae. Subsequently, for each

S. cerevisiae intron we built a multiple sequence alignment with its putative homologs using


PRANK (Loytynoja and Goldman, 2008). In the webpage we show only those introns

with homologous BS (those that contained the BS aligned with the S. cerevisiae BS).

Effective distance measure

between the A of the BS and

TACTAACACNNNN|TAG would be a distance of 10 nt. We defined the effective

BS-

after removing the secondary structure. More specifically, we removed all the bases

that were part of a structured region, and 2 bases corresponding to the beginning and

the end of the structured region were added substituting each structured region.
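
The collapsing rule can be illustrated with a short sketch that operates on a dot-bracket string such as those produced by RNAfold. The names are hypothetical, and two points are assumptions rather than statements from the text: a "structured region" is taken to be one outermost stem together with everything it encloses, and the eight nucleotides excluded from folding are added back to the collapsed length.

```python
def collapsed_length(dot_bracket, bases_per_structure=2):
    """Length of the region after replacing each top-level structure by 2 bases."""
    depth = 0
    length = 0
    for symbol in dot_bracket:
        if symbol == "(":
            if depth == 0:
                # Entering a new structured region: count its two end bases once.
                length += bases_per_structure
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced dot-bracket string")
        elif symbol == ".":
            if depth == 0:
                # Unpaired base outside any structure counts in full.
                length += 1
        else:
            raise ValueError(f"unexpected symbol: {symbol!r}")
    if depth != 0:
        raise ValueError("unbalanced dot-bracket string")
    return length


def effective_distance(dot_bracket, skipped_after_bs=8):
    """Collapsed length plus the nucleotides excluded from folding (assumed)."""
    return skipped_after_bs + collapsed_length(dot_bracket)


# Two hairpins separated by single-stranded stretches (hypothetical structure).
print(effective_distance("..((...))..(((..)))."))
```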

Effective distance for random sets

To assess the significance of our findings regarding the effective distance we

generated two sets of random sequences to compare to. First, for each of the

8 nt after the Branch Site A. From each of these sequences we then built 1000

randomized sequences (i.e. maintaining the sequence content and length distribution).

To generate the second set, for each of the real-sequence length we extracted 1000

random sequences of the same length from the genome (i.e. maintaining only the

same length distribution). Then, for each of these sequences we did an RNA structure

prediction using the program RNAfold from the Vienna package (Hofacker, 2009)

with default parameters and measured the effective distance as explained before. For

each of the 2000 random datasets (1000 sets composed of 282 randomized sequences

and 1000 sets composed of 282 random sequences) we measured the maximum

effective length obtained in each of them. Both distributions are significantly different

from the real dataset (Figure S1): the empirical p-values for the comparisons to

randomized introns and random introns are 0.002 and 0.008, respectively.
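
A minimal sketch of the randomization and of an empirical p-value is given below. The function names are hypothetical, and the p-value definition (fraction of random sets whose maximum effective distance is at or below the observed one) is an assumption, since the text does not state the exact formula. Structure prediction on the shuffled sequences is omitted.

```python
import random


def shuffled_copies(sequence, n_copies=1000, seed=0):
    """Shuffle a sequence n_copies times, preserving its length and base content."""
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        bases = list(sequence)
        rng.shuffle(bases)
        copies.append("".join(bases))
    return copies


def empirical_p_value(observed, null_values):
    """Fraction of null values at or below the observed value (assumed definition)."""
    at_or_below = sum(1 for value in null_values if value <= observed)
    return at_or_below / len(null_values)


# Hypothetical maxima from four random sets compared with an observed maximum of 45 nt.
print(empirical_p_value(45, [40, 50, 60, 70]))
```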

Accessibility measurement

Accessibility is defined as 1 minus the probability of being paired (pair probability).

We calculated pair probabilities using the program RNAfold (Hofacker, 2009). For


nucleotides (HAG) using 4 different windows of increasing length size with lengths n,

n+5, n+10, and n+15 respectively, where n is the BS-HAG distance. The final value is

the average of the values of the three nucleotides averaged over the four windows.
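
The averaging step can be sketched as follows, assuming the per-nucleotide pair probabilities for each window have already been computed (for example with RNAfold). The function names and the input layout (one triple of probabilities per window) are hypothetical.

```python
def window_lengths(bs_hag_distance):
    """The four window lengths n, n+5, n+10 and n+15 for a BS-HAG distance n."""
    return [bs_hag_distance + extra for extra in (0, 5, 10, 15)]


def accessibility(pair_probabilities_by_window):
    """Average of (1 - pair probability) over the three HAG bases and all windows."""
    per_window = []
    for probabilities in pair_probabilities_by_window:
        if len(probabilities) != 3:
            raise ValueError("expected one pair probability per HAG nucleotide")
        per_window.append(sum(1.0 - p for p in probabilities) / 3)
    return sum(per_window) / len(per_window)


# Hypothetical pair probabilities for H, A and G in each of the four windows.
example = [(0.9, 0.8, 0.7), (0.5, 0.5, 0.5), (0.2, 0.1, 0.0), (0.0, 0.0, 0.0)]
print(window_lengths(30), round(accessibility(example), 3))
```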

Conservation analysis

We downloaded the phastCons conservation scores from the 7 yeast multiple species

alignment from UCSC (Fujita et al., 2010). PhastCons scores give the probability that

a given nucleotide is part of a conserved element (in a multiple alignment)

considering the information of the alignment column and the flanking alignment

columns, and taking into account the phylogeny of the species used (Siepel et al.,

2004)

both signals) and obtained the PhastCons conservation scores for each of them. Next,

we divided the data in three groups according to the localization of the nucleotides in

the secondary structure (Figure S1). OUT if the nucleotides were outside the

secondary structure; STEM if the nucleotides were base-paired in the predicted

optimal structure; LOOP/BULGE if the nucleotides were in unpaired regions from the

optimal structure predicted. The distribution of the values is shown in Figure S1. We

found that there is a gradient in conservation, with nucleotides in stems being the

most conserved and nucleotides outside the secondary structure the least conserved.
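
The three-way grouping can be illustrated with a sketch that labels each position of a dot-bracket structure. The names are hypothetical, and the sketch assumes that unpaired positions enclosed by at least one base pair count as LOOP/BULGE, whereas unpaired positions not enclosed by any pair count as OUT.

```python
def classify_positions(dot_bracket):
    """Label each position as 'STEM', 'LOOP/BULGE' or 'OUT'."""
    labels = []
    depth = 0
    for symbol in dot_bracket:
        if symbol == "(":
            depth += 1
            labels.append("STEM")
        elif symbol == ")":
            depth -= 1
            labels.append("STEM")
        elif symbol == ".":
            # Enclosed unpaired bases are loops or bulges; the rest are outside.
            labels.append("LOOP/BULGE" if depth > 0 else "OUT")
        else:
            raise ValueError(f"unexpected symbol: {symbol!r}")
    return labels


def group_scores(labels, scores):
    """Collect per-nucleotide scores (e.g. PhastCons) by structural class."""
    groups = {"OUT": [], "STEM": [], "LOOP/BULGE": []}
    for label, score in zip(labels, scores):
        groups[label].append(score)
    return groups


# Hypothetical structure and conservation scores.
labels = classify_positions(".((..)).")
print(group_scores(labels, [0.1, 0.9, 0.8, 0.3, 0.2, 0.9, 0.7, 0.1]))
```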

Motif search

We looked for motifs overrepresented or underrepresented inside the predicted

optimal secondary structures. For each BS-

collected all 4-mers and 5-mers that were fully inside (including loops and bulges) or

outside of the secondary structure. For each of the motifs inside secondary structures,

we checked if it is overrepresented or underrepresented compared to the same motif

outside the secondary structure by using a z-score:

\[
z = \frac{\dfrac{X_a}{N_a} - \dfrac{X_b}{N_b}}{\sqrt{\left(\dfrac{1}{N_a} + \dfrac{1}{N_b}\right)\left(\dfrac{X_a + X_b}{N_a + N_b}\right)\left(1 - \dfrac{X_a + X_b}{N_a + N_b}\right)}}
\]

where X_a is the number of counts of a given k-mer inside the secondary structure, X_b

is the number of counts of the same k-mer outside the secondary structure, N_a is the

total amount of k-mer counts inside the secondary structure and N_b is the total

amount of k-mer counts outside the secondary structure. For each z-score, we

calculated a two-tailed p-value assuming a normal distribution. We did multiple

testing correction to all the p-values applying the Benjamini and Hochberg False

Discovery Rate (Benjamini et al., 1995). Then, for each of the significant k-mers, we

computed the average phastCons scores inside and outside of the secondary structure.

The complete list of all significant k-mers is shown in Table S2.

For comparison, we collected all k-mers that were fully in stems, in loops or bulges,

or outside secondary structures (Table S3).
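
The enrichment test can be sketched as below. This is an illustration only: it assumes that the z-score is the standard pooled two-proportion statistic, with the pooled frequency (Xa + Xb) / (Na + Nb), and the function names and counts are hypothetical. The two-tailed p-value uses the normal distribution, and the adjustment follows the Benjamini and Hochberg step-up procedure.

```python
import math


def kmer_z_score(x_in, n_in, x_out, n_out):
    """Pooled two-proportion z-score for a k-mer inside vs outside structures."""
    pooled = (x_in + x_out) / (n_in + n_out)
    std_err = math.sqrt((1 / n_in + 1 / n_out) * pooled * (1 - pooled))
    return (x_in / n_in - x_out / n_out) / std_err


def two_tailed_p(z):
    """Two-tailed p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical counts: 30 of 1000 k-mers inside structures vs 10 of 1000 outside.
z = kmer_z_score(30, 1000, 10, 1000)
print(round(z, 2), two_tailed_p(z))
```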

Suboptimal structure prediction

For each

energy varies at the most 5% relative to the optimal secondary structure using the

program RNAsubopt (Wuchty et al., 1999), similarly to what has been done

previously (Rogic et al., 2008). Using these structures, we can predict a distribution of

effective distances for each of the introns analyzed, which is summarized in Figure S1.

Furthermore, we compared the distributions of effective distances obtained using the

optimal secondary structure and the 1000 suboptimal structures per intron (for all

introns, for introns with a predicted optimal secondary structure, and for introns

without one) with the effective distance distributions obtained for the same introns

using only the optimal secondary structure, and found no significant differences

(Wilcoxon signed-rank test: all introns, p-value = 0.019; introns with a predicted

optimal secondary structure, p-value = 0.036; introns without a predicted optimal

secondary structure, p-value = 0.343).
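
The per-intron summary shown in Figure S1 (mode and the range covering 95% of the suboptimal effective distances) could be computed along the following lines. The function name is hypothetical, and the handling of ties for the mode and the rounding of the trimmed fraction are assumptions of this sketch.

```python
from collections import Counter


def summarize_distances(distances, trim_fraction=0.025):
    """Return (mode, low, high) after trimming the extreme values at both ends."""
    ordered = sorted(distances)
    n = len(ordered)
    cut = int(n * trim_fraction)  # number of values dropped at each end
    kept = ordered[cut:n - cut] if cut else ordered
    # With ties, the smallest of the most common values is reported.
    mode = Counter(ordered).most_common(1)[0][0]
    return mode, kept[0], kept[-1]


# Hypothetical effective distances from 1000 suboptimal structures of one intron.
distances = [10] * 30 + [12] * 960 + [40] * 10
print(summarize_distances(distances))
```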


SUPPLEMENTAL REFERENCES

Benjamini, Y., and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 289-300.

Fujita, P.A., Rhead, B., Zweig, A.S., Hinrichs, A.S., Karolchik, D., Cline, M.S., Goldman, M., Barber, G.P., Clawson, H., Coelho, A., Diekhans, M., Dreszer, T.R., Giardine, B.M., Harte, R.A., Hillman-Jackson, J., Hsu, F., Kirkup, V., Kuhn, R.M., Learned, K., Li, C.H., Meyer, L.R., Pohl, A., Raney, B.J., Rosenbloom, K.R., Smith, K.E., Haussler, D., and Kent, W.J. (2011). The UCSC Genome Browser database: update 2011. Nucleic Acids Res. (Database issue), D876-82.

Giardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y., Blankenberg, D., Albert, I., Taylor, J., et al. (2005). Galaxy: a platform for interactive large-scale genome analysis. Genome Res 15, 1451-1455.

Hofacker, I.L. (2009). RNA secondary structure analysis using the Vienna RNA package. Curr Protoc Bioinformatics Chapter 12, Unit12 12.

Hong, E.L., Balakrishnan, R., Dong, Q., Christie, K.R., Park, J., Binkley, G., Costanzo, M.C., Dwight, S.S., Engel, S.R., Fisk, D.G., Hirschman, J.E., Hitz, B.C., Krieger, C.J., Livstone, M.S., Miyasato, S.R., Nash, R.S., Oughtred, R., Skrzypek, M.S., Weng, S., Wong, E.D., Zhu, K.K., Dolinski, K., Botstein, D., and Cherry, J.M. (2008). Gene Ontology annotations at SGD: new data sources and annotation methods. Nucleic Acids Res. 36 (Database issue), D577-581.

Kos, M., and Tollervey, D. (2010). Yeast pre-rRNA processing and modification occur cotranscriptionally. Mol Cell 37, 809-820.

Kuhn, R.M., Karolchik, D., Zweig, A.S., Wang, T., Smith, K.E., Rosenbloom, K.R., Rhead, B., Raney, B.J., Pohl, A., Pheasant, M., et al. (2009). The UCSC Genome Browser Database: update 2009. Nucleic Acids Res 37, D755-761.

Lesser, C.F., and Guthrie, C. (1993). Mutational analysis of pre-mRNA splicing in Saccharomyces cerevisiae using a sensitive new reporter gene, CUP1. Genetics 133, 851-863.

Loytynoja, A., and Goldman, N. (2008). Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science 320, 1632-1635.

Rogic, S., Montpetit, B., Hoos, H.H., Mackworth, A.K., Ouellette, B.F., and Hieter, P. (2008). Correlation between the secondary structure of pre-mRNA introns and the efficiency of splicing in Saccharomyces cerevisiae. BMC Genomics 9, 355-373.

Siatecka, M., Reyes, J.L., and Konarska, M.M. (1999). Functional interactions of Prp8 with both splice sites at the spliceosomal catalytic center. Genes Dev 13, 1983-1993.

Siepel, A., and Haussler, D. (2004). Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol Biol Evol. 21, 468-488.

Waterhouse, A.M., Procter, J.B., Martin, D.M., Clamp, M., and Barton, G.J. (2009). Jalview Version 2 -- a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189-1191.

Wuchty, S., Fontana, W., Hofacker, I.L., and Schuster, P. (1999). Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers 49, 145-165.
