Published online 4 April 2008. Nucleic Acids Research, 2008, Vol. 36, No. 9, 3001–3010. doi:10.1093/nar/gkn142

Computational screen for spliceosomal RNA genes aids in defining the phylogenetic distribution of major and minor spliceosomal components

Marcela Dávila López1, Magnus Alm Rosenblad2 and Tore Samuelsson1,*

1Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, Sahlgrenska Academy, Box 440 and 2Department of Cell and Molecular Biology, University of Gothenburg, Box 462, SE-405 30 Göteborg, Sweden

Received February 8, 2008; Revised March 13, 2008; Accepted March 14, 2008

*To whom correspondence should be addressed. Tel: +46 31 786 3468; Fax: +46 31 41 6108; Email: Tore.Samuelsson@medkem.gu.se

© 2008 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

The RNA molecules of the spliceosome are critical for specificity and catalysis during splicing of eukaryotic pre-mRNA. In order to examine the evolution and phylogenetic distribution of these RNAs, we analyzed 149 eukaryotic genomes representing a broad range of phylogenetic groups. RNAs were predicted using high-sensitivity local alignment methods and profile HMMs in combination with covariance models. The results provide the most comprehensive view so far of the phylogenetic distribution of spliceosomal RNAs. RNAs were predicted in many phylogenetic groups where these RNAs were not previously reported. Examples are RNAs of the major (U2-type) spliceosome in all fungal lineages, in lower metazoa and many protozoa. We also identified the minor (U12-type) spliceosomal U11 and U6atac RNAs in Acanthamoeba castellanii, where U12 spliceosomal RNA as well as minor introns were reported recently. In addition, minor-spliceosome-specific RNAs were identified in a number of phylogenetic groups where previously such RNAs were not observed, including the nematode Trichinella spiralis, the slime mold Physarum polycephalum and the fungal lineages Zygomycota and Chytridiomycota. The detailed map of the distribution of the U12-type RNA genes supports an early origin of the minor spliceosome and points to a number of occasions during evolution where it was lost.

INTRODUCTION

An essential step of gene expression in eukaryotes is the removal of introns from the pre-mRNA and the ligation of exons to form the mature RNA. It occurs by two sequential trans-esterification reactions and is catalyzed by a multicomponent complex, the spliceosome (1). To date, two intron classes are known, a U2-type and a low-abundance U12-type. Splicing of U2-type introns is catalyzed by the U2-dependent (major) spliceosome, which includes the U1, U2, U4, U5 and U6 spliceosomal RNAs as well as multiple protein factors. The U12-dependent (minor) spliceosome, responsible for the excision of the U12-type introns, is structurally similar to the U2-type spliceosome. It contains protein subunits and the U5 RNA as well as the U11, U12, U4atac and U6atac spliceosomal RNAs that are functionally and structurally related to the U1, U2, U4 and U6 RNAs of the major spliceosome.

All spliceosomal RNAs, except U6 and U6atac, are synthesized by Pol II (2) and contain a conserved single-stranded region, referred to as the Sm site, with the consensus PuAU4-6GPu that is normally flanked by two hairpins and serves as the binding site for the Sm proteins (3).

For U2-type introns, spliceosome assembly is initiated by the interaction of U1 snRNP with the 5′ splice site and U2 snRNP with the branch site. Here, the U1 and U2 RNAs play important roles as they pair with 5′ splice site and branch site sequences, respectively. A U4–U5–U6 tri-snRNP complex, where U4 and U6 RNA are associated by base-pairing, associates with U1, U2 and the pre-mRNA to form a spliceosome. Structural rearrangements then take place such that U6 separates from U4 to allow pairing between U6 and U2. U6 also interacts with the 5′ splice site and U1 is displaced from the spliceosome. The U6/U2 complex plays an important role in the catalytic reaction (4). Assembly of the U12-dependent spliceosome is similar to that of the U2-dependent spliceosome, but a major difference is that U11 and U12 snRNPs form a highly stable di-snRNP that binds cooperatively to the 5′ splice site and branch site (5,6).

U2-type introns are ubiquitous in eukaryotes, while U12-type introns have so far been demonstrated only in vertebrates, insects, cnidarians (7), Rhizopus oryzae, Phytophthora and Acanthamoeba castellanii (8). They are
absent from the yeast Saccharomyces cerevisiae (9) and from the nematode Caenorhabditis elegans (7). In order to understand the evolution of the splicing machinery and of spliceosomal RNAs, we wanted to systematically examine the phylogenetic distribution of these RNAs. In general, ncRNAs are poorly conserved in sequence, but each class of ncRNA is typically characterized by a specific secondary structure. This is also true for spliceosomal RNAs, although many spliceosomal RNAs are conserved also in sequence, like U2 and U6 RNAs (10). Nevertheless, for some spliceosomal RNAs the primary sequence is highly variable. In the case of U1 RNA, also the secondary structure is subject to variation, as observed in yeast (11) and in Trypanosoma (12). Therefore, the computational identification of spliceosomal RNA genes, as with many other noncoding RNA genes, is challenging. A large number of spliceosomal RNAs from different organisms have been identified experimentally as well as computationally (13) and have been deposited in sequence databases. For instance, a large number of spliceosomal RNA sequences are available in the Rfam database (13), aimed at prediction of ncRNAs using covariance models (14). However, there are phylogenetic groups where spliceosomal RNAs have not been identified, and it is not clear whether this is due to poor performance of prediction methods or because such RNAs are lacking in these organisms. In order to improve on this situation, we have developed a simple protocol for computational identification of spliceosomal RNA based on local alignment methods, profile HMMs and covariance models (14). Our method is efficient, as we are able to present a large number of previously unrecognized spliceosomal RNA orthologues.

MATERIALS AND METHODS

Sources of genomic and protein sequences

Genomic sequences were obtained from NCBI (http://www.ncbi.nlm.nih.gov/entrez, ftp.ncbi.nih.gov/genomes), EMBL (http://www.ebi.ac.uk), ENSEMBL (http://www.ensembl.org), TraceDB (ftp.ncbi.nlm.nih.gov/pub/TraceDB), TIGR (ftp://ftp.tigr.org/pub/data), the US Department of Energy Joint Genome Institute (http://www.jgi.doe.gov), the WU Genome Sequencing Center (http://genome.wustl.edu), the Sanger Institute (http://www.sanger.ac.uk), the HGSC at Baylor College (http://www.hgsc.bcm.tmc.edu/projects), as well as specific Genome Project Databases: CryptoDB (http://www.cryptodb.org/cryptodb), PlasmoDB (http://www.plasmodb.org), GiardiaDB (http://www.jbpc.mbl.edu/Giardia-HTML/index2.html), ToxoDB (http://www.toxodb.org/toxo/home.jsp), DictyBase (http://dictybase.org), the Cyanidioschyzon merolae Genome Project (http://merolae.biol.s.u-tokyo.ac.jp) and the Galdieria sulphuraria Genome Project (http://genomics.msu.edu/galdieria). Access to the provisional 4 assembly of Mucor circinelloides genome was granted by the DoE Joint Genome Institute and the Mucor genome project (http://mucorgen.um.es). More details on database versions are in Supplementary Data 4. Protein sequences were retrieved from Uniprot (http://beta.uniprot.org).

Identification of spliceosomal RNA orthologues

Sequences of RNAs annotated as spliceosomal RNAs (U1, U2, U4, U5, U6, U11, U12, U4atac and U6atac) were assembled (Supplementary Data 1) from Rfam (13). These sequences were used as initial queries with BLASTN (15) and FASTA (16) against genomic sequences of the organisms listed in Supplementary Data 4. The E-value threshold was set to 10, while the word size was 7 and 6 for BLAST and FASTA, respectively. Hits, including 200 nt upstream and downstream sequences, were retrieved and analyzed with cmsearch of the Infernal package (14) using the relevant covariance model from Rfam. For yeast U1 RNA, the Rfam model specific to this group was used.
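The retrieval of a hit together with its flanking sequence can be pictured with a short Python sketch. It pads a hit by 200 nt on each side, clips the window to the contig ends and reverse-complements hits on the minus strand, giving the region that would then be passed to cmsearch. The function names, the 1-based inclusive coordinate convention and the strand argument are assumptions made for illustration; they are not taken from the scripts used in this study.

```python
FLANK = 200  # nt added upstream and downstream of each hit, as stated in the text

_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence as DNA."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_with_flanks(contig: str, start: int, end: int,
                        strand: str = "+", flank: int = FLANK) -> str:
    """Cut out a hit plus flanking sequence from a contig.

    start and end are 1-based inclusive forward-strand coordinates
    (a hypothetical convention; adapt it to the search program's output).
    """
    if start > end:
        start, end = end, start
    left = max(1, start - flank)             # clip at the contig start
    right = min(len(contig), end + flank)    # clip at the contig end
    window = contig[left - 1:right]
    return reverse_complement(window) if strand == "-" else window
```

Clipping matters in practice because many of the genomes analyzed are draft assemblies with short contigs, where a hit may lie closer than 200 nt to a contig end.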

A threshold was then set for each one of the spliceosomal RNAs, based on the initial query RNA giving rise to the lowest score from cmsearch. These threshold values were, for U1, U2, U4, U5, U6, U11, U12, U4atac and U6atac, 55.74, 52.75, 40.63, 60.60, 54.24, 36.80, 51.74, 40.03 and 39.23, respectively. All sequences above these threshold values were considered as reliable predictions. For species where sequences with scores above threshold were found, all sequences below the threshold were discarded. For the remaining species, sequences with a score below the threshold but greater than 15 were considered for further analysis. Considering relatively low scores was in this case motivated by the fact that Rfam covariance models tend to be phylogenetically biased, as sequences from mammals and other well studied species are overrepresented. Relatively few sequences (between 1 and 12 depending on the RNA family) belonged to this category of low-scoring sequences. They were evaluated using a procedure where the presence of specific conserved primary sequence motifs as well as secondary structure was examined. We required exact matches to the primary sequence motifs listed in Supplementary Data 4. The cmsearch output was used to produce structure plots based on the covariance model used. These plots were manually browsed to verify the presence of secondary structure elements according to the consensus secondary structure of the specific spliceosomal RNA. If relevant primary and secondary structure features were present, the sequences were considered as reliable predictions.
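The score-based triage described above can be summarized in a minimal Python sketch. The threshold values are read here as cmsearch scores with two decimals (for example 55.74 for U1), which is an assumption about how the numbers were printed. Treating a score exactly equal to the threshold as reliable is also an assumption. The motif check uses the Sm site consensus PuAU4-6GPu from the Introduction purely as an example of an exact motif match; the motifs actually required are those in Supplementary Data 4. The function and variable names are hypothetical.

```python
import re

# Per-family cmsearch score thresholds as read from the text (assumed decimals).
THRESHOLDS = {
    "U1": 55.74, "U2": 52.75, "U4": 40.63, "U5": 60.60, "U6": 54.24,
    "U11": 36.80, "U12": 51.74, "U4atac": 40.03, "U6atac": 39.23,
}
RESCUE_FLOOR = 15.0  # low-scoring hits above this value are inspected further


def triage(family, scores_by_species):
    """Split cmsearch scores into reliable hits and candidates for inspection.

    scores_by_species maps a species name to the list of scores of its hits.
    """
    cutoff = THRESHOLDS[family]
    reliable, to_inspect = {}, {}
    for species, scores in scores_by_species.items():
        above = [s for s in scores if s >= cutoff]
        if above:
            # A confident hit exists: everything below threshold is discarded.
            reliable[species] = above
        else:
            low = [s for s in scores if RESCUE_FLOOR < s < cutoff]
            if low:
                to_inspect[species] = low
    return reliable, to_inspect


# Sm site consensus PuAU(4-6)GPu, written for RNA; illustrative motif only.
SM_SITE = re.compile(r"[AG]AU{4,6}G[AG]")


def has_sm_site(seq):
    """True if the sequence contains an exact match to the Sm consensus."""
    return SM_SITE.search(seq.upper().replace("T", "U")) is not None
```

Only the sequences returned in the second dictionary would go on to the motif and manual secondary structure checks; the structure inspection itself is not modeled here.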

The resulting predicted sequences were then used as queries in a second round of searches to retrieve homologues in species where that particular RNA orthologue was not identified. The resulting hits were analyzed as described earlier, and any reliable predictions obtained were used in yet another round of searches. This procedure was repeated until no more significant hits were retrieved.
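The repeated rounds of searching amount to iterating until no new reliable sequence is added. The Python sketch below shows that control flow under stated assumptions: `search` and `is_reliable` are hypothetical callables standing in for the alignment search plus covariance model scoring and for the reliability criteria, and sequences are represented as plain strings.

```python
def iterative_search(initial_queries, genomes, search, is_reliable):
    """Repeat homology searches until a round adds no new reliable sequence.

    search(queries, genome) -> iterable of candidate sequences (hypothetical
    wrapper around the alignment search and covariance model scoring).
    is_reliable(candidate) -> bool (score, motif and structure criteria).
    Returns the accepted sequences per genome and the number of rounds run.
    """
    found = {}                        # genome name -> set of accepted sequences
    queries = set(initial_queries)
    new_queries = set(initial_queries)
    rounds = 0
    while new_queries:
        rounds += 1
        new_queries = set()
        for name, genome in genomes.items():
            if found.get(name):
                continue              # only search genomes still lacking the RNA
            for cand in search(queries, genome):
                if is_reliable(cand):
                    found.setdefault(name, set()).add(cand)
                    if cand not in queries:
                        new_queries.add(cand)
        queries |= new_queries        # new predictions become queries next round
    return found, rounds
```

Because a genome is skipped once it has an accepted sequence, each genome can contribute new queries at most once, so the loop ends after at most one round more than the number of genomes.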

In species where we were not able to find a reliable spliceosomal RNA, WU-BLAST blastn was used with word size 2. Sequences identified in such searches were analyzed with the respective covariance model, using the same criteria as described earlier, in order to identify reliable predictions, which were used in a second round of searches. In addition, we performed hmmsearch searches using HMM models based on Rfam alignments against the set of reliable spliceosomal RNA sequences and against genomes where we did not previously find a specific RNA. Sequences with E-values lower than 10 were retrieved and examined with cmsearch and processed as described earlier. All 17 136 sequences considered as reliable candidates identified in this study, together with the 356 sequences used as initial queries, are in Supplementary Data 1.

Multiple alignments of sequences were created using ClustalW 1.83 (17) or T-Coffee (18). Alignments obtained with cmalign of Infernal are shown in Supplementary Data 3. Secondary structure was predicted with MFOLD (19) as well as with Infernal.

RESULTS AND DISCUSSION

Spliceosomal RNAs may be efficiently identified using a combination of high-sensitivity local alignment methods, profile HMMs and covariance models

Genomic sequences from 149 eukaryotic organisms (Figure 1) were analyzed with respect to spliceosomal RNAs. The Infernal software (14), which identifies ncRNAs using covariance models, is effective at identifying members of a specific RNA family but is computationally demanding and not practical for the analysis of large genomes. Therefore, a first step to filter sequences is necessary. In our method, we used NCBI BLAST (wordsize 7) and FASTA (wordsize 6). RNA sequences represented in the Rfam database and annotated as spliceosomal RNAs (U1, U2, U4, U5, U6, U11, U12, U4atac and U6atac) were first assembled (a total of 356 sequences, see Supplementary Data 1) and used as queries in BLAST and FASTA searches against genomic sequences. To maximize sensitivity, we considered all hits identified in these searches, irrespective of E-value, for analysis with covariance models of spliceosomal RNAs collected from Rfam.

The predictions that represented novel spliceosomal RNA sequences and that were considered reliable (see under Materials and Methods section for details) were used in a second round of searches against organisms where we were missing the respective spliceosomal RNA. Novel hits were analyzed as described with covariance models, and this procedure was repeated until no further reliable predictions could be obtained. For genomes where we were not able to find a spliceosomal RNA orthologue using BLAST or FASTA, we also made use of WU-BLAST (wordsize 2) [Gish, W. (1996–2004) http://blast.wustl.edu] and HMMER searches (http://hmmer.janelia.org) using profile HMM models based on Rfam alignments. For comparison, we also used Infernal to analyze the following genomes with selected covariance models without any initial filtering of sequences: Trichinella spiralis (U11, U12 and U4atac), A. castellanii (U11, U4atac), G. sulphuraria (U1, U2, U4, U5 and U6), Giardia lamblia (U1, U2, U4, U5 and U6), Physarum polycephalum (U11, U12 and U4atac), Naegleria gruberi (U1), Trichomonas vaginalis (U1), Batrachochytrium dendrobatidis (U1, U11 and U12), Antonospora locustae (U1), Encephalitozoon cuniculi (U1), Phycomyces blakesleeanus (U12) and Cyanidioschyzon merolae (U1, U2, U4, U5 and U6).

We thus obtained 17 136 sequences predicted as spliceosomal RNAs. All these sequences are distributed among 147 species, as shown in Figure 1 and Supplementary Data 1 and 2. It should be noted that many animals and plants have numerous copies of each RNA gene, and a fraction of these are fragmented genes or pseudogenes. As it is very difficult to distinguish a true gene from a pseudogene using computational methods, a fraction of our candidates in animals and plants are presumably pseudogenes. In some phylogenetic groups, such as fungi, heterokonts and Apicomplexa, each of the spliceosomal RNAs is represented by one or a few genes, and in this case the predicted sequences are more likely to be bona fide spliceosomal RNA genes.

The results using the different methods, NCBI BLAST, FASTA, WU-BLAST and HMMER, are compared in Figure 2. As expected, the sensitivity of FASTA, WU-BLAST and HMMER was much greater than that of NCBI BLAST (W7), and HMMER is the most sensitive method of the four. As there are speed disadvantages to WU-BLAST and HMMER, we did not systematically examine every possible genome with these methods. Instead, we searched with these methods only genomes where we did not find a specific spliceosomal RNA with FASTA or BLAST. We also analyzed with WU-BLAST and HMMER all the RNA sequences found using FASTA and BLAST. In the results shown in Figure 2, therefore, the efficiency of WU-BLAST and HMMER is probably underestimated. In general, the results obtained in our searches are consistent with previous results (20), where different software, including programs used here, were tested against a set of previously known ncRNAs.

In summary, our results demonstrate that sensitive sequence alignment methods, including profile HMMs, are important as a first filtering step to identify ncRNA candidates. In addition, a combination of the different methods maximizes sensitivity. There are two RNAs, from G. sulphuraria and A. locustae (Figure 1), that could only be identified using an Infernal search against the complete genome. This finding illustrates that we could be lacking more orthologues not identified by HMMER, WU-BLAST or FASTA. However, it is our impression that very few RNA genes are missed by the initial screen.

During the production of this article, spliceosomal RNA sequences from Plasmodium (21), Entamoeba histolytica (22) as well as Candida albicans and other hemiascomycetous yeasts (11) were identified and characterized. All these sequences are identical to the sequences identified in the present study, providing support to the reliability of our prediction method.
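A comparison of search methods of the kind summarized in the Venn diagram of Figure 2 reduces to set operations on the identifiers each method recovered. The Python sketch below is one way to tabulate such an overlap; the function name is hypothetical, and the identifiers in the usage note are invented placeholders rather than results of this study.

```python
from itertools import combinations


def method_overlap(hits_by_method):
    """Summarize which sequences each search method recovered.

    hits_by_method maps a method name to a set of sequence identifiers.
    Returns (unique, shared_by_all, pairwise): identifiers found by one
    method only, identifiers found by every method, and the size of each
    pairwise intersection.
    """
    methods = list(hits_by_method)
    all_sets = list(hits_by_method.values())
    shared_by_all = set.intersection(*all_sets) if all_sets else set()
    unique = {}
    for m in methods:
        others = set().union(*(hits_by_method[o] for o in methods if o != m))
        unique[m] = hits_by_method[m] - others
    pairwise = {
        (a, b): len(hits_by_method[a] & hits_by_method[b])
        for a, b in combinations(methods, 2)
    }
    return unique, shared_by_all, pairwise
```

For instance, calling it with `{"BLAST": {"r1"}, "FASTA": {"r1", "r2"}, "HMMER": {"r1", "r2", "r3"}}` reports `r3` as unique to HMMER and `r1` as shared by all three. Such a table is only as fair as the search design: when the slower methods are run on a subset of genomes, as described above, their unique counts understate their sensitivity.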

RNAs of the major spliceosome

The spliceosomal RNAs as identified here are summarized in Figure 1 (actual sequences are in Supplementary Data 1). In 107 species, we are able to report one or more spliceosomal RNAs where that particular RNA had not been reported before, as highlighted in Figure 1 with blue boxes. As a guide to the phylogenetic relationships between the species investigated here, a schematic phylogenetic tree is shown in Figure 3.

Figure 1. Phylogenetic distribution of the major and minor spliceosomal RNAs. Results of computational prediction of spliceosomal RNAs. Green boxes show instances where a sequence was previously known and it was used as query in searches. Blue boxes show RNAs predicted in this work. A star (*) indicates that an RNA was previously described in the literature (Supplementary Data 5) and 'R' shows that the sequence was in Rfam (13). Also indicated are sequences known to have introns ('i') and U6atac RNAs of the CC variant type ('c', see Results and Discussion section).

We have analyzed RNAs of the U2-type spliceosome as well as those of the minor U12-type spliceosome (Figures 1 and 3). As to the U2-type, we have identified such RNAs in virtually every species examined. The only exceptions are the red alga C. merolae and the deeply branching protist G. lamblia, where we could not identify any spliceosomal RNAs. Of spliceosomal RNAs in T. vaginalis, only the U2 RNA was identified. T. vaginalis possesses a gene encoding the essential spliceosomal component PRP8 (23) as well as many putative introns (24). G. lamblia possesses three introns to date (25,26) and 27 spliceosomal proteins (27). In C. merolae, introns as well as conserved U2 and U5 snRNP-protein specific subunits are known to be present (28). Therefore, splicing is likely to occur in these organisms, and it is puzzling that we fail to identify spliceosomal RNAs, particularly in C. merolae, where the genome sequence is complete (29). This could mean that spliceosomal RNAs are lacking and have been replaced by protein functions. But these organisms could also have spliceosomal RNAs very different from most other species, or these genes could be present in a part of the genome for some reason not yet covered by the genome sequencing. A U1 RNA was the only spliceosomal RNA that we identified in G. sulphuraria, another red alga. Additional spliceosomal RNAs might be found once its genome is fully sequenced.

RNA components of the major spliceosome are known to be present in fungi, and recently the evolution of such RNAs in the hemiascomycetous yeasts was examined (11). Major spliceosomal RNAs have previously been reported in the Basidiomycota Rhodotorula (30,31) and Cryptococcus neoformans (Rfam). Here we show that such RNAs are ubiquitous in the Basidiomycota lineage (Figure 1). More deeply branching in the fungi tree are Zygomycota and Chytridiomycota (Figures 1 and 3). We show for the first time that spliceosomal RNAs are present in the Zygomycota P. blakesleeanus, R. oryzae and M. circinelloides as well as in the Chytridiomycota B. dendrobatidis, Spizellomyces punctatus and Allomyces macrogynus.

The microsporidia are believed to be positioned close to the root of the fungal branch. They have been reduced severely in genome size as compared to other fungi. In A. locustae and E. cuniculi, U2 and U6 orthologues are reported in Rfam. Here we have also identified the U4 and U5 RNA orthologues in both of these Microsporidia (Figure 1) and U1 RNA in A. locustae (Supplementary Data 3). We may therefore conclude that the major spliceosomal RNAs are ubiquitous in all fungal groups, including Microsporidia.

The nucleomorphs of Guillardia theta (a cryptomonad) and Bigelowiella natans (a chlorarachniophyte) represent the smallest eukaryotic genomes known. It is interesting to note that also in these two genomes spliceosomal RNA genes are identified (Figure 1). With respect to G. theta, the results of our predictions (only a U6 RNA) are completely consistent with available annotation (Douglas et al., John Archibald and Paul Gilson, personal communication), whereas there are differences with respect to published annotation for the B. natans genome (32).

Minor-spliceosome-specific RNAs are identified in the worm Trichinella spiralis, in Physarum polycephalum and in the fungal lineages Zygomycota and Chytridiomycota

U12-type introns were previously identified in plants, in most of the metazoan taxa, including vertebrates, insects and cnidarians (7), and more recently in R. oryzae of Zygomycota, in Acanthamoeba and in the heterokont Phytophthora (8). Minor spliceosomal RNAs have been found in metazoa, Acanthamoeba, plants and Phytophthora (for references see Supplementary Data 5). A small number of organisms that have been well studied seem to lack the U12-type splicing, such as S. cerevisiae, S. pombe and C. elegans (7,33).

In this investigation, we discovered many novel minor spliceosomal RNA orthologues (Figure 1). More importantly, phylogenetic groups are represented where such RNAs were not previously reported. These are nematodes (T. spiralis), mycetozoa (P. polycephalum) and the fungal lineages Basidiomycota, Zygomycota and Chytridiomycota, as discussed in more detail subsequently.

Trichinella spiralis. The major spliceosomal RNAs of the nematode C. elegans have been characterized (34). Previous analyses have failed to identify minor spliceosomal components, including U12-type introns, in this organism (7). In this investigation, we analyzed different species of the Rhabditida branch and Brugia malayi of the Chromadorea branch. In none of these species were U12-type RNAs identified. However, we identified U11, U12 and U6atac RNAs in another nematode, T. spiralis (Figure 1). Predicted secondary structures of these RNAs are shown in Figure 4. We also found evidence of U11/U12-specific proteins in T. spiralis (Supplementary Data 4), providing further support of a minor spliceosome in this organism.

Figure 2. Comparison of local alignment and profile HMM methods to identify spliceosomal RNAs. Venn diagram showing the number of spliceosomal RNA genes found by NCBI BLAST (W7), WU-BLAST (W2), FASTA (W6) and HMMER. The number of species where the RNAs are distributed is shown within parentheses.

Basidiomycota, Zygomycota and Chytridiomycota. No minor spliceosomal components have been described in fungi, except for minor spliceosomal proteins and potential U12-type introns in R. oryzae (8), a species in the fungal Zygomycota lineage. However, we here identified minor spliceosomal RNA components in the Zygomycota P. blakesleeanus, R. oryzae, M. circinelloides and in the Chytridiomycota B. dendrobatidis, S. punctatus and A. macrogynus. Secondary structure predictions of R. oryzae RNAs are shown in Figure 4, and further structures are shown in Supplementary Data 3. These results provide strong evidence of a U12-type spliceosome in these phylogenetic groups.

In the Basidiomycota phylum, there was previously no evidence of a U12-type spliceosome. However, we here identified a U12 RNA in Phakopsora meibomiae and a U4atac RNA in Phakopsora pachyrhizi. At the same time, there is so far no evidence of U12-type introns or of U12-specific proteins. Therefore, it is possible that the spliceosomal RNAs that we observe are pseudogenes and remnants from a U12 machinery that was present in an ancestral lineage.

Figure 3. Schematic phylogenetic tree. Phylogenetic groups and their relationships are shown together with example species (genus in italics). Species where one or more U12-type spliceosomal RNAs were found are highlighted (red circles), as well as branches where the U12-type RNAs seem to have been lost (dotted lines). In the case of Basidiomycota, only two different U12-type RNAs have been identified, and for this reason there is only weak evidence of a minor spliceosome. Numbers at branches indicate (1) number of genomes analyzed, (2) number of query sequences used and (3) number of new sequences identified.

Figure 4. Structures of selected spliceosomal RNAs. Highlighted regions are the U12 site pairing to the branch site (underlined), regions of U11 and U6atac proposed to pair to the 5′ splice site (underlined), U6atac-U12 interaction (shaded background), Sm-site (box with rounded corners) and K-turn (box) in U4atac RNA. Organisms represented are T. spiralis, P. polycephalum and R. oryzae. Structures of additional RNAs are in Supplementary Data 3.

Acanthamoeba castellanii and Physarum polycephalum. In A. castellanii, a U12 spliceosomal RNA as well as minor introns have been reported recently (8). These observations provided evidence of a minor spliceosome in this organism. Consistent with these results, we identified two additional U12-type RNAs, U11 and U6atac, in this species (Figure 1).

Acanthamoeba is believed to share a common ancestor with the Mycetozoa, where spliceosomal components or U12-type introns were not previously reported. However, we here identified U11, U12 and U6atac spliceosomal RNA genes in P. polycephalum (Figure 1 and Supplementary Data 1).

The minor spliceosome was lost at multiple instances during evolution

The fact that we identified U12-type RNAs in a range of species representing very diverse phyla, such as Fungi, Acanthamoeba/Mycetozoa, Streptophyta and Heterokonta, supports the notion that the minor spliceosome was an early invention in eukaryotic evolution (8). In fact, we cannot exclude that such a spliceosome was present in the last common ancestor of the eukaryotes. The detailed map of the phylogenetic distribution of U12-type RNAs also allows us to identify a number of occasions during evolution where it seems that the minor spliceosome was lost (Figure 3, dashed lines).

U12-type RNAs were found in T. spiralis but not in the other nematodes examined here. T. spiralis belongs to a clade that is probably deeply branching within nematodes and is distant to the Rhabditida and Chromadorea groups of nematodes (35). A mode of evolution therefore seems likely where the minor spliceosome was present at an early stage in nematode evolution but was lost in many branches (Figure 3).

There are other examples where the U12-type RNAs are missing in the fungi/metazoan lineage. Minor spliceosomal RNAs are present in a majority of metazoa, including Trichoplax, the simplest known species of the metazoan branch. An exception is Acropora millepora, a coral of the phylum Cnidaria. In addition, we failed to identify such RNAs in Monosiga brevicollis, a choanoflagellate and close relative of Metazoa. A minor spliceosome is probably present in the fungal phyla Zygomycota and Chytridiomycota, as discussed earlier. In Ascomycota and Microsporidia, on the other hand, these components seem to be lacking. It is likely that a minor spliceosome was present at an early stage in the evolution of fungi but was lost in the development of Ascomycota and Microsporidia. It is not clear why the minor spliceosome was lost in the Ascomycota, but in the case of Microsporidia it could be a consequence of the strong pressure to reduce genome size.

We identified minor spliceosome-type RNAs in P. polycephalum and A. castellanii but not in the evolutionarily related Entamoeba or Dictyostelium (Figure 3). This would suggest that the minor spliceosome was lost in the development of Entamoeba as well as in the Dictyostelium branch.

The analysis of Streptophyta (plant) genomes revealed the presence of minor spliceosomal RNAs, whereas only U2-dependent spliceosomal RNAs were found in green and red algae. Finally, in Heterokonta we found U12-type RNAs in the Oomycetes Phytophthora and Hyaloperonospora, but not in any diatoms or brown algae.

In summary, our results point to a large number of instances where the minor spliceosome was lost during evolution of fungi/metazoa, Mycetozoa, Streptophyta and heterokonts. We are not able to reach a conclusion as to other phyla, such as Euglenozoa and Alveolata, because we do not know whether the common ancestor of these lineages had a minor spliceosomal machinery.

All U4 and U4atac RNAs have a K-turn motif

K-turn motifs have previously been identified in a large number of RNA families (36,37), including the U4 and U4atac RNAs (38,39). We found such a motif in all novel U4 and U4atac RNAs reported here. Examples are R. oryzae (Figure 4) and Phakopsora U4atac RNA (Supplementary Data 3), with characteristic noncanonical G-A and A-G pairs and a 3-nt loop. This would suggest that this motif is compulsory in U4 RNAs and that prediction accuracy may be improved by updating the covariance model in this respect.

Identification of a large number of novel U6atac orthologues

The U6atac RNA was previously identified in vertebrates, insects and plants (for references, see Supplementary Data 5). Here, we identified orthologues in a majority of metazoan species, in Zygomycota, Physarum, Acanthamoeba and in Oomycetes (Heterokonta). A multiple alignment of U6atac sequences was constructed, and selected sequences from that alignment are shown in Figure 5.

U6atac RNA pairs with U12 as well as with U4atac RNA (40). The novel sequences of U12, U6atac and U4atac that we have identified are consistent with these base-pairing interactions. Thus, the U6atac and U4atac sequences are all consistent with the formation of the stems 1 and 2 in the complex of these RNAs (Figure 4). In addition, the U6atac/U12 helices 1a and 1b are phylogenetically supported.

A sequence ‘AAGGA’ near the 5′ end of U6atac has been proposed to pair with a region at the 5′ splice site (40,41). This sequence is present in a majority of U6atac RNAs, i.e. in vertebrates, urochordates, S. purpuratus, Lottia gigantea, insects and plants, as well as in the fungi R. oryzae and P. blakesleeanus. The RNA of the lycophyte Selaginella moellendorffii (spikemoss) has the sequence ‘ATGGA’. However, we observed a different motif, ‘[GU]CCGA’ (in the following referred to as the CC variant, as opposed to the normal AG sequence), in Trichoplax, cnidarians, Reniera sp., A. castellanii, P. polycephalum, Physcomitrella patens and Oomycetes (Figure 5). We think that the predictions of U6atac RNA in these species are highly reliable because of sequence similarity, covariance model scores and ability to pair with U4atac and U12. Furthermore, it would seem that the CC variant does not represent a ‘paralogue’ of U6atac, as in all species examined either the AG or the CC homolog is present. It is intriguing that, if the AG motif was the ancestral version, a change to the CC variant occurred more than once during evolution. Examples are in the land plant branch and in the development of the lower metazoa Trichoplax and Nematostella. However, the possibility that the CC variant represents the ancestral version of the gene cannot be excluded. Also in such a case, there must have been multiple independent transitions from CC to AG. The presence of the variant CC sequence is difficult to explain if the sequence AAGGA is important in pairing to the 5′ splice site (40,41). U12-type introns of A. castellanii and Phytophthora with a sequence able to pair with AAGGA are presented in Russell et al. (8), but these introns are not able to pair well with U6atac RNAs with the CC motif.
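The distinction between the AG and CC variants can be checked mechanically for any predicted U6atac sequence. The following is a minimal sketch, not the procedure used in this work: the function name `classify_u6atac` and the 40-nt search window are illustrative assumptions, and sequences are accepted in either the RNA or the DNA alphabet.

```python
import re

# Motifs as given in the text, written in the DNA alphabet.
AG_MOTIF = re.compile(r"AAGGA")      # the normal AG sequence
CC_MOTIF = re.compile(r"[GT]CCGA")   # the CC variant, [GU]CCGA

def classify_u6atac(seq: str, window: int = 40) -> str:
    """Classify a U6atac sequence by the motif found near its 5' end.

    `window` (hypothetical default of 40 nt) limits the search to the
    5' region. Returns 'AG', 'CC', 'ambiguous' or 'none'.
    """
    five_prime = seq.upper().replace("U", "T")[:window]
    has_ag = AG_MOTIF.search(five_prime) is not None
    has_cc = CC_MOTIF.search(five_prime) is not None
    if has_ag and has_cc:
        return "ambiguous"
    if has_ag:
        return "AG"
    if has_cc:
        return "CC"
    return "none"
```

A sequence carrying ‘ATGGA’, as in S. moellendorffii, is reported as 'none' by this sketch, because it matches neither of the two motifs exactly.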

CONCLUSIONS

We have described a method to identify spliceosomal RNAs where, in a first step, candidates are identified using sensitive similarity searches or by profile HMM searches. These candidates are then more rigorously examined using covariance models to arrive at a final prediction of spliceosomal RNA. New spliceosomal RNA sequences found are used as queries in similar searches until no further genes are identified.

The results of this procedure clearly illustrate that highly sensitive local alignment searches and profile HMMs are important in the identification of spliceosomal RNAs. These RNAs tend to be conserved in sequence during evolution as compared to many other RNAs, and perhaps the protocol used here is particularly suited for this category of ncRNAs. Ideally, a combination of methods should be used to maximize sensitivity. At the same time, the covariance models are critical in order to evaluate the hits found in the initial searches. We have here relied to a large extent on the specificity of these models to predict ncRNAs and regard all RNAs reported here as strong predictions.

A large number of novel RNAs are identified in this work. Most noteworthy is the identification of RNAs being components of the minor U12 spliceosome in phylogenetic groups that previously were not known to have these RNAs or any U12-type spliceosomal components or introns. One example is Trichinella, a nematode which, in contrast to other nematodes like C. elegans, contains minor spliceosomal RNA genes. We also have shown that minor spliceosomal RNAs are present in the deeply branching fungal branches Zygomycota and Chytridiomycota.

In summary, therefore, these results confirm previous studies of Russell et al. (8) that demonstrate an early origin of the minor spliceosome, as U12-type spliceosomal RNAs are present in a variety of evolutionarily distant phyla. At the same time, our results do not allow us to conclude that a minor spliceosome was present in the ancestor of eukaryotes. Our results also point to multiple instances in evolution where the minor spliceosome seems to have been lost. One example is in the development of nematodes, where the loss of U12-type RNAs occurred after the divergence of Pseudocoelomata. In the case of fungi, such RNAs were present in very early fungal evolution, while they were lost in the development of Microsporidia and Ascomycota. Furthermore, minor spliceosomal RNAs were lost in the development of Dictyostelium of the Mycetozoa branch and in the development of the heterokont and plant branches. From this it would seem that U12-type splicing has a comparatively marginal role and may be disposed of in many phylogenetic groups.

Figure 5. Alignment of U6atac spliceosomal RNA genes. Secondary structure elements are shown in a bracket notation at the bottom of the alignment. CC motifs in the region supposed to pair to the 5′ splice site, as well as regions involved in base-pairing, are highlighted with color.


SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

M.D.L. was supported by a grant from CONACYT, The National Council for Science and Technology, Mexico. Funding to pay the Open Access publication charges for this article was provided by Swedish Research School of Genomics and Bioinformatics.

Conflict of interest statement. None declared.

REFERENCES

1. Nilsen,T.W. (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays, 25, 1147–1149.

2. Kiss,T. (2004) Biogenesis of small nuclear RNPs. J. Cell Sci., 117, 5949–5951.

3. Will,C.L., Schneider,C., MacMillan,A.M., Katopodis,N.F., Neubauer,G., Wilm,M., Luhrmann,R. and Query,C.C. (2001) A novel U2 and U11/U12 snRNP protein that associates with the pre-mRNA branch site. EMBO J., 20, 4536–4546.

4. Madhani,H.D. and Guthrie,C. (1992) A novel base-pairing interaction between U2 and U6 snRNAs suggests a mechanism for the catalytic activation of the spliceosome. Cell, 71, 803–817.

5. Wassarman,K.M. and Steitz,J.A. (1992) The low-abundance U11 and U12 small nuclear ribonucleoproteins (snRNPs) interact to form a two-snRNP complex. Mol. Cell. Biol., 12, 1276–1285.

6. Frilander,M.J. and Steitz,J.A. (1999) Initial recognition of U12-dependent introns requires both U11/5′ splice-site and U12/branchpoint interactions. Genes Dev., 13, 851–863.

7. Burge,C.B., Padgett,R.A. and Sharp,P.A. (1998) Evolutionary fates and origins of U12-type introns. Mol. Cell, 2, 773–785.

8. Russell,A.G., Charette,J.M., Spencer,D.F. and Gray,M.W. (2006) An early evolutionary origin for the minor spliceosome. Nature, 443, 863–866.

9. Mewes,H.W., Albermann,K., Bahr,M., Frishman,D., Gleissner,A., Hani,J., Heumann,K., Kleine,K., Maierl,A., Oliver,S.G. et al. (1997) Overview of the yeast genome. Nature, 387, 7–65.

10. Guthrie,C. and Patterson,B. (1988) Spliceosomal snRNAs. Annu. Rev. Genet., 22, 387–419.

11. Mitrovich,Q.M. and Guthrie,C. (2007) Evolution of small nuclear RNAs in S. cerevisiae, C. albicans and other hemiascomycetous yeasts. RNA, 13, 2066–2080.

12. Palfi,Z., Schimanski,B., Gunzl,A., Lucke,S. and Bindereif,A. (2005) U1 small nuclear RNP from Trypanosoma brucei: a minimal U1 snRNA with unusual protein components. Nucleic Acids Res., 33, 2493–2503.

13. Griffiths-Jones,S. (2007) Annotating noncoding RNA genes. Annu. Rev. Genom. Hum. Genet., 8, 279–298.

14. Eddy,S.R. (2002) A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics, 3, 18.

15. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

16. Pearson,W.R. and Lipman,D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.

17. Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

18. Notredame,C., Higgins,D.G. and Heringa,J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205–217.

19. Zuker,M. (1989) On finding all suboptimal foldings of an RNA molecule. Science, 244, 48–52.

20. Freyhult,E.K., Bollback,J.P. and Gardner,P.P. (2007) Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA. Genome Res., 17, 117–125.

21. Chakrabarti,K., Pearson,M., Grate,L., Sterne-Weiler,T., Deans,J., Donohue,J.P. and Ares,M.,Jr (2007) Structural RNAs of known and unknown function identified in malaria parasites by comparative genomics and RNA analysis. RNA, 13, 1923–1939.

22. Davis,C.A., Brown,M.P. and Singh,U. (2007) Functional characterization of spliceosomal introns and identification of U2, U4 and U5 snRNAs in the deep-branching eukaryote Entamoeba histolytica. Eukaryot. Cell, 6, 940–948.

23. Fast,N.M. and Doolittle,W.F. (1999) Trichomonas vaginalis possesses a gene encoding the essential spliceosomal component PRP8. Mol. Biochem. Parasitol., 99, 275–278.

24. Vanacova,S., Yan,W., Carlton,J.M. and Johnson,P.J. (2005) Spliceosomal introns in the deep-branching eukaryote Trichomonas vaginalis. Proc. Natl Acad. Sci. USA, 102, 4430–4435.

25. Russell,A.G., Shutt,T.E., Watkins,R.F. and Gray,M.W. (2005) An ancient spliceosomal intron in the ribosomal protein L7a gene (Rpl7a) of Giardia lamblia. BMC Evol. Biol., 5, 45.

26. Nixon,J.E., Wang,A., Morrison,H.G., McArthur,A.G., Sogin,M.L., Loftus,B.J. and Samuelson,J. (2002) A spliceosomal intron in Giardia lamblia. Proc. Natl Acad. Sci. USA, 99, 3701–3705.

27. Collins,L. and Penny,D. (2005) Complex spliceosomal organization ancestral to extant eukaryotes. Mol. Biol. Evol., 22, 1053–1066.

28. Misumi,O., Matsuzaki,M., Nozaki,H., Miyagishima,S.Y., Mori,T., Nishida,K., Yagisawa,F., Yoshida,Y., Kuroiwa,H. and Kuroiwa,T. (2005) Cyanidioschyzon merolae genome. A tool for facilitating comparable studies on organelle biogenesis in photosynthetic eukaryotes. Plant Physiol., 137, 567–585.

29. Nozaki,H., Takano,H., Misumi,O., Terasawa,K., Matsuzaki,M., Maruyama,S., Nishida,K., Yagisawa,F., Yoshida,Y., Fujiwara,T. et al. (2007) A 100%-complete sequence reveals unusually simple genomic features in the hot-spring red alga Cyanidioschyzon merolae. BMC Biol., 5, 28.

30. Takahashi,Y., Tani,T. and Ohshima,Y. (1996) Spliceosomal introns in conserved sequences of U1 and U5 small nuclear RNA genes in yeast Rhodotorula hasegawae. J. Biochem., 120, 677–683.

31. Takahashi,Y., Urushiyama,S., Tani,T. and Ohshima,Y. (1993) An mRNA-type intron is present in the Rhodotorula hasegawae U2 small nuclear RNA gene. Mol. Cell. Biol., 13, 5613–5619.

32. Gilson,P.R., Su,V., Slamovits,C.H., Reith,M.E., Keeling,P.J. and McFadden,G.I. (2006) Complete nucleotide sequence of the chlorarachniophyte nucleomorph: nature’s smallest nucleus. Proc. Natl Acad. Sci. USA, 103, 9566–9571.

33. Patel,A.A. and Steitz,J.A. (2003) Splicing double: insights from the second spliceosome. Nat. Rev. Mol. Cell Biol., 4, 960–970.

34. Thomas,J., Lea,K., Zucker-Aprison,E. and Blumenthal,T. (1990) The spliceosomal snRNAs of Caenorhabditis elegans. Nucleic Acids Res., 18, 2633–2642.

35. Blaxter,M.L., De Ley,P., Garey,J.R., Liu,L.X., Scheldeman,P., Vierstraete,A., Vanfleteren,J.R., Mackey,L.Y., Dorris,M., Frisse,L.M. et al. (1998) A molecular evolutionary framework for the phylum Nematoda. Nature, 392, 71–75.

36. Klein,D.J., Schmeing,T.M., Moore,P.B. and Steitz,T.A. (2001) The kink-turn: a new RNA secondary structure motif. EMBO J., 20, 4214–4221.

37. Rozhdestvensky,T.S., Tang,T.H., Tchirkova,I.V., Brosius,J., Bachellerie,J.P. and Huttenhofer,A. (2003) Binding of L7Ae protein to the K-turn of archaeal snoRNAs: a shared RNA binding motif for C/D and H/ACA box snoRNAs in Archaea. Nucleic Acids Res., 31, 869–877.

38. Vidovic,I., Nottrott,S., Hartmuth,K., Luhrmann,R. and Ficner,R. (2000) Crystal structure of the spliceosomal 15.5kD protein bound to a U4 snRNA fragment. Mol. Cell, 6, 1331–1342.

39. Schultz,A., Nottrott,S., Hartmuth,K. and Luhrmann,R. (2006) RNA structural requirements for the association of the spliceosomal hPrp31 protein with the U4 and U4atac small nuclear ribonucleoproteins. J. Biol. Chem., 281, 28278–28286.

40. Tarn,W.Y. and Steitz,J.A. (1996) Highly diverged U4 and U6 small nuclear RNAs required for splicing rare AT-AC introns. Science, 273, 1824–1832.

41. Incorvaia,R. and Padgett,R.A. (1998) Base pairing with U6atac snRNA is required for 5′ splice site activation of U12-dependent introns in vivo. RNA, 4, 709–718.


absent from the yeast Saccharomyces cerevisiae (9) and from the nematode Caenorhabditis elegans (7). In order to understand the evolution of the splicing machinery and of spliceosomal RNAs, we wanted to systematically examine the phylogenetic distribution of these RNAs. In general, ncRNAs are poorly conserved in sequence, but each class of ncRNA is typically characterized by a specific secondary structure. This is also true for spliceosomal RNAs, although many spliceosomal RNAs are conserved also in sequence, like U2 and U6 RNAs (10). Nevertheless, for some spliceosomal RNAs the primary sequence is highly variable. In the case of U1 RNA, also the secondary structure is subject to variation, as observed in yeast (11) and in Trypanosoma (12). Therefore, the computational identification of spliceosomal RNA genes, as with many other noncoding RNA genes, is challenging. A large number of spliceosomal RNAs from different organisms have been identified experimentally as well as computationally (13) and have been deposited in sequence databases. For instance, a large number of spliceosomal RNA sequences are available in the Rfam database (13), aimed at prediction of ncRNAs using covariance models (14). However, there are phylogenetic groups where spliceosomal RNAs have not been identified, and it is not clear whether this is due to poor performance of prediction methods or because such RNAs are lacking in these organisms. In order to improve on this situation, we have developed a simple protocol for computational identification of spliceosomal RNA based on local alignment methods, profile HMMs and covariance models (14). Our method is efficient, as we are able to present a large number of previously unrecognized spliceosomal RNA orthologues.

MATERIALS AND METHODS

Sources of genomic and protein sequences

Genomic sequences were obtained from NCBI (http://www.ncbi.nlm.nih.gov/entrez, ftp.ncbi.nih.gov/genomes), EMBL (http://www.ebi.ac.uk), ENSEMBL (http://www.ensembl.org), TraceDB (ftp.ncbi.nlm.nih.gov/pub/TraceDB), TIGR (ftp://ftp.tigr.org/pub/data), the US Department of Energy Joint Genome Institute (http://www.jgi.doe.gov), the WU Genome Sequencing Center (http://genome.wustl.edu), the Sanger Institute (http://www.sanger.ac.uk), the HGSC at Baylor College (http://www.hgsc.bcm.tmc.edu/projects), as well as specific Genome Project Databases: CryptoDB (http://www.cryptodb.org/cryptodb), PlasmoDB (http://www.plasmodb.org), GiardiaDB (http://www.jbpc.mbl.edu/Giardia-HTML/index2.html), ToxoDB (http://www.toxodb.org/toxo/home.jsp), DictyBase (http://dictybase.org), the Cyanidioschyzon merolae Genome Project (http://merolae.biol.s.u-tokyo.ac.jp) and the Galdieria sulphuraria Genome Project (http://genomics.msu.edu/galdieria). Access to the provisional 4 assembly of the Mucor circinelloides genome was granted by the DoE Joint Genome Institute and the Mucor genome project (http://mucorgen.um.es). More details on database versions are in Supplementary Data 4. Protein sequences were retrieved from Uniprot (http://beta.uniprot.org).

Identification of spliceosomal RNA orthologues

Sequences of RNAs annotated as spliceosomal RNAs (U1, U2, U4, U5, U6, U11, U12, U4atac and U6atac) were assembled (Supplementary Data 1) from Rfam (13). These sequences were used as initial queries with BLASTN (15) and FASTA (16) against genomic sequences of the organisms listed in Supplementary Data 4. The E-value threshold was set to 10, while the word size was 7 and 6 for BLAST and FASTA, respectively. Hits, including 200 nt upstream and downstream sequences, were retrieved and analyzed with cmsearch of the Infernal package (14) using the relevant covariance model from Rfam. For yeast U1 RNA, the Rfam model specific to this group was used.
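Retrieving a hit together with its flanking sequence can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this study: the function name `flanked_region` is hypothetical, coordinates are taken to be 0-based and half-open, and the extended region is simply clipped at the ends of the contig.

```python
def flanked_region(genome: str, start: int, end: int, flank: int = 200) -> str:
    """Return the hit genome[start:end] extended by `flank` nt on each side.

    Coordinates are 0-based, half-open (an assumption of this sketch).
    The extension is clipped where the hit lies close to a contig end.
    """
    if not 0 <= start < end <= len(genome):
        raise ValueError("hit coordinates lie outside the sequence")
    left = max(0, start - flank)
    right = min(len(genome), end + flank)
    return genome[left:right]
```

The extended region, not the bare hit, would then be passed to the covariance model search, so that an alignment truncated by the local search can still be scored over the full length of the RNA.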

A threshold was then set for each one of the spliceosomal RNAs, based on the initial query RNA giving rise to the lowest score from cmsearch. These threshold values were, for U1, U2, U4, U5, U6, U11, U12, U4atac and U6atac, 55.74, 52.75, 40.63, 60.60, 54.24, 36.80, 51.74, 40.03 and 39.23, respectively. All sequences above these threshold values were considered as reliable predictions. For species where sequences with scores above threshold were found, all sequences below the threshold were discarded. For the remaining species, sequences with a score below the threshold but greater than 15 were considered for further analysis. Considering relatively low scores was in this case motivated by the fact that Rfam covariance models tend to be phylogenetically biased, as sequences from mammals and other well studied species are overrepresented. Relatively few sequences (between 1 and 12, depending on the RNA family) belonged to this category of low-scoring sequences. They were evaluated using a procedure where the presence of specific conserved primary sequence motifs as well as secondary structure was examined. We required exact matches to the primary sequence motifs listed in Supplementary Data 4. The cmsearch output was used to produce structure plots based on the covariance model used. These plots were manually browsed to verify the presence of secondary structure elements according to the consensus secondary structure of the specific spliceosomal RNA. If relevant primary and secondary structure features were present, the sequences were considered as reliable predictions.
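The score-based triage described above can be summarized in a short sketch. It is an illustration only: the function name `triage_hits` and the tuple layout of the input are assumptions, the threshold passed in is whatever value applies to the RNA family at hand, and a score equal to the threshold is treated here as passing, which the text does not specify. The manual inspection of motifs and structure plots is not modelled.

```python
from collections import defaultdict

LOW_SCORE_FLOOR = 15.0  # lower bound for low-scoring candidates, from the text

def triage_hits(hits, threshold, floor=LOW_SCORE_FLOOR):
    """Split cmsearch hits for one RNA family into two groups.

    hits: iterable of (species, seq_id, score) tuples (hypothetical layout).
    Returns (reliable, review):
      reliable - hits scoring at or above the family threshold
      review   - hits between floor and threshold, kept only for species
                 that have no hit at or above the threshold
    """
    by_species = defaultdict(list)
    for species, seq_id, score in hits:
        by_species[species].append((seq_id, score))

    reliable, review = [], []
    for species, entries in by_species.items():
        above = [seq_id for seq_id, score in entries if score >= threshold]
        if above:
            # Sub-threshold hits are discarded for this species.
            reliable.extend((species, seq_id) for seq_id in above)
        else:
            review.extend(
                (species, seq_id)
                for seq_id, score in entries
                if floor < score < threshold
            )
    return reliable, review
```

Sequences in the second group would still need the manual check of primary sequence motifs and secondary structure before being accepted.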

The resulting predicted sequences were then used as queries in a second round of searches to retrieve homologues in species where that particular RNA orthologue was not identified. The resulting hits were analyzed as described earlier, and any reliable predictions obtained were used in yet another round of searches. This procedure was repeated until no more significant hits were retrieved.
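The repeated rounds of searching amount to iterating until no new reliable sequences appear. A minimal sketch of that loop is given below; `search` and `is_reliable` are hypothetical callables standing in for the similarity search and the covariance model evaluation, respectively, and sequences are represented by hashable identifiers.

```python
def iterative_search(initial_queries, search, is_reliable):
    """Repeat searches until a round yields no new reliable predictions.

    initial_queries: iterable of known sequences (hashable identifiers)
    search:          callable, query -> iterable of candidate hits
    is_reliable:     callable, hit -> bool
    Returns the set of newly predicted sequences (initial queries excluded).
    """
    initial = set(initial_queries)
    known = set(initial)
    frontier = set(initial)
    while frontier:
        candidates = set()
        for query in frontier:
            candidates.update(search(query))
        # Only reliable hits not seen before seed the next round.
        new = {hit for hit in candidates if hit not in known and is_reliable(hit)}
        known |= new
        frontier = new
    return known - initial
```

Because only previously unseen reliable hits are used as queries in the next round, the loop terminates once a round adds nothing new.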

In species where we were not able to find a reliable spliceosomal RNA, WU-BLAST blastn was used with word size 2. Sequences identified in such searches were analyzed with the respective covariance model, using the same criteria as described earlier, in order to identify reliable predictions, which were used in a second round of searches. In addition, we performed hmmsearch searches using HMM models based on Rfam alignments against the set of reliable spliceosomal RNA sequences and against genomes where we did not previously find a specific RNA. Sequences with E-values lower than 10 were retrieved and examined with cmsearch and processed as described earlier. All 17 136 sequences considered as reliable candidates identified in this study, together with the 356 sequences used as initial queries, are in Supplementary Data 1.

Multiple alignments of sequences were created using ClustalW 1.83 (17) or T-Coffee (18). Alignments obtained with cmalign of Infernal are shown in Supplementary Data 3. Secondary structure was predicted with MFOLD (19) as well as with Infernal.

RESULTS AND DISCUSSION

Spliceosomal RNAs may be efficiently identified using a combination of high-sensitivity local alignment methods, profile HMMs and covariance models

Genomic sequences from 149 eukaryotic organisms (Figure 1) were analyzed with respect to spliceosomal RNAs. The Infernal software (14), which identifies ncRNAs using covariance models, is effective at identifying members of a specific RNA family but is computationally demanding and not practical for the analysis of large genomes. Therefore, a first step to filter sequences is necessary. In our method, we used NCBI BLAST (wordsize 7) and FASTA (wordsize 6). RNA sequences represented in the Rfam database and annotated as spliceosomal RNAs (U1, U2, U4, U5, U6, U11, U12, U4atac and U6atac) were first assembled (a total of 356 sequences, see Supplementary Data 1) and used as queries in BLAST and FASTA searches against genomic sequences. To maximize sensitivity, we considered all hits, irrespective of E-value, identified in these searches for analysis with covariance models of spliceosomal RNAs collected from Rfam.

The predictions that represented novel spliceosomal RNA sequences and that were considered reliable (see Materials and Methods section for details) were used in a second round of searches against organisms where we were missing the respective spliceosomal RNA. Novel hits were analyzed as described with covariance models, and this procedure was repeated until no further reliable predictions could be obtained. For genomes where we were not able to find a spliceosomal RNA orthologue using BLAST or FASTA, we also made use of WU-BLAST (wordsize 2) [Gish, W. (1996–2004) http://blast.wustl.edu] and HMMER searches (http://hmmer.janelia.org) using profile HMM models based on Rfam alignments. For comparison, we also used Infernal to analyze the following genomes with selected covariance models, without any initial filtering of sequences: Trichinella spiralis (U11, U12 and U4atac), A. castellanii (U11, U4atac), G. sulphuraria (U1, U2, U4, U5 and U6), Giardia lamblia (U1, U2, U4, U5 and U6), Physarum polycephalum (U11, U12 and U4atac), Naegleria gruberi (U1), Trichomonas vaginalis (U1), Batrachochytrium dendrobatidis (U1, U11 and U12), Antonospora locustae (U1), Encephalitozoon cuniculi (U1), Phycomyces blakesleeanus (U12) and Cyanidioschyzon merolae (U1, U2, U4, U5 and U6).

We thus obtained 17 136 sequences predicted as spliceosomal RNAs. All these sequences are distributed among 147 species, as shown in Figure 1 and Supplementary Data 1 and 2. It should be noted that many animals and plants have numerous copies of each RNA gene, and a fraction of these are fragmented genes or pseudogenes. As it is very difficult to distinguish a true gene from a pseudogene using computational methods, a fraction of our candidates in animals and plants are presumably pseudogenes. In some phylogenetic groups, such as fungi, heterokonts and Apicomplexa, each of the spliceosomal RNAs is represented by one or a few genes, and in this case the predicted sequences are more likely to be bona fide spliceosomal RNA genes.

The results using the different methods (NCBI BLAST, FASTA, WU-BLAST and HMMER) are compared in Figure 2. As expected, the sensitivity of FASTA, WU-BLAST and HMMER was much greater than that of NCBI BLAST (W7), and HMMER is the most sensitive method of the four. As there are speed disadvantages to WU-BLAST and HMMER, we did not systematically examine every possible genome with these methods. Instead, we searched with these methods only genomes where we did not find a specific spliceosomal RNA with FASTA or BLAST. We also analyzed with WU-BLAST and HMMER all the RNA sequences found using FASTA and BLAST. In the results shown in Figure 2, therefore, the efficiency of WU-BLAST and HMMER is probably underestimated. In general, the results obtained in our searches are consistent with previous results (20), where different software, including programs used here, were tested against a set of previously known ncRNAs.

In summary, our results demonstrate that sensitive sequence alignment methods, including profile HMMs, are important as a first filtering step to identify ncRNA candidates. In addition, a combination of the different methods maximizes sensitivity. There are two RNAs, from G. sulphuraria and A. locustae (Figure 1), that could only be identified using an Infernal search against the complete genome. This finding illustrates that we could be lacking more orthologues not identified by HMMER, WU-BLAST or FASTA. However, it is our impression that very few RNA genes are missed by the initial screen.

During the production of this article, spliceosomal RNA sequences from Plasmodium (21), Entamoeba histolytica (22), as well as Candida albicans and other hemiascomycetous yeasts (11) were identified and characterized. All these sequences are identical to the sequences identified in the present study, providing support to the reliability of our prediction method.

RNAs of the major spliceosome

The spliceosomal RNAs as identified here are summarized in Figure 1 (actual sequences are in Supplementary Data 1). In 107 species, we are able to report one or more spliceosomal RNAs where that particular RNA had not been reported before, as highlighted in Figure 1 with blue boxes. As a guide to the phylogenetic relationships between the species investigated here, a schematic phylogenetic tree is shown in Figure 3.

Figure 1. Phylogenetic distribution of the major and minor spliceosomal RNAs. Results of computational prediction of spliceosomal RNAs. Green boxes show instances where a sequence was previously known and it was used as query in searches. Blue boxes show RNAs predicted in this work. A star indicates that an RNA was previously described in the literature (Supplementary Data 5), and ‘R’ shows that the sequence was in Rfam (13). Also indicated are sequences known to have introns (‘i’) and U6atac RNAs of the CC variant type (‘c’, see Results and Discussion section).

We have analyzed RNAs of the U2-type spliceosome as well as those of the minor U12-type spliceosome (Figures 1 and 3). As to the U2-type, we have identified such RNAs in virtually every species examined. The only exceptions are the red alga C. merolae and the deeply branching protist G. lamblia, where we could not identify any spliceosomal RNAs. Of spliceosomal RNAs in T. vaginalis, only the U2 RNA was identified. T. vaginalis possesses a gene encoding the essential spliceosomal component PRP8 (23) as well as many putative introns (24). Three introns are known in G. lamblia to date (25,26), as well as 27 spliceosomal proteins (27). In C. merolae, introns as well as conserved U2 and U5 snRNP-protein specific subunits are known to be present (28). Therefore, splicing is likely to occur in these organisms, and it is puzzling that we fail to identify spliceosomal RNAs, particularly in C. merolae, where the genome sequence is complete (29). This could mean that spliceosomal RNAs are lacking and have been replaced by protein functions. But these organisms could also have spliceosomal RNAs very different from most other species, or these genes could be present in a part of the genome for some reason not yet covered by the genome sequencing. A U1 RNA was the only spliceosomal RNA that we identified in G. sulphuraria, another red alga. Additional spliceosomal RNAs might be found once its genome is fully sequenced.

RNA components of the major spliceosome are known to be present in fungi, and recently the evolution of such RNAs in the hemiascomycetous yeasts was examined (11). Major spliceosomal RNAs have previously been reported in the Basidiomycota Rhodotorula (30,31) and Cryptococcus neoformans (Rfam). Here, we show that such RNAs are ubiquitous in the Basidiomycota lineage (Figure 1). More deeply branching in the fungal tree are Zygomycota and Chytridiomycota (Figures 1 and 3). We show for the first time that spliceosomal RNAs are present in the Zygomycota P. blakesleeanus, R. oryzae and M. circinelloides, as well as in the Chytridiomycota B. dendrobatidis, Spizellomyces punctatus and Allomyces macrogynus.

The microsporidia are believed to be positioned close to the root of the fungal branch. They have been reduced severely in genome size as compared to other fungi. In A. locustae and E. cuniculi, U2 and U6 orthologues are reported in Rfam. Here, we have also identified the U4 and U5 RNA orthologues in both of these Microsporidia (Figure 1) and U1 RNA in A. locustae (Supplementary Data 3). We may therefore conclude that the major spliceosomal RNAs are ubiquitous in all fungal groups, including Microsporidia.

The nucleomorphs of Guillardia theta (a cryptomonad) and Bigelowiella natans (a chlorarachniophyte) represent the smallest eukaryotic genomes known. It is interesting to note that also in these two genomes spliceosomal RNA genes are identified (Figure 1). With respect to G. theta, the results of our predictions (only a U6 RNA) are completely consistent with available annotation (Douglas et al.; John Archibald and Paul Gilson, personal communication), whereas there are differences with respect to published annotation for the B. natans genome (32).

Minor-spliceosome-specific RNAs are identified in the worm Trichinella spiralis, in Physarum polycephalum and in the fungal lineages Zygomycota and Chytridiomycota

U12-type introns were previously identified in plants, in most of the metazoan taxa, including vertebrates, insects and cnidarians (7), and more recently in R. oryzae of Zygomycota, in Acanthamoeba and in the heterokont Phytophthora (8). Minor spliceosomal RNAs have been found in metazoa, Acanthamoeba, plants and Phytophthora (for references, see Supplementary Data 5). A small number of organisms that have been well studied seem to lack U12-type splicing, such as S. cerevisiae, S. pombe and C. elegans (7,33).

In this investigation, we discovered many novel minor spliceosomal RNA orthologues (Figure 1). More importantly, phylogenetic groups are represented where such RNAs were not previously reported. These are nematodes (T. spiralis), mycetozoa (P. polycephalum) and the fungal lineages Basidiomycota, Zygomycota and Chytridiomycota, as discussed in more detail subsequently.

Trichinella spiralis. The major spliceosomal RNAs of the nematode C. elegans have been characterized (34). Previous analyses have failed to identify minor spliceosomal components, including U12-type introns, in this organism (7). In this investigation, we analyzed different species of the Rhabditida branch and Brugia malayi of the Chromadorea branch. In none of these species were U12-type RNAs identified. However, we identified U11, U12 and U6atac RNAs in another nematode, T. spiralis (Figure 1). Predicted secondary structures of these RNAs are shown in Figure 4. We also found evidence of U11/U12-specific proteins in T. spiralis (Supplementary Data 4), providing further support of a minor spliceosome in this organism.

Figure 2. Comparison of local alignment and profile HMM methods to identify spliceosomal RNAs. Venn diagram showing the number of spliceosomal RNA genes found by NCBI BLAST (W7), WU-BLAST (W2), FASTA (W6) and HMMER. The number of species where the RNAs are distributed is shown within parentheses.

Basidiomycota, Zygomycota and Chytridiomycota. No minor spliceosomal components have been described in fungi, except for minor spliceosomal proteins and potential U12-type introns in R. oryzae (8), a species in the fungal Zygomycota lineage. However, we here identified minor spliceosomal RNA components in the Zygomycota P. blakesleeanus, R. oryzae and M. circinelloides, and in the Chytridiomycota B. dendrobatidis, S. punctatus and A. macrogynus. Secondary structure predictions of R. oryzae RNAs are shown in Figure 4, and further structures are shown in Supplementary Data 3. These results provide strong evidence of a U12-type spliceosome in these phylogenetic groups.

In the Basidiomycota phylum, there was previously no evidence of a U12-type spliceosome. However, we here identified a U12 RNA in Phakopsora meibomiae and a U4atac RNA in Phakopsora pachyrhizi. At the same time, there is so far no evidence of U12-type introns or of U12-specific proteins. Therefore, it is possible that the spliceosomal RNAs that we observe are pseudogenes and remnants from a U12 machinery that was present in an ancestral lineage.

Figure 3. Schematic phylogenetic tree. Phylogenetic groups and their relationships are shown together with example species (genus in italics). Species where one or more U12-type spliceosomal RNAs were found are highlighted (red circles), as well as branches where the U12-type RNAs seem to have been lost (dotted lines). In the case of Basidiomycota, only two different U12-type RNAs have been identified, and for this reason there is only weak evidence of a minor spliceosome. Numbers at branches indicate (1) number of genomes analyzed, (2) number of query sequences used and (3) number of new sequences identified.

Figure 4. Structures of selected spliceosomal RNAs. Highlighted regions are the U12 site pairing to the branch site (underlined), regions of U11 and U6atac proposed to pair to the 5′ splice site (underlined), U6atac-U12 interaction (shaded background), Sm-site (box with rounded corners) and K-turn (box) in U4atac RNA. Organisms represented are T. spiralis, P. polycephalum and R. oryzae. Structures of additional RNAs are in Supplementary Data 3.

Acanthamoeba castellanii and Physarum polycephalum. In A. castellanii, a U12 spliceosomal RNA as well as minor introns have been reported recently (8). These observations provided evidence of a minor spliceosome in this organism. Consistent with these results, we identified two additional U12-type RNAs, U11 and U6atac, in this species (Figure 1).

Acanthamoeba is believed to share a common ancestor with the Mycetozoa, where spliceosomal components or U12-type introns were not previously reported. However, we here identified U11, U12 and U6atac spliceosomal RNA genes in P. polycephalum (Figure 1 and Supplementary Data 1).

The minor spliceosome was lost at multiple instances during evolution

The fact that we identified U12-type RNAs in a range of species representing very diverse phyla, such as Fungi, Acanthamoeba/Mycetozoa, Streptophyta and Heterokonta, supports the notion that the minor spliceosome was an early invention in eukaryotic evolution (8). In fact, we cannot exclude that such a spliceosome was present in the last common ancestor of the eukaryotes. The detailed map of the phylogenetic distribution of U12-type RNAs also allows us to identify a number of occasions during evolution where it seems that the minor spliceosome was lost (Figure 3, dashed lines).

U12-type RNAs were found in T. spiralis but not in the other nematodes examined here. T. spiralis belongs to a clade that is probably deeply branching within nematodes and is distant from the Rhabditida and Chromadorea groups of nematodes (35). A mode of evolution therefore seems likely where the minor spliceosome was present at an early stage in nematode evolution but was lost in many branches (Figure 3).

There are other examples where the U12-type RNAs are missing in the fungi/metazoan lineage. Minor spliceosomal RNAs are present in a majority of metazoa, including Trichoplax, the simplest known species of the metazoan branch. An exception is Acropora millepora, a coral of the phylum Cnidaria. In addition, we failed to identify such RNAs in Monosiga brevicollis, a choanoflagellate and close relative of Metazoa. A minor spliceosome is probably present in the fungal phyla Zygomycota and Chytridiomycota, as discussed earlier. In Ascomycota and Microsporidia, on the other hand, these components seem to be lacking. It is likely that a minor spliceosome was present at an early stage in the evolution of fungi but was lost in the development of Ascomycota and Microsporidia. It is not clear why the minor spliceosome was lost in the Ascomycota, but in the case of Microsporidia it could be a consequence of the strong pressure to reduce genome size.

We identified minor spliceosome-type RNAs in P. polycephalum and A. castellanii, but not in the evolutionarily related Entamoeba or Dictyostelium (Figure 3). This would suggest that the minor spliceosome was lost in the development of Entamoeba as well as in the Dictyostelium branch.

The analysis of Streptophyta (plant) genomes revealed the presence of minor spliceosomal RNAs, whereas only U2-dependent spliceosomal RNAs were found in green and red algae. Finally, in Heterokonta we found U12-type RNAs in the Oomycetes Phytophthora and Hyaloperonospora, but not in any diatoms or brown algae.

In summary, our results point to a large number of instances where the minor spliceosome was lost during evolution of fungi/metazoa, Mycetozoa, Streptophyta and heterokonts. We are not able to reach a conclusion as to other phyla such as Euglenozoa and Alveolata, because we do not know whether the common ancestor of these lineages had a minor spliceosomal machinery.

All U4 and U4atac RNAs have a K-turn motif

K-turn motifs have previously been identified in a large number of RNA families (36,37), including the U4 and U4atac RNAs (38,39). We found such a motif in all novel U4 and U4atac RNAs reported here. Examples are R. oryzae (Figure 4) and Phakopsora U4atac RNA (Supplementary Data 3), with characteristic noncanonical G-A and A-G pairs and a 3-nt loop. This would suggest that this motif is compulsory in U4 RNAs and that prediction accuracy may be improved by updating the covariance model in this respect.

Identification of a large number of novel U6atac orthologues

The U6atac RNA was previously identified in vertebrates, insects and plants (for references see Supplementary Data 5). Here we identified orthologues in a majority of metazoan species, in Zygomycota, Physarum, Acanthamoeba and in Oomycetes (Heterokonta). A multiple alignment of U6atac sequences was constructed and selected sequences from that alignment are shown in Figure 5.

U6atac RNA pairs with U12 as well as with U4atac RNA (40). The novel sequences of U12, U6atac and U4atac that we have identified are consistent with these base-pairing interactions. Thus, the U6atac and U4atac sequences are all consistent with the formation of stems 1 and 2 in the complex of these RNAs (Figure 4). In addition, the U6atac/U12 helices 1a and 1b are phylogenetically supported.
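To make the notion of sequences being "consistent with base-pairing" concrete, the following minimal sketch scores how well two RNA segments can form an antiparallel helix. It is an illustration only: the function name `paired_fraction`, the decision to accept G-U wobble pairs alongside Watson-Crick pairs, and the toy segments are assumptions for this example and are not taken from the analysis reported here.

```python
# Watson-Crick pairs plus G-U wobble (assumption: wobble pairs count as paired).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_fraction(strand_a: str, strand_b: str) -> float:
    """Fraction of positions that pair when two equal-length segments,
    both written 5'->3', are aligned antiparallel."""
    a = strand_a.upper().replace("T", "U")
    # Reverse the second strand so that position i of a faces its partner.
    b = strand_b.upper().replace("T", "U")[::-1]
    if not a or len(a) != len(b):
        raise ValueError("segments must be non-empty and of equal length")
    return sum((x, y) in PAIRS for x, y in zip(a, b)) / len(a)


# Hypothetical toy segments, not sequences from this study.
perfect = paired_fraction("GGAUC", "GAUCC")   # fully complementary
mismatch = paired_fraction("GGAUC", "GAUAC")  # one mismatched position
```

A score of 1.0 means every position can pair. Real helices also contain bulges and internal loops, which this ungapped comparison does not model.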

A sequence 'AAGGA' near the 5' end of U6atac has been proposed to pair with a region at the 5' splice site (40,41). This sequence is present in a majority of U6atac RNAs, i.e. in vertebrates, urochordates, S. purpuratus, Lottia gigantea, insects and plants, as well as in the fungi R. oryzae and P. blakesleeanus. The RNA of the lycophyte Selaginella moellendorffii (spikemoss) has the sequence 'ATGGA'. However, we observed a different motif, '[GU]CCGA' (in the following referred to as the CC variant, as opposed to the normal AG sequence), in Trichoplax, cnidarians, Reniera sp., A. castellanii, P. polycephalum, Physcomitrella patens and Oomycetes (Figure 5). We think that the predictions of U6atac RNA in these species are highly reliable because of sequence similarity, covariance model scores and ability to pair with U4atac and U12. Furthermore, it would seem that the CC variant does not represent a 'paralogue' of U6atac, as in all species examined either the AG or the CC homolog is present. It is intriguing that, if the AG motif was the ancestral version, a change to the CC variant occurred more than once during evolution. Examples are in the land plant branch and in the development of the lower metazoa Trichoplax and Nematostella. However, the possibility that the CC variant represents the ancestral version of the gene cannot be excluded. Also in such a case there must have been multiple independent transitions from CC to AG. The presence of the variant CC sequence is difficult to explain if the sequence AAGGA is important in pairing to the 5' splice site (40,41). U12-type introns of A. castellanii and Phytophthora with a sequence able to pair with AAGGA are presented in Russell et al. (8), but these introns are not able to pair well with U6atac RNAs with the CC motif.
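The distinction between the AG and CC variants can be expressed as a simple pattern match. The sketch below is a hedged illustration: the function name `classify_u6atac_motif`, the 40-nt search window at the 5' end and the toy input sequences are assumptions made for this example, and only the two motif patterns ('AAGGA' and '[GU]CCGA') come from the text above. The Selaginella 'ATGGA' sequence is deliberately left unclassified.

```python
import re
from typing import Optional

# Motif patterns as given in the text, written in RNA notation.
AG_MOTIF = re.compile(r"AAGGA")
CC_MOTIF = re.compile(r"[GU]CCGA")


def classify_u6atac_motif(seq: str, window: int = 40) -> Optional[str]:
    """Return 'AG' or 'CC' for the first motif found near the 5' end,
    or None if neither occurs. The window size is an assumption."""
    # Normalise DNA-style input (T) to RNA (U) and restrict to the 5' region.
    head = seq.upper().replace("T", "U")[:window]
    hits = []
    for label, pattern in (("AG", AG_MOTIF), ("CC", CC_MOTIF)):
        match = pattern.search(head)
        if match:
            hits.append((match.start(), label))
    # If both motifs occur, report the one closest to the 5' end.
    return min(hits)[1] if hits else None


# Invented toy sequences for illustration, not real U6atac genes.
ag_like = classify_u6atac_motif("GUGUUGUAUGAAAGGAGAGAAGG")
cc_like = classify_u6atac_motif("gtgttgtatgagccgagagaagg")
neither = classify_u6atac_motif("ACGUACGU")
```

A match of this kind only flags the motif. As noted above, confidence in a prediction rests on sequence similarity, covariance model scores and the ability to pair with U4atac and U12.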

CONCLUSIONS

We have described a method to identify spliceosomal RNAs where, in a first step, candidates are identified using sensitive similarity searches or profile HMM searches. These candidates are then more rigorously examined using covariance models to arrive at a final prediction of spliceosomal RNAs. New spliceosomal RNA sequences found are used as queries in similar searches until no further genes are identified.
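The iterative character of this procedure can be sketched as a loop that stops when a round yields no new accepted sequences. This is a schematic outline under stated assumptions, not the authors' implementation: `find_candidates` stands in for the similarity or profile HMM searches, `passes_model` stands in for the covariance model evaluation, and both are hypothetical callables supplied by the user.

```python
from typing import Callable, Iterable, Set


def iterative_search(
    initial_queries: Iterable[str],
    find_candidates: Callable[[Set[str]], Set[str]],
    passes_model: Callable[[str], bool],
) -> Set[str]:
    """Repeat search rounds, feeding accepted hits back in as queries,
    until a round produces no new accepted sequence."""
    queries: Set[str] = set(initial_queries)
    known: Set[str] = set(queries)   # everything already used as a query
    accepted: Set[str] = set()       # novel sequences that passed the model
    while queries:
        candidates = find_candidates(queries)
        new = {c for c in candidates if c not in known and passes_model(c)}
        known |= new
        accepted |= new
        queries = new                # only novel sequences seed the next round
    return accepted


# Toy stand-ins: a lookup table plays the role of the search tools.
toy_hits = {"q1": {"a", "x"}, "a": {"b"}, "b": {"a"}, "x": {"y"}}
found = iterative_search(
    ["q1"],
    find_candidates=lambda qs: set().union(*(toy_hits.get(q, set()) for q in qs)),
    passes_model=lambda s: s != "x",  # pretend "x" fails the covariance model
)
```

In the toy run, "y" is never reached because its only route is through the rejected candidate "x", which mirrors how the covariance model step limits what is fed back into later rounds.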

The results of this procedure clearly illustrate that highly sensitive local alignment searches and profile HMMs are important in the identification of spliceosomal RNAs. These RNAs tend to be conserved in sequence during evolution as compared to many other RNAs, and perhaps the protocol used here is particularly suited for this category of ncRNAs. Ideally, a combination of methods should be used to maximize sensitivity. At the same time, the covariance models are critical in order to evaluate the hits found in the initial searches. We have here relied to a large extent on the specificity of these models to predict ncRNAs and regard all RNAs reported here as strong predictions.

A large number of novel RNAs are identified in this work. Most noteworthy is the identification of RNAs that are components of the minor U12 spliceosome in phylogenetic groups that previously were not known to have these RNAs or any U12-type spliceosomal components or introns. One example is Trichinella, a nematode which, in contrast to other nematodes like C. elegans, contains minor spliceosomal RNA genes. We also have shown that minor spliceosomal RNAs are present in the deeply branching fungal lineages Zygomycota and Chytridiomycota.

In summary, therefore, these results confirm previous studies of Russell et al. (8) that demonstrate an early origin of the minor spliceosome, as U12-type spliceosomal RNAs are present in a variety of evolutionarily distant phyla. At the same time, our results do not allow us to conclude that a minor spliceosome was present in the ancestor of eukaryotes. Our results also point to multiple instances in evolution where the minor spliceosome seems to have been lost. One example is in the development of nematodes, where the loss of U12-type RNAs occurred after the divergence of Pseudocoelomata. In the case of fungi, such RNAs were present in very early fungal evolution, while they were lost in the development of Microsporidia and Ascomycota. Furthermore, minor spliceosomal RNAs were lost in the development of Dictyostelium of the Mycetozoa branch and in the development of the heterokont and plant branches. From this it would seem that U12-type splicing has a comparatively marginal role and may be disposed of in many phylogenetic groups.

Figure 5. Alignment of U6atac spliceosomal RNA genes. Secondary structure elements are shown in bracket notation at the bottom of the alignment. CC motifs in the region supposed to pair to the 5' splice site, as well as regions involved in base-pairing, are highlighted with color.


SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

M.D.L. was supported by a grant from CONACYT, The National Council for Science and Technology, Mexico. Funding to pay the Open Access publication charges for this article was provided by Swedish Research School of Genomics and Bioinformatics.

Conflict of interest statement. None declared.

REFERENCES

1. Nilsen, T.W. (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays, 25, 1147–1149.

2. Kiss, T. (2004) Biogenesis of small nuclear RNPs. J. Cell Sci., 117, 5949–5951.

3. Will, C.L., Schneider, C., MacMillan, A.M., Katopodis, N.F., Neubauer, G., Wilm, M., Luhrmann, R. and Query, C.C. (2001) A novel U2 and U11/U12 snRNP protein that associates with the pre-mRNA branch site. EMBO J., 20, 4536–4546.

4. Madhani, H.D. and Guthrie, C. (1992) A novel base-pairing interaction between U2 and U6 snRNAs suggests a mechanism for the catalytic activation of the spliceosome. Cell, 71, 803–817.

5. Wassarman, K.M. and Steitz, J.A. (1992) The low-abundance U11 and U12 small nuclear ribonucleoproteins (snRNPs) interact to form a two-snRNP complex. Mol. Cell. Biol., 12, 1276–1285.

6. Frilander, M.J. and Steitz, J.A. (1999) Initial recognition of U12-dependent introns requires both U11/5' splice-site and U12/branchpoint interactions. Genes Dev., 13, 851–863.

7. Burge, C.B., Padgett, R.A. and Sharp, P.A. (1998) Evolutionary fates and origins of U12-type introns. Mol. Cell, 2, 773–785.

8. Russell, A.G., Charette, J.M., Spencer, D.F. and Gray, M.W. (2006) An early evolutionary origin for the minor spliceosome. Nature, 443, 863–866.

9. Mewes, H.W., Albermann, K., Bahr, M., Frishman, D., Gleissner, A., Hani, J., Heumann, K., Kleine, K., Maierl, A., Oliver, S.G. et al. (1997) Overview of the yeast genome. Nature, 387, 7–65.

10. Guthrie, C. and Patterson, B. (1988) Spliceosomal snRNAs. Annu. Rev. Genet., 22, 387–419.

11. Mitrovich, Q.M. and Guthrie, C. (2007) Evolution of small nuclear RNAs in S. cerevisiae, C. albicans and other hemiascomycetous yeasts. RNA, 13, 2066–2080.

12. Palfi, Z., Schimanski, B., Gunzl, A., Lucke, S. and Bindereif, A. (2005) U1 small nuclear RNP from Trypanosoma brucei: a minimal U1 snRNA with unusual protein components. Nucleic Acids Res., 33, 2493–2503.

13. Griffiths-Jones, S. (2007) Annotating noncoding RNA genes. Annu. Rev. Genom. Hum. Genet., 8, 279–298.

14. Eddy, S.R. (2002) A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics, 3, 18.

15. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

16. Pearson, W.R. and Lipman, D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.

17. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

18. Notredame, C., Higgins, D.G. and Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205–217.

19. Zuker, M. (1989) On finding all suboptimal foldings of an RNA molecule. Science, 244, 48–52.

20. Freyhult, E.K., Bollback, J.P. and Gardner, P.P. (2007) Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA. Genome Res., 17, 117–125.

21. Chakrabarti, K., Pearson, M., Grate, L., Sterne-Weiler, T., Deans, J., Donohue, J.P. and Ares, M. Jr (2007) Structural RNAs of known and unknown function identified in malaria parasites by comparative genomics and RNA analysis. RNA, 13, 1923–1939.

22. Davis, C.A., Brown, M.P. and Singh, U. (2007) Functional characterization of spliceosomal introns and identification of U2, U4 and U5 snRNAs in the deep-branching eukaryote Entamoeba histolytica. Eukaryot. Cell, 6, 940–948.

23. Fast, N.M. and Doolittle, W.F. (1999) Trichomonas vaginalis possesses a gene encoding the essential spliceosomal component PRP8. Mol. Biochem. Parasitol., 99, 275–278.

24. Vanacova, S., Yan, W., Carlton, J.M. and Johnson, P.J. (2005) Spliceosomal introns in the deep-branching eukaryote Trichomonas vaginalis. Proc. Natl Acad. Sci. USA, 102, 4430–4435.

25. Russell, A.G., Shutt, T.E., Watkins, R.F. and Gray, M.W. (2005) An ancient spliceosomal intron in the ribosomal protein L7a gene (Rpl7a) of Giardia lamblia. BMC Evol. Biol., 5, 45.

26. Nixon, J.E., Wang, A., Morrison, H.G., McArthur, A.G., Sogin, M.L., Loftus, B.J. and Samuelson, J. (2002) A spliceosomal intron in Giardia lamblia. Proc. Natl Acad. Sci. USA, 99, 3701–3705.

27. Collins, L. and Penny, D. (2005) Complex spliceosomal organization ancestral to extant eukaryotes. Mol. Biol. Evol., 22, 1053–1066.

28. Misumi, O., Matsuzaki, M., Nozaki, H., Miyagishima, S.Y., Mori, T., Nishida, K., Yagisawa, F., Yoshida, Y., Kuroiwa, H. and Kuroiwa, T. (2005) Cyanidioschyzon merolae genome. A tool for facilitating comparable studies on organelle biogenesis in photosynthetic eukaryotes. Plant Physiol., 137, 567–585.

29. Nozaki, H., Takano, H., Misumi, O., Terasawa, K., Matsuzaki, M., Maruyama, S., Nishida, K., Yagisawa, F., Yoshida, Y., Fujiwara, T. et al. (2007) A 100%-complete sequence reveals unusually simple genomic features in the hot-spring red alga Cyanidioschyzon merolae. BMC Biol., 5, 28.

30. Takahashi, Y., Tani, T. and Ohshima, Y. (1996) Spliceosomal introns in conserved sequences of U1 and U5 small nuclear RNA genes in yeast Rhodotorula hasegawae. J. Biochem., 120, 677–683.

31. Takahashi, Y., Urushiyama, S., Tani, T. and Ohshima, Y. (1993) An mRNA-type intron is present in the Rhodotorula hasegawae U2 small nuclear RNA gene. Mol. Cell. Biol., 13, 5613–5619.

32. Gilson, P.R., Su, V., Slamovits, C.H., Reith, M.E., Keeling, P.J. and McFadden, G.I. (2006) Complete nucleotide sequence of the chlorarachniophyte nucleomorph: nature's smallest nucleus. Proc. Natl Acad. Sci. USA, 103, 9566–9571.

33. Patel, A.A. and Steitz, J.A. (2003) Splicing double: insights from the second spliceosome. Nat. Rev. Mol. Cell Biol., 4, 960–970.

34. Thomas, J., Lea, K., Zucker-Aprison, E. and Blumenthal, T. (1990) The spliceosomal snRNAs of Caenorhabditis elegans. Nucleic Acids Res., 18, 2633–2642.

35. Blaxter, M.L., De Ley, P., Garey, J.R., Liu, L.X., Scheldeman, P., Vierstraete, A., Vanfleteren, J.R., Mackey, L.Y., Dorris, M., Frisse, L.M. et al. (1998) A molecular evolutionary framework for the phylum Nematoda. Nature, 392, 71–75.

36. Klein, D.J., Schmeing, T.M., Moore, P.B. and Steitz, T.A. (2001) The kink-turn: a new RNA secondary structure motif. EMBO J., 20, 4214–4221.

37. Rozhdestvensky, T.S., Tang, T.H., Tchirkova, I.V., Brosius, J., Bachellerie, J.P. and Huttenhofer, A. (2003) Binding of L7Ae protein to the K-turn of archaeal snoRNAs: a shared RNA binding motif for C/D and H/ACA box snoRNAs in Archaea. Nucleic Acids Res., 31, 869–877.

38. Vidovic, I., Nottrott, S., Hartmuth, K., Luhrmann, R. and Ficner, R. (2000) Crystal structure of the spliceosomal 15.5kD protein bound to a U4 snRNA fragment. Mol. Cell, 6, 1331–1342.

39. Schultz, A., Nottrott, S., Hartmuth, K. and Luhrmann, R. (2006) RNA structural requirements for the association of the spliceosomal hPrp31 protein with the U4 and U4atac small nuclear ribonucleoproteins. J. Biol. Chem., 281, 28278–28286.

40. Tarn, W.Y. and Steitz, J.A. (1996) Highly diverged U4 and U6 small nuclear RNAs required for splicing rare AT-AC introns. Science, 273, 1824–1832.

41. Incorvaia, R. and Padgett, R.A. (1998) Base pairing with U6atac snRNA is required for 5' splice site activation of U12-dependent introns in vivo. RNA, 4, 709–718.



using HMM models based on Rfam alignments against the set of reliable spliceosomal RNA sequences and against genomes where we did not previously find a specific RNA. Sequences with E-values lower than 10 were retrieved and examined with cmsearch and processed as described earlier. All 17 136 sequences considered as reliable candidates identified in this study, together with the 356 sequences used as initial queries, are in Supplementary Data 1.

Multiple alignments of sequences were created using ClustalW 1.83 (17) or T-Coffee (18). Alignments obtained with cmalign of Infernal are shown in Supplementary Data 3. Secondary structure was predicted with MFOLD (19) as well as with Infernal.

RESULTS AND DISCUSSION

Spliceosomal RNAs may be efficiently identified using a combination of high-sensitivity local alignment methods, profile HMMs and covariance models

Genomic sequences from 149 eukaryotic organisms (Figure 1) were analyzed with respect to spliceosomal RNAs. The Infernal software (14), which identifies ncRNAs using covariance models, is effective in identifying members of a specific RNA family but is computationally demanding and not practical for the analysis of large genomes. Therefore, a first step to filter sequences is necessary. In our method we used NCBI BLAST (wordsize 7) and FASTA (wordsize 6). RNA sequences represented in the Rfam database and annotated as spliceosomal RNAs (U1, U2, U4, U5, U6, U11, U12, U4atac and U6atac) were first assembled (a total of 356 sequences, see Supplementary Data 1) and used as queries in BLAST and FASTA searches against genomic sequences. To maximize sensitivity, we considered all hits, irrespective of E-value, identified in these searches for analysis with covariance models of spliceosomal RNAs collected from Rfam.
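One practical detail of such a filtering step is that BLAST and FASTA often report overlapping hits for the same locus. The sketch below shows one possible way to pool hits into non-redundant genomic windows before covariance model analysis. It is an assumption-laden illustration: the text does not state how hits were pooled, and the `Hit` tuple layout, the function name `merge_hits` and the flank size are hypothetical choices made for this example.

```python
from typing import Iterable, List, Tuple

# (contig name, start, end), 1-based inclusive; start may exceed end
# for hits reported on the reverse strand.
Hit = Tuple[str, int, int]


def merge_hits(hits: Iterable[Hit], flank: int = 50) -> List[Hit]:
    """Pad each hit by `flank` nt and merge overlapping windows per contig."""
    padded = sorted(
        (contig, max(1, min(s, e) - flank), max(s, e) + flank)
        for contig, s, e in hits
    )
    merged: List[Hit] = []
    for contig, start, end in padded:
        if merged and merged[-1][0] == contig and start <= merged[-1][2]:
            # Overlaps the previous window on the same contig: extend it.
            prev = merged[-1]
            merged[-1] = (contig, prev[1], max(prev[2], end))
        else:
            merged.append((contig, start, end))
    return merged


# Made-up coordinates standing in for parsed BLAST and FASTA output.
blast_hits = [("contig1", 100, 180), ("contig2", 500, 420)]
fasta_hits = [("contig1", 150, 260)]
windows = merge_hits(blast_hits + fasta_hits, flank=20)
```

Each resulting window would then be extracted from the genome and passed to the covariance model search. The upper bound is not clipped to the contig length here, so a real pipeline would need that additional check.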

The predictions that represented novel spliceosomal RNA sequences and that were considered reliable (see Materials and Methods section for details) were used in a second round of searches against organisms where we were missing the respective spliceosomal RNA. Novel hits were analyzed as described with covariance models, and this procedure was repeated until no further reliable predictions could be obtained. For genomes where we were not able to find a spliceosomal RNA orthologue using BLAST or FASTA, we also made use of WU-BLAST (wordsize 2) [Gish, W. (1996–2004) http://blast.wustl.edu] and HMMER searches (http://hmmer.janelia.org) using profile HMM models based on Rfam alignments. For comparison, we also used Infernal to analyze the following genomes with selected covariance models without any initial filtering of sequences: Trichinella spiralis (U11, U12 and U4atac), A. castellanii (U11, U4atac), G. sulphuraria (U1, U2, U4, U5 and U6), Giardia lamblia (U1, U2, U4, U5 and U6), Physarum polycephalum (U11, U12 and U4atac), Naegleria gruberi (U1), Trichomonas vaginalis (U1), Batrachochytrium dendrobatidis (U1, U11 and U12), Antonospora locustae (U1), Encephalitozoon cuniculi (U1), Phycomyces blakesleeanus (U12) and Cyanidioschyzon merolae (U1, U2, U4, U5 and U6).

We thus obtained 17 136 sequences predicted as spliceosomal RNAs. All these sequences are distributed among 147 species, as shown in Figure 1 and Supplementary Data 1 and 2. It should be noted that many animals and plants have numerous copies of each RNA gene and a fraction of these are fragmented genes or pseudogenes. As it is very difficult to distinguish a true gene from a pseudogene using computational methods, a fraction of our candidates in animals and plants are presumably pseudogenes. In some phylogenetic groups, such as fungi, heterokonts and Apicomplexa, each of the spliceosomal RNAs is represented by one or a few genes, and in this case the predicted sequences are more likely to be bona fide spliceosomal RNA genes.

The results using the different methods (NCBI BLAST, FASTA, WU-BLAST and HMMER) are compared in Figure 2. As expected, the sensitivity of FASTA, WU-BLAST and HMMER was much greater than that of NCBI BLAST (W7), and HMMER is the most sensitive method of the four. As there are speed disadvantages to WU-BLAST and HMMER, we did not systematically examine every possible genome with these methods. Instead, we searched with these methods only genomes where we did not find a specific spliceosomal RNA with FASTA or BLAST. We also analyzed with WU-BLAST and HMMER all the RNA sequences found using FASTA and BLAST. In the results shown in Figure 2, therefore, the efficiency of WU-BLAST and HMMER is probably underestimated. In general, the results obtained in our searches are consistent with previous results (20), where different software, including programs used here, were tested against a set of previously known ncRNAs.

In summary, our results demonstrate that sensitive sequence alignment methods, including profile HMMs, are important as a first filtering step to identify ncRNA candidates. In addition, a combination of the different methods maximizes sensitivity. There are two RNAs, from G. sulphuraria and A. locustae (Figure 1), that could only be identified using an Infernal search against the complete genome. This finding illustrates that we could be lacking more orthologues not identified by HMMER, WU-BLAST or FASTA. However, it is our impression that very few RNA genes are missed by the initial screen.

During the production of this article, spliceosomal RNA sequences from Plasmodium (21), Entamoeba histolytica (22) as well as Candida albicans and other hemiascomycetous yeasts (11) were identified and characterized. All these sequences are identical to the sequences identified in the present study, providing support for the reliability of our prediction method.

RNAs of the major spliceosome

The spliceosomal RNAs as identified here are summarized in Figure 1 (actual sequences are in Supplementary Data 1). In 107 species we are able to report one or more spliceosomal RNAs where that particular RNA had not been reported before, as highlighted in Figure 1 with blue boxes. As a guide to the phylogenetic relationships between the species investigated here, a schematic phylogenetic tree is shown in Figure 3.

Figure 1. Phylogenetic distribution of the major and minor spliceosomal RNAs. Results of computational prediction of spliceosomal RNAs. Green boxes show instances where a sequence was previously known and was used as query in searches. Blue boxes show RNAs predicted in this work. A star (*) indicates that an RNA was previously described in the literature (Supplementary Data 5) and 'R' shows that the sequence was in Rfam (13). Also indicated are sequences known to have introns ('i') and U6atac RNAs of the CC variant type ('c', see Results and Discussion section).

We have analyzed RNAs of the U2-type spliceosome as well as those of the minor U12-type spliceosome (Figures 1 and 3). As to the U2-type, we have identified such RNAs in virtually every species examined. The only exceptions are the red alga C. merolae and the deeply branching protist G. lamblia, where we could not identify any spliceosomal RNAs. Of the spliceosomal RNAs in T. vaginalis, only the U2 RNA was identified. T. vaginalis possesses a gene encoding the essential spliceosomal component PRP8 (23) as well as many putative introns (24). G. lamblia possesses three introns to date (25,26) and 27 spliceosomal proteins (27). In C. merolae, introns as well as conserved U2 and U5 snRNP-protein specific subunits are known to be present (28). Therefore, splicing is likely to occur in these organisms and it is puzzling that we fail to identify spliceosomal RNAs, particularly in C. merolae where the genome sequence is complete (29). This could mean that spliceosomal RNAs are lacking and have been replaced by protein functions. But these organisms could also have spliceosomal RNAs very different from most other species, or these genes could be present in a part of the genome for some reason not yet covered by the genome sequencing. A U1 RNA was the only spliceosomal RNA that we identified in G. sulphuraria, another red alga. Additional spliceosomal RNAs might be found once its genome is fully sequenced.

RNA components of the major spliceosome are known to be present in fungi, and recently the evolution of such RNAs in the hemiascomycetous yeasts was examined (11). Major spliceosomal RNAs have previously been reported in the Basidiomycota Rhodotorula (30,31) and Cryptococcus neoformans (Rfam). Here we show that such RNAs are ubiquitous in the Basidiomycota lineage (Figure 1). More deeply branching in the fungal tree are Zygomycota and Chytridiomycota (Figures 1 and 3). We show for the first time that spliceosomal RNAs are present in the Zygomycota P. blakesleeanus, R. oryzae and M. circinelloides, as well as in the Chytridiomycota B. dendrobatidis, Spizellomyces punctatus and Allomyces macrogynus.

The microsporidia are believed to be positioned close to the root of the fungal branch. They have been reduced severely in genome size as compared to other fungi. In A. locustae and E. cuniculi, U2 and U6 orthologues are reported in Rfam. Here we have also identified the U4 and U5 RNA orthologues in both of these Microsporidia (Figure 1) and U1 RNA in A. locustae (Supplementary Data 3). We may therefore conclude that the major spliceosomal RNAs are ubiquitous in all fungal groups, including Microsporidia.

The nucleomorphs of Guillardia theta (a cryptomonad) and Bigelowiella natans (a chlorarachniophyte) represent the smallest eukaryotic genomes known. It is interesting to note that also in these two genomes spliceosomal RNA genes are identified (Figure 1). With respect to G. theta, the results of our predictions (only a U6 RNA) are completely consistent with available annotation (Douglas et al., John Archibald and Paul Gilson, personal communication), whereas there are differences with respect to published annotation for the B. natans genome (32).

Minor-spliceosome-specific RNAs are identified in the worm Trichinella spiralis, in Physarum polycephalum and in the fungal lineages Zygomycota and Chytridiomycota

U12-type introns were previously identified in plants inmost of the metazoan taxa including vertebrates insectsand cnidarians (7) and more recently in R oryzae ofZygomycota in Acanthamoeba and in the heterokontPhytophthora (8) Minor spliceosomal RNAs have beenfound in metazoaAcanthamoeba plants and Phytophthora(for references see Supplementary Data 5) A small numberof organisms that have been well studied seem to lack theU12-type splicing such as S cerevisiae S pombe andC elegans (733)In this investigation we discovered many novel minor

spliceosomal RNA orthologues (Figure 1) More impor-tantly phylogenetic groups are represented where suchRNAs were not previously reported These are nematodes(T spiralis) mycetozoa (P polycephalum) and the fungallineages Basidiomycota Zygomycota and Chytridiomy-cota as discussed in more detail subsequently

Trichinella spiralis The major spliceosomal RNAs ofthe nematode C elegans have been characterized (34)Previous analyses have failed to identify minor spliceoso-mal components including U12-type introns in thisorganism (7) In this investigation we analyzed differentspecies of the Rhabditida branch and Brugia malayi ofthe Chromadorea branch In neither of these speciesU12-type RNAs were identified However we identified

Figure 2 Comparison of local alignment and profile HMM methods toidentify spliceosomal RNAs Venn diagram showing the number ofspliceosomal RNA genes found by NCBI BLAST (W7) WU-BLAST(W2) FASTA (W6) and HMMER The number of species where theRNAs are distributed is shown within parentheses

Nucleic Acids Research 2008 Vol 36 No 9 3005

U11 U12 and U6atac RNAs in another nematodeT spiralis (Figure 1) Predicted secondary structures ofthese RNAs are shown in Figure 4 We also foundevidence of U11U12 specific proteins in T spiralis(Supplementary Data 4) providing further support ofa minor spliceosome in this organism

Basidiomycota Zygomycota and Chytridiomycota Nominor spliceosomal components have been described infungi except for minor spliceosomal proteins and potentialU12-type introns in R oryzae (8) a species in the fungal

Zygomycota lineage However we here identified minorspliceosomal RNA components in the ZygomycotaP blakesleeanus R oryzae M circinelloides and inthe Chytridiomycota B dendrobatidis S punctatus andA macrogynus Secondary structure predictions ofR oryzae RNAs are shown in Figure 4 and furtherstructures are shown in Supplementary Data 3 Theseresults provide strong evidence of a U12-type spliceosomein these phylogenetic groups

In the Basidiomycota phylum there was previously no evidence of a U12-type spliceosome. However, we here identified a U12 RNA in Phakopsora meibomiae and a U4atac RNA in Phakopsora pachyrhizi. At the same time, there is so far no evidence of U12-type introns or of U12-specific proteins. Therefore, it is possible that the spliceosomal RNAs that we observe are pseudogenes and remnants from a U12 machinery that was present in an ancestral lineage.

Figure 3. Schematic phylogenetic tree. Phylogenetic groups and their relationships are shown together with example species (genus in italics). Species where one or more U12-type spliceosomal RNAs were found are highlighted (red circles), as well as branches where the U12-type RNAs seem to have been lost (dotted lines). In the case of Basidiomycota, only two different U12-type RNAs have been identified and for this reason there is only weak evidence of a minor spliceosome. Numbers at branches indicate (1) number of genomes analyzed, (2) number of query sequences used and (3) number of new sequences identified.

Figure 4. Structures of selected spliceosomal RNAs. Highlighted regions are the U12 site pairing to the branch site (underlined), regions of U11 and U6atac proposed to pair to the 5′ splice site (underlined), U6atac-U12 interaction (shaded background), Sm-site (box with rounded corners) and K-turn (box) in U4atac RNA. Organisms represented are T. spiralis, P. polycephalum and R. oryzae. Structures of additional RNAs are in Supplementary Data 3.

Acanthamoeba castellanii and Physarum polycephalum. In A. castellanii, a U12 spliceosomal RNA as well as minor introns have been reported recently (8). These observations provided evidence of a minor spliceosome in this organism. Consistent with these results, we identified two additional U12-type RNAs, U11 and U6atac, in this species (Figure 1).

Acanthamoeba is believed to share a common ancestor with the Mycetozoa, where spliceosomal components or U12-type introns were not previously reported. However, we here identified U11, U12 and U6atac spliceosomal RNA genes in P. polycephalum (Figure 1 and Supplementary Data 1).

The minor spliceosome was lost at multiple instances during evolution

The fact that we identified U12-type RNAs in a range of species representing very diverse phyla, such as Fungi, Acanthamoeba/Mycetozoa, Streptophyta and Heterokonta, supports the notion that the minor spliceosome was an early invention in eukaryotic evolution (8). In fact, we cannot exclude that such a spliceosome was present in the last common ancestor of the eukaryotes. The detailed map of the phylogenetic distribution of U12-type RNAs also allows us to identify a number of occasions during evolution where it seems that the minor spliceosome was lost (Figure 3, dashed lines).

U12-type RNAs were found in T. spiralis but not in the other nematodes examined here. T. spiralis belongs to a clade that is probably deeply branching within nematodes and is distant to the Rhabditida and Chromadorea groups of nematodes (35). A mode of evolution therefore seems likely where the minor spliceosome was present at an early stage in nematode evolution but was lost in many branches (Figure 3).

There are other examples where the U12-type RNAs are missing in the fungi/metazoan lineage. Minor spliceosomal RNAs are present in a majority of metazoa, including Trichoplax, the simplest known species of the metazoan branch. An exception is Acropora millepora, a coral of the phylum Cnidaria. In addition, we failed to identify such RNAs in Monosiga brevicollis, a choanoflagellate and close relative of Metazoa. A minor spliceosome is probably present in the fungal phyla Zygomycota and Chytridiomycota, as discussed earlier. In Ascomycota and Microsporidia, on the other hand, these components seem to be lacking. It is likely that a minor spliceosome was present at an early stage in the evolution of fungi but was lost in the development of Ascomycota and Microsporidia. It is not clear why the minor spliceosome was lost in the Ascomycota, but in the case of Microsporidia it could be a consequence of the strong pressure to reduce genome size.

We identified minor spliceosome-type RNAs in P. polycephalum and A. castellanii, but not in the evolutionarily related Entamoeba or Dictyostelium (Figure 3). This would suggest that the minor spliceosome was lost in the development of Entamoeba as well as in the Dictyostelium branch.

The analysis of Streptophyta (plant) genomes revealed the presence of minor spliceosomal RNAs, whereas only U2-dependent spliceosomal RNAs were found in green and red algae. Finally, in Heterokonta we found U12-type RNAs in the Oomycetes Phytophthora and Hyaloperonospora, but not in any diatoms or brown algae.

In summary, our results point to a large number of instances where the minor spliceosome was lost during evolution of fungi/metazoa, Mycetozoa, Streptophyta and heterokonts. We are not able to reach a conclusion as to other phyla, such as Euglenozoa and Alveolata, because we do not know whether the common ancestor to these lineages had a minor spliceosomal machinery.
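The reasoning used to infer loss events can be illustrated with a small sketch. It rests on two assumptions that go beyond what the results establish: that the minor spliceosome was present at the root of the tree, and that it was never regained once lost. Under those assumptions, every clade in which no species carries U12-type RNAs, but whose sister clade does, counts as one loss. The tree topology and the presence/absence values below are a toy example, not a transcription of Figure 3, and the function name `count_losses` is hypothetical.

```python
from typing import Dict, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def count_losses(tree: Tree, present: Dict[str, bool]) -> int:
    """Minimum number of loss events, ASSUMING presence at the root
    and no regain after a loss (a Dollo-like assumption)."""

    def visit(node: Tree) -> Tuple[bool, int]:
        # Returns (trait found in at least one leaf, losses inside this clade).
        if isinstance(node, str):
            return present[node], 0
        results = [visit(child) for child in node]
        if not any(found for found, _ in results):
            # Whole clade lacks the trait; the loss is counted by the parent.
            return False, 0
        losses = sum(n for _, n in results)
        # Each child clade that lacks the trait entirely is one loss event.
        losses += sum(1 for found, _ in results if not found)
        return True, losses

    found, losses = visit(tree)
    # If no leaf has the trait, a single loss at the root explains the data.
    return losses if found else 1


# Toy example (illustrative topology and values only).
toy_tree = (("T_spiralis", ("C_elegans", "B_malayi")), ("R_oryzae", "S_cerevisiae"))
toy_presence = {
    "T_spiralis": True,
    "C_elegans": False,
    "B_malayi": False,
    "R_oryzae": True,
    "S_cerevisiae": False,
}
print(count_losses(toy_tree, toy_presence))
```

In the toy tree, the two nematodes lacking the RNAs form one clade and therefore count as a single loss, which mirrors the argument that absence in a whole branch is most simply explained by one event at its base.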

All U4 and U4atac RNAs have a K-turn motif

K-turn motifs have previously been identified in a large number of RNA families (36,37), including the U4 and U4atac RNAs (38,39). We found such a motif in all novel U4 and U4atac RNAs reported here. Examples are R. oryzae (Figure 4) and Phakopsora U4atac RNA (Supplementary Data 3), with characteristic noncanonical G-A and A-G pairs and a 3-nt loop. This would suggest that this motif is compulsory in U4 RNAs and that prediction accuracy may be improved by updating the covariance model in this respect.

Identification of a large number of novel U6atac orthologues

The U6atac RNA was previously identified in vertebrates, insects and plants (for references see Supplementary Data 5). Here, we identified orthologues in a majority of metazoan species, in Zygomycota, Physarum, Acanthamoeba and in Oomycetes (Heterokonta). A multiple alignment of U6atac sequences was constructed and selected sequences from that alignment are shown in Figure 5.

U6atac RNA pairs with U12 as well as with U4atac RNA (40). The novel sequences of U12, U6atac and U4atac that we have identified are consistent with these base-pairing interactions. Thus, the U6atac and U4atac sequences are all consistent with the formation of the stems 1 and 2 in the complex of these RNAs (Figure 4). In addition, the U6atac/U12 helices 1a and 1b are phylogenetically supported.
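A check of whether two RNA segments are compatible with a proposed intermolecular helix can be sketched as follows. This is a simplified illustration and not the procedure used in this work: it assumes two ungapped segments of equal length, pairs them antiparallel and accepts Watson-Crick pairs plus G-U wobble pairs. The function name `paired_fraction` and the example sequences are hypothetical.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def paired_fraction(strand_5to3: str, partner_5to3: str) -> float:
    """Fraction of positions that can pair when the two equal-length
    segments are aligned antiparallel without gaps or bulges."""
    a = strand_5to3.upper().replace("T", "U")
    # Reverse the partner so that position i of `a` faces its antiparallel base.
    b = partner_5to3.upper().replace("T", "U")[::-1]
    if not a or len(a) != len(b):
        raise ValueError("segments must be non-empty and of equal length")
    matches = sum((x, y) in PAIRS for x, y in zip(a, b))
    return matches / len(a)


# Made-up segments: GGAUC pairs fully with GAUCC when read antiparallel.
print(paired_fraction("GGAUC", "GAUCC"))
```

Real helices contain bulges and internal loops, so a covariance model or a folding program would be needed for anything beyond this first-pass compatibility test.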

A sequence ‘AAGGA’ near the 5′ end of U6atac has been proposed to pair with a region at the 5′ splice site (40,41). This sequence is present in a majority of U6atac RNAs, i.e. in vertebrates, urochordates, S. purpuratus, Lottia gigantea, insects and plants, as well as in the fungi R. oryzae and P. blakesleeanus. The RNA of the lycophyte Selaginella moellendorffii (spikemoss) has the sequence ‘ATGGA’. However, we observed a different motif, ‘[GU]CCGA’ (in the following referred to as the CC variant as opposed to the normal AG sequence), in Trichoplax, cnidarians, Reniera sp., A. castellanii, P. polycephalum, Physcomitrella patens and Oomycetes (Figure 5). We think that the predictions of U6atac RNA in these species are highly reliable because of sequence similarity, covariance model scores and ability to pair with U4atac and U12. Furthermore, it would seem that the CC variant does not represent a ‘paralogue’ of U6atac, as in all species examined either the AG or the CC homolog is present. It is intriguing that if the AG motif was the ancestral version, a change to the CC variant occurred more than once during evolution. Examples are in the land plant branch and in the development of the lower metazoa Trichoplax and Nematostella. However, the possibility that the CC variant represents the ancestral version of the gene cannot be excluded. Also in such a case there must have been multiple independent transitions from CC to AG. The presence of the variant CC sequence is difficult to explain if the sequence AAGGA is important in pairing to the 5′ splice site (40,41). U12-type introns of A. castellanii and Phytophthora with a sequence able to pair with AAGGA are presented in Russell et al. (8), but these introns are not able to pair well with U6atac RNAs with the CC motif.
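The distinction between the AG and CC variants can be expressed as a simple motif scan. The sketch below is illustrative only: the 30-nt search window for "near the 5′ end" is an assumption, the function name `classify_u6atac` is hypothetical, and the example sequences are invented rather than taken from Figure 5.

```python
import re

AG_MOTIF = re.compile(r"AAGGA")      # the common AG sequence
CC_MOTIF = re.compile(r"[GU]CCGA")   # the CC variant described in the text


def classify_u6atac(seq: str, window: int = 30) -> str:
    """Label a U6atac sequence by the motif found in its first `window`
    nucleotides (the window size is an assumption of this sketch)."""
    # Accept DNA or RNA spelling and compare in RNA letters.
    region = seq.upper().replace("T", "U")[:window]
    has_ag = AG_MOTIF.search(region) is not None
    has_cc = CC_MOTIF.search(region) is not None
    if has_ag and has_cc:
        return "ambiguous"
    if has_ag:
        return "AG"
    if has_cc:
        return "CC"
    return "none"


# Invented 5' fragments, for illustration only.
print(classify_u6atac("GUGUAAGGAGAGAA"))
print(classify_u6atac("gtgtgccgagaga"))
print(classify_u6atac("GUGUAUGGAGA"))
```

A sequence with ‘ATGGA’, as in the Selaginella example, is reported as "none" by this scan, which shows that a strict motif match should be read as a first filter and not as evidence against a U6atac assignment.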

CONCLUSIONS

We have described a method to identify spliceosomal RNAs where, in a first step, candidates are identified using sensitive similarity searches or by profile HMM searches. These candidates are then more rigorously examined using covariance models to arrive at a final prediction of spliceosomal RNA. New spliceosomal RNA sequences found are used as queries in similar searches until no further genes are identified.
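The iterative structure of this procedure can be summarized in a short sketch. It is a schematic under stated assumptions and not the actual pipeline: the two callables stand in for the similarity or profile HMM search and for the covariance model evaluation, and their names (`find_candidates`, `passes_cm`), like `iterative_screen` and the `max_rounds` safeguard, are hypothetical.

```python
from typing import Callable, Iterable, Set


def iterative_screen(
    seed_queries: Iterable[str],
    find_candidates: Callable[[str], Set[str]],  # stand-in for BLAST/FASTA/HMM search
    passes_cm: Callable[[str], bool],            # stand-in for covariance model check
    max_rounds: int = 50,                        # safeguard, not part of the method
) -> Set[str]:
    """Return newly predicted sequences, re-querying with each accepted
    sequence until a round yields nothing new."""
    accepted: Set[str] = set()
    tried: Set[str] = set()
    queries: Set[str] = set(seed_queries)
    for _ in range(max_rounds):
        pending = queries - tried
        if not pending:
            break  # every query has been searched; no further genes identified
        new_hits: Set[str] = set()
        for query in pending:
            tried.add(query)
            for cand in find_candidates(query):
                # Keep only unseen candidates that survive the stricter check.
                if cand not in queries and passes_cm(cand):
                    new_hits.add(cand)
        accepted |= new_hits
        queries |= new_hits
    return accepted


# Toy stand-ins: a lookup table plays the role of the search step.
toy_hits = {"a": {"b"}, "b": {"c", "x"}, "c": set()}
found = iterative_screen(
    ["a"],
    lambda q: toy_hits.get(q, set()),
    lambda s: s != "x",  # pretend "x" fails the covariance model
)
print(sorted(found))
```

Separating the sensitive search from the specific filter keeps the two roles explicit: the first step is allowed to return false positives because the second step decides which candidates become predictions and new queries.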

The results of this procedure clearly illustrate that highly sensitive local alignment searches and profile HMMs are important in the identification of spliceosomal RNAs. These RNAs tend to be conserved in sequence during evolution as compared to many other RNAs and perhaps the protocol used here is particularly suited for this category of ncRNAs. Ideally, a combination of methods should be used to maximize sensitivity. At the same time, the covariance models are critical in order to evaluate the hits found in the initial searches. We have here relied to a large extent on the specificity of these models to predict ncRNAs and regard all RNAs reported here as strong predictions.

A large number of novel RNAs are identified in this work. Most noteworthy is the identification of RNAs being components of the minor U12 spliceosome in phylogenetic groups that previously were not known to have these RNAs or any U12-type spliceosomal components or introns. Examples are Trichinella, a nematode which, in contrast to other nematodes like C. elegans, contains minor spliceosomal RNA genes. We also have shown that minor spliceosomal RNAs are present in the deeply branching fungal branches Zygomycota and Chytridiomycota.

In summary, therefore, these results confirm previous studies of Russell et al. (8) that demonstrate an early origin of the minor spliceosome, as U12-type spliceosomal RNAs are present in a variety of evolutionarily distant phyla. At the same time, our results do not allow us to conclude that a minor spliceosome was present in the ancestor of eukaryotes. Our results also point to multiple instances in evolution where the minor spliceosome seems to have been lost. One example is in the development of nematodes, where the loss of U12-type RNAs occurred after the divergence of Pseudocoelomata. In the case of fungi, such RNAs were present in very early fungal evolution, while they were lost in the development of Microsporidia and Ascomycota. Furthermore, minor spliceosomal RNAs were lost in the development of Dictyostelium of the Mycetozoa branch and in the development of the heterokont and plant branches. From this, it would seem that U12-type splicing has a comparatively marginal role and may be disposed of in many phylogenetic groups.

Figure 5. Alignment of U6atac spliceosomal RNA genes. Secondary structure elements are shown in a bracket notation at the bottom of the alignment. CC motifs in the region supposed to pair to the 5′ splice site, as well as regions involved in base-pairing, are highlighted with color.


SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

M.D.L. was supported by a grant from CONACYT, The National Council for Science and Technology, Mexico. Funding to pay the Open Access publication charges for this article was provided by Swedish Research School of Genomics and Bioinformatics.

Conflict of interest statement. None declared.

REFERENCES

1. Nilsen, T.W. (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays, 25, 1147–1149.

2. Kiss, T. (2004) Biogenesis of small nuclear RNPs. J. Cell Sci., 117, 5949–5951.

3. Will, C.L., Schneider, C., MacMillan, A.M., Katopodis, N.F., Neubauer, G., Wilm, M., Luhrmann, R. and Query, C.C. (2001) A novel U2 and U11/U12 snRNP protein that associates with the pre-mRNA branch site. EMBO J., 20, 4536–4546.

4. Madhani, H.D. and Guthrie, C. (1992) A novel base-pairing interaction between U2 and U6 snRNAs suggests a mechanism for the catalytic activation of the spliceosome. Cell, 71, 803–817.

5. Wassarman, K.M. and Steitz, J.A. (1992) The low-abundance U11 and U12 small nuclear ribonucleoproteins (snRNPs) interact to form a two-snRNP complex. Mol. Cell. Biol., 12, 1276–1285.

6. Frilander, M.J. and Steitz, J.A. (1999) Initial recognition of U12-dependent introns requires both U11/5′ splice-site and U12/branchpoint interactions. Genes Dev., 13, 851–863.

7. Burge, C.B., Padgett, R.A. and Sharp, P.A. (1998) Evolutionary fates and origins of U12-type introns. Mol. Cell, 2, 773–785.

8. Russell, A.G., Charette, J.M., Spencer, D.F. and Gray, M.W. (2006) An early evolutionary origin for the minor spliceosome. Nature, 443, 863–866.

9. Mewes, H.W., Albermann, K., Bahr, M., Frishman, D., Gleissner, A., Hani, J., Heumann, K., Kleine, K., Maierl, A., Oliver, S.G. et al. (1997) Overview of the yeast genome. Nature, 387, 7–65.

10. Guthrie, C. and Patterson, B. (1988) Spliceosomal snRNAs. Annu. Rev. Genet., 22, 387–419.

11. Mitrovich, Q.M. and Guthrie, C. (2007) Evolution of small nuclear RNAs in S. cerevisiae, C. albicans and other hemiascomycetous yeasts. RNA, 13, 2066–2080.

12. Palfi, Z., Schimanski, B., Gunzl, A., Lucke, S. and Bindereif, A. (2005) U1 small nuclear RNP from Trypanosoma brucei: a minimal U1 snRNA with unusual protein components. Nucleic Acids Res., 33, 2493–2503.

13. Griffiths-Jones, S. (2007) Annotating noncoding RNA genes. Annu. Rev. Genom. Hum. Genet., 8, 279–298.

14. Eddy, S.R. (2002) A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics, 3, 18.

15. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

16. Pearson, W.R. and Lipman, D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.

17. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

18. Notredame, C., Higgins, D.G. and Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205–217.

19. Zuker, M. (1989) On finding all suboptimal foldings of an RNA molecule. Science, 244, 48–52.

20. Freyhult, E.K., Bollback, J.P. and Gardner, P.P. (2007) Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA. Genome Res., 17, 117–125.

21. Chakrabarti, K., Pearson, M., Grate, L., Sterne-Weiler, T., Deans, J., Donohue, J.P. and Ares, M. Jr (2007) Structural RNAs of known and unknown function identified in malaria parasites by comparative genomics and RNA analysis. RNA, 13, 1923–1939.

22. Davis, C.A., Brown, M.P. and Singh, U. (2007) Functional characterization of spliceosomal introns and identification of U2, U4 and U5 snRNAs in the deep-branching eukaryote Entamoeba histolytica. Eukaryot. Cell, 6, 940–948.

23. Fast, N.M. and Doolittle, W.F. (1999) Trichomonas vaginalis possesses a gene encoding the essential spliceosomal component PRP8. Mol. Biochem. Parasitol., 99, 275–278.

24. Vanacova, S., Yan, W., Carlton, J.M. and Johnson, P.J. (2005) Spliceosomal introns in the deep-branching eukaryote Trichomonas vaginalis. Proc. Natl Acad. Sci. USA, 102, 4430–4435.

25. Russell, A.G., Shutt, T.E., Watkins, R.F. and Gray, M.W. (2005) An ancient spliceosomal intron in the ribosomal protein L7a gene (Rpl7a) of Giardia lamblia. BMC Evol. Biol., 5, 45.

26. Nixon, J.E., Wang, A., Morrison, H.G., McArthur, A.G., Sogin, M.L., Loftus, B.J. and Samuelson, J. (2002) A spliceosomal intron in Giardia lamblia. Proc. Natl Acad. Sci. USA, 99, 3701–3705.

27. Collins, L. and Penny, D. (2005) Complex spliceosomal organization ancestral to extant eukaryotes. Mol. Biol. Evol., 22, 1053–1066.

28. Misumi, O., Matsuzaki, M., Nozaki, H., Miyagishima, S.Y., Mori, T., Nishida, K., Yagisawa, F., Yoshida, Y., Kuroiwa, H. and Kuroiwa, T. (2005) Cyanidioschyzon merolae genome. A tool for facilitating comparable studies on organelle biogenesis in photosynthetic eukaryotes. Plant Physiol., 137, 567–585.

29. Nozaki, H., Takano, H., Misumi, O., Terasawa, K., Matsuzaki, M., Maruyama, S., Nishida, K., Yagisawa, F., Yoshida, Y., Fujiwara, T. et al. (2007) A 100%-complete sequence reveals unusually simple genomic features in the hot-spring red alga Cyanidioschyzon merolae. BMC Biol., 5, 28.

30. Takahashi, Y., Tani, T. and Ohshima, Y. (1996) Spliceosomal introns in conserved sequences of U1 and U5 small nuclear RNA genes in yeast Rhodotorula hasegawae. J. Biochem., 120, 677–683.

31. Takahashi, Y., Urushiyama, S., Tani, T. and Ohshima, Y. (1993) An mRNA-type intron is present in the Rhodotorula hasegawae U2 small nuclear RNA gene. Mol. Cell. Biol., 13, 5613–5619.

32. Gilson, P.R., Su, V., Slamovits, C.H., Reith, M.E., Keeling, P.J. and McFadden, G.I. (2006) Complete nucleotide sequence of the chlorarachniophyte nucleomorph: nature’s smallest nucleus. Proc. Natl Acad. Sci. USA, 103, 9566–9571.

33. Patel, A.A. and Steitz, J.A. (2003) Splicing double: insights from the second spliceosome. Nat. Rev. Mol. Cell Biol., 4, 960–970.

34. Thomas, J., Lea, K., Zucker-Aprison, E. and Blumenthal, T. (1990) The spliceosomal snRNAs of Caenorhabditis elegans. Nucleic Acids Res., 18, 2633–2642.

35. Blaxter, M.L., De Ley, P., Garey, J.R., Liu, L.X., Scheldeman, P., Vierstraete, A., Vanfleteren, J.R., Mackey, L.Y., Dorris, M., Frisse, L.M. et al. (1998) A molecular evolutionary framework for the phylum Nematoda. Nature, 392, 71–75.

36. Klein, D.J., Schmeing, T.M., Moore, P.B. and Steitz, T.A. (2001) The kink-turn: a new RNA secondary structure motif. EMBO J., 20, 4214–4221.

37. Rozhdestvensky, T.S., Tang, T.H., Tchirkova, I.V., Brosius, J., Bachellerie, J.P. and Huttenhofer, A. (2003) Binding of L7Ae protein to the K-turn of archaeal snoRNAs: a shared RNA binding motif for C/D and H/ACA box snoRNAs in Archaea. Nucleic Acids Res., 31, 869–877.

38. Vidovic, I., Nottrott, S., Hartmuth, K., Luhrmann, R. and Ficner, R. (2000) Crystal structure of the spliceosomal 15.5kD protein bound to a U4 snRNA fragment. Mol. Cell, 6, 1331–1342.

39. Schultz, A., Nottrott, S., Hartmuth, K. and Luhrmann, R. (2006) RNA structural requirements for the association of the spliceosomal hPrp31 protein with the U4 and U4atac small nuclear ribonucleoproteins. J. Biol. Chem., 281, 28278–28286.

40. Tarn, W.Y. and Steitz, J.A. (1996) Highly diverged U4 and U6 small nuclear RNAs required for splicing rare AT-AC introns. Science, 273, 1824–1832.

41. Incorvaia, R. and Padgett, R.A. (1998) Base pairing with U6atac snRNA is required for 5′ splice site activation of U12-dependent introns in vivo. RNA, 4, 709–718.



Figure 1. Phylogenetic distribution of the major and minor spliceosomal RNAs. Results of computational prediction of spliceosomal RNAs. Green boxes show instances where a sequence was previously known and was used as query in searches. Blue boxes show RNAs predicted in this work. A star (*) indicates that an RNA was previously described in the literature (Supplementary Data 5) and ‘R’ shows that the sequence was in Rfam (13). Also indicated are sequences known to have introns (‘i’) and U6atac RNAs of the CC variant type (‘c’, see Results and Discussion section).


between the species investigated here, a schematic phylogenetic tree is shown in Figure 3.

We have analyzed RNAs of the U2-type spliceosome as well as those of the minor U12-type spliceosome (Figures 1 and 3). As to the U2-type, we have identified such RNAs in virtually every species examined. The only exceptions are the red alga C. merolae and the deeply branching protist G. lamblia, where we could not identify any spliceosomal RNAs. Of spliceosomal RNAs in T. vaginalis, only the U2 RNA was identified. T. vaginalis possesses a gene encoding the essential spliceosomal component PRP8 (23) as well as many putative introns (24). G. lamblia possesses three introns to date (25,26) and 27 spliceosomal proteins (27). In C. merolae, introns as well as conserved U2 and U5 snRNP-protein specific subunits are known to be present (28). Therefore, splicing is likely to occur in these organisms and it is puzzling that we fail to identify spliceosomal RNAs, particularly in C. merolae where the genome sequence is complete (29). This could mean that spliceosomal RNAs are lacking and have been replaced by protein functions. But these organisms could also have spliceosomal RNAs very different from most other species, or these genes could be present in a part of the genome for some reason not yet covered by the genome sequencing. A U1 RNA was the only spliceosomal RNA that we identified in G. sulphuraria, another red alga. Additional spliceosomal RNAs might be found once its genome is fully sequenced.

RNA components of the major spliceosome are known to be present in fungi, and recently the evolution of such RNAs in the hemiascomycetous yeasts was examined (11). Major spliceosomal RNAs have previously been reported in the Basidiomycota Rhodotorula (30,31) and Cryptococcus neoformans (Rfam). Here, we show that such RNAs are ubiquitous in the Basidiomycota lineage (Figure 1). More deeply branching in the fungi tree are Zygomycota and Chytridiomycota (Figures 1 and 3). We show for the first time that spliceosomal RNAs are present in the Zygomycota P. blakesleeanus, R. oryzae and M. circinelloides, as well as in the Chytridiomycota B. dendrobatidis, Spizellomyces punctatus and Allomyces macrogynus.

The microsporidia are believed to be positioned close to the root of the fungal branch. They have been reduced severely in genome size as compared to other fungi. In A. locustae and E. cuniculi, U2 and U6 orthologues are reported in Rfam. Here, we have also identified the U4 and U5 RNA orthologues in both of these Microsporidia (Figure 1) and U1 RNA in A. locustae (Supplementary Data 3). We may therefore conclude that the major spliceosomal RNAs are ubiquitous in all fungal groups, including Microsporidia.

The nucleomorphs of Guillardia theta (a cryptomonad) and Bigelowiella natans (a chlorarachniophyte) represent the smallest eukaryotic genomes known. It is interesting to note that also in these two genomes spliceosomal RNA genes are identified (Figure 1). With respect to G. theta, the results of our predictions (only a U6 RNA) are completely consistent with available annotation (Douglas et al., John Archibald and Paul Gilson, personal communication), whereas there are differences with respect to published annotation for the B. natans genome (32).

Minor-spliceosome-specific RNAs are identified in the worm Trichinella spiralis, in Physarum polycephalum and in the fungal lineages Zygomycota and Chytridiomycota

U12-type introns were previously identified in plants, in most of the metazoan taxa, including vertebrates, insects and cnidarians (7), and more recently in R. oryzae of Zygomycota, in Acanthamoeba and in the heterokont Phytophthora (8). Minor spliceosomal RNAs have been found in metazoa, Acanthamoeba, plants and Phytophthora (for references see Supplementary Data 5). A small number of organisms that have been well studied seem to lack the U12-type splicing, such as S. cerevisiae, S. pombe and C. elegans (7,33).

In this investigation, we discovered many novel minor spliceosomal RNA orthologues (Figure 1). More importantly, phylogenetic groups are represented where such RNAs were not previously reported. These are nematodes (T. spiralis), mycetozoa (P. polycephalum) and the fungal lineages Basidiomycota, Zygomycota and Chytridiomycota, as discussed in more detail subsequently.

Trichinella spiralis. The major spliceosomal RNAs of the nematode C. elegans have been characterized (34). Previous analyses have failed to identify minor spliceosomal components, including U12-type introns, in this organism (7). In this investigation, we analyzed different species of the Rhabditida branch and Brugia malayi of the Chromadorea branch. In none of these species were U12-type RNAs identified. However, we identified

Figure 2. Comparison of local alignment and profile HMM methods to identify spliceosomal RNAs. Venn diagram showing the number of spliceosomal RNA genes found by NCBI BLAST (W7), WU-BLAST (W2), FASTA (W6) and HMMER. The number of species where the RNAs are distributed is shown within parentheses.



Figure 5 Alignment of U6atac spliceosomal RNA genes Secondary structure elements are shown in a bracket notation at the bottom of thealignment CC motifs in region supposed to pair to 50 splice site as well as regions involved in base-pairing are highlighted with color

Nucleic Acids Research 2008 Vol 36 No 9 3009

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

MDL was supported by a grant from CONACYT TheNational Council for Science and Technology MexicoFunding to pay the Open Access publication charges forthis article was provided by Swedish Research School ofGenomics and Bioinformatics

Conflict of interest statement None declared

REFERENCES

1 NilsenTW (2003) The spliceosome the most complex macro-molecular machine in the cell Bioessays 25 1147ndash1149

2 KissT (2004) Biogenesis of small nuclear RNPs J Cell Sci 1175949ndash5951

3 WillCL SchneiderC MacMillanAM KatopodisNFNeubauerG WilmM LuhrmannR and QueryCC (2001) Anovel U2 and U11U12 snRNP protein that associates with the pre-mRNA branch site EMBO J 20 4536ndash4546

4 MadhaniHD and GuthrieC (1992) A novel base-pairing inter-action between U2 and U6 snRNAs suggests a mechanism for thecatalytic activation of the spliceosome Cell 71 803ndash817

5 WassarmanKM and SteitzJA (1992) The low-abundance U11and U12 small nuclear ribonucleoproteins (snRNPs) interact toform a two-snRNP complex Mol Cell Biol 12 1276ndash1285

6 FrilanderMJ and SteitzJA (1999) Initial recognition ofU12-dependent introns requires both U115rsquo splice-site andU12branchpoint interactions Genes Dev 13 851ndash863

7 BurgeCB PadgettRA and SharpPA (1998) Evolutionary fatesand origins of U12-type introns Mol Cell 2 773ndash785

8 RussellAG CharetteJM SpencerDF and GrayMW (2006)An early evolutionary origin for the minor spliceosome Nature443 863ndash866

9 MewesHW AlbermannK BahrM FrishmanD GleissnerAHaniJ HeumannK KleineK MaierlA OliverSG et al(1997) Overview of the yeast genome Nature 387 7ndash65

10 GuthrieC and PattersonB (1988) Spliceosomal snRNAs AnnuRev Genet 22 387ndash419

11 MitrovichQM and GuthrieC (2007) Evolution of small nuclearRNAs in S cerevisiae C albicans and other hemiascomycetousyeasts RNA 13 2066ndash2080

12 PalfiZ SchimanskiB GunzlA LuckeS and BindereifA (2005)U1 small nuclear RNP from Trypanosoma brucei a minimal U1snRNA with unusual protein components Nucleic Acids Res 332493ndash2503

13 Griffiths-JonesS (2007) Annotating noncoding RNA genes AnnuRev Genom Hum Genet 8 279ndash298

14 EddySR (2002) A memory-efficient dynamic programming algo-rithm for optimal alignment of a sequence to an RNA secondarystructure BMC Bioinformatics 3 18

15 AltschulSF GishW MillerW MyersEW and LipmanDJ(1990) Basic local alignment search tool J Mol Biol 215 403ndash410

16 PearsonWR and LipmanDJ (1988) Improved tools for biologicalsequence comparison Proc Natl Acad Sci USA 85 2444ndash2448

17 ThompsonJD HigginsDG and GibsonTJ (1994) CLUSTALW improving the sensitivity of progressive multiple sequencealignment through sequence weighting position-specific gap penal-ties and weight matrix choice Nucleic Acids Res 22 4673ndash4680

18 NotredameC HigginsDG and HeringaJ (2000) T-Coffee anovel method for fast and accurate multiple sequence alignmentJ Mol Biol 302 205ndash217

19 ZukerM (1989) On finding all suboptimal foldings of an RNAmolecule Science 244 48ndash52

20 FreyhultEK BollbackJP and GardnerPP (2007) Exploringgenomic dark matter a critical assessment of the performance ofhomology search methods on noncoding RNA Genome Res 17117ndash125

21 ChakrabartiK PearsonM GrateL Sterne-WeilerT DeansJDonohueJP and AresMJr (2007) Structural RNAs of knownand unknown function identified in malaria parasites by compara-tive genomics and RNA analysis RNA 13 1923ndash1939

22 DavisCA BrownMP and SinghU (2007) Functional charac-terization of spliceosomal introns and identification of U2 U4 andU5 snRNAs in the deep-branching eukaryote Entamoeba histoly-tica Eukaryot Cell 6 940ndash948

23 FastNM and DoolittleWF (1999) Trichomonas vaginalis pos-sesses a gene encoding the essential spliceosomal component PRP8Mol Biochem Parasitol 99 275ndash278

24 VanacovaS YanW CarltonJM and JohnsonPJ (2005)Spliceosomal introns in the deep-branching eukaryote Trichomonasvaginalis Proc Natl Acad Sci USA 102 4430ndash4435

25 RussellAG ShuttTE WatkinsRF and GrayMW (2005) Anancient spliceosomal intron in the ribosomal protein L7a gene(Rpl7a) of Giardia lamblia BMC Evol Biol 5 45

26 NixonJE WangA MorrisonHG McArthurAG SoginMLLoftusBJ and SamuelsonJ (2002) A spliceosomal intron inGiardia lamblia Proc Natl Acad Sci USA 99 3701ndash3705

27 CollinsL and PennyD (2005) Complex spliceosomalorganization ancestral to extant eukaryotes Mol Biol Evol 221053ndash1066

28 MisumiO MatsuzakiM NozakiH MiyagishimaSY MoriTNishidaK YagisawaF YoshidaY KuroiwaH and KuroiwaT(2005) Cyanidioschyzon merolae genome A tool for facilitatingcomparable studies on organelle biogenesis in photosyntheticeukaryotes Plant Physiol 137 567ndash585

29 NozakiH TakanoH MisumiO TerasawaK MatsuzakiMMaruyamaS NishidaK YagisawaF YoshidaY FujiwaraTet al (2007) A 100-complete sequence reveals unusually simplegenomic features in the hot-spring red alga Cyanidioschyzonmerolae BMC Biol 5 28

30 TakahashiY TaniT and OhshimaY (1996) Spliceosomalintrons in conserved sequences of U1 and U5 small nuclearRNA genes in yeast Rhodotorula hasegawae J Biochem 120677ndash683

31 TakahashiY UrushiyamaS TaniT and OhshimaY (1993)An mRNA-type intron is present in the Rhodotorula hasegawae U2small nuclear RNA gene Mol Cell Biol 13 5613ndash5619

32 GilsonPR SuV SlamovitsCH ReithME KeelingPJ andMcFaddenGI (2006) Complete nucleotide sequence of thechlorarachniophyte nucleomorph naturersquos smallest nucleus ProcNatl Acad Sci USA 103 9566ndash9571

33 PatelAA and SteitzJA (2003) Splicing double insights from thesecond spliceosome Nat Rev Mol Cell Biol 4 960ndash970

34 ThomasJ LeaK Zucker-AprisonE and BlumenthalT (1990)The spliceosomal snRNAs of Caenorhabditis elegans Nucleic AcidsRes 18 2633ndash2642

35 BlaxterML De LeyP GareyJR LiuLX ScheldemanPVierstraeteA VanfleterenJR MackeyLY DorrisMFrisseLM et al (1998) A molecular evolutionary framework forthe phylum Nematoda Nature 392 71ndash75

36 KleinDJ SchmeingTM MoorePB and SteitzTA (2001)The kink-turn a new RNA secondary structure motif EMBO J20 4214ndash4221

37 RozhdestvenskyTS TangTH TchirkovaIV BrosiusJBachellerieJP and HuttenhoferA (2003) Binding of L7Ae proteinto the K-turn of archaeal snoRNAs a shared RNA binding motiffor CD and HACA box snoRNAs in Archaea Nucleic Acids Res31 869ndash877

38 VidovicI NottrottS HartmuthK LuhrmannR and FicnerR(2000) Crystal structure of the spliceosomal 155kD protein boundto a U4 snRNA fragment Mol Cell 6 1331ndash1342

39 SchultzA NottrottS HartmuthK and LuhrmannR (2006)RNA structural requirements for the association of the spliceosomalhPrp31 protein with the U4 and U4atac small nuclear ribonucleo-proteins J Biol Chem 281 28278ndash28286

40 TarnWY and SteitzJA (1996) Highly diverged U4 and U6 smallnuclear RNAs required for splicing rare AT-AC introns Science273 1824ndash1832

41 IncorvaiaR and PadgettRA (1998) Base pairing with U6atacsnRNA is required for 5rsquo splice site activation of U12-dependentintrons in vivo RNA 4 709ndash718

3010 Nucleic Acids Research 2008 Vol 36 No 9

Page 5: Computational screen for spliceosomal RNA genes aids in defining the phylogenetic distribution of major and minor spliceosomal components

between the species investigated here, a schematic phylogenetic tree is shown in Figure 3.

We have analyzed RNAs of the U2-type spliceosome as well as those of the minor U12-type spliceosome (Figures 1 and 3). As to the U2-type, we have identified such RNAs in virtually every species examined. The only exceptions are the red alga C. merolae and the deeply branching protist G. lamblia, where we could not identify any spliceosomal RNAs. Of the spliceosomal RNAs in T. vaginalis, only the U2 RNA was identified. T. vaginalis possesses a gene encoding the essential spliceosomal component PRP8 (23) as well as many putative introns (24). G. lamblia possesses three introns known to date (25,26) and 27 spliceosomal proteins (27). In C. merolae, introns as well as conserved U2 and U5 snRNP-specific protein subunits are known to be present (28). Therefore, splicing is likely to occur in these organisms, and it is puzzling that we fail to identify spliceosomal RNAs, particularly in C. merolae where the genome sequence is complete (29). This could mean that spliceosomal RNAs are lacking and have been replaced by protein functions. But these organisms could also have spliceosomal RNAs very different from those of most other species, or these genes could be present in a part of the genome that for some reason is not yet covered by the genome sequencing. A U1 RNA was the only spliceosomal RNA that we identified in G. sulphuraria, another red alga. Additional spliceosomal RNAs might be found once its genome is fully sequenced.

RNA components of the major spliceosome are known to be present in fungi, and recently the evolution of such RNAs in the hemiascomycetous yeasts was examined (11). Major spliceosomal RNAs have previously been reported in the Basidiomycota Rhodotorula (30,31) and Cryptococcus neoformans (Rfam). Here we show that such RNAs are ubiquitous in the Basidiomycota lineage (Figure 1). More deeply branching in the fungal tree are Zygomycota and Chytridiomycota (Figures 1 and 3). We show for the first time that spliceosomal RNAs are present in the Zygomycota P. blakesleeanus, R. oryzae and M. circinelloides, as well as in the Chytridiomycota B. dendrobatidis, Spizellomyces punctatus and Allomyces macrogynus.

The microsporidia are believed to be positioned close to the root of the fungal branch. They have been reduced severely in genome size as compared to other fungi. In A. locustae and E. cuniculi, U2 and U6 orthologues are reported in Rfam. Here we have also identified the U4 and U5 RNA orthologues in both of these Microsporidia (Figure 1) and U1 RNA in A. locustae (Supplementary Data 3). We may therefore conclude that the major spliceosomal RNAs are ubiquitous in all fungal groups, including Microsporidia.

The nucleomorphs of Guillardia theta (a cryptomonad) and Bigelowiella natans (a chlorarachniophyte) represent the smallest eukaryotic genomes known. It is interesting to note that spliceosomal RNA genes are identified also in these two genomes (Figure 1). With respect to G. theta, the results of our predictions (only a U6 RNA) are completely consistent with available annotation (Douglas et al., John Archibald and Paul Gilson, personal communication), whereas there are differences with respect to the published annotation for the B. natans genome (32).

Minor-spliceosome-specific RNAs are identified in the worm Trichinella spiralis, in Physarum polycephalum and in the fungal lineages Zygomycota and Chytridiomycota

U12-type introns were previously identified in plants, in most of the metazoan taxa including vertebrates, insects and cnidarians (7), and more recently in R. oryzae of Zygomycota, in Acanthamoeba and in the heterokont Phytophthora (8). Minor spliceosomal RNAs have been found in metazoa, Acanthamoeba, plants and Phytophthora (for references see Supplementary Data 5). A small number of organisms that have been well studied seem to lack U12-type splicing, such as S. cerevisiae, S. pombe and C. elegans (7,33).

In this investigation we discovered many novel minor spliceosomal RNA orthologues (Figure 1). More importantly, phylogenetic groups are represented where such RNAs were not previously reported. These are nematodes (T. spiralis), mycetozoa (P. polycephalum) and the fungal lineages Basidiomycota, Zygomycota and Chytridiomycota, as discussed in more detail subsequently.

Trichinella spiralis. The major spliceosomal RNAs of the nematode C. elegans have been characterized (34). Previous analyses have failed to identify minor spliceosomal components, including U12-type introns, in this organism (7). In this investigation we analyzed different species of the Rhabditida branch and Brugia malayi of the Chromadorea branch. In none of these species were U12-type RNAs identified. However, we identified U11, U12 and U6atac RNAs in another nematode, T. spiralis (Figure 1). Predicted secondary structures of these RNAs are shown in Figure 4. We also found evidence of U11/U12-specific proteins in T. spiralis (Supplementary Data 4), providing further support for a minor spliceosome in this organism.

Figure 2. Comparison of local alignment and profile HMM methods to identify spliceosomal RNAs. Venn diagram showing the number of spliceosomal RNA genes found by NCBI BLAST (W7), WU-BLAST (W2), FASTA (W6) and HMMER. The number of species where the RNAs are distributed is shown within parentheses.
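The kind of method comparison summarized in the Venn diagram of Figure 2 can be reproduced with plain set operations. The sketch below is illustrative only: the gene identifiers and the per-method hit sets are invented placeholders, not results from this study, and `venn_regions` is a hypothetical helper name.

```python
from itertools import combinations

# Hypothetical hit sets: which candidate genes each search method recovered.
# The identifiers are placeholders and do not correspond to real predictions.
hits = {
    "NCBI BLAST": {"U1_a", "U2_a", "U5_a"},
    "WU-BLAST": {"U1_a", "U2_a", "U4_a", "U5_a"},
    "FASTA": {"U1_a", "U4_a"},
    "HMMER": {"U1_a", "U2_a", "U6atac_a"},
}

def venn_regions(hit_sets):
    """Map each combination of methods to the genes found by exactly
    those methods and no others (one region of the Venn diagram)."""
    methods = sorted(hit_sets)
    regions = {}
    for size in range(1, len(methods) + 1):
        for combo in combinations(methods, size):
            # Genes found by every method in this combination...
            inside = set.intersection(*(hit_sets[m] for m in combo))
            # ...minus genes also found by any method outside it.
            others = [hit_sets[m] for m in methods if m not in combo]
            exclusive = inside - set().union(*others)
            if exclusive:
                regions[combo] = exclusive
    return regions

regions = venn_regions(hits)
for combo, genes in regions.items():
    print(" & ".join(combo), "->", sorted(genes))
```

Regions containing genes found by a single method are the ones that argue for combining several search tools, since those genes would be missed if that method were left out.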

Basidiomycota, Zygomycota and Chytridiomycota. No minor spliceosomal components have been described in fungi, except for minor spliceosomal proteins and potential U12-type introns in R. oryzae (8), a species in the fungal Zygomycota lineage. However, we here identified minor spliceosomal RNA components in the Zygomycota P. blakesleeanus, R. oryzae and M. circinelloides, and in the Chytridiomycota B. dendrobatidis, S. punctatus and A. macrogynus. Secondary structure predictions of R. oryzae RNAs are shown in Figure 4 and further structures are shown in Supplementary Data 3. These results provide strong evidence of a U12-type spliceosome in these phylogenetic groups.

In the Basidiomycota phylum there was previously no evidence of a U12-type spliceosome. However, we here identified a U12 RNA in Phakopsora meibomiae and a U4atac RNA in Phakopsora pachyrhizi. At the same time, there is so far no evidence of U12-type introns or of U12-specific proteins. Therefore, it is possible that the spliceosomal RNAs that we observe are pseudogenes and remnants from a U12 machinery that was present in an ancestral lineage.

Figure 3. Schematic phylogenetic tree. Phylogenetic groups and their relationships are shown together with example species (genus in italics). Species where one or more U12-type spliceosomal RNAs were found are highlighted (red circles), as well as branches where the U12-type RNAs seem to have been lost (dotted lines). In the case of Basidiomycota, only two different U12-type RNAs have been identified and for this reason there is only weak evidence of a minor spliceosome. Numbers at branches indicate 1) number of genomes analyzed, 2) number of query sequences used and 3) number of new sequences identified.

Figure 4. Structures of selected spliceosomal RNAs. Highlighted regions are the U12 site pairing to the branch site (underlined), regions of U11 and U6atac proposed to pair to the 5′ splice site (underlined), U6atac-U12 interaction (shaded background), Sm-site (box with rounded corners) and K-turn (box) in U4atac RNA. Organisms represented are T. spiralis, P. polycephalum and R. oryzae. Structures of additional RNAs are in Supplementary Data 3.

Acanthamoeba castellanii and Physarum polycephalum. In A. castellanii, a U12 spliceosomal RNA as well as minor introns have been reported recently (8). These observations provided evidence of a minor spliceosome in this organism. Consistent with these results, we identified two additional U12-type RNAs, U11 and U6atac, in this species (Figure 1).

Acanthamoeba is believed to share a common ancestor with the Mycetozoa, where spliceosomal components or U12-type introns were not previously reported. However, we here identified U11, U12 and U6atac spliceosomal RNA genes in P. polycephalum (Figure 1 and Supplementary Data 1).

The minor spliceosome was lost at multiple instances during evolution

The fact that we identified U12-type RNAs in a range of species representing very diverse phyla, such as Fungi, Acanthamoeba, Mycetozoa, Streptophyta and Heterokonta, supports the notion that the minor spliceosome was an early invention in eukaryotic evolution (8). In fact, we cannot exclude that such a spliceosome was present in the last common ancestor of the eukaryotes. The detailed map of the phylogenetic distribution of U12-type RNAs also allows us to identify a number of occasions during evolution where it seems that the minor spliceosome was lost (Figure 3, dashed lines).

U12-type RNAs were found in T. spiralis but not in the other nematodes examined here. T. spiralis belongs to a clade that is probably deeply branching within nematodes and is distant to the Rhabditida and Chromadorea groups of nematodes (35). A mode of evolution therefore seems likely where the minor spliceosome was present at an early stage in nematode evolution but was lost in many branches (Figure 3).

There are other examples where the U12-type RNAs are missing in the fungi/metazoan lineage. Minor spliceosomal RNAs are present in a majority of metazoa, including Trichoplax, the simplest known species of the metazoan branch. An exception is Acropora millepora, a coral of the phylum Cnidaria. In addition, we failed to identify such RNAs in Monosiga brevicollis, a choanoflagellate and close relative of Metazoa. A minor spliceosome is probably present in the fungal phyla Zygomycota and Chytridiomycota, as discussed earlier. In Ascomycota and Microsporidia, on the other hand, these components seem to be lacking. It is likely that a minor spliceosome was present at an early stage in the evolution of fungi but was lost in the development of Ascomycota and Microsporidia. It is not clear why the minor spliceosome was lost in the Ascomycota, but in the case of Microsporidia it could be a consequence of the strong pressure to reduce genome size.

We identified minor spliceosome-type RNAs in P. polycephalum and A. castellanii but not in the evolutionarily related Entamoeba or Dictyostelium (Figure 3). This would suggest that the minor spliceosome was lost in the development of Entamoeba as well as in the Dictyostelium branch.

The analysis of Streptophyta (plant) genomes revealed the presence of minor spliceosomal RNAs, whereas only U2-dependent spliceosomal RNAs were found in green and red algae. Finally, in Heterokonta we found U12-type RNAs in the Oomycetes Phytophthora and Hyaloperonospora, but not in any diatoms or brown algae.

In summary, our results point to a large number of instances where the minor spliceosome was lost during evolution of fungi/metazoa, Mycetozoa, Streptophyta and heterokonts. We are not able to reach a conclusion as to other phyla, such as Euglenozoa and Alveolata, because we do not know whether the common ancestor of these lineages had a minor spliceosomal machinery.
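The reasoning used in this section, where a clade lacking U12-type RNAs inside a group that otherwise has them is read as a loss, can be illustrated with a simple Dollo-style count. This is not the procedure used in this work. The tree topology below is a toy assumption chosen for illustration, the function names are hypothetical, and the sketch assumes the trait was present in the root ancestor and is never regained.

```python
def has_trait(node, present):
    """True if any leaf under this node carries the trait."""
    if isinstance(node, str):
        return node in present
    return any(has_trait(child, present) for child in node)

def count_losses(node, present):
    """Count the minimal number of losses, assuming the trait existed at the
    root and cannot be regained (a Dollo-style assumption)."""
    if not has_trait(node, present):
        return 1  # the whole clade lacks the trait: one loss on its stem
    if isinstance(node, str):
        return 0  # a leaf that still has the trait
    return sum(count_losses(child, present) for child in node)

# Toy tree as nested tuples; the topology is an assumption, not Figure 3.
tree = (
    ("T. spiralis", ("C. elegans", "B. malayi")),
    ("R. oryzae", "S. cerevisiae"),
)
# Presence/absence calls for these species follow the text.
with_u12 = {"T. spiralis", "R. oryzae"}

print(count_losses(tree, with_u12))
```

Under these assumptions the two nematodes without U12-type RNAs are counted as a single loss on their shared branch, and S. cerevisiae as a second, independent loss.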

All U4 and U4atac RNAs have a K-turn motif

K-turn motifs have previously been identified in a large number of RNA families (36,37), including the U4 and U4atac RNAs (38,39). We found such a motif in all novel U4 and U4atac RNAs reported here. Examples are R. oryzae (Figure 4) and Phakopsora U4atac RNA (Supplementary Data 3), with characteristic noncanonical G-A and A-G pairs and a 3-nt loop. This would suggest that this motif is compulsory in U4 RNAs and that prediction accuracy may be improved by updating the covariance model in this respect.

Identification of a large number of novel U6atac orthologues

The U6atac RNA was previously identified in vertebrates, insects and plants (for references see Supplementary Data 5). Here we identified orthologues in a majority of metazoan species, in Zygomycota, Physarum, Acanthamoeba and in Oomycetes (Heterokonta). A multiple alignment of U6atac sequences was constructed and selected sequences from that alignment are shown in Figure 5.

U6atac RNA pairs with U12 as well as with U4atac RNA (40). The novel sequences of U12, U6atac and U4atac that we have identified are consistent with these base-pairing interactions. Thus, the U6atac and U4atac sequences are all consistent with the formation of stems 1 and 2 in the complex of these RNAs (Figure 4). In addition, the U6atac/U12 helices 1a and 1b are phylogenetically supported.

A sequence ‘AAGGA’ near the 5′ end of U6atac has been proposed to pair with a region at the 5′ splice site (40,41). This sequence is present in a majority of U6atac RNAs, i.e. in vertebrates, urochordates, S. purpuratus, Lottia gigantea, insects and plants, as well as in the fungi R. oryzae and P. blakesleeanus. The RNA of the lycophyte Selaginella moellendorffii (spikemoss) has the sequence ‘ATGGA’. However, we observed a different motif, ‘[GU]CCGA’ (in the following referred to as the CC variant, as opposed to the normal AG sequence), in Trichoplax, cnidarians, Reniera sp., A. castellanii, P. polycephalum, Physcomitrella patens and Oomycetes (Figure 5). We think that the predictions of U6atac RNA in these species are highly reliable because of sequence similarity, covariance model scores and the ability to pair with U4atac and U12. Furthermore, it would seem that the CC variant does not represent a ‘paralogue’ of U6atac, as in all species examined either the AG or the CC homolog is present. It is intriguing that, if the AG motif was the ancestral version, a change to the CC variant occurred more than once during evolution. Examples are in the land plant branch and in the development of the lower metazoa Trichoplax and Nematostella. However, the possibility that the CC variant represents the ancestral version of the gene cannot be excluded. Also in such a case there must have been multiple independent transitions from CC to AG. The presence of the variant CC sequence is difficult to explain if the sequence AAGGA is important in pairing to the 5′ splice site (40,41). U12-type introns of A. castellanii and Phytophthora with a sequence able to pair with AAGGA are presented in Russell et al. (8), but these introns are not able to pair well with U6atac RNAs with the CC motif.
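As an illustration of how the AG and CC variants described above could be told apart, the sketch below scans the 5′ portion of a sequence for either motif. It is a minimal sketch, not the analysis behind Figure 5: the example sequences are made up, `classify_u6atac` is a hypothetical name, and the 30-nt scan window is an arbitrary assumption.

```python
import re

# Motifs as written in the text. T is accepted alongside U so that
# both RNA and DNA (gene) sequences can be scanned.
AG_MOTIF = re.compile(r"AAGGA")
CC_MOTIF = re.compile(r"[GUT]CCGA")

def classify_u6atac(seq, window=30):
    """Classify a sequence by the motif found near its 5' end.
    `window` (number of 5' nucleotides scanned) is an assumption."""
    head = seq.upper()[:window]
    has_ag = AG_MOTIF.search(head) is not None
    has_cc = CC_MOTIF.search(head) is not None
    if has_ag and has_cc:
        return "both"
    if has_ag:
        return "AG"
    if has_cc:
        return "CC"
    return "other"

# Invented toy sequences, not real U6atac RNAs.
examples = {
    "toy_AG": "GUGUUGUAUGAAAGGAGAGAAGG",
    "toy_CC": "GUGUUGUAUGAGCCGAGAGAAGG",
    "toy_other": "ACGUACGUAUGGACGU",
}
for name, seq in examples.items():
    print(name, classify_u6atac(seq))
```

A sequence carrying ‘ATGGA’, as reported for Selaginella, matches neither pattern and would be reported as "other" by this sketch.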

CONCLUSIONS

We have described a method to identify spliceosomal RNAs where, in a first step, candidates are identified using sensitive similarity searches or profile HMM searches. These candidates are then more rigorously examined using covariance models to arrive at a final prediction of spliceosomal RNA. New spliceosomal RNA sequences found are used as queries in similar searches until no further genes are identified.
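The iterative character of this procedure can be sketched as a loop that stops when a round yields nothing new. The code below is a schematic under stated assumptions, not the pipeline used in this work: `find_candidates` and `passes_cm` are hypothetical stand-ins for the similarity or profile HMM search and for the covariance model evaluation, and the toy implementations (shared k-mers, a length cut-off) exist only to make the example runnable.

```python
def iterative_screen(seed_queries, genomes, find_candidates, passes_cm):
    """Search, filter, and re-query until no further sequences are accepted.

    find_candidates(queries, genomes) -> iterable of candidate sequences
    passes_cm(candidate) -> bool
    Both callables are placeholders for external tools.
    """
    seeds = set(seed_queries)
    known = set(seeds)
    while True:
        candidates = set(find_candidates(known, genomes))
        new = {c for c in candidates if c not in known and passes_cm(c)}
        if not new:  # no further genes identified: stop
            break
        known |= new  # accepted predictions become queries in the next round
    return known - seeds

def toy_find(queries, genomes, k=4):
    """Toy similarity search: a sequence is a candidate if it shares any k-mer with a query."""
    kmers = {q[i:i + k] for q in queries for i in range(len(q) - k + 1)}
    return [g for g in genomes
            if any(g[i:i + k] in kmers for i in range(len(g) - k + 1))]

def toy_cm(candidate):
    """Toy filter standing in for a covariance model score threshold."""
    return len(candidate) >= 8

genomes = ["AAAACCCCGG", "CCGGTTTTAA", "TTAAGGGGCC", "ACACACAC", "AAAAC"]
found = iterative_screen(["AAAACCCC"], genomes, toy_find, toy_cm)
print(sorted(found))
```

In the toy data the second and third sequences are only reachable through sequences accepted in earlier rounds, which is the reason for feeding new predictions back in as queries.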

The results of this procedure clearly illustrate that highly sensitive local alignment searches and profile HMMs are important in the identification of spliceosomal RNAs. These RNAs tend to be conserved in sequence during evolution as compared to many other RNAs, and perhaps the protocol used here is particularly suited for this category of ncRNAs. Ideally, a combination of methods should be used to maximize sensitivity. At the same time, the covariance models are critical in order to evaluate the hits found in the initial searches. We have here relied to a large extent on the specificity of these models to predict ncRNAs and regard all RNAs reported here as strong predictions.

A large number of novel RNAs are identified in this work. Most noteworthy is the identification of RNAs that are components of the minor U12 spliceosome in phylogenetic groups that previously were not known to have these RNAs or any U12-type spliceosomal components or introns. An example is Trichinella, a nematode which, in contrast to other nematodes like C. elegans, contains minor spliceosomal RNA genes. We also have shown that minor spliceosomal RNAs are present in the deeply branching fungal branches Zygomycota and Chytridiomycota.

In summary, therefore, these results confirm previous studies of Russell et al. (8) that demonstrate an early origin of the minor spliceosome, as U12-type spliceosomal RNAs are present in a variety of evolutionarily distant phyla. At the same time, our results do not allow us to conclude that a minor spliceosome was present in the ancestor of eukaryotes. Our results also point to multiple instances in evolution where the minor spliceosome seems to have been lost. One example is in the development of nematodes, where the loss of U12-type RNAs occurred after the divergence of Pseudocoelomata. In the case of fungi, such RNAs were present in very early fungal evolution, while they were lost in the development of Microsporidia and Ascomycota. Furthermore, minor spliceosomal RNAs were lost in the development of Dictyostelium of the Mycetozoa branch and in the development of the heterokont and plant branches. From this it would seem that U12-type splicing has a comparatively marginal role and may be disposed of in many phylogenetic groups.

Figure 5. Alignment of U6atac spliceosomal RNA genes. Secondary structure elements are shown in bracket notation at the bottom of the alignment. CC motifs in the region supposed to pair to the 5′ splice site, as well as regions involved in base-pairing, are highlighted with color.


SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

M.D.L. was supported by a grant from CONACYT, The National Council for Science and Technology, Mexico. Funding to pay the Open Access publication charges for this article was provided by Swedish Research School of Genomics and Bioinformatics.

Conflict of interest statement. None declared.

REFERENCES

1. Nilsen, T.W. (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays, 25, 1147–1149.

2. Kiss, T. (2004) Biogenesis of small nuclear RNPs. J. Cell Sci., 117, 5949–5951.

3. Will, C.L., Schneider, C., MacMillan, A.M., Katopodis, N.F., Neubauer, G., Wilm, M., Luhrmann, R. and Query, C.C. (2001) A novel U2 and U11/U12 snRNP protein that associates with the pre-mRNA branch site. EMBO J., 20, 4536–4546.

4. Madhani, H.D. and Guthrie, C. (1992) A novel base-pairing interaction between U2 and U6 snRNAs suggests a mechanism for the catalytic activation of the spliceosome. Cell, 71, 803–817.

5. Wassarman, K.M. and Steitz, J.A. (1992) The low-abundance U11 and U12 small nuclear ribonucleoproteins (snRNPs) interact to form a two-snRNP complex. Mol. Cell. Biol., 12, 1276–1285.

6. Frilander, M.J. and Steitz, J.A. (1999) Initial recognition of U12-dependent introns requires both U11/5′ splice-site and U12/branchpoint interactions. Genes Dev., 13, 851–863.

7. Burge, C.B., Padgett, R.A. and Sharp, P.A. (1998) Evolutionary fates and origins of U12-type introns. Mol. Cell, 2, 773–785.

8. Russell, A.G., Charette, J.M., Spencer, D.F. and Gray, M.W. (2006) An early evolutionary origin for the minor spliceosome. Nature, 443, 863–866.

9. Mewes, H.W., Albermann, K., Bahr, M., Frishman, D., Gleissner, A., Hani, J., Heumann, K., Kleine, K., Maierl, A., Oliver, S.G. et al. (1997) Overview of the yeast genome. Nature, 387, 7–65.

10. Guthrie, C. and Patterson, B. (1988) Spliceosomal snRNAs. Annu. Rev. Genet., 22, 387–419.

11. Mitrovich, Q.M. and Guthrie, C. (2007) Evolution of small nuclear RNAs in S. cerevisiae, C. albicans and other hemiascomycetous yeasts. RNA, 13, 2066–2080.

12. Palfi, Z., Schimanski, B., Gunzl, A., Lucke, S. and Bindereif, A. (2005) U1 small nuclear RNP from Trypanosoma brucei: a minimal U1 snRNA with unusual protein components. Nucleic Acids Res., 33, 2493–2503.

13. Griffiths-Jones, S. (2007) Annotating noncoding RNA genes. Annu. Rev. Genom. Hum. Genet., 8, 279–298.

14. Eddy, S.R. (2002) A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics, 3, 18.

15. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

16. Pearson, W.R. and Lipman, D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.

17. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

18. Notredame, C., Higgins, D.G. and Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205–217.

19. Zuker, M. (1989) On finding all suboptimal foldings of an RNA molecule. Science, 244, 48–52.

20. Freyhult, E.K., Bollback, J.P. and Gardner, P.P. (2007) Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA. Genome Res., 17, 117–125.

21. Chakrabarti, K., Pearson, M., Grate, L., Sterne-Weiler, T., Deans, J., Donohue, J.P. and Ares, M. Jr (2007) Structural RNAs of known and unknown function identified in malaria parasites by comparative genomics and RNA analysis. RNA, 13, 1923–1939.

22. Davis, C.A., Brown, M.P. and Singh, U. (2007) Functional characterization of spliceosomal introns and identification of U2, U4 and U5 snRNAs in the deep-branching eukaryote Entamoeba histolytica. Eukaryot. Cell, 6, 940–948.

23. Fast, N.M. and Doolittle, W.F. (1999) Trichomonas vaginalis possesses a gene encoding the essential spliceosomal component PRP8. Mol. Biochem. Parasitol., 99, 275–278.

24. Vanacova, S., Yan, W., Carlton, J.M. and Johnson, P.J. (2005) Spliceosomal introns in the deep-branching eukaryote Trichomonas vaginalis. Proc. Natl Acad. Sci. USA, 102, 4430–4435.

25. Russell, A.G., Shutt, T.E., Watkins, R.F. and Gray, M.W. (2005) An ancient spliceosomal intron in the ribosomal protein L7a gene (Rpl7a) of Giardia lamblia. BMC Evol. Biol., 5, 45.

26. Nixon, J.E., Wang, A., Morrison, H.G., McArthur, A.G., Sogin, M.L., Loftus, B.J. and Samuelson, J. (2002) A spliceosomal intron in Giardia lamblia. Proc. Natl Acad. Sci. USA, 99, 3701–3705.

27. Collins, L. and Penny, D. (2005) Complex spliceosomal organization ancestral to extant eukaryotes. Mol. Biol. Evol., 22, 1053–1066.

28. Misumi, O., Matsuzaki, M., Nozaki, H., Miyagishima, S.Y., Mori, T., Nishida, K., Yagisawa, F., Yoshida, Y., Kuroiwa, H. and Kuroiwa, T. (2005) Cyanidioschyzon merolae genome. A tool for facilitating comparable studies on organelle biogenesis in photosynthetic eukaryotes. Plant Physiol., 137, 567–585.

29. Nozaki, H., Takano, H., Misumi, O., Terasawa, K., Matsuzaki, M., Maruyama, S., Nishida, K., Yagisawa, F., Yoshida, Y., Fujiwara, T. et al. (2007) A 100%-complete sequence reveals unusually simple genomic features in the hot-spring red alga Cyanidioschyzon merolae. BMC Biol., 5, 28.

30. Takahashi, Y., Tani, T. and Ohshima, Y. (1996) Spliceosomal introns in conserved sequences of U1 and U5 small nuclear RNA genes in yeast Rhodotorula hasegawae. J. Biochem., 120, 677–683.

31. Takahashi, Y., Urushiyama, S., Tani, T. and Ohshima, Y. (1993) An mRNA-type intron is present in the Rhodotorula hasegawae U2 small nuclear RNA gene. Mol. Cell. Biol., 13, 5613–5619.

32. Gilson, P.R., Su, V., Slamovits, C.H., Reith, M.E., Keeling, P.J. and McFadden, G.I. (2006) Complete nucleotide sequence of the chlorarachniophyte nucleomorph: nature's smallest nucleus. Proc. Natl Acad. Sci. USA, 103, 9566–9571.

33. Patel, A.A. and Steitz, J.A. (2003) Splicing double: insights from the second spliceosome. Nat. Rev. Mol. Cell Biol., 4, 960–970.

34. Thomas, J., Lea, K., Zucker-Aprison, E. and Blumenthal, T. (1990) The spliceosomal snRNAs of Caenorhabditis elegans. Nucleic Acids Res., 18, 2633–2642.

35. Blaxter, M.L., De Ley, P., Garey, J.R., Liu, L.X., Scheldeman, P., Vierstraete, A., Vanfleteren, J.R., Mackey, L.Y., Dorris, M., Frisse, L.M. et al. (1998) A molecular evolutionary framework for the phylum Nematoda. Nature, 392, 71–75.

36. Klein, D.J., Schmeing, T.M., Moore, P.B. and Steitz, T.A. (2001) The kink-turn: a new RNA secondary structure motif. EMBO J., 20, 4214–4221.

37. Rozhdestvensky, T.S., Tang, T.H., Tchirkova, I.V., Brosius, J., Bachellerie, J.P. and Huttenhofer, A. (2003) Binding of L7Ae protein to the K-turn of archaeal snoRNAs: a shared RNA binding motif for C/D and H/ACA box snoRNAs in Archaea. Nucleic Acids Res., 31, 869–877.

38. Vidovic, I., Nottrott, S., Hartmuth, K., Luhrmann, R. and Ficner, R. (2000) Crystal structure of the spliceosomal 15.5kD protein bound to a U4 snRNA fragment. Mol. Cell, 6, 1331–1342.

39. Schultz, A., Nottrott, S., Hartmuth, K. and Luhrmann, R. (2006) RNA structural requirements for the association of the spliceosomal hPrp31 protein with the U4 and U4atac small nuclear ribonucleoproteins. J. Biol. Chem., 281, 28278–28286.

40. Tarn, W.Y. and Steitz, J.A. (1996) Highly diverged U4 and U6 small nuclear RNAs required for splicing rare AT-AC introns. Science, 273, 1824–1832.

41. Incorvaia, R. and Padgett, R.A. (1998) Base pairing with U6atac snRNA is required for 5′ splice site activation of U12-dependent introns in vivo. RNA, 4, 709–718.


U11, U12 and U6atac RNAs in another nematode, T. spiralis (Figure 1). Predicted secondary structures of these RNAs are shown in Figure 4. We also found evidence of U11/U12-specific proteins in T. spiralis (Supplementary Data 4), providing further support for a minor spliceosome in this organism.

Basidiomycota, Zygomycota and Chytridiomycota. No minor spliceosomal components have been described in fungi, except for minor spliceosomal proteins and potential U12-type introns in R. oryzae (8), a species in the fungal Zygomycota lineage. However, we here identified minor spliceosomal RNA components in the Zygomycota P. blakesleeanus, R. oryzae and M. circinelloides, and in the Chytridiomycota B. dendrobatidis, S. punctatus and A. macrogynus. Secondary structure predictions of R. oryzae RNAs are shown in Figure 4, and further structures are shown in Supplementary Data 3. These results provide strong evidence of a U12-type spliceosome in these phylogenetic groups.

In the Basidiomycota phylum, there was previously no evidence of a U12-type spliceosome. However, we here identified a U12 RNA in Phakopsora meibomiae and a U4atac RNA in Phakopsora pachyrhizi. At the same time, there is so far no evidence of U12-type introns or of U12-specific proteins. Therefore, it is possible that the spliceosomal RNAs that we observe are pseudogenes and remnants from a U12 machinery that was present in an ancestral lineage.

Figure 3. Schematic phylogenetic tree. Phylogenetic groups and their relationships are shown together with example species (genus in italics). Species where one or more U12-type spliceosomal RNAs were found are highlighted (red circles), as well as branches where the U12-type RNAs seem to have been lost (dotted lines). In the case of Basidiomycota, only two different U12-type RNAs have been identified, and for this reason there is only weak evidence of a minor spliceosome. Numbers at branches indicate (1) number of genomes analyzed, (2) number of query sequences used and (3) number of new sequences identified.

Figure 4. Structures of selected spliceosomal RNAs. Highlighted regions are the U12 site pairing to the branch site (underlined), regions of U11 and U6atac proposed to pair to the 5′ splice site (underlined), U6atac-U12 interaction (shaded background), Sm-site (box with rounded corners) and K-turn (box) in U4atac RNA. Organisms represented are T. spiralis, P. polycephalum and R. oryzae. Structures of additional RNAs are in Supplementary Data 3.

Acanthamoeba castellanii and Physarum polycephalum. In A. castellanii, a U12 spliceosomal RNA as well as minor introns have been reported recently (8). These observations provided evidence of a minor spliceosome in this organism. Consistent with these results, we identified two additional U12-type RNAs, U11 and U6atac, in this species (Figure 1).

Acanthamoeba is believed to share a common ancestor with the Mycetozoa, where spliceosomal components or U12-type introns were not previously reported. However, we here identified U11, U12 and U6atac spliceosomal RNA genes in P. polycephalum (Figure 1 and Supplementary Data 1).

The minor spliceosome was lost at multiple instances during evolution

The fact that we identified U12-type RNAs in a range of species representing very diverse phyla, such as Fungi, Acanthamoeba/Mycetozoa, Streptophyta and Heterokonta, supports the notion that the minor spliceosome was an early invention in eukaryotic evolution (8). In fact, we cannot exclude that such a spliceosome was present in the last common ancestor of the eukaryotes. The detailed map of the phylogenetic distribution of U12-type RNAs also allows us to identify a number of occasions during evolution where it seems that the minor spliceosome was lost (Figure 3, dashed lines).

U12-type RNAs were found in T. spiralis but not in the other nematodes examined here. T. spiralis belongs to a clade that is probably deeply branching within nematodes and is distant to the Rhabditida and Chromadorea groups of nematodes (35). A mode of evolution therefore seems likely where the minor spliceosome was present at an early stage in nematode evolution but was lost in many branches (Figure 3).

There are other examples where the U12-type RNAs are missing in the fungi/metazoan lineage. Minor spliceosomal RNAs are present in a majority of metazoa, including Trichoplax, the simplest known species of the metazoan branch. An exception is Acropora millepora, a coral of the phylum Cnidaria. In addition, we failed to identify such RNAs in Monosiga brevicollis, a choanoflagellate and close relative of Metazoa. A minor spliceosome is probably present in the fungal phyla Zygomycota and Chytridiomycota, as discussed earlier. In Ascomycota and Microsporidia, on the other hand, these components seem to be lacking. It is likely that a minor spliceosome was present at an early stage in the evolution of fungi but was lost in the development of Ascomycota and Microsporidia. It is not clear why the minor spliceosome was lost in the Ascomycota, but in the case of Microsporidia it could be a consequence of the strong pressure to reduce genome size.

We identified minor spliceosome-type RNAs in P. polycephalum and A. castellanii but not in the evolutionarily related Entamoeba or Dictyostelium (Figure 3). This would suggest that the minor spliceosome was lost in the development of Entamoeba as well as in the Dictyostelium branch.

The analysis of Streptophyta (plant) genomes revealed the presence of minor spliceosomal RNAs, whereas only U2-dependent spliceosomal RNAs were found in green and red algae. Finally, in Heterokonta we found U12-type RNAs in the Oomycetes Phytophthora and Hyaloperonospora, but not in any diatoms or brown algae.

In summary, our results point to a large number of instances where the minor spliceosome was lost during evolution of fungi/metazoa, Mycetozoa, Streptophyta and heterokonts. We are not able to reach a conclusion as to other phyla, such as Euglenozoa and Alveolata, because we do not know whether the common ancestor of these lineages had a minor spliceosomal machinery.
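The reasoning used above to infer losses can be illustrated with a small sketch. It assumes, as a simplification that the text itself does not fully commit to, that the trait was present at the root of the supplied tree and can only be lost, never regained (a Dollo-style assumption). The tree and the leaf names `sp1` to `sp5` are hypothetical and do not represent any real phylogeny or the data of this study.

```python
from typing import Dict, Tuple, Union

# A tree is either a leaf name (str) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def count_losses(tree: Tree, present: Dict[str, bool]) -> Tuple[bool, int]:
    """Return (any_leaf_has_trait, minimal_number_of_losses) for a subtree.

    Assumption (illustrative only): the trait existed at the root of the
    supplied tree and can be lost but never regained.
    """
    if isinstance(tree, str):
        return present[tree], 0
    results = [count_losses(child, present) for child in tree]
    if not any(has for has, _ in results):
        # The whole clade lacks the trait; the parent counts it as one loss.
        return False, 0
    losses = sum(n for _, n in results)
    # Every child clade that lacks the trait entirely is one loss event.
    losses += sum(1 for has, _ in results if not has)
    return True, losses


# Hypothetical toy tree and presence/absence calls (not real data).
toy_tree: Tree = (("sp1", "sp2"), ("sp3", ("sp4", "sp5")))
toy_presence = {"sp1": True, "sp2": False, "sp3": False, "sp4": True, "sp5": False}
print(count_losses(toy_tree, toy_presence))  # (True, 3)
```

Under these assumptions, a clade in which no species carries the trait counts as a single loss, which is why a deeply branching species that retains the RNAs (as described for T. spiralis) implies loss in the other branches rather than independent gain.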

All U4 and U4atac RNAs have a K-turn motif

K-turn motifs have previously been identified in a large number of RNA families (36,37), including the U4 and U4atac RNAs (38,39). We found such a motif in all novel U4 and U4atac RNAs reported here. Examples are R. oryzae (Figure 4) and Phakopsora U4atac RNA (Supplementary Data 3), with characteristic noncanonical G-A and A-G pairs and a 3-nt loop. This would suggest that this motif is compulsory in U4 RNAs and that prediction accuracy may be improved by updating the covariance model in this respect.

Identification of a large number of novel U6atac orthologues

The U6atac RNA was previously identified in vertebrates, insects and plants (for references, see Supplementary Data 5). Here, we identified orthologues in a majority of metazoan species, in Zygomycota, Physarum, Acanthamoeba and in Oomycetes (Heterokonta). A multiple alignment of U6atac sequences was constructed, and selected sequences from that alignment are shown in Figure 5.

U6atac RNA pairs with U12 as well as with U4atac RNA (40). The novel sequences of U12, U6atac and U4atac that we have identified are consistent with these base-pairing interactions. Thus, the U6atac and U4atac sequences are all consistent with the formation of stems 1 and 2 in the complex of these RNAs (Figure 4). In addition, the U6atac/U12 helices 1a and 1b are phylogenetically supported.
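A consistency check of this kind can be sketched in a few lines. The example below is only an illustration, not the procedure used in this work: it tests whether two equal-length RNA segments can form a gap-free antiparallel helix, allowing Watson-Crick pairs and G-U wobble pairs. The function names and the partner segment `UCCUU` are hypothetical; the partner was simply constructed to be complementary to AAGGA.

```python
# Watson-Crick pairs plus the G-U wobble pair commonly allowed in RNA helices.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def _normalize(seq: str) -> str:
    """Upper-case the sequence and convert DNA notation (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def count_pairs(strand_5to3: str, partner_5to3: str) -> int:
    """Count paired positions when the two segments are aligned antiparallel."""
    a, b = _normalize(strand_5to3), _normalize(partner_5to3)
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    # Position i of one strand faces position -(i + 1) of the other.
    return sum((x, y) in PAIRS for x, y in zip(a, reversed(b)))


def can_pair(strand_5to3: str, partner_5to3: str) -> bool:
    """True if every position of the two segments is paired."""
    return count_pairs(strand_5to3, partner_5to3) == len(strand_5to3)


print(can_pair("AAGGA", "UCCUU"))     # True
print(count_pairs("GCCGA", "UCCUU"))  # 3
```

A real analysis would also have to handle bulges, internal loops and helix stability, which this sketch deliberately ignores.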

A sequence ‘AAGGA’ near the 5′ end of U6atac has been proposed to pair with a region at the 5′ splice site (40,41). This sequence is present in a majority of U6atac RNAs, i.e. in vertebrates, urochordates, S. purpuratus, Lottia gigantea, insects and plants, as well as in the fungi R. oryzae and P. blakesleeanus. The RNA of the lycophyte Selaginella moellendorffii (spikemoss) has the sequence ‘ATGGA’. However, we observed a different motif, ‘[GU]CCGA’ (in the following referred to as the CC variant, as opposed to the normal AG sequence), in Trichoplax, cnidarians, Reniera sp., A. castellanii, P. polycephalum, Physcomitrella patens and Oomycetes (Figure 5). We think that the predictions of U6atac RNA in these species are highly reliable because of sequence similarity, covariance model scores and ability to pair with U4atac and U12. Furthermore, it would seem that the CC variant does not represent a ‘paralogue’ of U6atac, as in all species examined either the AG or the CC homolog is present. It is intriguing that, if the AG motif was the ancestral version, a change to the CC variant occurred more than once during evolution. Examples are in the land plant branch and in the development of the lower metazoa Trichoplax and Nematostella. However, the possibility that the CC variant represents the ancestral version of the gene cannot be excluded. Also in such a case, there must have been multiple independent transitions from CC to AG. The presence of the variant CC sequence is difficult to explain if the sequence AAGGA is important in pairing to the 5′ splice site (40,41). U12-type introns of A. castellanii and Phytophthora with a sequence able to pair with AAGGA are presented in Russell et al. (8), but these introns are not able to pair well with U6atac RNAs with the CC motif.
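The distinction between the AG and CC variants can be expressed as a simple pattern match. The following is a minimal sketch, assuming that the motif lies within the first 30 nucleotides of the sequence; that window size, the function name and the three example sequences are hypothetical and are not taken from the alignment in Figure 5.

```python
import re

AG_MOTIF = re.compile(r"AAGGA")     # the 'normal' AG sequence
CC_MOTIF = re.compile(r"[GU]CCGA")  # the CC variant described in the text


def classify_u6atac_motif(seq: str, window: int = 30) -> str:
    """Classify the 5' region of a U6atac sequence as 'AG', 'CC' or 'other'.

    The window of 30 nt is an illustrative assumption, not a value from the paper.
    """
    # Accept gene (DNA) notation by converting T to U before matching.
    region = seq.upper().replace("T", "U")[:window]
    if AG_MOTIF.search(region):
        return "AG"
    if CC_MOTIF.search(region):
        return "CC"
    return "other"


# Made-up sequences for illustration only.
print(classify_u6atac_motif("GUGUUGUAUGAAAGGAGAGAAGGUUAGC"))  # AG
print(classify_u6atac_motif("GUGUUGUAUGCUCCGAGAGCUUC"))       # CC
print(classify_u6atac_motif("GTGTTGTATGAATGGAGAG"))           # other
```

A sequence carrying ‘ATGGA’, as described for Selaginella, falls into the 'other' class here because it matches neither pattern exactly.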

CONCLUSIONS

We have described a method to identify spliceosomal RNAs where, in a first step, candidates are identified using sensitive similarity searches or profile HMM searches. These candidates are then more rigorously examined using covariance models to arrive at a final prediction of spliceosomal RNA. New spliceosomal RNA sequences found are used as queries in similar searches until no further genes are identified.
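The control flow of this procedure can be sketched as a loop that runs until no new sequence is accepted. The sketch below is not the pipeline used in this work: `find_candidates` and `cm_score` are hypothetical stand-ins for a sensitive similarity or profile HMM search and for a covariance model scoring step, and the threshold is an assumed parameter.

```python
from typing import Callable, Iterable, Set


def iterative_search(
    seeds: Iterable[str],
    find_candidates: Callable[[str], Iterable[str]],
    cm_score: Callable[[str], float],
    threshold: float,
) -> Set[str]:
    """Expand a seed set until no new candidate passes the covariance-model filter."""
    accepted: Set[str] = set(seeds)
    queue = list(accepted)
    while queue:
        query = queue.pop()
        for candidate in find_candidates(query):
            if candidate in accepted:
                continue
            # Only candidates supported by the (stand-in) covariance model are kept.
            if cm_score(candidate) >= threshold:
                accepted.add(candidate)
                queue.append(candidate)  # new predictions become queries
    return accepted


# Toy stand-ins (hypothetical): a fixed hit table and fixed scores.
toy_hits = {"a": ["b", "x"], "b": ["c"], "c": [], "x": ["y"], "y": []}
toy_scores = {"b": 10.0, "c": 10.0, "x": 1.0, "y": 10.0}
result = iterative_search(["a"], toy_hits.__getitem__, toy_scores.__getitem__, 5.0)
print(sorted(result))  # ['a', 'b', 'c']
```

In the toy run, `x` is rejected by the scoring step, so `y` is never reached even though it would score well. This shows how much such a scheme depends on the specificity of the filtering model.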

The results of this procedure clearly illustrate that highly sensitive local alignment searches and profile HMMs are important in the identification of spliceosomal RNAs. These RNAs tend to be conserved in sequence during evolution as compared to many other RNAs, and perhaps the protocol used here is particularly suited for this category of ncRNAs. Ideally, a combination of methods should be used to maximize sensitivity. At the same time, the covariance models are critical in order to evaluate the hits found in the initial searches. We have here relied to a large extent on the specificity of these models to predict ncRNAs and regard all RNAs reported here as strong predictions.

A large number of novel RNAs are identified in this work. Most noteworthy is the identification of RNAs being components of the minor U12 spliceosome in phylogenetic groups that previously were not known to have these RNAs or any U12-type spliceosomal components or introns. Examples are Trichinella, a nematode which, in contrast to other nematodes like C. elegans, contains minor spliceosomal RNA genes. We also have shown that minor spliceosomal RNAs are present in the deeply branching fungal branches Zygomycota and Chytridiomycota.

In summary, therefore, these results confirm previous studies of Russell et al. (8) that demonstrate an early origin of the minor spliceosome, as U12-type spliceosomal RNAs are present in a variety of evolutionarily distant phyla. At the same time, our results do not allow us to conclude that a minor spliceosome was present in the ancestor of eukaryotes. Our results also point to multiple instances in evolution where the minor spliceosome seems to have been lost. One example is in the development of nematodes, where the loss of U12-type RNAs occurred after the divergence of Pseudocoelomata. In the case of fungi, such RNAs were present in very early fungal evolution, while they were lost in the development of Microsporidia and Ascomycota. Furthermore, minor spliceosomal RNAs were lost in the development of Dictyostelium of the Mycetozoa branch and in the development of the heterokont and plant branches. From this it would seem that U12-type splicing has a comparatively marginal role and may be disposed of in many phylogenetic groups.

Figure 5. Alignment of U6atac spliceosomal RNA genes. Secondary structure elements are shown in a bracket notation at the bottom of the alignment. CC motifs in the region supposed to pair to the 5′ splice site, as well as regions involved in base-pairing, are highlighted with color.


SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

M.D.L. was supported by a grant from CONACYT, The National Council for Science and Technology, Mexico. Funding to pay the Open Access publication charges for this article was provided by Swedish Research School of Genomics and Bioinformatics.

Conflict of interest statement. None declared.

REFERENCES

1. Nilsen, T.W. (2003) The spliceosome: the most complex macromolecular machine in the cell. Bioessays, 25, 1147–1149.

2. Kiss, T. (2004) Biogenesis of small nuclear RNPs. J. Cell Sci., 117, 5949–5951.

3. Will, C.L., Schneider, C., MacMillan, A.M., Katopodis, N.F., Neubauer, G., Wilm, M., Luhrmann, R. and Query, C.C. (2001) A novel U2 and U11/U12 snRNP protein that associates with the pre-mRNA branch site. EMBO J., 20, 4536–4546.

4. Madhani, H.D. and Guthrie, C. (1992) A novel base-pairing interaction between U2 and U6 snRNAs suggests a mechanism for the catalytic activation of the spliceosome. Cell, 71, 803–817.

5. Wassarman, K.M. and Steitz, J.A. (1992) The low-abundance U11 and U12 small nuclear ribonucleoproteins (snRNPs) interact to form a two-snRNP complex. Mol. Cell. Biol., 12, 1276–1285.

6. Frilander, M.J. and Steitz, J.A. (1999) Initial recognition of U12-dependent introns requires both U11/5′ splice-site and U12/branchpoint interactions. Genes Dev., 13, 851–863.

7. Burge, C.B., Padgett, R.A. and Sharp, P.A. (1998) Evolutionary fates and origins of U12-type introns. Mol. Cell, 2, 773–785.

8. Russell, A.G., Charette, J.M., Spencer, D.F. and Gray, M.W. (2006) An early evolutionary origin for the minor spliceosome. Nature, 443, 863–866.

9. Mewes, H.W., Albermann, K., Bahr, M., Frishman, D., Gleissner, A., Hani, J., Heumann, K., Kleine, K., Maierl, A., Oliver, S.G. et al. (1997) Overview of the yeast genome. Nature, 387, 7–65.

10. Guthrie, C. and Patterson, B. (1988) Spliceosomal snRNAs. Annu. Rev. Genet., 22, 387–419.

11. Mitrovich, Q.M. and Guthrie, C. (2007) Evolution of small nuclear RNAs in S. cerevisiae, C. albicans and other hemiascomycetous yeasts. RNA, 13, 2066–2080.

12. Palfi, Z., Schimanski, B., Gunzl, A., Lucke, S. and Bindereif, A. (2005) U1 small nuclear RNP from Trypanosoma brucei: a minimal U1 snRNA with unusual protein components. Nucleic Acids Res., 33, 2493–2503.

13. Griffiths-Jones, S. (2007) Annotating noncoding RNA genes. Annu. Rev. Genom. Hum. Genet., 8, 279–298.

14. Eddy, S.R. (2002) A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics, 3, 18.

15. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

16. Pearson, W.R. and Lipman, D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.

17. Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

18. Notredame, C., Higgins, D.G. and Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205–217.

19. Zuker, M. (1989) On finding all suboptimal foldings of an RNA molecule. Science, 244, 48–52.

20. Freyhult, E.K., Bollback, J.P. and Gardner, P.P. (2007) Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA. Genome Res., 17, 117–125.

21. Chakrabarti, K., Pearson, M., Grate, L., Sterne-Weiler, T., Deans, J., Donohue, J.P. and Ares, M. Jr (2007) Structural RNAs of known and unknown function identified in malaria parasites by comparative genomics and RNA analysis. RNA, 13, 1923–1939.

22. Davis, C.A., Brown, M.P. and Singh, U. (2007) Functional characterization of spliceosomal introns and identification of U2, U4 and U5 snRNAs in the deep-branching eukaryote Entamoeba histolytica. Eukaryot. Cell, 6, 940–948.

23. Fast, N.M. and Doolittle, W.F. (1999) Trichomonas vaginalis possesses a gene encoding the essential spliceosomal component, PRP8. Mol. Biochem. Parasitol., 99, 275–278.

24. Vanacova, S., Yan, W., Carlton, J.M. and Johnson, P.J. (2005) Spliceosomal introns in the deep-branching eukaryote Trichomonas vaginalis. Proc. Natl Acad. Sci. USA, 102, 4430–4435.

25. Russell, A.G., Shutt, T.E., Watkins, R.F. and Gray, M.W. (2005) An ancient spliceosomal intron in the ribosomal protein L7a gene (Rpl7a) of Giardia lamblia. BMC Evol. Biol., 5, 45.

26. Nixon, J.E., Wang, A., Morrison, H.G., McArthur, A.G., Sogin, M.L., Loftus, B.J. and Samuelson, J. (2002) A spliceosomal intron in Giardia lamblia. Proc. Natl Acad. Sci. USA, 99, 3701–3705.

27. Collins, L. and Penny, D. (2005) Complex spliceosomal organization ancestral to extant eukaryotes. Mol. Biol. Evol., 22, 1053–1066.

28. Misumi, O., Matsuzaki, M., Nozaki, H., Miyagishima, S.Y., Mori, T., Nishida, K., Yagisawa, F., Yoshida, Y., Kuroiwa, H. and Kuroiwa, T. (2005) Cyanidioschyzon merolae genome. A tool for facilitating comparable studies on organelle biogenesis in photosynthetic eukaryotes. Plant Physiol., 137, 567–585.

29. Nozaki, H., Takano, H., Misumi, O., Terasawa, K., Matsuzaki, M., Maruyama, S., Nishida, K., Yagisawa, F., Yoshida, Y., Fujiwara, T. et al. (2007) A 100%-complete sequence reveals unusually simple genomic features in the hot-spring red alga Cyanidioschyzon merolae. BMC Biol., 5, 28.

30. Takahashi, Y., Tani, T. and Ohshima, Y. (1996) Spliceosomal introns in conserved sequences of U1 and U5 small nuclear RNA genes in yeast Rhodotorula hasegawae. J. Biochem., 120, 677–683.

31. Takahashi, Y., Urushiyama, S., Tani, T. and Ohshima, Y. (1993) An mRNA-type intron is present in the Rhodotorula hasegawae U2 small nuclear RNA gene. Mol. Cell. Biol., 13, 5613–5619.

32. Gilson, P.R., Su, V., Slamovits, C.H., Reith, M.E., Keeling, P.J. and McFadden, G.I. (2006) Complete nucleotide sequence of the chlorarachniophyte nucleomorph: nature's smallest nucleus. Proc. Natl Acad. Sci. USA, 103, 9566–9571.

33. Patel, A.A. and Steitz, J.A. (2003) Splicing double: insights from the second spliceosome. Nat. Rev. Mol. Cell Biol., 4, 960–970.

34. Thomas, J., Lea, K., Zucker-Aprison, E. and Blumenthal, T. (1990) The spliceosomal snRNAs of Caenorhabditis elegans. Nucleic Acids Res., 18, 2633–2642.

35. Blaxter, M.L., De Ley, P., Garey, J.R., Liu, L.X., Scheldeman, P., Vierstraete, A., Vanfleteren, J.R., Mackey, L.Y., Dorris, M., Frisse, L.M. et al. (1998) A molecular evolutionary framework for the phylum Nematoda. Nature, 392, 71–75.

36. Klein, D.J., Schmeing, T.M., Moore, P.B. and Steitz, T.A. (2001) The kink-turn: a new RNA secondary structure motif. EMBO J., 20, 4214–4221.

37. Rozhdestvensky, T.S., Tang, T.H., Tchirkova, I.V., Brosius, J., Bachellerie, J.P. and Huttenhofer, A. (2003) Binding of L7Ae protein to the K-turn of archaeal snoRNAs: a shared RNA binding motif for C/D and H/ACA box snoRNAs in Archaea. Nucleic Acids Res., 31, 869–877.

38. Vidovic, I., Nottrott, S., Hartmuth, K., Luhrmann, R. and Ficner, R. (2000) Crystal structure of the spliceosomal 15.5kD protein bound to a U4 snRNA fragment. Mol. Cell, 6, 1331–1342.

39. Schultz, A., Nottrott, S., Hartmuth, K. and Luhrmann, R. (2006) RNA structural requirements for the association of the spliceosomal hPrp31 protein with the U4 and U4atac small nuclear ribonucleoproteins. J. Biol. Chem., 281, 28278–28286.

40. Tarn, W.Y. and Steitz, J.A. (1996) Highly diverged U4 and U6 small nuclear RNAs required for splicing rare AT-AC introns. Science, 273, 1824–1832.

41. Incorvaia, R. and Padgett, R.A. (1998) Base pairing with U6atac snRNA is required for 5′ splice site activation of U12-dependent introns in vivo. RNA, 4, 709–718.

3010 Nucleic Acids Research 2008 Vol 36 No 9

Page 7: Computational screen for spliceosomal RNA genes aids in defining the phylogenetic distribution of major and minor spliceosomal components

Figure 4 Structures of selected spliceosomal RNAs Highlighted regions are the U12 site pairing to the branch site (underlined) regions of U11 andU6atac proposed to pair to the 50 splice (underlined) U6atac-U12 interaction (shaded background) Sm-site (box with rounded corners) and K-turn(box) in U4atac RNA Organisms represented are T spiralis P polycephalum and R oryzae Structures of additional RNAs are in SupplementaryData 3

Nucleic Acids Research 2008 Vol 36 No 9 3007

here identified a U12 RNA in Phakopsora meibomiae anda U4atac RNA in Phakopsora pachyrhizi At the sametime there is so far no evidence of U12-type introns or ofU12-specific proteins Therefore it is possible that thespliceosomal RNAs that we observe are pseudogenes andremnants from a U12 machinery that was present in anancestral lineage

Acanthamoeba castellanii and Physarum polycephalum InA castellanii a U12 spliceosomal RNA as well as minorintrons have been reported recently (8) These observa-tions provided evidence of a minor spliceosome in thisorganism Consistent with these results we identified twoadditional U12-type RNAs U11 and U6atac in thisspecies (Figure 1)Acanthamoeba is believed to share a common ancestor

with the Mycetozoa where spliceosomal components orU12-type introns were not previously reported Howeverwe here identified U11 U12 and U6atac spliceosomalRNA genes in P polycephalum (Figure 1 and Supplemen-tary Data 1)

The minor spliceosome was lost at multiple instancesduring evolution

The fact that we identified U12-type RNAs in a range ofspecies representing very diverse phyla such as FungiAcanthamoebaMycetozoa Streptophyta and Hetero-konta support the notion that the minor spliceosomewas an early invention in eukaryotic evolution (8) In factwe cannot exclude that such a spliceosome was present inthe last common ancestor of the eukaryotes The detailedmap of the phylogenetic distribution of U12-type RNAsalso allows us to identify a number of occasions duringevolution where it seems that the minor spliceosome waslost (Figure 3 dashed lines)U12-type RNAs were found in T spiralis but not in the

other nematodes examined here T spiralis belongs to aclade that is probably deeply branching within nematodesand is distant to the Rhabditida and Chromadorea groupsof nematodes (35) A mode of evolution therefore seemslikely where the minor spliceosome was present at anearly stage in nematode evolution but was lost in manybranches (Figure 3)There are other examples where the U12-type RNAs are

missing in the fungimetazoan lineage Minor spliceosomalRNAs are present in a majority of metazoa includingTrichoplax the simplest known species of the metazoanbranch An exception is Acropora millepora a coral of thephylum Cnidaria In addition we failed to identify suchRNAs inMonosiga brevicollis a choanoflagellate and closerelative of Metazoa A minor spliceosome is probablypresent in the fungal phyla Zygomycota and Chytridio-mycota as discussed earlier In Ascomycota and Micro-sporidia on the other hand these components seem to belacking It is likely that a minor spliceosome was present atan early stage in the evolution of fungi but was lost inthe development of Ascomycota and Microsporidia It isnot clear why the minor spliceosome was lost in theAscomycota but in the case of Microsporidia it could be aconsequence of the strong pressure to reduce genome size

We identified minor spliceosome-type RNAs inP polycephalum and A castellanii but not in the evolu-tionary related Entamoeba or Dictyostelium (Figure 3)This would suggest that the minor spliceosome was lostin the development of Entamoeba as well as in theDictyostelium branch

The analysis of Streptophyta (plant) genomes revealedthe presence of minor spliceosomal RNAs whereas onlyU2-dependent spliceosomal RNAs were found in greenand red algae Finally in Heterokonta we found U12-typeRNAs in the Oomycetes Phytophthora and Hyalonosporabut not in any diatoms or brown algae

In summary our results point to a large number ofinstances where the minor spliceosome was lost duringevolution of fungimetazoa Mycetozoa Streptophyta andheterkonts We are not able to reach a conclusion as toother phyla such as Euglenozoa and Alveolata because wedo not know whether the common ancestor to theselineages had a minor spliceosomal machinery

All U4 and U4atac RNAs have a K-turn motif

K-turn motifs have previously been identified in a largenumber of RNA families (3637) including the U4 andU4atac RNAs (3839) We found such a motif in all novelU4 and U4atac RNAs reported here Examples areR oryzae (Figure 4) and Phakopsora U4atac RNA(Supplementary Data 3) with characteristic noncanonicalG-A and A-G pairs and a 3-nt loop This would suggestthat this motif is compulsory in U4 RNAs and thatprediction accuracy may be improved by updating thecovariance model in this respect

Identification of a large number of novelU6atac orthologues

The U6atac RNA was previously identified in vertebratesinsects and plants (for references see SupplementaryData 5) Here we identified orthologues in a majority ofmetazoan species in Zygomycota Physarum Acantha-moeba and in Oomycetes (Heterokonta) A multiple align-ment of U6atac sequences was constructed and selectedsequences from that alignment are shown in Figure 5

U6atac RNA pairs with U12 as well as with U4atacRNA (40) The novel sequences of U12 U6atac andU4atac that we have identified are consistent with thesebase-pairing interactions Thus the U6atac and U4atacsequences are all consistent with the formation of thestems 1 and 2 in the complex of these RNAs (Figure 4) Inaddition the U6atacU12 helices 1a and 1b are phyloge-netically supported

A sequence lsquoAAGGArsquo near the 50 end of U6atac hasbeen proposed to pair with a region at the 50 splice site(4041) This sequence is present in a majority of U6atacRNAs ie in vertebrates urochodates S purpuratusLottia gigantea insects and plants as well as in the fungiR oryzae and P blakesleeanus The RNA of the lycophyteSelaginella moellendorffii (spikemoss) has the sequencelsquoATGGArsquo However we observed a different motiflsquo[GU]CCGArsquo (in the following referred to as the CCvariant as opposed to the normal AG sequence)in Trichoplax cnidarians Reniera sp A castellanii

3008 Nucleic Acids Research 2008 Vol 36 No 9

P polycephalum Physcomitrella patens and Oomycetes(Figure 5) We think that the predictions of U6atac RNAin these species are highly reliable because of sequencesimilarity covariance model scores and ability to pair withU4atac and U12 Furthermore it would seem that the CCvariant does not represent a lsquoparaloguersquo of U6atac as in allspecies examined either the AG or the CC homolog ispresent It is intriguing that if the AG motif was theancestral version a change to the CC variant occurred morethan once during evolution Examples are in the land plantbranch and in the development of the lower metazoaTrichoplax andNematostella However the possibility thatthe CC variant represents the ancestral version of the genecannot be excluded Also in such a case there must havebeen multiple independent transitions fromCC to AG Thepresence of the variant CC sequence is difficult to explain ifthe sequence AAGGA is important in pairing to the 50

splice site (4041) U12-type introns of A castellanii andPhytophthora with a sequence able to pair with AAGGAare presented in Russell et al (8) but these introns are notable to pair well with U6atac RNAs with the CC motif

CONCLUSIONS

We have described a method to identify spliceosomalRNAs where in a first step candidates are identified usingsensitive similarity searches or by profile HMM searchesThese candidates are then more rigorously examined usingcovariance models to arrive at a final prediction ofspliceosomal RNA New spliceosomal RNA sequencesfound are used as queries in similar searches until nofurther genes are identified

The results of this procedure clearly illustrate thathighly sensitive local alignment searches and profileHMMs are important in the identification of spliceosomalRNAs These RNAs tend to be conserved in sequenceduring evolution as compared to many other RNAs andperhaps the protocol used here is particularly suited for

this category of ncRNAs Ideally a combination ofmethods should be used to maximize sensitivity At thesame time the covariance models are critical in order toevaluate the hits found in the initial searches We havehere relied to a large extent on the specificity of thesemodels to predict ncRNAs and regard all RNAs reportedhere as strong predictionsA large number of novel RNAs are identified in this

work Most noteworthy is the identification of RNAsbeing components of the minor U12 spliceosome inphylogenetic groups that previously were not known tohave these RNAs or any U12-type spliceosomal compo-nents or introns Examples are Trichinella a nematodewhich in contrast to other nematodes like C eleganscontains minor spliceosomal RNA genes We also haveshown that minor spliceosomal RNAs are present in thedeeply branching fungal branches Zygomycota andChytridiomycotaIn summary therefore these results confirm previous

studies of Russell et al (8) that demonstrate an earlyorigin of the minor spliceosome as U12-type spliceosomalRNAs are present in a variety of evolutionary distantphyla At the same time our results do not allow us toconclude that a minor spliceosome was present in theancestor of eukaryotes Our results also point to multipleinstances in evolution where the minor spliceosome seemto have been lost One example is in the development ofnematodes where the loss of U12-type RNAs occurredafter the divergence of Pseudocoelomata In the caseof fungi such RNAs were present in very early fungalevolution while they were lost in the development ofMicrosporidia and Ascomycota Furthermore minor spli-ceosomal RNAs were lost in the development of Dicty-ostelium of the Mycetozoa branch and in the developmentof the heterokont and plant branches From this itwould seem that U12-type splicing has a comparativelymarginal role and may be disposed of in many phyloge-netic groups


Page 8: Computational screen for spliceosomal RNA genes aids in defining the phylogenetic distribution of major and minor spliceosomal components

here identified a U12 RNA in Phakopsora meibomiae and a U4atac RNA in Phakopsora pachyrhizi. At the same time, there is so far no evidence of U12-type introns or of U12-specific proteins. Therefore, it is possible that the spliceosomal RNAs that we observe are pseudogenes and remnants of a U12 machinery that was present in an ancestral lineage.

Acanthamoeba castellanii and Physarum polycephalum. In A. castellanii, a U12 spliceosomal RNA as well as minor introns have been reported recently (8). These observations provided evidence of a minor spliceosome in this organism. Consistent with these results, we identified two additional U12-type RNAs, U11 and U6atac, in this species (Figure 1).

Acanthamoeba is believed to share a common ancestor with the Mycetozoa, where U12-type spliceosomal components or introns were not previously reported. However, we here identified U11, U12 and U6atac spliceosomal RNA genes in P. polycephalum (Figure 1 and Supplementary Data 1).

The minor spliceosome was lost at multiple instances during evolution

The fact that we identified U12-type RNAs in a range of species representing very diverse phyla, such as Fungi, Acanthamoeba/Mycetozoa, Streptophyta and Heterokonta, supports the notion that the minor spliceosome was an early invention in eukaryotic evolution (8). In fact, we cannot exclude that such a spliceosome was present in the last common ancestor of the eukaryotes. The detailed map of the phylogenetic distribution of U12-type RNAs also allows us to identify a number of occasions during evolution where it seems that the minor spliceosome was lost (Figure 3, dashed lines).

U12-type RNAs were found in T. spiralis but not in the other nematodes examined here. T. spiralis belongs to a clade that is probably deeply branching within nematodes and is distant from the Rhabditida and Chromadorea groups of nematodes (35). A mode of evolution therefore seems likely where the minor spliceosome was present at an early stage in nematode evolution but was lost in many branches (Figure 3).

There are other examples where the U12-type RNAs are missing in the fungi/metazoan lineage. Minor spliceosomal RNAs are present in a majority of metazoa, including Trichoplax, the simplest known species of the metazoan branch. An exception is Acropora millepora, a coral of the phylum Cnidaria. In addition, we failed to identify such RNAs in Monosiga brevicollis, a choanoflagellate and close relative of Metazoa. A minor spliceosome is probably present in the fungal phyla Zygomycota and Chytridiomycota, as discussed earlier. In Ascomycota and Microsporidia, on the other hand, these components seem to be lacking. It is likely that a minor spliceosome was present at an early stage in the evolution of fungi but was lost in the development of Ascomycota and Microsporidia. It is not clear why the minor spliceosome was lost in the Ascomycota, but in the case of Microsporidia it could be a consequence of the strong pressure to reduce genome size.

We identified minor spliceosome-type RNAs in P. polycephalum and A. castellanii, but not in the evolutionarily related Entamoeba or Dictyostelium (Figure 3). This would suggest that the minor spliceosome was lost in the development of Entamoeba as well as in the Dictyostelium branch.

The analysis of Streptophyta (plant) genomes revealed the presence of minor spliceosomal RNAs, whereas only U2-dependent spliceosomal RNAs were found in green and red algae. Finally, in Heterokonta we found U12-type RNAs in the Oomycetes Phytophthora and Hyaloperonospora, but not in any diatoms or brown algae.

In summary, our results point to a large number of instances where the minor spliceosome was lost during evolution of fungi/metazoa, Mycetozoa, Streptophyta and heterokonts. We are not able to reach a conclusion as to other phyla, such as Euglenozoa and Alveolata, because we do not know whether the common ancestor of these lineages had a minor spliceosomal machinery.
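The reasoning behind counting loss events can be sketched in a few lines of Python. This is an illustration only and not the analysis performed here: it assumes, purely for the sake of the example, that the minor spliceosome was present at the root of the tree and was never regained once lost, so that every maximal clade in which no species carries U12-type RNAs corresponds to one inferred loss. The function names `leaves` and `inferred_losses`, the nested-tuple tree encoding and the placeholder taxa A to E are all hypothetical; the toy tree is not the phylogeny of Figure 3.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """Return the leaf names of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def inferred_losses(tree: Tree, present: Dict[str, bool]) -> List[List[str]]:
    """Return the leaf sets of maximal clades lacking the trait.

    Assumption (for illustration): the trait was present at the root and is
    never regained, so each maximal all-absent clade counts as one loss.
    """
    names = leaves(tree)
    if not any(present[name] for name in names):
        return [names]          # whole clade lacks the trait: one loss
    if isinstance(tree, str):
        return []               # a single leaf that has the trait
    losses: List[List[str]] = []
    for child in tree:
        losses.extend(inferred_losses(child, present))
    return losses


# Toy example with placeholder taxa, not real data.
toy_tree: Tree = (("A", "B"), (("C", "D"), "E"))
toy_present = {"A": True, "B": False, "C": False, "D": False, "E": True}
toy_losses = inferred_losses(toy_tree, toy_present)
print(len(toy_losses), toy_losses)
```

In the toy example, taxon B and the clade (C, D) each account for one inferred loss. If the ancestral state is unknown, as is the case for Euglenozoa and Alveolata in the text, this kind of count cannot be interpreted.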

All U4 and U4atac RNAs have a K-turn motif

K-turn motifs have previously been identified in a large number of RNA families (36,37), including the U4 and U4atac RNAs (38,39). We found such a motif in all novel U4 and U4atac RNAs reported here. Examples are the R. oryzae (Figure 4) and Phakopsora U4atac RNAs (Supplementary Data 3), with characteristic noncanonical G-A and A-G pairs and a 3-nt loop. This would suggest that this motif is compulsory in U4 RNAs and that prediction accuracy may be improved by updating the covariance model in this respect.

Identification of a large number of novel U6atac orthologues

The U6atac RNA was previously identified in vertebrates, insects and plants (for references, see Supplementary Data 5). Here, we identified orthologues in a majority of metazoan species, in Zygomycota, Physarum, Acanthamoeba and in Oomycetes (Heterokonta). A multiple alignment of U6atac sequences was constructed, and selected sequences from that alignment are shown in Figure 5.

U6atac RNA pairs with U12 as well as with U4atac RNA (40). The novel sequences of U12, U6atac and U4atac that we have identified are consistent with these base-pairing interactions. Thus, the U6atac and U4atac sequences are all consistent with the formation of stems 1 and 2 in the complex of these RNAs (Figure 4). In addition, the U6atac/U12 helices 1a and 1b are phylogenetically supported.

A sequence ‘AAGGA’ near the 5′ end of U6atac has been proposed to pair with a region at the 5′ splice site (40,41). This sequence is present in a majority of U6atac RNAs, i.e. in vertebrates, urochordates, S. purpuratus, Lottia gigantea, insects and plants, as well as in the fungi R. oryzae and P. blakesleeanus. The RNA of the lycophyte Selaginella moellendorffii (spikemoss) has the sequence ‘ATGGA’. However, we observed a different motif, ‘[GU]CCGA’ (in the following referred to as the CC variant, as opposed to the normal AG sequence), in Trichoplax, cnidarians, Reniera sp., A. castellanii, P. polycephalum, Physcomitrella patens and Oomycetes (Figure 5). We think that the predictions of U6atac RNA in these species are highly reliable because of sequence similarity, covariance model scores and ability to pair with U4atac and U12. Furthermore, it would seem that the CC variant does not represent a ‘paralogue’ of U6atac, as in all species examined either the AG or the CC homolog is present. It is intriguing that, if the AG motif was the ancestral version, a change to the CC variant occurred more than once during evolution. Examples are in the land plant branch and in the development of the lower metazoa Trichoplax and Nematostella. However, the possibility that the CC variant represents the ancestral version of the gene cannot be excluded. In that case, too, there must have been multiple independent transitions from CC to AG. The presence of the variant CC sequence is difficult to explain if the sequence AAGGA is important in pairing to the 5′ splice site (40,41). U12-type introns of A. castellanii and Phytophthora with a sequence able to pair with AAGGA are presented in Russell et al. (8), but these introns are not able to pair well with U6atac RNAs with the CC motif.
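The distinction between the AG sequence and the CC variant can be expressed as a simple pattern match. The following is a minimal sketch in Python and not the procedure used in this work. Only the motifs AAGGA and [GU]CCGA are taken from the text; the function name `classify_u6atac`, the 40-nt search window standing in for "near the 5′ end" and the short test strings are assumptions made for illustration and are not real U6atac sequences.

```python
import re

# Motifs from the text, written in the DNA alphabet (T in place of U).
AG_MOTIF = re.compile(r"AAGGA")     # the "normal" AG sequence
CC_MOTIF = re.compile(r"[GT]CCGA")  # the CC variant, [GU]CCGA


def classify_u6atac(seq: str, window: int = 40) -> str:
    """Classify a candidate by the motif found in its first `window` nt.

    `window` is an assumed cutoff, not a value taken from the paper.
    """
    region = seq.upper().replace("U", "T")[:window]
    has_ag = AG_MOTIF.search(region) is not None
    has_cc = CC_MOTIF.search(region) is not None
    if has_ag and has_cc:
        return "both"
    if has_ag:
        return "AG"
    if has_cc:
        return "CC"
    return "other"


# Made-up strings that merely contain the motifs discussed in the text.
for toy in ("GUGUUGUAUGAAAGGAGAGAAGG", "ACGUCCGAUU", "CCAUGGAUU"):
    print(toy, classify_u6atac(toy))
```

A sequence such as the ‘ATGGA’ form reported for Selaginella falls into the "other" class under this scheme, which is why a real screen would need to inspect the alignment rather than rely on exact motif matches alone.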

CONCLUSIONS

We have described a method to identify spliceosomal RNAs in which, in a first step, candidates are identified using sensitive similarity searches or profile HMM searches. These candidates are then examined more rigorously using covariance models to arrive at a final prediction of spliceosomal RNA. New spliceosomal RNA sequences found are used as queries in similar searches until no further genes are identified.
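The iterative character of this procedure can be outlined as a short Python sketch. It is a schematic under stated assumptions, not the pipeline used in this work: `find_candidates` is a hypothetical stand-in for the similarity or profile HMM search, `cm_score` is a hypothetical stand-in for covariance-model scoring, and `threshold` is an arbitrary cutoff. No real search tool is invoked, and the toy hit table below is invented purely to exercise the loop.

```python
from typing import Callable, Dict, Iterable, List, Set


def iterative_search(
    seeds: Iterable[str],
    find_candidates: Callable[[str], Iterable[str]],
    cm_score: Callable[[str], float],
    threshold: float,
) -> Set[str]:
    """Expand a seed set until no further candidates pass the second filter."""
    accepted: Set[str] = set(seeds)   # seeds are trusted without rescoring
    queue: List[str] = list(accepted)
    while queue:
        query = queue.pop()
        # Step 1: sensitive first-pass search (hypothetical callable).
        for candidate in find_candidates(query):
            if candidate in accepted:
                continue
            # Step 2: stricter evaluation (hypothetical callable).
            if cm_score(candidate) >= threshold:
                accepted.add(candidate)
                queue.append(candidate)  # new sequence becomes a query
    return accepted


# Toy data: which sequences each query "hits", and invented scores.
toy_hits: Dict[str, List[str]] = {
    "a": ["b", "x"], "b": ["c"], "c": [], "x": ["y"], "y": [],
}
toy_scores: Dict[str, float] = {"b": 10.0, "c": 12.0, "x": 1.0, "y": 50.0}

found = iterative_search(
    ["a"], lambda q: toy_hits[q], lambda s: toy_scores[s], threshold=5.0
)
print(sorted(found))
```

In the toy run, "x" fails the second filter and is therefore never used as a query, so "y" is not reached even though it would score well. This mirrors the trade-off noted in the text between the sensitivity of the initial searches and the specificity of the covariance models.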

The results of this procedure clearly illustrate that highly sensitive local alignment searches and profile HMMs are important in the identification of spliceosomal RNAs. These RNAs tend to be conserved in sequence during evolution as compared to many other RNAs, and perhaps the protocol used here is particularly suited for this category of ncRNAs. Ideally, a combination of methods should be used to maximize sensitivity. At the same time, the covariance models are critical in order to evaluate the hits found in the initial searches. We have here relied to a large extent on the specificity of these models to predict ncRNAs and regard all RNAs reported here as strong predictions.

A large number of novel RNAs are identified in this work. Most noteworthy is the identification of RNA components of the minor (U12-type) spliceosome in phylogenetic groups that were not previously known to have these RNAs or any U12-type spliceosomal components or introns. One example is Trichinella, a nematode which, in contrast to other nematodes such as C. elegans, contains minor spliceosomal RNA genes. We have also shown that minor spliceosomal RNAs are present in the deeply branching fungal lineages Zygomycota and Chytridiomycota.

In summary, these results confirm the previous study of Russell et al. (8), which demonstrated an early origin of the minor spliceosome, as U12-type spliceosomal RNAs are present in a variety of evolutionarily distant phyla. At the same time, our results do not allow us to conclude that a minor spliceosome was present in the ancestor of eukaryotes. Our results also point to multiple instances in evolution where the minor spliceosome seems to have been lost. One example is in the development of nematodes, where the loss of U12-type RNAs occurred after the divergence of Pseudocoelomata. In the case of fungi, such RNAs were present in very early fungal evolution, while they were lost in the development of Microsporidia and Ascomycota. Furthermore, minor spliceosomal RNAs were lost in the development of Dictyostelium of the Mycetozoa branch and within the heterokont and plant branches. From this it would seem that U12-type splicing has a comparatively marginal role and may be dispensed with in many phylogenetic groups.

Figure 5. Alignment of U6atac spliceosomal RNA genes. Secondary structure elements are shown in bracket notation at the bottom of the alignment. CC motifs in the region supposed to pair with the 5′ splice site, as well as regions involved in base-pairing, are highlighted with color.


SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

M.D.L. was supported by a grant from CONACYT, The National Council for Science and Technology, Mexico. Funding to pay the Open Access publication charges for this article was provided by the Swedish Research School of Genomics and Bioinformatics.

Conflict of interest statement. None declared.

REFERENCES

1. Nilsen,T.W. (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays, 25, 1147–1149.

2. Kiss,T. (2004) Biogenesis of small nuclear RNPs. J. Cell Sci., 117, 5949–5951.

3. Will,C.L., Schneider,C., MacMillan,A.M., Katopodis,N.F., Neubauer,G., Wilm,M., Luhrmann,R. and Query,C.C. (2001) A novel U2 and U11/U12 snRNP protein that associates with the pre-mRNA branch site. EMBO J., 20, 4536–4546.

4. Madhani,H.D. and Guthrie,C. (1992) A novel base-pairing interaction between U2 and U6 snRNAs suggests a mechanism for the catalytic activation of the spliceosome. Cell, 71, 803–817.

5. Wassarman,K.M. and Steitz,J.A. (1992) The low-abundance U11 and U12 small nuclear ribonucleoproteins (snRNPs) interact to form a two-snRNP complex. Mol. Cell. Biol., 12, 1276–1285.

6. Frilander,M.J. and Steitz,J.A. (1999) Initial recognition of U12-dependent introns requires both U11/5′ splice-site and U12/branchpoint interactions. Genes Dev., 13, 851–863.

7. Burge,C.B., Padgett,R.A. and Sharp,P.A. (1998) Evolutionary fates and origins of U12-type introns. Mol. Cell, 2, 773–785.

8. Russell,A.G., Charette,J.M., Spencer,D.F. and Gray,M.W. (2006) An early evolutionary origin for the minor spliceosome. Nature, 443, 863–866.

9. Mewes,H.W., Albermann,K., Bahr,M., Frishman,D., Gleissner,A., Hani,J., Heumann,K., Kleine,K., Maierl,A., Oliver,S.G. et al. (1997) Overview of the yeast genome. Nature, 387, 7–65.

10. Guthrie,C. and Patterson,B. (1988) Spliceosomal snRNAs. Annu. Rev. Genet., 22, 387–419.

11. Mitrovich,Q.M. and Guthrie,C. (2007) Evolution of small nuclear RNAs in S. cerevisiae, C. albicans, and other hemiascomycetous yeasts. RNA, 13, 2066–2080.

12. Palfi,Z., Schimanski,B., Gunzl,A., Lucke,S. and Bindereif,A. (2005) U1 small nuclear RNP from Trypanosoma brucei: a minimal U1 snRNA with unusual protein components. Nucleic Acids Res., 33, 2493–2503.

13. Griffiths-Jones,S. (2007) Annotating noncoding RNA genes. Annu. Rev. Genom. Hum. Genet., 8, 279–298.

14. Eddy,S.R. (2002) A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics, 3, 18.

15. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

16. Pearson,W.R. and Lipman,D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.

17. Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

18. Notredame,C., Higgins,D.G. and Heringa,J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205–217.

19. Zuker,M. (1989) On finding all suboptimal foldings of an RNA molecule. Science, 244, 48–52.

20. Freyhult,E.K., Bollback,J.P. and Gardner,P.P. (2007) Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA. Genome Res., 17, 117–125.

21. Chakrabarti,K., Pearson,M., Grate,L., Sterne-Weiler,T., Deans,J., Donohue,J.P. and Ares,M. Jr (2007) Structural RNAs of known and unknown function identified in malaria parasites by comparative genomics and RNA analysis. RNA, 13, 1923–1939.

22. Davis,C.A., Brown,M.P. and Singh,U. (2007) Functional characterization of spliceosomal introns and identification of U2, U4 and U5 snRNAs in the deep-branching eukaryote Entamoeba histolytica. Eukaryot. Cell, 6, 940–948.

23. Fast,N.M. and Doolittle,W.F. (1999) Trichomonas vaginalis possesses a gene encoding the essential spliceosomal component PRP8. Mol. Biochem. Parasitol., 99, 275–278.

24. Vanacova,S., Yan,W., Carlton,J.M. and Johnson,P.J. (2005) Spliceosomal introns in the deep-branching eukaryote Trichomonas vaginalis. Proc. Natl Acad. Sci. USA, 102, 4430–4435.

25. Russell,A.G., Shutt,T.E., Watkins,R.F. and Gray,M.W. (2005) An ancient spliceosomal intron in the ribosomal protein L7a gene (Rpl7a) of Giardia lamblia. BMC Evol. Biol., 5, 45.

26. Nixon,J.E., Wang,A., Morrison,H.G., McArthur,A.G., Sogin,M.L., Loftus,B.J. and Samuelson,J. (2002) A spliceosomal intron in Giardia lamblia. Proc. Natl Acad. Sci. USA, 99, 3701–3705.

27. Collins,L. and Penny,D. (2005) Complex spliceosomal organization ancestral to extant eukaryotes. Mol. Biol. Evol., 22, 1053–1066.

28. Misumi,O., Matsuzaki,M., Nozaki,H., Miyagishima,S.Y., Mori,T., Nishida,K., Yagisawa,F., Yoshida,Y., Kuroiwa,H. and Kuroiwa,T. (2005) Cyanidioschyzon merolae genome. A tool for facilitating comparable studies on organelle biogenesis in photosynthetic eukaryotes. Plant Physiol., 137, 567–585.

29. Nozaki,H., Takano,H., Misumi,O., Terasawa,K., Matsuzaki,M., Maruyama,S., Nishida,K., Yagisawa,F., Yoshida,Y., Fujiwara,T. et al. (2007) A 100%-complete sequence reveals unusually simple genomic features in the hot-spring red alga Cyanidioschyzon merolae. BMC Biol., 5, 28.

30. Takahashi,Y., Tani,T. and Ohshima,Y. (1996) Spliceosomal introns in conserved sequences of U1 and U5 small nuclear RNA genes in yeast Rhodotorula hasegawae. J. Biochem., 120, 677–683.

31. Takahashi,Y., Urushiyama,S., Tani,T. and Ohshima,Y. (1993) An mRNA-type intron is present in the Rhodotorula hasegawae U2 small nuclear RNA gene. Mol. Cell. Biol., 13, 5613–5619.

32. Gilson,P.R., Su,V., Slamovits,C.H., Reith,M.E., Keeling,P.J. and McFadden,G.I. (2006) Complete nucleotide sequence of the chlorarachniophyte nucleomorph: nature’s smallest nucleus. Proc. Natl Acad. Sci. USA, 103, 9566–9571.

33. Patel,A.A. and Steitz,J.A. (2003) Splicing double: insights from the second spliceosome. Nat. Rev. Mol. Cell Biol., 4, 960–970.

34. Thomas,J., Lea,K., Zucker-Aprison,E. and Blumenthal,T. (1990) The spliceosomal snRNAs of Caenorhabditis elegans. Nucleic Acids Res., 18, 2633–2642.

35. Blaxter,M.L., De Ley,P., Garey,J.R., Liu,L.X., Scheldeman,P., Vierstraete,A., Vanfleteren,J.R., Mackey,L.Y., Dorris,M., Frisse,L.M. et al. (1998) A molecular evolutionary framework for the phylum Nematoda. Nature, 392, 71–75.

36. Klein,D.J., Schmeing,T.M., Moore,P.B. and Steitz,T.A. (2001) The kink-turn: a new RNA secondary structure motif. EMBO J., 20, 4214–4221.

37. Rozhdestvensky,T.S., Tang,T.H., Tchirkova,I.V., Brosius,J., Bachellerie,J.P. and Huttenhofer,A. (2003) Binding of L7Ae protein to the K-turn of archaeal snoRNAs: a shared RNA binding motif for C/D and H/ACA box snoRNAs in Archaea. Nucleic Acids Res., 31, 869–877.

38. Vidovic,I., Nottrott,S., Hartmuth,K., Luhrmann,R. and Ficner,R. (2000) Crystal structure of the spliceosomal 15.5kD protein bound to a U4 snRNA fragment. Mol. Cell, 6, 1331–1342.

39. Schultz,A., Nottrott,S., Hartmuth,K. and Luhrmann,R. (2006) RNA structural requirements for the association of the spliceosomal hPrp31 protein with the U4 and U4atac small nuclear ribonucleoproteins. J. Biol. Chem., 281, 28278–28286.

40. Tarn,W.Y. and Steitz,J.A. (1996) Highly diverged U4 and U6 small nuclear RNAs required for splicing rare AT-AC introns. Science, 273, 1824–1832.

41. Incorvaia,R. and Padgett,R.A. (1998) Base pairing with U6atac snRNA is required for 5′ splice site activation of U12-dependent introns in vivo. RNA, 4, 709–718.



38 VidovicI NottrottS HartmuthK LuhrmannR and FicnerR(2000) Crystal structure of the spliceosomal 155kD protein boundto a U4 snRNA fragment Mol Cell 6 1331ndash1342

39 SchultzA NottrottS HartmuthK and LuhrmannR (2006)RNA structural requirements for the association of the spliceosomalhPrp31 protein with the U4 and U4atac small nuclear ribonucleo-proteins J Biol Chem 281 28278ndash28286

40 TarnWY and SteitzJA (1996) Highly diverged U4 and U6 smallnuclear RNAs required for splicing rare AT-AC introns Science273 1824ndash1832

41 IncorvaiaR and PadgettRA (1998) Base pairing with U6atacsnRNA is required for 5rsquo splice site activation of U12-dependentintrons in vivo RNA 4 709ndash718

3010 Nucleic Acids Research 2008 Vol 36 No 9

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

M.D.L. was supported by a grant from CONACYT, The National Council for Science and Technology, Mexico. Funding to pay the Open Access publication charges for this article was provided by Swedish Research School of Genomics and Bioinformatics.

Conflict of interest statement. None declared.

REFERENCES

1. Nilsen TW (2003) The spliceosome: the most complex macromolecular machine in the cell? Bioessays 25, 1147–1149.

2. Kiss T (2004) Biogenesis of small nuclear RNPs. J Cell Sci 117, 5949–5951.

3. Will CL, Schneider C, MacMillan AM, Katopodis NF, Neubauer G, Wilm M, Lührmann R and Query CC (2001) A novel U2 and U11/U12 snRNP protein that associates with the pre-mRNA branch site. EMBO J 20, 4536–4546.

4. Madhani HD and Guthrie C (1992) A novel base-pairing interaction between U2 and U6 snRNAs suggests a mechanism for the catalytic activation of the spliceosome. Cell 71, 803–817.

5. Wassarman KM and Steitz JA (1992) The low-abundance U11 and U12 small nuclear ribonucleoproteins (snRNPs) interact to form a two-snRNP complex. Mol Cell Biol 12, 1276–1285.

6. Frilander MJ and Steitz JA (1999) Initial recognition of U12-dependent introns requires both U11/5' splice-site and U12/branchpoint interactions. Genes Dev 13, 851–863.

7. Burge CB, Padgett RA and Sharp PA (1998) Evolutionary fates and origins of U12-type introns. Mol Cell 2, 773–785.

8. Russell AG, Charette JM, Spencer DF and Gray MW (2006) An early evolutionary origin for the minor spliceosome. Nature 443, 863–866.

9. Mewes HW, Albermann K, Bahr M, Frishman D, Gleissner A, Hani J, Heumann K, Kleine K, Maierl A, Oliver SG, et al. (1997) Overview of the yeast genome. Nature 387, 7–65.

10. Guthrie C and Patterson B (1988) Spliceosomal snRNAs. Annu Rev Genet 22, 387–419.

11. Mitrovich QM and Guthrie C (2007) Evolution of small nuclear RNAs in S. cerevisiae, C. albicans, and other hemiascomycetous yeasts. RNA 13, 2066–2080.

12. Palfi Z, Schimanski B, Günzl A, Lücke S and Bindereif A (2005) U1 small nuclear RNP from Trypanosoma brucei: a minimal U1 snRNA with unusual protein components. Nucleic Acids Res 33, 2493–2503.

13. Griffiths-Jones S (2007) Annotating noncoding RNA genes. Annu Rev Genom Hum Genet 8, 279–298.

14. Eddy SR (2002) A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics 3, 18.

15. Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215, 403–410.

16. Pearson WR and Lipman DJ (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci USA 85, 2444–2448.

17. Thompson JD, Higgins DG and Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.

18. Notredame C, Higgins DG and Heringa J (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302, 205–217.

19. Zuker M (1989) On finding all suboptimal foldings of an RNA molecule. Science 244, 48–52.

20. Freyhult EK, Bollback JP and Gardner PP (2007) Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA. Genome Res 17, 117–125.

21. Chakrabarti K, Pearson M, Grate L, Sterne-Weiler T, Deans J, Donohue JP and Ares M Jr (2007) Structural RNAs of known and unknown function identified in malaria parasites by comparative genomics and RNA analysis. RNA 13, 1923–1939.

22. Davis CA, Brown MP and Singh U (2007) Functional characterization of spliceosomal introns and identification of U2, U4 and U5 snRNAs in the deep-branching eukaryote Entamoeba histolytica. Eukaryot Cell 6, 940–948.

23. Fast NM and Doolittle WF (1999) Trichomonas vaginalis possesses a gene encoding the essential spliceosomal component PRP8. Mol Biochem Parasitol 99, 275–278.

24. Vanacova S, Yan W, Carlton JM and Johnson PJ (2005) Spliceosomal introns in the deep-branching eukaryote Trichomonas vaginalis. Proc Natl Acad Sci USA 102, 4430–4435.

25. Russell AG, Shutt TE, Watkins RF and Gray MW (2005) An ancient spliceosomal intron in the ribosomal protein L7a gene (Rpl7a) of Giardia lamblia. BMC Evol Biol 5, 45.

26. Nixon JE, Wang A, Morrison HG, McArthur AG, Sogin ML, Loftus BJ and Samuelson J (2002) A spliceosomal intron in Giardia lamblia. Proc Natl Acad Sci USA 99, 3701–3705.

27. Collins L and Penny D (2005) Complex spliceosomal organization ancestral to extant eukaryotes. Mol Biol Evol 22, 1053–1066.

28. Misumi O, Matsuzaki M, Nozaki H, Miyagishima SY, Mori T, Nishida K, Yagisawa F, Yoshida Y, Kuroiwa H and Kuroiwa T (2005) Cyanidioschyzon merolae genome. A tool for facilitating comparable studies on organelle biogenesis in photosynthetic eukaryotes. Plant Physiol 137, 567–585.

29. Nozaki H, Takano H, Misumi O, Terasawa K, Matsuzaki M, Maruyama S, Nishida K, Yagisawa F, Yoshida Y, Fujiwara T, et al. (2007) A 100%-complete sequence reveals unusually simple genomic features in the hot-spring red alga Cyanidioschyzon merolae. BMC Biol 5, 28.

30. Takahashi Y, Tani T and Ohshima Y (1996) Spliceosomal introns in conserved sequences of U1 and U5 small nuclear RNA genes in yeast Rhodotorula hasegawae. J Biochem 120, 677–683.

31. Takahashi Y, Urushiyama S, Tani T and Ohshima Y (1993) An mRNA-type intron is present in the Rhodotorula hasegawae U2 small nuclear RNA gene. Mol Cell Biol 13, 5613–5619.

32. Gilson PR, Su V, Slamovits CH, Reith ME, Keeling PJ and McFadden GI (2006) Complete nucleotide sequence of the chlorarachniophyte nucleomorph: nature's smallest nucleus. Proc Natl Acad Sci USA 103, 9566–9571.

33. Patel AA and Steitz JA (2003) Splicing double: insights from the second spliceosome. Nat Rev Mol Cell Biol 4, 960–970.

34. Thomas J, Lea K, Zucker-Aprison E and Blumenthal T (1990) The spliceosomal snRNAs of Caenorhabditis elegans. Nucleic Acids Res 18, 2633–2642.

35. Blaxter ML, De Ley P, Garey JR, Liu LX, Scheldeman P, Vierstraete A, Vanfleteren JR, Mackey LY, Dorris M, Frisse LM, et al. (1998) A molecular evolutionary framework for the phylum Nematoda. Nature 392, 71–75.

36. Klein DJ, Schmeing TM, Moore PB and Steitz TA (2001) The kink-turn: a new RNA secondary structure motif. EMBO J 20, 4214–4221.

37. Rozhdestvensky TS, Tang TH, Tchirkova IV, Brosius J, Bachellerie JP and Hüttenhofer A (2003) Binding of L7Ae protein to the K-turn of archaeal snoRNAs: a shared RNA binding motif for C/D and H/ACA box snoRNAs in Archaea. Nucleic Acids Res 31, 869–877.

38. Vidovic I, Nottrott S, Hartmuth K, Lührmann R and Ficner R (2000) Crystal structure of the spliceosomal 15.5kD protein bound to a U4 snRNA fragment. Mol Cell 6, 1331–1342.

39. Schultz A, Nottrott S, Hartmuth K and Lührmann R (2006) RNA structural requirements for the association of the spliceosomal hPrp31 protein with the U4 and U4atac small nuclear ribonucleoproteins. J Biol Chem 281, 28278–28286.

40. Tarn WY and Steitz JA (1996) Highly diverged U4 and U6 small nuclear RNAs required for splicing rare AT-AC introns. Science 273, 1824–1832.

41. Incorvaia R and Padgett RA (1998) Base pairing with U6atac snRNA is required for 5' splice site activation of U12-dependent introns in vivo. RNA 4, 709–718.
