A STEM/STEM-LOOP DECOMPOSITION BASED HEURISTIC
RNA secondary structures are decomposed into stems and stem-loops, which are then compared using an exact algorithm for the conservative edit distance [Guignon et al., 2005], a variant of the general distance of [Jiang et al., 2002]. These pairwise hairpin comparisons are then used in a Smith-Waterman-based heuristic to produce a distance and an alignment between the two complete structures.

SEARCH FOR STRUCTURAL HOMOLOGS
Example query: a structured RNA gene detected by a high-throughput computational analysis. Target: the structures database.

ALGORITHMIC MODEL
EDIT DISTANCE
The edit distance problem is the following:
- given
● two RNA secondary structures,
● a set of allowed edit operations and
● a cost for each possible operation,
- compute
● an alignment of minimum cost between the two structures.
RNA StrAT uses the edit distance model introduced in [Jiang et al., 2002], which comprises:
● single-base edit operations (substitution/insertion/deletion),
● base-pair operations such as insertion/deletion, creation/opening or alteration of a hydrogen bond.
Computing this distance is an NP-hard problem, but several less general versions of the problem can be solved exactly and are widely used in RNA secondary structure comparison tools such as RNAforester [Höchsmann et al., 2004].

ALIGNMENT OF SECONDARY STRUCTURES

SUMMARY
RNA StrAT is a web server dedicated to the comparison of sets of RNA secondary structures. The server offers tools to align pairs of RNA secondary structures and to search for structural homologs in a database of RNA secondary structures (based on Rfam). The alignment and search are based on an edit distance algorithm that considers the wide range of edit operations defined in [Jiang et al., 2002]. Tools for the visualization of secondary structures and structure alignments are also available.
To date, RNA StrAT is the only server offering all these features (general RNA edit model, Rfam database search, rendering) together.
Availability: http://www-lbit.iro.umontreal.ca/rnastrat/
Contact: [email protected]

ADDITIONAL FEATURES AND DEVELOPMENT
The database of RNA secondary structures
Users can access information on each structure, including links to its RNA family (in the Rfam classification), its organism taxonomy (EMBL) and its sequence (EMBL). Structures stored in our database are extracted from the Rfam seed alignments; for each RNA gene, its specific secondary structure is obtained from both its sequence and the family consensus structure.
Database search improvements
To speed up the database search, the search engine first analyzes the structural characteristics of the query structure and selects a group of candidates in the database whose characteristics are close to those of the query, eliminating irrelevant structures at the same time. The query structure is then compared to these candidates to find those with the best similarity scores. The user can modify the parameters that define the candidates.
Structures and alignment rendering
[Figure labels: high structural similarity score; query structure]

Valentin Guignon 1, Cedric Chauve 2 and Sylvie Hamel 1
1. DIRO, Université de Montréal, C.P. 6128, Succ. Centre-Ville, Montréal, QC, H3C 3J7, Canada.
2. Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada. CGL and LaCIM, Université du Québec à Montréal, Montréal, QC, Canada.
RNA StrAT: RNA Secondary Structure Analysis Toolkit

Figure 1. Structural alignment between two 5S rRNAs. Some edit operations are displayed on the structures using arrows.
[Figure 1 drawings: secondary structures of the Escherichia coli and Delftia acidovorans 5S rRNAs, annotated with edit operations. The aligned sequences and structures follow.]

>Escherichia coli, V00336
UGCCUGGCGGCCGUAGCGCGGUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGCCGUAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGAACUGCCAGGCAU
((((((((((.....((((((((....(((((((.............))))..)))...)))))).)).(((((((..((((((((...))))))))..)))))))...)))))))))).
>Delftia acidovorans
UGCCUGAUGACCAUAGCAAGUUGGUACCACUCCUUCCCAUCCCGAACAGGACAGUGAAACGACUUUGCGCCGAUGAUAGUGCGGG-U-U-CCCGUGUGAAAGUAGGUCAUCGUCAGGCNN
.(((((((((.....((((((((....(((((((.............))))..)))...)))))).)).((.((....((((((.-.-.-.))))))....)).))...)))))))))..

Legend: pair match; base substitution; base match; completion/altering; pair substitution; pair half-match; pair deletion/insertion; pair opening/creation; base deletion/insertion.

Figure 2. Given a query structure and a database of structures, structural homologs can be found using structural similarity scores.

[Figure 3 panels: base substitution, base deletion, base insertion; paired base substitution, pair substitution, pair deletion, pair insertion; pair creation, pair opening, pair completion, pair altering.]
Figure 3. Edit operations.
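The dot-bracket strings given with Figure 1 can be turned into explicit base pairs and grouped into stems with a few lines of code. The sketch below is illustrative only and is not RNA StrAT's implementation: the function names `parse_pairs` and `group_stems` are hypothetical, alignment gaps (`-`) are treated as unpaired positions, and a "stem" is simplified to a maximal run of directly stacked base pairs, which may be stricter than the stem/stem-loop decomposition illustrated in Figure 4.

```python
from typing import List, Tuple

Pair = Tuple[int, int]


def parse_pairs(dot_bracket: str) -> List[Pair]:
    """Return the base pairs (i, j), 0-based with i < j, of a dot-bracket string.

    Any symbol other than '(' and ')' (for example '.' or the gap '-')
    is treated as an unpaired position.
    """
    open_positions: List[int] = []
    pairs: List[Pair] = []
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.append((open_positions.pop(), pos))
    if open_positions:
        raise ValueError(f"unbalanced '(' at position {open_positions[-1]}")
    return sorted(pairs)


def group_stems(pairs: List[Pair]) -> List[List[Pair]]:
    """Group base pairs into maximal runs of directly stacked pairs.

    Simplifying assumption: (i, j) and (i + 1, j - 1) belong to the same
    stem; any bulge or interior loop starts a new stem.
    """
    stems: List[List[Pair]] = []
    for i, j in sorted(pairs):
        if stems and stems[-1][-1] == (i - 1, j + 1):
            stems[-1].append((i, j))  # stacked on the previous pair
        else:
            stems.append([(i, j)])  # start of a new stem
    return stems
```

For the toy structure `((..((...))..))`, `parse_pairs` returns `[(0, 14), (1, 13), (4, 10), (5, 9)]` and `group_stems` splits these pairs into two stems, because the two unpaired bases on each side interrupt the stacking. Applying the same functions to the two 5S rRNA structures above gives the kind of per-stem units that a pairwise comparison could then work on.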
[Figure 4 drawing: an RNase P secondary structure with stems labeled P1, P2, P3a, P3b, P4, P7, P8, P9, P10/11, P12 and P19, shown before and after decomposition.]
Figure 4. Stem/stem-loop decomposition of an RNase P structure.
Figure 5. Edit distance computation between two RNase P structures.
Structure rendering (see Figure 7)
Figure 6. Database browsing features.
Figure 7. Online structure rendering (based on the Vienna RNA Package): the rendering form (on the left) lets the user display structures in various ways (on the right) and browse the structure (middle).
Figure 8. The rendering engine makes it possible to compare two aligned structures (on the left) and to see which bases changed from one structure to the other. The characteristics of a structure compared to a set of structures can also be rendered (on the right).
Each base of the query structure is displayed with a pie chart that shows how often the base has been kept, replaced by another one or deleted.

REFERENCES
Griffiths-Jones S., Bateman A., Marshall M., Khanna A., Eddy S.R. Rfam: an RNA family database. Nucleic Acids Research, 31(1):439–441, 2003.
Jiang T., Lin G., Ma B., Zhang K. A general edit distance between RNA structures. Journal of Computational Biology, 9(2):371–388, 2002.
Guignon V., Chauve C., Hamel S. Distance d'édition entre tige-boucles. JOBIM 2005, poster 82, 2005.
Smith T.F., Waterman M.S. Identification of common molecular subsequences. Journal of Molecular Biology, 147:195–197, 1981. doi:10.1016/0022-2836(81)90087-5.
Höchsmann M., Voss B., Giegerich R. Pure multiple RNA secondary structure alignments: a progressive profile approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1):53–62, 2004.
Vienna RNA Package, http://www.tbi.univie.ac.at/~ivo/RNA

[Figure 5 panel labels: RNase P SM-A12(14) and RNase P SM-A18(31); stem/stem-loop edit distance; global edit distance.]