An algorithm for detecting TPP riboswitches in archaea
Department of Biology, Department of Math and Computer Science
Denison University, Granville, OH 43023

Motif sequence | Level of conservation | Location in sequence
--- | --- | ---
GGGG | High | Towards 5' end of the riboswitch
UGAGA | Perfect conservation in all riboswitches | No more than 30 bases from GGGG
CCCU | Fair, some point mutations observed | Usually the same distance from UGAGA as GGGG is from UGAGA
AACCUGA | Low, most sequences had point mutations for this motif | Usually at the center of the riboswitch
AGGGA | Fair, some point mutations observed | Towards 3' end of the riboswitch

Figure: Riboswitch-mediated gene regulation by a) prevention of translation initiation, b) prevention of proper splicing and c) premature transcription termination.
Figure: Characteristic secondary structure of the TPP riboswitch in the presence and absence of TPP.
Figure: Secondary structures of the riboswitches predicted in K. cryptofilum (a, b) and C. maquilingensis (c).

Source genome of hit | fRNAdb match? | Nearby proteins
--- | --- | ---
E. coli | Yes | 3' side: thiamin biosynthesis protein thiC
E. coli | Yes | 3' side: thiamine-binding periplasmic protein precursor
E. coli | Yes | 3' side: hydroxyethylthiazole kinase
A. thaliana | Yes | Within an open reading frame. Gene regulation via splicing
T. volcanium | Yes | 3' side: major facilitator superfamily permease
T. volcanium | Yes | 3' side: major facilitator superfamily permease
T. acidophilum | Yes | 3' side: major facilitator superfamily permease
T. acidophilum | Yes | 3' side: cationic amino acid transporter related protein
K. cryptofilum | None | Hypothetical protein Kcr_0861. The sequence is part of the coding region, although it does not code for any conserved protein domain. The sequence is located near the 3' end of the coding region
K. cryptofilum | None | 3' side: permease for cytosine uracil thiamine allantoin
C. maquilingensis | None | 3' side: nucleoside diphosphate kinase

References
• Miranda-Rios J, Navarro M and Soberon M. 2001.
A conserved RNA structure (THI box) is involved in regulation of thiamin biosynthetic gene expression in bacteria. Proc. Natl. Acad. Sci. USA. 98: 9736-9741.
• Nudler E and Mironov AS. 2004. The riboswitch control of bacterial metabolism. Trends in Biochemical Sciences. 29(1): 11-17.
• Serganov A, Polonskaia A, Phan AT, Breaker RR and Patel DJ. 2006. Structural basis for gene regulation by a thiamin pyrophosphate-sensing riboswitch. Nature. 441: 1167-1171.
• Winkler WC and Breaker RR. 2003. Genetic control by metabolite-binding riboswitches. Chembiochem. 4: 1024-1032.

Introduction
Riboswitches are short sequences of non-coding RNA (100-200 nt in length) located in the UTRs of genes. They contain highly specialized aptamer regions that recognize and bind specific metabolites (Winkler and Breaker, 2003). The TPP riboswitch binds thiamin pyrophosphate. Upon binding its metabolite, a riboswitch changes its structural conformation, which results in regulation of gene expression (Nudler and Mironov, 2004).

The TPP Riboswitch
• Has the characteristic structure displayed below (Miranda-Rios et al, 2001; Serganov et al, 2006).
• Has a motif sequence, UGAGA, that is conserved 97% of the time.
• Has been detected in the genomes of all three domains of life, but among archaea only in two species of the order Thermoplasmatales (Miranda-Rios et al, 2001).

Methods
Step 1: Identify motifs in TPP riboswitches.
• Obtained sequences of 355 TPP riboswitches from the fRNA Database.
• Performed multiple sequence alignment using ClustalX2.
• Identified six highly conserved motif sequences.
Step 2: Fragment the whole genome of the target for scanning.
• Fragments are 700 nt long with a 200 nt overlap.
• Each fragment is scanned for the motif sequences.
Step 3: Modified Smith-Waterman algorithm.
• Find all alignments of motifs in each fragment that score above a threshold.
Step 4: Infer the best sequence of motifs.
• A dynamic programming algorithm determines the sequence of individual motif hits in each fragment that gives the best total score.
Step 5: Predict secondary structure and function.
• Candidate sequences were folded using the RNAfold server and compared with the characteristic structure.
• Putative riboswitches strongly resemble the characteristic structure.
• Nearby genes were determined using NCBI BLAST and the UCSC Genome Browser.

Figure: TPP riboswitch conformation with and without TPP.
Figure labels: Stem, Loop, Junction.

Results
Table: Putative TPP riboswitches predicted by the algorithm (row groups: Testing, New).
Testing on known riboswitches:
• Tested on genomes known to possess at least one TPP riboswitch.
• Detected all known TPP riboswitches in each genome.
• Secondary structures of the predicted riboswitches were similar to the characteristic structure.
Scanning other archaea:
• Executed the algorithm on the genomes of 12 archaeal species outside the order Thermoplasmatales.
• Detected three putative riboswitches in the genomes of Caldivirga maquilingensis and Korarchaeum cryptofilum.

Discussion
Possible new methods of gene regulation in K. cryptofilum:
• One of the two predictions in the K. cryptofilum genome is located within an ORF.
• No information is available about whether the ORF is actually a gene.
• No information is available about possible introns in the coding region.
• A novel method of gene regulation, such as ribosome shunting, may be employed here.
Generalizing the algorithm:
• The success of the algorithm suggests that it can be applied to other kinds of riboswitches.
• The algorithm could be primed in a similar manner with the motif sequences and characteristic structures of other riboswitches.
• Further improvements include a more efficient means of comparing secondary structures than the one employed here, and automated detection of motif sequences in known riboswitch sequences.

Chinmoy I.S.
Bhatiya, Jessen T. Havill, Jeffrey S. Thompson
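
Steps 2 to 4 of the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the motif order is assumed to follow the motif table, every motif is assumed to be required, and a simple ungapped match score (+1 per match, -1 per mismatch) stands in for the modified Smith-Waterman step, whose modifications the poster does not describe. Input is assumed to be an RNA-alphabet string (T already replaced by U).

```python
from typing import List, Optional, Tuple

# Motifs taken from the poster's table; the 5'->3' order is an assumption.
MOTIFS = ["GGGG", "UGAGA", "CCCU", "AACCUGA", "AGGGA"]


def fragment_genome(genome: str, size: int = 700,
                    overlap: int = 200) -> List[Tuple[int, str]]:
    """Step 2: split a genome into overlapping (start offset, fragment) windows."""
    if not genome:
        return []
    step = size - overlap  # 700 nt windows with 200 nt overlap advance by 500 nt
    fragments = []
    start = 0
    while True:
        fragments.append((start, genome[start:start + size]))
        if start + size >= len(genome):
            break
        start += step
    return fragments


def motif_hits(fragment: str, motif: str,
               min_score: int) -> List[Tuple[int, int]]:
    """Step 3 stand-in: ungapped scan returning (position, score) hits.

    Scores +1 per matching base and -1 per mismatch; this is a simplification,
    not the poster's modified Smith-Waterman alignment.
    """
    hits = []
    for pos in range(len(fragment) - len(motif) + 1):
        window = fragment[pos:pos + len(motif)]
        score = sum(1 if a == b else -1 for a, b in zip(window, motif))
        if score >= min_score:
            hits.append((pos, score))
    return hits


def best_chain(fragment: str, motifs: List[str] = MOTIFS,
               max_mismatches: int = 1) -> Optional[Tuple[int, List[int]]]:
    """Step 4: dynamic programming over motifs in order.

    Picks one non-overlapping hit per motif so that the summed score is
    maximal. Returns (total score, hit positions), or None if any motif
    has no usable hit.
    """
    prev: List[Tuple[int, int, List[int]]] = []  # (end, total score, positions)
    for i, motif in enumerate(motifs):
        threshold = len(motif) - 2 * max_mismatches
        cur = []
        for pos, score in motif_hits(fragment, motif, threshold):
            if i == 0:
                cur.append((pos + len(motif), score, [pos]))
                continue
            # Best chain of earlier motifs that ends before this hit starts.
            candidates = [p for p in prev if p[0] <= pos]
            if candidates:
                _, total, path = max(candidates, key=lambda p: p[1])
                cur.append((pos + len(motif), total + score, path + [pos]))
        if not cur:
            return None
        prev = cur
    _, total, path = max(prev, key=lambda p: p[1])
    return total, path
```

For example, a 1700 nt genome yields fragments starting at offsets 0, 500 and 1000, and `best_chain("CCGGGGAAUGAGAAACCCUAAAACCUGAAAAGGGACC")` returns `(25, [2, 8, 15, 21, 30])`, one exact hit per motif. Distance constraints from the motif table, such as UGAGA lying within 30 bases of GGGG, are left out to keep the sketch short.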