RNA Secondary Structure Prediction
16S rRNA
RNA Secondary Structure
Hairpin loop
Junction (Multiloop)
Bulge
Single-Stranded Interior Loop
Stem
Image: Wuchty
Pseudoknot
Dangling end
RNA secondary structure
[Figure: hairpin with stem pairs A-U, U-G, C-G, A-U, G-C. Labels: Loop, Stem, wobble pair, canonical pair]
Legitimate structure
Pseudoknots
RNA secondary structure representation
Non-canonical interactions of RNA secondary-structure elements
Pseudoknot
Kissing hairpins
Hairpin-bulge contact
These patterns are excluded from the prediction schemes because they are too computationally intensive to evaluate.
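As an illustration of what gets excluded: a pseudoknot (and likewise a kissing-hairpin contact) shows up as two base pairs that cross instead of nesting. The sketch below is not from the slides; the function name `has_pseudoknot` and the representation of a structure as a list of 0-based index pairs are assumptions made for this example.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs cross (i < k < j < l).

    `pairs` is an iterable of (i, j) sequence positions; this input
    format is a hypothetical choice for this sketch.
    """
    # Normalize each pair so the smaller index comes first, then sort by it.
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            # After sorting, k >= i; the pairs cross when k falls inside
            # (i, j) while l falls outside it.
            if i < k < j < l:
                return True
    return False


nested = [(0, 10), (1, 9), (2, 8)]   # a plain stem: pairs nest
crossing = [(0, 5), (3, 8)]          # pseudoknot-like: pairs cross
print(has_pseudoknot(nested), has_pseudoknot(crossing))
```

Structures for which this check returns False are the ones the prediction schemes in these slides consider.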
“Rules for 2D RNA prediction”
• Base pairs in stems: GOOD
• Additional possible assumptions: e.g., G:C better than A:U
• Bulges, loops: BAD
• Canonical interactions (base pairs, stems, bulges, loops): OK
• Non-canonical interactions (pseudoknots, kissing hairpins): Forbidden
• The more interactions, the better
Predicting RNA secondary Structure
• Allowed base-pairing rules: Watson-Crick (A:U, G:C) and wobble (G:U)
• Sequences may form different structures
• A free energy value is associated with each possible structure
• Predict the structure with the minimal free energy (MFE)
Simplifying Assumptions for Structure Prediction
• RNA folds into one minimum free-energy structure.
• There are no non-canonical interactions.
• The energy of a particular base pair in a double-stranded region is sequence independent
  – Neighbors have no influence.
Solved by dynamic programming (Zuker and Stiegler, 1981)
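Under these simplifying assumptions (each base pair contributes a fixed energy, no pseudoknots), the minimum can be found with a simple interval recursion. The sketch below is a Nussinov-style recursion used for illustration, not the Zuker-Stiegler algorithm itself. The per-pair energies in `PAIR_ENERGY` and the `min_loop` parameter (minimum number of unpaired bases enclosed by a pair) are hypothetical choices, not values from the slides.

```python
# Hypothetical per-pair energies (kcal/mol), chosen only for illustration.
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}


def min_energy(seq, min_loop=3):
    """Minimum total pair energy over all nested (pseudoknot-free) structures."""
    n = len(seq)
    if n == 0:
        return 0.0
    # e[i][j] holds the best energy for the subsequence seq[i..j] (inclusive).
    e = [[0.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = e[i][j - 1]                    # case 1: j stays unpaired
            for k in range(i, j - min_loop):      # case 2: j pairs with k
                pair = PAIR_ENERGY.get((seq[k], seq[j]))
                if pair is None:
                    continue
                left = e[i][k - 1] if k > i else 0.0
                best = min(best, left + pair + e[k + 1][j - 1])
            e[i][j] = best
    return e[0][n - 1]


print(min_energy("GGGAAAUCC"))  # two G:C pairs and one G:U pair
```

The table has O(N^2) entries and each takes O(N) work, so the whole computation is O(N^3) rather than an enumeration of every possible fold. The sketch returns only the energy; recovering the structure would need a traceback over the same table.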
Sequence-dependent free-energy (the nearest neighbor model)
[Figure: two hairpins that differ in one stem base pair (G-C versus U-A)]
Example values (kcal/mol):
GC    GC    GC    GC
AU    GC    CG    UA
-2.3  -2.9  -3.4  -2.1
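In the nearest-neighbor model, the energy of a helix is the sum over adjacent base-pair stacks rather than over individual pairs. The sketch below sums the four example values listed above. Reading each column as "upper pair stacked on lower pair" is an assumption about the slide layout, and the function name `helix_stacking_energy` is hypothetical. Only the four listed stacks are included, so any other stack raises a `KeyError`.

```python
# Stacking energies (kcal/mol) from the example values above, keyed by
# (upper pair, lower pair). The slides list only these four stacks.
STACKING = {
    ("GC", "AU"): -2.3,
    ("GC", "GC"): -2.9,
    ("GC", "CG"): -3.4,
    ("GC", "UA"): -2.1,
}


def helix_stacking_energy(pairs):
    """Sum the stacking energy over consecutive base pairs in a helix."""
    total = 0.0
    for upper, lower in zip(pairs, pairs[1:]):
        total += STACKING[(upper, lower)]  # KeyError if the stack is not listed
    return total


# A three-pair helix contains two stacks: GC on GC, then GC on AU.
print(round(helix_stacking_energy(["GC", "GC", "AU"]), 1))
```

The same G-C pair contributes differently depending on its neighbor, which is exactly what the sequence-independent assumption ignores.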
Free energy computation
[Figure: hairpin structure annotated with free-energy contributions (kcal/mol)]
-0.3
-0.3
-1.1 mismatch of hairpin
-2.9 stacking
+3.3 (1 nt bulge)
-2.9 stacking
-1.8 stacking
5’ dangling
-0.9 stacking
-1.8 stacking
-2.1 stacking
+5.9 (4 nt loop)
ΔG = -4.6 kcal/mol
Prediction Programs
• Mfold: http://www.bioinfo.rpi.edu/applications/mfold/old/rna/form1.cgi
• Vienna RNA Secondary Structure Prediction: http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi
Mfold - Suboptimal Folding
• For any sequence of N nucleotides, the expected number of structures is greater than 1.8^N
• A sequence of 100 nucleotides has ~3 × 10^25 possible folds. If a computer can calculate 1000 folds/second, it would take ~10^15 years (age of universe = ~10^10 years)!
• Mfold generates suboptimal folds whose free energies fall within a certain range of values. Many of these structures differ only in trivial ways. These suboptimal folds can still be useful for designing experiments.
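The counting argument in the bullets above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are hypothetical, and the 1.8^N growth rate and the 1000 folds/second figure are taken from the slide.

```python
def fold_count_lower_bound(n, base=1.8):
    """Lower bound on the number of structures for n nucleotides (1.8^n)."""
    return base ** n


def years_to_enumerate(n, folds_per_second=1000.0):
    """Years needed to enumerate every fold at the given rate."""
    seconds_per_year = 365.25 * 24 * 3600
    return fold_count_lower_bound(n) / folds_per_second / seconds_per_year


print(f"{fold_count_lower_bound(100):.1e} folds")
print(f"{years_to_enumerate(100):.1e} years")
```

For N = 100 this gives roughly 3 × 10^25 folds and on the order of 10^15 years, which is why exhaustive enumeration is not an option.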
Example:
Output: