RNA Structure Prediction and Comparison
Session 4: Abstract Shape Analysis
Robert Giegerich
Faculty of Technology, Bielefeld University
[email protected]
Bielefeld, SS 2009

Outline:
Motivation
Lost in Folding Space
Abstraction comes to rescue
The idea of abstract shapes
The general idea
Defining shape abstractions
Properties of the shape space
Simple shape analysis
The tool RNAshapes
Complete probabilistic shape analysis
Shape Probabilities
The RNAshapes package
RNA Gene Prediction via Structure . . .
“Is this sequence an RNA gene?” ↔ “Does it have a known functional structure?”
When sequence conservation is low or no homologs are known:
STEP 1: MFE folding (Mfold, RNAfold, pknotsRG)
STEP 2: Structure comparison against known functional structures
It is not that easy ...
Doshi KJ, Cannone JJ, Cobaugh CW, Gutell RR: Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinformatics. 2004 Aug 5;5:105.
Compares MFE foldings to structures derived by comparative analysis and proven by experimental techniques.
Findings:
base pair accuracy of about 20% to 71%
no improvement from recently updated thermodynamic parameters
note: did not check for good near-optimal solutions
Lost in Folding Space (1)
The folding space of a given sequence is LARGE:
number of foldings is exponential in sequence length
number of near-optimal foldings is exponential in the energy window
Look at the 111 “best” structures for a tRNA (using the tool RNAmovies).
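The exponential growth of the folding space can be checked with a small counting sketch. It uses a standard dynamic-programming recurrence for counting pseudoknot-free secondary structures, which is not taken from these slides. The allowed pairs (Watson-Crick plus G-U), the minimum hairpin loop size of 3, and the name `count_foldings` are assumptions made for illustration.

```python
from functools import lru_cache

# Assumption: canonical Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_foldings(seq, min_loop=3):
    """Count pseudoknot-free secondary structures of seq.

    The open (unpaired) chain counts as one structure. min_loop is the
    assumed minimum number of unpaired bases enclosed by a hairpin.
    """

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of structures on the subsequence seq[i..j] (inclusive).
        if j - i < min_loop + 1:
            return 1  # too short to hold a base pair: only the open chain
        total = count(i, j - 1)  # case 1: position j is unpaired
        for k in range(i, j - min_loop):  # case 2: j pairs with some k
            if (seq[k], seq[j]) in PAIRS:
                total += count(i, k - 1) * count(k + 1, j - 1)
        return total

    return count(0, len(seq) - 1)


if __name__ == "__main__":
    # Doubling the length of a toy sequence multiplies the count many times over.
    for n in (8, 16, 24, 32):
        print(n, count_foldings(("GCAU" * n)[:n]))
```

The sketch counts all foldings regardless of energy, so it illustrates only the first of the two statements above; restricting the count to an energy window would additionally need an energy model.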
Lost in Folding Space (2)
What we observe from RNAmovies:
LARGE number of close-to-optimal foldings
FEW structural classes holding many similar foldings
Can we reduce the folding space to the representatives of these classes?
RNA structure prediction based on thermodynamics
Even with the best possible model parameters:
MFE structure is often wrong
Some near-optimal structure is always right
The number of near-optimals is exponential
Most are similar, but some quite distinct
[Figure: two drawn RNA secondary structures, with structural elements labeled: Multiple Loop, Stacking Region, Hairpin Loop, Internal Loop, Bulge Loop (left), Bulge Loop (right)]
Is there a shape LIKE this ... or NOT like this ...?
Formalizing the notion of (abstract) shape
Shape abstraction retains nesting and adjacency of stems
Shape abstraction disregards all sizes (of stems, loops, . . . )
Shape abstraction may retain or disregard presence and type of bulges and internal loops
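A minimal sketch of the most abstract variant, which keeps only nesting and adjacency of stems and disregards bulges and internal loops. The input is assumed to be a structure in dot-bracket notation. The output notation (`[` and `]` for a stem, `_` for the unfolded structure) and the function name `shape_stems_only` are assumptions chosen for illustration, not the definitive RNAshapes implementation.

```python
def shape_stems_only(dot_bracket):
    """Map a dot-bracket structure to a string showing only stem arrangement.

    Sizes of stems and loops are dropped. A base pair that encloses exactly
    one other pair (stacking, bulge or internal loop) is treated as part of
    the same stem, so bulges and internal loops leave no trace.
    """
    stack = [[]]  # stack[-1] collects the shapes of the stems seen at this level
    for ch in dot_bracket:
        if ch == "(":
            stack.append([])
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure: unexpected ')'")
            inner = stack.pop()
            if len(inner) == 1:
                shape = inner[0]  # same stem continues: no new bracket
            else:
                shape = "[" + "".join(inner) + "]"  # hairpin or multiloop
            stack[-1].append(shape)
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure: missing ')'")
    return "".join(stack[0]) or "_"  # "_" stands for the unfolded structure


if __name__ == "__main__":
    for s in ("((((...))))", "((.((...)).))", "((..((...))..((...))..))", "....."):
        print(s, "->", shape_stems_only(s))
```

In this sketch a long hairpin and a short hairpin with an internal loop both map to the same string, which is the sense in which the abstraction disregards sizes.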
Levels of abstraction
Level 0: Full structure
Level 1: All types of loops
Level 3: All helix interruptions
Level 4: Multi- and internal loops, no bulges
Level 5: Stem arrangement only
Shape abstraction mathematics
General:
tree-like domains of structures $F$ and shapes $P$
tree homomorphism $\pi : F \to P$
For each sequence $s$:
folding space of sequence $s$: $F(s)$
shape space of sequence $s$: $P(s) = \pi(F(s))$
shape class of $p$ in $F(s)$: $f(s, p) = \{x \mid x \in F(s),\ \pi(x) = p\}$
shape representative structure (shrep): the class member of minimal free energy
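The definitions above can be sketched on a small, explicitly enumerated folding space. The names `shape_classes` and `shreps` are hypothetical, the abstraction is passed in as an ordinary function, and the structures, energies and hand-assigned shapes in the example are invented values for illustration only.

```python
from collections import defaultdict


def shape_classes(foldings, pi):
    """Group (structure, energy) pairs by the shape pi(structure)."""
    classes = defaultdict(list)
    for structure, energy in foldings:
        classes[pi(structure)].append((structure, energy))
    return dict(classes)


def shreps(foldings, pi):
    """Return, for each shape, the class member of minimal free energy."""
    return {
        shape: min(members, key=lambda member: member[1])
        for shape, members in shape_classes(foldings, pi).items()
    }


if __name__ == "__main__":
    # Invented toy folding space: (dot-bracket structure, energy in kcal/mol).
    toy_foldings = [
        ("((((...))))....", -3.0),
        ("(((.....)))....", -2.1),
        ("((...))((...)).", -2.5),
    ]
    # Stand-in for the abstraction: a lookup table of hand-assigned shapes.
    toy_pi = {
        "((((...))))....": "[]",
        "(((.....)))....": "[]",
        "((...))((...)).": "[][]",
    }.__getitem__

    print(sorted(shape_classes(toy_foldings, toy_pi)))  # the shape space
    for shape, (structure, energy) in shreps(toy_foldings, toy_pi).items():
        print(shape, structure, energy)
```

Enumerating the folding space explicitly is feasible only for toy inputs, since its size grows exponentially with sequence length.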
Steffen P, Voss B, Rehmsmeier M, Reeder J, Giegerich R: RNAshapes: an integrated RNA analysis package based on abstract shapes. Bioinformatics. 2006 Feb 15;22(4):500-3.
Voss, Janssen, Reeder, Giegerich: RNAsifter: Shape based indexing to speed up Rfam searches. BMC Bioinformatics. 2007.