Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction

Dec 20, 2015

Page 1: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Improving Free Energy Functions for RNA Folding

RNA Secondary Structure Prediction

Page 2: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Why RNA is Important

• Machinery of protein construction

• Catalytic role in cells
– May be possible to destroy specific sequences of RNA (to interrupt protein production)
– RNase P (Cech/Altman c.1981)

Page 3: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

RNA Structural Levels

Primary

AAUCG...CUUCUUCCA

Secondary Tertiary

Secondary: http://anx12.bio.uci.edu/~hudel/bs99a/lecture21/lecture2_2.html

Tertiary: http://www.leeds.ac.uk/bmb/courses/teachers/trnballs.html

Page 4: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Abstracting the problem

[Figure: secondary structure diagram of a short sequence, labeled with the bases U, C, C, A, G, G, A, C]

Zuker (1981) Nucleic Acids Research 9(1) 133-149

Page 5: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Why it is hard

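$\psi_n$ = number of secondary structures for RNA sequences of length $n$ ($n$-mers)

$\psi_n \approx 1.5 \, n^{-3/2} \, (1.8)^n$

[remaining expression on this slide is illegible in the transcript]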

Hofacker et al. (1994) Monatsh. Chem. 125 167-188

• Large search space (hard to enumerate)
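To make the size of the search space concrete, the sketch below counts pseudoknot-free secondary structures with the classical recurrence: the last position is either unpaired, or it pairs with some earlier position k and splits the chain into an outside part and an enclosed part. The function name `count_structures` and the parameter `min_loop` are illustrative choices, and the count assumes any two positions may pair, so it is an upper bound for a real sequence.

```python
def count_structures(n_max, min_loop=1):
    """Return [S(0), ..., S(n_max)]: the number of pseudoknot-free secondary
    structures on n positions, assuming any two positions may pair and a
    hairpin loop must enclose at least `min_loop` unpaired positions."""
    s = [1] * (n_max + 1)  # S(0) = 1: the empty structure
    for n in range(1, n_max + 1):
        total = s[n - 1]  # case 1: position n is unpaired
        # case 2: position n pairs with position k (1-based);
        # k-1 positions lie outside the pair and n-k-1 positions are enclosed
        for k in range(1, n - min_loop):
            total += s[k - 1] * s[n - k - 1]
        s[n] = total
    return s


if __name__ == "__main__":
    counts = count_structures(30, min_loop=3)
    for n in (10, 20, 30):
        print(n, counts[n])
```

Even for short chains the count grows quickly, which is why enumeration is impractical and dynamic programming or stochastic search is used instead.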

Page 6: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Why it is hard

• Secondary structure does not exist.
– Unlike proteins
– Putative structures (prone to revision)

• Quality of Energy Functions
– Discussed later

Page 7: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Current Algorithms

• Single-Strand
– Minimum Free Energy (Zuker et al. 1981)
– Partition Functions (McCaskill 1990)

• Comparative Sequence Analysis
– Max. Weighted Matching (Nussinov et al. 1978)
– Stochastic CFG (Sakakibara et al. 1994)
– Phylogenetic Trees (Gulko et al. 1995)
– Statistical Significance (Noller & Woese, early 80’s)

See proposal for references

Page 8: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

MFE / Tinoco Hypothesis

The free energy of a secondary structure equals the sum of the free energies of the loops and stacked pairs.

Tinoco et al. (1971) Nature 230 362-367.
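The additivity idea can be sketched in a few lines. In the toy model below every base pair either stacks directly on the next inner pair or closes a loop, and the total energy is the sum of those independent contributions. The two constants are placeholders chosen for illustration, not measured thermodynamic parameters, and the function names are hypothetical.

```python
# Placeholder contributions in kcal/mol. These are NOT measured parameters;
# real energy tables depend on the bases involved and on loop type and size.
STACK_ENERGY = -2.0
LOOP_PENALTY = 4.0


def parse_pairs(dot_bracket):
    """Map each opening position to its closing position."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs[stack.pop()] = i
    return pairs


def additive_energy(dot_bracket):
    """Sum one contribution per base pair: a stack term if the pair sits
    directly on the next inner pair, otherwise a loop term."""
    pairs = parse_pairs(dot_bracket)
    energy = 0.0
    for i, j in pairs.items():
        if pairs.get(i + 1) == j - 1:
            energy += STACK_ENERGY  # (i, j) stacked on (i+1, j-1)
        else:
            energy += LOOP_PENALTY  # (i, j) closes a hairpin, bulge, interior or multi-loop
    return energy


if __name__ == "__main__":
    print(additive_energy("(((...)))"))  # two stacks plus one hairpin loop
```

Because each term depends only on a local motif, the total can be minimized by dynamic programming, which is what the MFE algorithm exploits.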

Page 9: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Proposed System

[Figure: system diagram. Labels: input sequence AAUCG...CUUCUUCCA, MFE(E), Secondary Structures, GA(E’); steps numbered 1, 2, 3]

Page 10: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Step I - Calc MFE Structure

• Given a sequence apply the MFE algorithm
– Generates secondary structure S
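As a rough illustration of Step I, the sketch below folds a sequence by dynamic programming and traces back one optimal structure. It is not the Zuker MFE algorithm: it swaps in a much simpler toy energy of -1 per allowed base pair (equivalent to maximizing the number of pairs), and the names `fold`, `PAIRS`, and `min_loop` are illustrative.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold(seq, min_loop=3):
    """Return (dot_bracket, energy) under a toy energy of -1 per base pair.
    Simplified stand-in for an MFE fold, not the Zuker algorithm."""
    n = len(seq)
    # best[i][j] = minimum toy energy of the subsequence seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            e = best[i][j - 1]  # j left unpaired
            for k in range(i, j - min_loop):  # j paired with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    e = min(e, left + best[k + 1][j - 1] - 1)
            best[i][j] = e

    # Trace back one optimal structure.
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= min_loop:
            continue  # too short to contain a pair
        if best[i][j] == best[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + best[k + 1][j - 1] - 1:
                    structure[k], structure[j] = "(", ")"
                    todo.append((i, k - 1))
                    todo.append((k + 1, j - 1))
                    break
    energy = best[0][n - 1] if n else 0
    return "".join(structure), energy


if __name__ == "__main__":
    print(fold("GGGAAAUCC"))
```

A real MFE implementation replaces the constant -1 with loop and stacking energies, but the recursion over subsequences has the same shape.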

Page 11: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Step II - Structural Similarity

• Given a database of experimentally verified RNA structures
– Let Q be the database structure most similar to S
– Based on RNase P Database (Brown 1999)
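The slides do not say how similarity is measured, so the sketch below assumes one simple possibility: the Jaccard overlap of the two base-pair sets, compared position by position. The function names are hypothetical, and a practical measure would also have to handle database structures whose sequences differ in length from the query.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs in a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def similarity(q, s):
    """Assumed measure: shared base pairs divided by all base pairs (0 to 1)."""
    pq, ps = base_pairs(q), base_pairs(s)
    if not pq and not ps:
        return 1.0  # two unpaired structures are identical
    return len(pq & ps) / len(pq | ps)


def most_similar(database, s):
    """Pick Q, the database structure with the highest similarity to S."""
    return max(database, key=lambda q: similarity(q, s))


if __name__ == "__main__":
    db = ["(.....)", "((...))", "......."]
    print(most_similar(db, ".(...)."))
```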

Page 12: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Step III - Construct E’

• Create a new energy function:

$E'(S) = \left(1 + \alpha^{n} \, \mathrm{Similarity}(Q, S)\right) E(S)$

where $\alpha \in (0, 1]$ and $n$ = generation (see next step)
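A minimal sketch of the modified energy, under stated assumptions: the formula on this slide is damaged in the transcript, so the form used here, E’(S) = (1 + alpha^n · Similarity(Q, S)) · E(S) with alpha in (0, 1] and n the generation number, is a reading of the fragments rather than a confirmed definition. The symbol name `alpha` and the plus sign are assumptions.

```python
def modified_energy(energy, sim, alpha, generation):
    """Assumed form: E'(S) = (1 + alpha**n * Similarity(Q, S)) * E(S).

    energy     -- E(S), the ordinary free energy (negative for stable folds)
    sim        -- Similarity(Q, S), assumed to lie in [0, 1]
    alpha      -- decay base in (0, 1]; the name is an assumption
    generation -- n, the current generation of the genetic algorithm
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return (1.0 + alpha ** generation * sim) * energy


if __name__ == "__main__":
    # The similarity bonus fades as the generation number grows.
    for n in (0, 1, 5, 20):
        print(n, modified_energy(-10.0, 0.5, 0.5, n))
```

With a negative E(S), a structure that resembles Q gets a lower E’ early on, and the bonus shrinks toward zero in later generations whenever alpha is below 1.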

Page 13: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Discussion on E’

• E’ has global information

• Global information precludes the use of dynamic programming (MFE, Partition)

• Leaves (stochastic) combinatorial optimization
– Gradient Descent (no ∂E/∂S)
– Genetic Algorithms / Simulated Annealing

Page 14: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Step IV - Genetic Algorithm

• RNA Structural Prediction by GA
– Input: sequence
– Output: structure that minimizes E’ (maximizes fitness) for the input sequence
– Steady State Genetic Algorithm
– Pseudoknots forbidden (conflicts)
– Fitness = -E’
– Effect of Similarity(Q, S) diminishes with each generation (pseudo-SA).
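The "steady state" variant replaces one individual at a time instead of rebuilding the whole population each generation. The loop below is a generic sketch of that scheme, not the author's implementation: the function name, the parent selection rule (two random parents), and the replacement rule (overwrite the worst individual if the child is fitter) are all assumptions.

```python
import random


def steady_state_ga(population, fitness, crossover, mutate, steps, rng):
    """Generic steady-state GA: each step creates one child and, if it is
    fitter than the current worst individual, replaces that individual."""
    population = list(population)  # work on a copy
    for _ in range(steps):
        p1, p2 = rng.sample(population, 2)  # assumed: uniform random parents
        child = mutate(crossover(p1, p2, rng), rng)
        worst = min(range(len(population)), key=lambda i: fitness(population[i]))
        if fitness(child) > fitness(population[worst]):
            population[worst] = child
    return max(population, key=fitness)


if __name__ == "__main__":
    # Toy problem: integers as individuals, fitness peaks at 10.
    rng = random.Random(0)
    best = steady_state_ga(
        [0, 3, 25, 40],
        fitness=lambda x: -abs(x - 10),
        crossover=lambda a, b, r: (a + b) // 2,
        mutate=lambda x, r: x + r.choice((-1, 1)),
        steps=200,
        rng=rng,
    )
    print(best)
```

For structure prediction the individuals would be conflict-free sets of stems and `fitness` would return -E’ for the current generation.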

Page 15: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Genetic Algorithm - Repn.

• Stem-loop representation (Chen et al. 2000)
– Window method (EMBOSS Palindrome)

[Figure: a stem spanning positions 23 to 52, encoded as (23 52 3 3.2) = (start end length weight)]
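The stem tuple can be sketched as a small data type. The interpretation below is an assumption: `start` and `end` are taken to be the outermost paired positions and `length` the number of stacked pairs, so (23 52 3 3.2) expands to the pairs (23, 52), (24, 51), (25, 50). Two stems are treated as conflicting if they share a position or cross each other, which is what forbidding pseudoknots requires. The names `Stem` and `conflicts` are illustrative.

```python
from typing import NamedTuple


class Stem(NamedTuple):
    start: int     # assumed: 5' position of the outermost pair
    end: int       # assumed: 3' position of the outermost pair
    length: int    # number of stacked base pairs
    weight: float  # score attached to the stem

    def pairs(self):
        """Expand the stem into its individual base pairs."""
        return [(self.start + k, self.end - k) for k in range(self.length)]


def conflicts(a, b):
    """True if two stems cannot coexist in one pseudoknot-free structure."""
    pa, pb = a.pairs(), b.pairs()
    used = {pos for pair in pa for pos in pair}
    if any(pos in used for pair in pb for pos in pair):
        return True  # a position would be paired twice
    # crossing pairs i < k < j < l form a pseudoknot
    return any(i < k < j < l or k < i < l < j for i, j in pa for k, l in pb)


if __name__ == "__main__":
    print(Stem(23, 52, 3, 3.2).pairs())
```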

Page 16: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Genetic Algorithm - Operators

• Mutation
– Add stem from stem pool to a child

• Crossover

[Figure: parents P1 and P2, children C1 and C2]

Fit stems of P2 into C1 or C2 randomly.

Placement must be conflict free.
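The two operators can be sketched as follows. The slide states only that stems of P2 are fitted into C1 or C2 at random and that placement must be conflict free; how the children are seeded is not stated, so this sketch assumes P1's stems are dealt out at random between the two children first. Stems are plain (start, end, length) tuples here, and all function names are hypothetical.

```python
import random


def stem_pairs(stem):
    """Expand (start, end, length, ...) into base pairs (assumed encoding)."""
    start, end, length = stem[:3]
    return [(start + k, end - k) for k in range(length)]


def conflict(a, b):
    """True if the stems share a position or cross (pseudoknot)."""
    pa, pb = stem_pairs(a), stem_pairs(b)
    used = {pos for pair in pa for pos in pair}
    shared = any(pos in used for pair in pb for pos in pair)
    crossing = any(i < k < j < l or k < i < l < j for i, j in pa for k, l in pb)
    return shared or crossing


def try_add(structure, stem):
    """Append stem to structure (a list of stems) only if it is conflict free."""
    if all(not conflict(stem, other) for other in structure):
        structure.append(stem)
        return True
    return False


def mutate(child, stem_pool, rng):
    """Mutation: try to add one random stem from the pool to the child."""
    try_add(child, rng.choice(stem_pool))
    return child


def crossover(p1, p2, rng):
    """Crossover producing two children from conflict-free parents."""
    c1, c2 = [], []
    for stem in p1:  # ASSUMPTION: P1's stems are dealt out at random
        rng.choice((c1, c2)).append(stem)
    for stem in p2:  # from the slide: fit stems of P2 into C1 or C2 randomly
        try_add(rng.choice((c1, c2)), stem)
    return c1, c2


if __name__ == "__main__":
    rng = random.Random(0)
    print(crossover([(1, 20, 3), (30, 40, 2)], [(10, 30, 3), (5, 15, 3)], rng))
```

Rejecting a stem that conflicts, rather than repairing the child afterwards, keeps every individual a valid pseudoknot-free structure at all times.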

Page 17: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Preliminary Results

• E’ does not lead to a drastic speedup

• Genetic algorithm is very slow
– If initial population is generated randomly from the stem pool.
– Use suboptimal folding for initial population.

Page 18: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Preliminary Results Explained

• The real structure is usually very similar to the Tinoco optimal structure.

• View E’ as a way of choosing among the suboptimal structures.

Page 19: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

Future Work

• More testing on the entire RNase P Database (> 400 structures)

• Tune E’

• Accuracy comparison to MFE and Partition Function Algorithms

• Parallelize genetic algorithm

Page 20: Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.

END