Doug Raiford Lesson 7
RNA secondary structure prediction

Mar 19, 2016

Page 1: RNA secondary structure prediction

Doug Raiford, Lesson 7

Page 2: RNA secondary structure prediction

Why do we care?

RNA World Hypothesis

RNA world evolved into the DNA and protein world

DNA advantage: greater chemical stability

Protein advantage: more flexible and efficient enzymes (biomolecules that catalyze)

▪ 20 amino acids vs. 4 nucleotides

▪ Chemically, more diverse

Remnants remain in ribosomes, nucleases, polymerases, and splicing molecules

Page 3: RNA secondary structure prediction

Primary: sequence

Secondary: double-stranded regions (reverse complements)

Tertiary: three-dimensional structure

>tRNA. Carries amino acid for Isoleucine
AGGCUUGUAGCUCAGGUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGGUGGUUCAAGUCCACUCAGGCCUACCA

[Figure: tRNA cloverleaf, labeled: CCA tail, acceptor stem, D arm, anticodon arm, anticodon, T arm]

Page 4: RNA secondary structure prediction

How do we find regions of reverse complementation?

What do we have? Sequence

A’s like pairing with U’s and G’s like pairing with C’s

Stronger bond (3 hydrogen bonds) between G’s and C’s

Should result in lowest free energy (most negative enthalpy)
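The search for reverse-complementary regions can be sketched directly from the pairing rules above. This is a minimal illustration, not a folding algorithm: the function names `reverse_complement` and `find_stems` and the fixed `stem_len` parameter are assumptions made for the example, and only Watson-Crick pairs (A-U, G-C) are considered.

```python
# Watson-Crick partners only; wobble pairs such as G-U are ignored here.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the strand that would pair with `rna`, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_stems(rna, stem_len=3):
    """List (i, j) start indices where rna[i:i+stem_len] can pair with a
    non-overlapping downstream window rna[j:j+stem_len]."""
    stems = []
    for i in range(len(rna) - 2 * stem_len + 1):
        target = reverse_complement(rna[i:i + stem_len])
        for j in range(i + stem_len, len(rna) - stem_len + 1):
            if rna[j:j + stem_len] == target:
                stems.append((i, j))
    return stems


print(find_stems("GGGAAAUCC"))
```

For the short sequence GGGAAAUCC the only 3-base stem found pairs GGA (starting at index 1) with UCC (starting at index 6). An exhaustive window search like this says nothing about which combination of stems is energetically best, which is what the energy-minimization approach addresses.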

Page 5: RNA secondary structure prediction

tRNA

Transports amino acid to the ribosome

[Figure: tRNA cloverleaf, labeled: CCA tail, acceptor stem, D arm, anticodon arm, anticodon, T arm]

Page 6: RNA secondary structure prediction

Visualization

Page 7: RNA secondary structure prediction

Good at finding longer base-pairings (stacked base-pairs)

Need to find the conformation that provides the minimal total free energy

RNA often has many alternate conformations at different temperatures

Stacked base-pairs add stability

Loops/bulges introduce positive free energy and are destabilizing

Page 8: RNA secondary structure prediction

First nucleotide base-pairs with last

First nucleotide base-pairs with some other (other than last) nucleotide (including none)

Recurse on rest

Recurse on every possible set of two strings

Recurrence relations

$$E(S_{i,j}) = \min \begin{cases}
\alpha(r_i, r_j) + E(S_{i+1,j-1}) \\
\min\left[E(S_{i,k-1}) + E(S_{k,j})\right] \text{ for } i < k \le j
\end{cases}$$
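The two cases above (the first base pairs with the last, or the string splits into two independently folded substrings) can be written as a plain recursion. The sketch below is an illustration under stated assumptions: each Watson-Crick pair scores -1 and everything else scores 0, the split is taken as E(i, k-1) + E(k, j) for i < k <= j, and the names `PAIR_ENERGY` and `min_energy` are made up for the example. Memoization via `functools.lru_cache` keeps the recursion from recomputing the same substring.

```python
from functools import lru_cache

# Assumed toy scoring: -1 for a Watson-Crick pair, 0 otherwise.
PAIR_ENERGY = {("A", "U"): -1, ("U", "A"): -1, ("G", "C"): -1, ("C", "G"): -1}


def min_energy(rna):
    """Minimum energy of `rna` under the toy pair scoring above."""

    @lru_cache(maxsize=None)
    def energy(i, j):
        # A single base or an empty range contributes nothing.
        if j <= i:
            return 0
        # Case 1: pair base i with base j, recurse on the rest.
        paired = PAIR_ENERGY.get((rna[i], rna[j]), 0) + energy(i + 1, j - 1)
        # Case 2: best way to split i..j into two adjacent substrings.
        split = min(energy(i, k - 1) + energy(k, j) for k in range(i + 1, j + 1))
        return min(paired, split)

    return energy(0, len(rna) - 1)


print(min_energy("GGGAAAUCC"))
```

Without the cache the recursion revisits the same (i, j) ranges exponentially often, which is the motivation for a dynamic programming table.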

Page 9: RNA secondary structure prediction

As luck would have it…

Zuker came up with a dynamic programming solution

Rows i, columns j:

      G   G   G   A   A   A   U   C   C
  G   0
  G       0
  G           0
  A               0
  A                   0
  A                       0
  U                           0
  C                               0
  C                                   0

Page 10: RNA secondary structure prediction

Rows i, columns j:

      G   G   G   A   A   A   U   C   C
  G   0
  G   0   0
  G       0   0
  A           0   0
  A               0   0
  A                   0   0
  U                       0   0
  C                           0   0
  C                               0   0

Start with zeros on diagonal

Populate diagonally

Page 11: RNA secondary structure prediction

Will look at last value to illustrate

Match first and last character, recurse on rest

Rows i, columns j:

      G   G   G   A   A   A   U   C   C
  G   0   0   0   0   0   0  -1  -2  -3
  G   0   0   0   0   0   0  -1  -2  -3
  G       0   0   0   0   0  -1  -2  -2
  A           0   0   0   0  -1  -1  -1
  A               0   0   0  -1  -1  -1
  A                   0   0  -1  -1  -1
  U                       0   0   0   0
  C                           0   0   0
  C                               0   0

$$\alpha(r_i, r_j) + E(S_{i+1,j-1}) = -1 + (-2) = -3$$

  α   A   C   U   G
  A   0   0  -1   0
  C   0   0   0  -1
  U  -1   0   0   0
  G   0  -1   0   0
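The diagonal fill can be sketched as a bottom-up table computation. This is a simplified illustration, not Zuker's full energy model: it assumes the toy α scoring shown on this slide (-1 for A-U and G-C pairs, 0 otherwise) and a split of the form E(i, k-1) + E(k, j) for i < k <= j. The names `PAIR_ENERGY` and `fill_energy_table` are hypothetical.

```python
# Assumed toy scoring taken from the alpha table: -1 for A-U and G-C pairs.
PAIR_ENERGY = {("A", "U"): -1, ("U", "A"): -1, ("G", "C"): -1, ("C", "G"): -1}


def fill_energy_table(rna):
    """Fill table[i][j] (0-based, i <= j) one diagonal at a time."""
    n = len(rna)
    table = [[0] * n for _ in range(n)]

    def cell(i, j):
        # Single bases and empty ranges (j <= i) have energy 0.
        return table[i][j] if i < j else 0

    for span in range(1, n):          # distance from the main diagonal
        for i in range(n - span):
            j = i + span
            # Pair the first and last base, recurse on the rest.
            best = PAIR_ENERGY.get((rna[i], rna[j]), 0) + cell(i + 1, j - 1)
            # Or take the best pair of adjacent substrings.
            for k in range(i + 1, j + 1):
                best = min(best, cell(i, k - 1) + cell(k, j))
            table[i][j] = best
    return table


table = fill_energy_table("GGGAAAUCC")
for base, row in zip("GGGAAAUCC", table):
    print(base, " ".join(f"{value:2d}" for value in row))
```

The value for the whole sequence ends up in the top-right cell, `table[0][n - 1]`. Cells below the diagonal are never written and simply stay at 0 in this sketch.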

Page 12: RNA secondary structure prediction

Rows i, columns j:

      G   G   G   A   A   A   U   C   C
  G   0   0   0   0   0   0  -1  -2  -3
  G       0   0   0   0   0  -1  -2  -3
  G           0   0   0   0  -1  -2  -2
  A               0   0   0  -1  -1  -1
  A                   0   0  -1  -1  -1
  A                       0  -1  -1  -1
  U                           0   0   0
  C                               0   0
  C                                   0

Min of all pairs of substrings

[Figure: two alternative hairpin foldings of GGGAAAUCC]

Page 13: RNA secondary structure prediction

n² plus 2n for each visited cell

So O(n³)

Populate matrix plus traverse row/column for each cell

Page 14: RNA secondary structure prediction

Any prediction method must account for these

Page 15: RNA secondary structure prediction

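Now O(n⁴)

Interior loops most expensive

Can exploit the fact that along diagonals, loops have same size

Can calculate once

Limits search space

Back to O(n³)

$$E(S_{i,j}) = \min \begin{cases}
E(S_{i+1,j}) \\
E(S_{i,j-1}) \\
\min\left[E(S_{i,k-1}) + E(S_{k,j})\right] \text{ for } i < k \le j \\
E(L_{i,j})
\end{cases}$$

$$E(L_{i,j}) = \begin{cases}
\alpha(r_i, r_j) + \xi(j-i-1), & \text{if } L_{i,j} \text{ is a hairpin loop} \\
\alpha(r_i, r_j) + \eta + E(S_{i+1,j-1}), & \text{if } L_{i,j} \text{ is a helical region} \\
\min_{k}\left[\alpha(r_i, r_j) + \beta(k) + E(S_{i+k+1,j-1})\right], & \text{if } L_{i,j} \text{ is a bulge on } i \\
\min_{k}\left[\alpha(r_i, r_j) + \beta(k) + E(S_{i+1,j-k-1})\right], & \text{if } L_{i,j} \text{ is a bulge on } j \\
\min_{k_1,k_2}\left[\alpha(r_i, r_j) + \gamma(k_1+k_2) + E(S_{i+k_1+1,j-k_2-1})\right], & \text{if } L_{i,j} \text{ is an interior loop}
\end{cases}$$

ξ(k) = destabilizing free energy of a hairpin loop with size k

η = stabilizing free energy of adjacent base pairs

β(k) = destabilizing free energy of a bulge of size k

γ(k) = destabilizing free energy of an interior loop of size k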

Page 16: RNA secondary structure prediction

Zuker’s site

 1 gccgaggtgg tggaattggt agacacgcta ccttgaggtg gtagtgccca atagggctta
61 cgggttcaag tcccgtcctc ggtacca

tRNA for Leucine in E. coli, a prototypical organism

Codon: uua

Anti-codon: aat

[Figure: tRNA cloverleaf, labeled: CCA tail, acceptor stem, D arm, anticodon arm, anticodon, T arm]

Page 17: RNA secondary structure prediction

Just like proteins: conformation dictates function

What if a U-A base pair mutates to a G-C?

Still same function

What would this do to a search or sequence alignment?

GCAGGACCAUAUA
|||||||||||||
CGUCCUGGUAUAU

GCAGGACCAGAUA
|||||||||||||
CGUCCUGGUCUAU

Page 18: RNA secondary structure prediction

Phenomenon known as covariance (not to be confused with statistical covariance)

GCAGGACCAUAUA
|||||||||||||
CGUCCUGGUAUAU

GCAGGACCAGAUA
|||||||||||||
CGUCCUGGUCUAU

Page 19: RNA secondary structure prediction

How might we locate covariant pairs?

MSA then compare all pair-wise combinations of columns

High degree of agreement in two columns (G’s match with C’s, A’s match with U’s) is an indication of base-pairing

χ² test

Compare to expected number of pairings given sequence composition
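The column-comparison idea can be sketched as follows. This is a simplified illustration rather than a full statistical treatment: for each pair of columns it counts sequences whose two bases are Watson-Crick complementary, derives the expected count from the base composition of each column, and computes a two-category χ² statistic (paired vs. not paired). The function name `column_pair_scores` and the four-sequence alignment are hypothetical, and no p-value is computed.

```python
from collections import Counter
from itertools import combinations

# Ordered Watson-Crick pairs; wobble pairs are not counted in this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def column_pair_scores(msa):
    """Map (col_i, col_j) -> (observed pairings, expected pairings, chi2)."""
    n = len(msa)
    columns = [[seq[c] for seq in msa] for c in range(len(msa[0]))]
    scores = {}
    for i, j in combinations(range(len(columns)), 2):
        observed = sum((a, b) in PAIRS for a, b in zip(columns[i], columns[j]))
        freq_i, freq_j = Counter(columns[i]), Counter(columns[j])
        # Chance of a complementary pair if the two columns were independent.
        p_pair = sum(freq_i[a] * freq_j[b] for a, b in PAIRS) / (n * n)
        expected = n * p_pair
        if 0 < expected < n:
            diff_sq = (observed - expected) ** 2
            chi2 = diff_sq / expected + diff_sq / (n - expected)
        else:
            chi2 = 0.0  # statistic undefined when a category is impossible
        scores[(i, j)] = (observed, expected, chi2)
    return scores


# Hypothetical alignment: columns 0 and 5 change together and always pair.
alignment = ["GAAAAC", "CAAAAG", "AAAAAU", "UAAAAA"]
scores = column_pair_scores(alignment)
print(scores[(0, 5)])
```

In the toy alignment, columns 0 and 5 pair in all four sequences although only one pairing would be expected by chance, so they stand out. A large χ² can also arise when pairings are far rarer than expected, so the observed count should be checked alongside the statistic.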

Page 20: RNA secondary structure prediction

Pairing depicted with nested parentheses

AAGACUUCGGUCUGGCGACAUUC
((( ))) (( ( )))
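Nested parentheses can be turned back into a list of base pairs with a stack. The sketch below assumes the common dot-bracket convention in which unpaired positions are written as dots; the function name `pairs_from_dot_bracket` and the example string are hypothetical and are not taken from the slide.

```python
def pairs_from_dot_bracket(structure):
    """Return sorted (open_index, close_index) pairs, 0-based."""
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for position, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {position}")
            # The most recent unmatched '(' pairs with this ')'.
            pairs.append((stack.pop(), position))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(pairs_from_dot_bracket("((...))"))
```

Because each closing parenthesis pairs with the most recent open one, the notation can only express nested pairings, which is also what the dynamic programming recurrences assume.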

Page 21: RNA secondary structure prediction

Mountain plots

A mountain plot represents a secondary structure in a plot of height versus position, where the height m(k) is given by the number of base pairs enclosing the base at position k. I.e. loops correspond to plateaus (hairpin loops are peaks), helices to slopes.
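The height m(k) can be computed in one pass over a parenthesized structure by tracking how many pairs are currently open. This is a minimal sketch that assumes dot-bracket input with dots for unpaired bases; the function name `mountain_heights` is hypothetical, and a paired base is counted as enclosed only by the pairs outside its own.

```python
def mountain_heights(structure):
    """Return m(k) for each position k: the number of enclosing base pairs."""
    heights = []
    depth = 0  # number of pairs currently open
    for symbol in structure:
        if symbol == ")":
            depth -= 1          # this pair closes here, so it no longer encloses
        heights.append(depth)
        if symbol == "(":
            depth += 1          # later positions are enclosed by this pair
    return heights


# Crude text rendering: one row per position, bar length equals m(k).
for k, height in enumerate(mountain_heights("((...))")):
    print(f"{k:2d} " + "#" * height)
```

For the hypothetical structure `((...))` the heights rise along the opening side of the helix, stay flat across the three loop bases, and fall along the closing side.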

Page 22: RNA secondary structure prediction

Circle plot

Page 23: RNA secondary structure prediction

Data structure capable of capturing secondary structure

Ordered Binary Tree

Page 24: RNA secondary structure prediction

Productions

S → aSu | uSa | cSg | gSc
S → aS | cS | gS | uS
S → Sa | Sc | Sg | Su
S → SS
S → ε

Page 25: RNA secondary structure prediction

Derivation

S → aS
  → aSc
  → aScc
  → acSgcc
  → acgScgcc
  → acggSccgcc
  → acgggScccgcc
  → acggggSccccgcc
  → acgggguSccccgcc
  → acgggguuSccccgcc
  → acgggguucSccccgcc
  → acgggguucgSccccgcc
  → acgggguucgaSccccgcc
  → acgggguucgaaSccccgcc
  → acgggguucgaauSccccgcc
  → acgggguucgaauccccgcc

Page 26: RNA secondary structure prediction

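Parse tree

[Figure: parse tree of the derivation. From the top, the nodes are a←S, S→c, S→c, c←S→g, then four g←S→c nodes forming the stem; the remaining S nodes emit the loop bases u, u, c, g, a, a, u.]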

Page 27: RNA secondary structure prediction

Conformation of RNA dictates function

Determining secondary structure can help determine tertiary structure

Dynamic programming approach to identifying minimum energy conformations

Zuker MFOLD

View using dot plots, nested parens, mountain or circular plots

Covariance: base-pairs mutate but still form pairs, exploit to find pairings

Page 28: RNA secondary structure prediction