RNA Folding CMSC 423 Lecture by Darya Filippova

RNA Foldingckingsf/bioinfo-lectures/rnafold.pdf · RNA Folding Rules RNA folding rules: 1. If two bases are closer than 4 bases apart, they cannot pair 2. Each base is matched to

Apr 28, 2021



RNA Folding

[Figure: an RNA molecule folded into a secondary structure, with complementary bases paired]

RNA is single-stranded and folds up:
• G and C stick together
• A and U stick together


RNA Folding Rules

RNA folding rules:
1. Bases that are too close cannot pair: a pair (i, j) with i < j requires i < j - 4, so at least 4 bases lie between its two ends.
2. Each base is matched to at most one other base.
3. The allowable pairs are {U, A} and {C, G}.
4. Pairs cannot “cross.”
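As a minimal sketch, the first three rules can be checked directly for a proposed set of pairs. This assumes 0-based positions, pairs given as (i, j) tuples with i < j, and reads rule 1 as i < j - 4 (the same threshold the recurrence uses later). The name `check_basic_rules` is hypothetical.

```python
def check_basic_rules(rna, pairs):
    """Check rules 1-3 for a set of 0-based pairs (i, j) with i < j.

    Hypothetical helper; rule 1 is read as "i < j - 4", i.e. at least
    4 bases between the two ends of every pair.
    """
    allowed = ({"A", "U"}, {"C", "G"})
    used = set()
    for i, j in pairs:
        if not i < j - 4:                      # rule 1: ends too close together
            return False
        if i in used or j in used:             # rule 2: base already matched
            return False
        used.update((i, j))
        if {rna[i], rna[j]} not in allowed:    # rule 3: only A-U and C-G pairs
            return False
    return True
```

Rule 4 (no crossings) involves two pairs at a time, so it is easier to check separately.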

[Figure: a folded RNA structure and the same sequence written out in a line]


No Crossings

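If (i, j) and (k, m) are both paired, with i < k, we cannot have i < k < j < m. Either the pairs are nested (i < k < m < j) or one lies entirely after the other (i < j < k < m).

Paired bases have to be nested or disjoint; they never interleave.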

[Figure: positions i, j, k and m marked along the sequence]
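The no-crossing rule can be tested pairwise. The sketch below (hypothetical name `has_crossing`, 0-based pairs) sorts the pairs by their left end, so for any two pairs the first one starts earlier, and only the pattern i < k < j < m has to be looked for. It is a simple O(p²) check over p pairs, meant for illustration rather than speed.

```python
def has_crossing(pairs):
    """Return True if some pairs (i, j), (k, m) satisfy i < k < j < m."""
    ordered = sorted(pairs)                # sort by left end i
    for a in range(len(ordered)):
        i, j = ordered[a]
        for k, m in ordered[a + 1:]:       # every later pair has k >= i
            if i < k < j < m:              # interleaved: the pairs cross
                return True
    return False
```

Nested pairs such as (0, 10) and (2, 8), or side-by-side pairs such as (0, 5) and (6, 12), are accepted; (0, 8) and (3, 12) cross.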


RNA Folding

Given: a string r = b_1 b_2 ... b_n with each b_i ∈ {A, C, U, G}.
Find: the largest set of pairs S = {(i, j)}, where i, j ∈ {1, 2, ..., n}, that satisfies the RNA folding rules.

Goal: match as many bases as possible.
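To make the objective concrete, here is an exhaustive-search sketch for very short strings (hypothetical name `brute_force_fold`, 0-based positions). It builds every candidate pair allowed by rules 1 and 3, then tries subsets from largest to smallest and returns the first one that also satisfies rules 2 and 4. It takes exponential time, which is exactly why a dynamic program is needed.

```python
from itertools import combinations

ALLOWED = ({"A", "U"}, {"C", "G"})


def brute_force_fold(rna):
    """Exhaustively find a largest valid set of 0-based pairs for a short RNA."""
    n = len(rna)
    # Rules 1 and 3: ends at least 5 positions apart (i < j - 4) and complementary.
    candidates = [(i, j) for i in range(n) for j in range(i + 5, n)
                  if {rna[i], rna[j]} in ALLOWED]
    for size in range(len(candidates), 0, -1):
        for subset in combinations(candidates, size):
            bases = [b for pair in subset for b in pair]
            if len(bases) != len(set(bases)):            # rule 2: base reused
                continue
            if any(i < k < j < m for (i, j) in subset for (k, m) in subset):
                continue                                 # rule 4: crossing pairs
            return set(subset)
    return set()
```

For "GGAAAACC" the search returns the two nested pairs (0, 7) and (1, 6).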


Subproblems

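[Figure: an example sequence with positions 1 and j marked, shown once with j unpaired and once with j paired to a base t]

Consider the best folding of b_1 ... b_j. There are two cases:

• j is not paired with anything: the answer is OPT(1, j-1).

• j is paired with some t < j - 4: because no pair may cross (t, j), every other pair lies entirely inside 1..t-1 or entirely inside t+1..j-1, so the rest of the answer is OPT(1, t-1) + OPT(t+1, j-1).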


Recurrence

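If j - i ≤ 4:

$$\mathrm{OPT}(i, j) = 0$$

If j - i > 4:

$$\mathrm{OPT}(i, j) = \max\Bigl(\mathrm{OPT}(i, j-1),\ \max_{t}\bigl\{1 + \mathrm{OPT}(i, t-1) + \mathrm{OPT}(t+1, j-1)\bigr\}\Bigr)$$

In the second case, we try all possible t with which to pair j. That is, t runs from i to j - 5 (so that t < j - 4), restricted to those t for which b_t and b_j form an allowable pair. When t = i, OPT(i, t-1) covers an empty interval and counts as 0.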
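A direct way to check the recurrence is to evaluate it top-down with memoization. This is a sketch, not the lecture's code: it uses 0-based positions, the hypothetical name `max_pairs`, and Python's `functools.lru_cache` to store each OPT(i, j) once. Any interval with j - i ≤ 4, including empty ones, falls into the base case.

```python
from functools import lru_cache


def max_pairs(rna):
    """Evaluate OPT(0, n-1) top-down from the recurrence (0-based positions)."""
    allowed = ({"A", "U"}, {"C", "G"})

    @lru_cache(maxsize=None)
    def opt(i, j):
        if j - i <= 4:                  # base case; also covers empty intervals
            return 0
        best = opt(i, j - 1)            # case 1: j is not paired
        for t in range(i, j - 4):       # case 2: j pairs with t, where t < j - 4
            if {rna[t], rna[j]} in allowed:
                best = max(best, 1 + opt(i, t - 1) + opt(t + 1, j - 1))
        return best

    return opt(0, len(rna) - 1)
```

Because the recursion first descends through opt(i, j-1), the call depth grows with n, so for long sequences the bottom-up table shown later is the safer choice.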


Order to solve the subproblems

• In what order should we solve the subproblems?


• Which subproblems do we need in order to solve OPT(i,j)?

OPT(i, j-1), plus OPT(i, t-1) and OPT(t+1, j-1) for every t from i to j - 5

• In what sense are these problems “smaller?”


• They involve smaller intervals of the string:

We solve OPT(i,j) in order of increasing j - i.
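The ordering can be written as a small generator. This sketch assumes 0-based positions and that only intervals with j - i ≥ 5 need computing (shorter ones are base cases); `fill_order` and its `min_gap` parameter are hypothetical names. Within one value of k = j - i the cells are independent of each other, and every cell they depend on has a strictly smaller j - i.

```python
def fill_order(n, min_gap=5):
    """Yield cells (i, j), 0-based, in order of increasing interval length j - i."""
    for k in range(min_gap, n):      # k = j - i; shorter intervals are base cases
        for i in range(n - k):       # every interval of length k
            yield i, i + k
```

For n = 7 this yields (0, 5), (1, 6), then (0, 6), i.e. one diagonal of the upper half of the matrix at a time.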


Filling in the matrix

[Figure: n × n matrix with rows indexed by i and columns by j, both running from 1 to n; cell OPT(i,j) highlighted]

Only use half: i < j


in order of increasing j-i


Case 1

[Figure: the matrix, with OPT(i,j) computed from the cell to its left, OPT(i,j-1)]

$$\mathrm{OPT}(i, j) = \max\bigl(\mathrm{OPT}(i, j-1),\ \ldots\bigr)$$


Case 2

[Figure: the matrix, with OPT(i,j) computed from the cells OPT(i,t-1) and OPT(t+1,j-1)]

$$\mathrm{OPT}(i, j) = \max\Bigl(\ldots,\ \max_{t}\bigl\{1 + \mathrm{OPT}(i, t-1) + \mathrm{OPT}(t+1, j-1)\bigr\}\Bigr)$$


Code

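# Helpers not shown on the slide; minimal versions assumed here.
def make_matrix(rows, cols, fill=0):
    """Return a rows x cols list of lists filled with `fill`."""
    return [[fill] * cols for _ in range(rows)]

def is_complement(a, b):
    """True if bases a and b form an allowed pair: {A, U} or {C, G}."""
    return {a, b} in ({"A", "U"}, {"C", "G"})

def rnafold(rna):
    n = len(rna)
    OPT = make_matrix(n, n)     # OPT[i][j] = max pairs in rna[i..j]; stays 0 when j - i <= 4
    Arrows = make_matrix(n, n)  # -1 = j unpaired; otherwise the t that j is paired with
    for k in range(5, n):                   # interval length j - i
        for i in range(n - k):              # interval start
            j = i + k                       # interval end
            best = OPT[i][j - 1]            # case 1: j is not paired
            arrow = -1
            for t in range(i, j - 4):       # case 2: pair j with t, where t < j - 4
                if is_complement(rna[t], rna[j]):
                    val = 1 + (OPT[i][t - 1] if t > i else 0) + OPT[t + 1][j - 1]
                    if val >= best:
                        best, arrow = val, t
            OPT[i][j] = best
            Arrows[i][j] = arrow
    return OPT, Arrows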


Backtrace code

def rna_backtrace(Arrows):
    Pairs = []                          # holds the pairs in the optimal solution
    Stack = [(0, len(Arrows) - 1)]      # tracks cells we have to visit
    while len(Stack) > 0:
        i, j = Stack.pop()
        if j - i <= 4:
            continue                    # if cell is base case, skip it
        if Arrows[i][j] == -1:          # Arrow = -1 means we didn't match j
            Stack.append((i, j - 1))
        else:
            t = Arrows[i][j]
            Pairs.append((t, j))        # save that j matched with t
            # add the two daughter problems
            if t > i:
                Stack.append((i, t - 1))
            Stack.append((t + 1, j - 1))
    return Pairs
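A common way to display a list of pairs like the one the backtrace returns is dot-bracket notation: "(" and ")" mark the two ends of each pair and "." marks an unpaired base. The helper below is a hypothetical sketch assuming 0-based (t, j) pairs over a sequence of length n. Because pairs never cross, the resulting parentheses are always balanced.

```python
def to_dot_bracket(n, pairs):
    """Render 0-based pairs (t, j) over a length-n sequence as a dot-bracket string."""
    chars = ["."] * n                 # every base starts out unpaired
    for t, j in pairs:
        chars[t], chars[j] = "(", ")"  # left end opens, right end closes
    return "".join(chars)
```

For example, the two nested pairs (0, 7) and (1, 6) on "GGAAAACC" render as "((....))".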


Subproblems, 2

• We have a subproblem for every interval (i,j)

• How many subproblems are there?


$$\binom{n}{2} = O(n^2)$$


Running Time

• O(n^2) subproblems

• Each takes O(n) time to solve (we have to search over all possible choices of t)

• Total running time is O(n^3).
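A slightly more careful count, as a sketch: an interval of length k = j - i needs at most k iterations of the inner loop over t (the code actually tries k - 4 values, so this overcounts), and there are n - k intervals of that length. Summing over k:

```latex
\sum_{k=1}^{n-1} (n-k)\,k
  = n \cdot \frac{n(n-1)}{2} - \frac{(n-1)\,n\,(2n-1)}{6}
  = \frac{n(n-1)(n+1)}{6}
  = \frac{n^3 - n}{6}
  = O(n^3)
```

So the bound is cubic with a small constant, roughly n^3/6 inner-loop steps.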


Summary

• This is essentially “Nussinov’s algorithm,” which was proposed for finding RNA structures in 1978.

• Same dynamic programming idea: write the answer to the full problem in terms of the answer to smaller problems.

• Still have an O(n^2) matrix to fill.

• Main differences from sequence alignment:
  • We fill in the matrix in a different order: entries (i,j) in order of increasing j - i.
  • We have to try O(n) possible subproblems inside the max.

This leads to an O(n^3) algorithm.


Pseudoknots

(Staple & Butcher, PLoS Biol, 2005)