CSCI 256 Data Structures and Algorithm Analysis Lecture 16 Some slides by Kevin Wayne copyright 2005, Pearson Addison Wesley all rights reserved, and some by Iker Gondra

Jan 19, 2018


Tamsin McCarthy

Transcript
Page 1

CSCI 256 Data Structures and Algorithm Analysis Lecture 16

Some slides by Kevin Wayne copyright 2005, Pearson Addison Wesley all rights reserved, and some by Iker Gondra

Page 2

Dynamic Programming Review

• Recipe
  – Characterize structure of problem
  – Recursively define value of optimal solution
  – Compute value of optimal solution
  – Construct optimal solution from computed information

• Dynamic programming techniques
  – Binary choice: weighted interval scheduling
  – Multi-way choice: segmented least squares
  – Adding a new variable: knapsack
  – Dynamic programming over intervals: RNA secondary structure

Page 3

RNA Secondary Structure

• RNA: String $B = b_1 b_2 \cdots b_n$ over alphabet { A, C, G, U }
• Secondary structure: RNA is single-stranded, so it tends to loop back and form base pairs with itself. This structure is essential for understanding the behavior of the molecule

[Figure: secondary structure of the example RNA string]

Ex: GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA

complementary base pairs: A-U, C-G

Page 4

RNA Secondary Structure

• Secondary structure: A set of pairs $S = \{ (b_i, b_j) \}$ that satisfy
  – [Watson-Crick] S is a matching and each pair in S is a Watson-Crick complement: A-U, U-A, C-G, or G-C
  – [No sharp turns] The ends of each pair are separated by at least 4 intervening bases: if $(b_i, b_j) \in S$, then $i < j - 4$
  – [Non-crossing] If $(b_i, b_j)$ and $(b_k, b_l)$ are two pairs in S, then we cannot have $i < k < j < l$
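The three conditions can be checked mechanically. The following is a minimal sketch, not part of the slides: the helper name `is_secondary_structure` and the representation of S as a list of 1-based index pairs (i, j) are assumptions made for illustration.

```python
# Hypothetical helper (not from the slides): tests the three conditions
# on a candidate set of pairs, using 1-based positions as in the lecture.
COMPLEMENTS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def is_secondary_structure(rna, pairs):
    """Return True if `pairs` (index pairs (i, j) with i < j) is valid for `rna`."""
    pairs = list(pairs)
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= len(rna)):
            return False
        # Matching: each base takes part in at most one pair.
        if i in used or j in used:
            return False
        used.update((i, j))
        # Watson-Crick: the two bases must be complementary.
        if (rna[i - 1], rna[j - 1]) not in COMPLEMENTS:
            return False
        # No sharp turns: i < j - 4.
        if not i < j - 4:
            return False
    # Non-crossing: no two pairs (i, j), (k, l) with i < k < j < l.
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False
    return True
```

For the string ACCGGUAGU, the set {(1, 9), (2, 8)} passes all three checks, while {(1, 6), (2, 8)} fails the non-crossing check because 1 < 2 < 6 < 8.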

[Figure: three example structures, labeled "ok", "sharp turn", and "crossing"]

Page 5

RNA Secondary Structure

• Out of all the secondary structures that are possible for a single RNA molecule, which are the ones that are likely to arise?
  – Free energy: The usual hypothesis is that an RNA molecule will form the secondary structure with the optimum total free energy, approximated here by the number of base pairs
  – Goal: Given an RNA molecule $B = b_1 b_2 \cdots b_n$, find a secondary structure S that maximizes the number of base pairs

http://www.genebee.msu.su/services/rna2_reduced.html

Page 6

RNA Secondary Structure: Subproblems

• First attempt: OPT(j) = maximum number of base pairs in a secondary structure on the substring $b_1 b_2 \cdots b_j$. Either
  – j is not involved in a pair
    • Find the optimal secondary structure in $b_1 b_2 \cdots b_{j-1}$, with value OPT(j-1)
  – j pairs with t for some $t < j - 4$

[Figure: positions 1, t, and j marked on the string]

Page 7

RNA Secondary Structure: Subproblems

If j pairs with some t where $t < j - 4$, then the no-crossing rule tells us that we can't have a base pair (k, l) where $k < t < l < j$; this implies we can't have (k, l) where $1 \le k \le t - 1$ and $t + 1 \le l \le j - 1$.

This means that any other pair (k, l) in an optimal structure is either in $b_1, b_2, \ldots, b_{t-1}$ or in $b_{t+1}, \ldots, b_{j-1}$.

So we must look at two subproblems which are decoupled due to the non-crossing constraint:

• Find the optimal secondary structure in: $b_1 b_2 \cdots b_{t-1}$

• Find the optimal secondary structure in: $b_{t+1} b_{t+2} \cdots b_{j-1}$

Page 8

RNA Secondary Structure: Subproblems

• What is different here?
• The second subproblem is not on our list of subproblems, because it does not begin with $b_1$.
• We need more subproblems!
• We need to be able to work with subproblems that do not begin with $b_1$.

Page 9

Dynamic Programming Over Intervals

• Notation: OPT(i, j) = maximum number of base pairs in a secondary structure of the substring $b_i b_{i+1} \cdots b_j$
  – Case 1: $i \ge j - 4$
    • OPT(i, j) = 0 by the no-sharp-turns condition
  – Case 2: $i < j - 4$
    If base $b_j$ is not involved in a (Watson-Crick) pair:
    • OPT(i, j) = OPT(i, j-1)
    If base $b_j$ pairs with $b_t$ for some t with $i \le t < j - 4$, the non-crossing constraint means that:
    • $\mathrm{OPT}(i, j) = 1 + \max_t \{ \mathrm{OPT}(i, t-1) + \mathrm{OPT}(t+1, j-1) \}$
    where the max is taken over t such that $i \le t < j - 4$ and $b_t$ and $b_j$ are Watson-Crick complements

Page 10

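• Hence if $i < j - 4$ we want the maximum of the two values for OPT(i, j):

• $\mathrm{OPT}(i, j) = \max\bigl( \mathrm{OPT}(i, j-1),\ 1 + \max_t \{ \mathrm{OPT}(i, t-1) + \mathrm{OPT}(t+1, j-1) \} \bigr)$, where t ranges over $i \le t < j - 4$ with $b_t$ and $b_j$ Watson-Crick complements (if no such t exists, OPT(i, j) = OPT(i, j-1))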
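The recurrence translates almost line for line into a memoized recursive function. This is a minimal sketch under stated assumptions: the function name `max_pairs_memo` and the use of `functools.lru_cache` are illustrative choices, not something the slides prescribe. Positions are 1-based, as in the lecture.

```python
from functools import lru_cache

COMPLEMENTS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def max_pairs_memo(rna):
    """Maximum number of base pairs, via top-down evaluation of OPT(i, j)."""
    n = len(rna)

    @lru_cache(maxsize=None)
    def opt(i, j):
        # Case 1: i >= j - 4 (this also covers empty intervals with i > j).
        if i >= j - 4:
            return 0
        # Case 2, first option: b_j is not involved in a pair.
        best = opt(i, j - 1)
        # Case 2, second option: b_j pairs with b_t for some i <= t < j - 4.
        for t in range(i, j - 4):
            if (rna[t - 1], rna[j - 1]) in COMPLEMENTS:
                best = max(best, 1 + opt(i, t - 1) + opt(t + 1, j - 1))
        return best

    return opt(1, n)
```

Because the recursion depth grows with the length of the string, this form is only suitable for short inputs; an iterative table fill avoids that limit.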

Page 11

Dynamic Programming Over Intervals

• In what order should we solve the subproblems?
  – Looking at the recurrence relation, we see that we are invoking solutions to subproblems on shorter intervals
  – We need to evaluate Opt for the shortest intervals first; this is different from the subset sum (and knapsack) strategy of going row by row
  – To achieve this, fix an auxiliary variable k and use the values of i and j that keep j - i = k
  – As k gets larger, the interval for the subproblem $b_i, b_{i+1}, \ldots, b_j$ grows
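The evaluation order can be made concrete by listing the (i, j) pairs in the order they would be filled. This is a small sketch; the generator name `interval_order` is hypothetical and used only for illustration.

```python
def interval_order(n):
    """Yield the intervals (i, j) in fill order: by increasing k = j - i."""
    for k in range(5, n):              # k = 5, 6, ..., n-1
        for i in range(1, n - k + 1):  # i = 1, 2, ..., n-k
            yield (i, i + k)


# For n = 9 the order is:
# (1,6) (2,7) (3,8) (4,9)  then  (1,7) (2,8) (3,9)  then  (1,8) (2,9)  then  (1,9)
print(list(interval_order(9)))
```

Every subproblem referenced by the recurrence for (i, j) lies on a strictly shorter interval, so it has already been produced earlier in this order.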

Page 12

Dynamic Programming Over Intervals

– Running time: $O(n^3)$. Why?

RNA(b_1, …, b_n) {
   Initialize Opt[i, j] = 0 whenever i ≥ j-4 (i.e., i+4 ≥ j)
   for k = 5, 6, …, n-1
      for i = 1, 2, …, n-k
         set j = i + k
         Compute Opt[i, j] using the recurrence
   return Opt[1, n]
}
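One way the pseudocode might be written out in Python is sketched below. The function name `max_pairs` and the table layout (a list of lists with an extra row and column so that 1-based indices and empty intervals such as Opt[i, i-1] can be read directly) are assumptions for illustration.

```python
COMPLEMENTS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def max_pairs(rna):
    """Bottom-up fill of Opt[i][j], shortest intervals first."""
    n = len(rna)
    # Rows 0..n+1 and columns 0..n; every entry starts at 0, which
    # already covers the initialization Opt[i, j] = 0 for i >= j - 4.
    opt = [[0] * (n + 1) for _ in range(n + 2)]
    for k in range(5, n):                  # k = 5, 6, ..., n-1
        for i in range(1, n - k + 1):      # i = 1, 2, ..., n-k
            j = i + k
            best = opt[i][j - 1]           # b_j is not paired
            for t in range(i, j - 4):      # i <= t < j - 4
                if (rna[t - 1], rna[j - 1]) in COMPLEMENTS:
                    best = max(best, 1 + opt[i][t - 1] + opt[t + 1][j - 1])
            opt[i][j] = best
    return opt[1][n]


print(max_pairs("ACCGGUAGU"))  # prints 2
```

The three nested loops (over k, i, and t) are where the cubic running time comes from.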

Page 13

Running time analysis:

• There are $O(n^2)$ subproblems to solve, and evaluating the recurrence in each subproblem takes $O(n)$ time (because we have to find the max over the t's such that $b_t$ and $b_j$ are allowable pairs)

• So the running time is $O(n^3)$
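The bound can also be checked by counting directly. As a sketch, assume each candidate t costs constant work: for a fixed k = j - i there are n - k intervals, and each has k - 4 candidate values of t (those with i ≤ t < j - 4). Substituting m = k - 4 and N = n - 4:

```latex
\sum_{k=5}^{n-1} (n-k)(k-4)
  \;=\; \sum_{m=1}^{N-1} (N-m)\,m
  \;=\; \frac{(N-1)\,N\,(N+1)}{6}
  \;=\; \frac{(n-5)(n-4)(n-3)}{6}
  \;=\; O(n^3)
```

For n = 9 this gives 4·5·6/6 = 20 candidate values of t in total across the 10 intervals that need to be computed.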

Page 14

Example: ACCGGUAGU

• Recall: base pairs allowed: AU, UA, CG, GC
• What is the basic array that we need to fill? (here n = 9)

Page 15

Example: ACCGGUAGU

• Note: if i > j, let Opt(i, j) = 0 (Why?)
• We need two dimensions to represent the array M of values for Opt(i, j): one for the left endpoint of the interval being considered, and one for the right endpoint
• Some initial values are 0: whenever i ≥ j – 4 (Why?)
• Begin with k = 5; loop over the i's from 1 to 4 (= 9 – 5)
• For Opt(1,6): t = 1 is the only t with 1 ≤ t < 6 – 4, and $b_1 b_6$ is AU, an allowable base pair, so
• Opt(1,6) = max( Opt(1,5), 1 + Opt(1,0) + Opt(2,5) ) = max( 0, 1 + 0 + 0 ) = 1
• For Opt(2,7): t = 2 is the only t with 2 ≤ t < 7 – 4, but $b_2 b_7$ is CA, not an allowable base pair, so no t satisfies the conditions and Opt(2,7) = Opt(2,6) = 0
• Next value?

Page 16

Example: ACCGGUAGU

• Opt(3,8): t = 3 is the only possible t, and $b_3 b_8$ is CG, an allowable base pair, so

Opt(3,8) = max( Opt(3,7), 1 + Opt(3,2) + Opt(4,7) ) = max( 0, 1 + 0 + 0 ) = 1

Next value to calculate?

Page 17

Example: ACCGGUAGU

• It is Opt(4,9)
• Now let k = 6 and do Opt(1,7), then Opt(2,8), then Opt(3,9), …
• Note: for Opt(1,7), both t = 1 and t = 2 satisfy the inequality i ≤ t < j – 4 (i = 1 and j = 7); are both base pairs allowable?
• This is a fully worked example in the text; check out more values to make sure you are following the algorithm
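To check hand-computed values for ACCGGUAGU, the whole table can be filled and one optimal set of pairs recovered by walking back through it. This is a sketch under stated assumptions: the names `fill_table` and `traceback`, the table layout, and the tie-breaking in the traceback (prefer leaving $b_j$ unpaired, then the smallest t) are illustrative choices rather than anything fixed by the slides.

```python
COMPLEMENTS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def fill_table(rna):
    """Return the table opt, where opt[i][j] = Opt(i, j) with 1-based i, j."""
    n = len(rna)
    opt = [[0] * (n + 1) for _ in range(n + 2)]
    for k in range(5, n):
        for i in range(1, n - k + 1):
            j = i + k
            best = opt[i][j - 1]
            for t in range(i, j - 4):
                if (rna[t - 1], rna[j - 1]) in COMPLEMENTS:
                    best = max(best, 1 + opt[i][t - 1] + opt[t + 1][j - 1])
            opt[i][j] = best
    return opt


def traceback(rna, opt, i, j):
    """Recover one optimal set of pairs (t, j) on the interval b_i..b_j."""
    pairs = []
    stack = [(i, j)]
    while stack:
        i, j = stack.pop()
        if i >= j - 4:
            continue                      # no pairs possible here
        if opt[i][j] == opt[i][j - 1]:
            stack.append((i, j - 1))      # b_j can be left unpaired
            continue
        for t in range(i, j - 4):
            if ((rna[t - 1], rna[j - 1]) in COMPLEMENTS
                    and opt[i][j] == 1 + opt[i][t - 1] + opt[t + 1][j - 1]):
                pairs.append((t, j))      # b_t pairs with b_j
                stack.append((i, t - 1))
                stack.append((t + 1, j - 1))
                break
    return sorted(pairs)


rna = "ACCGGUAGU"
opt = fill_table(rna)
print(opt[1][6], opt[2][7], opt[3][8], opt[4][9])  # prints 1 0 1 0
print(opt[1][9])                                   # prints 2
print(traceback(rna, opt, 1, len(rna)))            # prints [(1, 9), (2, 8)]
```

The k = 5 values agree with the ones worked out by hand, and the final answer Opt(1,9) = 2 corresponds to pairing $b_1$ with $b_9$ (A-U) and $b_2$ with $b_8$ (C-G).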