Transcript
Page 1: Lec6

CS711008Z Algorithm Design and Analysis
Lecture 6. Basic algorithm design technique: Dynamic programming

Dongbo Bu

Institute of Computing Technology

Chinese Academy of Sciences, Beijing, China

1 The slides are made based on Ch 15, 16 of Introduction to Algorithms and Ch 6, 4 of Algorithm Design. Some slides are excerpted from the slides by K. Wayne with permission.

1 / 145

Page 2: Lec6

Outline

The first example: MatrixChainMultiplication;

Elements of dynamic programming technique;

Various ways to describe subproblems: Segmented Least Squares, Knapsack, RNA Secondary Structure, Sequence Alignment, and Shortest Path;

Connection with greedy technique: Interval Scheduling, Shortest Path.

Page 3: Lec6

If a problem can be reduced into smaller sub-problems I

There are two possible solving strategies:

1. Incremental: to solve the original problem, it suffices to solve a smaller sub-problem; thus the problem is shrunk step-by-step. In other words, a feasible solution can be constructed step-by-step. For example, in the Gale-Shapley algorithm, the final complete matching is constructed step by step, and a stable partial matching is maintained during the construction process.

Page 4: Lec6

If a problem can be reduced into smaller sub-problems II

2. Divide-and-conquer: the original problem is decomposed into several independent sub-problems; thus, a feasible solution to the original problem can be constructed by assembling the solutions to the independent sub-problems.

Page 5: Lec6

Connection with divide-and-conquer technique I

1. Dynamic programming, like the divide-and-conquer method, solves problems by combining the solutions to subproblems.

2. Meanwhile, dynamic programming avoids repeatedly computing common subproblems through "programming". 2 Here, "programming" means "tabular" rather than "coding", e.g. dynamic programming, linear programming, non-linear programming, semi-definite programming, etc.

3. Dynamic programming (and the greedy technique) is typically used to solve an optimization problem. An optimization problem usually has multiple feasible solutions, and each solution is associated with a value. The goal is to find the solution with the minimum/maximum value.

Page 6: Lec6

Connection with divide-and-conquer technique II

4. However, dynamic programming is not limited to optimization problems. Generally speaking, dynamic programming applies if a recursion exists, e.g. the p-value calculation problem. Sometimes the original problem should be extended to identify a meaningful recursion.

5. To identify meaningful recursions, one of the key steps is to define the general form of sub-problems. Requirements: the original problem is a specific case of the sub-problems or can be solved using solutions to sub-problems, and the number of sub-problems is polynomial.

2 Program: [Date: 1600-1700; Language: French; Origin: programme, from Greek, from prographein 'to write before']

Page 7: Lec6

The first example: MatrixChainMultiplication problem

Page 8: Lec6

MatrixChainMultiplication problem

INPUT: A sequence of n matrices A1, A2, ..., An; matrix Ai has dimension p_{i−1} × p_i;
OUTPUT: A full parenthesization of the product A1A2...An that minimizes the number of scalar multiplications.

Page 9: Lec6

An example of MatrixChainMultiplication

Consider four matrices with dimensions A1: 1×2, A2: 2×3, A3: 3×4, A4: 4×5, i.e. p = ⟨1, 2, 3, 4, 5⟩.

Solutions:  (((A1)(A2))(A3))(A4)    ((A1)(A2))((A3)(A4))
Cost:        1×2×3                   1×2×3
            +1×3×4                  +3×4×5
            +1×4×5                  +1×3×5
            = 38                    = 81

Here, the calculation of A1A2 needs 1×2×3 scalar multiplications.

The objective is to determine a calculation sequence such that the number of scalar multiplications is minimized.

Page 10: Lec6

The total number of possible parenthesizations

Intuitively, a parenthesization can be described as a binary tree, where each node corresponds to a subproblem.

[Figure: the binary tree for ((A1)(A2))((A3)(A4)), with children ((A1)(A2)) and ((A3)(A4)) and leaves (A1), (A2), (A3), (A4).]

Solution space size: C(2n, n) − C(2n, n−1) (the Catalan number).

Thus, the brute-force strategy doesn't work.
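As a sanity check on this count, here is a short Python sketch (not part of the slides) that counts parenthesizations by the first-split recursion and compares the result with the closed form; note that the number of ways to parenthesize n matrices equals the (n−1)-th Catalan number.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_parenthesizations(n):
    # P(1) = 1; P(n) = sum over the first-split position k of P(k) * P(n - k).
    if n == 1:
        return 1
    return sum(num_parenthesizations(k) * num_parenthesizations(n - k)
               for k in range(1, n))

def catalan(m):
    # C_m = binom(2m, m) - binom(2m, m - 1)
    return comb(2 * m, m) - (comb(2 * m, m - 1) if m >= 1 else 0)

# The number of parenthesizations of n matrices is the (n-1)-th Catalan number.
for n in range(1, 12):
    assert num_parenthesizations(n) == catalan(n - 1)
print(num_parenthesizations(4))  # 5
```

Since the Catalan numbers grow exponentially (roughly 4^n / n^{3/2}), brute-force enumeration is hopeless for large n.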

Page 11: Lec6

A dynamic programming algorithm (by S. S. Godbole, 1973)

Page 12: Lec6

The general form of sub-problems

1. It is not easy to solve the problem when n is large. Let's see whether it is possible to reduce it into smaller sub-problems.

2. Solution: a full parenthesization. Imagine the solving process as a process of multiple-stage decisions; each decision adds parentheses at a position.

3. Suppose we have already worked out the optimal solution O, where the first decision adds two parentheses as (A1...Ak)(Ak+1...An).

4. This decision decomposes the original problem into two independent sub-problems: to calculate A1...Ak and Ak+1...An.

5. Summarizing these cases, we define the general form of sub-problems as: to calculate Ai...Aj with the minimal number of scalar multiplications.

Page 13: Lec6

The general form of sub-problems cont'd

The general form of sub-problems: to calculate Ai...Aj with the minimal number of scalar multiplications.

Let's denote the optimal solution value to the sub-problem as OPT(i, j).

Thus, the original problem can be solved via calculating OPT(1, n).

It should be pointed out that the total number of sub-problems is polynomial (O(n^2)).

Page 14: Lec6

Key observation: Optimal substructure

For any solution with the first split occurring between Ak and Ak+1, the following equation holds:

Cost(i, j) = Cost(i, k) + Cost(k + 1, j) + p_{i−1} p_k p_j

(Here, Cost(i, j) denotes the number of scalar multiplications needed to calculate Ai...Aj. The equality holds due to the independence between the two sub-problems Ai...Ak and Ak+1...Aj.)

As a special case, an optimal solution with the first split occurring between Ak and Ak+1 has the following optimal substructure property:

OPT(i, j) = OPT(i, k) + OPT(k + 1, j) + p_{i−1} p_k p_j

Page 15: Lec6

Proof of the optimal substructure property

"Cut-and-paste" proof:

Suppose for Ai...Ak, there is another parenthesization OPT′(i, k) better than OPT(i, k). Then the combination of OPT′(i, k) and OPT(k + 1, j) leads to a new solution with lower cost than OPT(i, j): a contradiction.

Here, the independence between Ai...Ak and Ak+1...Aj guarantees that the substitution of OPT(i, k) with OPT′(i, k) does not affect the solution to Ak+1...Aj.

Page 16: Lec6

A recursive solution

So far so good! The only difficulty is that we have no idea of the first splitting position k.

How to overcome this difficulty? Enumeration! We enumerate all possible options of the first decision, i.e. all k with i ≤ k < j.

[Figure: the three options for the first split of A1A2A3A4: k = 1: (A1)(A2A3A4); k = 2: (A1A2)(A3A4); k = 3: (A1A2A3)(A4).]

Thus we have the following recursion:

OPT(i, j) = 0,                                                              if i = j
OPT(i, j) = min_{i ≤ k < j} { OPT(i, k) + OPT(k + 1, j) + p_{i−1} p_k p_j },  otherwise

Page 17: Lec6

Implementing the recursion: trial 1

Page 18: Lec6

Trial 1: Explore the recursion in the top-down manner

RECURSIVE MATRIX CHAIN(i, j)
1: if i == j then
2:   return 0;
3: end if
4: OPT(i, j) = +∞;
5: for k = i to j − 1 do
6:   q = RECURSIVE MATRIX CHAIN(i, k)
7:       + RECURSIVE MATRIX CHAIN(k + 1, j)
8:       + p_{i−1} p_k p_j;
9:   if q < OPT(i, j) then
10:    OPT(i, j) = q;
11:  end if
12: end for
13: return OPT(i, j);

Note: The optimal solution value to the original problem can be obtained through calling RECURSIVE MATRIX CHAIN(1, n).
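The pseudocode above can be transcribed almost literally into Python (a sketch, not from the slides), using the running example p = ⟨1, 2, 3, 4, 5⟩:

```python
def recursive_matrix_chain(p, i, j):
    # Minimal number of scalar multiplications to compute Ai...Aj,
    # where matrix Ai has dimension p[i-1] x p[i].
    if i == j:
        return 0
    best = float("inf")
    for k in range(i, j):  # enumerate the first splitting position
        q = (recursive_matrix_chain(p, i, k)
             + recursive_matrix_chain(p, k + 1, j)
             + p[i - 1] * p[k] * p[j])
        best = min(best, q)
    return best

print(recursive_matrix_chain([1, 2, 3, 4, 5], 1, 4))  # 38
```

As the next slides show, this direct transcription recomputes the same subproblems many times and takes exponential time.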

Page 19: Lec6

An example

[Figure: the recursion tree of RECURSIVE MATRIX CHAIN(1, 4). The root A1A2A3A4 spawns the sub-chains A1, A2A3A4, A1A2, A3A4, A1A2A3, A4, which in turn spawn A2A3, A1A2, A3A4, etc., down to single matrices.]

Note: each node of the recursion tree denotes a subproblem.

Page 20: Lec6

However, this is not a good implementation

Theorem: Algorithm RECURSIVE MATRIX CHAIN costs exponential time.

Let T(n) denote the time used to calculate the product of n matrices. Note that T(n) ≥ 1 + Σ_{k=1}^{n−1} (T(k) + T(n−k) + 1) for n > 1.

[Figure: the recursion tree for A1A2A3A4, with each subtree annotated by its cost: T(4) at the root, and T(1), T(2), T(3) for the sub-chains generated by each split.]

Page 21: Lec6

Time-complexity analysis

Theorem: Algorithm RECURSIVE MATRIX CHAIN costs exponential time.

Proof:

We shall prove T(n) ≥ 2^{n−1} using the substitution technique.

Basis: T(1) ≥ 1 = 2^{1−1}

Induction:
T(n) ≥ 1 + Σ_{k=1}^{n−1} (T(k) + T(n−k) + 1)    (1)
     = n + 2 Σ_{k=1}^{n−1} T(k)                 (2)
     ≥ n + 2 Σ_{k=1}^{n−1} 2^{k−1}              (3)
     = n + 2 (2^{n−1} − 1)                      (4)
     = n + 2^n − 2                              (5)
     ≥ 2^{n−1}                                  (6)

Page 22: Lec6

Implementing the recursion: trial 2

Page 23: Lec6

Why did the first trial fail?

[Figure: the recursion tree for A1A2A3A4, with the repeatedly solved subproblems, e.g. A1A2, A2A3, A3A4, highlighted in red.]

Key observation: there are only O(n^2) subproblems. However, some subproblems (in red) were solved repeatedly.

Solution: memoize the solutions to subproblems using an array OPT[1..n, 1..n] for further look-up.

Page 24: Lec6

Memoization technique

MEMORIZE MATRIX CHAIN(i, j)
1: if OPT[i, j] ≠ NULL then
2:   return OPT[i, j];
3: end if
4: if i == j then
5:   OPT[i, j] = 0;
6: else
7:   for k = i to j − 1 do
8:     q = MEMORIZE MATRIX CHAIN(i, k)
9:         + MEMORIZE MATRIX CHAIN(k + 1, j)
10:        + p_{i−1} p_k p_j;
11:    if OPT[i, j] == NULL or q < OPT[i, j] then
12:      OPT[i, j] = q;
13:    end if
14:  end for
15: end if
16: return OPT[i, j];

Page 25: Lec6

Memoization technique cont'd

The original problem can be solved by calling MEMORIZE MATRIX CHAIN(1, n) with all OPT[i, j] initialized as NULL.

Time-complexity: O(n^3). (The calculation of each entry OPT[i, j] makes O(n) recursive calls in lines 8–9.)
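For comparison, memoization can be sketched in Python by letting functools.lru_cache play the role of the OPT[1..n, 1..n] look-up table (an illustration, not the slides' code), again on the running example:

```python
from functools import lru_cache

# Running example: A1: 1x2, A2: 2x3, A3: 3x4, A4: 4x5,
# i.e. matrix Ai has dimension p[i-1] x p[i].
p = [1, 2, 3, 4, 5]

@lru_cache(maxsize=None)
def opt(i, j):
    # Minimal number of scalar multiplications to compute Ai...Aj;
    # the cache ensures each subproblem is solved only once.
    if i == j:
        return 0
    return min(opt(i, k) + opt(k + 1, j) + p[i - 1] * p[k] * p[j]
               for k in range(i, j))

print(opt(1, 4))  # 38
```

With O(n^2) distinct subproblems and O(n) work per subproblem, this runs in O(n^3) time.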

Page 26: Lec6

Implementing the recursion faster: trial 3

Page 27: Lec6

Trial 3: Faster implementation: unrolling the recursion in the bottom-up manner

MATRIX CHAIN MULTIPLICATION(P)
1: for i = 1 to n do
2:   OPT(i, i) = 0;
3: end for
4: for l = 2 to n do
5:   for i = 1 to n − l + 1 do
6:     j = i + l − 1;
7:     OPT(i, j) = +∞;
8:     for k = i to j − 1 do
9:       q = OPT(i, k) + OPT(k + 1, j) + p_{i−1} p_k p_j;
10:      if q < OPT(i, j) then
11:        OPT(i, j) = q;
12:        S(i, j) = k;
13:      end if
14:    end for
15:  end for
16: end for
17: return OPT(1, n);
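Here is a Python sketch of the bottom-up table filling (not the slides' code), including the splitter array S that is later used to backtrack an optimal parenthesization:

```python
def matrix_chain_order(p):
    # p[i-1] x p[i] is the dimension of Ai; n matrices in total.
    n = len(p) - 1
    OPT = [[0] * (n + 1) for _ in range(n + 1)]
    S = [[0] * (n + 1) for _ in range(n + 1)]  # S[i][j]: optimal first split
    for l in range(2, n + 1):          # l = chain length, increasing
        for i in range(1, n - l + 2):
            j = i + l - 1
            OPT[i][j] = float("inf")
            for k in range(i, j):
                q = OPT[i][k] + OPT[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < OPT[i][j]:
                    OPT[i][j] = q
                    S[i][j] = k
    return OPT, S

def parenthesize(S, i, j):
    # Backtrack the recorded splits to print an optimal parenthesization.
    if i == j:
        return f"A{i}"
    k = S[i][j]
    return f"({parenthesize(S, i, k)})({parenthesize(S, k + 1, j)})"

OPT, S = matrix_chain_order([1, 2, 3, 4, 5])
print(OPT[1][4], parenthesize(S, 1, 4))  # 38 (((A1)(A2))(A3))(A4)
```

On the running example this reproduces the optimal value 38 and the parenthesization from slide 9.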

Page 28: Lec6

Tree: an intuitive view of the bottom-up calculation

[Figure: the tree of subproblems for A1A2A3A4, with nodes colored by level: the single matrices in red, the length-2 chains in green, the length-3 chains in orange, and the original problem in blue.]

Solving sub-problems in a bottom-up manner, i.e.
1. Solve the sub-problems in red first;
2. Then solve the sub-problems in green;
3. Then solve the sub-problems in orange;
4. Finally we can solve the original problem in blue.

Page 29: Lec6

Step 1 of the bottom-up algorithm

OPT    1   2   3   4      SPLITTER   1   2   3   4
 1     0   6                1            1
 2         0  24            2                2
 3             0  60        3                    3
 4                 0        4

Step 1:
OPT[1, 2] = p0 × p1 × p2 = 1 × 2 × 3 = 6;
OPT[2, 3] = p1 × p2 × p3 = 2 × 3 × 4 = 24;
OPT[3, 4] = p2 × p3 × p4 = 3 × 4 × 5 = 60;

Page 30: Lec6

Step 2 of the bottom-up algorithm

OPT    1   2   3   4      SPLITTER   1   2   3   4
 1     0   6  18            1            1   2
 2         0  24  64        2                2   3
 3             0  60        3                    3
 4                 0        4

Step 2:
OPT[1, 3] = min { OPT[1, 2] + OPT[3, 3] + p0 × p2 × p3 (= 18),
                  OPT[1, 1] + OPT[2, 3] + p0 × p1 × p3 (= 32) }
Thus, SPLITTER[1, 3] = 2.

OPT[2, 4] = min { OPT[2, 2] + OPT[3, 4] + p1 × p2 × p4 (= 90),
                  OPT[2, 3] + OPT[4, 4] + p1 × p3 × p4 (= 64) }
Thus, SPLITTER[2, 4] = 3.

Page 31: Lec6

Step 3 of the bottom-up algorithm

OPT    1   2   3   4      SPLITTER   1   2   3   4
 1     0   6  18  38        1            1   2   3
 2         0  24  64        2                2   3
 3             0  60        3                    3
 4                 0        4

Step 3:
OPT[1, 4] = min { OPT[1, 1] + OPT[2, 4] + p0 × p1 × p4 (= 74),
                  OPT[1, 2] + OPT[3, 4] + p0 × p2 × p4 (= 81),
                  OPT[1, 3] + OPT[4, 4] + p0 × p3 × p4 (= 38) }
Thus, SPLITTER[1, 4] = 3.

Page 32: Lec6

Question: We have calculated the optimal value, but how do we get the optimal parenthesization?

Page 33: Lec6

Final step: constructing an optimal solution through "backtracking" the optimal options

Idea: backtracking! Starting from OPT[1, n], we trace back the source of OPT[1, n], i.e. which option we took at each decision stage.

Specifically, an auxiliary array S[1..n, 1..n] is used.

Each entry S[i, j] records the optimal decision, i.e. the value of k such that the optimal parenthesization of Ai...Aj splits between Ak and Ak+1. Thus, the optimal solution to the original problem A1..n is A_{1..S[1,n]} A_{S[1,n]+1..n}.

Note: The optimal option cannot be determined before solving all subproblems.

Page 34: Lec6

Backtracking: Steps 1–3

OPT    1   2   3   4      SPLITTER   1   2   3   4
 1     0   6  18  38        1            1   2   3
 2         0  24  64        2                2   3
 3             0  60        3                    3
 4                 0        4

Step 1: ( A1A2A3 ) ( A4 )
Step 2: ( ( A1A2 ) ( A3 ) ) ( A4 )
Step 3: ( ( ( A1 ) ( A2 ) ) ( A3 ) ) ( A4 )

Page 37: Lec6

Summary: elements of dynamic programming

1. It is usually not easy to solve a large problem directly. Let's consider whether the problem can be decomposed into smaller sub-problems. How to define sub-problems? 3
Imagine the solving process as a process of multiple-stage decisions. Suppose that we have already worked out the optimal solution. Consider the first/final decision (in some order) in the optimal solution. The first/final decision might have several options. Enumerate all possible options for the decision, and observe the generated sub-problems. The general form of sub-problems can be defined via summarizing all possible forms of sub-problems.

2. Show the optimal substructure property, i.e. the optimal solution to the problem contains within it optimal solutions to subproblems.

3. Programming: if a recursive algorithm solves the same subproblem over and over, a table can be used to avoid repeatedly solving the same sub-problems.

3 Sometimes the problem should be extended to identify a meaningful recursion.

Page 38: Lec6

Question: is O(n^3) the lower bound?

Page 39: Lec6

An O(n log n) algorithm by Hu and Shing 1981

[Figure: a convex pentagon with vertices 1..5 whose sides correspond to the matrices A1, ..., A4; the red dashed diagonals show one triangulation.]

One-one correspondence between parenthesizations and partitionings of a convex polygon into non-intersecting triangles.

Each node has a weight wi, and a triangle corresponds to the product of the weights of its nodes. The decomposition (red, dashed lines) has a weight sum of 38. In fact, it corresponds to the parenthesization ( ( ( A1 ) ( A2 ) ) ( A3 ) ) ( A4 ).

The optimal decomposition can be found in O(n log n) time. (See Hu and Shing 1981 for details.)

Page 40: Lec6

0/1 Knapsack problem

Page 41: Lec6

A Knapsack instance

Given a set of items, each with a weight and a value, select a subset of items such that the total weight is less than a given limit and the total value is as large as possible.

What's the best solution?

Page 42: Lec6

0/1 Knapsack problem

Formalized Definition:

INPUT: A set of items. Item i has weight wi and value vi; a total weight limit W;
OUTPUT: A sub-set of items maximizing the total value with a total weight below W.

Note:
1. Here, "0/1" means that we either select an item (1) or abandon it (0); we cannot select part of an item.
2. In contrast, the Fractional Knapsack problem allows one to select a fraction, say 0.5, of an item.

Page 43: Lec6

A Knapsack instance

Intuitive method: selecting "expensive" items first.

NP-Completeness: SubsetSum ≤P Knapsack. (Hint: simply set vi = wi.)

Page 44: Lec6

Key Observation

It is not easy to solve the problem with n items. Let's see whether it is possible to reduce it into smaller sub-problems.

Solution: a subset of items. Imagine the solving process as a process of multiple-stage decisions. At the i-th decision stage, we decide whether item i should be selected.

Suppose we have already worked out the optimal solution.

Consider the first decision, i.e. whether the optimal solution contains item n or not. The decision has two options:

1. Select: then it suffices to select items as "expensive" as possible from {1, 2, ..., n − 1} with weight limit W − wn.
2. Abandon: otherwise, we should select items as "expensive" as possible from {1, 2, ..., n − 1} with weight limit W.

Page 45: Lec6

Key Observation cont'd

Summarizing these two cases, the general form of sub-problems is: to select items as "expensive" as possible from {1, 2, ..., i} with weight limit w. Denote the optimal solution value as OPT(i, w).

Optimal sub-structure property:

OPT(n, W) = max{ OPT(n − 1, W), OPT(n − 1, W − wn) + vn }

(Enumerating the two possible decisions for item n.)

Page 46: Lec6

Algorithm

Knapsack(n, W)
1: for w = 0 to W do
2:   OPT[0, w] = 0;
3: end for
4: for i = 1 to n do
5:   for w = 0 to W do
6:     if wi > w then
7:       OPT[i, w] = OPT[i − 1, w];
8:     else
9:       OPT[i, w] = max{ OPT[i − 1, w], vi + OPT[i − 1, w − wi] };
10:    end if
11:  end for
12: end for
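The table filling (and the backtracking of the selected items) can be sketched in Python as follows; the item data below is an arbitrary illustrative instance, not the (missing) example from the following slides:

```python
def knapsack(weights, values, W):
    # OPT[i][w]: max total value using items 1..i with weight limit w.
    n = len(weights)
    OPT = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wi, vi = weights[i - 1], values[i - 1]
        for w in range(W + 1):
            if wi > w:
                OPT[i][w] = OPT[i - 1][w]                 # item i cannot fit
            else:
                OPT[i][w] = max(OPT[i - 1][w],            # abandon item i
                                vi + OPT[i - 1][w - wi])  # select item i
    # Backtrack: item i was selected iff it changed the optimal value.
    chosen, w = [], W
    for i in range(n, 0, -1):
        if OPT[i][w] != OPT[i - 1][w]:
            chosen.append(i)
            w -= weights[i - 1]
    return OPT[n][W], sorted(chosen)

print(knapsack([1, 2, 5, 6, 7], [1, 6, 18, 22, 28], 11))  # (40, [3, 4])
```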

Page 47: Lec6

Examples I–IV and Backtracking: Steps 1–2

[Worked example of filling the Knapsack table and backtracking the selected items (slides 47–52); figures lost in transcription.]

Page 53: Lec6

Time complexity analysis

Time complexity: O(nW). (Hint: for each entry in the matrix, only a comparison is needed; we have O(nW) entries in the matrix.)

Notes:
1. This algorithm is inefficient when W is large, say W = 1M.
2. Remember that a polynomial-time algorithm costs time polynomial in the input length. However, this algorithm costs time nW = n · 2^{log W}, which is exponential in the input length (W is encoded in log W bits). Exponential!
3. Pseudo-polynomial time algorithm: polynomial in the value of W rather than in the length of W (log W).
4. We will revisit this algorithm in approximation algorithm design.

Page 54: Lec6

Note: the general form of subproblems

Here the items were considered in an order. Why?

Let's consider two general forms of sub-problems:
1. An arbitrary subset S ⊆ {1, 2, ..., n}: the number of sub-problems is exponential.
2. The first i items {1, 2, ..., i}: this is much simpler, since it is easy to describe the sub-problems.

Page 55: Lec6

Extension: The first public-key encryption system

Cryptosystems based on the knapsack problem were among the first public key systems to be invented, and for a while were considered to be among the most promising. However, essentially all of the knapsack cryptosystems that have been proposed so far have been broken. These notes outline the basic constructions of these cryptosystems and attacks that have been developed on them.

See The Rise and Fall of Knapsack Cryptosystems for details.

Page 56: Lec6

RNA Secondary Structure Prediction

Page 57: Lec6

RNA Secondary Structure

RNA is a sequence of nucleotides. It automatically forms structures in water through the formation of the bonds A−U and C−G.

The native structure is the conformation with the lowest energy. Here, we simply use the number of base pairs as the energy function.

Page 58: Lec6

Formulation

INPUT: A sequence in alphabet Σ = {A, U, C, G};
OUTPUT: A pairing scheme with the maximum number of base pairs.

Requirements of base pairs:
1. Watson-Crick pairs: A pairs with U, and C pairs with G;
2. No base occurs in more than one base pair;
3. No cross-over (nesting): there is no crossover, under the pseudo-knot-free assumption;
4. Two bases i, j with |i − j| ≤ 4 cannot form a base pair.

Page 59: Lec6

An example

[Figure lost in transcription.]

Page 60: Lec6

Nesting and Pseudo-knot

Left: nesting of base pairs (no cross-over); Right: pseudo-knots (cross-over).

Page 61: Lec6

Feynman diagram

Feynman diagram: an intuitive representation of RNA secondary structure, i.e. two bases are connected by an edge if they form a Watson-Crick pair.

Page 62: Lec6

Key observation I

Solution: a set of nested base pairs. Imagine the solving process as a process of multiple-stage decisions. At the i-th decision stage, we determine whether base i forms a pair or not.

Suppose we have already worked out the optimal solution.

Consider the first decision, made for base n. There are two options:

1. Base n pairs with a base t: we should calculate optimal pairs for the regions t + 1...n − 1 and 1...t − 1. Note: these two sub-problems are independent due to the "nested" property.
2. Base n doesn't form a pair: we should calculate optimal pairs for the region 1...n − 1.

Thus we can design the general form of sub-problems as: to calculate the optimal pairs for a region i...j. (Denote the optimal solution value as OPT(i, j).)

Page 63: Lec6

Key observation II

Optimal substructure property:

OPT(i, j) = max{ OPT(i, j − 1), max_t { 1 + OPT(i, t − 1) + OPT(t + 1, j − 1) } },

where the second max is taken over all possible t such that t and j form a base pair. (Enumerating all possible options for base j.)

Page 64: Lec6

Algorithm

RNA2D(n)
1: Initialize all OPT[i, j] with 0;
2: for l = 5 to n − 1 do
3:   for i = 1 to n − l do
4:     j = i + l;
5:     OPT[i, j] = max{ OPT[i, j − 1], max_t { 1 + OPT[i, t − 1] + OPT[t + 1, j − 1] } };
6:     /* t ranges over positions such that t and j can form a Watson-Crick base pair. */
7:   end for
8: end for

Note: the intervals are processed in order of increasing length l = j − i, so the shorter sub-intervals on the right-hand side are already available.
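The interval DP above can be sketched in Python as follows (a Nussinov-style base-pair maximization; an illustration, not the slides' code):

```python
def rna_secondary_structure(s):
    # Max number of nested A-U / C-G pairs in s, with no pair (i, j)
    # allowed when |i - j| <= 4.
    n = len(s)
    pairs = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    OPT = [[0] * n for _ in range(n)]   # OPT[i][j] for s[i..j], 0-based
    for l in range(5, n):               # interval length j - i, increasing
        for i in range(n - l):
            j = i + l
            best = OPT[i][j - 1]        # option 1: base j stays unpaired
            for t in range(i, j - 4):   # option 2: base j pairs with base t
                if (s[t], s[j]) in pairs:
                    left = OPT[i][t - 1] if t > i else 0
                    best = max(best, left + 1 + OPT[t + 1][j - 1])
            OPT[i][j] = best
    return OPT[0][n - 1] if n else 0

print(rna_secondary_structure("AAAAUUUU"))  # 2
```

Note how the "no sharp turn" constraint |i − j| ≤ 4 limits both the interval lengths and the candidate partners t.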

Page 65: Lec6

[Worked example of the RNA table filling (slides 65–67); figures lost in transcription.]

Page 68: Lec6

Time complexity: O(n^3).

Page 69: Lec6

Extension: RNA is a good example of SCFG (stochastic context-free grammar).

(See extra slides.)

Page 70: Lec6

Sequence Alignment problem

Page 71: Lec6

Practical problem: genome similarity

To identify homologous genes of two species, say Human and Mouse. E.g., Human and Mouse NHPPEs (in KRAS genes) show a high sequence homology (Ref: Cogoi, S., et al. NAR, 2006).

GGGCGGTGTGGGAA-GAGGGAAG-AGGGGGAG
||| || | ||||| |||||| | |||| |
GGGAGG-GAGGGAAGGAGGGAGGGAGGGAG--

Having calculated the similarity of the genomes of various species, a reasonable phylogeny tree can be estimated (see https://www.llnl.gov/str/June05/Ovcharenko.html).

Page 72: Lec6

Practical problem: spell tool to correct typos

When you type in ''OCURRANCE'', spell tools might guess what you really want to type through the following alignment, i.e. ''OCURRANCE'' is very similar to ''OCCURRENCE'' except for INS/DEL/MUTATION operations.

O-CURRANCE
OCCURRENCE

But the following instance is a bit more difficult:

abbbaa-bbbbaab
ababaaabbbba-b

Page 73: Lec6

Sequence Alignment: formulation

INPUT: Two sequences S and T, |S| = m and |T| = n;
OUTPUT: An alignment of S and T that maximizes a scoring function.

Note: for the sake of description, the following indexing scheme is used: S = S1S2...Sm.

Page 74: Lec6

What is an alignment?

An example of an alignment:

O-CURRANCE
| |||| |||
OCCURRENCE

Basic idea:
1. An alignment is usually used to describe the generating process of an erroneous word from the correct word.
2. Make the two sequences the same length by adding spaces '-', i.e. changing S to S′ through adding spaces at some positions, and changing T to T′ through adding spaces at some positions, too. The only requirement is |S′| = |T′|. There are three cases:
   1. T′[i] = '-': S′[i] is simply an INSERTION.
   2. S′[i] = '-': S′[i] is simply a DELETION of T′[i].
   3. Otherwise, S′[i] is a copy of T′[i] (with a possible MUTATION).
3. Thus, an alignment clearly illustrates how to change T into S with a series of INS/DEL/MUTATION operations.

Page 75: Lec6

How to measure an alignment in the sense of sequence similarity?

The similarity is defined as the sum of the scores of the aligned letter pairs, i.e.

d(S, T) = Σ_{i=1}^{|S′|} δ(S′[i], T′[i])

The simplest δ(a, b) is: 4
1. Match: +1, e.g. δ('C', 'C') = 1.
2. Mismatch: −1, e.g. δ('E', 'A') = −1.
3. Ins/Del: −3, e.g. δ('C', '−') = −3.

4 Ideally, the score function is designed such that d(S, T) is proportional to log Pr[S is generated from T]. See extra slides for the statistical model for sequence alignment, and better similarity definitions, say the BLOSUM62 and PAM250 substitution matrices, etc.
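This scoring scheme is small enough to write out directly; the sketch below (in Python, not from the slides) evaluates d on the example alignment of "OCURRANCE" against "OCCURRENCE":

```python
def delta(a, b):
    # Score of one aligned letter pair under the simplest scheme above.
    if a == "-" or b == "-":
        return -3                 # insertion / deletion
    return 1 if a == b else -1    # match / mismatch

def alignment_score(s_prime, t_prime):
    # d(S, T) = sum of delta over the columns of the alignment (S', T').
    assert len(s_prime) == len(t_prime)
    return sum(delta(a, b) for a, b in zip(s_prime, t_prime))

print(alignment_score("O-CURRANCE", "OCCURRENCE"))  # 4
```

The value 4 matches the score computed on the next slides for this alignment.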

Page 76: Lec6

Alignment is useful

Observation 1: Using alignments, we can determine the most likely source of "OCURRANCE".

1. T = "OCCUPATION":
   d(S′, T′) = 1 + 1 − 3 + 1 − 3 − 3 − 1 + 1 − 3 − 3 − 3 + 1 − 3 − 3 = −28.
2. T = "OCCURRENCE":
   d(S′, T′) = 1 − 3 + 1 + 1 + 1 + 1 − 1 + 1 + 1 + 1 = 4.

Conjecture: it is more likely that "ocurrance" comes from "occurrence" than from "occupation".

Page 77: Lec6

Alignment is useful cont'd

Observation 2: In addition, we can also determine the most likely operations changing "occurrence" into "ocurrance".

1. Alignment 1:
   d(S′, T′) = 1 − 3 + 1 + 1 + 1 + 1 − 1 + 1 + 1 + 1 = 4.
2. Alignment 2:
   d(S′, T′) = 1 − 3 + 1 + 1 + 1 − 3 − 3 + 1 + 1 + 1 = −1.

Conjecture: the first alignment probably describes the real generating process of "ocurrance" from "occurrence".

Page 78: Lec6

Key observation I

It is not easy to consider long sequences directly. Let's consider whether it is possible to reduce into smaller subproblems.

Solution: an alignment. Imagine the solving process as a process of multiple-stage decisions. At each decision stage, we decide how to generate S[i] from T[j].

[Figure: the three ways to align the tails of S = OCURRANCE and T = OCCURRENCE: Pair (S[m] aligned with T[n]), Insertion (S[m] aligned with '-'), and Deletion (T[n] aligned with '-').]

Page 79: Lec6

Key observation II

Suppose we have already worked out the optimal solution. Consider the first decision, made for S[m]. There are three cases:

1. S[m] pairs with T[n], i.e. S[m] comes from T[n]. Then it suffices to align S[1..m − 1] and T[1..n − 1];
2. S[m] pairs with a space '-', i.e. S[m] is an INSERTION. Then we need to align S[1..m − 1] and T[1..n];
3. T[n] pairs with a space '-', i.e. S misses a letter of T, a DELETION. Then we need to align S[1..m] and T[1..n − 1].

[Figure: the three cases illustrated on S = OCURRANCE and T = OCCURRENCE.]

Page 80: Lec6

Key observation III

Thus, we can design the general form of sub-problems as: aligning a prefix of S (denoted as S[1..i]) and a prefix of T (denoted as T[1..j]). Denote the optimal solution value as OPT(i, j).

Optimal substructure property:

OPT(i, j) = max { δ(Si, Tj) + OPT(i − 1, j − 1),
                  δ('−', Tj) + OPT(i, j − 1),
                  δ(Si, '−') + OPT(i − 1, j) }

Page 81: Lec6

Needleman-Wunsch algorithm 1970

NEEDLEMAN WUNSCH(S, T)
1: for i = 0 to m do
2:   OPT[i, 0] = −3 × i;
3: end for
4: for j = 0 to n do
5:   OPT[0, j] = −3 × j;
6: end for
7: for i = 1 to m do
8:   for j = 1 to n do
9:     OPT[i, j] = max{ OPT[i − 1, j − 1] + δ(Si, Tj), OPT[i − 1, j] − 3, OPT[i, j − 1] − 3 };
10:  end for
11: end for
12: return OPT[m, n];

Note: the first row is introduced to describe the alignment of a prefix of T with the empty sequence ε, and similarly for the first column.
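The algorithm can be sketched in Python with the slides' scores (match +1, mismatch −1, gap −3); this is an illustration, not the slides' code:

```python
def needleman_wunsch(S, T):
    m, n = len(S), len(T)
    OPT = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        OPT[i][0] = -3 * i        # align S[1..i] with the empty sequence
    for j in range(n + 1):
        OPT[0][j] = -3 * j        # align the empty sequence with T[1..j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = 1 if S[i - 1] == T[j - 1] else -1
            OPT[i][j] = max(OPT[i - 1][j - 1] + match,  # pair S[i] with T[j]
                            OPT[i - 1][j] - 3,          # S[i] vs gap
                            OPT[i][j - 1] - 3)          # gap vs T[j]
    return OPT[m][n]

print(needleman_wunsch("OCURRANCE", "OCCURRENCE"))  # 4
```

The returned value 4 is the bottom-right entry of the score matrix on the following slides.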

Page 82: Lec6

The first row/column of the alignment score matrix

S:       ''   O   C   U   R   R   A   N   C   E
T: ''     0  -3  -6  -9 -12 -15 -18 -21 -24 -27
   O     -3
   C     -6
   C     -9
   U    -12
   R    -15
   R    -18
   E    -21
   N    -24
   C    -27
   E    -30

Score: d("OCU", "") = -9       Score: d("", "OC") = -6
Alignment: S' = OCU            Alignment: S' = --
           T' = ---                       T' = OC

Page 83: Lec6

Why should we introduce the first row/column?

S:       ''   O   C   U   R   R   A   N   C   E
T: ''     0  -3  -6  -9 -12 -15 -18 -21 -24 -27
   O     -3   1  -2  -5  -8 -11 -14 -17 -20 -23
   C     -6  -2   2  -1  -4  -7 -10 -13 -16 -19
   C     -9  -5  -1   1  -2  -5  -8 -11 -12 -15
   U    -12  -8  -4   0   0  -3  -6  -9 -12 -13
   R    -15 -11  -7  -3   1   1  -2  -5  -8 -11
   R    -18 -14 -10  -6  -2   2   0  -3  -6  -9
   E    -21 -17 -13  -9  -5  -1   1  -1  -4  -5
   N    -24 -20 -16 -12  -8  -4  -2   2  -1  -4
   C    -27 -23 -19 -15 -11  -7  -5  -1   3   0
   E    -30 -26 -22 -18 -14 -10  -8  -4   0   4

Score: d("OC", "O") = max { d("OC", "") − 3 (= −9),
                            d("O", "") − 1 (= −4),
                            d("O", "O") − 3 (= −2) }

Alignment: S' = OC
           T' = O-

Page 84: Lec6

. . . . . .

.. General cases

[Score matrix as above, now highlighting the entry for d("OCUR", "OC").]

Score: d("OCUR", "OC") = max{ d("OCUR", "O") − 3 (= −11), d("OCU", "O") − 1 (= −6), d("OCU", "OC") − 3 (= −4) } = −4

Alignment: S’= OCUR
           T’= OC--
84 / 145

Page 85: Lec6

. . . . . .

.. The final entry

[Score matrix as above, now highlighting the final entry d("OCURRANCE", "OCCURRENCE") = 4.]

Score: d("OCURRANCE", "OCCURRENCE") = max{ d("OCURRANCE", "OCCURRENC") − 3 (= −3), d("OCURRANC", "OCCURRENC") + 1 (= 4), d("OCURRANC", "OCCURRENCE") − 3 (= −3) } = 4

Alignment: S’= O-CURRANCE
           T’= OCCURRENCE
85 / 145

Page 86: Lec6

. . . . . .

.

Question: how to find the alignment with the highest score?

86 / 145

Page 87: Lec6

. . . . . .

.. Find the optimal alignment via backtracking

[Score matrix as above, with the backtracking path from the final entry back to (0, 0) highlighted.]

Optimal Alignment: S’= O-CURRANCE

T’= OCCURRENCE
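A sketch of the backtracking step: starting from entry (m, n), repeatedly move to whichever of the three predecessor entries achieves the maximum. The helper `fill` below simply repeats the table-filling recurrence so the example is self-contained; the scoring constants match the slides.

```python
def fill(S, T, gap=-3, match=1, mismatch=-1):
    """Optimal prefix-alignment scores, as in the basic DP."""
    m, n = len(S), len(T)
    OPT = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        OPT[i][0] = gap * i
    for j in range(n + 1):
        OPT[0][j] = gap * j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = match if S[i - 1] == T[j - 1] else mismatch
            OPT[i][j] = max(OPT[i - 1][j - 1] + d,
                            OPT[i - 1][j] + gap, OPT[i][j - 1] + gap)
    return OPT

def traceback(S, T, OPT, gap=-3, match=1, mismatch=-1):
    """Recover one optimal alignment by retracing which option won each max."""
    i, j, A, B = len(S), len(T), [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
           OPT[i][j] == OPT[i-1][j-1] + (match if S[i-1] == T[j-1] else mismatch):
            A.append(S[i-1]); B.append(T[j-1]); i -= 1; j -= 1   # diagonal move
        elif i > 0 and OPT[i][j] == OPT[i-1][j] + gap:
            A.append(S[i-1]); B.append('-'); i -= 1              # S_i vs gap
        else:
            A.append('-'); B.append(T[j-1]); j -= 1              # T_j vs gap
    return ''.join(reversed(A)), ''.join(reversed(B))

S, T = "OCURRANCE", "OCCURRENCE"
a, b = traceback(S, T, fill(S, T))
print(a)
print(b)
```

With ties broken diagonally first, this recovers one optimal alignment; its score always equals the final table entry.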

87 / 145

Page 88: Lec6

. . . . . .

.. Optimal alignment versus Sub-optimal alignments

It should be noted that, in practice, sub-optimal alignments (as an ensemble) are more robust than the single optimal alignment, due to inaccuracy in the scoring model.

Please refer to Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids for details.

88 / 145

Page 89: Lec6

. . . . . .

.

......

Space-efficient algorithm: reducing the space requirement from O(mn) to O(m + n) (D. S. Hirschberg, 1975)

89 / 145

Page 90: Lec6

. . . . . .

.. Technique 1: two arrays are enough if only score is needed

Key observation 1: it is easy to calculate only the final score OPT(S, T), i.e., without recording the alignment information.

[Score matrix for S = OCURRANCE against T = OCCURRENCE, as above.]

90 / 145

Page 91: Lec6

. . . . . .

.. Technique 1: two arrays are enough if only score is needed

Why? Only column j − 1 is needed to calculate column j. Thus, we use two arrays score[0..m] and newscore[0..m] instead of the matrix OPT[0..m, 0..n].

[Score matrix as above, with the two currently live columns highlighted.]

91 / 145


Page 94: Lec6

. . . . . .

.. Algorithm

Prefix Space Efficient Alignment( S, T, score )

1: for i = 0 to m do
2:   score[i] = −3 ∗ i;
3: end for
4: for j = 1 to n do
5:   newscore[0] = −3 ∗ j;
6:   for i = 1 to m do
7:     newscore[i] = max{score[i − 1] + δ(Si, Tj), score[i] − 3, newscore[i − 1] − 3};
8:   end for
9:   for i = 0 to m do
10:    score[i] = newscore[i];
11:  end for
12: end for
13: return score[m];
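The idea can be sketched in Python as follows (a hypothetical rendering, not the slides' exact pseudocode): `score` holds the previous column of the matrix and `newscore` the column being filled, so only O(m) space is ever live.

```python
def space_efficient_score(S, T, gap=-3, match=1, mismatch=-1):
    """Return the last column of the DP matrix: entry i is OPT(S[1..i], T)."""
    m = len(S)
    score = [gap * i for i in range(m + 1)]       # column j = 0
    for j in range(1, len(T) + 1):
        newscore = [gap * j] + [0] * m            # row i = 0 of column j
        for i in range(1, m + 1):
            d = match if S[i - 1] == T[j - 1] else mismatch
            newscore[i] = max(score[i - 1] + d,        # diagonal
                              score[i] + gap,          # T_j against a gap
                              newscore[i - 1] + gap)   # S_i against a gap
        score = newscore                          # slide the window one column
    return score

print(space_efficient_score("OCURRANCE", "OCCURRENCE")[-1])   # → 4
```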

94 / 145

Page 95: Lec6

. . . . . .

.. Technique 2: aligning suffixes instead of prefixes

Key observation: similarly, we can align suffixes of S and T instead of prefixes and obtain the same score and alignment.

[Suffix score matrix for S = OCURRANCE against T = OCCURRENCE: entry (i, j) holds the optimal score of aligning suffix S[i+1..m] with suffix T[j+1..n]; its top-left entry equals the overall optimal score 4.]

95 / 145

Page 96: Lec6

. . . . . .

.. Final difficulty: identify optimal alignment besides score

...1 However, if only the most recent two columns of the matrix are kept, the optimal alignment cannot be restored via backtracking.

...2 A clever idea: suppose we have already obtained the optimal alignment, and let q denote the position of S such that T[1..n/2] is aligned with S[1..q] (and T[n/2+1..n] with S[q+1..m]). We have

OPT(S, T) = OPT(S[1..q], T[1..n/2]) + OPT(S[q+1..m], T[n/2+1..n])

...3 Notes: things will be easy as soon as q is determined. The equality holds due to the definition of d(S, T). n/2 is chosen for the sake of the time-complexity analysis.

[Figure: S = OCURRANCE aligned against T = OCCURRENCE, with T cut at n/2; the corresponding cut of S may fall after any position q, 0 ≤ q ≤ m.]
96 / 145

Page 97: Lec6

. . . . . .

.. Hirschberg’s algorithm for alignment

Linear Space Alignment( S, T )

1: Allocate two arrays f and b; each array has a size of m + 1;
2: Prefix Space Efficient Alignment(S, T[1..n/2], f);
3: Suffix Space Efficient Alignment(S, T[n/2 + 1..n], b);
4: Let q∗ = argmax_q (f[q] + b[q]);
5: Free arrays f and b;
6: Record <q∗, n/2> in array A;
7: Linear Space Alignment(S[1..q∗], T[1..n/2]);
8: Linear Space Alignment(S[q∗ + 1..m], T[n/2 + 1..n]);
9: return A;

Key observation: at each iteration step, only 2m space is needed.

How to determine q∗? Identify the largest entry of f[q] + b[q].
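A compact sketch of the whole divide-and-conquer scheme (the function names and the single-character base cases are my own, and the base cases are tailored to the slides' scoring constants). `nw_score` plays the role of Prefix Space Efficient Alignment; suffix scores are obtained by running it on the reversed strings.

```python
def nw_score(S, T, gap=-3, match=1, mismatch=-1):
    """prev[i] = optimal score of aligning S[1..i] with T, in O(|S|) space."""
    prev = [gap * i for i in range(len(S) + 1)]
    for j in range(1, len(T) + 1):
        cur = [gap * j] + [0] * len(S)
        for i in range(1, len(S) + 1):
            d = match if S[i - 1] == T[j - 1] else mismatch
            cur[i] = max(prev[i - 1] + d, prev[i] + gap, cur[i - 1] + gap)
        prev = cur
    return prev

def hirschberg(S, T):
    """Optimal alignment in linear space: split T at n/2, find the best
    matching split point q of S, then recurse on the two halves."""
    if len(S) == 0:
        return '-' * len(T), T
    if len(T) == 0:
        return S, '-' * len(S)
    if len(S) == 1:                      # base: place the single symbol where
        j = max(T.find(S), 0)            # it matches, or anywhere otherwise
        return '-' * j + S + '-' * (len(T) - 1 - j), T
    if len(T) == 1:
        i = max(S.find(T), 0)
        return S, '-' * i + T + '-' * (len(S) - 1 - i)
    mid = len(T) // 2
    f = nw_score(S, T[:mid])                      # prefix scores f[q]
    b = nw_score(S[::-1], T[mid:][::-1])[::-1]    # suffix scores b[q], by reversal
    q = max(range(len(S) + 1), key=lambda i: f[i] + b[i])
    a1, b1 = hirschberg(S[:q], T[:mid])
    a2, b2 = hirschberg(S[q:], T[mid:])
    return a1 + a2, b1 + b2

print(hirschberg("OCURRANCE", "OCCURRENCE"))
```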

97 / 145

Page 98: Lec6

. . . . . .

.. Step 1: Determine the optimal aligned position of T[n/2]

The value of the largest entry: recall that 4 is actually the optimal score of S and T.
98 / 145

Page 99: Lec6

. . . . . .

.. Step 2: Recursively solve sub-problems

The position of the largest entry generates two independent sub-problems.
99 / 145

Page 100: Lec6

. . . . . .

.. Space complexity analysis

Space Efficient Alignment(S, T[1..n/2], f) needs only O(m) space;

Line 6 (Record <q∗, n/2> in array A) needs only O(n) space in total;

Thus, the total space requirement is O(m + n).

100 / 145

Page 101: Lec6

. . . . . .

.. Time complexity analysis

.Theorem: Algorithm Linear Space Alignment( S, T ) still takes O(mn) time.

.Proof:

The algorithm implies the following recursion: T(m, n) = cmn + T(q, n/2) + T(m − q, n/2);

Difficulty: we have no idea of q before the algorithm ends; thus, the master theorem cannot be applied directly. Guess and substitution!

Guess: T(m′, n′) ≤ km′n′ holds for any m′ < m and n′ < n.

Substitution:

T(m, n) = cmn + T(q, n/2) + T(m − q, n/2)      (7)
        ≤ cmn + kq(n/2) + k(m − q)(n/2)        (8)
        = cmn + kq(n/2) + km(n/2) − kq(n/2)    (9)
        = (c + k/2)mn                          (10)
        = kmn   (setting k = 2c)               (11)
101 / 145

Page 102: Lec6

. . . . . .

.

Extended Reading 1: From global alignment to local alignment

102 / 145

Page 103: Lec6

. . . . . .

..

From global alignment to local alignment: the Smith-Waterman algorithm

Global alignment: to identify similarity between two whole sequences;

Local alignment: it is often the case that we wish to find similar SEGMENTS (sub-sequences).

103 / 145

Page 104: Lec6

. . . . . .

.. Smith-Waterman algorithm [1981]

The Needleman-Wunsch global alignment algorithm was developed by biologists in the 1970s, about twenty years after the Bellman-Ford algorithm was developed. The Smith-Waterman local alignment algorithm was then proposed.

Please refer to Smith and Waterman (1981) for details.
104 / 145
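Using the same scoring constants as earlier, the local-alignment recurrence differs from the global one only in an extra 0 option (an alignment may start anywhere) and in taking the best entry of the whole table (it may end anywhere). A minimal sketch:

```python
def smith_waterman(S, T, gap=-3, match=1, mismatch=-1):
    """Best local alignment score between any segment of S and any of T."""
    m, n = len(S), len(T)
    H = [[0] * (n + 1) for _ in range(m + 1)]    # first row/column stay 0
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = match if S[i - 1] == T[j - 1] else mismatch
            H[i][j] = max(0,                     # restart: empty local alignment
                          H[i - 1][j - 1] + d,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

# Locally, CURRANCE vs. CURRENCE scores 7 matches - 1 mismatch = 6,
# better than the best global score of 4:
print(smith_waterman("OCURRANCE", "OCCURRENCE"))   # → 6
```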

Page 105: Lec6

. . . . . .

.

Extended Reading 2: How to derive a reasonable scoring schema?

105 / 145

Page 106: Lec6

. . . . . .

..

PAM250: one of the most popular substitution matrices in bioinformatics

Please refer to “PAM matrix for Blast algorithm” (C. Alexander, 2002) for the details of calculating the PAM matrix.

106 / 145

Page 107: Lec6

. . . . . .

.

......

Extended Reading 3: How to measure the significance of an alignment?

107 / 145

Page 108: Lec6

. . . . . .

.. Measure the significance of a segment pair

When two random sequences of length m and n are compared, the probability of finding a pair of segments with a score greater than or equal to S is 1 − e^{−y}, where y = Kmn·e^{−λS}.

Please refer to Altschul1990 for details.

108 / 145

Page 109: Lec6

. . . . . .

.

......

Extended Reading 4: An FPGA implementation of the Smith-Waterman algorithm

109 / 145

Page 110: Lec6

. . . . . .

.. The potential parallelism of the Smith-Waterman algorithm

For example, in the first cycle, only one element, marked as (1), can be calculated. In the second cycle, two elements marked as (2) can be calculated. In the third cycle, three elements marked as (3) can be calculated, etc. This feature implies that the algorithm has very good potential parallelism.
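The simultaneously computable cells form anti-diagonals of the score matrix: cell (i, j) depends only on (i−1, j−1), (i−1, j), and (i, j−1), all of which lie on earlier anti-diagonals. A small sketch of that wavefront order (this is the independence the PE array exploits):

```python
def antidiagonal_waves(m, n):
    """Group the cells (i, j), 1 <= i <= m, 1 <= j <= n, by i + j.
    All cells inside one wave are mutually independent."""
    for k in range(2, m + n + 1):
        yield [(i, k - i) for i in range(max(1, k - n), min(m, k - 1) + 1)]

# For a 3 x 3 table, the waves grow as 1, 2, 3 cells, then shrink again:
for wave in antidiagonal_waves(3, 3):
    print(wave)
```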

110 / 145

Page 111: Lec6

. . . . . .

.. Mapping the Smith-Waterman algorithm onto PEs

See Implementation of the Smith-Waterman Algorithm on a Reconfigurable Supercomputing Platform for details.

111 / 145

Page 112: Lec6

. . . . . .

.. PE design of a card for Dawning 4000L

112 / 145

Page 113: Lec6

. . . . . .

.. Smith-Waterman card for Dawning 4000L

113 / 145

Page 114: Lec6

. . . . . .

.. Performance of Dawning 4000L

5 Some pictures were excerpted from Introduction to algorithms.
114 / 145

Page 115: Lec6

. . . . . .

.

SingleSourceShortestPath problem

115 / 145

Page 116: Lec6

. . . . . .

.. SingleSourceShortestPath problem

.

......

INPUT: A graph G = <V, E>. Each edge e = <i, j> has a weight or distance d(i, j). Two special nodes: source s and destination t;
OUTPUT: A shortest path from s to t; that is, a path whose sum of edge weights is minimized.

[Figure: an example graph with nodes s, u, v, w, t and edge weights 5, 5, 6, 4, 3, 6, 2, 1, 2, 7.]

116 / 145

Page 117: Lec6

. . . . . .

.. ShortestPath problem: cycles

Here d(i, j) might be negative; however, there should be no negative cycle, i.e., the sum of the edge weights in any cycle should be greater than 0.

[Figure: a graph with source s, nodes a–f, and destination t; some edge weights are negative (−4, −3, −6), and nodes e and f lie on a negative-weight cycle. Vertices are labeled with shortest-path weights 0, 3, 5, −∞, −∞, −∞, −1, 11.]

Shown above each vertex is its shortest-path weight from source s. Since e and f form a negative-weight cycle reachable from s, they have shortest-path weight of −∞.

117 / 145

Page 118: Lec6

. . . . . .

.

......

Trial 1: describing the sub-problem as finding the shortest path in a graph

118 / 145

Page 119: Lec6

. . . . . .

.. Trial 1: a failure start

Solution: a path. Imagine the solving process as a series of decisions: at each decision stage, we need to determine an edge to the subsequent node.

Suppose we have already worked out the optimal solution O.

Consider the first decision in O. The options are:

All edges starting from s: suppose we use an edge <s, v>. Then it suffices to calculate the shortest path in the graph G′ = <V′, E′>, where node s and its incident edges are removed.

General form of sub-problem: to find the shortest path from node v to t in graph G.

119 / 145

Page 120: Lec6

. . . . . .

.. Trial 1: a failure start cont’d

General form of sub-problem: to find the shortest path from node v to t in graph G. Denote the optimal solution value as OPT(G, v).

Optimal substructure: OPT(G, s) = min_{<s,v>∈E} { OPT(G′, v) + d(s, v) }

[Figure: the example graph, and the sub-graph G′ obtained by removing s and its incident edges.]

Infeasible! The number of sub-problems is exponential.

120 / 145

Page 121: Lec6

. . . . . .

.

......

Trial 2: simplifying the sub-problem form by introducing a new variable

121 / 145

Page 122: Lec6

. . . . . .

..

Trial 2: simplifying the sub-problem form via introducing a new parameter

Solution: the shortest path from node s to t is a path with at most n nodes (why? with no negative cycle, removing cycles from a path can only shorten it). Imagine the solving process as a multiple-stage decision process; at each decision stage, we decide the subsequent node from the current node.

Suppose we have already worked out the optimal solution O.

Consider the first decision in O. The feasible options are:

All adjacent nodes of s: suppose we choose an edge <s, v> to node v. Then what is left is to find the shortest path from v to t via at most n − 2 edges.

Thus the general form of the sub-problem can be designed as: to find the shortest path from node v to t with at most k edges (k ≤ n − 1). Denote the optimal solution value as OPT(v, t, k).

Optimal substructure:

OPT[v, t, k] = min{ OPT[v, t, k − 1], min_{<v,w>∈E} { OPT[w, t, k − 1] + d(v, w) } }

Note: the first item OPT[v, t, k − 1] is introduced to express “at most”.

122 / 145

Page 123: Lec6

. . . . . .

.. Bellman-Ford algorithm [1956]

Bellman Ford(G, s, t)

1: for each node v ∈ V do
2:   OPT[v, t, 0] = ∞;
3: end for
4: for k = 0 to n − 1 do
5:   OPT[t, t, k] = 0;
6: end for
7: for k = 1 to n − 1 do
8:   for all nodes v (in an arbitrary order) do
9:     OPT[v, t, k] = min{ OPT[v, t, k − 1], min_{<v,w>∈E} { OPT[w, t, k − 1] + d(v, w) } };
10:  end for
11: end for
12: return OPT[s, t, n − 1];

Note that the algorithm actually finds the shortest paths from every possible source to t (or, symmetrically, from s to every possible destination) by constructing a shortest path tree.
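A direct transcription of the recurrence as a sketch (the tiny example graph below is hypothetical, not the one on the slides): `dist` plays the role of OPT[·, t, k], and each round increases k by one.

```python
def bellman_ford(edges, nodes, t):
    """After round k, dist[v] = weight of a shortest v -> t path with <= k edges."""
    INF = float('inf')
    dist = {v: INF for v in nodes}
    dist[t] = 0
    for _ in range(len(nodes) - 1):          # k = 1 .. n-1
        prev = dict(dist)                    # OPT[., t, k-1]
        for v, w, d in edges:                # edge <v, w> with weight d(v, w)
            if prev[w] + d < dist[v]:
                dist[v] = prev[w] + d
        # dist now holds OPT[., t, k]
    return dist

# A hypothetical example with a negative edge but no negative cycle:
nodes = ['s', 'u', 'v', 't']
edges = [('s', 'u', 1), ('u', 'v', -2), ('v', 't', 3), ('s', 't', 5), ('u', 't', 4)]
print(bellman_ford(edges, nodes, 't'))   # → {'s': 2, 'u': 1, 'v': 3, 't': 0}
```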

123 / 145

Page 124: Lec6

. . . . . .

.. Richard Bellman

See “Richard Bellman on the birth of dynamic programming” (S. Dreyfus, 2002) and “On the routing problem” (R. Bellman, 1958) for details.

124 / 145

Page 125: Lec6

. . . . . .

.. An example

[Figure: an example graph with nodes a, b, c, d, e and destination t; edge weights include −4, 8, −2, −1, 6, 4, 2, −3, −3, 3.]

Source node k = 0 k = 1 k = 2 k = 3 k = 4 k = 5

t 0 0 0 0 0 0

a - -3 -3 -4 -6 -6

b - - 0 -2 -2 -2

c - 3 3 3 3 3

d - 4 3 3 2 0

e - 2 0 0 0 0

125 / 145

Page 126: Lec6

. . . . . .

.. Shortest path tree

[The same example graph and table as on the previous slide, with the shortest path tree highlighted.]

Note: the shortest paths from all nodes to t form a shortest path tree.
126 / 145

Page 127: Lec6

. . . . . .

.. Time complexity

...1 Cursory analysis: O(n³). (There are n² sub-problems, and each sub-problem needs at most O(n) operations in line 9.)

...2 Better analysis: O(mn). (Efficient for sparse graphs, i.e., m ≪ n².)

For each node v, line 9 needs O(d_v) operations, where d_v denotes the degree of node v; thus the inner for loop (lines 8–10) needs ∑_v d_v = O(m) operations; thus the outer for loop (lines 7–11) needs O(nm) operations.

127 / 145

Page 128: Lec6

. . . . . .

.

Extension: detecting negative cycles

128 / 145

Page 129: Lec6

. . . . . .

.Theorem..

......

If t is reachable from node v, and v is contained in a negative cycle, then we have: lim_{k→∞} OPT(v, t, k) = −∞.

[Figure: a graph with a negative cycle b → e → c → b, where each node also has a 0-weight edge to t.]

Intuition: each additional traversal of the negative cycle leads to a shorter length. Say,
length(b → t) = 0
length(b → e → c → b → t) = −1
length(b → e → c → b → e → c → b → t) = −2
· · ·
129 / 145

Page 130: Lec6

. . . . . .

.Corollary..

......

If there is no negative cycle in G, then for all nodes v and all k ≥ n, OPT(v, t, k) = OPT(v, t, n).

Source node k=0 k=1 k=2 k=3 k=4 k=5 k=6 k=7 k=8 k=9 k=10 k=11

t 0 0 0 0 0 0 0 0 0 0 0 0

a - -3 -3 -4 -6 -6 -6 -6 -6 -6 -6 -6

b - - 0 -2 -2 -2 -2 -2 -2 -2 -2 -2

c - 3 3 3 3 3 3 3 3 3 3 3

d - 4 3 3 2 0 0 0 0 0 0 0

e - 2 0 0 0 0 0 0 0 0 0 0

[The same example graph as on slide 125.]

130 / 145

Page 131: Lec6

. . . . . .

.. Detecting negative cycles via adding edges and a node t

Expand G to G′ to guarantee that t is reachable from any negative cycle:

...1 Add a new node t;

...2 For each node v, add a new edge <v, t> with d(v, t) = 0.

Property: if G has a negative cycle C (say b → e → c → b), then t is reachable from a node in C. Thus, the above theorem applies.

[Figure: the graph G with negative cycle b → e → c → b, and the expanded graph G′ with 0-weight edges from every node to the new node t.]

131 / 145
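A sketch of the resulting test (a hypothetical helper using in-place relaxation, and a hypothetical cycle of total weight −1 in the spirit of the slides): append the 0-weight edges to a fresh node t, run n′ − 1 rounds of Bellman-Ford on the expanded graph, and report a negative cycle iff one more round would still improve some value — by the corollary, all values must otherwise have stabilized.

```python
def has_negative_cycle(edges, nodes):
    """edges: (v, w, d) triples for edge <v, w> with weight d(v, w)."""
    INF = float('inf')
    t = object()                                   # the newly added node t
    all_edges = list(edges) + [(v, t, 0) for v in nodes]
    dist = {v: INF for v in nodes}
    dist[t] = 0
    for _ in range(len(nodes)):                    # n' - 1 rounds, n' = n + 1 nodes
        for v, w, d in all_edges:                  # in-place relaxation suffices here
            if dist[w] + d < dist[v]:
                dist[v] = dist[w] + d
    # one more round: any further improvement certifies a negative cycle
    return any(dist[w] + d < dist[v] for v, w, d in all_edges)

cycle = [('b', 'e', -3), ('e', 'c', 4), ('c', 'b', -2)]    # total weight -1
print(has_negative_cycle(cycle, ['b', 'c', 'e']))          # → True
print(has_negative_cycle([('a', 'b', 1)], ['a', 'b']))     # → False
```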

Page 132: Lec6

. . . . . .

.. An example of negative cycle

[The expanded graph G′ from the previous slide.]

Source node k=0 k=1 k=2 k=3 k=4 k=5 k=6 k=7 k=8 · · ·
t 0 0 0 0 0 0 0 0 0 · · ·
a - 0 -4 -6 -9 -9 -11 -11 -12 · · ·
b - 0 -2 -5 -5 -7 -7 -8 -8 · · ·
c - 0 0 0 -2 -3 -3 -3 -4 · · ·
e - 0 -3 -3 -5 -5 -6 -6 -6 · · ·

132 / 145

Page 133: Lec6

. . . . . .

.

Application of the Bellman-Ford algorithm: Internet router protocol

133 / 145

Page 134: Lec6

. . . . . .

.. Internet router protocol

Problem statement:

Each node denotes a router, and the weight d(i, j) denotes the transmission delay of the link from router i to j.

The objective is to design a protocol by which router s determines the quickest route when it wants to send a package to t.

134 / 145

Page 135: Lec6

. . . . . .

..

Internet router protocol: Dijkstra’s algorithm vs. the Bellman-Ford algorithm

A natural choice: Dijkstra’s algorithm.

However, that algorithm needs global knowledge, i.e., knowledge of the whole graph, which is (almost) impossible to obtain.

In contrast, the Bellman-Ford algorithm needs only local information, i.e., the information of a router’s neighborhood rather than the whole network.

135 / 145

Page 136: Lec6

. . . . . .

.. Application: Internet router protocol

AsynchronousShortestPath(G, s, t)

1: Initially, set OPT[t, t] = 0 and OPT[v, t] = ∞ for all v ≠ t;
2: Label node t as “active”;
3: while there exists an active node do
4:   arbitrarily select an active node w;
5:   remove w’s active label;
6:   for all edges <v, w> (in an arbitrary order) do
7:     OPT[v, t] = min{ OPT[v, t], OPT[w, t] + d(v, w) };
8:     if OPT[v, t] was updated then
9:       label v as “active”;
10:    end if
11:  end for
12: end while
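The protocol can be sketched as a worklist algorithm (a hypothetical rendering; `in_edges[w]` lists the pairs (v, d(v, w)), so each step only consults a node's neighborhood). Termination assumes no negative cycle.

```python
from collections import deque

def asynchronous_shortest_path(in_edges, nodes, t):
    """Each improvement of OPT[w, t] re-activates the in-neighbours of w;
    active nodes may be processed in any order, as in the actual protocol."""
    INF = float('inf')
    OPT = {v: INF for v in nodes}
    OPT[t] = 0
    active = deque([t])
    while active:
        w = active.popleft()               # "arbitrarily select an active node"
        for v, d in in_edges.get(w, []):   # all edges <v, w>
            if OPT[w] + d < OPT[v]:
                OPT[v] = OPT[w] + d
                active.append(v)           # v becomes active again
    return OPT

# A small hypothetical network, keyed by the head of each edge:
in_edges = {'t': [('v', 3), ('s', 5), ('u', 4)], 'v': [('u', -2)], 'u': [('s', 1)]}
print(asynchronous_shortest_path(in_edges, ['s', 'u', 'v', 't'], 't'))
```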

136 / 145

Page 137: Lec6

. . . . . .

.

AllPairsShortestPath problem

137 / 145

Page 138: Lec6

. . . . . .

.. AllPairsShortestPath problem

.

......

INPUT: A graph G = <V, E>. Each edge e = <i, j> has a weight or distance d(i, j) (d(i, j) might be negative);
OUTPUT: Shortest paths between all pairs of nodes.

[Figure: a small example graph with nodes a, b, c, d and edge weights 4, 3, −1, −2, −2.]

A feasible solution is to run Bellman-Ford for all n possible destinations t, which takes O(mn²) time.
Question: is there a quicker way?

138 / 145

Page 139: Lec6

. . . . . .

..

Trial 1: run the Bellman-Ford algorithm for all node pairs simultaneously

Consider a pair of nodes i and j. We attempt to calculate the shortest path from i to j.

Recall that the general form of the sub-problem in the Bellman-Ford algorithm is: to find the shortest path from node v to j with at most k edges (k ≤ n − 1). Denote the optimal solution value as OPT(v, j, k).

Optimal substructure:

OPT[v, j, k] = min{ OPT[v, j, k − 1], min_{<v,w>∈E} { OPT[w, j, k − 1] + d(v, w) } }

139 / 145