
Jun 06, 2020


dariahiddleston

Dynamic Programming Algorithms and Sequence Alignment

[Figure: an alignment of ATGTTAT and ATCGTAC with 4 matches, 2 insertions, and 2 deletions]

Outline

1. Change Problem
2. Manhattan Tourist Problem
3. Longest Paths in Graphs
4. Sequence Alignment
5. Edit Distance


The Change Problem

• Say we want to provide change totaling 97 cents.
• We could do this in a large number of ways, but the way that uses the fewest coins would be:
   • Three quarters = 75 cents
   • Two dimes = 20 cents
   • Two pennies = 2 cents
• Question 1: How do we know that this uses the fewest coins?
• Question 2: Can we generalize to arbitrary denominations?
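The 97-cent answer can be reproduced by a greedy rule: always hand out as many of the largest remaining coin as will fit. The sketch below is only an illustration. The function name `greedy_change` and the denomination list (quarters, dimes, nickels, pennies) are assumptions, not part of the slides.

```python
def greedy_change(amount, denominations):
    """Take as many of the largest coin as fit, then move to the next coin."""
    counts = []
    for coin in sorted(denominations, reverse=True):
        counts.append((coin, amount // coin))  # how many of this coin are used
        amount %= coin                         # what is still left to pay
    return counts


# Assumed US denominations in cents: quarter, dime, nickel, penny.
print(greedy_change(97, [25, 10, 5, 1]))
# [(25, 3), (10, 2), (5, 0), (1, 2)]  -> 7 coins in total
```

For arbitrary denominations this greedy rule is not guaranteed to use the fewest coins, which is what Question 2 is probing.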

The Change Problem: Formal Statement

• Goal: Convert some amount of money $M$ into given denominations, using the fewest possible number of coins.
• Input: An amount of money $M$, and an array of $d$ denominations $c = (c_1, c_2, \dots, c_d)$, in decreasing order of value ($c_1 > c_2 > \dots > c_d$).
• Output: A list of $d$ integers $i_1, i_2, \dots, i_d$ such that

$$c_1 i_1 + c_2 i_2 + \dots + c_d i_d = M$$

and $i_1 + i_2 + \dots + i_d$ is minimal.

The Change Problem: Another Example

• Given the denominations 1, 3, and 5, what is the minimum number of coins needed to make change for a given value?
• Only one coin is needed to make change for the values 1, 3, and 5.
• However, two coins are needed to make change for the values 2, 4, 6, 8, and 10.
• Lastly, three coins are needed to make change for 7 and 9.

Value:          1  2  3  4  5  6  7  8  9  10
Min # of coins: 1  2  1  2  1  2  3  2  3  2

The Change Problem: Recurrence

• This example expresses the following recurrence relation, where $\mathrm{minNumCoins}(M)$ denotes the minimum number of coins needed to make change for $M$:

$$\mathrm{minNumCoins}(M) = \min \begin{cases} \mathrm{minNumCoins}(M-1) + 1 \\ \mathrm{minNumCoins}(M-3) + 1 \\ \mathrm{minNumCoins}(M-5) + 1 \end{cases}$$

The Change Problem: Recurrence

• In general, given the denominations $c = (c_1, c_2, \dots, c_d)$, the recurrence relation for the minimum number of coins $\mathrm{minNumCoins}(M)$ is:

$$\mathrm{minNumCoins}(M) = \min_{1 \le k \le d,\; c_k \le M} \bigl\{ \mathrm{minNumCoins}(M - c_k) + 1 \bigr\}, \qquad \mathrm{minNumCoins}(0) = 0$$


The Change Problem: Pseudocode
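A minimal Python sketch of RecursiveChange, assuming the recurrence in which the answer for an amount M is one more than the best answer among the amounts M − c_k for every denomination c_k ≤ M. The name `recursive_change` and its signature are illustrative choices, not taken from the slides.

```python
import math


def recursive_change(amount, denominations):
    """Fewest coins that make `amount`, by direct recursion (exponential time)."""
    if amount == 0:
        return 0
    best = math.inf
    for coin in denominations:
        if coin <= amount:
            # Use one `coin`, then solve the smaller problem from scratch.
            best = min(best, recursive_change(amount - coin, denominations) + 1)
    return best


# Denominations 1, 3, 5 and values 1..10.
print([recursive_change(m, (1, 3, 5)) for m in range(1, 11)])
# [1, 2, 1, 2, 1, 2, 3, 2, 3, 2]
```

Only small amounts are practical here, because the same subproblems are solved again and again.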

The RecursiveChange Tree: Example (M = 77, c = (1, 3, 7))

[Figure: the recursion tree of RecursiveChange, built level by level. The root 77 has children 76, 74, and 70; their children are 75, 73, 69; 73, 71, 67; and 69, 67, 63; and so on. The value 70 recurs at several levels of the tree.]

RecursiveChange: Inefficiencies

• RecursiveChange recalculates the optimal coin combination for a given amount of money repeatedly.
• For example, with M = 77 and c = (1, 3, 7):
   • The optimal coin combination for 70 cents is computed 9 times!
   • The optimal coin combination for 50 cents is computed billions of times!

RecursiveChange: Improvement

• Save the result of each computation for all amounts from 0 to M.
   – Look up an already computed value instead of repeating the recursive call.
• Running time: O(M · d), where M is the amount of money and d is the number of denominations.
• This technique is dynamic programming.


The Change Problem: Dynamic Programming
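A minimal sketch of the dynamic programming version in Python, assuming the usual bottom-up formulation: fill a table for every amount from 0 to M, so each amount is solved once. The name `dp_change` and the choice to return the whole table are illustrative assumptions.

```python
import math


def dp_change(amount, denominations):
    """Return a list `best` where best[m] is the fewest coins that make m."""
    best = [0] + [math.inf] * amount
    for m in range(1, amount + 1):
        for coin in denominations:
            # Reuse the stored answer for m - coin instead of recomputing it.
            if coin <= m and best[m - coin] + 1 < best[m]:
                best[m] = best[m - coin] + 1
    return best


print(dp_change(9, (1, 3, 7)))
# [0, 1, 2, 1, 2, 3, 2, 1, 2, 3]
```

The two nested loops run M · d times in total, which matches the stated running time.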

DPChange: Example

• For example, let us take c = (1, 3, 7), M = 9:

Amount:         0  1  2  3  4  5  6  7  8  9
Min # of coins: 0  1  2  1  2  3  2  1  2  3

• DPChange builds up from easier problem instances to the desired one, avoiding repetition.


Manhattan Tourist Problem


[Figure: street grid of Manhattan showing the hotel, the station, and attractions marked *]

• Imagine that you are a tourist in Manhattan, whose streets are represented by a grid.
• You are leaving town, and you want to see as many attractions (represented by *) as possible.
• Your time is limited: you only have time to travel east and south.
• What is the best path through town?


Manhattan Tourist Problem (MTP): Formulation

• Goal: Find the longest path in a weighted grid.
• Input: A weighted grid G with two distinct vertices, one labeled “source” and the other labeled “sink.”
• Output: A longest path in G from “source” to “sink.”

MTP Greedy Algorithm

• Our first try at solving the MTP will use a greedy algorithm.
• Main Idea: At each node (intersection), choose the edge (street) departing that node which has the greatest weight.
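The greedy idea can be sketched in a few lines of Python. The grid below is made up for illustration and is not the grid from the slides. The names `down[i][j]` and `right[i][j]` are assumed here to hold the weight of the edge leaving vertex (i, j) to the south and to the east.

```python
def greedy_path_score(down, right):
    """Walk from (0, 0) to the far corner, always taking the heavier outgoing edge."""
    n, m = len(down), len(right[0])  # n rows of south edges, m columns of east edges
    i = j = total = 0
    while i < n or j < m:
        # On the east border only south is possible; on the south border only east.
        go_south = i < n and (j == m or down[i][j] >= right[i][j])
        if go_south:
            total += down[i][j]
            i += 1
        else:
            total += right[i][j]
            j += 1
    return total


# Hypothetical 3 x 3 vertex grid (2 x 2 blocks).
down = [[1, 0, 2],
        [4, 6, 5]]
right = [[3, 2],
         [0, 7],
         [3, 3]]
print(greedy_path_score(down, right))  # 12
```

On this grid the path east, south, east, south collects 3 + 0 + 7 + 5 = 15, so the greedy choice is not optimal here.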

MTP Greedy Algorithm: Example

[Figure: a weighted grid with coordinates i, j = 0, …, 4 and marked source and sink vertices. The greedy path is traced one edge at a time, with running scores 0, 3, 5, 9, 13, 15, 19, 20, 23, so it reaches the sink with a total score of 23.]

MTP DP Algorithm: Example

[Figure: the same weighted grid, filled in step by step by dynamic programming, starting from a score of 0 at the source. The score computed at the sink is 34.]

MTP: DP Implementation

• The implementation uses two arrays of edge weights: one holding the weights of the N-to-S edges and one holding the weights of the W-to-E edges.

MTP: Running Time with Dynamic Programming

• The score $s_{i,j}$ for a point $(i, j)$ is given by the recurrence:

$$s_{i,j} = \max \begin{cases} s_{i-1,j} + \text{weight of the edge from } (i-1, j) \text{ to } (i, j) \\ s_{i,j-1} + \text{weight of the edge from } (i, j-1) \text{ to } (i, j) \end{cases}$$

• The running time is O(nm) for an n by m grid (n = # of rows, m = # of columns).


Longest Path in a Graph

Recursion for an Arbitrary Graph

• We would like to compute the score for point v in an arbitrary graph.
• Let Predecessors(v) be the set of vertices with edges leading into v. Then the recurrence is given by:

$$s_v = \max_{u \in \mathrm{Predecessors}(v)} \bigl( s_u + \text{weight of the edge from } u \text{ to } v \bigr)$$

• The running time for a graph with E edges is O(E), since each edge is evaluated once.

Recursion for an Arbitrary Graph: Problem

• Traversal: the order in which the vertices are visited.
• By the time the vertex x is analyzed, the values $s_y$ for all its predecessors y should already be computed.
• If the graph has a cycle, we will get stuck going over the same cycle again and again.
• The Manhattan graph restricts movement to the east and south directions, which avoids this problem.

Some Graph Theory Terminology

• Directed Acyclic Graph (DAG): A graph in which each edge is provided an orientation, and which has no cycles.
   – Edges in a DAG are represented with directed arrows.
• Figure source: http://commons.wikimedia.org/wiki/File:Directed_acyclic_graph.svg

Some Graph Theory Terminology

• Topological Ordering: A labeling of the vertices of a DAG (from 1 to n, say) such that every edge of the DAG connects a vertex with a smaller label to a vertex with a larger label.
• In other words, if the vertices are positioned on a line in increasing order of label, then all edges go from left to right.
• Theorem: Every DAG has a topological ordering.
• What this means: Every DAG has a source node (1) and a sink node (n).

Topological Ordering: Example

[Figure: a DAG with seven vertices labeled 1 through 7 in a topological ordering]

Longest Path in a DAG: Formulation

• Goal: Find a longest path between two vertices in a weighted DAG.
• Input: A weighted DAG G with source and sink vertices.
• Output: A longest path in G from source to sink.
• Note: Now we know that we can apply a topological ordering to G, and then use dynamic programming to find the longest path in G.
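The two steps in the note (topologically order the vertices, then apply dynamic programming) can be sketched as follows. This is one possible implementation: Kahn's algorithm is used for the ordering, which the slides do not prescribe, and the function name, the edge-list format, and the small DAG are assumptions made for illustration.

```python
from collections import defaultdict, deque


def longest_path(edges, source, sink):
    """edges: list of (u, v, weight) in a DAG. Returns (length, list of vertices)."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        successors[u].append((v, w))
        indegree[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm: repeatedly output a vertex with no unprocessed predecessors.
    queue = deque(x for x in nodes if indegree[x] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("the graph contains a cycle")

    # DP in topological order: every predecessor is final before it is used.
    score = {x: float("-inf") for x in nodes}
    score[source] = 0
    back = {}
    for u in order:
        if score[u] == float("-inf"):
            continue  # u is not reachable from the source
        for v, w in successors[u]:
            if score[u] + w > score[v]:
                score[v] = score[u] + w
                back[v] = u
    if score[sink] == float("-inf"):
        raise ValueError("the sink is not reachable from the source")

    path = [sink]
    while path[-1] != source:
        path.append(back[path[-1]])
    return score[sink], path[::-1]


# Hypothetical weighted DAG on vertices 1..5.
edges = [(1, 2, 3), (1, 3, 1), (2, 4, 2), (3, 4, 7), (2, 5, 4), (4, 5, 1)]
print(longest_path(edges, 1, 5))  # (9, [1, 3, 4, 5])
```

Each edge is relaxed exactly once in the DP loop, in line with the O(E) bound.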


Sequence Alignment

Back to Biology: Sequence Alignment

• Original problem: Fit a similarity score on two DNA sequences, for example ATGTTAT and ATCGTAC.
• Alignment matrix:

[Figure: alignment matrix for ATGTTAT and ATCGTAC, with 4 matches, 2 insertions, and 2 deletions]

Common Subsequence

• Given two sequences, $v = v_1 v_2 \dots v_m$ and $w = w_1 w_2 \dots w_n$, a common subsequence of v and w is a sequence of positions in v, $1 \le i_1 < i_2 < \dots < i_t \le m$, and a sequence of positions in w, $1 \le j_1 < j_2 < \dots < j_t \le n$, such that the $i_k$-th letter of v is equal to the $j_k$-th letter of w for every $1 \le k \le t$.
• Example: v = ATGCCAT, w = TCGGGCTATC. Then take:
   • $i_1 = 2$, $i_2 = 3$, $i_3 = 6$, $i_4 = 7$
   • $j_1 = 1$, $j_2 = 3$, $j_3 = 8$, $j_4 = 9$
   – This gives us that the common subsequence is TGAT.
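The example can be checked mechanically. The short Python sketch below reads off the letters at the given 1-based positions and tests the definition; the helper names `spell` and `is_common_subsequence` are made up for this illustration.

```python
def spell(sequence, positions):
    """Letters of `sequence` at the given 1-based positions."""
    return "".join(sequence[p - 1] for p in positions)


def is_common_subsequence(v, w, i_positions, j_positions):
    """True if both position lists are strictly increasing and spell the same word."""
    def increasing(ps):
        return all(a < b for a, b in zip(ps, ps[1:]))

    return (len(i_positions) == len(j_positions)
            and increasing(i_positions)
            and increasing(j_positions)
            and spell(v, i_positions) == spell(w, j_positions))


v, w = "ATGCCAT", "TCGGGCTATC"
print(spell(v, [2, 3, 6, 7]))                                     # TGAT
print(is_common_subsequence(v, w, [2, 3, 6, 7], [1, 3, 8, 9]))    # True
```

Note that the position lists include the endpoints j = 1 and i = 7, so the bounds on the positions must be inclusive.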

Longest Common Subsequence

• Given two sequences $v = v_1 v_2 \dots v_m$ and $w = w_1 w_2 \dots w_n$, the Longest Common Subsequence (LCS) of v and w is a sequence of positions in v, $1 \le i_1 < i_2 < \dots < i_T \le m$, and a sequence of positions in w, $1 \le j_1 < j_2 < \dots < j_T \le n$, such that the $i_k$-th letter of v is equal to the $j_k$-th letter of w for every $1 \le k \le T$, and T is maximal.
• Example: v = ATGCCAT, w = TCGGGCTATC.
   • TGCAT is a longer common subsequence than TGAT.
• Find the LCS of two sequences.

Edit Graph for LCS Problem

[Figure: edit graph with the sequence TGCATAC on the rows (i = 0, …, 7) and the sequence ATCTGATC on the columns (j = 0, …, 8); the diagonal edges are labeled +1]

• Assign one sequence to the rows, and one to the columns.
• Every diagonal edge represents a match of elements.
• In a path from source to sink, the diagonal edges represent a common subsequence. Example of a common subsequence: TGAT.
• LCS Problem: Find a path with the maximum number of diagonal edges.

Computing the LCS: Dynamic Programming

• Let $v_i$ = prefix of v of length i: $v_1 \dots v_i$
• and $w_j$ = prefix of w of length j: $w_1 \dots w_j$
• The length of LCS($v_i$, $w_j$), written $s_{i,j}$, is computed by:

$$s_{i,j} = \max \begin{cases} s_{i-1,j} \\ s_{i,j-1} \\ s_{i-1,j-1} + 1 & \text{if the } i\text{-th letter of } v \text{ equals the } j\text{-th letter of } w \end{cases}$$

with $s_{i,0} = s_{0,j} = 0$.
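A minimal Python sketch of the LCS computation, assuming the standard table in which entry (i, j) holds the LCS length of the first i letters of v and the first j letters of w, and a diagonal step adds 1 on a match. The function name `lcs` and the traceback order are illustrative choices; when several longest common subsequences exist, this sketch returns just one of them.

```python
def lcs(v, w):
    """Return one longest common subsequence of the strings v and w."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j], s[i][j - 1])          # skip a letter of v or w
            if v[i - 1] == w[j - 1]:
                s[i][j] = max(s[i][j], s[i - 1][j - 1] + 1)  # diagonal (match) edge

    # Trace back from the sink, collecting the letters on diagonal edges.
    letters = []
    i, j = n, m
    while i > 0 and j > 0:
        if v[i - 1] == w[j - 1] and s[i][j] == s[i - 1][j - 1] + 1:
            letters.append(v[i - 1])
            i -= 1
            j -= 1
        elif s[i][j] == s[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(letters))


print(len(lcs("ATGCCAT", "TCGGGCTATC")))  # 5
```

The table has (n + 1)(m + 1) entries and each is filled in constant time.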


Edit Distance

Hamming Distance

• The Hamming Distance $d_H(v, w)$ between two DNA sequences v and w of the same length is equal to the number of places in which the two sequences differ.
• Example: For the following sequences, $d_H(v, w) = 8$:

v: ATATATAT
w: TATATATA

• Yet these sequences are very similar!
• Hamming Distance is therefore not an ideal similarity score, because it ignores insertions and deletions.
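The Hamming distance is a position-by-position comparison, which the following short Python sketch makes explicit. The function name `hamming_distance` is an illustrative choice.

```python
def hamming_distance(v, w):
    """Number of positions at which two equal-length sequences differ."""
    if len(v) != len(w):
        raise ValueError("Hamming distance needs sequences of the same length")
    return sum(a != b for a, b in zip(v, w))


print(hamming_distance("ATATATAT", "TATATATA"))  # 8
```

Every position differs in this example, even though one sequence is just the other shifted by one letter.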

Edit Distance

• The edit distance between two strings is the minimum number of elementary operations (insertions, deletions, and substitutions) needed to transform one string into the other:

d(v, w) = MIN number of elementary operations to transform v → w

• Shift w one nucleotide to the right, and see that w is obtained from v by one insertion and one deletion:

v: ATATATAT-
w: -TATATATA

• Hence the edit distance, d(v, w) = 2.

• Note: In order to provide this distance, we had to “fiddle” with the sequences. Hamming distance was easier to find.

Edit Distance: Example 1

• Transform TGCATAT → ATCCGAT.

• We can do this in 5 steps:

TGCATAT → TGCATA (delete last T)
TGCATA → TGCAT (delete last A)
TGCAT → ATGCAT (insert A at front)
ATGCAT → ATCCAT (substitute C for G)
ATCCAT → ATCCGAT (insert G before last A)

• Note: This only allows us to conclude that the edit distance is at most 5.

Edit Distance: Example 2
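
To go from an upper bound to the exact value, the distance can be computed by dynamic programming. The sketch below is an assumption-laden illustration rather than the lecture's algorithm: it uses the standard unit-cost recurrence in which insertion, deletion, and substitution each cost 1, and the name `edit_distance` is hypothetical.

```python
def edit_distance(v: str, w: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning v into w."""
    n, m = len(v), len(w)
    # prev[j] = distance between the current prefix of v and w[:j].
    prev = list(range(m + 1))  # turning "" into w[:j] takes j insertions
    for i in range(1, n + 1):
        cur = [i] + [0] * m  # turning v[:i] into "" takes i deletions
        for j in range(1, m + 1):
            substitution = 0 if v[i - 1] == w[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,                 # delete v[i-1]
                cur[j - 1] + 1,              # insert w[j-1]
                prev[j - 1] + substitution,  # match or substitute
            )
        prev = cur
    return prev[m]


print(edit_distance("ATATATAT", "TATATATA"))  # 2
print(edit_distance("TGCATAT", "ATCCGAT"))    # 4
```

For TGCATAT and ATCCGAT this gives 4, one fewer than the 5-step transformation, which is consistent with the note that 5 is only an upper bound.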

• Theorem: Given two sequences v and w of lengths m and n, the edit distance d(v,w), when only insertions and deletions are allowed, is given by d(v,w) = m + n - 2s(v,w), where s(v,w) is the length of the longest common subsequence of v and w. (Check with Example 1: m = n = 8 and s(v,w) = 7, so d(v,w) = 16 - 14 = 2.)

Solving the LCS problem for v and w is therefore equivalent to finding this edit distance between them!

Key Result
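
A quick numerical check of the relation between LCS length and edit distance, under the assumption that only insertions and deletions are counted (a substitution then costs one deletion plus one insertion). In that setting the distance works out to m + n - 2s(v,w). Both function names are illustrative, and the two quantities are computed by independent tables so that the comparison is not circular.

```python
def lcs_length(v: str, w: str) -> int:
    """s(v, w): length of a longest common subsequence."""
    s = [[0] * (len(w) + 1) for _ in range(len(v) + 1)]
    for i in range(1, len(v) + 1):
        for j in range(1, len(w) + 1):
            s[i][j] = max(s[i - 1][j], s[i][j - 1])
            if v[i - 1] == w[j - 1]:
                s[i][j] = max(s[i][j], s[i - 1][j - 1] + 1)
    return s[len(v)][len(w)]


def indel_distance(v: str, w: str) -> int:
    """Fewest insertions and deletions (no substitutions) turning v into w."""
    n, m = len(v), len(w)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete everything
    for j in range(m + 1):
        d[0][j] = j  # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(d[i - 1][j], d[i][j - 1]) + 1  # one indel
            if v[i - 1] == w[j - 1]:
                best = min(best, d[i - 1][j - 1])     # free match
            d[i][j] = best
    return d[n][m]


for v, w in [("ATATATAT", "TATATATA"), ("TGCATAT", "ATCCGAT"), ("ATCGTAC", "ATGTTAT")]:
    # The two sides of the theorem, computed separately.
    print(v, w, indel_distance(v, w), len(v) + len(w) - 2 * lcs_length(v, w))
```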

Return to the Edit Graph

• Every alignment corresponds to a path from source to sink.

• Horizontal and vertical edges correspond to indels (deletions and insertions).

• Diagonal edges correspond to matches and mismatches.

Return to the Edit Graph

• Find LCS in ATCGTAC, ATGTTAT.

• Match: +1

• Mismatches and indels: 0

• Score (0,j) = 0

• Score (i,0) = 0

• Score (1,1): three possibilities. Arriving by either indel (- over A, or A over -) gives 0 + 0 = 0; arriving by the diagonal match (A over A) gives 0 + 1 = 1. The maximum is kept, so Score (1,1) = 1.

• The rest of row 1 and column 1 is filled in the same way, then the remaining cells:

    ε  A  T  C  G  T  A  C
ε   0  0  0  0  0  0  0  0
A   0  1  1  1  1  1  1  1
T   0  1  2  2  2  2  2  2
G   0  1  2  2  3  3  3  3
T   0  1  2  2  3  4  4  4
T   0  1  2  2  3  4  4  4
A   0  1  2  2  3  4  5  5
T   0  1  2  2  3  4  5  5

• Optimal Alignment, LCS: backtracking from the sink to the source gives an alignment with 5 matches (LCS = ATGTA):

A T C G - T A - C
A T - G T T A T -

Alignment as a Path in the Edit Graph: Example

Dynamic Alignment: Pseudocode

Printing LCS: Backtracking
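
A minimal Python sketch of the table fill followed by the backtracking step, assuming the scoring of the worked example (match +1, mismatch and indel 0). The function name `lcs_alignment` and the tie-breaking order during backtracking are choices made for this illustration; other orders can yield different but equally good alignments.

```python
def lcs_alignment(v: str, w: str):
    """Return (top row, bottom row, LCS) for one optimal LCS alignment."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j], s[i][j - 1])
            if v[i - 1] == w[j - 1]:
                s[i][j] = max(s[i][j], s[i - 1][j - 1] + 1)

    # Walk back from the sink (n, m) to the source (0, 0).
    top, bottom, lcs = [], [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and v[i - 1] == w[j - 1] and s[i][j] == s[i - 1][j - 1] + 1:
            # Diagonal edge: a matched letter, part of the LCS.
            top.append(v[i - 1]); bottom.append(w[j - 1]); lcs.append(v[i - 1])
            i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j]:
            # Edge consuming a letter of v only.
            top.append(v[i - 1]); bottom.append("-")
            i -= 1
        else:
            # Edge consuming a letter of w only.
            top.append("-"); bottom.append(w[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), "".join(reversed(lcs))


top, bottom, lcs = lcs_alignment("ATCGTAC", "ATGTTAT")
print(top)
print(bottom)
print(lcs, len(lcs))
```

Backtracking visits at most n + m edges, so it is cheap compared with filling the table.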

Filling in the n x m dynamic programming matrix takes O(nm) time: the pseudocode consists of one “for” loop nested inside another.

LCS: Runtime

Global Alignment

Simplest scoring schema: For some positive numbers μ and σ:

– Match Premium: +1
– Mismatch Penalty: –μ
– Indel Penalty: –σ

Alignment score = #matches – μ(#mismatches) – σ(#indels)

Choice of μ and σ depends on how we wish to penalize mismatches and indels.

From LCS to Alignment: Change the Scoring
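
The scoring schema can be applied to a finished alignment by counting its columns. The sketch below assumes an alignment is given as two equal-length rows with "-" marking a gap; the function name `alignment_score` is hypothetical.

```python
def alignment_score(top: str, bottom: str, mu: float, sigma: float) -> float:
    """Score = #matches - mu * #mismatches - sigma * #indels."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    matches = mismatches = indels = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            indels += 1       # a letter aligned against a gap
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches - mu * mismatches - sigma * indels


# 5 matches, 0 mismatches, 4 indels with mu = 1, sigma = 0.5
print(alignment_score("ATCG-TA-C", "AT-GTTAT-", mu=1, sigma=0.5))  # 3.0
```

Setting μ = σ = 0 recovers the LCS score, which is just the number of matches.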

The Global Alignment Problem

Input : Strings v and w and a scoring schema

Output : An alignment with maximum score

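Use DP to solve the Global Alignment Problem:

$$
s_{i,j} = \max
\begin{cases}
s_{i-1,j-1} + 1 & \text{if } v_i = w_j \\
s_{i-1,j-1} - \mu & \text{if } v_i \neq w_j \\
s_{i-1,j} - \sigma \\
s_{i,j-1} - \sigma
\end{cases}
$$

μ : mismatch penalty
σ : indel penalty

• Align ATCGTAC and ATGTTAT. μ : 1, σ : 0.5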

Needleman and Wunsch Algorithm

ACTCG vs. ACAGTAG

Gap Penalty = -1
Match Score = +1
Mismatch Score = 0

         A   C   T   C   G
     0  -1  -2  -3  -4  -5
A   -1   1   0  -1  -2  -3
C   -2   0   2   1   0  -1
A   -3  -1   1   2   1   0
G   -4  -2   0   1   2   2
T   -5  -3  -1   1   1   2
A   -6  -4  -2   0   1   1
G   -7  -5  -3  -1   0   2

A C A G T A G
A C - - T C G
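
The table for ACTCG vs. ACAGTAG can be reproduced with a short sketch of the fill step. It assumes the three parameters listed on the slide (gap -1, match +1, mismatch 0) as defaults; the function name `needleman_wunsch_table` is illustrative and only the score table is built, not the traceback.

```python
def needleman_wunsch_table(rows: str, cols: str, match=1, mismatch=0, gap=-1):
    """Fill the global alignment score table; rows index the first string."""
    n, m = len(rows), len(cols)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    # The first column and first row accumulate gap penalties.
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + gap
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = s[i - 1][j - 1] + (match if rows[i - 1] == cols[j - 1] else mismatch)
            s[i][j] = max(diagonal, s[i - 1][j] + gap, s[i][j - 1] + gap)
    return s


table = needleman_wunsch_table("ACAGTAG", "ACTCG")
for row in table:
    print(row)
print("global alignment score:", table[-1][-1])  # 2
```

The bottom-right cell holds the score of the best global alignment, 2 in this example.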

Scoring Matrices

Scoring Matrices: Example

• Align AGTCA and CGTTGG with the scoring matrix:

       A     G     T     C     -
A      1   -0.8  -0.2  -2.3  -0.6
G    -0.8    1   -1.1  -0.7  -1.5
T    -0.2  -1.1    1   -0.5  -0.9
C    -2.3  -0.7  -0.5    1   -1
-    -0.6  -1.5  -0.9  -1    n/a

Sample Alignment:

A-GTC-A
-CGTTGG

Score: -0.6 - 1 + 1 + 1 - 0.5 - 1.5 - 0.8 = -2.4
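
The sample score can be recomputed column by column. In this sketch the matrix values are copied from the table above, while the dictionary layout, the helper `delta`, and the function name `matrix_score` are assumptions made for illustration. The bottom row is written with a leading gap, as implied by the first term of the score.

```python
# Symmetric scoring matrix, stored once per unordered pair; "-" is the gap symbol.
DELTA = {
    ("A", "A"): 1, ("A", "G"): -0.8, ("A", "T"): -0.2, ("A", "C"): -2.3, ("A", "-"): -0.6,
    ("G", "G"): 1, ("G", "T"): -1.1, ("G", "C"): -0.7, ("G", "-"): -1.5,
    ("T", "T"): 1, ("T", "C"): -0.5, ("T", "-"): -0.9,
    ("C", "C"): 1, ("C", "-"): -1.0,
}


def delta(a: str, b: str) -> float:
    """Look up the score of aligning a with b, in either order."""
    return DELTA[(a, b)] if (a, b) in DELTA else DELTA[(b, a)]


def matrix_score(top: str, bottom: str) -> float:
    """Sum the matrix entries over the columns of an alignment."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(delta(a, b) for a, b in zip(top, bottom))


print(round(matrix_score("A-GTC-A", "-CGTTGG"), 1))  # -2.4
```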

How Do We Make a Scoring Matrix?

Scoring matrices are created based on biological evidence.

Alignments can be thought of as two sequences that differ due to mutations.

Some of these mutations have little effect on the protein’s function; therefore some penalties, δ(v_i, w_j), will be less harsh than others.

Amino Acid Scoring Matrix

A R N K

A 5 -2 -1 -1

R -2 7 -1 3

N -1 -1 7 0

K -1 3 0 6

• R and K have a positive mismatch score.

• Both are positively charged amino acids, so this mismatch will not greatly change the function of the protein.

• Mismatch scores are positive for amino acid changes that tend to preserve the physicochemical properties of the original residue (identical polarity, similar behaviour).

Scoring Matrices: Amino Acid vs. DNA

Two commonly used amino acid substitution matrices:

1. PAM
2. BLOSUM

DNA substitution matrices:

• DNA is less conserved than protein sequences

• It is therefore less effective to compare sequences at the nucleotide level

PAM

PAM: Stands for Point Accepted Mutation

1 PAM = PAM1 = 1% average change of all amino acid positions.

• Note: This doesn’t mean that after 100 PAMs of evolution, every residue will have changed:

  • Some residues may have mutated several times.
  • Some residues may have returned to their original state.
  • Some residues may not have changed at all.

PAMX

PAMx = PAM1^x (x iterations of PAM1)

– Example: PAM250 = PAM1^250

PAM250 is a widely used scoring matrix:

         Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys ...
          A   R   N   D   C   Q   E   G   H   I   L   K  ...
Ala A    13   6   9   9   5   8   9  12   6   8   6   7  ...
Arg R     3  17   4   3   2   5   3   2   6   3   2   9
Asn N     4   4   6   7   2   5   6   4   6   3   2   5
Asp D     5   4   8  11   1   7  10   5   6   3   2   5
Cys C     2   1   1   1  52   1   1   2   2   2   1   1
Gln Q     3   5   5   6   1  10   7   3   7   2   3   5
...
Trp W     0   2   0   0   0   0   0   0   1   0   1   0
Tyr Y     1   1   2   1   3   1   1   1   3   2   2   1
Val V     7   4   4   4   4   4   4   4   5   4  15  10
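
The iteration PAM1^x is a matrix power. The sketch below shows the mechanics on a toy 2-letter alphabet: the matrix `TOY_PAM1` is invented for this example and is not real PAM data, and the helper names `mat_mul` and `mat_pow` are hypothetical. Each row holds the probabilities of a letter staying the same or changing in one step, with a 1% chance of change.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(m, x):
    """m raised to the non-negative integer power x, by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    base = [row[:] for row in m]
    while x > 0:
        if x & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        x >>= 1
    return result


# Toy one-step mutation matrix (NOT real PAM1): 1% chance that a letter changes.
TOY_PAM1 = [[0.99, 0.01],
            [0.01, 0.99]]

toy_pam100 = mat_pow(TOY_PAM1, 100)
toy_pam250 = mat_pow(TOY_PAM1, 250)
print(toy_pam100[0])
print(toy_pam250[0])
```

Even after 100 steps the diagonal entry stays well above zero in this toy model, which mirrors the point that 100 PAMs of evolution do not change every residue.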

BLOSUM

BLOSUM: Stands for Blocks Substitution Matrix

Scores are derived from observations of the frequencies of substitutions in blocks of local alignments in related proteins.

• BLOSUM62 was created using sequences sharing no more than 62% identity.

C S T P … F Y W

C 9 -1 -1 3 … -2 -2 -2

S -1 4 1 -1 … -2 -2 -3

T -1 1 4 1 … -2 -2 -3

P 3 -1 1 7 … -4 -3 -4

… … … … … … … … …

F -2 -2 -2 -4 … 6 3 1

Y -2 -2 -2 -3 … 3 7 2

W -2 -3 -3 -4 … 1 2 11

http://www.uky.edu/Classes/BIO/520/BIO520WWW/blosum62.htm

Local Alignment

Local vs. Global Alignment: Example

• Global Alignment:

--T--CC-C-AGT--TATGT-CAGGGGACACG--A-GCATGCAGA-GAC
  |  || |  ||  | | | |||    || |  | |  | ||||   |
AATTGCCGCC-GTCGT-T-TTCAG----CA-GTTATG--T-CAGAT--C

• Local Alignment (better alignment to find conserved segment):

                  tccCAGTTATGTCAGgggacacgagcatgcagagac
                     ||||||||||||
aattgccgccgtcgttttcagCAGTTATGTCAGatc

Local Alignment: Why?

Two genes in different species may be similar over short conserved regions and dissimilar over remaining regions.

Example: Homeobox genes (which regulate embryonic development) have short homeodomains that are highly conserved among species.

• Aligning entire sequences (Global alignment) may miss homeodomains.

• Search for an alignment which has a positive score locally

  • (Alignment on substrings of the given sequences that has a positive score)

Local Alignment: Illustration

Global alignment

Compute a “mini” Global Alignment to get Local Alignment

The Local Alignment Problem

Goal: Find the best local alignment between two strings.

Input: Strings v and w, as well as a scoring matrix δ.

Output: An alignment of substrings of v and w whose alignment score is maximum among all possible alignments of all possible substrings of v and w.

Local Alignment: How to Solve?

Global Alignment Problem finds the longest path between vertices (0,0) and (n,m) in the edit graph.

Local Alignment Problem finds the longest path among paths between arbitrary vertices (i,j) and (i’, j’) in the edit graph.

In the edit graph with negatively-scored edges, Local Alignment may score higher than Global Alignment.
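To see this point concretely, here is a minimal Python sketch that is not taken from the slides. It assumes a simple scoring scheme (match +1, mismatch -1, gap -1) in place of a general matrix δ, and the names `global_score` and `local_score_by_definition` are made up for illustration. The local score is computed straight from the problem definition, as the best "mini" global alignment over all pairs of substrings, so it is only practical for very short strings.

```python
def global_score(v, w, match=1, mismatch=-1, gap=-1):
    """Score of the longest path from (0, 0) to (n, m) in the edit graph."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    # First column and first row can only be reached through gaps.
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + gap
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            delta = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j - 1] + delta,  # diagonal edge
                          s[i - 1][j] + gap,        # vertical edge
                          s[i][j - 1] + gap)        # horizontal edge
    return s[n][m]


def local_score_by_definition(v, w, **scores):
    """Best global score over all pairs of substrings (empty pair scores 0)."""
    best = 0
    for i in range(len(v) + 1):
        for i2 in range(i, len(v) + 1):
            for j in range(len(w) + 1):
                for j2 in range(j, len(w) + 1):
                    best = max(best, global_score(v[i:i2], w[j:j2], **scores))
    return best


if __name__ == "__main__":
    v, w = "CCCCACGT", "ACGTGGGG"
    print("global:", global_score(v, w))
    print("local: ", local_score_by_definition(v, w))
```

For these two strings the shared segment ACGT gives a local score of 4, while every global alignment must also pay for the flanking C's and G's and ends up below zero.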

Global alignment

Local alignment

The Problem with This Setup

• In a grid of size n × n there are ~n² vertices (i, j) that may serve as a source.

• For each such vertex, computing alignments from (i, j) to (i’, j’) takes O(n²) time.

• This gives an overall runtime of O(n⁴), which is a bit too slow… can we do better?
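The counting argument can be made concrete with a brute-force sketch in Python. This is an illustration rather than code from the slides: the helper names and the match/mismatch/gap scores are assumptions standing in for δ. Each call to `longest_path_from` fills a fresh table starting from one source vertex, and it is called once per vertex, which is where the O(n⁴) total comes from.

```python
def longest_path_from(v, w, i0, j0, match=1, mismatch=-1, gap=-1):
    """Best score of any path starting at vertex (i0, j0): one O(n^2) table."""
    n, m = len(v), len(w)
    neg_inf = float("-inf")
    s = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    s[i0][j0] = 0          # the empty path at the source scores 0
    best = 0
    for i in range(i0, n + 1):
        for j in range(j0, m + 1):
            if i > i0:                       # vertical edge (gap)
                s[i][j] = max(s[i][j], s[i - 1][j] + gap)
            if j > j0:                       # horizontal edge (gap)
                s[i][j] = max(s[i][j], s[i][j - 1] + gap)
            if i > i0 and j > j0:            # diagonal edge (match/mismatch)
                delta = match if v[i - 1] == w[j - 1] else mismatch
                s[i][j] = max(s[i][j], s[i - 1][j - 1] + delta)
            best = max(best, s[i][j])        # any vertex may be the sink
    return best


def local_score_all_sources(v, w, **scores):
    """Try every vertex as a source: ~n^2 sources times O(n^2) work each."""
    return max(longest_path_from(v, w, i, j, **scores)
               for i in range(len(v) + 1)
               for j in range(len(w) + 1))


if __name__ == "__main__":
    print(local_score_all_sources("CCCCACGT", "ACGTGGGG"))
```

The quadruple nesting (two loops over sources, two over the table) is already noticeable for sequences of a few hundred characters, which motivates looking for a single-pass method.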


Local Alignment Solution: Free Rides

• Add “free” edges to the edit graph.

• The dashed edges represent the “free rides” from (0, 0) to every other node.

• Each “free ride” is assigned an edge weight of 0.

• If we start at (0, 0) instead of (i, j) and find the longest path to (i’, j’), we obtain the local alignment.

Smith-Waterman Local Alignment Algorithm

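• The largest value of $s_{i,j}$ over the whole edit graph is the score of the best local alignment.

• The recurrence (the 0 term is the free ride from (0, 0)):

$$
s_{i,j} = \max
\begin{cases}
0 \\
s_{i-1,j-1} + \delta(v_i, w_j) \\
s_{i-1,j} + \delta(v_i, -) \\
s_{i,j-1} + \delta(-, w_j)
\end{cases}
$$

• Running time: O(n²)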

Smith and Waterman Algorithm

       A  A  C  C  T  A  T  A  G  C  T
    0  0  0  0  0  0  0  0  0  0  0  0
G   0  0  0  0  0  0  0  0  0  1  0  0
C   0  0  0  1  1  0  0  0  0  0  2  1
G   0  0  0  2  0  0  0  0  0  1  0  1
A   0  1  1  1  0  0  1  0  1  0  0  0
T   0  0  0  0  0  1  0  2  1  0  0  1
A   0  0  1  3  0  0  2  0  3  2  1  0
T   0  0  0  3  0  0  1  3  2  2  1  2
A   0  0  0  3  0  0  2  2  4  3  2  1

AACCTATAGCT, GCGATATA

Gap Penalty = -1
Match Score = +1
Mismatch Score = 0
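A possible Python sketch of the algorithm on this slide's input is shown below. It is an illustration under assumptions, not the slides' own code: the function name `smith_waterman`, the tie-breaking order in the traceback, and the use of three fixed scores instead of a general matrix δ are all choices made here. It takes the two sequences and the three scores stated above, fills the table with the "max with 0" rule, and traces back from the largest entry until it reaches a cell with score 0. It does not attempt to reproduce the matrix above entry by entry.

```python
def smith_waterman(v, w, match=1, mismatch=0, gap=-1):
    """Return (best local score, aligned piece of v, aligned piece of w)."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    best, end = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            delta = match if v[i - 1] == w[j - 1] else mismatch
            s[i][j] = max(0,                        # free ride from (0, 0)
                          s[i - 1][j - 1] + delta,  # match / mismatch
                          s[i - 1][j] + gap,        # gap in w
                          s[i][j - 1] + gap)        # gap in v
            if s[i][j] > best:                      # remember the best sink
                best, end = s[i][j], (i, j)

    # Trace back from the best sink until a zero-score cell is reached.
    i, j = end
    top, bottom = [], []
    while i > 0 and j > 0 and s[i][j] > 0:
        delta = match if v[i - 1] == w[j - 1] else mismatch
        if s[i][j] == s[i - 1][j - 1] + delta:
            top.append(v[i - 1]); bottom.append(w[j - 1])
            i, j = i - 1, j - 1
        elif s[i][j] == s[i - 1][j] + gap:
            top.append(v[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(w[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    score, a, b = smith_waterman("GCGATATA", "AACCTATAGCT")
    print(score)
    print(a)
    print(b)
```

With the scores listed on the slide, the best local score for these two sequences is 4, the same as the largest entry in the matrix above. Several cells can tie for the maximum, so a different tie-breaking rule may report a different, equally good alignment.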