Gotoh Scan Algorithm for matching RNA sequences By Hila Abukasis & Shai Kerer

Gotoh Scan Algorithm - BGUmichaluz/seminar/Gotoh.pdf · Needleman-Wunsch Algorithm (1970) The Needleman-Wunsch algorithm is an application of a best-path strategy (dynamic programming)

Oct 02, 2020


Gotoh Scan Algorithm for matching RNA sequences

By Hila Abukasis & Shai Kerer

Contents

What is RNA ?

Matching RNA

Needleman-Wunsch Algorithm

Global Alignment VS Local Alignment

Smith-Waterman Algorithm

Gotoh Scan Algorithm

    Ideal Gap Penalty

    Algorithm

Summary

What is RNA?

A "copy" of a sub-sequence of the DNA.

It carries information from the DNA to the ribosome, where it is translated into proteins.

(Show video)

Matching RNA - Motivation

It is believed that RNA is the most ancient genetic material.

Finding similarity between 2 RNA sequences can teach us about evolutionary relations.

Accurate RNA sequence alignment is an essential tool needed to understand basic biological and evolutionary processes.

Matching RNA

Given 2 RNA sequences (strings), we want to find the optimal alignment between them.

    C A G C U G
    | | X X X |
G A C A A U A G U C

    A A A A A C A U A C A A C A G C
    | | X X | | |   | X | | | X | |
C A A A G C A C A _ A U A A C U G C C C

( | = match, X = mismatch, _ = gap )

Needleman-Wunsch Algorithm (1970)

The Needleman-Wunsch algorithm is an application of a best-path strategy (dynamic programming) used to find an optimal sequence alignment.

Any partial sub-path that ends at a point along the true optimal path must itself be the optimal path leading up to that point.

Therefore the optimal path can be determined by incremental extension of the optimal sub-paths.

In a Needleman-Wunsch alignment, the optimal path must stretch from beginning to end in both sequences (hence the term "global alignment").

Needleman-Wunsch Algorithm

Given 2 RNA strings A and B, we build a matrix M as follows.

Each alignment gets a score, which indicates the compatibility of the 2 strings.

S(A_i, B_j) is the score for matching a single character from each of the 2 strings.

gap is the penalty for inserting a gap into a string.

\[
M_{i,j} = \max
\begin{cases}
M_{i-1,j-1} + S(A_i, B_j) \\
M_{i-1,j} + \mathrm{gap} \\
M_{i,j-1} + \mathrm{gap}
\end{cases}
\]
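The recurrence above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `needleman_wunsch_matrix` is made up here, the default scores (match = 2, mismatch = -3, gap = -2) are the ones used in this deck's worked example, and the first row and column are assumed to be initialized with multiples of the gap penalty.

```python
def needleman_wunsch_matrix(a, b, match=2, mismatch=-3, gap=-2):
    """Fill the global-alignment matrix; columns follow a, rows follow b."""
    rows, cols = len(b) + 1, len(a) + 1
    m = [[0] * cols for _ in range(rows)]
    # Assumed initialization: a prefix aligned against nothing costs one gap per character.
    for j in range(1, cols):
        m[0][j] = j * gap
    for i in range(1, rows):
        m[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if b[i - 1] == a[j - 1] else mismatch
            m[i][j] = max(m[i - 1][j - 1] + s,   # align the two characters
                          m[i - 1][j] + gap,     # gap in one string
                          m[i][j - 1] + gap)     # gap in the other string
    return m


matrix = needleman_wunsch_matrix("ACTGATTCA", "ACGCATCA")
print(matrix[-1][-1])  # score of the best global alignment
```

The lower right-hand cell holds the score of the optimal global alignment; the rest of the matrix is what the trace back walks through.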

Gap = -2 ; Mismatch = -3 ; Match = 2

ACTGATTCA runs across the columns, ACGCATCA down the rows.

          A    C    T    G    A    T    T    C    A
     0   -2   -4   -6   -8  -10  -12  -14  -16  -18
A   -2    2    0   -2   -4   -6   -8  -10  -12  -14
C   -4    0    4    2    0   -2   -4   -6   -8  -10
G   -6   -2    2    1    4    2    0   -2   -4   -6
C   -8   -4    0   -1    2    1   -1   -3    0   -2
A  -10   -6   -2   -3    0    4    2    0   -2    2
T  -12   -8   -4    0   -2    2    6    4    2    0
C  -14  -10   -6   -2   -3    0    4    3    6    4
A  -16  -12   -8   -4   -5   -1    2    1    4    8

M[1, 1] = Max { M[0, 0] + S(A, A) = 0 + 2 ; M[0, 1] + Gap = -2 + -2 ; M[1, 0] + Gap = -2 + -2 } = 2

M[1, 2] = Max { M[0, 1] + S(A, C) = -2 + -3 ; M[0, 2] + Gap = -4 + -2 ; M[1, 1] + Gap = 2 + -2 } = 0

M[2, 1] = Max { M[1, 0] + S(C, A) = -2 + -3 ; M[1, 1] + Gap = 2 + -2 ; M[2, 0] + Gap = -4 + -2 } = 0

Each of the three terms corresponds to one way of extending the alignment: aligning the two characters (A... over A...), or inserting a gap in one of the strings (A_... over _A..., or _A... over A_...).

• Trace Back - The optimal path is traced beginning from the lower right-hand corner.

• At each step we move to the neighbor (diagonal, upper or left) from which the value of the current cell was obtained.

• Horizontal and vertical movement is a gap, and diagonal movement is a match or a mismatch.
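A possible sketch of the trace back in Python follows. It is an illustration under assumptions rather than the presenters' implementation: the name `nw_align` is invented here, ties are broken in the order diagonal, up, left (other orders can return a different alignment with the same score), and gaps are written as `_` as in the slides.

```python
def nw_align(a, b, match=2, mismatch=-3, gap=-2):
    """Global alignment with trace back; returns (score, aligned_a, aligned_b)."""
    n, k = len(a), len(b)
    m = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        m[i][0] = i * gap
    for j in range(1, k + 1):
        m[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            m[i][j] = max(m[i - 1][j - 1] + s, m[i - 1][j] + gap, m[i][j - 1] + gap)

    # Trace back from the lower right-hand corner, following the move
    # that actually produced each cell value.
    out_a, out_b = [], []
    i, j = n, k
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and m[i][j] == m[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])   # diagonal: match/mismatch
            i -= 1; j -= 1
        elif i > 0 and m[i][j] == m[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("_")        # gap in b
            i -= 1
        else:
            out_a.append("_"); out_b.append(b[j - 1])        # gap in a
            j -= 1
    return m[n][k], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = nw_align("ACTGATTCA", "ACGCATCA")
print(score)
print(top)
print(bottom)
```

Comparing each cell with its three possible predecessors avoids storing a separate pointer table, at the cost of recomputing the character score during the walk.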

Needleman-Wunsch - Result

A C T G _ A T T C A
| |   |   | |   | |
A C _ G C A T _ C A

Score = (AA) + (CC) + (T-) + (GG) + (-C) + (AA) + (TT) + (T-) + (CC) + (AA)

= 2 + 2 - 2 + 2 - 2 + 2 + 2 - 2 + 2 + 2

= 8

Global Alignment vs. Local Alignment

Smith-Waterman Algorithm (1981)

A modification of Needleman-Wunsch, used to find a local alignment.

\[
M_{i,j} = \max
\begin{cases}
M_{i-1,j-1} + S(A_i, B_j) \\
M_{i-1,j} + \mathrm{gap} \\
M_{i,j-1} + \mathrm{gap} \\
0
\end{cases}
\]
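As a rough sketch of how the extra 0 changes the computation, the Python below fills the local-alignment matrix and tracks its maximum. The function name `smith_waterman` is hypothetical, the default scores (match = 2, mismatch = -3, gap = -1) and the two test strings are the ones from this deck's Smith-Waterman example, and the zero-initialized first row and column are the usual convention for local alignment.

```python
def smith_waterman(a, b, match=2, mismatch=-3, gap=-1):
    """Return (best local score, filled matrix); rows follow a, columns follow b."""
    m = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            m[i][j] = max(0,                      # start a new local alignment here
                          m[i - 1][j - 1] + s,
                          m[i - 1][j] + gap,
                          m[i][j - 1] + gap)
            best = max(best, m[i][j])
    return best, m


best, sw_matrix = smith_waterman("TTACA", "AGCACG")
print(best)
```

Unlike the global case, the answer is the largest value anywhere in the matrix, not the lower right-hand cell.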

Smith-Waterman - Example

Gap = -1 ; Mismatch = -3 ; Match = 2

[Figure: score matrix for TTACA (columns) against AGCACG (rows). The first row and the first column are initialized to 0, and the remaining cells are filled in one at a time.]

Cell for T against A (mismatch):

M(i,j) = MAX { M(i-1,j-1) + S(Ai, Bj) = 0 + (-3) ; M(i-1,j) + gap = 0 + (-1) ; M(i,j-1) + gap = 0 + (-1) ; 0 } = 0

Cell for A against A (match):

M(i,j) = MAX { M(i-1,j-1) + S(Ai, Bj) = 0 + 2 ; M(i-1,j) + gap = 0 + (-1) ; M(i,j-1) + gap = 0 + (-1) ; 0 } = 2

Smith-Waterman - TraceBack

Gap = -1 ; Mismatch = -3 ; Match = 2

[Figure: the filled score matrix for TTACA against AGCACG, with the trace-back path ending at the maximal cell, 5.]

A _ C A
A G C A

Score = (AA) + (_G) + (CC) + (AA) = 2 + (-1) + 2 + 2 = 5

Run Time Complexity

Needleman-Wunsch : O(nm)

Smith-Waterman : O(nm)

Space Complexity

Needleman-Wunsch : O(nm)

Smith-Waterman : O(nm)

Space Improvement ?

Instead of an n x m array, we can use 2 linear arrays of size n.

\[
M_{i,j} = \max
\begin{cases}
M_{i-1,j-1} + S(A_i, B_j) \\
M_{i-1,j} + \mathrm{gap} \\
M_{i,j-1} + \mathrm{gap} \\
0
\end{cases}
\]

[Figure: the TTACA / AGCACG matrix filled row by row, keeping only the previous row and the current row.]

Problem ?
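A minimal sketch of the two-array idea, assuming the same scores as the Smith-Waterman example (match = 2, mismatch = -3, gap = -1); the function name is invented for this illustration. It returns the score only, which is presumably the problem the slide hints at: once earlier rows are discarded, the trace back can no longer be performed.

```python
def smith_waterman_score_two_rows(a, b, match=2, mismatch=-3, gap=-1):
    """Best local-alignment score using only two rows of length len(a) + 1."""
    prev = [0] * (len(a) + 1)           # row i-1
    best = 0
    for ch_b in b:
        curr = [0] * (len(a) + 1)       # row i, column 0 stays 0
        for j, ch_a in enumerate(a, start=1):
            s = match if ch_a == ch_b else mismatch
            curr[j] = max(0,
                          prev[j - 1] + s,   # diagonal
                          prev[j] + gap,     # from the row above
                          curr[j - 1] + gap) # from the left, same row
            best = max(best, curr[j])
        prev = curr                     # the older row is dropped here
    return best


two_row_best = smith_waterman_score_two_rows("TTACA", "AGCACG")
print(two_row_best)
```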

Ideal Gap Penalty

With the two previous algorithms, the strategy is to add a fixed gap penalty whenever a gap occurs, regardless of what the alignment was before it.

It is likely that if a particular character is gapped, the probability of the next one being gapped is higher, and hence it should be penalized less.

Ideal Gap Penalty - Run Time Complexity

\[
M_{i,j} = \max
\begin{cases}
M_{i-1,j-1} + S(A_i, B_j) \\
\max_{1 \le k \le i} \bigl( M_{i-k,j} + \mathrm{GAP}(k) \bigr) \\
\max_{1 \le k \le j} \bigl( M_{i,j-k} + \mathrm{GAP}(k) \bigr) \\
0
\end{cases}
\]

The algorithm using the ideal gap penalty costs O(n^2 m) (assuming n > m).

When evaluating the gap penalty we need to loop through all previous nucleotides to find the one that gives the maximum score.
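To make the extra loop concrete, here is a hedged Python sketch of the recurrence with an arbitrary gap function. The name `local_score_general_gap` and the parameter `gap_fn` (mapping a gap length k to its penalty GAP(k)) are assumptions made for this example, and the match and mismatch scores are the ones used elsewhere in the deck.

```python
def local_score_general_gap(a, b, gap_fn, match=2, mismatch=-3):
    """Best local score when a gap of length k costs gap_fn(k)."""
    n, k = len(a), len(b)
    m = [[0] * (k + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [0, m[i - 1][j - 1] + s]
            # The two inner loops below are what push the cost above O(nm):
            # every earlier cell in the same column and row is inspected.
            candidates += [m[i - g][j] + gap_fn(g) for g in range(1, i + 1)]
            candidates += [m[i][j - g] + gap_fn(g) for g in range(1, j + 1)]
            m[i][j] = max(candidates)
            best = max(best, m[i][j])
    return best


# With GAP(k) = -k this reduces to the fixed gap penalty of -1 per character.
linear_best = local_score_general_gap("TTACA", "AGCACG", lambda g: -g)
print(linear_best)
```

Passing a different `gap_fn`, for example one that grows slowly with the gap length, changes only the penalty and leaves the loop structure, and therefore the cost, the same.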

Affine Gap Penalty

The algorithm using the ideal gap penalty costs O(n^2 m), which is too expensive.

In order to keep our O(nm), we'll use a "compromise":

Gap Opening - Expensive

Gap Extension - Cheap

Ideal / Affine Gap - how to?

\[
M_{i,j} = \max
\begin{cases}
M_{i-1,j-1} + S(A_i, B_j) \\
M_{i-1,j} + \mathrm{gap} \\
M_{i,j-1} + \mathrm{gap}
\end{cases}
\]

Affine Gap Penalty - Run Time Complexity

Problem - Ideal Gap:

\[
M_{i,j} = \max
\begin{cases}
M_{i-1,j-1} + S(A_i, B_j) \\
\max_{k} \bigl( M_{i-k,j} + \mathrm{GAP}(k) \bigr) \\
\max_{k} \bigl( M_{i,j-k} + \mathrm{GAP}(k) \bigr) \\
0
\end{cases}
\]

Solution - Affine Gap: GAP(k) = v + u*k

\begin{align*}
D_{i,j} &= \max_{1 \le k \le i} \bigl( M_{i-k,j} + \mathrm{GAP}(k) \bigr) \\
&= \max\Bigl\{ M_{i-1,j} + \mathrm{GAP}(1) \;;\; \max_{2 \le k \le i} \bigl( M_{i-k,j} + \mathrm{GAP}(k) \bigr) \Bigr\} \\
&= \max\Bigl\{ M_{i-1,j} + \mathrm{GAP}(1) \;;\; \max_{1 \le k \le i-1} \bigl( M_{i-1-k,j} + \mathrm{GAP}(k+1) \bigr) \Bigr\} \\
&= \max\Bigl\{ M_{i-1,j} + \mathrm{GAP}(1) \;;\; \max_{1 \le k \le i-1} \bigl( M_{i-1-k,j} + [\mathrm{GAP}(k) + u] \bigr) \Bigr\} \\
&= \max\bigl\{ M_{i-1,j} + \mathrm{GAP}(1) \;;\; D_{i-1,j} + u \bigr\}
\end{align*}

Q.E.D.
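The identity derived above can be checked numerically. The sketch below is only a sanity check under assumed names: `col` stands for one column of M (any numbers will do, since the derivation does not depend on how M was produced), `d_direct` evaluates the maximum over all k, and `d_incremental` uses the final one-step form. The values v = -1 and u = -1 are chosen here so that GAP(1) = -2 and each extension costs -1.

```python
import random


def d_direct(col, i, v, u):
    """D_i = max over 1 <= k <= i of col[i-k] + GAP(k), with GAP(k) = v + u*k."""
    return max(col[i - k] + (v + u * k) for k in range(1, i + 1))


def d_incremental(col, v, u):
    """Same quantity via D_i = max(col[i-1] + GAP(1), D_{i-1} + u)."""
    d = [None] * len(col)               # d[0] is undefined: no gap can end at row 0
    for i in range(1, len(col)):
        open_gap = col[i - 1] + (v + u) # GAP(1) = v + u
        d[i] = open_gap if i == 1 else max(open_gap, d[i - 1] + u)
    return d


random.seed(0)
col = [random.randint(-10, 10) for _ in range(12)]
v, u = -1, -1
inc = d_incremental(col, v, u)
agree = all(inc[i] == d_direct(col, i, v, u) for i in range(1, len(col)))
print(agree)
```

The incremental form does constant work per cell, which is what brings the total cost back to O(nm).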

Semi Global Alignment

To explain what Semi-Global Alignment is, we will use meaningful names for the 2 RNA sequences, instead of A and B:

Query - the 1st string (A)

DataBase - the 2nd string (B)

Semi-Global - match the whole query to a sub-sequence of the database.

Semi Global - How to?

Hint : Table initialization
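One common way to read the hint, sketched here in Python under assumptions: the row that corresponds to the empty query is initialized to 0, so skipping a database prefix is free, and the answer is the maximum over the last row, so skipping a database suffix is free too. The function name `semi_global_score` and the fixed gap penalty of -2 are choices made for this illustration only.

```python
def semi_global_score(query, database, match=2, mismatch=-3, gap=-2):
    """Best score for aligning the whole query to some sub-sequence of database."""
    k = len(database)
    prev = [0] * (k + 1)                       # row 0: free start anywhere in the database
    for i, q in enumerate(query, start=1):
        curr = [i * gap] + [0] * k             # column 0: query prefix against gaps only
        for j in range(1, k + 1):
            s = match if q == database[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return max(prev)                           # free end anywhere in the database


hit = semi_global_score("GUCA", "AGGUCA")
print(hit)
```

Compared with the global algorithm only the initialization and the place where the result is read change; the recurrence itself stays the same.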


Gotoh Scan

Semi-Global + Affine Gap

Gap Open = -2 ; Gap Extension = -1 ; Match = 2 ; Mismatch = -3

\begin{align*}
D_{i,j} &= \max\{ S_{i-1,j} + g_o \;,\; D_{i-1,j} + g_e \} \\
F_{i,j} &= \max\{ S_{i,j-1} + g_o \;,\; F_{i,j-1} + g_e \} \\
S_{i,j} &= \max\{ S_{i-1,j-1} + \mathrm{score}(A_i, B_j) \;,\; D_{i,j} \;,\; F_{i,j} \}
\end{align*}

[Figure: the three tables D, F and S for the query GAAUCA (rows) and the database AGGUCA (columns) after initialization. The top row of S is all 0. The first column of S and of D is -2, -3, -4, -5, -6, -7: G over _ costs -2, GA over _ _ costs -3, and so on. The top row of D and the first column of F are left undefined (-).]
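The three recurrences can be put together roughly as follows. This Python sketch is an interpretation, not the presenters' code: the function name `gotoh_scan` is invented, undefined cells are represented by minus infinity, the first column of S and D is assumed to be gap open plus extensions, and the result is assumed to be the maximum of the last row of S, as in a semi-global alignment.

```python
NEG_INF = float("-inf")


def gotoh_scan(query, database, match=2, mismatch=-3, gap_open=-2, gap_extend=-1):
    """Semi-global alignment score with affine gaps; returns (score, S)."""
    n, k = len(query), len(database)
    S = [[0] * (k + 1) for _ in range(n + 1)]        # row 0 stays 0: free database prefix
    D = [[NEG_INF] * (k + 1) for _ in range(n + 1)]  # gap running along the query (vertical move)
    F = [[NEG_INF] * (k + 1) for _ in range(n + 1)]  # gap running along the database (horizontal move)
    for i in range(1, n + 1):
        # Query prefix against nothing: one gap open, then extensions.
        S[i][0] = D[i][0] = gap_open + (i - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            D[i][j] = max(S[i - 1][j] + gap_open, D[i - 1][j] + gap_extend)
            F[i][j] = max(S[i][j - 1] + gap_open, F[i][j - 1] + gap_extend)
            s = match if query[i - 1] == database[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s, D[i][j], F[i][j])
    return max(S[n]), S


gotoh_score, gotoh_S = gotoh_scan("GAAUCA", "AGGUCA")
print(gotoh_score)
```

Keeping D and F as separate tables is what lets each cell decide in constant time whether a gap is being opened or merely extended.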

[Figure: the first two cells of row G are filled in: D = -2, -2 ; F = -4, -4 ; S = -2, 2.]

Gap Open = -2 ; Gap Extension = -1 ; Match = 2 ; Mismatch = -3

The filled tables for the query GAAUCA (rows) and the database AGGUCA (columns):

D
         A   G   G   U   C   A
     -   -   -   -   -   -   -
G   -2  -2  -2  -2  -2  -2  -2
A   -3  -3   0   0  -2  -3  -3
A   -4  -2  -1  -1  -3  -4  -1
U   -5  -3  -2  -2  -4  -5  -2
C   -6  -4  -3  -3  -1  -3  -3
A   -7  -5  -4  -4  -2   1  -1

F
         A   G   G   U   C   A
     -   0   0   0   0   0   0
G    -  -4  -4   0   0  -1  -2
A    -  -5  -2  -2  -2  -3  -4
A    -  -6  -3  -3  -3  -4  -5
U    -  -7  -5  -4  -4  -1  -2
C    -  -8  -6  -5  -5  -3   1
A    -  -9  -6  -6  -6  -4  -1

S
         A   G   G   U   C   A
     0   0   0   0   0   0   0
G   -2  -2   2   2   0  -1  -2
A   -3   0   0   0  -1  -3   1
A   -4  -1  -1  -1  -3  -4  -1
U   -5  -3  -2  -2   1  -1  -2
C   -6  -4  -3  -3  -1   3   1
A   -7  -4  -4  -4  -2   1   5

[Figure: the filled D, F and S tables, with a trace-back path marked on S.]

2 options are the same?

  G A A U C A
  |   X | | |
A G _ G U C A

Score = (GG) + (A_) + (AG) + (UU) + (CC) + (AA)

= 2 - 2 - 3 + 2 + 2 + 2 = 3 ≠ 5

[Figure: the D, F and S tables together with a fourth, still empty table P for GAAUCA (rows) and AGGUCA (columns).]

[Figure: the filled D, F and S tables and the table P used for the trace back.]

    G A A U C A
    |     | | |
A G G _ _ U C A

Score = (GG) + (A_) + (A_) + (UU) + (CC) + (AA)

= 2 - 2 - 1 + 2 + 2 + 2 = 5

Here Is A Thought

Can we make the gap penalty even more accurate while staying within the limits of O(nm)?

Summary

Thanks to algorithms such as Gotoh Scan, we have more evidence about our origin.